Every year, an edition of the ECMAScript Language Specification is released with the new proposals that are officially ready. In practical terms, a proposal is attached to the next expected edition once it is accepted and reaches stage 4 in the TC39 process.
In this article, we’re going to examine and explain the "String.prototype.matchAll" proposal, which has reached stage 4 and belongs to ECMAScript 2020, the 11th edition.
Motivation
Capturing groups in regular expressions are multiple characters that are enclosed within parentheses and treated as a single unit.
This means that a quantifier placed right after the closing parenthesis applies to the entire group of characters (and not just to the single character preceding it).
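As a small illustration of the quantifier point, the following sketch compares a quantifier applied to a group with one applied to a single character. The patterns and the sample string are made up for this example:

```javascript
// The quantifier applies to the whole group "ab"
const grouped = /(ab)+/;
// The quantifier applies only to the preceding character "b"
const ungrouped = /ab+/;

const groupedMatch = 'ababab'.match(grouped);
const ungroupedMatch = 'ababab'.match(ungrouped);

console.log(groupedMatch[0]); // 'ababab'
console.log(groupedMatch[1]); // 'ab' (the last repetition captured by the group)
console.log(ungroupedMatch[0]); // 'ab'
```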
When invoking the String.prototype.match method against a regular expression containing capturing groups, the result is typically an array (or null if no matches are found) whose content differs depending on whether we use the g flag or not.
Let’s demonstrate the difference between the results.
All Matches without Capturing Groups
To begin with, we execute the method against a global regular expression that’s built from multiple groups:
const regex = /t(e)(st(\d?))/g;
const result = 'test1test2'.match(regex);
console.log(result); // ['test1', 'test2']
As it stands, the result doesn’t contain the matching capturing groups at all – but rather the entire set of characters that are matched. For instance, "e" and "st1" are strings matching the capturing groups – so we would expect them to be contained as well.
That is, the capturing groups are ignored. 🤷🏻♂️
The First Match with Capturing Groups
This time we execute against the same regex without using the g flag:
const regex = /t(e)(st(\d?))/;
const result = 'test1test2'.match(regex);
console.log(result); // ['test1', 'e', 'st1', '1', index: 0, input: "test1test2", groups: undefined]
Well, the multiple capturing groups are indeed considered in the result, but it refers only to the first match. 🤦🏻‍♂️
Notice that the result isn’t just an array of plain strings. It actually has additional properties: index, input and groups. Later, we’ll name these objects "matching values".
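The groups property is undefined in the output above because the regex has no named capturing groups. As a hedged sketch, assuming we name the innermost group (the name "digit" is hypothetical and not part of the original example), the additional properties could be read like this:

```javascript
// Same pattern as above, except the innermost group is named "digit" (a made-up name)
const namedRegex = /t(e)(st(?<digit>\d?))/;
const namedResult = 'test1test2'.match(namedRegex);

console.log(namedResult.index); // 0
console.log(namedResult.input); // 'test1test2'
console.log(namedResult.groups.digit); // '1'
console.log(namedResult[3]); // '1' (still available by position)
```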
Having said that, what if we’d like to combine both use-cases above so that:
- We retrieve results for all matches, including their capturing groups
- We retrieve them simply, without resorting to complicated workarounds
How do we achieve that?
The Proposal
The proposal specifies a new String.prototype.matchAll method that addresses the case in question.
Here’s the official definition out of the specification:
Performs a regular expression match of the String representing the this value against regexp and returns an iterator. Each iteration result’s value is an Array object containing the results of the match, or null if the String did not match.
We can understand from the definition that:
- The method returns an iterator representing the results of the matches
- The method always returns an iterator, never null; if there are no matches, the iterator simply yields no values
- All results of each match are contained
Put simply, the results of each match (namely "matching values") truly consider capturing groups and are accessible through the returned iterator.
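To check what happens when nothing matches, here is a minimal sketch with a sample string chosen so that the regex cannot match. In engines that implement ECMAScript 2020, the call still hands back an iterator, and that iterator yields no values:

```javascript
const regex = /t(e)(st(\d?))/g;

const noMatches = 'hello'.matchAll(regex);
console.log(noMatches === null); // false

// Collecting the iterator gives an empty array
const collected = [...noMatches];
console.log(collected.length); // 0

// By contrast, String.prototype.match returns null in the same situation
const matchResult = 'hello'.match(regex);
console.log(matchResult); // null
```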
Next, we’re going to introduce several practical usages for the method.
All Matches with Capturing Groups
Starting with executing the String.prototype.matchAll method against our regex:
const regex = /t(e)(st(\d?))/g;
const result = 'test1test2'.matchAll(regex);
console.log(result); // [RegExpStringIterator]
The method merely takes a regex, like String.prototype.match does, but differs in the result type. Importantly, only global regular expressions are acceptable, which means we must use the g flag to avoid getting a TypeError.
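The following sketch shows the error case, assuming an engine that implements the final ECMAScript 2020 behavior. The variable names are only illustrative:

```javascript
const nonGlobalRegex = /t(e)(st(\d?))/; // note: no g flag
let thrownError = null;

try {
  // The error is raised by the call itself, before any iteration happens
  'test1test2'.matchAll(nonGlobalRegex);
} catch (error) {
  thrownError = error;
}

console.log(thrownError instanceof TypeError); // true
```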
Here we explore the result:
const regex = /t(e)(st(\d?))/g;
const result = 'test1test2'.matchAll(regex);
const resultAsArray = [...result];
console.log(resultAsArray[0]); // ["test1", "e", "st1", "1", index: 0, input: "test1test2", groups: undefined]
console.log(resultAsArray[1]); // ["test2", "e", "st2", "2", index: 5, input: "test1test2", groups: undefined]
Since we already know that the result is an iterator, we spread it into resultAsArray to easily access the matching values.
The first matching value is absolutely identical to the result in the case of String.prototype.match without the g flag. The cool thing, however, is having an additional matching value, which refers to the second match and also considers its capturing groups!
Thereby, instead of messing with complicated solutions to combine the global matching results with capturing groups – the method provides this combination simply and natively. 💪🏻
Iterating the Matches
Until now, the common practice for iterating over all the matching values was to use a while loop:
const regex = /t(e)(st(\d?))/g;
let matchingValue;
while ((matchingValue = regex.exec('test1test2')) !== null) {
  console.log(matchingValue);
}
// Output:
// ["test1", "e", "st1", "1", index: 0, input: "test1test2", groups: undefined]
// ["test2", "e", "st2", "2", index: 5, input: "test1test2", groups: undefined]
Now with String.prototype.matchAll, we can benefit from the returned iterator to write the same loop more conveniently:
const regex = /t(e)(st(\d?))/g;
const result = 'test1test2'.matchAll(regex);
for (const matchingValue of result) {
  console.log(matchingValue);
}
The output obviously remains the same.
Also, we can destructure the iterator to access a specific matching value:
const regex = /t(e)(st(\d?))/g;
const result = 'test1test2'.matchAll(regex);
const [matchingValue, matchingValue2] = result;
It’s worth mentioning that both practices above consume the iterator. Once the last value has been reached, we need a fresh iterator, obtained by reinvoking the method, in order to iterate or destructure (which actually iterates indirectly) again.
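A minimal sketch of this behavior, using spreading to consume the iterator twice (the variable names are only illustrative):

```javascript
const regex = /t(e)(st(\d?))/g;
const str = 'test1test2';

const iterator = str.matchAll(regex);
const firstPass = [...iterator]; // consumes the iterator
const secondPass = [...iterator]; // nothing is left to yield

console.log(firstPass.length); // 2
console.log(secondPass.length); // 0

// Reinvoking the method gives a fresh iterator
const freshPass = [...str.matchAll(regex)];
console.log(freshPass.length); // 2
```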
Hence, another way to go is by creating an array to be iterated repeatedly:
const regex = /t(e)(st(\d?))/g;
const str = 'test1test2';
// Using spread operator
const array = [...str.matchAll(regex)];
// Using `from` method
const array2 = Array.from(str.matchAll(regex));
Thus it’s possible to iterate the matching values as we please without reinvoking the method.
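When only part of each matching value is needed, one option is the optional mapping function that Array.from accepts as its second argument. The sketch below, with illustrative variable names, collects just the digit captured by the third group and the position of each match:

```javascript
const regex = /t(e)(st(\d?))/g;
const str = 'test1test2';

// The mapping function is applied to each matching value as it is collected
const digits = Array.from(str.matchAll(regex), (matchingValue) => matchingValue[3]);
const positions = Array.from(str.matchAll(regex), (matchingValue) => matchingValue.index);

console.log(digits); // ['1', '2']
console.log(positions); // [0, 5]
```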
The truth is that in the majority of cases we could settle for an iterator, and that’s the primary reason behind the design decision to always return an iterator instead of an array.
Other than that, there is a matter of performance: the iterator performs the matching for each invocation of its next method. Practically this means that if we decide to break the loop in the middle or destructure only part of the values, the next matches wouldn’t be performed.
By contrast, if the result were an array, all the matching values would be collected beforehand without exception. That is definitely significant when there are tons of potential matches and/or capturing groups.
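The on-demand behavior can be sketched by stepping through the iterator manually and by breaking out of a loop early. The sample string is extended with a third match purely for illustration:

```javascript
const regex = /t(e)(st(\d?))/g;
const str = 'test1test2test3'; // assumed sample with three potential matches

// Each call to next() performs a single match
const iterator = str.matchAll(regex);
const first = iterator.next();
console.log(first.done); // false
console.log(first.value[0]); // 'test1'

// Breaking early: the third match is never requested from the iterator
const seen = [];
for (const matchingValue of str.matchAll(regex)) {
  seen.push(matchingValue[0]);
  if (seen.length === 2) break;
}
console.log(seen); // ['test1', 'test2']
```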
Anyway, on the whole, the point is we’re not restricted. The iterator provides the versatility to iterate, destructure or transform by choice to an array – depending on our use-case.
Summary
Today we explained the primary motivation behind the "String.prototype.matchAll" proposal and also introduced concrete usages of the method.
Let’s recap:
- The proposal belongs to ECMAScript 2020, which is the 11th edition
- When using String.prototype.match with the g flag, the result doesn’t consider capturing groups
- When using String.prototype.match without the g flag, the result considers the capturing groups but refers only to the first match
- The proposal specifies a new method called String.prototype.matchAll
- String.prototype.matchAll returns an iterator representing the matches and allowing us to iterate, destructure or transform to an array if necessary
- String.prototype.matchAll returns an iterator even if there are no matches; in that case the iterator yields no values (the method does not return null)
- String.prototype.matchAll considers all matches including capturing groups in a simple usage
- String.prototype.matchAll throws a TypeError when using a non-global regular expression
- After exhausting the iterator, we need to reinvoke String.prototype.matchAll to iterate once more