Every year, an edition of the ECMAScript Language Specification is released with the new proposals that are officially ready. In practical terms, proposals are attached to the latest expected edition once they are accepted and have reached stage 4 in the TC39 process.
In this article, we’re going to examine and explain the “String.prototype.matchAll” proposal, which has reached stage 4 and belongs to ECMAScript 2020 - the 11th edition.
Capturing groups in regular expressions are sequences of characters that are enclosed within parentheses and treated as a single unit. This means that placing a quantifier right after the closing parenthesis applies it to the entire group (and not just to the single preceding character). In addition, the text matched by each group is remembered and can be retrieved from the match result.
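To make the idea concrete, here is a minimal sketch comparing a quantifier with and without a group. The patterns and the sample string are illustrative assumptions, not taken from the proposal:

```javascript
// Without a group, the quantifier applies only to the preceding character "b".
const withoutGroup = /ab+/;
// With a group, the quantifier applies to the whole unit "ab".
const withGroup = /(ab)+/;

const sample = 'ababab';
const plainMatch = sample.match(withoutGroup);
const groupedMatch = sample.match(withGroup);

console.log(plainMatch[0]);   // 'ab'
console.log(groupedMatch[0]); // 'ababab'
console.log(groupedMatch[1]); // 'ab' (the last repetition captured by the group)
```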
When invoking the String.prototype.match method with a regular expression containing capturing groups, the result is typically an array (or null if no matches are found) whose content differs depending on whether we use the g flag or not.
Let’s demonstrate the difference between the results.
All Matches without Capturing Groups
To begin with, we execute the method against a global regular expression that’s built from multiple groups:
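As a minimal sketch, assume the following sample string and pattern. Both are illustrative choices (the names str and regex are hypothetical too), picked so that the capturing groups produce values such as “e” and “st1”:

```javascript
// Assumed sample data: a string with two matches and a pattern with three groups.
const str = 'test1test2';
const regex = /t(e)(st(\d?))/g; // global regex built from multiple groups

const allMatches = str.match(regex);
console.log(allMatches); // [ 'test1', 'test2' ]
```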
As it stands, the result doesn’t contain the matching capturing groups at all - but rather the entire set of characters that are matched. For instance, “e” and “st1” are strings matching the capturing groups - so we would expect them to be contained as well.
That is, the capturing groups are ignored. 🤷🏻♂️
The First Match with Capturing Groups
This time we execute against the same regex without using the g flag:
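A sketch of the non-global call, again assuming an illustrative sample string and pattern (str and regex are hypothetical names):

```javascript
// Assumed sample data; note that the pattern has no g flag.
const str = 'test1test2';
const regex = /t(e)(st(\d?))/;

const firstMatch = str.match(regex);
console.log(Array.from(firstMatch)); // [ 'test1', 'e', 'st1', '1' ]
console.log(firstMatch.index);       // 0
console.log(firstMatch.input);       // 'test1test2'
console.log(firstMatch.groups);      // undefined (the pattern has no named groups)
```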
Well, the multiple capturing groups are indeed considered in the result - but it refers only to the first match. 🤦🏻‍♂️
Notice that the result isn’t just an array of plain strings - it actually has additional properties: index, input and groups. Later, we’ll call such results “matching values”.
Having said that, what if we’d like to combine both use-cases above so that:
- We retrieve results for all matches considering the capturing groups
- We retrieve them simply without getting complicated
How do we achieve that?
The proposal specifies a new String.prototype.matchAll method that addresses the case in question.
Here’s the official definition out of the specification:
Performs a regular expression match of the String representing the this value against regexp and returns an iterator. Each iteration result’s value is an Array object containing the results of the match, or null if the String did not match.
We can understand from the definition that:
- The method returns an iterator representing the results of the matches
- In practice, when there are no matches the method still returns an iterator (not null) - one that simply yields no values
- Each matching value contains all the results of its match
Put simply, the results of each match (namely “matching values”) fully include the capturing groups and are accessible through the returned iterator.
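The no-match case is easy to check directly. The following is a minimal sketch with an assumed sample string and a pattern that cannot match it:

```javascript
// Assumed sample data: the pattern /xyz/g never matches this string.
const emptyResult = 'test1test2'.matchAll(/xyz/g);

console.log(emptyResult === null); // false: we still get an iterator

const firstStep = emptyResult.next();
console.log(firstStep.done);  // true: nothing to iterate
console.log(firstStep.value); // undefined
```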
Next, we’re going to introduce several practical usages for the method.
All Matches with Capturing Groups
Starting with executing the String.prototype.matchAll method against our regex:
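A minimal sketch of the call, assuming an illustrative sample string and pattern (str, regex and result are hypothetical names):

```javascript
// Assumed sample data.
const str = 'test1test2';
const regex = /t(e)(st(\d?))/g;

const result = str.matchAll(regex);

console.log(Array.isArray(result));                  // false
console.log(typeof result[Symbol.iterator]);         // 'function'
console.log(Object.prototype.toString.call(result)); // '[object RegExp String Iterator]'
```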
The method merely takes a regex, like String.prototype.match does, but differs in the result type. Importantly, only global regular expressions are acceptable - which means we must use the g flag to avoid getting a TypeError.
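The restriction can be observed with a small sketch. The pattern below is an assumed example that deliberately omits the g flag, and caughtError is a hypothetical name:

```javascript
let caughtError = null;

try {
  // No g flag: a conforming engine rejects this call.
  'test1test2'.matchAll(/t(e)(st(\d?))/);
} catch (error) {
  caughtError = error;
}

console.log(caughtError instanceof TypeError); // true
```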
Here we explore the result:
Since we already know that the result is an iterator, we spread it into resultAsArray to easily access the matching values.
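As a sketch of what the spread result looks like, assuming an illustrative sample string and pattern (only the name resultAsArray comes from the text):

```javascript
// Assumed sample data.
const str = 'test1test2';
const regex = /t(e)(st(\d?))/g;

// Spreading consumes the iterator and collects every matching value.
const resultAsArray = [...str.matchAll(regex)];

console.log(resultAsArray.length);         // 2
console.log(Array.from(resultAsArray[0])); // [ 'test1', 'e', 'st1', '1' ]
console.log(resultAsArray[0].index);       // 0
console.log(Array.from(resultAsArray[1])); // [ 'test2', 'e', 'st2', '2' ]
console.log(resultAsArray[1].index);       // 5
```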
The first matching value is identical to the result of String.prototype.match without the g flag. The cool thing, however, is having an additional matching value - which refers to the second match and also includes its capturing groups!
Thereby, instead of messing with complicated solutions to combine the global matching results with capturing groups - the method provides this combination simply and natively. 💪🏻
Iterating the Matches
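So far, the common practice to iterate all the matching values was using a loop (for instance, a while loop that repeatedly calls RegExp.prototype.exec on a global regex). With String.prototype.matchAll, we can benefit from the returned iterator to implement this more conveniently: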
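Here is a sketch of both approaches side by side. The exec-based loop is one common pre-matchAll pattern and is an assumption about the older practice. The sample string, pattern and variable names are illustrative as well:

```javascript
// Assumed sample data.
const str = 'test1test2';

// Older approach (assumed): call exec repeatedly until it returns null.
const legacyRegex = /t(e)(st(\d?))/g;
const legacyMatches = [];
let match;
while ((match = legacyRegex.exec(str)) !== null) {
  legacyMatches.push(match[0]);
}

// With matchAll: iterate the returned iterator directly.
const iteratedMatches = [];
for (const matchingValue of str.matchAll(/t(e)(st(\d?))/g)) {
  iteratedMatches.push(matchingValue[0]);
}

console.log(legacyMatches);   // [ 'test1', 'test2' ]
console.log(iteratedMatches); // [ 'test1', 'test2' ]
```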
The output obviously remains the same.
Also, we can destructure the iterator to access a specific matching value:
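A minimal destructuring sketch, assuming an illustrative sample string and pattern (firstValue and secondValue are hypothetical names):

```javascript
// Array destructuring pulls values from the iterator one by one.
const [firstValue, secondValue] = 'test1test2'.matchAll(/t(e)(st(\d?))/g);

console.log(firstValue[0]);     // 'test1'
console.log(secondValue[0]);    // 'test2'
console.log(secondValue[3]);    // '2'
console.log(secondValue.index); // 5
```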
It’s worth mentioning that both practices above exhaust the iterator (we reach the last value) - meaning that to iterate or destructure (which actually iterates indirectly) again, we need a fresh, non-consumed iterator, obtained by invoking the method once more.
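The exhaustion is easy to observe in a short sketch (assumed sample string and pattern, hypothetical variable names):

```javascript
const iterator = 'test1test2'.matchAll(/t(e)(st(\d?))/g);

const firstPass = [...iterator];  // consumes the iterator
const secondPass = [...iterator]; // nothing is left to consume

console.log(firstPass.length);  // 2
console.log(secondPass.length); // 0
```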
Hence, another way to go is by creating an array to be iterated repeatedly:
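One possible way to do so is Array.from (the spread syntax works just as well). The sample string, pattern and variable names below are illustrative assumptions:

```javascript
// Collect the matching values once...
const matchingValues = Array.from('test1test2'.matchAll(/t(e)(st(\d?))/g));

// ...and iterate the array as many times as needed.
const fullMatches = matchingValues.map((value) => value[0]);
const trailingDigits = matchingValues.map((value) => value[3]);

console.log(fullMatches);    // [ 'test1', 'test2' ]
console.log(trailingDigits); // [ '1', '2' ]
```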
Thus it’s possible to iterate the matching values as we please without reinvoking the method.
The truth is that in the majority of cases we could settle for an iterator, and that’s the primary reason behind the design decision to always return an iterator instead of an array.
Other than that, there is a matter of performance - the iterator performs the matching lazily, once for each invocation of its next method. Practically, this means that if we decide to break the loop in the middle or destructure only part of the values, the remaining matches won’t be performed.
By contrast, if the result were an array, all the matching values would be collected beforehand without exception. This is especially significant when there are tons of potential matches and/or capturing groups.
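The lazy behavior can be sketched by counting how many times the underlying matching runs. CountingRegExp is a hypothetical helper written only for this illustration (it isn’t part of the proposal), and the sample string and pattern are assumptions too:

```javascript
// Hypothetical helper: a RegExp subclass that counts invocations of exec,
// which the iterator calls once per step.
let execCalls = 0;
class CountingRegExp extends RegExp {
  exec(input) {
    execCalls += 1;
    return super.exec(input);
  }
}

const str = 'test1test2test3';
const pattern = 't(e)(st(\\d?))';

// Breaking after the first matching value: no further matching is performed.
for (const matchingValue of str.matchAll(new CountingRegExp(pattern, 'g'))) {
  break;
}
const callsAfterBreak = execCalls;
console.log(callsAfterBreak); // 1

// Consuming everything: one call per match, plus a final call that finds nothing.
execCalls = 0;
const everything = [...str.matchAll(new CountingRegExp(pattern, 'g'))];
const callsForAll = execCalls;
console.log(everything.length); // 3
console.log(callsForAll);       // 4
```

The counts rely on the engine following the specified algorithm, in which the iterator invokes the exec method of the regex on every step.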
Anyway, on the whole, the point is we’re not restricted. The iterator provides the versatility to iterate, destructure or transform by choice to an array - depending on our use-case.
Today we explained the primary motivation behind the “String.prototype.matchAll” proposal and introduced concrete usages of the method.
- The proposal belongs to ECMAScript 2020, which is the 11th edition
- When using String.prototype.match with the g flag, the result doesn’t consider capturing groups
- When using String.prototype.match without the g flag, the result considers the capturing groups but refers only to the first match
- The proposal specifies a new method called String.prototype.matchAll
- String.prototype.matchAll returns an iterator representing the matches and allowing us to iterate, destructure or transform to an array if necessary
- The iterator yields no values if there are no matches
- String.prototype.matchAll considers all matches including capturing groups in a simple usage
- String.prototype.matchAll throws a TypeError when using a non-global regular expression
- After exhausting the iterator, we need to reinvoke String.prototype.matchAll to iterate once more