|Easily create and understand regular expressions today. |
Compose and analyze regex patterns with RegexBuddy's easy-to-grasp regex blocks and intuitive regex tree, instead of or in combination with the traditional regex syntax. Developed by the author of this website, RegexBuddy makes learning and using regular expressions easier than ever. Get your own copy of RegexBuddy now
Perl 5.10 introduced a new regular expression feature called a branch reset group. JGsoft V2 and PCRE 7.2 and later also support this, as do languages like PHP, Delphi, and R that have regex functions based on PCRE. Boost added them to its ECMAScript grammar in version 1.42.
Alternatives inside a branch reset group share the same capturing groups. The syntax is (?|regex) where (?| opens the group and regex is any regular expression. If you don't use any alternation or capturing groups inside the branch reset group, then its special function doesn't come into play. It then acts as a non-capturing group.
The regex (?|(a)|(b)|(c)) consists of a single branch reset group with three alternatives. This regex matches either a, b, or c. The regex has only a single capturing group with number 1 that is shared by all three alternatives. After the match, $1 holds a, b, or c.
Compare this with the regex (a)|(b)|(c) that lacks the branch reset group. This regex also matches a, b, or c. But it has three capturing groups. After the match, $1 holds a or nothing at all, $2 holds b or nothing at all, while $3 holds c or nothing at all.
Backreferences to capturing groups inside branch reset groups work like you'd expect. (?|(a)|(b)|(c))\1 matches aa, bb, or cc. Since only one of the alternatives inside the branch reset group can match, the alternative that participates in the match determines the text stored by the capturing group and thus the text matched by the backreference.
The alternatives in the branch reset group don't need to have the same number of capturing groups. (?|abc|(d)(e)(f)|g(h)i) has three capturing groups. When this regex matches abc, all three groups are empty. When def is matched, $1 holds d, $2 holds e and $3 holds f. When ghi is matched, $1 holds h while the other two are empty.
You can have capturing groups before and after the branch reset group. Groups before the branch reset group are numbered as usual. Groups in the branch reset group are numbered continued from the groups before the branch reset group, which each alternative resetting the number. Groups after the branch reset group are numbered continued from the alternative with the most groups, even if that is not the last alternative. So (x)(?|abc|(d)(e)(f)|g(h)i)(y) defines five capturing groups. (x) is group 1, (d) and (h) are group 2, (e) is group 3, (f) is group 4, and (y) is group 5.
You can use named capturing groups inside branch reset groups. If you do, you should use the same names for the groups that will get the same numbers. Otherwise you'll get undesirable behavior in Perl or Boost. PowerGREP treats mismatched group names as an error. PCRE only reliably supports named groups inside branch reset groups starting with version 8.00. This means Delphi only does so starting with XE7 and PHP starting with version 5.2.14.
(?'before'x)(?|abc|(?'left'd)(?'middle'e)(?'right'f)|g(?'left'h)i)(?'after'y) is the same as the previous regex. It names the five groups "before", "left", "middle", "right", and "after". Notice that because the 3rd alternative has only one capturing group, that must be the name of the first group in the other alternatives.
If you omit the names in some alternatives, the groups will still share the names with the other alternatives. In the regex (?'before'x)(?|abc|(?'left'd)(?'middle'e)(?'right'f)|g(h)i)(?'after'y) the group (h) is still named "left" because the branch reset group makes it share the name and number of (?'left'd).
In Perl, PCRE, and Boost, it is best to use a branch reset group when you want groups in different alternatives to have the same name. That's the only way in Perl, PCRE, and Boost to make sure that groups with the same name really are one and the same group.
In PowerGREP, groups with the same name are always treated as one and the same group. So you don't really need to use a branch reset group in PowerGREP when using named capturing groups.
It's time for a more practical example. These two regular expressions match a date in m/d or mm/dd format. They exclude invalid dates such as 2/31.
^(?:(0?|1)/(?[0-9]|3) # 31 days
| (0?|11)/(?[0-9]|30) # 30 days
| (0?2)/(?[0-9]) # 29 days
The first version uses a non-capturing group (?:…) to group the alternatives. It has six separate capturing groups. $1 and $2 would hold the month and the day for months with 31 days, $3 and $4 for months with 30 days, and $5 and $6 would only be used for February.
^(?|(0?|1)/(?[0-9]|3) # 31 days
| (0?|11)/(?[0-9]|30) # 30 days
| (0?2)/(?[0-9]) # 29 days
The second version uses a branch reset group (?|…) to group the alternatives and merge their capturing groups. Now there are only two capturing groups that are shared between the tree alternatives. When a match is found, $1 always holds the month, and 2 always holds the day, regardless of the number of days in the month.
Did this website just save you a trip to the bookstore? Please make a donation to support this site, and you'll get a lifetime of advertisement-free access to this site! Credit cards, PayPal, and Bitcoin gladly accepted.
Page URL: http://www.regular-expressions.info/branchreset.html
Page last updated: 26 January 2017
Site last updated: 06 March 2017
Copyright © 2003-2017 Jan Goyvaerts. All rights reserved.
|Table of Contents|
|Regex Engine Internals|
|Character Class Subtraction|
|Character Class Intersection|
|Shorthand Character Classes|
|Grouping & Capturing|
|Backreferences, part 2|
|Branch Reset Groups|
|Free-Spacing & Comments|
|Lookahead & Lookbehind|
|Lookaround, part 2|
|Keep Text out of The Match|
|Recursion & Capturing|
|Recursion & Backreferences|
|Recursion & Backtracking|
|POSIX Bracket Expressions|
|Regular Expressions Quick Start|
|Regular Expressions Tutorial|
|Replacement Strings Tutorial|
|Applications and Languages|
|Regular Expressions Examples|
|Regular Expressions Reference|
|Replacement Strings Reference|
|About This Site|
|RSS Feed & Blog|
|PowerGREP is probably the most powerful regex-based text processing tool available today. A knowledge worker's Swiss army knife for searching through, extracting information from, and updating piles of files.|
|Use regular expressions to search through large numbers of text and binary files. Quickly find the files you are looking for, or extract the information you need. Look through just a handful of files or folders, or scan entire drives and network shares.|
|Search and replace using text, binary data or one or more regular expressions to automate repetitive editing tasks. Preview replacements before modifying files, and stay safe with flexible backup and undo options.|
|Use regular expressions to rename files, copy files, or merge and split the contents of files. Work with plain text files, Unicode files, binary files, compressed files, and files in proprietary formats such as MS Office, OpenOffice, and PDF. Runs on Windows XP, Vista, 7, 8, 8.1, and 10.|
|Download PowerGREP now|