
Numbered Backreferences

If your regular expression has named or numbered capturing groups, then you can reinsert the text matched by any of those capturing groups in the replacement text. Your replacement text can reference as many groups as you like, and can even reference the same group more than once. This makes it possible to rearrange the text matched by a regular expression in many different ways. As a simple example, the regex \*(\w+)\* matches a single word between asterisks, storing the word in the first (and only) capturing group. The replacement text <b>\1</b> replaces each regex match with the text stored by the capturing group between bold tags. Effectively, this search-and-replace replaces the asterisks with bold tags, leaving the word between the asterisks in place. This technique using backreferences is important to understand. Replacing *word* as a whole with <b>word</b> is far easier and far more efficient than trying to come up with a way to correctly replace the asterisks separately.
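As a minimal sketch, here is the asterisk example in Python, one of the flavors that accepts \1 in replacement text. The sample strings and variable names are made up for illustration.

```python
import re

# Sample input; the text itself is only an illustration.
text = "This is *important* and *urgent*."

# \*(\w+)\* captures the word between asterisks in group 1.
# The replacement reinserts that word between bold tags.
bold = re.sub(r"\*(\w+)\*", r"<b>\1</b>", text)
print(bold)  # This is <b>important</b> and <b>urgent</b>.

# The same group can be referenced more than once, and groups can be reordered.
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1 (was \1)", "key=value")
print(swapped)  # value=key (was key)
```

Raw strings (the r prefix) keep Python from interpreting the backslashes before the regex engine sees them.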

The \1 syntax for backreferences in the replacement text is borrowed from the syntax for backreferences in the regular expression. \1 through \9 are supported by the JGsoft applications, Delphi, Perl (though deprecated), Python, Ruby, PHP, R, Boost, and Tcl. Double-digit backreferences \10 through \99 are supported by the JGsoft applications, Delphi, Python, and Boost. If there are not enough capturing groups in the regex for the double-digit backreference to be valid, then all these flavors treat \10 through \99 as a single-digit backreference followed by a literal digit. The flavors that support single-digit backreferences but not double-digit backreferences also do this.
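A short sketch of a double-digit backreference in Python, using a regex that really does have ten capturing groups. The pattern and subject string are invented for the example.

```python
import re

# Ten capturing groups, one per letter.
pattern = r"(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)"

# With ten groups available, \10 refers to the tenth group,
# not to group 1 followed by a literal "0".
result = re.sub(pattern, r"\10-\1", "abcdefghij")
print(result)  # j-a
```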

$1 through $99 for single-digit and double-digit backreferences are supported by the JGsoft applications, Delphi, .NET, Java, JavaScript, VBScript, PCRE2, PHP, Boost, std::regex, and XPath. These are also the variables that hold text matched by capturing groups in Perl. If there are not enough capturing groups in the regex for a double-digit backreference to be valid, then $10 through $99 are treated as a single-digit backreference followed by a literal digit by all these flavors except .NET, Perl, PCRE2, and std::regex.

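Putting curly braces around the digit, as in ${1}, isolates the digit from any literal digits that follow. For example, ${1}0 is group 1 followed by a literal zero, while $10 may be read as group 10. This works in the JGsoft applications, Delphi, .NET, Perl, PCRE2, PHP, Boost, and XRegExp.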

Named Backreferences

If your regular expression has named capturing groups, then you should use named backreferences to them in the replacement text. The regex (?'name'group) has one group called “name”. You can reference this group with ${name} in the JGsoft applications, Delphi, .NET, PCRE2, Java 7, and XRegExp. PCRE2 also supports $name without the curly braces; this syntax is unique to PCRE2. In Perl 5.10 and later you can interpolate the variable $+{name}. Boost too uses $+{name} in replacement strings. ${name} does not work in any version of Perl.

In Python, if you have the regex (?P<name>group) then you can use its match in the replacement text with \g<name>. This syntax also works in the JGsoft applications and Delphi. Python and the JGsoft applications, but not Delphi, also support numbered backreferences using this syntax. In Python this is the only way to have a numbered backreference immediately followed by a literal digit.
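A minimal Python sketch of both uses of the \g<...> syntax. The group names year and month and the sample strings are illustrative only.

```python
import re

# Named groups are reinserted with \g<name>.
date = re.sub(
    r"(?P<year>\d{4})-(?P<month>\d{2})",
    r"\g<month>/\g<year>",
    "2024-05",
)
print(date)  # 05/2024

# \g<1>0 is group 1 followed by a literal 0.
# Writing \10 instead would be read as a reference to group 10.
padded = re.sub(r"(\d)", r"\g<1>0", "7")
print(padded)  # 70
```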

PHP and R support named capturing groups and named backreferences in regular expressions. But they do not support named backreferences in replacement texts. You’ll have to use numbered backreferences in the replacement text to reinsert text matched by named groups. To determine the numbers, count the opening parentheses of all capturing groups (named and unnamed) in the regex from left to right.
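The counting rule can be sketched in Python. Python itself does support named backreferences in replacement text, so it is used here only to illustrate how named and unnamed groups share one numbering; the pattern and date string are made up.

```python
import re

# Two named groups with one unnamed group between them.
pattern = re.compile(r"(?P<year>\d{4})-(\d{2})-(?P<day>\d{2})")

# groupindex maps each group name to its number, counted by opening
# parenthesis from left to right. The unnamed group takes number 2.
print(dict(pattern.groupindex))  # {'year': 1, 'day': 3}

# The named groups can therefore be reinserted by number.
reordered = pattern.sub(r"\3.\2.\1", "2024-05-17")
print(reordered)  # 17.05.2024
```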

Backreferences to Non-Existent Capturing Groups

An invalid backreference is a reference to a number greater than the number of capturing groups in the regex or a reference to a name that does not exist in the regex. Such a backreference can be treated in three different ways. Delphi, Perl, Ruby, PHP, R, Boost, std::regex, XPath, and Tcl substitute the empty string for invalid backreferences. Java, XRegExp, PCRE2, and Python treat them as a syntax error. JavaScript (without XRegExp) and .NET treat them as literal text.
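A small sketch of the "syntax error" behavior in Python. The helper try_sub is hypothetical and exists only for this example; it catches both exception types, because an out-of-range number and an unknown name may be reported differently.

```python
import re

def try_sub(replacement):
    """Apply the replacement to a regex with one group; report rejection."""
    try:
        return re.sub(r"(a)", replacement, "a")
    except (re.error, IndexError):
        # Python refuses replacement text with an invalid backreference.
        return "rejected"

print(try_sub(r"<\1>"))          # <a>
print(try_sub(r"<\2>"))          # rejected: only one group exists
print(try_sub(r"<\g<missing>>")) # rejected: no group has this name
```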

The original JGsoft flavor replaced invalid backreferences with the empty string. But JGsoft V2 treats them as a syntax error. Applications using the V2 flavor all apply syntax coloring to replacement strings, highlighting invalid backreferences in red.

Backreferences to Non-Participating Capturing Groups

A non-participating capturing group is a group that did not participate in the match attempt at all. This is different from a group that matched an empty string. The group in a(b?)c always participates in the match. Its contents are optional but the group itself is not optional. The group in a(b)?c is optional. It participates when the regex matches abc, but not when the regex matches ac.
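The distinction is visible in Python's match objects, where a non-participating group is reported as None instead of an empty string. A minimal sketch:

```python
import re

# a(b?)c: the group always participates, here matching the empty string.
always = re.fullmatch(r"a(b?)c", "ac")
print(repr(always.group(1)))  # ''

# a(b)?c: the group is optional and does not participate when matching "ac".
optional = re.fullmatch(r"a(b)?c", "ac")
print(repr(optional.group(1)))  # None
```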

In most applications, there is no difference between a backreference in the replacement string to a group that matched the empty string and one to a group that did not participate. Both are replaced with an empty string. Two exceptions are Python and PCRE2. They do allow backreferences in the replacement string to optional capturing groups. But the search-and-replace will return an error code in PCRE2 if the capturing group happens not to participate in one of the regex matches. The same situation raises an exception in Python 3.4 and prior. Python 3.5 and later no longer raise the exception and substitute the empty string instead.
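A short sketch of a replacement that references an optional group, assuming Python 3.5 or later. On Python 3.4 and prior the same call raises an exception for the first match.

```python
import re

# The group participates for "abc" but not for "ac".
result = re.sub(r"a(b)?c", r"[\1]", "ac abc")

# Assumes Python 3.5+: the non-participating group inserts nothing.
print(result)  # [] [b]
```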

Backreference to the Highest-Numbered Group

In the JGsoft applications and Delphi, $+ inserts the text matched by the highest-numbered group that actually participated in the match. In Perl 5.18, the variable $+ holds the same text. When (a)(b)|(c)(d) matches ab, $+ is substituted with b. When the same regex matches cd, $+ inserts d. \+ does the same in the JGsoft applications, Delphi, and Ruby.

In .NET, VBScript, and Boost, $+ inserts the text matched by the highest-numbered group, regardless of whether it participated in the match or not. If it didn’t, nothing is inserted. In Perl 5.16 and prior, the variable $+ holds the same text. When (a)(b)|(c)(d) matches ab, $+ is substituted with the empty string. When the same regex matches cd, $+ inserts d.
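Python has no $+ token, so the two meanings can only be emulated there. The sketch below does so with replacement functions; both helper names are hypothetical and not part of any library.

```python
import re

def highest_participating(match):
    """Text of the highest-numbered group that took part in the match."""
    for number in range(match.re.groups, 0, -1):
        if match.group(number) is not None:
            return match.group(number)
    return ""

def highest_numbered(match):
    """Text of the last group in the regex, or nothing if it did not participate."""
    count = match.re.groups
    if count == 0:
        return ""
    return match.group(count) or ""

pattern = r"(a)(b)|(c)(d)"

# First meaning: "ab" yields b, "cd" yields d.
print(re.sub(pattern, highest_participating, "ab cd"))  # b d

# Second meaning: group 4 is empty for "ab", so only d is inserted.
print(repr(re.sub(pattern, highest_numbered, "ab cd")))  # ' d'
```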

Boost 1.42 added additional syntax of its own invention for either meaning of highest-numbered group. $^N, $LAST_SUBMATCH_RESULT, and ${^LAST_SUBMATCH_RESULT} all insert the text matched by the highest-numbered group that actually participated in the match. $LAST_PAREN_MATCH and ${^LAST_PAREN_MATCH} both insert the text matched by the highest-numbered group regardless of whether it participated in the match.