Conditionals |
Replacement string conditionals allow you to use one replacement when a particular capturing group participated in the match and another replacement when that capturing group did not participate in the match. They are supported by JGsoft V2, Boost, and PCRE2. Boost and PCRE2 each invented their own syntax. JGsoft V2 supports both.
For conditionals to work in Boost, you need to pass regex_constants::format_all to regex_replace. For them to work in PCRE2, you need to pass PCRE2_SUBSTITUTE_EXTENDED to pcre2_substitute.
Boost’s syntax is (?1matched:unmatched) where 1 is a number between 1 and 99 referencing a numbered capturing group. matched is used as the replacement for matches in which the capturing group participated. unmatched is used for matches in which the group did not participate. The colon : delimits the two parts. If you want a literal colon in the matched part, then you need to escape it with a backslash. If you want a literal closing parenthesis anywhere in the conditional, then you need to escape that with a backslash too.
The parentheses delimit the conditional from the remainder of the replacement string. start(?1matched:unmatched)finish replaces with startmatchedfinish when the group participates and with startunmatchedfinish when it doesn’t. JGsoft V2 requires the parentheses. Boost allows you to omit the parentheses if nothing comes after the conditional in the replacement. So ?1matched:unmatched is the same as (?1matched:unmatched).
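Python's re module does not support replacement string conditionals, so the behavior described above can only be emulated there with a replacement function. The following is a minimal sketch of what start(?1matched:unmatched)finish does. The regex a(b)?c and the function name are made up for illustration and do not come from Boost or JGsoft.

```python
import re

# Hypothetical regex: group 1 is optional, so it may or may not participate.
pattern = re.compile(r"a(b)?c")

def conditional_replacement(match):
    # A group that did not participate is reported as None in Python.
    # A group that participated but captured nothing is "", not None.
    middle = "matched" if match.group(1) is not None else "unmatched"
    return "start" + middle + "finish"

with_group = pattern.sub(conditional_replacement, "abc")
without_group = pattern.sub(conditional_replacement, "ac")
print(with_group)
print(without_group)
```

The test is "is not None" rather than a plain truth test, because a group that participates but captures an empty string should still select the matched part.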
The matched and unmatched parts can be blank. You can omit the colon if the unmatched part is blank. So (?1matched:) and (?1matched) replace with matched when the group participates. They replace the match with nothing when the group does not participate.
You can use the full replacement string syntax in matched and unmatched. This means you can nest conditionals inside other conditionals. So (?1one(?2two):(?2two:none)) replaces with onetwo when both groups participate, with one or two when group 1 or 2 participates and the other doesn’t, and with none when neither group participates. With Boost ?1one(?2two):?2two:none does exactly the same but omits parentheses that aren’t needed.
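The four outcomes of the nested conditional can be checked with a small emulation. This is only a sketch in Python, which has no conditional syntax of its own, so the nesting is written as ordinary branching. The regex (a)?(b)?c is an assumption chosen so that each group can participate independently.

```python
import re

# Hypothetical regex with two independent optional groups.
pattern = re.compile(r"(a)?(b)?c")

def nested_conditional(match):
    # Emulates (?1one(?2two):(?2two:none))
    group1 = match.group(1) is not None
    group2 = match.group(2) is not None
    if group1:
        # Matched part of the outer conditional: "one" plus (?2two)
        return "one" + ("two" if group2 else "")
    # Unmatched part of the outer conditional: (?2two:none)
    return "two" if group2 else "none"

results = {text: pattern.sub(nested_conditional, text)
           for text in ("abc", "ac", "bc", "c")}
print(results)
```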
The JGsoft V2 regex flavor treats conditionals that reference non-existing capturing groups as an error. If there are two digits after the question mark but not enough capturing groups for a two-digit conditional to be valid, then only the first digit is used for the conditional and the second digit is a literal. So when there are fewer than 12 capturing groups in the regex, (?12matched) replaces with 2matched when capturing group 1 participates in the match.
Boost treats conditionals that reference a non-existing group number as conditionals to a group that never participates in the match. So (?12twelve:not twelve) always replaces with not twelve when there are fewer than 12 capturing groups in the regex.
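The difference between the two flavors comes down to how the digits after the question mark are split. The function below is a rough model of the rules as stated in the two preceding paragraphs, not code taken from either library. The function name, the flavor strings, and the use of ValueError for the JGsoft V2 error case are all assumptions made for this sketch. It expects one or two decimal digits.

```python
from typing import Tuple

def resolve_conditional_group(digits: str, group_count: int,
                              flavor: str) -> Tuple[int, str]:
    """Return (group number, literal text left over from the digits)."""
    if flavor == "boost":
        # Per the text: the digits form the group number even if that group
        # does not exist; such a group simply never participates.
        return int(digits), ""
    if flavor == "jgsoft":
        # Use both digits only when that many groups exist.
        if len(digits) == 2 and int(digits) <= group_count:
            return int(digits), ""
        first = int(digits[0])
        if first > group_count:
            # Modeled as an exception; the text only says "an error".
            raise ValueError("conditional references non-existing group")
        # The second digit, if any, becomes a literal.
        return first, digits[1:]
    raise ValueError("unknown flavor: " + flavor)

print(resolve_conditional_group("12", 5, "jgsoft"))
print(resolve_conditional_group("12", 12, "jgsoft"))
print(resolve_conditional_group("12", 5, "boost"))
```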
You can avoid the ambiguity between single digit and double digit conditionals by placing curly braces around the number. (?{1}1:0) replaces with 1 when group 1 participates and with 0 when it doesn’t, even if there are 11 or more capturing groups in the regex. (?{12}twelve:not twelve) is always a conditional that references group 12, even if there are fewer than 12 groups in the regex (which may make the conditional invalid).
The syntax with curly braces also allows you to reference named capturing groups by their names. (?{name}matched:unmatched) replaces with matched when the group “name” participates in the match and with unmatched when it doesn’t. If the group does not exist, the JGsoft V2 regex flavor treats the conditional as an error. Boost, however, treats conditionals that reference a non-existing group name as literals. So (?{nonexisting}matched:unmatched) uses ?{nonexisting}matched:unmatched as a literal replacement.
PCRE2’s syntax is ${1:+matched:unmatched} where 1 is a number between 1 and 99 referencing a numbered capturing group. If your regex contains named capturing groups then you can reference them in a conditional by their name: ${name:+matched:unmatched}.
matched is used as the replacement for matches in which the capturing group participated. unmatched is used for matches in which the group did not participate. :+ delimits the group number or name from the first part of the conditional. The second colon delimits the two parts. If you want a literal colon in the matched part, then you need to escape it with a backslash. If you want a literal closing curly brace anywhere in the conditional, then you need to escape that with a backslash too. Plus signs have no special meaning beyond the :+ that starts the conditional, so they don’t need to be escaped.
You can use the full replacement string syntax in matched and unmatched. This means you can nest conditionals inside other conditionals. So ${1:+one${2:+two}:${2:+two:none}} replaces with onetwo when both groups participate, with one or two when group 1 or 2 participates and the other doesn’t, and with none when neither group participates.
${1:-unmatched} and ${name:-unmatched} are shorthands for ${1:+${1}:unmatched} and ${name:+${name}:unmatched}. They insert the text captured by the group if it participated in the match. They insert unmatched if the group did not participate. When using this syntax, :- delimits the group number or name from the contents of the conditional. The conditional has only one part in which colons and minus signs have no special meaning.
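The shorthand amounts to "insert the group, or a default if the group did not participate". A minimal Python emulation might look like the sketch below. The helper name group_or_default and the regex are hypothetical; Python itself has no ${1:-unmatched} syntax.

```python
import re

def group_or_default(match, group, default):
    # Emulates ${group:-default}: use the captured text if the group
    # participated, otherwise fall back to the default text.
    value = match.group(group)
    return default if value is None else value

# Hypothetical regex: an optional number followed by "px".
pattern = re.compile(r"(\d+)?px")

with_number = pattern.sub(lambda m: group_or_default(m, 1, "unmatched"), "12px")
without_number = pattern.sub(lambda m: group_or_default(m, 1, "unmatched"), "px")
print(with_number)
print(without_number)
```

Because match.group() accepts both numbers and names, the same helper covers the ${name:-unmatched} form when the regex uses a named group.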
Both PCRE2 and JGsoft V2 treat conditionals that reference non-existing capturing groups as an error.
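An emulation with a Python replacement function happens to fail in a comparable way, because Python's re module raises IndexError when a match is asked for a group that the regex does not define. The sketch below emulates ${sign:+negative:non-negative} with a named group and then shows the failure for a made-up group name. The regex and function names are illustrative only.

```python
import re

# Hypothetical regex with an optional named group "sign".
pattern = re.compile(r"(?P<sign>-)?\d+")

def describe(match):
    # Emulates ${sign:+negative:non-negative}
    return "negative" if match.group("sign") is not None else "non-negative"

def broken(match):
    # References a group name that the regex does not define.
    return "x" if match.group("nonexisting") is not None else "y"

labels = pattern.sub(describe, "-5 7")

try:
    pattern.sub(broken, "-5 7")
    raised = False
except IndexError:
    # Python reports the unknown group name as an error.
    raised = True

print(labels, raised)
```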
As explained above, you need to escape a colon with a backslash when you want to use it as a literal in the matched part of a conditional. You also need to escape literal closing parentheses (Boost) or curly braces (PCRE2) with backslashes inside conditionals.
In replacement string flavors that support conditionals, you can escape colons, parentheses, curly braces, and even question marks with backslashes to make sure they are interpreted as literals anywhere in the replacement string. But generally there is no need to.
The colon does not have any special meaning in the unmatched part or outside conditionals. So you don’t need to escape it there. The question mark does not have any special meaning if it is not followed by a digit or a curly brace. In PCRE2 it never has a special meaning. So you only need to escape question marks with backslashes if you want to use a literal question mark followed by a literal digit or curly brace as the replacement in Boost or JGsoft V2.
In the JGsoft V2 flavor, an opening parenthesis is special only as part of the syntax that starts a conditional. The first unescaped closing parenthesis that follows then ends that conditional. All other unescaped opening and closing parentheses are literals.
Boost always uses parentheses for grouping. An unescaped opening parenthesis always opens a group. Groups can be nested. An unescaped closing parenthesis always closes a group. An unescaped closing parenthesis that does not have a matching opening parenthesis effectively truncates the replacement string. So Boost requires you to always escape literal parentheses with backslashes.
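When literal text is inserted into a replacement string by a program, escaping every character named in this section is more than is usually needed, but it is the simplest way to be safe. The helper below is a sketch under that assumption. Its name is hypothetical, and it handles only the characters discussed here: colons, parentheses, curly braces, and question marks. It does not handle backslashes, dollar signs, or anything else that may be special in a replacement string.

```python
# Characters this section says can be escaped with a backslash.
CONDITIONAL_SPECIALS = ":(){}?"

def escape_conditional_literals(text):
    # Prefix each listed character with a backslash; leave the rest alone.
    # Assumption: text contains no backslashes or other special characters.
    return "".join("\\" + ch if ch in CONDITIONAL_SPECIALS else ch
                   for ch in text)

escaped = escape_conditional_literals("f(x): ok?")
print(escaped)
```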
Page URL: https://www.regular-expressions.info/replaceconditional.html
Page last updated: 12 August 2021
Site last updated: 16 August 2024
Copyright © 2003-2024 Jan Goyvaerts. All rights reserved.