Replacement Text Tutorial
The most basic replacement string consists only of literal characters. For example, the replacement string "replacement" simply replaces each regex match with the literal text replacement.
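As a minimal sketch of a purely literal replacement, the following uses Python's re.sub. The regex, the subject string, and the variable name are made up for illustration.

```python
import re

# Every match of the regex "cat" is replaced with the literal text "dog".
# The replacement contains no backslash, so nothing in it is special in Python.
result = re.sub(r"cat", "dog", "cat, bobcat, dog")
print(result)  # dog, bobdog, dog
```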
Because we want to be able to do more than simply replace each regex match with the exact same text, we need to reserve certain characters for special use. In most replacement text flavors, two characters have special meanings: the backslash \ and the dollar sign $. Whether and how to escape them depends on the application you’re using. In some applications, you always need to escape them when you want to use them as literal characters. In other applications, you only need to escape them if they would form a replacement text token with the character that follows.
In the JGsoft flavor and Delphi, you can use a backslash to escape the backslash and the dollar, and you can use a dollar to escape the dollar. \\ replaces with a literal backslash, while \$ and $$ replace with a literal dollar sign. You only need to escape them to suppress their special meaning in combination with other characters. In \! and $!, the backslash and dollar are literal characters because they don’t have a special meaning in combination with the exclamation point. You can’t and needn’t escape the exclamation point or any other character except the backslash and dollar, because they have no special meaning in JGsoft and Delphi replacement strings.
In .NET, JavaScript, VBScript, XRegExp, PCRE2, and std::regex you can escape the dollar sign with another dollar sign. $$ replaces with a literal dollar sign. XRegExp and PCRE2 require you to escape all literal dollar signs. They treat unescaped dollar signs that don’t form valid replacement text tokens as errors. In .NET, JavaScript (without XRegExp), and VBScript you only need to escape the dollar sign to suppress its special meaning in combination with other characters. In $\ and $!, the dollar is a literal character because it doesn’t have a special meaning in combination with the backslash or exclamation point. You can’t and needn’t escape the backslash, exclamation point, or any other character except dollar, because they have no special meaning in .NET, JavaScript, VBScript, and PCRE2 replacement strings.
In Java, an unescaped dollar sign that doesn’t form a token is an error. You must escape the dollar sign with a backslash or another dollar sign to use it as a literal character. $! is an error because the dollar sign is not escaped and has no special meaning in combination with the exclamation point. A backslash always escapes the character that follows. \! replaces with a literal exclamation point, and \\ replaces with a single backslash. A single backslash at the end of the replacement text is an error.
In Python and Ruby, the dollar sign has no special meaning. You can use a backslash to escape the backslash. You only need to escape the backslash to suppress its special meaning in combination with other characters. In \!, the backslash is a literal character because it doesn’t have a special meaning in combination with the exclamation point. You can’t and needn’t escape the exclamation point or any other character except the backslash, because they have no special meaning in Python and Ruby replacement strings. An unescaped backslash at the end of the replacement text, however, is an error in Python but a literal backslash in Ruby.
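The Python rules just described can be checked with a short sketch like the one below. The helper replace_x and the subject string "[x]" are hypothetical names chosen for this illustration, and the behavior shown is what a recent Python 3 is expected to do.

```python
import re

def replace_x(replacement: str) -> str:
    """Replace the single match "x" in "[x]" so the brackets frame the result."""
    return re.sub(r"x", replacement, "[x]")

# Raw strings (r"...") are used so Python itself leaves the backslashes alone.
dollar = replace_x("$1")              # the dollar sign is not special: [$1]
escaped_backslash = replace_x(r"\\")  # an escaped backslash yields one backslash
backslash_bang = replace_x(r"\!")     # backslash stays literal before "!"

# A lone backslash at the end of the replacement text is rejected.
try:
    replace_x("\\")
    trailing_is_error = False
except re.error:
    trailing_is_error = True

print(dollar, escaped_backslash, backslash_bang, trailing_is_error)
```

Printing the values, rather than inspecting their repr, avoids confusing the backslashes in the result with the ones Python adds when it displays a string literal.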
In PHP’s preg_replace, you can use a backslash to escape the backslash and the dollar. \\ replaces with a literal backslash, while \$ replaces with a literal dollar sign. You only need to escape them to suppress their special meaning in combination with other characters. In \!, the backslash is a literal character because it doesn’t have a special meaning in combination with the exclamation point. You can’t and needn’t escape the exclamation point or any other character except the backslash and dollar, because they have no special meaning in PHP replacement strings.
In Boost, a backslash always escapes the character that follows. \! replaces with a literal exclamation point, and \\ replaces with a single backslash. A single backslash at the end of the replacement text is ignored. An unescaped dollar sign is a literal dollar sign if it doesn’t form a replacement string token. You can escape dollar signs with a backslash or with another dollar sign. So $, $$, and \$ all replace with a single dollar sign.
In R, the dollar sign has no special meaning. A backslash always escapes the character that follows. \! replaces with a literal exclamation point, and \\ replaces with a single backslash. A single backslash at the end of the replacement text is ignored.
In Tcl, the ampersand & has a special meaning, and must be escaped with a backslash if you want a literal ampersand in your replacement text. You can use a backslash to escape the backslash. You only need to escape the backslash to suppress its special meaning in combination with other characters. In \!, the backslash is a literal character because it doesn’t have a special meaning in combination with the exclamation point. You can’t and needn’t escape the exclamation point or any other character except the backslash and ampersand, because they have no special meaning in Tcl replacement strings. An unescaped backslash at the end of the replacement text is a literal backslash.
In XPath, an unescaped backslash is an error. An unescaped dollar sign that doesn’t form a token is also an error. You must escape backslashes and dollars with a backslash to use them as literal characters. The backslash has no special meaning other than to escape another backslash or a dollar sign.
Perl is a special case. Perl doesn’t really have a replacement text syntax, so it doesn’t have escape rules for replacement texts either. In Perl source code, replacement strings are simply double-quoted strings. What look like backreferences in the replacement text are really interpolated variables. You could interpolate them in any other double-quoted string after a regex match, even when not doing a search-and-replace.
The rules above explain how the search-and-replace functions in these programming languages parse the replacement text. If your application receives the replacement text from user input, then the user of your application has to follow these escape rules, and only these rules. You may be surprised that characters like the single quote and double quote are not special characters. That is correct. When using a regular expression or grep tool like PowerGREP or the search-and-replace function of a text editor like EditPad Pro, you should not escape or repeat the quote characters like you do in a programming language.
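To illustrate the point about user input, here is a small Python sketch. The function apply_user_replacement and the two "typed" values are hypothetical. They stand in for text read from an input field at run time, and they are built with chr() so that no source-code escaping is involved.

```python
import re

def apply_user_replacement(pattern: str, user_replacement: str, subject: str) -> str:
    """Pass the replacement text to re.sub exactly as the user typed it."""
    return re.sub(pattern, user_replacement, subject)

# Simulated user input: quote characters are ordinary characters to re.sub.
typed_quotes = "it" + chr(39) + "s " + chr(34) + "quoted" + chr(34)
with_quotes = apply_user_replacement(r"x", typed_quotes, "[x]")

# Simulated user input: the user typed two backslashes to get one literal backslash.
typed_backslashes = chr(92) * 2
with_backslash = apply_user_replacement(r"x", typed_backslashes, "[x]")

print(with_quotes)
print(with_backslash)
```

The user only has to follow the replacement text rules of the flavor, here Python's: the quotes pass through unchanged, while the backslash still needs to be escaped with another backslash.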
If you specify the replacement text as a string constant in your source code, then you have to keep in mind which characters are given special treatment inside string constants by your programming language. That is because those characters are processed by the compiler before the replacement text function sees the string. So in Java, for example, to replace all regex matches with a single dollar sign, you need to use the replacement text \$, which you need to enter in your source code as "\\$". The Java compiler turns the escaped backslash in the source code into a single backslash in the string that is passed on to the replaceAll() function. That function then sees the single backslash and the dollar sign as an escaped dollar sign.
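The same two layers of escaping can be sketched in Python, where the replacement text for one literal backslash is a pair of backslashes. This is only an analogous illustration, not the Java example itself, and the variable names are made up.

```python
import re

# The replacement text re.sub should receive: two characters, backslash backslash,
# which it treats as an escaped backslash, i.e. one literal backslash.
plain_literal = "\\\\"  # normal string: each backslash is doubled for Python itself
raw_literal = r"\\"     # raw string: Python passes the two backslashes through as-is

same_text = plain_literal == raw_literal and len(raw_literal) == 2

result = re.sub(r"/", raw_literal, "a/b")
print(same_text)  # True
print(result)     # a\b
```

A raw string removes the first layer of processing, so the string constant looks the same as the replacement text the search-and-replace function sees.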
See the tools and languages section of this website for more information on how to use replacement strings in various programming languages.
Page URL: https://www.regular-expressions.info/replacecharacters.html
Page last updated: 12 August 2021
Site last updated: 16 August 2024
Copyright © 2003-2024 Jan Goyvaerts. All rights reserved.