Regular Expression Unicode Syntax Reference

This reference page explains what the Unicode tokens do when used outside character classes. All of these except \X can also be used inside character classes. Inside a character class, these tokens add the characters that they normally match to the character class.
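As a rough illustration of the character class behavior described above, the sketch below uses Python's `re` module (3.3 or later), which accepts `\uFFFF` in patterns. The variable names are made up for this example, and other flavors may behave differently.

```python
import re

# A character class built from two code point tokens: each token adds
# the character it normally matches to the class.
either = re.compile(r"[\u00E0\u00A9]")

precomposed = "\u00E0"     # à as a single code point
decomposed = "a\u0300"     # à as "a" plus a combining grave accent
copyright_sign = "\u00A9"  # ©

# The class matches à (precomposed) and ©, but not a plain "a".
in_class = [bool(either.fullmatch(s)) for s in (precomposed, copyright_sign, "a")]

# Neither "a" nor U+0300 is in the class, so the decomposed form does not match.
decomposed_matches = bool(either.fullmatch(decomposed))
```

The decomposed case shows why a code point token matches à "encoded as U+00E0 only": the token works on code points, not on graphemes.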

Each entry below gives the feature, its syntax, a description, an example, and the regex flavors that support it. The flavors covered are JGsoft, .NET, Java, Perl, PCRE, PCRE2, PHP, Delphi, R, JavaScript, VBScript, XRegExp, Python, Ruby, std::regex, Boost, Tcl ARE, POSIX BRE, POSIX ERE, GNU BRE, GNU ERE, Oracle, XML, and XPath. Flavors not named under "Supported by" do not support the syntax. A value in parentheses is the entry from the support table, such as a version number or grammar.

Feature: Grapheme
Syntax: \X
Description: Matches a single Unicode grapheme, whether encoded as a single code point or multiple code points using combining marks. A grapheme most closely resembles the everyday concept of a "character".
Example: \X matches à encoded as U+0061 U+0300, à encoded as U+00E0, ©, etc.
Supported by: JGsoft, Java (9), Perl, PCRE (5.0), PCRE2, PHP (5.0.5), Delphi, R, Ruby (2.0), Boost (ECMA, extended, egrep, awk)

Feature: Code point
Syntax: \uFFFF where FFFF are 4 hexadecimal digits
Description: Matches a specific Unicode code point.
Example: \u00E0 matches à encoded as U+00E0 only. \u00A9 matches ©
Supported by: JGsoft, .NET, Java, JavaScript, VBScript, XRegExp, Python (3.3; 2.4 string), Ruby (1.9), std::regex (ECMA), Tcl ARE

Feature: Code point
Syntax: \u{FFFF} where FFFF are 1 to 4 hexadecimal digits
Description: Matches a specific Unicode code point.
Example: \u{E0} matches à encoded as U+00E0 only. \u{A9} matches ©
Supported by: JGsoft (V2), PHP (7.0.0 string), XRegExp (3), Ruby (1.9)

Feature: Code point
Syntax: \xFFFF where FFFF are 4 hexadecimal digits
Description: Matches a specific Unicode code point.
Example: \x00E0 matches à encoded as U+00E0 only. \x00A9 matches ©
Supported by: std::regex (string), Tcl ARE (8.4–8.5)

Feature: Code point
Syntax: \x{FFFF} where FFFF are 1 to 4 hexadecimal digits
Description: Matches a specific Unicode code point.
Example: \x{E0} matches à encoded as U+00E0 only. \x{A9} matches ©
Supported by: JGsoft, Java (7), Perl, PCRE, PCRE2, PHP, Delphi, R, Boost (ECMA, extended, egrep, awk)

Feature: Unicode category
Syntax: \pL where L is a Unicode category
Description: Matches a single Unicode code point in the specified Unicode category.
Example: \pL matches à encoded as U+00E0; \pS matches ©
Supported by: JGsoft, Java, Perl, PCRE (5.0), PCRE2, PHP (5.0.5), Delphi, R, XRegExp (3)

Feature: Unicode category
Syntax: \PL where L is a Unicode category
Description: Matches a single Unicode code point that is not in the specified Unicode category.
Example: \PS matches à encoded as U+00E0; \PL matches ©
Supported by: JGsoft, Java, Perl, PCRE (5.0), PCRE2, PHP (5.0.5), Delphi, R, XRegExp (3)

Feature: Unicode category
Syntax: \p{L} where L is a Unicode category
Description: Matches a single Unicode code point in the specified Unicode category.
Example: \p{L} matches à encoded as U+00E0; \p{S} matches ©
Supported by: JGsoft, .NET, Java, Perl, PCRE (5.0), PCRE2, PHP (5.0.5), Delphi, R, XRegExp, Ruby (1.9), XML, XPath

Feature: Unicode category
Syntax: \p{IsL} where L is a Unicode category
Description: Matches a single Unicode code point in the specified Unicode category.
Example: \p{IsL} matches à encoded as U+00E0; \p{IsS} matches ©
Supported by: JGsoft, Java, Perl

Feature: Unicode category
Syntax: \p{Category}
Description: Matches a single Unicode code point in the specified Unicode category.
Example: \p{Letter} matches à encoded as U+00E0; \p{Symbol} matches ©
Supported by: JGsoft, Perl, XRegExp, Ruby (1.9)

Feature: Unicode category
Syntax: \p{IsCategory}
Description: Matches a single Unicode code point in the specified Unicode category.
Example: \p{IsLetter} matches à encoded as U+00E0; \p{IsSymbol} matches ©
Supported by: JGsoft, Perl

Feature: Unicode script
Syntax: \p{Script}
Description: Matches a single Unicode code point that is part of the specified Unicode script. Each Unicode code point is part of exactly one script. Scripts never contain unassigned code points.
Example: \p{Greek} matches Ω
Supported by: JGsoft, Perl, PCRE (6.5), PCRE2, PHP (5.1.3), Delphi, R, XRegExp, Ruby (1.9)

Feature: Unicode script
Syntax: \p{IsScript}
Description: Matches a single Unicode code point that is part of the specified Unicode script. Each Unicode code point is part of exactly one script. Scripts never contain unassigned code points.
Example: \p{IsGreek} matches Ω
Supported by: JGsoft, Java (7), Perl

Feature: Unicode block
Syntax: \p{Block}
Description: Matches a single Unicode code point that is part of the specified Unicode block. Each Unicode code point is part of exactly one block. Blocks may contain unassigned code points.
Example: \p{Arrows} matches any of the code points from U+2190 until U+21FF (← until ⇿)
Supported by: JGsoft, Perl

Feature: Unicode block
Syntax: \p{InBlock}
Description: Matches a single Unicode code point that is part of the specified Unicode block. Each Unicode code point is part of exactly one block. Blocks may contain unassigned code points.
Example: \p{InArrows} matches any of the code points from U+2190 until U+21FF (← until ⇿)
Supported by: JGsoft, Java, Perl, XRegExp (2–4), Ruby (2.0)

Feature: Unicode block
Syntax: \p{IsBlock}
Description: Matches a single Unicode code point that is part of the specified Unicode block. Each Unicode code point is part of exactly one block. Blocks may contain unassigned code points.
Example: \p{IsArrows} matches any of the code points from U+2190 until U+21FF (← until ⇿)
Supported by: JGsoft, .NET, Perl, XML, XPath

Feature: Negated Unicode property
Syntax: \P{Property}
Description: Matches a single Unicode code point that does not have the specified property (category, script, or block).
Example: \P{L} matches ©
Supported by: JGsoft, .NET, Java, Perl, PCRE (5.0), PCRE2, PHP (5.0.5), Delphi, R, XRegExp, Ruby (1.9), Boost (ECMA, extended, egrep, awk), XML, XPath

Feature: Negated Unicode property
Syntax: \p{^Property}
Description: Matches a single Unicode code point that does not have the specified property (category, script, or block).
Example: \p{^L} matches ©
Supported by: JGsoft, Perl, PCRE (5.0), PCRE2, PHP (5.0.5), Delphi, R, XRegExp, Ruby (1.9)

Feature: Unicode property
Syntax: \P{^Property}
Description: Matches a single Unicode code point that does have the specified property (category, script, or block). The double negative is taken as a positive.
Example: \P{^L} matches q
Supported by: JGsoft (V2), Perl, PCRE (5.0), PCRE2, PHP (5.0.5), Delphi, R, Ruby (1.9)
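Python is listed above without support for `\p{L}` or the block tokens, so the following is only an approximation of those tokens, not a use of them. It is a minimal sketch that checks the general category through the standard `unicodedata` module and writes the Arrows block as an explicit code point range. The helper `in_category` and the name `ARROWS` are hypothetical, introduced for this example.

```python
import re
import unicodedata

def in_category(ch: str, prefix: str) -> bool:
    """Return True if the general category of ch starts with prefix, e.g. 'L' or 'S'."""
    return unicodedata.category(ch).startswith(prefix)

# The Arrows block written as an explicit range, U+2190 through U+21FF.
# Like a block token, the range also covers any unassigned code points in it.
ARROWS = re.compile(r"[\u2190-\u21FF]")

letter = in_category("\u00E0", "L")          # roughly \p{L} applied to à
symbol = in_category("\u00A9", "S")          # roughly \p{S} applied to ©
not_letter = not in_category("\u00A9", "L")  # roughly \P{L} applied to ©
arrow = bool(ARROWS.fullmatch("\u2192"))     # U+2192 lies inside the Arrows block
```

In a flavor that supports the tokens directly, the same checks would be written inside the pattern itself, for example `\p{L}` or `\p{InArrows}`.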