
Regular Expression Unicode Syntax Reference

This reference page explains what the Unicode tokens do when used outside character classes. All of these except \X can also be used inside character classes. Inside a character class, these tokens add the characters that they normally match to the character class.
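As a minimal sketch of the difference between using a code point token on its own and inside a character class, the following uses Python's `re` module, which the table below lists as supporting `\uFFFF` from version 3.3. The sample text and variable names are made up for illustration.

```python
import re

# Precomposed "à" (U+00E0) and the copyright sign (U+00A9), written with
# string escapes so the code points are unambiguous.
text = "voil\u00e0 \u00a9 2024"

# Outside a character class, \u00E0 matches exactly that one code point.
single = re.findall(r"\u00E0", text)

# Inside a character class, each token adds the character it matches,
# so the class matches either of the two code points.
either = re.findall(r"[\u00E0\u00A9]", text)

# The decomposed form (U+0061 U+0300) is a different code point sequence,
# so the \u00E0 token does not match it.
decomposed = "a\u0300"
no_match = re.search(r"\u00E0", decomposed)

print(single, either, no_match)
```

Matching both encodings of à with one token is what `\X` is for; Python's `re` module is listed as not supporting it, so this sketch only shows the code point behavior.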

| Feature | Syntax | Description | Example | JGsoft | .NET | Java | Perl | PCRE | PHP | Delphi | R | JavaScript | VBScript | XRegExp | Python | Ruby | std::regex | Tcl ARE | POSIX BRE | POSIX ERE | GNU BRE | GNU ERE | Oracle | XML | XPath |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Grapheme | \X | Matches a single Unicode grapheme, whether encoded as a single code point or multiple code points using combining marks. A grapheme most closely resembles the everyday concept of a "character". | \X matches à encoded as U+0061 U+0300, à encoded as U+00E0, ©, etc. | YES | no | no | YES | 5.0 | 5.0.5 | YES | YES | no | no | no | no | 2.0 | no | no | no | no | no | no | no | no | no |
| Code point | \uFFFF where FFFF are 4 hexadecimal digits | Matches a specific Unicode code point. | \u00E0 matches à encoded as U+00E0 only. \u00A9 matches © | YES | YES | YES | no | no | no | no | no | YES | YES | YES | 3.3; 2.4 string | 1.9 | ECMA | YES | no | no | no | no | no | no | no |
| Code point | \u{FFFF} where FFFF are 1 to 4 hexadecimal digits | Matches a specific Unicode code point. | \u{E0} matches à encoded as U+00E0 only. \u{A9} matches © | no | no | no | no | no | no | no | no | no | no | no | no | 1.9 | no | no | no | no | no | no | no | no | no |
| Code point | \xFFFF where FFFF are 4 hexadecimal digits | Matches a specific Unicode code point. | \x00E0 matches à encoded as U+00E0 only. \x00A9 matches © | no | no | no | no | no | no | no | no | no | no | no | no | no | string | 8.4–8.5 | no | no | no | no | no | no | no |
| Code point | \x{FFFF} where FFFF are 1 to 4 hexadecimal digits | Matches a specific Unicode code point. | \x{E0} matches à encoded as U+00E0 only. \x{A9} matches © | YES | no | 7 | YES | YES | YES | YES | YES | no | no | no | no | no | no | no | no | no | no | no | no | no | no |
| Unicode category | \pL where L is a Unicode category | Matches a single Unicode code point in the specified Unicode category. | \pL matches à encoded as U+00E0; \pS matches © | YES | no | YES | YES | 5.0 | 5.0.5 | YES | YES | no | no | no | no | no | no | no | no | no | no | no | no | no | no |
| Unicode category | \PL where L is a Unicode category | Matches a single Unicode code point that is not in the specified Unicode category. | \PS matches à encoded as U+00E0; \PL matches © | YES | no | YES | YES | 5.0 | 5.0.5 | YES | YES | no | no | no | no | no | no | no | no | no | no | no | no | no | no |
| Unicode category | \p{L} where L is a Unicode category | Matches a single Unicode code point in the specified Unicode category. | \p{L} matches à encoded as U+00E0; \p{S} matches © | YES | YES | YES | YES | 5.0 | 5.0.5 | YES | YES | no | no | YES | no | 1.9 | no | no | no | no | no | no | no | YES | YES |
| Unicode category | \p{IsL} where L is a Unicode category | Matches a single Unicode code point in the specified Unicode category. | \p{IsL} matches à encoded as U+00E0; \p{IsS} matches © | YES | no | YES | YES | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no |
| Unicode category | \p{Category} | Matches a single Unicode code point in the specified Unicode category. | \p{Letter} matches à encoded as U+00E0; \p{Symbol} matches © | YES | no | no | YES | no | no | no | no | no | no | YES | no | 1.9 | no | no | no | no | no | no | no | no | no |
| Unicode category | \p{IsCategory} | Matches a single Unicode code point in the specified Unicode category. | \p{IsLetter} matches à encoded as U+00E0; \p{IsSymbol} matches © | YES | no | no | YES | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no |
| Unicode script | \p{Script} | Matches a single Unicode code point that is part of the specified Unicode script. Each Unicode code point is part of exactly one script. Scripts never contain unassigned code points. | \p{Greek} matches Ω | YES | no | no | YES | 6.5 | 5.1.3 | YES | YES | no | no | YES | no | 1.9 | no | no | no | no | no | no | no | no | no |
| Unicode script | \p{IsScript} | Matches a single Unicode code point that is part of the specified Unicode script. Each Unicode code point is part of exactly one script. Scripts never contain unassigned code points. | \p{IsGreek} matches Ω | YES | no | 7 | YES | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no |
| Unicode block | \p{Block} | Matches a single Unicode code point that is part of the specified Unicode block. Each Unicode code point is part of exactly one block. Blocks may contain unassigned code points. | \p{Arrows} matches any of the code points from U+2190 until U+21FF (← until ⇿) | YES | no | no | YES | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no |
| Unicode block | \p{InBlock} | Matches a single Unicode code point that is part of the specified Unicode block. Each Unicode code point is part of exactly one block. Blocks may contain unassigned code points. | \p{InArrows} matches any of the code points from U+2190 until U+21FF (← until ⇿) | YES | no | YES | YES | no | no | no | no | no | no | YES | no | 2.0 | no | no | no | no | no | no | no | no | no |
| Unicode block | \p{IsBlock} | Matches a single Unicode code point that is part of the specified Unicode block. Each Unicode code point is part of exactly one block. Blocks may contain unassigned code points. | \p{IsArrows} matches any of the code points from U+2190 until U+21FF (← until ⇿) | YES | YES | no | YES | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | YES | YES |
| Negated Unicode property | \P{Property} | Matches a single Unicode code point that does not have the specified property (category, script, or block). | \P{L} matches © | YES | YES | YES | YES | 5.0 | 5.0.5 | YES | YES | no | no | YES | no | 1.9 | no | no | no | no | no | no | no | YES | YES |
| Negated Unicode property | \p{^Property} | Matches a single Unicode code point that does not have the specified property (category, script, or block). | \p{^L} matches © | YES | no | no | YES | 5.0 | 5.0.5 | YES | YES | no | no | YES | no | 1.9 | no | no | no | no | no | no | no | no | no |
| Unicode property | \P{^Property} | Matches a single Unicode code point that does have the specified property (category, script, or block). Double negative is taken as positive. | \P{^L} matches q | no | no | no | YES | 5.0 | 5.0.5 | YES | YES | no | no | no | no | 1.9 | no | no | no | no | no | no | no | no | no |
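The table lists Python as not supporting `\p{...}` tokens, so the following is only a rough emulation using the standard `unicodedata` module, not the regex syntax itself. It illustrates what "in category L" and "in block Arrows" mean for the example characters from the table. The helper names `in_category`, `not_in_category`, and `ARROWS` are hypothetical, and the block is written as an explicit code point range (U+2190 to U+21FF) rather than by name.

```python
import re
import unicodedata


def in_category(ch: str, category: str) -> bool:
    # Rough stand-in for \p{L}, \p{S}, etc.: compare the General_Category
    # of a single code point against a category prefix such as "L" or "Lu".
    return unicodedata.category(ch).startswith(category)


def not_in_category(ch: str, category: str) -> bool:
    # Rough stand-in for the negated form \P{L}.
    return not in_category(ch, category)


# Stand-in for \p{InArrows}: the Arrows block spelled out as a range.
ARROWS = re.compile(r"[\u2190-\u21FF]")

print(in_category("\u00e0", "L"))        # à is a letter
print(in_category("\u00a9", "S"))        # © is a symbol
print(not_in_category("\u00a9", "L"))    # © is not a letter
print(ARROWS.fullmatch("\u2192") is not None)  # → lies inside the block
```

Scripts are left out of this sketch because `unicodedata` does not expose the script property.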