Unicode Binary Properties

Binary properties are a kind of Unicode property that regex flavors may support. They are called binary because a code point either has the property or does not. The 44 characters in the string 0123456789ABCDEFabcdef０１２３４５６７８９ＡＢＣＤＥＦａｂｃｄｅｆ (the 22 ASCII hex digits followed by their 22 fullwidth forms) all have the Hex_Digit property. No other code point has the property, and that includes unassigned code points. So \p{Hex_Digit} matches any of these 44 characters, while \P{Hex_Digit} matches every other code point, including unassigned ones.
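As a small illustration in JavaScript/TypeScript, the sketch below builds a sample from the 22 ASCII hex digits and their 22 fullwidth counterparts (U+FF10 to U+FF19, U+FF21 to U+FF26, U+FF41 to U+FF46) and counts what \p{Hex_Digit} and \P{Hex_Digit} match. The variable names are made up for this example, and it assumes a runtime that supports Unicode property escapes with the u flag.

```typescript
// The 22 ASCII hex digits.
const asciiHex = "0123456789ABCDEFabcdef";

// Each fullwidth form sits exactly 0xFEE0 above its ASCII counterpart,
// e.g. "0" (U+0030) -> U+FF10 and "A" (U+0041) -> U+FF21.
const fullwidthHex = Array.from(asciiHex, (ch) =>
  String.fromCharCode(ch.charCodeAt(0) + 0xfee0)
).join("");

const hexSample = asciiHex + fullwidthHex;

// The u flag is required for \p{...} and \P{...} in JavaScript.
const hexDigit = /\p{Hex_Digit}/gu;
const notHexDigit = /\P{Hex_Digit}/gu;

// Every character in the sample has the property.
const hexCount = (hexSample.match(hexDigit) ?? []).length;
// So the negated property finds nothing in the sample.
const nonHexInSample = (hexSample.match(notHexDigit) ?? []).length;
// None of the characters in "xyz-G" is a hex digit.
const nonHexInWord = ("xyz-G".match(notHexDigit) ?? []).length;

console.log(hexCount, nonHexInSample, nonHexInWord); // 44 0 5
```

The fullwidth digits are generated from their code points rather than typed in, so the example does not depend on how the source file is encoded.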

The \p{Property_Name} syntax is supported by ICU, Perl, and JavaScript with /u. Ruby has supported binary properties since version 1.9. PCRE2 added support for binary properties in version 10.40. Because PHP and R are based on PCRE2, PHP gained this support in version 8.2.0 and R in version 4.2.2.

A binary property can also be seen as a property set with two possible values: Yes and No. The 44 characters in the string above have the value Yes for the property Hex_Digit, while all other code points have the value No. ICU and Perl support the property set syntax for binary properties. \p{Hex_Digit=Yes} matches the 44 hex digits while \p{Hex_Digit=No} matches all other code points. You may find the latter more readable than the negated property \P{Hex_Digit}. Be mindful of double negation. \P{Hex_Digit=No} matches all code points for which Hex_Digit is not No, which are the 44 code points in the sample string. Thus \P{Hex_Digit=No} is equivalent to \p{Hex_Digit=Yes}.
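JavaScript is not among the flavors listed here as supporting the property set syntax; its grammar accepts name=value only for General_Category, Script, and Script_Extensions. The same double negation can still be modeled there with a negated character class around the negated property. The sketch below is only an analogy for \P{Hex_Digit=No}, not the ICU or Perl syntax itself, and the variable names are illustrative.

```typescript
// Has the property: the JavaScript counterpart of \p{Hex_Digit=Yes}.
const hasHex = /^\p{Hex_Digit}$/u;
// Lacks the property: the counterpart of \p{Hex_Digit=No}.
const lacksHex = /^\P{Hex_Digit}$/u;
// "Not (lacks the property)": models the double negation \P{Hex_Digit=No}.
const notLacksHex = /^[^\P{Hex_Digit}]$/u;

// "\uFF26" is FULLWIDTH LATIN CAPITAL LETTER F; "\u00E9" is e with acute.
const probes: string[] = ["7", "f", "\uFF26", "g", "-", "\u00E9"];

const verdicts = probes.map((ch) => ({
  ch,
  has: hasHex.test(ch),
  lacks: lacksHex.test(ch),
  notLacks: notLacksHex.test(ch),
}));

// For every probe, "has" and "notLacks" agree, and "lacks" is their opposite.
for (const v of verdicts) {
  console.log(JSON.stringify(v));
}
```

The first three probes are hex digits and the last three are not, so the double negation column always mirrors the plain positive match.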

Most binary properties have a long canonical name and a shorter alias. The Hex_Digit property has Hex as its alias. The Yes and No values also have Y and N as aliases. So you can write \p{Hex} or \p{Hex=Y}.
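To check that an alias really is interchangeable with the long name, one option is to compare both over every code point. The sketch below does this for \p{Hex_Digit} and \p{Hex} in JavaScript/TypeScript. It assumes an engine that accepts both names, and the counter names are made up for the example.

```typescript
// Anchored, non-global regexes so each test looks at exactly one code point.
const byLongName = /^\p{Hex_Digit}$/u;
const byAlias = /^\p{Hex}$/u;

let matchedByLongName = 0;
let disagreements = 0;

// Walk the whole Unicode code space, U+0000 through U+10FFFF.
for (let cp = 0; cp <= 0x10ffff; cp++) {
  const ch = String.fromCodePoint(cp);
  const long = byLongName.test(ch);
  const short = byAlias.test(ch);
  if (long) matchedByLongName++;
  if (long !== short) disagreements++;
}

console.log(matchedByLongName, disagreements); // 44 0
```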

Unicode defines a long list of binary properties. New Unicode versions regularly add new properties in support of new Unicode features. None of the flavors discussed here support all binary properties. The fact that a property exists in a certain Unicode version and that your regex flavor is based on that Unicode version doesn’t mean that the flavor supports that property. See the Unicode binary property reference for a complete list of all Unicode properties and which versions of which regex flavors support them.
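Since support varies, it can be safer to probe for a property at run time than to assume it. The following is a minimal sketch for JavaScript/TypeScript that relies on the RegExp constructor throwing a SyntaxError for a name the engine does not know. The helper supportsBinaryProperty and the name Not_A_Real_Property are hypothetical, introduced only for this illustration.

```typescript
// Hypothetical helper: reports whether \p{name} compiles with the u flag.
// Caveat: this is also true for names that are not binary properties but are
// valid on their own inside \p{...}, such as General_Category values ("Lu").
function supportsBinaryProperty(name: string): boolean {
  try {
    new RegExp(`\\p{${name}}`, "u");
    return true;
  } catch (err) {
    // An unknown property name surfaces as a SyntaxError.
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// Not_A_Real_Property is a deliberately invalid name used as a negative control.
const candidates = ["Hex_Digit", "ASCII_Hex_Digit", "Not_A_Real_Property"];
const support = candidates.map(
  (name) => [name, supportsBinaryProperty(name)] as const
);

for (const [name, ok] of support) {
  console.log(`${name}: ${ok ? "supported" : "not supported"}`);
}
```

A check like this only answers the question for the engine it runs on; other flavors still have to be looked up in the reference.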
