Unicode Property Sets

Property sets are a kind of Unicode property that flavors may support. A property set can have multiple values. Every Unicode code point is assigned exactly one value of each set. In a regular expression you specify both a set and one of its values as the Property in \p{Property}. For example, the Bidi_Class property set has Left_To_Right as one of its values. The regex \p{Bidi_Class=Left_To_Right} matches any code point that has the Left_To_Right value for the Bidi_Class property.
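The idea that each code point carries one value per property set can be sketched in Python. This is only an illustration under stated assumptions: Python's built-in re module does not support \p{Bidi_Class=...}, so the sketch looks up Bidi_Class through the standard unicodedata module, which reports short aliases such as L for Left_To_Right and R for Right_To_Left. The helper names bidi_class and find_bidi_class are hypothetical.

```python
import unicodedata

def bidi_class(ch: str) -> str:
    # Short Bidi_Class alias for one code point, e.g. "L" for Left_To_Right.
    # Python returns "" for code points its Unicode database has no data for.
    return unicodedata.bidirectional(ch)

def find_bidi_class(chars, value: str) -> list[str]:
    # Hypothetical stand-in for \p{Bidi_Class=value}:
    # keep only the code points whose Bidi_Class equals the requested value.
    return [ch for ch in chars if bidi_class(ch) == value]

# Latin a, Hebrew alef (U+05D0), digit 1, space
sample = ["a", "\u05d0", "1", " "]

# Each code point maps to exactly one Bidi_Class value.
classes = [bidi_class(ch) for ch in sample]   # ["L", "R", "EN", "WS"]

# Only the Latin letter has the Left_To_Right value.
left_to_right = find_bidi_class(sample, "L")  # ["a"]
```

A flavor that supports the property set does this lookup inside the regex engine, so no helper functions are needed there.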

Strictly speaking, General_Category, Script, and Block are also property sets. But we handle those separately in this tutorial because they are supported by many more regex flavors and have alternative syntax specific to them. The only flavors that support any Unicode property sets other than these three are ICU, Perl, Ruby, and PCRE2. The Unicode property set reference lists all the property sets and all their possible values and indicates which versions of which flavors support them. Only ICU and Perl support most of the property sets. Ruby supports the Age property since Ruby 1.9 and the Grapheme_Cluster_Break property since Ruby 2.4. PCRE2 supports only the Bidi_Class property, and only from version 10.40 onward. The same applies to PHP 8.2.0 and R 4.2.2, as they are based on PCRE2.

Many flavors support only a limited number of property sets. The fact that a flavor is built on a certain version of Unicode does not mean it supports all the property sets that exist in that version of Unicode. The table below indicates which flavors support which property sets. If a flavor supports a property set, then it supports all property values that are part of that set. Those values are listed in the description. The exact code points matched by each property depend on the Unicode version the flavor was built with.
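The dependence on the Unicode version can be observed with a small Python sketch. It assumes the standard unicodedata module, which exposes the Unicode version of the running interpreter as unidata_version and also ships a frozen Unicode 3.2.0 database as ucd_3_2_0. The choice of U+20B9 INDIAN RUPEE SIGN, a code point added to Unicode after version 3.2, is just an illustrative example.

```python
import unicodedata

# Unicode version the running Python interpreter was built with.
current_version = unicodedata.unidata_version

# Frozen Unicode 3.2.0 database that ships alongside the current one.
old_db = unicodedata.ucd_3_2_0

rupee = "\u20b9"  # INDIAN RUPEE SIGN, added after Unicode 3.2

# Same code point, same property set, two Unicode versions.
bidi_now = unicodedata.bidirectional(rupee)  # "ET" (European_Terminator)
bidi_old = old_db.bidirectional(rupee)       # differs: no data for this code point in 3.2

version_matters = bidi_now != bidi_old
```

A regex such as \p{Bidi_Class=European_Terminator} would likewise match this code point only in a flavor built on a Unicode version that includes it.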
