Character Class Intersection

Character class intersection is supported by Java, ICU, JGsoft V2, and by Ruby 1.9 and later. It makes it easy to match any single character that must be present in two sets of characters. The syntax for this is [class&&[intersect]]. You can use the full character class syntax within the intersected character class.

If the intersected class does not need a negating caret, then Java, ICU, and Ruby allow you to omit the nested square brackets: [class&&intersect]. The JGsoft flavor requires the nested brackets. Without them, it interprets the ampersands as literals. It treats [class&&intersect] as a character class containing only literals, just like [clas&inter].
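To see the two notations side by side, here is a minimal Java sketch. The class name IntersectionSyntax and the helper matching are hypothetical, made up for this illustration. The helper tests each candidate character on its own against the class and returns the ones that match, so the bracketed and unbracketed forms of the intersection can be compared directly.

```java
import java.util.regex.Pattern;

public class IntersectionSyntax {
    // Hypothetical helper: returns, in order, each character of candidates
    // that the single-character regex charClass matches on its own.
    static String matching(String charClass, String candidates) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String candidates = "abcdefghijklmnopqrstuvwxyz&";
        // Right-hand side of the intersection wrapped in nested brackets.
        System.out.println(matching("[a-z&&[def]]", candidates));
        // Java also accepts the right-hand side without nested brackets.
        System.out.println(matching("[a-z&&def]", candidates));
    }
}
```

Both lines should print def: the ampersands act as the intersection operator, so & itself is not part of either class.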

The character class [a-z&&[^aeiuo]] matches a single letter that is not a vowel. In other words: it matches a single consonant. Without character class subtraction or intersection, the only way to do this would be to list all consonants: [b-df-hj-np-tv-z].
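A quick way to confirm that the intersection and the explicit list are equivalent is to run both over the whole lowercase alphabet. This Java sketch assumes a hypothetical class Consonants with the same kind of illustrative matching helper as above; it is a check, not a library API.

```java
import java.util.regex.Pattern;

public class Consonants {
    // Hypothetical helper: the characters of candidates matched by the class, in order.
    static String matching(String charClass, String candidates) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String alphabet = "abcdefghijklmnopqrstuvwxyz";
        // Letters a to z intersected with "not a vowel".
        String viaIntersection = matching("[a-z&&[^aeiuo]]", alphabet);
        // The same set spelled out as ranges of consonants.
        String viaListing = matching("[b-df-hj-np-tv-z]", alphabet);
        System.out.println(viaIntersection);
        System.out.println(viaIntersection.equals(viaListing));
    }
}
```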

The character class [\p{Nd}&&[\p{IsThai}]] matches any single Thai digit. [\p{IsThai}&&[\p{Nd}]] does exactly the same.
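The following Java sketch tries both orders of the Thai digit intersection on three sample characters: U+0E51 THAI DIGIT ONE, the ASCII digit 1, and the Thai letter U+0E01 KO KAI. The class name ThaiDigits and the helper isMatch are hypothetical. It assumes a Java version that supports script names with the Is prefix, as in \p{IsThai} (Java 7 and later).

```java
import java.util.regex.Pattern;

public class ThaiDigits {
    // Both orders of the intersection given in the text.
    static final Pattern DIGIT_AND_THAI = Pattern.compile("[\\p{Nd}&&[\\p{IsThai}]]");
    static final Pattern THAI_AND_DIGIT = Pattern.compile("[\\p{IsThai}&&[\\p{Nd}]]");

    // Hypothetical helper: does the pattern match the whole string?
    static boolean isMatch(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        // THAI DIGIT ONE, ASCII digit one, THAI CHARACTER KO KAI
        String[] samples = {"\u0E51", "1", "\u0E01"};
        for (String s : samples) {
            System.out.printf("U+%04X: %b %b%n", (int) s.charAt(0),
                    isMatch(DIGIT_AND_THAI, s), isMatch(THAI_AND_DIGIT, s));
        }
    }
}
```

Only the Thai digit should match, and it should match both patterns: the ASCII 1 is a digit but not Thai, and KO KAI is Thai but not a digit.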

Intersection of Multiple Classes

You can intersect the same class more than once. [0-9&&[0-6&&[4-9]]] is the same as [4-6] as those are the only digits present in all three parts of the intersection. In Java, ICU, and Ruby you can write the same regex as [0-9&&[0-6]&&[4-9]], [0-9&&[0-6&&4-9]], [0-9&&0-6&&[4-9]], or just [0-9&&0-6&&4-9]. The nested square brackets are only needed for parts of the intersection that need to be negated.
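As a sanity check of the five spellings listed above, this Java sketch runs each one over the digits 0 to 9. The class MultipleIntersections and its matching helper are hypothetical names for this illustration. Per the text, every form should leave only 4, 5, and 6.

```java
import java.util.regex.Pattern;

public class MultipleIntersections {
    // Hypothetical helper: the characters of candidates matched by the class, in order.
    static String matching(String charClass, String candidates) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // The equivalent notations from the text, all meant to equal [4-6].
    static final String[] FORMS = {
        "[0-9&&[0-6&&[4-9]]]",
        "[0-9&&[0-6]&&[4-9]]",
        "[0-9&&[0-6&&4-9]]",
        "[0-9&&0-6&&[4-9]]",
        "[0-9&&0-6&&4-9]",
    };

    public static void main(String[] args) {
        for (String form : FORMS) {
            System.out.println(form + " -> " + matching(form, "0123456789"));
        }
    }
}
```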

If you do not use square brackets around the right-hand part of the intersection, then there is no doubt that the entire remainder of the character class is the right-hand part of the intersection. If you do use the square brackets, you could write something like [0-9&&[12]56]. In Ruby and ICU, this is the same as [0-9&&1256]. The same is true in Java 17 and later. But Java 16 and prior had bugs that caused it to treat this as [0-9&&56], completely ignoring the nested brackets and their contents.
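Because the result of [0-9&&[12]56] depends on the Java version, the sketch below does not hardcode an answer. It prints the running JVM's version and the digits the class matches, so you can check which behavior your runtime has. The class name NestedThenLiterals and the matching helper are hypothetical. Going by the text, expect 1256 on Java 17 and later and 56 on Java 16 and prior.

```java
import java.util.regex.Pattern;

public class NestedThenLiterals {
    // Hypothetical helper: the characters of candidates matched by the class, in order.
    static String matching(String charClass, String candidates) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Nested class [12] followed by the literals 56 on the right-hand side.
        String result = matching("[0-9&&[12]56]", "0123456789");
        System.out.println("Java " + System.getProperty("java.version") + ": " + result);
    }
}
```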

The JGsoft flavor does not allow anything after the closing bracket of the nested class. The characters 56 in [0-9&&[12]56] are an error. This way there is no ambiguity about their meaning.

You also shouldn’t put && at the very start or very end of a character class. Ruby treats [0-9&&] and [&&0-9] as intersections with an empty class, which matches no characters at all. Java ignores leading and trailing && operators. ICU treats them as an error. The JGsoft flavor treats them as literal ampersands.

Intersection in Negated Classes

The character class [^1234&&[3456]] is both negated and intersected. In the JGsoft flavor, negation takes precedence over intersection. It reads this class as “(not 1234) and 3456”, which makes it the same as [56], matching the digits 5 or 6. In ICU and Ruby, intersection takes precedence over negation. They read [^1234&&[3456]] as “not (1234 and 3456)”, which is the same as [^34], matching anything except the digits 3 and 4. Java changed its mind with version 9. Java 1.4 to 8 give negation precedence over intersection, so this regex is equivalent to [56]. Java 9 and later give intersection precedence over negation, so this regex becomes equivalent to [^34]. This is obviously a dramatic difference. Do not intersect a negated character class if your regex needs to work with both Java 8 or prior and Java 9 or later.
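To find out which precedence rule a given JVM applies, the following Java sketch matches [^1234&&[3456]] against the digits 0 to 9 and reports which reading the result corresponds to. The class NegatedIntersection and its helpers are hypothetical. Based on the text, 56 means negation was applied first (Java 1.4 to 8), while 01256789, which is [^34] restricted to digits, means intersection was applied first (Java 9 and later).

```java
import java.util.regex.Pattern;

public class NegatedIntersection {
    // Hypothetical helper: the characters of candidates matched by the class, in order.
    static String matching(String charClass, String candidates) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Maps the observed digits to one of the two readings described in the text.
    static String interpretation(String result) {
        if (result.equals("56")) {
            return "negation first: (not 1234) and 3456";
        }
        if (result.equals("01256789")) {
            return "intersection first: not (1234 and 3456)";
        }
        return "unexpected";
    }

    public static void main(String[] args) {
        String result = matching("[^1234&&[3456]]", "0123456789");
        System.out.println("Java " + System.getProperty("java.version") + ": "
                + result + " (" + interpretation(result) + ")");
    }
}
```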

If you want to negate the right-hand side of the intersection, then you must use square brackets. The brackets automatically control precedence. So Java, ICU, Ruby, and the JGsoft flavor all read [1234&&[^3456]] as “1234 and (not 3456)”. Thus this regex is the same as [12].
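Unlike the previous case, this one should not depend on the Java version. A short Java check, using a hypothetical class BracketedNegation with an illustrative matching helper, runs the class over the digits 0 to 9 and should leave only 1 and 2.

```java
import java.util.regex.Pattern;

public class BracketedNegation {
    // Hypothetical helper: the characters of candidates matched by the class, in order.
    static String matching(String charClass, String candidates) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The caret sits inside the nested brackets, so only 3456 is negated.
        System.out.println(matching("[1234&&[^3456]]", "0123456789"));
    }
}
```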

Notational Compatibility with Other Regex Flavors

The ampersand has no special meaning in character classes in any of the other regular expression flavors discussed in this tutorial. There the ampersand is simply a literal, and repeating it just adds needless duplicates. All these flavors treat [1234&&3456] as identical to [&123456].

Strictly speaking, this means that the character class intersection syntax is incompatible with the majority of other regex flavors. But in practice there’s no difference, because there is no point in using two ampersands in a character class when you just want to add a literal ampersand. A single ampersand is still treated as a literal by Java, ICU, Ruby, and the JGsoft flavor.
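The difference between a single and a doubled ampersand can be checked in Java with the sketch below. The class LiteralAmpersand and its matching helper are hypothetical. With one ampersand, & is just another literal in the class; with two, Java performs an intersection, so & itself no longer matches and only the digits common to both sides remain.

```java
import java.util.regex.Pattern;

public class LiteralAmpersand {
    // Hypothetical helper: the characters of candidates matched by the class, in order.
    static String matching(String charClass, String candidates) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String candidates = "123456&";
        // Single ampersand: a literal, so the class holds 1-6 and &.
        System.out.println(matching("[1234&56]", candidates));
        // Double ampersand: intersection of 1234 and 3456.
        System.out.println(matching("[1234&&3456]", candidates));
    }
}
```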