Quick Start
Tutorial
Tools & Languages
Examples
Reference
Book Reviews
RegexBuddy Easily create and understand regular expressions today.
Compose and analyze regex patterns with RegexBuddy's easy-to-grasp regex blocks and intuitive regex tree, instead of or in combination with the traditional regex syntax. Developed by the author of this website, RegexBuddy makes learning and using regular expressions easier than ever. Get your own copy of RegexBuddy now

Free-Spacing Regular Expressions

Most modern regex flavors support a variant of the regular expression syntax called free-spacing mode. This mode allows for regular expressions that are much easier for people to read. Of the flavors discussed in this tutorial, only XML Schema and the POSIX and GNU flavors don't support it. Plain JavaScript doesn't either, but XRegExp does. The mode is usually enabled by setting an option or flag outside the regex. With flavors that support mode modifiers, you can put (?x) the very start of the regex to make the remainder of the regex free-spacing.

In free-spacing mode, whitespace between regular expression tokens is ignored. Whitespace includes spaces, tabs and line breaks. Note that only whitespace between tokens is ignored. a b c is the same as abc in free-spacing mode, but \ d and \d are not the same. The former matches  d, while the latter matches a digit. \d is a single regex token composed of a backslash and a "d". Breaking up the token with a space gives you an escaped space (which matches a space), and a literal "d".

Likewise, grouping modifiers cannot be broken up. (?>atomic) is the same as (?> ato mic ) and as ( ?>ato mic). They all match the same atomic group. They're not the same as (? >atomic). In fact, the latter will cause a syntax error. The ?> grouping modifier is a single element in the regex syntax, and must stay together. This is true for all such constructs, including lookaround, named groups, etc.

A character class is also treated as a single token. [abc] is not the same as [ a b c ]. The former matches one of three letters, while the latter matches those three letters or a space. In other words: free-spacing mode has no effect inside character classes. Spaces and line breaks inside character classes will be included in the character class. This means that in free-spacing mode, you can use or [ ] to match a single space. Use whichever you find more readable. The hexadecimal escape \x20 also works, of course.

Java, however, does not treat a character class as a single token in free-spacing mode. Java does ignore whitespace and comments inside character classes. So in Java's free-spacing mode, [abc] is identical to [ a b c ]. To add a space to a character class, you'll have to escape it with a backslash. But even in free-spacing mode, the negating caret must appear immediately after the opening bracket. [ ^ a b c ] matches any of the four characters ^, a, b or c just like [abc^] would. With the negating caret in the proper place, [^ a b c ] matches any character that is not a, b or c.

Comments in Free-Spacing Mode

Another feature of free-spacing mode is that the # character starts a comment. The comment runs until the end of the line. Everything from the # until the next line break character is ignored.

XPath and Oracle do not support comments within the regular expression, even though they have a free-spacing mode. They always treat # as a literal character.

Java is the only flavor that treats # as the start of a comment inside character classes in free-spacing mode. The comment runs until the end of the line, so you can use a ] to close a comment. All other flavors treat # as a literal inside character classes.

Putting it all together, I could clarify the regex to match a valid date by writing it across multiple lines as:

# Match a 20th or 21st century date in yyyy-mm-dd format
(19|20)\d\d                # year (group 1)
[- /.]                     # separator
(0[1-9]|1[012])            # month (group 2)
[- /.]                     # separator
(0[1-9]|[12][0-9]|3[01])   # day (group 3)

RegexBuddy makes regular expressions more readable with syntax coloring

Comments Without Free-Spacing

Many flavors also allow you to add comments to your regex without using free-spacing mode. The syntax is (?#comment) where "comment" can be whatever you want, as long as it does not contain a closing parenthesis. The regex engine ignores everything after the (?# until the first closing parenthesis.

Of the flavors discussed in this tutorial, all flavors that support comment in free-spacing mode, except Java and Tcl, also support (?#comment). The flavors that don't support comments in free-spacing mode or don't support free-spacing mode at all also don't support (?#comment).

Make a Donation

Did this website just save you a trip to the bookstore? Please make a donation to support this site, and you'll get a lifetime of advertisement-free access to this site! Credit cards, PayPal, and Bitcoin gladly accepted.

Regex Tutorial
Introduction
Table of Contents
Special Characters
Non-Printable Characters
Regex Engine Internals
Character Classes
Character Class Subtraction
Character Class Intersection
Shorthand Character Classes
Dot
Anchors
Word Boundaries
Alternation
Optional Items
Repetition
Grouping & Capturing
Backreferences
Backreferences, part 2
Named Groups
Branch Reset Groups
Free-Spacing & Comments
Unicode
Mode Modifiers
Atomic Grouping
Possessive Quantifiers
Lookahead & Lookbehind
Lookaround, part 2
Keep Text out of The Match
Conditionals
Balancing Groups
Recursion
Subroutines
Recursion & Capturing
Recursion & Backreferences
Recursion & Backtracking
POSIX Bracket Expressions
Zero-Length Matches
Continuing Matches
More on This Site
Introduction
Regular Expressions Quick Start
Regular Expressions Tutorial
Replacement Strings Tutorial
Applications and Languages
Regular Expressions Examples
Regular Expressions Reference
Replacement Strings Reference
Book Reviews
Printable PDF
About This Site
RSS Feed & Blog