
This regular expression tutorial teaches you every aspect of regular expressions. Each topic assumes you have read and understood all previous topics. So if you are new to regular expressions, I recommend you read the topics in the order presented.
The introduction indicates the scope of the tutorial and which regex flavors will be discussed. It also introduces basic terminology.
Literal Characters and Special Characters
The simplest regex consists of only literal characters. Certain characters have special meanings in a regex and have to be escaped. Escaping rules may get a bit complicated when using regexes in software source code. This topic also explains how to enter non-printable characters.
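As a small illustration of literal versus special characters, here is a sketch that assumes Python's `re` module (one flavor among the many the tutorial covers). The sample strings are made up. Raw strings (`r"..."`) are used so that the backslashes reach the regex engine unchanged, which sidesteps most of the source-code escaping trouble.

```python
import re

# Literal characters simply match themselves.
literal_hit = re.search(r"cat", "concatenate") is not None

# An unescaped dot is special, so "1.5" also matches "1x5".
loose_hit = re.search(r"1.5", "1x5") is not None
# Escaping the dot with a backslash makes it a literal dot.
strict_hit = re.search(r"1\.5", "1x5") is not None

# re.escape() adds the needed backslashes to an arbitrary literal string.
plus_hit = re.search(re.escape("1+1=2"), "Proof: 1+1=2") is not None

# \t inside the regex matches a tab, a non-printable character.
tab_hit = re.search(r"\t", "a\tb") is not None

print(literal_hit, loose_hit, strict_hit, plus_hit, tab_hit)
```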
How a Regex Engine Works Internally
A first look at the internals of the regular expression engine. Later topics will build on this information. Knowing how the engine works will greatly help you to craft regexes that match what you intended, and do not match what you do not want.
Character Classes or Character Sets
A character class or character set matches a single character out of several possible characters, consisting of individual characters and/or ranges of characters. A negated character class matches a single character not in the character class. Shorthand character classes allow you to use common sets quickly.
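A short sketch of these ideas, assuming Python's `re` module; the test strings are invented for illustration.

```python
import re

# A class with individual characters: "a" or "e" in the third position.
gray = re.findall(r"gr[ae]y", "gray grey groy")

# A class built from ranges: one hexadecimal digit.
hex_digits = re.findall(r"[0-9a-fA-F]", "0x1F")

# A negated class: any single character that is not a digit.
non_digits = re.findall(r"[^0-9]", "a1b2")

# A shorthand class: \d stands for a digit.
digits = re.findall(r"\d", "a1b2")

print(gray, hex_digits, non_digits, digits)
```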
The dot matches any character, though usually not line break characters unless you change an option.
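For example, in Python's `re` module (assumed here; other flavors name the option differently), the option that lets the dot match line breaks is `re.DOTALL`:

```python
import re

text = "one\ntwo"

# By default the dot does not match the line break between the words.
default_match = re.search(r"one.two", text)

# With re.DOTALL the dot matches any character, line breaks included.
dotall_match = re.search(r"one.two", text, re.DOTALL)

print(default_match is None, dotall_match is not None)
```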
Start of String and End of String Anchors
Anchors are zero-width. They do not match any characters, but rather a position. The caret and the dollar sign match at the start and the end of the string. Depending on your regex flavor and its options, they can match at the start and the end of a line as well.
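The sketch below assumes Python's `re` module, where the option that makes the anchors match at line boundaries is `re.MULTILINE`. The two-line sample text is made up.

```python
import re

text = "first line\nsecond line"

# Without options, ^ matches only at the start of the string.
string_start = re.findall(r"^\w+", text)
# With re.MULTILINE, ^ also matches at the start of each line.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# The same applies to $ at the end of the string and of each line.
string_end = re.findall(r"\w+$", text)
line_ends = re.findall(r"\w+$", text, re.MULTILINE)

# An anchor matches a position, so the match itself is zero characters long.
caret_span = re.search(r"^", text).span()

print(string_start, line_starts, string_end, line_ends, caret_span)
```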
Word boundaries are like anchors, but match at the start of a word and the end of a word. However, most regex flavors define the concept of a "word" differently than your English teacher in grade school.
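A quick sketch with Python's `re` module (an assumed flavor) shows where the regex notion of a "word" differs from everyday usage: digits and the underscore count as word characters, so there is no boundary between `cat` and `_1`.

```python
import re

text = "cat concat cats cat_1 cat"

# \b requires a word character on one side and a non-word character
# (or the start or end of the string) on the other.
whole_words = re.findall(r"\bcat\b", text)

# Without the boundaries, every occurrence of the letters is found.
all_hits = re.findall(r"cat", text)

print(len(whole_words), len(all_hits))
```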
By separating different sub-regexes with vertical bars, you can tell the regex engine to attempt them from left to right, and return success as soon as one of them can be matched.
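The left-to-right order matters, as this sketch with Python's `re` module (assumed flavor, hypothetical function names as test data) suggests: the engine stops at the first alternative that matches, even if a later one would match more text.

```python
import re

# "Get" is listed first and succeeds, so "GetValue" is never tried.
short_first = re.search(r"Get|GetValue|Set", "GetValue").group()

# Listing the longer alternative first changes the result.
long_first = re.search(r"GetValue|Get|Set", "GetValue").group()

print(short_first, long_first)
```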
Putting a question mark after an item tells the regex engine to match the item if possible, but continue anyway (rather than admit defeat) if it cannot be matched.
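A one-line sketch, assuming Python's `re` module: the `u` is optional, so both spellings match, while a string missing the `o` still fails.

```python
import re

# "u?" matches the "u" if it is there and carries on if it is not.
spellings = re.findall(r"colou?r", "color colour colr")

print(spellings)
```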
Repetition Using Various Quantifiers
Three styles of operators, the star, the plus, and curly braces, allow you to repeat an item zero or more times, once or more, or an arbitrary number of times. It is important to understand that these quantifiers are "greedy" by default, unless you explicitly make them "lazy".
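The difference between greedy and lazy repetition is easiest to see on a small example. This is a sketch assuming Python's `re` module, with an invented snippet of HTML as the subject string.

```python
import re

html = "<em>first</em> and <em>second</em>"

# Greedy plus: the dot is repeated as often as possible, so the match
# runs from the first "<" to the very last ">".
greedy = re.findall(r"<.+>", html)

# Lazy plus (note the extra "?"): the dot is repeated as few times as
# possible, so each tag is matched separately.
lazy = re.findall(r"<.+?>", html)

# Curly braces: between two and three digits, as a whole word.
two_or_three = re.findall(r"\b\d{2,3}\b", "7 42 365 1000")

print(greedy, lazy, two_or_three)
```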
By placing round brackets around part of the regex, you tell the engine to treat that part as a single item when applying operators such as quantifiers. With round brackets, you can also create backreferences that allow you to reuse the text matched by part of the regex inside the regular expression, or later in the replacement text of a search and replace operation. Backreferences are also very useful for extracting parts from a string in a programming language.
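The three uses mentioned above can be sketched as follows, assuming Python's `re` module, where `\1` refers to the first group both inside the regex and in the replacement text. The sample strings are made up.

```python
import re

# Grouping: the plus applies to the whole group "ab", not just to "b".
repeated = re.search(r"(ab)+", "ababab!").group()

# Backreference inside the regex: \1 must match the same text as group 1,
# which finds a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine").group()

# Backreferences in the replacement text of a search and replace.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2009-06-17")

# Extracting part of a string in program code.
year = re.search(r"(\d{4})-(\d{2})-(\d{2})", "2009-06-17").group(1)

print(repeated, doubled, reordered, year)
```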
Unicode Characters and Properties
If your regular expression flavor supports Unicode, then you can use special Unicode regex tokens to match specific Unicode characters, or to match any character that has a certain Unicode property or is part of a particular Unicode script or block.
Change matching modes such as "case insensitive" for specific parts of the regular expression.
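As a sketch, assuming Python's `re` module (version 3.6 or later, which accepts scoped inline flags of the form `(?i:...)`), case insensitivity can be limited to one part of the pattern:

```python
import re

# Only "hello" is matched case insensitively; " world" stays case sensitive.
pattern = re.compile(r"(?i:hello) world")

mixed_ok = pattern.search("HELLO world") is not None
mixed_fail = pattern.search("HELLO WORLD") is not None

print(mixed_ok, mixed_fail)
```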
Atomic Grouping and Possessive Quantifiers
Nested quantifiers can cause an exponentially increasing amount of backtracking that brings the regex engine to a grinding halt. Atomic grouping and possessive quantifiers provide a solution.
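The effect of an atomic group can be sketched in Python's `re` module (an assumed flavor). Native atomic groups and possessive quantifiers are, as far as I know, only available from Python 3.11 onward, so this example swaps in a well-known emulation instead: a lookahead containing a capturing group, followed by a backreference. Once the lookahead has succeeded, the engine does not backtrack into it, which is the behavior an atomic group provides.

```python
import re

# Ordinary greedy quantifier: a+ first takes all three letters, then
# backtracks and gives one up so the final "a" can match.
backtracking = re.search(r"a+a", "aaa")

# Emulated atomic group: (?=(a+))\1 takes all the letters and cannot give
# any back, so the final "a" has nothing left to match.
atomic_like = re.search(r"(?=(a+))\1a", "aaa")

print(backtracking is not None, atomic_like is None)
```

Refusing to backtrack is what prevents the exponential blow-up: a failed match is reported immediately instead of after every possible split of the repeated text has been tried.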
Lookaround with Zero-Width Assertions, part 1 and part 2
Lookahead and lookbehind (collectively lookaround) are zero-width. With positive lookaround, you can specify multiple requirements (sub-regexes) to be applied to the same part of the string. With negative lookaround, you can invert the result of a regex match (i.e. match something that does not match something else).
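A few hedged examples, assuming Python's `re` module; the password rule and sample strings are invented purely for illustration.

```python
import re

# Negative lookahead: a "q" that is not followed by a "u".
q_without_u = re.findall(r"q(?!u)", "Iraq quit")

# Positive lookbehind: digits that are preceded by a dollar sign.
dollar_amounts = re.findall(r"(?<=\$)\d+", "cost $30 or 40")

# Several positive lookaheads test the same text against multiple
# requirements: at least one digit, at least one lowercase letter,
# and six or more characters in total.
password = re.compile(r"^(?=.*\d)(?=.*[a-z]).{6,}$")
valid = password.search("abc123") is not None
no_digit = password.search("abcdef") is not None
no_letter = password.search("123456") is not None

print(q_without_u, dollar_amounts, valid, no_digit, no_letter)
```

Because lookaround is zero-width, the lookaheads in the password pattern consume nothing; each one starts checking from the same position at the start of the string.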
Continuing from the Previous Match Attempt
Forcing a regex match to start at the end of a previous match provides an efficient way to parse text data.
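Python's `re` module (the flavor assumed for this sketch) has no dedicated token for this, but a compiled pattern's `match(text, pos)` method achieves a similar effect: the match must begin exactly at `pos`, which can be set to the end of the previous match. The `tokenize` function and its tiny arithmetic grammar are hypothetical, made up for illustration.

```python
import re

# One token: optional leading whitespace, then a number or an operator.
TOKEN = re.compile(r"\s*(\d+|[+*()-])")

def tokenize(text):
    """Split text into tokens, each match starting where the last ended."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        # match() anchors the attempt at pos; it never skips ahead.
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

tokens = tokenize("12 + 3*(4-5)")
print(tokens)
```

Because each attempt is anchored, invalid input is detected at the exact position where it occurs rather than being silently skipped.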
Combining Positive and Negative Lookaround with Conditionals
A conditional is a special construct that will first evaluate a lookaround, and then execute one sub-regex if the lookaround succeeds, and another sub-regex if the lookaround fails.
XML Schema regular expressions support four shorthand character classes to match XML names. They also introduce a handy feature called "character class subtraction", which is now also available in the JGsoft and .NET regex engines.
If you are using a POSIX-compliant regular expression engine, you can use POSIX bracket expressions to match locale-dependent characters.
Some regex flavors allow you to add comments to make complex regular expressions easier to understand.
Splitting a regular expression into multiple lines, adding comments and whitespace, makes it even more readable.
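Both styles of commenting can be sketched in Python's `re` module (an assumed flavor), which accepts inline `(?#...)` comments and offers free-spacing through the `re.VERBOSE` option. The date pattern is only an example.

```python
import re

# Inline comment: the (?#...) group is ignored by the engine.
inline = re.search(r"\d{4}(?# four-digit year)", "in 2009").group()

# Free-spacing: whitespace in the pattern is ignored and "#" starts a
# comment that runs to the end of the line.
date = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
""", re.VERBOSE)

parts = date.search("updated 2009-06-17").groups()

print(inline, parts)
```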
Page URL: http://www.Regular-Expressions.info/tutorialcnt.html
Page last updated: 17 June 2009
Site last updated: 05 March 2010
Copyright © 2003-2010 Jan Goyvaerts. All rights reserved.