This regular expression tutorial teaches you every aspect of regular expressions. Each topic assumes you have read and understood all previous topics. So if you are new to regular expressions, I recommend you read the topics in the order presented.
The introduction indicates the scope of the tutorial and which regex flavors will be discussed. It also introduces basic terminology.
The simplest regex consists of only literal characters. Certain characters have special meanings in a regex and have to be escaped. Escaping rules may get a bit complicated when using regexes in software source code. This topic also explains how to enter non-printable characters.
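As a rough illustration, here is a minimal sketch using Python's `re` module. Python is an assumption made for this example; the tutorial itself covers many flavors. Raw strings (`r"..."`) keep the source-code escaping rules out of the way, so a single backslash in the source is a single backslash in the regex.

```python
import re

# A regex made only of literal characters matches that exact text.
literal = re.search(r"cat", "concatenate")

# "+" is a metacharacter, so a backslash is needed to match it literally.
escaped = re.search(r"1\+1=2", "Proof: 1+1=2")

# re.escape() builds a pattern that matches an arbitrary string literally.
safe_pattern = re.escape("1+1=2")

# "\t" inside the pattern matches a tab, a non-printable character.
tab = re.search(r"a\tb", "a\tb")

print(literal.group(), escaped.group(), tab is not None)
```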
A first look at the internals of the regular expression engine. Later topics will build on this information. Knowing how the engine works will greatly help you to craft regexes that match what you intended, and do not match what you do not want.
A character class or character set matches a single character out of several possible characters, consisting of individual characters and/or ranges of characters. A negated character class matches a single character not in the character class. Shorthand character classes allow you to use common sets quickly.
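A short sketch of these ideas, assuming Python's `re` module (the sample strings are made up for illustration):

```python
import re

# [ae] matches exactly one character: either "a" or "e".
spellings = re.findall(r"gr[ae]y", "gray grey griy")

# Ranges and individual characters can be combined in one class.
hex_digits = re.findall(r"[0-9a-fA-F]", "x1Fz")

# A caret right after "[" negates the class.
non_digits = re.findall(r"[^0-9]", "a1b2")

# \d is a shorthand character class for a digit.
digits = re.findall(r"\d", "a1b2")

print(spellings, hex_digits, non_digits, digits)
```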
The dot matches any character, though usually not line break characters unless you change an option.
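For instance, in Python's `re` module (assumed here as the example flavor) the option in question is `re.DOTALL`:

```python
import re

text = "abc a\nc"

# By default the dot does not match the line break between "a" and "c".
default_mode = re.findall(r"a.c", text)

# With re.DOTALL the dot matches line breaks too.
dotall_mode = re.findall(r"a.c", text, re.DOTALL)

print(default_mode, dotall_mode)
```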
Anchors are zero-width. They do not match any characters, but rather a position. The caret and the dollar sign match at the start and the end of the string. Depending on your regex flavor and its options, they can match at the start and the end of a line as well.
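A minimal sketch of both behaviors, assuming Python's `re` module, where the option for line-based matching is `re.MULTILINE`:

```python
import re

text = "first line\nsecond line"

# Without options, ^ only matches at the very start of the string.
string_start = re.findall(r"^\w+", text)

# With re.MULTILINE, ^ also matches right after each line break.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# Likewise, $ matches at the end of the string, or at the end of
# every line when re.MULTILINE is set.
string_end = re.findall(r"\w+$", text)
line_ends = re.findall(r"\w+$", text, re.MULTILINE)

# An anchor on its own matches a position, so the match is empty.
empty = re.search(r"^", "abc")

print(string_start, line_starts, string_end, line_ends, empty.span())
```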
Word boundaries are like anchors, but match at the start of a word and the end of a word. However, most regex flavors define the concept of a "word" differently than your English teacher in grade school.
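The following sketch assumes Python's `re` module, in which letters, digits, and the underscore all count as word characters. That is why `cat_1` below is treated as one "word":

```python
import re

text = "cat concat cats cat_1"

# \b matches between a word character and a non-word character
# (or the start/end of the string), so only the standalone "cat" is hit.
replaced = re.sub(r"\bcat\b", "dog", text)

print(replaced)
```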
By separating different sub-regexes with vertical bars, you can tell the regex engine to attempt them from left to right, and return success as soon as one of them can be matched.
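The left-to-right behavior can be seen in a small sketch, assuming Python's `re` module (a backtracking engine). The alternatives and the test string are illustrative only:

```python
import re

# "Set" is tried before "SetValue" and succeeds, so the engine stops there.
eager = re.search(r"Get|GetValue|Set|SetValue", "SetValue")

# Word boundaries force the engine to keep trying alternatives until
# the whole word is matched.
whole_word = re.search(r"\b(?:Get|GetValue|Set|SetValue)\b", "SetValue")

print(eager.group(), whole_word.group())
```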
Putting a question mark after an item tells the regex engine to match the item if possible, but continue anyway (rather than admit defeat) if it cannot be matched.
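A brief sketch, assuming Python's `re` module; the question mark applies to the single preceding item, which may be one character or a group:

```python
import re

# "u" is optional: both spellings match, but "colr" does not.
colors = re.findall(r"colou?r", "color colour colr")

# A group can be made optional as a whole.
months = re.findall(r"Nov(?:ember)?", "Nov November")

print(colors, months)
```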
Three styles of operators, the star, the plus, and curly braces, allow you to repeat an item zero or more times, once or more, or an arbitrary number of times. It is important to understand that these quantifiers are "greedy" by default, unless you explicitly make them "lazy".
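The difference between greedy and lazy repetition is easiest to see on a small example. This sketch assumes Python's `re` module, and the sample text is made up:

```python
import re

html = "This is a <em>first</em> test"

# Greedy: .+ grabs as much as it can, then backs off to the LAST ">".
greedy = re.search(r"<.+>", html)

# Lazy: .+? takes as little as it can, stopping at the FIRST ">".
lazy = re.search(r"<.+?>", html)

# Curly braces repeat an item between a minimum and maximum number of times.
two_to_four_digits = re.findall(r"\b\d{2,4}\b", "1 22 333 4444 55555")

print(greedy.group(), lazy.group(), two_to_four_digits)
```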
By placing round brackets around part of the regex, you tell the engine to treat that part as a single item when applying operators such as quantifiers. With round brackets, you can also create backreferences that allow you to reuse the text matched by part of the regex inside the regular expression, or later in the replacement text of a search and replace operation. Backreferences are also very useful for extracting parts from a string in a programming language.
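The three uses of round brackets described above can be sketched as follows, assuming Python's `re` module (sample strings are illustrative):

```python
import re

# Grouping: the quantifier applies to "ab" as a single item.
repeated = re.fullmatch(r"(?:ab)+", "ababab")

# Backreference inside the regex: \1 must match the same text as group 1,
# which finds a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")

# Backreference in the replacement text: keep only one copy of the word.
fixed = re.sub(r"\b(\w+) \1\b", r"\1", "this is is a test")

# Extraction: each capturing group holds one part of the match.
date = re.search(r"(\d{4})-(\d{2})-(\d{2})", "updated 2009-06-17")

print(doubled.group(), fixed, date.groups())
```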
If your regular expression flavor supports Unicode, then you can use special Unicode regex tokens to match specific Unicode characters, or to match any character that has a certain Unicode property or is part of a particular Unicode script or block.
Change matching modes such as "case insensitive" for specific parts of the regular expression.
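As one possible illustration, Python's `re` module (version 3.6 or later, assumed here) accepts a scoped modifier of the form `(?i:...)` that applies only to the enclosed part:

```python
import re

# Only "te" is case insensitive; "st" must still be lowercase.
partial_ok = re.fullmatch(r"(?i:te)st", "TEst")
partial_fail = re.fullmatch(r"(?i:te)st", "TEST")

# For comparison, a flag passed to the function affects the whole regex.
whole = re.fullmatch(r"test", "TEST", re.IGNORECASE)

print(partial_ok is not None, partial_fail is None, whole is not None)
```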
Nested quantifiers can cause an exponentially increasing amount of backtracking that brings the regex engine to a grinding halt. Atomic grouping and possessive quantifiers provide a solution.
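The sketch below illustrates what "atomic" means rather than measuring performance. It assumes Python's `re` module, which gained native atomic groups `(?>...)` and possessive quantifiers only in version 3.11, so the example instead uses a common emulation: a lookahead with a capturing group followed by a backreference, `(?=(...))\1`. This is a swapped-in technique that relies on the engine not backtracking into a lookahead once it has succeeded.

```python
import re

# Ordinary group: after "bc" is matched and the final "c" fails, the
# engine backtracks into the group, tries "b", and finds a match.
plain = re.search(r"a(?:bc|b)c", "abc")

# Atomic-like group: the lookahead settles on "bc" and the engine does
# not go back to try "b", so the overall match fails.
atomic_like = re.search(r"a(?=(bc|b))\1c", "abc")

# When the first choice does lead to a match, both forms succeed.
atomic_ok = re.search(r"a(?=(bc|b))\1c", "abcc")

print(plain.group(), atomic_like, atomic_ok.group())
```

Because the engine discards the backtracking positions inside the group, it can give up quickly instead of trying every combination.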
Lookahead and lookbehind (collectively lookaround) are zero-width. With positive lookaround, you can specify multiple requirements (sub-regexes) to be applied to the same part of the string. With negative lookaround, you can invert the result of a regex match (i.e. match something that does not match something else).
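A small sketch of each use, assuming Python's `re` module. The "password" rule is a hypothetical requirement invented for the example:

```python
import re

# Negative lookahead: a "q" that is NOT followed by a "u".
q_without_u = re.findall(r"q(?!u)", "Iraq quit")

# Positive lookbehind: digits that are preceded by a dollar sign.
# The dollar sign itself is not part of the match.
prices = re.findall(r"(?<=\$)\d+", "cost $30, 40 items")

# Several positive lookaheads test the same text against multiple
# requirements: at least one digit AND at least one lowercase letter,
# with six or more word characters in total.
rule = re.compile(r"(?=\w*\d)(?=\w*[a-z])\w{6,}")

print(q_without_u, prices, bool(rule.fullmatch("abc123")))
```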
Forcing a regex match to start at the end of a previous match provides an efficient way to parse text data.
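Python's `re` module has no token for "end of the previous match", so this sketch gets the same effect another way: a compiled pattern's `match()` method takes a start position and only succeeds if the match begins exactly there. The `tokenize` function and its tiny arithmetic grammar are hypothetical, chosen only to show the idea.

```python
import re

# One token: a number or a single operator/parenthesis, with optional
# whitespace on either side.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])\s*")

def tokenize(text):
    """Split text into tokens, each match starting where the last one ended."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)  # must match exactly at pos
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

print(tokenize("12 + 3*(4)"))
```

Because every attempt is pinned to the current position, the engine never scans ahead past unrecognized input, which is what makes this approach efficient for parsing.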
A conditional is a special construct that will first evaluate a lookaround, and then execute one sub-regex if the lookaround succeeds, and another sub-regex if the lookaround fails.
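Support for conditionals varies by flavor. As far as Python's `re` module goes (assumed here), the condition cannot be a lookaround; it can only test whether a capturing group took part in the match, using `(?(1)then|else)`. The sketch below therefore shows that related, group-based form rather than the lookaround form described above:

```python
import re

# Group 1 optionally matches "(". The conditional then requires ")"
# only if group 1 actually matched.
number = re.compile(r"(\()?\d+(?(1)\))")

results = {s: number.fullmatch(s) is not None
           for s in ["42", "(42)", "(42", "42)"]}

print(results)
```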
XML Schema regular expressions support four shorthand character classes to match XML names. They also introduce a handy feature called "character class subtraction", which is now also available in the JGsoft and .NET regex engines.
If you are using a POSIX-compliant regular expression engine, you can use POSIX bracket expressions to match locale-dependent characters.
Some regex flavors allow you to add comments to make complex regular expressions easier to understand.
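In Python's `re` module (assumed here as the example flavor), an inline comment is written as `(?#...)` and is ignored by the engine:

```python
import re

# The (?#...) parts document the regex without changing what it matches.
pattern = r"\d{4}(?#year)-\d{2}(?#month)"

commented = re.fullmatch(pattern, "2009-06")

print(commented.group())
```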
Splitting a regular expression into multiple lines, adding comments and whitespace, makes it even more readable.
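A minimal sketch, assuming Python's `re` module, where the free-spacing option is called `re.VERBOSE`. In this mode, whitespace in the pattern is ignored and `#` starts a comment that runs to the end of the line:

```python
import re

date = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
""", re.VERBOSE)

parts = date.fullmatch("2009-06-17").groups()

print(parts)
```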
Page URL: http://www.Regular-Expressions.info/tutorialcnt.html
Page last updated: 17 June 2009
Site last updated: 18 April 2013
Copyright © 2003-2013 Jan Goyvaerts. All rights reserved.