
| Collect your own regular expression library with RegexBuddy. RegexBuddy's regular expression library includes all the examples on this website, plus many more. Easily edit any of the regexes or create your own. Build your own personal regular expression library. It'll often come in handy and save you time when searching through files on your computer, writing applications or scripts, or processing text or data. Get your own copy of RegexBuddy now. |
Often, you want to match complete lines in a text file rather than just the part of the line that satisfies a certain requirement. This is useful if you want to delete entire lines in a search-and-replace in a text editor, or collect entire lines in an information retrieval tool.
To keep this example simple, let's say we want to match lines containing the word "John". The regex John makes it easy enough to locate those lines. But the software will only indicate John as the match, not the entire line containing the word.
The solution is fairly simple. To specify that we need an entire line, we will use the caret and dollar sign and turn on the option to make them match at embedded newlines. In software aimed at working with text files like EditPad Pro and PowerGREP, the anchors always match at embedded newlines. To match the parts of the line before and after the match of our original regular expression John, we simply use the dot and the star. Be sure to turn off the option for the dot to match newlines.
The resulting regex is: ^.*John.*$. You can use the same method to expand the match of any regular expression to an entire line, or a block of complete lines. In some cases, such as when using alternation, you will need to group the original regex together using round brackets.
If a line can meet any out of series of requirements, simply use alternation in the regular expression. ^.*\b(one|two|three)\b.*$
matches a complete line of text that contains any of the words "one", "two" or "three". The first backreference will contain the word the line actually contains. If it contains more than one of the words, then the last (rightmost) word will be captured into the first backreference. This is because the star is greedy. If we make the first star lazy, like in ^.*?\b(one|two|three)\b.*$, then the backreference will contain the first (leftmost) word.
If a line must satisfy all of multiple requirements, we need to use lookahead. ^(?=.*?\bone\b)(?=.*?\btwo\b)(?=.*?\bthree\b).*$
matches a complete line of text that contains all of the words "one", "two" and "three". Again, the anchors must match at the start and end of a line and the dot must not match line breaks. Because of the caret, and the fact that lookahead is zero-width, all of the three lookaheads are attempted at the start of the each line. Each lookahead will match any piece of text on a single line (.*?) followed by one of the words. All three must match successfully for the entire regex to match. Note that instead of words like \bword\b, you can put any regular expression, no matter how complex, inside the lookahead. Finally, .*$ causes the regex to actually match the line, after the lookaheads have determined it meets the requirements.
If your condition is that a line should not contain something, use negative lookahead. ^((?!regexp).)*$
matches a complete line that does not match regexp. Notice that unlike before, when using positive lookahead, I repeated both the negative lookahead and the dot together. For the positive lookahead, we only need to find one location where it can match. But the negative lookahead must be tested at each and every character position in the line. We must test that regexp fails everywhere, not just somewhere.
Finally, you can combine multiple positive and negative requirements as follows: ^(?=.*?\bmust-have\b)(?=.*?\bmandatory\b)((?!avoid|illegal).)*$
. When checking multiple positive requirements, the .* at the end of the regular expression full of zero-width assertions made sure that we actually matched something. Since the negative requirement must match the entire line, it is easy to replace the .* with the negative test.
Did this website just save you a trip to the bookstore? Please make a donation to support this site, and you'll get a lifetime of advertisement-free access to this site!
Page URL: http://www.Regular-Expressions.info/completelines.html
Page last updated: 17 June 2009
Site last updated: 01 July 2009
Copyright © 2003-2009 Jan Goyvaerts. All rights reserved.
| Examples |
| Examples |
| Numeric Ranges |
| Floating Point Numbers |
| Email Addresses |
| Valid Dates |
| Credit Card Numbers |
| Matching Complete Lines |
| Deleting Duplicate Lines |
| Programming |
| Two Near Words |
| Pitfalls |
| Catastrophic Backtracking |
| Making Everything Optional |
| Repeated Capturing Group |
| Mixing Unicode & 8-bit |
| More Information |
| Introduction |
| Quick Start |
| Tutorial |
| Tools and Languages |
| Examples |
| Books |
| Reference |
| Print PDF |
| About This Site |
| RSS Feed & Blog |
| PowerGREP 3 |
| Use regular expressions to search through large numbers of text and binary files, such as source code, correspondence, server or system logs, reference texts, archives, etc. Quickly find the files you are looking for, or extract the information you need. Look through just a handful of files, or thousands of files and folders. |
| Perform comprehensive text and binary replacement operations for easy maintenance of websites, source code, reports, etc. Preview replacements before modifying files, and stay safe with flexible backup and undo options. |
| Work with plain text files, Unicode files, binary files, files stored in zip archives, and even MS Word documents, Excel spreadsheets and PDF files. Runs on Windows 98, ME, NT4, 2000, XP & Vista. |
| More information |
| Download PowerGREP now |