The R Project for Statistical Computing provides five regular expression functions in its base package. All these functions support three regular expression flavors. You have two parameters called extended and perl at your disposal to indicate the flavor you want.
If you omit these parameters, extended is TRUE, and perl is FALSE. Then the default flavor, GNU Extended Regular Expressions, is used. R's documentation says it implements the POSIX standard for regular expressions, but actually it uses the GNU regex library, which is an extension of POSIX. If you set both parameters to FALSE, the GNU Basic Regular Expressions are used. Despite their names, GNU ERE and GNU BRE actually implement the same limited set of features. Only the syntax is slightly different.
For maximum regex functionality, set the perl parameter to TRUE. The extended parameter is then ignored. This tells R to use the PCRE regular expressions library.
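As a minimal sketch of selecting the PCRE flavor, the call below passes perl=TRUE so that a lookahead can be used. The sample vector and the variable names are illustrative assumptions and do not come from the text above. Arguments are passed by name rather than by position.

```r
# Illustrative input; any character vector works.
words <- c("abc", "acb", "cab")

# perl = TRUE selects the PCRE library, which supports lookahead.
# "a(?=b)" matches an "a" only when it is immediately followed by "b".
idx <- grep("a(?=b)", words, perl = TRUE)
print(idx)
```

Only the first and third strings contain an "a" directly followed by "b", so those are the indices returned.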
The grep function takes your regex as the first argument, and the input vector as the second argument. Use the 3rd argument, ignore.case, to make the regex case insensitive (TRUE) or case sensitive (FALSE). Arguments 4 and 5 are the extended and perl arguments to select the regex flavor. The 6th argument is the value parameter. If you set it to FALSE or omit it, grep returns a new vector with the indices of the elements in the input vector that could be (partially) matched by the regular expression. If you set value to TRUE, then grep returns a vector with copies of the actual elements in the input vector that could be (partially) matched.
> grep("a+", c("abc", "def", "cba a", "aa"), value=FALSE)
[1] 1 3 4
> grep("a+", c("abc", "def", "cba a", "aa"), value=TRUE)
[1] "abc"   "cba a" "aa"
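The case sensitivity switch can be sketched in the same way. This example assumes the argument names ignore.case and value and passes them by name, so it does not depend on argument positions. The mixed-case input vector is made up for illustration.

```r
# Illustrative input with mixed case.
x <- c("Abc", "def", "cbA a", "AA")

# Case sensitive (the default): only lowercase "a" can match.
hits_cs <- grep("a+", x)

# Case insensitive: "A" and "a" both match.
hits_ci <- grep("a+", x, ignore.case = TRUE)

# value = TRUE returns the matching elements instead of their indices.
vals_ci <- grep("a+", x, ignore.case = TRUE, value = TRUE)

print(hits_cs)
print(hits_ci)
print(vals_ci)
```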
The regexpr function takes the same arguments as the grep function, except for the value argument, which is not supported. regexpr returns an integer vector with the same length as the input vector. Each element in the returned vector indicates the character position in each corresponding string element in the input vector at which the (first) regex match was found. A match at the start of the string is indicated with character position 1. If the regex could not find a match in a certain string, its corresponding element in the result vector is -1. The returned vector also has a match.length attribute. This is another integer vector with the number of characters in the (first) regex match in each string, or -1 for strings that didn't match.
gregexpr is the same as regexpr, except that it finds all matches in each string. It returns a list with the same length as the input vector. Each element of the list is an integer vector with one element for each match found in the corresponding string, giving the character position at which that match starts. Each of these vectors also has a match.length attribute with the lengths of all matches. If no match is found in a particular string, its element in the list is still a vector, but with the single value -1.
> regexpr("a+", c("abc", "def", "cba a", "aa"))
[1]  1 -1  3  1
attr(,"match.length")
[1]  1 -1  1  2
> gregexpr("a+", c("abc", "def", "cba a", "aa"))
[[1]]
[1] 1
attr(,"match.length")
[1] 1

[[2]]
[1] -1
attr(,"match.length")
[1] -1

[[3]]
[1] 3 5
attr(,"match.length")
[1] 1 1

[[4]]
[1] 1
attr(,"match.length")
[1] 2
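The positions and lengths returned by regexpr and gregexpr can be combined with substring to recover the matched text. The following is one possible sketch, not the only way to do it. The variable names (start, len, first_match, n_matches) are illustrative assumptions.

```r
x <- c("abc", "def", "cba a", "aa")

# regexpr: position and length of the first match in each string.
m <- regexpr("a+", x)
start <- as.integer(m)            # drop attributes, keep the positions
len <- attr(m, "match.length")    # -1 where there was no match

# Extract the first match, or NA for strings without a match.
first_match <- ifelse(start > 0,
                      substring(x, start, start + len - 1),
                      NA_character_)
print(first_match)

# gregexpr: count the matches per string; a lone -1 means "no match".
g <- gregexpr("a+", x)
n_matches <- vapply(g,
                    function(pos) if (pos[1] == -1) 0L else length(pos),
                    integer(1))
print(n_matches)
```

Checking for -1 before using the positions matters, because a position of -1 is not a valid start index for substring.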
The sub function has three required parameters: a string with the regular expression, a string with the replacement text, and the input vector. Use the 4th argument, ignore.case, to make the regex case insensitive (TRUE) or case sensitive (FALSE). Arguments 5 and 6 are the extended and perl arguments to select the regex flavor.
sub returns a new vector with the same length as the input vector. If a regex match could be found in a string element, it is replaced with the replacement text. Only the first match in each string element is replaced. If no matches could be found in some strings, those are copied into the result vector unchanged.
Use gsub instead of sub to replace all regex matches in all the string elements in your vector. Other than replacing all matches, gsub works in exactly the same way, and takes exactly the same arguments.
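You can use the backreferences \1 through \9 in the replacement text to reinsert text matched by a capturing group. Because the backslash is also the escape character in R string literals, write each backreference with a doubled backslash in your code, such as "\\1". There is no replacement text token for the overall match. To reinsert the overall match, place the entire regex in a capturing group and then use \1.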
> sub("(a+)", "z\\1z", c("abc", "def", "cba a", "aa"))
[1] "zazbc"   "def"     "cbzaz a" "zaaz"
> gsub("(a+)", "z\\1z", c("abc", "def", "cba a", "aa"))
[1] "zazbc"     "def"       "cbzaz zaz" "zaaz"
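The whole-match technique can be sketched as follows: the entire regex is wrapped in a single capturing group, so \\1 in the replacement stands for the overall match. The pattern, the angle-bracket markers, and the variable names are illustrative assumptions. The last call passes ignore.case by name.

```r
x <- c("abc", "def", "cba a", "aa")

# The whole regex is one capturing group, so \\1 is the overall match.
first_only <- sub("([a-c]+)", "<\\1>", x)    # first match per string
all_hits   <- gsub("([a-c]+)", "<\\1>", x)   # every match per string

# A backreference may be used more than once: double each run of "a".
doubled <- gsub("(A+)", "\\1\\1", x, ignore.case = TRUE)

print(first_only)
print(all_hits)
print(doubled)
```

Strings without a match, such as "def", come back unchanged from all three calls.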
Page URL: http://www.Regular-Expressions.info/rlanguage.html
Page last updated: 22 September 2010
Site last updated: 18 April 2013
Copyright © 2003-2013 Jan Goyvaerts. All rights reserved.