|Easily use the power of regular expressions in your C/C++ applications with RegexBuddy. |
Create and analyze regex patterns with RegexBuddy's intuitive regex building blocks. Implement regexes in your applications with instant C and C++ code snippets. Just tell RegexBuddy what you want to achieve, and copy and paste the auto-generated PCRE-based C code. Get your own copy of RegexBuddy now.
PCRE is short for Perl Compatible Regular Expressions. It is the name of an open source library written in C by Phillip Hazel. The library is compatible with a great number of C compilers and operating systems. Many people have derived libraries from PCRE to make it compatible with other programming languages. The regex features included with PHP, Delphi, and R, and Xojo (REALbasic) are all based on PCRE. The library is also included with many Linux distributions as a shared .so library and a .h header file.
Though PCRE claims to be Perl-compatible, there are more than enough differences between contemporary versions of Perl and PCRE to consider them distinct regex flavors. Recent versions of Perl have even copied features from PCRE that PCRE had copied from other programming languages before Perl had them, in an attempt to make Perl more PCRE-compatible. Today PCRE is used more widely than Perl because PCRE is part of so many libraries and applications.
Using PCRE is very straightforward. Before you can use a regular expression, it needs to be converted into a binary format for improved efficiency. To do this, simply call pcre_compile() passing your regular expression as a null-terminated string. The function returns a pointer to the binary format. You cannot do anything with the result except pass it to the other pcre functions.
To use the regular expression, call pcre_exec() passing the pointer returned by pcre_compile(), the character array you want to search through, and the number of characters in the array (which need not be null-terminated). You also need to pass a pointer to an array of integers where pcre_exec() stores the results, as well as the length of the array expressed in integers. The length of the array should equal the number of capturing groups you want to support, plus one (for the entire regex match), multiplied by three (!). The function returns -1 if no match could be found. Otherwise, it returns the number of capturing groups filled plus one. If there are more groups than fit into the array, it returns 0. The first two integers in the array with results contain the start of the regex match (counting bytes from the start of the array) and the number of bytes in the regex match, respectively. The following pairs of integers contain the start and length of the backreferences. So array[n*2] is the start of capturing group n, and array[n*2+1] is the length of capturing group n, with capturing group 0 being the entire regex match.
When you are done with a regular expression, all pcre_dispose() with the pointer returned by pcre_compile() to prevent memory leaks.
The PCRE library only supports regex matching, a job it does rather well. It provides no support for search-and-replace, splitting of strings, etc. This may not seem as a major issue because you can easily do these things in your own code. The unfortunate consequence, however, is that all the programming languages and libraries that use PCRE for regex matching have their own replacement text syntax and their own idiosyncrasies when splitting strings.
You can find more information about PCRE on http://www.pcre.org/.
By default, PCRE compiles without Unicode support. If you try to use \p, \P or \X in your regular expressions, PCRE will complain it was compiled without Unicode support.
To compile PCRE with Unicode support, you need to define the SUPPORT_UTF8 and SUPPORT_UCP conditional defines. If PCRE's configuration script works on your system, you can easily do this by running ./configure --enable-unicode-properties before running make. The regular expressions tutorial on this website assumes that you've compiled PCRE with these options and that all other options are set to their defaults.
A feature unique to PCRE is the "callout". If you put (?C1) through (?C255) anywhere in your regex, PCRE calls the pcre_callout function when it reaches the callout during the match attempt.
Did this website just save you a trip to the bookstore? Please make a donation to support this site, and you'll get a lifetime of advertisement-free access to this site!
Page URL: http://www.regular-expressions.info/pcre.html
Page last updated: 16 September 2013
Site last updated: 22 October 2014
Copyright © 2003-2014 Jan Goyvaerts. All rights reserved.
|Languages & Libraries|
|Visual Basic 6|
|XQuery & XPath|
|Regular Expressions Quick Start|
|Regular Expressions Tutorial|
|Replacement Strings Tutorial|
|Applications and Languages|
|Regular Expressions Examples|
|Regular Expressions Reference|
|Replacement Strings Reference|
|About This Site|
|RSS Feed & Blog|