Delphi XE is the first release of Delphi that has built-in support for regular expressions. In most cases you’ll use the RegularExpressions unit. This unit defines a set of records that mimic the regular expression classes in the .NET framework. Just like in .NET, they allow you to use a regular expression in just one line of code without explicit memory management.
Internally the RegularExpressions unit uses the RegularExpressionsCore unit, which defines the TPerlRegEx class. TPerlRegEx, developed by the author of this website, is a wrapper around the open source PCRE library. Thus both the RegularExpressions and RegularExpressionsCore units use the PCRE regex flavor.
The RegularExpressions unit defines TRegEx, TMatch, TMatchCollection, TGroup, and TGroupCollection as records rather than as classes. That means you don’t need to call Create and Free to allocate and deallocate memory.
TRegEx does have a Create constructor that you can call if you want to use the same regular expression more than once. That way TRegEx doesn’t compile the same regex twice. If you call the constructor, you can then call any of the non-static methods that do not take the regular expression as a parameter. If you don’t call the constructor, you can only call the static (class) methods that take the regular expression as a parameter. All TRegEx methods have static and non-static overloads. Which ones you use solely depends on whether you want to make more than one call to TRegEx using the same regular expression.
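The difference between the two calling styles can be sketched as follows. This is a minimal example rather than a definitive one: the program name, pattern, and test strings are made up for illustration, and the uses clause assumes Delphi XE2 or later, where the unit is named System.RegularExpressions (in Delphi XE it is plain RegularExpressions).

```pascal
program RegexCreateDemo;
{$APPTYPE CONSOLE}

uses
  System.RegularExpressions; // "RegularExpressions" in Delphi XE

var
  RegEx: TRegEx;
begin
  // Static call: the pattern is passed along with the subject string.
  if TRegEx.IsMatch('Delphi XE6', 'XE\d+') then
    Writeln('static call matched');

  // Constructed once, then reused through the non-static overloads,
  // which do not take the pattern as a parameter.
  RegEx := TRegEx.Create('XE\d+');
  if RegEx.IsMatch('Delphi XE2') then
    Writeln('first non-static call matched');
  if not RegEx.IsMatch('Delphi 2010') then
    Writeln('second non-static call did not match');
  // No Free call: TRegEx is a record.
end.
```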
The IsMatch method takes a string and returns True or False indicating whether the regular expression matches (part of) the string.
The Match method takes a string and returns a TMatch record with the details of the first match. If the match fails, it returns a TMatch record with the Success property set to False. The non-static overload of Match() takes an optional starting position and an optional length parameter that you can use to search through only part of the input string.
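As a rough illustration of IsMatch and Match, the sketch below checks Success before touching any other property. The subject strings and the pattern are invented for the example, and the unit name assumes Delphi XE2 or later.

```pascal
program RegexMatchDemo;
{$APPTYPE CONSOLE}

uses
  System.RegularExpressions; // "RegularExpressions" in Delphi XE

var
  RegEx: TRegEx;
  M: TMatch;
begin
  // IsMatch is True when the regex matches any part of the string.
  Writeln(TRegEx.IsMatch('abc 123', '\d+')); // TRUE

  // Match returns the first match; always test Success first.
  M := TRegEx.Match('abc 123 def 456', '\d+');
  if M.Success then
    Writeln(M.Value); // 123

  // A failed match still returns a TMatch, with Success = False.
  M := TRegEx.Match('no digits here', '\d+');
  if not M.Success then
    Writeln('no match');

  // Non-static overload with a starting position in the subject string.
  RegEx := TRegEx.Create('\d+');
  M := RegEx.Match('abc 123 def 456', 8);
  if M.Success then
    Writeln(M.Value);
end.
```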
The Matches method takes a string and returns a TMatchCollection record. The default Item property of this record holds a TMatch for each match the regular expression found in the string. If there are no matches, the Count property of the returned TMatchCollection record is zero.
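Iterating over all matches might look like the sketch below. The input string and pattern are illustrative only, and the unit name assumes Delphi XE2 or later.

```pascal
program RegexMatchesDemo;
{$APPTYPE CONSOLE}

uses
  System.RegularExpressions; // "RegularExpressions" in Delphi XE

var
  Matches: TMatchCollection;
  I: Integer;
begin
  Matches := TRegEx.Matches('one 22 three 4444', '\d+');

  // Count is zero when nothing matched, so the loop body is skipped.
  Writeln('Found ', Matches.Count, ' matches');

  // Item is the default property, so Matches[I] is Matches.Item[I].
  for I := 0 to Matches.Count - 1 do
    Writeln(Matches[I].Value);
end.
```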
Use the Replace method to search-and-replace all matches in a string. You can pass the replacement text as a string using the JGsoft replacement text flavor. Or, you can pass a TMatchEvaluator which is nothing more than a method that takes one parameter called Match of type TMatch and returns a string. The string returned by your method is used as a literal replacement string. If you want backreferences in your string to be replaced when using the TMatchEvaluator overload, call the Result method on the provided Match parameter before returning the string.
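Both ways of calling Replace can be sketched as shown here. TUpperCaser and its Evaluate method are hypothetical names introduced for this example. Because TMatchEvaluator is a method pointer, the callback lives in a small helper class. The unit names assume Delphi XE2 or later.

```pascal
program RegexReplaceDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.RegularExpressions; // "RegularExpressions" in Delphi XE

type
  // Hypothetical helper class that hosts the evaluator method.
  TUpperCaser = class
    function Evaluate(const Match: TMatch): string;
  end;

function TUpperCaser.Evaluate(const Match: TMatch): string;
begin
  // The returned string is inserted literally. To expand backreferences
  // such as $1, return Match.Result('...') instead.
  Result := UpperCase(Match.Value);
end;

var
  Helper: TUpperCaser;
  Evaluator: TMatchEvaluator;
begin
  // Replacement text as a string, with backreferences to the groups.
  Writeln(TRegEx.Replace('John Smith', '(\w+) (\w+)', '$2, $1')); // Smith, John

  // Replacement computed for each match by a TMatchEvaluator.
  Helper := TUpperCaser.Create;
  try
    Evaluator := Helper.Evaluate;
    Writeln(TRegEx.Replace('one two', '\w+', Evaluator)); // ONE TWO
  finally
    Helper.Free;
  end;
end.
```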
Use the Split method to split a string along its regex matches. The result is returned as a dynamic array of strings. As in .NET, text matched by capturing groups in the regular expression is also included in the returned array. If you don’t like this, remove all named capturing groups from your regex and pass the roExplicitCapture option to disable numbered capturing groups. The non-static overload of Split() takes an optional Count parameter to indicate the maximum number of elements that the returned array may have. In other words, the string is split at most Count-1 times. Capturing group matches are not included in the count. So if your regex has capturing groups, the returned array may have more than Count elements. If you pass Count, you can pass a second optional parameter to indicate the position in the string at which to start splitting. The part of the string before the starting position is returned unsplit in the first element of the returned array.
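A small sketch of Split, using a delimiter pattern without capturing groups so that only the pieces between matches are returned. The input string, pattern, and Count value are chosen for illustration, and the unit name assumes Delphi XE2 or later.

```pascal
program RegexSplitDemo;
{$APPTYPE CONSOLE}

uses
  System.RegularExpressions; // "RegularExpressions" in Delphi XE

var
  RegEx: TRegEx;
  Parts: TArray<string>;
  S: string;
begin
  // Static overload: split at every comma or semicolon, trimming
  // the whitespace around each delimiter.
  Parts := TRegEx.Split('a, b;c ,d', '\s*[,;]\s*');
  for S in Parts do
    Writeln(S);

  // Non-static overload with Count, which caps the number of elements.
  RegEx := TRegEx.Create('\s*[,;]\s*');
  Parts := RegEx.Split('a, b;c ,d', 2);
  Writeln('Elements with Count = 2: ', Length(Parts));
end.
```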
The TMatch record provides several properties with details about the match. Success indicates if a match was found. If this is False, all other properties and methods are invalid. Value returns the matched string. Index and Length indicate the position in the input string and the length of the match. Groups returns a TGroupCollection record that stores a TGroup record in its default Item property for each capturing group. You can use a numeric index to Item for numbered capturing groups, and a string index for named capturing groups.
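The properties described above can be read as in the following sketch. The subject string and the group name "year" are invented for the example, and the unit name assumes Delphi XE2 or later.

```pascal
program RegexGroupsDemo;
{$APPTYPE CONSOLE}

uses
  System.RegularExpressions; // "RegularExpressions" in Delphi XE

var
  M: TMatch;
begin
  // One named group (year) followed by one numbered group (the month).
  M := TRegEx.Match('Updated 2021-08', '(?<year>\d{4})-(\d{2})');
  if M.Success then
  begin
    Writeln('Value:  ', M.Value);   // 2021-08
    Writeln('Index:  ', M.Index);   // position in the input string
    Writeln('Length: ', M.Length);  // 7

    // String index for the named group, numeric index for the numbered one.
    Writeln('Year:   ', M.Groups['year'].Value); // 2021
    Writeln('Month:  ', M.Groups[2].Value);      // 08
  end;
end.
```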
TMatch also provides two methods. NextMatch returns the next match of the regular expression after this one. If your TMatch is part of a TMatchCollection you should not use NextMatch to get the next match but use TMatchCollection.Item instead, in order to avoid repeating the search. TMatch.Result takes one parameter with the replacement text as a string using the JGsoft replacement text flavor. It returns the string that this match would have been replaced with if you had used this replacement text with TRegEx.Replace.
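When you only have a single TMatch and not a TMatchCollection, a NextMatch loop combined with Result might look like this sketch. The subject string, pattern, and replacement text are illustrative, and the unit name assumes Delphi XE2 or later.

```pascal
program RegexNextMatchDemo;
{$APPTYPE CONSOLE}

uses
  System.RegularExpressions; // "RegularExpressions" in Delphi XE

var
  M: TMatch;
begin
  M := TRegEx.Match('x=1, y=2', '(\w)=(\d)');
  while M.Success do
  begin
    // Result expands the replacement text for this match only,
    // without modifying the subject string.
    Writeln(M.Result('$1 is $2'));
    M := M.NextMatch;
  end;
end.
```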
The TGroup record has Success, Value, Index and Length properties that work just like those of the TMatch.
In Delphi XE5 and prior TRegEx always skips zero-length matches. This was fixed in Delphi XE6. You can make the same fix in XE5 and prior by modifying RegularExpressionsCore.pas to remove the line State := [preNotEmpty] from TPerlRegEx.Create. This change will also affect code that uses TPerlRegEx directly without setting the State property.
TPerlRegEx was available long before Embarcadero licensed a copy for inclusion with Delphi XE. Depending on your needs, you can download one of two versions for use with Delphi 2010 and earlier.
The latest release of TPerlRegEx is fully compatible with the RegularExpressionsCore unit in Delphi XE. For new code written in Delphi 2010 or earlier, using the latest release of TPerlRegEx is strongly recommended. If you later migrate your code to Delphi XE, all you have to do is replace PerlRegEx with RegularExpressionsCore in the uses clause of your units.
The older versions of TPerlRegEx are non-visual components. This means you can put TPerlRegEx on the component palette and drop it on a form. The original TPerlRegEx was developed when Borland’s goal was to have a component for everything on the component palette.
If you want to migrate from an older version of TPerlRegEx to the latest TPerlRegEx, start with removing any TPerlRegEx components you may have placed on forms or data modules and instantiate the objects at runtime instead. When instantiating at runtime, you no longer need to pass an owner component to the Create() constructor. Simply remove the parameter.
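Instantiating TPerlRegEx at runtime could look like the sketch below. It assumes Delphi XE2 or later, where the unit is named System.RegularExpressionsCore. With the downloadable TPerlRegEx for Delphi 2010 and earlier, the uses clause would name PerlRegEx instead. The pattern and subject string are made up for the example.

```pascal
program PerlRegExDemo;
{$APPTYPE CONSOLE}

uses
  System.RegularExpressionsCore; // "RegularExpressionsCore" in Delphi XE

var
  RegEx: TPerlRegEx;
begin
  // TPerlRegEx is a class, so it needs Create and Free.
  // The constructor takes no owner parameter.
  RegEx := TPerlRegEx.Create;
  try
    RegEx.RegEx := '(\w+)@(\w+)';
    RegEx.Subject := 'joe@example and ann@sample';
    if RegEx.Match then
      repeat
        // MatchedText holds the overall match, Groups[1] the first group.
        Writeln(RegEx.MatchedText, ' -> ', RegEx.Groups[1]);
      until not RegEx.MatchAgain;
  finally
    RegEx.Free;
  end;
end.
```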
Some of the property and method names in the original TPerlRegEx were a bit unwieldy. These have been renamed in the latest TPerlRegEx. Essentially, in all identifiers SubExpression was replaced with Group and MatchedExpression was replaced with Matched. Here is a complete list of the changed identifiers:
|Old Identifier||New Identifier|
If you’re using RegexBuddy or RegexMagic to generate Delphi code snippets, set the language to “Delphi (TPerlRegEx)” to use the old identifiers, or to “Delphi XE (Core)” to use the new identifiers, regardless of which (older) version of Delphi you’re actually using.
One thing you need to watch out for is that the TPerlRegEx versions you can download here as well as those included with Delphi XE, XE2, and XE3 use UTF8String properties and all the Offset and Length properties are indexes to those UTF-8 strings. This is because at that time PCRE only supported UTF-8 and using UTF8String avoids repeated conversions. If performance is critical, you should use TPerlRegEx instead of TRegEx with these versions of Delphi. If your data is already UTF-8, you can pass the UTF-8 directly to TPerlRegEx. If your data uses another encoding, you can control when the conversion to UTF-8 happens to avoid repeated conversions of the same data.
In Delphi XE4 and XE5 TPerlRegEx has UnicodeString (UTF-16) properties but still returns UTF-8 offsets and lengths. In Delphi XE6 the Offset and Length properties were changed to UTF-16 offsets and lengths. This means that code that works with XE3 or XE6 that uses the Offset and Length properties will not work with XE4 and XE5 if your strings contain non-ASCII characters. Delphi XE4 through Delphi 10.2 continued to use the UTF-8 version of PCRE even though PCRE already had native UTF-16 support. Combined with the use of UnicodeString, this means constant conversions between UTF-16 and UTF-8, which can significantly degrade regex performance, particularly with long subject strings.
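To see why UTF-8 and UTF-16 offsets diverge once a string contains non-ASCII characters, consider this small sketch. It does not use the regex units at all. It only compares the length of the same text in both encodings, using a sample string invented for the example.

```pascal
program Utf8VersusUtf16;
{$APPTYPE CONSOLE}

var
  S: string;      // UnicodeString (UTF-16) in Delphi 2009 and later
  U: UTF8String;
begin
  // "café au lait", with the é written as a character code so the
  // example does not depend on the encoding of the source file.
  S := 'caf'#$00E9' au lait';
  U := UTF8String(S); // the cast converts UTF-16 to UTF-8

  Writeln(Length(S)); // 12 UTF-16 code units
  Writeln(Length(U)); // 13 bytes, because é takes two bytes in UTF-8

  // Every offset after the é is therefore one higher in UTF-8
  // than it is in UTF-16.
end.
```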
Delphi 10.3 and later use the UTF-16 version of PCRE on the Windows platform. TRegEx and TPerlRegEx now use UnicodeString for everything, without any conversion to UTF-8. Upgrading from Delphi XE4 or later to 10.3 or later will definitely improve the performance of any code that uses TRegEx or TPerlRegEx. Upgrading from Delphi XE3 or prior will improve performance unless you were doing everything with UTF-8.
Delphi Prism was Embarcadero’s variant of the Delphi language specifically developed to target the .NET framework. Delphi Prism lived inside the Visual Studio IDE. It was based entirely on the .NET framework. In Delphi Prism you could simply add the System.Text.RegularExpressions namespace to the uses clause of your units. Then you could access the .NET regex classes such as Regex, Match, and Group. You could use them with Delphi Prism just as they can be used by C# and VB developers.
Delphi 8, 2005, 2006, and 2007 included a Delphi for .NET compiler for developing WinForms and VCL.NET applications. Though Delphi for .NET only supported .NET 1.1 or 2.0, depending on your Delphi version, you could still use .NET’s full regular expression support. You only needed to add the System.Text.RegularExpressions namespace to the uses clause of your units to be able to access all the .NET regex classes.
Page URL: https://www.regular-expressions.info/delphi.html
Page last updated: 24 August 2021
Site last updated: 02 December 2022
Copyright © 2003-2022 Jan Goyvaerts. All rights reserved.