Quick Start
Tutorial
Tools & Languages
Examples
Reference
Book Reviews
Examples
Regular Expressions Examples
Numeric Ranges
Floating Point Numbers
Email Addresses
IP Addresses
Valid Dates
Numeric Dates to Text
Credit Card Numbers
Matching Complete Lines
Deleting Duplicate Lines
Programming
Two Near Words
Pitfalls
Catastrophic Backtracking
Too Many Repetitions
Denial of Service
Making Everything Optional
Repeated Capturing Group
Mixing Unicode & 8-bit
More on This Site
Introduction
Regular Expressions Quick Start
Regular Expressions Tutorial
Replacement Strings Tutorial
Applications and Languages
Regular Expressions Examples
Regular Expressions Reference
Replacement Strings Reference
Book Reviews
Printable PDF
About This Site
RSS Feed & Blog
RegexBuddy—The most comprehensive regular expression library!

Replacing Numerical Dates with Textual Dates

This example shows how you can replace numerical dates from 1/1/50 or 01/01/50 through 12/31/49 with their textual equivalents from January 1st, 1950 through December 31st, 2049. This is only possible with a single regular expression if you can vary the replacement based on what was matched. One way to do this is to build each replacement in procedural code. This example shows how you can do it using replacement string conditionals. This example works can be used with PowerGREP 5, the Boost C++ library, and the PCRE2 C library.

To be able to use replacement string conditionals, the regular expression needs a separate capturing group for each part of the match that needs a different replacement. Each month needs to be replaced with its own name, so we need a separate capturing group to match each month number. Cardinal numbers ending with 1, 2, and 3 have unique suffixes. So we need four groups to match day numbers ending with 1, 2, 3, or another digit. We’re assuming year numbers 50 to 99 to be 1950 to 1999, and year numbers 00 to 49 to be 2000 to 2049. So we need two more groups to match each half century.

The Regular Expression

Putting this all together results in a rather long regular expression. Free-spacing helps to keep it readable. The structure of the regex is the same as what you would use for matching valid dates. It’s only more long-winded because we need 12 alternatives to match the month, 4 alternatives to match the day, and 2 alternatives to match the year.

\b
(?: # Month
   
(?'jan'0?1)|(?'feb'0?2)|(?'mar'0?3)|(?'apr'0?4)|(?'may'0?5)|(?'jun'0?6)
  
|(?'jul'0?7)|(?'aug'0?8)|(?'sep'0?9)|(?'oct'10)|(?'nov'11)|(?'dec'12)
  
) /
0?(?: # Day
   
(?'1st'[23]?1)|(?'2nd'2?2)|(?'3rd'2?3)|(?'nth'30|1[123]|[12]?[4-90])
  
) /
(?: # Year
   
(?'19xx'[5-9][0-9])|(?'20xx'[0-4][0-9])
  
)
\b

The Replacement String

The replacement string will use backreferences to reinsert the date numbers. Since we want to omit leading zeros from the replacements, we placed 0? outside the capturing groups for date numbers. This means that our regex also allows leading zeros for days 10 to 31. Since our goal is to replace dates rather than validate them, we can live with this. Otherwise, we would need two sets of four alternatives to match the day of the month. One set for single digit days, and one set for double digit days.

Unfortunately, free-spacing does not work with replacement strings. So the replacement consists of one very long line. It is broken into multiple lines here to fit the width of the page. This is the replacement using Boost syntax:

(?{jan}January)(?{feb}February)(?{mar}March)(?{apr}April)(?{may}May)(?{jun}June)
(?{jul}July)(?{aug}August)(?{sep}September)(?{oct}October)(?{nov}November)
(?{dec}December) (?{1st}${1st}st)(?{2nd}${2nd}nd)(?{3rd}${3rd}rd)(?{nth}${nth}th)
(?{19xx}19${19xx})(?{20xx}20${20xx})

This is the replacement using PCRE2 syntax:

${jan:+January}${feb:+February}${mar:+March}${apr:+April}${may:+May}${jun:+June}
${jul:+July}${aug:+August}${sep:+September}${oct:+October}${nov:+November}
${dec:+December} ${1st:+${1st}st}${2nd:+${2nd}nd}${3rd:+${3rd}rd}${nth:+${nth}th}
${19xx:+19${19xx}}${20xx:+20${20xx}}

First we have 12 conditionals that reference the 12 capturing groups for the months. Each conditional inserts the month’s name when their group participates. They insert nothing when their group does not participate. Since only one of these groups participates in any match, only one of these conditionals actually inserts anything into the replacement.

Then we have a literal space and 4 more conditionals that reference the 4 capturing groups for the days. When the group participates, the conditional uses a backreference to the same group to reinsert the day number matched by the group. The backreference is followed by a literal suffix.

Finally, we have a literal comma, a literal space, and 2 more conditionals for the year. The conditionals again use literal text and a backreference to expand the year from 2 to 4 digits.