Pitfalls |
Catastrophic Backtracking |
Too Many Repetitions |
Denial of Service |
Making Everything Optional |
Repeated Capturing Group |
Mixing Unicode & 8-bit |
^(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$ matches a date in yyyy-mm-dd format from 1900-01-01 through 2099-12-31, with a choice of four separators. The anchors make sure the entire variable is a date, and not a piece of text containing a date. The year is matched by (19|20)\d\d. I used alternation to allow the first two digits to be 19 or 20. The parentheses are mandatory. Had I omitted them, the regex engine would go looking for 19 or the remainder of the regular expression, which matches a date between 2000-01-01 and 2099-12-31. Parentheses are the only way to stop the vertical bar from splitting up the entire regular expression into two options.
The month is matched by 0[1-9]|1[012], again enclosed by parentheses to keep the two options together. By using character classes, the first option matches a number between 01 and 09, and the second matches 10, 11 or 12.
The last part of the regex consists of three options. The first matches the numbers 01 through 09, the second 10 through 29, and the third matches 30 or 31.
Smart use of alternation allows us to exclude invalid dates such as 2000-00-00 that could not have been excluded without using alternation. To be really perfectionist, you would have to split up the month into various options to take into account the length of the month. The above regex still matches 2003-02-31, which is not a valid date. Making leading zeros optional could be another enhancement.
If you want to require the delimiters to be consistent, you could use a backreference. ^(19|20)\d\d([- /.])(0[1-9]|1[012])\2(0[1-9]|[12][0-9]|3[01])$ will match 1999-01-01 but not 1999/01-01.
Again, how complex you want to make your regular expression depends on the data you are using it on, and how big a problem it is if an unwanted match slips through. If you are validating the user’s input of a date in a script, it is probably easier to do certain checks outside of the regex. For example, excluding February 29th when the year is not a leap year is far easier to do in a scripting language. It is far easier to check if a year is divisible by 4 (and not divisible by 100 unless divisible by 400) using simple arithmetic than using regular expressions.
Here is how you could check a valid date in Perl. I also added parentheses to capture the year into a backreference.
sub isvaliddate {
my $input = shift;
if ($input =~ m!^((?:19|20)\d\d)[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$!) {
# At this point, $1 holds the year, $2 the month and $3 the day of the date entered
if ($3 == 31 and ($2 == 4 or $2 == 6 or $2 == 9 or $2 == 11)) {
return 0; # 31st of a month with 30 days
} elsif ($3 >= 30 and $2 == 2) {
return 0; # February 30th or 31st
} elsif ($2 == 2 and $3 == 29 and not ($1 % 4 == 0 and ($1 % 100 != 0 or $1 % 400 == 0))) {
return 0; # February 29th outside a leap year
} else {
return 1; # Valid date
}
} else {
return 0; # Not a date
}
}
To match a date in mm/dd/yyyy format, rearrange the regular expression to ^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d$. For dd-mm-yyyy format, use ^(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)\d\d$. You can find additional variations of these regexes in RegexBuddy’s library.
| Quick Start | Tutorial | Tools & Languages | Examples | Reference | Book Reviews |
| Regular Expressions Examples | Numeric Ranges | Floating Point Numbers | Email Addresses | IP Addresses | Valid Dates | Numeric Dates to Text | Credit Card Numbers | Matching Complete Lines | Deleting Duplicate Lines | Programming | Two Near Words |
| Catastrophic Backtracking | Too Many Repetitions | Denial of Service | Making Everything Optional | Repeated Capturing Group | Mixing Unicode & 8-bit |
Page URL: https://www.regular-expressions.info/dates.html
Page last updated: 2 September 2021
Site last updated: 29 August 2024
Copyright © 2003-2024 Jan Goyvaerts. All rights reserved.