PostgreSQL 7.4 and later use the exact same regular expression engine that was developed by Henry Spencer for Tcl 8.2. This means that PostgreSQL supports the same three regular expressions flavors: Tcl Advanced Regular Expressions, POSIX Extended Regular Expressions and POSIX Basic Regular Expressions. Just like in Tcl, AREs are the default. All my comments on Tcl's regular expression flavor, like the unusual mode modifiers and word boundary tokens, fully apply to PostgreSQL. You should definitely review them if you're not familiar with Tcl's AREs. Unfortunately, PostgreSQL's regexp_replace function does not use the same syntax for the replacement text as Tcl's regsub command, however.
PostgreSQL versions prior to 7.4 supported POSIX Extended Regular Expressions only. If you are migrating old database code to a new version of PostgreSQL, you can set PostgreSQL's "regex_flavor" run-time parameter to "extended" instead of the default "advanced" to make EREs the default.
PostgreSQL also supports the traditional SQL LIKE operator, and the SQL:1999 SIMILAR TO operator. These use their own pattern languages, which are not discussed here. AREs are far more powerful, and no more complicated if you don't use functionality not offered by LIKE or SIMILAR TO.
The tilde infix operator returns true or false depending on whether a regular expression can match part of a string, or not. E.g. 'subject' ~ 'regexp' returns false, while 'subject' ~ '\\w' returns true. If the regex must match the whole string, you'll need to use anchors. E.g. 'subject' ~ '^\\w$' returns false, while 'subject' ~ '^\\w+$' returns true. There are 4 variations of this operator:
While only case sensitivity can be toggled by the operator, all other options can be set using mode modifiers at the start of the regular expression. Mode modifiers override the operator type. E.g. '(?c)regex' forces the to be regex case sensitive.
The most common use of this operator is to select rows based on whether a column matches a regular expression, e.g.:
select * from mytable where mycolumn ~* 'regexp'
The backslash is used to escape characters in PostgreSQL strings. So a regular expression like \w that contains a backslash becomes '\\w' when written as a literal string in a PostgreSQL statement. To match a single literal backslash, you'll need the regex \\ which becomes '\\\\' in PostgreSQL.
With the substring(string from pattern) function, you can extract part of a string or column. It takes two parameters: the string you want to extract the text from, and the pattern the extracted text should match. If there is no match, substring() returns null. E.g. substring('subject' from 'regexp') returns null. If there is a match, and the regex has one or more capturing groups, the text matched by the first capturing group is returned. E.g. substring('subject' from 's(\\w)') returns 'u'. If there is a match, but the regex has no capturing groups, the whole regex match is returned. E.g. substring('subject' from 's\\w') returns 'su'. If the regex matches the string more than once, only the first match is returned. Since the substring() function doesn't take a "flags" parameter, you'll need to toggle any matching options using mode modifiers.
This function is particularly useful to extract information from columns. E.g. to extract the first number from the column mycolumn for each row, use:
select substring(mycolumn from '\d+') from mytable
With regexp_replace(subject, pattern, replacement [, flags]) you can replace regex matches in a string. If you omit the flags parameter, the regex is applied case sensitively, and only the first match is replaced. If you set the flags to 'i', the regex is applied case insensitively. The 'g' flag (for "global") causes all regex matches in the string to be replaced. You can combine both flags as 'gi'.
You can use the backreferences \1 through \9 in the replacement text to re-insert the text matched by a capturing group into the regular expression. \& re-inserts the whole regex match. Remember to double up the backslashes in literal strings.
E.g. regexp_replace('subject', '(\w)\w', '\&\1', 'g') returns 'susbjbecet'.
PostgreSQL 8.3 and later have two new functions to split a string along its regex matches. regexp_split_to_table(subject, pattern[, flags]) returns the split string as a new table. regexp_split_to_array(subject, pattern[, flags]) returns the split string as an array of text. If the regex finds no matches, both functions return the subject string.
Did this website just save you a trip to the bookstore? Please make a donation to support this site, and you'll get a lifetime of advertisement-free access to this site!
Page URL: https://www.regular-expressions.info/postgresql.html
Page last updated: 30 June 2017
Site last updated: 20 June 2019
Copyright © 2003-2019 Jan Goyvaerts. All rights reserved.
|Languages & Libraries|
|Visual Basic 6|
|XQuery & XPath|
|Regular Expressions Quick Start|
|Regular Expressions Tutorial|
|Replacement Strings Tutorial|
|Applications and Languages|
|Regular Expressions Examples|
|Regular Expressions Reference|
|Replacement Strings Reference|
|About This Site|
|RSS Feed & Blog|