Quick Start
Tutorial
Search & Replace
Tools & Languages
Examples
Reference
Regex Tools
grep
PowerGREP
RegexBuddy
RegexMagic
General Applications
EditPad Lite
EditPad Pro
Google Docs
Google Sheets
LibreOffice
Notepad++
Languages & Libraries
Boost
C#
Delphi
F#
GNU (Linux)
Groovy
ICU (Unicode)
Java
JavaScript
.NET
PCRE (C/C++)
PCRE2 (C/C++)
Perl
PHP
POSIX
PowerShell
Python
Python.NET and IronPython
R
RE2
Ruby
std::regex
Tcl
TypeScript
VBScript
Visual Basic 6
Visual Basic (.NET)
wxWidgets
XML Schema
XQuery & XPath
Xojo
XRegExp
Databases
Google BigQuery
MySQL
Oracle
PostgreSQL
More on This Site
Introduction
Regular Expressions Quick Start
Regular Expressions Tutorial
Replacement Strings Tutorial
Applications and Languages
Regular Expressions Examples
Regular Expressions Reference
Replacement Strings Reference
Book Reviews
Printable PDF
About This Site
RSS Feed & Blog
RegexBuddy—The best regex editor and tester for bigquery developers!

Google BigQuery REGEXP Functions

Google BigQuery uses Google’s RE2 regex engine for both finding regex matches and for replacing regex matches. Everything said about RE2’s regular expression flavor and replacement text flavor on this website also applies to Google BigQuery. None of Google BigQuery’s regexp functions allow you to pass flags outside the regular expression. So it is always case sensitive and the dot does never matches newlines.

REGEXP_CONTAINS(value, regexp) returns TRUE or FALSE depending on whether regexp can find a match in value. Partial matches are allowed.

REGEXP_EXTRACT(value, regexp[, position[, occurrence]]) returns a string with the part of value matched by the regular expression. If the match attempt fails then NULL is returned. You can use REGEXP_EXTRACT with a single string or with a column. You can use it in SELECT clauses to retrieve only a certain part of a column.

The other two parameters are optional. The position parameter specifies the character position in the source string at which the match attempt should start. The first character has position 1. The occurrence parameter specifies which match to get. Set it to 1 to get the first match. If you specify a higher number then BigQuery will continue to attempt to match the regex starting at the end of the previous match, until it found as many matches as you specified. The last match is then returned. If there are fewer matches then NULL is returned.

Do not confuse the occurrence parameter with backreferences. BigQuery returns the overall regex match if regexp does not contain any capturing groups. It returns the match of the capturing group if regexp contains exactly one capturing group. It returns an error if regexp has two or more capturing groups. RE2 supports non-capturing groups. To extract a specific group, make all other groups non-capturing. To extract the overall regex match, make all groups non-capturing.

REGEXP_SUBSTR(value, regexp[, position[, occurrence]]) is a synonym for REGEXP_EXTRACT. Other databases such as Oracle, MySQL, and PostgreSQL have a similar REGEXP_SUBSTR function, but no REGEXP_EXTRACT function. Do keep in mind that the handling of capturing groups specified in the previous paragraph is unique to BigQuery.

REGEXP_EXTRACT_ALL(value, regexp) returns an array with all matches found by regexp in value. The first match attempt always begins at the start of the string. Following match attempts begin at the end of the previous match attempt. So only non-overlapping matches are returned. An empty array is returned if regexp cannot match value at all. If the regular expression has no capturing groups then the array contains overall regex matches. If the regular expression has one capturing group then the array contains the part of each regex match captured by that group. A regex with two or more capturing groups is an error.

REGEXP_INSTR(value, regexp[, position[, occurrence[, occurrence_position]]]) returns an INT64 with beginning or ending position of a regex match in the source string. This function takes the same parameters as REGEXP_EXTRACT, plus one more. Set occurrence_position to zero or omit the parameter to get the position of the first character in match. Set it to one to get the position of the first character after the match. The first character in the string has position 1. LENGTH(value) + 1 is the ending position of a match that includes the last character in the string. REGEXP_INSTR returns zero if the match cannot be found.

REGEXP_REPLACE(value, regexp, replacement) returns value with all matches of regexp replaced with replacement. The regular expression can have multiple capturing groups. They can be referenced by the replacement using the same replacement text syntax as RE2.

| Quick Start | Tutorial | Search & Replace | Tools & Languages | Examples | Reference |

| grep | PowerGREP | RegexBuddy | RegexMagic |

| EditPad Lite | EditPad Pro | Google Docs | Google Sheets | LibreOffice | Notepad++ |

| Boost | C# | Delphi | F# | GNU (Linux) | Groovy | ICU (Unicode) | Java | JavaScript | .NET | PCRE (C/C++) | PCRE2 (C/C++) | Perl | PHP | POSIX | PowerShell | Python | Python.NET and IronPython | R | RE2 | Ruby | std::regex | Tcl | TypeScript | VBScript | Visual Basic 6 | Visual Basic (.NET) | wxWidgets | XML Schema | XQuery & XPath | Xojo | XRegExp |

| Google BigQuery | MySQL | Oracle | PostgreSQL |