Preamble
The Oracle/PLSQL REGEXP_SUBSTR function is an extension of the SUBSTR function. Introduced in Oracle 10g, it allows you to extract a substring from a string using regular expression pattern matching.
Syntax of the Oracle/PLSQL function REGEXP_SUBSTR
REGEXP_SUBSTR( string_id, pattern_id [, start_position_id [, nth_appearance_id [, match_parameter_id [, sub_expression_id ] ] ] ] )
Parameters and function arguments
- string_id – The string to search. It can be CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB or NCLOB.
- pattern_id – The pattern: the regular expression to match against. It can be a combination of the following values:
| Value | Description |
|---|---|
| ^ | Matches the beginning of the string. When match_parameter includes 'm', matches the beginning of any line within the string. |
| $ | Matches the end of the string. When match_parameter includes 'm', matches the end of any line within the string. |
| * | Matches zero or more occurrences. |
| + | Matches one or more occurrences. |
| ? | Matches zero or one occurrence. |
| . | Matches any character except NULL. |
| \| | Used as "OR" to specify more than one alternative. |
| [ ] | Specifies a matching list: matches any one of the characters in the list. |
| [^ ] | Specifies a nonmatching list: matches any character except those in the list. |
| ( ) | Groups an expression as a subexpression. |
| {m} | Matches exactly m times. |
| {m,} | Matches at least m times. |
| {m,n} | Matches at least m times, but not more than n times. |
| \n | n is a number between 1 and 9. Matches the n-th subexpression enclosed in ( ) that precedes \n. |
| [..] | Matches a single collating element, which can be more than one character. |
| [::] | Matches a character class. |
| [==] | Matches an equivalence class. |
| \d | Matches a digit character. |
| \D | Matches a non-digit character. |
| \w | Matches a word character. |
| \W | Matches a non-word character. |
| \s | Matches a whitespace character. |
| \S | Matches a non-whitespace character. |
| \A | Matches only at the beginning of the string. |
| \Z | Matches at the end of the string, or before a newline character at the end of the string. |
| *? | Matches the preceding pattern zero or more times, as few times as possible (non-greedy). |
| +? | Matches the preceding pattern one or more times, as few times as possible (non-greedy). |
| ?? | Matches the preceding pattern zero or one time, preferring zero (non-greedy). |
| {n}? | Matches the preceding pattern exactly n times (non-greedy). |
| {n,}? | Matches the preceding pattern at least n times, as few times as possible (non-greedy). |
| {n,m}? | Matches the preceding pattern at least n times, but not more than m times, as few times as possible (non-greedy). |
- start_position_id – Optional. The position in the string at which the search starts. If this parameter is omitted, it defaults to 1, which is the first position in the string.
- nth_appearance_id – Optional. The occurrence of the pattern in the string to return. It must be a positive integer. If this parameter is omitted, it defaults to 1, which is the first occurrence of the pattern in the string.
- match_parameter_id – Optional. It lets you change the matching behavior of the REGEXP_SUBSTR function. It can be a combination of the following values:
| Value | Description |
|---|---|
| ‘c’ | Performs case-sensitive matching. |
| ‘i’ | Performs case-insensitive matching. |
| ‘n’ | Allows the period character (.) to match the newline character. By default, the period does not match a newline. |
| ‘m’ | Treats the string as multiple lines, so that ^ matches the beginning of any line and $ matches the end of any line, rather than only the beginning and end of the whole string. By default, the string is treated as a single line. |
| ‘x’ | Whitespace characters in the pattern are ignored. By default, whitespace characters are matched like any other character. |
- sub_expression_id – Optional. Used when the pattern has subexpressions and you want to specify which subexpression of the match to return. This is an integer value between 0 and 9. The default, 0, returns the entire match.
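The last parameter has no example elsewhere in this article, so here is a minimal sketch of it. It assumes an Oracle release that accepts the sixth argument, and it passes ‘c’ only as a placeholder so that the sub_expression_id position can be reached.

```sql
-- Return only the first subexpression, (\S*), of the first match.
SELECT REGEXP_SUBSTR ('Google is a great search engine.', '(\S*)(\s)', 1, 1, 'c', 1)
FROM dual;
--Result: 'Google'

-- Same idea for the fourth match: the word without its trailing space.
SELECT REGEXP_SUBSTR ('Google is a great search engine.', '(\S*)(\s)', 1, 4, 'c', 1)
FROM dual;
--Result: 'great'
```

Because the whitespace is captured by the second subexpression, selecting subexpression 1 drops it from the result.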
The REGEXP_SUBSTR function returns a string value.
If REGEXP_SUBSTR does not detect any pattern occurrence, it returns NULL.
If there are conflicting values for match_parameter, the REGEXP_SUBSTR function will use the last value.
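The two rules above can be checked with a short sketch. The strings are reused from the examples later in this article; nothing else is assumed.

```sql
-- No digit occurs in the string, so the function returns NULL.
SELECT REGEXP_SUBSTR ('AeroSmith', '\d')
FROM dual;
--Result: NULL

-- 'ic' is contradictory; the last value, 'c', wins (case-sensitive).
SELECT REGEXP_SUBSTR ('AeroSmith', 'a|e|i|o|u', 1, 1, 'ic')
FROM dual;
--Result: 'e'

-- 'ci' is also contradictory; the last value, 'i', wins (case-insensitive).
SELECT REGEXP_SUBSTR ('AeroSmith', 'a|e|i|o|u', 1, 1, 'ci')
FROM dual;
--Result: 'A'
```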
REGEXP_SUBSTR function can be used in the following versions of Oracle / PLSQL
Oracle 12c, Oracle 11g, Oracle 10g
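The difference between the plain quantifiers and their ? variants in the pattern list is easiest to see side by side. The following is a small illustrative sketch; the input string is made up for the example.

```sql
-- Greedy: .+ takes as many characters as possible before the last '>'.
SELECT REGEXP_SUBSTR ('<b>bold</b>', '<.+>')
FROM dual;
--Result: '<b>bold</b>'

-- Non-greedy: .+? stops at the first '>' that completes the match.
SELECT REGEXP_SUBSTR ('<b>bold</b>', '<.+?>')
FROM dual;
--Result: '<b>'
```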
Example of a match in words
Let’s start by extracting the first word from the string.
For example:
SELECT REGEXP_SUBSTR ('Google is a great search engine.', '(\S*)(\s)')
FROM dual;
--Result: 'Google '
This example will return ‘Google ‘ because it extracts all non-whitespace characters, as specified by (\S*), and then the first whitespace character, as specified by (\s). The result includes both the first word and the space after the word. Note that the string literal must be enclosed in single quotes; Oracle treats double-quoted text as an identifier.
If you do not want to include the space in the result, change the example as follows:
SELECT REGEXP_SUBSTR ('Google is a great search engine.', '(\S*)')
FROM dual;
--Result: 'Google'
This example will return ‘Google’ without a space at the end.
If we need to find the second word in the string, we will change our function as follows:
SELECT REGEXP_SUBSTR ('Google is a great search engine.', '(\S*)(\s)', 1, 2)
FROM dual;
--Result: 'is '
This example will return ‘is ‘ with a space at the end.
If we need to find the fourth word in the string, we will change our function as follows:
SELECT REGEXP_SUBSTR ('Google is a great search engine.', '(\S*)(\s)', 1, 4)
FROM dual;
--Result: 'great '
This example will return ‘great ‘ with a space at the end.
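Building on the word examples above, a common idiom is to let the occurrence argument vary so that every word is returned as its own row. The following is a sketch rather than a definitive approach: it relies on CONNECT BY LEVEL against dual as a row generator, the column aliases word_no and word are made up for illustration, and it uses \S+ instead of (\S*)(\s) so that the last word, which has no trailing space, is still matched.

```sql
-- One row per word: LEVEL supplies 1, 2, 3, ... as the occurrence number.
SELECT LEVEL AS word_no,
       REGEXP_SUBSTR ('Google is a great search engine.', '\S+', 1, LEVEL) AS word
FROM dual
-- Stop generating rows once there is no further occurrence (NULL).
CONNECT BY REGEXP_SUBSTR ('Google is a great search engine.', '\S+', 1, LEVEL) IS NOT NULL;
```

With the (\S*)(\s) pattern the sixth occurrence would be NULL, because ‘engine.’ is not followed by a whitespace character.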
Example of a number match
Let’s see how to use the REGEXP_SUBSTR function to match a pattern of digit characters.
For example:
SELECT REGEXP_SUBSTR ('2, 4, and 10 numbers for example', '\d')
FROM dual;
--Result: '2'
In this example, the first digit is extracted from the string, as specified by \d. In this case, it matches the number 2.
We could change our pattern to find a two-digit number.
For example:
SELECT REGEXP_SUBSTR ('2, 4, and 10 numbers for example', '(\d)(\d)')
FROM dual;
--Result: '10'
In this example, the function extracts the first pair of consecutive digits, as specified by (\d)(\d). In this case it skips the numeric values 2 and 4 and returns 10.
Let’s see how we will use the REGEXP_SUBSTR function with a table column and look for a two-digit number.
For example:
SELECT REGEXP_SUBSTR (address, '(\d)(\d)')
FROM contacts;
In this example, we are going to extract the first two-digit value from the address field in the contacts table.
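The digit patterns above can also be written with quantifiers from the pattern list. The sketch below is illustrative only; the string ‘Room 1234’ is made up to show a caveat.

```sql
-- \d{2} is equivalent to (\d)(\d): exactly two consecutive digits.
SELECT REGEXP_SUBSTR ('2, 4, and 10 numbers for example', '\d{2}')
FROM dual;
--Result: '10'

-- \d+ matches a whole run of digits; the third run in the string is 10.
SELECT REGEXP_SUBSTR ('2, 4, and 10 numbers for example', '\d+', 1, 3)
FROM dual;
--Result: '10'

-- Caveat: a two-digit pattern also matches the start of a longer number.
SELECT REGEXP_SUBSTR ('Room 1234', '\d{2}')
FROM dual;
--Result: '12'
```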
Example of matching several alternatives
The next example uses the | pattern. The | pattern is used as an “OR” to specify several alternatives.
For example:
SELECT REGEXP_SUBSTR ('AeroSmith', 'a|e|i|o|u')
FROM dual;
--Result: 'e'
This example will return an ‘e’ because it looks for the first vowel (a, e, i, o or u) in the string. Since we didn’t specify a match_parameter value, the REGEXP_SUBSTR function will perform a case sensitive search, which means that ‘A’ in ‘AeroSmith’ will not be matched.
To perform a case-insensitive search, we will modify our query in the following way:
SELECT REGEXP_SUBSTR ('AeroSmith', 'a|e|i|o|u', 1, 1, 'i')
FROM dual;
--Result: 'A'
Now since we have provided match_parameter = ‘i’, the query will return ‘A’ as a result. This time ‘A’ in ‘AeroSmith’ will be matched.
Now consider how to use this function with a column.
Suppose we have a contacts table with the following data:
| contact_id | last_name |
|---|---|
| 1000 | AeroSmith |
| 2000 | Joy |
| 3000 | Scorpions |
Now let’s run the following query:
SELECT contact_id, last_name, REGEXP_SUBSTR (last_name, 'a|e|i|o|u', 1, 1, 'i') AS "First Vowel"
FROM contacts;
The query returns the following results:
| contact_id | last_name | First Vowel |
|---|---|---|
| 1000 | AeroSmith | A |
| 2000 | Joy | o |
| 3000 | Scorpions | o |
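To try the contacts query without creating a table, the sample rows can be supplied inline. This is a sketch under two assumptions that are not part of the original example: the WITH clause stands in for the real contacts table, and the bracket list [aeiou] is used as an equivalent of a|e|i|o|u.

```sql
-- Inline stand-in for the contacts table, holding the three sample rows.
WITH contacts AS (
  SELECT 1000 AS contact_id, 'AeroSmith' AS last_name FROM dual UNION ALL
  SELECT 2000, 'Joy' FROM dual UNION ALL
  SELECT 3000, 'Scorpions' FROM dual
)
-- [aeiou] matches any one character from the list, like a|e|i|o|u.
SELECT contact_id, last_name,
       REGEXP_SUBSTR (last_name, '[aeiou]', 1, 1, 'i') AS "First Vowel"
FROM contacts;
```

The query should return the same three rows as the result table above: A, o and o.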
Example of matches based on the nth_appearance_id parameter
The next example uses the nth_appearance_id parameter (the occurrence number). This parameter lets you choose which occurrence of the pattern to extract from the string.
First occurrence
Let’s see how to extract the first occurrence of the pattern in a string.
For example:
SELECT REGEXP_SUBSTR ('AeroSmith', 'a|e|i|o|u', 1, 1, 'i')
FROM dual;
--Result: 'A'
This example will return ‘A’ because it retrieves the first vowel occurrence (a, e, i, o or u) in the string.
Second occurrence
Next, we will extract the second occurrence of the pattern in the string.
For example:
SELECT REGEXP_SUBSTR ('AeroSmith', 'a|e|i|o|u', 1, 2, 'i')
FROM dual;
--Result: 'e'
This example will return ‘e’ because it retrieves the second occurrence of a vowel (a, e, i, o or u) in a string.
Third occurrence
For example:
SELECT REGEXP_SUBSTR ('AeroSmith', 'a|e|i|o|u', 1, 3, 'i')
FROM dual;
--Result: 'o'
This example will return an ‘o’ because it retrieves the third vowel occurrence (a, e, i, o or u) in a string.
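Two further cases round out the occurrence examples: asking for an occurrence that does not exist, and combining the occurrence with a start position other than 1. The following sketch reuses the same string; ‘AeroSmith’ contains four vowels (A, e, o, i).

```sql
-- Fourth occurrence: the last vowel in the string.
SELECT REGEXP_SUBSTR ('AeroSmith', 'a|e|i|o|u', 1, 4, 'i')
FROM dual;
--Result: 'i'

-- Fifth occurrence: there is none, so the function returns NULL.
SELECT REGEXP_SUBSTR ('AeroSmith', 'a|e|i|o|u', 1, 5, 'i')
FROM dual;
--Result: NULL

-- Start at position 3 ('r'): 'A' and 'e' are skipped, so the first vowel found is 'o'.
SELECT REGEXP_SUBSTR ('AeroSmith', 'a|e|i|o|u', 3, 1, 'i')
FROM dual;
--Result: 'o'
```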