Multiple Patterns in Regex - sql

Can there be multiple patterns in REGEXP_REPLACE?
Pattern 1 : '^#.*'
Pattern 2: '^//.*'
Pattern 3 : '^&&.*'
I want all three patterns in the same regexp_replace call, like
select REGEXP_REPLACE ('Unit testing last level','Pattern 1,Pattern 2,Pattern 3','',1,0,'m')
from dual;

You can use an alternation group where all alternative branches are |-separated.
^(#|//|&&).*
The parentheses (...) form a grouping construct that holds the alternative "branches" (#, //, &&, and any others you need). The | is the alternation operator.
The pattern will match:
^ - start of a line (as you are passing m match_parameter)
(#|//|&&) - either #, // or &&
.* - any 0+ chars other than a newline (since n match_parameter is not used).
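As a rough illustration of the alternation group above, here is a minimal sketch using Python's re module as a stand-in for Oracle. It assumes re.MULTILINE corresponds to the m match_parameter; the function name and the sample text are made up for the example.

```python
import re

# Same alternation as in the answer; re.MULTILINE stands in for Oracle's 'm' flag,
# so ^ matches at the start of every line. "." does not match a newline by default.
MARKED_LINE = re.compile(r"^(#|//|&&).*", re.MULTILINE)


def strip_marked_lines(text: str) -> str:
    """Replace every line that starts with #, // or && with an empty string."""
    return MARKED_LINE.sub("", text)


if __name__ == "__main__":
    sample = "# one\nkeep me\n// two\n&& three\nlast line"
    print(repr(strip_marked_lines(sample)))
```

Only the line content is removed. The newline characters stay, because .* stops at the end of each line.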

Related

ORACLE REGEXP limitation?

I'm testing Oracle's REGEXP_SUBSTR function, and a regexp that works in Python and in web testing tools like https://regex101.com/ doesn't work in Oracle.
Example:
((?:NF\s{0,1}EN){0,1}[\s]{0,1}ISO[\s]{0,1}[\d]{3,6}(?:[\:]{0,1}\d{1,4}){0,1}[\-]{0,1}\d{0,1})
STRING: VAS H M1582/950-80 ABCDFEF - ISO4014
EXPECTED MATCH: ISO4014
Oracle, however, returns no match:
SELECT REGEXP_SUBSTR (
'VAS H M1582/950-80 ABCDFEF - ISO4014',
'((?:NF\s{0,1}EN){0,1}[\s]{0,1}ISO[\s]{0,1}[\d]{3,6}(?:[\:]{0,1}\d{1,4}){0,1}[\-]{0,1}\d{0,1})')
FROM DUAL;
Any idea?
You can use
(NF\s?EN)?\s?ISO\s?\d{3,6}(:?\d{1,4})?-?\d?
See its demo at regex101.com.
Note:
Oracle regex does not "like" shorthand character classes inside brackets: in [\s] or [\d] the backslash is taken literally, so write \s and \d outside brackets (or use POSIX classes such as [[:space:]] and [[:digit:]])
{0,1} is equal to ? (one or zero occurrences)
(?:...), non-capturing groups, are not supported; replace them with capturing groups. (Note that (:? is not a non-capturing group, it is just an optional colon at the start of the second capturing group in the pattern.)
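To check that the simplified pattern is equivalent to the original, here is a small sketch in Python's re module, which accepts both spellings. It only compares the two patterns in a PCRE-like engine and does not emulate Oracle; the helper name is an assumption for the example.

```python
import re

TEXT = "VAS H M1582/950-80 ABCDFEF - ISO4014"

# Pattern from the question (PCRE-style) and the simplified one from the answer.
ORIGINAL = r"((?:NF\s{0,1}EN){0,1}[\s]{0,1}ISO[\s]{0,1}[\d]{3,6}(?:[\:]{0,1}\d{1,4}){0,1}[\-]{0,1}\d{0,1})"
SIMPLIFIED = r"(NF\s?EN)?\s?ISO\s?\d{3,6}(:?\d{1,4})?-?\d?"


def first_match(pattern: str, text: str):
    """Return the text of the first match, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(repr(first_match(ORIGINAL, TEXT)))
    print(repr(first_match(SIMPLIFIED, TEXT)))
```

In both cases the optional \s? before ISO also consumes the preceding space, so the raw match is ' ISO4014' and may need trimming.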
You can use my XT_REGEXP for PCRE compatible regular expressions: https://github.com/xtender/XT_REGEXP
select *
from
table(xt_regexp.get_matches(
'VAS H M1582/950-80 ABCDFEF - ISO4014',
'((?:NF\s{0,1}EN){0,1}[\s]{0,1}ISO[\s]{0,1}[\d]{3,6}(?:[\:]{0,1}\d{1,4}){0,1}[\-]{0,1}\d{0,1})'
));
Results:
COLUMN_VALUE
------------------------------
ISO4014
1 row selected.

Extract string between different special symbols

I have the following string in my query
.\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt
It begins with a period, and I need to extract the segment between the final \ and the period before the file extension, meaning the following expected result
ABC__123_123_123_ABC123
I am fairly new to REGEXP and couldn't work out an elegant (or workable) solution from the Q&A here or elsewhere. In all queries the pattern is the same in quantity and order, but to grow my knowledge I'd prefer not to just count and cut.
You can use the REGEXP_REPLACE function, such as
REGEXP_REPLACE(col,'(.*\\)(.*)\.(.*)','\2')
in order to extract the piece from the last backslash up to the last dot. The leading backslashes in \\ and \. are escape characters: they make the regex treat \ and . as literal characters rather than special ones.
Demo
You just need regexp_substr and a simple regexp: ([^\]+)\.[^.]*$
select
regexp_substr(
'.\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt',
'([^\]+)\.[^.]*$',
1, -- position
1, -- occurrence
null, -- match_parameter
1 -- subexpr
) substring
from dual;
([^\]+)\.[^.]*$ means:
([^\]+) - one or more (+) characters other than a backslash ([] is a set, ^ negates it), captured as group \1 (subexpression #1)
\. - then a literal dot (. is a special character that means any character, so we need to "escape" it using \, which is the escape character)
[^.]* - zero or more characters other than .
$ - end of the string
So this regexp means: find a substring consisting of one or more characters other than a backslash, followed by a dot, followed by zero or more characters other than a dot, anchored at the end of the string. The subexpr parameter = 1 tells Oracle to return the first subexpression (i.e. the first group matched in (...)).
You can find the other parameters in the doc.
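The replace-based and substr-based approaches above can be sketched side by side in Python's re module. This is only an illustration, not Oracle code: Python needs the backslash doubled inside brackets ([^\\]), whereas Oracle takes [^\] literally, and the function names are invented for the example.

```python
import re

PATH = r".\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt"


def stem_via_search(path: str):
    """Capture the run of non-backslash characters before the last dot."""
    m = re.search(r"([^\\]+)\.[^.]*$", path)
    return m.group(1) if m else None


def stem_via_sub(path: str) -> str:
    """Replace the whole string with group 2 (text between last backslash and last dot)."""
    return re.sub(r"(.*\\)(.*)\.(.*)", r"\2", path)


if __name__ == "__main__":
    print(stem_via_search(PATH))
    print(stem_via_sub(PATH))
```

One difference worth noting: when nothing matches, the search variant yields None, while the replace variant returns the input unchanged.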
Here is my simple example, fully compatible with Oracle 11g R2, PCRE2 and some other regex flavors.
Oracle 11g R2, using the regexp_substr function (Reference documentation)
select
regexp_substr(
'.\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt',
'((\w)+(_){2}(((\d){3}(_)){3}){1}((\w)+(\d)+){1}){1}',
1,
1
) substring
from dual;
Pattern: ((\w)+(_){2}(((\d){3}(_)){3}){1}((\w)+(\d)+){1}){1}
Result: ABC__123_123_123_ABC123
This is about as simple as it can be: it sticks to basic constructs that most regex flavors share, so it should be portable, in case someone else is interested in going the simplest way.
Hopefully, this will help you out!

Regexp_like from text

I always have trouble with regular expressions.
I made a query with listagg.
Output:
21.09.2017 09:43 Status 8-2#Partner 0-178#EXP_date 24-2#EXP_interval 30-365#Partner_code template-0925584#amount 0-70#
21.11.2019 08:10 Status 8-2#Partner 0-178#EXP_date 24-0#EXP_interval 30-1#Partner_code template-0925805#
17.12.2019 10:23 Status 5-1#
I need to match this "text" on the values Status, 1# and 2#:
regexp_like(my_column,'Status (1#|2#)')
So I want all of the rows where Status AND ('1#' OR '2#') exist at the same time.
What is the correct form of regexp_like?
You could shorten the | alternation to a character class.
Match the space after Status, then use .* to reach either 1# or 2#:
Status .*[12]#
Regex demo
Consider:
regexp_like(my_column,'Status .*(1|2)#')
The problem with your original regex is that it was expecting '1#' or '2#' right after 'Status '. Adding '.*' allows any sequence of characters in between.
If you want to match up to the first 1# or 2#, use the lazy .+?; to match up to the last one, use the greedy .*
Status.+?(1|2)#
REGEX DEMO: https://regex101.com/r/15pvJz/5
Explanation:
.+? matches any character (except for line terminators)
+? quantifier: matches between one and unlimited times, as few times as possible, expanding as needed (lazy)
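The patterns from this thread can be compared on the sample rows with a short Python sketch (re as a stand-in for regexp_like). The last row and the [^#]* variant are not from the original answers: they are assumptions added to show that .* can run past the Status field and match a 1# or 2# belonging to a later field.

```python
import re

ROWS = [
    "21.09.2017 09:43 Status 8-2#Partner 0-178#EXP_date 24-2#EXP_interval 30-365#Partner_code template-0925584#amount 0-70#",
    "21.11.2019 08:10 Status 8-2#Partner 0-178#EXP_date 24-0#EXP_interval 30-1#Partner_code template-0925805#",
    "17.12.2019 10:23 Status 5-1#",
]

# Hypothetical row: Status ends in 3#, but a later field ends in 2#.
HYPOTHETICAL_ROW = "01.01.2020 00:00 Status 8-3#Partner 0-12#"


def matching(pattern: str, rows):
    """Return the rows in which the pattern is found anywhere (like regexp_like)."""
    return [row for row in rows if re.search(pattern, row)]


if __name__ == "__main__":
    for pattern in (r"Status (1#|2#)", r"Status .*[12]#", r"Status [^#]*[12]#"):
        print(pattern, len(matching(pattern, ROWS + [HYPOTHETICAL_ROW])))
```

If the value must belong to the Status field itself, restricting the gap to non-# characters, as in the assumed Status [^#]*[12]# variant, avoids the spill-over.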

Regex sub-group behaviour with and without space

Say the task were to append the last numbers in a product code to itself with a hyphen between the original and added numbers (purely for experimentation).
I would like to understand why including a space is necessary in the following example:
with foo ( prod )
as ( values ('MYPRODUCT 123'))
select
'dot aster space' as test_type,
'''(.* (\d+))'',''$1-$2''' as the_regex,
regexp_replace(prod,'(.* (\d+))','$1-$2')
from foo
UNION ALL
select
'dot aster no space',
'''(.*(\d+))'',''$1-$2''',
regexp_replace(prod,'(.*(\d+))','$1-$2')
from foo
Result
TEST_TYPE THE_REGEX REGEXP_REPLACE
dot aster space '(.* (\d+))','$1-$2' MYPRODUCT 123-123
dot aster no space '(.*(\d+))','$1-$2' MYPRODUCT 123-3
I would have expected that, since the period matches any character, including a blank space, the two regexes would have the same result.
However, even accepting that they do not, I can't figure out why only the last 3 is captured in the second group.
Thanks.
It's a matter of greediness.
With the regex
'(.* (\d+))'
you ask explicitly for a space before the digits, so \d+ will get the 3 digits.
With the regex
'(.*(\d+))'
the .* is greedy: it takes as many characters as it can while still leaving at least one digit for \d+. So .* will match 'MYPRODUCT 12' and \d+ will match '3'.
Solution: the non-greedy quantifier '?'.
The regex would be
'(.*?(\d+))'
Now .*? takes as few characters as possible, so \d+ matches all three digits.
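The three variants discussed here can be reproduced with a minimal Python sketch. Python's re.sub writes group references as \1 and \2 rather than $1 and $2, and the helper name is made up for the example; the greedy and lazy behaviour is the point being illustrated.

```python
import re

PROD = "MYPRODUCT 123"


def append_digits(pattern: str, prod: str) -> str:
    """Rewrite the match as '<group 1>-<group 2>'."""
    return re.sub(pattern, r"\1-\2", prod)


if __name__ == "__main__":
    print(append_digits(r"(.* (\d+))", PROD))   # space forces \d+ to take all digits
    print(append_digits(r"(.*(\d+))", PROD))    # greedy .* leaves one digit for \d+
    print(append_digits(r"(.*?(\d+))", PROD))   # lazy .*? leaves all digits for \d+
```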

Oracle SQL - find string pattern in string

I need to extract some text from a string, but only where the text matches a string pattern. The string pattern will consist of...
2 numbers, a forward slash and 6 numbers
e.g. 12/123456
or
2 numbers, a forward slash, 6 numbers, a hyphen and 2 numbers
e.g. 12/123456-12
I know how to use INSTR to find a specific string. Is it possible to find a string that matches a specific pattern?
You'll need to use regexp_like to filter the results and regexp_substr to get the substring.
Here is roughly what it should look like:
select id, myValue, regexp_substr(myValue, '[0-9]{2}/[0-9]{6}(-[0-9]{2})?') as myRegExMatch
from Foo
where regexp_like(myValue,'^([a-zA-Z0-9 ])*[0-9]{2}/[0-9]{6}(-[0-9]{2})?([a-zA-Z0-9 ])*$')
The original answer linked a SQLFiddle where you can see this in action and adjust it to your taste.
The regexp_like provided in the sample above takes into consideration the alphanumerics and whitespace characters that may bound the number pattern.
Use regexp_like.
where regexp_like(col_name,'\s[0-9]{2}\/[0-9]{6}(-[0-9]{2})?\s')
\s matches a whitespace character. Include one at the start and end of the pattern.
[0-9]{2}\/[0-9]{6} matches 2 numerics, a forward slash and 6 numerics
(-[0-9]{2})? optionally matches a hyphen and 2 numerics following the previous pattern.
regexp_like(col_name,'^\d{2}/\d{6}($|-\d{2}$)')
or
regexp_like(col_name,'^\d{2}/\d{6}(-\d{2})?$')
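The two uses that come up in these answers, pulling the code out of surrounding text and checking that a whole value is a code, can be sketched in Python's re module. The function names and sample strings are assumptions for illustration. The optional suffix is written as a capturing group, as Oracle would require.

```python
import re

# 2 digits, a slash, 6 digits, then an optional hyphen plus 2 digits.
CODE = r"\d{2}/\d{6}(-\d{2})?"


def extract_code(text: str):
    """Return the first code found anywhere in the text, or None."""
    m = re.search(CODE, text)
    return m.group(0) if m else None


def is_exact_code(text: str) -> bool:
    """True only if the whole value is a code (like anchoring with ^ and $)."""
    return re.fullmatch(CODE, text) is not None


if __name__ == "__main__":
    print(extract_code("order ref 12/123456-12 shipped"))
    print(is_exact_code("12/123456"))
```

This sketch does not guard against a code embedded in a longer digit run; adding whitespace or anchor requirements, as in the answers above, would tighten it.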