ORACLE REGEXP limitation?

I'm testing Oracle's REGEXP_SUBSTR function, and a regex that works in Python and in web testing tools like https://regex101.com/ doesn't work in Oracle.
Example:
((?:NF\s{0,1}EN){0,1}[\s]{0,1}ISO[\s]{0,1}[\d]{3,6}(?:[\:]{0,1}\d{1,4}){0,1}[\-]{0,1}\d{0,1})
STRING: VAS H M1582/950-80 ABCDFEF - ISO4014
EXPECTED MATCH: ISO4014, but Oracle's REGEXP_SUBSTR doesn't match it:
SELECT REGEXP_SUBSTR (
'VAS H M1582/950-80 ABCDFEF - ISO4014',
'((?:NF\s{0,1}EN){0,1}[\s]{0,1}ISO[\s]{0,1}[\d]{3,6}(?:[\:]{0,1}\d{1,4}){0,1}[\-]{0,1}\d{0,1})')
FROM DUAL;
Any idea?

You can use
(NF\s?EN)?\s?ISO\s?\d{3,6}(:?\d{1,4})?-?\d?
See its demo at regex101.com.
Note:
Oracle regex does not support shorthand character classes inside brackets. In a bracket expression the backslash is literal, so [\s] matches a backslash or the letter s, not whitespace. Use \s outside brackets, or [[:space:]] inside them.
{0,1} is equivalent to ? (zero or one occurrence).
Non-capturing groups, (?:...), are not supported; replace them with capturing groups. (Note that (:? is not a non-capturing group: it is just an optional colon at the start of the second capturing group in the pattern.)
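The notes above can be sanity-checked outside the database. The sketch below uses Python's re module, which is not Oracle's regex engine, so it only approximates the behaviour. The class [\\s] is written with an escaped backslash on purpose, to mimic how a POSIX bracket expression reads [\s] (a literal backslash or the letter s). All variable names are illustrative.

```python
import re

text = "VAS H M1582/950-80 ABCDFEF - ISO4014"

# Simplified pattern from the answer: no (?:...) groups, no [\s] classes.
simplified = r"(NF\s?EN)?\s?ISO\s?\d{3,6}(:?\d{1,4})?-?\d?"
match = re.search(simplified, text)
# The optional \s? before ISO absorbs the preceding space, so strip it.
result = match.group(0).strip()
print(result)

# Mimic the POSIX reading of [\s]: a set holding a backslash and the letter "s".
posix_like_class = re.compile(r"[\\s]")
print(posix_like_class.search(" "))  # None: a space is not in that set
print(posix_like_class.search("s"))  # matches the letter s
```

If the leading space matters in your query, dropping the \s? in front of ISO is one way to keep it out of the match.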

You can use my XT_REGEXP for PCRE compatible regular expressions: https://github.com/xtender/XT_REGEXP
select *
from
table(xt_regexp.get_matches(
'VAS H M1582/950-80 ABCDFEF - ISO4014',
'((?:NF\s{0,1}EN){0,1}[\s]{0,1}ISO[\s]{0,1}[\d]{3,6}(?:[\:]{0,1}\d{1,4}){0,1}[\-]{0,1}\d{0,1})'
));
Results:
COLUMN_VALUE
------------------------------
ISO4014
1 row selected.

Related

Extract string between different special symbols

I have the following string in my query
.\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt
It begins with a period. I need to extract the segment between the final \ and the period before the file extension, so the expected result is
ABC__123_123_123_ABC123
I'm fairly new to REGEXP and couldn't find an elegant (or workable) solution in the Q&A here or elsewhere. The pattern is the same in quantity and order in all queries, but to grow my knowledge I'd prefer not to just count and cut.
You can use REGEXP_REPLACE function such as
REGEXP_REPLACE(col,'(.*\\)(.*)\.(.*)','\2')
in order to extract the piece from the last backslash up to the last dot. The backslashes in \\ and \. are escape characters: they make the regex treat \ and . as literal characters rather than as special ones.
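As a rough cross-check of how the three groups divide the string, here is the same pattern run through Python's re.sub. This is only a sketch: Python's engine is not Oracle's, and the variable names are made up for the example.

```python
import re

path = r".\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt"

# Group 1: everything up to the last backslash (greedy .*).
# Group 2: the file name up to the last dot. Group 3: the extension.
pattern = r"(.*\\)(.*)\.(.*)"

# The whole string matches, so replacing it with \2 leaves only group 2.
name = re.sub(pattern, r"\2", path)
print(name)
```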
You just need regexp_substr and a simple regexp: ([^\]+)\.[^.]*$
select
regexp_substr(
'.\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt',
'([^\]+)\.[^.]*$',
1, -- position
1, -- occurrence
null, -- match_parameter
1 -- subexpr
) substring
from dual;
([^\]+)\.[^.]*$ means:
([^\]+) - one or more (+) characters other than a backslash ([] is a set, ^ negates it), captured as group \1 (subexpression #1)
\. - a literal dot (. is a special character that means any character, so it has to be escaped with \, the escape character)
[^.]* - zero or more characters other than .
$ - end of the string
So this regexp finds a substring that consists of one or more non-backslash characters, followed by a dot, followed by zero or more non-dot characters, and that sits at the end of the string. The subexpr parameter = 1 tells Oracle to return the first subexpression (i.e. the first group matched in (...)).
The other parameters are described in the documentation.
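The breakdown above can be tried out with Python's re module as an approximation (it is not Oracle's engine). One difference to note: Python requires the backslash inside the set to be escaped, so the set is written [^\\]. The names path and file_stem are just for illustration.

```python
import re

path = r".\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt"

# One or more non-backslash characters (captured), a literal dot,
# then zero or more non-dot characters up to the end of the string.
pattern = re.compile(r"([^\\]+)\.[^.]*$")

match = pattern.search(path)
# group(1) plays the role of subexpr = 1 in regexp_substr.
file_stem = match.group(1)
print(file_stem)
```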
Here is a simple example that is compatible with Oracle 11g R2, PCRE2 and some other regex flavors.
Oracle 11g R2, using the regexp_substr function (see the reference documentation):
select
regexp_substr(
'.\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt',
'((\w)+(_){2}(((\d){3}(_)){3}){1}((\w)+(\d)+){1}){1}',
1,
1
) substring
from dual;
Pattern: ((\w)+(_){2}(((\d){3}(_)){3}){1}((\w)+(\d)+){1}){1}
Result: ABC__123_123_123_ABC123
This is about as simple as it gets. The pattern sticks to commonly supported regex syntax, so it is also portable, in case someone else wants to go the simplest way.
Hopefully, this will help you out!

How to display all numbers only in oracle sql?

How can I display only the all-numeric values in Oracle SQL?
Example:
12345
abcdef
6789
123abc
abc1234
987
Result:
12345
6789
987
Tried this syntax but it doesn’t work:
WHERE ID_No like '%[0-9]%'
One option would be to use reverse logic through the REGEXP_LIKE() function with the '[^0-9]' pattern, i.e. to reject every value that contains a non-digit character
SELECT *
FROM t
WHERE NOT REGEXP_LIKE(ID_No, '[^0-9]')
or, using the POSIX character class [:digit:], with the pattern [^[:digit:]]
SELECT *
FROM t
WHERE NOT REGEXP_LIKE(ID_No, '[^[:digit:]]')
The fastest solution uses standard (non-regular-expression) functions.
select *
from t
where translate(id_no, '~0123456789', '~') is null
;
Note the use of an additional character (I used ~ but you could use any other non-digit character) - this is needed due to Oracle's bizarre specification of TRANSLATE when the third argument is null.
translate will replace every digit with "nothing" (meaning, it will remove them all), while replacing tilde with itself and leaving all other characters untouched. So, the return value is null only if all the characters were digits.
Equivalent solution:
...
where ltrim(id_no, '0123456789') is null
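The logic of the TRANSLATE and LTRIM tricks can be sketched in Python on the sample data from the question. This is only an analogy: Python has no "empty string is NULL" rule, so the comparison with "" stands in for Oracle's IS NULL test, and the function names are hypothetical.

```python
DIGITS = "0123456789"
rows = ["12345", "abcdef", "6789", "123abc", "abc1234", "987"]


def only_digits_translate(value: str) -> bool:
    # Remove every digit; nothing is left only if all characters were digits.
    return value.translate(str.maketrans("", "", DIGITS)) == ""


def only_digits_ltrim(value: str) -> bool:
    # Strip leading digits; nothing is left only if the whole value was digits.
    return value.lstrip(DIGITS) == ""


by_translate = [r for r in rows if only_digits_translate(r)]
by_ltrim = [r for r in rows if only_digits_ltrim(r)]
print(by_translate)
print(by_ltrim)
```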
Use REGEXP_LIKE() function:
select * from tablename
where REGEXP_LIKE(ID_No, '^[0-9]+$')
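The two regular-expression approaches in this thread (rejecting any value that contains a non-digit, and requiring digits from start to end) select the same rows on the sample data. A minimal Python sketch, assuming Python's re as a stand-in for Oracle's engine; re.fullmatch is used in place of the ^...$ anchors:

```python
import re

rows = ["12345", "abcdef", "6789", "123abc", "abc1234", "987"]

# Reverse logic: keep a value if no non-digit character can be found in it.
no_non_digit = [r for r in rows if re.search(r"[^0-9]", r) is None]

# Direct logic: keep a value if it is one or more digits from start to end.
all_digits = [r for r in rows if re.fullmatch(r"[0-9]+", r) is not None]

print(no_non_digit)
print(all_digits)
```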
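Yet another option, if you are on Oracle 12.2 or higher, is to use TO_NUMBER with the conversion error clause as follows. Note that this accepts anything TO_NUMBER can parse, including signed values such as '-1', not only strings of digits: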
SELECT *
FROM YOUR_TABLE
WHERE TO_NUMBER(ID_NO DEFAULT -1 ON CONVERSION ERROR) <> - 1
OR ID_NO = '-1'
Cheers!!

Postgresql: Extracting substring after first instance of delimiter

I'm trying to extract everything after the first instance of a delimiter.
For example:
01443-30413 -> 30413
1221-935-5801 -> 935-5801
I have tried the following queries:
select regexp_replace(car_id, E'-.*', '') from schema.table_name;
select reverse(split_part(reverse(car_id), '-', 1)) from schema.table_name;
However both of them return:
01443-30413 -> 30413
1221-935-5801 -> 5801
So it's not working if delimiter appears multiple times.
I'm using Postgresql 11. I come from a MySQL background where you can do:
select SUBSTRING(car_id FROM (LOCATE('-',car_id)+1)) from table_name
Why not just do the PG equivalent of your MySQL approach and substring it?
SELECT SUBSTRING('abcdef-ghi' FROM POSITION('-' in 'abcdef-ghi') + 1)
If you don't like the "from" and "in" way of writing arguments, PG also has "normal" comma separated functions:
SELECT SUBSTR('abcdef-ghi', STRPOS('abcdef-ghi', '-') + 1)
I think that regexp_replace is appropriate, but using the correct pattern:
select regexp_replace('1221-935-5801', E'^[^-]+-', '');
935-5801
The regex pattern ^[^-]+- matches, from the start of the string, one or more non dash characters, ending with a dash. It then replaces with empty string, effectively removing this content.
Note that this approach also works if the input has no dashes at all, in which case it would just return the original input.
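To illustrate both points (only the first dash-delimited prefix is removed, and input without a dash comes back unchanged), here is the same pattern in Python. It is a sketch using Python's re rather than PostgreSQL's regex engine, and after_first_dash is a hypothetical helper name.

```python
import re


def after_first_dash(value: str) -> str:
    # Remove one leading run of non-dash characters plus the dash that ends it.
    return re.sub(r"^[^-]+-", "", value, count=1)


for car_id in ["01443-30413", "1221-935-5801", "nodash"]:
    print(car_id, "->", after_first_dash(car_id))
```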
Use this regexp pattern:
select regexp_replace(car_id, E'^[^-]+-', '') from schema.table_name
Regexp explanation:
^ is the beginning of the string
[^-]+ means at least one character other than -
...until the - character is met
I did it the conventional way, as we usually would with instr; its PostgreSQL counterpart is strpos. You can try the following:
SELECT
SUBSTR(car_id, strpos(car_id, '-') + 1,
length(car_id)) from table_name;

Regex in Postgres to extract full DN in OpenLDAP

I have a program that passes a full string of the groups of a user in OpenLDAP to a Postgres query. The string looks exactly like this:
( 'cn=user1,ou=org1,ou=suborg1,o=myorg','cn=user2,ou=org2,ou=suborg1,o=myorg','cn=user3,ou=org1,ou=suborg1,o=myorg','cn=user4,ou=org1,ou=suborg2,o=myorg' )
In the query, I want to reduce that to this in Postgres:
'user1','user3'
Basically, extract the value of cn= when the rest of the string is ou=org1,ou=suborg1,o=myorg.
user2 has ou=org2,ou=suborg1,o=myorg, which is org2, so it won't match.
user4 won't match because of suborg2, and so on. The variations are unlimited, so I want to look for an exact match of ou=org1,ou=suborg1,o=myorg only.
I know how to do this with replace, but it can't handle unlimited scenarios. Is there a clean way to do this with regexp_replace or regexp_extract?
Probably the cleanest is by using SUBSTRING that can return just the captured substring:
SELECT SUBSTRING(strs FROM 'cn=([^,]+),ou=org1,ou=suborg1,o=myorg') FROM tb1;
Here, you match cn=, then capture into Group 1 any one or more chars other than , with the negated bracket expression [^,]+ and then match ,ou=org1,ou=suborg1,o=myorg to make sure there is your required right-hand context.
Else, you may try a REGEXP_REPLACE approach, but it will leave the values where no match is found intact:
SELECT REGEXP_REPLACE(strs, '.*cn=([^,]+),ou=org1,ou=suborg1,o=myorg.*', '\1') from tb1;
It matches any 0+ chars with .*, then cn=, again captures the non-comma chars into Group 1 and then matches ,ou=org1,ou=suborg1,o=myorg and 0+ chars to the end of the string.
See an online PostgreSQL demo:
CREATE TABLE tb1
(strs character varying)
;
INSERT INTO tb1
(strs)
VALUES
('cn=user1,ou=org1,ou=suborg1,o=myorg'),
('cn=user2,ou=org2,ou=suborg1,o=myorg'),
('cn=user3,ou=org1,ou=suborg1,o=myorg'),
('cn=user4,ou=org1,ou=suborg2,o=myorg')
;
SELECT REGEXP_REPLACE(strs, '.*cn=([^,]+),ou=org1,ou=suborg1,o=myorg.*', '\1') from tb1;
SELECT substring(strs from 'cn=([^,]+),ou=org1,ou=suborg1,o=myorg') from tb1;
Results:
Note that you can use the very useful word boundary construct \y (see Table 9.20. Regular Expression Constraint Escapes) if you do not want cn= to match inside ocn=:
'.*\ycn=([^,]+),ou=org1,ou=suborg1,o=myorg\y.*'
   ^^                                     ^^
You can use regexp_matches() to get all matching cn. Then use string_agg() to build a comma separated list of them.
SELECT string_agg(ldap.cn[1],
',') cn
FROM regexp_matches('( ''cn=user1,ou=org1,ou=suborg1,o=myorg'',''cn=user2,ou=org2,ou=suborg1,o=myorg'',''cn=user3,ou=org1,ou=suborg1,o=myorg'',''cn=user4,ou=org1,ou=suborg2,o=myorg'' )',
'''cn=([^,]*),ou=org1,ou=suborg1,o=myorg''',
'g') ldap(cn);
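The regexp_matches() plus string_agg() idea can be sketched in Python to see what the pattern captures from the full input string. This assumes Python's re as a stand-in for PostgreSQL's engine; re.findall plays the role of regexp_matches with the 'g' flag and str.join the role of string_agg. The variable names are illustrative.

```python
import re

groups = (
    "( 'cn=user1,ou=org1,ou=suborg1,o=myorg',"
    "'cn=user2,ou=org2,ou=suborg1,o=myorg',"
    "'cn=user3,ou=org1,ou=suborg1,o=myorg',"
    "'cn=user4,ou=org1,ou=suborg2,o=myorg' )"
)

# The quotes pin the match to a whole DN; the group captures only the cn value.
pattern = r"'cn=([^,]*),ou=org1,ou=suborg1,o=myorg'"

# findall returns the captured group of every match in the string.
cns = re.findall(pattern, groups)
joined = ",".join(cns)
print(joined)
```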
Try regex: (?<=cn=)\w+(?=,ou=org1,ou=suborg1,o=myorg)

Regular Expression in oracle 11g how to retrieve data from a column

I have the following data stored in the database column xyz:
a=222223333;b=433333657675457;c=77777;
a=52424252424242;d=5353535353;b=7373737373;
There is no requirement that b value should always be there but if b value is present I have to retrieve the value following b=.
I want to retrieve the value of b using regular expressions in Oracle and I am unable to do it.
Can anyone please help me find a solution for this?
I suggest using Oracle's built-in function REGEXP_SUBSTR, which returns the substring that matches a regular expression. For the example you posted, the following should work; note that it returns the whole match, including the leading b= and the trailing ;.
SELECT REGEXP_SUBSTR(xyz, 'b=\d+;') FROM your_table
You can use regexp_substr:
select substr(regexp_substr(';' || xyz, ';b=\d+'), 4) from your_table;
Prepending ; ensures that the key b is not confused with a key that merely ends in b, such as ab.
Use a Subexpression to Select a Substring of Your REGEXP_SUBSTR Matching Pattern
My match pattern, 'b=(\d+);', includes parentheses that mark a subexpression; the last parameter of REGEXP_SUBSTR selects which subexpression to return.
If you look at the 12c documentation, you will see that the third example uses a subexpression.
The escaped d is regular expression shorthand for a digit, and the plus sign is a quantifier meaning one or more.
WITH smple AS (
SELECT
'a=52424252424242;d=5353535353;b=7373737373;' dta
FROM
dual
) SELECT
dta,
regexp_substr(a.dta,'b=(\d+);',1,1,NULL,1) subexp
FROM
smple a;
DTA subexp
---------------------------------------------------------
a=52424252424242;d=5353535353;b=7373737373; 7373737373
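The above solution works when the value of b is numeric. If b can contain alphanumeric characters, \d+ will not match them, and a pattern such as 'b=([^;]+);' would be needed instead.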
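The two ideas from the answers (prepending ; so that only a key named exactly b matches, and using a capture group to return just the value) can be combined. The following is a minimal Python sketch of that combination, not Oracle code: it uses Python's re module, and value_of_b is a hypothetical helper name.

```python
import re
from typing import Optional


def value_of_b(xyz: str) -> Optional[str]:
    # Prepend ";" so that "b=" only matches at the start of a key-value pair.
    match = re.search(r";b=(\d+);", ";" + xyz)
    # group(1) is the captured value, like subexpr = 1 in REGEXP_SUBSTR.
    return match.group(1) if match else None


print(value_of_b("a=222223333;b=433333657675457;c=77777;"))
print(value_of_b("a=52424252424242;d=5353535353;b=7373737373;"))
print(value_of_b("ab=1;c=2;"))  # no key named b, so None
```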