Oracle REGEXP_SUBSTR from CLOB

I'm trying to find a substring from a CLOB-field in my database.
Consider the following string:
someothertext 2. Grad Dekubitus (Druckgeschwür) mit
Abschürfung/Blase/Hautverlust someothertext
I only want to extract "2. Grad" from the string, but my regular expression doesn't seem to work, even though it does match when I test it against the same string in online regex checkers.
This is my regular expression:
REGEXP_SUBSTR(DBMS_LOB.SUBSTR(cf.TEXT, 4000), '\b[0-9]\.\sGrad$') AS "Grad"
Currently, it returns NULL, but I'm not sure why.
Any ideas on how to get this working?

Oracle does not support word boundaries \b in regular expressions.
Either remove the \b or replace it with (^|\s) if you are expecting white space before the digit.
You also need to remove the trailing $ as you are not trying to match the end of the string at that point.
REGEXP_SUBSTR( DBMS_LOB.SUBSTR(cf.TEXT, 4000), '(^|\s)[0-9]\.\sGrad' ) AS "Grad"
Also, if you can have multi-digit numbers then you may want to use [0-9]+.
If you do not want the leading white space then you can wrap the second part of your expression in a capturing group and then extract that capturing group's value with the 6th argument of REGEXP_SUBSTR:
REGEXP_SUBSTR(
DBMS_LOB.SUBSTR(cf.TEXT, 4000),
'(^|\s)([0-9]\.\sGrad)',
1, -- Start from the 1st character
1, -- Find the 1st occurrence
NULL, -- No flags
2 -- Return the 2nd capturing group
) AS "Grad"
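To check the capture-group logic outside the database, here is a minimal sketch in Python. It assumes Python's re module as a stand-in, which is not Oracle's regex engine (it does support \b, for instance), so treat it only as a check of the pattern itself. The name extract_grad is made up for illustration, and [0-9]+ is used to allow multi-digit numbers.

```python
import re

# Same shape as the Oracle pattern: group 1 is start-of-string or whitespace,
# group 2 is the part we actually want to return.
GRAD_PATTERN = re.compile(r"(^|\s)([0-9]+\.\sGrad)")


def extract_grad(text):
    """Return the '<n>. Grad' part of text, or None if it is absent."""
    match = GRAD_PATTERN.search(text)
    # Group 2 excludes the leading whitespace captured by group 1.
    return match.group(2) if match else None


sample = ("someothertext 2. Grad Dekubitus (Druckgeschwür) mit "
          "Abschürfung/Blase/Hautverlust someothertext")
print(extract_grad(sample))
```

Keeping the trailing $ from the question would anchor "Grad" to the end of the string, which is why the original expression found nothing.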

Oracle regex does not support word boundaries. Also, the $ is redundant in your pattern (note you do not use it in your regex demo).
You can use
REGEXP_SUBSTR(
'someothertext 2. Grad Dekubitus (Druckgeschwür) mit Abschürfung/Blase/Hautverlust someothertext',
'(^|\D)([0-9]\.\sGrad)', 1, 1, NULL, 2
) AS "Grad"
where
(^|\D) - Group 1: start of string or a non-digit
([0-9]\.\sGrad) - Group 2: a digit, a dot, a whitespace character and Grad
If the digit matched with [0-9] should be preceded with whitespace, you may replace (^|\D) with (\s|^).

Related

Get the part of the string up to the second to last occurrence of a character

I have the below strings
'HOLA1_HOLA2_HOLA3_HOLA4'
'HOLA1_HOLA2_HOLA3_HOLA4_HOLA5'
How could I get the part of the string up to the second to last occurrence of the '_' character?
Expected result:
'HOLA1_HOLA2'
'HOLA1_HOLA2_HOLA3'
Use simple (fast) string functions and find the substring up to the second-to-last underscore (rather than using (slow) regular expressions):
SELECT SUBSTR(value, 1, INSTR(value, '_', -1, 2) - 1) AS first_part
FROM table_name;
Which, for the sample data:
CREATE TABLE table_name (value) AS
SELECT 'HOLA1_HOLA2_HOLA3_HOLA4' FROM DUAL UNION ALL
SELECT 'HOLA1_HOLA2_HOLA3_HOLA4_HOLA5' FROM DUAL;
Outputs:
FIRST_PART
HOLA1_HOLA2
HOLA1_HOLA2_HOLA3
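The SUBSTR/INSTR logic can be sketched in Python to make the two steps explicit. This is only an illustration under assumptions: the function name up_to_second_last is hypothetical, the separator is a single character, and None stands in for Oracle's NULL.

```python
def up_to_second_last(value, sep="_"):
    """Mimic SUBSTR(value, 1, INSTR(value, sep, -1, 2) - 1)."""
    # Position of the last separator, searching backwards.
    last = value.rfind(sep)
    if last == -1:
        return None
    # Position of the separator before that one.
    second_last = value.rfind(sep, 0, last)
    if second_last == -1:
        return None
    # Oracle treats an empty string as NULL, so map '' to None as well.
    return value[:second_last] or None


for value in ("HOLA1_HOLA2_HOLA3_HOLA4", "HOLA1_HOLA2_HOLA3_HOLA4_HOLA5"):
    print(up_to_second_last(value))
```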
Regarding a regular expression with this behaviour:
In regex engines that support lookahead (Oracle's regular expressions do not), you can use a lookahead:
.*(?=_)
A lookahead is a zero-length assertion: it is not included in the match, but it requires the match to be followed by the given expression (_ in this case). Because .* is greedy by default, the lookahead lands on the last underscore in the text, so this pattern returns everything up to the last underscore rather than the second-to-last one.
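A small sketch of the lookahead behaviour, using Python's re module because it supports lookahead. The second pattern is my own variation, not part of the answer: it extends the lookahead so that two underscores must still follow, which moves the cut to the second-to-last underscore. Both function names are hypothetical.

```python
import re


def up_to_last_underscore(value):
    """Greedy .* followed by a lookahead for one underscore."""
    match = re.match(r".*(?=_)", value)
    return match.group(0) if match else None


def up_to_second_last_underscore(value):
    """Variation: the lookahead requires '_', more text, and another '_'."""
    match = re.match(r".*(?=_[^_]*_)", value)
    return match.group(0) if match else None


value = "HOLA1_HOLA2_HOLA3_HOLA4"
print(up_to_last_underscore(value))
print(up_to_second_last_underscore(value))
```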

How to extract a text between brackets in oracle sql query

I am trying to extract a value between the brackets from a string.
For example, I have this string:
No information was found [AI1234].
And I want to get the result between the brackets, i.e. AI1234.
However the expression is not always the same. It may vary.
I am trying to write a query like this:
REGEXP_SUBSTR(mssg, '\((.+)\)', 1, 1, NULL, 1) AS "description" from book
But it is not returning anything. What am I missing?
I also tried the following, but the length of the bracketed text may vary, so it returns something, just not what I am looking for:
substr(mssg,instr(mssg,'(')-8,10) as "description"
If you're looking for a group of digits between square brackets, try this:
WITH
indata(msg) AS (
SELECT 'No information was found [1234]' FROM dual
)
SELECT
REGEXP_SUBSTR(
msg -- the string
, '^[^[]+[[](\d+)[]].*$' -- the pattern (with a captured
-- string "\d+" in round parentheses)
, 1 -- start from position 1
, 1 -- first found occurrence
, '' -- no modifiers
, 1 -- first captured group
) AS extr
FROM indata;
extr
------
1234
Try this (Oracle 11g and above):
SELECT REGEXP_SUBSTR(mssg, '\[[^0-9]*(\d+)[^0-9]*\]', 1, 1, NULL, 1) description
FROM book;
UPDATE: This variant also accepts (, { and their closing counterparts as delimiters, but still extracts only the digits:
SELECT REGEXP_SUBSTR('No information was found [{AI1234}].', '[[({][^0-9]*(\d+)[^0-9]*[]})]', 1, 1, NULL, 1) description
FROM dual;
UPDATE: Final solution
SELECT REGEXP_SUBSTR('No information was found [{AI1234}].', '[[({]+([^][)(}{]*)[])}]+', 1, 1, NULL, 1) description
FROM dual;
Here, take care with [^][)(}{].
DO NOT reorder the bracket characters: the ] must come first in the list, directly after the ^.
I'll quote from the Oracle 11g regexp reference:
[ ]
Bracket expression for specifying a matching list that should match any one of the expressions represented in the list. A non-matching list expression begins with a circumflex (^) and specifies a list that matches any character except for the expressions represented in the list.
To specify a right bracket (]) in the bracket expression, place it first in the list (after the initial circumflex (^), if any).
To specify a hyphen in the bracket expression, place it first in the list (after the initial circumflex (^), if any), last in the list, or as an ending range point in a range expression.
This part, [^ ], was a hard nut to crack until I found the solution in the reference, which is why I emphasize it.
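For comparison, here is a sketch of the same matching logic in Python. It is an illustration, not Oracle code: Python's re module allows backslash escapes inside a bracket expression, so the brackets are escaped instead of relying on the "] first" rule that Oracle's POSIX-style bracket expressions need. The name between_brackets is hypothetical.

```python
import re

# Oracle form (POSIX bracket rules):  [[({]+([^][)(}{]*)[])}]+
# Python form (escaped brackets), same structure:
#   one or more openers, then a captured run of non-bracket characters,
#   then one or more closers.
BRACKETED = re.compile(r"[\[({]+([^\]\[)(}{]*)[\])}]+")


def between_brackets(text):
    """Return the text enclosed by the first bracket group, or None."""
    match = BRACKETED.search(text)
    return match.group(1) if match else None


print(between_brackets("No information was found [{AI1234}]."))
```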

Extract string between different special symbols

I am having following string in my query
.\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt
The string begins with a period. I need to extract the segment between the final \ and the period before the file extension, so the expected result is:
ABC__123_123_123_ABC123
I am fairly new to regular expressions and could not find an elegant (or workable) solution in the Q&A here or elsewhere. The pattern is the same in quantity and order in all my queries, but to learn something I would prefer not to just count characters and cut.
You can use REGEXP_REPLACE function such as
REGEXP_REPLACE(col,'(.*\\)(.*)\.(.*)','\2')
in order to extract the piece between the last backslash and the last dot. The backslashes in \\ and \. are escape characters: they make the regex treat the following \ and . as literal characters rather than as special ones.
You just need regexp_substr and a simple regexp: ([^\]+)\.[^.]*$
select
regexp_substr(
'.\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt',
'([^\]+)\.[^.]*$',
1, -- position
1, -- occurrence
null, -- match_parameter
1 -- subexpr
) substring
from dual;
([^\]+)\.[^.]*$ means:
([^\]+) - one or more (+) characters other than a backslash ([] is a set, ^ negates it), captured as group \1 (subexpression #1)
\. - then a literal dot (. is a special character meaning any character, so we need to escape it with \, the escape character)
[^.]* - zero or more characters other than .
$ - end of the string
So this regexp means: find a substring consisting of one or more non-backslash characters, followed by a dot, followed by zero or more non-dot characters, anchored at the end of the string. The subexpr parameter = 1 tells Oracle to return the first subexpression (i.e. the first group matched in (...)).
The other parameters are described in the documentation.
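Both approaches (the REGEXP_REPLACE with \2 and the REGEXP_SUBSTR with subexpression 1) can be checked with a short Python sketch. The assumptions: Python's re module stands in for Oracle's engine, the function names are made up, and the backslash inside the bracket expression has to be doubled in Python ([^\\]) whereas Oracle takes [^\] as written.

```python
import re

path = r".\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt"


def file_stem_substr(p):
    """Capture the non-backslash run before the final dot (group 1)."""
    match = re.search(r"([^\\]+)\.[^.]*$", p)
    return match.group(1) if match else None


def file_stem_replace(p):
    """Replace the whole string with group 2: text between the last
    backslash and the last dot."""
    return re.sub(r"(.*\\)(.*)\.(.*)", r"\2", p)


print(file_stem_substr(path))
print(file_stem_replace(path))
```

One difference worth noting: when the pattern does not match, the substring approach yields no result, while the replace approach returns the input unchanged.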
Here is a simple example that works in Oracle 11g R2, PCRE2 and some other regex flavours.
Oracle 11g R2, using the regexp_substr function:
select
regexp_substr(
'.\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt',
'((\w)+(_){2}(((\d){3}(_)){3}){1}((\w)+(\d)+){1}){1}',
1,
1
) substring
from dual;
Pattern: ((\w)+(_){2}(((\d){3}(_)){3}){1}((\w)+(\d)+){1}){1}
Result: ABC__123_123_123_ABC123
The pattern uses only basic constructs (groups, \w, \d and quantifiers), so it should be portable across engines, in case someone prefers that route. Note that it is tied to this exact file-name layout.
Hopefully, this will help you out!

Use REGEXP_SUBSTR to extract string of varied length

I want to extract alphanumeric text of varied length from a string between the second occurrence of a specific characters.
I have tried various forms of substr and regexp_substr but can't seem to get the syntax right. This is for use in Teradata SQL assistant. In the past I would have to create a temp table and use substr twice before trimming down the string to what I need. I want to do it all in one go.
SELECT regexp_substr('Channel:DF GB, Order Num:12345T6, Order Date:01/01/2019, Charge Codes:TAXES,,GBRAX', 'Num\\:+(\\:+)',1,2, ':') as RESULTING_STRING
My desired result is to return ONLY what is between "Num:" and the next "," in this case "12345T6". The length of the order number can vary so it is not a fixed length. When I run my code the actual output is a '?' returned by Teradata. What am I doing wrong?
This seems to work:
SELECT regexp_substr('Channel:DF GB, Order Num:12345T6, Order Date:01/01/2019, Charge Codes:TAXES,,GBRAX', 'Num:(\w*)', 1, 1, NULL, 1) as RESULTING_STRING from dual
Finds Num: and then captures as many word characters (, is not a word char) as there are available. The last parameter - subexpr - specifies which subexpression (aka capture group) you want, without it the whole thing will be matched (Num:12345T6).
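The difference between the whole match and the capture group can be seen in a minimal Python sketch. Python's re module is assumed here purely as a stand-in for checking the pattern; it is neither Oracle's nor Teradata's engine.

```python
import re

text = ("Channel:DF GB, Order Num:12345T6, Order Date:01/01/2019, "
        "Charge Codes:TAXES,,GBRAX")

match = re.search(r"Num:(\w*)", text)

# Group 0 is the whole match, which is what you get without a subexpression.
whole_match = match.group(0)
# Group 1 is the capture group, i.e. only the order number.
order_number = match.group(1)

print(whole_match)
print(order_number)
```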
Assuming you use Teradata SQL Assistant to query a Teradata system (but then why tag Oracle?), the regex syntax is slightly different, because the two systems use different regex dialects.
Teradata's RegExp_Substr doesn't support the subexpression parameter. You can either switch to the undocumented (I really don't know why) RegExp_Substr_gpl:
RegExp_Substr_gpl(x, 'Num:([^,]*)', 1, 1, 'i', 1)
or tell the RegEx to forget the previous match using \K:
RegExp_Substr(x, 'Num:\K[^,]*', 1,1, 'i')
You can try the pattern search below:
SELECT REGEXP_REPLACE ((REGEXP_SUBSTR('Channel:DF GB, Order Num:12345T6, Order Date:01/01/2019, Charge Codes:TAXES,,GBRAX', 'Num:[A-Za-z0-9]*',1,1, 'i')),'Num:','',1,1,'i') AS RESULTING_STRING
The regexp_substr pattern 'Num:[A-Za-z0-9]*' first extracts 'Num:' together with the alphanumeric characters that follow it; the asterisk matches zero or more occurrences of the preceding character class.
For example, 'Num:12345T6' is extracted from the string provided. Note that the last parameter of regexp_substr is 'i', which makes the search case-insensitive.
Lastly, regexp_replace replaces 'Num:' in the output of regexp_substr with an empty string, leaving '12345T6' as the final string.

SQL regex expression for text before pipe

I need an oracle regex to fetch data before first pipe and after the last slash from the text before pipe.
For example, from the string:
test=file://2019/13/40/9/53/**2abc123-7test-1edf-9xyz-12345678.bin**|type
the data to be fetched is:
2abc123-7test-1edf-9xyz-12345678.bin
This works in Oracle :
select regexp_substr(col,'[^|/]+\.\w+',1,1,'i')
from (
select 'test=file://2019/13/40/9/53/2abc123-7test-1edf-9xyz-12345678.bin|type=app/href|size=1234|encoding=|locale=en_|foo.bar' as col
from dual
) q
MySQL and Teradata also have a REGEXP_SUBSTR function, but I haven't tested this on those.
The pattern ^.+?/([^/]+?)\| starts at the beginning of the string, skips ahead to the last slash before the first pipe, then captures all the non-slash characters between that slash and the pipe.
You may use:
REGEXP_SUBSTR(column, '/([^/|]+)\|', 1, 1, NULL, 1)
Regex breakdown:
/ Match literally
( Start of capturing group #1
[^/|]+ Match anything except slash and pipe, at least one character
) End of CG #1
\| Match a pipe
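The breakdown above can be verified with a short Python sketch. Python's re module is assumed as a stand-in for Oracle's engine, and the function name file_before_pipe is hypothetical.

```python
import re


def file_before_pipe(text):
    """Return the segment between the last slash and the first pipe."""
    # A slash, then a captured run of characters that are neither a slash
    # nor a pipe, then a literal pipe.
    match = re.search(r"/([^/|]+)\|", text)
    return match.group(1) if match else None


sample = ("test=file://2019/13/40/9/53/"
          "2abc123-7test-1edf-9xyz-12345678.bin|type")
print(file_before_pipe(sample))
```

Earlier slashes cannot start a match because another slash always appears before the pipe, so the search settles on the last slash in front of the first pipe.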
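[^\/]*?(?=\|)
[^\/]*? - lazily matches characters that are not a forward slash
(?=\|) - positive lookahead requiring a pipe to follow (lookaheads are not supported by Oracle's regex engine, so this pattern needs an engine that has them)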