How to use REGEXP_SUBSTR - sql

I need to extract a substring from a long string. I tried the following query, but it doesn't work; it returns NULL.
I want to extract the first value 12 between the <cc> and </cc>
select regexp_substr('<CC>3</CC><CN>ROSSI</CN><NO>MARIO</NO><IN>VIA DELLE MIMOSE 4</IN>','<CN>[^</CN>]*')
"REGEXPR_SUBSTR"
FROM DUAL;
I get <CN>ROSSI as a result, but I also want to eliminate the <CN>. Any suggestions?

Don't use a regular expression to parse XML data; use a proper XML parser:
SELECT t.*
FROM XMLTABLE(
'/root'
PASSING XMLTYPE(
'<root>'
|| '<CC>3</CC><CN>ROSSI</CN><NO>MARIO</NO><IN>VIA DELLE MIMOSE 4</IN>'
|| '</root>'
)
COLUMNS
cc NUMBER PATH './CC',
cn VARCHAR2(20) PATH './CN',
no VARCHAR2(20) PATH './NO',
"IN" VARCHAR2(50) PATH './IN'
) t
Which outputs:
CC | CN | NO | IN
-: | :---- | :---- | :-----------------
3 | ROSSI | MARIO | VIA DELLE MIMOSE 4

You may get ROSSI using
select regexp_substr('<CC>3</CC><CN>ROSSI</CN><NO>MARIO</NO><IN>VIA DELLE MIMOSE 4</IN>','<CN>([^<]*)</CN>', 1, 1, NULL, 1)
The <CN>([^<]*)</CN> regex matches <CN>, then captures into Group 1 any zero or more chars other than < and then matches </CN>. Only the captured part is returned due to the last argument 1.

Use a subexpression (a matching group enclosed in parentheses) to grab what you want:
SELECT REGEXP_SUBSTR('<CC>3</CC><CN>ROSSI</CN><NO>MARIO</NO><IN>VIA DELLE MIMOSE 4</IN>',
'<CN>(.*?)</CN>', 1, 1, NULL, 1)
FROM DUAL;
Here we're telling REGEXP_SUBSTR that we want to match a string which begins with <CN>, is followed by a subexpression of any number of any characters matched non-greedily ((.*?)), and ends when </CN> is found. Because there's only a single subexpression in the regular expression, it's subexpression number 1, which is what the last parameter passed to REGEXP_SUBSTR above selects.
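The same subexpression technique can be repeated per tag when more than one value is needed. This is only a sketch: the column aliases and the inline view name xml_text are made up for illustration, and it assumes each tag appears once and contains no nested markup (for anything more complex, the XML parser approach is safer).

```sql
-- Pull two different tags out of the same string using capture group 1.
-- xml_text, cc and cn are illustrative names, not from the original query.
SELECT REGEXP_SUBSTR(xml_text, '<CC>([^<]*)</CC>', 1, 1, NULL, 1) AS cc,
       REGEXP_SUBSTR(xml_text, '<CN>([^<]*)</CN>', 1, 1, NULL, 1) AS cn
FROM (
  SELECT '<CC>3</CC><CN>ROSSI</CN><NO>MARIO</NO><IN>VIA DELLE MIMOSE 4</IN>' AS xml_text
  FROM DUAL
);
```

REGEXP_SUBSTR returns a string, so cc comes back as the text 3; wrap it in TO_NUMBER if a numeric value is required.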

Related

Merging tags to values separated by new line character in Oracle SQL

I have a database field with several values separated by newline.
Eg-(can be more than 3 also)
A
B
C
I want to modify these values by adding tags at the front and end,
i.e. the previous 3 values should be turned into
<Test>A</Test>
<Test>B</Test>
<Test>C</Test>
Is there any possible query operation in Oracle SQL to perform such an operation?
Just replace the start and end of each string with the XML tags using a multi-line match parameter of the regular expression:
SELECT REGEXP_REPLACE(
REGEXP_REPLACE( value, '^', '<Test>', 1, 0, 'm' ),
'$', '</Test>', 1, 0, 'm'
) AS replaced_value
FROM table_name;
Which, for the sample data:
CREATE TABLE table_name ( value ) AS
SELECT 'A
B
C' FROM DUAL;
Outputs:
| REPLACED_VALUE |
| :------------- |
| <Test>A</Test> |
| <Test>B</Test> |
| <Test>C</Test> |
You can use the normal REPLACE function as follows:
Select '<Test>'
|| replace(your_column,chr(10),'</Test>'||chr(10)||'<Test>')
|| '</Test>'
From your_table;
This will typically be faster than the REGEXP_REPLACE version.

How to extract a string with several lines between brackets in oracle sql query

I am trying to extract a value between the brackets from a string.
Here (How to extract a string between brackets in oracle sql query), it is explained how to do this.
But in my situation the string spans 2 lines, and this way I only get NULL.
SELECT REGEXP_SUBSTR('Gupta, Abha (01792)', '\((.+)\)', 1, 1, NULL, 1) FROM dual --01792
SELECT REGEXP_SUBSTR('Gupta, Abha (01
792)', '\((.+)\)', 1, 1, NULL, 1) FROM dual -- NULL
I know that I can remove the line break and then use REGEXP_SUBSTR, but I need to keep the line break.
I would address this with the following regex:
\(([^)]*)\)
This makes use of a custom character class, [^)], which means: everything but a closing parenthesis. This way, you do not have to worry about line breaks (since, obviously, a line break is not a closing parenthesis), or any other special character:
Demo on DB Fiddle:
SELECT REGEXP_SUBSTR('Gupta, Abha (01
792)', '\(([^)]*)\)') res FROM dual
| RES |
| :------------------- |
| (01 |
| 792) |
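The query above returns the whole match, brackets included. If only the text inside the brackets is wanted, the same pattern can be combined with the subexpression argument already used in the question. A minimal sketch, assuming a version of Oracle whose REGEXP_SUBSTR accepts the sixth argument; the second column shows the alternative of keeping the original .+ pattern and adding the 'n' match parameter, which lets the dot match a newline. The column aliases and the inline view column str are illustrative.

```sql
-- Both expressions return the bracket contents with the line break preserved.
SELECT REGEXP_SUBSTR(str, '\(([^)]*)\)', 1, 1, NULL, 1) AS via_char_class,
       REGEXP_SUBSTR(str, '\((.+)\)',    1, 1, 'n',  1) AS via_n_flag
FROM (
  -- CHR(10) stands in for the literal line break in the sample string
  SELECT 'Gupta, Abha (01' || CHR(10) || '792)' AS str FROM dual
);
```

The two differ when a string contains more than one bracketed group: [^)]* stops at the first closing parenthesis, while the greedy .+ runs to the last one.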

PostgreSQL - Extract string before ending delimiter

I have a column of data that looks like this:
58,0:102,56.00
52,0:58,68
58,110
57,440.00
52,0:58,0:106,6105.95
I need to extract the value immediately before the last delimiter (',').
Using the data above, I want to get:
102
58
58
57
106
Might be done with a regular expression in substring(). If you want:
the longest string of only digits before the last comma:
substring(data, '(\d+)\,[^,]*$')
Or you may want:
the string before the last comma (',') that's delimited at the start either by a colon (':') or the start of the string.
Could be another regexp:
substring(data, '([^:]*)\,[^,]*$')
Or this:
reverse(split_part(split_part(reverse(data), ',', 2), ':', 1))
More verbose but typically much faster than an (expensive) regular expression.
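To compare the expressions side by side, they can be run against the sample rows from the question. This is a sketch for PostgreSQL with the default standard_conforming_strings setting; the VALUES list and the column aliases are only there for illustration.

```sql
-- Each column should yield 102, 58, 58, 57, 106 for the five sample rows.
SELECT data,
       substring(data, '(\d+),[^,]*$')   AS digits_before_last_comma,
       substring(data, '([^:]*),[^,]*$') AS text_before_last_comma,
       reverse(split_part(split_part(reverse(data), ',', 2), ':', 1)) AS via_split_part
FROM (VALUES
        ('58,0:102,56.00'),
        ('52,0:58,68'),
        ('58,110'),
        ('57,440.00'),
        ('52,0:58,0:106,6105.95')
     ) AS t(data);
```

The first two expressions only differ on data where the part before the last comma contains non-digits, which the sample does not include.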
Can't promise this is the best way to do it, but it is a way to do it:
with splits as (
select string_to_array(bar, ',') as bar_array
from foo
),
second_to_last as (
select
bar_array[cardinality(bar_array)-1] as field
from splits
)
select
field,
case
when field like '%:%' then split_part (field, ':', 2)
else field
end as last_item
from second_to_last
I went a little overkill on the CTEs, but that was to expose the logic a little better.
With a CTE that removes everything after the last comma and then splits the rest into an array:
with cte as (
select
regexp_split_to_array(
replace(left(col, length(col) - position(',' in reverse(col))), ':', ','),
','
) arr
from tablename
)
select arr[array_upper(arr, 1)] from cte
Results:
| result |
| ------ |
| 102 |
| 58 |
| 58 |
| 57 |
| 106 |
The following treats the source string as an "array of arrays". It seems each data element can be defined as S(x,y) and the overall string as S1:S2:...Sn.
The task then becomes to extract x from Sn.
with as_array as
( select string_to_array(S[n], ',') Sn
from (select string_to_array(col,':') S
, length(regexp_replace(col, '[^:]','','g'))+1 n
from tablename
) t
)
select Sn[array_length(Sn,1)-1] from as_array
The above also extends S(x,y) to S(a,b,...,x,y); the task remains extracting x from Sn. If all original substrings S are formatted as S(x,y), then the last select reduces to select Sn[1].

Non-greedy Oracle SQL regexp_replace [duplicate]

I'm having some issues dealing with the non-greedy regex operator in Oracle.
This seems to work:
select regexp_replace('abcc', '^ab.*?c', 'Z') from dual;
-- output: Zc (does not show greedy behavior)
while this does not:
select regexp_replace('abc:"123", def:"456", hji="789", dasdjaoijdsa', '(^.*def:")(.*?)(".*$)', '\2') from dual;
-- output: 456", hji="789 (shows greedy behavior)
-- I would expect 456 as output.
Is there something glaringly obvious that I may be missing here?
Thanks
You can use a non-greedy regular expression in REGEXP_SUBSTR:
SELECT REGEXP_SUBSTR(
'abc:"123", def:"456", hji="789", dasdjaoijdsa', -- input
'def:"(.*?)"', -- pattern
1, -- start character
1, -- occurrence
NULL, -- flags
1 -- capture group
) AS def
FROM DUAL;
Results:
| DEF |
|-----|
| 456 |
If you want to skip escaped quotation marks then you can use:
SELECT REGEXP_SUBSTR(
'abc:"123", def:"456\"Test\"", hji="789", dasdjaoijdsa',
'def:"((\\"|[^"])*)"',
1,
1,
NULL,
1
) AS def
FROM DUAL;
Results:
| DEF |
|-------------|
| 456\"Test\" |
Update:
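You can get your query to work by making the first wild-card match non-greedy. Oracle appears to apply the greediness of the first quantifier in a pattern to the quantifiers that follow it, so the leading greedy .* makes the later (.*?) behave greedily as well: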
select regexp_replace(
'abc:"123", def:"456", hji="789", dasdjaoijdsa',
'(^.*?def:")(.*?)(".*$)',
'\2'
) AS def
FROM DUAL;
Results:
| DEF |
|-----|
| 456 |
I don't know exactly why your regex replace is failing, but I can offer a version of your query which is working:
select
regexp_replace('abc:"123", def:"456", hji="789", dasdjaoijdsa',
'^(.*def:")([^"]*).*',
'\2') from dual
The only explanation I have is that lazy dot isn't working, at least not in the context of the capture group. When I switch ([^"]*) above to (.*?), the query will fail.

PL/SQL - Trimming a string procedurally

I need a way to trim a string in PL/SQL based on the location of the last comma in the string. However, there is no uniform format for the incoming strings, and I can't find a way to trim the string effectively.
HU-15-02 | HU, NYI, HAA East (should be trimmed to just HAA East)
MX-01-05 | MX, 01-05, OFFICES (OFFICES)
DK-94-02 | DK, ViewCom (VIEWCOM)
The format is a country code, followed by a building ID (if applicable), followed by the name of the building (which is what I want).
Get the location of the last comma by counting back from the end of the string, then take everything after the comma and the space that follows it:
select substr(your_text,INSTR(your_text,',',-1) +2)
from your_table;
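One caveat with the expression above: if a string contains no comma, INSTR returns 0 and SUBSTR(your_text, 2) silently drops the first character. A slightly more defensive sketch, assuming the building name should simply be whatever follows the last comma with surrounding spaces trimmed, is shown below. The inline sample rows, including the comma-free 'NoCommaHere' row, are made up for illustration.

```sql
-- With no comma, INSTR(...) + 1 is 1, so the whole string is returned.
WITH your_table (your_text) AS (
  SELECT 'HU, NYI, HAA East'  FROM DUAL UNION ALL
  SELECT 'MX, 01-05, OFFICES' FROM DUAL UNION ALL
  SELECT 'DK, ViewCom'        FROM DUAL UNION ALL
  SELECT 'NoCommaHere'        FROM DUAL  -- hypothetical row without a comma
)
SELECT your_text,
       TRIM(SUBSTR(your_text, INSTR(your_text, ',', -1) + 1)) AS bldg_name
FROM your_table;
```

TRIM also copes with strings that have no space, or several spaces, after the last comma.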
REGEXP_SUBSTR() to the rescue:
SQL> with tbl(str) as (
select 'HU, NYI, HAA East' from dual
union
select 'MX-01-05 | MX, 01-05, OFFICES' from dual
union
select 'DK-94-02 | DK, ViewCom' from dual
)
select regexp_substr(str, '^.*, (.*)$', 1, 1, null, 1) bldg_name
from tbl;
BLDG_NAME
-----------------------------
ViewCom
HAA East
OFFICES
SQL>