Non-greedy Oracle SQL regexp_replace [duplicate]

This question already has answers here:
Why doesn't a non-greedy quantifier sometimes work in Oracle regex?
I'm having some issues dealing with the non-greedy regex operator in Oracle.
This seems to work:
select regexp_replace('abcc', '^ab.*?c', 'Z') from dual;
-- output: Zc (does not show greedy behavior)
while this does not:
select regexp_replace('abc:"123", def:"456", hji="789", dasdjaoijdsa', '(^.*def:")(.*?)(".*$)', '\2') from dual;
-- output: 456", hji="789 (shows greedy behavior)
-- I would expect 456 as output.
Is there something glaringly obvious that I may be missing here?
Thanks

You can use a non-greedy regular expression in REGEXP_SUBSTR:
SELECT REGEXP_SUBSTR(
'abc:"123", def:"456", hji="789", dasdjaoijdsa', -- input
'def:"(.*?)"', -- pattern
1, -- start character
1, -- occurrence
NULL, -- flags
1 -- capture group
) AS def
FROM DUAL;
Results:
| DEF |
|-----|
| 456 |
If you want to skip escaped quotation marks then you can use:
SELECT REGEXP_SUBSTR(
'abc:"123", def:"456\"Test\"", hji="789", dasdjaoijdsa',
'def:"((\\"|[^"])*)"',
1,
1,
NULL,
1
) AS def
FROM DUAL;
Results:
| DEF |
|-------------|
| 456\"Test\" |
Update:
You can get your query to work by making the first wild-card match non-greedy:
select regexp_replace(
'abc:"123", def:"456", hji="789", dasdjaoijdsa',
'(^.*?def:")(.*?)(".*$)',
'\2'
) AS def
FROM DUAL;
Results:
| DEF |
|-----|
| 456 |
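Placing the two patterns side by side may make the difference easier to see. This is only a sketch that reuses the input and the patterns from this thread; the outputs noted in the comments are the ones reported above, not new measurements. That the first quantifier appears to set the greediness for the rest of the pattern is an observation consistent with the linked duplicate, not something confirmed here as documented behaviour.

```sql
-- Same input, two patterns: only the first quantifier differs (.* versus .*?).
WITH t (str) AS (
  SELECT 'abc:"123", def:"456", hji="789", dasdjaoijdsa' FROM dual
)
SELECT regexp_replace(str, '(^.*def:")(.*?)(".*$)',  '\2') AS first_greedy, -- reported above: 456", hji="789
       regexp_replace(str, '(^.*?def:")(.*?)(".*$)', '\2') AS first_lazy    -- reported above: 456
FROM   t;
```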

I don't know exactly why your regex replace is failing, but I can offer a version of your query which is working:
select
regexp_replace('abc:"123", def:"456", hji="789", dasdjaoijdsa',
'^(.*def:")([^"]*).*',
'\2') from dual
The only explanation I have is that lazy dot isn't working, at least not in the context of the capture group. When I switch ([^"]*) above to (.*?), the query will fail.

Related

Merging tags to values separated by new line character in Oracle SQL

I have a database field with several values separated by newline.
For example (there can be more than 3):
A
B
C
I want to modify these values by adding an opening and a closing tag to each one,
i.e. the 3 values above should be turned into
<Test>A</Test>
<Test>B</Test>
<Test>C</Test>
Is there any possible query operation in Oracle SQL to perform such an operation?
Just replace the start and end of each line with the XML tags, using the multi-line match parameter ('m') of the regular expression:
SELECT REGEXP_REPLACE(
REGEXP_REPLACE( value, '^', '<Test>', 1, 0, 'm' ),
'$', '</Test>', 1, 0, 'm'
) AS replaced_value
FROM table_name;
Which, for the sample data:
CREATE TABLE table_name ( value ) AS
SELECT 'A
B
C' FROM DUAL;
Outputs:
| REPLACED_VALUE |
| :------------- |
| <Test>A</Test> |
| <Test>B</Test> |
| <Test>C</Test> |
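The two nested calls could probably be folded into one by capturing each line and wrapping it with a back-reference. The following is a sketch under assumptions: the inline sample data stands in for the table above, lines are separated by CHR(10) only, and an empty line would presumably come out as an empty <Test></Test> pair.

```sql
-- Sketch: wrap every line in a single pass.
-- With 'm', ^ and $ match at line boundaries; without 'n', the dot does not match a newline.
WITH table_name (value) AS (
  SELECT 'A' || CHR(10) || 'B' || CHR(10) || 'C' FROM dual
)
SELECT REGEXP_REPLACE(value, '^(.*)$', '<Test>\1</Test>', 1, 0, 'm') AS replaced_value
FROM   table_name;
```

It is intended to give the same three tagged lines as the nested version for the sample data.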
You can use the plain replace function as follows:
Select '<Test>'
|| replace(your_column,chr(10),'</Test>'||chr(10)||'<Test>')
|| '</Test>'
From your_table;
This is likely to be faster than the regexp_replace version.

How to get first string after character Oracle SQL

I'm trying to get first string after a character.
Example is like
ABCDEF||GHJ||WERT
I need only
GHJ
I tried to use REGEXP but I couldn't do it.
Can anyone help me with this, please?
Thank you
Somewhat simpler:
SQL> select regexp_substr('ABCDEF||GHJ||WERT', '\w+', 1, 2) result from dual;
RES
---
GHJ
SQL>
which reads as: give me the 2nd word out of that string. Won't work properly if GHJ consists of several words (but that's not what your example suggests).
This is how I would do it with a separator in place; in this case it is || (or |). The example is for Oracle Database.
-- pattern --> [^...] matches any character not in the list, and + means one or more of them, i.e. a run of characters that ends at ||
-- 3rd parameter --> starting position
-- 4th parameter --> nth occurrence
WITH tbl(str) AS
(SELECT 'ABCDEF||GHJ||WERT' str FROM dual)
SELECT regexp_substr(str
,'[^||]+'
,1
,2) output
FROM tbl;
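One caveat worth spelling out: [^||]+ is a bracket expression, so it lists single characters and the repeated | adds nothing; it behaves like [^|]+. A minimal sketch of the consequence, using an input with a single bar inside the first element (the column aliases are made up for illustration):

```sql
-- [^||]+ and [^|]+ are the same bracket expression: "one or more characters that are not |".
WITH tbl (str) AS (
  SELECT 'ABC|DEF||GHJ||WERT' FROM dual
)
SELECT regexp_substr(str, '[^||]+', 1, 2) AS with_double_bar, -- DEF, not GHJ
       regexp_substr(str, '[^|]+',  1, 2) AS with_single_bar  -- DEF as well
FROM   tbl;
```

So this approach only gives the element after the first || when no element contains a | of its own.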
I think the most general solution is:
WITH tbl(str) AS (
SELECT 'ABCDEF||GHJ||WERT' str FROM dual UNION ALL
SELECT 'ABC|DEF||GHJ||WERT' str FROM dual UNION ALL
SELECT 'ABClDEF||GHJ||WERT' str FROM dual
)
SELECT regexp_replace(str, '^.*\|\|(.*)\|\|.*', '\1')
FROM tbl;
Note that this works even if the individual elements contain punctuation or a single vertical bar, which the other solutions do not handle.
Presumably, the double vertical bar is being used for maximum flexibility.
You can use the regexp_substr function:
select regexp_substr('ABCDEF||GHJ||WERT ', '\|{2}([^|]+)', 1, 1, 'i', 1) str
from dual;
STR
---
GHJ

How to use REGEX_SUBSTR

I need to extract a substring from a long string. I tried the following query but it doesn't work; it returns NULL.
I want to extract the first value 12 between the <cc> and </cc>
select regexp_substr('<CC>3</CC><CN>ROSSI</CN><NO>MARIO</NO><IN>VIA DELLE MIMOSE 4</IN>','<CN>[^</CN>]*')
"REGEXPR_SUBSTR"
FROM DUAL;
I get <CN>ROSSI as a result, but I also want to eliminate the <CN>. Any suggestions?
Don't use a regular expression to parse XML data; use a proper XML parser:
SELECT t.*
FROM XMLTABLE(
'/root'
PASSING XMLTYPE(
'<root>'
|| '<CC>3</CC><CN>ROSSI</CN><NO>MARIO</NO><IN>VIA DELLE MIMOSE 4</IN>'
|| '</root>'
)
COLUMNS
cc NUMBER PATH './CC',
cn VARCHAR2(20) PATH './CN',
no VARCHAR2(20) PATH './NO',
"IN" VARCHAR2(50) PATH './IN'
) t
Which outputs:
CC | CN | NO | IN
-: | :---- | :---- | :-----------------
3 | ROSSI | MARIO | VIA DELLE MIMOSE 4
You may get ROSSI using
select regexp_substr('<CC>3</CC><CN>ROSSI</CN><NO>MARIO</NO><IN>VIA DELLE MIMOSE 4</IN>','<CN>([^<]*)</CN>', 1, 1, NULL, 1)
from dual;
The <CN>([^<]*)</CN> regex matches <CN>, then captures into Group 1 any zero or more chars other than < and then matches </CN>. Only the captured part is returned due to the last argument 1.
Use a subexpression (a matching group enclosed in parentheses) to grab what you want:
SELECT REGEXP_SUBSTR('<CC>3</CC><CN>ROSSI</CN><NO>MARIO</NO><IN>VIA DELLE MIMOSE 4</IN>',
'<CN>(.*?)</CN>', 1, 1, NULL, 1)
FROM DUAL;
Here we're telling REGEXP_SUBSTR that we want to match a string which begins with <CN>, is followed by a subexpression of as few characters as possible (.*?), and ends when </CN> is found. Because there's only a single subexpression ((.*?)) in the regular expression, it's subexpression number 1, which is indicated by the last parameter passed to REGEXP_SUBSTR above.

Extract Value from a string PostgreSQL

Simple Question
I have the following type of results in a string field
'Number=123456'
'Number=1234567'
'Number=12345678'
How do I extract the value from the string, given that the value can be between 5 and 8 digits long?
So far I did this, but I doubt that it fits my requirement:
SELECT substring('Size' from 8 for ....
If I could tell it to start from the = sign and go to the end, that would help!
Probably the simplest solution is to trim the 7 leading characters with right():
right(str, -7)
Demo:
SELECT str, right(str, -7)
FROM (
VALUES ('Number=123456')
, ('Number=1234567')
, ('Number=12345678')
) t(str);
str | right
-----------------+----------
Number=123456 | 123456
Number=1234567 | 1234567
Number=12345678 | 12345678
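If the prefix length is not fixed, a variant that starts right after the = sign, as the question asks for, could look like the sketch below. It assumes PostgreSQL and that every value contains an = sign; when there is none, position() returns 0 and the whole string comes back unchanged. The alias num is made up for illustration.

```sql
-- Sketch: take everything after the first '=' (PostgreSQL).
SELECT str,
       substring(str FROM position('=' IN str) + 1) AS num
FROM  (
   VALUES ('Number=123456')
        , ('Number=1234567')
        , ('Number=12345678')
   ) t(str);
```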
You could use REPLACE:
SELECT col, REPLACE(col, 'Number=', '')
FROM tab;
Based on this question:
Split comma separated column data into additional columns
You could probably do the following:
SELECT *, split_part(col, '=', 2)
FROM tab;
You may use regexp_matches :
with t(str) as
(
select 'Number=123456' union all
select 'Number=1234567' union all
select 'Number=12345678' union all
select 'Number=12345678x9'
)
select t.str as "String",
regexp_matches(t.str, '=([A-Za-z0-9]+)', 'g') as "Number"
from t;
String Number
-------------- ---------
Number=123456 123456
Number=1234567 1234567
Number=12345678 12345678
Number=12345678x9 12345678x9
--> the last row shows that every alphanumeric character after the equals sign is captured, even non-digits
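Note that regexp_matches returns a set of text arrays, so the captured value arrives wrapped in an array. If a plain text value with digits only is wanted, substring() with a POSIX pattern may be a simpler fit, since it returns the first parenthesized group directly. A sketch, assuming PostgreSQL; the alias digits is made up for illustration:

```sql
-- Sketch: capture digits only and get plain text back instead of an array.
SELECT str,
       substring(str FROM '=([0-9]+)') AS digits
FROM  (
   VALUES ('Number=123456')
        , ('Number=12345678x9')
   ) t(str);
-- For 'Number=12345678x9' the capture stops at the first non-digit, giving 12345678.
```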

Split string by space and character as delimiter in Oracle with regexp_substr

I'm trying to split a string with regexp_substr, but I can't make it work.
So, first, I have this query
select regexp_substr('Helloworld - test!' ,'[[:space:]]-[[:space:]]') from dual
which very nicely extracts my delimiter (blank, hyphen, blank).
But then, when I try to split the string with this option, it just doesn't work.
select regexp_substr('Helloworld - test!' ,'[^[[:space:]]-[[:space:]]]+')from dual
The query returns nothing.
Help will be much appreciated!
Thanks
Oracle 11g R2 Schema Setup:
CREATE TABLE TEST( str ) AS
SELECT 'Hello world - test-test! - test' FROM DUAL
UNION ALL SELECT 'Hello world2 - test2 - test-test2' FROM DUAL;
Query 1:
SELECT Str,
COLUMN_VALUE AS Occurrence,
REGEXP_SUBSTR( str ,'(.*?)([[:space:]]-[[:space:]]|$)', 1, COLUMN_VALUE, NULL, 1 ) AS split_value
FROM TEST,
TABLE(
CAST(
MULTISET(
SELECT LEVEL
FROM DUAL
CONNECT BY LEVEL < REGEXP_COUNT( str ,'(.*?)([[:space:]]-[[:space:]]|$)' )
)
AS SYS.ODCINUMBERLIST
)
)
Results:
| STR | OCCURRENCE | SPLIT_VALUE |
|-----------------------------------|------------|--------------|
| Hello world - test-test! - test | 1 | Hello world |
| Hello world - test-test! - test | 2 | test-test! |
| Hello world - test-test! - test | 3 | test |
| Hello world2 - test2 - test-test2 | 1 | Hello world2 |
| Hello world2 - test2 - test-test2 | 2 | test2 |
| Hello world2 - test2 - test-test2 | 3 | test-test2 |
If I understood correctly, this will help you. Currently you are getting Helloworld with a space at the end as output, so I assume you don't want that trailing space. If so, you can simply include the space in the delimiter as well, like this:
select regexp_substr('Helloworld - test!' ,'[^ - ]+',1,1)from dual;
OUTPUT
Helloworld(No space at the end)
As you mentioned in your comment, if you want two output columns with Helloworld and test!, you can do the following:
select regexp_substr('Helloworld - test!' ,'[^ - ]+',1,1),
regexp_substr('Helloworld - test!' ,'[^ - ]+',1,3) from dual;
OUTPUT
col1 col2
Helloworld test!
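A note on why occurrence 3, not 2, is needed here: [^ - ]+ is a bracket expression, not the literal delimiter ' - '. As far as I can tell, the space-hyphen-space inside it is read as a range from space to space, so the class only excludes spaces and the lone hyphen counts as a piece of its own. The sketch below is one way to check that assumption; the column names are made up for illustration.

```sql
-- Sketch: list the first three pieces that [^ - ]+ finds.
SELECT regexp_substr('Helloworld - test!', '[^ - ]+', 1, 1) AS piece1,
       regexp_substr('Helloworld - test!', '[^ - ]+', 1, 2) AS piece2, -- expected to be the hyphen itself
       regexp_substr('Helloworld - test!', '[^ - ]+', 1, 3) AS piece3
FROM   dual;
```

Either way, a space is excluded by the class, so a value that contains a space, such as 'Hello world', would be split in two by this pattern.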
Trying to negate the match string '[[:space:]]-[[:space:]]' by putting it in a character class with a circumflex (^) will not work. Everything between a pair of square brackets is treated as a list of optional single characters, except for named character classes, which expand out to a list of optional characters. However, due to the way character classes nest, it's very likely that your outer brackets are being interpreted as follows:
[^[[:space:]] A single non space non left square bracket character
- followed by a single hyphen
[[:space:]] followed by a single space character
]+ followed by 1 or more closing square brackets.
It may be easier to convert your multi-character separator to a single character with regexp_replace, then use regexp_substr to find your individual pieces:
select regexp_substr(regexp_replace('Helloworld - test!'
,'[[:space:]]-[[:space:]]'
,chr(11))
,'([^'||chr(11)||']*)('||chr(11)||'|$)'
,1 -- Start here
,2 -- return 1st, 2nd, 3rd, etc. match
,null
,1 -- return 1st sub exp
)
from dual;
In this code I first changed ' - ' to chr(11). That's the ASCII vertical tab (VT) character, which is unlikely to appear in most text strings. Then the match expression of the regexp_substr matches all non-VT characters followed by either a VT character or the end of the line. Only the non-VT characters are returned (the first subexpression).
A slight improvement on MT0's answer: it uses regexp_count for a dynamic element count, and it shows that this pattern handles NULL list elements, which a pattern of the form [^delimiter]+ does NOT. More info on that here: Split comma seperated values to columns
with tbl(str) as (
select ' - Hello world - test-test! -  - test - ' from dual
)
SELECT LEVEL AS Occurrence,
REGEXP_SUBSTR( str ,'(.*?)([[:space:]]-[[:space:]]|$)', 1, LEVEL, NULL, 1 ) AS split_value
FROM tbl
CONNECT BY LEVEL <= regexp_count(str, '[[:space:]]-[[:space:]]')+1;
OCCURRENCE SPLIT_VALUE
---------- ----------------------------------------
1
2 Hello world
3 test-test!
4
5 test
6
6 rows selected.
CREATE OR REPLACE FUNCTION field(i_string VARCHAR2
,i_delimiter VARCHAR2
,i_occurance NUMBER
,i_return_number NUMBER DEFAULT 0
,i_replace_delimiter VARCHAR2) RETURN VARCHAR2 IS
-----------------------------------------------------------------------
-- Function Name.......: FIELD
-- Author..............: Dan Simson
-- Date................: 05/06/2016
-- Description.........: This function is similar to the one I used from
-- long ago by Prime Computer. You can easily
-- parse a delimited string.
-- Example.............:
-- String.............: This is a cool function
-- Delimiter..........: ' '
-- Occurance..........: 2
-- Return Number......: 3
-- Replace Delimiter..: '/'
-- Return Value.......: is/a/cool
-----------------------------------------------------------------------
v_return_string VARCHAR2(32767);
n_start NUMBER := i_occurance;
v_delimiter VARCHAR2(1);
n_return_number NUMBER := i_return_number;
n_max_delimiters NUMBER := regexp_count(i_string, i_delimiter);
BEGIN
IF i_return_number > n_max_delimiters THEN
n_return_number := n_max_delimiters + 1;
END IF;
FOR a IN 1 .. n_return_number LOOP
v_return_string := v_return_string || v_delimiter || regexp_substr (i_string, '[^' || i_delimiter || ']+', 1, n_start);
n_start := n_start + 1;
v_delimiter := nvl(i_replace_delimiter, i_delimiter);
END LOOP;
RETURN(v_return_string);
END field;
SELECT field('This is a cool function',' ',2,3,'/') FROM dual;
SELECT regexp_substr('This is a cool function', '[^ ]+', 1, 1) Word1
,regexp_substr('This is a cool function', '[^ ]+', 1, 2) Word2
,regexp_substr('This is a cool function', '[^ ]+', 1, 3) Word3
,regexp_substr('This is a cool function', '[^ ]+', 1, 4) Word4
,regexp_substr('This is a cool function', '[^ ]+', 1, 5) Word5
FROM dual;