Regexp_substr is not working as expected - sql

So, I have a piece of simple SQL like:
select
REGEXP_SUBSTR (randomcol, '[^|]+', 1, 2)
||'|'|| REGEXP_SUBSTR (randomcol, '[^|]+', 1, 3)
||'|'|| REGEXP_SUBSTR (randomcol, '[^|]+', 1, 4)
from table1 where ADDTL_DETAIL_INFO is not null and module_key='01-07-2016 00:00:00/2212/ 1';
The idea is to get the pipe separated values present in the randomcol column where the value present is:
~custom|HELLO1||HELLO3
So I need the values like HELLO1,whitespace (as there is no value between the second pipe and the third pipe) and HELLO3.
But when I ran the above query it returns as:
HELLO1|HELLO3|
and the white space is gone. I need this white space to retain. So what am I doing wrong here?

Regex of the form '[^|]+' does not work with NULL list elements and should be avoided! See this post for more info: Split comma separated values to columns in Oracle
Use this form instead:
select regexp_substr('1,2,3,,5,6', '(.*?)(,|$)', 1, 5, NULL, 1) from dual;
Which can be read as "get the 1st remembered group of the 5th occurrence of the set of characters ending with a comma or the end of the line".
So for your 4th element, you would use this to preserve the NULL in element 3 (assuming you want to build it by separate elements and not just grab the string from the character after the first separator to the end):
...
REGEXP_SUBSTR (addtl_detail_info, '(.*?)(\||$)', 1, 2) ||
REGEXP_SUBSTR (addtl_detail_info, '(.*?)(\||$)', 1, 3) ||
REGEXP_SUBSTR (addtl_detail_info, '(.*?)(\||$)', 1, 4)
...
You know, this may be easier. Just grab everything after the first pipe:
SQL> select REGEXP_replace('~custom|HELLO1||HELLO3', '^.*?\|(.*)', '\1') result
from dual;
RESULT
--------------
HELLO1||HELLO3
SQL>
The parenthesis surround what you want to "remember" and the replace string references the 1st remembered group with "\1".

Related

How to substring last two tokens from a string separated by dot in ORACLE SQL if only 4 and 5th tokens exists

I have a bunch of strings in a table called code_table and column named code as shown where teach token in the string is separated by a dot. The number of tokens in each string can vary but I have to consider ONLY the strings that have both 4th and 5th tokens and others can be ignored. A code string can have maximum of 5 tokens. If and only if the string contain both 4th and 5th tokens, then I have to concatenate them and create a single code. Here is an example:Please Click Here
You can use INSTR to look for the 4th occurrence of a . to determine which rows should be returned, then a simple SUBSTR and REPLACE to format the result as you would like.
WITH
code_table (code)
AS
(SELECT 'ppge.34.aaa.fesl' FROM DUAL
UNION ALL
SELECT 'aler.56.fghh.sdef.wath' FROM DUAL
UNION ALL
SELECT 'apef.23.ared' FROM DUAL
UNION ALL
SELECT 'eard.wed' FROM DUAL
UNION ALL
SELECT 'wera.fgeg.23d.adef.erff' FROM DUAL)
SELECT REPLACE (SUBSTR (code,
INSTR (code,
'.',
1,
3)
+ 1),
'.')
FROM code_table
WHERE INSTR (code,
'.',
1,
4) > 0;

How to use Regex in Oracle SQL request

I have this complex string in a database, which represents 4 elements separated by delimeters '|'
1944913|200010|157|4*
1944913|200085|157|1.22*
NOTE: both lines of strings exist in the same database row. There could be up to 10 lines of strings in a single cell
I want an oracle sql command that, given either 200010 or 200085, returns either the 1st or second element next to it, like:
Given 200010, returns 157 or returns 4*
Given 200085, returns 157 or returns 1.22
How would I go about doing that?
You can use regexp_substr and instr to get the required results:
select a.*,
CASE WHEN instr(text1,'200085')>0 then REGEXP_SUBSTR (substr(text1,instr(text1,'200085'),500),'[^,|]+', 1, 2)
ELSE NULL END as partA,
CASE WHEN instr(text1,'200085')>0 then REGEXP_SUBSTR (substr(text1,instr(text1,'200085'),500),'[^,|]+', 1, 3)
ELSE NULL END as partB
from tableA a
Simply replace '200085' with the string you are looking for and you should be good to go. Hope this helps.
You can split this into columns easily. So, I think something like this is pretty much what you want:
select regexp_substr(col, '[^|]+', 1, 2) as val_2,
regexp_substr(col, '[^|]+', 1, 3) as val_3,
regexp_substr(col, '[^|]+', 1, 4) as val_4
from t;

regexp_substr strip text between first forward slash and second one

/abc/required_string/2/ should return abc with regexp_substr
SELECT REGEXP_SUBSTR ('/abc/blah/blah/', '/([a-zA-Z0-9]+)/', 1, 1, NULL, 1) first_val
from dual;
You might try the following:
SELECT TRIM('/' FROM REGEXP_SUBSTR(mycolumn, '^\/([^\/]+)'))
FROM mytable;
This regular expression will match the first occurrence of a pattern starting with / (I habitually escape /s in regular expressions, hence \/ which won't hurt anything) and including any non-/ characters that follow. If there are no such characters then it will return NULL.
Hope this helps.
You can search for /([^/]+)/, which says:
/ forward slash
( start of subexpression (usually called "group" in other languages)
[^/] any character other than forward slash
+ match the preceding expression one or more times
) end of subexpression
/ forward slash
You can use the 6th argument to regexp_substr to select a subexpression.
Here we pass 1 to match only the characters between the /s:
select regexp_substr(txt, '/([^/]+)/', 1, 1, null, 1)
from t1
See it working at SQL Fiddle.
Classic SUBSTR + INSTR offer a simple solution; I know you specified regular expressions, but - consider this too, might work better for a large data volume.
SQL> with test (col) as
2 (select '/abc/required_string/2/' from dual)
3 select substr(col, 2, instr(col, '/', 1, 2) - 2) result
4 from test;
RES
---
abc
SQL>
Here's another way to get the 2nd occurrence of a string of characters followed by a forward slash. It handles the problem if that element happens to be NULL as well. Always expect the unexpected!
Note: If you use the regex form of [^/]+, and that element is NULL it will return "required string" which is NOT what you expect! That form does NOT handle NULL elements. See here for more info: [https://stackoverflow.com/a/31464699/2543416]
with tbl(str) as (
select '/abc/required_string/2/' from dual union all
select '//required_string1/3/' from dual
)
select regexp_substr(str, '(.*?)(/)', 1, 2, null, 1)
from tbl;

How to extract value between 2 slashes

I have a string like "1490/2334/5166400411000434" from which I need to derive value after second slash. I tried below logic
select REGEXP_SUBSTR('1490/2334/5166400411000434','[^/]+',1,3) from dual;
it is working fine. But when i dont have value between first and second slash it is returining blank.
For example my string is "1490//5166400411000434" and am trying
select REGEXP_SUBSTR('1490//5166400411000434','[^/]+',1,3) from dual;
it is returning blank. Please suggest me what i am missing.
If I understand well, you may need
regexp_substr(t, '(([^/]*/){2})([^/]*)', 1, 1, 'i', 3)
This handles the first 2 parts like 'xxx/' and then checks for a sequence of non / characters; the parameter 3 is used to get the 3rd matching subexpression, which is what you want.
For example:
with test(t) as (
select '1490/2334/5166400411000434' from dual union all
select '1490//5166400411000434' from dual union all
select '1490//5166400411000434/ramesh/3344' from dual
)
select t, regexp_substr(t, '(([^/]*/){2})([^/]*)', 1, 1, 'i', 3) as substr
from test
gives:
T SUBSTR
---------------------------------- ----------------------------------
1490/2334/5166400411000434 5166400411000434
1490//5166400411000434 5166400411000434
1490//5166400411000434/ramesh/3344 5166400411000434
You can REVERSE() your string and take the value before the first slash. And then reverse again to obtain the desired output.
select reverse(regexp_substr(reverse('1490//5166400411000434'), '[^/]+', 1, 1)) from dual;
It can also be done with basic substring and instr function:
select reverse(SUBSTR(reverse('1490//5166400411000434'), 0, INSTR(reverse('1490//5166400411000434'), '/')-1)) from dual;
Use other options in REGEXP_SUBSTR to match a pattren
select REGEXP_SUBSTR('1490//5166400411000434','(/\d*)/(\d+)',1,1,'x',2) from dual
Basically it is finding the pattren of two / including digits starting from 1 with 1 appearance and ignoring whitespaces ('x') then outputting 2nd subexpression that is in second expression within ()
... pattern,1,1,'x',subexp2)

How to replace more than one character in oracle?

How to replace multiple whole characters, except those in combinations...?
The below code replaces multiple characters, but it also disturbing those in combinations.
SELECT regexp_replace('a,ca,va,ea,r,y,q,b,g','(a|y|q|g)','X') RESULT FROM dual;
Current output:
RESULT
--------------------
X,cX,vX,eX,r,X,X,b,X
Expected output:
RESULT
------------------------
'X,ca,va,ea,r,X,X,b,X
I just want to replace only separate whole characters('a','y','q','g'), but not the 1 in combinations('ca','va','ea')...
Because you are delimiting with a comma ',' you can combine that like ',a,'
and this will replace only single a's.
you can try follows:
with t as
(
select 'a,ca,va,ea,r,y,q,b,g' str
from dual
)
select substr(sys_connect_by_path(regexp_replace(regexp_substr(str, '[^,]+', 1, level), '^(a|y|q|g)$', 'X'), ','), 2) as str
from t
where connect_by_isleaf = 1
connect by level <= length(regexp_replace(str, '[^,]*')) + 1;
Sadly oracle doesn´t support lookahead and lookbehind. But this is a solution i came up with.
SELECT regexp_replace
(regexp_replace
('a,ca,va,ea,r,y,q,b,g',
'^[ayqg](,)|(,)[ayqg](,)|(,)[ayqg]$',
'\2\4X\1\3'),'(,)[ayqg](,)','\1X\2')
RESULT FROM dual;
I had to use the regexp twice sadly, since it doesn´t find two similar values following after each other and replacing it. ..,a,y,.. is getting replaced as ..,X,y,... So the second call replaces the missing [ayqg] with the exact values. In the first inner regexp call replaces the first and last values.
Maybe this could be simplified into one expression, but i am not that conform with the regex from oracle.
As a explanation i am grouping the commata and basicly replace every ,[ayqg], with ,X, by backreferencing the commata
You would look for word boundaries, which is \b, and which is unfortunately not supported by Oracle's regexp_replace.
So let's look for a non-word character \W or the beginning ^ or ending $ of the text.
select
regexp_replace('a,ca,va,ea,r,y,q,b,g','(^|$|\W)(a|y|q|g)(^|$|\W)','\1X\3') as result
from dual;
In order to not remove the non-word characters, we must have them in the replace string: \1 for the expression in the first parenteses, \3 for the ones in the third. Thus we only change the expression in the second parentheses, which is a, y, q or g, with X.
Unfortunately above gives
X,ca,va,ea,r,X,q,b,X
The q was not replaced, because we recognize ',y,' thus being positioned a 'g,' whereas we'd need to be positioned at ',g,' to recognize g as a word, too.
So we need to replace in iterations (i.e. recursively):
with results(txt, num) as
(
select 'a,ca,va,ea,r,y,q,b,g' as txt, 0 as num from dual
union all
select regexp_replace(txt, '(^|$|\W)(a|y|q|g)(^|$|\W)','\1X\3'), num + 1 as num
from results
where txt <> regexp_replace(txt, '(^|$|\W)(a|y|q|g)(^|$|\W)','\1X\3')
)
select max(txt) keep (dense_rank last order by num) as result
from results;
EDIT: Kevin Esche is right; of course one has to do it only twice. Hence you can also do:
select
regexp_replace(txt, search_str, replace_str) as result
from
(
select
regexp_replace(txt, search_str, replace_str) as txt, search_str, replace_str
from
(
select
'a,ca,va,ea,r,y,q,y,q,b,g' as txt,
'(^|$|\W)(a|y|q|g)(^|$|\W)' as search_str,
'\1X\3' as replace_str
from dual
)
);
with replaced_values as (
SELECT case when length(val)=1 then regexp_replace(val,'(a|y|q|g)','X') else val end new_val, lvl
from (
SELECT regexp_substr('a,ca,va,ea,r,y,q,b,g','[^,]+', 1, LEVEL) val, level lvl FROM dual
connect by regexp_substr('a,ca,va,ea,r,y,q,b,g','[^,]+',1, LEVEL) is not null
) all_values
)
select lISTAGG(new_val, ',') WITHIN GROUP (ORDER BY lvl) RESULT
from replaced_values
This statement pivots data into rows and replaces only lines wich contains one character.
Data are then unpivoted in one rows
This sql works also with empty entries like 'a,,,b,c' and more complex regular expressions:
with t as
(select ',a,,ca,va,ea,bbb,ba,r,y,q,b,g,,,' as str,
',' as delimiter,
'(a|y|q|g|ea|[b]*)' as regexp_expr,
'X' as replace_expr
from dual)
(select substr (sys_connect_by_path(regexp_replace(substr(str,
decode(level - 1, 0, 0, instr(str, ',', 1, level - 1)) + 1,
decode(instr(str, ',', 1, level),
0,
length(str),
instr(str, ',', 1, level) - 1) -
decode(level - 1, 0, 0, instr(str, ',', 1, level - 1))),
'^' || regexp_expr || '$',
replace_expr), ','), 2)
from t
where connect_by_isleaf = 1
connect by level <= length(regexp_replace(str, '[^'|| delimiter||']')) + 1)
Result
,X,,ca,va,X,X,ba,r,X,X,X,X,,,
Don't Know much Oracle, but I would have thought something like this could work. Assuming the delimiter is always a comma.
SELECT
regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace('a,ca,va,ea,r,y,q,b,g','(,a,|,y,|,q,|,g,)',',X,') ,'(,a,|,y,|,q,|,g,)',',X,'), '(^a,|^y,|^q,|^g,)','X,'), '(,a$|,y$|,q$|,g$)',',X'), '(^a$|^y$|^q$|^g$)','X')
RESULT FROM test;
The first two parts replaces a single character in commas in the middle, the third part gets those at the start of the string, the fourth is for the end of the string and the fifth is for when then string has just one character.
This answer might will be simplifiable by advanced Regexp use.
How i can replace words?
RS & OS ===> D, LS & IS ==== >
SECTION_ID Output required
1-LS-1991 1-P-1991
1-IS-1991 1-P-1991
1-RS-1991 1- D- 1991
1-OS-1991 1-D-1991