Manipulating strings with regexp_substr - sql

I have an ETL task for data warehousing purposes: I need to extract the second part of a string after a delimiter, which can be '#', 'ý' or '-'. For example, given these test strings:
From 'Tori 1#MHK-MahallaKingaveKD' I should retrieve only 'MHK'
From 'HPHelm2ýFFS-Tredddline' I should retrieve only 'FFS'
I already tried the following nested CASE expression:
TRIM(CASE
       WHEN INSTR('HPHelm2ýFFS-Tredddline', '#', 1, 1) > 0
       THEN (REPLACE(
               REGEXP_SUBSTR('HPHelm2ýFFS-Tredddline', '[^#]+', 1, 2),
               '#'))
       ELSE (CASE
               WHEN INSTR('HPHelm2ýFFS-Tredddline', '-', 1, 1) > 0
               THEN (REPLACE(
                       REGEXP_SUBSTR('HPHelm2ýFFS-Tredddline', '[^-]+', 1, 2),
                       '-'))
               ELSE (CASE
                       WHEN INSTR('HPHelm2ýFFS-Tredddline', '-') = 0
                        AND INSTR('HPHelm2ýFFS-Tredddline', 'ý') = 0
                        AND INSTR('HPHelm2ýFFS-Tredddline', '#') = 0
                       THEN 'HPHelm2ýFFS-Tredddline'
                       ELSE (CASE
                               WHEN INSTR('HPHelm2ýFFS-Tredddline', 'ý', 1, 1) > 0
                               THEN (REPLACE(
                                       REGEXP_SUBSTR('HPHelm2ýFFS-Tredddline', '[^ý]+', 1, 2),
                                       'ý'))
                             END)
                     END)
             END)
     END)
Using the code above I can retrieve:
'Tori 1#MHK-MahallaKingaveKD' ====> 'MHK-MahallaKingaveKD'
'HPHelm2ýFFS-Tredddline' ====> 'FFS-Tredddline'
Expected output:
'Tori 1#MHK-MahallaKingaveKD' ====> 'MHK'
'HPHelm2ýFFS-Tredddline' ====> 'FFS'
So I need to exclude the '-' and everything after it.
I guess I should modify the REGEXP_SUBSTR pattern, but I can't find a clean solution, since '-' is also used as a delimiter in the CASE WHEN branches.

I suggest retrieving the second occurrence of one or more characters other than your delimiter characters:
regexp_substr(col, '[^#ý-]+', 1, 2)
Here, the search starts with the first character of the value (1), and the second occurrence is returned (2).
The [^#ý-]+ pattern matches one or more (+) characters other than #, ý and -. The - is placed last inside the brackets so that it is taken literally rather than as a range.
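A minimal sketch of the full query, assuming an Oracle database whose character set can store 'ý' (for example AL32UTF8). The third row and the NVL fallback are my additions: REGEXP_SUBSTR returns NULL when there is no second part, so NVL mimics the ELSE branch of the original CASE, which returned the whole string when no delimiter was present.

```sql
WITH test_data (str) AS (
  SELECT 'Tori 1#MHK-MahallaKingaveKD' FROM dual UNION ALL
  SELECT 'HPHelm2ýFFS-Tredddline'      FROM dual UNION ALL
  SELECT 'NoDelimiterHere'             FROM dual  -- hypothetical extra test row
)
SELECT str,
       -- Second run of characters that are not '#', 'ý' or '-';
       -- '-' is written last in the bracket so it is taken literally.
       -- NVL falls back to the whole string when there is no second run.
       NVL(REGEXP_SUBSTR(str, '[^#ý-]+', 1, 2), str) AS second_part
FROM test_data;
```

For these rows the second_part column should be MHK, FFS and NoDelimiterHere.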

The following will give you what you're looking for:
WITH cteData AS (SELECT 'Tori 1#MHK-MahallaKingaveKD' AS STRING FROM DUAL UNION ALL
SELECT 'HPHelm2ýFFS-Tredddline' FROM DUAL)
SELECT STRING, REGEXP_SUBSTR(STRING, '[#ý-](.*)[#ý-]', 1, 1, NULL, 1) AS SUB_STRING
FROM cteData;
The parentheses around the .* between the two delimiter bracket expressions make the .* a subexpression, and the final 1 in the argument list tells REGEXP_SUBSTR to return the value of subexpression #1. Since there is only one subexpression in the pattern, you get back the value matched by .*, which is what you're looking for.
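A caveat worth testing: .* is greedy, so when a value contains more than two delimiters the captured group runs up to the last delimiter, not the next one. The sketch below adds a hypothetical row 'a#b-c-d' to the same CTE; swapping .* for the negated class [^#ý-]* stops the capture at the next delimiter instead.

```sql
WITH cteData AS (
  SELECT 'Tori 1#MHK-MahallaKingaveKD' AS STRING FROM DUAL UNION ALL
  SELECT 'HPHelm2ýFFS-Tredddline' FROM DUAL UNION ALL
  SELECT 'a#b-c-d' FROM DUAL  -- hypothetical row with three delimiters
)
SELECT STRING,
       -- Greedy: captures everything up to the LAST delimiter ('b-c' for the third row)
       REGEXP_SUBSTR(STRING, '[#ý-](.*)[#ý-]', 1, 1, NULL, 1)      AS greedy_sub,
       -- Negated class: captures only up to the NEXT delimiter ('b' for the third row)
       REGEXP_SUBSTR(STRING, '[#ý-]([^#ý-]*)[#ý-]', 1, 1, NULL, 1) AS next_delim_sub
FROM cteData;
```

For the two original rows both columns agree (MHK and FFS); they only differ on the third row.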

Related

Display only characters to the left of a special character

Using Oracle 11. We need to remove a semicolon and anything to the right of it in a set of strings.
The strings may or may not contain a semicolon. If there is no semicolon, we will return the entire string.
I can see using CASE to alter the string only if there is a semicolon, but I am not sure of the syntax to remove the semicolon and everything that follows it.
Example strings:
123456;789154 would return 123456
123456789 would return 123456789
Case
When string1 like ('%;%')
then substr( …….) or trim(…)
Else string1
End
As trimmedstring
Any and all help/pointers appreciated
Assuming that you have to remove anything starting from the first semicolon in the string, this could be a way:
with test(s) as (
select '123456;789154' from dual union all
select '123456;789154;567' from dual union all
select '123456789' from dual
)
select s,
case
when instr(s, ';') = 0 then s
else substr(s, 1, instr(s, ';')-1)
end
from test
With regular expressions you could get the same result in a more compact, but less efficient way, with:
regexp_substr(s, '[^;]*')
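A small sketch that runs both variants side by side on the same test rows, so the results can be compared directly (the column aliases are mine). For these three rows both columns should return 123456, 123456 and 123456789.

```sql
WITH test (s) AS (
  SELECT '123456;789154'     FROM dual UNION ALL
  SELECT '123456;789154;567' FROM dual UNION ALL
  SELECT '123456789'         FROM dual
)
SELECT s,
       -- INSTR/SUBSTR version: keep everything before the first ';' (or all of s)
       CASE
         WHEN INSTR(s, ';') = 0 THEN s
         ELSE SUBSTR(s, 1, INSTR(s, ';') - 1)
       END AS trimmed_instr,
       -- Regex version: the leading run of non-';' characters
       REGEXP_SUBSTR(s, '[^;]*') AS trimmed_regexp
FROM test;
```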
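Try this (note: LEFT, CHARINDEX and LEN are SQL Server functions, so this version will not run on Oracle as written):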
Select LEFT(Name, Case
WHEN CHARINDEX(';', Name) = 0
then Len(Name)
else CHARINDEX(';', Name)-1 END) from CustomerDetails

How to replace more than one character in oracle?

How can I replace several whole single characters, but not when they appear inside combinations?
The code below replaces multiple characters, but it also changes them inside combinations.
SELECT regexp_replace('a,ca,va,ea,r,y,q,b,g','(a|y|q|g)','X') RESULT FROM dual;
Current output:
RESULT
--------------------
X,cX,vX,eX,r,X,X,b,X
Expected output:
RESULT
------------------------
X,ca,va,ea,r,X,X,b,X
I only want to replace standalone characters ('a', 'y', 'q', 'g'), not the same letters inside combinations ('ca', 'va', 'ea').
Because the values are delimited by commas, you can include the commas in the pattern, like ',a,',
and this will replace only standalone a's.
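A minimal sketch of this idea, under two assumptions of mine: the list is first wrapped in commas, so the first and last elements are also enclosed by commas, and the replacement runs twice, because a match consumes its trailing comma and the element right after it (q after y here) is skipped on the first pass. The padding is stripped again at the end; the CTE names are illustrative.

```sql
WITH t (str) AS (
  SELECT 'a,ca,va,ea,r,y,q,b,g' FROM dual
),
padded (p) AS (
  -- Wrap the list in commas so every element is delimited on both sides
  SELECT ',' || str || ',' FROM t
),
replaced (r) AS (
  -- Second pass picks up elements skipped because the first pass consumed their leading comma
  SELECT REGEXP_REPLACE(REGEXP_REPLACE(p, ',(a|y|q|g),', ',X,'), ',(a|y|q|g),', ',X,')
  FROM padded
)
-- Drop the padding comma at each end
SELECT SUBSTR(r, 2, LENGTH(r) - 2) AS result
FROM replaced;
```

This should return X,ca,va,ea,r,X,X,b,X, the expected output from the question.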
You can try the following:
with t as
(
select 'a,ca,va,ea,r,y,q,b,g' str
from dual
)
select substr(sys_connect_by_path(regexp_replace(regexp_substr(str, '[^,]+', 1, level), '^(a|y|q|g)$', 'X'), ','), 2) as str
from t
where connect_by_isleaf = 1
connect by level <= length(regexp_replace(str, '[^,]*')) + 1;
Sadly, Oracle doesn't support lookahead or lookbehind, but this is a solution I came up with:
SELECT regexp_replace
(regexp_replace
('a,ca,va,ea,r,y,q,b,g',
'^[ayqg](,)|(,)[ayqg](,)|(,)[ayqg]$',
'\2\4X\1\3'),'(,)[ayqg](,)','\1X\2')
RESULT FROM dual;
Sadly, I had to use REGEXP_REPLACE twice, because a single pass does not replace two matching values that follow each other: ..,a,y,.. is replaced as ..,X,y,.. since the comma between them is consumed by the first match. So the outer call replaces the [ayqg] values the inner call missed. The inner call also replaces the first and last values.
Maybe this could be simplified into one expression, but I am not that familiar with Oracle's regex dialect.
As an explanation: I capture the commas in groups and basically replace every ,[ayqg], with ,X, by back-referencing the captured commas.
Normally you would look for word boundaries (\b), which unfortunately Oracle's regexp_replace does not support.
So let's look for a non-word character \W or the beginning ^ or ending $ of the text.
select
regexp_replace('a,ca,va,ea,r,y,q,b,g','(^|$|\W)(a|y|q|g)(^|$|\W)','\1X\3') as result
from dual;
In order not to remove the non-word characters, we must include them in the replacement string: \1 for the expression in the first parentheses, \3 for the one in the third. Thus we only replace the expression in the second parentheses, which is a, y, q or g, with X.
Unfortunately, the above gives:
X,ca,va,ea,r,X,q,b,X
The q was not replaced: matching ',y,' consumes the comma after y, so the search continues at 'q,', whereas we'd need to be positioned at ',q,' to recognize q as a word, too.
So we need to replace in iterations (i.e. recursively):
with results(txt, num) as
(
select 'a,ca,va,ea,r,y,q,b,g' as txt, 0 as num from dual
union all
select regexp_replace(txt, '(^|$|\W)(a|y|q|g)(^|$|\W)','\1X\3'), num + 1 as num
from results
where txt <> regexp_replace(txt, '(^|$|\W)(a|y|q|g)(^|$|\W)','\1X\3')
)
select max(txt) keep (dense_rank last order by num) as result
from results;
EDIT: Kevin Esche is right; of course one has to do it only twice. Hence you can also do:
select
regexp_replace(txt, search_str, replace_str) as result
from
(
select
regexp_replace(txt, search_str, replace_str) as txt, search_str, replace_str
from
(
select
'a,ca,va,ea,r,y,q,y,q,b,g' as txt,
'(^|$|\W)(a|y|q|g)(^|$|\W)' as search_str,
'\1X\3' as replace_str
from dual
)
);
with replaced_values as (
SELECT case when length(val)=1 then regexp_replace(val,'(a|y|q|g)','X') else val end new_val, lvl
from (
SELECT regexp_substr('a,ca,va,ea,r,y,q,b,g','[^,]+', 1, LEVEL) val, level lvl FROM dual
connect by regexp_substr('a,ca,va,ea,r,y,q,b,g','[^,]+',1, LEVEL) is not null
) all_values
)
select LISTAGG(new_val, ',') WITHIN GROUP (ORDER BY lvl) RESULT
from replaced_values
This statement splits the list into rows and replaces only the rows that contain a single character.
The rows are then aggregated back into one string with LISTAGG.
This SQL also works with empty entries like 'a,,,b,c' and with more complex regular expressions:
with t as
(select ',a,,ca,va,ea,bbb,ba,r,y,q,b,g,,,' as str,
',' as delimiter,
'(a|y|q|g|ea|[b]*)' as regexp_expr,
'X' as replace_expr
from dual)
(select substr (sys_connect_by_path(regexp_replace(substr(str,
decode(level - 1, 0, 0, instr(str, ',', 1, level - 1)) + 1,
decode(instr(str, ',', 1, level),
0,
length(str),
instr(str, ',', 1, level) - 1) -
decode(level - 1, 0, 0, instr(str, ',', 1, level - 1))),
'^' || regexp_expr || '$',
replace_expr), ','), 2)
from t
where connect_by_isleaf = 1
connect by level <= length(regexp_replace(str, '[^'|| delimiter||']')) + 1)
Result
,X,,ca,va,X,X,ba,r,X,X,X,X,,,
I don't know much Oracle, but I would have thought something like this could work, assuming the delimiter is always a comma:
SELECT
  regexp_replace(
    regexp_replace(
      regexp_replace(
        regexp_replace(
          regexp_replace('a,ca,va,ea,r,y,q,b,g', '(,a,|,y,|,q,|,g,)', ',X,'),
          '(,a,|,y,|,q,|,g,)', ',X,'),
        '(^a,|^y,|^q,|^g,)', 'X,'),
      '(,a$|,y$|,q$|,g$)', ',X'),
    '(^a$|^y$|^q$|^g$)', 'X')
  RESULT
FROM dual;
The first two calls replace a single character enclosed in commas in the middle of the string (twice, because adjacent matches share a comma), the third handles the start of the string, the fourth the end, and the fifth the case where the string is just one character.
This could probably be simplified with more advanced regular expressions.
How can I replace whole words? RS and OS ===> D, LS and IS ===> P:
SECTION_ID   Output required
1-LS-1991    1-P-1991
1-IS-1991    1-P-1991
1-RS-1991    1-D-1991
1-OS-1991    1-D-1991
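One possible sketch for this follow-up, assuming the code always sits between two hyphens and that LS and IS map to P, as the table shows. The CTE below is a stand-in built from the sample rows, not a real table.

```sql
WITH sections (section_id) AS (
  SELECT '1-LS-1991' FROM dual UNION ALL
  SELECT '1-IS-1991' FROM dual UNION ALL
  SELECT '1-RS-1991' FROM dual UNION ALL
  SELECT '1-OS-1991' FROM dual
)
SELECT section_id,
       -- Including the hyphens in the pattern means only the whole code between
       -- them is replaced, not the same letters inside a longer token.
       REGEXP_REPLACE(
         REGEXP_REPLACE(section_id, '-(RS|OS)-', '-D-'),
         '-(LS|IS)-', '-P-') AS output_required
FROM sections;
```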

Oracle Regexp_replace multiple occurrence

Hi, I want to prefix the letter C to a string if it starts with a number.
Also, if it contains any punctuation, I want to replace it with an underscore (_).
E.g.: 5-2-2-1 ==> C5_2_2_1
I tried the following, but I am not able to replace multiple occurrences of the punctuation. I am missing something simple, but I can't figure it out.
SELECT REGEXP_REPLACE('9-1-1', '^(\d)(-)', 'C\1_') FROM DUAL;
SELECT case when REGEXP_LIKE('9-1-1','^[[:digit:]]') then 'C' END
|| REGEXP_REPLACE('9-1-1', '[[:punct:]]', '_')
FROM DUAL;
[:digit:] any digit
[:punct:] punctuation symbol
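To check the answer on more than one value, a minimal sketch with a few hypothetical rows, including one with mixed punctuation and one that does not start with a digit:

```sql
WITH t (val) AS (
  SELECT '5-2-2-1' FROM dual UNION ALL
  SELECT '9.1,1'   FROM dual UNION ALL
  SELECT 'A-1'     FROM dual   -- does not start with a digit, so no 'C'
)
SELECT val,
       -- 'C' only when the first character is a digit; otherwise NULL, which || ignores
       CASE WHEN REGEXP_LIKE(val, '^[[:digit:]]') THEN 'C' END
       -- REGEXP_REPLACE replaces every match by default (occurrence 0)
       || REGEXP_REPLACE(val, '[[:punct:]]', '_') AS result
FROM t;
```

This should return C5_2_2_1, C9_1_1 and A_1.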
If you have a lot of rows with different values, try to avoid regex:
SELECT case when substr('9-1-1',1,1) between '0' and '9' then 'C' end
|| translate('9-1-1', ',.!-', '____')  -- one '_' for each character in ',.!-'
FROM DUAL;
Check here for example: Performance of regexp_replace vs translate in Oracle?
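Why the length of the third TRANSLATE argument matters: TRANSLATE maps characters by position, and characters in the from-string that have no counterpart in the to-string are removed from the result rather than replaced. A minimal sketch of the difference:

```sql
SELECT
  -- Only ',' has a counterpart in '_'; '.', '!' and '-' have none and are deleted
  TRANSLATE('9-1-1', ',.!-', '_')    AS short_to_string,  -- 911
  -- One '_' per character in the from-string: every listed character becomes '_'
  TRANSLATE('9-1-1', ',.!-', '____') AS full_to_string    -- 9_1_1
FROM dual;
```

So the to-string needs one '_' for every character listed in the from-string.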
Try this:
select (case when substr(val, 1, 1) between '0' and '9' then 'C' else '' end) ||
regexp_replace(val, '([-+.,;:''"!])', '_')  -- the single quote inside the literal is doubled
from (select '9-1-1' as val from dual);

oracle 12c - select string after last occurrence of a character

I have the string below:
ThisSentence.ShouldBe.SplitAfterLastPeriod.Sentence
So I want to select Sentence since it is the string after the last period. How can I do this?
Just for completeness' sake, here's a solution using regular expressions (not very complicated IMHO :-) ):
select regexp_substr(
'ThisSentence.ShouldBe.SplitAfterLastPeriod.Sentence',
'[^.]+$')
from dual
The regex
uses a negated character class to match anything except for a dot [^.]
adds a quantifier + to match one or more of these
uses an anchor $ to restrict matches to the end of the string
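A short sketch comparing this regex with the INSTR-based answers on two hypothetical edge cases, a string with no period and one with a trailing period. For these rows both columns should agree: the whole string when there is no period, and NULL when the string ends with a period.

```sql
WITH t (str) AS (
  SELECT 'ThisSentence.ShouldBe.SplitAfterLastPeriod.Sentence' FROM dual UNION ALL
  SELECT 'NoPeriodHere'    FROM dual UNION ALL  -- hypothetical: no period at all
  SELECT 'EndsWithPeriod.' FROM dual            -- hypothetical: trailing period
)
SELECT str,
       -- One or more non-period characters anchored at the end of the string
       REGEXP_SUBSTR(str, '[^.]+$')         AS via_regexp,
       -- Start one character after the last period (INSTR returns 0 if there is none)
       SUBSTR(str, INSTR(str, '.', -1) + 1) AS via_instr
FROM t;
```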
You can probably do this with complicated regular expressions. I like the following method:
select substr(str, - instr(reverse(str), '.') + 1)
Testing shows that this doesn't work when the period is at the end of the string: the start position becomes -1 + 1 = 0, and SUBSTR treats a start position of 0 as 1, so the whole string is returned. Here is an improvement:
select (case when str like '%.' then ''
else substr(str, - instr(reverse(str), '.') + 1)
end)
EDIT:
Your example works, both when I run it on my local Oracle and in SQL Fiddle.
I am running this code:
select (case when str like '%.' then ''
else substr(str, - instr(reverse(str), '.') + 1)
end)
from (select 'ThisSentence.ShouldBe.SplitAfterLastPeriod.Sentence' as str from dual) t
And yet another way.
Not sure from a performance standpoint which would be best...
The difference here is that we pass -1 as the start position, so INSTR searches backwards from the end to find the last '.'.
With CTE as
(Select 'ThisSentence.ShouldBe.SplitAfterLastPeriod.Sentence' str, length('ThisSentence.ShouldBe.SplitAfterLastPeriod.Sentence') len from dual)
Select substr(str,instr(str,'.',-1)+1,len-instr(str,'.',-1)+1) from cte;
select
substr(
'ThisSentence.ShouldBe.SplitAfterLastPeriod.Sentence',
INSTR(
'ThisSentence.ShouldBe.SplitAfterLastPeriod.Sentence',
'.',
-1
)+1
)
from dual;
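The INSTR function accepts a third parameter, the starting position. It defaults to 1 (search forward from the first character), but it also accepts negative numbers, in which case Oracle counts that many characters back from the end of the string and searches backwards. The optional fourth parameter is the occurrence.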
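A minimal sketch of the starting-position and occurrence parameters on the sample string; the positions in the comments were counted from the 51-character string (the dots are at positions 13, 22 and 43):

```sql
SELECT INSTR(str, '.')         AS first_dot,        -- 13: search forward from position 1
       INSTR(str, '.', -1)     AS last_dot,         -- 43: negative start, search backwards from the end
       INSTR(str, '.', -1, 2)  AS second_last_dot,  -- 22: 4th argument is the occurrence
       SUBSTR(str, INSTR(str, '.', -1) + 1) AS after_last_dot  -- Sentence
FROM (SELECT 'ThisSentence.ShouldBe.SplitAfterLastPeriod.Sentence' AS str FROM dual);
```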
select substr(str, instr(str, '.', -1) + 1)
from (
select 'ThisSentence.ShouldBe.SplitAfterLastPeriod.Sentence'
as str
from dual);
Sentence
How many dots are in a string?
select length(str) - length(replace(str, '.', '')) number_of_dots from ...
Get the substring after the last dot:
select substr(str, instr(str, '.', 1, number_of_dots)+1) from ...
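Put together, the two steps might look like the sketch below, with a CTE standing in for the elided from ... clause and a hypothetical second row without dots. INSTR requires a positive occurrence, so the no-dot case is guarded explicitly rather than passed through with an occurrence of 0.

```sql
WITH t (str) AS (
  SELECT 'ThisSentence.ShouldBe.SplitAfterLastPeriod.Sentence' FROM dual UNION ALL
  SELECT 'NoPeriodHere' FROM dual  -- hypothetical row with no dots
)
SELECT str,
       number_of_dots,
       -- INSTR only accepts a positive occurrence, so handle the no-dot case separately
       CASE
         WHEN number_of_dots > 0
         THEN SUBSTR(str, INSTR(str, '.', 1, number_of_dots) + 1)
         ELSE str
       END AS after_last_dot
FROM (
  -- Step 1: count the dots by comparing lengths with and without them
  SELECT str,
         LENGTH(str) - LENGTH(REPLACE(str, '.', '')) AS number_of_dots
  FROM t
);
```

For the sample string this gives 3 dots and Sentence; the second row gives 0 and the whole string.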

oracle searching word within the input string

I have to find out whether an INPUT string is found as a word within another, pipe-delimited string. I am trying the approach below, but it surprisingly returns 'Y' instead of 'N'. Please let me know what I am doing wrong in the CASE statement below.
CASE
WHEN REGEXP_INSTR('TCS|XY|XZ','CS',1,1,1,'i') > 0
THEN 'Y'
ELSE 'N'
END
Regards,
Raj
There is really no need to use the regexp_instr() regular expression function. If you just need to know whether a particular character literal is part of another character literal, the instr() function will completely cover your needs:
with t1(col) as(
select 'TCS|XY|XZ' from dual union all
select 'TAB|XY|XZ' from dual
)
select col
, case
when instr(col, 'CS') > 0
then 'Y'
else 'N'
end as Is_Part
from t1
Result:
COL IS_PART
--------- -------
TCS|XY|XZ Y
TAB|XY|XZ N
Edit
If you need to take the vertical bars into consideration, i.e. return yes only if there is a standalone CS substring delimited by vertical bars (|CS|) or by the start or end of the string, then you could use the regexp_instr() regular expression function as follows:
with t1(col) as(
select 'TCS|XY|XZ|' from dual
)
select col
, case
when regexp_instr(col, '(\||^)CS(\||$)', 1, 1, 0, 'i') > 0
then 'YES'
else 'NO'
end as res
from t1
Result:
COL RES
---------- ---
TCS|XY|XZ| NO
Note: If a character literal is dynamic you could use a concatenation operator || to form a search pattern '(\||^)' || <<'character literal', column or variable>> || '(\||$)'
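A minimal sketch of that note: the search word comes from a hypothetical params row (in real code it might be a bind variable or a column). It assumes the word contains no regex metacharacters; if it can, they would need escaping first.

```sql
WITH t1 (col) AS (
  SELECT 'TCS|XY|XZ' FROM dual UNION ALL
  SELECT 'AB|CS|XZ'  FROM dual
),
params (search_word) AS (
  SELECT 'CS' FROM dual  -- hypothetical: stands in for a bind variable or column
)
SELECT t1.col,
       CASE
         -- Field boundary on each side: a '|' or the start/end of the string
         WHEN REGEXP_INSTR(t1.col, '(\||^)' || p.search_word || '(\||$)', 1, 1, 0, 'i') > 0
         THEN 'YES'
         ELSE 'NO'
       END AS res
FROM t1
CROSS JOIN params p;
```

This should return NO for TCS|XY|XZ and YES for AB|CS|XZ.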
The first field (TCS) contains CS which counts as a match.
If you want to match an entire field you can do like this:
CASE
WHEN REGEXP_INSTR('|' || 'TCS|XY|XZ' || '|' , '\|' || 'CS' || '\|',1,1,1,'i') > 0
THEN 'Y'
ELSE 'N'
END
Add the delimiter to your query string to "anchor" the search to whole fields. To be able to match the first and last field I also added the delimiter to the searched string.
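Since the pattern built here has no regex features beyond the escaped pipes, the same whole-field check can also be sketched with plain INSTR, with UPPER standing in for the 'i' (case-insensitive) flag. This is a variant of mine, not part of the original answer.

```sql
WITH t1 (col) AS (
  SELECT 'TCS|XY|XZ' FROM dual UNION ALL
  SELECT 'AB|cs|XZ'  FROM dual
)
SELECT col,
       -- Wrap both the list and the search word in '|' so only whole fields match
       CASE
         WHEN INSTR(UPPER('|' || col || '|'), UPPER('|' || 'CS' || '|')) > 0
         THEN 'Y' ELSE 'N'
       END AS is_field
FROM t1;
```

This should return N for TCS|XY|XZ and Y for AB|cs|XZ.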