Retrieving specific occurrences of a given Regex with Oracle SQL - sql

In a simplified form, I'm attempting to retrieve either the first occurrence of the '.*?=(.*?);.*' regex, or the second, or the third -- that is, either x or y or z (that is, I want to be able to hardcode in this query that I want the first, second or third values) in this following example:
select regexp_replace(
'margin=x;margin=y;margin=z;',
'.*?=(.*?);.*',
'\1',
1 -- occurrences. I thought that picking 1, 2 or 3 would solve my problem?
) from dual;
-- This returns "xyz", which is terrible. I was expecting it to return "x", in this case.
Looking at the Oracle documentation, I thought this would be relatively straightforward, as the last parameter (occurrences), apparently allows me to select which groups to take into consideration. But it doesn't! Why?
Thanks

I'm going off to a completely different solution. Would combining a hierarchical substring select with a regexp_replace be an option for your needs?
This way you could select either one or multiple values, depending on your needs. You wouldn't need to build the regex by concatenation, and you could adjust the select a bit more to your needs.
select regexp_replace(subselect.val, '.*=(.*?);', '\1') -- remove "margin="
from (select regexp_substr(
'margin=x;margin=y;margin=z;',
'.*?=(.*?);',
1,
level) val,
level lvl
from dual
connect by regexp_substr('margin=x;margin=y;margin=z;',
'.*?=(.*?);',
1,
level) is not null) subselect -- This select represents each margin=T as a single row
where lvl = 1; -- you could define multiple values to select as well.

You need a regex that will match 1 to n occurrences of the whole group. E.g.
([^=]*=([^;]*);){2}.*
(replaced with \2 backreference) will get the 2nd attribute value. Your regex can also be used (though it is quite synonymous to the above pattern): (.*?=(.*?);){2}.*.
See the regex demo
If you define the index variable as IDX, you can use something like
select regexp_replace(
'margin=x;margin=y;margin=z;',
'([^=]*=([^;]*);){' || IDX || '}.*', -- || is used because CONCAT traditionally accepts only two arguments in Oracle
'\2'
) from dual;
NOTE: If you want to get an empty string as a result of trying to obtain a non-existing value, add |.* at the end of the regex:
(.*?=(.*?);){4}.*|.*
See this regex demo (with your input string, the result will be empty string).
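As a rough sketch of the quantified-group pattern above, the index can come from a CTE and be spliced into the quantifier with the || operator. The CTE name t1 and its idx column are illustrative assumptions, not part of the original question.

```sql
-- t1 and idx are hypothetical names used only to drive the example.
with t1 (idx) as (
  select 1 from dual union all
  select 2 from dual union all
  select 3 from dual
)
select idx,
       regexp_replace(
         'margin=x;margin=y;margin=z;',
         -- repeat the "name=value;" group idx times, then swallow the rest
         '([^=]*=([^;]*);){' || idx || '}.*',
         '\2'  -- inner group: the value captured in the last repetition
       ) as val
from t1;
```

The number in idx is implicitly converted to a string by the concatenation, so no explicit to_char is needed here.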

Perhaps all you need is this.... The fourth parameter is NOT the occurrence but the POSITION from which the search starts. The FIFTH parameter is the occurrence.
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions130.htm
Also, are you sure you want REPLACE and not SUBSTR?
EDITED: To clarify (it seems at least one person was confused). I show a possible solution to what you need (perhaps) at the end, but first let's look at REGEXP_REPLACE. I rewrote your query to use different occurrences; I put the index in a CTE, but you can instead make idx into a bind variable, or any other mechanism you need to use. As you will see, the output makes no sense.
with t1 ( idx ) as (select 1 from dual union all select 2 from dual
union all select 3 from dual)
select idx,
regexp_replace('margin=x;margin=y;margin=z;', '.*?=(.*?);.*', '\1', 1, idx) as val
from t1;
Output:
IDX VAL
---------- -----------------------
1 xmargin=y;margin=z;
2 margin=x;ymargin=z;
3 margin=x;margin=y;z
3 rows selected.
I guess this is not what you needed - but it demonstrates what was wrong in your query. The fourth argument to REGEXP_REPLACE, 1 in all cases in the above query, is the position from which the search begins. The fifth argument, idx, is the occurrence. This query replaces the first, second, third occurrence with the subexpression - probably not what you wanted.
If you need to extract x, or y, or z, depending on the occurrence number, you must use REGEXP_SUBSTR, not REGEXP_REPLACE. Note also that I changed the match pattern - the .*? at the beginning and the .* at the end are unnecessary. If you want to find x, y or z in something like margin=x; but not in length=x; then you must make that explicit, the match pattern should be 'margin=(.*?);'.
with t1 ( idx ) as (select 1 from dual union all select 2 from dual
union all select 3 from dual)
select idx,
regexp_substr('margin=x;margin=y;margin=z;', '=(.*?);', 1, idx, null, 1) as val
from t1;
Output:
IDX VAL
---------- -------
1 x
2 y
3 z
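To illustrate the point about making the attribute name explicit, here is a minimal sketch. The input string with its extra length= entries is made up for the example, and the sixth argument (the subexpression number) assumes Oracle 11g or later.

```sql
-- Hypothetical input mixing two attribute names.
select regexp_substr(
         'length=a;margin=x;length=b;margin=y;',
         'margin=(.*?);',  -- only margin= entries count as occurrences
         1,                -- position: start searching at the first character
         2,                -- occurrence: the second margin= entry
         null,             -- no match modifiers
         1                 -- return the first subexpression, i.e. the value
       ) as val
from dual;
-- returns y
```

With the shorter pattern '=(.*?);' the same call would return x, because length=a; would count as the first occurrence.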

Related

Search a pattern from comma separated parameters in plsql

My Parameter to a procedure lv_ip := 'MNS-GC%|CS,MIB-TE%|DC'
My cursor query should search for records that start with 'MNS-GC%' and 'MIB-TE%'.
Select id, date,program,program_start_date
from table_1
where program like 'MNS-GC%' or program LIKE 'MIB-TE%'
Please suggest ways to read it from the parameter and an alternative to LIKE.
Since you mention you want to preserve what's on the right side of the pipe, and want to be able to process parameters dynamically, here's a way to parse multi-delimited data that could give you some ideas using a CTE.
The table called 'tbl' just sets up your original data. tbl_comma contains that data split on the comma. The final query splits that data into name/value pairs.
Hopefully this will help give you some ideas even though it's not the exact answer you are looking for.
COLUMN ID FORMAT a3
COLUMN PROGRAM FORMAT a10
COLUMN part2 FORMAT a6
-- Original data
WITH tbl(ID, DATA) AS (
SELECT 1, 'MNS-GC%|CS,MIB-TE%|DC' FROM dual UNION ALL
SELECT 2, 'MNS-GC%|CS,MIB-TE%|DC,MIB-TA%|AB,MIB-TB%|BC' FROM dual
),
tbl_comma(ID, CASE) AS (
SELECT ID,
REGEXP_SUBSTR(DATA, '(.*?)(,|$)', 1, LEVEL, NULL, 1) CASE
FROM tbl
CONNECT BY REGEXP_SUBSTR(DATA, '(.*?)(,|$)', 1, LEVEL) IS NOT NULL
AND PRIOR ID = ID
AND PRIOR SYS_GUID() IS NOT NULL
)
--SELECT * FROM tbl_comma;
-- Parse into name/value pairs
SELECT ID,
REGEXP_REPLACE(CASE, '^(.*)\|.*', '\1') PROGRAM,
REGEXP_REPLACE(CASE, '.*\|(.*)$', '\1') PART2
FROM tbl_comma;
ID PROGRAM PART2
--- ---------- ------
1 MNS-GC% CS
1 MIB-TE% DC
2 MNS-GC% CS
2 MIB-TE% DC
2 MIB-TA% AB
2 MIB-TB% BC
6 rows selected.
If you're stuck with that input and the structure is fixed, with each comma-separated element having a pipe-delimited value, you could possibly convert that string to a regular expression pattern, and then use regexp_like to pattern-match:
select id, date, program, program_start_date
from table_1
where regexp_like(
program,
'^(' || rtrim(regexp_replace(lv_ip, '%\|.*?(,|$)', '|'), '|') || ')')
With your example parameter, the
'^(' || rtrim(regexp_replace(lv_ip, '%\|.*?(,|$)', '|'), '|') || ')'
would generate the pattern
^(MNS-GC|MIB-TE)
i.e. looking for either of those strings at the start of the program value.
Alternatively you could split the input up yourself, with instr and substr, and - since the number of elements may vary - create a dynamic query using them. That might be faster than using regular expression, but might be harder to maintain.
What would the regexp be to match CS|DC?
It depends how you plan to use those values, but if you're looking for some column exactly matching one of them, then you could do something similar with:
'^(' || ltrim(regexp_replace(lv_ip, '(^|,)[^|]*', null), '|') || ')$'
which with your input string would generate the pattern
^(CS|DC)$
But if you need to match the corresponding values as pairs - so the equivalent of something like:
where (program like 'MNS-GC%' and some_col = 'CS')
or (program like 'MIB-TE%' and some_col = 'DC')
... then you'd need to extract them as pairs, as @Gary_W has shown.
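A minimal sketch of the pair-wise matching described above, assuming the parameter always has the form pattern|value,pattern|value. The sample rows in table_1, the some_col column, and the params and pairs CTE names are all hypothetical, and regexp_count assumes Oracle 11g or later.

```sql
-- Hypothetical sample rows standing in for table_1.
with table_1 (program, some_col) as (
  select 'MNS-GC-001', 'CS' from dual union all
  select 'MNS-GC-002', 'DC' from dual union all
  select 'MIB-TE-001', 'DC' from dual
),
params (lv_ip) as (
  select 'MNS-GC%|CS,MIB-TE%|DC' from dual
),
pairs (pattern, val) as (
  -- one row per comma-separated element: text before and after the pipe
  select regexp_substr(lv_ip, '([^|,]+)\|([^,]+)', 1, level, null, 1),
         regexp_substr(lv_ip, '([^|,]+)\|([^,]+)', 1, level, null, 2)
  from params
  connect by level <= regexp_count(lv_ip, ',') + 1
)
select t.program, t.some_col
from table_1 t
join pairs p
  on t.program like p.pattern   -- the % in the parameter acts as the LIKE wildcard
 and t.some_col = p.val;
```

With these sample rows, only MNS-GC-001/CS and MIB-TE-001/DC should satisfy both conditions; MNS-GC-002/DC matches a program pattern but not the value paired with it.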

How to replace characters at specific position in several words using REGEX_REPLACE

I have a query similar to this:
SELECT YEAR_CODE FROM YEAR_CODES
and it returns several records: typically 1 but sometimes 2 or 3. The returned records look like this: 2018FOO, 2019BAR
I need to get the matching previous year of the returned codes. For instance:
2018FOO becomes 2017FOO
2019BAR becomes 2018BAR
Looking for something similar to:
REGEX_REPLACE(SELECT YEAR_CODE FROM YEAR_CODES, 4th character, 4th character minus 1)
You don't need regexp_replace(); the substr() string function combined with concat() (or the concatenation operator ||) is enough:
with year_codes(year_code) as
(
select '2018FOO' from dual union all
select '2019BAR' from dual
)
select concat(substr(year_code,1,4) - 1,substr(year_code,-3)) as year_code
from year_codes;
YEAR_CODE
---------
2017FOO
2018BAR
An explicit to_number() conversion is redundant here, since Oracle implicitly converts a string composed entirely of digits to a number in an arithmetic operation.
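A variant sketch of the same idea, in case the suffix is not always three characters long: it keeps everything from the fifth character onward rather than the last three. The explicit to_number/to_char calls are optional and only make the conversions visible; the 2020LONGER row is a made-up example.

```sql
with year_codes (year_code) as (
  select '2018FOO' from dual union all
  select '2019BAR' from dual union all
  select '2020LONGER' from dual   -- hypothetical code with a longer suffix
)
select year_code,
       -- first four characters minus one, then the untouched remainder
       to_char(to_number(substr(year_code, 1, 4)) - 1)
         || substr(year_code, 5) as previous_year_code
from year_codes;
-- 2018FOO    -> 2017FOO
-- 2019BAR    -> 2018BAR
-- 2020LONGER -> 2019LONGER
```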
You can use string operations:
with c as (
<your query here>
)
select yc.*
from year_code yc
join c
on to_number(substr(yc.code, 1, 4)) = to_number(substr(c.code, 1, 4)) - 1 and
substr(yc.code, 5) = substr(c.code, 5)

How to extract value between 2 slashes

I have a string like "1490/2334/5166400411000434" from which I need to derive the value after the second slash. I tried the logic below:
select REGEXP_SUBSTR('1490/2334/5166400411000434','[^/]+',1,3) from dual;
It works fine. But when there is no value between the first and second slash, it returns blank.
For example, my string is "1490//5166400411000434" and I am trying
select REGEXP_SUBSTR('1490//5166400411000434','[^/]+',1,3) from dual;
It returns blank. Please suggest what I am missing.
If I understand well, you may need
regexp_substr(t, '(([^/]*/){2})([^/]*)', 1, 1, 'i', 3)
This handles the first 2 parts like 'xxx/' and then checks for a sequence of non / characters; the parameter 3 is used to get the 3rd matching subexpression, which is what you want.
For example:
with test(t) as (
select '1490/2334/5166400411000434' from dual union all
select '1490//5166400411000434' from dual union all
select '1490//5166400411000434/ramesh/3344' from dual
)
select t, regexp_substr(t, '(([^/]*/){2})([^/]*)', 1, 1, 'i', 3) as substr
from test
gives:
T SUBSTR
---------------------------------- ----------------------------------
1490/2334/5166400411000434 5166400411000434
1490//5166400411000434 5166400411000434
1490//5166400411000434/ramesh/3344 5166400411000434
You can REVERSE() your string and take the value before the first slash. And then reverse again to obtain the desired output.
select reverse(regexp_substr(reverse('1490//5166400411000434'), '[^/]+', 1, 1)) from dual;
It can also be done with basic substring and instr function:
select reverse(SUBSTR(reverse('1490//5166400411000434'), 0, INSTR(reverse('1490//5166400411000434'), '/')-1)) from dual;
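For comparison, a small sketch that anchors on the second slash directly with instr and substr instead of reversing the string. Note that it returns everything after the second slash, so for inputs with further slashes the result differs from the reverse-based queries, which return only the text after the last slash. The test CTE reuses the sample strings from the earlier answer.

```sql
with test (t) as (
  select '1490/2334/5166400411000434' from dual union all
  select '1490//5166400411000434' from dual union all
  select '1490//5166400411000434/ramesh/3344' from dual
)
select t,
       -- instr(t, '/', 1, 2) is the position of the second slash
       substr(t, instr(t, '/', 1, 2) + 1) as after_second_slash
from test;
-- 1490/2334/5166400411000434         -> 5166400411000434
-- 1490//5166400411000434             -> 5166400411000434
-- 1490//5166400411000434/ramesh/3344 -> 5166400411000434/ramesh/3344
```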
Use the other options of REGEXP_SUBSTR to match a pattern:
select REGEXP_SUBSTR('1490//5166400411000434','(/\d*)/(\d+)',1,1,'x',2) from dual
Basically, it looks for two slashes with optional digits between them, starting from position 1 and taking the first occurrence, ignoring whitespace in the pattern ('x'), and then returns the second subexpression, i.e. the contents of the second pair of parentheses:
... pattern,1,1,'x',subexp2)

to_number from char sql

I have to select only the IDs which have only even digits (an ID looks like: p19 ,p20 etc). That is, p20 is good (both 2 and 0 are even digits); p18 is not.
I thought to use substr to get each number from the IDs and then see if it's even .
select from profs
where to_number(substr(id_prof,2,2))%2=0 and to_number(substr(id_prof,3,2))%2=0;
If you need all rows whose ID consists of a leading 'p' followed only by even digits, it should look like this:
select *
from profs
where regexp_like (id_prof, '^p[24680]+$');
with
profs ( prof_id ) as (
select 'p18' from dual union all
select 'p24' from dual union all
select 'p53' from dual
)
-- End of test data; what is above this line is NOT part of the solution.
-- The solution (SQL query) begins here.
select *
from profs
where length(prof_id) = length(translate(prof_id, '013579', '0'));
PROF_ID
-------
p24
This solution should work faster than anything using regular expressions. All it does is to replace 0 with itself and DELETE all odd digits from the input string. (The '0' is included due to a strange but documented behavior of translate() - the third argument can't be empty). If the length of the input string doesn't change after the translation, that means the input string didn't have any odd digits.
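To make the length comparison concrete, this sketch shows what translate() leaves behind for each test row. It reuses the test data from the answer; the stripped, len_before and len_after aliases are added for illustration only.

```sql
with profs (prof_id) as (
  select 'p18' from dual union all
  select 'p24' from dual union all
  select 'p53' from dual
)
select prof_id,
       translate(prof_id, '013579', '0') as stripped,   -- odd digits deleted
       length(prof_id) as len_before,
       length(translate(prof_id, '013579', '0')) as len_after
from profs;
-- p18 -> stripped p8,  lengths 3 and 2 (rejected)
-- p24 -> stripped p24, lengths 3 and 3 (kept)
-- p53 -> stripped p,   lengths 3 and 1 (rejected)
```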
where mod(to_number(regexp_replace(id_prof, '[^[:digit:]]', '')),2) = 0 -- note: this tests only whether the number as a whole is even, so p18 also passes

SQL change date formats inside a string

I would like to convert a string containing dates in SQL select from Oracle 11g database.
Original string (CLOB) example:
"1.12.2011 - event 1
2.2.2012 - event 2
13.3.2012 - event 44"
Desired output:
"20111201 - event 1
20120202 - event 2
20120313 - event 44"
Is there a better (faster) way than using 4 separate replacements?
regexp_replace(regexp_replace(regexp_replace(regexp_replace(my_string,
'(\d\d)\.(\d\d)\.(20\d\d)', '\3\2\1'),
'(\d\d)\.(\d)\.(20\d\d)', '\30\2\1'),
'(\d)\.(\d\d)\.(20\d\d)', '\3\20\1'),
'(\d)\.(\d)\.(20\d\d)', '\30\20\1')
Especially if you're using CLOBs, you have to be careful unless you're certain of the data in there.
However, if your CLOB only looks like that, then you need three regexp_replace calls for this to work; it'll also be much more dynamic. Just explicitly specify digits using [[:digit:]], then specify the minimum and maximum number of times these digits can occur using {1,2}.
Then the following would work:
select regexp_replace(
regexp_replace(
regexp_replace( my_string
, '([[:digit:]]{1,2})\.([[:digit:]]{1,2})\.(20[[:digit:]]{2})'
, '\3-\2-\1')
, '-([[:digit:]]{1}(-|$))'
, '0\1' )
, ('-')
, '')
from dual
This means:
match ( group 1 ) 1 or 2 digits
match a full stop.
match ( group 2 ) 1 or 2 digits
match a full stop
match ( group 3 ) 20 + 2 digits.
Then take out only groups 1, 2 and 3, i.e. ignoring the full stops, and return them in the order 3, 2, 1, separated by hyphens.
Then replace any [digit] that is followed by either a hyphen or the end of the string, i.e. where the number of digits is only 1, with -0[digit].
Lastly, remove all the hyphens.
Separately from that I agree with tbone. It would make a lot more sense to store this data in a separate table (event_id number, event_date date). Any string transformations are easy with no chance of getting it wrong, unlike in this situation, and the data is easy to query and compare.
There are no better options (both correct and readable) with better performance, or if there are, no one cares.
I prefer a 2-level regexp_replace for the date part:
select regexp_replace(
regexp_replace( my_string,
'([[:digit:]]{1,2})\.([[:digit:]]{1,2})\.(20[[:digit:]]{2})',
'\3-0\2-0\1' ),
'(20[[:digit:]]{2})-0?([[:digit:]]{2})-0?([[:digit:]]{2})',
'\1\2\3' )
from dual;
Maybe try doing:
select to_char(to_date('13.3.2011', 'DD.MM.YYYY'),'YYYYMMDD') from dual;
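Building on the to_date/to_char idea, here is a minimal sketch that converts the leading date of each line and keeps the rest of the line unchanged. It assumes every line is already available as its own row and starts with a D.M.YYYY date; the lines CTE is hypothetical, and splitting the original CLOB into rows is not shown.

```sql
-- Hypothetical rows, one per line of the original string.
with lines (line) as (
  select '1.12.2011 - event 1' from dual union all
  select '2.2.2012 - event 2' from dual union all
  select '13.3.2012 - event 44' from dual
)
select to_char(
         to_date(regexp_substr(line, '^[0-9.]+'), 'DD.MM.YYYY'),  -- leading date
         'YYYYMMDD'
       )
       || regexp_replace(line, '^[0-9.]+') as converted           -- rest of the line
from lines;
-- 20111201 - event 1
-- 20120202 - event 2
-- 20120313 - event 44
```

Because to_date validates the value, a malformed date raises an error here instead of being silently passed through, which may or may not be what you want for free-text data.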