Oracle REGEXP_SUBSTR not working with my pattern - sql

I have the following query:
SELECT DISTINCT A.REZ FROM
(
SELECT REGEXP_SUBSTR(P_EQUATION, '([A-Z|a-z|0-9]+)\{([0-9|\+|\-| |\*|\/\)\(]+)\}#([A-Z|a-z|0-9|_]+)#',1, LEVEL) AS REZ FROM DUAL
CONNECT BY REGEXP_SUBSTR(P_EQUATION, '([A-Z|a-z|0-9]+)\{([0-9|\+|\-| |\*|\/\)\(]+)\}#([A-Z|a-z|0-9|_]+)#',1, LEVEL) IS NOT NULL
) A;
If I supplied the following input:
P_EQUATION := 'A123{(01+02)*2}#ACCOUNT_BALANCE# + B123{(20+10)/20}#ACCOUNT_BALANCE#';
It gives me the following:
REZ
-------------------------------------
A123{(01+02)*2}#ACCOUNT_BALANCE#
B123{(20+10)/20}#ACCOUNT_BALANCE#
However, although the minus sign is included in the pattern, as soon as I add one inside the curly brackets the text is no longer recognized as a match!
For example:
P_EQUATION := 'A123{(01-02)*2}#ACCOUNT_BALANCE#';
I can't find a solution to this and it is driving me crazy, especially because matching the minus sign alone works, and matching digits alone also works :(

Oracle appears to be using POSIX style regexes: https://docs.oracle.com/cd/B12037_01/server.101/b10759/ap_posix001.htm#i690819
The backslash is NOT a metacharacter in a POSIX bracket expression. So in POSIX, the regular expression [\d] matches a \ or a d
> http://www.regular-expressions.info/posixbrackets.html
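The backslashes are probably what breaks it, and they're not necessary. Since a backslash is a literal inside a bracket expression, the - in \-| is most likely read as a range operator (from \ to |) rather than as a literal minus. Also, | is a literal inside a character class, not an alternation operator (as the comments pointed out). I have fixed both problems and moved the - to the start of the character class, where it is interpreted as a literal.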
Here you go:
([A-Za-z0-9]+)\{([-0-9+ */)(]+)\}#([A-Za-z0-9_]+)#
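To check the corrected pattern against the input that failed, a minimal sketch along these lines can be used. It assumes Oracle 10g or later; the CTE name input and the inline test string are illustrative assumptions, and the pattern is the one given above.

```sql
-- Sketch: apply the corrected pattern to an input containing a minus sign.
-- The CTE "input" and its test string are illustrative assumptions.
WITH input AS (
  SELECT 'A123{(01-02)*2}#ACCOUNT_BALANCE# + B123{(20+10)/20}#ACCOUNT_BALANCE#' AS p_equation
  FROM dual
)
SELECT REGEXP_SUBSTR(p_equation,
                     '([A-Za-z0-9]+)\{([-0-9+ */)(]+)\}#([A-Za-z0-9_]+)#',
                     1, LEVEL) AS rez
FROM input
-- one row per match; stop when there is no further occurrence
CONNECT BY REGEXP_SUBSTR(p_equation,
                         '([A-Za-z0-9]+)\{([-0-9+ */)(]+)\}#([A-Za-z0-9_]+)#',
                         1, LEVEL) IS NOT NULL;
```

Because the CTE returns a single row, the CONNECT BY clause here only generates one level per match.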

Couldn't quite figure out the issue with your code but here is one way to do it:
with temp as
(
select 'A123{(01+02)*2}#ACCOUNT_BALANCE# + B123{(20+10)/20}#ACCOUNT_BALANCE#' P_EQUATION from dual union all
select 'A123{(01-02)*2}#ACCOUNT_BALANCE#' P_EQUATION from dual
)
SELECT DISTINCT A.REZ FROM
(
SELECT REGEXP_SUBSTR(P_EQUATION, '[[:alpha:]]+[[:digit:]]+{\([[:digit:]]+\S[[:digit:]]+\)\S[[:digit:]]+}#[[:alpha:]]+_[[:alpha:]]+#',1, LEVEL) AS REZ FROM temp
CONNECT BY REGEXP_SUBSTR(P_EQUATION, '[[:alpha:]]+[[:digit:]]+{\([[:digit:]]+\S[[:digit:]]+\)\S[[:digit:]]+}#[[:alpha:]]+_[[:alpha:]]+#',1, LEVEL) IS NOT NULL
) A;
OUTPUT:
REZ
---------------------------------------
B123{(20+10)/20}#ACCOUNT_BALANCE#
A123{(01-02)*2}#ACCOUNT_BALANCE#
A123{(01+02)*2}#ACCOUNT_BALANCE#

Related

How do I dynamically extract substring from string?

I’m trying to dynamically extract a substring from a very long URL. For example, I may have the following URLs:
https://www.google.com/ABCDEF Version=“0.0.00.0” GHIJK
https://www.google.com/ABCDEFGH Version=“0.0.0.0” IJKLM
https://www.google.com/ABC Version=“0.0.0.00” 12345
I am trying to extract the version code only (0.0.0.0).
This is what I have so far:
SELECT SUBSTR(col, INSTR(col, ‘Version=“‘)+9)
FROM table
This query returns the following result:
0.0.00.0” GHIJK … (url continues on)
So, I attempt to find “Version” in the link, so I can start from the same position in each row. This works fine, however I’m having a hard time dynamically locating the ending quote (“). I tried using INSTR in the third parameter of my SUBSTR function, like so:
SELECT SUBSTR(col, INSTR(col, ‘Version=“‘)+9, INSTR(col, ‘“‘))
FROM table
I figured that this would find the position of the ending quote, and then use that number for the length, but it returns a strange output. I’ve also used POSITION, CHARINDEX, LENGTH, and LOCATE. None of these functions work in Oracle.
I think maybe when I put +9 after the first INSTR function, it’s setting the query to a fixed position instead of a dynamic one, but I’m not sure how else to remove ‘Version=“‘.
Here's one option. Note that it actually selects whatever is between double quotes, which happens to be the version in your example; if the string contained another quoted substring, you'd get a wrong result.
with test (col) as
(select 'https://www.google.com/ABCDEF Version="0.0.00.0" GHIJK' from dual union all
select 'https://www.google.com/ABCDEFGH Version="0.0.0.0" IJKLM' from dual union all
select 'https://www.google.com/ABC Version="0.0.0.00" 12345' from dual
)
select col,
replace(regexp_substr(col, '".+"'), '"') version
from test;
which results in
https://www.google.com/ABCDEF Version="0.0.00.0" GHIJK 0.0.00.0
https://www.google.com/ABCDEFGH Version="0.0.0.0" IJKLM 0.0.0.0
https://www.google.com/ABC Version="0.0.0.00" 12345 0.0.0.00
You can still use INSTR: locate the second " in the string, then subtract the position of the first " to get the length you need. Below is an example query:
SELECT col,
SUBSTR (col, INSTR (col, '"') + 1, INSTR (col, '"', 1, 2) - INSTR (col, '"') - 1) version
FROM test;
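If the string might contain other double quotes before the version, the same INSTR idea can be anchored on Version=" instead of on the first quote. The following is only a sketch: the CTE test with a single sample row is an assumption for illustration, and the offset 9 is the length of the text Version=" as in the question.

```sql
-- Sketch: start right after Version=" and stop at the next double quote.
WITH test AS (
  SELECT 'https://www.google.com/ABCDEF Version="0.0.00.0" GHIJK' AS col FROM dual
)
SELECT SUBSTR(col,
              INSTR(col, 'Version="') + 9,                   -- first character of the version
              INSTR(col, '"', INSTR(col, 'Version="') + 9)   -- position of the closing quote
                - (INSTR(col, 'Version="') + 9)              -- minus the start = length
             ) AS version
FROM test;
```

The third argument of SUBSTR is a length, not an end position, which is why the start position has to be subtracted from the position of the closing quote.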
You can use REGEXP_SUBSTR() with the pattern Version="(\d.*\d)" to extract the piece between Version=" and the closing " (your quotes are presumed to be regular double quotes " "):
SELECT REGEXP_SUBSTR(url,'Version="(\d.*\d)"',1,1,null,1) AS version
FROM t
where
the third argument (1) is the position,
the fourth argument (1) is the occurrence, and, most importantly, the last argument (1) selects the capture group to return.
In fact, the pattern '"(\d.*\d)"' alone is enough for the current data set.
or
REGEXP_REPLACE() with capture group \2 as
SELECT REGEXP_REPLACE(url,'^(.*Version=")([^"]*).*','\2') AS version
FROM t
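If the URL may contain further quoted parts after the version, the greedy .* in the REGEXP_SUBSTR pattern could run past the closing quote. A hedged variant that stops at the first closing quote uses a negated character class. It assumes Oracle 11g or later (for the sixth argument) and reuses the names t and url from the answer, with sample rows added here purely for illustration.

```sql
-- Sketch: [^"]* cannot cross a double quote, so the match ends at the first closing quote.
WITH t AS (
  SELECT 'https://www.google.com/ABCDEF Version="0.0.00.0" GHIJK' AS url FROM dual UNION ALL
  SELECT 'https://www.google.com/ABC Version="0.0.0.00" 12345'    AS url FROM dual
)
SELECT url,
       REGEXP_SUBSTR(url, 'Version="([^"]*)"', 1, 1, NULL, 1) AS version
FROM t;
```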

ORACLE regexp_substr extract everything after specific char

How to get rest of string after specific char?
I have a string 'a|b|c|2|:x80|3|rr|' and I would like to get everything after the 3rd occurrence of |. So the result should be 2|:x80|3|rr|
The query
select REGEXP_SUBSTR('a|b|c|2|:x80|3|rr|','[^|]+$',1,4)
from dual
returned NULL.
Use a SUBSTR / INSTR combination:
WITH t ( s ) AS (
SELECT 'a|b|c|2|:x80|3|rr|'
FROM dual
) SELECT substr(s,instr(s,'|',1,3) + 1)
FROM t;
REGEXP_REPLACE() will do the trick. Skip 3 groups of anything followed by a pipe, then replace with the 2nd group, which is the rest of the line (anchored to the end).
SQL> select regexp_replace('a|b|c|2|:x80|3|rr|', '(.*?\|){3}(.*)$', '\2') trimmed
2 from dual;
TRIMMED
------------
2|:x80|3|rr|
SQL>
I suggest a nice but long way, using regexp_substr, regexp_count and listagg together:
select listagg(str) within group (order by lvl)
as "Result String"
from
(
with t(str) as
(
select 'a|b|c|2|:x80|3|rr|' from dual
)
select level-1 as lvl,
regexp_substr(str,'(.*?)(\||$)',1,level) as str
from dual
cross join t
connect by level <= regexp_count('a|b|c|2|:x80|3|rr|','\|')
)
where lvl >= 3;
If you use Oracle 11g or above, you can specify a subexpression to return, like this:
select REGEXP_SUBSTR('a|b|c|2|:x80|3|rr|','([^|]+\|){3}(.+)$',1,1,null,2) from dual
Erkko,
You need to combine SUBSTR with either INSTR or REGEXP_INSTR.
Without a regex, your query will look like this:
SELECT SUBSTR('a|b|c|2|:x80|3|rr|',INSTR('a|b|c|2|:x80|3|rr|','|',1,3)+1) from dual;
With a regex, as you wanted, your query will look like this:
SELECT SUBSTR('a|b|c|2|:x80|3|rr|',REGEXP_INSTR('a|b|c|2|:x80|3|rr|','\|',1,3)+1) from dual;
Explanation:
First, find the position of the delimiter you mentioned. In your case the third | is at position 6, so position + 1 is where the substring starts.
Second, take the substring of the original string from position + 1 to the end of the string (no length argument is needed).
Example:
https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=6fd782db95f575201eded084493232ee
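The two steps from the explanation can also be looked at side by side. This is just a sketch; the CTE t and the column aliases are made up for illustration.

```sql
-- Sketch: show the intermediate position and the resulting substring together.
WITH t AS (
  SELECT 'a|b|c|2|:x80|3|rr|' AS s FROM dual
)
SELECT INSTR(s, '|', 1, 3)                AS third_pipe_pos,  -- position of the 3rd |
       SUBSTR(s, INSTR(s, '|', 1, 3) + 1) AS rest             -- everything after it
FROM t;
```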

remove & symbolic from Query

We have an Oracle query running in an Informatica SQ transformation; the query is given below:
SELECT
CAST(T.COLUMN_VALUE.EXTRACT('//text()') AS VARCHAR2(200))
FROM
(SELECT regexp_replace('assaley&lee#direct.wvhin.org','(!##$%^\&*()_+=)*','') as RECIPIENTS FROM DUAL) T1,
TABLE( xmlsequence( XMLTYPE( '<x><x>' || REPLACE(t1.RECIPIENTS, ',', '</x><x>') || '</x></x>' ).EXTRACT('//x/*'))) t
where length(T1.RECIPIENTS) =28 -- and length(T1.RECIPIENTS) > 25
If I run the query above, it prompts for user input because of the '&' symbol, which is treated as a substitution reference. I need to turn off this prompt.
Could anyone help me with this?
Note: the value 'assaley&lee#direct.wvhin.org' is hard-coded.
Thanks
Pandia
One method is to use the UNISTR function and hard-code & as a Unicode literal, \0026:
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions204.htm
SELECT unistr( 'assaley\0026lee#direct.wvhin.org' ) x
from dual;
X
----------------------------
assaley&lee#direct.wvhin.org
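A similar idea, not taken from the answers here, is to build the string with CHR(38) so that no literal & appears in the query text. This is only a sketch and assumes an ASCII-based database character set, where code 38 is the ampersand.

```sql
-- Sketch: concatenate the ampersand via its character code (assumes an ASCII-based character set).
SELECT 'assaley' || CHR(38) || 'lee#direct.wvhin.org' AS x
FROM dual;
```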
Regardless of the issue you are currently experiencing, the code should be corrected.
Within the regular expression you should use square brackets instead of round brackets.
Round brackets are intended for expression capturing.
Square brackets are intended for character sets.
SELECT regexp_replace('assaley&lee#direct.wvhin.org','[!##$%^\&*()_+=]+','') as RECIPIENTS
FROM DUAL
;
Or remove anything except letters and digits:
SELECT regexp_replace('assaley&lee#direct.wvhin.org','[^[:alpha:][:digit:]]','') as RECIPIENTS
FROM DUAL
;

to_number from char sql

I have to select only the IDs which have only even digits (an ID looks like: p19 ,p20 etc). That is, p20 is good (both 2 and 0 are even digits); p18 is not.
I thought of using substr to get each digit from the ID and then checking whether it's even.
select from profs
where to_number(substr(id_prof,2,2))%2=0 and to_number(substr(id_prof,3,2))%2=0;
If you need all rows that consist of a 'p' at the beginning followed only by even digits, it should look like this:
select *
from profs
where regexp_like (id_prof, '^p[24680]+$');
with
profs ( prof_id ) as (
select 'p18' from dual union all
select 'p24' from dual union all
select 'p53' from dual
)
-- End of test data; what is above this line is NOT part of the solution.
-- The solution (SQL query) begins here.
select *
from profs
where length(prof_id) = length(translate(prof_id, '013579', '0'));
PROF_ID
-------
p24
This solution should work faster than anything using regular expressions. All it does is to replace 0 with itself and DELETE all odd digits from the input string. (The '0' is included due to a strange but documented behavior of translate() - the third argument can't be empty). If the length of the input string doesn't change after the translation, that means the input string didn't have any odd digits.
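To see what translate() does to each value, the intermediate results can be selected next to the lengths. This is a sketch that reuses the test data from the answer; the column aliases are made up for illustration.

```sql
-- Sketch: show the translated string and both lengths for each test row.
WITH profs AS (
  SELECT 'p18' AS prof_id FROM dual UNION ALL
  SELECT 'p24' AS prof_id FROM dual UNION ALL
  SELECT 'p53' AS prof_id FROM dual
)
SELECT prof_id,
       TRANSLATE(prof_id, '013579', '0')         AS without_odd_digits,
       LENGTH(prof_id)                           AS len_before,
       LENGTH(TRANSLATE(prof_id, '013579', '0')) AS len_after
FROM profs;
```

For p18 the translated string is p8, so the length drops from 3 to 2 and the row is filtered out; p24 is unchanged and passes.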
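Another suggestion strips the non-digits and tests the remaining number with mod(). Note that this only checks whether the number as a whole is even (that is, its last digit), so p18 would also pass:
where mod(to_number(regexp_replace(id_prof, '[^[:digit:]]', '')),2) = 0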

sql Search a string between certain words

If the key word is "Find", is it possible to extract a string that is between the "Find"?
stackoverflow is awesome. FindHello, World!Find It has everything!
The result should be 'Hello, World!' because the string is between "Find"
My initial idea was to use Instr to locate two "Find", then locate what's between "Find".
Is there any better way to do this?
You can use either regular expressions or instr() to achieve what you're after.
I actually prefer regular expressions, if you're using version 10g or later, because I find doing multiple contortions with instr() fairly unwieldy, but it's up to you.
with phrases as (
select 'stackoverflow is awesome. FindHello, World!Find It has everything!' as phrase
from dual
)
select substr( phrase
, instr(phrase,'Find',1,1) + 4
, instr(phrase,'Find',1,2)
- instr(phrase,'Find',1,1)
- 4
)
from phrases
This gets the first and second occurrences of the string Find, starting from the first character, then uses these to work out the positions that you should be doing the sub-string on.
Alternatively, using regular expressions:
with phrases as (
select 'stackoverflow is awesome. FindHello, World!Find It has everything!' as phrase
from dual
)
select regexp_replace(phrase
, '([[:print:]]+Find)([[:print:]]+)(Find[[:print:]]+)', '\2')
from phrases
;
This takes any printable character multiple times, followed by the string Find etc. But, the main bit is the grouping (), which separates each part of the phrase. The \2 means that of the original matched string only the second group, i.e. that between the Find's is returned.
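A shorter variant of the same capture-group idea is possible with REGEXP_SUBSTR. This is a sketch, not part of the original answer: it assumes Oracle 11g or later (for the subexpression argument) and relies on the non-greedy .*? quantifier; the alias between_finds is made up.

```sql
-- Sketch: return only the first capture group, i.e. the text between the first two Finds.
WITH phrases AS (
  SELECT 'stackoverflow is awesome. FindHello, World!Find It has everything!' AS phrase
  FROM dual
)
SELECT REGEXP_SUBSTR(phrase, 'Find(.*?)Find', 1, 1, NULL, 1) AS between_finds
FROM phrases;
```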
This query is supposed to handle more than two occurrences of 'Find':
with SourceString as(
select 'Find123Find45345Find76876234Find87687Find' s_string
, 'Find' delimiter
from dual
)
select substr(s_string, f_f - s_f + length(delimiter), s_f-Length(delimiter ) )
from (select f_f
, s_f
from(select f_f
, f_f - lag(f_f, 1, f_f) over(order by 1) s_f
from (select Instr(s_string, delimiter , 1, level) f_f
from SourceString
connect by level <= Length(s_string))
)
where s_f > 0)
, SourceString