Fetching value from Pipe-delimited String using Regex (Oracle) - sql

I have source strings in pipe-delimited format, in which the value obr can appear anywhere. I need to get the second field after the first occurrence of obr. The source strings and the expected output are shown below.
Source string:
select 'asd|dfg|obr|1|value1|end' text from dual
union all
select 'a|brx|123|obr|2|value2|end' from dual
union all
select 'hfv|obr|3|value3|345|pre|end' from dual
Expected output:
value1
value2
value3
I have tried the regexp below in Oracle SQL, but it does not work properly.
with t as (
select 'asd|dfg|obr|1|value1|end' text from dual
union all
select 'a|brx|123|obr|2|value2|end' from dual
union all
select 'hfv|obr|3|value3|345|pre|end' from dual
)
select text,to_char(regexp_replace(text,'*obr\|([^|]*\|)([^|]*).*$', '\2')) output from t;
It works when the string starts with OBR, but not when OBR is in the middle, as in the samples above.
Any help would be appreciated.

I'm not sure how Oracle handles regular expressions, but a pattern that starts with an asterisk has no preceding element to repeat, so it does not mean "any characters before obr".
Have you tried '^.*obr\|([^|]*\|)([^|]*).*$' ?

This handles null elements and is wrapped in an NVL() call, which supplies a value if 'obr' is not found or occurs too close to the end of a record for a value two fields away to exist:
SQL> with t(id, text) as (
select 1, 'asd|dfg|obr|1|value1|end' from dual
union
select 2, 'a|brx|123|obr|2|value2|end' from dual
union
select 3, 'hfv|obr|3|value3|345|pre|end' from dual
union
select 4, 'hfv|obr||value4|345|pre|end' from dual
union
select 5, 'a|brx|123|obriem|2|value5|end' from dual
union
select 6, 'a|brx|123|obriem|2|value6|obr' from dual
)
select
id,
nvl(regexp_substr(text, '\|obr\|[^|]*\|([^|]*)(\||$)', 1, 1, null, 1), 'value not found') value
from t;
ID VALUE
---------- -----------------------------
1 value1
2 value2
3 value3
4 value4
5 value not found
6 value not found
6 rows selected.
SQL>
The regex basically can be read as "look for a pattern of a pipe, followed by 'obr', followed by a pipe, followed by zero or more characters that are not a pipe, followed by a pipe, followed by zero or more characters that are not a pipe (remembered in a captured group), followed by a pipe or the end of the line". The regexp_substr() call then returns the 1st captured group which is the set of characters between the pipes 2 fields from the 'obr'.
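Because the pattern above requires a pipe before 'obr', a record that starts with 'obr' is reported as not found. A minimal sketch of a variant that also accepts 'obr' as the first field, assuming that is wanted (the sample rows are made up for illustration):

```sql
with t(id, text) as (
  select 1, 'obr|0|value0|end' from dual union all
  select 2, 'asd|dfg|obr|1|value1|end' from dual union all
  select 3, 'a|brx|123|obriem|2|value5|end' from dual
)
select id,
       -- (^|\|) anchors obr at the start of the string or right after a pipe;
       -- the wanted field is therefore the 2nd captured group
       nvl(regexp_substr(text, '(^|\|)obr\|[^|]*\|([^|]*)', 1, 1, null, 2),
           'value not found') value
from t;
```

With these rows the query should return value0, value1 and 'value not found'. The trailing (\||$) of the original pattern is dropped here because [^|]* already stops at the next pipe or at the end of the string.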

Related

Find value that is not a number or a predefined string

I have to test a column of a sql table for invalid values and for NULL.
Valid values are: Any number and the string 'n.v.' (with and without the dots and in every possible combination as listed in my sql command)
So far, I've tried this:
select count(*)
from table1
where column1 is null
or not REGEXP_LIKE(column1, '^[0-9,nv,Nv,nV,NV,n.v,N.v,n.V,N.V]+$');
The regular expression also matches the single character values 'n','N','v','V' (with and without a following dot). This shouldn't be the case, because I only want the exact character combinations as written in the sql command to be matched. I guess the problem has to do with using REGEXP_LIKE. Any ideas?
I guess this regexp will work:
NOT REGEXP_LIKE(column1, '^([0-9]+|n\.?v\.?)$', 'i')
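Note that inside a bracket expression [...] every character is matched individually, so , is not a separator and the list cannot express multi-character alternatives. Outside brackets, . means any character and \. means the dot character itself. The 'i' flag can be used to ignore case instead of hard-coding all combinations of upper- and lower-case characters.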
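To see how the suggested pattern classifies values, here is a small sketch; the samples CTE and its rows are made up for illustration:

```sql
with samples(column1) as (
  select '123'  from dual union all
  select 'n.v.' from dual union all
  select 'NV'   from dual union all
  select 'n'    from dual union all
  select '12nv' from dual
)
select column1,
       case
         -- either digits only, or n and v each optionally followed by a dot
         when regexp_like(column1, '^([0-9]+|n\.?v\.?)$', 'i') then 'valid'
         else 'invalid'
       end as status
from samples;
```

The first three rows should come back as valid and the last two as invalid. Note that [0-9]+ accepts only unsigned integers; if "any number" includes decimals or signs, the pattern would need to be extended.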
There is no need for a regexp (this should also perform better on large data sets); plain old TRANSLATE is good enough for this validation.
Note that the first call, translate(column1,'x0123456789','x'), removes all numeric characters from the string, so if the result is null the string is OK.
The second call, translate(lower(column1),'x.','x'), removes all dots from the lowercased string, so the expected result is nv.
To reject cases such as n.....v...., the string length is limited as well.
select
column1,
case when
translate(column1,'x0123456789','x') is null or /* numeric string */
translate(lower(column1),'x.','x') = 'nv' and length(column1) <= 4 then 'OK'
end as status
from table1
COLUMN1 STATUS
--------- ------
1010101 OK
1012828n
1012828nv
n.....v....
n.V OK
Test data
create table table1 as
select '1010101' column1 from dual union all -- OK numbers
select '1012828n' from dual union all -- invalid
select '1012828nv' from dual union all -- invalid
select 'n.....v....' from dual union all -- invalid
select 'n.V' from dual; -- OK nv
You can use:
select count(*)
from table1
WHERE TRANSLATE(column1, ' 0123456789', ' ') IS NULL
OR LOWER(column1) IN ('nv', 'n.v', 'nv.', 'n.v.');
Which, for the sample data:
CREATE TABLE table1 (column1) AS
SELECT '12345' FROM DUAL UNION ALL
SELECT 'nv' FROM DUAL UNION ALL
SELECT 'NV' FROM DUAL UNION ALL
SELECT 'nV' FROM DUAL UNION ALL
SELECT 'n.V.' FROM DUAL UNION ALL
SELECT '...................n.V.....................' FROM DUAL UNION ALL
SELECT '..nV' FROM DUAL UNION ALL
SELECT 'n..V' FROM DUAL UNION ALL
SELECT 'nV..' FROM DUAL UNION ALL
SELECT 'xyz' FROM DUAL UNION ALL
SELECT '123nv' FROM DUAL;
Outputs:
COUNT(*)
5
or, if you want any quantity of . then:
select count(*)
from table1
WHERE TRANSLATE(column1, ' 0123456789', ' ') IS NULL
OR REPLACE(LOWER(column1), '.') = 'nv';
Which outputs:
COUNT(*)
9

Get substring with REGEXP_SUBSTR

I need to use regexp_substr, but I can't use it properly
I have column (l.id) with numbers, for example:
1234567891123!123 EXPECTED OUTPUT: 1234567891123
123456789112!123 EXPECTED OUTPUT: 123456789112
12345678911!123 EXPECTED OUTPUT: 12345678911
1234567891123!123 EXPECTED OUTPUT: 1234567891123
I want to use regexp_substr to get the part before the exclamation mark (!):
SELECT REGEXP_SUBSTR(l.id,'[%!]',1,13) from l.table
Is this correct?
You can try using INSTR() and SUBSTR():
select substr(l.id, 1, instr(l.id, '!', 1, 1) - 1) from mytable l
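One thing to keep in mind with the INSTR/SUBSTR approach: when a value contains no !, INSTR returns 0 and SUBSTR with a length of -1 yields NULL. A minimal sketch of a guarded variant, assuming a value without a delimiter should be returned whole (the mytable CTE and its rows are illustrative):

```sql
with mytable(id) as (
  select '1234567891123!123' from dual union all
  select '12345'             from dual
)
select id,
       -- appending '!' guarantees INSTR finds a delimiter,
       -- so a value without one is returned in full
       substr(id, 1, instr(id || '!', '!') - 1) as prefix
from mytable;
```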
You want to remove the exclamation mark and all following characters it seems. That is simply:
select regexp_replace(id, '!.*', '') from mytable;
Look at it like a delimited string where the bang is the delimiter and you want the first element, even if it is NULL. Make sure to test all possibilities, even the unexpected ones (ALWAYS expect the unexpected)! Here the assumption is if there is no delimiter you'll want what's there.
The regex returns the first element followed by a bang or the end of the line. Note this form of the regex handles a NULL first element.
SQL> with tbl(id, str) as (
select 1, '1234567891123!123' from dual union all
select 2, '123456789112!123' from dual union all
select 3, '12345678911!123' from dual union all
select 4, '1234567891123!123' from dual union all
select 5, '!123' from dual union all
select 6, '123!' from dual union all
select 7, '' from dual union all
select 8, '12345' from dual
)
select id, regexp_substr(str, '(.*?)(!|$)', 1, 1, NULL, 1)
from tbl
order by id;
ID REGEXP_SUBSTR(STR
---------- -----------------
1 1234567891123
2 123456789112
3 12345678911
4 1234567891123
5
6 123
7
8 12345
8 rows selected.
SQL>
If you would rather use REGEXP_SUBSTR than regexp_replace, you can use
SELECT REGEXP_SUBSTR(l.id,'^\d+')
assuming you have only numbers before !
If I understand correctly, this is the pattern that you want:
SELECT REGEXP_SUBSTR(l.id,'^[^!]+', 1)
FROM (SELECT '1234567891123!123' as id from dual) l

Retrieve certain number from data set in Oracle 10g

1. <0,0><120.96,2000><241.92,4000><362.88,INF>
2. <0,0><143.64,2000><241.92,4000><362.88,INF>
3. <0,0><125.5,2000><241.92,4000><362.88,INF>
4. <0,0><127.5,2000><241.92,4000><362.88,INF>
Above is the data set I have in Oracle 10g. I need output as below
1. 120.96
2. 143.64
3. 125.5
4. 127.5
The output I want is only the number before the comma in the second group (e.g. 120.96). I tried using REGEXP_SUBSTR but could not get any output. It would be really helpful if someone could suggest an effective way to solve this.
Here is one method that first parses out the second element and then gets the first number in it:
select regexp_substr(regexp_substr(x, '<[^>]*>', 1, 2), '[0-9.]+', 1, 1)
Another method just gets the third number in the string:
select regexp_substr(x, '[0-9.]+', 1, 3)
Here is an approach without using Regexp.
Find the index of the second occurrence of '<', then the index of the second occurrence of ',', and use those values in SUBSTR.
with
data as
(
select '<0,0><120.96,2000><241.92,4000><362.88,INF>' x from dual
UNION ALL
select '<0,0><143.64,2000><241.92,4000><362.88,INF>' x from dual
UNION ALL
select '<0,0><125.5,2000><241.92,4000><362.88,INF>' from dual
)
select substr(x, instr(x,'<',1,2)+1, instr(x,',',1,2)- instr(x,'<',1,2)-1)
from data
Approach Using Regexp:
Identify the 2nd occurrence of a numeric value followed by a comma,
then remove the trailing comma.
with
data as
(
select '<0,0><120.96,2000><241.92,4000><362.88,INF>' x from dual
UNION ALL
select '<0,0><143.64,2000><241.92,4000><362.88,INF>' x from dual
UNION ALL
select '<0,0><125.5,2000><241.92,4000><362.88,INF>' from dual
)
select
trim(TRAILING ',' FROM regexp_substr(x,'[0-9.]+,',1,2))
from data
This example uses regexp_substr to get the string between the 2nd occurrence of a less-than sign and the comma that follows it:
SQL> with tbl(id, str) as (
select 1, '<0,0><120.96,2000><241.92,4000><362.88,INF>' from dual union
select 2, '<0,0><143.64,2000><241.92,4000><362.88,INF>' from dual union
select 3, '<0,0><125.5,2000><241.92,4000><362.88,INF>' from dual union
select 4, '<0,0><127.5,2000><241.92,4000><362.88,INF>' from dual
)
select id,
regexp_substr(str, '<(.*?),', 1, 2, null, 1) value
from tbl;
ID VALUE
---------- -------------------------------------------
1 120.96
2 143.64
3 125.5
4 127.5
EDIT: I realized the OP specified 10g and the regexp_substr example I gave used the 6th argument (subgroup) which was added in 11g. Here is an example using regexp_replace instead which should work with 10g:
SQL> with tbl(id, str) as (
select 1, '<0,0><120.96,2000><241.92,4000><362.88,INF>' from dual union
select 2, '<0,0><143.64,2000><241.92,4000><362.88,INF>' from dual union
select 3, '<0,0><125.5,2000><241.92,4000><362.88,INF>' from dual union
select 4, '<0,0><127.5,2000><241.92,4000><362.88,INF>' from dual
)
select id,
regexp_replace(str, '^(.*?)><(.*?),.*$', '\2') value
from tbl;
ID VALUE
---------- ----------
1 120.96
2 143.64
3 125.5
4 127.5
SQL>

Oracle: Replace first character to other character

I have a table in Oracle with a column holding data such as B12345; that is, the first character is always B, followed by digits. I want to replace the leading B with BH in all such values, so that B12345 becomes BH12345.
If the column already contains a value such as BH45678, it should not be updated.
Only values where B is followed by a digit need updating.
Get the rows which have B followed by digits using regexp_like. Then use replace to replace B with BH for those rows.
select replace(col,'B','BH')
from tablename
where regexp_like(col,'^B\d+$')
with
inputs( str ) as (
select 'B123' from dual union all
select 'BONE' from dual union all
select 'BH55' from dual union all
select 'Z123' from dual union all
select 'B13H' from dual
)
select str, regexp_replace(str, '^B(\d)', 'BH\1') as new_str
from inputs
;
STR NEW_STR
---- -------
B123 BH123
BONE BONE
BH55 BH55
Z123 Z123
B13H BH13H
5 rows selected.
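Since the question asks for an update rather than a query, the same pattern could be applied in an UPDATE statement. A sketch, assuming the placeholder names tablename and col from the first answer; it is worth checking the affected row count before committing:

```sql
-- tablename and col are placeholder names, not a real schema
update tablename
set    col = regexp_replace(col, '^B(\d)', 'BH\1')
where  regexp_like(col, '^B\d');  -- skips BH45678, BONE, Z123
```

The WHERE clause keeps the update from touching rows that would not change anyway, which avoids unnecessary writes.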

How to select exactly 7 or 10 digits in Oracle using a regular expression

I am working on the query below. I need to select the values that contain exactly 7 or 10 digits using a regular expression. I have used the expression in Oracle's regexp_like() function, but it is not working. Please help.
Query :
select * from
(select '1234567CELL' "a" from dual
union
select '123CaLLAsasd12' "a" from dual
union
select 'as9960488188CELLas12' "a" from dual
union
select '1234567' "a" from dual
union
select '9960488188' "a" from dual
union
select 'asdCELLqw' "a" from dual) b
where b."a" like '%CELL%' and regexp_like(b."a",'^(\d{7}|\d{10})$');
Expected output
"1234567"
"9960488188"
That is, I expect the two rows above.
^ and $ match the start and end of the string, so a value cannot both contain the string CELL and consist solely of a 7- or 10-digit number. Instead, you could use the regular expression (^|\D)(\d{7}|\d{10})($|\D), which matches either the start of the string or a non-digit character (^|\D), then exactly 7 or 10 digits, and then either the end of the string or a non-digit character ($|\D).
Like this:
WITH data ( a ) AS (
select '1234567CELL' from dual union
select '123CaLLAsasd12' from dual union
select 'as9960488188CELLas12' from dual union
select '1234567' from dual union
select '9960488188' from dual union
select 'asdCELLqw' from dual
)
SELECT a,
REGEXP_SUBSTR( a, '(^|\D)(\d{7}|\d{10})($|\D)', 1, 1, NULL, 2 ) AS val
FROM data
WHERE a LIKE '%CELL%'
AND REGEXP_LIKE( a, '(^|\D)(\d{7}|\d{10})($|\D)');
Output:
A VAL
-------------------- ----------
1234567CELL 1234567
as9960488188CELLas12 9960488188
You may just use
where regexp_like(b."a",'^([[:digit:]]{7}|[[:digit:]]{10})$')
Since the pattern is anchored (^ matches the start of the string and $ matches the end of the string) there can't be CELL inside the entries you fetch, and you can remove where b."a" like '%CELL%' from the query.