Get substring with REGEXP_SUBSTR - sql

I need to use regexp_substr, but I can't use it properly
I have column (l.id) with numbers, for example:
1234567891123!123 EXPECTED OUTPUT: 1234567891123
123456789112!123 EXPECTED OUTPUT: 123456789112
12345678911!123 EXPECTED OUTPUT: 12345678911
1234567891123!123 EXPECTED OUTPUT: 1234567891123
I want use regexp_substr before the exclamation mark (!)
SELECT REGEXP_SUBSTR(l.id,'[%!]',1,13) from l.table
is it ok ?

You can try using INSTR() and substr()
DEMO
select substr(l.id,1,INSTR(l.id,'!', 1, 1)-1) from dual

You want to remove the exclamation mark and all following characters it seems. That is simply:
select regexp_replace(id, '!.*', '') from mytable;

Look at it like a delimited string where the bang is the delimiter and you want the first element, even if it is NULL. Make sure to test all possibilities, even the unexpected ones (ALWAYS expect the unexpected)! Here the assumption is if there is no delimiter you'll want what's there.
The regex returns the first element followed by a bang or the end of the line. Note this form of the regex handles a NULL first element.
SQL> with tbl(id, str) as (
select 1, '1234567891123!123' from dual union all
select 2, '123456789112!123' from dual union all
select 3, '12345678911!123' from dual union all
select 4, '1234567891123!123' from dual union all
select 5, '!123' from dual union all
select 6, '123!' from dual union all
select 7, '' from dual union all
select 8, '12345' from dual
)
select id, regexp_substr(str, '(.*?)(!|$)', 1, 1, NULL, 1)
from tbl
order by id;
ID REGEXP_SUBSTR(STR
---------- -----------------
1 1234567891123
2 123456789112
3 12345678911
4 1234567891123
5
6 123
7
8 12345
8 rows selected.
SQL>

If you like to use REGEXP_SUBSTR rather than regexp_replace then you can use
SELECT REGEXP_SUBSTR(l.id,'^\d+')
assuming you have only numbers before !

If I understand correctly, this is the pattern that you want:
SELECT REGEXP_SUBSTR(l.id,'^[^!]+', 1)
FROM (SELECT '1234567891123!123' as id from dual) l

Related

Oracle REGEXP_REPLACE function to find decimal and special characters

I am working with table data that contains strings with decimal and back-slash like below:
info
1/2.2.2
2/1.1.1
3/1.1.11
I need to use a regular expression to replace the data like below:
info
1/2.2
2/1.1
3/1.1
Don't use a (slow) regular expression, use simple (faster) string functions instead:
SELECT info,
CASE
WHEN INSTR(info, '.', 1, 2) > 0
THEN SUBSTR(info, 1, INSTR(info, '.', 1, 2) - 1)
ELSE info
END AS part
FROM table_name;
Which, for the sample data:
CREATE TABLE table_name (info) AS
SELECT '1/2.2.2' FROM DUAL UNION ALL
SELECT '2/1.1.1' FROM DUAL UNION ALL
SELECT '3/1.1.11' FROM DUAL UNION ALL
SELECT '3/1.1' FROM DUAL;
Outputs:
INFO
PART
1/2.2.2
1/2.2
2/1.1.1
2/1.1
3/1.1.11
3/1.1
3/1.1
3/1.1
If you want to update the table then:
UPDATE table_name
SET info = SUBSTR(info, 1, INSTR(info, '.', 1, 2) - 1)
WHERE INSTR(info, '.', 1, 2) > 0
fiddle
For the sake of argument, here's a solution using REGEXP_SUBSTR(). REGEXP_SUBSTR() returns NULL if the pattern is not found. Thanks to MT0 for the CTE so I didn't have to type it up :-)
WITH table_name(ID, info) AS (
SELECT 1, '1/2.2.2' FROM DUAL UNION ALL
SELECT 2, '2/1.1.1' FROM DUAL UNION ALL
SELECT 3, '3/1.1.11' FROM DUAL UNION ALL
SELECT 4, '3/1.1' FROM DUAL UNION ALL
SELECT 5, '4/4' FROM DUAL)
SELECT ID, REGEXP_SUBSTR(info, '\d/\d\.\d') DATA
from table_name;
ID DATA
---------- --------
1 1/2.2
2 2/1.1
3 3/1.1
4 3/1.1
5
5 rows selected.

Extracting substring in Oracle

Let's say I have three rows with value as
1 121/2808B|:6081
2 OD308B|:6081_1:
3 008312100001200|:6081_1
I want to display value only until B but want to exclude everything after B. So as you can see in above data:
from 121/2808B|:6081 I want only 121/2808B
from OD308B|:6081_1: only OD308B
from 008312100001200|:6081_1 only 008312100001200.
Thanks for the Help.
Try this: regexp_substr('<Your_string>','[^B]+')
SELECT
REGEXP_SUBSTR('121/2808B|:6081', '[^B]+')
FROM
DUAL;
REGEXP_S
--------
121/2808
SELECT
REGEXP_SUBSTR('OD308B|:6081_1:', '[^B]+')
FROM
DUAL;
REGEX
-----
OD308
SELECT
REGEXP_SUBSTR('008312100001200.', '[^B]+')
FROM
DUAL;
REGEXP_SUBSTR('0
----------------
008312100001200.
db<>fiddle demo
Cheers!!
You could try using SUBSTR() and INSTR()
select SUBSTR('121/2808B|:6081',1,INSTR('121/2808B|:6081','B', 1, 1) -1)
from DUAL
I think you forgot to mention that you wanted to use | as a field separator, but I deduced this from the expected result from the third string. As such the following should give you what you want:
WITH cteData AS (SELECT 1 AS ID, '121/2808B|:6081' AS STRING FROM DUAL UNION ALL
SELECT 2, 'OD308B|:6081_1:' FROM DUAL UNION ALL
SELECT 3, '008312100001200|:6081_1' FROM DUAL)
SELECT ID, STRING, SUBSTR(STRING, 1, CASE
WHEN INSTR(STRING, 'B') = 0 THEN INSTR(STRING, '|')-1
ELSE INSTR(STRING, 'B')-1
END) AS UP_TO_B
FROM cteData;
dbfiddle here
Assuming Bob Jarvis is correct in the assumption that "|" is also a delimiter (as seems likely) try:
-- define test data
with test as
( select '121/2808B|:6081' stg from dual union all
select 'OD308B|:6081_1:' from dual union all
select '008312100001200|:6081_1' from dual
)
-- execute extract
select regexp_substr(stg , '[^B|]+') val
from test ;

How to remove one or more instances of '?*' in a string using REGEXP_REPLACE function in oracle?

I'm getting the characters '?*' one to three times in a column called Line. I am required to remove these characters. How do I do it using Replace or REGEXP_REPLACE?
SELECT
Line, REGEXP_REPLACE(LINE,'[?*]','')--([^\+]+)
FROM
TABLE
WHERE
INSTR(LINE,'?*') != 0;
where
REGEXP_REPLACE(LINE,'\?*','') replaces the ? alone and leaves the * untouched.
REGEXP_REPLACE(LINE,'?*','') replaces nothing.
REGEXP_REPLACE(LINE,'[?*]','') replaces all ?s and all *s. I am only replacing when ? and * comes together as ?*.
If you need the remove the string '?*', you can use replace:
SQL> with test(string) as (
2 select 'aa?*b?' from dual union all
3 select 'a*a?*??b?' from dual union all
4 select 'a*a??b???c*?**cc' from dual union all
5 select 'aa?b?*?cc?d??*?*?' from dual
6 )
7 select string, replace(string, '?*', '') as result
8 from test;
STRING RESULT
----------------- ---------------
aa?*b? aab?
a*a?*??b? a*a??b?
a*a??b???c*?**cc a*a??b???c**cc
aa?b?*?cc?d??*?*? aa?b?cc?d??
Use (\?\*) as pattern :
with tab(line) as
(
select 'abc?*ghh*?g?l*' from dual union all
select '?*U?KJ*H' from dual union all
select '*R5?4*&t?*frg?*' from dual
)
select regexp_replace(line,'(\?\*)') as "Result String"
from tab;
Result String
-------------
abcghh*?g?l*
U?KJ*H
*R5?4*&tfrg
Demo

Retrieve certain number from data set in Oracle 10g

1. <0,0><120.96,2000><241.92,4000><362.88,INF>
2. <0,0><143.64,2000><241.92,4000><362.88,INF>
3. <0,0><125.5,2000><241.92,4000><362.88,INF>
4. <0,0><127.5,2000><241.92,4000><362.88,INF>
Above is the data set I have in Oracle 10g. I need output as below
1. 120.96
2. 143.64
3. 125.5
4. 125.5
the output I want is only before "comma" (120.96). I tried using REGEXP_SUBSTR but I could not get any output. It will be really helpful if someone could provide effective way to solve this
Here is one method that first parses out the second element and then gets the first number in it:
select regexp_substr(regexp_substr(x, '<[^>]*>', 1, 2), '[0-9.]+', 1, 1)
Another method just gets the third number in the string:
select regexp_substr(x, '[0-9.]+', 1, 3)
Here is an approach without using Regexp.
Find the index of second occurrence of '<'. Then find the second occurrence of ',' use those values in substring.
with
data as
(
select '<0,0><120.96,2000><241.92,4000><362.88,INF>' x from dual
UNION ALL
select '<0,0><143.64,2000><241.92,4000><362.88,INF>' x from dual
UNION ALL
select '<0,0><125.5,2000><241.92,4000><362.88,INF>' from dual
)
select substr(x, instr(x,'<',1,2)+1, instr(x,',',1,2)- instr(x,'<',1,2)-1)
from data
Approach Using Regexp:
Identify the 2nd occurence of numerical value followed by a comma
Then remove the trailing comma.
with
data as
(
select '<0,0><120.96,2000><241.92,4000><362.88,INF>' x from dual
UNION ALL
select '<0,0><143.64,2000><241.92,4000><362.88,INF>' x from dual
UNION ALL
select '<0,0><125.5,2000><241.92,4000><362.88,INF>' from dual
)
select
trim(TRAILING ',' FROM regexp_substr(x,'[0-9.]+,',1,2))
from data
This example uses regexp_substr to get the string contained within the 2nd occurance of a less than sign and a comma:
SQL> with tbl(id, str) as (
select 1, '<0,0><120.96,2000><241.92,4000><362.88,INF>' from dual union
select 2, '<0,0><143.64,2000><241.92,4000><362.88,INF>' from dual union
select 3, '<0,0><125.5,2000><241.92,4000><362.88,INF>' from dual union
select 4, '<0,0><127.5,2000><241.92,4000><362.88,INF>' from dual
)
select id,
regexp_substr(str, '<(.*?),', 1, 2, null, 1) value
from tbl;
ID VALUE
---------- -------------------------------------------
1 120.96
2 143.64
3 125.5
4 127.5
EDIT: I realized the OP specified 10g and the regexp_substr example I gave used the 6th argument (subgroup) which was added in 11g. Here is an example using regexp_replace instead which should work with 10g:
SQL> with tbl(id, str) as (
select 1, '<0,0><120.96,2000><241.92,4000><362.88,INF>' from dual union
select 2, '<0,0><143.64,2000><241.92,4000><362.88,INF>' from dual union
select 3, '<0,0><125.5,2000><241.92,4000><362.88,INF>' from dual union
select 4, '<0,0><127.5,2000><241.92,4000><362.88,INF>' from dual
)
select id,
regexp_replace(str, '^(.*?)><(.*?),.*$', '\2') value
from tbl;
ID VALUE
---------- ----------
1 120.96
2 143.64
3 125.5
4 127.5
SQL>

Fetching value from Pipe-delimited String using Regex (Oracle)

I have a sample source string like below, which was in pipe delimited format in that the value obr can be at anywhere. I need to get the second value of the pipe from the first occurrence of obr. So for the below source strings the expected would be,
Source string:
select 'asd|dfg|obr|1|value1|end' text from dual
union all
select 'a|brx|123|obr|2|value2|end' from dual
union all
select 'hfv|obr|3|value3|345|pre|end' from dual
Expected output:
value1
value2
value3
I have tried the below regexp in oracle sql, but it is not working fine properly.
with t as (
select 'asd|dfg|obr|1|value1|end' text from dual
union all
select 'a|brx|123|obr|2|value2|end' from dual
union all
select 'hfv|obr|3|value3|345|pre|end' from dual
)
select text,to_char(regexp_replace(text,'*obr\|([^|]*\|)([^|]*).*$', '\2')) output from t;
It is working fine when the string starts with OBR, but when OBR is in the middle like the above samples it is not working fine.
Any help would be appreciated.
Not sure of how Oracle handles regular expressions, but starting with an asterisk usually implies that you're looking for zero or more null characters.
Have you tried '^.*obr\|([^|]*\|)([^|]*).*$' ?
This handles null elements and is wrapped in a NVL() call which supplies a value if 'obr' is not found or occurs too far toward the end of a record so a value 2 away is not possible:
SQL> with t(id, text) as (
select 1, 'asd|dfg|obr|1|value1|end' from dual
union
select 2, 'a|brx|123|obr|2|value2|end' from dual
union
select 3, 'hfv|obr|3|value3|345|pre|end' from dual
union
select 4, 'hfv|obr||value4|345|pre|end' from dual
union
select 5, 'a|brx|123|obriem|2|value5|end' from dual
union
select 6, 'a|brx|123|obriem|2|value6|obr' from dual
)
select
id,
nvl(regexp_substr(text, '\|obr\|[^|]*\|([^|]*)(\||$)', 1, 1, null, 1), 'value not found') value
from t;
ID VALUE
---------- -----------------------------
1 value1
2 value2
3 value3
4 value4
5 value not found
6 value not found
6 rows selected.
SQL>
The regex basically can be read as "look for a pattern of a pipe, followed by 'obr', followed by a pipe, followed by zero or more characters that are not a pipe, followed by a pipe, followed by zero or more characters that are not a pipe (remembered in a captured group), followed by a pipe or the end of the line". The regexp_substr() call then returns the 1st captured group which is the set of characters between the pipes 2 fields from the 'obr'.