Oracle regexp_substr - Find and extract all occurances in a column - sql

I'm working with an Oracle DB and I'm trying to find and extract ALL occurrences in a string matching a specific pattern...
It's supposed to be 3 letters, 3 numbers and then maybe a letter or not
I tried this:
SELECT REGEXP_SUBSTR(my_column, '[A-Za-z]{3}(\d)(\d)(\d)') AS values
FROM my_table
but it only returns the first occurrence.
Using
REGEXP_SUBSTR(my_column, '[A-Za-z]{3}(\d)(\d)(\d)', 0, 0, 'i')
doesn't work either
Does anybody have any ideas?
Edit:
I'm trying to extract it from PLSQL files. So its pretty much like SQL queries like
select *
from abc123
where some_value = 'some_value'

Try this query to break ABC123CDE456FGHI789 squence
with mine as (select 'ABC123CDE456FGH789' hello from dual)
select regexp_substr(hello, '[A-Za-z]{3}(\d){3}', 1, level) STR from mine
connect by regexp_substr(hello, '[A-Za-z]{3}(\d){3}', 1, level) is not null
Output
ABC123
CDE456
GHI789
For get specific postion then you want to use
select regexp_substr('ABC123CDE456FGH789', '[A-Za-z]{3}(\d){3}', 1, i) STR from dual
change i value as per position like
select regexp_substr('ABC123CDE456FGH789', '[A-Za-z]{3}(\d){3}', 1, 1) STR from dual
Output :
ABC123

Try to get number in 1.2.3 (suppose it is a domain of a country)
SELECT str,level,REGEXP_SUBSTR(str, '[0-9]', 1, LEVEL) AS substr
FROM (
SELECT country_domain str from country where regexp_like(country_domain, '[0-9]')
)
CONNECT BY LEVEL <= REGEXP_count(str, '[0-9]');
Output
STR LEVEL SUBSTR
1.2.3 1 1
1.2.3 2 2
1.2.3 3 3

Related

Oracle replace all characters before "." dot

I need to replace all characters with nothing before the . character and also replace all [ and ] with nothing.
Please see examples below:
from
to
[PINWHEEL_ASSET].[MX5530]
MX5530
[PINWHEEL_TRADE].[AR5403]
AR5403
The parts before and after the . dot are variables.
with
sample_data (my_string) as (
select '[PINWHEEL_ASSET].[MX5530]' from dual
)
select rtrim(substr(my_string, instr(my_string, '.') + 2), ']') as second_part
from sample_data
;
SECOND_PART
-----------
MX5530
This assumes that the input string looks exactly like this: [first].[second], where "first" and "second" are (possibly empty) strings that do not contain periods or closing brackets.
Yet another option is to use regular expressions (see line #6).
Sample data:
SQL> with test (col) as
2 (select '[PINWHEEL_ASSET].[MX5530]' from dual union all
3 select '[PINWHEEL_TRADE].[AR5403]' from dual
4 )
Query begins here:
5 select col,
6 regexp_substr(col, '\w+', 1, 2) result
7 from test;
COL RESULT
------------------------- --------------------
[PINWHEEL_ASSET].[MX5530] MX5530
[PINWHEEL_TRADE].[AR5403] AR5403
SQL>

shorten the string to the desired values

I have a table with a "Link" attribute. It has the following meaning:
INC102
INC1020
INC10200
I want to get the following result:
INC102
INC1020
INC10200
I need to leave the INC and the numbers after it without .
Tell me which command will help here? Since I understand that "Substr" will not work here.
I am use SQL Developer - Oracle
If you just want to extract "INC" with the following digits, use regexp_substr():
select regexp_substr(link, 'INC[0-9]+')
Here is a db<>fiddle.
Since I understand that "Substr" will not work here.
Says who?
SQL> with test (col) as
2 (select 'INC102' from dual union all
3 select 'INC1020' from dual union all
4 select 'INC10200' from dual
5 )
6 select
7 substr(col,
8 instr(col, '>') + 1,
9 instr(col, '<', instr(col, '>')) - instr(col, '>') - 1
10 ) result
11 from test;
RESULT
------------------------------------------------------------------------------------------------------------------------
INC102
INC1020
INC10200
SQL>
What does it do?
lines #1 - 5: sample data
line #8: starting point of the SUBSTR function is one character after the first > sign
line #9: length (used as the 3rd parameter of the SUBSTR) is position of the first < that follows the first > minus position of the first >
And that's it ... why wouldn't it work?
You could treat [<>] as the word delimiter and take the second word:
with test (col) as
( select 'INC102' from dual union all
select 'INC1020' from dual union all
select 'INC10200' from dual
)
select regexp_substr(col,'[^<>]+', 1, 2)
from test;
REGEXP_SUBSTR(COL,'[^<>]+',1,2)
-------------------------------
INC102
INC1020
INC10200

USING SQL . extract numbers comma separated from string 'HEADER|N1000|E1001|N1002|E1003|N1004|N1005'

'HEADER|N1000|E1001|N1002|E1003|N1004|N1005'
'HEADER|N156|E1|N7|E122|N4|E5'
'HEADER|E0|E1|E2|E3|E4|E5'
'HEADER|N0|N1|N2|N3|N4|N5'
'HEADER|N125'
How to extract the numbers in comma-separated format from this stringS?
Expected result:
1000,1001,1002,1003,1004,1005
How to extract the numbers with N or E as suffix/prefix ie.
N1000
Expected result:
1000,1002,1004,1005
below regex does not return the result needed. But I want some thing like this
select REGEXP_REPLACE(REGEXP_REPLACE('HEADER|N1000|E1001|N1002|E1003|N1004|N1005', '.*?(\d+)', '\1,'), ',?\.*$', '') from dual
the problem here is
when i want numbers with E OR N
select REGEXP_REPLACE(REGEXP_REPLACE('HEADER|N1000|E1001|N1002|E1003|N1004|N1005', '.*?N(\d+)', '\1,'), ',?\.*$', '') from dual
select REGEXP_REPLACE(REGEXP_REPLACE('HEADER|N1000|E1001|N1002|E1003|N1004|N1005', '.*?E(\d+)', '\1,'), ',?\.*$', '') from dual
they give good results for this scenerio
but when i input 'HEADER|N1000|E1001' it gives wrong answer plzzz verify and correct it
Update
Based on the changes to the question, the original answer is not valid. Instead, the solution is considerably more complex, using a hierarchical query to extract all the numbers from the string and then LISTAGG to put back together a list of numbers extracted from each string. To extract all numbers we use this query:
WITH cte AS (
SELECT DISTINCT data, level AS l, REGEXP_SUBSTR(data, '[NE]\d+', 1, level) AS num FROM test
CONNECT BY REGEXP_SUBSTR(data, '[NE]\d+', 1, level) IS NOT NULL
)
SELECT data, LISTAGG(SUBSTR(num, 2), ',') WITHIN GROUP (ORDER BY l) AS "All numbers"
FROM cte
GROUP BY data
Output (for the new sample data):
DATA All numbers
HEADER|E0|E1|E2|E3|E4|E5 0,1,2,3,4,5
HEADER|N0|N1|N2|N3|N4|N5 0,1,2,3,4,5
HEADER|N1000|E1001|N1002|E1003|N1004|N1005 1000,1001,1002,1003,1004,1005
HEADER|N125 125
HEADER|N156|E1|N7|E122|N4|E5 156,1,7,122,4,5
To select only numbers beginning with E, we modify the query to replace the [EN] in the REGEXP_SUBSTR expressions with just E i.e.
SELECT DISTINCT data, level AS l, REGEXP_SUBSTR(data, 'E\d+', 1, level) AS num FROM test
CONNECT BY REGEXP_SUBSTR(data, 'E\d+', 1, level) IS NOT NULL
Output:
DATA E-numbers
HEADER|E0|E1|E2|E3|E4|E5 0,1,2,3,4,5
HEADER|N0|N1|N2|N3|N4|N5
HEADER|N1000|E1001|N1002|E1003|N1004|N1005 1001,1003
HEADER|N125
HEADER|N156|E1|N7|E122|N4|E5 1,122,5
A similar change can be made to extract numbers commencing with N.
Demo on dbfiddle
Original Answer
One way to achieve your desired result is to replace a string of characters leading up to a number with that number and a comma, and then replace any characters from the last ,| to the end of string from the result:
SELECT REGEXP_REPLACE(REGEXP_REPLACE('HEADER|N1000|E1001|N1002|E1003|N1004|N1005|', '.*?(\d+)', '\1,'), ',?\|.*$', '') FROM dual
Output:
1000,1001,1002,1003,1004,1005
To only output the numbers beginning with N, we add that to the prefix string before the capture group:
SELECT REGEXP_REPLACE(REGEXP_REPLACE('HEADER|N1000|E1001|N1002|E1003|N1004|N1005|', '.*?N(\d+)', '\1,'), ',?\|.*$', '') FROM dual
Output:
1000,1002,1004,1005
To only output the numbers beginning with E, we add that to the prefix string before the capture group:
SELECT REGEXP_REPLACE(REGEXP_REPLACE('HEADER|N1000|E1001|N1002|E1003|N1004|N1005|', '.*?E(\d+)', '\1,'), ',?\|.*$', '') FROM dual
Output:
1001,1003
Demo on dbfiddle
I don't know what DBMS you are using, but here's one way to do it in Postgres:
WITH cte AS (
SELECT CAST('HEADER|N1000|E1001|N1002|E1003|N1004|N1005|' AS VARCHAR(1000)) AS myValue
)
SELECT SUBSTRING(MyVal FROM 2)
FROM (
SELECT REGEXP_SPLIT_TO_TABLE(myValue,'\|') MyVal
FROM cte
) src
WHERE SUBSTRING(MyVal FROM 1 FOR 1) = 'N'
;
SQL Fiddle
As Far as I have understood the question , you want to extract substrings starting with N from the string, You can try following (And then you can merge the output seperated by commas if needed)
select REPLACE(value, 'N', '')
from STRING_SPLIT('HEADER|N1000|E1001|N1002|E1003|N1004|N1005|', '|')
where value like 'N%'
OutPut :
1000
1002
1004
1005

making comma separated string in Oracle

How can we convert 12345 into 1,2,3,4,5 .
I can do the reverse by using replace command and i can replace comma by null. But I am not able to do the above. Could you please help on the same.
Thanks in Advance
You can use regexp_replace():
rtrim(regexp_replace('12345', '([0-9])', '\1,'), ',')
The rtrim() is necessary because the last digit is also replaced.
Online example: https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=724adecda03305b281ad3bf0f380ca58
If you have a fixed template with consecutive, one-digit numbers like 1234 or 12345 or 123456789
you may try the following by using listagg function of Oracle :
with t as
(
select '12345' as col from dual
)
select listagg(level,',') within group (order by level) as str
from t
connect by level <= length(col);
STR
---------
1,2,3,4,5
SQL Fiddle Demo 1
OR
For more generalized solution, use the following :
with t as
(
select 'abcde' as col from dual
)
select listagg(substr(col,level,1),',') within group (order by level) as str
from t
connect by level <= length(col);
STR
---------
a,b,c,d,e
SQL Fiddle Demo 2

How to replace more than one character in oracle?

How to replace multiple whole characters, except those in combinations...?
The below code replaces multiple characters, but it also disturbing those in combinations.
SELECT regexp_replace('a,ca,va,ea,r,y,q,b,g','(a|y|q|g)','X') RESULT FROM dual;
Current output:
RESULT
--------------------
X,cX,vX,eX,r,X,X,b,X
Expected output:
RESULT
------------------------
'X,ca,va,ea,r,X,X,b,X
I just want to replace only separate whole characters('a','y','q','g'), but not the 1 in combinations('ca','va','ea')...
Because you are delimiting with a comma ',' you can combine that like ',a,'
and this will replace only single a's.
you can try follows:
with t as
(
select 'a,ca,va,ea,r,y,q,b,g' str
from dual
)
select substr(sys_connect_by_path(regexp_replace(regexp_substr(str, '[^,]+', 1, level), '^(a|y|q|g)$', 'X'), ','), 2) as str
from t
where connect_by_isleaf = 1
connect by level <= length(regexp_replace(str, '[^,]*')) + 1;
Sadly oracle doesn´t support lookahead and lookbehind. But this is a solution i came up with.
SELECT regexp_replace
(regexp_replace
('a,ca,va,ea,r,y,q,b,g',
'^[ayqg](,)|(,)[ayqg](,)|(,)[ayqg]$',
'\2\4X\1\3'),'(,)[ayqg](,)','\1X\2')
RESULT FROM dual;
I had to use the regexp twice sadly, since it doesn´t find two similar values following after each other and replacing it. ..,a,y,.. is getting replaced as ..,X,y,... So the second call replaces the missing [ayqg] with the exact values. In the first inner regexp call replaces the first and last values.
Maybe this could be simplified into one expression, but i am not that conform with the regex from oracle.
As a explanation i am grouping the commata and basicly replace every ,[ayqg], with ,X, by backreferencing the commata
You would look for word boundaries, which is \b, and which is unfortunately not supported by Oracle's regexp_replace.
So let's look for a non-word character \W or the beginning ^ or ending $ of the text.
select
regexp_replace('a,ca,va,ea,r,y,q,b,g','(^|$|\W)(a|y|q|g)(^|$|\W)','\1X\3') as result
from dual;
In order to not remove the non-word characters, we must have them in the replace string: \1 for the expression in the first parenteses, \3 for the ones in the third. Thus we only change the expression in the second parentheses, which is a, y, q or g, with X.
Unfortunately above gives
X,ca,va,ea,r,X,q,b,X
The q was not replaced, because we recognize ',y,' thus being positioned a 'g,' whereas we'd need to be positioned at ',g,' to recognize g as a word, too.
So we need to replace in iterations (i.e. recursively):
with results(txt, num) as
(
select 'a,ca,va,ea,r,y,q,b,g' as txt, 0 as num from dual
union all
select regexp_replace(txt, '(^|$|\W)(a|y|q|g)(^|$|\W)','\1X\3'), num + 1 as num
from results
where txt <> regexp_replace(txt, '(^|$|\W)(a|y|q|g)(^|$|\W)','\1X\3')
)
select max(txt) keep (dense_rank last order by num) as result
from results;
EDIT: Kevin Esche is right; of course one has to do it only twice. Hence you can also do:
select
regexp_replace(txt, search_str, replace_str) as result
from
(
select
regexp_replace(txt, search_str, replace_str) as txt, search_str, replace_str
from
(
select
'a,ca,va,ea,r,y,q,y,q,b,g' as txt,
'(^|$|\W)(a|y|q|g)(^|$|\W)' as search_str,
'\1X\3' as replace_str
from dual
)
);
with replaced_values as (
SELECT case when length(val)=1 then regexp_replace(val,'(a|y|q|g)','X') else val end new_val, lvl
from (
SELECT regexp_substr('a,ca,va,ea,r,y,q,b,g','[^,]+', 1, LEVEL) val, level lvl FROM dual
connect by regexp_substr('a,ca,va,ea,r,y,q,b,g','[^,]+',1, LEVEL) is not null
) all_values
)
select lISTAGG(new_val, ',') WITHIN GROUP (ORDER BY lvl) RESULT
from replaced_values
This statement pivots data into rows and replaces only lines wich contains one character.
Data are then unpivoted in one rows
This sql works also with empty entries like 'a,,,b,c' and more complex regular expressions:
with t as
(select ',a,,ca,va,ea,bbb,ba,r,y,q,b,g,,,' as str,
',' as delimiter,
'(a|y|q|g|ea|[b]*)' as regexp_expr,
'X' as replace_expr
from dual)
(select substr (sys_connect_by_path(regexp_replace(substr(str,
decode(level - 1, 0, 0, instr(str, ',', 1, level - 1)) + 1,
decode(instr(str, ',', 1, level),
0,
length(str),
instr(str, ',', 1, level) - 1) -
decode(level - 1, 0, 0, instr(str, ',', 1, level - 1))),
'^' || regexp_expr || '$',
replace_expr), ','), 2)
from t
where connect_by_isleaf = 1
connect by level <= length(regexp_replace(str, '[^'|| delimiter||']')) + 1)
Result
,X,,ca,va,X,X,ba,r,X,X,X,X,,,
Don't Know much Oracle, but I would have thought something like this could work. Assuming the delimiter is always a comma.
SELECT
regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace('a,ca,va,ea,r,y,q,b,g','(,a,|,y,|,q,|,g,)',',X,') ,'(,a,|,y,|,q,|,g,)',',X,'), '(^a,|^y,|^q,|^g,)','X,'), '(,a$|,y$|,q$|,g$)',',X'), '(^a$|^y$|^q$|^g$)','X')
RESULT FROM test;
The first two parts replaces a single character in commas in the middle, the third part gets those at the start of the string, the fourth is for the end of the string and the fifth is for when then string has just one character.
This answer might will be simplifiable by advanced Regexp use.
How i can replace words?
RS & OS ===> D, LS & IS ==== >
SECTION_ID Output required
1-LS-1991 1-P-1991
1-IS-1991 1-P-1991
1-RS-1991 1- D- 1991
1-OS-1991 1-D-1991