'HEADER|N1000|E1001|N1002|E1003|N1004|N1005'
'HEADER|N156|E1|N7|E122|N4|E5'
'HEADER|E0|E1|E2|E3|E4|E5'
'HEADER|N0|N1|N2|N3|N4|N5'
'HEADER|N125'
How to extract the numbers in comma-separated format from this stringS?
Expected result:
1000,1001,1002,1003,1004,1005
How to extract the numbers with N or E as suffix/prefix ie.
N1000
Expected result:
1000,1002,1004,1005
below regex does not return the result needed. But I want some thing like this
select REGEXP_REPLACE(REGEXP_REPLACE('HEADER|N1000|E1001|N1002|E1003|N1004|N1005', '.*?(\d+)', '\1,'), ',?\.*$', '') from dual
the problem here is
when i want numbers with E OR N
select REGEXP_REPLACE(REGEXP_REPLACE('HEADER|N1000|E1001|N1002|E1003|N1004|N1005', '.*?N(\d+)', '\1,'), ',?\.*$', '') from dual
select REGEXP_REPLACE(REGEXP_REPLACE('HEADER|N1000|E1001|N1002|E1003|N1004|N1005', '.*?E(\d+)', '\1,'), ',?\.*$', '') from dual
they give good results for this scenerio
but when i input 'HEADER|N1000|E1001' it gives wrong answer plzzz verify and correct it
Update
Based on the changes to the question, the original answer is not valid. Instead, the solution is considerably more complex, using a hierarchical query to extract all the numbers from the string and then LISTAGG to put back together a list of numbers extracted from each string. To extract all numbers we use this query:
WITH cte AS (
SELECT DISTINCT data, level AS l, REGEXP_SUBSTR(data, '[NE]\d+', 1, level) AS num FROM test
CONNECT BY REGEXP_SUBSTR(data, '[NE]\d+', 1, level) IS NOT NULL
)
SELECT data, LISTAGG(SUBSTR(num, 2), ',') WITHIN GROUP (ORDER BY l) AS "All numbers"
FROM cte
GROUP BY data
Output (for the new sample data):
DATA All numbers
HEADER|E0|E1|E2|E3|E4|E5 0,1,2,3,4,5
HEADER|N0|N1|N2|N3|N4|N5 0,1,2,3,4,5
HEADER|N1000|E1001|N1002|E1003|N1004|N1005 1000,1001,1002,1003,1004,1005
HEADER|N125 125
HEADER|N156|E1|N7|E122|N4|E5 156,1,7,122,4,5
To select only numbers beginning with E, we modify the query to replace the [EN] in the REGEXP_SUBSTR expressions with just E i.e.
SELECT DISTINCT data, level AS l, REGEXP_SUBSTR(data, 'E\d+', 1, level) AS num FROM test
CONNECT BY REGEXP_SUBSTR(data, 'E\d+', 1, level) IS NOT NULL
Output:
DATA E-numbers
HEADER|E0|E1|E2|E3|E4|E5 0,1,2,3,4,5
HEADER|N0|N1|N2|N3|N4|N5
HEADER|N1000|E1001|N1002|E1003|N1004|N1005 1001,1003
HEADER|N125
HEADER|N156|E1|N7|E122|N4|E5 1,122,5
A similar change can be made to extract numbers commencing with N.
Demo on dbfiddle
Original Answer
One way to achieve your desired result is to replace a string of characters leading up to a number with that number and a comma, and then replace any characters from the last ,| to the end of string from the result:
SELECT REGEXP_REPLACE(REGEXP_REPLACE('HEADER|N1000|E1001|N1002|E1003|N1004|N1005|', '.*?(\d+)', '\1,'), ',?\|.*$', '') FROM dual
Output:
1000,1001,1002,1003,1004,1005
To only output the numbers beginning with N, we add that to the prefix string before the capture group:
SELECT REGEXP_REPLACE(REGEXP_REPLACE('HEADER|N1000|E1001|N1002|E1003|N1004|N1005|', '.*?N(\d+)', '\1,'), ',?\|.*$', '') FROM dual
Output:
1000,1002,1004,1005
To only output the numbers beginning with E, we add that to the prefix string before the capture group:
SELECT REGEXP_REPLACE(REGEXP_REPLACE('HEADER|N1000|E1001|N1002|E1003|N1004|N1005|', '.*?E(\d+)', '\1,'), ',?\|.*$', '') FROM dual
Output:
1001,1003
Demo on dbfiddle
I don't know what DBMS you are using, but here's one way to do it in Postgres:
WITH cte AS (
SELECT CAST('HEADER|N1000|E1001|N1002|E1003|N1004|N1005|' AS VARCHAR(1000)) AS myValue
)
SELECT SUBSTRING(MyVal FROM 2)
FROM (
SELECT REGEXP_SPLIT_TO_TABLE(myValue,'\|') MyVal
FROM cte
) src
WHERE SUBSTRING(MyVal FROM 1 FOR 1) = 'N'
;
SQL Fiddle
As Far as I have understood the question , you want to extract substrings starting with N from the string, You can try following (And then you can merge the output seperated by commas if needed)
select REPLACE(value, 'N', '')
from STRING_SPLIT('HEADER|N1000|E1001|N1002|E1003|N1004|N1005|', '|')
where value like 'N%'
OutPut :
1000
1002
1004
1005
How to replace multiple whole characters, except those in combinations...?
The below code replaces multiple characters, but it also disturbing those in combinations.
SELECT regexp_replace('a,ca,va,ea,r,y,q,b,g','(a|y|q|g)','X') RESULT FROM dual;
Current output:
RESULT
--------------------
X,cX,vX,eX,r,X,X,b,X
Expected output:
RESULT
------------------------
'X,ca,va,ea,r,X,X,b,X
I just want to replace only separate whole characters('a','y','q','g'), but not the 1 in combinations('ca','va','ea')...
Because you are delimiting with a comma ',' you can combine that like ',a,'
and this will replace only single a's.
you can try follows:
with t as
(
select 'a,ca,va,ea,r,y,q,b,g' str
from dual
)
select substr(sys_connect_by_path(regexp_replace(regexp_substr(str, '[^,]+', 1, level), '^(a|y|q|g)$', 'X'), ','), 2) as str
from t
where connect_by_isleaf = 1
connect by level <= length(regexp_replace(str, '[^,]*')) + 1;
Sadly oracle doesn´t support lookahead and lookbehind. But this is a solution i came up with.
SELECT regexp_replace
(regexp_replace
('a,ca,va,ea,r,y,q,b,g',
'^[ayqg](,)|(,)[ayqg](,)|(,)[ayqg]$',
'\2\4X\1\3'),'(,)[ayqg](,)','\1X\2')
RESULT FROM dual;
I had to use the regexp twice sadly, since it doesn´t find two similar values following after each other and replacing it. ..,a,y,.. is getting replaced as ..,X,y,... So the second call replaces the missing [ayqg] with the exact values. In the first inner regexp call replaces the first and last values.
Maybe this could be simplified into one expression, but i am not that conform with the regex from oracle.
As a explanation i am grouping the commata and basicly replace every ,[ayqg], with ,X, by backreferencing the commata
You would look for word boundaries, which is \b, and which is unfortunately not supported by Oracle's regexp_replace.
So let's look for a non-word character \W or the beginning ^ or ending $ of the text.
select
regexp_replace('a,ca,va,ea,r,y,q,b,g','(^|$|\W)(a|y|q|g)(^|$|\W)','\1X\3') as result
from dual;
In order to not remove the non-word characters, we must have them in the replace string: \1 for the expression in the first parenteses, \3 for the ones in the third. Thus we only change the expression in the second parentheses, which is a, y, q or g, with X.
Unfortunately above gives
X,ca,va,ea,r,X,q,b,X
The q was not replaced, because we recognize ',y,' thus being positioned a 'g,' whereas we'd need to be positioned at ',g,' to recognize g as a word, too.
So we need to replace in iterations (i.e. recursively):
with results(txt, num) as
(
select 'a,ca,va,ea,r,y,q,b,g' as txt, 0 as num from dual
union all
select regexp_replace(txt, '(^|$|\W)(a|y|q|g)(^|$|\W)','\1X\3'), num + 1 as num
from results
where txt <> regexp_replace(txt, '(^|$|\W)(a|y|q|g)(^|$|\W)','\1X\3')
)
select max(txt) keep (dense_rank last order by num) as result
from results;
EDIT: Kevin Esche is right; of course one has to do it only twice. Hence you can also do:
select
regexp_replace(txt, search_str, replace_str) as result
from
(
select
regexp_replace(txt, search_str, replace_str) as txt, search_str, replace_str
from
(
select
'a,ca,va,ea,r,y,q,y,q,b,g' as txt,
'(^|$|\W)(a|y|q|g)(^|$|\W)' as search_str,
'\1X\3' as replace_str
from dual
)
);
with replaced_values as (
SELECT case when length(val)=1 then regexp_replace(val,'(a|y|q|g)','X') else val end new_val, lvl
from (
SELECT regexp_substr('a,ca,va,ea,r,y,q,b,g','[^,]+', 1, LEVEL) val, level lvl FROM dual
connect by regexp_substr('a,ca,va,ea,r,y,q,b,g','[^,]+',1, LEVEL) is not null
) all_values
)
select lISTAGG(new_val, ',') WITHIN GROUP (ORDER BY lvl) RESULT
from replaced_values
This statement pivots data into rows and replaces only lines wich contains one character.
Data are then unpivoted in one rows
This sql works also with empty entries like 'a,,,b,c' and more complex regular expressions:
with t as
(select ',a,,ca,va,ea,bbb,ba,r,y,q,b,g,,,' as str,
',' as delimiter,
'(a|y|q|g|ea|[b]*)' as regexp_expr,
'X' as replace_expr
from dual)
(select substr (sys_connect_by_path(regexp_replace(substr(str,
decode(level - 1, 0, 0, instr(str, ',', 1, level - 1)) + 1,
decode(instr(str, ',', 1, level),
0,
length(str),
instr(str, ',', 1, level) - 1) -
decode(level - 1, 0, 0, instr(str, ',', 1, level - 1))),
'^' || regexp_expr || '$',
replace_expr), ','), 2)
from t
where connect_by_isleaf = 1
connect by level <= length(regexp_replace(str, '[^'|| delimiter||']')) + 1)
Result
,X,,ca,va,X,X,ba,r,X,X,X,X,,,
Don't Know much Oracle, but I would have thought something like this could work. Assuming the delimiter is always a comma.
SELECT
regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace('a,ca,va,ea,r,y,q,b,g','(,a,|,y,|,q,|,g,)',',X,') ,'(,a,|,y,|,q,|,g,)',',X,'), '(^a,|^y,|^q,|^g,)','X,'), '(,a$|,y$|,q$|,g$)',',X'), '(^a$|^y$|^q$|^g$)','X')
RESULT FROM test;
The first two parts replaces a single character in commas in the middle, the third part gets those at the start of the string, the fourth is for the end of the string and the fifth is for when then string has just one character.
This answer might will be simplifiable by advanced Regexp use.
How i can replace words?
RS & OS ===> D, LS & IS ==== >
SECTION_ID Output required
1-LS-1991 1-P-1991
1-IS-1991 1-P-1991
1-RS-1991 1- D- 1991
1-OS-1991 1-D-1991