Extract words from a comma separated string in oracle - sql

Suppose I have the string
Str = 'Aaa,Bbb,Abb,Ccc'
I want to split the above str into two parts as follows:
Str1 = 'Aaa,Abb'
Str2 = 'Bbb,Ccc'
That is, any word in str starting with A should go in str1, and all the rest in str2.
How can I achieve this using Oracle queries?

"That is, any word in str starting with A should go in str1, and all the rest in str2."
To achieve it in pure SQL, I will use the following:
REGEXP_SUBSTR
LISTAGG
SUBSTR
INLINE VIEW
So, first I will split the comma-delimited string using the techniques demonstrated in "Split single comma delimited string into rows".
Then I will aggregate the pieces back together using LISTAGG, in a defined order.
For example,
WITH
  t1 AS (
    SELECT 'Aaa,Bbb,Abb,Ccc' str FROM dual
  ),
  t2 AS (
    -- one row per comma-separated token
    SELECT TRIM(REGEXP_SUBSTR(str, '[^,]+', 1, LEVEL)) str
    FROM   t1
    CONNECT BY LEVEL <= REGEXP_COUNT(str, ',') + 1
  )
SELECT
  -- order inside LISTAGG itself; an ORDER BY in the subquery is not guaranteed to carry over
  (SELECT LISTAGG(str, ',') WITHIN GROUP (ORDER BY str)
   FROM   t2
   WHERE  SUBSTR(str, 1, 1) = 'A'
  ) str1,
  (SELECT LISTAGG(str, ',') WITHIN GROUP (ORDER BY str)
   FROM   t2
   WHERE  SUBSTR(str, 1, 1) <> 'A'
  ) str2
FROM dual;

STR1       STR2
---------- ----------
Aaa,Abb    Bbb,Ccc
The first WITH subquery (t1) is only there to supply demonstration data. In your real scenario, remove it and use your table name directly. The remaining WITH clause is optional, although it keeps the query neat.
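When the strings come from a table with several rows, the CONNECT BY split needs extra conditions so that each row is expanded on its own rather than combined with every other row. The following is a minimal sketch, assuming a hypothetical table my_table(id, str) with a unique id, simulated here with a WITH clause. It also folds the two scalar subqueries into conditional LISTAGG calls.

```sql
-- my_table, id and pos are hypothetical names used only for illustration
WITH my_table (id, str) AS (
  SELECT 1, 'Aaa,Bbb,Abb,Ccc' FROM dual UNION ALL
  SELECT 2, 'Add,Ddd'         FROM dual
),
tokens AS (
  SELECT id,
         LEVEL AS pos,                                   -- position of the token in its string
         TRIM(REGEXP_SUBSTR(str, '[^,]+', 1, LEVEL)) AS token
  FROM   my_table
  CONNECT BY LEVEL <= REGEXP_COUNT(str, ',') + 1
         AND PRIOR id = id                               -- stay within the same source row
         AND PRIOR SYS_GUID() IS NOT NULL                -- avoids the CONNECT BY loop error
)
SELECT id,
       -- LISTAGG ignores NULLs, so each CASE keeps only its own group
       LISTAGG(CASE WHEN SUBSTR(token, 1, 1) = 'A'  THEN token END, ',')
         WITHIN GROUP (ORDER BY pos) AS str1,
       LISTAGG(CASE WHEN SUBSTR(token, 1, 1) <> 'A' THEN token END, ',')
         WITHIN GROUP (ORDER BY pos) AS str2
FROM   tokens
GROUP  BY id
ORDER  BY id;
```

For the two sample rows this should return Aaa,Abb and Bbb,Ccc for id 1, and Add and Ddd for id 2. Ordering by pos keeps the tokens in their original order; order by token instead if you want them sorted alphabetically.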

Use a regular expression and the LISTAGG function.
NOTE: the LISTAGG function is available from Oracle 11g Release 2 onwards.
select listagg(s.name, ',') within group (order by name)
from (select regexp_substr('Aaa,Bbb,Abb,Ccc,Add,Ddd','[^,]+', 1, level) name from dual
connect by regexp_substr('Aaa,Bbb,Abb,Ccc,Add,Ddd', '[^,]+', 1, level) is not null) s
group by decode(substr(name,1,1),'A', 1, 0);

This query gives you the desired output as two columns of a single row:
with temp as (select trim (both ',' from 'Aaa,Bbb,Abb,Ccc') as str from dual),
base_table as
  (select trim (regexp_substr (t.str, '[^' || ',' || ']+', 1, level)) str
   from temp t
   connect by instr (str, ',', 1, level - 1) > 0),
ult_table as
  (select str,
          case upper (substr (str, 1, 1)) when 'A' then 1 else 2 end as l
   from base_table)
select listagg (case when l = 1 then str else null end, ',')
         within group (order by str) str1,
       listagg (case when l = 2 then str else null end, ',')
         within group (order by str) str2
from ult_table;
Output
STR1       STR2
---------- ----------
Aaa,Abb    Bbb,Ccc

Related

How to count each item in Oracle?

Help, this is my query
SELECT distinct id, trim(regexp_substr(str, '[^,]+', 1, level)) str
FROM (SELECT id, REGEXP_REPLACE(to_char(advantages1), '"|\[|\]', '') str
FROM feedbacks) t
CONNECT BY instr(str, ',', 1, level - 1) > 0
order by str
and here's the result:
[image: query result]
So, I want to add another column that counts how many times each number in str occurs. For example:
STR TOTAL
110 1
111 2
112 2
113 4
114 1
How can I do that?
If I follow you correctly, you can use aggregation:
select trim(regexp_substr(str, '[^,]+', 1, level)) str , count(distinct id) total
from (select id, regexp_replace(to_char(advantages1), '"|\[|\]', '') str from feedbacks) t
connect by instr(str, ',', 1, level - 1) > 0
group by trim(regexp_substr(str, '[^,]+', 1, level))
order by str
If you just want to add a column to the current SELECT list, you'd need a COUNT() OVER() analytic function, but the values of this column would be repeated for each str value.
Judging by the expected result set, you want to count the distinct rows returned by the current query. The straightforward method is to nest that query:
SELECT str, COUNT(*) AS total
FROM ( <your_current_query_without_order_by_part> )
GROUP BY str
ORDER BY TO_NUMBER(str)

REGEXP_REPLACE back-reference with function call

Can I use some function call on REGEXP_REPLACE back-reference value?
For example I want to call chr() or any other function on back-reference value, but this
SELECT REGEXP_REPLACE('a 98 c 100', '(\d+)', ASCII('\1')) FROM dual;
just returns ASCII value of '\':
'a 92 c 92'
I want the last parameter (the replacement string) to be evaluated for each match before the replacement is made, so the result would be:
'a b c d'
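The result seen in the question follows from the order of evaluation: ASCII('\1') is an ordinary function call on the two-character literal '\1', so it is evaluated once, before REGEXP_REPLACE sees any match. A small sketch of this (the column aliases are illustrative only):

```sql
SELECT ASCII('\1')                                 AS evaluated_arg, -- code of the first character, '\'
       REGEXP_REPLACE('a 98 c 100', '(\d+)', '92') AS same_result    -- what the original call reduces to
FROM   dual;
```

The first column should be 92 and the second 'a 92 c 92', matching the output reported above. REGEXP_REPLACE only substitutes back-references inside a replacement string; it cannot pass them to another function.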
Just for fun really, you could do the tokenization, conversion of numbers to characters, and aggregation using XPath:
select *
from xmltable(
'string-join(
for $t in tokenize($s, " ")
return if ($t castable as xs:integer) then codepoints-to-string(xs:integer($t)) else $t,
" ")'
passing 'a 98 c 100' as "s"
);
Result Sequence
--------------------------------------------------------------------------------
a b c d
The initial string value is passed in as $s; tokenize() splits that up using a space as the delimiter; each $t that generates is evaluated to see if it's an integer, and if it is then it's converted to the equivalent character via codepoints-to-string, otherwise it's left alone; then all the tokens are recombined with string-join().
If the original has runs of multiple spaces those will collapse to a single space (as they will with Littlefoot's regex).
I'm not smart enough to do it with a single regular expression, but a step-by-step approach like this might help. It splits the source string into rows, checks whether each part is a number and, if so, selects its CHR. Finally, everything is aggregated back into a single string.
with test (col) as
  (select 'a 98 c 100' from dual),
inter as
  (select level lvl,
          regexp_substr(col, '[^ ]+', 1, level) c_val
   from test
   connect by level <= regexp_count(col, ' ') + 1
  ),
inter_2 as
  (select lvl,
          case when regexp_like(c_val, '^\d+$') then chr(c_val)
               else c_val
          end c_val_2
   from inter
  )
select listagg(c_val_2, ' ') within group (order by lvl) result
from inter_2;

RESULT
--------------------
a b c d
It can be shortened by one step (I intentionally left the longer version as is, so that you can execute one query at a time and check the result, to make things clearer):
with test (col) as
  (select 'a 98 c 100' from dual),
inter as
  (select level lvl,
          case when regexp_like(regexp_substr(col, '[^ ]+', 1, level), '^\d+$')
               then chr(regexp_substr(col, '[^ ]+', 1, level))
               else regexp_substr(col, '[^ ]+', 1, level)
          end c_val
   from test
   connect by level <= regexp_count(col, ' ') + 1
  )
select listagg(c_val, ' ') within group (order by lvl) result
from inter;

RESULT
--------------------
a b c d
[EDIT: what if the input looks different?]
That is somewhat simpler. Using REGEXP_SUBSTR, extract digits: ..., 1, 1 returns the first one, ... 1, 2 the second one. Pure REPLACE then replaces numbers with their CHR values.
with test (col) as
  (select 'a98c100e' from dual)
select
  replace(replace(col, regexp_substr(col, '\d+', 1, 1), chr(regexp_substr(col, '\d+', 1, 1))),
          regexp_substr(col, '\d+', 1, 2), chr(regexp_substr(col, '\d+', 1, 2))) result
from test;

RESULT
--------------------
abcde

How to remove duplicates from space separated list by Oracle regexp_replace? [duplicate]

I have a list 'A B A A C D'. My expected result is 'A B C D'. So far, I have found this expression on the web:
regexp_replace(l_user ,'([^,]+)(,[ ]*\1)+', '\1');
But this is for a comma-separated list. What modification is needed to make it work for a space-separated list? The order does not need to be preserved.
If I understand correctly, you need not only to replace ',' with a space, but also to remove duplicates in a smarter way.
If I modify that expression to work with space instead of ',', I get
select regexp_replace('A B A A C D' ,'([^ ]+)( [ ]*\1)+', '\1') from dual
which gives 'A B A C D', not what you need.
A way to get your needed result could be the following, a bit more complicated:
with string(s) as ( select 'A B A A C D' from dual)
select listagg(case when rn = 1 then str end, ' ') within group (order by lev)
from (
select str, row_number() over (partition by str order by lev) rn, lev
from (
SELECT trim(regexp_substr(s, '[^ ]+', 1, level)) str,
level as lev
FROM string
CONNECT BY instr(s, ' ', 1, level - 1) > 0
)
)
My main problem here is that I'm not able to build a regexp that checks for non-adjacent duplicates, so I need to split the string, check for duplicates and then aggregate the non-duplicated values again, keeping the order.
If you don't mind the order of the tokens in the result string, this can be simplified:
with string(s) as ( select 'A B A A C D' from dual)
select listagg(str, ' ') within group (order by 1)
from (
SELECT distinct trim(regexp_substr(s, '[^ ]+', 1, level)) as str
FROM string
CONNECT BY instr(s, ' ', 1, level - 1) > 0
)
Assuming you want to keep the component strings in the order of their first occurrence (and not, say, reorder them alphabetically - your example is poorly chosen in this regard, because both lead to the same result), the problem is more complicated, because you must keep track of order too. Then for each letter you must keep just the first occurrence - here is where row_number() helps.
with
inputs ( str ) as ( select 'A B A A C D' from dual)
-- end test data; solution begins below this line
select listagg(token, ' ') within group (order by id) as new_str
from (
select level as id, regexp_substr(str, '[^ ]+', 1, level) as token,
row_number() over (
partition by regexp_substr(str, '[^ ]+', 1, level)
order by level ) as rn
from inputs
connect by regexp_substr(str, '[^ ]+', 1, level) is not null
)
where rn = 1
;
Xquery?
select xmlquery('string-join(distinct-values(ora:tokenize(.," ")), " ")' passing 'A B A A C D' returning content) result from dual

Regexp_replace processing result

I have a string with groups of numbers, and I'd like to turn each group into a constant-length string. At the moment I use two regexp_replace calls: the first prepends 10 zeros to each group, and the second trims each group to its last 10 digits:
with s(txt) as ( select '1030123:12031:1341' from dual)
select regexp_replace(
regexp_replace(txt, '(\d+)','0000000000\1')
,'\d+(\d{10})','\1') from s ;
But I'd like to use only one regex, something like
regexp_replace(txt, '(\d+)',lpad('\1',10,'0'))
But it doesn't work: lpad is executed before the regexp. Do you have any ideas?
With a slightly different approach, you can try the following:
with s(id, txt) as
(
select rownum, txt
from (
select '1030123:12031:1341' as txt from dual union all
select '1234:0123456789:1341' from dual
)
)
SELECT listagg(lpad(regexp_substr(s.txt, '[^:]+', 1, lines.column_value), 10, '0'), ':') within group (order by column_value) txt
FROM s,
TABLE (CAST (MULTISET
(SELECT LEVEL FROM dual CONNECT BY instr(s.txt, ':', 1, LEVEL - 1) > 0
) AS sys.odciNumberList )) lines
group by id
TXT
-----------------------------------
0001030123:0000012031:0000001341
0000001234:0123456789:0000001341
This uses CONNECT BY to split every string on the separator ':', then uses LPAD to pad each piece to 10 characters, and finally aggregates the pieces to build rows containing the concatenation of the padded values.
This pads only the non-empty sequences, so an empty one (e.g. in 123::456) is left empty:
with s(txt) as ( select '1030123:12031:1341' from dual)
select regexp_replace (regexp_replace (txt,'(\d+)',lpad('0',10,'0') || '\1'),'0*(\d{10})','\1')
from s
;

comma Separated List

I have a procedure with a parameter that takes a comma-separated value,
so when I enter Parameter = '1,0,1'
I want it to return ' one , Zero , One'.
You could use REPLACE function.
For example,
WITH DATA(str) AS(
  SELECT '1,0,1' FROM dual
)
SELECT str,
       REPLACE(REPLACE(str, '0', 'Zero'), '1', 'One') new_str
FROM DATA;

STR   NEW_STR
----- ------------------------------------------------------------
1,0,1 One,Zero,One
This query splits the list into numbers, converts the numbers into words and joins them together again with the listagg function:
with t1 as (select '7, 0, 11, 132' col from dual),
t2 as (select level lvl,to_number(regexp_substr(col,'[^,]+', 1, level)) col
from t1 connect by regexp_substr(col, '[^,]+', 1, level) is not null)
select listagg(case
when col=0 then 'zero'
else to_char(to_date(col,'j'), 'jsp')
end,
', ') within group (order by lvl) col
from t2
Output:
COL
-------------------------------------------
seven, zero, eleven, one hundred thirty-two
The limitation of this solution is that values must be between 0 and 5373484 (because 5373484 is the largest Julian day number that to_date accepts).
If you need higher values you can find hints in this article.
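The conversion in the query above relies on a date-format trick: to_date(n, 'j') reads n as a Julian day number, and the 'jsp' format spells that number out in words. A minimal sketch on a single value (the alias names are illustrative only):

```sql
SELECT TO_CHAR(TO_DATE(132, 'j'), 'jsp') AS spelled_lower,   -- all lower case
       TO_CHAR(TO_DATE(132, 'j'), 'Jsp') AS spelled_initcap  -- initial capitals
FROM   dual;
```

The first column should give 'one hundred thirty-two', as in the output above. The case of the format model controls the case of the words, so 'Jsp' may be closer to the 'One' style asked for in the question. Zero still needs the separate CASE branch, because 0 is not a valid Julian day number.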