Duplicate values when splitting a string

I'm trying to create one row per person for each value in str, but I am getting extra output.
Can someone please explain what I did wrong and show me how to fix it?
Below are my test case and expected results. Thanks to all who answer for your expertise.
with rws as (
select 'Bob' person, 'AB,CR,DE' str from dual UNION ALL
select 'Jane' person, 'AB' str from dual
)
select person,
regexp_substr (
str,
'[^,]+',
1,
level
) value
from rws
connect by level <=
length ( str ) - length ( replace ( str, ',' ) ) + 1
ORDER BY person, str;
PERSON VALUE
Bob AB
Bob CR
Bob DE
Bob DE
Bob CR
Jane AB
Expected results
PERSON VALUE
Bob AB
Bob CR
Bob DE
Jane AB

The problem with your original query is that connect-by is visiting previous rows more than once: essentially, the second level of rows for Bob is reached from the first-level row for Jane as well as from Bob's own first-level row. This is a fairly well-known issue. You can avoid it by including a unique ID in the connect-by condition (with this example you'd have to rely on the name, and hope it's unique); on its own that causes a connect-by loop error, which you can avoid by adding a non-deterministic function call:
...
connect by level <=
length ( str ) - length ( replace ( str, ',' ) ) + 1
and prior person = person
and prior dbms_random.value is not null
ORDER BY person, str;
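For reference, here is the fragment above assembled into a complete statement against the question's sample data. Treat it as a sketch of how the pieces fit together rather than a tuned solution; the one deliberate difference is that it orders by level instead of str, so the tokens come back in list order.

```sql
with rws as (
  select 'Bob' person, 'AB,CR,DE' str from dual union all
  select 'Jane' person, 'AB' str from dual
)
select person,
       regexp_substr(str, '[^,]+', 1, level) value
from rws
connect by level <= length(str) - length(replace(str, ',')) + 1
       -- only descend within the same source row
       and prior person = person
       -- non-deterministic call so Oracle does not report a loop
       and prior dbms_random.value is not null
order by person, level;
```

With the sample data this should return the four expected rows: Bob AB, Bob CR, Bob DE and Jane AB.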
You could also use recursive subquery factoring instead of a hierarchical query:
with rws as (
select 'Bob' person, 'AB,CR,DE' str from dual UNION ALL
select 'Jane' person, 'AB' str from dual
),
rcte (person, str, cnt, lvl, value) as (
select person, str, length ( str ) - length ( replace ( str, ',' ) ), 1,
regexp_substr (
str,
'[^,]+',
1,
1
)
from rws
union all
select person, str, cnt, lvl + 1,
regexp_substr (
str,
'[^,]+',
1,
lvl + 1
)
from rcte
where lvl <= cnt
)
select person, value
from rcte
order by person, value;
but you might find one of the other answers performs better, or at least is easy to understand and maintain.
Incidentally, your regular expression pattern might cause issues if you ever have a null element (i.e. two adjacent commas); see this answer for an explanation.
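To make the null-element problem concrete, the sketch below splits a literal with an empty middle element in two ways. The second pattern, '(.*?)(,|$)' read through subexpression 1, is a commonly used alternative and is an assumption here, not something taken from the query above.

```sql
select level as pos,
       -- '[^,]+' cannot match an empty element, so later tokens shift up
       regexp_substr('AB,,DE', '[^,]+', 1, level) as skips_null,
       -- capture everything up to the next comma or end of string
       regexp_substr('AB,,DE', '(.*?)(,|$)', 1, level, null, 1) as keeps_null
from dual
connect by level <= regexp_count('AB,,DE', ',') + 1
order by pos;
```

For position 2 the first expression returns DE (the third element has moved up), while the second returns NULL and keeps DE at position 3.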

Here's one option:
WITH
   rws
   AS
      (SELECT 'Bob' person, 'AB,CR,DE' str FROM DUAL
       UNION ALL
       SELECT 'Jane' person, 'AB' str FROM DUAL)
SELECT person,
       REGEXP_SUBSTR (str,
                      '[^,]+',
                      1,
                      COLUMN_VALUE) VALUE
FROM rws
     CROSS JOIN
     TABLE (
        CAST (
           MULTISET ( SELECT LEVEL
                      FROM DUAL
                      CONNECT BY LEVEL <= REGEXP_COUNT (str, ',') + 1)
           AS SYS.odcinumberlist))
ORDER BY person, str;
PERS VALUE
---- --------
Bob AB
Bob CR
Bob DE
Jane AB
Your solution would return the desired result if you applied SELECT DISTINCT (and fixed the ORDER BY clause, but that's irrelevant here), but it would also perform badly as the number of rows you're working with grows.
with rws as (
  select 'Bob' person, 'AB,CR,DE' str from dual UNION ALL
  select 'Jane' person, 'AB' str from dual
)
select distinct person,
       regexp_substr (
         str,
         '[^,]+',
         1,
         level
       ) value
from rws
connect by level <=
  length ( str ) - length ( replace ( str, ',' ) ) + 1;
PERS VALUE
---- --------
Jane AB
Bob CR
Bob AB
Bob DE
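One way to see why DISTINCT scales poorly is to count the rows the hierarchical query produces before duplicates are removed. The query below is a small sketch on the same sample data; the column alias is made up for illustration.

```sql
with rws as (
  select 'Bob' person, 'AB,CR,DE' str from dual union all
  select 'Jane' person, 'AB' str from dual
)
-- same CONNECT BY as above, but counting instead of de-duplicating
select count(*) as rows_before_distinct
from rws
connect by level <=
  length ( str ) - length ( replace ( str, ',' ) ) + 1;
```

For the two sample rows this gives 6 rows for 4 distinct results. Because every row at one level becomes a parent of every qualifying row at the next level, the intermediate row count multiplies with each extra multi-value row and each extra level, and DISTINCT has to sort or hash all of it.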

You can use a recursive query and simple string functions (which is slightly more to type but is faster than regular expressions):
with rws (person, str) as (
select 'Bob', 'AB,CR,DE' from dual UNION ALL
select 'Jane', 'AB' from dual
),
bounds (person, str, spos, epos) AS (
SELECT person,
str,
1,
INSTR(str, ',', 1)
FROM rws
UNION ALL
SELECT person,
str,
epos + 1,
INSTR(str, ',', epos + 1)
FROM bounds
WHERE epos > 0
)
SELECT person,
CASE epos
WHEN 0
THEN SUBSTR(str, spos)
ELSE SUBSTR(str, spos, epos - spos)
END AS value
FROM bounds
ORDER BY person, value;
Which outputs:
PERSON | VALUE
:----- | :----
Bob | AB
Bob | CR
Bob | DE
Jane | AB

If you don't have quotes in the data, for 12c+ you may use JSON_TABLE and lateral join instead of recursion.
with rws as (
select 'Bob' person, 'AB,CR,DE' str from dual UNION ALL
select 'Jane' person, 'AB' str from dual union all
select 'Mark', null from dual
)
select
rws.person,
l.val_splitted,
l.rn
from rws
left join lateral (
select *
from json_table(
'["' || replace(rws.str, ',', '","') || '"]',
'$[*]'
columns (
val_splitted varchar2(10) path '$',
rn for ordinality
)
)
) l
on 1 = 1
order by 1
PERSON | VAL_SPLITTED | RN
:----- | :----------- | -:
Bob | AB | 1
Bob | CR | 2
Bob | DE | 3
Jane | AB | 1
Mark | | 1

Related

SUBSTR to ADD value in oracle

I have table with column having data in below format in Oracle DB.
COL 1
abc,mno:EMP
xyz:EMP;tyu,opr:PROF
abc,mno:EMP;tyu,opr:PROF
I am trying to convert the data in below format
COL 1
abc:EMP;mno:EMP
xyz:EMP;tyu:PROF;opr:PROF
abc:EMP;mno:EMP;tyu:PROF;opr:PROF
Basically, I want to take everything after the : and before the ; and substitute it for each comma in that group.
I tried some SUBSTR and LISTAGG but couldn't get anything worth sharing.
Regards.

Here's one option; read comments within code.
with test (id, col) as
  -- sample data
  (select 1, 'abc,mno:EMP' from dual union all
   select 2, 'xyz:EMP;tyu,opr:PROF' from dual union all
   select 3, 'abc,mno:EMP;tyu,opr:PROF' from dual
  ),
temp as
  -- split sample data to rows
  (select id,
          column_value cv,
          regexp_substr(col, '[^;]+', 1, column_value) val
   from test cross join
        table(cast(multiset(select level from dual
                            connect by level <= regexp_count(col, ';') + 1
                           ) as sys.odcinumberlist))
  )
-- finally, replace comma with a string that follows a colon sign
select id,
       listagg(replace(val, ',', substr(val, instr(val, ':')) ||';'), ';') within group (order by cv) new_val
from temp
group by id
order by id;
ID NEW_VAL
---------- ----------------------------------------
1 abc:EMP;mno:EMP
2 xyz:EMP;tyu:PROF;opr:PROF
3 abc:EMP;mno:EMP;tyu:PROF;opr:PROF

Building on Littlefoot's answer: if you use CROSS APPLY, you don't need the CAST ... MULTISET construct.
with test (id, col) as
-- sample data
(select 1, 'abc,mno:EMP' from dual union all
select 2, 'xyz:EMP;tyu,opr:PROF' from dual union all
select 3, 'abc,mno:EMP;tyu,opr:PROF' from dual
),
temp as
-- split sample data to rows
(select id,
column_value cv,
regexp_substr(col, '[^;]+', 1, column_value) val
from test
cross apply (select level as column_value
from dual
connect by level<= regexp_count(col, ';') + 1)
)
-- finally, replace comma with a string that follows a colon sign
select id,
listagg(replace(val, ',', substr(val, instr(val, ':')) ||';'), ';') within group (order by cv) new_val
from temp
group by id
order by id;

You do not need anything recursive, just a basic regex: if the pattern is always something,something2:someCode (i.e. there is never a colon before the comma), then the following is sufficient.
with test (id, col) as (
select 1, 'abc,mno:EMP' from dual union all
select 2, 'xyz:EMP;tyu,opr:PROF' from dual union all
select 3, 'abc,mno:EMP;tyu,opr:PROF' from dual union all
select 3, 'abc,mno:EMP;tyu,opr:PROF;something:QWE;something2:QWE' from dual
)
select
/*
Grab these groups:
1) Everything before the comma
2) Then everything before the colon
3) And then everything between the colon and a semicolon
Then place group 3 between 1 and 2
*/
trim(trailing ';' from regexp_replace(col || ';', '([^,]+),([^:]+):([^;]+)', '\1:\3;\2:\3')) as res
from test
| RES |
| :------------------------------------------------------------- |
| abc:EMP;mno:EMP |
| xyz:EMP;tyu:PROF;opr:PROF |
| abc:EMP;mno:EMP;tyu:PROF;opr:PROF |
| abc:EMP;mno:EMP;tyu:PROF;opr:PROF;something:QWE;something2:QWE |

Sorting comma delimited datasets in row

This is what is given
Numbers Powers
4,5,1 WATER,FIRE
6,3,9 ICE,WATER,FIRE
My requirement is (sorted order)
Numbers Powers
1,4,5 FIRE,WATER
3,6,9 FIRE,ICE,WATER
I want each list in sorted order. How can I do this in the database?

Split column to rows, then aggregate them back, sorted.
with test (id, num, pow) as
  (select 1, '4,5,1', 'water,fire' from dual union all
   select 2, '6,3,9', 'ice,water,fire' from dual
  ),
temp as
  -- split columns to rows
  (select id,
          regexp_substr(num, '[^,]+', 1, column_value) num1,
          regexp_substr(pow, '[^,]+', 1, column_value) pow1
   from test join table(cast(multiset(select level from dual
                                      connect by level <= regexp_count(num, ',') + 1
                                     ) as sys.odcinumberlist)) on 1 = 1
  )
-- aggregate them back, sorted
select id,
       listagg(num1, ',') within group (order by to_number(num1)) num_result,
       listagg(pow1, ',') within group (order by pow1) pow_result
from temp
group by id;
ID NUM_RESULT POW_RESULT
---------- ------------------------------ ------------------------------
1 1,4,5 fire,water
2 3,6,9 fire,ice,water

Oracle Setup:
CREATE TABLE test_data ( Numbers, Powers ) AS
SELECT '4,5,1', 'WATER,FIRE' FROM DUAL UNION ALL
SELECT '6,3,9', 'ICE,WATER,FIRE' FROM DUAL UNION ALL
SELECT '7', 'D,B,E,C,A' FROM DUAL
Query:
SELECT (
SELECT LISTAGG( TO_NUMBER( REGEXP_SUBSTR( t.numbers, '\d+', 1, LEVEL ) ), ',' )
WITHIN GROUP ( ORDER BY TO_NUMBER( REGEXP_SUBSTR( t.numbers, '\d+', 1, LEVEL ) ) )
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT( t.numbers, ',' ) + 1
) AS numbers,
(
SELECT LISTAGG( REGEXP_SUBSTR( t.powers, '[^,]+', 1, LEVEL ), ',' )
WITHIN GROUP ( ORDER BY REGEXP_SUBSTR( t.powers, '[^,]+', 1, LEVEL ) )
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT( t.powers, ',' ) + 1
) AS powers
FROM test_data t
Output:
NUMBERS | POWERS
:------ | :-------------
1,4,5 | FIRE,WATER
3,6,9 | FIRE,ICE,WATER
7 | A,B,C,D,E

You can try the following:
I have queried the table directly because I need some per-row value to make DISTINCT work; here I have used ROWID.
SELECT
ID,
LISTAGG(NUM, ',') WITHIN GROUP(
ORDER BY
NUM
) AS NUM,
LISTAGG(POW, ',') WITHIN GROUP(
ORDER BY
POW
) AS POW
FROM
(
SELECT
DISTINCT ROWID,
ID,
REGEXP_SUBSTR(NUM, '[^,]+', 1, LEVEL) NUM,
REGEXP_SUBSTR(POW, '[^,]+', 1, LEVEL) POW
FROM
TEST
CONNECT BY REGEXP_SUBSTR(NUM, '[^,]+', 1, LEVEL) IS NOT NULL
OR REGEXP_SUBSTR(POW, '[^,]+', 1, LEVEL) IS NOT NULL
)
GROUP BY ID
ORDER BY ID;
Cheers!!
----
UPDATE
----
As a comment pointed out, the query above generates duplicates, so I have rewritten the whole query as follows:
SELECT
ID,
LISTAGG(C_S.NUM, ',') WITHIN GROUP(
ORDER BY
C_S.NUM
) AS NUM,
LISTAGG(C_S.POW, ',') WITHIN GROUP(
ORDER BY
C_S.POW
) AS POW
FROM
(SELECT
T.ID,
REGEXP_SUBSTR(T.NUM, '[^,]+', 1, NUMS_COMMA.COLUMN_VALUE) NUM,
REGEXP_SUBSTR(T.POW, '[^,]+', 1, NUMS_COMMA.COLUMN_VALUE) POW
FROM
TEST T,
TABLE ( CAST(MULTISET(
SELECT
LEVEL
FROM
DUAL
CONNECT BY
LEVEL <= GREATEST(LENGTH(REGEXP_REPLACE(T.NUM, '[^,]+')),
LENGTH(REGEXP_REPLACE(T.POW, '[^,]+'))) + 1
) AS SYS.ODCINUMBERLIST) ) NUMS_COMMA) C_S
GROUP BY ID;
Cheers!!

Oracle/SQL - Need query that will select max value from string in each row

I need a graceful way to select the max value from a field holding a comma delimited list.
Expected Values:
List_1 | Last
------ | ------
A,B,C | C
B,D,C | D
I'm using the following query and I'm not getting what's expected.
select
list_1,
(
select max(values) WITHIN GROUP (order by 1)
from (
select
regexp_substr(list_1,'[^,]+', 1, level) as values
from dual
connect by regexp_substr(list_1, '[^,]+', 1, level) is not null)
) as last
from my_table
Anyone have any ideas to fix my query?

with
test_data ( id, list_1 ) as (
select 101, 'A,B,C' from dual union all
select 102, 'B,D,C' from dual union all
select 105, null from dual union all
select 122, 'A' from dual union all
select 140, 'A,B,B' from dual
)
-- end of simulated table (for testing purposes only, not part of the solution)
select id, list_1, max(token) as max_value
from ( select id, list_1,
regexp_substr(list_1, '([^,]*)(,|$)', 1, level, null, 1) as token
from test_data
connect by level <= 1 + regexp_count(list_1, ',')
and prior id = id
and prior sys_guid() is not null
)
group by id, list_1
order by id
;
ID LIST_1_ MAX_VAL
---- ------- -------
101 A,B,C C
102 B,D,C D
105
122 A A
140 A,B,B B
In Oracle 12.1 or higher, this can be re-written using the LATERAL clause:
select d.id, d.list_1, x.max_value
from test_data d,
lateral ( select max(regexp_substr(list_1, '([^,]*)(,|$)',
1, level, null, 1)) as max_value
from test_data x
where x.id = d.id
connect by level <= 1 + regexp_count(list_1, ',')
) x
order by d.id
;
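A variant worth considering generates the levels from DUAL inside the lateral view instead of re-reading test_data, so the row generator only ever sees one source row. This is a sketch under the assumption that Oracle 12.1 or higher is available; it reuses the sample data and pattern from the answer above.

```sql
with
  test_data ( id, list_1 ) as (
    select 101, 'A,B,C' from dual union all
    select 102, 'B,D,C' from dual union all
    select 105, null from dual union all
    select 122, 'A' from dual union all
    select 140, 'A,B,B' from dual
  )
select d.id, d.list_1, x.max_value
from test_data d
     cross join lateral (
       -- one row per token of the current d.list_1, then aggregate
       select max(regexp_substr(d.list_1, '([^,]*)(,|$)',
                                1, level, null, 1)) as max_value
       from dual
       connect by level <= 1 + regexp_count(d.list_1, ',')
     ) x
order by d.id;
```

Because the inner query is an aggregate without GROUP BY, it returns exactly one row per source row, including the row whose list is NULL.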

Oracle Regex Connect By

I am trying to produce multiple rows after performing a regex on a column splitting all values in square brackets. I'm only able to return a single value though, currently.
The field I am performing the regex has this value:
[1265]*[1263]
I am trying to get 1265 and 1263 in my result set as separate rows.
SELECT REGEXP_SUBSTR(column,'\[(.*?)\]',1,LEVEL) AS "col1"
FROM table
CONNECT BY REGEXP_SUBSTR(column,'\[(.*?)\]',1,LEVEL) IS NOT NULL;
Instead I just get this in the result set.
[1263]

with test (rn, col) as
(
select 'a', '[123]*[abc] []' from dual union all
select 'b', '[45][def] ' from dual union all
select 'c', '[678],.*' from dual
),
coll (rn, col) as
(
select rn,regexp_replace(col, '(\[.*?\])|.', '\1') from test
),
cte (rn, cnt, col, i) as
(
select rn, 1, col, regexp_substr(col, '(\[(.*?)\])', 1, 1, null, 2)
from coll
union all
select rn, cnt+1, col, regexp_substr(col, '(\[(.*?)\])', 1, cnt+1, null, 2)
from cte
where cnt+1 <= regexp_count(col, '\[.*?\]')
)
select * from cte
order by 1,2;

This regex counts elements by looking for closing brackets and returns the digits inside the brackets, allowing for NULLs. Separators are ignored: since the data elements you want are surrounded by square brackets, we can focus on those.
SQL> with test(rownbr, col) as (
select 1, '[1265]**[1263]' from dual union
select 2, '[123]' from dual union
select 3, '[111][222]*[333]' from dual union
select 4, '[411]*[][433]' from dual
)
select distinct rownbr, level as element,
regexp_substr(col, '\[([0-9]*)\]', 1, level, null, 1) value
from test
connect by level <= regexp_count(col, ']')
order by rownbr, element;
ROWNBR ELEMENT VALUE
---------- ---------- -----
1 1 1265
1 2 1263
2 1 123
3 1 111
3 2 222
3 3 333
4 1 411
4 2
4 3 433
9 rows selected.
SQL>

Split comma separated values of a column in row, through Oracle SQL query

I have a table like below:
-------------
ID | NAME
-------------
1001 | A,B,C
1002 | D,E,F
1003 | C,E,G
-------------
I want these values to be displayed as:
-------------
ID | NAME
-------------
1001 | A
1001 | B
1001 | C
1002 | D
1002 | E
1002 | F
1003 | C
1003 | E
1003 | G
-------------
I tried doing:
select split('A,B,C,D,E,F', ',') from dual; -- WILL RETURN COLLECTION
select column_value
from table (select split('A,B,C,D,E,F', ',') from dual); -- RETURN COLUMN_VALUE

Try the query below:
WITH T AS (SELECT 'A,B,C,D,E,F' STR FROM DUAL) SELECT
REGEXP_SUBSTR (STR, '[^,]+', 1, LEVEL) SPLIT_VALUES FROM T
CONNECT BY LEVEL <= (SELECT LENGTH (STR) - LENGTH (REPLACE (STR, ',', NULL)) + 1 FROM T)
The same query with an ID:
WITH TAB AS
(SELECT '1001' ID, 'A,B,C,D,E,F' STR FROM DUAL
)
SELECT ID,
REGEXP_SUBSTR (STR, '[^,]+', 1, LEVEL) SPLIT_VALUES FROM TAB
CONNECT BY LEVEL <= (SELECT LENGTH (STR) - LENGTH (REPLACE (STR, ',', NULL)) + 1 FROM TAB);
EDIT:
Try the query below for multiple IDs, each with multiple separated values:
WITH TAB AS
(SELECT '1001' ID, 'A,B,C,D,E,F' STR FROM DUAL
UNION
SELECT '1002' ID, 'D,E,F' STR FROM DUAL
UNION
SELECT '1003' ID, 'C,E,G' STR FROM DUAL
)
select id, substr(STR, instr(STR, ',', 1, lvl) + 1, instr(STR, ',', 1, lvl + 1) - instr(STR, ',', 1, lvl) - 1) name
from
( select ',' || STR || ',' as STR, id from TAB ),
( select level as lvl from dual connect by level <= 100 )
where lvl <= length(STR) - length(replace(STR, ',')) - 1
order by ID, NAME

There are multiple options. See Split comma delimited strings in a table in Oracle.
Using REGEXP_SUBSTR:
WITH sample_data AS(
  SELECT 10001 ID, 'A,B,C' str FROM dual UNION ALL
  SELECT 10002 ID, 'D,E,F' str FROM dual UNION ALL
  SELECT 10003 ID, 'C,E,G' str FROM dual
)
-- end of sample_data mimicking real table
SELECT distinct id, trim(regexp_substr(str, '[^,]+', 1, LEVEL)) str
FROM sample_data
CONNECT BY LEVEL <= regexp_count(str, ',')+1
ORDER BY ID
/
ID STR
---------- -----
10001 A
10001 B
10001 C
10002 D
10002 E
10002 F
10003 C
10003 E
10003 G
9 rows selected.
Using XMLTABLE:
WITH sample_data AS(
  SELECT 10001 ID, 'A,B,C' str FROM dual UNION ALL
  SELECT 10002 ID, 'D,E,F' str FROM dual UNION ALL
  SELECT 10003 ID, 'C,E,G' str FROM dual
)
-- end of sample_data mimicking real table
SELECT id,
       trim(COLUMN_VALUE) str
FROM sample_data,
     xmltable(('"'
               || REPLACE(str, ',', '","')
               || '"'))
/
ID STR
---------- ---
10001 A
10001 B
10001 C
10002 D
10002 E
10002 F
10003 C
10003 E
10003 G
9 rows selected.

I solved a similar problem this way:
select YT.ID,
REGEXP_SUBSTR(YT.STR, '[^,]+', 1, lvl.lvl) AS STR
from YOURTABLE YT
join (select level as lvl
from dual
connect by level <= (select max(regexp_count(STR,',')+1) from YOURTABLE)
) lvl on lvl.lvl <= regexp_count(YT.STR,',')+1

You may try something like this:
CREATE OR REPLACE TYPE STR_TABLE
as table of varchar2(4000);
/
create or replace function GetCollection( iStr varchar2, iSplit char default ',' ) return STR_TABLE as
pStr varchar2(4000) := trim(iStr); -- remaining, not yet split part of the input
rpart varchar2(4000); -- current element
pos pls_integer; -- position of the next separator
pColl STR_TABLE := STR_TABLE();
begin
while nvl(length(pStr),0) > 0 loop
pos := inStr(pStr, iSplit );
if pos > 0 then
rpart := substr(pStr,1, pos-1);
pStr := substr(pStr,pos+1,length(pStr));
else
rpart := pStr;
pStr := null;
end if;
-- empty elements (adjacent separators) are skipped
if rpart is not null then
pColl.Extend;
pColl(pColl.Count) := rpart;
end if;
end loop;
return pColl;
end;
/
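A possible way to call the function from SQL is through TABLE(), joined to the source rows. In the sketch below the inline my_table data is a stand-in for the question's table (the name is hypothetical), and it assumes the STR_TABLE type and GetCollection function above have been created successfully.

```sql
with my_table (id, name) as (
  select 1001, 'A,B,C' from dual union all
  select 1002, 'D,E,F' from dual union all
  select 1003, 'C,E,G' from dual
)
select t.id,
       c.column_value as name   -- one row per element of the collection
from my_table t
     cross join table(GetCollection(t.name)) c
order by t.id, c.column_value;
```

Each source row is expanded into one row per element, which matches the layout the question asks for.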

Do not use CONNECT BY or REGEXP, which can result in a Cartesian product on a complex query. Furthermore, the above solutions expect you to know the possible results (A,B,C,D,E,F) in advance rather than handle an arbitrary list of combinations.
Use XMLTable:
SELECT c.fname, c.lname,
trim(COLUMN_VALUE) EMAIL_ADDRESS
FROM
CONTACTS c, CONTACT_STATUS s,
xmltable(('"'
|| REPLACE(EMAIL_ADDRESS, ';', '","')
|| '"'))
where c.status = s.id
COLUMN_VALUE is a pseudocolumn that belongs to xmltable. This is quick and correct, and it lets you reference a column without knowing its values.
It takes the column, turns it into a sequence of values "item","item2","item3" and automatically joins that to its source table (CONTACTS). This was tested on thousands of rows.
Note that the ';' in the xmltable call is the separator used in the column.

I tried Lalit Kumar B's solution and it worked at first. But with more data I ran into a performance issue (> 60 rows, > 7 levels). I therefore used a more static variation, which I would like to share as an alternative.
WITH T AS (
SELECT 1001 AS ID, 'A,B,C' AS NAME FROM DUAL
UNION SELECT 1002 AS ID, 'D,E,F' AS NAME FROM DUAL
UNION SELECT 1003 AS ID, 'C,E,G' AS NAME FROM DUAL
) --SELECT * FROM T
SELECT ID as ID,
distinct_column AS NAME
FROM ( SELECT t.ID,
trim(regexp_substr(t.NAME, '[^,]+', 1,1)) AS c1,
trim(regexp_substr(t.NAME, '[^,]+', 1,2)) AS c2,
trim(regexp_substr(t.NAME, '[^,]+', 1,3)) AS c3,
trim(regexp_substr(t.NAME, '[^,]+', 1,4)) AS c4 -- etc.
FROM T )
UNPIVOT ( distinct_column FOR cn IN ( c1, c2, c3, c4 ) )
ID NAME
------ ------
1001 A
1001 B
1001 C
1002 D
1002 E
1002 F
1003 C
1003 E
1003 G
9 rows selected

This version also works with strings longer than one character:
select regexp_substr('A,B,C,Karl-Heinz,D','[^,]+', 1, level) from dual
connect by regexp_substr('A,B,C,Karl-Heinz,D', '[^,]+', 1, level) is not null;
See How to split comma separated string and pass to IN clause of select statement.
