split string into several rows - sql

I have a table with a string column which contains several delimited values, e.g. a;b;c.
I need to split this string and use its values in a query. For example, I have the following table:
str
a;b;c
b;c;d
a;c;d
I need to group by the single values from the str column to get the following result:
str   count(*)
a     2
b     2
c     3
d     2
Is it possible to implement this using a single SELECT query? I cannot create temporary tables to extract the values into and then query against.

From your comment on @PrzemyslawKruglej's answer:
The main problem is with the internal query with connect by; it generates an astonishing amount of rows
The amount of rows generated can be reduced with the following approach:
/* test table populated with sample data from your question */
SQL> create table t1(str) as(
2 select 'a;b;c' from dual union all
3 select 'b;c;d' from dual union all
4 select 'a;c;d' from dual
5 );
Table created
-- The number of rows generated will depend solely on the longest string.
-- If (say) the longest string contains 3 words (not counting the `;` separator)
-- and we have 100 rows in our table, then we will end up with 300 rows
-- for further processing, no more.
with occurrence(ocr) as(
select level
from ( select max(regexp_count(str, '[^;]+')) as mx_t
from t1 ) t
connect by level <= mx_t
)
select count(regexp_substr(t1.str, '[^;]+', 1, o.ocr)) as generated_for_3_rows
from t1
cross join occurrence o;
Result: For three rows where the longest one is made up of three words, we will generate 9 rows:
GENERATED_FOR_3_ROWS
--------------------
9
Final query:
with occurrence(ocr) as(
select level
from ( select max(regexp_count(str, '[^;]+')) as mx_t
from t1 ) t
connect by level <= mx_t
)
select res
, count(res) as cnt
from (select regexp_substr(t1.str, '[^;]+', 1, o.ocr) as res
from t1
cross join occurrence o)
where res is not null
group by res
order by res;
Result:
RES CNT
----- ----------
a 2
b 2
c 3
d 2
SQLFiddle Demo
Find out more about the regexp_count() (11g and up) and regexp_substr() regular expression functions.
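For a quick feel for what those two functions return, here is a tiny illustration against dual (nothing from the tables above is needed):
-- regexp_count counts the ';'-delimited tokens, regexp_substr extracts the n-th one
select regexp_count('a;b;c', '[^;]+')        as token_count,   -- 3
       regexp_substr('a;b;c', '[^;]+', 1, 2) as second_token   -- b
from dual;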
Note: Regular expression functions are relatively expensive to compute, and when it comes to processing a very large amount of data, it might be worth considering a switch to plain PL/SQL. Here is an example.
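The linked example is not reproduced here, but as a rough sketch of what a plain PL/SQL splitter could look like (the function name split_str and the use of sys.odcivarchar2list are my own choices, not taken from the original answer):
-- minimal pipelined splitter using only INSTR/SUBSTR, no regular expressions
create or replace function split_str(p_str in varchar2,
                                     p_sep in varchar2 default ';')
  return sys.odcivarchar2list pipelined
as
  l_pos  pls_integer := 1;
  l_next pls_integer;
begin
  if p_str is null then
    return;
  end if;
  loop
    l_next := instr(p_str, p_sep, l_pos);
    if l_next = 0 then
      pipe row (substr(p_str, l_pos));             -- last (or only) token
      exit;
    end if;
    pipe row (substr(p_str, l_pos, l_next - l_pos));
    l_pos := l_next + length(p_sep);
  end loop;
  return;
end;
/

-- usage against the t1 table created above
select s.column_value as str, count(*) as cnt
from t1, table(split_str(t1.str)) s
group by s.column_value
order by str;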

This is ugly, but seems to work. The problem with the CONNECT BY splitting is that it returns duplicate rows. I managed to get rid of them, but you'll have to test it:
WITH
data AS (
SELECT 'a;b;c' AS val FROM dual
UNION ALL SELECT 'b;c;d' AS val FROM dual
UNION ALL SELECT 'a;c;d' AS val FROM dual
)
SELECT token, COUNT(1)
FROM (
SELECT DISTINCT token, lvl, val, p_val
FROM (
SELECT
regexp_substr(val, '[^;]+', 1, level) AS token,
level AS lvl,
val,
NVL(prior val, val) p_val
FROM data
CONNECT BY regexp_substr(val, '[^;]+', 1, level) IS NOT NULL
)
WHERE val = p_val
)
GROUP BY token;
TOKEN COUNT(1)
-------------------- ----------
d 2
b 2
a 2
c 3
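For reference, a common alternative that avoids producing the duplicate rows in the first place (and therefore the DISTINCT) is to correlate the CONNECT BY on the current row and add a non-deterministic PRIOR term; a minimal sketch against the same data CTE (val is unique in this sample, so PRIOR val = val is safe here):
WITH
data AS (
SELECT 'a;b;c' AS val FROM dual
UNION ALL SELECT 'b;c;d' AS val FROM dual
UNION ALL SELECT 'a;c;d' AS val FROM dual
)
SELECT token, COUNT(*)
FROM (
SELECT regexp_substr(val, '[^;]+', 1, level) AS token
FROM data
CONNECT BY regexp_substr(val, '[^;]+', 1, level) IS NOT NULL
AND PRIOR val = val
AND PRIOR sys_guid() IS NOT NULL
)
GROUP BY token;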

SELECT NAME, COUNT(NAME)
FROM (
  SELECT NAME FROM (SELECT rownum AS ID, REGEXP_SUBSTR('a;b;c', '[^;]+', 1, LEVEL) NAME FROM dual
                    CONNECT BY REGEXP_SUBSTR('a;b;c', '[^;]+', 1, LEVEL) IS NOT NULL)
  UNION ALL
  SELECT NAME FROM (SELECT rownum AS ID, REGEXP_SUBSTR('b;c;d', '[^;]+', 1, LEVEL) NAME FROM dual
                    CONNECT BY REGEXP_SUBSTR('b;c;d', '[^;]+', 1, LEVEL) IS NOT NULL)
  UNION ALL
  SELECT NAME FROM (SELECT rownum AS ID, REGEXP_SUBSTR('a;c;d', '[^;]+', 1, LEVEL) NAME FROM dual
                    CONNECT BY REGEXP_SUBSTR('a;c;d', '[^;]+', 1, LEVEL) IS NOT NULL)
)
GROUP BY NAME
NAME COUNT(NAME)
----- -----------
d 2
a 2
b 2
c 3

Related

SELECT rows with a new DISTINCT from a VARCHAR with CSV in it

I have an Oracle database table with a field called Classification which is a VARCHAR. The VARCHAR is a CSV (using semicolons). Example:
;CHR;
;OTR;CHR;ROW;
;CHA;ROW;
;OTR;ROW;
I want to pull only the rows whose CSV contains a value not yet seen in the rows already pulled. It is OK if a row has a previously found value, as long as it also has a new, different value.
For instance, from the above dataset it would be:
;CHR;
;OTR;CHR;ROW;
;CHA;ROW;
If I do just:
Select DISTINCT Classification from Table1
I get overlapping values back, because it is the overall VARCHAR that is distinct, not the individual values within it.
I can get all the distinct values using:
select LISTAGG(val,',') WITHIN GROUP ( ORDER BY val ) as final
FROM
(
select distinct trim(regexp_substr("Classification",'[^;]+', 1, level) ) as val
from Table1
connect by regexp_substr("Classification", '[^;]+', 1, level) is not null
ORDER BY val
)
which gives me
FINAL
CHA,CHR,OTR,ROW
but I am unable to make the link to pull out one record per unique value.
Is this possible with SQL?
EDIT: This is a database created by a large corporation and my company purchased the product. Now I am tasked with data mining the backend database for BI and have absolutely no control over the database structure.
No offence, but I see many answers to the questions I have researched stating 'do better database design/normalization', and while I agree, most of the askers I have read have no control over the database and are asking SO for assistance with a problem because of this, not for ridicule over bad database design.
I apologize if I offend anyone.
There is no parent/child relationship. I cannot see the object layer, but I assume these values are changed in the object layer before propagating to the client, as there is no link to them in the actual database.
Clarification:
I see 2 ways to solve this:
1: One select statement that pulls out 1 row based on a new unique value within the VARCHAR CSV(Classification)
2: Use my select statement to loop through and pull one row containing that value in the VARCHAR CSV(Classification)
Thanks all for the input. I upvoted the ones that worked for me. In the end I will be using the one I developed, just because I can easily manipulate the output (to a CSV) for what the analyst wishes.
Here's one way to approach it:
1. Assign row numbers to the original CSV data
2. Split the CSV into rows
3. Assign the split CSV values row numbers, sorted by the CSV ordering from the first step
4. Return any rows where the row number from the previous step = 1
5. Return the distinct list of CSVs
For example:
with tab as (
select ';CHR;' str from dual union all
select ';OTR;CHR;ROW;' str from dual union all
select ';CHA;ROW;' str from dual union all
select ';OTR;ROW;' str from dual
), ranks as (
select row_number() over ( order by str ) rn, tab.* from tab
), rws as (
select trim ( regexp_substr(str,'[^;]+', 1, level ) ) as val, rn, str
from ranks
connect by regexp_substr ( str, '[^;]+', 1, level ) is not null
and prior rn = rn
and prior sys_guid () is not null
), rns as (
select row_number () over (
partition by val
order by rn
) val_rn, r.*
from rws r
)
select distinct str
from rns
where val_rn = 1;
STR
;CHA;ROW;
;OTR;CHR;ROW;
;CHR;
This is an ad hoc solution proposal for when the generic answer yields suboptimal performance and some restrictions are fulfilled:
all the keys have a fixed length
the maximal number of keys is known
Then, to parse the CSV string, you may use this query (add further UNION ALLs for longer strings):
with tab as (
select ';CHR;' str from dual union all
select ';OTR;CHR;ROW;' str from dual union all
select ';CHA;ROW;' str from dual union all
select ';OTR;ROW;' str from dual
), tab2 as (
select str, substr(str,2,3) val from tab union all
select str, substr(str,6,3) val from tab where substr(str,6,3) is not null union all
select str, substr(str,10,3) val from tab where substr(str,10,3) is not null)
select * from tab2;
which results in
STR VAL
------------- ------------
;CHR; CHR
;OTR;CHR;ROW; OTR
;CHA;ROW; CHA
;OTR;ROW; OTR
;OTR;CHR;ROW; CHR
;CHA;ROW; ROW
;OTR;ROW; ROW
;OTR;CHR;ROW; ROW
Now you only need to find the first occurrence of each key and get all distinct strings with this first occurrence.
I'm reusing the approach from Chris Saxon's solution:
with tab as (
select ';CHR;' str from dual union all
select ';OTR;CHR;ROW;' str from dual union all
select ';CHA;ROW;' str from dual union all
select ';OTR;ROW;' str from dual
), tab2 as (
select str, substr(str,2,3) val from tab union all
select str, substr(str,6,3) val from tab where substr(str,6,3) is not null union all
select str, substr(str,10,3) val from tab where substr(str,10,3) is not null),
tab3 as (
select STR, VAL,
row_number() over (partition by val order by str) rn
from tab2)
select distinct str
from tab3
where rn = 1
You were very close since you had already gotten the list of distinct values. Instead of combining them with LISTAGG, you can use that list to find a row that contains that unique value. Below are two separate queries that will return a Classification for each unique value. You can try them both and see which performs better based on the data you have in the table.
Query Option 1
WITH
table1 (classification)
AS
(SELECT ';CHR;' FROM DUAL
UNION ALL
SELECT ';OTR;CHR;ROW;' FROM DUAL
UNION ALL
SELECT ';CHA;ROW;' FROM DUAL
UNION ALL
SELECT ';OTR;ROW;' FROM DUAL),
dist_vals (val)
AS
( SELECT DISTINCT TRIM (REGEXP_SUBSTR (classification,
'[^;]+',
1,
LEVEL)) AS val
FROM Table1
CONNECT BY LEVEL < REGEXP_COUNT (classification, ';'))
SELECT val, classification
FROM (SELECT dv.val,
t.classification,
ROW_NUMBER () OVER (PARTITION BY dv.val ORDER BY t.classification) AS occurrence
FROM dist_vals dv, table1 t
WHERE t.classification LIKE '%;' || dv.val || ';%')
WHERE occurrence = 1;
Query Option 2
WITH
table1 (classification)
AS
(SELECT ';CHR;' FROM DUAL
UNION ALL
SELECT ';OTR;CHR;ROW;' FROM DUAL
UNION ALL
SELECT ';CHA;ROW;' FROM DUAL
UNION ALL
SELECT ';OTR;ROW;' FROM DUAL),
dist_vals (val)
AS
( SELECT DISTINCT TRIM (REGEXP_SUBSTR (classification,
'[^;]+',
1,
LEVEL)) AS val
FROM Table1
CONNECT BY LEVEL < REGEXP_COUNT (classification, ';'))
SELECT dv.val,
(SELECT classification
FROM table1
WHERE classification LIKE '%;' || dv.val || ';%' AND ROWNUM = 1)
FROM dist_vals dv;
I figured it out this way and it runs fast (even once all my joins to other tables are added). I will test the other answers as I can and decide on the best one (the others look better than mine, if they work, as I would rather not use dbms_output):
DECLARE
v_search_string varchar2(4000);
v_classification varchar2(4000);
BEGIN
select LISTAGG(val,',') WITHIN GROUP ( ORDER BY val ) as final
INTO v_search_string
FROM
(
select distinct trim(regexp_substr("Classification",'[^;]+', 1, level) ) as val
from mytable
connect by regexp_substr("Classification", '[^;]+', 1, level) is not null
ORDER BY val
);
FOR i IN
(SELECT trim(regexp_substr(v_search_string, '[^,]+', 1, LEVEL)) l
FROM dual
CONNECT BY LEVEL <= regexp_count(v_search_string, ',')+1
)
LOOP
SELECT "Classification"
INTO v_classification
FROM mytable
WHERE "Classification" LIKE '%' || i.l || '%'
FETCH NEXT 1 ROWS ONLY;
dbms_output.put_line(v_classification);
END LOOP;
END;

Oracle SQL Replace multiple characters in different positions

I'm using Oracle 11g and I'm having trouble replacing multiple characters based on positions mentioned in a different table. For example:
Table 1
PRSKEY POSITION CHARACTER
123 3 ć
123 9 ć
Table 2
PRSKEY NAME
123 Becirovic
I have to change the NAME in Table 2 to Bećirović.
I've tried regexp_replace, but this function doesn't support replacing more than one position. Is there an easy way to do this?
Here's another way to do it.
with tab1 as (select 123 as prskey, 3 as position, 'ć' as character from dual
union select 123, 9, 'ć' from dual),
tab2 as (select 123 as prskey, 'Becirovic' as name from dual)
select listagg(nvl(tab1.character, namechar)) within group(order by lvl)
from
(select prskey, substr(name, level, 1) as namechar, level as lvl
from tab2
connect by level <= length(name)
) splitname
left join tab1 on position = lvl and tab1.prskey = splitname.prskey
;
A simple solution using a cursor:
create table t1 (
prskey int,
pos int,
character char(1)
);
create table t2
(
prskey int,
name varchar2(100)
);
insert into t1 values (1, 1, 'b');
insert into t1 values (1, 3, 'e');
insert into t2 values (1, 'dear');
begin
for t1rec in (select * from t1) loop
update t2
set name = substr(name, 1, t1rec.pos - 1) || t1rec.character || substr(name, t1rec.pos + 1, length(name) - t1rec.pos)
where t2.prskey = t1rec.prskey;
end loop;
end;
/
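A quick check of what the block above leaves behind (the two positional updates touch different positions, so the loop order does not matter):
select * from t2;

-- PRSKEY NAME
-- ------ ----
--      1 beer    ('dear' with position 1 set to 'b' and position 3 set to 'e')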
I would prefer an approach via PL/SQL, but your tags only include 'sql', so I made this monster:
with t as (
select 123 as id, 3 as pos, 'q' as new_char from dual
union all
select 123 as id, 6 as pos, 'z' as new_char from dual
union all
select 123 as id, 9 as pos, '1' as new_char from dual
union all
select 456 as id, 1 as pos, 'A' as new_char from dual
union all
select 456 as id, 4 as pos, 'Z' as new_char from dual
),
t1 as (
select 123 as id, 'Becirovic' as str from dual
union all
select 456 as id, 'Test' as str from dual
)
select listagg(out_text) within group (order by pos)
from(
select id, pos, new_char, str, prev, substr(str,prev,pos-prev)||new_char as out_text
from(
select id, pos, new_char, str, nvl(lag(pos) over (partition by id order by pos)+1,1) as prev
from (
select t.id, pos, new_char, str
from t, t1
where t.id = t1.id
) q
) a
) w
group by id
Result:
Beqirzvi1
AesZ

ORACLE join two tables with comma separated ids

I have two tables
Table 1
ID NAME
1 Person1
2 Person2
3 Person3
Table 2
ID GROUP_ID
1 1
2 2,3
The IDs in all the columns above refer to the same ID (Example - a Department)
My Expected output (by joining both the tables)
GROUP_ID NAME
1 Person1
2,3 Person2,Person3
Is there a query with which I can achieve this?
It can be done. You shouldn't do it, but perhaps you don't have the power to change the world. (If you have a say in it, you should normalize your table design - in your case, both the input and the output fail the first normal form).
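For illustration only, a first-normal-form layout for this data (hypothetical table and column names) would keep one membership per row instead of a CSV column, so the desired output becomes a plain join plus LISTAGG:
-- hypothetical normalized tables: one group membership per row
create table person (
  id   number primary key,
  name varchar2(50)
);
create table group_member (
  group_id  number,
  person_id number references person (id),
  primary key (group_id, person_id)
);

select gm.group_id,
       listagg(p.name, ',') within group (order by p.id) as names
from group_member gm
join person p on p.id = gm.person_id
group by gm.group_id;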
Answering more as good practice for myself... This solution guarantees that the names will be listed in the same order as the id's. It is not the most efficient, and it doesn't deal with id's in the list that are not found in the first table (it simply discards them instead of leaving a marker of some sort).
with
table_1 ( id, name ) as (
select 1, 'Person1' from dual union all
select 2, 'Person2' from dual union all
select 3, 'Person3' from dual
),
table_2 ( id, group_id ) as (
select 1, '1' from dual union all
select 2, '2,3' from dual
),
prep ( id, lvl, token ) as (
select id, level, regexp_substr(group_id, '[^,]+', 1, level)
from table_2
connect by level <= regexp_count(group_id, ',') + 1
and prior id = id
and prior sys_guid() is not null
)
select p.id, listagg(t1.name, ',') within group (order by p.lvl) as group_names
from table_1 t1 inner join prep p on t1.id = p.token
group by p.id;
ID GROUP_NAMES
---- --------------------
1 Person1
2 Person2,Person3
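If the discarded ids mentioned above should instead be kept with a marker, a hedged variant is to left join in the final step, reusing the same table_1, table_2 and prep factored subqueries from the query above (the '#' || token marker is just an arbitrary placeholder):
select p.id,
       listagg(nvl(t1.name, '#' || p.token), ',')
         within group (order by p.lvl) as group_names
from prep p
left join table_1 t1 on t1.id = p.token
group by p.id;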
select t2.group_id, listagg(t1.name,',') WITHIN GROUP (ORDER BY 1)
from table2 t2, table1 t1
where ','||t2.group_id||',' like '%,'||t1.id||',%'
group by t2.id, t2.group_id
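A small illustration of why both sides are wrapped in commas before the LIKE comparison (made-up literals; without the wrapping, id 1 would also match a group_id such as '11,12'):
select case when ',11,12,' like '%,1,%' then 'match' else 'no match' end as wrapped,
       case when '11,12'   like '%1%'   then 'match' else 'no match' end as unwrapped
from dual;

-- WRAPPED   UNWRAPPED
-- no match  match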
Normalize your data model, this is a perversion!!! Comma-separated lists should not exist in a database, only individual rows per data unit.

In oracle, how to 'group by' properties that are in comma separated values?

Say, I have a table like
Name Pets
-------------------------
Anna Cats,Dogs,Hamsters
John Cats
Jake Dogs,Cats
Jill Parrots
And I want to count, how many people have different types of pets. The output would be something like
Pets Owners
---------------
Cats 3
Dogs 2
Hamsters 1
Parrots 1
Limitations:
Reworking the DB schema is impractical. If I could do it, I wouldn't be here.
All logic must be done in one SQL query.
I can't take result table and deduce owner count later in code.
Using built-in Oracle functions is OK, but writing custom functions is discouraged.
Oracle version – 11 and up.
It's a terrible design - as you mentioned - so I don't envy you having to work with it!
It's possible to do what you're after, although I wouldn't like to say that it would be performant for larger datasets!
Assuming the name column is the primary key (or at least unique):
with t1 as (select 'Anna' name, 'Cats,Dogs,Hamsters' pets from dual union all
select 'John' name, 'Cats' pets from dual union all
select 'Jake' name, 'Dogs,Cats' pets from dual union all
select 'Jill' name, 'Parrots' pets from dual)
select pet pets,
count(*) owners
from (select name,
regexp_substr(pets, '(.*?)(,|$)', 1, level, null, 1) pet
from t1
connect by prior name = name
and prior sys_guid() is not null
and level <= regexp_count(pets, ',') + 1)
group by pet
order by owners desc, pet;
PETS OWNERS
---------- ----------
Cats 3
Dogs 2
Hamsters 1
Parrots 1
It is a bad design to store comma-separated values in a single column. You should consider normalizing the data. Such a design will always saddle you with the overhead of manipulating delimited strings.
Anyway, as a workaround, you could use REGEXP_SUBSTR and CONNECT BY to split the comma-delimited string into multiple rows and then count the pets.
There are other ways of doing it too, like XMLTABLE or the MODEL clause (see the XMLTABLE sketch after the query below). Have a look at split the comma-delimited string into multiple rows.
SQL> WITH sample_data AS(
2 SELECT 'Anna' NAME, 'Cats,Dogs,Hamsters' pets FROM dual UNION ALL
3 SELECT 'John' NAME, 'Cats' pets FROM dual UNION ALL
4 SELECT 'Jake' NAME, 'Dogs,Cats' pets FROM dual UNION ALL
5 SELECT 'Jill' NAME, 'Parrots' pets FROM dual
6 )
7 -- end of sample_data mimicking a real table
8 SELECT pets,
9 COUNT(*) cnt
10 FROM
11 (SELECT trim(regexp_substr(t.pets, '[^,]+', 1, lines.COLUMN_VALUE)) pets
12 FROM sample_data t,
13 TABLE (CAST (MULTISET
14 (SELECT LEVEL FROM dual CONNECT BY LEVEL <= regexp_count(t.pets, ',')+1
15 ) AS sys.odciNumberList ) ) lines
16 ORDER BY NAME,
17 lines.COLUMN_VALUE
18 )
19 GROUP BY pets
20 ORDER BY cnt DESC;
PETS CNT
------------------ ----------
Cats 3
Dogs 2
Hamsters 1
Parrots 1
SQL>
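As mentioned above, XMLTABLE is another option. A minimal sketch against the same sample data (this quotes each value so the XQuery engine sees a sequence of strings; it is an illustration, not part of the original answer):
with sample_data as (
  select 'Anna' name, 'Cats,Dogs,Hamsters' pets from dual union all
  select 'John' name, 'Cats' pets from dual union all
  select 'Jake' name, 'Dogs,Cats' pets from dual union all
  select 'Jill' name, 'Parrots' pets from dual
)
select xmlcast(x.column_value as varchar2(30)) as pets, count(*) as cnt
from sample_data t,
     xmltable(('"' || replace(t.pets, ',', '","') || '"')) x
group by xmlcast(x.column_value as varchar2(30))
order by cnt desc;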
My try, only with substr and instr :)
with a as (
select 'Anna' as name, 'Cats,Dogs,Hamsters' as pets from dual union all
select 'John', 'Cats' from dual union all
select 'Jake', 'Dogs,Cats' from dual union all
select 'Jill', 'Parrots' from dual
),
b as(
select name, pets, substr(pets, starting_pos, ending_pos - starting_pos) pet
from (
select name, pets,
decode(lvl, 1, 0, instr(a.pets,',',1,lvl-1))+1 starting_pos,
instr(a.pets,',',1,lvl) ending_pos
from (select name, pets||',' pets from a
)a
join (select level lvl from dual connect by level < 10)
on instr(a.pets,',', 1, lvl) > 0
)
)
--select * from b
select pet, count(*) from b group by pet;
select x pets, count(x) owners
from (
  select extractvalue(value(x), '/b') x
  from (select yourcolumn as str from yourtable) t,
       table(
         xmlsequence(
           xmltype('<a><b>' || replace(str, ',', '</b><b>') || '</b></a>').extract('/*/*')
         )
       ) x
)
group by x;
/* Replace yourcolumn with your column name (pets) and yourtable with your table name. */

Which value(s) in WHERE CLAUSE LIST are not available in the table

I want to search which value(s) in MY WHERE CLAUSE LIST are not available in the table.
Table name is test
Column1
--------------
1
2
3
My query: I have a search list 2, 3, 4, 5 and I want to see which of these are not in my database. When I query, I should get 4 and 5, and NOT 1.
I do not want the list of values which are in the table but not in the where clause list (select * from test where column1 not in (2, 3, 4, 5)).
Can someone please help?
WITH my_list AS
(SELECT regexp_substr('2,3,4,5', '[^,]+', 1, LEVEL) AS search_val
FROM dual
CONNECT BY level <= regexp_count('2,3,4,5',',') + 1
)
SELECT *
FROM my_list
WHERE NOT EXISTS
(SELECT 'X' FROM YOUR_TABLE WHERE YOUR_COLUMN = search_val
);
Let's convert the comma-separated values into an inline view and then do what's needed.
You can do it as follows:
SELECT List FROM
(SELECT 2 as List FROM dual
UNION
SELECT 3 FROM dual
UNION
SELECT 4 FROM dual
UNION
SELECT 5 FROM dual) T
WHERE List NOT IN
(SELECT Column1 FROM TableName)
In this case, I would do a simple select
select *
from test
where column1 in (2, 3, 4, 5)
and do the set operation in the host language (Java, C++, Perl, ...).
This seems far simpler than any SQL solution.
with cte as
(select 2 as val from dual
union all
select 3 from dual
union all
select 4 from dual
union all
select 5 from dual
)
select * from cte t1
where not exists
( select * from test t2 where t1.val = t2.column1)
For a large number of values you might better create a temporary table, insert the rows and then use this instead of the common table expression.
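A minimal sketch of that idea (the table name search_vals is hypothetical; a global temporary table is created once and its rows stay private to the session):
create global temporary table search_vals (val number)
  on commit delete rows;

insert into search_vals values (2);
insert into search_vals values (3);
insert into search_vals values (4);
insert into search_vals values (5);

select s.val
from search_vals s
where not exists (select 1 from test t where t.column1 = s.val);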
Try the query below:
WITH MY_DATA_TABLE AS
(
SELECT regexp_substr('2,3,4,5', '[^,]+', 1, LEVEL) AS MY_DATA_VALUE
FROM dual
CONNECT BY level <= (length('2,3,4,5') - length(replace('2,3,4,5', ','))) + 1
)
SELECT *
FROM MY_DATA_TABLE
WHERE NOT EXISTS
(SELECT 'TRUE' FROM TABLE_NAME WHERE TABLE_FIELD_VALUE = MY_DATA_VALUE
);
Your query with a huge data list would translate in Oracle to:
WITH MY_DATA_TABLE AS
(
SELECT regexp_substr('1,4,5,8,9,12,13,14,20,39,43,48,51,54,55,57,61,65,68,75,78,80,81,82,91,92,96,99,102,103,109,112,113,224,227,249,250,251,600,601,604,605,608,609,614,802', '[^,]+', 1, LEVEL) AS MY_DATA_VALUE
FROM dual
CONNECT BY level <= (length('1,4,5,8,9,12,13,14,20,39,43,48,51,54,55,57,61,65,68,75,78,80,81,82,91,92,96,99,102,103,109,112,113,224,227,249,250,251,600,601,604,605,608,609,614,802') - length(replace('1,4,5,8,9,12,13,14,20,39,43,48,51,54,55,57,61,65,68,75,78,80,81,82,91,92,96,99,102,103,109,112,113,224,227,249,250,251,600,601,604,605,608,609,614,802', ','))) + 1
)
SELECT *
FROM MY_DATA_TABLE
WHERE NOT EXISTS
(SELECT 'TRUE' FROM TABLE_NAME WHERE TABLE_FIELD_VALUE = MY_DATA_VALUE
);