I have an Oracle database table with a field called Classification, which is a VARCHAR holding a semicolon-delimited list of values. Example:
;CHR;
;OTR;CHR;ROW;
;CHA;ROW;
;OTR;ROW;
I want to pull only the rows that contain a value not found in the other rows already pulled. It is OK if a row has a previously found value as long as it also has a new, different value.
For instance from the above dataset it would be:
;CHR;
;OTR;CHR;ROW;
;CHA;ROW;
If I do just:
Select DISTINCT Classification from Table1
I get rows whose values overlap, because DISTINCT applies to the VARCHAR as a whole.
I can get all the distinct values using:
select LISTAGG(val,',') WITHIN GROUP ( ORDER BY val ) as final
FROM
(
select distinct trim(regexp_substr("Classification",'[^;]+', 1, level) ) as val
from Table1
connect by regexp_substr("Classification", '[^;]+', 1, level) is not null
ORDER BY val
)
which gives me
FINAL
CHA,CHR,OTR,ROW
but I am unable to make the link to pull out one record per unique value.
Is this possible with SQL?
EDIT: This is a database created by a large corporation and mine purchased the product. Now I am tasked with data mining the backend database for BI and have absolutely no control of the database structure.
No offence, but many answers to the questions I have researched say 'Do better database design/normalization'. While I agree, most of the askers I have read have no control over the database and are asking for SO assistance with a problem because of this, not for ridicule of bad database design.
I apologize if I offend anyone
There is no parent/child relationship. I cannot see the object layer but I assume these values are changed in the object layer before propagating to the client as there is no link to them in the actual database
Clarification:
I see 2 ways to solve this:
1: One select statement that pulls out 1 row based on a new unique value within the VARCHAR CSV(Classification)
2: Use my select statement to loop through and pull one row containing that value in the VARCHAR CSV(Classification)
Thanks all for the input. I upvoted the ones that worked for me. In the end I will be using the one I developed just because I can easily manipulate the output(to a csv) for what the analyst wishes.
Here's one way to approach it:
Assign row numbers to the original CSV data
Split the CSV -> rows
Now assign the split CSV values row numbers, sorted by the CSV ordering from the first step
Return any rows where the row number for the previous step = 1
Return the distinct list of CSVs
For example:
with tab as (
select ';CHR;' str from dual union all
select ';OTR;CHR;ROW;' str from dual union all
select ';CHA;ROW;' str from dual union all
select ';OTR;ROW;' str from dual
), ranks as (
select row_number() over ( order by str ) rn, tab.* from tab
), rws as (
select trim ( regexp_substr(str,'[^;]+', 1, level ) ) as val, rn, str
from ranks
connect by regexp_substr ( str, '[^;]+', 1, level ) is not null
and prior rn = rn
and prior sys_guid () is not null
), rns as (
select row_number () over (
partition by val
order by rn
) val_rn, r.*
from rws r
)
select distinct str
from rns
where val_rn = 1;
STR
;CHA;ROW;
;OTR;CHR;ROW;
;CHR;
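To cross-check the expected result outside the database, the five steps can be modeled in a few lines of Python. This is only a sketch of the logic, not Oracle code: the function name `rows_with_new_value` is made up for illustration, and plain string sorting stands in for `order by str` (Oracle's actual ordering depends on the session's NLS settings).

```python
def rows_with_new_value(rows):
    """Return the rows that are the first (in sorted order) to contain some token."""
    first_row_for_token = {}
    # Steps 1-3: rank the rows, split each one, and remember the
    # lowest-ranked row in which each token appears.
    for row in sorted(set(rows)):
        tokens = [t.strip() for t in row.split(";") if t.strip()]
        for token in tokens:
            first_row_for_token.setdefault(token, row)
    # Steps 4-5: keep only those first rows, without duplicates.
    return sorted(set(first_row_for_token.values()))


sample = [";CHR;", ";OTR;CHR;ROW;", ";CHA;ROW;", ";OTR;ROW;"]
print(rows_with_new_value(sample))
# [';CHA;ROW;', ';CHR;', ';OTR;CHR;ROW;']
```

The three rows match the query output above; `;OTR;ROW;` is dropped because both of its values already appear in rows that sort earlier.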
This is an ad hoc solution proposal for the case where the generic answer performs poorly and the following restrictions are fulfilled:
all the keys have a fixed length
the maximum number of keys is known
Then, to parse the CSV string, you may use this query (add further UNION ALL branches for longer strings):
with tab as (
select ';CHR;' str from dual union all
select ';OTR;CHR;ROW;' str from dual union all
select ';CHA;ROW;' str from dual union all
select ';OTR;ROW;' str from dual
), tab2 as (
select str, substr(str,2,3) val from tab union all
select str, substr(str,6,3) val from tab where substr(str,6,3) is not null union all
select str, substr(str,10,3) val from tab where substr(str,10,3) is not null)
select * from tab2;
which results in
STR VAL
------------- ------------
;CHR; CHR
;OTR;CHR;ROW; OTR
;CHA;ROW; CHA
;OTR;ROW; OTR
;OTR;CHR;ROW; CHR
;CHA;ROW; ROW
;OTR;ROW; ROW
;OTR;CHR;ROW; ROW
Now you only need to find the first occurrence of each key and get all distinct strings containing such a first occurrence.
I'm reusing the approach from Chris Saxon's solution:
with tab as (
select ';CHR;' str from dual union all
select ';OTR;CHR;ROW;' str from dual union all
select ';CHA;ROW;' str from dual union all
select ';OTR;ROW;' str from dual
), tab2 as (
select str, substr(str,2,3) val from tab union all
select str, substr(str,6,3) val from tab where substr(str,6,3) is not null union all
select str, substr(str,10,3) val from tab where substr(str,10,3) is not null),
tab3 as (
select STR, VAL,
row_number() over (partition by val order by str) rn
from tab2)
select distinct str
from tab3
where rn = 1
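Writing the UNION ALL branches by hand gets tedious once the maximum number of keys grows. A small generator script can produce the text of the `tab2` subquery instead. The sketch below assumes fixed-length keys separated by a single `;` with a leading `;`, as in the sample data; the function name and its parameters are hypothetical.

```python
def fixed_width_split_sql(max_keys, key_len=3, table="tab", col="str"):
    """Build the UNION ALL branches that slice fixed-width keys out of a string."""
    branches = []
    for i in range(max_keys):
        # Key i starts after the leading ';' and i earlier "key + ';'" groups
        # (SUBSTR positions are 1-based).
        start = 2 + i * (key_len + 1)
        piece = f"substr({col},{start},{key_len})"
        branch = f"select {col}, {piece} val from {table}"
        if i > 0:
            # Shorter strings have no key at this position.
            branch += f" where {piece} is not null"
        branches.append(branch)
    return " union all\n".join(branches)


print(fixed_width_split_sql(3))
```

For `max_keys=3` this prints the same three branches used in `tab2` above, with start positions 2, 6 and 10.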
You were very close since you had already gotten the list of distinct values. Instead of combining them with LISTAGG, you can use that list to find a row that contains that unique value. Below are two separate queries that will return a Classification for each unique value. You can try them both and see which performs better based on the data you have in the table.
Query Option 1
WITH
table1 (classification)
AS
(SELECT ';CHR;' FROM DUAL
UNION ALL
SELECT ';OTR;CHR;ROW;' FROM DUAL
UNION ALL
SELECT ';CHA;ROW;' FROM DUAL
UNION ALL
SELECT ';OTR;ROW;' FROM DUAL),
dist_vals (val)
AS
( SELECT DISTINCT TRIM (REGEXP_SUBSTR (classification,
'[^;]+',
1,
LEVEL)) AS val
FROM Table1
CONNECT BY LEVEL < REGEXP_COUNT (classification, ';'))
SELECT val, classification
FROM (SELECT dv.val,
t.classification,
ROW_NUMBER () OVER (PARTITION BY dv.val ORDER BY t.classification) AS occurence
FROM dist_vals dv, table1 t
WHERE t.classification LIKE '%;' || dv.val || ';%')
WHERE occurence = 1;
Query Option 2
WITH
table1 (classification)
AS
(SELECT ';CHR;' FROM DUAL
UNION ALL
SELECT ';OTR;CHR;ROW;' FROM DUAL
UNION ALL
SELECT ';CHA;ROW;' FROM DUAL
UNION ALL
SELECT ';OTR;ROW;' FROM DUAL),
dist_vals (val)
AS
( SELECT DISTINCT TRIM (REGEXP_SUBSTR (classification,
'[^;]+',
1,
LEVEL)) AS val
FROM Table1
CONNECT BY LEVEL < REGEXP_COUNT (classification, ';'))
SELECT dv.val,
(SELECT classification
FROM table1
WHERE classification LIKE '%;' || dv.val || ';%' AND ROWNUM = 1)
FROM dist_vals dv;
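Both options wrap the value in semicolons inside the LIKE pattern, so that only whole tokens match. The effect can be illustrated with a small Python model. The value `CH` and the helper names below are invented for the illustration and do not come from the question's data, and LIKE wildcards inside the value are not modeled.

```python
def like_unwrapped(classification, val):
    # Models: classification LIKE '%' || val || '%'
    return val in classification


def like_wrapped(classification, val):
    # Models: classification LIKE '%;' || val || ';%'
    return f";{val};" in classification


row = ";CHR;ROW;"
print(like_unwrapped(row, "CH"))  # True: CH is a substring of CHR
print(like_wrapped(row, "CH"))    # False: there is no whole token CH
print(like_wrapped(row, "CHR"))   # True: CHR is a whole token
```

The wrapped form depends on every stored string starting and ending with `;`, which holds for the sample data.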
I figured it out this way and it runs fast (even once all my joins to other tables are added). I will test the other answers as I can and decide on the best one (the others look better than mine if they work, as I would rather not use dbms_output).
DECLARE
v_search_string varchar2(4000);
v_classification varchar2(4000);
BEGIN
select LISTAGG(val,',') WITHIN GROUP ( ORDER BY val ) as final
INTO v_search_string
FROM
(
select distinct trim(regexp_substr("Classification",'[^;]+', 1, level) ) as val
from mytable
connect by regexp_substr("Classification", '[^;]+', 1, level) is not null
ORDER BY val
);
FOR i IN
(SELECT trim(regexp_substr(v_search_string, '[^,]+', 1, LEVEL)) l
FROM dual
CONNECT BY LEVEL <= regexp_count(v_search_string, ',')+1
)
LOOP
SELECT "Classification"
INTO v_classification
FROM mytable
WHERE "Classification" LIKE '%;' || i.l || ';%'
FETCH NEXT 1 ROWS ONLY;
dbms_output.put_line(v_classification);
END LOOP;
END;
I have a string like this: aa;bb;cc
Number of chars of each block could be different.
; is the delimiter.
I need to take the values separately. For example: I want to take only the occurrence in the second position (bb).
I tried this:
SELECT trim(regexp_substr('aa;bb;cc', '[^;]+', 1, LEVEL)) str
FROM dual
CONNECT BY regexp_substr('aa;bb;cc', '[^;]+', 1, LEVEL) IS NOT NULL;
But if I do:
SELECT * FROM (SELECT trim(regexp_substr('aa;bb;cc', '[^;]+', 1, LEVEL)) str
FROM dual
CONNECT BY regexp_substr('aa;bb;cc', '[^;]+', 1, LEVEL) IS NOT NULL)
WHERE ROWNUM = 2;
It doesn't work.
Why don't you simply write this?
select trim(regexp_substr('aa;bb;cc', '[^;]+', 1, 2)) str from dual
If you want to use a hierarchical query, either alias ROWNUM in the inner query or use the LEVEL pseudocolumn:
select str
from (
select level lvl, trim(regexp_substr('aa;bb;cc', '[^;]+', 1, level)) str
from dual
connect by regexp_substr('aa;bb;cc', '[^;]+', 1, level) is not null)
where lvl = 2
WHERE ROWNUM = 2 will never return any rows, because ROWNUM is assigned to each row as it passes the WHERE clause. The first candidate row is numbered 1, fails the test ROWNUM = 2 and is discarded; the next candidate is then numbered 1 again, so ROWNUM = 2 is never reached.
The easiest fix is to use OFFSET ... FETCH (available from Oracle 12c) instead; Oracle has no LIMIT clause:
SELECT * FROM (
SELECT trim(regexp_substr('aa;bb;cc', '[^;]+', 1, LEVEL)) str
FROM dual
CONNECT BY regexp_substr('aa;bb;cc', '[^;]+', 1, LEVEL) IS NOT NULL
)
OFFSET 1 ROW FETCH NEXT 1 ROWS ONLY
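The way ROWNUM interacts with the WHERE clause can be modeled in Python. This is a simplified sketch, not Oracle's actual implementation, and the function name is made up: each candidate row is tentatively given the next counter value, and the counter only advances when the row passes the filter.

```python
def filter_with_rownum(rows, predicate):
    """Mimic WHERE <predicate on ROWNUM>: number rows as they are accepted."""
    accepted = []
    for row in rows:
        rownum = len(accepted) + 1  # next number to hand out
        if predicate(rownum):
            accepted.append(row)    # counter advances only on acceptance
    return accepted


tokens = ["aa", "bb", "cc"]
print(filter_with_rownum(tokens, lambda n: n == 1))  # ['aa']
print(filter_with_rownum(tokens, lambda n: n == 2))  # []
print(filter_with_rownum(tokens, lambda n: n <= 2))  # ['aa', 'bb']
```

In this model `ROWNUM = 1` and `ROWNUM <= 2` behave as expected, while `ROWNUM = 2` rejects every row because no row is ever accepted as number 1.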
You can use
with t(str) as
(
select 'aa;bb;cc' from dual
), t2 as
(
select trim(regexp_substr(str, '[^;]+', 1, level)) str,
level as lvl
from t
connect by regexp_substr(str, '[^;]+', 1, level) is not null
)
select str
from t2
where lvl = 2;
STR
---
bb
I suggest avoiding ROWNUM as much as possible, especially in queries with subqueries and ORDER BY clauses. In your case, WHERE ROWNUM = 1 returns a value (though the result is not trustworthy: with real values derived from tables it may not be the row you want), whereas other equalities such as ROWNUM = 2 or ROWNUM = 3 return no rows at all.
I want to search which value(s) in MY WHERE CLAUSE LIST are not available in the table.
Table name is test
Column1
--------------
1
2
3
My query: I have a search list 2, 3, 4, 5 and I want to see which of these are not in my database. When I query, I should get 4, 5 and NOT 1.
I do not want the list of values which are in the table and not in the WHERE clause list (select * from test where column1 not in (2, 3, 4, 5)).
Can someone please help ?
WITH my_list AS
(SELECT regexp_substr('2,3,4,5', '[^,]+', 1, LEVEL) AS search_val
FROM dual
CONNECT BY level <= regexp_count('2,3,4,5',',') + 1
)
SELECT *
FROM my_list
WHERE NOT EXISTS
(SELECT 'X' FROM YOUR_TABLE WHERE YOUR_COLUMN = search_val
);
Let's convert the comma-separated values into an inline view and then do what's needed.
You can do it as follows:
SELECT List FROM
(SELECT 2 as List FROM dual
UNION
SELECT 3 FROM dual
UNION
SELECT 4 FROM dual
UNION
SELECT 5 FROM dual) T
WHERE List NOT IN
(SELECT Column1 FROM TableName)
In this case, I would do a simple select
select *
from test
where column1 in (2, 3, 4, 5)
and do the set operation in the host language (Java, C++, Perl, ...).
This seems far simpler than any SQL solution.
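As an illustration of this host-language approach, here is a minimal Python sketch. The database call is left out: `found` stands for the column1 values returned by the query above and is hard-coded here as an assumption based on the sample table.

```python
search_list = [2, 3, 4, 5]

# Assumed result of: select column1 from test where column1 in (2, 3, 4, 5)
found = [2, 3]

# Values that were searched for but are not in the table, in list order.
found_set = set(found)
missing = [v for v in search_list if v not in found_set]
print(missing)  # [4, 5]
```

The value 1 never appears in the result because it is not part of the search list in the first place.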
with cte as
(select 2 as val from dual
union all
select 3 from dual
union all
select 4 from dual
union all
select 5 from dual
)
select * from cte t1
where not exists
( select * from test t2 where t1.val = t2.column1)
For a large number of values, it may be better to create a temporary table, insert the values as rows and use that table instead of the common table expression.
Try the query below:
WITH MY_DATA_TABLE AS
(
SELECT regexp_substr('2,3,4,5', '[^,]+', 1, LEVEL) AS MY_DATA_VALUE
FROM dual
CONNECT BY level <= (length('2,3,4,5') - length(replace('2,3,4,5', ','))) + 1
)
SELECT *
FROM MY_DATA_TABLE
WHERE NOT EXISTS
(SELECT 'TRUE' FROM TABLE_NAME WHERE TABLE_FIELD_VALUE = MY_DATA_VALUE
);
With a large list of values, your query would translate to the following in Oracle:
WITH MY_DATA_TABLE AS
(
SELECT regexp_substr('1,4,5,8,9,12,13,14,20,39,43,48,51,54,55,57,61,65,68,75,78,80,81,82,91,92,96,99,102,103,109,112,113,224,227,249,250,251,600,601,604,605,608,609,614,802', '[^,]+', 1, LEVEL) AS MY_DATA_VALUE
FROM dual
CONNECT BY level <= (length('1,4,5,8,9,12,13,14,20,39,43,48,51,54,55,57,61,65,68,75,78,80,81,82,91,92,96,99,102,103,109,112,113,224,227,249,250,251,600,601,604,605,608,609,614,802') - length(replace('1,4,5,8,9,12,13,14,20,39,43,48,51,54,55,57,61,65,68,75,78,80,81,82,91,92,96,99,102,103,109,112,113,224,227,249,250,251,600,601,604,605,608,609,614,802', ','))) + 1
)
SELECT *
FROM MY_DATA_TABLE
WHERE NOT EXISTS
(SELECT 'TRUE' FROM TABLE_NAME WHERE TABLE_FIELD_VALUE = MY_DATA_VALUE
);
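The CONNECT BY bound in these queries is meant to equal the number of items in the list. For a list without empty entries, that is the number of commas plus one, which a quick check in Python confirms (the variable names are illustrative):

```python
csv = "2,3,4,5"

# Same arithmetic as length(s) - length(replace(s, ',')) in SQL.
comma_count = len(csv) - len(csv.replace(",", ""))
item_count = comma_count + 1

print(comma_count, item_count)  # 3 4
assert item_count == len(csv.split(","))
```

A bound equal to the comma count alone would stop one level early and never produce the last item of the list.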
Here's my data:
with first_three as
(
select 'AAAA' as code from dual union all
select 'BBBA' as code from dual union all
select 'BBBB' as code from dual union all
select 'BBBC' as code from dual union all
select 'CCCC' as code from dual union all
select 'CCCD' as code from dual union all
select 'FFFF' as code from dual union all
select 'GFFF' as code from dual )
select substr(code,1,3) as r1
from first_three
group by substr(code,1,3)
having count(*) >1
This query returns the three-character prefixes that meet the criteria. Now, how do I select from this to get the desired results? Or is there another way?
Desired Results
BBBA
BBBB
BBBC
CCCC
CCCD
WITH code_frequency AS (
SELECT code,
COUNT(1) OVER ( PARTITION BY SUBSTR( code, 1, 3 ) ) AS frequency
FROM table_name
)
SELECT code
FROM code_frequency
WHERE frequency > 1
WITH first_three AS (
...
)
SELECT *
FROM first_three f1
WHERE EXISTS (
SELECT 1 FROM first_three f2
WHERE f1.code != f2.code
AND substr(f1.code, 1, 3) = substr(f2.code, 1, 3)
)
select code
from (select code,
count(*) over (partition by substr(code, 1, 3)) cn
from table_name)
where cn > 1;
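To verify the expected output independently of the database, the same rule (keep the codes whose three-character prefix occurs more than once) can be written in Python. This is a cross-check of the logic only, using the sample codes from the question.

```python
from collections import Counter

codes = ["AAAA", "BBBA", "BBBB", "BBBC", "CCCC", "CCCD", "FFFF", "GFFF"]

# Equivalent of COUNT(*) OVER (PARTITION BY SUBSTR(code, 1, 3)).
prefix_frequency = Counter(code[:3] for code in codes)

shared_prefix = [code for code in codes if prefix_frequency[code[:3]] > 1]
print(shared_prefix)
# ['BBBA', 'BBBB', 'BBBC', 'CCCC', 'CCCD']
```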
I have a table with a string which contains several delimited values, e.g. a;b;c.
I need to split this string and use its values in a query. For example I have following table:
str
a;b;c
b;c;d
a;c;d
I need to group by a single value from str column to get following result:
str count(*)
a 2
b 2
c 3
d 2
Is it possible to implement using single select query? I can not create temporary tables to extract values there and query against that temporary table.
From your comment on @PrzemyslawKruglej's answer:
Main problem is with internal query with connect by, it generates astonishing amount of rows
The amount of rows generated can be reduced with the following approach:
/* test table populated with sample data from your question */
create table t1(str) as (
select 'a;b;c' from dual union all
select 'b;c;d' from dual union all
select 'a;c;d' from dual
);
Table created.
-- The number of rows generated depends solely on the longest string.
-- If, say, the longest string contains 3 words (not counting the `;` separators)
-- and we have 100 rows in our table, then we end up with 300 rows
-- for further processing, no more.
with occurrence(ocr) as(
select level
from ( select max(regexp_count(str, '[^;]+')) as mx_t
from t1 ) t
connect by level <= mx_t
)
select count(regexp_substr(t1.str, '[^;]+', 1, o.ocr)) as generated_for_3_rows
from t1
cross join occurrence o;
Result: For three rows where the longest one is made up of three words, we will generate 9 rows:
GENERATED_FOR_3_ROWS
--------------------
9
Final query:
with occurrence(ocr) as(
select level
from ( select max(regexp_count(str, '[^;]+')) as mx_t
from t1 ) t
connect by level <= mx_t
)
select res
, count(res) as cnt
from (select regexp_substr(t1.str, '[^;]+', 1, o.ocr) as res
from t1
cross join occurrence o)
where res is not null
group by res
order by res;
Result:
RES CNT
----- ----------
a 2
b 2
c 3
d 2
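The row counts claimed above can be checked with a short Python model of the cross join. It is a sketch of the logic rather than of Oracle's execution: each string is paired with every occurrence number up to the maximum token count, and `None` stands for the NULL that regexp_substr returns when there is no such occurrence.

```python
from collections import Counter

strings = ["a;b;c", "b;c;d", "a;c;d"]

split_rows = [[t for t in s.split(";") if t] for s in strings]
max_tokens = max(len(tokens) for tokens in split_rows)

# Cross join: one generated row per (string, occurrence number) pair.
generated = [
    tokens[ocr] if ocr < len(tokens) else None
    for tokens in split_rows
    for ocr in range(max_tokens)
]
print(len(generated))  # 9

# where res is not null ... group by res
counts = Counter(res for res in generated if res is not None)
print(sorted(counts.items()))
# [('a', 2), ('b', 2), ('c', 3), ('d', 2)]
```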
Find out more about the regexp_count() (11g and up) and regexp_substr() regular expression functions.
Note: Regular expression functions are relatively expensive to compute, so when processing a very large amount of data it might be worth switching to plain PL/SQL.
This is ugly, but seems to work. The problem with the CONNECT BY splitting is that it returns duplicate rows. I managed to get rid of them, but you'll have to test it:
WITH
data AS (
SELECT 'a;b;c' AS val FROM dual
UNION ALL SELECT 'b;c;d' AS val FROM dual
UNION ALL SELECT 'a;c;d' AS val FROM dual
)
SELECT token, COUNT(1)
FROM (
SELECT DISTINCT token, lvl, val, p_val
FROM (
SELECT
regexp_substr(val, '[^;]+', 1, level) AS token,
level AS lvl,
val,
NVL(prior val, val) p_val
FROM data
CONNECT BY regexp_substr(val, '[^;]+', 1, level) IS NOT NULL
)
WHERE val = p_val
)
GROUP BY token;
TOKEN COUNT(1)
-------------------- ----------
d 2
b 2
a 2
c 3
SELECT NAME, COUNT(NAME)
FROM (
SELECT NAME
FROM (SELECT rownum AS ID, REGEXP_SUBSTR('a;b;c', '[^;]+', 1, LEVEL) NAME
FROM dual
CONNECT BY REGEXP_SUBSTR('a;b;c', '[^;]+', 1, LEVEL) IS NOT NULL)
UNION ALL
SELECT NAME
FROM (SELECT rownum AS ID, REGEXP_SUBSTR('b;c;d', '[^;]+', 1, LEVEL) NAME
FROM dual
CONNECT BY REGEXP_SUBSTR('b;c;d', '[^;]+', 1, LEVEL) IS NOT NULL)
UNION ALL
SELECT NAME
FROM (SELECT rownum AS ID, REGEXP_SUBSTR('a;c;d', '[^;]+', 1, LEVEL) NAME
FROM dual
CONNECT BY REGEXP_SUBSTR('a;c;d', '[^;]+', 1, LEVEL) IS NOT NULL)
)
GROUP BY NAME
NAME COUNT(NAME)
----- -----------
d 2
a 2
b 2
c 3