How can I count only the distinct values in a VARCHAR array? - sql

I have a VARCHAR column in Oracle SQL that holds arrays stored as strings. How can I count the distinct values across all of them?
For example, I have the following rows:
ID List
1 ["351","364"]
2 ["364","351"]
3 ["364","951"]
4 ["951"]
I expect the count to be 3.

Assuming you're on a recent version of Oracle, you can use JSON functions to extract the elements of the arrays:
select t.id, j.value
from your_table t
outer apply json_table (t.list, '$[*]' columns (value path '$')) j
ID VALUE
1 351
1 364
2 364
2 351
3 364
3 951
4 951
And then just count the distinct values:
select count(distinct j.value) as c
from your_table t
outer apply json_table (t.list, '$[*]' columns (value path '$')) j
C
3
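As a quick sanity check of the expected count, the same logic can be sketched outside the database. This is only an illustration in Python, assuming each List value is valid JSON; the function name count_distinct_elements is made up for the example.

```python
import json

# Sample rows from the question: (id, list stored as a JSON array string).
rows = [
    (1, '["351","364"]'),
    (2, '["364","351"]'),
    (3, '["364","951"]'),
    (4, '["951"]'),
]

def count_distinct_elements(rows):
    """Parse each JSON array string and count the distinct elements overall."""
    seen = set()
    for _id, raw in rows:
        seen.update(json.loads(raw))  # one array per row
    return len(seen)

print(count_distinct_elements(rows))  # 3
```

The set plays the role of count(distinct ...): the elements of every row are pooled first, and duplicates across rows collapse.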

Related

Get certain rows, plus rows before and after

Let's say I have the following data set:
ID Identifier Admission_Date Release_Date
234 2 5/1/22 5/5/22
234 1 4/25/22 4/30/22
234 2 4/20/22 4/24/22
234 2 4/15/22 4/18/22
789 1 7/15/22 7/19/22
789 2 7/8/22 7/14/22
789 2 7/1/22 7/5/22
321 2 6/1/21 6/3/21
321 2 5/27/21 5/31/21
321 1 5/20/21 5/26/21
321 2 5/15/21 5/19/21
321 2 5/6/21 5/10/21
I want all rows with identifier=1. I also want rows that are either directly below or above rows with Identifier=1 - sorted by most recent to least recent.
There is always a row below each row with Identifier=1. There may or may not be a row above. If an ID has no row with Identifier=1, a prior step already excludes it.
The resulting data set should be as follows:
ID Identifier Admission_Date Release_Date
234 2 5/1/22 5/5/22
234 1 4/25/22 4/30/22
234 2 4/20/22 4/24/22
789 1 7/15/22 7/19/22
789 2 7/8/22 7/14/22
321 2 5/27/21 5/31/21
321 1 5/20/21 5/26/21
321 2 5/15/21 5/19/21
I am using DBeaver, which runs PostgreSQL.
I admittedly don't know Postgres well, so the following could possibly be optimised. However, using a combination of lag and lead to obtain the previous and next dates (assuming Admission_Date is the column to order by), you could try:
with d as (
  select *,
    -- for identifier = 1 rows only: admission dates of the neighbouring rows
    case when identifier = 1 then Lag(admission_date) over(partition by id order by Admission_Date desc) end pd,
    case when identifier = 1 then Lead(admission_date) over(partition by id order by Admission_Date desc) end nd
  from t
)
select id, Identifier, Admission_Date, Release_Date
from d
where identifier = 1
  or exists (
    select * from d d2
    where d2.id = d.id
      -- pd and nd qualified explicitly: they belong to the identifier = 1 row d2
      and (d.Admission_Date = d2.pd or d.admission_date = d2.nd)
  )
order by Id, Admission_Date desc;
One way:
SELECT (x.my_row).*  -- decompose fields from row type
FROM  (
   SELECT identifier
        , lag(t)  OVER w AS t0  -- take whole row
        , t              AS t1
        , lead(t) OVER w AS t2
   FROM   tbl t
   WINDOW w AS (PARTITION BY id ORDER BY admission_date)
   ) sub
CROSS  JOIN LATERAL (
   VALUES (t0), (t1), (t2)  -- pivot
   ) x(my_row)
WHERE  sub.identifier = 1
AND    (x.my_row).id IS NOT NULL;  -- exclude rows with NULL ( = missing row)
The query is designed to only make a single pass over the table.
Uses some advanced SQL / Postgres features.
About LATERAL:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
About the VALUES expression:
Postgres: convert single row to multiple rows (unpivot)
The manual about extracting fields from a composite type.
If there are many rows per id, other solutions will be (much) faster - with proper index support. You did not specify ...
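To check either query against the expected result, the selection rule can be sketched in plain Python. This is an illustration only, assuming rows are ordered by admission date within each id and that each id has at most a handful of rows; the function name rows_around_identifier_1 is hypothetical.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question: (id, identifier, admission_date, release_date).
rows = [
    (234, 2, date(2022, 5, 1), date(2022, 5, 5)),
    (234, 1, date(2022, 4, 25), date(2022, 4, 30)),
    (234, 2, date(2022, 4, 20), date(2022, 4, 24)),
    (234, 2, date(2022, 4, 15), date(2022, 4, 18)),
    (789, 1, date(2022, 7, 15), date(2022, 7, 19)),
    (789, 2, date(2022, 7, 8), date(2022, 7, 14)),
    (789, 2, date(2022, 7, 1), date(2022, 7, 5)),
    (321, 2, date(2021, 6, 1), date(2021, 6, 3)),
    (321, 2, date(2021, 5, 27), date(2021, 5, 31)),
    (321, 1, date(2021, 5, 20), date(2021, 5, 26)),
    (321, 2, date(2021, 5, 15), date(2021, 5, 19)),
    (321, 2, date(2021, 5, 6), date(2021, 5, 10)),
]

def rows_around_identifier_1(rows):
    """Keep identifier = 1 rows plus their direct neighbours within each id."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))  # by id, then admission date
    for _id, group in groupby(ordered, key=lambda r: r[0]):
        g = list(group)
        keep = set()
        for i, row in enumerate(g):
            if row[1] == 1:
                # the row itself plus the one before and after, if present
                keep.update(j for j in (i - 1, i, i + 1) if 0 <= j < len(g))
        # most recent admission first within each id
        result.extend(g[j] for j in sorted(keep, reverse=True))
    return result

selected = rows_around_identifier_1(rows)
for row in selected:
    print(row)
```

On the sample data this keeps eight rows, the same ones as in the expected result, ordered by id and then by most recent admission date.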

Explode a row into multiple rows based on a column value presto

I am looking to explode a row into multiple rows based on the value of an integer column. I am trying to do this in Presto.
Below is an example:
id count
1 5
2 2
Expected output:
id count
1 5
1 5
1 5
1 5
1 5
2 2
2 2
In the above example, id 1 has to be repeated 5 times and id 2 has to be repeated 2 times, based on the count.
Based on my experience, Presto doesn't support recursive CTEs.
Any help would be appreciated.
Thanks
You could turn the count into an array with REPEAT and then CROSS JOIN UNNEST it.
Your input:
CREATE TABLE test AS
SELECT id, count
FROM (
  VALUES
    (1, 5),
    (2, 2)
) AS x (id, count)
Then:
SELECT id, t.count
FROM test
CROSS JOIN UNNEST(repeat(count, count)) AS t (count)
You don't need recursive CTEs here. You can use sequence or repeat to generate an array of the corresponding length and then flatten it with unnest:
-- sample data
WITH dataset (id, count) AS (
  VALUES (1, 5),
         (2, 2)
)
-- query
select id, count
from dataset,
  unnest (repeat(id, count)) as t (ignore)
  -- unnest (sequence(0, count - 1)) as t (ignore)
Output:
id count
1 5
1 5
1 5
1 5
1 5
2 2
2 2
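The repeat-then-unnest idea boils down to emitting each row count times. A minimal Python sketch of that logic, just to illustrate the shape of the result (the function name explode is made up here):

```python
# Sample rows from the question: (id, count).
rows = [(1, 5), (2, 2)]

def explode(rows):
    """Repeat each (id, count) row `count` times, like repeat + unnest."""
    return [(id_, count) for id_, count in rows for _ in range(count)]

for row in explode(rows):
    print(row)
```

A row with a count of 0 produces no output rows in this sketch.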

Split function across multiple fields in BigQuery SQL

I have data like this:
Each column will have the same number of elements across a row, where the first element in the first column corresponds to the first element in the second column etc.
How can I flatten this to get the below?
With a single column I am able to do this by combining a CROSS JOIN with an UNNEST, but I cannot get this to work with multiple columns: the join ends up creating every combination, and UNNEST loses the order of the array, so I can't match the elements.
If I were building the arrays from scratch, I would use some kind of STRUCT element in there, but I can't find a way of doing this when the arrays are created from a SPLIT().
WITH OFFSET is your friend here:
WITH strings AS (
  SELECT "a,b,c" a, "aa,bb,cc" b
  UNION ALL
  SELECT "a1,b1,c1" a, "aa1,bb1,cc1" b
)
SELECT x_a, x_b
FROM strings
  , UNNEST(SPLIT(a)) x_a WITH OFFSET o_a
  JOIN UNNEST(SPLIT(b)) x_b WITH OFFSET o_b
  ON o_a = o_b
Another approach for BigQuery Standard SQL is shown below
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 id, 'a|b|c' col1, 'n|o|p' col2 UNION ALL
  SELECT 2, 'd|e', 'q|r' UNION ALL
  SELECT 3, 'f|g|h|i', 's|t|u|v' UNION ALL
  SELECT 4, 'j', 'w' UNION ALL
  SELECT 5, 'k|l|m', 'x|y|z'
)
SELECT
  id,
  SPLIT(col1, '|')[SAFE_ORDINAL(pos)] value1,
  SPLIT(col2, '|')[SAFE_ORDINAL(pos)] value2
FROM `project.dataset.table`,
  UNNEST(GENERATE_ARRAY(1, ARRAY_LENGTH(SPLIT(col1, '|')))) pos
with expected result
Row id value1 value2
1 1 a n
2 1 b o
3 1 c p
4 2 d q
5 2 e r
6 3 f s
7 3 g t
8 3 h u
9 3 i v
10 4 j w
11 5 k x
12 5 l y
13 5 m z
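Both answers pair the elements by position. As a rough illustration of that pairing, here is a Python sketch over the same sample data. It assumes, as the question states, that both columns always hold the same number of elements; the function name flatten_pairs is hypothetical.

```python
# Sample rows from the second answer: (id, col1, col2), pipe-delimited.
rows = [
    (1, "a|b|c", "n|o|p"),
    (2, "d|e", "q|r"),
    (3, "f|g|h|i", "s|t|u|v"),
    (4, "j", "w"),
    (5, "k|l|m", "x|y|z"),
]

def flatten_pairs(rows, sep="|"):
    """Split both columns and pair the elements position by position."""
    out = []
    for id_, col1, col2 in rows:
        # zip matches the first element with the first, the second with the second, ...
        for v1, v2 in zip(col1.split(sep), col2.split(sep)):
            out.append((id_, v1, v2))
    return out

for row in flatten_pairs(rows):
    print(row)
```

Note that zip stops at the shorter list, so this sketch silently drops unmatched elements if the lengths ever differ.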

Formulating Query

I have a table 'TempC3'
Itemset itemset2
1 3
2 3
2 5
3 5
I want the combinations of the elements in these columns without repetition, so the output table should be:
Itemset itemset2 Itemset3
1 3 5
2 3 5
1 2 3
I wrote a query, but it won't return the last row of the desired output table:
Select distinct a.Itemset,
  a.Itemset2,
  c.itemset2
from TempC3 a
Join TempC3 c
  ON c.Itemset2 > a.Itemset2
This query only returns this:
Itemset itemset2 Itemset3
1 3 5
2 3 5
Since you want all combinations of itemsets, you first have to combine the two columns of your input table into a single column. You could do this, for example, using a CTE:
WITH CTE AS (
  SELECT Itemset FROM TempC3
  UNION
  SELECT Itemset2 FROM TempC3
)
SELECT I1.Itemset, I2.Itemset, I3.Itemset
FROM CTE AS I1
INNER JOIN CTE AS I2 ON I2.Itemset > I1.Itemset
INNER JOIN CTE AS I3 ON I3.Itemset > I2.Itemset
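To see what this query returns on the sample data, it can be run against an in-memory SQLite database from Python. This is only a sketch: SQLite stands in for whatever database the question uses, and the INTEGER column types are an assumption.

```python
import sqlite3

# In-memory table with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TempC3 (Itemset INTEGER, Itemset2 INTEGER)")
conn.executemany(
    "INSERT INTO TempC3 VALUES (?, ?)",
    [(1, 3), (2, 3), (2, 5), (3, 5)],
)

# The CTE pools both columns into one set of items; the two self-joins
# then pick strictly increasing triples, so no item repeats within a row.
query = """
WITH CTE AS (
    SELECT Itemset FROM TempC3
    UNION
    SELECT Itemset2 FROM TempC3
)
SELECT I1.Itemset, I2.Itemset, I3.Itemset
FROM CTE AS I1
INNER JOIN CTE AS I2 ON I2.Itemset > I1.Itemset
INNER JOIN CTE AS I3 ON I3.Itemset > I2.Itemset
"""
triples = sorted(conn.execute(query).fetchall())
print(triples)  # [(1, 2, 3), (1, 2, 5), (1, 3, 5), (2, 3, 5)]
```

The pooled items are 1, 2, 3 and 5, so the query yields all four 3-item combinations. That includes (1, 2, 5), which is not among the three rows listed as desired output in the question, so an extra filter may be needed if that row is unwanted.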

Search string value from mssql column, regex, group by

These data:
ID Desc
1 CUSTSEG
2 CUSTSEG;CARDMNU;CRC;CRCBISOA;CARDMNU;CRC;CRCBISOA
3 CUSTSEG;HKM
4 CUSTSEG;HKM;HKM
5 CUSTSEG;HKM;HKM;HKM;HKM;HKM;HKM;HKM
6 CUSTSEG;PHPM
7 CUSTSEG;PHPM;CARDMNU
8 CUSTSEG;PHPM;CARDMNU;ATM
must be queried into this format:
COUNT Desc
1 ATM
4 CARDMNU
2 CRC
2 CRCBISOA
8 CUSTSEG
10 HKM
3 PHPM
How can I achieve this? Using SUBSTRING? I've tried this:
SELECT COUNT(*), CallTraversalLog
FROM [IVR].[dbo].[tblReportData]
WHERE CallTraversalLog Like '%CUSTSEG%'
GROUP BY CallTraversalLog
But the result set I got is:
COUNT Desc
1 CUSTSEG;PHPM;CARDMNU;CRC;ATM
1 CUSTSEG;PHPM;CARDMNU;CRC;CARDMNU;CRC
1 CUSTSEG;PHPM;CARDMNU;CRC;CARDMNU;CRC;CRCBISOA
2 CUSTSEG;PHPM;CARDMNU;CRC;CC
3 CUSTSEG;PHPM;CARDMNU;CRC;CRC
2 CUSTSEG;PHPM;CARDMNU;CRC;CRC;CARDMNU;CRC
1 CUSTSEG;PHPM;CARDMNU;CRC;CRC;CRC;CRC;CARDMNU;CRC
25 CUSTSEG;PHPM;CARDMNU;CRC;CRCACTIVATION
4 CUSTSEG;PHPM;CARDMNU;CRC;CRCACTIVATION;CRCENROLL
55 CUSTSEG;PHPM;CARDMNU;CRC;CRCAPST
I would split the strings and count the items. You need a table-valued function that splits a string by a delimiter. If you don't want to write your own function, you can easily google one. Then CROSS APPLY the function to your table and count the items.
SELECT s.item, count(*)
FROM [IVR].[dbo].[tblReportData] d
CROSS APPLY dbo.fnSplitString(d.CallTraversalLog, ';') s
GROUP BY s.item
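The split-then-count logic can be checked against the sample data with a short Python sketch. It is illustrative only: the function name count_items is made up, and the rows are the eight sample values from the question rather than the real tblReportData table.

```python
from collections import Counter

# Sample rows from the question: (ID, Desc), semicolon-delimited.
rows = [
    (1, "CUSTSEG"),
    (2, "CUSTSEG;CARDMNU;CRC;CRCBISOA;CARDMNU;CRC;CRCBISOA"),
    (3, "CUSTSEG;HKM"),
    (4, "CUSTSEG;HKM;HKM"),
    (5, "CUSTSEG;HKM;HKM;HKM;HKM;HKM;HKM;HKM"),
    (6, "CUSTSEG;PHPM"),
    (7, "CUSTSEG;PHPM;CARDMNU"),
    (8, "CUSTSEG;PHPM;CARDMNU;ATM"),
]

def count_items(rows, sep=";"):
    """Split every value on the delimiter and count each item overall."""
    counts = Counter()
    for _id, desc in rows:
        counts.update(desc.split(sep))  # one increment per occurrence
    return counts

counts = count_items(rows)
for item in sorted(counts):
    print(counts[item], item)
```

Each occurrence is counted, including repeats within one row, which is how HKM reaches 10 in the expected output.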