Redshift: count items in a comma-separated column - sql

I have data where a column stores a group of numbers:
| user | col |
| ------- | ------- |
| 1 | 3,7,11,25,44,56,77,32,34,55 |
| 2 | 3,7,25,44,37,89,56,99,103,13 |
| 1 | 3,10,11,25,44,56,33,32,34,55 |
I know I can split the column apart with SPLIT_PART and count, but is there a different way to count the numbers?
|user| new-col | count|
| ------- | ------- | ------- |
| 1 | 3 | 2 |
| 1 | 7 | 1 |
| 1 | 11 | 2 |
| 1 | 25 | 2 |
| 1 | 44 | 2 |
| 1 | 56 | 2 |
| 1 | 77 | 1 |
| 1 | 32 | 2 |

You could use a union query along with SPLIT_PART:
WITH cte AS (
    SELECT user, SPLIT_PART(col, ',', 1) AS val FROM yourTable UNION ALL
    SELECT user, SPLIT_PART(col, ',', 2) FROM yourTable UNION ALL
    SELECT user, SPLIT_PART(col, ',', 3) FROM yourTable UNION ALL
    SELECT user, SPLIT_PART(col, ',', 4) FROM yourTable UNION ALL
    SELECT user, SPLIT_PART(col, ',', 5) FROM yourTable UNION ALL
    SELECT user, SPLIT_PART(col, ',', 6) FROM yourTable UNION ALL
    SELECT user, SPLIT_PART(col, ',', 7) FROM yourTable UNION ALL
    SELECT user, SPLIT_PART(col, ',', 8) FROM yourTable UNION ALL
    SELECT user, SPLIT_PART(col, ',', 9) FROM yourTable UNION ALL
    SELECT user, SPLIT_PART(col, ',', 10) FROM yourTable
)
SELECT
    user,
    val,
    COUNT(*) AS cnt
FROM cte
GROUP BY
    user,
    val;
But note that all we are doing above in the CTE is really just normalizing your data so that each user-value relationship occupies a separate record. Ideally you should change your table design and move away from storing CSV.
If you instead want just the count of numbers per user, then use:
SELECT
    user,
    COUNT(*) AS cnt
FROM cte
GROUP BY
    user;

Query:

with t as (
    select 1 as user, '3,7,11,25,44,56,77,32,34,55' as col
    union all
    select 2 as user, '3,7,25,44,37,89,56,99,103,13' as col
    union all
    select 1 as user, '3,10,11,25,44,56,33,32,34,55' as col
)
select a.user, a.val, count(*) as cnt
from (
    select a.user
         , SPLIT_PART(a.col, ',', b.no) as val
    from t a
    cross join (
        select * from generate_series(1,10) as no
    ) b
) a
group by a.user, a.val
order by a.user, a.val
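Note that generate_series is a leader-node-only function in Redshift and typically cannot be joined against user tables. If the cross join above fails for that reason, here is a minimal sketch of a workaround, assuming at most 10 values per row, that builds the number list inline (the nums CTE name is illustrative):

with nums(no) as (
    select 1 union all select 2 union all select 3 union all select 4 union all
    select 5 union all select 6 union all select 7 union all select 8 union all
    select 9 union all select 10
)
select a.user, split_part(a.col, ',', nums.no) as val
from t a
cross join nums
where split_part(a.col, ',', nums.no) <> ''  -- drop empty slots when a row has fewer than 10 values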

Count the number of commas in the string using REGEXP_COUNT and add 1.
CREATE TEMP TABLE examples (
    user_id INT
    , value_list VARCHAR
);

INSERT INTO examples
SELECT 1, '3,7,11,25,44,56,77,32,34,55'
UNION ALL SELECT 2, '3,7,25,44,37,89,56,99,103,13'
UNION ALL SELECT 1, '3,10,11,25,44,56,33,32,34,55'
;

SELECT user_id
    , SUM(REGEXP_COUNT(value_list, ',') + 1) AS value_count
FROM examples
GROUP BY 1
;
Output
user_id | value_count
---------+-------------
1 | 20
2 | 10

This answers the original version of the question.
You can count the number of comma-delimited values with:
select (case when col = '' then 0
             else length(col) - length(replace(col, ',', '')) + 1
        end) as values_count
from t;
That said, you should fix your data model so you are not storing multiple values in a column. It is particularly irksome that you are storing numbers as strings, as well. You want a junction/association table.
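For example, a minimal sketch of such a normalized design (table and column names are illustrative, not from the question):

create table user_values (
    user_id int not null,
    val     int not null   -- one number per row, stored as an integer instead of inside a CSV string
);

-- counting then becomes a plain aggregation with no string parsing
select user_id, val, count(*) as cnt
from user_values
group by user_id, val;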

Related

How can I count the total number of rows that do not contain some value in an array field? It should include null values as well.

| names |
| -----------------------------------|
| null |
| null |
| [{name:'test'},{name:'test1'}] |
| [{name:'test'},{name:'test1'}] |
| [{name:'test1'},{name:'test2'}] |
I want to count the number of rows that do not have the value 'test' in the name key.
Here it should give the answer 3 (rows 1, 2, and 5), because none of these rows contains the value 'test'.
Use the approach below:
select count(*)
from your_table
where 0 = ifnull(array_length(regexp_extract_all(names, r"\b(name:'test')")), 0)
You can test it with the data below (which resembles what you presented in your question):

with your_table as (
    select null names union all
    select null union all
    select "[{name:'test'},{name:'test1'}]" union all
    select "[{name:'test'},{name:'test1'}]" union all
    select "[{name:'test1'},{name:'test2'}]"
)
select count(*)
from your_table
where 0 = ifnull(array_length(regexp_extract_all(names, r"\b(name:'test')")), 0)

with output of 3.
The approach below will work:
with your_table as (
    select null names union all
    select null union all
    select [('name','test'),('name','test1')] union all
    select [('name','test'),('name','test1')] union all
    select [('name','test1'),('name','test2')]
)
SELECT COUNT(*) as count
FROM your_table
WHERE NOT EXISTS (
    SELECT 1
    FROM UNNEST(your_table.names) AS names
    WHERE names IN (('name','test'))
)

create window group based on value of preceding row

I have a table like so:
#standardSQL
WITH k AS (
SELECT 1 id, 1 subgrp, 'stuff1' content UNION ALL
SELECT 2, 2, 'stuff2' UNION ALL
SELECT 3, 3, 'stuff3' UNION ALL
SELECT 4, 4, 'stuff4' UNION ALL
SELECT 5, 1, 'ostuff1' UNION ALL
SELECT 6, 2, 'ostuff2' UNION ALL
SELECT 7, 3, 'ostuff3' UNION ALL
SELECT 8, 4, 'ostuff4'
)
and I would like to group based on the subgrp value to re-create the missing grp: if a row's subgrp value is smaller than the previous row's, it starts a new group; otherwise it belongs to the same group.
Intermediate result would be:
| id | grp | subgrp | content |
| -- | --- | ------ | ------- |
| 1 | 1 | 1 | stuff1 |
| 2 | 1 | 2 | stuff2 |
| 3 | 1 | 3 | stuff3 |
| 4 | 1 | 4 | stuff4 |
| 5 | 2 | 1 | ostuff1 |
| 6 | 2 | 2 | ostuff2 |
| 7 | 2 | 3 | ostuff3 |
| 8 | 2 | 4 | ostuff4 |
on which I can then apply
SELECT grp, ARRAY_AGG(STRUCT(subgrp, content)) rcd
FROM k GROUP BY grp ORDER BY grp
to get a nice nested structure.
Notes:
with 'id' ordered, subgrp is always in sequence, so there is no 3 before a 2
groups do not always have 4 subgrps - this is just an illustration, so I cannot hardcode the group size
Problem: how can I (re)create the grp column here? I played with several window functions to no avail.
EDIT
Although Gordon's answer works, it took 3 min over 104M records to run, and I had to remove an ORDER BY on the final result set because of Resources exceeded during execution: The query could not be executed in the allotted memory. ORDER BY operator used too much memory.
Does anyone have an alternative solution for a large dataset?
A simple way to assign the group is to do a cumulative count of the subgrp = 1 values:
select k.*,
       sum(case when subgrp = 1 then 1 else 0 end) over (order by id) as grp
from k;
You can also do it your way, using lag() and a cumulative sum. That requires a subquery:
select k.*,
       sum(case when prev_subgrp = subgrp then 0 else 1 end) over (order by id) as grp
from (select k.*,
             lag(subgrp) over (order by id) as prev_subgrp
      from k
     ) k
The query below can potentially perform better, but it has a limitation: I assume there are no gaps in the numbering within subgroups and their respective ids. Under that assumption, id and subgrp increase in lockstep within a group, so the difference id - subgrp is constant for each group and can serve as the grouping key.
#standardSQL
WITH k AS (
SELECT 1 id, 1 subgrp, 'stuff1' content UNION ALL
SELECT 2, 2, 'stuff2' UNION ALL
SELECT 3, 3, 'stuff3' UNION ALL
SELECT 4, 4, 'stuff4' UNION ALL
SELECT 5, 1, 'ostuff1' UNION ALL
SELECT 6, 2, 'ostuff2' UNION ALL
SELECT 7, 3, 'ostuff3' UNION ALL
SELECT 8, 4, 'ostuff4'
)
SELECT
    ROW_NUMBER() OVER(ORDER BY id) grp,
    rcd
FROM (
    SELECT
        MIN(id) id,
        ARRAY_AGG(STRUCT(subgrp, content)) rcd
    FROM k
    GROUP BY id - subgrp
)
result is

Row  grp  rcd.subgrp  rcd.content
1    1    1           stuff1
          2           stuff2
          3           stuff3
          4           stuff4
2    2    1           ostuff1
          2           ostuff2
          3           ostuff3
          4           ostuff4

Union merging different columns

I have two tables:
Table1:
DATAID| NAME | FACTOR
1 | Ann | 1
2 | Kate | 1
3 | Piter | 1
Table2:
DATAID| NAME | FACTOR
1 | John | 2
6 | Arse | 2
3 | Garry | 2
I would like to UNION those tables and get this result:
DATAID| NAME | FACTOR
1 | Ann | 1,2
2 | Kate | 1
3 | Piter | 1,2
6 | Arse | 2
So when there are 2 rows with the same dataid, I would like to get the NAME column from Table1 and some kind of aggregated FACTOR, for example '1,2' or 3.
One method uses listagg():
select dataid, name,
       listagg(factor, ',') within group (order by factor) as factors
from ((select dataid, name, factor from table1 t1
      ) union all
      (select dataid, name, factor from table2 t2
      )
     ) t
group by dataid, name;
Note: I notice that the names are not the same for a given id. You can choose one by using aggregation functions.
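For instance, a sketch that tags each row with its source table and prefers the Table1 name (the src column is illustrative, not from the question):

select dataid,
       coalesce(max(case when src = 1 then name end), max(name)) as name,
       listagg(factor, ',') within group (order by factor) as factors
from ((select dataid, name, factor, 1 as src from table1)
      union all
      (select dataid, name, factor, 2 as src from table2)
     ) t
group by dataid;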
Or, if each dataid appears at most once in each table, you can use a full outer join:
select coalesce(t1.dataid, t2.dataid) as dataid,
       coalesce(t1.name, t2.name) as name,
       trim(leading ',' from coalesce(',' || t1.factor, '') || coalesce(',' || t2.factor, '')) as factors
from t1 full outer join
     t2
     on t1.dataid = t2.dataid;
Something like this should work. In your actual situation you will not need the first two CTEs (the subqueries in the WITH clause, which I added for testing).
with
    table1 ( dataid, name, factor ) as (
        select 1, 'Ann'  , 1 from dual union all
        select 2, 'Kate' , 1 from dual union all
        select 3, 'Piter', 1 from dual
    ),
    table2 ( dataid, name, factor ) as (
        select 1, 'John' , 2 from dual union all
        select 6, 'Arse' , 2 from dual union all
        select 3, 'Garry', 2 from dual
    ),
    u ( dataid, name, factor, source ) as (
        select dataid, name, factor, 1 from table1
        union all
        select dataid, name, factor, 2 from table2
    ),
    z ( dataid, name, factor ) as (
        select dataid, first_value(name) over (partition by dataid order by source),
               factor
        from u
    )
select dataid, name,
       listagg(factor, ',') within group (order by factor) as factor
from z
group by dataid, name
order by dataid
;
Output:
DATAID  NAME   FACTOR
------  -----  ------
     1  Ann    1,2
     2  Kate   1
     3  Piter  1,2
     6  Arse   2
4 rows selected.

Using Case in a select statement

Consider the following table
create table temp (id int, attribute varchar(25), value varchar(25))
And values into the table
insert into temp select 100, 'First', 234
insert into temp select 100, 'Second', 512
insert into temp select 100, 'Third', 320
insert into temp select 101, 'Second', 512
insert into temp select 101, 'Third', 320
I have to derive a column EndResult that depends on the attribute column. For each id, I have to go through the attribute values in the order
First, Second, Third and choose the first value that is available, i.e. for id = 100, EndResult should be 234 for the first three records.
Expected result:
| id | EndResult |
|-----|-----------|
| 100 | 234 |
| 100 | 234 |
| 100 | 234 |
| 101 | 512 |
| 101 | 512 |
I tried with the following query in vain:
select id, case when isnull(attribute,'') = 'First'
                then value
                when isnull(attribute,'') = 'Second'
                then value
                when isnull(attribute,'') = 'Third'
                then value
                else '' end as EndResult
from temp
Result
| id | EndResult |
|-----|-----------|
| 100 | 234 |
| 100 | 512 |
| 100 | 320 |
| 101 | 512 |
| 101 | 320 |
Please suggest if there's a way to get the expected result.
You can use an analytic function like dense_rank to generate a numbering, and then select the rows that have the number 1:
select
    x.id,
    x.attribute,
    x.value
from
    (select
         t.id,
         t.attribute,
         t.value,
         dense_rank() over (partition by t.id order by t.attribute) as priority
     from
         Temp t) x
where
    x.priority = 1
In your case, you can conveniently order by t.attribute, since their alphabetical order happens to be the right order. In other situations you could convert the attribute to a number using a case, like:
order by
case t.attribute
when 'One' then 1
when 'Two' then 2
when 'Three' then 3
end
In case the attribute column has values that are not in alphabetical order, you can write:
with cte as
(
    select id,
           attribute,
           value,
           case attribute when 'First' then 1
                          when 'Second' then 2
                          when 'Third' then 3 end as seq_no
    from temp
),
cte2 as
(
    select id,
           attribute,
           value,
           row_number() over (partition by id order by seq_no asc) as rownum
    from cte
)
select T.id, C.value as EndResult
from temp T
join cte2 C on T.id = C.id and C.rownum = 1
Here is how you can achieve this using ROW_NUMBER():
WITH t
AS (
    SELECT *
        ,ROW_NUMBER() OVER (
            PARTITION BY id ORDER BY (CASE attribute WHEN 'First' THEN 1
                                                     WHEN 'Second' THEN 2
                                                     WHEN 'Third' THEN 3
                                                     ELSE 0 END)
        ) rownum
    FROM TEMP
)
SELECT id
    ,(
        SELECT value
        FROM t t1
        WHERE t1.id = t.id
            AND rownum = 1
    ) end_result
FROM t;
keep it simple
;with cte as
(
    -- rank the rows for each id by attribute priority (First, Second, Third)
    -- rather than by an arbitrary order, so row_num = 1 is always the first available value
    select row_number() over (partition by id
                              order by case attribute when 'First' then 1
                                                      when 'Second' then 2
                                                      when 'Third' then 3 end) row_num, id, value
    from temp
)
select t1.id, t2.value
from temp t1
left join cte t2
    on t1.Id = t2.id
where t2.row_num = 1
Result
id value
100 234
100 234
100 234
101 512
101 512

tSQL UNPIVOT of comma-concatenated column into multiple rows

I have a table that has a value column. The value could be one value or it could be multiple values separated with a comma:
id | assess_id | question_key | item_value
---+-----------+--------------+-----------
1 | 859 | Cust_A_1 | 1,5
2 | 859 | Cust_B_1 | 2
I need to unpivot the data based on the item_value to look like this:
id | assess_id | question_key | item_value
---+-----------+--------------+-----------
1 | 859 | Cust_A_1 | 1
1 | 859 | Cust_A_1 | 5
2 | 859 | Cust_B_1 | 2
How does one do that in tSQL on SQL Server 2012?
We have a user defined function that we use for stuff like this that we called "split_delimiter":
CREATE FUNCTION [dbo].[split_delimiter](@delimited_string VARCHAR(8000), @delimiter_type CHAR(1))
RETURNS TABLE AS
RETURN
-- cte10/cte100/cte10000 build a tally of up to 10,000 rows without touching any table
WITH cte10(num) AS
(
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
,cte100(num) AS
(
    SELECT 1
    FROM cte10 t1, cte10 t2
)
,cte10000(num) AS
(
    SELECT 1
    FROM cte100 t1, cte100 t2
)
-- cte1: the numbers 1 through LEN(@delimited_string)
,cte1(num) AS
(
    SELECT TOP (ISNULL(DATALENGTH(@delimited_string),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM cte10000
)
-- cte2: starting position of each delimited element
,cte2(num) AS
(
    SELECT 1
    UNION ALL
    SELECT t.num+1
    FROM cte1 t
    WHERE SUBSTRING(@delimited_string,t.num,1) = @delimiter_type
)
-- cte3: starting position plus length of each element
,cte3(num,[len]) AS
(
    SELECT t.num
        ,ISNULL(NULLIF(CHARINDEX(@delimiter_type,@delimited_string,t.num),0)-t.num,8000)
    FROM cte2 t
)
SELECT delimited_item_num = ROW_NUMBER() OVER(ORDER BY t.num)
    ,delimited_value = SUBSTRING(@delimited_string, t.num, t.[len])
FROM cte3 t;
GO
It will take a varchar value up to 8000 characters and will return a table with the delimited elements broken into rows. In your example, you'll want to use an outer apply to turn those delimited values into separate rows:
SELECT my_table.id, my_table.assess_id, my_table.question_key, delimited_items.item_value
FROM my_table
OUTER APPLY (
    SELECT delimited_value AS item_value
    FROM my_database.dbo.split_delimiter(my_table.item_value, ',')
) AS delimited_items;
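For reference, a minimal end-to-end sketch using the sample data from the question (the temp table #my_table and its setup are illustrative, assuming the function above has been created in the current database):

CREATE TABLE #my_table (
    id INT,
    assess_id INT,
    question_key VARCHAR(50),
    item_value VARCHAR(8000)
);

INSERT INTO #my_table VALUES
    (1, 859, 'Cust_A_1', '1,5'),
    (2, 859, 'Cust_B_1', '2');

-- each comma-delimited item_value becomes its own row
SELECT t.id, t.assess_id, t.question_key, d.item_value
FROM #my_table t
OUTER APPLY (
    SELECT delimited_value AS item_value
    FROM dbo.split_delimiter(t.item_value, ',')
) AS d;
-- expected: (1, 859, Cust_A_1, 1), (1, 859, Cust_A_1, 5), (2, 859, Cust_B_1, 2)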