I just can’t figure this one out. I've been trying for hours.
I have a table like this…
ID sample
1 A
1 B
1 C
1 D
2 A
2 B
3 A
4 A
4 B
4 C
5 B
I'm interested in getting all the samples that match 'A', 'B' and 'C' for a given ID. The ID must contain all 3 sample types. There are a lot more sample types in the table but I'm interested in just A, B and C.
Here's my desired output...
ID sample
1 A
1 B
1 C
4 A
4 B
4 C
If I use this:
WHERE sample in ('A', 'B', 'C')
I get this result:
ID sample
1 A
1 B
1 C
1 D
2 A
2 B
3 A
4 A
4 B
4 C
5 B
Any ideas on how I can get my desired output?
One ANSI-compliant way would be to aggregate using a distinct count:
select id, sample
from t
where sample in ('A','B','C')
and id in (
select id from t
where sample in ('A','B','C')
group by id
having Count(distinct sample)=3
);
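To check the distinct-count query against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the added ORDER BY are assumptions for the demo; the table name t comes from the answer above.

```python
import sqlite3

# Assumption: an in-memory SQLite table "t" loaded with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, sample TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "A"), (1, "B"), (1, "C"), (1, "D"), (2, "A"), (2, "B"),
     (3, "A"), (4, "A"), (4, "B"), (4, "C"), (5, "B")],
)

query = """
SELECT id, sample
FROM t
WHERE sample IN ('A', 'B', 'C')
  AND id IN (
      SELECT id
      FROM t
      WHERE sample IN ('A', 'B', 'C')
      GROUP BY id
      HAVING COUNT(DISTINCT sample) = 3  -- the id must have all three types
  )
ORDER BY id, sample
"""
matching_rows = conn.execute(query).fetchall()
print(matching_rows)
```

Only IDs 1 and 4 come back, and the 'D' row for ID 1 is dropped by the outer filter.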
WHERE sample in ('A', 'B', 'C')
should eliminate any other samples such as 'D'.
You could also try the following:
WHERE sample = 'A' OR sample = 'B' OR sample = 'C'
Not sure what flavor of SQL is being used, but here's an example to work off of:
PostgreSQL (db-fiddle)
SELECT id
FROM t
GROUP BY id
HAVING array_agg(sample) @> array['A', 'B', 'C']::varchar[];
-- HAVING 'A' = ANY (array_agg(sample))
-- AND 'B' = ANY (array_agg(sample))
-- AND 'C' = ANY (array_agg(sample))
Presto
SELECT id
FROM t
GROUP BY id
HAVING contains(array_agg(sample), 'A')
AND contains(array_agg(sample), 'B')
AND contains(array_agg(sample), 'C')
I have a dataset grouped by test subjects that is filled according to the actions they perform. I need to find which customer does A and then, at some point, does B; but it doesn't necessarily have to be in the next action/row. And it can't be first does B and then A, it has to be specifically in that order. For example, I have this table:
Subject ActionID ActionOrder
1 A 1
1 C 2
1 D 3
1 B 4
1 C 5
2 D 1
2 A 2
2 C 3
2 B 4
3 B 1
3 D 2
3 A 3
4 A 1
Here subjects 1 and 2 are the ones that fulfil the order-of-actions condition. Subject 3 does not, because it performs the actions in reverse order, and subject 4 only does action A.
How can I get only subjects 1 and 2 as results? Thank you very much.
Use conditional aggregation:
SELECT Subject
FROM tablename
WHERE ActionID IN ('A', 'B')
GROUP BY Subject
HAVING MAX(CASE WHEN ActionID = 'A' THEN ActionOrder END) <
MIN(CASE WHEN ActionID = 'B' THEN ActionOrder END)
See the demo.
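Here is a small sketch of the conditional aggregation run against the question's data with Python's sqlite3 module. The table name actions and the ORDER BY are assumptions for the demo. Note that a subject with no B gets NULL from the MIN(...) side, so the comparison is not true and the subject is filtered out.

```python
import sqlite3

# Assumption: a hypothetical in-memory table "actions" with the question's rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actions (Subject INTEGER, ActionID TEXT, ActionOrder INTEGER)")
conn.executemany(
    "INSERT INTO actions VALUES (?, ?, ?)",
    [(1, "A", 1), (1, "C", 2), (1, "D", 3), (1, "B", 4), (1, "C", 5),
     (2, "D", 1), (2, "A", 2), (2, "C", 3), (2, "B", 4),
     (3, "B", 1), (3, "D", 2), (3, "A", 3),
     (4, "A", 1)],
)

query = """
SELECT Subject
FROM actions
WHERE ActionID IN ('A', 'B')
GROUP BY Subject
HAVING MAX(CASE WHEN ActionID = 'A' THEN ActionOrder END) <
       MIN(CASE WHEN ActionID = 'B' THEN ActionOrder END)
ORDER BY Subject
"""
subjects = [row[0] for row in conn.execute(query)]
print(subjects)
```

Subject 3 fails the comparison (3 < 1 is false) and subject 4 has no B at all, so only subjects 1 and 2 remain.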
Consider below option
select Subject
from (
select Subject,
regexp_replace(string_agg(ActionID, '' order by ActionOrder), r'[^AB]', '') check
from `project.dataset.table`
group by Subject
)
where not starts_with(check, 'B')
and check like '%AB%'
The above assumes that a Subject can potentially do the same action multiple times, which is why there are a few extra checks in the where clause. Otherwise it would be just check = 'AB'.
I have a SQL table of the following format:
ID Cat
1 A
1 B
1 D
1 F
2 B
2 C
2 D
3 A
3 F
Now, I want to create a table with one ID per row, and multiple Cat's in a row. My desired output looks as follows:
ID A B C D E F
1 1 1 0 1 0 1
2 0 1 1 1 0 0
3 1 0 0 0 0 1
I have found:
Transform table to one-hot-encoding of single column value
However, I have more than 1000 Cat's, so I am looking for code to write this automatically, rather than manually. Who can help me with this?
First let me transform the data you pasted into an actual table:
WITH data AS (
SELECT REGEXP_EXTRACT(data2, '[0-9]') id, REGEXP_EXTRACT(data2, '[A-Z]') cat
FROM (
SELECT SPLIT("""1 A
1 B
1 D
1 F
2 B
2 C
2 D
3 A
3 F""", '\n') AS data1
), UNNEST(data1) data2
)
SELECT * FROM data
(try sharing a table next time)
Now we can do some manual 1-hot encoding:
SELECT id
, MAX(IF(cat='A',1,0)) cat_A
, MAX(IF(cat='B',1,0)) cat_B
, MAX(IF(cat='C',1,0)) cat_C
FROM data
GROUP BY id
Now we want to write a script that will automatically create the columns we want:
SELECT STRING_AGG(FORMAT("MAX(IF(cat='%s',1,0))cat_%s", cat, cat), ', ')
FROM (
SELECT DISTINCT cat
FROM data
ORDER BY 1
)
That generates a string that you can copy paste into a query, that 1-hot encodes your arrays/rows:
SELECT id
,
MAX(IF(cat='A',1,0))cat_A, MAX(IF(cat='B',1,0))cat_B, MAX(IF(cat='C',1,0))cat_C, MAX(IF(cat='D',1,0))cat_D, MAX(IF(cat='F',1,0))cat_F
FROM data
GROUP BY id
And that's exactly what the question was asking for. You can generate SQL with SQL, but you'll need to write a new query using that result.
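If a client language is available, the two steps (generate the column list, then run the generated query) can be chained in one script. Below is a rough sketch in Python with the built-in sqlite3 module standing in for the database. It uses CASE WHEN instead of IF for portability, and it assumes the category values are plain alphanumeric strings that are safe to splice into SQL text; anything else is rejected.

```python
import sqlite3

# Assumption: an in-memory SQLite table "data" with the question's rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, cat TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(1, "A"), (1, "B"), (1, "D"), (1, "F"),
     (2, "B"), (2, "C"), (2, "D"),
     (3, "A"), (3, "F")],
)

# Step 1: read the distinct categories that actually occur in the data.
cats = [row[0] for row in conn.execute("SELECT DISTINCT cat FROM data ORDER BY cat")]

# Assumption: categories are simple alphanumerics; refuse anything else
# rather than splicing arbitrary text into the generated SQL.
for cat in cats:
    if not cat.isalnum():
        raise ValueError(f"unsafe category value: {cat!r}")

# Step 2: generate one MAX(CASE ...) column per category.
columns = ", ".join(
    f"MAX(CASE WHEN cat = '{cat}' THEN 1 ELSE 0 END) AS cat_{cat}" for cat in cats
)
one_hot_sql = f"SELECT id, {columns} FROM data GROUP BY id ORDER BY id"

# Step 3: run the generated query.
one_hot_rows = conn.execute(one_hot_sql).fetchall()
print(cats)
print(one_hot_rows)
```

As with the generated string above, a category that never occurs in the data (E here) gets no column.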
BigQuery has no dynamic columns in standard SQL, but depending on what you want to do in the next step, there might be a way to make it easier.
The following code sample groups Cat by ID and uses a JavaScript function to do the one-hot encoding and return a JSON string.
CREATE TEMP FUNCTION trans(cats ARRAY<STRING>)
RETURNS STRING
LANGUAGE js
AS
"""
// TODO: Doing one hot encoding for one cat and return as JSON string
return "{a:1}";
"""
;
WITH id_cat AS (
SELECT 1 as ID, 'A' As Cat UNION ALL
SELECT 1 as ID, 'B' As Cat UNION ALL
SELECT 1 as ID, 'C' As Cat UNION ALL
SELECT 2 as ID, 'A' As Cat UNION ALL
SELECT 3 as ID, 'C' As Cat)
SELECT ID, trans(ARRAY_AGG(Cat))
FROM id_cat
GROUP BY ID;
I have a table like in example below.
SQL> select * from test;
ID PARENT_ID NAME
1 1 A
2 1 B
3 2 A
4 2 B
5 3 A
6 3 B
7 3 C
8 4 A
What I need is to get all unique subsets of names ((A,B), (A,B,C), (A)), that is, to exclude duplicate subsets. You can see that (A,B) appears twice, once for PARENT_ID=1 and once for PARENT_ID=2.
I want to exclude such duplicates:
ID PARENT_ID NAME
1 1 A
2 1 B
5 3 A
6 3 B
7 3 C
8 4 A
You can use DISTINCT to only return different values.
e.g.
SELECT DISTINCT GROUP_CONCAT(NAME SEPARATOR ',') as subsets
FROM TABLE_1
GROUP BY PARENT_ID;
SQL Fiddle
I have used 'group_concat' assuming you are using MySQL. The equivalent function in Oracle is 'listagg()'. You can see it in action here in SQL Fiddle.
Here is the solution:
Select a.* from
test a
inner join
(
Select nm, min(parent_id) as p_id
from
(
Select Parent_id, group_concat(NAME) as nm
from test
group by Parent_ID
) a
group by nm
)b
on a.Parent_id=b.p_id
order by parent_id, name
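The idea behind the query (collapse each PARENT_ID to its set of names, keep the lowest PARENT_ID per distinct set, then return the rows of the kept parents) can also be sketched outside the database. This is plain Python rather than SQL, with the question's rows hard-coded as an assumption; using a set per parent avoids depending on the order in which names are concatenated.

```python
from collections import defaultdict

# Rows from the question: (ID, PARENT_ID, NAME).
rows = [
    (1, 1, "A"), (2, 1, "B"),
    (3, 2, "A"), (4, 2, "B"),
    (5, 3, "A"), (6, 3, "B"), (7, 3, "C"),
    (8, 4, "A"),
]

# Collapse each parent to the set of its names.
names_by_parent = defaultdict(set)
for _row_id, parent_id, name in rows:
    names_by_parent[parent_id].add(name)

# Keep the lowest PARENT_ID for every distinct set of names.
first_parent_for_subset = {}
for parent_id in sorted(names_by_parent):
    subset = frozenset(names_by_parent[parent_id])
    first_parent_for_subset.setdefault(subset, parent_id)

kept_parents = set(first_parent_for_subset.values())
unique_rows = [row for row in rows if row[1] in kept_parents]
print(unique_rows)
```

PARENT_ID 2 repeats the (A,B) subset already seen for PARENT_ID 1, so its rows are left out.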
I have a table that looks similar to this:
ID OLD NEW TIME
1 a b 5
1 b c 7
1 c d 45
1 d e 4
2 a b 1
2 b d 8
2 d e 45
3 b c 15
3 c d 14
And I would like to build a report that looks like this (basically for each OLD data point grab the TIME value):
ID TimeForA TimeForB TimeForC TimeForD
1 5 7 45 4
2 1 8 NULL 45
3 NULL 15 14 NULL
I have been able to get all the data into the correct columns, but have not been able to combine each row into a single row for each ID. My current query looks like this (no I don't have every column in place yet, still just testing):
WITH CTE (id, ATime, BTime)
AS
(
select T1.oid, T1.loggedFor, null, T1.time as Atime
from Table1 T1
where T1.OLD = 'a'
union
select T1.oid, T1.loggedFor, T1.time as BTime, null
from Table1 T1
where T1.old = 'b'
)
select ID, ATime, BTime
from CTE
order by ID
Any help appreciated!
Try this:
select id,
sum(if(old = 'a',time,null)) as time_for_a,
sum(if(old = 'b',time,null)) as time_for_b,
sum(if(old = 'c',time,null)) as time_for_c,
sum(if(old = 'd',time,null)) as time_for_d
from test_tbl
group by id
order by id;
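For a quick check of the pivot against the question's data, here is a sketch using Python's sqlite3 module. SQLite has no two-branch IF function, so this version uses the equivalent CASE expression. The column names old_value and time_value are stand-ins chosen for the demo, and the NEW column is left out because the report does not use it.

```python
import sqlite3

# Assumption: in-memory table "test_tbl" with hypothetical column names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_tbl (id INTEGER, old_value TEXT, time_value INTEGER)")
conn.executemany(
    "INSERT INTO test_tbl VALUES (?, ?, ?)",
    [(1, "a", 5), (1, "b", 7), (1, "c", 45), (1, "d", 4),
     (2, "a", 1), (2, "b", 8), (2, "d", 45),
     (3, "b", 15), (3, "c", 14)],
)

query = """
SELECT id,
       SUM(CASE WHEN old_value = 'a' THEN time_value END) AS time_for_a,
       SUM(CASE WHEN old_value = 'b' THEN time_value END) AS time_for_b,
       SUM(CASE WHEN old_value = 'c' THEN time_value END) AS time_for_c,
       SUM(CASE WHEN old_value = 'd' THEN time_value END) AS time_for_d
FROM test_tbl
GROUP BY id
ORDER BY id
"""
report = conn.execute(query).fetchall()
print(report)
```

A CASE without ELSE yields NULL, and SUM over only NULLs is NULL, which is what produces the NULL cells in the desired report.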
My SQL query returns results with 4 columns "A", "B", "C", "D".
Suppose the results are:
A B C D
1 1 1 1
1 1 1 2
2 2 2 1
Is it possible to get, in each row, the count of rows that share the same values in columns "A", "B" and "C"?
e.g. the expected result is:
A B C D cnt
1 1 1 1 2
1 1 1 2 2
2 2 2 1 1
I tried using count(*) over, but it returns the total number of rows returned by the query.
One more piece of information: in the example I have mentioned only 3 columns on which I need to base the count, but my actual query has 8 such columns, and the number of rows in the database is huge. So I think group by will not be a feasible option here.
Any hint is appreciated.
Thanks.
Maybe too late, but count(*) over as an analytic function (also known as a window function) in Oracle probably helps you. If I understand your request correctly, this should solve your problem:
create table sne_test(a number(1)
,b number(1)
,c number(1)
,d number(1)
,e number(1)
,f number(1));
insert into sne_test values(1,1,1,1,1,1);
insert into sne_test values(1,1,2,1,1,1);
insert into sne_test values(1,1,2,4,1,1);
insert into sne_test values(1,1,2,5,1,1);
insert into sne_test values(1,2,1,1,3,1);
insert into sne_test values(1,2,1,2,1,2);
insert into sne_test values(2,1,1,1,1,1);
commit;
SELECT a,b,c,d,e,f,
count(*) over (PARTITION BY a,b,c) AS amount
FROM sne_test;
A B C D E F AMOUNT
-- -- -- -- -- -- ------
1 1 1 1 1 1 1
1 1 2 4 1 1 3
1 1 2 1 1 1 3
1 1 2 5 1 1 3
1 2 1 1 3 1 2
1 2 1 2 1 2 2
2 1 1 1 1 1 1
To find duplicates you must group the data based on the key column:
select
count(*)
,empno
from
emp
group by
empno
having
count(*) > 1;
This returns each empno that occurs more than once, together with the number of rows it occurs in.
You have to use a subquery where you get the count of rows, grouped by A, B and C. And then you join this subquery again with your table (or with your query), like this:
select your_table.A, your_table.B, your_table.C, your_table.D, cnt
from
your_table inner join
(SELECT A, B, C, count(*) as cnt
FROM your_table
GROUP BY A, B, C) t
on t.A = your_table.A
and t.B = your_table.B
and t.C = your_table.C
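A minimal sketch of this join, run on the three rows from the question with Python's sqlite3 module. The in-memory table and the ORDER BY are assumptions added for the demo.

```python
import sqlite3

# Assumption: in-memory table "your_table" with the question's three rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (A INTEGER, B INTEGER, C INTEGER, D INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1), (1, 1, 1, 2), (2, 2, 2, 1)],
)

query = """
SELECT your_table.A, your_table.B, your_table.C, your_table.D, t.cnt
FROM your_table
INNER JOIN (
    SELECT A, B, C, COUNT(*) AS cnt   -- one row per (A, B, C) group
    FROM your_table
    GROUP BY A, B, C
) t
  ON  t.A = your_table.A
  AND t.B = your_table.B
  AND t.C = your_table.C
ORDER BY your_table.A, your_table.B, your_table.C, your_table.D
"""
rows_with_count = conn.execute(query).fetchall()
print(rows_with_count)
```

Each original row is kept and carries the size of its (A, B, C) group, matching the expected result in the question.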