Flag=1/0 based on multiple criteria on same column - sql

I have a temp table that is being created; say column 1 is YearMonth, column 2 is User_id, and column 3 is Type.
YearMonth User_id Type
200101 1 x
200101 2 y
200101 2 z
200102 1 x
200103 2 x
200103 2 p
200103 2 q
I want to count user IDs per flag, where the flag depends on type. I am trying to set the flag to 1 or 0, but it always results in 0.
For example, when the type contains x or y or z AND the type contains p or q, then flag = 1, by YearMonth.
I am trying something like
SELECT count (distinct t1.user_id) as count,
t1.YearMonth,
case when t1.type in ('x','y','z')
and
t1.type in ('p','q') then 1 else 0 end as flag
FROM table t1
group by 2,3;
I would like to know why it doesn't give output as below:
count YearMonth Flag
0 200001 1
2 200001 0
1 200002 1
1 200002 0
What am I missing here? Thanks

If I follow you correctly, you can use two levels of aggregation:
select yearmonth, flag, count(*) cnt
from (
select yearmonth, user_id,
case when max(case when t1.type in ('x', 'y', 'z') then 1 else 0 end) = 1
and max(case when t1.type in ('p', 'q') then 1 else 0 end) = 1
then 1
else 0
end as flag
from mytable t1
group by yearmonth, user_id
) t
group by yearmonth, flag
This first flags users for each month, using conditional aggregation, then aggregates by flag and month.
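As a minimal sketch of why the original query always yields 0 and how the two-level aggregation behaves, the snippet below loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module. SQLite and the table name mytable are assumptions made only so the example is runnable; the question does not name its database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (yearmonth INTEGER, user_id INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(200101, 1, "x"), (200101, 2, "y"), (200101, 2, "z"), (200102, 1, "x"),
     (200103, 2, "x"), (200103, 2, "p"), (200103, 2, "q")],
)

# Row-level test: a single row holds a single type, so it can never be
# in ('x','y','z') and in ('p','q') at the same time.
row_flags = conn.execute(
    "SELECT DISTINCT CASE WHEN type IN ('x','y','z') AND type IN ('p','q') "
    "THEN 1 ELSE 0 END FROM mytable"
).fetchall()

# Two-level aggregation: flag each (month, user) pair, then count by flag.
per_month = conn.execute("""
    SELECT yearmonth, flag, COUNT(*) AS cnt
    FROM (
        SELECT yearmonth, user_id,
               CASE WHEN MAX(CASE WHEN type IN ('x','y','z') THEN 1 ELSE 0 END) = 1
                     AND MAX(CASE WHEN type IN ('p','q') THEN 1 ELSE 0 END) = 1
                    THEN 1 ELSE 0 END AS flag
        FROM mytable
        GROUP BY yearmonth, user_id
    ) AS t
    GROUP BY yearmonth, flag
    ORDER BY yearmonth, flag
""").fetchall()

print(row_flags)   # [(0,)]
print(per_month)   # [(200101, 0, 2), (200102, 0, 1), (200103, 1, 1)]
```

The row-level CASE only ever sees one type at a time, which is why the flag has to be computed over a group of rows.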
If you also want to display 0 for flags that do not appear for a given month, then you can generate the combinations with a cross join first, then bring in the above result set with a left join:
select y.yearmonth, f.flag, count(t.user_id) cnt
from (select distinct yearmonth from mytable) y
cross join (values (0), (1)) f(flag)
left join (
select yearmonth, user_id,
case when max(case when t1.type in ('x', 'y', 'z') then 1 else 0 end) = 1
and max(case when t1.type in ('p', 'q') then 1 else 0 end) = 1
then 1
else 0
end as flag
from mytable t1
group by yearmonth, user_id
) t on t.yearmonth = y.yearmonth and t.flag = f.flag
group by y.yearmonth, f.flag
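The zero-filling variant can be tried out the same way. This is a sketch under assumptions: it runs on SQLite via Python's sqlite3, and because SQLite has no `f(flag)` column-alias syntax for a VALUES list, the two flag values come from a small UNION ALL subquery instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (yearmonth INTEGER, user_id INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(200101, 1, "x"), (200101, 2, "y"), (200101, 2, "z"), (200102, 1, "x"),
     (200103, 2, "x"), (200103, 2, "p"), (200103, 2, "q")],
)

zero_filled = conn.execute("""
    SELECT y.yearmonth, f.flag, COUNT(t.user_id) AS cnt
    FROM (SELECT DISTINCT yearmonth FROM mytable) AS y
    CROSS JOIN (SELECT 0 AS flag UNION ALL SELECT 1) AS f  -- every month/flag pair
    LEFT JOIN (
        SELECT yearmonth, user_id,
               CASE WHEN MAX(CASE WHEN type IN ('x','y','z') THEN 1 ELSE 0 END) = 1
                     AND MAX(CASE WHEN type IN ('p','q') THEN 1 ELSE 0 END) = 1
                    THEN 1 ELSE 0 END AS flag
        FROM mytable
        GROUP BY yearmonth, user_id
    ) AS t ON t.yearmonth = y.yearmonth AND t.flag = f.flag
    GROUP BY y.yearmonth, f.flag
    ORDER BY y.yearmonth, f.flag
""").fetchall()

for row in zero_filled:
    print(row)
```

COUNT(t.user_id) counts only matched rows, so month/flag pairs without any user come out as 0 rather than disappearing.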

I had a very similar idea to GMB's; however, like him, I don't get the expected results. Most likely, we are both assuming that the expected results in the question are wrong:
SELECT COUNT(DISTINCT UserID) AS [Count],
YearMonth,
CASE WHEN COUNT(CASE WHEN [Type] IN ('x','y','z') THEN 1 END) > 0
AND COUNT(CASE WHEN [Type] IN ('p','q') THEN 1 END) > 0 THEN 1 ELSE 0
END AS Flag
FROM (VALUES(200101,1,'x'),
(200101,2,'y'),
(200101,2,'z'),
(200102,1,'x'),
(200103,2,'x'),
(200103,2,'p'),
(200103,2,'q')) V(YearMonth,UserID,[Type])
GROUP BY YearMonth;

Related

Select the greatest occurrence from a column, based on date if frequencies are the same

I have the following dataset with let's say ID = {1,[...],5} and Col1 = {a,b,c,Null} :
ID Col1 Date
1 a 01/10/2022
1 a 02/10/2022
1 a 03/10/2022
2 b 01/10/2022
2 c 02/10/2022
2 c 03/10/2022
3 a 01/10/2022
3 b 02/10/2022
3 Null 03/10/2022
4 c 01/10/2022
5 b 01/10/2022
5 Null 02/10/2022
5 Null 03/10/2022
I would like to group my rows by ID, compute new columns that show the number of occurrences of each value, and compute a new column that shows a string depending on the most frequent value of Col1: most a = Hi, most b = Hello, most c = Welcome, most Null = Unknown. If multiple values other than Null have the same frequency, the most recent one based on date wins.
Here is the dataset I need :
ID nb_a nb_b nb_c nb_Null greatest
1 3 0 0 0 Hi
2 0 1 2 0 Welcome
3 1 1 0 1 Hello
4 0 0 1 0 Welcome
5 0 1 0 2 Unknown
I have to do this in a compute recipe in Dataiku. The group by is handled by the group by section of the recipe, while the rest of the query needs to be done in the "custom aggregations" section of the recipe. I'm having trouble with the "if frequencies are equal, then pick the most recent" part of the code.
My SQL code looks like this :
CASE WHEN SUM(CASE WHEN Col1 = 'a' THEN 1 ELSE 0 END) >
SUM(CASE WHEN Col1 = 'b' THEN 1 ELSE 0 END)
AND SUM(CASE WHEN Col1 = 'a' THEN 1 ELSE 0 END) >
SUM(CASE WHEN Col1 = 'c' THEN 1 ELSE 0 END)
THEN 'Hi'
WHEN SUM(CASE WHEN Col1 = 'b' THEN 1 ELSE 0 END) >
SUM(CASE WHEN Col1 = 'a' THEN 1 ELSE 0 END)
AND SUM(CASE WHEN Col1 = 'b' THEN 1 ELSE 0 END) >
SUM(CASE WHEN Col1 = 'c' THEN 1 ELSE 0 END)
THEN 'Hello'
WHEN SUM(CASE WHEN Col1 = 'c' THEN 1 ELSE 0 END) >
SUM(CASE WHEN Col1 = 'a' THEN 1 ELSE 0 END)
AND SUM(CASE WHEN Col1 = 'c' THEN 1 ELSE 0 END) >
SUM(CASE WHEN Col1 = 'b' THEN 1 ELSE 0 END)
THEN 'Welcome'
Etc, etc, repeat for other cases.
But surely there must be a better way to do this right? And I have no idea how to include the most recent one when frequencies are the same.
Thank you for your help and sorry if my message isn't clear.
I tried to reproduce this in Azure Synapse using a SQL script. The approach is below.
A sample table is created as follows.
Create table tab1 (id int, col1 varchar(50), date_column date)
Insert into tab1 values(1,'a','2021-10-01')
Insert into tab1 values(1,'a','2021-10-02')
Insert into tab1 values(1,'a','2021-10-03')
Insert into tab1 values(2,'b','2021-10-01')
Insert into tab1 values(2,'c','2021-10-02')
Insert into tab1 values(2,'c','2021-10-03')
Insert into tab1 values(3,'a','2021-10-01')
Insert into tab1 values(3,'b','2021-10-02')
Insert into tab1 values(3,'Null','2021-10-03')
Insert into tab1 values(4,'c','2021-10-01')
Insert into tab1 values(5,'b','2021-10-01')
Insert into tab1 values(5,'Null','2021-10-02')
Insert into tab1 values(5,'Null','2021-10-03')
Step:1
A query is written to find the number of rows for each combination of id and col1, and the maximum date within each combination.
select
distinct id,col1,
count(*) over (partition by id,col1) as count,
case when col1='Null' then null else max(date_column) over (partition by id,col1) end as max_date
from tab1
Step:2
A row number is calculated within each id, ordered by the count and max_date columns in descending order. This way, when two or more values have the same frequency, the value with the latest date comes first.
select *, row_number() over (partition by id order by count desc, max_date desc) as row_num from
(select
distinct id,col1,
count(*) over (partition by id,col1) as count,
case when col1='Null' then null else max(date_column) over (partition by id,col1) end as max_date
from tab1)q1
Step:3
Rows with row_num = 1 are kept, and the value of the greatest column is assigned with the logic
most a = Hi, most b = Hello, most c = Welcome, most Null = Unknown.
Full Query
select id,
[greatest]=case when col1='a' then 'Hi'
when col1='b' then 'Hello'
when col1='c' then 'Welcome'
else 'Unknown'
end
from
(select *, row_number() over (partition by id order by count desc, max_date desc) as row_num from
(select
distinct id,col1,
count(*) over (partition by id,col1) as count,
case when col1='Null' then null else max(date_column) over (partition by id,col1) end as max_date
from tab1)q1
)q2 where row_num=1
With this approach, even when the frequencies are the same, the required value is chosen based on the most recent date.
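The steps above can be sketched end to end on SQLite through Python's sqlite3. Several things here are assumptions and not part of the answer: SQLite stands in for Azure Synapse, the missing values are stored as real NULLs instead of the string 'Null', the question's 2022 dates are written in ISO format so they sort correctly as text, and GROUP BY replaces the DISTINCT plus windowed COUNT. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (id INTEGER, col1 TEXT, date_column TEXT)")
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?, ?)",
    [(1, "a", "2022-10-01"), (1, "a", "2022-10-02"), (1, "a", "2022-10-03"),
     (2, "b", "2022-10-01"), (2, "c", "2022-10-02"), (2, "c", "2022-10-03"),
     (3, "a", "2022-10-01"), (3, "b", "2022-10-02"), (3, None, "2022-10-03"),
     (4, "c", "2022-10-01"),
     (5, "b", "2022-10-01"), (5, None, "2022-10-02"), (5, None, "2022-10-03")],
)

greatest = conn.execute("""
    WITH freq AS (
        -- Step 1: frequency and latest date per (id, col1); NULL gets no date
        SELECT id, col1, COUNT(*) AS cnt,
               CASE WHEN col1 IS NULL THEN NULL ELSE MAX(date_column) END AS max_date
        FROM tab1
        GROUP BY id, col1
    ),
    ranked AS (
        -- Step 2: highest count first, ties broken by most recent date
        -- (SQLite sorts NULL last under DESC, so NULL loses ties)
        SELECT id, col1,
               ROW_NUMBER() OVER (PARTITION BY id
                                  ORDER BY cnt DESC, max_date DESC) AS row_num
        FROM freq
    )
    -- Step 3: keep the winner and map it to its label
    SELECT id,
           CASE col1 WHEN 'a' THEN 'Hi' WHEN 'b' THEN 'Hello'
                     WHEN 'c' THEN 'Welcome' ELSE 'Unknown' END AS greatest
    FROM ranked
    WHERE row_num = 1
    ORDER BY id
""").fetchall()

print(greatest)
```

On the question's sample data this yields Hi, Welcome, Hello, Welcome, Unknown for IDs 1 to 5, matching the requested greatest column.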

Adding a dummy identifier to data that varies by position and value

I am working on a project in SQL Server with diagnosis codes. A patient can have up to 4 codes, but not necessarily more than 1, and a patient cannot repeat a code. However, codes can occur in any order. My goal is to be able to count how many times a diagnosis code appears in total, as well as how often it appears in each position.
My data currently resembles the following:
PtKey | Order # | Order Date | Diagnosis1 | Diagnosis2 | Diagnosis3 | Diagnosis4
345 | 1527 | 7/12/20 | J44.9 | R26.2 | NULL | NULL
367 | 1679 | 7/12/20 | R26.2 | H27.2 | G47.34 | NULL
325 | 1700 | 7/12/20 | G47.34 | NULL | NULL | NULL
327 | 1710 | 7/12/20 | I26.2 | J44.9 | G47.34 | NULL
I would think the best approach would be to create a dummy column here that would match up the diagnosis by position. For example, Diagnosis 1 with A, and Diagnosis 2 with B, etc.
My current plan is to rollup the diagnosis using an unpivot:
UNPIVOT ( Diag for ColumnALL IN (Diagnosis1, Diagnosis2, Diagnosis3, Diagnosis4)) as unpvt
However, this still doesn’t provide a way to count the diagnoses by position on a sales order.
I want it to look like this:
Diagnosis | Total Count | Diag1 Count | Diag2 Count | Diag3 Count | Diag4 Count
J44.9 | 2 | 1 | 1 | 0 | 0
R26.2 | 1 | 1 | 0 | 0 | 0
H27.2 | 1 | 0 | 1 | 0 | 0
I26.2 | 1 | 1 | 0 | 0 | 0
G47.34 | 3 | 1 | 0 | 2 | 0
You can unpivot using apply and aggregate:
select v.diagnosis, count(*) as cnt,
sum(case when pos = 1 then 1 else 0 end) as pos_1,
sum(case when pos = 2 then 1 else 0 end) as pos_2,
sum(case when pos = 3 then 1 else 0 end) as pos_3,
sum(case when pos = 4 then 1 else 0 end) as pos_4
from data d cross apply
(values (diagnosis1, 1),
(diagnosis2, 2),
(diagnosis3, 3),
(diagnosis4, 4)
) v(diagnosis, pos)
where v.diagnosis is not null
group by v.diagnosis;
Another way is to use UNPIVOT to transform the columns into groupable entities:
SELECT Diagnosis, [Total Count] = COUNT(*),
[Diag1 Count] = SUM(CASE WHEN DiagGroup = N'Diagnosis1' THEN 1 ELSE 0 END),
[Diag2 Count] = SUM(CASE WHEN DiagGroup = N'Diagnosis2' THEN 1 ELSE 0 END),
[Diag3 Count] = SUM(CASE WHEN DiagGroup = N'Diagnosis3' THEN 1 ELSE 0 END),
[Diag4 Count] = SUM(CASE WHEN DiagGroup = N'Diagnosis4' THEN 1 ELSE 0 END)
FROM
(
SELECT * FROM #x UNPIVOT (Diagnosis FOR DiagGroup IN
([Diagnosis1],[Diagnosis2],[Diagnosis3],[Diagnosis4])) up
) AS x GROUP BY Diagnosis;
Example db<>fiddle
You can also manually unpivot via UNION before doing the conditional aggregation:
SELECT Diagnosis, COUNT(*) As [Total Count]
, SUM(CASE WHEN Position = 1 THEN 1 ELSE 0 END) As [Diag1 Count]
, SUM(CASE WHEN Position = 2 THEN 1 ELSE 0 END) As [Diag2 Count]
, SUM(CASE WHEN Position = 3 THEN 1 ELSE 0 END) As [Diag3 Count]
, SUM(CASE WHEN Position = 4 THEN 1 ELSE 0 END) As [Diag4 Count]
FROM
(
SELECT PtKey, Diagnosis1 As Diagnosis, 1 As Position
FROM [MyTable]
UNION ALL
SELECT PtKey, Diagnosis2 As Diagnosis, 2 As Position
FROM [MyTable]
WHERE Diagnosis2 IS NOT NULL
UNION ALL
SELECT PtKey, Diagnosis3 As Diagnosis, 3 As Position
FROM [MyTable]
WHERE Diagnosis3 IS NOT NULL
UNION ALL
SELECT PtKey, Diagnosis4 As Diagnosis, 4 As Position
FROM [MyTable]
WHERE Diagnosis4 IS NOT NULL
) d
GROUP BY Diagnosis
Borrowing Aaron's fiddle to avoid rebuilding the schema from scratch, we get this:
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=d1f7f525e175f0f066dd1749c49cc46d
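For readers who cannot open the fiddle, here is a minimal local sketch of the UNION ALL unpivot followed by conditional aggregation. It assumes SQLite through Python's sqlite3 in place of SQL Server, and the table name orders and the snake_case column names are hypothetical; the rows are the four sample orders from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (pt_key INTEGER, order_no INTEGER, "
    "diagnosis1 TEXT, diagnosis2 TEXT, diagnosis3 TEXT, diagnosis4 TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
    [(345, 1527, "J44.9", "R26.2", None, None),
     (367, 1679, "R26.2", "H27.2", "G47.34", None),
     (325, 1700, "G47.34", None, None, None),
     (327, 1710, "I26.2", "J44.9", "G47.34", None)],
)

counts = conn.execute("""
    SELECT diagnosis,
           COUNT(*) AS total_count,
           SUM(CASE WHEN position = 1 THEN 1 ELSE 0 END) AS diag1_count,
           SUM(CASE WHEN position = 2 THEN 1 ELSE 0 END) AS diag2_count,
           SUM(CASE WHEN position = 3 THEN 1 ELSE 0 END) AS diag3_count,
           SUM(CASE WHEN position = 4 THEN 1 ELSE 0 END) AS diag4_count
    FROM (
        -- One row per (order, position); the position number replaces
        -- the "dummy identifier" the question asks about.
        SELECT diagnosis1 AS diagnosis, 1 AS position FROM orders
        UNION ALL SELECT diagnosis2, 2 FROM orders
        UNION ALL SELECT diagnosis3, 3 FROM orders
        UNION ALL SELECT diagnosis4, 4 FROM orders
    ) AS d
    WHERE diagnosis IS NOT NULL
    GROUP BY diagnosis
    ORDER BY diagnosis
""").fetchall()

for row in counts:
    print(row)
```

On these four rows R26.2 is counted twice (once in position 1, once in position 2), because it appears on two of the sample orders.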

Oracle SQL Developer question from newbie

Sorry, I could not think of a more descriptive title. I have data that looks like:
MEMBERID TICKETID STATUS
A 123 Y
A 012 N
A 456 Y
B XYZ N
B ABC N
C DEF Y
C 789 Y
I want to separate the above into three tables:
(1) Members that ONLY have tickets with Status=Y
(2) Members that have mixed status tickets (so at least one ticket with status=Y and at least one ticket with status=N)
(3) Members that ONLY have tickets with Status=N
In Excel I would just do a pivot table that results in something like:
MEMBERID "Y" "N"
A 2 1
B 0 2
C 2 0
...then add a 4th column with a formula that allows me to separate member IDs by "Only Y", "Only N", and "Y/N". I'm new to SQL though, and can't seem to get "pivot" to run correctly, or maybe there's a "where" clause that could resolve this without using pivot? Help!
You could pivot but it's probably simpler to just do the aggregation yourself:
select memberid,
count(case when status = 'Y' then ticketid end) as y,
count(case when status = 'N' then ticketid end) as n
from your_table
group by memberid
order by memberid;
To get the fourth column you can either repeat the counts within another case expression:
select memberid,
count(case when status = 'Y' then ticketid end) as y,
count(case when status = 'N' then ticketid end) as n,
case
when count(case when status = 'Y' then ticketid end) > 0
and count(case when status = 'N' then ticketid end) > 0
then 'Y/N'
when count(case when status = 'Y' then ticketid end) > 0
then 'Only Y'
when count(case when status = 'N' then ticketid end) > 0
then 'Only N'
end as yn
from your_table
group by memberid
order by memberid;
Or put the initial query into a CTE or inline view, which is clearer and has less repetition, and so is easier to maintain:
select memberid, y, n,
case
when y > 0 and n > 0 then 'Y/N'
when y > 0 then 'Only Y'
when n > 0 then 'Only N'
end as yn
from (
select memberid,
count(case when status = 'Y' then ticketid end) as y,
count(case when status = 'N' then ticketid end) as n
from your_table
group by memberid
)
order by memberid;
Either way you end up with:
MEMBERID Y N YN
-------- - - ------
A 2 1 Y/N
B 0 2 Only N
C 2 0 Only Y
SQL Fiddle
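The inline-view version can also be checked locally. This sketch assumes SQLite through Python's sqlite3 in place of Oracle and reuses the placeholder table name your_table; ticket IDs are stored as text so that 012 keeps its leading zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (memberid TEXT, ticketid TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [("A", "123", "Y"), ("A", "012", "N"), ("A", "456", "Y"),
     ("B", "XYZ", "N"), ("B", "ABC", "N"),
     ("C", "DEF", "Y"), ("C", "789", "Y")],
)

classified = conn.execute("""
    SELECT memberid, y, n,
           CASE WHEN y > 0 AND n > 0 THEN 'Y/N'
                WHEN y > 0 THEN 'Only Y'
                WHEN n > 0 THEN 'Only N'
           END AS yn
    FROM (
        -- COUNT skips the NULL that CASE returns for non-matching rows
        SELECT memberid,
               COUNT(CASE WHEN status = 'Y' THEN ticketid END) AS y,
               COUNT(CASE WHEN status = 'N' THEN ticketid END) AS n
        FROM your_table
        GROUP BY memberid
    ) AS counts
    ORDER BY memberid
""").fetchall()

for row in classified:
    print(row)
```

To split the members into three separate result sets, the same query could be filtered on the yn column, for example by wrapping it and adding a WHERE clause per category.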

Best way to do a Count in TSQL

I am not so good at T-SQL and I want to write a report in this manner:
input: Table A
ID Company Product Flag
1 A Car Y
2 A Van N
3 B Van Y
4 A Part N
Output
Company Y N
A 1 2
B 1 0
if one can assist in TSQL...
You could use conditional aggregation:
SELECT Company
,SUM(CASE WHEN Flag = 'Y' THEN 1 ELSE 0 END) AS Y
,SUM(CASE WHEN Flag = 'N' THEN 1 ELSE 0 END) AS N
FROM tab
GROUP BY Company
You are looking for conditional aggregation:
select company,
sum(case when flag = 'Y' then 1 else 0 end) as num_y,
sum(case when flag = 'N' then 1 else 0 end) as num_n
from t
group by company;
You can use CASE expressions (often called "conditional aggregation") to count the flagged products per company like this (which will ignore a record when the Product column is NULL):
SELECT Company
, COUNT(CASE Flag WHEN 'Y' THEN Product END) AS Y
, COUNT(CASE Flag WHEN 'N' THEN Product END) AS N
FROM YourTable
GROUP BY Company;
Or you can use this PIVOT query, which is a short form of writing the above:
SELECT Company, Y, N
FROM (SELECT Company, Product, Flag FROM YourTable) AS src
PIVOT (COUNT(Product) FOR Flag IN (Y, N)) AS pvt;
Use CASE WHEN:
select company,
sum(case when flag='Y' then 1 else 0 end) as Y,
sum(case when flag='N' then 1 else 0 end) as N from tabe_data
group by company
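The SUM(CASE ... 1 ELSE 0 END) and COUNT(CASE ... THEN Product END) forms in these answers differ only when Product is NULL. The sketch below shows both side by side on SQLite through Python's sqlite3. The table name table_a is an assumption, and row 5, with a NULL product, is a hypothetical addition to the question's data made purely to expose the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (id INTEGER, company TEXT, product TEXT, flag TEXT)")
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?, ?)",
    [(1, "A", "Car", "Y"), (2, "A", "Van", "N"),
     (3, "B", "Van", "Y"), (4, "A", "Part", "N"),
     (5, "B", None, "N")],  # hypothetical row, not in the question
)

report = conn.execute("""
    SELECT company,
           SUM(CASE WHEN flag = 'Y' THEN 1 ELSE 0 END) AS y_rows,      -- counts rows
           SUM(CASE WHEN flag = 'N' THEN 1 ELSE 0 END) AS n_rows,
           COUNT(CASE flag WHEN 'Y' THEN product END) AS y_products,  -- skips NULL product
           COUNT(CASE flag WHEN 'N' THEN product END) AS n_products
    FROM table_a
    GROUP BY company
    ORDER BY company
""").fetchall()

for row in report:
    print(row)
```

For company B the N row is counted by the SUM form but not by the COUNT form, since its product is NULL. Without row 5, both forms reproduce the question's expected output.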

multiple count conditions with single query

I have a table like below -
Student ID | History | Maths | Geography
1 A B B
2 C C E
3 D A B
4 E D A
How to find out how many students got A in history, B in maths and E in Geography with a single sql query ?
If you want to get number of students who got A in History in one column, number of students who got B in Maths in second column and number of students who got E in Geography in third then:
select
sum(case when [History] = 'A' then 1 else 0 end) as HistoryA,
sum(case when [Maths] = 'B' then 1 else 0 end) as MathsB,
sum(case when [Geography] = 'E' then 1 else 0 end) as GeographyE
from Table1
If you want to count students who got A in history, B in maths and E in Geography:
select count(*)
from Table1
where [History] = 'A' and [Maths] = 'B' and [Geography] = 'E'
If you want independent counts use:
SELECT SUM(CASE WHEN Condition1 THEN 1 ELSE 0 END) AS 'Condition1'
,SUM(CASE WHEN Condition2 THEN 1 ELSE 0 END) AS 'Condition2'
,SUM(CASE WHEN Condition3 THEN 1 ELSE 0 END) AS 'Condition3'
FROM YourTable
If you want multiple conditions for one count use:
SELECT COUNT(*)
FROM YourTable
WHERE Condition1
AND Condition2
AND Condition3
It sounds like you want multiple independent counts:
SELECT SUM(CASE WHEN History = 'A' THEN 1 ELSE 0 END) AS 'History A'
,SUM(CASE WHEN Maths = 'B' THEN 1 ELSE 0 END) AS 'Maths B'
,SUM(CASE WHEN Geography = 'E' THEN 1 ELSE 0 END) AS 'Geography E'
FROM YourTable
You can try to select from multiple select statements
SELECT t1.*, t2.*, t3.* FROM
(SELECT COUNT(*) AS h FROM students WHERE History = 'A') as t1,
(SELECT COUNT(*) AS m FROM students WHERE Maths = 'B') as t2,
(SELECT COUNT(*) AS g FROM students WHERE Geography = 'E') as t3
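To make the difference between the two readings of the question concrete, here is a small sketch that runs both queries on the sample table. It assumes SQLite through Python's sqlite3, and the table name students and column name student_id are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (student_id INTEGER, history TEXT, maths TEXT, geography TEXT)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [(1, "A", "B", "B"), (2, "C", "C", "E"), (3, "D", "A", "B"), (4, "E", "D", "A")],
)

# Independent counts: each condition is evaluated on its own, in a single scan.
independent = conn.execute("""
    SELECT SUM(CASE WHEN history = 'A' THEN 1 ELSE 0 END) AS history_a,
           SUM(CASE WHEN maths = 'B' THEN 1 ELSE 0 END) AS maths_b,
           SUM(CASE WHEN geography = 'E' THEN 1 ELSE 0 END) AS geography_e
    FROM students
""").fetchone()

# Combined count: a student must satisfy all three conditions at once.
combined = conn.execute("""
    SELECT COUNT(*)
    FROM students
    WHERE history = 'A' AND maths = 'B' AND geography = 'E'
""").fetchone()[0]

print(independent)  # (1, 1, 1)
print(combined)     # 0
```

On this data each grade is reached by one student, but no single student has all three, so the combined count is 0.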