SQL group by count unique values into separate columns

I have two columns, a and b, and both have categorical values. Say the table looks like this:
a b
a1 b1
a2 b2
a3 b1
......
I want to group by a and count the occurrences of each value of b in a separate column, for example:
Value b1 b2 b3
a1 5 10 3
a2 4 6 7
....
I tried SELECT a, b, count(b) FROM table GROUP BY a, b and got something like this:
a1 b1 5
a1 b2 10
....
What's the SQL query to produce the desired output? Thanks.

The query below is for BigQuery Standard SQL:
SELECT
a,
COUNTIF(b = 'b1') AS b1,
COUNTIF(b = 'b2') AS b2,
COUNTIF(b = 'b3') AS b3
FROM t
GROUP BY a
-- ORDER BY a
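COUNTIF is BigQuery-specific. As a hedged sketch for trying the idea locally, the snippet below uses Python's built-in sqlite3 module with hypothetical sample rows. It relies on the fact that SQLite evaluates a comparison such as b = 'b1' to 1 or 0, so SUM(b = 'b1') stands in for COUNTIF; this substitution is SQLite-specific, not part of the BigQuery answer.

```python
import sqlite3

# Hypothetical sample rows for a table t(a, b), shaped like the question's data.
ROWS = [("a1", "b1"), ("a1", "b2"), ("a1", "b1"), ("a2", "b3"), ("a2", "b2")]


def pivot_counts(rows):
    """Count each value of b per a, one column per value (SQLite dialect)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (a TEXT, b TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # SQLite evaluates b = 'b1' to 1 or 0 (NULL when b is NULL), so
    # SUM(b = 'b1') plays the role of BigQuery's COUNTIF(b = 'b1').
    result = con.execute(
        """
        SELECT a,
               SUM(b = 'b1') AS b1,
               SUM(b = 'b2') AS b2,
               SUM(b = 'b3') AS b3
        FROM t
        GROUP BY a
        ORDER BY a
        """
    ).fetchall()
    con.close()
    return result


print(pivot_counts(ROWS))  # [('a1', 2, 1, 0), ('a2', 0, 1, 1)]
```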

You can perform conditional addition. For example:
select
a,
sum(case when b = 'b1' then 1 else 0 end) as b1,
sum(case when b = 'b2' then 1 else 0 end) as b2,
sum(case when b = 'b3' then 1 else 0 end) as b3
from t
group by a
order by a
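To check that the conditional sum really yields one count column per value, here is a small sketch that runs the query above unchanged in SQLite (via Python's sqlite3 module, with hypothetical sample rows) and compares it with the same counts computed in plain Python.

```python
import sqlite3
from collections import Counter

# Hypothetical sample rows for t(a, b).
ROWS = [("a1", "b1"), ("a1", "b2"), ("a1", "b1"), ("a2", "b3"), ("a2", "b2")]

# The conditional-sum query from the answer, unchanged apart from layout.
QUERY = """
select a,
       sum(case when b = 'b1' then 1 else 0 end) as b1,
       sum(case when b = 'b2' then 1 else 0 end) as b2,
       sum(case when b = 'b3' then 1 else 0 end) as b3
from t
group by a
order by a
"""


def run_query(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table t (a text, b text)")
    con.executemany("insert into t values (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


def reference(rows):
    # The same numbers in plain Python: how often each (a, b) pair occurs.
    counts = Counter(rows)
    return [
        (a, counts[(a, "b1")], counts[(a, "b2")], counts[(a, "b3")])
        for a in sorted({a for a, _ in rows})
    ]


print(run_query(ROWS))
print(reference(ROWS))
```

The ELSE 0 branch matters: without it, a group with no matching rows would sum only NULLs and return NULL instead of 0.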

A simple approach would be:
Select a,
count(CASE WHEN b = 'b1' THEN 1 ELSE NULL END) b1,
count(CASE WHEN b = 'b2' THEN 1 ELSE NULL END) b2,
count(CASE WHEN b = 'b3' THEN 1 ELSE NULL END) b3
from table
group by a
order by 1
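This variant works because COUNT(expr) skips NULLs, so a CASE that yields NULL for non-matching rows counts only the matches. The sketch below (SQLite through Python's sqlite3, hypothetical data, table renamed t because table is a reserved word) runs it next to the conditional-sum form to show the two agree when each CASE tests its own value of b.

```python
import sqlite3

# Hypothetical sample rows for t(a, b).
ROWS = [("a1", "b1"), ("a1", "b2"), ("a1", "b1"), ("a2", "b3"), ("a2", "b2")]

# COUNT ignores NULLs, so only rows where the CASE returns 1 are counted.
COUNT_QUERY = """
select a,
       count(case when b = 'b1' then 1 else null end) as b1,
       count(case when b = 'b2' then 1 else null end) as b2,
       count(case when b = 'b3' then 1 else null end) as b3
from t
group by a
order by 1
"""

# Conditional-sum form, for comparison.
SUM_QUERY = """
select a,
       sum(case when b = 'b1' then 1 else 0 end) as b1,
       sum(case when b = 'b2' then 1 else 0 end) as b2,
       sum(case when b = 'b3' then 1 else 0 end) as b3
from t
group by a
order by 1
"""


def run(query, rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table t (a text, b text)")
    con.executemany("insert into t values (?, ?)", rows)
    result = con.execute(query).fetchall()
    con.close()
    return result


print(run(COUNT_QUERY, ROWS))
print(run(SUM_QUERY, ROWS))
```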

Related

Window function with where condition (conditional row_number())

I have the following clause in a SELECT statement:
ROW_NUMBER() OVER(
PARTITION BY pc
ORDER BY
a DESC, b DESC
) r
How can I apply that function only to the rows that fulfill a certain condition, without filtering the result at the end in a WHERE clause?
Sample data:
PC A B
pc1 a1 b1
pc1 a2 b2
pc1 a3 b3
Desired output (the condition in this case would be A != 'a2'):
PC A B R
pc1 a1 b1 1
pc1 a2 b2 null
pc1 a3 b3 2
EDIT: I've tried the following, but the numbering does not start from 1 for the matching rows; it keeps counting over all rows in the partition.
CASE
WHEN condition THEN
ROW_NUMBER() OVER(
PARTITION BY pc
ORDER BY
a, b
)
END r1
Use ROW_NUMBER() inside a CASE expression, with a second CASE expression in the PARTITION BY clause, as below:
(Case condition when true then ROW_NUMBER() OVER(
PARTITION BY (case condition when true then pc end)
ORDER BY
a DESC, b DESC
)
end)r
Example:
create table sampledata(PC varchar(10), A varchar(10), B varchar(10));
insert into sampledata values('pc1', 'a1', 'b1');
insert into sampledata values('pc1', 'a2', 'b2');
insert into sampledata values('pc1', 'a3', 'b3');
Query:
select *,(Case when A<>'a2' then ROW_NUMBER() OVER(
PARTITION BY (case when A<>'a2' then pc end)
ORDER BY a , b DESC
)
end)r
from sampledata order by a, b desc
Output:
pc a b r
pc1 a1 b1 1
pc1 a2 b2 null
pc1 a3 b3 2
If the condition is A<>'a1':
Query:
select *,(Case when A<>'a1' then ROW_NUMBER() OVER(
PARTITION BY (case when A<>'a1' then pc end)
ORDER BY a , b DESC
)
end)r
from sampledata order by a, b desc
Output:
pc a b r
pc1 a1 b1 null
pc1 a2 b2 1
pc1 a3 b3 2
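A minimal sketch of the same query run in SQLite from Python's sqlite3 module (window functions need SQLite 3.25 or later). The excluded value is passed as a named parameter :x, which is an illustration choice, not part of the answer. Rows that fail the condition fall into a separate partition (its key becomes NULL), so they do not consume numbers in the real partition, and the outer CASE hides whatever number they receive.

```python
import sqlite3

SAMPLE = [("pc1", "a1", "b1"), ("pc1", "a2", "b2"), ("pc1", "a3", "b3")]


def conditional_row_number(excluded, rows=SAMPLE):
    """Number only rows whose A differs from `excluded`; the others get NULL."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sampledata (pc TEXT, a TEXT, b TEXT)")
    con.executemany("INSERT INTO sampledata VALUES (?, ?, ?)", rows)
    # Non-matching rows get a NULL partition key, so they are numbered in
    # their own partition; the outer CASE replaces that number with NULL.
    result = con.execute(
        """
        SELECT a,
               CASE WHEN a <> :x THEN
                    ROW_NUMBER() OVER (
                        PARTITION BY CASE WHEN a <> :x THEN pc END
                        ORDER BY a, b DESC)
               END AS r
        FROM sampledata
        ORDER BY a, b DESC
        """,
        {"x": excluded},
    ).fetchall()
    con.close()
    return result


print(conditional_row_number("a2"))  # [('a1', 1), ('a2', None), ('a3', 2)]
print(conditional_row_number("a1"))  # [('a1', None), ('a2', 1), ('a3', 2)]
```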
db<fiddle here
I suspect you want a conditional sum:
SUM(CASE WHEN <condition> THEN 1 ELSE 0 END) OVER (
PARTITION BY pc
ORDER BY a DESC, b DESC
) as r
If you want NULL for the non-matching values:
(CASE WHEN <condition>
THEN SUM(CASE WHEN <condition> THEN 1 ELSE 0 END) OVER (
PARTITION BY pc
ORDER BY a DESC, b DESC
)
END) as r
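Here is a hedged SQLite sketch of the conditional running sum, with the condition fixed to a <> 'a2' and ascending ORDER BY a, b so that the result matches the question's desired output (the answer's a DESC, b DESC would number a3 first). It also illustrates one difference from ROW_NUMBER(): with the default RANGE frame, rows that tie on the ORDER BY columns are peers and receive the same running total.

```python
import sqlite3

SAMPLE = [("pc1", "a1", "b1"), ("pc1", "a2", "b2"), ("pc1", "a3", "b3")]


def conditional_running_sum(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sampledata (pc TEXT, a TEXT, b TEXT)")
    con.executemany("INSERT INTO sampledata VALUES (?, ?, ?)", rows)
    # The running SUM adds 1 only for rows meeting the condition, so those
    # rows get 1, 2, ...; the outer CASE turns the other rows into NULL.
    result = con.execute(
        """
        SELECT a,
               CASE WHEN a <> 'a2' THEN
                    SUM(CASE WHEN a <> 'a2' THEN 1 ELSE 0 END) OVER (
                        PARTITION BY pc
                        ORDER BY a, b)
               END AS r
        FROM sampledata
        ORDER BY a, b
        """
    ).fetchall()
    con.close()
    return result


print(conditional_running_sum(SAMPLE))
# Two identical (a, b) rows are peers in the default RANGE frame,
# so both get the same total instead of consecutive numbers.
print(conditional_running_sum(SAMPLE + [("pc1", "a4", "b4"), ("pc1", "a4", "b4")]))
```

If ties are possible and distinct numbers are required, the CASE + ROW_NUMBER() approach avoids this.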

PostgreSQL - Removing NULLS row and column from conditional aggregation results

I have a query for a multidimensional table using conditional aggregation:
select A,
SUM(case when D = 3 then D end) as SUM_D1,
SUM(case when D = 4 then D end) as SUM_D2
...
The result:
A SUM_D1 SUM_D2
-------------------
a1 100 NULL
a1 200 NULL
a3 NULL NULL
a4 NULL NULL
However, I would like to hide all NULL rows and columns as follows:
A SUM_D1
-----------
a1 100
a1 200
I have looked at similar questions, but none of them gives the answer I expect.
Any help is much appreciated,
Thank you
I think this does what you want:
select A,
coalesce(sum(case when D = 3 then D end),
sum(case when D = 4 then D end)
) as sum_d
from t
group by A
having sum(case when d in (3, 4) then 1 else 0 end) > 0;
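A small sketch of this query run in SQLite via Python's sqlite3 module, with hypothetical rows for t(A, D). An ORDER BY A is added only to make the output deterministic. Groups with no D in (3, 4), including a group whose only D is NULL, are removed by the HAVING clause.

```python
import sqlite3

# Hypothetical rows for t(A, D).
ROWS = [("a1", 3), ("a1", 3), ("a2", 4), ("a3", 5), ("a4", None)]

QUERY = """
select A,
       coalesce(sum(case when D = 3 then D end),
                sum(case when D = 4 then D end)
       ) as sum_d
from t
group by A
having sum(case when d in (3, 4) then 1 else 0 end) > 0
order by A
"""


def run(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table t (A text, D integer)")
    con.executemany("insert into t values (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(run(ROWS))  # [('a1', 6), ('a2', 4)]
```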
Note that this returns only one column -- as in your example. If both "3" and "4" are in the data, then the value is for the "3"s.
If you want a query that returns a variable number of columns, then you need to use dynamic SQL -- or some other method. SQL queries return a fixed number of columns.
One method would be to return the values as an array:
select a,
array_agg(d order by d) as ds,
array_agg(sumd order by d) as sumds
from (select a, d, sum(d) as sumd
from t
where d in (3, 4)
group by a, d
) d
group by a;
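To filter out all-NULL rows, wrap the query in a derived table and filter it with WHERE (putting the same conditions on the aggregates in HAVING also works):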
select *
from
(
select A,
SUM(case when D = 3 then D end) as SUM_D1,
SUM(case when D = 4 then D end) as SUM_D2
...
) as dt
where SUM_D1 is not null
or SUM_D2 is not null
Of course, if you have simple conditions like the ones in your example, you'd better filter before aggregating:
select A,
SUM(case when D = 3 then D end) as SUM_D1,
SUM(case when D = 4 then D end) as SUM_D2
...
where D in (3,4)
Now at least one of the two sums returns a value for every group, so there is no need to check for all-NULL rows.
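As a sketch under assumed data (hypothetical rows for t(A, D), run in SQLite through Python's sqlite3), the two row-filtering approaches can be compared directly: the derived table keeps rows where at least one sum is not NULL, while the second query filters D before aggregating. Both drop the groups that would be all-NULL.

```python
import sqlite3

# Hypothetical rows for t(A, D).
ROWS = [("a1", 3), ("a1", 3), ("a2", 4), ("a2", 3), ("a3", 5), ("a4", None)]

# Filter after aggregation: keep rows where at least one sum is not NULL.
DERIVED = """
select *
from (
    select A,
           SUM(case when D = 3 then D end) as SUM_D1,
           SUM(case when D = 4 then D end) as SUM_D2
    from t
    group by A
) as dt
where SUM_D1 is not null
   or SUM_D2 is not null
order by A
"""

# Filter before aggregation: only rows with D in (3, 4) form groups.
PREFILTER = """
select A,
       SUM(case when D = 3 then D end) as SUM_D1,
       SUM(case when D = 4 then D end) as SUM_D2
from t
where D in (3, 4)
group by A
order by A
"""


def run(query, rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table t (A text, D integer)")
    con.executemany("insert into t values (?, ?)", rows)
    result = con.execute(query).fetchall()
    con.close()
    return result


print(run(DERIVED, ROWS))    # [('a1', 6, None), ('a2', 3, 4)]
print(run(PREFILTER, ROWS))  # [('a1', 6, None), ('a2', 3, 4)]
```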
To filter all-NULL columns you need some dynamic SQL:
- materialize the data in a temporary table using INSERT/SELECT
- scan each column for all-NULLs, e.g. select 1 from temp having count(SUM_D1) > 0
- dynamically build the SELECT list based on this
- run the SELECT
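The four steps can be sketched in Python with sqlite3, where Python plays the role of the dynamic-SQL layer. The temporary table name temp_agg and the sample rows are hypothetical, and SQLite's CREATE TEMP TABLE ... AS SELECT stands in for INSERT/SELECT. Column names come from a fixed list in the code, never from user input, so building the SELECT string this way is safe here.

```python
import sqlite3

# Hypothetical rows for t(A, D); no row has D = 4, so SUM_D2 is all NULL.
ROWS = [("a1", 3), ("a1", 3), ("a2", 3), ("a3", 5)]
CANDIDATE_COLUMNS = ["SUM_D1", "SUM_D2"]


def non_null_columns(con, table, columns):
    # Step 2: keep a column only if COUNT(col) > 0, i.e. it has a non-NULL value.
    keep = []
    for col in columns:
        (n,) = con.execute(f'SELECT COUNT("{col}") FROM "{table}"').fetchone()
        if n > 0:
            keep.append(col)
    return keep


def pruned_result(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (A TEXT, D INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # Step 1: materialize the aggregated result in a temporary table.
    con.execute(
        """
        CREATE TEMP TABLE temp_agg AS
        SELECT A,
               SUM(CASE WHEN D = 3 THEN D END) AS SUM_D1,
               SUM(CASE WHEN D = 4 THEN D END) AS SUM_D2
        FROM t
        WHERE D IN (3, 4)
        GROUP BY A
        """
    )
    # Step 3: build the SELECT list from the columns that survived.
    cols = ["A"] + non_null_columns(con, "temp_agg", CANDIDATE_COLUMNS)
    select_list = ", ".join(f'"{c}"' for c in cols)
    # Step 4: run the generated SELECT.
    result = con.execute(f"SELECT {select_list} FROM temp_agg ORDER BY A").fetchall()
    con.close()
    return cols, result


print(pruned_result(ROWS))  # (['A', 'SUM_D1'], [('a1', 6), ('a2', 3)])
```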
But why do you think you need this? It will be confusing for a user to run the same Stored Procedure and receive a different number of columns for each run.
I may have misinterpreted your question because the solution seems so simple:
select A,
SUM(case when D = 3 then D end) as SUM_D1,
SUM(case when D = 4 then D end) as SUM_D2
where D is not null
This is not what you want, is it? :-)
The NULLs appear because values not handled by the CASE expressions yield NULL:
select A,
SUM(case when D = 3 then D end) as SUM_D1,
SUM(case when D = 4 then D end) as SUM_D2
from
Table1
group by
A
having
sum(case when D = 3 or D = 4 then D end) is not null
As the comment said, if you want to suppress the NULL values, you can use HAVING with IS NOT NULL.

Joining multiple select queries on same table PostgreSql

Below is the sample table structure I have:
C1 C2 C3 C4
A D G X
B E H X
C F I X
select C2 as "1_C2", C3 as "1_C3" from table1 where C1 = 'A' and C4 = 'X'
select C2 as "2_C2", C3 as "2_C3" from table1 where C1 = 'B' and C4 = 'X'
select C2 as "3_C2", C3 as "3_C3" from table1 where C1 = 'C' and C4 = 'X'
Above are the three SELECT statements I have. Now I need to combine all three and get just one row as output, like this:
1_C2 2_C2 3_C2 1_C3 2_C3 3_C3
D E F G H I
I saw multiple other posts, but none matched this requirement. Any help is highly appreciated.
You could use a CASE expression, combined with MAX():
select MAX(CASE WHEN C1 = 'A' THEN C2 END) as "1_C2",
MAX(CASE WHEN C1 = 'B' THEN C2 END) as "2_C2",
MAX(CASE WHEN C1 = 'C' THEN C2 END) as "3_C2",
MAX(CASE WHEN C1 = 'A' THEN C3 END) as "1_C3",
MAX(CASE WHEN C1 = 'B' THEN C3 END) as "2_C3",
MAX(CASE WHEN C1 = 'C' THEN C3 END) as "3_C3"
from table1
where C1 in ('A', 'B', 'C')
and C4 = 'X';
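A quick sketch to confirm the query collapses the three rows into one: it loads the sample table into SQLite via Python's sqlite3 and runs the MAX(CASE ...) query. The aliases are double-quoted because identifiers that start with a digit must be quoted in PostgreSQL; SQLite accepts the same quoting.

```python
import sqlite3

# Sample table from the question.
ROWS = [("A", "D", "G", "X"), ("B", "E", "H", "X"), ("C", "F", "I", "X")]

QUERY = """
select MAX(CASE WHEN C1 = 'A' THEN C2 END) as "1_C2",
       MAX(CASE WHEN C1 = 'B' THEN C2 END) as "2_C2",
       MAX(CASE WHEN C1 = 'C' THEN C2 END) as "3_C2",
       MAX(CASE WHEN C1 = 'A' THEN C3 END) as "1_C3",
       MAX(CASE WHEN C1 = 'B' THEN C3 END) as "2_C3",
       MAX(CASE WHEN C1 = 'C' THEN C3 END) as "3_C3"
from table1
where C1 in ('A', 'B', 'C')
  and C4 = 'X'
"""


def one_row(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table table1 (C1 text, C2 text, C3 text, C4 text)")
    con.executemany("insert into table1 values (?, ?, ?, ?)", rows)
    cur = con.execute(QUERY)
    # Without GROUP BY, the aggregates collapse all matching rows into one row.
    names = [d[0] for d in cur.description]
    row = cur.fetchone()
    con.close()
    return names, row


print(one_row(ROWS))
```

MAX() works here only because each CASE matches at most one row per column; if several rows could match, it would silently pick the largest value.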

Conditional UNPIVOT in TSQL

Let's say I have a table with an ID column and several property columns:
MyTable (ID, P1, P2, P3, P4)
ID P1 P2 P3 P4
1 A1 B C1 D1
2 C1 C2 B NULL
3 C2 Z NULL NULL
4 X A1 C1 NULL
So, I need to write a query that finds how many times each distinct property value occurs, no matter which column it is in.
Value Count
A1 2
B 2
C1 3
C2 2
X 1
...
I think I can get this by using UNPIVOT (correct me if I am wrong).
Now, how can I get a similar count, but grouped by the number of non-null values in the row (the count of non-null values per row may or may not include key columns; it doesn't matter)? I.e., output like this:
Value NonNullCount Count
A1 3 1
A1 4 1
B 3 1
B 4 1
C1 3 2
C1 4 1
C2 3 1
C2 2 1
...
Here is one method, using cross apply for the unpivot:
select vals.p, t.NonNullCount, count(*)
from (select t.*,
((case when p1 is not null then 1 else 0 end) +
(case when p2 is not null then 1 else 0 end) +
(case when p3 is not null then 1 else 0 end) +
(case when p4 is not null then 1 else 0 end)
) as NonNullCount
from MyTable t
) t cross apply
(values (p1), (p2), (p3), (p4)) vals(p)
where vals.p is not null
group by vals.p, t.NonNullCount;
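SQLite has no CROSS APPLY, so the sketch below swaps in a UNION ALL unpivot (a different technique with the same effect here) and runs it on the question's sample table through Python's sqlite3. The NonNullCount expression is the one from the answer; the column alias cnt is an illustration choice.

```python
import sqlite3

# Sample table from the question (ID, P1..P4).
ROWS = [
    (1, "A1", "B", "C1", "D1"),
    (2, "C1", "C2", "B", None),
    (3, "C2", "Z", None, None),
    (4, "X", "A1", "C1", None),
]

QUERY = """
WITH counted AS (
    SELECT t.*,
           ((CASE WHEN p1 IS NOT NULL THEN 1 ELSE 0 END) +
            (CASE WHEN p2 IS NOT NULL THEN 1 ELSE 0 END) +
            (CASE WHEN p3 IS NOT NULL THEN 1 ELSE 0 END) +
            (CASE WHEN p4 IS NOT NULL THEN 1 ELSE 0 END)) AS NonNullCount
    FROM MyTable t
),
unpivoted AS (
    -- One output row per property column, standing in for CROSS APPLY (VALUES ...).
    SELECT p1 AS p, NonNullCount FROM counted
    UNION ALL SELECT p2, NonNullCount FROM counted
    UNION ALL SELECT p3, NonNullCount FROM counted
    UNION ALL SELECT p4, NonNullCount FROM counted
)
SELECT p, NonNullCount, COUNT(*) AS cnt
FROM unpivoted
WHERE p IS NOT NULL
GROUP BY p, NonNullCount
ORDER BY p, NonNullCount
"""


def value_counts(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE MyTable (ID INTEGER, P1 TEXT, P2 TEXT, P3 TEXT, P4 TEXT)")
    con.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in value_counts(ROWS):
    print(row)
```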

SQL Update with Case/IFs

I have 8 bit columns: A1, A2, A3, A4, B1, B2, B3, B4. All 8 are completely independent, and another field should be populated based on them.
I want to update this other field with the text A, B or AB depending on which of any of the 8 columns are set to 1.
Here are a couple of examples:
- if all 8 fields are set to 1, populate with AB;
- if A3 and B1 are set to 1, populate with AB;
- if A1 and A3 are set to 1, populate with A;
- if B4 and B2 are set to 1, populate with B.
So for any combination of A1 through B4, the field should be set accordingly.
Below is what I have tried; it is incomplete, but it will give an idea:
UPDATE: correct answer from adrianm:
UPDATE m
SET ref = ASet + BSet
FROM contactMaster m
inner join contact c on
m.contactid = c.contactid
CROSS APPLY (
SELECT CASE WHEN (c.A1 | c.A2 | c.A3 | c.A4) = 1 THEN 'A' ELSE '' END AS ASet
,CASE WHEN (c.B1 | c.B2 | c.B3 | c.B4) = 1 THEN 'B' ELSE '' END AS BSet
) AS CA1
where ref is null
UPDATE ContactMaster
SET ref = ASet + BSet
FROM ContactMaster
INNER JOIN Contact
ON ContactMaster.ContactId = Contact.ContactId
CROSS APPLY (
SELECT CASE WHEN (Contact.A1 | Contact.A2 | Contact.A3 | Contact.A4) = 1 THEN 'A' ELSE '' END AS ASet
,CASE WHEN (Contact.B1 | Contact.B2 | Contact.B3 | Contact.B4) = 1 THEN 'B' ELSE '' END AS BSet
) AS CA1
WHERE ContactMaster.ref IS NULL
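The core of this answer is the bitwise OR inside CASE. As a hedged sketch, the snippet below checks that logic against the question's examples in SQLite via Python's sqlite3. It assumes a hypothetical single table contact that holds both the bit columns and ref (no join, no CROSS APPLY), and uses SQLite's || for string concatenation where T-SQL uses +.

```python
import sqlite3

BIT_COLS = ["A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4"]

# Bit patterns in A1..B4 order, taken from the question's examples plus all-zero.
EXAMPLES = [
    (1, 1, 1, 1, 1, 1, 1, 1),  # all 8 set        -> AB
    (0, 0, 1, 0, 1, 0, 0, 0),  # A3 and B1        -> AB
    (1, 0, 1, 0, 0, 0, 0, 0),  # A1 and A3        -> A
    (0, 0, 0, 0, 0, 1, 0, 1),  # B2 and B4        -> B
    (0, 0, 0, 0, 0, 0, 0, 0),  # nothing set      -> ''
]


def compute_refs(rows):
    con = sqlite3.connect(":memory:")
    col_defs = ", ".join(f"{c} INTEGER" for c in BIT_COLS)
    con.execute(f"CREATE TABLE contact (id INTEGER PRIMARY KEY, {col_defs}, ref TEXT)")
    placeholders = ", ".join("?" for _ in range(len(BIT_COLS) + 1))
    con.executemany(
        f"INSERT INTO contact (id, {', '.join(BIT_COLS)}) VALUES ({placeholders})",
        [(i, *bits) for i, bits in enumerate(rows, start=1)],
    )
    # OR-ing the 0/1 bits gives 1 if any column in the group is set.
    con.execute(
        """
        UPDATE contact
        SET ref = (CASE WHEN (A1 | A2 | A3 | A4) = 1 THEN 'A' ELSE '' END)
               || (CASE WHEN (B1 | B2 | B3 | B4) = 1 THEN 'B' ELSE '' END)
        WHERE ref IS NULL
        """
    )
    refs = [r for (r,) in con.execute("SELECT ref FROM contact ORDER BY id")]
    con.close()
    return refs


print(compute_refs(EXAMPLES))  # ['AB', 'AB', 'A', 'B', '']
```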
IF (Select Count(*) from MyTable where A1=1 AND A2 =1 AND a3 =1 AND a4 =1 AND b1 =1 AND b2 =1 AND b3 =1 AND b4=1 )>0
BEGIN
UPDATE MyTable
SET ColumnValue ='AB'
where A1=1 AND A2 =1 AND a3 =1 AND a4 =1 AND b1 =1 AND b2 =1 AND b3 =1 AND b4=1
END
ELSE IF (Select Count(*) from MyTable where A1 =1 and A3 =1 )>0
BEGIN
Update MyTable set columnValue ='A'
where A1 =1 and A3 =1
END