How to use count distinct along multiple columns? - sql

Input:
id  sem1  sem2  sem3  sem4  sem5  sem6  sem7
1   S     O     S     R     null  null  null
2   O     O     R     R     S     null  null

Desired Output:
id  O  R  S
1   1  1  2
2   2  2  1

If your database supports the APPLY or UNPIVOT operator (SQL Server does), use one of the following methods.
CROSS APPLY method
SELECT id,
       SUM(CASE WHEN val = 'O' THEN 1 ELSE 0 END) O,
       SUM(CASE WHEN val = 'R' THEN 1 ELSE 0 END) R,
       SUM(CASE WHEN val = 'S' THEN 1 ELSE 0 END) S
FROM   mytable
       CROSS APPLY (VALUES (sem1),
                           (sem2),
                           (sem3),
                           (sem4),
                           (sem5),
                           (sem6),
                           (sem7)) cs(val)
GROUP  BY id
UNPIVOT method
SELECT id,
       SUM(CASE WHEN val = 'O' THEN 1 ELSE 0 END) O,
       SUM(CASE WHEN val = 'R' THEN 1 ELSE 0 END) R,
       SUM(CASE WHEN val = 'S' THEN 1 ELSE 0 END) S
FROM   (SELECT *
        FROM   mytable) a
       UNPIVOT (val
               FOR col IN (sem1,
                           sem2,
                           sem3,
                           sem4,
                           sem5,
                           sem6,
                           sem7)) upv
GROUP  BY id
I personally prefer the CROSS APPLY method over UNPIVOT because it is more readable. Performance-wise, the two should be about the same.
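If your engine has neither APPLY nor UNPIVOT, the same unpivot-then-aggregate idea can be sketched with UNION ALL. This is only an illustrative sketch run through Python's built-in sqlite3 module. The in-memory database and the use of SQLite are assumptions made so the example is runnable, and UNION ALL is swapped in for CROSS APPLY, which SQLite does not have.

```python
import sqlite3

# Assumption: an in-memory SQLite table that mirrors the question's input
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, sem1 TEXT, sem2 TEXT, sem3 TEXT, "
    "sem4 TEXT, sem5 TEXT, sem6 TEXT, sem7 TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?,?,?,?,?,?,?,?)",
    [
        (1, "S", "O", "S", "R", None, None, None),
        (2, "O", "O", "R", "R", "S", None, None),
    ],
)

# One UNION ALL branch per semester column turns the wide row into (id, val) rows
unpivot = " UNION ALL ".join(
    f"SELECT id, sem{i} AS val FROM mytable" for i in range(1, 8)
)

query = f"""
SELECT id,
       SUM(CASE WHEN val = 'O' THEN 1 ELSE 0 END) AS O,
       SUM(CASE WHEN val = 'R' THEN 1 ELSE 0 END) AS R,
       SUM(CASE WHEN val = 'S' THEN 1 ELSE 0 END) AS S
FROM ({unpivot}) AS cs
GROUP BY id
ORDER BY id
"""

rows = conn.execute(query).fetchall()
print(rows)  # [(1, 1, 1, 2), (2, 2, 2, 1)]
```

NULL semesters fall through to the ELSE 0 branch, so they do not affect any of the counts.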

Related

Counting columns with a where clause

Is there a way in Hive to count, for each row, the number of columns that hold a particular value?
I have data that looks like 'Input', and I want to count how many columns have the value 'a' and how many have the value 'b', producing something like 'Output'.
Is there a way to accomplish this with a Hive query?
One method in Hive is:
select ( (case when cl_1 = 'a' then 1 else 0 end) +
(case when cl_2 = 'a' then 1 else 0 end) +
(case when cl_3 = 'a' then 1 else 0 end) +
(case when cl_4 = 'a' then 1 else 0 end) +
(case when cl_5 = 'a' then 1 else 0 end)
) as count_a,
( (case when cl_1 = 'b' then 1 else 0 end) +
(case when cl_2 = 'b' then 1 else 0 end) +
(case when cl_3 = 'b' then 1 else 0 end) +
(case when cl_4 = 'b' then 1 else 0 end) +
(case when cl_5 = 'b' then 1 else 0 end)
) as count_b
from t;
To get the total count, I would suggest using a subquery and adding count_a and count_b.
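Here is a rough, runnable sketch of the subquery suggestion, using Python's built-in sqlite3 module instead of Hive. The sample rows are made up (the question's data is not shown here), and the rid column is a helper added only to keep the output order stable.

```python
import sqlite3

# Assumption: hypothetical sample rows in an in-memory SQLite table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (cl_1 TEXT, cl_2 TEXT, cl_3 TEXT, cl_4 TEXT, cl_5 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?,?,?,?,?)",
    [
        ("a", "b", "a", "c", "b"),
        ("b", "b", "b", "a", "c"),
    ],
)

cols = [f"cl_{i}" for i in range(1, 6)]


def count_expr(value):
    """Build the row-wise sum of CASE expressions for one target value."""
    return " + ".join(
        f"(CASE WHEN {c} = '{value}' THEN 1 ELSE 0 END)" for c in cols
    )


# The inner query computes count_a and count_b once; the outer query adds them
query = f"""
SELECT count_a, count_b, count_a + count_b AS count_total
FROM (SELECT rowid AS rid,
             {count_expr('a')} AS count_a,
             {count_expr('b')} AS count_b
      FROM t) AS s
ORDER BY rid
"""

rows = conn.execute(query).fetchall()
print(rows)  # [(2, 2, 4), (1, 3, 4)]
```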
Use lateral view with explode on the data and do the aggregations on it.
select id
,sum(cast(col='a' as int)) as cnt_a
,sum(cast(col='b' as int)) as cnt_b
,sum(cast(col in ('a','b') as int)) as cnt_total
from tbl
lateral view explode(array(ci_1,ci_2,ci_3,ci_4,ci_5)) tbl as col
group by id

JOIN always the default value, else join the match value

I have the following SQL Server Query
select r.isactive,r.workingyear,r.startperiod,r.endperiod,r.anniversary
from setup_holiday_policy t cross apply
(select data
from dbo.Split(t.scheduleapplication, ',')
) di cross apply
(select max(case when did.id = 1 then did.data end) as isactive,
max(case when did.id = 2 then did.data end) as workingyear,
max(case when did.id = 3 then did.data end) as anniversary,
max(case when did.id = 4 then did.data end) as startperiod,
max(case when did.id = 5 then did.data end) as endperiod
from dbo.Split(di.data,':') did
) r
WHERE r.workingyear = @employeeworkingyears
The setup_holiday_policy table can have a 0 value in the workingyear field. A row with 0 in this field is the default record I should return.
So, if @employeeworkingyears = 2 and there is no row with workingyear = 2 in setup_holiday_policy, I should return the default row that has 0 in the workingyear field.
This is a sample of the rows returned.
Any clue how to achieve this?
If only one row is going to be returned (as suggested by the sample data), you can do this using top:
select top 1 r.isactive,r.workingyear,r.startperiod,r.endperiod,r.anniversary
from setup_holiday_policy t cross apply
(select data
from dbo.Split(t.scheduleapplication, ',')
) di cross apply
(select max(case when did.id = 1 then did.data end) as isactive,
max(case when did.id = 2 then did.data end) as workingyear,
max(case when did.id = 3 then did.data end) as anniversary,
max(case when did.id = 4 then did.data end) as startperiod,
max(case when did.id = 5 then did.data end) as endperiod
from dbo.Split(di.data,':') did
) r
WHERE r.workingyear in (@employeeworkingyears, 0)
order by r.workingyear desc;
Ordering by workingyear descending puts an exact match ahead of the default 0 row, so top 1 returns the default only when no match exists.
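The match-or-default idea can be tried in isolation. The sketch below uses Python's built-in sqlite3 module with a simplified, hypothetical policy table (no Split function, integer workingyear) and LIMIT 1 in place of SQL Server's TOP 1, so treat it as an illustration of the pattern rather than the exact query.

```python
import sqlite3

# Assumption: a simplified stand-in for the parsed policy rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policy (workingyear INTEGER, anniversary TEXT)")
conn.executemany(
    "INSERT INTO policy VALUES (?, ?)",
    [(0, "default"), (1, "one year"), (5, "five years")],
)


def pick_policy(years):
    """Return the row matching `years`, or the workingyear = 0 default row."""
    return conn.execute(
        """
        SELECT workingyear, anniversary
        FROM policy
        WHERE workingyear IN (?, 0)
        ORDER BY workingyear DESC  -- an exact match sorts ahead of the 0 row
        LIMIT 1
        """,
        (years,),
    ).fetchone()


matched = pick_policy(5)   # a row with workingyear = 5 exists
fallback = pick_policy(2)  # no row with workingyear = 2, so the default is used
print(matched, fallback)   # (5, 'five years') (0, 'default')
```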

SQL query rewrite for prettification and or performance improvement

I have a query that essentially amounts to:
Select query 1
Union
Select query 2
where rowid not in query 1 rowids
Is there a prettier / more performant way to do this? I'm assuming the results of query 1 would be cached and thus utilized in the union... but it's also kinda oogly.
Update with the original query:
SELECT FruitType
, count(CASE WHEN Status = 0 THEN 1 ELSE 0 END) AS Fresh
, count(CASE WHEN Status = 1 THEN 1 ELSE 0 END) AS Ripe
, count(CASE WHEN Status = 2 THEN 1 ELSE 0 END) AS Moldy
FROM FruitTypes FT1
LEfT JOIN Fruits F on F.FTID = FT1.ID
where
Fruit.IsHighPriced = 0
GROUP BY FruitType
Union ALL
select FruitType, 0 as Fresh, 0 as Ripe, 0 as Moldy
FROM FruitTypes ft3
where
ft3.StoreID = @PassedInStoreID
and FruitType NOT IN
(
SELECT FruitType
, count(CASE WHEN Status = 0 THEN 1 ELSE 0 END) AS Fresh
, count(CASE WHEN Status = 1 THEN 1 ELSE 0 END) AS Ripe
, count(CASE WHEN Status = 2 THEN 1 ELSE 0 END) AS Moldy
FROM FruitTypes FT2
LEfT JOIN Fruits F on F.FTID = FT2.ID
where
Fruit.IsHighPriced = 0
GROUP BY FruitType
)
Thanks!
You don't need the CASE expressions in the NOT IN subquery; it only has to return FruitType. NOT EXISTS is also often faster than NOT IN in SQL Server.
SELECT FruitType
-- SUM rather than COUNT: COUNT(... ELSE 0) would count every row, whatever its Status
, SUM(CASE WHEN Status = 0 THEN 1 ELSE 0 END) AS Fresh
, SUM(CASE WHEN Status = 1 THEN 1 ELSE 0 END) AS Ripe
, SUM(CASE WHEN Status = 2 THEN 1 ELSE 0 END) AS Moldy
FROM FruitTypes FT1
LEFT JOIN Fruits F on F.FTID = FT1.ID
where
F.IsHighPriced = 0
GROUP BY FruitType
UNION ALL
select FruitType, 0 as Fresh, 0 as Ripe, 0 as Moldy
FROM FruitTypes ft3
where
ft3.StoreID = @PassedInStoreID
and NOT EXISTS
(
SELECT *
FROM FruitTypes FT2
LEFT JOIN Fruits F on F.FTID = FT2.ID
where
F.IsHighPriced = 0
and ft3.FruitType = FT2.FruitType
)
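One thing to watch in per-status counts like these is the choice of aggregate. COUNT counts every non-NULL value, so a CASE with ELSE 0 makes it count all rows in the group. The small sketch below shows the difference using Python's built-in sqlite3 module; the Fruits rows are invented for illustration.

```python
import sqlite3

# Assumption: a tiny made-up Fruits table, just enough to compare the aggregates
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Fruits (FruitType TEXT, Status INTEGER)")
conn.executemany(
    "INSERT INTO Fruits VALUES (?, ?)",
    [("Apple", 0), ("Apple", 0), ("Apple", 1), ("Pear", 2)],
)

rows = conn.execute(
    """
    SELECT FruitType,
           -- counts every row, because 0 is not NULL
           COUNT(CASE WHEN Status = 0 THEN 1 ELSE 0 END) AS count_with_else,
           -- counts only Status = 0 rows, because other rows yield NULL
           COUNT(CASE WHEN Status = 0 THEN 1 END)        AS count_without_else,
           -- adds 1 per Status = 0 row and 0 otherwise
           SUM(CASE WHEN Status = 0 THEN 1 ELSE 0 END)   AS sum_with_else
    FROM Fruits
    GROUP BY FruitType
    ORDER BY FruitType
    """
).fetchall()
print(rows)  # [('Apple', 3, 2, 2), ('Pear', 1, 0, 0)]
```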
The prettiest way of writing this would probably be to turn query #1 into a view or a function, then call that view or function instead of repeating the code.
Performance could possibly be improved by using query #1 to fill a temp table or table variable, then using that temp table in place of the repetitious code.

Using Pivot or CTE to horizontalize a query

I am using SQL Server 2008.
My data set looks like
Entity  Type1  Type2  Balance
1       A      R      100
1       B      Z      200
1       C      R      300
2       A      X      1000
2       B      Y      2000
My output should look like
Entity  A-Type2  A-Balance  B-Type2  B-Balance  C-Type2  C-Balance
1       R        100        Z        200        R        300
2       X        1000       Y        2000                0
I started writing a pivot query, and I think I can get away with MAX because there should be only one record per Entity/Type1 combination. But I cannot figure out how to pivot two fields in one query. Is this possible? Is this something a CTE could help with?
Easiest is the MAX idea, but with a CASE statement, e.g.:
SELECT
Entity,
MAX(CASE WHEN Type1 = 'A' THEN Type2 ELSE NULL END) AS AType2,
MAX(CASE WHEN Type1 = 'A' THEN Balance ELSE NULL END) AS ABalance,
MAX(CASE WHEN Type1 = 'B' THEN Type2 ELSE NULL END) AS BType2,
MAX(CASE WHEN Type1 = 'B' THEN Balance ELSE NULL END) AS BBalance,
MAX(CASE WHEN Type1 = 'C' THEN Type2 ELSE NULL END) AS CType2,
MAX(CASE WHEN Type1 = 'C' THEN Balance ELSE NULL END) AS CBalance
FROM
...
GROUP BY
Entity
In other words, only use the value when Type1 is a specific value (with other Type1 values getting a null).
You just use conditional aggregation for the pivoting like this:
select Entity,
max(case when Type1 = 'A' then Type2 end) as A_Type2,
max(case when Type1 = 'A' then Balance else 0 end) as A_Balance,
max(case when Type1 = 'B' then Type2 end) as B_Type2,
max(case when Type1 = 'B' then Balance else 0 end) as B_Balance,
max(case when Type1 = 'C' then Type2 end) as C_Type2,
max(case when Type1 = 'C' then Balance else 0 end) as C_Balance
from MyDataSet mds
group by Entity;
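To check the conditional-aggregation query against the sample data from the question, here is a small sketch using Python's built-in sqlite3 module. SQLite stands in for SQL Server only so the example is runnable; the table name MyDataSet follows the query above.

```python
import sqlite3

# Assumption: the question's sample rows loaded into an in-memory SQLite table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyDataSet (Entity INTEGER, Type1 TEXT, Type2 TEXT, Balance INTEGER)"
)
conn.executemany(
    "INSERT INTO MyDataSet VALUES (?, ?, ?, ?)",
    [
        (1, "A", "R", 100),
        (1, "B", "Z", 200),
        (1, "C", "R", 300),
        (2, "A", "X", 1000),
        (2, "B", "Y", 2000),
    ],
)

# Each MAX(CASE ...) picks the single value for one Type1 within an Entity
rows = conn.execute(
    """
    SELECT Entity,
           MAX(CASE WHEN Type1 = 'A' THEN Type2 END)            AS A_Type2,
           MAX(CASE WHEN Type1 = 'A' THEN Balance ELSE 0 END)   AS A_Balance,
           MAX(CASE WHEN Type1 = 'B' THEN Type2 END)            AS B_Type2,
           MAX(CASE WHEN Type1 = 'B' THEN Balance ELSE 0 END)   AS B_Balance,
           MAX(CASE WHEN Type1 = 'C' THEN Type2 END)            AS C_Type2,
           MAX(CASE WHEN Type1 = 'C' THEN Balance ELSE 0 END)   AS C_Balance
    FROM MyDataSet
    GROUP BY Entity
    ORDER BY Entity
    """
).fetchall()
for row in rows:
    print(row)
# (1, 'R', 100, 'Z', 200, 'R', 300)
# (2, 'X', 1000, 'Y', 2000, None, 0)
```

Entity 2 has no 'C' row, so C_Type2 comes back as NULL (None in Python) and C_Balance falls back to 0.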
Here's doing it with a pivot and a lookup.
SELECT
data.Entity,
ISNULL(a.Type2,'') AS [A-Type2],
ISNULL([A-Balance],0) AS [A-Balance],
ISNULL(b.Type2,'') AS [B-Type2],
ISNULL([B-Balance],0) AS [B-Balance],
ISNULL(c.Type2,'') AS [C-Type2],
ISNULL([C-Balance],0) AS [C-Balance]
FROM
(
SELECT
Entity,
A AS [A-Balance],
B AS [B-Balance],
C AS [C-Balance]
FROM
(
SELECT Entity, Type1, Balance FROM #table
) t
PIVOT (
MAX(Balance)
FOR Type1 IN ([A],[B],[C])
) piv
) data
LEFT OUTER JOIN #table a on a.Type1 = 'A'
AND a.Entity = data.Entity AND a.Balance = [A-Balance]
LEFT OUTER JOIN #table b on b.Type1 = 'B'
AND b.Entity = data.Entity AND b.Balance = [B-Balance]
LEFT OUTER JOIN #table c on c.Type1 = 'C'
AND c.Entity = data.Entity AND c.Balance = [C-Balance]

how to get value x without code duplication

create table t(a int, b int);
insert into t values (1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3);
select * from t;
a | b
----------
1 | 1
1 | 2
1 | 3
2 | 1
2 | 2
2 | 3
3 | 1
3 | 2
3 | 3
select
max(case when a = 1 then b else 0 end) as q,
max(case when b = 1 then a else 0 end) as c,
(
max(case when a = 1 then b else 0 end)
+
max(case when b = 1 then a else 0 end)
) as x
from t
Is it possible to do something like this?
select
max(case when a = 1 then b else 0 end) as q,
max(case when b = 1 then a else 0 end) as c,
(q + c) as x
from t
You can't reference a column alias at the same level of the SELECT clause where it is defined.
You have two choices:
By using the expression directly:
select
max(case when a = 1 then b else 0 end) as q,
max(case when b = 1 then a else 0 end) as c,
(max(case when a = 1 then b else 0 end) + max(case when b = 1 then a else 0 end)) as x
from t
By wrapping the query in a subquery:
SELECT q,
c,
q + c as x
FROM
(
select
max(case when a = 1 then b else 0 end) as q,
max(case when b = 1 then a else 0 end) as c
from t
) d
Also, in SQL Server 2005 and later you can use a CTE:
;WITH cte AS
(
select max(case when a = 1 then b else 0 end) as q,
max(case when b = 1 then a else 0 end) as c
from t
)
SELECT q, c, q + c as x
FROM cte
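Both the subquery and the CTE form can be run against the sample table to confirm they give the same x. The sketch below uses Python's built-in sqlite3 module as a stand-in engine (an assumption made for runnability; the SQL is the same shape as above).

```python
import sqlite3

# Assumption: the question's table t recreated in an in-memory SQLite database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(a, b) for a in (1, 2, 3) for b in (1, 2, 3)],
)

# Derived table: the aliases q and c exist by the time the outer SELECT runs
subquery_row = conn.execute(
    """
    SELECT q, c, q + c AS x
    FROM (SELECT MAX(CASE WHEN a = 1 THEN b ELSE 0 END) AS q,
                 MAX(CASE WHEN b = 1 THEN a ELSE 0 END) AS c
          FROM t) AS d
    """
).fetchone()

# CTE: same idea, with the inner query named up front
cte_row = conn.execute(
    """
    WITH cte AS (
        SELECT MAX(CASE WHEN a = 1 THEN b ELSE 0 END) AS q,
               MAX(CASE WHEN b = 1 THEN a ELSE 0 END) AS c
        FROM t
    )
    SELECT q, c, q + c AS x
    FROM cte
    """
).fetchone()

print(subquery_row, cte_row)  # (3, 3, 6) (3, 3, 6)
```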
Unfortunately, you can't do that.
An alias cannot be used at the same level where it is created.
A derived table (or a temporary table) is necessary, I think.