So yesterday I learned about conditional aggregation. I'm fairly new to SQL.
Here is my query:
select
Year_CW,
sum(case when col = 0 then 1 else 0 end) as "Total_sampled(Checked)",
sum(case when col = 1 then 1 else 0 end) as "Total_unsampled(Not_Checked)",
sum(case when col = 0 AND col2 = 'accepted' then 1 else 0 end) as "Accepted",
sum(case when col = 0 AND col2 = 'accepted with comments' then 1 else 0 end) as "Accepted with comments",
sum(case when col = 0 AND col2 = 'request for rework' then 1 else 0 end) as "Request for rework",
sum(case when col = 0 AND col2 = 'rejected' then 1 else 0 end) as "Rejected",
sum(case when col = 0 Or col = 1 then 1 else 0 end) as "Total_DS"
from
(select
Year_CW, SAMPLED as col, APPROVAL as col2
from
View_TEST tv) tv
group by
Year_CW
order by
Year_CW desc
I'm basically just calculating some KPIs grouped by week.
Look at the column "Total_DS". It is essentially the sum of the first two sums, "Total_sampled(Checked)" and "Total_unsampled(Not_Checked)".
Is there a way to add the two columns produced by the first two sums to get the third one, instead of computing the data all over again? Performance-wise, I feel this would be terrible practice. It doesn't matter for this database, but I don't want to learn bad coding practices from the start.
Thanks for helping.
You probably won't see a significant performance hit from what you're doing now, as you already have all the data available; you're just repeating the case evaluation.
But you can't refer to the column aliases for the first two columns within the same level of query.
If you can't do a simple count as @Zeki suggested, because you aren't sure whether there might be values other than zero and one (though this looks rather like a binary true/false equivalent, so there may well be a check constraint limiting you to those values), or if you're just more interested in the general case, you can use an inline view as @jarhl suggested:
select Year_CW,
"Total_sampled(Checked)",
"Total_unsampled(Not_Checked)",
"Accepted",
"Accepted with comments",
"Request for rework",
"Rejected",
"Total_sampled(Checked)" + "Total_unsampled(Not_Checked)" as "Total_DS"
from (
select Year_CW,
sum(case when col = 0 then 1 else 0 end) as "Total_sampled(Checked)",
sum(case when col = 1 then 1 else 0 end) as "Total_unsampled(Not_Checked)",
sum(case when col = 0 AND col2 = 'accepted' then 1 else 0 end) as "Accepted",
sum(case when col = 0 AND col2 = 'accepted with comments' then 1 else 0 end)
as "Accepted with comments",
sum(case when col = 0 AND col2 = 'request for rework' then 1 else 0 end)
as "Request for rework",
sum(case when col = 0 AND col2 = 'rejected' then 1 else 0 end) as "Rejected"
from (
select Year_CW, SAMPLED as col, APPROVAL as col2
from View_TEST tv
) tv
group by Year_CW
)
order by Year_CW desc;
The inner query gets the data and calculates the conditional aggregate values. The outer query just gets those values from the inner query, and also adds the Total_DS column to the result set by adding together the two values from the inner query.
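To see the inline-view pattern run end to end, here is a minimal sketch that executes a cut-down version of the query against an in-memory SQLite database through Python's sqlite3 module. The table name view_test, its sample rows, and the shortened set of columns are assumptions for illustration only.

```python
import sqlite3

# Hypothetical sample data: (year_cw, sampled, approval).
rows = [
    ("2019-01", 0, "accepted"),
    ("2019-01", 0, "rejected"),
    ("2019-01", 1, None),
    ("2019-02", 0, "accepted"),
    ("2019-02", 1, None),
    ("2019-02", 1, None),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table view_test (year_cw text, sampled integer, approval text)")
conn.executemany("insert into view_test values (?, ?, ?)", rows)

query = """
select year_cw,
       total_sampled,
       total_unsampled,
       accepted,
       -- the outer query can refer to the inner query's column aliases
       total_sampled + total_unsampled as total_ds
from (
    select year_cw,
           sum(case when sampled = 0 then 1 else 0 end) as total_sampled,
           sum(case when sampled = 1 then 1 else 0 end) as total_unsampled,
           sum(case when sampled = 0 and approval = 'accepted' then 1 else 0 end) as accepted
    from view_test
    group by year_cw
) t
order by year_cw desc
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each aggregate is evaluated once in the inner query; the outer query only does the addition on the already-aggregated rows.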
You should generally avoid quoted identifiers, and if you really need them in your result set you should apply them at the last possible moment: use unquoted identifiers in the inner query, and give them quoted aliases in the outer query. Personally, if the point of a query is to count things, I prefer a conditional count over a conditional sum. I'm also not sure why you already have a subquery against your view, which just changes the column names and makes the main query slightly more obscure. So I might do this as:
select year_cw,
total_sampled_checked as "Total_sampled(Checked)",
total_unsampled_not_checked as "Total_unsampled(Not_Checked)",
accepted as "Accepted",
accepted_with_comments as "Accepted with comments",
request_for_rework as "Request for rework",
rejected as "Rejected",
total_sampled_checked + total_unsampled_not_checked as "Total_DS"
from (
select year_cw,
count(case when sampled = 0 then 1 end) as total_sampled_checked,
count(case when sampled = 1 then 1 end) as total_unsampled_not_checked,
count(case when sampled = 0 and approval = 'accepted' then 1 end) as accepted,
count(case when sampled = 0 and approval = 'accepted with comments' then 1 end)
as accepted_with_comments,
count(case when sampled = 0 and approval = 'request for rework' then 1 end)
as request_for_rework,
count(case when sampled = 0 and approval = 'rejected' then 1 end) as rejected
from view_test
group by year_cw
)
order by year_cw desc;
Note that in the case expression, then 1 can be then <anything that isn't null>, so you could do then sampled or whatever. I've left out the implicit else null. As count() ignores nulls, all the case expression has to do is evaluate to any not-null value for the rows you want to include in the count.
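A quick way to convince yourself of this is to compare the variants side by side. This is a small sketch in SQLite via Python's sqlite3 module; the single-column table t and its rows are hypothetical. It also shows the classic pitfall of keeping else 0 inside count(), where 0 is not null and so every row gets counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (sampled integer)")
conn.executemany("insert into t values (?)", [(0,), (0,), (1,), (None,)])

counts = conn.execute("""
select
    -- no else branch: non-matching rows give null, which count() skips
    count(case when sampled = 0 then 1 end),
    -- any non-null then value works just as well
    count(case when sampled = 0 then 'x' end),
    -- the equivalent conditional sum
    sum(case when sampled = 0 then 1 else 0 end),
    -- pitfall: else 0 is not null, so count() includes every row
    count(case when sampled = 0 then 1 else 0 end)
from t
""").fetchone()
print(counts)
```

The first three expressions agree (2), while the last one returns the total row count (4).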
You can try the query below:
select Year_CW,
sum(case when col = 0 then 1 else 0 end) as "Total_sampled(Checked)",
sum(case when col = 1 then 1 else 0 end) as "Total_unsampled(Not_Checked)",
sum(case when col = 0 AND col2 = 'accepted' then 1 else 0 end) as "Accepted",
sum(case when col = 0 AND col2 = 'accepted with comments' then 1 else 0 end) as "Accepted with comments",
sum(case when col = 0 AND col2 = 'request for rework' then 1 else 0 end) as "Request for rework",
sum(case when col = 0 AND col2 = 'rejected' then 1 else 0 end) as "Rejected",
sum(case when col = 0 then 1 else 0 end) + sum(case when col = 1 then 1 else 0 end) as "Total_DS"
from (select Year_CW, SAMPLED as col, APPROVAL as col2
from View_TEST tv
) tv
group by Year_CW
order by Year_CW desc
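If you want to stay at a single query level, you can add aggregate expressions together directly in the select list; you just have to repeat the expressions, because the aliases are not visible there. A sketch in SQLite via Python's sqlite3 module, with a hypothetical view_test table and made-up rows, also shows that a single conditional sum over sampled in (0, 1) gives the same figure:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table view_test (year_cw text, sampled integer)")
conn.executemany(
    "insert into view_test values (?, ?)",
    [("2019-01", 0), ("2019-01", 1), ("2019-01", 1), ("2019-02", 0)],
)

result = conn.execute("""
select year_cw,
       sum(case when sampled = 0 then 1 else 0 end) as total_sampled,
       sum(case when sampled = 1 then 1 else 0 end) as total_unsampled,
       -- repeat the aggregate expressions; the aliases above can't be referenced here
       sum(case when sampled = 0 then 1 else 0 end)
         + sum(case when sampled = 1 then 1 else 0 end) as total_ds,
       -- a single conditional sum gives the same figure
       sum(case when sampled in (0, 1) then 1 else 0 end) as total_ds_alt
from view_test
group by year_cw
order by year_cw desc
""").fetchall()
print(result)
```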
Related
How can I create a query that has multiple counter columns for the same field?
I have a field called card_status that can have 7 different values.
I wanted to create a query that would display total values on the same row and not on 7 different rows.
SELECT SUM(CASE WHEN card_status = 1 THEN 1 ELSE 0 END) as Count_of_1,
SUM(CASE WHEN card_status = 2 THEN 1 ELSE 0 END) as Count_of_2,
...
SUM(CASE WHEN card_status = 7 THEN 1 ELSE 0 END) as Count_of_7
FROM your_table;
You could use a conditional count.
For example:
SELECT col1, col2
, COUNT(CASE WHEN card_status = 'revoked' THEN card_status END) AS TotalRevoked
, COUNT(CASE WHEN card_status = 'requested' THEN card_status END) AS TotalRequested
, COUNT(CASE WHEN card_status = 'lost' THEN card_status END) AS TotalLost
-- add more
, COUNT(*) AS Total
FROM YourTable t
GROUP BY col1, col2
ORDER BY col1, col2
This works on the principle that counting a column or expression doesn't count NULLs.
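Here is a minimal runnable sketch of that pivot in SQLite via Python's sqlite3 module. The table name cards and its rows are hypothetical, and col2 is dropped for brevity. Note that count() returns 0, not NULL, for a group with no matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table cards (col1 text, card_status text)")
conn.executemany(
    "insert into cards values (?, ?)",
    [("A", "revoked"), ("A", "lost"), ("A", "lost"), ("B", "requested"), ("B", "active")],
)

result = conn.execute("""
select col1,
       -- each count only sees rows where the case expression is not null
       count(case when card_status = 'revoked' then card_status end) as total_revoked,
       count(case when card_status = 'requested' then card_status end) as total_requested,
       count(case when card_status = 'lost' then card_status end) as total_lost,
       count(*) as total
from cards
group by col1
order by col1
""").fetchall()
print(result)
```

The 'active' row is counted only in total, since none of the conditional columns match it.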
I need to count users that match certain conditions. To do that, I need to join some tables and check whether any of the grouped combinations match the condition.
The way I implemented that now is by having a nested select that counts original matches and then counting the rows that have at least one result.
SELECT
COUNT(case when NestedCount1 > 0 then 1 else null end) as Count1,
COUNT(case when NestedCount2 > 0 then 1 else null end) as Count2,
COUNT(case when NestedCount3 > 0 then 1 else null end) as Count3
FROM
(SELECT
COUNT(case when Type = 1 then 1 else null end) as NestedCount1,
COUNT(case when Type = 2 then 1 else null end) as NestedCount2,
COUNT(case when Type = 2 AND Condition = 1 then 1 else null end) as NestedCount3
FROM [User]
LEFT JOIN [UserGroup] ON [User].Id = [UserGroup].UserId
LEFT JOIN [Group] ON [UserGroup].GroupId = [Group].Id
GROUP BY [User].Id) nested
What irks me is that the counts from the nested select are only used to check existence. However, since ANY in SQL is only an operator, I can't think of a cleaner way to rewrite this.
The query returns correct results as is.
I'm wondering if there is any way to rewrite this that would avoid having intermediate results that are only used to check existence condition?
Sample input: User.csv, Group.csv, UserGroup.csv
Expected results: 483, 272, 121
It might be possible to simplify that query.
I think the GROUP BY on the user id can be avoided by using distinct conditional counts on the user id.
Then there's no need for a subquery.
SELECT
COUNT(DISTINCT case when [User].[Type] = 1 then [User].Id end) as Count1,
COUNT(DISTINCT case when [User].[Type] = 2 then [User].Id end) as Count2,
COUNT(DISTINCT case when [User].[Type] = 2 AND Condition = 1 then [User].Id end) as Count3
FROM [User]
LEFT JOIN [UserGroup] ON [UserGroup].UserId = [User].Id
LEFT JOIN [Group] ON [Group].Id = [UserGroup].GroupId;
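To check that the distinct-count version agrees with the nested version, here is a sketch that runs both in SQLite via Python's sqlite3 module. The table names app_user, app_group and app_user_group (chosen to avoid the reserved word GROUP), the placement of the condition column on the group table, and all rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table app_user (id integer primary key, type integer);
create table app_group (id integer primary key, condition integer);
create table app_user_group (user_id integer, group_id integer);
insert into app_user values (1, 1), (2, 2), (3, 2), (4, 1);  -- user 4 has no groups
insert into app_group values (10, 1), (20, 0);
insert into app_user_group values (1, 10), (2, 10), (2, 20), (3, 20);
""")

joins = """
from app_user u
left join app_user_group ug on ug.user_id = u.id
left join app_group g on g.id = ug.group_id
"""

# The original approach: per-user counts, then count users with a count > 0.
nested = conn.execute("""
select count(case when n1 > 0 then 1 end),
       count(case when n2 > 0 then 1 end),
       count(case when n3 > 0 then 1 end)
from (
    select u.id,
           count(case when u.type = 1 then 1 end) as n1,
           count(case when u.type = 2 then 1 end) as n2,
           count(case when u.type = 2 and g.condition = 1 then 1 end) as n3
""" + joins + """
    group by u.id
) nested
""").fetchone()

# Distinct conditional counts: each user id is counted at most once per column.
flat = conn.execute("""
select count(distinct case when u.type = 1 then u.id end),
       count(distinct case when u.type = 2 then u.id end),
       count(distinct case when u.type = 2 and g.condition = 1 then u.id end)
""" + joins).fetchone()

print(nested, flat)
```

User 2 appears in two joined rows but is still counted once, which is exactly what the DISTINCT takes care of.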
SELECT
SUM(case when NestedCount1 > 0 then 1 else 0 end) as Count1,
SUM(case when NestedCount2 > 0 then 1 else 0 end) as Count2,
SUM(case when NestedCount3 > 0 then 1 else 0 end) as Count3
FROM
(
SELECT
[User].Id,
SUM(case when Type = 1 then 1 else 0 end) as NestedCount1,
SUM(case when Type = 2 then 1 else 0 end) as NestedCount2,
SUM(case when Type = 2 AND Condition = 1 then 1 else 0 end) as NestedCount3
FROM [User]
LEFT JOIN [UserGroup] ON [UserGroup].UserId = [User].Id
LEFT JOIN [Group] ON [Group].Id = [UserGroup].GroupId
GROUP BY [User].Id
) nested
Is there a way in Hive to count, for each row, the number of columns that have a particular value?
I have data like the 'Input' sample, and I want to count how many columns have the value 'a' and how many have the value 'b', producing output like the 'Output' sample.
Is there a way to accomplish this with Hive query?
One method in Hive is:
select ( (case when cl_1 = 'a' then 1 else 0 end) +
(case when cl_2 = 'a' then 1 else 0 end) +
(case when cl_3 = 'a' then 1 else 0 end) +
(case when cl_4 = 'a' then 1 else 0 end) +
(case when cl_5 = 'a' then 1 else 0 end)
) as count_a,
( (case when cl_1 = 'b' then 1 else 0 end) +
(case when cl_2 = 'b' then 1 else 0 end) +
(case when cl_3 = 'b' then 1 else 0 end) +
(case when cl_4 = 'b' then 1 else 0 end) +
(case when cl_5 = 'b' then 1 else 0 end)
) as count_b
from t;
To get the total count, I would suggest using a subquery and adding count_a and count_b.
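The row-wise CASE arithmetic is plain standard SQL, so it can be tried outside Hive. This sketch uses SQLite through Python's sqlite3 module rather than Hive; the table t, the added id column, and the rows are hypothetical. It wraps the expressions in a subquery and adds count_a and count_b, as suggested above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, cl_1 text, cl_2 text, cl_3 text, cl_4 text, cl_5 text)")
conn.executemany(
    "insert into t values (?, ?, ?, ?, ?, ?)",
    [(1, "a", "b", "a", "c", "a"), (2, "b", "b", "c", "c", "a")],
)

result = conn.execute("""
select id, count_a, count_b, count_a + count_b as count_total
from (
    select id,
           -- each case contributes 1 when that column holds the value
           (case when cl_1 = 'a' then 1 else 0 end) +
           (case when cl_2 = 'a' then 1 else 0 end) +
           (case when cl_3 = 'a' then 1 else 0 end) +
           (case when cl_4 = 'a' then 1 else 0 end) +
           (case when cl_5 = 'a' then 1 else 0 end) as count_a,
           (case when cl_1 = 'b' then 1 else 0 end) +
           (case when cl_2 = 'b' then 1 else 0 end) +
           (case when cl_3 = 'b' then 1 else 0 end) +
           (case when cl_4 = 'b' then 1 else 0 end) +
           (case when cl_5 = 'b' then 1 else 0 end) as count_b
    from t
) s
order by id
""").fetchall()
print(result)
```

No GROUP BY is needed here: the counting happens across columns within each row, not across rows.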
Use a lateral view with explode() on the data, then do the aggregations on the exploded rows.
select id
,sum(cast(col='a' as int)) as cnt_a
,sum(cast(col='b' as int)) as cnt_b
,sum(cast(col in ('a','b') as int)) as cnt_total
from tbl
lateral view explode(array(ci_1,ci_2,ci_3,ci_4,ci_5)) tbl as col
group by id
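SQLite has no lateral view or explode(), but the same idea (turn the five columns into one row per value, then aggregate per id) can be emulated with UNION ALL. This is a sketch of the technique in SQLite via Python's sqlite3 module, not Hive syntax; the table tbl and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (id integer, ci_1 text, ci_2 text, ci_3 text, ci_4 text, ci_5 text)")
conn.executemany(
    "insert into tbl values (?, ?, ?, ?, ?, ?)",
    [(1, "a", "b", "a", "c", "a"), (2, "b", "b", "c", "c", "a")],
)

result = conn.execute("""
with exploded as (
    -- one row per (id, column value), standing in for lateral view explode(array(...))
    select id, ci_1 as col from tbl
    union all select id, ci_2 from tbl
    union all select id, ci_3 from tbl
    union all select id, ci_4 from tbl
    union all select id, ci_5 from tbl
)
select id,
       -- a comparison yields 1 or 0, so summing it counts matches
       sum(cast(col = 'a' as integer)) as cnt_a,
       sum(cast(col = 'b' as integer)) as cnt_b,
       sum(cast(col in ('a', 'b') as integer)) as cnt_total
from exploded
group by id
order by id
""").fetchall()
print(result)
```

The unpivoted form scales better when the list of values grows, because each new value is one more aggregate rather than five more CASE terms.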
I have a query returning details of customers that are subscribed to channel xyz or to other channels.
To generate these results, I'm using the following query:
select customerID
,sum(case when channel='xyz' then 1 else 0 end) as 'xyz Count'
,sum(case when channel<>'xyz' then bundle_qty else 0 end) as 'Other'
From temptable
So my question is: how do I exclude customers that are subscribed to two channels, where one is xyz and the other is a different channel?
select customerID
,sum(case when channel='xyz' then 1 else 0 end) as 'xyz Count'
,sum(case when channel<>'xyz' then bundle_qty else 0 end) as 'Other'
From temptable
group by customerID
having sum(case when channel= 'xyz' then 1 else 0 end) > 0
and sum(case when channel<>'xyz' then 1 else 0 end) > 0
First, your query is not correct. It needs a group by. Second, you can do what you want using having:
select customerID,
sum(case when channel = 'xyz' then 1 else 0 end) as xyz_Count,
sum(case when channel<>'xyz' then bundle_qty else 0 end) as Other
From temptable
group by customerID
having count(*) = 2 and
sum(case when channel = 'xyz' then 1 else 0 end) = 1;
If customers can subscribe to the same channel multiple times, and you still want only "xyz" and another channel, then:
having count(distinct channel) = 2 and
(min(channel) = 'xyz' or max(channel) = 'xyz')
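Both HAVING variants can be compared on a small data set. This is a sketch in SQLite via Python's sqlite3 module; the temptable rows are made up, with customer 5 subscribed to xyz twice so the two variants diverge.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table temptable (customerID integer, channel text, bundle_qty integer)")
conn.executemany(
    "insert into temptable values (?, ?, ?)",
    [
        (1, "xyz", 1), (1, "abc", 3),                  # xyz plus one other channel
        (2, "xyz", 1),                                 # xyz only
        (3, "abc", 1), (3, "def", 1),                  # no xyz
        (4, "xyz", 1), (4, "abc", 1), (4, "def", 1),   # three channels
        (5, "xyz", 1), (5, "xyz", 1), (5, "abc", 2),   # xyz twice plus one other
    ],
)

base = """
select customerID,
       sum(case when channel = 'xyz' then 1 else 0 end) as xyz_count,
       sum(case when channel <> 'xyz' then bundle_qty else 0 end) as other
from temptable
group by customerID
"""

# Exactly two subscription rows, one of which is xyz.
strict = conn.execute(base + """
having count(*) = 2
   and sum(case when channel = 'xyz' then 1 else 0 end) = 1
order by customerID
""").fetchall()

# Exactly two distinct channels; with only two, xyz must be the min or the max.
relaxed = conn.execute(base + """
having count(distinct channel) = 2
   and (min(channel) = 'xyz' or max(channel) = 'xyz')
order by customerID
""").fetchall()

print(strict, relaxed)
```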
How do I get a total count?
SELECT
COUNT(CASE WHEN SHP.id = 1 then 1 ELSE NULL END) as "New",
COUNT(CASE WHEN SHP.id = 2 then 5 ELSE NULL END) as "Accepted"
from SHP
RESULT:
NEW Accepted
1 5
But I need a total count
result: 6
I'd do something like this:
SELECT
COUNT(CASE WHEN id = 1 THEN 1 END) as New,
COUNT(CASE WHEN id = 2 THEN 5 END) as Accepted,
COUNT(CASE WHEN id = 1 THEN 1
WHEN id = 2 THEN 5 END) as Total
FROM SHP
This is exactly what the CASE expression is for, and the logic is very simple. It also avoids having to perform multiple calculations on the same fields.
As a note, the value in your THEN clause isn't used at all here: COUNT just counts the rows where the CASE expression is not null, rather than summing the values. I've also removed the ELSE NULL because that is what CASE does by default anyway.
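The difference between COUNT and SUM with THEN 5 is easy to see on a tiny table. This sketch uses SQLite via Python's sqlite3 module; the shp table and its three rows (with id 2 appearing twice) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table shp (id integer)")
conn.executemany("insert into shp values (?)", [(1,), (2,), (2,)])

# count() only checks for non-null, so THEN 5 behaves exactly like THEN 1.
counted = conn.execute("""
select count(case when id = 1 then 1 end),
       count(case when id = 2 then 5 end),
       count(case when id = 1 then 1 when id = 2 then 5 end)
from shp
""").fetchone()

# sum() adds up the THEN values instead.
summed = conn.execute("""
select sum(case when id = 1 then 1 end),
       sum(case when id = 2 then 5 end),
       sum(case when id = 1 then 1 when id = 2 then 5 end)
from shp
""").fetchone()

print(counted, summed)
```

With two rows of id 2, COUNT reports 2 for "Accepted" while SUM reports 10.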
If your intention was to SUM the values, do this:
SELECT
SUM(CASE WHEN id = 1 THEN 1 END) as New,
SUM(CASE WHEN id = 2 THEN 5 END) as Accepted,
SUM(CASE WHEN id = 1 THEN 1
WHEN id = 2 THEN 5 END) as Total
FROM SHP
Example
Assuming you have only two values in your database, 1 and 2, we can create test data like this:
CREATE TABLE #SHP (id int)
INSERT INTO #SHP (id)
VALUES (1),(2)
And use this query:
SELECT
SUM(CASE WHEN id = 1 then 1 END) as New,
SUM(CASE WHEN id = 2 then 5 END) as Accepted,
SUM(CASE WHEN id = 1 THEN 1
WHEN id = 2 THEN 5 END) as Total
FROM #SHP
This gives the following result:
New Accepted Total
1 5 6
Try this:
SELECT
COUNT(CASE WHEN SHP.id = 1 then 1 ELSE NULL END) +
COUNT(CASE WHEN SHP.id = 2 then 5 ELSE NULL END) as "Total"
from SHP
You could wrap your query into a subquery and do something like this:
SELECT SUM(New) as New, Sum(Accepted) as Accepted, Sum(New + Accepted) as Total FROM
(SELECT
COUNT(CASE WHEN SHP.id = 1 then 1 ELSE NULL END) as "New",
COUNT(CASE WHEN SHP.id = 2 then 5 ELSE NULL END) as "Accepted"
from SHP) as SubQuery
Use this if you don't want to duplicate the counts; otherwise, you can just add the two COUNT expressions together.
Try this:
with s1 as(
SELECT
SUM(CASE WHEN SHP.id = 1 then 1 ELSE 0 END) as "New"
from SHP
),s2 as
(
SELECT
SUM(CASE WHEN SHP.id = 2 then 5 ELSE 0 END) as "Accepted"
from SHP
)
select sum("New"+ "Accepted") from s1,s2