How to aggregate unioned tables with count and dummy values? - sql

I am counting rows in two tables and want a single aggregated result (one row). I wrote this SQL for that purpose:
SELECT sum (Amount_New) Amount_New,
sum(Import_Dropout) Import_Dropout,
sum(Import)Import,
sum(Processing)Processing,
sum(Processing_Dropout)Processing_Dropout,
sum(Matching)Matching,
sum(Matching_Dropout)Matching_Dropout,
sum(Export)Export,
sum(Exported)Exported,
sum(Rejected)Rejected,
sum(AmountSubTotal)AmountSubTotal,
sum(AmountTotal)AmountTotal
FROM (
SELECT COUNT(CASE WHEN ProcessStatus='_New' THEN 1 ELSE null END) AS Amount_New,
COUNT(CASE WHEN ProcessStatus='Import_Dropout' THEN 1 ELSE null END) AS Import_Dropout,
COUNT(CASE WHEN ProcessStatus='Import' THEN 1 ELSE null END) AS Import,
COUNT(CASE WHEN ProcessStatus='Processing' THEN 1 ELSE null END) AS Processing,
COUNT(*) AS AmountTotal,
0 as Processing_Dropout,
0 as Matching,
0 as Matching_Dropout,
0 as Export,
0 as Export_Dropout,
0 as Exported,
0 as Rejected,
0 as AmountSubTotal,
0 as UnionOrder
FROM "fileimport$marketscanimportcsv"
WHERE ft_boekdat like '%2018%'
UNION
SELECT 0 AS Amount_New,
0 AS Import_Dropout,
0 AS Import,
COUNT(CASE WHEN ProcessStatus='Processing_Dropout' THEN 1 ELSE null END) AS Processing_Dropout,
COUNT(CASE WHEN ProcessStatus='Processing' THEN 1 ELSE null END) AS Processing,
COUNT(CASE WHEN ProcessStatus='Matching' THEN 1 ELSE null END) AS Matching,
COUNT(CASE WHEN ProcessStatus='Matching_Dropout' THEN 1 ELSE null END) AS Matching_Dropout,
COUNT(CASE WHEN ProcessStatus='Export' THEN 1 ELSE null END) AS Export,
COUNT(CASE WHEN ProcessStatus='Export_Dropout' THEN 1 ELSE null END) AS Export_Dropout,
COUNT(CASE WHEN ProcessStatus='Exported' THEN 1 ELSE null END) AS Exported,
COUNT(CASE WHEN ProcessStatus='Rejected' THEN 1 ELSE null END) AS Rejected,
COUNT(CASE WHEN ProcessStatus!= '_New' and ProcessStatus!= 'Import_Dropout' and ProcessStatus!= 'Import' THEN 1 ELSE null END) AS AmountSubTotal,
COUNT(*) AS AmountTotal,
1 as UnionOrder
FROM "matching$marketscanmovement"
WHERE date_part ('year', BookingDate_BA_MS)= 2018
) SK GROUP by unionorder order by unionorder asc
1) The result is two rows instead of one row holding the total of each column.
Why does this query not sum the matching columns of the unioned rows? How should it be written?
2) When I try "sum(Amount_New1+Amount_New2) Amount_New" (after renaming the subquery columns to amount_new1 / amount_new2), it does not work either. Why?

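GROUP BY unionorder produces one group per distinct UnionOrder value, and the two branches of the union carry the values 0 and 1, so you get two rows. If you want to return a single row, conceptually representing an aggregate over the entire union, remove the GROUP BY: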
SELECT SUM(Amount_New) Amount_New,
SUM(Import_Dropout) Import_Dropout,
SUM(Import) Import,
SUM(Processing) Processing,
SUM(Processing_Dropout) Processing_Dropout,
SUM(Matching) Matching,
SUM(Matching_Dropout) Matching_Dropout,
SUM(Export) Export,
SUM(Exported) Exported,
SUM(Rejected) Rejected,
SUM(AmountSubTotal) AmountSubTotal,
SUM(AmountTotal) AmountTotal
FROM
(
-- ... your current query
) SK;
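You may also want to remove the UnionOrder computed column, since there is no need to sort a single-row result set. Also note that UNION matches columns by position, not by alias, and the result takes its column names from the first SELECT. Both branches must therefore list their columns in the same order, and renaming columns in only one branch (amount_new1 / amount_new2) has no effect on how they are combined.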
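The effect of dropping GROUP BY can be sketched on a toy data set. This is only an illustration, run on SQLite through Python's sqlite3 module; the table names imports and movements and the reduced column list are assumptions made for the example, not the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE imports (ProcessStatus TEXT);    -- hypothetical stand-in for the import table
CREATE TABLE movements (ProcessStatus TEXT);  -- hypothetical stand-in for the movement table
INSERT INTO imports VALUES ('_New'), ('_New'), ('Import');
INSERT INTO movements VALUES ('Matching'), ('Export'), ('Export');
""")

# Both branches list their columns in the same order, because UNION ALL
# pairs columns by position and takes the names from the first branch.
inner = """
SELECT COUNT(CASE WHEN ProcessStatus = '_New' THEN 1 END) AS Amount_New,
       0 AS Matching,
       COUNT(*) AS AmountTotal,
       0 AS UnionOrder
FROM imports
UNION ALL
SELECT 0,
       COUNT(CASE WHEN ProcessStatus = 'Matching' THEN 1 END),
       COUNT(*),
       1
FROM movements
"""

sums = "SUM(Amount_New), SUM(Matching), SUM(AmountTotal)"

# With GROUP BY: one output row per UnionOrder value.
grouped = conn.execute(
    f"SELECT {sums} FROM ({inner}) SK GROUP BY UnionOrder ORDER BY UnionOrder"
).fetchall()

# Without GROUP BY: the whole union is one group, so one output row.
single = conn.execute(f"SELECT {sums} FROM ({inner}) SK").fetchall()

print(grouped)  # [(2, 0, 3), (0, 1, 3)]
print(single)   # [(2, 1, 6)]
```

UNION ALL is used here instead of UNION so that two branches that happen to produce identical rows are not collapsed into one before summing.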

Related

Write Hive queries to see how many missing values you have in each attribute

I want to write a Hive query that shows the count of null values in each column.
You can use this SQL; it gives you the total count plus the null and non-null counts. Repeat the two SUM expressions for each column you want to check.
SELECT
count(*) total_cnt,
sum(case when data_col is null then 1 else 0 end) null_cnt,
sum(case when data_col is null then 0 else 1 end) nonnull_cnt
FROM mytable
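An equivalent way to get the null count relies on COUNT(column) skipping NULLs, so COUNT(*) minus COUNT(column) is the number of missing values. The sketch below runs on SQLite as a stand-in for Hive; the table contents and the second column other_col are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (data_col TEXT, other_col INTEGER);  -- other_col is hypothetical
INSERT INTO mytable VALUES ('a', 1), (NULL, 2), ('c', NULL), (NULL, NULL);
""")

# COUNT(*) counts every row; COUNT(col) counts only rows where col is not NULL.
row = conn.execute("""
SELECT COUNT(*)                    AS total_cnt,
       COUNT(*) - COUNT(data_col)  AS data_col_nulls,
       COUNT(*) - COUNT(other_col) AS other_col_nulls
FROM mytable
""").fetchone()

print(row)  # (4, 2, 2)
```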

GROUP BY SUM CASE expression

I want to group by account number, but I am running into problems if I get multiple RATE_CD's for an account - I get a NONCOMPLIANT_CNT of 2, but I want it to be only 1 per account even if there is more than 1 RATE_CD.
Below is the SQL I'm playing around with, any ideas on how I can return the NONCOMPLIANT_CNT per account, and not roll up the count if there is more than 1 RATE_CD?
SELECT ID
,ACCOUNT_NBR
,SUM(CASE
WHEN GROUP_CD = 'RED'
AND TYPE_CD IN ('CHK')
THEN 1
ELSE 0
END) AS 'COMPLIANT_CNT'
,SUM(CASE
WHEN GROUP_CD = 'RED'
AND TYPE_CD IN (
'CN'
,'RN'
)
AND RATE_CD <> 'BLK'
THEN 1
ELSE 0
END) AS 'NONCOMPLIANT_CNT'
,SUM(CASE
WHEN GROUP_CD = 'RED'
AND TYPE_CD IN (
'CN'
,'RN'
,'CHK'
)
THEN 1
ELSE 0
END) AS 'TOTAL_CNT'
FROM DETAIL
LEFT OUTER JOIN RATE_LOOKUP ON DETAIL.ACCOUNT_NBR = RATE_LOOKUP.ACCOUNT_NBR
GROUP BY ID
,ACCOUNT_NBR
,RATE_CD
If you want a 0/1 flag rather than the actual number of matching rows, change SUM() to MAX(). An account with five matching entries then shows 1, and an account with none shows 0 for that column.
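The difference between SUM() and MAX() over the same CASE expression can be seen in a small sketch. It runs on SQLite and makes two simplifying assumptions: RATE_CD is stored directly in a single detail table instead of being joined in, and the grouping is by account only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE detail (ID INTEGER, ACCOUNT_NBR TEXT, GROUP_CD TEXT,
                     TYPE_CD TEXT, RATE_CD TEXT);
INSERT INTO detail VALUES
  (1, 'A1', 'RED', 'CN',  'X'),   -- account A1 has two noncompliant rows
  (1, 'A1', 'RED', 'RN',  'Y'),
  (2, 'B2', 'RED', 'CHK', 'X');   -- account B2 has none
""")

# SUM adds up every matching row; MAX collapses them to a 0/1 flag.
rows = conn.execute("""
SELECT ACCOUNT_NBR,
       SUM(CASE WHEN GROUP_CD = 'RED' AND TYPE_CD IN ('CN', 'RN')
                 AND RATE_CD <> 'BLK' THEN 1 ELSE 0 END) AS noncompliant_rows,
       MAX(CASE WHEN GROUP_CD = 'RED' AND TYPE_CD IN ('CN', 'RN')
                 AND RATE_CD <> 'BLK' THEN 1 ELSE 0 END) AS noncompliant_flag
FROM detail
GROUP BY ACCOUNT_NBR
ORDER BY ACCOUNT_NBR
""").fetchall()

print(rows)  # [('A1', 2, 1), ('B2', 0, 0)]
```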

select query result filter using if conditions

I have the following select query
SELECT
Table.ID,
SUM(CASE WHEN Table.Status = 1 THEN 1 ELSE null END) AS NormalCount,
SUM(CASE WHEN Table.status = 2 THEN 1 ELSE null END) AS AbnormalCount
FROM Table
GROUP BY Table.ID
I want to get above results and generate new result set with following conditions
IF(NormalCount > 0 or AbnormalCount == NULL)
SELECT
Table.ID
Table.Status AS "Normal"
FROM Table
GROUP BY Table.ID
ELSE IF ( AbnormalCount > 0)
SELECT
Table.ID
Table.Status AS "Abnormal"
SUM(CASE WHEN Header.status = 2 THEN 1 ELSE null END) AS AbnormalCount
FROM Table
GROUP BY Table.ID
I think the logic you want is to label each ID group as abnormal if it has one or more abnormal observations. If so, you can use another CASE expression to check the conditional abnormal sum and label the status accordingly. Normal groups have an abnormal count of zero, and the count is reported for every group.
SELECT t.ID,
CASE WHEN SUM(CASE WHEN t.status = 2 THEN 1 ELSE 0 END) > 0
THEN 'Abnormal'
ELSE 'Normal' END AS Status,
SUM(CASE WHEN t.status = 2 THEN 1 ELSE 0 END) AS AbnormalCount
FROM Table t
GROUP BY t.ID
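A quick way to check this labelling is to run it on a few rows. The sketch below uses SQLite with a hypothetical table named readings (chosen because Table is a reserved word) and made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (ID INTEGER, status INTEGER);  -- hypothetical table name
INSERT INTO readings VALUES (1, 1), (1, 1), (2, 1), (2, 2), (2, 2), (3, 2);
""")

# A group is 'Abnormal' as soon as it contains at least one status = 2 row.
rows = conn.execute("""
SELECT t.ID,
       CASE WHEN SUM(CASE WHEN t.status = 2 THEN 1 ELSE 0 END) > 0
            THEN 'Abnormal'
            ELSE 'Normal' END AS Status,
       SUM(CASE WHEN t.status = 2 THEN 1 ELSE 0 END) AS AbnormalCount
FROM readings t
GROUP BY t.ID
ORDER BY t.ID
""").fetchall()

print(rows)  # [(1, 'Normal', 0), (2, 'Abnormal', 2), (3, 'Abnormal', 1)]
```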

Count rows for two columns using two different clauses

I'm after a CTE which I want to return two columns, one with the total number of 1's and one with the total number of 0's. Currently I can get it to return one column with the total number of 1's using:
WITH getOnesAndZerosCTE
AS (
SELECT COUNT([message]) AS TotalNo1s
FROM dbo.post
WHERE dbo.checkletters([message]) = 1
--SELECT COUNT([message]) AS TotalNo0s
--FROM dbo.post
--WHERE dbo.checkletters([message]) = 0
)
SELECT * FROM getOnesAndZerosCTE;
How do I add a second column called TotalNo0s to the same CTE? I have left the query for it commented out above to show what I mean.
Using conditional aggregation:
WITH getOnesAndZerosCTE AS(
SELECT
TotalNo1s = SUM(CASE WHEN dbo.checkletters([message]) = 1 THEN 1 ELSE 0 END),
TotalNo0s = SUM(CASE WHEN dbo.checkletters([message]) = 0 THEN 1 ELSE 0 END)
FROM post
)
SELECT * FROM getOnesAndZerosCTE;
If you use COUNT() directly, be aware that it counts every non-NULL value. You can omit the ELSE branch, since a CASE without ELSE implicitly returns NULL:
SELECT
COUNT(CASE WHEN dbo.checkletters([message]) = 1 THEN 1 END) TotalNo1s
, COUNT(CASE WHEN dbo.checkletters([message]) = 0 THEN 1 END) TotalNo0s
FROM post
or, explicitly state NULL
SELECT
COUNT(CASE WHEN dbo.checkletters([message]) = 1 THEN 1 ELSE NULL END) TotalNo1s
, COUNT(CASE WHEN dbo.checkletters([message]) = 0 THEN 1 ELSE NULL END) TotalNo0s
FROM post
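The SUM and COUNT forms agree whenever there is at least one input row, but they differ on an empty input: SUM returns NULL while COUNT returns 0. The sketch below shows both on SQLite. The Python function checkletters is a hypothetical stand-in for dbo.checkletters, assumed here to return 1 for purely alphabetic messages and 0 otherwise.

```python
import sqlite3

def checkletters(message):
    # Hypothetical stand-in for dbo.checkletters: 1 if only letters, else 0.
    return 1 if message.isalpha() else 0

conn = sqlite3.connect(":memory:")
conn.create_function("checkletters", 1, checkletters)
conn.executescript("""
CREATE TABLE post (message TEXT);
INSERT INTO post VALUES ('hello'), ('abc123'), ('world'), ('42');
""")

query = """
SELECT SUM(CASE WHEN checkletters(message) = 1 THEN 1 ELSE 0 END) AS TotalNo1s,
       COUNT(CASE WHEN checkletters(message) = 0 THEN 1 END)      AS TotalNo0s
FROM post
"""

with_rows = conn.execute(query).fetchone()

# Same query over an empty table: SUM yields NULL (None), COUNT yields 0.
conn.execute("DELETE FROM post")
empty = conn.execute(query).fetchone()

print(with_rows)  # (2, 2)
print(empty)      # (None, 0)
```

If a 0 is wanted in the empty case, COUNT(CASE ...) avoids the need to wrap the SUM in COALESCE.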
You can also do it without a CTE. Note that this returns one row per function value rather than two columns:
select count(message) total,
dbo.checkletters(message) strLength
from post
group by dbo.checkletters(message)
having dbo.checkletters(message) in (1, 2) -- all the messages with length 1 or 2

case statement doesn't go to else

I am wondering why the following query doesn't return 'N/A' when there are no rows for ENVIRON='Dev/Int'. It returns null instead. I tried NVL(COUNT(*)) but that doesn't work either.
Any thoughts?
Thanks in advance.
SELECT G1.NAME,
(SELECT CASE
WHEN COUNT(*) > 0 AND ticket IS NOT NULL THEN 'Solved'
WHEN COUNT(*) > 0 AND ticket IS NULL THEN 'Done'
ELSE 'N/A'
END
FROM TABLE1
WHERE ENVIRON='Dev/Int' AND G1.NAME=NAME GROUP BY ENVIRON, ticket ) "Dev/Int"
FROM TABLE1 G1 group by G1.NAME
The subquery returns no rows because the WHERE clause filters them all out. The CASE sits inside that subquery, so with no rows (and therefore no groups) to process it is never evaluated, and the scalar subquery yields NULL.
I think you just want conditional aggregation. The subqueries don't seem necessary:
SELECT G1.NAME,
(CASE WHEN SUM(CASE WHEN ENVIRON = 'Dev/Int' then 1 else 0 END) > 0 AND ticket IS NOT NULL
THEN 'Solved'
WHEN SUM(CASE WHEN ENVIRON = 'Dev/Int' then 1 else 0 END) > 0 AND ticket IS NULL
THEN 'Done'
ELSE 'N/A'
END) as "Dev/Int"
FROM TABLE1 G1
group by G1.NAME;
EDIT:
Oops, the above left ticket out of the sum(). I think the logic you want has ticket in the sum() condition:
SELECT G1.NAME,
(CASE WHEN SUM(CASE WHEN ENVIRON = 'Dev/Int' AND ticket IS NOT NULL then 1 else 0 END) > 0
THEN 'Solved'
WHEN SUM(CASE WHEN ENVIRON = 'Dev/Int' AND ticket IS NULL then 1 else 0 END) > 0
THEN 'Done'
ELSE 'N/A'
END) as "Dev/Int"
FROM TABLE1 G1
group by G1.NAME;
I'm surprised your original query ran at all instead of failing with an error such as "subquery returned more than one row".
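Both behaviours can be reproduced on a small data set. The sketch below uses SQLite with made-up rows; each NAME appears once, so the outer query of the subquery variant needs no GROUP BY, and that variant is simplified to a single WHEN branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (NAME TEXT, ENVIRON TEXT, ticket TEXT);
INSERT INTO table1 VALUES
  ('app1', 'Dev/Int', 'T-1'),
  ('app2', 'Dev/Int', NULL),
  ('app3', 'Prod',    'T-9');   -- no Dev/Int row for app3
""")

# Grouped scalar subquery: for app3 the WHERE clause leaves no rows, GROUP BY
# produces no groups, and the subquery result is NULL, never 'N/A'.
subquery_rows = conn.execute("""
SELECT G1.NAME,
       (SELECT CASE WHEN COUNT(*) > 0 THEN 'Done' ELSE 'N/A' END
        FROM table1
        WHERE ENVIRON = 'Dev/Int' AND NAME = G1.NAME
        GROUP BY ENVIRON) AS "Dev/Int"
FROM table1 G1
ORDER BY G1.NAME
""").fetchall()

# Conditional aggregation: every NAME forms a group, so ELSE is reachable.
aggregated_rows = conn.execute("""
SELECT G1.NAME,
       CASE WHEN SUM(CASE WHEN ENVIRON = 'Dev/Int' AND ticket IS NOT NULL
                          THEN 1 ELSE 0 END) > 0 THEN 'Solved'
            WHEN SUM(CASE WHEN ENVIRON = 'Dev/Int' AND ticket IS NULL
                          THEN 1 ELSE 0 END) > 0 THEN 'Done'
            ELSE 'N/A' END AS "Dev/Int"
FROM table1 G1
GROUP BY G1.NAME
ORDER BY G1.NAME
""").fetchall()

print(subquery_rows)    # [('app1', 'Done'), ('app2', 'Done'), ('app3', None)]
print(aggregated_rows)  # [('app1', 'Solved'), ('app2', 'Done'), ('app3', 'N/A')]
```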