Pivot not returning aggregate (SQL Adventureworks) - sql

I am using the AdventureWorks sample database (specifically, the HumanResources.Employee table).
For simplicity, here is a sample of the table:
HireDate    Gender  SalariedFlag
-----------------------------------------
2014-01-01  M       1
2015-01-30  F       1
2014-01-30  M       1
2014-02-12  F       0
2014-03-11  F       1
and so on.
Code:
SELECT
    YEAR(hiredate), F, M
FROM
    HumanResources.employee
PIVOT
    (sum(SalariedFlag)
     FOR gender IN ([F], [M])
    ) AS gg
-- Unable to use SUM here, though, since SalariedFlag is a bit field
Expected output:
Year  F  M
-------------------
2014  1  2   # count(SalariedFlag)
2015  1  0   # count(SalariedFlag)
But I really get:
No name  F  M
-------------------------------
2014     0  1
2015     1  0
2014     0  1
2014     1  0
2014     1  0
and so on.
So the output does not seem to consider the SalariedFlag column at all: it simply returns 1 in F if the person is female and 1 in M if the person is male.
What am I doing wrong?

Firstly, counting the value of SalariedFlag isn't going to achieve anything. COUNT counts the number of rows with a non-NULL value, and all your rows have a non-NULL value. Instead you want to COUNT the number of rows where the value of SalariedFlag is 1.
You might, therefore, think of using SUM on the column. However, as it's a "flag", it's most likely a bit, and you can't SUM a bit. Using COUNT and checking that the value is 1 with a CASE expression would therefore be better.
Personally, rather than using the restrictive PIVOT operator, I would suggest you use conditional aggregation. This gives you the following:
SELECT YEAR(HireDate) AS HireYear,
COUNT(CASE WHEN Gender = 'M' AND SalariedFlag = 1 THEN 1 END) AS M,
COUNT(CASE WHEN Gender = 'F' AND SalariedFlag = 1 THEN 1 END) AS F
FROM #YourTable
GROUP BY YEAR(HireDate);
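If you would rather keep PIVOT, a likely reason the original query returned one row per employee is that PIVOT implicitly groups by every source column that is not named in the aggregate or in the FOR clause, and HumanResources.Employee has many other columns. The sketch below is only an illustration: #Employee is a hypothetical temporary table holding just the sample rows from the question, and the names HireYear, Salaried, src and pvt are made up for the example. It narrows the source to three columns first and casts the bit to int so that SUM is allowed.

```sql
-- Hypothetical stand-in for HumanResources.Employee, holding the question's sample rows.
CREATE TABLE #Employee (HireDate date, Gender nchar(1), SalariedFlag bit);

INSERT INTO #Employee (HireDate, Gender, SalariedFlag)
VALUES ('2014-01-01', N'M', 1),
       ('2015-01-30', N'F', 1),
       ('2014-01-30', N'M', 1),
       ('2014-02-12', N'F', 0),
       ('2014-03-11', N'F', 1);

-- The derived table exposes only the grouping column (HireYear), the spreading
-- column (Gender) and the value to aggregate, so PIVOT groups by HireYear alone.
SELECT HireYear, [F], [M]
FROM (
    SELECT YEAR(HireDate) AS HireYear,
           Gender,
           CAST(SalariedFlag AS int) AS Salaried  -- a bit cannot be summed directly
    FROM #Employee
) AS src
PIVOT (
    SUM(Salaried) FOR Gender IN ([F], [M])
) AS pvt
ORDER BY HireYear;

DROP TABLE #Employee;
```

With these rows, a year and gender combination that has no employees at all (2015, M) comes back as NULL rather than 0; wrap the pivoted columns in ISNULL(..., 0) if you need zeros.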

Related

SQL Group by on multiple conditions on same table

I am trying to write an SQL query to get the output below.
The table has the following data:
ID GENDER
10 M
10 F
10 F
20 F
20 M
Output:
ID Male Female
10 1 2
20 1 1
Do I need to use CASE with GROUP BY? Can someone help here?
Use a simple GROUP BY with conditional sums:
select id,
       sum(case when GENDER = 'M' then 1 else 0 end) as Male,
       sum(case when GENDER = 'F' then 1 else 0 end) as Female
from tablename
group by id

Use case after order by

I was reading an SQL book, and one of the questions is:
Write a query against the Sales.Customers table that returns for each customer the customer ID and region. Sort the rows in the output by region, having NULL marks sort last (after non-NULL values). Note that the default sort behavior for NULL marks in T-SQL is to sort first (before non-NULL values).
And the answer is:
SELECT custid, region
FROM Sales.Customers
ORDER BY
CASE WHEN region IS NULL THEN 1 ELSE 0 END, region;
I kind of get the idea, but I am still confused. Take the record with custid = 9, for instance:
since custid 9 has a NULL region, the CASE expression returns 1, so the query is something like:
ORDER BY 1, region
which is equivalent to:
ORDER BY custid, region --because custid is the first column
So how come custid 9 is not before custid 10 (the second record in the output)? Doesn't the output need to be ordered by custid first, so that 9 comes before 10?
Your interpretation is incorrect. The 1 is simply a number, not a column reference.
The query is equivalent to:
SELECT custid, region
FROM (SELECT c.*,
(CASE WHEN region IS NULL THEN 1 ELSE 0 END) as region_is_null
FROM Sales.Customers c
) c
ORDER BY region_is_null, region;
This is an important distinction about numbers in the ORDER BY. The expression:
ORDER BY 1
refers to the first column. However,
ORDER BY 1 + 0
is simply a numeric expression that returns the constant 1 -- and will result in an error in SQL Server (which does not allow constants in ORDER BY).
so the query is something like
ORDER BY 1, region
No, this is incorrect. The expression CASE WHEN region IS NULL THEN 1 ELSE 0 END is evaluated per row, and the 1 is a value, not a column position. A column position in ORDER BY can only be specified as a literal, not as an expression. So this:
custid  region
8       NULL
9       NULL
10      BC
42      BC
45      CA
Becomes:
custid  region  case...
8       NULL    1
9       NULL    1
10      BC      0
42      BC      0
45      CA      0
And the sorted results could be:
custid  region  case...
10      BC      0
42      BC      0
45      CA      0
8       NULL    1
9       NULL    1
Or:
custid  region  case...
42      BC      0
10      BC      0
45      CA      0
9       NULL    1
8       NULL    1
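You can try the query below. Rows whose CASE value is 0 sort before rows whose value is 1, so to change which group comes first you can either swap the two values or keep them and sort the CASE expression in DESC order. Note that with the values as written here, NULL regions get 0 and therefore sort first.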
SELECT custid, region
FROM Sales.Customers
ORDER BY
CASE WHEN region IS NULL THEN 0 ELSE 1 END, region
The idea is to use a CASE expression to create a calculated virtual column that marks NULLs as 0 and non-NULLs as 1, and then sort by it.
If you use the literal 0 in the ORDER BY clause you will get an error, because there is no column at position 0. Also, if you reorder the selected columns, the result will be the same.
So the output of the CASE expression is not a column position; it is a calculated column:
customer_id      region     marker
(not important)  (if NULL)  0
ORDER BY CASE
WHEN region IS NULL THEN
1
ELSE
0
END,
region
is not equivalent to
ORDER BY 1,
region
because in the second one the first column to sort by is always constant, whereas in the first it can change depending on the CASE.
And
ORDER BY 1,
region
is also not equivalent to
ORDER BY custid,
region
again in the first the 1 is constant but custid is variable.
What
ORDER BY CASE
WHEN region IS NULL THEN
1
ELSE
0
END,
region
does is to "generate" a new column to sort by, depending on the content of region. That new column gets 1 when region is NULL and 0 otherwise. If you imagine this new column in the table, it would look like
custid | region | new column
...
10 | BC | 0
...
9 | NULL | 1
...
Now, if this gets sorted by the new column and then by region, the customer with ID 10 comes before the customer with ID 9, because the one with ID 10 has the lower value for the new column: 0 against the 1 of the customer with ID 9.
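To see the computed sort key next to the data, here is a minimal sketch. It assumes a hypothetical temporary table #Customers that holds only the five sample rows shown earlier (it is not the book's Sales.Customers table), and the region_is_null alias is made up for the example. The CASE expression is repeated in the select list purely so the key is visible.

```sql
-- Hypothetical stand-in for Sales.Customers with five sample rows.
CREATE TABLE #Customers (custid int, region nvarchar(15) NULL);

INSERT INTO #Customers (custid, region)
VALUES (8, NULL), (9, NULL), (10, N'BC'), (42, N'BC'), (45, N'CA');

-- region_is_null is selected only to make the sort key visible;
-- custid is added as a final tie-breaker so the order is deterministic.
SELECT custid,
       region,
       CASE WHEN region IS NULL THEN 1 ELSE 0 END AS region_is_null
FROM #Customers
ORDER BY CASE WHEN region IS NULL THEN 1 ELSE 0 END, region, custid;
-- Rows come back in the order custid 10, 42, 45, 8, 9.

DROP TABLE #Customers;
```

Without the trailing custid in the ORDER BY, rows that tie on both the key and the region (10 and 42, or 8 and 9) could come back in either order.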

SQL Server : how can I get difference between counts of total rows and those with only data

I have a table with data as shown below (the table is built every day with current date, but I left off that field for ease of reading).
This table keeps track of people and the doors they enter on a daily basis.
Table entrance_t:
id entrance entered
------------------------
1 a 0
1 b 0
1 c 0
1 d 0
2 a 1
2 b 0
2 c 0
2 d 0
3 a 0
3 b 1
3 c 1
3 d 1
My goal is to report on people and count the entrances they did not use (grouping by person), but ONLY for people who entered at least once (entered = 1).
So, using the above table, I would like the query results to be:
id count
----------
2 3
3 1
(id=2 did not use 3 of the entrances and id=3 did not use 1)
I tried several queries (some with inner joins on two instances of the same table) and I can get the entrances not used, but always for everybody, like this:
id count
----------
1 4
2 3
3 1
How do I exclude id=1 from the results, since they did not enter at all?
Thank you,
You could use conditional aggregation:
SELECT id, count(CASE WHEN entered = 0 THEN 1 END) AS cnt
FROM entrance_t
GROUP BY id
HAVING count(CASE WHEN entered = 1 THEN 1 END) > 0;
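The query relies on two things: a CASE with no ELSE branch yields NULL, which COUNT skips, and HAVING filters whole groups after aggregation. A minimal sketch for trying it out, assuming a hypothetical temporary table #entrance_t loaded with the rows from the question (the column types are guesses):

```sql
-- Hypothetical temporary copy of entrance_t; column types are assumed.
CREATE TABLE #entrance_t (id int, entrance char(1), entered bit);

INSERT INTO #entrance_t (id, entrance, entered)
VALUES (1, 'a', 0), (1, 'b', 0), (1, 'c', 0), (1, 'd', 0),
       (2, 'a', 1), (2, 'b', 0), (2, 'c', 0), (2, 'd', 0),
       (3, 'a', 0), (3, 'b', 1), (3, 'c', 1), (3, 'd', 1);

SELECT id,
       COUNT(CASE WHEN entered = 0 THEN 1 END) AS cnt   -- entrances not used
FROM #entrance_t
GROUP BY id
HAVING COUNT(CASE WHEN entered = 1 THEN 1 END) > 0      -- keep only people who entered somewhere
ORDER BY id;
-- Returns (2, 3) and (3, 1); id = 1 is removed by the HAVING clause.

DROP TABLE #entrance_t;
```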

SQL Aggregation prior to advanced SQL

Given a simple case like this:
Gender DeptID
--------------
M 1
F 1
M 2
F 2
F 2
What SQL statement do I need to write if I want to generate the following result without using advanced features such as CUBE or ROLLUP, just plain SQL-92?
GenderSum Dept1Sum Dept2Sum
----------------------------
M 1 1
F 1 2
I was wondering how such information would have been generated by ETL in the past using SQL.
Remark: It is possible to GROUP BY Gender and UNION that with a GROUP BY on DeptID to get a vertical result set, but this is clearly not what I want.
You can try the following pivot query:
SELECT Gender AS GenderSum,
SUM(CASE WHEN DeptID = 1 THEN 1 ELSE 0 END) AS Dept1Sum,
SUM(CASE WHEN DeptID = 2 THEN 1 ELSE 0 END) AS Dept2Sum
FROM yourTable
GROUP BY Gender

Very special kind of AVG statement

Table example:
time a b c
-------------
12:00 1 0 1
12:00 2 3 1
13:00 3 2 1
13:00 3 3 3
14:00 1 1 1
How can I get AVG(a) over only the rows WHERE b != 0, together with AVG(c) over all rows, grouped by time? Is it possible to solve this with SQL only? I mean that the query should not count the 1st row when computing AVG(a), but should still count it for AVG(c).
You can utilize CASE statements to get conditional aggregates:
SELECT AVG(CASE WHEN b != 0 THEN a END)
,AVG(c)
FROM YourTable
GROUP BY time
This works because a value not captured by WHEN criteria in a CASE statement will default to NULL, and NULL values are ignored by aggregate functions.
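To compare the conditional average with an unconditional one side by side, here is a small sketch. It assumes a hypothetical temporary table #T holding the question's rows, with the time stored as text for simplicity; the column aliases are made up for the example.

```sql
-- Hypothetical temporary table with the question's sample rows.
CREATE TABLE #T ([time] char(5), a int, b int, c int);

INSERT INTO #T ([time], a, b, c)
VALUES ('12:00', 1, 0, 1),
       ('12:00', 2, 3, 1),
       ('13:00', 3, 2, 1),
       ('13:00', 3, 3, 3),
       ('14:00', 1, 1, 1);

SELECT [time],
       AVG(CASE WHEN b <> 0 THEN a END) AS avg_a_where_b_nonzero, -- rows with b = 0 become NULL and are skipped
       AVG(a)                           AS avg_a_all_rows,        -- every row in the group counts
       AVG(c)                           AS avg_c_all_rows         -- unaffected by the CASE above
FROM #T
GROUP BY [time]
ORDER BY [time];

DROP TABLE #T;
```

For the 12:00 group, the conditional average uses only the row with a = 2, while the other two aggregates still see both rows. Because the columns are int, SQL Server returns integer averages; cast to a decimal type inside AVG if you need fractions. If every row in a group has b = 0, the conditional average is NULL.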
SELECT AVG(a), AVG(c) from table WHERE b != 0
group by time
Yea... is this what you need?
You might want to try something like
SELECT T.tTIME
, AVG(CASE WHEN T.B != 0 THEN T.A END)
, AVG(T.C)
FROM #T T
GROUP BY T.tTIME
The output is the following:
tTIME             (No column name)  (No column name)
12:00:00.0000000  2                 1
13:00:00.0000000  3                 2
14:00:00.0000000  1                 1