How to transpose rows to columns for grouped data? - sql

While analyzing employees and their supervisors, I ran into trouble with this BigQuery statement:
SELECT SupervisorName, Emp_Status, COUNT(DISTINCT EmpNO) AS NUMBER
FROM
  (SELECT
     EmpNO,
     EmpName,
     (CASE WHEN TerminationDate IS NULL THEN 'Active'
           ELSE 'Terminated'
      END) AS Emp_Status,
     DateOfBirth,
     DATE_DIFF(CURRENT_DATE(), DateOfBirth, YEAR) AS Age,
     SupervisorName
   FROM `Table1`
  )
GROUP BY SupervisorName, Emp_Status
ORDER BY SupervisorName, NUMBER DESC
The result is shown below:
Row  SupervisorName  Emp_Status  NUMBER
1    null            Terminated  321
2    null            Active      2
3    Ahearn          Active      3
4    Ahearn          Terminated  2
5    Allen           Active      6
6    Allen           Terminated  3
......
How can I change it to look like this:
Row  SupervisorName  Active  Termination  Total
1    Null            2       321          323
2    Ahearn          3       2            5
3    Allen           6       3            9
......

The standard pattern here is conditional aggregation with SUM and CASE, as below:
SELECT
  SupervisorName,
  SUM(CASE WHEN Emp_Status = 'Active' THEN 1 ELSE 0 END) AS Active,
  SUM(CASE WHEN Emp_Status = 'Terminated' THEN 1 ELSE 0 END) AS Termination,
  COUNT(*) AS Total
FROM (
  SELECT
    EmpNO,
    EmpName,
    CASE WHEN TerminationDate IS NULL THEN 'Active' ELSE 'Terminated' END AS Emp_Status,
    DateOfBirth,
    DATE_DIFF(CURRENT_DATE(), DateOfBirth, YEAR) AS Age,
    SupervisorName
  FROM `Table1`
)
GROUP BY SupervisorName
Note that I kept your sub-query, but as written you don't actually need one: the CASE expressions can test TerminationDate directly instead of the Emp_Status string the sub-query creates.
I assume your real query is more complicated, so I left it as is.
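To illustrate the point about the sub-query, here is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory database as a stand-in for BigQuery (an assumption: SQLite, not BigQuery, runs the SQL here). The rows in `Table1` are made up, and only the columns the query needs are included. The CASE expressions test TerminationDate directly, so no sub-query is required.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (EmpNO INTEGER, SupervisorName TEXT, TerminationDate TEXT)"
)
# Hypothetical sample rows; None means "not terminated".
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        (1, "Ahearn", None),
        (2, "Ahearn", None),
        (3, "Ahearn", None),
        (4, "Ahearn", "2019-05-01"),
        (5, "Ahearn", "2020-02-15"),
        (6, None, None),
        (7, None, "2018-07-31"),
    ],
)

# Conditional aggregation without a sub-query: each CASE tests
# TerminationDate directly instead of a derived Emp_Status string.
rows = conn.execute("""
    SELECT SupervisorName,
           SUM(CASE WHEN TerminationDate IS NULL THEN 1 ELSE 0 END) AS Active,
           SUM(CASE WHEN TerminationDate IS NOT NULL THEN 1 ELSE 0 END) AS Termination,
           COUNT(*) AS Total
    FROM Table1
    GROUP BY SupervisorName
    ORDER BY SupervisorName
""").fetchall()

for row in rows:
    print(row)  # SQLite sorts the NULL supervisor first
```

With one row per employee, `COUNT(*)` equals the question's `COUNT(DISTINCT EmpNO)`; if an employee can appear in several rows, the distinct form is the one to keep.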

Maybe this is what you want:
select
SupervisorName,
count(distinct if(TerminationDate is null, EmpNO, null)) as active,
count(distinct if(TerminationDate is null, null, EmpNO)) as terminated,
count(distinct EmpNO) as dist_total,
count(*) as total
from
`Table1`
where
-- use the DATE keyword and an ISO 8601 literal (YYYY-MM-DD)
LastHireDate between date'2018-01-01'
and current_date()
group by
1
order by
1, 4 desc
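The difference between `count(distinct ...)` and `count(*)` in this answer only shows up when an employee appears in more than one row. The sketch below again uses Python's sqlite3 module as a stand-in for BigQuery, and writes BigQuery's `if(cond, EmpNO, null)` as the equivalent portable `CASE WHEN cond THEN EmpNO END`, which also yields NULL when the condition fails. The sample rows, including the duplicated employee 1, are hypothetical, and the LastHireDate filter is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (EmpNO INTEGER, SupervisorName TEXT, TerminationDate TEXT)"
)
# Hypothetical data: employee 1 appears twice (e.g. two hire records),
# so the row count and the distinct-employee count differ.
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        (1, "Allen", None),
        (1, "Allen", None),
        (2, "Allen", "2019-01-31"),
        (3, "Allen", None),
    ],
)

# CASE ... END without ELSE returns NULL, and COUNT(DISTINCT ...) skips NULLs,
# so each column counts distinct employees matching its condition.
rows = conn.execute("""
    SELECT SupervisorName,
           COUNT(DISTINCT CASE WHEN TerminationDate IS NULL THEN EmpNO END) AS active,
           COUNT(DISTINCT CASE WHEN TerminationDate IS NOT NULL THEN EmpNO END) AS terminated,
           COUNT(DISTINCT EmpNO) AS dist_total,
           COUNT(*) AS total
    FROM Table1
    GROUP BY SupervisorName
""").fetchall()

print(rows)  # [('Allen', 2, 1, 3, 4)]
```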

Related

Retrieve a report of any duplicate rows in the EMP table, along with the number of times each row is duplicated

I have an EMP table defined as follows:
CREATE TABLE EMP
(
[ID] INT NOT NULL PRIMARY KEY,
[MGR_ID] INT,
[DEPT_ID] INT,
[NAME] VARCHAR(30),
[SAL] INT,
[DOJ] DATE
);
I need a report of any duplicate rows in the EMP table, along with the number of times each row is duplicated.
I have partially solved this. The following query returns a single instance of each duplicated row:
SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
from EMP
group by [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
having count(*) > 1
The output is:
MGR_ID  DEPT_ID  NAME   SAL  DOJ
NULL    2        Hash   100  2012-01-01
1       2        Robo   100  2012-01-01
2       1        Privy  50   2012-05-01
I still need to extend this output with the number of times each of these rows is duplicated in the EMP table.
I tried this:
WITH CTE
AS
(
SELECT * from EMP A
join ( SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
from EMP
group by [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
having count(*) > 1 ) B
on a.[MGR_ID] = b.[MGR_ID]
OR a.[MGR_ID] != b.[MGR_ID]
AND a.[DEPT_ID] = b.[DEPT_ID]
AND a.[NAME] = b.[NAME]
AND a.[SAL] = b.[SAL]
AND a.[DOJ] = b.[DOJ]
)
SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ], DENSE_RANK() OVER
(PARTITION BY [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ] ORDER BY DUPICATES) AS [DUPLICATES]
FROM CTE
But I got this error:
Msg 8156, Level 16, State 1, Line 1
The column 'MGR_ID' was specified multiple times for 'CTE'.
Please help.
I have partially found a solution, except that I still need to return the MGR_ID column in the output, including for the 3 records where it is NULL:
with cte as
(
SELECT A.[DEPT_ID],A.[NAME],A.[SAL],A.[DOJ] from EMP A
join ( SELECT [DEPT_ID],[NAME],[SAL],[DOJ]
from EMP
group by [DEPT_ID],[NAME],[SAL],[DOJ]
having count(*) > 1 ) B
ON a.[DEPT_ID] = b.[DEPT_ID]
AND a.[NAME] = b.[NAME]
AND a.[SAL] = b.[SAL]
AND a.[DOJ] = b.[DOJ]
)
SELECT [DEPT_ID],[NAME],[SAL],[DOJ], DENSE_RANK() OVER
(PARTITION BY [NAME] ORDER BY [NAME] DESC) AS [DUPLICATES], RANK() OVER
(PARTITION BY [NAME] ORDER BY [NAME] DESC) AS [SimpleRank]
FROM CTE
DEPT_ID  NAME   SAL  DOJ         DUPLICATES  SimpleRank
2        Hash   100  2012-01-01  1           1
2        Hash   100  2012-01-01  1           1
2        Hash   100  2012-01-01  1           1
1        Privy  50   2012-05-01  1           1
1        Privy  50   2012-05-01  1           1
1        Privy  50   2012-05-01  1           1
2        Robo   100  2012-01-01  1           1
2        Robo   100  2012-01-01  1           1
2        Robo   100  2012-01-01  1           1
The final solution turned out to be much simpler:
SELECT [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ], COUNT(*) AS Count_Of_Duplicated_Rows
FROM EMP
GROUP BY [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ]
HAVING COUNT(*) > 1
It produces this result set
MGR_ID  DEPT_ID  NAME   SAL  DOJ         Count_Of_Duplicated_Rows
NULL    2        Hash   100  2012-01-01  3
1       2        Robo   100  2012-01-01  3
2       1        Privy  50   2012-05-01  3
Note: rows are compared only on the columns listed in GROUP BY, so include every column that must match for two rows to count as duplicates.
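The example below is based on the earlier, more complex query. Note that its final SELECT reads from EMP rather than from the CTE, so it returns the same counts as the simple query above (which already compares all five columns, because it groups by all of them); with the HAVING clause commented out, it also lists rows that occur only once.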
WITH CTE
AS
(
SELECT A.[MGR_ID], A.[DEPT_ID], A.[NAME], A.[SAL], A.[DOJ]
FROM EMP A
JOIN (SELECT [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ]
FROM EMP
GROUP BY [MGR_ID], [DEPT_ID], [NAME], [SAL], [DOJ]
HAVING count(*) > 1) B
ON a.[MGR_ID] = b.[MGR_ID]
AND a.[DEPT_ID] = b.[DEPT_ID]
AND a.[NAME] = b.[NAME]
AND a.[SAL] = b.[SAL]
AND a.[DOJ] = b.[DOJ]
)
SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ],
count(*) As Count_Of_Duplicated_Rows
FROM EMP
GROUP BY [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
--HAVING Count(*) >1
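Why did the NULL MGR_ID rows disappear from the join-based attempts? An equality condition such as `a.[MGR_ID] = b.[MGR_ID]` is never true when both sides are NULL, whereas GROUP BY puts all NULLs into a single group. The sketch below checks this with Python's sqlite3 module as a stand-in for SQL Server (the square-bracket identifiers are dropped); the EMP rows are hypothetical, shaped after the result set shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMP (
        ID INTEGER PRIMARY KEY,  -- filled in automatically by SQLite
        MGR_ID INTEGER,
        DEPT_ID INTEGER,
        NAME TEXT,
        SAL INTEGER,
        DOJ TEXT
    )
""")

# Hypothetical data: three distinct rows, each stored three times.
groups = [
    (None, 2, "Hash", 100, "2012-01-01"),
    (1, 2, "Robo", 100, "2012-01-01"),
    (2, 1, "Privy", 50, "2012-05-01"),
]
conn.executemany(
    "INSERT INTO EMP (MGR_ID, DEPT_ID, NAME, SAL, DOJ) VALUES (?, ?, ?, ?, ?)",
    [g for g in groups for _ in range(3)],
)

# GROUP BY treats all NULL MGR_ID values as one group, so 'Hash' is kept.
grouped = conn.execute("""
    SELECT MGR_ID, DEPT_ID, NAME, SAL, DOJ, COUNT(*) AS Count_Of_Duplicated_Rows
    FROM EMP
    GROUP BY MGR_ID, DEPT_ID, NAME, SAL, DOJ
    HAVING COUNT(*) > 1
    ORDER BY NAME
""").fetchall()

# NULL = NULL is not true, so the equality join silently drops the 'Hash' rows.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM EMP a
    JOIN (SELECT MGR_ID, DEPT_ID, NAME, SAL, DOJ
          FROM EMP
          GROUP BY MGR_ID, DEPT_ID, NAME, SAL, DOJ
          HAVING COUNT(*) > 1) b
      ON a.MGR_ID = b.MGR_ID AND a.DEPT_ID = b.DEPT_ID AND a.NAME = b.NAME
     AND a.SAL = b.SAL AND a.DOJ = b.DOJ
""").fetchone()[0]

print(grouped)
print(joined_rows)  # 6: only the Robo and Privy rows survive the join
```

If a join is still wanted, a NULL-safe comparison such as `(a.MGR_ID = b.MGR_ID OR (a.MGR_ID IS NULL AND b.MGR_ID IS NULL))` keeps those rows.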
Your problem is that you do not explicitly name the selected columns inside your CTE. Since both EMP and the subquery have a column called MGR_ID, doing select * on the join returns the column MGR_ID twice. According to MSDN, this is not allowed:
The list of column names is optional only if distinct names for all resulting columns are supplied in the query definition.
Note that you will encounter the same error for each pair of columns that exists on both sides of the join. To resolve this, you can either explicitly name the columns returned by the CTE in a column list with an alias for the repeated columns, like so:
WITH CTE (mgr_id,dept_id,name,sal,doj,mgr_id2,...) --mgr_id2 is an alias for b.mgr_id
AS
...
You can refer to this SQLFiddle for a demo. Remove the column list and you will see the same error you see now.
Alternatively, you can list the columns to select inside the CTE itself. I would recommend this, since you don't actually need any repeated columns in your query:
;with cte as
(
SELECT A.[MGR_ID],A.[DEPT_ID],A.[NAME],A.[SAL],A.[DOJ] from EMP A
join ( SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
from EMP
group by [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
having count(*) > 1 ) B
...
Try this:
WITH CTE
AS
(
SELECT a.* from EMP A
join ( SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
from EMP
group by [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]
having count(*) > 1 ) B
on a.[MGR_ID] = b.[MGR_ID]
--OR a.[MGR_ID] != b.[MGR_ID]
AND a.[DEPT_ID] = b.[DEPT_ID]
AND a.[NAME] = b.[NAME]
AND a.[SAL] = b.[SAL]
AND a.[DOJ] = b.[DOJ]
),cte2 as(
SELECT [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ], DENSE_RANK() OVER
(PARTITION BY [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ] ORDER BY [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ]) AS [DUPLICATES]
FROM CTE )
select [MGR_ID],[DEPT_ID],[NAME],[SAL],[DOJ] from cte2 where DUPLICATES=1

SQL Query to fetch employee Attendance

I need to write a query on the Employee table that returns each employee ID with the number of days that employee was present, absent, or on a half day within a given date range.
Employee
AID  EmpID  Status   Date
1    10     Present  17-03-2015
2    10     Absent   18-03-2015
3    10     HalfDay  19-03-2015
4    10     Present  20-03-2015
5    11     Present  21-03-2015
6    11     Absent   22-03-2015
7    11     HalfDay  23-03-2015
Expected output:
EmpID  Present  Absent  HalfDay
10     2        1       1
11     1        1       1
Can you please help me with the SQL query?
Here is the query I tried:
SELECT EMP.EMPID,
(CASE WHEN EMP.STATUS = 'Present' THEN COUNT(STATUS) ELSE 0 END) Pres,
(CASE WHEN EMP.STATUS = 'Absent' THEN COUNT(STATUS) ELSE 0 END) ABSENT,
(CASE WHEN emp.status = 'HalfDay' THEN Count(status) ELSE 0 END) HalfDay
FROM EMPLOYEE EMP GROUP BY emp.empid
COUNT() counts every value that is NOT NULL. Since both 1 and 0 are non-NULL, it increments for both branches of a CASE expression like this:
COUNT(CASE Status WHEN 'Present' THEN 1 ELSE 0 END) AS Present
So we need to use SUM() ...
select empid,
sum(case when status='Present' then 1 else 0 end) present_tot,
sum(case when status='Absent' then 1 else 0 end) absent_tot,
sum(case when status='HalfDay' then 1 else 0 end) halfday_tot
from employee
group by empid
order by empid
/
... or use COUNT() with no ELSE branch, so that non-matching rows yield NULL and are not counted. Both produce the same output; perhaps this one is clearer:
SQL> select empid,
       count(case when status='Present' then 1 end) present_tot,
       count(case when status='Absent' then 1 end) absent_tot,
       count(case when status='HalfDay' then 1 end) halfday_tot
  from employee
 group by empid
 order by empid
/
EMPID PRESENT_TOT ABSENT_TOT HALFDAY_TOT
---------- ----------- ---------- -----------
10 2 1 1
11 1 1 1
SQL>
Note that we need ORDER BY to guarantee the order of the result set. Oracle introduced a hashing optimization for aggregation in 10g, which means GROUP BY rarely returns rows in a predictable order.
Replace 0 with NULL, because COUNT() counts 0 as well. I have also added a WHERE clause for the date range; see the example below:
select empID,
count(case when status='Present' then 1 else null end) Present_Days,
count(case when status='Absent' then 1 else null end) Absent_Days,
count(case when status='HalfDay' then 1 else null end) HalfDays
from Employee
where date >= to_date('17-03-2015', 'DD-MM-YYYY') and date <= to_date('23-03-2015', 'DD-MM-YYYY')
group by empID
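The COUNT-versus-SUM behavior and the date-range filter from these answers can be checked with a small sketch. It uses Python's sqlite3 module as a stand-in for Oracle and stores the dates as ISO 8601 text (YYYY-MM-DD) so that BETWEEN compares them correctly; both are assumptions, not part of the original question. The `Date` column is quoted as a precaution, since DATE is a reserved word in Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Employee (AID INTEGER, EmpID INTEGER, Status TEXT, "Date" TEXT)')
# The question's rows, with dates rewritten as ISO 8601 strings (assumption).
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        (1, 10, "Present", "2015-03-17"),
        (2, 10, "Absent", "2015-03-18"),
        (3, 10, "HalfDay", "2015-03-19"),
        (4, 10, "Present", "2015-03-20"),
        (5, 11, "Present", "2015-03-21"),
        (6, 11, "Absent", "2015-03-22"),
        (7, 11, "HalfDay", "2015-03-23"),
    ],
)

# COUNT(... ELSE 0 END) counts every row, because 0 is not NULL;
# SUM(... ELSE 0 END) and COUNT(... END) without ELSE count only matches.
comparison = conn.execute("""
    SELECT EmpID,
           COUNT(CASE WHEN Status = 'Present' THEN 1 ELSE 0 END) AS wrong_present,
           SUM(CASE WHEN Status = 'Present' THEN 1 ELSE 0 END) AS sum_present,
           COUNT(CASE WHEN Status = 'Present' THEN 1 END) AS count_present
    FROM Employee
    GROUP BY EmpID
    ORDER BY EmpID
""").fetchall()

# Full report for a date range, passed as bind parameters.
report = conn.execute("""
    SELECT EmpID,
           COUNT(CASE WHEN Status = 'Present' THEN 1 END) AS Present,
           COUNT(CASE WHEN Status = 'Absent' THEN 1 END) AS Absent,
           COUNT(CASE WHEN Status = 'HalfDay' THEN 1 END) AS HalfDay
    FROM Employee
    WHERE "Date" BETWEEN ? AND ?
    GROUP BY EmpID
    ORDER BY EmpID
""", ("2015-03-17", "2015-03-23")).fetchall()

print(comparison)  # [(10, 4, 2, 2), (11, 3, 1, 1)]
print(report)      # [(10, 2, 1, 1), (11, 1, 1, 1)]
```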

Select from two tables with multiple counts

I have 2 tables I'm trying to join in a select query.
Table 1: Store, primary key (store_id, store_num)
store_id  store_num  due_date    manager_id
1         100        06-30-2024  user1
2         108        06-30-2018  user2
3         109        13-31-2014  user3
Table 2: Department, where status is A (applied) or P (pending)
store_id  store_num  dept_num  status
1         100        201       A
1         100        202       A
1         100        203       P
1         100        204       A
1         100        205       P
1         100        206       A
I expect to select store_id, store_num, due_date, manager_id, the applied count, and the pending count. The result should look something like this:
store_id  store_num  due_date    manager_id  applied_count  pending_count
1         100        06-30-2024  user1       4              2
So far I can join the tables and get the data as multiple rows, but I can't get the counts to work. Can someone help me get the counts?
select
store.store_id,
store.store_num,
store.due_date,
store.manager_id,
dept.status
from store as store
inner join department as dept on store.store_id = dept.store_id
and store.store_num = dept.store_num
Your query is halfway there. You need an aggregation to get the values into separate columns. This is called conditional aggregation:
select s.store_id, s.store_num, s.due_date, s.manager_id,
sum(case when d.status = 'A' then 1 else 0 end) as Active_Count,
sum(case when d.status = 'P' then 1 else 0 end) as Pending_Count
from store s inner join
department d
on s.store_id = d.store_id and s.store_num = d.store_num
group by s.store_id, s.store_num, s.due_date, s.manager_id;
The expression:
sum(case when d.status = 'A' then 1 else 0 end) as Active_Count,
This counts the rows where status = 'A': it assigns each such row a value of 1 (and every other row 0), then sums those values.
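A minimal sketch of this conditional aggregation, using Python's sqlite3 module as a stand-in for the asker's database (an assumption) and the sample rows from the question (the row with the invalid date 13-31-2014 is left out):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE store (
        store_id INTEGER, store_num INTEGER, due_date TEXT, manager_id TEXT,
        PRIMARY KEY (store_id, store_num)
    )
""")
conn.execute(
    "CREATE TABLE department (store_id INTEGER, store_num INTEGER, dept_num INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO store VALUES (?, ?, ?, ?)",
    [(1, 100, "06-30-2024", "user1"), (2, 108, "06-30-2018", "user2")],
)
conn.executemany(
    "INSERT INTO department VALUES (1, 100, ?, ?)",
    [(201, "A"), (202, "A"), (203, "P"), (204, "A"), (205, "P"), (206, "A")],
)

# One output row per store; each SUM(CASE ...) turns matching rows into 1s.
rows = conn.execute("""
    SELECT s.store_id, s.store_num, s.due_date, s.manager_id,
           SUM(CASE WHEN d.status = 'A' THEN 1 ELSE 0 END) AS applied_count,
           SUM(CASE WHEN d.status = 'P' THEN 1 ELSE 0 END) AS pending_count
    FROM store s
    INNER JOIN department d
      ON s.store_id = d.store_id AND s.store_num = d.store_num
    GROUP BY s.store_id, s.store_num, s.due_date, s.manager_id
""").fetchall()

print(rows)  # [(1, 100, '06-30-2024', 'user1', 4, 2)]
```

Store 108 has no department rows, so the inner join drops it; with a LEFT JOIN it would appear with both counts at 0, because the CASE falls through to ELSE 0 when d.status is NULL.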

Oracle SQL- Getting "Distinct" values within a "CASE" query

I have the following query in ORACLE SQL:
Select
Trunc(Cs.Create_Dtime),
Count(Case When Cs.Cs_Listing_Id Like '99999999%' Then (Cs.Player_Id) End) As Sp_Dau,
Count(Case When Cs.Cs_Listing_Id Not Like '99999999%' Then (Cs.Player_Id) End) As Cs_Dau
From
Player_Chkin_Cs Cs
Where
Trunc(Cs.Create_Dtime) >= To_Date('2012-Jan-01','yyyy-mon-dd')
Group By Trunc(Cs.Create_Dtime)
Order By 1 ASC
I added DISTINCT just before CASE in each COUNT. I just want to make sure this counts only the distinct Player_Id values in each case. Can someone confirm? Thank you! Here is the final query:
Select
Trunc(Cs.Create_Dtime),
Count(Distinct Case When Cs.Cs_Listing_Id Like '99999999%' Then (Cs.Player_Id) End) As Sp_Dau,
Count(Distinct Case When Cs.Cs_Listing_Id Not Like '99999999%' Then (Cs.Player_Id) End) As Cs_Dau
From
Player_Chkin_Cs Cs
Where
Trunc(Cs.Create_Dtime) >= To_Date('2012-Jan-01','yyyy-mon-dd')
Group By Trunc(Cs.Create_Dtime)
Order By 1 ASC;
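Here is a simple test case proving that COUNT(DISTINCT ...) counts only distinct values (COUNT also skips NULLs, which is why ALL is 106 although the first query lists 107 rows in total):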
11:34:09 HR#vm_xe> select department_id, count(*) from employees group by department_id order by 2 desc;
DEPARTMENT_ID COUNT(*)
------------- ----------
50 45
80 34
100 6
30 6
60 5
90 3
20 2
110 2
40 1
10 1
1
70 1
12 rows selected.
Elapsed: 00:00:00.03
11:34:12 HR#vm_xe> select count(department_id) "ALL", count(distinct department_id) "DISTINCT" from employees;
ALL DISTINCT
---------- ----------
106 11
1 row selected.
Elapsed: 00:00:00.02
11:34:20 HR#vm_xe>
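Applied to the query in the question, `COUNT(DISTINCT CASE ... END)` counts each player once per condition, and the NULLs produced for non-matching rows are ignored rather than counted as an extra distinct value. Here is a sketch that checks this with Python's sqlite3 module as a stand-in for Oracle; the check-in rows are hypothetical, and `Create_Dtime` is stored as a plain date string so that the sketch does not need Oracle's TRUNC().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Player_Chkin_Cs (Player_Id INTEGER, Cs_Listing_Id TEXT, Create_Dtime TEXT)"
)
# Hypothetical check-ins: player 1 checks in to two '99999999...' listings
# and to one other listing; player 3 checks in to two other listings.
conn.executemany(
    "INSERT INTO Player_Chkin_Cs VALUES (?, ?, ?)",
    [
        (1, "99999999001", "2012-01-01"),
        (1, "99999999002", "2012-01-01"),
        (2, "99999999001", "2012-01-01"),
        (1, "12345", "2012-01-01"),
        (3, "12345", "2012-01-01"),
        (3, "67890", "2012-01-01"),
    ],
)

rows = conn.execute("""
    SELECT Create_Dtime,
           COUNT(CASE WHEN Cs_Listing_Id LIKE '99999999%' THEN Player_Id END) AS sp_rows,
           COUNT(DISTINCT CASE WHEN Cs_Listing_Id LIKE '99999999%' THEN Player_Id END) AS sp_dau,
           COUNT(DISTINCT CASE WHEN Cs_Listing_Id NOT LIKE '99999999%' THEN Player_Id END) AS cs_dau
    FROM Player_Chkin_Cs
    GROUP BY Create_Dtime
""").fetchall()

print(rows)  # [('2012-01-01', 3, 2, 2)]: 3 matching rows, but only players 1 and 2
```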

AVG and COUNT from one column

I have a table with these columns:
[Student_ID],[Class_ID],[Teacher_ID],[Course_ID],[Marks]
and each range of marks has a letter grade, for example:
between 0 and 5 = D
between 6 and 10 = C
between 11 and 15 = B
between 16 and 20 = A
Now I need to write a T-SQL query that returns these result columns:
Teacher_ID|Course_ID|Count(Marks)|Count(A)|Count(B)|Count(C)|Count(D)
Thanks very much for your help.
select Teacher_ID
, Course_ID
, count(*)
, sum(case when Marks between 16 and 20 then 1 else 0 end) as SumA
, sum(case when Marks between 11 and 15 then 1 else 0 end) as SumB
, sum(case when Marks between 6 and 10 then 1 else 0 end) as SumC
, sum(case when Marks between 0 and 5 then 1 else 0 end) as SumD
from YourTable
group by
Teacher_ID
, Course_ID
I would use the same approach as Andomar, only change sum to count like this:
select Teacher_ID
, Course_ID
, count(*)
, count(case when Marks between 16 and 20 then 1 end) as countA
, count(case when Marks between 11 and 15 then 1 end) as countB
, count(case when Marks between 6 and 10 then 1 end) as countC
, count(case when Marks between 0 and 5 then 1 end) as countD
from YourTable
group by
Teacher_ID
, Course_ID
In my opinion, the query looks more natural this way.
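One practical difference between SUM and COUNT here: if no row in a group falls into a range, `SUM(CASE ... THEN 1 END)` returns NULL, whereas `COUNT(CASE ... THEN 1 END)` returns 0; adding `ELSE 0` makes SUM return 0 as well. A quick check with Python's sqlite3 module standing in for SQL Server, using made-up marks with none in the 6 to 10 range:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Teacher_ID INTEGER, Course_ID INTEGER, Marks INTEGER)")
# Hypothetical marks for one teacher/course: none fall between 6 and 10.
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 10, 18), (1, 10, 12), (1, 10, 3)],
)

row = conn.execute("""
    SELECT SUM(CASE WHEN Marks BETWEEN 6 AND 10 THEN 1 END) AS sumC,
           COUNT(CASE WHEN Marks BETWEEN 6 AND 10 THEN 1 END) AS countC,
           SUM(CASE WHEN Marks BETWEEN 6 AND 10 THEN 1 ELSE 0 END) AS sumC_else0
    FROM YourTable
    GROUP BY Teacher_ID, Course_ID
""").fetchone()

print(row)  # (None, 0, 0): SUM over only NULLs is NULL, COUNT is 0
```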
You can do this with PIVOT. (Note also that this formulation of the Marks-to-Letter calculation is a bit safer than one where both ends of each range must be typed.)
with T as (
select
Teacher_ID,
Course_ID,
case
when Marks <= 5 then 'countD'
when Marks <= 10 then 'countC'
when Marks <= 15 then 'countB'
else 'countA' end as Letter
from YourTable
)
select
Teacher_ID,
Course_ID,
countD+countC+countB+countA as countMarks,
countA,
countB,
countC,
countD
from T pivot (
count(Letter) for Letter in ([countA],[countB],[countC],[countD])
) as P
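SQLite has no PIVOT operator, so the sketch below (Python's sqlite3 module as a stand-in for SQL Server, hypothetical marks) swaps in conditional COUNTs over the same cascading CASE that maps each mark to a letter. Because each WHEN tests only an upper bound and the first match wins, every non-NULL mark lands in exactly one letter, which is the safety property the answer mentions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Teacher_ID INTEGER, Course_ID INTEGER, Marks INTEGER)")
# Hypothetical marks, including the boundary values 5, 11 and 16.
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 10, 20), (1, 10, 16), (1, 10, 11), (1, 10, 7), (1, 10, 5), (1, 10, 0),
     (2, 20, 14)],
)

rows = conn.execute("""
    WITH T AS (
        SELECT Teacher_ID, Course_ID,
               CASE WHEN Marks <= 5 THEN 'D'
                    WHEN Marks <= 10 THEN 'C'
                    WHEN Marks <= 15 THEN 'B'
                    ELSE 'A' END AS Letter
        FROM YourTable
    )
    -- Conditional aggregation in place of PIVOT: one COUNT per letter.
    SELECT Teacher_ID, Course_ID,
           COUNT(*) AS countMarks,
           COUNT(CASE WHEN Letter = 'A' THEN 1 END) AS countA,
           COUNT(CASE WHEN Letter = 'B' THEN 1 END) AS countB,
           COUNT(CASE WHEN Letter = 'C' THEN 1 END) AS countC,
           COUNT(CASE WHEN Letter = 'D' THEN 1 END) AS countD
    FROM T
    GROUP BY Teacher_ID, Course_ID
    ORDER BY Teacher_ID, Course_ID
""").fetchall()

print(rows)  # [(1, 10, 6, 2, 1, 1, 2), (2, 20, 1, 0, 1, 0, 0)]
```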