INNER JOIN with aggregate functions in my SELECT - sql

I'm trying to join a new column to my current query that uses aggregate functions. I create this column with a new query that also uses an aggregate function from a different table but I'm not sure if a JOIN will work for me since I need to join it to its respective row.
TABLE A (employees that are enrolled or were enrolled in a project)
ID  DEPARTMENT  ENROLLED  PROJECT
---------------------------------
1   MARKETING   Yes       ARQ
2   MARKETING   Yes       TC
3   MARKETING   No        ARQ
4   MARKETING   No        TC
5   FINANCE     Yes       ARQ
6   FINANCE     Yes       TC
7   FINANCE     No        ARQ
8   FINANCE     Yes       TC
This table has more departments and more projects, but I simplified.
TABLE B (relation with departments and employees)
ID  DEPARTMENT  TOTAL_EMPLOYEES
-------------------------------
1   MARKETING   2
2   MARKETING   3
3   FINANCE     4
4   FINANCE     8
In my first query I was asked to achieve the following result - using only table A:
            (employees enrolled)          (employees not enrolled)
DEPARTMENT  ARQ_E  TC_E  TOTAL_ENROLLED  ARQ_N  TC_N  TOTAL_NOT_ENROLLED  TOTAL
-------------------------------------------------------------------------------
MARKETING   1      1     2               1      1     2                   4
FINANCE     1      1     2               1      1     2                   4
Using the following query:
SELECT tableA.department,
sum(case when enrolled = 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_E,
sum(case when enrolled = 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_E,
sum(case when enrolled = 'Yes' then 1 else 0 end) as TOTAL_ENROLLED,
sum(case when enrolled != 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_N,
sum(case when enrolled != 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_N,
sum(case when enrolled != 'Yes' then 1 else 0 end) as TOTAL_NOT_ENROLLED,
count (*) AS Total
FROM tableA
GROUP BY tableA.department;
My second query gets departments and their total employees from table B:
DEPARTMENT  TOTAL_EMPLOYEES
---------------------------
MARKETING   5
FINANCE     12
Using the following query:
SELECT tableB.department,
sum(tableB.total_employees) AS TOTAL_EMPLOYEES
FROM tableB
GROUP BY tableB.department;
I need to add the column TOTAL_EMPLOYEES to my first query, next to TOTAL, with each value placed in the row of its respective department. I need this to compare these two columns and see how many employees were not assigned to any project.
This is my expected result.
            (employees enrolled)          (employees not enrolled)
DEPARTMENT  ARQ_E  TC_E  TOTAL_ENROLLED  ARQ_N  TC_N  TOTAL_NOT_ENROLLED  TOTAL  T_EMPL
---------------------------------------------------------------------------------------
MARKETING   1      1     2               1      1     2                   4      5
FINANCE     1      1     2               1      1     2                   4      12
I have tried to achieve this using the following query:
SELECT tableA.department,
sum(case when enrolled = 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_E,
sum(case when enrolled = 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_E,
sum(case when enrolled = 'Yes' then 1 else 0 end) as TOTAL_ENROLLED,
sum(case when enrolled != 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_N,
sum(case when enrolled != 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_N,
sum(case when enrolled != 'Yes' then 1 else 0 end) as TOTAL_NOT_ENROLLED,
count (*) AS Total,
sum (tableB.total_employees) AS T_EMPL
FROM tableA
JOIN tableB
ON tableA.department = tableB.department
GROUP BY tableA.department;
But the numbers I get from this query are completely wrong, since the JOIN repeats my rows and my SUMs are multiplied.
I don't know whether I really need a join or a subquery to place my sum(tableB.total_employees) in its respective row.
I'm using PostgreSQL, but since I'm sticking to standard SQL-92, any SQL solution will help.

Your main issue stemmed from inadvertently multiplying rows with the join,
and has already been addressed. See:
Two SQL LEFT JOINS produce incorrect result
But use the standard SQL aggregate FILTER clause. It's shorter, cleaner, and noticeably faster. See:
Aggregate columns with additional (distinct) filters
For absolute performance, is SUM faster or COUNT?
SELECT *
FROM (
SELECT department
, count(*) FILTER (WHERE enrolled AND project = 'ARQ') AS arq_e
, count(*) FILTER (WHERE enrolled AND project = 'TC') AS tc_e
, count(*) FILTER (WHERE enrolled) AS total_enrolled
, count(*) FILTER (WHERE NOT enrolled AND project = 'ARQ') AS arq_n
, count(*) FILTER (WHERE NOT enrolled AND project = 'TC') AS tc_n
, count(*) FILTER (WHERE NOT enrolled) AS total_not_enrolled
, count(*) AS total
FROM tableA a
GROUP BY 1
) a
LEFT JOIN ( -- !
SELECT department
, sum(total_employees) AS total_employees
FROM tableB b
GROUP BY 1
) b USING (department);
enrolled should be a boolean column. Make it so if it isn't. Then you can use it directly. Smaller, faster, cleaner, shorter code.
I replaced the [INNER] JOIN with a LEFT [OUTER] JOIN on a suspicion. Typically, you want to keep all results, even if the same department is not found in the other table. Maybe even a FULL [OUTER] JOIN?
Also, USING (department) as join condition conveniently outputs that column only once, so we can make do with SELECT * in the outer SELECT.
Finally, subqueries are shorter and faster than CTEs. Not much since Postgres 12, but still. See:
Does WITH query store the results of referred tables?
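To illustrate the LEFT JOIN point, here is a minimal sketch that runs the pre-aggregated join through Python's built-in sqlite3 module instead of PostgreSQL. Table and column names follow the question; the HR department is made up for the demonstration and exists only in tableA.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, department TEXT, enrolled TEXT, project TEXT);
    CREATE TABLE tableB (id INTEGER, department TEXT, total_employees INTEGER);
    -- HR is a hypothetical department with no rows in tableB.
    INSERT INTO tableA VALUES
        (1, 'MARKETING', 'Yes', 'ARQ'),
        (2, 'MARKETING', 'No',  'TC'),
        (3, 'HR',        'Yes', 'ARQ');
    INSERT INTO tableB VALUES
        (1, 'MARKETING', 2),
        (2, 'MARKETING', 3);
""")

# Both sides are aggregated to one row per department before joining.
QUERY = """
    SELECT a.department, a.total, b.total_employees
    FROM (SELECT department, count(*) AS total
          FROM tableA GROUP BY department) a
    {join} (SELECT department, sum(total_employees) AS total_employees
            FROM tableB GROUP BY department) b
      ON b.department = a.department
    ORDER BY a.department
"""

inner_rows = conn.execute(QUERY.format(join="INNER JOIN")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # HR is dropped
print(left_rows)   # HR is kept, with None (NULL) for total_employees
```

Whether the NULL row is wanted depends on the report; wrapping the column in COALESCE(..., 0) is one way to show a zero instead.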

Join the results of the two queries, using sub-queries; don't join the tables themselves.
That way you're joining 1 row of enrollment data per department to 1 row of employee data per department.
SELECT
*
FROM
(
SELECT tableA.department,
sum(case when enrolled = 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_E,
sum(case when enrolled = 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_E,
sum(case when enrolled = 'Yes' then 1 else 0 end) as TOTAL_ENROLLED,
sum(case when enrolled != 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_N,
sum(case when enrolled != 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_N,
sum(case when enrolled != 'Yes' then 1 else 0 end) as TOTAL_NOT_ENROLLED,
count (*) AS Total
FROM tableA
GROUP BY tableA.department
)
AS enroll
INNER JOIN
(
SELECT tableB.department,
sum(tableB.total_employees) AS Total_EMPLOYEES
FROM tableB
GROUP BY tableB.department
)
AS employee
ON employee.department = enroll.department
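The difference between joining the tables and joining the aggregated results can be checked on the question's sample data. This is a sketch using Python's built-in sqlite3 module rather than PostgreSQL, and it keeps only the TOTAL and TOTAL_EMPLOYEES columns for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, department TEXT, enrolled TEXT, project TEXT);
    CREATE TABLE tableB (id INTEGER, department TEXT, total_employees INTEGER);
    INSERT INTO tableA VALUES
        (1, 'MARKETING', 'Yes', 'ARQ'), (2, 'MARKETING', 'Yes', 'TC'),
        (3, 'MARKETING', 'No',  'ARQ'), (4, 'MARKETING', 'No',  'TC'),
        (5, 'FINANCE',   'Yes', 'ARQ'), (6, 'FINANCE',   'Yes', 'TC'),
        (7, 'FINANCE',   'No',  'ARQ'), (8, 'FINANCE',   'Yes', 'TC');
    INSERT INTO tableB VALUES
        (1, 'MARKETING', 2), (2, 'MARKETING', 3),
        (3, 'FINANCE',   4), (4, 'FINANCE',   8);
""")

# Joining the raw tables pairs every tableA row with both tableB rows
# of its department, so counts double and sums are repeated four times.
naive = conn.execute("""
    SELECT a.department, count(*) AS total, sum(b.total_employees) AS t_empl
    FROM tableA a
    JOIN tableB b ON a.department = b.department
    GROUP BY a.department
    ORDER BY a.department
""").fetchall()

# Aggregating first leaves one row per department on each side of the join.
fixed = conn.execute("""
    SELECT enroll.department, enroll.total, employee.total_employees
    FROM (SELECT department, count(*) AS total
          FROM tableA GROUP BY department) AS enroll
    INNER JOIN (SELECT department, sum(total_employees) AS total_employees
                FROM tableB GROUP BY department) AS employee
      ON employee.department = enroll.department
    ORDER BY enroll.department
""").fetchall()

print(naive)  # [('FINANCE', 8, 48), ('MARKETING', 8, 20)]
print(fixed)  # [('FINANCE', 4, 12), ('MARKETING', 4, 5)]
```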

Because the join multiplies the sums, you can sum the values first and then join the results:
WITH CTE1 as (SELECT tableA.department,
sum(case when enrolled = 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_E,
sum(case when enrolled = 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_E,
sum(case when enrolled = 'Yes' then 1 else 0 end) as TOTAL_ENROLLED,
sum(case when enrolled != 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_N,
sum(case when enrolled != 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_N,
sum(case when enrolled != 'Yes' then 1 else 0 end) as TOTAL_NOT_ENROLLED,
count (*) AS Total
FROM tableA
GROUP BY tableA.department),
CTE2 as (SELECT tableB.department,
sum(tableB.total_employees) AS TOTAL_EMPLOYEES
FROM tableB
GROUP BY tableB.department)
SELECT
CTE1.department, ARQ_E, TC_E, TOTAL_ENROLLED, ARQ_N, TC_N, TOTAL_NOT_ENROLLED, TOTAL, CTE2.TOTAL_EMPLOYEES AS T_EMPL
FROM CTE1 JOIN CTE2 ON CTE1.department = CTE2.department
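Since the question's stated goal is to compare TOTAL with the employee total, the same CTE shape can also compute the difference directly. The sketch below uses Python's built-in sqlite3 module with the question's sample data; the not_assigned column name is an assumption added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER, department TEXT, enrolled TEXT, project TEXT)")
conn.execute("CREATE TABLE tableB (id INTEGER, department TEXT, total_employees INTEGER)")
conn.executemany("INSERT INTO tableA VALUES (?, ?, ?, ?)", [
    (1, "MARKETING", "Yes", "ARQ"), (2, "MARKETING", "Yes", "TC"),
    (3, "MARKETING", "No", "ARQ"), (4, "MARKETING", "No", "TC"),
    (5, "FINANCE", "Yes", "ARQ"), (6, "FINANCE", "Yes", "TC"),
    (7, "FINANCE", "No", "ARQ"), (8, "FINANCE", "Yes", "TC"),
])
conn.executemany("INSERT INTO tableB VALUES (?, ?, ?)", [
    (1, "MARKETING", 2), (2, "MARKETING", 3),
    (3, "FINANCE", 4), (4, "FINANCE", 8),
])

rows = conn.execute("""
    WITH cte1 AS (SELECT department, count(*) AS total
                  FROM tableA GROUP BY department),
         cte2 AS (SELECT department, sum(total_employees) AS total_employees
                  FROM tableB GROUP BY department)
    SELECT cte1.department,
           cte1.total,
           cte2.total_employees AS t_empl,
           -- hypothetical extra column: employees without any project row
           cte2.total_employees - cte1.total AS not_assigned
    FROM cte1
    JOIN cte2 ON cte1.department = cte2.department
    ORDER BY cte1.department
""").fetchall()

for row in rows:
    print(row)
```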

Related

SQL Group By with multiple counts

I'm trying to group a list of services together along with the number of applicants in each service, but I also need a count on the status each applicant is in.
Applicants table
serviceID clientID applicantID status
----------------------------------------------------
1 41 1 1 (Processing)
1 41 16 1 (Processing)
1 41 15 2 (Ready)
2 41 12 1 (Processing)
2 41 18 3 (Complete)
Service table:
serviceID serviceName
--------------------------
1 Full Service
2 Part Service
Results need to look like:
serviceName totalApplicants processingCount readyCount completeCount
---------------------------------------------------------------------------
Full Service 3 2 1 0
Part Service 2 1 0 1
I've got the following, but it's returning the same count in each of the columns:-
SELECT
Services.serviceName,
(COUNT(Applicants.applicantID)) AS totalApplicants,
ISNULL(SUM(CASE WHEN Applicants.status = 1 THEN 1 ELSE 0 END), 0) AS processingCount,
ISNULL(SUM(CASE WHEN Applicants.status = 2 THEN 1 ELSE 0 END), 0) AS readyCount,
ISNULL(SUM(CASE WHEN Applicants.status = 3 THEN 1 ELSE 0 END), 0) AS completeCount
FROM
Applicants
LEFT JOIN
Services ON Applicants.serviceID = Services.serviceID
WHERE
Applicants.clientID = #CompanyID
GROUP BY
Services.serviceName
You can do conditional aggregation:
select
s.serviceName,
count(a.applicantID) totalApplicants,
sum(case when status = 1 then 1 else 0 end) processingCount,
sum(case when status = 2 then 1 else 0 end) readyCount,
sum(case when status = 3 then 1 else 0 end) completeCount
from service s
left join applicants a on a.serviceID = s.serviceID AND a.clientID = #CompanyID
group by s.serviceID, s.serviceName
The conditional expressions use the standard CASE expression; depending on the database you are actually using, neater alternatives may exist.
Your query should be fine, but it can be simplified to:
SELECT s.serviceName,
COUNT(a.applicantID) AS totalApplicants,
SUM(CASE WHEN a.status = 1 THEN 1 ELSE 0 END) AS processingCount,
SUM(CASE WHEN a.status = 2 THEN 1 ELSE 0 END) AS readyCount,
SUM(CASE WHEN a.status = 3 THEN 1 ELSE 0 END) AS completeCount
FROM Services s LEFT JOIN
Applicants a
ON a.serviceID = s.serviceID AND
a.clientID = #CompanyID
GROUP BY s.serviceName ;
Notes:
It looks like you want a row for every service, so that should be the first table in the LEFT JOIN.
Hence the filtering on the company goes into the ON clause.
Table aliases make the query easier to write and to read.
No ISNULL() is needed (and I prefer COALESCE() over ISNULL()).
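The note about filtering in the ON clause can be checked with a small sketch. It uses Python's built-in sqlite3 module rather than SQL Server, the question's sample rows, a bound parameter in place of the company ID placeholder, and a made-up third service ("No Service") that has no applicants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Services (serviceID INTEGER, serviceName TEXT);
    CREATE TABLE Applicants (serviceID INTEGER, clientID INTEGER,
                             applicantID INTEGER, status INTEGER);
    INSERT INTO Services VALUES
        (1, 'Full Service'), (2, 'Part Service'),
        (3, 'No Service');  -- hypothetical service without applicants
    INSERT INTO Applicants VALUES
        (1, 41, 1, 1), (1, 41, 16, 1), (1, 41, 15, 2),
        (2, 41, 12, 1), (2, 41, 18, 3);
""")

SELECT_LIST = """
    SELECT s.serviceName,
           COUNT(a.applicantID) AS totalApplicants,
           SUM(CASE WHEN a.status = 1 THEN 1 ELSE 0 END) AS processingCount,
           SUM(CASE WHEN a.status = 2 THEN 1 ELSE 0 END) AS readyCount,
           SUM(CASE WHEN a.status = 3 THEN 1 ELSE 0 END) AS completeCount
    FROM Services s
"""

# Client filter in the ON clause: unmatched services survive with zero counts.
on_rows = conn.execute(SELECT_LIST + """
    LEFT JOIN Applicants a ON a.serviceID = s.serviceID AND a.clientID = ?
    GROUP BY s.serviceID, s.serviceName
    ORDER BY s.serviceID
""", (41,)).fetchall()

# Client filter in WHERE: the NULL-extended row fails the test and is removed,
# so the LEFT JOIN behaves like an INNER JOIN.
where_rows = conn.execute(SELECT_LIST + """
    LEFT JOIN Applicants a ON a.serviceID = s.serviceID
    WHERE a.clientID = ?
    GROUP BY s.serviceID, s.serviceName
    ORDER BY s.serviceID
""", (41,)).fetchall()

print(on_rows)
print(where_rows)
```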

SQL ANY as a function instead of an operator

I need to count users that match certain conditions. To do that I need to join some tables and check if any of the grouping combination match the condition.
The way I implemented that now is by having a nested select that counts original matches and then counting the rows that have at least one result.
SELECT
COUNT(case when NestedCount1 > 0 then 1 else null end) as Count1,
COUNT(case when NestedCount2 > 0 then 1 else null end) as Count2,
COUNT(case when NestedCount3 > 0 then 1 else null end) as Count3
FROM
(SELECT
COUNT(case when Type = 1 then 1 else null end) as NestedCount1,
COUNT(case when Type = 2 then 1 else null end) as NestedCount2,
COUNT(case when Type = 2 AND Condition = 1 then 1 else null end) as NestedCount3
FROM [User]
LEFT JOIN [UserGroup] ON [User].Id = [UserGroup].UserId
LEFT JOIN [Group] ON [UserGroup].GroupId = [Group].Id
GROUP BY [User].Id) nested
What irks me is that the counts from the nested select are only used to check existence. However since ANY in SQL is only an operator I cannot think of a cleaner way on how to rewrite this.
The query returns correct results as is.
I'm wondering if there is any way to rewrite this that would avoid having intermediate results that are only used to check existence condition?
Sample input: User.csv, Group.csv, UserGroup.csv
Expected results: 483, 272, 121
It might be possible to simplify that query. I think the grouping on the user Id can be avoided by using distinct conditional counts on the user Id; then there's no need for a sub-query.
SELECT
COUNT(DISTINCT case when [User].[Type] = 1 then [User].Id end) as Count1,
COUNT(DISTINCT case when [User].[Type] = 2 then [User].Id end) as Count2,
COUNT(DISTINCT case when [User].[Type] = 2 AND Condition = 1 then [User].Id end) as Count3
FROM [User]
LEFT JOIN [UserGroup] ON [UserGroup].UserId = [User].Id
LEFT JOIN [Group] ON [Group].Id = [UserGroup].GroupId;
SELECT
SUM(case when NestedCount1 > 0 then 1 else 0 end) as Count1,
SUM(case when NestedCount2 > 0 then 1 else 0 end) as Count2,
SUM(case when NestedCount3 > 0 then 1 else 0 end) as Count3
FROM
(
SELECT
[User].Id,
SUM(case when Type = 1 then 1 else 0 end) as NestedCount1,
SUM(case when Type = 2 then 1 else 0 end) as NestedCount2,
SUM(case when Type = 2 AND Condition = 1 then 1 else 0 end) as NestedCount3
FROM [User]
LEFT JOIN [UserGroup] ON [UserGroup].UserId = [User].Id
LEFT JOIN [Group] ON [Group].Id = [UserGroup].GroupId
GROUP BY [User].Id
) nested
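To check that the distinct conditional counts agree with the nested existence check, here is a small sketch using Python's built-in sqlite3 module (which also accepts the bracketed identifiers). The sample rows are made up, and placing the Condition column on [Group] is an assumption, since the question does not show the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE [User] (Id INTEGER PRIMARY KEY, [Type] INTEGER);
    CREATE TABLE [Group] (Id INTEGER PRIMARY KEY, [Condition] INTEGER);
    CREATE TABLE [UserGroup] (UserId INTEGER, GroupId INTEGER);
    -- Hypothetical data: user 4 belongs to no group at all.
    INSERT INTO [User] VALUES (1, 1), (2, 2), (3, 2), (4, 3);
    INSERT INTO [Group] VALUES (10, 1), (11, 0);
    INSERT INTO [UserGroup] VALUES (1, 10), (1, 11), (2, 10), (2, 11), (3, 11);
""")

JOINS = """
    FROM [User]
    LEFT JOIN [UserGroup] ON [UserGroup].UserId = [User].Id
    LEFT JOIN [Group] ON [Group].Id = [UserGroup].GroupId
"""

# Original approach: count matches per user, then count users with any match.
nested = conn.execute("""
    SELECT COUNT(CASE WHEN n1 > 0 THEN 1 END),
           COUNT(CASE WHEN n2 > 0 THEN 1 END),
           COUNT(CASE WHEN n3 > 0 THEN 1 END)
    FROM (SELECT COUNT(CASE WHEN [Type] = 1 THEN 1 END) AS n1,
                 COUNT(CASE WHEN [Type] = 2 THEN 1 END) AS n2,
                 COUNT(CASE WHEN [Type] = 2 AND [Condition] = 1 THEN 1 END) AS n3
""" + JOINS + """
          GROUP BY [User].Id) nested
""").fetchone()

# Single-level approach: a user is counted once if any joined row matches.
distinct = conn.execute("""
    SELECT COUNT(DISTINCT CASE WHEN [Type] = 1 THEN [User].Id END),
           COUNT(DISTINCT CASE WHEN [Type] = 2 THEN [User].Id END),
           COUNT(DISTINCT CASE WHEN [Type] = 2 AND [Condition] = 1 THEN [User].Id END)
""" + JOINS).fetchone()

print(nested, distinct)
```

Both forms rely on COUNT ignoring NULL, which is why the CASE expressions have no ELSE branch.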

How do I get multiple counts based on distinct parameters in one SQL query?

I have a table with multiple medical records of relatives. I'm trying to count instances of cancer diagnoses per degree of relative.
CREATE TABLE Relatives
(person varchar(9),
relative varchar(12),
degree int,
relativeID varchar(9),
age int,
CancerDiagnosis varchar(2))
INSERT INTO RELATIVES (person, relative, degree, relativeid, age, cancerdiagnosis)
VALUES ('12345678','aunt','2','54876','36','Y'),
('12345678','aunt','2','54876','43','Y'),
('12345678','cousin','3','213786','39','N'),
('12345678','daughter','1','128756','15','Y'),
('12345678','daughter','1','128756','21','Y'),
('12345678','daughter','1','128756','12','N'),
('12345678','father','1','867578','64','Y'),
('98765432','cousin','3','987645','39','Y'),
('98765432','cousin','3','987645','40','Y'),
('98765432','sibling','1','123744','22','N'),
('98765432','mother','1','876418','64','Y'),
('98765432','mother','1','876418','65','Y');
I expect the result:
person fdr_cancer sdr_cancer tdr_cancer
12345678 2 1 0
98765432 1 0 1
Here is my query:
SELECT person,
SUM(CASE WHEN cancerdiagnosis = 'y' AND degree = 1 THEN 1 ELSE 0 END) AS FDR_Cancer,
SUM(CASE WHEN cancerdiagnosis = 'y' AND degree = 2 THEN 1 ELSE 0 END) AS SDR_Cancer,
SUM(CASE WHEN cancerdiagnosis = 'y' AND degree = 3 THEN 1 ELSE 0 END) AS TDR_Cancer
FROM Relatives
GROUP BY person
How do I get this to count distinct rows by relativeID, degree, and diagnosis?
If I understand correctly, you want conditional count(distinct):
SELECT person,
COUNT(DISTINCT CASE WHEN cancerdiagnosis = 'y' AND degree = 1 THEN relativeid END) AS FDR_Cancer,
COUNT(DISTINCT CASE WHEN cancerdiagnosis = 'y' AND degree = 2 THEN relativeid END) AS SDR_Cancer,
COUNT(DISTINCT CASE WHEN cancerdiagnosis = 'y' AND degree = 3 THEN relativeid END) AS TDR_Cancer
FROM Relatives
GROUP BY person;
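The effect of the conditional COUNT(DISTINCT ...) can be checked against the question's sample rows. This sketch uses Python's built-in sqlite3 module rather than SQL Server; because SQLite compares strings case-sensitively by default, it tests for 'Y' as stored in the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Relatives (person TEXT, relative TEXT, degree INTEGER,
                                        relativeID TEXT, age INTEGER, CancerDiagnosis TEXT)""")
conn.executemany("INSERT INTO Relatives VALUES (?, ?, ?, ?, ?, ?)", [
    ("12345678", "aunt", 2, "54876", 36, "Y"),
    ("12345678", "aunt", 2, "54876", 43, "Y"),
    ("12345678", "cousin", 3, "213786", 39, "N"),
    ("12345678", "daughter", 1, "128756", 15, "Y"),
    ("12345678", "daughter", 1, "128756", 21, "Y"),
    ("12345678", "daughter", 1, "128756", 12, "N"),
    ("12345678", "father", 1, "867578", 64, "Y"),
    ("98765432", "cousin", 3, "987645", 39, "Y"),
    ("98765432", "cousin", 3, "987645", 40, "Y"),
    ("98765432", "sibling", 1, "123744", 22, "N"),
    ("98765432", "mother", 1, "876418", 64, "Y"),
    ("98765432", "mother", 1, "876418", 65, "Y"),
])


def per_degree(expr):
    """Run one aggregate expression for degrees 1 to 3, grouped by person."""
    cols = ", ".join(expr.format(d=d) for d in (1, 2, 3))
    sql = f"SELECT person, {cols} FROM Relatives GROUP BY person ORDER BY person"
    return conn.execute(sql).fetchall()


# Counts every medical record, so a relative with two records counts twice.
row_counts = per_degree(
    "SUM(CASE WHEN CancerDiagnosis = 'Y' AND degree = {d} THEN 1 ELSE 0 END)")

# Counts each relativeID once per degree, as the expected result requires.
distinct_counts = per_degree(
    "COUNT(DISTINCT CASE WHEN CancerDiagnosis = 'Y' AND degree = {d} THEN relativeID END)")

print(row_counts)       # [('12345678', 3, 2, 0), ('98765432', 2, 0, 2)]
print(distinct_counts)  # [('12345678', 2, 1, 0), ('98765432', 1, 0, 1)]
```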
Instead of three COUNT DISTINCT you can apply DISTINCT before aggregation, should be more efficient:
with cte as
( select distinct
person, degree, relativeid, cancerdiagnosis
from Relatives
)
SELECT person,
SUM(CASE WHEN cancerdiagnosis = 'y' AND degree = 1 THEN 1 ELSE 0 END) AS FDR_Cancer,
SUM(CASE WHEN cancerdiagnosis = 'y' AND degree = 2 THEN 1 ELSE 0 END) AS SDR_Cancer,
sum(CASE WHEN cancerdiagnosis = 'y' AND degree = 3 THEN 1 ELSE 0 END) AS TDR_Cancer
FROM cte
GROUP BY person
It might be possible to move cancerdiagnosis = 'y' into the CTE as a WHERE condition (but then a person whose diagnosis is N for all rows will be omitted):
with cte as
( select distinct
person, degree, relativeid
from Relatives
where cancerdiagnosis = 'y'
)
SELECT person,
SUM(CASE WHEN degree = 1 THEN 1 ELSE 0 END) AS FDR_Cancer,
SUM(CASE WHEN degree = 2 THEN 1 ELSE 0 END) AS SDR_Cancer,
sum(CASE WHEN degree = 3 THEN 1 ELSE 0 END) AS TDR_Cancer
FROM cte
GROUP BY person
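The caveat about omitted persons can be seen on a tiny made-up data set. This sketch uses Python's built-in sqlite3 module, a trimmed-down Relatives table, and only the first-degree column; persons 'A' and 'B' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Relatives (person TEXT, degree INTEGER,
                            relativeID TEXT, CancerDiagnosis TEXT);
    INSERT INTO Relatives VALUES
        ('A', 1, 'r1', 'Y'), ('A', 1, 'r1', 'Y'),  -- same relative, two records
        ('B', 1, 'r2', 'N');                       -- no positive diagnosis at all
""")

# DISTINCT first, diagnosis tested during aggregation: every person is kept.
unfiltered = conn.execute("""
    WITH cte AS (SELECT DISTINCT person, degree, relativeID, CancerDiagnosis
                 FROM Relatives)
    SELECT person,
           SUM(CASE WHEN CancerDiagnosis = 'Y' AND degree = 1 THEN 1 ELSE 0 END) AS FDR_Cancer
    FROM cte
    GROUP BY person
    ORDER BY person
""").fetchall()

# Diagnosis tested inside the CTE: person 'B' has no rows left to group.
prefiltered = conn.execute("""
    WITH cte AS (SELECT DISTINCT person, degree, relativeID
                 FROM Relatives
                 WHERE CancerDiagnosis = 'Y')
    SELECT person,
           SUM(CASE WHEN degree = 1 THEN 1 ELSE 0 END) AS FDR_Cancer
    FROM cte
    GROUP BY person
    ORDER BY person
""").fetchall()

print(unfiltered)   # [('A', 1), ('B', 0)]
print(prefiltered)  # [('A', 1)]
```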
The previous answers are easier to understand, but another option is to use PIVOT
SELECT
person
,[1] AS FDR_Cancer
,[2] AS SDR_Cancer
,[3] AS TDR_Cancer
FROM
(
SELECT DISTINCT
person
,relativeid
,degree
,CancerDiagnosis
FROM Relatives
) ps
PIVOT
(
COUNT(relativeid) FOR degree IN ([1],[2],[3])
) AS pvt
WHERE
pvt.CancerDiagnosis = 'Y'

Trouble ordering GROUP BY, ORDER BY AND JOIN

I'm having trouble ordering a query.
I have this table (AttendanceLog):
ClassID | StudentPin | Status
69 1 YES
8 2 NO
10 2 NO
17 3 NO
43 5 YES
58 6 YES
and this table (Students):
STUDENTPIN | FNAME | LNAME | INTERNATIONAL
1 X X NO
2 X X YES
3 X X NO
4 X X YES
I want to find out which INTERNATIONAL students (Fname, Lname and StudentPIN) have missed 10 or more classes (AttendanceLog status being NO).
Currently I have the query below, which gives me the StudentPIN and the number of classes attended and not attended by each student, but I am unable to join the two tables together.
SELECT
ATTENDANCELOG.studentpin,
SUM(CASE WHEN status = 'YES' THEN 1 ELSE 0 END) AS number_of_yes,
SUM(CASE WHEN status = 'NO' THEN 1 ELSE 0 END) AS number_of_no
FROM attendancelog
GROUP BY ATTENDANCELOG.studentpin
ORDER BY ATTENDANCELOG.studentpin
Thanks!
You could use a join:
SELECT
ATTENDANCELOG.studentpin,
Students.FNAME,
Students.LNAME,
SUM(CASE WHEN status = 'YES' THEN 1 ELSE 0 END) AS number_of_yes,
SUM(CASE WHEN status = 'NO' THEN 1 ELSE 0 END) AS number_of_no
FROM attendancelog
INNER JOIN Students ON Students.STUDENTPIN = attendancelog.StudentPin
and INTERNATIONAL='YES'
GROUP BY ATTENDANCELOG.studentpin, Students.FNAME, Students.LNAME
ORDER BY ATTENDANCELOG.studentpin
Join on student pin, put your international = 'YES' filter in the where clause, and filter for 10 or more misses in a having clause. You can also shorten the case expressions a little:
select a.studentpin
, s.fname, s.lname, s.international
, count(case a.status when 'YES' then 1 end) as attended
, count(case a.status when 'NO' then 1 end) as missed
from attendancelog a
join students s on s.studentpin = a.studentpin
where s.international = 'YES'
group by s.fname, s.lname, s.international, a.studentpin
having count(case a.status when 'NO' then 1 end) >= 10
order by s.fname, s.lname, a.studentpin;
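The question asks for students with 10 or more missed classes, so the boundary of the HAVING comparison matters. The sketch below, using Python's built-in sqlite3 module and made-up names and attendance rows, runs the same query with >= and with > to show the difference for a student with exactly 10 misses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (studentpin INTEGER, fname TEXT, lname TEXT, international TEXT);
    CREATE TABLE attendancelog (classid INTEGER, studentpin INTEGER, status TEXT);
    INSERT INTO students VALUES
        (1, 'Ann', 'A', 'NO'), (2, 'Bob', 'B', 'YES'), (4, 'Cy', 'C', 'YES');
""")

# Hypothetical log: (studentpin, classes missed, classes attended)
class_id = 0
for pin, missed, attended in [(1, 12, 0), (2, 10, 3), (4, 9, 5)]:
    for status in ["NO"] * missed + ["YES"] * attended:
        class_id += 1
        conn.execute("INSERT INTO attendancelog VALUES (?, ?, ?)",
                     (class_id, pin, status))

QUERY = """
    SELECT a.studentpin, s.fname, s.lname,
           COUNT(CASE a.status WHEN 'NO' THEN 1 END) AS missed
    FROM attendancelog a
    JOIN students s ON s.studentpin = a.studentpin
    WHERE s.international = 'YES'          -- row filter, applied before grouping
    GROUP BY a.studentpin, s.fname, s.lname
    HAVING COUNT(CASE a.status WHEN 'NO' THEN 1 END) {op} 10  -- group filter
    ORDER BY a.studentpin
"""

at_least_10 = conn.execute(QUERY.format(op=">=")).fetchall()
more_than_10 = conn.execute(QUERY.format(op=">")).fetchall()

print(at_least_10)   # [(2, 'Bob', 'B', 10)]
print(more_than_10)  # []
```

Student 1 has 12 misses but is filtered out by the WHERE clause before any grouping happens.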

SQL Server Month Totals

SQL Server newbie
The following query returns SRA by Student and month only if there is a record for a student in Discipline table. I need a query to return all students and month totals even if there is no record for student in Discipline table. Any direction appreciated
SELECT TOP 100 PERCENT MONTH(dbo.Discipline.DisciplineDate) AS [Month], dbo.Discipline.StuId, dbo.Stu.Lastname + ',' + dbo.Stu.FirstName AS Student,
SUM(CASE WHEN Discipline.SRA = 1 THEN 1 END) AS [Acad Suspension], SUM(CASE WHEN Discipline.SRA = 2 THEN 1 END) AS Conduct,
SUM(CASE WHEN Discipline.SRA = 3 THEN 1 END) AS Disrespect, SUM(CASE WHEN Discipline.SRA = 4 THEN 1 END) AS [S.R.A],
SUM(CASE WHEN Discipline.SRA = 5 THEN 1 END) AS Suspension, SUM(CASE WHEN Discipline.SRA = 6 THEN 1 END) AS Tone
FROM dbo.Discipline INNER JOIN
dbo.Stu ON dbo.Discipline.StuId = dbo.Stu.StuId
GROUP BY dbo.Discipline.StuId, dbo.Stu.Lastname, dbo.Stu.FirstName, MONTH(dbo.Discipline.DisciplineDate)
ORDER BY Student
You need to make dbo.Stu the base table and LEFT JOIN dbo.Discipline to it:
SELECT MONTH(d.DisciplineDate) AS [Month],
s.StuId,
s.Lastname + ',' + s.FirstName AS Student,
SUM(CASE WHEN d.SRA = 1 THEN 1 END) AS [Acad Suspension],
SUM(CASE WHEN d.SRA = 2 THEN 1 END) AS Conduct,
SUM(CASE WHEN d.SRA = 3 THEN 1 END) AS Disrespect,
SUM(CASE WHEN d.SRA = 4 THEN 1 END) AS [S.R.A],
SUM(CASE WHEN d.SRA = 5 THEN 1 END) AS Suspension,
SUM(CASE WHEN d.SRA = 6 THEN 1 END) AS Tone
FROM dbo.Stu s
LEFT JOIN dbo.Discipline d ON d.StuId = s.StuId
GROUP BY s.StuId, s.Lastname, s.FirstName, MONTH(d.DisciplineDate)
ORDER BY Student
The LEFT JOIN means that the table you're LEFT JOINing to (dbo.Discipline) might not have records to support the JOIN, but you'll still get records from the base table (dbo.Stu). A student with no Discipline rows appears once, with NULL for the month and for every count.
I used table aliases - d and s. Less to type when you need to specify references.
Generate a series of months and join Discipline to that.
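One way to read the month-series suggestion is sketched below with Python's built-in sqlite3 module. The student names, dates, and the three-month range are made up, and the recursive CTE and strftime call are SQLite stand-ins; on SQL Server a numbers table and MONTH() would play the same roles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Stu (StuId INTEGER, Lastname TEXT, FirstName TEXT);
    CREATE TABLE Discipline (StuId INTEGER, DisciplineDate TEXT, SRA INTEGER);
    -- Hypothetical data: student 2 has no Discipline rows.
    INSERT INTO Stu VALUES (1, 'Doe', 'Jane'), (2, 'Roe', 'Rick');
    INSERT INTO Discipline VALUES
        (1, '2024-01-15', 2), (1, '2024-01-20', 2), (1, '2024-03-01', 5);
""")

rows = conn.execute("""
    WITH RECURSIVE months(m) AS (          -- generated series 1, 2, 3
        SELECT 1
        UNION ALL
        SELECT m + 1 FROM months WHERE m < 3
    )
    SELECT months.m AS month_no,
           s.StuId,
           COUNT(CASE WHEN d.SRA = 2 THEN 1 END) AS Conduct,
           COUNT(CASE WHEN d.SRA = 5 THEN 1 END) AS Suspension
    FROM Stu s
    CROSS JOIN months                      -- every student paired with every month
    LEFT JOIN Discipline d
           ON d.StuId = s.StuId
          AND CAST(strftime('%m', d.DisciplineDate) AS INTEGER) = months.m
    GROUP BY s.StuId, months.m
    ORDER BY s.StuId, months.m
""").fetchall()

for row in rows:
    print(row)
```

Using COUNT instead of SUM here yields 0 rather than NULL for student and month combinations without records.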