How to include "zero" / "0" results in COUNT aggregate? - sql

I've just got myself a little bit stuck with some SQL. I don't think I can phrase the question brilliantly - so let me show you.
I have two tables, one called person, one called appointment. I'm trying to return the number of appointments a person has (including if they have zero). Appointment contains the person_id and there is a person_id per appointment. So COUNT(person_id) is a sensible approach.
The query:
SELECT person_id, COUNT(person_id) AS "number_of_appointments"
FROM appointment
GROUP BY person_id;
This correctly returns the number of appointments each person_id has. However, a person who has 0 appointments isn't returned (obviously, as they are not in that table).
Tweaking the statement to take person_id from the person table gives me something like:
SELECT person.person_id, COUNT(appointment.person_id) AS "number_of_appointments"
FROM appointment
JOIN person ON person.person_id = appointment.person_id
GROUP BY person.person_id;
This, however, will still only return a person_id who has an appointment. That is not what I want: I want a result that also includes persons who have 0 appointments!
Any suggestions please?

You want an outer join for this (and you need to use person as the "driving" table)
SELECT person.person_id, COUNT(appointment.person_id) AS "number_of_appointments"
FROM person
LEFT JOIN appointment ON person.person_id = appointment.person_id
GROUP BY person.person_id;
This works because the outer (left) join returns NULL in the appointment columns for persons who do not have an appointment. The aggregate function count() does not count NULL values, so for those persons you get a zero.
If you want to learn more about outer joins, here is a nice tutorial: http://sqlzoo.net/wiki/Using_Null
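To see the NULL-skipping behaviour in action, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for your database. The sample rows are hypothetical (person 3 has no appointments). It runs the LEFT JOIN query twice: once with COUNT(appointment.person_id), and once with COUNT(*), which counts the all-NULL row and so reports 1 instead of 0.

```python
import sqlite3

# Hypothetical sample data: person 3 has no appointments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY);
    CREATE TABLE appointment (appointment_id INTEGER PRIMARY KEY, person_id INTEGER);
    INSERT INTO person (person_id) VALUES (1), (2), (3);
    INSERT INTO appointment (person_id) VALUES (1), (1), (2);
""")

# COUNT(column) skips the NULL produced by the LEFT JOIN, so person 3 gets 0.
by_column = conn.execute("""
    SELECT person.person_id, COUNT(appointment.person_id)
    FROM person
    LEFT JOIN appointment ON person.person_id = appointment.person_id
    GROUP BY person.person_id
    ORDER BY person.person_id
""").fetchall()

# COUNT(*) counts rows, including the NULL-padded row for person 3, so it reports 1.
by_star = conn.execute("""
    SELECT person.person_id, COUNT(*)
    FROM person
    LEFT JOIN appointment ON person.person_id = appointment.person_id
    GROUP BY person.person_id
    ORDER BY person.person_id
""").fetchall()

print(by_column)  # [(1, 2), (2, 1), (3, 0)]
print(by_star)    # [(1, 2), (2, 1), (3, 1)]
```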

You must use LEFT JOIN instead of INNER JOIN
SELECT person.person_id, COUNT(appointment.person_id) AS "number_of_appointments"
FROM person
LEFT JOIN appointment ON person.person_id = appointment.person_id
GROUP BY person.person_id;

If you do the count in a subquery and then LEFT JOIN that result (as a sub-table) to person, you can get 0 as expected thanks to the nvl function (NVL is Oracle-specific; COALESCE is the standard equivalent).
Ex:
select P.person_id, nvl(A.nb_apptmts, 0) from
(SELECT person.person_id
FROM person) P
LEFT JOIN
(select person_id, count(*) as nb_apptmts
from appointment
group by person_id) A
ON P.person_id = A.person_id
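A minimal sketch of this count-first, join-second pattern, run with Python's sqlite3 module on hypothetical sample rows. SQLite has no NVL, so the sketch swaps in the standard COALESCE, which does the same job here.

```python
import sqlite3

# Hypothetical sample data: person 3 has no appointments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY);
    CREATE TABLE appointment (appointment_id INTEGER PRIMARY KEY, person_id INTEGER);
    INSERT INTO person (person_id) VALUES (1), (2), (3);
    INSERT INTO appointment (person_id) VALUES (1), (1), (2);
""")

# Aggregate appointments first (subquery A), then LEFT JOIN the counts onto person.
# Person 3 has no row in A, so nb_apptmts is NULL and COALESCE turns it into 0.
rows = conn.execute("""
    SELECT P.person_id, COALESCE(A.nb_apptmts, 0) AS nb_apptmts
    FROM (SELECT person_id FROM person) P
    LEFT JOIN (
        SELECT person_id, COUNT(*) AS nb_apptmts
        FROM appointment
        GROUP BY person_id
    ) A ON P.person_id = A.person_id
    ORDER BY P.person_id
""").fetchall()

print(rows)  # [(1, 2), (2, 1), (3, 0)]
```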

Use an outer join to get a 0 count in the result when using GROUP BY.
A plain JOIN is an INNER JOIN in MS SQL Server, so go for a LEFT or RIGHT JOIN.
If the table that contains the primary key is mentioned first in the query, use a LEFT JOIN; otherwise use a RIGHT JOIN.
E.g.:
select WARDNO,count(WARDCODE) from MAIPADH
right join MSWARDH on MSWARDH.WARDNO= MAIPADH.WARDCODE
group by WARDNO
or, equivalently:
select WARDNO,count(WARDCODE) from MSWARDH
left join MAIPADH on MSWARDH.WARDNO= MAIPADH.WARDCODE group by WARDNO
Group by the column from the table that has the primary key, and count a column from the other table, which holds the actual entries/details.
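Here is a rough sketch of the rule about what to GROUP BY and what to COUNT, using Python's sqlite3 module. The layout of MSWARDH and MAIPADH is an assumption inferred from the query above (MSWARDH holds one row per WARDNO; MAIPADH holds one row per entry, referencing WARDCODE), and the ID column and the data are hypothetical. Grouping by WARDCODE instead of WARDNO lumps every unused ward into a single NULL group.

```python
import sqlite3

# Assumed schema: MSWARDH is the ward master table, MAIPADH holds the detail rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MSWARDH (WARDNO TEXT PRIMARY KEY);
    CREATE TABLE MAIPADH (ID INTEGER PRIMARY KEY, WARDCODE TEXT);
    INSERT INTO MSWARDH VALUES ('W1'), ('W2'), ('W3'), ('W4');
    INSERT INTO MAIPADH (WARDCODE) VALUES ('W1'), ('W1'), ('W2');
""")

# Group by the primary-key column, count the detail column: unused wards show 0.
good = conn.execute("""
    SELECT WARDNO, COUNT(WARDCODE)
    FROM MSWARDH
    LEFT JOIN MAIPADH ON MSWARDH.WARDNO = MAIPADH.WARDCODE
    GROUP BY WARDNO
    ORDER BY WARDNO
""").fetchall()

# Grouping by the detail column collapses all unused wards into one NULL group.
bad = conn.execute("""
    SELECT WARDCODE, COUNT(WARDCODE)
    FROM MSWARDH
    LEFT JOIN MAIPADH ON MSWARDH.WARDNO = MAIPADH.WARDCODE
    GROUP BY WARDCODE
    ORDER BY WARDCODE
""").fetchall()

print(good)  # [('W1', 2), ('W2', 1), ('W3', 0), ('W4', 0)]
print(bad)   # [(None, 0), ('W1', 2), ('W2', 1)]
```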

To change your original query even less, you can turn your join into a RIGHT JOIN:
SELECT person.person_id, COUNT(appointment.person_id) AS "number_of_appointments"
FROM appointment
RIGHT JOIN person ON person.person_id = appointment.person_id
GROUP BY person.person_id;
This just builds on the accepted answer, but because the outer join is in the RIGHT direction, only one word needs to be added and fewer changes are required. Just remember that RIGHT JOIN exists: it can sometimes make queries more readable and require less rebuilding.
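A quick sketch, using Python's sqlite3 module and hypothetical rows, that checks the RIGHT JOIN form returns the same rows as the LEFT JOIN form. SQLite only understands RIGHT JOIN from version 3.39.0 onward, so the sketch skips that query on older builds.

```python
import sqlite3

# Hypothetical sample data: person 3 has no appointments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY);
    CREATE TABLE appointment (appointment_id INTEGER PRIMARY KEY, person_id INTEGER);
    INSERT INTO person (person_id) VALUES (1), (2), (3);
    INSERT INTO appointment (person_id) VALUES (1), (1), (2);
""")

LEFT_SQL = """
    SELECT person.person_id, COUNT(appointment.person_id)
    FROM person
    LEFT JOIN appointment ON person.person_id = appointment.person_id
    GROUP BY person.person_id
    ORDER BY person.person_id
"""

# Same query with the table order swapped and LEFT replaced by RIGHT.
RIGHT_SQL = """
    SELECT person.person_id, COUNT(appointment.person_id)
    FROM appointment
    RIGHT JOIN person ON person.person_id = appointment.person_id
    GROUP BY person.person_id
    ORDER BY person.person_id
"""

left = conn.execute(LEFT_SQL).fetchall()

# RIGHT JOIN is only available in SQLite 3.39.0 and later.
right = None
if sqlite3.sqlite_version_info >= (3, 39, 0):
    right = conn.execute(RIGHT_SQL).fetchall()

print(left)   # [(1, 2), (2, 1), (3, 0)]
print(right)  # the same list on SQLite >= 3.39.0, otherwise None
```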

A pitfall with a LEFT JOIN is that a person with no appointments still produces one row, with NULLs in the appointment columns. If you aggregate that with COUNT(*), the row is counted as 1, and it will appear that the person has one appointment when they actually have none (COUNT(appointment.person_id) skips the NULL and avoids this). A correlated subquery sidesteps the issue entirely and gives the correct results:
SELECT person.person_id,
(SELECT COUNT(*) FROM appointment WHERE person.person_id = appointment.person_id) AS "Appointments"
FROM person;
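A minimal sketch of the correlated-subquery version, using Python's sqlite3 module on hypothetical rows. COUNT(*) is safe here because the subquery simply finds no rows for a person without appointments; no NULL-padded row is ever created.

```python
import sqlite3

# Hypothetical sample data: person 3 has no appointments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY);
    CREATE TABLE appointment (appointment_id INTEGER PRIMARY KEY, person_id INTEGER);
    INSERT INTO person (person_id) VALUES (1), (2), (3);
    INSERT INTO appointment (person_id) VALUES (1), (1), (2);
""")

# The subquery runs once per person; for person 3 it counts an empty set, i.e. 0.
rows = conn.execute("""
    SELECT person.person_id,
           (SELECT COUNT(*) FROM appointment
            WHERE person.person_id = appointment.person_id) AS "Appointments"
    FROM person
    ORDER BY person.person_id
""").fetchall()

print(rows)  # [(1, 2), (2, 1), (3, 0)]
```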

Related

How to return all rows but change count when a condition is satisfied-postgresql

I would appreciate any help or resources to solve this query!
SELECT
d.id,
d.fname,
d.lname,
COUNT(DISTINCT act.id) -- 3
FROM director d
LEFT JOIN movie_director mdir ON mdir.did = d.id -- 1
LEFT JOIN casts cas ON cas.mid = mdir.mid
LEFT JOIN actor act ON cas.aid = act.id
AND d.lname = act.lname AND d.fname != act.fname -- 2
GROUP BY d.id, d.fname, d.lname
1. Use LEFT instead of INNER joins, to keep the directors who do not fulfill the condition.
2. Move your WHERE filter condition into the join condition. As written, the join is done first and the filter then removes all directors without the required actors. If you move it into the join condition, the LEFT JOIN ensures that these directors are kept nonetheless.
3. DISTINCT in the COUNT() aggregate counts each actor only once. Important: you counted the directors' ids, not the actors, but you are interested in the actors!
To get all directors, replace the INNER JOIN with a LEFT JOIN everywhere (you need to move the WHERE conditions into the join condition to avoid removing NULL rows after the join). Then use count(act.id) to count the actors. This will automatically skip all NULL values, so you don't have to worry about them. Actors will be counted multiple times if they act in more than one picture. If you want to avoid that, use count(DISTINCT act.id).
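A minimal sketch of both answers' points, using Python's sqlite3 module with made-up directors and actors (table and column names follow the query above; the data is hypothetical). The first query keeps the name condition in the ON clause and returns every director; the second puts it in WHERE and loses the directors without a matching actor. The extra COUNT(act.id) column shows an actor who appears in two movies being counted twice without DISTINCT.

```python
import sqlite3

# Hypothetical data: director 1 has a same-surname actor in two movies,
# director 2 has a movie but no matching actor, director 3 has no movies.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE director (id INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
    CREATE TABLE movie_director (did INTEGER, mid INTEGER);
    CREATE TABLE casts (aid INTEGER, mid INTEGER);
    CREATE TABLE actor (id INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
    INSERT INTO director VALUES (1, 'Ann', 'Park'), (2, 'Dee', 'Ross'), (3, 'Gus', 'Moe');
    INSERT INTO movie_director VALUES (1, 10), (1, 11), (2, 20);
    INSERT INTO actor VALUES (100, 'Ben', 'Park'), (101, 'Cy', 'Hall'), (103, 'Fay', 'Kent');
    INSERT INTO casts VALUES (100, 10), (101, 10), (100, 11), (103, 20);
""")

# Condition in the ON clause: non-matching actors become NULL, directors survive.
on_join = conn.execute("""
    SELECT d.id, COUNT(DISTINCT act.id), COUNT(act.id)
    FROM director d
    LEFT JOIN movie_director mdir ON mdir.did = d.id
    LEFT JOIN casts cas ON cas.mid = mdir.mid
    LEFT JOIN actor act ON cas.aid = act.id
        AND d.lname = act.lname AND d.fname != act.fname
    GROUP BY d.id
    ORDER BY d.id
""").fetchall()

# Condition in WHERE: rows with a NULL actor fail the test and are removed.
in_where = conn.execute("""
    SELECT d.id, COUNT(DISTINCT act.id)
    FROM director d
    LEFT JOIN movie_director mdir ON mdir.did = d.id
    LEFT JOIN casts cas ON cas.mid = mdir.mid
    LEFT JOIN actor act ON cas.aid = act.id
    WHERE d.lname = act.lname AND d.fname != act.fname
    GROUP BY d.id
    ORDER BY d.id
""").fetchall()

print(on_join)   # [(1, 1, 2), (2, 0, 0), (3, 0, 0)]
print(in_where)  # [(1, 1)]
```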

SQL Get aggregate as 0 for non existing row using inner joins

I am using SQL Server to query these three tables that look like (there are some extra columns but not that relevant):
Customers -> Id, Name
Addresses -> Id, Street, StreetNo, CustomerId
Sales -> AddressId, Week, Total
And I would like to get the total sales per week and customer (showing at the same time the address details). I have come up with this query
SELECT a.Name, b.Street, b.StreetNo, c.Week, SUM (c.Total) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
INNER JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name, c.Week, b.Street, b.StreetNo
and even though my SQL skills are close to none, it looks like it's doing its job. But now I would like to show 0 whenever a customer doesn't have sales for a particular week (weeks are just integers). I wonder whether I should somehow get the distinct values of the weeks in the Sales table and then loop through them (not sure how).
Any help?
Thanks
Use CROSS JOIN to generate the rows for all customers and weeks. Then use LEFT JOIN to bring in the data that is available:
SELECT c.Name, a.Street, a.StreetNo, w.Week,
COALESCE(SUM(s.Total), 0) as Total
FROM Customers c CROSS JOIN
(SELECT DISTINCT s.Week FROM sales s) w LEFT JOIN
Addresses a
ON c.Id = a.CustomerId LEFT JOIN
Sales s
ON s.Week = w.Week AND s.AddressId = a.Id
GROUP BY c.Name, a.Street, a.StreetNo, w.Week;
Using table aliases is good, but the aliases should be abbreviations for the table names. So, a for Addresses not Customers.
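A minimal sketch of this CROSS JOIN + LEFT JOIN approach, run with Python's sqlite3 module on hypothetical rows. It joins on the key columns from the schema in the question (Customers.Id, Addresses.Id); customer Bolt has no sales in week 2.

```python
import sqlite3

# Hypothetical sample data: Bolt (address 2) has no sales in week 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Addresses (Id INTEGER PRIMARY KEY, Street TEXT, StreetNo INTEGER, CustomerId INTEGER);
    CREATE TABLE Sales (AddressId INTEGER, Week INTEGER, Total INTEGER);
    INSERT INTO Customers VALUES (1, 'Acme'), (2, 'Bolt');
    INSERT INTO Addresses VALUES (1, 'Main St', 1, 1), (2, 'High St', 5, 2);
    INSERT INTO Sales VALUES (1, 1, 100), (1, 2, 50), (2, 1, 70);
""")

# CROSS JOIN builds every (customer, week) pair; the LEFT JOINs attach whatever
# sales exist, and COALESCE turns the NULL sum of a missing week into 0.
rows = conn.execute("""
    SELECT c.Name, a.Street, a.StreetNo, w.Week,
           COALESCE(SUM(s.Total), 0) AS Total
    FROM Customers c
    CROSS JOIN (SELECT DISTINCT Week FROM Sales) w
    LEFT JOIN Addresses a ON c.Id = a.CustomerId
    LEFT JOIN Sales s ON s.Week = w.Week AND s.AddressId = a.Id
    GROUP BY c.Name, a.Street, a.StreetNo, w.Week
    ORDER BY c.Name, w.Week
""").fetchall()

for row in rows:
    print(row)
# ('Acme', 'Main St', 1, 1, 100)
# ('Acme', 'Main St', 1, 2, 50)
# ('Bolt', 'High St', 5, 1, 70)
# ('Bolt', 'High St', 5, 2, 0)
```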
You should generate the week numbers rather than using DISTINCT. This is better in terms of performance and reliability (with DISTINCT, a week in which nobody had any sales would be missing entirely). Then use a LEFT JOIN on the Sales table instead of an INNER JOIN:
SELECT a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
,COALESCE(SUM(c.Total),0) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
CROSS JOIN (
-- Generate a sequence of 52 integers (13 x 4)
SELECT ROW_NUMBER() OVER (ORDER BY a.x) AS [Week]
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(x)
CROSS JOIN (SELECT x FROM (VALUES(1),(1),(1),(1)) b(x)) b
) weeks
LEFT JOIN Sales c ON b.Id = c.AddressId AND c.[Week] = weeks.[Week]
GROUP BY a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
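A sketch of the generated-weeks version, using Python's sqlite3 module and hypothetical rows. Instead of the ROW_NUMBER() over VALUES trick, it swaps in a recursive CTE to produce weeks 1 to 52; that substitution is made for the demo only and is not what the answer uses. Every address now gets 52 rows, with 0 for weeks without sales.

```python
import sqlite3

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Addresses (Id INTEGER PRIMARY KEY, Street TEXT, StreetNo INTEGER, CustomerId INTEGER);
    CREATE TABLE Sales (AddressId INTEGER, Week INTEGER, Total INTEGER);
    INSERT INTO Customers VALUES (1, 'Acme'), (2, 'Bolt');
    INSERT INTO Addresses VALUES (1, 'Main St', 1, 1), (2, 'High St', 5, 2);
    INSERT INTO Sales VALUES (1, 1, 100), (1, 2, 50), (2, 1, 70);
""")

rows = conn.execute("""
    WITH RECURSIVE weeks(Week) AS (       -- generates the integers 1..52
        SELECT 1
        UNION ALL
        SELECT Week + 1 FROM weeks WHERE Week < 52
    )
    SELECT a.Name, b.Street, b.StreetNo, weeks.Week,
           COALESCE(SUM(c.Total), 0) AS Total
    FROM Customers a
    INNER JOIN Addresses b ON a.Id = b.CustomerId
    CROSS JOIN weeks
    LEFT JOIN Sales c ON b.Id = c.AddressId AND c.Week = weeks.Week
    GROUP BY a.Name, b.Street, b.StreetNo, weeks.Week
    ORDER BY a.Name, weeks.Week
""").fetchall()

print(len(rows))  # 104 (2 addresses x 52 weeks)
print(rows[:3])   # [('Acme', 'Main St', 1, 1, 100), ('Acme', 'Main St', 1, 2, 50), ('Acme', 'Main St', 1, 3, 0)]
```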
Please try the following...
SELECT Name,
Street,
StreetNo,
Week,
SUM( CASE
WHEN Total IS NULL THEN
0
ELSE
Total
END ) AS Total
FROM Customers a
JOIN Addresses b ON a.Id = b.CustomerId
RIGHT JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name,
c.Week,
b.Street,
b.StreetNo;
I have modified your statement in three places. The first is that I changed your join to Sales to a RIGHT JOIN. This joins as an INNER JOIN would, but it also keeps the records from the table on the right side of the JOIN that have no matching record or group of records on the left, placing NULL values in the resulting dataset's fields that would have come from the left of the JOIN. A LEFT JOIN works in the same way, but retains any extra records from the table on the left.
I have removed the word INNER from your surviving INNER JOIN. Where JOIN is not preceded by a join type, an INNER JOIN is performed. Both JOIN and INNER JOIN are considered correct, but the prevailing protocol seems to be to leave the INNER out, where the RDBMS allows it to be left out (which SQL-Server does). Which you go with is still entirely up to you - I have left it out here for illustrative purposes.
The third change is that I have added a CASE expression inside SUM() that tests whether the Total field contains a NULL value, which it will if there were no sales for that Customer for that Week. SUM() over nothing but NULL values returns NULL, so the CASE expression substitutes 0 for each NULL before the values are summed; non-NULL values of Total are summed as usual.
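A small sketch (Python's sqlite3 module, hypothetical rows in a hypothetical table t) of how SUM() treats NULLs: a group whose only Total is NULL sums to NULL, whereas the CASE inside SUM() and the more common COALESCE(SUM(Total), 0) both give 0.

```python
import sqlite3

# Hypothetical rows: the 'no_sales' group has only a NULL Total,
# like the NULL-padded row an outer join produces.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (grp TEXT, Total INTEGER);
    INSERT INTO t VALUES ('has_sales', 10), ('has_sales', 5), ('no_sales', NULL);
""")

rows = conn.execute("""
    SELECT grp,
           SUM(Total),                                          -- NULL for an all-NULL group
           SUM(CASE WHEN Total IS NULL THEN 0 ELSE Total END),  -- NULLs replaced row by row
           COALESCE(SUM(Total), 0)                              -- NULL result replaced once
    FROM t
    GROUP BY grp
    ORDER BY grp
""").fetchall()

print(rows)  # [('has_sales', 15, 15, 15), ('no_sales', None, 0, 0)]
```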
Please note that I am assuming that Total will not have any NULL values other than from the RIGHT JOIN. Please advise me if this assumption is incorrect.
Please also note that I have assumed that either there will be no missing Weeks for a Customer in the Sales table or that you are not interested in listing them if there are. Again, please advise me if this assumption is incorrect.
If you have any questions or comments, then please feel free to post a Comment accordingly.

SQL - Display Count of records even if 0

I am trying to create an SQL query in MS Access that will show how many appointments an employee will have in the current month, even if it is 0. It is very similar to this question, but I can't get it to work with a WHERE clause.
I have 3 tables:
tblEmployees
-employeeID PK
-FirstName ETC
tblEngineersAppts1
-ApptID PK
-EmployeeID*
tblEngineersAppts2
-DiaryID PK
-ApptDate
-ApptID*
I want to show all employees with a COUNT of their appointments (DiaryID) in tblEngineersAppts2 where ApptDate is in the current month, even if that count is 0.
This is my query, it only shows employees that have an appointment in the current month, it doesn't show those who have none.
SELECT tblEmployees.EmployeeID, Count(tblEngineersAppts2.DiaryID) AS CountOfDiaryID
FROM (tblEmployees
LEFT JOIN tblEngineersAppts1 ON tblEmployees.EmployeeID = tblEngineersAppts1.EmployeeID)
LEFT JOIN tblEngineersAppts2 ON tblEngineersAppts1.ApptID = tblEngineersAppts2.ApptID
WHERE (((Format$([ApptDate],'MM/YY'))='03/17'))
GROUP BY tblEmployees.EmployeeID;
Thanks
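The problem is that the WHERE condition turns your LEFT JOIN into an INNER JOIN:
WHERE (((Format$([ApptDate],'MM/YY'))='03/17'))
Employees without an appointment that month have a NULL ApptDate in the joined rows, so they fail this test and are filtered out. So move the ApptDate constraint into the ON condition: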
SELECT tblEmployees.EmployeeID, Count(tblEngineersAppts2.DiaryID) AS CountOfDiaryID
FROM (tblEmployees
LEFT JOIN tblEngineersAppts1 ON tblEmployees.EmployeeID = tblEngineersAppts1.EmployeeID)
LEFT JOIN tblEngineersAppts2
ON ( tblEngineersAppts1.ApptID = tblEngineersAppts2.ApptID
AND Format$([ApptDate],'MM/YY')='03/17'
)
GROUP BY tblEmployees.EmployeeID;
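A minimal sketch of the WHERE-versus-ON difference, run with Python's sqlite3 module on hypothetical rows. SQLite has no Format$, so the sketch swaps in strftime('%Y-%m', ApptDate) = '2017-03' for the month test; the table and column names follow the question, shortened with aliases.

```python
import sqlite3

# Hypothetical data: employee 1 has two March appointments, employee 2 only a
# February one, employee 3 none at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblEmployees (EmployeeID INTEGER PRIMARY KEY);
    CREATE TABLE tblEngineersAppts1 (ApptID INTEGER PRIMARY KEY, EmployeeID INTEGER);
    CREATE TABLE tblEngineersAppts2 (DiaryID INTEGER PRIMARY KEY, ApptDate TEXT, ApptID INTEGER);
    INSERT INTO tblEmployees VALUES (1), (2), (3);
    INSERT INTO tblEngineersAppts1 VALUES (10, 1), (11, 2);
    INSERT INTO tblEngineersAppts2 VALUES
        (100, '2017-03-05', 10), (101, '2017-02-10', 11), (102, '2017-03-20', 10);
""")

# Month test in WHERE: rows with a non-March or NULL ApptDate are dropped,
# so employees 2 and 3 disappear from the result.
in_where = conn.execute("""
    SELECT e.EmployeeID, COUNT(a2.DiaryID)
    FROM tblEmployees e
    LEFT JOIN tblEngineersAppts1 a1 ON e.EmployeeID = a1.EmployeeID
    LEFT JOIN tblEngineersAppts2 a2 ON a1.ApptID = a2.ApptID
    WHERE strftime('%Y-%m', a2.ApptDate) = '2017-03'
    GROUP BY e.EmployeeID
    ORDER BY e.EmployeeID
""").fetchall()

# Month test in ON: non-matching appointments become NULL, employees are kept.
in_on = conn.execute("""
    SELECT e.EmployeeID, COUNT(a2.DiaryID)
    FROM tblEmployees e
    LEFT JOIN tblEngineersAppts1 a1 ON e.EmployeeID = a1.EmployeeID
    LEFT JOIN tblEngineersAppts2 a2
        ON a1.ApptID = a2.ApptID
        AND strftime('%Y-%m', a2.ApptDate) = '2017-03'
    GROUP BY e.EmployeeID
    ORDER BY e.EmployeeID
""").fetchall()

print(in_where)  # [(1, 2)]
print(in_on)     # [(1, 2), (2, 0), (3, 0)]
```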
I think you could do something like this:
SELECT
tblEmployees.EmployeeID,
(
SELECT Count(tblEngineersAppts2.DiaryID)
FROM tblEngineersAppts1
INNER JOIN tblEngineersAppts2 ON tblEngineersAppts1.ApptID = tblEngineersAppts2.ApptID
WHERE tblEmployees.EmployeeID = tblEngineersAppts1.EmployeeID
AND (((Format$([ApptDate],'MM/YY'))='03/17'))
) AS CountOfDiaryID
FROM
tblEmployees

How to list unused items from database

MDW_CUSTOMER_ACCOUNTS has the following fields: ACCOUNT_ID, MEAL_ID.
MDW_MEALS_MENU has the following fields: MEAL_ID, MEAL_NAME.
I am trying to generate a report on the number of times a particular meal has been subscribed to by a customer using the query,
SELECT count(a.account_id), b.meal_id, b.meal_name
FROM mdw_meals_menu b LEFT JOIN mdw_customer_accounts a
on b.meal_id=a.meal_id
WHERE
a.start_date BETWEEN to_date('01-APR-2013','DD-MON-YYYY')
AND to_date('30-JUN-2013','DD-MON-YYYY')
GROUP BY b.meal_id, b.meal_name
ORDER BY count(a.account_id) desc, b.meal_id;
This only lists the MEAL_IDs that have been subscribed to at least once. It does not display the IDs that have not been subscribed to.
How do I get these MEAL_IDs to print with the count being 0?
I have modified the code, but I still get the same result.
Your where clause is effectively turning your outer join back into an inner join - conditions on an outer-joined table should generally be in the join clause, like so:
SELECT count(a.account_id), b.meal_id, b.meal_name
FROM mdw_meals_menu b
LEFT JOIN mdw_customer_accounts a
on b.meal_id=a.meal_id and
a.start_date BETWEEN to_date('01-APR-2013','DD-MON-YYYY')
AND to_date('30-JUN-2013','DD-MON-YYYY')
GROUP BY b.meal_id, b.meal_name
ORDER BY count(a.account_id) desc, b.meal_id;
You should use a left outer join.

Left outer join two levels deep in Postgres results in cartesian product

Given the following 4 tables:
CREATE TABLE events ( id, name )
CREATE TABLE profiles ( id, event_id )
CREATE TABLE donations ( amount, profile_id )
CREATE TABLE event_members( id, event_id, user_id )
I'm attempting to get a list of all events, along with a count of any members, and a sum of any donations. The issue is the sum of donations is coming back wrong (appears to be a cartesian result of donations * # of event_members).
Here is the SQL query (Postgres)
SELECT events.name, COUNT(DISTINCT event_members.id), SUM(donations.amount)
FROM events
LEFT OUTER JOIN profiles ON events.id = profiles.event_id
LEFT OUTER JOIN donations ON donations.profile_id = profiles.id
LEFT OUTER JOIN event_members ON event_members.event_id = events.id
GROUP BY events.name
The sum(donations.amount) comes back equal to the actual sum of donations multiplied by the number of matching rows in event_members. If I comment out the count(distinct event_members.id) and the event_members left outer join, the sum is correct.
As I explained in an answer to the referenced question you need to aggregate before joining to avoid a proxy CROSS JOIN. Like:
SELECT e.name, e.sum_donations, m.ct_members
FROM (
SELECT e.id AS event_id, e.name, SUM(d.amount) AS sum_donations
FROM events e
LEFT JOIN profiles p ON p.event_id = e.id
LEFT JOIN donations d ON d.profile_id = p.id
GROUP BY 1, 2
) e
LEFT JOIN (
SELECT m.event_id, count(DISTINCT m.id) AS ct_members
FROM event_members m
GROUP BY 1
) m USING (event_id);
If event_members.id is the primary key, then id is guaranteed to be UNIQUE in the table and you can drop DISTINCT from the count:
count(*) AS ct_members
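A minimal sketch that reproduces the inflated sum and the aggregate-before-joining fix, using Python's sqlite3 module with hypothetical rows (event Alpha has two donations and three members; event Beta has no donations and one member). It uses count(*) for the members, as suggested above, since id is the primary key here.

```python
import sqlite3

# Hypothetical data: Alpha has donations of 100 and 50 and three members.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE profiles (id INTEGER PRIMARY KEY, event_id INTEGER);
    CREATE TABLE donations (amount INTEGER, profile_id INTEGER);
    CREATE TABLE event_members (id INTEGER PRIMARY KEY, event_id INTEGER, user_id INTEGER);
    INSERT INTO events VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO profiles VALUES (1, 1);
    INSERT INTO donations VALUES (100, 1), (50, 1);
    INSERT INTO event_members VALUES (1, 1, 7), (2, 1, 8), (3, 1, 9), (4, 2, 7);
""")

# Joining both 1-N branches at once pairs every donation with every member.
naive = conn.execute("""
    SELECT events.name, COUNT(DISTINCT event_members.id), SUM(donations.amount)
    FROM events
    LEFT OUTER JOIN profiles ON events.id = profiles.event_id
    LEFT OUTER JOIN donations ON donations.profile_id = profiles.id
    LEFT OUTER JOIN event_members ON event_members.event_id = events.id
    GROUP BY events.name
    ORDER BY events.name
""").fetchall()

# Aggregate each branch separately, then join the one-row-per-event results.
fixed = conn.execute("""
    SELECT e.name, e.sum_donations, m.ct_members
    FROM (
        SELECT e.id AS event_id, e.name, SUM(d.amount) AS sum_donations
        FROM events e
        LEFT JOIN profiles p ON p.event_id = e.id
        LEFT JOIN donations d ON d.profile_id = p.id
        GROUP BY 1, 2
    ) e
    LEFT JOIN (
        SELECT m.event_id, COUNT(*) AS ct_members
        FROM event_members m
        GROUP BY 1
    ) m USING (event_id)
    ORDER BY e.name
""").fetchall()

print(naive)  # [('Alpha', 3, 450), ('Beta', 1, None)]
print(fixed)  # [('Alpha', 150, 3), ('Beta', None, 1)]
```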
You seem to have these two independent structures (-[ means a 1-N association):
events -[ profiles -[ donations
events -[ event members
I wrapped the second one into a subquery:
SELECT events.name,
member_count.the_member_count,
SUM(donations.amount)
FROM events
LEFT OUTER JOIN profiles ON events.id = profiles.event_id
LEFT OUTER JOIN donations ON donations.profile_id = profiles.id
LEFT OUTER JOIN (
SELECT
event_id,
COUNT(*) AS the_member_count
FROM event_members
GROUP BY event_id
) AS member_count
ON member_count.event_id = events.id
GROUP BY events.name, member_count.the_member_count
Of course you get a Cartesian product between donations and event_members for every event: both are bound only to the event, and there is no join relation between donations and event_members other than the event id, which means that every member matches every donation.
When you run your query, you ask for all events (let's say there are two, event Alpha and event Beta) and then JOIN with the members. Let's say that there is a member Alice who participates in both events.
SELECT events.name, COUNT(DISTINCT event_members.id), SUM(donations.amount)
FROM events
LEFT OUTER JOIN profiles ON events.id = profiles.event_id
LEFT OUTER JOIN donations ON donations.profile_id = profiles.id
LEFT OUTER JOIN event_members ON event_members.event_id = events.id
GROUP BY events.name
On each row you asked for the total of Alice's donations. If Alice donated 100 USD, then you asked for:
Alpha Alice 100USD
Beta Alice 100USD
So it's not surprising that when asking for the sum total Alice comes out as having donated 200 USD.
If you want the sum of all donations, you'd be better off doing it with two separate queries. Trying to do everything with a single query, while possible, would be a classic SQL antipattern (the one in chapter 18 of SQL Antipatterns, "Spaghetti Query"):
Unintended Products
One common consequence of producing all your results in one query is a Cartesian product. This happens when two of the tables in the query have no condition restricting their relationship. Without such a restriction, the join of two tables pairs each row in the first table to every row in the other table. Each such pairing becomes a row of the result set, and you end up with many more rows than you expect.