SQL reporting joining three tables - sql

First, my SQL knowledge is a little rusty. I am trying to generate a report of the number of reviews each patient has gone through in a given time period. A review is done as part of a doctor's round. The following are the corresponding tables with relevant columns:
Patients: (id, name)
Rounds: (id, patient_id, date)
Reviews: (id, round_id, review)
The report should look like the following:
Patient | Reviews
_________________________
Patient 1 | 2
_________________________
Patient 2 | 1
_________________________
Patient 3 | 0
_________________________
I tried the following SQL statement:
SELECT
p.name as patient,
COUNT(r.round_id) as reviews
FROM
patients as p
JOIN rounds as ro ON p.id = ro.patient_id
JOIN reviews as r ON ro.id = r.round_id
WHERE
r.review_date between '2012-02-01' AND '2012-02-29'
GROUP BY
p.name
However, the above query only returns rows where the review count is at least 1. I want it to return a row even if the count is 0.

The simplest way to join the tables while keeping rows that have no match in one of them is to use a LEFT OUTER JOIN. This keeps every record from the table on the left of the JOIN, regardless of whether a match was found on the right side.
Since r.review_date is in your WHERE clause, no row can be returned unless it has a review between those dates. To include rounds that have no review, you must allow for that in your WHERE clause by adding "OR r.review_date IS NULL", as below. Note that a round whose only reviews fall outside the date range is still filtered out by this condition. You may also want to consider filtering on the ro.date field instead, so that you are only looking at rounds performed within that time frame, i.e. "WHERE ro.date between '2012-02-01' AND '2012-02-29'".
For example:
SELECT
p.name as patient,
COUNT(r.round_id) as reviews
FROM
patients as p
JOIN rounds as ro ON p.id = ro.patient_id
LEFT OUTER JOIN reviews as r ON ro.id = r.round_id
WHERE
r.review_date between '2012-02-01' AND '2012-02-29' OR r.review_date IS NULL
GROUP BY
p.name
Note: If you want to report patients without any rounds, you will also have to make the first JOIN a LEFT OUTER JOIN:
FROM
patients as p
LEFT OUTER JOIN rounds as ro ON p.id = ro.patient_id
LEFT OUTER JOIN reviews as r ON ro.id = r.round_id

SELECT p.name AS patient
, COUNT(r.ID) AS reviews
FROM patients AS p
LEFT JOIN rounds AS ro ON p.id = ro.patient_id
LEFT JOIN reviews AS r ON ro.id = r.round_id
AND r.review_date BETWEEN '2012-02-01' AND '2012-02-29'
GROUP BY p.name
This will get you a list of all patients, including those who have not had a round or a review between your specific dates. Patients with no rounds or reviews will have a 0.
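To see why the placement of the date filter matters, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration, and the review_date column on reviews is an assumption taken from the queries above rather than from the listed schema. Patient 2 has a review only in January, so the "OR ... IS NULL" filter in WHERE drops that patient entirely, while the filter in the ON clause keeps the patient with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE rounds   (id INTEGER PRIMARY KEY, patient_id INTEGER, date TEXT);
CREATE TABLE reviews  (id INTEGER PRIMARY KEY, round_id INTEGER, review_date TEXT);

-- Hypothetical data: Patient 1 has two February reviews,
-- Patient 2 has one January review, Patient 3 has no rounds at all.
INSERT INTO patients VALUES (1, 'Patient 1'), (2, 'Patient 2'), (3, 'Patient 3');
INSERT INTO rounds   VALUES (1, 1, '2012-02-10'), (2, 2, '2012-01-15');
INSERT INTO reviews  VALUES (1, 1, '2012-02-10'), (2, 1, '2012-02-11'),
                            (3, 2, '2012-01-15');
""")

# Date filter in WHERE: applied after the joins, so it can remove patients.
FILTER_IN_WHERE = """
SELECT p.name, COUNT(r.id)
FROM patients AS p
LEFT JOIN rounds  AS ro ON p.id = ro.patient_id
LEFT JOIN reviews AS r  ON ro.id = r.round_id
WHERE r.review_date BETWEEN '2012-02-01' AND '2012-02-29'
   OR r.review_date IS NULL
GROUP BY p.name
ORDER BY p.name
"""

# Date filter in ON: only decides which reviews match; every patient survives.
FILTER_IN_ON = """
SELECT p.name, COUNT(r.id)
FROM patients AS p
LEFT JOIN rounds  AS ro ON p.id = ro.patient_id
LEFT JOIN reviews AS r  ON ro.id = r.round_id
     AND r.review_date BETWEEN '2012-02-01' AND '2012-02-29'
GROUP BY p.name
ORDER BY p.name
"""

where_rows = conn.execute(FILTER_IN_WHERE).fetchall()
on_rows = conn.execute(FILTER_IN_ON).fetchall()
print("filter in WHERE:", where_rows)
print("filter in ON:   ", on_rows)
```

With this data the WHERE version lists only Patient 1 and Patient 3, whereas the ON version lists all three patients.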

Related

Left outer join with count, on 3 tables not returning all rows from left table

I have these 3 tables:
Areas - id, name
Persons - id, area_id
Special_Persons - id_person, date
I'd like to produce a list of all Areas, followed by a count of Special Persons in each area, including Areas with no Special Persons.
If I do a left join of Areas and Persons, like this:
select a.id as idArea, count(p.id) as count
from areas a
left join persons p on p.area_id = a.id
group by a.id;
This works just fine; Areas that have no Persons show up, and have a count of 0.
What I am not clear on is how to do the same thing with the special_persons table, which currently only has 2 entries, both in the same Area.
I have tried the following:
select a.id as idArea, count(sp.id_person) as count
from special_persons sp, areas a
left join persons p on p.area_id = a.id
where p.area_id = a.id
and sp.id_person = p.id
group by a.id;
And it only returns 1 row, with the Area that happens to have 2 Special Persons in it, and a count of 2.
To continue getting a list of all areas, do I need to use a sub-query? Another join? I'm not sure how to go about it.
You can add another left join to the Special_Persons table:
select a.id as idArea, count(p.id), count(sp.id_person)
from areas a
left join persons p on p.area_id = a.id
left join special_persons sp on sp.id_person = p.id
group by a.id;
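As a quick check of the chained left joins, here is a small sketch using Python's sqlite3 module with invented rows (the area names and person ids are assumptions for illustration only). Area 1 has three persons, two of them special; area 2 has one ordinary person; area 3 has nobody.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE areas (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE persons (id INTEGER PRIMARY KEY, area_id INTEGER);
CREATE TABLE special_persons (id_person INTEGER, date TEXT);

-- Hypothetical sample data.
INSERT INTO areas VALUES (1, 'North'), (2, 'South'), (3, 'East');
INSERT INTO persons VALUES (10, 1), (11, 1), (12, 1), (20, 2);
INSERT INTO special_persons VALUES (10, '2020-01-01'), (11, '2020-01-02');
""")

# Both joins are LEFT joins, so an area survives even when it has no
# persons, and a person survives even when not listed as special.
area_counts = conn.execute("""
SELECT a.id, COUNT(p.id), COUNT(sp.id_person)
FROM areas a
LEFT JOIN persons p ON p.area_id = a.id
LEFT JOIN special_persons sp ON sp.id_person = p.id
GROUP BY a.id
ORDER BY a.id
""").fetchall()

for area_id, n_persons, n_special in area_counts:
    print(area_id, n_persons, n_special)
```

One caveat: if a person could appear more than once in special_persons, count(p.id) would be inflated by the extra rows, and COUNT(DISTINCT p.id) would be the safer choice.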

SQL Get aggregate as 0 for non existing row using inner joins

I am using SQL Server to query these three tables that look like (there are some extra columns but not that relevant):
Customers -> Id, Name
Addresses -> Id, Street, StreetNo, CustomerId
Sales -> AddressId, Week, Total
And I would like to get the total sales per week and customer (showing at the same time the address details). I have come up with this query
SELECT a.Name, b.Street, b.StreetNo, c.Week, SUM (c.Total) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
INNER JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name, c.Week, b.Street, b.StreetNo
and even if my SQL skills are close to none, it looks like it's doing its job. But now I would like to be able to show 0 whenever a customer doesn't have sales for a particular week (weeks are just integers). I wonder if I should somehow get the distinct values of the weeks in the Sales table and then loop through them (not sure how).
Any help?
Thanks
Use CROSS JOIN to generate the rows for all customers and weeks. Then use LEFT JOIN to bring in the data that is available:
SELECT c.Name, a.Street, a.StreetNo, w.Week,
COALESCE(SUM(s.Total), 0) as Total
FROM Customers c CROSS JOIN
(SELECT DISTINCT s.Week FROM Sales s) w LEFT JOIN
Addresses a
ON c.Id = a.CustomerId LEFT JOIN
Sales s
ON s.Week = w.Week AND s.AddressId = a.Id
GROUP BY c.Name, a.Street, a.StreetNo, w.Week;
Using table aliases is good, but the aliases should be abbreviations for the table names. So, a for Addresses not Customers.
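The CROSS JOIN idea can be tried out with a small sketch in Python's sqlite3 module. The customers, addresses, and sales rows below are made up, and the query is trimmed to name, week, and total for brevity. The cross join first builds every (customer, week) pair, and the left joins then attach whatever sales exist, so Bob gets a row with 0 for week 1 even though he has no sales that week.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Addresses (Id INTEGER PRIMARY KEY, Street TEXT, StreetNo TEXT,
                        CustomerId INTEGER);
CREATE TABLE Sales (AddressId INTEGER, Week INTEGER, Total REAL);

-- Hypothetical sample data: Bob has no sales in week 1.
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Addresses VALUES (1, 'Main St', '1', 1), (2, 'Oak Ave', '7', 2);
INSERT INTO Sales VALUES (1, 1, 10.0), (1, 1, 5.0), (1, 2, 20.0), (2, 2, 7.5);
""")

weekly_totals = conn.execute("""
SELECT c.Name, w.Week, COALESCE(SUM(s.Total), 0) AS Total
FROM Customers c
CROSS JOIN (SELECT DISTINCT Week FROM Sales) w   -- every customer x every week
LEFT JOIN Addresses a ON a.CustomerId = c.Id
LEFT JOIN Sales s ON s.AddressId = a.Id AND s.Week = w.Week
GROUP BY c.Name, w.Week
ORDER BY c.Name, w.Week
""").fetchall()

for name, week, total in weekly_totals:
    print(name, week, total)
```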
You should generate the week numbers rather than using DISTINCT. This is better in terms of performance and reliability, since a week in which nobody had any sales would otherwise be missing altogether. Then use a LEFT JOIN on the Sales table instead of an INNER JOIN:
SELECT a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
,COALESCE(SUM(c.Total),0) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
CROSS JOIN (
-- Generate a sequence of 52 integers (13 x 4)
SELECT ROW_NUMBER() OVER (ORDER BY a.x) AS [Week]
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(x)
CROSS JOIN (SELECT x FROM (VALUES(1),(1),(1),(1)) b(x)) b
) weeks
LEFT JOIN Sales c ON b.Id = c.AddressId AND c.[Week] = weeks.[Week]
GROUP BY a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
Please try the following...
SELECT Name,
Street,
StreetNo,
Week,
SUM( CASE
WHEN Total IS NULL THEN
0
ELSE
Total
END ) AS Total
FROM Customers a
JOIN Addresses b ON a.Id = b.CustomerId
RIGHT JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name,
c.Week,
b.Street,
b.StreetNo;
I have modified your statement in three places. The first is I changed your join to Sales to a RIGHT JOIN. This will join as it would with an INNER JOIN, but it will also keep the records from the table on the right side of the JOIN that do not have a matching record or group of records on the left, placing NULL values in the resulting dataset's fields that would have come from the left of the JOIN. A LEFT JOIN works in the same way, but with any extra records in the table on the left being retained.
I have removed the word INNER from your surviving INNER JOIN. Where JOIN is not preceded by a join type, an INNER JOIN is performed. Both JOIN and INNER JOIN are considered correct, but the prevailing protocol seems to be to leave the INNER out, where the RDBMS allows it to be left out (which SQL-Server does). Which you go with is still entirely up to you - I have left it out here for illustrative purposes.
The third change is that I have added a CASE statement that tests to see if the Total field contains a NULL value, which it will if there were no sales for that Customer for that Week. If it does then SUM() would return a NULL, so the CASE statement returns a 0 instead. If Total does not contain a NULL value, then the SUM() of all values of Total for that grouping is performed.
Please note that I am assuming that Total will not have any NULL values other than from the RIGHT JOIN. Please advise me if this assumption is incorrect.
Please also note that I have assumed that either there will be no missing Weeks for a Customer in the Sales table or that you are not interested in listing them if there are. Again, please advise me if this assumption is incorrect.
If you have any questions or comments, then please feel free to post a Comment accordingly.

join with date dimension but don't want NULL for the dates with values

I have a query:
SELECT
d.FiscalMonth,
d.FiscalMonthOfYear,
p.Name
FROM
DimDate d
LEFT JOIN FactSales f on f.SaleDate=d.PKDate
LEFT JOIN DimPerson p on p.PersonId=f.PersonId
WHERE d.FiscalYear='2014/7/1'
group by d.FiscalMonth, d.FiscalMonthOfYear, p.Name
ORDER BY d.FiscalMonthOfYear asc, p.PersonID asc
Which gives me these results:
Which is all fine, I want to include all months, even the ones that don't have data. (In this case FiscalMonth 2-12.)
The problem I have is with that one NULL value where I have data, IE. FiscalMonthOfYear 1. The red box.
How would I go about not returning that one "NULL" for the FiscalMonth=2014-07-01? I've tried some various where clauses but any time I remove the "NULL" values from the results, I also remove all the ones I want (IE. FiscalMonthOfYear 2-12)
Any help or guidance is greatly appreciated!
Thanks!
-Russ
Update:
DimDate table has primary key PKDate, which is one row for every date:
DimDate
PKDate ....
2014-07-01
2014-07-02
2014-07-03
etc.
FactSales table has one or many Sales transactions for a given day:
FactSales
SaleDate Amount
2014-07-01 34.99
2014-07-01 21.89
2014-07-02 24.77
2014-07-04 22.77
The problem is that FactSales may not have a sale on a particular day. So my query is finding that one (or many) days with no transactions, and because of the LEFT JOIN is returning it. How would I go about removing this result so it's not in my results?
SELECT
d.PKDate
,f.SaleDate
FROM
DimDate d
LEFT JOIN FactSales f on f.SaleDate=d.PKDate
LEFT JOIN DimPerson p on p.PersonId=f.PersonId
WHERE d.FiscalYear='2014/7/1'
ORDER BY d.PKDate
The problem stems from the fact that you are actually trying to do two things at once:
You want all the Names related to sales of fiscal months with at least one sale
You want an extra row for every fiscal month with no sales
As often goes in these cases... you should solve the two distinct problems and then put together the results (with a UNION in this specific case).
Something like this:
(
-- Part 1: names for fiscal months that have at least one sale
SELECT DISTINCT
d.FiscalMonth,
d.FiscalMonthOfYear,
p.Name
FROM DimDate d
JOIN FactSales f ON f.SaleDate=d.PKDate
JOIN DimPerson p ON p.PersonId=f.PersonId
WHERE d.FiscalYear='2014/7/1'
) UNION (
-- Part 2: one NULL-name row for each fiscal month with no sales at all
SELECT
d.FiscalMonth,
d.FiscalMonthOfYear,
NULL AS Name
FROM DimDate d
LEFT JOIN FactSales f ON f.SaleDate=d.PKDate
WHERE d.FiscalYear='2014/7/1'
GROUP BY d.FiscalMonth, d.FiscalMonthOfYear
HAVING COUNT(f.SaleDate)=0
)
ORDER BY FiscalMonthOfYear ASC, Name ASC
I haven't tested it, and there may be some better ways to solve the second part (SUBSELECT, EXISTS) but that depends a bit on the engine you are using.
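The difference between the plain LEFT JOIN and the two-part UNION can be sketched with Python's sqlite3 module. The tables are cut down to the columns that matter and the rows are invented: month 1 has one day with a sale and one day without, and month 2 has no sales. SQLite does not accept parentheses around the parts of a UNION, so they are left out here; this is a sketch of the idea, not a tested SQL Server query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimDate (PKDate TEXT PRIMARY KEY, FiscalMonthOfYear INTEGER);
CREATE TABLE DimPerson (PersonId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE FactSales (SaleDate TEXT, PersonId INTEGER, Amount REAL);

-- Hypothetical sample data.
INSERT INTO DimDate VALUES ('2014-07-01', 1), ('2014-07-02', 1), ('2014-08-01', 2);
INSERT INTO DimPerson VALUES (1, 'Ann');
INSERT INTO FactSales VALUES ('2014-07-01', 1, 34.99);
""")

# Plain LEFT JOIN: the sale-free day 2014-07-02 adds a NULL name to month 1.
left_join_rows = conn.execute("""
SELECT d.FiscalMonthOfYear, p.Name
FROM DimDate d
LEFT JOIN FactSales f ON f.SaleDate = d.PKDate
LEFT JOIN DimPerson p ON p.PersonId = f.PersonId
GROUP BY d.FiscalMonthOfYear, p.Name
""").fetchall()

# UNION of "months with sales" and "months with no sales at all".
union_rows = conn.execute("""
SELECT DISTINCT d.FiscalMonthOfYear, p.Name
FROM DimDate d
JOIN FactSales f ON f.SaleDate = d.PKDate
JOIN DimPerson p ON p.PersonId = f.PersonId
UNION
SELECT d.FiscalMonthOfYear, NULL
FROM DimDate d
LEFT JOIN FactSales f ON f.SaleDate = d.PKDate
GROUP BY d.FiscalMonthOfYear
HAVING COUNT(f.SaleDate) = 0
ORDER BY 1, 2
""").fetchall()

print("left join:", left_join_rows)
print("union:    ", union_rows)
```

In this data the LEFT JOIN result contains an extra (1, NULL) row for month 1, while the UNION result has exactly one named row for month 1 and one NULL row for month 2.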
You can do an inner join as follows:
SELECT
d.FiscalMonth,
d.FiscalMonthOfYear,
p.Name
FROM
DimDate d
INNER JOIN FactSales f on f.SaleDate=d.PKDate
LEFT JOIN DimPerson p on p.PersonId=f.PersonId
WHERE d.FiscalYear='2014/7/1'
group by d.FiscalMonth, d.FiscalMonthOfYear, p.Name
ORDER BY d.FiscalMonthOfYear asc, p.PersonID asc
An inner join returns only the rows that have a match in both tables, so dates without any sales are dropped. For more on joins you can read this blog: Visual representation of sql joins
It states that an INNER JOIN will return all of the records in the left table (table A) that have a matching record in the right table (table B), whereas a LEFT JOIN will return all of the records in the left table (table A) regardless of whether any of those records have a match in the right table (table B).

SQL server SELECT with join performance issue

Sorry about the saga here but am trying to explain everything.
We have 2 databases that I would like to join some tables in.
One database holds sales data from various stores/sites. It is quite large (over 3 million rows currently). The table is ItemSales.
The other holds application data from an in house web app. These tables are Departments and GroupItems
I would like to create a query that joins 2 tables from the app database with the sales database table. This is so we can group some items together for a date range and see the amount sold for example.
My first attempt was (DealId being the variable that it is grouped on in the App):
SELECT d.Id, d.ItemNo, d.UnitValue, d.NoGST, d.ItemStartDate, d.ItemEndDate,
(SELECT SUM(ItemQty) AS Expr1
FROM Sales.dbo.ItemSales AS s
WHERE (Store = d.SiteId) AND (ItemNo = d.ItemNo) AND (ItemSaleDate >= d.ItemStartDate) AND (ItemSaleDate <= d.ItemEndDate)) AS ItemsSold, Sales.dbo.ItemSales.ItemDesc, Departments.Description
FROM Departments INNER JOIN
Sales.dbo.ItemSales ON Departments.Id = Sales.dbo.ItemSales.ItemDept RIGHT OUTER JOIN
GroupItems AS d ON Sales.dbo.ItemSales.ItemNo = d.ItemNo
WHERE (d.DealId = 11)
GROUP BY d.Id, d.ItemNo, d.UnitValue, d.NoGST, d.ItemStartDate, d.ItemEndDate, ItemDesc, Departments.Description, d.SiteId
ORDER BY d.Id
This does exactly what I want which is:
-Give me all the details from the GroupItems table (UnitValue, ItemStartDate, ItemEndDate etc)
-Gives me the SUM() on the ItemQty column for the amount sold (plus the description etc)
-Returns NULL for something with no sales for the period
It is VERY slow though. To the point that if the GroupItems table has more than about 7 items in it, it times out.
Second attempt has been:
SELECT d.Id, d.ItemNo, d.UnitValue, d.NoGST, d.ItemStartDate, d.ItemEndDate, SUM(ItemQty) AS ItemsSold, Sales.dbo.ItemSales.ItemDesc, Departments.Description
FROM Departments INNER JOIN
Sales.dbo.ItemSales ON Departments.Id = Sales.dbo.ItemSales.ItemDept RIGHT OUTER JOIN
GroupItems AS d ON Sales.dbo.ItemSales.ItemNo = d.ItemNo
WHERE (Store = d.SiteId) AND (d.DealId = 11) AND (Sales.dbo.ItemSales.ItemSaleDate >= d.ItemStartDate) AND (Sales.dbo.ItemSales.ItemSaleDate <= d.ItemEndDate)
GROUP BY d.Id, d.ItemNo, d.UnitValue, d.NoGST, d.ItemStartDate, d.ItemEndDate, ItemDesc, Departments.Description
ORDER BY d.Id
This is very quick and does not time out but does not return the NULLs for no sales items in the ItemSales table. This is a problem as we need to see nothing or 0 for a no sales item otherwise people will think we forgot to check that item.
Can someone help me come up with a query please that returns everything from the GroupItems table, shows the SUM() of items sold and doesn't time out? I have also tried a SELECT x WHERE EXISTS (Subquery) but this also didn't return the NULLs for me but I may have had that one wrong.
If you want everything from GroupItems regardless of the sales, use it as the base of the query and then use left outer joins from there. Something along these lines:
SELECT GroupItems.Id, GroupItems.ItemNo, GroupItems.UnitValue, GroupItems.NoGST,
GroupItems.ItemStartDate, GroupItems.ItemEndDate,
Sales.ItemDesc,
SUM(ItemQty) AS SumOfSales,
Departments.Description
FROM GroupItems
LEFT OUTER JOIN #tempSales AS Sales ON
Sales.ItemNo = GroupItems.ItemNo
AND Sales.Store = GroupItems.SiteId
AND Sales.ItemSaleDate >= GroupItems.ItemStartDate
AND Sales.ItemSaleDate <= GroupItems.ItemEndDate
LEFT OUTER JOIN Departments ON Departments.Id = Sales.ItemDept
WHERE GroupItems.DealId = 11
GROUP BY GroupItems.Id, GroupItems.ItemNo, GroupItems.UnitValue, GroupItems.NoGST,
GroupItems.ItemStartDate, GroupItems.ItemEndDate,
Sales.ItemDesc,
Departments.Description
ORDER BY GroupItems.Id
Does changing the INNER JOIN to Sales.dbo.ItemSales into a LEFT OUTER JOIN to Sales.dbo.ItemSales and changing the RIGHT OUTER JOIN to GroupItems into an INNER JOIN to GroupItems fix your issue?

Left outer join two levels deep in Postgres results in cartesian product

Given the following 4 tables:
CREATE TABLE events ( id, name )
CREATE TABLE profiles ( id, event_id )
CREATE TABLE donations ( amount, profile_id )
CREATE TABLE event_members( id, event_id, user_id )
I'm attempting to get a list of all events, along with a count of any members, and a sum of any donations. The issue is the sum of donations is coming back wrong (appears to be a cartesian result of donations * # of event_members).
Here is the SQL query (Postgres)
SELECT events.name, COUNT(DISTINCT event_members.id), SUM(donations.amount)
FROM events
LEFT OUTER JOIN profiles ON events.id = profiles.event_id
LEFT OUTER JOIN donations ON donations.profile_id = profiles.id
LEFT OUTER JOIN event_members ON event_members.event_id = events.id
GROUP BY events.name
The sum(donations.amount) is coming back = to the actual sum of donations * number of rows in event_members. If I comment out the count(distinct event_members.id) and the event_members left outer join, the sum is correct.
As I explained in an answer to the referenced question you need to aggregate before joining to avoid a proxy CROSS JOIN. Like:
SELECT e.name, e.sum_donations, m.ct_members
FROM (
SELECT e.id AS event_id, e.name, SUM(d.amount) AS sum_donations
FROM events e
LEFT JOIN profiles p ON p.event_id = e.id
LEFT JOIN donations d ON d.profile_id = p.id
GROUP BY 1, 2
) e
LEFT JOIN (
SELECT m.event_id, count(DISTINCT m.id) AS ct_members
FROM event_members m
GROUP BY 1
) m USING (event_id);
If event_members.id is the primary key, then id is guaranteed to be UNIQUE in the table and you can drop DISTINCT from the count:
count(*) AS ct_members
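The effect of aggregating before joining can be reproduced with a small sketch in Python's sqlite3 module. The single event, its two donations, and its three members are invented data. Joining everything first multiplies each donation by the number of members, while pre-aggregating each branch keeps one row per event on both sides of the final join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE profiles (id INTEGER PRIMARY KEY, event_id INTEGER);
CREATE TABLE donations (amount INTEGER, profile_id INTEGER);
CREATE TABLE event_members (id INTEGER PRIMARY KEY, event_id INTEGER,
                            user_id INTEGER);

-- Hypothetical data: one event, donations of 100 + 50, three members.
INSERT INTO events VALUES (1, 'Alpha');
INSERT INTO profiles VALUES (1, 1);
INSERT INTO donations VALUES (100, 1), (50, 1);
INSERT INTO event_members VALUES (1, 1, 10), (2, 1, 11), (3, 1, 12);
""")

# Join first, aggregate last: 2 donations x 3 members = 6 rows per event.
naive = conn.execute("""
SELECT e.name, COUNT(DISTINCT m.id), SUM(d.amount)
FROM events e
LEFT JOIN profiles p ON p.event_id = e.id
LEFT JOIN donations d ON d.profile_id = p.id
LEFT JOIN event_members m ON m.event_id = e.id
GROUP BY e.name
""").fetchall()

# Aggregate each branch first, then join the one-row-per-event results.
pre_aggregated = conn.execute("""
SELECT e.name, m.ct_members, e.sum_donations
FROM (
    SELECT e.id AS event_id, e.name, SUM(d.amount) AS sum_donations
    FROM events e
    LEFT JOIN profiles p ON p.event_id = e.id
    LEFT JOIN donations d ON d.profile_id = p.id
    GROUP BY e.id, e.name
) e
LEFT JOIN (
    SELECT event_id, COUNT(*) AS ct_members
    FROM event_members
    GROUP BY event_id
) m ON m.event_id = e.event_id
""").fetchall()

print("naive:         ", naive)
print("pre-aggregated:", pre_aggregated)
```

Here the naive query reports 450 as the donation total (150 times three members), and the pre-aggregated query reports 150.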
You seem to have these two independent structures (-[ means 1-N association):
events -[ profiles -[ donations
events -[ event members
I wrapped the second one into a subquery:
SELECT events.name,
member_count.the_member_count,
SUM(donations.amount)
FROM events
LEFT OUTER JOIN profiles ON events.id = profiles.event_id
LEFT OUTER JOIN donations ON donations.profile_id = profiles.id
LEFT OUTER JOIN (
-- one row per event, so this join cannot multiply the donation rows
SELECT
event_id,
COUNT(*) AS the_member_count
FROM event_members
GROUP BY event_id
) AS member_count
ON member_count.event_id = events.id
GROUP BY events.name, member_count.the_member_count
Of course you get a cartesian product between donations and event_members for every event: both are bound only to the event, and there is no join relation between donations and event_members other than the event id, which means that every member matches every donation.
When you do your query, you ask for all events - let's say there are two, event Alpha and event Beta - and then JOIN with the members. Let's say that there is a member Alice that participates on both events.
SELECT events.name, COUNT(DISTINCT event_members.id), SUM(donations.amount)
FROM events
LEFT OUTER JOIN profiles ON events.id = profiles.event_id
LEFT OUTER JOIN donations ON donations.profile_id = profiles.id
LEFT OUTER JOIN event_members ON event_members.event_id = events.id
GROUP BY events.name
On each row you asked for the total of Alice's donations. If Alice donated 100 USD, then you asked for:
Alpha Alice 100USD
Beta Alice 100USD
So it's not surprising that when asking for the sum total Alice comes out as having donated 200 USD.
If you wanted the sum of all donations, you'd be better off using two distinct queries. Trying to do everything with a single query, while possible, would be a classic SQL antipattern (actually the one in chapter #18, "Spaghetti Query"):
Unintended Products
One common consequence of producing all your results in one query is a Cartesian product. This happens when two of the tables in the query have no condition restricting their relationship. Without such a restriction, the join of two tables pairs each row in the first table to every row in the other table. Each such pairing becomes a row of the result set, and you end up with many more rows than you expect.