Left join 3 tables returning results for every match in right table - sql

Fairly straightforward: I have 3 tables I need to join. The DB (MSSQL) should have 1 record in each of the first two tables (p and u), and multiple records in the 3rd table (a).
I want every matching row from the first table returned (is that not a left outer join?) regardless of whether there is a match in the second table; if there is one, display it. If there are matches in the 3rd table (in most situations there will be several), I only want the first one when ordered by appt_date DESC, giving me the most recent appointment date.
SELECT p.person_id, p.ln, p.fn, p.sex,
u.ud1_id, u.ud2_id, a.date, a.time
FROM person p LEFT OUTER JOIN person_defined u
ON p.person_id = u.person_id LEFT OUTER JOIN appointments a
ON p.person_id = a.person_id
where p.home_phone = '123456789'
ORDER BY a.appt_date DESC

You can take advantage of the OUTER APPLY operator here to find the most recent appointment for each person. This is often simpler than the alternatives, such as joining to a GROUP BY subquery or filtering on a ROW_NUMBER() window function.
SELECT
p.person_id, p.ln, p.fn, p.sex,
u.ud1_id, u.ud2_id,
pa.date, pa.time
FROM person p
LEFT OUTER JOIN person_defined u ON p.person_id = u.person_id
OUTER APPLY
(
select top 1 a.date, a.time
from appointments a
where a.person_id = p.person_id
order by a.appt_date desc
) pa
where p.home_phone = '123456789'
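OUTER APPLY is specific to SQL Server, so for comparison here is a minimal sketch of the ROW_NUMBER() alternative mentioned above. It runs against an in-memory SQLite database through Python's sqlite3 module (window functions need SQLite 3.25 or later). The table and column names follow the question, but the sample rows are made up, and only a few of the columns are included.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, ln TEXT, fn TEXT, home_phone TEXT);
CREATE TABLE person_defined (person_id INTEGER, ud1_id TEXT);
CREATE TABLE appointments (person_id INTEGER, appt_date TEXT);
INSERT INTO person VALUES (1, 'Doe', 'Ann', '123456789'), (2, 'Roe', 'Bob', '123456789');
INSERT INTO person_defined VALUES (1, 'U1');
INSERT INTO appointments VALUES (1, '2020-01-05'), (1, '2021-03-09'), (1, '2019-07-01');
""")

# Number each person's appointments from newest to oldest, then left join
# only the row numbered 1, so people without appointments are still returned.
latest_appointment_sql = """
SELECT p.person_id, p.ln, u.ud1_id, a.appt_date
FROM person p
LEFT JOIN person_defined u ON u.person_id = p.person_id
LEFT JOIN (
    SELECT person_id, appt_date,
           ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY appt_date DESC) AS rn
    FROM appointments
) a ON a.person_id = p.person_id AND a.rn = 1
WHERE p.home_phone = '123456789'
ORDER BY p.person_id
"""
rows = conn.execute(latest_appointment_sql).fetchall()
print(rows)
```

Person 1 comes back once with the 2021-03-09 appointment, and person 2, who has no row in either of the other tables, comes back with NULLs.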

Related

How do I filter out SELECT results from tables?

I have three tables that have user name/id and how many tasks they have submitted. I'm trying to SELECT user.name and the max amount of submissions they have for a single task.
SELECT DISTINCT O.nimi, COUNT(T.id)
FROM Opiskelijat O
LEFT JOIN Lahetykset L ON O.id = L.opiskelija_id
LEFT JOIN Tehtavat T ON T.id = L.tehtava_id
GROUP BY O.id, L.tehtava_id
The first picture shows the tables in question. In the second picture the above is what I'm trying to get, and the bottom is what my code does at the moment. I'm trying to get it to only show Maija - 3 instead of both.
Maybe something like this:
SELECT O.nimi, COUNT(*)
FROM Opiskelijat O
LEFT JOIN Lahetykset L ON O.id = L.opiskelija_id
LEFT JOIN Tehtavat T ON T.id = L.tehtava_id
GROUP BY O.id, O.nimi
As was pointed out, you need to have all the selected output fields in the GROUP BY. And also the DISTINCT should not be needed. I think the problem was the group by on L.tehtava_id.
If you want one row per nimi that should be the only column in the GROUP BY. I think you want:
SELECT O.nimi, COUNT(T.id)
FROM Opiskelijat O LEFT JOIN
Lahetykset L
ON O.id = L.opiskelija_id LEFT JOIN
Tehtavat T
ON T.id = L.tehtava_id
GROUP BY O.nimi;
I suspect that you don't actually need the join to Tehtavat:
SELECT O.nimi, COUNT(L.tehtava_id)
FROM Opiskelijat O LEFT JOIN
Lahetykset L
ON O.id = L.opiskelija_id
GROUP BY O.nimi;
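Note that the queries above count all of a student's submissions. If the goal is what the question describes, the largest number of submissions for any single task, one possible approach is to count per student and task first and then take the maximum per student. Below is a sketch of that idea, run in SQLite through Python's sqlite3 module. The column names come from the question; the sample rows and the reduced table layouts are assumptions.

```python
import sqlite3

# Hypothetical sample data: Maija submits task 1 three times and task 2 once,
# Uolevi submits nothing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Opiskelijat (id INTEGER PRIMARY KEY, nimi TEXT);
CREATE TABLE Lahetykset (id INTEGER PRIMARY KEY, opiskelija_id INTEGER, tehtava_id INTEGER);
INSERT INTO Opiskelijat VALUES (1, 'Maija'), (2, 'Uolevi');
INSERT INTO Lahetykset VALUES (1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 1, 2);
""")

# Total submissions per student (what the answers above compute).
total_rows = conn.execute("""
SELECT O.nimi, COUNT(L.tehtava_id)
FROM Opiskelijat O
LEFT JOIN Lahetykset L ON O.id = L.opiskelija_id
GROUP BY O.id, O.nimi
ORDER BY O.nimi
""").fetchall()

# Largest submission count for a single task: count per (student, task)
# in a subquery, then take the maximum per student.
max_rows = conn.execute("""
SELECT O.nimi, COALESCE(MAX(c.cnt), 0)
FROM Opiskelijat O
LEFT JOIN (
    SELECT opiskelija_id, tehtava_id, COUNT(*) AS cnt
    FROM Lahetykset
    GROUP BY opiskelija_id, tehtava_id
) c ON c.opiskelija_id = O.id
GROUP BY O.id, O.nimi
ORDER BY O.nimi
""").fetchall()

print(total_rows)
print(max_rows)
```

With this data the first query reports 4 for Maija, while the second reports 3, which matches the "Maija - 3" row the question asks for.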

Multiple joins with group by (Sum)

When using multiple JOINs, I want to get the sum of some columns in the joined tables.
SELECT
A.*,
SUM(C.purchase_price) AS purcchase_total,
SUM(D.sales_price) AS sales_total,
B.user_name
FROM
PROJECT AS A
LEFT JOIN
USER AS B ON A.user_idx = B.user_idx
LEFT JOIN
PURCHASE AS C ON A.project_idx = C.project_idx
LEFT JOIN
SALES AS D ON A.project_idx = D.project_idx
GROUP BY
????
You need to aggregate in subqueries before joining, as follows:
SELECT A.project_idx,
A.project_name,
A.project_category,
sum(C.purchase_price) AS purchase_total,
sum(D.sales_price) AS sales_total,
B.user_name
FROM PROJECT AS A
LEFT JOIN USER AS B ON A.user_idx = B.user_idx
LEFT JOIN (select project_idx, sum(purchase_price) as purchase_price
from PURCHASE group by project_idx) AS C ON A.project_idx = C.project_idx
LEFT JOIN (select project_idx, sum(sales_price) as sales_price
from SALES group by project_idx) AS D ON A.project_idx = D.project_idx
GROUP BY A.project_idx, A.project_name, A.project_category, B.user_name
I am not sure, but you may be able to use an inner join between PROJECT and USER instead of a left join.
SELECT A.project_idx,
A.project_name,
A.project_category,
C.purchase_total,
D.sales_total,
B.user_name
FROM PROJECT AS A
LEFT JOIN USER AS B ON A.user_idx = B.user_idx
LEFT JOIN (select project_idx, sum(purchase_price) as purchase_total
from PURCHASE group by project_idx) AS C ON A.project_idx = C.project_idx
LEFT JOIN (select project_idx, sum(sales_price) as sales_total
from SALES group by project_idx) AS D ON A.project_idx = D.project_idx
This is working correctly on MS-SQL Server.
Thanks to Popeye
You are attempting to aggregate over two unrelated dimensions, and that throws off all the calculations.
Correlated subqueries are an alternative:
SELECT p.*,
(SELECT SUM(pu.purchase_price)
FROM PURCHASE pu
WHERE p.project_idx = pu.project_idx
) as purchase_total,
(SELECT SUM(s.sales_price)
FROM SALES s
WHERE p.project_idx = s.project_idx
) as sales_total,
u.user_name
FROM PROJECT p LEFT JOIN
USER u
ON p.user_idx = u.user_idx ;
Note that this uses meaningful table aliases so the query is easier to read. Arbitrary letters are really no better (and perhaps worse) than using the entire table name.
Correlated subqueries avoid the outer aggregation as well -- and let you select all the columns from the first table, which is what you want. They also often have better performance with the right indexes.
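To see why aggregating over two unrelated dimensions goes wrong, here is a small sketch run in SQLite through Python's sqlite3 module. The table and column names follow the question; the sample rows are made up, and the USER table is left out to keep it short. One project has two purchases and three sales, so the double join produces 2 x 3 = 6 rows and each amount is counted several times.

```python
import sqlite3

# Hypothetical sample data: one project, two purchases, three sales.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROJECT (project_idx INTEGER PRIMARY KEY);
CREATE TABLE PURCHASE (project_idx INTEGER, purchase_price INTEGER);
CREATE TABLE SALES (project_idx INTEGER, sales_price INTEGER);
INSERT INTO PROJECT VALUES (1);
INSERT INTO PURCHASE VALUES (1, 10), (1, 20);
INSERT INTO SALES VALUES (1, 5), (1, 5), (1, 5);
""")

# Joining both detail tables first pairs every purchase with every sale,
# so the sums are inflated.
naive_rows = conn.execute("""
SELECT p.project_idx, SUM(pu.purchase_price), SUM(s.sales_price)
FROM PROJECT p
LEFT JOIN PURCHASE pu ON p.project_idx = pu.project_idx
LEFT JOIN SALES s ON p.project_idx = s.project_idx
GROUP BY p.project_idx
""").fetchall()

# Correlated subqueries total each table independently.
correct_rows = conn.execute("""
SELECT p.project_idx,
       (SELECT SUM(pu.purchase_price) FROM PURCHASE pu
        WHERE pu.project_idx = p.project_idx) AS purchase_total,
       (SELECT SUM(s.sales_price) FROM SALES s
        WHERE s.project_idx = p.project_idx) AS sales_total
FROM PROJECT p
""").fetchall()

print(naive_rows)
print(correct_rows)
```

The naive query reports 90 and 30, while the real totals are 30 and 15.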

How to add condition on the left table to include “zero” / “0” results in COUNT aggregate?

I have an SQL-select:
SELECT
p.id,
COUNT(a.id)
FROM Person p
LEFT JOIN Account a
ON a.person_id = p.id
WHERE p.id = 1
GROUP BY p.id;
and it works fine. But if I add a condition on the left-joined table, the query returns no rows instead of a zero count:
SELECT
p.id,
COUNT(a.id)
FROM Person p
LEFT JOIN Account a
ON a.person_id = p.id
WHERE p.id = 1 AND a.state = '0'
GROUP BY p.id;
How can I add the condition on the left-joined table so that the query returns a count of 0 when there are no results?
In a LEFT JOIN, conditions on the second table need to be in the ON clause:
SELECT p.id, COUNT(a.id)
FROM Person p LEFT JOIN
Account a
ON a.person_id = p.id AND a.state = '0'
WHERE p.id = 1
GROUP BY p.id;
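The rule is pretty simple to follow. A LEFT JOIN keeps all rows in the first table, even when there is no match in the second table. In that case the columns from the second table are NULL. A NULL fails the condition a.state = '0', so when that condition is in the WHERE clause the row is filtered out. Note that a row from the second table that does match the join but has a different state is filtered out in the same way.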
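Here is a minimal sketch of the difference, run in SQLite through Python's sqlite3 module. The table and column names follow the question; the single sample account, with state '1', is an assumption chosen so that no account satisfies a.state = '0'.

```python
import sqlite3

# Hypothetical sample data: person 1 has one account, but its state is '1'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (id INTEGER PRIMARY KEY);
CREATE TABLE Account (id INTEGER PRIMARY KEY, person_id INTEGER, state TEXT);
INSERT INTO Person VALUES (1);
INSERT INTO Account VALUES (10, 1, '1');
""")

# Condition in WHERE: the joined row is removed after the join,
# so the person disappears from the result.
where_rows = conn.execute("""
SELECT p.id, COUNT(a.id)
FROM Person p
LEFT JOIN Account a ON a.person_id = p.id
WHERE p.id = 1 AND a.state = '0'
GROUP BY p.id
""").fetchall()

# Condition in ON: no account matches, the person is kept with NULLs,
# and COUNT(a.id) ignores the NULL and returns 0.
on_rows = conn.execute("""
SELECT p.id, COUNT(a.id)
FROM Person p
LEFT JOIN Account a ON a.person_id = p.id AND a.state = '0'
WHERE p.id = 1
GROUP BY p.id
""").fetchall()

print(where_rows)
print(on_rows)
```

The first query returns no rows at all; the second returns one row with a count of 0.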

WHERE clause in an SQL query

I THINK what is happening with this query is if there are no records in the GenericAttribute table associated with the Product, then that product is not displayed. See line below in WHERE clause: "AND GenericAttribute.KeyGroup = 'Product'"
Is there a way to reword so that that part of the WHERE is ignored if no associated record in the GenericAttribute table?
Also, looking at my ORDER BY clause, will a record from the product table still show up if it has no associated record in the Pvl_AdDates table?
Thanks!
SELECT DISTINCT Product_Category_Mapping.CategoryId, Product.Id, Product.Name, Product.ShortDescription, Pvl_AdDates.Caption, Pvl_AdDates.EventDateTime, convert(varchar(25), Pvl_AdDates.EventDateTime, 120) AS TheDate, Pvl_AdDates.DisplayOrder, Pvl_Urls.URL, [Address].FirstName, [Address].LastName, [Address].Email, [Address].Company, [Address].City, [Address].Address1, [Address].Address2, [Address].ZipPostalCode, [Address].PhoneNumber
FROM [Address]
RIGHT JOIN (GenericAttribute
RIGHT JOIN (Pvl_Urls RIGHT JOIN (Pvl_AdDates
RIGHT JOIN (Product_Category_Mapping
LEFT JOIN Product
ON Product_Category_Mapping.ProductId = Product.Id)
ON Pvl_AdDates.ProductId = Product.Id)
ON Pvl_Urls.ProductId = Product.Id)
ON GenericAttribute.EntityId = Product.Id)
ON Address.Id = convert(int, GenericAttribute.Value)
WHERE
Product_Category_Mapping.CategoryId=12
AND GenericAttribute.KeyGroup = 'Product'
AND Product.Published=1
AND Product.Deleted=0
AND Product.AvailableStartDateTimeUtc <= getdate()
AND Product.AvailableEndDateTimeUtc >= getdate()
ORDER BY
Pvl_AdDates.EventDateTime DESC,
Product.Id,
Pvl_AdDates.DisplayOrder
I strongly encourage you to not mix left join and right join. I have written many SQL queries and cannot think of an occasion when that was necessary.
In fact, just stick to left join.
If you want all products (or at least all products not filtered out by the where clause), then start with the products table and go from there:
FROM Product p LEFT JOIN
Product_Category_Mapping pcm
ON pcm.ProductId = p.Id LEFT JOIN
Pvl_AdDates ad
ON ad.ProductId = p.id LEFT JOIN
Pvl_Urls u
ON u.ProductId = p.id LEFT JOIN
GenericAttribute ga
ON ga.EntityId = p.id LEFT JOIN
Address a
ON a.Id = convert(int, ga.Value)
Note that I added table aliases. These make queries easier to write and to read.
I would add a caution. It looks like you are combining data along different dimensions. You are likely to get a Cartesian product of the dimension attributes for each dimension. Perhaps that is what you want or the WHERE clause takes care of the additional rows.
Yes. Put restrictions on the optional table of an outer join (the one whose columns may come back as NULL) in that join's ON condition, not in the WHERE clause. Conditions in the WHERE clause are not applied until after the outer joins have been evaluated, so where there is no matching record in the optional table, its columns are NULL, the predicate is not true, and the entire row is eliminated, undoing the outer-ness. Conditions in the ON clause are evaluated during the join, before the unmatched rows from the preserved table are added back in, so the result set still includes them.
Second: formatting, formatting, formatting! Stick to one direction of join (left is easier) and use aliases for table names!
SELECT DISTINCT m.CategoryId, p.Id,
p.Name, p.ShortDescription, d.Caption, d.EventDateTime,
convert(varchar(25), d.EventDateTime, 120) TheDate,
d.DisplayOrder, u.URL, a.FirstName, a.LastName,
a.Email, a.Company, a.City, a.Address1, a.Address2,
a.ZipPostalCode, a.PhoneNumber
FROM Product_Category_Mapping m
left join Product p on p.Id = m.ProductId
and p.Published=1
and p.Deleted=0
and p.AvailableStartDateTimeUtc <= getdate()
and p.AvailableEndDateTimeUtc >= getdate()
left join Pvl_AdDates d ON d.ProductId = p.Id
left join Pvl_Urls u ON u.ProductId = p.Id
left join GenericAttribute g ON g.EntityId = p.Id
and g.KeyGroup = 'Product'
left join [Address] a ON a.Id = convert(int, g.Value)
WHERE m.CategoryId=12
ORDER BY d.EventDateTime DESC, p.Id, d.DisplayOrder

Query extensibility with WHERE EXISTS with a large table

The following query is designed to find the number of people who went to a hospital, the total number of people who went to a hospital, and then divide those two to find a percentage. The table Claims has more than two million rows and does have the correct non-clustered index on patientid, admissiondate, and dischargedate. The query runs quickly enough, but I'm interested in how I could make it more usable. I would like to be able to add another code in the line where (hcpcs.hcpcs ='97001') and have the change in percentRehabNotHomeHealth be reflected in another column. Is this possible without writing a big, fat join statement where I join the results of the two queries together? I know that by adding the extra column the math won't look right, but I'm not worried about that at the moment. Desired sample output: http://imgur.com/BCLrd
select h.hospitalname
,count(*) as visitCounts
,hospitalcounts
,round(count(*)/cast(hospitalcounts as float) *100,2) as percentRehabNotHomeHealth
from Patient p
inner join statecounties as sc on sc.countycode = p.countycode
and sc.statecode = p.statecode
inner join hospitals as h on h.npi=p.hospitalnpi
inner join
--this join adds the hospitalCounts column
(
select h.hospitalname, count(*) as hospitalCounts
from hospitals as h
inner join patient as p on p.hospitalnpi=h.npi
where p.statecode='21' and h.statecode='21'
group by h.hospitalname
) as t on t.hospitalname=h.hospitalname
--this where exists clause gives the visitCounts column
where h.stateCode='21' and p.statecode='21'
and exists
(
select distinct p2.patientid
from Patient as p2
inner join Claims as c on c.patientid = p2.patientid
and c.admissiondate = p2.admissiondate
and c.dischargedate = p2.dischargedate
inner join hcpcs on hcpcs.hcpcs=c.hcpcs
inner join hospitals as h on h.npi=p2.hospitalnpi
where (hcpcs.hcpcs ='97001' or hcpcs.hcpcs='9339' or hcpcs.hcpcs='97002')
and p2.patientid=p.patientid
)
and hospitalcounts > 10
group by h.hospitalname, t.hospitalcounts
having count(*)>10
You might look into CTEs (Common Table Expressions) to get what you need. A CTE would allow you to get summarized data and join that back to the detail on a common key. As an example, I modified your joined subquery to be a CTE.
;with hospitalCounts as (
select h.hospitalname, count(*) as hospitalCounts
from hospitals as h
inner join patient as p on p.hospitalnpi=h.npi
where p.statecode='21' and h.statecode='21'
group by h.hospitalname
)
select h.hospitalname
,count(*) as visitCounts
,hospitalcounts
,round(count(*)/cast(hospitalcounts as float) *100,2) as percentRehabNotHomeHealth
from Patient p
inner join statecounties as sc on sc.countycode = p.countycode
and sc.statecode = p.statecode
inner join hospitals as h on h.npi=p.hospitalnpi
inner join hospitalCounts as t on t.hospitalname=h.hospitalname
--this where exists clause gives the visitCounts column
where h.stateCode='21' and p.statecode='21'
and exists
(
select p2.patientid
from Patient as p2
inner join Claims as c on c.patientid = p2.patientid
and c.admissiondate = p2.admissiondate
and c.dischargedate = p2.dischargedate
inner join hcpcs on hcpcs.hcpcs=c.hcpcs
inner join hospitals as h on h.npi=p2.hospitalnpi
where (hcpcs.hcpcs ='97001' or hcpcs.hcpcs='9339' or hcpcs.hcpcs='97002')
and p2.patientid=p.patientid
)
and hospitalcounts > 10
group by h.hospitalname, t.hospitalcounts
having count(*)>10