Left Join not working as expected - sql

Hi helpful clever people, I am running into an issue when trying to prepare a query. I am trying to join a count of sales and a running total of sales to a prebuilt temp table.
The temp table (TMP_WEEK_SHOP) just has 2 rows, a list of week codes(WEEK_ID) and then an entry with that code for every location(SHOP_ID).
Now my issue is that no matter what i do, my queries will always omit results from the temp table if there were no sales for that location during that week. I need all entries for every week to be entered, even the 0 sales shops.
From what i can gather a left outer join should give me this, but no matter what i have tried it keeps omitting any shops without sales. I should say I am running on an SQL Server 2005 environment with Server Management Studio.
Any help at all would be fantastic!
USE CX
SELECT tws.WEEK_ID as Week, isnull(tws.SHOP_ID,0)as Shop, isnull(count(sal.SALES_ITEMS_ID),0)Sales, isnull(sum(sal.LINEVALUE),0)Sales_Value
FROM TMP_WEEK_SHOP tws
JOIN CX_DATES dat
on dat.WEEK_ID=tws.WEEK_ID
LEFT OUTER JOIN CX_SALES_ITEMS sal
on sal.DATE_ID=dat.DATE_ID
and sal.SHOP_NUM=tws.SHOP_ID
JOIN CX_STYLES sty
on sal.STY_QUAL = sty.STY_QUAL
WHERE sty.STY_RET_TYPE='BIKES'
and (sal.SHOP_NUM='70006' or sal.SHOP_NUM='70008' or sal.SHOP_NUM='70010' or sal.SHOP_NUM='70018' or sal.SHOP_NUM='70028' or sal.SHOP_NUM='70029' or sal.SHOP_NUM='70012' or sal.SHOP_NUM='70016' or sal.SHOP_NUM='70026')
GROUP BY tws.WEEK_ID, tws.SHOP_ID
ORDER BY tws.WEEK_ID, tws.SHOP_ID

This is your query, formatted a bit better so I can read it:
SELECT tws.WEEK_ID as Week, isnull(tws.SHOP_ID,0)as Shop,
isnull(count(sal.SALES_ITEMS_ID),0)Sales, isnull(sum(sal.LINEVALUE),0)Sales_Value
FROM TMP_WEEK_SHOP tws JOIN
CX_DATES dat
on dat.WEEK_ID = tws.WEEK_ID LEFT OUTER JOIN
CX_SALES_ITEMS sal
on sal.DATE_ID = dat.DATE_ID and
sal.SHOP_NUM = tws.SHOP_ID JOIN
CX_STYLES sty
on sal.STY_QUAL = sty.STY_QUAL
WHERE sty.STY_RET_TYPE='BIKES' and
(sal.SHOP_NUM in ('70006', '70008', '70010', '70018', '70028', '70029', '70012', '70016', '70026')
GROUP BY tws.WEEK_ID, tws.SHOP_ID
ORDER BY tws.WEEK_ID, tws.SHOP_ID;
You have three problems that are "undoing" the left outer join. The inner join condition will fail when sal.STY_QUAL is NULL. SImilarly, the where conditions have the same problem.
You need for all the joins to be left outer joins and to move the where conditions to on clauses:
SELECT tws.WEEK_ID as Week, isnull(tws.SHOP_ID,0)as Shop,
isnull(count(sal.SALES_ITEMS_ID),0)Sales, isnull(sum(sal.LINEVALUE),0)Sales_Value
FROM TMP_WEEK_SHOP tws JOIN
CX_DATES dat
on dat.WEEK_ID = tws.WEEK_ID LEFT OUTER JOIN
CX_SALES_ITEMS sal
on sal.DATE_ID = dat.DATE_ID and
sal.SHOP_NUM = tws.SHOP_ID and
sal.SHOP_NUM in ('70006', '70008', '70010', '70018', '70028', '70029', '70012', '70016', '70026'
) LEFT OUTER JOIN
CX_STYLES sty
on sal.STY_QUAL = sty.STY_QUAL and
sty.STY_RET_TYPE='BIKES'
GROUP BY tws.WEEK_ID, tws.SHOP_ID
ORDER BY tws.WEEK_ID, tws.SHOP_ID;
In addition, count() never returns a NULL value, so using isnull() or coalesce() is unnecessary.

your other join onvolving CX_SALES_ITEMS sal needs to be an left outer join as well

Related

Inner Join and Left Join on 5 tables in Access using SQL

I am attempting to access data from the following tables:
OrgPlanYear
ProjOrgPlnYrJunction
DC
DCMaxEEContribLevel
DCNonDiscretionaryContribLevel
Basically, I need to inner join OrgPlanYear + DC and ProjOrgPlnYrJunction then I need to Left Join the remaining tables (tables 4 and 5) due to the fact the tables 1-3 have all the rows I need and only some have data in tables 4-5. I need several variables from each table. I also need the WHERE function to be across all fields (meaning I want all this data for a select group where projectID=919).
Please help!
I have tried many things with errors including attempting to use the Design Query side (i.e. JOIN function issues, badly formatted FROM function, etc.)! Here is an example of one excluding all variables I need:
SELECT
ProjOrgPlnYrJunction.fkeyProjectID, OrgPlanYear.OrgName, DC.PlanCode, DCNonDiscretionaryContribLevel.Age,DCNonDiscretionaryContribLevel.Service
FROM
(((OrgPlanYear INNER JOIN DC ON OrgPlanYear.OrgPlanYearID = DC.fkeyOrgPlanYearID) INNER JOIN ProjOrgPlnYrJunction ON OrgPlanYear.OrgPlanYearID = ProjOrgPlnYrJunction.fkeyOrgPlanYearID)
LEFT JOIN
(SELECT DCNonDiscretionaryContribLevel.Age AS Age, DCNonDiscretionaryContribLevel.Service AS Service FROM DCNonDiscretionaryContribLevel WHERE ProjOrgPlnYrJunction.fkeyProjectID)=919)
LEFT JOIN (
SELECT DCMaxEEContribLevel.EEContribRoth FROM EEContribRoth WHERE ProjOrgPlnYrJunction.fkeyProjectID)=919)
ORDER BY OrgPlanYear.OrgName;
Main issues with your query:
Missing ON clauses for each LEFT JOIN.
Referencing other table columns in SELECT and WHERE of a different subquery (e.g., FROM DCNonDiscretionaryContribLevel WHERE ProjOrgPlnYrJunction.fkeyProjectID).
Unmatched parentheses around subqueries and joins per Access SQL requirements.
See below adjusted SQL that now uses short table aliases. Be sure to adjust SELECT and ON clauses with appropriate columns.
SELECT p.fkeyProjectID, o.OrgName, DC.PlanCode, dcn.Age, dcn.Service, e.EEContribRoth
FROM (((OrgPlanYear o
INNER JOIN DC
ON o.OrgPlanYearID = DC.fkeyOrgPlanYearID)
INNER JOIN ProjOrgPlnYrJunction p
ON o.OrgPlanYearID = p.fkeyOrgPlanYearID)
LEFT JOIN
(SELECT Age AS Age, Service AS Service
FROM DCNonDiscretionaryContribLevel
WHERE fkeyProjectID = 919) AS dcn
ON dcn.fkeyProjectID = p.fkeyOrgPlanYearID)
LEFT JOIN
(SELECT EEContribRoth
FROM EEContribRoth
WHERE fkeyProjectID = 919) AS e
ON e.fkeyProjectID = p.fkeyProjectID
ORDER BY o.OrgName;

SQL Get aggregate as 0 for non existing row using inner joins

I am using SQL Server to query these three tables that look like (there are some extra columns but not that relevant):
Customers -> Id, Name
Addresses -> Id, Street, StreetNo, CustomerId
Sales -> AddressId, Week, Total
And I would like to get the total sales per week and customer (showing at the same time the address details). I have come up with this query
SELECT a.Name, b.Street, b.StreetNo, c.Week, SUM (c.Total) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
INNER JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name, c.Week, b.Street, b.StreetNo
and even if my SQL skill are close to none it looks like it's doing its job. But now I would like to be able to show 0 whenever the one customer don't have sales for a particular week (weeks are just integers). And I wonder if somehow I should get distinct values of the weeks in the Sales table, and then loop through them (not sure how)
Any help?
Thanks
Use CROSS JOIN to generate the rows for all customers and weeks. Then use LEFT JOIN to bring in the data that is available:
SELECT c.Name, a.Street, a.StreetNo, w.Week,
COALESCE(SUM(s.Total), 0) as Total
FROM Customers c CROSS JOIN
(SELECT DISTINCT s.Week FROM sales s) w LEFT JOIN
Addresses a
ON c.CustomerId = a.CustomerId LEFT JOIN
Sales s
ON s.week = w.week AND s.AddressId = a.AddressId
GROUP BY c.Name, a.Street, a.StreetNo, w.Week;
Using table aliases is good, but the aliases should be abbreviations for the table names. So, a for Addresses not Customers.
You should generate a week numbers, rather than using DISTINCT. This is better in terms of performance and reliability. Then use a LEFT JOIN on the Sales table instead of an INNER JOIN:
SELECT a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
,COALESCE(SUM(c.Total),0) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
CROSS JOIN (
-- Generate a sequence of 52 integers (13 x 4)
SELECT ROW_NUMBER() OVER (ORDER BY a.x) AS [Week]
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(x)
CROSS JOIN (SELECT x FROM (VALUES(1),(1),(1),(1)) b(x)) b
) weeks
LEFT JOIN Sales c ON b.Id = c.AddressId AND c.[Week] = weeek.[Week]
GROUP BY a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
Please try the following...
SELECT Name,
Street,
StreetNo,
Week,
SUM( CASE
WHEN Total IS NULL THEN
0
ELSE
Total
END ) AS Total
FROM Customers a
JOIN Addresses b ON a.Id = b.CustomerId
RIGHT JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name,
c.Week,
b.Street,
b.StreetNo;
I have modified your statement in three places. The first is I changed your join to Sales to a RIGHT JOIN. This will join as it would with an INNER JOIN, but it will also keep the records from the table on the right side of the JOIN that do not have a matching record or group of records on the left, placing NULL values in the resulting dataset's fields that would have come from the left of the JOIN. A LEFT JOIN works in the same way, but with any extra records in the table on the left being retained.
I have removed the word INNER from your surviving INNER JOIN. Where JOIN is not preceded by a join type, an INNER JOIN is performed. Both JOIN and INNER JOIN are considered correct, but the prevailing protocol seems to be to leave the INNER out, where the RDBMS allows it to be left out (which SQL-Server does). Which you go with is still entirely up to you - I have left it out here for illustrative purposes.
The third change is that I have added a CASE statement that tests to see if the Total field contains a NULL value, which it will if there were no sales for that Customer for that Week. If it does then SUM() would return a NULL, so the CASE statement returns a 0 instead. If Total does not contain a NULL value, then the SUM() of all values of Total for that grouping is performed.
Please note that I am assuming that Total will not have any NULL values other than from the RIGHT JOIN. Please advise me if this assumption is incorrect.
Please also note that I have assumed that either there will be no missing Weeks for a Customer in the Sales table or that you are not interested in listing them if there are. Again, please advise me if this assumption is incorrect.
If you have any questions or comments, then please feel free to post a Comment accordingly.

SQL server SELECT with join performance issue

Sorry about the saga here but am trying to explain everything.
We have 2 databases that I would like to join some tables in.
1 database holds sales data from various different stores/sites. This database is quite large (over 3mill rows currently) This table is ItemSales
The other holds application data from an in house web app. These tables are Departments and GroupItems
I would like to create a query that joins 2 tables from the app database with the sales database table. This is so we can group some items together for a date range and see the amount sold for example.
My first attempt was (DealId being the variable that it is grouped on in the App):
SELECT d.Id, d.ItemNo, d.UnitValue, d.NoGST, d.ItemStartDate, d.ItemEndDate,
(SELECT SUM(ItemQty) AS Expr1
FROM Sales.dbo.ItemSales AS s
WHERE (Store = d.SiteId) AND (ItemNo = d.ItemNo) AND (ItemSaleDate >= d.ItemStartDate) AND (ItemSaleDate <= d.ItemEndDate)) AS ItemsSold, Sales.dbo.ItemSales.ItemDesc, Departments.Description
FROM Departments INNER JOIN
Sales.dbo.ItemSales ON Departments.Id = Sales.dbo.ItemSales.ItemDept RIGHT OUTER JOIN
GroupItems AS d ON Sales.dbo.ItemSales.ItemNo = d.ItemNo
WHERE (d.DealId = 11)
GROUP BY d.Id, d.ItemNo, d.UnitValue, d.NoGST, d.ItemStartDate, d.ItemEndDate, ItemDesc, Departments.Description, d.SiteId
ORDER BY d.Id
This does exactly what I want which is:
-Give me all the details from the GroupItems table (UnitValue, ItemStartDate, ItemEndDate etc)
-Gives me the SUM() on the ItemQty column for the amount sold (plus the description etc)
-Returns NULL for something with no sales for the period
It is VERY slow though. To the point that if the GroupItems table has more than about 7 items in it, it times out.
Second attempt has been:
SELECT d.Id, d.ItemNo, d.UnitValue, d.NoGST, d.ItemStartDate, d.ItemEndDate, SUM(ItemQty) AS ItemsSold, Sales.dbo.ItemSales.ItemDesc, Departments.Description
FROM Departments INNER JOIN
Sales.dbo.ItemSales ON Departments.Id = Sales.dbo.ItemSales.ItemDept RIGHT OUTER JOIN
GroupItems AS d ON Sales.dbo.ItemSales.ItemNo = d.ItemNo
WHERE (Store = d.SiteId) AND (d.DealId = 11) AND (Sales.dbo.ItemSales.ItemSaleDate >= d.ItemStartDate) AND (Sales.dbo.ItemSales.ItemSaleDate <= d.ItemEndDate)
GROUP BY d.Id, d.ItemNo, d.UnitValue, d.NoGST, d.ItemStartDate, d.ItemEndDate, ItemDesc, Departments.Description
ORDER BY d.Id
This is very quick and does not time out but does not return the NULLs for no sales items in the ItemSales table. This is a problem as we need to see nothing or 0 for a no sales item otherwise people will think we forgot to check that item.
Can someone help me come up with a query please that returns everything from the GroupItems table, shows the SUM() of items sold and doesn't time out? I have also tried a SELECT x WHERE EXISTS (Subquery) but this also didn't return the NULLs for me but I may have had that one wrong.
If you want everything from GroupItems regardless of the sales, use it as the base of the query and then use left outer joins from there. Something along these lines:
SELECT GroupItems.Id, GroupItems.ItemNo, GroupItems.UnitValue, GroupItems.NoGST,
GroupItems.ItemStartDate, GroupItems.ItemEndDate,
Sales.ItemDesc,
SUM(ItemQty) AS SumOfSales,
Departments.Description
FROM GroupItems
LEFT OUTER JOIN #tempSales AS Sales ON
Sales.ItemNo = GroupItems.ItemNo
AND Sales.Store = GroupItems.SiteId
AND Sales.ItemSaleDate >= GroupItems.ItemStartDate
AND Sales.ItemSaleDate <= GroupItems.ItemEndDate
LEFT OUTER JOIN Departments ON Departments.Id = Sales.ItemDept
WHERE GroupItems.DealId = 11
GROUP BY GroupItems.Id, GroupItems.ItemNo, GroupItems.UnitValue, GroupItems.NoGST,
GroupItems.ItemStartDate, GroupItems.ItemEndDate,
Sales.ItemDesc,
SUM(ItemQty) AS SumOfSales,
Departments.Description
ORDER BY GroupItems.Id
Does changing the INNER JOIN to Sales.dbo.ItemSales into a LEFT OUTER JOIN to Sales.dbo.ItemSales and changing the RIGHT OUTER JOIN to GroupItems into an INNER JOIN to GroupItems fix your issue?

When I add a LEFT OUTER JOIN, the query returns only a few rows

The original query returns 160k rows. When I add the LEFT OUTER JOIN:
LEFT OUTER JOIN Table_Z Z WITH (NOLOCK) ON A.Id = Z.Id
the query returns only 150 rows. I'm not sure what I'm doing wrong.
All I need to do is add a column to the query, which will bring back a code from a different table. The code could be a number or a NULL. I still have to display NULL, hence the reason for the LEFT join. They should join on the "id" columns.
SELECT <lots of stuff> + the new column that I need (called "code").
FROM
dbo.Table_A A WITH (NOLOCK)
INNER JOIN
dbo.Table_B B WITH (NOLOCK) ON A.Id = B.Id AND A.version = B.version
--this is where I added the LEFT OUTER JOIN. with it, the query returns 150 rows, without it, 160k rows.
LEFT OUTER JOIN
Table_Z Z WITH (NOLOCK) ON A.Id = Z.Id
LEFT OUTER JOIN
Table_E E WITH (NOLOCK) ON A.agent = E.agent
LEFT OUTER JOIN
Table_D D WITH (NOLOCK) ON E.location = D.location
AND E.type = 'Organization'
AND D.af_type = 'agent_location'
INNER JOIN
(SELECT X , MAX(Version) AS MaxVersion
FROM LocalTable WITH (NOLOCK)
GROUP BY agemt) P ON E.agent = P.location AND E.Version = P.MaxVersion
Does anyone have any idea what could be causing the issue?
When you perform a LEFT OUTER JOIN between tables A and E, you are maintaining your original set of data from A. That is to say, there is no data, or lack of data, in table E that can reduce the number of rows in your query.
However, when you then perform an INNER JOIN between E and P at the bottom, you are indeed opening yourself up to the possibility of reducing the number of rows returned. This will treat your subsequent LEFT OUTER JOINs like INNER JOINs.
Now, without your exact schema and a set of data to test against, this may or may not be the exact issue you are experiencing. Still, as a general rule, always put your INNER JOINs before your OUTER JOINs. It can make writing queries like this much, much easier. Your most restrictive joins come first, and then you won't have to worry about breaking any of your outer joins later on.
As a quick fix, try changing your last join to P to a LEFT OUTER JOIN, just to see if the Z join works.
You have to be very careful once you start with LEFT JOINs.
Let's suppose this model: You have tables Products, Orders and Customers. Not all products necessarily have been ordered, but every order must have customer entered.
Task: Show all products, and if the product was ordered, list the ordering customers; i.e., product without orders will be shown as one row, product with 10 orders will have 10 rows in the resultset. This calls for a query designed around FROM Products LEFT JOIN Orders.
Now someone could think "OK, Customer is always entered into orders, so I can make inner join from orders to customers". Wrong. Since the table Customers is joined through left-joined table Orders, it has to be left-joined itself... otherwise the inner join will propagate into the previous level(s) and as a result, you will lose all products that have no orders.
That is, once you join any table using LEFT JOIN, any subsequent tables that are joined through this table, need to keep LEFT JOINs. But it does not mean that once you use LEFT JOIN, all joins have to be of that type... only those that are dependent on the first performed LEFT JOIN. It would be perfectly fine to INNER JOIN the table Products with another table Category for example, if you only want to see Products which have a category set.
(Answer is based on this answer: http://www.sqlservercentral.com/Forums/Topic247971-8-1.aspx -> last entry)

Left outer join and group by issue

I wrote a query. this query sum fields from 2 different table. And grouped by main table id field. But second left outer join is not grouped and giving me different results.
SELECT s.*,
f.firma_adi,
sum(sd.fiyat) AS konak,
sum(ss.fiyat) AS sponsor
FROM fuar_sozlesme1 s
INNER JOIN fuar_firma_2012 f
ON ( s.cari = f.cari )
LEFT OUTER JOIN fuar_sozlesme1_detay sd
ON ( sd.sozlesme_id = s.id )
LEFT OUTER JOIN fuar_sozlesme1_sponsor ss
ON ( ss.sozlesme_id = s.id )
GROUP BY s.id
ORDER BY s.id DESC
I know, it is really complicated but I'm stucking on this issue.
My question is: why second left outer join is not correctly sum of field . If I remove second left outer join or first, everything is normal.
The problem is that you have multiple dimensions on your data, and the number of rows is multiplying beyond what you expect. I would suggest that you run the query for one id, without the group by, to see what rows the join is producing.
One way to fix this is by using correlated subqueries:
select s.*, f.firma_adi,
(select SUM(sd.fiyat)
from fuar_sozlesme1_detay fd
where sd.sozlesme_id = s.id
) as konak,
(select SUM(ss.fiyat)
from fuar_sozlesme1_sponsor ss
where (ss.sozlesme_id = s.id)
) as sponsor
from fuar_sozlesme1 s inner join
fuar_firma_2012 f
on (s.cari = f.cari)
order by s.id DESC
By the way, you appear to by using MySQL (because your query is not parsable in any other dialect). You should tag your questions with the version of the database you are using.