join with date dimension but don't want NULL for the dates with values - sql

I have a query:
SELECT
d.FiscalMonth,
d.FiscalMonthOfYear,
p.Name
FROM
DimDate d
LEFT JOIN FactSales f on f.SaleDate=d.PKDate
LEFT JOIN DimPerson p on p.PersonId=f.PersonId
WHERE d.FiscalYear='2014/7/1'
group by d.FiscalMonth, d.FiscalMonthOfYear, p.Name
ORDER BY d.FiscalMonthOfYear asc, p.PersonID asc
Which gives me these results:
That is all fine: I want to include all months, even the ones that don't have data (in this case FiscalMonthOfYear 2-12).
The problem I have is with the one NULL value in a month where I do have data, i.e. FiscalMonthOfYear 1 (the red box).
How would I go about not returning that one NULL for FiscalMonth = 2014-07-01? I've tried various WHERE clauses, but any time I remove the NULL values from the results, I also remove all the ones I want (i.e. FiscalMonthOfYear 2-12).
Any help or guidance is greatly appreciated!
Thanks!
-Russ
Update:
The DimDate table has the primary key PKDate, with one row for every date:
DimDate
PKDate ....
2014-07-01
2014-07-02
2014-07-03
etc.
The FactSales table has one or more sales transactions for a given day:
FactSales
SaleDate Amount
2014-07-01 34.99
2014-07-01 21.89
2014-07-02 24.77
2014-07-04 22.77
The problem is that FactSales may not have a sale on a particular day. So my query finds the day (or days) with no transactions and, because of the LEFT JOIN, returns it. How would I go about removing this row from my results? For example, this query lists every date next to its matching SaleDate, which is NULL on days with no sale:
SELECT
d.PKDate
,f.SaleDate
FROM
DimDate d
LEFT JOIN FactSales f on f.SaleDate=d.PKDate
LEFT JOIN DimPerson p on p.PersonId=f.PersonId
WHERE d.FiscalYear='2014/7/1'
ORDER BY d.PKDate

The problem stems from the fact that you are actually trying to do two things at once:
- You want all the Names related to sales in fiscal months with at least one sale.
- You want an extra row for every fiscal month with no sales.
As is often the case, you should solve the two problems separately and then combine the results (with a UNION in this specific case).
Something like this:
-- Part 1: names for fiscal months with at least one sale (inner joins, so no NULLs)
SELECT DISTINCT
d.FiscalMonth,
d.FiscalMonthOfYear,
p.Name
FROM DimDate d
JOIN FactSales f ON f.SaleDate=d.PKDate
JOIN DimPerson p ON p.PersonId=f.PersonId
WHERE d.FiscalYear='2014/7/1'
UNION
-- Part 2: one row with a NULL Name for each fiscal month with no sales
SELECT
d.FiscalMonth,
d.FiscalMonthOfYear,
NULL AS Name
FROM DimDate d
LEFT JOIN FactSales f ON f.SaleDate=d.PKDate
WHERE d.FiscalYear='2014/7/1'
GROUP BY d.FiscalMonth, d.FiscalMonthOfYear
HAVING COUNT(f.SaleDate)=0
-- ORDER BY applies to the combined result, so it can only use output columns
ORDER BY FiscalMonthOfYear ASC, Name ASC
I haven't tested it, and there may be better ways to solve the second part (a subquery with NOT EXISTS, for example), but that depends a bit on the engine you are using.
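A minimal, self-contained sketch of the same two-part idea, written against SQLite and using NOT EXISTS for the second part instead of GROUP BY/HAVING. The tables are cut down to the columns the query needs, the sample rows are hypothetical, and the view name MonthlyNames is made up for illustration; FiscalMonth is assumed to hold the first day of the month, as in the question.

```sql
-- Hypothetical, cut-down versions of the tables from the question
CREATE TABLE DimDate (PKDate TEXT PRIMARY KEY, FiscalYear TEXT,
                      FiscalMonth TEXT, FiscalMonthOfYear INTEGER);
CREATE TABLE DimPerson (PersonId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE FactSales (SaleDate TEXT, PersonId INTEGER, Amount REAL);

INSERT INTO DimDate VALUES
  ('2014-07-01', '2014/7/1', '2014-07-01', 1),
  ('2014-07-02', '2014/7/1', '2014-07-01', 1),
  ('2014-07-03', '2014/7/1', '2014-07-01', 1),  -- a day with no sales
  ('2014-08-01', '2014/7/1', '2014-08-01', 2);  -- a month with no sales
INSERT INTO DimPerson VALUES (1, 'Russ');
INSERT INTO FactSales VALUES ('2014-07-01', 1, 34.99), ('2014-07-02', 1, 24.77);

CREATE VIEW MonthlyNames AS
-- Part 1: names for months with at least one sale; inner joins drop empty days
SELECT DISTINCT d.FiscalMonth, d.FiscalMonthOfYear, p.Name
FROM DimDate d
JOIN FactSales f ON f.SaleDate = d.PKDate
JOIN DimPerson p ON p.PersonId = f.PersonId
WHERE d.FiscalYear = '2014/7/1'
UNION
-- Part 2: one NULL-name row per month in which no day has any sale
SELECT DISTINCT d.FiscalMonth, d.FiscalMonthOfYear, NULL
FROM DimDate d
WHERE d.FiscalYear = '2014/7/1'
  AND NOT EXISTS (
    SELECT 1
    FROM DimDate d2
    JOIN FactSales f ON f.SaleDate = d2.PKDate
    WHERE d2.FiscalMonth = d.FiscalMonth);

SELECT * FROM MonthlyNames ORDER BY FiscalMonthOfYear, Name;
```

Month 1 comes back only with its named row (the empty day 2014-07-03 no longer adds a NULL), while month 2 still appears once with a NULL name.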

You can do an inner join as follows:
SELECT
d.FiscalMonth,
d.FiscalMonthOfYear,
p.Name
FROM
DimDate d
INNER JOIN FactSales f on f.SaleDate=d.PKDate
LEFT JOIN DimPerson p on p.PersonId=f.PersonId
WHERE d.FiscalYear='2014/7/1'
group by d.FiscalMonth, d.FiscalMonthOfYear, p.Name
ORDER BY d.FiscalMonthOfYear asc, p.Name asc
The inner join returns only the rows that have a match in both tables, instead of keeping every row from the left table. Note that this also drops the fiscal months with no sales at all, which the question wants to keep. For more on joins, see the blog post "Visual Representation of SQL Joins", which explains that an INNER JOIN returns only the records in the left table (table A) that have a matching record in the right table (table B), whereas a LEFT JOIN returns all of the records in the left table (table A) regardless of whether they have a match in the right table (table B).
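A tiny illustration of that difference, using hypothetical tables named A and B to match the description above (SQLite syntax):

```sql
-- Hypothetical data: A has ids 1-3, B has a match only for id 1
CREATE TABLE A (id INTEGER PRIMARY KEY);
CREATE TABLE B (id INTEGER, val TEXT);
INSERT INTO A VALUES (1), (2), (3);
INSERT INTO B VALUES (1, 'x');

-- INNER JOIN: only rows of A with a match in B -> (1, 'x')
SELECT A.id, B.val FROM A INNER JOIN B ON B.id = A.id;

-- LEFT JOIN: every row of A; B's columns are NULL where there is no match
-- -> (1, 'x'), (2, NULL), (3, NULL)
SELECT A.id, B.val FROM A LEFT JOIN B ON B.id = A.id ORDER BY A.id;
```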

Related

sql multiple left joins with sum

I have 3 tables, as below. What I need to do is create a summary after left joining the 1st table to the 2nd and the 2nd to the 3rd.
The code I'm using ends up producing a Cartesian join. My query to create the 1st table (person) is complicated and resource-intensive, and the volume of data in table 2 (shopping_list) is massive, so a nested query is not ideal. Below is the code I'm using right now, along with the expected output (image 1) and what I get (image 2):
select
a.ID,
a.Name,
sum(b.cost) total_cost,
sum(c.discount_amount) total_discount
from
person a
left join shopping_list b on a.id=b.id
left join discount c on b.item = c.item
group by
a.ID,
a.Name
I've looked at the links below, but I was hoping there's a solution that may work better given the size of my dataset:
https://dba.stackexchange.com/questions/217220/how-i-use-multiple-sum-with-multiple-left-joins
Multiple Left Join with sum
Thanks in advance for your help
You have multiple rows for the discounts, so pre-summarize those:
select p.id, p.name,
coalesce(sum(sl.cost), 0) as total_cost,
coalesce(sum(d.discount_amount), 0) as total_discount
from person p left join
shopping_list sl
on sl.id = p.id left join
-- discounts pre-summarized to one row per item
(select item, sum(discount_amount) as discount_amount
from discount
group by item
) d
on sl.item = d.item
group by p.id, p.name;
The problem with your query is that the multiple discount rows for an item multiply the matching shopping_list rows, which produces the inaccurate totals.
Notice that in this query, the table aliases are abbreviations for the table names. This is a best practice that makes it much, much easier to follow the logic of a query.
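To see the fan-out concretely, here is a small SQLite sketch with hypothetical data: one person who bought the same item twice, and two discount rows for that item. The view name person_totals is invented for the example.

```sql
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE shopping_list (id INTEGER, item TEXT, cost REAL);
CREATE TABLE discount (item TEXT, discount_amount REAL);
INSERT INTO person VALUES (1, 'Ann');
INSERT INTO shopping_list VALUES (1, 'apple', 10), (1, 'apple', 10);
INSERT INTO discount VALUES ('apple', 1), ('apple', 2);

-- Naive joins: 2 purchase rows x 2 discount rows = 4 rows, so total_cost = 40
SELECT p.id, SUM(sl.cost) AS total_cost, SUM(d.discount_amount) AS total_discount
FROM person p
LEFT JOIN shopping_list sl ON sl.id = p.id
LEFT JOIN discount d ON d.item = sl.item
GROUP BY p.id;

-- Pre-summarized discounts: one discount row per item, so total_cost = 20
CREATE VIEW person_totals AS
SELECT p.id, p.name,
       COALESCE(SUM(sl.cost), 0) AS total_cost,
       COALESCE(SUM(d.discount_amount), 0) AS total_discount
FROM person p
LEFT JOIN shopping_list sl ON sl.id = p.id
LEFT JOIN (SELECT item, SUM(discount_amount) AS discount_amount
           FROM discount GROUP BY item) d ON d.item = sl.item
GROUP BY p.id, p.name;

SELECT * FROM person_totals;
```

In the fixed version each purchase row still carries its item's summed discount (3), so total_discount is 6: one discount per purchase, which is an assumption about what the totals should mean.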

SQL Query to remove duplicated data and take single column sum

I have the following table, produced by:
SELECT m.MedName as [Medicine],m.MedSellPrice as [RetailPrice],m.MedType as [Type],
m.SoldQuantity as [Sold],m.Quantity as [Available],b.BillAmount as [Total Bill],b.BillDate
FROM BillMedicine AS bm LEFT JOIN
Medicine AS m
ON bm.MedicineID=m.id LEFT JOIN
Bill AS b
ON bm.BilIID = b.ID
but now I want to remove the repeated rows, keeping only the sum of 'Total Bill'.
Use GROUP BY:
SELECT
m.MedName AS [Medicine],
m.MedSellPrice AS [RetailPrice],
m.MedType AS [Type],
m.SoldQuantity AS [Sold],
m.Quantity AS [Available],
SUM(b.BillAmount) AS [Total Bill]
FROM BillMedicine AS bm
LEFT JOIN Medicine AS m
ON bm.MedicineID = m.id
LEFT JOIN Bill AS b
ON bm.BilIID = b.ID
GROUP BY
m.MedName,
m.MedSellPrice,
m.MedType,
m.SoldQuantity,
m.Quantity;
Note that for the billing date, the two "duplicate" records you have highlighted have different dates. It is not clear which date, if any, you want to report here. I have omitted this column.
GROUP BY is the best option for removing duplicate data and taking a SUM:
SELECT Column1, Column2, ..., SUM(Total) AS Total FROM TableName GROUP BY Column1, Column2, ...
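A minimal sketch of that pattern in SQLite, using a hypothetical table named sold in place of the question's joined result:

```sql
-- Hypothetical rows: Panadol appears on two bills, Brufen on one
CREATE TABLE sold (MedName TEXT, RetailPrice REAL, BillAmount REAL);
INSERT INTO sold VALUES ('Panadol', 5, 10), ('Panadol', 5, 15), ('Brufen', 8, 8);

-- One row per distinct (MedName, RetailPrice); BillAmount is summed within each group
SELECT MedName, RetailPrice, SUM(BillAmount) AS TotalBill
FROM sold
GROUP BY MedName, RetailPrice
ORDER BY MedName;
-- -> ('Brufen', 8.0, 8.0), ('Panadol', 5.0, 25.0)
```

Every non-aggregated column must be listed in GROUP BY; a column whose values differ within a group (such as the bill date) would split the group again, which is why it is left out here.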
You seem to want most (or all) columns from m and then the sum from another table. One method is a lateral join or correlated subquery:
SELECT m.*, -- or whatever columns you want,
(SELECT SUM(b.BillAmount)
FROM BillMedicine bm JOIN
Bill b
ON bm.BilIID = b.ID
WHERE bm.MedicineID = m.id
) as [Total Bill]
FROM Medicine m ;
I suggest this approach for several reasons.
This is often more efficient than an outer aggregation.
You have LEFT JOINs but they do not look correct. I suspect you want to start with the Medicine table.
You are including a date/time in the results, but clearly that is not appropriate when combining multiple rows.
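A self-contained SQLite sketch of the correlated-subquery approach, with hypothetical rows and only the columns the subquery needs. The column name BilIID is kept as spelled in the question, and medicine_totals is an invented view name. A medicine that was never billed gets NULL, since SUM over no rows is NULL; wrap the subquery in COALESCE(..., 0) if you prefer 0.

```sql
CREATE TABLE Medicine (id INTEGER PRIMARY KEY, MedName TEXT);
CREATE TABLE Bill (ID INTEGER PRIMARY KEY, BillAmount REAL);
CREATE TABLE BillMedicine (BilIID INTEGER, MedicineID INTEGER);
INSERT INTO Medicine VALUES (1, 'Panadol'), (2, 'Brufen'), (3, 'Unsold');
INSERT INTO Bill VALUES (10, 100), (11, 50);
INSERT INTO BillMedicine VALUES (10, 1), (11, 1), (11, 2);

-- One row per medicine; the subquery sums the bills linked to that medicine only
CREATE VIEW medicine_totals AS
SELECT m.id, m.MedName,
       (SELECT SUM(b.BillAmount)
        FROM BillMedicine bm
        JOIN Bill b ON bm.BilIID = b.ID
        WHERE bm.MedicineID = m.id) AS TotalBill
FROM Medicine m;

SELECT * FROM medicine_totals ORDER BY id;
-- -> (1, 'Panadol', 150.0), (2, 'Brufen', 50.0), (3, 'Unsold', NULL)
```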

SQL Get aggregate as 0 for non existing row using inner joins

I am using SQL Server to query these three tables, which look like this (there are some extra columns, but they are not really relevant):
Customers -> Id, Name
Addresses -> Id, Street, StreetNo, CustomerId
Sales -> AddressId, Week, Total
And I would like to get the total sales per week and customer (showing at the same time the address details). I have come up with this query
SELECT a.Name, b.Street, b.StreetNo, c.Week, SUM (c.Total) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
INNER JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name, c.Week, b.Street, b.StreetNo
and even though my SQL skills are close to none, it looks like it's doing its job. But now I would like to be able to show 0 whenever a customer doesn't have sales for a particular week (weeks are just integers). I wonder if I should somehow get the distinct values of the weeks in the Sales table and then loop through them (not sure how).
Any help?
Thanks
Use CROSS JOIN to generate the rows for all customers and weeks. Then use LEFT JOIN to bring in the data that is available:
SELECT c.Name, a.Street, a.StreetNo, w.Week,
COALESCE(SUM(s.Total), 0) as Total
FROM Customers c CROSS JOIN
(SELECT DISTINCT s.Week FROM sales s) w LEFT JOIN
Addresses a
ON a.CustomerId = c.Id LEFT JOIN
Sales s
ON s.Week = w.Week AND s.AddressId = a.Id
GROUP BY c.Name, a.Street, a.StreetNo, w.Week;
Using table aliases is good, but the aliases should be abbreviations for the table names. So, a for Addresses not Customers.
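A small SQLite sketch of the CROSS JOIN / LEFT JOIN pattern, with hypothetical data in which Bob has no sales in week 2 (WeeklyTotals is an invented view name):

```sql
CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Addresses (Id INTEGER PRIMARY KEY, Street TEXT, StreetNo INTEGER, CustomerId INTEGER);
CREATE TABLE Sales (AddressId INTEGER, Week INTEGER, Total REAL);
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Addresses VALUES (10, 'Main St', 1, 1), (20, 'High St', 2, 2);
INSERT INTO Sales VALUES (10, 1, 5), (10, 2, 7), (20, 1, 3);  -- no week-2 sale for Bob

CREATE VIEW WeeklyTotals AS
SELECT c.Name, a.Street, a.StreetNo, w.Week,
       COALESCE(SUM(s.Total), 0) AS Total   -- no matching sale -> SUM is NULL -> 0
FROM Customers c
CROSS JOIN (SELECT DISTINCT Week FROM Sales) w  -- every customer x every week
LEFT JOIN Addresses a ON a.CustomerId = c.Id
LEFT JOIN Sales s ON s.Week = w.Week AND s.AddressId = a.Id
GROUP BY c.Name, a.Street, a.StreetNo, w.Week;

SELECT * FROM WeeklyTotals ORDER BY Name, Week;
```

Bob's week 2 comes back as a row with Total 0 instead of being missing.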
You should generate the week numbers rather than using DISTINCT. This is better in terms of performance and reliability: a week in which nobody had any sales would never appear in a DISTINCT list taken from Sales. Then use a LEFT JOIN on the Sales table instead of an INNER JOIN:
SELECT a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
,COALESCE(SUM(c.Total),0) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
CROSS JOIN (
-- Generate a sequence of 52 integers (13 x 4)
SELECT ROW_NUMBER() OVER (ORDER BY a.x) AS [Week]
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(x)
CROSS JOIN (SELECT x FROM (VALUES(1),(1),(1),(1)) b(x)) b
) weeks
LEFT JOIN Sales c ON b.Id = c.AddressId AND c.[Week] = weeks.[Week]
GROUP BY a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
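As a swapped-in alternative that also runs in SQLite, this sketch generates the week numbers with a recursive common table expression instead of the VALUES cross join (SQL Server writes it as plain WITH, without the RECURSIVE keyword). The data is hypothetical and the range is cut to 3 weeks to keep the output short; WeeklyTotals is an invented view name.

```sql
CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Addresses (Id INTEGER PRIMARY KEY, Street TEXT, StreetNo INTEGER, CustomerId INTEGER);
CREATE TABLE Sales (AddressId INTEGER, Week INTEGER, Total REAL);
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Addresses VALUES (10, 'Main St', 1, 1), (20, 'High St', 2, 2);
INSERT INTO Sales VALUES (10, 1, 5), (10, 2, 7), (20, 1, 3);  -- nobody sells in week 3

CREATE VIEW WeeklyTotals AS
WITH RECURSIVE weeks(Week) AS (
  SELECT 1
  UNION ALL
  SELECT Week + 1 FROM weeks WHERE Week < 3  -- use 52 in practice
)
SELECT c.Name, a.Street, a.StreetNo, w.Week,
       COALESCE(SUM(s.Total), 0) AS Total
FROM Customers c
JOIN Addresses a ON a.CustomerId = c.Id
CROSS JOIN weeks w
LEFT JOIN Sales s ON s.AddressId = a.Id AND s.Week = w.Week
GROUP BY c.Name, a.Street, a.StreetNo, w.Week;

SELECT * FROM WeeklyTotals ORDER BY Name, Week;
```

Week 3 appears for both customers with a total of 0 even though it never occurs in Sales, which a DISTINCT list of weeks could not produce.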
Please try the following...
SELECT Name,
Street,
StreetNo,
Week,
SUM( CASE
WHEN Total IS NULL THEN
0
ELSE
Total
END ) AS Total
FROM Customers a
JOIN Addresses b ON a.Id = b.CustomerId
RIGHT JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name,
c.Week,
b.Street,
b.StreetNo;
I have modified your statement in three places. The first is I changed your join to Sales to a RIGHT JOIN. This will join as it would with an INNER JOIN, but it will also keep the records from the table on the right side of the JOIN that do not have a matching record or group of records on the left, placing NULL values in the resulting dataset's fields that would have come from the left of the JOIN. A LEFT JOIN works in the same way, but with any extra records in the table on the left being retained.
I have removed the word INNER from your surviving INNER JOIN. Where JOIN is not preceded by a join type, an INNER JOIN is performed. Both JOIN and INNER JOIN are considered correct, but the prevailing protocol seems to be to leave the INNER out, where the RDBMS allows it to be left out (which SQL-Server does). Which you go with is still entirely up to you - I have left it out here for illustrative purposes.
The third change is that I have added a CASE expression that tests whether the Total field contains a NULL value, which it will if there were no sales for that Customer for that Week, and replaces it with 0 before summing. SUM() ignores NULLs, but when every value in a group is NULL it returns NULL; with the CASE expression, such a group sums to 0 instead. Groups with real sales are summed as usual.
Please note that I am assuming that Total will not have any NULL values other than from the RIGHT JOIN. Please advise me if this assumption is incorrect.
Please also note that I have assumed that either there will be no missing Weeks for a Customer in the Sales table or that you are not interested in listing them if there are. Again, please advise me if this assumption is incorrect.
If you have any questions or comments, then please feel free to post a Comment accordingly.
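A quick SQLite check of how SUM treats NULLs, with hypothetical groups: 'a' has a value and a NULL, 'b' has only a NULL. COALESCE(SUM(Total), 0) is shown as a shorter equivalent of the CASE expression for this purpose.

```sql
CREATE TABLE t (grp TEXT, Total REAL);
INSERT INTO t VALUES ('a', 5), ('a', NULL), ('b', NULL);

SELECT grp,
       SUM(Total) AS plain_sum,                -- 'a' -> 5.0 (NULL ignored), 'b' -> NULL
       SUM(CASE WHEN Total IS NULL THEN 0 ELSE Total END) AS case_sum,  -- 'a' -> 5.0, 'b' -> 0
       COALESCE(SUM(Total), 0) AS coalesce_sum -- same values as case_sum
FROM t
GROUP BY grp
ORDER BY grp;
```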

Joining 3 tables on 2 columns?

I've created 3 views with identical columns: Quantity, Year, and Variety. I want to join all three on year and variety in order to do some calculations with the quantities.
The problem is that a particular year/variety combo does not occur on every view.
I've tried queries like:
SELECT
*
FROM
a
left outer join
b
on a.variety = b.variety
left outer join
c
on a.variety = c.variety or b.variety = c.variety
WHERE
a.year = '2015'
and b.year = '2015'
and a.year= '2015'
Obviously this isn't the right solution. Ideally I'd like to join on both year and variety and not use a WHERE clause at all.
The desired output would put all quantities for a matching year and variety on the same line, even when one of the views has no row for that combination.
I really appreciate the help, thanks.
You want a full outer join, not a left join, like so:
Select coalesce(a.year, b.year, c.year) as Year
, coalesce(a.variety, b.variety, c.variety) as Variety
, a.Quantity, b.Quantity, c.Quantity
from tableA a
full outer join tableB b
on a.variety = b.variety
and a.year = b.year
full outer join tableC c
on isnull(a.variety, b.variety) = c.variety
and isnull(a.year, b.year) = c.year
where coalesce(a.year, b.year, c.year) = 2015
The left join you are using won't pick up values from b or c that aren't in a. Additionally, your where clause is dropping rows that don't have values in all three tables (because the year in those rows is null, which is not equal to 2015). The full outer join will grab rows from either table in the join, regardless of whether the other table contains a match.
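If your engine lacks FULL OUTER JOIN (SQLite only added it in version 3.39.0), an equivalent approach is to build the set of (Year, Variety) keys from all three views first and then LEFT JOIN each view to it. This sketch uses hypothetical tables a, b and c in place of the views and assumes each has at most one row per (Year, Variety); Combined and all_keys are invented names.

```sql
CREATE TABLE a (Year INTEGER, Variety TEXT, Quantity INTEGER);
CREATE TABLE b (Year INTEGER, Variety TEXT, Quantity INTEGER);
CREATE TABLE c (Year INTEGER, Variety TEXT, Quantity INTEGER);
INSERT INTO a VALUES (2015, 'Fuji', 10), (2015, 'Gala', 4);
INSERT INTO b VALUES (2015, 'Fuji', 7);
INSERT INTO c VALUES (2015, 'Pink', 2), (2014, 'Fuji', 9);

CREATE VIEW Combined AS
WITH all_keys AS (  -- every (Year, Variety) that occurs in any of the three
  SELECT Year, Variety FROM a
  UNION SELECT Year, Variety FROM b
  UNION SELECT Year, Variety FROM c
)
SELECT k.Year, k.Variety,
       a.Quantity AS QtyA, b.Quantity AS QtyB, c.Quantity AS QtyC
FROM all_keys k
LEFT JOIN a ON a.Year = k.Year AND a.Variety = k.Variety
LEFT JOIN b ON b.Year = k.Year AND b.Variety = k.Variety
LEFT JOIN c ON c.Year = k.Year AND c.Variety = k.Variety
WHERE k.Year = 2015;  -- filtering the key set cannot drop partially matched rows

SELECT * FROM Combined ORDER BY Variety;
-- -> (2015, 'Fuji', 10, 7, NULL), (2015, 'Gala', 4, NULL, NULL), (2015, 'Pink', NULL, NULL, 2)
```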

SQL server SELECT with join performance issue

Sorry about the saga here, but I am trying to explain everything.
We have 2 databases that I would like to join some tables in.
One database holds sales data from various stores/sites. It is quite large (currently over 3 million rows). The table is ItemSales.
The other holds application data from an in-house web app. These tables are Departments and GroupItems.
I would like to create a query that joins 2 tables from the app database with the sales database table. This is so we can group some items together for a date range and see the amount sold for example.
My first attempt was (DealId being the variable that it is grouped on in the App):
SELECT d.Id, d.ItemNo, d.UnitValue, d.NoGST, d.ItemStartDate, d.ItemEndDate,
(SELECT SUM(ItemQty) AS Expr1
FROM Sales.dbo.ItemSales AS s
WHERE (Store = d.SiteId) AND (ItemNo = d.ItemNo) AND (ItemSaleDate >= d.ItemStartDate) AND (ItemSaleDate <= d.ItemEndDate)) AS ItemsSold, Sales.dbo.ItemSales.ItemDesc, Departments.Description
FROM Departments INNER JOIN
Sales.dbo.ItemSales ON Departments.Id = Sales.dbo.ItemSales.ItemDept RIGHT OUTER JOIN
GroupItems AS d ON Sales.dbo.ItemSales.ItemNo = d.ItemNo
WHERE (d.DealId = 11)
GROUP BY d.Id, d.ItemNo, d.UnitValue, d.NoGST, d.ItemStartDate, d.ItemEndDate, ItemDesc, Departments.Description, d.SiteId
ORDER BY d.Id
This does exactly what I want which is:
-Give me all the details from the GroupItems table (UnitValue, ItemStartDate, ItemEndDate etc)
-Gives me the SUM() on the ItemQty column for the amount sold (plus the description etc)
-Returns NULL for something with no sales for the period
It is VERY slow though. To the point that if the GroupItems table has more than about 7 items in it, it times out.
Second attempt has been:
SELECT d.Id, d.ItemNo, d.UnitValue, d.NoGST, d.ItemStartDate, d.ItemEndDate, SUM(ItemQty) AS ItemsSold, Sales.dbo.ItemSales.ItemDesc, Departments.Description
FROM Departments INNER JOIN
Sales.dbo.ItemSales ON Departments.Id = Sales.dbo.ItemSales.ItemDept RIGHT OUTER JOIN
GroupItems AS d ON Sales.dbo.ItemSales.ItemNo = d.ItemNo
WHERE (Store = d.SiteId) AND (d.DealId = 11) AND (Sales.dbo.ItemSales.ItemSaleDate >= d.ItemStartDate) AND (Sales.dbo.ItemSales.ItemSaleDate <= d.ItemEndDate)
GROUP BY d.Id, d.ItemNo, d.UnitValue, d.NoGST, d.ItemStartDate, d.ItemEndDate, ItemDesc, Departments.Description
ORDER BY d.Id
This is very quick and does not time out but does not return the NULLs for no sales items in the ItemSales table. This is a problem as we need to see nothing or 0 for a no sales item otherwise people will think we forgot to check that item.
Can someone help me come up with a query please that returns everything from the GroupItems table, shows the SUM() of items sold and doesn't time out? I have also tried a SELECT x WHERE EXISTS (Subquery) but this also didn't return the NULLs for me but I may have had that one wrong.
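If you want everything from GroupItems regardless of the sales, use it as the base of the query and then use left outer joins from there. Keep the conditions on the sales table in the ON clause: putting them in the WHERE clause, as in your second attempt, discards the rows that have no matching sale. Something along these lines: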
SELECT GroupItems.Id, GroupItems.ItemNo, GroupItems.UnitValue, GroupItems.NoGST,
GroupItems.ItemStartDate, GroupItems.ItemEndDate,
Sales.ItemDesc,
SUM(ItemQty) AS SumOfSales,
Departments.Description
FROM GroupItems
LEFT OUTER JOIN Sales.dbo.ItemSales AS Sales ON
Sales.ItemNo = GroupItems.ItemNo
AND Sales.Store = GroupItems.SiteId
AND Sales.ItemSaleDate >= GroupItems.ItemStartDate
AND Sales.ItemSaleDate <= GroupItems.ItemEndDate
LEFT OUTER JOIN Departments ON Departments.Id = Sales.ItemDept
WHERE GroupItems.DealId = 11
GROUP BY GroupItems.Id, GroupItems.ItemNo, GroupItems.UnitValue, GroupItems.NoGST,
GroupItems.ItemStartDate, GroupItems.ItemEndDate,
Sales.ItemDesc,
Departments.Description
ORDER BY GroupItems.Id
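A small SQLite sketch of why the filters belong in the ON clause, with hypothetical rows in which item 2 has a sale only outside its date range (tables cut down to the columns used):

```sql
CREATE TABLE GroupItems (Id INTEGER, ItemNo INTEGER, SiteId INTEGER,
                         ItemStartDate TEXT, ItemEndDate TEXT, DealId INTEGER);
CREATE TABLE ItemSales (ItemNo INTEGER, Store INTEGER, ItemSaleDate TEXT, ItemQty INTEGER);
INSERT INTO GroupItems VALUES
  (1, 100, 5, '2024-01-01', '2024-01-31', 11),
  (2, 200, 5, '2024-01-01', '2024-01-31', 11);
INSERT INTO ItemSales VALUES
  (100, 5, '2024-01-10', 3),
  (100, 5, '2024-01-12', 2),
  (200, 5, '2024-02-15', 4);  -- outside item 2's date range

-- Filters in ON: item 2 keeps a NULL-extended row -> (1, 5), (2, NULL)
SELECT g.Id, SUM(s.ItemQty) AS ItemsSold
FROM GroupItems g
LEFT JOIN ItemSales s
  ON s.ItemNo = g.ItemNo AND s.Store = g.SiteId
 AND s.ItemSaleDate BETWEEN g.ItemStartDate AND g.ItemEndDate
WHERE g.DealId = 11
GROUP BY g.Id
ORDER BY g.Id;

-- Same filters in WHERE: item 2's only joined row fails the date test -> (1, 5)
SELECT g.Id, SUM(s.ItemQty) AS ItemsSold
FROM GroupItems g
LEFT JOIN ItemSales s ON s.ItemNo = g.ItemNo
WHERE g.DealId = 11 AND s.Store = g.SiteId
  AND s.ItemSaleDate BETWEEN g.ItemStartDate AND g.ItemEndDate
GROUP BY g.Id;
```

The dates are stored as ISO-8601 text here so that BETWEEN compares them correctly as strings.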
Does changing the INNER JOIN to Sales.dbo.ItemSales into a LEFT OUTER JOIN to Sales.dbo.ItemSales and changing the RIGHT OUTER JOIN to GroupItems into an INNER JOIN to GroupItems fix your issue?