SQL on joining tables (months and subtextures that have no sales)

DBMS: Derby Embedded
Hello, I wonder how I can produce a result like this:
SubTextureID Year Month NetSales
1 2013 10 1000
2 2013 10 2000
3 2013 10 0
The third row never appears if that product has no sales (no records) in the order detail table.
Any help would be greatly appreciated!
Thanks,
Jack
select s.TextureName, s.SubTextureId, sum(COALESCE(d.NetSales, 0)) NetSales
from (select SubTextureId, TextureName from subtexture) as s
join
(select SubTextureId, ProductCode from products) as p
on (p.SubTextureId = s.SubTextureId)
left outer join
(select ProductCode, OrderCode, NetSales from order_details) as d
on (d.ProductCode = p.ProductCode)
left outer join
( select YEAR(o.PurchaseDateTime) y,
MONTH(o.PurchaseDateTime) m,
OrderCode
from orders o
where o.PurchaseDateTime between '2013-11-01 00:00:00' and '2013-11-30 23:59:59' -- make use of an index if one exists
) as o
on (o.orderCode = d.orderCode)
group by s.TextureName, s.SubTextureId, o.y, o.m

Because you used LEFT OUTER JOIN, try RIGHT OUTER JOIN instead. Once you understand the difference between LEFT and RIGHT OUTER JOIN, you will be able to solve your problem.
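For what it's worth, here is a minimal sketch of one way to keep the zero-sales rows. It uses Python's sqlite3 module as a stand-in for Derby, and the sample rows are invented for illustration. The idea is to start from subtexture, filter the sales to the wanted month inside a derived table, and LEFT JOIN that derived table, so a subtexture with no matching sales still produces a row with 0.

```python
import sqlite3

# In-memory SQLite database standing in for Derby (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subtexture (SubTextureId INTEGER, TextureName TEXT);
CREATE TABLE products (ProductCode TEXT, SubTextureId INTEGER);
CREATE TABLE orders (OrderCode INTEGER, PurchaseDateTime TEXT);
CREATE TABLE order_details (OrderCode INTEGER, ProductCode TEXT, NetSales REAL);
-- Invented sample data: subtexture 3 only has a sale in September.
INSERT INTO subtexture VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO products VALUES ('P1', 1), ('P2', 2), ('P3', 3);
INSERT INTO orders VALUES (10, '2013-10-05 09:00:00'),
                          (11, '2013-10-20 14:30:00'),
                          (12, '2013-09-15 10:00:00');
INSERT INTO order_details VALUES (10, 'P1', 1000), (11, 'P2', 2000), (12, 'P3', 500);
""")

query = """
SELECT s.SubTextureId, 2013 AS y, 10 AS m,
       COALESCE(SUM(x.NetSales), 0) AS NetSales
FROM subtexture s
JOIN products p ON p.SubTextureId = s.SubTextureId
LEFT JOIN (
    -- Only October 2013 sales survive here; the date filter sits inside
    -- the derived table so it cannot remove rows of subtexture.
    SELECT d.ProductCode, d.NetSales
    FROM order_details d
    JOIN orders o ON o.OrderCode = d.OrderCode
    WHERE o.PurchaseDateTime >= '2013-10-01 00:00:00'
      AND o.PurchaseDateTime <  '2013-11-01 00:00:00'
) x ON x.ProductCode = p.ProductCode
GROUP BY s.SubTextureId
ORDER BY s.SubTextureId
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that the inner JOIN to products would still drop a subtexture that has no products at all; that join would need to be a LEFT JOIN too if such rows should appear.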

Related

How to join the results of two queries in just one table grouped by YEAR and MONTH?

I have two tables, materials_students and components_students. Both of them have a finished_at column. materials_students has a component_student_id column.
I need to count the number of components_students and materials_students (where finished_at is not NULL), extract the month and year from finished_at, group the result by month and year, and show it in a single table, like this:
| Materials | Components | Month | Year
---------------------------------------------
| 45 3 1 2019
| 37 6 2 2019
| 63 8 3 2019
I know how to do this for one table only, but I don't know how to combine the results into a single table.
Find below how I did for one table:
FROM materials_students
LEFT JOIN students ON materials_students.student_id = students.id
LEFT JOIN company_profiles ON students.company_profile_id = company_profiles.id
LEFT JOIN companies ON company_profiles.company_id = companies.id
WHERE materials_students.finished_at IS NOT NULL
GROUP BY YEAR, MONTH
ORDER BY YEAR, MONTH
Thanks!
The best is to assemble a subquery for each case, then join them.
select
ISNULL(M.yy, C.yy) [yy],
ISNULL(M.mm, C.mm) [mm],
ISNULL(number_material_students, 0) [number_material_students],
ISNULL(number_components_students, 0) [number_component_students]
from (
SELECT
year(materials_students.finished_at) yy,
month(materials_students.finished_at) mm,
count(*) number_material_students
FROM materials_students
LEFT JOIN students ON materials_students.student_id = students.id
LEFT JOIN company_profiles ON students.company_profile_id = company_profiles.id
LEFT JOIN companies ON company_profiles.company_id = companies.id
WHERE materials_students.finished_at IS NOT NULL
GROUP BY year(materials_students.finished_at), month(materials_students.finished_at)
) M
full outer join (
SELECT
year(components_students.finished_at) yy,
month(components_students.finished_at) mm,
count(*) number_components_students
FROM components_students
LEFT JOIN students ON components_students.student_id = students.id
LEFT JOIN company_profiles ON students.company_profile_id = company_profiles.id
LEFT JOIN companies ON company_profiles.company_id = companies.id
WHERE components_students.finished_at IS NOT NULL
GROUP BY year(components_students.finished_at), month(components_students.finished_at)
) C
ON C.yy = M.yy AND C.mm = M.mm
ORDER BY 1, 2
I had to make a FULL OUTER JOIN between the subqueries, because there may have been year/months that appear only on materials, but not on components, and vice-versa.
To retrieve the year I use the ISNULL() function, so in case year is not filled from the materials subquery, I use it from the components subquery. Similar reasoning applies to all other resulting columns.
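On an engine without FULL OUTER JOIN, a similar table can be sketched with UNION ALL instead. This is a different technique from the query above, not a translation of it. The example below uses Python's sqlite3 module with invented rows, and strftime() takes the place of YEAR() and MONTH(), which SQLite does not have. Each source row contributes a 1 to its own counter and a 0 to the other, so months that appear in only one table still get a row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE materials_students (id INTEGER, finished_at TEXT);
CREATE TABLE components_students (id INTEGER, finished_at TEXT);
-- Invented sample data; NULL finished_at rows must not be counted.
INSERT INTO materials_students VALUES
    (1, '2019-01-10'), (2, '2019-01-20'), (3, '2019-02-05'), (4, NULL);
INSERT INTO components_students VALUES
    (1, '2019-01-15'), (2, '2019-03-01'), (3, NULL);
""")

query = """
SELECT SUM(materials) AS materials, SUM(components) AS components, mm, yy
FROM (
    -- One row per finished material, flagged in the first counter.
    SELECT 1 AS materials, 0 AS components,
           CAST(strftime('%m', finished_at) AS INTEGER) AS mm,
           CAST(strftime('%Y', finished_at) AS INTEGER) AS yy
    FROM materials_students
    WHERE finished_at IS NOT NULL
    UNION ALL
    -- One row per finished component, flagged in the second counter.
    SELECT 0, 1,
           CAST(strftime('%m', finished_at) AS INTEGER),
           CAST(strftime('%Y', finished_at) AS INTEGER)
    FROM components_students
    WHERE finished_at IS NOT NULL
) t
GROUP BY yy, mm
ORDER BY yy, mm
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

February has materials only and March has components only, yet both months appear, which is the behaviour the FULL OUTER JOIN was chosen for.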

SQL server SELECT with join performance issue

Sorry about the saga here but am trying to explain everything.
We have 2 databases that I would like to join some tables in.
One database holds sales data from various stores/sites. It is quite large (over 3 million rows currently). The table is ItemSales.
The other holds application data from an in house web app. These tables are Departments and GroupItems
I would like to create a query that joins 2 tables from the app database with the sales database table. This is so we can group some items together for a date range and see the amount sold for example.
My first attempt was (DealId being the variable that it is grouped on in the App):
SELECT d.Id, d.ItemNo, d.UnitValue, d.NoGST, d.ItemStartDate, d.ItemEndDate,
(SELECT SUM(ItemQty) AS Expr1
FROM Sales.dbo.ItemSales AS s
WHERE (Store = d.SiteId) AND (ItemNo = d.ItemNo) AND (ItemSaleDate >= d.ItemStartDate) AND (ItemSaleDate <= d.ItemEndDate)) AS ItemsSold, Sales.dbo.ItemSales.ItemDesc, Departments.Description
FROM Departments INNER JOIN
Sales.dbo.ItemSales ON Departments.Id = Sales.dbo.ItemSales.ItemDept RIGHT OUTER JOIN
GroupItems AS d ON Sales.dbo.ItemSales.ItemNo = d.ItemNo
WHERE (d.DealId = 11)
GROUP BY d.Id, d.ItemNo, d.UnitValue, d.NoGST, d.ItemStartDate, d.ItemEndDate, ItemDesc, Departments.Description, d.SiteId
ORDER BY d.Id
This does exactly what I want which is:
-Give me all the details from the GroupItems table (UnitValue, ItemStartDate, ItemEndDate etc)
-Gives me the SUM() on the ItemQty column for the amount sold (plus the description etc)
-Returns NULL for something with no sales for the period
It is VERY slow though. To the point that if the GroupItems table has more than about 7 items in it, it times out.
Second attempt has been:
SELECT d.Id, d.ItemNo, d.UnitValue, d.NoGST, d.ItemStartDate, d.ItemEndDate, SUM(ItemQty) AS ItemsSold, Sales.dbo.ItemSales.ItemDesc, Departments.Description
FROM Departments INNER JOIN
Sales.dbo.ItemSales ON Departments.Id = Sales.dbo.ItemSales.ItemDept RIGHT OUTER JOIN
GroupItems AS d ON Sales.dbo.ItemSales.ItemNo = d.ItemNo
WHERE (Store = d.SiteId) AND (d.DealId = 11) AND (Sales.dbo.ItemSales.ItemSaleDate >= d.ItemStartDate) AND (Sales.dbo.ItemSales.ItemSaleDate <= d.ItemEndDate)
GROUP BY d.Id, d.ItemNo, d.UnitValue, d.NoGST, d.ItemStartDate, d.ItemEndDate, ItemDesc, Departments.Description
ORDER BY d.Id
This is very quick and does not time out but does not return the NULLs for no sales items in the ItemSales table. This is a problem as we need to see nothing or 0 for a no sales item otherwise people will think we forgot to check that item.
Can someone help me come up with a query please that returns everything from the GroupItems table, shows the SUM() of items sold and doesn't time out? I have also tried a SELECT x WHERE EXISTS (Subquery) but this also didn't return the NULLs for me but I may have had that one wrong.
If you want everything from GroupItems regardless of the sales, use it as the base of the query and then use left outer joins from there. Something along these lines:
SELECT GroupItems.Id, GroupItems.ItemNo, GroupItems.UnitValue, GroupItems.NoGST,
GroupItems.ItemStartDate, GroupItems.ItemEndDate,
Sales.ItemDesc,
SUM(ItemQty) AS SumOfSales,
Departments.Description
FROM GroupItems
LEFT OUTER JOIN #tempSales AS Sales ON
Sales.ItemNo = GroupItems.ItemNo
AND Sales.Store = GroupItems.SiteId
AND Sales.ItemSaleDate >= GroupItems.ItemStartDate
AND Sales.ItemSaleDate <= GroupItems.ItemEndDate
LEFT OUTER JOIN Departments ON Departments.Id = Sales.ItemDept
WHERE GroupItems.DealId = 11
GROUP BY GroupItems.Id, GroupItems.ItemNo, GroupItems.UnitValue, GroupItems.NoGST,
GroupItems.ItemStartDate, GroupItems.ItemEndDate,
Sales.ItemDesc,
Departments.Description
ORDER BY GroupItems.Id
Does changing the INNER JOIN to Sales.dbo.ItemSales into a LEFT OUTER JOIN to Sales.dbo.ItemSales and changing the RIGHT OUTER JOIN to GroupItems into an INNER JOIN to GroupItems fix your issue?
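To illustrate why the second attempt loses the no-sales items, here is a small sketch using Python's sqlite3 module in place of SQL Server, with invented rows and a cut-down version of the tables. A condition on the outer-joined table placed in WHERE discards the NULL-extended rows, while the same condition placed in ON keeps them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GroupItems (Id INTEGER, ItemNo INTEGER, SiteId INTEGER,
                         ItemStartDate TEXT, ItemEndDate TEXT, DealId INTEGER);
CREATE TABLE ItemSales (Store INTEGER, ItemNo INTEGER,
                        ItemSaleDate TEXT, ItemQty INTEGER);
-- Invented data: item 200 has no sales inside its date range.
INSERT INTO GroupItems VALUES
    (1, 100, 1, '2020-01-01', '2020-01-31', 11),
    (2, 200, 1, '2020-01-01', '2020-01-31', 11);
INSERT INTO ItemSales VALUES
    (1, 100, '2020-01-05', 3), (1, 100, '2020-01-09', 2),
    (1, 200, '2020-03-01', 7);
""")

# Sales conditions in ON: unmatched GroupItems rows survive with NULL.
conditions_in_on = conn.execute("""
SELECT g.Id, SUM(s.ItemQty)
FROM GroupItems g
LEFT JOIN ItemSales s
       ON s.ItemNo = g.ItemNo
      AND s.Store = g.SiteId
      AND s.ItemSaleDate >= g.ItemStartDate
      AND s.ItemSaleDate <= g.ItemEndDate
WHERE g.DealId = 11
GROUP BY g.Id
ORDER BY g.Id
""").fetchall()

# Same conditions in WHERE: they run after the join and drop the NULL rows.
conditions_in_where = conn.execute("""
SELECT g.Id, SUM(s.ItemQty)
FROM GroupItems g
LEFT JOIN ItemSales s ON s.ItemNo = g.ItemNo
WHERE g.DealId = 11
  AND s.Store = g.SiteId
  AND s.ItemSaleDate >= g.ItemStartDate
  AND s.ItemSaleDate <= g.ItemEndDate
GROUP BY g.Id
ORDER BY g.Id
""").fetchall()

print(conditions_in_on)
print(conditions_in_where)
```

Wrapping the SUM in COALESCE(..., 0) would turn the NULL into 0 if that reads better in the report.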

SQL rewrite to optimize

I'm trying to optimize or change the SQL to work with inner joins rather than independent calls
Database: one invoice can have many payment records and order (products) records
Original:
SELECT
InvoiceNum,
(SELECT SUM(Orders.Cost) FROM Orders WHERE Orders.Invoice = InvoiceNum and Orders.Returned <> 1 GROUP BY Orders.Invoice) as vat_only,
(SELECT SUM(Orders.Vat) FROM Orders WHERE Orders.Invoice = InvoiceNum and Orders.Returned <> 1 GROUP BY Orders.Invoice) as sales_prevat,
(SELECT SUM(pay.Amount) FROM Payments as pay WHERE Invoices.InvoiceNum = pay.InvoiceNum ) as income
FROM
Invoices
WHERE
InvoiceYear = currentyear
I'm sure we can do this another way by grouping and joining the tables together. When I tried the SQL statement below, I wasn't getting the same number (count) of records. I suspect the problem is the type of join or where it joins, but I still couldn't get it working after 3 hours of staring at the screen.
So far I got to...
SELECT
Invoices.InvoiceNum,
Sum(Orders.Cost) AS SumOfCost,
Sum(Orders.VAT) AS SumOfVAT,
SUM(distinct Payments.Amount) as money
FROM
Invoices
LEFT JOIN
Orders ON Orders.Invoice = Invoices.InvoiceNum
LEFT JOIN
Payments ON Invoices.InvoiceNum = Payments.InvoiceNum
WHERE
Invoices.InvoiceYear = 11
AND Orders.Returned <> 1
GROUP BY
Invoices.InvoiceNum
Sorry for the bad english and I'm not sure what to search for to find if it's already been answered here :D
Thanks in advance for all the help
Your problem is that an order has multiple lines for an invoice and it has multiple payments on an invoice (sometimes). This causes a cross product effect for a given order. You fix this by pre-summarizing the tables.
A related problem is that the join will fail if there are no payments, so you need left outer join.
select i.InvoiceNum, osum.cost, osum.vat, psum.income
from Invoices i left outer join
(select o.Invoice, sum(o.Cost) as cost, sum(o.vat) as vat
from orders o
where o.Returned <> 1
group by o.Invoice
) osum
on osum.Invoice = i.InvoiceNum left outer join
(select p.InvoiceNum, sum(p.Amount) as income
from Payments p
group by p.InvoiceNum
) psum
on psum.InvoiceNum = i.InvoiceNum
where i.InvoiceYear = year(getdate())
Two comments: Is the key field for orders really Invoice or is it also InvoiceNum? Also, do you have a field Invoice.InvoiceYear? Or do you want year(i.InvoiceDate) in the where clause?
Assuming that both payments and orders can contain more than one record per invoice you will need to do your aggregates in a subquery to avoid cross joining:
SELECT Invoices.InvoiceNum, o.Cost, o.VAT, p.Amount
FROM Invoices
LEFT JOIN
( SELECT Invoice, Cost = SUM(Cost), VAT = SUM(VAT)
FROM Orders
WHERE Orders.Returned <> 1
GROUP BY Invoice
) o
ON o.Invoice = Invoices.InvoiceNum
LEFT JOIN
( SELECT InvoiceNum, Amount = SUM(Amount)
FROM Payments
GROUP BY InvoiceNum
) P
ON P.InvoiceNum = Invoices.InvoiceNum
WHERE Invoices.InvoiceYear = 11;
ADDENDUM
To expand on the CROSS JOIN comment, imagine this data for an Invoice (1)
Orders
Invoice Cost VAT
1 15.00 3.00
1 10.00 2.00
Payments
InvoiceNum Amount
1 15.00
1 10.00
When you join these tables as you did:
SELECT Orders.*, Payments.Amount
FROM Invoices
LEFT JOIN Orders
ON Orders.Invoice = Invoices.InvoiceNum
LEFT JOIN Payments
ON Invoices.InvoiceNum = Payments.InvoiceNum;
You end up with:
Orders.Invoice Orders.Cost Orders.Vat Payments.Amount
1 15.00 3.00 15.00
1 10.00 2.00 15.00
1 15.00 3.00 10.00
1 10.00 2.00 10.00
That is, you get every combination of payments and orders, so each invoice produces many more rows than required, which distorts your totals. Even though the original data had £25 of payments, this doubles to £50 because of the two records in the Orders table. This is why each table needs to be aggregated individually. Using DISTINCT would not work when there is more than one payment or order for the same amount on a single invoice.
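The doubling described above can be reproduced with a short sketch. It uses Python's sqlite3 module rather than SQL Server, the Orders and Payments rows are the ones from the addendum, and the single Invoices row is assumed. The naive join sums over the 2 x 2 combinations, while the pre-aggregated version sums each table on its own first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Invoices (InvoiceNum INTEGER);
CREATE TABLE Orders (Invoice INTEGER, Cost REAL, VAT REAL);
CREATE TABLE Payments (InvoiceNum INTEGER, Amount REAL);
INSERT INTO Invoices VALUES (1);
INSERT INTO Orders VALUES (1, 15.00, 3.00), (1, 10.00, 2.00);
INSERT INTO Payments VALUES (1, 15.00), (1, 10.00);
""")

# Joining both child tables directly yields 4 rows for invoice 1,
# so every order and every payment is counted twice.
naive = conn.execute("""
SELECT i.InvoiceNum, SUM(o.Cost), SUM(p.Amount)
FROM Invoices i
LEFT JOIN Orders o ON o.Invoice = i.InvoiceNum
LEFT JOIN Payments p ON p.InvoiceNum = i.InvoiceNum
GROUP BY i.InvoiceNum
""").fetchall()

# Aggregating each child table to one row per invoice before joining
# avoids the row multiplication.
pre_aggregated = conn.execute("""
SELECT i.InvoiceNum, o.Cost, p.Amount
FROM Invoices i
LEFT JOIN (SELECT Invoice, SUM(Cost) AS Cost
           FROM Orders GROUP BY Invoice) o
       ON o.Invoice = i.InvoiceNum
LEFT JOIN (SELECT InvoiceNum, SUM(Amount) AS Amount
           FROM Payments GROUP BY InvoiceNum) p
       ON p.InvoiceNum = i.InvoiceNum
""").fetchall()

print(naive)
print(pre_aggregated)
```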
One final point with regard to optimisation: you should probably index your tables. If you run the query and display the actual execution plan, SSMS will suggest indexes for you, but at a guess the following should improve performance:
CREATE NONCLUSTERED INDEX IX_Orders_InvoiceNum ON Orders (Invoice) INCLUDE(Cost, VAT, Returned);
CREATE NONCLUSTERED INDEX IX_Payments_InvoiceNum ON Payments (InvoiceNum) INCLUDE(Amount);
This should allow both subqueries to use only the index on each table, with no bookmark lookup or clustered index scan required.
Try this. Note that I haven't tested it; I just whipped it up in Notepad. If any of your invoices may be missing from either of the subtables, use LEFT JOIN instead.
SELECT i.InvoiceNum, vat_only, sales_prevat, income
FROM Invoices i
INNER JOIN (SELECT Invoice, SUM(Cost) [vat_only], SUM(Vat) [sales_prevat]
FROM Orders
WHERE Returned <> 1
GROUP BY Invoice) o
ON i.InvoiceNum = o.Invoice
INNER JOIN (SELECT InvoiceNum, SUM(Amount) [income]
FROM Payments
GROUP BY InvoiceNum) p
ON i.InvoiceNum = p.InvoiceNum
WHERE i.InvoiceYear = currentyear
select
PreQuery.InvoiceNum,
PreQuery.VAT_Only,
PreQuery.Sales_Prevat,
SUM( Pay.Amount ) as Income
from
( select
I.InvoiceNum,
SUM( O.Cost ) as VAT_Only,
SUM( O.Vat ) as sales_prevat
from
Invoices I
Join Orders O
on I.InvoiceNum = O.Invoice
AND O.Returned <> 1
where
I.InvoiceYear = currentYear
group by
I.InvoiceNum ) PreQuery
JOIN Payments Pay
on PreQuery.InvoiceNum = Pay.InvoiceNum
group by
PreQuery.InvoiceNum,
PreQuery.VAT_Only,
PreQuery.Sales_Prevat
Your "currentYear" reference could be parameterized, or you can get the current year from a SQL function such as
Year( GetDate() )

Two Left Outer Join

I have two tables in this form:
Inventory:
Units InDate OutDate
1000 11/4 12/4
2000 13/4 14/4
Prices:
Date Price
11/4 5
12/4 4
13/4 6
14/4 7
I want to build the following table:
Units InDate OutDate InPrice OutPrice
1000 11/4 12/4 5 4
2000 13/4 14/4 6 7
I thought I should use something like:
Select *
FROM Inventory
LEFT OUTER JOIN Prices ON Inventory.InDate = Prices.Date
LEFT OUTER JOIN Prices ON Inventory.OutDate = Prices.Date
But the second OUTER JOIN seems to mess things up.
How can I reach this result?
Select
Units,
InDate,
OutDate,
P1.Price as InPrice,
P2.Price as OutPrice
FROM Inventory
LEFT OUTER JOIN Prices as P1 ON Inventory.InDate = P1.Date
LEFT OUTER JOIN Prices as P2 ON Inventory.OutDate = P2.Date
Try this.
SELECT Inventory.Units, Inventory.InDate, Inventory.OutDate, InPrices.Price AS InPrice, OutPrices.Price AS OutPrice
FROM Inventory
LEFT OUTER JOIN Prices AS InPrices ON Inventory.InDate = InPrices.Date
LEFT OUTER JOIN Prices AS OutPrices ON Inventory.OutDate = OutPrices.Date
Your current query was very close to being correct. If you placed different aliases on the prices table then it would have worked. Since you are joining on the same table prices twice, you need to use a different alias to distinguish between them:
select i.units,
i.indate,
i.outdate,
inPrice.price,
outPrice.price
from inventory i
left join prices inPrice -- first join with alias
on i.indate = inPrice.date
left join prices outPrice -- second join with alias
on i.outdate = outPrice.date
See SQL Fiddle with Demo
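For anyone who wants to try the aliased double join locally, here is a small sketch using Python's sqlite3 module. The choice of engine is an assumption, since the question does not name one. The rows are the ones from the question, with the dates kept as plain text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Inventory (Units INTEGER, InDate TEXT, OutDate TEXT);
CREATE TABLE Prices (Date TEXT, Price INTEGER);
INSERT INTO Inventory VALUES (1000, '11/4', '12/4'), (2000, '13/4', '14/4');
INSERT INTO Prices VALUES ('11/4', 5), ('12/4', 4), ('13/4', 6), ('14/4', 7);
""")

# Prices is joined twice; each copy gets its own alias so the two
# lookups (by InDate and by OutDate) do not collide.
query = """
SELECT i.Units, i.InDate, i.OutDate,
       inPrice.Price  AS InPrice,
       outPrice.Price AS OutPrice
FROM Inventory i
LEFT JOIN Prices inPrice  ON i.InDate  = inPrice.Date
LEFT JOIN Prices outPrice ON i.OutDate = outPrice.Date
ORDER BY i.Units
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```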

3 joins and where clause together

I have 3 tables
bl_main (bl_id UNIQUE, bl_area)
bl_details (bl_id UNIQUE, name)
bl_data(bl_id, month, paper_tons, bottles_tons)
bl_id is not unique in the last table. There will be multiple rows of same bl_id.
I am trying to retrieve data in the following way
bl_id | name | bl_area | sum(paper_tons) | sum (bottles_tons) | paper_tons | bottles_tons
sum(paper_tons) should return the sum of all the paper tons for the same bl_id like Jan to December.
Using the query below, I am able to retrieve all the data correctly, except that the result contains multiple occurrences of each bl_id (from the bl_data table).
SELECT bl_main.bl_id,name,bl_area,sums.SummedPaper, sums.SummedBottles,paper_tons,bottles_tons
FROM bl_main
JOIN bl_details ON
bl_main.bl_id= bl_details.bl_id
left outer JOIN bl_data ON
bl_data.bl_id= bl_main.bl_id
left outer JOIN (
SELECT bl_id, SUM(Paper_tons) As SummedPaper, SUM(bottle_tons) As SummedBottles
FROM bl_data
GROUP by bl_id) sums ON sums.bl_id = bl_main.bl_id
I want to retrieve each bl_id only once, without repetition, and that row should be the one with the max month for the bl_id, not all the months for the same bl_id.
For ex:
INCORRECT
**0601** University Hall 75.76 17051 1356 4040 1154 **11**
**0601** University Hall 75.76 17051 1356 9190 101 **12**
**0605** UIC Student 22.86 3331 14799 0 356 **8**
CORRECT
**0601** University Hall 75.76 17051 1356 9190 101 **12**
**0605** UIC Student 22.86 3331 14799 0 356 **8**
I know I can get the max value using
WHERE Month = (SELECT MAX(Month)
but where exactly should I add this in the query, and should I change the join definition?
Any help is highly appreciated, as I am new to SQL. Thanks in advance.
You have two tables that probably should be combined into one (bl_main and bl_details). But putting that aside, what you need is a self-join subquery to select the row with the max month. Something like the following (untested):
SELECT bl_main.bl_id, bl_details.name, bl_main.bl_area, sums.sum_paper_tons,
sums.sum_bottles_tons, maxmonth.paper_tons, maxmonth.bottles_tons
FROM bl_main
INNER JOIN bl_details ON bl_main.bl_id = bl_details.bl_id
LEFT OUTER JOIN (SELECT bl_id, SUM(paper_tons) AS sum_paper_tons,
SUM(bottles_tons) AS sum_bottles_tons
FROM bl_data
GROUP BY bl_id) sums ON bl_main.bl_id = sums.bl_id
LEFT OUTER JOIN (SELECT data2.bl_id, data2.paper_tons, data2.bottles_tons
FROM bl_data data2
INNER JOIN (SELECT bl_id, MAX(month) AS max_month
FROM bl_data
GROUP BY bl_id) m
ON m.bl_id = data2.bl_id
AND m.max_month = data2.month) maxmonth
ON bl_main.bl_id = maxmonth.bl_id
You can join the table containing the month against itself, using a subquery of the form:
Select *
From mytable m
Inner Join (Select max(Month) as Month, myId
From mytable
Group By myId) mnth
On mnth.myId = m.myId and mnth.Month = m.Month
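The self-join pattern above can be tried out with a quick sketch using Python's sqlite3 module. The engine is an assumption, and the three bl_data rows are taken from the question's example output, so the totals differ from the full-year sums shown there. One derived table computes both the max month and the sums per bl_id, and joining back on (bl_id, month) keeps only the latest row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bl_data (bl_id TEXT, month INTEGER,
                      paper_tons INTEGER, bottles_tons INTEGER);
-- Only the months visible in the question's sample output.
INSERT INTO bl_data VALUES
    ('0601', 11, 4040, 1154),
    ('0601', 12, 9190, 101),
    ('0605', 8, 0, 356);
""")

query = """
SELECT d.bl_id, d.month, d.paper_tons, d.bottles_tons,
       s.sum_paper, s.sum_bottles
FROM bl_data d
JOIN (
    -- One row per bl_id: its latest month plus totals over all months.
    SELECT bl_id,
           MAX(month)        AS max_month,
           SUM(paper_tons)   AS sum_paper,
           SUM(bottles_tons) AS sum_bottles
    FROM bl_data
    GROUP BY bl_id
) s ON s.bl_id = d.bl_id AND s.max_month = d.month
ORDER BY d.bl_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If two rows share the same bl_id and max month, both come back, so this relies on (bl_id, month) being unique in bl_data.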
Your JOIN clause
left outer JOIN bl_data ON
bl_data.bl_id= bl_main.bl_id
does not specify which month to select for the data you are displaying with paper_tons and bottles_tons.
You could update that JOIN to only contain the max month, and this should limit the entries, like so:
left outer JOIN (SELECT bl_id, MAX(Month) as Month from bl_data GROUP BY bl_id) as Month
ON Month.bl_id = bl_main.bl_id
left outer JOIN bl_data ON
bl_data.bl_id = bl_main.bl_id AND bl_data.Month = Month.Month
I think this query is what you are looking for
SELECT bl_main.bl_id,name, bl_area, sums.SummedPaper, sums.SummedBottles, paper_tons, bottles_tons
FROM bl_main
JOIN bl_details ON bl_main.bl_id= bl_details.bl_id
left outer JOIN bl_data ON bl_data.bl_id= bl_main.bl_id
left outer JOIN
(
SELECT bl_id, month, SUM(Paper_tons) As SummedPaper, SUM(bottle_tons) As SummedBottles
FROM bl_data
WHERE month in
(SELECT MAX(month) FROM bl_data GROUP BY bl_id)
GROUP BY bl_id, month
) sums ON sums.bl_id = bl_main.bl_id
I wanted to just add a comment to the answer lc gave, but I don't have 50 reputation points yet. This is a link to an article that I believe explains this question and why the solution that lc gave is correct.
http://www.sqlteam.com/article/how-to-use-group-by-with-distinct-aggregates-and-derived-tables