How to join table in many-to-many relationship? - sql

Here is a simplified version of my problem. I have two tables. Each table has a unique ID field, but it's irrelevant in this case.
shipments has 3 fields: shipment_id, receive_by_datetime, and qty.
deliveries has 4 fields: delivery_id, shipment_id, delivered_on_datetime, and qty.
In shipments, the shipment_id and receive_by_datetime fields always match up. There are many rows in the table that would appear to be duplicates based off of those two columns (but they aren't... other fields are different).
In deliveries, the shipment_id matches up to the shipments table. There are also many rows that would appear to be duplicates based off of the delivery_id and delivered_on_datetime fields (but they aren't again... other fields exist that I didn't list).
I am trying to pull one row per aggregate delivered_on_datetime and receive_by_datetime, but because of the many-to-many relationships, it's difficult. Is a query somewhere along these lines correct?
SELECT d.delivered_on_datetime, s.receive_by_datetime, SUM(d.qty)
FROM deliveries d
LEFT JOIN (
SELECT DISTINCT s1.shipment_id, s1.receive_by_datetime
FROM shipments s1
) s ON (s.shipment_id = d.shipment_id)
GROUP BY d.delivered_on_datetime, s.receive_by_datetime

You will run into problems where the total SUM(d.qty) will be larger than the value from SELECT SUM(qty) FROM deliveries
Something like this might be better suited for you:
SELECT d.delivered_on_datetime, s.receive_by_datetime, SUM(d.qty) AS delivered_qty, SUM(d.qty) AS shipped_qty
FROM deliveries d
LEFT JOIN (
SELECT s1.shipment_id, s1.receive_by_datetime, SUM(s1.qty) AS qty
FROM shipments s1
GROUP BY s1.shipment_id, s1.received_by_datetime
) s ON (s.shipment_id = d.shipment_id)
GROUP BY d.delivered_on_datetime, s.receive_by_datetime
If you somehow have (or might have) a shipment_id that has multiple values for received_by_datetime and it's best practice to assume that something else might have corrupted the data slightly then to prevent the lines in the deliveries table being duplicated while still returning a valid result you can use:
SELECT d.delivered_on_datetime, s.receive_by_datetime, SUM(d.qty) AS delivered_qty, SUM(d.qty) AS shipped_qty
FROM deliveries d
LEFT JOIN (
SELECT s1.shipment_id, MAX(s1.receive_by_datetime) AS receive_by_datetime, SUM(s1.qty) AS qty
FROM shipments s1
GROUP BY s1.shipment_id
) s ON (s.shipment_id = d.shipment_id)
GROUP BY d.delivered_on_datetime, s.receive_by_datetime

Yep, the problem with many-to-many is you get the cartesian product of rows, so you end up counting the same row more than once. Once for each other row it matches against.
In shipments, the shipment_id and receive_by_datetime fields always match up
If this means there cannot be two shipments with the same ID but different dates then your query will work. But in general it is not safe. i.e. If subselect distinct could return more than one row per shipment ID, you will be subject to the double counting issue. Generically this is a very tricky problem to solve - in fact I see no way it could be with this data model.

Related

What's the use of this WHERE clause

this is an answer to the question : We need a list of customer IDs with the total amount they have ordered. Write a SQL statement to return customer ID (cust_id in the Orders table) and total_ordered using a subquery to return the total of orders for each customer. Sort the results by amount spent from greatest to the least. Hint: you’ve used the SUM() to calculate order totals previously.
SELECT prod_name,
(SELECT Sum(quantity)
FROM OrderItems
WHERE Products.prod_id=OrderItems.prod_id) AS quant_sold
FROM Products
;
So there is this simple code up here, and I know that this WHERE clause is comparing two columns in two different tables. But since We are calculating the SUM of that quantity, why do need that WHERE clause exactly. I really couldn't get it. Why the product_id exactly and not any other column ( p.s: the only shared column between those two tables is prod_id column ) I am still a beginner. Thank you!
First you would want to know the sum for each product - so need to adjust the subquery similar to this:
(SELECT prod_id, Sum(quantity) qty
FROM OrderItems
group by prod_id
) AS quant_sold
then once you know how much for each product, then you can link that
SELECT prod_name,
(SELECT prod_id, Sum(quantity) qty
FROM OrderItems
group by prod_id
) AS quant_sold
FROM Products p
WHERE p.prod_id = quant_sold.prod_id
Run it without the where clause and compare the results. You'll learn a lot that way. specifically focus on two different product Ids ensuring they both have order items and quantities.
You have two different tables involved. There are multiple products. You don't want the sum of all orders on each product; which is what you would get without the where clause. So the where clause correlates the two tables ensuring you only SUM the quantity of each order item for each product between the tables. Personally, I'd use a join, sum, and a group by as I find it easier to read and I'm not a fan of sub selects in the select of another query; but that's me.
SELECT prod_name,
(SELECT Sum(quantity)
FROM OrderItems
WHERE Products.prod_id=OrderItems.prod_id) AS quant_sold
FROM Products
Should be the same as:
SELECT prod_name, Sum(coalesce(P.quantity,0))
FROM Products P
LEFT JOIN orderItems OI
on P.prod_id=OI.prod_id
GROUP BY Prod_Name
'Notes
the above is untested.
a left join is needed because all products should be listed and if a product doesn't have an order, the quantity would be zero.
if we use an inner join, the product would be excluded.
We use coalesce because you'd have a "Null" quantity instead of zero for such lines without an order item.
as to which is "right" well it depends and varies on different cases. each has it's own merits and in different cases, one will perform better than another, and in a different case, vice-versa. See --> Join vs. sub-query
As an example:
Say you have Products A & B
"A" has Order Item Quantities of 1 & 2
"B" has order item Quantities of 10 & 20
If we don't have the where clause every result record would have qty 33
If we have the where product "A" would have 3
product "B" would have qty 30.

SQL Server question - subqueries in column result with a join?

I have a distinct list of part numbers from one table. It is basically a table that contains a record of all the company's part numbers. I want to add columns that will pull data from different tables but only pertaining to the part number on that row of the distinct part list.
For example: if I have part A, B, C from the unique part list I want to add columns for Purchase quantity, repair quantity, loan quantity, etc... from three totally unique tables.
So it's almost like I need 3 subqueries that will sum of that data from the different tables for each part.
Can anybody steer me in the direction of how to do this? Please and thank you so much!
One method is correlated subqueries. Something like this:
select p.*,
(select count(*)
from purchases pu
where pu.part_id = p.part_id
) as num_purchases,
(select count(*)
from repairs r
where r.part_id = p.part_id
) as num_repairs,
(select count(*)
from loans l
where l.part_id = p.part_id
) as num_loans
from parts p;
Another option is joins with aggregation before the join. Or lateral joins (which are quite similar to correlated subqueries).

getting duplicates when joining tables

I have two tables that I want to join. Table1 has sales order, but it doesn’t have the name of the sales person. It only has employee ID. I have table2, that has the names of employees, and employeeID is common between the two tables. Normally I would use an inner join to get the name of the sales person from table2. The problem is that on table2, there are multiple entries for each employee. If they changed manager, or changed roles within the company, or perhaps went on FMLA, it creates a new row. Therefore, when I join the tables, it creates duplicates because of the multiple entries in table2. A sale shows 3 or 4 times in my results.
Select
a.state_name
,order_number
,a.employeeID
,b.Sales_Rep_Name
,a.order_date
from
table1 as A
Inner join table2 as B
On a.employeeid = b.employeeID
where
b.monthperiod = 'November' <-- If I remove this one it adds duplicates
Is there a way to not get these duplicates? I tried distinct but didn’t work. Probably because the rows have at least one column different. I was able to eliminate the duplicates when I added a where clause asking for last month on table 2, but I am in a situation where I need all months, not just one. I have to manually change the month in order to get the full year.
Any help would be appreciated. Thanks
Use a subquery to get list of distinct employee records and then query the sales table

SQL query with loop

I am having trouble with writing a SQL query in Access with foreign keys. There are two tables, 'Customers'(ID, Name, Surname) and 'Orders'(ID, Customer, Date, Volume). ID fields are primary, and Orders.Customer is a foreign key linked to Customers.ID, so a customer can have many orders or none.
My goal is to do a search on customers based on many criteria, one of which being if customers have at least an order which volume is superior to a certain quantity. I tried joins with SELECT DISTINCT but it still gives me duplicate results, plus I had to create an empty order for every customer without orders if the query didn't use the above condition.
Does anyone have an idea about that? Maybe some special instruction on foreign keys or 2 separate queries?
Based on the information you give, i only can give you hints on what I think you're doing/understanding wrong :
SELECT DISTINCT does select you a unique record, not a unique value, so if your statement selects all fields (*), distinct won't help you much there.
My guess is you had to create an empty order for each customer because you used INNER JOIN, try LEFT OUTER JOIN instead
For example :
SELECT DISTINCT Customers.*
FROM Customers
LEFT OUTER JOIN Orders
ON (Orders.Customer = Customers.id)
WHERE Volume > put_your_value

Joining table issue with SQL Server 2008

I am using the following query to obtain some sales figures. The problem is that it is returning the wrong data.
I am joining together three tables tbl_orders tbl_orderitems tbl_payment. The tbl_orders table holds summary information, the tbl_orderitems holds the items ordered and the tbl_payment table holds payment information regarding the order. Multiple payments can be placed against each order.
I am trying to get the sum of the items sum(mon_orditems_pprice), and also the amount of items sold count(uid_orderitems).
When I run the following query against a specific order number, which I know has 1 order item. It returns a count of 2 and the sum of two items.
Item ProdTotal ProdCount
Westvale Climbing Frame 1198 2
This order has two payment records held in the tbl_payment table, which is causing the double count. If I remove the payment table join it reports the correct figures, or if I select an order which has a single payment it works as well. Am I missing something, I am tired!!??
SELECT
txt_orditems_pname,
SUM(mon_orditems_pprice) AS prodTotal,
COUNT(uid_orderitems) AS prodCount
FROM dbo.tbl_orders
INNER JOIN dbo.tbl_orderitems ON (dbo.tbl_orders.uid_orders = dbo.tbl_orderitems.uid_orditems_orderid)
INNER JOIN dbo.tbl_payment ON (dbo.tbl_orders.uid_orders = dbo.tbl_payment.uid_pay_orderid)
WHERE
uid_orditems_orderid = 61571
GROUP BY
dbo.tbl_orderitems.txt_orditems_pname
ORDER BY
dbo.tbl_orderitems.txt_orditems_pname
Any suggestions?
Thank you.
Drill down Table columns
dbo.tbl_payment.bit_pay_paid (1/0) Has this payment been paid, yes no
dbo.tbl_orders.bit_order_archive (1/0) Is this order archived, yes no
dbo.tbl_orders.uid_order_webid (integer) Web Shop's ID
dbo.tbl_orders.bit_order_preorder (1/0) Is this a pre-order, yes no
YEAR(dbo.tbl_orders.dte_order_stamp) (2012) Sales year
dbo.tbl_orders.txt_order_status (varchar) Is the order dispatched, awaiting delivery
dbo.tbl_orderitems.uid_orditems_pcatid (integer) Product category ID
It's a normal behavior, if you remove grouping clause you'll see that there really are 2 rows after joining and they both have 599 as a mon_orditems_pprice hence the SUM is correct. When there is a multiple match in any joined table the entire output row becomes multiple and the data that is being summed (or counted or aggregated in any other way) also gets summed multiple times. Try this:
SELECT txt_orditems_pname,
SUM(mon_orditems_pprice) AS prodTotal,
COUNT(uid_orderitems) AS prodCount
FROM dbo.tbl_orders
INNER JOIN dbo.tbl_orderitems ON (dbo.tbl_orders.uid_orders = dbo.tbl_orderitems.uid_orditems_orderid)
INNER JOIN
(
SELECT x.uid_pay_orderid
FROM dbo.tbl_payment x
GROUP BY x.uid_pay_orderid
) AS payments ON (dbo.tbl_orders.uid_orders = payments.uid_pay_orderid)
WHERE
uid_orditems_orderid = 61571
GROUP BY
dbo.tbl_orderitems.txt_orditems_pname
ORDER BY
dbo.tbl_orderitems.txt_orditems_pname
I don't know what data from tbl_payment you are using, are any of the columns from the SELECT list actually from tbl_payment? Why is tbl_payment being joined?