Multi Join Table, Multiple Sums - sql

I've got 3 tables I need to work with:
CREATE TABLE invoices (
id INTEGER,
number VARCHAR(256)
);
CREATE TABLE items (
invoice_id INTEGER,
total DECIMAL
);
CREATE TABLE payments (
invoice_id INTEGER,
total DECIMAL
);
I need a result set along the lines of:
invoices.id  invoices.number  item_total  payment_total  outstanding_balance
00001        i82              42.50       42.50          00.00
00002        i83              89.99       9.99           80.00
I tried
SELECT
invoices.*,
SUM(items.total) AS item_total,
SUM(payments.total) AS payment_total,
SUM(items.total) - SUM(payments.total) AS outstanding_balance
FROM
invoices
LEFT OUTER JOIN items ON items.invoice_id = invoices.id
LEFT OUTER JOIN payments ON payments.invoice_id = invoices.id
GROUP BY
invoices.id
But that fails. The sum for payments ends up wrong since I'm doing 2 joins here and I end up counting payments multiple times.
I ended up with
SELECT
invoices.*,
invoices.item_total - invoices.payment_total AS outstanding_balance
FROM
(
SELECT invoices.*,
(SELECT SUM(items.total) FROM items WHERE items.invoice_id = invoices.id) AS item_total,
(SELECT SUM(payments.total) FROM payments WHERE payments.invoice_id = invoices.id) AS payment_total
FROM invoices
) AS invoices
But ... that feels ugly. Now I've got subqueries going on everywhere. It DOES work, but I'm concerned about performance?
There has to be some good way to do this with joins - I'm sure I'm missing something super obvious?

As you say, this behavior of SUM across multiple joins is normal, and working with subqueries (or CTEs in SQL Server) is not bad practice.
A GROUP BY on an ID with a SUM inside a subquery will not significantly hurt performance, though this depends on the size of your tables.
Another solution is to use one aggregating subquery for each total you need. I think it is easier to understand this way:
SELECT
invoices.id
, i_total.total as item_total
, p_total.total as payment_total
, ( i_total.total - p_total.total) as outstanding_balance
FROM
invoices
LEFT JOIN (
SELECT invoice_id, SUM(total) as total FROM items GROUP BY invoice_id
) i_total
ON i_total.invoice_id = invoices.id
LEFT JOIN (
SELECT invoice_id, SUM(total) as total FROM payments GROUP BY invoice_id
) p_total
ON p_total.invoice_id = invoices.id

I think a common table expression (or in this case two CTEs) will give you what you want. You are using scalar subqueries, which are not wrong strictly speaking, but as you correctly identified they are ugly, hard to read, hard to maintain, and can perform poorly in many situations.
A CTE essentially takes a query and makes it "behave" like a table. We define it once and can then refer to it later.
with item_data as (
SELECT invoice_id, SUM(total) as item_total
FROM items
group by invoice_id
),
payment_data as (
SELECT invoice_id, SUM(total) as payment_total
FROM payments
group by invoice_id
)
select
i.*,
id.item_total - pd.payment_total as outstanding_balance
from
invoices i
join item_data id on i.id = id.invoice_id
join payment_data pd on i.id = pd.invoice_id
Untested, but hopefully you get the idea.
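To see the double counting concretely, here is a minimal sketch using Python's built-in sqlite3 module with made-up sample rows. Storing amounts as integer cents, the third invoice with no rows, and the COALESCE default of 0 are all assumptions for illustration, not part of either answer. It runs the naive two-join query and the pre-aggregated version side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (id INTEGER, number VARCHAR(256));
CREATE TABLE items (invoice_id INTEGER, total INTEGER);     -- cents (assumed)
CREATE TABLE payments (invoice_id INTEGER, total INTEGER);  -- cents (assumed)
INSERT INTO invoices VALUES (1, 'i82'), (2, 'i83'), (3, 'i84');
INSERT INTO items VALUES (1, 4000), (1, 250), (2, 8999);
INSERT INTO payments VALUES (1, 4250), (2, 500), (2, 499);
""")

# Naive version: every item row pairs with every payment row of the
# same invoice, so each side is summed once per row on the other side.
naive = conn.execute("""
SELECT invoices.id, SUM(items.total), SUM(payments.total)
FROM invoices
LEFT JOIN items ON items.invoice_id = invoices.id
LEFT JOIN payments ON payments.invoice_id = invoices.id
GROUP BY invoices.id
ORDER BY invoices.id
""").fetchall()

# Pre-aggregated version: each derived table has at most one row per
# invoice, so the joins cannot multiply rows.
fixed = conn.execute("""
SELECT invoices.id, invoices.number,
       COALESCE(i_total.total, 0) AS item_total,
       COALESCE(p_total.total, 0) AS payment_total,
       COALESCE(i_total.total, 0) - COALESCE(p_total.total, 0) AS outstanding_balance
FROM invoices
LEFT JOIN (SELECT invoice_id, SUM(total) AS total
           FROM items GROUP BY invoice_id) i_total
       ON i_total.invoice_id = invoices.id
LEFT JOIN (SELECT invoice_id, SUM(total) AS total
           FROM payments GROUP BY invoice_id) p_total
       ON p_total.invoice_id = invoices.id
ORDER BY invoices.id
""").fetchall()

print(naive)
print(fixed)
```

With this data the naive query doubles the payment total of invoice 1 (two items, one payment) and the item total of invoice 2 (one item, two payments), while the pre-aggregated query returns one correct row per invoice.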

Related

Subquery amount not coming in full

I have this query:
Select I.Invoice_Number, PA.Invoice_Number, I.Line_Amount, PA.Invoiced_Amount
from XXX as PA
Left join (select Invoice_Number, Line_Amount from Invoices) as I
on PA.Invoice_Number = I.Invoice_Number
Group by PA.Invoice_Number;
Both should give me the same total cost per Invoice_Number (the sum of I.Line_Amount should equal PA.Invoiced_Amount), yet I.Line_Amount only brings back the first row for each invoice, while PA.Invoiced_Amount holds the sum of the cost on that invoice.
I tried using sum(Line_Amount) within the subquery but all records come out as Null.
Is there a way for me to join both tables and make sure that the amounts per invoice match to the total amount of that invoice?
Thanks!!
If I understand you correctly (and you want to make sure that sum of Line_Amount in Invoices table is the same as Invoiced_Amount in XXX table) the second table should have invoice number and sum of amounts:
select I.Invoice_Number, PA.Invoice_Number, I.total, PA.Invoiced_Amount
from XXX as PA
left join (
select Invoice_Number, sum(Line_Amount) as total
from Invoices
group by Invoice_Number
) as I
on PA.Invoice_Number = I.Invoice_Number
You can try it here: http://sqlfiddle.com/#!9/d1d010/1/0
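As a rough sketch of how the comparison could be turned into a check, the following uses Python's sqlite3 module with invented sample rows. The WHERE filter that keeps only mismatching invoices is an addition for illustration, not part of the answer above; table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE XXX (Invoice_Number TEXT, Invoiced_Amount INTEGER);
CREATE TABLE Invoices (Invoice_Number TEXT, Line_Amount INTEGER);
INSERT INTO XXX VALUES ('A1', 300), ('A2', 150), ('A3', 50);
INSERT INTO Invoices VALUES ('A1', 100), ('A1', 200), ('A2', 100);
""")

# Aggregate the line amounts first, then keep only the invoices whose
# total is missing or differs from the invoiced amount.
mismatches = conn.execute("""
SELECT PA.Invoice_Number, PA.Invoiced_Amount, I.total
FROM XXX AS PA
LEFT JOIN (
    SELECT Invoice_Number, SUM(Line_Amount) AS total
    FROM Invoices
    GROUP BY Invoice_Number
) AS I
ON PA.Invoice_Number = I.Invoice_Number
WHERE I.total IS NULL OR I.total <> PA.Invoiced_Amount
ORDER BY PA.Invoice_Number
""").fetchall()

print(mismatches)
```

Invoice A1 matches (100 + 200 = 300) and is filtered out; A2 has a different total and A3 has no lines at all, so both are reported.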

Select SUM from multiple tables

I keep getting the wrong sum value when I join 3 tables.
Here is a pic of the ERD of the table:
(Original here: http://dl.dropbox.com/u/18794525/AUG%207%20DUMP%20STAN.png )
Here is the query:
select SUM(gpCutBody.actualQty) as cutQty , SUM(gpSewBody.quantity) as sewQty
from jobOrder
inner join gpCutHead on gpCutHead.joNum = jobOrder.joNum
inner join gpSewHead on gpSewHead.joNum = jobOrder.joNum
inner join gpCutBody on gpCutBody.gpCutID = gpCutHead.gpCutID
inner join gpSewBody on gpSewBody.gpSewID = gpSewHead.gpSewID
If you are only interested in the quantities of cuts and sews for all orders, the simplest way to do it would be like this:
select (select SUM(gpCutBody.actualQty) from gpCutBody) as cutQty,
(select SUM(gpSewBody.quantity) from gpSewBody) as sewQty
(This assumes that cuts and sews will always have associated job orders.)
If you want to see a breakdown of cuts and sews by job order, something like this might be preferable:
select joNum, SUM(actualQty) as cutQty, SUM(quantity) as sewQty
from (select joNum, actualQty, 0 as quantity
from gpCutBody
union all
select joNum, 0 as actualQty, quantity
from gpSewBody) sc
group by joNum
Mark's approach is a good one. I want to suggest the alternative of doing the group by's before the union, simply because this can be a more general approach for summing along multiple dimensions.
Your problem is that you have two dimensions that you want to sum along, and you are getting a cross product of the values in the join.
select coalesce(act.joNum, q.joNum) as joNum, act.quantity as ActualQty, q.quantity as Quantity
from (select joNum, sum(actualQty) as quantity
from gpCutBody
group by joNum
) act full outer join
(select joNum, sum(quantity) as quantity
from gpSewBody
group by joNum
) q
on act.joNum = q.joNum
(I have kept Mark's assumption that doing this by joNum is the desired output.)
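The cross product is easy to reproduce on a toy data set. The sketch below uses Python's sqlite3 module and assumes, as the answers do, that gpCutBody and gpSewBody each carry joNum directly; the rows are invented. It compares a direct join with the UNION ALL approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gpCutBody (joNum TEXT, actualQty INTEGER);
CREATE TABLE gpSewBody (joNum TEXT, quantity INTEGER);
INSERT INTO gpCutBody VALUES ('J1', 10), ('J1', 20), ('J2', 7);
INSERT INTO gpSewBody VALUES ('J1', 5), ('J1', 5), ('J1', 5);
""")

# Direct join: J1 has 2 cut rows and 3 sew rows, giving 6 joined rows,
# so each quantity is summed several times. J2 has no sews and is lost.
joined = conn.execute("""
SELECT c.joNum, SUM(c.actualQty), SUM(s.quantity)
FROM gpCutBody c
JOIN gpSewBody s ON s.joNum = c.joNum
GROUP BY c.joNum
""").fetchall()

# UNION ALL: stack both tables into one column layout, padding the
# missing measure with 0, then aggregate once.
stacked = conn.execute("""
SELECT joNum, SUM(actualQty) AS cutQty, SUM(quantity) AS sewQty
FROM (SELECT joNum, actualQty, 0 AS quantity FROM gpCutBody
      UNION ALL
      SELECT joNum, 0 AS actualQty, quantity FROM gpSewBody) sc
GROUP BY joNum
ORDER BY joNum
""").fetchall()

print(joined)
print(stacked)
```

The direct join reports 90 cuts and 30 sews for J1 instead of 30 and 15, which is exactly the inflation described in the question.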

Using SQL query to find details of customers who ordered > x types of products

Please note that I have seen a similar query here, but think my query is different enough to merit a separate question.
Suppose that there is a database with the following tables:
customer_table with customer_ID (key field), customer_name
orders_table with order_ID (key field), customer_ID, product_ID
Now suppose I would like to find the names of all the customers who have ordered more than 10 different types of product, and the number of types of products they ordered. Multiple orders of the same product do not count.
I think the query below should work, but have the following questions:
Is the use of count(distinct xxx) generally allowed with a "group by" statement?
Is the method I use the standard way? Does anybody have any better ideas (e.g. without involving temporary tables)?
Below is my query
select T1.customer_name, T1.customer_ID, T2.number_of_products_ordered
from customer_table T1
inner join
(
select cust.customer_ID as customer_identity, count(distinct ord.product_ID) as number_of_products_ordered
from customer_table cust
inner join orders_table ord on cust.customer_ID=ord.customer_ID
group by cust.customer_ID
having count(distinct ord.product_ID) > 10
) T2
on T1.customer_ID=T2.customer_identity
order by T2.number_of_products_ordered, T1.customer_name
Isn't that what you are looking for? Seems to be a little bit simpler. Tested it on SQL Server - works fine.
SELECT customer_name, COUNT(DISTINCT product_ID) as products_count FROM customer_table
INNER JOIN orders_table ON customer_table.customer_ID = orders_table.customer_ID
GROUP BY customer_table.customer_ID, customer_name
HAVING COUNT(DISTINCT product_ID) > 10
You could do it more simply:
select
c.customer_ID,
c.customer_name,
count(distinct o.product_ID) as `uniques`
from orders_table o join customer_table c
on c.customer_ID = o.customer_ID
group by c.customer_ID
having `uniques` > 10
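On the first question: COUNT(DISTINCT ...) can be combined with GROUP BY and HAVING. A small sketch with Python's sqlite3 module and invented rows illustrates this; the threshold is lowered from 10 to 2 purely to keep the sample data short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_table (customer_ID INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE orders_table (order_ID INTEGER PRIMARY KEY, customer_ID INTEGER, product_ID INTEGER);
INSERT INTO customer_table VALUES (1, 'Alice'), (2, 'Bob');
-- Alice orders products 10 (twice), 11 and 12; Bob orders product 10 twice.
INSERT INTO orders_table VALUES
    (1, 1, 10), (2, 1, 10), (3, 1, 11), (4, 1, 12),
    (5, 2, 10), (6, 2, 10);
""")

# One group per customer; repeat orders of the same product count once.
rows = conn.execute("""
SELECT customer_name, COUNT(DISTINCT product_ID) AS products_count
FROM customer_table
INNER JOIN orders_table ON customer_table.customer_ID = orders_table.customer_ID
GROUP BY customer_table.customer_ID, customer_name
HAVING COUNT(DISTINCT product_ID) > 2
""").fetchall()

print(rows)
```

Alice has four orders but only three distinct products, and Bob's two orders of the same product count as one, so only Alice passes the filter.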

Joining two tables with a queried table

Oh great SQL gods I require your assistance.
Here is my Schema:
CAR(Serial_no,Model,Manufacturer,Price)
OPTIONS(Serial_no,Option_name,Price)
SALE(Salesperson_id,Serial_no,Date,Sale_price)
SALESPERSON(Salesperson_id,Name,Phone)
First, I need to join the CAR and SALE table by Serial_no.
Second, I need to take the OPTIONS table and SUM all the prices for each Serial_no, which the following does:
SELECT O.Serial_no, SUM(O.Price)
FROM OPTIONS O
GROUP BY (O.Serial_no);
Last I need to merge steps one and two and query the result so I get a resulting set of where CAR.Price < (SALE.Sale_price + OPTIONS.Price).
Can this be done? Any help would be immensely appreciated!
Thanks,
Mark
SELECT C.Serial_no,
MIN(c.Price) CarPrice,
MIN(s.Sale_price) SalePrice,
SUM(o.Price) OptionsPrice,
MIN(s.Sale_price) + IFNULL(SUM(o.Price),0) TotalPrice
FROM Car c JOIN Sale s ON c.Serial_no = s.Serial_no
LEFT JOIN `Options` o ON c.Serial_no = o.Serial_no
GROUP BY c.Serial_no
HAVING MIN(c.Price) < MIN(s.Sale_price) + IFNULL(SUM(o.Price),0)
Note: the MIN() calls do not change any values. They are only there because the query groups by Serial_no and the Options table may have multiple rows per car.
Another option would be to do the calculations in a Subquery which may lead to better performance:
SELECT C.Serial_no,
C.Price,
S.Sale_price,
og.SumPrice
FROM Car c JOIN Sale s ON c.Serial_no = s.Serial_no
LEFT JOIN (
SELECT Serial_no, SUM(Price) SumPrice
FROM `Options`
GROUP BY Serial_no
) og ON c.Serial_no = og.Serial_no
WHERE c.Price < s.Sale_price + IFNULL(og.SumPrice,0)
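A quick way to sanity-check the subquery version is to run it on a few invented rows. The sketch below uses Python's sqlite3 module; the sample cars, sales and options are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Car (Serial_no INTEGER, Model TEXT, Manufacturer TEXT, Price INTEGER);
CREATE TABLE Options (Serial_no INTEGER, Option_name TEXT, Price INTEGER);
CREATE TABLE Sale (Salesperson_id INTEGER, Serial_no INTEGER, Date TEXT, Sale_price INTEGER);
INSERT INTO Car VALUES (1, 'A', 'M', 20000), (2, 'B', 'M', 30000), (3, 'C', 'M', 15000);
INSERT INTO Sale VALUES (10, 1, '2020-01-01', 19000),
                        (11, 2, '2020-01-02', 25000),
                        (12, 3, '2020-01-03', 16000);
INSERT INTO Options VALUES (1, 'roof', 1500), (1, 'radio', 500), (2, 'roof', 1000);
""")

# Options are summed per car before joining, so Car and Sale rows are
# never duplicated. IFNULL treats a car without options as 0 extra.
rows = conn.execute("""
SELECT c.Serial_no, c.Price, s.Sale_price, og.SumPrice
FROM Car c
JOIN Sale s ON c.Serial_no = s.Serial_no
LEFT JOIN (
    SELECT Serial_no, SUM(Price) AS SumPrice
    FROM Options
    GROUP BY Serial_no
) og ON c.Serial_no = og.Serial_no
WHERE c.Price < s.Sale_price + IFNULL(og.SumPrice, 0)
ORDER BY c.Serial_no
""").fetchall()

print(rows)
```

Car 1 qualifies because 19000 + 2000 exceeds its list price, car 2 does not (25000 + 1000 is below 30000), and car 3 qualifies even though it has no options, which is the case the IFNULL handles.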

Should I use a subquery?

I have two tables, one that stores the current price, and one that stores the historical price of items. I want to create a query that pulls the current price, and the difference between the current price and the most recent historical price.
In the historical table, I have the start and end times of the price, so I can just select the most recent price, but how do I pull it all together in one query? Or do I have to do a subquery?
select p.current_price,
h.historical_price,
h.historical_time
from price p
inner join price_history h
on p.id = h.id
where max(h.historical_time)
This obviously doesn't work, but that is what I'm trying to accomplish.
This gives me the current and historical price. But I want to make sure I have the most RECENT price. How would I do this?
I would do it like this. Note, you may get duplicate records if there are two price entries with the same date for the same id in price_history:
select p.current_price, h.historical_price,
p.current_price - h.historical_price as PriceDiff, h.historical_time
from price p
inner join (
select id, max(historical_time) as MaxHistoricalTime
from price_history
group by id
) hm on p.id = hm.id
inner join price_history h on hm.id = h.id
and hm.MaxHistoricalTime = h.historical_time
I don't believe there's a way of doing this without a subquery that isn't worse. On the other hand, if your table is indexed correctly, subqueries returning results of aggregate functions are generally pretty fast.
select
p.current_price,
h3.historical_price,
h3.historical_time
from
price p,
( select h1.id, max( h1.historical_time ) as MaxHT
from price_history h1
group by 1 ) h2,
price_history h3
where
p.id = h2.id
and p.id = h3.id
and h2.MaxHT = h3.historical_time
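Both answers use the same idea: find the latest historical_time per id in a subquery, then join back to price_history to fetch that row. A minimal sketch with Python's sqlite3 module follows; the sample prices and the single historical_time column stored as an ISO date string are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE price (id INTEGER, current_price INTEGER);
CREATE TABLE price_history (id INTEGER, historical_price INTEGER, historical_time TEXT);
INSERT INTO price VALUES (1, 120), (2, 80);
INSERT INTO price_history VALUES
    (1, 100, '2023-01-01'),
    (1, 110, '2023-06-01'),
    (2, 90,  '2023-03-01');
""")

# hm holds one row per id with the most recent time; joining back to
# price_history on (id, time) picks up the price at that time.
rows = conn.execute("""
SELECT p.id, p.current_price, h.historical_price,
       p.current_price - h.historical_price AS PriceDiff,
       h.historical_time
FROM price p
INNER JOIN (
    SELECT id, MAX(historical_time) AS MaxHistoricalTime
    FROM price_history
    GROUP BY id
) hm ON p.id = hm.id
INNER JOIN price_history h
    ON hm.id = h.id AND hm.MaxHistoricalTime = h.historical_time
ORDER BY p.id
""").fetchall()

print(rows)
```

For id 1 the older 100 entry is ignored and the difference is computed against the most recent price of 110.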