Should I use a subquery? - sql

I have two tables, one that stores the current price, and one that stores the historical price of items. I want to create a query that pulls the current price, and the difference between the current price and the most recent historical price.
In the historical table, I have the start and end times of the price, so I can just select the most recent price, but how do I pull it all together in one query? Or do I have to do a subquery?
select p.current_price,
h.historical_price,
h.historical_time
from price p
inner join price_history h
on p.id = h.id
where max(h.historical_time)
This obviously doesn't work, but that is what I'm trying to accomplish.
This gives me the current and historical price. But I want to make sure I have the most RECENT price. How would I do this?

I would do it like this. Note, you may get duplicate records if there are two price entries with the same date for the same id in price_history:
select p.current_price, h.historical_price,
p.current_price - h.historical_price as PriceDiff, h.historical_time
from price p
inner join (
select id, max(historical_time) as MaxHistoricalTime
from price_history
group by id
) hm on p.id = hm.id
inner join price_history h on hm.id = h.id
and hm.MaxHistoricalTime = h.historical_time
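To see the join-to-max pattern in action, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the query above; the sample rows, the ISO-formatted text dates and the price_diff alias are assumptions made up for illustration.

```python
import sqlite3

# In-memory database with hypothetical sample rows (values are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE price (id INTEGER, current_price REAL);
    CREATE TABLE price_history (id INTEGER, historical_price REAL, historical_time TEXT);
    INSERT INTO price VALUES (1, 12.0), (2, 5.0);
    INSERT INTO price_history VALUES
        (1, 10.0, '2020-01-01'),
        (1, 11.0, '2020-02-01'),
        (2, 6.0,  '2020-01-15');
""")

# The derived table hm finds the latest timestamp per id; joining back to
# price_history on (id, timestamp) picks up the price stored on that row.
latest_rows = conn.execute("""
    SELECT p.id, p.current_price, h.historical_price,
           p.current_price - h.historical_price AS price_diff
    FROM price p
    INNER JOIN (
        SELECT id, MAX(historical_time) AS max_time
        FROM price_history
        GROUP BY id
    ) hm ON p.id = hm.id
    INNER JOIN price_history h
        ON h.id = hm.id AND h.historical_time = hm.max_time
    ORDER BY p.id
""").fetchall()

for row in latest_rows:
    print(row)
```

With this data each id comes back exactly once, paired with its latest historical price. If two history rows for the same id shared the same latest timestamp, both would be returned, which is the duplicate case mentioned above.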

I don't believe there's a way of doing this without a subquery that isn't worse. On the other hand, if your table is indexed correctly, subqueries returning results of aggregate functions are generally pretty fast.

select
p.current_price,
h3.historical_price,
h3.historical_time
from
price p,
( select h1.id, max( h1.historical_time ) as MaxHT
from price_history h1
group by h1.id ) h2,
price_history h3
where
p.id = h2.id
and p.id = h3.id
and h2.MaxHT = h3.historical_time

Related

Multi Join Table, Multiple Sums

I've got 3 tables I need to work with:
CREATE TABLE invoices (
id INTEGER,
number VARCHAR(256)
)
CREATE TABLE items (
invoice_id INTEGER,
total DECIMAL
)
CREATE TABLE payments (
invoice_id INTEGER,
total DECIMAL
)
I need a result set along the lines of:
invoices.id   invoices.number   item_total   payment_total   outstanding_balance
00001         i82               42.50        42.50           00.00
00002         i83               89.99        9.99            80.00
I tried
SELECT
invoices.*,
SUM(items.total) AS item_total,
SUM(payments.total) AS payment_total,
SUM(items.total) - SUM(payments.total) AS outstanding_balance
FROM
invoices
LEFT OUTER JOIN items ON items.invoice_id = invoices.id
LEFT OUTER JOIN payments ON payments.invoice_id = invoices.id
GROUP BY
invoices.id
But that fails. The sum for payments ends up wrong since I'm doing 2 joins here and I end up counting payments multiple times.
I ended up with
SELECT
invoices.*,
invoices.item_total - invoices.payment_total AS outstanding_balance
FROM
(
SELECT invoices.*,
(SELECT SUM(items.total) FROM items WHERE items.invoice_id = invoices.id) AS item_total,
(SELECT SUM(payments.total) FROM payments WHERE payments.invoice_id = invoices.id) AS payment_total
FROM invoices
) AS invoices
But ... that feels ugly. Now I've got subqueries going on everywhere. It DOES work, but I'm concerned about performance?
There has to be some good way to do this with joins - I'm sure I'm missing something super obvious?
As you say, the sum behavior with multiple joins is normal, and working with subqueries (or CTEs in SQL Server) is not bad practice.
Doing a GROUP BY on an ID with a SUM inside a subquery won't significantly degrade performance (depending on your table sizes).
Another solution is to use one SUM subquery per column you need. I think it is easier to understand this way:
SELECT
invoices.id
, i_total.total as item_total
, p_total.total as payment_total
, ( i_total.total - p_total.total) as outstanding_balance
FROM
invoices
LEFT JOIN (
SELECT invoice_id, SUM(total) as total FROM items GROUP BY invoice_id
) i_total
ON i_total.invoice_id = invoices.id
LEFT JOIN (
SELECT invoice_id, SUM(total) as total FROM payments GROUP BY invoice_id
) p_total
ON p_total.invoice_id = invoices.id
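A small sketch of why the double join over-counts and why aggregating first fixes it, using Python's built-in sqlite3 module. The sample rows are made up, and the totals are stored as integer cents rather than DECIMAL purely to keep the arithmetic exact in SQLite; both are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (id INTEGER, number TEXT);
    CREATE TABLE items (invoice_id INTEGER, total INTEGER);
    CREATE TABLE payments (invoice_id INTEGER, total INTEGER);
    INSERT INTO invoices VALUES (1, 'i82');
    -- two items and one payment on the same invoice (amounts in cents)
    INSERT INTO items VALUES (1, 2000), (1, 2250);
    INSERT INTO payments VALUES (1, 4250);
""")

# Joining both child tables directly repeats the single payment row once
# per item row, so SUM(payments.total) counts it twice.
naive = conn.execute("""
    SELECT invoices.id, SUM(items.total), SUM(payments.total)
    FROM invoices
    LEFT JOIN items ON items.invoice_id = invoices.id
    LEFT JOIN payments ON payments.invoice_id = invoices.id
    GROUP BY invoices.id
""").fetchall()

# Aggregating each child table first leaves at most one row per invoice
# on each side of the join, so nothing is multiplied.
pre_aggregated = conn.execute("""
    SELECT invoices.id, i_total.total, p_total.total,
           i_total.total - p_total.total AS outstanding_balance
    FROM invoices
    LEFT JOIN (SELECT invoice_id, SUM(total) AS total
               FROM items GROUP BY invoice_id) i_total
        ON i_total.invoice_id = invoices.id
    LEFT JOIN (SELECT invoice_id, SUM(total) AS total
               FROM payments GROUP BY invoice_id) p_total
        ON p_total.invoice_id = invoices.id
""").fetchall()

print(naive)           # payment total is doubled
print(pre_aggregated)  # payment total is counted once
```

Note that with LEFT JOIN an invoice without any payments gets a NULL payment total, so the subtraction also yields NULL; wrapping the totals in COALESCE(..., 0) is one way to handle that if it matters for your data.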
I think a common table expression (or in this case two CTEs) will give you what you want. You are using something called a scalar subquery, which strictly speaking is not wrong, but as you correctly identified it is ugly, hard to read, hard to maintain and can perform poorly in many situations.
A CTE essentially takes a query and makes it "behave" like a table. We define it once and can then refer to it later.
with item_data as (
SELECT invoice_id, SUM(total) as item_total
FROM items
group by invoice_id
),
payment_data as (
SELECT invoice_id, SUM(total) as payment_total
FROM payments
group by invoice_id
)
select
i.*,
id.item_total - pd.payment_total as outstanding_balance
from
invoices i
join item_data id on i.id = id.invoice_id
join payment_data pd on i.id = pd.invoice_id
Untested, but hopefully you get the idea.

How to find missing data in table Sql

This is similar to "How to find missing data rows using SQL?" and "How to find missing rows (dates) in a mysql table?" but a bit more complex, so I'm hitting a wall.
I have a data table with the noted Primary key:
country_id (PK)
product_id (PK)
history_date (PK)
amount
I have a products table with all products, a countries table, and a calendar table with all valid dates.
I'd like to find all countries, dates and products for which there are missing products, with this wrinkle:
I only care about dates for which there are entries for a country for at least one product (i.e. if the country has NOTHING on that day, I don't need to find it) - so, by definition, there is an entry in the history table for every country and date I care about.
I know it's going to involve some joins, maybe a cross join, but I'm hitting a real wall in finding missing data.
I tried this (pretty sure it wouldn't work):
SELECT h.history_date, h.product_id, h.country_id, h.amount
FROM products p
LEFT JOIN history h ON (p.product_id = h.product_id)
WHERE h.product_id IS NULL
No Joy.
I tried this too:
WITH allData AS (SELECT h1.country_id, p.product_id, h1.history_date
FROM products p
CROSS JOIN (SELECT DISTINCT country_id, history_date FROM history) h1)
SELECT f.history_date, f.product_id, f.country_id
FROM allData f
LEFT OUTER JOIN history h ON (f.country_id = h.country_id AND f.history_date = h.history_date AND f.product_id = h.product_id)
WHERE h.product_id IS NULL
AND h.country_id IS NOT NULL
AND h.history_date IS NOT null
Also no luck. The CTE does get me every product for every country and date that has data, but the full query returns nothing.
"I only care about dates for which there are entries for a country for at least one product (i.e. if the country has NOTHING on that day, I don't need to find it)"
So we care about this combination:
from (select distinct country_id, history_date from history) country_date
cross join products p
Then it's just a matter of checking for existence:
select *
from (select distinct country_id, history_date from history) country_date
cross join products p
where not exists (select null
from history h
where country_date.country_id = h.country_id
and country_date.history_date = h.history_date
and p.product_id = h.product_id
)
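Here is a minimal runnable sketch of the NOT EXISTS approach using Python's built-in sqlite3 module. The sample rows are invented for illustration. It also hints at why the second attempt in the question returned nothing: after a LEFT JOIN, an unmatched row has NULL in every h column, so "h.product_id IS NULL" and "h.country_id IS NOT NULL" can never both be true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY);
    CREATE TABLE history (
        country_id INTEGER, product_id INTEGER, history_date TEXT, amount INTEGER,
        PRIMARY KEY (country_id, product_id, history_date)
    );
    INSERT INTO products VALUES (1), (2);
    INSERT INTO history VALUES
        (10, 1, '2020-01-01', 5),
        (10, 2, '2020-01-01', 7),
        (10, 1, '2020-01-02', 3),   -- product 2 is missing on this date
        (20, 2, '2020-01-01', 9);   -- product 1 is missing for country 20
""")

# Build every (country, date) pair that has at least one history row,
# cross it with all products, and keep combinations with no history row.
missing = conn.execute("""
    SELECT country_date.country_id, country_date.history_date, p.product_id
    FROM (SELECT DISTINCT country_id, history_date FROM history) country_date
    CROSS JOIN products p
    WHERE NOT EXISTS (
        SELECT NULL
        FROM history h
        WHERE h.country_id = country_date.country_id
          AND h.history_date = country_date.history_date
          AND h.product_id = p.product_id
    )
    ORDER BY country_date.country_id, country_date.history_date, p.product_id
""").fetchall()

for row in missing:
    print(row)
```

Country 20 has no rows at all on 2020-01-02, so that date is never generated for it and nothing is reported, which matches the "I only care about dates with at least one entry" requirement.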

Correlated subquery structure in MS Access SQL

I'm close, but I cannot seem to figure out this SQL query. I've got the SELECT and related FROM tables right, but I think my subquery structure is messed up.
Question: Compose an SQL statement to generate a list of two least expensive vendors (suppliers) for each raw material. In the result table, show the following columns: material ID, material description, vendor ID, vendor name, and the supplier's unit price. Sort the result table by material ID and supplier’s unit price in ascending order. Note: If a raw material has only one vendor (supplier), that supplier and its unit price for the raw material should also be in the result (output) table.
Here's what I've got:
SELECT Supplies_t.Material_ID, Raw_Materials_t.Material_Description,
Vendor_t.Vendor_ID, Vendor_t.Vendor_name, Supplies_t.Unit_price
FROM Supplies_t S1, Raw_Materials_t, Vendor_t
WHERE Vendor_t.Vendor_ID = Supplies_t.Vendor_ID
AND Supplies_t.Material_ID = Raw_Materials_t.Material_ID
AND Supplies_t.Unit_price IN
(SELECT TOP 2 Unit_price
FROM Supplies_t S2
WHERE S1.Material_ID = S2.Material_ID
ORDER BY S2.Material_ID ASC, S2.Unit_price ASC)
Using the correct table aliases may solve your problem. You should also use explicit JOIN syntax:
SELECT s.Material_ID, rm.Material_Description, v.Vendor_ID, v.Vendor_name, s.Unit_price
FROM (Supplies_t s INNER JOIN
Raw_Materials_t rm
ON s.Material_ID = rm.Material_ID
) INNER JOIN
Vendor_t v
ON v.Vendor_ID = s.Vendor_ID
WHERE s.Unit_price IN (SELECT TOP 2 s2.Unit_price
FROM Supplies_t s2
WHERE s.Material_ID = s2.Material_ID
ORDER BY s2.Material_ID ASC, s2.Unit_price ASC
);
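The correlated IN subquery can be tried out with Python's built-in sqlite3 module. This is only a sketch of the idea, not Access SQL: SQLite has no TOP, so LIMIT 2 is swapped in, the joins to Raw_Materials_t and Vendor_t are left out for brevity, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Supplies_t (Vendor_ID INTEGER, Material_ID INTEGER, Unit_price INTEGER);
    INSERT INTO Supplies_t VALUES
        (1, 100, 5), (2, 100, 3), (3, 100, 8),   -- three vendors for material 100
        (1, 200, 4);                              -- a single vendor for material 200
""")

# For each outer row, the subquery lists the two lowest prices for the
# same material; the row is kept only if its own price is one of them.
cheapest_two = conn.execute("""
    SELECT s.Material_ID, s.Vendor_ID, s.Unit_price
    FROM Supplies_t s
    WHERE s.Unit_price IN (
        SELECT s2.Unit_price
        FROM Supplies_t s2
        WHERE s2.Material_ID = s.Material_ID
        ORDER BY s2.Unit_price ASC
        LIMIT 2
    )
    ORDER BY s.Material_ID, s.Unit_price
""").fetchall()

for row in cheapest_two:
    print(row)
```

Material 200 has a single vendor and still shows up, as the assignment requires. Because the match is on price, vendors that tie on one of the two lowest prices would all be returned.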

Define sort order when updating

I have a script that updates an ID field on one table where that record matches to another table based on criteria.
Below is the general structure of my query.
update p
set p.saleId = s.saleId
from products p inner join sales s on s.crit1 = p.crit1
where p.someDate between s.startDate and s.endDate
This is working fine. My issue is that in some situations there is more than one match on the 'sales' table with this query which is generally ok. I'd however like to sort these results based on another field to make sure the saleId I get is the one with the highest cost.
Is that possible?
As it is the saleID you want to set and the sales table you are looking up, you can probably just update all products records. Then you can write a simple update statement on the table and don't have to join. This makes this much easier to write:
update products
set saleId =
(
select top(1) s.saleId
from sales s
where s.crit1 = products.crit1
and products.someDate between s.startDate and s.endDate
order by s.cost desc
);
The main difference from your statement is that mine sets saleId = NULL where there is no match in the sales table, while yours leaves those rows untouched. But I guess that doesn't make a difference here.
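A quick way to check the behaviour of a correlated-subquery update, including the NULL side effect, is a sketch with Python's built-in sqlite3 module. SQLite has no TOP, so LIMIT 1 stands in for it here; the products.id column and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER, crit1 TEXT, someDate TEXT, saleId INTEGER);
    CREATE TABLE sales (saleId INTEGER, crit1 TEXT, startDate TEXT, endDate TEXT, cost INTEGER);
    -- product 2 starts with a saleId but has no matching sale
    INSERT INTO products VALUES (1, 'a', '2020-01-10', NULL), (2, 'b', '2020-01-10', 99);
    -- two sales match product 1; sale 8 has the higher cost
    INSERT INTO sales VALUES
        (7, 'a', '2020-01-01', '2020-01-31', 50),
        (8, 'a', '2020-01-05', '2020-01-20', 80);
""")

# For every product row, pick the matching sale with the highest cost.
conn.execute("""
    UPDATE products
    SET saleId = (
        SELECT s.saleId
        FROM sales s
        WHERE s.crit1 = products.crit1
          AND products.someDate BETWEEN s.startDate AND s.endDate
        ORDER BY s.cost DESC
        LIMIT 1
    )
""")
updated = conn.execute("SELECT id, saleId FROM products ORDER BY id").fetchall()
print(updated)
```

Product 1 ends up with sale 8, the more expensive of its two matches, while product 2 loses its previous saleId because no sale matches it. Adding a WHERE EXISTS (...) condition with the same match criteria is one way to leave such rows untouched.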
I hope the query below helps. It is a very high-level draft based on your question; please take only the concept, not the syntax.
with maxSales as (select s1.saleId, s1.crit1, s1.startDate, s1.endDate from sales s1
where s1.cost = (select max(s2.cost) from
sales s2 where s1.crit1 = s2.crit1))
update products p set p.saleId =
(select s.saleId from
maxSales s
where s.crit1 = p.crit1
and p.someDate between s.startDate and s.endDate)
UPDATE p
set p.saleId = e.rowNumber
FROM products p
INNER JOIN
(SELECT saleId, row_number() OVER (ORDER BY saleId DESC) as rowNumber
FROM sales)
e ON e.saleId = p.saleId
Try this:
UPDATE p
SET p.saleid = s.saleid
FROM products p
INNER JOIN
(SELECT s.crit1,
s.saleid
FROM sales s
WHERE cost IN
(SELECT max(cost) cost
FROM sales
GROUP BY crit1)) s ON s.crit1 = p.crit1
None of the answers worked, but I managed to do it by using an OUTER APPLY as my join and specifying the sort order in that.
Cheers everyone for the input.

Select SUM from multiple tables

I keep getting the wrong sum value when I join 3 tables.
Here is a pic of the ERD of the table:
(Original here: http://dl.dropbox.com/u/18794525/AUG%207%20DUMP%20STAN.png )
Here is the query:
select SUM(gpCutBody.actualQty) as cutQty , SUM(gpSewBody.quantity) as sewQty
from jobOrder
inner join gpCutHead on gpCutHead.joNum = jobOrder.joNum
inner join gpSewHead on gpSewHead.joNum = jobOrder.joNum
inner join gpCutBody on gpCutBody.gpCutID = gpCutHead.gpCutID
inner join gpSewBody on gpSewBody.gpSewID = gpSewHead.gpSewID
If you are only interested in the quantities of cuts and sews for all orders, the simplest way to do it would be like this:
select (select SUM(gpCutBody.actualQty) from gpCutBody) as cutQty,
(select SUM(gpSewBody.quantity) from gpSewBody) as sewQty
(This assumes that cuts and sews will always have associated job orders.)
If you want to see a breakdown of cuts and sews by job order, something like this might be preferable:
select joNum, SUM(actualQty) as cutQty, SUM(quantity) as sewQty
from (select joNum, actualQty, 0 as quantity
from gpCutBody
union all
select joNum, 0 as actualQty, quantity
from gpSewBody) sc
group by joNum
Mark's approach is a good one. I want to suggest the alternative of doing the group by's before the union, simply because this can be a more general approach for summing along multiple dimensions.
Your problem is that you have two dimensions that you want to sum along, and you are getting a cross product of the values in the join.
select coalesce(act.joNum, q.joNum) as joNum, act.quantity as ActualQty, q.quantity as Quantity
from (select joNum, sum(actualQty) as quantity
from gpCutBody
group by joNum
) act full outer join
(select joNum, sum(quantity) as quantity
from gpSewBody
group by joNum
) q
on act.joNum = q.joNum
(I have kept Mark's assumption that doing this by joNum is the desired output.)
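To make the cross-product effect concrete, here is a small sketch with Python's built-in sqlite3 module. Like the answers above, it assumes that joNum is available directly on gpCutBody and gpSewBody (in the question's join it comes from the head tables), and the sample rows are invented. It uses the UNION ALL variant because FULL OUTER JOIN is only available in newer SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gpCutBody (joNum INTEGER, actualQty INTEGER);
    CREATE TABLE gpSewBody (joNum INTEGER, quantity INTEGER);
    -- job 1 has two cuts and three sews; job 2 has one cut and no sews
    INSERT INTO gpCutBody VALUES (1, 10), (1, 20), (2, 5);
    INSERT INTO gpSewBody VALUES (1, 7), (1, 8), (1, 9);
""")

# Joining the two detail tables pairs every cut row with every sew row
# of the same job, so each sum is inflated by the other side's row count.
crossed = conn.execute("""
    SELECT c.joNum, SUM(c.actualQty), SUM(s.quantity)
    FROM gpCutBody c
    INNER JOIN gpSewBody s ON s.joNum = c.joNum
    GROUP BY c.joNum
""").fetchall()

# Stacking the rows with UNION ALL keeps each detail row exactly once.
by_job = conn.execute("""
    SELECT joNum, SUM(actualQty) AS cutQty, SUM(quantity) AS sewQty
    FROM (SELECT joNum, actualQty, 0 AS quantity FROM gpCutBody
          UNION ALL
          SELECT joNum, 0 AS actualQty, quantity FROM gpSewBody) sc
    GROUP BY joNum
    ORDER BY joNum
""").fetchall()

print(crossed)  # inflated sums, and job 2 is missing entirely
print(by_job)   # one correct row per job
```

In the joined version the cut total for job 1 is tripled and the sew total doubled, and job 2 disappears because it has no sews. The stacked version reports 30 and 24 for job 1 and keeps job 2 with a sew quantity of 0.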