Optimizing Query With Subselect - sql

I'm trying to generate a sales report which lists each product plus total sales in a given month. It's a little tricky because the prices of products can change throughout the month. For example:
Between Jan-01 and Jan-15, my company sells 50 Widgets at a cost of $10 each
Between Jan-15 and Jan-31, my company sells 50 more Widgets at a cost of $15 each
The total sales of Widgets for January = (50 * 10) + (50 * 15) = $1250
This setup is represented in the database as follows:
Sales table
Sale_ID ProductID Sale_Date
1 1 2009-01-01
2 1 2009-01-01
3 1 2009-01-02
...
50 1 2009-01-15
51 1 2009-01-16
52 1 2009-01-17
...
100 1 2009-01-31
Prices table
Product_ID Sale_Date Price
1 2009-01-01 10.00
1 2009-01-16 15.00
When a price is defined in the Prices table, it is applied to all sales of the given Product_ID from the given Sale_Date going forward.
Basically, I'm looking for a query which returns data as follows:
Desired output
Sale_ID ProductID Sale_Date Price
1 1 2009-01-01 10.00
2 1 2009-01-01 10.00
3 1 2009-01-02 10.00
...
50 1 2009-01-15 10.00
51 1 2009-01-16 15.00
52 1 2009-01-17 15.00
...
100 1 2009-01-31 15.00
I have the following query:
SELECT
Sale_ID,
Product_ID,
Sale_Date,
(
SELECT TOP 1 Price
FROM Prices
WHERE
Prices.Product_ID = Sales.Product_ID
AND Prices.Sale_Date < Sales.Sale_Date
ORDER BY Prices.Sale_Date DESC
) as Price
FROM Sales
This works, but is there a more efficient query than a nested sub-select?
And before you point out that it would just be easier to include "price" in the Sales table, I should mention that the schema is maintained by another vendor and I'm unable to change it. And in case it matters, I'm using SQL Server 2000.

If you start storing start and end dates, or create a view that includes the start and end dates (you can even create an indexed view), then you can simplify your query considerably, provided you are certain there are no range overlaps:
SELECT
Sales.Sale_ID,
Sales.Product_ID,
Sales.Sale_Date,
Prices.Price
FROM Sales
JOIN Prices ON Prices.Product_ID = Sales.Product_ID
AND Sales.Sale_Date > Prices.StartDate AND Sales.Sale_Date <= Prices.EndDate
-- careful not to use BETWEEN; it includes both ends
Note:
A technique along these lines will allow you to do this with a view. If you need to index the view, it will have to be juggled around quite a bit.
create table t (d datetime)
insert t values(getdate())
insert t values(getdate()+1)
insert t values(getdate()+2)
go
create view myview
as
select start = isnull(max(t2.d), '1975-1-1'), finish = t1.d from t t1
left join t t2 on t1.d > t2.d
group by t1.d
select * from myview
start finish
----------------------- -----------------------
1975-01-01 00:00:00.000 2009-01-27 11:12:57.383
2009-01-27 11:12:57.383 2009-01-28 11:12:57.383
2009-01-28 11:12:57.383 2009-01-29 11:12:57.383
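Applied to the Prices table from the question, the same idea might look like the sketch below (untested; the view name PriceRanges and the 9999-12-31 sentinel date are made up here). Because this version keeps each row's own effective date as StartDate, the join uses >= / < rather than > / <=:
-- hypothetical view: one row per price, with the window it applies to
create view PriceRanges
as
select
p1.Product_ID,
StartDate = p1.Sale_Date,
EndDate   = isnull(min(p2.Sale_Date), '9999-12-31'),  -- open-ended for the latest price
Price     = p1.Price
from Prices p1
left join Prices p2
on p2.Product_ID = p1.Product_ID
and p2.Sale_Date > p1.Sale_Date
group by p1.Product_ID, p1.Sale_Date, p1.Price

-- then the report query becomes a plain range join
select s.Sale_ID, s.Product_ID, s.Sale_Date, r.Price
from Sales s
join PriceRanges r
on r.Product_ID = s.Product_ID
and s.Sale_Date >= r.StartDate
and s.Sale_Date < r.EndDate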

It's generally best to avoid these kinds of correlated subqueries. Here's a classic technique for such cases.
SELECT
s.Sale_ID,
s.Product_ID,
s.Sale_Date,
p1.Price
FROM Sales AS s
LEFT JOIN Prices AS p1 ON s.Product_ID = p1.Product_ID
AND s.Sale_Date >= p1.Sale_Date
LEFT JOIN Prices AS p2 ON s.Product_ID = p2.Product_ID
AND s.Sale_Date >= p2.Sale_Date
AND p2.Sale_Date > p1.Sale_Date
WHERE p2.Price IS NULL -- want this one not to be found
Use a left outer join on the pricing table as p2, and look for a NULL record demonstrating that the matched product-price record found in p1 is the most recent on or before the sales date.
(I would have inner-joined the first price match, but if there is none, it's nice to have the product show up anyway so you know there's a problem.)

Are you actually running into performance problems or are you just anticipating them? I would implement this exactly as you have, were my hands tied from a schema-modification standpoint as yours are.

I agree with Sean. The code you have written is very clean and understandable. If you are having performance issues, then take the extra effort to make the code faster. Otherwise, you are making the code more complex for no reason. Nested sub-selects are extremely useful when used judiciously.

The combination of Product_ID and Sale_Date is your foreign key. Try a select-join on Product_ID, Sale_Date.

Related

How to GROUP BY and aggregate fields after JOINS in Query

I have the following data which I got from the following query:
#  date                 quantity  name      season_id  contract_id  signing_date
1  2016-07-01 00:00:00  3         John Doe  4          3000         2016-10-20
2  2021-07-28 00:00:00  14        John Doe  5          3541         2021-01-28
3  2016-08-15 00:00:00  10        John Doe  5          3000         2016-10-20
4  2016-08-02 00:00:00  5         John Doe  5          1528         2016-03-02
WITH ws AS (select date, quantity,
name, season_id, contract_id, contract.signing_date
FROM warehouse_state
JOIN inventory ON inventory.id = warehouse_state.inventory_id
JOIN owner ON owner.inventory_id = warehouse_state.id
JOIN season ON season.id = owner.season_id
JOIN contract ON contract.id = warehouse_contract.contract_id
GROUP BY date, quantity, name, season.id, contract.id, signing_date)
Now, I am having trouble aggregating the ws records based on dates.
Let's say I want a SUM of quantity grouped by date, where date is before the contract's signing_date. I'm not sure how to proceed with this; it can probably be done in a single query without the WITH x AS clause, or with something that actually uses it, like:
SELECT * FROM ws
LEFT JOIN contract on contract.contract_id = ws.contract_id
-- Here set following condition: for any ws record that has `date` before `signing_date`, SUM quantity and return aggregate
Expected output:
contract_id  signing_date  quantity  name
3000         2016-10-20    18        John Doe
3541         2021-01-28    18        John Doe
1528         2021-01-28    0         John Doe
In the expected output, quantity is a SUM and the record is grouped by contract. In the first record, #1, #3, and #4 were aggregated because their date values are before the contract (3000) signing_date. Even though the 4th record does not have the same contract_id, it is also aggregated because its date field is before the signing date of contract 3000. Similarly, when grouped by contract 3541, record #2 is excluded from the aggregation because its date value is not before the signing_date of contract 3541.
Any suggestions? Thanks
Does that SQL really compile? The reason I ask is that I see you referencing an inventory table that I don't see defined anywhere.
Also, you are grouping on all columns -- essentially a "select distinct." Is that what you meant to do?
That aside, assuming your joins are correct and a couple of other assumptions, I'm going to sub them all with "< your tables and joins >." I think all you want is a simple aggregate. No need for a CTE (with clause).
select
date, sum (quantity)
FROM
< your tables and joins >
where
date < signing_date
GROUP BY
date
Alternatively, you can see the total quantity for all dates AND the total quantity before the contract date using a filter:
select
date, sum (quantity) as total_quantity,
sum (quantity) filter (where date < signing_date) as qty_before_contract_sign
FROM
< your tables and joins >
GROUP BY
date
If you wanted to see the other columns as well, then you want a windowing function. Let me know if that's the case and I can demonstrate.
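For illustration only (this is not from the original answer; the column list is taken from your sample data), a windowing version might look like the sketch below. It keeps every row and attaches the per-date total as an extra column:
select
date, quantity, name, contract_id, signing_date,
sum (quantity) over (partition by date) as total_quantity_for_date
FROM
< your tables and joins >
where
date < signing_date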
-- EDIT 9/7/22 --
Based on your update, I think this is what you want:
select
contract_id, contract.signing_date, sum (quantity) as quantity,
name
FROM warehouse_state
JOIN inventory ON inventory.id = warehouse_state.inventory_id
JOIN owner ON owner.inventory_id = warehouse_state.id
JOIN season ON season.id = owner.season_id
JOIN contract ON contract.id = warehouse_contract.contract_id
where
date < contract.signing_date
GROUP BY
contract_id, contract.signing_date, name
But the one gotcha is Contract 1528 will not show up in this output since it's filtered out by the where condition.
I'm not fond of this, but you could keep the filter to overcome this... maybe there's a better solution.
select
contract_id, contract.signing_date,
coalesce (sum (quantity) filter (where date < contract.signing_date), 0) as quantity,
name
FROM warehouse_state
JOIN inventory ON inventory.id = warehouse_state.inventory_id
JOIN owner ON owner.inventory_id = warehouse_state.id
JOIN season ON season.id = owner.season_id
JOIN contract ON contract.id = warehouse_contract.contract_id
GROUP BY
contract_id, contract.signing_date, name
Also, my output does not match yours, but I'm hoping that's because of sample data.

Is there SQL Logic to reduce type 2 table along a dimension

I have a slowly changing type 2 price change table which I need to reduce the size of to improve performance. Often rows are written to the table even if no price change occurred (when some other dimensional field changed), and the result is that for any product the table could be 3-10x the size it needs to be if it included only changes in price.
I'd like to compress the table so that it contains only the first effective date and the last expiration date for each price until that price changes. The solution also needs to:
Deal with an unknown number of rows with the same price
Deal with products going back to an old price
As an example, if I have this raw data:
Product  Price Effective Date  Price Expiration Date  Price
123456   6/22/18               9/19/18                120
123456   9/20/18               11/8/18                120
123456   11/9/18               11/29/18               120
123456   11/30/18              12/6/18                120
123456   12/7/18               12/19/18               85
123456   12/20/18              1/1/19                 85
123456   1/2/19                2/19/19                85
123456   2/20/19               2/20/19                120
123456   2/21/19               3/19/19                85
123456   3/20/19               5/22/19                85
123456   5/23/19               10/10/19               85
123456   10/11/19              6/19/19                80
123456   6/20/20               12/31/99               80
I need to transform it into this:
Product  Price Effective Date  Price Expiration Date  Price
123456   6/22/18               12/6/18                120
123456   12/7/18               2/19/19                85
123456   2/20/19               2/20/19                120
123456   2/21/19               10/10/19               85
123456   10/11/19              12/31/99               80
You can first find the intervals where the price does not change, and then group on those intervals:
with to_r as (select row_number() over (order by (select 1)) r, t.* from data_table t),
to_group as (select t.*, (select sum(t1.r < t.r and t1.price != t.price) from to_r t1) c from to_r t)
select t.product, min(t.effective), max(t.expiration), max(t.price) from to_group t group by t.c order by t.r;
Output:
Product  Price Effective Date  Price Expiration Date  Price
123456   6/22/18               12/6/18                120
123456   12/7/18               2/19/19                85
123456   2/20/19               2/20/19                120
123456   2/21/19               10/10/19               85
123456   10/11/19              12/31/99               80
This is a type of gaps-and-islands problem. I would recommend reconstructing the data, saving it in a temporary table, and then reloading the existing table.
The code to reconstruct the data is:
select product, price, min(effective_date), max(expiration_date)
from (select t.*,
sum(case when prev_expiration_date = effective_date - interval '1 day' then 0 else 1 end) over (partition by product order by effective_date) as grp
from (select t.*,
lag(expiration_date) over (partition by product, price order by effective_date) as prev_expiration_date
from t
) t
) t
group by product, price, grp;
Note that the logic for date arithmetic varies depending on the database.
Save this result into a temporary table, temp_t or whatever, using select into, create table as, or whatever your database supports.
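For example, in SQL Server the save step could be a SELECT ... INTO; this is only a sketch of that step (the date arithmetic is translated to DATEADD here, and other databases would use CREATE TABLE ... AS SELECT instead):
select product, price,
min(effective_date) as effective_date,
max(expiration_date) as expiration_date
into temp_t
from (select t.*,
sum(case when prev_expiration_date = dateadd(day, -1, effective_date) then 0 else 1 end)
over (partition by product order by effective_date) as grp
from (select t.*,
lag(expiration_date) over (partition by product, price order by effective_date) as prev_expiration_date
from t
) t
) t
group by product, price, grp;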
Then empty the current table and reload it:
truncate table t;
insert into t
select product, price, effective_date, expiration_date
from temp_t;
Notes:
Validate the data before using truncate table!
If there are triggers or columns with default values, you might want to be careful.
It sounds like you are asking for a temporal schema, where for a given date you can know the price of an asset?
This is done with two tables: price_current and price_history.
price_current

price_id  item_id  price  rec_created
1         1        100    '2015-04-18'

price_history

price_id  item_id  from          to            price
1         1        '2001-01-01'  '2004-05-01'  114
1         1        '2004-05-01'  '2015-04-18'  102
i.e. for any item, you can ascertain the date its price was set without polluting your "current" table. For this to work effectively you will need an UPDATE trigger on the current table. When you update a record, you insert the old details and the period they were valid for into the history table.
CREATE OR REPLACE TRIGGER trg_price_current_update
AS
BEGIN
INSERT INTO price_history(price_id, item_id, from, to, price)
SELECT price_id, item_id, rec_created, GETDATE(), price
FROM rows_updated
END
Now you have a distinction between current and historical, without your current table (presumably the busier table) getting out of hand because of maintaining historical state. Hope i understood the question.
To ignore 'dummy' updates, just alter the trigger to ignore empty changes (if that's not handled by the DBMS anyway). Tbh, this should and could be done application side easily enough, but to manage it via the trigger:
CREATE OR REPLACE TRIGGER trg_price_current_update
AS
BEGIN
INSERT INTO price_history(price_id, item_id, from, to, price)
SELECT p.price_id, p.item_id, p.rec_created, GETDATE(), p.price
FROM rows_updated u
INNER JOIN price_current p ON u.price_id = p.price_id
WHERE u.price <> p.price
END
i.e. rows_updated contains the records from the update; we insert the previous row into the history table, provided the previous row's price is different from the updated row's price.
(Edited to include the new trigger. I also changed the date held in rec_created: it must be the date the row is created, not the first time that product had a price assigned to it; that was a mistake. Regarding the dates, I haven't written out the full DD-MM-YYYY hh:mm:ss:zzz, but that precision would generally be useful in between queries.)
What you are asking for is a versioning system. Many RDBMS platforms implement support for this out of the box (it's a SQL standard), which may be suitable, depending on your requirements.
You have not tagged a specific platform so it's not possible to be specific to your situation. I use the concept of system versioning regularly in MS Sql Server, where you would implement it thus:
assuming schema "history" exists,
alter table dbo.MyTable add
ValidFrom datetime2 generated always as row start hidden constraint DF_MyTableSysStart default sysutcdatetime(),
ValidTo datetime2 generated always as row end hidden constraint DF_MyTableSysEnd default convert(datetime2, '9999-12-31 23:59:59.9999999'),
period for system_time (ValidFrom, ValidTo);
alter table MyTable set (system_versioning = on (history_table = History.MyTable));
create clustered index ix_MyTable on History.MyTable (ValidTo, ValidFrom) with (data_compression = page,drop_existing=on) on History;
A number of syntax extensions exist to aid querying the temporal data for example to find historical data at a point in time.
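For example, a point-in-time query against the versioned table above might look like this sketch (the timestamp is illustrative):
-- read dbo.MyTable as it looked at a given moment
select *
from dbo.MyTable
for system_time as of '2019-02-20T00:00:00';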
Alternatively, to utilise a single table but handle the duplication, you could create an instead of trigger.
The idea here is that the trigger gets to intercept the data before it is inserted, where you can check to see if the value is different from the last value and discard or insert as appropriate (a sketch of such a trigger follows the example output below).
something along the lines of:
WITH keeps AS
(
SELECT p.product_id, p.effective, p.expires, p.price,
CASE WHEN EXISTS(SELECT 1 FROM prices p1 WHERE p1.product_id = p.product_id AND p1.effective = DATEADD(DAY, 1, p.expires) AND p1.price <> p.price) THEN 1 ELSE 0 END AS has_after,
CASE WHEN EXISTS(SELECT 1 FROM prices p1 WHERE p1.product_id = p.product_id AND p1.expires = DATEADD(DAY, -1, p.effective) AND p1.price <> p.price) THEN 1 ELSE 0 END AS has_before
FROM prices p
)
SELECT product_id, effective, expires, price FROM keeps
WHERE has_after = 1
OR has_before = 1
UNION ALL
SELECT p.product_id, p.effective, p.expires, p.price
FROM prices p
WHERE p.effective = (SELECT MIN(effective) FROM prices p1 WHERE p1.product_id = p.product_id)
What's it doing:
Find all the entries for which there exists another entry whose effective date is one day after this entry's expiry date and whose price is different. This gives us all the actual changes in price. But we miss the first price entry, so we simply include that in the results.
e.g.:
product_id  effective  expires   price  has_before  has_after
123456      6/22/18    9/19/18   120    0           0
123456      9/20/18    11/8/18   120    0           0
123456      11/9/18    11/29/18  120    0           0
123456      11/30/18   12/6/18   120    0           1
123456      12/7/18    12/19/18  85     1           0
123456      12/20/18   1/1/19    85     0           0
123456      2/1/19     2/19/19   85     0           1
123456      2/20/19    2/20/19   120    1           1
123456      2/21/19    3/19/19   85     1           0
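For the "instead of" trigger idea mentioned above, a minimal sketch might look like the following (SQL Server syntax; the trigger name is made up and the table and column names are assumed to match the query above):
-- hypothetical trigger: discard inserted rows whose price matches the
-- immediately preceding price row for the same product
CREATE TRIGGER trg_prices_skip_dummy_rows
ON prices
INSTEAD OF INSERT
AS
BEGIN
INSERT INTO prices (product_id, effective, expires, price)
SELECT i.product_id, i.effective, i.expires, i.price
FROM inserted i
WHERE NOT EXISTS (
SELECT 1
FROM prices p
WHERE p.product_id = i.product_id
AND p.expires = DATEADD(DAY, -1, i.effective)
AND p.price = i.price  -- unchanged price: skip the insert
);
END
Note this sketch only covers discarding the duplicate row; the preceding row's expiration date would still need to be extended (or left open-ended) so the date ranges stay contiguous.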

Sql query to calculate first occurrence of a sales order not fulfilled by stock

I have two tables:
Sales Orders (SO) with fields: Part, Due_Date, Qty
Part with fields Part and Stock.
I am trying to write a query that will produce the first occurrence (by date, SO.Due_Date) where a sales order (SO.Qty) cannot be fulfilled by the stock.
This is easy if there is no stock, i.e. Part.Stock = 0, or if there is only one sales order for the part (SO.Qty > Part.Stock).
If there are multiple sales orders I only want the first one shown e.g.
Part.Part = Box , Part.Stock = 250
SO.Part | SO.Due_Date | SO.Qty
Box | 26/10/2014 | 100
Box | 27/10/2014 | 100
Box | 28/10/2014 | 100 * Return this row
Box | 29/10/2014 | 100
I think I need a sub query or need to use CTE but I can't work it out unless I use a loop. The tables have thousands of parts and sales orders and I am trying to run this query as quickly as possible.
Many thanks for your help
I assume this is a learning exercise, as no real business would work this way.
Anyway, here is a query to do what you want:
select *
from sales_order as so1
where due_date =
(select min(due_date)
from sales_order as so2
inner join part as p on p.part = so2.part
where so1.part = so2.part
and stock < (
select sum(quantity)
from sales_order as so3
where so3.due_date <= so2.due_date
and so3.part = so2.part
)
)
Which I have put into a working fiddle here: http://sqlfiddle.com/#!2/bd8ab5/1
There are some assumptions such as one order per date, but I believe it answers the question.
Here is a query that uses a self join to calculate the running quantity total for each row and selects the row with the smallest due date whose running total is greater than p.stock:
select so.part, so.due_date, so.quantity
from sales_order so
join part p on p.part = so.part
join sales_order so2 on so2.part = so.part
and so2.due_date <= so.due_date
where p.part = 'Box'
group by so.part, so.due_date, so.quantity
having sum(so2.quantity) > max(p.stock)
order by so.due_date limit 1
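If the database supports window functions (MySQL 8.0+, SQL Server 2012+, and so on), a running total can replace the self join. This is only a sketch under that assumption, using MySQL-style LIMIT:
select part, due_date, quantity
from (
select so.part, so.due_date, so.quantity, p.stock,
sum(so.quantity) over (partition by so.part order by so.due_date) as running_qty
from sales_order so
join part p on p.part = so.part
where p.part = 'Box'
) t
where running_qty > stock   -- first row where cumulative demand exceeds stock
order by due_date
limit 1;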

mysql group by date with multiple join

SELECT
tba.UpdatedDate AS UpdatedDate,
tsh.SupplierID,
ts.ProductCode as ProductCode,
sum(tba.AfterDiscount) as AfterDiscount,
sum(tba.Quantity) as Quantity
FROM
tblstockhistory as tsh
left join tblstock as ts
on tsh.StockID=ts.StockID
left join tblbasket as tba
on ts.ProductCode=tba.ProductCode
and tsh.SupplierID=49
AND tba.Status=3
group by
tba.UpdatedDate
ORDER BY
Quantity DESC
I have the supplier table, with the supplier ID tagged into the tblstockhistory table; tblstockhistory contains the StockID (a reference to the tblstock table), and the tblstock table contains StockID and ProductCode.
And I have the tblbasket table, in which I am maintaining the ProductCode.
My idea here:
I want to show the stats by SupplierID. When I pass the supplier ID, it should show the sales stats for the goods supplied by that supplier.
But the above query sometimes returns null values, and it takes too much time to execute, around 50 seconds.
I want something like the below from the above query:
Date SupplierID, Amount, Quantity
2010-12-12 12 12200 20
2010-12-12 40 10252 30
2010-12-12 10 12551 50
2010-12-13 22 1900 20
2010-12-13 40 18652 30
2010-12-13 85 19681 50
2010-12-15 22 1900 20
2010-12-15 40 18652 30
2010-12-15 85 19681 50
Does a tblstockhistory row ever exist without a StockID? If it doesn't, you can convert the join to an inner join, which can help.
e.g.
tblstockhistory as tsh
INNER join tblstock as ts
on tsh.StockID=ts.StockID
Also you might consider adding indexes if they don't currently exist.
At the very least I would have the following fields indexed, since they will likely be joined and queried commonly (a sketch of the statements follows the list):
tblstockhistory.StockID
tblstockhistory.SupplierID
tblstock.StockID
tblstock.ProductCode
tblbasket.ProductCode
tblbasket.Status
tblbasket.UpdatedDate
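For example (the index names are illustrative; composite indexes matching the query's join and filter columns may serve even better):
CREATE INDEX idx_tsh_stockid     ON tblstockhistory (StockID);
CREATE INDEX idx_tsh_supplierid  ON tblstockhistory (SupplierID);
CREATE INDEX idx_ts_productcode  ON tblstock (ProductCode);
CREATE INDEX idx_tba_product     ON tblbasket (ProductCode);
CREATE INDEX idx_tba_status      ON tblbasket (Status);
CREATE INDEX idx_tba_updated     ON tblbasket (UpdatedDate);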
Finally, if it's really important that this query be lightning fast, you can create summary tables and update them periodically.
Rewrite the group by clause as follows and try again:
group by
tba.UpdatedDate, tsh.SupplierID
You have mentioned ProductCode in your query but not in the result you wanted. If you want to display ProductCode as well, then add it to the group by clause; otherwise remove it from the select clause.

Selecting the latest per group of items [duplicate]

Possible Duplicate:
Retrieving the last record in each group
I have 2 tables, product and cost:
PRODUCT
ProdCode - PK
ProdName
COST
Effectivedate - PK
RetailCOst
Prodcode
I tried this query:
SELECT a.ProdCOde AS id, MAX(EffectiveDate) AS edate, RetailCOst AS retail
FROM cost a
INNER JOIN product b USING (ProdCode)
WHERE EffectiveDate <= '2009-10-01'
GROUP BY a.ProdCode;
It's showing the right EffectiveDate, but the cost on that specific EffectiveDate doesn't match.
So I want to select the latest date with the matching cost per item.
For example, the date I selected is '2009-12-25' and the records for one item are these:
ProdCode |EffectiveDate| Cost
00010000 | 2009-01-05 | 50
00010000 | 2009-05-25 | 48
00010000 | 2010-07-01 | 40
So as a result I should get 00010000 | 2009-05-25 | 48, because it is earlier than the date in my query and it is the latest for that item. And then I want my query to show the latest cost for each product.
Hope to hear from you soon! Thanks!
You need to use a subquery here:
SELECT maxdates.ProdCode, maxdates.maxDate, cost.RetailCost as retail
FROM (
SELECT ProdCode, max(EffectiveDate) as maxDate
FROM cost
WHERE EffectiveDate < '2009-10-01'
GROUP BY ProdCode
) maxdates
LEFT JOIN cost ON (maxdates.ProdCode=cost.ProdCode
AND maxdates.maxDate=cost.EffectiveDate)
Explanation:
The inner SELECT gives a list of all Products and their respective maximum EffectiveDates. The join "glues" the retail cost per data entry to the result.
Alternatively, the old max-concat trick should also work here.
SELECT
p.ProdCode,
SUBSTRING(MAX(CONCAT(c.EffectiveDate, c.RetailCost)), 1, 10) AS date,
SUBSTRING(MAX(CONCAT(c.EffectiveDate, c.RetailCost)), 11, 100) + 0 AS cost
FROM
product p,
cost c
WHERE
p.ProdCode = c.ProdCode AND
c.EffectiveDate < '2009-10-01'
GROUP BY
p.ProdCode