SQL Server 2008 version of OVER(... Rows Unbounded Preceding) - sql

I need help converting this query so that it runs on SQL Server 2008, as I just can't work it out. I've tried cross applies and inner joins (not saying I did them right) to no avail. Any suggestions?
The query takes a table of stock and a table of orders and combines the two to show me what is left to pick once the stock has been taken away (see my previous question for more details).
WITH ADVPICK
AS (SELECT 'A' AS PlaceA,
placeb,
CASE
WHEN picktime = '00:00' THEN '07:00'
ELSE ISNULL(picktime, '12:00')
END AS picktime,
Cast(product AS INT) AS product,
prd_description,
-qty AS Qty
FROM t_pick_orders
UNION ALL
SELECT 'A' AS PlaceA,
placeb,
'0',
Cast(code AS INT) AS product,
NULL,
stock
FROM t_pick_stock),
STOCK_POST_ORDER
AS (SELECT *,
Sum(qty)
OVER (
PARTITION BY placeb, product
ORDER BY picktime ROWS UNBOUNDED PRECEDING ) AS new_qty
FROM ADVPICK)
SELECT *,
CASE
WHEN new_qty > qty THEN new_qty
ELSE qty
END AS order_shortfall
FROM STOCK_POST_ORDER
WHERE new_qty < 0
ORDER BY placeb,
picktime,
product
The SUM() OVER (PARTITION BY ... ORDER BY ... ROWS UNBOUNDED PRECEDING) syntax requires SQL Server 2012 or later, but I have two servers that run 2008, so I need the query converted.
Expected Results:
+--------+--------+----------+---------+-----------+-------+---------+-----------------+
| PlaceA | PlaceB | Picktime | product | Prd_Descr | qty | new_qty | order_shortfall |
+--------+--------+----------+---------+-----------+-------+---------+-----------------+
| BW | AMES | 16:00 | 1356 | Product A | -1330 | -17 | -17 |
| BW | AMES | 16:00 | 17 | Product B | -48 | -42 | -42 |
| BW | AMES | 17:00 | 1356 | Product A | -840 | -857 | -840 |
| BW | AMES | 18:00 | 1356 | Product A | -770 | -1627 | -770 |
| BW | AMES | 18:00 | 17 | Product B | -528 | -570 | -528 |
| BW | AMES | 19:00 | 1356 | Product A | -700 | -2327 | -700 |
| BW | AMES | 20:00 | 1356 | Product A | -910 | -3237 | -910 |
| BW | AMES | 20:00 | 8009 | Product C | -192 | -52 | -52 |
| BW | AMES | 20:00 | 897 | Product D | -90 | -10 | -10 |
+--------+--------+----------+---------+-----------+-------+---------+-----------------+

One straightforward way to do it is to use a correlated subquery in CROSS APPLY.
If your table is fairly large, your next question will be how to make this fast. An index on (PlaceB, Product, PickTime) INCLUDE (Qty) should help. If your table is really large, a cursor would be better.
WITH
ADVPICK
AS
(
SELECT 'A' as PlaceA,PlaceB, case when PickTime = '00:00' then '07:00' else isnull(picktime,'12:00') end as picktime, cast(Product as int) as product, Prd_Description, -Qty AS Qty FROM t_pick_orders
UNION ALL
SELECT 'A' as PlaceA,PlaceB, '0', cast(Code as int) as product, NULL, Stock FROM t_pick_stock
)
,stock_post_order
AS
(
SELECT
*
FROM
ADVPICK AS Main
CROSS APPLY
(
SELECT SUM(Sub.Qty) AS new_qty
FROM ADVPICK AS Sub
WHERE
Sub.PlaceB = Main.PlaceB
AND Sub.Product = Main.Product
AND Sub.PickTime <= Main.PickTime
) AS A
)
SELECT
*,
CASE WHEN new_qty > qty THEN new_qty ELSE qty END AS order_shortfall
FROM
stock_post_order
WHERE
new_qty < 0
ORDER BY PlaceB, picktime, product;
Note that if (PlaceB, Product, PickTime) is not unique, you'll get somewhat different results from the original query with SUM() OVER, because rows that tie on PickTime all receive the same total. If you need exactly the same results, you need to use an extra column (such as an ID) to resolve the ties.
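The tie-breaking idea can be sketched as follows. This is a minimal, self-contained example for SQL Server 2008, not the original query: the table variable @moves, its sample rows, and the id column are all assumptions made up for illustration. The extra id comparison decides which of two rows with the same picktime comes first, so each row gets its own running total.

```sql
-- Hypothetical sample data; id is an assumed unique tie-breaker column.
DECLARE @moves TABLE (
    id       INT PRIMARY KEY,
    product  INT,
    picktime VARCHAR(5),
    qty      INT
);

INSERT INTO @moves (id, product, picktime, qty)
VALUES (1, 17, '0',        6),   -- opening stock (sorts before any pick time)
       (2, 17, '16:00',  -48),
       (3, 17, '18:00', -300),
       (4, 17, '18:00', -228);   -- same picktime as id 3: a tie

SELECT m.id, m.product, m.picktime, m.qty, rt.new_qty
FROM @moves AS m
CROSS APPLY (
    -- Sum every row that sorts at or before the current row,
    -- using id to order rows that share a picktime.
    SELECT SUM(s.qty) AS new_qty
    FROM @moves AS s
    WHERE s.product = m.product
      AND (s.picktime < m.picktime
           OR (s.picktime = m.picktime AND s.id <= m.id))
) AS rt
ORDER BY m.product, m.picktime, m.id;
```

With this sample data the running totals are 6, -42, -342 and -570. With only the picktime comparison, both 18:00 rows would show -570.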


Find the first order of a supplier in a day using SQL

I am trying to write a query to return supplier ID (sup_id), order date and the order ID of the first order (based on earliest time).
+--------+--------+------------+--------+-----------------+
|orderid | sup_id | items | sales | order_ts |
+--------+--------+------------+--------+-----------------+
|1111132 | 3 | 1 | 27,0 | 24/04/17 13:00 |
|1111137 | 3 | 2 | 69,0 | 02/02/17 16:30 |
|1111147 | 1 | 1 | 87,0 | 25/04/17 08:25 |
|1111153 | 1 | 3 | 82,0 | 05/11/17 10:30 |
|1111155 | 2 | 1 | 29,0 | 03/07/17 02:30 |
|1111160 | 2 | 2 | 44,0 | 30/01/17 20:45 |
|....... | ... | ... | ... | ... ... |
+--------+--------+------------+--------+-----------------+
Output I am looking for:
+--------+--------+------------+
| sup_id | date | order_id |
+--------+--------+------------+
|....... | ... | ... |
+--------+--------+------------+
I tried using a subquery in the join clause as below but didn't know how to join it without having selected order_id.
SELECT sup_id, date(order_ts), order_id
FROM sales s
JOIN
(
SELECT sup_id, date(order_ts) as date, min(time(order_date))
FROM sales
GROUP BY merchant_id, date
) m
on ...
Kindly assist.
You can use not exists:
select *
from sales
where not exists (
-- find sales for same supplier, earlier date, same day
select *
from sales as older
where older.sup_id = sales.sup_id
and older.order_ts < sales.order_ts
and older.order_ts >= cast(sales.order_ts as date)
)
The query below might not be the fastest in the world, but it should give you all information you need.
select order_id, sup_id, items, sales, order_ts
from sales s
where order_ts <= (
select min(order_ts)
from sales m
where m.sup_id = s.sup_id
)
select sup_id, min(order_ts), min(order_id) from sales
where order_ts = '2022-15-03'
group by sup_id
This assumes orderid is an identity / auto-increment column.
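Another common pattern for "first row per group" is ROW_NUMBER(). The sketch below is written in T-SQL (SQL Server 2008 or later) against a made-up @sales table variable. The question does not state its dialect, so CAST(order_ts AS DATE) may need to become date(order_ts) or similar elsewhere. Ordering by orderid as a second key is an assumption, added only to make ties deterministic.

```sql
-- Hypothetical sample data standing in for the sales table.
DECLARE @sales TABLE (
    orderid  INT PRIMARY KEY,
    sup_id   INT,
    order_ts DATETIME
);

INSERT INTO @sales (orderid, sup_id, order_ts)
VALUES (1, 3, '20170424 13:00'),
       (2, 3, '20170424 09:15'),   -- earliest order for supplier 3 on 24 April
       (3, 3, '20170425 08:00'),
       (4, 1, '20170425 08:25');

SELECT sup_id, order_date, orderid
FROM (
    SELECT sup_id,
           CAST(order_ts AS DATE) AS order_date,
           orderid,
           -- Number each supplier's orders within a day, earliest first.
           ROW_NUMBER() OVER (PARTITION BY sup_id, CAST(order_ts AS DATE)
                              ORDER BY order_ts, orderid) AS rn
    FROM @sales
) AS ranked
WHERE rn = 1
ORDER BY sup_id, order_date;
```

For the sample rows this returns orderid 4 for supplier 1, and orderid 2 and 3 for supplier 3 (one per day).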

Subtracting previous row value from current row

I'm doing an aggregation like this:
select
date,
product,
count(*) as cnt
from
t1
where
yyyy_mm_dd in ('2020-03-31', '2020-07-31', '2020-09-30', '2020-12-31')
group by
1,2
order by
product asc, date asc
This produces data which looks like this:
| date | product | cnt | difference |
|------------|---------|------|------------|
| 2020-03-31 | p1 | 100 | null |
| 2020-07-31 | p1 | 1000 | 900 |
| 2020-09-30 | p1 | 900 | -100 |
| 2020-12-31 | p1 | 1100 | 200 |
| 2020-03-31 | p2 | 200 | null |
| 2020-07-31 | p2 | 210 | 10 |
| ... | ... | ... | x |
But without the difference column. How could I make such a calculation? I could pivot the date column and subtract that way but maybe there's a better way
Was able to use lag with partition by and order by to get this to work:
select
date,
product,
count,
count - lag(count) over (partition by product order by date, product) as difference
from(
select
date,
product,
count(*) as count
from
t1
where
yyyy_mm_dd in ('2020-03-31', '2020-07-31', '2020-09-30', '2020-12-31')
group by
1,2
) t
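On engines without LAG (SQL Server 2008, for example), the same difference can be computed by numbering the rows and joining each row to its predecessor. This is a minimal T-SQL sketch, assuming the aggregated counts are already available. The @counts table variable, its column names, and its rows are illustrative only.

```sql
-- Hypothetical pre-aggregated counts (the output of the GROUP BY step).
DECLARE @counts TABLE (dt DATE, product VARCHAR(10), cnt INT);

INSERT INTO @counts (dt, product, cnt)
VALUES ('20200331', 'p1',  100),
       ('20200731', 'p1', 1000),
       ('20200930', 'p1',  900),
       ('20200331', 'p2',  200),
       ('20200731', 'p2',  210);

WITH numbered AS (
    SELECT dt, product, cnt,
           ROW_NUMBER() OVER (PARTITION BY product ORDER BY dt) AS rn
    FROM @counts
)
SELECT cur.dt, cur.product, cur.cnt,
       cur.cnt - prev.cnt AS difference   -- NULL for the first row of each product
FROM numbered AS cur
LEFT JOIN numbered AS prev
       ON prev.product = cur.product
      AND prev.rn = cur.rn - 1
ORDER BY cur.product, cur.dt;
```

For p1 the difference column comes out as NULL, 900, -100, and for p2 as NULL, 10.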

SQL How to Pivot this table?

I have a very hard time understanding how to pivot something.
I have this simple query
select
year
,AVG(Quantity) Quantity
,AVG(Price) Price
,CAST(Datepart(wk,Date) as nvarchar) + '-' + RIGHT(CAST(year(Date) as NVARCHAR),2) Week
from Yearly
GROUP BY Year, CAST(Datepart(wk,Date) as nvarchar) + '-' + RIGHT(CAST(year(Date) as NVARCHAR),2)
Which results in this table
+------+----------+---------------+------+
| year | Quantity | Price | Week |
+------+----------+---------------+------+
| 16 | 877814 | 68636081.39 | 6-20 |
| 17 | 436029 | 2635873.72 | 6-20 |
| 18 | 3793464 | 65971353.61 | 6-20 |
| 19 | 23552519 | 478741292.122 | 6-20 |
| 20 | 6973687 | 34658140.815 | 6-20 |
| Z01 | 7776508 | 54949609.221 | 6-20 |
+------+----------+---------------+------+
Right now I only have the one week, but as the days go by, I have a job that is going to build those 6 rows for 7-20, 8-20, 9-20, etc.
I want my table to look like
+------+--------+-------------+--------+------------+---------+-------------+----------+-------------+---------+-------------+---------+-------------+----------+-------------+
| | 16 | 17 | 18 | 19 | 20 | Z01 | Total | |
+------+--------+-------------+--------+------------+---------+-------------+----------+-------------+---------+-------------+---------+-------------+----------+-------------+
| Week | Qty | Price | Qty | Price | Qty | Price | Qty | Price | Qty | Price | Qty | Price | Qty | Price |
| 6-20 | 877814 | 68636081.39 | 436029 | 2635873.72 | 3793464 | 65971353.61 | 23552519 | 478741292.1 | 6973687 | 34658140.82 | 7776508 | 54949609.22 | 43410021 | 705592350.9 |
| 7-20 | | | | | | | | | | | | | | |
| 8-20 | | | | | | | | | | | | | | |
+------+--------+-------------+--------+------------+---------+-------------+----------+-------------+---------+-------------+---------+-------------+----------+-------------+
Should I use Pivot or is there a better way to do this?
This is a variation on pwilcox's answer, but more concise:
select v.week,
avg(case when year = '16' then quantity end) as quantityYr16,
avg(case when year = '16' then price end) as priceYr16,
avg(case when year = '17' then quantity end) as quantityYr17,
avg(case when year = '17' then price end) as priceYr17,
. . .
sum(quantity) as totalQuantity,
sum(price) as totalPrice
from yearly cross apply
(values (concat(datename(week, date), '-', datename(year, date)))
) v(week)
group by v.week
order by v.week;
Notes:
Never use varchar() without a length. The default length varies by context and may not be long enough.
datename() is a convenient function that returns strings and not numbers.
When using the date part functions, spell out the full names of the date parts -- week, year. This makes the code easier to read.
For a multi column pivot of the sort that you're wanting, you're going to have to take advantage of the fact that aggregate operations don't consider null values. So place case statements inside your averages that give the quantity or price value if associated with any given year, and null otherwise.
select ap.week,
quantityYr16 = avg(case when year = '16' then quantity end),
priceYr16 = avg(case when year = '16' then price end),
quantityYr17 = avg(case when year = '17' then quantity end),
priceYr17 = avg(case when year = '17' then price end),
...
from yearly
cross apply (select week =
cast(datepart(wk,date) as nvarchar) + '-' +
right(cast(year(date) as nvarchar),2)
) ap
group by ap.week
However, this structure is for reporting. SQL doesn't handle it as well as reporting tools such as HTML, SSRS or Excel. I would do this operation with whatever reporting tool you ultimately report this with.
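The NULL-skipping behaviour that conditional aggregation relies on can be checked with a small sketch. The @yearly table variable and its rows are made up for illustration. The year is stored as text here because the question's data contains a 'Z01' value, so the comparisons use string literals.

```sql
-- Hypothetical rows: two for year '16' and one for year '17', all in week 6-20.
DECLARE @yearly TABLE (yr VARCHAR(3), wk VARCHAR(5), quantity INT);

INSERT INTO @yearly (yr, wk, quantity)
VALUES ('16', '6-20',  10),
       ('16', '6-20',  30),
       ('17', '6-20', 100);

SELECT wk,
       -- CASE without ELSE yields NULL for other years; AVG ignores those NULLs.
       AVG(CASE WHEN yr = '16' THEN quantity END) AS quantityYr16,
       AVG(CASE WHEN yr = '17' THEN quantity END) AS quantityYr17,
       SUM(quantity)                              AS totalQuantity
FROM @yearly
GROUP BY wk;
```

This returns one row for week 6-20 with quantityYr16 = 20, quantityYr17 = 100 and totalQuantity = 140. The year-17 row does not drag down the year-16 average.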
Here is a PIVOT approach. It assumes you do not need a dynamic pivot.
Example
Select *
From (
Select A.Week
,B.*
From (
-- YOUR ORIGINAL QUERY HERE (without the Order By) ---
) A
Cross Apply ( values (concat(year,'_Qty') ,[Quantity])
,(concat(year,'_Price'),[Price])
,(concat('Total','_Qty'),[Quantity])
,(concat('Total','_Price'),[Price])
) B(item,value)
) src
Pivot (sum(Value) for Item in ([16_Qty],[16_Price],[17_Qty],[17_Price],[18_Qty],[18_Price],[19_Qty],[19_Price],[20_Qty],[20_Price],[Z01_Qty],[Z01_Price],[Total_Qty],[Total_Price]) ) pvt

Duplicate records upon joining table

I am still very new to SQL and Tableau however I am trying to work myself towards achieving a personal project of mine.
Table A; shows a table which contains the defect quantity per product category and when it was raised
+--------+-------------+--------------+-----------------+
| Issue# | Date_Raised | Category_ID# | Defect_Quantity |
+--------+-------------+--------------+-----------------+
| PCR12 | 11-Jan-2019 | Product#1 | 14 |
| PCR13 | 12-Jan-2019 | Product#1 | 54 |
| PCR14 | 5-Feb-2019 | Product#1 | 5 |
| PCR15 | 5-Feb-2019 | Product#2 | 7 |
| PCR16 | 20-Mar-2019 | Product#1 | 76 |
| PCR17 | 22-Mar-2019 | Product#2 | 5 |
| PCR18 | 25-Mar-2019 | Product#1 | 89 |
+--------+-------------+--------------+-----------------+
Table B; shows the consumption quantity of each product by month
+-------------+--------------+-------------------+
| Date_Raised | Category_ID# | Consumed_Quantity |
+-------------+--------------+-------------------+
| 5-Jan-2019 | Product#1 | 100 |
| 17-Jan-2019 | Product#1 | 200 |
| 5-Feb-2019 | Product#1 | 100 |
| 8-Feb-2019 | Product#2 | 50 |
| 10-Mar-2019 | Product#1 | 100 |
| 12-Mar-2019 | Product#2 | 50 |
+-------------+--------------+-------------------+
END RESULT
I would like to create a table/bar chart in tableau that shows that Defect_Quantity/Consumed_Quantity per month, per Category_ID#, so something like this below;
+----------+-----------+-----------+
| Month | Product#1 | Product#2 |
+----------+-----------+-----------+
| Jan-2019 | 23% | |
| Feb-2019 | 5% | 14% |
| Mar-2019 | 89% | 10% |
+----------+-----------+-----------+
WHAT I HAVE TRIED SO FAR
Unfortunately I have not really done anything yet; I am struggling to understand how to get rid of the duplicates that appear when joining the tables on Category_ID#.
Appreciate all the help I can receive here.
I can think of doing left joins for both Product#1 and Product#2.
select to_char(to_date(t1.Date_Raised,'dd-mon-yyyy'),'mon-yyyy') as Month_Raised
, (p1.product1 - sum(case when t1.category_id='Product#1' then t1.Defect_Quantity else 0 end))/p1.product1 * 100
, (p2.product2 - sum(case when t1.category_id='Product#2' then t1.Defect_Quantity else 0 end))/p2.product2 * 100
from tableA t1
left join
(select to_char(to_date(Date_Raised,'dd-mon-yyyy'),'mon-yyyy') Date_Raised
, sum(Consumed_Quantity) as product1
from tableB
where category_id = 'Product#1'
group by to_char(to_date(Date_Raised,'dd-mon-yyyy'),'mon-yyyy')) p1
on p1.Date_Raised = to_char(to_date(t1.Date_Raised,'dd-mon-yyyy'),'mon-yyyy')
left join
(select to_char(to_date(Date_Raised,'dd-mon-yyyy'),'mon-yyyy') Date_Raised
, sum(Consumed_Quantity) as product2
from tableB
where category_id = 'Product#2'
group by to_char(to_date(Date_Raised,'dd-mon-yyyy'),'mon-yyyy')) p2
on p2.Date_Raised = to_char(to_date(t1.Date_Raised,'dd-mon-yyyy'),'mon-yyyy')
group by to_char(to_date(t1.Date_Raised,'dd-mon-yyyy'),'mon-yyyy'), p1.product1, p2.product2
By using ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) AS RN, you can remove duplicate rows. For your end result, extract the month from the date and use a pivot.
I would do this as:
select to_char(date_raised, 'YYYY-MM'),
(sum(case when product = 'Product#1' then defect_quantity end) /
sum(case when product = 'Product#1' then consumed_quantity end)
) as product1,
(sum(case when product = 'Product#2' then defect_quantity end) /
sum(case when product = 'Product#2' then consumed_quantity end)
) as product2
from ((select date_raised, product, defect_quantity, 0 as consumed_quantity
from a
) union all
(select date_raised, product, 0 as defect_quantity, consumed_quantity
from b
)
) ab
group by to_char(date_raised, 'YYYY-MM')
order by min(date_raised);
(I changed the date format because I much prefer YYYY-MM, but that is irrelevant to the logic.)
Why do I prefer this method? It includes all months where there is a row in either table. I don't have to worry that some months are inadvertently filtered out because production or defect rows are missing for that month.
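The UNION ALL idea can be tried on a few of the question's January rows. The sketch below uses T-SQL to match the other examples on this page (the answer above uses to_char, so its dialect differs). The table variables @a and @b, the category column name, and the NULLIF guard against division by zero are assumptions added for illustration.

```sql
-- Hypothetical copies of the January rows for Product#1.
DECLARE @a TABLE (date_raised DATE, category VARCHAR(20), defect_quantity INT);
DECLARE @b TABLE (date_raised DATE, category VARCHAR(20), consumed_quantity INT);

INSERT INTO @a (date_raised, category, defect_quantity)
VALUES ('20190111', 'Product#1', 14),
       ('20190112', 'Product#1', 54);

INSERT INTO @b (date_raised, category, consumed_quantity)
VALUES ('20190105', 'Product#1', 100),
       ('20190117', 'Product#1', 200);

WITH ab AS (
    -- Stack the two tables instead of joining them, so no row is duplicated.
    SELECT date_raised, category, defect_quantity, 0 AS consumed_quantity FROM @a
    UNION ALL
    SELECT date_raised, category, 0, consumed_quantity FROM @b
)
SELECT CONVERT(CHAR(7), date_raised, 120) AS month_key,   -- 'yyyy-mm'
       category,
       SUM(defect_quantity)   AS defects,
       SUM(consumed_quantity) AS consumed,
       100.0 * SUM(defect_quantity) / NULLIF(SUM(consumed_quantity), 0) AS defect_pct
FROM ab
GROUP BY CONVERT(CHAR(7), date_raised, 120), category;
```

For these rows the result is a single 2019-01 row with 68 defects and 300 consumed, roughly 22.67 percent, which lines up with the 23% in the question's expected output. A plain join on category would instead pair every defect row with every consumption row and inflate both sums.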

Gaps And Islands: Splitting Islands Based On External Table

My scenario started off similar to a gaps-and-islands problem, where I needed to find consecutive days of work. My current SQL query answers "ProductA was produced at LocationA from DateA through DateB, totaling X quantity".
However, this does not suffice when I needed to throw prices into the mix. Prices are in a separate table and handled in C# after the fact. Price changes are essentially a list of records that say "ProductA from LocationA is now Y value per unit effective DateC".
The end result is it works as long as the island does not overlap with a price-change date, but if it does overlap, I get a "close" answer, but it's not precise.
The C# code can handle applying the prices efficiently; what I need to do is split the islands based on price changes. My goal is to make the SQL's partitioning take into account the ranking of days from the other table, but I'm having trouble applying what I want to do.
The current SQL that generates my island is as follows
SELECT MIN(ScheduledDate) as StartDate, MAX(ScheduledDate) as
EndDate, ProductId, DestinationId, SUM(Quantity) as TotalQuantity
FROM (
SELECT ScheduledDate, DestinationId, ProductId, PartitionGroup = DATEADD(DAY ,-1 * DENSE_RANK() OVER (ORDER BY ScheduledDate), ScheduledDate), Quantity
FROM History
) tmp
GROUP BY PartitionGroup, DestinationId, ProductId;
The current SQL that takes from the PriceChange table and ranks the dates is as follows
DECLARE @PriceChangeDates TABLE(Rank int, SplitDate Date);
INSERT INTO @PriceChangeDates
SELECT DENSE_RANK() over (ORDER BY EffectiveDate) as Rank, EffectiveDate as SplitDate
FROM ProductPriceChange
GROUP BY EffectiveDate;
My thought is to update the first query's inner SELECT statement to take advantage of the @PriceChangeDates table created by the second query. I would think we can multiply the DATEADD's increment parameter by the rank from the declared table, but I am struggling to write it.
If I were to do this with loops, I would determine which rank each ScheduledDate gets from the @PriceChangeDates table, where its rank is the rank of the closest date that is smaller than itself. Then I would take that rank and, I would think, multiply it by the increment parameter being passed in (or do some math, for example multiplying the existing parameter by @PriceChangeDates.Count() and then adding the new rank to avoid collisions). However, that's "loop" logic, not "set" logic, and in SQL I need to think in sets.
Any and all help/advice is greatly appreciated. Thank you :)
UPDATE:
Sample data & example on SQLFiddle: http://www.sqlfiddle.com/#!18/af568/1
Where the data is:
CREATE TABLE History
(
ProductId int,
DestinationId int,
ScheduledDate date,
Quantity float
);
INSERT INTO History (ProductId, DestinationId, ScheduledDate, Quantity)
VALUES
(0, 1000, '20180401', 5),
(0, 1000, '20180402', 10),
(0, 1000, '20180403', 7),
(3, 5000, '20180507', 15),
(3, 5000, '20180508', 23),
(3, 5000, '20180509', 52),
(3, 5000, '20180510', 12),
(3, 5000, '20180511', 14);
CREATE TABLE PriceChange
(
ProductId int,
DestinationId int,
EffectiveDate date,
Price float
);
INSERT INTO PriceChange (ProductId, DestinationId, EffectiveDate, Price)
VALUES
(0, 1000, '20180201', 1),
(0, 1000, '20180402', 2),
(3, 5000, '20180101', 5),
(3, 5000, '20180510', 20);
The desired results would be to have a SQL statement that generates the result:
StartDate EndDate ProductId DestinationId TotalQuantity
2018-04-01 2018-04-01 0 1000 5
2018-04-02 2018-04-03 0 1000 17
2018-05-07 2018-05-09 3 5000 90
2018-05-10 2018-05-11 3 5000 26
To clarify, the end result does need the TotalQuantity of each split amount, so that the procedural code that manipulates the results and applies the pricing knows how much of each product was on each side of the price change and can accurately determine the values.
Here is one more variant that is likely to perform better than my first answer. I decided to put it as a second answer, because the approach is rather different and the answer would be too long. You should compare performance of all variants with your real data on your hardware, and don't forget about indexes.
In the first variant I was using APPLY to pick a relevant price for each row in the History table. For each row from the History table the engine is searching for a relevant row from the PriceChange table. Even with appropriate index on the PriceChange table when this is done via a single seek, it still means 3.7 million seeks in a loop join.
We can simply join History and PriceChange tables together and with appropriate indexes on both tables it will be an efficient merge join.
Here I'm also using an extended sample data set to illustrate the gaps. I added these rows to the sample data from the question.
INSERT INTO History (ProductId, DestinationId, ScheduledDate, Quantity)
VALUES
(0, 1000, '20180601', 5),
(0, 1000, '20180602', 10),
(0, 1000, '20180603', 7),
(3, 5000, '20180607', 15),
(3, 5000, '20180608', 23),
(3, 5000, '20180609', 52),
(3, 5000, '20180610', 12),
(3, 5000, '20180611', 14);
Intermediate query
We do a FULL JOIN here, not a LEFT JOIN because it is possible that the date on which the price changed doesn't appear in the History table at all.
WITH
CTE_Join
AS
(
SELECT
ISNULL(History.ProductId, PriceChange.ProductID) AS ProductID
,ISNULL(History.DestinationId, PriceChange.DestinationId) AS DestinationId
,ISNULL(History.ScheduledDate, PriceChange.EffectiveDate) AS ScheduledDate
,History.Quantity
,PriceChange.Price
FROM
History
FULL JOIN PriceChange
ON PriceChange.ProductID = History.ProductID
AND PriceChange.DestinationId = History.DestinationId
AND PriceChange.EffectiveDate = History.ScheduledDate
)
,CTE2
AS
(
SELECT
ProductID
,DestinationId
,ScheduledDate
,Quantity
,Price
,MAX(CASE WHEN Price IS NOT NULL THEN ScheduledDate END)
OVER (PARTITION BY ProductID, DestinationId ORDER BY ScheduledDate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
FROM CTE_Join
)
SELECT *
FROM CTE2
ORDER BY
ProductID
,DestinationId
,ScheduledDate
Create the following indexes
CREATE UNIQUE NONCLUSTERED INDEX [IX_History] ON [dbo].[History]
(
[ProductId] ASC,
[DestinationId] ASC,
[ScheduledDate] ASC
)
INCLUDE ([Quantity])
CREATE UNIQUE NONCLUSTERED INDEX [IX_Price] ON [dbo].[PriceChange]
(
[ProductId] ASC,
[DestinationId] ASC,
[EffectiveDate] ASC
)
INCLUDE ([Price])
and the join will be an efficient MERGE join in the execution plan (not a LOOP join)
Intermediate result
+-----------+---------------+---------------+----------+-------+------------+
| ProductID | DestinationId | ScheduledDate | Quantity | Price | grp |
+-----------+---------------+---------------+----------+-------+------------+
| 0 | 1000 | 2018-02-01 | NULL | 1 | 2018-02-01 |
| 0 | 1000 | 2018-04-01 | 5 | NULL | 2018-02-01 |
| 0 | 1000 | 2018-04-02 | 10 | 2 | 2018-04-02 |
| 0 | 1000 | 2018-04-03 | 7 | NULL | 2018-04-02 |
| 0 | 1000 | 2018-06-01 | 5 | NULL | 2018-04-02 |
| 0 | 1000 | 2018-06-02 | 10 | NULL | 2018-04-02 |
| 0 | 1000 | 2018-06-03 | 7 | NULL | 2018-04-02 |
| 3 | 5000 | 2018-01-01 | NULL | 5 | 2018-01-01 |
| 3 | 5000 | 2018-05-07 | 15 | NULL | 2018-01-01 |
| 3 | 5000 | 2018-05-08 | 23 | NULL | 2018-01-01 |
| 3 | 5000 | 2018-05-09 | 52 | NULL | 2018-01-01 |
| 3 | 5000 | 2018-05-10 | 12 | 20 | 2018-05-10 |
| 3 | 5000 | 2018-05-11 | 14 | NULL | 2018-05-10 |
| 3 | 5000 | 2018-06-07 | 15 | NULL | 2018-05-10 |
| 3 | 5000 | 2018-06-08 | 23 | NULL | 2018-05-10 |
| 3 | 5000 | 2018-06-09 | 52 | NULL | 2018-05-10 |
| 3 | 5000 | 2018-06-10 | 12 | NULL | 2018-05-10 |
| 3 | 5000 | 2018-06-11 | 14 | NULL | 2018-05-10 |
+-----------+---------------+---------------+----------+-------+------------+
You can see that the Price column has a lot of NULL values. We need to "fill" these NULL values with the preceding non-NULL value.
Itzik Ben-Gan wrote a nice article showing how to solve this efficiently: "The Last non NULL Puzzle". Also see "Best way to replace NULL with most recent non-null value".
This is done in CTE2 using MAX window function and you can see how it populates the grp column. This requires SQL Server 2012+. After the groups are determined we should remove rows where Quantity is NULL, because these rows are not from the History table.
Now we can do the same gaps-and-islands step using the grp column as an additional partitioning.
The rest of the query is pretty much the same as in the first variant.
Final query
WITH
CTE_Join
AS
(
SELECT
ISNULL(History.ProductId, PriceChange.ProductID) AS ProductID
,ISNULL(History.DestinationId, PriceChange.DestinationId) AS DestinationId
,ISNULL(History.ScheduledDate, PriceChange.EffectiveDate) AS ScheduledDate
,History.Quantity
,PriceChange.Price
FROM
History
FULL JOIN PriceChange
ON PriceChange.ProductID = History.ProductID
AND PriceChange.DestinationId = History.DestinationId
AND PriceChange.EffectiveDate = History.ScheduledDate
)
,CTE2
AS
(
SELECT
ProductID
,DestinationId
,ScheduledDate
,Quantity
,Price
,MAX(CASE WHEN Price IS NOT NULL THEN ScheduledDate END)
OVER (PARTITION BY ProductID, DestinationId ORDER BY ScheduledDate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
FROM CTE_Join
)
,CTE_RN
AS
(
SELECT
ProductID
,DestinationId
,ScheduledDate
,grp
,Quantity
,ROW_NUMBER() OVER (PARTITION BY ProductId, DestinationId, grp ORDER BY ScheduledDate) AS rn1
,DATEDIFF(day, '20000101', ScheduledDate) AS rn2
FROM CTE2
WHERE Quantity IS NOT NULL
)
SELECT
ProductId
,DestinationId
,MIN(ScheduledDate) AS StartDate
,MAX(ScheduledDate) AS EndDate
,SUM(Quantity) AS TotalQuantity
FROM
CTE_RN
GROUP BY
ProductId
,DestinationId
,grp
,rn2-rn1
ORDER BY
ProductID
,DestinationId
,StartDate
;
Final result
+-----------+---------------+------------+------------+---------------+
| ProductId | DestinationId | StartDate | EndDate | TotalQuantity |
+-----------+---------------+------------+------------+---------------+
| 0 | 1000 | 2018-04-01 | 2018-04-01 | 5 |
| 0 | 1000 | 2018-04-02 | 2018-04-03 | 17 |
| 0 | 1000 | 2018-06-01 | 2018-06-03 | 22 |
| 3 | 5000 | 2018-05-07 | 2018-05-09 | 90 |
| 3 | 5000 | 2018-05-10 | 2018-05-11 | 26 |
| 3 | 5000 | 2018-06-07 | 2018-06-11 | 116 |
+-----------+---------------+------------+------------+---------------+
This variant doesn't output the relevant price (as the first variant does), because I simplified the "last non-null" query. It wasn't required in the question. In any case, it is pretty easy to add the price if needed.
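One possible way to add the price is sketched here under stated assumptions. Each grp contains exactly one row with a non-NULL Price (the price-change row itself), so MAX(Price) over the group spreads that price to every row in it. The @cte2 table variable stands in for the output of CTE2 and holds a few rows copied by hand from the intermediate result; it is not part of the original answer. The window is computed in a CTE and the Quantity filter is applied outside it, because a WHERE at the same level would remove the price-change row before the window function sees it.

```sql
-- Hypothetical stand-in for the output of CTE2 (rows for product 3 only).
DECLARE @cte2 TABLE (
    ProductID     INT,
    DestinationId INT,
    ScheduledDate DATE,
    Quantity      FLOAT,
    Price         FLOAT,
    grp           DATE
);

INSERT INTO @cte2 (ProductID, DestinationId, ScheduledDate, Quantity, Price, grp)
VALUES (3, 5000, '20180101', NULL,    5, '20180101'),  -- price-change row only
       (3, 5000, '20180509',   52, NULL, '20180101'),
       (3, 5000, '20180510',   12,   20, '20180510'),  -- change date that also has history
       (3, 5000, '20180511',   14, NULL, '20180510');

WITH filled AS (
    SELECT ProductID, DestinationId, ScheduledDate, Quantity, grp,
           -- Copy the single non-NULL price of each group to all of its rows.
           MAX(Price) OVER (PARTITION BY ProductID, DestinationId, grp) AS EffectivePrice
    FROM @cte2
)
SELECT ProductID, DestinationId, ScheduledDate, Quantity, EffectivePrice
FROM filled
WHERE Quantity IS NOT NULL            -- drop rows that exist only in PriceChange
ORDER BY ProductID, DestinationId, ScheduledDate;
```

For the sample rows this keeps the three History rows with effective prices 5, 20 and 20. MAX() OVER with only PARTITION BY does not need SQL Server 2012, although the grp column itself is still built with the 2012+ window frame.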
The straight-forward method is to fetch the effective price for each row of History and then generate gaps and islands taking price into account.
It is not clear from the question what is the role of DestinationID. Sample data is of no help here.
I'll assume that we need to join and partition on both ProductID and DestinationID.
The following query returns effective Price for each row from History.
You need to add index to the PriceChange table
CREATE NONCLUSTERED INDEX [IX] ON [dbo].[PriceChange]
(
[ProductId] ASC,
[DestinationId] ASC,
[EffectiveDate] DESC
)
INCLUDE ([Price])
for this query to work efficiently.
Query for Prices
SELECT
History.ProductId
,History.DestinationId
,History.ScheduledDate
,History.Quantity
,A.Price
FROM
History
OUTER APPLY
(
SELECT TOP(1)
PriceChange.Price
FROM
PriceChange
WHERE
PriceChange.ProductID = History.ProductID
AND PriceChange.DestinationId = History.DestinationId
AND PriceChange.EffectiveDate <= History.ScheduledDate
ORDER BY
PriceChange.EffectiveDate DESC
) AS A
ORDER BY ProductID, ScheduledDate;
For each row from History there will be one seek in this index to pick the correct price.
This query returns:
Prices
+-----------+---------------+---------------+----------+-------+
| ProductId | DestinationId | ScheduledDate | Quantity | Price |
+-----------+---------------+---------------+----------+-------+
| 0 | 1000 | 2018-04-01 | 5 | 1 |
| 0 | 1000 | 2018-04-02 | 10 | 2 |
| 0 | 1000 | 2018-04-03 | 7 | 2 |
| 3 | 5000 | 2018-05-07 | 15 | 5 |
| 3 | 5000 | 2018-05-08 | 23 | 5 |
| 3 | 5000 | 2018-05-09 | 52 | 5 |
| 3 | 5000 | 2018-05-10 | 12 | 20 |
| 3 | 5000 | 2018-05-11 | 14 | 20 |
+-----------+---------------+---------------+----------+-------+
Now a standard gaps-and-island step to collapse consecutive days with the same price together. I use a difference of two row number sequences here.
I've added some more rows to your sample data to see the gaps within the same ProductId.
INSERT INTO History (ProductId, DestinationId, ScheduledDate, Quantity)
VALUES
(0, 1000, '20180601', 5),
(0, 1000, '20180602', 10),
(0, 1000, '20180603', 7),
(3, 5000, '20180607', 15),
(3, 5000, '20180608', 23),
(3, 5000, '20180609', 52),
(3, 5000, '20180610', 12),
(3, 5000, '20180611', 14);
If you run this intermediate query you'll see how it works:
WITH
CTE_Prices
AS
(
SELECT
History.ProductId
,History.DestinationId
,History.ScheduledDate
,History.Quantity
,A.Price
FROM
History
OUTER APPLY
(
SELECT TOP(1)
PriceChange.Price
FROM
PriceChange
WHERE
PriceChange.ProductID = History.ProductID
AND PriceChange.DestinationId = History.DestinationId
AND PriceChange.EffectiveDate <= History.ScheduledDate
ORDER BY
PriceChange.EffectiveDate DESC
) AS A
)
,CTE_rn
AS
(
SELECT
ProductId
,DestinationId
,ScheduledDate
,Quantity
,Price
,ROW_NUMBER() OVER (PARTITION BY ProductId, DestinationId, Price ORDER BY ScheduledDate) AS rn1
,DATEDIFF(day, '20000101', ScheduledDate) AS rn2
FROM
CTE_Prices
)
SELECT *
,rn2-rn1 AS Diff
FROM CTE_rn
Intermediate result
+-----------+---------------+---------------+----------+-------+-----+------+------+
| ProductId | DestinationId | ScheduledDate | Quantity | Price | rn1 | rn2 | Diff |
+-----------+---------------+---------------+----------+-------+-----+------+------+
| 0 | 1000 | 2018-04-01 | 5 | 1 | 1 | 6665 | 6664 |
| 0 | 1000 | 2018-04-02 | 10 | 2 | 1 | 6666 | 6665 |
| 0 | 1000 | 2018-04-03 | 7 | 2 | 2 | 6667 | 6665 |
| 0 | 1000 | 2018-06-01 | 5 | 2 | 3 | 6726 | 6723 |
| 0 | 1000 | 2018-06-02 | 10 | 2 | 4 | 6727 | 6723 |
| 0 | 1000 | 2018-06-03 | 7 | 2 | 5 | 6728 | 6723 |
| 3 | 5000 | 2018-05-07 | 15 | 5 | 1 | 6701 | 6700 |
| 3 | 5000 | 2018-05-08 | 23 | 5 | 2 | 6702 | 6700 |
| 3 | 5000 | 2018-05-09 | 52 | 5 | 3 | 6703 | 6700 |
| 3 | 5000 | 2018-05-10 | 12 | 20 | 1 | 6704 | 6703 |
| 3 | 5000 | 2018-05-11 | 14 | 20 | 2 | 6705 | 6703 |
| 3 | 5000 | 2018-06-07 | 15 | 20 | 3 | 6732 | 6729 |
| 3 | 5000 | 2018-06-08 | 23 | 20 | 4 | 6733 | 6729 |
| 3 | 5000 | 2018-06-09 | 52 | 20 | 5 | 6734 | 6729 |
| 3 | 5000 | 2018-06-10 | 12 | 20 | 6 | 6735 | 6729 |
| 3 | 5000 | 2018-06-11 | 14 | 20 | 7 | 6736 | 6729 |
+-----------+---------------+---------------+----------+-------+-----+------+------+
Now simply group by the Diff to get one row per interval.
Final query
WITH
CTE_Prices
AS
(
SELECT
History.ProductId
,History.DestinationId
,History.ScheduledDate
,History.Quantity
,A.Price
FROM
History
OUTER APPLY
(
SELECT TOP(1)
PriceChange.Price
FROM
PriceChange
WHERE
PriceChange.ProductID = History.ProductID
AND PriceChange.DestinationId = History.DestinationId
AND PriceChange.EffectiveDate <= History.ScheduledDate
ORDER BY
PriceChange.EffectiveDate DESC
) AS A
)
,CTE_rn
AS
(
SELECT
ProductId
,DestinationId
,ScheduledDate
,Quantity
,Price
,ROW_NUMBER() OVER (PARTITION BY ProductId, DestinationId, Price ORDER BY ScheduledDate) AS rn1
,DATEDIFF(day, '20000101', ScheduledDate) AS rn2
FROM
CTE_Prices
)
SELECT
ProductId
,DestinationId
,MIN(ScheduledDate) AS StartDate
,MAX(ScheduledDate) AS EndDate
,SUM(Quantity) AS TotalQuantity
,Price
FROM
CTE_rn
GROUP BY
ProductId
,DestinationId
,Price
,rn2-rn1
ORDER BY
ProductID
,DestinationId
,StartDate
;
Final result
+-----------+---------------+------------+------------+---------------+-------+
| ProductId | DestinationId | StartDate | EndDate | TotalQuantity | Price |
+-----------+---------------+------------+------------+---------------+-------+
| 0 | 1000 | 2018-04-01 | 2018-04-01 | 5 | 1 |
| 0 | 1000 | 2018-04-02 | 2018-04-03 | 17 | 2 |
| 0 | 1000 | 2018-06-01 | 2018-06-03 | 22 | 2 |
| 3 | 5000 | 2018-05-07 | 2018-05-09 | 90 | 5 |
| 3 | 5000 | 2018-05-10 | 2018-05-11 | 26 | 20 |
| 3 | 5000 | 2018-06-07 | 2018-06-11 | 116 | 20 |
+-----------+---------------+------------+------------+---------------+-------+
I'm not sure that I understand correctly, but this is my idea:
Select concat_ws(',',view2.StartDate, string_agg(view1.splitDate, ','),
view2.EndDate), view2.productId, view2.DestinationId from (
SELECT DENSE_RANK() OVER (ORDER BY EffectiveDate) as Rank, EffectiveDate as
SplitDate FROM PriceChange GROUP BY EffectiveDate) view1 join
(
SELECT MIN(ScheduledDate) as StartDate, MAX(ScheduledDate) as
EndDate,ProductId, DestinationId, SUM(Quantity) as TotalQuantity
FROM (
SELECT ScheduledDate, DestinationId, ProductId, PartitionGroup =
DATEADD(DAY ,-1 * DENSE_RANK() OVER (ORDER BY ScheduledDate),
ScheduledDate), Quantity
FROM History
) tmp
GROUP BY PartitionGroup, DestinationId, ProductId
) view2 on view1.SplitDate >= view2.StartDate
and view1.SplitDate <=view2.EndDate
group by view2.startDate, view2.endDate, view2.productId,
view2.DestinationId
The result from this query will be:
| ranges | productId | DestinationId |
|---------------------------------------------|-----------|---------------|
| 2018-04-01,2018-04-02,2018-04-03 | 0 | 1000 |
| 2018-05-07,2018-05-10,2018-05-11 | 3 | 5000 |
Then, in any procedural language, you can split the string for each row (with the appropriate inclusive or exclusive rule for each boundary) to get a list of conditions (:from, :to, :productId, :destinationId).
Finally, you can loop through the list of conditions and use UNION ALL to build one query (the union of one query per condition) that returns the final result. For example,
Select * from History where ScheduledDate >= '2018-04-01' and ScheduledDate <'2018-04-02' and productId = 0 and destinationId = 1000
union all
Select * from History where ScheduledDate >= '2018-04-02' and ScheduledDate <'2018-04-03' and productId = 0 and destinationId = 1000
----Update--------
Based on the idea above, I made some quick changes to produce your result set. You may be able to optimize it later.
with view3 as
(Select concat_ws(',',view2.StartDate, string_agg(view1.splitDate, ','),
dateadd(day, 1, view2.EndDate)) dateRange, view2.productId, view2.DestinationId from (
SELECT DENSE_RANK() OVER (ORDER BY EffectiveDate) as Rank, EffectiveDate as
SplitDate FROM PriceChange GROUP BY EffectiveDate) view1 join
(
SELECT MIN(ScheduledDate) as StartDate, MAX(ScheduledDate) as
EndDate,ProductId, DestinationId, SUM(Quantity) as TotalQuantity
FROM (
SELECT ScheduledDate, DestinationId, ProductId, PartitionGroup =
DATEADD(DAY ,-1 * DENSE_RANK() OVER (ORDER BY ScheduledDate),
ScheduledDate), Quantity
FROM History
) tmp
GROUP BY PartitionGroup, DestinationId, ProductId
) view2 on view1.SplitDate >= view2.StartDate
and view1.SplitDate <=view2.EndDate
group by view2.startDate, view2.endDate, view2.productId,
view2.DestinationId
),
view4 as
(
select productId, destinationId, value from view3 cross apply string_split(dateRange, ',')
),
view5 as(
select *, row_number() over(partition by productId, destinationId order by value) rn from view4
),
view6 as (
select v52.value fr, v51.value t, v51.productid, v51. destinationid from view5 v51 join view5 v52
on v51.productid = v52.productid
and v51.destinationid = v52.destinationid
and v51.rn = v52.rn+1
)
select min(h.ScheduledDate) StartDate, max(h.ScheduledDate) EndDate, v6.productId, v6.destinationId, sum(h.quantity) TotalQuantity from view6 v6 join History h
on v6.destinationId = h.destinationId
and v6.productId = h.productId
and h.ScheduledDate >= v6.fr
and h.ScheduledDate <v6.t
group by v6.fr, v6.t, v6.productId, v6.destinationId
And the result is exactly the same as the one you gave.
| StartDate | EndDate | productId | destinationId | TotalQuantity |
|------------|------------|-----------|---------------|---------------|
| 2018-04-01 | 2018-04-01 | 0 | 1000 | 5 |
| 2018-04-02 | 2018-04-03 | 0 | 1000 | 17 |
| 2018-05-07 | 2018-05-09 | 3 | 5000 | 90 |
| 2018-05-10 | 2018-05-11 | 3 | 5000 | 26 |
Use outer apply to choose the nearest price, then do a group by:
Live test: http://www.sqlfiddle.com/#!18/af568/65
select
StartDate = min(h.ScheduledDate),
EndDate = max(h.ScheduledDate),
h.ProductId,
h.DestinationId,
TotalQuantity = sum(h.Quantity)
from History h
outer apply
(
select top 1 pc.*
from PriceChange pc
where
pc.ProductId = h.ProductId
and pc.Effectivedate <= h.ScheduledDate
order by pc.EffectiveDate desc
) UpToDate
group by UpToDate.EffectiveDate,
h.ProductId,
h.DestinationId
order by StartDate, EndDate, ProductId
Output:
| StartDate | EndDate | ProductId | DestinationId | TotalQuantity |
|------------|------------|-----------|---------------|---------------|
| 2018-04-01 | 2018-04-01 | 0 | 1000 | 5 |
| 2018-04-02 | 2018-04-03 | 0 | 1000 | 17 |
| 2018-05-07 | 2018-05-09 | 3 | 5000 | 90 |
| 2018-05-10 | 2018-05-11 | 3 | 5000 | 26 |