I have the following table in a SQL Server 2014 database:
+----+-------+--------+---------+
| ID | CODE | NUMBER | BALANCE |
+----+-------+--------+---------+
| 1 | B0001 | 122960 | 100.00 |
+----+-------+--------+---------+
| 2 | B0001 | 123168 | -100.00 |
+----+-------+--------+---------+
| 3 | B0001 | 121400 | 500.00 |
+----+-------+--------+---------+
| 4 | T0001 | 19755 | 50.00 |
+----+-------+--------+---------+
| 5 | T0001 | 19975 | -50.00 |
+----+-------+--------+---------+
| 6 | T0001 | 122202 | 50.00 |
+----+-------+--------+---------+
| 7 | T0001 | 122203 | 50.00 |
+----+-------+--------+---------+
I am trying to select rows where the balance for a given code can be offset against another row so that the pair totals 0. For example, the balances on rows 1 and 2 sum to 0, so both rows should be returned. I have tried the following query:
SELECT T1.NUMBER
FROM TABLE T1, TABLE T2
WHERE T1.CODE = T2.CODE
AND T1.BALANCE + T2.BALANCE = 0
This works OK for code B0001. It will return rows 1 and 2 which cancel each other out and ignore row 3. I'm having a problem with code T0001 because the query I'm using will match each of the 3 positive values with the negative value and return all rows associated with that code. I only want it to return rows 4 and 5 for T0001.
Try this:
/* DATASET MOCK-UP */
DECLARE @Data TABLE ( ID INT, CODE VARCHAR(10), NUMBER INT, BALANCE DECIMAL(18,2) );
INSERT INTO @Data ( ID, CODE, NUMBER, BALANCE ) VALUES
( 1, 'B0001', 122960 , 100.00 ),
( 2, 'B0001', 123168 , -100.00 ),
( 3, 'B0001', 121400 , 500.00 ),
( 4, 'T0001', 19755 , 50.00 ),
( 5, 'T0001', 19975 , -50.00 ),
( 6, 'T0001', 122202 , 50.00 ),
( 7, 'T0001', 122203 , 50.00 );
/*
Return records where combined balances equal 0 by adding the
current record's BALANCE against its previous (lag) or following (lead) balances.
*/
SELECT
ID, CODE, NUMBER, BALANCE, ( BALANCE + LAG_BALANCE ) AS LAG_BALANCE, ( BALANCE + LEAD_BALANCE ) AS LEAD_BALANCE
FROM (
SELECT
ID,
CODE,
NUMBER,
BALANCE,
LAG ( BALANCE, 1, 0 ) OVER ( PARTITION BY CODE ORDER BY CODE, ID ) AS LAG_BALANCE,
LEAD ( BALANCE, 1, 0 ) OVER ( PARTITION BY CODE ORDER BY CODE, ID ) AS LEAD_BALANCE
FROM @Data
) AS Results
WHERE
BALANCE + LAG_BALANCE = 0
OR
BALANCE + LEAD_BALANCE = 0
ORDER BY
ID;
Returns
+----+-------+--------+---------+-------------+--------------+
| ID | CODE | NUMBER | BALANCE | LAG_BALANCE | LEAD_BALANCE |
+----+-------+--------+---------+-------------+--------------+
| 1 | B0001 | 122960 | 100.00 | 100.00 | 0.00 |
| 2 | B0001 | 123168 | -100.00 | 0.00 | 400.00 |
| 4 | T0001 | 19755 | 50.00 | 50.00 | 0.00 |
| 5 | T0001 | 19975 | -50.00 | 0.00 | 0.00 |
| 6 | T0001 | 122202 | 50.00 | 0.00 | 100.00 |
+----+-------+--------+---------+-------------+--------------+
UPDATE:
I just want the NUMBER values where they can be cancelled off. For T0001, it wouldn't matter which number it returned to cancel the negative value as long as it only returns one pair of values. For example, for T0001 it could return rows 4 and 5, 5 and 6, or 5 and 7. They would all be valid, but I only want one of them.
This edit returns a single NUMBER for each CODE that matches your "zero-out" condition:
SELECT
CODE, MIN ( NUMBER ) AS MIN_NUMBER
FROM (
SELECT
ID,
CODE,
NUMBER,
BALANCE,
LAG ( BALANCE, 1, 0 ) OVER ( PARTITION BY CODE ORDER BY CODE, ID ) AS LAG_BALANCE,
LEAD ( BALANCE, 1, 0 ) OVER ( PARTITION BY CODE ORDER BY CODE, ID ) AS LEAD_BALANCE
FROM @Data
) AS Results
WHERE
BALANCE + LAG_BALANCE = 0
OR
BALANCE + LEAD_BALANCE = 0
GROUP BY
CODE
ORDER BY
CODE;
Returns
+-------+------------+
| CODE | MIN_NUMBER |
+-------+------------+
| B0001 | 122960 |
| T0001 | 19755 |
+-------+------------+
UPDATE #2:
/*
Return the first TWO rows for a CODE with BALANCEs that zero-out each other.
*/
SELECT
ID, CODE, NUMBER, BALANCE, ( BALANCE + LAG_BALANCE ) AS LAG_BALANCE, ( BALANCE + LEAD_BALANCE ) AS LEAD_BALANCE
FROM (
SELECT
ID,
CODE,
NUMBER,
BALANCE,
LAG ( BALANCE, 1, 0 ) OVER ( PARTITION BY CODE ORDER BY CODE, ID ) AS LAG_BALANCE,
LEAD ( BALANCE, 1, 0 ) OVER ( PARTITION BY CODE ORDER BY CODE, ID ) AS LEAD_BALANCE,
ROW_NUMBER() OVER ( PARTITION BY CODE ORDER BY CODE, ID ) AS CODE_ROW
FROM @Data
) AS Results
WHERE
CODE_ROW <= 2
AND ( BALANCE + LAG_BALANCE = 0 OR BALANCE + LEAD_BALANCE = 0 )
ORDER BY
ID;
Returns
+----+-------+--------+---------+-------------+--------------+
| ID | CODE | NUMBER | BALANCE | LAG_BALANCE | LEAD_BALANCE |
+----+-------+--------+---------+-------------+--------------+
| 1 | B0001 | 122960 | 100.00 | 100.00 | 0.00 |
| 2 | B0001 | 123168 | -100.00 | 0.00 | 400.00 |
| 4 | T0001 | 19755 | 50.00 | 50.00 | 0.00 |
| 5 | T0001 | 19975 | -50.00 | 0.00 | 0.00 |
+----+-------+--------+---------+-------------+--------------+
You want to match rows on opposite balance, but each row should be matched only once.
An option is to enumerate the rows with row_number() first. You can then use the self-join solution, adding the row number to the join condition. I prefer exists here, but the logic is the same:
with cte as (
select
t.*,
row_number() over(partition by code, balance order by id) rn
from mytable t
)
select *
from cte c
where exists (
select 1
from cte c1
where c1.code = c.code and c1.rn = c.rn and c1.balance + c.balance = 0
)
order by code, id
Demo on DB Fiddle:
id | code | number | balance | rn
-: | :---- | -----: | ------: | -:
1 | B0001 | 122960 | 100.00 | 1
2 | B0001 | 123168 | -100.00 | 1
4 | T0001 | 19755 | 50.00 | 1
5 | T0001 | 19975 | -50.00 | 1
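To make the pairing rule explicit, here is a minimal sketch of the same idea in Python rather than T-SQL. It assumes the question's sample rows, stores balances as integer cents to avoid float comparisons (my assumption, not part of the original schema), and uses a hypothetical helper name, offsetting_rows. Rows are numbered within each (code, balance) group, and a row is kept only if a row with the same code, the same number, and the opposite balance exists.

```python
from collections import defaultdict

# Sample rows from the question: (id, code, number, balance in cents)
rows = [
    (1, "B0001", 122960, 10000),
    (2, "B0001", 123168, -10000),
    (3, "B0001", 121400, 50000),
    (4, "T0001", 19755, 5000),
    (5, "T0001", 19975, -5000),
    (6, "T0001", 122202, 5000),
    (7, "T0001", 122203, 5000),
]


def offsetting_rows(rows):
    """Keep rows that have a same-rank partner with the opposite balance."""
    # Equivalent of row_number() over(partition by code, balance order by id)
    counters = defaultdict(int)
    ranked = {}
    for row in sorted(rows, key=lambda r: r[0]):
        _id, code, _number, balance = row
        counters[(code, balance)] += 1
        ranked[(code, balance, counters[(code, balance)])] = row

    # Equivalent of the EXISTS test: same code, same rn, balances sum to 0
    matched = [
        row
        for (code, balance, rn), row in ranked.items()
        if (code, -balance, rn) in ranked
    ]
    return sorted(matched, key=lambda r: r[0])


print([r[0] for r in offsetting_rows(rows)])  # [1, 2, 4, 5]
```

Because the second and third 50.00 rows for T0001 get numbers 2 and 3, and there is no -50.00 row with those numbers, they are left unmatched.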
Something like this
;with
neg_cte as (select *, row_number() over(partition by code, balance order by id) rn
from #Data where BALANCE<0),
pos_cte as (select *, row_number() over(partition by code, balance order by id) rn
from #Data where BALANCE>0)
select * from neg_cte
union all
select pc.* from neg_cte nc join pos_cte pc on nc.CODE=pc.CODE
and nc.BALANCE=pc.BALANCE*-1
and nc.rn=pc.rn
order by ID;
Results
ID CODE NUMBER BALANCE rn
1 B0001 122960 100.00 1
2 B0001 123168 -100.00 1
4 T0001 19755 50.00 1
5 T0001 19975 -50.00 1
I got a table data as follows:
ID | TYPE_ID | CREATED_DT | ROW_NUM
=====================================
123 | 485 | 2019-08-31 | 1
123 | 485 | 2019-05-31 | 2
123 | 485 | 2019-02-28 | 3
123 | 485 | 2018-11-30 | 4
123 | 485 | 2018-08-31 | 5
123 | 485 | 2018-05-31 | 6
123 | 487 | 2019-05-31 | 1
123 | 487 | 2018-05-31 | 2
I would like to select 6 ROW_NUMs for each TYPE_ID, if there is missing data I need to return NULL value for CREATED_DT and the final result set should look like:
ID | TYPE_ID | CREATED_DT | ROW_NUM
=====================================
123 | 485 | 2019-08-31 | 1
123 | 485 | 2019-05-31 | 2
123 | 485 | 2019-02-28 | 3
123 | 485 | 2018-11-30 | 4
123 | 485 | 2018-08-31 | 5
123 | 485 | 2018-05-31 | 6
123 | 487 | 2019-05-31 | 1
123 | 487 | 2018-05-31 | 2
123 | 487 | NULL | 3
123 | 487 | NULL | 4
123 | 487 | NULL | 5
123 | 487 | NULL | 6
Query:
SELECT
A.*
FROM TBL AS A
WHERE A.ROW_NUM <= 6
UNION ALL
SELECT
B.*
FROM TBL AS B
WHERE B.ROW_NUM NOT IN (SELECT ROW_NUM FROM TBL)
AND B.ROW_NUM <= 6
I tried using UNION ALL and ISNULL to backfill the data that is not available, but the query still returns only the existing data and not the expected result. I think this can be done in an easy way by using a CTE, but I am not sure how to get it working. Can anyone help me in this regard?
Assuming Row_Num in tbl covers at least all six values (1, 2, 3, 4, 5, 6) across its rows, with no fractions, zeros, or negative numbers:
First, we get a list of all the distinct IDs and Type_IDs (alias A).
Then we get a distinct list of row numbers less than 7, giving us 6 records (alias B).
We cross join these to ensure each ID and Type_ID combination has all 6 rows.
Finally, we left join back to the base set (tbl, alias C) to get the dates where they exist. Because this is a left join, rows without a date still persist.
SELECT A.ID, A.Type_ID, C.Created_DT, B.Row_Num
FROM (SELECT DISTINCT ID, Type_ID FROM tbl) A
CROSS JOIN (SELECT distinct row_num from tbl where Row_num < 7) B
LEFT JOIN tbl C
on C.ID = A.ID
and C.Type_ID = A.Type_ID
and C.Row_num = B.Row_num
Giving us:
+----+-----+---------+------------+---------+
| | ID | Type_ID | Created_DT | Row_Num |
+----+-----+---------+------------+---------+
| 1 | 123 | 485 | 2019-08-31 | 1 |
| 2 | 123 | 485 | 2019-05-31 | 2 |
| 3 | 123 | 485 | 2019-02-28 | 3 |
| 4 | 123 | 485 | 2018-11-30 | 4 |
| 5 | 123 | 485 | 2018-08-31 | 5 |
| 6 | 123 | 485 | 2018-05-31 | 6 |
| 7 | 123 | 487 | 2019-05-31 | 1 |
| 8 | 123 | 487 | 2018-05-31 | 2 |
| 9 | 123 | 487 | NULL | 3 |
| 10 | 123 | 487 | NULL | 4 |
| 11 | 123 | 487 | NULL | 5 |
| 12 | 123 | 487 | NULL | 6 |
+----+-----+---------+------------+---------+
Rex Tester: Example
This also assumes that you'd want 1-6 for each combination of type_id and ID. If ID's irrelevant, then simply exclude it from the join criteria. I included it as it's an ID and seems like it's part of a key.
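As a rough illustration of the cross join plus left join logic, here is a small Python sketch. It is not the SQL itself: the dictionary layout and the helper name pad_to_six are my own assumptions, and it generates the numbers 1 to 6 directly instead of reading the distinct Row_Num values from tbl.

```python
# Existing rows from the question, keyed by (id, type_id, row_num) -> created_dt
existing = {
    (123, 485, 1): "2019-08-31",
    (123, 485, 2): "2019-05-31",
    (123, 485, 3): "2019-02-28",
    (123, 485, 4): "2018-11-30",
    (123, 485, 5): "2018-08-31",
    (123, 485, 6): "2018-05-31",
    (123, 487, 1): "2019-05-31",
    (123, 487, 2): "2018-05-31",
}


def pad_to_six(existing, max_rows=6):
    """Return (id, type_id, created_dt, row_num) with exactly max_rows rows per key."""
    # SELECT DISTINCT ID, Type_ID
    keys = sorted({(i, t) for (i, t, _rn) in existing})
    result = []
    for i, t in keys:
        # CROSS JOIN with the numbers 1..max_rows
        for rn in range(1, max_rows + 1):
            # LEFT JOIN back to the data: None plays the role of NULL
            result.append((i, t, existing.get((i, t, rn)), rn))
    return result


for row in pad_to_six(existing):
    print(row)
```

Rows 3 to 6 for Type_ID 487 come back with None for the date, which mirrors the NULL values in the result above.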
Please reference the other answer for how you can do this using a CROSS JOIN - which is pretty neat. Alternatively, we can utilize the programming logic available in MS-SQL to achieve the desired results. The following approach stores distinct ID and TYPE_ID combinations inside a SQL cursor. Then it iterates through the cursor entries to ensure the appropriate amount of data is stored into a temp table. Finally, the SELECT is performed on the temp table and the cursor is closed. Here is a proof of concept that I validated on https://rextester.com/l/sql_server_online_compiler.
-- Create schema for testing
CREATE TABLE Test (
ID INT,
TYPE_ID INT,
CREATED_DT DATE
)
-- Populate data
INSERT INTO Test(ID, TYPE_ID, CREATED_DT)
VALUES
(123,485,'2019-08-31')
,(123,485,'2019-05-31')
,(123,485,'2019-02-28')
,(123,485,'2018-11-30')
,(123,485,'2018-08-31')
,(123,485,'2018-05-31')
,(123,487,'2019-05-31')
,(123,487,'2018-05-31');
-- Create TempTable for output
CREATE TABLE #OutputTable (
ID INT,
TYPE_ID INT,
CREATED_DT DATE,
ROW_NUM INT
)
-- Declare local variables
DECLARE @tempID INT, @tempType INT;
-- Create cursor to iterate ID and TYPE_ID
DECLARE mycursor CURSOR FOR (
SELECT DISTINCT ID, TYPE_ID FROM Test
);
OPEN mycursor
-- Populate cursor
FETCH NEXT FROM mycursor
INTO @tempID, @tempType;
-- Loop
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE @count INT = (SELECT COUNT(*) FROM Test WHERE ID = @tempID AND TYPE_ID = @tempType);
-- Number the existing rows newest first, matching ROW_NUM in the question
INSERT INTO #OutputTable (ID, TYPE_ID, CREATED_DT, ROW_NUM)
SELECT ID, TYPE_ID, CREATED_DT, ROW_NUMBER() OVER(ORDER BY CREATED_DT DESC)
FROM Test
WHERE ID = @tempID AND TYPE_ID = @tempType;
-- Pad with NULL dates until there are 6 rows
WHILE @count < 6
BEGIN
SET @count = @count + 1
INSERT INTO #OutputTable
VALUES (@tempID, @tempType, NULL, @count);
END
FETCH NEXT FROM mycursor
INTO @tempID, @tempType;
END
-- Close and release cursor
CLOSE mycursor;
DEALLOCATE mycursor;
-- View results
SELECT * FROM #OutputTable;
Note, if you have an instance where a unique combination of ID and TYPE_ID are grouped more than 6 times, the additional groupings will be included in your final result. If you must only show exactly 6 groupings, you can change that part of the query to SELECT TOP 6 ....
Create a CTE that generates the series 1 to 6 and cross apply it:
CREATE TABLE Test (
ID INT,
TYPE_ID INT,
CREATED_DT DATE
)
INSERT INTO Test(ID, TYPE_ID, CREATED_DT)
VALUES
(123,485,'2019-08-31')
,(123,485,'2019-05-31')
,(123,485,'2019-02-28')
,(123,485,'2018-11-30')
,(123,485,'2018-08-31')
,(123,485,'2018-05-31')
,(123,487,'2019-05-31')
,(123,487,'2018-05-31')
;
WITH n(n) AS
(
SELECT 1
UNION ALL
SELECT n+1 FROM n WHERE n < 6
)
,id_n as (
SELECT
DISTINCT
ID
,TYPE_ID
,n
FROM
Test
cross apply n
)
SELECT
id_n.ID
,id_n.TYPE_ID
,test.CREATED_DT
,id_n.n row_num
FROM
id_n
left join
(
select
ID
,TYPE_ID
,CREATED_DT
,ROW_NUMBER() over(partition by id, type_id order by created_dt) rn
from
Test
) Test on Test.ID = id_n.ID and Test.TYPE_ID = id_n.TYPE_ID and id_n.n = test.rn
drop table Test
I have a table that includes the steps in each process and the status of each step.
For completed processes, the "Done" step is last and its duration is 0.
A process without a "Done" step is still running.
I need a query that adds another column to the table, calculating the duration in minutes of each step in the process.
I would appreciate your help.
Which syntax would work well for this?
I have added the table creation syntax and data:
Create table T_Step (
employee_ID INT
, Process_ID int
, Step_ID int
, Start_Date Datetime
, Step_Status varchar(30)
);
Insert into T_Step values
('1','1','1','2018-01-01 8:00' ,'Pending')
, ('1','1','2','2018-01-01 9:30' ,'InService')
, ('1','1','3','2018-01-01 9:45' ,'Done')
, ('2','2','1','2018-01-02 11:32','Pending')
, ('2','2','2','2018-01-02 11:40','InService')
, ('2','2','3','2018-01-02 12:20','Done')
;
Thanks
Use a LEFT JOIN and then calculate the time difference between each step and the next step of the same process.
This query works in MySQL:
select t1.employee_ID,t1.Process_ID,t1.Step_ID,t1.Start_Date,t1.Step_Status,
IFNULL(TIMESTAMPDIFF(MINUTE,t1.Start_Date,t2.Start_Date),0) As TimeInMinute
from T_Step t1
LEFT JOIN T_Step t2
ON t1.Process_ID=t2.Process_ID AND t1.Step_ID!=t2.Step_ID AND (t2.Step_ID-t1.Step_ID)=1
ORDER BY t1.Process_ID,t1.Step_ID;
OUTPUT
| employee_ID | Process_ID | Step_ID | Start_Date | Step_Status | TimeInMinute |
| ----------- | ---------- | ------- | ------------------- | ----------- | ------------ |
| 1 | 1 | 1 | 2018-01-01 08:00:00 | Pending | 90 |
| 1 | 1 | 2 | 2018-01-01 09:30:00 | InService | 15 |
| 1 | 1 | 3 | 2018-01-01 09:45:00 | Done | 0 |
| 2 | 2 | 1 | 2018-01-02 11:32:00 | Pending | 8 |
| 2 | 2 | 2 | 2018-01-02 11:40:00 | InService | 40 |
| 2 | 2 | 3 | 2018-01-02 12:20:00 | Done | 0 |
DEMO
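For readers who want to check the arithmetic, here is a small Python sketch of the same "join each step to the next step" logic. It assumes the sample data from the question (with zero-padded hours) and a hypothetical helper name, step_minutes. Like the query above, it returns 0 when a step has no following step.

```python
from datetime import datetime

# Sample rows from the question:
# (employee_id, process_id, step_id, start_date, step_status)
steps = [
    (1, 1, 1, "2018-01-01 08:00", "Pending"),
    (1, 1, 2, "2018-01-01 09:30", "InService"),
    (1, 1, 3, "2018-01-01 09:45", "Done"),
    (2, 2, 1, "2018-01-02 11:32", "Pending"),
    (2, 2, 2, "2018-01-02 11:40", "InService"),
    (2, 2, 3, "2018-01-02 12:20", "Done"),
]


def step_minutes(steps):
    """Return (process_id, step_id, status, minutes until the next step starts)."""
    fmt = "%Y-%m-%d %H:%M"
    # Index start times by (process_id, step_id) so the next step is a direct lookup
    starts = {(p, s): datetime.strptime(d, fmt) for (_e, p, s, d, _st) in steps}
    out = []
    for _e, p, s, _d, status in sorted(steps, key=lambda r: (r[1], r[2])):
        nxt = starts.get((p, s + 1))  # LEFT JOIN on Step_ID + 1
        if nxt is None:
            minutes = 0  # last step of the process, like IFNULL(..., 0)
        else:
            minutes = int((nxt - starts[(p, s)]).total_seconds() // 60)
        out.append((p, s, status, minutes))
    return out


for row in step_minutes(steps):
    print(row)
```

Note that this treats a process without a "Done" step the same way: its last recorded step gets 0 minutes, even though it is still running.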
declare @T_Step table (
employee_ID INT
, Process_ID int
, Step_ID int
, Start_Date Datetime
, Step_Status varchar(30)
);
Insert into @T_Step values
('1','1','1','2018-01-01 8:00' ,'Pending')
, ('1','1','2','2018-01-01 9:30' ,'InService')
, ('1','1','3','2018-01-01 9:45' ,'Done')
, ('2','2','1','2018-01-02 11:32','Pending')
, ('2','2','2','2018-01-02 11:40','InService')
, ('2','2','3','2018-01-02 12:20','Done')
;
-- Number the steps per employee in step order so that row R-1 is the previous step
with cte as (Select *, R=ROW_NUMBER()
over(partition by employee_ID order by Step_ID)
from @T_Step)
-- TimeTaken is the number of seconds elapsed since the previous step started
Select T1.employee_ID,T1.Process_ID,T1.Step_Status,t1.Start_Date,t2.Start_Date, DATEDIFF(SECOND,t2.Start_Date,t1.Start_Date) TimeTaken
from cte T1
left join cte T2 on T1.R = T2.R+1 and T1.employee_ID = T2.employee_ID
order by T1.employee_ID
My scenario started off similar to a Gaps and Islands problem, where I needed to find consecutive days of work. My current SQL query answers "ProductA was produced at LocationA from DateA through DateB, totaling X quantity".
However, this does not suffice when I needed to throw prices into the mix. Prices are in a separate table and handled in C# after the fact. Price changes are essentially a list of records that say "ProductA from LocationA is now Y value per unit effective DateC".
The end result is it works as long as the island does not overlap with a price-change date, but if it does overlap, I get a "close" answer, but it's not precise.
The C# code can handle applying the prices efficiently; what I need to do, though, is split the islands based on price changes. My goal is to make the SQL's partitioning take into account the ranking of days from the other table, but I'm having trouble applying what I want to do.
The current SQL that generates my island is as follows
SELECT MIN(ScheduledDate) as StartDate, MAX(ScheduledDate) as
EndDate, ProductId, DestinationId, SUM(Quantity) as TotalQuantity
FROM (
SELECT ScheduledDate, DestinationId, ProductId, PartitionGroup = DATEADD(DAY ,-1 * DENSE_RANK() OVER (ORDER BY ScheduledDate), ScheduledDate), Quantity
FROM History
) tmp
GROUP BY PartitionGroup, DestinationId, ProductId;
The current SQL that takes from the PriceChange table and ranks the dates is as follows
DECLARE @PriceChangeDates TABLE(Rank int, SplitDate Date);
INSERT INTO @PriceChangeDates
SELECT DENSE_RANK() over (ORDER BY EffectiveDate) as Rank, EffectiveDate as SplitDate
FROM ProductPriceChange
GROUP BY EffectiveDate;
My thought is to update the first query's inner SELECT statement to take advantage of the @PriceChangeDates table created by the second query. I would think we can multiply the DATEADD's increment parameter by the rank from the declared table, but I am struggling to write it.
If I were to do this with loops, my thought process would be to determine which rank the ScheduledDate would have in the @PriceChangeDates table, where its rank is the rank of the closest date smaller than itself that it can find. Then take whatever rank that gives and, I would think, multiply it by the increment parameter being passed in (or some math, for example doing a * @PriceChangeDates.Count() on the existing parameter and then adding in the new rank to avoid collisions). However, that's "loop" logic, not "set" logic, and in SQL I need to think in sets.
Any and all help/advice is greatly appreciated. Thank you :)
UPDATE:
Sample data & example on SQLFiddle: http://www.sqlfiddle.com/#!18/af568/1
Where the data is:
CREATE TABLE History
(
ProductId int,
DestinationId int,
ScheduledDate date,
Quantity float
);
INSERT INTO History (ProductId, DestinationId, ScheduledDate, Quantity)
VALUES
(0, 1000, '20180401', 5),
(0, 1000, '20180402', 10),
(0, 1000, '20180403', 7),
(3, 5000, '20180507', 15),
(3, 5000, '20180508', 23),
(3, 5000, '20180509', 52),
(3, 5000, '20180510', 12),
(3, 5000, '20180511', 14);
CREATE TABLE PriceChange
(
ProductId int,
DestinationId int,
EffectiveDate date,
Price float
);
INSERT INTO PriceChange (ProductId, DestinationId, EffectiveDate, Price)
VALUES
(0, 1000, '20180201', 1),
(0, 1000, '20180402', 2),
(3, 5000, '20180101', 5),
(3, 5000, '20180510', 20);
The desired results would be to have a SQL statement that generates the result:
StartDate EndDate ProductId DestinationId TotalQuantity
2018-04-01 2018-04-01 0 1000 5
2018-04-02 2018-04-03 0 1000 17
2018-05-07 2018-05-09 3 5000 90
2018-05-10 2018-05-11 3 5000 26
To clarify, the end result does need the TotalQuantity of each split amount, so that the procedural code that manipulates the results and applies the pricing knows how much of each product was on each side of the price change and can accurately determine the values.
Here is one more variant that is likely to perform better than my first answer. I decided to put it as a second answer, because the approach is rather different and the answer would be too long. You should compare performance of all variants with your real data on your hardware, and don't forget about indexes.
In the first variant I was using APPLY to pick a relevant price for each row in the History table. For each row from the History table the engine is searching for a relevant row from the PriceChange table. Even with appropriate index on the PriceChange table when this is done via a single seek, it still means 3.7 million seeks in a loop join.
We can simply join History and PriceChange tables together and with appropriate indexes on both tables it will be an efficient merge join.
Here I'm also using an extended sample data set to illustrate the gaps. I added these rows to the sample data from the question.
INSERT INTO History (ProductId, DestinationId, ScheduledDate, Quantity)
VALUES
(0, 1000, '20180601', 5),
(0, 1000, '20180602', 10),
(0, 1000, '20180603', 7),
(3, 5000, '20180607', 15),
(3, 5000, '20180608', 23),
(3, 5000, '20180609', 52),
(3, 5000, '20180610', 12),
(3, 5000, '20180611', 14);
Intermediate query
We do a FULL JOIN here, not a LEFT JOIN because it is possible that the date on which the price changed doesn't appear in the History table at all.
WITH
CTE_Join
AS
(
SELECT
ISNULL(History.ProductId, PriceChange.ProductID) AS ProductID
,ISNULL(History.DestinationId, PriceChange.DestinationId) AS DestinationId
,ISNULL(History.ScheduledDate, PriceChange.EffectiveDate) AS ScheduledDate
,History.Quantity
,PriceChange.Price
FROM
History
FULL JOIN PriceChange
ON PriceChange.ProductID = History.ProductID
AND PriceChange.DestinationId = History.DestinationId
AND PriceChange.EffectiveDate = History.ScheduledDate
)
,CTE2
AS
(
SELECT
ProductID
,DestinationId
,ScheduledDate
,Quantity
,Price
,MAX(CASE WHEN Price IS NOT NULL THEN ScheduledDate END)
OVER (PARTITION BY ProductID, DestinationId ORDER BY ScheduledDate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
FROM CTE_Join
)
SELECT *
FROM CTE2
ORDER BY
ProductID
,DestinationId
,ScheduledDate
Create the following indexes
CREATE UNIQUE NONCLUSTERED INDEX [IX_History] ON [dbo].[History]
(
[ProductId] ASC,
[DestinationId] ASC,
[ScheduledDate] ASC
)
INCLUDE ([Quantity])
CREATE UNIQUE NONCLUSTERED INDEX [IX_Price] ON [dbo].[PriceChange]
(
[ProductId] ASC,
[DestinationId] ASC,
[EffectiveDate] ASC
)
INCLUDE ([Price])
and the join will be an efficient MERGE join in the execution plan (not a LOOP join)
Intermediate result
+-----------+---------------+---------------+----------+-------+------------+
| ProductID | DestinationId | ScheduledDate | Quantity | Price | grp |
+-----------+---------------+---------------+----------+-------+------------+
| 0 | 1000 | 2018-02-01 | NULL | 1 | 2018-02-01 |
| 0 | 1000 | 2018-04-01 | 5 | NULL | 2018-02-01 |
| 0 | 1000 | 2018-04-02 | 10 | 2 | 2018-04-02 |
| 0 | 1000 | 2018-04-03 | 7 | NULL | 2018-04-02 |
| 0 | 1000 | 2018-06-01 | 5 | NULL | 2018-04-02 |
| 0 | 1000 | 2018-06-02 | 10 | NULL | 2018-04-02 |
| 0 | 1000 | 2018-06-03 | 7 | NULL | 2018-04-02 |
| 3 | 5000 | 2018-01-01 | NULL | 5 | 2018-01-01 |
| 3 | 5000 | 2018-05-07 | 15 | NULL | 2018-01-01 |
| 3 | 5000 | 2018-05-08 | 23 | NULL | 2018-01-01 |
| 3 | 5000 | 2018-05-09 | 52 | NULL | 2018-01-01 |
| 3 | 5000 | 2018-05-10 | 12 | 20 | 2018-05-10 |
| 3 | 5000 | 2018-05-11 | 14 | NULL | 2018-05-10 |
| 3 | 5000 | 2018-06-07 | 15 | NULL | 2018-05-10 |
| 3 | 5000 | 2018-06-08 | 23 | NULL | 2018-05-10 |
| 3 | 5000 | 2018-06-09 | 52 | NULL | 2018-05-10 |
| 3 | 5000 | 2018-06-10 | 12 | NULL | 2018-05-10 |
| 3 | 5000 | 2018-06-11 | 14 | NULL | 2018-05-10 |
+-----------+---------------+---------------+----------+-------+------------+
You can see that the Price column has a lot of NULL values. We need to "fill" these NULL values with the preceding non-NULL value.
Itzik Ben-Gan wrote a nice article showing how to solve this efficiently: "The Last non NULL Puzzle". Also see "Best way to replace NULL with most recent non-null value".
This is done in CTE2 using MAX window function and you can see how it populates the grp column. This requires SQL Server 2012+. After the groups are determined we should remove rows where Quantity is NULL, because these rows are not from the History table.
Now we can do the same gaps-and-islands step using the grp column as an additional partitioning.
The rest of the query is pretty much the same as in the first variant.
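To show what the running MAX in CTE2 does, here is a minimal Python sketch for a single ProductID/DestinationId partition. It assumes the rows are already sorted by date and uses ISO date strings, which compare in chronological order. The helper name fill_group is hypothetical.

```python
# Rows for ProductID 0 / DestinationId 1000 from the intermediate result,
# already ordered by date: (scheduled_date, quantity, price)
rows = [
    ("2018-02-01", None, 1),
    ("2018-04-01", 5, None),
    ("2018-04-02", 10, 2),
    ("2018-04-03", 7, None),
    ("2018-06-01", 5, None),
]


def fill_group(rows):
    """Attach to each row the date of the latest price change seen so far."""
    grp = None
    out = []
    for scheduled_date, quantity, price in rows:
        if price is not None:
            # Running MAX over the dates that carry a price. Dates only
            # increase within the partition, so this is the most recent one.
            grp = scheduled_date if grp is None else max(grp, scheduled_date)
        out.append((scheduled_date, quantity, price, grp))
    return out


for row in fill_group(rows):
    print(row)
```

The price-change date is carried forward over the rows where Price is NULL, so each History row ends up labelled with the price period it belongs to.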
Final query
WITH
CTE_Join
AS
(
SELECT
ISNULL(History.ProductId, PriceChange.ProductID) AS ProductID
,ISNULL(History.DestinationId, PriceChange.DestinationId) AS DestinationId
,ISNULL(History.ScheduledDate, PriceChange.EffectiveDate) AS ScheduledDate
,History.Quantity
,PriceChange.Price
FROM
History
FULL JOIN PriceChange
ON PriceChange.ProductID = History.ProductID
AND PriceChange.DestinationId = History.DestinationId
AND PriceChange.EffectiveDate = History.ScheduledDate
)
,CTE2
AS
(
SELECT
ProductID
,DestinationId
,ScheduledDate
,Quantity
,Price
,MAX(CASE WHEN Price IS NOT NULL THEN ScheduledDate END)
OVER (PARTITION BY ProductID, DestinationId ORDER BY ScheduledDate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
FROM CTE_Join
)
,CTE_RN
AS
(
SELECT
ProductID
,DestinationId
,ScheduledDate
,grp
,Quantity
,ROW_NUMBER() OVER (PARTITION BY ProductId, DestinationId, grp ORDER BY ScheduledDate) AS rn1
,DATEDIFF(day, '20000101', ScheduledDate) AS rn2
FROM CTE2
WHERE Quantity IS NOT NULL
)
SELECT
ProductId
,DestinationId
,MIN(ScheduledDate) AS StartDate
,MAX(ScheduledDate) AS EndDate
,SUM(Quantity) AS TotalQuantity
FROM
CTE_RN
GROUP BY
ProductId
,DestinationId
,grp
,rn2-rn1
ORDER BY
ProductID
,DestinationId
,StartDate
;
Final result
+-----------+---------------+------------+------------+---------------+
| ProductId | DestinationId | StartDate | EndDate | TotalQuantity |
+-----------+---------------+------------+------------+---------------+
| 0 | 1000 | 2018-04-01 | 2018-04-01 | 5 |
| 0 | 1000 | 2018-04-02 | 2018-04-03 | 17 |
| 0 | 1000 | 2018-06-01 | 2018-06-03 | 22 |
| 3 | 5000 | 2018-05-07 | 2018-05-09 | 90 |
| 3 | 5000 | 2018-05-10 | 2018-05-11 | 26 |
| 3 | 5000 | 2018-06-07 | 2018-06-11 | 116 |
+-----------+---------------+------------+------------+---------------+
This variant doesn't output the relevant price (as the first variant), because I simplified the "last non-null" query. It wasn't required in the question. In any case, it is pretty easy to add the price if needed.
The straightforward method is to fetch the effective price for each row of History and then generate gaps and islands taking price into account.
It is not clear from the question what the role of DestinationID is. The sample data is of no help here.
I'll assume that we need to join and partition on both ProductID and DestinationID.
The following query returns effective Price for each row from History.
You need to add index to the PriceChange table
CREATE NONCLUSTERED INDEX [IX] ON [dbo].[PriceChange]
(
[ProductId] ASC,
[DestinationId] ASC,
[EffectiveDate] DESC
)
INCLUDE ([Price])
for this query to work efficiently.
Query for Prices
SELECT
History.ProductId
,History.DestinationId
,History.ScheduledDate
,History.Quantity
,A.Price
FROM
History
OUTER APPLY
(
SELECT TOP(1)
PriceChange.Price
FROM
PriceChange
WHERE
PriceChange.ProductID = History.ProductID
AND PriceChange.DestinationId = History.DestinationId
AND PriceChange.EffectiveDate <= History.ScheduledDate
ORDER BY
PriceChange.EffectiveDate DESC
) AS A
ORDER BY ProductID, ScheduledDate;
For each row from History there will be one seek in this index to pick the correct price.
This query returns:
Prices
+-----------+---------------+---------------+----------+-------+
| ProductId | DestinationId | ScheduledDate | Quantity | Price |
+-----------+---------------+---------------+----------+-------+
| 0 | 1000 | 2018-04-01 | 5 | 1 |
| 0 | 1000 | 2018-04-02 | 10 | 2 |
| 0 | 1000 | 2018-04-03 | 7 | 2 |
| 3 | 5000 | 2018-05-07 | 15 | 5 |
| 3 | 5000 | 2018-05-08 | 23 | 5 |
| 3 | 5000 | 2018-05-09 | 52 | 5 |
| 3 | 5000 | 2018-05-10 | 12 | 20 |
| 3 | 5000 | 2018-05-11 | 14 | 20 |
+-----------+---------------+---------------+----------+-------+
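The lookup that OUTER APPLY performs for each History row (the latest price change on or before the scheduled date) can be sketched in Python as a binary search over a sorted list. This is only an illustration under my own assumptions: the price_changes dictionary and the effective_price helper are hypothetical, and dates are ISO strings so that string comparison matches date order.

```python
from bisect import bisect_right

# Price changes from the question, grouped per (product_id, destination_id)
# and sorted by effective date: (effective_date, price)
price_changes = {
    (0, 1000): [("2018-02-01", 1), ("2018-04-02", 2)],
    (3, 5000): [("2018-01-01", 5), ("2018-05-10", 20)],
}


def effective_price(product_id, destination_id, scheduled_date):
    """Return the price in effect on scheduled_date, or None if there is none."""
    changes = price_changes.get((product_id, destination_id), [])
    dates = [d for d, _price in changes]
    # Number of changes with EffectiveDate <= scheduled_date
    pos = bisect_right(dates, scheduled_date)
    # OUTER APPLY yields NULL when no earlier price change exists
    return changes[pos - 1][1] if pos else None


print(effective_price(0, 1000, "2018-04-01"))  # 1
print(effective_price(0, 1000, "2018-04-02"))  # 2
print(effective_price(3, 5000, "2018-05-11"))  # 20
```

The binary search plays roughly the role of the index seek: one lookup per History row instead of a scan of all price changes.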
Now a standard gaps-and-island step to collapse consecutive days with the same price together. I use a difference of two row number sequences here.
I've added some more rows to your sample data to see the gaps within the same ProductId.
INSERT INTO History (ProductId, DestinationId, ScheduledDate, Quantity)
VALUES
(0, 1000, '20180601', 5),
(0, 1000, '20180602', 10),
(0, 1000, '20180603', 7),
(3, 5000, '20180607', 15),
(3, 5000, '20180608', 23),
(3, 5000, '20180609', 52),
(3, 5000, '20180610', 12),
(3, 5000, '20180611', 14);
If you run this intermediate query you'll see how it works:
WITH
CTE_Prices
AS
(
SELECT
History.ProductId
,History.DestinationId
,History.ScheduledDate
,History.Quantity
,A.Price
FROM
History
OUTER APPLY
(
SELECT TOP(1)
PriceChange.Price
FROM
PriceChange
WHERE
PriceChange.ProductID = History.ProductID
AND PriceChange.DestinationId = History.DestinationId
AND PriceChange.EffectiveDate <= History.ScheduledDate
ORDER BY
PriceChange.EffectiveDate DESC
) AS A
)
,CTE_rn
AS
(
SELECT
ProductId
,DestinationId
,ScheduledDate
,Quantity
,Price
,ROW_NUMBER() OVER (PARTITION BY ProductId, DestinationId, Price ORDER BY ScheduledDate) AS rn1
,DATEDIFF(day, '20000101', ScheduledDate) AS rn2
FROM
CTE_Prices
)
SELECT *
,rn2-rn1 AS Diff
FROM CTE_rn
Intermediate result
+-----------+---------------+---------------+----------+-------+-----+------+------+
| ProductId | DestinationId | ScheduledDate | Quantity | Price | rn1 | rn2 | Diff |
+-----------+---------------+---------------+----------+-------+-----+------+------+
| 0 | 1000 | 2018-04-01 | 5 | 1 | 1 | 6665 | 6664 |
| 0 | 1000 | 2018-04-02 | 10 | 2 | 1 | 6666 | 6665 |
| 0 | 1000 | 2018-04-03 | 7 | 2 | 2 | 6667 | 6665 |
| 0 | 1000 | 2018-06-01 | 5 | 2 | 3 | 6726 | 6723 |
| 0 | 1000 | 2018-06-02 | 10 | 2 | 4 | 6727 | 6723 |
| 0 | 1000 | 2018-06-03 | 7 | 2 | 5 | 6728 | 6723 |
| 3 | 5000 | 2018-05-07 | 15 | 5 | 1 | 6701 | 6700 |
| 3 | 5000 | 2018-05-08 | 23 | 5 | 2 | 6702 | 6700 |
| 3 | 5000 | 2018-05-09 | 52 | 5 | 3 | 6703 | 6700 |
| 3 | 5000 | 2018-05-10 | 12 | 20 | 1 | 6704 | 6703 |
| 3 | 5000 | 2018-05-11 | 14 | 20 | 2 | 6705 | 6703 |
| 3 | 5000 | 2018-06-07 | 15 | 20 | 3 | 6732 | 6729 |
| 3 | 5000 | 2018-06-08 | 23 | 20 | 4 | 6733 | 6729 |
| 3 | 5000 | 2018-06-09 | 52 | 20 | 5 | 6734 | 6729 |
| 3 | 5000 | 2018-06-10 | 12 | 20 | 6 | 6735 | 6729 |
| 3 | 5000 | 2018-06-11 | 14 | 20 | 7 | 6736 | 6729 |
+-----------+---------------+---------------+----------+-------+-----+------+------+
Now simply group by the Diff to get one row per interval.
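Here is a small Python sketch of the "difference of two sequences" trick for ProductId 0, using the prices from the table above. The helper name islands is hypothetical. rn1 is a running row number per (product, destination, price) and rn2 is the day count since 2000-01-01; consecutive days with the same price share the same rn2 - rn1.

```python
from datetime import date

# ProductId 0 rows with their effective price:
# (product_id, destination_id, scheduled_date, quantity, price)
rows = [
    (0, 1000, date(2018, 4, 1), 5, 1),
    (0, 1000, date(2018, 4, 2), 10, 2),
    (0, 1000, date(2018, 4, 3), 7, 2),
    (0, 1000, date(2018, 6, 1), 5, 2),
    (0, 1000, date(2018, 6, 2), 10, 2),
    (0, 1000, date(2018, 6, 3), 7, 2),
]


def islands(rows):
    """Collapse consecutive days with the same price into one row each."""
    epoch = date(2000, 1, 1)
    counters = {}  # running ROW_NUMBER per (product, destination, price)
    groups = {}    # (product, destination, price, diff) -> [(day, quantity)]
    for product, dest, day, qty, price in sorted(rows, key=lambda r: (r[0], r[1], r[2])):
        part = (product, dest, price)
        counters[part] = counters.get(part, 0) + 1
        rn1 = counters[part]
        rn2 = (day - epoch).days  # DATEDIFF(day, '20000101', ScheduledDate)
        groups.setdefault(part + (rn2 - rn1,), []).append((day, qty))

    result = [
        (product, dest,
         min(d for d, _q in items), max(d for d, _q in items),
         sum(q for _d, q in items), price)
        for (product, dest, price, _diff), items in groups.items()
    ]
    return sorted(result, key=lambda r: (r[0], r[1], r[2]))


for row in islands(rows):
    print(row)
```

The April and June runs at price 2 stay separate because the gap between them changes rn2 - rn1 from 6665 to 6723.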
Final query
WITH
CTE_Prices
AS
(
SELECT
History.ProductId
,History.DestinationId
,History.ScheduledDate
,History.Quantity
,A.Price
FROM
History
OUTER APPLY
(
SELECT TOP(1)
PriceChange.Price
FROM
PriceChange
WHERE
PriceChange.ProductID = History.ProductID
AND PriceChange.DestinationId = History.DestinationId
AND PriceChange.EffectiveDate <= History.ScheduledDate
ORDER BY
PriceChange.EffectiveDate DESC
) AS A
)
,CTE_rn
AS
(
SELECT
ProductId
,DestinationId
,ScheduledDate
,Quantity
,Price
,ROW_NUMBER() OVER (PARTITION BY ProductId, DestinationId, Price ORDER BY ScheduledDate) AS rn1
,DATEDIFF(day, '20000101', ScheduledDate) AS rn2
FROM
CTE_Prices
)
SELECT
ProductId
,DestinationId
,MIN(ScheduledDate) AS StartDate
,MAX(ScheduledDate) AS EndDate
,SUM(Quantity) AS TotalQuantity
,Price
FROM
CTE_rn
GROUP BY
ProductId
,DestinationId
,Price
,rn2-rn1
ORDER BY
ProductID
,DestinationId
,StartDate
;
Final result
+-----------+---------------+------------+------------+---------------+-------+
| ProductId | DestinationId | StartDate | EndDate | TotalQuantity | Price |
+-----------+---------------+------------+------------+---------------+-------+
| 0 | 1000 | 2018-04-01 | 2018-04-01 | 5 | 1 |
| 0 | 1000 | 2018-04-02 | 2018-04-03 | 17 | 2 |
| 0 | 1000 | 2018-06-01 | 2018-06-03 | 22 | 2 |
| 3 | 5000 | 2018-05-07 | 2018-05-09 | 90 | 5 |
| 3 | 5000 | 2018-05-10 | 2018-05-11 | 26 | 20 |
| 3 | 5000 | 2018-06-07 | 2018-06-11 | 116 | 20 |
+-----------+---------------+------------+------------+---------------+-------+
I'm not sure I understand correctly, but here is my idea:
Select concat_ws(',',view2.StartDate, string_agg(view1.splitDate, ','),
view2.EndDate), view2.productId, view2.DestinationId from (
SELECT DENSE_RANK() OVER (ORDER BY EffectiveDate) as Rank, EffectiveDate as
SplitDate FROM PriceChange GROUP BY EffectiveDate) view1 join
(
SELECT MIN(ScheduledDate) as StartDate, MAX(ScheduledDate) as
EndDate,ProductId, DestinationId, SUM(Quantity) as TotalQuantity
FROM (
SELECT ScheduledDate, DestinationId, ProductId, PartitionGroup =
DATEADD(DAY ,-1 * DENSE_RANK() OVER (ORDER BY ScheduledDate),
ScheduledDate), Quantity
FROM History
) tmp
GROUP BY PartitionGroup, DestinationId, ProductId
) view2 on view1.SplitDate >= view2.StartDate
and view1.SplitDate <=view2.EndDate
group by view2.startDate, view2.endDate, view2.productId,
view2.DestinationId
The result from this query will be:
| ranges | productId | DestinationId |
|---------------------------------------------|-----------|---------------|
| 2018-04-01,2018-04-02,2018-04-03 | 0 | 1000 |
| 2018-05-07,2018-05-10,2018-05-11 | 3 | 5000 |
Then, with any procedural language, for each row, you can split the string (with the appropriate inclusive or exclusive rule for each boundary) to get a list of conditions (:from, :to, :productId, :destinationId).
Finally, you can loop through the list of conditions and use UNION ALL to build one query (the union of one query per condition) that returns the final result. For example,
Select * from History where ScheduledDate >= '2018-04-01' and ScheduledDate <'2018-04-02' and productId = 0 and destinationId = 1000
union all
Select * from History where ScheduledDate >= '2018-04-02' and ScheduledDate <'2018-04-03' and productId = 0 and destinationId = 1000
----Update--------
Building on the idea above, I made some quick changes to produce your result set. You may be able to optimize it later:
with view3 as
(Select concat_ws(',',view2.StartDate, string_agg(view1.splitDate, ','),
dateadd(day, 1, view2.EndDate)) dateRange, view2.productId, view2.DestinationId from (
SELECT DENSE_RANK() OVER (ORDER BY EffectiveDate) as Rank, EffectiveDate as
SplitDate FROM PriceChange GROUP BY EffectiveDate) view1 join
(
SELECT MIN(ScheduledDate) as StartDate, MAX(ScheduledDate) as
EndDate,ProductId, DestinationId, SUM(Quantity) as TotalQuantity
FROM (
SELECT ScheduledDate, DestinationId, ProductId, PartitionGroup =
DATEADD(DAY ,-1 * DENSE_RANK() OVER (ORDER BY ScheduledDate),
ScheduledDate), Quantity
FROM History
) tmp
GROUP BY PartitionGroup, DestinationId, ProductId
) view2 on view1.SplitDate >= view2.StartDate
and view1.SplitDate <=view2.EndDate
group by view2.startDate, view2.endDate, view2.productId,
view2.DestinationId
),
view4 as
(
select productId, destinationId, value from view3 cross apply string_split(dateRange, ',')
),
view5 as(
select *, row_number() over(partition by productId, destinationId order by value) rn from view4
),
view6 as (
select v52.value fr, v51.value t, v51.productid, v51. destinationid from view5 v51 join view5 v52
on v51.productid = v52.productid
and v51.destinationid = v52.destinationid
and v51.rn = v52.rn+1
)
select min(h.ScheduledDate) StartDate, max(h.ScheduledDate) EndDate, v6.productId, v6.destinationId, sum(h.quantity) TotalQuantity from view6 v6 join History h
on v6.destinationId = h.destinationId
and v6.productId = h.productId
and h.ScheduledDate >= v6.fr
and h.ScheduledDate <v6.t
group by v6.fr, v6.t, v6.productId, v6.destinationId
And the result is exactly the same as the one you gave.
| StartDate | EndDate | productId | destinationId | TotalQuantity |
|------------|------------|-----------|---------------|---------------|
| 2018-04-01 | 2018-04-01 | 0 | 1000 | 5 |
| 2018-04-02 | 2018-04-03 | 0 | 1000 | 17 |
| 2018-05-07 | 2018-05-09 | 3 | 5000 | 90 |
| 2018-05-10 | 2018-05-11 | 3 | 5000 | 26 |
Use outer apply to choose the nearest price, then do a group by:
Live test: http://www.sqlfiddle.com/#!18/af568/65
select
StartDate = min(h.ScheduledDate),
EndDate = max(h.ScheduledDate),
h.ProductId,
h.DestinationId,
TotalQuantity = sum(h.Quantity)
from History h
outer apply
(
select top 1 pc.*
from PriceChange pc
where
pc.ProductId = h.ProductId
and pc.Effectivedate <= h.ScheduledDate
order by pc.EffectiveDate desc
) UpToDate
group by UpToDate.EffectiveDate,
h.ProductId,
h.DestinationId
order by StartDate, EndDate, ProductId
Output:
| StartDate | EndDate | ProductId | DestinationId | TotalQuantity |
|------------|------------|-----------|---------------|---------------|
| 2018-04-01 | 2018-04-01 | 0 | 1000 | 5 |
| 2018-04-02 | 2018-04-03 | 0 | 1000 | 17 |
| 2018-05-07 | 2018-05-09 | 3 | 5000 | 90 |
| 2018-05-10 | 2018-05-11 | 3 | 5000 | 26 |