Create a calculated value based on a calculated value in the previous row - SQL

I'm trying to find a way to apply monthly percentage changes to forecast pricing. I set my problem up in Excel to make it a bit clearer. I'm using SQL Server 2017.
We'll say all months before 9/1/18 are historical and 9/1/18 and beyond are forecasts. I need to calculate the forecast price (shaded in yellow on the sample data) using...
Forecast Price = (Previous Row Forecast Price * Pct Change) + Previous Row Forecast Price
Just to be clear, the yellow shaded prices do not exist in my data yet. That is what I am trying to have my query calculate. Since this is a monthly percentage change, each row depends on the row before it, which goes beyond a single ROW_NUMBER/PARTITION solution because we have to use the previously calculated price. Clearly what is an easy sequential calculation in Excel is a bit more difficult here. Any idea how to create the forecasted price column in SQL?
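For example, if the last historical price is 100 and the first forecast month's Pct Change is -7.21, then Forecast Price = (100 * -7.21 / 100) + 100 = 92.79; that 92.79 then feeds the next month's calculation, and so on.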

You can use a recursive CTE. It is one of the easier ways to reference a calculated value from the previous row:
DECLARE @t TABLE(Date DATE, ID VARCHAR(10), Price DECIMAL(10, 2), PctChange DECIMAL(10, 2));
INSERT INTO @t VALUES
('2018-01-01', 'ABC', 100, NULL),
('2018-01-02', 'ABC', 150, 50.00),
('2018-01-03', 'ABC', 130, -13.33),
('2018-01-04', 'ABC', 120, -07.69),
('2018-01-05', 'ABC', 110, -08.33),
('2018-01-06', 'ABC', 120, 9.09),
('2018-01-07', 'ABC', 120, 0.00),
('2018-01-08', 'ABC', 100, -16.67),
('2018-01-09', 'ABC', NULL, -07.21),
('2018-01-10', 'ABC', NULL, 1.31),
('2018-01-11', 'ABC', NULL, 6.38),
('2018-01-12', 'ABC', NULL, -30.00),
('2019-01-01', 'ABC', NULL, 14.29),
('2019-01-02', 'ABC', NULL, 5.27);
WITH ncte AS (
-- number the rows sequentially without gaps
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Date) AS rn
FROM @t
), rcte AS (
-- find first row in each group
SELECT *, Price AS ForecastedPrice
FROM ncte AS base
WHERE rn = 1
UNION ALL
-- find next row for each group from prev rows
SELECT curr.*, CAST(prev.ForecastedPrice * (1 + curr.PctChange / 100) AS DECIMAL(10, 2))
FROM ncte AS curr
INNER JOIN rcte AS prev ON curr.ID = prev.ID AND curr.rn = prev.rn + 1
)
SELECT *
FROM rcte
ORDER BY ID, rn
Result:
| Date | ID | Price | PctChange | rn | ForecastedPrice |
|------------|-----|--------|-----------|----|-----------------|
| 2018-01-01 | ABC | 100.00 | NULL | 1 | 100.00 |
| 2018-01-02 | ABC | 150.00 | 50.00 | 2 | 150.00 |
| 2018-01-03 | ABC | 130.00 | -13.33 | 3 | 130.01 |
| 2018-01-04 | ABC | 120.00 | -7.69 | 4 | 120.01 |
| 2018-01-05 | ABC | 110.00 | -8.33 | 5 | 110.01 |
| 2018-01-06 | ABC | 120.00 | 9.09 | 6 | 120.01 |
| 2018-01-07 | ABC | 120.00 | 0.00 | 7 | 120.01 |
| 2018-01-08 | ABC | 100.00 | -16.67 | 8 | 100.00 |
| 2018-01-09 | ABC | NULL | -7.21 | 9 | 92.79 |
| 2018-01-10 | ABC | NULL | 1.31 | 10 | 94.01 |
| 2018-01-11 | ABC | NULL | 6.38 | 11 | 100.01 |
| 2018-01-12 | ABC | NULL | -30.00 | 12 | 70.01 |
| 2019-01-01 | ABC | NULL | 14.29 | 13 | 80.01 |
| 2019-01-02 | ABC | NULL | 5.27 | 14 | 84.23 |
Demo on DB Fiddle
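One caveat worth noting: SQL Server limits recursive CTEs to 100 recursion levels by default, so if a single ID can have more than roughly 100 months of data you would need to raise the limit with the MAXRECURSION hint on the final statement, for example:
SELECT *
FROM rcte
ORDER BY ID, rn
OPTION (MAXRECURSION 0);
(0 removes the limit entirely; a specific value such as 1000 also works.)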

In SQL Server you can access values of previous/next rows using the window functions LAG and LEAD. You need to define the order of the rows by specifying it in the OVER clause. You may need to wrap the select query that returns the prev/next values in a derived table or CTE, and then select from it to calculate your forecasts.
with cte as (SELECT [Date], Price, LAG(Price, 1) over(order by [Date]) as PrevPrice from TABLE)
select [Date], Price, Price - PrevPrice as PriceChange from cte

Related

Fill Rows with future dates even when there's no value

Story:
My dataset looks like this:
+---------+------+-----------------+---------+
| Date | Cost | Revenue Month | Revenue |
+---------+------+-----------------+---------+
| 2018-01 | 20 | 2018-02 | 20 |
| 2018-01 | 20 | 2018-03 | 100 |
| 2018-02 | 5 | 2018-03 | 15 |
| 2018-02 | 5 | 2018-04 | 25 |
+---------+------+-----------------+---------+
Basically, the Date column represents the initial investment month, and the Revenue Month is the month in which revenue generated by that investment came in. I would like to fill in rows for each subsequent revenue month up to the current month (i.e. August 2020) and force the Revenue to show 0.
Objective:
+---------+------+-----------------+---------+---------+
| Date    | Cost | Returning Month | Revenue | Product |
+---------+------+-----------------+---------+---------+
| 2018-01 | 20   | 2018-02         | 20      | A       |
| 2018-01 | 20   | 2018-03         | 100     | A       |
| 2018-01 | 20   | 2018-04         | 0       | A       |
| 2018-01 | 20   | 2018-05         | 0       | A       |
| 2018-02 | 5    | 2018-03         | 15      | A       |
| 2018-02 | 5    | 2018-04         | 25      | A       |
| 2018-02 | 5    | 2018-05         | 0       | A       |
| 2018-02 | 5    | 2018-06         | 0       | A       |
+---------+------+-----------------+---------+---------+
What I tried:
I built this tally date table
DROP TABLE IF EXISTS ##dates
CREATE TABLE ##dates ([date] Date)
DECLARE @dIncr DATE = '01/01/2018'
DECLARE @dEnd DATE = cast(getdate() as date)
WHILE (@dIncr <= @dEnd)
BEGIN
INSERT INTO ##dates ([date]) VALUES (@dIncr)
SELECT @dIncr = DATEADD(month,1,@dIncr)
END
But I'm stuck with this.
If you want to add two months to the data, you can use union all:
select Date, Cost, Returning_Month, Revenue, Product
from t
union all
select Date, Cost, dateadd(month, v.n, Returning_Month), 0 as Revenue, Product
from (select date, cost, max(returning_month) as returning_month, revenue, product
from t
group by date, cost, revenue, product
) t cross apply
(values (1), (2)) v(n);
EDIT:
Use a recursive CTE:
with cte as (
select date, cost, max(returning_month) as returning_month, revenue, product, 0 as lev
from t
group by date, cost, revenue, product
union all
select date, cost, dateadd(month, 1, returning_month), revenue, product, lev + 1
from cte
where returning_month < getdate()
)
select date, cost, returning_month, revenue, product
from cte
where lev > 0;

SQL: how to check for neither overlapping nor holes in payment records

I have a table PaymentSchedules with percentage info, and from/to dates for which those
percentages are valid, resource by resource:
| auto_numbered | res_id | date_start | date_end | org | pct |
|---------------+--------+------------+------------+-------+-----|
| 1 | A | 2018-01-01 | 2019-06-30 | One | 100 |
| 2 | A | 2019-07-01 | (NULL) | One | 60 |
| 3 | A | 2019-07-02 | 2019-12-31 | Two | 40 |
| 4 | A | 2020-01-01 | (NULL) | Two | 40 |
| 5 | B | (NULL) | (NULL) | Three | 100 |
| 6 | C | 2018-01-01 | (NULL) | One | 100 |
| 7 | C | 2019-11-01 | (NULL) | Four | 100 |
(Records #3 and #4 could be summarized onto just one line, but duplicated on purpose, to show that there are many combinations of date_start and date_end.)
A quick reading of the data:
Org "One" is fully paying for resource A up to 2019-06-30; then, it continues
to pay 60% of the cost, but the rest (40%) is being paid by org "Two" since
2019-07-02.
This should begin on 2019-07-01... a small encoding error, causing a 1-day gap.
Org "Three" is fully paying for resource B, at all times.
Org "One" is fully paying for resource C from 2018-01-01... but, starting on
2019-11-01, org "Four" is paying for it...
... and, there, there is an encoding error: we do have 200% of resource C being
taken into account since 2019-11-01: the record #6 should have been closed
(date_end set to 2019-10-31), but hasn't...
So, when we generate a financial report for the year 2019 (from 2019-01-01 to
2019-12-31), we will have calculation errors...
So, question: how can we make sure we don't have overlapping payments for
resources, or -- also the contrary -- "holes" for some period of times?
How is it possible to write an SQL query to check that there are neither
underpaid nor overpaid resources? That is, all resources in the table should be
paid, for every single day of the financial period being looked at, by exactly
one or more organizations, in a way that the summed up percentage is always
equal to 100%.
I don't see how to proceed with such a query. Anybody able to give hints, to put
me on track?
EDIT -- Working with both SQL Server and Oracle.
EDIT -- I don't own the DB, and I can't add triggers or views. I need to be able to detect things "after the fact"... I need to easily spot the conflicting records, or the "missing" ones (in case of "period holes"), fix them by hand, and then re-run the financial report.
EDIT -- If we make an analysis for 2019, the following report would be desired:
| res_id | pct_sum | date |
|--------+---------+------------|
| A | 60 | 2019-07-01 |
| C | 200 | 2019-11-01 |
| C | 200 | 2019-11-02 |
| C | 200 | ... |
| C | 200 | ... |
| C | 200 | ... |
| C | 200 | 2019-12-30 |
| C | 200 | 2019-12-31 |
or, of course, an even better version -- certainly unobtainable? -- where each
type of problem would be present only once, with the relevant date range for
which the problem is observed:
| res_id | pct_sum | date_start | date_end |
|--------+---------+------------+------------|
| A | 60 | 2019-07-01 | 2019-07-01 |
| C | 200 | 2019-11-01 | 2019-12-31 |
EDIT -- Fiddle code: db<>fiddle here
Here's an incomplete attempt for SQL Server.
Basically, the idea was to use a recursive CTE to unfold months for each res_id.
Then left join 'what could be' to the existing date ranges.
But I doubt it can be done in SQL that works for both Oracle and MS SQL Server.
Sure, both have window functions and CTEs.
But the datetime functions are rarely the same across RDBMSs.
So I give up.
Maybe someone else finds an easier solution.
create table PaymentSchedules
(
auto_numbered int identity(1,1) primary key,
res_id varchar(30),
date_start date,
date_end date,
org varchar(30),
pct decimal(3,0)
)
GO
insert into PaymentSchedules
(res_id, org, pct, date_start, date_end)
values
('A', 'One', 100, '2018-01-01', '2018-06-30')
, ('A', 'One', 100, '2019-01-01', '2019-06-30')
, ('A', 'One', 60, '2019-07-01', null)
, ('A', 'Two', 40, '2019-07-02', '2019-12-31')
, ('A', 'Two', 40, '2020-01-01', null)
, ('B', 'Three', 100, null, null)
, ('C', 'One', 100, '2018-01-01', null)
, ('C', 'Four', 100, '2019-11-01', null)
;
GO
declare @MaxEndDate date;
set @MaxEndDate = (select max(iif(date_start > date_end, date_start, isnull(date_end, date_start))) from PaymentSchedules);
;with rcte as
(
select res_id
, datefromparts(year(min(date_start)), month(min(date_start)), 1) as month_start
, eomonth(coalesce(max(date_end), @MaxEndDate)) as month_end
, 0 as lvl
from PaymentSchedules
group by res_id
having min(date_start) is not null
union all
select res_id
, dateadd(month, 1, month_start)
, month_end
, lvl + 1
from rcte
where dateadd(month, 1, month_start) < month_end
)
, cte_gai as
(
select c.res_id, c.month_start, c.month_end
, t.org, t.pct, t.auto_numbered
, sum(isnull(t.pct,0)) over (partition by c.res_id, c.month_start) as res_month_pct
, count(t.auto_numbered) over (partition by c.res_id, c.month_start) as cnt
from rcte c
left join PaymentSchedules t
on t.res_id = c.res_id
and c.month_start >= datefromparts(year(t.date_start), month(t.date_start), 1)
and c.month_start <= coalesce(t.date_end, @MaxEndDate)
)
select *
from cte_gai
where res_month_pct <> 100
order by res_id, month_start
GO
res_id | month_start | month_end | org | pct | auto_numbered | res_month_pct | cnt
:----- | :---------- | :--------- | :--- | :--- | ------------: | :------------ | --:
A | 2018-07-01 | 2019-12-31 | null | null | null | 0 | 0
A | 2018-08-01 | 2019-12-31 | null | null | null | 0 | 0
A | 2018-09-01 | 2019-12-31 | null | null | null | 0 | 0
A | 2018-10-01 | 2019-12-31 | null | null | null | 0 | 0
A | 2018-11-01 | 2019-12-31 | null | null | null | 0 | 0
A | 2018-12-01 | 2019-12-31 | null | null | null | 0 | 0
C | 2019-11-01 | 2020-01-31 | One | 100 | 7 | 200 | 2
C | 2019-11-01 | 2020-01-31 | Four | 100 | 8 | 200 | 2
C | 2019-12-01 | 2020-01-31 | One | 100 | 7 | 200 | 2
C | 2019-12-01 | 2020-01-31 | Four | 100 | 8 | 200 | 2
C | 2020-01-01 | 2020-01-31 | One | 100 | 7 | 200 | 2
C | 2020-01-01 | 2020-01-31 | Four | 100 | 8 | 200 | 2
db<>fiddle here
I am not giving the full answer here, but I think you are after cursors (https://learn.microsoft.com/en-us/sql/t-sql/language-elements/declare-cursor-transact-sql?view=sql-server-ver15).
A cursor allows you to iterate through the table, checking all of the records.
Cursors are considered bad practice, though: even if the idea is appealing, they are quite heavy, they are slow, and they lock the involved tables.
Some people rewrite cursors as loops (WHILE, most likely), so you could work out the cursor logic first and then translate it into a loop (https://www.sqlbook.com/advanced/sql-cursors-how-to-avoid-them/).
Also, views can be helpful, but I am assuming that you already know how to use them.
The algorithm should be something like this (a rough cursor sketch follows the list):
1. Have table1 and table2, where table2 is a copy of table1 (https://www.tutorialrepublic.com/sql-tutorial/sql-cloning-tables.php).
2. Iterate through all of the records of table1 (in the first instance I would use a cursor for this), picking up one record from table1.
3. If its dates overlap with a record in table2, do something.
4. Else do something else.
5. Pick another record from table1 and go to step 2.
6. Drop the unnecessary tables.
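Here is a minimal sketch of the cursor idea against the PaymentSchedules table from the fiddle above. It only flags records whose period overlaps another record of the same resource; the gap check and the sum-to-100% check would follow the same fetch/check pattern:
DECLARE @auto_numbered INT, @res_id VARCHAR(30), @date_start DATE, @date_end DATE;

DECLARE payment_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT auto_numbered, res_id, date_start, date_end
    FROM PaymentSchedules;

OPEN payment_cursor;
FETCH NEXT FROM payment_cursor INTO @auto_numbered, @res_id, @date_start, @date_end;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- two periods overlap when each one starts before the other one ends;
    -- NULL bounds are treated as open-ended
    IF EXISTS (
        SELECT 1
        FROM PaymentSchedules AS other
        WHERE other.res_id = @res_id
          AND other.auto_numbered <> @auto_numbered
          AND ISNULL(other.date_start, '19000101') <= ISNULL(@date_end, '99991231')
          AND ISNULL(@date_start, '19000101') <= ISNULL(other.date_end, '99991231')
    )
        PRINT CONCAT('Record ', @auto_numbered, ' overlaps another record for resource ', @res_id);

    FETCH NEXT FROM payment_cursor INTO @auto_numbered, @res_id, @date_start, @date_end;
END

CLOSE payment_cursor;
DEALLOCATE payment_cursor;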

Gaps And Islands: Splitting Islands Based On External Table

My scenario started off similar to a Gaps and Islands problem, where I needed to find consecutive days of work. My current SQL query answers "ProductA was produced at LocationA from DateA through DateB, totaling X quantity".
However, this does not suffice when I needed to throw prices into the mix. Prices are in a separate table and handled in C# after the fact. Price changes are essentially a list of records that say "ProductA from LocationA is now Y value per unit effective DateC".
The end result is it works as long as the island does not overlap with a price-change date, but if it does overlap, I get a "close" answer, but it's not precise.
The C# code can handle applying the prices efficiently; what I need to do is split the islands based on price changes. My goal is to make the SQL's partitioning take into account the ranking of days from the other table, but I'm having trouble applying it.
The current SQL that generates my island is as follows
SELECT MIN(ScheduledDate) as StartDate, MAX(ScheduledDate) as EndDate, ProductId, DestinationId, SUM(Quantity) as TotalQuantity
FROM (
    SELECT ScheduledDate, DestinationId, ProductId,
           PartitionGroup = DATEADD(DAY, -1 * DENSE_RANK() OVER (ORDER BY ScheduledDate), ScheduledDate),
           Quantity
    FROM History
) tmp
GROUP BY PartitionGroup, DestinationId, ProductId;
The current SQL that takes from the PriceChange table and ranks the dates is as follows
DECLARE @PriceChangeDates TABLE(Rank int, SplitDate Date);
INSERT INTO @PriceChangeDates
SELECT DENSE_RANK() over (ORDER BY EffectiveDate) as Rank, EffectiveDate as SplitDate
FROM ProductPriceChange
GROUP BY EffectiveDate;
My thought is to update the first query's inner SELECT statement to take advantage of the @PriceChangeDates table created by the second query. I would think we can multiply the DATEADD's increment parameter by the rank from the declared table, but I am struggling to write it.
If I were to do this with loops, my thought process would be: determine which rank the ScheduledDate falls under in the @PriceChangeDates table, where its rank is the rank of the closest date that is smaller than itself. Then take whatever rank that gives and, I would think, multiply it into the increment parameter being passed in (or some math, for example doing *@PriceChangeDates.Count() on the existing parameter and then adding in the new rank to avoid collisions). However, that's "loop" logic, not "set" logic, and in SQL I need to think in sets.
Any and all help/advice is greatly appreciated. Thank you :)
UPDATE:
Sample data & example on SQLFiddle: http://www.sqlfiddle.com/#!18/af568/1
Where the data is:
CREATE TABLE History
(
ProductId int,
DestinationId int,
ScheduledDate date,
Quantity float
);
INSERT INTO History (ProductId, DestinationId, ScheduledDate, Quantity)
VALUES
(0, 1000, '20180401', 5),
(0, 1000, '20180402', 10),
(0, 1000, '20180403', 7),
(3, 5000, '20180507', 15),
(3, 5000, '20180508', 23),
(3, 5000, '20180509', 52),
(3, 5000, '20180510', 12),
(3, 5000, '20180511', 14);
CREATE TABLE PriceChange
(
ProductId int,
DestinationId int,
EffectiveDate date,
Price float
);
INSERT INTO PriceChange (ProductId, DestinationId, EffectiveDate, Price)
VALUES
(0, 1000, '20180201', 1),
(0, 1000, '20180402', 2),
(3, 5000, '20180101', 5),
(3, 5000, '20180510', 20);
The desired results would be to have a SQL statement that generates the result:
StartDate EndDate ProductId DestinationId TotalQuantity
2018-04-01 2018-04-01 0 1000 5
2018-04-02 2018-04-03 0 1000 17
2018-05-07 2018-05-09 3 5000 90
2018-05-10 2018-05-11 3 5000 26
To clarify, the end result does need the TotalQuantity of each split amount, so the procedural code that manipulates the results and applies the pricing knows how much of each product was on each side of the price change, to accurately determine the values.
Here is one more variant that is likely to perform better than my first answer. I decided to put it as a second answer, because the approach is rather different and the answer would be too long. You should compare performance of all variants with your real data on your hardware, and don't forget about indexes.
In the first variant I was using APPLY to pick a relevant price for each row in the History table. For each row from the History table the engine is searching for a relevant row from the PriceChange table. Even with appropriate index on the PriceChange table when this is done via a single seek, it still means 3.7 million seeks in a loop join.
We can simply join History and PriceChange tables together and with appropriate indexes on both tables it will be an efficient merge join.
Here I'm also using an extended sample data set to illustrate the gaps. I added these rows to the sample data from the question.
INSERT INTO History (ProductId, DestinationId, ScheduledDate, Quantity)
VALUES
(0, 1000, '20180601', 5),
(0, 1000, '20180602', 10),
(0, 1000, '20180603', 7),
(3, 5000, '20180607', 15),
(3, 5000, '20180608', 23),
(3, 5000, '20180609', 52),
(3, 5000, '20180610', 12),
(3, 5000, '20180611', 14);
Intermediate query
We do a FULL JOIN here, not a LEFT JOIN because it is possible that the date on which the price changed doesn't appear in the History table at all.
WITH
CTE_Join
AS
(
SELECT
ISNULL(History.ProductId, PriceChange.ProductID) AS ProductID
,ISNULL(History.DestinationId, PriceChange.DestinationId) AS DestinationId
,ISNULL(History.ScheduledDate, PriceChange.EffectiveDate) AS ScheduledDate
,History.Quantity
,PriceChange.Price
FROM
History
FULL JOIN PriceChange
ON PriceChange.ProductID = History.ProductID
AND PriceChange.DestinationId = History.DestinationId
AND PriceChange.EffectiveDate = History.ScheduledDate
)
,CTE2
AS
(
SELECT
ProductID
,DestinationId
,ScheduledDate
,Quantity
,Price
,MAX(CASE WHEN Price IS NOT NULL THEN ScheduledDate END)
OVER (PARTITION BY ProductID, DestinationId ORDER BY ScheduledDate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
FROM CTE_Join
)
SELECT *
FROM CTE2
ORDER BY
ProductID
,DestinationId
,ScheduledDate
Create the following indexes
CREATE UNIQUE NONCLUSTERED INDEX [IX_History] ON [dbo].[History]
(
[ProductId] ASC,
[DestinationId] ASC,
[ScheduledDate] ASC
)
INCLUDE ([Quantity])
CREATE UNIQUE NONCLUSTERED INDEX [IX_Price] ON [dbo].[PriceChange]
(
[ProductId] ASC,
[DestinationId] ASC,
[EffectiveDate] ASC
)
INCLUDE ([Price])
and the join will be an efficient MERGE join in the execution plan (not a LOOP join)
Intermediate result
+-----------+---------------+---------------+----------+-------+------------+
| ProductID | DestinationId | ScheduledDate | Quantity | Price | grp |
+-----------+---------------+---------------+----------+-------+------------+
| 0 | 1000 | 2018-02-01 | NULL | 1 | 2018-02-01 |
| 0 | 1000 | 2018-04-01 | 5 | NULL | 2018-02-01 |
| 0 | 1000 | 2018-04-02 | 10 | 2 | 2018-04-02 |
| 0 | 1000 | 2018-04-03 | 7 | NULL | 2018-04-02 |
| 0 | 1000 | 2018-06-01 | 5 | NULL | 2018-04-02 |
| 0 | 1000 | 2018-06-02 | 10 | NULL | 2018-04-02 |
| 0 | 1000 | 2018-06-03 | 7 | NULL | 2018-04-02 |
| 3 | 5000 | 2018-01-01 | NULL | 5 | 2018-01-01 |
| 3 | 5000 | 2018-05-07 | 15 | NULL | 2018-01-01 |
| 3 | 5000 | 2018-05-08 | 23 | NULL | 2018-01-01 |
| 3 | 5000 | 2018-05-09 | 52 | NULL | 2018-01-01 |
| 3 | 5000 | 2018-05-10 | 12 | 20 | 2018-05-10 |
| 3 | 5000 | 2018-05-11 | 14 | NULL | 2018-05-10 |
| 3 | 5000 | 2018-06-07 | 15 | NULL | 2018-05-10 |
| 3 | 5000 | 2018-06-08 | 23 | NULL | 2018-05-10 |
| 3 | 5000 | 2018-06-09 | 52 | NULL | 2018-05-10 |
| 3 | 5000 | 2018-06-10 | 12 | NULL | 2018-05-10 |
| 3 | 5000 | 2018-06-11 | 14 | NULL | 2018-05-10 |
+-----------+---------------+---------------+----------+-------+------------+
You can see that the Price column has a lot of NULL values. We need to "fill" these NULL values with the preceding non-NULL value.
Itzik Ben-Gan wrote a nice article showing how to solve this efficiently The Last non NULL Puzzle. Also see Best way to replace NULL with most recent non-null value.
This is done in CTE2 using MAX window function and you can see how it populates the grp column. This requires SQL Server 2012+. After the groups are determined we should remove rows where Quantity is NULL, because these rows are not from the History table.
Now we can do the same gaps-and-islands step using the grp column as an additional partitioning.
The rest of the query is pretty much the same as in the first variant.
Final query
WITH
CTE_Join
AS
(
SELECT
ISNULL(History.ProductId, PriceChange.ProductID) AS ProductID
,ISNULL(History.DestinationId, PriceChange.DestinationId) AS DestinationId
,ISNULL(History.ScheduledDate, PriceChange.EffectiveDate) AS ScheduledDate
,History.Quantity
,PriceChange.Price
FROM
History
FULL JOIN PriceChange
ON PriceChange.ProductID = History.ProductID
AND PriceChange.DestinationId = History.DestinationId
AND PriceChange.EffectiveDate = History.ScheduledDate
)
,CTE2
AS
(
SELECT
ProductID
,DestinationId
,ScheduledDate
,Quantity
,Price
,MAX(CASE WHEN Price IS NOT NULL THEN ScheduledDate END)
OVER (PARTITION BY ProductID, DestinationId ORDER BY ScheduledDate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
FROM CTE_Join
)
,CTE_RN
AS
(
SELECT
ProductID
,DestinationId
,ScheduledDate
,grp
,Quantity
,ROW_NUMBER() OVER (PARTITION BY ProductId, DestinationId, grp ORDER BY ScheduledDate) AS rn1
,DATEDIFF(day, '20000101', ScheduledDate) AS rn2
FROM CTE2
WHERE Quantity IS NOT NULL
)
SELECT
ProductId
,DestinationId
,MIN(ScheduledDate) AS StartDate
,MAX(ScheduledDate) AS EndDate
,SUM(Quantity) AS TotalQuantity
FROM
CTE_RN
GROUP BY
ProductId
,DestinationId
,grp
,rn2-rn1
ORDER BY
ProductID
,DestinationId
,StartDate
;
Final result
+-----------+---------------+------------+------------+---------------+
| ProductId | DestinationId | StartDate | EndDate | TotalQuantity |
+-----------+---------------+------------+------------+---------------+
| 0 | 1000 | 2018-04-01 | 2018-04-01 | 5 |
| 0 | 1000 | 2018-04-02 | 2018-04-03 | 17 |
| 0 | 1000 | 2018-06-01 | 2018-06-03 | 22 |
| 3 | 5000 | 2018-05-07 | 2018-05-09 | 90 |
| 3 | 5000 | 2018-05-10 | 2018-05-11 | 26 |
| 3 | 5000 | 2018-06-07 | 2018-06-11 | 116 |
+-----------+---------------+------------+------------+---------------+
This variant doesn't output the relevant price (as the first variant), because I simplified the "last non-null" query. It wasn't required in the question. In any case, it is pretty easy to add the price if needed.
The straightforward method is to fetch the effective price for each row of History and then generate gaps and islands taking price into account.
It is not clear from the question what the role of DestinationID is. The sample data is of no help here.
I'll assume that we need to join and partition on both ProductID and DestinationID.
The following query returns effective Price for each row from History.
You need to add index to the PriceChange table
CREATE NONCLUSTERED INDEX [IX] ON [dbo].[PriceChange]
(
[ProductId] ASC,
[DestinationId] ASC,
[EffectiveDate] DESC
)
INCLUDE ([Price])
for this query to work efficiently.
Query for Prices
SELECT
History.ProductId
,History.DestinationId
,History.ScheduledDate
,History.Quantity
,A.Price
FROM
History
OUTER APPLY
(
SELECT TOP(1)
PriceChange.Price
FROM
PriceChange
WHERE
PriceChange.ProductID = History.ProductID
AND PriceChange.DestinationId = History.DestinationId
AND PriceChange.EffectiveDate <= History.ScheduledDate
ORDER BY
PriceChange.EffectiveDate DESC
) AS A
ORDER BY ProductID, ScheduledDate;
For each row from History there will be one seek in this index to pick the correct price.
This query returns:
Prices
+-----------+---------------+---------------+----------+-------+
| ProductId | DestinationId | ScheduledDate | Quantity | Price |
+-----------+---------------+---------------+----------+-------+
| 0 | 1000 | 2018-04-01 | 5 | 1 |
| 0 | 1000 | 2018-04-02 | 10 | 2 |
| 0 | 1000 | 2018-04-03 | 7 | 2 |
| 3 | 5000 | 2018-05-07 | 15 | 5 |
| 3 | 5000 | 2018-05-08 | 23 | 5 |
| 3 | 5000 | 2018-05-09 | 52 | 5 |
| 3 | 5000 | 2018-05-10 | 12 | 20 |
| 3 | 5000 | 2018-05-11 | 14 | 20 |
+-----------+---------------+---------------+----------+-------+
Now a standard gaps-and-island step to collapse consecutive days with the same price together. I use a difference of two row number sequences here.
I've added some more rows to your sample data to see the gaps within the same ProductId.
INSERT INTO History (ProductId, DestinationId, ScheduledDate, Quantity)
VALUES
(0, 1000, '20180601', 5),
(0, 1000, '20180602', 10),
(0, 1000, '20180603', 7),
(3, 5000, '20180607', 15),
(3, 5000, '20180608', 23),
(3, 5000, '20180609', 52),
(3, 5000, '20180610', 12),
(3, 5000, '20180611', 14);
If you run this intermediate query you'll see how it works:
WITH
CTE_Prices
AS
(
SELECT
History.ProductId
,History.DestinationId
,History.ScheduledDate
,History.Quantity
,A.Price
FROM
History
OUTER APPLY
(
SELECT TOP(1)
PriceChange.Price
FROM
PriceChange
WHERE
PriceChange.ProductID = History.ProductID
AND PriceChange.DestinationId = History.DestinationId
AND PriceChange.EffectiveDate <= History.ScheduledDate
ORDER BY
PriceChange.EffectiveDate DESC
) AS A
)
,CTE_rn
AS
(
SELECT
ProductId
,DestinationId
,ScheduledDate
,Quantity
,Price
,ROW_NUMBER() OVER (PARTITION BY ProductId, DestinationId, Price ORDER BY ScheduledDate) AS rn1
,DATEDIFF(day, '20000101', ScheduledDate) AS rn2
FROM
CTE_Prices
)
SELECT *
,rn2-rn1 AS Diff
FROM CTE_rn
Intermediate result
+-----------+---------------+---------------+----------+-------+-----+------+------+
| ProductId | DestinationId | ScheduledDate | Quantity | Price | rn1 | rn2 | Diff |
+-----------+---------------+---------------+----------+-------+-----+------+------+
| 0 | 1000 | 2018-04-01 | 5 | 1 | 1 | 6665 | 6664 |
| 0 | 1000 | 2018-04-02 | 10 | 2 | 1 | 6666 | 6665 |
| 0 | 1000 | 2018-04-03 | 7 | 2 | 2 | 6667 | 6665 |
| 0 | 1000 | 2018-06-01 | 5 | 2 | 3 | 6726 | 6723 |
| 0 | 1000 | 2018-06-02 | 10 | 2 | 4 | 6727 | 6723 |
| 0 | 1000 | 2018-06-03 | 7 | 2 | 5 | 6728 | 6723 |
| 3 | 5000 | 2018-05-07 | 15 | 5 | 1 | 6701 | 6700 |
| 3 | 5000 | 2018-05-08 | 23 | 5 | 2 | 6702 | 6700 |
| 3 | 5000 | 2018-05-09 | 52 | 5 | 3 | 6703 | 6700 |
| 3 | 5000 | 2018-05-10 | 12 | 20 | 1 | 6704 | 6703 |
| 3 | 5000 | 2018-05-11 | 14 | 20 | 2 | 6705 | 6703 |
| 3 | 5000 | 2018-06-07 | 15 | 20 | 3 | 6732 | 6729 |
| 3 | 5000 | 2018-06-08 | 23 | 20 | 4 | 6733 | 6729 |
| 3 | 5000 | 2018-06-09 | 52 | 20 | 5 | 6734 | 6729 |
| 3 | 5000 | 2018-06-10 | 12 | 20 | 6 | 6735 | 6729 |
| 3 | 5000 | 2018-06-11 | 14 | 20 | 7 | 6736 | 6729 |
+-----------+---------------+---------------+----------+-------+-----+------+------+
Now simply group by the Diff to get one row per interval.
Final query
WITH
CTE_Prices
AS
(
SELECT
History.ProductId
,History.DestinationId
,History.ScheduledDate
,History.Quantity
,A.Price
FROM
History
OUTER APPLY
(
SELECT TOP(1)
PriceChange.Price
FROM
PriceChange
WHERE
PriceChange.ProductID = History.ProductID
AND PriceChange.DestinationId = History.DestinationId
AND PriceChange.EffectiveDate <= History.ScheduledDate
ORDER BY
PriceChange.EffectiveDate DESC
) AS A
)
,CTE_rn
AS
(
SELECT
ProductId
,DestinationId
,ScheduledDate
,Quantity
,Price
,ROW_NUMBER() OVER (PARTITION BY ProductId, DestinationId, Price ORDER BY ScheduledDate) AS rn1
,DATEDIFF(day, '20000101', ScheduledDate) AS rn2
FROM
CTE_Prices
)
SELECT
ProductId
,DestinationId
,MIN(ScheduledDate) AS StartDate
,MAX(ScheduledDate) AS EndDate
,SUM(Quantity) AS TotalQuantity
,Price
FROM
CTE_rn
GROUP BY
ProductId
,DestinationId
,Price
,rn2-rn1
ORDER BY
ProductID
,DestinationId
,StartDate
;
Final result
+-----------+---------------+------------+------------+---------------+-------+
| ProductId | DestinationId | StartDate | EndDate | TotalQuantity | Price |
+-----------+---------------+------------+------------+---------------+-------+
| 0 | 1000 | 2018-04-01 | 2018-04-01 | 5 | 1 |
| 0 | 1000 | 2018-04-02 | 2018-04-03 | 17 | 2 |
| 0 | 1000 | 2018-06-01 | 2018-06-03 | 22 | 2 |
| 3 | 5000 | 2018-05-07 | 2018-05-09 | 90 | 5 |
| 3 | 5000 | 2018-05-10 | 2018-05-11 | 26 | 20 |
| 3 | 5000 | 2018-06-07 | 2018-06-11 | 116 | 20 |
+-----------+---------------+------------+------------+---------------+-------+
Not sure that I understand correctly, but this is just my idea:
Select concat_ws(',', view2.StartDate, string_agg(view1.SplitDate, ','), view2.EndDate),
       view2.ProductId, view2.DestinationId
from (
    SELECT DENSE_RANK() OVER (ORDER BY EffectiveDate) as Rank, EffectiveDate as SplitDate
    FROM PriceChange
    GROUP BY EffectiveDate
) view1
join (
    SELECT MIN(ScheduledDate) as StartDate, MAX(ScheduledDate) as EndDate,
           ProductId, DestinationId, SUM(Quantity) as TotalQuantity
    FROM (
        SELECT ScheduledDate, DestinationId, ProductId,
               PartitionGroup = DATEADD(DAY, -1 * DENSE_RANK() OVER (ORDER BY ScheduledDate), ScheduledDate),
               Quantity
        FROM History
    ) tmp
    GROUP BY PartitionGroup, DestinationId, ProductId
) view2
  on view1.SplitDate >= view2.StartDate
 and view1.SplitDate <= view2.EndDate
group by view2.StartDate, view2.EndDate, view2.ProductId, view2.DestinationId
The result from this query will be:
| ranges | productId | DestinationId |
|---------------------------------------------|-----------|---------------|
| 2018-04-01,2018-04-02,2018-04-03 | 0 | 1000 |
| 2018-05-07,2018-05-10,2018-05-11 | 3 | 5000 |
Then, with any procedural language, you can split the string of each row (with the appropriate inclusive or exclusive rule for each boundary) to build a list of conditions (:from, :to, :productId, :destinationId).
And finally, you can loop through the list of conditions and use a UNION ALL clause to build one query (the union of all the per-condition queries) to get the final result. For example,
Select * from History where ScheduledDate >= '2018-04-01' and ScheduledDate <'2018-04-02' and productId = 0 and destinationId = 1000
union all
Select * from History where ScheduledDate >= '2018-04-02' and ScheduledDate <'2018-04-03' and productId = 0 and destinationId = 1000
----Update--------
Just based on the idea above, I made some quick changes to produce your result set. Maybe you can optimize it later.
with view3 as (
    select concat_ws(',', view2.StartDate, string_agg(view1.SplitDate, ','), dateadd(day, 1, view2.EndDate)) dateRange,
           view2.ProductId, view2.DestinationId
    from (
        select DENSE_RANK() over (order by EffectiveDate) as Rank, EffectiveDate as SplitDate
        from PriceChange
        group by EffectiveDate
    ) view1
    join (
        select MIN(ScheduledDate) as StartDate, MAX(ScheduledDate) as EndDate,
               ProductId, DestinationId, SUM(Quantity) as TotalQuantity
        from (
            select ScheduledDate, DestinationId, ProductId,
                   PartitionGroup = DATEADD(DAY, -1 * DENSE_RANK() over (order by ScheduledDate), ScheduledDate),
                   Quantity
            from History
        ) tmp
        group by PartitionGroup, DestinationId, ProductId
    ) view2
      on view1.SplitDate >= view2.StartDate
     and view1.SplitDate <= view2.EndDate
    group by view2.StartDate, view2.EndDate, view2.ProductId, view2.DestinationId
),
view4 as (
    select ProductId, DestinationId, value
    from view3
    cross apply string_split(dateRange, ',')
),
view5 as (
    select *, row_number() over (partition by ProductId, DestinationId order by value) rn
    from view4
),
view6 as (
    select v52.value fr, v51.value t, v51.ProductId, v51.DestinationId
    from view5 v51
    join view5 v52
      on v51.ProductId = v52.ProductId
     and v51.DestinationId = v52.DestinationId
     and v51.rn = v52.rn + 1
)
select min(h.ScheduledDate) StartDate, max(h.ScheduledDate) EndDate,
       v6.ProductId, v6.DestinationId, sum(h.Quantity) TotalQuantity
from view6 v6
join History h
  on v6.DestinationId = h.DestinationId
 and v6.ProductId = h.ProductId
 and h.ScheduledDate >= v6.fr
 and h.ScheduledDate < v6.t
group by v6.fr, v6.t, v6.ProductId, v6.DestinationId
And the result is exactly the same as what you gave.
| StartDate | EndDate | productId | destinationId | TotalQuantity |
|------------|------------|-----------|---------------|---------------|
| 2018-04-01 | 2018-04-01 | 0 | 1000 | 5 |
| 2018-04-02 | 2018-04-03 | 0 | 1000 | 17 |
| 2018-05-07 | 2018-05-09 | 3 | 5000 | 90 |
| 2018-05-10 | 2018-05-11 | 3 | 5000 | 26 |
Use outer apply to choose the nearest price, then do a group by:
Live test: http://www.sqlfiddle.com/#!18/af568/65
select
StartDate = min(h.ScheduledDate),
EndDate = max(h.ScheduledDate),
h.ProductId,
h.DestinationId,
TotalQuantity = sum(h.Quantity)
from History h
outer apply
(
select top 1 pc.*
from PriceChange pc
where
pc.ProductId = h.ProductId
and pc.Effectivedate <= h.ScheduledDate
order by pc.EffectiveDate desc
) UpToDate
group by UpToDate.EffectiveDate,
h.ProductId,
h.DestinationId
order by StartDate, EndDate, ProductId
Output:
| StartDate | EndDate | ProductId | DestinationId | TotalQuantity |
|------------|------------|-----------|---------------|---------------|
| 2018-04-01 | 2018-04-01 | 0 | 1000 | 5 |
| 2018-04-02 | 2018-04-03 | 0 | 1000 | 17 |
| 2018-05-07 | 2018-05-09 | 3 | 5000 | 90 |
| 2018-05-10 | 2018-05-11 | 3 | 5000 | 26 |
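Note that the outer apply above correlates the price rows on ProductId only. If prices can differ per destination (the PriceChange sample table does carry a DestinationId column, and the other answers match on it), you would probably also want to add that column to the correlation, e.g.:
where
    pc.ProductId = h.ProductId
    and pc.DestinationId = h.DestinationId
    and pc.Effectivedate <= h.ScheduledDate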

How to get a record by date from one table and update another table in PostgreSQL?

I have two tables. One table (order_product) has multiple records by date, and the other table (Transfer_product) also has multiple records by date. The order_product table holds the correct values. I want to update my Transfer_product table from the order_product table by date range.
order_product_table
-------------------------
id | date | Product_id | value
-------------------------------------------
1 | 2017-07-01 | 2 | 53
2 | 2017-08-05 | 2 | 67
3 | 2017-10-02 | 2 | 83
4 | 2018-01-20 | 5 | 32
5 | 2018-05-01 | 5 | 53
6 | 2008-08-05 | 6 | 67
Transfer_product_table
----------------------------
id | date | Product_id | value
--------------------------------------------
1 | 2017-08-01 | 2 | 10
2 | 2017-10-06 | 2 | 20
3 | 2017-12-12 | 2 | 31
4 | 2018-06-25 | 5 | 5
Result(Transfer_product_table)
--------------------------------
id | date | Product_id | value
--------------------------------------------
1 | 2017-08-01 | 2 | 53
2 | 2017-10-06 | 2 | 83
3 | 2017-12-12 | 2 | 83
4 | 2018-06-25 | 5 | 53
I want the value updated by date, as you can see in the Result table.
I tried a query with PARTITION BY, but this is not what I want.
UPDATE Transfer_product_table imp
SET value = sub.value
FROM (SELECT product_id,value
,ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY orderdate DESC)AS Rno
FROM order_product_table
where orderdate between '2017-07-01' and '2019-10-31') sub
WHERE imp.product_id = sub.product_id
and sub.Rno=1
and imp.date between '2017-07-01' and '2019-10-31'
This is pretty straightforward using Postgres' awesome daterange type.
with order_product_table as (
select * from (
VALUES (1, '2017-07-01'::date, 2, 53),
(2, '2017-08-05', 2, 67),
(3, '2017-10-02', 2, 83),
(4, '2018-01-20', 5, 32),
(5, '2018-05-01', 5, 53),
(6, '2008-08-05', 6, 67)
) v(id, date, product_id, value)
), transfer_product_table as (
select * from (
VALUES (1, '2017-08-01'::date, 2, 10),
(2, '2017-10-06', 2, 20),
(3, '2017-12-12', 2, 31),
(4, '2018-06-25', 5, 5)
) v(id, date, product_id, value)
), price_ranges AS (
select product_id,
daterange(date, lead(date) OVER (PARTITION BY product_id order by date), '[)') as pricerange,
value
FROM order_product_table
)
SELECT id,
date,
transfer_product_table.product_id,
price_ranges.value
FROM transfer_product_table
JOIN price_ranges ON price_ranges.product_id = transfer_product_table.product_id
AND date <@ pricerange
ORDER BY id
;
id | date | product_id | value
----+------------+------------+-------
1 | 2017-08-01 | 2 | 53
2 | 2017-10-06 | 2 | 83
3 | 2017-12-12 | 2 | 83
4 | 2018-06-25 | 5 | 53
(4 rows)
Basically, we figure out the price at any given date by using the order_product_table. We get the price between the current date (inclusive) and the next date (exclusive) with this:
daterange(date, lead(date) OVER (PARTITION BY product_id order by date), '[)') as pricerange,
Then we simply join to this on the condition that the product_ids match and that date in the transfer_product_table is contained by the pricerange.
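Since the question ultimately wants Transfer_product_table updated in place, the same price_ranges CTE can drive an UPDATE ... FROM. A rough sketch, assuming the real tables use the column names shown in the sample data:
with price_ranges as (
    select product_id,
           daterange(date, lead(date) over (partition by product_id order by date), '[)') as pricerange,
           value
    from order_product_table
)
update transfer_product_table t
set value = p.value
from price_ranges p
where p.product_id = t.product_id
  and t.date <@ p.pricerange;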

Add extra column in sql to show ratio with previous row

I have a SQL table with a format like this:
SELECT period_id, amount FROM table;
+--------------------+
| period_id | amount |
+-----------+--------+
| 1 | 12 |
| 2 | 11 |
| 3 | 15 |
| 4 | 20 |
| .. | .. |
+-----------+--------+
I'd like to add an extra column (just in my select statement) that calculates the growth ratio with the previous amount, like so:
SELECT period_id, amount, [insert formula here] AS growth FROM table;
+-----------------------------+
| period_id | amount | growth |
+-----------+-----------------+
| 1 | 12 | |
| 2 | 11 | 0.91 | <-- 11/12
| 3 | 15 | 1.36 | <-- 15/11
| 4 | 20 | 1.33 | <-- 20/15
| .. | .. | .. |
+-----------+-----------------+
Just need to work out how to perform the operation with the line before. Not interested in adding to the table. Any help appreciated :)
** also want to point out that period_id is in order but not necessarily increasing incrementally
The window function Lag() would be a good fit here.
You may notice that we use (amount+0.0). This is done just in case amount is an INT (it forces decimal division), and NullIf() avoids the dreaded divide by zero.
Declare @YourTable table (period_id int,amount int)
Insert Into @YourTable values
( 1,12),
( 2,11),
( 3,15),
( 4,20)
Select period_id
,amount
,growth = cast((amount+0.0) / NullIf(lag(amount,1) over (Order By Period_ID),0) as decimal(10,2))
From @YourTable
Returns
period_id amount growth
1 12 NULL
2 11 0.92
3 15 1.36
4 20 1.33
If you are using SQL Server 2012+ then go for John Cappelletti's answer.
And if you are less blessed, like me, the code below works for you in the 2008 version too.
Declare @YourTable table (period_id int,amount int)
Insert Into @YourTable values
( 1,12),
( 2,11),
( 3,15),
( 4,20)
;WITH CTE AS (
SELECT ROW_NUMBER() OVER (
ORDER BY period_id
) SNO
,period_id
,amount
FROM @YourTable
)
SELECT C1.period_id
,C1.amount
,CASE
WHEN C2.amount IS NOT NULL AND C2.amount<>0
THEN CAST(C1.amount / CAST(C2.amount AS FLOAT) AS DECIMAL(18, 2))
END AS growth
FROM CTE C1
LEFT JOIN CTE C2 ON C1.SNO = C2.SNO + 1
Which works the same as LAG.
+-----------+--------+--------+
| period_id | amount | growth |
+-----------+--------+--------+
| 1 | 12 | NULL |
| 2 | 11 | 0.92 |
| 3 | 15 | 1.36 |
| 4 | 20 | 1.33 |
+-----------+--------+--------+