First value in subsequent rows that match a condition - sql

| order_at   | delivery_at |
|------------|-------------|
| 2023-01-01 | 2023-01-03  |
| 2023-01-02 | 2023-01-03  |
| 2023-01-03 | 2023-01-05  |
| 2023-01-04 | 2023-01-05  |
I want a new field, next_delivery_at, which for each row is the first delivery_at in subsequent rows that differs from that row's delivery_at, so the final table would be:
| order_at   | delivery_at | next_delivery_at |
|------------|-------------|------------------|
| 2023-01-01 | 2023-01-03  | 2023-01-05       |
| 2023-01-02 | 2023-01-03  | 2023-01-05       |
| 2023-01-03 | 2023-01-05  | null             |
| 2023-01-04 | 2023-01-05  | null             |
For this specific case, I could do something like:
CASE
WHEN (LEAD(delivery_at) OVER (PARTITION BY NULL ORDER BY delivery_at DESC) = delivery_at)
THEN (LEAD(delivery_at, 2) OVER (PARTITION BY NULL ORDER BY delivery_at DESC))
ELSE LEAD(delivery_at) OVER (PARTITION BY NULL ORDER BY delivery_at DESC)
END AS next_delivery_at
But if more than two consecutive rows share the same delivery_at, the output will be wrong, so I am looking for a generic way of getting, for each row, the first value in subsequent rows that is distinct from that row's delivery_at.

You can use a self join to match successive deliveries, then get the minimum next delivery.
SELECT t1.order_at, t1.delivery_at, MIN(t2.delivery_at) AS next_delivery_at
FROM tab t1
LEFT JOIN tab t2
ON t1.delivery_at < t2.delivery_at
GROUP BY t1.order_at, t1.delivery_at
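To check that the self join also copes with more than two consecutive rows sharing a delivery_at, here is a minimal sketch that runs the same query from Python against an in-memory SQLite database. The SQLite setup, the function name next_delivery_self_join and the TRIPLE data set are assumptions made for illustration; they are not part of the original answer.

```python
import sqlite3

# Rows from the question (ISO date strings compare correctly as text).
ROWS = [
    ("2023-01-01", "2023-01-03"),
    ("2023-01-02", "2023-01-03"),
    ("2023-01-03", "2023-01-05"),
    ("2023-01-04", "2023-01-05"),
]

# Hypothetical variant with three consecutive rows sharing a delivery_at.
TRIPLE = [
    ("2023-01-01", "2023-01-03"),
    ("2023-01-02", "2023-01-03"),
    ("2023-01-03", "2023-01-03"),
    ("2023-01-04", "2023-01-05"),
]


def next_delivery_self_join(rows):
    """Return (order_at, delivery_at, next_delivery_at) using the self join."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tab (order_at TEXT, delivery_at TEXT)")
    con.executemany("INSERT INTO tab VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT t1.order_at, t1.delivery_at, MIN(t2.delivery_at) AS next_delivery_at
        FROM tab t1
        LEFT JOIN tab t2
          ON t1.delivery_at < t2.delivery_at
        GROUP BY t1.order_at, t1.delivery_at
        ORDER BY t1.order_at
        """
    ).fetchall()
    con.close()
    return result


for row in next_delivery_self_join(TRIPLE):
    print(row)
```

Rows with no later, distinct delivery_at come back with None, which corresponds to the null in the expected table.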

You might consider the following, which uses a logical window frame (RANGE):
WITH sample_table AS (
SELECT '2023-01-01' order_at, '2023-01-03' delivery_at UNION ALL
SELECT '2023-01-02' order_at, '2023-01-03' delivery_at UNION ALL
SELECT '2023-01-03' order_at, '2023-01-05' delivery_at UNION ALL
SELECT '2023-01-04' order_at, '2023-01-05' delivery_at
)
SELECT *,
FIRST_VALUE(delivery_at) OVER (
ORDER BY UNIX_DATE(DATE(delivery_at))
RANGE BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
) AS next_delivery_at
FROM sample_table;
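The same RANGE idea can be tried outside BigQuery. The sketch below is an assumption-laden port to SQLite run from Python: UNIX_DATE(DATE(...)) is swapped for CAST(julianday(...) AS INTEGER) to get a numeric day number, and the function name next_delivery_range_frame is made up for illustration. RANGE with a numeric offset needs SQLite 3.28 or newer.

```python
import sqlite3

# Rows from the question.
ROWS = [
    ("2023-01-01", "2023-01-03"),
    ("2023-01-02", "2023-01-03"),
    ("2023-01-03", "2023-01-05"),
    ("2023-01-04", "2023-01-05"),
]


def next_delivery_range_frame(rows):
    """Return (order_at, delivery_at, next_delivery_at) using a RANGE frame."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sample_table (order_at TEXT, delivery_at TEXT)")
    con.executemany("INSERT INTO sample_table VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT order_at, delivery_at,
               FIRST_VALUE(delivery_at) OVER (
                   -- numeric day number, so "1 FOLLOWING" means "at least one day later"
                   ORDER BY CAST(julianday(delivery_at) AS INTEGER)
                   RANGE BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
               ) AS next_delivery_at
        FROM sample_table
        ORDER BY order_at
        """
    ).fetchall()
    con.close()
    return result


for row in next_delivery_range_frame(ROWS):
    print(row)
```

Because the frame starts one day after the current row's delivery_at, rows with the same delivery_at are never part of the frame, however many of them there are.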

Generate dates without gap and vlookup 2 tables

I have 2 tables:
Table 1 - has the available principal balance on the transaction dates
=> (Transaction_Date, Principal_Balance)
Table 2 - has interest rates and their effective dates
=> (Effective_Date, Interest)
I have to generate a result set with the following columns:
Result_Date - all the dates from Table 1 (Transaction_Date), with the gaps between the available dates filled in
Principal_Balance - look up Result_Date in Table 1 and carry the corresponding Principal_Balance forward until the next Table 1 Transaction_Date
Interest - look up Result_Date in Table 2 (Effective_Date) and carry the Interest forward across that date range
How can I implement this in Snowflake?
Note: I can use CTEs but cannot create tables.
This takes a few simple steps: find the min/max dates of both tables, take the least/greatest of those values, and spread them across a generator range by feeding a row_number into a dateadd. Then left join this range of dates to the original data, and use nvl to pick between the current value and the value carried forward by lag with the ignore nulls extension.
with table_1(t_date, p_bal) as (
select * from values
('2023-02-10'::date, 10000),
('2023-01-20'::date, 9000),
('2023-01-03'::date, 8000)
), table_2(e_date, interest) as (
select * from values
('2023-02-05'::date, 1),
('2023-01-15'::date, 2),
('2023-01-01'::date, 3)
), range_of_date as (
select dateadd(day, rn, min_date) as date
from (
select
least(min_t_date, min_e_date) as min_date
,greatest(max_t_date, max_e_date) as max_date
from (
select
min(t_date) as min_t_date
,max(t_date) as max_t_date
from table_1
) as t
cross join (
select
min(e_date) as min_e_date
,max(e_date) as max_e_date
from table_2
) as e
) as a
cross join (
select
row_number() over (order by null) - 1 as rn
from table(generator(ROWCOUNT => 1000))
) as b
having date <= max_date
)
select
*
,nvl(t1.p_bal,lag(t1.p_bal)ignore nulls over (order by rd.date)) as filled_p_bal
,nvl(t2.interest,lag(t2.interest)ignore nulls over(order by rd.date)) as filled_interest
from range_of_date as rd
left join table_1 as t1
on rd.date = t1.t_date
left join table_2 as t2
on rd.date = t2.e_date
order by 1;
gives:
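| DATE       | T_DATE     | P_BAL  | E_DATE     | INTEREST | FILLED_P_BAL | FILLED_INTEREST |
|------------|------------|--------|------------|----------|--------------|-----------------|
| 2023-01-01 |            |        | 2023-01-01 | 3        |              | 3               |
| 2023-01-02 |            |        |            |          |              | 3               |
| 2023-01-03 | 2023-01-03 | 8,000  |            |          | 8,000        | 3               |
| 2023-01-04 |            |        |            |          | 8,000        | 3               |
| 2023-01-05 |            |        |            |          | 8,000        | 3               |
| 2023-01-06 |            |        |            |          | 8,000        | 3               |
| 2023-01-07 |            |        |            |          | 8,000        | 3               |
| 2023-01-08 |            |        |            |          | 8,000        | 3               |
| 2023-01-09 |            |        |            |          | 8,000        | 3               |
| 2023-01-10 |            |        |            |          | 8,000        | 3               |
| 2023-01-11 |            |        |            |          | 8,000        | 3               |
| 2023-01-12 |            |        |            |          | 8,000        | 3               |
| 2023-01-13 |            |        |            |          | 8,000        | 3               |
| 2023-01-14 |            |        |            |          | 8,000        | 3               |
| 2023-01-15 |            |        | 2023-01-15 | 2        | 8,000        | 2               |
| 2023-01-16 |            |        |            |          | 8,000        | 2               |
| 2023-01-17 |            |        |            |          | 8,000        | 2               |
| 2023-01-18 |            |        |            |          | 8,000        | 2               |
| 2023-01-19 |            |        |            |          | 8,000        | 2               |
| 2023-01-20 | 2023-01-20 | 9,000  |            |          | 9,000        | 2               |
| 2023-01-21 |            |        |            |          | 9,000        | 2               |
| 2023-01-22 |            |        |            |          | 9,000        | 2               |
| 2023-01-23 |            |        |            |          | 9,000        | 2               |
| 2023-01-24 |            |        |            |          | 9,000        | 2               |
| 2023-01-25 |            |        |            |          | 9,000        | 2               |
| 2023-01-26 |            |        |            |          | 9,000        | 2               |
| 2023-01-27 |            |        |            |          | 9,000        | 2               |
| 2023-01-28 |            |        |            |          | 9,000        | 2               |
| 2023-01-29 |            |        |            |          | 9,000        | 2               |
| 2023-01-30 |            |        |            |          | 9,000        | 2               |
| 2023-01-31 |            |        |            |          | 9,000        | 2               |
| 2023-02-01 |            |        |            |          | 9,000        | 2               |
| 2023-02-02 |            |        |            |          | 9,000        | 2               |
| 2023-02-03 |            |        |            |          | 9,000        | 2               |
| 2023-02-04 |            |        |            |          | 9,000        | 2               |
| 2023-02-05 |            |        | 2023-02-05 | 1        | 9,000        | 1               |
| 2023-02-06 |            |        |            |          | 9,000        | 1               |
| 2023-02-07 |            |        |            |          | 9,000        | 1               |
| 2023-02-08 |            |        |            |          | 9,000        | 1               |
| 2023-02-09 |            |        |            |          | 9,000        | 1               |
| 2023-02-10 | 2023-02-10 | 10,000 |            |          | 10,000       | 1               |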
Take 2:
To restrict the output to the date range of table 1 while still carrying forward interest values that take effect before the first balance date, the generated date range has to start earlier than table 1's first date. (The match could be more constrained, but I have opted for what is simplest to me.)
with table_1(t_date, p_bal) as (
select * from values
('2023-02-10'::date, 10000),
('2023-01-20'::date, 9000),
('2023-01-03'::date, 8000)
), table_2(e_date, interest) as (
select * from values
('2023-02-05'::date, 1),
('2023-01-15'::date, 2),
('2023-01-01'::date, 3)
), range_1 as (
select
min(t_date) as min_t_date
,max(t_date) as max_t_date
from table_1
), range_of_date as (
select dateadd(day, rn, min_date) as date
from (
select
least(min_t_date, (select min(e_date) from table_2)) as min_date
,max_t_date as max_date
from range_1
) as a
cross join (
select
row_number() over (order by null) - 1 as rn
from table(generator(ROWCOUNT => 1000))
) as b
having date <= max_date
)
select
rd.date
,nvl(t1.p_bal,lag(t1.p_bal)ignore nulls over (order by rd.date)) as filled_p_bal
,nvl(t2.interest,lag(t2.interest)ignore nulls over(order by rd.date)) as filled_interest
from range_of_date as rd
left join table_1 as t1
on rd.date = t1.t_date
left join table_2 as t2
on rd.date = t2.e_date
qualify filled_p_bal is not null
order by 1;
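The gap-fill and carry-forward logic of the second query can also be sketched outside the database. The following is a plain Python illustration, not Snowflake code: the function name fill_daily and the dict-based inputs are assumptions, and both inputs are assumed to be non-empty.

```python
from datetime import date, timedelta

# Sample data from the answer: date -> value.
BALANCES = {
    date(2023, 2, 10): 10000,
    date(2023, 1, 20): 9000,
    date(2023, 1, 3): 8000,
}
RATES = {
    date(2023, 2, 5): 1,
    date(2023, 1, 15): 2,
    date(2023, 1, 1): 3,
}


def fill_daily(balances, rates):
    """Return one (day, balance, rate) tuple per day in the balance date range.

    The scan starts at the earliest date of either input so that a rate
    effective before the first balance is already known on the first output day.
    """
    first_day, last_day = min(balances), max(balances)
    day = min(first_day, min(rates))
    balance = rate = None
    filled = []
    while day <= last_day:
        # Take the value defined on this day, else keep the last known one.
        balance = balances.get(day, balance)
        rate = rates.get(day, rate)
        if day >= first_day:  # same effect as "qualify filled_p_bal is not null"
            filled.append((day, balance, rate))
        day += timedelta(days=1)
    return filled


result = fill_daily(BALANCES, RATES)
print(result[0], result[-1], len(result))
```

The loop plays the role of the generated date range, and the two running variables play the role of lag(...) ignore nulls.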

How can I select records from the last value accumulated

I have the following data in TABLE_A:
| RegisteredDate   | Quantity |
|------------------|----------|
| 2022-03-01 13:00 | 100      |
| 2022-03-01 13:10 | 20       |
| 2022-03-01 13:20 | -80      |
| 2022-03-01 13:30 | -40      |
| 2022-03-02 09:00 | 10       |
| 2022-03-02 22:00 | -5       |
| 2022-03-03 02:00 | -5       |
| 2022-03-03 03:00 | 25       |
| 2022-03-03 03:20 | -10      |
If I add a cumulative column:
select RegisteredDate, Quantity
, sum(Quantity) over ( order by RegisteredDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as Summary
from TABLE_A
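| RegisteredDate   | Quantity | Summary |
|------------------|----------|---------|
| 2022-03-01 13:00 | 100      | 100     |
| 2022-03-01 13:10 | 20       | 120     |
| 2022-03-01 13:20 | -80      | 40      |
| 2022-03-01 13:30 | -40      | 0       |
| 2022-03-02 09:00 | 10       | 10      |
| 2022-03-02 22:00 | -5       | 5       |
| 2022-03-03 02:00 | -5       | 0       |
| 2022-03-03 03:00 | 25       | 25      |
| 2022-03-03 03:20 | -10      | 15      |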
Is there a way to get the following result with a query?
| RegisteredDate   | Quantity | Summary |
|------------------|----------|---------|
| 2022-03-03 03:00 | 25       | 25      |
| 2022-03-03 03:20 | -10      | 15      |
This result is the set of records after the last zero.
EDIT:
What I really need for this problem is 2022-03-03 03:00, the first date among the records after the last zero.
You can use the SUM window function to compute a grp column that counts, from the latest row backwards, how many zero summaries have been seen. The records after the last zero are the ones with grp = 0.
Query 1:
WITH cte AS
(
SELECT RegisteredDate,
Quantity,
sum(Quantity) over (order by RegisteredDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as Summary
FROM TABLE_A
), cte2 AS (
SELECT *,
SUM(CASE WHEN Summary = 0 THEN 1 ELSE 0 END) OVER(order by RegisteredDate desc) grp
FROM cte
)
SELECT RegisteredDate,
Quantity
FROM cte2
WHERE grp = 0
ORDER BY RegisteredDate
Results:
| RegisteredDate | Quantity |
|----------------------|----------|
| 2022-03-03T03:00:00Z | 25 |
| 2022-03-03T03:20:00Z | -10 |
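To make the role of the grp column visible, the sketch below runs the same two CTEs in an in-memory SQLite database from Python and returns grp for every row. The SQLite setup and the helper name rows_with_grp are assumptions for illustration only.

```python
import sqlite3

# Rows from the question.
ROWS = [
    ("2022-03-01 13:00", 100),
    ("2022-03-01 13:10", 20),
    ("2022-03-01 13:20", -80),
    ("2022-03-01 13:30", -40),
    ("2022-03-02 09:00", 10),
    ("2022-03-02 22:00", -5),
    ("2022-03-03 02:00", -5),
    ("2022-03-03 03:00", 25),
    ("2022-03-03 03:20", -10),
]


def rows_with_grp(rows):
    """Return (RegisteredDate, Quantity, Summary, grp) ordered by date."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE TABLE_A (RegisteredDate TEXT, Quantity INTEGER)")
    con.executemany("INSERT INTO TABLE_A VALUES (?, ?)", rows)
    result = con.execute(
        """
        WITH cte AS (
            SELECT RegisteredDate, Quantity,
                   SUM(Quantity) OVER (ORDER BY RegisteredDate
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Summary
            FROM TABLE_A
        ), cte2 AS (
            -- running count of zero summaries, walking from the newest row back
            SELECT *,
                   SUM(CASE WHEN Summary = 0 THEN 1 ELSE 0 END)
                       OVER (ORDER BY RegisteredDate DESC) AS grp
            FROM cte
        )
        SELECT RegisteredDate, Quantity, Summary, grp
        FROM cte2
        ORDER BY RegisteredDate
        """
    ).fetchall()
    con.close()
    return result


all_rows = rows_with_grp(ROWS)
after_last_zero = [r[:2] for r in all_rows if r[3] == 0]
print([r[3] for r in all_rows])
print(after_last_zero)
```

Every row at or before the last zero has grp >= 1, so filtering on grp = 0 keeps only the tail of the table.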
Use a CTE that returns the summary column and NOT EXISTS to filter out the rows that you don't need:
WITH cte AS (SELECT *, SUM(Quantity) OVER (ORDER BY RegisteredDate) Summary FROM TABLE_A)
SELECT c1.*
FROM cte c1
WHERE NOT EXISTS (
SELECT 1
FROM cte c2 WHERE c2.RegisteredDate >= c1.RegisteredDate AND c2.Summary = 0
)
ORDER BY c1.RegisteredDate;
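ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is omitted from the OVER clause here. Strictly speaking, the default frame with ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW rather than ROWS, but both produce the same running sum as long as the RegisteredDate values are unique.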
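The difference between the default frame and an explicit ROWS frame only shows up when the ordering key has ties. A minimal sketch in SQLite from Python, using a made-up table t with a duplicated key (the table and column names are assumptions for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (k INTEGER, v INTEGER)")
# Two rows share the ordering key k = 2.
con.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (2, 30), (3, 40)])

rows = con.execute(
    """
    SELECT k, v,
           SUM(v) OVER (ORDER BY k) AS default_frame,
           SUM(v) OVER (ORDER BY k
                        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_frame,
           SUM(v) OVER (ORDER BY k
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_frame
    FROM t
    ORDER BY k, v
    """
).fetchall()
con.close()

# Columns: k, v, default_frame, range_frame, rows_frame
for row in rows:
    print(row)
```

With the default (RANGE) frame both k = 2 rows get the same sum because they are peers, while the ROWS frame gives them different sums, in an order that is not guaranteed.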
Try this:
with u as
(select RegisteredDate,
Quantity,
sum(Quantity) over (order by RegisteredDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as Summary
from TABLE_A)
select * from u
where RegisteredDate >= all(select RegisteredDate from u where Summary = 0)
and Summary <> 0;
Basically, you want RegisteredDate to be >= every RegisteredDate where Summary = 0, and you want Summary <> 0.
When using window functions, keep in mind that the RegisteredDate column may not be unique in TABLE_A, so ordering by RegisteredDate alone is not enough to get a stable result on the same dataset.
With A As (
Select ROW_NUMBER() Over (Order by RegisteredDate, Quantity) As ID, RegisteredDate, Quantity
From TABLE_A),
B As (
Select A.*, SUM(Quantity) Over (Order by ID) As Summary
From A)
-- Order by ID so that Top 1 deterministically returns the first row after the last zero
Select Top 1 *
From B
Where ID > (Select MAX(ID) From B Where Summary=0)
Order by ID
| ID | RegisteredDate   | Quantity | Summary |
|----|------------------|----------|---------|
| 8  | 2022-03-03 03:00 | 25       | 25      |

SQL Dates Selection

I have an OPL_Dates table with start and end dates as below:
dbo.OPL_Dates
ID Start_date End_date
--------------------------------------
12345 1975-01-01 2001-12-31
12345 1989-01-01 2004-12-31
12345 2005-01-01 NULL
12345 2007-01-01 NULL
12377 2009-06-01 2009-12-31
12377 2013-02-07 NULL
12377 2010-01-01 2012-01-01
12489 2011-12-31 NULL
12489 2012-03-01 2012-04-01
The Output I am looking for is:
ID Start_date End_date
-------------------------------------
12345 1975-01-01 2004-12-31
12345 2005-01-01 NULL
12377 2009-06-01 2009-12-31
12377 2010-01-01 2012-01-01
12377 2013-02-07 NULL
12489 2011-12-31 NULL
Basically, I want to show the gaps between the OPL periods (if any); otherwise I need the min of Start_date and the max of End_date for a particular ID. NULL means an open-ended date, which can be converted to "9999-12-31".
The following pretty much does what you want:
with p as (
select v.*, sum(inc) over (partition by v.id order by v.dte) as running_inc
from t cross apply
(values (id, start_date, 1),
(id, coalesce(end_date, '2999-12-31'), -1)
) v(id, dte, inc)
)
select id, min(dte), max(dte)
from (select p.*, sum(case when running_inc = 0 then 1 else 0 end) over (partition by id order by dte desc) as grp
from p
) p
group by id, grp;
Note that it changes the "infinite" end date from NULL to 2999-12-31. This is a convenience, because NULL orders first in SQL Server ascending sorts.
What is this doing? It is unpivoting the dates into a single column, with a 1/-1 flag (inc) indicating whether the record is a start or end. The running sum of this flag then indicates the groups that should be combined. When the running sum is 0, then a group has ended. To include the end date in the right group, a reverse running sum is needed -- but that's a detail.
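The +1/-1 idea can be illustrated in plain Python. This is a sketch of the same technique, not a translation of the query: it detects the group start while scanning forward instead of using a reverse running sum, it orders a start before an end that falls on the same day (an assumption, since the query does not specify a tie-break), and the names merge_periods and OPEN_END are made up for illustration.

```python
OPEN_END = "9999-12-31"  # stand-in for a NULL (open-ended) end date

# Rows from the question: (ID, Start_date, End_date or None).
PERIODS = [
    (12345, "1975-01-01", "2001-12-31"),
    (12345, "1989-01-01", "2004-12-31"),
    (12345, "2005-01-01", None),
    (12345, "2007-01-01", None),
    (12377, "2009-06-01", "2009-12-31"),
    (12377, "2013-02-07", None),
    (12377, "2010-01-01", "2012-01-01"),
    (12489, "2011-12-31", None),
    (12489, "2012-03-01", "2012-04-01"),
]


def merge_periods(periods):
    """Merge overlapping periods per ID using a running sum of +1/-1 flags."""
    events = []
    for pid, start, end in periods:
        events.append((pid, start, 0, +1))             # a period opens
        events.append((pid, end or OPEN_END, 1, -1))   # a period closes
    # Sort by ID, then date; on equal dates, opens (0) come before closes (1).
    events.sort()

    merged = []
    running = {}      # ID -> number of currently open periods
    group_start = {}  # ID -> start date of the group being built
    for pid, day, _, inc in events:
        if running.get(pid, 0) == 0:
            group_start[pid] = day
        running[pid] = running.get(pid, 0) + inc
        if running[pid] == 0:  # all open periods closed: the group ends here
            merged.append((pid, group_start[pid], None if day == OPEN_END else day))
    return merged


for row in merge_periods(PERIODS):
    print(row)
```

Whenever the running sum returns to 0, every period that was opened has been closed, so the next start date begins a new group.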

start date end date combine rows

In Redshift, I want a SQL script that consolidates monthly records into a single record as long as the gap between the end date of one record and the start date of the next record is 32 days or less (<= 32). The output start date is the minimum STARTDT of the continuous months, and the output end date is the maximum ENDDT of the continuous months.
The input data below is the table's data, followed by the expected output. The input data is listed ORDER BY ID, STARTDT, ENDDT in ascending order.
For example, consider ID 100 in the table below. The gap between the end of the first record and the start of the next record is <= 32 days, but the gap between the second record's end date and the third record's start date is more than 32 days. Hence the first two records are consolidated into one record, i.e. ID, MIN(STARTDT), MAX(ENDDT), which corresponds to the first record in the expected output. Similarly, the gap between records 3 and 4 in the input data is within 32 days, so these 2 records are consolidated into a single record, which corresponds to the second record in the expected output.
INPUT DATA:
ID STARTDT ENDDT
100 2000-01-01 2000-01-31
100 2000-02-01 2000-02-29
100 2000-05-01 2000-05-31
100 2000-06-01 2000-06-30
100 2000-09-01 2000-09-30
100 2000-10-01 2000-10-31
101 2012-06-01 2012-06-30
101 2012-07-01 2012-07-31
102 2000-01-01 2000-01-31
103 2013-03-01 2013-03-31
103 2013-05-01 2013-05-31
EXPECTED OUTPUT:
ID MIN_STARTDT MAX_END_DT
100 2000-01-01 2000-02-29
100 2000-05-01 2000-06-30
100 2000-09-01 2000-10-31
101 2012-06-01 2012-07-31
102 2000-01-01 2000-01-31
103 2013-03-01 2013-03-31
103 2013-05-01 2013-05-31
You can do this in steps:
Use a join to identify where two adjacent records should be combined.
Then do a cumulative sum to assign all such adjacent records a grouping identifier.
Aggregate.
It looks like:
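select id, min(startdt), max(enddt)
from (select t.*,
-- a row without an adjacent predecessor starts a new group;
-- sum (not count) so that only those rows increment grp
sum(case when tprev.id is null then 1 else 0 end) over
(partition by t.id
order by t.startdt
rows between unbounded preceding and current row
) as grp
from t left join
t tprev
-- this matches only records that start exactly one day after the previous one ends
on t.id = tprev.id and
t.startdt = tprev.enddt + interval '1 day'
) t
group by id, grp;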
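The flag-then-cumulative-sum steps can also be written with LAG instead of a self join, which makes it easy to use the 32-day threshold from the question. The sketch below runs such a variant in SQLite from Python; the julianday arithmetic, the table name date_test, the function name consolidate and the max_gap_days parameter are assumptions for illustration, not Redshift syntax.

```python
import sqlite3

# Input data from the question.
ROWS = [
    (100, "2000-01-01", "2000-01-31"),
    (100, "2000-02-01", "2000-02-29"),
    (100, "2000-05-01", "2000-05-31"),
    (100, "2000-06-01", "2000-06-30"),
    (100, "2000-09-01", "2000-09-30"),
    (100, "2000-10-01", "2000-10-31"),
    (101, "2012-06-01", "2012-06-30"),
    (101, "2012-07-01", "2012-07-31"),
    (102, "2000-01-01", "2000-01-31"),
    (103, "2013-03-01", "2013-03-31"),
    (103, "2013-05-01", "2013-05-31"),
]


def consolidate(rows, max_gap_days=32):
    """Merge records per id while the gap to the previous record is small enough."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE date_test (id INTEGER, startdt TEXT, enddt TEXT)")
    con.executemany("INSERT INTO date_test VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        WITH flagged AS (
            -- is_start = 1 for the first row of an id or after a gap that is too large
            SELECT id, startdt, enddt,
                   CASE WHEN julianday(startdt)
                             - julianday(LAG(enddt) OVER (PARTITION BY id ORDER BY startdt))
                             <= :max_gap
                        THEN 0 ELSE 1 END AS is_start
            FROM date_test
        ), grouped AS (
            -- the cumulative sum of the flag is the grouping identifier
            SELECT id, startdt, enddt,
                   SUM(is_start) OVER (PARTITION BY id ORDER BY startdt) AS grp
            FROM flagged
        )
        SELECT id, MIN(startdt) AS min_startdt, MAX(enddt) AS max_enddt
        FROM grouped
        GROUP BY id, grp
        ORDER BY id, min_startdt
        """,
        {"max_gap": max_gap_days},
    ).fetchall()
    con.close()
    return result


for row in consolidate(ROWS):
    print(row)
```

With the 32-day threshold the two records for ID 103 are 31 days apart and therefore merge; calling consolidate(ROWS, max_gap_days=1) keeps them separate.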
The question is very similar to this one and my answer is also similar: Fetch rows based on condition
The gist of the idea is to use window functions to identify transitions between periods (runs of events that are less than 33 days apart), then filter out the rows inside each period, and then apply window functions again.
Complete solution:
SELECT
id,
startdt AS period_start,
period_end
FROM (
SELECT
id,
startdt,
enddt,
lead(enddt, 1)
OVER (PARTITION BY id
ORDER BY enddt) AS period_end,
period_boundary
FROM (
SELECT
id,
startdt,
enddt,
CASE WHEN period_switch = 0 AND reverse_period_switch = 1
THEN 'start'
ELSE 'end' END AS period_boundary
FROM (
SELECT
id,
startdt,
enddt,
CASE WHEN datediff(days, enddt, lead(startdt, 1)
OVER (PARTITION BY id
ORDER BY enddt ASC)) > 32
THEN 1
ELSE 0 END AS period_switch,
CASE WHEN datediff(days, lead(enddt, 1)
OVER (PARTITION BY id
ORDER BY enddt DESC), startdt) > 32
THEN 1
ELSE 0 END AS reverse_period_switch
FROM date_test
)
AS sessioned
WHERE period_switch != 0 OR reverse_period_switch != 0
UNION
SELECT -- adding start rows without transition
id,
startdt,
enddt,
'start'
FROM (
SELECT
id,
startdt,
enddt,
row_number()
OVER (PARTITION BY id
ORDER BY enddt ASC) AS row_num
FROM date_test
) AS with_row_number
WHERE row_num = 1
UNION
SELECT -- adding end rows without transition
id,
startdt,
enddt,
'end'
FROM (
SELECT
id,
startdt,
enddt,
row_number()
OVER (PARTITION BY id
ORDER BY enddt desc) AS row_num
FROM date_test
) AS with_row_number
WHERE row_num = 1
) AS with_boundary -- data set containing start/end boundaries
) AS with_end -- data set where end date is propagated into the start row of the period
WHERE period_boundary = 'start'
ORDER BY id, startdt ASC;
Note that your expected output has a row for 103 2013-05-01 2013-05-31. However, its start date is 31 days after the end date of the previous row, so according to your requirements this row should instead be merged with the previous row for id 103.
So the output that I get looks like this:
id start end
100 2000-01-01 2000-02-29
100 2000-05-01 2000-06-30
100 2000-09-01 2000-10-31
101 2012-06-01 2012-07-31
102 2000-01-01 2000-01-31
103 2013-03-01 2013-05-31

Find From/To Dates across multiple rows - SQL Postgres

I want to be able to "book" within range of dates, but you can't book across gaps of days. So booking across multiple rates is fine as long as they are contiguous.
I am happy to change data structure/index, if there are better ways of storing start/end ranges.
So far I have a "rates" table which contains Start/End Periods of time with a daily rate.
e.g. Rates Table.
ID Price From To
1 75.00 2015-04-12 2016-04-15
2 100.00 2016-04-16 2016-04-17
3 50.00 2016-04-18 2016-04-30
For the above data I would want to return:
From To
2015-04-12 2016-04-30
For simplicity's sake it is safe to assume that the dates are consecutive. For contiguous rows, To is always 1 day before the next row's From.
For the case there is only 1 row, I would want it to return the From/To of that single row.
Also to clarify if I had the following data:
ID Price From To
1 75.00 2015-04-12 2016-04-15
2 100.00 2016-04-17 2016-04-18
3 50.00 2016-04-19 2016-04-30
4 50.00 2016-05-01 2016-05-21
Meaning where there is a gap >= 1 day it would count as a separate range.
In which case I would expect the following:
From To
2015-04-12 2016-04-15
2016-04-17 2016-05-21
Edit 1
After playing around I have come up with the following SQL which seems to work. Although I'm not sure if there are better ways/issues with it?
WITH grouped_rates AS
(SELECT
from_date,
to_date,
SUM(grp_start) OVER (ORDER BY from_date, to_date) grp
FROM (SELECT
from_date,
to_date,
CASE WHEN (from_date - INTERVAL '1 DAY') = lag(to_date)
OVER (ORDER BY from_date, to_date)
THEN 0
ELSE 1
END grp_start
FROM rates
GROUP BY from_date, to_date) AS start_groups)
SELECT
min(from_date) from_date,
max(to_date) to_date
FROM grouped_rates
GROUP BY grp;
This is identifying contiguous overlapping groups in the data. One approach is to find where each group begins and then do a cumulative sum. The following query adds a flag indicating if a row starts a group:
select r.*,
(case when not exists (select 1
from rates r2
where (r2.from_date < r.from_date and r2.to_date >= r.to_date) or
(r2.from_date = r.from_date and r2.id < r.id)
)
then 1 else 0 end) as StartFlag
from rates r;
The or in the correlation condition is to handle the situation where intervals that define a group overlap on the start date for the interval.
You can then do a cumulative sum on this flag and aggregate by that sum:
with r as (
select r.*,
(case when not exists (select 1
from rates r2
where (r2.from_date < r.from_date and r2.to_date >= r.to_date) or
(r2.from_date = r.from_date and r2.id < r.id)
)
then 1 else 0 end) as StartFlag
from rates r
)
select min(from_date), max(to_date)
from (select r.*,
sum(r.StartFlag) over (order by r.from_date) as grp
from r
) r
group by grp;
CREATE TABLE prices( id INTEGER NOT NULL PRIMARY KEY
, price MONEY
, date_from DATE NOT NULL
, date_upto DATE NOT NULL
);
-- some data (upper limit is EXCLUSIVE)
INSERT INTO prices(id, price, date_from, date_upto) VALUES
( 1, 75.00, '2015-04-12', '2016-04-16' )
,( 2, 100.00, '2016-04-17', '2016-04-19' )
,( 3, 50.00, '2016-04-19', '2016-05-01' )
,( 4, 50.00, '2016-05-01', '2016-05-22' )
;
-- SELECT * FROM prices;
-- Recursive query to "connect the dots"
WITH RECURSIVE rrr AS (
SELECT date_from, date_upto
, 1 AS nperiod
FROM prices p0
WHERE NOT EXISTS (SELECT * FROM prices nx WHERE nx.date_upto = p0.date_from) -- no preceding segment
UNION ALL
SELECT r.date_from, p1.date_upto
, 1+r.nperiod AS nperiod
FROM prices p1
JOIN rrr r ON p1.date_from = r.date_upto
)
SELECT * FROM rrr r
WHERE NOT EXISTS (SELECT * FROM prices nx WHERE nx.date_from = r.date_upto) -- no following segment
;
Result:
date_from | date_upto | nperiod
------------+------------+---------
2015-04-12 | 2016-04-16 | 1
2016-04-17 | 2016-05-22 | 3
(2 rows)
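The recursive query can be checked outside Postgres as well. The sketch below runs it in an in-memory SQLite database from Python; storing the dates as ISO text, replacing the MONEY type with REAL and adding an ORDER BY are assumptions made so the example is self-contained.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    """
    CREATE TABLE prices (
        id INTEGER NOT NULL PRIMARY KEY,
        price REAL,               -- MONEY in the Postgres version
        date_from TEXT NOT NULL,
        date_upto TEXT NOT NULL   -- upper limit is exclusive
    )
    """
)
con.executemany(
    "INSERT INTO prices (id, price, date_from, date_upto) VALUES (?, ?, ?, ?)",
    [
        (1, 75.00, "2015-04-12", "2016-04-16"),
        (2, 100.00, "2016-04-17", "2016-04-19"),
        (3, 50.00, "2016-04-19", "2016-05-01"),
        (4, 50.00, "2016-05-01", "2016-05-22"),
    ],
)

periods = con.execute(
    """
    WITH RECURSIVE rrr AS (
        -- seed: segments with no preceding segment
        SELECT date_from, date_upto, 1 AS nperiod
        FROM prices p0
        WHERE NOT EXISTS (SELECT * FROM prices nx WHERE nx.date_upto = p0.date_from)
        UNION ALL
        -- step: extend a chain with the segment that starts where it ends
        SELECT r.date_from, p1.date_upto, 1 + r.nperiod AS nperiod
        FROM prices p1
        JOIN rrr r ON p1.date_from = r.date_upto
    )
    SELECT date_from, date_upto, nperiod
    FROM rrr r
    -- keep only chains with no following segment
    WHERE NOT EXISTS (SELECT * FROM prices nx WHERE nx.date_from = r.date_upto)
    ORDER BY date_from
    """
).fetchall()
con.close()

for row in periods:
    print(row)
```

The exclusive upper limit is what makes the join a plain equality: a segment continues a chain exactly when its date_from equals the chain's date_upto.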