Postgres lag/lead function filter - SQL

Can someone help me with the last step of my query?
I have this table (fiddle):
CREATE TABLE rent(id integer,start_date date, end_date date,objekt_id integer,person_id integer);
INSERT INTO rent VALUES
(1, '2011-10-01','2015-10-31',5156,18268),
(2, '2015-11-01','2018-04-30',5156,18268),
(3, '2018-05-01','2021-03-31',5156,18269),
(4, '2021-04-01','2021-05-15',5156,null),
(5, '2021-05-16','2100-01-01',5156,18270),
(6, '2021-03-14','2021-05-15',5160,18270),
(7, '2021-05-16','2100-01-01',5160,18271);
With lag and lead I want two columns for the last person_id and the next person_id.
With this query I almost solved my problem, but there is still one thing I need help changing.
with tbl as (
    SELECT rent.*,
           row_number() over (PARTITION BY objekt_id) as row_id
    FROM rent
    ORDER BY id)
SELECT r.id,
       r.start_date,
       r.end_date,
       r.objekt_id,
       r.person_id,
       lag(person_id)  over (PARTITION BY objekt_id, person_id IS NOT NULL AND objekt_id IS NOT NULL ORDER BY id) as last_person,
       lead(person_id) over (PARTITION BY objekt_id, person_id IS NOT NULL AND objekt_id IS NOT NULL ORDER BY id) as next_person
FROM tbl r
order by 1;
The last or next person_id always has to be either null or come from a different person_id than the current row's.
At the moment row 2 gives me last_person = 18268 because row 1 has the same person_id. If person_id is empty I also want to see the last and next person.
Output now:
id  start_date  end_date    objekt_id  person_id  last_person  next_person
1   2011-10-01  2015-10-31  5156       18268      null         18268
2   2015-11-01  2018-04-30  5156       18268      18268        18269
3   2018-05-01  2021-03-31  5156       18269      18268        18270
4   2021-04-01  2021-05-15  5156       null       null         null
5   2021-05-16  2100-01-01  5156       18270      18269        null
6   2021-03-14  2021-05-15  5160       18270      null         18271
7   2021-05-16  2100-01-01  5160       18271      18270        null
Desired output:
id  start_date  end_date    objekt_id  person_id  last_person  next_person
1   2011-10-01  2015-10-31  5156       18268      null         18269
2   2015-11-01  2018-04-30  5156       18268      null         18269
3   2018-05-01  2021-03-31  5156       18269      18268        18270
4   2021-04-01  2021-05-15  5156       null       18269        18270
5   2021-05-16  2100-01-01  5156       18270      18269        null
6   2021-03-14  2021-05-15  5160       18270      null         18271
7   2021-05-16  2100-01-01  5160       18271      18270        null
The goal of the query is to pick a specific date and tell whether the object is for rent or not, show who rents it on that date, who the previous tenant was, and whether someone is in line to rent it next.
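For illustration, a minimal usage sketch (my own, not part of the original question): assuming one of the answers below is wrapped in a hypothetical view or CTE called rent_with_neighbours that exposes last_person and next_person, checking one object on one example date could look like this:
-- Hypothetical usage sketch; rent_with_neighbours and the date literal are assumptions.
SELECT id,
       objekt_id,
       person_id,    -- current tenant (null = not rented on that date)
       last_person,  -- previous tenant, if any
       next_person   -- next tenant in line, if any
FROM rent_with_neighbours
WHERE objekt_id = 5156
  AND DATE '2021-04-20' BETWEEN start_date AND end_date;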

You can use a correlated subquery to implement your logic:
with tbl as (
    SELECT rent.*,
           row_number() over (PARTITION BY objekt_id) as row_id
    FROM rent
    ORDER BY id)
SELECT r.id,
       r.start_date,
       r.end_date,
       r.objekt_id,
       r.person_id,
       (SELECT t1.person_id
        FROM tbl t1
        WHERE t1.objekt_id = r.objekt_id
          AND t1.id < r.id
          AND (t1.person_id <> r.person_id OR r.person_id IS NULL)
          AND t1.person_id IS NOT NULL
        ORDER BY t1.id desc
        LIMIT 1) last_person,
       (SELECT t1.person_id
        FROM tbl t1
        WHERE t1.objekt_id = r.objekt_id
          AND t1.id > r.id
          AND (t1.person_id <> r.person_id OR r.person_id IS NULL)
          AND t1.person_id IS NOT NULL
        ORDER BY t1.id
        LIMIT 1) next_person
FROM tbl r
order by 1;
sqlfiddle

It is possible with window functions, but while I'm on my phone I'm struggling to work out a concise answer, as PostgreSQL doesn't have IGNORE NULLS.
For now, here's a clunky answer...
with
  tbl as
  (
    -- From your question, but fixed by moving the `ORDER BY` into the window function
    SELECT
      rent.*,
      row_number() over (PARTITION BY objekt_id ORDER BY start_date) as row_id
    FROM
      rent
  ),
  lag_lead AS
  (
    -- do a naive lag and lead, not yet trying to account for nulls
    -- if the result is the same as the current row, replace with NULL
    -- (thus only identifying lag/lead values where there's a change)
    SELECT
      *,
      NULLIF(LAG( person_id) over (PARTITION BY objekt_id ORDER BY start_date), person_id) AS new_last_person,
      NULLIF(LEAD(person_id) over (PARTITION BY objekt_id ORDER BY start_date), person_id) AS new_next_person
    FROM
      tbl
  ),
  identify_partitions AS
  (
    -- create groups of rows where the results should be the same
    SELECT
      *,
      COUNT(new_last_person) OVER (PARTITION BY objekt_id ORDER BY start_date ASC)  AS last_person_partition,
      COUNT(new_next_person) OVER (PARTITION BY objekt_id ORDER BY start_date DESC) AS next_person_partition
    FROM
      lag_lead
  )
SELECT
  *,
  MAX(new_last_person) OVER (PARTITION BY objekt_id, last_person_partition) AS real_last_person,
  MAX(new_next_person) OVER (PARTITION BY objekt_id, next_person_partition) AS real_next_person
FROM
  identify_partitions
ORDER BY
  1;
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=b613f88a730cfddcef4efb612b6e236c
In that example I've amended your data slightly, to demonstrate the behaviour if the person_id transitions from X to NULL and back to X.
If you require different behaviour, please comment
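As a hedged aside (my addition, not part of the original answer): in a dialect that does have IGNORE NULLS, such as Oracle or Snowflake, the gap-filling CTEs could in principle collapse into LAST_VALUE/FIRST_VALUE ... IGNORE NULLS over the same NULLIF columns, roughly as in the untested sketch below. The exact placement of the IGNORE NULLS keyword varies by dialect.
-- Sketch only, for a dialect that supports IGNORE NULLS (not PostgreSQL);
-- the NULLIF step is unchanged, the gap-filling CTEs disappear.
with
  lag_lead AS
  (
    SELECT
      rent.*,
      NULLIF(LAG( person_id) over (PARTITION BY objekt_id ORDER BY start_date), person_id) AS new_last_person,
      NULLIF(LEAD(person_id) over (PARTITION BY objekt_id ORDER BY start_date), person_id) AS new_next_person
    FROM
      rent
  )
SELECT
  *,
  LAST_VALUE(new_last_person) IGNORE NULLS OVER
    (PARTITION BY objekt_id ORDER BY start_date
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)      AS real_last_person,
  FIRST_VALUE(new_next_person) IGNORE NULLS OVER
    (PARTITION BY objekt_id ORDER BY start_date
     ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)      AS real_next_person
FROM
  lag_lead
ORDER BY
  id;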

Related

SQL: Deduce Effective Pricing from overlapping Dates

I have pricing records with overlapping dates. On a few dates there is more than one overlapping price. Please follow the example below:
For example, on 2022/02/15 there are two prices, 10 and 8.
article  price  startdate   enddate
123      10     2022/02/02  2049/12/31
123      8      2022/02/14  2022/09/14
123      5      2022/03/14  2022/04/06
123      4      2022/04/11  2022/04/27
I want to apply the effective price for date ranges like below and avoid conflicting prices in the output.
article  price  startdate   enddate
123      10     2022/02/02  2022/02/13
123      8      2022/02/14  2022/03/13
123      5      2022/03/14  2022/04/06
123      8      2022/04/07  2022/04/10
123      4      2022/04/11  2022/04/27
123      8      2022/04/28  2022/09/14
123      10     2022/09/15  2049/12/31
I can think of window functions to adjust the end dates and prices, but I cannot completely wrap my head around the problem to get to a full solution. Any suggestion/solution is appreciated.
Database: Snowflake
Thank you
Using the logic that, where windows overlap, the most recently started price window wins.
Discrete date version:
with data(article,price,startdate,enddate) as (
    select * FROM VALUES
        (123, 10, '2022-02-02'::date, '2049-12-31'::date),
        (123, 8,  '2022-02-14'::date, '2022-09-14'::date),
        (123, 5,  '2022-03-14'::date, '2022-04-06'::date),
        (123, 4,  '2022-04-11'::date, '2022-04-27'::date)
), dis_times as (
    select article,
        date as s_startdate,
        lead(date) over (partition by article order by date) - 1 as s_enddate
    from (
        select distinct article, startdate as date from data
        union
        select distinct article, enddate + 1 as date from data
    )
    qualify s_enddate is not null
)
select
    d1.article,
    d1.price,
    d2.s_startdate,
    d2.s_enddate
from data as d1
join dis_times as d2
    on d1.article = d2.article
    and d2.s_startdate between d1.startdate and d1.enddate
qualify row_number() over (partition by d1.article, d2.s_startdate order by d1.startdate desc) = 1
order by 1,3;
gives:
ARTICLE  PRICE  S_STARTDATE  S_ENDDATE
123      10     2022-02-02   2022-02-13
123      8      2022-02-14   2022-03-13
123      5      2022-03-14   2022-04-06
123      8      2022-04-07   2022-04-10
123      4      2022-04-11   2022-04-27
123      8      2022-04-28   2022-09-14
123      10     2022-09-15   2049-12-31
Continuous Timestamp version:
with data(article,price,startdate,enddate) as (
    select * FROM VALUES
        (123, 10, '2022-02-02'::date, '2049-12-31'::date),
        (123, 8,  '2022-02-14'::date, '2022-09-14'::date),
        (123, 5,  '2022-03-14'::date, '2022-04-06'::date),
        (123, 4,  '2022-04-11'::date, '2022-04-27'::date)
), dis_times as (
    select article,
        date as s_startdate,
        lead(date) over (partition by article order by date) as s_enddate
    from (
        select distinct article, startdate as date from data
        union
        select distinct article, enddate as date from data
    )
    qualify s_enddate is not null
)
select
    d1.article,
    d1.price,
    d2.s_startdate,
    d2.s_enddate
from data as d1
join dis_times as d2
    on d1.article = d2.article
    and d2.s_startdate >= d1.startdate and d2.s_startdate < d1.enddate
qualify row_number() over (partition by d1.article, d2.s_startdate order by d1.startdate desc) = 1
order by 1,3;
which gives:
ARTICLE  PRICE  S_STARTDATE  S_ENDDATE
123      10     2022-02-02   2022-02-14
123      8      2022-02-14   2022-03-14
123      5      2022-03-14   2022-04-06
123      8      2022-04-06   2022-04-11
123      4      2022-04-11   2022-04-27
123      8      2022-04-27   2022-09-14
123      10     2022-09-14   2049-12-31
Thanks to MatBailie for the tighter join suggestion:
join dis_times as d2
    on d1.article = d2.article
    and d2.s_startdate between d1.startdate and d1.enddate
For the continuous range I would normally write it in this form
    and d2.s_startdate between d1.startdate and d1.enddate and d2.s_startdate < d1.enddate
instead of this form
    and d2.s_startdate >= d1.startdate and d2.s_startdate < d1.enddate
because in my experience it has performed better. Always test your complexities.
The first thing I did was turn your price-per-date-range data into a price-per-date lookup table.
create or replace temporary table price_date_lookup as
select distinct
article,
dateadd('day',b.index-1,start_date) as dates,
first_value(price) over (partition by article, dates order by end_date) as price
from my_table,
lateral split_to_table(repeat('.',datediff(day,start_date,end_date)), '.') b;
Notes:
first_value handles overlaps by overriding prices based on their end dates.
lateral... basically helps create a date column with all the days in the range
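To illustrate that lateral trick (my own hedged sketch, not from the original answer), a standalone Snowflake query over a single made-up three-day range expands like this:
-- Hedged sketch: expand one date range into one row per day.
-- repeat() builds a string of N dots, split_to_table() turns it into N+1 rows,
-- and index-1 is the day offset added to the start date.
select dateadd('day', s.index - 1, '2022-03-14'::date) as dates
from table(split_to_table(repeat('.', datediff(day, '2022-03-14'::date, '2022-03-16'::date)), '.')) s;
-- expected: 2022-03-14, 2022-03-15, 2022-03-16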
As soon as I created that table, I figured the rest could be approached like a gaps-and-islands problem.
with cte1 as
(select *, case when lag(price) over (partition by article order by dates)=price then 0 else 1 end as price_start --flag start of a new price island
from price_date_lookup),
cte2 as
(select *, sum(price_start) over (partition by article order by dates) as price_id --assign id to all the price islands
from cte1)
select article,
price,
min(dates) as start_date,
max(dates) as end_date
from cte2
group by article,price,price_id;

PostgreSQL ROW_NUMBER with timestamp conditions

I'm trying to extend PARTITION BY to keep rows in the same partition if the ts_created of the current row is within 1 hour of the previous row.
SELECT t1.id,
       t1.user_email,
       t1.ts_created,
       t1.prev_ts,
       row_number() OVER (PARTITION BY t1.user_email ORDER BY t1.ts_created DESC) AS time_order
FROM (SELECT id,
             user_email,
             ts_created,
             lag(ts_created) OVER (PARTITION BY user_email ORDER BY ts_created DESC) AS prev_ts
      FROM table1) AS t1
ORDER BY t1.ts_created DESC;
So far I'm partitioning by user_email and have prepared the timestamp of the previous row; now I'm a bit lost on how to handle the time component between the current and the previous row.
expectation
id  user_email  ts_created               time_order
6   mailA       2022-01-01 07:30:00.000  1
5   mailA       2022-01-01 06:40:00.000  2
4   mailA       2022-01-01 05:50:00.000  3
3   mailA       2022-01-01 05:00:00.000  4
2   mailA       2022-01-01 03:50:00.000  1
1   mailB       2021-01-01 03:30:00.000  1
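A hedged sketch of one common approach (my own, not from the original post): flag each row that starts a new "session" whenever the gap to the previous row exceeds one hour, turn the flags into group ids with a running sum, and number the rows within each group. Table and column names are taken from the question.
-- Hedged sketch, PostgreSQL: sessions of rows less than 1 hour apart.
WITH flagged AS (
    SELECT id,
           user_email,
           ts_created,
           CASE
               WHEN lag(ts_created) OVER (PARTITION BY user_email ORDER BY ts_created)
                    >= ts_created - interval '1 hour'
               THEN 0 ELSE 1                      -- 1 = this row starts a new session
           END AS new_group
    FROM table1
), grouped AS (
    SELECT *,
           sum(new_group) OVER (PARTITION BY user_email ORDER BY ts_created) AS grp
    FROM flagged
)
SELECT id,
       user_email,
       ts_created,
       row_number() OVER (PARTITION BY user_email, grp ORDER BY ts_created DESC) AS time_order
FROM grouped
ORDER BY user_email, ts_created DESC;
On the sample data above this yields time_order 1..4 for the four mailA rows that sit within an hour of each other, and restarts at 1 for the 03:50 row and for mailB, matching the expectation table.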

Get max date for each from either of 2 columns

I have a table like below
AID BID CDate
-----------------------------------------------------
1 2 2018-11-01 00:00:00.000
8 1 2018-11-08 00:00:00.000
1 3 2018-11-09 00:00:00.000
7 1 2018-11-15 00:00:00.000
6 1 2018-12-24 00:00:00.000
2 5 2018-11-02 00:00:00.000
2 7 2018-12-15 00:00:00.000
And I am trying to get a result set as follows
ID MaxDate
-------------------
1 2018-12-24 00:00:00.000
2 2018-12-15 00:00:00.000
Each value in the id columns (AID, BID) should return the max of CDate.
For example, in the case of 1, its max CDate is 2018-12-24 00:00:00.000 (here 1 appears under BID);
in the case of 2, the max date is 2018-12-15 00:00:00.000 (here 2 is under AID).
I tried the following.
1.
select
g.AID,g.BID,
max(g.CDate) as 'LastDate'
from dbo.TT g
inner join
(select AID,BID,max(CDate) as maxdate
from dbo.TT
group by AID,BID)a
on (a.AID=g.AID or a.BID=g.BID)
and a.maxdate=g.CDate
group by g.AID,g.BID
and 2.
SELECT
AID,
CDate
FROM (
SELECT
*,
max_date = MAX(CDate) OVER (PARTITION BY [AID])
FROM dbo.TT
) AS s
WHERE CDate= max_date
Please suggest a 3rd solution.
You can assemble the data in a table expression first, and then computing the max for each value is simple. For example:
select
id, max(cdate)
from (
select aid as id, cdate from t
union all
select bid, cdate from t
) x
group by id
You seem to only care about values that are in both columns. If this interpretation is correct, then:
select id, max(cdate)
from ((select aid as id, cdate, 1 as is_a, 0 as is_b
       from t
      ) union all
      (select bid as id, cdate, 0 as is_a, 1 as is_b
       from t
      )
     ) ab
group by id
having max(is_a) = 1 and max(is_b) = 1;

For multiple rows with some identical fields, keep the one with updated values, and mark the others

For multiple rows with identical features, I hope to add a few marks/new columns to the original table.
The original table is as below:
ID Start_date End_Date Amount
1 2005-01-01 2010-01-01 5
1 2000-07-01 2009-06-01 10
1 2017-08-01 2018-03-01 30
I wish to keep one record with the earliest start date, the latest end date, the summed amount, and an indicator telling me to use this record. For the others, just an indicator telling me not to use them.
The updated table should be as below:
ID  Start_date  End_Date    Amount  Amount_new  Usable  Start       End
1   2005-01-01  2010-01-01  5       45          0       2000-07-01  2018-03-01
1   2000-07-01  2009-06-01  10                  1
1   2017-08-01  2018-03-01  30                  1
It does not matter which row to keep, as long as there is one row with Usable=0, and Amount_new, Start and End are updated.
If not considering the end date, I was thinking of grouping by ID and Start_date, then updating the columns Usable and Amount_new of the first row. However, I still have the problem of how to select the first row from each group. Considering the End_Date makes my mind even more messy!
Could anyone help to shed some light upon this issue?
You seem to want something like this:
alter table original
    add amount_new int,
        usable bit,
        new_start date,
        new_end date;
Then, you can update it using window functions:
with toupdate as (
    select o.*,
           sum(amount) over (partition by id) as x_amount,
           (case when row_number() over (partition by id order by start_date) = 1
                 then 0 else 1
            end) as x_usable,
           min(start_date) over (partition by id) as x_start_date,
           max(end_date) over (partition by id) as x_end_date
    from original o
)
update toupdate
    set amount_new = x_amount,
        usable = x_usable,
        new_start = x_start_date,
        new_end = x_end_date;
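If, as in the desired output above, the extra columns should stay NULL on the rows you do not keep, a hedged variant of the same updatable-CTE idea (my addition, not part of the original answer) could wrap the assignments in CASE:
-- Hedged variant: populate the new columns on one row per id only,
-- leaving them NULL on the other rows.
with toupdate as (
    select o.*,
           sum(amount) over (partition by id) as x_amount,
           (case when row_number() over (partition by id order by start_date) = 1
                 then 0 else 1
            end) as x_usable,
           min(start_date) over (partition by id) as x_start_date,
           max(end_date) over (partition by id) as x_end_date
    from original o
)
update toupdate
    set amount_new = case when x_usable = 0 then x_amount end,
        usable     = x_usable,
        new_start  = case when x_usable = 0 then x_start_date end,
        new_end    = case when x_usable = 0 then x_end_date end;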
The following query should do what you want:
CREATE TABLE #temp (ID INT, [Start_date] DATE, End_Date DATE, Amount NUMERIC(28,0), Amount_new NUMERIC(28,0), Usable BIT, Start [Date], [End] [Date])
INSERT INTO #temp (ID, [Start_date], End_Date, Amount) VALUES
(1,'2005-01-01','2010-01-01',5),
(1,'2000-07-01','2009-06-01',10),
(1,'2017-08-01','2018-03-01',30),
(2,'2001-07-01','2009-06-01',5),
(2,'2017-08-01','2019-03-01',35)
UPDATE t1
SET Amount_new = t2.[Amount_new],
Usable = 1,
Start = t2.[Start],
[End] = t2.[End]
FROM (SELECT *,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT 1)) AS RNO FROM #temp) t1
INNER JOIN
(
SELECT ID,[Start_date],[End_Date],[Amount]
,SUM(Amount) OVER(PARTITION BY ID) AS [Amount_new]
,MIN([Start_date]) OVER(PARTITION BY ID) AS [Start]
,MAX(End_Date) OVER(PARTITION BY ID) AS [End]
,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT 1)) AS RNO
FROM #temp ) t2 ON t1.id = t2.id AND t2.rno = t1.RNO AND t2.RNO = 1
SELECT * FROM #temp
The result is as below,
ID Start_date End_Date Amount Amount_new Usable Start End
1 2005-01-01 2010-01-01 5 45 1 2000-07-01 2018-03-01
1 2000-07-01 2009-06-01 10 NULL NULL NULL NULL
1 2017-08-01 2018-03-01 30 NULL NULL NULL NULL
2 2001-07-01 2009-06-01 5 40 1 2001-07-01 2019-03-01
2 2017-08-01 2019-03-01 35 NULL NULL NULL NULL

SQL Dates Selection

I have an OPL_Dates table with start dates and end dates as below:
dbo.OPL_Dates
ID Start_date End_date
--------------------------------------
12345 1975-01-01 2001-12-31
12345 1989-01-01 2004-12-31
12345 2005-01-01 NULL
12345 2007-01-01 NULL
12377 2009-06-01 2009-12-31
12377 2013-02-07 NULL
12377 2010-01-01 2012-01-01
12489 2011-12-31 NULL
12489 2012-03-01 2012-04-01
The Output I am looking for is:
ID Start_date End_date
-------------------------------------
12345 1975-01-01 2004-12-31
12345 2005-01-01 NULL
12377 2009-06-01 2009-12-31
12377 2010-01-01 2012-01-01
12377 2013-02-07 NULL
12489 2011-12-31 NULL
Basically, I want to show the gaps between the OPL periods (if any); otherwise I need the min of Start_date and the max of End_date for a particular ID. NULL means an open-ended date, which can be converted to "9999-12-31".
The following pretty much does what you want:
with p as (
select v.*, sum(inc) over (partition by v.id order by v.dte) as running_inc
from t cross apply
(values (id, start_date, 1),
(id, coalesce(end_date, '2999-12-31'), -1)
) v(id, dte, inc)
)
select id, min(dte), max(dte)
from (select p.*, sum(case when running_inc = 0 then 1 else 0 end) over (partition by id order by dte desc) as grp
from p
) p
group by id, grp;
Note that it changes the "infinite" end date from NULL to 2999-12-31. This is a convenience, because NULL orders first in SQL Server ascending sorts.
Here is a SQL Fiddle.
What is this doing? It is unpivoting the dates into a single column, with a 1/-1 flag (inc) indicating whether the record is a start or end. The running sum of this flag then indicates the groups that should be combined. When the running sum is 0, then a group has ended. To include the end date in the right group, a reverse running sum is needed -- but that's a detail.
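To make the running sum concrete, here is a small hand-rolled illustration (my addition, reusing the 2999-12-31 convention) that materialises just the p step for ID 12377, with an inline VALUES table standing in for t:
-- Hedged sketch: unpivot the ID 12377 rows and show the running +1/-1 sum.
with t as (
    select *
    from (values
            (12377, cast('2009-06-01' as date), cast('2009-12-31' as date)),
            (12377, cast('2013-02-07' as date), cast(null as date)),
            (12377, cast('2010-01-01' as date), cast('2012-01-01' as date))
         ) as x(id, start_date, end_date)
)
select v.id, v.dte, v.inc,
       sum(v.inc) over (partition by v.id order by v.dte) as running_inc
from t cross apply
     (values (id, start_date, 1),
             (id, coalesce(end_date, cast('2999-12-31' as date)), -1)
     ) v(id, dte, inc)
order by v.dte;
-- running_inc drops to 0 at 2009-12-31, 2012-01-01 and 2999-12-31,
-- which is exactly where the three groups for 12377 end; taking min/max
-- per group reproduces the three requested rows for that ID.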