get last value of the other partition in postgresql - sql

I got this SCD table:
start_date               end_date                 partition
2022-03-08 15:35:09.856  2022-03-09 14:57:36.610  1
2022-03-09 14:57:36.610  2022-05-18 13:26:31.195  2
2022-05-18 13:26:31.195  2022-08-02 10:12:02.441  2
2022-08-02 10:12:02.441  2022-09-01 11:10:01.019  2
2022-09-01 11:10:01.019  2022-09-01 11:10:20.777  1
2022-09-01 11:10:20.777  2022-09-01 11:21:26.526  1
For each row, I would like to know the last start_date and end_date of the other partition (there are only two partitions). For the given table, the expected result is:
start_date               end_date                 partition  max_start_date           max_end_date
2022-03-08 15:35:09.856  2022-03-09 14:57:36.610  1          null                     null
2022-03-09 14:57:36.610  2022-05-18 13:26:31.195  2          2022-03-08 15:35:09.856  2022-03-09 14:57:36.610
2022-05-18 13:26:31.195  2022-08-02 10:12:02.441  2          2022-03-08 15:35:09.856  2022-03-09 14:57:36.610
2022-08-02 10:12:02.441  2022-09-01 11:10:01.019  2          2022-03-08 15:35:09.856  2022-03-09 14:57:36.610
2022-09-01 11:10:01.019  2022-09-01 11:10:20.777  1          2022-08-02 10:12:02.441  2022-09-01 11:10:01.019
2022-09-01 11:10:20.777  2022-09-01 11:21:26.526  1          2022-08-02 10:12:02.441  2022-09-01 11:10:01.019
I tried the last_value window function but couldn't get it to work. For example:
, last_value (start_date) OVER (partition by partition = '1' order by start_date asc) as last_start_date_partition
, last_value (end_date) OVER (partition by partition = '1' order by end_date asc) as last_end_date_partition
Is there any way to inject a condition into a window function so that it behaves this way?

Using dense_rank:
with cte as (
select (select sum((s1.start_date < s.start_date and s1.partition != s.partition)::int)
from scd s1) r, s.*
from scd s
),
n_part as (
select dense_rank() over (order by c.r) dr, c.* from cte c
)
select np.start_date, np.end_date, np.partition, max(np1.start_date), max(np1.end_date)
from n_part np left join n_part np1 on np1.dr = np.dr - 1
group by np.start_date, np.end_date, np.partition
order by np.start_date, np.end_date
See fiddle.

Using window functions and a gaps-and-islands-style approach:
SELECT start_date,
end_date,
PARTITION,
max(start_date) OVER (ORDER BY grp RANGE UNBOUNDED PRECEDING EXCLUDE GROUP) max_start_date, -- use max value without current group
max(end_date) OVER (ORDER BY grp RANGE UNBOUNDED PRECEDING EXCLUDE GROUP) max_end_date -- use max value without current group
FROM
(SELECT start_date,
end_date,
PARTITION,
sum(lag) OVER (ORDER BY end_date) AS grp -- use cumulative sum to create a group
FROM
(SELECT *,
CASE
WHEN lag(PARTITION) OVER (ORDER BY end_date) != PARTITION THEN 1
ELSE 0
END lag -- use lag to determine if the partition has changed
FROM mytable) t) tt
Fiddle

You could do a self-left join and aggregate as the following:
select T.start_date, T.end_date, T.partition_,
max(D.start_date) max_start_date,
max(D.end_date) max_end_date
from SCD T left join SCD D
on T.start_date > D.start_date and
T.partition_ <> D.partition_
group by T.start_date, T.end_date, T.partition_
order by T.start_date
See demo

Related

SQL, rank for each instance of a partition

I am trying to create a rank for each instance of a status occurring, for example:
ID  Status       From_date   To_date     rank
1   Available    2022-01-01  2022-01-02  1
1   Available    2022-01-02  2022-01-03  1
1   Unavailable  2022-01-03  2022-01-10  2
1   Available    2022-01-10  2022-01-20  3
That is, a rank per ID for each instance of a status occurring, ordered by from_date ascending.
I want to do this because I see it as the best way of getting to the final result I want, which is:
ID  Status       From_date   To_date     rank
1   Available    2022-01-01  2022-01-03  1
1   Unavailable  2022-01-03  2022-01-10  2
1   Available    2022-01-10  2022-01-20  3
I tried dense_rank() over (partition by id order by status, from_date) but can now see why that wouldn't work. I'm not sure how to get to this result.
So with this CTE for the data:
with data(ID, Status, From_date, To_date) as (
select * from values
(1, 'Available', '2022-01-01', '2022-01-02'),
(1, 'Available', '2022-01-02', '2022-01-03'),
(1, 'Unavailable', '2022-01-03', '2022-01-10'),
(1, 'Available', '2022-01-10', '2022-01-20')
)
the first result, the rank, can be computed with CONDITIONAL_CHANGE_EVENT:
select *
,CONDITIONAL_CHANGE_EVENT( Status ) OVER ( PARTITION BY ID ORDER BY From_date ) as rank
from data;
ID  STATUS       FROM_DATE   TO_DATE     RANK
1   Available    2022-01-01  2022-01-02  0
1   Available    2022-01-02  2022-01-03  0
1   Unavailable  2022-01-03  2022-01-10  1
1   Available    2022-01-10  2022-01-20  2
Keeping only the first row of each rank can then be achieved with QUALIFY/ROW_NUMBER. Because CONDITIONAL_CHANGE_EVENT is a complex operation, it needs wrapping in a sub-select, so the answer is not as short as I would like:
select * from (
select *
,CONDITIONAL_CHANGE_EVENT( Status ) OVER ( PARTITION BY ID ORDER BY From_date ) as rank
from data
)
qualify row_number() over(partition by id, rank ORDER BY From_date ) = 1
gives:
ID  STATUS       FROM_DATE   TO_DATE     RANK
1   Available    2022-01-01  2022-01-02  0
1   Unavailable  2022-01-03  2022-01-10  1
1   Available    2022-01-10  2022-01-20  2
Also, the final result minus the ranking can be done with:
select *
from data
qualify nvl(Status <> lag(status) over ( PARTITION BY ID ORDER BY From_date ), true)
ID  STATUS       FROM_DATE   TO_DATE
1   Available    2022-01-01  2022-01-02
1   Unavailable  2022-01-03  2022-01-10
1   Available    2022-01-10  2022-01-20
and thus a rank can be added at the end
select *
,rank() over ( PARTITION BY ID ORDER BY From_date ) as rank
from (
select *
from data
qualify nvl(Status <> lag(status) over ( PARTITION BY ID ORDER BY From_date ), true)
)
ID  STATUS       FROM_DATE   TO_DATE     RANK
1   Available    2022-01-01  2022-01-02  1
1   Unavailable  2022-01-03  2022-01-10  2
1   Available    2022-01-10  2022-01-20  3
This is a typical gaps-and-island problem, where islands are groups of consecutive rows that have the same status.
Here is one way to solve it with window functions:
select id, status,
min(from_date) from_date, max(to_date) to_date,
row_number() over (partition by id order by min(from_date)) rn
from (
select t.*,
row_number() over (partition by id order by from_date) rn1,
row_number() over (partition by id, status order by from_date) rn2
from mytable t
) t
group by id, status, rn1 - rn2
order by min(from_date)
This works by ranking rows within two different partitions (with and without the status); the difference between the two row numbers defines the islands.
You can group consecutive status using conditional_change_event, then collapse the dates using min and max, and finally use row_number() to rank the events
with cte as
(select *,conditional_change_event(status) over (partition by id order by from_date) as rn
from t)
select id,
status,
min(from_date) as from_date,
max(to_date) as to_date,
row_number() over (partition by id order by min(from_date), max(to_date)) as rank
from cte
group by id, status, rn
order by rank

Postgres lag/lead Function filter

Can someone help me with the last step of my query?
I have this table (fiddle):
CREATE TABLE rent(id integer,start_date date, end_date date,objekt_id integer,person_id integer);
INSERT INTO rent VALUES
(1, '2011-10-01','2015-10-31',5156,18268),
(2, '2015-11-01','2018-04-30',5156,18268),
(3, '2018-05-01','2021-03-31',5156,18269),
(4, '2021-04-01','2021-05-15',5156,null),
(5, '2021-05-16','2100-01-01',5156,18270),
(6, '2021-03-14','2021-05-15',5160,18270),
(7, '2021-05-16','2100-01-01',5160,18271);
With lag and lead I want two columns: the last person_id and the next person_id.
With this query I have almost solved my problem, but there is still one thing I need help changing.
with tbl as (
SELECT rent.*,
row_number() over (PARTITION BY objekt_id) as row_id
FROM rent
ORDER BY id)
SELECT r.id,
r.start_date,
r.end_date,
r.objekt_id,
r.person_id,
lag(person_id) over (PARTITION BY objekt_id, person_id IS NOT NULL AND objekt_id IS NOT NULL ORDER BY id) as last_person,
lead(person_id) over (PARTITION BY objekt_id, person_id IS NOT NULL AND objekt_id IS NOT NULL ORDER BY id) as next_person
FROM tbl r
order by 1;
The last or next person_id must always be either null or a different person_id.
At the moment row 2 gives me last_person = 18268, since row 1 had the same person_id. If person_id is empty, I also want to see the last and next person.
Output now:
id  start_date  end_date    objekt_id  person_id  last_person  next_person
1   2011-10-01  2015-10-31  5156       18268                   18268
2   2015-11-01  2018-04-30  5156       18268      18268        18269
3   2018-05-01  2021-03-31  5156       18269      18268        18270
4   2021-04-01  2021-05-15  5156
5   2021-05-16  2100-01-01  5156       18270      18269
6   2021-03-14  2021-05-15  5160       18270                   18271
7   2021-05-16  2100-01-01  5160       18271      18270
Wished Output:
id  start_date  end_date    objekt_id  person_id  last_person  next_person
1   2011-10-01  2015-10-31  5156       18268                   18269
2   2015-11-01  2018-04-30  5156       18268                   18269
3   2018-05-01  2021-03-31  5156       18269      18268        18270
4   2021-04-01  2021-05-15  5156                  18269        18270
5   2021-05-16  2100-01-01  5156       18270      18269
6   2021-03-14  2021-05-15  5160       18270                   18271
7   2021-05-16  2100-01-01  5160       18271      18270
The goal of the query is to pick a specific date and tell whether the object is rented or not, and also to show who rents it, who the last renter was, and whether someone is in line to rent it next.
You can use correlated subqueries to express your condition:
with tbl as (
SELECT rent.*,
row_number() over (PARTITION BY objekt_id) as row_id
FROM rent
ORDER BY id)
SELECT r.id,
r.start_date,
r.end_date,
r.objekt_id,
r.person_id,
( SELECT t1.person_id
FROM tbl t1
WHERE t1.objekt_id = r.objekt_id
AND t1.id < r.id
AND (t1.person_id <> r.person_id OR r.person_id IS NULL)
AND t1.person_id IS NOT NULL
ORDER BY t1.id desc
LIMIT 1) last_person,
(SELECT t1.person_id
FROM tbl t1
WHERE t1.objekt_id = r.objekt_id
AND t1.id > r.id
AND (t1.person_id <> r.person_id OR r.person_id IS NULL)
AND t1.person_id IS NOT NULL
ORDER BY t1.id
LIMIT 1) next_person
FROM tbl r
order by 1;
sqlfiddle
It is possible with window functions, but while I'm on my phone I'm struggling to work out a concise answer, as PostgreSQL doesn't have IGNORE NULLS.
For now, here's a clunky answer...
with
tbl as
(
-- From your question, but fixed by moving the `ORDER BY` into the window function
SELECT
rent.*,
row_number() over (PARTITION BY objekt_id ORDER BY start_date) as row_id
FROM
rent
),
lag_lead AS
(
-- do a naive lag and lead, not yet trying to account for nulls
-- if the result is the same as the current row, replace with NULL
-- (thus only identifying lag/lead values where there's a change)
-- the new_* aliases are the names the later CTEs refer to
SELECT
*,
NULLIF(LAG( person_id) over (PARTITION BY objekt_id ORDER BY start_date), person_id) AS new_last_person,
NULLIF(LEAD(person_id) over (PARTITION BY objekt_id ORDER BY start_date), person_id) AS new_next_person
FROM
tbl
),
identify_partitions AS
(
-- create groups of rows where the results should be the same
SELECT
*,
COUNT(new_last_person) OVER (PARTITION BY objekt_id ORDER BY start_date ASC) AS last_person_partition,
COUNT(new_next_person) OVER (PARTITION BY objekt_id ORDER BY start_date DESC) AS next_person_partition
FROM
lag_lead
)
SELECT
*,
MAX(new_last_person) OVER (PARTITION BY objekt_id, last_person_partition) AS real_last_person,
MAX(new_next_person) OVER (PARTITION BY objekt_id, next_person_partition) AS real_next_person
FROM
identify_partitions
ORDER BY
1;
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=b613f88a730cfddcef4efb612b6e236c
In that example I've amended your data slightly, to demonstrate the behaviour if the person_id transitions from X to NULL and back to X.
If you require different behaviour, please comment

PostgreSQL ROW_NUMBER with timestamp conditions

I'm trying to extend PARTITION BY so that rows stay in the same partition if the ts_created of the current row is within 1 hour of the previous row.
SELECT t1.id,
t1.user_email,
t1.ts_created,
t1.prev_ts,
row_number() OVER (PARTITION BY t1.user_email ORDER BY t1.ts_created DESC) AS time_order
FROM (SELECT id,
user_email,
ts_created,
lag(ts_created) OVER(PARTITION BY user_email ORDER BY ts_created DESC) AS prev_ts
FROM table1) AS t1 ORDER BY t1.ts_created DESC;
So far I'm partitioning over user_email and have prepared the timestamp of the previous row; now I'm a bit lost on how to handle the time difference between the current and previous row.
Expected result:
id  user_email  ts_created               time_order
6   mailA       2022-01-01 07:30:00.000  1
5   mailA       2022-01-01 06:40:00.000  2
4   mailA       2022-01-01 05:50:00.000  3
3   mailA       2022-01-01 05:00:00.000  4
2   mailA       2022-01-01 03:50:00.000  1
1   mailB       2021-01-01 03:30:00.000  1
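One possible approach, sketched here and not taken from the thread: flag each row whose gap to the next-newer row exceeds one hour, turn the flags into a running island number, and number rows within each island. It assumes PostgreSQL and that ts_created is a timestamp column; the names new_island and island are made up for illustration.

```sql
-- Sketch only: assumes PostgreSQL and a timestamp column ts_created.
WITH flagged AS (
    SELECT id,
           user_email,
           ts_created,
           -- 1 when the next-newer row of the same user is more than 1 hour away;
           -- the newest row has no newer neighbour, so lag() is NULL and the flag is 0
           CASE WHEN lag(ts_created) OVER w - ts_created > interval '1 hour'
                THEN 1 ELSE 0 END AS new_island
    FROM table1
    WINDOW w AS (PARTITION BY user_email ORDER BY ts_created DESC)
),
islands AS (
    SELECT *,
           -- running sum of the flags gives one number per island
           sum(new_island) OVER (PARTITION BY user_email
                                 ORDER BY ts_created DESC) AS island
    FROM flagged
)
SELECT id,
       user_email,
       ts_created,
       row_number() OVER (PARTITION BY user_email, island
                          ORDER BY ts_created DESC) AS time_order
FROM islands
ORDER BY ts_created DESC;
```

On the sample rows, the 70-minute gap between ids 3 and 2 starts a new island, so id 2 restarts at time_order 1 while ids 6 to 3 are numbered 1 to 4.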

Find the start and end date of stock difference

Please suggest a good SQL query to find the start and end dates of each stock change.
Imagine I have data in a table like the one below.
Sample_table
transaction_date stock
2018-12-01 10
2018-12-02 10
2018-12-03 20
2018-12-04 20
2018-12-05 20
2018-12-06 20
2018-12-07 20
2018-12-08 10
2018-12-09 10
2018-12-10 30
Expected result should be
Start_date end_date stock
2018-12-01 2018-12-02 10
2018-12-03 2018-12-07 20
2018-12-08 2018-12-09 10
2018-12-10 null 30
This is the gaps-and-islands problem. You can use row_number and group by for this.
select t.stock, min(transaction_date), max(transaction_date)
from (
select row_number() over (order by transaction_date) -
row_number() over (partition by stock order by transaction_date) grp,
transaction_date,
stock
from data
) t
group by t.grp, t.stock
In the following DBFIDDLE DEMO I also handle the null value of the last group, but the main idea of finding consecutive rows is built on the above query.
You may check this for an explanation of this solution.
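To see why the row-number difference identifies consecutive runs, it can help to look at the intermediate values. The following is a minimal sketch, assuming PostgreSQL; the inline data CTE and the column names rn_all and rn_stock are only there to make the example self-contained.

```sql
-- Sketch only: inline copy of the sample data, PostgreSQL syntax.
WITH data(transaction_date, stock) AS (
    VALUES (DATE '2018-12-01', 10), (DATE '2018-12-02', 10),
           (DATE '2018-12-03', 20), (DATE '2018-12-04', 20),
           (DATE '2018-12-05', 20), (DATE '2018-12-06', 20),
           (DATE '2018-12-07', 20), (DATE '2018-12-08', 10),
           (DATE '2018-12-09', 10), (DATE '2018-12-10', 30)
)
SELECT transaction_date,
       stock,
       -- position among all rows
       row_number() OVER (ORDER BY transaction_date) AS rn_all,
       -- position among rows with the same stock value
       row_number() OVER (PARTITION BY stock ORDER BY transaction_date) AS rn_stock,
       -- stays constant while the stock value does not change
       row_number() OVER (ORDER BY transaction_date)
     - row_number() OVER (PARTITION BY stock ORDER BY transaction_date) AS grp
FROM data
ORDER BY transaction_date;
```

For this data grp comes out as 0 for the first run of 10, 2 for the run of 20, 5 for the second run of 10, and 9 for the single row with 30. Grouping by (stock, grp) therefore separates the two runs of stock 10, which grouping by stock alone would merge.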
You can try below using row_number()
select stock,min(transaction_date) as start_date,
case when min(transaction_date)=max(transaction_date) then null else max(transaction_date) end as end_date
from
(
select *,row_number() over(order by transaction_date)-
row_number() over(partition by stock order by transaction_date) as rn
from t1
)A group by stock,rn
Try to use GROUP BY with MIN and MAX:
SELECT
stock,
MIN(transaction_date) Start_date,
CASE WHEN COUNT(*)>1 THEN MAX(transaction_date) END end_date
FROM Sample_table
GROUP BY stock
ORDER BY stock
You can try with LEAD, LAG functions as below:
select currentStockDate as startDate,
LEAD(currentStockDate,1) over(order by currentStockDate) as EndDate, -- start date of the next run (NULL for the last run)
currentStock
from
(select *
from
(select
LAG(transaction_date,1) over(order by transaction_date) as prevStockDate,
transaction_date as CurrentstockDate,
LAG(stock,1) over(order by transaction_date) as prevStock,
stock as currentStock
from sample_table) as t
where (prevStock <> currentStock) or (prevStock is null)
) as t2

SQL Dates Selection

I have an OPL_Dates table with start and end dates, as below:
dbo.OPL_Dates
ID Start_date End_date
--------------------------------------
12345 1975-01-01 2001-12-31
12345 1989-01-01 2004-12-31
12345 2005-01-01 NULL
12345 2007-01-01 NULL
12377 2009-06-01 2009-12-31
12377 2013-02-07 NULL
12377 2010-01-01 2012-01-01
12489 2011-12-31 NULL
12489 2012-03-01 2012-04-01
The Output I am looking for is:
ID Start_date End_date
-------------------------------------
12345 1975-01-01 2004-12-31
12345 2005-01-01 NULL
12377 2009-06-01 2009-12-31
12377 2010-01-01 2012-01-01
12377 2013-02-07 NULL
12489 2011-12-31 NULL
Basically, I want to show the gaps between the OPL periods (if any); otherwise I need the min of Start_date and the max of End_date for a particular ID. NULL means an open-ended date, which can be converted to "9999-12-31".
The following pretty much does what you want:
with p as (
select v.*, sum(inc) over (partition by v.id order by v.dte) as running_inc
from t cross apply
(values (id, start_date, 1),
(id, coalesce(end_date, '2999-12-31'), -1)
) v(id, dte, inc)
)
select id, min(dte), max(dte)
from (select p.*, sum(case when running_inc = 0 then 1 else 0 end) over (partition by id order by dte desc) as grp
from p
) p
group by id, grp;
Note that it changes the "infinite" end date from NULL to 2999-12-31. This is a convenience, because NULL orders first in SQL Server ascending sorts.
Here is a SQL Fiddle.
What is this doing? It is unpivoting the dates into a single column, with a 1/-1 flag (inc) indicating whether the record is a start or end. The running sum of this flag then indicates the groups that should be combined. When the running sum is 0, then a group has ended. To include the end date in the right group, a reverse running sum is needed -- but that's a detail.
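To make the running sum concrete, here is a sketch that runs only the unpivot step for a single ID. It assumes SQL Server, that t stands for dbo.OPL_Dates as in the query above, and that the date columns are of type date; the WHERE filter is added purely for illustration.

```sql
-- Sketch only: the unpivot and running sum from the query above, for one ID.
SELECT v.id,
       v.dte,
       v.inc,
       SUM(v.inc) OVER (PARTITION BY v.id ORDER BY v.dte) AS running_inc
FROM t
CROSS APPLY (VALUES (t.id, t.start_date, 1),                          -- period opens
                    (t.id, COALESCE(t.end_date, '2999-12-31'), -1)    -- period closes
            ) v(id, dte, inc)
WHERE t.id = 12377
ORDER BY v.dte;
-- For the sample rows of ID 12377, running_inc alternates 1, 0, 1, 0, 1, 0:
-- each 0 marks the end of a merged period, giving the three output rows for that ID.
```

For ID 12345 the first two periods overlap, so the running sum rises to 2 before it falls back to 0, and the two periods collapse into one.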