PostgreSQL ROW_NUMBER with timestamp conditions - sql

I'm trying to extend PARTITION BY to keep rows in same partition if ts_created of current row is within 1hour of previous row.
SELECT t1.id,
t1.user_email,
t1.ts_created,
t1.prev_ts
row_number() OVER (PARTITION BY t1.user_email ORDER BY t1.ts_created DESC) AS time_order
FROM (SELECT id,
user_email,
ts_created,
lag(ts_created) OVER(PARTITION BY user_email ORDER BY ts_created DESC) AS prev_ts
FROM table1) AS t1 ORDER BY t1.ts_created DESC;
So far i'm doing partition over user_email and prepared timestamp of previous row, now i'm abit lost on how to handle time component between current and previous row.
expectation
id
user_email
ts_created
time_order
6
mailA
2022-01-01 07:30:00.000
1
5
mailA
2022-01-01 06:40:00.000
2
4
mailA
2022-01-01 05:50:00.000
3
3
mailA
2022-01-01 05:00:00.000
4
2
mailA
2022-01-01 03:50:00.000
1
1
mailB
2021-01-01 03:30:00.000
1

Related

Delete all rows except first N rows within each time interval group

I have a table:
ID
DateTime
1
2022-01-30 01:02:03
1
2022-01-30 01:34:03
1
2022-01-30 02:59:03
2
2022-01-30 01:02:03
2
2022-01-30 01:34:03
2
2022-01-30 02:59:03
And I would like to delete all the rows except for 1 every hour for each unique ID. So the resulting table would look like:
ID
DateTime
1
2022-01-30 01:02:03
1
2022-01-30 02:59:03
2
2022-01-30 01:02:03
2
2022-01-30 02:59:03
You can use a cte (they could be used for delete) and window functions:
with cte as (
select *, row_number() over (
partition by id, cast(datetime as date), datepart(hour, datetime)
order by datetime
) as rn
from t
)
select * -- delete
from cte
where rn > 1
Change select * to delete once you're sure that the query contains the correct rows.

SQL, rank for each instance of a partition

I am trying to to create a rank for each instance of a status occurring, for example
ID
Status
From_date
To_date
rank
1
Available
2022-01-01
2022-01-02
1
1
Available
2022-01-02
2022-01-03
1
1
Unavailable
2022-01-03
2022-01-10
2
1
Available
2022-01-10
2022-01-20
3
For each ID, for each instance of a status occurring, by from_date ascending.
I want to do this as i see this as the best way of getting to the final result i want which is
ID
Status
From_date
To_date
rank
1
Available
2022-01-01
2022-01-03
1
1
Unavailable
2022-01-03
2022-01-10
2
1
Available
2022-01-10
2022-01-20
3
I tried dense_rank(partition by id order by status, from_date but can see now why that wouldnt work. Not sure how to get to this result.
So with this CTE for the data:
with data(ID, Status, From_date, To_date) as (
select * from values
(1, 'Available', '2022-01-01', '2022-01-02'),
(1, 'Available', '2022-01-02', '2022-01-03'),
(1, 'Unavailable', '2022-01-03', '2022-01-10'),
(1, 'Available', '2022-01-10', '2022-01-20')
)
the first result, being rank can be done with CONDITIONAL_CHANGE_EVENT:
select *
,CONDITIONAL_CHANGE_EVENT( Status ) OVER ( PARTITION BY ID ORDER BY From_date ) as rank
from data;
ID
STATUS
FROM_DATE
TO_DATE
RANK
1
Available
2022-01-01
2022-01-02
0
1
Available
2022-01-02
2022-01-03
0
1
Unavailable
2022-01-03
2022-01-10
1
1
Available
2022-01-10
2022-01-20
2
and thus the keeps the first of each rank can be achieved with a QUALIFY/ROW_NUMBER, because the CONDITIONAL_CHANGE is a complex operation, needs wrapping in a sub-select, so the answer is not as short as I would like:
select * from (
select *
,CONDITIONAL_CHANGE_EVENT( Status ) OVER ( PARTITION BY ID ORDER BY From_date ) as rank
from data
)
qualify row_number() over(partition by id, rank ORDER BY From_date ) = 1
gives:
ID
STATUS
FROM_DATE
TO_DATE
RANK
1
Available
2022-01-01
2022-01-02
0
1
Unavailable
2022-01-03
2022-01-10
1
1
Available
2022-01-10
2022-01-20
2
Also, the final result minus the ranking can be done with:
select *
from data
qualify nvl(Status <> lag(status) over ( PARTITION BY ID ORDER BY From_date ), true)
ID
STATUS
FROM_DATE
TO_DATE
1
Available
2022-01-01
2022-01-02
1
Unavailable
2022-01-03
2022-01-10
1
Available
2022-01-10
2022-01-20
and thus a rank can be added at the end
select *
,rank() over ( PARTITION BY ID ORDER BY From_date ) as rank
from (
select *
from data
qualify nvl(Status <> lag(status) over ( PARTITION BY ID ORDER BY From_date ), true)
)
ID
STATUS
FROM_DATE
TO_DATE
RANK
1
Available
2022-01-01
2022-01-02
1
1
Unavailable
2022-01-03
2022-01-10
2
1
Available
2022-01-10
2022-01-20
3
This is a typical gaps-and-island problem, where islands are groups of consecutive rows that have the same status.
Here is one way to solve it with window functions:
select id, status,
min(from_date) from_date, max(to_date) to_date,
row_number() over (partition by id order by min(from_date)) rn
from (
select t.*,
row_number() over (partition by id order by from_date) rn1,
row_number() over (partition by id, status order by from_date) rn2
from mytable t
) t
group by id, status, rn1 - rn2
order by min(from_date)
This worked by ranking rows within two different partitions (with a without the status) ; the difference between the row numbers define the islands.
You can group consecutive status using conditional_change_event, then collapse the dates using min and max, and finally use row_number() to rank the events
with cte as
(select *,conditional_change_event(status) over (partition by id order by from_date) as rn
from t)
select id,
status,
min(from_date) as from_date,
max(to_date) as to_date,
row_number() over (partition by id, order by min(from_date), max(to_date)) as rank
from cte
group by id, status, rn
order by rank

How can I select records from the last value accumulated

I have the next data: TABLE_A
RegisteredDate
Quantity
2022-03-01 13:00
100
2022-03-01 13:10
20
2022-03-01 13:20
-80
2022-03-01 13:30
-40
2022-03-02 09:00
10
2022-03-02 22:00
-5
2022-03-03 02:00
-5
2022-03-03 03:00
25
2022-03-03 03:20
-10
If I add cumulative column
select RegisteredDate, Quantity
, sum(Quantity) over ( order by RegisteredDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as Summary
from TABLE_A
RegisteredDate
Quantity
Summary
2022-03-01 13:00
100
100
2022-03-01 13:10
20
120
2022-03-01 13:20
-80
40
2022-03-01 13:30
-40
0
2022-03-02 09:00
10
10
2022-03-02 22:00
-5
5
2022-03-03 02:00
-5
0
2022-03-03 03:00
25
25
2022-03-03 03:20
-10
15
Is there a way to get the following result with a query?
RegisteredDate
Quantity
Summary
2022-03-03 03:00
25
25
2022-03-03 03:20
-10
15
This result is the last records after the last zero.
EDIT:
Really for the solution to this problem I need the: 2022-03-03 03:00 is the first date of the last records after the last zero.
You can try to use SUM aggregate window function to calculation grp column which part represent to last value accumulated.
Query 1:
WITH cte AS
(
SELECT RegisteredDate,
Quantity,
sum(Quantity) over (order by RegisteredDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as Summary
FROM TABLE_A
), cte2 AS (
SELECT *,
SUM(CASE WHEN Summary = 0 THEN 1 ELSE 0 END) OVER(order by RegisteredDate desc) grp
FROM cte
)
SELECT RegisteredDate,
Quantity
FROM cte2
WHERE grp = 0
ORDER BY RegisteredDate
Results:
| RegisteredDate | Quantity |
|----------------------|----------|
| 2022-03-03T03:00:00Z | 25 |
| 2022-03-03T03:20:00Z | -10 |
Use a CTE that returns the summary column and NOT EXISTS to filter out the rows that you don't need:
WITH cte AS (SELECT *, SUM(Quantity) OVER (ORDER BY RegisteredDate) Summary FROM TABLE_A)
SELECT c1.*
FROM cte c1
WHERE NOT EXISTS (
SELECT 1
FROM cte c2 WHERE c2.RegisteredDate >= c1.RegisteredDate AND c2.Summary = 0
)
ORDER BY c1.RegisteredDate;
There is no need for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW in the OVER clause of the window function, because this is the default behavior.
See the demo.
Try this:
with u as
(select RegisteredDate,
Quantity,
sum(Quantity) over (order by RegisteredDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as Summary
from TABLE_A)
select * from u
where RegisteredDate >= all(select RegisteredDate from u where Summary = 0)
and Summary <> 0;
Fiddle
Basically what you want is for RegisteredDate to be >= all RegisteredDatess where Summary = 0, and you want Summary <> 0.
When using window functions, it is necessary to take into account that RegisteredDate column is not unique in TABLE_A, so ordering only by RegisteredDate column is not enough to get a stable result on the same dataset.
With A As (
Select ROW_NUMBER() Over (Order by RegisteredDate, Quantity) As ID, RegisteredDate, Quantity
From TABLE_A),
B As (
Select A.*, SUM(Quantity) Over (Order by ID) As Summary
From A)
Select Top 1 *
From B
Where ID > (Select MAX(ID) From B Where Summary=0)
ID
RegisteredDate
Quantity
Summary
8
2022-03-03 03:00
25
25

Current record with group by function

Trying to get userid recent aggregate value for session_id.
(session_id 3 has two records, recent agg value is 80.00
session_id 4 has four records, recent agg value is 95.00
session_id 6 has three records, recent agg value is 72.00
Table:session_agg
id session_id userid agg date
-- ---------- ------ ----- -------
1 3 11 60.00 1573561586
4 3 11 80.00 1573561586
6 4 11 35.00 1573561749
7 4 11 50.00 1573561751
8 4 11 70.00 1573561912
10 4 11 95.00 1573561921
11 6 14 40.00 1573561945
12 6 14 67.00 1573561967
13 6 14 72.00 1573561978
select id, session_id, userid, agg, date from session_agg
WHERE date IN (select MAX(date) from session_agg GROUP BY session_id) AND
userid = 11
If you want to stick with your current approach, then you need to correlate the session_id in the subquery which checks for the max date for each session:
SELECT id, session_id, userid, add, date
FROM session_agg sa1
WHERE
date = (SELECT MAX(date) FROM session_agg sa2 WHERE sa2.session_id = sa1.session_id) AND
userid = 11;
But, if your version of SQL supports analytic functions, ROW_NUMBER is an easier way to do this:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY date DESC) rn
FROM session_agg
)
SELECT id, session_id, userid, add, date
FROM cte
WHERE rn = 1;

Select rows based on criteria from within group

I have the following table:
pk_positions ass_pos_id underlying entry_date
1 1 abc 2016-03-14
2 1 xyz 2016-03-17
3 tlt 2016-03-18
4 4 ujf 2016-03-21
5 4 dks 2016-03-23
6 4 dqp 2016-03-26
I need to select one row per ass_pos_id which has the earliest entry_date. Rows which do not have a value for ass_pos_id are not included.
In other words, for each non null ass_pos_id group, select the row which has the earliest entry_date
The following is the desired result:
pk_positions ass_pos_id underlying entry_date
1 1 abc 2016-03-14
4 4 ujf 2016-03-21
You could use the row_number window function:
SELECT pk_positions, ass_pos_id, underlying, entry_date
FROM (SELECT pk_positions, ass_pos_id, underlying, entry_date,
ROW_NUMBER() OVER (PARTITION BY ass_pos_id
ORDER BY entry_date ASC) rn
FROM mytable
WHERE ass_pos_id IS NOT NULL) t
WHERE rn = 1