SQL, rank for each instance of a partition

I am trying to create a rank for each instance of a status occurring, for example:
ID | Status | From_date | To_date | rank
1 | Available | 2022-01-01 | 2022-01-02 | 1
1 | Available | 2022-01-02 | 2022-01-03 | 1
1 | Unavailable | 2022-01-03 | 2022-01-10 | 2
1 | Available | 2022-01-10 | 2022-01-20 | 3
The rank should increment, for each ID, for each new instance of a status occurring, by from_date ascending.
I want to do this because I see it as the best way of getting to the final result I want, which is:
ID | Status | From_date | To_date | rank
1 | Available | 2022-01-01 | 2022-01-03 | 1
1 | Unavailable | 2022-01-03 | 2022-01-10 | 2
1 | Available | 2022-01-10 | 2022-01-20 | 3
I tried dense_rank() over (partition by id order by status, from_date) but can see now why that wouldn't work. Not sure how to get to this result.
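For illustration, here is roughly what that attempt produces (a sketch only, assuming the four sample rows are available as a table or CTE named data, as the first answer below defines): it ranks by the status value and date rather than by consecutive runs of the same status, so it can never give 1, 1, 2, 3.
select *
      ,dense_rank() over (partition by id order by status, from_date) as rank
from data;
-- ID  STATUS       FROM_DATE   TO_DATE     RANK
-- 1   Available    2022-01-01  2022-01-02  1
-- 1   Available    2022-01-02  2022-01-03  2
-- 1   Unavailable  2022-01-03  2022-01-10  4
-- 1   Available    2022-01-10  2022-01-20  3
-- All 'Available' rows sort before 'Unavailable', and each distinct from_date gets
-- its own dense rank, so consecutive runs of the same status are never grouped.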

So with this CTE for the data:
with data(ID, Status, From_date, To_date) as (
select * from values
(1, 'Available', '2022-01-01', '2022-01-02'),
(1, 'Available', '2022-01-02', '2022-01-03'),
(1, 'Unavailable', '2022-01-03', '2022-01-10'),
(1, 'Available', '2022-01-10', '2022-01-20')
)
the first result, being the rank, can be done with CONDITIONAL_CHANGE_EVENT:
select *
,CONDITIONAL_CHANGE_EVENT( Status ) OVER ( PARTITION BY ID ORDER BY From_date ) as rank
from data;
ID | STATUS | FROM_DATE | TO_DATE | RANK
1 | Available | 2022-01-01 | 2022-01-02 | 0
1 | Available | 2022-01-02 | 2022-01-03 | 0
1 | Unavailable | 2022-01-03 | 2022-01-10 | 1
1 | Available | 2022-01-10 | 2022-01-20 | 2
and thus keeping the first row of each rank can be achieved with QUALIFY/ROW_NUMBER. Because CONDITIONAL_CHANGE_EVENT is a complex operation it needs wrapping in a sub-select, so the answer is not as short as I would like:
select * from (
select *
,CONDITIONAL_CHANGE_EVENT( Status ) OVER ( PARTITION BY ID ORDER BY From_date ) as rank
from data
)
qualify row_number() over(partition by id, rank ORDER BY From_date ) = 1
gives:
ID | STATUS | FROM_DATE | TO_DATE | RANK
1 | Available | 2022-01-01 | 2022-01-02 | 0
1 | Unavailable | 2022-01-03 | 2022-01-10 | 1
1 | Available | 2022-01-10 | 2022-01-20 | 2
Also, the final result minus the ranking can be done with:
select *
from data
qualify nvl(Status <> lag(status) over ( PARTITION BY ID ORDER BY From_date ), true)
ID | STATUS | FROM_DATE | TO_DATE
1 | Available | 2022-01-01 | 2022-01-02
1 | Unavailable | 2022-01-03 | 2022-01-10
1 | Available | 2022-01-10 | 2022-01-20
and thus a rank can be added at the end
select *
,rank() over ( PARTITION BY ID ORDER BY From_date ) as rank
from (
select *
from data
qualify nvl(Status <> lag(status) over ( PARTITION BY ID ORDER BY From_date ), true)
)
ID | STATUS | FROM_DATE | TO_DATE | RANK
1 | Available | 2022-01-01 | 2022-01-02 | 1
1 | Unavailable | 2022-01-03 | 2022-01-10 | 2
1 | Available | 2022-01-10 | 2022-01-20 | 3

This is a typical gaps-and-island problem, where islands are groups of consecutive rows that have the same status.
Here is one way to solve it with window functions:
select id, status,
min(from_date) from_date, max(to_date) to_date,
row_number() over (partition by id order by min(from_date)) rn
from (
select t.*,
row_number() over (partition by id order by from_date) rn1,
row_number() over (partition by id, status order by from_date) rn2
from mytable t
) t
group by id, status, rn1 - rn2
order by min(from_date)
This works by ranking rows within two different partitions (with and without the status); the difference between the row numbers defines the islands.
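To make the mechanics concrete, here is a sketch of the inner query with its intermediate values against the four sample rows (assuming they live in a table named mytable, as in the query above; the island column name is just illustrative):
select t.*,
       row_number() over (partition by id order by from_date) rn1,
       row_number() over (partition by id, status order by from_date) rn2,
       row_number() over (partition by id order by from_date)
         - row_number() over (partition by id, status order by from_date) as island
from mytable t;
-- ID  STATUS       FROM_DATE   TO_DATE     RN1  RN2  ISLAND
-- 1   Available    2022-01-01  2022-01-02  1    1    0
-- 1   Available    2022-01-02  2022-01-03  2    2    0
-- 1   Unavailable  2022-01-03  2022-01-10  3    1    2
-- 1   Available    2022-01-10  2022-01-20  4    3    1
-- Grouping by (id, status, rn1 - rn2) therefore yields three islands,
-- which the outer query collapses with min(from_date) / max(to_date).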

You can group consecutive statuses using conditional_change_event, then collapse the dates using min and max, and finally use row_number() to rank the events:
with cte as
(select *,conditional_change_event(status) over (partition by id order by from_date) as rn
from t)
select id,
status,
min(from_date) as from_date,
max(to_date) as to_date,
row_number() over (partition by id order by min(from_date), max(to_date)) as rank
from cte
group by id, status, rn
order by rank

Related

Group by date and find median of processing time

I select the input date and output date from a database and use a formula to compute the processing time. Now I would like the values to be grouped according to the date of receipt, and the median of the processing time to be output for each grouped date of receipt. Something like this:
The data I select:
input date | output date | processing time
2022-01-03 | 2022-01-03 | 0
2022-01-03 | 2022-01-06 | 3
2022-01-03 | 2022-01-11 | 8
2022-01-05 | 2022-01-10 | 5
2022-01-05 | 2022-01-15 | 10
The output I want:
input date | processing time
2022-01-03 | 3
2022-01-05 | 7.5
My SQL Code:
SELECT [received_date]
,CONVERT(date, [exported_on])
,DATEDIFF(day, [received_date], [exported_on]) AS processing_time
FROM [request] WHERE YEAR (received_date) = 2022
GROUP BY received_date, [exported_on]
ORDER BY received_date
How can I do this? Do I need a temp table to do this, or can I modify my query?
You could try using PERCENTILE_CONT
with cte as (
select input_date,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY processing_time) OVER(PARTITION BY input_date) as Median_Process_Time
FROM tableA
)
SELECT *
FROM cte
GROUP BY input_date, Median_Process_Time
db fiddle
Also, you can check out the discussion here: How to find the SQL medians for a grouping
Here is my solution. Thank you for your help.
SET NOCOUNT ON;
DECLARE @working TABLE(entry_date date, exit_date date, work_time int)
INSERT INTO @working
SELECT [received] AS date_of_entry
,CONVERT(date, [exported]) AS date_of_exit
,DATEDIFF(day, [received], [exported]) AS processing_time
FROM [zsdt].[dbo].[antrag] WHERE YEAR([received]) = 2022 AND scanner_name IS NOT NULL AND exportiert_am IS NOT NULL AND NOT scanner_name = 'AP99'
GROUP BY [received], [exported]
ORDER BY [received] ASC
;WITH CTE AS
( SELECT entry_date,
work_time,
[half1] = NTILE(2) OVER(PARTITION BY entry_date ORDER BY work_time),
[half2] = NTILE(2) OVER(PARTITION BY entry_date ORDER BY work_time DESC)
FROM @working
WHERE work_time IS NOT NULL
)
SELECT entry_date,
(MAX(CASE WHEN Half1 = 1 THEN work_time END) +
MIN(CASE WHEN Half2 = 1 THEN work_time END)) / 2.0
FROM CTE
GROUP BY entry_date;

PostgreSQL ROW_NUMBER with timestamp conditions

I'm trying to extend PARTITION BY to keep rows in the same partition if ts_created of the current row is within 1 hour of the previous row.
SELECT t1.id,
t1.user_email,
t1.ts_created,
t1.prev_ts,
row_number() OVER (PARTITION BY t1.user_email ORDER BY t1.ts_created DESC) AS time_order
FROM (SELECT id,
user_email,
ts_created,
lag(ts_created) OVER(PARTITION BY user_email ORDER BY ts_created DESC) AS prev_ts
FROM table1) AS t1 ORDER BY t1.ts_created DESC;
So far I'm partitioning over user_email and have prepared the timestamp of the previous row; now I'm a bit lost on how to handle the time component between the current and previous row.
Expected result:
id | user_email | ts_created | time_order
6 | mailA | 2022-01-01 07:30:00.000 | 1
5 | mailA | 2022-01-01 06:40:00.000 | 2
4 | mailA | 2022-01-01 05:50:00.000 | 3
3 | mailA | 2022-01-01 05:00:00.000 | 4
2 | mailA | 2022-01-01 03:50:00.000 | 1
1 | mailB | 2021-01-01 03:30:00.000 | 1
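No answer is recorded here, but one common way to approach it (a sketch only, reusing table1 and the column names from the question; session_id is just an illustrative alias) is to turn "gap to the previous row larger than 1 hour" into a new-session flag with lag(), take a running sum of the flags to get a session id per user_email, and then number the rows within each session:
SELECT id,
       user_email,
       ts_created,
       row_number() OVER (PARTITION BY user_email, session_id
                          ORDER BY ts_created DESC) AS time_order
FROM (
    SELECT t1.*,
           -- start a new session when the gap to the previous (later) row exceeds 1 hour
           sum(CASE WHEN prev_ts IS NULL
                      OR prev_ts - ts_created > interval '1 hour'
                    THEN 1 ELSE 0 END)
             OVER (PARTITION BY user_email ORDER BY ts_created DESC) AS session_id
    FROM (
        SELECT id,
               user_email,
               ts_created,
               lag(ts_created) OVER (PARTITION BY user_email
                                     ORDER BY ts_created DESC) AS prev_ts
        FROM table1
    ) AS t1
) AS t2
ORDER BY user_email, ts_created DESC;
Against the expected data above, rows 6 down to 3 for mailA fall into one session (each gap is under an hour), and row 2 starts a new session, which resets time_order to 1.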

Need to get maximum date range which is overlapping in SQL

I have a table with 3 columns id, start_date, end_date
Some of the values are as follows:
1 2018-01-01 2030-01-01
1 2017-10-01 2018-10-01
1 2019-01-01 2020-01-01
1 2015-01-01 2016-01-01
2 2010-01-01 2011-02-01
2 2010-10-01 2010-12-01
2 2008-01-01 2009-01-01
I have the above kind of data set, where I have to filter out overlapping date ranges by keeping the maximum date range, and keep the other date ranges which do not overlap, for a particular id.
Hence desired output should be:
1 2018-01-01 2030-01-01
1 2015-01-01 2016-01-01
2 2010-01-01 2011-02-01
2 2008-01-01 2009-01-01
I am unable to find the right way to code this in Impala. Can someone please help me?
I have tried:
with cte as (
select a.*, row_number() over (partition by id order by datediff(end_date, start_date) desc) as flag
from mytable a
)
select * from cte where flag = 1
but this will remove the other date ranges which are not overlapping. Please help.
Use row_number with a countItem for each id:
with cte as(
select *,
row_number() over(partition by id order by id) as seq,
count(*) over(partition by id order by id) as countItem
from mytable
)
select id,start_date,end_date
from cte
where seq = 1 or seq = countItem
or without cte
select id,start_date,end_date
from
(select *,
row_number() over(partition by id order by id) as seq,
count(*) over(partition by id order by id) as countItem
from mytable) t
where seq = 1 or seq = countItem
demo in db<>fiddle
You can use a cumulative max to see if there is any overlap with preceding rows. If there is not, then you have the first row of a new group (row in the result set).
A cumulative sum of the starts assigns each row in the source to a group. Then aggregate:
select id, min(start_date), max(end_date)
from (
    select t.*,
           sum(case when prev_end_date >= start_date then 0 else 1 end) over (
               partition by id
               order by start_date
               rows between unbounded preceding and current row
           ) as grp
    from (
        select t.*,
               max(end_date) over (
                   partition by id
                   order by start_date
                   rows between unbounded preceding and 1 preceding
               ) as prev_end_date
        from t
    ) t
) t
group by id, grp;

SQL query to find continuous local max, min of date based on category column

I have the following data set
Customer_ID Category FROM_DATE TO_DATE
1 5 1/1/2000 12/31/2001
1 6 1/1/2002 12/31/2003
1 5 1/1/2004 12/31/2005
2 7 1/1/2010 12/31/2011
2 7 1/1/2012 12/31/2013
2 5 1/1/2014 12/31/2015
3 7 1/1/2010 12/31/2011
3 7 1/5/2012 12/31/2013
3 5 1/1/2014 12/31/2015
The result I want to achieve is to find continuous local min/max date for Customers with the same category and identify any gap in dates:
Customer_ID FROM_Date TO_Date Category
1 1/1/2000 12/31/2001 5
1 1/1/2002 12/31/2003 6
1 1/1/2004 12/31/2005 5
2 1/1/2010 12/31/2013 7
2 1/1/2014 12/31/2015 5
3 1/1/2010 12/31/2011 7
3 1/5/2012 12/31/2013 7
3 1/1/2014 12/31/2015 5
My code works fine for customer 1 (returns all 3 rows) and customer 2 (returns 2 rows with min and max date for each category), but for customer 3 it cannot identify the gap between 12/31/2011 and 1/5/2012 for category 7.
Customer_ID FROM_Date TO_Date Category
3 1/1/2010 12/31/2013 7
3 1/1/2014 12/31/2015 5
Here is my code:
SELECT Customer_ID, Category, min(From_Date), max(To_Date) FROM
(
SELECT Customer_ID, Category, From_Date,To_Date
,row_number() over (order by member_id, To_Date) - row_number() over (partition by Customer_ID order by Category) as p
FROM FFS_SAMP
) X
group by Customer_ID,Category,p
order by Customer_ID,min(From_Date),Max(To_Date)
This is a type of gaps-and-islands problem. Probably the safest method is to use a cumulative max() to look for overlaps with previous records. Where there is no overlap, a new "island" of records starts. So:
select customer_id, min(from_date), max(to_date), category
from (
    select t.*,
           sum(case when prev_to_date >= from_date then 0 else 1 end) over (
               partition by customer_id, category
               order by from_date
           ) as grp
    from (
        select t.*,
               max(to_date) over (
                   partition by customer_id, category
                   order by from_date
                   rows between unbounded preceding and 1 preceding
               ) as prev_to_date
        from t
    ) t
) t
group by customer_id, category, grp;
Your attempt is quite close. You just need to fix the over() clause of the window functions:
select customer_id, category, min(from_date), max(to_date)
from (
select
fs.*,
row_number() over (partition by customer_id order by from_date)
- row_number() over (partition by customer_id, category order by from_date) as grp
from ffs_samp fs
) x
group by customer_id, category, grp
order by customer_id, min(from_date)
Note that this method assumes no gaps or overlaps in the periods of a given customer, as shown in your sample data.

Making groups of dates in SQL Server

I have a table that contains ids and dates, and I want to make groups of dates for each id.
id date
------------------
1 2019-01-01
2 2019-01-01
1 2019-01-02
2 2019-01-02
2 2019-01-03
1 2019-01-04
1 2019-01-05
2 2019-01-05
2 2019-01-06
I want to check where the gaps in dates are for each id, to get output like:
id from to
------------------------------------
1 2019-01-01 2019-01-02
1 2019-01-04 2019-01-05
2 2019-01-01 2019-01-03
2 2019-01-05 2019-01-06
This is a form of gaps-and-islands problem. The simplest solution is to generate a sequential number for each id and subtract that from the date. This is constant for dates that are sequential.
So:
select id, min(date), max(date)
from (select t.*, row_number() over (partition by id order by date) as seqnum
from t
) t
group by id, dateadd(day, -seqnum, date)
order by id, min(date);
Here is a db<>fiddle.
A typical approach to this gaps-and-islands problem is to build the groups by comparing the date of the current record to the "previous" date of the same id. When dates are not consecutive, a new group starts:
select id, min(date) from_date, max(date) to_date
from (
    select t.*,
           sum(case when date = dateadd(day, 1, lag_date) then 0 else 1 end)
             over(partition by id order by date) grp
    from (
        select t.*,
               lag(date) over(partition by id order by date) lag_date
        from mytable t
    ) t
) t
group by id, grp
order by id, from_date