Making groups of dates in SQL Server

I have a table that contains ids and dates, and I want to build groups of consecutive dates for each id.
id date
------------------
1 2019-01-01
2 2019-01-01
1 2019-01-02
2 2019-01-02
2 2019-01-03
1 2019-01-04
1 2019-01-05
2 2019-01-05
2 2019-01-06
I want to find the gaps in the dates for each id, to get output like:
id from to
------------------------------------
1 2019-01-01 2019-01-02
1 2019-01-04 2019-01-05
2 2019-01-01 2019-01-03
2 2019-01-05 2019-01-06

This is a form of the gaps-and-islands problem. The simplest solution is to generate a sequential number for each id and subtract that many days from the date. The result is constant for dates that are consecutive.
So:
select id, min(date), max(date)
from (select t.*, row_number() over (partition by id order by date) as seqnum
from t
) t
group by id, dateadd(day, -seqnum, date)
order by id, min(date);
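To make the "constant for consecutive dates" step visible, here is a minimal sketch that runs the same idea on the question's sample rows using Python's built-in sqlite3 module. The engine, the column name d, and the date(d, '-N days') expression are assumptions of this sketch (SQLite has no dateadd()); the original query targets SQL Server.

```python
import sqlite3

# Sample rows from the question. The date column is called d here (an assumption).
rows = [
    (1, "2019-01-01"), (2, "2019-01-01"), (1, "2019-01-02"),
    (2, "2019-01-02"), (2, "2019-01-03"), (1, "2019-01-04"),
    (1, "2019-01-05"), (2, "2019-01-05"), (2, "2019-01-06"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, d text)")
conn.executemany("insert into t values (?, ?)", rows)

# Intermediate view: each date minus its row number, in days.
# date(d, '-N days') stands in for dateadd(day, -N, d).
anchors = conn.execute("""
    select id, d, seqnum, date(d, '-' || seqnum || ' days') as anchor
    from (select t.*, row_number() over (partition by id order by d) as seqnum
          from t) t
    order by id, d
""").fetchall()

# Grouping by the anchor collapses each run of consecutive dates into one row.
islands = conn.execute("""
    select id, min(d), max(d)
    from (select t.*, row_number() over (partition by id order by d) as seqnum
          from t) t
    group by id, date(d, '-' || seqnum || ' days')
    order by id, min(d)
""").fetchall()

for row in anchors:
    print(row)
print(islands)
```

For id 1 the anchor is 2018-12-31 for the first two dates and 2019-01-01 for the last two, which is why they end up in two separate groups.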

A typical approach to this gaps-and-islands problem is to build the groups by comparing the date of the current record to the "previous" date of the same id. When dates are not consecutive, a new group starts:
select id, min(date) from_date, max(date) to_date
from (
select
t.*,
sum(case when date = dateadd(day, 1, lag_date) then 0 else 1 end)
over(partition by id order by date) grp
from (
select
t.*,
lag(date) over(partition by id order by date) lag_date
from mytable t
) t
) t
group by id, grp
order by id, from_date
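The lag-and-running-sum step can be inspected row by row. The following is a sketch only, run on SQLite through Python's sqlite3 module; the column name d and the date(lag_d, '+1 day') expression are assumptions that replace the date column and dateadd(day, 1, lag_date) of the query above.

```python
import sqlite3

# Sample rows from the question. The date column is called d here (an assumption).
rows = [
    (1, "2019-01-01"), (2, "2019-01-01"), (1, "2019-01-02"),
    (2, "2019-01-02"), (2, "2019-01-03"), (1, "2019-01-04"),
    (1, "2019-01-05"), (2, "2019-01-05"), (2, "2019-01-06"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (id integer, d text)")
conn.executemany("insert into mytable values (?, ?)", rows)

# A row starts a new group (flag 1) when it is not exactly one day after
# the previous row of the same id; the running sum of flags is the group id.
steps = conn.execute("""
    select id, d, lag_d,
           sum(case when d = date(lag_d, '+1 day') then 0 else 1 end)
               over (partition by id order by d) as grp
    from (select t.*, lag(d) over (partition by id order by d) as lag_d
          from mytable t) t
    order by id, d
""").fetchall()

for row in steps:
    print(row)
```

The first row of each id has no previous date, so the comparison yields NULL and the case expression falls through to 1, which opens the first group.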

Related

SQL, rank for each instance of a partition

I am trying to create a rank for each instance of a status occurring, for example:
ID  Status       From_date   To_date     rank
----------------------------------------------
1   Available    2022-01-01  2022-01-02  1
1   Available    2022-01-02  2022-01-03  1
1   Unavailable  2022-01-03  2022-01-10  2
1   Available    2022-01-10  2022-01-20  3
The rank should be assigned for each ID, for each instance of a status occurring, ordered by from_date ascending.
I want to do this because I see it as the best way of getting to the final result I want, which is:
ID  Status       From_date   To_date     rank
----------------------------------------------
1   Available    2022-01-01  2022-01-03  1
1   Unavailable  2022-01-03  2022-01-10  2
1   Available    2022-01-10  2022-01-20  3
I tried dense_rank() over (partition by id order by status, from_date) but can now see why that wouldn't work. I'm not sure how to get to this result.

So with this CTE for the data:
with data(ID, Status, From_date, To_date) as (
select * from values
(1, 'Available', '2022-01-01', '2022-01-02'),
(1, 'Available', '2022-01-02', '2022-01-03'),
(1, 'Unavailable', '2022-01-03', '2022-01-10'),
(1, 'Available', '2022-01-10', '2022-01-20')
)
the first result (the rank) can be produced with CONDITIONAL_CHANGE_EVENT:
select *
,CONDITIONAL_CHANGE_EVENT( Status ) OVER ( PARTITION BY ID ORDER BY From_date ) as rank
from data;
ID  STATUS       FROM_DATE   TO_DATE     RANK
----------------------------------------------
1   Available    2022-01-01  2022-01-02  0
1   Available    2022-01-02  2022-01-03  0
1   Unavailable  2022-01-03  2022-01-10  1
1   Available    2022-01-10  2022-01-20  2
Keeping only the first row of each rank can then be achieved with QUALIFY and ROW_NUMBER. Because CONDITIONAL_CHANGE_EVENT is a complex operation, it needs wrapping in a sub-select, so the answer is not as short as I would like:
select * from (
select *
,CONDITIONAL_CHANGE_EVENT( Status ) OVER ( PARTITION BY ID ORDER BY From_date ) as rank
from data
)
qualify row_number() over(partition by id, rank ORDER BY From_date ) = 1
gives:
ID  STATUS       FROM_DATE   TO_DATE     RANK
----------------------------------------------
1   Available    2022-01-01  2022-01-02  0
1   Unavailable  2022-01-03  2022-01-10  1
1   Available    2022-01-10  2022-01-20  2
Also, the final result minus the ranking can be done with:
select *
from data
qualify nvl(Status <> lag(status) over ( PARTITION BY ID ORDER BY From_date ), true)
ID  STATUS       FROM_DATE   TO_DATE
----------------------------------------
1   Available    2022-01-01  2022-01-02
1   Unavailable  2022-01-03  2022-01-10
1   Available    2022-01-10  2022-01-20
and thus a rank can be added at the end
select *
,rank() over ( PARTITION BY ID ORDER BY From_date ) as rank
from (
select *
from data
qualify nvl(Status <> lag(status) over ( PARTITION BY ID ORDER BY From_date ), true)
)
ID  STATUS       FROM_DATE   TO_DATE     RANK
----------------------------------------------
1   Available    2022-01-01  2022-01-02  1
1   Unavailable  2022-01-03  2022-01-10  2
1   Available    2022-01-10  2022-01-20  3

This is a typical gaps-and-islands problem, where islands are groups of consecutive rows that have the same status.
Here is one way to solve it with window functions:
select id, status,
min(from_date) from_date, max(to_date) to_date,
row_number() over (partition by id order by min(from_date)) rn
from (
select t.*,
row_number() over (partition by id order by from_date) rn1,
row_number() over (partition by id, status order by from_date) rn2
from mytable t
) t
group by id, status, rn1 - rn2
order by min(from_date)
This works by ranking rows within two different partitions (with and without the status); the difference between the row numbers defines the islands.
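A short sketch of the two row numbers and their difference on the question's four rows, run on SQLite through Python's sqlite3 module. The engine is an assumption of this sketch, and the final ranking column is left out to keep the example small.

```python
import sqlite3

# The four sample rows from the question.
rows = [
    (1, "Available", "2022-01-01", "2022-01-02"),
    (1, "Available", "2022-01-02", "2022-01-03"),
    (1, "Unavailable", "2022-01-03", "2022-01-10"),
    (1, "Available", "2022-01-10", "2022-01-20"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (id integer, status text, from_date text, to_date text)"
)
conn.executemany("insert into mytable values (?, ?, ?, ?)", rows)

numbered = """
    select t.*,
           row_number() over (partition by id order by from_date) as rn1,
           row_number() over (partition by id, status order by from_date) as rn2
    from mytable t
"""

# rn1 - rn2 stays the same while the status does not change.
diffs = conn.execute(f"""
    select status, from_date, rn1, rn2, rn1 - rn2 as island
    from ({numbered}) t
    order by from_date
""").fetchall()

# Grouping by status and the difference merges each island into one row.
merged = conn.execute(f"""
    select id, status, min(from_date), max(to_date)
    from ({numbered}) t
    group by id, status, rn1 - rn2
    order by min(from_date)
""").fetchall()

print(diffs)
print(merged)
```

The difference alone is not unique across statuses in general, which is why status is also part of the group by.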

You can group consecutive statuses using conditional_change_event, then collapse the dates using min and max, and finally use row_number() to rank the events:
with cte as
(select *,conditional_change_event(status) over (partition by id order by from_date) as rn
from t)
select id,
status,
min(from_date) as from_date,
max(to_date) as to_date,
row_number() over (partition by id order by min(from_date), max(to_date)) as rank
from cte
group by id, status, rn
order by rank
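conditional_change_event is not available in every engine. As a hedged sketch, the same change counter can be approximated with lag() and a running sum; the example runs on SQLite through Python's sqlite3 module, and the table name status_log is hypothetical.

```python
import sqlite3

# The four sample rows from the question; status_log is a hypothetical table name.
rows = [
    (1, "Available", "2022-01-01", "2022-01-02"),
    (1, "Available", "2022-01-02", "2022-01-03"),
    (1, "Unavailable", "2022-01-03", "2022-01-10"),
    (1, "Available", "2022-01-10", "2022-01-20"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table status_log (id integer, status text, from_date text, to_date text)"
)
conn.executemany("insert into status_log values (?, ?, ?, ?)", rows)

# changed is 1 when the status differs from the previous row of the same id.
# The first row compares against NULL and therefore counts as 0.
change_counts = conn.execute("""
    select status, from_date,
           sum(changed) over (partition by id order by from_date) as change_no
    from (select s.*,
                 case when status <> lag(status) over (partition by id order by from_date)
                      then 1 else 0 end as changed
          from status_log s) s
    order by id, from_date
""").fetchall()

print(change_counts)
```

On this sample the counter comes out as 0, 0, 1, 2, which can then be used as the grouping key in the same way as rn in the query above.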

Need to get maximum date range which is overlapping in SQL

I have a table with 3 columns id, start_date, end_date
Some of the values are as follows:
1 2018-01-01 2030-01-01
1 2017-10-01 2018-10-01
1 2019-01-01 2020-01-01
1 2015-01-01 2016-01-01
2 2010-01-01 2011-02-01
2 2010-10-01 2010-12-01
2 2008-01-01 2009-01-01
I have the above kind of data set, where for each id I need to filter out overlapping date ranges by keeping the largest range, while also keeping the other ranges that do not overlap.
Hence desired output should be:
1 2018-01-01 2030-01-01
1 2015-01-01 2016-01-01
2 2010-01-01 2011-02-01
2 2008-01-01 2009-01-01
I am unable to find the right way to code this in Impala. Can someone please help me?
I have tried:
with cte as (
select a.*, row_number() over(partition by id order by datediff(end_date, start_date) desc) as flag from mytable a)
select * from cte where flag = 1
but this also removes the other date ranges that do not overlap. Please help.

Use row_number() together with a per-id row count (countItem):
with cte as(
select *,
row_number() over(partition by id order by id) as seq,
count(*) over(partition by id order by id) as countItem
from mytable
)
select id,start_date,end_date
from cte
where seq = 1 or seq = countItem
Or, without a CTE:
select id,start_date,end_date
from
(select *,
row_number() over(partition by id order by id) as seq,
count(*) over(partition by id order by id) as countItem
from mytable) t
where seq = 1 or seq = countItem

You can use a cumulative max to see if there is any overlap with preceding rows. If there is not, then you have the first row of a new group (row in the result set).
A cumulative sum of the starts assigns each row in the source to a group. Then aggregate:
select id, min(start_date), max(end_date)
from (select t.*,
sum(case when prev_end_date >= start_date then 0 else 1 end) over
(partition by id
order by start_date
rows between unbounded preceding and current row
) as grp
from (select t.*,
max(end_date) over (partition by id
order by start_date
rows between unbounded preceding and 1 preceding
) as prev_end_date
from t
) t
) t
group by id, grp;
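As a rough check of the cumulative-max idea, the sketch below runs the same query shape on the question's sample rows with SQLite through Python's sqlite3 module (the engine is an assumption; the question asks about Impala). Dates are stored as ISO text, which compares correctly as strings.

```python
import sqlite3

# Sample rows from the question.
rows = [
    (1, "2018-01-01", "2030-01-01"),
    (1, "2017-10-01", "2018-10-01"),
    (1, "2019-01-01", "2020-01-01"),
    (1, "2015-01-01", "2016-01-01"),
    (2, "2010-01-01", "2011-02-01"),
    (2, "2010-10-01", "2010-12-01"),
    (2, "2008-01-01", "2009-01-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, start_date text, end_date text)")
conn.executemany("insert into t values (?, ?, ?)", rows)

# prev_end_date is the latest end seen so far for the id; a row whose start is
# later than that opens a new group, and the running sum numbers the groups.
merged = conn.execute("""
    select id, min(start_date), max(end_date)
    from (select t.*,
                 sum(case when prev_end_date >= start_date then 0 else 1 end)
                     over (partition by id order by start_date
                           rows between unbounded preceding and current row) as grp
          from (select t.*,
                       max(end_date) over (partition by id order by start_date
                                           rows between unbounded preceding
                                                    and 1 preceding) as prev_end_date
                from t) t
         ) t
    group by id, grp
    order by id, min(start_date)
""").fetchall()

print(merged)
```

Note that this merges overlapping ranges into one span instead of picking the longest member, so for id 1 the merged range starts at 2017-10-01 rather than the 2018-01-01 shown in the question's desired output.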

SQL query to find continuous local max, min of date based on category column

I have the following data set
Customer_ID Category FROM_DATE TO_DATE
1 5 1/1/2000 12/31/2001
1 6 1/1/2002 12/31/2003
1 5 1/1/2004 12/31/2005
2 7 1/1/2010 12/31/2011
2 7 1/1/2012 12/31/2013
2 5 1/1/2014 12/31/2015
3 7 1/1/2010 12/31/2011
3 7 1/5/2012 12/31/2013
3 5 1/1/2014 12/31/2015
The result I want to achieve is to find continuous local min/max date for Customers with the same category and identify any gap in dates:
Customer_ID FROM_Date TO_Date Category
1 1/1/2000 12/31/2001 5
1 1/1/2002 12/31/2003 6
1 1/1/2004 12/31/2005 5
2 1/1/2010 12/31/2013 7
2 1/1/2014 12/31/2015 5
3 1/1/2010 12/31/2011 7
3 1/5/2012 12/31/2013 7
3 1/1/2014 12/31/2015 5
My code works fine for customer 1 (returns all 3 rows) and customer 2 (returns 2 rows with the min and max date for each category), but for customer 3 it cannot identify the gap between 12/31/2011 and 1/5/2012 for category 7 and returns:
Customer_ID FROM_Date TO_Date Category
3 1/1/2010 12/31/2013 7
3 1/1/2014 12/31/2015 5
Here is my code:
SELECT Customer_ID, Category, min(From_Date), max(To_Date) FROM
(
SELECT Customer_ID, Category, From_Date,To_Date
,row_number() over (order by member_id, To_Date) - row_number() over (partition by Customer_ID order by Category) as p
FROM FFS_SAMP
) X
group by Customer_ID,Category,p
order by Customer_ID,min(From_Date),Max(To_Date)

This is a type of gaps and islands problem. Probably the safest method is to use a cumulative max() to look for overlaps with previous records. Where there is no overlap, then an "island" of records starts. So:
select customer_id, min(from_date), max(to_date), category
from (select t.*,
sum(case when prev_to_date >= from_date then 0 else 1 end) over
(partition by customer_id, category
order by from_date
) as grp
from (select t.*,
max(to_date) over (partition by customer_id, category
order by from_date
rows between unbounded preceding and 1 preceding
) as prev_to_date
from t
) t
) t
group by customer_id, category, grp;
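One detail worth checking: the comparison prev_to_date >= from_date only treats periods as continuous when they overlap, so day-adjacent periods such as customer 2's 12/31/2011 and 1/1/2012 would appear to start a new island. The sketch below is a variant under stated assumptions, not the answer's exact query: it adds a one-day tolerance, stores dates as ISO text, and runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

# Sample rows from the question, with dates rewritten in ISO format.
rows = [
    (1, 5, "2000-01-01", "2001-12-31"), (1, 6, "2002-01-01", "2003-12-31"),
    (1, 5, "2004-01-01", "2005-12-31"), (2, 7, "2010-01-01", "2011-12-31"),
    (2, 7, "2012-01-01", "2013-12-31"), (2, 5, "2014-01-01", "2015-12-31"),
    (3, 7, "2010-01-01", "2011-12-31"), (3, 7, "2012-01-05", "2013-12-31"),
    (3, 5, "2014-01-01", "2015-12-31"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (customer_id integer, category integer, from_date text, to_date text)"
)
conn.executemany("insert into t values (?, ?, ?, ?)", rows)

# date(prev_to_date, '+1 day') lets a period that starts the day after the
# previous one ends count as a continuation instead of a new island.
result = conn.execute("""
    select customer_id, min(from_date), max(to_date), category
    from (select t.*,
                 sum(case when date(prev_to_date, '+1 day') >= from_date
                          then 0 else 1 end)
                     over (partition by customer_id, category
                           order by from_date) as grp
          from (select t.*,
                       max(to_date) over (partition by customer_id, category
                                          order by from_date
                                          rows between unbounded preceding
                                                   and 1 preceding) as prev_to_date
                from t) t
         ) t
    group by customer_id, category, grp
    order by customer_id, min(from_date)
""").fetchall()

for row in result:
    print(row)
```

With the tolerance in place, customer 2's category 7 rows merge into one period while customer 3's stay separate because of the four-day gap.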

Your attempt is quite close. You just need to fix the over() clause of the window functions:
select customer_id, category, min(from_date), max(to_date)
from (
select
fs.*,
row_number() over (partition by customer_id order by from_date)
- row_number() over (partition by customer_id, category order by from_date) as grp
from ffs_samp fs
) x
group by customer_id, category, grp
order by customer_id, min(from_date)
Note that this method assumes no gaps or overlaps in the periods of a given customer, as shown in your sample data.

display only records with continuous coverage

I have sample data as below. Even though there are duplicates in the data, I want the result shown below; duplicates can be ignored.
Sample data
123456 2019-03-01 2199-12-31
123456 2019-03-01 2019-12-31
123456 2019-03-01 2199-12-31
123456 2020-01-01 2199-12-31
123456 1920-01-01 2019-02-28
Output is required as below
123456 1920-01-01 2019-02-28
123456 2019-03-01 2199-12-31
Can someone please help me write SQL that displays the records with continuous coverage, with the end date as 2199/12/31?

You can solve this by unpivoting the data and keeping track of periods when the net number is greater than 0:
with t as (
select id, start as dt, 1 as inc
from <table> t
union all
select id, end, -1 as inc
from <table> t
)
select id, min(dt), max(next_dt)
from (select t.*,
sum(case when net_ins = 0 then 1 else 0 end) over (partition by id order by dt) as grp,
lead(dt) over (partition by id order by dt) as next_dt
from (select id, dt,
sum(sum(inc)) over (partition by id order by dt) as net_ins
from t
group by id, dt
) t
) t
where net_ins > 0
group by id, grp;
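The unpivot-and-count approach can be traced on the question's sample rows. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module, the table name coverage and the column names start_date and end_date are hypothetical, and the nested sum(sum(inc)) is split into separate CTE steps for readability.

```python
import sqlite3

# Sample rows from the question; coverage, start_date and end_date are hypothetical names.
rows = [
    (123456, "2019-03-01", "2199-12-31"),
    (123456, "2019-03-01", "2019-12-31"),
    (123456, "2019-03-01", "2199-12-31"),
    (123456, "2020-01-01", "2199-12-31"),
    (123456, "1920-01-01", "2019-02-28"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table coverage (id integer, start_date text, end_date text)")
conn.executemany("insert into coverage values (?, ?, ?)", rows)

result = conn.execute("""
    with events as (          -- +1 when a period opens, -1 when it closes
        select id, start_date as dt, 1 as inc from coverage
        union all
        select id, end_date, -1 from coverage
    ),
    per_day as (              -- net change per date
        select id, dt, sum(inc) as delta from events group by id, dt
    ),
    running as (              -- number of periods still open after each date
        select id, dt,
               sum(delta) over (partition by id order by dt) as net_ins
        from per_day
    ),
    marked as (               -- every return to zero closes a group
        select id, dt, net_ins,
               sum(case when net_ins = 0 then 1 else 0 end)
                   over (partition by id order by dt) as grp,
               lead(dt) over (partition by id order by dt) as next_dt
        from running
    )
    select id, min(dt), max(next_dt)
    from marked
    where net_ins > 0
    group by id, grp
    order by min(dt)
""").fetchall()

print(result)
```

On this data the open-period count returns to zero on 2019-02-28 and again on 2199-12-31, which yields the two rows the question asks for.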

Retrieve rows for time interval but also previous row of each - how to?

I have a table like this:
Id FKId Amount1 Amount2 Date
-----------------------------------------------------
1 1 100,0000 33,0000 2018-01-18 19:57:39.403
2 2 50,0000 10,0000 2018-01-19 19:57:57.097
3 1 130,0000 40,0000 2018-01-20 19:58:13.660
5 2 44,0000 2,0000 2018-01-21 11:11:00.000
How can I get rows 3 and 5 (all rows dated 2018-01-20 or 2018-01-21) along with the previous row for each FKId (rows 1 and 2)?
Thank you

In most databases, you can use the ANSI standard lead() function:
select t.*
from (select t.*, lead(date) over (partition by fkid order by date) as next_date
from t
) t
where date in ('2018-01-20', '2018-01-21') or
next_date in ('2018-01-20', '2018-01-21');
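A small sketch of the lead() filter on the question's rows, run on SQLite through Python's sqlite3 module. Several things here are assumptions of the sketch: the column is called dt, the amounts are omitted, the timestamps are truncated with SQLite's date() because the sample values carry a time of day, and the row with Id 0 is a hypothetical extra row added only to show that older rows are filtered out.

```python
import sqlite3

# Rows from the question (amount columns omitted), plus one hypothetical
# earlier row (Id 0) that should not be returned.
rows = [
    (0, 1, "2018-01-10 08:00:00.000"),  # hypothetical, not in the question
    (1, 1, "2018-01-18 19:57:39.403"),
    (2, 2, "2018-01-19 19:57:57.097"),
    (3, 1, "2018-01-20 19:58:13.660"),
    (5, 2, "2018-01-21 11:11:00.000"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, fkid integer, dt text)")
conn.executemany("insert into t values (?, ?, ?)", rows)

# A row is kept when it falls on a target day itself, or when the next row
# of the same fkid does (which makes it the "previous row").
selected = conn.execute("""
    select id, fkid, dt
    from (select t.*, lead(dt) over (partition by fkid order by dt) as next_dt
          from t) t
    where date(dt) in ('2018-01-20', '2018-01-21')
       or date(next_dt) in ('2018-01-20', '2018-01-21')
    order by id
""").fetchall()

print([row[0] for row in selected])
```

Rows 3 and 5 match on their own date, rows 1 and 2 match through next_dt, and the hypothetical row 0 is dropped because its next row is dated 2018-01-18.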
Alternatively, if you just want all records where the date is bigger than some date and the previous record, this logic also works:
select t.*
from t
where t.date >= (select max(t2.date)
from t t2
where t2.fkid = t.fkid and t2.date < '2018-01-20'
);
