Finding gaps between date ranges spanning records - sql

I'm trying to write a query where I can find any gap in the date ranges for a given ID when passing in two dates.
EDIT: I need to know if a whole gap or part of a gap exists in my date range.
I have data in this format:
Example 1:
| ID | START_DATE | END_DATE |
|----|------------|------------|
| 1 | 01/01/2019 | 30/09/2019 |
| 1 | 01/03/2020 | (null) |
Example 2:
| ID | START_DATE | END_DATE |
|----|------------|------------|
| 2 | 01/01/2019 | 30/09/2019 |
| 2 | 01/10/2019 | 01/12/2019 |
| 2 | 02/12/2019 | (null) |
NB. A null end date essentially means "still active up to current day".
E.g. Example 1 has a gap of 152 days between 30/09/2019 and 01/03/2020. If I queried in the range of 05/05/2019 - 01/09/2019 there's no gap in that range. Whereas if I'm looking at the date range 05/05/2019 - 02/10/2019 there's a single day gap in that range.
For what it's worth, I don't actually care how many days gap, just whether there is one or not.
I've tried doing something like this but it doesn't work when my date falls into a gap:
SELECT SUM(START_DATE - PREV_END_DATE - 1)
FROM
(
SELECT ID, START_DATE, END_DATE, LAG(END_DATE) OVER (ORDER BY START_DATE) AS PREV_END_DATE
FROM TBL
WHERE ID = X_ID
)
WHERE START_DATE >= Y_FIRST_DATE
AND START_DATE <= Z_SECOND_DATE;
X_ID, Y_FIRST_DATE, and Z_SECOND_DATE are just any different ID or date range I might want to pass in.
How could I go about this?

Another option is to generate the individual days with the SELECT .. FROM dual CONNECT BY LEVEL <= syntax. The EXISTS subquery INTERSECTs two sets: one holds every date between the two parameter dates, the other holds every date covered by a row's START_DATE and END_DATE bounds. Rows that share at least one day with the range are kept, and the days they cover inside the range are summed and compared with the length of the range:
SELECT CASE WHEN
SUM( 1 + LEAST(Z_SECOND_DATE,NVL(END_DATE,TRUNC(SYSDATE)))
- GREATEST(Y_FIRST_DATE,START_DATE) ) = Z_SECOND_DATE - Y_FIRST_DATE + 1 THEN
'NO Gap'
ELSE
'Gap Exists'
END "gap?"
FROM TBL t
WHERE ID = X_ID
AND EXISTS ( SELECT Y_FIRST_DATE + LEVEL - 1
FROM dual
CONNECT BY LEVEL <= Z_SECOND_DATE - Y_FIRST_DATE + 1
INTERSECT
SELECT t.START_DATE + LEVEL - 1
FROM dual
CONNECT BY LEVEL <= NVL(t.END_DATE,TRUNC(SYSDATE))- t.START_DATE + 1
)
START_DATE values are assumed to be non-null based on the sample data.

This is another variation of the islands-and-gaps problem that pops up a lot here. I think this fits with Oracle's pattern matching functionality. Take this example:
WITH tbl AS
(
SELECT 1 AS ID, to_date('01/01/2019', 'DD/MM/YYYY') AS START_DATE, to_date('30/09/2019', 'DD/MM/YYYY') AS END_DATE FROM DUAL
UNION ALL
SELECT 1 AS ID, to_date('01/03/2020', 'DD/MM/YYYY') AS START_DATE, NULL AS END_DATE FROM DUAL
UNION ALL
SELECT 2 AS ID, to_date('01/01/2019', 'DD/MM/YYYY') AS START_DATE, to_date('30/09/2019', 'DD/MM/YYYY') AS END_DATE FROM DUAL
UNION ALL
SELECT 2 AS ID, to_date('01/10/2019', 'DD/MM/YYYY') AS START_DATE, to_date('01/12/2019', 'DD/MM/YYYY') AS END_DATE FROM DUAL
UNION ALL
SELECT 2 AS ID, to_date('02/12/2019', 'DD/MM/YYYY') AS START_DATE, NULL AS END_DATE FROM DUAL
)
SELECT *
FROM tbl
MATCH_RECOGNIZE(ORDER BY ID, start_date
MEASURES b.id AS ID,
a.end_date+1 AS GAP_START,
b.start_date-1 AS GAP_END
PATTERN (A B+)
DEFINE B AS start_date > PREV(end_date)+1 AND ID = PREV(ID));
I know it looks long, but most of it is creating the WITH clause. The pattern matching allows you to define what a gap is and pull the information accordingly. Notice that in order to have a gap, your start date must be greater than the previous end date + 1 grouped by the ID column.
To enhance this to answer your updated/edited question, just add this line of code to the end:
WHERE GREATEST(gap_start, TO_DATE('15/09/2019', 'DD/MM/YYYY' /*Y_FIRST_DATE*/)) <= LEAST(gap_end, to_date('15/10/2019', 'DD/MM/YYYY')/*Z_SECOND_DATE*/)
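The two ideas in this answer (a gap exists where a row starts more than one day after the previous row ends, and a gap is relevant when it overlaps the queried range) can also be sketched outside the database. The Python below is only an illustration, assuming the rows for one ID do not overlap each other. The names `find_gaps` and `gap_in_window` are hypothetical, and a `None` end date is treated as "active up to today", as stated in the question.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def find_gaps(rows, today=None):
    """Return (gap_start, gap_end) pairs between consecutive rows.

    rows: list of (start_date, end_date) for a single ID; end_date may be
    None, which is treated as 'still active' (assumption from the question).
    """
    today = today or date.today()
    ordered = sorted(rows, key=lambda r: r[0])
    gaps = []
    for (_, prev_end), (start, _) in zip(ordered, ordered[1:]):
        prev_end = prev_end or today
        # Same test as the DEFINE clause: start_date > PREV(end_date) + 1
        if start > prev_end + ONE_DAY:
            gaps.append((prev_end + ONE_DAY, start - ONE_DAY))
    return gaps

def gap_in_window(rows, first, second, today=None):
    """True if any gap overlaps the inclusive window [first, second]."""
    return any(max(g_start, first) <= min(g_end, second)
               for g_start, g_end in find_gaps(rows, today))

# Sample rows transcribed from the question
example_1 = [(date(2019, 1, 1), date(2019, 9, 30)), (date(2020, 3, 1), None)]
example_2 = [(date(2019, 1, 1), date(2019, 9, 30)),
             (date(2019, 10, 1), date(2019, 12, 1)),
             (date(2019, 12, 2), None)]
```

With Example 1, the window 05/05/2019 to 01/09/2019 reports no gap and the window 05/05/2019 to 02/10/2019 reports one. Like the query, this only considers gaps between records, not a window that starts before the first record.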

You can split the date range you pass in into individual dates and then compare each date with the date ranges in your table, as follows:
SELECT
CASE WHEN SUM(CASE WHEN T.ID IS NULL THEN 1 END) > 0
THEN 'THERE IS GAP'
ELSE 'THERE IS NO GAP'
END AS RESULT_
FROM ( SELECT P_IN_FROM_DATE + LEVEL - 1 AS CUST_DATES
FROM DUAL
CONNECT BY LEVEL <= P_IN_TO_DATE - P_IN_FROM_DATE + 1
) CUST_TBL
LEFT JOIN TBL T
ON T.ID = X_ID -- only compare against the ID being checked (X_ID as in the question)
AND ( CUST_TBL.CUST_DATES BETWEEN T.START_DATE AND T.END_DATE
OR ( CUST_TBL.CUST_DATES >= T.START_DATE AND T.END_DATE IS NULL ) )
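The same day-by-day idea, sketched in Python for readers who want to check the logic on the sample data. This is a minimal illustration, not a replacement for the query: `has_gap` is a hypothetical name, the rows are assumed to belong to a single ID, and a `None` end date is treated as open-ended, matching the `END_DATE IS NULL` branch of the join.

```python
from datetime import date, timedelta

def has_gap(rows, first, second):
    """True if any day in the inclusive range [first, second] is uncovered.

    rows: list of (start_date, end_date) for one ID; end_date None means
    the row is still active (no upper bound).
    """
    day = first
    while day <= second:
        covered = any(start <= day and (end is None or day <= end)
                      for start, end in rows)
        if not covered:
            return True  # at least one day has no matching row
        day += timedelta(days=1)
    return False

# Sample rows transcribed from the question
example_1 = [(date(2019, 1, 1), date(2019, 9, 30)), (date(2020, 3, 1), None)]
example_2 = [(date(2019, 1, 1), date(2019, 9, 30)),
             (date(2019, 10, 1), date(2019, 12, 1)),
             (date(2019, 12, 2), None)]
```

The cost grows with the number of days in the range, so this suits short ranges better than multi-year ones.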

I would suggest finding the maximum end date before the current record -- based on the start date.
That would be:
select t.*
from (select t.*,
max(end_date) over (order by start_date
rows between unbounded preceding and 1 preceding
) as max_prev_end_date
from tbl t
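where start_date <= :input_end_date and
(end_date >= :input_start_date or end_date is null) -- null end_date = still active
) t
where max_prev_end_date + 1 < start_date; -- consecutive days (30/09, 01/10) are not a gap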


Converting PostgreSQL recursive CTE to SQL Server

I'm having trouble adapting some recursive CTE code from PostgreSQL to SQL Server, from the book "Fighting Churn with Data".
This is the working PostgreSQL code:
with recursive
active_period_params as (
select interval '30 days' as allowed_gap,
'2021-09-30'::date as calc_date
),
active as (
-- anchor
select distinct account_id, min(start_date) as start_date
from subscription inner join active_period_params
on start_date <= calc_date
and (end_date > calc_date or end_date is null)
group by account_id
UNION
-- recursive
select s.account_id, s.start_date
from subscription s
cross join active_period_params
inner join active e on s.account_id=e.account_id
and s.start_date < e.start_date
and s.end_date >= (e.start_date-allowed_gap)::date
)
select account_id, min(start_date) as start_date
from active
group by account_id
This is my attempt at converting to SQL Server. It gets stuck in a loop. I believe the issue has to do with the UNION ALL required by SQL Server.
with
active_period_params as (
select 30 as allowed_gap,
cast('2021-09-30' as date) as calc_date
),
active as (
-- anchor
select distinct account_id, min(start_date) as start_date
from subscription inner join active_period_params
on start_date <= calc_date
and (end_date > calc_date or end_date is null)
group by account_id
UNION ALL
-- recursive
select s.account_id, s.start_date
from subscription s
cross join active_period_params
inner join active e on s.account_id=e.account_id
and s.start_date < e.start_date
and s.end_date >= dateadd(day, -allowed_gap, e.start_date)
)
select account_id, min(start_date) as start_date
from active
group by account_id
The subscription table is a list of subscriptions belonging to customers. A customer can have multiple subscriptions with overlapping dates or gaps between dates. null end_date means the subscription is currently active and has no defined end_date. Example data for a single customer (account_id = 15) below:
subscription
---------------------------------------------------
| id | account_id | start_date | end_date |
---------------------------------------------------
| 6 | 15 | 01/06/2021 | null |
| 5 | 15 | 01/01/2021 | null |
| 4 | 15 | 01/06/2020 | 01/02/2021 |
| 3 | 15 | 01/04/2020 | 15/05/2020 |
| 2 | 15 | 01/03/2020 | 15/05/2020 |
| 1 | 15 | 01/06/2019 | 01/01/2020 |
Expected query result (as produced by PostgreSQL code):
------------------------------
| account_id | start_date |
------------------------------
| 15 | 01/03/2020 |
Issue:
The SQL Server code above gets stuck in a loop and doesn't produce a result.
Description of the PostgreSQL code:
1. The anchor block finds subs that are active as at the calc_date (30/09/2021) (id 5 & 6), and returns the min start_date (01/01/2021).
2. The recursion block then looks for any earlier subs that existed within the allowed_gap, which is 30 days prior to the min start_date found in 1). id 4 meets this criterion, so the new min start_date is 01/06/2020.
3. Recursion repeats and finds two subs within the allowed_gap (01/06/2020 - 30 days). Of these subs (id 2 & 3), the new min start_date is 01/03/2020.
4. Recursion fails to find an earlier sub within the allowed_gap (01/03/2020 - 30 days).
5. The query returns a start date of 01/03/2020 for account_id 15.
Any help appreciated!
It seems the issue is related to the way SQL Server deals with recursive CTEs.
This is a type of gaps-and-islands problem, and does not actually require recursion.
There are a number of solutions, here is one. Given your requirement, there may be more efficient methods, but this should get you started.
Using LAG we identify rows which are within the specified gap of the next row
We use a running COUNT to give each consecutive set of rows an ID
We group by that ID, and take the minimum start_date, filtering out non-qualifying groups
Group again to get the minimum per account
DECLARE @allowed_gap int = 30,
@calc_date datetime = cast('2021-09-30' as date);
WITH PrevValues AS (
SELECT *,
IsStart = CASE WHEN ISNULL(LAG(end_date) OVER (PARTITION BY account_id
ORDER BY start_date), '2099-01-01') < DATEADD(day, -@allowed_gap, start_date)
THEN 1 END
FROM subscription
),
Groups AS (
SELECT *,
GroupId = COUNT(IsStart) OVER (PARTITION BY account_id
ORDER BY start_date ROWS UNBOUNDED PRECEDING)
FROM PrevValues
),
ByGroup AS (
SELECT
account_id,
GroupId,
start_date = MIN(start_date)
FROM Groups
GROUP BY account_id, GroupId
HAVING COUNT(CASE WHEN start_date <= @calc_date
and (end_date > @calc_date or end_date is null) THEN 1 END) > 0
)
SELECT
account_id,
start_date = MIN(start_date)
FROM ByGroup
GROUP BY account_id;
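To see the grouping logic without the window functions, here is a rough Python sketch of the same steps for a single account. It is an illustration under assumptions, not the query itself: `earliest_active_start` is a hypothetical name, and, like the LAG-based query, each row is compared only with the row immediately before it in start_date order.

```python
from datetime import date, timedelta

def earliest_active_start(subs, calc_date, allowed_gap_days=30):
    """Earliest start_date of the run of subscriptions active at calc_date.

    subs: list of (start_date, end_date) for one account; end_date None
    means the subscription is still active.
    """
    gap = timedelta(days=allowed_gap_days)
    islands = []
    prev_end = None
    for start, end in sorted(subs, key=lambda s: s[0]):
        # A new island starts when the previous row ended more than `gap`
        # before this row starts; an open previous end never starts one.
        starts_new = prev_end is not None and prev_end < start - gap
        if starts_new or not islands:
            islands.append([])
        islands[-1].append((start, end))
        prev_end = end
    # Keep only islands containing a subscription active at calc_date
    active = [
        min(s for s, _ in island)
        for island in islands
        if any(s <= calc_date and (e is None or e > calc_date)
               for s, e in island)
    ]
    return min(active) if active else None

# Sample rows for account_id 15, transcribed from the question
account_15 = [
    (date(2021, 6, 1), None),
    (date(2021, 1, 1), None),
    (date(2020, 6, 1), date(2021, 2, 1)),
    (date(2020, 4, 1), date(2020, 5, 15)),
    (date(2020, 3, 1), date(2020, 5, 15)),
    (date(2019, 6, 1), date(2020, 1, 1)),
]
```

For the sample account and a calc_date of 30/09/2021 this returns 01/03/2020, the expected result given in the question.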

repeat a sequence from a lookup table across a set number of records oracle sql

case:
the user is required to select a start and end date for a specific period. they are also required to select a sequence, and where in that sequence they would like the cycle to start (the sequence is stored in a lookup table; an example of a stored sequence is shown below under sequence lookup).
user input parameters:
start date : 01-jan-2021
end date : 14-jan-2021
sequence_name : 1-5
start sequence at : 4
after the user inputs the parameters, the system will
list all dates between 01-jan-2021 and 14-jan-2021 (shown below in the example), then it will start to map the sequence to the dates, starting with the sequence number that was input, in this case 4 (shown in the example below).
when the system reaches the end of the sequence (in this case the end is 5) it will restart the sequence from 1, because that was the start of the sequence in the lookup.
in the example image below it shows what the results should look like.
thank you for your help!
i prefer to write it in sql but if its not possible in sql then plsql is also fine.
You haven't really explained how you get from your 'sequence name' to a range of values, so I'll assume you already have that part, and will work from a date range and a sequence range, which can be provided for simplicity as a CTE:
with input (start_date, end_date, start_seq, end_seq, start_at) as (
select date '2021-01-01', date '2021-01-14', 1, 5, 4 from dual
)
select * from input
You tagged the question with Oracle 11g. If that is 11gR2 then you can use a recursive CTE to generate the result from that simulated input data:
with input (start_date, end_date, start_seq, end_seq, start_at) as (
select date '2021-01-01', date '2021-01-14', 1, 5, 4 from dual
),
rcte (dt, seq, end_date, start_seq, end_seq) as (
select start_date, start_at, end_date, start_seq, end_seq
from input
union all
select dt + 1, case when seq = end_seq then start_seq else seq + 1 end,
end_date, start_seq, end_seq
from rcte
where dt < end_date
)
select dt, seq
from rcte
order by dt;
The anchor member uses the start date and start-at value, and keeps the other information needed later. The recursive member increments both, wrapping the seq value at the top of that range. This gives the result:
DT | SEQ
:-------- | --:
01-JAN-21 | 4
02-JAN-21 | 5
03-JAN-21 | 1
04-JAN-21 | 2
05-JAN-21 | 3
06-JAN-21 | 4
07-JAN-21 | 5
08-JAN-21 | 1
09-JAN-21 | 2
10-JAN-21 | 3
11-JAN-21 | 4
12-JAN-21 | 5
13-JAN-21 | 1
14-JAN-21 | 2
On earlier versions, or just if you prefer it, you can use a hierarchical query, which looks shorter but I think it's a bit less intuitive:
with input (start_date, end_date, start_seq, end_seq, start_at) as (
select date '2021-01-01', date '2021-01-14', 1, 5, 4 from dual
)
select start_date + level - 1 as dt,
mod(level - 1 + start_at - start_seq, end_seq - start_seq + 1) + start_seq as seq
from input
connect by level <= end_date - start_date + 1
order by dt;
db<>fiddle showing both approaches.
And a second db<>fiddle showing a different sequence range and starting point.
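The modular arithmetic in the hierarchical query can be checked with a few lines of Python. This is just a sketch of the formula, with `map_sequence` as a hypothetical name; `k` plays the role of `level - 1`.

```python
from datetime import date, timedelta

def map_sequence(start_date, end_date, start_seq, end_seq, start_at):
    """Pair each date in the inclusive range with a cycling sequence value."""
    span = end_seq - start_seq + 1            # number of values in the cycle
    days = (end_date - start_date).days + 1   # number of dates to generate
    return [
        (start_date + timedelta(days=k),
         # offset from the first value, wrapped to the cycle length
         (k + start_at - start_seq) % span + start_seq)
        for k in range(days)
    ]

rows = map_sequence(date(2021, 1, 1), date(2021, 1, 14), 1, 5, 4)
```

Subtracting `start_seq` before the modulo and adding it back afterwards is what lets the cycle start at a value other than 1.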

Returning all dates in dd/mm/yy format from the past 6 months

I am looking to return all days from the past 6 months.
Per example:
Column1
-------
01-OCT-18
30-SEP-18
29-SEP-18
........
01-APR-18
@TimBiegeleisen - Your solution pointed me in the right direction, so you get the points.
@MT0 - "ADD_MONTHS" as far as I know is not used in T-SQL, so I believe the clarification was necessary. But thank you for the pointer about the updates; I will refrain from doing that in the future.
We can compare each date in Column1 against SYSDATE, 6 months earlier, and then display the dates in the format you want using TO_CHAR with an appropriate format mask:
SELECT
TO_CHAR(Column1, 'dd/mm/yy') AS output
FROM yourTable
WHERE
Column1 >= ADD_MONTHS(SYSDATE, -6);
This will get you all the dates (in the format in your example) from the last 6 months:
Query 1:
SELECT TO_CHAR( SYSDATE - LEVEL + 1, 'DD-MON-RR' ) AS Column1
FROM DUAL
CONNECT BY SYSDATE - LEVEL + 1 >= ADD_MONTHS( SYSDATE, -6 )
Results:
| COLUMN1 |
|-----------|
| 11-OCT-18 |
| 10-OCT-18 |
| 09-OCT-18 |
...
| 13-APR-18 |
| 12-APR-18 |
| 11-APR-18 |
Update
the idea is to produce a list of days from the past 6 months and a count of how many times a particular value has been recorded against each date
Oracle 11g R2 Schema Setup:
Create an example table with multiple rows for various days:
CREATE TABLE table_name ( value ) AS
SELECT TRUNC( SYSDATE ) - 0 FROM DUAL CONNECT BY LEVEL <= 5
UNION ALL SELECT TRUNC( SYSDATE ) - 1 FROM DUAL CONNECT BY LEVEL <= 3
UNION ALL SELECT TRUNC( SYSDATE ) - 2 FROM DUAL CONNECT BY LEVEL <= 7
UNION ALL SELECT TRUNC( SYSDATE ) - 3 FROM DUAL CONNECT BY LEVEL <= 2
UNION ALL SELECT TRUNC( SYSDATE ) - 4 FROM DUAL CONNECT BY LEVEL <= 1
Query 1:
SELECT TO_CHAR( c.Column1, 'DD-MON-RR' ) AS Column1,
COUNT( t.value ) AS num_values_per_day
FROM (
SELECT TRUNC( SYSDATE ) - LEVEL + 1 AS Column1
FROM DUAL
CONNECT BY TRUNC( SYSDATE ) - LEVEL + 1 >= ADD_MONTHS( SYSDATE, -6 )
) c
LEFT OUTER JOIN table_name t
ON ( c.column1 = t.value )
GROUP BY c.Column1
ORDER BY c.Column1 DESC
Results:
| COLUMN1 | NUM_VALUES_PER_DAY |
|-----------|--------------------|
| 11-OCT-18 | 5 |
| 10-OCT-18 | 3 |
| 09-OCT-18 | 7 |
| 08-OCT-18 | 2 |
| 07-OCT-18 | 1 |
| 06-OCT-18 | 0 |
| 05-OCT-18 | 0 |
...
| 14-APR-18 | 0 |
| 13-APR-18 | 0 |
| 12-APR-18 | 0 |
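The calendar-plus-left-join idea (generate every day first, then count matches so that empty days still show 0) can be sketched in Python as follows. The names `months_back` and `daily_counts` are hypothetical, and `months_back` only approximates Oracle's ADD_MONTHS by clamping the day of month, so treat it as an assumption of the sketch. It also works on whole dates, so the boundary day six months back is included.

```python
import calendar
from collections import Counter
from datetime import date, timedelta

def months_back(d, months):
    """Shift d back by whole months, clamping the day to the month's length.

    Approximation of ADD_MONTHS(d, -months); Oracle's own end-of-month
    rule is not reproduced here.
    """
    index = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))

def daily_counts(values, today, months=6):
    """List (day, count) from today back to `months` months ago, newest first."""
    counts = Counter(values)
    first = months_back(today, months)
    out = []
    day = today
    while day >= first:
        out.append((day, counts[day]))  # Counter gives 0 for missing days
        day -= timedelta(days=1)
    return out

# Same shape as the example table: 5, 3, 7, 2 and 1 rows on the latest days
today = date(2018, 10, 11)
values = ([today] * 5 + [today - timedelta(days=1)] * 3
          + [today - timedelta(days=2)] * 7 + [today - timedelta(days=3)] * 2
          + [today - timedelta(days=4)] * 1)
result = daily_counts(values, today)
```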
So for the additional task provided in your comment you might want to adjust Tim Biegeleisen's solution a little bit:
SELECT TRUNC(Column1) AS "Day"
, count(*) as "Count"
FROM yourTable
WHERE Column1 >= ADD_MONTHS(SYSDATE, -6)
GROUP BY TRUNC(Column1);
I will add more to it, but this was the starting point I needed. I was overcomplicating the query by trying to use "CONNECT BY LEVEL"; the query I had before worked fine for dates in the future but would not return anything earlier than SYSDATE. (I will play around a bit more with the above to figure out how it works, but for the time being I know enough.)
Thanks for the answer, I was able to figure out the solution for what I wanted via the following:
SELECT col1
FROM table1
WHERE col1 >= add_months(sysdate,-6)

Oracle SQL overlap between begin date and end date in 2 or more records

Database my_table:
id seq start_date end_date
1 1 01-01-2017 02-01-2017
1 2 07-01-2017 09-01-2017
1 3 11-01-2017 11-01-2017
2 1 20-01-2017 20-01-2017
3 1 01-02-2017 02-02-2017
3 2 03-02-2017 04-02-2017
3 3 08-02-2017 09-02-2017
3 4 09-02-2017 10-02-2017
3 5 10-02-2017 12-02-2017
My requirement is to get the first date (normally seq 1 start date) and end date (normally last seq end date) and the number of dates occurred during all seq for each unique ID.
Date occurred:
id      1            2            3
        01-01-2017   20-01-2017   01-02-2017
        02-01-2017                02-02-2017
        07-01-2017                03-02-2017
        08-01-2017                04-02-2017
        09-01-2017                08-02-2017
        11-01-2017                09-02-2017
                                  10-02-2017
                                  11-02-2017
                                  12-02-2017
total   6            1            9
Here is the result I want:
id start_date end_date num_date
1 01-01-2017 11-01-2017 6
2 20-01-2017 20-01-2017 1
3 01-02-2017 12-02-2017 9
I have tried
SELECT id
, MIN(start_date)
, MAX(end_date)
, SUM(end_date - start_date + 1)
FROM my_table
GROUP BY id
This SQL statement works fine for id 1 and 2, since there is no overlap between their date ranges. But for id 3, the resulting num_date is 11. Could you please suggest a SQL statement to solve this problem? Thank you.
One more question: the dates in the database are stored in datetime format. How do I convert them to dates? I tried using the TRUNC function, but it sometimes converts the date to the previous day instead.
You need to count how many times an end_date equals the following start_date. For this you need to use the lag() or the lead() analytic function. You can use a case expression for the comparison, but alas you can't wrap the case expression within a COUNT or SUM in the same query; you need a subquery and an outer query.
Something like this; not tested, since you didn't provide CREATE TABLE and INSERT statements to recreate your sample data.
select id, min(start_date) as start_date, max(end_date) as end_date,
sum(end_date - start_date + 1 - flag) as num_days
from ( select id, start_date, end_date,
case when start_date = lag(end_date)
over (partition by id order by end_date) then 1
else 0 end as flag
from my_table
)
group by id;
SELECT id,
MIN( start_date ) AS start_date,
MAX( end_date ) AS end_date,
SUM( end_date - start_date + 1 ) AS num_days
FROM (
SELECT id,
GREATEST(
start_date,
COALESCE(
LAG( end_date ) OVER ( PARTITION BY id ORDER BY seq ) + 1,
start_date
)
) AS start_date,
end_date
FROM your_table
)
WHERE start_date <= end_date
GROUP BY id;
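For checking the expected numbers by hand, the clipping idea can be sketched in Python. Note this is a variant rather than a translation of either query: it sorts by start date and tracks the latest end date seen so far, instead of looking only at the previous row. `summarise` is a hypothetical name, and the id 3 rows are taken with February start dates, which is what the expected output in the question implies.

```python
from datetime import date, timedelta

def summarise(rows):
    """rows: list of (start_date, end_date), both inclusive, for one id.

    Returns (first_start, last_end, number_of_distinct_days).
    """
    days = 0
    latest_end = None  # latest end date seen so far
    for start, end in sorted(rows):
        if latest_end is not None and start <= latest_end:
            # Skip the days that an earlier range already counted
            start = latest_end + timedelta(days=1)
        if start <= end:
            days += (end - start).days + 1
        if latest_end is None or end > latest_end:
            latest_end = end
    return min(s for s, _ in rows), max(e for _, e in rows), days

data = {
    1: [(date(2017, 1, 1), date(2017, 1, 2)),
        (date(2017, 1, 7), date(2017, 1, 9)),
        (date(2017, 1, 11), date(2017, 1, 11))],
    2: [(date(2017, 1, 20), date(2017, 1, 20))],
    3: [(date(2017, 2, 1), date(2017, 2, 2)),
        (date(2017, 2, 3), date(2017, 2, 4)),
        (date(2017, 2, 8), date(2017, 2, 9)),
        (date(2017, 2, 9), date(2017, 2, 10)),
        (date(2017, 2, 10), date(2017, 2, 12))],
}
summary = {key: summarise(rows) for key, rows in data.items()}
```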

Using the model clause to expand dates

I have several different types of data involving date range that I want to merge together, but at the same time broken down by day. So a 3 day piece of data would result in three rows:
start primary_key
start+1 primary_key
start+2 primary_key
I've been playing around using the model clause of the select statement in 10g and was looking for the best way to achieve this. Currently I'm joining a range of dates that covers the full range of possible dates (select min(start date), max(end date)). I'd prefer to be selecting the data and adding in more rows to transform it to a per day dataset.
edit:
I've managed to come up with (now includes sample data):
SELECT * FROM (
SELECT 123 req_code,
345 req_par_code,
TO_DATE('01-03-2010', 'dd-mm-yyyy') req_start_date,
TO_DATE('05-03-2010', 'dd-mm-yyyy') req_end_date
FROM dual
)
MODEL
PARTITION BY (req_code)
DIMENSION BY (0 d)
MEASURES (SYSDATE dt, req_par_code, req_start_date, req_end_date)
RULES ITERATE(365) UNTIL (dt[iteration_number] >= TRUNC(req_end_date[0])) (
dt[iteration_number] = NVL(dt[iteration_number-1] + 1, TRUNC(req_start_date[0])),
--Copy data across
req_par_code[ iteration_number ] = req_par_code[0],
req_start_date[ iteration_number ] = req_start_date[0],
req_end_date[ iteration_number ] = req_end_date[0]
)
ORDER BY dt, req_code;
You can use the MODEL clause to generate rows. Here's a small example:
SQL> SELECT * FROM t_data;
PK START_DATE END_DATE
---------- ----------- -----------
1 20/01/2010 20/01/2010
2 21/01/2010 23/01/2010
3 24/01/2010 27/01/2010
SQL> SELECT pk, start_date, end_date FROM t_data
MODEL
PARTITION BY (pk)
DIMENSION BY (0 AS i)
MEASURES(start_date, end_date)
RULES
( start_date[FOR i
FROM 1 TO end_date[0]-start_date[0]
INCREMENT 1] = start_date[0] + cv(i),
end_date[ANY] = start_date[CV()] + 1
)
ORDER BY 1,2;
PK START_DATE END_DATE
---------- ----------- -----------
1 20/01/2010 21/01/2010
2 21/01/2010 22/01/2010
2 22/01/2010 23/01/2010
2 23/01/2010 24/01/2010
3 24/01/2010 25/01/2010
3 25/01/2010 26/01/2010
3 26/01/2010 27/01/2010
3 27/01/2010 28/01/2010
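What the row generation amounts to, one output row per day of each range, can be sketched in a few lines of Python. This is only meant to illustrate the expected shape of the result; `expand` is a hypothetical name and the rows are copied from the t_data listing above.

```python
from datetime import date, timedelta

def expand(rows):
    """Yield (pk, day) for every day from start_date to end_date inclusive."""
    for pk, start, end in rows:
        for offset in range((end - start).days + 1):
            yield pk, start + timedelta(days=offset)

t_data = [
    (1, date(2010, 1, 20), date(2010, 1, 20)),
    (2, date(2010, 1, 21), date(2010, 1, 23)),
    (3, date(2010, 1, 24), date(2010, 1, 27)),
]
per_day = list(expand(t_data))
```

The three ranges expand to 1, 3 and 4 rows, the same eight rows as the MODEL output.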
SELECT TO_DATE('01.01.2009', 'dd.mm.yyyy') + level - 1
FROM dual
CONNECT BY
TO_DATE('01.01.2009', 'dd.mm.yyyy') + level <= TRUNC(SYSDATE, 'DDD') + 1
will give you the list of all dates from Jan 1st, 2009 till today.