Oracle SQL overlap between begin date and end date in 2 or more records - sql

Database my_table:
id seq start_date end_date
1 1 01-01-2017 02-01-2017
1 2 07-01-2017 09-01-2017
1 3 11-01-2017 11-01-2017
2 1 20-01-2017 20-01-2017
3 1 01-02-2017 02-02-2017
3 2 03-02-2017 04-02-2017
3 3 08-01-2017 09-02-2017
3 4 09-01-2017 10-02-2017
3 5 10-01-2017 12-02-2017
My requirement is to get the first date (normally seq 1 start date) and end date (normally last seq end date) and the number of dates occurred during all seq for each unique ID.
Date occurred:
id 1 2 3
01-01-2017 20-01-2017 01-02-2017
02-01-2017 02-02-2017
07-01-2017 03-02-2017
08-01-2017 04-02-2017
09-01-2017 08-02-2017
11-01-2017 09-02-2017
10-02-2017
11-02-2017
12-02-2017
total 6 1 9
Here is the result I want:
id start_date end_date num_date
1 01-01-2017 11-01-2017 6
2 20-01-2017 20-01-2017 1
3 01-02-2017 12-02-2017 9
I have tried
SELECT id
, MIN(start_date)
, MAX(end_date)
, SUM(end_date - start_date + 1)
FROM my_table
GROUP BY id
and this SQL statement work fine in id 1 and 2 since there is no overlap date between begin date and end date. But for id 3, the result num_date is 11. Could you please suggest the SQL statement to solve this problem? Thank you.
One more question: The date in database is in datetime format. How do I convert it to date. I tried to use TRUNC function but it sometimes convert date to yesterday instead.

You need to count how many times an end_date equals the following start_date. For this you need to use the lag() or the lead() analytic function. You can use a case expression for the comparison, but alas you can't wrap the case expression within a COUNT or SUM in the same query; you need a subquery and an outer query.
Something like this; not tested, since you didn't provide CREATE TABLE and INSERT statements to recreate your sample data.
select id, min(start_date) as start_date, max(end_date) as end_date,
sum(end_date - start_date + 1 - flag) as num_days
from ( select id, start_date, end_date,
case when start_date = lag(end_date)
over (partition by id order by end_date) then 1
else 0 end as flag
from my_table
)
group by id;

SELECT id,
MIN( start_date ) AS start_date,
MAX( end_date ) AS end_date,
SUM( end_date - start_date + 1 ) AS num_days
FROM (
SELECT id,
GREATEST(
start_date,
COALESCE(
LAG( end_date ) OVER ( PARTITION BY id ORDER BY seq ) + 1,
start_date
)
) AS start_date,
end_date
FROM your_table
)
WHERE start_date <= end_date
GROUP BY id;

Related

Finding gaps between date ranges spanning records

I'm trying to write a query where I can find any gap in the date ranges for a given ID when passing in two dates.
EDIT: I need to know if a whole gap or part of a gap exists in my date range.
I have data in this format:
Example 1:
| ID | START_DATE | END_DATE |
|----|------------|------------|
| 1 | 01/01/2019 | 30/09/2019 |
| 1 | 01/03/2020 | (null) |
Example 2:
| ID | START_DATE | END_DATE |
|----|------------|------------|
| 2 | 01/01/2019 | 30/09/2019 |
| 2 | 01/10/2019 | 01/12/2019 |
| 2 | 02/12/2019 | (null) |
NB. A null end date essentially means "still active up to current day".
E.g. Example 1 has a gap of 152 days between 30/09/2019 and 01/03/2020. If I queried in the range of 05/05/2019 - 01/09/2019 there's no gap in that range. Whereas if I'm looking at the date range 05/05/2019 - 02/10/2019 there's a single day gap in that range.
For what it's worth, I don't actually care how many days gap, just whether there is one or not.
I've tried doing something like this but it doesn't work when my date falls into a gap:
SELECT SUM(START_DATE - PREV_END - 1)
FROM
(
SELECT ID, START_DATE, END_DATE, LAG(END_DATE) OVER (ORDER BY START_DATE) AS PREV_END_DATE
FROM TBL
WHERE ID = X_ID
)
WHERE START_DATE >= Y_FIRST_DATE
AND START_DATE <= Z_SECOND_DATE;
X_ID, Y_FIRST_DATE, and Z_SECOND_DATE are just any different ID or date range I might want to pass in.
How could I go about this?
Another option to determine the days might be by use SELECT .. FROM dual CONNECT BY LEVEL <= syntax through EXISTence of gaps by INTERSECTing two sets, one finds all dates between extremum parameters while the other finds all the dates fitting within the dates inserted into table as bounds :
SELECT CASE WHEN
SUM( 1 + LEAST(Z_SECOND_DATE,NVL(END_DATE,TRUNC(SYSDATE)))
- GREATEST(Y_FIRST_DATE,START_DATE) ) = Z_SECOND_DATE - Y_FIRST_DATE + 1 THEN
'NO Gap'
ELSE
'Gap Exists'
END "gap?"
FROM TBL t
WHERE ID = X_ID
AND EXISTS ( SELECT Y_FIRST_DATE + LEVEL - 1
FROM dual
CONNECT BY LEVEL <= Z_SECOND_DATE - Y_FIRST_DATE + 1
INTERSECT
SELECT t.START_DATE + LEVEL - 1
FROM dual
CONNECT BY LEVEL <= NVL(t.END_DATE,TRUNC(SYSDATE))- t.START_DATE + 1
)
START_DATE values are assumed to be non-null based on the sample data.
Demo
This is another variation the islands-and-gaps problem that pops up a lot here. I think this fits with Oracle's pattern matching functionality. Take this example:
WITH tbl AS
(
SELECT 1 AS ID, to_date('01/01/2019', 'DD/MM/YYYY') AS START_DATE, to_date('30/09/2019', 'DD/MM/YYYY') AS END_DATE FROM DUAL
UNION ALL
SELECT 1 AS ID, to_date('01/03/2020', 'DD/MM/YYYY') AS START_DATE, NULL AS END_DATE FROM DUAL
UNION ALL
SELECT 2 AS ID, to_date('01/01/2019', 'DD/MM/YYYY') AS START_DATE, to_date('30/09/2019', 'DD/MM/YYYY') AS END_DATE FROM DUAL
UNION ALL
SELECT 2 AS ID, to_date('01/10/2019', 'DD/MM/YYYY') AS START_DATE, to_date('01/12/2019', 'DD/MM/YYYY') AS END_DATE FROM DUAL
UNION ALL
SELECT 2 AS ID, to_date('02/12/2019', 'DD/MM/YYYY') AS START_DATE, NULL AS END_DATE FROM DUAL
)
SELECT *
FROM tbl
MATCH_RECOGNIZE(ORDER BY ID, start_date
MEASURES b.id AS ID,
a.end_date+1 AS GAP_START,
b.start_date-1 AS GAP_END
PATTERN (A B+)
DEFINE B AS start_date > PREV(end_date)+1 AND ID = PREV(ID))L;
I know it looks long, but most of it is creating the WITH clause. The pattern matching allows you to define what a gap is and pull the information accordingly. Notice that in order to have a gap, your start date must be greater than the previous end date + 1 grouped by the ID column.
To enhance this to answer your updated/edited question, just add this line of code to the end:
WHERE GREATEST(gap_start, TO_DATE('15/09/2019', 'DD/MM/YYYY' /*Y_FIRST_DATE*/)) <= LEAST(gap_end, to_date('15/10/2019', 'DD/MM/YYYY')/*Z_SECOND_DATE*/)
You can split the date range you are passing, into dates and then compare it with a date range in your table as follows:
SELECT
CASE WHEN SUM(CASE WHEN T.ID IS NULL THEN 1 END) > 0
THEN 'THERE IS GAP'
ELSE 'THERE IS NO GAP'
END AS RESULT_
FROM ( SELECT P_IN_FROM_DATE + LEVEL - 1 AS CUST_DATES
FROM DUAL
CONNECT BY LEVEL <= P_IN_TO_DATE - P_IN_FROM_DATE + 1
) CUST_TBL
LEFT JOIN TBL T
ON CUST_TBL.CUST_DATES BETWEEN T.START_DATE AND T.END_DATE
OR ( CUST_TBL.CUST_DATES >= T.START_DATE AND T.END_DATE IS NULL )
I would suggest finding the maximum end date before the current record -- based on the start date.
That would be:
select t.*
from (select t.*,
max(end_date) over (order by start_date
rows between unbounded preceding and 1 preceding
) as max_prev_end_date
from tbl t
where start_date <= :input_end_date and
end_date >= :input_start_date
) t
where max_prev_end_date < start_date;

Skip specific rows using LAG in sql

I have a table that looks like this:
Using the LAG function in SQL, I would like to perform the LAG on only values where star_date=end_date and get the past previous start_date record where start_date=end_date.
That my end table will have an extra column like this:
I hope my question is clear, any help is appreciated.
You can assign a group to these values and use that:
select t.*,
(case when start_date = end_date
then lag(start_date) over (partition by (case when start_date = end_date then 1 else 0 end) order by start_date)
end) as prev_eq_start_date
from t;
Or:
select t.*,
(case when start_date = end_date
then lag(start_date) over (partition by start_date = end_date order by start_date)
end) as prev_eq_start_date
from t;
Note if you data is big and most rows have different dates, then you might have a resources issue. In this case, an additional, unused partition by key can help:
select t.*,
(case when start_date = end_date
then lag(start_date) over (partition by (case when start_date = end_date then 1 else 2 end), (case when start_date <> end_date then start_date end) order by start_date)
end) as prev_eq_start_date
from t;
This has no impact on the result but it can avoid a resources error caused by too many rows with different values.
Below is for BigQuery Standard SQL
#standardSQL
SELECT *, NULL AS lag_result
FROM `project.dataset.table` WHERE start_date != end_date
UNION ALL
SELECT *, LAG(start_date) OVER(ORDER BY start_date)
FROM `project.dataset.table` WHERE start_date = end_date
If to apply to sample data in your question - result is
Row user_id start_date end_date lag_result
1 1 2019-01-01 2019-02-28 null
2 3 2019-02-27 2019-02-28 null
3 4 2019-08-04 2019-09-01 null
4 2 2019-02-01 2019-02-01 null
5 5 2019-08-07 2019-08-07 2019-02-01
6 6 2019-08-27 2019-08-27 2019-08-07
Btw, in case if your start_date and end_date are of STRING data type ('27/02/2019') vs. DATE type ('2019-02-27' as it was assumed in above query) - you should use below one
#standardSQL
SELECT *, NULL AS lag_result
FROM `project.dataset.table` WHERE start_date != end_date
UNION ALL
SELECT *, LAG(start_date) OVER(ORDER BY PARSE_DATE('%d/%m/%Y', start_date))
FROM `project.dataset.table` WHERE start_date = end_date
with result
Row user_id start_date end_date lag_result
1 1 01/01/2019 28/02/2019 null
2 3 27/02/2019 28/02/2019 null
3 4 04/08/2019 01/09/2019 null
4 2 01/02/2019 01/02/2019 null
5 5 07/08/2019 07/08/2019 01/02/2019
6 6 27/08/2019 27/08/2019 07/08/2019
Use JOIN
SQL FIDDLE
SELECT T.*,T1.LAG_Result
FROM TABLE T LEFT JOIN
(
SELECT User_Id,LAG(start_date) OVER(ORDER BY start_date) LAG_Result
FROM TABLE S
WHERE start_date = end_date
) T1 ON T.User_Id = T1.User_Id

SQL to find difference of dates - Oracle

I have a table:
table1
tran_id user_id start_date end_date
1 100 01-04-2018 02-04-2018
2 100 14-06-2018 14-06-2018
4 100 12-06-2018 12-06-2018
7 101 05-01-2018 05-01-2018
9 101 08-01-2018 08-01-2018
3 101 03-01-2018 03-01-2018
Date format is DD-MM-YYYY
I need to find the day difference ordered by user_id and start_date. The formula to find day difference is to subtract end_date of the prior record and start_date of the next.
Here is the expected output:
tran_id user_id start_date end_date day_diff
1 100 01-04-2018 02-04-2018 71
4 100 12-06-2018 12-06-2018 2
2 100 14-06-2018 14-06-2018 0
3 101 03-01-2018 03-01-2018 2
7 101 05-01-2018 05-01-2018 3
9 101 08-01-2018 08-01-2018 0
How to get this in an SQL query?
Use function lead() to find next value, then substract dates:
select tran_id, user_id, start_date, end_date, nvl(nsd - start_date, 0) diff
from (
select t.*, lead(start_date) over (partition by user_id
order by start_date) nsd
from table1 t)
demo
You can use lead() for this. Because lead() takes three arguments, you don't need a subquery or to deal with null values:
select t.*,
(lead(start_date, 1, start_date) over (partition by user_id order by start_date) -
start_date
) as diff
from t;
I think this: DATEDIFF function in Oracle in combination with the LAG function should help you out.
A minor correction to the answer given by Ponder Stibbons. Difference highlighted in Bold.
select tran_id, user_id, start_date, end_date, nvl(nsd - end_date, 0) diff
from (
select t.*, lead(start_date) over (partition by user_id
order by start_date) nsd
from date_test t);
Regards
Akash

Select min/max dates for periods that don't intersect

Example! I have a table with 4 columns. date format dd.MM.yy
id ban start end
1 1 01.01.15 31.12.18
1 1 02.02.15 31.12.18
1 1 05.04.15 31.12.17
In this case dates from rows 2 and 3 are included in dates from row 1
1 1 02.04.19 31.12.20
1 1 05.05.19 31.12.20
In this case dates from row 5 are included in dates from rows 4. Basically we have 2 periods that don't intersect.
01.01.15 31.12.18
and
02.04.19 31.12.20
Situation where a date starts in one period and ends in another are impossible. The end result should look like this
1 1 01.01.15 31.12.18
1 1 02.04.19 31.12.20
I tried using analitical functions(LAG)
select id
, ban
, case
when start >= nvl(lag(start) over (partition by id, ban order by start, end asc), start)
and end <= nvl(lag(end) over (partition by id, ban order by start, end asc), end)
then nvl(lag(start) over (partition by id, ban order by start, end asc), start)
else start
end as start
, case
when start >= nvl(lag(start) over (partition by id, ban order by start, end asc), start)
and end <= nvl(lag(end) over (partition by id, ban order by start, end asc), end)
then nvl(lag(end) over (partition by id, ban order by start, end asc), end)
else end
end as end
from table
Where I order rows and if current dates are included in previous I replace them. It works if I have just 2 rows. For example this
1 1 08.09.15 31.12.99
1 1 31.12.15 31.12.99
turns into this
1 1 08.09.15 31.12.99
1 1 08.09.15 31.12.99
which I can then group by all fields and get what I want, but if there are more
1 2 13.11.15 31.12.99
1 2 31.12.15 31.12.99
1 2 16.06.15 31.12.99
I get
1 2 16.06.15 31.12.99
1 2 16.06.15 31.12.99
1 2 13.11.15 31.12.99
I understand why this happens, but how do I work around it? Running the query multiple times is not an option.
This query looks promising:
-- test data
with t(id, ban, dtstart, dtend) as (
select 1, 1, date '2015-01-01', date '2015-03-31' from dual union all
select 1, 1, date '2015-02-02', date '2015-03-31' from dual union all
select 1, 1, date '2015-03-15', date '2015-03-31' from dual union all
select 1, 1, date '2015-08-05', date '2015-12-31' from dual union all
select 1, 2, date '2015-01-01', date '2016-12-31' from dual union all
select 2, 1, date '2016-01-01', date '2017-12-31' from dual),
-- end of test data
step1 as (select id, ban, dt, to_number(inout) direction
from t unpivot (dt for inout in (dtstart as '1', dtend as '-1'))),
step2 as (select distinct id, ban, dt, direction,
sum(direction) over (partition by id, ban order by dt) sm
from step1),
step3 as (select id, ban, direction, dt dt1,
lead(dt) over (partition by id, ban order by dt) dt2
from step2
where (direction = 1 and sm = 1) or (direction = -1 and sm = 0) )
select id, ban, dt1, dt2
from step3 where direction = 1 order by id, ban, dt1
step1 - unpivot dates and assign 1 for start date, -1 for end
date (column direction)
step2 - add cumulative sum for direction
step3 - filter only interesting dates, pivot second date using lead()
You can shorten this syntax, I divided it to steps to show what's going on.
Result:
ID BAN DT1 DT2
------ ---------- ----------- -----------
1 1 2015-01-01 2015-03-31
1 1 2015-08-05 2015-12-31
1 2 2015-01-01 2016-12-31
2 1 2016-01-01 2017-12-31
I assumed that for different (ID, BAN) we have to make calculations separately. If not - change partitioning and ordering in sum() and lead().
Pivot and unpivot works in Oracle 11 and later, for earlier versions you need case when.
BTW - START is reserved word in Oracle so in my example I changed slightly column names.
I like to do this by identifying the period starts, then doing a cumulative sum to define the group, and a final aggregation:
select id, ban, min(start), max(end)
from (select t.*, sum(start_flag) over (partition by id, bin order by start) as grp
from (select t.*,
(case when exists (select 1
from t t2
where t2.id = t.id and t2.ban = t.ban and
t.start <= t2.end and t.end >= t2.start and
t.start <> t2.start and t.end <> t2.end
)
then 0 else 1
end) as start_flag
from t
) t
) t
group by id, ban, grp;

using min and max in group by clause

I want below output in oracle sql.
I have data in table as below :
id start_date end_date assignment number
1 2.02.2014 15.02.2014 10
2 25.02.2014 30.02.2014 20
3 26.03.2014 04.05.2014 30
4 06.06.2014 31.12.4712 10
I need output using group by
assignment_number start_date end_date
10 02.02.2014 15.02.2014
10 06.06.2014 31.12.4712
20 25.02.2014 30.02.2014
30 26.03.2014 04.05.2014
I tried using min(start_date) and max(end_date) for assignment 10 ia was getting output as
assignment_number start_date end_date
10 02.02.2014 31.12.4712
But I want as :-
assignment_number start_date end_date
10 02.02.2014 15.02.2014
10 06.06.2014 31.12.4712
Please help
I think you'd have to calculate the min and max separately, then union them. Try something like this:
SELECT
assignment_number
, start_date
, end_date
FROM
(SELECT
assignment_number
, start_date
, end_date
FROM TABLE
GROUP BY assignment_number
HAVING MIN(start_date)
UNION
SELECT
assignment_number
, start_date
, end_date
FROM TABLE
GROUP BY assignment_number
HAVING MAX(end_date)
)
ORDER BY
1 ASC
, 2 ASC
, 3 ASC
;
sql fiddle
select id, to_char(start_date,'dd.mm.yyyy') start_date, to_char(end_date,'dd.mm.yyyy') end_date,ASSIGNMENT_NUMBER from sof1 s
where not exists
(select 1 from sof1 s2
where s2.assignment_number=s.assignment_number
and s2.start_date<s.start_date
)
or not exists
(select 1 from sof1 s2
where s2.assignment_number=s.assignment_number
and s2.end_date>s.end_date
)
order by ASSIGNMENT_NUMBER
With analytic function:
sql fiddle
select id, to_char(start_date,'dd.mm.yyyy') start_date, to_char(end_date,'dd.mm.yyyy') end_date,ASSIGNMENT_NUMBER from
(select s.*
, min (start_date) over (partition by ASSIGNMENT_NUMBER) sd
, max (end_date) over (partition by ASSIGNMENT_NUMBER) ed
from sof1 s
)
where start_date=sd or end_date=ed
order by ASSIGNMENT_NUMBER, start_date