SQL counting days with gap / overlapping - sql

I am working on a "counting days" problem almost identical to this one. I have a list of date ranges and need to count how many days are used, excluding duplicates and handling the gaps. Same input and output.
From: Markus Jarderot
Input
ID d1 d2
1 2011-08-01 2011-08-08
1 2011-08-02 2011-08-06
1 2011-08-03 2011-08-10
1 2011-08-12 2011-08-14
2 2011-08-01 2011-08-03
2 2011-08-02 2011-08-06
2 2011-08-05 2011-08-09
Output
ID hold_days
1 11
2 8
SQL to find time elapsed from multiple overlapping intervals
But for the life of me I couldn't understand Markus Jarderot's solution.
SELECT DISTINCT
t1.ID,
t1.d1 AS date,
-DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d1) AS n
FROM Orders t1
LEFT JOIN Orders t2 -- Join for any events occurring while this
ON t2.ID = t1.ID -- is starting. If this is a start point,
AND t2.d1 <> t1.d1 -- it won't match anything, which is what
AND t1.d1 BETWEEN t2.d1 AND t2.d2 -- we want.
GROUP BY t1.ID, t1.d1, t1.d2
HAVING COUNT(t2.ID) = 0
Why is DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d1) picking MIN(d1) from the entire list? Is that regardless of ID?
And what does t1.d1 BETWEEN t2.d1 AND t2.d2 do? Is that to ensure only overlapping intervals are considered?
Same with the GROUP BY: I think it is there so that identical periods are discarded? I tried to trace the solution by hand but only got more confused.

This is mostly a duplicate of my answer here (including explanation) but with the inclusion of grouping on an id column. It should use a single table scan and does not require a recursive sub-query factoring clause (CTE) or self joins.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE your_table ( id, usr, start_date, end_date ) AS
SELECT 1, 'A', DATE '2017-06-01', DATE '2017-06-03' FROM DUAL UNION ALL
SELECT 1, 'B', DATE '2017-06-02', DATE '2017-06-04' FROM DUAL UNION ALL -- Overlaps previous
SELECT 1, 'C', DATE '2017-06-06', DATE '2017-06-06' FROM DUAL UNION ALL
SELECT 1, 'D', DATE '2017-06-07', DATE '2017-06-07' FROM DUAL UNION ALL -- Adjacent to previous
SELECT 1, 'E', DATE '2017-06-11', DATE '2017-06-20' FROM DUAL UNION ALL
SELECT 1, 'F', DATE '2017-06-14', DATE '2017-06-15' FROM DUAL UNION ALL -- Within previous
SELECT 1, 'G', DATE '2017-06-22', DATE '2017-06-25' FROM DUAL UNION ALL
SELECT 1, 'H', DATE '2017-06-24', DATE '2017-06-28' FROM DUAL UNION ALL -- Overlaps previous and next
SELECT 1, 'I', DATE '2017-06-27', DATE '2017-06-30' FROM DUAL UNION ALL
SELECT 1, 'J', DATE '2017-06-27', DATE '2017-06-28' FROM DUAL UNION ALL -- Within H and I
SELECT 2, 'K', DATE '2011-08-01', DATE '2011-08-08' FROM DUAL UNION ALL -- Your data below
SELECT 2, 'L', DATE '2011-08-02', DATE '2011-08-06' FROM DUAL UNION ALL
SELECT 2, 'M', DATE '2011-08-03', DATE '2011-08-10' FROM DUAL UNION ALL
SELECT 2, 'N', DATE '2011-08-12', DATE '2011-08-14' FROM DUAL UNION ALL
SELECT 3, 'O', DATE '2011-08-01', DATE '2011-08-03' FROM DUAL UNION ALL
SELECT 3, 'P', DATE '2011-08-02', DATE '2011-08-06' FROM DUAL UNION ALL
SELECT 3, 'Q', DATE '2011-08-05', DATE '2011-08-09' FROM DUAL;
Query 1:
SELECT id,
       SUM( days ) AS total_days
FROM (
  SELECT id,
         dt - LAG( dt ) OVER ( PARTITION BY id
                               ORDER BY dt ) + 1 AS days,
         start_end
  FROM (
    SELECT id,
           dt,
           CASE SUM( value ) OVER ( PARTITION BY id
                                    ORDER BY dt ASC, value DESC, ROWNUM ) * value
             WHEN 1 THEN 'start'
             WHEN 0 THEN 'end'
           END AS start_end
    FROM your_table
    UNPIVOT ( dt FOR value IN ( start_date AS 1, end_date AS -1 ) )
  )
  WHERE start_end IS NOT NULL
)
WHERE start_end = 'end'
GROUP BY id
Results:
| ID | TOTAL_DAYS |
|----|------------|
| 1 | 25 |
| 2 | 13 |
| 3 | 9 |

The brute force method is to create all days (in a recursive query) and then count:
with dates(id, day, d2) as
(
select id, d1 as day, d2 from mytable
union all
select id, day + 1, d2 from dates where day < d2
)
select id, count(distinct day)
from dates
group by id
order by id;
Unfortunately there is a bug in some Oracle versions and recursive queries with dates don't work there. So try this code and see whether it works in your system. (I have Oracle 11.2 and the bug still exists there; so I guess you need Oracle 12c.)
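To check what the brute-force query should return without depending on recursive-query support, the same idea can be traced outside the database. The following is a minimal Python sketch, not the answer's own code: the function name distinct_days and the rows list are made up for illustration, and the data is the sample from the question.

```python
from datetime import date, timedelta

# Sample rows (id, d1, d2) copied from the question.
rows = [
    (1, date(2011, 8, 1), date(2011, 8, 8)),
    (1, date(2011, 8, 2), date(2011, 8, 6)),
    (1, date(2011, 8, 3), date(2011, 8, 10)),
    (1, date(2011, 8, 12), date(2011, 8, 14)),
    (2, date(2011, 8, 1), date(2011, 8, 3)),
    (2, date(2011, 8, 2), date(2011, 8, 6)),
    (2, date(2011, 8, 5), date(2011, 8, 9)),
]

def distinct_days(rows):
    """Return {id: number of distinct calendar days covered, both ends included}."""
    days_by_id = {}
    for id_, d1, d2 in rows:
        days = days_by_id.setdefault(id_, set())
        day = d1
        while day <= d2:              # generate every day from d1 through d2
            days.add(day)             # the set plays the role of COUNT(DISTINCT day)
            day += timedelta(days=1)
    return {id_: len(days) for id_, days in days_by_id.items()}

print(distinct_days(rows))  # {1: 13, 2: 9}
```

Note that counting distinct days includes both end dates, so this gives 13 and 9 for the sample, one more per merged range than the 11 and 8 shown in the question's expected output.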

I guess Markus' idea is to find all starting points that are not within other ranges and all ending points that aren't. Then just take the first starting point till the first ending point, then the next starting point till the next ending point, etc. As Markus isn't using a window function to number starting and ending points, he must find a more complicated way to achieve this. Here is the query with ROW_NUMBER. Maybe this gives you a starting point for what to look for in Markus' query.
select startpoint.id, sum(endpoint.day - startpoint.day)
from
(
select id, d1 as day, row_number() over (partition by id order by d1) as rn
from mytable m1
where not exists
(
select *
from mytable m2
where m1.id = m2.id
and m1.d1 > m2.d1 and m1.d1 <= m2.d2
)
) startpoint
join
(
select id, d2 as day, row_number() over (partition by id order by d1) as rn
from mytable m1
where not exists
(
select *
from mytable m2
where m1.id = m2.id
and m1.d2 >= m2.d1 and m1.d2 < m2.d2
)
) endpoint on endpoint.id = startpoint.id and endpoint.rn = startpoint.rn
group by startpoint.id
order by startpoint.id;
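The pairing of start points and end points can be traced by hand more easily in a small script. This is a rough Python sketch of the same logic as the ROW_NUMBER query, under the assumption that the sample data from the question is used; hold_days and rows are illustrative names only.

```python
from datetime import date

# Sample rows (id, d1, d2) copied from the question.
rows = [
    (1, date(2011, 8, 1), date(2011, 8, 8)),
    (1, date(2011, 8, 2), date(2011, 8, 6)),
    (1, date(2011, 8, 3), date(2011, 8, 10)),
    (1, date(2011, 8, 12), date(2011, 8, 14)),
    (2, date(2011, 8, 1), date(2011, 8, 3)),
    (2, date(2011, 8, 2), date(2011, 8, 6)),
    (2, date(2011, 8, 5), date(2011, 8, 9)),
]

def hold_days(rows):
    """Pair the n-th uncovered start with the n-th uncovered end, per id."""
    totals = {}
    for id_ in sorted({r[0] for r in rows}):
        group = [(d1, d2) for i, d1, d2 in rows if i == id_]
        # Start point: a d1 that no other interval covers
        # (mirrors: not exists m2 with m1.d1 > m2.d1 and m1.d1 <= m2.d2).
        starts = sorted(d1 for d1, _ in group
                        if not any(o1 < d1 <= o2 for o1, o2 in group))
        # End point: a d2 that no other interval covers
        # (mirrors: not exists m2 with m1.d2 >= m2.d1 and m1.d2 < m2.d2).
        ends = sorted(d2 for _, d2 in group
                      if not any(o1 <= d2 < o2 for o1, o2 in group))
        # zip pairs them by position, like joining on rn.
        totals[id_] = sum((e - s).days for s, e in zip(starts, ends))
    return totals

print(hold_days(rows))  # {1: 11, 2: 8}
```

For ID 1 the surviving start points are 2011-08-01 and 2011-08-12 and the surviving end points are 2011-08-10 and 2011-08-14, which gives 9 + 2 = 11.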

If all your intervals start at different dates, consider them in ascending order by d1 and count how many days there are from each d1 to the start of the next interval.
You can discard an interval if it is contained in another one.
The last interval won't have a follower.
This query should give you how many days each interval contributes:
select a.id, a.d1,nvl(min(b.d1), a.d2) - a.d1
from orders a
left join orders b
on a.id = b.id and a.d1 < b.d1 and a.d2 between b.d1 and b.d2
group by a.id, a.d1
Then group by id and sum days

Related

How to get the count of new unique ip address logged in to the website on each day using analytical function in sql?

Ex:
Date - ip,
1/2 - 1.1.127.0 ,
1/3 - 1.1.127.1,
1/3 - 1.1.127.0,
1/4 - 1.1.127.3,
1/4 - 1.1.127.5,
1/5 - 1.1.127.3,
Output:
Date-count,
1/2 - 1,
1/3 - 1,
1/4 - 2,
1/5 -0
That is, the number of new, unique IPs that logged in on each day.
You want to count how many IPs exist for a date that have not occurred on a previous date. You want to use analytic functions for this.
The number of new IPs on a date is the running count of distinct IPs up to that date minus the running count up to the previous date. To get this, first select the running count per row, then aggregate per date to get the running distinct count for each date, then use LAG to take the difference from the previous date.
select
date,
max(cnt) - lag(max(cnt)) over (order by date) as new_ips
from
(
select date, count(distinct ip) over (order by date) as cnt
from mytable
) running_counts
group by date
order by date;
The same without analytic functions, which is probably more readable:
select date, count(distinct ip) as cnt
from mytable
where not exists
(
select null
from mytable before
where before.date < mytable.date
and before.ip = mytable.ip
)
group by date
order by date;
The DISTINCT in this latter query is not necessary, if there can be no duplicates (two rows with the same date and IP) in the table.
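The "not seen on an earlier date" rule can also be checked with a few lines of Python. This is only a sketch for verifying the expected output; new_ips_per_day and the (month, day) tuples are assumptions made for the example, with the data taken from the question.

```python
# Sample log rows ((month, day), ip) copied from the question.
rows = [
    ((1, 2), "1.1.127.0"),
    ((1, 3), "1.1.127.1"),
    ((1, 3), "1.1.127.0"),
    ((1, 4), "1.1.127.3"),
    ((1, 4), "1.1.127.5"),
    ((1, 5), "1.1.127.3"),
]

def new_ips_per_day(rows):
    """Return {day: number of IPs that never appeared on an earlier day}."""
    seen = set()      # every IP encountered on earlier days
    counts = {}
    for day in sorted({d for d, _ in rows}):
        todays = {ip for d, ip in rows if d == day}
        counts[day] = len(todays - seen)   # only IPs not seen before count as new
        seen |= todays
    return counts

print(new_ips_per_day(rows))
# {(1, 2): 1, (1, 3): 1, (1, 4): 2, (1, 5): 0}
```

Unlike a query that filters rows with NOT EXISTS before grouping, this keeps days with no new IPs and reports them as 0.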
You can also use the solution below, which uses a left join.
with t (dt, ip) as (
select to_date( '1/2', 'MM/DD' ), '1.1.127.0' from dual union all
select to_date( '1/3', 'MM/DD' ), '1.1.127.1' from dual union all
select to_date( '1/3', 'MM/DD' ), '1.1.127.0' from dual union all
select to_date( '1/4', 'MM/DD' ), '1.1.127.3' from dual union all
select to_date( '1/4', 'MM/DD' ), '1.1.127.5' from dual union all
select to_date( '1/5', 'MM/DD' ), '1.1.127.3' from dual
)
select t.DT, count( decode(t2.IP, null, 1, null) ) cnt
from t
left join t t2
on ( t2.DT < t.DT and t2.IP = t.IP )
group by t.DT
order by 1
;

ORACLE SQL: Subtract dates of two or more consecutive rows

I would like to subtract dates of two consecutive rows, using ORACLE's LAG function (ORACLE version 19c):
SELECT CLIENT, ID, GROUP_A, GROUP_B, GROUP_C, DATE_A, DATE_B,
(DATE_A - LAG(DATE_B, 1) OVER (PARTITION BY GROUP_A, ID
ORDER BY ID ASC, GROUP_A ASC, GROUP_B ASC)) AS DELTA_TIME_IN_DAYS
FROM MY_TABLE
As you can see, the problem is, that there can be multiple entries like in line 3 and 4.
Of course column "DELTA_TIME_IN_DAYS" entry in line 4 shouldn't be negative.
It should be "1" as result.
Do you have any suggestions to solve this problem?
I think you want to order by date_b, if you want non-negative numbers:
SELECT CLIENT, ID, GROUP_A, GROUP_B, GROUP_C, DATE_A, DATE_B,
(DATE_A - LAG(DATE_B, 1) OVER (PARTITION BY GROUP_A, ID
ORDER BY DATE_B ASC
)
) AS DELTA_TIME_IN_DAYS
FROM MY_TABLE;
Note that you do not need to repeat the partitioning keys in the ORDER BY.
You may use range between to offset by groups of rows with the same sort value rather than by physical rows, as lag does by default. So let's use last_value with an offset of 1 group to do what lag does.
with a (id, start_dt, end_dt) as (
select 1, date '2021-01-01', date '2021-01-31' from dual union all
select 2, date '2021-02-01', date '2021-02-27' from dual union all
select 3, date '2021-03-01', date '2021-03-25' from dual union all
select 3, date '2021-03-01', date '2021-03-25' from dual union all
select 4, date '2021-04-01', date '2021-05-31' from dual union all
select 4, date '2021-04-01', date '2021-05-31' from dual
)
select
a.*
, last_value(end_dt) over(order by id asc range between unbounded preceding and 1 preceding) as dt_diff
from a
| ID | START_DT | END_DT | DT_DIFF |
|----|----------|--------|---------|
| 1 | 2021-01-01T00:00:00Z | 2021-01-31T00:00:00Z | (null) |
| 2 | 2021-02-01T00:00:00Z | 2021-02-27T00:00:00Z | 2021-01-31T00:00:00Z |
| 3 | 2021-03-01T00:00:00Z | 2021-03-25T00:00:00Z | 2021-02-27T00:00:00Z |
| 3 | 2021-03-01T00:00:00Z | 2021-03-25T00:00:00Z | 2021-02-27T00:00:00Z |
| 4 | 2021-04-01T00:00:00Z | 2021-05-31T00:00:00Z | 2021-03-25T00:00:00Z |
| 4 | 2021-04-01T00:00:00Z | 2021-05-31T00:00:00Z | 2021-03-25T00:00:00Z |
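The effect of the range frame can be illustrated outside SQL: each row looks at all rows whose id is at most its own id minus 1 and takes the last end date among them, so duplicate rows share the same result. The Python below is a hedged sketch of that behaviour using the sample rows above; previous_group_end is a hypothetical helper name.

```python
from datetime import date

# Sample rows (id, start_dt, end_dt) copied from the query above, already sorted by id.
rows = [
    (1, date(2021, 1, 1), date(2021, 1, 31)),
    (2, date(2021, 2, 1), date(2021, 2, 27)),
    (3, date(2021, 3, 1), date(2021, 3, 25)),
    (3, date(2021, 3, 1), date(2021, 3, 25)),
    (4, date(2021, 4, 1), date(2021, 5, 31)),
    (4, date(2021, 4, 1), date(2021, 5, 31)),
]

def previous_group_end(rows):
    """For each row, return the end date of the last row whose id <= this id - 1."""
    result = []
    for id_, _, _ in rows:
        # Frame: unbounded preceding .. 1 preceding, measured on the id value.
        frame = [end_dt for other_id, _, end_dt in rows if other_id <= id_ - 1]
        result.append(frame[-1] if frame else None)
    return result

for row, prev_end in zip(rows, previous_group_end(rows)):
    print(row[0], prev_end)
```

Both rows with id 3 get 2021-02-27 and both rows with id 4 get 2021-03-25, whereas a row-based lag would give the second duplicate its twin's end date.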

How do I find the total number of used days in a month?

I am trying to arrive at the total number of days a service has been used in a month.
(Start_Date and End_Date are - both inclusive)
Sample Data 1:
User Start_Date End_Date
A 01-Jun-2017 30-Jun-2017
B 06-Jun-2017 30-Jun-2017
Ans: Service used days = 30 days.
Sample Data 2:
User Start_Date End_Date
C 06-Jun-2017 10-Jun-2017
D 02-Jun-2017 02-Jun-2017
Ans: Service used days = 6 days.
How do I write code to find this, preferably in SQL rather than PL/SQL?
Test Data:
CREATE TABLE your_table ( usr, start_date, end_date ) AS
SELECT 'A', DATE '2017-06-01', DATE '2017-06-03' FROM DUAL UNION ALL
SELECT 'B', DATE '2017-06-02', DATE '2017-06-04' FROM DUAL UNION ALL -- Overlaps previous
SELECT 'C', DATE '2017-06-06', DATE '2017-06-06' FROM DUAL UNION ALL
SELECT 'D', DATE '2017-06-07', DATE '2017-06-07' FROM DUAL UNION ALL -- Adjacent to previous
SELECT 'E', DATE '2017-06-11', DATE '2017-06-20' FROM DUAL UNION ALL
SELECT 'F', DATE '2017-06-14', DATE '2017-06-15' FROM DUAL UNION ALL -- Within previous
SELECT 'G', DATE '2017-06-22', DATE '2017-06-25' FROM DUAL UNION ALL
SELECT 'H', DATE '2017-06-24', DATE '2017-06-28' FROM DUAL UNION ALL -- Overlaps previous and next
SELECT 'I', DATE '2017-06-27', DATE '2017-06-30' FROM DUAL UNION ALL
SELECT 'J', DATE '2017-06-27', DATE '2017-06-28' FROM DUAL; -- Within H and I
Query:
SELECT SUM( days ) AS total_days
FROM (
SELECT dt - LAG( dt ) OVER ( ORDER BY dt ) + 1 AS days,
start_end
FROM (
SELECT dt,
CASE SUM( value ) OVER ( ORDER BY dt ASC, value DESC, ROWNUM ) * value
WHEN 1 THEN 'start'
WHEN 0 THEN 'end'
END AS start_end
FROM your_table
UNPIVOT ( dt FOR value IN ( start_date AS 1, end_date AS -1 ) )
)
WHERE start_end IS NOT NULL
)
WHERE start_end = 'end';
Output:
TOTAL_DAYS
----------
25
Explanation:
SELECT dt, value
FROM your_table
UNPIVOT ( dt FOR value IN ( start_date AS 1, end_date AS -1 ) )
This will UNPIVOT the table so that the start and end dates are in the same column (dt) and are given a corresponding value of +1 for a start and -1 for an end date.
SELECT dt,
SUM( value ) OVER ( ORDER BY dt ASC, value DESC, ROWNUM ) AS total,
value
FROM your_table
UNPIVOT ( dt FOR value IN ( start_date AS 1, end_date AS -1 ) )
Will give the start and end dates and the cumulative sum of those generated values. The start of a range will always have value=1 and total=1 and the end of a range will always have total=0. If a date is mid-way through a range then it will either have total>1 or value=-1 and total=1. Using this, if you multiply value and total then the start of a range is when value*total=1 and the end of a range is when value*total=0 and any other value indicates a date that is midway through a range.
Which is what this gives:
SELECT dt,
CASE SUM( value ) OVER ( ORDER BY dt ASC, value DESC, ROWNUM ) * value
WHEN 1 THEN 'start'
WHEN 0 THEN 'end'
END AS start_end
FROM your_table
UNPIVOT ( dt FOR value IN ( start_date AS 1, end_date AS -1 ) )
You can then filter out the dates when the start_end is NULL which will leave you with a table with alternating start and end rows which you can use LAG to calculate the number of days difference:
SELECT dt - LAG( dt ) OVER ( ORDER BY dt ) + 1 AS days,
start_end
FROM (
SELECT dt,
CASE SUM( value ) OVER ( ORDER BY dt ASC, value DESC, ROWNUM ) * value
WHEN 1 THEN 'start'
WHEN 0 THEN 'end'
END AS start_end
FROM your_table
UNPIVOT ( dt FOR value IN ( start_date AS 1, end_date AS -1 ) )
)
WHERE start_end IS NOT NULL
All you need to do then is to SUM all the differences for the end - start; which gives the query above.
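The value * total rule from the explanation can be traced step by step in a short script. The following Python sketch is an illustration under stated assumptions, not a translation the answer provides: total_days and intervals are made-up names, the data is the test data above, and a loop variable stands in for LAG.

```python
from datetime import date

# Test data (start_date, end_date) copied from the CREATE TABLE above, users A to J.
intervals = [
    (date(2017, 6, 1), date(2017, 6, 3)),
    (date(2017, 6, 2), date(2017, 6, 4)),
    (date(2017, 6, 6), date(2017, 6, 6)),
    (date(2017, 6, 7), date(2017, 6, 7)),
    (date(2017, 6, 11), date(2017, 6, 20)),
    (date(2017, 6, 14), date(2017, 6, 15)),
    (date(2017, 6, 22), date(2017, 6, 25)),
    (date(2017, 6, 24), date(2017, 6, 28)),
    (date(2017, 6, 27), date(2017, 6, 30)),
    (date(2017, 6, 27), date(2017, 6, 28)),
]

def total_days(intervals):
    # "Unpivot": every start date becomes (dt, +1), every end date (dt, -1).
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # ORDER BY dt ASC, value DESC: starts sort before ends on the same date.
    events.sort(key=lambda e: (e[0], -e[1]))

    total = 0           # running SUM( value )
    days = 0
    range_start = None
    for dt, value in events:
        total += value
        if value * total == 1:       # 'start' of a merged range
            range_start = dt
        elif value * total == 0:     # 'end' of a merged range
            days += (dt - range_start).days + 1   # both ends inclusive
    return days

print(total_days(intervals))  # 25
```

For the test data the merged ranges contribute 4 + 1 + 1 + 10 + 9 days, which matches the TOTAL_DAYS of 25 in the output above.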
As @Pravin Satav pointed out, your requirement is not very clear. This is what I understood from your explanation:
SELECT sum(CASE WHEN end_date=start_date THEN 1 ELSE (end_date-start_date)+1 END) as total_days
FROM my_table
WHERE <conditions that determine your "sample data">;

Oracle SQL CTE (Common Table Expression) where no data

I'm building a very complex SQL query in Oracle using multiple CTEs, where latter expressions rely on values from former ones. However I find that the whole execution halts if one of the prior CTEs contains no data. For example:
WITH CTE1 AS
(
SELECT
PEOPLE.ID AS PID,
APPLICATIONS.DATE AS APPDATE
FROM
PEOPLE,
APPLICATIONS
WHERE
APPLICATIONS.PERSON_ID = PEOPLE.ID
AND APPLICATIONS.DATE BETWEEN TO_DATE('2015-01-01','YYYY-MM-DD') AND TO_DATE('2015-01-31','YYYY-MM-DD')
),
CTE2 AS
(
SELECT
APPLICATIONS.PERSON_ID AS PID,
MIN(APPLICATIONS.DATE) AS EARLIEST_APPDATE
FROM
CTE1,
APPLICATIONS
WHERE
APPLICATIONS.PERSON_ID = CTE1.PID
AND APPLICATIONS.DATE < ADD_MONTHS(CTE1.APPDATE, -18)
GROUP BY APPLICATIONS.PERSON_ID
),
MAIN_QUERY AS
(
SELECT
CTE1.PID AS PID
FROM
CTE1, CTE2
WHERE
-- Note that the PIDs should either match, or should not exist in CTE2
CTE1.PID = CTE2.PID OR (NOT EXISTS (SELECT PID FROM CTE2 WHERE CTE1.PID = CTE2.PID))
)
SELECT
MAIN_QUERY.PID
FROM MAIN_QUERY
Of course, I realise that the above example is completely pointless in itself, however I have just simplified this to illustrate the problem. CTE2 returns the earliest date of any application made by the same Person ID where the application is dated more than 18 months prior to the application date of CTE1. However... what if there are no such applications? CTE2 is capable of returning zero rows.
You will notice that CTE2 is not, in itself, referenced in the final query. An empty CTE2 is dealt with in MAIN_QUERY. So with regards the final query, it should not matter whether CTE2 actually returns any lines or not.
However the application I'm using (Business Objects) throws up an error that the query has "no data to fetch", when CTE2 has no lines.
I want to find a way around this, to enable my query to execute even if CTE2 returns null. Thanks.
CTE2 returns the earliest date of any application made by the same Person ID where the application is dated more than 18 months prior to the application date of CTE1. However... what if there are no such applications? CTE2 is capable of returning zero rows.
You can replace your sub-query factoring (WITH ... AS ( ... )) clauses with a simple analytical function:
Oracle Setup:
CREATE TABLE PEOPLE ( id ) AS
SELECT LEVEL FROM DUAL CONNECT BY LEVEL <= 3;
CREATE TABLE APPLICATIONS ( id, person_id, "DATE" ) AS
SELECT 1, 1, DATE '2015-01-01' FROM DUAL UNION ALL -- First row to return
SELECT 2, 1, DATE '2014-01-01' FROM DUAL UNION ALL -- Within 18 months
SELECT 3, 1, DATE '2013-01-01' FROM DUAL UNION ALL -- Before 18 months
SELECT 4, 1, DATE '2012-01-01' FROM DUAL UNION ALL -- Before 18 months and min
SELECT 5, 2, DATE '2015-01-02' FROM DUAL UNION ALL -- Second row to return
SELECT 6, 2, DATE '2014-01-02' FROM DUAL UNION ALL -- Within 18 months
SELECT 7, 3, DATE '2015-01-03' FROM DUAL UNION ALL -- Third row to return
SELECT 8, 3, DATE '2013-07-03' FROM DUAL; -- Exactly 18 months earlier
Query:
SELECT PID,
APPDATE,
CASE EARLIEST_APPDATE
WHEN APPDATE - INTERVAL '18' MONTH
THEN NULL
ELSE EARLIEST_APPDATE
END AS EARLIEST_APPDATE -- Included for the edge case where
-- EARLIEST_APPDATE is exactly 18 months
-- earlier as the RANGE BETWEEN is
-- inclusive.
FROM (
SELECT p.ID AS PID,
a."DATE" AS APPDATE,
MIN( a."DATE" ) OVER ( PARTITION BY p.ID
ORDER BY a."DATE"
RANGE BETWEEN UNBOUNDED PRECEDING
AND INTERVAL '18' MONTH PRECEDING )
AS EARLIEST_APPDATE
FROM PEOPLE p
INNER JOIN APPLICATIONS a
ON ( a.PERSON_ID = p.ID )
)
WHERE APPDATE BETWEEN DATE '2015-01-01' AND DATE '2015-01-31'
Output:
PID APPDATE EARLIEST_APPDATE
---------- ------------------- -------------------
1 2015-01-01 00:00:00 2012-01-01 00:00:00
2 2015-01-02 00:00:00
3 2015-01-03 00:00:00

Lowest continuous date without break

I have a table and each record has a date. We can assume that a date range is contiguous if there's not a 3 month break. How can I find the start of the most recent contiguous date range?
For example, imagine if I had this data:
1990-5-1
1990-6-4
1990-10-28
1990-11-14
1990-12-19
1991-1-20
1991-4-30
1991-5-13
I'd like for it to return 1991-4-30 because it's the start of the most recent contiguous range of dates.
I think this does what you're looking for. Using my own table and column names as test data. This is on Oracle.
select * from (
select * from sm_ss_tickets t1 where exists (
select * from sm_ss_tickets t2 where t2.created_date between t1.created_date and t1.created_date+90 and t1.rowid <> t2.rowid
) order by created_date asc
) where rownum = 1;
Maybe something like the following would work:
WITH d1 AS (
SELECT date'1990-05-01' AS dt FROM dual
UNION ALL
SELECT date'1990-06-04' AS dt FROM dual
UNION ALL
SELECT date'1990-10-28' AS dt FROM dual
UNION ALL
SELECT date'1990-11-14' AS dt FROM dual
UNION ALL
SELECT date'1990-12-19' AS dt FROM dual
UNION ALL
SELECT date'1991-01-20' AS dt FROM dual
UNION ALL
SELECT date'1991-04-30' AS dt FROM dual
UNION ALL
SELECT date'1991-05-13' AS dt FROM dual
)
SELECT MAX(dt) FROM (
SELECT dt, LAG(dt) OVER ( ORDER BY dt ) AS prev_dt, LEAD(dt) OVER ( ORDER BY dt ) AS next_dt
FROM d1
) WHERE ( dt > ADD_MONTHS(prev_dt, 3) OR prev_dt IS NULL )
AND dt > ADD_MONTHS(next_dt, -3)
In the above, a date can only be the start of a contiguous sequence if there is no prior date within 3 months (either it is more than three months ago or it doesn't exist at all) and there is also a subsequent date within 3 months.
You can use LAG and LEAD. Find the query below. I think it works fine.
tmp_year is the table I have created. tdate is the column.
The records in the table are
28-JAN-15
27-JAN-15
26-JAN-15
25-JAN-15
12-JUL-14
11-JUL-14
10-JUL-14
09-JUL-14
24-DEC-13
23-DEC-13
22-DEC-13
21-DEC-13
15-SEP-13
07-JUN-13
27-FEB-13
19-NOV-12
11-AUG-12
Please find the query which returns 25th Jan 2015.
select max(d.tdate)
from (
  select c.tdate, c.next_date, c.date_diff,
         lag(c.date_diff) over (order by c.tdate) prev_diff
  from (
    select b.tdate, b.next_date, (b.next_date - b.tdate) date_diff
    from (
      select a.tdate, lead(a.tdate) over (order by a.tdate) next_date
      from tmp_year a
    ) b
  ) c
) d
where d.date_diff < 90 and d.prev_diff >= 90;
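The previous/next comparison used by the LAG and LEAD answers can be checked against the question's sample dates with a small script. This Python sketch makes two assumptions that should be stated plainly: it treats "3 months" as 90 days, as the last query does, and it counts the very first date as a possible start when it has no predecessor, as the ADD_MONTHS answer does. The name latest_range_start is hypothetical.

```python
from datetime import date

# Sample dates copied from the question.
dates = [
    date(1990, 5, 1), date(1990, 6, 4), date(1990, 10, 28), date(1990, 11, 14),
    date(1990, 12, 19), date(1991, 1, 20), date(1991, 4, 30), date(1991, 5, 13),
]

def latest_range_start(dates, max_gap_days=90):
    """Return the most recent date that opens a contiguous range, or None."""
    ordered = sorted(dates)
    starts = []
    for i, dt in enumerate(ordered):
        prev_dt = ordered[i - 1] if i > 0 else None                  # LAG
        next_dt = ordered[i + 1] if i + 1 < len(ordered) else None   # LEAD
        # No earlier date close enough to belong to the same range.
        no_close_predecessor = prev_dt is None or (dt - prev_dt).days >= max_gap_days
        # A later date close enough to continue the range.
        has_close_successor = next_dt is not None and (next_dt - dt).days < max_gap_days
        if no_close_predecessor and has_close_successor:
            starts.append(dt)
    return max(starts) if starts else None

print(latest_range_start(dates))  # 1991-04-30
```

With the sample data the candidate starts are 1990-05-01, 1990-10-28 and 1991-04-30, and the latest of them is the date the question asks for.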