Finding the most recent thing prior to a specific event - sql

I'm doing some timestamp problem solving but am stuck with some join logic.
I have a table of data like so:
id, event_time, event_type, location
1001, 2018-06-04 18:23:48.526895 UTC, I, d
1001, 2018-06-04 19:26:44.359296 UTC, I, h
1001, 2018-06-05 06:07:03.658263 UTC, I, w
1001, 2018-06-07 00:47:44.651841 UTC, I, d
1001, 2018-06-07 00:48:17.857729 UTC, C, d
1001, 2018-06-08 00:04:53.086240 UTC, I, a
1001, 2018-06-12 21:23:03.071829 UTC, I, d
...
And I'm trying to find, for a given location value, the timestamp difference between when a user has an event_type of C and the most recent event_type of I leading up to that C.
Ultimately the schema I'm after is:
id, location, timestamp_diff
1001, d, 33
1001, z, 21
1002, a, 55
...
I tried the following, which works for a single id value but doesn't seem to work for multiple ids. I might be over-complicating the issue, but I wasn't sure. On one id it gives about 5 rows, which is right. However, when I open it up to two ids, I get upwards of 200 rows when I should get something like 7 (5 for the first id and 2 for the second):
with c as (
select
id
,event_time as c_time
,location
from data
where event_type = 'C'
and id = '1001'
)
,i as (
select
id
,event_time as i_time
,location
from data
where event_type = 'I'
)
,check1 as (
select
c.*
,i.i_time
from c
left join i on (c.id = i.id and c.location = i.location)
group by 1,2,3,4
having i_time <= c_time
)
,check2 as (
select
id
,c_time
,location
,max(i_time) as i_time
from check1
group by 1,2,3
)
select
id
,location
,timestamp_diff(c_time, i_time, second) as timestamp_diff
from check2

#standardSQL
SELECT id, location, TIMESTAMP_DIFF(event_time, i_event_time, SECOND) AS diff
FROM (
SELECT *, MAX(IF(event_type = 'I', event_time, NULL)) OVER(win2) AS i_event_time
FROM (
SELECT *, COUNTIF(event_type = 'C') OVER(win1) grp
FROM `project.dataset.table`
WINDOW win1 AS (PARTITION BY id, location ORDER BY event_time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WINDOW win2 AS (PARTITION BY id, location, grp ORDER BY event_time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WHERE event_type = 'C'
AND NOT i_event_time IS NULL
This version addresses some edge cases, for example when there are consecutive 'C' events with "missing" 'I' events, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1001 id, TIMESTAMP '2018-06-04 18:23:48.526895 UTC' event_time, 'I' event_type, 'd' location UNION ALL
SELECT 1001, '2018-06-04 19:26:44.359296 UTC', 'I', 'h' UNION ALL
SELECT 1001, '2018-06-05 06:07:03.658263 UTC', 'I', 'w' UNION ALL
SELECT 1001, '2018-06-07 00:47:44.651841 UTC', 'I', 'd' UNION ALL
SELECT 1001, '2018-06-07 00:48:17.857729 UTC', 'C', 'd' UNION ALL
SELECT 1001, '2018-06-08 00:04:53.086240 UTC', 'C', 'd' UNION ALL
SELECT 1001, '2018-06-12 21:23:03.071829 UTC', 'I', 'd'
)
SELECT id, location, TIMESTAMP_DIFF(event_time, i_event_time, SECOND) AS diff
FROM (
SELECT *, MAX(IF(event_type = 'I', event_time, NULL)) OVER(win2) AS i_event_time
FROM (
SELECT *, COUNTIF(event_type = 'C') OVER(win1) grp
FROM `project.dataset.table`
WINDOW win1 AS (PARTITION BY id, location ORDER BY event_time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WINDOW win2 AS (PARTITION BY id, location, grp ORDER BY event_time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WHERE event_type = 'C'
AND NOT i_event_time IS NULL
The result is:
Row id location diff
1 1001 d 33
whereas, without addressing that edge case, it would be:
Row id location diff
1 1001 d 33
2 1001 d 83795

You can use a cumulative max() function to get the most recent i time before every event.
Then just filter based on the C event:
select id, location,
timestamp_diff(event_time, i_event_time, second) as diff
from (select t.*,
max(case when event_type = 'I' then event_time end) over (partition by id, location order by event_time) as i_event_time
from t
) t
where event_type = 'C';
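For a self-contained test, here is a minimal sketch of the same idea run against a trimmed copy of the sample rows from the question (the data CTE name and the truncated timestamps are illustrative only, not part of the original answer):
#standardSQL
WITH data AS (
  SELECT 1001 AS id, TIMESTAMP '2018-06-04 18:23:48 UTC' AS event_time, 'I' AS event_type, 'd' AS location UNION ALL
  SELECT 1001, TIMESTAMP '2018-06-07 00:47:44 UTC', 'I', 'd' UNION ALL
  SELECT 1001, TIMESTAMP '2018-06-07 00:48:17 UTC', 'C', 'd'
)
SELECT id, location,
       TIMESTAMP_DIFF(event_time, i_event_time, SECOND) AS timestamp_diff
FROM (
  SELECT t.*,
         -- running max of 'I' timestamps up to and including the current row
         MAX(CASE WHEN event_type = 'I' THEN event_time END)
           OVER (PARTITION BY id, location ORDER BY event_time) AS i_event_time
  FROM data t
)
WHERE event_type = 'C'
-- returns: 1001, d, 33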


create time range with 2 columns date_time

The problem I am facing is how to find distinct time periods from multiple overlapping time periods in Teradata ANSI SQL.
For example, the table below contains multiple overlapping time periods; how can I combine those into 3 unique time periods in Teradata SQL?
I think I could do it in Python with a loop, but I'm not sure how to do it in SQL.
ID    Start Date   End Date
001   2005-01-01   2006-01-01
001   2005-01-01   2007-01-01
001   2008-01-01   2008-06-01
001   2008-04-01   2008-12-01
001   2010-01-01   2010-05-01
001   2010-04-01   2010-12-01
001   2010-11-01   2012-01-01
My expected result is:
ID    start_date   end_date
001   2005-01-01   2007-01-01
001   2008-01-01   2008-12-01
001   2010-01-01   2012-01-01
From Oracle 12, you can use MATCH_RECOGNIZE to perform a row-by-row comparison:
SELECT *
FROM table_name
MATCH_RECOGNIZE(
PARTITION BY id
ORDER BY start_date
MEASURES
FIRST(start_date) AS start_date,
MAX(end_date) AS end_date
ONE ROW PER MATCH
PATTERN (overlapping_ranges* last_range)
DEFINE overlapping_ranges AS NEXT(start_date) <= MAX(end_date)
)
Which, for the sample data:
CREATE TABLE table_name (ID, Start_Date, End_Date) AS
SELECT '001', DATE '2005-01-01', DATE '2006-01-01' FROM DUAL UNION ALL
SELECT '001', DATE '2005-01-01', DATE '2007-01-01' FROM DUAL UNION ALL
SELECT '001', DATE '2008-01-01', DATE '2008-06-01' FROM DUAL UNION ALL
SELECT '001', DATE '2008-04-01', DATE '2008-12-01' FROM DUAL UNION ALL
SELECT '001', DATE '2010-01-01', DATE '2010-05-01' FROM DUAL UNION ALL
SELECT '001', DATE '2010-04-01', DATE '2010-12-01' FROM DUAL UNION ALL
SELECT '001', DATE '2010-11-01', DATE '2012-01-01' FROM DUAL;
Outputs:
ID    START_DATE            END_DATE
001   2005-01-01 00:00:00   2007-01-01 00:00:00
001   2008-01-01 00:00:00   2008-12-01 00:00:00
001   2010-01-01 00:00:00   2012-01-01 00:00:00
db<>fiddle here
Update: Alternative query
SELECT id,
start_date,
end_date
FROM (
SELECT id,
dt,
SUM(cnt) OVER (PARTITION BY id ORDER BY dt) AS grp,
cnt
FROM (
SELECT ID,
dt,
SUM(type) OVER (PARTITION BY id ORDER BY dt, ROWNUM) * type AS cnt
FROM table_name
UNPIVOT (dt FOR type IN (start_date AS 1, end_date AS -1))
)
WHERE cnt IN (1,0)
)
PIVOT (MAX(dt) FOR cnt IN (1 AS start_date, 0 AS end_date))
Or, an equivalent that does not use UNPIVOT, PIVOT or ROWNUM and works in both Oracle and PostgreSQL:
SELECT id,
MAX(CASE cnt WHEN 1 THEN dt END) AS start_date,
MAX(CASE cnt WHEN 0 THEN dt END) AS end_date
FROM (
SELECT id,
dt,
SUM(cnt) OVER (PARTITION BY id ORDER BY dt) AS grp,
cnt
FROM (
SELECT ID,
dt,
SUM(type) OVER (PARTITION BY id ORDER BY dt, rn) * type AS cnt
FROM (
SELECT r.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt ASC, type DESC) AS rn
FROM (
SELECT id, 1 AS type, start_date AS dt FROM table_name
UNION ALL
SELECT id, -1 AS type, end_date AS dt FROM table_name
) r
) p
) s
WHERE cnt IN (1,0)
) t
GROUP BY id, grp
Update 2: Another Alternative
SELECT id,
MIN(start_date) AS start_date,
MAX(end_Date) AS end_date
FROM (
SELECT t.*,
SUM(CASE WHEN start_date <= prev_max THEN 0 ELSE 1 END)
OVER (PARTITION BY id ORDER BY start_date) AS grp
FROM (
SELECT t.*,
MAX(end_date) OVER (
PARTITION BY id ORDER BY start_date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) AS prev_max
FROM table_name t
) t
) t
GROUP BY id, grp
db<>fiddle Oracle PostgreSQL
This is a gaps and islands problem. Try this:
with u as
(select ID, start_date, end_date,
case
when start_date <= lag(end_date) over(partition by ID order by start_date, end_date) then 0
else 1 end as grp
from table_name),
v as
(select ID, start_date, end_date,
sum(grp) over(partition by ID order by start_date, end_date) as island
from u)
select ID, min(start_date) as start_Date, max(end_date) as end_date
from v
group by ID, island;
Fiddle
Basically you can identify "islands" by comparing start_date of current row to end_date of previous row (ordered by start_date, end_date), if it precedes it then it's the same island. Then you can do a rolling sum() to get the island numbers. Finally select min(start_date) and max(end_date) from each island to get the desired output.
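For illustration, a minimal sketch (ANSI SQL, same table_name as above) that materializes just the grp flag so you can see where each island starts; the trailing comment shows the values produced for the sample data:
select ID, start_date, end_date,
       case
         when start_date <= lag(end_date) over (partition by ID order by start_date, end_date)
         then 0
         else 1
       end as grp
from table_name
order by ID, start_date, end_date;
-- grp comes out as 1,0,1,0,1,0,0 for the sample rows;
-- the running sum(grp) then numbers the islands 1,1,2,2,3,3,3,
-- and grouping by (ID, island) yields the three expected ranges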
This may work, with a little bit of change in the function; I tried it in DBeaver:
select ID,Start_Date,End_Date
from
(
select t.*,
dense_rank () over(partition by extract (year from Start_Date) order BY End_Date desc) drnk
from testing_123 t
) temp
where temp.drnk = 1
ORDER BY Start_Date;
Try this
WITH a as (
SELECT
ID,
LEFT(Start_Date, 4) as Year,
MIN(Start_Date) as New_Start_Date
FROM
TAB1
GROUP BY
ID,
LEFT(Start_Date, 4)
), b as (
SELECT
a.ID,
Year,
New_Start_Date,
End_Date
FROM
a
LEFT JOIN
TAB1
ON LEFT(a.New_Start_Date, 4) = LEFT(TAB1.Start_Date, 4)
)
select
ID,
New_Start_Date as Start_Date,
MAX(End_Date)
from
b
GROUP BY
ID,
New_Start_Date;
Example: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=97f91b68c635aebfb752538cdd752ace

LAG with condition

I want to get a value from the previous row that matches a certain condition.
For example: here, for each row, I want to get the timestamp of the most recent row where event = 1.
I feel I can do it without joins, using LAG and PARTITION BY with a CASE, but I am not able to crack it.
Please help.
Here is one approach using analytic functions:
WITH cte AS (
SELECT *, COUNT(CASE WHEN event = 1 THEN 1 END) OVER
(PARTITION BY customer_id ORDER BY ts) cnt
FROM yourTable
)
SELECT ts, customer_id, event,
MAX(CASE WHEN event = 1 THEN ts END) OVER
(PARTITION BY customer_id, cnt) AS desired_result
FROM cte
ORDER BY customer_id, ts;
Demo
We can articulate your problem by saying that you want the desired_result column to contain the most recent timestamp value at which the event was 1. The count (cnt) in the CTE above computes a pseudo group of records for each time the event is 1. Then we simply do a conditional aggregation over customer and pseudo group to find the timestamp value.
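To make the pseudo-group idea concrete, here is a minimal sketch (Oracle-style syntax with made-up sample values, mirroring the cnt expression from the CTE above):
with yourTable (ts, customer_id, event) as (
  select timestamp '2024-01-01 10:00:00', 111, 1 from dual union all
  select timestamp '2024-01-01 10:05:00', 111, 2 from dual union all
  select timestamp '2024-01-01 10:10:00', 111, 1 from dual union all
  select timestamp '2024-01-01 10:15:00', 111, 3 from dual
)
select ts, customer_id, event,
       count(case when event = 1 then 1 end)
         over (partition by customer_id order by ts) as cnt
from yourTable
order by ts;
-- cnt comes out as 1,1,2,2: each row is grouped with the most recent event = 1 at or before it,
-- so MAX(CASE WHEN event = 1 THEN ts END) per (customer_id, cnt) is exactly the desired_result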
One more approach with "one query":
with data as
(
select sysdate - 0.29 ts, 111 customer_id, 1 event from dual union all
select sysdate - 0.28 ts, 111 customer_id, 2 event from dual union all
select sysdate - 0.27 ts, 111 customer_id, 3 event from dual union all
select sysdate - 0.26 ts, 111 customer_id, 1 event from dual union all
select sysdate - 0.25 ts, 111 customer_id, 1 event from dual union all
select sysdate - 0.24 ts, 111 customer_id, 2 event from dual union all
select sysdate - 0.23 ts, 111 customer_id, 1 event from dual union all
select sysdate - 0.22 ts, 111 customer_id, 1 event from dual
)
select
ts, event,
last_value(case when event=1 then ts end) ignore nulls
over (partition by customer_id order by ts) desired_result,
max(case when event=1 then ts end)
over (partition by customer_id order by ts) desired_result_2
from data
order by ts
Edit: As suggested by MatBailie, the max(case ...) works as well and is a more general approach; the "last_value ... ignore nulls" form is Oracle-specific.

How can I select the first and the last row for each set returned

I have the following data which I want to select as follows:
How can I modify the query to select the output as shown below?
select primary_id, timestamp, secondary_id,... from tablename where
timestamp <= to_timestamp('2020-07-29 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and
timestamp < to_timestamp('2020-07-29 04:00:00', 'YYYY-MM-DD HH24:MI:SS')
order by timestamp, secondary_id;
primary_id timestamp secondary_id attribute1 attribute2 ... -- I want to get
-------------------------------------------------------------------
1 2020/01/20 10 ... ... ... -- <- this
2 2020/02/28 10 ... ... ...
3 2020/03/01 10 ... ... ... -- <- and this
4 2020/04/08 20 ... ... ... -- <- this
5 2020/05/31 20 ... ... ...
6 2020/06/30 20 ... ... ...
7 2020/06/31 20 ... ... ...
8 2020/07/31 20 ... ... ... -- <- and this
You can use window functions to rank records having the same secondary_id by ascending and descending timestamp, and then use that information to filter in the first and last record per group:
select primary_id, timestamp, secondary_id, ...
from (
select
t.*,
row_number() over(partition by secondary_id order by timestamp asc ) rn_asc,
row_number() over(partition by secondary_id order by timestamp desc) rn_desc
from tablename t
where
timestamp <= timestamp '2020-07-29 00:00:00'
and timestamp < timestamp '2020-07-29 04:00:00'
) t
where 1 in (rn_asc, rn_desc)
order by timestamp, secondary_id;
Note that you don't need to_timestamp() to convert these literal strings: you can use timestamp literals instead.
This also works when the value of secondary_id can be repeated in another group of rows, it simply checks if the current id is different from the previous or next row:
select *
from (
select
t.*,
lag(secondary_id) over(order by timestamp asc ) lag_id,
lead(secondary_id) over(order by timestamp asc) lead_id
from tablename t
where timestamp <= timestamp '2020-07-29 00:00:00'
and timestamp < timestamp '2020-07-29 04:00:00'
) t
where lag_id is null
or lead_id is null
or lag_id <> secondary_id
or lead_id <> secondary_id
order by timestamp, secondary_id;
Should be quite efficient as there's the same ORDER BY for both LEAD & LAG.
Please use the query below:
select primary_id, timestamp, secondary_id,... from
(select primary_id, timestamp, secondary_id,...,
row_number() over (partition by secondary_id order by timestamp) as rnk1,
row_number() over (partition by secondary_id order by timestamp desc) as rnk2
from tablename where
timestamp <= to_timestamp('2020-07-29 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and
timestamp < to_timestamp('2020-07-29 04:00:00', 'YYYY-MM-DD HH24:MI:SS') ) qry
where rnk1 = 1 or rnk2 = 1
order by timestamp, secondary_id;
You could use first_value and last_value. These are analytic functions and can be used as in the demo below.
with demo_data ( primary_id, secondary_id, timestamp)as
( select 1, 10, date '2020-01-01' from dual
union all
select 2 ,10, date '2020-01-28' from dual
union all
select 3, 10, date '2020-02-03' from dual
union all
select 4, 20, date '2020-03-02' from dual
union all
select 5, 20, date '2020-03-15' from dual
)
, grouped_data as
( select primary_id,
secondary_id,
timestamp,
decode(first_value(primary_id) over(partition by secondary_id order by timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ), primary_id, 'Y', 'N') first_row_in_group,
decode(last_value(primary_id) over(partition by secondary_id order by timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), primary_id, 'Y', 'N') last_row_in_group
from demo_data
)
select primary_id, secondary_id, timestamp
from grouped_data s
where first_row_in_group = 'Y' or last_row_in_group = 'Y'
/

SQL join on timestamp differences from the same table

I'm not sure how to write this SQL query in BigQuery. I have a table of events with names and timestamps. Let's say I have only two events in the table: A and B. What I want to do is query the table to get all instances of event A, and get the next closest occurrence of B and create a new column with the time difference. B will always happen after A.
For example if I had a table that looks like:
A1 | 1:00 pm
B5 | 2:00 pm
A3 | 3:00 pm
B9 | 5:00 pm
My resultant table would be:
A1 | 1 hour
A3 | 2 hours
The query I came up with is the following:
SELECT
CAST(TIMESTAMP_DIFF((SELECT MIN(sub.time)
FROM table sub
WHERE sub.time > main.time), main.time, SECOND) AS INT64) duration
FROM table main
This works fine for getting the table I wanted above, but I would also like to include an additional column from the subquery. Something that looks like:
A1 | 1 hour | B5Column
A3 | 2 hours | B9Column
I attempted to use the query below:
SELECT
(SELECT
sub.SubQueryColumn
FROM table sub
WHERE sub.time > main.time
ORDER BY sub.time asc
LIMIT 1) SubColumn,
CAST(TIMESTAMP_DIFF((SELECT MIN(sub.time)
FROM table sub
WHERE sub.time > main.time), main.time, SECOND) AS INT64) duration
FROM table main
but it did not work. The error I get is
Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.
Could I get some help with this?
Here is one method:
select m.*,
timestamp_diff(next_b_time, time, second) as duration
from (select m.*,
min(case when event like 'B%' then time end) over (order by time desc) as next_b_time
from main m
) m
where event like 'A%';
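A minimal way to sanity-check this against the sample rows from the question (the main CTE and the literal timestamps are illustrative only):
#standardSQL
WITH main AS (
  SELECT 'A1' AS event, TIMESTAMP '2018-01-01 13:00:00' AS time UNION ALL
  SELECT 'B5', TIMESTAMP '2018-01-01 14:00:00' UNION ALL
  SELECT 'A3', TIMESTAMP '2018-01-01 15:00:00' UNION ALL
  SELECT 'B9', TIMESTAMP '2018-01-01 17:00:00'
)
SELECT event, TIMESTAMP_DIFF(next_b_time, time, SECOND) AS duration
FROM (
  SELECT m.*,
         -- cumulative MIN over descending time = earliest 'B' at or after this row
         MIN(CASE WHEN event LIKE 'B%' THEN time END) OVER (ORDER BY time DESC) AS next_b_time
  FROM main m
)
WHERE event LIKE 'A%'
-- returns: A1, 3600 and A3, 7200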
Below is for BigQuery Standard SQL
#standardSQL
SELECT event, TIMESTAMP_DIFF(b_time, time, SECOND) duration, b_event
FROM (
SELECT event, time,
LEAD(time) OVER(PARTITION BY grp ORDER BY time) b_time,
LEAD(event) OVER(PARTITION BY grp ORDER BY time) b_event
FROM (
SELECT *,
COUNTIF(STARTS_WITH(event, 'A')) OVER(ORDER BY time) grp
FROM `project.dataset.your_table` t
)
)
WHERE STARTS_WITH(event, 'A')
-- ORDER BY time
You can test / play with it using dummy data from your question as below
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 'A1' event, TIMESTAMP '2018-01-01 1:00:00' time UNION ALL
SELECT 'B5', TIMESTAMP '2018-01-01 2:00:00' UNION ALL
SELECT 'A3', TIMESTAMP '2018-01-01 3:00:00' UNION ALL
SELECT 'B9', TIMESTAMP '2018-01-01 5:00:00'
)
SELECT event, TIMESTAMP_DIFF(b_time, time, SECOND) duration, b_event
FROM (
SELECT event, time,
LEAD(time) OVER(PARTITION BY grp ORDER BY time) b_time,
LEAD(event) OVER(PARTITION BY grp ORDER BY time) b_event
FROM (
SELECT *,
COUNTIF(STARTS_WITH(event, 'A')) OVER(ORDER BY time) grp
FROM `project.dataset.your_table` t
)
)
WHERE STARTS_WITH(event, 'A')
ORDER BY time
with result as
Row event duration b_event
1 A1 3600 B5
2 A3 7200 B9
Please note: the above solution relies on the statement in your question that B will always happen after A, so if you have a sequence like the one below
WITH `project.dataset.your_table` AS (
SELECT 'A1' event, TIMESTAMP '2018-01-01 1:00:00' time UNION ALL
SELECT 'A2', TIMESTAMP '2018-01-01 1:30:00' UNION ALL
SELECT 'B5', TIMESTAMP '2018-01-01 2:00:00' UNION ALL
SELECT 'A3', TIMESTAMP '2018-01-01 3:00:00' UNION ALL
SELECT 'B9', TIMESTAMP '2018-01-01 5:00:00'
)
the result will be
Row event duration b_event
1 A1 null null
2 A2 1800 B5
3 A3 7200 B9
If you need to address this, try the version below:
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 'A1' event, TIMESTAMP '2018-01-01 1:00:00' time UNION ALL
SELECT 'A2', TIMESTAMP '2018-01-01 1:30:00' UNION ALL
SELECT 'B5', TIMESTAMP '2018-01-01 2:00:00' UNION ALL
SELECT 'A3', TIMESTAMP '2018-01-01 3:00:00' UNION ALL
SELECT 'B9', TIMESTAMP '2018-01-01 5:00:00'
)
SELECT event, TIMESTAMP_DIFF(b_time, time, SECOND) duration, b_event
FROM (
SELECT event, time, type, grp,
FIRST_VALUE(event) OVER(ORDER BY grp RANGE BETWEEN 1 FOLLOWING AND 1 FOLLOWING) b_event,
FIRST_VALUE(time) OVER(ORDER BY grp RANGE BETWEEN 1 FOLLOWING AND 1 FOLLOWING) b_time
FROM (
SELECT event, time, SUBSTR(event, 1, 1) type,
COUNTIF(STARTS_WITH(event, 'B')) OVER(ORDER BY time) grp
FROM `project.dataset.your_table` t
)
)
WHERE STARTS_WITH(event, 'A')
ORDER BY time
This version will return:
Row event duration b_event
1 A1 3600 B5
2 A2 1800 B5
3 A3 7200 B9

Identify contiguous and discontinuous date ranges

I have a table named x. The data is as follows:
Account_num start_dt end_dt
A111326 02/01/2016 02/11/2016
A111326 02/12/2016 03/05/2016
A111326 03/02/2016 03/16/2016
A111331 02/28/2016 02/29/2016
A111331 02/29/2016 03/29/2016
A999999 08/25/2015 08/25/2015
A999999 12/19/2015 12/22/2015
A222222 11/06/2015 11/10/2015
A222222 05/16/2016 05/17/2016
Both A111326 and A111331 should be identified as contiguous data, while A999999 and
A222222 should be identified as discontinuous data. In my code I currently use the following query to identify discontinuous data, but A111326 is also erroneously identified as discontinuous. Please help me modify the code below so that A111326 is not identified as discontinuous data. Thanks in advance for your help.
(SELECT account_num
   FROM (SELECT account_num,
                MAX(END_DT) OVER (PARTITION BY account_num ORDER BY START_DT) START_DT,
                LEAD(START_DT) OVER (PARTITION BY account_num ORDER BY START_DT) END_DT
           FROM x
          WHERE (START_DT + 1) <= (END_DT - 1))
  WHERE START_DT < END_DT);
Oracle Setup:
CREATE TABLE accounts ( Account_num, start_dt, end_dt ) AS
SELECT 'A', DATE '2016-02-01', DATE '2016-02-11' FROM DUAL UNION ALL
SELECT 'A', DATE '2016-02-12', DATE '2016-03-05' FROM DUAL UNION ALL
SELECT 'A', DATE '2016-03-02', DATE '2016-03-16' FROM DUAL UNION ALL
SELECT 'B', DATE '2016-02-28', DATE '2016-02-29' FROM DUAL UNION ALL
SELECT 'B', DATE '2016-02-29', DATE '2016-03-29' FROM DUAL UNION ALL
SELECT 'C', DATE '2015-08-25', DATE '2015-08-25' FROM DUAL UNION ALL
SELECT 'C', DATE '2015-12-19', DATE '2015-12-22' FROM DUAL UNION ALL
SELECT 'D', DATE '2015-11-06', DATE '2015-11-10' FROM DUAL UNION ALL
SELECT 'D', DATE '2016-05-16', DATE '2016-05-17' FROM DUAL UNION ALL
SELECT 'E', DATE '2016-01-01', DATE '2016-01-02' FROM DUAL UNION ALL
SELECT 'E', DATE '2016-01-05', DATE '2016-01-06' FROM DUAL UNION ALL
SELECT 'E', DATE '2016-01-03', DATE '2016-01-07' FROM DUAL;
Query:
WITH times ( account_num, dt, lvl ) AS (
SELECT Account_num, start_dt - 1, 1 FROM accounts
UNION ALL
SELECT Account_num, end_dt, -1 FROM accounts
)
, totals ( account_num, dt, total ) AS (
SELECT account_num,
dt,
SUM( lvl ) OVER ( PARTITION BY Account_num ORDER BY dt, lvl DESC )
FROM times
)
SELECT Account_num,
CASE WHEN COUNT( CASE total WHEN 0 THEN 1 END ) > 1
THEN 'N'
ELSE 'Y'
END AS is_contiguous
FROM totals
GROUP BY Account_Num
ORDER BY Account_Num;
Output:
ACCOUNT_NUM IS_CONTIGUOUS
----------- -------------
A Y
B Y
C N
D N
E Y
Alternative Query:
(It's exactly the same method just using UNPIVOT rather than UNION ALL.)
SELECT Account_num,
CASE WHEN COUNT( CASE total WHEN 0 THEN 1 END ) > 1
THEN 'N'
ELSE 'Y'
END AS is_contiguous
FROM (
SELECT Account_num,
SUM( lvl ) OVER ( PARTITION BY Account_Num
ORDER BY CASE lvl WHEN 1 THEN dt - 1 ELSE dt END,
lvl DESC
) AS total
FROM accounts
UNPIVOT ( dt FOR lvl IN ( start_dt AS 1, end_dt AS -1 ) )
)
GROUP BY Account_Num
ORDER BY Account_Num;
WITH cte AS (
SELECT
AccountNumber
,CASE
WHEN
LAG(End_Dt) OVER (PARTITION BY AccountNumber ORDER BY End_Dt) IS NULL THEN 0
WHEN
LAG(End_Dt) OVER (PARTITION BY AccountNumber ORDER BY End_Dt) >= Start_Dt - 1 THEN 0
ELSE 1
END as discontiguous
FROM
#Table
)
SELECT
AccountNumber
,CASE WHEN SUM(discontiguous) > 0 THEN 'discontiguous' ELSE 'contiguous' END
FROM
cte
GROUP BY
AccountNumber;
One of your problems is that your desired contiguous result also includes overlapping date ranges in your example data set. For example, for A111326 one row starts on 3/2/2016 but the previous row ends on 3/5/2016, meaning the ranges overlap by 3 days.
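For completeness, here is a minimal, hedged adaptation of the CTE approach above to the question's column names and sample rows (the x CTE and Oracle-style date literals are illustrative only); for these four accounts it returns the expected classification:
with x (account_num, start_dt, end_dt) as (
  select 'A111326', date '2016-02-01', date '2016-02-11' from dual union all
  select 'A111326', date '2016-02-12', date '2016-03-05' from dual union all
  select 'A111326', date '2016-03-02', date '2016-03-16' from dual union all
  select 'A111331', date '2016-02-28', date '2016-02-29' from dual union all
  select 'A111331', date '2016-02-29', date '2016-03-29' from dual union all
  select 'A999999', date '2015-08-25', date '2015-08-25' from dual union all
  select 'A999999', date '2015-12-19', date '2015-12-22' from dual union all
  select 'A222222', date '2015-11-06', date '2015-11-10' from dual union all
  select 'A222222', date '2016-05-16', date '2016-05-17' from dual
),
flags as (
  select account_num,
         case
           when lag(end_dt) over (partition by account_num order by end_dt) is null then 0
           when lag(end_dt) over (partition by account_num order by end_dt) >= start_dt - 1 then 0
           else 1
         end as discontiguous
  from x
)
select account_num,
       case when sum(discontiguous) > 0 then 'discontiguous' else 'contiguous' end as status
from flags
group by account_num;
-- A111326 contiguous, A111331 contiguous, A999999 discontiguous, A222222 discontiguous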