How to get missing values in date range? - sql

I have a table with the structure shown at the end of this question.
I'm trying to get grouped values between two dates. The problem is that I would also like rows returned for dates that have no data in the table. For example, with the range
WHERE m.date BETWEEN "2014-09-02" AND "2014-09-10"
there is no row in the table for 2014-09-06, so the result should contain
2014-09-06| 0 | 0 | 0 | 0
How can I do this, please? (If it is possible with an SQLite database.)
Here is the query which i'm using:
SELECT substr(m.date, 1, 10) as my_date, COUNT(m.ID) AS 'NUMBER_OF_ALL_CALLS',
(SELECT COUNT(*) FROM dialed_calls subq WHERE subq.call_result = 'DONE'
AND substr(m.date, 1, 10) = substr(subq.DATE, 1, 10)) as 'RESULT_DONE',
(SELECT COUNT(*) FROM dialed_calls subq WHERE subq.call_result = 'NOT_INTERESTED'
AND substr(m.date, 1, 10) = substr(subq.DATE, 1, 10)) as 'RESULT_NOT_INTERESTED',
(SELECT COUNT(*) FROM dialed_calls subq WHERE subq.call_result = 'NO_APPOINTMENT'
AND substr(m.date, 1, 10) = substr(subq.DATE, 1, 10)) as 'RESULT_NO_APP'
FROM dialed_calls m
WHERE m.date BETWEEN "2014-09-02" AND "2014-09-05"
GROUP BY my_date
Many thanks for any help.
Table structure:
BEGIN TRANSACTION;
CREATE TABLE dialed_calls(Id integer PRIMARY KEY,
'date' datetime,
'called_number' VARCHAR(45),
'call_result' VARCHAR(45),
'call_duration' INT,
'synced' BOOL);
/* Create few records in this table */
INSERT INTO dialed_calls VALUES(1,'2014-09-02 15:54:34+0200',
'800123456', 'NOT_INTERESTED', 10, 0);
INSERT INTO dialed_calls VALUES(2,'2014-09-02 15:56:30+0200',
'800123456', 'NO_APPOINTMENT', 10, 0);
INSERT INTO dialed_calls VALUES(3,'2014-09-02 16:01:49+0200',
'800123456', 'DONE', 9, 0);
INSERT INTO dialed_calls VALUES(4,'2014-09-02 16:03:03+0200',
'800123456', 'NO_APPOINTMENT', 69, 0);
INSERT INTO dialed_calls VALUES(5,'2014-09-02 18:09:34+0200',
'800123456', 'NO_APPOINTMENT', 3, 0);
INSERT INTO dialed_calls VALUES(6,'2014-09-02 18:54:02+0200',
'123456789', 'NO_APPOINTMENT', 89, 0);
INSERT INTO dialed_calls VALUES(7,'2014-09-02 18:55:25+0200',
'123456789', 'NOT_INTERESTED', 89, 0);
INSERT INTO dialed_calls VALUES(8,'2014-09-03 18:36:58+0200',
'123456789', 'DONE', 185, 0);
INSERT INTO dialed_calls VALUES(9,'2014-09-04 18:36:58+0200',
'123456789', 'DONE', 185, 0);
INSERT INTO dialed_calls VALUES(10,'2014-09-05 18:36:58+0200',
'123456789', 'DONE', 185, 0);
COMMIT;

Try this:
SELECT
d.date AS DATE,
IFNULL(NUMBER_OF_ALL_CALLS, 0) AS NUMBER_OF_ALL_CALLS,
IFNULL(RESULT_DONE, 0) AS RESULT_DONE,
IFNULL(RESULT_NOT_INTERESTED, 0) AS RESULT_NOT_INTERESTED,
IFNULL(RESULT_NO_APP, 0) AS RESULT_NO_APP
FROM
(SELECT DATE('1970-01-01', '+' || (t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) || ' days') date FROM
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t3,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t4) d
LEFT JOIN
(
SELECT substr(m.date, 1, 10) as my_date, COUNT(m.ID) AS 'NUMBER_OF_ALL_CALLS',
(SELECT COUNT(*) FROM dialed_calls subq WHERE subq.call_result = 'DONE'
AND substr(m.date, 1, 10) = substr(subq.DATE, 1, 10)) as 'RESULT_DONE',
(SELECT COUNT(*) FROM dialed_calls subq WHERE subq.call_result = 'NOT_INTERESTED'
AND substr(m.date, 1, 10) = substr(subq.DATE, 1, 10)) as 'RESULT_NOT_INTERESTED',
(SELECT COUNT(*) FROM dialed_calls subq WHERE subq.call_result = 'NO_APPOINTMENT'
AND substr(m.date, 1, 10) = substr(subq.DATE, 1, 10)) as 'RESULT_NO_APP'
FROM dialed_calls m
GROUP BY my_date
) t
ON d.date = t.my_date
WHERE d.date BETWEEN '2014-09-02' AND '2014-09-10'
ORDER BY d.date;
The query above first generates a series of dates, keeps those inside the specified range, and then left joins the aggregated values from your table onto them, so dates without calls come back as zeros.
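The same idea can be sketched more compactly. This is a minimal example, not the answer's exact query: it assumes SQLite 3.8.3 or newer, swaps the digit cross join for a recursive CTE, and swaps the correlated subqueries for conditional sums. The four sample rows are a subset of the question's data, and the names `days` and `day` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dialed_calls (id INTEGER PRIMARY KEY, date TEXT, call_result TEXT);
INSERT INTO dialed_calls (date, call_result) VALUES
  ('2014-09-02 15:54:34+0200', 'NOT_INTERESTED'),
  ('2014-09-02 16:01:49+0200', 'DONE'),
  ('2014-09-03 18:36:58+0200', 'DONE'),
  ('2014-09-05 18:36:58+0200', 'NO_APPOINTMENT');
""")

query = """
WITH RECURSIVE days(day) AS (
    -- one row per calendar day from :start to :end inclusive
    SELECT :start
    UNION ALL
    SELECT date(day, '+1 day') FROM days WHERE day < :end
)
SELECT d.day,
       COUNT(m.id)                                            AS number_of_all_calls,
       COALESCE(SUM(m.call_result = 'DONE'), 0)               AS result_done,
       COALESCE(SUM(m.call_result = 'NOT_INTERESTED'), 0)     AS result_not_interested,
       COALESCE(SUM(m.call_result = 'NO_APPOINTMENT'), 0)     AS result_no_app
FROM days d
LEFT JOIN dialed_calls m ON substr(m.date, 1, 10) = d.day
GROUP BY d.day
ORDER BY d.day
"""
rows = conn.execute(query, {"start": "2014-09-02", "end": "2014-09-06"}).fetchall()
for row in rows:
    print(row)
```

Days without calls (2014-09-04 and 2014-09-06 in this sample) come back with zeros in every count column, because the LEFT JOIN keeps the generated day even when no call matches it.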

This is a good case for joining to a Calendar table.
http://web.archive.org/web/20070611150639/http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-calendar-table.html
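Note that this is a SQL Server link, but you can adapt the idea to SQLite.
You could do your calculations and then right join the results to the calendar table, so that the missing dates appear with NULL values (in SQLite, put the calendar table on the left of a LEFT JOIN instead, since RIGHT JOIN is only available from version 3.39). You could also COALESCE() the NULLs to something that makes more sense, like 0.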
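A minimal sketch of the calendar-table approach in SQLite, driven from Python. The table name `calendar`, its `day` column, and the two sample calls are assumptions made for illustration; a real calendar table would usually cover many years and carry extra columns.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calendar (day TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE dialed_calls (id INTEGER PRIMARY KEY, date TEXT, call_result TEXT)")

# Populate the calendar once; here only September 2014.
first_day = date(2014, 9, 1)
conn.executemany(
    "INSERT INTO calendar (day) VALUES (?)",
    [((first_day + timedelta(days=i)).isoformat(),) for i in range(30)],
)

# Made-up sample calls on two different days.
conn.executemany(
    "INSERT INTO dialed_calls (date, call_result) VALUES (?, ?)",
    [("2014-09-05 18:36:58+0200", "DONE"), ("2014-09-07 09:00:00+0200", "DONE")],
)

calendar_rows = conn.execute("""
SELECT c.day, COALESCE(t.calls, 0) AS calls
FROM calendar c
LEFT JOIN (
    SELECT substr(date, 1, 10) AS day, COUNT(*) AS calls
    FROM dialed_calls
    GROUP BY substr(date, 1, 10)
) t ON t.day = c.day
WHERE c.day BETWEEN '2014-09-05' AND '2014-09-08'
ORDER BY c.day
""").fetchall()
print(calendar_rows)
```

The calendar table sits on the left side of the join, so every day in the range is kept and COALESCE turns the missing counts into 0.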

To get the intermediate dates in the query result, you need a table containing every date in the range.
In some RDBMSs you can populate a temporary table to join with.
You will have to compare the date part only (without the time).
Be careful with your BETWEEN: the second date has no time part, so it is treated as 00:00:00 and rows from later on that last day are excluded, which may not be what you meant.
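The BETWEEN pitfall is easy to reproduce. The sketch below assumes the timestamps are stored as text, as in the question's SQLite table, and uses two of its rows; the comparison is then a plain string comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dialed_calls (id INTEGER PRIMARY KEY, date TEXT)")
conn.executemany(
    "INSERT INTO dialed_calls (date) VALUES (?)",
    [("2014-09-04 18:36:58+0200",), ("2014-09-05 18:36:58+0200",)],
)

# Comparing the full timestamp: '2014-09-05 18:36:58+0200' sorts after '2014-09-05',
# so the call on the last day falls outside the range.
raw_count = conn.execute(
    "SELECT COUNT(*) FROM dialed_calls "
    "WHERE date BETWEEN '2014-09-02' AND '2014-09-05'"
).fetchone()[0]

# Comparing only the date part keeps the whole last day.
day_count = conn.execute(
    "SELECT COUNT(*) FROM dialed_calls "
    "WHERE substr(date, 1, 10) BETWEEN '2014-09-02' AND '2014-09-05'"
).fetchone()[0]

print(raw_count, day_count)
```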

Related

SQL query to Ignore matching positive and negative values in a table

I have a transaction table that stores amounts paid (positive amounts) and corrections (negative amounts). I am looking for a query that ignores each matching pair of positive and negative amounts for a date and returns the sum and count of the remaining transactions.
Id Dept Date Amount
1 A 21-Apr-21 1100
1 A 21-Apr-21 1100
1 A 21-Apr-21 -1100
1 A 07-Apr-21 1100
1 A 03-Feb-21 100
1 A 12-Jan-21 500
The SQL query should ignore rows 2 and 3, because the amount was corrected and should not be counted as a transaction.
The output should be:
Id Dept sum(Amount) count(transaction)
1 A 2800 4
If I understood you correctly, you can use the solution below.
I first rank all the occurrences of the same amount value and then group them, so that Oracle ignores all matching positive and negative values.
with YourSample (Id, Dept, Date#, Amount) as (
select 1, 'A', to_date('21-Apr-21', 'dd-Mon-RR', 'nls_date_language=english'), 1100 from dual union all
select 1, 'A', to_date('21-Apr-21', 'dd-Mon-RR', 'nls_date_language=english'), 1100 from dual union all
select 1, 'A', to_date('21-Apr-21', 'dd-Mon-RR', 'nls_date_language=english'), -1100 from dual union all
select 1, 'A', to_date('07-Apr-21', 'dd-Mon-RR', 'nls_date_language=english'), 1100 from dual union all
select 1, 'A', to_date('03-Feb-21', 'dd-Mon-RR', 'nls_date_language=english'), 100 from dual union all
select 1, 'A', to_date('12-Jan-21', 'dd-Mon-RR', 'nls_date_language=english'), 500 from dual
)
, ranked_rws as (
select Id, Dept, Date#
, abs(Amount)Amount
, sign(AMOUNT) row_sign
, row_number() OVER (PARTITION BY Id, Dept, Amount order by date#, rownum) rn
from YourSample t
)
, ingored_matched_pos_neg_values as (
select ID, DEPT, sum(row_sign) * AMOUNT AMOUNT/*, sum(row_sign)*/
from ranked_rws
group by ID, DEPT, AMOUNT, RN
having sum(row_sign) != 0 /* this line filters out all matching positive
and negatives values (equality in terms of occurrences)*/
)
select ID, DEPT, sum(AMOUNT) sum, count(*) transactions
from ingored_matched_pos_neg_values
group by ID, DEPT
;
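The ranking idea above can be tried outside Oracle as well. The following is a sketch in SQLite (3.25 or newer for window functions) run from Python, not a port of the exact query: it assumes a table named `txn` with ISO date strings, and it additionally partitions by date so that a correction only cancels a payment made on the same day, which is how the question describes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (id INTEGER, dept TEXT, dt TEXT, amount INTEGER)")
conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?)", [
    (1, "A", "2021-04-21", 1100),
    (1, "A", "2021-04-21", 1100),
    (1, "A", "2021-04-21", -1100),
    (1, "A", "2021-04-07", 1100),
    (1, "A", "2021-02-03", 100),
    (1, "A", "2021-01-12", 500),
])

totals = conn.execute("""
WITH ranked AS (
    -- Number the occurrences of each signed amount per id, dept and date.
    SELECT id, dept, dt,
           ABS(amount) AS abs_amount,
           CASE WHEN amount < 0 THEN -1 ELSE 1 END AS sgn,
           ROW_NUMBER() OVER (PARTITION BY id, dept, dt, amount ORDER BY rowid) AS rn
    FROM txn
),
unmatched AS (
    -- The n-th positive and the n-th negative occurrence share a group;
    -- their signs sum to 0 and the pair is dropped.
    SELECT id, dept, SUM(sgn) * abs_amount AS amount
    FROM ranked
    GROUP BY id, dept, dt, abs_amount, rn
    HAVING SUM(sgn) <> 0
)
SELECT id, dept, SUM(amount) AS total, COUNT(*) AS transactions
FROM unmatched
GROUP BY id, dept
""").fetchall()
print(totals)
```

For the sample data, one of the two 1100 payments on 21 April is cancelled by the -1100 correction, leaving four transactions that total 2800.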
Maybe some idea like this could work.
SELECT Id, Dept, Date, Amount, COUNT(*) AS RecordCount
INTO #temptable
FROM table GROUP BY ...
SELECT
t1.Id
,t1.Dept
,t1.Date
,(t1.RecordCount - COALESCE(t2.RecordCount, 0)) * t1.Amount
,t1.RecordCount - COALESCE(t2.RecordCount, 0)
FROM #temptable t1
LEFT JOIN #temptable t2 ON
t1.Id = t2.Id
AND t1.Dept = t2.Dept
AND t1.Date = t2.Date
AND (t1.Amount * -1) = t2.Amount

sql: cumulative sum (partition by clients order by date)

Could you please help me calculate a cumulative sum in SQL Server 2017? The conditions are: 1) partition by client, 2) order by date_tm. The desired result is in the desirable_result column of the table below.
create table #clients (client nvarchar(1)
, date_tm datetime
,sum_pay int
, desirable_result int)
insert into #clients
(client, date_tm, sum_pay, desirable_result)
select '1', '2020-01-01', 10, 10 union all
select '1', '2020-01-02', 20, 30 union all
select '2', '2020-01-03', 20, 60 union all
select '2', '2020-01-01', 20, 20 union all
select '2', '2020-01-02', 20, 40 union all
select '3', '2020-01-01', 20, 20 union all
select '3', '2020-01-04', 20, 70 union all
select '3', '2020-01-02', 30, 50
select * from #clients
drop table if exists #clients
Thank you very much.
Are you looking for something like the below?
select c.*,sum(sum_pay) over(partition by client order by date_tm)
from #clients c
You can use sum()over() window function as below:
select * ,SUM (sum_pay) OVER (partition by client order by date_tm) AS cummulativesum from #clients
SELECT * ,
CASE WHEN desirable_result = cum_sum THEN 'OK' ELSE 'NO' END AS Status
FROM
(
select
*,
SUM (sum_pay) OVER (partition by client order by date_tm) AS cum_sum
from #clients as tbl
) as a
With this code you can compare desirable_result with the cumulative sum.
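To check the window-function answers without a SQL Server instance, the sketch below runs the same kind of running total in SQLite (3.25 or newer) from Python, using the question's sample rows. The explicit ROWS frame is an addition for illustration; it keeps rows with identical date_tm values from being summed together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clients (client TEXT, date_tm TEXT, sum_pay INTEGER, desirable_result INTEGER)"
)
conn.executemany("INSERT INTO clients VALUES (?, ?, ?, ?)", [
    ("1", "2020-01-01", 10, 10),
    ("1", "2020-01-02", 20, 30),
    ("2", "2020-01-03", 20, 60),
    ("2", "2020-01-01", 20, 20),
    ("2", "2020-01-02", 20, 40),
    ("3", "2020-01-01", 20, 20),
    ("3", "2020-01-04", 20, 70),
    ("3", "2020-01-02", 30, 50),
])

running = conn.execute("""
SELECT client, date_tm, sum_pay, desirable_result,
       SUM(sum_pay) OVER (
           PARTITION BY client
           ORDER BY date_tm
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cum_sum
FROM clients
ORDER BY client, date_tm
""").fetchall()

for row in running:
    print(row)
```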

SQL query to find ids in the same table but different timestamp events (cohorts)

I need to write a query that gives me counts with the following logic. The example below shows that ACCOUNT_ID 123 signed up on 2020-02-21, so M0 is 1, and the same ACCOUNT_ID then had an event in the following month, so M1 is 1.
M0 is the signup date
M1 is signup date + 1 month
M2 is signup date + 2 consecutive months
M3 is signup date + 3 consecutive months
WITH M_O AS (
SELECT
parsed_data."ACCOUNT_ID" AS "parsed_data.account_id",
MIN(TO_CHAR(TO_DATE(parsed_data."TIMESTAMP"::timestamp_ntz ), 'YYYY-MM-DD')) AS "SIGNUP",
COUNT(DISTINCT (parsed_data."ACCOUNT_ID") ) AS "COUNT_USERS_O"
FROM "PUBLIC"."PARSED_DATA"
AS parsed_data
WHERE (parsed_data."ACCOUNT_ID") IS NOT NULL
AND (((parsed_data."EVENT") = 'Started'))
AND (
((TO_CHAR(TO_DATE(parsed_data."TIMESTAMP"::timestamp_ntz ), 'YYYY-MM-DD')) >= '2020-02-21')
AND ((parsed_data."TIMESTAMP"::timestamp_ntz ) < CURRENT_DATE())
)
GROUP BY 1),
M_1 AS (
SELECT
parsed_data."ACCOUNT_ID" AS "parsed_data.account_id",
TO_CHAR(TO_DATE(parsed_data."TIMESTAMP"::timestamp_ntz ), 'YYYY-MM-DD') AS "parsed_data.timestamp_date",
COUNT(DISTINCT (parsed_data."ACCOUNT_ID") ) AS "COUNT_USERS_1"
FROM "PUBLIC"."PARSED_DATA"
AS parsed_data INNER JOIN M_O ON parsed_data.account_id = M_O."parsed_data.account_id"
WHERE
(parsed_data."ACCOUNT_ID") IS NOT NULL
AND (((parsed_data."EVENT") = 'Started'))
AND (
(TO_CHAR(TO_DATE(parsed_data."TIMESTAMP"::timestamp_ntz ), 'YYYY-MM-DD')) >= DATEADD('MONTH', 1, SIGNUP)
AND ((parsed_data."TIMESTAMP"::timestamp_ntz ) < CURRENT_DATE())
)
GROUP BY 1,2
)
It looks like you want to create cohorts? As in "establish the creation date for each id, and then look how they changed their behavior every month thereafter".
This code should work:
with events as (
select 1 id, '2020-01-01'::date e_date
union all select 1, '2020-02-03'
union all select 2, '2020-03-01'
union all select 2, '2020-05-08'
union all select 3, '2020-08-01'
union all select 3, '2020-09-02'
union all select 3, '2020-09-22'
union all select 3, '2020-09-30'
union all select 3, '2020-10-10'
),
first_per_id as (
select id, min(e_date) first_date
from events
group by id
)
select a.id
, count_if(e_date>=dateadd(month, 0, first_date) and e_date<dateadd(month, 1, first_date)) m0
, count_if(e_date>=dateadd(month, 1, first_date) and e_date<dateadd(month, 2, first_date)) m1
, count_if(e_date>=dateadd(month, 2, first_date) and e_date<dateadd(month, 3, first_date)) m2
from events a
join first_per_id b
on a.id=b.id
group by 1
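The cohort query can also be tried locally. The sketch below is an adaptation for SQLite run from Python, not Snowflake code: SQLite has no count_if or dateadd, so it sums boolean expressions and uses date(first_date, '+N months') instead. The sample events are the ones from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, e_date TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [
    (1, "2020-01-01"), (1, "2020-02-03"),
    (2, "2020-03-01"), (2, "2020-05-08"),
    (3, "2020-08-01"), (3, "2020-09-02"), (3, "2020-09-22"),
    (3, "2020-09-30"), (3, "2020-10-10"),
])

cohorts = conn.execute("""
WITH first_per_id AS (
    SELECT id, MIN(e_date) AS first_date
    FROM events
    GROUP BY id
)
SELECT e.id,
       -- each comparison evaluates to 1 or 0, so SUM counts the matching events
       SUM(e.e_date >= f.first_date
           AND e.e_date < date(f.first_date, '+1 month'))  AS m0,
       SUM(e.e_date >= date(f.first_date, '+1 month')
           AND e.e_date < date(f.first_date, '+2 months')) AS m1,
       SUM(e.e_date >= date(f.first_date, '+2 months')
           AND e.e_date < date(f.first_date, '+3 months')) AS m2
FROM events e
JOIN first_per_id f ON f.id = e.id
GROUP BY e.id
ORDER BY e.id
""").fetchall()
print(cohorts)
```

Note that m0 includes the signup event itself. Month arithmetic differs between engines for end-of-month dates, so results for signups on the 29th to 31st are worth checking against the target database.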

SQL - '1' IF hour in month EXISTS, '0' IF NOT EXISTS

I have a table that has aggregations down to the hour level YYYYMMDDHH. The data is aggregated and loaded by an external process that I don't control. I want to test the data on a monthly basis.
The question I am looking to answer is: Does every hour in the month exist?
I'm looking to produce output that will return a 1 if the hour exists or 0 if the hour does not exist.
The aggregation table looks something like this...
YYYYMM YYYYMMDD YYYYMMDDHH DATA_AGG
201911 20191101 2019110100 100
201911 20191101 2019110101 125
201911 20191101 2019110103 135
201911 20191101 2019110105 95
… … … …
201911 20191130 2019113020 100
201911 20191130 2019113021 110
201911 20191130 2019113022 125
201911 20191130 2019113023 135
And defined as...
CREATE TABLE YYYYMMDDHH_DATA_AGG (
YYYYMM VARCHAR,
YYYYMMDD VARCHAR,
YYYYMMDDHH VARCHAR,
DATA_AGG INT
);
I'm looking to produce the following below...
YYYYMMDDHH HOUR_EXISTS
2019110100 1
2019110101 1
2019110102 0
2019110103 1
2019110104 0
2019110105 1
... ...
In the example above, two hours do not exist, 2019110102 and 2019110104.
I assume I'd have to join the aggregation table against a computed table that contains all the YYYYMMDDHH combinations?
The database is Snowflake, but I assume most generic ANSI SQL queries will work.
You can get what you want with a recursive CTE.
The recursive CTE generates the list of possible hours, and a simple left outer join then gives you the flag for whether any records match that hour.
WITH RECURSIVE CTE (YYYYMMDDHH) as
(
SELECT YYYYMMDDHH
FROM YYYYMMDDHH_DATA_AGG
WHERE YYYYMMDDHH = (SELECT MIN(YYYYMMDDHH) FROM YYYYMMDDHH_DATA_AGG)
UNION ALL
SELECT TO_VARCHAR(DATEADD(HOUR, 1, TO_TIMESTAMP(C.YYYYMMDDHH, 'YYYYMMDDHH')), 'YYYYMMDDHH') YYYYMMDDHH
FROM CTE C
WHERE TO_VARCHAR(DATEADD(HOUR, 1, TO_TIMESTAMP(C.YYYYMMDDHH, 'YYYYMMDDHH')), 'YYYYMMDDHH') <= (SELECT MAX(YYYYMMDDHH) FROM YYYYMMDDHH_DATA_AGG)
)
SELECT
C.YYYYMMDDHH,
IFF(A.YYYYMMDDHH IS NOT NULL, 1, 0) HOUR_EXISTS
FROM CTE C
LEFT OUTER JOIN YYYYMMDDHH_DATA_AGG A
ON C.YYYYMMDDHH = A.YYYYMMDDHH;
If your time range is too long, you'll have issues with the CTE recursing too much. You can create a table or temp table with all of the possible hours instead. For example:
CREATE OR REPLACE TEMPORARY TABLE HOURS (YYYYMMDDHH VARCHAR) AS
SELECT TO_VARCHAR(DATEADD(HOUR, SEQ4(), TO_TIMESTAMP((SELECT MIN(YYYYMMDDHH) FROM YYYYMMDDHH_DATA_AGG), 'YYYYMMDDHH')), 'YYYYMMDDHH')
FROM TABLE(GENERATOR(ROWCOUNT => 10000)) V
ORDER BY 1;
SELECT
H.YYYYMMDDHH,
IFF(A.YYYYMMDDHH IS NOT NULL, 1, 0) HOUR_EXISTS
FROM HOURS H
LEFT OUTER JOIN YYYYMMDDHH_DATA_AGG A
ON H.YYYYMMDDHH = A.YYYYMMDDHH
WHERE H.YYYYMMDDHH <= (SELECT MAX(YYYYMMDDHH) FROM YYYYMMDDHH_DATA_AGG);
You can then fiddle with the generator count to make sure you have enough hours.
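For experimenting with the pattern without a Snowflake account, here is a sketch of the recursive-CTE variant in SQLite, run from Python. It is an adaptation rather than the query above: the table is called `agg`, the hour series is generated from explicit start and end timestamps, and strftime builds the YYYYMMDDHH key. The eight rows are the ones visible in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agg (yyyymmddhh TEXT, data_agg INTEGER)")
conn.executemany("INSERT INTO agg VALUES (?, ?)", [
    ("2019110100", 100), ("2019110101", 125), ("2019110103", 135), ("2019110105", 95),
    ("2019113020", 100), ("2019113021", 110), ("2019113022", 125), ("2019113023", 135),
])

hour_flags = conn.execute("""
WITH RECURSIVE hours(ts) AS (
    -- one row per hour from :start to :end inclusive
    SELECT :start
    UNION ALL
    SELECT datetime(ts, '+1 hour') FROM hours WHERE ts < :end
)
SELECT strftime('%Y%m%d%H', h.ts) AS yyyymmddhh,
       CASE WHEN a.yyyymmddhh IS NULL THEN 0 ELSE 1 END AS hour_exists
FROM hours h
LEFT JOIN agg a ON a.yyyymmddhh = strftime('%Y%m%d%H', h.ts)
ORDER BY h.ts
""", {"start": "2019-11-01 00:00:00", "end": "2019-11-30 23:00:00"}).fetchall()

print(hour_flags[:6])
```

November 2019 has 30 days, so the result holds 720 rows, with a 1 only for the hours that are present in `agg`.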
You can generate a table with every hour of the month and LEFT OUTER JOIN your aggregation to it:
WITH EVERY_HOUR AS (
SELECT TO_CHAR(DATEADD(HOUR, HH, TO_DATE(YYYYMM::TEXT, 'YYYYMM')),
'YYYYMMDDHH')::NUMBER YYYYMMDDHH
FROM (SELECT DISTINCT YYYYMM FROM YYYYMMDDHH_DATA_AGG) t
CROSS JOIN (
SELECT ROW_NUMBER() OVER (ORDER BY NULL) - 1 HH
FROM TABLE(GENERATOR(ROWCOUNT => 745))
) h
QUALIFY YYYYMMDDHH < (YYYYMM + 1) * 10000
)
SELECT h.YYYYMMDDHH, NVL2(a.YYYYMM, 1, 0) HOUR_EXISTS
FROM EVERY_HOUR h
LEFT OUTER JOIN YYYYMMDDHH_DATA_AGG a ON a.YYYYMMDDHH = h.YYYYMMDDHH
Here's something that might help get you started. I'm guessing you want to have 'synthetic' [YYYYMMDD] values? Otherwise, if the values aren't there, they shouldn't appear in the list.
DROP TABLE IF EXISTS #_hours
DROP TABLE IF EXISTS #_temp
--Populate a table with hours ranging from 00 to 23
CREATE TABLE #_hours ([hour_value] VARCHAR(2))
DECLARE @_i INT = 0
WHILE (@_i < 24)
BEGIN
INSERT INTO #_hours
SELECT FORMAT(@_i, '0#')
SET @_i += 1
END
END
-- Replicate OP's sample data set
CREATE TABLE #_temp (
[YYYYMM] INTEGER
, [YYYYMMDD] INTEGER
, [YYYYMMDDHH] INTEGER
, [DATA_AGG] INTEGER
)
INSERT INTO #_temp
VALUES
(201911, 20191101, 2019110100, 100),
(201911, 20191101, 2019110101, 125),
(201911, 20191101, 2019110103, 135),
(201911, 20191101, 2019110105, 95),
(201911, 20191130, 2019113020, 100),
(201911, 20191130, 2019113021, 110),
(201911, 20191130, 2019113022, 125),
(201911, 20191130, 2019113023, 135)
SELECT X.YYYYMM, X.YYYYMMDD, X.YYYYMMDDHH
-- Case: If 'target_hours' doesn't exist, then 0, else 1
, CASE WHEN X.target_hours IS NULL THEN '0' ELSE '1' END AS [HOUR_EXISTS]
FROM (
-- Select right 2 characters from converted [YYYYMMDDHH] to act as 'target values'
SELECT T.*
, RIGHT(CAST(T.[YYYYMMDDHH] AS VARCHAR(10)), 2) AS [target_hours]
FROM #_temp AS T
) AS X
-- Right join to keep all of our hours and only the target hours that match.
RIGHT JOIN #_hours AS H ON H.hour_value = X.target_hours
Sample output:
YYYYMM YYYYMMDD YYYYMMDDHH HOUR_EXISTS
201911 20191101 2019110100 1
201911 20191101 2019110101 1
NULL NULL NULL 0
201911 20191101 2019110103 1
NULL NULL NULL 0
201911 20191101 2019110105 1
NULL NULL NULL 0
With (almost) standard sql, you can do a cross join of the distinct values of YYYYMMDD to a list of all possible hours and then left join to the table:
select concat(d.YYYYMMDD, h.hour) as YYYYMMDDHH,
case when t.YYYYMMDDHH is null then 0 else 1 end as hour_exists
from (select distinct YYYYMMDD from tablename) as d
cross join (
select '00' as hour union all select '01' union all
select '02' union all select '03' union all
select '04' union all select '05' union all
select '06' union all select '07' union all
select '08' union all select '09' union all
select '10' union all select '11' union all
select '12' union all select '13' union all
select '14' union all select '15' union all
select '16' union all select '17' union all
select '18' union all select '19' union all
select '20' union all select '21' union all
select '22' union all select '23'
) as h
left join tablename as t
on concat(d.YYYYMMDD, h.hour) = t.YYYYMMDDHH
order by concat(d.YYYYMMDD, h.hour)
Maybe in Snowflake you can construct the list of hours with a sequence much easier instead of all those UNION ALLs.
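As a sketch of the same cross join with a generated hour list, here is a version for SQLite (3.8.3 or newer) run from Python. It is an illustration under assumptions: the table is named `agg`, and a small recursive CTE replaces the UNION ALL list of hours. Only days that already appear in the table are expanded, as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agg (yyyymmdd TEXT, yyyymmddhh TEXT, data_agg INTEGER)")
conn.executemany("INSERT INTO agg VALUES (?, ?, ?)", [
    ("20191101", "2019110100", 100), ("20191101", "2019110101", 125),
    ("20191101", "2019110103", 135), ("20191101", "2019110105", 95),
    ("20191130", "2019113020", 100), ("20191130", "2019113021", 110),
    ("20191130", "2019113022", 125), ("20191130", "2019113023", 135),
])

day_hours = conn.execute("""
WITH RECURSIVE hours(h) AS (
    -- the integers 0..23
    SELECT 0
    UNION ALL
    SELECT h + 1 FROM hours WHERE h < 23
)
SELECT d.yyyymmdd || printf('%02d', hours.h) AS yyyymmddhh,
       CASE WHEN a.yyyymmddhh IS NULL THEN 0 ELSE 1 END AS hour_exists
FROM (SELECT DISTINCT yyyymmdd FROM agg) AS d
CROSS JOIN hours
LEFT JOIN agg AS a
       ON a.yyyymmddhh = d.yyyymmdd || printf('%02d', hours.h)
ORDER BY 1
""").fetchall()

print(day_hours[:6])
```

A day with no rows at all never shows up in this output, so it suits data where every day has at least one hour loaded.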
This version accounts for the full range of days, across months and years. It's a simple cross join of the set of possible days with the set of possible hours of the day -- left joined to actual dates.
set first = (select min(yyyymmdd::number) from YYYYMMDDHH_DATA_AGG);
set last = (select max(yyyymmdd::number) from YYYYMMDDHH_DATA_AGG);
with
hours as (select row_number() over (order by null) - 1 h from table(generator(rowcount=>24))),
days as (
select
row_number() over (order by null) - 1 as n,
to_date($first::text, 'YYYYMMDD')::date + n as d,
to_char(d, 'YYYYMMDD') as yyyymmdd
from table(generator(rowcount=>($last-$first+1)))
)
select days.yyyymmdd || lpad(hours.h,2,0) as YYYYMMDDHH, nvl2(t.yyyymmddhh,1,0) as HOUR_EXISTS
from days cross join hours
left join YYYYMMDDHH_DATA_AGG t on t.yyyymmddhh = days.yyyymmdd || lpad(hours.h,2,0)
order by 1
;
$first and $last can be packed in as sub-queries if you prefer.

Only select records that do not start within the time frame of another record

I'm trying to achieve the following goal using MS SQL Server 2005 but do not know how to do it.
The goal is to select only records that do not start within the same time period as an anchor record.
Rows that have same ID are a group and evaluated as part of that group.
Start with the earliest date (A) based on StartDate, compare to the next row (B) that has the same ID.
If B starts within A, mark B as invalid. Continue to compare A against all remaining records that have the same ID. Mark any starting within A as invalid.
Flag the next record that does not overlap with A as Valid. Now repeat the same process as above (i.e. check to see if any subsequent records start within the time frame of the new valid record).
Repeat this process until all records have been analyzed.
Example: Create the following table.
if object_id ('tempdb..#Dates') is not null drop table #Dates
create table #Dates (ID int, StartDate datetime, EndDate datetime)
Insert into #Dates
Select 1, '7/23/2003' , '8/22/2003' union all
select 1, '8/21/2003' , '11/19/2003' union all
select 1, '11/18/2003' , '12/18/2003' union all
select 1, '12/17/2003' , '1/16/2004' union all
select 1, '1/15/2004' , '2/14/2004' union all
select 1, '2/11/2004' , '2/26/2004' union all
select 1, '9/14/2004' , '10/14/2004' union all
select 1, '10/5/2004' , '10/20/2004' union all
select 1, '11/20/2004' , '12/20/2004' union all
select 1, '12/19/2004' , '1/18/2005' union all
select 1, '1/12/2005' , '1/27/2005' union all
select 1, '2/27/2005' , '3/11/2005'
Expected output after applying the overlap logic rules:
ID StartDate EndDate Valid
-- --------- --------- -----
1 7/23/2003 8/22/2003 1
1 8/21/2003 11/19/2003 0
1 11/18/2003 12/18/2003 1
1 12/17/2003 1/16/2004 0
1 1/15/2004 2/14/2004 1
1 2/11/2004 2/26/2004 0
1 9/14/2004 10/14/2004 1
1 10/5/2004 10/20/2004 0
1 11/20/2004 12/20/2004 1
1 12/19/2004 1/18/2005 0
1 1/12/2005 1/27/2005 1
1 2/27/2005 3/11/2005 1
I figured out how to answer my own question. Used recursive SQL after ordering the records using row_number.
if object_id ('tempdb..#Dates') is not null drop table #Dates
create table #Dates (ID int, StartDate datetime, EndDate datetime)
Insert into #Dates
Select 1, '7/23/2003' , '8/22/2003' union all
select 1, '8/21/2003' , '11/19/2003' union all
select 1, '11/18/2003' , '12/18/2003' union all
select 1, '12/19/2004' , '1/18/2005' union all
select 1, '1/12/2005' , '1/27/2005' union all
select 1, '2/27/2005' , '3/11/2005' union all
select 1, '12/17/2003' , '1/16/2004' union all
select 1, '1/15/2004' , '2/14/2004' union all
select 1, '2/11/2004' , '2/26/2004' union all
select 1, '9/14/2004' , '10/14/2004' union all
select 1, '10/5/2004' , '10/20/2004' union all
select 1, '11/20/2004' , '12/20/2004'
--Phase 1: Apply ordering to dates
if object_id ('tempdb..#OrderedRecords') is not null drop table #OrderedRecords
select *, N = row_number () over (partition by ID order by StartDate asc, EndDate desc)
into #OrderedRecords
from #Dates
--Phase 2: Apply Overlap Rules (Subsume records that overlap)
;with Subsume (ID, N, StartDate, EndDate, IntermediateStartDate, IntermediateEndDate, Valid) as
(
select ID, N, StartDate, EndDate, IntermediateStartDate = StartDate, IntermediateEndDate = EndDate,
Valid = 1
from #OrderedRecords
where N = 1
UNION ALL
select c.ID, c.N, y.StartDate, y.EndDate,
IntermediateStartDate = case when c.StartDate between y.IntermediateStartDate and y.IntermediateEndDate then y.IntermediateStartDate else c.StartDate end,
IntermediateEndDate = case when c.StartDate between y.IntermediateStartDate and y.IntermediateEndDate then y.IntermediateEndDate else c.EndDate end,
Valid = case when (c.StartDate between y.IntermediateStartDate and y.IntermediateEndDate) then 0 else 1 end
from #OrderedRecords c
join Subsume y
on y.ID = c.ID
and y.N = c.n - 1
and y.IntermediateStartDate >= c.EndDate
UNION ALL
select c.ID, c.N, c.StartDate, c.EndDate,
IntermediateStartDate = case when c.StartDate between y.IntermediateStartDate and y.IntermediateEndDate then y.IntermediateStartDate else c.StartDate end,
IntermediateEndDate = case when c.StartDate between y.IntermediateStartDate and y.IntermediateEndDate then y.IntermediateEndDate else c.EndDate end,
Valid = case when (c.StartDate between y.IntermediateStartDate and y.IntermediateEndDate) then 0 else 1 end
from #OrderedRecords c
join Subsume y
on y.ID = c.ID
and y.N = c.n - 1
and y.IntermediateStartDate < c.EndDate
)
Select ID, StartDate, EndDate, Valid
from Subsume
OPTION (MAXRECURSION 0)
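As a cross-check for the recursive SQL, the rules from the question can be written as a short procedural sketch in Python. This is not the SQL Server solution, only a reference implementation of the anchor logic under the assumption that "starts within" means StartDate lies between the anchor's StartDate and EndDate inclusive, as BETWEEN does in the query; the function name `flag_valid` is made up.

```python
from datetime import date
from itertools import groupby


def flag_valid(records):
    """Take (id, start, end) tuples and return (id, start, end, valid) tuples."""
    # Same ordering as the row_number() step: StartDate ascending, EndDate descending.
    ordered = sorted(records, key=lambda r: (r[0], r[1], -r[2].toordinal()))
    flagged = []
    for _, group in groupby(ordered, key=lambda r: r[0]):
        anchor = None  # (start, end) of the most recent valid record in this ID group
        for rec_id, start, end in group:
            if anchor is not None and anchor[0] <= start <= anchor[1]:
                flagged.append((rec_id, start, end, 0))  # starts inside the anchor
            else:
                flagged.append((rec_id, start, end, 1))  # becomes the new anchor
                anchor = (start, end)
    return flagged


sample = [
    (1, date(2003, 7, 23), date(2003, 8, 22)),
    (1, date(2003, 8, 21), date(2003, 11, 19)),
    (1, date(2003, 11, 18), date(2003, 12, 18)),
    (1, date(2003, 12, 17), date(2004, 1, 16)),
    (1, date(2004, 1, 15), date(2004, 2, 14)),
    (1, date(2004, 2, 11), date(2004, 2, 26)),
    (1, date(2004, 9, 14), date(2004, 10, 14)),
    (1, date(2004, 10, 5), date(2004, 10, 20)),
    (1, date(2004, 11, 20), date(2004, 12, 20)),
    (1, date(2004, 12, 19), date(2005, 1, 18)),
    (1, date(2005, 1, 12), date(2005, 1, 27)),
    (1, date(2005, 2, 27), date(2005, 3, 11)),
]
result = flag_valid(sample)
for row in result:
    print(row)
```

For the sample rows this produces the Valid column from the expected output in the question.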