Combine consecutive date ranges - sql
Using SQL Server 2008 R2,
I'm trying to combine date ranges into the maximum date range given that one end date is next to the following start date.
The data is about different employments. Some employees have ended their employment and rejoined at a later time; those should count as two different employments (example ID 5). Some people have different types of employment running back to back (the end date of one is the day before the start date of the next); in this case it should be considered one employment in total (example ID 30).
An employment period that has not ended has an enddate that is null.
An example is probably enlightening:
declare @t as table (employmentid int, startdate datetime, enddate datetime)
insert into @t values
(5, '2007-12-03', '2011-08-26'),
(5, '2013-05-02', null),
(30, '2006-10-02', '2011-01-16'),
(30, '2011-01-17', '2012-08-12'),
(30, '2012-08-13', null),
(66, '2007-09-24', null)
-- expected outcome
EmploymentId StartDate EndDate
5 2007-12-03 2011-08-26
5 2013-05-02 NULL
30 2006-10-02 NULL
66 2007-09-24 NULL
I've been trying different "islands-and-gaps" techniques but haven't been able to crack this one.
The strange bit you see with my use of the date '32121231' is just a very large date to handle your "no-end-date" scenario. I have assumed you won't really have many date ranges per employee, so I've used a simple Recursive Common Table Expression to combine the ranges.
To make it run faster, the starting anchor query keeps only those dates that will not link up to a prior range (per employee). The rest is just tree-walking the date ranges and growing the range. The final GROUP BY keeps only the largest date range built up per starting ANCHOR (employmentid, startdate) combination.
SQL Fiddle
MS SQL Server 2008 Schema Setup:
create table Tbl (
employmentid int,
startdate datetime,
enddate datetime);
insert Tbl values
(5, '2007-12-03', '2011-08-26'),
(5, '2013-05-02', null),
(30, '2006-10-02', '2011-01-16'),
(30, '2011-01-17', '2012-08-12'),
(30, '2012-08-13', null),
(66, '2007-09-24', null);
/*
-- expected outcome
EmploymentId StartDate EndDate
5 2007-12-03 2011-08-26
5 2013-05-02 NULL
30 2006-10-02 NULL
66 2007-09-24 NULL
*/
Query 1:
;with cte as (
select a.employmentid, a.startdate, a.enddate
from Tbl a
left join Tbl b on a.employmentid=b.employmentid and a.startdate-1=b.enddate
where b.employmentid is null
union all
select a.employmentid, a.startdate, b.enddate
from cte a
join Tbl b on a.employmentid=b.employmentid and b.startdate-1=a.enddate
)
select employmentid,
startdate,
nullif(max(isnull(enddate,'32121231')),'32121231') enddate
from cte
group by employmentid, startdate
order by employmentid
Results:
| EMPLOYMENTID | STARTDATE | ENDDATE |
-----------------------------------------------------------------------------------
| 5 | December, 03 2007 00:00:00+0000 | August, 26 2011 00:00:00+0000 |
| 5 | May, 02 2013 00:00:00+0000 | (null) |
| 30 | October, 02 2006 00:00:00+0000 | (null) |
| 66 | September, 24 2007 00:00:00+0000 | (null) |
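For readers who want to check the anchor-and-grow logic outside the database, here is a minimal Python sketch of the same idea. It is an illustration only, not the query above: the function name `combine_adjacent` and the in-memory row format are assumptions made for this example, and it assumes each start date appears at most once per employment.

```python
from datetime import date, timedelta

FAR_FUTURE = date(3212, 12, 31)  # stand-in for an open-ended (NULL) end date
ONE_DAY = timedelta(days=1)

def combine_adjacent(rows):
    """rows: list of (employmentid, startdate, enddate or None)."""
    # Replace missing end dates with the sentinel so every row has an end.
    filled = [(e, s, FAR_FUTURE if end is None else end) for e, s, end in rows]
    ends = {(e, end) for e, _, end in filled}
    end_by_start = {(e, s): end for e, s, end in filled}

    result = []
    for e, s, end in sorted(filled):
        # Anchor test: skip any range that directly follows a prior range.
        if (e, s - ONE_DAY) in ends:
            continue
        # Grow: keep following the range that starts the day after the current end.
        while end != FAR_FUTURE and (e, end + ONE_DAY) in end_by_start:
            end = end_by_start[(e, end + ONE_DAY)]
        result.append((e, s, None if end == FAR_FUTURE else end))
    return result

sample = [
    (5, date(2007, 12, 3), date(2011, 8, 26)),
    (5, date(2013, 5, 2), None),
    (30, date(2006, 10, 2), date(2011, 1, 16)),
    (30, date(2011, 1, 17), date(2012, 8, 12)),
    (30, date(2012, 8, 13), None),
    (66, date(2007, 9, 24), None),
]
merged = combine_adjacent(sample)
for row in merged:
    print(row)
```

Like the anchor query, the loop only starts from ranges that do not continue an earlier one, so each chain is walked once.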
SET NOCOUNT ON
DECLARE @T TABLE(ID INT,FromDate DATETIME, ToDate DATETIME)
INSERT INTO @T(ID,FromDate,ToDate)
SELECT 1,'20090801','20090803' UNION ALL
SELECT 2,'20090802','20090809' UNION ALL
SELECT 3,'20090805','20090806' UNION ALL
SELECT 4,'20090812','20090813' UNION ALL
SELECT 5,'20090811','20090812' UNION ALL
SELECT 6,'20090802','20090802'
SELECT ROW_NUMBER() OVER(ORDER BY s1.FromDate) AS ID,
s1.FromDate,
MIN(t1.ToDate) AS ToDate
FROM @T s1
INNER JOIN @T t1 ON s1.FromDate <= t1.ToDate
AND NOT EXISTS(SELECT * FROM @T t2
WHERE t1.ToDate >= t2.FromDate
AND t1.ToDate < t2.ToDate)
WHERE NOT EXISTS(SELECT * FROM @T s2
WHERE s1.FromDate > s2.FromDate
AND s1.FromDate <= s2.ToDate)
GROUP BY s1.FromDate
ORDER BY s1.FromDate
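An alternative solution that uses window functions rather than recursive CTEs (the ROWS frame on a windowed SUM requires SQL Server 2012 or later, so it will not run on 2008 R2):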
SELECT
employmentid,
MIN(startdate) as startdate,
NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
FROM (
SELECT
employmentid,
startdate,
enddate,
DATEADD(
DAY,
-COALESCE(
SUM(DATEDIFF(DAY, startdate, enddate)+1) OVER (PARTITION BY employmentid ORDER BY startdate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
0
),
startdate
) as grp
FROM @t
) withGroup
GROUP BY employmentid, grp
ORDER BY employmentid, startdate
This works by calculating a grp value that will be the same for all consecutive rows. This is achieved as follows:
Determine the total number of days each span occupies (+1 because the dates are inclusive)
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
Take the cumulative sum of the days spanned for each employment, ordered by startdate. This gives us the total days spanned by all the previous employment spans
We coalesce with 0 to ensure we don't have NULLs in our cumulative sum of days spanned
We do not include the current row in our cumulative sum because we will use the value against startdate rather than enddate (we can't use it against enddate because of the NULLs)
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
) inner1
Subtract the cumulative days from the startdate to get our grp. This is the crux of the solution.
If the startdate increases at the same rate as the days spanned, then the days are consecutive, and subtracting the two will give us the same value.
If the startdate increases faster than the days spanned, then there is a gap and we will get a new grp value greater than the previous one.
Although grp is a date, the date itself is meaningless; we are using it just as a grouping value
SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
FROM (
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
) inner1
) inner2
With the results
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| employmentid | startdate | enddate | daysSpanned | cumulativeDaysSpanned | grp |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 5 | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 | 1363 | 0 | 2007-12-03 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 5 | 2013-05-02 00:00:00.000 | NULL | NULL | 1363 | 2009-08-08 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2006-10-02 00:00:00.000 | 2011-01-16 00:00:00.000 | 1568 | 0 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2011-01-17 00:00:00.000 | 2012-08-12 00:00:00.000 | 574 | 1568 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2012-08-13 00:00:00.000 | NULL | NULL | 2142 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 66 | 2007-09-24 00:00:00.000 | NULL | NULL | 0 | 2007-09-24 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
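The grp column above can be reproduced with a few lines of Python, which may help when checking the arithmetic by hand. This is only a sketch of the same calculation; the function name `add_grp` and the tuple layout are assumptions for the example, and like the query it assumes the ranges of one employment do not overlap.

```python
from datetime import date, timedelta

def add_grp(rows):
    """rows: list of (employmentid, startdate, enddate or None)."""
    out = []
    days_before = {}  # employmentid -> days spanned by all earlier rows
    for emp, start, end in sorted(rows, key=lambda r: (r[0], r[1])):
        before = days_before.get(emp, 0)        # COALESCE(..., 0)
        grp = start - timedelta(days=before)    # DATEADD(DAY, -cumulative, startdate)
        out.append((emp, start, end, grp))
        if end is not None:                     # a NULL span adds nothing to the sum
            days_before[emp] = before + (end - start).days + 1
    return out

sample = [
    (5, date(2007, 12, 3), date(2011, 8, 26)),
    (5, date(2013, 5, 2), None),
    (30, date(2006, 10, 2), date(2011, 1, 16)),
    (30, date(2011, 1, 17), date(2012, 8, 12)),
    (30, date(2012, 8, 13), None),
    (66, date(2007, 9, 24), None),
]
with_grp = add_grp(sample)
for emp, start, end, grp in with_grp:
    print(emp, start, end, grp)
```

All three rows of employment 30 end up with the same grp (2006-10-02), while the second row of employment 5 gets a different one because of the gap.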
Finally we can GROUP BY grp to collapse the consecutive rows.
Use MIN and MAX to get the new startdate and enddate
To handle the NULL enddate we give those rows a large value so they get picked up by MAX, then convert them back to NULL again
SELECT
employmentid,
MIN(startdate) as startdate,
NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
FROM (
SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
FROM (
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
) inner1
) inner2
) inner3
GROUP BY employmentid, grp
ORDER BY employmentid, startdate
To get the desired result
+--------------+-------------------------+-------------------------+
| employmentid | startdate | enddate |
+--------------+-------------------------+-------------------------+
| 5 | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 |
+--------------+-------------------------+-------------------------+
| 5 | 2013-05-02 00:00:00.000 | NULL |
+--------------+-------------------------+-------------------------+
| 30 | 2006-10-02 00:00:00.000 | NULL |
+--------------+-------------------------+-------------------------+
| 66 | 2007-09-24 00:00:00.000 | NULL |
+--------------+-------------------------+-------------------------+
We can combine the inner queries to get the query at the start of this answer, which is shorter but harder to explain.
Limitations: this approach requires that
there are no overlaps of startdate and enddate for an employment. Overlaps could produce collisions in our grp.
startdate is not NULL. However, this could be overcome by replacing NULL start dates with small date values
future developers can decipher the window black magic you performed
A modified script for combining all overlapping periods. For example
01.01.2001-01.01.2010
05.05.2005-05.05.2015
will give one period:
01.01.2001-05.05.2015
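tbl.enddate must be filled in for every row; the query below treats 31 December 2099 as the placeholder for an open-ended period.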
;WITH cte
AS(
-- anchor: periods whose start is not covered by (or adjacent to) an earlier period
SELECT
a.employmentid
,a.startdate
,a.enddate
from tbl a
left join tbl c on a.employmentid=c.employmentid
and a.startdate > c.startdate
and a.startdate <= dateadd(day, 1, c.enddate)
WHERE c.employmentid IS NULL
UNION all
-- recursion: extend the end date with any later period that overlaps or touches it
SELECT
a.employmentid
,a.startdate
,c.enddate
from cte a
inner join tbl c on a.employmentid=c.employmentid
and c.startdate > a.startdate
and c.startdate <= dateadd(day, 1, a.enddate)
and c.enddate > a.enddate
)
select employmentid,
startdate,
nullif(max(enddate),'20991231') enddate -- '20991231' is read the same under any DATEFORMAT
from cte
group by employmentid, startdate
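As a cross-check for the merged periods, the same result can be computed with a plain sort-and-sweep pass in Python. Note that this is a different technique from the recursive CTE, shown only as a sketch; the name `merge_periods` and the tuple layout are assumptions for the example, and every end date is assumed to be filled in.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def merge_periods(rows):
    """rows: list of (employmentid, startdate, enddate), no missing end dates."""
    merged = []
    for emp, start, end in sorted(rows):
        last = merged[-1] if merged else None
        # Same employment and the period overlaps or touches the previous one.
        if last and last[0] == emp and start <= last[2] + ONE_DAY:
            merged[-1] = (emp, last[1], max(last[2], end))
        else:
            merged.append((emp, start, end))
    return merged

periods = [
    (1, date(2001, 1, 1), date(2010, 1, 1)),
    (1, date(2005, 5, 5), date(2015, 5, 5)),
    (1, date(2015, 5, 6), date(2016, 1, 1)),
    (1, date(2020, 1, 1), date(2020, 12, 31)),
    (2, date(2003, 1, 1), date(2004, 1, 1)),
]
combined = merge_periods(periods)
for row in combined:
    print(row)
```

The first two rows are the example from the text; the third shows that a period starting the day after the previous end is merged as well.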
Related
SQL merge overlapping time intervals
I have a SQLite dataset like this
Startdate | Enddate | ID
2019-04-29 | 2019-05-04 | 12
2019-04-23 | 2019-04-25 | 533
2019-04-23 | 2019-04-24 | 44
2019-04-24 | 2019-04-25 | 79
I'm trying to get an output that is sorted by range, from the start date to the start date plus one day, in a loop until the last Enddate. The plan is to get all observations from min(Startdate) to min(Startdate) +1, then min(Startdate) +2, and so on
Range | ID
2019-04-23 2019-04-24 | 44
2019-04-23 2019-04-25 | 44
2019-04-23 2019-04-25 | 533
2019-04-23 2019-04-25 | 79
2019-04-23 2019-05-04 | 44
2019-04-23 2019-05-04 | 533
2019-04-23 2019-05-04 | 79
2019-04-23 2019-05-04 | 12
I'm not sure how to achieve this
I believe that the following will provide the results that you want :-
WITH cte(sdate,edate) AS (
SELECT (SELECT min(startdate) FROM mytable),(SELECT min(startdate) FROM mytable)
UNION ALL
SELECT sdate,date(edate,'+1 days') FROM cte
WHERE edate <= ( SELECT max(enddate) FROM mytable )
LIMIT 1000 /* just in case limit to 1000 rows */
)
SELECT sdate||' '||edate AS Range, id
FROM cte JOIN mytable ON startdate >= sdate AND enddate <= edate AND edate IN (SELECT enddate FROM mytable)
Example/Test/Demo
DROP TABLE IF EXISTS mytable;
CREATE TABLE IF NOT EXISTS mytable (startdate TEXT, enddate TEXT, id INTEGER PRIMARY KEY);
INSERT INTO mytable VALUES
('2019-04-29','2019-05-04',12),
('2019-04-23','2019-04-25',533),
('2019-04-23','2019-04-24',44),
('2019-04-24','2019-04-25',79)
;
WITH cte(sdate,edate) AS (
SELECT (SELECT min(startdate) FROM mytable),(SELECT min(startdate) FROM mytable)
UNION ALL
SELECT sdate,date(edate,'+1 days') FROM cte
WHERE edate <= ( SELECT max(enddate) FROM mytable )
LIMIT 1000 /* just in case limit to 1000 rows */
)
SELECT sdate||' '||edate AS Range, id
FROM cte JOIN mytable ON startdate >= sdate AND enddate <= edate AND edate IN (SELECT enddate FROM mytable)
;
DROP TABLE IF EXISTS mytable;
The result is :-
The only difference being the order of the ID column (I can't see any order from your expected result)
As per the comment LIMIT 1000 is not required
How to return same row multiple times with multiple conditions
My knowledge is pretty basic so your help would be highly appreciated. I'm trying to return the same row multiple times when it meets the condition (I only have access to select queries).
I have a table of more than 500000 records with Customer ID, Start Date and End Date, where the end date could be null. I am trying to add a new column called Week_No and list all rows accordingly. For example, if the date range is more than one week, then the row must be returned multiple times with the corresponding week number. I would also like to count overlapping days, which will never be more than 7 (a week) per row, and then count unavailable days using a second table.
Sample data below
t1
ID | Start_Date | End_Date
000001 | 12/12/2017 | 03/01/2018
000002 | 13/01/2018 |
000003 | 02/01/2018 | 11/01/2018
...
t2
ID | Unavailable
000002 | 14/01/2018
000003 | 03/01/2018
000003 | 04/01/2018
000003 | 08/01/2018
...
I cannot get past the stage of adding the week number. I have tried using CASE and UNION ALL but keep getting errors.
declare @week01start datetime = '2018-01-01 00:00:00'
declare @week01end datetime = '2018-01-07 00:00:00'
declare @week02start datetime = '2018-01-08 00:00:00'
declare @week02end datetime = '2018-01-14 00:00:00'
...
SELECT ID, '01' as Week_No, '2018' as YEAR, Start_Date, End_Date
FROM t1
WHERE (Start_Date <= @week01end and End_Date >= @week01start)
or (Start_Date <= @week01end and End_Date is null)
UNION ALL
SELECT ID, '02' as Week_No, '2018' as YEAR, Start_Date, End_Date
FROM t1
WHERE (Start_Date <= @week02end and End_Date >= @week02start)
or (Start_Date <= @week02end and End_Date is null)
...
The new table should look like this
ID | Week_No | Year | Start_Date | End_Date | Overlap | Unavail_Days
000001 | 01 | 2018 | 12/12/2017 | 03/01/2018 | 3 |
000002 | 02 | 2018 | 13/01/2018 | | 2 | 1
000003 | 01 | 2018 | 02/01/2018 | 11/01/2018 | 6 | 2
000003 | 02 | 2018 | 02/01/2018 | 11/01/2018 | 4 | 1
...
Business-wise I cannot understand what you are trying to achieve. You can use the following code, though, to calculate your overlapping days etc. I did it the way you asked, but I would recommend a separate table, like a Time dimension, to produce a "cleaner" solution
/*sample data set in temp table*/
select '000001' as id, '2017-12-12' as start_dt, '2018-01-03' as end_dt into #tmp
union
select '000002' as id, '2018-01-13' as start_dt, null as end_dt
union
select '000003' as id, '2018-01-02' as start_dt, '2018-01-11' as end_dt
/*calculate week numbers and week diff according to dates*/
select *,
DATEPART(WK,start_dt) as start_weekNumber,
DATEPART(WK,end_dt) as end_weekNumber,
case
when DATEPART(WK,end_dt) - DATEPART(WK,start_dt) > 0 then (DATEPART(WK,end_dt) - DATEPART(WK,start_dt)) +1
else (52 - DATEPART(WK,start_dt)) + DATEPART(WK,end_dt)
end as WeekDiff
into #tmp1
from
(
SELECT *,
DATEADD(DAY, 2 - DATEPART(WEEKDAY, start_dt), CAST(start_dt AS DATE)) [start_dt_Week_Start_Date],
DATEADD(DAY, 8 - DATEPART(WEEKDAY, start_dt), CAST(start_dt AS DATE)) [startdt_Week_End_Date],
DATEADD(DAY, 2 - DATEPART(WEEKDAY, end_dt), CAST(end_dt AS DATE)) [end_dt_Week_Start_Date],
DATEADD(DAY, 8 - DATEPART(WEEKDAY, end_dt), CAST(end_dt AS DATE)) [end_dt_Week_End_Date]
from #tmp
) s
/*cte used to create duplicates when week diff is over 1*/
;with x as
(
SELECT TOP (10) rn = ROW_NUMBER() --modify the max you want
OVER (ORDER BY [object_id])
FROM sys.all_columns
ORDER BY [object_id]
)
/*final query*/
select --*
ID,
start_weekNumber+ (r-1) as Week,
DATEPART(YY,start_dt) as [YEAR],
start_dt,
end_dt,
null as Overlap,
null as unavailable_days
from
(
select *,
ROW_NUMBER() over (partition by id order by id) r
from
(
select d.* from x
CROSS JOIN #tmp1 AS d
WHERE x.rn <= d.WeekDiff
union all
select * from #tmp1
where WeekDiff is null
) a
)a_ext
order by id,start_weekNumber
--drop table #tmp1,#tmp
The above will produce the results you want except the overlap and unavailable columns.
Instead of just counting weeks, I added the number of the week in the year using start_dt, but you can change that if you don't like it:
ID | Week | YEAR | start_dt | end_dt | Overlap | unavailable_days
000001 | 50 | 2017 | 2017-12-12 | 2018-01-03 | NULL | NULL
000001 | 51 | 2017 | 2017-12-12 | 2018-01-03 | NULL | NULL
000001 | 52 | 2017 | 2017-12-12 | 2018-01-03 | NULL | NULL
000002 | 2 | 2018 | 2018-01-13 | NULL | NULL | NULL
000003 | 1 | 2018 | 2018-01-02 | 2018-01-11 | NULL | NULL
000003 | 2 | 2018 | 2018-01-02 | 2018-01-11 | NULL | NULL
Date range with minimum and maximum dates from dataset having records with continuous date range
I have a dataset with the id, status and date range of employees. The input dataset given below shows the details of one employee. The date ranges in the records are continuous (in exact order), such that the start date of the second row is the day after the end date of the first row.
If an employee takes leave continuously across different months, the table stores the info with the date range split by month. For example: in the input set, the employee took Sick leave from '16-10-2016' to '31-12-2016' and rejoined on '1-1-2017'. So there are 3 records for this item, but the dates are continuous. In the output I need this as one record, as shown in the expected output dataset.
INPUT
Id | Status | StartDate | EndDate
1 | Active | 1-9-2007 | 15-10-2016
1 | Sick | 16-10-2016 | 31-10-2016
1 | Sick | 1-11-2016 | 30-11-2016
1 | Sick | 1-12-2016 | 31-12-2016
1 | Active | 1-1-2017 | 4-2-2017
1 | Unpaid | 5-2-2017 | 9-2-2017
1 | Active | 10-2-2017 | 11-2-2017
1 | Unpaid | 12-2-2017 | 28-2-2017
1 | Unpaid | 1-3-2017 | 31-3-2017
1 | Unpaid | 1-4-2017 | 30-4-2017
1 | Active | 1-5-2017 | 13-10-2017
1 | Sick | 14-10-2017 | 11-11-2017
1 | Active | 12-11-2017 | NULL
EXPECTED OUTPUT
Id | Status | StartDate | EndDate
1 | Active | 1-9-2007 | 15-10-2016
1 | Sick | 16-10-2016 | 31-12-2016
1 | Active | 1-1-2017 | 4-2-2017
1 | Unpaid | 5-2-2017 | 9-2-2017
1 | Active | 10-2-2017 | 11-2-2017
1 | Unpaid | 12-2-2017 | 30-4-2017
1 | Active | 1-5-2017 | 13-10-2017
1 | Sick | 14-10-2017 | 11-11-2017
1 | Active | 12-11-2017 | NULL
I can't take min(StartDate) and max(EndDate) grouped by id and status, because if the same employee has taken another Sick leave then that end date ('11-11-2017' in the example) will come out as the end date. Can anyone help me with the query in SQL Server 2014?
It suddenly hit me that this is basically a gaps and islands problem, so I've completely changed my solution. For this solution to work, the dates do not have to be consecutive.
First, create and populate sample table (Please save us this step in your future questions):
DECLARE @T AS TABLE
(
Id int,
Status varchar(10),
StartDate date,
EndDate date
);
SET DATEFORMAT DMY; -- This is needed because of how you specified your dates.
INSERT INTO @T (Id, Status, StartDate, EndDate) VALUES
(1, 'Active', '1-9-2007', '15-10-2016'),
(1, 'Sick', '16-10-2016', '31-10-2016'),
(1, 'Sick', '1-11-2016', '30-11-2016'),
(1, 'Sick', '1-12-2016', '31-12-2016'),
(1, 'Active', '1-1-2017', '4-2-2017'),
(1, 'Unpaid', '5-2-2017', '9-2-2017'),
(1, 'Active', '10-2-2017', '11-2-2017'),
(1, 'Unpaid', '12-2-2017', '28-2-2017'),
(1, 'Unpaid', '1-3-2017', '31-3-2017'),
(1, 'Unpaid', '1-4-2017', '30-4-2017'),
(1, 'Active', '1-5-2017', '13-10-2017'),
(1, 'Sick', '14-10-2017', '11-11-2017'),
(1, 'Active', '12-11-2017', NULL);
The (new) common table expression:
;WITH CTE AS
(
SELECT Id,
Status,
StartDate,
EndDate,
ROW_NUMBER() OVER(PARTITION BY Id ORDER BY StartDate)
- ROW_NUMBER() OVER(PARTITION BY Id, Status ORDER BY StartDate) As IslandId,
ROW_NUMBER() OVER(PARTITION BY Id ORDER BY StartDate DESC)
- ROW_NUMBER() OVER(PARTITION BY Id, Status ORDER BY StartDate DESC) As ReverseIslandId
FROM @T
)
The (new) query:
SELECT DISTINCT Id,
Status,
MIN(StartDate) OVER(PARTITION BY IslandId, ReverseIslandId) As StartDate,
NULLIF(MAX(ISNULL(EndDate, '9999-12-31')) OVER(PARTITION BY IslandId, ReverseIslandId), '9999-12-31') As EndDate
FROM CTE
ORDER BY StartDate
(new) Results:
Id | Status | StartDate | EndDate
1 | Active | 01.09.2007 | 15.10.2016
1 | Sick | 16.10.2016 | 31.12.2016
1 | Active | 01.01.2017 | 04.02.2017
1 | Unpaid | 05.02.2017 | 09.02.2017
1 | Active | 10.02.2017 | 11.02.2017
1 | Unpaid | 12.02.2017 | 30.04.2017
1 | Active | 01.05.2017 | 13.10.2017
1 | Sick | 14.10.2017 | 11.11.2017
1 | Active | 12.11.2017 | NULL
You can see a live demo on rextester.
Please note that string representation of dates in SQL should be according to ISO 8601, meaning either yyyy-MM-dd or yyyyMMdd, as it's unambiguous and will always be interpreted correctly by SQL Server.
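To see why a difference of two row numbers identifies an island, here is a small Python sketch under simplifying assumptions: one employee only, and the island key is taken as the status plus a single forward difference rather than the forward and reverse pair used in the query. The name `find_islands` and the tuple layout are made up for the example.

```python
from collections import defaultdict
from datetime import date

def find_islands(rows):
    """rows: list of (status, startdate, enddate or None) for one employee."""
    ordered = sorted(rows, key=lambda r: r[1])
    per_status = defaultdict(int)  # row number within each status
    islands = {}                   # (status, difference) -> [start, end]
    for rn, (status, start, end) in enumerate(ordered, start=1):
        per_status[status] += 1
        # Overall row number minus per-status row number stays constant
        # for as long as the status does not change.
        key = (status, rn - per_status[status])
        if key in islands:
            islands[key][1] = end  # rows arrive in date order, so the last end wins
        else:
            islands[key] = [start, end]
    return [(status, start, end) for (status, _), (start, end) in islands.items()]

history = [
    ("Active", date(2007, 9, 1), date(2016, 10, 15)),
    ("Sick", date(2016, 10, 16), date(2016, 10, 31)),
    ("Sick", date(2016, 11, 1), date(2016, 11, 30)),
    ("Sick", date(2016, 12, 1), date(2016, 12, 31)),
    ("Active", date(2017, 1, 1), date(2017, 10, 13)),
    ("Sick", date(2017, 10, 14), date(2017, 11, 11)),
    ("Active", date(2017, 11, 12), None),
]
collapsed = find_islands(history)
for row in collapsed:
    print(row)
```

The three consecutive Sick rows share the key ("Sick", 1), while the later Sick row gets ("Sick", 2), so the two sick leaves stay separate.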
It's an example of GROUPING AND WINDOW.
First you set a reset point for each Status
Sum to set a group
Then get max/min dates of each group.
;with x as
(
select Id, Status, StartDate, EndDate,
iif (lag(Status) over (order by Id, StartDate) = Status, null, 1) rst
from emp
), y as
(
select Id, Status, StartDate, EndDate,
sum(rst) over (order by Id, StartDate) grp
from x
)
select Id,
MIN(Status) as Status,
MIN(StartDate) StartDate,
MAX(EndDate) EndDate
from y
group by Id, grp
order by Id, grp
GO
Id | Status | StartDate | EndDate
-: | :----- | :------------------ | :------------------
1 | Active | 01/09/2007 00:00:00 | 15/10/2016 00:00:00
1 | Sick | 16/10/2016 00:00:00 | 31/12/2016 00:00:00
1 | Active | 01/01/2017 00:00:00 | 04/02/2017 00:00:00
1 | Unpaid | 05/02/2017 00:00:00 | 09/02/2017 00:00:00
1 | Active | 10/02/2017 00:00:00 | 11/02/2017 00:00:00
1 | Unpaid | 12/02/2017 00:00:00 | 30/04/2017 00:00:00
1 | Active | 01/05/2017 00:00:00 | 13/10/2017 00:00:00
1 | Sick | 14/10/2017 00:00:00 | 11/11/2017 00:00:00
1 | Active | 12/11/2017 00:00:00 | null
dbfiddle here
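The reset-flag and running-sum steps can be traced in Python with `itertools.accumulate`. This is a rough sketch for a single Id with rows already sorted by start date; the function name `collapse` is an assumption for the example, and dates are kept as ISO strings to keep it short.

```python
from itertools import accumulate

def collapse(rows):
    """rows: list of (status, start, end) for one Id, sorted by start."""
    # rst: 1 where the status differs from the previous row (the LAG test), else 0
    rst = [1 if i == 0 or rows[i][0] != rows[i - 1][0] else 0
           for i in range(len(rows))]
    grp = list(accumulate(rst))  # running SUM(rst) gives the group number
    groups = {}
    for g, (status, start, end) in zip(grp, rows):
        if g in groups:
            groups[g][2] = end   # later row of the same group extends the end
        else:
            groups[g] = [status, start, end]
    return grp, [tuple(v) for v in groups.values()]

rows = [
    ("Active", "2007-09-01", "2016-10-15"),
    ("Sick", "2016-10-16", "2016-10-31"),
    ("Sick", "2016-11-01", "2016-11-30"),
    ("Sick", "2016-12-01", "2016-12-31"),
    ("Active", "2017-01-01", None),
]
grp, result = collapse(rows)
print(grp)
for row in result:
    print(row)
```

Unlike MAX(EndDate) in the query, the sketch takes the end of the last row in each group, so an open-ended last row keeps its missing end date.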
Here's an alternative answer that doesn't use LAG.
First I need to take a copy of your test data:
DECLARE @table TABLE (Id INT, [Status] VARCHAR(50), StartDate DATE, EndDate DATE);
INSERT INTO @table SELECT 1, 'Active', '20070901', '20161015';
INSERT INTO @table SELECT 1, 'Sick', '20161016', '20161031';
INSERT INTO @table SELECT 1, 'Sick', '20161101', '20161130';
INSERT INTO @table SELECT 1, 'Sick', '20161201', '20161231';
INSERT INTO @table SELECT 1, 'Active', '20170101', '20170204';
INSERT INTO @table SELECT 1, 'Unpaid', '20170205', '20170209';
INSERT INTO @table SELECT 1, 'Active', '20170210', '20170211';
INSERT INTO @table SELECT 1, 'Unpaid', '20170212', '20170228';
INSERT INTO @table SELECT 1, 'Unpaid', '20170301', '20170331';
INSERT INTO @table SELECT 1, 'Unpaid', '20170401', '20170430';
INSERT INTO @table SELECT 1, 'Active', '20170501', '20171013';
INSERT INTO @table SELECT 1, 'Sick', '20171014', '20171111';
INSERT INTO @table SELECT 1, 'Active', '20171112', NULL;
Then the query is:
WITH add_order AS (
SELECT
*,
ROW_NUMBER() OVER (ORDER BY StartDate) AS order_id
FROM
@table),
links AS (
SELECT
a1.Id,
a1.[Status],
a1.order_id,
MIN(a1.order_id) AS start_order_id,
MAX(ISNULL(a2.order_id, a1.order_id)) AS end_order_id,
MIN(a1.StartDate) AS StartDate,
MAX(ISNULL(a2.EndDate, a1.EndDate)) AS EndDate
FROM
add_order a1
LEFT JOIN add_order a2 ON a2.Id = a1.Id AND a2.[Status] = a1.[Status] AND a2.order_id = a1.order_id + 1 AND a2.StartDate = DATEADD(DAY, 1, a1.EndDate)
GROUP BY
a1.Id,
a1.[Status],
a1.order_id),
merged AS (
SELECT
l1.Id,
l1.[Status],
l1.[StartDate],
ISNULL(l2.EndDate, l1.EndDate) AS EndDate,
ROW_NUMBER() OVER (PARTITION BY l1.Id, l1.[Status], ISNULL(l2.EndDate, l1.EndDate) ORDER BY l1.order_id) AS link_id
FROM
links l1
LEFT JOIN links l2 ON l2.order_id = l1.end_order_id)
SELECT
Id,
[Status],
StartDate,
EndDate
FROM
merged
WHERE
link_id = 1
ORDER BY
StartDate;
Results are:
Id | Status | StartDate | EndDate
1 | Active | 2007-09-01 | 2016-10-15
1 | Sick | 2016-10-16 | 2016-12-31
1 | Active | 2017-01-01 | 2017-02-04
1 | Unpaid | 2017-02-05 | 2017-02-09
1 | Active | 2017-02-10 | 2017-02-11
1 | Unpaid | 2017-02-12 | 2017-04-30
1 | Active | 2017-05-01 | 2017-10-13
1 | Sick | 2017-10-14 | 2017-11-11
1 | Active | 2017-11-12 | NULL
How does it work? First I add a sequence number, to assist with merging contiguous rows together. Then I determine the rows that can be merged together, add a number to identify the first row in each set that can be merged, and finally pick the first rows out of the final CTE. Note that I also have to handle rows that can't be merged, hence the LEFT JOINs and ISNULL statements.
Just for interest, this is what the output from the final CTE looks like, before I filter out all but the rows with a link_id of 1:
Id | Status | StartDate | EndDate | link_id
1 | Active | 2007-09-01 | 2016-10-15 | 1
1 | Sick | 2016-10-16 | 2016-12-31 | 1
1 | Sick | 2016-11-01 | 2016-12-31 | 2
1 | Sick | 2016-12-01 | 2016-12-31 | 3
1 | Active | 2017-01-01 | 2017-02-04 | 1
1 | Unpaid | 2017-02-05 | 2017-02-09 | 1
1 | Active | 2017-02-10 | 2017-02-11 | 1
1 | Unpaid | 2017-02-12 | 2017-04-30 | 1
1 | Unpaid | 2017-03-01 | 2017-04-30 | 2
1 | Unpaid | 2017-04-01 | 2017-04-30 | 3
1 | Active | 2017-05-01 | 2017-10-13 | 1
1 | Sick | 2017-10-14 | 2017-11-11 | 1
1 | Active | 2017-11-12 | NULL | 1
You could use the lag() and lead() functions together to check the previous and next status
WITH CTE AS
(
select *,
COALESCE(LEAD(status) OVER(ORDER BY (select 1)), '0') Nstatus,
COALESCE(LAG(status) OVER(ORDER BY (select 1)), '0') Pstatus
from table
)
SELECT * FROM CTE
WHERE (status <> Nstatus AND status <> Pstatus) OR
(status <> Pstatus)
Impala SQL: Merging rows with overlapping dates. WHERE EXISTS and recursive CTE not supported
I am trying to merge rows with overlapping date intervals in a table in Impala SQL. However, the solutions I have found to solve this are not supported by Impala, e.g. WHERE EXISTS and recursive CTEs.
How would I write a query for this in Impala?
Table: @T
ID | StartDate | EndDate
1 | 20170101 | 20170201
2 | 20170101 | 20170401
3 | 20170505 | 20170531
4 | 20170530 | 20170531
5 | 20170530 | 20170831
6 | 20171001 | 20171005
7 | 20171101 | 20171225
8 | 20171105 | 20171110
Required Output:
StartDate | EndDate
20170101 | 20170401
20170505 | 20170831
20171001 | 20171005
Example of what I am trying to achieve that is not supported in Impala:
SELECT
s1.StartDate,
MIN(t1.EndDate) AS EndDate
FROM @T s1
INNER JOIN @T t1 ON s1.StartDate <= t1.EndDate
AND NOT EXISTS(SELECT * FROM @T t2
WHERE t1.EndDate >= t2.StartDate AND t1.EndDate < t2.EndDate)
WHERE NOT EXISTS(SELECT * FROM @T s2
WHERE s1.StartDate > s2.StartDate AND s1.StartDate <= s2.EndDate)
GROUP BY s1.StartDate
ORDER BY s1.StartDate
Similar questions:
Merge overlapping date intervals
Eliminate and reduce overlapping date ranges
https://gerireshef.wordpress.com/2010/05/02/packing-date-intervals/
https://www.sqlservercentral.com/Forums/Topic826031-8-1.aspx
select min(StartDate) as StartDate
,max(EndDate) as EndDate
from (select StartDate,EndDate
,count (is_gap) over
(
order by StartDate,ID
) as range_id
from (select ID,StartDate,EndDate
,case
when max (EndDate) over
(
order by StartDate,ID
rows between unbounded preceding
and 1 preceding
) < StartDate
then true
end as is_gap
from t
) t
) t
group by range_id
order by StartDate
;
+------------+------------+
| startdate | enddate |
+------------+------------+
| 2017-01-01 | 2017-04-01 |
| 2017-05-05 | 2017-08-31 |
| 2017-10-01 | 2017-10-05 |
| 2017-11-01 | 2017-12-25 |
+------------+------------+
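The three layers of that query (running maximum of earlier end dates, gap flag, running count as range id) can be followed step by step in Python. This is only an illustrative sketch; the function name `pack` is an assumption, and the dates are kept as yyyymmdd integers as in the question.

```python
from itertools import accumulate

def pack(rows):
    """rows: list of (id, startdate, enddate) with dates as yyyymmdd integers."""
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))    # ORDER BY StartDate, ID
    ends = [r[2] for r in ordered]
    # Max EndDate over all preceding rows; None for the first row (empty frame).
    prev_max = [None] + list(accumulate(ends, max))[:-1]
    # is_gap: 1 when every earlier interval ended before this one starts.
    is_gap = [1 if pm is not None and pm < r[1] else 0
              for pm, r in zip(prev_max, ordered)]
    range_id = list(accumulate(is_gap))                   # running count of gaps
    ranges = {}
    for rid, (_, start, end) in zip(range_id, ordered):
        lo, hi = ranges.get(rid, (start, end))
        ranges[rid] = (min(lo, start), max(hi, end))      # MIN/MAX per range_id
    return [ranges[k] for k in sorted(ranges)]

intervals = [
    (1, 20170101, 20170201),
    (2, 20170101, 20170401),
    (3, 20170505, 20170531),
    (4, 20170530, 20170531),
    (5, 20170530, 20170831),
    (6, 20171001, 20171005),
    (7, 20171101, 20171225),
    (8, 20171105, 20171110),
]
packed = pack(intervals)
for row in packed:
    print(row)
```

For this data the range ids come out as 0, 0, 1, 1, 1, 2, 3, 3, which gives the same four rows as the query output above.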
Query to return all the days of a month
This problem is related to this, which has no solution in sight: here
I have a table that shows me all sessions of an area.
This session has a start date.
I need to get all the days of the month of the start date of the session for a specific area (in this case).
I have this query:
SELECT idArea, idSession, startDate FROM SessionsPerArea WHERE idArea = 1
idArea | idSession | startDate |
1 | 1 | 01-01-2013 |
1 | 2 | 04-01-2013 |
1 | 3 | 07-02-2013 |
And I want something like this:
date | Session |
01-01-2013 | 1 |
02-01-2013 | NULL |
03-01-2013 | NULL |
04-01-2013 | 1 |
........ | |
29-01-2013 | NULL |
30-01-2013 | NULL |
In this case, the table returns all the days of January.
The second column is the number of sessions that occur on that day, because there may be several sessions on the same day.
Can anyone help me?
Please try:
DECLARE @SessionsPerArea TABLE (idArea INT, idSession INT, startDate DATEtime)
INSERT @SessionsPerArea VALUES (1,1,'2013-01-01')
INSERT @SessionsPerArea VALUES (1,2,'2013-01-04')
INSERT @SessionsPerArea VALUES (1,3,'2013-07-02')
DECLARE @RepMonth as datetime
SET @RepMonth = '01/01/2013';
WITH DayList (DayDate) AS
(
SELECT @RepMonth
UNION ALL
SELECT DATEADD(d, 1, DayDate)
FROM DayList
WHERE (DayDate < DATEADD(d, -1, DATEADD(m, 1, @RepMonth)))
)
SELECT *
FROM DayList t1 left join @SessionsPerArea t2 on t1.DayDate=startDate and t2.idArea = 1
This will work:
DECLARE @SessionsPerArea TABLE (idArea INT, idSession INT, startDate DATE)
INSERT @SessionsPerArea VALUES
(1,1,'2013-01-01'),
(1,2,'2013-01-04'),
(1,3,'2013-07-02')
;WITH t1 AS
(
SELECT startDate
, DATEADD(MONTH, DATEDIFF(MONTH, '1900-01-01', startDate), '1900-01-01') firstInMonth
, DATEADD(DAY, -1, DATEADD(MONTH, DATEDIFF(MONTH, '1900-01-01', startDate) + 1, '1900-01-01')) lastInMonth
, COUNT(*) cnt
FROM @SessionsPerArea
WHERE idArea = 1
GROUP BY
startDate
)
, calendar AS
(
SELECT DISTINCT DATEADD(DAY, c.number, t1.firstInMonth) d
FROM t1
JOIN master..spt_values c ON
type = 'P'
AND DATEADD(DAY, c.number, t1.firstInMonth) BETWEEN t1.firstInMonth AND t1.lastInMonth
)
SELECT d date
, cnt Session
FROM calendar c
LEFT JOIN t1 ON t1.startDate = c.d
It uses a simple join on the master..spt_values table to generate rows.
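The underlying idea in both answers is to generate one row per day of the month and left-join the session counts onto it. A minimal Python sketch of that idea follows; the function name `sessions_per_day` is an assumption for the example, and it handles a single month chosen by the caller rather than every month that has a session.

```python
import calendar
from collections import Counter
from datetime import date, timedelta

def sessions_per_day(start_dates, year, month):
    """Return (day, session count or None) for every day of the given month."""
    counts = Counter(start_dates)
    days_in_month = calendar.monthrange(year, month)[1]
    first = date(year, month, 1)
    days = [first + timedelta(days=i) for i in range(days_in_month)]
    # counts.get returns None for days without sessions, like the LEFT JOIN's NULL.
    return [(d, counts.get(d)) for d in days]

sessions = [date(2013, 1, 1), date(2013, 1, 4), date(2013, 2, 7)]
january = sessions_per_day(sessions, 2013, 1)
for day, count in january[:4]:
    print(day, count)
```

Days without a session come back with None, and a day with several sessions gets their count.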
Just an example of a calendar table. To return data for a month, adjust the row limit (r < 32 below); for a year, use 365+1. You can calculate the number of days in a month or between start/end dates with a query; I'm not sure how to do this in SQL Server. I'm using hardcoded values to display all dates in Jan-2013. You can adjust the start and end dates for a different month, or get the start/end dates with queries:
WITH data(r, start_date) AS
(
SELECT 1 r, date '2012-12-31' start_date FROM any_table --dual in Oracle
UNION ALL
SELECT r+1, date '2013-01-01'+r-1 FROM data WHERE r < 32 -- number of days between start and end date+1
)
SELECT start_date FROM data WHERE r > 1
/
START_DATE
----------
1/1/2013
1/2/2013
1/3/2013
...
...
1/31/2013