Combining records with overlapping date ranges in SQL

EDIT: Our current server is SQL Server 2008 R2, so the LAG/LEAD functions will not work.
I'm attempting to take multiple streams of data within a table and combine them into one stream. Given the three streams of data below, I want the end result to be one stream that gives preference to the status 'on'. Recursion seems to be the best option, but I've had no luck so far putting together a query that does what I want.
CREATE TABLE #Dates(
id INT IDENTITY,
status VARCHAR(4),
StartDate Datetime,
EndDate Datetime,
booth int)
INSERT #Dates
VALUES
( 'off','2015-01-01 08:00','2015-01-01 08:15',1),
( 'on','2015-01-01 08:15','2015-01-01 09:15',1),
( 'off','2015-01-01 08:50','2015-01-01 09:00',2),
( 'on','2015-01-01 09:00','2015-01-01 09:30',2),
( 'off','2015-01-01 09:30','2015-01-01 09:35',2),
( 'on','2015-01-01 09:35','2015-01-01 10:15',2),
( 'off','2015-01-01 09:30','2015-01-01 10:30',3),
( 'on','2015-01-01 10:30','2015-01-01 11:00',3)
status StartDate EndDate
---------------------------
off 08:00 08:15
on 08:15 09:15
off 08:50 09:00
on 09:00 09:30
off 09:30 09:35
on 09:35 10:15
off 09:30 10:30
on 10:30 11:00
End Result:
status StartDate EndDate
---------------------------
off 8:00 8:15
on 8:15 9:15
on 9:15 9:30
off 9:30 9:35
on 9:35 10:15
off 10:15 10:30
on 10:30 11:00
Essentially, anytime there is a status of 'on' it should override any concurrent 'off' status.
Source:
|----off----||---------on---------|
|---off--||------on----||---off---||--------on------|
|--------------off------------------||------on------|
Result (Either result would work):
|----off----||----------on--------||---on---||---off---||--------on------||-off--||------on------|
|----off----||----------on------------------||---off---||--------on------||-off--||------on------|

Here's the simplest version for 2008 that I was able to figure out:
; with Data (Date) as (
select StartDate from Dates
union
select EndDate from Dates),
Ranges (StartDate, Status) as (
select D.Date, D2.Status
from Data D
outer apply (
select top 1 D2.Status
from Dates D2
where D2.StartDate <= D.Date and D2.EndDate > D.Date
order by case when Status = 'on' then 1 else 2 end
) D2)
select R.StartDate,
(select min(D.Date) from Data D where D.Date > R.StartDate) as EndDate,
Status
from Ranges R
order by R.StartDate
It returns a new row at each start/end point, even if the status is the same as in the previous row. I didn't find a simple way to combine them.
Edit: Changing the first CTE to this will combine the rows:
; with Data (Date) as (
select distinct StartDate from Dates D1
where not exists (Select 1 from Dates D2
where D2.StartDate < D1.StartDate and D2.EndDate > D1.StartDate and
Status = 'on')
union
select distinct EndDate from Dates D1
where not exists (Select 1 from Dates D2
where D2.StartDate < D1.EndDate and D2.EndDate > D1.EndDate and
Status = 'on')
),

So basically every time there's even one "on" record, it is on, otherwise off?
Here's a little different kind of approach to the issue, adding +1 every time an "on" cycle starts, and adding -1 when it ends. Then we can use a running total for the status, and when the status is 0, then it's off, and otherwise it is on:
select Date,
sum(oncounter) over (order by Date) as onstat,
sum(offcounter) over (order by Date) as offstat
from (
select StartDate as Date,
case when status = 'on' then 1 else 0 end oncounter,
case when status = 'off' then 1 else 0 end offcounter
from Dates
union all
select EndDate as Date,
case when status = 'on' then -1 else 0 end oncounter,
case when status = 'off' then -1 else 0 end offcounter
from Dates
) TMP
Edit: Also added a counter for the "off" states. It works the same way as the "on" counter; when both are 0, the status is neither on nor off.
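To check the counter idea outside the database, here is a minimal Python sketch of the same running-total logic. It is not the answer's query: the function name `merge_streams` and the use of zero-padded "HH:MM" strings for times are assumptions made for illustration. It applies +1/-1 events in time order and emits a row only when the combined state changes.

```python
def merge_streams(rows):
    """rows: (status, start, end) tuples; 'on' overrides any concurrent 'off'."""
    events = []
    for status, start, end in rows:
        events.append((start, status, +1))   # interval opens
        events.append((end, status, -1))     # interval closes
    events.sort(key=lambda e: e[0])

    counts = {"on": 0, "off": 0}
    merged = []
    current, since = None, None
    i = 0
    while i < len(events):
        moment = events[i][0]
        # Apply every event at this timestamp before reading the state.
        while i < len(events) and events[i][0] == moment:
            _, status, delta = events[i]
            counts[status] += delta
            i += 1
        if counts["on"] > 0:
            state = "on"
        elif counts["off"] > 0:
            state = "off"
        else:
            state = None                     # neither on nor off
        if state != current:
            if current is not None:
                merged.append((current, since, moment))
            current, since = state, moment
    return merged


# Sample data from the question (zero-padded strings sort chronologically).
rows = [
    ("off", "08:00", "08:15"), ("on", "08:15", "09:15"),
    ("off", "08:50", "09:00"), ("on", "09:00", "09:30"),
    ("off", "09:30", "09:35"), ("on", "09:35", "10:15"),
    ("off", "09:30", "10:30"), ("on", "10:30", "11:00"),
]
merged = merge_streams(rows)
for status, start, end in merged:
    print(status, start, end)
```

With the question's data this yields the second acceptable result, where the two consecutive 'on' rows are combined into one.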
Final result: it seems it can be done, although it doesn't look that nice anymore. At least it's not recursive :)
select
Date as StartDate,
lead(Date, 1, '21000101') over (order by Date) as EndDate,
case onstat
when 0 then
case when offstat > 0 then 'Off' else 'N/A' end
else 'On' end as State
from (
select
Date,
onstat, prevon,
offstat, prevoff
from (
Select
Date,
onstat,
lag(onstat, 1, 0) over (order by Date) as prevon,
offstat,
lag(offstat, 1, 0) over (order by Date) as prevoff
from (
select
Date,
sum(oncounter) over (order by Date) as onstat,
sum(offcounter) over (order by Date) as offstat
from (
select
StartDate as Date,
case when status = 'on' then 1 else 0 end oncounter,
case when status = 'off' then 1 else 0 end offcounter
from
Dates
union all
select
EndDate as Date,
case when status = 'on' then -1 else 0 end oncounter,
case when status = 'off' then -1 else 0 end offcounter
from
Dates
) TMP
) TMP2
) TMP3
where (onstat = 1 and prevon = 0)
or (onstat = 0 and prevon = 1)
or (onstat = 0 and offstat = 1 and prevoff = 0)
or (onstat = 0 and offstat = 0 and prevoff = 1)
) TMP4
It uses quite a few derived tables, both for the window functions and to keep only the status changes in the result set so that LEAD can pick up the correct dates. It might be possible to get rid of some of them.
SQL Fiddle: http://sqlfiddle.com/#!6/b5cfa/7

Related

Using RANK OVER PARTITION to Compare a Previous Row Result

I'm working with a dataset that contains (among other columns) a userID and startDate. The goal is to have a new column "isRehire" that compares their startDate with previous startDates.
If the difference between startDates is within 1 year, isRehire = Y.
The difficulty and my issue comes in when there are more than 2 startDates for a user. If the difference between the 3rd and 1st startDate is over a year, the 3rd startDate would be the new "base date" for being a rehire.
userID startDate isRehire
-------------------------
123 07/24/19 N
123 02/04/20 Y
123 08/25/20 N
123 12/20/20 Y
123 06/15/21 Y
123 08/20/21 Y
123 08/30/21 N
In the above example you can see the issue visualized. The first startDate 07/24/19, the user is not a Rehire. The second startDate 02/04/20, they are a Rehire. The 3rd startDate 08/25/20 the user is not a rehire because it has been over 1 year since their initial startDate. This is the new "anchor" date.
The next 3 instances are all Y as they are within 1 year of the new "anchor" date of 08/25/20. The final startDate of 08/30/21 is over a year past 08/25/20, indicating a "N" and the "cycle" resets again with 08/30/21 as the new "anchor" date.
My goal is to utilize RANK OVER PARTITION to be able to complete this, as from my testing I believe there must be a way to assign ranks to the dates which can then be wrapped in a select statement for a CASE expression to be written. Although it's completely possible I'm barking up the wrong tree entirely.
Below you can see some of the code I've attempted to use to complete this, although without much success so far.
select TestRank,
startDate,
userID,
CASE WHEN TestRank = TestRank THEN (TestRank - 1
) ELSE '' END AS TestRank2
from
(
select userID,
startDate,
RANK() OVER (PARTITION BY userID
ORDER BY startDate desc)
as TestRank
from [MyTable] a
WHERE a.userID = [int]
) b
This is complicated logic, and window functions are not sufficient. To solve this, you need iteration -- or in SQL-speak, a recursive CTE:
with t as (
select t.*, row_number() over (partition by id order by startdate) as seqnum
from mytable t
),
cte as (
select t.id, t.startdate, t.seqnum, 'N' as isrehire, t.startdate as anchordate
from t
where seqnum = 1
union all
select t.id, t.startdate, t.seqnum,
(case when t.startdate > dateadd(year, 1, cte.anchordate) then 'N' else 'Y' end),
(case when t.startdate > dateadd(year, 1, cte.anchordate) then t.startdate else cte.anchordate end)
from cte join
t
on t.id = cte.id and t.seqnum = cte.seqnum + 1 -- match the same user, then step to the next row
)
select *
from cte
order by id, startdate;
Here is a db<>fiddle.
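The reason plain window functions fall short is that each row's flag depends on an anchor that earlier rows may have reset. A minimal Python sketch of that iteration follows; the names `flag_rehires` and `add_one_year` are hypothetical, and the Feb 29 handling is an assumption meant to mirror what `dateadd(year, 1, ...)` does in SQL Server.

```python
from datetime import date


def add_one_year(d):
    """Same calendar day next year; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def flag_rehires(start_dates):
    """Return (start_date, flag) pairs for one user, carrying the anchor forward."""
    flagged = []
    anchor = None
    for start in sorted(start_dates):
        if anchor is None or start > add_one_year(anchor):
            flagged.append((start, "N"))
            anchor = start            # this row becomes the new anchor date
        else:
            flagged.append((start, "Y"))
    return flagged


# Sample dates from the question for userID 123.
dates = [
    date(2019, 7, 24), date(2020, 2, 4), date(2020, 8, 25), date(2020, 12, 20),
    date(2021, 6, 15), date(2021, 8, 20), date(2021, 8, 30),
]
flags = [flag for _, flag in flag_rehires(dates)]
print(flags)
```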

Choosing MAX value by id in a view?

I have created a simple view based on a few columns in our database
ALTER VIEW [BI].[v_RCVLI_Test] AS
Select distinct
Borger.CPRnrKort as CPR,
(...)
IndsatsDetaljer.VisitationId as VisitationsId,
Indsats.KatalogNavn as IndsatsNavn,
(case
when
(
Indsats.Model = 'SMDB2 Tilbudsmodel' or
Indsats.Model = 'SMDB2 Samtalemodel' or
Indsats.Model = 'Tilbudsmodel' or
Indsats.Model = 'NAB Tilbudsmodel'
)
then IndsatsDetaljer.ServicePeriodeStart
else IndsatsDetaljer.Ikrafttraedelsesdato
end
) as StartDato,
(case
when
(
Indsats.Model = 'SMDB2 Tilbudsmodel' or
Indsats.Model = 'SMDB2 Samtalemodel' or
Indsats.Model = 'Tilbudsmodel'
)
then (case when IndsatsDetaljer.VisitationSlut = '9999-12-31' then convert(varchar(10), getdate(), 23) else IndsatsDetaljer.VisitationSlut end)
when
Indsats.Model = 'NAB Tilbudsmodel'
then (case when IndsatsDetaljer.NABehandlingSlutDato = '9999-12-31' then convert(varchar(10), getdate(), 23) else IndsatsDetaljer.NABehandlingSlutDato end)
else (case when IndsatsDetaljer.VisitationSlut = '9999-12-31' then convert(varchar(10), getdate(), 23) else IndsatsDetaljer.VisitationSlut end)
end
) as StopDato,
Refusion.Handlekommune as Handlekommune,
replace(Refusion.Betalingskommune, 'Ukendt', 'Kendt') Betalingskommune
from nexus2.Fact_VisiteretTid as Fact
join nexus2.Dim_Paragraf Paragraf
on Fact.DW_SK_Paragraf = Paragraf.DW_SK_Paragraf
join nexus2.Dim_Indsats Indsats
on Fact.DW_SK_Indsats = Indsats.DW_SK_Indsats (...)
The cases for StartDato and StopDato are there because those dates come from different columns. I've converted the date '9999-12-31' to the current date because we'll be doing some time calculations later on, and it's just more convenient.
CPR is the id of a person; VisitationsId is the id of the service the person received.
In theory, there should only be one StartDato and one StopDato per VisitationsId, but because of a glitch in the documentation system, we sometimes get two StopDato values: one is correct, and the other is '9999-12-31' (now converted to the current date).
So I need to group by VisitationsId and then take the MIN value of StopDato, but I'm unsure how to go about doing that.
CPR  VisitationsId  StartDato   StopDato    Something Else
----------------------------------------------------------
123  56             2019-01-01  2019-12-12  Something
123  56             2019-01-01  9999-12-31  Something
123  58             2019-01-01  2019-12-14  Something
345  59             2018-11-01  9999-12-31  Something
345  55             2017-01-02  2017-11-12  Something
345  55             2017-01-02  9999-12-31  Something
In the above table I need to remove rows 2 and 6, because their VisitationsId is identical to the previous row's but they differ on StopDato.
Using a group by anywhere in the query gives me an error on another (seemingly random) column telling me that the column is:
invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Any suggestions on how I can go about doing this?
Add a filter which tests for this condition?
with cte as (
{your current query}
)
select *
from cte T
where not (
StopDato = '9999-12-31'
and exists (
select 1
from cte T1
where T1.VisitationsId = T.VisitationsId
and StopDato != '9999-12-31'
)
);
Also, it looks like you are converting StopDato to a varchar, which is bad: you should treat dates as dates until you need to display them.

Count number of days each employee take vacation in a month SQL Server

I have this table:
Vacationtbl:
ID Start End
-------------------------
01 04/10/17 04/12/17
01 04/27/17 05/02/17
02 04/13/17 04/15/17
02 04/17/17 04/20/17
03 06/14/17 06/22/17
Employeetbl:
ID Fname Lname
------------------
01 John AAA
02 Jeny BBB
03 Jeby CCC
I'd like to count the number of vacation days each employee takes in April.
My query:
SELECT
SUM(DATEDIFF(DAY, Start, End) + 1) AS Days
FROM
Vacationtbl
GROUP BY
ID
01 returns 9 (not correct)
02 returns 7 (correct)
How do I fix the query so that it counts only up to the end of the month? For example, April has 30 days, so the second row for employee 01 should count 4/27/17 through 4/30/17; 05/01/17 and 05/02/17 belong to May.
Thanks
The Tally/Calendar table is the way to go. However, you can use an ad-hoc tally table.
Example
Select Year = Year(D)
,Month = Month(D)
,ID
,Days = count(*)
From Vacationtbl A
Cross Apply (
Select Top (DateDiff(DAY,[Start],[End])+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),[Start])
From master..spt_values
) B
-- YOUR OPTIONAL WHERE STATEMENT HERE --
Group By ID,Year(D),Month(D)
Order By 1,2,3
Returns
Year Month ID Days
2017 4 01 7
2017 4 02 7
2017 5 01 2
EDIT - To Show All ID even if Zero Days
Select ID
,Year = Year(D)
,Month = Month(D)
,Days = sum(case when D between [Start] and [End] then 1 else 0 end)
From (
Select Top (DateDiff(DAY,'05/01/2017','05/31/2017')+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),'05/01/2017')
From master..spt_values
) D
Cross Join Vacationtbl B
Group By ID,Year(D),Month(D)
Order By 1,2,3
Returns
ID Year Month Days
1 2017 5 2
2 2017 5 0
dbFiddle if it Helps
EDIT - 2 Corrects for Overlaps (Gaps and Islands)
--Create Some Sample Data
----------------------------------------------------------------------
Declare @Vacationtbl Table ([ID] varchar(50),[Start] date,[End] date)
Insert Into @Vacationtbl Values
(01,'04/10/17','04/12/17')
,(01,'04/27/17','05/02/17')
,(02,'04/13/17','04/15/17')
,(02,'04/17/17','04/20/17')
,(02,'04/16/17','04/17/17') -- << Overlap
,(03,'05/16/17','05/17/17')
-- The Actual Query
----------------------------------------------------------------------
Select ID
,Year = Year(D)
,Month = Month(D)
,Days = sum(case when D between [Start] and [End] then 1 else 0 end)
From (Select Top (DateDiff(DAY,'04/01/2017','04/30/2017')+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),'04/01/2017') From master..spt_values ) D
Cross Join (
Select ID,[Start] = min(D),[End] = max(D)
From (
Select E.*,Grp = Dense_Rank() over (Order By D) - Row_Number() over (Partition By ID Order By D)
From (
Select Distinct A.ID,D
From @Vacationtbl A
Cross Apply (Select Top (DateDiff(DAY,A.[Start],A.[End])+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),A.[Start]) From master..spt_values ) B
) E
) G
Group By ID,Grp
) B
Group By ID,Year(D),Month(D)
Order By 1,2,3
Returns
ID Year Month Days
1 2017 4 7
2 2017 4 8
3 2017 4 0
Without a dates table, you could use
select Id
,sum(case when [end]>'20170430' and [start]<'20170401' then datediff(day,'20170401','20170430')+1
when [end]>'20170430' then datediff(day,[start],'20170430')+1
when [start]<'20170401' then datediff(day,'20170401',[end])+1
else datediff(day,[start],[end])+1
end) as VacationDays
from Vacationtbl
where [start] <= '20170430' and [end] >= '20170401'
group by Id
There are three cases here, besides the default where the whole vacation falls inside the month:
1. The start is before this month and the end is after this month. In this case, count the whole month, from month start to month end.
2. The end is after the month end and the start is in the month. In this case, count from the start date to the month end.
3. The start is before this month but the end is in the month. In this case, count from the month start to the end date.
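All of these cases collapse into one rule: clip the vacation to the month, then count the days that remain. A minimal Python sketch of that rule, with the hypothetical helper `days_in_window` and the question's sample rows typed in by hand:

```python
from datetime import date


def days_in_window(start, end, window_start, window_end):
    """Inclusive day count of [start, end] that falls inside the window."""
    clipped_start = max(start, window_start)
    clipped_end = min(end, window_end)
    if clipped_start > clipped_end:
        return 0                      # no overlap with the window at all
    return (clipped_end - clipped_start).days + 1


vacations = [
    ("01", date(2017, 4, 10), date(2017, 4, 12)),
    ("01", date(2017, 4, 27), date(2017, 5, 2)),
    ("02", date(2017, 4, 13), date(2017, 4, 15)),
    ("02", date(2017, 4, 17), date(2017, 4, 20)),
    ("03", date(2017, 6, 14), date(2017, 6, 22)),
]

april_start, april_end = date(2017, 4, 1), date(2017, 4, 30)
totals = {}
for emp_id, start, end in vacations:
    days = days_in_window(start, end, april_start, april_end)
    totals[emp_id] = totals.get(emp_id, 0) + days
print(totals)
```

Like the SQL above, this sums each row independently, so overlapping vacation rows for the same employee would be counted twice.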
Edit: Based on the OP's comments that the future dates have to be included,
/*This recursive cte generates the month start and end dates with in a given time frame
For Eg: all the month start and end dates for 2017
Change the start and end period as needed*/
with dates (month_start_date,month_end_date) as
(select cast('2017-01-01' as date),cast(eomonth('2017-01-01') as date)
union all
select dateadd(month,1,month_start_date),eomonth(dateadd(month,1,month_start_date)) from dates
where month_start_date < '2017-12-01'
)
--End recursive cte
--Query logic is the same as above
select v.Id
,year(d.month_start_date) as yr,month(d.month_start_date) as mth
,sum(case when v.[end]>d.month_end_date and v.[start]<d.month_start_date then datediff(day,d.month_start_date,d.month_end_date)+1
when v.[end]>d.month_end_date then datediff(day,v.[start],d.month_end_date)+1
when v.[start]<d.month_start_date then datediff(day,d.month_start_date,v.[end])+1
else datediff(day,v.[start],v.[end])+1
end) as VacationDays
from dates d
join Vacationtbl v on v.[start] <= d.month_end_date and v.[end] >= d.month_start_date
group by v.id,year(d.month_start_date),month(d.month_start_date)
Assuming you want only one month and you want to count all days, you can do this with arithmetic. A separate calendar table is not necessary. The advantage is performance.
I think this would be easier if SQL Server supported least() and greatest(), but case will do:
select id,
sum(1 + datediff(day, news, newe)) as vacation_days_april
from Vacationtbl v cross apply
(values (case when [start] < '2017-04-01' then cast('2017-04-01' as date) else [start] end,
case when [end] >= '2017-05-01' then cast('2017-04-30' as date) else [end] end)
) x(news, newe) -- one row, two columns: the clipped start and end
where news <= newe
group by id;
You can readily extend this to any month:
with m as (
select cast('2017-04-01' as date) as month_start,
cast('2017-04-30' as date) as month_end
)
select id,
sum(1 + datediff(day, news, newe)) as vacation_days
from m cross join
Vacationtbl v cross apply
(values (case when [start] < m.month_start then m.month_start else [start] end,
case when [end] >= m.month_end then m.month_end else [end] end)
) x(news, newe) -- one row, two columns: the clipped start and end
where news <= newe
group by id;
You can even use a similar idea to extend to multiple months, with a different row for each user and each month.
You can use a Calendar or dates table for this sort of thing.
For only 152kb in memory, you can have 30 years of dates in a table with this:
/* dates table */
declare @fromdate date = '20000101';
declare @years int = 30;
/* 30 years, 19 used data pages ~152kb in memory, ~264kb on disk */
;with n as (select n from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t(n))
select top (datediff(day, @fromdate,dateadd(year,@years,@fromdate)))
[Date]=convert(date,dateadd(day,row_number() over(order by (select 1))-1,@fromdate))
into dbo.Dates
from n as deka cross join n as hecto cross join n as kilo
cross join n as tenK cross join n as hundredK
order by [Date];
create unique clustered index ix_dbo_Dates_date
on dbo.Dates([Date]);
Without taking the actual step of creating a table, you can use it inside a common table expression with just this:
declare @fromdate date = '20170401';
declare @thrudate date = '20170430';
;with n as (select n from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t(n))
, dates as (
select top (datediff(day, @fromdate, @thrudate)+1)
[Date]=convert(date,dateadd(day,row_number() over(order by (select 1))-1,@fromdate))
from n as deka cross join n as hecto cross join n as kilo
cross join n as tenK cross join n as hundredK
order by [Date]
)
select [Date]
from dates;
Use either like so:
select
v.Id
, count(*) as VacationDays
from Vacationtbl v
inner join Dates d
on d.Date >= v.[Start]
and d.Date <= v.[End]
where d.Date >= '20170401'
and d.Date <= '20170430'
group by v.Id
rextester demo (table): http://rextester.com/PLW73242
rextester demo (cte): http://rextester.com/BCY62752
returns:
+----+--------------+
| Id | VacationDays |
+----+--------------+
| 01 | 7 |
| 02 | 7 |
+----+--------------+
Number and Calendar table reference:
Generate a set or sequence without loops - 2 - Aaron Bertrand
The "Numbers" or "Tally" Table: What it is and how it replaces a loop - Jeff Moden
Creating a Date Table/Dimension in sql Server 2008 - David Stein
Calendar Tables - Why You Need One - David Stein
Creating a date dimension or calendar table in sql Server - Aaron Bertrand
Try this,
declare @Vacationtbl table(ID int,Startdate date,Enddate date)
insert into @Vacationtbl VALUES
(1 ,'04/10/17','04/12/17')
,(1 ,'04/27/17','05/02/17')
,(2 ,'04/13/17','04/15/17')
,(2 ,'04/17/17','04/20/17')
-- somehow convert your input into first day of month
Declare @firstDayofGivenMonth date='2017-04-01'
Declare @LasttDayofGivenMonth date=dateadd(day,-1,dateadd(month,datediff(month,0,@firstDayofGivenMonth)+1,0))
;with CTE as
(
select *
,case when Startdate<@firstDayofGivenMonth then @firstDayofGivenMonth else Startdate end NewStDT
,case when Enddate>@LasttDayofGivenMonth then @LasttDayofGivenMonth else Enddate end NewEDT
from @Vacationtbl
-- keep only vacations that overlap the given month, otherwise the day count goes negative
where Startdate<=@LasttDayofGivenMonth and Enddate>=@firstDayofGivenMonth
)
SELECT
SUM(DATEDIFF(DAY, NewStDT, NewEDT) + 1) AS Days
FROM
CTE
GROUP BY
ID

How do I calculate coverage dates during a period of time in SQL against a transactional table?

I'm attempting to compile a list of date ranges like so:
Coverage Range: 10/1/2016 - 10/5/2016
Coverage Range: 10/9/2016 - 10/31/2016
for each policy in a database table. The table is transactional, and there is one cancellation transaction code, but three codes that can indicate coverage has begun. Also, there can be instances where the codes that indicate start of coverage can occur in sequence (start on 10/1, then another start on 10/5, then cancel on 10/14). Below is an example of a series of transactions that I would like to generate the above results from:
TransID PolicyID EffDate
NewBus 1 9/15/2016
Confirm 1 9/17/2016
Cancel 1 10/5/2016
Reinst 1 10/9/2016
Cancel 1 10/15/2016
Reinst 1 10/15/2016
PolExp 1 3/15/2017
SO in this dataset, I want the following results for the date range 10/1 - 10/31
Coverage Range: 10/1/2016 - 10/5/2016
Coverage Range: 10/9/2016 - 10/31/2016
Note that since the cancel and reinstatement happen on the same day, I'm excluding them from the results set. I tried pairing the transactions with subqueries:
CONVERT(varchar(10),
CASE WHEN overall.sPTRN_ID in (SELECT code FROM #cancelTransCodes)
-- This is a coverage cancellation entry
THEN -- Set coverage start date using previous paired record
CASE WHEN((SELECT MAX(inn.PD_EffectiveDate) FROM PolicyData inn WHERE inn.sPTRN_ID in (SELECT code FROM #startCoverageTransCodes)
and inn.PD_EffectiveDate <= overall.PD_EffectiveDate
and inn.PD_PolicyCode = overall.PD_PolicyCode) < @sDate) THEN @sDate
ELSE
(SELECT MAX(inn.PD_EffectiveDate) FROM PolicyData inn WHERE inn.sPTRN_ID in (SELECT code FROM #startCoverageTransCodes)
and inn.PD_EffectiveDate <= overall.PD_EffectiveDate
and inn.PD_PolicyCode = overall.PD_PolicyCode)
END
ELSE -- Set coverage start date using current record
CASE WHEN (overall.PD_EffectiveDate < @sDate) THEN @sDate ELSE overall.PD_EffectiveDate END END, 101)
as [Effective_Date]
This mostly works except for the situation I listed above. I'd rather not rewrite this query if I can help it. I have a similar line for expiration date:
ISNULL(CONVERT(varchar(10),
CASE WHEN overall.sPTRN_ID in (SELECT code FROM #cancelTransCodes) -- This is a coverage cancellation entry
THEN -- Set coverage end date with current record
overall.PD_EffectiveDate
ELSE -- check for future coverage end dates
CASE WHEN
(SELECT COUNT(*) FROM PolicyData pd WHERE pd.PD_EffectiveDate > overall.PD_EffectiveDate and pd.sPTRN_ID in (SELECT code FROM #cancelTransCodes)) > 1
THEN -- There are future end dates
CASE WHEN((SELECT TOP 1 pd.PD_ExpirationDate FROM PolicyData pd
WHERE pd.PD_PolicyCode = overall.PD_PolicyCode
and pd.PD_EntryDate between @sDate and @eDate
and pd.sPTRN_ID in (SELECT code FROM #cancelTransCodes))) > @eDate
THEN @eDate
ELSE
(SELECT TOP 1 pd.PD_ExpirationDate FROM PolicyData pd
WHERE pd.PD_PolicyCode = overall.PD_PolicyCode
and pd.PD_EntryDate between @sDate and @eDate
and pd.sPTRN_ID in (SELECT code FROM #cancelTransCodes))
END
ELSE -- No future coverage end dates
CASE WHEN(overall.PD_ExpirationDate > @eDate) THEN @eDate ELSE overall.PD_ExpirationDate END
END
END, 101), CONVERT(varchar(10), CASE WHEN(overall.PD_ExpirationDate > @eDate) THEN @eDate ELSE overall.PD_ExpirationDate END, 101))
as [Expiration_Date]
I can't help but feel like there's a simpler solution I'm missing here. So my question is: how can I modify the above portion of my query to accommodate the above scenario? Or what is the better answer? If I can simplify this, I would love to hear how.
Here's the solution I ended up implementing
I took a simplified table where I boiled all the start transaction codes down to START and all the cancel transaction codes down to CANCEL. When I viewed the table that way, it was MUCH easier to watch how my logic affected the results. I ended up using a simplified system where I used CASE WHEN clauses to identify specific scenarios and built my date ranges based on that. I also reversed my starting point: instead of looking at cancellations and finding the related starts, I find the starts and then the related cancellations. So here's the code I implemented:
/* Get Coverage Dates */
,cast((CASE WHEN sPTRN_ID in (SELECT code FROM #startCoverageTransCodes) THEN
CASE WHEN (cast(overall.PD_EntryDate as date) <= @sDate) THEN @sDate
WHEN (cast(overall.PD_EntryDate as date) > @sDate AND cast(overall.PD_EntryDate as date) <= @eDate) THEN overall.PD_EntryDate
WHEN (cast(overall.PD_EntryDate as date) > @eDate) THEN @eDate
ELSE cast(overall.PD_EntryDate as date) END
ELSE
null
END) as date) as Effective_Date
,cast((CASE WHEN sPTRN_ID in (SELECT code FROM #startCoverageTransCodes) THEN
CASE WHEN (SELECT MIN(p.PD_EntryDate) FROM PolicyData p WITH (NOLOCK) WHERE p.sPTRN_ID in (SELECT code FROM #cancelTransCodes) AND p.PD_EntryDate > overall.PD_EntryDate AND p.PD_PolicyCOde = overall.PD_PolicyCode) > @eDate THEN @eDate
ELSE ISNULL((SELECT MIN(p.PD_EntryDate) FROM PolicyData p WITH (NOLOCK) WHERE p.sPTRN_ID in (SELECT code FROM #cancelTransCodes) AND p.PD_EntryDate > overall.PD_EntryDate AND p.PD_PolicyCOde = overall.PD_PolicyCode), @eDate) END
ELSE
CASE WHEN (SELECT MAX(p.PD_EntryDate) FROM PolicyData p WITH (NOLOCK) WHERE p.sPTRN_ID in (SELECT code FROM #startCoverageTransCodes) AND p.PD_EntryDate > overall.PD_EntryDate AND p.PD_PolicyCOde = overall.PD_PolicyCode) > @eDate THEN @eDate
ELSE (SELECT MAX(p.PD_EntryDate) FROM PolicyData p WITH (NOLOCK) WHERE p.sPTRN_ID in (SELECT code FROM #startCoverageTransCodes) AND p.PD_EntryDate > overall.PD_EntryDate AND p.PD_PolicyCOde = overall.PD_PolicyCode)
END END) as date) as Expiration_Date
As you can see, I relied on subqueries in this case. I had a lot of this logic as joins, which caused extra rows where I didn't need them. So by making the date range logic based on sub-queries, I ended up speeding the stored procedure up by several seconds, bringing my execution time to under 1 second where before it was between 2-5 seconds.
There might be a simpler solution, but I just do not see it right now.
The outline of the steps is:
1. Generate dates for the date range, which you do not need to do if you have a calendar table.
2. Transform the incoming data set as you described in your question (skipping start/cancel on the same day), and add the next EffDate for each row.
3. Explode the data set with a row for each day between the generated ranges of step 2.
4. Reduce the data set back down based on consecutive days of coverage.
test setup: http://rextester.com/GUNSO45644
/* set date range */
declare @fromdate date = '20161001'
declare @thrudate date = '20161031'
/* generate dates in range -- you can skip this if you have a calendar table */
;with n as (select n from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t(n))
, dates as (
select top (datediff(day, @fromdate, @thrudate)+1)
[Date]=convert(date,dateadd(day,row_number() over(order by (select 1))-1, @fromdate))
from n as deka
cross join n as hecto /* 100 days */
cross join n as kilo /* 2.73 years */
cross join n as [tenK] /* 27.3 years */
order by [Date]
)
/* reduce test table to desired input*/
, pol as (
select
Coverage = case when max(TransId) in ('Cancel','PolExp')
then 0 else 1 end
, PolicyId
, EffDate = case when max(TransId) in ('Cancel','PolExp')
then dateadd(day,1,EffDate) else EffDate end
, NextEffDate = oa.NextEffDate
from t
outer apply (
select top 1
NextEffDate = case
when i.TransId in ('Cancel','PolExp')
then dateadd(day,1,i.EffDate)
else i.EffDate
end
from t as i
where i.PolicyId = t.PolicyId
and i.EffDate > t.EffDate
order by
i.EffDate asc
, case when i.TransId in ('Cancel','PolExp') then 1 else 0 end desc
) as oa
group by t.PolicyId, t.EffDate, oa.NextEffDate
)
/* explode desired input by day, add row_numbers() */
, e as (
select pol.PolicyId, pol.Coverage, d.Date
, rn_x = row_number() over (
partition by pol.PolicyId
order by d.Date
)
, rn_y = row_number() over (
partition by pol.PolicyId, pol.Coverage
order by d.date)
from pol
inner join dates as d
on d.date >= pol.EffDate
and d.date < pol.NextEffDate
)
/* reduce to date ranges where Coverage = 1 */
select
PolicyId
, FromDate = convert(varchar(10),min(Date),120)
, ThruDate = convert(varchar(10),max(Date),120)
from e
where Coverage = 1
group by PolicyId, (rn_x-rn_y);
returns:
+----------+------------+------------+
| PolicyId | FromDate | ThruDate |
+----------+------------+------------+
| 1 | 2016-10-01 | 2016-10-05 |
| 1 | 2016-10-09 | 2016-10-31 |
+----------+------------+------------+
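The same explode-then-reduce approach can be traced in a few lines of Python. This is a rough sketch of the logic rather than a translation of the query: the function name `coverage_ranges` and the `END_CODES` set are assumptions for illustration, and it handles a single policy. A day with both a start and an end code counts as a start, end codes take effect the following day, and consecutive covered days are grouped by the difference between the day's ordinal and its position.

```python
from datetime import date, timedelta
from itertools import groupby

END_CODES = {"Cancel", "PolExp"}   # every other code is treated as a start


def coverage_ranges(transactions, window_start, window_end):
    # Step 2: collapse each day to one state; any start code on a day wins.
    covered_on = {}
    for code, eff in transactions:
        covered_on[eff] = covered_on.get(eff, False) or code not in END_CODES
    # End codes take effect the day after, so the cancel date itself is covered.
    changes = sorted(
        (eff if covered else eff + timedelta(days=1), covered)
        for eff, covered in covered_on.items()
    )
    # Steps 1 and 3: walk each day in the window and look up the latest change.
    covered_days = []
    day = window_start
    while day <= window_end:
        state = False
        for when, covered in changes:
            if when <= day:
                state = covered
        if state:
            covered_days.append(day)
        day += timedelta(days=1)
    # Step 4: consecutive days share the same (ordinal - position) key.
    ranges = []
    keyed = enumerate(covered_days)
    for _, run in groupby(keyed, key=lambda p: p[1].toordinal() - p[0]):
        days = [d for _, d in run]
        ranges.append((days[0], days[-1]))
    return ranges


transactions = [
    ("NewBus", date(2016, 9, 15)), ("Confirm", date(2016, 9, 17)),
    ("Cancel", date(2016, 10, 5)), ("Reinst", date(2016, 10, 9)),
    ("Cancel", date(2016, 10, 15)), ("Reinst", date(2016, 10, 15)),
    ("PolExp", date(2017, 3, 15)),
]
ranges = coverage_ranges(transactions, date(2016, 10, 1), date(2016, 10, 31))
for first, last in ranges:
    print(first, last)
```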

Merge adjacent rows in SQL?

I'm doing some reporting based on the blocks of time employees work. In some cases, the data contains two separate records for what really is a single block of time.
Here's a basic version of the table and some sample records:
EmployeeID
StartTime
EndTime
Data:
EmpID Start End
----------------------------
#1001 10:00 AM 12:00 PM
#1001 4:00 PM 5:30 PM
#1001 5:30 PM 8:00 PM
In the example, the last two records are contiguous in time. I'd like to write a query that combines any adjacent records so the result set is this:
EmpID Start End
----------------------------
#1001 10:00 AM 12:00 PM
#1001 4:00 PM 8:00 PM
Ideally, it should also be able to handle more than 2 adjacent records, but that is not required.
This article provides quite a few possible solutions to your question
http://www.sqlmag.com/blog/puzzled-by-t-sql-blog-15/tsql/solutions-to-packing-date-and-time-intervals-puzzle-136851
This one seems like the most straightforward:
WITH StartTimes AS
(
SELECT DISTINCT username, starttime
FROM dbo.Sessions AS S1
WHERE NOT EXISTS
(SELECT * FROM dbo.Sessions AS S2
WHERE S2.username = S1.username
AND S2.starttime < S1.starttime
AND S2.endtime >= S1.starttime)
),
EndTimes AS
(
SELECT DISTINCT username, endtime
FROM dbo.Sessions AS S1
WHERE NOT EXISTS
(SELECT * FROM dbo.Sessions AS S2
WHERE S2.username = S1.username
AND S2.endtime > S1.endtime
AND S2.starttime <= S1.endtime)
)
SELECT username, starttime,
(SELECT MIN(endtime) FROM EndTimes AS E
WHERE E.username = S.username
AND endtime >= starttime) AS endtime
FROM StartTimes AS S;
If this is strictly about adjacent rows (not overlapping ones), you could try the following method:
Unpivot the timestamps.
Leave only those that have no duplicates.
Pivot the remaining ones back, coupling every Start with the directly following End.
Or, in Transact-SQL, something like this:
WITH unpivoted AS (
SELECT
EmpID,
event,
dtime,
count = COUNT(*) OVER (PARTITION BY EmpID, dtime)
FROM atable
UNPIVOT (
dtime FOR event IN (StartTime, EndTime)
) u
)
, filtered AS (
SELECT
EmpID,
event,
dtime,
rowno = ROW_NUMBER() OVER (PARTITION BY EmpID, event ORDER BY dtime)
FROM unpivoted
WHERE count = 1
)
, pivoted AS (
SELECT
EmpID,
StartTime,
EndTime
FROM filtered
PIVOT (
MAX(dtime) FOR event IN (StartTime, EndTime)
) p
)
SELECT *
FROM pivoted
;
There's a demo for this query at SQL Fiddle.
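The unpivot / filter / pivot steps above can be sketched in Python to see why they work. The function name `merge_adjacent` is hypothetical, and the sketch assumes one employee whose blocks touch but never overlap: a timestamp that is both an end and the next start appears twice and is dropped, and the surviving starts and ends pair up in order.

```python
from collections import Counter
from datetime import time


def merge_adjacent(blocks):
    """blocks: (start, end) pairs for one employee; adjacent, never overlapping."""
    # "Unpivot": count how often each timestamp occurs as a start or an end.
    seen = Counter()
    for start, end in blocks:
        seen[start] += 1
        seen[end] += 1
    # "Filter": a timestamp shared by two touching blocks occurs twice; drop it.
    starts = sorted(s for s, _ in blocks if seen[s] == 1)
    ends = sorted(e for _, e in blocks if seen[e] == 1)
    # "Pivot back": the n-th remaining start pairs with the n-th remaining end.
    return list(zip(starts, ends))


blocks = [
    (time(10, 0), time(12, 0)),
    (time(16, 0), time(17, 30)),
    (time(17, 30), time(20, 0)),
]
merged_blocks = merge_adjacent(blocks)
print(merged_blocks)
```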
CTE with cumulative sum:
DECLARE @t TABLE(EmpId INT, Start TIME, Finish TIME)
INSERT INTO @t (EmpId, Start, Finish)
VALUES
(1001, '10:00 AM', '12:00 PM'),
(1001, '4:00 PM', '5:30 PM'),
(1001, '5:30 PM', '8:00 PM')
;WITH rowind AS (
SELECT EmpId, Start, Finish,
-- IIF returns 1 for each row that should generate a new row in the final result
IIF(Start = LAG(Finish, 1) OVER(PARTITION BY EmpId ORDER BY Start), 0, 1) newrow
FROM @t),
groups AS (
SELECT EmpId, Start, Finish,
-- Cumulative sum
SUM(newrow) OVER(PARTITION BY EmpId ORDER BY Start) csum
FROM rowind)
SELECT
EmpId,
MIN(Start) Start,
MAX(Finish) Finish
FROM groups
GROUP BY EmpId, csum
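A minimal Python sketch of the same flag-then-cumulative-sum idea, for one employee, may make the grouping easier to follow. The name `merge_contiguous` is an assumption; the logic mirrors the two CTEs: flag each row that does not start where the previous one finished, take a running total of the flags, then take the minimum start and maximum finish per total.

```python
from datetime import time
from itertools import accumulate, groupby


def merge_contiguous(blocks):
    """blocks: (start, finish) pairs for one employee."""
    rows = sorted(blocks)
    # 1 marks a row that opens a new block, 0 a row that continues the previous one.
    new_row = [
        1 if i == 0 or rows[i][0] != rows[i - 1][1] else 0
        for i in range(len(rows))
    ]
    group_ids = list(accumulate(new_row))          # the cumulative sum
    merged = []
    for _, members in groupby(zip(group_ids, rows), key=lambda pair: pair[0]):
        spans = [row for _, row in members]
        merged.append((min(s for s, _ in spans), max(f for _, f in spans)))
    return merged


shifts = [
    (time(10, 0), time(12, 0)),
    (time(16, 0), time(17, 30)),
    (time(17, 30), time(20, 0)),
]
merged_shifts = merge_contiguous(shifts)
print(merged_shifts)
```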
I have changed the names and types a little to make the example smaller, but this works, should be very fast, and has no limit on the number of records:
with cte as (
select
x1.id
,x1.t1
,x1.t2
,case when x2.t1 is null then 1 else 0 end as bef
,case when x3.t1 is null then 1 else 0 end as aft
from x x1
left join x x2 on x1.id=x2.id and x1.t1=x2.t2
left join x x3 on x1.id=x3.id and x1.t2=x3.t1
where x2.id is null
or x3.id is null
)
select
cteo.id
,cteo.t1
,isnull(z.t2,cteo.t2) as t2
from cte cteo
outer apply (select top 1 *
from cte ctei
where cteo.id=ctei.id and cteo.aft=0 and ctei.t1>cteo.t1
order by t1) z
where cteo.bef=1
And the fiddle for it: http://sqlfiddle.com/#!3/ad737/12/0
Option with Inline User-Defined Function AND CTE
CREATE FUNCTION dbo.Overlap
(
@availStart datetime,
@availEnd datetime,
@availStart2 datetime,
@availEnd2 datetime
)
RETURNS TABLE
RETURN
SELECT CASE WHEN @availStart > @availEnd2 OR @availEnd < @availStart2
THEN @availStart ELSE
CASE WHEN @availStart > @availStart2 THEN @availStart2 ELSE @availStart END
END AS availStart,
CASE WHEN @availStart > @availEnd2 OR @availEnd < @availStart2
THEN @availEnd ELSE
CASE WHEN @availEnd > @availEnd2 THEN @availEnd ELSE @availEnd2 END
END AS availEnd
;WITH cte AS
(
SELECT EmpID, Start, [End], ROW_NUMBER() OVER (PARTITION BY EmpID ORDER BY Start) AS Id
FROM dbo.TableName
), cte2 AS
(
SELECT Id, EmpID, Start, [End]
FROM cte
WHERE Id = 1
UNION ALL
SELECT c.Id, c.EmpID, o.availStart, o.availEnd
FROM cte c JOIN cte2 ct ON c.EmpID = ct.EmpID AND c.Id = ct.Id + 1
CROSS APPLY dbo.Overlap(c.Start, c.[End], ct.Start, ct.[End]) AS o
)
SELECT EmpID, Start, MAX([End])
FROM cte2
GROUP BY EmpID, Start
Demo on SQLFiddle