Find out how many tickets were open each day historically - sql

I have a table that looks like the below
ID | CreatedDate | CloseDate
271 | 01-Jan-2018 | 02-Jan-2018
278 | 03-Jan-2018 | 05-Jan-2018
333 | 03-Jan-2018 | NULL
I have been tasked to find out for each day in a month, how many tickets remained open going into the next day. A ticket would be classed as remaining open if the CloseDate was not on the same date as the created date, or the CloseDate is NULL.
My ideal output from the above would look like this
Day | Tickets Remaining
01-Jan-2018 | 1
02-Jan-2018 | 0
03-Jan-2018 | 2
04-Jan-2018 | 2
05-Jan-2018 | 1
Does this make sense? Using SQL Server 2016..

If you don't have a number/tally table handy, you can use a recursive CTE:
with cte as (
select id, createddate as dte, coalesce(closeddate, convert(date, getdate())) as closeddate
from t
union all
select id, dateadd(day, 1, dte), closeddate
from cte
where dte < closeddate
)
select dte, count(*)
from cte
group by dte;
Note: If any of the timespans can exceed 100 days, then you need to add option (maxrecursion 0).
If there are only a handful of dates, you can use a derived table and left join:
select v.dte, count(t.id)
from (values (convert(date, '2018-01-01')),
(convert(date, '2018-01-02')),
(convert(date, '2018-01-03')),
(convert(date, '2018-01-04')),
(convert(date, '2018-01-05')),
) v(dte) left join
t
on v.dte >= t.createddate and
(v.dte <= t.closeddate or t.closeddate is null)
group by v.dte
order by v.dte;

Related

Group By first day of month and join with a separate table

I have 2 tables in SQL
one with monthly sales targets:
Date Target
1/7/17 50000
1/8/17 65000
1/9/17 50000
1/10/17 48000
etc...
the other with sales orders:
TxnDate JobNum Value
3/7/17 100001 20000
3/7/17 100002 11000
8/7/17 100003 10000
10/8/17 100004 15000
15/9/17 100005 20000
etc...
what I want is a table with following:
Date Target Sales
1/7/17 50000 41000
1/8/17 65000 15000
1/9/17 50000 20000
please help me I'm a newbie to coding and this is doing my head in.. :)
Assuming your 1st table is targetSales and your 2nd table is Sales and your database is SQL Server:
select
t.date
, t.target
, isnull(sum(s.value), 0) as Sales
from targetSales t
left join Sales s
on (month(t.date) = month(s.date)
and year(t.date) = year(s.date))
group by t.date
, t.target
You can follow a similar approach if you use a different database, just find the equivalents of month() and year() functions for your RDBMS.
try this
select tb1.date,tb1.target,tb2.value from table1 as tb1
INNER JOIN (select sum(value) as sales, date from table2 group by date) as tb2
on tb1.date = tb2.date,
you can use this script for daily targets
An another way around, looks like in target table the date is always the first day of the month. So in the sales table, just round the TxnDate column value to first day of the month.
Query
select t1.[date],
max(t1.[target]) as [target],
coalesce(sum(t2.[value]), 0) as [value]
from [targets] t1
left join [sales] t2
on t1.[Date] = dateadd(day, - datepart(day, t2.[txnDate]) + 1, t2.[txnDate])
group by t1.[Date];
demo
If you take any datetime value in SQL Server, calculate the number of months from that date to zero datediff(month,0,TxnDate) then add that number of moths to zero dateadd(month, ... , 0) you get the first day of the month for the original datetime value. This works in all versions of SQL Server. With this we can sum the values of the orders by the first day of the month, then join to targets using that date.
CREATE TABLE Orders
([TxnDate] datetime, [JobNum] int, [Value] int)
;
INSERT INTO Orders
([TxnDate], [JobNum], [Value])
VALUES
('2017-07-03 00:00:00', 100001, 20000),
('2017-07-03 00:00:00', 100002, 11000),
('2017-07-08 00:00:00', 100003, 10000),
('2017-08-10 00:00:00', 100004, 15000),
('2017-09-15 00:00:00', 100005, 20000)
;
CREATE TABLE Targets
([Date] datetime, [Target] int)
;
INSERT INTO Targets
([Date], [Target])
VALUES
('2017-07-01 00:00:00', 50000),
('2017-08-01 00:00:00', 65000),
('2017-09-01 00:00:00', 50000),
('2017-10-10 00:00:00', 48000)
;
GO
9 rows affected
select dateadd(month,datediff(month,0,TxnDate), 0) month_start, sum(Value) SumValue
from Orders
group by dateadd(month, datediff(month,0,TxnDate), 0)
GO
month_start | SumValue
:------------------ | -------:
01/07/2017 00:00:00 | 41000
01/08/2017 00:00:00 | 15000
01/09/2017 00:00:00 | 20000
select
t.[Date], t.Target, coalesce(o.SumValue,0)
from targets t
left join (
select dateadd(month,datediff(month,0,TxnDate), 0) month_start, sum(Value) SumValue
from Orders
group by dateadd(month, datediff(month,0,TxnDate), 0)
) o on t.[Date] = o.month_start
GO
Date | Target | (No column name)
:------------------ | -----: | ---------------:
01/07/2017 00:00:00 | 50000 | 41000
01/08/2017 00:00:00 | 65000 | 15000
01/09/2017 00:00:00 | 50000 | 20000
10/10/2017 00:00:00 | 48000 | 0
dbfiddle here
This is not the best solution but this will give you a correct result.
select date,target,(
select sum(value)
from sales_orders s
where datepart(m,s.TxnDate) = datepart(m,targets.Date)
and datepart(year,s.TxnDate) = datepart(year,targets.Date)
) as sales
from targets

SQL how to count census points occurring between date records

I’m using MS-SQL-2008 R2 trying to write a script that calculates the Number of Hospital Beds occupied on any given day, at 2 census points: midnight, and 09:00.
I’m working from a data set of patient Ward Stays. Basically, each row in the table is a record of an individual patient's stay on a single ward, and records the date/time the patient is admitted onto the ward, and the date/time the patient leaves the ward.
A sample of this table is below:
Ward_Stay_Primary_Key | Ward_Start_Date_Time | Ward_End_Date_Time
1 | 2017-09-03 15:04:00.000 | 2017-09-27 16:55:00.000
2 | 2017-09-04 18:08:00.000 | 2017-09-06 18:00:00.000
3 | 2017-09-04 13:00:00.000 | 2017-09-04 22:00:00.000
4 | 2017-09-04 20:54:00.000 | 2017-09-08 14:30:00.000
5 | 2017-09-04 20:52:00.000 | 2017-09-13 11:50:00.000
6 | 2017-09-05 13:32:00.000 | 2017-09-11 14:49:00.000
7 | 2017-09-05 13:17:00.000 | 2017-09-12 21:00:00.000
8 | 2017-09-05 23:11:00.000 | 2017-09-06 17:38:00.000
9 | 2017-09-05 11:35:00.000 | 2017-09-14 16:12:00.000
10 | 2017-09-05 14:05:00.000 | 2017-09-11 16:30:00.000
The key thing to note here is that a patient’s Ward Stay can span any length of time, from a few hours to many days.
The following code enables me to calculate the number of beds at both census points for any given day, by specifying the date in the case statement:
SELECT
'05/09/2017' [Date]
,SUM(case when Ward_Start_Date_Time <= '05/09/2017 00:00:00.000' AND (Ward_End_Date_Time >= '05/09/2017 00:00:00.000' OR Ward_End_Date_Time IS NULL)then 1 else 0 end)[No. Beds Occupied at 00:00]
,SUM(case when Ward_Start_Date_Time <= '05/09/2017 09:00:00.000' AND (Ward_End_Date_Time >= '05/09/2017 09:00:00.000' OR Ward_End_Date_Time IS NULL)then 1 else 0 end)[No. Beds Occupied at 09:00]
FROM
WardStaysTable
And, based on the sample 10 records above, generates this output:
Date | No. Beds Occupied at 00:00 | No. Beds Occupied at 09:00
05/09/2017 | 4 | 4
To perform this for any number of days is obviously onerous, so what I’m looking to create is a query where I can specify a start/end date parameter (e.g. 1st-5th Sept), and for the query to then evaluate the Ward_Start_Date_Time and Ward_End_Date_Time variables for each record, and – grouping by the dates defined in the date parameter – count each time the 00:00:00.000 and 09:00:00.000 census points fall between these 2 variables, to give an output something along these lines (based on the above 10 records):
Date | No. Beds Occupied at 00:00 | No. Beds Occupied at 09:00
01/09/2017 | 0 | 0
02/09/2017 | 0 | 0
03/09/2017 | 0 | 0
04/09/2017 | 1 | 1
05/09/2017 | 4 | 4
I’ve approached this (perhaps naively) thinking that if I use a cte to create a table of dates (defined by the input parameters), along with associated midnight and 9am census date/time points, then I could use these variables to group and evaluate the dataset.
So, this code generates the grouping dates and census date/time points:
DECLARE
#StartDate DATE = '01/09/2017'
,#EndDate DATE = '05/09/2017'
,#0900 INT = 540
SELECT
DATEADD(DAY, nbr - 1, #StartDate) [Date]
,CONVERT(DATETIME,(DATEADD(DAY, nbr - 1, #StartDate))) [MidnightDate]
,DATEADD(mi, #0900,(CONVERT(DATETIME,(DATEADD(DAY, nbr - 1, #StartDate))))) [0900Date]
FROM
(
SELECT
ROW_NUMBER() OVER ( ORDER BY c.object_id ) AS nbr
FROM sys.columns c
) nbrs
WHERE nbr - 1 <= DATEDIFF(DAY, #StartDate, #EndDate)
The stumbling block I’ve hit is how to join the cte to the WardStays dataset, because there’s no appropriate key… I’ve tried a few iterations of using a subquery to make this work, but either I’m taking the wrong approach or I’m getting my syntax in a mess.
In simple terms, the logic I’m trying to create to get the output is something like:
SELECT
[Date]
,SUM (case when WST.Ward_Start_Date_Time <= [MidnightDate] AND (WST.Ward_End_Date_Time >= [MidnightDate] OR WST.Ward_End_Date_Time IS NULL then 1 else 0 end) [No. Beds Occupied at 00:00]
,SUM (case when WST.Ward_Start_Date_Time <= [0900Date] AND (WST.Ward_End_Date_Time >= [0900Date] OR WST.Ward_End_Date_Time IS NULL then 1 else 0 end) [No. Beds Occupied at 09:00]
FROM WardStaysTable WST
GROUP BY [Date]
Is the above somehow possible, or am I barking up the wrong tree and need to take a different approach altogether? Appreciate any advice.
I would expect something like this:
WITH dates as (
SELECT CAST(#StartDate as DATETIME) as dte
UNION ALL
SELECT DATEADD(DAY, 1, dte)
FROM dates
WHERE dte < #EndDate
)
SELECT dates.dte [Date],
SUM(CASE WHEN Ward_Start_Date_Time <= dte AND
Ward_END_Date_Time >= dte
THEN 1 ELSE 0
END) as num_beds_0000,
SUM(CASE WHEN Ward_Start_Date_Time <= dte + CAST('09:00' as DATETIME) AND
Ward_END_Date_Time >= dte + CAST('09:00' as DATETIME)
THEN 1 ELSE 0
END) as num_beds_0900
FROM dates LEFT JOIN
WardStaysTable wt
ON wt.Ward_Start_Date_Time <= DATEADD(day, 1, dates.dte) AND
wt.Ward_END_Date_Time >= dates.dte
GROUP BY dates.dte
ORDER BY dates.dte;
The cte is just creating the list of dates.
What a cool exercise. Here is what I came up with:
CREATE TABLE #tmp (ID int, StartDte datetime, EndDte datetime)
INSERT INTO #tmp values(1,'2017-09-03 15:04:00.000','2017-09-27 06:55:00.000')
INSERT INTO #tmp values(2,'2017-09-04 08:08:00.000','2017-09-06 18:00:00.000')
INSERT INTO #tmp values(3,'2017-09-04 13:00:00.000','2017-09-04 22:00:00.000')
INSERT INTO #tmp values(4,'2017-09-04 20:54:00.000','2017-09-08 14:30:00.000')
INSERT INTO #tmp values(5,'2017-09-04 20:52:00.000','2017-09-13 11:50:00.000')
INSERT INTO #tmp values(6,'2017-09-05 13:32:00.000','2017-09-11 14:49:00.000')
INSERT INTO #tmp values(7,'2017-09-05 13:17:00.000','2017-09-12 21:00:00.000')
INSERT INTO #tmp values(8,'2017-09-05 23:11:00.000','2017-09-06 07:38:00.000')
INSERT INTO #tmp values(9,'2017-09-05 11:35:00.000','2017-09-14 16:12:00.000')
INSERT INTO #tmp values(10,'2017-09-05 14:05:00.000','2017-09-11 16:30:00.000')
DECLARE
#StartDate DATE = '09/01/2017'
,#EndDate DATE = '10/01/2017'
, #nHours INT = 9
;WITH d(OrderDate) AS
(
SELECT DATEADD(DAY, n-1, #StartDate)
FROM (SELECT TOP (DATEDIFF(DAY, #StartDate, #EndDate) + 1)
ROW_NUMBER() OVER (ORDER BY [object_id]) FROM sys.all_objects) AS x(n)
)
, CTE AS(
select OrderDate, t2.*
from #tmp t2
cross apply(select orderdate from d ) d
where StartDte >= #StartDate and EndDte <= #EndDate)
select OrderDate,
SUM(CASE WHEN OrderDate >= StartDte and OrderDate <= EndDte THEN 1 ELSE 0 END) [No. Beds Occupied at 00:00],
SUM(CASE WHEN StartDTE <= DateAdd(hour,#nHours,CAST(OrderDate as datetime)) and DateAdd(hour,#nHours,CAST(OrderDate as datetime)) <= EndDte THEN 1 ELSE 0 END) [No. Beds Occupied at 09:00]
from CTE
GROUP BY OrderDate
This should allow you to check for any hour of the day using the #nHours parameter if you so choose. If you only want to see records that actually fall within your date range then you can filter the cross apply on start and end dates.

Duration overlap causing double counting

I'm using SQL Server Management Studio 2008 for query creation. Reporting Services 2008 for report creation.
I have been trying to work this out over a couple of weeks and I have hit a brick wall. I’m hoping someone will be able to come up with the solution as right now my brain has turned to mush.
I am currently developing an SQL query that will be feeding data through to a Reporting Services report. The aim of the report is to show the percentage of availability for first aid providers within locations around the county we are based in. The idea is that there should be only one first aider providing cover at a time at each of our 20 locations.
This has all been working fine apart from the first aiders at one location have been overlapping their cover at the start and end of each period of cover.
Example of cover overlap:
| Location | start_date | end_date |
+----------+---------------------+---------------------+
| Wick | 22/06/2015 09:00:00 | 22/06/2015 19:00:00 |
| Wick | 22/06/2015 18:30:00 | 23/06/2015 09:00:00 |
| Wick | 23/06/2015 09:00:00 | 23/06/2015 18:30:00 |
| Wick | 23/06/2015 18:00:00 | 24/06/2015 09:00:00 |
+----------+---------------------+---------------------+
In a perfect world the database that they set their cover in wouldn’t allow them to do this but it’s an externally developed database that doesn’t allow us to make changes like that to it. We also aren’t allowed to create functions, stored procedures, tally tables etc…
The query itself should return the number of minutes that each location has had first aid cover for, then broken down into hours of the day. Any overlap in cover shouldn’t end up adding additional cover and should be merged. One person can be on at a time, if they overlap then it should only count as one person lot of cover.
Example Output:
+----------+---------------------+---------------------+----------+--------------+--------+-------+------+----------+
| Location | fromDt | toDt | TimeDiff | Availability | DayN | DayNo | Hour | DayCount |
+----------+---------------------+---------------------+----------+--------------+--------+-------+------+----------+
| WicK | 22/06/2015 18:00:00 | 22/06/2015 18:59:59 | 59 | 100 | Monday | 1 | 18 | 0 |
| WicK | 22/06/2015 18:30:00 | 22/06/2015 18:59:59 | 29 | 50 | Monday | 1 | 18 | 0 |
| WicK | 22/06/2015 19:00:00 | 22/06/2015 19:59:59 | 59 | 100 | Monday | 1 | 19 | 0 |
+----------+---------------------+---------------------+----------+--------------+--------+-------+------+----------+
Example Code:
DECLARE
#StartTime datetime,
#EndTime datetime,
#GivenDate datetime;
SET #GivenDate = '2015-06-22';
SET #StartTime = #GivenDate + ' 00:00:00';
SET #EndTime = '2015-06-23' + ' 23:59:59';
Declare #Sample Table
(
Location Varchar(50),
StartDate Datetime,
EndDate Datetime
)
Insert #Sample
Select
sta.location,
act.Start,
act.END
from emp,
con,
sta,
act
where
emp.ID = con.ID
and con.location = sta.location
and SUBSTRING(sta.ident,3,2) in ('51','22')
and convert(varchar(10),act.start,111) between #GivenDate and #EndTime
and act.ACT= 18
group by sta.location,
act.Start,
act.END
order by 2
;WITH Yak (location, fromDt, toDt, maxDt,hourdiff)
AS (
SELECT location,
StartDate,
/*check if the period of cover rolls onto the next hour */
convert(datetime,convert(varchar(21),
CONVERT(varchar(10),StartDate,111)+' '
+convert(varchar(2),datepart(hour,StartDate))+':59'+':59'))
,
EndDate
,dateadd(hour,1,dateadd(hour, datediff(hour, 0, StartDate), 0))-StartDate
FROM #Sample
UNION ALL
SELECT location,
dateadd(second,1,toDt),
dateadd(hour, 1, toDt),
maxDt,
hourdiff
FROM Yak
WHERE toDt < maxDt
) ,
TAB1 (location, FROMDATE,TODATE1,TODATE) AS
(SELECT
location,
#StartTime,
convert(datetime,convert(varchar(21),
CONVERT(varchar(10),#StartTime,120)+' '
+convert(varchar(2),datepart(hour,#StartTime))+':59'+':59.999')),
#EndTime
from #Sample
UNION ALL
SELECT
location,
(DATEADD(hour, 1,(convert(datetime,convert(varchar(21),
CONVERT(varchar(10),FROMDATE,120)+' '
+convert(varchar(2),datepart(hour,FROMDATE))+':00'+':00.000')))))ToDate,
(DATEADD(hour, 1,(convert(datetime,convert(varchar(21),
CONVERT(varchar(10),TODATE1,120)+' '
+convert(varchar(2),datepart(hour,TODATE1))+':59'+':59.999'))))) Todate1,
TODATE
FROM TAB1 WHERE TODATE1 < TODATE
),
/*CTE Tab2 adds zero values to all possible hours between start and end dates */
TAB2 AS
(SELECT location, FROMDATE,
CASE WHEN TODATE1 > TODATE THEN TODATE ELSE TODATE1 END AS TODATE
FROM TAB1)
SELECT location,
fromDt,
/* Display MaxDT as start time if cover period goes into next dat */
CASE WHEN toDt > maxDt THEN maxDt ELSE toDt END AS toDt,
/* If the end date is on the next day find out the minutes between the start date and the end of the day or find out the minutes between the next day and the end date */
Case When ToDt > Maxdt then datediff(mi,fromDt,maxDt) else datediff(mi,FromDt,ToDt) end as TimeDiff,
Case When ToDt > Maxdt then round(datediff(S,fromDt,maxDt)/3600.0*100,0) else round(datediff(S,FromDt,ToDt)/3600.0*100.0,0) end as Availability,
/*Display the name of the day of the week*/
CASE WHEN toDt > maxDt THEN datename(dw,maxDt) ELSE datename(dw,fromDt) END AS DayN,
CASE WHEN toDt > maxDt THEN case when datepart(dw,maxDt)-1 = 0 then 7 else datepart(dw,maxDt)-1 end ELSE case when datepart(dw,fromDt)-1 = 0 then 7 else datepart(dw,fromDt)-1 END end AS DayNo
,DATEPART(hour, fromDt) as Hour,
'0' as DayCount
FROM Yak
where Case When ToDt > Maxdt then datediff(mi,fromDt,maxDt) else datediff(mi,FromDt,ToDt) end <> 0
group by location,fromDt,maxDt,toDt
Union all
SELECT
tab2.location,
convert(varchar(19),Tab2.FROMDATE,120),
convert(varchar(19),Tab2.TODATE,120),
'0',
'0',
datename(dw,FromDate) DayN,
case when datepart(dw,FromDate)-1 = 0 then 7 else datepart(dw,FromDate)-1 end AS DayNo,
DATEPART(hour, fromDate) as Hour,
COUNT(distinct datename(dw,fromDate))
FROM TAB2
Where datediff(MINUTE,convert(varchar(19),Tab2.FROMDATE,120),convert(varchar(19),Tab2.TODATE,120)) > 0
group by location, TODATE, FROMDATE
Order by 2
option (maxrecursion 0)
I have tried the following forum entries but they haven't worked in my case:
http://forums.teradata.com/forum/general/need-help-merging-consecutive-and-overlapping-date-spans
Checking for time range overlap, the watchman problem [SQL]
Calculate Actual Downtime ignoring overlap in dates/times
Sorry for being so lengthy but I thought I would try to give you as much detail as possible. Any help will be really appreciated. Thank you.
So the solution I came up with uses temp tables, which you can easily change to be CTEs so you can avoid using a stored procedure.
I tried working with window functions to find overlapping records and get the min and max times, the issue is where you have overlap chaining e.g. 09:00 - 09:10, 09:05 - 09:15, 09:11 - 09:20, so all minutes from 09:00 to 09:20 are covered, but it's almost impossible to tell that 09:00 - 09:10 is related to 09:11 - 09:20 without recursing through the results until you get to the bottom of the chain. (Hopefully that makes sense).
So I exploded out all of the date ranges into every minute between the StartDate and EndDate, then you can use the ROW_NUMBER() window function to catch any duplicates, which in turn you can use to see how many different people covered the same minute.
CREATE TABLE dbo.dates
(
Location VARCHAR(64),
StartDate DATETIME,
EndDate DATETIME
);
INSERT INTO dbo.dates VALUES
('Wick','20150622 09:00:00','20150622 19:00:00'),
('Wick','20150622 18:30:00','20150624 09:00:00'),
('Wick','20150623 09:00:00','20150623 18:30:00'),
('Wick','20150623 18:00:00','20150624 09:00:00'),
('Wick','20150630 09:00:00','20150630 09:30:00'),
('Wick','20150630 09:00:00','20150630 09:45:00'),
('Wick','20150630 09:10:00','20150630 09:25:00'),
('Wick','20150630 09:35:00','20150630 09:55:00'),
('Wick','20150630 09:57:00','20150630 10:10:00');
SELECT ROW_NUMBER() OVER (PARTITION BY Location ORDER BY StartDate) [Id],
Location,
StartDate,
EndDate
INTO dbo.overlaps
FROM dbo.dates;
SELECT TOP 10000 N=IDENTITY(INT, 1, 1)
INTO dbo.Num
FROM master.dbo.syscolumns a CROSS JOIN master.dbo.syscolumns b;
SELECT 0 [N] INTO dbo.Numbers;
INSERT INTO dbo.Numbers SELECT * FROM dbo.Num;
SELECT [Location] = raw.Location,
[WorkedDate] = CAST([MinuteWorked] AS DATE),
[DayN] = DATENAME(WEEKDAY, [MinuteWorked]),
[DayNo] = DATEPART(WEEKDAY, [MinuteWorked]) -1,
[Hour] = DATEPART(HOUR, [MinuteWorked]),
[MinutesWorked] = SUM(IIF(raw.[Minutes] = 1, 1, 0)),
[MaxWorkers] = MAX(raw.[Minutes])
FROM
(
SELECT
o.Location,
DATEADD(MINUTE, n.N, StartDate) [MinuteWorked],
ROW_NUMBER() OVER (PARTITION BY o.Location, DATEADD(MINUTE, n.N, StartDate) ORDER BY DATEADD(MINUTE, n.N, StartDate)) [Minutes]
FROM dbo.overlaps o
INNER JOIN dbo.Numbers n ON n.N < DATEDIFF(MINUTE, StartDate, EndDate)
) raw
GROUP BY
raw.Location,
CAST([MinuteWorked] AS DATE),
DATENAME(WEEKDAY, [MinuteWorked]),
DATEPART(WEEKDAY, [MinuteWorked]) - 1,
DATEPART(HOUR, [MinuteWorked])
Here's a subset of the results:
Location WorkedDate DayN DayNo Hour MinutesWorked MaxWorkers
Wick 2015-06-24 Wednesday 3 8 60 2
Wick 2015-06-30 Tuesday 2 9 58 3
Wick 2015-06-30 Tuesday 2 10 10 1
Here's the fiddle

SQL breakout date range to rows

I am trying to take given date ranges found in a data set and divide them into unique rows for each day in the range (example below). Doing the opposite in SQL is pretty straight forward, but I am struggling to achieve the desired query output.
Beginning data:
ITEM START_DATE END_DATE
A 1/1/2015 1/5/2015
B 2/5/2015 2/7/2015
Desired query output:
ITEM DATE_COVERED
A 1/1/2015
A 1/2/2015
A 1/3/2015
A 1/4/2015
A 1/5/2015
B 2/5/2015
B 2/6/2015
B 2/7/2015
The fastest way will be some tally table:
DECLARE #t TABLE
(
ITEM CHAR(1) ,
START_DATE DATE ,
END_DATE DATE
)
INSERT INTO #t
VALUES ( 'A', '1/1/2015', '1/5/2015' ),
( 'B', '2/5/2015', '2/7/2015' )
;WITH cte AS(SELECT -1 + ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) d FROM
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t1(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t2(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t3(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t4(n))
SELECT t.ITEM, ca.DATE_COVERED FROM #t t
CROSS APPLY(SELECT DATEADD(dd, d, t.START_DATE) AS DATE_COVERED
FROM cte
WHERE DATEADD(dd, d, t.START_DATE) BETWEEN t.START_DATE AND t.END_DATE) ca
ORDER BY t.ITEM, ca.DATE_COVERED
Query:
SQLFiddleExample
SELECT t.ITEM,
DATEADD(day,n.number, t.START_DATE) AS DATE_COVERED
FROM Table1 t,
(SELECT number
FROM master..spt_values
WHERE [type] = 'P') n
WHERE START_DATE <= DATEADD(day,n.number, t.START_DATE)
AND END_DATE >= DATEADD(day,n.number, t.START_DATE)
Result:
| ITEM | DATE_COVERED |
|------|--------------|
| A | 2015-01-01 |
| A | 2015-01-02 |
| A | 2015-01-03 |
| A | 2015-01-04 |
| A | 2015-01-05 |
| B | 2015-02-05 |
| B | 2015-02-06 |
| B | 2015-02-07 |
NOTE: this only works if the difference between your startdate and enddate is a maximum of 2047 days (master..spt_values only allows 0..2047 range of values)
select item, dateadd(d,v.number,d.start_date) adate
from begindata d
join master..spt_values v on v.type='P'
and v.number between 0 and datediff(d, start_date, end_date)
order by adate;
I'd like to say I did this myself but I got the code from this
Here is a fiddle with your expected result
TRY THIS...
CREATE TABLE Table1
([ITEM] varchar(1), [START_DATE] date, [END_DATE] date)
;
INSERT INTO Table1
([ITEM], [START_DATE], [END_DATE])
VALUES ('A', '2015-01-01', '2015-01-05'), ('B', '2015-02-05', 2015-02-07');
WITH Days
AS ( SELECT ITEM, START_DATE AS [Date], 1 AS [level] from Table1
UNION ALL
SELECT TABLE1.ITEM, DATEADD(DAY, 1, [Date]), [level] + 1
FROM Days,Table1
WHERE DAYS.ITEM=TABLE1.ITEM AND [Date] < END_DATE )
SELECT distinct [Date]
FROM Days
DEMO

Combine consecutive date ranges

Using SQL Server 2008 R2,
I'm trying to combine date ranges into the maximum date range given that one end date is next to the following start date.
The data is about different employments. Some employees may have ended their employment and have rejoined at a later time. Those should count as two different employments (example ID 5). Some people have different types of employment, running after each other (enddate and startdate neck-to-neck), in this case it should be considered as one employment in total (example ID 30).
An employment period that has not ended has an enddate that is null.
Some examples is probably enlightening:
declare #t as table (employmentid int, startdate datetime, enddate datetime)
insert into #t values
(5, '2007-12-03', '2011-08-26'),
(5, '2013-05-02', null),
(30, '2006-10-02', '2011-01-16'),
(30, '2011-01-17', '2012-08-12'),
(30, '2012-08-13', null),
(66, '2007-09-24', null)
-- expected outcome
EmploymentId StartDate EndDate
5 2007-12-03 2011-08-26
5 2013-05-02 NULL
30 2006-10-02 NULL
66 2007-09-24 NULL
I've been trying different "islands-and-gaps" techniques but haven't been able to crack this one.
The strange bit you see with my use of the date '31211231' is just a very large date to handle your "no-end-date" scenario. I have assumed you won't really have many date ranges per employee, so I've used a simple Recursive Common Table Expression to combine the ranges.
To make it run faster, the starting anchor query keeps only those dates that will not link up to a prior range (per employee). The rest is just tree-walking the date ranges and growing the range. The final GROUP BY keeps only the largest date range built up per starting ANCHOR (employmentid, startdate) combination.
SQL Fiddle
MS SQL Server 2008 Schema Setup:
create table Tbl (
employmentid int,
startdate datetime,
enddate datetime);
insert Tbl values
(5, '2007-12-03', '2011-08-26'),
(5, '2013-05-02', null),
(30, '2006-10-02', '2011-01-16'),
(30, '2011-01-17', '2012-08-12'),
(30, '2012-08-13', null),
(66, '2007-09-24', null);
/*
-- expected outcome
EmploymentId StartDate EndDate
5 2007-12-03 2011-08-26
5 2013-05-02 NULL
30 2006-10-02 NULL
66 2007-09-24 NULL
*/
Query 1:
;with cte as (
select a.employmentid, a.startdate, a.enddate
from Tbl a
left join Tbl b on a.employmentid=b.employmentid and a.startdate-1=b.enddate
where b.employmentid is null
union all
select a.employmentid, a.startdate, b.enddate
from cte a
join Tbl b on a.employmentid=b.employmentid and b.startdate-1=a.enddate
)
select employmentid,
startdate,
nullif(max(isnull(enddate,'32121231')),'32121231') enddate
from cte
group by employmentid, startdate
order by employmentid
Results:
| EMPLOYMENTID | STARTDATE | ENDDATE |
-----------------------------------------------------------------------------------
| 5 | December, 03 2007 00:00:00+0000 | August, 26 2011 00:00:00+0000 |
| 5 | May, 02 2013 00:00:00+0000 | (null) |
| 30 | October, 02 2006 00:00:00+0000 | (null) |
| 66 | September, 24 2007 00:00:00+0000 | (null) |
SET NOCOUNT ON
DECLARE #T TABLE(ID INT,FromDate DATETIME, ToDate DATETIME)
INSERT INTO #T(ID,FromDate,ToDate)
SELECT 1,'20090801','20090803' UNION ALL
SELECT 2,'20090802','20090809' UNION ALL
SELECT 3,'20090805','20090806' UNION ALL
SELECT 4,'20090812','20090813' UNION ALL
SELECT 5,'20090811','20090812' UNION ALL
SELECT 6,'20090802','20090802'
SELECT ROW_NUMBER() OVER(ORDER BY s1.FromDate) AS ID,
s1.FromDate,
MIN(t1.ToDate) AS ToDate
FROM #T s1
INNER JOIN #T t1 ON s1.FromDate <= t1.ToDate
AND NOT EXISTS(SELECT * FROM #T t2
WHERE t1.ToDate >= t2.FromDate
AND t1.ToDate < t2.ToDate)
WHERE NOT EXISTS(SELECT * FROM #T s2
WHERE s1.FromDate > s2.FromDate
AND s1.FromDate <= s2.ToDate)
GROUP BY s1.FromDate
ORDER BY s1.FromDate
An alternative solution that uses window functions rather than recursive CTEs
SELECT
employmentid,
MIN(startdate) as startdate,
NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
FROM (
SELECT
employmentid,
startdate,
enddate,
DATEADD(
DAY,
-COALESCE(
SUM(DATEDIFF(DAY, startdate, enddate)+1) OVER (PARTITION BY employmentid ORDER BY startdate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
0
),
startdate
) as grp
FROM #t
) withGroup
GROUP BY employmentid, grp
ORDER BY employmentid, startdate
This works by calculating a grp value that will be the same for all consecutive rows. This is achieved by:
Determine totals days the span occupies (+1 as the dates are inclusive)
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM #t
Cumulative sum the days spanned for each employment, ordered by startdate. This gives us the total days spanned by all the previous employment spans
We coalesce with 0 to ensure we dont have NULLs in our cumulative sum of days spanned
We do not include current row in our cumulative sum, this is because we will use the value against startdate rather than enddate (we cant use it against enddate because of the NULLs)
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM #t
) inner1
Subtract the cumulative days from the startdate to get our grp. This is the crux of the solution.
If the start date increases at the same rate as the days spanned then the days are consecutive, and subtracting the two will give us the same value.
If the startdate increases faster than the days spanned then there is a gap and we will get a new grp value greater than the previous one.
Although grp is a date, the date itself is meaningless we are using just as a grouping value
SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
FROM (
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM #t
) inner1
) inner2
With the results
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| employmentid | startdate | enddate | daysSpanned | cumulativeDaysSpanned | grp |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 5 | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 | 1363 | 0 | 2007-12-03 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 5 | 2013-05-02 00:00:00.000 | NULL | NULL | 1363 | 2009-08-08 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2006-10-02 00:00:00.000 | 2011-01-16 00:00:00.000 | 1568 | 0 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2011-01-17 00:00:00.000 | 2012-08-12 00:00:00.000 | 574 | 1568 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2012-08-13 00:00:00.000 | NULL | NULL | 2142 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 66 | 2007-09-24 00:00:00.000 | NULL | NULL | 0 | 2007-09-24 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
Finally we can GROUP BY grp to get the get rid of the consecutive days.
Use MIN and MAX to get the new startdate and endate
To handle the NULL enddate we give them a large value to get picked up by MAX then convert them back to NULL again
SELECT
employmentid,
MIN(startdate) as startdate,
NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
FROM (
SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
FROM (
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM #t
) inner1
) inner2
) inner3
GROUP BY employmentid, grp
ORDER BY employmentid, startdate
To get the desired result
+--------------+-------------------------+-------------------------+
| employmentid | startdate | enddate |
+--------------+-------------------------+-------------------------+
| 5 | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 |
+--------------+-------------------------+-------------------------+
| 5 | 2013-05-02 00:00:00.000 | NULL |
+--------------+-------------------------+-------------------------+
| 30 | 2006-10-02 00:00:00.000 | NULL |
+--------------+-------------------------+-------------------------+
| 66 | 2007-09-24 00:00:00.000 | NULL |
+--------------+-------------------------+-------------------------+
We can combine the inner queries to get the query at the start of this answer. Which is shorter, but less explainable
Limitations of all this required that
there are no overlaps of startdate and enddate for an employment. This could produce collisions in our grp.
startdate is not NULL. However this could be overcome by replacing NULL start dates with small date values
Future developers can decipher the window black magic you performed
A modified script for combining all overlapping periods. For example
01.01.2001-01.01.2010
05.05.2005-05.05.2015
will give one period:
01.01.2001-05.05.2015
tbl.enddate must be completed
;WITH cte
AS(
SELECT
a.employmentid
,a.startdate
,a.enddate
from tbl a
left join tbl c on a.employmentid=c.employmentid
and a.startdate > c.startdate
and a.startdate <= dateadd(day, 1, c.enddate)
WHERE c.employmentid IS NULL
UNION all
SELECT
a.employmentid
,a.startdate
,a.enddate
from cte a
inner join tbl c on a.startdate=c.startdate
and (c.startdate = dateadd(day, 1, a.enddate) or (c.enddate > a.enddate and c.startdate <= a.enddate))
)
select distinct employmentid,
startdate,
nullif(max(enddate),'31.12.2099') enddate
from cte
group by employmentid, startdate