I need to return the most recent 24 contiguous hours from a query. The table holds hourly data. Getting the last 24 hours is not a problem, but I sometimes have missing data and therefore need to go further back in time to find the first "full" set of 24 hours.
select date, value from TABLE
where date >= (select max(date)-1 from TABLE)
However sometimes I have missing hours with this query. How can I ensure I always get 24 rows back and that it is the most recent block of 24 hours?
An example below:
Notice that for category A, 1/31/2020 hour 23 is missing, so what should be returned are hours 1/31/2020 22 through 1/30/2020 23. Category B should return hours 2/1/2020 hour 0 through 1/31/2020 hour 1.
You need a few steps. First, for each record, you need to see how many hours of contiguous preceding data it has. That is what the grouped_hour_data clause does in the solution below.
Then, you need to select from that result, getting only the rows that have a full 24 hours of contiguous preceding data. Then fetch only the first 24 rows of that.
This solution is simplified to take advantage of the fact that all your dates were truncated to the hour and there were no duplicates. If your problem is more complicated than that, this solution can still support it but it will need to be revised.
In this example, we create test data going back several days, but remove data from individual hours on the 16th and 17th, so that the most recent contiguous 24-hour period ends early on the 16th.
alter session set nls_date_format = 'DD-MON-YYYY HH24:MI:SS';
with hour_data_raw AS (
SELECT to_date('17-JUN-2020 17:00:00','DD-MON-YYYY HH24:MI:SS') - ( INTERVAL '1' HOUR ) * rownum dte
FROM dual
CONNECT BY rownum <= 200 ),
hour_data AS ( SELECT dte
FROM hour_data_raw
WHERE NOT TRUNC(dte,'HH') = to_date('17-JUN-2020 02:00:00','DD-MON-YYYY HH24:MI:SS')
AND NOT TRUNC(dte,'HH') = to_date('16-JUN-2020 02:00:00','DD-MON-YYYY HH24:MI:SS') ),
-- SOLUTION BEGINS HERE... everything above is just test data
-- WITH...
grouped_hour_data AS (
SELECT h.*, count(trunc(h.dte,'HH')) OVER ( ORDER BY dte desc RANGE BETWEEN CURRENT ROW AND INTERVAL '1' DAY - INTERVAL '1' SECOND FOLLOWING ) cnt
FROM hour_data h
ORDER BY dte)
SELECT * FROM grouped_hour_data
WHERE cnt = 24
ORDER BY dte desc
FETCH FIRST 24 ROWS ONLY;
+----------------------+-----+
| DTE | CNT |
+----------------------+-----+
| 16-JUN-2020 01:00:00 | 24 |
| 16-JUN-2020 00:00:00 | 24 |
| 15-JUN-2020 23:00:00 | 24 |
| 15-JUN-2020 22:00:00 | 24 |
| 15-JUN-2020 21:00:00 | 24 |
| 15-JUN-2020 20:00:00 | 24 |
| 15-JUN-2020 19:00:00 | 24 |
| 15-JUN-2020 18:00:00 | 24 |
| 15-JUN-2020 17:00:00 | 24 |
| 15-JUN-2020 16:00:00 | 24 |
| 15-JUN-2020 15:00:00 | 24 |
| 15-JUN-2020 14:00:00 | 24 |
| 15-JUN-2020 13:00:00 | 24 |
| 15-JUN-2020 12:00:00 | 24 |
| 15-JUN-2020 11:00:00 | 24 |
| 15-JUN-2020 10:00:00 | 24 |
| 15-JUN-2020 09:00:00 | 24 |
| 15-JUN-2020 08:00:00 | 24 |
| 15-JUN-2020 07:00:00 | 24 |
| 15-JUN-2020 06:00:00 | 24 |
| 15-JUN-2020 05:00:00 | 24 |
| 15-JUN-2020 04:00:00 | 24 |
| 15-JUN-2020 03:00:00 | 24 |
| 15-JUN-2020 02:00:00 | 24 |
+----------------------+-----+
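To make the counting idea easier to follow outside the database, here is a minimal Python sketch of the same logic, assuming (as the answer does) that every timestamp is truncated to the hour and there are no duplicates. The function name latest_full_day is hypothetical, and instead of filtering on cnt = 24 and fetching 24 rows, it rebuilds the block directly from the newest hour that has 24 contiguous hours ending at it.

```python
from datetime import datetime, timedelta


def latest_full_day(hours):
    """Return the most recent 24 contiguous hourly timestamps, newest first.

    Assumes timestamps are truncated to the hour and contain no duplicates.
    """
    present = set(hours)
    for end in sorted(present, reverse=True):
        block = [end - timedelta(hours=k) for k in range(24)]
        # Same test as cnt = 24: all 24 hours ending at `end` are present.
        if all(h in present for h in block):
            return block
    return []  # no complete 24-hour block exists


# Same test data as the query above: 200 hours back from 17-JUN-2020 17:00,
# with 02:00 on the 16th and on the 17th removed.
top = datetime(2020, 6, 17, 17)
missing = {datetime(2020, 6, 17, 2), datetime(2020, 6, 16, 2)}
hours = [top - timedelta(hours=n) for n in range(1, 201)]
hours = [h for h in hours if h not in missing]

block = latest_full_day(hours)
print(block[0], "...", block[-1])
# prints: 2020-06-16 01:00:00 ... 2020-06-15 02:00:00
```

The block it finds matches the 24 rows listed in the result table.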
EDIT: handling category field
To handle the additional category field you added, you need to do a few things.
First, PARTITION BY category when you are computing the cnt field. This will cause each category's data to be treated separately when computing this value. So, for example, a value in hour 2 for category A will not count as a value in hour 2 for category B.
Second, you can no longer use FETCH FIRST 24 ROWS ONLY to get the data you want, because you need the first 24 rows in each category now. So, you need an extra step (ordered_groups, in the revised query below) to order the rows in each category that have 24 continuous hours of data preceding them. Call that ordering rn and then, in the final query, just select where rn <= 24.
WITH grouped_hour_data AS (
SELECT h.*, count(trunc(h.dte,'HH')) OVER (
PARTITION BY category
ORDER BY dte desc
RANGE BETWEEN CURRENT ROW
AND INTERVAL '1' DAY - INTERVAL '1' SECOND FOLLOWING ) cnt
FROM hour_data h
ORDER BY dte),
ordered_groups AS (
SELECT ghd.*, row_number() over ( partition by ghd.category order by ghd.dte desc ) rn
FROM grouped_hour_data ghd
WHERE ghd.cnt = 24 )
SELECT * FROM ordered_groups
WHERE rn <= 24
ORDER BY category, dte desc;
Disclosure: I have not tested this updated logic so there may be some errors.
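Since the revised query is untested, a small Python sketch of the per-category logic may help as a cross-check. It is only an illustration: the function name and the sample data are hypothetical, modelled on the example in the question (category A is missing 1/31/2020 hour 23, category B is complete).

```python
from collections import defaultdict
from datetime import datetime, timedelta


def latest_full_day_per_category(rows):
    """rows: iterable of (category, hour). Returns {category: 24 hours, newest first}."""
    by_category = defaultdict(set)
    for category, hour in rows:
        by_category[category].add(hour)  # plays the role of PARTITION BY category

    result = {}
    for category, present in by_category.items():
        for end in sorted(present, reverse=True):
            block = [end - timedelta(hours=k) for k in range(24)]
            if all(h in present for h in block):
                result[category] = block
                break  # keep only the most recent full block
    return result


# Hypothetical data: both categories run hourly from 29 Jan 2020 00:00
# to 1 Feb 2020 00:00, but category A is missing 31 Jan 23:00.
start = datetime(2020, 1, 29)
all_hours = [start + timedelta(hours=n) for n in range(73)]
rows = [("A", h) for h in all_hours if h != datetime(2020, 1, 31, 23)]
rows += [("B", h) for h in all_hours]

blocks = latest_full_day_per_category(rows)
for category, block in sorted(blocks.items()):
    print(category, block[0], "...", block[-1])
# prints: A 2020-01-31 22:00:00 ... 2020-01-30 23:00:00
# prints: B 2020-02-01 00:00:00 ... 2020-01-31 01:00:00
```

A category with no complete 24-hour block is simply absent from the result.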
It looks like you actually want the last 24 rows in your hourly table. If so, you can use a row-limiting clause:
select date, value
from mytable
order by date desc
fetch first 24 rows only
Or if there may be multiple records per hour, then another option is dense_rank():
select date, value
from (
select t.*, dense_rank() over(order by trunc(date, 'hh24') desc) rn
from mytable t
) t
where rn <= 24
[EDIT] The below should work for you:
IF OBJECT_ID('tempdb..#hours') IS NOT NULL
DROP TABLE #hours
create table #hours ([Hour] int)
-- datepart(hour, ...) returns 0 through 23, so seed those values
insert into #hours ([Hour]) values
(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),
(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23)
-- step 1 --
IF OBJECT_ID('tempdb..#temp1') IS NOT NULL
DROP TABLE #temp1
select
t.[Date]
,convert(date,t.[Date]) [Day]
,datepart(hour,t.[Date]) [Hour]
,t.Value
into
#temp1
from
#yourtable t
-- step 2 --
IF OBJECT_ID('tempdb..#temp2') IS NOT NULL
DROP TABLE #temp2
select
max(t.[Day]) [MostRecentDay]
into
#temp2
from
#temp1 t
cross apply (
select
count(distinct i.[Hour]) [HrCt]
from
#temp1 i
where
t.[Day] = i.[Day]
) hc
where
hc.HrCt <> 24
-- step 3 --
IF OBJECT_ID('tempdb..#temp3') IS NOT NULL
DROP TABLE #temp3
select
min(h.[Hour]) [FirstBlank]
into
#temp3
from
#temp2 t2
cross join #hours h
-- drive from #hours so that hours with no data show up as unmatched rows
left outer join #temp1 t1
on t2.[MostRecentDay] = t1.[Day]
and h.[Hour] = t1.[Hour]
where
t1.[Hour] is null
-- final select --
select top 24
t1.[Date]
,t1.[Value]
from
#temp1 t1
cross join #temp2 t2
cross join #temp3 t3
where
t1.[Date] < dateadd(hour, t3.[FirstBlank], convert(datetime, t2.[MostRecentDay]))
order by
t1.[Date] desc
Related
I have the following table Jobs:
|Id | StartDateTime | EndDateTime
+----+---------------------+----------------------
|1 | 2020-10-20 23:00:00 | 2020-10-21 05:00:00
|2 | 2020-10-21 10:00:00 | 2020-10-21 11:00:00
Note job id 1 spans October 20 and 21.
I am using the following query
SELECT DAY(StartDateTime), COUNT(id)
FROM Jobs
GROUP BY DAY(StartDateTime)
This gives the following output. The problem is that day 21 does not include job id 1. Since the job spans two days, I want it counted on both day 20 and day 21.
Day | TotalJobs
----+----------
20 | 1
21 | 1
I am struggling to get the following expected output:
Day | TotalJobs
----+----------
20 | 1
21 | 2
One method is to generate the days that you want and then count overlaps:
with days as (
select convert(date, min(j.startdatetime)) as startd,
convert(date, max(j.enddatetime)) as endd
from jobs j
union all
select dateadd(day, 1, startd), endd
from days
where startd < endd
)
select days.startd, count(j.id)
from days left join
jobs j
on j.startdatetime < dateadd(day, 1, startd) and
j.enddatetime >= startd
group by days.startd;
Here is a db<>fiddle.
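For readers who want to check the overlap condition by hand, here is a minimal Python sketch of the same idea: generate every day between the earliest start and the latest end, then count the jobs that overlap each day. The function name jobs_per_day is hypothetical; the data is the two-row Jobs table from the question.

```python
from datetime import datetime, timedelta

jobs = [
    (1, datetime(2020, 10, 20, 23), datetime(2020, 10, 21, 5)),
    (2, datetime(2020, 10, 21, 10), datetime(2020, 10, 21, 11)),
]


def jobs_per_day(jobs):
    """Count, for each calendar day in range, the jobs that overlap it."""
    first = min(start for _, start, _ in jobs).date()
    last = max(end for _, _, end in jobs).date()
    counts = {}
    day = first
    while day <= last:
        day_start = datetime.combine(day, datetime.min.time())
        next_day = day_start + timedelta(days=1)
        # Same test as the join: starts before the day ends, ends on or after it begins.
        counts[day] = sum(
            1 for _, start, end in jobs if start < next_day and end >= day_start
        )
        day += timedelta(days=1)
    return counts


counts = jobs_per_day(jobs)
for day, total in counts.items():
    print(day, total)
# prints: 2020-10-20 1
# prints: 2020-10-21 2
```

Unlike grouping on DAY(StartDateTime), this keys on the full date, so it also works when the data covers more than one month.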
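You can count the jobs that start and end on the same day once, and count the jobs whose start and end fall on different days twice, once under the start day and once under the end day. Note that this only covers jobs spanning at most two days, and because DAY() returns the day of the month, data covering more than one month would need to group on the full date instead: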
SELECT a.date, SUM(counts) from (
SELECT DAY(StartDateTime) as date, COUNT(id) counts
FROM Table1
WHERE DAY(StartDateTime) = DAY(EndDateTime)
GROUP BY StartDateTime
UNION ALL
SELECT DAY(EndDateTime), COUNT(id)
FROM Table1
WHERE DAY(StartDateTime) != DAY(EndDateTime)
GROUP BY EndDateTime
UNION ALL
SELECT DAY(StartDateTime), COUNT(id)
FROM Table1
WHERE DAY(StartDateTime) != DAY(EndDateTime)
GROUP BY StartDateTime) a
GROUP BY a.date
Here is a SQL Fiddle link. Replace Table1 with Jobs when running this against your own database.
My knowledge is pretty basic so your help would be highly appreciated.
I'm trying to return the same row multiple times when it meets the condition (I only have access to select query).
I have a table of more than 500000 records with Customer ID, Start Date and End Date, where end date could be null.
I am trying to add a new column called Week_No and list all rows accordingly. For example if the date range is more than one week, then the row must be returned multiple times with corresponding week number. Also I would like to count overlapping days, which will never be more than 7 (week) per row and then count unavailable days using second table.
Sample data below
t1
ID | Start_Date | End_Date
000001 | 12/12/2017 | 03/01/2018
000002 | 13/01/2018 |
000003 | 02/01/2018 | 11/01/2018
...
t2
ID | Unavailable
000002 | 14/01/2018
000003 | 03/01/2018
000003 | 04/01/2018
000003 | 08/01/2018
...
I cannot pass the stage of adding week no. I have tried using CASE and UNION ALL but keep getting errors.
declare @week01start datetime = '2018-01-01 00:00:00'
declare @week01end datetime = '2018-01-07 00:00:00'
declare @week02start datetime = '2018-01-08 00:00:00'
declare @week02end datetime = '2018-01-14 00:00:00'
...
SELECT
ID,
'01' as Week_No,
'2018' as YEAR,
Start_Date,
End_Date
FROM t1
WHERE (Start_Date <= @week01end and End_Date >= @week01start)
or (Start_Date <= @week01end and End_Date is null)
UNION ALL
SELECT
ID,
'02' as Week_No,
'2018' as YEAR,
Start_Date,
End_Date
FROM t1
WHERE (Start_Date <= @week02end and End_Date >= @week02start)
or (Start_Date <= @week02end and End_Date is null)
...
The new table should look like this
ID | Week_No | Year | Start_Date | End_Date | Overlap | Unavail_Days
000001 | 01 | 2018 | 12/12/2017 | 03/01/2018 | 3 |
000002 | 02 | 2018 | 13/01/2018 | | 2 | 1
000003 | 01 | 2018 | 02/01/2018 | 11/01/2018 | 6 | 2
000003 | 02 | 2018 | 02/01/2018 | 11/01/2018 | 4 | 1
...
From a business point of view, I cannot understand what you are trying to achieve. You can, however, use the following code as a starting point for calculating your overlapping days and so on. I did it the way you asked, but I would recommend a separate table, such as a time dimension, for a cleaner solution.
/*sample data set in temp table*/
select '000001' as id, '2017-12-12' as start_dt, '2018-01-03' as end_dt into #tmp union
select '000002' as id, '2018-01-13' as start_dt, null as end_dt union
select '000003' as id, '2018-01-02' as start_dt, '2018-01-11' as end_dt
/*calculate week numbers and week diff according to dates*/
select *,
DATEPART(WK,start_dt) as start_weekNumber,
DATEPART(WK,end_dt) as end_weekNumber,
case
when DATEPART(WK,end_dt) - DATEPART(WK,start_dt) >= 0 then (DATEPART(WK,end_dt) - DATEPART(WK,start_dt)) +1 -- >= 0 so a range inside one week gives 1
else (52 - DATEPART(WK,start_dt)) + DATEPART(WK,end_dt)
end as WeekDiff
into #tmp1
from
(
SELECT *,DATEADD(DAY, 2 - DATEPART(WEEKDAY, start_dt), CAST(start_dt AS DATE)) [start_dt_Week_Start_Date],
DATEADD(DAY, 8 - DATEPART(WEEKDAY, start_dt), CAST(start_dt AS DATE)) [startdt_Week_End_Date],
DATEADD(DAY, 2 - DATEPART(WEEKDAY, end_dt), CAST(end_dt AS DATE)) [end_dt_Week_Start_Date],
DATEADD(DAY, 8 - DATEPART(WEEKDAY, end_dt), CAST(end_dt AS DATE)) [end_dt_Week_End_Date]
from #tmp
) s
/*cte used to create duplicates when week diff is over 1*/
;with x as
(
SELECT TOP (10) rn = ROW_NUMBER() --modify the max you want
OVER (ORDER BY [object_id])
FROM sys.all_columns
ORDER BY [object_id]
)
/*final query*/
select --*
ID,
start_weekNumber+ (r-1) as Week,
DATEPART(YY,start_dt) as [YEAR],
start_dt,
end_dt,
null as Overlap,
null as unavailable_days
from
(
select *,
ROW_NUMBER() over (partition by id order by id) r
from
(
select d.* from x
CROSS JOIN #tmp1 AS d
WHERE x.rn <= d.WeekDiff
union all
select * from #tmp1
where WeekDiff is null
) a
)a_ext
order by id,start_weekNumber
--drop table #tmp1,#tmp
The above will produce the results you want, except for the Overlap and unavailable_days columns. Instead of just counting weeks, I used the week number within the year, based on start_dt, but you can change that if you don't like it:
ID Week YEAR start_dt end_dt Overlap unavailable_days
000001 50 2017 2017-12-12 2018-01-03 NULL NULL
000001 51 2017 2017-12-12 2018-01-03 NULL NULL
000001 52 2017 2017-12-12 2018-01-03 NULL NULL
000002 2 2018 2018-01-13 NULL NULL NULL
000003 1 2018 2018-01-02 2018-01-11 NULL NULL
000003 2 2018 2018-01-02 2018-01-11 NULL NULL
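The Overlap and Unavail_Days columns that the question asks for come down to clipping each date range to a week and counting days. Here is a minimal Python sketch of that arithmetic, offered as an illustration rather than as part of the answer above. It assumes the week boundaries given by the question's week01/week02 variables (inclusive on both ends) and treats a missing End_Date as open-ended. The function name weekly_rows is hypothetical.

```python
from datetime import date

# Week boundaries taken from the question's variables, inclusive on both ends.
weeks = {
    "01": (date(2018, 1, 1), date(2018, 1, 7)),
    "02": (date(2018, 1, 8), date(2018, 1, 14)),
}

# Sample rows from t1 and t2 (dates in the question are written dd/mm/yyyy).
t1 = [
    ("000001", date(2017, 12, 12), date(2018, 1, 3)),
    ("000002", date(2018, 1, 13), None),
    ("000003", date(2018, 1, 2), date(2018, 1, 11)),
]
t2 = {
    "000002": [date(2018, 1, 14)],
    "000003": [date(2018, 1, 3), date(2018, 1, 4), date(2018, 1, 8)],
}


def weekly_rows(t1, t2, weeks):
    """Return (id, week_no, overlap_days, unavailable_days) per overlapping week."""
    rows = []
    for cust_id, start, end in t1:
        for week_no, (week_start, week_end) in weeks.items():
            lo = max(start, week_start)
            hi = week_end if end is None else min(end, week_end)
            if lo > hi:
                continue  # the range does not touch this week
            overlap = (hi - lo).days + 1
            unavailable = sum(1 for d in t2.get(cust_id, []) if lo <= d <= hi)
            rows.append((cust_id, week_no, overlap, unavailable))
    return rows


rows = weekly_rows(t1, t2, weeks)
for row in rows:
    print(row)
# prints: ('000001', '01', 3, 0)
# prints: ('000002', '02', 2, 1)
# prints: ('000003', '01', 6, 2)
# prints: ('000003', '02', 4, 1)
```

These values agree with the expected table in the question, except that a customer with no unavailable days gets 0 rather than a blank.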
I have a table called temp. In this table I have Date and Value.
Date | Value
2016/04/01 07:00am | 1
2016/04/01 09:00am | 2
2016/04/01 11:00am | 3
...
2016/04/01 07:00pm | 5
2016/04/01 11:00pm | 2
...
2016/04/02 07:00am | 10
2016/04/02 09:00am | 13
2016/04/02 11:00am | 1
...
2016/04/02 07:00pm | 32
2016/04/02 09:00pm | 40
I would like to return:
Date | Value
04/01/2016 11:00am | 3
04/01/2016 07:00pm | 5
04/02/2016 09:00am | 13
04/02/2016 09:00pm | 40
The idea is to group in 12 hour intervals and then find the max value of said group.
So far I have:
SELECT t.date, max(t.value)
FROM temp t
WHERE t.Date between DATEADD(hour, 7, '04/01/2016') and DATEADD(minute, 1859, '04/02/2016')
GROUP BY DATEPART(Hour, t.date)%12, t.date
ORDER BY Date
But it returns all the data, no 12 hour groups.
Any ideas?
You don't want MAX as you don't want to group by the date, you want the single instance of the datetime that has the largest value. Therefore you can use ROW_NUMBER with a PARTITION based on the date and AM/PM period to get the row with the largest value in that period (ORDER BY t.value DESC):
SELECT date, value
FROM
(SELECT t.date,
t.value,
ROW_NUMBER()
OVER(PARTITION BY CAST(t.date AS date), CASE WHEN DATEPART(hour, t.date) < 12 THEN 0 ELSE 1 END
ORDER BY t.value DESC) AS rownum
FROM temp t
WHERE t.Date between DATEADD(hour, 7, '04/01/2016') and DATEADD(minute, 1859, '04/02/2016')
) max_val
WHERE max_val.rownum = 1
ORDER BY Date
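To illustrate what the PARTITION BY is doing, here is a minimal Python sketch that buckets readings by calendar date and AM/PM half, then keeps the row with the largest value in each bucket. The function name max_per_half_day is hypothetical, and the data contains only the rows visible in the question (the elided rows are left out).

```python
from datetime import datetime

readings = [
    (datetime(2016, 4, 1, 7), 1),
    (datetime(2016, 4, 1, 9), 2),
    (datetime(2016, 4, 1, 11), 3),
    (datetime(2016, 4, 1, 19), 5),
    (datetime(2016, 4, 1, 23), 2),
    (datetime(2016, 4, 2, 7), 10),
    (datetime(2016, 4, 2, 9), 13),
    (datetime(2016, 4, 2, 11), 1),
    (datetime(2016, 4, 2, 19), 32),
    (datetime(2016, 4, 2, 21), 40),
]


def max_per_half_day(readings):
    """Keep, for each (date, AM/PM) bucket, the reading with the largest value."""
    best = {}
    for ts, value in readings:
        key = (ts.date(), ts.hour >= 12)  # False = before noon, True = noon onward
        if key not in best or value > best[key][1]:
            best[key] = (ts, value)
    return [best[key] for key in sorted(best)]


result = max_per_half_day(readings)
for ts, value in result:
    print(ts, value)
# prints: 2016-04-01 11:00:00 3
# prints: 2016-04-01 19:00:00 5
# prints: 2016-04-02 09:00:00 13
# prints: 2016-04-02 21:00:00 40
```

On ties this sketch keeps the first reading seen, whereas ROW_NUMBER picks an arbitrary one unless the ORDER BY is extended with a tiebreaker.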
Using SQL Server 2008 R2,
I'm trying to combine date ranges into the maximum date range given that one end date is next to the following start date.
The data is about different employments. Some employees may have ended their employment and have rejoined at a later time. Those should count as two different employments (example ID 5). Some people have different types of employment, running after each other (enddate and startdate neck-to-neck), in this case it should be considered as one employment in total (example ID 30).
An employment period that has not ended has an enddate that is null.
Some examples are probably enlightening:
declare @t as table (employmentid int, startdate datetime, enddate datetime)
insert into @t values
(5, '2007-12-03', '2011-08-26'),
(5, '2013-05-02', null),
(30, '2006-10-02', '2011-01-16'),
(30, '2011-01-17', '2012-08-12'),
(30, '2012-08-13', null),
(66, '2007-09-24', null)
-- expected outcome
EmploymentId StartDate EndDate
5 2007-12-03 2011-08-26
5 2013-05-02 NULL
30 2006-10-02 NULL
66 2007-09-24 NULL
I've been trying different "islands-and-gaps" techniques but haven't been able to crack this one.
The strange bit you see with my use of the date '32121231' is just a very large date to handle your "no-end-date" scenario. I have assumed you won't really have many date ranges per employee, so I've used a simple recursive common table expression to combine the ranges.
To make it run faster, the starting anchor query keeps only those dates that will not link up to a prior range (per employee). The rest is just tree-walking the date ranges and growing the range. The final GROUP BY keeps only the largest date range built up per starting ANCHOR (employmentid, startdate) combination.
SQL Fiddle
MS SQL Server 2008 Schema Setup:
create table Tbl (
employmentid int,
startdate datetime,
enddate datetime);
insert Tbl values
(5, '2007-12-03', '2011-08-26'),
(5, '2013-05-02', null),
(30, '2006-10-02', '2011-01-16'),
(30, '2011-01-17', '2012-08-12'),
(30, '2012-08-13', null),
(66, '2007-09-24', null);
/*
-- expected outcome
EmploymentId StartDate EndDate
5 2007-12-03 2011-08-26
5 2013-05-02 NULL
30 2006-10-02 NULL
66 2007-09-24 NULL
*/
Query 1:
;with cte as (
select a.employmentid, a.startdate, a.enddate
from Tbl a
left join Tbl b on a.employmentid=b.employmentid and a.startdate-1=b.enddate
where b.employmentid is null
union all
select a.employmentid, a.startdate, b.enddate
from cte a
join Tbl b on a.employmentid=b.employmentid and b.startdate-1=a.enddate
)
select employmentid,
startdate,
nullif(max(isnull(enddate,'32121231')),'32121231') enddate
from cte
group by employmentid, startdate
order by employmentid
Results:
| EMPLOYMENTID | STARTDATE | ENDDATE |
-----------------------------------------------------------------------------------
| 5 | December, 03 2007 00:00:00+0000 | August, 26 2011 00:00:00+0000 |
| 5 | May, 02 2013 00:00:00+0000 | (null) |
| 30 | October, 02 2006 00:00:00+0000 | (null) |
| 66 | September, 24 2007 00:00:00+0000 | (null) |
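The range-growing step can also be pictured as a single pass over the rows sorted by employee and start date. The following Python sketch is an illustration under the same assumptions (ranges for one employee never overlap, and None stands for an open-ended enddate); it is not a translation of the recursive CTE, and the name merge_adjacent is hypothetical.

```python
from datetime import date, timedelta

employments = [
    (5, date(2007, 12, 3), date(2011, 8, 26)),
    (5, date(2013, 5, 2), None),
    (30, date(2006, 10, 2), date(2011, 1, 16)),
    (30, date(2011, 1, 17), date(2012, 8, 12)),
    (30, date(2012, 8, 13), None),
    (66, date(2007, 9, 24), None),
]


def merge_adjacent(rows):
    """Merge ranges of one employee where a range ends the day before the next starts."""
    merged = []
    for emp, start, end in sorted(rows, key=lambda r: (r[0], r[1])):
        if merged:
            last_emp, last_start, last_end = merged[-1]
            if (
                last_emp == emp
                and last_end is not None
                and last_end + timedelta(days=1) == start
            ):
                # Extend the previous range; an open end (None) carries over.
                merged[-1] = (emp, last_start, end)
                continue
        merged.append((emp, start, end))
    return merged


merged_rows = merge_adjacent(employments)
for row in merged_rows:
    print(row)
# prints: (5, datetime.date(2007, 12, 3), datetime.date(2011, 8, 26))
# prints: (5, datetime.date(2013, 5, 2), None)
# prints: (30, datetime.date(2006, 10, 2), None)
# prints: (66, datetime.date(2007, 9, 24), None)
```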
SET NOCOUNT ON
DECLARE @T TABLE(ID INT,FromDate DATETIME, ToDate DATETIME)
INSERT INTO @T(ID,FromDate,ToDate)
SELECT 1,'20090801','20090803' UNION ALL
SELECT 2,'20090802','20090809' UNION ALL
SELECT 3,'20090805','20090806' UNION ALL
SELECT 4,'20090812','20090813' UNION ALL
SELECT 5,'20090811','20090812' UNION ALL
SELECT 6,'20090802','20090802'
SELECT ROW_NUMBER() OVER(ORDER BY s1.FromDate) AS ID,
s1.FromDate,
MIN(t1.ToDate) AS ToDate
FROM @T s1
INNER JOIN @T t1 ON s1.FromDate <= t1.ToDate
AND NOT EXISTS(SELECT * FROM @T t2
WHERE t1.ToDate >= t2.FromDate
AND t1.ToDate < t2.ToDate)
WHERE NOT EXISTS(SELECT * FROM @T s2
WHERE s1.FromDate > s2.FromDate
AND s1.FromDate <= s2.ToDate)
GROUP BY s1.FromDate
ORDER BY s1.FromDate
An alternative solution that uses window functions rather than recursive CTEs (note that the ROWS frame on SUM() OVER requires SQL Server 2012 or later):
SELECT
employmentid,
MIN(startdate) as startdate,
NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
FROM (
SELECT
employmentid,
startdate,
enddate,
DATEADD(
DAY,
-COALESCE(
SUM(DATEDIFF(DAY, startdate, enddate)+1) OVER (PARTITION BY employmentid ORDER BY startdate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
0
),
startdate
) as grp
FROM #t
) withGroup
GROUP BY employmentid, grp
ORDER BY employmentid, startdate
This works by calculating a grp value that is the same for all consecutive rows. This is achieved as follows.
Determine the total number of days each span occupies (+1 because the dates are inclusive):
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM #t
Take a cumulative sum of the days spanned for each employment, ordered by startdate. This gives the total days spanned by all the previous employment spans.
We coalesce with 0 so that the first row of each employment, which has no preceding rows to sum, gets 0 rather than NULL.
We do not include the current row in the cumulative sum, because we will apply the value to startdate rather than enddate (we can't use enddate because of the NULLs).
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM #t
) inner1
Subtract the cumulative days from the startdate to get our grp. This is the crux of the solution.
If the start date increases at the same rate as the days spanned then the days are consecutive, and subtracting the two will give us the same value.
If the startdate increases faster than the days spanned then there is a gap and we will get a new grp value greater than the previous one.
Although grp is a date, the date itself is meaningless; we use it only as a grouping value.
SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
FROM (
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM #t
) inner1
) inner2
With the results
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| employmentid | startdate | enddate | daysSpanned | cumulativeDaysSpanned | grp |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 5 | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 | 1363 | 0 | 2007-12-03 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 5 | 2013-05-02 00:00:00.000 | NULL | NULL | 1363 | 2009-08-08 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2006-10-02 00:00:00.000 | 2011-01-16 00:00:00.000 | 1568 | 0 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2011-01-17 00:00:00.000 | 2012-08-12 00:00:00.000 | 574 | 1568 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2012-08-13 00:00:00.000 | NULL | NULL | 2142 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 66 | 2007-09-24 00:00:00.000 | NULL | NULL | 0 | 2007-09-24 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
Finally, we can GROUP BY grp to collapse the consecutive rows into one.
Use MIN and MAX to get the new startdate and enddate.
To handle the NULL enddate values, we replace them with a large date so that MAX picks them up, then convert them back to NULL.
SELECT
employmentid,
MIN(startdate) as startdate,
NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
FROM (
SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
FROM (
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM #t
) inner1
) inner2
) inner3
GROUP BY employmentid, grp
ORDER BY employmentid, startdate
To get the desired result
+--------------+-------------------------+-------------------------+
| employmentid | startdate | enddate |
+--------------+-------------------------+-------------------------+
| 5 | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 |
+--------------+-------------------------+-------------------------+
| 5 | 2013-05-02 00:00:00.000 | NULL |
+--------------+-------------------------+-------------------------+
| 30 | 2006-10-02 00:00:00.000 | NULL |
+--------------+-------------------------+-------------------------+
| 66 | 2007-09-24 00:00:00.000 | NULL |
+--------------+-------------------------+-------------------------+
We can combine the inner queries to get the query at the start of this answer, which is shorter but harder to explain.
Limitations: all of this requires that
there are no overlaps between startdate and enddate within an employment, since overlaps could produce collisions in grp.
startdate is not NULL. This could be overcome by replacing NULL start dates with a very small date value.
Future developers can decipher the window black magic you performed
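For anyone who wants to check the grp arithmetic outside SQL, here is a minimal Python sketch of the same steps: a running total of days spanned by the previous rows, subtracted from startdate, followed by grouping on the result. The function names with_grp and merge_by_grp are hypothetical, None stands for a NULL enddate, and the same no-overlap assumption applies.

```python
from datetime import date, timedelta
from itertools import groupby

employments = [
    (5, date(2007, 12, 3), date(2011, 8, 26)),
    (5, date(2013, 5, 2), None),
    (30, date(2006, 10, 2), date(2011, 1, 16)),
    (30, date(2011, 1, 17), date(2012, 8, 12)),
    (30, date(2012, 8, 13), None),
    (66, date(2007, 9, 24), None),
]


def with_grp(rows):
    """Attach grp = startdate minus the days spanned by all previous rows."""
    out = []
    cumulative = {}  # days spanned so far, per employmentid
    for emp, start, end in sorted(rows, key=lambda r: (r[0], r[1])):
        days_before = cumulative.get(emp, 0)
        out.append((emp, start, end, start - timedelta(days=days_before)))
        if end is not None:  # an open-ended row adds nothing, like SUM skipping NULL
            cumulative[emp] = days_before + (end - start).days + 1
    return out


def merge_by_grp(rows):
    """Group on (employmentid, grp) and take min start and max end per group."""
    keyed = sorted(with_grp(rows), key=lambda r: (r[0], r[3]))
    merged = []
    for (emp, _grp), members in groupby(keyed, key=lambda r: (r[0], r[3])):
        members = list(members)
        ends = [m[2] for m in members]
        end = None if None in ends else max(ends)  # an open end wins
        merged.append((emp, min(m[1] for m in members), end))
    return merged


grouped = with_grp(employments)
merged_rows = merge_by_grp(employments)
print(grouped[1][3])  # grp of employment 5's second row
# prints: 2009-08-08
```

The grp values agree with the intermediate table above, and merged_rows holds the same four rows as the final result.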
A modified script for combining all overlapping periods. For example
01.01.2001-01.01.2010
05.05.2005-05.05.2015
will give one period:
01.01.2001-05.05.2015
tbl.enddate must be filled in (not NULL) for every row.
;WITH cte
AS(
SELECT
a.employmentid
,a.startdate
,a.enddate
from tbl a
left join tbl c on a.employmentid=c.employmentid
and a.startdate > c.startdate
and a.startdate <= dateadd(day, 1, c.enddate)
WHERE c.employmentid IS NULL
UNION all
SELECT
a.employmentid
,a.startdate
,c.enddate -- take the later end date so the range actually grows
from cte a
inner join tbl c on a.employmentid=c.employmentid
and (c.startdate = dateadd(day, 1, a.enddate) or (c.enddate > a.enddate and c.startdate <= a.enddate))
)
select distinct employmentid,
startdate,
nullif(max(enddate),'31.12.2099') enddate
from cte
group by employmentid, startdate
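The merging rule itself (two periods belong together when the later one starts no more than one day after the earlier one ends) can be sketched in a few lines of Python. This is an illustration for a single employee's periods, assuming every end date is filled in as the answer requires; the function name merge_overlapping is hypothetical.

```python
from datetime import date, timedelta


def merge_overlapping(periods):
    """Merge (start, end) periods that overlap or touch; end dates must not be None."""
    merged = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1] + timedelta(days=1):
            # Overlapping or directly adjacent: extend the current period.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# The example from the text: 01.01.2001-01.01.2010 and 05.05.2005-05.05.2015.
periods = [
    (date(2001, 1, 1), date(2010, 1, 1)),
    (date(2005, 5, 5), date(2015, 5, 5)),
]
combined = merge_overlapping(periods)
print(combined)
# prints: [(datetime.date(2001, 1, 1), datetime.date(2015, 5, 5))]
```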
My table looks like this:
Table 1:
Note: This table is very large in reality, with lots more columns (20ish) and rows (in the millions)
| Time | tmp | productionID |
| 10:00:00 | 2.2 | 5 |
| 10:00:05 | 5.2 | 5 |
| 10:00:11 | 7.4 | 5 |
| ...... | 3.2 | 5 |
| 10:10:02 | 4.5 | 5 |
Note: Time is a varchar, so I assume I need to do something like this:
CONVERT(VARCHAR(8), DATEADD(mi, 10, time), 114)
What I need to do is:
select time, tmp
from mytable
where productionid = somevalue
and time = first_time_stamp associated to some productionID(ie. 10:00:00 table above)
time = 10 minutes after the first_time_stamp with some productionID
time = 20 minutes after the first_time_stamp with some productionID
...25, 30, 40, 60, 120, 180 minutes
I hope this makes sense. I'm not sure what the right way to do this is. I thought of the following process:
-select first time stamp (with some productionID)
-add 10 minutes to that time,
-add 20 minutes etc.. then use a pivot table and use joins to link to table 1
There must be an easier way.
Thank you in advance for the expertise.
Sample output expected:
| Time | tmp
| 10:00:00 | 2.2
| 10:10:02 | 4.5
| 10:20:54 | 2.3
| 10:30:22 | 5.3
If you create an interval table on the fly and cross join it with the starting time for each ProductionID, you can extract the records from MyTable that fall into each interval and retrieve only the first one per interval.
; with timeSlots (startSlot, endSlot) as (
select 0, 10
union all
select 10, 20
union all
select 20, 25
union all
select 25, 30
union all
select 30, 40
union all
select 40, 60
union all
select 60, 120
union all
select 120, 180
),
startTimes (ProductionID, minTime) as (
select ProductionID, min([Time])
from MyTable
group by ProductionID
),
groupedTime (ProductionID, [Time], [Tmp], groupOrder) as (
select myTable.ProductionID,
myTable.Time,
myTable.Tmp,
row_number () over (partition by myTable.productionid, timeSlots.startSlot
order by mytable.Time) groupOrder
from startTimes
cross join timeslots
inner join myTable
on startTimes.ProductionID = myTable.ProductionID
and convert(varchar(8), dateadd(minute, timeSlots.startSlot, convert(datetime, startTimes.MinTime, 114)), 114) <= mytable.Time
and convert(varchar(8), dateadd(minute, timeSlots.endSlot, convert(datetime, startTimes.MinTime, 114)), 114) > myTable.Time
)
select ProductionID, [Time], [Tmp]
from groupedTime
where groupOrder = 1
Sql Fiddle here.
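As a way to see the slot logic in isolation, here is a minimal Python sketch for a single productionID. It assumes consecutive slots built from the offsets listed in the question (0, 10, 20, 25, 30, 40, 60, 120, 180 minutes after the first timestamp) and that Time is a 'HH:MM:SS' string. The function name first_reading_per_slot is hypothetical, and the readings are the ones visible in the question's sample data and expected output.

```python
from datetime import datetime, timedelta

# Minute offsets from the question; each consecutive pair forms one slot.
OFFSETS = [0, 10, 20, 25, 30, 40, 60, 120, 180]

readings = [
    ("10:00:00", 2.2),
    ("10:00:05", 5.2),
    ("10:00:11", 7.4),
    ("10:10:02", 4.5),
    ("10:20:54", 2.3),
    ("10:30:22", 5.3),
]


def first_reading_per_slot(readings, offsets=OFFSETS):
    """Return the earliest reading in each slot, for one productionID."""
    parsed = sorted((datetime.strptime(t, "%H:%M:%S"), tmp) for t, tmp in readings)
    first = parsed[0][0]
    picked = []
    for lo, hi in zip(offsets, offsets[1:]):
        slot_start = first + timedelta(minutes=lo)
        slot_end = first + timedelta(minutes=hi)
        for ts, tmp in parsed:
            if slot_start <= ts < slot_end:
                picked.append((ts.strftime("%H:%M:%S"), tmp))
                break  # like groupOrder = 1: only the first row in the slot
    return picked


picked = first_reading_per_slot(readings)
for row in picked:
    print(row)
# prints: ('10:00:00', 2.2)
# prints: ('10:10:02', 4.5)
# prints: ('10:20:54', 2.3)
# prints: ('10:30:22', 5.3)
```

Slots that contain no reading produce no row, which matches the behaviour of the inner join in the query.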
This will work as long as there are no gaps in the sequence, that is, as long as there is at least one time value in the minute that falls exactly 10 minutes after each selected row.
create table times([Time] time,tmp float,productionID int)
INSERT INTO times
VALUES('10:10:00',2.2,5),
('10:00:05',5.2,5),
('10:00:11',7.4,5),
('10:00:18',3.2,5),
('10:10:02',4.5,5),
('10:20:22',5.3,5)
select * from times
Declare @min_time time
select @min_time = MIN(time) from times
;WITH times1 as (select row_number() over (order by (select 0)) as id,time, tmp from times where productionID = 5)
,times2 as(
select id,time,tmp from times1 where time=@min_time
union all
select t1.id,t1.time,t1.tmp from times2 t2 inner join times1 t1 on cast(t1.time as varchar(5))=cast(DATEADD(mi,10,t2.Time) as varchar(5)) --where t1.id-1=t2.id
)
,result as (select MAX(time) as time from times2
group by CAST(time as varchar(5)))
select distinct t2.Time,t2.tmp from times2 t2,Result r where t2.Time =r.time order by 1