Find time slots with SQL

I have a scenario (SQL Server 2008) where I need to find the occupied time frames (the non-gaps) in the table below. For example, I have created this dummy table:
CREATE TABLE Job
(
JobID INT NOT NULL,
WorkerID INT NOT NULL,
JobStart DATETIME NOT NULL,
JobEnd DATETIME NOT NULL
);
INSERT INTO Job (JobID, WorkerID, JobStart, JobEnd)
VALUES (1, 25, '2012-11-17 16:00', '2012-11-17 17:00'),
(2, 25, '2012-11-17 16:00', '2012-11-17 16:50'),
(3, 25, '2012-11-19 18:00', '2012-11-19 18:30'),
(4, 25, '2012-11-19 17:30', '2012-11-19 18:10'),
(5, 26, '2012-11-18 16:00', '2012-11-18 17:10'),
(6, 26, '2012-11-18 16:00', '2012-11-19 16:50');
For this data, the query should return:
WorkerID | StartDate | EndDate
25 2012-11-17 16:00 2012-11-17 17:00
25 2012-11-19 17:30 2012-11-19 18:30
26 2012-11-18 16:00 2012-11-18 17:10
I can get the result, but only with a WHILE loop, which is a fairly iterative method. Is there a way to get the result without a WHILE loop?

This is a packing date and time intervals problem. Itzik Ben-Gan has published an article that provides several solutions to it. Using one of Itzik's solutions, here is a query that solves your problem:
WITH C1 AS(
SELECT
JobID, WorkerId, JobStart AS ts, +1 AS type, NULL AS e,
ROW_NUMBER() OVER(PARTITION BY WorkerId ORDER BY JobStart, JobId) AS s
FROM Job
UNION ALL
SELECT
JobID, WorkerId, JobEnd AS ts, -1 AS type,
ROW_NUMBER() OVER(PARTITION BY WorkerId ORDER BY JobEnd, JobId) AS e,
NULL AS s
FROM Job
),
C2 AS(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY WorkerId ORDER BY ts, type DESC, JobId) AS se
FROM C1
),
C3 AS(
SELECT ts, WorkerId,
FLOOR((ROW_NUMBER() OVER(PARTITION BY WorkerId ORDER BY ts) - 1) / 2 + 1) AS grpnum
FROM C2
WHERE COALESCE(s - (se - s) - 1, (se - e) - e) = 0
)
SELECT
WorkerId,
MIN(ts) AS StartDate,
MAX(ts) AS EndDate
FROM C3
GROUP BY WorkerID, grpnum
ORDER BY WorkerID
Result
WorkerId StartDate EndDate
----------- ----------------------- -----------------------
25 2012-11-17 16:00:00.000 2012-11-17 17:00:00.000
25 2012-11-19 17:30:00.000 2012-11-19 18:30:00.000
26 2012-11-18 16:00:00.000 2012-11-19 16:50:00.000
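On SQL Server 2012 or later, there is a shorter alternative to Itzik's start/end counting. The question targets 2008, so this sketch assumes a newer version. It uses a running MAX of end times. A job opens a new island when it starts after every earlier job of the same worker has ended, and a running SUM of those flags numbers the islands. A minimal sketch against the sample data, using a table variable @Job and a temp table #Packed for the result (both illustrative names):

```sql
-- Sketch (assumes SQL Server 2012+ for window frames): pack overlapping jobs per worker.
IF OBJECT_ID('tempdb..#Packed') IS NOT NULL DROP TABLE #Packed;

DECLARE @Job TABLE (JobID INT NOT NULL, WorkerID INT NOT NULL,
                    JobStart DATETIME NOT NULL, JobEnd DATETIME NOT NULL);
INSERT INTO @Job (JobID, WorkerID, JobStart, JobEnd)
VALUES (1, 25, '2012-11-17T16:00:00', '2012-11-17T17:00:00'),
       (2, 25, '2012-11-17T16:00:00', '2012-11-17T16:50:00'),
       (3, 25, '2012-11-19T18:00:00', '2012-11-19T18:30:00'),
       (4, 25, '2012-11-19T17:30:00', '2012-11-19T18:10:00'),
       (5, 26, '2012-11-18T16:00:00', '2012-11-18T17:10:00'),
       (6, 26, '2012-11-18T16:00:00', '2012-11-19T16:50:00');

WITH Ordered AS (
    SELECT JobID, WorkerID, JobStart, JobEnd,
           -- latest end time among the worker's earlier jobs (NULL for the first job)
           MAX(JobEnd) OVER (PARTITION BY WorkerID ORDER BY JobStart, JobEnd, JobID
                             ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS PrevMaxEnd
    FROM @Job
), Flagged AS (
    SELECT *,
           -- 1 = starts after every earlier job has ended, so it opens a new island;
           -- a job starting exactly when the previous ones end is merged into them
           CASE WHEN PrevMaxEnd IS NULL OR JobStart > PrevMaxEnd THEN 1 ELSE 0 END AS IsNewIsland
    FROM Ordered
), Numbered AS (
    SELECT *,
           -- running count of island starts = island number within the worker
           SUM(IsNewIsland) OVER (PARTITION BY WorkerID ORDER BY JobStart, JobEnd, JobID
                                  ROWS UNBOUNDED PRECEDING) AS IslandNo
    FROM Flagged
)
SELECT WorkerID, MIN(JobStart) AS StartDate, MAX(JobEnd) AS EndDate
INTO #Packed
FROM Numbered
GROUP BY WorkerID, IslandNo;

SELECT * FROM #Packed ORDER BY WorkerID, StartDate;
```

Because the test is JobStart > PrevMaxEnd, intervals that merely touch are packed together. This matches the "type DESC" ordering in the query above.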

Related

Generate Start Date and End Date based on the contiguous rows

I have a table:
ID    Startdate   Enddate     TEXT
0011  2022-02-07  2022-02-07  TEXT1
0011  2022-02-04  2022-02-05  TEXT2
0011  2022-02-06  2022-02-06  TEXT3
0011  2022-02-03  2022-02-03  TEXT4
0011  2022-02-03  2022-02-04  TEXT5
0011  2022-02-02  2022-02-07  TEXT6
0011  2022-02-02  2022-02-02  TEXT7
0011  2021-12-01  2021-12-03  TEXT8
Expected output:
ID    Startdate   Enddate     TEXT
0011  2022-02-02  2022-02-07  TEXT1,TEXT2,TEXT3,TEXT4,TEXT5,TEXT6,TEXT7
0011  2021-12-01  2021-12-03  TEXT8
I tried:
WITH _DAYS AS (
SELECT DATEADD(DAY, SEQ4(), '2021-12-01') AS DAY
FROM TABLE(GENERATOR(ROWCOUNT => 68))
), _GRPS AS (
SELECT *
, DATEDIFF(DAY, '2021-12-01', D.DAY) - DENSE_RANK() OVER(PARTITION BY PASS1.MEMBER_ID ORDER BY D.DAY) AS GRP
FROM _DAYS AS D
JOIN table PASS1
ON D.DAY BETWEEN PASS1.Startdate AND PASS1.Enddate
)
SELECT ID
, TEXT
, MIN(DAY) AS START_DATE
, MAX(DAY) AS END_DATE
FROM _GRPS
GROUP BY ID,TEXT, GRP
I got the desired output when the table had only the start and end dates, but including the TEXT column broke it.
Please suggest a fix!
This is mostly a fix-up of Greg's answer.
With a CTE for the pass1 data:
WITH PASS1(ID, Startdate, Enddate, TEXT) AS (
select * from values
('0011', '2022-02-07', '2022-02-07', 'TEXT1'),
('0011', '2022-02-04', '2022-02-05', 'TEXT2'),
('0011', '2022-02-06', '2022-02-06', 'TEXT3'),
('0011', '2022-02-03', '2022-02-03', 'TEXT4'),
('0011', '2022-02-03', '2022-02-04', 'TEXT5'),
('0011', '2022-02-02', '2022-02-07', 'TEXT6'),
('0011', '2022-02-02', '2022-02-02', 'TEXT7'),
('0011', '2021-12-01', '2021-12-03', 'TEXT8')
), _DAYS AS (
SELECT
DATEADD(DAY, ROW_NUMBER() OVER(ORDER BY NULL)-1, '2021-12-01')::date AS DAY
FROM TABLE(GENERATOR(ROWCOUNT => 68))
), _GRPS AS (
SELECT *
,DATEDIFF(DAY, '2021-12-01', D.DAY) - DENSE_RANK() OVER(PARTITION BY PASS1.ID ORDER BY D.DAY) AS GRP
FROM _DAYS AS D
JOIN PASS1
ON D.DAY BETWEEN PASS1.Startdate AND PASS1.Enddate
)
SELECT
ID
,MIN(DAY) AS START_DATE
,MAX(DAY) AS END_DATE
,listagg(distinct TEXT, ',') WITHIN GROUP (ORDER BY TEXT) as TEXT
FROM _GRPS
GROUP BY ID,GRP
gives:
ID    START_DATE  END_DATE    TEXT
0011  2022-02-02  2022-02-07  TEXT1,TEXT2,TEXT3,TEXT4,TEXT5,TEXT6,TEXT7
0011  2021-12-01  2021-12-03  TEXT8
You should use ROW_NUMBER to get contiguous values, because SEQx() can (and does) produce gaps. Greg's answer put an ORDER BY on the sub-select/CTE, but that does not guarantee the order of the aggregated list. Use LISTAGG's WITHIN GROUP (ORDER BY ...) clause instead, because it is a more targeted sort.
Also, since _DAYS already uses a generator, you can use its row number as the first half of the gaps-and-islands calculation and skip the DATEDIFF math:
WITH _DAYS AS (
SELECT
ROW_NUMBER() OVER(ORDER BY NULL)-1 as rn,
DATEADD(DAY, rn, '2021-12-01')::date AS DAY
FROM TABLE(GENERATOR(ROWCOUNT => 68))
), _GRPS AS (
SELECT *
,d.rn - DENSE_RANK() OVER(PARTITION BY PASS1.ID ORDER BY D.DAY) as grp
FROM _DAYS AS D
JOIN PASS1
ON D.DAY BETWEEN PASS1.Startdate AND PASS1.Enddate
)
...
LISTAGG will do what you need, but there are duplicate TEXT values and they're out of order. You can order them in the table expression above the final query in the CTE and use distinct to deduplicate the values:
create table pass1(id int, startdate date, enddate date, text string);
insert into pass1(id, startdate, enddate, text) values
(0011, '2022-02-07', '2022-02-07', 'TEXT1'),
(0011, '2022-02-04', '2022-02-05', 'TEXT2'),
(0011, '2022-02-06', '2022-02-06', 'TEXT3'),
(0011, '2022-02-03', '2022-02-03', 'TEXT4'),
(0011, '2022-02-03', '2022-02-04', 'TEXT5'),
(0011, '2022-02-02', '2022-02-07', 'TEXT6'),
(0011, '2022-02-02', '2022-02-02', 'TEXT7'),
(0011, '2021-12-01', '2021-12-03', 'TEXT8');
WITH _DAYS AS (
SELECT DATEADD(DAY, SEQ4(), '2021-12-01') AS DAY
FROM TABLE(GENERATOR(ROWCOUNT => 68))
), _GRPS AS (
SELECT *
, DATEDIFF(DAY, '2021-12-01', D.DAY) - DENSE_RANK() OVER(PARTITION BY PASS1.ID ORDER BY D.DAY) AS GRP
FROM _DAYS AS D
JOIN PASS1
ON D.DAY BETWEEN PASS1.Startdate AND PASS1.Enddate
order by TEXT
)
SELECT ID
, listagg(distinct TEXT, ',') as TEXT
, MIN(DAY) AS START_DATE
, MAX(DAY) AS END_DATE
FROM _GRPS
GROUP BY ID, GRP
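For comparison, here is a T-SQL sketch that merges the rows as date intervals instead of expanding them into a calendar of days. It is not Snowflake SQL, and it assumes SQL Server 2017+ for STRING_AGG ... WITHIN GROUP. Rows are sorted by Startdate, and a row starts a new group when it begins more than one day after the latest Enddate seen so far. Each source row is aggregated exactly once, so the TEXT list needs no DISTINCT. @Pass1 and #Islands are illustrative names:

```sql
-- T-SQL sketch (assumes SQL Server 2017+ for STRING_AGG ... WITHIN GROUP).
IF OBJECT_ID('tempdb..#Islands') IS NOT NULL DROP TABLE #Islands;

DECLARE @Pass1 TABLE (ID char(4), Startdate date, Enddate date, [TEXT] varchar(10));
INSERT INTO @Pass1 (ID, Startdate, Enddate, [TEXT]) VALUES
('0011', '2022-02-07', '2022-02-07', 'TEXT1'),
('0011', '2022-02-04', '2022-02-05', 'TEXT2'),
('0011', '2022-02-06', '2022-02-06', 'TEXT3'),
('0011', '2022-02-03', '2022-02-03', 'TEXT4'),
('0011', '2022-02-03', '2022-02-04', 'TEXT5'),
('0011', '2022-02-02', '2022-02-07', 'TEXT6'),
('0011', '2022-02-02', '2022-02-02', 'TEXT7'),
('0011', '2021-12-01', '2021-12-03', 'TEXT8');

WITH Ordered AS (
    SELECT ID, Startdate, Enddate, [TEXT],
           -- latest Enddate among the earlier rows of the same ID
           MAX(Enddate) OVER (PARTITION BY ID ORDER BY Startdate, Enddate, [TEXT]
                              ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS PrevMaxEnd
    FROM @Pass1
), Flagged AS (
    SELECT *,
           -- new group only if the row starts more than one day after everything before it;
           -- starting the day after PrevMaxEnd still counts as contiguous
           CASE WHEN PrevMaxEnd IS NULL OR Startdate > DATEADD(DAY, 1, PrevMaxEnd)
                THEN 1 ELSE 0 END AS IsNewGroup
    FROM Ordered
), Grouped AS (
    SELECT *,
           SUM(IsNewGroup) OVER (PARTITION BY ID ORDER BY Startdate, Enddate, [TEXT]
                                 ROWS UNBOUNDED PRECEDING) AS Grp
    FROM Flagged
)
SELECT ID,
       MIN(Startdate) AS START_DATE,
       MAX(Enddate)   AS END_DATE,
       STRING_AGG([TEXT], ',') WITHIN GROUP (ORDER BY [TEXT]) AS [TEXT]
INTO #Islands
FROM Grouped
GROUP BY ID, Grp;

SELECT * FROM #Islands ORDER BY ID, START_DATE;
```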

Create dynamic records from timestamps

I have the following table:
Id Date Time Location leadHourDiff
3 2017-01-01 2017-01-01 13:00:00.000 Boston 2
15 2017-01-01 2017-01-01 13:00:00.000 Philly 1
16 2017-01-01 2017-01-01 15:00:00.000 Philly 1
I would like to dynamically create the hourly records between Time and Time + leadHourDiff,
so the end result would be:
Date Time Location
2017-01-01 2017-01-01 13:00:00.000 Boston --main record
2017-01-01 2017-01-01 14:00:00.000 Boston --new record
2017-01-01 2017-01-01 15:00:00.000 Boston --new record
2017-01-01 2017-01-01 13:00:00.000 Philly --main record
2017-01-01 2017-01-01 14:00:00.000 Philly --new record
2017-01-01 2017-01-01 15:00:00.000 Philly --main record
2017-01-01 2017-01-01 16:00:00.000 Philly --new record
One option is to use a numbers table (which can be generated with a recursive CTE) and join the leadHourDiff column to it.
with numbers(num) as (select 0
union all
select num+1 from numbers where num < 100 --change this as needed; larger values need OPTION (MAXRECURSION ...)
)
select t.*, dateadd(hour, n.num, t.[Time]) as new_datetime
from tbl t --tbl is your table
join numbers n on t.leadHourDiff >= n.num
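Here is a self-contained sketch of the numbers-table idea. It builds a 0-99 tally from a VALUES list instead of a recursive CTE, so MAXRECURSION never comes into play. It assumes leadHourDiff is below 100. @Src and #Expanded are illustrative stand-ins for your table and the result:

```sql
-- Sketch: expand each row into leadHourDiff + 1 hourly rows using a 0..99 tally.
IF OBJECT_ID('tempdb..#Expanded') IS NOT NULL DROP TABLE #Expanded;

DECLARE @Src TABLE (Id int, [Date] date, [Time] datetime, Location varchar(20), leadHourDiff int);
INSERT INTO @Src (Id, [Date], [Time], Location, leadHourDiff) VALUES
(3,  '2017-01-01', '2017-01-01T13:00:00', 'Boston', 2),
(15, '2017-01-01', '2017-01-01T13:00:00', 'Philly', 1),
(16, '2017-01-01', '2017-01-01T15:00:00', 'Philly', 1);

WITH digits(d) AS (
    SELECT v FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS x(v)
), numbers(num) AS (
    SELECT tens.d * 10 + ones.d FROM digits AS tens CROSS JOIN digits AS ones  -- 0..99
)
SELECT s.[Date], DATEADD(HOUR, n.num, s.[Time]) AS [Time], s.Location
INTO #Expanded
FROM @Src AS s
JOIN numbers AS n ON n.num <= s.leadHourDiff;   -- num = 0 keeps the original row

SELECT * FROM #Expanded ORDER BY Location, [Time];
```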
A simple way is to use a recursive CTE:
with cte as (
select id, date, time, Location, leadHourDiff
from t
union all
select id, date, dateadd(hour, 1, time), location, leadHourDiff - 1
from cte
where leadHourDiff > 0 --stop once the last hour has been generated
)
select date, time, Location
from cte
order by location, date, time;
Here's how I ended up doing it. I also forgot to mention that I only wanted the missing time values; that was an oversight on my part. Here's the whole solution:
CREATE TABLE #Orders(
Id int IDENTITY(1,1)
,[Time] datetime
,[Location] varchar(20)
,OrderAmt int
)
INSERT INTO #Orders
SELECT '2017-01-01 11:00:00', 'Boston', 23 UNION ALL
SELECT '2017-01-01 12:00:00', 'Boston', 31 UNION ALL
SELECT '2017-01-01 13:00:00', 'Boston', 45 UNION ALL
SELECT '2017-01-01 16:00:00', 'Boston', 45 UNION ALL ---15
SELECT '2017-01-01 17:00:00', 'Boston', 67 UNION ALL
SELECT '2017-01-01 18:00:00', 'Boston', 89 UNION ALL
SELECT '2017-01-01 19:00:00', 'Boston', 90 UNION ALL
SELECT '2017-01-01 20:00:00', 'Boston', 123 UNION ALL
SELECT '2017-01-01 21:00:00', 'Boston', 145 UNION ALL
SELECT '2017-01-01 22:00:00', 'Boston', 156 UNION ALL
SELECT '2017-01-01 23:00:00', 'Boston', 145 UNION ALL
SELECT '2017-01-02 00:00:00', 'Boston', 167 UNION ALL
SELECT '2017-01-01 11:00:00', 'Philly', 23 UNION ALL
SELECT '2017-01-01 12:00:00', 'Philly', 31 UNION ALL
SELECT '2017-01-01 13:00:00', 'Philly', 45 UNION ALL
SELECT '2017-01-01 15:00:00', 'Philly', 45 UNION ALL
SELECT '2017-01-01 17:00:00', 'Philly', 67 UNION ALL
SELECT '2017-01-01 18:00:00', 'Philly', 89 UNION ALL
SELECT '2017-01-01 19:00:00', 'Philly', 90 UNION ALL
SELECT '2017-01-01 20:00:00', 'Philly', 123 UNION ALL
SELECT '2017-01-01 21:00:00', 'Philly', 145 UNION ALL
SELECT '2017-01-01 22:00:00', 'Philly', 156 UNION ALL
SELECT '2017-01-01 23:00:00', 'Philly', 145 UNION ALL
SELECT '2017-01-02 00:00:00', 'Philly', 167
;WITH HourDiff AS (
SELECT *
FROM
(
SELECT
Id
,CAST([Time] AS date) AS [Date]
,[Time]
,[Location]
,COALESCE(lead(DATEPART(HOUR, [Time])) OVER(PARTITION BY [Location], CAST([Time] AS date) ORDER BY [Time] ASC ) - DATEPART(HOUR, [Time]),1)-1 AS leadHourDiff
FROM #Orders
) t1
WHERE t1.leadHourDiff <> 0
)
, CTE AS (
SELECT
Location
,DATEADD(HOUR, leadHourDiff, [Time]) AS missingTime
FROM HourDiff
UNION ALL
SELECT
Location
,DATEADD(HOUR, leadHourDiff - 1, [Time]) AS missingTime
FROM HourDiff
WHERE Time < DATEADD(HOUR, leadHourDiff - 1, [Time])
)
SELECT
Location
,CAST(missingTime AS time) AS missingTime
FROM CTE
ORDER BY Location, missingTime
DROP TABLE #Orders
Final result:
Location missingTime
Boston 14:00:00.000
Boston 15:00:00.000
Philly 14:00:00.000
Philly 16:00:00.000
UPDATE:
Here's an update: the final CTE did not work properly when I added new data for New York (a gap of more than two hours).
New data for New York:
SELECT '2017-01-01 11:00:00', 'New York', 23 UNION ALL
SELECT '2017-01-01 20:00:00', 'New York', 31 UNION ALL
New final CTE (now recursive):
, CTE AS (
SELECT
Location
,DATEADD(HOUR, leadHourDiff, [Time]) AS missingTime
,[Time]
,leadHourDiff
FROM HourDiff
UNION ALL
SELECT
Location
,DATEADD(HOUR, leadHourDiff - 1 , [Time]) AS missingTime
,[Time]
,leadHourDiff - 1
FROM CTE
WHERE leadHourDiff >= 0
AND Time < DATEADD(HOUR, leadHourDiff - 1, [Time])
)
Final result:
Location missingTime
Boston 14:00:00.0000000
Boston 15:00:00.0000000
New York 12:00:00.0000000
New York 13:00:00.0000000
New York 14:00:00.0000000
New York 15:00:00.0000000
New York 16:00:00.0000000
New York 17:00:00.0000000
New York 18:00:00.0000000
New York 19:00:00.0000000
Philly 14:00:00.0000000
Philly 16:00:00.0000000
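On SQL Server 2012 or later, this sketch lists the missing hours without recursion. LEAD fetches the next order time for the same location, and a 1-100 tally supplies the hours in between. It assumes order times fall exactly on the hour and gaps are shorter than 100 hours. Unlike the solution above, it does not partition by calendar date, so a gap spanning midnight would also be filled. @Orders and #Missing are illustrative names, and the data is a subset of the sample:

```sql
-- Sketch (assumes SQL Server 2012+ for LEAD): missing hours between consecutive orders.
IF OBJECT_ID('tempdb..#Missing') IS NOT NULL DROP TABLE #Missing;

DECLARE @Orders TABLE ([Time] datetime, Location varchar(20));
INSERT INTO @Orders ([Time], Location) VALUES
('2017-01-01T12:00:00', 'Boston'),   ('2017-01-01T13:00:00', 'Boston'),
('2017-01-01T16:00:00', 'Boston'),   ('2017-01-01T13:00:00', 'Philly'),
('2017-01-01T15:00:00', 'Philly'),   ('2017-01-01T17:00:00', 'Philly'),
('2017-01-01T11:00:00', 'New York'), ('2017-01-01T20:00:00', 'New York');

WITH Gaps AS (
    SELECT Location, [Time],
           -- next order time at the same location (NULL for the last order)
           LEAD([Time]) OVER (PARTITION BY Location ORDER BY [Time]) AS NextTime
    FROM @Orders
), digits(d) AS (
    SELECT v FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS x(v)
), numbers(num) AS (
    SELECT tens.d * 10 + ones.d + 1 FROM digits AS tens CROSS JOIN digits AS ones  -- 1..100
)
SELECT g.Location, DATEADD(HOUR, n.num, g.[Time]) AS MissingTime
INTO #Missing
FROM Gaps AS g
JOIN numbers AS n
  ON DATEADD(HOUR, n.num, g.[Time]) < g.NextTime;  -- strictly before the next order

SELECT Location, CAST(MissingTime AS time) AS MissingTime
FROM #Missing
ORDER BY Location, MissingTime;
```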

Finding duplicate records in a specific date range

I have a table with four columns:
Serial (nvarchar), SID (nvarchar), DateCreated (Date), CID (unique, int)
I want to find the records that share the same Serial and SID and whose DateCreated values fall within 180 days of each other.
Please help.
Sample Data
Serial SID DateCreated CID
02302-25-0036 HONMD01 2017-05-01 00:00:00.000 1
02302-25-0036 HONMD01 2017-05-01 00:00:00.000 3
0264607 HONMD01 2017-05-01 00:00:00.000 65
0264607 HONMD01 2016-05-01 00:00:00.000 45
03118-09-0366 PRIVA00 2016-05-20 00:00:00.000 34
03118-09-0366 PRIVA00 2016-05-20 00:00:00.000 87
0969130 140439 2017-05-09 00:00:00.000 32
0969130 140439 2017-05-09 00:00:00.000 23
1049567 INIIL00 2017-04-12 00:00:00.000 76
create table #Test (Serial nvarchar(20), [SID] nvarchar(10), DateCreated datetime, CID int)
Insert into #Test values ('02302-25-0036', 'HONMD01', '2017-05-01 00:00:00.000', 1)
, ('02302-25-0036', 'HONMD01', '2017-05-01 00:00:00.000', 3)
, ('0264607', 'HONMD01', '2017-05-01 00:00:00.000', 65)
, ('0264607', 'HONMD01', '2016-05-01 00:00:00.000', 45)
, ('03118-09-0366', 'PRIVA00', '2016-05-20 00:00:00.000', 34)
, ('03118-09-0366', 'PRIVA00', '2016-05-20 00:00:00.000', 87)
, ('0969130', '140439', '2017-05-09 00:00:00.000', 32)
, ('0969130', '140439', '2017-05-09 00:00:00.000', 23)
, ('1049567', 'INIIL00', '2017-04-12 00:00:00.000', 76)
select distinct a.*
from
(
select t.*
from #Test t
inner join (
Select Serial, [SID]
from #Test
group by Serial, [SID]
Having count(*)>=2
) d on d.Serial = t.Serial and d.SID = t.SID
) a
inner join
(
select t.*
from #Test t
inner join (
Select Serial, [SID]
from #Test
group by Serial, [SID]
Having count(*)>=2
) d on d.Serial = t.Serial and d.SID = t.SID
) b on a.Serial = b.Serial and a.SID = b.SID
where a.CID <> b.CID --don't match a record with itself
and abs(datediff(day, a.DateCreated, b.DateCreated)) < 180
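A more compact sketch of the same self-join idea uses EXISTS, so each qualifying row is returned once without DISTINCT. It assumes "within 180 days" means two different records (different CID) at most 180 days apart; change <= to < for a strict bound. @Test and #Dupes are illustrative names:

```sql
-- Sketch: rows that have another record with the same Serial and SID within 180 days.
IF OBJECT_ID('tempdb..#Dupes') IS NOT NULL DROP TABLE #Dupes;

DECLARE @Test TABLE (Serial nvarchar(20), [SID] nvarchar(10), DateCreated datetime, CID int);
INSERT INTO @Test (Serial, [SID], DateCreated, CID) VALUES
('02302-25-0036', 'HONMD01', '2017-05-01T00:00:00', 1),
('02302-25-0036', 'HONMD01', '2017-05-01T00:00:00', 3),
('0264607',       'HONMD01', '2017-05-01T00:00:00', 65),
('0264607',       'HONMD01', '2016-05-01T00:00:00', 45),
('03118-09-0366', 'PRIVA00', '2016-05-20T00:00:00', 34),
('03118-09-0366', 'PRIVA00', '2016-05-20T00:00:00', 87),
('0969130',       '140439',  '2017-05-09T00:00:00', 32),
('0969130',       '140439',  '2017-05-09T00:00:00', 23),
('1049567',       'INIIL00', '2017-04-12T00:00:00', 76);

SELECT t.Serial, t.[SID], t.DateCreated, t.CID
INTO #Dupes
FROM @Test AS t
WHERE EXISTS (
    SELECT 1
    FROM @Test AS o
    WHERE o.Serial = t.Serial
      AND o.[SID] = t.[SID]
      AND o.CID <> t.CID                                           -- a different record
      AND ABS(DATEDIFF(DAY, o.DateCreated, t.DateCreated)) <= 180  -- at most 180 days apart
);

SELECT * FROM #Dupes ORDER BY Serial, DateCreated, CID;
```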
Try this (the ROWS window frames need SQL Server 2012 or later):
with cte as (
select
serial,
sid,
dateCreated,
cid,
coalesce(max(dateCreated) over(partition by serial, sid order by dateCreated, cid rows between unbounded preceding and 1 preceding), '1900-01-01') as last,
coalesce(min(dateCreated) over(partition by serial, sid order by dateCreated, cid rows between 1 following and unbounded following), '5999-01-01') as next
from table_name
)
select *
from cte
where
datediff(day, last, dateCreated) < 180 --previous duplicate is close enough
or datediff(day, dateCreated, next) < 180 --or the next one is
This was a challenging question! I have left PreviousDate and rno in the final output (via iq2.*) to make it easier to follow. Here is how I solved it:
Create table #t(Serial nvarchar(100),SID nvarchar(100),DateCreated date,CID int)
Insert into #t values
('02302-25-0036', 'HONMD01', '2017-05-01 00:00:00.000', 1),
('02302-25-0036', 'HONMD01', '2017-05-01 00:00:00.000', 3),
('0264607', 'HONMD01', '2017-05-01 00:00:00.000', 65),
('0264607', 'HONMD01', '2016-05-01 00:00:00.000', 45),
('03118-09-0366', 'PRIVA00', '2016-05-20 00:00:00.000', 34),
('03118-09-0366', 'PRIVA00', '2016-05-20 00:00:00.000', 87),
('0969130', '140439', '2017-05-09 00:00:00.000', 32),
('0969130', '140439', '2017-05-09 00:00:00.000', 23),
('1049567', 'INIIL00', '2017-04-12 00:00:00.000', 76)
Select iq2.*
FROM
(Select iq.Serial, iq.SID, iq.DateCreated, iq.CID, iq.PreviousDate,
ROW_NUMBER() OVER (PARTITION BY iq.Serial,iq.SID, CASE WHEN DATEDIFF(day, iq.DateCreated, iq.PreviousDate) <= 180 THEN 1 ELSE 0 END
ORDER BY Serial,SID) rno
FROM
(select Serial,SID,DateCreated,CID,
MAX(DateCreated) OVER (PARTITION BY Serial,SID ORDER BY Serial,SID) maxDate,
DATEADD(day,-180,MAX(DateCreated) OVER (PARTITION BY Serial,SID ORDER BY Serial,SID)) PreviousDate
from #t
)iq
)iq2
where iq2.rno <> 1
Output:
Serial SID DateCreated CID PreviousDate rno
---------- ------- ---------- ---- ----------- ----
02302-25-0036 HONMD01 2017-05-01 3 2016-11-02 2
03118-09-0366 PRIVA00 2016-05-20 87 2015-11-22 2
0969130 140439 2017-05-09 23 2016-11-10 2
PS: PreviousDate is the maximum DateCreated for the Serial/SID pair minus 180 days.

Calculate total time worked in a day with multiple stops and starts

I can use DATEDIFF to find the difference between one pair of dates, like this:
DATEDIFF(MINUTE, @startdate, @enddate)
but how would I find the total time across multiple pairs of dates? I don't know how many pairs (starts and stops) I will have.
The data is on multiple rows with starts and stops.
ID TimeStamp StartOrStop TimeCode
----------------------------------------------------------------
1 2017-01-01 07:00:00 Start 1
2 2017-01-01 08:15:00 Stop 2
3 2017-01-01 10:00:00 Start 1
4 2017-01-01 11:00:00 Stop 2
5 2017-01-01 10:30:00 Start 1
6 2017-01-01 12:00:00 Stop 2
This code works assuming your table only stores data for one person, and the rows come in Start/Stop/Start/Stop order:
WITH StartTime AS (
SELECT
TimeStamp
, ROW_NUMBER() OVER (ORDER BY TimeStamp) AS RowNum
FROM
<<table>>
WHERE
TimeCode = 1
), StopTime AS (
SELECT
TimeStamp
, ROW_NUMBER() OVER (ORDER BY TimeStamp) AS RowNum
FROM
<<table>>
WHERE
TimeCode = 2
)
SELECT
SUM (DATEDIFF( MINUTE, StartTime.TimeStamp, StopTime.TimeStamp )) As TotalTime
FROM
StartTime
JOIN StopTime ON StartTime.RowNum = StopTime.RowNum
This will work if your starts and stops are reliable. Your sample has two starts in a row (10:00 and 10:30). I assume that in production you have an employee ID to group on, so I added one to the sample data in place of the identity column.
Also, in production you would reduce the CTE sets by filtering on a date parameter. If there are overnight shifts, your stops CTE should use dateadd(day, 1, @startDate) as the upper bound when retrieving end times.
Set up sample:
declare @temp table (
EmpId int,
TimeStamp datetime,
StartOrStop varchar(55),
TimeCode int
);
insert into @temp
values
(1, '2017-01-01 07:00:00', 'Start', 1),
(1, '2017-01-01 08:15:00', 'Stop', 2),
(1, '2017-01-01 10:00:00', 'Start', 1),
(1, '2017-01-01 11:00:00', 'Stop', 2),
(2, '2017-01-01 10:30:00', 'Start', 1),
(2, '2017-01-01 12:00:00', 'Stop', 2)
Query:
;with starts as (
select t.EmpId,
t.TimeStamp as StartTime,
row_number() over (partition by t.EmpId order by t.TimeStamp asc) as rn
from @temp t
where Timecode = 1 --Start time code?
),
stops as (
select t.EmpId,
t.TimeStamp as EndTime,
row_number() over (partition by t.EmpId order by t.TimeStamp asc) as rn
from @temp t
where Timecode = 2 --Stop time code?
)
select cast(min(sub.StartTime) as date) as WorkDay,
sub.EmpId as Employee,
min(sub.StartTime) as ClockIn,
max(sub.EndTime) as ClockOut, --last stop of the day
sum(sub.MinutesWorked) as MinutesWorked
from
(
select strt.EmpId,
strt.StartTime,
stp.EndTime,
datediff(minute, strt.StartTime, stp.EndTime) as MinutesWorked
from starts strt
inner join stops stp
on strt.EmpId = stp.EmpId
and strt.rn = stp.rn
)sub
group by sub.EmpId
This works assuming your table has an incrementing ID and interleaved start/stop records:
--Data sample as provided
declare @temp table (
Id int,
TimeStamp datetime,
StartOrStop varchar(55),
TimeCode int
);
insert into @temp
values
(1, '2017-01-01 07:00:00', 'Start', 1),
(2, '2017-01-01 08:15:00', 'Stop', 2),
(3, '2017-01-01 10:00:00', 'Start', 1),
(4, '2017-01-01 11:00:00', 'Stop', 2),
(5, '2017-01-01 10:30:00', 'Start', 1),
(6, '2017-01-01 12:00:00', 'Stop', 2)
--let's see every pair start/stop and discard stop/start
select start.timestamp start, stop.timestamp stop,
datediff(mi,start.timestamp,stop.timestamp) minutes
from @temp start inner join @temp stop
on start.id+1= stop.id and start.timecode=1
--Sum all for required result
select sum(datediff(mi,start.timestamp,stop.timestamp) ) totalMinutes
from @temp start inner join @temp stop
on start.id+1= stop.id and start.timecode=1
Results
+-------------------------+-------------------------+---------+
| start | stop | minutes |
+-------------------------+-------------------------+---------+
| 2017-01-01 07:00:00.000 | 2017-01-01 08:15:00.000 | 75 |
| 2017-01-01 10:00:00.000 | 2017-01-01 11:00:00.000 | 60 |
| 2017-01-01 10:30:00.000 | 2017-01-01 12:00:00.000 | 90 |
+-------------------------+-------------------------+---------+
+--------------+
| totalMinutes |
+--------------+
| 225 |
+--------------+
The tricky part is probably the join clause: we join the table with itself, offset by one ID. That is what on start.id+1 = stop.id does.
On the other hand, to exclude stop/start pairs we use start.timecode=1. If there is no column with this information, something like stop.id%2=0 works just as well, provided the IDs start at 1 with a Start row and strictly alternate.
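Here is a sketch of the same pairing with LEAD (SQL Server 2012+). It relies only on each Start being followed by its Stop in Id order, not on the Ids being consecutive. The table variable @Punches and its gapped Ids are illustrative. With an employee column, you would add PARTITION BY to both LEAD calls:

```sql
-- Sketch (assumes SQL Server 2012+ for LEAD): pair each Start with the next row by Id.
IF OBJECT_ID('tempdb..#Total') IS NOT NULL DROP TABLE #Total;

DECLARE @Punches TABLE (Id int, [TimeStamp] datetime, StartOrStop varchar(55), TimeCode int);
INSERT INTO @Punches (Id, [TimeStamp], StartOrStop, TimeCode) VALUES
(1, '2017-01-01T07:00:00', 'Start', 1),
(2, '2017-01-01T08:15:00', 'Stop',  2),
(3, '2017-01-01T10:00:00', 'Start', 1),
(5, '2017-01-01T11:00:00', 'Stop',  2),  -- Id 4 skipped on purpose
(6, '2017-01-01T10:30:00', 'Start', 1),
(9, '2017-01-01T12:00:00', 'Stop',  2);  -- Ids 7 and 8 skipped on purpose

WITH Paired AS (
    SELECT TimeCode,
           [TimeStamp] AS StartTime,
           LEAD([TimeStamp]) OVER (ORDER BY Id) AS NextTime,  -- timestamp of the next row by Id
           LEAD(TimeCode)    OVER (ORDER BY Id) AS NextCode   -- 2 when that next row is a Stop
    FROM @Punches
)
SELECT SUM(DATEDIFF(MINUTE, StartTime, NextTime)) AS TotalMinutes
INTO #Total
FROM Paired
WHERE TimeCode = 1 AND NextCode = 2;  -- keep Start -> Stop pairs only

SELECT TotalMinutes FROM #Total;
```

With these Ids, the start.id+1 = stop.id join would only find the first pair (1 and 2), while LEAD still finds all three.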

SQL query - Find daily MIN value from hourly sums

Let's cut to the chase. I have a table which looks like this one (using SQL Server 2014):
DEMO:
http://sqlfiddle.com/#!6/75f4a/1/0
CREATE TABLE TAB (
DT datetime,
VALUE float
);
INSERT INTO TAB VALUES
('2015-05-01 06:00:00', 12),
('2015-05-01 06:20:00', 10),
('2015-05-01 06:40:00', 11),
('2015-05-01 07:00:00', 14),
('2015-05-01 07:20:00', 15),
('2015-05-01 07:40:00', 13),
('2015-05-01 08:00:00', 10),
('2015-05-01 08:20:00', 9),
('2015-05-01 08:40:00', 5),
('2015-05-02 06:00:00', 19),
('2015-05-02 06:20:00', 7),
('2015-05-02 06:40:00', 11),
('2015-05-02 07:00:00', 9),
('2015-05-02 07:20:00', 7),
('2015-05-02 07:40:00', 6),
('2015-05-02 08:00:00', 10),
('2015-05-02 08:20:00', 19),
('2015-05-02 08:40:00', 15),
('2015-05-03 06:00:00', 8),
('2015-05-03 06:20:00', 8),
('2015-05-03 06:40:00', 8),
('2015-05-03 07:00:00', 21),
('2015-05-03 07:20:00', 12),
('2015-05-03 07:40:00', 7),
('2015-05-03 08:00:00', 10),
('2015-05-03 08:20:00', 4),
('2015-05-03 08:40:00', 10)
I need to:
sum values hourly
select the smallest 'hourly sum' for each day
select hour for which that sum occurred
In other words, I want to have a table which looks like this:
DATE | SUM VAL | ON HOUR
--------------------------
2015-05-01 | 24 | 8:00
2015-05-02 | 22 | 7:00
2015-05-03 | 24 | 6:00
The first two points are very easy (see the SQL Fiddle). I have a problem with the third one: I can't simply select DATEPART(HOUR, DT), because it has to be aggregated. I tried JOINs and WHERE clauses, but with no success (some values occur in the table more than once, which threw an error).
I'm fairly new to SQL and got stuck. I need your help, SO! :)
One way is to use the set with minimum hourly values as a derived table and join against that. I would do something like this:
;WITH CTE AS (
SELECT Cast(Format(DT, 'yyyy-MM-dd HH:00') AS datetime) AS DT, SUM(VALUE) AS VAL
FROM TAB
GROUP BY Format(DT, 'yyyy-MM-dd HH:00')
)
SELECT b.dt "Date", val "sum val", cast(min(a.dt) as time) "on hour"
FROM cte a JOIN (
SELECT Format(DT,'yyyy-MM-dd') AS DT, MIN(VAL) AS DAILY_MIN
FROM cte HOURLY
GROUP BY Format(DT,'yyyy-MM-dd')
) b ON CAST(a.DT AS DATE) = b.DT and a.VAL = b.DAILY_MIN
GROUP BY b.DT, a.VAL
This would get:
Date sum val on hour
2015-05-01 24 08:00:00.0000000
2015-05-02 22 07:00:00.0000000
2015-05-03 24 06:00:00.0000000
I used MIN() for the time part because your sample data has the same low value for two separate hours on the 3rd. If you want both, remove the MIN function from the outer SELECT and remove the GROUP BY. Then you would get:
Date sum val on hour
2015-05-01 24 08:00:00.0000000
2015-05-02 22 07:00:00.0000000
2015-05-03 24 06:00:00.0000000
2015-05-03 24 08:00:00.0000000
I'm sure it can be improved, but you should get the idea.
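If you want every tied hour (like the second result above) in a single pass, CROSS APPLY with TOP (1) WITH TIES is another option. Here is a sketch, assuming SQL Server 2008+ and using a table variable @TAB and temp table #DailyMin as illustrative names:

```sql
-- Sketch: hourly sums per day, then the lowest sum(s) per day via TOP (1) WITH TIES.
IF OBJECT_ID('tempdb..#DailyMin') IS NOT NULL DROP TABLE #DailyMin;

DECLARE @TAB TABLE (DT datetime, VALUE float);
INSERT INTO @TAB (DT, VALUE) VALUES
('2015-05-01T06:00:00', 12), ('2015-05-01T06:20:00', 10), ('2015-05-01T06:40:00', 11),
('2015-05-01T07:00:00', 14), ('2015-05-01T07:20:00', 15), ('2015-05-01T07:40:00', 13),
('2015-05-01T08:00:00', 10), ('2015-05-01T08:20:00',  9), ('2015-05-01T08:40:00',  5),
('2015-05-02T06:00:00', 19), ('2015-05-02T06:20:00',  7), ('2015-05-02T06:40:00', 11),
('2015-05-02T07:00:00',  9), ('2015-05-02T07:20:00',  7), ('2015-05-02T07:40:00',  6),
('2015-05-02T08:00:00', 10), ('2015-05-02T08:20:00', 19), ('2015-05-02T08:40:00', 15),
('2015-05-03T06:00:00',  8), ('2015-05-03T06:20:00',  8), ('2015-05-03T06:40:00',  8),
('2015-05-03T07:00:00', 21), ('2015-05-03T07:20:00', 12), ('2015-05-03T07:40:00',  7),
('2015-05-03T08:00:00', 10), ('2015-05-03T08:20:00',  4), ('2015-05-03T08:40:00', 10);

WITH Hourly AS (
    SELECT CAST(DT AS date)   AS [Date],
           DATEPART(HOUR, DT) AS [Hour],
           SUM(VALUE)         AS SumVal
    FROM @TAB
    GROUP BY CAST(DT AS date), DATEPART(HOUR, DT)
)
SELECT d.[Date], m.SumVal, m.[Hour]
INTO #DailyMin
FROM (SELECT DISTINCT [Date] FROM Hourly) AS d
CROSS APPLY (
    -- every hour of that day whose sum equals the day's minimum
    SELECT TOP (1) WITH TIES h.SumVal, h.[Hour]
    FROM Hourly AS h
    WHERE h.[Date] = d.[Date]
    ORDER BY h.SumVal
) AS m;

SELECT * FROM #DailyMin ORDER BY [Date], [Hour];
```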
DECLARE @TAB TABLE
(
DT DATETIME ,
VALUE FLOAT
);
INSERT INTO @TAB
VALUES ( '2015-05-01 06:00:00', 12 ),
( '2015-05-01 06:20:00', 10 ),
( '2015-05-01 06:40:00', 11 ),
( '2015-05-01 07:00:00', 14 ),
( '2015-05-01 07:20:00', 15 ),
( '2015-05-01 07:40:00', 13 ),
( '2015-05-01 08:00:00', 10 ),
( '2015-05-01 08:20:00', 9 ),
( '2015-05-01 08:40:00', 5 ),
( '2015-05-02 06:00:00', 19 ),
( '2015-05-02 06:20:00', 7 ),
( '2015-05-02 06:40:00', 11 ),
( '2015-05-02 07:00:00', 9 ),
( '2015-05-02 07:20:00', 7 ),
( '2015-05-02 07:40:00', 6 ),
( '2015-05-02 08:00:00', 10 ),
( '2015-05-02 08:20:00', 19 ),
( '2015-05-02 08:40:00', 15 ),
( '2015-05-03 06:00:00', 8 ),
( '2015-05-03 06:20:00', 8 ),
( '2015-05-03 06:40:00', 8 ),
( '2015-05-03 07:00:00', 21 ),
( '2015-05-03 07:20:00', 12 ),
( '2015-05-03 07:40:00', 7 ),
( '2015-05-03 08:00:00', 10 ),
( '2015-05-03 08:20:00', 4 ),
( '2015-05-03 08:40:00', 10 );
WITH cteh
AS ( SELECT DT ,
CAST(dt AS DATE) AS D ,
SUM(VALUE) OVER ( PARTITION BY CAST(dt AS DATE),
DATEPART(hh, DT) ) AS S
FROM @TAB
),
ctef
AS ( SELECT * ,
ROW_NUMBER() OVER ( PARTITION BY D ORDER BY S, DT ) AS rn -- DT breaks ties: earliest reading of the earliest lowest hour
FROM cteh
)
SELECT D ,
S ,
CAST(DT AS TIME) AS H
FROM ctef
WHERE rn = 1
Output:
D S H
2015-05-01 24 08:00:00.0000000
2015-05-02 22 07:00:00.0000000
2015-05-03 24 06:00:00.0000000
Here's a method that uses a temp table (as opposed to the CTEs in the other solutions) to store the calculated values, and then filters them to give you the desired output:
-- INSERT CALCULATED GROUPED VALUES INTO TEMP TABLE
SELECT CONVERT(DATE, DT) AS DateVal ,
SUM(VALUE) AS SumVal ,
DATEPART(HOUR, CONVERT(TIME, DT)) AS HourVal
INTO #TEMP_CALC
FROM TAB
GROUP BY CONVERT(DATE, DT) , DATEPART(HOUR, CONVERT(TIME, DT))
-- TAKE THE RELEVANT ROWS
SELECT t.DateVal ,
MIN(t.SumVal) AS SumVal ,
( SELECT TOP 1
HourVal
FROM #TEMP_CALC t2
WHERE t2.DateVal = t.DateVal
AND t2.SumVal = MIN(t.SumVal)
ORDER BY t2.HourVal -- earliest hour wins on ties
) AS MinHour
FROM #TEMP_CALC t
GROUP BY t.DateVal
ORDER BY DateVal
You can use DATEDIFF to get the time span, in hours and in days, from any starting point in time (1990-1-1 in this sample). Then use those spans to group and order, and finally use DATEADD with the same starting point to rebuild the date and hour:
WITH dates AS (
SELECT CAST(DT AS DATETIME) AS Date, -- DT is already datetime; the cast only makes the type explicit
value FROM dbo.TAB AS T
),
ddh AS (SELECT
date,
DATEDIFF(DAY, '1990-1-1', date) AS daySpan, -- days span
DATEDIFF(HOUR, '1990-1-1', date) AS hourSpan, -- hours span
value
FROM dates
),
ddhv AS ( SELECT
daySpan,
hourSpan,
SUM(value) AS sumValues -- sum...
FROM ddh
group BY daySpan, hourSpan -- ...grouped by day & hour
),
ddhvr AS ( SELECT
daySpan,
hourSpan,
sumValues,
-- number rows by hourly sum of the value
ROW_NUMBER() OVER (PARTITION BY daySpan ORDER BY sumValues) AS row
FROM ddhv
)
SELECT
DATEADD(HOUR, hourSpan, '1990-1-1') AS DayHour, -- rebuild the date/hour
sumValues
FROM ddhvr
WHERE row = 1 -- take only the first occurrence for each day
This query has the advantage that you can easily change the periods and the starting point. For example, you can make your days start at 6:30 AM instead of at 00:00, so that the compared periods are 6:30 to 7:30, 7:30 to 8:30, and so on. You can also change the grouping unit: instead of 1 hour it could be half an hour, 5 minutes, or 2 hours. If you need to do so, please see this SO answer. There you'll see how to group by different periods and get back the period starting point. It's just some simple maths.
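Here is a sketch of that arithmetic, assuming periods are counted in whole minutes from an anchor. Integer-dividing DATEDIFF(MINUTE, anchor, DT) by the period length gives a period number, and DATEADD maps it back to the period's start. Counting minutes matters because DATEDIFF(HOUR, ...) counts hour boundaries crossed, so it always splits at hh:00 regardless of the anchor. @Anchor, @PeriodMinutes and @Samples are illustrative names:

```sql
-- Sketch: fixed-length periods counted in minutes from an anchor (all DT values after @Anchor).
IF OBJECT_ID('tempdb..#Buckets') IS NOT NULL DROP TABLE #Buckets;

DECLARE @Anchor datetime = '1990-01-01T06:30:00';  -- periods start at hh:30
DECLARE @PeriodMinutes int = 60;                    -- try 30, 5 or 120
DECLARE @Samples TABLE (DT datetime, VALUE float);
INSERT INTO @Samples (DT, VALUE) VALUES
('2015-05-01T06:00:00', 12), ('2015-05-01T06:20:00', 10), ('2015-05-01T06:40:00', 11),
('2015-05-01T07:00:00', 14), ('2015-05-01T07:20:00', 15), ('2015-05-01T07:40:00', 13);

WITH b AS (
    -- int / int truncates, so this is the number of whole periods since the anchor
    SELECT DATEDIFF(MINUTE, @Anchor, DT) / @PeriodMinutes AS PeriodNo, VALUE
    FROM @Samples
)
SELECT DATEADD(MINUTE, PeriodNo * @PeriodMinutes, @Anchor) AS PeriodStart,
       SUM(VALUE) AS SumVal
INTO #Buckets
FROM b
GROUP BY PeriodNo;

SELECT * FROM #Buckets ORDER BY PeriodStart;
```

With the 06:30 anchor, the readings at 06:00 and 06:20 fall into the 05:30 period. The readings at 06:40, 07:00 and 07:20 form the 06:30 period, and 07:40 starts the 07:30 period.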
I tested mine against your fiddle:
with agg as (
select cast(dt as date) as dt, datepart(hh, dt) as hr, sum(VALUE) as sum_val
from TAB
group by cast(dt as date), datepart(hh, dt)
)
select
dt, min(sum_val) as "SUM VAL",
(
select cast(hr as varchar(2)) + ':00' from agg as agg2
where agg2.dt = agg.dt and not exists (
/* no hour with a lower sum, and no earlier hour with the same sum (earliest wins on ties) */
select 1 from agg as agg3
where agg3.dt = agg2.dt
and (agg3.sum_val < agg2.sum_val or (agg3.sum_val = agg2.sum_val and agg3.hr < agg2.hr))
)
) as "ON HOUR"
from agg
group by dt;