I want to calculate the total time spent per week, where the first week starts at a given date and time.
Here the given start is 2020-06-23 15:30:00, and each subsequent week starts exactly 7 days after the previous one.
The duration of an activity is the time gap between two consecutive rows for the same user, counted only if the second row is less than an hour after the first.
select t.UserName,
1 + datediff(second, '2020-06-23 15:30:00', CompletedOn) / (24 * 60 * 60 * 7) as week_num,
sum(datediff(minute, CompletedOn, next_ts)) as duration_minutes
from (select t.*,
lead(CompletedOn) over (partition by UserName order by CompletedOn) as next_ts
from #Results t
where t.CompletedOn >= '2020-06-23 15:30:00'
) t
where datediff(minute, CompletedOn, next_ts) < 60 and CompletedOn >='2020-06-23 15:30:00' and t.UserName = 'John B'
group by t.UserName, datediff(second, '2020-06-23 15:30:00', CompletedOn) / (24 * 60 * 60 * 7)
order by t.UserName, week_num;
The above query does not return a week_num for a week that has no qualifying rows, so it displays the result as:
UserName | week_num | duration_minutes
---------------|----------|------------------
John B | 1 | 38
John B | 2 | 10
John B | 3 | 0
John B | 5 | 0
However, I want the output to include every week number up to the week of the last date in the data:
UserName | week_num | duration_minutes
---------------|----------|------------------
John B | 1 | 38
John B | 2 | 10
John B | 3 | 0
John B | 4 | 0
John B | 5 | 0
Some of the sample data:
IF OBJECT_ID('tempdb..#Results') IS NOT NULL
Truncate TABLE #Results
else
CREATE TABLE #Results
(
UserName varchar(20) not null,
CompletedOn datetime not null
)
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-06-23T15:30:00'
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-06-23T15:31:00'
--1 min
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-06-30T12:57:00'
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-06-30T13:06:00'
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-06-30T13:34:00'
--37 min
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-06-30 15:31:00'
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-06-30 15:33:00'
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-06-30 15:41:00'
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-07-06 08:41:00'
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-07-07 14:29:00'
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-07-09 15:22:00'
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-07-09 16:23:00'
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-07-21 15:34:00'
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-07-21 17:00:00'
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-07-09 15:22:00'
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-07-09 16:23:00'
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-07-21 15:34:00'
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-07-21 17:00:00'
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-07-21 17:00:00'
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-07-23 06:34:00'
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-07-23 08:28:00'
INSERT INTO #Results (UserName, CompletedOn)
SELECT 'John B', '2020-07-23 08:28:00'
Db Fiddle
Consider a left join from a recursive CTE that generates every pairing of UserName and week_num up to a defined end point. The query below stops at week 10, but the limit can be extended, say to 52.
WITH pairs AS (
SELECT DISTINCT UserName, 1 AS week_num
FROM #Results
UNION ALL
SELECT UserName, week_num + 1
FROM pairs
WHERE week_num < 10 -- ADJUST ## AS NEEDED
), sub AS (
SELECT t.UserName
, t.CompletedOn
, LEAD(CompletedOn) OVER (PARTITION BY t.UserName ORDER BY t.CompletedOn) as next_ts
FROM #Results t
WHERE t.CompletedOn >= '2020-06-23 15:30:00'
), main AS (
SELECT sub.UserName
, 1 + DATEDIFF(SECOND, '2020-06-23 15:30:00', sub.CompletedOn) / (24 * 60 * 60 * 7) AS week_num
, SUM(DATEDIFF(MINUTE, sub.CompletedOn, sub.next_ts)) AS duration_minutes
FROM sub
WHERE DATEDIFF(MINUTE, sub.CompletedOn, sub.next_ts) < 60
AND sub.CompletedOn >='2020-06-23 15:30:00'
AND sub.UserName = 'John B'
GROUP BY sub.UserName
, DATEDIFF(SECOND, '2020-06-23 15:30:00', sub.CompletedOn) / (24 * 60 * 60 * 7)
)
SELECT pairs.UserName
, pairs.week_num
, ISNULL(main.duration_minutes, 0) AS duration_minutes
FROM pairs
LEFT JOIN main
ON pairs.UserName = main.UserName
AND pairs.week_num = main.week_num
OPTION (MAXRECURSION 0);
Online Demo
| | UserName | week_num | duration_minutes |
|----|----------|----------|------------------|
| 1 | John B | 1 | 38 |
| 2 | John B | 2 | 10 |
| 3 | John B | 3 | 0 |
| 4 | John B | 4 | 0 |
| 5 | John B | 5 | 0 |
| 6 | John B | 6 | 0 |
| 7 | John B | 7 | 0 |
| 8 | John B | 8 | 0 |
| 9 | John B | 9 | 0 |
| 10 | John B | 10 | 0 |
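For comparison, the week-filling logic can be sketched outside SQL. The Python below is a minimal sketch, not the answer's method: the function name `weekly_minutes` and its parameters are hypothetical, and it assumes timestamps fall on whole minutes so that integer division agrees with DATEDIFF(MINUTE, ...) for this data. Instead of a fixed limit of 10, it creates a zero entry for every week up to the week of the last record.

```python
from datetime import datetime

SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def weekly_minutes(timestamps, week_start, max_gap_minutes=60):
    """Sum gaps shorter than max_gap_minutes per week, filling empty weeks with 0.

    Hypothetical helper: timestamps are the CompletedOn values of one user.
    """
    ts = sorted(t for t in timestamps if t >= week_start)
    if not ts:
        return {}

    def week_of(t):
        # Same arithmetic as 1 + DATEDIFF(SECOND, start, t) / (24*60*60*7)
        return 1 + int((t - week_start).total_seconds()) // SECONDS_PER_WEEK

    # Pre-create every week up to the week of the last record.
    totals = {week: 0 for week in range(1, week_of(ts[-1]) + 1)}
    for current, following in zip(ts, ts[1:]):
        gap = int((following - current).total_seconds()) // 60
        if gap < max_gap_minutes:
            totals[week_of(current)] += gap
    return totals


start = datetime(2020, 6, 23, 15, 30)
rows = [
    datetime(2020, 6, 23, 15, 30), datetime(2020, 6, 23, 15, 31),
    datetime(2020, 6, 30, 12, 57), datetime(2020, 6, 30, 13, 6),
    datetime(2020, 6, 30, 13, 34), datetime(2020, 6, 30, 15, 31),
    datetime(2020, 6, 30, 15, 33), datetime(2020, 6, 30, 15, 41),
    datetime(2020, 7, 21, 15, 34),
]
print(weekly_minutes(rows, start))  # {1: 38, 2: 10, 3: 0, 4: 0, 5: 0}
```

The rows used here are only a subset of the sample data, chosen so that weeks 3 and 4 are empty.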
I'm trying to calculate the average turnaround time of a piece of equipment in REPAIR status.
I was able to create a query that lists each piece of equipment with its snapshotted status on each day.
+-----------------+--------------+--------+----------------------+------------+------------------+
| equipmentNumber | snapshotDate | status | previousSnapshotDate | prevStatus | statusChangeFlag |
+-----------------+--------------+--------+----------------------+------------+------------------+
| 123456 | 2018-04-29 | ONHIRE | 2018-04-28 | AVAILABLE | 1 |
| 123456 | 2018-04-30 | ONHIRE | 2018-04-29 | ONHIRE | 0 |
| 123456 | 2018-05-01 | ONHIRE | 2018-04-30 | ONHIRE | 0 |
| 123456 | 2018-05-02 | REPAIR | 2018-05-01 | ONHIRE | 1 |
| 123456 | 2018-05-03 | REPAIR | 2018-05-02 | REPAIR | 0 |
| 123456 | 2018-05-04 | ONHIRE | 2018-05-03 | REPAIR | 1 |
| 654321 | 2018-04-30 | REPAIR | 2018-04-29 | AVAILABLE | 1 |
| 654321 | 2018-05-01 | REPAIR | 2018-04-30 | REPAIR | 0 |
| 654321 | 2018-05-02 | REPAIR | 2018-05-01 | REPAIR | 0 |
+-----------------+--------------+--------+----------------------+------------+------------------+
So, in this example, we have 2 pieces of equipment: "123456" was in REPAIR status for 2 days (5/2 and 5/3), and "654321" was in REPAIR status for 3 days (4/30, 5/1, and 5/2). That gives an average repair turnaround time of (2+3) / 2 = 2.5 days.
I tried this algorithm (Detect consecutive dates ranges using SQL) but it doesn't seem to be quite working for my needs.
This is a gaps-and-islands problem. I approach it with an incrementing ID column (created with ROW_NUMBER if one doesn't already exist) and a second ROW_NUMBER over the REPAIR rows only:
CREATE TABLE T1
([equipmentNumber] int, [snapshotDate] datetime, [status] varchar(6), [previousSnapshotDate] datetime, [prevStatus] varchar(9), [statusChangeFlag] int)
;
INSERT INTO T1
([equipmentNumber], [snapshotDate], [status], [previousSnapshotDate], [prevStatus], [statusChangeFlag])
VALUES
(123456, '2018-04-29 00:00:00', 'ONHIRE', '2018-04-28 00:00:00', 'AVAILABLE', 1),
(123456, '2018-04-30 00:00:00', 'ONHIRE', '2018-04-29 00:00:00', 'ONHIRE', 0),
(123456, '2018-05-01 00:00:00', 'ONHIRE', '2018-04-30 00:00:00', 'ONHIRE', 0),
(123456, '2018-05-02 00:00:00', 'REPAIR', '2018-05-01 00:00:00', 'ONHIRE', 1),
(123456, '2018-05-03 00:00:00', 'REPAIR', '2018-05-02 00:00:00', 'REPAIR', 0),
(123456, '2018-05-04 00:00:00', 'ONHIRE', '2018-05-03 00:00:00', 'REPAIR', 1),
(654321, '2018-04-30 00:00:00', 'REPAIR', '2018-04-29 00:00:00', 'AVAILABLE', 1),
(654321, '2018-05-01 00:00:00', 'REPAIR', '2018-04-30 00:00:00', 'REPAIR', 0),
(654321, '2018-05-02 00:00:00', 'REPAIR', '2018-05-01 00:00:00', 'REPAIR', 0)
;
;WITH cteX
AS(
SELECT
Id = ROW_NUMBER()OVER(ORDER BY T.equipmentNumber, T.snapshotDate)
,T.equipmentNumber
,T.snapshotDate
,T.[status]
,T.previousSnapshotDate
,T.prevStatus
,T.statusChangeFlag
FROM dbo.T1 T
),cteIsland
AS(
SELECT
Island = X.Id - ROW_NUMBER()OVER(ORDER BY X.Id)
,*
FROM cteX X
WHERE X.[status] = 'REPAIR'
)
SELECT * FROM cteIsland
Note the Island Column
Island Id equipmentNumber status
3 4 123456 REPAIR
3 5 123456 REPAIR
4 7 654321 REPAIR
4 8 654321 REPAIR
4 9 654321 REPAIR
Using the Island Column you can get the answer you need with this TSQL
;WITH cteX
AS(
SELECT
Id = ROW_NUMBER()OVER(ORDER BY T.equipmentNumber, T.snapshotDate)
,T.equipmentNumber
,T.snapshotDate
,T.[status]
,T.previousSnapshotDate
,T.prevStatus
,T.statusChangeFlag
FROM dbo.T1 T
),cteIsland
AS(
SELECT
Island = X.Id - ROW_NUMBER()OVER(ORDER BY X.Id)
,*
FROM cteX X
WHERE X.[status] = 'REPAIR'
)
SELECT
AvgDuration =SUM(Totals.IslandCounts) / (COUNT(Totals.IslandCounts) * 1.0)
FROM
(
SELECT
IslandCounts = COUNT(I.Island)
,I.equipmentNumber
FROM cteIsland I
GROUP BY I.equipmentNumber, I.Island -- one row per repair period, so repeated repairs of the same equipment are not merged
) Totals
Answer
AvgDuration
2.50000000000000
Here's the SQLFiddle
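The island idea can also be checked with a short Python sketch. This is an illustration under assumptions, not the answer's code: the function name `average_repair_days` and the `(equipment, date, status)` tuple layout are hypothetical, and one snapshot per equipment per day is assumed. Within each piece of equipment, consecutive REPAIR dates share the same value of "date ordinal minus position", which plays the role of the Island column.

```python
from datetime import date
from itertools import groupby


def average_repair_days(rows):
    """Average length in days of consecutive REPAIR runs.

    rows: iterable of (equipment_number, snapshot_date, status) tuples
    (hypothetical layout). Returns None when there is no REPAIR row.
    """
    run_lengths = []
    for _, group in groupby(sorted(rows), key=lambda r: r[0]):
        repair_dates = [r[1] for r in group if r[2] == "REPAIR"]
        # Consecutive dates keep "ordinal - position" constant: the island key.
        islands = groupby(enumerate(repair_dates),
                          key=lambda p: p[1].toordinal() - p[0])
        for _, island in islands:
            run_lengths.append(len(list(island)))
    if not run_lengths:
        return None
    return sum(run_lengths) / len(run_lengths)


snapshots = [
    (123456, date(2018, 4, 29), "ONHIRE"),
    (123456, date(2018, 4, 30), "ONHIRE"),
    (123456, date(2018, 5, 1), "ONHIRE"),
    (123456, date(2018, 5, 2), "REPAIR"),
    (123456, date(2018, 5, 3), "REPAIR"),
    (123456, date(2018, 5, 4), "ONHIRE"),
    (654321, date(2018, 4, 30), "REPAIR"),
    (654321, date(2018, 5, 1), "REPAIR"),
    (654321, date(2018, 5, 2), "REPAIR"),
]
print(average_repair_days(snapshots))  # 2.5
```

Because runs are detected per equipment, two separate repair periods of the same equipment count as two runs rather than one.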
That method should work to identify the repair periods:
select equipmentNumber, min(snapshotDate), max(snapshotDate)
from (select t.*,
row_number() over (partition by equipmentNumber order by snapshotDate) as seqnum
from t
) t
where status = 'REPAIR'
group by equipmentNumber, dateadd(day, - seqnum, snapshotDate);
You can get the average using a subquery (the + 1 makes the day count inclusive, so a repair from 5/2 to 5/3 counts as 2 days):
select avg((datediff(day, minsd, maxsd) + 1) * 1.0)
from (select equipmentNumber, min(snapshotDate) as minsd, max(snapshotDate) as maxsd
from (select t.*,
row_number() over (partition by equipmentNumber order by snapshotDate) as seqnum
from t
) t
where status = 'REPAIR'
group by equipmentNumber, dateadd(day, - seqnum, snapshotDate)
) e;
I have two tables:
The first table has name, date, time and intraday price variables: there is an intraday price for each name at a specific date and time.
The second table has name, date and daily price, where the daily price is an aggregation of the intraday prices for each name and date.
I am trying to write a program that performs the following procedure:
It matches observations by name and date across the two tables and then:
If both the first and the last intraday price of a day fall outside 0.962 to 1.0398 times the previous day's daily price, delete all data for that name and date from table 1.
The statement is:
IF first AND last (intraday price for specific name & date) NOT IN [0.962*(daily price of yesterday), 1.0398*(daily price of yesterday)] THEN DELETE.
For instance, consider two tables which are below:
data WORK.TABLE1;
infile datalines dsd truncover;
input name:$3. date:DATE9. time:TIME8. intraday_price:32.;
format date DATE9. time TIME8.;
label name="name" date="date" time="time" intraday_price="intraday price";
datalines4;
A,07MAY2008,11:32:41,3
A,07MAY2008,12:32:41,2
A,07MAY2008,13:32:41,1
A,08MAY2008,11:32:41,3.95
A,08MAY2008,12:32:41,3
A,08MAY2008,13:32:41,6
A,08MAY2008,14:32:41,4.01
B,07MAY2008,11:32:41,3.1
B,07MAY2008,12:32:41,1
B,07MAY2008,13:32:41,4
B,07MAY2008,14:32:41,2.9
B,08MAY2008,11:32:41,6
B,08MAY2008,12:32:41,1
B,09MAY2008,11:32:41,5
B,09MAY2008,12:32:41,7
C,07MAY2008,11:32:41,3
C,07MAY2008,12:32:41,2
C,08MAY2008,11:32:41,6.1
C,08MAY2008,12:32:41,3
C,08MAY2008,13:32:41,2
C,09MAY2008,11:32:41,8
C,09MAY2008,12:32:41,2
C,09MAY2008,13:32:41,3
C,09MAY2008,14:32:41,2
;;;;
And the table 2 is:
data WORK.TABLE2;
infile datalines dsd truncover;
input name:$3. date:DATE9. daily_price:32.;
format date DATE9.;
label name="name" date="date" daily_price="daily price";
datalines4;
A,05MAY2008,3
B,05MAY2008,6
C,05MAY2008,5
A,06MAY2008,5
A,07MAY2008,4
B,06MAY2008,3
B,07MAY2008,4
B,08MAY2008,3
C,06MAY2008,7
C,07MAY2008,6
C,08MAY2008,5
;;;;
Please note that the previous day's daily price should be used in the formula.
So the result is:
+------+----------+----------+----------------+
| name | date | time | intraday price |
+------+----------+----------+----------------+
| B | 7-May-08 | 11:32:41 | 3.1 |
| B | 7-May-08 | 12:32:41 | 1 |
| B | 7-May-08 | 13:32:41 | 4 |
| B | 7-May-08 | 14:32:41 | 2.9 |
| A | 8-May-08 | 11:32:41 | 3.95 |
| A | 8-May-08 | 12:32:41 | 3 |
| A | 8-May-08 | 13:32:41 | 6 |
| A | 8-May-08 | 14:32:41 | 4.01 |
| C | 8-May-08 | 11:32:41 | 6.1 |
| C | 8-May-08 | 12:32:41 | 3 |
| C | 8-May-08 | 13:32:41 | 2 |
+------+----------+----------+----------------+
Would you please tell me how I can do that?
Thanks in advance.
Based on the work of Shmuel and KurtBremser in the SAS Community, the result is:
proc sort data=table1; by name date time; run;
proc sort data=table2; by name date; run;
proc sql;
create table table3 as
select * from table1, table2
where table1.name=table2.name and table1.date=table2.date;
quit;
data table2_new;
set table2;
by name;
/* save price of yesterday */
lag_Price = lag(daily_price);
if first.name then lag_Price = .;
run;
data to_delete(keep = name date);
merge table3 (in=in1)
table2_new (in=in2);
by name date;
retain start_price last_price;
if in1 and in2; /* deal with obs on both tables only */
if first.date then start_price = intraday_price;
if last.date then last_price = intraday_price;
if last.date then do;
min_price = 0.962 * lag_Price;
max_price = 1.0398 * lag_Price;
if not (min_price le start_price le max_price) and not (min_price le last_price le max_price)
then output;
end;
run;
data want;
merge table3 /* table2 */
to_delete (in=indel);
by name date;
if not indel;
run;
SAS Community
This will identify the rows you do not want:
select t1.*
from table1 t1
join table2 t2 on t1.name = t2.name and t1.date = t2.date
where (t1.intraday_price < (t2.daily_price*0.962)
or t1.intraday_price > (t2.daily_price*1.0398)
)
If you place that inside a subquery and then test for EXISTS in that subquery, you are identifying the rows you do not want.
Demo at: SQL Fiddle
CREATE TABLE Table1
([name] varchar(1), [date] datetime, [time] varchar(8), [intraday_price] decimal(12,2))
;
INSERT INTO Table1
([name], [date], [time], [intraday_price])
VALUES
('A', '2008-05-07 00:00:00', '11:32:41', 3),
('A', '2008-05-07 00:00:00', '12:32:41', 2),
('A', '2008-05-07 00:00:00', '13:32:41', 1),
('A', '2008-05-08 00:00:00', '11:32:41', 3.95),
('A', '2008-05-08 00:00:00', '12:32:41', 3),
('A', '2008-05-08 00:00:00', '13:32:41', 6),
('A', '2008-05-08 00:00:00', '14:32:41', 4.01),
('B', '2008-05-07 00:00:00', '11:32:41', 3.1),
('B', '2008-05-07 00:00:00', '12:32:41', 1),
('B', '2008-05-07 00:00:00', '13:32:41', 4),
('B', '2008-05-07 00:00:00', '14:32:41', 2.9),
('B', '2008-05-08 00:00:00', '11:32:41', 6),
('B', '2008-05-08 00:00:00', '12:32:41', 1),
('B', '2008-05-09 00:00:00', '11:32:41', 5),
('B', '2008-05-09 00:00:00', '12:32:41', 7),
('C', '2008-05-07 00:00:00', '11:32:41', 3),
('C', '2008-05-07 00:00:00', '12:32:41', 2),
('C', '2008-05-08 00:00:00', '11:32:41', 6.1),
('C', '2008-05-08 00:00:00', '12:32:41', 3),
('C', '2008-05-08 00:00:00', '13:32:41', 2),
('C', '2008-05-09 00:00:00', '11:32:41', 8),
('C', '2008-05-09 00:00:00', '12:32:41', 2),
('C', '2008-05-09 00:00:00', '13:32:41', 3),
('C', '2008-05-09 00:00:00', '14:32:41', 2)
;
CREATE TABLE Table2
([name] varchar(1), [date] datetime, [daily_price] decimal(12,2))
;
INSERT INTO Table2
([name], [date], [daily_price])
VALUES
('A', '2008-05-05 00:00:00', 3),
('B', '2008-05-05 00:00:00', 6),
('C', '2008-05-05 00:00:00', 5),
('A', '2008-05-06 00:00:00', 5),
('A', '2008-05-07 00:00:00', 4),
('B', '2008-05-06 00:00:00', 3),
('B', '2008-05-07 00:00:00', 4),
('B', '2008-05-08 00:00:00', 3),
('C', '2008-05-06 00:00:00', 7),
('C', '2008-05-07 00:00:00', 6),
('C', '2008-05-08 00:00:00', 5)
;
Query 1:
with cte as (
select
*
from Table1
where exists (
select NULL
from table1 t1
join table2 t2 on t1.name = t2.name and t1.date = t2.date
where (t1.intraday_price < (t2.daily_price*0.962)
or t1.intraday_price > (t2.daily_price*1.0398)
)
and table1.name = t1.name and table1.date = t1.date and table1.time = t1.time
)
)
delete
from cte
;
select * from table1
Results:
| name | date | time | intraday_price |
|------|----------------------|----------|----------------|
| A | 2008-05-08T00:00:00Z | 11:32:41 | 3.95 |
| A | 2008-05-08T00:00:00Z | 12:32:41 | 3 |
| A | 2008-05-08T00:00:00Z | 13:32:41 | 6 |
| A | 2008-05-08T00:00:00Z | 14:32:41 | 4.01 |
| B | 2008-05-07T00:00:00Z | 13:32:41 | 4 |
| B | 2008-05-09T00:00:00Z | 11:32:41 | 5 |
| B | 2008-05-09T00:00:00Z | 12:32:41 | 7 |
| C | 2008-05-09T00:00:00Z | 11:32:41 | 8 |
| C | 2008-05-09T00:00:00Z | 12:32:41 | 2 |
| C | 2008-05-09T00:00:00Z | 13:32:41 | 3 |
| C | 2008-05-09T00:00:00Z | 14:32:41 | 2 |
Rather than deleting from the source tables, create a new dataset filtered to the required records. Specifically, consider an EXISTS subquery that selects records according to the needed logic.
The query below self-joins table1 to align the first (min time) and last (max time) records of each name and date in one row, joins the previous day's daily price from table2, and keeps the group if either of the two intraday prices falls within the price range (a group is dropped only when both fall outside it).
proc sql;
create table newtable as
select *
from work.table1 main
where exists(
select 1
from work.table1 m1
inner join work.table1 m2
on m1.name = m2.name and m1.date = m2.date
inner join work.table2 t2
on m1.name = t2.name and m1.date = intnx("day", t2.date, 1) /* t2 row is the previous day */
inner join
(select t.name, t.date, min(t.time) as min_time, max(t.time) as max_time
from work.table1 t
group by t.name, t.date
) agg
on m1.name = agg.name and m1.date = agg.date
and m1.time = agg.min_time and m2.time = agg.max_time
where (m1.intraday_price between (0.962 * t2.daily_price) and (1.0398 * t2.daily_price)
or m2.intraday_price between (0.962 * t2.daily_price) and (1.0398 * t2.daily_price))
and main.name = m1.name and main.date = m1.date);
quit;
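The rule from the question can be cross-checked with a small Python sketch. It is an illustration only, not any of the answers' code: the function name `groups_to_keep` and the data layout are hypothetical, "yesterday" is assumed to mean the previous calendar day, times are assumed to be zero-padded HH:MM:SS strings, and groups without a previous-day daily price are dropped (as an inner join would do).

```python
from datetime import date, timedelta


def groups_to_keep(intraday, daily, low=0.962, high=1.0398):
    """Return the (name, date) groups that survive the filter.

    intraday: list of (name, date, time_string, price) rows (hypothetical layout)
    daily:    dict mapping (name, date) to the daily price
    A group is kept when its first OR last intraday price lies within
    [low, high] times the previous calendar day's daily price.
    """
    prices_by_group = {}
    for name, day, _time, price in sorted(intraday):  # sorts by name, date, time
        prices_by_group.setdefault((name, day), []).append(price)

    kept = set()
    for (name, day), prices in prices_by_group.items():
        previous = daily.get((name, day - timedelta(days=1)))
        if previous is None:
            continue  # no yesterday price: the group is dropped
        first_ok = low * previous <= prices[0] <= high * previous
        last_ok = low * previous <= prices[-1] <= high * previous
        if first_ok or last_ok:
            kept.add((name, day))
    return kept


intraday = [
    ("A", date(2008, 5, 7), "11:32:41", 3), ("A", date(2008, 5, 7), "12:32:41", 2),
    ("A", date(2008, 5, 7), "13:32:41", 1),
    ("B", date(2008, 5, 7), "11:32:41", 3.1), ("B", date(2008, 5, 7), "12:32:41", 1),
    ("B", date(2008, 5, 7), "13:32:41", 4), ("B", date(2008, 5, 7), "14:32:41", 2.9),
    ("C", date(2008, 5, 8), "11:32:41", 6.1), ("C", date(2008, 5, 8), "12:32:41", 3),
    ("C", date(2008, 5, 8), "13:32:41", 2),
]
daily = {("A", date(2008, 5, 6)): 5, ("B", date(2008, 5, 6)): 3,
         ("C", date(2008, 5, 7)): 6}
print(sorted(groups_to_keep(intraday, daily)))
```

On this subset of the sample data, B on 7 May and C on 8 May are kept and A on 7 May is dropped, which agrees with the expected result table in the question.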
I can use DATEDIFF to find the difference between one pair of dates like this
DATEDIFF(MINUTE, @startdate, @enddate)
but how would I find the total time span across multiple pairs of dates? I don't know how many pairs (starts and stops) I will have.
The data is on multiple rows with starts and stops.
ID TimeStamp StartOrStop TimeCode
----------------------------------------------------------------
1 2017-01-01 07:00:00 Start 1
2 2017-01-01 08:15:00 Stop 2
3 2017-01-01 10:00:00 Start 1
4 2017-01-01 11:00:00 Stop 2
5 2017-01-01 10:30:00 Start 1
6 2017-01-01 12:00:00 Stop 2
This code works assuming that your table only stores data for one person and that the rows follow the order Start/Stop/Start/Stop:
WITH StartTime AS (
SELECT
TimeStamp
, ROW_NUMBER() OVER (ORDER BY TimeStamp) AS RowNum
FROM
<<table>>
WHERE
TimeCode = 1
), StopTime AS (
SELECT
TimeStamp
, ROW_NUMBER() OVER (ORDER BY TimeStamp) AS RowNum
FROM
<<table>>
WHERE
TimeCode = 2
)
SELECT
SUM (DATEDIFF( MINUTE, StartTime.TimeStamp, StopTime.TimeStamp )) As TotalTime
FROM
StartTime
JOIN StopTime ON StartTime.RowNum = StopTime.RowNum
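The pairing by row number can be sketched in Python as follows. This is a minimal illustration under the same assumption as the query (one person, every start has a matching stop); the function name `total_minutes` and the `(timestamp, time_code)` tuple layout are hypothetical. Like the inner join, `zip` silently ignores a start or stop that has no partner.

```python
from datetime import datetime


def total_minutes(rows):
    """Pair the n-th start with the n-th stop and sum the minutes between them.

    rows: iterable of (timestamp, time_code) where 1 = Start and 2 = Stop
    (hypothetical layout mirroring the TimeCode column).
    """
    starts = sorted(ts for ts, code in rows if code == 1)
    stops = sorted(ts for ts, code in rows if code == 2)
    return sum(
        int((stop - start).total_seconds()) // 60
        for start, stop in zip(starts, stops)
    )


sample = [
    (datetime(2017, 1, 1, 7, 0), 1),
    (datetime(2017, 1, 1, 8, 15), 2),
    (datetime(2017, 1, 1, 10, 0), 1),
    (datetime(2017, 1, 1, 11, 0), 2),
    (datetime(2017, 1, 1, 10, 30), 1),
    (datetime(2017, 1, 1, 12, 0), 2),
]
print(total_minutes(sample))  # 225
```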
This will work if your starts and stops are reliable. Your sample has two starts in a row (the 10:00 and 10:30 starts). I assume in production you will have an employee id to group on, so I added one to the sample data in place of the identity column.
Also in production, the CTE sets will be reduced by filtering on a date parameter. If there are overnight shifts, you would want your stops CTE to use dateadd(day, 1, @startDate) as the upper bound when retrieving the end date.
Set up sample:
declare @temp table (
EmpId int,
TimeStamp datetime,
StartOrStop varchar(55),
TimeCode int
);
insert into @temp
values
(1, '2017-01-01 07:00:00', 'Start', 1),
(1, '2017-01-01 08:15:00', 'Stop', 2),
(1, '2017-01-01 10:00:00', 'Start', 1),
(1, '2017-01-01 11:00:00', 'Stop', 2),
(2, '2017-01-01 10:30:00', 'Start', 1),
(2, '2017-01-01 12:00:00', 'Stop', 2)
Query:
;with starts as (
select t.EmpId,
t.TimeStamp as StartTime,
row_number() over (partition by t.EmpId order by t.TimeStamp asc) as rn
from @temp t
where Timecode = 1 --Start time code?
),
stops as (
select t.EmpId,
t.TimeStamp as EndTime,
row_number() over (partition by t.EmpId order by t.TimeStamp asc) as rn
from @temp t
where Timecode = 2 --Stop time code?
)
select cast(min(sub.StartTime) as date) as WorkDay,
sub.EmpId as Employee,
min(sub.StartTime) as ClockIn,
min(sub.EndTime) as ClockOut,
sum(sub.MinutesWorked) as MinutesWorked
from
(
select strt.EmpId,
strt.StartTime,
stp.EndTime,
datediff(minute, strt.StartTime, stp.EndTime) as MinutesWorked
from starts strt
inner join stops stp
on strt.EmpId = stp.EmpId
and strt.rn = stp.rn
)sub
group by sub.EmpId
This works assuming your table has an incremental ID and interleaving start/stop records
--Data sample as provided
declare @temp table (
Id int,
TimeStamp datetime,
StartOrStop varchar(55),
TimeCode int
);
insert into @temp
values
(1, '2017-01-01 07:00:00', 'Start', 1),
(2, '2017-01-01 08:15:00', 'Stop', 2),
(3, '2017-01-01 10:00:00', 'Start', 1),
(4, '2017-01-01 11:00:00', 'Stop', 2),
(5, '2017-01-01 10:30:00', 'Start', 1),
(6, '2017-01-01 12:00:00', 'Stop', 2)
--let's see every pair start/stop and discard stop/start
select start.timestamp start, stop.timestamp stop,
datediff(mi,start.timestamp,stop.timestamp) minutes
from @temp start inner join @temp stop
on start.id+1= stop.id and start.timecode=1
--Sum all for required result
select sum(datediff(mi,start.timestamp,stop.timestamp) ) totalMinutes
from @temp start inner join @temp stop
on start.id+1= stop.id and start.timecode=1
Results
+-------------------------+-------------------------+---------+
| start | stop | minutes |
+-------------------------+-------------------------+---------+
| 2017-01-01 07:00:00.000 | 2017-01-01 08:15:00.000 | 75 |
| 2017-01-01 10:00:00.000 | 2017-01-01 11:00:00.000 | 60 |
| 2017-01-01 10:30:00.000 | 2017-01-01 12:00:00.000 | 90 |
+-------------------------+-------------------------+---------+
+--------------+
| totalMinutes |
+--------------+
| 225 |
+--------------+
Maybe the tricky part is the join clause. We need to join @temp with itself, offset by one ID; this is what on start.id+1= stop.id does.
On the other hand, to exclude stop/start pairs we use start.timecode=1. If there is no column with this information, something like stop.id%2=0 works just fine.
I am using Postgres 9.3.3
I have a table with multiple events, two of them are "AVAILABLE" and "UNAVAILABLE". These events are assigned to a specific object. There are also other object ids in this table (removed for clarity):
What I need is the "available" time per day, something like that:
SQL Fiddle
select
object_id, day,
sum(upper(available) - lower(available)) as available
from (
select
g.object_id, date_trunc('day', d) as day,
(
available *
tsrange(date_trunc('day', d), date_trunc('day', d)::date + 1, '[)')
) as available
from
(
select
object_id, event,
tsrange(
timestamp,
lead(timestamp) over(
partition by object_id order by timestamp
),
'[)'
) as available
from events
where event in ('AVAILABLE', 'UNAVAILABLE')
) s
right join
(
generate_series(
(select min(timestamp) from events),
(select max(timestamp) from events),
'1 day'
) g (d)
cross join
(select distinct object_id from events) s
) g on
tsrange(date_trunc('day', d), date_trunc('day', d)::date + 1, '[)') && available and
(event = 'AVAILABLE' or event is null) and
g.object_id = s.object_id
) s
group by 1, 2
order by 1, 2
psql output
object_id | day | available
-----------+---------------------+-----------
1 | 1970-01-02 00:00:00 | 12:00:00
1 | 1970-01-03 00:00:00 | 12:00:00
1 | 1970-01-04 00:00:00 |
1 | 1970-01-05 00:00:00 | 1 day
1 | 1970-01-06 00:00:00 | 1 day
1 | 1970-01-07 00:00:00 | 12:00:00
Table DDL
create table events (
object_id int,
event text,
timestamp timestamp
);
insert into events (object_id, event, timestamp) values
(1, 'AVAILABLE', '1970-01-02 12:00:00'),
(1, 'UNAVAILABLE', '1970-01-03 12:00:00'),
(1, 'AVAILABLE', '1970-01-05 00:00:00'),
(1, 'UNAVAILABLE', '1970-01-07 12:00:00');
Your example output suggests that you want all your objects to be returned, but grouped. If that is the case, this query can do that
select object_id, day, sum(upper(tsrange) - lower(tsrange))
from (
select object_id, date(day) as day, e.tsrange * tsrange(day, day + interval '1' day) tsrange
from generate_series(timestamp '1970-01-01', '1970-01-07', interval '1' day) day
left join (
select object_id,
case event
when 'AVAILABLE' then tsrange(timestamp, lead(timestamp) over (partition by object_id order by timestamp))
else null
end tsrange
from events
where event in ('AVAILABLE', 'UNAVAILABLE')
) e on e.tsrange && tsrange(day, day + interval '1' day)
) d
group by object_id, day
order by day, object_id
But that will output something like this (if you have multiple object_ids):
object_id | day | sum
-----------+--------------+-----------
| '1970-01-01' |
1 | '1970-01-02' | '12:00:00'
1 | '1970-01-03' | '12:00:00'
| '1970-01-04' |
1 | '1970-01-05' | '1 day'
1 | '1970-01-06' | '1 day'
2 | '1970-01-06' | '12:00:00'
1 | '1970-01-07' | '12:00:00'
In my opinion, it would make much more sense to query just one object at a time:
select day, sum(upper(tsrange) - lower(tsrange))
from (
select date(day) as day, e.tsrange * tsrange(day, day + interval '1' day) tsrange
from generate_series(timestamp '1970-01-01', '1970-01-07', interval '1' day) day
left join (
select case event
when 'AVAILABLE' then tsrange(timestamp, lead(timestamp) over (partition by object_id order by timestamp))
else null
end tsrange
from events
where event in ('AVAILABLE', 'UNAVAILABLE')
and object_id = 1
) e on e.tsrange && tsrange(day, day + interval '1' day)
) d
group by day
order by day
This will output something like:
day | sum
--------------+----------
'1970-01-01' |
'1970-01-02' | '12:00:00'
'1970-01-03' | '12:00:00'
'1970-01-04' |
'1970-01-05' | '1 day'
'1970-01-06' | '1 day'
'1970-01-07' | '12:00:00'
I used this schema/data for my outputs:
create table events (
object_id int,
event text,
timestamp timestamp
);
insert into events (object_id, event, timestamp)
values (1, 'AVAILABLE', '1970-01-02 12:00:00'),
(1, 'UNAVAILABLE', '1970-01-03 12:00:00'),
(1, 'AVAILABLE', '1970-01-05 00:00:00'),
(1, 'UNAVAILABLE', '1970-01-07 12:00:00'),
(2, 'AVAILABLE', '1970-01-06 00:00:00'),
(2, 'UNAVAILABLE', '1970-01-06 06:00:00'),
(2, 'AVAILABLE', '1970-01-06 12:00:00'),
(2, 'UNAVAILABLE', '1970-01-06 18:00:00');
This is a partial answer. If we assume that the next event after available is unavailable, then lead() comes to the rescue and the following is a start:
select object_id, to_char(timestamp, 'YYYY-MM-DD') as day,
to_char(sum(nextts - timestamp), 'HH24:MI') as interval
from (select t.*,
lead(timestamp) over (partition by object_id order by timestamp) as nextts
from table t
where event in ('AVAILABLE', 'UNAVAILABLE')
) t
where event = 'AVAILABLE'
group by object_id, to_char(timestamp, 'YYYY-MM-DD');
I suspect, though, that when the interval spans multiple days, you want to split the days into separate parts. This becomes more of a challenge.
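The splitting step can be sketched in Python to show what it involves. This is a minimal sketch under stated assumptions, not a translation of any query above: the function name `available_per_day` and the `(timestamp, event)` tuple layout are hypothetical, the events belong to a single object, and a trailing AVAILABLE event with no following event is ignored. Each available interval is cut at every midnight it crosses, and each piece is credited to its own day.

```python
from collections import defaultdict
from datetime import datetime, time, timedelta


def available_per_day(events):
    """Return {date: timedelta} of available time, split at midnight.

    events: iterable of (timestamp, event) for one object, where event is
    'AVAILABLE' or 'UNAVAILABLE' (hypothetical layout).
    """
    ordered = sorted(events)
    totals = defaultdict(timedelta)
    for (start, kind), (end, _next_kind) in zip(ordered, ordered[1:]):
        if kind != "AVAILABLE":
            continue
        cursor = start
        while cursor < end:
            next_midnight = datetime.combine(cursor.date(), time.min) + timedelta(days=1)
            piece_end = min(end, next_midnight)
            totals[cursor.date()] += piece_end - cursor
            cursor = piece_end
    return dict(totals)


events = [
    (datetime(1970, 1, 2, 12, 0), "AVAILABLE"),
    (datetime(1970, 1, 3, 12, 0), "UNAVAILABLE"),
    (datetime(1970, 1, 5, 0, 0), "AVAILABLE"),
    (datetime(1970, 1, 7, 12, 0), "UNAVAILABLE"),
]
for day, available in sorted(available_per_day(events).items()):
    print(day, available)
```

With the sample events this credits 12 hours to 2 and 3 January, a full day to 5 and 6 January, and 12 hours to 7 January; 4 January has no entry because nothing was available that day.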