Determine time duration based on events without using loops - sql

I have a table with timestamps of 5 different types of events (start, stopped, restart, aborted, and completed).
The given table looks like this:
Time     EventID  Event
7:38:20  1        start
7:40:20  2        stopped
7:48:20  3        restart
7:50:20  4        aborted
8:00:20  1        start
8:40:20  5        completed
8:58:20  1        start
9:00:15  4        aborted
I would like to determine the following and display it:
Duration of individual Wash --> From (start or restart) to (stopped or aborted or completed)
Duration of Wash Cycle --> From (start) to (aborted or completed)
Duration of total wash time --> Sum of all individual wash in a Wash cycle
Duration of idle time --> Wash Cycle duration - total wash time duration
So the table should look something like the following:
Time     EventID  Event      Duration of individual Wash  Duration of Wash Cycle  Duration of total wash time  Duration of idle time
7:38:20  1        start      0:02:00                      0:12:00                 0:04:00                      0:08:00
7:40:20  2        stopped    NULL                         NULL                    NULL                         NULL
7:48:20  3        restart    0:02:00                      NULL                    NULL                         NULL
7:50:20  4        aborted    NULL                         NULL                    NULL                         NULL
8:00:20  1        start      0:40:00                      0:40:00                 0:01:55                      0:00:00
8:40:20  5        completed  NULL                         NULL                    NULL                         NULL
8:58:20  1        start      0:01:55                      0:01:55                 0:01:55                      0:00:00
9:00:15  4        aborted    NULL                         NULL                    NULL                         NULL
So far I have been able to get the duration of the individual wash and the duration of the wash cycle by joining two tables (one with only start, aborted, and completed events; the other with all events). I am stuck on the last two columns. I'm not sure how to approach this problem efficiently without using a while loop or a counter of some sort. Would love some pointers.
Here is my code so far:
SELECT IndivWash.DateTimeStamp as 'Event TimeStamp'
      ,IndivWash.EventIDNo AS 'Event ID Number'
      ,IndivWash.EventDesc AS 'Event Description'
      -- for the duration of the WASH ----------------------------------------------------
      ,CASE
          WHEN (IndivWash.EventIDNo = '1' OR IndivWash.EventIDNo = '3')
               AND (LEAD(IndivWash.EventIDNo) OVER (ORDER BY IndivWash.DateTimeStamp) = '2'
                    OR LEAD(IndivWash.EventIDNo) OVER (ORDER BY IndivWash.DateTimeStamp) = '4'
                    OR LEAD(IndivWash.EventIDNo) OVER (ORDER BY IndivWash.DateTimeStamp) = '5')
               AND LEAD(IndivWash.EventIDNo) OVER (ORDER BY IndivWash.DateTimeStamp) <> IndivWash.EventIDNo
          THEN
              DATEDIFF(s, IndivWash.DateTimeStamp, LEAD(IndivWash.DateTimeStamp) OVER (ORDER BY IndivWash.DateTimeStamp))
          ELSE
              NULL
       END AS 'Duration of individual Wash'
      -- For the duration of the CYCLE ----------------------------------------------------
      ,CASE
          WHEN WashCycle.EventIDNo = '1'
               AND LEAD(WashCycle.EventIDNo) OVER (ORDER BY WashCycle.DateTimeStamp) <> WashCycle.EventIDNo
               AND (LEAD(WashCycle.EventIDNo) OVER (ORDER BY WashCycle.DateTimeStamp) = '4' OR
                    LEAD(WashCycle.EventIDNo) OVER (ORDER BY WashCycle.DateTimeStamp) = '5')
          THEN
              DATEDIFF(s, WashCycle.DateTimeStamp, LEAD(WashCycle.DateTimeStamp) OVER (ORDER BY WashCycle.DateTimeStamp))
          ELSE
              NULL
       END AS 'Duration of Wash Cycle'
      -- ----------------------------------------------------
FROM (
    --FROM: table with only start, abort and complete.
    -- to differentiate the cycles that are not aborted
    SELECT TOP (1000) DateTimeStamp
          ,EventIDNo
          ,EventDesc
    /*----------CHANGE DATABASE HERE----------*/
    FROM Washer.dbo.EventLog_vw
    /*----------------------------------------*/
    WHERE EventIDNo IN ('1','4','5')
    ORDER BY DateTimeStamp
) WashCycle
RIGHT JOIN
(
    --FROM: table with all five events
    SELECT TOP (1000)
           DateTimeStamp
          ,EventIDNo
          ,EventDesc
    /*----------CHANGE DATABASE HERE----------*/
    FROM Washer.dbo.EventLog_vw
    /*----------------------------------------*/
    WHERE EventIDNo IN ('1','2','3','4','5')
    ORDER BY DateTimeStamp
) IndivWash
ON WashCycle.DateTimeStamp=IndivWash.DateTimeStamp

Try this example. It first precalculates two group IDs for every row: CycleStartId (incremented on every start) and CycleRestartId (incremented on every start or restart):
SELECT *,
    -- One wash: first to last event of a start/restart group
    CASE WHEN EventID IN (1, 3) THEN
        DATEDIFF(SS,
            MIN(Time) OVER (PARTITION BY CycleRestartId),
            MAX(Time) OVER (PARTITION BY CycleRestartId)
        )
    END AS DurIndSec,
    -- Whole cycle: first to last event of a start group
    CASE WHEN EventID = 1 THEN
        DATEDIFF(SS,
            MIN(Time) OVER (PARTITION BY CycleStartId),
            MAX(Time) OVER (PARTITION BY CycleStartId)
        )
    END AS DurSec,
    -- Total wash time: skip the gaps that end on a start or restart row
    CASE WHEN EventID = 1 THEN
        SUM(CASE WHEN EventID IN (1, 3) THEN 0 ELSE TimeDiff END) OVER (PARTITION BY CycleStartId)
    END AS TotalWashSec,
    -- Idle time: gaps that follow a "stopped" event
    CASE WHEN EventID = 1 THEN
        SUM(COALESCE(StopIdle, 0)) OVER (PARTITION BY CycleStartId)
    END AS DurIdleSec
FROM (
    SELECT *,
        -- seconds since the previous event (0 for the first row)
        DATEDIFF(SS, LAG(Time, 1, Time) OVER (ORDER BY Time), Time) AS TimeDiff,
        SUM(CASE WHEN EventID = 1 THEN 1 ELSE 0 END)
            OVER (ORDER BY Time) AS CycleStartId,
        SUM(CASE WHEN EventID IN (1, 3) THEN 1 ELSE 0 END)
            OVER (ORDER BY Time) AS CycleRestartId,
        CASE WHEN LAG(EventID, 1, EventID) OVER (ORDER BY Time) = 2 THEN
            DATEDIFF(SS, LAG(Time, 1, Time) OVER (ORDER BY Time), Time)
        END AS StopIdle
    FROM events
) t
The durations here are reported in seconds. If you need to format them as a time, you can use the following expression:
CONVERT(varchar(8), DATEADD(SS, <Int in seconds>, '0:00:00'), 114)
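The grouping idea can also be checked outside the database. The following is a minimal Python sketch (not part of the answer's SQL) that replays the sample rows from the question, assigns a cycle number with a running count of start events, and derives the cycle, wash, and idle durations. The names `summarize`, `hms`, and `report` are made up for this illustration.

```python
from itertools import accumulate

# Sample rows from the question: (time, event_id)
events = [
    ("7:38:20", 1), ("7:40:20", 2), ("7:48:20", 3), ("7:50:20", 4),
    ("8:00:20", 1), ("8:40:20", 5), ("8:58:20", 1), ("9:00:15", 4),
]

def seconds(t):
    """Convert an h:mm:ss string to seconds since midnight."""
    h, m, s = map(int, t.split(":"))
    return h * 3600 + m * 60 + s

times = [seconds(t) for t, _ in events]
ids = [e for _, e in events]

# Running count of "start" events, the same idea as
# SUM(CASE WHEN EventID = 1 THEN 1 ELSE 0 END) OVER (ORDER BY Time).
cycle_id = list(accumulate(1 if e == 1 else 0 for e in ids))

def summarize(cycle):
    """Return (cycle duration, total wash time, idle time) in seconds."""
    rows = [i for i, c in enumerate(cycle_id) if c == cycle]
    duration = times[rows[-1]] - times[rows[0]]
    # Wash time: only gaps that end on stopped/aborted/completed (2, 4, 5).
    wash = sum(times[i] - times[i - 1] for i in rows[1:] if ids[i] in (2, 4, 5))
    return duration, wash, duration - wash

def hms(total):
    """Format seconds as h:mm:ss."""
    return f"{total // 3600}:{total % 3600 // 60:02d}:{total % 60:02d}"

report = {c: tuple(hms(x) for x in summarize(c)) for c in sorted(set(cycle_id))}
for cycle, row in report.items():
    print(cycle, *row)
```

For the first cycle this gives 0:12:00, 0:04:00 and 0:08:00, which matches the first row of the desired output in the question.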

Related

Create partitions based on column values in sql

I am very new to SQL and query writing, and after a lot of trying I am asking for help.
As shown in the picture, I want to partition the data based on is_late = 1 and show its count (that is, 2), but at the same time capture the value of last_status where is_late = 0, so that everything is displayed in a single row.
The task is to calculate how many times the rider was late and the time he took from the first occurrence of the estimated time to the last_status.
Desired output:
You can use the following query:
SELECT
    rider_id,
    task_created_time,
    expected_time_to_arrive,
    is_late,
    last_status,
    task_count,
    CONVERT(VARCHAR(5), DATEADD(MINUTE, DATEDIFF(MINUTE, expected_time_to_arrive, last_status), 0), 114) AS time_delayed
FROM
    (SELECT
        rider_id,
        task_created_time,
        expected_time_to_arrive,
        is_late,
        -- number of late tasks per rider
        SUM(CASE WHEN is_late = 1 THEN 1 ELSE 0 END) OVER(PARTITION BY rider_id ORDER BY rider_id) AS task_count,
        -- num = 1 marks the rider's earliest task (ordering by task_created_time makes this deterministic)
        ROW_NUMBER() OVER(PARTITION BY rider_id ORDER BY task_created_time) AS num,
        -- latest status time per rider
        MAX(last_status) OVER(PARTITION BY rider_id ORDER BY rider_id) AS last_status
    FROM myTestTable) t
WHERE num = 1
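To make the intent of the query easier to follow, here is a small Python sketch of the same per-rider aggregation. The original data is only available as a picture, so the rows below are entirely hypothetical, and the assumption that last_status holds a timestamp is inferred from the DATEDIFF call in the query.

```python
from datetime import datetime

# Hypothetical rows:
# (rider_id, task_created_time, expected_time_to_arrive, is_late, last_status)
tasks = [
    (1, datetime(2021, 5, 1, 9, 50), datetime(2021, 5, 1, 10, 0), 1, datetime(2021, 5, 1, 10, 10)),
    (1, datetime(2021, 5, 1, 10, 5), datetime(2021, 5, 1, 10, 20), 1, datetime(2021, 5, 1, 10, 30)),
    (1, datetime(2021, 5, 1, 10, 25), datetime(2021, 5, 1, 10, 40), 0, datetime(2021, 5, 1, 10, 45)),
]

summary = {}
# Process each rider's tasks in creation order so the first row seen is the earliest.
for rider, created, expected, is_late, last_status in sorted(tasks, key=lambda t: (t[0], t[1])):
    row = summary.setdefault(
        rider,
        {"first_expected": expected, "late_count": 0, "last_status": last_status},
    )
    row["late_count"] += is_late                                  # like SUM(CASE WHEN is_late = 1 ...)
    row["last_status"] = max(row["last_status"], last_status)     # like MAX(last_status)

for row in summary.values():
    delay = row["last_status"] - row["first_expected"]
    row["minutes_delayed"] = int(delay.total_seconds() // 60)

print(summary[1]["late_count"], summary[1]["minutes_delayed"])
```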

Calculate the time length of 0-1 sequence with hive

The linked question 'calculate the time length' solved the problem of calculating the time length of each sub-sequence.
The data is like:
time(string) id(int)
201801051127 0
201801051130 0
201801051132 0
201801051135 1
201801051141 1
201801051145 0
201801051147 0
Now I have some additional requirements:
(1) the first sequence should begin at '201801051100' and end at the start time of the next sequence, e.g. '201801051135', so the time length of the first sequence is 35;
(2) the second sequence should begin at its own start time and end at the start time of the next sequence;
(3) the final sequence should begin at its own start time and end at '201801051200'.
How can I implement these three calculation rules (for the first sequence, the middle sequences and the final sequence) in Hive, based on the code from 'calculate the time length':
with q1 as (
select unix_timestamp(time, 'yyyyMMddHHmm')/60 time, id,
case id when lag(id) over(order by time) then null else 1 end
first_in_group
from t
), q2 as (
select time, id, count(first_in_group) over (order by time) grp_id
from q1
)
select min(id) id, max(time) - min(time) minutes
from q2
group by grp_id
order by grp_id
You can achieve that with some minor modifications to the query:
with q1 as (
select unix_timestamp(time, 'yyyyMMddHHmm')/60 time, id,
case id when lag(id) over(order by time) then null else 1 end
first_in_group
from t
), q2 as (
select time, id
from q1
where first_in_group = 1
)
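select id,
       -- rules (2) and (3): a sequence ends where the next one starts; the last one ends at the fixed end time
       lead(time, 1, unix_timestamp('201801051200', 'yyyyMMddHHmm')/60) over (order by time)
       -- rule (1): the first sequence starts at the fixed start time
       - case when lag(time) over (order by time) is null
              then unix_timestamp('201801051100', 'yyyyMMddHHmm')/60
              else time
         end as minutes
from q2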
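As a cross-check of the three rules, here is a minimal Python sketch that applies them to the sample data from the question. It is an illustration of the logic only, not Hive code; the names `WINDOW_START`, `WINDOW_END`, `starts` and `lengths` are made up for this example.

```python
from datetime import datetime
from itertools import groupby

# Sample data from the question, already ordered by time: (time, id)
rows = [
    ("201801051127", 0), ("201801051130", 0), ("201801051132", 0),
    ("201801051135", 1), ("201801051141", 1),
    ("201801051145", 0), ("201801051147", 0),
]
WINDOW_START, WINDOW_END = "201801051100", "201801051200"

def parse(stamp):
    return datetime.strptime(stamp, "%Y%m%d%H%M")

# First row of every run of equal ids (the rows where first_in_group = 1).
starts = [next(run) for _, run in groupby(rows, key=lambda r: r[1])]

# Rule (1): the first run is measured from the fixed window start.
begins = [parse(WINDOW_START)] + [parse(t) for t, _ in starts[1:]]
# Rules (2) and (3): a run ends where the next one begins; the last ends at the window end.
ends = [parse(t) for t, _ in starts[1:]] + [parse(WINDOW_END)]

lengths = [
    (run_id, int((end - begin).total_seconds() // 60))
    for (_, run_id), begin, end in zip(starts, begins, ends)
]
print(lengths)
```

With the sample data the three sequences come out as 35, 10 and 15 minutes, which together cover the full hour from 11:00 to 12:00.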

SQL - select processes that were cancelled with a date

I have a table that shows the statuses of processes (I am specifically looking for canceled processes); the rows are not sorted. I want to select all processes that were resumed after being canceled. My idea is to attach the cancellation date to each canceled process and check whether there are any other statuses after the cancellation status.
Example:
[id] [moddate] [status]
1 01/01/17 started
1 02/01/17 waiting for signature
1 04/01/17 canceled
1 09/01/17 delivery documents
1 11/01/17 complited <-- I want to select these statuses, (Canceled and then somehow resumed)
This is what I started with:
SELECT * FROM DATABASE
WHERE APPLICATIONSTATUSSYMBOL LIKE 'CANCELED%'
AND APPLICATIONDATE BETWEEN '17/01/01' AND '17/07/24';
One method for doing this uses window functions:
select d.*
from (select d.*,
             -- latest cancellation date of each process, repeated on all of its rows
             max(case when status = 'canceled' then applicationdate end) over (partition by id) as canceldate
      from database d
      where applicationdate between date '2017-01-01' and date '2017-07-24'
     ) d
where applicationdate > canceldate;
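The same filter can be sketched in Python to show what the window function does: find the latest cancellation date per id, then keep only the rows dated after it. The rows for id 1 come from the question, with the dates read as dd/mm/yy (an assumption); ids 2 and 3 are hypothetical and were added to show processes that should not be returned.

```python
from datetime import date

rows = [
    (1, date(2017, 1, 1), "started"),
    (1, date(2017, 1, 2), "waiting for signature"),
    (1, date(2017, 1, 4), "canceled"),
    (1, date(2017, 1, 9), "delivery documents"),
    (1, date(2017, 1, 11), "complited"),
    (2, date(2017, 1, 3), "started"),    # hypothetical: never canceled
    (3, date(2017, 1, 5), "started"),    # hypothetical: canceled and never resumed
    (3, date(2017, 1, 6), "canceled"),
]

# Latest cancellation date per id, like MAX(CASE ...) OVER (PARTITION BY id).
cancel_date = {}
for pid, day, status in rows:
    if status == "canceled":
        cancel_date[pid] = max(day, cancel_date.get(pid, day))

# Keep only rows that come after the cancellation of the same process.
resumed = [
    (pid, day, status)
    for pid, day, status in rows
    if pid in cancel_date and day > cancel_date[pid]
]
print(resumed)
```

Like the query, this returns the statuses that follow the cancellation but not the canceled row itself; change `>` to `>=` if that row should be included as well.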

Displaying only the first record for multiple entries in a 5 minute period

I have a query that returns rows in sets; each set has two records, identified by 1 and 2, which are basically the in and out times. Sometimes the user punches multiple times. For example, when punching 'in' he may punch several times just to make sure, although he is supposed to punch only once, and the same happens when punching 'out'. The time is captured on each entry. I need to change the query below so that it returns only the first entry when there are multiple entries. To be more precise, if there are multiple entries in a 5 minute period, only the first record should be displayed and the rest of the entries in that 5 minute period ignored.
SELECT TransactionID, TrDate, Time1, Tr_Serial, Port, UnitNo, UserPIN,Finger,
IP, UnitName, Tr_Description,
CASE WHEN ROW_NUMBER() OVER (PARTITION BY userpin, TrDate
ORDER BY trdate, time1) % 2 = 0
THEN '2'
ELSE '1'
END Tr_Type
FROM (
SELECT row_number() OVER (
ORDER BY datetime) TransactionID, cast([datetime] AS date) TrDate, cast(
[datetime] AS time) Time1, [eventserial] Tr_Serial, '1' Port, [READERID]
UnitNo, [EVENTID] Tr_Type, [USERID] UserPIN, '1' Finger, 'NA' IP, [
READERNAME] UnitName, [EVENTNAME] Tr_Description
FROM [BBC].[dbo].[BBC_LOG]) A
WHERE Tr_Type = 47 OR Tr_Type = 55
Thanks.
SELECT TransactionID, TrDate, MIN(Time1), Tr_Serial, Port, UnitNo, UserPIN,Finger,
IP, UnitName, Tr_Description,
CASE WHEN Min(ROW_NUMBER()) OVER (PARTITION BY userpin, TrDate
ORDER BY trdate, time1) % 2 = 0
THEN '2'
ELSE '1'
END Tr_Type
FROM (
SELECT row_number() OVER (
ORDER BY datetime) TransactionID, cast([datetime] AS date) TrDate, cast(
[datetime] AS time) Time1, [eventserial] Tr_Serial, '1' Port, [READERID]
UnitNo, [EVENTID] Tr_Type, [USERID] UserPIN, '1' Finger, 'NA' IP, [
READERNAME] UnitName, [EVENTNAME] Tr_Description
FROM [BBC].[dbo].[BBC_LOG]) A
WHERE Tr_Type = 47 OR Tr_Type = 55
Try this.
I'm not sure what your expected output is, but I know you need MIN() on your datetime to get the first date/time, i.e. the first time the user clocked in.
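The 5 minute rule from the question can be stated precisely with a short sketch. Note that this is a different technique from the MIN() suggestion above: it walks each user's punches in time order and keeps a punch only if at least 5 minutes have passed since the last punch that was kept. The punch data and the function name `first_per_window` are hypothetical, and treating a gap of exactly 5 minutes as a new punch is an assumption.

```python
from datetime import datetime, timedelta

def first_per_window(punches, window=timedelta(minutes=5)):
    """Keep only the first punch of each user within every 5 minute period."""
    kept = []
    last_kept = {}  # user -> timestamp of the last punch that was kept
    for user, stamp in sorted(punches):
        anchor = last_kept.get(user)
        if anchor is None or stamp - anchor >= window:
            kept.append((user, stamp))
            last_kept[user] = stamp
    return kept

# Hypothetical punches for one user: three on the way in, two on the way out.
punches = [
    (7, datetime(2024, 1, 1, 8, 0, 0)),
    (7, datetime(2024, 1, 1, 8, 0, 30)),
    (7, datetime(2024, 1, 1, 8, 3, 0)),
    (7, datetime(2024, 1, 1, 17, 1, 0)),
    (7, datetime(2024, 1, 1, 17, 2, 10)),
]
kept = first_per_window(punches)
print(kept)
```

In SQL the same idea usually needs the previous kept row rather than just the previous row, which is why a simple MIN() or LAG() comparison may behave differently when a user punches repeatedly for longer than 5 minutes.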

Combining records with overlapping date ranges in SQL

EDIT: Our current server is SQL Server 2008 R2, so the LAG/LEAD functions will not work.
I'm attempting to take multiple streams of data within a table and combine them into 1 stream of data. Given the 3 streams of data below, I want the end result to be 1 stream that gives preference to the status 'on'. Recursion seems to be the best option, but I've had no luck so far putting together a query that does what I want.
CREATE TABLE #Dates(
id INT IDENTITY,
status VARCHAR(4),
StartDate Datetime,
EndDate Datetime,
booth int)
INSERT #Dates
VALUES
( 'off','2015-01-01 08:00','2015-01-01 08:15',1),
( 'on','2015-01-01 08:15','2015-01-01 09:15',1),
( 'off','2015-01-01 08:50','2015-01-01 09:00',2),
( 'on','2015-01-01 09:00','2015-01-01 09:30',2),
( 'off','2015-01-01 09:30','2015-01-01 09:35',2),
( 'on','2015-01-01 09:35','2015-01-01 10:15',2),
( 'off','2015-01-01 09:30','2015-01-01 10:30',3),
( 'on','2015-01-01 10:30','2015-01-01 11:00',3)
status StartDate EndDate
---------------------------
off 08:00 08:15
on 08:15 09:15
off 08:50 09:00
on 09:00 09:30
off 09:30 09:35
on 09:35 10:15
off 09:30 10:30
on 10:30 11:00
End Result:
status StartDate EndDate
---------------------------
off 8:00 8:15
on 8:15 9:15
on 9:15 9:30
off 9:30 9:35
on 9:35 10:15
off 10:15 10:30
on 10:30 11:00
Essentially, anytime there is a status of 'on' it should override any concurrent 'off' status.
Source:
|----off----||---------on---------|
|---off--||------on----||---off---||--------on------|
|--------------off------------------||------on------|
Result (Either result would work):
|----off----||----------on--------||---on---||---off---||--------on------||-off--||------on------|
|----off----||----------on------------------||---off---||--------on------||-off--||------on------|
Here's the simplest version for 2008 that I was able to figure out:
; with Data (Date) as (
select StartDate from Dates
union
select EndDate from Dates),
Ranges (StartDate, Status) as (
select D.Date, D2.Status
from Data D
outer apply (
select top 1 D2.Status
from Dates D2
where D2.StartDate <= D.Date and D2.EndDate > D.Date
order by case when Status = 'on' then 1 else 2 end
) D2)
select R.StartDate,
(select min(D.Date) from Data D where D.Date > R.StartDate) as EndDate,
Status
from Ranges R
order by R.StartDate
It returns a new row starting from each start/end point, even if the status is the same as in the previous row. I didn't find any simple way to combine them.
Edit: Changing the first CTE to this will combine the rows:
; with Data (Date) as (
select distinct StartDate from Dates D1
where not exists (Select 1 from Dates D2
where D2.StartDate < D1.StartDate and D2.EndDate > D1.StartDate and
Status = 'on')
union
select distinct EndDate from Dates D1
where not exists (Select 1 from Dates D2
where D2.StartDate < D1.EndDate and D2.EndDate > D1.EndDate and
Status = 'on')
),
So basically every time there's even one "on" record, it is on, otherwise off?
Here's a slightly different approach to the issue: add +1 every time an "on" cycle starts and -1 when it ends. A running total then gives the status: when the total is 0 it is off, otherwise it is on. Note that a running total with SUM() OVER (ORDER BY ...) requires SQL Server 2012 or later:
select Date,
sum(oncounter) over (order by Date) as onstat,
sum(offcounter) over (order by Date) as offstat
from (
select StartDate as Date,
case when status = 'on' then 1 else 0 end oncounter,
case when status = 'off' then 1 else 0 end offcounter
from Dates
union all
select EndDate as Date,
case when status = 'on' then -1 else 0 end oncounter,
case when status = 'off' then -1 else 0 end offcounter
from Dates
) TMP
Edit: Also added a counter for the off states. It works the same way as the "on" counter, and when both are 0 the status is neither on nor off.
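The counter idea can be tried out on the sample data from the question with a short Python sketch. This is an illustration of the technique, not a translation of the SQL; the names `delta`, `active` and `merged` are made up here, and adjacent rows with the same status are merged while sweeping.

```python
from collections import defaultdict

# Sample intervals from the question: (status, start, end)
intervals = [
    ("off", "08:00", "08:15"), ("on", "08:15", "09:15"),
    ("off", "08:50", "09:00"), ("on", "09:00", "09:30"),
    ("off", "09:30", "09:35"), ("on", "09:35", "10:15"),
    ("off", "09:30", "10:30"), ("on", "10:30", "11:00"),
]

# +1 when an interval opens, -1 when it closes, tracked per status.
delta = defaultdict(lambda: {"on": 0, "off": 0})
for status, start, end in intervals:
    delta[start][status] += 1
    delta[end][status] -= 1

merged = []                       # rows of [status, start, end]
active = {"on": 0, "off": 0}      # running totals, like SUM(...) OVER (ORDER BY Date)
points = sorted(delta)            # zero-padded HH:MM strings sort chronologically
for point, nxt in zip(points, points[1:]):
    for status, change in delta[point].items():
        active[status] += change
    # "on" wins whenever at least one "on" interval is active.
    state = "on" if active["on"] else "off" if active["off"] else None
    if state is None:
        continue                  # gap: neither on nor off
    if merged and merged[-1][0] == state and merged[-1][2] == point:
        merged[-1][2] = nxt       # same status continues: extend the previous row
    else:
        merged.append([state, point, nxt])

for row in merged:
    print(*row)
```

On the sample data this produces the second of the two acceptable results from the question, with the two consecutive "on" rows combined into 08:15 to 09:30.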
Final result: it seems it can be done. It doesn't look that nice anymore, but at least it's not recursive :)
select
Date as StartDate,
lead(Date, 1, '21000101') over (order by Date) as EndDate,
case onstat
when 0 then
case when offstat > 0 then 'Off' else 'N/A' end
else 'On' end as State
from (
select
Date,
onstat, prevon,
offstat, prevoff
from (
Select
Date,
onstat,
lag(onstat, 1, 0) over (order by Date) as prevon,
offstat,
lag(offstat, 1, 0) over (order by Date) as prevoff
from (
select
Date,
sum(oncounter) over (order by Date) as onstat,
sum(offcounter) over (order by Date) as offstat
from (
select
StartDate as Date,
case when status = 'on' then 1 else 0 end oncounter,
case when status = 'off' then 1 else 0 end offcounter
from
Dates
union all
select
EndDate as Date,
case when status = 'on' then -1 else 0 end oncounter,
case when status = 'off' then -1 else 0 end offcounter
from
Dates
) TMP
) TMP2
) TMP3
where (onstat = 1 and prevon = 0)
or (onstat = 0 and prevon = 1)
or (onstat = 0 and offstat = 1 and prevoff = 0)
or (onstat = 0 and offstat = 0 and prevoff = 1)
) TMP4
It uses quite a few derived tables for the window functions and for keeping only the status changes in the result set, so that LEAD can pick up the correct dates. It might be possible to get rid of some of them.
SQL Fiddle: http://sqlfiddle.com/#!6/b5cfa/7