Select overlapping datetime events with SQL

I have a SQL table Events (ID int, Event int, StartTime datetime, Duration int).
Event is event code (1=system running, 2=break)
Duration is the amount of seconds that the event was active.
I'd like to get the amount of seconds that event 1 was active, but subtract the duration of event 2.
E.g. event 1 was from 1:00 to 6:00, event 2 from 0:00 to 2:00 and event 2 from 5:00 to 6:00. The total time should be from 2:00 to 5:00 -> 3 hours.
There is one way I can think of: for each event 1, find all events 2 that can intersect with it, and for each event 2 in that set, trim its duration to keep only the part that was active during that event 1.
E.g. for my event 1 (1:00 - 6:00) I'll find event 2 (0:00 - 2:00) and keep only the part that interests me (1:00 - 2:00); then find another event 2 (5:00 - 6:00) and keep the part that interests me (the whole event, 5:00 - 6:00). Summed up, that is two hours. The total time of event 1 was 5 hours; 5 hrs - 2 hrs (event 2) is 3 hours.
But this won't work if there are thousands of events in the specified time frame, so I'd prefer a hint of solution without loops (cursors).

;WITH CTE AS (
SELECT
evnt2.id as ID,
sum(evnt1.duration) as Duration
from
#events evnt1
INNER JOIN #events evnt2
ON evnt1.id <> evnt2.id
WHERE
DATEADD(second, evnt1.duration, evnt1.starttime)
BETWEEN
evnt2.starttime AND DATEADD(second, evnt2.duration, evnt2.starttime)
GROUP BY evnt2.id
)
SELECT
#events.duration - CTE.duration,
*
FROM
#events
INNER JOIN CTE
ON #events.id = CTE.id

The simplest way I can think of to do this is with multiple self-joins. I say multiple because the Event 2 can start before or during Event 1.
Here's a bit of code that will answer your question if Event 2 always starts before Event 1.
select DateDiff(s,e1.StartTime, DateAdd(s,e2Before.Duration,e2Before.StartTime))
from events e1
join events e2Before
on (e1.StartTime between e2Before.StartTime and DateAdd(s,e2Before.duration,e2Before.StartTime))
and e1.event = 1
and e2Before.event = 2
To answer the question fully, you'll need to add another join with some of the DateAdd parameters swapped around a bit to cater for situations where Event 2 starts after Event 1 starts.
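To make the clipping arithmetic concrete, here is a minimal Python sketch (not SQL, and not the code from either answer). The helper names `clip` and `net_seconds` are made up for illustration, and merging breaks that overlap each other is an assumption the question does not spell out.

```python
from datetime import datetime, timedelta

def clip(start, end, lo, hi):
    """Return the part of [start, end) that lies inside [lo, hi), or None."""
    s, e = max(start, lo), min(end, hi)
    return (s, e) if s < e else None

def net_seconds(run, breaks):
    """Seconds of `run` (event 1) not covered by any break (event 2)."""
    lo, hi = run
    # Clip every break to the run, then sort so overlapping breaks can be merged.
    clipped = sorted(c for c in (clip(s, e, lo, hi) for s, e in breaks) if c)
    covered = timedelta(0)
    cur_s = cur_e = None
    for s, e in clipped:
        if cur_e is None or s > cur_e:
            # A gap before this break: close the previous merged block.
            if cur_e is not None:
                covered += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            # Overlaps the current block: extend it (assumption: count once).
            cur_e = max(cur_e, e)
    if cur_e is not None:
        covered += cur_e - cur_s
    return int(((hi - lo) - covered).total_seconds())

# The example from the question, on an arbitrary (hypothetical) date.
day = datetime(2024, 1, 1)
def at(hour):
    return day + timedelta(hours=hour)

run = (at(1), at(6))
breaks = [(at(0), at(2)), (at(5), at(6))]
print(net_seconds(run, breaks))  # 10800 seconds, i.e. 3 hours
```

In SQL the same clipping is the later of the two start times and the earlier of the two end times, which is what the extra join with swapped DateAdd parameters would have to express.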

Related

Count the time difference between specific events in SQL Server 2019 or pandas

I am working on an infusion dataset in which I need to find the time duration between the infusion stop and other infusion events.
This is a screenshot of the dataset:
In the screenshot, the first event status is STOPPED at 06:28:31 and the infusion started running again at 09:10:54. The time from stop to run is therefore 9743 seconds, which should be populated for row 1 in a new column. Likewise, the pump stopped at 16:50:38 and there was an alarm at 06:04:07, so the difference is approximately 13 hours, and row 5 should hold that value. I need this difference calculated across the entire data set wherever a STOPPED status is followed by a RUNNING or STOPPED_ALARM status.
I was able to find the difference between each STOPPED event and the following RUNNING or STOPPED_ALARM status. However, it is being populated on every STOPPED row.
This is the SQL code I use:
SELECT
InfusionStatus, InfusionID, EventDescription,
Time AS event_time,
IIF ((InfusionStatus = 'STOPPED'),
DATEDIFF(SECOND, A1.Time, ISNULL((SELECT TOP 1 Time
FROM table1
WHERE InfusionID = A1.InfusionID
AND SiteNumber = A1.SiteNumber
AND SerialNumber = A1.SerialNumber
AND Time >= A1.Time
AND (InfusionStatus = 'RUNNING' OR
InfusionStatus = 'STOPPED_ALARM')
ORDER BY Time ASC), A1.time)), 0) AS stop_run_event_duration_secs
FROM
dbo.table1 A1
The output that I am getting is like this:
Basically, I don't want the difference to be populated in the areas I have marked as "X". The difference should be populated only for the first STOPPED event.
I can also go with Python code for determining this.
Any help would be greatly appreciated.
Using Common Table Expression (CTE):
;WITH cte AS
(
SELECT
*
-- Every time the InfusionStatus changes into STOPPED, RUNNING or
-- STOPPED_ALARM for the first time, set IsFirstStatusChange = 1
, CASE
WHEN InfusionStatus IN ('STOPPED', 'RUNNING', 'STOPPED_ALARM')
AND InfusionStatus != LAG(InfusionStatus, 1, '') OVER (ORDER BY Time)
THEN 1
ELSE 0
END AS IsFirstStatusChange
FROM Infusion_data
)
-- For each status change, calculate the duration from the previous change
-- I don't know your requirements around handling STOPPED so I set Duration to
-- NULL for these rows
SELECT *
, IIF(InfusionStatus = 'STOPPED', NULL, DATEDIFF(SECOND, LAG(Time) OVER (ORDER BY Time), Time)) AS Duration
FROM cte
WHERE IsFirstStatusChange = 1
ORDER BY Time
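Since the question also accepts Python, the "first STOPPED only" rule can be sketched as follows. This is a rough illustration, not a translation of the CTE above; the function name `stop_durations` and the second STOPPED row in the sample are hypothetical, and the rows are assumed to be sorted by time and to belong to a single infusion.

```python
from datetime import datetime

RESUME = {"RUNNING", "STOPPED_ALARM"}

def stop_durations(events):
    """events: list of (time, status), sorted by time, for one infusion.

    Returns a list parallel to `events`. The first STOPPED row of each run
    of stops gets the seconds until the next RUNNING/STOPPED_ALARM row;
    every other row gets None.
    """
    out = [None] * len(events)
    pending = None  # index of the first STOPPED row still waiting for a resume
    for i, (t, status) in enumerate(events):
        if status == "STOPPED" and pending is None:
            pending = i
        elif status in RESUME and pending is not None:
            out[pending] = int((t - events[pending][0]).total_seconds())
            pending = None
    return out

events = [
    (datetime(2020, 1, 1, 6, 28, 31), "STOPPED"),
    (datetime(2020, 1, 1, 7, 0, 0), "STOPPED"),   # hypothetical repeat stop
    (datetime(2020, 1, 1, 9, 10, 54), "RUNNING"),
]
print(stop_durations(events))  # [9743, None, None]
```

The question's query fills the other rows with 0 rather than None; swapping the initial value is enough if that is preferred.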

Calculating working time with overlapping events (SQL)

I have found similar queries on StackOverflow (e.g. Finding simultaneous events in a database between times) but nothing that matches exactly what I am after as far as I can tell so thought it OK to add as a new question.
I have a table that logs jobs (or "Activities"), with a start/end time for the job. I need to calculate working time (you can disregard non-working days, break times etc. as I have that covered). The complication is an individual can work on simultaneous jobs, overlapping at different points (the assumption is equal effort on simultaneous jobs), and the working time needs to reflect that. Minute accuracy is all that is required, not to the second.
Based on other suggestions I have this query, implemented as a table-valued function. It will look at each minute that activity is running, and if any other activities are running in the same period for the same person, and make calculations based on that. It works, but is very inefficient - taking over a minute to execute. Any ideas how I can do this more efficiently?
Running SQL 2005. I have done the obvious such as to add indexes on foreign keys by the way.
CREATE FUNCTION [dbo].[WorkActivity_WorkTimeCalculations] (@StartDate smalldatetime, @EndDate smalldatetime)
RETURNS @retActivity TABLE
(
    ActivityID bigint PRIMARY KEY NOT NULL,
    WorkMins decimal NOT NULL
)
/********************************************************************
Summary: Calculates the WORKING time on each activity running in a given date/time range
Remarks: Takes into account staff working simultaneously on jobs
         (evenly distributes working time across simultaneous jobs)
Input Params: @StartDate - the start of the period to calculate
              @EndDate - the end of the period to calculate
Output Params:
Returns: Recordset of activities and associated working time (minutes)
********************************************************************/
AS
BEGIN
    -- any work activities still running use the overall end date as the activity's end date
    -- for the purpose of calculating simultaneous jobs running
    -- POPULATE A TABLE VARIABLE WITH EVERY MINUTE IN THE DATE RANGE
    DECLARE @Minutes TABLE (MinuteDateTime smalldatetime NOT NULL)
    ;WITH cte AS (
        SELECT @StartDate AS myDate
        UNION ALL
        SELECT DATEADD(minute,1,myDate)
        FROM cte
        WHERE DATEADD(minute,1,myDate) <= @EndDate
    )
    INSERT INTO @Minutes (MinuteDateTime)
    SELECT myDate FROM cte
    OPTION (MAXRECURSION 0)
    -- POPULATE A TABLE VARIABLE WITH WORKLOAD PER EMPLOYEE PER MINUTE
    DECLARE @JobsRunningByStaff TABLE (StaffID smallint NOT NULL, MinuteDateTime smalldatetime NOT NULL, JobsRunning decimal NOT NULL)
    INSERT INTO @JobsRunningByStaff (StaffID, MinuteDateTime, JobsRunning)
    SELECT wka_StaffID, MinuteDateTime, COUNT(DISTINCT wka_ItemID) JobsRunning
    FROM dbo.WorkActivities
    INNER JOIN @Minutes ON (MinuteDateTime BETWEEN wka_StartTime AND DATEADD(minute,-1,ISNULL(wka_EndTime,@EndDate)))
    GROUP BY wka_StaffID, MinuteDateTime
    -- FINALLY MAKE THE CALCULATIONS FOR EACH ACTIVITY
    INSERT INTO @retActivity
    SELECT wka_ActivityID, SUM(1/JobsRunning) WorkMins
    FROM dbo.WorkActivities
    INNER JOIN @JobsRunningByStaff ON (wka_StaffID = StaffID AND MinuteDateTime BETWEEN wka_StartTime AND DATEADD(minute,-1,ISNULL(wka_EndTime,@EndDate)))
    GROUP BY wka_ActivityID
    RETURN
END
Some example data (sorry for the poor formatting!)...
Source Data from WorkActivities table:
ACTIVITY ID | START TIME | END TIME | STAFF ID
1 | 03/03/2016 10:30 | 03/03/2016 10:50 | 1
2 | 03/03/2016 10:40 | 03/03/2016 11:00 | 1
And the desired results for a function call of SELECT * FROM dbo.WorkActivity_WorkTimeCalculations ('03-Mar-2016 10:30','03-Mar-2016 11:30'):
ACTIVITY ID | WORKMINS
1 | 25
2 | 15
So, the results take into account between 10:40 and 10:50 there are two jobs happening simultaneously, so calculates 5 mins working time on each over that period.
As suggested by posters, indexing made a significant difference - creating an index with wka_StartTime and wka_EndTime sorted it.
(sorry, couldn't see how to mark the comments made by others as an answer!)
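For comparison, the equal-split rule can also be computed without expanding every minute, by walking only the points where a job starts or ends. The Python sketch below is one possible approach under stated assumptions, not the poster's function: `work_minutes` is a made-up name, jobs are counted per activity rather than per distinct item, nothing is clamped to a reporting range, and the sample intervals are illustrative.

```python
from collections import defaultdict
from datetime import datetime

def work_minutes(activities):
    """activities: list of (activity_id, staff_id, start, end).

    Returns {activity_id: minutes}, splitting any shared time equally
    between the jobs a staff member has running at once.
    """
    by_staff = defaultdict(list)
    for act in activities:
        by_staff[act[1]].append(act)

    totals = defaultdict(float)
    for acts in by_staff.values():
        # The set of running jobs can only change at a start or end time.
        points = sorted({t for _, _, s, e in acts for t in (s, e)})
        for lo, hi in zip(points, points[1:]):
            running = [a for a, _, s, e in acts if s <= lo and hi <= e]
            if running:
                share = (hi - lo).total_seconds() / 60 / len(running)
                for a in running:
                    totals[a] += share
    return dict(totals)

# Illustrative intervals for one staff member (hypothetical data).
d = datetime(2016, 3, 3)
sample = [
    (1, 1, d.replace(hour=10, minute=20), d.replace(hour=10, minute=50)),
    (2, 1, d.replace(hour=10, minute=40), d.replace(hour=11, minute=0)),
]
print(work_minutes(sample))  # {1: 25.0, 2: 15.0}
```

The work is proportional to the number of start/end points per person rather than to the number of minutes in the range, which is the main reason to consider it.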

WHERE [Column] Has Not Changed in N Time

So, I've got a view that is admittedly not well-indexed and there's not much I can do about it.
The view has data that looks a bit like the one in this question, but my problem is essentially the opposite of theirs and I'm not sure their solution will work here, though a similar TVF or CTE is probably in the forecast.
My data looks like this at the moment:
CustomPollerAssignmentId DateTime Status
[Some Id B] 2013-11-18 08:54:00 IDLE
[Some Id A] 2013-11-18 08:54:00 DORMANT
[Some Id B] 2013-11-18 08:53:00 IDLE
[Some Id A] 2013-11-18 08:53:00 NOMINAL
Unlike the other question, I need to see that the status hasn't changed. The view comprises three separate tables. One with minute statistics, one with fifteen minute statistics (for between three and six months ago), and one for hourly statistics (for up to a year ago).
The goal here is to check which modems have been idle for at least the last 10 minutes. We've got about 1200 active modems, so this could be up to 12000 rows, which is why I'd prefer not to do it with C#, but I'm still kind of new to SQL and set-based thinking. I'm currently working with an instance of SQL Server 2012, but it's very new here and I'm not really experienced with the newer windowed functions since we were on 2008R2 until about a month ago.
To be honest, I'm not even sure where to begin here because my OOP background wants me to just grab the TOP 10 statuses for each and loop through. If all 10 == idle || dormant, add to the result set, but I know there's got to be a better way to do it in SQL. Can someone point me in the right direction?
EDIT
To try to clarify a bit:
I'm using T-SQL.
This isn't as simple as a WHERE NOT EXISTS clause.
Regardless of whether or not the status has changed, there should be an entry for the remote's status unless it has been deactivated. This means that it could have (idle, idle, idle, idle, idle, nominal, idle, idle, idle, idle) statuses for the last 10 minutes and that example is a case I would not want to include. The result set should include ONLY those remotes which have had statuses which are only idle or dormant for the last 10 minutes. If the last status is more than three months ago, it will only have one status for a fifteen minute interval.
If I understand correctly what you're asking:
SELECT [theView].CustomPollerAssignmentId
FROM
    [theView]
LEFT JOIN
(
    SELECT CustomPollerAssignmentId, MAX([DateTime]) AS LastTime
    FROM [theView]
    WHERE [Status] <> 'IDLE' AND [DateTime] <= @Now
    GROUP BY CustomPollerAssignmentId
) AS NotIdleStatus ON
    [theView].CustomPollerAssignmentId = NotIdleStatus.CustomPollerAssignmentId
WHERE
    [theView].[DateTime] <= @Now AND
    [theView].[Status] = 'IDLE' AND
    (
        [theView].[DateTime] > NotIdleStatus.LastTime OR
        NotIdleStatus.LastTime IS NULL
    )
GROUP BY [theView].CustomPollerAssignmentId
HAVING MIN([theView].[DateTime]) <= DATEADD(MINUTE, -10, @Now)
The concept here is to:
1. For each ID, select the latest time of a non-idle status up to the current time.
2. For each ID, join the idle statuses against that latest non-idle time.
3. Group the statuses by ID.
4. Keep the IDs that were already idle 10 minutes ago.
5. Keep only idle rows that are later than the ID's latest non-idle status, or whose ID has no non-idle status at all.
The following code selects the last non-idle time for each ID:
SELECT CustomPollerAssignmentId, MAX([DateTime]) AS LastTime
FROM [theView]
WHERE [Status] <> 'IDLE' AND [DateTime] <= @Now
GROUP BY CustomPollerAssignmentId
We use a LEFT JOIN so that any ID that is IDLE but has never been in any other status is still captured. The ON clause joins the two sets by ID.
The following code selects the rows that are idle and not later than the current time:
WHERE
    [theView].[DateTime] <= @Now AND
    [theView].[Status] = 'IDLE' AND ...
The following code groups the records by ID and selects the IDs whose earliest remaining idle time is at least 10 minutes before the current time:
GROUP BY [theView].CustomPollerAssignmentId
HAVING MIN([theView].[DateTime]) <= DATEADD(MINUTE, -10, @Now)
Also, you will need to pass in the @Now value, which is the current time that you want to check against.
SELECT v1.*
FROM (
SELECT *
FROM vdata
WHERE OnlineStatus IN ('IDLE', 'DORMANT')
) v1
WHERE NOT EXISTS (
SELECT 1
FROM vdata v2
WHERE v2.ModemId = v1.ModemId
AND v2.MinuteMarker > v1.MinuteMarker
AND v2.MinuteMarker <= DATEADD(MINUTE, 10, v1.MinuteMarker )
AND v2.OnlineStatus NOT IN ('IDLE', 'DORMANT')
)
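The rule from the question's edit ("only idle or dormant for the last 10 minutes") can be stated compactly outside SQL as well. The Python sketch below only illustrates that rule; `quiet_modems` is a hypothetical name, and it assumes a modem qualifies when it has at least one reading in the window and none of them is something other than IDLE or DORMANT.

```python
from datetime import datetime, timedelta

QUIET = {"IDLE", "DORMANT"}

def quiet_modems(rows, now, minutes=10):
    """rows: iterable of (modem_id, timestamp, status).

    Returns the ids whose every status in the last `minutes` minutes
    is IDLE or DORMANT.
    """
    cutoff = now - timedelta(minutes=minutes)
    seen, noisy = set(), set()
    for modem_id, ts, status in rows:
        if cutoff < ts <= now:
            seen.add(modem_id)
            if status not in QUIET:
                noisy.add(modem_id)
    # One disqualifying reading inside the window excludes the modem.
    return seen - noisy

# The four sample rows from the question.
rows = [
    ("B", datetime(2013, 11, 18, 8, 54), "IDLE"),
    ("A", datetime(2013, 11, 18, 8, 54), "DORMANT"),
    ("B", datetime(2013, 11, 18, 8, 53), "IDLE"),
    ("A", datetime(2013, 11, 18, 8, 53), "NOMINAL"),
]
print(quiet_modems(rows, now=datetime(2013, 11, 18, 8, 54)))  # {'B'}
```

The set difference at the end plays the same role as NOT EXISTS in the query above.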

SQL Query data issues

I have the following data:
ID Date interval interval_date tot_activity non-activity
22190 2011-09-27 00:00:00 1000 2011-09-27 10:00:00.000 265 15
I have another table with this data:
Date ID Start END sched_non_activity non_activity
10/3/2011 12:00:00 AM HBLV-22267 10/3/2011 2:02:00 PM 10/3/2011 2:11:00 PM 540
Now, in the second table's non_activity field, I would like this to be the value from the first table. However, I need to capture the tot_activity - non_activity where the intervals(in 15 min increments) from the first table, fall in the same time frame as the start and end of the second table.
I have the following so far:
SELECT 1.ID, 1.Date, 1.interval, 1.interval_date, 1.tot_activity, 1.non_activity,
1.tot_activity - 1.non_activity AS non_activity
FROM table1 AS 1 INNER JOIN
LIST AS L ON 1.ID = L.ID INNER JOIN
table2 AS 2 ON 1.Date = 2.Date AND L.ID = Right(2.ID,5)
Where 1.interval_date >= 2.Start AND 1.interval_date < 2.End
ORDER BY 1.ID, 1.interval_date
With this, I can already see I will be unable to capture if a start from table 2 is at 15:50, which means that I need to capture interval 15:45.
Is there any way of doing this through queries, or should I be using variables and doing the check per interval? Any help at all would be greatly appreciated.
I think you are asking too much from a query here.
What I would do is treat the two tables as lists ordered by timestamp and solve the problem programmatically (i.e. not with a single query).
For example, create a function that traverses the first table in 15-minute increments and finds the best match in the second table (I am guessing this is what you are trying to do). Implement your function to return the same result set as your query above, or store it in a temporary table, and then select from the result set. T-SQL is your friend :)
I'm having a tough time understanding your issue, but you might have better luck with the DATEDIFF function:
DATEDIFF(SECOND, 1.interval_date, 2.Start) >= 0 AND DATEDIFF(SECOND, 1.interval_date, 2.End) <= 0
I apologize if I'm not catching your drift. If I'm missing something, could you try to clarify a little bit?
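The 15:50 case from the question comes down to rounding the start time down to its 15-minute boundary before comparing. Here is a small Python sketch of that idea; the function names are hypothetical, and it assumes intervals are identified by their start time and last exactly 15 minutes.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=15)

def floor_to_interval(ts):
    """Round a timestamp down to the start of its 15-minute interval."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)

def intervals_touching(start, end):
    """Start times of every 15-minute interval that overlaps [start, end)."""
    out = []
    cur = floor_to_interval(start)
    while cur < end:
        out.append(cur)
        cur += STEP
    return out

# A start at 15:50 falls inside the interval that begins at 15:45.
print(floor_to_interval(datetime(2011, 10, 3, 15, 50)))
# The 2:02 PM to 2:11 PM row from the second table touches only 14:00.
print(intervals_touching(datetime(2011, 10, 3, 14, 2), datetime(2011, 10, 3, 14, 11)))
```

Expressed as a join condition, the same test is "interval start is before End, and interval start plus 15 minutes is after Start".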

How do I analyse time periods between records in SQL data without cursors?

The root problem: I have an application which has been running for several months now. Users have been reporting that it's been slowing down over time (so in May it was quicker than it is now). I need to get some evidence to support or refute this claim. I'm not interested in precise numbers (so I don't need to know that a login took 10 seconds), I'm interested in trends - that something which used to take x seconds now takes of the order of y seconds.
The data I have is an audit table which stores a single row each time the user carries out any activity - it includes a primary key, the user id, a date time stamp and an activity code:
create table AuditData (
AuditRecordID int identity(1,1) not null,
DateTimeStamp datetime not null,
DateOnly datetime null,
UserID nvarchar(10) not null,
ActivityCode int not null)
(Notes: DateOnly (datetime) is the DateTimeStamp with the time stripped off to make grouping by day easier. It's effectively duplicate data to make querying faster.
Also, for the sake of ease, you can assume that the ID is assigned in date-time order, that is, 1 will always be before 2, which will always be before 3. If this isn't true I can make it so.)
ActivityCode is an integer identifying the activity which took place, for instance 1 might be user logged in, 2 might be user data returned, 3 might be search results returned and so on.
Sample data for those who like that sort of thing...:
1, 01/01/2009 12:39, 01/01/2009, P123, 1
2, 01/01/2009 12:40, 01/01/2009, P123, 2
3, 01/01/2009 12:47, 01/01/2009, P123, 3
4, 01/01/2009 13:01, 01/01/2009, P123, 3
User data is returned (Activity Code 2) immediately after login (Activity Code 1), so this can be used as a rough benchmark of how long the login takes (as I said, I'm interested in trends, so as long as I'm measuring the same thing for May as for July it doesn't matter much if this isn't the whole login process; it takes in enough of it to give a rough idea).
(Note: User data can also be returned under other circumstances, so it's not a one-to-one mapping.)
So what I'm looking to do is select the average time between login (say ActivityCode 1) and the first instance after that, for that user on that day, of user data being returned (say ActivityCode 2).
I can do this by going through the table with a cursor, getting each login instance and then for that doing a select to say get the minimum user data return following it for that user on that day but that's obviously not optimal and is slow as hell.
My question is (finally): is there a "proper" SQL way of doing this using self joins or similar, without using cursors or some similar procedural approach? I can create views and whatever to my heart's content; it doesn't have to be a single select.
I can hack something together but I'd like to make the analysis I'm doing a standard product function so would like it to be right.
SELECT TheDay, AVG(TimeTaken) AvgTimeTaken
FROM (
SELECT
CONVERT(DATE, logins.DateTimeStamp) TheDay
, DATEDIFF(SS, logins.DateTimeStamp,
(SELECT TOP 1 DateTimeStamp
FROM AuditData userinfo
WHERE UserID=logins.UserID
and userinfo.ActivityCode=2
and userinfo.DateTimeStamp > logins.DateTimeStamp )
)TimeTaken
FROM AuditData logins
WHERE
logins.ActivityCode = 1
) LogInTimes
GROUP BY TheDay
This might be dead slow in real world though.
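As a cross-check of what the query is meant to compute, the pairing rule from the question (each login matched with the first user-data event after it, for the same user on the same day) can be sketched in Python. This only illustrates the rule, with a hypothetical function name, and assumes the rows arrive sorted by timestamp.

```python
from collections import defaultdict
from datetime import datetime

def avg_login_seconds(rows):
    """rows: (user_id, timestamp, activity_code), sorted by timestamp.

    Returns {date: average seconds from a login (code 1) to the next
    user-data event (code 2) for the same user on the same day}.
    """
    pending = {}                 # user_id -> time of a login not yet matched
    samples = defaultdict(list)  # date -> list of elapsed seconds
    for user, ts, code in rows:
        if code == 1:
            pending[user] = ts
        elif code == 2 and user in pending:
            login = pending.pop(user)
            if login.date() == ts.date():
                samples[ts.date()].append((ts - login).total_seconds())
    return {day: sum(v) / len(v) for day, v in samples.items()}

# The sample rows from the question.
rows = [
    ("P123", datetime(2009, 1, 1, 12, 39), 1),
    ("P123", datetime(2009, 1, 1, 12, 40), 2),
    ("P123", datetime(2009, 1, 1, 12, 47), 3),
    ("P123", datetime(2009, 1, 1, 13, 1), 3),
]
print(avg_login_seconds(rows))  # {datetime.date(2009, 1, 1): 60.0}
```

Because each login is popped once it is matched, a later code 2 row for the same user is ignored until the next login, which covers the "not a one-to-one mapping" note.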
In Oracle this would be a cinch, because of analytic functions. In this case, LAG() makes it easy to find the matching pairs of activity codes 1 and 2 and also to calculate the trend. As you can see, things got worse on 2nd JAN and improved quite a bit on the 3rd (I'm working in seconds rather than minutes).
select DateOnly
     , elapsed_time
     , elapsed_time - lag (elapsed_time) over (order by DateOnly) as trend
from
(
  select DateOnly
       , avg(databack_time - prior_login_time) as elapsed_time
  from
  ( select DateOnly
         , databack_time
         , ActivityCode
         , lag(login_time) over (order by DateOnly, UserID, AuditRecordID, ActivityCode) as prior_login_time
    from
    (
      select a1.AuditRecordID
           , a1.DateOnly
           , a1.UserID
           , a1.ActivityCode
           , to_number(to_char(a1.DateTimeStamp, 'SSSSS')) as login_time
           , 0 as databack_time
      from AuditData a1
      where a1.ActivityCode = 1
      union all
      select a2.AuditRecordID
           , a2.DateOnly
           , a2.UserID
           , a2.ActivityCode
           , 0 as login_time
           , to_number(to_char(a2.DateTimeStamp, 'SSSSS')) as databack_time
      from AuditData a2
      where a2.ActivityCode = 2
    )
  )
  where ActivityCode = 2
  group by DateOnly
)
/
DATEONLY  ELAPSED_TIME      TREND
--------- ------------ ----------
01-JAN-09          120
02-JAN-09          600        480
03-JAN-09          150       -450
Like I said in my comment I guess you're working in MSSQL. I don't know whether that product has any equivalent of LAG().
If the assumptions are that:
Users will perform various tasks in no mandated order, and
That the difference between any two activities reflects the time it takes for the first of those two activities to execute,
Then why not create a table with two timestamps, the first column containing the activity start time, the second column containing the next activity start time. Thus the difference between these two will always be total time of the first activity. So for the logout activity, you would just have NULL for the second column.
So it would be kind of weird and interesting, for each activity (other than logging in and logging out), the time stamp would be recorded in two different rows--once for the last activity (as the time "completed") and again in a new row (as time started). You would end up with a jacob's ladder of sorts, but finding the data you are after would be much more simple.
In fact, to get really wacky, you could have each row have the time that the user started activity A and the activity code, and the time started activity B and the time stamp (which, as mentioned above, gets put down again for the following row). This way each row will tell you the exact difference in time for any two activities.
Otherwise, you're stuck with a query that says something like
SELECT TIME_IN_SEC(row2-timestamp) - TIME_IN_SEC(row1-timestamp)
which would be pretty slow, as you have already suggested. By swallowing the redundancy, you end up just querying the difference between the two columns. You probably would have less need of knowing the user info as well, since you'd know that any row shows both activity codes, thus you can just query the average for all users on any given day and compare it to the next day (unless you are trying to find out which users are having the problem as well).
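To illustrate the two-timestamp layout described above, here is a rough Python sketch that derives such rows from the existing audit data: each row carries its own start time plus the start time of the same user's next activity. The function name and tuple layout are assumptions made for the example.

```python
from datetime import datetime

def with_next_start(rows):
    """rows: (user_id, timestamp, activity_code), sorted by timestamp.

    Returns (user_id, activity_code, start, next_start) tuples, where
    next_start is None for each user's last recorded activity.
    """
    last = {}  # user_id -> index in `out` of that user's most recent row
    out = []
    for user, ts, code in rows:
        if user in last:
            # Fill in the "second timestamp" of the user's previous row.
            u, c, s, _ = out[last[user]]
            out[last[user]] = (u, c, s, ts)
        last[user] = len(out)
        out.append((user, code, ts, None))
    return out

rows = [
    ("P123", datetime(2009, 1, 1, 12, 39), 1),
    ("P123", datetime(2009, 1, 1, 12, 40), 2),
    ("P123", datetime(2009, 1, 1, 12, 47), 3),
]
for row in with_next_start(rows):
    print(row)
```

With both timestamps on one row, the duration of an activity is a plain subtraction of two columns.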
This is a faster way to do it: it puts the current row's and the previous row's datetime values in the same row, after which you can use DATEDIFF(datepart, startdate, enddate). I use @DammyVariable and DammyField because, as I recall, there is a problem if the first assignment in the UPDATE statement is not of the form @variable = Field.
-- Copy the audit data into a temp table with two extra columns
SELECT *, Cast(NULL AS DateTime) LastRowDateTime, Cast(NULL As INT) DammyField INTO #T FROM AuditData
GO
-- The clustered index is there so the update walks rows in AuditRecordID order
CREATE CLUSTERED INDEX IX_T ON #T (AuditRecordID)
GO
DECLARE @LastRowDateTime DateTime
DECLARE @DammyVariable INT
SET @LastRowDateTime = NULL
SET @DammyVariable = 1
-- Each row receives the previous row's timestamp, then hands its own on
UPDATE #T SET
@DammyVariable = DammyField = @DammyVariable
, LastRowDateTime = @LastRowDateTime
, @LastRowDateTime = DateTimeStamp
option (maxdop 1)