I have a piece of equipment that reports its number of produced pieces at random time intervals. At each record, the internal counter is reset, so if I want to get the total pieces, I need to sum over an interval.
ts pieces
--------------------------------
2013-01-23 11:58 2013
2013-01-23 12:12 3025
2013-01-23 12:12 3025
2013-01-23 12:13 112
2013-01-23 12:17 1122
2013-01-23 12:34 3112
2013-01-23 12:36 3025
Suppose I want to query this data for the pieces produced from 12:00 to 12:30. I cannot simply do the following:
SELECT SUM(pieces)
FROM table
WHERE ts BETWEEN '2013-01-23 12:00' and '2013-01-23 12:30'
With this query, I would have too many pieces at the beginning (of the 3025 pieces reported at 12:12, some were produced in the 2 minutes before 12:00) and too few pieces at the end (pieces produced between 12:17 and 12:30 were only reported at 12:34).
Is there a built-in feature in SQL Server to do such calculations on time-based series, or would it require me to manually interpolate based on dateDiff between the first/last values in the interval and the last/first outside?
You have to do the interpolation manually. I've built some code that does it, based on various other assumptions (i.e. whether ts represents "all items produced up until and including this minute" or "all items produced before this minute"):
declare @t table (ts datetime2 not null,pieces int not null)
insert into @t(ts,pieces) values
('2013-01-23T11:58:00',2013),
('2013-01-23T12:12:00',3025),
--('2013-01-23T12:12:00',3025), --Error, two identical counts at same time?
('2013-01-23T12:13:00',112 ),
('2013-01-23T12:17:00',1122),
('2013-01-23T12:34:00',3112),
('2013-01-23T12:36:00',3025)
declare @Start datetime2
declare @End datetime2
select @Start = '2013-01-23T12:00:00', @End = '2013-01-23T12:30:00'
;With Numbers as (
select distinct number
from master..spt_values
), RowsNumbered as (
select
ROW_NUMBER() OVER (ORDER BY ts) as rn,
*
from @t
), Paired as (
select
rn2.ts as prevTime,
rn1.ts as thisTime,
rn1.pieces as pieces
from
RowsNumbered rn1
inner join
RowsNumbered rn2
on
rn1.rn = rn2.rn + 1
), Minutes as (
select
DATEADD(minute,-number,thisTime) as Time,
(pieces * 1.0) / DATEDIFF(minute,prevTime,thisTime) as Pieces
from
Paired p
inner join
Numbers n
on
number < 60 and --Reasonable?
DATEADD(minute,-number,thisTime) > prevTime and
number >= 0
)
select SUM(pieces)
from Minutes
where Time >= @Start and Time < @End
I'm also treating the start and end times as a semi-open interval with an inclusive start and exclusive end. This is normally the more sensible way to work with continuous data such as datetime.
Hopefully, you can see how I'm building up each CTE to get from the individual timestamps and piece counts to, for each minute, having a concept of how many pieces were produced in that minute. Once we've got to that point, we can just SUM over the minutes we want to include.
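For anyone who wants to check the arithmetic outside the database, here is a minimal Python sketch of the same per-minute interpolation. It assumes production is uniform between two consecutive reports and that each report covers the minutes after the previous timestamp up to and including its own. The name pieces_in_window is made up for illustration, and unlike the SQL above it does not cap gaps at 60 minutes.

```python
from datetime import datetime, timedelta


def pieces_in_window(readings, start, end):
    """Estimate pieces produced in [start, end) by spreading each report
    evenly over the minutes since the previous report (assumption)."""
    readings = sorted(readings)
    total = 0.0
    for (prev_ts, _), (this_ts, pieces) in zip(readings, readings[1:]):
        minutes = int((this_ts - prev_ts).total_seconds() // 60)
        if minutes <= 0:
            continue  # duplicate or out-of-order timestamp: skip it
        per_minute = pieces / minutes
        # Minutes are labelled prev_ts + 1 min ... this_ts, as in the CTE.
        for k in range(minutes):
            minute = this_ts - timedelta(minutes=k)
            if start <= minute < end:
                total += per_minute
    return total


readings = [
    (datetime(2013, 1, 23, 11, 58), 2013),
    (datetime(2013, 1, 23, 12, 12), 3025),
    (datetime(2013, 1, 23, 12, 13), 112),
    (datetime(2013, 1, 23, 12, 17), 1122),
    (datetime(2013, 1, 23, 12, 34), 3112),
    (datetime(2013, 1, 23, 12, 36), 3025),
]
total = pieces_in_window(
    readings,
    datetime(2013, 1, 23, 12, 0),
    datetime(2013, 1, 23, 12, 30),
)
print(round(total, 2))
```

The first report in the list only serves as the left edge of the first interval, since nothing is known about when its pieces were produced.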
Related
I have a SQL database that collects temperature and sensor data from the barn.
The table definition is:
CREATE TABLE [dbo].[DataPoints]
(
[timestamp] [datetime] NOT NULL,
[pointname] [nvarchar](50) NOT NULL,
[pointvalue] [float] NOT NULL
)
The sensors report outside temperature (degrees), inside temperature (degrees), and heating (as on/off).
Sensors create a record when the previous reading has changed, so temperatures are generated every few minutes, one record for heat coming ON, one for heat going OFF, and so on.
I'm interested in how many minutes of heat has been used overnight, so a 24-hour period from 6 AM yesterday to 6 AM today would work fine.
This query:
SELECT *
FROM [home_network].[dbo].[DataPoints]
WHERE (pointname = 'Heaters')
AND (timestamp BETWEEN '2022-12-18 06:00:00' AND '2022-12-19 06:00:00')
ORDER BY timestamp
returns this data:
2022-12-19 02:00:20 | Heaters | 1
2022-12-19 02:22:22 | Heaters | 0
2022-12-19 03:43:28 | Heaters | 1
2022-12-19 04:25:31 | Heaters | 0
The end result should be 22 minutes + 42 minutes = 64 minutes of heat, but I can't see how to get this result from a single query. It also just happens that this result set has two complete heat on/off cycles, but that will not always be the case. So, if the first heat record was = 0, that means that at 6 AM, the heat was already on, but the start time won't show in the query. The same idea applies if the last heat record is =1 at, say 05:15, which means 45 minutes have to be added to the total.
Is it possible to get this minutes-of-heat-time result with a single query? Actually, I don't know the right approach, and it doesn't matter if I have to run several queries. If needed, I could use a small app that reads the raw data, and applies logic outside of SQL to arrive at the total. But I'd prefer to be able to do this within SQL.
This isn't a complete answer, but it should help you get started. From the SQL in the post, I'm assuming you're using SQL Server, and I've formatted the code to match. Replace @input with your query above if you want to test on your own data. (SELECT * FROM [home_network].[dbo]...)
--generate dummy table with sample output from question
declare @input as table(
[timestamp] [datetime] NOT NULL,
[pointname] [nvarchar](50) NOT NULL,
[pointvalue] [float] NOT NULL
)
insert into @input values
('2022-12-19 02:00:20','Heaters',1),
('2022-12-19 02:22:22','Heaters',0),
('2022-12-19 03:43:28','Heaters',1),
('2022-12-19 04:25:31','Heaters',0);
--Append a row number to the result, numbered in time order
WITH A as (
SELECT *,
ROW_NUMBER() OVER(ORDER BY [timestamp]) as row_count
from @input)
--Self join the table using the row number as a guide
SELECT sum(datediff(MINUTE,startTimes.timestamp,endTimes.timestamp))
from A as startTimes
LEFT JOIN A as endTimes on startTimes.row_count=endTimes.row_count-1
--Only count periods of time where the heater is turned on at the start
WHERE startTimes.pointvalue=1
Your problem can be divided into 2 steps:
Filter sensor type and date range, while also getting time span of each record by calculating date difference between timestamp of current record and the next one in chronological order.
Filter records with ON status and summarize the duration
(Optional) convert to HH:MM:SS format to display
Here's my take on the problem, with comments on what I do in each step, all combined into a single query.
-- Step 3: Convert output to HH:MM:SS, this is just for show and can be reduced
SELECT STUFF(CONVERT(VARCHAR(8), DATEADD(SECOND, total_duration, 0), 108),
1, 2, CAST(FLOOR(total_duration / 3600) AS VARCHAR(5)))
FROM (
-- Step 2: select records with status ON (1) and aggregate total duration in seconds
SELECT sum(duration) as total_duration
FROM (
-- Step 1: Use LEAD to get next adjacent timestamp and calculate date difference (time span) between the current record and the next one in time order
SELECT TOP 100 PERCENT
DATEDIFF(SECOND, timestamp, LEAD(timestamp, 1, '2022-12-19 06:00:00') OVER (ORDER BY timestamp)) as duration,
pointvalue
FROM [dbo].[DataPoints]
-- filtered by sensor name and time range
WHERE pointname = 'Heaters'
AND (timestamp BETWEEN '2022-12-18 06:00:00' AND '2022-12-19 06:00:00')
ORDER BY timestamp ASC
) AS tmp
WHERE tmp.pointvalue = 1
) as tmp2
Note: As the last record does not have next adjacent timestamp, it will be filled with the end time of inspection (In this case it's 6AM of the next day).
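The LEAD logic can be sketched in Python to make the edge case easier to follow. This is only an illustration under the same assumptions as the query: events are already filtered to the heater and the time window, and a trailing ON record is closed at the end of the window. The name on_seconds is hypothetical, and the case where the heater was already on at the start of the window is not handled, just as in the query.

```python
from datetime import datetime


def on_seconds(events, window_end):
    """Sum the seconds spent in the ON state (value == 1).

    Each event lasts until the next event; the last one lasts until
    window_end, mirroring LEAD(timestamp, 1, <window end>)."""
    events = sorted(events)
    total = 0
    for i, (ts, value) in enumerate(events):
        next_ts = events[i + 1][0] if i + 1 < len(events) else window_end
        if value == 1:
            total += int((next_ts - ts).total_seconds())
    return total


events = [
    (datetime(2022, 12, 19, 2, 0, 20), 1),
    (datetime(2022, 12, 19, 2, 22, 22), 0),
    (datetime(2022, 12, 19, 3, 43, 28), 1),
    (datetime(2022, 12, 19, 4, 25, 31), 0),
]
window_end = datetime(2022, 12, 19, 6, 0, 0)
total = on_seconds(events, window_end)
print(total, "seconds, about", total // 60, "minutes")
```

Working in seconds and converting at the end avoids the per-interval rounding that DATEDIFF(MINUTE, ...) introduces.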
I do not really think it would be possible to achieve this within a single query.
Option 1:
Implement a stored procedure that contains the logic to calculate these periods.
Option 2:
Add a new column (duration) and, on each insert of a new record, calculate the difference between NOW and the previous timestamp and update the duration of the previous record.
I have a pretty huge table with columns dates, account, amount, etc. eg.
date account amount
4/1/2014 XXXXX1 80
4/1/2014 XXXXX1 20
4/2/2014 XXXXX1 840
4/3/2014 XXXXX1 120
4/1/2014 XXXXX2 130
4/3/2014 XXXXX2 300
...........
(I have 40 months' worth of daily data and multiple accounts.)
The final output I want is the average amount of each account each month. Since there may or may not be a record for any account on a given day, and I have a separate table of holidays from 2011~2014, I am summing up the amount of each account within a month and dividing it by the number of business days in that month. Note that there are very likely to be records on weekends/holidays, so I need to exclude them from the calculation. Also, I want to have a record for each of the dates available in the original table, e.g.
date account amount
4/1/2014 XXXXX1 48 ((80+20+840+120)/22)
4/2/2014 XXXXX1 48
4/3/2014 XXXXX1 48
4/1/2014 XXXXX2 19 ((130+300)/22)
4/3/2014 XXXXX2 19
...........
(Suppose the above is the only data I have for Apr-2014.)
I am able to do this in a hacky and slow way, but as I need to join this process with other subqueries, I really need to optimize this query. My current code looks like:
select
date,
account,
sum(amount/days_mon) over (partition by last_day(date))
from(
select
date,
-- there are more calculation to get the account numbers,
-- so this subquery is necessary
account,
amount,
-- this is a list of month-end dates that the number of
-- business days in that month is 19. similar below.
case when last_day(date) in ('','',...,'') then 19
when last_day(date) in ('','',...,'') then 20
when last_day(date) in ('','',...,'') then 21
when last_day(date) in ('','',...,'') then 22
when last_day(date) in ('','',...,'') then 23
end as days_mon
from mytable tb
inner join lookup_businessday_list busi
on tb.date = busi.date)
So how can I perform the above purpose efficiently? Thank you!
This approach uses sub-query factoring - what other RDBMS flavours call common table expressions. The attraction here is that we can pass the output from one CTE as input to another.
The first CTE generates a list of dates in a given month (you can extend this over any range you like).
The second CTE uses an anti-join on the first to filter out dates which are holidays and also dates which aren't weekdays. Note that the day number varies according to the NLS_TERRITORY setting; in my realm the weekend is days 6 and 7 but SQL Fiddle is American so there it is 1 and 7.
with dates as ( select date '2014-04-01' + ( level - 1) as d
from dual
connect by level <= 30 )
, bdays as ( select d
, count(d) over () tot_d
from dates
left join holidays
on dates.d = holidays.hol_date
where holidays.hol_date is null
and to_number(to_char(dates.d, 'D')) between 2 and 6
)
select yt.account
, yt.txn_date
, sum(yt.amount) over (partition by yt.account, trunc(yt.txn_date,'MM'))
/tot_d as avg_amt
from your_table yt
join bdays
on bdays.d = yt.txn_date
order by yt.account
, yt.txn_date
/
I haven't rounded the average amount.
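To sanity-check the numbers from the question, here is a small Python sketch of the same calculation: count the business days in a month, then divide each account's monthly total by that count. It assumes weekends are Saturday and Sunday and that holidays arrive as a set of dates; the function names are invented for this example.

```python
from calendar import monthrange
from collections import defaultdict
from datetime import date


def business_days(year, month, holidays=frozenset()):
    """Count weekdays in the month that are not in the holiday set."""
    last_day = monthrange(year, month)[1]
    days = (date(year, month, d) for d in range(1, last_day + 1))
    return sum(1 for d in days if d.weekday() < 5 and d not in holidays)


def monthly_averages(rows, holidays=frozenset()):
    """Map (account, year, month) to total amount / business days.

    Rows dated on weekends or holidays are ignored, as the question asks."""
    totals = defaultdict(float)
    for day, account, amount in rows:
        if day.weekday() < 5 and day not in holidays:
            totals[(account, day.year, day.month)] += amount
    return {
        key: total / business_days(key[1], key[2], holidays)
        for key, total in totals.items()
    }


rows = [
    (date(2014, 4, 1), "XXXXX1", 80),
    (date(2014, 4, 1), "XXXXX1", 20),
    (date(2014, 4, 2), "XXXXX1", 840),
    (date(2014, 4, 3), "XXXXX1", 120),
    (date(2014, 4, 1), "XXXXX2", 130),
    (date(2014, 4, 3), "XXXXX2", 300),
]
averages = monthly_averages(rows)
for key, value in sorted(averages.items()):
    print(key, round(value, 2))
```

With no holidays supplied, April 2014 has 22 business days, which matches the divisor used in the question's expected output.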
You have 40 months of data, so most of it should be very stable.
I will assume that you have a cold body (a big, stable, easily definable range of data) and a hot tail (a small and active part).
Next, I would like to define a minimal period: the smallest interval of data that is interesting to the business.
It might be a year, month, day, hour, etc. Do you expect to get questions like "what was the average for that account between 1900 and 12am yesterday?"
I will assume that the answer is DAY.
Then,
I will calculate sum(amount) and count() for every account for every DAY of cold body.
I will not create a dummy records, if particular account had no activity on some day.
and I will save day, account, total amount, count in a TABLE.
if there are modifications later to the cold body, you delete and reload affected day from that table.
For the hot tail there might be multiple strategies:
Do the same as above (same process, clear to support)
Always calculate on the fly
Use a materialized view as a compromise between 1 and 2.
The cold body table totalc could also be implemented as a materialized view, but if the data never changes there is no need to rebuild it.
With this you go from (number of account) x (number of transactions per day) x (number of days) to (number of account)x(number of active days) number of records.
That should speed up all following calculations.
I searched night and day back when I was first starting out in the sql world for an answer to this question. Could not find anything similar to this for my needs so I decided to ask and answer my own question in case others need help like I did.
Here is an example of the data I have. For simplicity, it is all from the Job table. Each JobID has its own Start and End time that are basically random and can overlap, have gaps, start and end at the same time as other jobs etc.
--Available--
JobID WorkerID JobStart JobEnd
1 25 '2012-11-17 16:00' '2012-11-17 17:00'
2 25 '2012-11-18 16:00' '2012-11-18 16:50'
3 25 '2012-11-19 18:00' '2012-11-19 18:30'
4 25 '2012-11-19 17:30' '2012-11-19 18:10'
5 26 '2012-11-18 16:00' '2012-11-18 17:10'
6 26 '2012-11-19 16:00' '2012-11-19 16:50'
What I wanted the result of the query to show would be:
WorkerID TotalTime(in Mins)
25 170
26 120
EDIT: Forgot to mention that the overlaps need to be ignored. Basically this is supposed to treat these workers and their jobs like you would an hourly employee and not a contractor. Like if I worked two jobIDs and started and finished them both from 12:00pm to 12:30pm, as an employee I would only get paid for 30 mins, whereas a contractor would likely get paid 60 mins, since their jobs are treated individually and get paid per job. The point of this query is to analyze jobs in a database that are tied to a worker, and need to find out if that worker was treated as an employee, what would his total hours worked in a given set of time come out to be.
EDIT2: won't let me answer my own question for 7 hours, will move it there later.
Ok, Answering Question now. Basically, I use temp table to build each minute between the min and max datetime of the jobs I am looking up.
IF OBJECT_ID('tempdb..#time') IS NOT NULL
BEGIN
drop table #time
END
DECLARE @FromDate AS DATETIME,
@ToDate AS DATETIME,
@Current AS DATETIME
SET @FromDate = '2012-11-17 16:00'
SET @ToDate = '2012-11-19 18:30'
create table #time (cte_start_date datetime)
set @current = @FromDate
while (@current < @ToDate)
begin
insert into #time (cte_start_date)
values (@current)
set @current = DATEADD(n, 1, @current)
end
Now I have all the mins in a temp table. Now I need to join all the Job table info into it and select out what I need in one go.
SELECT J.WorkerID
,COUNT(DISTINCT t.cte_start_date) AS TotalTime
FROM #time AS t
INNER JOIN Job AS J ON t.cte_start_date >= J.JobStart AND t.cte_start_date < J.JobEnd --Thanks ErikE
GROUP BY J.WorkerID --Thanks Martin Parkin
drop table #time
That is the very simplified answer and is good to get someone started.
This query does the job as well. Its performance is very good (while the execution plan looks not so great, the actual CPU and IO beat many other queries).
See it working in a Sql Fiddle.
WITH Times AS (
SELECT DISTINCT
H.WorkerID,
T.Boundary
FROM
dbo.JobHistory H
CROSS APPLY (VALUES (H.JobStart), (H.JobEnd)) T (Boundary)
), Groups AS (
SELECT
WorkerID,
T.Boundary,
Grp = Row_Number() OVER (PARTITION BY T.WorkerID ORDER BY T.Boundary) / 2
FROM
Times T
CROSS JOIN (VALUES (1), (1)) X (Dup)
), Boundaries AS (
SELECT
G.WorkerID,
TimeStart = Min(Boundary),
TimeEnd = Max(Boundary)
FROM
Groups G
GROUP BY
G.WorkerID,
G.Grp
HAVING
Count(*) = 2
)
SELECT
B.WorkerID,
WorkedMinutes = Sum(DateDiff(minute, 0, B.TimeEnd - B.TimeStart))
FROM
Boundaries B
WHERE
EXISTS (
SELECT *
FROM dbo.JobHistory H
WHERE
B.WorkerID = H.WorkerID
AND B.TimeStart < H.JobEnd
AND B.TimeEnd > H.JobStart
)
GROUP BY
WorkerID
;
With a clustered index on WorkerID, JobStart, JobEnd, JobID, and with the 7 sample rows from the above fiddle used as a template for new worker/job data, repeated enough times to yield a table with 14,336 rows, here are the performance results. I've included the other working/correct answers on the page (so far):
Author CPU Elapsed Reads Scans
------ --- ------- ------ -----
Erik 157 166 122 2
Gordon 375 378 106964 53251
I did a more exhaustive test from a different (slower) server (where each query was run 25 times, the best and worst values for each metric were thrown out, and the remaining 23 values were averaged) and got the following:
Query CPU Duration Reads Notes
-------- ---- -------- ------ ----------------------------------
Erik 1 215 231 122 query as above
Erik 2 326 379 116 alternate technique with no EXISTS
Gordon 1 578 682 106847 from j
Gordon 2 584 673 106847 from dbo.JobHistory
I thought the alternate technique was sure to improve things. Well, it saved 6 reads, but cost a lot more CPU (which makes sense). Instead of carrying the start/end statistics of each timeslice through to the end, it is best to just recalculate which slices to keep with the EXISTS against the original data. It may be that a different profile of few workers with many jobs could change the performance statistics for different queries.
In case anyone wants to try it, use the CREATE TABLE and INSERT statements from my fiddle and then run this 11 times:
INSERT dbo.JobHistory
SELECT
H.JobID + A.MaxJobID,
H.WorkerID + A.WorkerCount,
DateAdd(minute, Elapsed + 45, JobStart),
DateAdd(minute, Elapsed + 45, JobEnd)
FROM
dbo.JobHistory H
CROSS JOIN (
SELECT
MaxJobID = Max(JobID),
WorkerCount = Max(WorkerID) - Min(WorkerID) + 1,
Elapsed = DateDiff(minute, Min(JobStart), Min(JobEnd))
FROM dbo.JobHistory
) A
;
I built two other solutions to this query but the best one with about double the performance had a fatal flaw (not correctly handling fully enclosed time ranges). The other had very high/bad statistics (which I knew but had to try).
Explanation
Using all the endpoint times from each row, build up a distinct list of all possible time ranges of interest by duplicating each endpoint time and then grouping in such a way as to pair each time with the next possible time. Sum the elapsed minutes of these ranges wherever they coincide with any actual worker's working time.
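The idea in that explanation can be sketched in a few lines of Python, which may make it easier to follow than the CTE chain. This is only an illustration using the sample rows from the question, not a translation of the query: it collects the distinct endpoints per worker, pairs each with the next one, and keeps a slice only if some job covers it. The function name worked_minutes is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def worked_minutes(jobs):
    """Total minutes per worker, counting overlapping jobs only once."""
    by_worker = defaultdict(list)
    for worker, start, end in jobs:
        by_worker[worker].append((start, end))

    totals = {}
    for worker, spans in by_worker.items():
        # Distinct, sorted endpoints; adjacent pairs form candidate slices.
        bounds = sorted({t for span in spans for t in span})
        minutes = 0
        for lo, hi in zip(bounds, bounds[1:]):
            # Same test as the EXISTS clause: the slice overlaps a job.
            if any(lo < end and hi > start for start, end in spans):
                minutes += int((hi - lo).total_seconds() // 60)
        totals[worker] = minutes
    return totals


def at(day, hour, minute):
    return datetime(2012, 11, day, hour, minute)


jobs = [
    (25, at(17, 16, 0), at(17, 17, 0)),
    (25, at(18, 16, 0), at(18, 16, 50)),
    (25, at(19, 18, 0), at(19, 18, 30)),
    (25, at(19, 17, 30), at(19, 18, 10)),
    (26, at(18, 16, 0), at(18, 17, 10)),
    (26, at(19, 16, 0), at(19, 16, 50)),
]
totals = worked_minutes(jobs)
print(totals)
```

Because every slice lies between two adjacent endpoints, a slice is either entirely inside a job or entirely outside all of them, so a simple overlap test is enough.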
A query such as the following should provide the answer you are looking for:
SELECT WorkerID,
SUM(DATEDIFF(minute, JobStart, JobEnd)) AS TotalTime
FROM Job
GROUP BY WorkerID
Apologies that it is untested (I have no SQL Server to test it here) but it should do the trick.
This is a complicated query. Explanation follows.
with j as (
select j.*,
(select 1
from jobs j2
where j2.workerid = j.workerid and
j2.starttime < j.endtime and
j2.starttime > j.starttime
) as HasOverlap
from jobs j
)
select workerId,
sum(datediff(minute, periodStart, PeriodEnd)) as NumMinutes
from (select workerId, min(startTime) as periodStart, max(endTime) as PeriodEnd
from (select j.*,
(select min(starttime)
from j j2
where j2.workerid = j.workerid and
j2.starttime >= j.starttime and
j2.HasOverlap is null
) as thegroup
from j
) j
group by workerId, thegroup
) j
group by workerId;
The key to understanding this approach is to understand the "overlap" logic. One time period overlaps with the next when the next start time is before the previous end time. By assigning an overlap flag to each record, we know if it overlaps with the "next" record. The above logic is using the start time for this. It might be better to use the JobId, especially if two jobs for the same worker could start at the same time.
The calculation of the overlap flag uses a correlated subquery (this is j in the with clause).
Then, for each record we go back and find the first record afterwards where the overlap value is NULL. This provides a grouping key for all records in a given overlap set.
The rest, then, is just to aggregate the results, first at the workerId/group level and then at the workerId level to get the final results.
I have not run this SQL, so it might have syntax errors.
I am trying to group some records into 5-, 15-, 30- and 60-minute intervals:
SELECT AVG(value) as "AvgValue",
sample_date/(5*60) as "TimeFive"
FROM DATA
WHERE id = 123 AND sample_date >= 3/21/2012
I want to run several queries, each of which would group my average values into the desired time increments. So the 5-min query would return results like this:
AvgValue TimeFive
6.90 1995-01-01 00:05:00
7.15 1995-01-01 00:10:00
8.25 1995-01-01 00:15:00
The 30-min query would result in this:
AvgValue TimeThirty
6.95 1995-01-01 00:30:00
7.40 1995-01-01 01:00:00
The datetime column is in yyyy-mm-dd hh:mm:ss format
I am getting implicit conversion errors of my datetime column. Any help is much appreciated!
Using
datediff(minute, '1990-01-01T00:00:00', yourDatetime)
will give you the number of minutes since 1990-1-1 (you can use the desired base date).
Then you can divide by 5, 15, 30 or 60, and group by the result of this division.
I've checked that it will be evaluated as an integer division, so you'll get an integer number you can use to group by.
i.e.
group by datediff(minute, '1990-01-01T00:00:00', yourDatetime) /5
UPDATE As the original question was edited to require the data to be shown in date-time format after the grouping, I've added this simple query that will do what the OP wants:
-- This convert the period to date-time format
SELECT
-- note the 5, the "minute", and the starting point to convert the
-- period back to original time
DATEADD(minute, AP.FiveMinutesPeriod * 5, '2010-01-01T00:00:00') AS Period,
AP.AvgValue
FROM
-- this groups by the period and gets the average
(SELECT
P.FiveMinutesPeriod,
AVG(P.Value) AS AvgValue
FROM
-- This calculates the period (five minutes in this instance)
(SELECT
-- note the division by 5 and the "minute" to build the 5 minute periods
-- the '2010-01-01T00:00:00' is the starting point for the periods
datediff(minute, '2010-01-01T00:00:00', T.Time)/5 AS FiveMinutesPeriod,
T.Value
FROM Test T) AS P
GROUP BY P.FiveMinutesPeriod) AP
NOTE: I've divided this in 3 subqueries for clarity. You should read it from inside out. It could, of course, be written as a single, compact query
NOTE: if you change the period and the starting date-time you can get any interval you need, like weeks starting from a given day, or whatever you can need
If you want to generate test data for this query use this:
CREATE TABLE Test
( Id INT IDENTITY PRIMARY KEY,
Time DATETIME,
Value FLOAT)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:00:22', 10)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:03:22', 10)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:04:45', 10)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:07:21', 20)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:10:25', 30)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:11:22', 30)
INSERT INTO Test(Time, Value) VALUES('2012-03-22T00:14:47', 30)
The result of executing the query is this:
Period AvgValue
2012-03-22 00:00:00.000 10
2012-03-22 00:05:00.000 20
2012-03-22 00:10:00.000 30
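For comparison, the same bucketing arithmetic can be sketched in Python: count whole minutes since a base date, integer-divide by the interval, and multiply back to recover the start of each period. The base date and the function name bucket_averages are assumptions chosen for this example; the rows are the test data above.

```python
from collections import defaultdict
from datetime import datetime, timedelta

BASE = datetime(2010, 1, 1)  # arbitrary starting point for the periods


def bucket_averages(rows, minutes=5):
    """Average values per fixed-width time bucket.

    Returns a list of (bucket_start, average) sorted by bucket_start."""
    groups = defaultdict(list)
    for ts, value in rows:
        elapsed = int((ts - BASE).total_seconds() // 60)
        groups[elapsed // minutes].append(value)  # integer division
    return [
        (BASE + timedelta(minutes=period * minutes), sum(vals) / len(vals))
        for period, vals in sorted(groups.items())
    ]


rows = [
    (datetime(2012, 3, 22, 0, 0, 22), 10),
    (datetime(2012, 3, 22, 0, 3, 22), 10),
    (datetime(2012, 3, 22, 0, 4, 45), 10),
    (datetime(2012, 3, 22, 0, 7, 21), 20),
    (datetime(2012, 3, 22, 0, 10, 25), 30),
    (datetime(2012, 3, 22, 0, 11, 22), 30),
    (datetime(2012, 3, 22, 0, 14, 47), 30),
]
result = bucket_averages(rows, minutes=5)
for period, avg in result:
    print(period, avg)
```

Changing the minutes argument to 15, 30 or 60 gives the other intervals asked for in the question.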
Building on @JotaBe's answer (on which I cannot comment, otherwise I would), you could also try something like this, which does not require a subquery.
SELECT
AVG(value) AS 'AvgValue',
-- Add the rounded seconds back onto epoch to get rounded time
DATEADD(
MINUTE,
(DATEDIFF(MINUTE, '1990-01-01T00:00:00', your_date) / 30) * 30,
'1990-01-01T00:00:00'
) AS 'TimeThirty'
FROM YourTable
-- WHERE your_date > some max lookback period
GROUP BY
(DATEDIFF(MINUTE, '1990-01-01T00:00:00', your_date) / 30)
This change removes temp tables and subqueries. It uses the same core logic for grouping by 30 minute intervals but, when presenting the data back as part of the result I'm just reversing the interval calculation to get the rounded date & time.
So, in case you googled this but need to do it in MySQL, which was my case:
In MySQL you can do
GROUP BY
CONCAT(
DATE_FORMAT(`timestamp`,'%m-%d-%Y %H:'),
FLOOR(DATE_FORMAT(`timestamp`,'%i')/5)*5
)
In SQL Server 2022 and later, you can use DATE_BUCKET, which rounds down to the start of the specified interval.
SELECT
DATE_BUCKET(minute, 5, d.sample_date) AS TimeFive,
AVG(d.value) AS AvgValue
FROM DATA d
WHERE d.id = 123
AND d.sample_date >= '20121203'
GROUP BY
DATE_BUCKET(minute, 5, d.sample_date);
You can use the following statement. It removes the seconds component, calculates how many minutes past the five-minute mark the value is, and uses that to round down to the time block. This is ideal if you want to change your window: simply change the mod value.
select dateadd(minute, - datepart(minute, [YOURDATE]) % 5, dateadd(minute, datediff(minute, 0, [YOURDATE]), 0)) as [TimeBlock]
This should do what you want. Replace dt with your datetime column, c with the field to return (the "call" field here), and astro_transit1 with your table. The 300 is 5 minutes expressed in seconds; change it to widen or narrow the interval.
SELECT FROM_UNIXTIME( 300 * ROUND( UNIX_TIMESTAMP( r.dt ) /300 ) ) AS 5datetime,
( SELECT r.c FROM astro_transit1 ra WHERE ra.dt = r.dt ORDER BY ra.dt DESC LIMIT 1 ) AS first_val
FROM astro_transit1 r
GROUP BY UNIX_TIMESTAMP( r.dt ) DIV 300
LIMIT 0 , 30
In a MySQL query I am using the timediff/time_to_sec functions to calculate the total minutes between two date-times.
For example:
2010-03-23 10:00:00
-
2010-03-23 08:00:00
= 120 minutes
What I would like to do is exclude any breaks that occur during the selected time range.
For example:
2010-03-23 10:00:00
-
2010-03-23 08:00:00
-
(break 08:55:00 to 09:10:00)
= 105 minutes
Is there a good method to do this without resorting to a long list of nested IF statements?
UPDATE1:
To clarify - I am trying to calculate how long a user takes to accomplish a given task. If they take a coffee break that time period needs to be excluded. The coffee breaks are at fixed times.
Sum all the breaks that occur during the time range, then subtract that total from the result of the timediff/time_to_sec function.
SELECT TIME_TO_SEC(TIMEDIFF('17:00:00', '09:00:00')) -- 28800
SELECT TIME_TO_SEC(TIMEDIFF('12:30:00', '12:00:00')) -- 1800
SELECT TIME_TO_SEC(TIMEDIFF('10:30:00', '10:15:00')) -- 900
-- 26100
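The subtraction above can be sketched in Python to show how a break that only partly falls inside the range might be handled. This is a minimal illustration, assuming the breaks do not overlap each other; the function name net_minutes is made up, and only the part of each break inside the task's time range is deducted.

```python
from datetime import datetime


def net_minutes(start, end, breaks):
    """Minutes between start and end, minus the overlapping part of each break."""
    total = (end - start).total_seconds()
    for b_start, b_end in breaks:
        # Clamp the break to the task's range; negative means no overlap.
        overlap = (min(end, b_end) - max(start, b_start)).total_seconds()
        if overlap > 0:
            total -= overlap
    return int(total // 60)


def t(hour, minute):
    return datetime(2010, 3, 23, hour, minute)


# Example from the question: 08:00 to 10:00 with a break 08:55 to 09:10.
question_case = net_minutes(t(8, 0), t(10, 0), [(t(8, 55), t(9, 10))])
print(question_case)
```

Clamping with min and max avoids the nested IF statements mentioned in the question, because a break that starts before the task or ends after it is simply cut down to the shared portion.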
Assuming this structure :
CREATE TABLE work_unit (
id INT NOT NULL,
initial_time TIME,
final_time TIME
)
CREATE TABLE break (
id INT NOT NULL,
initial_time TIME,
final_time TIME
)
INSERT work_unit VALUES (1, '09:00:00', '17:00:00')
INSERT break VALUES (1, '10:00:00', '10:15:00')
INSERT break VALUES (2, '12:00:00', '12:30:00')
You can calculate it with the following query:
SELECT *, TIME_TO_SEC(TIMEDIFF(final_time, initial_time)) total_time
, (SELECT SUM(
TIME_TO_SEC(TIMEDIFF(b.final_time, b.initial_time)))
FROM break b
WHERE (b.initial_time BETWEEN work_unit.initial_time AND work_unit.final_time) OR (b.final_time BETWEEN work_unit.initial_time AND work_unit.final_time)
) breaks
FROM work_unit