SQL (BigQuery) Grouping Runtime Per Day - sql

I have the following data which I want to group into seconds per day in BigQuery.
Source Table:
+--------------+---------------------+---------------------+
| ComputerName | StartDatetime | EndDatetime |
+--------------+---------------------+---------------------+
| Computer1 | 2020-06-10T21:01:28 | 2020-06-10T21:20:19 |
+--------------+---------------------+---------------------+
| Computer1 | 2020-06-10T22:54:01 | 2020-06-11T05:21:48 |
+--------------+---------------------+---------------------+
| Computer2 | 2020-06-08T09:11:54 | 2020-06-10T11:36:27 |
+--------------+---------------------+---------------------+
I want to be able to visualise the data in the following way
+------------+--------------+------------------+
| Date | ComputerName | Runtime(Seconds) |
+------------+--------------+------------------+
| 2020-06-10 | Computer1 | 5089 |
+------------+--------------+------------------+
| 2020-06-11 | Computer1 | 19308 |
+------------+--------------+------------------+
| 2020-06-08 | Computer2 | 53285 |
+------------+--------------+------------------+
| 2020-06-09 | Computer2 | 86400 |
+------------+--------------+------------------+
| 2020-06-10 | Computer2 | 41787 |
+------------+--------------+------------------+
I am not too sure of the way I should approach this. Some input would be greatly appreciated.

This is an interval overlap problem. You can solve this by splitting each time period into separate days and then looking at the overlap for each day:
with t as (
select 'Computer1' as computername, datetime '2020-06-10T21:01:28' as startdatetime, datetime '2020-06-10T21:20:19' as enddatetime union all
select 'Computer1' as computername, datetime '2020-06-10T22:54:01' as startdatetime, datetime '2020-06-11T05:21:48' as enddatetime union all
select 'Computer2' as computername, datetime '2020-06-08T09:11:54' as startdatetime, datetime '2020-06-10T11:36:27' as enddatetime
)
select dte, t.computername,
sum(case when enddatetime >= dte and
startdatetime < date_add(dte, interval 1 day)
then datetime_diff(least(date_add(dte, interval 1 day), enddatetime),
greatest(dte, startdatetime),
second)
end) as runtime_seconds
from (select t.*,
generate_date_array(date(t.startdatetime), date(t.enddatetime), interval 1 day) gda
from t
) t cross join
unnest(gda) dte
group by dte, t.computername;
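The per-day overlap logic of this query can also be sketched outside the database. The following is a minimal Python illustration, not BigQuery code: the function name `seconds_per_day` and the tuple layout of the rows are assumptions made for the example. For every calendar day an interval touches, it clamps the interval to that day and adds up the seconds.

```python
from collections import defaultdict
from datetime import date, datetime, time, timedelta


def seconds_per_day(rows):
    """Sum runtime seconds per (day, computer) by clamping each interval to each day.

    rows: iterable of (computer_name, start_datetime, end_datetime) tuples
    (a hypothetical layout chosen for this sketch).
    """
    totals = defaultdict(int)
    for name, start, end in rows:
        day = start.date()
        while day <= end.date():
            day_start = datetime.combine(day, time.min)   # midnight of this day
            day_end = day_start + timedelta(days=1)       # midnight of the next day
            # Same idea as least(next_day, end) - greatest(day, start) in the query.
            overlap = min(day_end, end) - max(day_start, start)
            totals[(day, name)] += int(overlap.total_seconds())
            day += timedelta(days=1)
    return dict(totals)


sample = [
    ("Computer1", datetime(2020, 6, 10, 21, 1, 28), datetime(2020, 6, 10, 21, 20, 19)),
    ("Computer1", datetime(2020, 6, 10, 22, 54, 1), datetime(2020, 6, 11, 5, 21, 48)),
    ("Computer2", datetime(2020, 6, 8, 9, 11, 54), datetime(2020, 6, 10, 11, 36, 27)),
]
totals = seconds_per_day(sample)
for (day, name), seconds in sorted(totals.items()):
    print(day, name, seconds)
```

Because each day is measured up to the following midnight, a day that runs to its end is counted up to 24:00:00 rather than 23:59:59.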

Below is for BigQuery Standard SQL
#standardSQL
select Date, ComputerName,
sum(datetime_diff(
least(datetime (Date + 1), EndDatetime),
greatest(datetime(Date), StartDatetime),
second
)) as Runtime_Seconds
from `project.dataset.table`,
unnest(generate_date_array(date(StartDatetime), date(EndDatetime))) Date
group by Date, ComputerName
Applied to the sample data from your question, as in the example below:
#standardSQL
with `project.dataset.table` as (
select 'Computer1' ComputerName, datetime '2020-06-10T21:01:28' StartDatetime, datetime '2020-06-10T21:20:19' EndDatetime union all
select 'Computer1', '2020-06-10T22:54:01', '2020-06-11T05:21:48' union all
select 'Computer2', '2020-06-08T09:11:54', '2020-06-10T11:36:27'
)
select Date, ComputerName,
sum(datetime_diff(
least(datetime (Date + 1), EndDatetime),
greatest(datetime(Date), StartDatetime),
second
)) as Runtime_Seconds
from `project.dataset.table`,
unnest(generate_date_array(date(StartDatetime), date(EndDatetime))) Date
group by Date, ComputerName

Another option for BigQuery Standard SQL
A straightforward, admittedly somewhat silly, and almost logic-free option is to simply count the seconds that fall in each day. It still looks like a valid option to me.
#standardSQL
select Date, ComputerName,
countif(second >= timestamp(StartDatetime) and second < timestamp(EndDatetime)) as Runtime_Seconds
from `project.dataset.table`,
unnest(generate_date_array(date(StartDatetime), date(EndDatetime))) Date,
unnest(generate_timestamp_array(timestamp(Date + 1), timestamp(Date), interval -1 second)) second with offset
where offset > 0
group by Date, ComputerName

Related

Subtracting a date from another date

I am trying to subtract one date from another but having issues:
SELECT MIN(date) AS first_day,
MAX(date) AS last_date,
((MAX(date)) - (MIN(date))) AS total_days
FROM dates;
Could someone please clarify the number format of the number it is returning below?
+------------+
| total_days |
+------------+
| 29001900 |
+------------+
I have tried using DATEDIFF, but it rounds the days to the nearest whole number, and I need to carry out further calculations with the data. The rounding means my results are a little off.
In the version of the DB I am using, DATEDIFF() only takes two parameters, so as far as I'm aware it always has to be days; I get an error if I try to use hours.
SELECT DATEDIFF(
(SELECT MAX(date) FROM dates),
(SELECT MIN(date) FROM dates)
) AS total_days;
should do the trick.
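I'm assuming your RDBMS is MySQL.
In that case the number you got is not a count of days or seconds. When you subtract two DATETIME values, MySQL first converts each one to a number of the form YYYYMMDDhhmmss and then subtracts those numbers.
Subtracting two DATE values works the same way on YYYYMMDD numbers, so it does not give the days between them either, as the tests below show.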
There's more than DATEDIFF to work with.
Test data
create table mytest (
id int auto_increment primary key,
date_col date not null,
datetime_col datetime not null
);
insert into mytest(date_col, datetime_col) values
('2021-06-16', '2021-06-16 14:15:30')
,('2021-07-16', '2021-07-16 19:06:15')
Test using dates
select
min(date_col) as min_date
, max(date_col) as max_date
, max(date_col) - min(date_col) as subtracted
, datediff(max(date_col), min(date_col)) as days
from mytest
min_date | max_date | subtracted | days
:--------- | :--------- | ---------: | ---:
2021-06-16 | 2021-07-16 | 100 | 30
Test using datetimes
select
min(datetime_col) as min_date
, max(datetime_col) as max_date
, max(datetime_col) - min(datetime_col) as seconds
, datediff(max(datetime_col), min(datetime_col)) as days
from mytest
min_date | max_date | seconds | days
:------------------ | :------------------ | --------: | ---:
2021-06-16 14:15:30 | 2021-07-16 19:06:15 | 100049085 | 30
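The subtracted values in these two tests can be reproduced by hand. The sketch below is plain Python, not MySQL, and it assumes that MySQL treats a DATE or DATETIME in arithmetic as a YYYYMMDD or YYYYMMDDhhmmss number. The helper name `as_number` is made up for the illustration. It compares that numeric difference with the real elapsed time for the same test rows.

```python
from datetime import datetime


def as_number(dt, fmt="%Y%m%d%H%M%S"):
    """Mimic the numeric form assumed for a temporal value used in arithmetic."""
    return int(dt.strftime(fmt))


lo = datetime(2021, 6, 16, 14, 15, 30)
hi = datetime(2021, 7, 16, 19, 6, 15)

# What plain subtraction appears to compute.
datetime_diff = as_number(hi) - as_number(lo)              # YYYYMMDDhhmmss numbers
date_diff = as_number(hi, "%Y%m%d") - as_number(lo, "%Y%m%d")  # YYYYMMDD numbers

# The real elapsed time between the two values.
real_seconds = int((hi - lo).total_seconds())
real_days = (hi.date() - lo.date()).days

print(datetime_diff, date_diff, real_seconds, real_days)
```

The first two numbers match the `seconds` and `subtracted` columns above, while the last two are the actual seconds and days, so a dedicated difference function is the safer route when real elapsed time is needed.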
Using sec_to_time and extract
select seconds
, sec_to_time(seconds) as tm
, extract(day from sec_to_time(seconds)) as days
from (
select
max(datetime_col) - min(datetime_col) as seconds
from mytest
) q
seconds | tm | days
--------: | :-------- | ---:
100049085 | 838:59:59 | 30

Calculating working minutes for Normal and Night Shift

I am making a query to fetch the working minutes for employees. The problem I have is the Night Shift. I know that I need to subtract the "ShiftStartMinutesFromMidnight" but I can't find the right logic.
NOTE: I can't change the database; I can only use the data from it.
Let's say I have these records.
+----+--------------------------+----------+
| ID | EventTime | ReaderNo |
+----+--------------------------+----------+
| 1 | 2019-12-04 11:28:46.000 | In |
| 1 | 2019-12-04 12:36:17.000 | Out |
| 1 | 2019-12-04 12:39:23.000 | In |
| 1 | 2019-12-04 12:51:21.000 | Out |
| 1 | 2019-12-05 07:37:49.000 | In |
| 1 | 2019-12-05 08:01:22.000 | Out |
| 2 | 2019-12-04 22:11:46.000 | In |
| 2 | 2019-12-04 23:06:17.000 | Out |
| 2 | 2019-12-04 23:34:23.000 | In |
| 2 | 2019-12-05 01:32:21.000 | Out |
| 2 | 2019-12-05 01:38:49.000 | In |
| 2 | 2019-12-05 06:32:22.000 | Out |
+----+--------------------------+----------+
WITH CT AS (SELECT
EIn.PSNID, EIn.PSNNAME
,CAST(DATEADD(minute, -0, EIn.EventTime) AS date) AS dt
,EIn.EventTime AS LogIn
,CA_Out.EventTime AS LogOut
,DATEDIFF(minute, EIn.EventTime, CA_Out.EventTime) AS WorkingMinutes
FROM
VIEW_EVENT_EMPLOYEE AS EIn
CROSS APPLY
(
SELECT TOP(1) EOut.EventTime
FROM VIEW_EVENT_EMPLOYEE AS EOut
WHERE
EOut.PSNID = EIn.PSNID
AND EOut.ReaderNo = 'Out'
AND EOut.EventTime >= EIn.EventTime
ORDER BY EOut.EventTime
) AS CA_Out
WHERE
EIn.ReaderNo = 'In'
)
SELECT
PSNID
,PSNNAME
,dt
,LogIn
,LogOut
,WorkingMinutes
FROM CT
WHERE dt BETWEEN '2019-11-29' AND '2019-12-05'
ORDER BY LogIn
;
OUTPUT FROM QUERY
+----+------------+-------------------------+-------------------------+----------------+
| ID | date | In | Out | WorkingMinutes |
+----+------------+-------------------------+-------------------------+----------------+
| 1 | 2019-12-04 | 2019-12-04 11:28:46.000 | 2019-12-04 12:36:17.000 | 68 |
| 1 | 2019-12-04 | 2019-12-04 12:39:23.000 | 2019-12-04 12:51:21.000 | 12 |
| 1 | 2019-12-05 | 2019-12-05 07:37:49.000 | 2019-12-05 08:01:22.000 | 24 |
+----+------------+-------------------------+-------------------------+----------------+
I was thinking of something like this: treat a record as night shift when Out is between 06:25 and 06:40. But I also need to check whether the employee has an In between 21:50 and 22:30 on the previous day. I need that second condition because an employee from the first shift might also have an Out at, for example, 06:30.
(1310 is the ShiftStartMinutesFromMidnight.)
Line 3 of the query:
CAST(DATEADD(minute, -0, EIn.EventTime) AS date) AS dt
Line 3 updated with this code:
CASE
WHEN CAST(CA_Out.LogDate AS time) BETWEEN '06:25:00' AND '06:40:00'
AND CAST(EIn.LogDate AS time) BETWEEN '21:50:00' AND '22:30:00' THEN CAST(DATEADD(minute, -1310, EIn.LogDate) AS date)
ELSE CAST(DATEADD(minute, -0, EIn.LogDate) AS date)
END as dt
Expected Output
+----+------------+-------------------------+-------------------------+----------------+
| ID | date | In | Out | WorkingMinutes |
+----+------------+-------------------------+-------------------------+----------------+
| 2 | 2019-12-04 | 2019-12-04 22:11:46.000 | 2019-12-04 23:06:17.000 | 55 |
| 2 | 2019-12-04 | 2019-12-04 23:34:23.000 | 2019-12-05 01:32:21.000 | 118 |
| 2 | 2019-12-04 | 2019-12-05 01:38:49.000 | 2019-12-05 06:32:22.000 | 294 |
+----+------------+-------------------------+-------------------------+----------------+
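The date attribution behind this expected output can be sketched in a few lines. This is a Python illustration of the `DATEADD(minute, -1310, ...)` idea only, not SQL Server code, and the function name `shift_date` is hypothetical. It moves each event back by the shift start offset before taking the date, so everything after 21:50 lands on the day the shift began.

```python
from datetime import date, datetime, timedelta

SHIFT_START_MINUTES = 1310  # 21:50, the ShiftStartMinutesFromMidnight from the question


def shift_date(event_time, shift_start_minutes=SHIFT_START_MINUTES):
    """Return the date of the shift an event belongs to (hypothetical helper)."""
    return (event_time - timedelta(minutes=shift_start_minutes)).date()


night_ins = [
    datetime(2019, 12, 4, 22, 11, 46),
    datetime(2019, 12, 4, 23, 34, 23),
    datetime(2019, 12, 5, 1, 38, 49),
]
night_dates = [shift_date(t) for t in night_ins]

# A day-shift event is pushed to the previous date if the offset is applied blindly.
day_shift_date = shift_date(datetime(2019, 12, 4, 11, 28, 46))

print(night_dates, day_shift_date)
```

The last value shows why the offset cannot be applied unconditionally: day-shift events would move to the previous date, which is the reason the question restricts it with the In and Out time windows.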
Assuming that total minutes per separate date is enough:
WITH
/* enumerate pairs */
cte1 AS ( SELECT *,
COUNT(CASE WHEN ReaderNo = 'In' THEN 1 END)
OVER (PARTITION BY ID
ORDER BY EventTime) pair
FROM test ),
/* divide by pairs */
cte2 AS ( SELECT ID, MIN(EventTime) starttime, MAX(EventTime) endtime
FROM cte1
GROUP BY ID, pair ),
/* get dates range */
cte3 AS ( SELECT CAST(MIN(EventTime) AS DATE) minDate,
CAST(MAX(EventTime) AS DATE) maxDate
FROM test),
/* generate dates list */
cte4 AS ( SELECT minDate theDate
FROM cte3
UNION ALL
SELECT DATEADD(dd, 1, theDate)
FROM cte3, cte4
WHERE theDate < maxDate ),
/* add overlapped dates to pairs */
cte5 AS ( SELECT ID, starttime, endtime, theDate
FROM cte2, cte4
WHERE theDate BETWEEN CAST(starttime AS DATE) AND CAST(endtime AS DATE) ),
/* adjust borders */
cte6 AS ( SELECT ID,
CASE WHEN starttime < theDate
THEN theDate
ELSE starttime
END starttime,
CASE WHEN CAST(endtime AS DATE) > theDate
THEN DATEADD(dd, 1, theDate)
ELSE endtime
END endtime,
theDate
FROM cte5 )
/* calculate total minutes per date */
SELECT ID,
theDate,
SUM(DATEDIFF(mi, starttime, endtime)) workingminutes
FROM cte6
GROUP BY ID,
theDate
ORDER BY 1,2
The solution is deliberately detailed, step by step, so that you can easily follow the logic.
You may freely combine some of the CTEs into one. You may also use the second-to-last CTE (cte5) combined with cte2 if you need the output exactly as shown.
The solution assumes that no records are missing from the source data (each 'In' matches exactly one 'Out' and vice versa, and there are no adjacent or overlapping pairs).
I don't know where you got stuck, but here is how I do it.
Night shift is 22:00 - 05:00, so within one calendar day it covers 00:00 - 05:00 and 22:00 - 24:00.
Day shift is 05:00 - 22:00.
To make the overlap checks easier, convert all dates to Unix timestamps, so you don't have to split time intervals as shown above.
Then generate a map of work periods for the range being fetched (date_from to date_till), and make sure to add holiday and pre-holiday exceptions where the periods differ.
Something like this:
The Unix values are placeholders, for illustration only.
unix_from_tim, unix_till_tim, shift_type
1580680800, 1580680800, 1 => example 02-02-2020:22:00:00, 03-02-2020:05:00:00, 1
1580680800, 1580680800, 0 => example 03-02-2020:05:00:00, 03-02-2020:22:00:00, 0
1580680800, 1580680800, 1 => example 03-02-2020:22:00:00, 04-02-2020:05:00:00, 1
...
Make sure you don't double-count minutes at a period's start/end.
And there is one row per worker
with unix_from_tim, unix_till_tim
1580680800, 1580680800 => something like 02-02-2020:16:30:00, 03-02-2020:07:10:00
When you check for overlap, you can get its length like this:
MIN(work_period:till, worker_period:till) - MAX(work_period:from, worker_period:from);
example in simple numbers:
work_period 3 - 7
worker_period 5 - 12
MIN(7,12) - MAX(3,5) = 7 - 5 = 2 //overlap
work_period 3 - 7
worker_period 8 - 12
MIN(7,12) - MAX(3,8) = 7 - 8 = -1 //if negative not overlap!
work_period 3 - 13
worker_period 8 - 12
MIN(13,12) - MAX(3,8) = 12 - 8 = 4 //full overlap!
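The MIN/MAX rule from these examples fits in one small function. This is a minimal Python sketch; the name `overlap` and the choice to clamp negative results to zero are assumptions added for the illustration.

```python
def overlap(a_from, a_till, b_from, b_till):
    """Length of the overlap between two intervals, or 0 when they do not overlap."""
    return max(0, min(a_till, b_till) - max(a_from, b_from))


partial = overlap(3, 7, 5, 12)    # worker period starts inside the work period
disjoint = overlap(3, 7, 8, 12)   # raw difference is negative, so no overlap
contained = overlap(3, 13, 8, 12) # worker period lies fully inside the work period

print(partial, disjoint, contained)
```

The same function works unchanged on Unix timestamps, in which case the result is in seconds.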
You have to check each worker period against all the generated work intervals it overlaps.
Maybe someone can write a select that doesn't need the generated work_shift intervals, but that is not an easy task once you add holidays, transferred days, reduced-time days, etc.
Hope it helps.

Reporting on time information using start and end time

Is it possible to create a report that sums hours for a day grouped by an Id using a start and end time stamp?
I need to be able to split time that spans days and take part of that time and sum to the correct date group.
NOTE: The date ids are to a date dimension table.
------------------------------------------------------------------------------
TaskId | StartDateId | EndDateId | StartTime | EndTime
------------------------------------------------------------------------------
2 | 20190317 | 20190318 | 2019-03-17 16:30:00 | 2019-03-18 09:00:00
------------------------------------------------------------------------------
1 | 20190318 | 20190318 | 2019-03-18 09:00:00 | 2019-03-18 16:30:00
------------------------------------------------------------------------------
2 | 20190318 | 20190319 | 2019-03-18 16:30:00 | 2019-03-19 09:00:00
------------------------------------------------------------------------------
So based on this, the desired report output would be:
-------------------------
Date | Task | Hours
-------------------------
2019-03-17 | 2 | 7.5
-------------------------
2019-03-18 | 1 | 7.5
-------------------------
2019-03-18 | 2 | 16.5
-------------------------
...
The only working solution I have managed to implement is splitting records so that no record spans multiple days. I was hoping to find a report query solution rather than an ETL-based solution.
I have tried to simulate your problem here: https://rextester.com/DEV45608 and I hope it helps you :) (The CTE GetDates can be replaced by your date dimension)
DECLARE @minDate DATE
DECLARE @maxDate DATE
CREATE TABLE Tasktime
(
Task_id INT,
Start_time DATETIME,
End_time DATETIME
);
INSERT INTO Tasktime VALUES
(2,'2019-03-17 16:30:00','2019-03-18 09:00:00'),
(1,'2019-03-18 09:00:00','2019-03-18 16:30:00'),
(2,'2019-03-18 16:30:00','2019-03-19 09:00:00');
SELECT @minDate = MIN(Start_time) FROM Tasktime;
SELECT @maxDate = MAX(End_time) FROM Tasktime;
;WITH GetDates AS
(
SELECT 1 AS counter, @minDate as Date
UNION ALL
SELECT counter + 1, DATEADD(day,counter,@minDate)
from GetDates
WHERE DATEADD(day, counter, @minDate) <= @maxDate
)
SELECT counter, Date INTO #tmp FROM GetDates;
SELECT
g.Date,
t.Task_id,
SUM(
CASE WHEN CAST(t.Start_time AS DATE) = CAST(t.End_time AS DATE) THEN
DATEDIFF(second, t.Start_time, t.End_time) / 3600.0
WHEN CAST(t.Start_time AS DATE) = g.Date THEN
DATEDIFF(second, t.Start_time, CAST(DATEADD(day,1,g.Date) AS DATETIME)) / 3600.0
WHEN CAST(t.End_time AS DATE) = g.Date THEN
DATEDIFF(second, CAST(g.Date AS DATETIME), t.End_time) / 3600.0
ELSE
24.0
END) AS hours_on_the_day_for_the_task
from
#tmp g
INNER JOIN
Tasktime t
ON
g.Date BETWEEN CAST(t.Start_time AS DATE) AND CAST(t.End_time AS DATE)
GROUP BY g.Date, t.Task_id
The desired date can be joined to the date dimension to return the calendar date, which you can then show in the report.
As for the hours: when you retrieve your dataset in SQL, it is as simple as this:
cast(datediff(MINUTE,'2019-03-18 16:30:00','2019-03-19 09:00:00') /60.0 as decimal(13,1)) as 'Hours'
So in your case it would be
cast(datediff(MINUTE,sometable.startdate,sometable.enddate) /60.0 as decimal(13,1)) as 'Hours'
Using datediff with HOUR returns only whole hours, and dividing the minutes by the integer 60 also returns a whole number. Hence the /60.0 and the cast.
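The difference between dividing by 60 and by 60.0 can be seen with the same pair of timestamps. This is a Python sketch of the arithmetic only: `//` stands in for integer division in SQL, and `round(..., 1)` stands in for the cast to decimal(13,1).

```python
from datetime import datetime

start = datetime(2019, 3, 18, 16, 30, 0)
end = datetime(2019, 3, 19, 9, 0, 0)

# Elapsed minutes between the two timestamps.
minutes = int((end - start).total_seconds() // 60)

whole_hours = minutes // 60                 # integer division drops the half hour
decimal_hours = round(minutes / 60.0, 1)    # decimal division keeps it

print(minutes, whole_hours, decimal_hours)
```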

SQLite: Sum of differences between two dates group by every date

I have a SQLite database with start and stop datetimes
With the following SQL query I get the difference hours between start and stop:
SELECT starttime, stoptime, cast((strftime('%s',stoptime)-strftime('%s',starttime)) AS real)/60/60 AS diffHours FROM tracktime;
I need a SQL query, which delivers the sum of multiple timestamps, grouped by every day (also whole dates between timestamps).
The result should be something like this:
2018-08-01: 12 hours
2018-08-02: 24 hours
2018-08-03: 12 hours
2018-08-04: 0 hours
2018-08-05: 1 hours
2018-08-06: 14 hours
2018-08-07: 8 hours
You can try this: use a recursive CTE to build a calendar table covering every date between the start time and the end time, then do the calculation on it.
Schema (SQLite v3.18)
CREATE TABLE tracktime(
id int,
starttime timestamp,
stoptime timestamp
);
insert into tracktime values
(11,'2018-08-01 12:00:00','2018-08-03 12:00:00');
insert into tracktime values
(12,'2018-09-05 18:00:00','2018-09-05 19:00:00');
Query #1
WITH RECURSIVE cte AS (
select id,starttime,date(starttime,'+1 day') totime,stoptime
from tracktime
UNION ALL
SELECT id,
date(starttime,'+1 day'),
date(totime,'+1 day'),
stoptime
FROM cte
WHERE date(starttime,'+1 day') < stoptime
)
SELECT strftime('%Y-%m-%d', starttime),(strftime('%s',CASE
WHEN totime > stoptime THEN stoptime
ELSE totime
END) -strftime('%s',starttime))/3600 diffHour
FROM cte;
| strftime('%Y-%m-%d', starttime) | diffHour |
| ------------------------------- | -------- |
| 2018-08-01 | 12 |
| 2018-09-05 | 1 |
| 2018-08-02 | 24 |
| 2018-08-03 | 12 |
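The query above returns one row per interval and day. To get a single total per day when several intervals touch the same date, the rows can be grouped. The following is a hedged sketch that runs a variant of the query through Python's built-in `sqlite3` module. The extra row with id 13 and the alias `the_day` are additions made for the example, not part of the original data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracktime(id int, starttime timestamp, stoptime timestamp);
INSERT INTO tracktime VALUES (11, '2018-08-01 12:00:00', '2018-08-03 12:00:00');
INSERT INTO tracktime VALUES (12, '2018-09-05 18:00:00', '2018-09-05 19:00:00');
-- Extra hypothetical row so that one day receives time from two intervals.
INSERT INTO tracktime VALUES (13, '2018-08-03 20:00:00', '2018-08-03 22:00:00');
""")

query = """
WITH RECURSIVE cte AS (
    -- First slice: from the start time to the following midnight.
    SELECT id, starttime, date(starttime, '+1 day') AS totime, stoptime
    FROM tracktime
    UNION ALL
    -- Later slices: one full calendar day at a time.
    SELECT id, date(starttime, '+1 day'), date(totime, '+1 day'), stoptime
    FROM cte
    WHERE date(starttime, '+1 day') < stoptime
)
SELECT date(starttime) AS the_day,
       SUM(strftime('%s', MIN(totime, stoptime)) - strftime('%s', starttime)) / 3600.0 AS hours
FROM cte
GROUP BY the_day
ORDER BY the_day;
"""

rows = conn.execute(query).fetchall()
for the_day, hours in rows:
    print(the_day, hours)
```

Dividing by 3600.0 instead of 3600 keeps fractional hours. Days with no tracked time do not appear in this result; listing them as 0 hours would need a separate calendar of dates to join against.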

PostgreSQL query group by two "parameters"

I've been trying to figure out the following PostgreSQL query with no success for two days now.
Let's say I have the following table:
| date | value |
-------------------------
| 2018-05-11 | 0.20 |
| 2018-05-11 | -0.12 |
| 2018-05-11 | 0.15 |
| 2018-05-10 | -1.20 |
| 2018-05-10 | -0.70 |
| 2018-05-10 | -0.16 |
| 2018-05-10 | 0.07 |
And I need to find out the query to count positive and negative values per day:
| date | positives | negatives |
------------------------------------------
| 2018-05-11 | 2 | 1 |
| 2018-05-10 | 1 | 3 |
I've been able to figure out the query to extract only positives or negatives, but not both at the same time:
SELECT to_char(table.date, 'DD/MM') AS date,
COUNT(*) AS negative
FROM table
WHERE table.date >= DATE(NOW() - '20 days' :: INTERVAL) AND
value < '0'
GROUP BY to_char(date, 'DD/MM'), table.date
ORDER BY table.date DESC;
Can someone please assist? This is driving me mad. Thank you.
Use a FILTER clause with the aggregate function.
SELECT to_char(table.date, 'DD/MM') AS date,
COUNT(*) FILTER (WHERE value < 0) AS negative,
COUNT(*) FILTER (WHERE value > 0) AS positive
FROM table
WHERE table.date >= DATE(NOW() - '20 days'::INTERVAL)
GROUP BY 1
ORDER BY DATE(table.date) DESC
I would simply do:
select date_trunc('day', t.date) as dte,
sum( (value < 0)::int ) as negatives,
sum( (value > 0)::int ) as positives
from t
where t.date >= current_date - interval '20 days'
group by date_trunc('day', t.date)
order by dte desc;
Notes:
I prefer using date_trunc() to casting to a string for removing the time component.
You don't need to use now() and convert to a date. You can just use current_date.
Converting a string to an interval seems awkward, when you can specify an interval using the interval keyword.
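Conditional counting in a single pass can be tried out locally without a PostgreSQL server. The sketch below uses Python's built-in `sqlite3` module, so it is SQLite rather than PostgreSQL, and it uses the portable `SUM(CASE ...)` form instead of `FILTER` or the `::int` cast. The table name `t` and column name `day` are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (day TEXT, value REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("2018-05-11", 0.20), ("2018-05-11", -0.12), ("2018-05-11", 0.15),
        ("2018-05-10", -1.20), ("2018-05-10", -0.70), ("2018-05-10", -0.16),
        ("2018-05-10", 0.07),
    ],
)

# Each SUM counts only the rows that satisfy its own condition.
rows = conn.execute("""
    SELECT day,
           SUM(CASE WHEN value > 0 THEN 1 ELSE 0 END) AS positives,
           SUM(CASE WHEN value < 0 THEN 1 ELSE 0 END) AS negatives
    FROM t
    GROUP BY day
    ORDER BY day DESC
""").fetchall()

for row in rows:
    print(row)
```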