Listing the hours between two timestamps and grouping by those hours - sql

I am trying to count the couriers that are active in every hour of a shift, using the start and end times of their shifts to create an array that I hope to group by. Firstly, when I run the query I get epoch times back; secondly, I am not able to group by the hours array.
Does anyone have any solutions that they would kindly share with me?
SELECT
GENERATE_TIMESTAMP_ARRAY(CAST(fss.start_time_local AS TIMESTAMP), CAST(fss.end_time_local AS TIMESTAMP) , INTERVAL 1 hour) as hours,
#COUNT(sys_scheduled_shift_id) AS number_schedule_shift,
FROM just-data-warehouse.delco_analytics_team_dwh.fact_scheduled_shifts AS fss
#GROUP BY hours
For your reference the shift data for the courier is structured like so

To calculate how many couriers have been active at least one minute in every hour I would do it like this:
SELECT
CALENDAR.datetime
,SUM(workers.flag_worker) as n_workers
FROM (
-- CALENDAR
SELECT
cast(datetime as datetime) datetime
FROM UNNEST(GENERATE_TIMESTAMP_ARRAY('2022-01-01T00:00:00', '2022-01-02T00:00:00'
,INTERVAL 1 hour)) AS datetime
) CALENDAR
-- TABLE of SHIFTS
LEFT JOIN (
SELECT * , 1 flag_worker FROM
UNNEST(
ARRAY<STRUCT<worker_id string , shift_start datetime, shift_end datetime>>[
('Worker_01', '2022-01-01T06:00:00','2022-01-01T14:00:00')
,('Worker_02', '2022-01-01T10:00:00','2022-01-01T18:00:00')
]
)
AS workers
)workers
ON CALENDAR.datetime < workers.shift_end
AND DATETIME_ADD(CALENDAR.datetime, INTERVAL 1 hour) > workers.shift_start
GROUP BY CALENDAR.datetime
The idea is to build a calendar of datetimes and then join it with a table of shifts.
Instead of hours, the calendar can be modified to have fractions of hours. Also, there may be a more elegant way to build the calendar.
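The calendar-join idea can also be sketched outside SQL. The following is a minimal Python illustration, not the answer's query: the function names are made up for this example, and the overlap test mirrors the ON clause above (a shift counts for an hour slot if it overlaps any part of it). Unlike the LEFT JOIN with SUM, empty hours come out as 0 rather than NULL.

```python
from datetime import datetime, timedelta

def hourly_calendar(start, end):
    """Return every hour start from `start` to `end`, inclusive."""
    hours = []
    current = start
    while current <= end:
        hours.append(current)
        current += timedelta(hours=1)
    return hours

def active_workers_per_hour(calendar, shifts):
    """Count the shifts that overlap each [hour, hour + 1h) slot."""
    counts = {}
    for hour in calendar:
        slot_end = hour + timedelta(hours=1)
        counts[hour] = sum(
            1
            for _, shift_start, shift_end in shifts
            # Same overlap condition as the SQL join.
            if hour < shift_end and slot_end > shift_start
        )
    return counts

# Same two sample shifts as in the query above.
shifts = [
    ("Worker_01", datetime(2022, 1, 1, 6), datetime(2022, 1, 1, 14)),
    ("Worker_02", datetime(2022, 1, 1, 10), datetime(2022, 1, 1, 18)),
]
calendar = hourly_calendar(datetime(2022, 1, 1), datetime(2022, 1, 2))
counts = active_workers_per_hour(calendar, shifts)
```

Changing the `timedelta(hours=1)` step to a fraction of an hour gives the finer-grained calendar mentioned above.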

Related

Simulate query over a range of dates

I have a fairly long query that looks over the past 13 weeks and determines if the current day's performance is an anomaly compared to the last 13 weeks. It just returns a single row that has the date, the performance of the current day and a flag saying if it is an anomaly or not. To make matters a little more complicated: The performance isn't just a single day but rather a running 24 hour window. This query is then run every hour to monitor the KPI over the last 24 hours. i.e. If it is 2pm on Tuesday, it will look from 2pm the previous day (Monday) to now, and compare it to every other 2pm-to-2pm for the last 13 weeks.
To test whether this code is working, I would like to simulate it running over the past month.
The code goes as follows:
WITH performance AS (
SELECT TRUNC(dateColumn - to_number(to_char(sysdate, 'hh24'))/24) AS startdate,
KPI_a,
KPI_b,
KPI_c
FROM table
WHERE someConditions
GROUP BY TRUNC(dateColumn - to_number(to_char(sysdate, 'hh24'))/24)),
compare_t AS (
-- looks at relationships of the KPIs
),
variables AS (
-- calculates the variables required for the anomaly detection
),
... I don't know how much of the query needs to be shown, but essentially I need to simulate 'sysdate': instead of using the current date, feed in each hour of the last month, so that the query runs approximately 720 times and returns 720 results, one for each hour of each day.
I'm thinking a FOR loop, but I'm not sure.
You can use a recursive subquery:
with times(time) as
(
select sysdate - interval '1' month as time from dual
union all
select time + interval '1' hour from times
where time < sysdate
)
, performance as ()
, compare_t as ()
, variables as ()
select *
from times
join ...
order by time;
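As a rough illustration of what the recursive `times` subquery provides, here is a small Python sketch that generates one "as of" timestamp per hour and evaluates a check for each. `window_for` is a hypothetical stand-in for the real anomaly query, and the dates are arbitrary example values.

```python
from datetime import datetime, timedelta

def hourly_run_times(start, end):
    """Return one timestamp per hour from `start` to `end`, inclusive."""
    times = []
    current = start
    while current <= end:
        times.append(current)
        current += timedelta(hours=1)
    return times

def simulate(check, start, end):
    """Run `check` once per simulated 'sysdate' and collect the results."""
    return [(as_of, check(as_of)) for as_of in hourly_run_times(start, end)]

def window_for(as_of):
    """Hypothetical stand-in for the real query: the 24-hour window it would examine."""
    return (as_of - timedelta(hours=24), as_of)

results = simulate(window_for, datetime(2024, 1, 1, 14), datetime(2024, 1, 31, 14))
```

The point is the same as in the SQL: the simulated time becomes a column (or here, an argument) instead of a call to the clock, so every hour can be evaluated in one pass.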
I don't understand your specific requirements but I had to solve similar problems. To give you an idea here are two proposals:
Calculate the average and standard deviation of the KPI value from 13 weeks ago up to yesterday. If today's current value is lower than "AVG - 10*STDDEV", select the record, i.e. mark it as an anomaly.
WITH t AS
(SELECT dateColumn, KPI_A,
AVG(KPI_A) OVER (ORDER BY dateColumn RANGE BETWEEN 13 * INTERVAL '7' DAY PRECEDING AND INTERVAL '1' DAY PRECEDING) AS REF_AVG,
STDDEV(KPI_A) OVER (ORDER BY dateColumn RANGE BETWEEN 13 * INTERVAL '7' DAY PRECEDING AND INTERVAL '1' DAY PRECEDING) AS REF_STDDEV
FROM TABLE
WHERE someConditions)
SELECT dateColumn, REF_AVG, KPI_A, REF_STDDEV
FROM t
WHERE TRUNC(dateColumn, 'HH') = TRUNC(LOCALTIMESTAMP, 'HH')
AND KPI_A < REF_AVG - 10 * REF_STDDEV;
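The threshold rule in this first proposal can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name and the sample numbers are invented, and `statistics.stdev` is the sample standard deviation.

```python
from statistics import mean, stdev

def is_low_anomaly(history, current, k=10):
    """Flag `current` if it falls below mean(history) - k * stdev(history)."""
    if len(history) < 2:
        # A sample standard deviation needs at least two reference values.
        return False
    threshold = mean(history) - k * stdev(history)
    return current < threshold

# Invented reference values standing in for the past 13 weeks.
history = [100, 102, 98, 101, 99]
```

With these reference values the threshold is roughly 84.2, so a current value of 80 would be flagged and 90 would not.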
Take the hourly values from last week (i.e. the same weekday as yesterday) and correlate them with the hourly values from yesterday. If the correlation is below a certain value (I use 95%), consider that day an anomaly.
WITH t AS
(SELECT dateColumn, KPI_A,
FIRST_VALUE(KPI_A) OVER (ORDER BY dateColumn RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW) AS KPI_A_LAST_WEEK,
dateColumn - FIRST_VALUE(dateColumn) OVER (ORDER BY dateColumn RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW) AS RANGE_INT
FROM table
WHERE ...)
SELECT 100*ROUND(CORR(KPI_A, KPI_A_LAST_WEEK), 2) AS CORR_VAL
FROM t
WHERE KPI_A_LAST_WEEK IS NOT NULL
AND RANGE_INT = INTERVAL '7' DAY
AND TRUNC(dateColumn) = TRUNC(LOCALTIMESTAMP - INTERVAL '1' DAY)
GROUP BY TRUNC(dateColumn);
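For the second proposal, a minimal Python sketch of the correlation check might look like the following. It computes a plain Pearson coefficient by hand; the function names and the 0.95 default are assumptions for illustration, and a constant series (where the coefficient is undefined) is reported as `None` rather than as an anomaly.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length series, or None if undefined."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant series has no defined correlation
    return cov / sqrt(var_x * var_y)

def is_anomalous_day(yesterday, last_week, min_corr=0.95):
    """True if the hourly values correlate less than `min_corr` with last week's."""
    r = pearson(yesterday, last_week)
    return r is not None and r < min_corr
```

Each list would hold the hourly KPI values for one day, paired by hour of day.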

How can I cross join the following query results with a table of dates

I am looking for a query that gives me the daily playing time. The start (first_date) and end date (last_update) are given as shown in the table. The following query gives me the sum of playing time on a given date. How can I extend it to produce a table from the first day to the last day, fill the query data into it, and show 0 on dates when no game is played?
SELECT startTime, SUM(duration) as sum
FROM myTable
WHERE startTime = endTime
GROUP BY startTime
To show the dates when no one played, you will need to create a table days with a date field day so you can do a left join (100 years is only 36500 rows).
Using select Generate days from date range
This uses a stored procedure in MSQL.
I will assume that if a play passes midnight a new record begins, so I can simplify my code and remove the time from the datetime field:
SELECT d.day, COALESCE(SUM(m.duration), 0) as sum
FROM
days d
left join myTable m
on CONVERT(date, m.starttime) = d.day
GROUP BY d.day
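The effect of joining against a table of days can be sketched in Python as pre-filling every day with 0 and then adding the durations. This is an illustration only; the function name and the sample sessions are invented, and each session is assumed to be keyed by its start day.

```python
from datetime import date, timedelta

def daily_totals(sessions, first_day, last_day):
    """Sum durations per day, with 0 for days that have no sessions."""
    totals = {}
    day = first_day
    while day <= last_day:
        totals[day] = 0  # the "days" table: one entry per calendar day
        day += timedelta(days=1)
    for start_day, duration in sessions:
        if start_day in totals:
            totals[start_day] += duration
    return totals

# Invented sample data: (start day, duration).
sessions = [
    (date(2014, 4, 1), 30),
    (date(2014, 4, 1), 15),
    (date(2014, 4, 3), 60),
]
totals = daily_totals(sessions, date(2014, 4, 1), date(2014, 4, 4))
```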
If I understand correctly, you could try:
SELECT SUM(duration) AS duration, date
FROM myTable
WHERE date <= 20140430
AND date >= 20140401
GROUP BY date
This would get the total time played for each date between April 1 and April 30.
As for showing 0 for dates not in the table, I don't know.
Also, the table you posted doesn't show a duration column, but the query you posted does, so I went ahead and used it.

Custom Postgres function for term dates and times

Let's say I have a large table that consists of just three columns:
Integer id,
timestamp ts,
double value
If I wanted to get the values matching a complicated date expression, what is the best way to achieve that?
For example, suppose I wanted all the values at any time on weekend days, only between 18:00 and 8:00 on weekdays, and at any time on school holidays for the year 2014.
Obviously some of these times are variable, so the solution should be dynamic. I was thinking of storing a series of date intervals for things like school holidays in another table to check against. However, I would like to create a custom Postgres function to hide some of the complexity.
Does anyone know of similar code or have suggestions, especially for handling cases like the "except on weekends" logic for the times above?
Thanks
With a holiday table
select *
from
t
left join
holiday on date_trunc('day', t.ts) = holiday.day
where
extract(dow from ts) in (0, 6) -- Weekend
or
(extract(hour from ts) >= 18 or extract(hour from ts) < 8) -- 18:00 to 8:00 wraps past midnight
or
holiday.day is not null -- Holiday
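The filter can be sketched as a plain predicate, which may make the logic easier to check before wrapping it in a function. This Python version is an illustration, not Postgres code; `is_selected` and the holiday set are hypothetical. Note that a window from 18:00 to 8:00 wraps past midnight, so the two hour comparisons are combined with `or`.

```python
from datetime import date, datetime

def is_selected(ts, holidays):
    """Weekend, or outside 08:00-18:00, or on a listed holiday."""
    is_weekend = ts.weekday() >= 5            # Saturday = 5, Sunday = 6
    is_off_hours = ts.hour >= 18 or ts.hour < 8
    is_holiday = ts.date() in holidays
    return is_weekend or is_off_hours or is_holiday

# Hypothetical holiday list standing in for the holiday table.
holidays = {date(2014, 1, 6)}
```

For example, a Monday at noon is rejected unless that Monday is in the holiday set, while any timestamp on a Saturday is accepted.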

tackling the building of a complex query

I have intermediate SQL skills, but this is the most complex query I've ever attempted.
My goal is to build a query that will show how many minutes of any given day each of a set of 6 drives is in use or idle. Drives that are 'in use' are writing backups to tape, i.e. running a job. A drive can handle only one job at a time. A drive may start and end a job on the same day, or start one day and end 2 days later if it's a big job. The most important thing is that I be able to report the number of minutes EACH drive is either UP or IDLE (both are important), and to report only the minutes it worked on the respective day, even if the job carried into the next.
The complexity results from the following:
I can't just subtract start time from end time and SUM the elapsed time of all jobs run by a particular drive, because many jobs span midnight, and I must assign the minutes worked to the day on which they occurred. I.e. I can't report that a drive performed 50 hours of work in a 24-hour period just because the end time of the job was 2 days out.
The start time and end time columns are in UTC and must be converted to PST.
I need placeholders for the minutes of the day when any one of the drives is idle, so that I can show up/idle time for each of the drives.
The tables I need to put together are just two:
a Time calendar table. It has a row for each minute of the day starting with 10-10-2009 through 10-07-2021.
a table containing the start and end times of all jobs that have completed, the names of the drives that ran them, and the names of the jobs.
Here's DDL for a calendar table containing a row for every minute of the day since 2009 through 2014.
WITH e1(min) AS(
SELECT * FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1))x(n)
),
e2(min) AS(
SELECT e1.min FROM e1, e1 x
),
e4(min) AS(
SELECT e2.min FROM e2, e2 x
),
e8(min) AS(
SELECT e4.min FROM e4, e4 x
),
cteTally(min) AS(
SELECT TOP 6307204 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1
FROM e8
),
Test(min) AS(
SELECT DATEADD( minute, min, DATEADD( YEAR, -2, GETDATE()))
FROM cteTally)
SELECT DATEADD( MINUTE, DATEDIFF( MINUTE, 0, min), 0)
FROM Test
WHERE min <= DATEADD( YEAR, 10, GETDATE())
Here’s sample DDL for table containing the device/job/start & end times.
CREATE TABLE JobHistorySummary
(JobName nvarchar(255),
ActualStartTime datetime,
EndTime datetime,
DeviceName nvarchar(128))
INSERT INTO JobHistorySummary
VALUES
('FOAMTools E: Weekly - FULL', '2013-08-04 03:20:00.000', '2013-08-04 20:20:00.000', '1 Drv'),
('HRDuplex D: Weekly - FULL', '2013-08-04 18:26:00.000', '2013-08-05 13:00:00.000', '2 Drv'),
('HRDuplex D: Daily - INC', '2013-08-04 20:44:00.000', '2013-08-05 15:50:00.000', '1 Drv'),
('PayNROLL C: Weekly - FULL', '2013-08-04 00:00:00.000', '2013-08-06 15:40:00.000','3 Drv'),
('PayNROLL C: Daily - INC', '2013-08-05 06:30:00.000', '2013-08-05 06:50:00.000', '4 Drv'),
('SmallIBM F: Daily - FULL', '2013-08-04 00:30:00.000', '2013-08-04 06:30:00.000', '5 Drv'),
('BigIBM F: Daily - INC', '2013-08-06 12:30:00.000', '2013-08-06 12:50:00.000', '6 Drv');
The calculation needed to get local time is [ActualStartTime] + (GETDATE() - GETUTCDATE())
Even though I just need two tables, I can't figure out the logic of joining them so that they create NULL placeholders for those datetimes where drives are idle. I would like to count up the rows with NULL values as the idle minutes per drive. Also, I can't figure out how to isolate minutes of usage to the day in which they occurred...meaning no more than 1440 minutes of work per day per drive, even for jobs spanning midnight. Minutes of the next day are allocated as minutes worked by respective drive to the following day.
The following shows how to generate the time in use per day per device. The idle time is just 24 hours minus that. Assume you have generated a table with all the dates in the range of the table (easy to do); let's call it cal, with one field dt of type Date. The following gives the general approach.
select devicename, cal.dt,
sum(least(endtime, timestamp(cal.dt) + interval 1 day) - greatest(actualstarttime, timestamp(cal.dt)))
from JobHistorySummary jhs inner join cal
on (jhs.actualstarttime < timestamp(cal.dt) + interval 1 day and jhs.endtime > timestamp(cal.dt))
group by devicename, cal.dt
Now use the same statement with the converted times, assuming cal is also in the converted time zone.
select devicename, cal.dt,
sum(least(convert_tz(endtime,'UTC','US/Pacific'), timestamp(cal.dt) + interval 1 day)
- greatest(convert_tz(actualstarttime,'UTC','US/Pacific'), timestamp(cal.dt)))
from JobHistorySummary jhs inner join cal
on (convert_tz(actualstarttime,'UTC','US/Pacific') < timestamp(cal.dt) + interval 1 day
and convert_tz(endtime,'UTC','US/Pacific') > timestamp(cal.dt))
group by devicename, cal.dt
But the above isn't exactly right either, because MySQL does not subtract and sum time values correctly. So you need to use something more like:
select devicename, cal.dt,
sec_to_time(sum(time_to_sec(timediff(least(endtime, timestamp(cal.dt) + interval 1 day), greatest(actualstarttime, timestamp(cal.dt))))))
from JobHistorySummary jhs inner join cal
on (jhs.actualstarttime < timestamp(cal.dt) + interval 1 day and jhs.endtime > timestamp(cal.dt))
group by devicename, cal.dt
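To make the per-day allocation concrete, here is a small Python sketch that splits each job at midnight and adds the minutes to the day they fall on. It is an illustration under stated assumptions, not the answer's SQL: the function names are invented, the times are assumed to be already converted to local time, and a day is assumed to have exactly 1440 minutes (no daylight-saving adjustment). The three jobs are taken from the sample data in the question.

```python
from collections import defaultdict
from datetime import datetime, time, timedelta

def busy_minutes_per_day(jobs):
    """Map (device, day) to the minutes that device spent running jobs that day."""
    busy = defaultdict(int)
    for device, start, end in jobs:
        cursor = start
        while cursor < end:
            # Cut the job at the next midnight so no day receives more than its share.
            next_midnight = datetime.combine(cursor.date(), time.min) + timedelta(days=1)
            piece_end = min(end, next_midnight)
            busy[(device, cursor.date())] += int((piece_end - cursor).total_seconds() // 60)
            cursor = piece_end
    return dict(busy)

def idle_minutes(busy_minutes):
    """Idle time is whatever remains of a 1440-minute day."""
    return 1440 - busy_minutes

jobs = [
    ("1 Drv", datetime(2013, 8, 4, 3, 20), datetime(2013, 8, 4, 20, 20)),
    ("1 Drv", datetime(2013, 8, 4, 20, 44), datetime(2013, 8, 5, 15, 50)),
    ("3 Drv", datetime(2013, 8, 4, 0, 0), datetime(2013, 8, 6, 15, 40)),
]
busy = busy_minutes_per_day(jobs)
```

Days on which a device has no entry at all are fully idle; a calendar of days per device (as in the join above) would be needed to list those explicitly.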

Time range- Sql

Please help me with my problem. I have a table named 'RATES' which contains these columns:
id (int)
rate (money)
start_time (datetime)
end_time(datetime)
example data:
1 150 8:00am 6:00pm
2 200 6:00pm 4:00am
3 250 8:00am 4:00am (the next day)
What I have to do is select all the ids whose time range contains a given time.
e.g. given time: 9:00 pm, the output should be 2, 3
The problem is that one time range runs from 8am to 4am the next day, and I don't know how to handle it. Help, please! Thanks in advance :D
Assuming that @Andriy M is correct:
Data never spans more than 24 hours
if end_time<=start_time then end_time belongs to the next day
then what you're looking for is this:
Declare @GivenTime DateTime
Set @GivenTime = '9:00 PM'
Select ID
From Rates
Where (Start_Time<End_Time And Start_Time<=@GivenTime And End_Time>=@GivenTime)
Or (Start_Time=End_Time And Start_Time=@GivenTime)
Or (Start_Time>End_Time And (Start_Time<=@GivenTime Or End_Time>=@GivenTime))
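The three cases can be checked with a short Python sketch using the example rows from the question. It relies on the same two assumptions stated above (no range longer than 24 hours; an end time at or before the start time belongs to the next day). The function names are invented for illustration.

```python
from datetime import time

def contains(start, end, given):
    """True if `given` falls inside a time-of-day range that may wrap past midnight."""
    if start < end:
        return start <= given <= end      # ordinary same-day range
    if start == end:
        return given == start             # degenerate single-instant range
    return given >= start or given <= end  # range wraps past midnight

def matching_ids(rates, given):
    """Return the ids of all rates whose range contains `given`."""
    return [rate_id for rate_id, start, end in rates if contains(start, end, given)]

# Example data from the question: (id, start_time, end_time).
rates = [
    (1, time(8, 0), time(18, 0)),
    (2, time(18, 0), time(4, 0)),
    (3, time(8, 0), time(4, 0)),
]
```

For a wrapped range, the given time matches if it is after the start or before the end, which is why the last case uses `or` rather than `and`.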
I don't really ever use MS SQL, but maybe this will help.
I was going to suggest something like this, but by the way you have your data set up, this would fail.
SELECT id FROM RATES
WHERE datepart(hh, start_time) <= 9 AND datepart(hh, end_time) >= 9;
You'll have to search using the actual date if you expect to get the correct data back.
SELECT id FROM RATES
WHERE start_time <= '2011-1-1 9:00' AND end_time >= '2011-1-1 9:00';
This may not be exactly correct, but it may help you look in the right direction.
I guess @gbn is not going to help you. I will try and fill in.
Given -- a table called timedata that has ranges only going over at most one day
WITH normalized AS
(
SELECT *
FROM timedata
WHERE datepart(day,start_time) = datepart(day,end_time)
UNION ALL
SELECT id, rate, start_time, dateadd(second,-1,dateadd(day,datediff(day,0,end_time),0)) as end_time
FROM timedata
WHERE not (datepart(day,start_time) = datepart(day,end_time))
UNION ALL
SELECT id, rate, dateadd(day,datediff(day,0,end_time),0) as start_time, end_time
FROM timedata
WHERE not (datepart(day,start_time) = datepart(day,end_time))
)
SELECT *
FROM normalized
WHERE datepart(hour,start_time) < @inhour
AND datepart(hour,end_time) > @inhour
This makes use of a CTE and a trick to truncate datetime values. To understand this trick read this question and answer: Floor a date in SQL server
Here is an outline of what this query does:
Create a normalized table in which each time span covers only one day, by:
Selecting all rows that start and end on the same day.
Then, for each row that spans two days, adding in (via UNION ALL):
The start_time, with one second before the next day as the end time.
and
Midnight of the end_time date as the start time, with the original end_time.
Finally, perform the select using the hour indicator on this normalized table.
If your ranges go over more than one day you would need to use a recursive CTE to get the same normalized table.