Fetch empty/open time ranges in postgresql - sql

I'm trying to find the open (unassigned) time ranges in a day, where:
The first shift starts at 6 AM
The last shift ends at 12 AM
For example, given the following data for one day:
start_time | end_time
-----------|---------
9 AM | 3 PM
5 PM | 10 PM
Expected results:
start_time | end_time
-----------|---------
6 AM | 9 AM
3 PM | 5 PM
10 PM | 12 AM
Here's what I tried, but it's not working (I know it's probably far from the correct answer):
SELECT *
FROM WORKERS_SCHEDULE
WHERE START_TIME not BETWEEN
ANY (SELECT START_TIME FROM WORKERS_SCHEDULE)
AND (SELECT START_TIME FROM WORKERS_SCHEDULE)
start_time and end_time are of datatype TIME.

Here is one way to do it with union all and window functions:
select *
from (
select '06:00:00'::time start_time, min(start_time) end_time from mytable
union all
select end_time, lead(start_time) over(order by start_time) from mytable
union all
select max(end_time), '23:59:59'::time from mytable
) t
where start_time <> end_time
It is a bit complicated to explain thoroughly, but: the first unioned query computes the interval between 6 AM and the start of the first shift, the second pairs each shift's end with the start of the next shift (via lead()), and the last one handles the interval between the end of the last shift and midnight. The outer query then keeps only the rows that represent an actual gap; the row for the last shift, where lead() returns null, is dropped by the same filter. To see how it works, run the subquery on its own and look at how the starts and ends line up.
Demo on DB Fiddle:
start_time | end_time
:--------- | :-------
06:00:00 | 09:00:00
15:00:00 | 17:00:00
22:00:00 | 23:59:59
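The same logic can be sketched outside SQL. The Python snippet below is only an illustration, assuming the shifts do not overlap and fall inside the 06:00 to 23:59:59 window; the function name open_ranges and its parameters are hypothetical and not part of the answer above.

```python
from datetime import time

def open_ranges(shifts, day_start=time(6, 0), day_end=time(23, 59, 59)):
    """Return the gaps between shifts as (start, end) tuples.

    Assumes shifts are (start, end) pairs that do not overlap.
    """
    gaps = []
    cursor = day_start  # end of the previous shift, or the start of the day
    for start, end in sorted(shifts):
        if start > cursor:  # skip zero-length gaps, like the outer WHERE filter
            gaps.append((cursor, start))
        cursor = end
    if cursor < day_end:  # gap between the last shift and the end of the day
        gaps.append((cursor, day_end))
    return gaps

# Sample data from the question.
shifts = [(time(9, 0), time(15, 0)), (time(17, 0), time(22, 0))]
for start, end in open_ranges(shifts):
    print(start, "|", end)
```

With the sample data from the question this prints the same three ranges as the demo output.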

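Try this and see if it works for you:
SELECT CASE
         WHEN s.start_time = (SELECT MIN(start_time) FROM workers_schedule)
              AND s.start_time > time '06:00'
           THEN '6:00 AM - ' || s.start_time::text
         -- otherwise the gap starts at the latest shift end before this shift
         ELSE (SELECT MAX(p.end_time) FROM workers_schedule p
               WHERE p.end_time < s.start_time)::text || ' - ' || s.start_time::text
       END AS open_shift
FROM workers_schedule s
UNION
SELECT MAX(end_time)::text || ' - 12:00 AM' FROM workers_schedule;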

This is another way:
WITH RECURSIVE open_shifts AS (
SELECT time '6:00' AS START_TIME, MIN(START_TIME) AS END_TIME
FROM WORKERS_SCHEDULE
WHERE START_TIME BETWEEN time '6:00' AND time '23:59'
UNION
SELECT start_gap.START_TIME AS START_TIME, end_gap.END_TIME AS END_TIME FROM WORKERS_SCHEDULE end_gap,
(SELECT ws.END_TIME AS START_TIME
FROM WORKERS_SCHEDULE ws, open_shifts prev_gap
WHERE ws.START_TIME = prev_gap.END_TIME) start_gap
WHERE end_gap.END_TIME > start_gap.START_TIME
AND END_TIME BETWEEN time '6:00' AND time '23:59'
)
SELECT * FROM open_shifts
UNION
SELECT MAX(END_TIME) AS START_TIME, time '23:59' AS END_TIME FROM WORKERS_SCHEDULE
WHERE END_TIME BETWEEN time '6:00' AND time '23:59';
This uses a recursive CTE to find the gap between each shift's end time and the next closest shift's start time. It probably won't work with overlapping shifts, though.
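If overlapping shifts are a concern, one option is to merge them into disjoint ranges before looking for gaps. The Python sketch below illustrates that idea under stated assumptions; merge_shifts, gaps_between and the sample data are hypothetical, and it is not a translation of the recursive CTE above.

```python
from datetime import time

def merge_shifts(shifts):
    """Merge overlapping or touching (start, end) shifts into disjoint ranges."""
    merged = []
    for start, end in sorted(shifts):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def gaps_between(shifts, day_start=time(6, 0), day_end=time(23, 59)):
    """Return the parts of [day_start, day_end] not covered by any shift."""
    gaps, cursor = [], day_start
    for start, end in merge_shifts(shifts):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < day_end:
        gaps.append((cursor, day_end))
    return gaps

# Made-up data: the second shift overlaps the first.
overlapping = [
    (time(9, 0), time(15, 0)),
    (time(14, 0), time(16, 0)),
    (time(17, 0), time(22, 0)),
]
print(gaps_between(overlapping))
```

Merging first keeps the gap search itself simple: after the merge, every range starts strictly after the previous one ends.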

Related

Explode time duration defined by start and end timestamp by the hour

I have a table with work shifts (1 row per shift) that include date, start and end time.
Main goal: I want to aggregate the number of working hours per hour per store.
This is what my shift table looks like:
employee_id | store | start_timestamp | end_timestamp
------------|-------|------------------|-----------------
1 | 1 | 2022-01-01T07:00 | 2022-01-01T11:30
2 | 1 | 2022-01-01T08:30 | 2022-01-01T12:30
... | ... | ... | ...
I want to "explode" the information into a table something like this:
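hour | employee_id | store | date | scheduled_work (h)
------|-------------|-------|------------|-------------------
07:00 | 1 | 1 | 2022-01-01 | 1
08:00 | 1 | 1 | 2022-01-01 | 1
09:00 | 1 | 1 | 2022-01-01 | 1
10:00 | 1 | 1 | 2022-01-01 | 1
11:00 | 1 | 1 | 2022-01-01 | 0.5
08:00 | 2 | 1 | 2022-01-01 | 0.5
09:00 | 2 | 1 | 2022-01-01 | 1
10:00 | 2 | 1 | 2022-01-01 | 1
11:00 | 2 | 1 | 2022-01-01 | 1
12:00 | 2 | 1 | 2022-01-01 | 0.5
... | ... | ... | ... | ...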
I tried an approach based on a cross join, but it consumed a lot of memory. It looks like this:
with test as (
select 1 as employee_id, 1 as store_id, timestamp('2022-01-01 07:00:00') as start_timestamp, timestamp('2022-01-01 11:30:00') as end_timestamp union all
select 2 as employee_id, 1 as store_id, timestamp('2022-01-01 08:30:00') as start_timestamp, timestamp('2022-01-01 12:30:00') as end_timestamp
)
, cte as (
select ts
, test.*
, safe_divide(
timestamp_diff(
least(date_add(ts, interval 1 hour), end_timestamp)
, greatest(ts, start_timestamp)
, millisecond
)
, 3600000
) as scheduled_work
from test
cross join unnest(generate_timestamp_array(timestamp('2022-01-01 07:00:00'),
timestamp('2022-01-01 12:30:00'), interval 1 hour)) as ts
order by employee_id, ts)
select * from cte
where scheduled_work >= 0;
It's working but I know this will not be good when the number of shifts starts to add up. Does anyone have another solution that is more efficient?
I'm using BigQuery.
You might want to remove the ORDER BY inside the cte subquery; it will hurt query performance.
Here is another, similar approach:
WITH test AS (
select 1 as employee_id, 1 as store_id, timestamp('2022-01-01 07:00:00') as start_timestamp, timestamp('2022-01-01 11:30:00') as end_timestamp union all
select 2 as employee_id, 1 as store_id, timestamp('2022-01-01 08:30:00') as start_timestamp, timestamp('2022-01-01 12:30:00') as end_timestamp
),
explodes AS (
SELECT employee_id, store_id, EXTRACT(DATE FROM h) date, TIME_TRUNC(EXTRACT(TIME FROM h), HOUR) hour, 1 AS scheduled_work
FROM test,
UNNEST (GENERATE_TIMESTAMP_ARRAY(
TIMESTAMP_TRUNC(start_timestamp + INTERVAL 1 HOUR, HOUR),
TIMESTAMP_TRUNC(end_timestamp - INTERVAL 1 HOUR, HOUR), INTERVAL 1 HOUR
)) h
UNION ALL
SELECT employee_id, store_id, EXTRACT(DATE FROM h), TIME_TRUNC(EXTRACT(TIME FROM h), HOUR),
CASE offset
WHEN 0 THEN 1 - (EXTRACT(MINUTE FROM h) * 60 + EXTRACT(SECOND FROM h)) / 3600
WHEN 1 THEN (EXTRACT(MINUTE FROM h) * 60 + EXTRACT(SECOND FROM h)) / 3600
END
FROM test, UNNEST([start_timestamp, end_timestamp]) h WITH OFFSET
)
SELECT * FROM explodes WHERE scheduled_work > 0;
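The underlying idea, generating only the clock hours that each shift actually touches instead of cross joining every shift with one global hour array, can be sketched in Python. This is only an illustration under that assumption; explode_by_hour is a hypothetical name and the snippet is not BigQuery code.

```python
from datetime import datetime, timedelta

def explode_by_hour(start, end):
    """Yield (hour_start, hours_worked) for each clock hour a shift touches."""
    hour = start.replace(minute=0, second=0, microsecond=0)
    while hour < end:
        next_hour = hour + timedelta(hours=1)
        # Overlap between the shift and this clock hour.
        worked = min(end, next_hour) - max(start, hour)
        yield hour, worked.total_seconds() / 3600
        hour = next_hour

# Second sample shift from the question: 08:30 to 12:30.
shift = (datetime(2022, 1, 1, 8, 30), datetime(2022, 1, 1, 12, 30))
for hour, worked in explode_by_hour(*shift):
    print(hour.strftime("%H:%M"), worked)
```

Because the loop stops as soon as the hour start reaches the shift end, a shift that ends exactly on the hour produces no zero-length row.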
Consider below approach
with temp as (
select * replace(
parse_time('%H:%M', start_time) as start_time,
parse_time('%H:%M', end_time) as end_time
)
from your_table
)
select * except(start_time, end_time),
case
when hour = time_trunc(start_time, hour) then (60 - time_diff(start_time, hour, minute)) / 60
when hour = time_trunc(end_time, hour) then time_diff(end_time, hour, minute) / 60
else 1
end as scheduled_work
from (
select time_add(time_trunc(start_time, hour), interval delta hour) as hour,
employee_id, store, date, start_time, end_time
from temp, unnest(generate_array(0,time_diff(end_time, start_time, hour))) delta
)
order by employee_id, hour
if applied to sample data as in your question
output is

Oracle generating schedule rows with an interval

I have some SQL that generates a row for every 5 minutes. How can it be modified to get rid of the overlapping times (see below)?
Note: each row should be associated with a location_id, with no location_id repeated. In this case 25 rows should be generated, so the CONNECT BY limit should be something like SELECT count(*) FROM locations.
My goal is to create a function that takes a schedule_id and a start_date in the format 'MMDDYYYY HH24:MI' and stops creating rows if the next entry would cross midnight; that means some of the location_ids may not be used.
The end result is to have the rows placed in the schedule table below. Since I don't have a function yet, the schedule_id can be hard-coded to 1. I've heard about recursive CTEs; would this qualify for that method?
Thanks in advance to all who answer, and for your expertise.
ALTER SESSION SET NLS_DATE_FORMAT = 'MMDDYYYY HH24:MI:SS';
create table schedule(
schedule_id NUMBER(4),
location_id number(4),
start_date DATE,
end_date DATE,
CONSTRAINT start_min check (start_date=trunc(start_date,'MI')),
CONSTRAINT end_min check (end_date=trunc(end_date,'MI')),
CONSTRAINT end_gt_start CHECK (end_date >= start_date),
CONSTRAINT same_day CHECK (TRUNC(end_date) = TRUNC(start_date))
);
CREATE TABLE locations AS
SELECT level AS location_id,
'Door ' || level AS location_name,
CASE round(dbms_random.value(1,3))
WHEN 1 THEN 'A'
WHEN 2 THEN 'T'
WHEN 3 THEN 'G'
END AS location_type
FROM dual
CONNECT BY level <= 25;
with
row_every_5_mins as
( select trunc(sysdate) + (rownum-1)*5/1440 t_from,
trunc(sysdate) + rownum*5/1440 t_to
from dual
connect by level <= 1440/5
) SELECT * from row_every_5_mins;
Current output:
|T_FROM|T_TO|
|-----------------|-----------------|
|08162021 00:00:00|08162021 00:05:00|
|08162021 00:05:00|08162021 00:10:00|
|08162021 00:10:00|08162021 00:15:00|
|08162021 00:15:00|08162021 00:20:00|
…
Desired output
|T_FROM|T_TO|
|-----------------|-----------------|
|08162021 00:00:00|08162021 00:05:00|
|08162021 00:10:00|08162021 00:15:00|
|08162021 00:20:00|08162021 00:25:00|
…
You can avoid a recursive query or a loop, because all you essentially need is a row number for each row in the locations table. You only have to give the analytic function an appropriate sort order. Below is the query:
with a as (
select
date '2021-01-01'
+ to_dsinterval('0 23:30:00')
as start_dt_param
from dual
)
, date_gen as (
select
location_id
, start_dt_param
, start_dt_param + (row_number() over(order by location_id) - 1)
* interval '10' minute as start_dt
, start_dt_param + (row_number() over(order by location_id) - 1)
* interval '10' minute + interval '5' minute as end_dt
from a
cross join locations
)
select
location_id
, start_dt
, end_dt
from date_gen
where end_dt < trunc(start_dt_param + 1)
LOCATION_ID | START_DT | END_DT
----------: | :------------------ | :------------------
1 | 2021-01-01 23:30:00 | 2021-01-01 23:35:00
2 | 2021-01-01 23:40:00 | 2021-01-01 23:45:00
3 | 2021-01-01 23:50:00 | 2021-01-01 23:55:00
UPD:
If you want a procedure, it is even simpler: since 12c Oracle supports the FETCH FIRST clause, and the analytic function can be replaced with the ROWNUM pseudocolumn:
create or replace procedure populate_schedule (
p_schedule_id in number
, p_start_date in date
) as
begin
insert into schedule (schedule_id, location_id, start_date, end_date)
select
p_schedule_id
, location_id
, p_start_date + (rownum - 1) * interval '10' minute
, p_start_date + (rownum - 1) * interval '10' minute + interval '5' minute
from locations
/*Put your order of location assignment here*/
order by location_id
/*The number of 10-minute intervals before midnight from the first end_date*/
fetch first ((trunc(p_start_date + 1) - p_start_date + 1/24/60*5)*24*60/10) rows only
;
commit;
end;
/
begin
populate_schedule(1, timestamp '2020-01-01 23:37:00');
populate_schedule(2, timestamp '2020-01-01 23:35:00');
populate_schedule(3, timestamp '2020-01-01 23:33:00');
end;
/
select *
from schedule
order by schedule_id, start_date
SCHEDULE_ID | LOCATION_ID | START_DATE | END_DATE
----------: | ----------: | :------------------ | :------------------
1 | 1 | 2020-01-01 23:37:00 | 2020-01-01 23:42:00
1 | 2 | 2020-01-01 23:47:00 | 2020-01-01 23:52:00
2 | 1 | 2020-01-01 23:35:00 | 2020-01-01 23:40:00
2 | 2 | 2020-01-01 23:45:00 | 2020-01-01 23:50:00
2 | 3 | 2020-01-01 23:55:00 | 2020-01-02 00:00:00
3 | 1 | 2020-01-01 23:33:00 | 2020-01-01 23:38:00
3 | 2 | 2020-01-01 23:43:00 | 2020-01-01 23:48:00
3 | 3 | 2020-01-01 23:53:00 | 2020-01-01 23:58:00
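The row-count arithmetic in the FETCH FIRST expression can be checked against a plain loop. The Python sketch below is only an illustration: slots_before_midnight and row_limit are hypothetical names, the number of locations is ignored, and it assumes (as the output above suggests) that a fractional row count is truncated.

```python
from datetime import datetime, timedelta

def next_midnight(start):
    """Midnight at the end of the day that contains `start`."""
    return (start + timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)

def slots_before_midnight(start, slot=timedelta(minutes=5), step=timedelta(minutes=10)):
    """Return (start, end) slots every `step`, each `slot` long, ending by midnight."""
    midnight = next_midnight(start)
    slots = []
    current = start
    while current + slot <= midnight:
        slots.append((current, current + slot))
        current += step
    return slots

def row_limit(start):
    """Same arithmetic as the FETCH FIRST expression, truncated to a whole number."""
    minutes_left = (next_midnight(start) - start).total_seconds() / 60
    return int((minutes_left + 5) / 10)

for minute in (37, 35, 33):
    start = datetime(2020, 1, 1, 23, minute)
    print(start.time(), len(slots_before_midnight(start)), row_limit(start))
```

For the three start times used above, both the loop and the formula give 2, 3 and 3 rows, matching the number of rows shown per schedule_id.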
Just loop every 10 minutes instead of every 5 minutes:
WITH input (start_time) AS (
SELECT TRUNC(SYSDATE) + INTERVAL '23:30' HOUR TO MINUTE FROM DUAL
)
SELECT start_time + (LEVEL-1) * INTERVAL '10' MINUTE
AS t_from,
start_time + (LEVEL-1) * INTERVAL '10' MINUTE + INTERVAL '5' MINUTE
AS t_to
FROM input
CONNECT BY (LEVEL-1) * INTERVAL '10' MINUTE < INTERVAL '1' DAY
AND LEVEL <= (SELECT COUNT(*) FROM locations)
AND start_time + (LEVEL-1) * INTERVAL '10' MINUTE < TRUNC(start_time) + INTERVAL '1' DAY;
A CTE is certainly the fastest solution. If you would like more flexibility for the intervals, you can use the DBMS_SCHEDULER calendaring syntax instead. The drawback is that performance might be weaker.
CREATE OR REPLACE TYPE TimestampRecType AS OBJECT (
T_FROM TIMESTAMP(0),
T_TO TIMESTAMP(0)
);
CREATE OR REPLACE TYPE TimestampTableType IS TABLE OF TimestampRecType;
CREATE OR REPLACE FUNCTION GetGchedule(
start_time IN TIMESTAMP,
stop_time in TIMESTAMP DEFAULT TRUNC(SYSDATE)+1)
RETURN TimestampTableType AS
ret TimestampTableType := TimestampTableType();
return_date_after TIMESTAMP := start_time;
next_run_date TIMESTAMP ;
BEGIN
LOOP
DBMS_SCHEDULER.EVALUATE_CALENDAR_STRING('FREQ=MINUTELY;INTERVAL=5;', NULL, return_date_after, next_run_date);
ret.EXTEND;
ret(ret.LAST) := TimestampRecType(return_date_after, next_run_date);
return_date_after := next_run_date;
EXIT WHEN next_run_date >= stop_time;
END LOOP;
RETURN ret;
END;
SELECT *
FROM TABLE(GetGchedule(trunc(sysdate)));
See syntax for calendar here: Calendaring Syntax

BigQuery - Query for each and set elements in column

I would like to loop over several elements for a query.
Here is the query :
SELECT
timestamp_trunc(timestamp, DAY) as Day,
count(1) as Number
FROM `table`
WHERE user_id="12345" AND timestamp >= '2021-07-05 00:00:00 UTC' AND timestamp <= '2021-07-08 23:59:59 UTC'
GROUP BY 1
ORDER BY Day
This gives me, for the user "12345", a row count for each day between two dates, which is perfect.
But I would like to run this query for every user_id in my table,
and if possible I would like each day as a column, so that each row is a user and each column (a day) holds that user's count.
Desired result:
User | 2021-07-05 | 2021-07-06 | 2021-07-07
---------------------------------------------
user_1 | 345 | 16 | 41
user_2 | 555 | 53 | 26
Thank you very much
Use below approach
SELECT * FROM (
SELECT
user_id,
DATE(timestamp) as Day,
COUNT(1) as Number
FROM `project.dataset.table`
WHERE timestamp >= '2021-07-05 00:00:00 UTC' AND timestamp <= '2021-07-08 23:59:59 UTC'
GROUP BY 1, 2
)
PIVOT (SUM(Number) FOR Day IN ('2021-07-05','2021-07-06','2021-07-07'))
Or even simpler (w/o GROUP BY as in your original query)
SELECT * FROM (
SELECT
user_id,
DATE(timestamp) as Day,
FROM `project.dataset.table`
WHERE timestamp >= '2021-07-05 00:00:00 UTC' AND timestamp <= '2021-07-08 23:59:59 UTC'
)
PIVOT (COUNT(*) FOR Day IN ('2021-07-05','2021-07-06','2021-07-07'))
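To make the shape of the pivoted result concrete, here is a small Python sketch of the same reshaping: count rows per (user, day), then spread the days across columns. It is only an illustration with made-up events; pivot_counts is a hypothetical helper, not BigQuery code.

```python
from collections import Counter
from datetime import datetime

def pivot_counts(rows, days):
    """rows: iterable of (user_id, timestamp). Returns {user_id: [count per day]}."""
    counts = Counter((user, ts.date().isoformat()) for user, ts in rows)
    users = sorted({user for user, _ in counts})
    # One list entry per requested day, in the order given by `days`.
    return {user: [counts[(user, day)] for day in days] for user in users}

# Made-up events, only to show the shape of the result.
events = [
    ("user_1", datetime(2021, 7, 5, 9, 0)),
    ("user_1", datetime(2021, 7, 5, 18, 30)),
    ("user_1", datetime(2021, 7, 7, 11, 15)),
    ("user_2", datetime(2021, 7, 6, 8, 45)),
]
days = ["2021-07-05", "2021-07-06", "2021-07-07"]
print(pivot_counts(events, days))
```

As in the PIVOT clause, the list of days has to be known up front; days not listed are simply left out of the result.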

Group by with Unix time stamps

I am trying to write a query where the time stamps are in Unix format.
The objective of the query is to group these time stamps into five-minute segments and to count each unique Id in those segments.
Is there a simple way of doing this?
The result I am looking for is this:
Time_utc Id count
25/07/2019 1600 1 3
25/07/2019 1600 2 1
25/07/2019 1605 1 4
You haven't shown data, so as a starting point you can group the Unix timestamps by dividing by 300 (for 5 minutes worth of seconds):
select 300 * floor(unix_ts/300) as unix_five_minute,
timestamp '1970-01-01 00:00:00 UTC'
+ (300*floor(unix_ts/300)) * interval '1' second as oracle_timestamp,
count(*)
from cte2
group by floor(unix_ts/300);
or if you have millisecond precision adjust by a factor of 1000:
select 300000 * floor(unix_ts/300000) as unix_five_minute,
timestamp '1970-01-01 00:00:00 UTC'
+ (300*floor(unix_ts/300000)) * interval '1' second as oracle_timestamp,
count(*)
from cte2
group by floor(unix_ts/300000);
Demo using made-up data generated from current time:
-- CTEs to generate some sample data
with cte1 (oracle_interval) as (
select systimestamp - level * interval '42' second
- timestamp '1970-01-01 00:00:00.0 UTC'
from dual
connect by level <= 30
),
cte2 (unix_ts) as (
select trunc(
extract(day from oracle_interval) * 86400000
+ extract(hour from oracle_interval) * 3600000
+ extract(minute from oracle_interval) * 60000
+ extract(second from oracle_interval) * 1000
)
from cte1
)
-- actual query
select 300000 * floor(unix_ts/300000) as unix_five_minute,
timestamp '1970-01-01 00:00:00 UTC'
+ (300*floor(unix_ts/300000)) * interval '1' second as oracle_timestamp,
count(*)
from cte2
group by floor(unix_ts/300000);
UNIX_FIVE_MINUTE ORACLE_TIMESTAMP COUNT(*)
---------------- ------------------------- ----------------
1564072500000 2019-07-25 16:35:00.0 UTC 7
1564072200000 2019-07-25 16:30:00.0 UTC 7
1564071600000 2019-07-25 16:20:00.0 UTC 4
1564071900000 2019-07-25 16:25:00.0 UTC 8
1564072800000 2019-07-25 16:40:00.0 UTC 4
Unix time stamps such as 155639.600 or 155639.637
Those are unusual values; Unix/epoch times are usually 10-digit numbers, or 13 digits for millisecond precision. Assuming (or rather, guessing) that they are epoch seconds divided by 10,000 for some reason:
-- CTE for sample data
with cte (unix_ts) as (
select 155639.600 from dual
union all
select 155639.637 from dual
)
-- actual query
select 300 * floor(unix_ts*10000/300) as unix_five_minute,
timestamp '1970-01-01 00:00:00 UTC'
+ (300*floor(unix_ts*10000/300)) * interval '1' second as oracle_timestamp,
count(*)
from cte
group by floor(unix_ts*10000/300);
UNIX_FIVE_MINUTE ORACLE_TIMESTAMP COUNT(*)
---------------- ------------------------- ----------------
1556396100 2019-04-27 20:15:00.0 UTC 1
1556395800 2019-04-27 20:10:00.0 UTC 1
The 10000/300 could be simplified to 100/3, but I think it's clearer left as it is.
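The bucketing arithmetic is the same in any language: integer-divide by 300 seconds, then multiply back. The Python sketch below illustrates it with the two sample values from the question, scaled by 10000 as in the query above; bucket_start and count_per_bucket are hypothetical names, and Decimal is used only to avoid floating-point rounding on the sample values.

```python
from collections import Counter
from decimal import Decimal

BUCKET_SECONDS = 300  # five minutes

def bucket_start(unix_seconds):
    """Round a non-negative Unix timestamp (seconds) down to its 5-minute bucket."""
    return int(unix_seconds // BUCKET_SECONDS) * BUCKET_SECONDS

def count_per_bucket(timestamps):
    """Count how many timestamps fall into each 5-minute bucket."""
    return Counter(bucket_start(ts) for ts in timestamps)

# The two sample values from the question, scaled to seconds.
samples = [Decimal("155639.600") * 10000, Decimal("155639.637") * 10000]
for start, count in sorted(count_per_bucket(samples).items()):
    print(start, count)
```

This yields the buckets 1556395800 and 1556396100 with one row each, the same bucket values as the query output above.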

how to convert HH:MM representation to minutes in oracle sql

How do I convert a varchar in hh:mm format to minutes in Oracle SQL?
For example:
HH:MM Minutes
08:00 480
08:45 525
07:57 477
This will work even if the duration is 24 hours or greater:
Oracle 11g R2 Schema Setup:
CREATE TABLE durations ( duration ) AS
SELECT '00:30' FROM DUAL UNION ALL
SELECT '07:57' FROM DUAL UNION ALL
SELECT '08:00' FROM DUAL UNION ALL
SELECT '12:00' FROM DUAL UNION ALL
SELECT '20:01' FROM DUAL UNION ALL
SELECT '23:59' FROM DUAL UNION ALL
SELECT '24:00' FROM DUAL UNION ALL
SELECT '24:59' FROM DUAL;
Query 1:
SELECT duration,
( (
DATE '1970-01-01'
+ NUMTODSINTERVAL( SUBSTR( duration, 1, INSTR( duration, ':' ) - 1 ), 'HOUR' )
+ NUMTODSINTERVAL( SUBSTR( duration, INSTR( duration, ':' ) + 1 ), 'MINUTE' )
)
- DATE '1970-01-01'
) * 24 * 60 AS Minutes
FROM durations
Results:
| DURATION | MINUTES |
|----------|---------|
| 00:30 | 30 |
| 07:57 | 477 |
| 08:00 | 480 |
| 12:00 | 720 |
| 20:01 | 1201 |
| 23:59 | 1439 |
| 24:00 | 1440 |
| 24:59 | 1499 |
However, there is an INTERVAL DAY TO SECOND data type that would be better suited to your data:
CREATE TABLE your_table (
duration INTERVAL DAY TO SECOND
);
Then you can just do:
INSERT INTO your_table ( duration ) VALUES ( INTERVAL '08:00' HOUR TO MINUTE );
To get the number of minutes you can then simply do:
SELECT ( ( DATE '1970-01-01' + duration ) - DATE '1970-01-01' ) *24*60 AS minutes
FROM your_table
Try this
TO_NUMBER(SUBSTR('(08:00)',2,INSTR('(08:00)',':')-2))*60+TO_NUMBER(SUBSTR('(08:00)',INSTR('(08:00)',':')+1,2))
If you can convert your input to a real date first, the task becomes much easier. Here, I have shamelessly appended the time to a fake date to create a date such as 2017-01-01 00:30. To find out the number of minutes since midnight, you simply subtract the date for "midnight". It will return the difference in days, so you need to multiply by number of minutes per day to get what you want.
select time
,(to_date('2017-01-01 ' || time, 'yyyy-mm-dd hh24:mi') - date '2017-01-01') * 24 * 60 as minutes
from (select '00:30' as time from dual union all
select '08:00' as time from dual union all
select '08:30' as time from dual union all
select '12:00' as time from dual union all
select '23:59' as time from dual
);
Here is some sample input and output
time minutes
==== =======
00:30 30
08:00 480
08:30 510
12:00 720
23:59 1439
If you need to print 08:00 hours as 480 minutes, extract the digits before the colon, multiply them by 60, and add the digits after the colon. That converts the HH:MM representation into minutes.
-- the inline view supplies a sample value; replace it with your own table
SELECT REGEXP_SUBSTR(ATT.workdur,'[^:]+',1,1)*60 + REGEXP_SUBSTR(ATT.workdur,'[^:]+',1,2) AS minutes FROM (SELECT '08:00' AS workdur FROM DUAL) ATT;
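The split-and-multiply arithmetic used by these answers can be sketched in a few lines of Python. This is only an illustration; hhmm_to_minutes is a hypothetical name, and it assumes well-formed input with exactly one colon.

```python
def hhmm_to_minutes(value):
    """Convert an 'HH:MM' duration string to minutes; hours may be 24 or more."""
    hours, minutes = value.split(":")
    return int(hours) * 60 + int(minutes)

# Values taken from the question and from the first answer's sample data.
for sample in ("08:00", "08:45", "07:57", "24:59"):
    print(sample, hhmm_to_minutes(sample))
```

Because the hours part is treated as a plain number rather than a time of day, durations of 24 hours or more work the same way as in the NUMTODSINTERVAL answer.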