Using a Postgres window statement in the FROM of a sub query

I have the following subquery as part of a SELECT statement. It is supposed to subtract one calculated time from another calculated time.
However, Postgres does not accept the window function inside the FROM clause.
(SELECT count(*) AS work_hours
FROM generate_series (TIMESTAMP 'epoch' + MAX(wog.endtime) OVER(PARTITION BY woas.workorderid ORDER BY wog.endtime DESC)/1000 * INTERVAL '1 second'
, TIMESTAMP 'epoch' + nth_value(wog.endtime,2) OVER(PARTITION BY woas.workorderid ORDER BY wog.endtime DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)/1000 * INTERVAL '1 second' - interval '1h'
, interval '1h') h
WHERE EXTRACT(ISODOW FROM h) < 6
AND h::time >= '08:00'
AND h::time <= '18:00') AS "Max minus Second Max",
Postgres returns the following error:
ERROR: cannot use window function in function expression in FROM
How can I reformat the above statement so that it parses without error?
Update:
I don't think the structure of the query is the issue. If I put timestamp literals in place of the window functions, it works fine.
(SELECT count(*) AS work_hours
FROM generate_series (timestamp '2018-01-06 13:30'
, timestamp '2018-01-08 21:29' - interval '1h'
, interval '1h') h
WHERE EXTRACT(ISODOW FROM h) < 6
AND h::time >= '08:00'
AND h::time <= '18:00') "Time Difference" from workorder wo

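The FROM clause expects a table, view, subquery, or set-returning function to query the data from, and a function call placed there cannot contain a window function.
You can refactor the query like this:
select count(*) from (
select generate_series( ....) as h
) s
where (cond1 and cond2..)
It will not work as long as the window function sits inside the function call in the "from" clause.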

If you don't already have a solution, you might want to try this. I think it does what you want.
Set-up test: similar enough to your situation, I hope.
create table work_order_times
(
work_order_id integer,
end_time bigint -- milliseconds
);
insert into work_order_times (work_order_id, end_time) values (23, extract(epoch from now()) * 1000);
insert into work_order_times (work_order_id, end_time) values (23, (extract(epoch from now()) - 20000) * 1000);
insert into work_order_times (work_order_id, end_time) values (57, (extract(epoch from now()) - 40000) * 1000);
insert into work_order_times (work_order_id, end_time) values (57, (extract(epoch from now()) - 60000) * 1000);
insert into work_order_times (work_order_id, end_time) values (57, (extract(epoch from now()) - 80000) * 1000);
Check set-up:
select
work_order_id,
end_time,
to_timestamp(end_time / 1000) as end_timestamp
from
work_order_times
order by
work_order_id,
end_time;
work_order_id | end_time | end_timestamp
---------------+---------------+------------------------
23 | 1516251234772 | 2018-01-18 04:53:54+00
23 | 1516271234769 | 2018-01-18 10:27:14+00
57 | 1516191234774 | 2018-01-17 12:13:54+00
57 | 1516211234773 | 2018-01-17 17:47:14+00
57 | 1516231234772 | 2018-01-17 23:20:34+00
(5 rows)
The Query:
select
work_order_id,
generate_series (penultimate_timestamp, latest_timestamp, interval '1 hour')
from
(
select
work_order_id,
to_timestamp(latest_end_time / 1000) as latest_timestamp,
to_timestamp(penultimate_end_time / 1000) as penultimate_timestamp
from
(
select
work_order_id,
row_number() over last_2_timestamps as row_number,
max(end_time) over last_2_timestamps as latest_end_time,
lead(end_time) over last_2_timestamps as penultimate_end_time
from
work_order_times
window
last_2_timestamps as (partition by work_order_id order by end_time desc)
) x
where
row_number = 1
) y;
The results:
work_order_id | generate_series
---------------+------------------------
23 | 2018-01-18 04:53:54+00
23 | 2018-01-18 05:53:54+00
23 | 2018-01-18 06:53:54+00
23 | 2018-01-18 07:53:54+00
23 | 2018-01-18 08:53:54+00
23 | 2018-01-18 09:53:54+00
57 | 2018-01-17 17:47:14+00
57 | 2018-01-17 18:47:14+00
57 | 2018-01-17 19:47:14+00
57 | 2018-01-17 20:47:14+00
57 | 2018-01-17 21:47:14+00
57 | 2018-01-17 22:47:14+00
(12 rows)
The PostgreSQL documentation does mention some restrictions about how window functions can be nested, but it seems to work like this.
It would make things slightly simpler to store the endtime as a timestamp rather than a number of milliseconds, which is what it appears to be, but maybe you don't have the opportunity to do that.
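To sanity-check whatever SQL you end up with, it can help to reproduce the counting rule outside the database. The following is only a minimal Python sketch of the rule in the question (hourly steps from the start to the end minus one hour, counted when they fall on Monday to Friday between 08:00 and 18:00). The function name count_work_hours is hypothetical, and the two timestamps are the literals from the question's update.

```python
from datetime import datetime, time, timedelta

def count_work_hours(start, end):
    """Count hourly steps that fall on a weekday between 08:00 and 18:00."""
    # Mirrors generate_series(start, end - interval '1h', interval '1h').
    last = end - timedelta(hours=1)
    count = 0
    h = start
    while h <= last:
        on_weekday = h.isoweekday() < 6                      # ISODOW < 6
        in_office_hours = time(8, 0) <= h.time() <= time(18, 0)
        if on_weekday and in_office_hours:
            count += 1
        h += timedelta(hours=1)
    return count

# Saturday 13:30 to Monday 21:29: only Monday 08:30 .. 17:30 qualify.
print(count_work_hours(datetime(2018, 1, 6, 13, 30),
                       datetime(2018, 1, 8, 21, 29)))  # prints 10
```

Comparing this number with the query result for a few work orders is a cheap way to confirm that the refactored subquery still applies the same filter.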

Related

Explode time duration defined by start and end timestamp by the hour

I have a table with work shifts (1 row per shift) that include date, start and end time.
Main goal: I want to aggregate the number of working hours per hour per store.
This is what my shift table looks like:
employee_id | store | start_timestamp  | end_timestamp
------------+-------+------------------+-----------------
1           | 1     | 2022-01-01T07:00 | 2022-01-01T11:30
2           | 1     | 2022-01-01T08:30 | 2022-01-01T12:30
...         | ...   | ...              | ...
I want to "explode" the information into a table something like this:
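hour  | employee_id | store | date       | scheduled_work (h)
------+-------------+-------+------------+-------------------
07:00 | 1           | 1     | 2022-01-01 | 1
08:00 | 1           | 1     | 2022-01-01 | 1
09:00 | 1           | 1     | 2022-01-01 | 1
10:00 | 1           | 1     | 2022-01-01 | 1
11:00 | 1           | 1     | 2022-01-01 | 0.5
08:00 | 2           | 1     | 2022-01-01 | 0.5
09:00 | 2           | 1     | 2022-01-01 | 1
10:00 | 2           | 1     | 2022-01-01 | 1
11:00 | 2           | 1     | 2022-01-01 | 1
12:00 | 2           | 1     | 2022-01-01 | 0.5
...   | ...         | ...   | ...        | ...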
I tried a method based on cross joins, but it consumes a lot of memory. It looks like this:
with test as (
select 1 as employee_id, 1 as store_id, timestamp('2022-01-01 07:00:00') as start_timestamp, timestamp('2022-01-01 11:30:00') as end_timestamp union all
select 2 as employee_id, 1 as store_id, timestamp('2022-01-01 08:30:00') as start_timestamp, timestamp('2022-01-01 12:30:00') as end_timestamp
)
, cte as (
select ts
, test.*
, safe_divide(
timestamp_diff(
least(date_add(ts, interval 1 hour), end_timestamp)
, greatest(ts, start_timestamp)
, millisecond
)
, 3600000
) as scheduled_work
from test
cross join unnest(generate_timestamp_array(timestamp('2022-01-01 07:00:00'),
timestamp('2022-01-01 12:30:00'), interval 1 hour)) as ts
order by employee_id, ts)
select * from cte
where scheduled_work >= 0;
It's working but I know this will not be good when the number of shifts starts to add up. Does anyone have another solution that is more efficient?
I'm using BigQuery.

You might want to remove the ORDER BY inside the cte subquery, since it affects query performance.
Here is another, similar approach:
WITH test AS (
select 1 as employee_id, 1 as store_id, timestamp('2022-01-01 07:00:00') as start_timestamp, timestamp('2022-01-01 11:30:00') as end_timestamp union all
select 2 as employee_id, 1 as store_id, timestamp('2022-01-01 08:30:00') as start_timestamp, timestamp('2022-01-01 12:30:00') as end_timestamp
),
explodes AS (
SELECT employee_id, store_id, EXTRACT(DATE FROM h) date, TIME_TRUNC(EXTRACT(TIME FROM h), HOUR) hour, 1 AS scheduled_work
FROM test,
UNNEST (GENERATE_TIMESTAMP_ARRAY(
TIMESTAMP_TRUNC(start_timestamp + INTERVAL 1 HOUR, HOUR),
TIMESTAMP_TRUNC(end_timestamp - INTERVAL 1 HOUR, HOUR), INTERVAL 1 HOUR
)) h
UNION ALL
SELECT employee_id, store_id, EXTRACT(DATE FROM h), TIME_TRUNC(EXTRACT(TIME FROM h), HOUR),
CASE offset
WHEN 0 THEN 1 - (EXTRACT(MINUTE FROM h) * 60 + EXTRACT(SECOND FROM h)) / 3600
WHEN 1 THEN (EXTRACT(MINUTE FROM h) * 60 + EXTRACT(SECOND FROM h)) / 3600
END
FROM test, UNNEST([start_timestamp, end_timestamp]) h WITH OFFSET
)
SELECT * FROM explodes WHERE scheduled_work > 0;
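The arithmetic behind the "explode" step may be easier to follow outside SQL. Below is a minimal Python sketch, not a BigQuery solution: for each clock hour a shift touches, it takes the overlap between that hour and the shift. The function name explode_shift is hypothetical, and the sample shifts are the two rows from the question.

```python
from datetime import datetime, timedelta

def explode_shift(start, end):
    """Return (hour_start, hours_worked) for every clock hour the shift touches."""
    rows = []
    bucket = start.replace(minute=0, second=0, microsecond=0)
    while bucket < end:
        next_bucket = bucket + timedelta(hours=1)
        # Overlap of [bucket, next_bucket) with [start, end), in hours.
        worked = (min(next_bucket, end) - max(bucket, start)).total_seconds() / 3600
        rows.append((bucket, worked))
        bucket = next_bucket
    return rows

shift_1 = explode_shift(datetime(2022, 1, 1, 7, 0), datetime(2022, 1, 1, 11, 30))
shift_2 = explode_shift(datetime(2022, 1, 1, 8, 30), datetime(2022, 1, 1, 12, 30))
for hour_start, worked in shift_1 + shift_2:
    print(hour_start.strftime("%H:%M"), worked)
```

Each shift only produces rows for its own hours, which is the same property that lets the query above avoid the cross join against one global timestamp array.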

Consider the approach below:
with temp as (
select * replace(
parse_time('%H:%M', start_time) as start_time,
parse_time('%H:%M', end_time) as end_time
)
from your_table
)
select * except(start_time, end_time),
case
when hour = time_trunc(start_time, hour) then (60 - time_diff(start_time, hour, minute)) / 60
when hour = time_trunc(end_time, hour) then time_diff(end_time, hour, minute) / 60
else 1
end as scheduled_work
from (
select time_add(time_trunc(start_time, hour), interval delta hour) as hour,
employee_id, store, date, start_time, end_time
from temp, unnest(generate_array(0,time_diff(end_time, start_time, hour))) delta
)
order by employee_id, hour
if applied to sample data as in your question
output is

Oracle generating schedule rows with an interval

I have some SQL that generates a row for every 5 minutes. How can this be modified to get rid of the overlapping times (see below)?
Note: Each row should be associated with a location_id, with no repeats of the location_id. In this case 25 rows should be generated, so the CONNECT BY should be something like SELECT count(*) from locations.
My goal is to create a function that takes in a schedule_id and a start_date in the format 'MMDDYYYY HH24:MI', and stops creating rows if the next entry would cross midnight; that means some of the location_ids may not be used.
The end result is to have the rows placed in the schedule table below. Since I don't have a function yet, the schedule_id can be hard-coded to 1. I've heard about recursive CTEs; would this qualify for that method?
Thanks in advance to all who answer, and for your expertise.
ALTER SESSION SET NLS_DATE_FORMAT = 'MMDDYYYY HH24:MI:SS';
create table schedule(
schedule_id NUMBER(4),
location_id number(4),
start_date DATE,
end_date DATE,
CONSTRAINT start_min check (start_date=trunc(start_date,'MI')),
CONSTRAINT end_min check (end_date=trunc(end_date,'MI')),
CONSTRAINT end_gt_start CHECK (end_date >= start_date),
CONSTRAINT same_day CHECK (TRUNC(end_date) = TRUNC(start_date))
);
CREATE TABLE locations AS
SELECT level AS location_id,
'Door ' || level AS location_name,
CASE round(dbms_random.value(1,3))
WHEN 1 THEN 'A'
WHEN 2 THEN 'T'
WHEN 3 THEN 'G'
END AS location_type
FROM dual
CONNECT BY level <= 25;
with
row_every_5_mins as
( select trunc(sysdate) + (rownum-1)*5/1440 t_from,
trunc(sysdate) + rownum*5/1440 t_to
from dual
connect by level <= 1440/5
) SELECT * from row_every_5_mins;
Current output:
|T_FROM|T_TO|
|-----------------|-----------------|
|08162021 00:00:00|08162021 00:05:00|
|08162021 00:05:00|08162021 00:10:00|
|08162021 00:10:00|08162021 00:15:00|
|08162021 00:15:00|08162021 00:20:00|
…
Desired output
|T_FROM|T_TO|
|-----------------|-----------------|
|08162021 00:00:00|08162021 00:05:00|
|08162021 00:10:00|08162021 00:15:00|
|08162021 00:20:00|08162021 00:25:00|
…

You can avoid a recursive query or a loop, because all you essentially need is the row number of each row in the locations table. You only have to provide an appropriate sort order to the analytic function. Below is the query:
with a as (
select
date '2021-01-01'
+ to_dsinterval('0 23:30:00')
as start_dt_param
from dual
)
, date_gen as (
select
location_id
, start_dt_param
, start_dt_param + (row_number() over(order by location_id) - 1)
* interval '10' minute as start_dt
, start_dt_param + (row_number() over(order by location_id) - 1)
* interval '10' minute + interval '5' minute as end_dt
from a
cross join locations
)
select
location_id
, start_dt
, end_dt
from date_gen
where end_dt < trunc(start_dt_param + 1)
LOCATION_ID | START_DT | END_DT
----------: | :------------------ | :------------------
1 | 2021-01-01 23:30:00 | 2021-01-01 23:35:00
2 | 2021-01-01 23:40:00 | 2021-01-01 23:45:00
3 | 2021-01-01 23:50:00 | 2021-01-01 23:55:00
UPD:
Or, if you want a procedure, it is even simpler: since 12c Oracle has the fetch first clause, so the analytic function can be replaced by the rownum pseudocolumn:
create or replace procedure populate_schedule (
p_schedule_id in number
, p_start_date in date
) as
begin
insert into schedule (schedule_id, location_id, start_date, end_date)
select
p_schedule_id
, location_id
, p_start_date + (rownum - 1) * interval '10' minute
, p_start_date + (rownum - 1) * interval '10' minute + interval '5' minute
from locations
/*Put your order of location assignment here*/
order by location_id
/*The number of 10-minute intervals before midnight from the first end_date*/
fetch first ((trunc(p_start_date + 1) - p_start_date + 1/24/60*5)*24*60/10) rows only
;
commit;
end;
/
begin
populate_schedule(1, timestamp '2020-01-01 23:37:00');
populate_schedule(2, timestamp '2020-01-01 23:35:00');
populate_schedule(3, timestamp '2020-01-01 23:33:00');
end;
/
select *
from schedule
order by schedule_id, start_date
SCHEDULE_ID | LOCATION_ID | START_DATE | END_DATE
----------: | ----------: | :------------------ | :------------------
1 | 1 | 2020-01-01 23:37:00 | 2020-01-01 23:42:00
1 | 2 | 2020-01-01 23:47:00 | 2020-01-01 23:52:00
2 | 1 | 2020-01-01 23:35:00 | 2020-01-01 23:40:00
2 | 2 | 2020-01-01 23:45:00 | 2020-01-01 23:50:00
2 | 3 | 2020-01-01 23:55:00 | 2020-01-02 00:00:00
3 | 1 | 2020-01-01 23:33:00 | 2020-01-01 23:38:00
3 | 2 | 2020-01-01 23:43:00 | 2020-01-01 23:48:00
3 | 3 | 2020-01-01 23:53:00 | 2020-01-01 23:58:00

Just loop every 10 minutes instead of every 5 minutes:
WITH input (start_time) AS (
SELECT TRUNC(SYSDATE) + INTERVAL '23:30' HOUR TO MINUTE FROM DUAL
)
SELECT start_time + (LEVEL-1) * INTERVAL '10' MINUTE
AS t_from,
start_time + (LEVEL-1) * INTERVAL '10' MINUTE + INTERVAL '5' MINUTE
AS t_to
FROM input
CONNECT BY (LEVEL-1) * INTERVAL '10' MINUTE < INTERVAL '1' DAY
AND LEVEL <= (SELECT COUNT(*) FROM locations)
AND start_time + (LEVEL-1) * INTERVAL '10' MINUTE < TRUNC(start_time) + INTERVAL '1' DAY;
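The rule both answers implement (one 5-minute slot every 10 minutes, one slot per location, stop before midnight) can be written down in a few lines to check the expected rows. This is only a Python sketch under those assumptions; the function name build_schedule and its parameters are hypothetical, and the cut-off follows the first answer's condition that the slot must end before midnight.

```python
from datetime import datetime, timedelta

def build_schedule(start, location_ids,
                   slot=timedelta(minutes=5), step=timedelta(minutes=10)):
    """Assign one slot per location, stopping before a slot would reach midnight."""
    midnight = start.replace(hour=0, minute=0, second=0, microsecond=0) + timedelta(days=1)
    rows = []
    for i, location_id in enumerate(location_ids):
        t_from = start + i * step
        t_to = t_from + slot
        if t_to >= midnight:  # same cut-off as "end_dt < trunc(start_dt_param + 1)"
            break
        rows.append((location_id, t_from, t_to))
    return rows

schedule = build_schedule(datetime(2021, 1, 1, 23, 30), range(1, 26))
for location_id, t_from, t_to in schedule:
    print(location_id, t_from, t_to)
```

With a start of 23:30 and 25 locations, only locations 1 to 3 receive a slot, which matches the three rows shown in the first answer.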

A CTE is certainly the fastest solution. If you want more flexibility for the intervals, you can use the DBMS_SCHEDULER calendaring syntax. The drawback is that performance might be weaker.
CREATE OR REPLACE TYPE TimestampRecType AS OBJECT (
T_FROM TIMESTAMP(0),
T_TO TIMESTAMP(0)
);
CREATE OR REPLACE TYPE TimestampTableType IS TABLE OF TimestampRecType;
CREATE OR REPLACE FUNCTION GetGchedule(
start_time IN TIMESTAMP,
stop_time in TIMESTAMP DEFAULT TRUNC(SYSDATE)+1)
RETURN TimestampTableType AS
ret TimestampTableType := TimestampTableType();
return_date_after TIMESTAMP := start_time;
next_run_date TIMESTAMP ;
BEGIN
LOOP
DBMS_SCHEDULER.EVALUATE_CALENDAR_STRING('FREQ=MINUTELY;INTERVAL=5;', NULL, return_date_after, next_run_date);
ret.EXTEND;
ret(ret.LAST) := TimestampRecType(return_date_after, next_run_date);
return_date_after := next_run_date;
EXIT WHEN next_run_date >= stop_time;
END LOOP;
RETURN ret;
END;
SELECT *
FROM TABLE(GetGchedule(trunc(sysdate)));
See syntax for calendar here: Calendaring Syntax

Fetch empty/open time ranges in postgresql

I'm trying to find open shifts where:
First shift starts at 6 AM
Last Shift ends at 12 AM
For example, given the following data for a day:
start_time | end_time
-----------|---------
9 AM | 3 PM
5 PM | 10 PM
Expected results:
start_time | end_time
-----------|---------
6 AM | 9 AM
3 PM | 5 PM
10 PM | 12 AM
Here's what I tried, but it's not working (I know it's probably far from the correct answer):
SELECT *
FROM WORKERS_SCHEDULE
WHERE START_TIME not BETWEEN
ANY (SELECT START_TIME FROM WORKERS_SCHEDULE)
AND (SELECT START_TIME FROM WORKERS_SCHEDULE)
start_time and end_time are of datatype TIME.

Here is one way to do it with union all and window functions:
select *
from (
select '06:00:00'::time start_time, min(start_time) end_time from mytable
union all
select end_time, lead(start_time) over(order by start_time) from mytable
union all
select max(end_time), '23:59:59'::time from mytable
) t
where start_time <> end_time
It is a bit complicated to explain thoroughly how it works, but: the first unioned query computes the interval between 6 AM and the start of the first shift, the second subquery computes the gap between the end of each declared shift and the start of the next one, and the last one handles the interval between the last shift and midnight. Then, the outer query keeps only the records that have a gap. To understand how it works, you can run the subquery independently and see how the starts and ends adjust.
Demo on DB Fiddle:
start_time | end_time
:--------- | :-------
06:00:00 | 09:00:00
15:00:00 | 17:00:00
22:00:00 | 23:59:59
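The same gap logic can be sketched as a single pass over the shifts sorted by start time. The following Python is only an illustration, not a replacement for the query; the function name open_ranges and the 23:59:59 end-of-day value (borrowed from the query above) are assumptions. Unlike the union all version, this sketch also tolerates overlapping shifts, because it tracks the latest end seen so far.

```python
from datetime import time

def open_ranges(shifts, day_start=time(6, 0), day_end=time(23, 59, 59)):
    """Return the (start, end) ranges between day_start and day_end not covered by a shift."""
    gaps = []
    cursor = day_start                    # end of the covered time so far
    for start, end in sorted(shifts):
        if start > cursor:
            gaps.append((cursor, start))  # uncovered time before this shift
        cursor = max(cursor, end)
    if cursor < day_end:
        gaps.append((cursor, day_end))    # uncovered time after the last shift
    return gaps

gaps = open_ranges([(time(9, 0), time(15, 0)), (time(17, 0), time(22, 0))])
for start, end in gaps:
    print(start, end)
```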

Try this if it works for you
SELECT
CASE WHEN START_TIME = (SELECT MIN(start_time) FROM TABLE) AND START_TIME > '6:00 AM'
THEN '6:00 AM -' || MIN(START_TIME)
ELSE SELECT min(END_TIME) FROM TABLE WHERE ENDTIME < S.START_time || StartTime
END
FROM table S
UNION
(select max(endtime) || ' 12:00 AM' from table)

This is another way:
WITH RECURSIVE open_shifts AS (
SELECT time '6:00' AS START_TIME, MIN(START_TIME) AS END_TIME
FROM WORKERS_SCHEDULE
WHERE START_TIME BETWEEN time '6:00' AND time '23:59'
UNION
SELECT start_gap.START_TIME AS START_TIME, end_gap.END_TIME AS END_TIME FROM WORKERS_SCHEDULE end_gap,
(SELECT ws.END_TIME AS START_TIME
FROM WORKERS_SCHEDULE ws, open_shifts prev_gap
WHERE ws.START_TIME = prev_gap.END_TIME) start_gap
WHERE end_gap.END_TIME > start_gap.START_TIME
AND END_TIME BETWEEN time '6:00' AND time '23:59'
)
SELECT * FROM open_shifts
UNION
SELECT MAX(END_TIME) AS START_TIME, time '23:59' AS END_TIME FROM WORKERS_SCHEDULE
WHERE END_TIME BETWEEN time '6:00' AND time '23:59';
This uses a recursive CTE to find the gaps between each shift's end time and the next closest shift's start time. It probably won't work with overlapping shifts, though.

How to count ratio hourly?

I'm a bit stuck on how to proceed with my queries.
I have two tables "A"(date, response, b_id) and "B"(id, country). I need to compute, for each hour, the ratio of the number of entries where a response exists to the total number of entries on a specific date. The final selection should consist of the columns "hour" and "ratio".
SELECT COUNT(*) FROM A WHERE RESPONSE IS NOT NULL; -- counting entries with response
SELECT COUNT(*) FROM A; -- counting total number of entries
How do I compute the ratio? Should I create a separate variable for it?
How do I count for each hour of a day? Should I use something like a loop? And how can I get the "hour" part of a date?
What is the best way to select the hours and the computed ratio? Should I make a separate table for it?
I'm rather new to writing complex queries, so I would be grateful for any kind of help.

You can do this as:
select to_char(datecol, 'HH24') as hour,
count(response) as has_response, count(*) as total,
count(response) / count(*) as ratio
from a
where datecol >= date '2018-09-18' and datecol < date '2018-09-19'
group by to_char(datecol, 'HH24');
You can also do this using avg() -- which is also fun:
select to_char(datecol, 'HH24'),
avg(case when response is not null then 1.0 else 0 end) as ratio
from a
where datecol >= date '2018-09-18' and datecol < date '2018-09-19'
group by to_char(datecol, 'HH24')
In this case, that requires more typing, though.
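If it helps to see what the GROUP BY is doing, here is the same calculation as a minimal Python sketch. It is an illustration only: the function name hourly_ratio and the sample rows are made up for the example, and a missing response is represented by None in place of NULL.

```python
from collections import defaultdict
from datetime import datetime

def hourly_ratio(rows):
    """rows: iterable of (timestamp, response). Returns {hour: responded / total}."""
    totals = defaultdict(int)
    responded = defaultdict(int)
    for ts, response in rows:
        hour = ts.strftime("%H")       # same grouping key as to_char(datecol, 'HH24')
        totals[hour] += 1
        if response is not None:       # count(response) skips NULLs in the same way
            responded[hour] += 1
    return {hour: responded[hour] / totals[hour] for hour in sorted(totals)}

sample = [
    (datetime(2018, 9, 18, 0, 0), None),
    (datetime(2018, 9, 18, 0, 10), "A"),
    (datetime(2018, 9, 18, 0, 20), "B"),
    (datetime(2018, 9, 18, 1, 0), "C"),
    (datetime(2018, 9, 18, 2, 0), None),
]
ratios = hourly_ratio(sample)
print(ratios)
```

No loop over hours or separate table is needed in SQL either: grouping by the hour key does the per-hour split, and the two counts give the ratio.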

SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE A ( dt, response, b_id ) AS
SELECT DATE '2018-09-18' + INTERVAL '00:00' HOUR TO MINUTE, NULL, 1 FROM DUAL UNION ALL
SELECT DATE '2018-09-18' + INTERVAL '00:10' HOUR TO MINUTE, 'A', 1 FROM DUAL UNION ALL
SELECT DATE '2018-09-18' + INTERVAL '00:20' HOUR TO MINUTE, 'B', 1 FROM DUAL UNION ALL
SELECT DATE '2018-09-18' + INTERVAL '01:00' HOUR TO MINUTE, 'C', 1 FROM DUAL UNION ALL
SELECT DATE '2018-09-18' + INTERVAL '01:10' HOUR TO MINUTE, 'D', 1 FROM DUAL UNION ALL
SELECT DATE '2018-09-18' + INTERVAL '02:00' HOUR TO MINUTE, NULL, 1 FROM DUAL UNION ALL
SELECT DATE '2018-09-18' + INTERVAL '03:00' HOUR TO MINUTE, 'E', 1 FROM DUAL UNION ALL
SELECT DATE '2018-09-18' + INTERVAL '05:10' HOUR TO MINUTE, 'F', 1 FROM DUAL;
Query 1:
SELECT b_id,
TO_CHAR( TRUNC( dt, 'HH' ), 'YYYY-MM-DD HH24:MI:SS' ) AS hour,
COUNT(RESPONSE) AS total_response_per_hour,
COUNT(*) AS total_per_hour,
total_response_per_day,
total_per_day,
COUNT(response) / total_response_per_day AS ratio_for_responses,
COUNT(*) / total_per_day AS ratio
FROM (
SELECT A.*,
COUNT(RESPONSE) OVER ( PARTITION BY b_id, TRUNC( dt ) ) AS total_response_per_day,
COUNT(*) OVER ( PARTITION BY b_id, TRUNC( dt ) ) AS total_per_day
FROM A
)
GROUP BY
b_id,
total_per_day,
total_response_per_day,
TRUNC( dt, 'HH' )
ORDER BY
TRUNC( dt, 'HH' )
Results:
| B_ID | HOUR | TOTAL_RESPONSE_PER_HOUR | TOTAL_PER_HOUR | TOTAL_RESPONSE_PER_DAY | TOTAL_PER_DAY | RATIO_FOR_RESPONSES | RATIO |
|------|---------------------|-------------------------|----------------|------------------------|---------------|---------------------|-------|
| 1 | 2018-09-18 00:00:00 | 2 | 3 | 6 | 8 | 0.3333333333333333 | 0.375 |
| 1 | 2018-09-18 01:00:00 | 2 | 2 | 6 | 8 | 0.3333333333333333 | 0.25 |
| 1 | 2018-09-18 02:00:00 | 0 | 1 | 6 | 8 | 0 | 0.125 |
| 1 | 2018-09-18 03:00:00 | 1 | 1 | 6 | 8 | 0.16666666666666666 | 0.125 |
| 1 | 2018-09-18 05:00:00 | 1 | 1 | 6 | 8 | 0.16666666666666666 | 0.125 |

SELECT withResponses.hour,
withResponses.cnt AS withResponse,
alls.cnt AS AllEntries,
(withResponses.cnt / alls.cnt) AS ratio
FROM
( SELECT to_char(d, 'DD-MM-YY - HH24') || ':00 to :59 ' hour,
count(*) AS cnt
FROM A
WHERE RESPONSE IS NOT NULL
GROUP BY to_char(d, 'DD-MM-YY - HH24') || ':00 to :59 ' ) withResponses,
( SELECT to_char(d, 'DD-MM-YY - HH24') || ':00 to :59 ' hour,
count(*) AS cnt
FROM A
GROUP BY to_char(d, 'DD-MM-YY - HH24') || ':00 to :59 ' ) alls
WHERE alls.hour = withResponses.hour ;
SQLFiddle: http://sqlfiddle.com/#!4/c09b9/2

Get time difference between row values grouped by event

I am using Postgres 9.3.3
I have a table with multiple events, two of them are "AVAILABLE" and "UNAVAILABLE". These events are assigned to a specific object. There are also other object ids in this table (removed for clarity):
What I need is the "available" time per day, something like that:

SQL Fiddle
select
object_id, day,
sum(upper(available) - lower(available)) as available
from (
select
g.object_id, date_trunc('day', d) as day,
(
available *
tsrange(date_trunc('day', d), date_trunc('day', d)::date + 1, '[)')
) as available
from
(
select
object_id, event,
tsrange(
timestamp,
lead(timestamp) over(
partition by object_id order by timestamp
),
'[)'
) as available
from events
where event in ('AVAILABLE', 'UNAVAILABLE')
) s
right join
(
generate_series(
(select min(timestamp) from events),
(select max(timestamp) from events),
'1 day'
) g (d)
cross join
(select distinct object_id from events) s
) g on
tsrange(date_trunc('day', d), date_trunc('day', d)::date + 1, '[)') && available and
(event = 'AVAILABLE' or event is null) and
g.object_id = s.object_id
) s
group by 1, 2
order by 1, 2
psql output
object_id | day | available
-----------+---------------------+-----------
1 | 1970-01-02 00:00:00 | 12:00:00
1 | 1970-01-03 00:00:00 | 12:00:00
1 | 1970-01-04 00:00:00 |
1 | 1970-01-05 00:00:00 | 1 day
1 | 1970-01-06 00:00:00 | 1 day
1 | 1970-01-07 00:00:00 | 12:00:00
Table DDL
create table events (
object_id int,
event text,
timestamp timestamp
);
insert into events (object_id, event, timestamp) values
(1, 'AVAILABLE', '1970-01-02 12:00:00'),
(1, 'UNAVAILABLE', '1970-01-03 12:00:00'),
(1, 'AVAILABLE', '1970-01-05 00:00:00'),
(1, 'UNAVAILABLE', '1970-01-07 12:00:00');

Your example output suggests that you want all your objects to be returned, but grouped. If that is the case, this query can do that
select object_id, day, sum(upper(tsrange) - lower(tsrange))
from (
select object_id, date(day) as day, e.tsrange * tsrange(day, day + interval '1' day) tsrange
from generate_series(timestamp '1970-01-01', '1970-01-07', interval '1' day) day
left join (
select object_id,
case event
when 'AVAILABLE' then tsrange(timestamp, lead(timestamp) over (partition by object_id order by timestamp))
else null
end tsrange
from events
where event in ('AVAILABLE', 'UNAVAILABLE')
) e on e.tsrange && tsrange(day, day + interval '1' day)
) d
group by object_id, day
order by day, object_id
But that will output something like this (if you have multiple object_ids):
object_id | day | sum
-----------+--------------+-----------
| '1970-01-01' |
1 | '1970-01-02' | '12:00:00'
1 | '1970-01-03' | '12:00:00'
| '1970-01-04' |
1 | '1970-01-05' | '1 day'
1 | '1970-01-06' | '1 day'
2 | '1970-01-06' | '12:00:00'
1 | '1970-01-07' | '12:00:00'
In my opinion it would make much more sense, if you would query just one object at a time:
select day, sum(upper(tsrange) - lower(tsrange))
from (
select date(day) as day, e.tsrange * tsrange(day, day + interval '1' day) tsrange
from generate_series(timestamp '1970-01-01', '1970-01-07', interval '1' day) day
left join (
select case event
when 'AVAILABLE' then tsrange(timestamp, lead(timestamp) over (partition by object_id order by timestamp))
else null
end tsrange
from events
where event in ('AVAILABLE', 'UNAVAILABLE')
and object_id = 1
) e on e.tsrange && tsrange(day, day + interval '1' day)
) d
group by day
order by day
This will output something, like:
day | sum
--------------+----------
'1970-01-01' |
'1970-01-02' | '12:00:00'
'1970-01-03' | '12:00:00'
'1970-01-04' |
'1970-01-05' | '1 day'
'1970-01-06' | '1 day'
'1970-01-07' | '12:00:00'
I used this schema/data for my outputs:
create table events (
object_id int,
event text,
timestamp timestamp
);
insert into events (object_id, event, timestamp)
values (1, 'AVAILABLE', '1970-01-02 12:00:00'),
(1, 'UNAVAILABLE', '1970-01-03 12:00:00'),
(1, 'AVAILABLE', '1970-01-05 00:00:00'),
(1, 'UNAVAILABLE', '1970-01-07 12:00:00'),
(2, 'AVAILABLE', '1970-01-06 00:00:00'),
(2, 'UNAVAILABLE', '1970-01-06 06:00:00'),
(2, 'AVAILABLE', '1970-01-06 12:00:00'),
(2, 'UNAVAILABLE', '1970-01-06 18:00:00');

This is a partial answer. If we assume that the next event after available is unavailable, then lead() comes to the rescue and the following is a start:
select object_id, to_char(timestamp, 'YYYY-MM-DD') as day,
to_char(nextts - timestamp, 'HH24:MI') as interval
from (select t.*,
lead(timestamp) over (partition by object_id order by timestamp) as nextts
from table t
where event in ('AVAILABLE', 'UNAVAILABLE')
) t
where event = 'AVAILABLE'
group by object_id, to_char(timestamp, 'YYYY-MM-DD');
I suspect, though, that when the interval spans multiple days, you want to split the days into separate parts. This becomes more of a challenge.
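The day-splitting step mentioned above can be illustrated outside SQL. The following Python sketch handles one object at a time and assumes, like the query, that each AVAILABLE event is closed by the next event in timestamp order. The function name available_per_day is hypothetical, and the sample events are the four rows from the table DDL shown earlier in this thread.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

def available_per_day(events):
    """events: list of (timestamp, event) for one object. Returns {day: timedelta}."""
    totals = defaultdict(timedelta)
    events = sorted(events)
    # Pair each event with the one that follows it, like lead(timestamp).
    for (ts, event), (next_ts, _) in zip(events, events[1:]):
        if event != "AVAILABLE":
            continue
        cursor = ts
        while cursor < next_ts:
            next_midnight = datetime.combine(cursor.date(), datetime.min.time()) + timedelta(days=1)
            piece_end = min(next_midnight, next_ts)   # cut the interval at midnight
            totals[cursor.date()] += piece_end - cursor
            cursor = piece_end
    return dict(totals)

result = available_per_day([
    (datetime(1970, 1, 2, 12, 0), "AVAILABLE"),
    (datetime(1970, 1, 3, 12, 0), "UNAVAILABLE"),
    (datetime(1970, 1, 5, 0, 0), "AVAILABLE"),
    (datetime(1970, 1, 7, 12, 0), "UNAVAILABLE"),
])
for day in sorted(result):
    print(day, result[day])
```

Days with no availability (such as 1970-01-04) simply do not appear in the result, whereas the generate_series based queries return them with an empty value.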
