I want to find the maximum number of overlapping intervals in a given period of time. Each row always has a start and an end timestamp. For a given period (e.g. hourly) I want to get the total number of unique rows that were active in that period and, a bit more troublesome, the maximum number of concurrent rows within it.
Sample data:
id  start                end
1   2011-12-19 06:00:00  2011-12-19 08:45:00
2   2011-12-19 06:15:00  2011-12-19 06:30:00
3   2011-12-19 06:30:00  2011-12-19 06:45:00
4   2011-12-19 06:40:00  2011-12-19 07:15:00
5   2011-12-19 07:15:00  2011-12-19 08:45:00
6   2011-12-19 07:30:00  2011-12-19 07:50:00
7   2011-12-19 08:00:00  2011-12-19 08:30:00
8   2011-12-19 08:00:00  2011-12-19 08:15:00
9   2011-12-19 08:30:00  2011-12-19 08:45:00
For this data the hourly result would look like:
id  period                                     max  total
1   2011-12-19 06:00:00 - 2011-12-19 07:00:00  3    4
2   2011-12-19 07:00:00 - 2011-12-19 08:00:00  3    4
3   2011-12-19 08:00:00 - 2011-12-19 09:00:00  4    5
Where max (max concurrent) would be:
2011-12-19 06:00:00 - Concurrent sessions: (2,1), (3,1,4) Total: 1,2,3,4
2011-12-19 07:00:00 - Concurrent sessions: (1,4), (5,1,6) Total: 1,4,5,6
2011-12-19 08:00:00 - Concurrent sessions: (1,5,7,8), (9,1,5) Total: 1,5,7,8,9
Any ideas how I could achieve something like this using SQL (BigQuery)?
This is a little complicated, but here is a query:
with t as (
select 1 as id, timestamp('2011-12-19 06:00:00') as startt, timestamp('2011-12-19 08:45:00') as endt union all
select 2 as id, timestamp('2011-12-19 06:15:00') as startt, timestamp('2011-12-19 06:30:00') as endt union all
select 3 as id, timestamp('2011-12-19 06:30:00') as startt, timestamp('2011-12-19 06:45:00') as endt union all
select 4 as id, timestamp('2011-12-19 06:40:00') as startt, timestamp('2011-12-19 07:15:00') as endt union all
select 5 as id, timestamp('2011-12-19 07:15:00') as startt, timestamp('2011-12-19 08:45:00') as endt union all
select 6 as id, timestamp('2011-12-19 07:30:00') as startt, timestamp('2011-12-19 07:50:00') as endt union all
select 7 as id, timestamp('2011-12-19 08:00:00') as startt, timestamp('2011-12-19 08:30:00') as endt union all
select 8 as id, timestamp('2011-12-19 08:00:00') as startt, timestamp('2011-12-19 08:15:00') as endt union all
select 9 as id, timestamp('2011-12-19 08:30:00') as startt, timestamp('2011-12-19 08:45:00') as endt
),
se as (
select id, startt as ts, 1 as inc
from t union all
select id, endt as ts, -1 as inc
from t union all
select null, ts, 0
from unnest(generate_timestamp_array(timestamp('2011-12-19 06:00:00'),
timestamp('2011-12-19 08:00:00'),
interval 1 hour)
) ts
),
p as (
select ts, (inc = 0) as col, sum(inc) as value_at,
countif(inc = 1) as num_starts,
sum(sum(inc)) over (order by ts, max(inc = 0) desc) as active_at,
sum(countif(inc = 0)) over (order by ts, max(inc = 0) desc) as period_grp
from se
group by 1, 2
)
select period_grp, min(ts) as period,
max(active_at) as max_in_period,
(array_agg(active_at order by ts limit 1)[ordinal(1)] +
sum(num_starts)
) as total
from p
group by period_grp;
The key idea is to split the starts and stops into separate rows with an "increment" of +1 or -1. This is then augmented with the hourly breaks that you want.
The code then does the following:
Calculate the cumulative sum of the increment to get the number of concurrent ids at each timestamp.
Calculate the "period" for each timestamp by taking a cumulative count of the generated hourly rows.
Then the two calculations you want are:
The max is simply the max of the concurrent in a group by.
The total is the concurrent at the beginning of the time period (not including any that start at the beginning of the time period) plus any starts during the time period.
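As a rough cross-check of the +1/-1 idea (not a translation of the query above), here is a minimal Python sketch run against the sample data from the question. It assumes intervals are half-open, so a session ending at 06:30 does not overlap one starting at 06:30. The names `hourly_concurrency` and `ts` are made up for this illustration.

```python
from datetime import datetime, timedelta

def hourly_concurrency(intervals, first_hour, n_hours):
    """Return (hour_start, max_concurrent, total) for each hourly bucket.

    Assumption: intervals are half-open, i.e. [start, end).
    """
    results = []
    for k in range(n_hours):
        lo = first_hour + timedelta(hours=k)
        hi = lo + timedelta(hours=1)
        # Sessions that overlap this hour at all.
        overlapping = [(s, e) for s, e in intervals if s < hi and e > lo]
        # One +1 event per start and one -1 event per end, clipped to the hour.
        events = []
        for s, e in overlapping:
            events.append((max(s, lo), 1))
            events.append((min(e, hi), -1))
        # Sort by time; at equal times -1 sorts before +1, so an end is
        # processed before a start that happens at the same timestamp.
        events.sort()
        active = peak = 0
        for _, inc in events:
            active += inc
            peak = max(peak, active)
        results.append((lo, peak, len(overlapping)))
    return results

def ts(hhmm):
    # Helper for the sample data: every row is on 2011-12-19.
    return datetime.strptime("2011-12-19 " + hhmm, "%Y-%m-%d %H:%M")

sessions = [
    (ts("06:00"), ts("08:45")), (ts("06:15"), ts("06:30")),
    (ts("06:30"), ts("06:45")), (ts("06:40"), ts("07:15")),
    (ts("07:15"), ts("08:45")), (ts("07:30"), ts("07:50")),
    (ts("08:00"), ts("08:30")), (ts("08:00"), ts("08:15")),
    (ts("08:30"), ts("08:45")),
]

result = hourly_concurrency(sessions, ts("06:00"), 3)
for hour, peak, total in result:
    print(hour, peak, total)
```

On the sample data this gives max/total of 3/4, 3/4 and 4/5 for the three hours, which agrees with the expected result in the question.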
Let's start with a resultset containing all the distinct timestamps in your ev (event) table. (UNION strips duplicates.)
SELECT start t FROM ev
UNION
SELECT end t FROM ev
Next let's figure out how many sessions are active at each of these points in time. We can do that by using a JOIN to check whether each session is active at the point in time.
SELECT COUNT(*) concurrent, t.t
FROM ev
JOIN (
SELECT start t FROM ev
UNION
SELECT end t FROM ev
) t ON ev.start <= t.t AND ev.end > t.t
GROUP BY t.t
If you have a very large number of sessions, this query can get expensive. In production you'd be smart to restrict it by date range and to put a compound index on (start, end).
Finally, group that result set by hour and take the maximum concurrency.
SELECT DATE_FORMAT(t, '%Y-%m-%d %H:00') hour_beginning,
MAX(concurrent) concurrent
FROM (
SELECT COUNT(*) concurrent, t.t
FROM ev
JOIN (
SELECT start t FROM ev
UNION
SELECT end t FROM ev
) t ON ev.start <= t.t AND ev.end > t.t
GROUP BY t.t
) q
GROUP BY DATE_FORMAT(t, '%Y-%m-%d %H:00')
Notice a couple of things.
The expression DATE_FORMAT(t, '%Y-%m-%d %H:00') gets you a timestamp that's the beginning of the hour of t.
To work perfectly, this assumes the end columns in your table record the first moment the session became inactive, not the last moment the session was active. (There are two kinds of hard problems in computer science: naming things, caching things, and off-by-one errors. :-)
This is tested on MySQL. BigQuery may vary in its syntax.
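To make the half-open convention (start <= t AND end > t) concrete, here is a small Python sketch of the same three steps, run on the sample data from the question. It is only an illustration of the logic, not MySQL; the list `ev` and the helper `ts` are stand-ins I made up for the table.

```python
from collections import defaultdict
from datetime import datetime

def ts(hhmm):
    # Helper for the sample data: every row is on 2011-12-19.
    return datetime.strptime("2011-12-19 " + hhmm, "%Y-%m-%d %H:%M")

# (start, end) pairs standing in for the ev table.
ev = [
    (ts("06:00"), ts("08:45")), (ts("06:15"), ts("06:30")),
    (ts("06:30"), ts("06:45")), (ts("06:40"), ts("07:15")),
    (ts("07:15"), ts("08:45")), (ts("07:30"), ts("07:50")),
    (ts("08:00"), ts("08:30")), (ts("08:00"), ts("08:15")),
    (ts("08:30"), ts("08:45")),
]

# Step 1: all distinct timestamps (a set drops duplicates, like UNION).
points = sorted({t for s, e in ev for t in (s, e)})

# Step 2: sessions active at each point, using start <= t < end.
concurrent = {t: sum(1 for s, e in ev if s <= t < e) for t in points}

# Step 3: maximum concurrency per hour.
hourly_max = defaultdict(int)
for t, n in concurrent.items():
    if n:  # the inner JOIN produces no row when nothing is active
        hour = t.replace(minute=0, second=0, microsecond=0)
        hourly_max[hour] = max(hourly_max[hour], n)

for hour in sorted(hourly_max):
    print(hour, hourly_max[hour])
```

Because a session counts as inactive at its own end timestamp, session 2 (ending 06:30) and session 3 (starting 06:30) are never counted together.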
Consider below approach - seems to me most simple and least verbose
select
timestamp_trunc(ts, hour) hour,
max(concurrent) `max`,
hll_count.merge(ids) total
from (
select ts, count(distinct id) concurrent, hll_count.init(id) ids
from `project.dataset.table`,
unnest(generate_timestamp_array(start, `end`, interval 1 minute)) ts
group by ts
)
group by hour
if applied to sample data in your question - output is
I have an ask for a count of number of guests in a venue broken down to the minute. The data set I have available to me is the venue, the date/time the guest entered the venue, and the date/time the guest exited the venue. The business is asking for a breakdown by minute of the count of guests in the venue.
For example, guest A enters the venue at 12:00 and exits at 13:00. Guest B enters the venue at 12:30 and exits at 13:30. The expected output would show a count of 1 from 12:00 to 12:29, a count of two from 12:30 to 13:00, and back to a count of one from 13:00 to 13:30.
I’m struggling with the ask due to restrictions placed upon me. I am not authorized to make any structure changes; therefore, no DDL, which means I am restricted to SQL or anonymous PL/SQL blocks. Some more information, although I am unsure whether it is necessary: the database version is 12.2c and it is running on AIX.
I do have a workaround where I extract the dataset as a csv and import it into a C# console application, which I wrote, but I would prefer if the ask can be conducted within the Oracle ecosystem.
I appreciate any help or insight you can share about my problem.
You can solve this problem with a combination of several tricks: connect by level <= 91 to create the 91 minutes for the time frame, a left join to include all minutes even if there isn't an event at that minute, a case and sum to count and sum arrivals and departures, and finally an analytic function to generate the running total of guests by adding arrivals and subtracting departures.
--The number of guests present per minute.
select
the_minute,
sum(arrive_counter + depart_counter) over (order by the_minute) guest_count
from
(
--Join time and visits and count arrivals and departures.
select
the_minute,
sum(case when the_minute = arrive_date then 1 else 0 end) arrive_counter,
sum(case when the_minute = depart_date then -1 else 0 end) depart_counter
from
(
--Every minute for a time period. (Change to 1441 for an entire day.)
select timestamp '2022-01-24 12:00:00' + (level - 1) * interval '1' minute the_minute
from dual
connect by level <= 91
) minutes
left join visit
on minutes.the_minute = arrive_date
or minutes.the_minute = depart_date
group by the_minute
order by the_minute
)
order by the_minute;
Results:
THE_MINUTE GUEST_COUNT
24-JAN-22 12.00.00.000000000 PM 1
24-JAN-22 12.01.00.000000000 PM 1
...
24-JAN-22 12.28.00.000000000 PM 1
24-JAN-22 12.29.00.000000000 PM 1
24-JAN-22 12.30.00.000000000 PM 2
24-JAN-22 12.31.00.000000000 PM 2
...
24-JAN-22 12.58.00.000000000 PM 2
24-JAN-22 12.59.00.000000000 PM 2
24-JAN-22 01.00.00.000000000 PM 1
24-JAN-22 01.01.00.000000000 PM 1
...
24-JAN-22 01.28.00.000000000 PM 1
24-JAN-22 01.29.00.000000000 PM 1
24-JAN-22 01.30.00.000000000 PM 0
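If it helps to see the running-total logic outside of SQL, here is a minimal Python sketch of the same idea: generate every minute, count arrivals and departures at that minute, and keep a running sum. The two visits are assumed from the example in the question (guest A 12:00 to 13:00, guest B 12:30 to 13:30) on the date shown in the results; the variable names are mine.

```python
from datetime import datetime, timedelta

# Assumed sample visits: (arrive_date, depart_date).
visits = [
    (datetime(2022, 1, 24, 12, 0), datetime(2022, 1, 24, 13, 0)),    # guest A
    (datetime(2022, 1, 24, 12, 30), datetime(2022, 1, 24, 13, 30)),  # guest B
]

start = datetime(2022, 1, 24, 12, 0)
guest_count = {}
running = 0
for i in range(91):  # 91 minutes, like "connect by level <= 91"
    minute = start + timedelta(minutes=i)
    arrivals = sum(1 for a, d in visits if a == minute)
    departures = sum(1 for a, d in visits if d == minute)
    running += arrivals - departures  # running total of guests present
    guest_count[minute] = running

for minute in (start, start + timedelta(minutes=30), start + timedelta(minutes=60)):
    print(minute, guest_count[minute])
```

Like the query, this only matches visits whose arrival and departure fall exactly on a minute boundary inside the generated range.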
You can use:
SELECT timestamp AS time_from,
LEAD(timestamp) OVER(ORDER BY timestamp) AS time_to,
SUM(SUM(change_in_guests)) OVER (ORDER BY timestamp) AS guests
FROM guests
UNPIVOT(
timestamp FOR change_in_guests IN (
entry AS +1,
exit AS -1
)
)
GROUP BY timestamp;
Which, for the sample data:
CREATE TABLE guests (id, entry, exit) AS
SELECT 'A', DATE '2022-01-25' + INTERVAL '12:00' HOUR TO MINUTE, DATE '2022-01-25' + INTERVAL '13:00' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 'B', DATE '2022-01-25' + INTERVAL '12:30' HOUR TO MINUTE, DATE '2022-01-25' + INTERVAL '13:30' HOUR TO MINUTE FROM DUAL;
Outputs:
TIME_FROM            TIME_TO              GUESTS
2022-01-25 12:00:00  2022-01-25 12:30:00  1
2022-01-25 12:30:00  2022-01-25 13:00:00  2
2022-01-25 13:00:00  2022-01-25 13:30:00  1
2022-01-25 13:30:00  null                 0
If you want it minute-by-minute then:
WITH minutes (minute, time_to, guests) AS (
SELECT timestamp,
LEAD(timestamp) OVER(ORDER BY timestamp),
SUM(SUM(change_in_guests)) OVER (ORDER BY timestamp)
FROM guests
UNPIVOT(
timestamp FOR change_in_guests IN (
entry AS +1,
exit AS -1
)
)
GROUP BY timestamp
UNION ALL
SELECT minute + INTERVAL '1' MINUTE,
time_to,
guests
FROM minutes
WHERE minute + INTERVAL '1' MINUTE < time_to
)
SEARCH DEPTH FIRST BY minute SET order_rn
SELECT minute,
guests
FROM minutes;
Which outputs:
MINUTE               GUESTS
2022-01-25 12:00:00  1
2022-01-25 12:01:00  1
2022-01-25 12:02:00  1
...                  ...
2022-01-25 12:28:00  1
2022-01-25 12:29:00  1
2022-01-25 12:30:00  2
2022-01-25 12:31:00  2
...                  ...
2022-01-25 12:58:00  2
2022-01-25 12:59:00  2
2022-01-25 13:00:00  1
2022-01-25 13:01:00  1
...                  ...
2022-01-25 13:28:00  1
2022-01-25 13:29:00  1
2022-01-25 13:30:00  0
Given a table like this, I would like to compute the time duration of each state before changing to a different state:
id state timestamp
1 1 2018-08-17 10:40:00
1 2 2018-08-17 12:40:00
1 1 2018-08-17 14:40:00
2 1 2018-08-17 09:00:00
2 2 2018-08-17 12:00:00
The output I want is:
id state date duration
1 1 2018-08-17 2 hours
1 2 2018-08-17 2 hours
1 1 2018-08-17 9 hours 20 minutes (until the end of the day in this case)
2 1 2018-08-17 3 hours
2 2 2018-08-17 12 hours (until the end of the day in this case)
I am not so sure whether this is doable in SQL. I feel like I have to write a UDF against aggregated state and timestamp (grouped by id and ordered by ts) which outputs an array of struct (id, state, date, and duration). This array can be flattened.
Below is for BigQuery Standard SQL
#standardSQL
SELECT id, state,
IFNULL(
TIMESTAMP_DIFF(LEAD(ts) OVER(PARTITION BY id ORDER BY ts), ts, MINUTE),
24*60 - TIMESTAMP_DIFF(ts, TIMESTAMP_TRUNC(ts, DAY), MINUTE)
) AS duration_minutes
FROM `project.dataset.table`
You can test, play with above using dummy data from your question:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 1 state, TIMESTAMP('2018-08-17 10:40:00') ts UNION ALL
SELECT 1, 2, '2018-08-17 12:40:00' UNION ALL
SELECT 1, 1, '2018-08-17 14:40:00' UNION ALL
SELECT 2, 1, '2018-08-17 09:00:00' UNION ALL
SELECT 2, 2, '2018-08-17 12:00:00'
)
SELECT id, state,
IFNULL(
TIMESTAMP_DIFF(LEAD(ts) OVER(PARTITION BY id ORDER BY ts), ts, MINUTE),
24*60 - TIMESTAMP_DIFF(ts, TIMESTAMP_TRUNC(ts, DAY), MINUTE)
) AS duration_minutes
FROM `project.dataset.table`
-- ORDER BY id, ts
with result as below
Row id state duration_minutes
1 1 1 120
2 1 2 120
3 1 1 560
4 2 1 180
5 2 2 720
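For anyone who wants to sanity-check the numbers above, here is a rough Python equivalent of the LEAD-with-fallback logic: the duration runs until the next row for the same id, or until the end of the day when there is no next row. The function name `state_durations` is made up for this sketch.

```python
from datetime import datetime, timedelta

# (id, state, ts) rows from the question.
rows = [
    (1, 1, datetime(2018, 8, 17, 10, 40)),
    (1, 2, datetime(2018, 8, 17, 12, 40)),
    (1, 1, datetime(2018, 8, 17, 14, 40)),
    (2, 1, datetime(2018, 8, 17, 9, 0)),
    (2, 2, datetime(2018, 8, 17, 12, 0)),
]

def state_durations(rows):
    """Return (id, state, duration_minutes) for each row."""
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    out = []
    for i, (id_, state, ts) in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        if nxt is not None and nxt[0] == id_:
            end = nxt[2]  # next timestamp for the same id (LEAD)
        else:
            # No next row for this id: fall back to the next midnight.
            end = datetime(ts.year, ts.month, ts.day) + timedelta(days=1)
        out.append((id_, state, int((end - ts).total_seconds() // 60)))
    return out

durations = state_durations(rows)
for row in durations:
    print(row)
```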
If you need your output formatted exactly the way you showed in the question, use below
#standardSQL
SELECT id, state, ts, duration_minutes,
FORMAT('%i hours %i minutes', DIV(duration_minutes, 60), MOD(duration_minutes, 60)) duration
FROM (
SELECT id, state, ts,
IFNULL(
TIMESTAMP_DIFF(LEAD(ts) OVER(PARTITION BY id ORDER BY ts), ts, MINUTE),
24*60 - TIMESTAMP_DIFF(ts, TIMESTAMP_TRUNC(ts, DAY), MINUTE)
) AS duration_minutes
FROM `project.dataset.table`
)
In this case your output will look like below
Row id state ts duration_minutes duration
1 1 1 2018-08-17 10:40:00 UTC 120 2 hours 0 minutes
2 1 2 2018-08-17 12:40:00 UTC 120 2 hours 0 minutes
3 1 1 2018-08-17 14:40:00 UTC 560 9 hours 20 minutes
4 2 1 2018-08-17 09:00:00 UTC 180 3 hours 0 minutes
5 2 2 2018-08-17 12:00:00 UTC 720 12 hours 0 minutes
Sure, you will most likely still need to adjust above to your particular case - but you've got a good start I think
I have a table with Items with
Item_id, Item_time, Item_numbers
1 2017-01-01 18:00:00 2
2 2017-01-01 18:10:00 2
3 2017-01-01 19:10:00 3
I want to group the items hourly for a specific time window (between 9 and 3 each day), and if there is no entry for a particular hour, the value should be 0.
Desired Output:
Item_time Item_numbers
2017-01-01 18:00:00 4
2017-01-01 19:00:00 3
2017-01-01 20:00:00 0
with hour_items as (
select date_trunc('hour', item_time) "hour", avg(item_numbers) as value
from items
where item_id=2 and fact_time::date= '2017-01-01'
group by hour
)
select hour, value
from hour_items
where EXTRACT(HOUR FROM hour) >= '9' and EXTRACT(HOUR FROM hour) < '15'
The above query groups them correctly, but where an hour is missing there is no entry. There should be an entry with a 0 instead, as shown in the desired output.
This should do.
We get all the distinct days (CTE dates), then we generate hours for each of those dates (CTE hours), and finally we left join our data on a per-hour basis.
with sample_data as (
select 1 as item_id, '2018-01-01 12:03:15'::timestamp as item_time, 2 as item_numbers
union all
select 2 as item_id, '2018-01-01 12:41:15'::timestamp as item_time, 1 as item_numbers
union all
select 3 as item_id, '2018-01-01 17:41:15'::timestamp as item_time, 2 as item_numbers
union all
select 4 as item_id, '2018-01-01 19:41:15'::timestamp as item_time, 2 as item_numbers
),
dates as (
select distinct item_time::date
from sample_data
),
hours as (
select item_time + interval '1 hour' * a as hour
from dates
cross join generate_series(0,23) a
)
select h.hour, sum(coalesce(sd.item_numbers,0))
from hours h
left join sample_data sd on h.hour = date_trunc('hour', sd.item_time)
where extract(hour from hour) between 9 and 17
group by h.hour
order by h.hour
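The same "generate every hour first, then attach the data" idea can be sketched in Python, which may make it easier to see why the empty hours come out as 0. This is only an illustration using the sample rows from the query above; the variable names are mine.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# (item_id, item_time, item_numbers), same rows as sample_data above.
sample_data = [
    (1, datetime(2018, 1, 1, 12, 3, 15), 2),
    (2, datetime(2018, 1, 1, 12, 41, 15), 1),
    (3, datetime(2018, 1, 1, 17, 41, 15), 2),
    (4, datetime(2018, 1, 1, 19, 41, 15), 2),
]

# Sum item_numbers per truncated hour (the date_trunc('hour', ...) part).
totals = defaultdict(int)
for _, item_time, item_numbers in sample_data:
    totals[item_time.replace(minute=0, second=0, microsecond=0)] += item_numbers

# Generate hours 9..17 for every distinct day, then look up each one,
# defaulting to 0 when no row falls in that hour (the left join part).
days = sorted({item_time.date() for _, item_time, _ in sample_data})
hourly = []
for day in days:
    midnight = datetime(day.year, day.month, day.day)
    for h in range(9, 18):
        hour = midnight + timedelta(hours=h)
        hourly.append((hour, totals.get(hour, 0)))

for hour, total in hourly:
    print(hour, total)
```

The 19:41 row is dropped because its hour is outside the 9 to 17 window, exactly as the where clause does in the query.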
I have a set of rows containing a start timestamp and a duration. I want to perform various summaries using the overlap or concurrency.
For example: peak daily concurrency, peak concurrency grouped on another column.
Example data:
timestamp,duration
2016-01-01 12:00:00,300
2016-01-01 12:01:00,300
2016-01-01 12:06:00,300
I would like to know that peak for the period was 12:01:00-12:05:00 at 2 concurrent.
Any ideas on how to achieve this using BigQuery or, less exciting, a Map/Reduce job?
For a per-minute resolution, with session lengths of up to 255 minutes:
SELECT session_minute, COUNT(*) c
FROM (
SELECT start, DATE_ADD(start, i, 'MINUTE') session_minute FROM (
SELECT * FROM (
SELECT TIMESTAMP("2015-04-30 10:14") start, 7 minutes
),(
SELECT TIMESTAMP("2015-04-30 10:15") start, 12 minutes
),(
SELECT TIMESTAMP("2015-04-30 10:15") start, 12 minutes
),(
SELECT TIMESTAMP("2015-04-30 10:18") start, 12 minutes
),(
SELECT TIMESTAMP("2015-04-30 10:23") start, 3 minutes
)
) a
CROSS JOIN [fh-bigquery:public_dump.numbers_255] b
WHERE a.minutes>b.i
)
GROUP BY 1
ORDER BY 1
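For reference, the expansion that the CROSS JOIN performs can be mimicked in a few lines of Python. This is a sketch under one assumption I cannot verify from the query alone: that the numbers table starts at i = 0, so a session of N minutes contributes minutes 0 through N-1.

```python
from collections import Counter
from datetime import datetime, timedelta

# (start, minutes), same inline sessions as in the query above.
sessions = [
    (datetime(2015, 4, 30, 10, 14), 7),
    (datetime(2015, 4, 30, 10, 15), 12),
    (datetime(2015, 4, 30, 10, 15), 12),
    (datetime(2015, 4, 30, 10, 18), 12),
    (datetime(2015, 4, 30, 10, 23), 3),
]

# One entry per (session, minute offset), then count entries per minute.
# Assumption: offsets run from 0 to minutes - 1.
per_minute = Counter(
    start + timedelta(minutes=i)
    for start, minutes in sessions
    for i in range(minutes)
)

peak = max(per_minute.values())
for minute in sorted(per_minute):
    print(minute, per_minute[minute])
```

Note that the number of generated rows grows with total session length, which is why this approach suits short sessions at coarse resolution.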
STEP 1 - First you need to find all periods (start and end) with their respective concurrent entries
SELECT ts AS start, LEAD(ts) OVER(ORDER BY ts) AS finish,
SUM(entry) OVER(ORDER BY ts) AS concurrent_entries
FROM (
SELECT ts, SUM(entry)AS entry
FROM
(SELECT ts, 1 AS entry FROM yourTable),
(SELECT DATE_ADD(ts, duration, 'second') AS ts, -1 AS entry FROM yourTable)
GROUP BY ts
HAVING entry != 0
)
ORDER BY ts
Assuming input as below
(SELECT TIMESTAMP('2016-01-01 12:00:00') AS ts, 300 AS duration),
(SELECT TIMESTAMP('2016-01-01 12:01:00') AS ts, 300 AS duration),
(SELECT TIMESTAMP('2016-01-01 12:06:00') AS ts, 300 AS duration),
(SELECT TIMESTAMP('2016-01-01 12:07:00') AS ts, 300 AS duration),
(SELECT TIMESTAMP('2016-01-01 12:10:00') AS ts, 300 AS duration),
(SELECT TIMESTAMP('2016-01-01 12:11:00') AS ts, 300 AS duration)
the output of the above query will look something like this:
start finish concurrent_entries
2016-01-01 12:00:00 UTC 2016-01-01 12:01:00 UTC 1
2016-01-01 12:01:00 UTC 2016-01-01 12:05:00 UTC 2
2016-01-01 12:05:00 UTC 2016-01-01 12:07:00 UTC 1
2016-01-01 12:07:00 UTC 2016-01-01 12:10:00 UTC 2
2016-01-01 12:10:00 UTC 2016-01-01 12:12:00 UTC 3
2016-01-01 12:12:00 UTC 2016-01-01 12:15:00 UTC 2
2016-01-01 12:15:00 UTC 2016-01-01 12:16:00 UTC 1
2016-01-01 12:16:00 UTC null 0
You might still want to polish above query a little - but mainly it does what you need
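If you want to check the table above by hand, the same logic can be sketched in Python: add +1 at each start, -1 at each start plus duration, drop timestamps whose net change is zero, and keep a running sum. The function name `concurrency_periods` is made up for this sketch.

```python
from collections import Counter
from datetime import datetime, timedelta

# (ts, duration in seconds), same rows as the assumed input above.
rows = [
    (datetime(2016, 1, 1, 12, 0), 300),
    (datetime(2016, 1, 1, 12, 1), 300),
    (datetime(2016, 1, 1, 12, 6), 300),
    (datetime(2016, 1, 1, 12, 7), 300),
    (datetime(2016, 1, 1, 12, 10), 300),
    (datetime(2016, 1, 1, 12, 11), 300),
]

def concurrency_periods(rows):
    """Return (start, finish, concurrent_entries) tuples in time order."""
    delta = Counter()
    for ts, duration in rows:
        delta[ts] += 1                                # entry begins
        delta[ts + timedelta(seconds=duration)] -= 1  # entry ends
    # Keep only timestamps where the count changes (HAVING entry != 0).
    points = sorted(t for t, d in delta.items() if d != 0)
    periods, running = [], 0
    for i, t in enumerate(points):
        running += delta[t]
        finish = points[i + 1] if i + 1 < len(points) else None  # LEAD(ts)
        periods.append((t, finish, running))
    return periods

periods = concurrency_periods(rows)
for start, finish, concurrent in periods:
    print(start, finish, concurrent)
```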
STEP 2 - now you can do any stats off of above result
For example peak on whole period:
SELECT
start, finish, concurrent_entries, RANK() OVER(ORDER BY concurrent_entries DESC) AS peak
FROM (
SELECT ts AS start, LEAD(ts) OVER(ORDER BY ts) AS finish,
SUM(entry) OVER(ORDER BY ts) AS concurrent_entries
FROM (
SELECT ts, SUM(entry)AS entry FROM
(SELECT ts, 1 AS entry FROM yourTable),
(SELECT DATE_ADD(ts, duration, 'second') AS ts, -1 AS entry FROM yourTable)
GROUP BY ts
HAVING entry != 0
)
)
ORDER BY peak
I'm using Oracle 11g and I have this problem. I haven't been able to come up with any ideas to solve it yet.
I have a table with occupied classrooms. What I need to find are the hours available between a datetime range. For example, I have rooms A, B and C, the table of occupied classrooms looks like this:
Classroom start end
A 10/10/2013 10:00 10/10/2013 11:30
B 10/10/2013 09:15 10/10/2013 10:45
B 10/10/2013 14:30 10/10/2013 16:00
What I need to get is something like this:
with date time range between '10/10/2013 07:00' and '10/10/2013 21:15'
Classroom available_from available_to
A 10/10/2013 07:00 10/10/2013 10:00
A 10/10/2013 11:30 10/10/2013 21:15
B 10/10/2013 07:00 10/10/2013 09:15
B 10/10/2013 10:45 10/10/2013 14:30
B 10/10/2013 16:00 10/10/2013 21:15
C 10/10/2013 07:00 10/10/2013 21:15
Is there a way I can accomplish that with sql or pl/sql?
I was looking at a solution similar in concept at least to Wernfried's, but I think it's different enough to post as well. The start is the same idea, first generating the possible time slots, and assuming you're looking at 15-minute windows: I'm using CTEs because I think they're clearer than nested selects, particularly with this many levels.
with date_time_range as (
select to_date('10/10/2013 07:00', 'DD/MM/YYYY HH24:MI') as date_start,
to_date('10/10/2013 21:15', 'DD/MM/YYYY HH24:MI') as date_end
from dual
),
time_slots as (
select level as slot_num,
dtr.date_start + (level - 1) * interval '15' minute as slot_start,
dtr.date_start + level * interval '15' minute as slot_end
from date_time_range dtr
connect by level <= (dtr.date_end - dtr.date_start) * (24 * 4) -- 15-minutes
)
select * from time_slots;
This gives you the 57 15-minute slots between the start and end date you specified. The CTE for date_time_range isn't strictly necessary, you could put your dates straight into the time_slots conditions, but you'd have to repeat them and that then introduces a possible failure point (and means binding the same value multiple times, from JDBC or wherever).
Those slots can then be cross-joined to the list of classrooms, which I'm assuming are already in another table, which gives you 171 (3x57) combinations; and those can be compared with existing bookings - once those are eliminated you're left with the 153 15-minute slots that have no booking.
with date_time_range as (...),
time_slots as (...),
free_slots as (
select c.classroom, ts.slot_num, ts.slot_start, ts.slot_end,
lag(ts.slot_end) over (partition by c.classroom order by ts.slot_num)
as lag_end,
lead(ts.slot_start) over (partition by c.classroom order by ts.slot_num)
as lead_start
from time_slots ts
cross join classrooms c
left join occupied_classrooms oc on oc.classroom = c.classroom
and not (oc.occupied_end <= ts.slot_start
or oc.occupied_start >= ts.slot_end)
where oc.classroom is null
)
select * from free_slots;
But then you have to collapse those into contiguous ranges. There are various ways of doing that; here I'm peeking at the previous and next rows to decide if a particular value is the edge of a range:
with date_time_range as (...),
time_slots as (...),
free_slots as (...),
free_slots_extended as (
select fs.classroom, fs.slot_num,
case when fs.lag_end is null or fs.lag_end != fs.slot_start
then fs.slot_start end as slot_start,
case when fs.lead_start is null or fs.lead_start != fs.slot_end
then fs.slot_end end as slot_end
from free_slots fs
)
select * from free_slots_extended fse
where (fse.slot_start is not null or fse.slot_end is not null);
Now we're down to 12 rows. (The outer where clause eliminates all 141 of the 153 slots from the previous step which are mid-range, since we only care about the edges):
CLASSROOM   SLOT_NUM SLOT_START       SLOT_END
--------- ---------- ---------------- ----------------
A                  1 2013-10-10 07:00
A                 12                  2013-10-10 10:00
A                 19 2013-10-10 11:30
A                 57                  2013-10-10 21:15
B                  1 2013-10-10 07:00
B                  9                  2013-10-10 09:15
B                 16 2013-10-10 10:45
B                 30                  2013-10-10 14:30
B                 37 2013-10-10 16:00
B                 57                  2013-10-10 21:15
C                  1 2013-10-10 07:00
C                 57                  2013-10-10 21:15
So those represent the edges, but on separate rows, and a final step combines them:
...
select distinct fse.classroom,
nvl(fse.slot_start, lag(fse.slot_start)
over (partition by fse.classroom order by fse.slot_num)) as slot_start,
nvl(fse.slot_end, lead(fse.slot_end)
over (partition by fse.classroom order by fse.slot_num)) as slot_end
from free_slots_extended fse
where (fse.slot_start is not null or fse.slot_end is not null)
Or putting all that together:
with date_time_range as (
select to_date('10/10/2013 07:00', 'DD/MM/YYYY HH24:MI') as date_start,
to_date('10/10/2013 21:15', 'DD/MM/YYYY HH24:MI') as date_end
from dual
),
time_slots as (
select level as slot_num,
dtr.date_start + (level - 1) * interval '15' minute as slot_start,
dtr.date_start + level * interval '15' minute as slot_end
from date_time_range dtr
connect by level <= (dtr.date_end - dtr.date_start) * (24 * 4) -- 15-minutes
),
free_slots as (
select c.classroom, ts.slot_num, ts.slot_start, ts.slot_end,
lag(ts.slot_end) over (partition by c.classroom order by ts.slot_num)
as lag_end,
lead(ts.slot_start) over (partition by c.classroom order by ts.slot_num)
as lead_start
from time_slots ts
cross join classrooms c
left join occupied_classrooms oc on oc.classroom = c.classroom
and not (oc.occupied_end <= ts.slot_start
or oc.occupied_start >= ts.slot_end)
where oc.classroom is null
),
free_slots_extended as (
select fs.classroom, fs.slot_num,
case when fs.lag_end is null or fs.lag_end != fs.slot_start
then fs.slot_start end as slot_start,
case when fs.lead_start is null or fs.lead_start != fs.slot_end
then fs.slot_end end as slot_end
from free_slots fs
)
select distinct fse.classroom,
nvl(fse.slot_start, lag(fse.slot_start)
over (partition by fse.classroom order by fse.slot_num)) as slot_start,
nvl(fse.slot_end, lead(fse.slot_end)
over (partition by fse.classroom order by fse.slot_num)) as slot_end
from free_slots_extended fse
where (fse.slot_start is not null or fse.slot_end is not null)
order by 1, 2;
Which gives:
CLASSROOM SLOT_START SLOT_END
--------- ---------------- ----------------
A 2013-10-10 07:00 2013-10-10 10:00
A 2013-10-10 11:30 2013-10-10 21:15
B 2013-10-10 07:00 2013-10-10 09:15
B 2013-10-10 10:45 2013-10-10 14:30
B 2013-10-10 16:00 2013-10-10 21:15
C 2013-10-10 07:00 2013-10-10 21:15
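The slot-then-collapse approach is perhaps easier to follow procedurally, so here is a minimal Python sketch of it using the bookings from the question. It assumes the room list A, B, C and 15-minute slots as above, and treats bookings as half-open so a slot ending exactly when a booking starts is still free. The function name `free_ranges` is made up for this illustration.

```python
from datetime import datetime, timedelta

def free_ranges(rooms, bookings, range_start, range_end,
                step=timedelta(minutes=15)):
    """Return (room, free_from, free_to) for contiguous free slots."""
    result = []
    for room in rooms:
        current = None  # [start, end] of the free range being built
        slot_start = range_start
        while slot_start < range_end:
            slot_end = slot_start + step
            # A slot is busy if any booking for this room overlaps it.
            busy = any(r == room and s < slot_end and e > slot_start
                       for r, s, e in bookings)
            if busy:
                if current:
                    result.append((room, current[0], current[1]))
                    current = None
            elif current:
                current[1] = slot_end  # extend the open free range
            else:
                current = [slot_start, slot_end]
            slot_start = slot_end
        if current:
            result.append((room, current[0], current[1]))
    return result

def dt(hhmm):
    # Helper for the sample data: every row is on 10 October 2013.
    return datetime.strptime("2013-10-10 " + hhmm, "%Y-%m-%d %H:%M")

bookings = [
    ("A", dt("10:00"), dt("11:30")),
    ("B", dt("09:15"), dt("10:45")),
    ("B", dt("14:30"), dt("16:00")),
]

free = free_ranges(["A", "B", "C"], bookings, dt("07:00"), dt("21:15"))
for room, start, end in free:
    print(room, start, end)
```

Unlike the SQL, this collapses ranges while scanning, so it never materialises the 171 room and slot combinations; the result should be the same six ranges.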
It is always a challenge when you want to "select something which does not exist". First you need a list of all available classrooms and times (in intervals of 15 minutes). Then you can select them by skipping the occupied items.
I managed to make a query without any PL/SQL:
CREATE TABLE Table1
(Classroom VARCHAR2(10), start_ts DATE, end_ts DATE);
INSERT INTO Table1 VALUES ('A', TIMESTAMP '2013-01-10 10:00:00', TIMESTAMP '2013-01-10 11:30:00');
INSERT INTO Table1 VALUES ('B', TIMESTAMP '2013-01-10 09:15:00', TIMESTAMP '2013-01-10 10:45:00');
INSERT INTO Table1 VALUES ('B', TIMESTAMP '2013-01-10 14:30:00', TIMESTAMP '2013-01-10 16:00:00');
WITH all_rooms AS
(SELECT CHR(64+LEVEL) AS ROOM FROM dual CONNECT BY LEVEL <= 3),
all_times AS
(SELECT CAST(TIMESTAMP '2013-01-10 07:00:00' + (LEVEL-1) * INTERVAL '15' MINUTE AS DATE) AS TIMES, LEVEL AS SLOT
FROM DUAL
CONNECT BY TIMESTAMP '2013-01-10 07:00:00' + (LEVEL-1) * INTERVAL '15' MINUTE <= TIMESTAMP '2013-01-10 21:15:00'),
all_free_slots AS
(SELECT ROOM, TIMES, SLOT,
CASE SLOT-LAG(SLOT, 1, 0) OVER (PARTITION BY ROOM ORDER BY SLOT)
WHEN 1 THEN 0
ELSE 1
END AS NEW_WINDOW
FROM all_times
CROSS JOIN all_rooms
WHERE NOT EXISTS
(SELECT 1 FROM TABLE1 WHERE ROOM = CLASSROOM AND TIMES BETWEEN START_TS + INTERVAL '1' MINUTE AND END_TS - INTERVAL '1' MINUTE)),
free_time_windows AS
(SELECT ROOM, TIMES, SLOT,
SUM(NEW_WINDOW) OVER (PARTITION BY ROOM ORDER BY SLOT) AS WINDOW_ID
FROM all_free_slots)
SELECT ROOM,
TO_CHAR(MIN(TIMES), 'yyyy-mm-dd hh24:mi') AS free_time_start,
TO_CHAR(MAX(TIMES), 'yyyy-mm-dd hh24:mi') AS free_time_end
FROM free_time_windows
GROUP BY ROOM, WINDOW_ID
HAVING MAX(TIMES) - MIN(TIMES) > 0
ORDER BY ROOM, 2;
ROOM FREE_TIME_START FREE_TIME_END
---- ----------------------------------
A 2013-01-10 07:00 2013-01-10 10:00
A 2013-01-10 11:30 2013-01-10 21:15
B 2013-01-10 07:00 2013-01-10 09:15
B 2013-01-10 10:45 2013-01-10 14:30
B 2013-01-10 16:00 2013-01-10 21:15
C 2013-01-10 07:00 2013-01-10 21:15
In order to understand the query you can split the sub-queries from top, e.g.
WITH all_rooms AS
(SELECT CHR(64+LEVEL) AS ROOM FROM dual CONNECT BY LEVEL <= 3),
all_times AS
(SELECT CAST(TIMESTAMP '2013-01-10 07:00:00' + (LEVEL-1) * INTERVAL '15' MINUTE AS DATE) AS TIMES, LEVEL AS SLOT
FROM DUAL
CONNECT BY TIMESTAMP '2013-01-10 07:00:00' + (LEVEL-1) * INTERVAL '15' MINUTE <= TIMESTAMP '2013-01-10 21:15:00')
SELECT ROOM, TIMES, SLOT,
CASE SLOT-LAG(SLOT, 1, 0) OVER (PARTITION BY ROOM ORDER BY SLOT)
WHEN 1 THEN 0
ELSE 1
END AS NEW_WINDOW
FROM all_times
CROSS JOIN all_rooms
WHERE NOT EXISTS (SELECT 1 FROM TABLE1 WHERE ROOM = CLASSROOM AND TIMES BETWEEN START_TS + INTERVAL '1' MINUTE AND END_TS - INTERVAL '1' MINUTE)
ORDER BY ROOM, SLOT