BigQuery: how to do a semi left join?

I couldn't come up with a good title for this question. Sorry about that.
I have two tables, A and B. Both have timestamps and share a common ID. Here are the schemas of both tables:
Table A:
========
a_id int,
common_id int,
ts timestamp
...
Table B:
========
b_id int,
common_id int,
ts timestamp,
temperature int
Table A is more like device data, recorded whenever a device changes its status. Table B is more like IoT data, containing a device's temperature every minute or so.
What I want to do is create a Table C from these two tables. Table C would essentially be Table A plus the temperature from Table B that is closest in time.
How can I do this purely in BigQuery SQL? The temperature info doesn't need to be precise.

The option below (for BigQuery Standard SQL) assumes that, in addition to the temperature from table b, you still need all the other values from the respective row:
#standardSQL
SELECT
ARRAY_AGG(
STRUCT(a_id, a.common_id, a.ts, b_id, b.ts AS b_ts, temperature)
ORDER BY ABS(TIMESTAMP_DIFF(a.ts, b.ts, SECOND))
LIMIT 1
)[SAFE_OFFSET(0)].*
FROM `project.dataset.table_a` a
LEFT JOIN `project.dataset.table_b` b
ON a.common_id = b.common_id
AND ABS(TIMESTAMP_DIFF(a.ts, b.ts, MINUTE)) < 30
GROUP BY TO_JSON_STRING(a)
I smoke-tested it with the generated dummy data below:
#standardSQL
WITH `project.dataset.table_a` AS (
SELECT CAST(1000000 * RAND() AS INT64) a_id, common_id, ts
FROM UNNEST(GENERATE_TIMESTAMP_ARRAY('2018-01-01 00:00:00', '2018-01-01 23:59:59', INTERVAL 45*60 + 27 SECOND)) ts
CROSS JOIN UNNEST(GENERATE_ARRAY(1, 10)) common_id
), `project.dataset.table_b` AS (
SELECT CAST(1000000 * RAND() AS INT64) b_id, common_id, ts, CAST(60 + 40 * RAND() AS INT64) temperature
FROM UNNEST(GENERATE_TIMESTAMP_ARRAY('2018-01-01 00:00:00', '2018-01-01 23:59:59', INTERVAL 1 MINUTE)) ts
CROSS JOIN UNNEST(GENERATE_ARRAY(1, 10)) common_id
)
SELECT
ARRAY_AGG(
STRUCT(a_id, a.common_id, a.ts, b_id, b.ts AS b_ts, temperature)
ORDER BY ABS(TIMESTAMP_DIFF(a.ts, b.ts, SECOND))
LIMIT 1
)[SAFE_OFFSET(0)].*
FROM `project.dataset.table_a` a
LEFT JOIN `project.dataset.table_b` b
ON a.common_id = b.common_id
AND ABS(TIMESTAMP_DIFF(a.ts, b.ts, MINUTE)) < 30
GROUP BY TO_JSON_STRING(a)
Here are a few rows from the output:
Row | a_id | common_id | ts | b_id | b_ts | temperature
--: | ---: | --------: | :-- | ---: | :-- | ----------:
1 | 276623 | 1 | 2018-01-01 00:00:00 UTC | 166995 | 2018-01-01 00:00:00 UTC | 74
2 | 218354 | 1 | 2018-01-01 00:45:27 UTC | 464901 | 2018-01-01 00:45:00 UTC | 87
3 | 265634 | 1 | 2018-01-01 01:30:54 UTC | 565385 | 2018-01-01 01:31:00 UTC | 87
4 | 758075 | 1 | 2018-01-01 02:16:21 UTC | 55894 | 2018-01-01 02:16:00 UTC | 84
5 | 306355 | 1 | 2018-01-01 03:01:48 UTC | 844429 | 2018-01-01 03:02:00 UTC | 92
6 | 348502 | 1 | 2018-01-01 03:47:15 UTC | 375859 | 2018-01-01 03:47:00 UTC | 90
7 | 774920 | 1 | 2018-01-01 04:32:42 UTC | 438164 | 2018-01-01 04:33:00 UTC | 61
Here, table_b holds a temperature reading for each minute for 10 devices over the whole day of '2018-01-01', and table_a records a status change every 45 min 27 sec for the same 10 devices on the same day. a_id and b_id are just random numbers between 0 and 999999.
Note: the ABS(TIMESTAMP_DIFF(a.ts, b.ts, MINUTE)) < 30 condition in the JOIN controls how far from a.ts it is acceptable to look for the closest ts (in case some IoT entries are missing from table_b).
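The logic of the query above can be sketched in plain Python. Rows are assumed to be tuples shaped like the question's schemas, and the function name `nearest_join` is hypothetical. This is a minimal illustration, not the BigQuery implementation. For each table_a row, it scans table_b for readings with the same common_id that are strictly less than 30 minutes away, which is the same bound as the JOIN condition. It then keeps the reading with the smallest absolute time difference. A rows with no candidate get None values, mirroring the LEFT JOIN.

```python
from datetime import datetime, timedelta

# Hypothetical row layouts mirroring the question's schemas:
#   table_a rows: (a_id, common_id, ts)
#   table_b rows: (b_id, common_id, ts, temperature)


def nearest_join(table_a, table_b, window=timedelta(minutes=30)):
    """For each A row, attach the closest-in-time B row with the same
    common_id, considering only B rows strictly less than `window` away.
    A rows without a candidate get None values (LEFT JOIN semantics)."""
    result = []
    for a_id, common_id, a_ts in table_a:
        best, best_diff = None, None
        for b_id, b_common_id, b_ts, temperature in table_b:
            if b_common_id != common_id:
                continue
            diff = abs(a_ts - b_ts)  # absolute timedelta, direction ignored
            if diff >= window:
                continue
            # Strict "<" keeps the first of several equally close readings.
            if best_diff is None or diff < best_diff:
                best, best_diff = (b_id, b_ts, temperature), diff
        b_id, b_ts, temperature = best if best is not None else (None, None, None)
        result.append((a_id, common_id, a_ts, b_id, b_ts, temperature))
    return result
```

When two readings are equally close, this sketch keeps the first one it sees. In the SQL, the choice among ties is not guaranteed.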

Measure the closest time with TIMESTAMP_DIFF(a.ts, b.ts, SECOND), taking its absolute value to get the closest reading in either direction:
WITH a AS (
SELECT 1 id, TIMESTAMP('2018-01-01 11:01:00') ts
UNION ALL SELECT 1, TIMESTAMP('2018-01-02 10:00:00')
UNION ALL SELECT 2, TIMESTAMP('2018-01-02 10:00:00')
)
, b AS (
SELECT 1 id, TIMESTAMP('2018-01-01 12:01:00') ts, 43 temp
UNION ALL SELECT 1, TIMESTAMP('2018-01-01 12:06:00'), 47
)
SELECT *,
(SELECT temp
FROM b
WHERE a.id=b.id
ORDER BY ABS(TIMESTAMP_DIFF(a.ts,b.ts, SECOND))
LIMIT 1) temp
FROM a
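The correlated subquery compares each row of a with every matching row of b. Outside SQL, a common alternative technique, which is not what this answer uses, is to sort the readings per id once and binary-search for the insertion point. This works because the nearest timestamp must be one of the two neighbours of that point. The following is a minimal Python sketch of that swapped-in technique. The function name `nearest_temperature` is hypothetical, and rows are assumed to be `(id, ts)` and `(id, ts, temp)` tuples.

```python
import bisect
from collections import defaultdict
from datetime import datetime


def nearest_temperature(a_rows, b_rows):
    """For each (id, ts) in a_rows, return (id, ts, temp) where temp comes from
    the b_rows entry with the same id and the smallest absolute time
    difference, or None when that id has no readings."""
    by_id = defaultdict(list)
    for b_id, b_ts, temp in b_rows:
        by_id[b_id].append((b_ts, temp))
    for readings in by_id.values():
        readings.sort()  # sort each id's readings by timestamp once
    stamps = {k: [ts for ts, _ in v] for k, v in by_id.items()}

    out = []
    for a_id, a_ts in a_rows:
        readings = by_id.get(a_id)  # .get does not add keys to the defaultdict
        if not readings:
            out.append((a_id, a_ts, None))
            continue
        # The nearest reading sits just before or at the insertion point.
        i = bisect.bisect_left(stamps[a_id], a_ts)
        candidates = [readings[j] for j in (i - 1, i) if 0 <= j < len(readings)]
        best = min(candidates, key=lambda r: abs(r[0] - a_ts))
        out.append((a_id, a_ts, best[1]))
    return out
```

With the sample rows above, this should give 43, 47 and None. That matches what the subquery is designed to return: 12:01 is the nearest reading to 11:01, 12:06 is the nearest to the next day's 10:00, and id 2 has no readings.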

Related

SQL statement to return the Min and Max amount of stock per article for a given Month

I have a table from which I am trying to return the quantity per day that the article was in the system.
For example, table Bestand holds multiple pallets of different articles, each with a booking-in and a booking-out date. I am trying to find out the Min and Max amount of stock that was in the system per article and month.
My thinking is that if I can return the stock quantity for each day, I can then read out the Min and Max values.
The timespan would be set at the time of running the SQL, and the articles would be fixed.
To find out the quantity for each day I have used the following SQL:
SELECT DISTINCT
a.artbez1 AS Artikelbezeichnung,
b.artikelnr AS Artikelnummer,
SUM(CASE WHEN TO_DATE('2019-11-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') BETWEEN b.neu_datum AND b.aender_datum THEN 1 * b.menge_ist ELSE 0 END) AS "01 Nov 2019"
FROM
artikel a, bestand b
WHERE
b.artikelnr IN ('273632002', .... (huge long list of numbers) ....)
AND b.artikelnr = a.artikelnr
GROUP BY
a.artbez1, b.artikelnr;
For example, this returns:
ARTIKELBEZEICHNUNG | ARTIKELNUMMER | 01 Nov 2019
:----------------- | ------------: | ----------:
SC-4400.CW | 220450002 | 39
S-320.FK120 | 220502004 | 0
H-595.FK120 | 220800004 | 35
AC-548.FK209 | 220948032 | 0
AS-6800.CW | 221355002 | 20
I would like to return this for each day of the month and then, from that, return the Min and Max value for each article.
I have the following SQL to return the days of a given Month and was wondering if anyone had any ideas on how they could be combined (If at all possible):
SELECT to_date('01.11.2019','dd.mm.yyyy')+LEVEL-1
FROM dual
CONNECT BY LEVEL <= TO_CHAR(LAST_DAY(to_date('01.11.2019','dd.mm.yyyy')),'DD')
DATES
2019-11-01 00:00:00
2019-11-02 00:00:00
2019-11-03 00:00:00
2019-11-04 00:00:00
2019-11-05 00:00:00
2019-11-06 00:00:00
2019-11-07 00:00:00
The result I am trying to get would be something like:
ARTIKELBEZEICHNUNG | ARTIKELNUMMER | Nov 19 Min | Nov 19 Max
:----------------- | ------------: | ---------: | ---------:
SC-4400.CW | 220450002 | 5 | 39
S-320.FK120 | 220502004 | 0 | 15
H-595.FK120 | 220800004 | 2 | 35
AC-548.FK209 | 220948032 | 0 | 0
AS-6800.CW | 221355002 | 10 | 20
Is this at all possible in SQL?
Thanks for taking the time to read my post.
JeRi
You can use a partitioned outer join:
WITH calendar ( day ) AS (
SELECT DATE '2019-11-01'
FROM DUAL
UNION ALL
SELECT day + INTERVAL '1' DAY
FROM calendar
WHERE day < LAST_DAY( DATE '2019-11-01' )
),
daily_totals ( artbez1, Artikelnr, Day, total_menge_ist ) AS (
SELECT MAX( ab.artbez1 ),
ab.artikelnr,
c.day,
COALESCE( SUM( ab.menge_ist ), 0 )
FROM calendar c
LEFT OUTER JOIN
( SELECT a.artikelnr,
a.artbez1,
b.neu_datum,
b.aender_datum,
b.menge_ist
FROM artikel a
LEFT JOIN bestand b
ON ( a.artikelnr = b.artikelnr )
-- WHERE b.artikelnr IN ('273632002', .... (huge long list of numbers) ....)
) ab
PARTITION BY ( ab.artikelnr, ab.artbez1 )
ON ( c.day BETWEEN ab.neu_datum AND ab.aender_datum )
GROUP BY ab.artikelnr, c.day
)
SELECT MAX( artbez1 ) AS Artikelbezeichnung,
artikelnr AS Artikelnummer,
TRUNC( day, 'MM' ) AS month,
MIN( total_menge_ist ) AS min_total_menge_ist,
MAX( total_menge_ist ) AS max_total_menge_ist
FROM daily_totals
GROUP BY artikelnr, TRUNC( day, 'MM' );
Which, for the sample data:
CREATE TABLE artikel ( artikelnr, artbez1 ) AS
SELECT 220450002, 'SC-4400.CW' FROM DUAL UNION ALL
SELECT 220502004, 'S-320.FK120' FROM DUAL UNION ALL
SELECT 220800004, 'H-595.FK120' FROM DUAL UNION ALL
SELECT 220948032, 'AC-548.FK209' FROM DUAL UNION ALL
SELECT 221355002, 'AS-6800.CW' FROM DUAL;
CREATE TABLE bestand ( artikelnr, neu_datum, aender_datum, menge_ist ) AS
SELECT 220450002, DATE '2019-10-30', DATE '2019-11-01', 20 FROM DUAL UNION ALL
SELECT 220450002, DATE '2019-11-01', DATE '2019-11-05', 19 FROM DUAL UNION ALL
SELECT 220502004, DATE '2019-11-05', DATE '2019-11-03', 5 FROM DUAL UNION ALL
SELECT 220800004, DATE '2019-11-01', DATE '2019-11-15', 35 FROM DUAL UNION ALL
SELECT 221355002, DATE '2019-10-20', DATE '2019-11-05', 5 FROM DUAL UNION ALL
SELECT 221355002, DATE '2019-10-25', DATE '2019-11-10', 5 FROM DUAL UNION ALL
SELECT 221355002, DATE '2019-10-28', DATE '2019-11-13', 5 FROM DUAL UNION ALL
SELECT 221355002, DATE '2019-10-30', DATE '2019-11-15', 5 FROM DUAL UNION ALL
SELECT 221355002, DATE '2019-11-05', DATE '2019-11-20', 5 FROM DUAL;
Outputs:
ARTIKELBEZEICHNUNG | ARTIKELNUMMER | MONTH | MIN_TOTAL_MENGE_IST | MAX_TOTAL_MENGE_IST
:----------------- | ------------: | :------------------ | ------------------: | ------------------:
SC-4400.CW | 220450002 | 2019-11-01 00:00:00 | 0 | 39
S-320.FK120 | 220502004 | 2019-11-01 00:00:00 | 0 | 0
AC-548.FK209 | 220948032 | 2019-11-01 00:00:00 | 0 | 0
H-595.FK120 | 220800004 | 2019-11-01 00:00:00 | 0 | 35
AS-6800.CW | 221355002 | 2019-11-01 00:00:00 | 0 | 25
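The same idea can be sketched in Python and used to check the expected output against the sample data. The partitioned outer join produces one row per article per calendar day, even when no stock row matches, and then takes MIN/MAX per article. This is an illustration only. `monthly_min_max` and the tuple layouts are hypothetical, and BETWEEN is modelled as an inclusive range check.

```python
from calendar import monthrange
from datetime import date, timedelta


def monthly_min_max(artikel, bestand, year, month):
    """artikel: [(artikelnr, artbez1)]
    bestand: [(artikelnr, neu_datum, aender_datum, menge_ist)]
    Returns {artikelnr: (artbez1, min_daily_total, max_daily_total)}."""
    n_days = monthrange(year, month)[1]  # number of days in the month
    calendar_days = [date(year, month, 1) + timedelta(days=i) for i in range(n_days)]
    result = {}
    for artikelnr, artbez1 in artikel:
        rows = [r for r in bestand if r[0] == artikelnr]
        # One total per article per day, 0 when no row covers that day
        # (the partitioned outer join plus COALESCE(SUM(...), 0)).
        daily = [
            sum(menge for _, neu, aender, menge in rows if neu <= day <= aender)
            for day in calendar_days
        ]
        result[artikelnr] = (artbez1, min(daily), max(daily))
    return result
```

With the sample tables above, this sketch yields the same MIN/MAX pairs as the query output, including 25 for AS-6800.CW, where all five pallets overlap on 2019-11-05.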

How to fill the time gap after grouping date record for months in postgres

My table has records like these:
date n_count
2020-02-19 00:00:00 4
2020-07-14 00:00:00 1
2020-07-17 00:00:00 1
2020-07-30 00:00:00 2
2020-08-03 00:00:00 1
2020-08-04 00:00:00 2
2020-08-25 00:00:00 2
2020-09-23 00:00:00 2
2020-09-30 00:00:00 3
2020-10-01 00:00:00 11
2020-10-05 00:00:00 12
2020-10-19 00:00:00 1
2020-10-20 00:00:00 1
2020-10-22 00:00:00 1
2020-11-02 00:00:00 376
2020-11-04 00:00:00 72
2020-11-11 00:00:00 1
I want to group all the records by month to find the monthly total count. That part works, but months without any records are missing from the result. How can I fill this gap?
time month_count
"2020-02-01" 4
"2020-07-01" 4
"2020-08-01" 5
"2020-09-01" 5
"2020-10-01" 26
"2020-11-01" 449
This is what I have tried.
SELECT (date_trunc('month', date))::date AS time,
sum(n_count) as month_count
FROM table1
group by time
order by time asc
You can use generate_series() to generate the start of every month between the earliest and latest date in the table, then bring in the table with a left join:
select d.dt, coalesce(sum(t.n_count), 0) as month_count
from (
select generate_series(date_trunc('month', min(date)), date_trunc('month', max(date)), '1 month') as dt
from table1
) as d(dt)
left join table1 t on t.date >= d.dt and t.date < d.dt + interval '1 month'
group by d.dt
order by d.dt
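Here is a minimal Python sketch of the same gap-filling idea. It assumes the rows are `(date, n_count)` tuples and that at least one row exists, and `monthly_totals_with_gaps` is a hypothetical name. It totals the counts per month, then walks month by month from the earliest to the latest month, emitting 0 where nothing was recorded, much like COALESCE over the LEFT JOIN.

```python
from collections import defaultdict
from datetime import date


def monthly_totals_with_gaps(rows):
    """rows: [(date, n_count)], assumed non-empty.
    Returns [(month_start, total)] for every month from the earliest to the
    latest month present, with 0 for months that have no rows."""
    totals = defaultdict(int)
    for d, n in rows:
        totals[(d.year, d.month)] += n  # like date_trunc('month', date) + SUM
    y, m = min(totals)
    last = max(totals)
    out = []
    while (y, m) <= last:
        out.append((date(y, m, 1), totals.get((y, m), 0)))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)  # step one month
    return out
```
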
I would simply UNION a date series generated from the MIN and MAX dates:
WITH cte AS ( -- 1
SELECT
*,
date_trunc('month', date)::date AS time
FROM
t
)
SELECT
time,
SUM(n_count) as month_count --3
FROM (
SELECT
time,
n_count
FROM cte
UNION ALL -- ALL keeps duplicate (time, n_count) rows; plain UNION would drop them and undercount
SELECT -- 2
generate_series(
(SELECT MIN(time) FROM cte),
(SELECT MAX(time) FROM cte),
interval '1 month'
)::date,
0
) s
GROUP BY time
ORDER BY time
1. Use a CTE to calculate date_trunc only once. It could be left out if you prefer to reference your table twice in the union below.
2. Generate a monthly date series from the MIN to the MAX date with an n_count value of 0, and add it to the table rows.
3. Do your calculation.
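One detail worth checking in this approach is the set operator. Plain UNION removes duplicate rows, so two days in the same month with the same n_count collapse into one row before the SUM. The short Python sketch below mimics this with a list versus a set, using the July 2020 rows from the question. It shows why UNION ALL is needed for correct totals.

```python
# (month, n_count) rows for July 2020 after date_trunc('month', date)::date,
# taken from the question's data (07-14: 1, 07-17: 1, 07-30: 2).
july_rows = [("2020-07-01", 1), ("2020-07-01", 1), ("2020-07-01", 2)]
# The zero row that generate_series(...) contributes for July.
series_rows = [("2020-07-01", 0)]


def month_total(rows):
    return sum(n for _, n in rows)


union_all = july_rows + series_rows            # UNION ALL: duplicates kept
union_distinct = set(july_rows + series_rows)  # UNION: duplicates removed

print(month_total(union_all))       # 4, the correct July total
print(month_total(union_distinct))  # 3, one (2020-07-01, 1) row was lost
```
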

Cross join for time series postgresql query

I have an Items table with these columns:
Item_id, Item_time, Item_numbers
1 2017-01-01 18:00:00 2
2 2017-01-01 18:10:00 2
3 2017-01-01 19:10:00 3
I want to group the items by hour for a specific time window (between 9 and 3 each day), and if there is no entry for a particular hour, it should show 0.
Desired Output:
Item_time Item_numbers
2017-01-01 18:00:00 4
2017-01-01 19:00:00 3
2017-01-01 20:00:00 0
with hour_items as (
select date_trunc('hour', item_time) as "hour", avg(item_numbers) as value
from items
where item_id = 2 and item_time::date = '2017-01-01'
group by hour
)
select hour, value
from hour_items
where extract(hour from hour) >= 9 and extract(hour from hour) < 15;
The above query groups them correctly, but where an hour is missing there is no row at all. There should be a row with 0, as shown in the desired output.
This should do.
We get all the distinct days (CTE dates), then generate the hours for each of those days (CTE hours), and finally left join our data on a per-hour basis.
with sample_data as (
select 1 as item_id, '2018-01-01 12:03:15'::timestamp as item_time, 2 as item_numbers
union all
select 2 as item_id, '2018-01-01 12:41:15'::timestamp as item_time, 1 as item_numbers
union all
select 3 as item_id, '2018-01-01 17:41:15'::timestamp as item_time, 2 as item_numbers
union all
select 4 as item_id, '2018-01-01 19:41:15'::timestamp as item_time, 2 as item_numbers
),
dates as (
select distinct item_time::date as day -- one row per calendar day
from sample_data
),
hours as (
select day + interval '1 hour' * a as hour -- midnight of that day plus a hours
from dates
cross join generate_series(0,23) a
)
select h.hour, sum(coalesce(sd.item_numbers,0))
from hours h
left join sample_data sd on h.hour = date_trunc('hour', sd.item_time)
where extract(hour from hour) between 9 and 17
group by h.hour
order by h.hour
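Here is a minimal Python sketch of the same steps: take the distinct days, build every hour of each day, then left join on the truncated hour with 0 for empty hours. It assumes `(item_id, item_time, item_numbers)` tuples. `hourly_counts` is a hypothetical name, and its default window of 9 to 17 inclusive is an assumption chosen to mirror the query's BETWEEN 9 AND 17.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_counts(items, first_hour=9, last_hour=17):
    """items: [(item_id, item_time, item_numbers)]. For each distinct day,
    emit every hour from first_hour to last_hour inclusive with the summed
    item_numbers for that hour, or 0 when there were none."""
    per_hour = defaultdict(int)
    days = set()
    for _, ts, n in items:
        per_hour[ts.replace(minute=0, second=0, microsecond=0)] += n  # date_trunc('hour', ...)
        days.add(ts.date())
    out = []
    for day in sorted(days):
        midnight = datetime(day.year, day.month, day.day)
        for h in range(first_hour, last_hour + 1):
            hour = midnight + timedelta(hours=h)
            out.append((hour, per_hour.get(hour, 0)))  # the LEFT JOIN + COALESCE
    return out
```
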

Calculating concurrency from a set of ranges

I have a set of rows containing a start timestamp and a duration. I want to perform various summaries using the overlap or concurrency.
For example: peak daily concurrency, peak concurrency grouped on another column.
Example data:
timestamp,duration
2016-01-01 12:00:00,300
2016-01-01 12:01:00,300
2016-01-01 12:06:00,300
I would like to find out that the peak for the period was 12:01:00 to 12:05:00, with 2 concurrent entries.
Any ideas on how to achieve this using BigQuery or, less exciting, a Map/Reduce job?
For a per-minute resolution, with session lengths of up to 255 minutes:
SELECT session_minute, COUNT(*) c
FROM (
SELECT start, DATE_ADD(start, i, 'MINUTE') session_minute FROM (
SELECT * FROM (
SELECT TIMESTAMP("2015-04-30 10:14") start, 7 minutes
),(
SELECT TIMESTAMP("2015-04-30 10:15") start, 12 minutes
),(
SELECT TIMESTAMP("2015-04-30 10:15") start, 12 minutes
),(
SELECT TIMESTAMP("2015-04-30 10:18") start, 12 minutes
),(
SELECT TIMESTAMP("2015-04-30 10:23") start, 3 minutes
)
) a
CROSS JOIN [fh-bigquery:public_dump.numbers_255] b
WHERE a.minutes>b.i
)
GROUP BY 1
ORDER BY 1
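The query above cross-joins each session with a numbers table, giving one row per minute, and then counts rows per minute. Here is a Python sketch of that expansion. It assumes the numbers table starts at 0, so that a session of n minutes covers its start minute plus the following n - 1 minutes. `per_minute_concurrency` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timedelta


def per_minute_concurrency(sessions):
    """sessions: [(start, minutes)]. Each session is expanded into the minutes
    start, start + 1 min, ..., start + (minutes - 1) min, then sessions are
    counted per minute (like GROUP BY session_minute)."""
    counts = Counter()
    for start, minutes in sessions:
        for i in range(minutes):  # assumed equivalent of "a.minutes > b.i" with i = 0, 1, 2, ...
            counts[start + timedelta(minutes=i)] += 1
    return dict(sorted(counts.items()))
```

With the five sample sessions, the per-minute peak is 4 concurrent sessions. It occurs at 10:18 to 10:20 and again at 10:23 to 10:25.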
STEP 1 - First, you need to find all periods (start and finish) with their respective number of concurrent entries:
SELECT ts AS start, LEAD(ts) OVER(ORDER BY ts) AS finish,
SUM(entry) OVER(ORDER BY ts) AS concurrent_entries
FROM (
SELECT ts, SUM(entry) AS entry
FROM
(SELECT ts, 1 AS entry FROM yourTable),
(SELECT DATE_ADD(ts, duration, 'second') AS ts, -1 AS entry FROM yourTable)
GROUP BY ts
HAVING entry != 0
)
ORDER BY ts
Assuming the input below:
(SELECT TIMESTAMP('2016-01-01 12:00:00') AS ts, 300 AS duration),
(SELECT TIMESTAMP('2016-01-01 12:01:00') AS ts, 300 AS duration),
(SELECT TIMESTAMP('2016-01-01 12:06:00') AS ts, 300 AS duration),
(SELECT TIMESTAMP('2016-01-01 12:07:00') AS ts, 300 AS duration),
(SELECT TIMESTAMP('2016-01-01 12:10:00') AS ts, 300 AS duration),
(SELECT TIMESTAMP('2016-01-01 12:11:00') AS ts, 300 AS duration)
the output of the query above will look something like this:
start finish concurrent_entries
2016-01-01 12:00:00 UTC 2016-01-01 12:01:00 UTC 1
2016-01-01 12:01:00 UTC 2016-01-01 12:05:00 UTC 2
2016-01-01 12:05:00 UTC 2016-01-01 12:07:00 UTC 1
2016-01-01 12:07:00 UTC 2016-01-01 12:10:00 UTC 2
2016-01-01 12:10:00 UTC 2016-01-01 12:12:00 UTC 3
2016-01-01 12:12:00 UTC 2016-01-01 12:15:00 UTC 2
2016-01-01 12:15:00 UTC 2016-01-01 12:16:00 UTC 1
2016-01-01 12:16:00 UTC null 0
You might still want to polish the query above a little, but it mostly does what you need.
STEP 2 - Now you can compute any statistics from the result above.
For example, the peak over the whole period:
SELECT
start, finish, concurrent_entries, RANK() OVER(ORDER BY concurrent_entries DESC) AS peak
FROM (
SELECT ts AS start, LEAD(ts) OVER(ORDER BY ts) AS finish,
SUM(entry) OVER(ORDER BY ts) AS concurrent_entries
FROM (
SELECT ts, SUM(entry) AS entry FROM
(SELECT ts, 1 AS entry FROM yourTable),
(SELECT DATE_ADD(ts, duration, 'second') AS ts, -1 AS entry FROM yourTable)
GROUP BY ts
HAVING entry != 0
)
)
ORDER BY peak
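Both steps amount to a sweep line. Add +1 at each start and -1 at each end, then take a running sum in timestamp order. Here is a minimal Python sketch, assuming rows are `(ts, duration_in_seconds)` tuples. The function names are hypothetical. Like HAVING entry != 0, it drops timestamps where starts and ends cancel out.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def concurrency_periods(rows):
    """rows: [(ts, duration_seconds)]. Returns [(start, finish, concurrent)],
    where concurrent holds from start until finish (None for the last point)."""
    delta = defaultdict(int)
    for ts, duration in rows:
        delta[ts] += 1                                 # an entry starts
        delta[ts + timedelta(seconds=duration)] -= 1   # an entry ends
    points = sorted(t for t, d in delta.items() if d != 0)  # HAVING entry != 0
    periods, running = [], 0
    for i, t in enumerate(points):
        running += delta[t]                            # SUM(entry) OVER (ORDER BY ts)
        finish = points[i + 1] if i + 1 < len(points) else None  # LEAD(ts)
        periods.append((t, finish, running))
    return periods


def peak_periods(periods):
    """The periods ranked first by concurrency (RANK() = 1)."""
    peak = max(c for _, _, c in periods)
    return [p for p in periods if p[2] == peak]
```

For the question's three rows, this should report 12:01 to 12:05 with 2 concurrent entries. For the six-row input, it reproduces the concurrency column of the STEP 1 output.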

Get classroom available hours between date time range

I'm using Oracle 11g and I have this problem. I haven't been able to come up with any ideas to solve it yet.
I have a table with occupied classrooms. What I need to find are the available hours within a datetime range. For example, I have rooms A, B and C, and the table of occupied classrooms looks like this:
Classroom start end
A 10/10/2013 10:00 10/10/2013 11:30
B 10/10/2013 09:15 10/10/2013 10:45
B 10/10/2013 14:30 10/10/2013 16:00
For the datetime range '10/10/2013 07:00' to '10/10/2013 21:15', what I need to get is something like this:
Classroom available_from available_to
A 10/10/2013 07:00 10/10/2013 10:00
A 10/10/2013 11:30 10/10/2013 21:15
B 10/10/2013 07:00 10/10/2013 09:15
B 10/10/2013 10:45 10/10/2013 14:30
B 10/10/2013 16:00 10/10/2013 21:15
C 10/10/2013 07:00 10/10/2013 21:15
Is there a way I can accomplish that with sql or pl/sql?
I was looking at a solution similar in concept, at least, to Wernfried's, but I think it's different enough to post as well. The start is the same idea: first generate the possible time slots, assuming you're looking at 15-minute windows. I'm using CTEs because I think they're clearer than nested selects, particularly with this many levels.
with date_time_range as (
select to_date('10/10/2013 07:00', 'DD/MM/YYYY HH24:MI') as date_start,
to_date('10/10/2013 21:15', 'DD/MM/YYYY HH24:MI') as date_end
from dual
),
time_slots as (
select level as slot_num,
dtr.date_start + (level - 1) * interval '15' minute as slot_start,
dtr.date_start + level * interval '15' minute as slot_end
from date_time_range dtr
connect by level <= (dtr.date_end - dtr.date_start) * (24 * 4) -- 15-minutes
)
select * from time_slots;
This gives you the 57 15-minute slots between the start and end date you specified. The CTE for date_time_range isn't strictly necessary, you could put your dates straight into the time_slots conditions, but you'd have to repeat them and that then introduces a possible failure point (and means binding the same value multiple times, from JDBC or wherever).
Those slots can then be cross-joined to the list of classrooms, which I'm assuming are already in another table, which gives you 171 (3x57) combinations; and those can be compared with existing bookings - once those are eliminated you're left with the 153 15-minute slots that have no booking.
with date_time_range as (...),
time_slots as (...),
free_slots as (
select c.classroom, ts.slot_num, ts.slot_start, ts.slot_end,
lag(ts.slot_end) over (partition by c.classroom order by ts.slot_num)
as lag_end,
lead(ts.slot_start) over (partition by c.classroom order by ts.slot_num)
as lead_start
from time_slots ts
cross join classrooms c
left join occupied_classrooms oc on oc.classroom = c.classroom
and not (oc.occupied_end <= ts.slot_start
or oc.occupied_start >= ts.slot_end)
where oc.classroom is null
)
select * from free_slots;
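These two steps can be sketched in Python to double-check the slot counts quoted above. The first step generates the 15-minute slots. The second cross-joins them with the rooms and drops any slot that overlaps a booking. The function names and tuple layouts are assumptions. The overlap test is the same NOT (end <= slot_start OR start >= slot_end) condition as in the SQL.

```python
from datetime import datetime, timedelta

SLOT = timedelta(minutes=15)


def time_slots(start, end):
    """15-minute slots (slot_num, slot_start, slot_end) from start to end, numbered from 1."""
    n = (end - start) // SLOT  # timedelta // timedelta gives an int
    return [(i + 1, start + i * SLOT, start + (i + 1) * SLOT) for i in range(n)]


def free_slots(slots, classrooms, bookings):
    """Cross join slots with classrooms, keeping only slots that overlap no
    booking for that room (the LEFT JOIN ... WHERE oc.classroom IS NULL anti-join).
    bookings: [(classroom, occupied_start, occupied_end)]."""
    free = []
    for room in classrooms:
        for num, s, e in slots:
            overlaps = any(r == room and not (b_end <= s or b_start >= e)
                           for r, b_start, b_end in bookings)
            if not overlaps:
                free.append((room, num, s, e))
    return free
```
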
But then you have to collapse those into contiguous ranges. There are various ways of doing that; here I'm peeking at the previous and next rows to decide if a particular value is the edge of a range:
with date_time_range as (...),
time_slots as (...),
free_slots as (...),
free_slots_extended as (
select fs.classroom, fs.slot_num,
case when fs.lag_end is null or fs.lag_end != fs.slot_start
then fs.slot_start end as slot_start,
case when fs.lead_start is null or fs.lead_start != fs.slot_end
then fs.slot_end end as slot_end
from free_slots fs
)
select * from free_slots_extended fse
where (fse.slot_start is not null or fse.slot_end is not null);
Now we're down to 12 rows. (The outer where clause eliminates all 141 of the 153 slots from the previous step which are mid-range, since we only care about the edges):
CLASSROOM   SLOT_NUM SLOT_START       SLOT_END
--------- ---------- ---------------- ----------------
A                  1 2013-10-10 07:00
A                 12                  2013-10-10 10:00
A                 19 2013-10-10 11:30
A                 57                  2013-10-10 21:15
B                  1 2013-10-10 07:00
B                  9                  2013-10-10 09:15
B                 16 2013-10-10 10:45
B                 30                  2013-10-10 14:30
B                 37 2013-10-10 16:00
B                 57                  2013-10-10 21:15
C                  1 2013-10-10 07:00
C                 57                  2013-10-10 21:15
So those represent the edges, but on separate rows, and a final step combines them:
...
select distinct fse.classroom,
nvl(fse.slot_start, lag(fse.slot_start)
over (partition by fse.classroom order by fse.slot_num)) as slot_start,
nvl(fse.slot_end, lead(fse.slot_end)
over (partition by fse.classroom order by fse.slot_num)) as slot_end
from free_slots_extended fse
where (fse.slot_start is not null or fse.slot_end is not null)
Or putting all that together:
with date_time_range as (
select to_date('10/10/2013 07:00', 'DD/MM/YYYY HH24:MI') as date_start,
to_date('10/10/2013 21:15', 'DD/MM/YYYY HH24:MI') as date_end
from dual
),
time_slots as (
select level as slot_num,
dtr.date_start + (level - 1) * interval '15' minute as slot_start,
dtr.date_start + level * interval '15' minute as slot_end
from date_time_range dtr
connect by level <= (dtr.date_end - dtr.date_start) * (24 * 4) -- 15-minutes
),
free_slots as (
select c.classroom, ts.slot_num, ts.slot_start, ts.slot_end,
lag(ts.slot_end) over (partition by c.classroom order by ts.slot_num)
as lag_end,
lead(ts.slot_start) over (partition by c.classroom order by ts.slot_num)
as lead_start
from time_slots ts
cross join classrooms c
left join occupied_classrooms oc on oc.classroom = c.classroom
and not (oc.occupied_end <= ts.slot_start
or oc.occupied_start >= ts.slot_end)
where oc.classroom is null
),
free_slots_extended as (
select fs.classroom, fs.slot_num,
case when fs.lag_end is null or fs.lag_end != fs.slot_start
then fs.slot_start end as slot_start,
case when fs.lead_start is null or fs.lead_start != fs.slot_end
then fs.slot_end end as slot_end
from free_slots fs
)
select distinct fse.classroom,
nvl(fse.slot_start, lag(fse.slot_start)
over (partition by fse.classroom order by fse.slot_num)) as slot_start,
nvl(fse.slot_end, lead(fse.slot_end)
over (partition by fse.classroom order by fse.slot_num)) as slot_end
from free_slots_extended fse
where (fse.slot_start is not null or fse.slot_end is not null)
order by 1, 2;
Which gives:
CLASSROOM SLOT_START SLOT_END
--------- ---------------- ----------------
A 2013-10-10 07:00 2013-10-10 10:00
A 2013-10-10 11:30 2013-10-10 21:15
B 2013-10-10 07:00 2013-10-10 09:15
B 2013-10-10 10:45 2013-10-10 14:30
B 2013-10-10 16:00 2013-10-10 21:15
C 2013-10-10 07:00 2013-10-10 21:15
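The edge-detection step amounts to merging adjacent intervals. A slot starts a range when the previous free slot does not end where it begins, and ends a range when the next free slot does not start where it ends. Here is a minimal Python sketch, assuming the free slots are already sorted by room and slot number. `collapse` is a hypothetical name.

```python
from datetime import datetime


def collapse(free):
    """free: [(room, slot_num, slot_start, slot_end)] sorted by room, slot_num.
    Returns [(room, range_start, range_end)] for each run of adjacent slots."""
    ranges = []
    for room, _, s, e in free:
        if ranges and ranges[-1][0] == room and ranges[-1][2] == s:
            ranges[-1] = (room, ranges[-1][1], e)  # slot continues the current range
        else:
            ranges.append((room, s, e))            # a gap or new room starts a range
    return ranges
```

Unlike the SQL, this walks the rows once in order instead of emitting edge rows and pairing them with LAG/LEAD. The resulting ranges should be the same.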
It is always a challenge when you want to "select something which does not exist". First, you need a list of all available classrooms and times (in 15-minute intervals). Then you can select from them, skipping the occupied items.
I managed to make a query without any PL/SQL:
CREATE TABLE Table1
(Classroom VARCHAR2(10), start_ts DATE, end_ts DATE);
INSERT INTO Table1 VALUES ('A', TIMESTAMP '2013-01-10 10:00:00', TIMESTAMP '2013-01-10 11:30:00');
INSERT INTO Table1 VALUES ('B', TIMESTAMP '2013-01-10 09:15:00', TIMESTAMP '2013-01-10 10:45:00');
INSERT INTO Table1 VALUES ('B', TIMESTAMP '2013-01-10 14:30:00', TIMESTAMP '2013-01-10 16:00:00');
WITH all_rooms AS
(SELECT CHR(64+LEVEL) AS ROOM FROM dual CONNECT BY LEVEL <= 3),
all_times AS
(SELECT CAST(TIMESTAMP '2013-01-10 07:00:00' + (LEVEL-1) * INTERVAL '15' MINUTE AS DATE) AS TIMES, LEVEL AS SLOT
FROM DUAL
CONNECT BY TIMESTAMP '2013-01-10 07:00:00' + (LEVEL-1) * INTERVAL '15' MINUTE <= TIMESTAMP '2013-01-10 21:15:00'),
all_free_slots AS
(SELECT ROOM, TIMES, SLOT,
CASE SLOT-LAG(SLOT, 1, 0) OVER (PARTITION BY ROOM ORDER BY SLOT)
WHEN 1 THEN 0
ELSE 1
END AS NEW_WINDOW
FROM all_times
CROSS JOIN all_rooms
WHERE NOT EXISTS
(SELECT 1 FROM TABLE1 WHERE ROOM = CLASSROOM AND TIMES BETWEEN START_TS + INTERVAL '1' MINUTE AND END_TS - INTERVAL '1' MINUTE)),
free_time_windows AS
(SELECT ROOM, TIMES, SLOT,
SUM(NEW_WINDOW) OVER (PARTITION BY ROOM ORDER BY SLOT) AS WINDOW_ID
FROM all_free_slots)
SELECT ROOM,
TO_CHAR(MIN(TIMES), 'yyyy-mm-dd hh24:mi') AS free_time_start,
TO_CHAR(MAX(TIMES), 'yyyy-mm-dd hh24:mi') AS free_time_end
FROM free_time_windows
GROUP BY ROOM, WINDOW_ID
HAVING MAX(TIMES) - MIN(TIMES) > 0
ORDER BY ROOM, 2;
ROOM FREE_TIME_START FREE_TIME_END
---- ----------------------------------
A 2013-01-10 07:00 2013-01-10 10:00
A 2013-01-10 11:30 2013-01-10 21:15
B 2013-01-10 07:00 2013-01-10 09:15
B 2013-01-10 10:45 2013-01-10 14:30
B 2013-01-10 16:00 2013-01-10 21:15
C 2013-01-10 07:00 2013-01-10 21:15
To understand the query, you can run the sub-queries one at a time from the top, e.g.:
WITH all_rooms AS
(SELECT CHR(64+LEVEL) AS ROOM FROM dual CONNECT BY LEVEL <= 3),
all_times AS
(SELECT CAST(TIMESTAMP '2013-01-10 07:00:00' + (LEVEL-1) * INTERVAL '15' MINUTE AS DATE) AS TIMES, LEVEL AS SLOT
FROM DUAL
CONNECT BY TIMESTAMP '2013-01-10 07:00:00' + (LEVEL-1) * INTERVAL '15' MINUTE <= TIMESTAMP '2013-01-10 21:15:00')
SELECT ROOM, TIMES, SLOT,
CASE SLOT-LAG(SLOT, 1, 0) OVER (PARTITION BY ROOM ORDER BY SLOT)
WHEN 1 THEN 0
ELSE 1
END AS NEW_WINDOW
FROM all_times
CROSS JOIN all_rooms
WHERE NOT EXISTS (SELECT 1 FROM TABLE1 WHERE ROOM = CLASSROOM AND TIMES BETWEEN START_TS + INTERVAL '1' MINUTE AND END_TS - INTERVAL '1' MINUTE)
ORDER BY ROOM, SLOT
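The NEW_WINDOW flag combined with a running SUM is the classic gaps-and-islands technique. Here is a small Python sketch of it, assuming free slots are given as `{room: [slot numbers]}`. `free_windows` is a hypothetical name. It also drops single-slot windows, mirroring the HAVING MAX(TIMES) - MIN(TIMES) > 0 clause.

```python
from itertools import groupby


def free_windows(free_slots_by_room):
    """free_slots_by_room: {room: [free slot numbers]}.
    Returns [(room, first_slot, last_slot)] for each run of consecutive slots."""
    result = []
    for room in sorted(free_slots_by_room):
        window_id, prev = 0, 0  # prev = 0 plays the role of LAG(SLOT, 1, 0)
        tagged = []
        for slot in sorted(free_slots_by_room[room]):
            new_window = 0 if slot - prev == 1 else 1  # the NEW_WINDOW flag
            window_id += new_window                    # running SUM(...) = WINDOW_ID
            tagged.append((window_id, slot))
            prev = slot
        for _, group in groupby(tagged, key=lambda pair: pair[0]):
            slots = [slot for _, slot in group]
            if slots[-1] > slots[0]:  # like HAVING MAX(TIMES) - MIN(TIMES) > 0
                result.append((room, slots[0], slots[-1]))
    return result
```
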