Select rows that tightly bound input date range - SQL

I have a table where each row has a TIMESTAMP column. I have as input a start date time and an end date time. I need to select rows from this table that cover the input time range without selecting any "extra" rows.
The only approach I can think of is to run multiple queries: find the row where m_time < the start date time (ordered descending, limit 1) and the row where m_time > the end date time (ordered ascending, limit 1), then query the table for rows between those two m_time values.
Is there a way to do it in one query?
Example data:
time_data

| rownum | m_time           |
|--------|------------------|
| 1      | 2020-11-01T00:00 |
| 2      | 2020-11-01T01:00 |
| 3      | 2020-11-01T02:00 |
| 4      | 2020-11-01T03:00 |
| 5      | 2020-11-01T04:00 |
| 6      | 2020-11-01T05:00 |
m_time has a data type of TIMESTAMP
Given
start_date_time = 2020-11-01T01:58
end_date_time = 2020-11-01T03:02
Expected output would be rows 2-5
Oracle Version 19.8.0.0.0

You can use lag() and lead():
select t.*
from (
    select t.*,
           lag(m_time)  over (order by m_time) as lag_m_time,
           lead(m_time) over (order by m_time) as lead_m_time
    from time_data t
) t
where (m_time <= :start_time and lead_m_time > :start_time)
   or (m_time >= :start_time and m_time <= :end_time)
   or (m_time >= :end_time and lag_m_time < :end_time)
I think the where clause could be simplified, guarding against the NULLs that lag() and lead() produce on the first and last rows:
where (lead_m_time > :start_time or lead_m_time is null)
  and (lag_m_time < :end_time or lag_m_time is null)
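Putting it together against the question's sample data, a sketch (with the given binds this returns rows 2-5):
select t.*
from (
    select t.*,
           lag(m_time)  over (order by m_time) as lag_m_time,
           lead(m_time) over (order by m_time) as lead_m_time
    from time_data t
) t
where (lead_m_time > :start_time or lead_m_time is null)
  and (lag_m_time < :end_time or lag_m_time is null);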

In Oracle 12.1 and higher, match_recognize does quick work of such row pattern matching problems.
Notes - I changed your rownum column name, since rownum is a reserved keyword in Oracle. I used a string data type for your dates (since that's what you seem to have), but that is very bad practice; make sure you always use a date data type for dates. (This issue is unrelated to your question, so I left it alone.)
I simulate your sample data in the WITH clause; that is not part of the solution, it's just there for a quick test.
with
  time_data (rn, m_time) as (
    select 1, '2020-11-01T00:00' from dual union all
    select 2, '2020-11-01T01:00' from dual union all
    select 3, '2020-11-01T02:00' from dual union all
    select 4, '2020-11-01T03:00' from dual union all
    select 5, '2020-11-01T04:00' from dual union all
    select 6, '2020-11-01T05:00' from dual
  )
select rn, m_time
from time_data
match_recognize(
    order by m_time
    all rows per match
    omit empty matches
    pattern (F{0,1} M* L{0,1})
    define
        M as m_time between :start_time and :end_time,
        F as next(m_time) >= :start_time,
        L as prev(m_time) <= :end_time
);
RN M_TIME
-- ----------------
2 2020-11-01T01:00
3 2020-11-01T02:00
4 2020-11-01T03:00
5 2020-11-01T04:00
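As a side note on the data-type remark: if m_time is a real TIMESTAMP, the string inputs from the question can be converted for the binds like this (a sketch; the format model is assumed from the sample values):
select to_timestamp('2020-11-01T01:58', 'YYYY-MM-DD"T"HH24:MI') as start_ts,
       to_timestamp('2020-11-01T03:02', 'YYYY-MM-DD"T"HH24:MI') as end_ts
from dual;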

Related

Find record by FROM date without TO date on row

consider the following table:
foo (
id: integer,
from_date: timestamp
...
)
with values:
id | from_date
--------------
1 | 1990-01-01
2 | 1995-01-01
3 | 2000-01-01
4 | 2005-01-01
5 | 2010-01-01
There is no to_date column; each record with a newer date acts as the upper boundary for the previous record. For instance, the date range for the record with id 1 is from 1990-01-01 to 1995-01-01; if there is no newer record, the range is valid until now.
Can you tell me if there is some handy way to find the relevant row for a given date? For instance:
if I am looking for the valid record for date 2001-01-01, I expect the row with id 3,
if for date 2010-01-01, I expect the row with id 5.
I have no idea how to handle this table design, and I am considering refactoring it to add a to_date column. Thank you for any advice.
A simple method is:
select t.*
from t
where t.date <= ?
order by t.date desc
fetch first 1 row only;
Note: You haven't specified the database that you are using. Not all support the standard fetch syntax, but all have something equivalent.
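For example, the rough equivalents in two common products (a sketch, keeping the placeholder names from the query above):
-- SQL Server
select top (1) t.* from t where t.date <= ? order by t.date desc;
-- MySQL / PostgreSQL
select t.* from t where t.date <= ? order by t.date desc limit 1;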
An alternative is to use lead(); note the NULL check, so the latest row (which has no next_date) still matches:
select t.*
from (select t.*, lead(date) over (order by date) as next_date
      from t
     ) t
where ? >= date
  and (? < next_date or next_date is null)
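Applied to the foo table from the question, the first approach would look like this (a sketch using a standard date literal; per the expected output it returns the row with id 3):
select f.*
from foo f
where f.from_date <= date '2001-01-01'
order by f.from_date desc
fetch first 1 row only;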

Optimize the query of weekday statistics between two dates

I have a table with two fields: start_date and end_date. Now I want to count the total amount of overtime work. I have created a new calendar table to maintain the working-day status of each date.
table: workdays

id          status
----------  ------
2020-01-01  4
2020-01-02  1
2020-01-03  1
2020-01-04  2

4: holiday, 1: weekday, 2: weekend
I created a function to calculate the weekdays between two dates (excluding weekends and holidays).
create or replace function get_workday_count (start_date in date, end_date in date)
return number is
    day_count int;
begin
    select count(0) into day_count from WORKDAYS
    where TRUNC(ID) >= TRUNC(start_date)
      and TRUNC(ID) <= TRUNC(end_date)
      and status in (1, 3, 5);
    return day_count;
end;
When I execute the following query, it takes about 5 minutes to display the results; the erp_sj table has about 200,000 rows of data.
select count(0) from ERP_SJ where GET_WORKDAY_COUNT(start_date, end_date) > 5;
The fields used in query statements are indexed.
How to optimize? Or is there a better solution?
First of all, optimize your function by:
1. adding pragma udf (for faster execution in SQL)
2. adding the deterministic clause (for caching)
3. replacing count(0) with count(*) (to allow the CBO to optimize the count)
4. replacing return number with return int
create or replace function get_workday_count (start_date in date, end_date in date)
return int deterministic is
    pragma udf;
    day_count int;
begin
    select count(*) into day_count from WORKDAYS w
    where w.ID >= TRUNC(start_date)
      and w.ID <= TRUNC(end_date)
      and status in (1, 3, 5);
    return day_count;
end;
Then you don't need to call your function at all when (end_date - start_date) is less than the required number of days. Moreover, ideally you would use a scalar subquery instead of the function:
select count(*)
from ERP_SJ
where case
        when trunc(end_date) - trunc(start_date) > 5
        then GET_WORKDAY_COUNT(trunc(start_date), trunc(end_date))
        else 0
      end > 5
Or using a subquery:
select count(*)
from ERP_SJ e
where case
        when trunc(end_date) - trunc(start_date) > 5
        then (select count(*) from WORKDAYS w
              where w.ID >= TRUNC(e.start_date)
                and w.ID <= TRUNC(e.end_date)
                and w.status in (1, 3, 5))
        else 0
      end > 5
WORKDAY_STATUSES table (just for completeness, not used below):
create table workday_statuses
( status number(1) constraint workday_statuses_pk primary key
, status_name varchar2(10) not null constraint workday_status_name_uk unique );
insert all
into workday_statuses values (1, 'Weekday')
into workday_statuses values (2, 'Weekend')
into workday_statuses values (3, 'Unknown 1')
into workday_statuses values (4, 'Holiday')
into workday_statuses values (5, 'Unknown 2')
select * from dual;
WORKDAYS table: one row for each day in 2020:
create table workdays
( id date constraint workdays_pk primary key
, status references workday_statuses not null )
organization index;
insert into workdays (id, status)
select date '2019-12-31' + rownum
, case
when to_char(date '2019-12-31' + rownum, 'Dy', 'nls_language = English') like 'S%' then 2
when date '2019-12-31' + rownum in
( date '2020-01-01', date '2020-04-10', date '2020-04-13'
, date '2020-05-08', date '2020-05-25', date '2020-08-31'
, date '2020-12-25', date '2020-12-26', date '2020-12-28' ) then 4
else 1
end
from xmltable('1 to 366')
where date '2019-12-31' + rownum < date '2021-01-01';
ERP_SJ table containing 30K rows with random data:
create table erp_sj
( id integer generated always as identity
, start_date date not null
, end_date date not null
, filler varchar2(100) );
insert into erp_sj (start_date, end_date, filler)
select dt, dt + dbms_random.value(0,7), dbms_random.string('x',100)
from ( select date '2019-12-31' + dbms_random.value(1,366) as dt
from xmltable('1 to 30000') );
commit;
get_workday_count() function:
create or replace function get_workday_count
( start_date in date, end_date in date )
return integer
deterministic -- Cache some results
parallel_enable -- In case you want to use it in parallel queries
as
pragma udf; -- Tell compiler to optimise for SQL
day_count integer;
begin
select count(*) into day_count
from workdays w
where w.id between trunc(start_date) and end_date
and w.status in (1, 3, 5);
return day_count;
end;
Notice that you should not truncate w.id, because all values have the time as 00:00:00 already. (I'm assuming that if end_date falls somewhere in the middle of a day, you want to count that day, so I have not truncated the end_date parameter.)
Test:
select count(*) from erp_sj
where get_workday_count(start_date, end_date) > 5;
COUNT(*)
--------
1302
Results returned in around 1.4 seconds.
Execution plan for the query within the function:
select count(*)
from workdays w
where w.id between trunc(sysdate) and sysdate +10
and w.status in (1, 3, 5);
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 1 |
|* 2 | FILTER | | 1 | | 7 |00:00:00.01 | 1 |
|* 3 | INDEX RANGE SCAN| WORKDAYS_PK | 1 | 7 | 7 |00:00:00.01 | 1 |
--------------------------------------------------------------------------------------------
Now try adding the function as a virtual column and indexing it:
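The virtual column itself has to be created first. A sketch of the DDL (this works because the function is declared deterministic):
alter table erp_sj add (
    workday_count integer generated always as
        ( get_workday_count(start_date, end_date) ) virtual
);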
create index erp_sj_workday_count_ix on erp_sj(workday_count);
select count(*) from erp_sj
where workday_count > 5;
Same result in 0.035 seconds. Plan:
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 5 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 5 |
|* 2 | INDEX RANGE SCAN| ERP_SJ_WORKDAY_COUNT_IX | 1 | 1302 | 1302 |00:00:00.01 | 5 |
-------------------------------------------------------------------------------------------------------
Tested in 19.0.0.
Edit: As Sayan pointed out, the index on the virtual column won't be automatically updated if there are any changes in WORKDAYS, so there is a risk of wrong results with this approach. However, if performance is critical you could work around it by rebuilding the index on ERP_SJ every time you updated WORKDAYS. Maybe you could do this in a statement-level trigger on WORKDAYS, or just through scheduled IT maintenance processes if updates are very infrequent and ERP_SJ isn't so big that an index rebuild is impractical. If the index is partitioned, rebuilding affected partitions could be an option.
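A very rough sketch of the statement-level trigger idea (illustration only, with the index name assumed from above; DDL inside a trigger needs an autonomous transaction, and the rebuild runs in a separate transaction that cannot see the triggering statement's uncommitted changes):
create or replace trigger workdays_rebuild_trg
after insert or update or delete on workdays
declare
    pragma autonomous_transaction;
begin
    -- rebuild the index on the virtual column; DDL commits implicitly
    execute immediate 'alter index erp_sj_workday_count_ix rebuild';
end;
/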
Or, don't have an index and live with the 1.4 seconds query execution time.
I understand that the columns ID and status have plain indexes on them (not a function-based index on TRUNC(ID)). So use this query
SELECT count(0)
INTO day_count
FROM WORKDAYS
WHERE ID BETWEEN TRUNC(start_date) AND TRUNC(end_date)
AND status in (1, 3, 5);
in order to be able to exploit the index on the date column ID as well.
Maybe try Scalar Subquery Caching (in case there are plenty of erp_sj records with the same start_date and end_date):
select count(0)
from ERP_SJ
where (select GET_WORKDAY_COUNT(start_date, end_date) from dual) > 5
You are dealing with a data warehouse query (not an OLTP query).
Some best practices say you should
get rid of functions - avoid context switches (this could be somewhat mitigated with the UDF pragma, but why use a function if you don't need it?)
get rid of indexes - quick for a few rows, slow for a large number of records
get rid of caching - caching is basically a workaround for repeatedly doing the same thing
So the data warehouse approach to the problem consists of two steps.
Extend the Workday Table
The workday table can be extended, with a little query, with a new column MIN_END_DAY that defines for each (start) day the minimum threshold to reach the limit of 5 working days.
The query uses the LEAD analytic function to get the 4th working day ahead (check the PARTITION BY clause, which distinguishes between the working days and other days).
For the non-working days you simply take the LAST_VALUE of the next working day.
Example
with wd as (
select ID, STATUS,
case when status in (1, 3, 5) then
lead(id,4) over (partition by case when status in (1, 3, 5) then 'workday' end order by id) /* 4 working days ahead */
end as min_stop_day
from workdays),
wd2 as (
select ID, STATUS,
last_value(MIN_STOP_DAY) ignore nulls over (order by id desc) MIN_END_DAY
from wd)
select ID, STATUS, MIN_END_DAY
from wd2
order by 1;
ID, STATUS, MIN_END_DAY
01.01.2020 00:00:00 4 08.01.2020 00:00:00
02.01.2020 00:00:00 1 08.01.2020 00:00:00
03.01.2020 00:00:00 1 09.01.2020 00:00:00
04.01.2020 00:00:00 2 10.01.2020 00:00:00
05.01.2020 00:00:00 2 10.01.2020 00:00:00
06.01.2020 00:00:00 1 10.01.2020 00:00:00
Join to the Base Table
Now you can simply join your base table to the extended workday table on the start day and filter rows by comparing end_date with the MIN_END_DAY.
Query
with wd as (
select ID, STATUS,
case when status in (1, 3, 5) then
lead(id,4) over (partition by case when status in (1, 3, 5) then 'workday' end order by id)
end as min_stop_day
from workdays),
wd2 as (
select ID, STATUS,
last_value(MIN_STOP_DAY) ignore nulls over (order by id desc) MIN_END_DAY
from wd)
select count(*) from erp_sj
join wd2
on trunc(erp_sj.start_date) = wd2.ID
where trunc(erp_sj.end_date) >= min_end_day
For large tables this will lead to the expected HASH JOIN execution plan.
Note that I assume 1) the workday table is complete (otherwise you couldn't use an inner join) and 2) it contains enough future data (the last 5 rows are obviously not usable).

Postgres SQL Join on Nearest less than quarter end

I have table 1
ID | public_date
1 | 1992-06-03
2 | 2000-12-15
Table 2 is a series of the quarter end dates in a range
Date
1995-12-31
1996-03-31
..
..
2000-12-31
I would like to have the result table as
ID | date | public_date
1 | 1995-12-31 | 1992-06-03
1 | 1996-03-31 | 1992-06-03
1 | 1996-06-30 | 1992-06-03
...
...
1 | 2000-12-31 | 2000-12-15
Basically, assign the public date to the nearest quarter end date. Currently, I have this query:
SELECT DISTINCT ON (x."date")
    x."date", r.public_date
FROM quarter_end_series as x
LEFT JOIN public_time r ON r.public_date <= x."date"
WHERE x."date" >= '1995-12-31 00:00:00'
ORDER BY x."date", r.public_date DESC;
But this query took 4 hours; is there any way to do it more efficiently?
Try a subquery:
select pt.*,
       (select qes.date
        from quarter_end_series qes
        where qes.date <= pt.public_date
        order by qes.date desc
        limit 1
       ) as quarter_end_date
from public_time pt;
Include an index on quarter_end_series(date).
This saves the sorting on a large amount of data -- which should make this more performant.
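The suggested index would be something like this (index name assumed):
create index quarter_end_series_date_ix on quarter_end_series ("date");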
I guess your quarters are fixed for each year. Like:
1995-12-31
1996-03-31
1996-06-30
1996-09-30
1996-12-31
.... and so on
If so, then just find the closest date from the fixed quarter dates.
If quarter_end_series does not have the same dates each year, you can try a subquery instead of a join, like below (Postgres has no date_diff function; subtracting two dates yields a number of days):
SELECT DISTINCT ON (x."date")
    x."date",
    (SELECT r.public_date
     FROM public_time r
     ORDER BY abs(x."date" - r.public_date) ASC
     LIMIT 1) as public_date
FROM quarter_end_series as x
WHERE x."date" >= '1995-12-31 00:00:00'
ORDER BY x."date";

How to remove overlap from time spans (SQL)

I have a table of time spans that overlap each other. I want to generate a table that covers the same time spans but doesn't overlap.
For example, say I have a table like this:
Start,End
1, 4
3, 5
7, 8
2, 4
I want a new table like this:
Start,End
1, 5
7, 8
What is the SQL query to do this?
Tested on spark-sql version 1.5.2.
(and with small changes - on Teradata, Oracle, PostgreSQL and SQL Server)
In order to guarantee the correctness of this solution, the order by clauses in the two analytic functions should be identical and deterministic; so if you have an Id column, use order by `Start`,`Id` instead of order by `Start`,`End`.
select min(`Start`) as `Start`
,max(`End`) as `End`
from (select `Start`,`End`
,count(is_gap) over
(
order by `Start`,`End`
rows unbounded preceding
) + 1 as range_seq
from (select `Start`,`End`
,case
when max(`End`) over
(
order by `Start`,`End`
rows between unbounded preceding
and 1 preceding
) < `Start`
then 1
end is_gap
from mytable
) t
) t
group by range_seq
order by `Start`
+-------+-----+
| Start | End |
+-------+-----+
| 1 | 5 |
+-------+-----+
| 7 | 8 |
+-------+-----+

Can I use a SQL Server CTE to merge intersecting dates?

I'm writing an app that handles scheduling time off for some of our employees. As part of this, I need to calculate how many minutes throughout the day that they have requested off.
In the first version of this tool, we disallowed overlapping time-off requests, because we wanted to be able to just add up the total of EndTime minus StartTime for all requests. Preventing overlaps makes this calculation very fast.
This has become problematic, because Managers now want to schedule team meetings but are unable to do so when someone has already asked for the day off.
So, in the new version of the tool, we have a requirement to allow overlapping requests.
Here is an example set of data like what we have:
UserId | StartDate | EndDate
----------------------------
1      | 2:00      | 4:00
1      | 3:00      | 5:00
1      | 3:45      | 9:00
2      | 6:00      | 9:00
2      | 7:00      | 8:00
3      | 2:00      | 3:00
3      | 4:00      | 5:00
4      | 1:00      | 7:00
The result that I need to get, as efficiently as possible, is this:
UserId | StartDate | EndDate
----------------------------
1      | 2:00      | 9:00
2      | 6:00      | 9:00
3      | 2:00      | 3:00
3      | 4:00      | 5:00
4      | 1:00      | 7:00
We can easily detect overlaps with this query:
select *
from requests r1
cross join requests r2
where r1.RequestId < r2.RequestId
  and r1.StartTime < r2.EndTime
  and r2.StartTime < r1.EndTime
This is, in fact, how we were detecting and preventing the problems originally.
Now, we are trying to merge the overlapping items, but I'm reaching the limits of my SQL ninja skills.
It wouldn't be too hard to come up with a method using temp tables, but we want to avoid this if at all possible.
Is there a set-based way to merge overlapping rows?
Edit:
It would also be acceptable for all of the rows to show up, as long as they were collapsed into just their own time. For example, if someone wants off from three to five and from four to six, it would be acceptable for them to have two rows: one from three to five and the next from five to six, OR one from three to four and the next from four to six.
Also, here is a little test bench:
DECLARE @requests TABLE
(
    UserId int,
    StartDate time,
    EndDate time
)
INSERT INTO @requests (UserId, StartDate, EndDate) VALUES
(1, '2:00', '4:00'),
(1, '3:00', '5:00'),
(1, '3:45', '9:00'),
(2, '6:00', '9:00'),
(2, '7:00', '8:00'),
(3, '2:00', '3:00'),
(3, '4:00', '5:00'),
(4, '1:00', '7:00');
Complete Rewrite:
;WITH new_grp AS (
SELECT r1.UserId, r1.StartTime
FROM @requests r1
WHERE NOT EXISTS (
SELECT *
FROM @requests r2
WHERE r1.UserId = r2.UserId
AND r2.StartTime < r1.StartTime
AND r2.EndTime >= r1.StartTime)
GROUP BY r1.UserId, r1.StartTime -- there can be > 1
),r AS (
SELECT r.RequestId, r.UserId, r.StartTime, r.EndTime
,count(*) AS grp -- guaranteed to be 1+
FROM @requests r
JOIN new_grp n ON n.UserId = r.UserId AND n.StartTime <= r.StartTime
GROUP BY r.RequestId, r.UserId, r.StartTime, r.EndTime
)
SELECT min(RequestId) AS RequestId
,UserId
,min(StartTime) AS StartTime
,max(EndTime) AS EndTime
FROM r
GROUP BY UserId, grp
ORDER BY UserId, grp
Now produces the requested result and really covers all possible cases, including disjunct sub-groups and duplicates.
Have a look at the comments to the test data in the working demo at data.SE.
CTE 1
Find the (unique!) points in time where a new group of overlapping intervals starts.
CTE 2
Count the starts of new groups up to (and including) every individual interval, thereby forming a unique group number per user.
Final SELECT
Merge the groups, taking the earliest start and the latest end for each group.
I faced some difficulty, because the T-SQL window functions max() and sum() do not accept an ORDER BY clause in the OVER() window. They can only compute one value per partition, which makes it impossible to compute a running sum / count per partition. That would work in PostgreSQL or Oracle (but not in MySQL, of course - it has neither window functions nor CTEs).
The final solution uses one extra CTE and should be just as fast.
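For reference, the per-partition running count that T-SQL was missing here is straightforward in PostgreSQL or Oracle; a sketch with hypothetical names (flagged is assumed to be a derived table where new_grp_flag is non-null only on rows that start a new overlap group):
select f.UserId, f.StartTime, f.EndTime,
       count(f.new_grp_flag) over (partition by f.UserId
                                   order by f.StartTime
                                   rows unbounded preceding) as grp
from flagged f;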
OK, it is possible to do this with CTEs. I did not know how to use them at the beginning of the night, but here are the results of my research:
A recursive CTE has 2 parts, the "anchor" statement and the "recursive" statements.
The crucial part about the recursive statement is that when it is evaluated, only the rows that have not already been evaluated will show up in the recursion.
So, for example, if we wanted to use CTEs to get an all-inclusive list of times for these users, we could use something like this:
WITH
sorted_requests as (
SELECT
UserId, StartDate, EndDate,
ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY StartDate, EndDate DESC) Instance
FROM @requests
),
no_overlap(UserId, StartDate, EndDate, Instance) as (
SELECT *
FROM sorted_requests
WHERE Instance = 1
UNION ALL
SELECT s.*
FROM sorted_requests s
INNER JOIN no_overlap n
ON s.UserId = n.UserId
AND s.Instance = n.Instance + 1
)
SELECT *
FROM no_overlap
Here, the "anchor" statement is just the first instance for every user, WHERE Instance = 1.
The "recursive" statement joins each row to the next row in the set, using the s.UserId = n.UserId AND s.Instance = n.Instance + 1
Now, we can use the property of the data, when sorted by start date, that any overlapping row will have a start date that is less than the previous row's end date. If we continually propagate the row number of the first intersecting row, every subsequent overlapping row will share that row number.
Using this query:
WITH
sorted_requests as (
SELECT
UserId, StartDate, EndDate,
ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY StartDate, EndDate DESC) Instance
FROM @requests
),
no_overlap(UserId, StartDate, EndDate, Instance, ConnectedGroup) as (
SELECT
UserId,
StartDate,
EndDate,
Instance,
Instance as ConnectedGroup
FROM sorted_requests
WHERE Instance = 1
UNION ALL
SELECT
s.UserId,
s.StartDate,
CASE WHEN n.EndDate >= s.EndDate
THEN n.EndDate
ELSE s.EndDate
END EndDate,
s.Instance,
CASE WHEN n.EndDate >= s.StartDate
THEN n.ConnectedGroup
ELSE s.Instance
END ConnectedGroup
FROM sorted_requests s
INNER JOIN no_overlap n
ON s.UserId = n.UserId AND s.Instance = n.Instance + 1
)
SELECT
UserId,
MIN(StartDate) StartDate,
MAX(EndDate) EndDate
FROM no_overlap
GROUP BY UserId, ConnectedGroup
ORDER BY UserId
We group by the aforementioned "first intersecting row" (called ConnectedGroup in this query) and find the minimum start time and maximum end time in that group.
The first intersecting row is propagated using this statement:
CASE WHEN n.EndDate >= s.StartDate
THEN n.ConnectedGroup
ELSE s.Instance
END ConnectedGroup
Which basically says, "if this row intersects with the previous row (based on us being sorted by start date), then consider this row to have the same 'row grouping' as the previous row. Otherwise, use this row's own row number as the 'row grouping' for itself."
This gives us exactly what we were looking for.
EDIT
When I had originally thought this up on my whiteboard, I knew that I would have to advance the EndDate of each row, to ensure that it would intersect with the next row, if any of the previous rows in the connected group would have intersected. I accidentally left that out. This has been corrected.
This works for Postgres. SQL Server might need some modifications.
SET search_path='tmp';
DROP TABLE tmp.schedule CASCADE;
CREATE TABLE tmp.schedule
( person_id INTEGER NOT NULL
, dt_from timestamp with time zone
, dt_to timestamp with time zone
);
INSERT INTO schedule( person_id, dt_from, dt_to) VALUES
( 1, '2011-12-03 02:00:00' , '2011-12-03 04:00:00' )
, ( 1, '2011-12-03 03:00:00' , '2011-12-03 05:00:00' )
, ( 1, '2011-12-03 03:45:00' , '2011-12-03 09:00:00' )
, ( 2, '2011-12-03 06:00:00' , '2011-12-03 09:00:00' )
, ( 2, '2011-12-03 07:00:00' , '2011-12-03 08:00:00' )
, ( 3, '2011-12-03 02:00:00' , '2011-12-03 03:00:00' )
, ( 3, '2011-12-03 04:00:00' , '2011-12-03 05:00:00' )
, ( 4, '2011-12-03 01:00:00' , '2011-12-03 07:00:00' );
ALTER TABLE schedule ADD PRIMARY KEY (person_id,dt_from)
;
CREATE UNIQUE INDEX ON schedule (person_id,dt_to);
SELECT * FROM schedule ORDER BY person_id, dt_from;
WITH RECURSIVE ztree AS (
-- Terminal part
SELECT p1.person_id AS person_id
, p1.dt_from AS dt_from
, p1.dt_to AS dt_to
FROM schedule p1
UNION
-- Recursive part
SELECT p2.person_id AS person_id
, LEAST(p2.dt_from, zzt.dt_from) AS dt_from
, GREATEST(p2.dt_to, zzt.dt_to) AS dt_to
FROM ztree AS zzt
, schedule AS p2
WHERE 1=1
AND p2.person_id = zzt.person_id
AND (p2.dt_from < zzt.dt_from AND p2.dt_to >= zzt.dt_from)
)
SELECT *
FROM ztree zt
WHERE NOT EXISTS (
SELECT * FROM ztree nx
WHERE nx.person_id = zt.person_id
-- the recursive query returns *all possible combinations of
-- touching or overlapping intervals
-- we'll have to filter, keeping only the biggest ones
-- (the ones for which there is no bigger overlapping interval)
AND ( (nx.dt_from <= zt.dt_from AND nx.dt_to > zt.dt_to)
OR (nx.dt_from < zt.dt_from AND nx.dt_to >= zt.dt_to)
)
)
ORDER BY zt.person_id,zt.dt_from
;
Result:
DROP TABLE
CREATE TABLE
INSERT 0 8
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "schedule_pkey" for table "schedule"
ALTER TABLE
CREATE INDEX
person_id | dt_from | dt_to
-----------+------------------------+------------------------
1 | 2011-12-03 02:00:00+01 | 2011-12-03 04:00:00+01
1 | 2011-12-03 03:00:00+01 | 2011-12-03 05:00:00+01
1 | 2011-12-03 03:45:00+01 | 2011-12-03 09:00:00+01
2 | 2011-12-03 06:00:00+01 | 2011-12-03 09:00:00+01
2 | 2011-12-03 07:00:00+01 | 2011-12-03 08:00:00+01
3 | 2011-12-03 02:00:00+01 | 2011-12-03 03:00:00+01
3 | 2011-12-03 04:00:00+01 | 2011-12-03 05:00:00+01
4 | 2011-12-03 01:00:00+01 | 2011-12-03 07:00:00+01
(8 rows)
person_id | dt_from | dt_to
-----------+------------------------+------------------------
1 | 2011-12-03 02:00:00+01 | 2011-12-03 09:00:00+01
2 | 2011-12-03 06:00:00+01 | 2011-12-03 09:00:00+01
3 | 2011-12-03 02:00:00+01 | 2011-12-03 03:00:00+01
3 | 2011-12-03 04:00:00+01 | 2011-12-03 05:00:00+01
4 | 2011-12-03 01:00:00+01 | 2011-12-03 07:00:00+01
(5 rows)