Left join on single record - sql

I have the following table that stores events (simplified structure):
ID  User   Action  Timestamp
12  user1  END     2022-01-01 05:00
43  user1  START   2022-01-01 04:00
54  user1  END     2022-01-01 03:00
13  user1  START   2022-01-01 02:00
I need to join two events into one row, so that every START event is paired with the END event that comes after it.
So the result should be the following:
ID1  ID2  User   Start Timestamp   End Timestamp
13   54   user1  2022-01-01 02:00  2022-01-01 03:00
43   12   user1  2022-01-01 04:00  2022-01-01 05:00
Ideally, it should not have too many performance issues, as there could be a lot of records in the table.
I've tried the following query:
select
s.id as "ID1",
e.id as "ID2",
s.user,
s.timestamp as "Start Time",
e.timestamp as "End Time"
from Events s
left join Events e on s.user = e.user
where s.action = 'START'
and e.action = 'END'
and s.timestamp < e.timestamp
But it also matches record 13 with record 12.
Is it possible to join each row on the left side to only one row on the right (keeping in mind that it should be the next END record time-wise)?
Thanks

One way is a lateral join that picks the smallest "end" timestamp that is greater than the "start" timestamp:
select st.id as id1,
en.id as id2,
st."timestamp" as start_timestamp,
en."timestamp" as end_timestamp
from events st
left join lateral (
select id, "timestamp"
from events e
where e."user" = st."user"
and e.action = 'END'
and e."timestamp" >= st."timestamp"
order by "timestamp"
fetch first 1 row only
) en on true
where st.action = 'START';
The above is standard ANSI SQL and works (at least) in Postgres.
In Postgres I would create an index on events ("user", "timestamp") where action = 'END' to make the lateral query fast.
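A minimal sketch of the same "earliest END at or after each START" idea, runnable with Python's built-in sqlite3 module. SQLite has no LATERAL join, so a correlated scalar subquery stands in for it; the partial index mirrors the Postgres index suggested above, so the planner can use it for the subquery. The column names usr and ts are borrowed from the next answer, and the function name pair_events is made up for illustration.

```python
import sqlite3

def pair_events(conn):
    """Pair every START with the earliest END of the same user at or after it."""
    return conn.execute("""
        SELECT st.id AS id1,
               (SELECT e.id
                  FROM events e
                 WHERE e.usr = st.usr
                   AND e.action = 'END'
                   AND e.ts >= st.ts
                 ORDER BY e.ts
                 LIMIT 1) AS id2          -- NULL when no END follows
          FROM events st
         WHERE st.action = 'START'
         ORDER BY st.ts
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER, usr TEXT, action TEXT, ts TEXT);
    -- Partial index covering only END rows, analogous to the Postgres suggestion.
    CREATE INDEX events_end_idx ON events (usr, ts) WHERE action = 'END';
    INSERT INTO events VALUES
        (12, 'user1', 'END',   '2022-01-01 05:00'),
        (43, 'user1', 'START', '2022-01-01 04:00'),
        (54, 'user1', 'END',   '2022-01-01 03:00'),
        (13, 'user1', 'START', '2022-01-01 02:00');
""")
print(pair_events(conn))  # [(13, 54), (43, 12)]
```

Timestamps are stored as ISO-8601 text here, which compares correctly as strings; with a real timestamp type the query is unchanged.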

Here is a PostgreSQL solution using a lateral join. It may also work on HANA, since no Postgres-specific features are used. The inner query selects the 'END' action of the same user that occurred soonest after the corresponding 'START'. Events that have started but not finished yet will have NULL values for "ID2" and "End timestamp".
create temporary table the_table(id integer, usr text, action text, ts timestamp);
insert into the_table values
(12,'user1','END','2022-01-01 05:00'),(43,'user1','START','2022-01-01 04:00'),
(54,'user1','END','2022-01-01 03:00'),(13,'user1','START','2022-01-01 02:00');
select tx.id as "ID1", l.id as "ID2", tx.usr as "User",
tx.ts as "Start timestamp", l.ts as "End timestamp"
from the_table as tx
left join lateral
(
select ti.id, ti.ts
from the_table as ti
where ti.action = 'END'
and ti.ts > tx.ts
and ti.usr = tx.usr
order by ti.ts - tx.ts
limit 1
) as l on true
where tx.action = 'START'
order by "Start timestamp";

The issue with your query above is that for each START event there can be multiple END events that occur after it, but you want to pick the one that is 'closest' to the START event. You can achieve this by adding an aggregation.
Here is a HANA example (it uses no HANA-specific functionality):
CREATE TABLE TEST (ID integer, USER NVARCHAR(20), ACTION NVARCHAR(20), TIMESTAMP DATETIME);
INSERT INTO TEST VALUES (12, 'user1', 'END', '2022-01-01 05:00');
INSERT INTO TEST VALUES (43, 'user1', 'START', '2022-01-01 04:00');
INSERT INTO TEST VALUES (54, 'user1', 'END', '2022-01-01 03:00');
INSERT INTO TEST VALUES (13, 'user1', 'START', '2022-01-01 02:00');
SELECT
S.ID ID1,
S.USER,
S.ACTION,
S.TIMESTAMP START_TIME,
MIN(E.TIMESTAMP) END_TIME
FROM TEST S
JOIN TEST E ON (
s.USER = e.USER AND
s.ACTION = 'START' AND
e.ACTION = 'END' AND
e.TIMESTAMP >= s.TIMESTAMP
)
GROUP BY S.ID, S.ACTION, S.USER, S.TIMESTAMP
If you need E.ID as well, you will need to join it back to the result set. Note that there may be multiple END events with the same timestamp, which you need to handle when joining back E.ID.
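A sketch of that join-back step in SQLite (via Python's sqlite3), assuming usr/ts as column names. Row 55 is a hypothetical END with the same timestamp as 54, added only to show one way of handling ties: MIN(e.id) picks the lower ID, an arbitrary but deterministic choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (id INTEGER, usr TEXT, action TEXT, ts TEXT);
    INSERT INTO test VALUES
        (12, 'user1', 'END',   '2022-01-01 05:00'),
        (43, 'user1', 'START', '2022-01-01 04:00'),
        (54, 'user1', 'END',   '2022-01-01 03:00'),
        (55, 'user1', 'END',   '2022-01-01 03:00'),  -- hypothetical tie with 54
        (13, 'user1', 'START', '2022-01-01 02:00');
""")

rows = conn.execute("""
    WITH nearest AS (            -- step 1: earliest END time per START
        SELECT s.id, s.usr, s.ts AS start_time, MIN(e.ts) AS end_time
          FROM test s
          JOIN test e ON e.usr = s.usr
                     AND e.action = 'END'
                     AND e.ts >= s.ts
         WHERE s.action = 'START'
         GROUP BY s.id, s.usr, s.ts
    )
    SELECT n.id AS id1,
           MIN(e.id) AS id2,     -- step 2: join back; MIN() breaks timestamp ties
           n.start_time, n.end_time
      FROM nearest n
      JOIN test e ON e.usr = n.usr
                 AND e.action = 'END'
                 AND e.ts = n.end_time
     GROUP BY n.id, n.start_time, n.end_time
     ORDER BY n.start_time
""").fetchall()
print(rows)
```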
If you would additionally like to include START events without a corresponding END event, you can use the following:
INSERT INTO TEST VALUES (13, 'user1', 'START', '2022-01-01 09:00')
SELECT
S.ID ID1,
S.USER,
S.ACTION,
S.TIMESTAMP START_TIME,
MIN(E.TIMESTAMP) END_TIME
FROM TEST S
LEFT JOIN TEST E ON (
s.USER = e.USER AND
e.ACTION = 'END' AND
e.TIMESTAMP >= s.TIMESTAMP
)
WHERE s.ACTION ='START'
GROUP BY S.ID, S.ACTION, S.USER, S.TIMESTAMP

We want to get the nearest timestamp of the END event for each START event.
I would go with the following approach:
For each START event, get the minimum non-negative timestamp difference to an END event of the same user.
Then find the actual END event using that time difference.
Assumptions
There can be at most one event that has not ended yet.
The timestamps of START events are unique (the same goes for END events).
WITH closest_to_start AS (
SELECT
s.id,
MIN(TIMESTAMPDIFF(SECOND, s.timestamp, e.timestamp)) AS min_delta
FROM Events AS s
INNER JOIN Events AS e ON s.user = e.user
WHERE s.action = 'START'
AND e.action = 'END'
AND e.timestamp >= s.timestamp -- only END events at or after the START
GROUP BY s.id
)
SELECT s.id AS id1,
e.id AS id2
FROM Events AS s
LEFT JOIN closest_to_start AS c ON c.id = s.id
LEFT JOIN Events AS e
ON e.user = s.user
AND e.action = 'END'
AND TIMESTAMPDIFF(SECOND, s.timestamp, e.timestamp) = c.min_delta
WHERE s.action = 'START' -- unfinished START events get NULL for id2

You can use the window function Lead.
with Daten
as
(
Select 12 as ID, 'user1' as Benutzer, 'END' as action, '05:00' as Time
Union
Select 43 as ID, 'user1' as Benutzer, 'Start' as action, '04:00' as Time
Union
Select 54 as ID, 'user1' as Benutzer, 'END' as action, '03:00' as Time
Union
Select 13 as ID, 'user1' as Benutzer, 'Start' as action, '02:00' as Time
)
Select
*
from
(
Select
*,
lead(ID,1) over (partition by Benutzer order by number) as ID2,
lead(action,1) over (partition by Benutzer order by number) as action2,
lead(time,1) over (partition by Benutzer order by number) as time2
from
(
Select
*,
ROW_NUMBER() OVER(ORDER BY Benutzer,Time,action) as number
from
Daten
) x
) y
where y.action = 'Start'
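A sketch of the LEAD approach in SQLite via Python's sqlite3 (window functions need SQLite 3.25 or later, the named WINDOW clause 3.28 or later). Two choices here are my own assumptions about what is wanted: the window is partitioned by user, so one user's START cannot pair with another user's row, and the next row only counts as a match when it really is an END. Rows 99 and 71 are hypothetical, added to exercise both cases; the Time column is renamed ts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daten (id INTEGER, benutzer TEXT, action TEXT, ts TEXT);
    INSERT INTO daten VALUES
        (12, 'user1', 'END',   '05:00'),
        (43, 'user1', 'START', '04:00'),
        (54, 'user1', 'END',   '03:00'),
        (13, 'user1', 'START', '02:00'),
        (99, 'user1', 'START', '09:00'),  -- hypothetical: never ended
        (71, 'user2', 'END',   '01:00');  -- hypothetical: another user's row
""")
rows = conn.execute("""
    SELECT id,
           CASE WHEN action2 = 'END' THEN id2 END AS id2,  -- only accept a real END
           ts,
           CASE WHEN action2 = 'END' THEN ts2 END AS ts2
      FROM (SELECT id, action, ts,
                   LEAD(id)     OVER w AS id2,
                   LEAD(action) OVER w AS action2,
                   LEAD(ts)     OVER w AS ts2
              FROM daten
            WINDOW w AS (PARTITION BY benutzer ORDER BY ts, action))
     WHERE action = 'START'       -- filter after LEAD has seen every row
     ORDER BY id
""").fetchall()
print(rows)
```

Without the PARTITION BY, the unfinished START 99 would be paired with user2's END 71, because the two rows are adjacent in the global ordering.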

Solution tested in HANA SQL.
This is the same query as in the question, but it excludes the records that do not have the minimum duration:
CREATE TABLE "TESTSCHEMA"."EVENTS" (ID integer, "user" NVARCHAR(20), "action" NVARCHAR(20), "timestamp" SECONDDATE);
INSERT INTO "TESTSCHEMA"."EVENTS" VALUES (12, 'user1', 'END', '2022-01-01 05:00');
INSERT INTO "TESTSCHEMA"."EVENTS" VALUES (43, 'user1', 'START', '2022-01-01 04:00');
INSERT INTO "TESTSCHEMA"."EVENTS" VALUES (54, 'user1', 'END', '2022-01-01 03:00');
INSERT INTO "TESTSCHEMA"."EVENTS" VALUES (13, 'user1', 'START', '2022-01-01 02:00');
INSERT INTO "TESTSCHEMA"."EVENTS" VALUES (13, 'user1', 'START', '2022-01-01 09:00');
SELECT "ID1","ID2","Start Time","End Time" FROM
(
select
ROW_NUMBER() OVER(PARTITION BY s."ID" order by SECONDS_BETWEEN(e."timestamp",s."timestamp") DESC) AS RN,
s."ID" as "ID1",
e."ID" as "ID2",
s."user",
s."timestamp" as "Start Time",
e."timestamp" as "End Time",
SECONDS_BETWEEN(e."timestamp",s."timestamp") AS "Duration"
from "TESTSCHEMA"."EVENTS" s
left join "TESTSCHEMA"."EVENTS" e on s."user" = e."user"
where s."action" = 'START'
and e."action" = 'END'
and s."timestamp" < e."timestamp"
)WHERE RN=1

Related

How to subtract two timestamps in SQL and then count?

I basically want to find out how many users paid within 15, 30, and 60 minutes, based on my payment_time and trigger_time.
I have the following query:
import pandas as pd

with redshift_direct() as conn:
    trigger_time_1 = pd.read_sql(f"""
with new_data as
(
select
cycle_end_date
, prime_tagging_by_issuer_and_product
, u.user_id
, settled_status
, delay,
ots_created_at + interval '5:30 hours' as payment_time
,case when to_char(cycle_end_date,'DD') = '15' then 'Odd' else 'Even' end as cycle_order
from
settlement_summary_from_snapshot s
left join (select distinct user_phone_number, user_id from user_events where event_name = 'UserCreatedEvent') u
on u.user_id = s.user_id
and cycle_type = 'end_cycle'
and cycle_end_date > '2021-11-30' and cycle_end_date < '2022-01-15'
)
select
bucket_id
, cycle_end_date, d.cycle_order
, date(cycle_end_date) as t_cycle_end_date
,d.prime_tagging_by_issuer_and_product
,source
,status as cause
,split_part(campaign_name ,'|', 1) as campaign
,split_part(campaign_name ,'|', 2) as sms_cycle_end_date
,split_part(campaign_name ,'|', 3) as day
,split_part(campaign_name ,'|', 4) as type
,to_char(to_date(split_part(campaign_name ,'|', 2) , 'DD/MM/YYYY'), 'YYYY-MM-DD') as campaign_date,
d.payment_time, payload_event_timestamp + interval '5:30 hours' as trigger_time
,count( s.user_id) as count
from sms_callback_events s
inner join new_data d
on s.user_id = d.user_id
where bucket_id > 'date_2021_11_30' and bucket_id < 'date_2022_01_15'
and campaign_name like '%RC%'
and event_name = 'SmsStatusUpdatedEvent'
group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14
""",conn)
How do I add three columns with the number of users who paid within 15, 30, and 60 minutes after trigger_time in this query? I was doing it with pandas, but I want to find a way to do it in the SQL itself. Can someone help?
I wrote my own DATEDIFF function, which returns the difference between two dates as an integer, in days, months, years, hours, minutes, etc. You can use this function in your queries.
DATEDIFF Function SQL Code on GitHub
Sample query using this DATEDIFF function:
select
datediff('minute', mm.start_date, mm.end_date) as diff_minute
from
(
select
'2022-02-24 09:00:00.100'::timestamp as start_date,
'2022-02-24 09:15:21.359'::timestamp as end_date
) mm;
Result:
---------------
diff_minute
---------------
15
---------------
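Once a minute difference is available, the three columns from the question can be produced with conditional aggregation. A sketch in SQLite via Python's sqlite3, assuming a hypothetical table paid that already holds one trigger_time and payment_time per user; SQLite has no DATEDIFF, so julianday() differences are converted to minutes, and on Redshift that expression would need to be swapped for the database's own date-difference function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: one row per user with both timestamps already joined.
    CREATE TABLE paid (user_id INTEGER, trigger_time TEXT, payment_time TEXT);
    INSERT INTO paid VALUES
        (1, '2022-01-01 10:00:00', '2022-01-01 10:10:00'),  -- 10 min
        (2, '2022-01-01 10:00:00', '2022-01-01 10:25:00'),  -- 25 min
        (3, '2022-01-01 10:00:00', '2022-01-01 10:50:00'),  -- 50 min
        (4, '2022-01-01 10:00:00', '2022-01-01 12:00:00'),  -- 120 min
        (5, '2022-01-01 10:00:00', NULL);                   -- never paid
""")
row = conn.execute("""
    WITH d AS (
        SELECT user_id,
               (julianday(payment_time) - julianday(trigger_time)) * 24 * 60 AS minutes
          FROM paid
    )
    -- Each CASE yields user_id only inside its window, so COUNT ignores the rest.
    SELECT COUNT(DISTINCT CASE WHEN minutes BETWEEN 0 AND 15 THEN user_id END) AS within_15,
           COUNT(DISTINCT CASE WHEN minutes BETWEEN 0 AND 30 THEN user_id END) AS within_30,
           COUNT(DISTINCT CASE WHEN minutes BETWEEN 0 AND 60 THEN user_id END) AS within_60
      FROM d
""").fetchone()
print(row)  # (1, 2, 3)
```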

How to display data on separate rows in one row

I have a table with the following setup
ID InOut_Status InOut_Datetime
1 IN 9/12/2017 8:00
2 IN 9/12/2017 10:00
1 OUT 9/12/2017 1:00
2 OUT 9/12/2017 3:00
I want to see both the status and the date on the same row instead of on separate rows, for example:
ID In_Status In_Datetime Out_Status Out_Datetime
1 IN 9/12/2017 8:00 OUT 9/12/2017 1:00
2 IN 9/12/2017 10:00 OUT 9/12/2017 3:00
I would like to return all columns; I only listed a few as an example. I would also like to show only the most recent Datetime for each ID, and if the user hasn't checked out yet, Out_Datetime should be blank.
Any assistance would be greatly appreciated. Thanks.
You can use a self join:
SELECT *
FROM
(
SELECT ins.id
, ins.InOut_Datetime as in_time
, outs.InOut_Datetime as out_time
, row_number() over (partition by ins.id order by ins.InOut_Datetime desc, outs.InOut_Datetime) as ranking
FROM MyTable ins
LEFT JOIN MyTable outs
ON ins.id = outs.id
AND outs.InOut_Status = 'OUT'
AND outs.InOut_Datetime > ins.InOut_Datetime
WHERE ins.InOut_Status = 'IN'
and ins.InOut_Datetime > DATEADD(day, -1, GETDATE())
) t
WHERE t.ranking = 1
Updated the query to:
get logins within last 24 hours
get the latest login of a user only
show out time only if it's later than in time
You need a left join, but you want to limit the join to the first record returned in descending order, using a subquery.
SELECT * FROM
(
SELECT
ClockIn.ID, ClockIn.InOut_Datetime AS In_Datetime, ClockOut.InOut_Datetime AS Out_Datetime,
CheckOutInstanceDescending = ROW_NUMBER() OVER(PARTITION BY ClockIn.ID ORDER BY ClockIn.InOut_Datetime DESC, ClockOut.InOut_Datetime)
FROM
MyTable ClockIn
LEFT OUTER JOIN MyTable ClockOut ON ClockOut.ID=ClockIn.ID
AND ClockOut.InOut_Status='OUT'
AND ClockOut.InOut_Datetime > ClockIn.InOut_Datetime
WHERE
ClockIn.InOut_Status='IN'
)AS Combined
WHERE
Combined.CheckOutInstanceDescending=1
A simple pivot is the easiest fix:
https://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx. You can run this in SSMS 2008 or higher (I wrote it in SQL Server 2016).
DECLARE @Temp TABLE (Id INT, InOut_Status VARCHAR(8), InOut_Datetime DATETIME)
INSERT INTO @Temp (Id, InOut_Status, InOut_Datetime) VALUES (1, 'IN', '9-12-2017 8:00'), (2, 'IN', '9-12-2017 10:00'),(1, 'OUT', '9-12-2017 13:00'),(2, 'OUT', '9-12-2017 16:00'),(3, 'IN', '9-12-2017 06:00')
SELECT
pvt.Id
, 'IN' AS In_Status
, pvt.[In]
, 'OUT' AS Out_Status
, pvt.OUT
From @Temp
PIVOT(MAX(InOut_Datetime) FOR InOut_Status IN ([In], [OUT])) AS pvt
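A sketch of the self-join plus ROW_NUMBER() approach from the first answer, run in SQLite via Python's sqlite3 (window functions need SQLite 3.25 or later). Assumptions: the question's 1:00 and 3:00 OUT times are read as 13:00 and 15:00, dates are stored as ISO text, the table name clock is made up, and id 3 (never checked out) comes from the pivot answer's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clock (id INTEGER, inout_status TEXT, inout_datetime TEXT);
    INSERT INTO clock VALUES
        (1, 'IN',  '2017-09-12 08:00'), (2, 'IN',  '2017-09-12 10:00'),
        (1, 'OUT', '2017-09-12 13:00'), (2, 'OUT', '2017-09-12 15:00'),
        (3, 'IN',  '2017-09-12 06:00');  -- id 3 has not checked out
""")
rows = conn.execute("""
    SELECT id, in_datetime, COALESCE(out_datetime, '') AS out_datetime  -- blank if no OUT
      FROM (SELECT i.id,
                   i.inout_datetime AS in_datetime,
                   o.inout_datetime AS out_datetime,
                   -- latest IN first; for that IN, the earliest following OUT
                   ROW_NUMBER() OVER (PARTITION BY i.id
                                      ORDER BY i.inout_datetime DESC,
                                               o.inout_datetime) AS rn
              FROM clock i
              LEFT JOIN clock o ON o.id = i.id
                               AND o.inout_status = 'OUT'
                               AND o.inout_datetime > i.inout_datetime
             WHERE i.inout_status = 'IN')
     WHERE rn = 1
     ORDER BY id
""").fetchall()
print(rows)
```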

Summarize values across timeline in SQL

The Problem
I have a PostgreSQL database on which I am trying to summarize the revenue of a cash register over time. The cash register can either have status ACTIVE or INACTIVE, but I only want to summarize the earnings created when it was ACTIVE for a given period of time.
I have two tables; one that marks the revenue and one that marks the cash register status:
CREATE TABLE counters
(
id bigserial NOT NULL,
"timestamp" timestamp with time zone,
total_revenue bigint,
id_of_machine character varying(50),
CONSTRAINT counters_pkey PRIMARY KEY (id)
);
CREATE TABLE machine_lifecycle_events
(
id bigserial NOT NULL,
event_type character varying(50),
"timestamp" timestamp with time zone,
id_of_affected_machine character varying(50),
CONSTRAINT machine_lifecycle_events_pkey PRIMARY KEY (id)
);
A counters entry is added every minute, and total_revenue only increases. A machine_lifecycle_events entry is added every time the status of the machine changes.
I have added an image illustrating the problem. It is the revenue during the blue periods which should be summarized.
What I have tried so far
I have created a query which can give me the total revenue in a given instant:
SELECT total_revenue
FROM counters
WHERE timestamp < '2014-03-05 11:00:00'
AND id_of_machine='1'
ORDER BY
timestamp desc
LIMIT 1
The questions
How do I calculate the revenue earned between two timestamps?
How do I determine the start and end timestamps of the blue periods when I have to compare the timestamps in machine_lifecycle_events with the input period?
Any ideas on how to attack this problem?
Update
Example data:
INSERT INTO counters VALUES
(1, '2014-03-01 00:00:00', 100, '1')
, (2, '2014-03-01 12:00:00', 200, '1')
, (3, '2014-03-02 00:00:00', 300, '1')
, (4, '2014-03-02 12:00:00', 400, '1')
, (5, '2014-03-03 00:00:00', 500, '1')
, (6, '2014-03-03 12:00:00', 600, '1')
, (7, '2014-03-04 00:00:00', 700, '1')
, (8, '2014-03-04 12:00:00', 800, '1')
, (9, '2014-03-05 00:00:00', 900, '1')
, (10, '2014-03-05 12:00:00', 1000, '1')
, (11, '2014-03-06 00:00:00', 1100, '1')
, (12, '2014-03-06 12:00:00', 1200, '1')
, (13, '2014-03-07 00:00:00', 1300, '1')
, (14, '2014-03-07 12:00:00', 1400, '1');
INSERT INTO machine_lifecycle_events VALUES
(1, 'ACTIVE', '2014-03-01 08:00:00', '1')
, (2, 'INACTIVE', '2014-03-03 00:00:00', '1')
, (3, 'ACTIVE', '2014-03-05 00:00:00', '1')
, (4, 'INACTIVE', '2014-03-06 12:00:00', '1');
SQL Fiddle with sample data.
Example query:
The revenue between '2014-03-02 08:00:00' and '2014-03-06 08:00:00' is 300. 100 for the first ACTIVE period, and 200 for the second ACTIVE period.
DB design
To make my work easier I sanitized your DB design before I tackled the questions:
CREATE TEMP TABLE counter (
id bigserial PRIMARY KEY
, ts timestamp NOT NULL
, total_revenue bigint NOT NULL
, machine_id int NOT NULL
);
CREATE TEMP TABLE machine_event (
id bigserial PRIMARY KEY
, ts timestamp NOT NULL
, machine_id int NOT NULL
, status_active bool NOT NULL
);
Test case in the fiddle.
Major points
Using ts instead of "timestamp". Never use basic type names as column names.
Simplified and unified the name to machine_id, and made it an integer, as it should be, instead of varchar(50).
event_type varchar(50) should be an integer foreign key, too, or an enum. Or even just a boolean for only active / inactive. Simplified to status_active bool.
Simplified and sanitized INSERT statements as well.
Answers
Assumptions
total_revenue only increases (per question).
Borders of the outer time frame are included.
Every "next" row per machine in machine_event has the opposite status_active.
1. How do I calculate the revenue earned between two timestamps?
WITH span AS (
SELECT '2014-03-02 12:00'::timestamp AS s_from -- start of time range
, '2014-03-05 11:00'::timestamp AS s_to -- end of time range
)
SELECT machine_id, s.s_from, s.s_to
, max(total_revenue) - min(total_revenue) AS earned
FROM counter c
, span s
WHERE ts BETWEEN s_from AND s_to -- borders included!
AND machine_id = 1
GROUP BY 1,2,3;
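The same max - min calculation, checked in SQLite via Python's sqlite3 against the question's sample readings, which are re-created here with a small generator (100 at 2014-03-01 00:00, then +100 every 12 hours). Only the columns the query needs are created, timestamps are ISO text, and only machine 1 exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counter (ts TEXT, total_revenue INTEGER, machine_id INTEGER)")
# Sample readings: 100 at 2014-03-01 00:00:00, +100 every 12 hours up to 1400.
readings = [(f"2014-03-{day:02d} {hour:02d}:00:00", 100 * (2 * (day - 1) + hour // 12 + 1), 1)
            for day in range(1, 8) for hour in (0, 12)]
conn.executemany("INSERT INTO counter VALUES (?, ?, ?)", readings)

earned = conn.execute("""
    SELECT MAX(total_revenue) - MIN(total_revenue)
      FROM counter
     WHERE ts BETWEEN :s_from AND :s_to   -- borders included
       AND machine_id = 1
""", {"s_from": "2014-03-02 12:00:00", "s_to": "2014-03-05 11:00:00"}).fetchone()[0]
print(earned)  # 500
```

Because total_revenue is cumulative, the difference only covers the span between the first and last reading inside the window (here 400 at 03-02 12:00 and 900 at 03-05 00:00).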
2. How do I determine the start and end timestamps of the blue periods when I have to compare the timestamps in machine_event with the input period?
This query returns the active periods of all machines in the given time frame (span).
Add WHERE machine_id = 1 in the CTE cte to select a specific machine.
WITH span AS (
SELECT '2014-03-02 08:00'::timestamp AS s_from -- start of time range
, '2014-03-06 08:00'::timestamp AS s_to -- end of time range
)
, cte AS (
SELECT machine_id, ts, status_active, s_from
, lead(ts, 1, s_to) OVER w AS period_end
, first_value(ts) OVER w AS first_ts
FROM span s
JOIN machine_event e ON e.ts BETWEEN s.s_from AND s.s_to
WINDOW w AS (PARTITION BY machine_id ORDER BY ts)
)
SELECT machine_id, ts AS period_start, period_end -- start in time frame
FROM cte
WHERE status_active
UNION ALL -- active start before time frame
SELECT machine_id, s_from, ts
FROM cte
WHERE NOT status_active
AND ts = first_ts
AND ts <> s_from
UNION ALL -- active start before time frame, no end in time frame
SELECT machine_id, s_from, s_to
FROM (
SELECT DISTINCT ON (1)
e.machine_id, e.status_active, s.s_from, s.s_to
FROM span s
JOIN machine_event e ON e.ts < s.s_from -- only from before time range
LEFT JOIN cte c USING (machine_id)
WHERE c.machine_id IS NULL -- not in selected time range
ORDER BY e.machine_id, e.ts DESC -- only the latest entry
) sub
WHERE status_active -- only if active
ORDER BY 1, 2;
Result is the list of blue periods in your image.
SQL Fiddle demonstrating both.
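Putting both parts together: a sketch in SQLite (via Python's sqlite3, window functions need SQLite 3.25 or later) that derives the active ("blue") periods with LEAD(), clips them to the input span, and sums max - min per period. It assumes the sanitized schema above with status_active stored as 0/1 and timestamps as ISO text, and, like the answers here, it only counts revenue between the first and last counter reading inside each period. On the question's sample data it should reproduce the expected 300.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE counter (ts TEXT, total_revenue INTEGER, machine_id INTEGER);
    CREATE TABLE machine_event (ts TEXT, machine_id INTEGER, status_active INTEGER);
    INSERT INTO machine_event VALUES
        ('2014-03-01 08:00:00', 1, 1), ('2014-03-03 00:00:00', 1, 0),
        ('2014-03-05 00:00:00', 1, 1), ('2014-03-06 12:00:00', 1, 0);
""")
conn.executemany(
    "INSERT INTO counter VALUES (?, ?, 1)",
    [(f"2014-03-{d:02d} {h:02d}:00:00", 100 * (2 * (d - 1) + h // 12 + 1))
     for d in range(1, 8) for h in (0, 12)])

SQL = """
WITH periods AS (      -- each event lasts until the machine's next event
    SELECT machine_id, status_active, ts AS p_start,
           LEAD(ts) OVER (PARTITION BY machine_id ORDER BY ts) AS p_end
      FROM machine_event
), blue AS (           -- active periods clipped to [s_from, s_to]
    SELECT machine_id,
           MAX(p_start, :s_from) AS b_start,   -- multi-argument MAX/MIN are scalar in SQLite
           MIN(COALESCE(p_end, :s_to), :s_to) AS b_end
      FROM periods
     WHERE status_active = 1
       AND p_start < :s_to
       AND COALESCE(p_end, :s_to) > :s_from
)
SELECT SUM(earned) FROM (
    SELECT MAX(c.total_revenue) - MIN(c.total_revenue) AS earned
      FROM blue b
      JOIN counter c ON c.machine_id = b.machine_id
                    AND c.ts BETWEEN b.b_start AND b.b_end
     GROUP BY b.machine_id, b.b_start
)
"""
total = conn.execute(SQL, {"s_from": "2014-03-02 08:00:00",
                           "s_to": "2014-03-06 08:00:00"}).fetchone()[0]
print(total)  # 300
```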
Recent similar question:
Sum of time difference between rows
OK, I have an answer, but I had to assume that the id of machine_lifecycle_events can be used to determine successor and predecessor. For my solution to work better, you should have a link between the active and inactive events. There might also be other ways to solve it, but those would add even more complexity.
First, to get the revenue for all active periods per machine, you can do the following:
select c.id_of_machine, cycle_id, cycle_start, cycle_end, max(total_revenue) - min(total_revenue) as earned -- total_revenue is cumulative
from counters c join (
select e1.id as cycle_id,
e1.timestamp as cycle_start,
e2.timestamp as cycle_end,
e1.id_of_affected_machine as cycle_machine_id
from machine_lifecycle_events e1 join machine_lifecycle_events e2
on e1.id + 1 = e2.id and -- this should be replaced with a specific column to find cycles which belong together
e1.id_of_affected_machine = e2.id_of_affected_machine
where e1.event_type = 'ACTIVE'
) cycle
on c.id_of_machine = cycle_machine_id and
cycle_start <= c.timestamp and c.timestamp <= cycle_end
group by c.id_of_machine, cycle_id, cycle_start, cycle_end
order by c.id_of_machine, cycle_id
You can further use this query and add more WHERE conditions to get the revenue only within a time frame or for specific machines:
select sum(earned)
from (
select cycle_id, max(total_revenue) - min(total_revenue) as earned
from counters c join (
select e1.id as cycle_id,
e1.timestamp as cycle_start,
e2.timestamp as cycle_end,
e1.id_of_affected_machine as cycle_machine_id
from machine_lifecycle_events e1 join machine_lifecycle_events e2
on e1.id + 1 = e2.id and -- this should be replaced with a specific column to find cycles which belong together
e1.id_of_affected_machine = e2.id_of_affected_machine
where e1.event_type = 'ACTIVE'
) cycle
on c.id_of_machine = cycle_machine_id and
cycle_start <= c.timestamp and c.timestamp <= cycle_end
where '2014-03-02 08:00:00' <= c.timestamp and c.timestamp <= '2014-03-06 08:00:00'
and c.id_of_machine = '1'
group by cycle_id
) per_cycle
As mentioned in the beginning, and in the comments, my way of finding connecting events isn't suitable for any more complex examples with multiple machines. The easiest way would be to have another column which would always point to the preceding event. Another way would be to have a function which would find those events but this solution couldn't make use of indices.
Use a self-join to build an intervals table with the actual status of each interval.
with intervals as (
select e1.timestamp time1, e2.timestamp time2, e1.EVENT_TYPE as status
from machine_lifecycle_events e1
left join machine_lifecycle_events e2 on e2.id = e1.id + 1
) select * from counters c
join intervals i on (timestamp between i.time1 and i.time2 or i.time2 is null)
and i.status = 'ACTIVE';
I didn't aggregate, so that the full result set is visible; adding the aggregation should be simple. I also left out the machine ID to simplify the demonstration of this pattern.

Querying for a 'run' of consecutive columns in Postgres

I have a table:
create table table1 (event_id integer, event_time timestamp without time zone);
insert into table1 (event_id, event_time) values
(1, '2011-01-01 00:00:00'),
(2, '2011-01-01 00:00:15'),
(3, '2011-01-01 00:00:29'),
(4, '2011-01-01 00:00:58'),
(5, '2011-01-02 06:03:00'),
(6, '2011-01-02 06:03:09'),
(7, '2011-01-05 11:01:31'),
(8, '2011-01-05 11:02:15'),
(9, '2011-01-06 09:34:19'),
(10, '2011-01-06 09:34:41'),
(11, '2011-01-06 09:35:06');
I would like to construct a statement that, given an event, returns the length of the 'run' of events starting with that event. A run is defined by:
Two events are in a run together if they are within 30 seconds of one another.
If A and B are in a run together, and B and C are in a run together, then A is in a run with C.
However my query does not need to go backwards in time, so if I select on event 2, then only events 2, 3, and 4 should be counted as part of the run of events starting with 2, and 3 should be returned as the length of the run.
Any ideas? I'm stumped.
Here is a recursive CTE solution (gaps-and-islands problems naturally lend themselves to recursive CTEs):
WITH RECURSIVE runrun AS (
SELECT event_id, event_time
, event_time - ('30 sec'::interval) AS low_time
, event_time + ('30 sec'::interval) AS high_time
FROM table1
UNION
SELECT t1.event_id, t1.event_time
, LEAST ( rr.low_time, t1.event_time - ('30 sec'::interval) ) AS low_time
, GREATEST ( rr.high_time, t1.event_time + ('30 sec'::interval) ) AS high_time
FROM table1 t1
JOIN runrun rr ON t1.event_time >= rr.low_time
AND t1.event_time < rr.high_time
)
SELECT DISTINCT ON (event_id) *
FROM runrun rr
WHERE rr.event_time >= '2011-01-01 00:00:15'
AND rr.low_time <= '2011-01-01 00:00:15'
AND rr.high_time > '2011-01-01 00:00:15'
;
Result:
event_id | event_time | low_time | high_time
----------+---------------------+---------------------+---------------------
2 | 2011-01-01 00:00:15 | 2010-12-31 23:59:45 | 2011-01-01 00:00:45
3 | 2011-01-01 00:00:29 | 2010-12-31 23:59:45 | 2011-01-01 00:01:28
4 | 2011-01-01 00:00:58 | 2010-12-31 23:59:30 | 2011-01-01 00:01:28
(3 rows)
Could look like this:
WITH x AS (
SELECT event_time
,row_number() OVER w AS rn
,lead(event_time) OVER w AS next_time
FROM table1
WHERE event_id >= <start_id>
WINDOW w AS (ORDER BY event_time, event_id)
)
SELECT COALESCE(
(SELECT x.rn
FROM x
WHERE (x.event_time + interval '30s') < x.next_time
ORDER BY x.rn
LIMIT 1)
,(SELECT count(*) FROM x)
) AS run_length
This version does not rely on a gap-less sequence of IDs, but on event_time only.
Identical event_time values are additionally sorted by event_id to be unambiguous.
Read about the window functions row_number() and lead() and CTE (With clause) in the manual.
Edit
If we cannot assume that a bigger event_id has a later (or equal) event_time, substitute this for the first WHERE clause:
WHERE event_time >= (SELECT event_time FROM table1 WHERE event_id = <start_id>)
Rows with the same event_time as the starting row but a smaller event_id will still be ignored.
In the special case where the run continues until the last row, no end is found and the first subquery returns no row; COALESCE then returns the count of all rows instead.
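The LEAD/ROW_NUMBER answer above, translated to SQLite via Python's sqlite3 as a sketch (window functions need SQLite 3.25 or later). SQLite has no interval type, so the 30-second gap test assumes timestamps stored as ISO text and compares julianday() differences converted to seconds; the helper name run_length is made up. It uses the edited WHERE clause (start from the event's time), not the event_id one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (event_id INTEGER, event_time TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [
    (1, '2011-01-01 00:00:00'), (2, '2011-01-01 00:00:15'),
    (3, '2011-01-01 00:00:29'), (4, '2011-01-01 00:00:58'),
    (5, '2011-01-02 06:03:00'), (6, '2011-01-02 06:03:09'),
    (7, '2011-01-05 11:01:31'), (8, '2011-01-05 11:02:15'),
    (9, '2011-01-06 09:34:19'), (10, '2011-01-06 09:34:41'),
    (11, '2011-01-06 09:35:06')])

def run_length(start_id):
    """Rows from start_id up to (and including) the first one followed by a gap > 30 s."""
    return conn.execute("""
        WITH x AS (
            SELECT event_time,
                   ROW_NUMBER() OVER w AS rn,
                   LEAD(event_time) OVER w AS next_time
              FROM table1
             WHERE event_time >= (SELECT event_time FROM table1 WHERE event_id = :id)
            WINDOW w AS (ORDER BY event_time, event_id)
        )
        SELECT COALESCE(
            (SELECT rn FROM x
              WHERE (julianday(next_time) - julianday(event_time)) * 86400 > 30
              ORDER BY rn LIMIT 1),
            (SELECT COUNT(*) FROM x))   -- run lasts until the last row
    """, {"id": start_id}).fetchone()[0]

print([run_length(i) for i in (1, 2, 7, 9)])  # [4, 3, 1, 3]
```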
You can join the table to itself on a date difference. Since this is Postgres, a simple subtraction works.
This subquery finds all records that are a 'start event', that is, all event records that do not have another event record occurring within the 30 seconds before them:
(Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
left join
(select event_id, event_time from table1) b
on a.event_time - b.event_time < '00:00:30' and a.event_time - b.event_time > '00:00:00'
where b.event_time is null) startevent
With a few changes, the same logic picks up the 'end' events:
(Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
left join
(select event_id, event_time from table1) b
on b.event_time - a.event_time < '00:00:30' and b.event_time - a.event_time > '00:00:00'
where b.event_time is null) end_event
Now we can join these together to associate which start event goes to which end event:
There are a couple of ways to go about this. I'm assuming only the example has sequential ID numbers, so you'll want to join each start event to the end event with the smallest positive difference in event time.
Here's my end result (it nests quite a few subselects):
select a.start_id, case when a.event_id is null then t1.event_id::varchar else 'single event' end as end_id
from
(select start_event.event_id as start_id, start_event.event_time as start_time, last_event.event_id, min(end_event.event_time - start_event.event_time) as min_interval
from
(Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
left join
(select event_id, event_time from table1) b
on a.event_time - b.event_time < '00:00:30' and a.event_time - b.event_time > '00:00:00'
where b.event_time is null) start_event
inner join
(Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
left join
(select event_id, event_time from table1) b
on b.event_time - a.event_time < '00:00:30' and b.event_time - a.event_time > '00:00:00'
where b.event_time is null) end_event
on end_event.event_time > start_event.event_time
--check for only event
left join
(Select a.event_id, a.event_time from
(Select event_id, event_time from table1) a
left join
(select event_id, event_time from table1) b
on b.event_time - a.event_time < '00:00:30' and b.event_time - a.event_time > '00:00:00'
where b.event_time is null) last_event
on start_event.event_id = last_event.event_id
group by 1,2,3) a
left join table1 t1 on t1.event_time = a.start_time + a.min_interval
Results as start_id, end_Id:
1;"4"
5;"6"
7;"single event"
8;"single event"
9;"11"
I had to use a third left join to pick out single events, as a way of detecting events that are both start events and end events. The end result is in IDs and can be linked back to your original table if you want more information than just the ID. I'm not sure how this solution will scale; with millions of events it could be an issue.

Finding overlapping dates

I have a set of meeting rooms, and meetings in those rooms, each with a start date and an end date. A set of meeting rooms belongs to a building.
The meeting details are kept in MeetingDetail table having a startDate and endDate.
Now I want to run a report between two points in time, say reportStartDate and reportEndDate, that finds the time slots in which all the meeting rooms of a given building are booked.
Table structure
MEETING_ROOM - ID, ROOMNAME, BUILDING_NO
MEETING_DETAIL - ID, MEETING_ROOM_ID, START_DATE, END_DATE
The query has to be run for reportStartDate and reportEndDate.
Just to clarify further, the aim is to find all the time slots in which all the meeting rooms were booked in a given time period of reportStartDate and reportEndDate
For SQL Server 2005+ you could try the following (see the note at the end for MySQL):
WITH TIME_POINTS (POINT_P) AS
(SELECT START_DATE FROM MEETING_DETAIL
WHERE START_DATE > @reportStartDate AND START_DATE < @reportEndDate
UNION SELECT END_DATE FROM MEETING_DETAIL
WHERE END_DATE > @reportStartDate AND END_DATE < @reportEndDate
UNION SELECT @reportEndDate
UNION SELECT @reportStartDate),
TIME_SLICE (START_T, END_T) AS
(SELECT A.POINT_P, MIN(B.POINT_P) FROM
TIME_POINTS A
INNER JOIN TIME_POINTS B ON B.POINT_P > A.POINT_P
GROUP BY A.POINT_P),
SLICE_MEETINGS (START_T, END_T, MEETING_ROOM_ID, BUILDING_NO) AS
(SELECT A.START_T, A.END_T, B.MEETING_ROOM_ID, C.BUILDING_NO FROM
TIME_SLICE A
INNER JOIN MEETING_DETAIL B ON B.START_DATE <= A.START_T AND B.END_DATE >= A.END_T
INNER JOIN MEETING_ROOM C ON B.MEETING_ROOM_ID = C.ID),
SLICE_COUNT (START_T, END_T, BUILDING_NO, ROOMS_C) AS
(SELECT START_T, END_T, BUILDING_NO, COUNT(DISTINCT MEETING_ROOM_ID) FROM
SLICE_MEETINGS
GROUP BY START_T, END_T, BUILDING_NO),
ROOMS_BUILDING (BUILDING_NO, ROOMS_C) AS
(SELECT BUILDING_NO, COUNT(ID) FROM
MEETING_ROOM
GROUP BY BUILDING_NO)
SELECT B.BUILDING_NO, A.START_T, A.END_T
FROM SLICE_COUNT A
INNER JOIN ROOMS_BUILDING B ON A.BUILDING_NO = B.BUILDING_NO AND B.ROOMS_C = A.ROOMS_C;
What it does (each step corresponds to one CTE definition above):
Get all the time markers, i.e. end or start times
Get all time slices i.e. the smallest unit of time between which there is no other time marker (i.e. no meetings start in a time slice, it's either at the beginning or at the end of a time slice)
Get meetings for each time slice, so now you get something like
10.30 11.00 Room1 BuildingA
10.30 11.00 Room2 BuildingA
11.00 12.00 Room1 BuildingA
Get counts of rooms booked per building per time slice
Keep only the time slice/building combinations in which the number of booked rooms matches the number of rooms in the building
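The five steps above can be tried out in SQLite via Python's sqlite3, as a sketch that reuses the sample rooms and meetings from the other answer. Assumptions: dates are stored as zero-padded ISO text (so 9:00 becomes 09:00 and string comparison works), the report window is 08:00 to 12:00, and steps 3 and 4 are merged into one CTE for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE meeting_room (id INTEGER, roomname TEXT, building_no INTEGER);
    CREATE TABLE meeting_detail (meeting_room_id INTEGER, start_date TEXT, end_date TEXT);
    INSERT INTO meeting_room VALUES
        (1, 'Kitchen', 6), (2, 'Ballroom', 6), (3, 'Conservatory', 7), (4, 'Dining Room', 7);
    INSERT INTO meeting_detail VALUES
        (1, '2010-08-01 09:00', '2010-08-01 10:00'),
        (1, '2010-08-01 10:00', '2010-08-01 11:00'),
        (2, '2010-08-01 10:00', '2010-08-01 11:00'),
        (3, '2010-08-01 10:00', '2010-08-01 11:00');
""")
SQL = """
WITH time_points (p) AS (             -- 1. every start/end marker inside the report window
    SELECT start_date FROM meeting_detail WHERE start_date > :rs AND start_date < :re
    UNION SELECT end_date FROM meeting_detail WHERE end_date > :rs AND end_date < :re
    UNION SELECT :rs UNION SELECT :re
), time_slice (start_t, end_t) AS (   -- 2. consecutive markers form the slices
    SELECT a.p, MIN(b.p) FROM time_points a JOIN time_points b ON b.p > a.p GROUP BY a.p
), slice_count (start_t, end_t, building_no, rooms_c) AS (  -- 3+4. booked rooms per slice
    SELECT s.start_t, s.end_t, r.building_no, COUNT(DISTINCT d.meeting_room_id)
      FROM time_slice s
      JOIN meeting_detail d ON d.start_date <= s.start_t AND d.end_date >= s.end_t
      JOIN meeting_room r ON r.id = d.meeting_room_id
     GROUP BY s.start_t, s.end_t, r.building_no
), rooms_building (building_no, rooms_c) AS (
    SELECT building_no, COUNT(id) FROM meeting_room GROUP BY building_no
)
SELECT c.building_no, c.start_t, c.end_t   -- 5. keep fully booked slices only
  FROM slice_count c
  JOIN rooms_building b ON b.building_no = c.building_no AND b.rooms_c = c.rooms_c
 ORDER BY c.start_t
"""
slots = conn.execute(SQL, {"rs": "2010-08-01 08:00", "re": "2010-08-01 12:00"}).fetchall()
print(slots)  # [(6, '2010-08-01 10:00', '2010-08-01 11:00')]
```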
Edit
Since MySQL (before version 8.0) doesn't support the WITH clause, you'll have to create a view for each of the five WITH clauses above; everything else remains the same.
After reading your comment, I think I understand the problem a bit better. As a first step, I would generate a matrix of meeting rooms and time slots using a cross join:
select *
from (
select distinct start_date
, end_date
from @meeting_detail
) ts
cross join
@meeting_room mr
Then, for each cell in the matrix, add the meetings in that time slot:
left join
@meeting_detail md
on mr.id = md.meeting_room_id
and ts.start_date < md.end_date
and md.start_date < ts.end_date
Then require that there are no free rooms, for example by demanding that the left join succeeds for every room and time slot. A left join has succeeded when the joined fields are not null:
group by
mr.building_no
, ts.start_date
, ts.end_date
having max(case when md.meeting_room_id is null
then 1 else 0 end) = 0
Here's a complete working example. It's written for SQL Server, and the table variables (@meeting_detail) won't work in MySQL, but the report-generating query should work in most databases:
set nocount on
declare @meeting_room table (id int, roomname varchar(50),
building_no int)
declare @meeting_detail table (meeting_room_id int,
start_date datetime, end_date datetime)
insert @meeting_room (id, roomname, building_no)
select 1, 'Kitchen', 6
union all select 2, 'Ballroom', 6
union all select 3, 'Conservatory', 7
union all select 4, 'Dining Room', 7
insert @meeting_detail (meeting_room_id, start_date, end_date)
select 1, '2010-08-01 9:00', '2010-08-01 10:00'
union all select 1, '2010-08-01 10:00', '2010-08-01 11:00'
union all select 2, '2010-08-01 10:00', '2010-08-01 11:00'
union all select 3, '2010-08-01 10:00', '2010-08-01 11:00'
select mr.building_no
, ts.start_date
, ts.end_date
from (
select distinct start_date
, end_date
from @meeting_detail
) ts
cross join
@meeting_room mr
left join
@meeting_detail md
on mr.id = md.meeting_room_id
and ts.start_date < md.end_date
and md.start_date < ts.end_date
group by
mr.building_no
, ts.start_date
, ts.end_date
having max(case when md.meeting_room_id is null
then 1 else 0 end) = 0
This prints:
building_no start_date end_date
6 2010-08-01 10:00:00.000 2010-08-01 11:00:00.000