I have written the query below to retrieve the values:
select * from abc where status in ('Login', 'Logout');
Status | Time
:----- | :------------------
Login | 2021-08-29 10:00:00
Logout | 2021-08-29 10:30:00
Login | 2021-08-29 11:00:00
In the table above, I have to check whether the latest Logout status is present or missing; here, the logout has not been done yet.
I want to use an IF condition to check whether the Logout status is empty or present, and it should use only the latest values. The query above retrieves all rows, but I want only the latest ones, that is:
the Login at 2021-08-29 11:00:00, with no Logout done yet.
I want only the last records as output. Suppose the logout has been done:
Status | Time
:----- | :------------------
Login | 2021-08-29 10:00:00
Logout | 2021-08-29 10:30:00
Login | 2021-08-29 11:00:00
Logout | 2021-08-29 11:30:00
Then I should get:

Status | Time
:----- | :------------------
Login | 2021-08-29 11:00:00
Logout | 2021-08-29 11:30:00

as output.
If the correct sequence is maintained by the logging application (Login before Logout, and no consecutive duplicates of the same status), then you need to select the last two statuses and decide whether to show the time of the Logout (if it is the last event) or not (if it is followed by a Login).
insert into t(status, time)
select 'Login', timestamp '2021-08-29 10:00:00' from dual union all
select 'Logout', timestamp '2021-08-29 10:30:00' from dual union all
select 'Login', timestamp '2021-08-29 11:00:00' from dual
create view v_test as
with last_ as (
select t.*
, row_number() over(partition by status order by time desc) as rn
/*There should be nothing after last logout*/
, lead(status) over(order by time) as next_status
from t
where status in ('Login', 'Logout')
)
select
status
/*
Show the time of the last status (regardless of its type),
And the time of the Login status if it's followed by Logout.
*/
, decode(next_status, 'Logout', time, null, time) as time
from last_
where rn = 1
select *
from v_test
STATUS | TIME
:----- | :------------------
Login | 2021-08-29 11:00:00
Logout | null
insert into t(status, time)
values('Logout', timestamp '2021-08-29 11:30:00')
select *
from v_test
STATUS | TIME
:----- | :------------------
Login | 2021-08-29 11:00:00
Logout | 2021-08-29 11:30:00
db<>fiddle here
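The decision the view encodes can also be sketched outside SQL. Below is an illustrative Python model (the function and variable names are mine, not part of the query), not the Oracle implementation: take the latest Login, then look for a Logout after it.

```python
from datetime import datetime

def latest_session(rows):
    """From (status, time) rows, return the latest Login and the Logout
    time that follows it, or None if the session is still open."""
    last_login = max(t for s, t in rows if s == 'Login')
    logouts_after = [t for s, t in rows if s == 'Logout' and t > last_login]
    logout_time = min(logouts_after) if logouts_after else None
    return [('Login', last_login), ('Logout', logout_time)]

rows = [('Login',  datetime(2021, 8, 29, 10, 0)),
        ('Logout', datetime(2021, 8, 29, 10, 30)),
        ('Login',  datetime(2021, 8, 29, 11, 0))]
open_session = latest_session(rows)       # Logout time is None here

rows.append(('Logout', datetime(2021, 8, 29, 11, 30)))
closed_session = latest_session(rows)     # Logout time is now 11:30
```

This mirrors both outputs above: a null Logout while the session is open, and the real Logout time once it exists.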
The following approach uses window functions to determine the last login time and any logout time after the last login time:
SELECT
status, time
FROM (
select
status, time,
CASE
WHEN status='Login' AND
(time = MAX(time) OVER (PARTITION BY status)) THEN 1
ELSE 0
END as is_last_login,
CASE
WHEN status='Logout' AND
(time > MAX(CASE WHEN status='Login' THEN time END) OVER ()) THEN 1
ELSE 0
END as is_logout_after_last_login
from
abc
where
status in ('Login','Logout')
) t
WHERE
t.is_last_login=1 OR t.is_logout_after_last_login=1;
or
WITH last_login_time AS (
select max(time) last_login from abc where status='Login'
)
select
status,
time
from
abc
where
(
status='Login' AND time = (select last_login from last_login_time)
) OR
(
status='Logout' AND time > (select last_login from last_login_time)
)
Working Demo Db Fiddle
Edit 1
Left joining a derived table of the two statuses, as shown in the example query below, will also provide a null result for Logout if there is no logout entry yet:
WITH last_login_time AS (
select max(time) last_login from abc where status='Login'
),
login_logout_times AS (
select
status,
time
from
abc
where
(
status='Login' AND time = (select last_login from last_login_time)
) OR
(
status='Logout' AND time > (select last_login from last_login_time)
)
)
select
d.status,
l.time
from (
select 'Login' as status from dual
union all
select 'Logout' as status from dual
) d
left join login_logout_times l on l.status=d.status
Working Demo Db Fiddle
Let me know if this works for you.
I've been trying to work out a solution to this problem, but so far I haven't been able to. I'm using Oracle.
I have a set of data that looks like this:
| USER | ACTIVITY | START_TIME | END_TIME | DURATION |
|--------|------------|-----------------|-----------------|----------|
| jsmith | Front Desk | 2020-08-24 8:00 | 2020-08-24 9:30 | 90 |
| jsmith | Phones | 2020-08-24 8:15 | 2020-08-24 8:45 | 30 |
| jsmith | Phones | 2020-08-24 9:45 | 2020-08-24 9:50 | 5 |
| bjones | Phones | 2020-08-24 9:00 | 2020-08-24 9:10 | 10 |
| bjones | Front Desk | 2020-08-24 9:05 | 2020-08-24 9:15 | 10 |
| bjones | Phones | 2020-08-24 9:15 | 2020-08-24 9:45 | 30 |
The above output can be generated from the following query:
SELECT
USER,
ACTIVITY,
START_TIME,
END_TIME,
DURATION
FROM USER_ACTIVITIES
WHERE USER IN ('jsmith', 'bjones')
AND START_TIME BETWEEN '2020-08-24 00:00:00' AND '2020-08-25 00:00:00'
ORDER BY USER, START_TIME, END_TIME
;
I need to calculate the total "busy" time per user, taking into account that some of the activities overlap each other. Using the existing query I'll get a total duration per user of 125 for jsmith and 50 for bjones. However, since some of the activities overlapped, this doesn't reflect the total amount of time the users were busy.
The output I'm looking for is the total busy duration per day by user:
| USER | DATE | DURATION |
|--------|------------|----------|
| jsmith | 2020-08-24 | 95 |
| bjones | 2020-08-24 | 45 |
Any help with this would be greatly appreciated.
You can unpivot into minutes first, and then keep only the minutes that fall inside at least one activity using EXISTS. (I didn't account for the day component of the interval, since this case doesn't need it; add EXTRACT( day FROM max_end_time - min_start_time ) * 1440 to the minute count if needed for other cases.)
WITH t AS
(
SELECT "user" , MIN(start_time) AS min_start_time, MAX(end_time) AS max_end_time
FROM user_activities
GROUP BY "user"
), t2 AS
(
SELECT "user", min_start_time + NUMTODSINTERVAL(level, 'minute') AS minutes
FROM t
CONNECT BY level <= EXTRACT( hour FROM max_end_time - min_start_time )*60 +
EXTRACT( minute FROM max_end_time - min_start_time )
AND PRIOR SYS_GUID() IS NOT NULL
AND PRIOR "user" = "user"
)
SELECT "user", COUNT(*) AS "Duration"
FROM t2
WHERE EXISTS ( SELECT *
FROM user_activities
WHERE minutes BETWEEN start_time and end_time
AND "user" = t2."user" )
GROUP BY "user"
Demo
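The minute-unpivot idea can be checked with a small Python model (names are illustrative, not from the query): expand each activity into the set of minutes it covers, union the sets, and count.

```python
from datetime import datetime, timedelta

def busy_minutes(activities):
    """Count distinct minutes covered by at least one (start, end)
    activity, mirroring the generated-minutes + EXISTS filter."""
    covered = set()
    for start, end in activities:
        t = start + timedelta(minutes=1)  # like CONNECT BY level starting at 1
        while t <= end:
            covered.add(t)
            t += timedelta(minutes=1)
    return len(covered)

jsmith = [(datetime(2020, 8, 24, 8, 0),  datetime(2020, 8, 24, 9, 30)),
          (datetime(2020, 8, 24, 8, 15), datetime(2020, 8, 24, 8, 45)),
          (datetime(2020, 8, 24, 9, 45), datetime(2020, 8, 24, 9, 50))]
bjones = [(datetime(2020, 8, 24, 9, 0),  datetime(2020, 8, 24, 9, 10)),
          (datetime(2020, 8, 24, 9, 5),  datetime(2020, 8, 24, 9, 15)),
          (datetime(2020, 8, 24, 9, 15), datetime(2020, 8, 24, 9, 45))]
```

With the sample data this yields 95 for jsmith and 45 for bjones, matching the expected output.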
I would address this with gaps-and-islands techniques rather than recursion:
select usr, sum(duration) * 24 * 60 duration
from (
select usr, max(end_time) - min(start_time) duration
from (
select
ua.*,
sum(case when start_time <= lag_end_time then 0 else 1 end) over(partition by usr order by start_time) grp
from (
select
ua.*,
lag(end_time) over(partition by usr order by start_time) lag_end_time
from user_activities ua
) ua
) ua
group by usr, grp
) ua
group by usr
The idea is to build groups of records having the same user and overlapping periods, using a window sum. You can then take the difference between the end and start of each "island", and finally aggregate per user.
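The same gaps-and-islands grouping can be illustrated with a small Python model (names are mine): merge overlapping intervals into islands, then sum max(end) - min(start) per island.

```python
from datetime import datetime

def island_minutes(intervals):
    """Merge overlapping (start, end) intervals into islands,
    then sum each island's length in minutes."""
    islands = []
    for start, end in sorted(intervals):
        if islands and start <= islands[-1][1]:        # overlaps the open island
            islands[-1][1] = max(islands[-1][1], end)  # extend it
        else:
            islands.append([start, end])               # start a new island
    return sum(int((e - s).total_seconds()) // 60 for s, e in islands)

jsmith = [(datetime(2020, 8, 24, 8, 0),  datetime(2020, 8, 24, 9, 30)),
          (datetime(2020, 8, 24, 8, 15), datetime(2020, 8, 24, 8, 45)),
          (datetime(2020, 8, 24, 9, 45), datetime(2020, 8, 24, 9, 50))]
bjones = [(datetime(2020, 8, 24, 9, 0),  datetime(2020, 8, 24, 9, 10)),
          (datetime(2020, 8, 24, 9, 5),  datetime(2020, 8, 24, 9, 15)),
          (datetime(2020, 8, 24, 9, 15), datetime(2020, 8, 24, 9, 45))]
```

The `start <= previous end` test plays the role of the window sum that increments the group number on each gap.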
The code below requires at least Oracle 12c:
WITH user_activities( "user", activity, start_time, end_time ) AS
(
SELECT 'jsmith', 'Front Desk', timestamp'2020-08-24 08:00:00' , timestamp'2020-08-24 09:30:00' FROM dual UNION ALL
SELECT 'jsmith', 'Phones' , timestamp'2020-08-24 08:15:00' , timestamp'2020-08-24 08:45:00' FROM dual UNION ALL
SELECT 'jsmith', 'Phones' , timestamp'2020-08-24 09:45:00' , timestamp'2020-08-24 09:50:00' FROM dual UNION ALL
SELECT 'bjones', 'Phones' , timestamp'2020-08-24 09:00:00' , timestamp'2020-08-24 09:10:00' FROM dual UNION ALL
SELECT 'bjones', 'Front Desk', timestamp'2020-08-24 09:05:00' , timestamp'2020-08-24 09:15:00' FROM dual UNION ALL
SELECT 'bjones', 'Phones' , timestamp'2020-08-24 09:15:00' , timestamp'2020-08-24 09:45:00' FROM dual
)
select "user", sum(durations) as durations
from
(
select "user", extract(hour from (end_time - start_time)) * 60 + extract(minute from (end_time - start_time)) as durations
from user_activities
match_recognize
(
partition by "user"
order by start_time, end_time
measures first(start_time) start_time, max(end_time) as end_time
pattern (a* b)
define a as max(end_time) >= next(start_time)
)
)
group by "user";
This should solve your problem if you are interested in match_recognize.
Output:

"user" | DURATIONS
:----- | :--------
jsmith | 95
bjones | 45
Many possible solutions. Here is another one: using a CTE, first calculate the clean end time with the LEAD function (if the following start time is earlier than the end time, take the following start time instead), then sum and group by user:
WITH sampledata (username,activity,start_time,end_time)
AS
(
SELECT 'jsmith', 'Front Desk' ,'2020-08-24 8:00','2020-08-24 9:30' FROM DUAL UNION ALL
SELECT 'jsmith', 'Phones' ,'2020-08-24 8:15','2020-08-24 8:45' FROM DUAL UNION ALL
SELECT 'jsmith', 'Phones' ,'2020-08-24 9:45','2020-08-24 9:50' FROM DUAL UNION ALL
SELECT 'bjones', 'Phones' ,'2020-08-24 9:00','2020-08-24 9:10' FROM DUAL UNION ALL
SELECT 'bjones', 'Front Desk' ,'2020-08-24 9:05','2020-08-24 9:15' FROM DUAL UNION ALL
SELECT 'bjones', 'Phones' ,'2020-08-24 9:15','2020-08-24 9:45' FROM DUAL
), clean_sampledata (username,activity,start_time,end_time)
AS
(
SELECT
username,
activity,
TO_DATE(start_time,'YYYY-MM-DD HH24:MI'),
TO_DATE(end_time,'YYYY-MM-DD HH24:MI')
FROM sampledata
), clear_overlapped (username,activity,start_time,clean_end_time)
AS
(
SELECT
username,
activity,
start_time,
NVL(LEAST(LEAD(start_time) OVER (PARTITION BY username ORDER BY start_time),end_time),end_time)
FROM clean_sampledata
), cleaned_minutes_per_username (username,mins)
AS
(
SELECT
username,
ROUND((clean_end_time - start_time) * 1440)
FROM clear_overlapped
)
SELECT
username,
SUM(mins)
FROM cleaned_minutes_per_username
GROUP BY username ;
bjones 45
jsmith 50
I'm trying to find open shifts where:
First shift starts at 6 AM
Last Shift ends at 12 AM
ie:
Given the following data/day:
start_time | end_time
-----------|---------
9 AM | 3 PM
5 PM | 10 PM
Expected results:
start_time | end_time
-----------|---------
6 AM | 9 AM
3 PM | 5 PM
10 PM | 12 AM
Here's what I tried, but it's not working (I know it's way off from the correct answer):
SELECT *
FROM WORKERS_SCHEDULE
WHERE START_TIME not BETWEEN
ANY (SELECT START_TIME FROM WORKERS_SCHEDULE)
AND (SELECT START_TIME FROM WORKERS_SCHEDULE)
start_time and end_time are of datatype TIME.
Here is one way to do it with union all and window functions:
select *
from (
select '06:00:00'::time start_time, min(start_time) end_time from mytable
union all
select end_time, lead(start_time) over(order by start_time) from mytable
union all
select max(end_time), '23:59:59'::time from mytable
) t
where start_time <> end_time
It is a bit complicated to thoroughly explain how it works, but: the first unioned query computes the interval between 6 AM and the start of the first shift, the second subquery processes the declared shifts, and the last one handles the interval between the last shift and midnight. Then, the outer query filters out zero-length intervals. To understand how it works, you can run the subquery independently and see how the starts and ends adjust.
Demo on DB Fiddle:
start_time | end_time
:--------- | :-------
06:00:00 | 09:00:00
15:00:00 | 17:00:00
22:00:00 | 23:59:59
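The three-part union can be modeled in Python for illustration (names are mine): a leading gap from 6 AM, gaps between consecutive shifts, and a trailing gap to midnight, dropping zero-length gaps.

```python
from datetime import time

def open_shifts(shifts, day_start=time(6, 0), day_end=time(23, 59, 59)):
    """Complement of the sorted (start, end) shifts within the working day."""
    shifts = sorted(shifts)
    gaps = [(day_start, shifts[0][0])]              # 6 AM -> first shift
    for (_, end1), (start2, _) in zip(shifts, shifts[1:]):
        gaps.append((end1, start2))                 # between consecutive shifts
    gaps.append((shifts[-1][1], day_end))           # last shift -> midnight
    return [(s, e) for s, e in gaps if s != e]      # drop zero-length gaps

gaps = open_shifts([(time(9, 0), time(15, 0)), (time(17, 0), time(22, 0))])
```

Running this on the sample shifts reproduces the three gaps in the output above.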
Try this and see if it works for you:
SELECT CASE
         WHEN s.start_time = (SELECT MIN(start_time) FROM workers_schedule)
          AND s.start_time > '6:00 AM'
           THEN '6:00 AM - ' || s.start_time
         ELSE (SELECT MAX(end_time)
                 FROM workers_schedule
                WHERE end_time < s.start_time) || ' - ' || s.start_time
       END AS open_shift
FROM workers_schedule s
UNION
SELECT MAX(end_time) || ' - 12:00 AM'
FROM workers_schedule
This is another way:
WITH RECURSIVE open_shifts AS (
SELECT time '6:00' AS START_TIME, MIN(START_TIME) AS END_TIME
FROM WORKERS_SCHEDULE
WHERE START_TIME BETWEEN time '6:00' AND time '23:59'
UNION
SELECT start_gap.START_TIME AS START_TIME, end_gap.END_TIME AS END_TIME FROM WORKERS_SCHEDULE end_gap,
(SELECT ws.END_TIME AS START_TIME
FROM WORKERS_SCHEDULE ws, open_shifts prev_gap
WHERE ws.START_TIME = prev_gap.END_TIME) start_gap
WHERE end_gap.END_TIME > start_gap.START_TIME
AND END_TIME BETWEEN time '6:00' AND time '23:59'
)
SELECT * FROM open_shifts
UNION
SELECT MAX(END_TIME) AS START_TIME, time '23:59' AS END_TIME FROM WORKERS_SCHEDULE
WHERE END_TIME BETWEEN time '6:00' AND time '23:59';
This uses a recursive CTE to find the gaps between each shift's end time and the next closest shift's start time. It probably won't work with overlapping shifts, though.
I have a log table that contains (to be simple), user, operation, date.
There are two operations: search and view (search may return a hundred records; the user may view zero or more).
I need to have the basic output sorted by date, but I also need to have all of the views for one search together. Something like
name operation date
john search 1/1 1pm
john view 1/1 2pm
john view 1/1 3pm
james search 1/1 230pm
james view 1/1 315pm
john search 1/1 310pm
It seems I need to use the results of a subquery to perform the query, but I'm not sure how that would look. I'm OK with SQL but I kind of hit the ceiling with JOINs and UNIONs. :-/
You can identify the groups by using a window function. And you can include the window function in the order by, so no subqueries are needed.
select *
from log_table l
order by max(case when l.operation = 'search' then l.log_date end) over (partition by l.name order by l.log_date),
l.name,
l.log_date;
Here is a db<>fiddle.
You can use a conditional lag() call to find the most recent search date/time for each view row, per user; with search rows getting their own date/time:
-- CTE for sample data
with log_table (name, operation, log_date) as (
select 'john', 'search', timestamp '2019-01-01 13:00:00' from dual
union all select 'john', 'view', timestamp '2019-01-01 14:00:00' from dual
union all select 'john', 'view', timestamp '2019-01-01 15:00:00' from dual
union all select 'james', 'search', timestamp '2019-01-01 14:30:00' from dual
union all select 'james', 'view', timestamp '2019-01-01 15:15:00' from dual
union all select 'john', 'search', timestamp '2019-01-01 15:10:00' from dual
)
-- actual query
select name, operation, log_date,
case when operation = 'search' then log_date
else lag(case when operation = 'search' then log_date end ignore nulls)
over (partition by name order by log_date)
end as search_date
from log_table
order by log_date;
NAME OPERATION LOG_DATE SEARCH_DATE
----- --------- ------------------- -------------------
john search 2019-01-01 13:00:00 2019-01-01 13:00:00
john view 2019-01-01 14:00:00 2019-01-01 13:00:00
james search 2019-01-01 14:30:00 2019-01-01 14:30:00
john view 2019-01-01 15:00:00 2019-01-01 13:00:00
john search 2019-01-01 15:10:00 2019-01-01 15:10:00
james view 2019-01-01 15:15:00 2019-01-01 14:30:00
You can then use that as a CTE or inline view, and use the generated search_date to order first, then order the records with the same search date by their actual log date:
-- CTE for sample data
with log_table (name, operation, log_date) as (
select 'john', 'search', timestamp '2019-01-01 13:00:00' from dual
union all select 'john', 'view', timestamp '2019-01-01 14:00:00' from dual
union all select 'john', 'view', timestamp '2019-01-01 15:00:00' from dual
union all select 'james', 'search', timestamp '2019-01-01 14:30:00' from dual
union all select 'james', 'view', timestamp '2019-01-01 15:15:00' from dual
union all select 'john', 'search', timestamp '2019-01-01 15:10:00' from dual
)
-- actual query
select name, operation, log_date
from (
select name, operation, log_date,
case when operation = 'search' then log_date
else lag(case when operation = 'search' then log_date end ignore nulls)
over (partition by name order by log_date)
end as search_date
from log_table
)
order by search_date, log_date;
NAME OPERATION LOG_DATE
----- --------- -------------------
john search 2019-01-01 13:00:00
john view 2019-01-01 14:00:00
john view 2019-01-01 15:00:00
james search 2019-01-01 14:30:00
james view 2019-01-01 15:15:00
john search 2019-01-01 15:10:00
As you could potentially get simultaneous searches from two users, you might want to include the user in the final order-by clause too:
...
order by search_date, name, log_date;
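The carry-forward logic is essentially "remember each user's latest search time and sort by it". Here is an illustrative Python model of the same ordering (names are mine; it assumes, like the answer, that every user's first row is a search):

```python
from datetime import datetime

def order_by_search(rows):
    """Tag each (name, op, ts) row with the latest 'search' time seen so
    far for that user (the conditional LAG ... IGNORE NULLS), then sort
    by (search time, log time)."""
    last_search = {}
    tagged = []
    for name, op, ts in sorted(rows, key=lambda r: r[2]):
        if op == 'search':
            last_search[name] = ts
        tagged.append((last_search[name], ts, name, op))
    return [(name, op, ts) for _, ts, name, op in sorted(tagged)]

rows = [('john',  'search', datetime(2019, 1, 1, 13, 0)),
        ('john',  'view',   datetime(2019, 1, 1, 14, 0)),
        ('john',  'view',   datetime(2019, 1, 1, 15, 0)),
        ('james', 'search', datetime(2019, 1, 1, 14, 30)),
        ('james', 'view',   datetime(2019, 1, 1, 15, 15)),
        ('john',  'search', datetime(2019, 1, 1, 15, 10))]
```

Sorting the tagged rows reproduces the grouped ordering shown in the answer's final result set.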
I have a table EVENTS
USER EVENT_TS EVENT_TYPE
abc 2016-01-01 08:00:00 Login
abc 2016-01-01 08:25:00 Stuff
abc 2016-01-01 10:00:00 Stuff
abc 2016-01-01 14:00:00 Login
xyz 2015-12-31 18:00:00 Login
xyz 2016-01-01 08:00:00 Logout
What I need to do is produce a session field for each period of activity for each user. In addition, if the user has been idle for a period equal to or longer than p_timeout (1 hour in this case), then a new session starts at the next activity. Users don't always log out cleanly, so the logout isn't always there...
Notes:
Logout always terminates a session
There doesn't have to be a logout or a login (because software)
Login is always a new session
Output like
USER EVENT_TS EVENT_TYPE SESSION
abc 2016-01-01 08:00:00 Login 1
abc 2016-01-01 08:25:00 Stuff 1
abc 2016-01-01 10:00:00 Stuff 2
abc 2016-01-01 14:00:00 Login 3
xyz 2015-12-31 18:00:00 Login 1
xyz 2016-01-01 08:00:00 Logout 1
Any thoughts on how to achieve this?
I think this may do what you need. I changed "user" to "usr" in the input, and "session" to "sess" in the output - I don't ever use reserved Oracle words for object names.
Note: as Boneist pointed out below, my solution will assign a session number of 0 to the first session, if it is a Logout event (or a succession of Logouts right at the top). If this situation can occur in the data, and if the desired behavior is to start session counts at 1 even in that case, then the definition of flag must be tweaked - for example, by making flag = 1 when lag(event_ts) over (partition by usr order by event_ts) is null as well.
Good luck!
with
events ( usr, event_ts, event_type ) as (
select 'abc', to_timestamp('2016-01-01 08:00:00', 'yyyy-mm-dd hh24:mi:ss'), 'Login' from dual union all
select 'abc', to_timestamp('2016-01-01 08:25:00', 'yyyy-mm-dd hh24:mi:ss'), 'Stuff' from dual union all
select 'abc', to_timestamp('2016-01-01 10:00:00', 'yyyy-mm-dd hh24:mi:ss'), 'Stuff' from dual union all
select 'abc', to_timestamp('2016-01-01 14:00:00', 'yyyy-mm-dd hh24:mi:ss'), 'Login' from dual union all
select 'xyz', to_timestamp('2015-12-31 18:00:00', 'yyyy-mm-dd hh24:mi:ss'), 'Login' from dual union all
select 'xyz', to_timestamp('2016-01-01 08:00:00', 'yyyy-mm-dd hh24:mi:ss'), 'Logout' from dual
),
start_of_sess ( usr, event_ts, event_type, flag ) as (
select usr, event_ts, event_type,
case when event_type != 'Logout'
and
( event_ts >= lag(event_ts) over (partition by usr
order by event_ts) + 1/24
or event_type = 'Login'
or lag(event_type) over (partition by usr
order by event_ts) = 'Logout'
)
then 1 end
from events
)
select usr, event_ts, event_type,
count(flag) over (partition by usr order by event_ts) as sess
from start_of_sess
;
Output (timestamps use my current NLS_TIMESTAMP_FORMAT setting):
USR EVENT_TS EVENT_TYPE SESS
--- --------------------------------- ---------- ------
abc 01-JAN-2016 08.00.00.000000000 AM Login 1
abc 01-JAN-2016 08.25.00.000000000 AM Stuff 1
abc 01-JAN-2016 10.00.00.000000000 AM Stuff 2
abc 01-JAN-2016 02.00.00.000000000 PM Login 3
xyz 31-DEC-2015 06.00.00.000000000 PM Login 1
xyz 01-JAN-2016 08.00.00.000000000 AM Logout 1
6 rows selected
I think this will do the trick:
WITH EVENTS AS (SELECT 'abc' usr, to_date('2016-01-01 08:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'login' event_type FROM dual UNION ALL
SELECT 'abc' usr, to_date('2016-01-01 08:25:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'Stuff' event_type FROM dual UNION ALL
SELECT 'abc' usr, to_date('2016-01-01 10:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'Stuff' event_type FROM dual UNION ALL
SELECT 'abc' usr, to_date('2016-01-01 14:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'login' event_type FROM dual UNION ALL
SELECT 'xyz' usr, to_date('2015-12-31 18:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'login' event_type FROM dual UNION ALL
SELECT 'xyz' usr, to_date('2016-01-01 08:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'Logout' event_type FROM dual UNION ALL
SELECT 'def' usr, to_date('2016-01-01 08:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'Logout' event_type FROM dual UNION ALL
SELECT 'def' usr, to_date('2016-01-01 08:15:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'Logout' event_type FROM dual)
SELECT usr,
event_ts,
event_type,
SUM(counter) OVER (PARTITION BY usr ORDER BY event_ts) session_id
FROM (SELECT usr,
event_ts,
event_type,
CASE WHEN LAG(event_type, 1, 'Logout') OVER (PARTITION BY usr ORDER BY event_ts) = 'Logout' THEN 1
WHEN event_type = 'Logout' THEN 0
WHEN event_ts - LAG(event_ts) OVER (PARTITION BY usr ORDER BY event_ts) > 1/24 THEN 1
WHEN event_type = 'login' THEN 1
ELSE 0
END counter
FROM EVENTS);
USR EVENT_TS EVENT_TYPE SESSION_ID
--- ------------------- ---------- ----------
abc 2016-01-01 08:00:00 login 1
abc 2016-01-01 08:25:00 Stuff 1
abc 2016-01-01 10:00:00 Stuff 2
abc 2016-01-01 14:00:00 login 3
def 2016-01-01 08:00:00 Logout 1
def 2016-01-01 08:15:00 Logout 2
xyz 2015-12-31 18:00:00 login 1
xyz 2016-01-01 08:00:00 Logout 1
This solution relies on the logic short-circuiting that takes place in the CASE expression and the fact that event_type is not null. It also assumes that multiple logouts in a row are counted as separate sessions:
If the previous row was a logout row (and if there is no previous row - i.e. for the first row in the set - treat it as if a logout row was present), we want to increase the counter by one. (Logouts terminate the session, so we always have a new session following a logout.)
If the current row is a logout, then this terminates the existing session. Therefore, the counter shouldn't be increased.
If the time of the current row is greater than an hour from the previous row, increase the counter by one.
If the current row is a login row, then it's a new session, so increase the counter by one.
For any other case, we don't increase the counter.
Once we've done that, it's just a matter of doing a running total on the counter.
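The same rules can be modeled in Python for illustration (names are mine; rule order matters, exactly as in the CASE expression):

```python
from collections import defaultdict
from datetime import datetime, timedelta

def session_ids(events, timeout=timedelta(hours=1)):
    """events: (usr, ts, event_type) rows. Returns rows with a running
    session id per user, applying the CASE rules in order."""
    counters = defaultdict(int)
    prev = {}                      # usr -> (ts, event_type) of previous row
    out = []
    for usr, ts, etype in sorted(events):
        p = prev.get(usr)
        if p is None or p[1] == 'Logout':
            inc = 1                # rule 1: previous row was a logout (or none)
        elif etype == 'Logout':
            inc = 0                # rule 2: a logout closes the current session
        elif ts - p[0] > timeout:
            inc = 1                # rule 3: idle longer than the timeout
        elif etype == 'login':
            inc = 1                # rule 4: an explicit login starts a session
        else:
            inc = 0                # rule 5: anything else continues the session
        counters[usr] += inc
        prev[usr] = (ts, etype)
        out.append((usr, ts, etype, counters[usr]))
    return out

events = [('abc', datetime(2016, 1, 1, 8, 0),   'login'),
          ('abc', datetime(2016, 1, 1, 8, 25),  'Stuff'),
          ('abc', datetime(2016, 1, 1, 10, 0),  'Stuff'),
          ('abc', datetime(2016, 1, 1, 14, 0),  'login'),
          ('def', datetime(2016, 1, 1, 8, 0),   'Logout'),
          ('def', datetime(2016, 1, 1, 8, 15),  'Logout'),
          ('xyz', datetime(2015, 12, 31, 18, 0), 'login'),
          ('xyz', datetime(2016, 1, 1, 8, 0),   'Logout')]
```

On the sample data this reproduces the session ids in the output above, including the two back-to-back def logouts getting sessions 1 and 2.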
For completeness (for users with Oracle 12 or above), here is a solution using MATCH_RECOGNIZE:
select usr, event_ts, event_type, sess
from events
match_recognize(
partition by usr
order by event_ts
measures match_number() as sess
all rows per match
pattern (strt follow*)
define follow as event_type = 'Logout'
or ( event_type != 'Login'
and prev(event_type) != 'Logout'
and event_ts < prev(event_ts) + 1/24
)
)
;
Here I cover an unusual case: a Logout event following another Logout event. In such cases, I assume all consecutive Logouts, no matter how many and how far apart in time, belong to the same session. (If such cases are guaranteed not to occur in the data, so much the better.)
Please see also the Note I added to my other answer (for Oracle 11 and below) regarding the possibility of the very first event for a usr being a Logout (if that is even possible in the input data).
So data is something like this:
ID | START_DATE | END_DATE | UID | CANCELED
-------------------------------------------------
44 | 2015-10-20 22:30 | 2015-10-20 23:10 | 'one' |
52 | 2015-10-20 23:00 | 2015-10-20 23:30 | 'one' |
66 | 2015-10-21 13:00 | 2015-10-20 13:30 | 'two' |
There are more than 100k of these entries.
We can see that start_date of the second entry overlaps with the end_date of the first entry. When dates do overlap, entries with lower id should be marked as true in 'CANCELED' column.
I tried some queries, but they take a really long time, so I'm not even sure if they work. Also, I want to cover all overlapping cases, which also seems to slow things down.
I am the one responsible for inserting/updating these entries using pl/sql
update table set column = 'value' where ID = '44';
if sql%rowcount = 0
then insert values(...)
end if
so I could maybe do this in this step. But all tables are updated/inserted using one big dynamically created PL/SQL block where all rows either get updated or new ones get inserted, so once again this seems slow.
And of all the SQL 'dialects', the Oracle one is the most cryptic I have had the chance to work with. Ideas?
EDIT: I forgot one important detail: there is also one more column (UID) that has to match; I have updated the description above.
I would start with this query:
update table t
set cancelled = true
where exists (select 1
from table t2
where t.end_date > t2.start_date and
t.uid = t2.uid and
t.id < t2.id
)
An index on table(uid, start_date, id) might help.
As a note: this is probably much easier to do when you create the table, because you can use lag().
I think the following update should work:
update tbl
set cancelled = 'TRUE'
where t_id in (select t_id
from tbl t
where exists (select 1
from tbl x
where x.t_id > t.t_id
and x.start_date <= t.end_date));
Fiddle: http://sqlfiddle.com/#!4/06447/1/0
If the table is extremely large, you might be better off creating a new table using a CTAS (create table as select) query, where you can use the nologging option to avoid writing to the undo log. When you execute an update like you are doing now, the changes are written to Oracle's undo log so that, prior to committing the transaction, you have the option to roll back. This adds overhead, so a CTAS query with nologging might run faster. Here is one way to take that approach:
create table new_table nologging as
with sub as
(select t_id,
start_date,
end_date,
'TRUE' as cancelled
from tbl t
where exists (select 1
from tbl x
where x.t_id > t.t_id
and x.start_date <= t.end_date))
select *
from sub
union all
select t.*
from tbl t
left join sub s
on t.t_id = s.t_id
where s.t_id is null;
Fiddle: http://sqlfiddle.com/#!4/c6a29/1
This will do the trick without a dynamic query or correlated subqueries, but it consumes some memory for the WITH clauses:
MERGE INTO Table1
USING
(
with q0 as(
select rownum fid, id, start_date from(
select id, start_date from table1
union all
select 999999 id, null start_date from dual
order by id
)
), q1 as (
select rownum fid, id, end_date from(
select -1 id, null end_date from dual
union all
select id, end_date from table1
order by id
)
)
select q0.fid, q1.id, q0.start_date, q1.END_DATE, case when (q0.start_date < q1.END_DATE) then 1 else 0 end canceled
from q0
join q1
on (q0.fid = q1.fid)
) ta ON (ta.id = Table1.id)
WHEN MATCHED THEN UPDATE
SET Table1.canceled = ta.canceled;
The inner SELECT with alias ta will produce this result:
"FID"|"ID"|"START_DATE" |"END_DATE" |"CANCELED"
---------------------------------------------------------
1 |-1 |20/10/15 22:30:00| |0
2 |44 |20/10/15 23:00:00|20/10/15 23:10:00|1
3 |52 |21/10/15 13:00:00|20/10/15 23:30:00|0
4 |66 | |20/10/15 13:30:00|0
Then it's used in the MERGE without any correlated queries. Tested and working in SQL Developer.
You can use BULK COLLECT INTO and FORALL to reduce context switching within a procedure:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE test ( ID, START_DATE, END_DATE, CANCELED ) AS
SELECT 44, TO_DATE( '2015-10-20 22:30', 'YYYY-MM-DD HH24:MI' ), TO_DATE( '2015-10-20 23:10', 'YYYY-MM-DD HH24:MI' ), 'N' FROM DUAL
UNION ALL SELECT 52, TO_DATE( '2015-10-20 23:00', 'YYYY-MM-DD HH24:MI' ), TO_DATE( '2015-10-20 23:30', 'YYYY-MM-DD HH24:MI' ), 'N' FROM DUAL
UNION ALL SELECT 66, TO_DATE( '2015-10-21 13:00', 'YYYY-MM-DD HH24:MI' ), TO_DATE( '2015-10-21 12:30', 'YYYY-MM-DD HH24:MI' ), 'N' FROM DUAL
/
CREATE PROCEDURE updateCancelled
AS
TYPE ids_t IS TABLE OF test.id%TYPE INDEX BY PLS_INTEGER;
t_ids ids_t;
BEGIN
SELECT ID
BULK COLLECT INTO t_ids
FROM (
SELECT ID,
END_DATE,
LEAD( START_DATE ) OVER ( ORDER BY START_DATE ) AS NEXT_START_DATE
FROM TEST )
WHERE END_DATE > NEXT_START_DATE;
FORALL i IN 1 .. t_ids.COUNT
UPDATE TEST
SET CANCELED = 'Y'
WHERE ID = t_ids(i);
END;
/
BEGIN
updateCancelled();
END;
/
Query 1:
SELECT * FROM TEST
Results:
| ID | START_DATE | END_DATE | CANCELED |
|----|---------------------------|---------------------------|----------|
| 44 | October, 20 2015 22:30:00 | October, 20 2015 23:10:00 | Y |
| 52 | October, 20 2015 23:00:00 | October, 20 2015 23:30:00 | N |
| 66 | October, 21 2015 13:00:00 | October, 21 2015 12:30:00 | N |
Or as a single SQL statement:
UPDATE TEST
SET CANCELED = 'Y'
WHERE ID IN ( SELECT ID
FROM ( SELECT ID,
END_DATE,
LEAD( START_DATE )
OVER ( ORDER BY START_DATE )
AS NEXT_START_DATE
FROM TEST )
WHERE END_DATE > NEXT_START_DATE )
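The LEAD comparison can be modeled in Python for illustration (names are mine): order the rows by start_date and mark a row when its end_date overlaps the next row's start_date.

```python
from datetime import datetime

def overlapping_ids(rows):
    """rows: (id, start, end). Returns the ids whose end_date is later
    than the next start_date in start_date order (the LEAD comparison)."""
    ordered = sorted(rows, key=lambda r: r[1])     # order by start_date
    return {r1[0]
            for r1, r2 in zip(ordered, ordered[1:])
            if r1[2] > r2[1]}                      # end overlaps next start

rows = [(44, datetime(2015, 10, 20, 22, 30), datetime(2015, 10, 20, 23, 10)),
        (52, datetime(2015, 10, 20, 23, 0),  datetime(2015, 10, 20, 23, 30)),
        (66, datetime(2015, 10, 21, 13, 0),  datetime(2015, 10, 21, 12, 30))]
```

On the sample data only id 44 is flagged, matching the query results above. Like the single-statement version, this does not partition by UID.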