To exemplify, suppose I have the following values in a table:
---------------------------------------
| ID_USER | START_DATE |
---------------------------------------
| 1 | 01/01/2018 08:00:00 |
| 1 | 01/01/2018 08:15:00 |
| 2 | 01/01/2018 08:30:00 |
| 1 | 01/01/2018 08:45:00 |
| 1 | 01/01/2018 09:00:00 |
| 2 | 01/01/2018 09:15:00 |
| 2 | 01/01/2018 09:30:00 |
| 1 | 01/01/2018 09:45:00 |
---------------------------------------
Now I would like to group by ID_USER, selecting the minimum START_DATE value for each run of consecutive rows with the same ID_USER. The result would be:
---------------------------------------
| ID_USER | START_DATE |
---------------------------------------
| 1 | 01/01/2018 08:00:00 |
| 2 | 01/01/2018 08:30:00 |
| 1 | 01/01/2018 08:45:00 |
| 2 | 01/01/2018 09:15:00 |
| 1 | 01/01/2018 09:45:00 |
---------------------------------------
Do you have any idea how I can write this query?
Compare current and previous row using LAG:
with cte as
(
select ID_USER, START_DATE,
lag(ID_USER, 1, -1) over (order by START_DATE) as prev_user
from myTable
)
select *
from cte
where ID_USER <> prev_user
with s (id_user, start_date) as (
select 1, to_date('01/01/2018 08:00:00', 'dd.mm.yyyy hh24:mi:ss') from dual union all
select 1, to_date('01/01/2018 08:15:00', 'dd.mm.yyyy hh24:mi:ss') from dual union all
select 2, to_date('01/01/2018 08:30:00', 'dd.mm.yyyy hh24:mi:ss') from dual union all
select 1, to_date('01/01/2018 08:45:00', 'dd.mm.yyyy hh24:mi:ss') from dual union all
select 1, to_date('01/01/2018 09:00:00', 'dd.mm.yyyy hh24:mi:ss') from dual union all
select 2, to_date('01/01/2018 09:15:00', 'dd.mm.yyyy hh24:mi:ss') from dual union all
select 2, to_date('01/01/2018 09:30:00', 'dd.mm.yyyy hh24:mi:ss') from dual union all
select 1, to_date('01/01/2018 09:45:00', 'dd.mm.yyyy hh24:mi:ss') from dual)
select id_user, start_date
from
(select s.*, lag(id_user) over (order by start_date) prev_user
from s
)
where lnnvl(prev_user = id_user);
ID_USER START_DATE
---------- -------------------
1 2018-01-01 08:00:00
2 2018-01-01 08:30:00
1 2018-01-01 08:45:00
2 2018-01-01 09:15:00
1 2018-01-01 09:45:00
-- Oracle 12c+: Pattern matching
with s (id_user, start_date) as (
select 1, to_date('01/01/2018 08:00:00', 'dd.mm.yyyy hh24:mi:ss') from dual union all
select 1, to_date('01/01/2018 08:15:00', 'dd.mm.yyyy hh24:mi:ss') from dual union all
select 2, to_date('01/01/2018 08:30:00', 'dd.mm.yyyy hh24:mi:ss') from dual union all
select 1, to_date('01/01/2018 08:45:00', 'dd.mm.yyyy hh24:mi:ss') from dual union all
select 1, to_date('01/01/2018 09:00:00', 'dd.mm.yyyy hh24:mi:ss') from dual union all
select 2, to_date('01/01/2018 09:15:00', 'dd.mm.yyyy hh24:mi:ss') from dual union all
select 2, to_date('01/01/2018 09:30:00', 'dd.mm.yyyy hh24:mi:ss') from dual union all
select 1, to_date('01/01/2018 09:45:00', 'dd.mm.yyyy hh24:mi:ss') from dual)
select id_user, start_date
from s
match_recognize(
order by start_date
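measures
-- first() takes the values from the first row of each match;
-- an unqualified column would be evaluated at the last row of the match
first(id_user) as id_user,
first(start_date) as start_date
pattern (v+)
define v as id_user = first(id_user)
);
ID_USER START_DATE
---------- -------------------
1 2018-01-01 08:00:00
2 2018-01-01 08:30:00
1 2018-01-01 08:45:00
2 2018-01-01 09:15:00
1 2018-01-01 09:45:00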
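Another option that does not need 12c is the difference-of-row-numbers technique for "gaps and islands" problems. This is only a sketch, assuming the table is called myTable as in the first answer: the overall row number minus the per-user row number stays constant within a run of consecutive rows for the same user, so it can serve as a grouping key.

```sql
select id_user,
       min(start_date) as start_date
from (
  select id_user,
         start_date,
         -- constant within each run of consecutive rows of one user
         row_number() over (order by start_date)
       - row_number() over (partition by id_user order by start_date) as grp
  from myTable
)
group by id_user, grp
order by min(start_date);
```

The grp value has no meaning on its own and is only unique per user, which is why the outer query groups by id_user and grp together.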
Related
I have a table similar to
Start time | End Time | User |
09/02/2021 03:01:13 | 09/02/2021 03:45:15 | ABC |
09/02/2021 03:15:20 | 09/02/2021 05:03:20 | XYZ |
09/02/2021 06:03:12 | 09/02/2021 06:15:30 | DEF |
Expecting output:
StDt | EndDt | Count(1)
09/02/2021 00:00:00 | 09/02/2021 01:00:00 | 0
09/02/2021 01:00:00 | 09/02/2021 02:00:00 | 0
09/02/2021 02:00:00 | 09/02/2021 03:00:00 | 0
09/02/2021 03:00:00 | 09/02/2021 04:00:00 | 2
09/02/2021 04:00:00 | 09/02/2021 05:00:00 | 1
09/02/2021 05:00:00 | 09/02/2021 06:00:00 | 0
09/02/2021 06:00:00 | 09/02/2021 07:00:00 | 1
The interval in this example is hourly, but I would like to keep it flexible for 10, 15 or 30 minutes.
I want this written as a single SQL statement.
All I could work out so far is how to generate the range.
select t1.StartDt, t1.EndDt from
(
select
(to_char(timestamp '2021-02-09 00:00:00' + numtodsinterval(rownum*60,'MINUTE') - numtodsinterval(60,'MINUTE'),'DD-MM-YYYY hh24:mi')) as StartDt,
(to_char(timestamp '2021-02-09 00:00:00' + numtodsinterval(rownum*60,'MINUTE'),'DD-MM-YYYY hh24:mi')) as EndDt
from dual connect by level <= 24
) t1;
I don't know how to link this to the table mentioned above to get the data in the format I require.
You have a good start. Keep the time values as timestamps inside the subquery and move the TO_CHAR formatting to the main query, where the result is displayed. Then use a correlated subquery with a COUNT(DISTINCT ...) aggregate to count the overlapping intervals, and use a bind variable as the placeholder for the interval length in minutes (60, 30, 15), such as:
SQL> var min number
SQL> exec :min := 60
PL/SQL procedure successfully completed
min
---------
60
SQL> SELECT TO_CHAR(t.StartDt,'DD-MM-YYYY HH24:MI') AS StartDt,
            TO_CHAR(t.EndDt,'DD-MM-YYYY HH24:MI') AS EndDt,
            ( SELECT COUNT(DISTINCT "User")
                FROM tab
               WHERE t.EndDt >= Start_Time
                 AND t.StartDt <= End_Time ) AS Count
       FROM
       (
        SELECT timestamp '2021-02-09 00:00:00' +
               numtodsinterval(rownum * :min, 'MINUTE') -
               numtodsinterval(:min, 'MINUTE') AS StartDt,
               timestamp '2021-02-09 00:00:00' +
               numtodsinterval(rownum * :min, 'MINUTE') AS EndDt
          FROM dual
       CONNECT BY level <= 24
       ) t
      ORDER BY StartDt;
STARTDT ENDDT COUNT
---------------- ---------------- ----------
09-02-2021 00:00 09-02-2021 01:00 0
09-02-2021 01:00 09-02-2021 02:00 0
09-02-2021 02:00 09-02-2021 03:00 0
09-02-2021 03:00 09-02-2021 04:00 2
09-02-2021 04:00 09-02-2021 05:00 1
09-02-2021 05:00 09-02-2021 06:00 1
09-02-2021 06:00 09-02-2021 07:00 1
09-02-2021 07:00 09-02-2021 08:00 0
.....
.....
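One thing to watch when :min is changed: CONNECT BY level <= 24 always generates 24 rows, so with :min set to 15 only the first six hours of the day would be covered. The sketch below derives the number of rows from :min instead. It assumes that :min divides 1440 evenly and reuses the tab, Start_Time, End_Time and "User" names from the query above; the Cnt alias is my own choice.

```sql
SELECT TO_CHAR(t.StartDt, 'DD-MM-YYYY HH24:MI') AS StartDt,
       TO_CHAR(t.EndDt,   'DD-MM-YYYY HH24:MI') AS EndDt,
       ( SELECT COUNT(DISTINCT "User")
           FROM tab
          WHERE t.EndDt   >= Start_Time
            AND t.StartDt <= End_Time ) AS Cnt
  FROM (
        -- one row per interval of :min minutes, covering a full day
        SELECT TIMESTAMP '2021-02-09 00:00:00'
                 + NUMTODSINTERVAL((LEVEL - 1) * :min, 'MINUTE') AS StartDt,
               TIMESTAMP '2021-02-09 00:00:00'
                 + NUMTODSINTERVAL(LEVEL * :min, 'MINUTE')       AS EndDt
          FROM dual
       CONNECT BY LEVEL <= 1440 / :min
       ) t
 ORDER BY t.StartDt;
```

Both comparisons are inclusive, as in the original query, so a row that starts or ends exactly on an interval boundary is counted in both neighbouring intervals. Making one of the two comparisons strict would avoid that if it matters for your data.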
For example, I have the following table.
WITH
-- your input ....
input(t,grp,value) AS (
SELECT TIMESTAMP '2020-05-28 00:00:00','A',55
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','B',1.09
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','C',1.8
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','A',68
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','B',1.9
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','C',1.19
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','A',10
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','C',0.88
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','A',22
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','C',13
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','A',66
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','B',88
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','C',99
)
As you can see, the dates 2020-30-05 and 2020-31-05 are missing from this table, so they need to be filled with the 2020-29-05 values for each group. Additionally, today's date is later than the last date in the data (06-08 vs 06-03), so the most recent days of the current month are missing as well. The final output should look like this:
date2 Group number
2020-28-05 00:00:00 A 55
2020-28-05 00:00:00 B 1.09
2020-28-05 00:00:00 C 1.8
2020-29-05 00:00:00 A 68
2020-29-05 00:00:00 B 1.9
2020-29-05 00:00:00 C 1.19
2020-30-05 00:00:00 A 68
2020-30-05 00:00:00 B 1.9
2020-30-05 00:00:00 C 1.19
2020-31-05 00:00:00 A 68
2020-31-05 00:00:00 B 1.9
2020-31-05 00:00:00 C 1.19
2020-01-06 00:00:00 A 10
2020-01-06 00:00:00 B 15
2020-01-06 00:00:00 C 0.88
2020-02-06 00:00:00 A 22
2020-02-06 00:00:00 B 15
2020-02-06 00:00:00 C 13
2020-03-06 00:00:00 A 66
2020-03-06 00:00:00 B 88
2020-03-06 00:00:00 C 99
And the same values for every date from 03-06 to 08-06, ending with:
2020-08-06 00:00:00 A 66
2020-08-06 00:00:00 B 88
2020-08-06 00:00:00 C 99
The following code fills the missing dates inside the data, but the gaps are not filled up to today's date. How can I fix it?
SELECT ts AS t, grp, TS_FIRST_VALUE(value,'const') AS value
FROM input
TIMESERIES ts AS '1 DAY' OVER(PARTITION BY grp ORDER BY t)
ORDER BY 1,2
It's called INTERPOLATE and not EXTRAPOLATE, and that's the challenge.
You'll need to add the last row per group, but with today's date instead of the actual/original date, to the input table.
Note the padding and padded common table expressions I'm using below. Vertica has the analytic limit clause that I'm using here: LIMIT 1 OVER(PARTITION BY grp ORDER BY tmstmp DESC).
WITH
input(tmstmp,grp,nbr) AS (
SELECT TIMESTAMP '2020-05-28 00:00:00','A',55
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','B',1.09
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','C',1.8
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','A',68
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','B',1.9
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','C',1.19
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','A',10
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','C',0.88
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','A',22
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','C',13
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','A',66
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','B',88
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','C',99
)
,
padding AS (
SELECT
CURRENT_DATE::timestamp
, grp
, nbr
FROM input
LIMIT 1 OVER(PARTITION BY grp ORDER BY tmstmp DESC)
)
,
padded AS (
SELECT * FROM input
UNION ALL
SELECT * FROM padding
)
SELECT
ts AS tmstmp
, grp
, TS_FIRST_VALUE(nbr,'const') AS nbr
FROM padded
TIMESERIES ts AS '1 DAY' OVER(PARTITION BY grp ORDER BY tmstmp)
ORDER BY 1,2
;
-- out tmstmp | grp | nbr
-- out ---------------------+-----+-------
-- out 2020-05-28 00:00:00 | A | 55.00
-- out 2020-05-28 00:00:00 | B | 1.09
-- out 2020-05-28 00:00:00 | C | 1.80
-- out 2020-05-29 00:00:00 | A | 68.00
-- out 2020-05-29 00:00:00 | B | 1.90
-- out 2020-05-29 00:00:00 | C | 1.19
-- out 2020-05-30 00:00:00 | A | 68.00
-- out 2020-05-30 00:00:00 | B | 1.90
-- out 2020-05-30 00:00:00 | C | 1.19
-- out 2020-05-31 00:00:00 | A | 68.00
-- out 2020-05-31 00:00:00 | B | 1.90
-- out 2020-05-31 00:00:00 | C | 1.19
-- out 2020-06-01 00:00:00 | A | 10.00
-- out 2020-06-01 00:00:00 | B | 15.00
-- out 2020-06-01 00:00:00 | C | 0.88
-- out 2020-06-02 00:00:00 | A | 22.00
-- out 2020-06-02 00:00:00 | B | 15.00
-- out 2020-06-02 00:00:00 | C | 13.00
-- out 2020-06-03 00:00:00 | A | 66.00
-- out 2020-06-03 00:00:00 | B | 88.00
-- out 2020-06-03 00:00:00 | C | 99.00
-- out 2020-06-04 00:00:00 | A | 66.00
-- out 2020-06-04 00:00:00 | B | 88.00
-- out 2020-06-04 00:00:00 | C | 99.00
-- out 2020-06-05 00:00:00 | A | 66.00
-- out 2020-06-05 00:00:00 | B | 88.00
-- out 2020-06-05 00:00:00 | C | 99.00
-- out 2020-06-06 00:00:00 | A | 66.00
-- out 2020-06-06 00:00:00 | B | 88.00
-- out 2020-06-06 00:00:00 | C | 99.00
-- out 2020-06-07 00:00:00 | A | 66.00
-- out 2020-06-07 00:00:00 | B | 88.00
-- out 2020-06-07 00:00:00 | C | 99.00
-- out 2020-06-08 00:00:00 | A | 66.00
-- out 2020-06-08 00:00:00 | B | 88.00
-- out 2020-06-08 00:00:00 | C | 99.00
-- out 2020-06-09 00:00:00 | A | 66.00
-- out 2020-06-09 00:00:00 | B | 88.00
-- out 2020-06-09 00:00:00 | C | 99.00
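If you would rather not rely on the analytic LIMIT clause, the padding step can probably be written with ROW_NUMBER() instead. The sketch below uses a reduced input (two groups, two days) and otherwise the same names as above. The latest and rn names are mine, and I have not run this variant.

```sql
WITH
input(tmstmp,grp,nbr) AS (
          SELECT TIMESTAMP '2020-06-02 00:00:00','A',22
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','A',66
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','B',88
)
,
padding AS (
  -- latest row per group, re-dated to today
  SELECT CURRENT_DATE::timestamp AS tmstmp, grp, nbr
  FROM (
    SELECT grp, nbr,
           ROW_NUMBER() OVER(PARTITION BY grp ORDER BY tmstmp DESC) AS rn
    FROM input
  ) latest
  WHERE rn = 1
)
,
padded AS (
  SELECT * FROM input
  UNION ALL
  SELECT * FROM padding
)
SELECT
  ts AS tmstmp
, grp
, TS_FIRST_VALUE(nbr,'const') AS nbr
FROM padded
TIMESERIES ts AS '1 DAY' OVER(PARTITION BY grp ORDER BY tmstmp)
ORDER BY 1,2
;
```

The idea is unchanged: the extra row per group gives TIMESERIES an end point at today's date, so the constant interpolation has something to fill towards.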
I need to create new interval rows based on a start datetime column and an end datetime column.
My statement looks like this currently
select id,
startdatetime,
enddatetime
from calls
result looks like this
id startdatetime enddatetime
1 01/01/2020 00:00:00 01/01/2020 04:00:00
I would like a result like this
id startdatetime enddatetime Intervals
1 01/01/2020 00:00:00 01/01/2020 03:00:00 01/01/2020 00:00:00
1 01/01/2020 00:00:00 01/01/2020 03:00:00 01/01/2020 01:00:00
1 01/01/2020 00:00:00 01/01/2020 03:00:00 01/01/2020 02:00:00
1 01/01/2020 00:00:00 01/01/2020 03:00:00 01/01/2020 03:00:00
Thanking you in advance
p.s. I'm new to SQL
You can use a recursive sub-query factoring clause to loop and incrementally add an hour:
WITH times ( id, startdatetime, enddatetime, intervals ) AS (
SELECT id,
startdatetime,
enddatetime,
startdatetime
FROM calls c
UNION ALL
SELECT id,
startdatetime,
enddatetime,
intervals + INTERVAL '1' HOUR
FROM times
WHERE intervals + INTERVAL '1' HOUR <= enddatetime
)
SELECT *
FROM times;
outputs:
ID | STARTDATETIME | ENDDATETIME | INTERVALS
-: | :------------------ | :------------------ | :------------------
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 00:00:00
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 01:00:00
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 02:00:00
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 03:00:00
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 04:00:00
You can use a hierarchical query as follows:
SQL> WITH CALLS (ID, STARTDATETIME, ENDDATETIME)
     AS ( SELECT 1,
                 TO_DATE('01/01/2020 00:00:00', 'dd/mm/rrrr hh24:mi:ss'),
                 TO_DATE('01/01/2020 04:00:00', 'dd/mm/rrrr hh24:mi:ss')
            FROM DUAL)
     -- Your query starts from here
     SELECT
         ID,
         STARTDATETIME,
         ENDDATETIME,
         STARTDATETIME + ( COLUMN_VALUE / 24 ) AS INTERVALS
     FROM
         CALLS C
         CROSS JOIN TABLE ( CAST(MULTISET(
             SELECT LEVEL - 1
               FROM DUAL
             CONNECT BY LEVEL <= TRUNC(24 *(ENDDATETIME - STARTDATETIME))
         ) AS SYS.ODCINUMBERLIST) )
     ORDER BY INTERVALS;
ID STARTDATETIME ENDDATETIME INTERVALS
---------- ------------------- ------------------- -------------------
1 01/01/2020 00:00:00 01/01/2020 04:00:00 01/01/2020 00:00:00
1 01/01/2020 00:00:00 01/01/2020 04:00:00 01/01/2020 01:00:00
1 01/01/2020 00:00:00 01/01/2020 04:00:00 01/01/2020 02:00:00
1 01/01/2020 00:00:00 01/01/2020 04:00:00 01/01/2020 03:00:00
SQL>
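Note that this version stops one hour before ENDDATETIME, while the recursive query above also returns the row that falls exactly on ENDDATETIME. If you want that last row here as well, a minimal sketch is to generate one more level. The n alias is my own addition; everything else is as in the query above.

```sql
WITH calls (id, startdatetime, enddatetime) AS (
  SELECT 1,
         TO_DATE('01/01/2020 00:00:00', 'dd/mm/yyyy hh24:mi:ss'),
         TO_DATE('01/01/2020 04:00:00', 'dd/mm/yyyy hh24:mi:ss')
  FROM dual
)
SELECT c.id,
       c.startdatetime,
       c.enddatetime,
       c.startdatetime + (n.column_value / 24) AS intervals
FROM calls c
CROSS JOIN TABLE(CAST(MULTISET(
         SELECT LEVEL - 1
         FROM dual
         -- "+ 1" adds the hour that falls on (or just before) enddatetime
         CONNECT BY LEVEL <= TRUNC(24 * (c.enddatetime - c.startdatetime)) + 1
       ) AS SYS.ODCINUMBERLIST)) n
ORDER BY c.id, intervals;
```

Which of the two behaviours is right depends on whether the end time should count as the start of another interval; the sample in the question is ambiguous on that point.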
Cheers!!
I have rows with periods of time that intersect for the same user. For example:
-------------------------------------------------------------
| ID_USER | START_DATE | END_DATE |
-------------------------------------------------------------
| 1 | 01/01/2018 08:00:00 | 01/01/2018 08:50:00 |
| 1 | 01/01/2018 08:15:00 | 01/01/2018 08:20:00 |
| 1 | 01/01/2018 08:45:00 | 01/01/2018 09:55:00 |
| 1 | 01/01/2018 15:45:00 | 01/01/2018 17:00:00 |
| 2 | 01/01/2018 08:45:00 | 01/01/2018 09:50:00 |
| 2 | 01/01/2018 09:15:00 | 01/01/2018 10:00:00 |
-------------------------------------------------------------
I want to avoid this. I would like to merge the overlapping rows into a single row, taking the earliest starting date and the latest ending date. The result for the above example would be:
-------------------------------------------------------------
| ID_USER | START_DATE | END_DATE |
-------------------------------------------------------------
| 1 | 01/01/2018 08:00:00 | 01/01/2018 09:55:00 |
| 1 | 01/01/2018 15:45:00 | 01/01/2018 17:00:00 |
| 2 | 01/01/2018 08:45:00 | 01/01/2018 10:00:00 |
-------------------------------------------------------------
Do you have any idea how I can get this result with a SQL statement in Oracle?
You have two types of intersection; the first where one period exists entirely within another (e.g. your second row, 08:15-08:20), and the second where one period overlaps the start or end of another.
If you eliminate the first type then you can use lead and lag to peek ahead and behind at what's left; I've added a third data set for further fun:
select id_user, start_date, end_date,
case when start_date <= lag(end_date) over (partition by id_user order by start_date)
then null
else start_date
end as calc_start_date,
case when end_date >= lead(start_date) over (partition by id_user order by end_date)
then null
else end_date
end as calc_end_date
from your_table t1
where not exists (
select *
from your_table t2
where t2.id_user = t1.id_user
and t2.start_date <= t1.start_date and t2.end_date >= t1.end_date
and t2.rowid != t1.rowid
);
ID_USER    START_DATE          END_DATE            CALC_START_DATE     CALC_END_DATE
---------- ------------------- ------------------- ------------------- -------------------
1          2018-01-01 08:00:00 2018-01-01 08:50:00 2018-01-01 08:00:00
1          2018-01-01 08:45:00 2018-01-01 09:55:00                     2018-01-01 09:55:00
1          2018-01-01 15:45:00 2018-01-01 17:00:00 2018-01-01 15:45:00 2018-01-01 17:00:00
2          2018-01-01 08:45:00 2018-01-01 09:50:00 2018-01-01 08:45:00
2          2018-01-01 09:15:00 2018-01-01 10:00:00                     2018-01-01 10:00:00
3          2018-01-01 08:00:00 2018-01-01 08:30:00 2018-01-01 08:00:00
3          2018-01-01 08:15:00 2018-01-01 08:45:00
3          2018-01-01 08:45:00 2018-01-01 09:15:00
3          2018-01-01 09:00:00 2018-01-01 09:30:00                     2018-01-01 09:30:00
The not exists clause removed the first type.
You can then collapse what is left, firstly by eliminating the rows that overlapped both ends (in my extra rows for ID 3), which have both the lead and lag values as null; and then using lead and lag again to replace the remaining nulls with their adjacent rows' values:
select distinct id_user,
case when calc_start_date is null
then lag(calc_start_date) over (partition by id_user order by start_date)
else calc_start_date
end as start_date,
case when calc_end_date is null
then lead(calc_end_date) over (partition by id_user order by end_date)
else calc_end_date
end as end_date
from (
select id_user, start_date, end_date,
case when start_date <= lag(end_date) over (partition by id_user order by start_date)
then null
else start_date
end as calc_start_date,
case when end_date >= lead(start_date) over (partition by id_user order by end_date)
then null
else end_date
end as calc_end_date
from your_table t1
where not exists (
select *
from your_table t2
where t2.id_user = t1.id_user
and t2.start_date <= t1.start_date and t2.end_date >= t1.end_date
and t2.rowid != t1.rowid
)
)
where calc_start_date is not null
or calc_end_date is not null
order by id_user, start_date, end_date;
ID_USER START_DATE END_DATE
---------- ------------------- -------------------
1 2018-01-01 08:00:00 2018-01-01 09:55:00
1 2018-01-01 15:45:00 2018-01-01 17:00:00
2 2018-01-01 08:45:00 2018-01-01 10:00:00
3 2018-01-01 08:00:00 2018-01-01 09:30:00
It wouldn't entirely surprise me if there are edge cases I haven't considered that cause problems, but hopefully this will be a starting point anyway.
There are four steps required to get the result, represented with three subqueries and one main query:
1) increase END_DATE to the maximum thus far
This is required because your END_DATE values are not ordered, e.g. the first record intersects with the third record, but the second record doesn't intersect with the third one.
ID_USER START_DATE END_DATE
---------- ------------------- -------------------
1 01.01.2018 08:00:00 01.01.2018 08:50:00
1 01.01.2018 08:15:00 01.01.2018 08:50:00
1 01.01.2018 08:45:00 01.01.2018 09:55:00
1 01.01.2018 15:45:00 01.01.2018 17:00:00
2 01.01.2018 08:45:00 01.01.2018 09:50:00
2 01.01.2018 09:15:00 01.01.2018 10:00:00
2) Define a new group for each non-overlapping chunk
Technically, for the first record (per ID_USER) and for each record that doesn't overlap with its predecessor, assign a new group id (GRP).
ID_USER START_DATE END_DATE GRP
---------- ------------------- ------------------- ----------
1 01.01.2018 08:00:00 01.01.2018 08:50:00 1
1 01.01.2018 08:15:00 01.01.2018 08:50:00
1 01.01.2018 08:45:00 01.01.2018 09:55:00
1 01.01.2018 15:45:00 01.01.2018 17:00:00 4
2 01.01.2018 08:45:00 01.01.2018 09:50:00 1
2 01.01.2018 09:15:00 01.01.2018 10:00:00
3) Fill the Groups
Fill the NULLs with the last group Id assigned to enable GROUP BY.
ID_USER START_DATE END_DATE GRP2
---------- ------------------- ------------------- ----------
1 01.01.2018 08:00:00 01.01.2018 08:50:00 1
1 01.01.2018 08:15:00 01.01.2018 08:50:00 1
1 01.01.2018 08:45:00 01.01.2018 09:55:00 1
1 01.01.2018 15:45:00 01.01.2018 17:00:00 4
2 01.01.2018 08:45:00 01.01.2018 09:50:00 1
2 01.01.2018 09:15:00 01.01.2018 10:00:00 1
4) GROUP BY
The rest is simple: the dates are the MIN and MAX within the group. You group on the key (ID_USER) and the GRP2.
ID_USER START_DATE END_DATE
---------- ------------------- -------------------
1 01.01.2018 08:00:00 01.01.2018 09:55:00
1 01.01.2018 15:45:00 01.01.2018 17:00:00
2 01.01.2018 08:45:00 01.01.2018 10:00:00
The query
with myt1 as (
select ID_USER, START_DATE,
max(END_DATE) over (partition by ID_USER order by START_DATE) END_DATE
from my_table),
myt2 as (
select ID_USER,START_DATE, END_DATE,
case when (nvl(lag(END_DATE) over (partition by ID_USER order by START_DATE),START_DATE-1) < START_DATE ) then
row_number() over (partition by ID_USER order by START_DATE) end grp
from myt1
),
myt3 as (
select ID_USER,START_DATE, END_DATE,
last_value(grp ignore nulls) over (partition by ID_USER order by START_DATE) as grp2
from myt2
)
select
ID_USER,
min(START_DATE) START_DATE,
max(END_DATE) END_DATE
from myt3
group by ID_USER, GRP2
order by 1,2;
The data
create table my_table as
select 1 ID_USER, to_date('01/01/2018 08:00:00','dd/mm/yyyy hh24:mi:ss') START_DATE, to_date('01/01/2018 08:50:00','dd/mm/yyyy hh24:mi:ss') END_DATE from dual union all
select 1 ID_USER, to_date('01/01/2018 08:15:00','dd/mm/yyyy hh24:mi:ss') START_DATE, to_date('01/01/2018 08:20:00','dd/mm/yyyy hh24:mi:ss') END_DATE from dual union all
select 1 ID_USER, to_date('01/01/2018 08:45:00','dd/mm/yyyy hh24:mi:ss') START_DATE, to_date('01/01/2018 09:55:00','dd/mm/yyyy hh24:mi:ss') END_DATE from dual union all
select 1 ID_USER, to_date('01/01/2018 15:45:00','dd/mm/yyyy hh24:mi:ss') START_DATE, to_date('01/01/2018 17:00:00','dd/mm/yyyy hh24:mi:ss') END_DATE from dual union all
select 2 ID_USER, to_date('01/01/2018 08:45:00','dd/mm/yyyy hh24:mi:ss') START_DATE, to_date('01/01/2018 09:50:00','dd/mm/yyyy hh24:mi:ss') END_DATE from dual union all
select 2 ID_USER, to_date('01/01/2018 09:15:00','dd/mm/yyyy hh24:mi:ss') START_DATE, to_date('01/01/2018 10:00:00','dd/mm/yyyy hh24:mi:ss') END_DATE from dual;
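On Oracle 12c or later, the same merge can also be sketched with MATCH_RECOGNIZE. This is an alternative technique rather than part of the step-by-step query above. It assumes the my_table definition from "The data"; the pattern variable names a and b are arbitrary.

```sql
select id_user, start_date, end_date
from my_table
match_recognize (
  partition by id_user
  order by start_date, end_date
  measures
    first(start_date) as start_date,  -- earliest start in the merged block
    max(end_date)     as end_date     -- latest end in the merged block
  -- zero or more rows that reach into the next row, then one closing row
  pattern (a* b)
  define
    -- running max of END_DATE so far still covers the next START_DATE
    a as max(end_date) >= next(start_date)
)
order by id_user, start_date;
```

Because b has no definition it matches any row, so each match ends at the first row whose running maximum END_DATE no longer reaches the next START_DATE. Using the running maximum rather than the current row's END_DATE is what handles a short period nested inside a longer one, such as the 08:15 to 08:20 row.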
You are looking for the MIN/MAX function:
SELECT MIN(aggregate_expression),MAX(aggregate_expression)
FROM tables
[WHERE conditions]
GROUP BY ID;
Reference:
https://www.techonthenet.com/oracle/functions/min.php
I would like to group rows of a table by an individual time frame.
As an example let's imagine we have a list of departures at an airport:
| Departure | Flight | Destination |
| 2016-06-01 10:12:00 | LH1234 | New York |
| 2016-06-02 14:23:00 | LH1235 | Berlin |
| 2016-06-02 14:30:00 | LH1236 | Tokio |
| 2016-06-03 18:45:00 | LH1237 | Belgrad |
| 2016-06-04 04:10:00 | LH1237 | Rio |
| 2016-06-04 06:20:00 | LH1237 | Paris |
I can easily group the data by full hours (days, weeks, ...) using the following query:
select to_char(departure, 'HH24') as "full hour", count(*) as "number flights"
from departures
group by to_char(departure, 'HH24')
This should result in the following table.
| full hour | number flights |
| 04 | 1 |
| 06 | 1 |
| 10 | 1 |
| 14 | 2 |
| 18 | 1 |
Now my question: is there an elegant way (or best practice) to group data by an individual time frame?
The result I'm looking for is the following:
| time frame | number flights |
| 2016-05-31 22:00 - 2016-06-01 06:00 | 0 |
| 2016-06-01 06:00 - 2016-06-01 14:00 | 1 |
| 2016-06-01 14:00 - 2016-06-01 22:00 | 0 |
| 2016-06-01 22:00 - 2016-06-02 06:00 | 0 |
| 2016-06-02 06:00 - 2016-06-02 14:00 | 0 |
| 2016-06-02 14:00 - 2016-06-02 22:00 | 2 |
| 2016-06-02 22:00 - 2016-06-03 06:00 | 0 |
| 2016-06-03 06:00 - 2016-06-03 14:00 | 0 |
| 2016-06-03 14:00 - 2016-06-03 22:00 | 1 |
| 2016-06-03 22:00 - 2016-06-04 06:00 | 1 |
| 2016-06-04 06:00 - 2016-06-04 14:00 | 1 |
| 2016-06-04 14:00 - 2016-06-04 22:00 | 0 |
| 2016-06-04 22:00 - 2016-06-05 06:00 | 0 |
(The rows with 0 flights aren't relevant. They are just there for a better visualization of the problem.)
Thanks for your answers in advance. :-)
Peter
Since you have groups starting at 22:00 and multiples of 8 hours afterwards then you can use TRUNC() and an offset of 2 hours to get the results grouped by each day.
You can then work out which third of the day the departure is in and also group by that:
GROUP BY TRUNC( Departure + 2/24 ),
FLOOR( ( Departure + 2/24 - TRUNC( Departure + 2/24 ) ) * 3 )
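Put into a complete query, that could look something like the sketch below. It assumes the departures table and departure column from the question. The frame_start name is made up, and I derive the third of the day from the hour of the shifted time instead of the fractional-day expression, so that the arithmetic stays in whole hours.

```sql
select to_char(frame_start, 'yyyy-mm-dd hh24:mi') || ' - ' ||
       to_char(frame_start + 8/24, 'yyyy-mm-dd hh24:mi') as "time frame",
       count(*) as "number flights"
from (
  select -- shift by 2 hours so that frames start at midnight, find the
         -- 8-hour block (0, 8 or 16), then shift back by 2 hours
         trunc(departure + 2/24)
         + (8 * floor(to_number(to_char(departure + 2/24, 'hh24')) / 8) - 2) / 24
           as frame_start
  from departures
)
group by frame_start
order by frame_start;
```

As with the GROUP BY above, frames without any departures simply do not appear in the result.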
Something like this should work. Please note the two input variables, first_time and timespan. The timespan is whatever you want it to be (I wrote it in the form 8/24 for eight hours; if you make timespan into a bind variable as a number expressed in HOURS, you need the division by 24). Due to the way I wrote the formulas, there are NO requirements on first_time other than it should be one of your boundary date/times; it may even be in the future, it won't change the results. It may also be made into a bind variable, then you can decide in what format you want it to be made available to the query.
with timetable (departure, flight, destination) as (
select to_date('2016-06-01 10:12:00', 'yyyy-mm-dd hh24:mi:ss'), 'LH1234', 'New York'
from dual union all
select to_date('2016-06-02 14:23:00', 'yyyy-mm-dd hh24:mi:ss'), 'LH1235', 'Berlin'
from dual union all
select to_date('2016-06-02 14:30:00', 'yyyy-mm-dd hh24:mi:ss'), 'LH1236', 'Tokyo'
from dual union all
select to_date('2016-06-03 18:45:00', 'yyyy-mm-dd hh24:mi:ss'), 'LH1237', 'Belgrad'
from dual union all
select to_date('2016-06-04 04:10:00', 'yyyy-mm-dd hh24:mi:ss'), 'LH1237', 'Rio'
from dual union all
select to_date('2016-06-04 06:20:00', 'yyyy-mm-dd hh24:mi:ss'), 'LH1237', 'Paris'
from dual
),
input_values (first_time, timespan) as (
select to_date('2010-01-01 06:00:00', 'yyyy-mm-dd hh24:mi:ss'), 8/24 from dual
),
prep (adj_departure, flight, destination) as (
select first_time + timespan * floor((departure - first_time) / timespan),
flight, destination
from timetable, input_values
)
select to_char(adj_departure, 'yyyy-mm-dd hh24:mi:ss') || ' - ' ||
to_char(adj_departure + timespan, 'yyyy-mm-dd hh24:mi:ss') as time_interval,
count(*) as ct
from prep, input_values
group by adj_departure, timespan
order by adj_departure
;
Output:
TIME_INTERVAL CT
----------------------------------------- ----------
2016-06-01 06:00:00 - 2016-06-01 14:00:00 1
2016-06-02 14:00:00 - 2016-06-02 22:00:00 2
2016-06-03 14:00:00 - 2016-06-03 22:00:00 1
2016-06-03 22:00:00 - 2016-06-04 06:00:00 1
2016-06-04 06:00:00 - 2016-06-04 14:00:00 1