Oracle SQL: count records per fixed time frame (say 15 or 30 minutes)

I have a table similar to
Start time | End Time | User |
09/02/2021 03:01:13 | 09/02/2021 03:45:15 | ABC |
09/02/2021 03:15:20 | 09/02/2021 05:03:20 | XYZ |
09/02/2021 06:03:12 | 09/02/2021 06:15:30 | DEF |
Expected output:
StDt | EndDt | Count(1)
09/02/2021 00:00:00 | 09/02/2021 01:00:00 | 0
09/02/2021 01:00:00 | 09/02/2021 02:00:00 | 0
09/02/2021 02:00:00 | 09/02/2021 03:00:00 | 0
09/02/2021 03:00:00 | 09/02/2021 04:00:00 | 2
09/02/2021 04:00:00 | 09/02/2021 05:00:00 | 1
09/02/2021 05:00:00 | 09/02/2021 06:00:00 | 0
09/02/2021 06:00:00 | 09/02/2021 07:00:00 | 1
The interval in this example is hourly, but I would like to keep it flexible for 10, 15 or 30 minutes.
I want this written as a single SQL statement.
All I could work out so far is how to generate the range.
select t1.StartDt, t1.EndDt from
(
select
(to_char(timestamp '2021-02-09 00:00:00' + numtodsinterval(rownum*60,'MINUTE') - numtodsinterval(60,'MINUTE'),'DD-MM-YYYY hh24:mi')) as StartDt,
(to_char(timestamp '2021-02-09 00:00:00' + numtodsinterval(rownum*60,'MINUTE'),'DD-MM-YYYY hh24:mi')) as EndDt
from dual connect by level <= 24
) t1;
I don't know how to link this to the table above to get the data in the format I require.

You have a good start. Keep the time values as timestamps inside the subquery and move the TO_CHAR formatting to the main query, where the result is displayed. Use a correlated subquery with COUNT(DISTINCT ...) to count the rows that overlap each interval, and use a bind variable as the placeholder for the interval length in minutes (60, 30, 15), such as:
SQL> var min number
SQL> exec :min := 60
PL/SQL procedure successfully completed
min
---------
60
SELECT TO_CHAR(t.StartDt,'DD-MM-YYYY HH24:MI') AS StartDt,
       TO_CHAR(t.EndDt,'DD-MM-YYYY HH24:MI') AS EndDt,
       ( SELECT COUNT(DISTINCT "User")
           FROM tab
          WHERE t.EndDt >= Start_Time
            AND t.StartDt <= End_Time ) AS Count  -- rows overlapping the interval
  FROM
  (
    SELECT timestamp '2021-02-09 00:00:00' +
           numtodsinterval(rownum * :min, 'MINUTE') -
           numtodsinterval(:min, 'MINUTE') AS StartDt,
           timestamp '2021-02-09 00:00:00' +
           numtodsinterval(rownum * :min, 'MINUTE') AS EndDt
      FROM dual
   CONNECT BY level <= 24 * 60 / :min  -- intervals per day: 24 for 60 minutes, 48 for 30
  ) t
 ORDER BY t.StartDt;  -- sort by the timestamp, not by the formatted string
STARTDT ENDDT COUNT
---------------- ---------------- ----------
09-02-2021 00:00 09-02-2021 01:00 0
09-02-2021 01:00 09-02-2021 02:00 0
09-02-2021 02:00 09-02-2021 03:00 0
09-02-2021 03:00 09-02-2021 04:00 2
09-02-2021 04:00 09-02-2021 05:00 1
09-02-2021 05:00 09-02-2021 06:00 1
09-02-2021 06:00 09-02-2021 07:00 1
09-02-2021 07:00 09-02-2021 08:00 0
.....
.....
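To check the overlap logic outside the database, here is a minimal Python sketch of the same idea, using the three sample rows from the question. The names `ROWS` and `count_per_bucket` are made up for illustration; the comparison mirrors the inclusive test in the correlated subquery, so a row that only touches an interval boundary is counted in both neighbouring intervals.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (start_time, end_time, user).
ROWS = [
    (datetime(2021, 2, 9, 3, 1, 13), datetime(2021, 2, 9, 3, 45, 15), "ABC"),
    (datetime(2021, 2, 9, 3, 15, 20), datetime(2021, 2, 9, 5, 3, 20), "XYZ"),
    (datetime(2021, 2, 9, 6, 3, 12), datetime(2021, 2, 9, 6, 15, 30), "DEF"),
]

def count_per_bucket(rows, day_start, minutes):
    """Return (bucket_start, bucket_end, distinct_users) for one day."""
    step = timedelta(minutes=minutes)
    buckets = (24 * 60) // minutes  # 24 for 60 minutes, 48 for 30, 96 for 15
    result = []
    for n in range(buckets):
        b_start = day_start + n * step
        b_end = b_start + step
        # Same inclusive overlap test as the correlated subquery.
        users = {u for s, e, u in rows if b_end >= s and b_start <= e}
        result.append((b_start, b_end, len(users)))
    return result

hourly = count_per_bucket(ROWS, datetime(2021, 2, 9), 60)
print([count for _, _, count in hourly][:8])  # [0, 0, 0, 2, 1, 1, 1, 0]
```

The 05:00 interval shows 1 rather than the 0 in the question's expected output because XYZ is still running until 05:03:20.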

Related

SQL: Aggregate time series table (HourOfDay, Val) to average value of HourOfDay by weekday (e.g. avg of Mondays 10:00-11:00, 11:00-12:00, ..., Tue ...)

So far I have an SQL query that gives me a table with the number of customers handled for each hour of the day, given an arbitrary start and end datetime value (from the Grafana interface). The result may span many weeks. My goal is to implement an hourly heatmap by weekday with averaged values.
How do I aggregate those customers per hour to show the average value of each hour per weekday?
So let's say I have 24 values per day over 19 days. How do I aggregate so that I get 24 values for each of Mon, Tue, Wed, Thu, Fri, Sat, Sun, each hour representing the average value for those days?
Also, only use data of full weeks: strip leading and trailing days that are not part of a fully represented week (so that the same number of individual weekdays goes into each average).
Here is a segment on how the return of my SQL query looks so far. (hour of each day, number of customers):
...
2021-12-13 11:00:00 | 0
2021-12-13 12:00:00 | 3
2021-12-13 13:00:00 | 4
2021-12-13 14:00:00 | 4
2021-12-13 15:00:00 | 7
2021-12-13 16:00:00 | 17
2021-12-13 17:00:00 | 12
2021-12-13 18:00:00 | 18
2021-12-13 19:00:00 | 15
2021-12-13 20:00:00 | 8
2021-12-13 21:00:00 | 10
2021-12-13 22:00:00 | 1
2021-12-13 23:00:00 | 0
2021-12-14 00:00:00 | 0
2021-12-14 01:00:00 | 0
2021-12-14 02:00:00 | 0
2021-12-14 03:00:00 | 0
2021-12-14 04:00:00 | 0
2021-12-14 05:00:00 | 0
2021-12-14 06:00:00 | 0
2021-12-14 07:00:00 | 0
2021-12-14 08:00:00 | 0
2021-12-14 09:00:00 | 0
2021-12-14 10:00:00 | 12
2021-12-14 11:00:00 | 12
2021-12-14 12:00:00 | 19
2021-12-14 13:00:00 | 11
2021-12-14 14:00:00 | 11
2021-12-14 15:00:00 | 12
2021-12-14 16:00:00 | 9
2021-12-14 17:00:00 | 2
...
So (schematically, example data) startDate 2021-12-10 11:00 to endDate 2021-12-31 17:00
-------------------------------
...
Mon 2021-12-13 12:00 | 3
Mon 2021-12-13 13:00 | 4
Mon 2021-12-13 14:00 | 4
...
Mon 2021-12-20 12:00 | 1
Mon 2021-12-20 13:00 | 6
Mon 2021-12-20 14:00 | 2
...
Mon 2021-12-27 12:00 | 2
Mon 2021-12-27 13:00 | 2
Mon 2021-12-27 14:00 | 3
...
-------------------------------
into this:
strip leading fri 10., sat 11., sun 12.
strip trailing tue 28., wed 29., thu 30., fri 31.
average hours per weekday
-------------------------------
...
Mon 12:00 | 2
Mon 13:00 | 4
Mon 14:00 | 3
...
Tue 12:00 | x
Tue 13:00 | y
Tue 14:00 | z
...
-------------------------------
My approach so far:
WITH CustomersPerHour as (
SELECT dateadd(hour, datediff(hour, 0, Systemdatum),0) as DayHour, Count(*) as C
FROM CustomerList
WHERE CustomerID > 0
AND Datum BETWEEN '2021-12-10T11:00:00Z' AND '2021-12-31T17:00:00Z'
AND EntryID IN (62,65)
AND CustomerID IN (SELECT * FROM udf_getActiveUsers())
GROUP BY dateadd(hour, datediff(hour, 0, Systemdatum), 0)
)
-- add null values on missing data/insert missing hours
SELECT DATEDIFF(second, '1970-01-01', dt.Date) AS time, C as Customers
FROM dbo.udf_generateHoursTable('2021-12-03T18:14:56Z', '2022-03-13T18:14:56Z') as dt
LEFT JOIN CustomersPerHour cPh ON dt.Date = cPh.DayHour
ORDER BY
time ASC
The simplest solution is to do what you describe in your example: create a custom base for the aggregation.
The first step is to prepare an aggregated table holding the customer count at date-and-hour precision.
Then create the base.
This is an example of the basic idea:
-- EXAMPLE
SELECT
DATENAME(WEEKDAY, GETDATE()) + ' ' + CAST(DATEPART(HOUR, GETDATE()) AS varchar(8)) + ':00'
-- OUTPUT: Sunday 21:00
You can build this concatenated value and then use it in the GROUP BY clause.
Adjust this query for your use case:
SELECT
DATENAME(WEEKDAY, <DATETIME_COL>) + ' ' + CAST(DATEPART(HOUR, <DATETIME_COL>) AS varchar(8)) + ':00' as base
,SUM(...) as sum_of_whatever
,AVG(...) as avg_of_whatever
FROM <YOUR_AGG_TABLE>
GROUP BY DATENAME(WEEKDAY, <DATETIME_COL>) + ' ' + CAST(DATEPART(HOUR, <DATETIME_COL>) AS varchar(8)) + ':00'
This creates the base exactly as you wanted.
You can use the same logic to create other aggregation bases.
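As a rough cross-check of the grouping idea, here is a small Python sketch that groups hourly counts by (weekday, hour) and averages them. It uses the schematic Monday values from the question; `DAY_NAMES` and `average_by_weekday_hour` are illustrative names, not part of any existing code, and the trimming to full weeks is not covered here.

```python
from collections import defaultdict
from datetime import datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def average_by_weekday_hour(rows):
    """rows: iterable of (hour_timestamp, customer_count) pairs."""
    groups = defaultdict(list)
    for ts, count in rows:
        # The grouping key plays the role of the concatenated "base" column.
        groups[(ts.weekday(), ts.hour)].append(count)
    return {
        f"{DAY_NAMES[day]} {hour:02d}:00": sum(values) / len(values)
        for (day, hour), values in sorted(groups.items())
    }

# Schematic Monday values from the question (three Mondays, three hours each).
rows = [
    (datetime(2021, 12, 13, 12), 3), (datetime(2021, 12, 13, 13), 4), (datetime(2021, 12, 13, 14), 4),
    (datetime(2021, 12, 20, 12), 1), (datetime(2021, 12, 20, 13), 6), (datetime(2021, 12, 20, 14), 2),
    (datetime(2021, 12, 27, 12), 2), (datetime(2021, 12, 27, 13), 2), (datetime(2021, 12, 27, 14), 3),
]
averages = average_by_weekday_hour(rows)
print(averages)  # {'Mon 12:00': 2.0, 'Mon 13:00': 4.0, 'Mon 14:00': 3.0}
```

One thing to watch on the SQL Server side: AVG over an integer column returns an integer, so cast the count to a decimal type first if you want fractional averages.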

Oracle SQL List Intervals

I need to create new interval rows based on a start datetime column and an end datetime column.
My statement looks like this currently
select id,
startdatetime,
enddatetime
from calls
result looks like this
id startdatetime enddatetime
1 01/01/2020 00:00:00 01/01/2020 04:00:00
I would like a result like this
id startdatetime enddatetime Intervals
1 01/01/2020 00:00:00 01/01/2020 03:00:00 01/01/2020 00:00:00
1 01/01/2020 00:00:00 01/01/2020 03:00:00 01/01/2020 01:00:00
1 01/01/2020 00:00:00 01/01/2020 03:00:00 01/01/2020 02:00:00
1 01/01/2020 00:00:00 01/01/2020 03:00:00 01/01/2020 03:00:00
Thanking you in advance
p.s. I'm new to SQL
You can use a recursive sub-query factoring clause to loop and incrementally add an hour:
WITH times ( id, startdatetime, enddatetime, intervals ) AS (
SELECT id,
startdatetime,
enddatetime,
startdatetime
FROM calls c
UNION ALL
SELECT id,
startdatetime,
enddatetime,
intervals + INTERVAL '1' HOUR
FROM times
WHERE intervals + INTERVAL '1' HOUR <= enddatetime
)
SELECT *
FROM times;
outputs:
ID | STARTDATETIME | ENDDATETIME | INTERVALS
-: | :------------------ | :------------------ | :------------------
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 00:00:00
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 01:00:00
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 02:00:00
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 03:00:00
1 | 2020-01-01 00:00:00 | 2020-01-01 04:00:00 | 2020-01-01 04:00:00
You can use a hierarchical query as follows:
WITH CALLS (ID, STARTDATETIME, ENDDATETIME)
AS ( SELECT 1,
TO_DATE('01/01/2020 00:00:00', 'dd/mm/rrrr hh24:mi:ss'),
TO_DATE('01/01/2020 04:00:00', 'dd/mm/rrrr hh24:mi:ss')
FROM DUAL)
-- Your query starts from here
SELECT
ID,
STARTDATETIME,
ENDDATETIME,
STARTDATETIME + ( COLUMN_VALUE / 24 ) AS INTERVALS
FROM
CALLS C
CROSS JOIN TABLE ( CAST(MULTISET(
SELECT LEVEL - 1
FROM DUAL
CONNECT BY LEVEL <= TRUNC(24 *(ENDDATETIME - STARTDATETIME))
) AS SYS.ODCINUMBERLIST) )
ORDER BY INTERVALS;
ID STARTDATETIME ENDDATETIME INTERVALS
---------- ------------------- ------------------- -------------------
1 01/01/2020 00:00:00 01/01/2020 04:00:00 01/01/2020 00:00:00
1 01/01/2020 00:00:00 01/01/2020 04:00:00 01/01/2020 01:00:00
1 01/01/2020 00:00:00 01/01/2020 04:00:00 01/01/2020 02:00:00
1 01/01/2020 00:00:00 01/01/2020 04:00:00 01/01/2020 03:00:00
Cheers!!
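Note that the two answers differ on the last row: the recursive version returns 04:00:00 as well, the hierarchical one stops at 03:00:00. A small Python sketch of both loops makes the difference easy to see; the function names are made up for illustration and only mimic the row generation, not the SQL itself.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)

def rows_recursive(start, end):
    """Mimics the recursive query: keep adding an hour while the value is <= end."""
    rows = []
    current = start
    while current <= end:
        rows.append(current)
        current += HOUR
    return rows

def rows_hierarchical(start, end):
    """Mimics CONNECT BY LEVEL <= TRUNC(24 * (end - start)) with offsets LEVEL - 1."""
    whole_hours = (end - start) // HOUR  # number of complete hours between the two
    return [start + k * HOUR for k in range(whole_hours)]

start = datetime(2020, 1, 1, 0, 0)
end = datetime(2020, 1, 1, 4, 0)
print(len(rows_recursive(start, end)))     # 5 rows: 00:00 through 04:00
print(len(rows_hierarchical(start, end)))  # 4 rows: 00:00 through 03:00
```

Which one you want depends on whether the end time itself should start a new interval row.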

How to generate a series for a date range with a minutes interval in Oracle?

In Postgres, the query below works using the generate_series function:
SELECT dates
FROM generate_series(CAST('2019-03-01' as TIMESTAMP), CAST('2019-04-01' as TIMESTAMP), interval '30 mins') AS dates
The query below also works in Oracle, but only for a daily interval:
select to_date('2019-03-01','YYYY-MM-DD') + rownum -1 as dates
from all_objects
where rownum <= to_date('2019-03-06','YYYY-MM-DD')-to_date('2019-03-01','YYYY-MM-DD')+1
I want the same result in Oracle as for the query below:
SELECT dates
FROM generate_series(CAST('2019-03-01' as TIMESTAMP), CAST('2019-04-01' as TIMESTAMP), interval '30 mins') AS dates
Use a hierarchical query:
SELECT DATE '2019-03-01' + ( LEVEL - 1 ) * INTERVAL '30' MINUTE AS dates
FROM DUAL
CONNECT BY DATE '2019-03-01' + ( LEVEL - 1 ) * INTERVAL '30' MINUTE <= DATE '2019-04-01';
Output:
| DATES |
| :------------------ |
| 2019-03-01 00:00:00 |
| 2019-03-01 00:30:00 |
| 2019-03-01 01:00:00 |
| 2019-03-01 01:30:00 |
| 2019-03-01 02:00:00 |
| 2019-03-01 02:30:00 |
| 2019-03-01 03:00:00 |
| 2019-03-01 03:30:00 |
| 2019-03-01 04:00:00 |
| 2019-03-01 04:30:00 |
| 2019-03-01 05:00:00 |
| 2019-03-01 05:30:00 |
...
| 2019-03-31 19:30:00 |
| 2019-03-31 20:00:00 |
| 2019-03-31 20:30:00 |
| 2019-03-31 21:00:00 |
| 2019-03-31 21:30:00 |
| 2019-03-31 22:00:00 |
| 2019-03-31 22:30:00 |
| 2019-03-31 23:00:00 |
| 2019-03-31 23:30:00 |
| 2019-04-01 00:00:00 |
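If you want to sanity-check how many rows to expect (for example when using the CONNECT BY LEVEL <= n form instead), a quick Python sketch of an inclusive series helps. The `generate_series` function below is a hand-written stand-in with the same inclusive end as in the output above, not a library call.

```python
from datetime import datetime, timedelta

def generate_series(start, stop, step):
    """Yield start, start + step, ... up to and including stop."""
    current = start
    while current <= stop:
        yield current
        current += step

series = list(generate_series(datetime(2019, 3, 1),
                              datetime(2019, 4, 1),
                              timedelta(minutes=30)))
# March has 31 days with 48 half-hours each, plus the closing 2019-04-01 00:00 row.
print(len(series))  # 1489
print(series[0], series[-1])
```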

Splitting an interval that spans several days in PostgreSQL

I have a PostgreSQL table containing start timestamp and duration time.
timestamp | interval
------------------------------
2018-01-01 15:00:00 | 06:00:00
2018-01-02 23:00:00 | 04:00:00
2018-01-04 09:00:00 | 2 days 16 hours
What I would like is to have the interval split by day, like this:
timestamp | interval
------------------------------
2018-01-01 15:00:00 | 06:00:00
2018-01-02 23:00:00 | 01:00:00
2018-01-03 00:00:00 | 03:00:00
2018-01-04 09:00:00 | 15:00:00
2018-01-05 00:00:00 | 24:00:00
2018-01-06 00:00:00 | 24:00:00
2018-01-07 00:00:00 | 01:00:00
I am playing with generate_series(), width_bucket() and range functions, but I still can't find a plausible solution. Is there an existing, working solution?
I'm not sure about all edge cases, but this seems to work:
with c as (select *,min(t) over (), max(t+i) over (), tsrange(date_trunc('day',t),t+i) tr from t)
, mid as (
select distinct t,i,g,tr
, case when g < t then t else g end tt
from c
right outer join (select generate_series(date_trunc('day',min),date_trunc('day',max),'1 day') g from c) e on g <@ tr order by 3,1
)
select
tt
, i
, case when tt+'1 day' > upper(tr) and t < g then upper(tr)::time::interval when upper(tr) - lower(tr) < '1 day' then i else g+'1 day' - tt end
from mid
order by tt;
tt | i | case
---------------------+-----------------+----------
2018-01-01 15:00:00 | 06:00:00 | 06:00:00
2018-01-02 23:00:00 | 04:00:00 | 01:00:00
2018-01-03 00:00:00 | 04:00:00 | 03:00:00
2018-01-04 09:00:00 | 2 days 16:00:00 | 15:00:00
2018-01-05 00:00:00 | 2 days 16:00:00 | 1 day
2018-01-06 00:00:00 | 2 days 16:00:00 | 1 day
2018-01-07 00:00:00 | 2 days 16:00:00 | 01:00:00
(7 rows)
Also keep in mind that timestamp without time zone can give you wrong results when comparing timestamps.
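For comparison, the splitting rule itself is simple to state procedurally: cut each interval at every midnight it crosses. Here is a minimal Python sketch of that rule, assuming naive timestamps (no time zone) as in the question; `split_by_day` is an illustrative name.

```python
from datetime import datetime, time, timedelta

def split_by_day(start, duration):
    """Split [start, start + duration) into pieces that never cross midnight."""
    end = start + duration
    pieces = []
    current = start
    while current < end:
        # Midnight at the start of the day after `current`.
        next_midnight = datetime.combine(current.date(), time.min) + timedelta(days=1)
        piece_end = min(next_midnight, end)
        pieces.append((current, piece_end - current))
        current = piece_end
    return pieces

# The three sample rows from the question.
table = [
    (datetime(2018, 1, 1, 15), timedelta(hours=6)),
    (datetime(2018, 1, 2, 23), timedelta(hours=4)),
    (datetime(2018, 1, 4, 9), timedelta(days=2, hours=16)),
]
result = [piece for start, duration in table for piece in split_by_day(start, duration)]
for piece_start, piece_length in result:
    print(piece_start, piece_length)
```

This produces the seven rows from the desired output, with the two full days shown as a one-day duration.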

Group by an individual timeframe

I would like to group rows of a table by an individual time frame.
As an example let's imagine we have a list of departures at an airport:
| Departure | Flight | Destination |
| 2016-06-01 10:12:00 | LH1234 | New York |
| 2016-06-02 14:23:00 | LH1235 | Berlin |
| 2016-06-02 14:30:00 | LH1236 | Tokio |
| 2016-06-03 18:45:00 | LH1237 | Belgrad |
| 2016-06-04 04:10:00 | LH1237 | Rio |
| 2016-06-04 06:20:00 | LH1237 | Paris |
I can easily group the data by full hours (days, weeks, ...) using the following query:
select to_char(departure, 'HH24') as "full hour", count(*) as "number flights"
from departures
group by to_char(departure, 'HH24')
This should result in the following table.
| full hour | number flights |
| 04 | 1 |
| 06 | 1 |
| 10 | 1 |
| 14 | 2 |
| 18 | 1 |
Now my question: is there an elegant way (or best practice) to group data by an individual time frame?
The result I'm looking for is the following:
| time frame | number flights |
| 2016-05-31 22:00 - 2016-06-01 06:00 | 0 |
| 2016-06-01 06:00 - 2016-06-01 14:00 | 1 |
| 2016-06-01 14:00 - 2016-06-01 22:00 | 0 |
| 2016-06-01 22:00 - 2016-06-02 06:00 | 0 |
| 2016-06-02 06:00 - 2016-06-02 14:00 | 0 |
| 2016-06-02 14:00 - 2016-06-02 22:00 | 2 |
| 2016-06-02 22:00 - 2016-06-03 06:00 | 0 |
| 2016-06-03 06:00 - 2016-06-03 14:00 | 0 |
| 2016-06-03 14:00 - 2016-06-03 22:00 | 1 |
| 2016-06-03 22:00 - 2016-06-04 06:00 | 1 |
| 2016-06-04 06:00 - 2016-06-04 14:00 | 1 |
| 2016-06-04 14:00 - 2016-06-04 22:00 | 0 |
| 2016-06-04 22:00 - 2016-06-05 06:00 | 0 |
(The rows with 0 flights aren't relevant. They are just there for a better visualization of the problem.)
Thanks for your answers in advance. :-)
Peter
Since your groups start at 22:00 and repeat every 8 hours, you can add an offset of 2 hours and use TRUNC() to group the results by each (shifted) day.
You can then work out which third of the day the departure falls in and group by that as well:
GROUP BY TRUNC( Departure + 2/24 ),
FLOOR( ( Departure + 2/24 - TRUNC( Departure + 2/24 ) ) * 3 )
Something like this should work. Note the two input values, first_time and timespan. The timespan can be whatever you want; it is written here as 8/24 for eight hours, and if you turn it into a bind variable holding a number of hours, you need the division by 24. Because of the way the formulas are written, the only requirement on first_time is that it is one of your boundary date/times; it may even be in the future without changing the results. It can also be made a bind variable, in which case you decide in what format it is passed to the query.
with timetable (departure, flight, destination) as (
select to_date('2016-06-01 10:12:00', 'yyyy-mm-dd hh24:mi:ss'), 'LH1234', 'New York'
from dual union all
select to_date('2016-06-02 14:23:00', 'yyyy-mm-dd hh24:mi:ss'), 'LH1235', 'Berlin'
from dual union all
select to_date('2016-06-02 14:30:00', 'yyyy-mm-dd hh24:mi:ss'), 'LH1236', 'Tokyo'
from dual union all
select to_date('2016-06-03 18:45:00', 'yyyy-mm-dd hh24:mi:ss'), 'LH1237', 'Belgrad'
from dual union all
select to_date('2016-06-04 04:10:00', 'yyyy-mm-dd hh24:mi:ss'), 'LH1237', 'Rio'
from dual union all
select to_date('2016-06-04 06:20:00', 'yyyy-mm-dd hh24:mi:ss'), 'LH1237', 'Paris'
from dual
),
input_values (first_time, timespan) as (
select to_date('2010-01-01 06:00:00', 'yyyy-mm-dd hh24:mi:ss'), 8/24 from dual
),
prep (adj_departure, flight, destination) as (
select first_time + timespan * floor((departure - first_time) / timespan),
flight, destination
from timetable, input_values
)
select to_char(adj_departure, 'yyyy-mm-dd hh24:mi:ss') || ' - ' ||
to_char(adj_departure + timespan, 'yyyy-mm-dd hh24:mi:ss') as time_interval,
count(*) as ct
from prep, input_values
group by adj_departure, timespan
order by adj_departure
;
Output:
TIME_INTERVAL CT
----------------------------------------- ----------
2016-06-01 06:00:00 - 2016-06-01 14:00:00 1
2016-06-02 14:00:00 - 2016-06-02 22:00:00 2
2016-06-03 14:00:00 - 2016-06-03 22:00:00 1
2016-06-03 22:00:00 - 2016-06-04 06:00:00 1
2016-06-04 06:00:00 - 2016-06-04 14:00:00 1
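The core of the query is the formula first_time + timespan * floor((departure - first_time) / timespan). A short Python sketch of that formula, run on the sample departures, reproduces the same counts and also illustrates the point that first_time may lie in the future. The helper name `bucket_start` is made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

def bucket_start(ts, first_time, timespan):
    """Start of the time frame containing ts; first_time is any frame boundary."""
    # Floor division also rounds down for negative differences,
    # so a first_time later than ts still works.
    return first_time + timespan * ((ts - first_time) // timespan)

departures = [
    datetime(2016, 6, 1, 10, 12),
    datetime(2016, 6, 2, 14, 23),
    datetime(2016, 6, 2, 14, 30),
    datetime(2016, 6, 3, 18, 45),
    datetime(2016, 6, 4, 4, 10),
    datetime(2016, 6, 4, 6, 20),
]
timespan = timedelta(hours=8)

counts = Counter(bucket_start(d, datetime(2010, 1, 1, 6), timespan) for d in departures)
future = Counter(bucket_start(d, datetime(2030, 1, 1, 6), timespan) for d in departures)

for start in sorted(counts):
    print(start, "-", start + timespan, counts[start])
print(counts == future)  # True
```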