Cumulative sum grouped by 15-minute intervals - Oracle SQL

I would like to get a cumulative sum, grouped by 15-minute interval.
For example, table name: myTable

id  name  start_time      faults
============================================
1   a     06/07/19 23:30  1
2   b     06/07/19 23:35  1
3   c     06/07/19 23:36  1
4   d     06/07/19 23:50  1
5   e     06/07/19 23:54  1
6   f     07/07/19 00:05  1
7   g     07/07/19 00:20  1
8   h     07/07/19 00:25  1

Result:

start_time      faults
============================================
06/07/19 23:15  0
06/07/19 23:30  3
06/07/19 23:45  5
07/07/19 00:00  6
07/07/19 00:15  8
07/07/19 00:30  8
07/07/19 00:45  8
07/07/19 01:00  8

Thanks.

I think you want:
select trunc(start_time, 'hh') + (floor(extract(minute from start_time) / 15) * 15) * interval '1' minute as dte,
       sum(count(*)) over (order by min(start_time)) as faults
from t
group by trunc(start_time, 'hh') + (floor(extract(minute from start_time) / 15) * 15) * interval '1' minute
order by dte;
Here is a db<>fiddle.
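
As a quick sanity check of the bucket expression (a minimal sketch; note that EXTRACT(MINUTE FROM ...) requires a TIMESTAMP in Oracle, so a plain DATE column may need a CAST):

-- one-off check against DUAL: 23:36 should land in the 23:30 bucket
select trunc(ts, 'hh') + (floor(extract(minute from ts) / 15) * 15) * interval '1' minute as dte
from (select timestamp '2019-07-06 23:36:00' as ts from dual);
-- dte: 2019-07-06 23:30:00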

This query gives cumulative sums for existing quarters:
dbfiddle
select date '1900-01-01' + tm / 24 / 4 as tm,
       sum(sum(faults)) over (order by tm) as faults
from (select floor((start_time - date '1900-01-01') * 24 * 4) as tm, faults from mytable)
group by tm
If you want a wider date range, then generate it with a connect by query and join it with the above, filling the gaps using lag() with ignore nulls, like here:
with
  flts as (
    select date '1900-01-01' + tm / 24 / 4 as tm,
           sum(sum(faults)) over (order by tm) as faults
    from (select floor((start_time - date '1900-01-01') * 24 * 4) as tm, faults from mytable)
    group by tm),
  quarters as (
    select to_date('2019-07-06 23:00', 'yyyy-mm-dd hh24:mi') + level * interval '15' minute as tm
    from dual connect by level <= 10)
select to_char(tm, 'yyyy-mm-dd hh24:mi') as tm,
       nvl(faults, lag(faults, 1, 0) ignore nulls over (order by tm)) as faults
from quarters left join flts using (tm)
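
If you'd rather not hardcode level <= 10, the quarter count can be derived from the data's own span (a sketch under the same CTE name; mytable is from the question):

quarters as (
  select min_tm + (level - 1) * interval '15' minute as tm
  from (select trunc(min(start_time), 'hh') as min_tm,
               ceil((max(start_time) - trunc(min(start_time), 'hh')) * 24 * 4) + 1 as n
        from mytable)
  connect by level <= n)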

You can achieve it by grouping on dates truncated to 15 minutes and then using an analytic function to calculate the cumulative sum, as follows:
select start_time
         + 15/1440
         - mod((start_time - trunc(start_time)) * 1440, 15)/1440 as dte,
       sum(count(1)) over (order by min(start_time)) as faults
from your_table
group by start_time
         + 15/1440
         - mod((start_time - trunc(start_time)) * 1440, 15)/1440
It is the same bucketing as Gordon's answer, but done with date arithmetic; note that it labels each 15-minute bucket by its end time rather than its start. :)
Cheers!!
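
To trace the arithmetic on one value (a minimal sketch against DUAL): 23:36 is 1416 minutes into the day, mod(1416, 15) = 6, so the expression yields 23:36 + 15 min - 6 min = 23:45, the end of the [23:30, 23:45) bucket.

select ts + 15/1440 - mod((ts - trunc(ts)) * 1440, 15)/1440 as bucket_end
from (select to_date('06/07/19 23:36', 'dd/mm/yy hh24:mi') as ts from dual);
-- bucket_end: 06/07/19 23:45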

Related

Split row data based on timestamp - SQL Oracle

Good day everyone. I have a table as below. Duration is the time from the current state to the next state.

Timestamp             State  Duration (minutes)
10/9/2022 8:50:00 AM  A      35
10/9/2022 9:25:00 AM  B      10
10/9/2022 9:35:00 AM  C      ...

How do I split the data at 9:00 AM of each day, like below:

Timestamp             State  Duration (minutes)
10/9/2022 8:50:00 AM  A      10
10/9/2022 9:00:00 AM  A      25
10/9/2022 9:25:00 AM  B      10
10/9/2022 9:35:00 AM  C      ...
Thank you.
Use a row-generator function to generate extra rows when the timestamp is before 09:00 and the next timestamp is after 09:00 (and calculate the diff value rather than storing it in the table):
SELECT l.ts AS timestamp,
       t.state,
       ROUND((l.next_ts - l.ts) * 24 * 60, 2) AS diff
FROM (
  SELECT timestamp,
         LEAD(timestamp) OVER (ORDER BY timestamp) AS next_timestamp,
         state
  FROM   table_name
) t
CROSS APPLY (
  SELECT GREATEST(
           t.timestamp,
           TRUNC(t.timestamp - INTERVAL '9' HOUR) + INTERVAL '9' HOUR + LEVEL - 1
         ) AS ts,
         LEAST(
           t.next_timestamp,
           TRUNC(t.timestamp - INTERVAL '9' HOUR) + INTERVAL '9' HOUR + LEVEL
         ) AS next_ts
  FROM   DUAL
  CONNECT BY
         TRUNC(t.timestamp - INTERVAL '9' HOUR) + INTERVAL '9' HOUR + LEVEL - 1 < t.next_timestamp
) l;
Which, for your sample data:
CREATE TABLE table_name (Timestamp, State) AS
SELECT DATE '2022-10-09' + INTERVAL '08:50' HOUR TO MINUTE, 'A' FROM DUAL UNION ALL
SELECT DATE '2022-10-09' + INTERVAL '09:25' HOUR TO MINUTE, 'B' FROM DUAL UNION ALL
SELECT DATE '2022-10-09' + INTERVAL '09:35' HOUR TO MINUTE, 'C' FROM DUAL UNION ALL
SELECT DATE '2022-10-12' + INTERVAL '09:35' HOUR TO MINUTE, 'D' FROM DUAL;
Outputs:
TIMESTAMP            STATE  DIFF
2022-10-09 08:50:00  A      10
2022-10-09 09:00:00  A      25
2022-10-09 09:25:00  B      10
2022-10-09 09:35:00  C      1405
2022-10-10 09:00:00  C      1440
2022-10-11 09:00:00  C      1440
2022-10-12 09:00:00  C      35
2022-10-12 09:35:00  D      null
fiddle
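
To see what the row generator produces in isolation, here is the lateral part run for the long C interval from the sample data (a sketch with the interval's endpoints hardcoded; the slice boundaries fall on the 09:00 cut times):

WITH t AS (
  SELECT DATE '2022-10-09' + INTERVAL '09:35' HOUR TO MINUTE AS ts,
         DATE '2022-10-12' + INTERVAL '09:35' HOUR TO MINUTE AS next_ts
  FROM DUAL
)
SELECT GREATEST(t.ts, TRUNC(t.ts - INTERVAL '9' HOUR) + INTERVAL '9' HOUR + LEVEL - 1) AS ts,
       LEAST(t.next_ts, TRUNC(t.ts - INTERVAL '9' HOUR) + INTERVAL '9' HOUR + LEVEL) AS next_ts
FROM t
CONNECT BY TRUNC(t.ts - INTERVAL '9' HOUR) + INTERVAL '9' HOUR + LEVEL - 1 < t.next_ts;
-- yields the four slices behind the 1405, 1440, 1440 and 35 minute rows above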

Explode time duration defined by start and end timestamp by the hour

I have a table with work shifts (1 row per shift) that include date, start and end time.
Main goal: I want to aggregate the number of working hours per hour per store.
This is what my shift table looks like:
employee_id  store  start_timestamp   end_timestamp
1            1      2022-01-01T07:00  2022-01-01T11:30
2            1      2022-01-01T08:30  2022-01-01T12:30
...          ...    ...               ...
I want to "explode" the information into a table something like this:
hour   employee_id  store  date        scheduled_work (h)
07:00  1            1      2022-01-01  1
08:00  1            1      2022-01-01  1
09:00  1            1      2022-01-01  1
10:00  1            1      2022-01-01  1
11:00  1            1      2022-01-01  0.5
08:00  2            1      2022-01-01  0.5
09:00  2            1      2022-01-01  1
10:00  2            1      2022-01-01  1
11:00  2            1      2022-01-01  1
12:00  2            1      2022-01-01  0.5
...    ...          ...    ...         ...
I have tried a method using cross joins, but it consumed a lot of memory; it looks like this:
with test as (
  select 1 as employee_id, 1 as store_id, timestamp('2022-01-01 07:00:00') as start_timestamp, timestamp('2022-01-01 11:30:00') as end_timestamp union all
  select 2 as employee_id, 1 as store_id, timestamp('2022-01-01 08:30:00') as start_timestamp, timestamp('2022-01-01 12:30:00') as end_timestamp
),
cte as (
  select ts,
         test.*,
         safe_divide(
           timestamp_diff(
             least(date_add(ts, interval 1 hour), end_timestamp),
             greatest(ts, start_timestamp),
             millisecond
           ),
           3600000
         ) as scheduled_work
  from test
  cross join unnest(generate_timestamp_array(timestamp('2022-01-01 07:00:00'),
                                             timestamp('2022-01-01 12:30:00'), interval 1 hour)) as ts
  order by employee_id, ts
)
select * from cte
where scheduled_work >= 0;
It's working but I know this will not be good when the number of shifts starts to add up. Does anyone have another solution that is more efficient?
I'm using BigQuery.
You might want to remove the ORDER BY inside the cte subquery; it will hurt query performance.
And another similar approach:
WITH test AS (
select 1 as employee_id, 1 as store_id, timestamp('2022-01-01 07:00:00') as start_timestamp, timestamp('2022-01-01 11:30:00') as end_timestamp union all
select 2 as employee_id, 1 as store_id, timestamp('2022-01-01 08:30:00') as start_timestamp, timestamp('2022-01-01 12:30:00') as end_timestamp
),
explodes AS (
SELECT employee_id, store_id, EXTRACT(DATE FROM h) date, TIME_TRUNC(EXTRACT(TIME FROM h), HOUR) hour, 1 AS scheduled_work
FROM test,
UNNEST (GENERATE_TIMESTAMP_ARRAY(
TIMESTAMP_TRUNC(start_timestamp + INTERVAL 1 HOUR, HOUR),
TIMESTAMP_TRUNC(end_timestamp - INTERVAL 1 HOUR, HOUR), INTERVAL 1 HOUR
)) h
UNION ALL
SELECT employee_id, store_id, EXTRACT(DATE FROM h), TIME_TRUNC(EXTRACT(TIME FROM h), HOUR),
CASE offset
WHEN 0 THEN 1 - (EXTRACT(MINUTE FROM h) * 60 + EXTRACT(SECOND FROM h)) / 3600
WHEN 1 THEN (EXTRACT(MINUTE FROM h) * 60 + EXTRACT(SECOND FROM h)) / 3600
END
FROM test, UNNEST([start_timestamp, end_timestamp]) h WITH OFFSET
)
SELECT * FROM explodes WHERE scheduled_work > 0;
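
To trace the partial-hour arithmetic in the UNION ALL branch (a minimal standalone check): for the 08:30 start, offset 0 yields 1 - 1800/3600 = 0.5 of that hour, and for the 12:30 end, offset 1 yields 1800/3600 = 0.5.

SELECT 1 - (EXTRACT(MINUTE FROM TIMESTAMP '2022-01-01 08:30:00') * 60
            + EXTRACT(SECOND FROM TIMESTAMP '2022-01-01 08:30:00')) / 3600 AS start_hour_share,
       (EXTRACT(MINUTE FROM TIMESTAMP '2022-01-01 12:30:00') * 60
        + EXTRACT(SECOND FROM TIMESTAMP '2022-01-01 12:30:00')) / 3600 AS end_hour_share;
-- start_hour_share = 0.5, end_hour_share = 0.5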
Consider the approach below:
with temp as (
select * replace(
parse_time('%H:%M', start_time) as start_time,
parse_time('%H:%M', end_time) as end_time
)
from your_table
)
select * except(start_time, end_time),
case
when hour = time_trunc(start_time, hour) then (60 - time_diff(start_time, hour, minute)) / 60
when hour = time_trunc(end_time, hour) then time_diff(end_time, hour, minute) / 60
else 1
end as scheduled_work
from (
select time_add(time_trunc(start_time, hour), interval delta hour) as hour,
employee_id, store, date, start_time, end_time
from temp, unnest(generate_array(0,time_diff(end_time, start_time, hour))) delta
)
order by employee_id, hour
If applied to the sample data in your question, the output is: (result screenshot omitted)
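
Note this approach assumes your_table stores the clock times as 'HH:MM' strings (hence the parse_time('%H:%M', ...) calls) alongside separate employee_id, store and date columns; a hypothetical stand-in for testing could look like:

-- hypothetical seed data matching the assumed your_table shape
with your_table as (
  select 1 as employee_id, 1 as store, date '2022-01-01' as date,
         '07:00' as start_time, '11:30' as end_time union all
  select 2, 1, date '2022-01-01', '08:30', '12:30'
)
select * from your_table;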

How to fill the time gaps after grouping date records by month in Postgres

I have table records as follows:
date n_count
2020-02-19 00:00:00 4
2020-07-14 00:00:00 1
2020-07-17 00:00:00 1
2020-07-30 00:00:00 2
2020-08-03 00:00:00 1
2020-08-04 00:00:00 2
2020-08-25 00:00:00 2
2020-09-23 00:00:00 2
2020-09-30 00:00:00 3
2020-10-01 00:00:00 11
2020-10-05 00:00:00 12
2020-10-19 00:00:00 1
2020-10-20 00:00:00 1
2020-10-22 00:00:00 1
2020-11-02 00:00:00 376
2020-11-04 00:00:00 72
2020-11-11 00:00:00 1
I want to group all the records into months to find the monthly total count, which is working, but there is a missing month. How do I fill this gap?
time month_count
"2020-02-01" 4
"2020-07-01" 4
"2020-08-01" 5
"2020-09-01" 5
"2020-10-01" 26
"2020-11-01" 449
This is what I have tried.
SELECT date_trunc('month', date)::date AS time,
       sum(n_count) as month_count
FROM table1
group by time
order by time asc
You can use generate_series() to generate all month starts between the earliest and latest date available in the table, then bring in the table with a left join:
select d.dt, coalesce(sum(t.n_count), 0) as month_count
from (
select generate_series(date_trunc('month', min(date)), date_trunc('month', max(date)), '1 month') as dt
from table1
) as d(dt)
left join table1 t on t.date >= d.dt and t.date < d.dt + interval '1 month'
group by d.dt
order by d.dt
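
For reference, the generator on its own emits one row per month start (a minimal illustration; generate_series() returns timestamps when called with date bounds):

select generate_series(date '2020-02-01', date '2020-05-01', interval '1 month') as dt;
-- dt: 2020-02-01 00:00:00, 2020-03-01 00:00:00, 2020-04-01 00:00:00, 2020-05-01 00:00:00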
I would simply UNION a date series, generated from MIN and MAX date:
demo:db<>fiddle
WITH cte AS ( -- 1
SELECT
*,
date_trunc('month', date)::date AS time
FROM
t
)
SELECT
time,
SUM(n_count) as month_count --3
FROM (
SELECT
time,
n_count
FROM cte
UNION
SELECT -- 2
generate_series(
(SELECT MIN(time) FROM cte),
(SELECT MAX(time) FROM cte),
interval '1 month'
)::date,
0
) s
GROUP BY time
ORDER BY time
1. Use a CTE to calculate date_trunc() only once. It could be left out if you prefer to reference your table twice in the UNION below.
2. Generate a monthly date series from the MIN to the MAX date, with n_count = 0, and add it to the table data.
3. Do your aggregation.

SQL Query calculating two additional columns

I have a table which gets populated daily with the database size. I need to modify the query so that I can calculate daily growth and weekly growth.
select * from sys.dbsize
where SNAP_TIME > sysdate -3
order by SNAP_TIME
Current output: (screenshot omitted)
I would like to add two additional columns:
Daily Growth: DB_SIZE(sysdate) - DB_SIZE(sysdate - 1)
Weekly Growth: DB_SIZE(sysdate) - DB_SIZE(sysdate - 7)
Need some help constructing the SQL for those two additional columns. Any help will be greatly appreciated.
Thanks,
One option is to use the LAG analytic function to calculate daily growth, and a correlated subquery (within the SELECT statement) for weekly growth.
For example:
with dbsize (snap_time, db_size) as
  (select sysdate - 8, 100 from dual union all
   select sysdate - 7, 110 from dual union all
   select sysdate - 6, 105 from dual union all
   select sysdate - 5, 120 from dual union all
   select sysdate - 4, 130 from dual union all
   select sysdate - 3, 130 from dual union all
   select sysdate - 2, 142 from dual union all
   select sysdate - 1, 144 from dual union all
   select sysdate - 0, 150 from dual
  )
select
  a.snap_time,
  a.db_size,
  a.db_size - lag(a.db_size) over (order by a.snap_time) daily_growth,
  db_size - (select db_size from dbsize b
             where trunc(b.snap_time) = trunc(a.snap_time) - 7
            ) weekly_growth
from dbsize a
order by a.snap_time;

SNAP_TIME              DB_SIZE DAILY_GROWTH WEEKLY_GROWTH
------------------- ---------- ------------ -------------
24.08.2020 21:52:20        100
25.08.2020 21:52:20        110           10
26.08.2020 21:52:20        105           -5
27.08.2020 21:52:20        120           15
28.08.2020 21:52:20        130           10
29.08.2020 21:52:20        130            0
30.08.2020 21:52:20        142           12
31.08.2020 21:52:20        144            2            44
01.09.2020 21:52:20        150            6            40

9 rows selected.
I would recommend lag() for both columns:
select s.*,
       (dbsize - dbsize_1) as daily_growth,
       (dbsize - dbsize_7) as weekly_growth
from (select s.*,
             lag(dbsize) over (order by snap_time) as dbsize_1,
             lag(dbsize, 7) over (order by snap_time) as dbsize_7
      from sys.dbsize s
     ) s
where snap_time > sysdate - 3
order by snap_time;
If you don't have a snapshot each day, you can handle this with a window frame:
select s.*,
       (dbsize - dbsize_1) as daily_growth,
       (dbsize - dbsize_7) as weekly_growth
from (select s.*,
             max(dbsize) over (order by trunc(snap_time)
                               range between interval '1' day preceding and interval '1' second preceding) as dbsize_1,
             max(dbsize) over (order by trunc(snap_time)
                               range between interval '7' day preceding and interval '6 1' day to hour preceding) as dbsize_7
      from sys.dbsize s
     ) s
where snap_time > sysdate - 3
order by snap_time;
If there is always one record per day, you can use lag():
select
  snap_time,
  db_size,
  db_size - lag(db_size, 1) over(order by snap_time) daily_growth,
  db_size - lag(db_size, 7) over(order by snap_time) weekly_growth
from sys.dbsize
order by snap_time
This actually looks 1 row back and 7 rows back. If there are missing dates, or multiple records per day, then you could average the snap size by day, and use a window range in the window function:
select
  trunc(snap_time) snap_day,
  avg(db_size) avg_db_size,
  avg(db_size) - avg(avg(db_size)) over(
    order by trunc(snap_time)
    range between interval '1' day preceding and interval '1' day preceding
  ) daily_growth,
  avg(db_size) - avg(avg(db_size)) over(
    order by trunc(snap_time)
    range between interval '7' day preceding and interval '7' day preceding
  ) weekly_growth
from sys.dbsize
group by trunc(snap_time)
order by trunc(snap_time)
If you want the results for the last 3 days only, you can turn any of the two above queries to subqueries, and filter in the outer query:
select *
from ( ... ) t
where snap_time > sysdate - 3 -- or: snap_day > trunc(sysdate) - 3

Cross join for time series postgresql query

I have a table Items with:

Item_id  Item_time            Item_numbers
1        2017-01-01 18:00:00  2
2        2017-01-01 18:10:00  2
3        2017-01-01 19:10:00  3
I want to group the items by hour for a specific window each day (e.g. between hours 9 and 15), and if there is no entry for a particular hour it should be 0.
Desired Output:
Item_time            Item_numbers
2017-01-01 18:00:00  4
2017-01-01 19:00:00  3
2017-01-01 20:00:00  0
with hour_items as (
  select date_trunc('hour', item_time) as "hour",
         avg(item_numbers) as value
  from items
  where item_id = 2
    and item_time::date = '2017-01-01'
  group by hour
)
select hour, value
from hour_items
where extract(hour from hour) >= 9
  and extract(hour from hour) < 15;
The above query groups them correctly, but where an hour is missing there is no row at all, though there should be a row with 0, as shown in the desired output.
This should do.
We get all the distinct days (CTE dates), then we generate the hours for each of those days (CTE hours), and finally we left join our data on a per-hour basis.
with sample_data as (
  select 1 as item_id, '2018-01-01 12:03:15'::timestamp as item_time, 2 as item_numbers
  union all
  select 2 as item_id, '2018-01-01 12:41:15'::timestamp as item_time, 1 as item_numbers
  union all
  select 3 as item_id, '2018-01-01 17:41:15'::timestamp as item_time, 2 as item_numbers
  union all
  select 4 as item_id, '2018-01-01 19:41:15'::timestamp as item_time, 2 as item_numbers
),
dates as (
  -- one row per distinct day present in the data
  select distinct item_time::date as day
  from sample_data
),
hours as (
  -- 24 hourly slots for each of those days
  select day + interval '1 hour' * a as hour
  from dates
  cross join generate_series(0, 23) a
)
select h.hour, sum(coalesce(sd.item_numbers, 0)) as item_numbers
from hours h
left join sample_data sd on h.hour = date_trunc('hour', sd.item_time)
where extract(hour from hour) between 9 and 17
group by h.hour
order by h.hour;
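
As a small variant (a sketch reusing the same CTE names), you could generate only the 9-17 slots directly and drop the outer extract() filter:

hours as (
  -- only the hourly slots between 09:00 and 17:00 for each day
  select day + interval '1 hour' * a as hour
  from dates
  cross join generate_series(9, 17) a
)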