SQL - group by issue

I have a table that has records in the form (simplified): ID (int), startTime (DateTime), endTime (DateTime).
I want to be able to group records whose time durations "overlap", minute by minute.
Ex:
1 - 12.00 AM - 12.10 AM (duration here is 10 minutes)
2 - 12.05 AM - 12.07 AM (duration here is 2 minutes, but it overlaps with record ID = 1 in minutes 05, 06, 07)
The result of such a query should be
minute 12.00 - record 1,
minute 12.01 - record 1,
...
minute 12.05 - record 1 + record 2,
minute 12.06 - record 1 + record 2,
minute 12.07 - record 1 + record 2
...
minute 12.10 - record 1
Note: I use SQL Server (2005 upwards).

This is one way to do it in Oracle (11g Release 2 as it includes the LISTAGG function):
with CTE as (
  select STRT + (rownum - 1) / 24 / 60 as TIMES
  from (select min(STARTTIME) as STRT from FORM1)
  connect by level <=
    (select (max(ENDTIME) - min(STARTTIME)) * 24 * 60
     from FORM1)
)
select to_char(CTE.TIMES, 'hh24:mi') as MINUTE
     , LISTAGG(ID, ',') within group (order by ID) as IDS
from CTE
join FORM1
  on CTE.TIMES <= FORM1.ENDTIME and CTE.TIMES >= FORM1.STARTTIME
group by to_char(CTE.TIMES, 'hh24:mi')
order by to_char(CTE.TIMES, 'hh24:mi')
The test data I used was:
create table FORM1
(
ID number
,STARTTIME date
,ENDTIME date
);
insert into FORM1
values (
1
,to_date('26/01/2012 00:00:00', 'dd/mm/yyyy hh24:mi:ss')
,to_date('26/01/2012 00:10:00', 'dd/mm/yyyy hh24:mi:ss'));
insert into FORM1
values (
2
,to_date('26/01/2012 00:05:00', 'dd/mm/yyyy hh24:mi:ss')
,to_date('26/01/2012 00:07:00', 'dd/mm/yyyy hh24:mi:ss'));
And I get the following result:
Minute IDs
00:00 1
00:01 1
00:02 1
00:03 1
00:04 1
00:05 1,2
00:06 1,2
00:07 1,2
00:08 1
00:09 1
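Since the original question is about SQL Server (2005 upwards), a rough equivalent sketch for that platform would use a recursive CTE as the minute generator and FOR XML PATH for the string aggregation (STRING_AGG only arrived in SQL Server 2017). The table and column names (Form1, ID, startTime, endTime) are assumed from the question and the test data above, and the query is untested:
;with Minutes as (
    select min(startTime) as m, max(endTime) as mx from Form1
    union all
    select dateadd(minute, 1, m), mx from Minutes where m < mx
)
select convert(varchar(5), mn.m, 108) as [Minute],
       stuff((select ',' + cast(f.ID as varchar(10))
              from Form1 f
              where mn.m between f.startTime and f.endTime
              order by f.ID
              for xml path('')), 1, 1, '') as IDs
from Minutes mn
order by mn.m
option (maxrecursion 0); -- allow more than 100 generated minutes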

Related

Split row data based on timestamp SQL Oracle

Good day everyone. I have a table as below. Duration is the time from current state to next state.
Timestamp             State  Duration (minutes)
10/9/2022 8:50:00 AM  A      35
10/9/2022 9:25:00 AM  B      10
10/9/2022 9:35:00 AM  C      ...
How do I split data at 9:00 AM of each day like below:
Timestamp             State  Duration (minutes)
10/9/2022 8:50:00 AM  A      10
10/9/2022 9:00:00 AM  A      25
10/9/2022 9:25:00 AM  B      10
10/9/2022 9:35:00 AM  C      ...
Thank you.
Use a row-generator function to generate extra rows when the timestamp is before 09:00 and the next timestamp is after 09:00 (and calculate the diff value rather than storing it in the table):
SELECT l.ts AS timestamp,
       t.state,
       ROUND((l.next_ts - l.ts) * 24 * 60, 2) AS diff
FROM   (
         SELECT timestamp,
                LEAD(timestamp) OVER (ORDER BY timestamp) AS next_timestamp,
                state
         FROM   table_name
       ) t
       CROSS APPLY (
         SELECT GREATEST(
                  t.timestamp,
                  TRUNC(t.timestamp - INTERVAL '9' HOUR) + INTERVAL '9' HOUR + LEVEL - 1
                ) AS ts,
                LEAST(
                  t.next_timestamp,
                  TRUNC(t.timestamp - INTERVAL '9' HOUR) + INTERVAL '9' HOUR + LEVEL
                ) AS next_ts
         FROM   DUAL
         CONNECT BY
                TRUNC(t.timestamp - INTERVAL '9' HOUR) + INTERVAL '9' HOUR + LEVEL - 1 < t.next_timestamp
       ) l;
Which, for your sample data:
CREATE TABLE table_name (Timestamp, State) AS
SELECT DATE '2022-10-09' + INTERVAL '08:50' HOUR TO MINUTE, 'A' FROM DUAL UNION ALL
SELECT DATE '2022-10-09' + INTERVAL '09:25' HOUR TO MINUTE, 'B' FROM DUAL UNION ALL
SELECT DATE '2022-10-09' + INTERVAL '09:35' HOUR TO MINUTE, 'C' FROM DUAL UNION ALL
SELECT DATE '2022-10-12' + INTERVAL '09:35' HOUR TO MINUTE, 'D' FROM DUAL;
Outputs:
TIMESTAMP            STATE  DIFF
2022-10-09 08:50:00  A      10
2022-10-09 09:00:00  A      25
2022-10-09 09:25:00  B      10
2022-10-09 09:35:00  C      1405
2022-10-10 09:00:00  C      1440
2022-10-11 09:00:00  C      1440
2022-10-12 09:00:00  C      35
2022-10-12 09:35:00  D      null
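The key building block is TRUNC(ts - INTERVAL '9' HOUR) + INTERVAL '9' HOUR, which returns the most recent 09:00 at or before ts; adding LEVEL days then steps forward one day boundary at a time until next_timestamp is reached. A small stand-alone check of just that expression (illustrative only):
SELECT ts,
       TRUNC(ts - INTERVAL '9' HOUR) + INTERVAL '9' HOUR AS prev_9am
FROM (
  SELECT DATE '2022-10-09' + INTERVAL '08:50' HOUR TO MINUTE AS ts FROM DUAL UNION ALL
  SELECT DATE '2022-10-09' + INTERVAL '09:25' HOUR TO MINUTE FROM DUAL
);
-- 2022-10-09 08:50 -> 2022-10-08 09:00
-- 2022-10-09 09:25 -> 2022-10-09 09:00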

Explode time duration defined by start and end timestamp by the hour

I have a table with work shifts (1 row per shift) that include date, start and end time.
Main goal: I want to aggregate the number of working hours per hour per store.
This is what my shift table looks like:
employee_id  store  start_timestamp   end_timestamp
1            1      2022-01-01T07:00  2022-01-01T11:30
2            1      2022-01-01T08:30  2022-01-01T12:30
...          ...    ...               ...
I want to "explode" the information into a table something like this:
hour   employee_id  store  date        scheduled_work (h)
07:00  1            1      2022-01-01  1
08:00  1            1      2022-01-01  1
09:00  1            1      2022-01-01  1
10:00  1            1      2022-01-01  1
11:00  1            1      2022-01-01  0.5
08:00  2            1      2022-01-01  0.5
09:00  2            1      2022-01-01  1
10:00  2            1      2022-01-01  1
11:00  2            1      2022-01-01  1
12:00  2            1      2022-01-01  0.5
...    ...          ...    ...         ...
I have tried a method using cross joins, but it consumed a lot of memory. It looks like this:
with test as (
select 1 as employee_id, 1 as store_id, timestamp('2022-01-01 07:00:00') as start_timestamp, timestamp('2022-01-01 11:30:00') as end_timestamp union all
select 2 as employee_id, 1 as store_id, timestamp('2022-01-01 08:30:00') as start_timestamp, timestamp('2022-01-01 12:30:00') as end_timestamp
)
, cte as (
select ts
, test.*
, safe_divide(
timestamp_diff(
least(date_add(ts, interval 1 hour), end_timestamp)
, greatest(ts, start_timestamp)
, millisecond
)
, 3600000
) as scheduled_work
from test
cross join unnest(generate_timestamp_array(timestamp('2022-01-01 07:00:00'),
timestamp('2022-01-01 12:30:00'), interval 1 hour)) as ts
order by employee_id, ts)
select * from cte
where scheduled_work >= 0;
It's working but I know this will not be good when the number of shifts starts to add up. Does anyone have another solution that is more efficient?
I'm using BigQuery.
You might want to remove the ORDER BY inside the cte subquery; it will affect query performance.
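As a further tweak, one way to cut the cost of that cross join is to generate the hourly array per row, from each shift's own start and end timestamps, instead of one global hard-coded array. A sketch based on the query in the question (untested; it uses TIMESTAMP_TRUNC and TIMESTAMP_ADD, and needs no WHERE filter since no negative durations are produced):
with test as (
  select 1 as employee_id, 1 as store_id, timestamp('2022-01-01 07:00:00') as start_timestamp, timestamp('2022-01-01 11:30:00') as end_timestamp union all
  select 2 as employee_id, 1 as store_id, timestamp('2022-01-01 08:30:00') as start_timestamp, timestamp('2022-01-01 12:30:00') as end_timestamp
)
select ts
     , test.*
     , safe_divide(
         timestamp_diff(
           least(timestamp_add(ts, interval 1 hour), end_timestamp)
         , greatest(ts, start_timestamp)
         , millisecond)
       , 3600000
       ) as scheduled_work
from test
cross join unnest(generate_timestamp_array(
    timestamp_trunc(start_timestamp, hour),   -- first hour bucket touched by the shift
    timestamp_trunc(end_timestamp, hour),     -- last hour bucket touched by the shift
    interval 1 hour)) as ts;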
And another similar approach:
WITH test AS (
select 1 as employee_id, 1 as store_id, timestamp('2022-01-01 07:00:00') as start_timestamp, timestamp('2022-01-01 11:30:00') as end_timestamp union all
select 2 as employee_id, 1 as store_id, timestamp('2022-01-01 08:30:00') as start_timestamp, timestamp('2022-01-01 12:30:00') as end_timestamp
),
explodes AS (
SELECT employee_id, store_id, EXTRACT(DATE FROM h) date, TIME_TRUNC(EXTRACT(TIME FROM h), HOUR) hour, 1 AS scheduled_work
FROM test,
UNNEST (GENERATE_TIMESTAMP_ARRAY(
TIMESTAMP_TRUNC(start_timestamp + INTERVAL 1 HOUR, HOUR),
TIMESTAMP_TRUNC(end_timestamp - INTERVAL 1 HOUR, HOUR), INTERVAL 1 HOUR
)) h
UNION ALL
SELECT employee_id, store_id, EXTRACT(DATE FROM h), TIME_TRUNC(EXTRACT(TIME FROM h), HOUR),
CASE offset
WHEN 0 THEN 1 - (EXTRACT(MINUTE FROM h) * 60 + EXTRACT(SECOND FROM h)) / 3600
WHEN 1 THEN (EXTRACT(MINUTE FROM h) * 60 + EXTRACT(SECOND FROM h)) / 3600
END
FROM test, UNNEST([start_timestamp, end_timestamp]) h WITH OFFSET
)
SELECT * FROM explodes WHERE scheduled_work > 0;
Consider the below approach:
with temp as (
select * replace(
parse_time('%H:%M', start_time) as start_time,
parse_time('%H:%M', end_time) as end_time
)
from your_table
)
select * except(start_time, end_time),
case
when hour = time_trunc(start_time, hour) then (60 - time_diff(start_time, hour, minute)) / 60
when hour = time_trunc(end_time, hour) then time_diff(end_time, hour, minute) / 60
else 1
end as scheduled_work
from (
select time_add(time_trunc(start_time, hour), interval delta hour) as hour,
employee_id, store, date, start_time, end_time
from temp, unnest(generate_array(0,time_diff(end_time, start_time, hour))) delta
)
order by employee_id, hour
If applied to sample data as in your question, the output is the expected result (screenshot not reproduced here).
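Whichever exploded form you use, the stated main goal (working hours per hour per store) is then just one more aggregation on top of it. A sketch reusing the explodes CTE from the UNION ALL approach above (column names as defined there):
select date, hour, store_id, sum(scheduled_work) as scheduled_hours
from explodes
group by date, hour, store_id
order by store_id, date, hour;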

SQL Query calculating two additional columns

I have a table which gets populated daily with the database size. I need to modify the query so that I can calculate daily growth and weekly growth.
select * from sys.dbsize
where SNAP_TIME > sysdate -3
order by SNAP_TIME
Current output: (shown as a screenshot in the original question)
I would like to add two additional columns which would be
Daily Growth (DB_SIZE sysdate - DB_SIZE (sysdate -1))
Weekly Growth (DB_SIZE sysdate - DB_SIZE (sysdate -7))
Need some help constructing the SQL for those two additional columns. Any help will be greatly appreciated.
Thanks,
One option is to use LAG analytic function to calculate daily growth and correlated subquery (within the SELECT statement) for weekly growth.
For example:
with dbsize (snap_time, db_size) as
  (select sysdate - 8, 100 from dual union all
   select sysdate - 7, 110 from dual union all
   select sysdate - 6, 105 from dual union all
   select sysdate - 5, 120 from dual union all
   select sysdate - 4, 130 from dual union all
   select sysdate - 3, 130 from dual union all
   select sysdate - 2, 142 from dual union all
   select sysdate - 1, 144 from dual union all
   select sysdate - 0, 150 from dual
  )
select
  a.snap_time,
  a.db_size,
  a.db_size - lag(a.db_size) over (order by a.snap_time) daily_growth,
  --
  db_size - (select db_size from dbsize b
             where trunc(b.snap_time) = trunc(a.snap_time) - 7
            ) weekly_growth
from dbsize a
order by a.snap_time;
SNAP_TIME DB_SIZE DAILY_GROWTH WEEKLY_GROWTH
------------------- ---------- ------------ -------------
24.08.2020 21:52:20 100
25.08.2020 21:52:20 110 10
26.08.2020 21:52:20 105 -5
27.08.2020 21:52:20 120 15
28.08.2020 21:52:20 130 10
29.08.2020 21:52:20 130 0
30.08.2020 21:52:20 142 12
31.08.2020 21:52:20 144 2 44
01.09.2020 21:52:20 150 6 40
9 rows selected.
I would recommend lag() for both columns:
select s.*,
(dbsize - dbsize_1) as daily_growth,
(dbsize - dbsize_7) as weekly_growth
from (select s.*,
lag(dbsize) over (order by snap_time) as dbsize_1,
lag(dbsize, 7) over (order by snap_time) as dbsize_7
from sys.dbsize s
) s
where SNAP_TIME > sysdate -3
order by SNAP_TIME;
If you don't have a snapshot for every day, you can handle this with window frames instead of fixed row offsets:
select s.*,
       (dbsize - dbsize_1) as daily_growth,
       (dbsize - dbsize_7) as weekly_growth
from (select s.*,
             max(dbsize) over (order by trunc(snap_time)
                 range between interval '1' day preceding and interval '1' day preceding) as dbsize_1,
             max(dbsize) over (order by trunc(snap_time)
                 range between interval '7' day preceding and interval '7' day preceding) as dbsize_7
      from sys.dbsize s
     ) s
where SNAP_TIME > sysdate - 3
order by SNAP_TIME;
If there is always one record per day, you can use lag():
select
    snap_time,
    db_size,
    db_size - lag(db_size, 1) over(order by snap_time) daily_growth,
    db_size - lag(db_size, 7) over(order by snap_time) weekly_growth
from sys.dbsize
order by snap_time
This actually looks 1 row back and 7 rows back. If there are missing dates, or multiple records per day, then you could average the snap size by day, and use a window range in the window function:
select
    trunc(snap_time) snap_day,
    avg(db_size) avg_db_size,
    avg(db_size) - avg(avg(db_size)) over(
        order by trunc(snap_time)
        range between interval '1' day preceding and interval '1' day preceding
    ) daily_growth,
    avg(db_size) - avg(avg(db_size)) over(
        order by trunc(snap_time)
        range between interval '7' day preceding and interval '7' day preceding
    ) weekly_growth
from sys.dbsize
group by trunc(snap_time)
order by trunc(snap_time)
If you want the results for the last 3 days only, you can turn either of the two queries above into a subquery and filter in the outer query:
select *
from ( ... ) t
where snap_time > sysdate - 3 -- or: snap_day > trunc(sysdate) - 3
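For example, wrapping the simple lag() version (same table and column names as above):
select *
from (
    select
        snap_time,
        db_size,
        db_size - lag(db_size, 1) over(order by snap_time) daily_growth,
        db_size - lag(db_size, 7) over(order by snap_time) weekly_growth
    from sys.dbsize
) t
where snap_time > sysdate - 3
order by snap_time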

Cumulative sum group by 15 min interval - Oracle SQL

I would like to get the cumulative sum, grouped by 15-minute interval.
E.g.:
table name: myTable
id name start_time faults
============================================
1 a 06/07/19 23:30 1
2 b 06/07/19 23:35 1
3 c 06/07/19 23:36 1
4 d 06/07/19 23:50 1
5 e 06/07/19 23:54 1
6 f 07/07/19 00:05 1
7 g 07/07/19 00:20 1
8 h 07/07/19 00:25 1
Result:
start_Time faults
============================================
06/07/19 23:15 0
06/07/19 23:30 3
06/07/19 23:45 5
07/07/19 00:00 6
07/07/19 00:15 8
07/07/19 00:30 8
07/07/19 00:45 8
07/07/19 01:00 8
thanks
I think you want:
select trunc(start_time, 'hh') + (floor(extract(minute from start_time) / 15) * 15) * interval '1' minute as dte,
sum(count(*)) over (order by min(start_time))
from t
group by trunc(start_time, 'hh') + (floor(extract(minute from start_time) / 15) * 15) * interval '1' minute
order by dte;
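To see what the bucketing expression is doing, here is a quick stand-alone check on one of the sample values (illustrative only; note that EXTRACT(MINUTE FROM ...) needs a TIMESTAMP rather than a DATE, so start_time is assumed to be a timestamp column):
select trunc(ts, 'hh')
       + (floor(extract(minute from ts) / 15) * 15) * interval '1' minute as quarter_start
from (select timestamp '2019-07-06 23:36:00' as ts from dual);
-- 06/07/19 23:30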
This query gives cumulative sums for existing quarters:
select date '1900-01-01' + tm / 24 / 4 tm, sum(sum(faults)) over (order by tm) faults
from (select floor((start_time - date '1900-01-01') * 24 * 4) tm, faults from mytable)
group by tm
If you want a wider date range, then generate it using a connect by query and join it with the above, using lag() with ignore nulls, like here:
with
flts as (
select date '1900-01-01' + tm / 24 / 4 tm, sum(sum(faults)) over (order by tm) faults
from (select floor((start_time - date '1900-01-01') * 24 * 4) tm, faults from mytable)
group by tm),
quarters as (
select to_date('2019-07-06 23:00', 'yyyy-mm-dd hh24:mi') + level * interval '15' minute tm
from dual connect by level <= 10 )
select to_char(tm, 'yyyy-mm-dd hh24:mi') tm,
nvl(faults, lag(faults, 1, 0) ignore nulls over (order by tm))
from quarters left join flts using (tm)
You can achieve it using group by with dates truncated to 15 mins and then using analytical function to calculate the cumulative sum as following:
Select
start_time
+ 15/1440
- mod((start_time - trunc(start_time)) * 1440, 15)/1440,
sum(count(1)) over (order by min(start_time))
from your_table
Group by start_time
+ 15/1440
- mod((start_time - trunc(start_time)) * 1440, 15)/1440
It is the same as Gordon's answer, but using arithmetic. :)
Cheers!!

Group by with Unix time stamps

I am trying to write a query where the time stamps are in Unix format.
The objective of the query is to group these time stamps into five-minute segments and to count each unique Id in those segments.
Is there a simple way of doing this?
The result I am looking for is this:
Time_utc         Id  count
25/07/2019 1600  1   3
25/07/2019 1600  2   1
25/07/2019 1605  1   4
You haven't shown data, so as a starting point you can group the Unix timestamps by dividing by 300 (for 5 minutes worth of seconds):
select 300 * floor(unix_ts/300) as unix_five_minute,
timestamp '1970-01-01 00:00:00 UTC'
+ (300*floor(unix_ts/300)) * interval '1' second as oracle_timestamp,
count(*)
from cte2
group by floor(unix_ts/300);
or if you have millisecond precision adjust by a factor of 1000:
select 300000 * floor(unix_ts/300000) as unix_five_minute,
timestamp '1970-01-01 00:00:00 UTC'
+ (300*floor(unix_ts/300000)) * interval '1' second as oracle_timestamp,
count(*)
from cte2
group by floor(unix_ts/300000);
Demo using made-up data generated from current time:
-- CTEs to generate some sample data
with cte1 (oracle_interval) as (
select systimestamp - level * interval '42' second
- timestamp '1970-01-01 00:00:00.0 UTC'
from dual
connect by level <= 30
),
cte2 (unix_ts) as (
select trunc(
extract(day from oracle_interval) * 86400000
+ extract(hour from oracle_interval) * 3600000
+ extract(minute from oracle_interval) * 60000
+ extract(second from oracle_interval) * 1000
)
from cte1
)
-- actual query
select 300000 * floor(unix_ts/300000) as unix_five_minute,
timestamp '1970-01-01 00:00:00 UTC'
+ (300*floor(unix_ts/300000)) * interval '1' second as oracle_timestamp,
count(*)
from cte2
group by floor(unix_ts/300000);
UNIX_FIVE_MINUTE ORACLE_TIMESTAMP COUNT(*)
---------------- ------------------------- ----------------
1564072500000 2019-07-25 16:35:00.0 UTC 7
1564072200000 2019-07-25 16:30:00.0 UTC 7
1564071600000 2019-07-25 16:20:00.0 UTC 4
1564071900000 2019-07-25 16:25:00.0 UTC 8
1564072800000 2019-07-25 16:40:00.0 UTC 4
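The desired output in the question also breaks the counts down per Id. Assuming the real table has an id column (it is not shown in the sample data here; table and column names below are placeholders), you would just add it to the grouping:
select 300 * floor(unix_ts/300) as unix_five_minute,
       id,
       count(*)
from your_table
group by floor(unix_ts/300), id;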
Unix time stamps such as 155639.600 or 155639.637
Those are unusual values; Unix/epoch times are usually 10-digit numbers, or 13 digits for millisecond precision. Assuming (or rather, guessing) that they are epoch seconds scaled down by a factor of 10,000 for some reason:
-- CTE for sample data
with cte (unix_ts) as (
select 155639.600 from dual
union all
select 155639.637 from dual
)
-- actual query
select 300 * floor(unix_ts*10000/300) as unix_five_minute,
timestamp '1970-01-01 00:00:00 UTC'
+ (300*floor(unix_ts*10000/300)) * interval '1' second as oracle_timestamp,
count(*)
from cte
group by floor(unix_ts*10000/300);
UNIX_FIVE_MINUTE ORACLE_TIMESTAMP COUNT(*)
---------------- ------------------------- ----------------
1556396100 2019-04-27 20:15:00.0 UTC 1
1556395800 2019-04-27 20:10:00.0 UTC 1
The 10000/300 could be simplified to 100/3, but I think it's clearer left as it is.