I have a table with work shifts (1 row per shift) that include date, start and end time.
Main goal: I want to aggregate the number of working hours per hour per store.
This is what my shift table looks like:
| employee_id | store | start_timestamp  | end_timestamp    |
|-------------|-------|------------------|------------------|
| 1           | 1     | 2022-01-01T07:00 | 2022-01-01T11:30 |
| 2           | 1     | 2022-01-01T08:30 | 2022-01-01T12:30 |
| ...         | ...   | ...              | ...              |
I want to "explode" the information into a table something like this:
| hour  | employee_id | store | date       | scheduled_work (h) |
|-------|-------------|-------|------------|--------------------|
| 07:00 | 1           | 1     | 2022-01-01 | 1                  |
| 08:00 | 1           | 1     | 2022-01-01 | 1                  |
| 09:00 | 1           | 1     | 2022-01-01 | 1                  |
| 10:00 | 1           | 1     | 2022-01-01 | 1                  |
| 11:00 | 1           | 1     | 2022-01-01 | 0.5                |
| 08:00 | 2           | 1     | 2022-01-01 | 0.5                |
| 09:00 | 2           | 1     | 2022-01-01 | 1                  |
| 10:00 | 2           | 1     | 2022-01-01 | 1                  |
| 11:00 | 2           | 1     | 2022-01-01 | 1                  |
| 12:00 | 2           | 1     | 2022-01-01 | 0.5                |
| ...   | ...         | ...   | ...        | ...                |
I have tried a method using cross joins, but it consumed a lot of memory. It looks like this:
with test as (
select 1 as employee_id, 1 as store_id, timestamp('2022-01-01 07:00:00') as start_timestamp, timestamp('2022-01-01 11:30:00') as end_timestamp union all
select 2 as employee_id, 1 as store_id, timestamp('2022-01-01 08:30:00') as start_timestamp, timestamp('2022-01-01 12:30:00') as end_timestamp
)
, cte as (
select ts
, test.*
, safe_divide(
timestamp_diff(
least(date_add(ts, interval 1 hour), end_timestamp)
, greatest(ts, start_timestamp)
, millisecond
)
, 3600000
) as scheduled_work
from test
cross join unnest(generate_timestamp_array(timestamp('2022-01-01 07:00:00'),
timestamp('2022-01-01 12:30:00'), interval 1 hour)) as ts
order by employee_id, ts)
select * from cte
where scheduled_work >= 0;
It works, but I know this will not perform well once the number of shifts starts to add up. Does anyone have a more efficient solution?
I'm using BigQuery.
You might want to remove the ORDER BY inside the cte subquery; it will hurt query performance.
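For reference, a minimal sketch of how the tail of that query changes (the CTE body stays the same; sorting, if still wanted, moves to the final SELECT):

from test
cross join unnest(generate_timestamp_array(timestamp('2022-01-01 07:00:00'),
    timestamp('2022-01-01 12:30:00'), interval 1 hour)) as ts
)
select * from cte
where scheduled_work >= 0
order by employee_id, ts;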
And another similar approach:
WITH test AS (
select 1 as employee_id, 1 as store_id, timestamp('2022-01-01 07:00:00') as start_timestamp, timestamp('2022-01-01 11:30:00') as end_timestamp union all
select 2 as employee_id, 1 as store_id, timestamp('2022-01-01 08:30:00') as start_timestamp, timestamp('2022-01-01 12:30:00') as end_timestamp
),
explodes AS (
SELECT employee_id, store_id, EXTRACT(DATE FROM h) date, TIME_TRUNC(EXTRACT(TIME FROM h), HOUR) hour, 1 AS scheduled_work
FROM test,
UNNEST (GENERATE_TIMESTAMP_ARRAY(
TIMESTAMP_TRUNC(start_timestamp + INTERVAL 1 HOUR, HOUR),
TIMESTAMP_TRUNC(end_timestamp - INTERVAL 1 HOUR, HOUR), INTERVAL 1 HOUR
)) h
UNION ALL
SELECT employee_id, store_id, EXTRACT(DATE FROM h), TIME_TRUNC(EXTRACT(TIME FROM h), HOUR),
CASE offset
WHEN 0 THEN 1 - (EXTRACT(MINUTE FROM h) * 60 + EXTRACT(SECOND FROM h)) / 3600
WHEN 1 THEN (EXTRACT(MINUTE FROM h) * 60 + EXTRACT(SECOND FROM h)) / 3600
END
FROM test, UNNEST([start_timestamp, end_timestamp]) h WITH OFFSET
)
SELECT * FROM explodes WHERE scheduled_work > 0;
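Either way, once the shifts are exploded per employee, the stated main goal (scheduled hours per hour per store) is one aggregation away. A minimal sketch, replacing the final SELECT of the query above:

SELECT store_id, date, hour, SUM(scheduled_work) AS scheduled_hours
FROM explodes
WHERE scheduled_work > 0
GROUP BY store_id, date, hour;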
Consider the approach below:
with temp as (
select * replace(
parse_time('%H:%M', start_time) as start_time,
parse_time('%H:%M', end_time) as end_time
)
from your_table
)
select * except(start_time, end_time),
case
when hour = time_trunc(start_time, hour) then (60 - time_diff(start_time, hour, minute)) / 60
when hour = time_trunc(end_time, hour) then time_diff(end_time, hour, minute) / 60
else 1
end as scheduled_work
from (
select time_add(time_trunc(start_time, hour), interval delta hour) as hour,
employee_id, store, date, start_time, end_time
from temp, unnest(generate_array(0,time_diff(end_time, start_time, hour))) delta
)
order by employee_id, hour
If applied to the sample data in your question, the output matches the expected table shown there.
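Note this approach assumes start_time and end_time arrive as 'HH:MM' strings alongside separate date and store columns, mirroring the question's rendered table. With the timestamp schema actually shown, the temp CTE could be derived first; a sketch, with your_table as a placeholder name (shifts crossing midnight would need extra handling):

with temp as (
  select employee_id,
         store,
         date(start_timestamp) as date,
         extract(time from start_timestamp) as start_time,
         extract(time from end_timestamp) as end_time
  from your_table  -- placeholder for the shift table from the question
)
select * from temp
-- the rest of the query above then applies unchanged on top of this CTE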
Good day everyone. I have a table as below. Duration is the time from current state to next state.
| Timestamp            | State | Duration (minutes) |
|----------------------|-------|--------------------|
| 10/9/2022 8:50:00 AM | A     | 35                 |
| 10/9/2022 9:25:00 AM | B     | 10                 |
| 10/9/2022 9:35:00 AM | C     | ...                |
How do I split data at 9:00 AM of each day like below:
| Timestamp            | State | Duration (minutes) |
|----------------------|-------|--------------------|
| 10/9/2022 8:50:00 AM | A     | 10                 |
| 10/9/2022 9:00:00 AM | A     | 25                 |
| 10/9/2022 9:25:00 AM | B     | 10                 |
| 10/9/2022 9:35:00 AM | C     | ...                |
Thank you.
Use a row-generator function to generate extra rows when the timestamp is before 09:00 and the next timestamp is after 09:00 (and calculate the diff value rather than storing it in the table):
SELECT l.ts AS timestamp,
t.state,
ROUND((l.next_ts - l.ts) * 24 * 60, 2) As diff
FROM (
SELECT timestamp,
LEAD(timestamp) OVER (ORDER BY timestamp) AS next_timestamp,
state
FROM table_name
) t
CROSS APPLY (
SELECT GREATEST(
t.timestamp,
TRUNC(t.timestamp - INTERVAL '9' HOUR) + INTERVAL '9' HOUR + LEVEL - 1
) AS ts,
LEAST(
t.next_timestamp,
TRUNC(t.timestamp - INTERVAL '9' HOUR) + INTERVAL '9' HOUR + LEVEL
) AS next_ts
FROM DUAL
CONNECT BY
TRUNC(t.timestamp - INTERVAL '9' HOUR) + INTERVAL '9' HOUR + LEVEL - 1 < t.next_timestamp
) l;
Which, for your sample data:
CREATE TABLE table_name (Timestamp, State) AS
SELECT DATE '2022-10-09' + INTERVAL '08:50' HOUR TO MINUTE, 'A' FROM DUAL UNION ALL
SELECT DATE '2022-10-09' + INTERVAL '09:25' HOUR TO MINUTE, 'B' FROM DUAL UNION ALL
SELECT DATE '2022-10-09' + INTERVAL '09:35' HOUR TO MINUTE, 'C' FROM DUAL UNION ALL
SELECT DATE '2022-10-12' + INTERVAL '09:35' HOUR TO MINUTE, 'D' FROM DUAL;
Outputs:
| TIMESTAMP           | STATE | DIFF |
|---------------------|-------|------|
| 2022-10-09 08:50:00 | A     | 10   |
| 2022-10-09 09:00:00 | A     | 25   |
| 2022-10-09 09:25:00 | B     | 10   |
| 2022-10-09 09:35:00 | C     | 1405 |
| 2022-10-10 09:00:00 | C     | 1440 |
| 2022-10-11 09:00:00 | C     | 1440 |
| 2022-10-12 09:00:00 | C     | 35   |
| 2022-10-12 09:35:00 | D     | null |
fiddle
I would like to get the cumulative sum, grouped by 15-minute intervals.
e.g.:
table name: myTable
id name start_time faults
============================================
1 a 06/07/19 23:30 1
2 b 06/07/19 23:35 1
3 c 06/07/19 23:36 1
4 d 06/07/19 23:50 1
5 e 06/07/19 23:54 1
6 f 07/07/19 00:05 1
7 g 07/07/19 00:20 1
8 h 07/07/19 00:25 1
Result:
start_Time faults
============================================
06/07/19 23:15      0
06/07/19 23:30      3
06/07/19 23:45      5
07/07/19 00:00      6
07/07/19 00:15      8
07/07/19 00:30      8
07/07/19 00:45      8
07/07/19 01:00      8
thanks
I think you want:
select trunc(start_time, 'hh') + (floor(extract(minute from start_time) / 15) * 15) * interval '1' minute as dte,
sum(count(*)) over (order by min(start_time))
from t
group by trunc(start_time, 'hh') + (floor(extract(minute from start_time) / 15) * 15) * interval '1' minute
order by dte;
Here is a db<>fiddle.
This query gives cumulative sums for existing quarters:
dbfiddle
select date '1900-01-01' + tm / 24 / 4 tm, sum(sum(faults)) over (order by tm) faults
from (select floor((start_time - date '1900-01-01') * 24 * 4) tm, faults from mytable)
group by tm
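To unpack that arithmetic: start_time - date '1900-01-01' is a day count, multiplying by 24 * 4 converts days to quarter-hours, floor() snaps down to a whole quarter, and tm / 24 / 4 turns the quarter index back into days to add onto the epoch. A one-row sanity check (hypothetical input value):

select date '1900-01-01'
       + floor((to_date('06/07/19 23:36', 'dd/mm/yy hh24:mi') - date '1900-01-01') * 24 * 4) / 24 / 4 as quarter_start
from dual;
-- returns 2019-07-06 23:30, the start of the quarter containing 23:36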
If you want a wider date range, then generate it using a connect by query and join it with the above using lag() with ignore nulls, like here:
with
flts as (
select date '1900-01-01' + tm / 24 / 4 tm, sum(sum(faults)) over (order by tm) faults
from (select floor((start_time - date '1900-01-01') * 24 * 4) tm, faults from mytable)
group by tm),
quarters as (
select to_date('2019-07-06 23:00', 'yyyy-mm-dd hh24:mi') + level * interval '15' minute tm
from dual connect by level <= 10 )
select to_char(tm, 'yyyy-mm-dd hh24:mi') tm,
nvl(faults, lag(faults, 1, 0) ignore nulls over (order by tm))
from quarters left join flts using (tm)
You can achieve it using GROUP BY with dates truncated to 15 minutes, and then an analytic function to calculate the cumulative sum, as follows:
select start_time
         + 15/1440
         - mod((start_time - trunc(start_time)) * 1440, 15)/1440,
       sum(count(1)) over (order by min(start_time))
from your_table
group by start_time
         + 15/1440
         - mod((start_time - trunc(start_time)) * 1440, 15)/1440
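A worked, one-row check of that bucket expression may help, since the arithmetic is easy to misread (hypothetical input value; note the expression labels each bucket by its end time rather than its start):

select ts + 15/1440 - mod((ts - trunc(ts)) * 1440, 15)/1440 as bucket_end
from (select to_date('06/07/19 23:36', 'dd/mm/yy hh24:mi') as ts from dual);
-- (23:36 - midnight) = 1416 minutes; mod(1416, 15) = 6 minutes past the boundary;
-- 23:36 + 15 min - 6 min = 23:45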
It is essentially the same as Gordon's answer, but using arithmetic. :)
Cheers!!
I am trying to write a query where the timestamps are in Unix format.
The objective of the query is to group these timestamps into five-minute segments and to count each unique Id in those segments.
Is there a simple way of doing this?
The result I am looking for is this:
Time_utc Id count
25/07/2019 1600 1 3
25/07/2019 1600 2 1
25/07/2019 1605 1 4
You haven't shown data, so as a starting point you can group the Unix timestamps by dividing by 300 (5 minutes' worth of seconds):
select 300 * floor(unix_ts/300) as unix_five_minute,
timestamp '1970-01-01 00:00:00 UTC'
+ (300*floor(unix_ts/300)) * interval '1' second as oracle_timestamp,
count(*)
from cte2
group by floor(unix_ts/300);
or if you have millisecond precision adjust by a factor of 1000:
select 300000 * floor(unix_ts/300000) as unix_five_minute,
timestamp '1970-01-01 00:00:00 UTC'
+ (300*floor(unix_ts/300000)) * interval '1' second as oracle_timestamp,
count(*)
from cte2
group by floor(unix_ts/300000);
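The question also asks for a count per unique Id. No sample data was shown, but adding the id column to both the select list and the group by would give that breakdown. A sketch, where your_table and the id column are assumptions about your schema:

select 300 * floor(unix_ts/300) as unix_five_minute,
       id,
       count(*) as id_count
from your_table  -- hypothetical table with unix_ts and id columns
group by floor(unix_ts/300), id;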
Demo using made-up data generated from current time:
-- CTEs to generate some sample data
with cte1 (oracle_interval) as (
select systimestamp - level * interval '42' second
- timestamp '1970-01-01 00:00:00.0 UTC'
from dual
connect by level <= 30
),
cte2 (unix_ts) as (
select trunc(
extract(day from oracle_interval) * 86400000
+ extract(hour from oracle_interval) * 3600000
+ extract(minute from oracle_interval) * 60000
+ extract(second from oracle_interval) * 1000
)
from cte1
)
-- actual query
select 300000 * floor(unix_ts/300000) as unix_five_minute,
timestamp '1970-01-01 00:00:00 UTC'
+ (300*floor(unix_ts/300000)) * interval '1' second as oracle_timestamp,
count(*)
from cte2
group by floor(unix_ts/300000);
UNIX_FIVE_MINUTE ORACLE_TIMESTAMP COUNT(*)
---------------- ------------------------- ----------------
1564072500000 2019-07-25 16:35:00.0 UTC 7
1564072200000 2019-07-25 16:30:00.0 UTC 7
1564071600000 2019-07-25 16:20:00.0 UTC 4
1564071900000 2019-07-25 16:25:00.0 UTC 8
1564072800000 2019-07-25 16:40:00.0 UTC 4
Unix time stamps such as 155639.600 or 155639.637
Those are unusual values; Unix/epoch times are usually 10-digit numbers, or 13 digits for millisecond precision. Assuming (or rather, guessing) that they are epoch seconds scaled down by a factor of 10,000 for some reason:
-- CTE for sample data
with cte (unix_ts) as (
select 155639.600 from dual
union all
select 155639.637 from dual
)
-- actual query
select 300 * floor(unix_ts*10000/300) as unix_five_minute,
timestamp '1970-01-01 00:00:00 UTC'
+ (300*floor(unix_ts*10000/300)) * interval '1' second as oracle_timestamp,
count(*)
from cte
group by floor(unix_ts*10000/300);
UNIX_FIVE_MINUTE ORACLE_TIMESTAMP COUNT(*)
---------------- ------------------------- ----------------
1556396100 2019-04-27 20:15:00.0 UTC 1
1556395800 2019-04-27 20:10:00.0 UTC 1
The 10000/300 could be simplified to 100/3, but I think it's clearer left as it is.
I'm a bit stuck on understanding what to do next when writing queries.
I have two tables, "A" (date, response, b_id) and "B" (id, country). I need to count, for each hour, the ratio of the number of entries where a response exists to the total number of entries on a specific date. The final selection should consist of the columns "hour" and "ratio".
SELECT COUNT(*) FROM A WHERE RESPONSE IS NOT NULL -- counting entries with a response
SELECT COUNT(*) FROM A -- counting the total number of entries
How do I compute the ratio? Should I create a separate variable for it?
How do I count for each hour of the day? Should I make something like a loop? And how can I get the "hour" part of a date?
What is the best way to select the hours and the computed ratio? Should I make a separate table for it?
I'm rather new to writing complex queries, so I would be happy for any kind of help.
You can do this as:
select to_char(datecol, 'HH24') as hour,
count(response) as has_response, count(*) as total,
count(response) / count(*) as ratio
from a
where datecol >= date '2018-09-18' and datecol < date '2018-09-19'
group by to_char(datecol, 'HH24');
You can also do this using avg() -- which is also fun:
select to_char(datecol, 'HH24'),
avg(case when response is not null then 1.0 else 0 end) as ratio
from a
where datecol >= date '2018-09-18' and datecol < date '2018-09-19'
group by to_char(datecol, 'HH24')
In this case, that requires more typing, though.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE A ( dt, response, b_id ) AS
SELECT DATE '2018-09-18' + INTERVAL '00:00' HOUR TO MINUTE, NULL, 1 FROM DUAL UNION ALL
SELECT DATE '2018-09-18' + INTERVAL '00:10' HOUR TO MINUTE, 'A', 1 FROM DUAL UNION ALL
SELECT DATE '2018-09-18' + INTERVAL '00:20' HOUR TO MINUTE, 'B', 1 FROM DUAL UNION ALL
SELECT DATE '2018-09-18' + INTERVAL '01:00' HOUR TO MINUTE, 'C', 1 FROM DUAL UNION ALL
SELECT DATE '2018-09-18' + INTERVAL '01:10' HOUR TO MINUTE, 'D', 1 FROM DUAL UNION ALL
SELECT DATE '2018-09-18' + INTERVAL '02:00' HOUR TO MINUTE, NULL, 1 FROM DUAL UNION ALL
SELECT DATE '2018-09-18' + INTERVAL '03:00' HOUR TO MINUTE, 'E', 1 FROM DUAL UNION ALL
SELECT DATE '2018-09-18' + INTERVAL '05:10' HOUR TO MINUTE, 'F', 1 FROM DUAL;
Query 1:
SELECT b_id,
TO_CHAR( TRUNC( dt, 'HH' ), 'YYYY-MM-DD HH24:MI:SS' ) AS hour,
COUNT(RESPONSE) AS total_response_per_hour,
COUNT(*) AS total_per_hour,
total_response_per_day,
total_per_day,
COUNT(response) / total_response_per_day AS ratio_for_responses,
COUNT(*) / total_per_day AS ratio
FROM (
SELECT A.*,
COUNT(RESPONSE) OVER ( PARTITION BY b_id, TRUNC( dt ) ) AS total_response_per_day,
COUNT(*) OVER ( PARTITION BY b_id, TRUNC( dt ) ) AS total_per_day
FROM A
)
GROUP BY
b_id,
total_per_day,
total_response_per_day,
TRUNC( dt, 'HH' )
ORDER BY
TRUNC( dt, 'HH' )
Results:
| B_ID | HOUR | TOTAL_RESPONSE_PER_HOUR | TOTAL_PER_HOUR | TOTAL_RESPONSE_PER_DAY | TOTAL_PER_DAY | RATIO_FOR_RESPONSES | RATIO |
|------|---------------------|-------------------------|----------------|------------------------|---------------|---------------------|-------|
| 1 | 2018-09-18 00:00:00 | 2 | 3 | 6 | 8 | 0.3333333333333333 | 0.375 |
| 1 | 2018-09-18 01:00:00 | 2 | 2 | 6 | 8 | 0.3333333333333333 | 0.25 |
| 1 | 2018-09-18 02:00:00 | 0 | 1 | 6 | 8 | 0 | 0.125 |
| 1 | 2018-09-18 03:00:00 | 1 | 1 | 6 | 8 | 0.16666666666666666 | 0.125 |
| 1 | 2018-09-18 05:00:00 | 1 | 1 | 6 | 8 | 0.16666666666666666 | 0.125 |
SELECT withResponses.hour,
withResponses.cnt AS withResponse,
alls.cnt AS AllEntries,
(withResponses.cnt / alls.cnt) AS ratio
FROM
( SELECT to_char(d, 'DD-MM-YY - HH24') || ':00 to :59 ' hour,
count(*) AS cnt
FROM A
WHERE RESPONSE IS NOT NULL
GROUP BY to_char(d, 'DD-MM-YY - HH24') || ':00 to :59 ' ) withResponses,
( SELECT to_char(d, 'DD-MM-YY - HH24') || ':00 to :59 ' hour,
count(*) AS cnt
FROM A
GROUP BY to_char(d, 'DD-MM-YY - HH24') || ':00 to :59 ' ) alls
WHERE alls.hour = withResponses.hour ;
SQLFiddle: http://sqlfiddle.com/#!4/c09b9/2
I have a table of items with
Item_id, Item_time, Item_numbers
1 2017-01-01 18:00:00 2
2 2017-01-01 18:10:00 2
3 2017-01-01 19:10:00 3
I want to group the items by hour for a specific time window (between 9 and 3 each day), and if there is no entry for a particular hour, it should be 0.
Desired Output:
Item_time Item_numbers
2017-01-01 18:00:00 4
2017-01-01 19:00:00 3
2017-01-01 20:00:00 0
with hour_items as (
  select date_trunc('hour', item_time) "hour",
         avg(item_numbers) as value
  from items
  where item_id = 2
    and fact_time::date = '2017-01-01'
  group by hour
)
select hour, value
from hour_items
where extract(hour from hour) >= '9'
  and extract(hour from hour) < '15'
The above query groups them correctly, but where an hour is missing there is no row at all, though there should be a row with a 0 as stated in the desired output.
This should do.
We get all the distinct days (CTE dates), then we generate the hours for each of those dates (CTE hours), and finally we left join our data on a per-hour basis.
with sample_data as (
select 1 as item_id, '2018-01-01 12:03:15'::timestamp as item_time, 2 as item_numbers
union all
select 2 as item_id, '2018-01-01 12:41:15'::timestamp as item_time, 1 as item_numbers
union all
select 3 as item_id, '2018-01-01 17:41:15'::timestamp as item_time, 2 as item_numbers
union all
select 4 as item_id, '2018-01-01 19:41:15'::timestamp as item_time, 2 as item_numbers
),
dates as (
select distinct item_time::date
from sample_data
),
hours as (
select item_time + interval '1 hour' * a as hour
from dates
cross join generate_series(0,23) a
)
select h.hour, sum(coalesce(sd.item_numbers,0))
from hours h
left join sample_data sd on h.hour = date_trunc('hour', sd.item_time)
where extract(hour from hour) between 9 and 17
group by h.hour
order by h.hour
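One detail to adjust: the filter above keeps hours 9 through 17, while the question asks for 9 to 3 (i.e. up to 15:00). For that window, the final SELECT would instead read:

select h.hour, sum(coalesce(sd.item_numbers, 0))
from hours h
left join sample_data sd on h.hour = date_trunc('hour', sd.item_time)
where extract(hour from hour) >= 9
  and extract(hour from hour) < 15  -- 09:00 up to, but not including, 15:00
group by h.hour
order by h.hour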