How to use date_diff for two adjacent sessions using BigQuery?

I'm trying to calculate average hours between two adjacent sessions using the data from the following table:
user_id  event_timestamp              session_num
A        2021-04-16 10:00:00.000 UTC  1
A        2021-04-16 11:00:00.000 UTC  2
A        2021-04-16 13:00:00.000 UTC  3
A        2021-04-16 16:00:00.000 UTC  4
B        2021-04-16 12:00:00.000 UTC  1
B        2021-04-16 14:00:00.000 UTC  2
B        2021-04-16 19:00:00.000 UTC  3
C        2021-04-16 10:00:00.000 UTC  1
C        2021-04-16 17:00:00.000 UTC  2
C        2021-04-16 18:00:00.000 UTC  3
So, for user A we have
1 hour between session_num = 2 and session_num = 1,
2 hours between session_num = 3 and session_num = 2,
3 hours between session_num = 4 and session_num = 3.
Same for the other users:
2, 5 hours for user B;
7, 1 hours for user C.
The result I expect to get should be the arithmetic average of this date_diff(HOUR).
So, avg(1,2,3,2,5,7,1) = 3 hours is the average time between two adjacent sessions.
Does anyone have an idea what query can be used so that the date_diff function is applied only to adjacent sessions?

The average hours between sessions for a given user is most simply calculated as:
select user_id,
timestamp_diff(max(event_timestamp), min(event_timestamp), hour) * 1.0 / nullif(count(*) - 1, 0)
from t
group by user_id;
That is, the average time between sessions for a user is the maximum timestamp minus the minimum timestamp divided by one less than the number of sessions.
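The formula works because the gaps telescope: the sum of all adjacent differences equals the last timestamp minus the first. Below is a minimal Python sketch of that arithmetic on the question's sample data. The `sessions` dict and the `avg_gap_hours` helper are illustrative names, not part of the original answer.

```python
from datetime import datetime

# Sample data from the question: session start hours on 2021-04-16 per user.
sessions = {
    "A": [10, 11, 13, 16],
    "B": [12, 14, 19],
    "C": [10, 17, 18],
}

def avg_gap_hours(hours):
    """Average gap computed as (max - min) / (count - 1)."""
    stamps = [datetime(2021, 4, 16, h) for h in hours]
    if len(stamps) < 2:
        return None  # mirrors NULLIF(count(*) - 1, 0): no gap for one session
    span = (max(stamps) - min(stamps)).total_seconds() / 3600
    return span / (len(stamps) - 1)

per_user = {user: avg_gap_hours(hours) for user, hours in sessions.items()}
print(per_user)
```

Note that this yields one average per user (2.0, 3.5 and 4.0 hours here). The single overall figure of 3 hours asked for in the question weights every gap equally, which is not the same as averaging these per-user values.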

Try this one:
with mytable as (
select 'A' as user_id, timestamp '2021-04-16 10:00:00.000' as event_timestamp, 1 as session_num union all
select 'A', '2021-04-16 11:00:00.000', 2 as session_num union all
select 'A', '2021-04-16 13:00:00.000', 3 as session_num union all
select 'A', '2021-04-16 16:00:00.000', 4 as session_num union all
select 'B', '2021-04-16 12:00:00.000', 1 as session_num union all
select 'B', '2021-04-16 14:00:00.000', 2 as session_num union all
select 'B', '2021-04-16 19:00:00.000', 3 as session_num union all
select 'C', '2021-04-16 10:00:00.000', 1 as session_num union all
select 'C', '2021-04-16 17:00:00.000', 2 as session_num union all
select 'C', '2021-04-16 18:00:00.000', 3 as session_num
)
select avg(diff) as average
from (
select
user_id,
timestamp_diff(event_timestamp, lag(event_timestamp) OVER (partition by user_id order by event_timestamp), hour) as diff
from mytable
)
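To see what the LAG() query computes, here is a small Python sketch under the same sample data. The `adjacent_gaps_hours` helper is a hypothetical name used only for illustration. In the SQL, the first session of each user has no previous row, so LAG() returns NULL and AVG() ignores it; the sketch simply produces no gap for that row.

```python
from datetime import datetime
from itertools import groupby

# (user_id, event_timestamp) rows from the question.
rows = [
    ("A", datetime(2021, 4, 16, 10)), ("A", datetime(2021, 4, 16, 11)),
    ("A", datetime(2021, 4, 16, 13)), ("A", datetime(2021, 4, 16, 16)),
    ("B", datetime(2021, 4, 16, 12)), ("B", datetime(2021, 4, 16, 14)),
    ("B", datetime(2021, 4, 16, 19)),
    ("C", datetime(2021, 4, 16, 10)), ("C", datetime(2021, 4, 16, 17)),
    ("C", datetime(2021, 4, 16, 18)),
]

def adjacent_gaps_hours(rows):
    """Hours between each session and the previous session of the same user."""
    gaps = []
    ordered = sorted(rows)  # by user_id, then event_timestamp
    for _, group in groupby(ordered, key=lambda r: r[0]):
        stamps = [ts for _, ts in group]
        # zip pairs each timestamp with its predecessor, like LAG().
        gaps += [(cur - prev).total_seconds() / 3600
                 for prev, cur in zip(stamps, stamps[1:])]
    return gaps

gaps = adjacent_gaps_hours(rows)
average = sum(gaps) / len(gaps)
print(gaps, average)
```

With the sample data the gaps are 1, 2, 3, 2, 5, 7 and 1 hours, so the average is 3.0, matching the result expected in the question.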

Related

Time difference between different stages in data set

I was wondering if you could help me out. I'm not sure if this is possible, but given the table of data below, is it possible to write a query that shows the time taken for each car between the carViewed and carBought stages? Ideally I would like to see the CarID along with the time. For example, the results should look something like this:
CarID  TimeDifference
1      00:17:83
2      00:04:21
3      01:57:83
Data
CarID  Stage       Timestamp
1      carArrived  2022-01-20 13:00:00
1      carViewed   2022-01-20 14:00:00
1      carBought   2022-01-20 14:17:83
1      carLeft     2022-01-20 15:17:83
2      carArrived  2022-01-21 15:00:00
2      carViewed   2022-01-21 16:00:00
2      carBought   2022-01-21 16:04:21
2      carLeft     2022-01-21 16:27:83
3      carArrived  2022-01-22 13:00:00
3      carViewed   2022-01-22 14:00:00
3      carBought   2022-01-22 15:57:83
3      carLeft     2022-01-22 16:17:83
Any help with this would be greatly appreciated. Thank you.

Use conditional aggregation:
SELECT carid,
MAX(CASE stage WHEN 'carBought' THEN timestamp END)
- MIN(CASE stage WHEN 'carViewed' THEN timestamp END) AS timeDifference
FROM table_name
GROUP BY carid
Which, for the sample data:
CREATE TABLE table_name (CarID, Stage, Timestamp) AS
SELECT 1, 'carArrived', TIMESTAMP '2022-01-20 13:00:00' FROM DUAL UNION ALL
SELECT 1, 'carViewed', TIMESTAMP '2022-01-20 14:00:00' FROM DUAL UNION ALL
SELECT 1, 'carBought', TIMESTAMP '2022-01-20 14:17:53' FROM DUAL UNION ALL
SELECT 1, 'carLeft', TIMESTAMP '2022-01-20 15:17:53' FROM DUAL UNION ALL
SELECT 2, 'carArrived', TIMESTAMP '2022-01-21 15:00:00' FROM DUAL UNION ALL
SELECT 2, 'carViewed', TIMESTAMP '2022-01-21 16:00:00' FROM DUAL UNION ALL
SELECT 2, 'carBought', TIMESTAMP '2022-01-21 16:04:21' FROM DUAL UNION ALL
SELECT 2, 'carLeft', TIMESTAMP '2022-01-21 16:27:53' FROM DUAL UNION ALL
SELECT 3, 'carArrived', TIMESTAMP '2022-01-22 13:00:00' FROM DUAL UNION ALL
SELECT 3, 'carViewed', TIMESTAMP '2022-01-22 14:00:00' FROM DUAL UNION ALL
SELECT 3, 'carBought', TIMESTAMP '2022-01-22 15:57:53' FROM DUAL UNION ALL
SELECT 3, 'carLeft', TIMESTAMP '2022-01-22 16:17:53' FROM DUAL;
Outputs:
CARID  TIMEDIFFERENCE
1      +000000000 00:17:53.000000000
2      +000000000 00:04:21.000000000
3      +000000000 01:57:53.000000000
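The conditional aggregation can be pictured as keeping, per car, the earliest carViewed timestamp and the latest carBought timestamp, then subtracting. A rough Python sketch follows, using the same corrected sample timestamps as the CREATE TABLE statement (seconds of 53 rather than 83). The `viewed_to_bought` helper is an illustrative name only.

```python
from datetime import datetime

# (CarID, Stage, Timestamp) rows, as in the CREATE TABLE sample data.
events = [
    (1, "carArrived", datetime(2022, 1, 20, 13, 0, 0)),
    (1, "carViewed", datetime(2022, 1, 20, 14, 0, 0)),
    (1, "carBought", datetime(2022, 1, 20, 14, 17, 53)),
    (1, "carLeft", datetime(2022, 1, 20, 15, 17, 53)),
    (2, "carArrived", datetime(2022, 1, 21, 15, 0, 0)),
    (2, "carViewed", datetime(2022, 1, 21, 16, 0, 0)),
    (2, "carBought", datetime(2022, 1, 21, 16, 4, 21)),
    (2, "carLeft", datetime(2022, 1, 21, 16, 27, 53)),
    (3, "carArrived", datetime(2022, 1, 22, 13, 0, 0)),
    (3, "carViewed", datetime(2022, 1, 22, 14, 0, 0)),
    (3, "carBought", datetime(2022, 1, 22, 15, 57, 53)),
    (3, "carLeft", datetime(2022, 1, 22, 16, 17, 53)),
]

def viewed_to_bought(events):
    """Per car: MAX(carBought timestamp) - MIN(carViewed timestamp)."""
    viewed, bought = {}, {}
    for car_id, stage, ts in events:
        if stage == "carViewed":
            viewed[car_id] = min(ts, viewed.get(car_id, ts))
        elif stage == "carBought":
            bought[car_id] = max(ts, bought.get(car_id, ts))
    # Cars missing either stage are skipped here; the SQL would return NULL.
    return {car: bought[car] - viewed[car] for car in viewed if car in bought}

diffs = viewed_to_bought(events)
for car_id, delta in sorted(diffs.items()):
    print(car_id, delta)
```

Rows with other stages (carArrived, carLeft) fall through both branches, just as the CASE expressions return NULL for them and the aggregates ignore those NULLs.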

Last 12 months of data for each item/row from the selected date in oracle sql

I need to find the previous 12 months of data for each item on each row. For example, if item A has DatePeriod 1/12/2021, the output should be Qty: 23, which covers 12/2020 to 11/2021.
How can I write SQL in Oracle to achieve the result below?
Four columns item, Qty, DatePeriod and 12 months Qty Value
Item Qty DatePeriod 12 Months Qty Value
A 2 1/1/2020
A 3 1/2/2020
A 4 1/3/2020
A 1 1/4/2020
A 2 1/5/2020
A 2 1/6/2020
A 1 1/7/2020
A 2 1/8/2020
A 1 1/9/2020
A 2 1/10/2020
A 2 1/11/2020
A 2 1/12/2020
A 2 1/1/2021
A 3 1/2/2021
A 4 1/3/2021
A 1 1/4/2021
A 2 1/5/2021
A 2 1/6/2021
A 1 1/7/2021
A 2 1/8/2021
A 1 1/9/2021
A 2 1/10/2021 9/2021 to 10/2020 qty: 24
A 1 1/11/2021 10/ 2021 to 11/2020 Qty: 24
A 1 1/12/2021 11/2021 to 12/2020 Qty: 23
B 2 1/1/2020
B 2 1/2/2020
B 2 1/3/2020
B 5 1/4/2020
B 6 1/5/2020
B 2 1/6/2020
B 1 1/7/2020
B 2 1/8/2020
B 1 1/9/2020
B 2 1/10/2020
B 2 1/11/2020
B 2 1/12/2020
B 2 1/1/2021
B 1 1/2/2021
B 1 1/3/2021
B 1 1/4/2021
B 1 1/5/2021
B 2 1/6/2021
B 1 1/7/2021
B 2 1/8/2021
B 2 1/9/2021
B 3 1/10/2021 9/2021 to 10/2020 qty: 19
B 2 1/11/2021 10/ 2021 to 11/2020 Qty: 20
B 2 1/12/2021 11/2021 to 12/2020 Qty: 20

To find the last 12 months of data for each item (where the latest date may differ from item to item), from Oracle 12 you can use MATCH_RECOGNIZE to perform row-by-row processing:
SELECT *
FROM table_name
MATCH_RECOGNIZE (
PARTITION BY item
ORDER BY DatePeriod DESC
MEASURES
LAST(dateperiod) AS from_date,
FIRST(dateperiod) AS to_date,
SUM(qty) AS total
PATTERN (^ year+)
DEFINE year AS dateperiod > ADD_MONTHS(FIRST(datePeriod), -12)
);
In earlier versions, you can use:
SELECT item,
from_date,
dateperiod AS to_date,
total
FROM (
SELECT t.*,
SUM(qty) OVER (
PARTITION BY item
ORDER BY dateperiod
RANGE BETWEEN INTERVAL '11' MONTH PRECEDING
AND INTERVAL '0' MONTH FOLLOWING
) AS total,
MIN(dateperiod) OVER (
PARTITION BY item
ORDER BY dateperiod
RANGE BETWEEN INTERVAL '11' MONTH PRECEDING
AND INTERVAL '0' MONTH FOLLOWING
) AS from_date,
ROW_NUMBER() OVER (PARTITION BY item ORDER BY dateperiod DESC) AS rn
FROM table_name t
)
WHERE rn = 1;
Which, for your sample data:
CREATE TABLE table_name (Item, Qty, DatePeriod) AS
SELECT 'A', 2, DATE '2020-01-01' FROM DUAL UNION ALL
SELECT 'A', 3, DATE '2020-02-01' FROM DUAL UNION ALL
SELECT 'A', 4, DATE '2020-03-01' FROM DUAL UNION ALL
SELECT 'A', 1, DATE '2020-04-01' FROM DUAL UNION ALL
SELECT 'A', 2, DATE '2020-05-01' FROM DUAL UNION ALL
SELECT 'A', 2, DATE '2020-06-01' FROM DUAL UNION ALL
SELECT 'A', 1, DATE '2020-07-01' FROM DUAL UNION ALL
SELECT 'A', 2, DATE '2020-08-01' FROM DUAL UNION ALL
SELECT 'A', 1, DATE '2020-09-01' FROM DUAL UNION ALL
SELECT 'A', 2, DATE '2020-10-01' FROM DUAL UNION ALL
SELECT 'A', 2, DATE '2020-11-01' FROM DUAL UNION ALL
SELECT 'A', 2, DATE '2020-12-01' FROM DUAL UNION ALL
SELECT 'A', 2, DATE '2021-01-01' FROM DUAL UNION ALL
SELECT 'A', 3, DATE '2021-02-01' FROM DUAL UNION ALL
SELECT 'A', 4, DATE '2021-03-01' FROM DUAL UNION ALL
SELECT 'A', 1, DATE '2021-04-01' FROM DUAL UNION ALL
SELECT 'A', 2, DATE '2021-05-01' FROM DUAL UNION ALL
SELECT 'A', 2, DATE '2021-06-01' FROM DUAL UNION ALL
SELECT 'A', 1, DATE '2021-07-01' FROM DUAL UNION ALL
SELECT 'A', 2, DATE '2021-08-01' FROM DUAL UNION ALL
SELECT 'A', 1, DATE '2021-09-01' FROM DUAL UNION ALL
SELECT 'A', 2, DATE '2021-10-01' FROM DUAL UNION ALL
SELECT 'A', 1, DATE '2021-11-01' FROM DUAL UNION ALL
SELECT 'A', 1, DATE '2021-12-01' FROM DUAL UNION ALL
SELECT 'B', 2, DATE '2020-01-01' FROM DUAL UNION ALL
SELECT 'B', 2, DATE '2020-02-01' FROM DUAL UNION ALL
SELECT 'B', 2, DATE '2020-03-01' FROM DUAL UNION ALL
SELECT 'B', 5, DATE '2020-04-01' FROM DUAL UNION ALL
SELECT 'B', 6, DATE '2020-05-01' FROM DUAL UNION ALL
SELECT 'B', 2, DATE '2020-06-01' FROM DUAL UNION ALL
SELECT 'B', 1, DATE '2020-07-01' FROM DUAL UNION ALL
SELECT 'B', 2, DATE '2020-08-01' FROM DUAL UNION ALL
SELECT 'B', 1, DATE '2020-09-01' FROM DUAL UNION ALL
SELECT 'B', 2, DATE '2020-10-01' FROM DUAL UNION ALL
SELECT 'B', 2, DATE '2020-11-01' FROM DUAL UNION ALL
SELECT 'B', 2, DATE '2020-12-01' FROM DUAL UNION ALL
SELECT 'B', 2, DATE '2021-01-01' FROM DUAL UNION ALL
SELECT 'B', 1, DATE '2021-02-01' FROM DUAL UNION ALL
SELECT 'B', 1, DATE '2021-03-01' FROM DUAL UNION ALL
SELECT 'B', 1, DATE '2021-04-01' FROM DUAL UNION ALL
SELECT 'B', 1, DATE '2021-05-01' FROM DUAL UNION ALL
SELECT 'B', 2, DATE '2021-06-01' FROM DUAL UNION ALL
SELECT 'B', 1, DATE '2021-07-01' FROM DUAL UNION ALL
SELECT 'B', 2, DATE '2021-08-01' FROM DUAL UNION ALL
SELECT 'B', 2, DATE '2021-09-01' FROM DUAL UNION ALL
SELECT 'B', 3, DATE '2021-10-01' FROM DUAL UNION ALL
SELECT 'B', 2, DATE '2021-11-01' FROM DUAL UNION ALL
SELECT 'B', 2, DATE '2021-12-01' FROM DUAL;
Both output:
ITEM  FROM_DATE            TO_DATE              TOTAL
A     2021-01-01 00:00:00  2021-12-01 00:00:00  22
B     2021-01-01 00:00:00  2021-12-01 00:00:00  20
If you want to get the running totals for each row then you can use the SUM analytic function with a range window:
SELECT t.*,
SUM(qty) OVER (
PARTITION BY item
ORDER BY dateperiod
RANGE BETWEEN INTERVAL '11' MONTH PRECEDING
AND INTERVAL '0' MONTH FOLLOWING
) AS last_year_total
FROM table_name t
Which outputs:
ITEM  QTY  DATEPERIOD           LAST_YEAR_TOTAL
A     2    2020-01-01 00:00:00  2
A     3    2020-02-01 00:00:00  5
A     4    2020-03-01 00:00:00  9
A     1    2020-04-01 00:00:00  10
A     2    2020-05-01 00:00:00  12
A     2    2020-06-01 00:00:00  14
A     1    2020-07-01 00:00:00  15
A     2    2020-08-01 00:00:00  17
A     1    2020-09-01 00:00:00  18
A     2    2020-10-01 00:00:00  20
A     2    2020-11-01 00:00:00  22
A     2    2020-12-01 00:00:00  24
A     2    2021-01-01 00:00:00  24
A     3    2021-02-01 00:00:00  24
A     4    2021-03-01 00:00:00  24
A     1    2021-04-01 00:00:00  24
A     2    2021-05-01 00:00:00  24
A     2    2021-06-01 00:00:00  24
A     1    2021-07-01 00:00:00  24
A     2    2021-08-01 00:00:00  24
A     1    2021-09-01 00:00:00  24
A     2    2021-10-01 00:00:00  24
A     1    2021-11-01 00:00:00  23
A     1    2021-12-01 00:00:00  22
B     2    2020-01-01 00:00:00  2
B     2    2020-02-01 00:00:00  4
B     2    2020-03-01 00:00:00  6
B     5    2020-04-01 00:00:00  11
B     6    2020-05-01 00:00:00  17
B     2    2020-06-01 00:00:00  19
B     1    2020-07-01 00:00:00  20
B     2    2020-08-01 00:00:00  22
B     1    2020-09-01 00:00:00  23
B     2    2020-10-01 00:00:00  25
B     2    2020-11-01 00:00:00  27
B     2    2020-12-01 00:00:00  29
B     2    2021-01-01 00:00:00  29
B     1    2021-02-01 00:00:00  28
B     1    2021-03-01 00:00:00  27
B     1    2021-04-01 00:00:00  23
B     1    2021-05-01 00:00:00  18
B     2    2021-06-01 00:00:00  18
B     1    2021-07-01 00:00:00  18
B     2    2021-08-01 00:00:00  18
B     2    2021-09-01 00:00:00  19
B     3    2021-10-01 00:00:00  20
B     2    2021-11-01 00:00:00  20
B     2    2021-12-01 00:00:00  20
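The range window can be checked by hand with a short Python sketch: for each row, add up the quantities of the same item whose date falls within the current month and the 11 months before it. This is only an approximation of the Oracle RANGE semantics that happens to be exact here because every DatePeriod is the first day of a month; `monthly_qty`, `month_index` and `last_year_totals` are illustrative names.

```python
from datetime import date

# Monthly quantities from the sample data, January 2020 through December 2021.
monthly_qty = {
    "A": [2, 3, 4, 1, 2, 2, 1, 2, 1, 2, 2, 2,
          2, 3, 4, 1, 2, 2, 1, 2, 1, 2, 1, 1],
    "B": [2, 2, 2, 5, 6, 2, 1, 2, 1, 2, 2, 2,
          2, 1, 1, 1, 1, 2, 1, 2, 2, 3, 2, 2],
}

# Expand into (item, dateperiod, qty) rows, one per month.
rows = [
    (item, date(2020 + i // 12, i % 12 + 1, 1), qty)
    for item, qtys in monthly_qty.items()
    for i, qty in enumerate(qtys)
]

def month_index(d):
    """Number of months since year 0, so month differences are plain integers."""
    return d.year * 12 + d.month

def last_year_totals(rows):
    """For each row, sum qty over the same item's rows in the trailing 12 months."""
    result = []
    for item, d, qty in rows:
        total = sum(
            q for other_item, other_d, q in rows
            if other_item == item
            and 0 <= month_index(d) - month_index(other_d) <= 11
        )
        result.append((item, d, qty, total))
    return result

totals = last_year_totals(rows)
latest = {item: total for item, d, _, total in totals if d == date(2021, 12, 1)}
print(latest)
```

The values for December 2021 (22 for A and 20 for B) agree with the TOTAL column of the first two queries, which return only the latest row per item.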

Next value per group in SQL

I am trying to fix a data quality issue and I have the following table origin:
WITH origin AS (
SELECT 1 AS item_id, 'cake' as item_group, DATE '2020-04-01' AS start_date, DATE '2020-12-07' AS end_date, 1 as group_rank UNION ALL
SELECT 2, 'cake', DATE '2020-12-07',DATE '2020-12-31', 2 as group_rank UNION ALL
SELECT 3, 'cake', DATE '2020-12-07',DATE '2020-12-31', 2 as group_rank UNION ALL
SELECT 4, 'cake', DATE '2020-12-07',DATE '2020-12-31', 2 as group_rank UNION ALL
SELECT 5, 'cake', DATE '2020-12-07',DATE '2020-12-31', 2 as group_rank UNION ALL
SELECT 6, 'cake', DATE '2020-12-31',DATE '2021-12-07', 3 as group_rank UNION ALL
SELECT 7, 'cake', DATE '2020-12-31',DATE '2021-12-07', 3 as group_rank UNION ALL
SELECT 8, 'pie', DATE '2020-12-07',DATE '2020-12-31', 1 as group_rank UNION ALL
SELECT 9, 'pie', DATE '2020-12-31',DATE '2021-12-07', 2 as group_rank UNION ALL
SELECT 10, 'pie', DATE '2020-12-31',DATE '2021-12-07', 2 as group_rank
)
select *
from origin
item_id  item_group  start_date  end_date    group_rank
1        cake        2020-04-01  2020-12-07  1
2        cake        2020-12-07  2020-12-31  2
3        cake        2020-12-07  2020-12-31  2
4        cake        2020-12-07  2020-12-31  2
5        cake        2020-12-07  2020-12-31  2
6        cake        2020-12-31  2021-12-07  3
7        cake        2020-12-31  2021-12-07  3
8        pie         2020-12-07  2020-12-31  1
9        pie         2020-12-31  2021-12-07  2
10       pie         2020-12-31  2021-12-07  2
Every row is a unique item that belongs to a certain item_group: pie or cake. Items within a group are ranked according to start_date. The problem with the table is that when I join it with a calendar table, I end up with duplicates, because some items have overlapping start_date and end_date (one item ends on the same day the next one starts). What I want to achieve is to fix the end_date of the older items (minus 1 day).
For that I need to understand whether the items overlap on one day. I thought I'd use the rank to find the next value within the group: check the current rank, find the next higher one, and take the start_date of that higher rank. But I couldn't figure out how to get this right.
So my ideal table is the following:
WITH final_result AS (
SELECT 1 AS item_id, 'cake' as item_group, DATE '2020-04-01' AS start_date, DATE '2020-12-07' AS end_date, 1 as group_rank, DATE '2020-12-07' as next_group_start_date, 1 as end_date_equals_next_group_start_date, DATE '2020-12-06' as new_end_date UNION ALL
SELECT 2, 'cake', DATE '2020-12-07',DATE '2020-12-31', 2 as group_rank, DATE '2020-12-31', 1, DATE '2020-12-30' UNION ALL
SELECT 3, 'cake', DATE '2020-12-07',DATE '2020-12-31', 2 as group_rank, DATE '2020-12-31', 1, DATE '2020-12-30' UNION ALL
SELECT 4, 'cake', DATE '2020-12-07',DATE '2020-12-31', 2 as group_rank, DATE '2020-12-31', 1, DATE '2020-12-30' UNION ALL
SELECT 5, 'cake', DATE '2020-12-07',DATE '2020-12-31', 2 as group_rank, DATE '2020-12-31', 1, DATE '2020-12-30' UNION ALL
SELECT 6, 'cake', DATE '2020-12-31',DATE '2021-12-07', 3 as group_rank, NULL, 0, DATE '2020-12-07' UNION ALL
SELECT 7, 'cake', DATE '2020-12-31',DATE '2021-12-07', 3 as group_rank, NULL, 0, DATE '2020-12-07' UNION ALL
SELECT 8, 'pie', DATE '2020-12-07',DATE '2020-12-31', 1 as group_rank, DATE '2020-12-31', 1, DATE '2020-12-30' UNION ALL
SELECT 9, 'pie', DATE '2020-12-31',DATE '2021-12-07', 2 as group_rank, NULL, 0, DATE '2020-12-06' UNION ALL
SELECT 10, 'pie', DATE '2020-12-31',DATE '2021-12-07', 2 as group_rank, NULL, 0, DATE '2020-12-06'
)
select *
from final_result
item_id  item_group  start_date  end_date    group_rank  next_group_start_date  end_date_equals_next_group_start_date  new_end_date
1        cake        2020-04-01  2020-12-07  1           2020-12-07             1                                      2020-12-06
2        cake        2020-12-07  2020-12-31  2           2020-12-31             1                                      2020-12-30
3        cake        2020-12-07  2020-12-31  2           2020-12-31             1                                      2020-12-30
4        cake        2020-12-07  2020-12-31  2           2020-12-31             1                                      2020-12-30
5        cake        2020-12-07  2020-12-31  2           2020-12-31             1                                      2020-12-30
6        cake        2020-12-31  2021-12-07  3           NULL                   0                                      2020-12-07
7        cake        2020-12-31  2021-12-07  3           NULL                   0                                      2020-12-07
8        pie         2020-12-07  2020-12-31  1           2020-12-31             1                                      2020-12-30
9        pie         2020-12-31  2021-12-07  2           NULL                   0                                      2020-12-06
10       pie         2020-12-31  2021-12-07  2           NULL                   0                                      2020-12-06
By identifying next_group_start_date I can tell whether there is an overlap on a day. end_date_equals_next_group_start_date shows whether end_date = next_group_start_date, i.e. there is an overlap. If so, I can create a new_end_date, which is end_date - 1.

What you want to do is use the LEAD() window function.
SELECT
*,
LEAD(start_date, 1) OVER(PARTITION BY item_group ORDER BY group_rank) AS next_group_start_date
FROM origin
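This works but doesn't give the exact result you were expecting: several items share the same group_rank, so LEAD() returns the start_date of the next row, which may belong to the same group. To get the expected result, apply LEAD() to the distinct (item_group, group_rank, start_date) combinations and join that result back to the origin table.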
SELECT
*,
end_date - end_date_equals_next_group_start_date AS new_end_date
FROM (
SELECT
origin.*,
b.next_group_start_date,
CASE
WHEN origin.end_date = b.next_group_start_date
THEN 1
ELSE 0
END AS end_date_equals_next_group_start_date
FROM origin
JOIN (
SELECT
item_group,
group_rank,
LEAD(start_date, 1) OVER(PARTITION BY item_group ORDER BY group_rank) AS next_group_start_date
FROM (
SELECT DISTINCT item_group, group_rank, start_date
FROM origin
) a
) b ON origin.item_group = b.item_group and origin.group_rank = b.group_rank
) c
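The idea behind the join can be sketched in Python: first reduce the rows to one start_date per (item_group, group_rank), look up the next rank's start_date within each group, and only then attach it to every item. The `fix_end_dates` helper below is a hypothetical name for illustration, and it assumes all items of one rank share the same start_date, as in the sample data.

```python
from datetime import date, timedelta

# (item_id, item_group, start_date, end_date, group_rank) rows from the question.
origin = [
    (1, "cake", date(2020, 4, 1), date(2020, 12, 7), 1),
    (2, "cake", date(2020, 12, 7), date(2020, 12, 31), 2),
    (3, "cake", date(2020, 12, 7), date(2020, 12, 31), 2),
    (4, "cake", date(2020, 12, 7), date(2020, 12, 31), 2),
    (5, "cake", date(2020, 12, 7), date(2020, 12, 31), 2),
    (6, "cake", date(2020, 12, 31), date(2021, 12, 7), 3),
    (7, "cake", date(2020, 12, 31), date(2021, 12, 7), 3),
    (8, "pie", date(2020, 12, 7), date(2020, 12, 31), 1),
    (9, "pie", date(2020, 12, 31), date(2021, 12, 7), 2),
    (10, "pie", date(2020, 12, 31), date(2021, 12, 7), 2),
]

def fix_end_dates(rows):
    """Return (item_id, next_group_start_date, overlap_flag, new_end_date)."""
    # One start_date per (item_group, group_rank), like the SELECT DISTINCT.
    group_start = {(grp, rank): start for _, grp, start, _, rank in rows}
    keys = sorted(group_start)
    # LEAD() over the distinct rows: the next rank within the same item_group.
    next_start = {
        cur: group_start[nxt]
        for cur, nxt in zip(keys, keys[1:])
        if cur[0] == nxt[0]
    }
    result = []
    for item_id, grp, _, end, rank in rows:
        nxt = next_start.get((grp, rank))  # None for the last rank in a group
        overlap = 1 if end == nxt else 0
        result.append((item_id, nxt, overlap, end - timedelta(days=overlap)))
    return result

fixed = fix_end_dates(origin)
for row in fixed:
    print(row)
```

Items in the last rank of each group have no next group, so their flag is 0 and their end_date is left unchanged.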

BigQuery: Computing the timestamp diff in time ordered rows in a group

Given a table like this, I would like to compute the time duration of each state before changing to a different state:
id state timestamp
1 1 2018-08-17 10:40:00
1 2 2018-08-17 12:40:00
1 1 2018-08-17 14:40:00
2 1 2018-08-17 09:00:00
2 2 2018-08-17 12:00:00
The output I want is:
id state date duration
1 1 2018-08-17 2 hours
1 2 2018-08-17 2 hours
1 1 2018-08-17 9 hours 20 minutes (until the end of the day in this case)
2 1 2018-08-17 3 hours
2 2 2018-08-17 12 hours (until the end of the day in this case)
I am not so sure whether this is doable in SQL. I feel like I have to write a UDF against aggregated state and timestamp (grouped by id and ordered by ts) which outputs an array of struct (id, state, date, and duration). This array can be flattened.

Below is for BigQuery Standard SQL
#standardSQL
SELECT id, state,
IFNULL(
TIMESTAMP_DIFF(LEAD(ts) OVER(PARTITION BY id ORDER BY ts), ts, MINUTE),
24*60 - TIMESTAMP_DIFF(ts, TIMESTAMP_TRUNC(ts, DAY), MINUTE)
) AS duration_minutes
FROM `project.dataset.table`
You can test and play with the above using dummy data from your question:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 1 state, TIMESTAMP('2018-08-17 10:40:00') ts UNION ALL
SELECT 1, 2, '2018-08-17 12:40:00' UNION ALL
SELECT 1, 1, '2018-08-17 14:40:00' UNION ALL
SELECT 2, 1, '2018-08-17 09:00:00' UNION ALL
SELECT 2, 2, '2018-08-17 12:00:00'
)
SELECT id, state,
IFNULL(
TIMESTAMP_DIFF(LEAD(ts) OVER(PARTITION BY id ORDER BY ts), ts, MINUTE),
24*60 - TIMESTAMP_DIFF(ts, TIMESTAMP_TRUNC(ts, DAY), MINUTE)
) AS duration_minutes
FROM `project.dataset.table`
-- ORDER BY id, ts
with result as below
Row id state duration_minutes
1 1 1 120
2 1 2 120
3 1 1 560
4 2 1 180
5 2 2 720
If you need your output formatted exactly the way you showed in the question, use the query below
#standardSQL
SELECT id, state, ts, duration_minutes,
FORMAT('%i hours %i minutes', DIV(duration_minutes, 60), MOD(duration_minutes, 60)) duration
FROM (
SELECT id, state, ts,
IFNULL(
TIMESTAMP_DIFF(LEAD(ts) OVER(PARTITION BY id ORDER BY ts), ts, MINUTE),
24*60 - TIMESTAMP_DIFF(ts, TIMESTAMP_TRUNC(ts, DAY), MINUTE)
) AS duration_minutes
FROM `project.dataset.table`
)
In this case your output will look like below
Row id state ts duration_minutes duration
1 1 1 2018-08-17 10:40:00 UTC 120 2 hours 0 minutes
2 1 2 2018-08-17 12:40:00 UTC 120 2 hours 0 minutes
3 1 1 2018-08-17 14:40:00 UTC 560 9 hours 20 minutes
4 2 1 2018-08-17 09:00:00 UTC 180 3 hours 0 minutes
5 2 2 2018-08-17 12:00:00 UTC 720 12 hours 0 minutes
Sure, you will most likely still need to adjust above to your particular case - but you've got a good start I think
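As a cross-check of the LEAD() logic, the same durations can be computed in a few lines of Python: each row lasts until the next row of the same id, and the last row of an id lasts until midnight. The `state_durations` and `format_duration` helpers are illustrative names, not part of the answer.

```python
from datetime import datetime, timedelta

# (id, state, ts) rows from the question.
rows = [
    (1, 1, datetime(2018, 8, 17, 10, 40)),
    (1, 2, datetime(2018, 8, 17, 12, 40)),
    (1, 1, datetime(2018, 8, 17, 14, 40)),
    (2, 1, datetime(2018, 8, 17, 9, 0)),
    (2, 2, datetime(2018, 8, 17, 12, 0)),
]

def state_durations(rows):
    """Minutes each state lasts, until the next row of the same id or midnight."""
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    result = []
    for i, (id_, state, ts) in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        if nxt is not None and nxt[0] == id_:
            end = nxt[2]  # like LEAD(ts) within the same id
        else:
            # No following row for this id: run to the end of the day,
            # which is what the IFNULL fallback in the query does.
            midnight = datetime.combine(ts.date(), datetime.min.time())
            end = midnight + timedelta(days=1)
        result.append((id_, state, int((end - ts).total_seconds() // 60)))
    return result

def format_duration(minutes):
    """Render minutes as 'H hours M minutes', like the FORMAT() call."""
    hours, mins = divmod(minutes, 60)
    return f"{hours} hours {mins} minutes"

durations = state_durations(rows)
for id_, state, minutes in durations:
    print(id_, state, minutes, format_duration(minutes))
```

The minute counts come out as 120, 120, 560, 180 and 720, the same as the duration_minutes column in the result shown above.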

How to get data for previous 7 days based on set of dates in groups in sql

I was going through this forum looking for a query to get data for the previous 7 days, but most answers work from the current date. Below is my requirement:
I have a Table 1 as below:
These are the start dates of each week, which are Mondays
from_date
2016-01-04
2016-01-11
2016-01-18
Table 2
It has all the days of each week, starting from Monday. E.g. Jan 04 (Monday) to Jan 10 (Sunday), and so on for the other weeks.
get_date flag value
2016-01-04 N 4
2016-01-05 N 9
2016-01-06 Y 2
2016-01-07 Y 13
2016-01-08 Y 7
2016-01-09 Y 8
2016-01-10 Y 8
2016-01-11 Y 1
2016-01-12 Y 9
2016-01-13 N 8
2016-01-14 N 24
2016-01-15 N 8
2016-01-16 Y 4
2016-01-17 Y 5
2016-01-18 Y 9
2016-01-19 Y 2
2016-01-20 Y 8
2016-01-21 Y 4
2016-01-22 N 9
2016-01-23 N 87
2016-01-24 Y 3
Expected Result
Here wk is the unique number for each start-end date pair.
Avg_value is the average of the values for the dates in that week.
The last 2 days of the week are weekend days,
e.g. 2016-01-09 and 2016-01-10 are weekend days.
from_date get_date Wk Total_days Total_weekdays_flag_Y Total_weekenddays_flag_Y Avg_value
2016-01-04 2016-01-10 1 7 3 2 6.714285714
2016-01-11 2016-01-17 2 7 2 2 8.428571429
2016-01-18 2016-01-24 3 7 4 1 17.42857143
Could anyone help me with this, as I am not good at SQL?
Thanks

select
from_date
, Wk
-- day_of_week runs from 0 (Monday) to 6 (Sunday), so 5 and 6 are the weekend
, count(case when day_of_week < 5 and flag = 'Y' then 1 end) as Total_weekdays_flag_Y
, count(case when day_of_week >= 5 and flag = 'Y' then 1 end) as Total_weekenddays_flag_Y
, avg(value) as Avg_value
from (
select trunc(get_date,'IW') as from_date
, (trunc(get_date,'IW')- trunc(date'2016-01-04','IW'))/7 + 1 as Wk
, flag
, value
, get_date - trunc(get_date,'IW') as day_of_week
from Table_2)
group by from_date, Wk
order by from_date, Wk;
EDIT:
/*generate some test_data for table 2*/
with table_2 (get_date, flag, value) as (
select date'2016-01-03' + level,
DECODE(mod(level,3),0,'Y','N'),
round(dbms_random.value(0,10))
from dual connect by level < 101
),
/*generate some test_weeks for table 1*/
table_1 (FROM_date) as (select date'2016-01-04' + (level-1)*7 from dual connect by level < 101 )
/*main query */
select
from_date
, Wk
, count(day_of_week) as total
/* day_of_week runs from 0 (Monday) to 6 (Sunday), so 5 and 6 are the weekend */
, count(case when day_of_week < 5 and flag = 'Y' then 1 end) as Total_weekdays_flag_Y
, count(case when day_of_week >= 5 and flag = 'Y' then 1 end) as Total_weekenddays_flag_Y
, avg(value) as Avg_value
from (
select last_value(from_date ignore nulls) over (order by get_date) as from_date
,last_value(Wk ignore nulls) over (order by get_date) as Wk
, flag
, value
, get_date - trunc(get_date,'IW') as day_of_week
from Table_2 t2
full join (select row_number() over (order by from_date) as wk,from_date from table_1) t1 on t2.get_date = t1.from_date
)
group by from_date, Wk
having count(day_of_week) > 0
order by from_date, Wk

In the query below, I create the test data right within the query; in final form, you would delete the subqueries table_1 and table_2 and use the rest.
The syntax will work from Oracle 11.2 on. In Oracle 11.1, you need to move the column names in factored subqueries to the select... from dual part. Or, since you really only have one subquery (prep) and an outer query, you can write prep as an actual, in-line subquery.
Your arithmetic seems off on the average for the first week.
In your sample output you use get_date for the last day of the week. That is odd, since in table_2 that name has a different meaning. I used to_date in my output. I also do not show total_days - that is always 7, so why include it at all? (If it is not always 7, then there is something you didn't tell us; anyway, a count(...), if that is what it should be, is easy to add).
with
-- begin test data, can be removed in final solution
table_1 ( from_date ) as (
select date '2016-01-04' from dual union all
select date '2016-01-11' from dual union all
select date '2016-01-18' from dual
)
,
table_2 ( get_date, flag, value ) as (
select date '2016-01-04', 'N', 4 from dual union all
select date '2016-01-05', 'N', 9 from dual union all
select date '2016-01-06', 'Y', 2 from dual union all
select date '2016-01-07', 'Y', 13 from dual union all
select date '2016-01-08', 'Y', 7 from dual union all
select date '2016-01-09', 'Y', 8 from dual union all
select date '2016-01-10', 'Y', 8 from dual union all
select date '2016-01-11', 'Y', 1 from dual union all
select date '2016-01-12', 'Y', 9 from dual union all
select date '2016-01-13', 'N', 8 from dual union all
select date '2016-01-14', 'N', 24 from dual union all
select date '2016-01-15', 'N', 8 from dual union all
select date '2016-01-16', 'Y', 4 from dual union all
select date '2016-01-17', 'Y', 5 from dual union all
select date '2016-01-18', 'Y', 9 from dual union all
select date '2016-01-19', 'Y', 2 from dual union all
select date '2016-01-20', 'Y', 8 from dual union all
select date '2016-01-21', 'Y', 4 from dual union all
select date '2016-01-22', 'N', 9 from dual union all
select date '2016-01-23', 'N', 87 from dual union all
select date '2016-01-24', 'Y', 3 from dual
),
-- end test data, continue actual query
prep ( get_date, flag, value, from_date, wd_flag ) as (
select t2.get_date, t2.flag, t2.value, t1.from_date,
case when t2.get_date - t1.from_date <= 4 then 'wd' else 'we' end
from table_1 t1 inner join table_2 t2
on t2.get_date between t1.from_date and t1.from_date + 6
)
select from_date,
from_date + 6 as to_date,
row_number() over (order by from_date) as wk,
count(case when flag = 'Y' and wd_flag = 'wd' then 1 end)
as total_weekday_Y,
count(case when flag = 'Y' and wd_flag = 'we' then 1 end)
as total_weekend_Y,
round(avg(value), 6) as avg_value
from prep
group by from_date;
Output:
FROM_DATE TO_DATE WK TOTAL_WEEKDAY_Y TOTAL_WEEKEND_Y AVG_VALUE
---------- ---------- ---- --------------- --------------- ----------
2016-01-04 2016-01-10 1 3 2 7.285714
2016-01-11 2016-01-17 2 2 2 8.428571
2016-01-18 2016-01-24 3 4 1 17.428571
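The figures in this output, including the corrected first-week average, can be reproduced with a short Python sketch of the same grouping: assign each date to the week whose start lies 0 to 6 days before it, treat offsets 0 to 4 as weekdays, and aggregate. The `weekly_summary` helper and the compact `flags`/`values` encoding of the sample data are illustrative choices, not part of the answer.

```python
from datetime import date, timedelta

week_starts = [date(2016, 1, 4), date(2016, 1, 11), date(2016, 1, 18)]

# Sample table_2, one entry per day starting 2016-01-04.
flags = "NNYYYYY" "YYNNNYY" "YYYYNNY"
values = [4, 9, 2, 13, 7, 8, 8,
          1, 9, 8, 24, 8, 4, 5,
          9, 2, 8, 4, 9, 87, 3]
table_2 = [
    (date(2016, 1, 4) + timedelta(days=i), flags[i], values[i])
    for i in range(len(values))
]

def weekly_summary(week_starts, table_2):
    """Rows of (from_date, to_date, wk, weekday_Y, weekend_Y, avg_value)."""
    summary = []
    for start in sorted(week_starts):
        # Same condition as the join: get_date between from_date and from_date + 6.
        week = [(d, f, v) for d, f, v in table_2 if 0 <= (d - start).days <= 6]
        if not week:
            continue  # an inner join would drop weeks with no rows
        weekday_y = sum(1 for d, f, _ in week
                        if f == "Y" and (d - start).days <= 4)
        weekend_y = sum(1 for d, f, _ in week
                        if f == "Y" and (d - start).days > 4)
        avg_value = round(sum(v for _, _, v in week) / len(week), 6)
        summary.append((start, start + timedelta(days=6), len(summary) + 1,
                        weekday_y, weekend_y, avg_value))
    return summary

summary = weekly_summary(week_starts, table_2)
for row in summary:
    print(row)
```

For the first week the values sum to 51 over 7 days, giving 7.285714 rather than the 6.714285714 shown in the question's expected result.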
