I need your help to get the total running duration per day from a table when I record only start and stop events:
id | ts                         | event
--------------------------------------
1  | 2020-12-26 09:00:00.589016 | 0
2  | 2020-12-26 10:25:00.589016 | 1
3  | 2020-12-26 19:30:45.644092 | 0
4  | 2020-12-26 22:30:00.554092 | 1
0 = stop event
1 = start event
The difficulty is to compute the duration between start and stop events, and also:
if the start event happened the day before, include the duration between midnight and the first stop event (in this example, 9 hours).
Any idea how to achieve this?
Assuming your timestamps are already in datetime64 format, as shown below:
                           ts  event
id
1  2020-12-25 23:55:09.589016      1
2  2020-12-26 00:05:18.589016      0
3  2020-12-26 09:00:00.589016      1
4  2020-12-26 10:25:00.589016      0
5  2020-12-26 19:30:45.644092      1
6  2020-12-26 22:30:00.554092      0
You can do the following:
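# Start events (event == 1), renumbered from 0 so they line up with the stops
dfs = df.loc[df.event == 1].rename(columns={"ts": "Start"}).reset_index(drop=True)
# Stop events (event == 0), renumbered the same way
dfnd = df.loc[df.event == 0].rename(columns={"ts": "Stop"}).reset_index(drop=True)
# Duration of each start/stop pair (assumes the events strictly alternate, beginning with a start)
dfdur = dfnd.Stop - dfs.Start
This yields the following: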
0 0 days 00:10:09
1 0 days 01:25:00
2 0 days 02:59:14.910000
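The pairing above gives one duration per start/stop pair, but it does not yet total them per day or handle the midnight cases from the question. A minimal standard-library sketch of that step follows. It assumes 1 = start and 0 = stop as in the question, that events alternate, that a leading stop means "running since midnight", and that a trailing start means "running until the end of that day". The names `midnight` and `running_time_per_day` are hypothetical helpers, not part of pandas.

```python
from collections import defaultdict
from datetime import datetime, time, timedelta

START, STOP = 1, 0  # event codes as defined in the question


def midnight(ts):
    """Return 00:00:00 on the calendar day of `ts`."""
    return datetime.combine(ts.date(), time.min)


def running_time_per_day(events):
    """Map each date to its total running time as a timedelta.

    `events` is an iterable of (datetime, event) pairs that alternate
    between START and STOP (an assumption of this sketch).
    """
    totals = defaultdict(timedelta)

    def add_interval(start, stop):
        # Credit the interval to every calendar day it touches,
        # cutting it at each midnight it crosses.
        while start < stop:
            end = min(stop, midnight(start) + timedelta(days=1))
            totals[start.date()] += end - start
            start = end

    open_start = None
    for i, (ts, event) in enumerate(sorted(events)):
        if event == START:
            open_start = ts
        elif open_start is not None:
            add_interval(open_start, ts)
            open_start = None
        elif i == 0:
            # First row is a STOP: assume running since midnight that day.
            add_interval(midnight(ts), ts)

    if open_start is not None:
        # Last row is a START: assume running until the end of that day.
        add_interval(open_start, midnight(open_start) + timedelta(days=1))
    return dict(totals)


# The four rows from the question.
rows = [
    ("2020-12-26 09:00:00.589016", STOP),
    ("2020-12-26 10:25:00.589016", START),
    ("2020-12-26 19:30:45.644092", STOP),
    ("2020-12-26 22:30:00.554092", START),
]
events = [(datetime.fromisoformat(ts), ev) for ts, ev in rows]
for day, total in running_time_per_day(events).items():
    print(day, total, total.total_seconds())
```

With pandas, the same function could be fed from `zip(df.ts, df.event)`; the explicit loop is used here only to keep the midnight handling visible.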
For each row with event = 0 that has no earlier row on the same day with event = 1, create another row with ts at midnight of that day.
Similarly, for each row with event = 1 that has no later row on the same day with event = 0, create another row with ts at 23:59:59.999999 of that day.
This can be done in a CTE.
Then use the window function LAG() on each row with event = 0 to get the starting time, calculate the difference with strftime(), and finally aggregate the differences for each day:
WITH cte AS (
  SELECT ts, event FROM tablename
  UNION ALL
  SELECT datetime(date(t.ts)), 1
  FROM tablename t
  WHERE event = 0 AND NOT EXISTS (SELECT 1 FROM tablename WHERE event = 1 AND date(ts) = date(t.ts) AND ts < t.ts)
  UNION ALL
  SELECT date(t.ts) || ' 23:59:59.999999', 0
  FROM tablename t
  WHERE event = 1 AND NOT EXISTS (SELECT 1 FROM tablename WHERE event = 0 AND date(ts) = date(t.ts) AND ts > t.ts)
)
SELECT date(ts) date,
       SUM(strftime('%s', ts) - strftime('%s', prev_ts)) total
FROM (
  SELECT *, LAG(ts) OVER (ORDER BY ts) prev_ts
  FROM cte
)
WHERE event = 0
GROUP BY date
You will get the total per day in seconds.
If you want better accuracy you can use the function julianday() instead of strftime():
..............................
SELECT date(ts) date,
SUM(julianday(ts) - julianday(prev_ts)) * 24 * 3600 total
..............................
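To make the accuracy difference concrete, here is a small sketch using Python's built-in sqlite3 module. The two timestamps are made-up values 1.8 seconds apart, not rows from the question. It shows that subtracting strftime('%s', ...) values works in whole seconds, while the julianday() difference keeps the fractional part (up to floating-point error).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical timestamps, 1.8 seconds apart.
start, stop = "2020-12-26 10:00:00.100", "2020-12-26 10:00:01.900"

# strftime('%s', ...) yields whole Unix seconds, so the fractions are lost.
(whole_seconds,) = conn.execute(
    "SELECT strftime('%s', ?) - strftime('%s', ?)", (stop, start)
).fetchone()

# julianday(...) is a fractional day count, so sub-second parts survive.
(fractional_seconds,) = conn.execute(
    "SELECT (julianday(?) - julianday(?)) * 24 * 3600", (stop, start)
).fetchone()

print(whole_seconds, round(fractional_seconds, 3))
conn.close()
```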
Or, a more efficient way:
WITH cte AS (
  SELECT *,
         LAG(ts, 1, date(ts)) OVER (PARTITION BY date(ts) ORDER BY ts) start_ts,
         event = 1 AND LEAD(ts) OVER (PARTITION BY date(ts) ORDER BY ts) IS NULL flag
  FROM tablename
)
SELECT date(ts) date,
       SUM(
         CASE flag
           WHEN 0 THEN strftime('%s', ts) - strftime('%s', start_ts)
           WHEN 1 THEN strftime('%s', date(ts) || ' 23:59:59.999999') - strftime('%s', ts)
         END
       ) total
FROM cte
WHERE event = 0 OR flag = 1
GROUP BY date
Note that this code works only if all datetimes are in the format YYYY-MM-DD hh:mm:ss.ssssss (I noticed that in your sample data there is a value that is not of that format: '2020-12-26 9:00:00.589016').
See the demo.
Results:
date       | total
------------------
2020-12-26 | 70544
You can find the difference between the start and stop times for each interval, and then sum the latter result, grouped by the day:
with _events as (select row_number() over (order by t1.id) r, substr(t1.ts, 0, instr(t1.ts, " ")) day, t1.* from test t1),
events as (select (select sum(e2.event = 1 and e2.r < e1.r) from _events e2) c, e1.* from _events e1)
select day_r.day, sum(diff) from (
select e3.day, (julianday(max(e3.ts)) - julianday(min(e3.ts)))*24*60*60 diff
from events e3
group by e3.c
)
day_r group by day_r.day;
Related
I have the following table in SQL Server. I would like to find the longest duration for the machine running.
Row | DateTime        | Machine On
----------------------------------
1   | 9/22/2022 8:20  | 1
2   | 9/22/2022 9:10  | 0
3   | 9/22/2022 10:40 | 1
4   | 9/22/2022 10:52 | 0
5   | 9/22/2022 12:30 | 1
6   | 9/22/2022 14:30 | 0
7   | 9/22/2022 15:00 | 1
8   | 9/22/2022 15:40 | 0
9   | 9/22/2022 16:25 | 1
10  | 9/22/2022 16:55 | 0
In the example above, the longest duration for which the machine is ON is 2 hours (rows 5 and 6). What would be the best SQL statement to get the longest duration within a given time range?
Desired Result:
120 minutes
I have looked into the LAG Function and the LEAD Function in SQL.
Here's another way that uses traditional gaps & islands methodology:
WITH src AS
(
SELECT Island, mint = MIN([Timestamp]), maxt = MAX([Timestamp])
FROM
(
SELECT [Timestamp], Island =
ROW_NUMBER() OVER (ORDER BY [Timestamp]) -
ROW_NUMBER() OVER (PARTITION BY Running ORDER BY [Timestamp])
FROM dbo.Machine_Status
) AS x GROUP BY Island
)
SELECT TOP (1) delta =
(DATEDIFF(second, mint, LEAD(mint,1) OVER (ORDER BY island)))
FROM src ORDER BY delta DESC;
Example db<>fiddle based on the sample data in your new duplicate.
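If this is really your data (rows strictly alternate between on and off), you can simply use INNER JOIN and DATEDIFF:
SELECT MAX(DATEDIFF(MINUTE, T1.[DateTime], T2.[DateTime]))
FROM [my_table] T1
INNER JOIN [my_table] T2
ON T1.[Row] + 1 = T2.[Row]
WHERE T1.[Machine On] = 1; -- only measure intervals that begin with the machine on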
This is a gaps-and-islands problem. One option is to use a running sum that increases by 1 whenever machine_on = 0; computed in descending datetime order, it gives each run of consecutive 1s and the 0 that follows it a unique group.
select top 1 datediff(minute, min([datetime]), max([datetime])) duration
from
(
select *,
sum(case when machine_on = 0 then 1 else 0 end) over (order by datetime desc) grp
from table_name
) T
group by grp
order by datediff(minute, min([datetime]), max([datetime])) desc
See demo
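As a cross-check of what these queries should return, here is a minimal Python sketch of the same idea: remember when the machine switched on and measure up to the next off row. It assumes the question's M/D/YYYY timestamps and uses the question's sample rows; `longest_on_minutes` is a hypothetical helper name.

```python
from datetime import datetime

# Sample rows from the question: (DateTime, Machine On).
rows = [
    ("9/22/2022 8:20", 1), ("9/22/2022 9:10", 0),
    ("9/22/2022 10:40", 1), ("9/22/2022 10:52", 0),
    ("9/22/2022 12:30", 1), ("9/22/2022 14:30", 0),
    ("9/22/2022 15:00", 1), ("9/22/2022 15:40", 0),
    ("9/22/2022 16:25", 1), ("9/22/2022 16:55", 0),
]


def longest_on_minutes(rows):
    """Return the longest ON stretch in minutes (0.0 if there is none)."""
    parsed = sorted(
        (datetime.strptime(ts, "%m/%d/%Y %H:%M"), on) for ts, on in rows
    )
    best = 0.0
    on_since = None  # time of the first ON row of the current run
    for ts, on in parsed:
        if on == 1 and on_since is None:
            on_since = ts
        elif on == 0 and on_since is not None:
            # The run ends at the first OFF row after it started.
            best = max(best, (ts - on_since).total_seconds() / 60)
            on_since = None
    return best


print(longest_on_minutes(rows))
```

For the sample rows the longest stretch is the one between rows 5 and 6. A run that is still on at the last row is ignored here, which is a choice of this sketch rather than something the question specifies.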
This is a classic gaps-and-islands problem with a little twist: Adj.
Example
Select Top 1
Row1 = min(row)
,Row2 = max(row)+1
,TS1 = min(TimeStamp)
,TS2 = dateadd(SECOND,max(Adj),max(TimeStamp))
,Dur = datediff(Second,min(TimeStamp),max(TimeStamp)) + max(Adj)
From (
Select *
,Grp = row_number() over( partition by Running order by TimeStamp) - row_number() over (order by timeStamp)
,Adj = case when Running=1 and lead(Running,1) over (order by timestamp) = 0 then datediff(second,TimeStamp,lead(TimeStamp,1) over (order by TimeStamp) ) else 0 end
From Machine_Status
) A
Where Running=1
Group By Grp
Order By Dur Desc
Results
Row1 | Row2 | TS1                     | TS2                     | Dur
-------------------------------------------------------------------
8    | 12   | 2023-01-10 08:25:30.000 | 2023-01-10 08:28:55.000 | 205
I have data like
id | date |
-------------
1 | 1.1.20 |
3 | 4.1.20 |
2 | 4.1.20 |
1 | 5.1.20 |
6 | 2.1.20 |
What I would like to get is the number of occurrences a user with a given ID had in the past 2 weeks on any given date, so basically "occurrences between date - 14 days and date". I'm trying to categorize users by their number of sessions in the past 2 weeks, and I'm following them by daily cohorts.
This query does not work since there can be days when the user does not log in, i.e. does not have a row:
COUNT (distinct id) OVER (PARTITION BY id ORDER BY date ROWS BETWEEN 14 PRECEDING AND 0 FOLLOWING)
Unfortunately, Presto does not support range() window functions. One method is a self-join/aggregation or correlated subquery:
select t.id, t.date, count(tprev.id)
from t left join
t tprev
on tprev.id = t.id and
tprev.date > t.date - interval '14' day and
tprev.date <= t.date
group by t.id, t.date;
This interprets your request as wanting 14 days of data, including the current day.
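The window logic of the self-join can be sketched in plain Python. This is an illustration, not Presto code: for each (id, date) row it counts that id's rows whose date d satisfies date - 14 days < d <= date, i.e. 14 calendar days including the current one. The sample rows and the name `sessions_in_window` are hypothetical.

```python
from datetime import date, timedelta


def sessions_in_window(rows, days=14):
    """Map each (id, date) to the number of that id's rows in the window.

    The window covers `days` calendar days ending on, and including,
    the row's own date.
    """
    window = timedelta(days=days)
    counts = {}
    for uid, day in rows:
        counts[(uid, day)] = sum(
            1
            for other_uid, other_day in rows
            if other_uid == uid and day - window < other_day <= day
        )
    return counts


# Hypothetical sessions: (id, date).
rows = [
    (1, date(2020, 1, 1)),
    (1, date(2020, 1, 5)),
    (1, date(2020, 1, 15)),
    (2, date(2020, 1, 4)),
]
for key, count in sessions_in_window(rows).items():
    print(key, count)
```

Like the self-join, this compares every row with every other row of the same id, so it is quadratic per user; it is meant to show the window boundaries, not to be fast.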
Another method that is much more verbose but might be faster is to use lag() . . . and lag() again:
select t.id,
(1 + -- current date
(case when lag(date, 1) over (partition by id order by date) > date - interval '14' day then 1 else 0 end) +
(case when lag(date, 2) over (partition by id order by date) > date - interval '14' day then 1 else 0 end) +
. . .
(case when lag(date, 13) over (partition by id order by date) > date - interval '14' day then 1 else 0 end)
) as cnt_14
from t;
I have dates and a value for each; I would like to sum the values within 7-day cycles, starting from the first date.
date value
01-01-2021 1
02-01-2021 1
05-01-2021 1
07-01-2021 1
10-01-2021 1
12-01-2021 1
13-01-2021 1
16-01-2021 1
18-01-2021 1
22-01-2021 1
23-01-2021 1
30-01-2021 1
This is my input data, which forms 4 groups under the 7-day cycle.
Grouping should start with the first date and sum all values within 7 days of it, first date included.
Then a new group starts with the next available date and covers another 7 days (10-01 till 17-01), then another group runs from 18-01 till 25-01, and so on.
So the output will be:
group1 4
group2 4
group3 3
group4 1
With match_recognize this would be easy (current_day < first_day + 7 as the pattern condition), but please don't use the match_recognize clause in the solution!
One approach is a recursive CTE:
with tt as (
select dte, value, row_number() over (order by dte) as seqnum
from t
),
cte (dte, value, seqnum, firstdte) as (
select tt.dte, tt.value, tt.seqnum, tt.dte
from tt
where seqnum = 1
union all
select tt.dte, tt.value, tt.seqnum,
(case when tt.dte < cte.firstdte + interval '7' day then cte.firstdte else tt.dte end)
from cte join
tt
on tt.seqnum = cte.seqnum + 1
)
select firstdte, sum(value)
from cte
group by firstdte
order by firstdte;
This identifies the groups by the first date. You can use row_number() over (order by firstdte) if you want a number.
Here is a db<>fiddle.
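The rule the recursive CTE applies row by row (stay in the current group while dte < firstdte + 7 days, otherwise start a new group at dte) can be sketched in Python as a cross-check. The data is the question's sample, read as DD-MM-YYYY; `seven_day_groups` is a hypothetical name.

```python
from datetime import date, timedelta


def seven_day_groups(rows, days=7):
    """Return [(first_date, total)] for consecutive `days`-day cycles.

    A row joins the current group while its date is less than
    first_date + days; otherwise it opens a new group.
    """
    groups = []  # each entry is [first_date, running_total]
    for dte, value in sorted(rows):
        if groups and dte < groups[-1][0] + timedelta(days=days):
            groups[-1][1] += value
        else:
            groups.append([dte, value])
    return [(first, total) for first, total in groups]


# Sample data from the question (all in January 2021, value 1 each).
days_of_january = [1, 2, 5, 7, 10, 12, 13, 16, 18, 22, 23, 30]
rows = [(date(2021, 1, d), 1) for d in days_of_january]

for first, total in seven_day_groups(rows):
    print(first, total)
```

The group boundaries depend on the data itself (each cycle starts at the first date not covered by the previous one), which is why a fixed calendar bucket or a plain window frame does not reproduce them.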
I have a time series as follows:
Day       | Data
----------------
1/1/2020  | 0
2/1/2020  | 0.2
3/1/2020  | 0
...       | ...
1/2/2020  | 0
2/2/2020  | 0
3/2/2020  | 0.2
4/2/2020  | 0.3
5/2/2020  | 0
6/2/2020  | 0
7/2/2020  | 0
8/2/2020  | 2
9/2/2020  | 2.4
10/2/2020 | 3
I want to filter the data to show only the rows after the final sequence of zeros in the time series; in this case, I want only the data from 8/2/2020 onward.
I have tried this
SELECT * FROM table where Data> 0
here is the result :
Day       | Data
----------------
2/1/2020  | 0.2
...       | ...
3/2/2020  | 0.2
4/2/2020  | 0.3
8/2/2020  | 2
9/2/2020  | 2.4
10/2/2020 | 3
However, this does not find the latest 0 and remove everything before it.
I also want to be able to show the result starting 2 days after the final zero in the sequence:
Day       | Data
----------------
10/2/2020 | 3
11/2/2020 | 3.5
...       | ...
One method is:
select t.*
from t
where t.day > (select max(t2.day) from t t2 where t2.value = 0);
You can offset this:
where t.day > (select max(t2.day) + interval '2' day from t t2 where t2.value = 0);
The above assumes that at least one row has a zero. Here are two easy fixes:
where t.day > all (select t2.day from t t2 where t2.value = 0);
or:
where t.day > (select coalesce(max(t2.day), '2000-01-01') from t t2 where t2.value = 0);
You can use window functions:
select day, data
from (
select t.*, max(case when data = 0 then day end) over() day0
from mytable t
) t
where day > day0 or day0 is null
order by day
This is easily adapted if you want to start two days after the last 0:
select day, data
from (
select t.*, max(case when data = 0 then day end) over() day0
from mytable t
) t
where day > day0 + interval '2 day' or day0 is null
order by day
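The filtering rule used by these queries (keep rows strictly after the last zero, optionally shifted by an offset, and keep everything when there is no zero at all) can be sketched in Python. The sample series is the tail of the question's data, read as D/M/YYYY; `rows_after_last_zero` is a hypothetical name.

```python
from datetime import date, timedelta


def rows_after_last_zero(rows, offset_days=0):
    """Return rows dated after (last zero day + offset), sorted by day.

    If no row has value 0, every row is returned.
    """
    zero_days = [day for day, value in rows if value == 0]
    if not zero_days:
        return sorted(rows)
    cutoff = max(zero_days) + timedelta(days=offset_days)
    return sorted((day, value) for day, value in rows if day > cutoff)


# February 2020 part of the question's series: (day, data).
series = [
    (date(2020, 2, 1), 0), (date(2020, 2, 2), 0),
    (date(2020, 2, 3), 0.2), (date(2020, 2, 4), 0.3),
    (date(2020, 2, 5), 0), (date(2020, 2, 6), 0), (date(2020, 2, 7), 0),
    (date(2020, 2, 8), 2), (date(2020, 2, 9), 2.4), (date(2020, 2, 10), 3),
]

print(rows_after_last_zero(series))
print(rows_after_last_zero(series, offset_days=2))
```

With `offset_days=2` the first row returned is 10/2/2020, which matches the second desired output in the question.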