I have an application that logs three states of some machine. I need to report how long the machine stayed in each state from 00:00:00 until 23:59:59 for each day.
I need help building a PostgreSQL query to get the total time an event was in each state during the day.
For example, as you can see below, the data span from 23:50:00 on the previous day until 01:00:00 on the following day.
device_id (varchar(50))   Value (int4)   Date_time (timestamptz)
device1                   0              2022-04-23 23:50:00
device1                   0              2022-04-24 00:10:00
device1                   0              2022-04-24 00:15:00
device1                   0              2022-04-24 00:20:00
device1                   1              2022-04-24 00:25:00
device1                   1              2022-04-24 00:30:00
device1                   1              2022-04-24 11:00:00
device1                   0              2022-04-24 21:00:00
device1                   1              2022-04-25 01:00:00
I am calculating the duration between state changes, inserting it into the table, and simply summing it, but that gives me the following result:
Total:
State = 0 - 04:35
State = 1 - 20:35
Sum Both = 25:10:00
The query I need should not count the portions of time that belong to days other than 24/04/2022, and must give me:
Total Day 24/04/2022:
State = 0 - 03:25
State = 1 - 20:35
Sum Both = 24:00:00
And at the end of each day I need the percentage of time the machine stayed in each state, to build a pie chart.
Is there a way to write a query that fits these needs?
Thank you all in advance for the help.
shawnt00's answer worked. Now I am trying to figure out how to organize the data so the query result comes out as follows:
device_id   state_0   state_1   state_9

where each state_N column holds the timespan the device spent in state N (timespan_state_0, timespan_state_1, timespan_state_9).
For the given example it should be as shown below; I added a second device just to augment the example :)

device_id   state_0   state_1   state_9
device1     03:25     20:35     00:00
device2     X         Y         Z
Regards,
Winner Martins
I think you just need to use lead()/lag() along with some case expressions to detect the spans across midnight. No join is required:
with data as (
select *,
cast(date_trunc('day', Date_time) as date) as dt,
lag(Date_time) over (partition by device_id order by Date_time) as last_Date_time,
lead(Date_time) over (partition by device_id order by Date_time) as next_Date_time
from T
)
select device_id, dt as "date", Value,
coalesce(sum(
case when date_trunc('day', next_Date_time) > date_trunc('day', Date_time)
then date_trunc('day', Date_time + interval '1 day') - Date_time
else coalesce(next_date_time - Date_time, interval '0 seconds') end
+
case when date_trunc('day', last_Date_time) < date_trunc('day', Date_time)
then Date_time - date_trunc('day', Date_time)
else interval '0 seconds' end
), interval '0 seconds') as timespan2
from data
group by device_id, dt, Value
order by device_id, dt, Value;
https://dbfiddle.uk/?rdbms=postgres_12&fiddle=ab32fee1615b637f9f2f844aa1bf5064
I'm not overly familiar with all the Postgres date functions, so there's possibly a slightly cleaner way to get the time calculation.
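The cross-midnight bookkeeping the query does with lead()/lag() can also be sketched procedurally in Python (the `split_at_midnight` helper is hypothetical, not part of the SQL above); on the sample data it reproduces the expected 03:25 / 20:35 totals for 2022-04-24:

```python
from datetime import datetime, timedelta

def split_at_midnight(start, end):
    """Yield (day, duration) pieces of [start, end), split at day boundaries."""
    cur = start
    while cur < end:
        next_midnight = datetime.combine(cur.date(), datetime.min.time()) + timedelta(days=1)
        piece_end = min(end, next_midnight)
        yield cur.date(), piece_end - cur
        cur = piece_end

# Each reading carries its state until the next reading, like lead() in the query.
rows = [
    ("2022-04-23 23:50:00", 0), ("2022-04-24 00:10:00", 0),
    ("2022-04-24 00:15:00", 0), ("2022-04-24 00:20:00", 0),
    ("2022-04-24 00:25:00", 1), ("2022-04-24 00:30:00", 1),
    ("2022-04-24 11:00:00", 1), ("2022-04-24 21:00:00", 0),
    ("2022-04-25 01:00:00", 1),
]
parsed = [(datetime.fromisoformat(ts), v) for ts, v in rows]

totals = {}  # (day, state) -> timedelta spent in that state on that day
for (t0, state), (t1, _) in zip(parsed, parsed[1:]):
    for day, dur in split_at_midnight(t0, t1):
        totals[(day, state)] = totals.get((day, state), timedelta()) + dur
```

Per-day totals then sum to exactly 24:00 for any fully covered day, since every piece is clipped to its own day.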
The query below would work.
select
std.state,
sum(case
when std.rk = 1 then std.time_diff + std.time_diff_start
when std.trunc_state_start = std.trunc_state_end then std.time_diff
when std.trunc_state_start <> std.trunc_state_end then std.time_diff_end
else std.time_diff
end)
from
(
select
a.state,
date_trunc('day', a.date_time) as trunc_state_start,
date_trunc('day', b.date_time) as trunc_state_end,
b.date_time - a.date_time as time_diff,
a.date_time - date_trunc('day', a.date_time) as time_diff_start,
date_trunc('day', b.date_time) - a.date_time as time_diff_end,
rank() over(order by a.date_time) rk
from
(select ds.*, rank() over(order by date_time) rk from devicestat ds) a
inner join
(select ds.*, rank() over(order by date_time) rk from devicestat ds) b
on
a.rk + 1 = b.rk
where
date_trunc('day', a.date_time) = '2022-04-24') std
group by
std.state;
The self-join makes it easy to calculate the time difference between each state's start and end. The rest is calculating the boundary differences at the start and end of the day. There are many ways to do that; this is what came to mind.
Related
How to extract the difference of a specific column of multiple rows with same id?
Example table:

id   prev_val   new_val   date
1    0          1         2020-01-01 10:00
1    1          2         2020-01-01 11:00
2    0          1         2020-01-01 10:00
2    1          2         2020-01-02 10:00
expected result:

id   duration_in_hours
1    1
2    24
summary:
with id=1, (2020-01-01 10:00 - 2020-01-01 11:00) is 1 hour;
with id=2, (2020-01-01 10:00 - 2020-01-02 10:00) is 24 hours
Can we achieve this with SQL?
This solution should be an effective way:
with pd as (
select
id,
max(date) filter (where prev_val = 0) as "prev",
max(date) filter (where prev_val = 1) as "new"
from
table
group by
id )
select
id,
new - prev as diff
from
pd;
If you need the difference between successive readings, something like this should work:
select a.id, a.new_val, a.date - b.date
from my_table a join my_table b
on a.id = b.id and a.prev_val = b.new_val
You could use min/max subqueries. For example:
SELECT mn.id, (mx.maxdate - mn.mindate) as "duration"
FROM (SELECT id, min(date) as mindate FROM table GROUP BY id) mn
JOIN (SELECT id, max(date) as maxdate FROM table GROUP BY id) mx ON
mx.id=mn.id
Let me know if you need help in converting the duration to hours.
You can use the lead()/lag() window functions to access data from the next/ previous row. You can further subtract timestamps to give an interval and extract the parts needed.
select id, floor( extract('day' from diff)*24 + extract('hour' from diff) ) "Time Difference: Hours"
from (select id, date_ts - lag(date_ts) over (partition by id order by date_ts) diff
from example
) hd
where diff is not null
order by id;
NOTE: Your expected results, as presented, are incorrect. The results would be -1 and -24 respectively.
DATE is a very poor choice for a column name. It is both a Postgres data type (at best leads to confusion) and a SQL Standard reserved word.
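For a quick sanity check, the lag()-style subtraction in the query above can be mimicked in Python on the question's sample rows (data layout is mine, not the answer's code):

```python
from datetime import datetime
from itertools import groupby

# (id, date) rows from the question, already ordered by id, then date
rows = [
    (1, "2020-01-01 10:00"), (1, "2020-01-01 11:00"),
    (2, "2020-01-01 10:00"), (2, "2020-01-02 10:00"),
]

diffs = {}
for id_, grp in groupby(rows, key=lambda r: r[0]):
    ts = [datetime.fromisoformat(d) for _, d in grp]
    # current row minus previous row, i.e. date_ts - lag(date_ts)
    diffs[id_] = [int((b - a).total_seconds() // 3600) for a, b in zip(ts, ts[1:])]
```

Subtracting in this order yields the positive 1 and 24 hours; the question's written-out subtraction order would indeed give -1 and -24, as the NOTE above points out.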
I have a table with a timestamp for when an incident occurred and the downtime associated with that timestamp (in minutes). I want to break down this table by minute using Time_slice and show the minute associated with each slice. For example:
Time Duration
11:34 4.5
11:40 2
to:
time Duration
11:34 1
11:35 1
11:36 1
11:37 1
11:38 0.5
11:39 1
11:40 1
How can I accomplish this?
If you are fine with the same minute being listed multiple times when the input time + duration overlap, then you can do this:
WITH big_list_of_numbers AS (
  SELECT
    ROW_NUMBER() OVER (ORDER BY SEQ4()) - 1 AS rn
  FROM TABLE(GENERATOR(ROWCOUNT => 1000))
)
SELECT
  DATEADD('minute', r.rn, t.time) AS time,
  IFF(t.duration - r.rn < 1, t.duration - r.rn, 1) AS duration
FROM table AS t
JOIN big_list_of_numbers AS r
  ON r.rn < t.duration
ORDER BY 1
If you want the total per minute, you can put a grouping on it, like:
WITH big_list_of_numbers AS (
  SELECT
    ROW_NUMBER() OVER (ORDER BY SEQ4()) - 1 AS rn
  FROM TABLE(GENERATOR(ROWCOUNT => 1000))
)
SELECT
  DATEADD('minute', r.rn, t.time) AS time,
  SUM(IFF(t.duration - r.rn < 1, t.duration - r.rn, 1)) AS duration
FROM table AS t
JOIN big_list_of_numbers AS r
  ON r.rn < t.duration
GROUP BY 1
ORDER BY 1
The GENERATOR needs a fixed input, so just use a huge number; it's not that expensive. Also, the SEQx() functions can (and do) have gaps in them, so for data where you need continuous values (like this example) the SEQx() output needs to be fed into ROW_NUMBER() to force non-distributed allocation of numbers.
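The generator trick just explodes each (time, duration) row into one row per wall-clock minute. A small Python sketch of the same arithmetic (sample values from the question; names are mine):

```python
from datetime import datetime, timedelta
import math

def explode_minutes(start, duration_min):
    """Split a duration (in minutes) into one piece per wall-clock minute."""
    pieces = []
    for rn in range(math.ceil(duration_min)):  # rn plays the role of ROW_NUMBER() - 1
        remaining = duration_min - rn
        pieces.append((start + timedelta(minutes=rn), min(remaining, 1)))
    return pieces

rows = [(datetime(2022, 1, 1, 11, 34), 4.5), (datetime(2022, 1, 1, 11, 40), 2)]
result = [p for t, d in rows for p in explode_minutes(t, d)]
```

Note this yields the 4.5-minute incident as 11:34 through 11:38 (with 0.5 for the last minute) and the second as 11:40 and 11:41; the expected table in the question appears to be slightly off on the boundary minutes.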
I need help for a project of equipment follow-up.
I have a SQL table with 3 columns (Equipment Name, Status, Date of Status Change (DATETIME format)).
EQUIPMENT     STATUS          CHANGEDATE
EQUIPMENT-1   QUALIFICATION   2020-06-30 09:37:42
EQUIPMENT-1   WAIT REPAIR     2020-06-30 16:29:20
EQUIPMENT-1   UP              2020-07-27 14:19:33
EQUIPMENT-1   ENGINEERING     2020-09-18 15:25:01
EQUIPMENT-1   UP              2020-09-20 17:31:53
The idea is to determine the elapsed time of each piece of equipment in each status between 2 fixed dates.
For example, I would like to know the elapsed time of EQUIPMENT-1 in each status between 2020-07-01 and 2020-10-01, with a result table something like this:
STATUS        ELAPSED TIME (in days)
WAIT REPAIR   26.60
UP            63.31 (10.27 + 53.05)
ENGINEERING   2.09
Today I have C# code which calculates these elapsed times, but it's slow...
So I would like to know if it's easy to replace this process with a SQL query.
Thanks for your help,
I think you want lead() and aggregation:
select equipment, status,
sum(datediff(minute,
changedate,
coalesce(next_changedate, '2020-10-01')
) / (24 * 60.0)
) as decimal_days
from (select t.*,
lead(changedate) over (partition by equipment order by changedate) as next_changedate
from t
where changedate >= '2020-07-01' and changedate < '2020-10-01'
) t
group by equipment, status;
EDIT:
If you need the initial time as well:
select equipment, status,
sum(datediff(minute,
(case when changedate < '2020-07-01' then '2020-07-01' else changedate end),
coalesce(next_changedate, '2020-10-01')
) / (24 * 60.0)
) as decimal_days
from (select t.*,
lead(changedate) over (partition by equipment order by changedate) as next_changedate
from t
) t
where (changedate >= '2020-07-01' and changedate < '2020-10-01') or
      (next_changedate >= '2020-07-01' and next_changedate < '2020-10-01')
group by equipment, status;
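The window-clipping step (truncating each status span to the [2020-07-01, 2020-10-01) interval before summing) can be sketched in Python with the question's EQUIPMENT-1 rows; it reproduces the expected 26.60 / 63.31 / 2.09 day totals:

```python
from datetime import datetime

rows = [  # (status, changedate) for EQUIPMENT-1, ordered by changedate
    ("QUALIFICATION", "2020-06-30 09:37:42"),
    ("WAIT REPAIR",   "2020-06-30 16:29:20"),
    ("UP",            "2020-07-27 14:19:33"),
    ("ENGINEERING",   "2020-09-18 15:25:01"),
    ("UP",            "2020-09-20 17:31:53"),
]
lo, hi = datetime(2020, 7, 1), datetime(2020, 10, 1)

spans = [(s, datetime.fromisoformat(t)) for s, t in rows]
totals = {}
# Pair each row with the next changedate, like lead(); the last span ends at hi.
for (status, start), (_, end) in zip(spans, spans[1:] + [(None, hi)]):
    start, end = max(start, lo), min(end, hi)  # clip the span to the window
    if start < end:
        days = (end - start).total_seconds() / 86400
        totals[status] = totals.get(status, 0.0) + days
```

QUALIFICATION drops out entirely because its span ends before the window opens, which matches the expected table.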
I have a data set that represents analytics events like:
Row timestamp account_id type
1 2018-11-14 21:05:40 UTC abc start
2 2018-11-14 21:05:40 UTC xyz another_type
3 2018-11-26 22:01:19 UTC xyz start
4 2018-11-26 22:01:23 UTC abc start
5 2018-11-26 22:01:29 UTC xyz some_other_type
11 2018-11-26 22:13:58 UTC xyz start
...
With some number of account_ids. I need to find the average time between start records per account_id.
I'm trying to use analytic functions as described here. My end goal would be a table like:
Row account_id avg_time_between_events_mins
1 xyz 53
2 abc 47
3 pqr 65
...
My best attempt, based on this post, looks like this:
WITH
events AS (
SELECT
COUNTIF(type = 'start' AND account_id='abc') OVER (ORDER BY timestamp) as diff,
timestamp
FROM
`myproject.dataset.events`
WHERE
account_id='abc')
SELECT
min(timestamp) AS start_time,
max(timestamp) AS next_start_time,
ABS(timestamp_diff(min(timestamp), max(timestamp), MINUTE)) AS minutes_between
FROM
events
GROUP BY
diff
This calculates the time between each start event and the last non-start event prior to the next start event for a specific account_id.
I tried to use PARTITION and a WINDOW FRAME CLAUSE like this:
WITH
events AS (
SELECT
COUNT(*) OVER (PARTITION BY account_id ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) as diff,
timestamp
FROM
`myproject.dataset.events`
WHERE
type = 'start')
SELECT
min(timestamp) AS start_time,
max(timestamp) AS next_start_time,
ABS(timestamp_diff(min(timestamp), max(timestamp), MINUTE)) AS minutes_between
FROM
events
GROUP BY
diff
But I got a nonsense result table. Can anyone walk me through how I would write and reason about a query like this?
You don't really need analytic functions for this:
select account_id,
       timestamp_diff(max(timestamp), min(timestamp), MINUTE) / nullif(count(*) - 1, 0) as avg_time_between_events_mins
from `myproject.dataset.events`
where type = 'start'
group by account_id;
This is the timestamp of the most recent start minus that of the oldest, divided by one less than the number of starts. That is the average time between the starts.
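The identity behind this shortcut, that (latest - earliest) / (n - 1) equals the mean gap between consecutive starts because the intermediate timestamps telescope away, is easy to verify in Python (sample start timestamps from the question):

```python
from datetime import datetime

starts = sorted([  # 'start' events for one account_id
    datetime(2018, 11, 14, 21, 5, 40),
    datetime(2018, 11, 26, 22, 1, 19),
    datetime(2018, 11, 26, 22, 13, 58),
])

# Mean of the consecutive gaps, in minutes
gaps = [(b - a).total_seconds() / 60 for a, b in zip(starts, starts[1:])]
mean_gap = sum(gaps) / len(gaps)

# The shortcut the query uses: total span divided by (n - 1)
shortcut = (starts[-1] - starts[0]).total_seconds() / 60 / (len(starts) - 1)
```

Both expressions are equal for any number of starts greater than one, which is why no window function is needed.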
Suppose my raw data is:
Timestamp High Low Volume
10:24.22345 100 99 10
10:24.23345 110 97 20
10:24.33455 97 89 40
10:25.33455 60 40 50
10:25.93455 40 20 60
With a sample time of 1 second, the output data should be as follows (with an additional Count column):
Timestamp High Low Volume Count
10:24 110 89 70 3
10:25 60 20 110 2
The sampling unit varies: 1 second, 5 sec, 1 minute, 1 hour, 1 day, ...
How can I query the sampled data quickly in a PostgreSQL database with Rails?
I want to fill in all the intervals, but I am getting the error:
ERROR: JOIN/USING types bigint and timestamp without time zone cannot be matched
SQL
SELECT
t.high,
t.low
FROM
(
SELECT generate_series(
date_trunc('second', min(ticktime)) ,
date_trunc('second', max(ticktime)) ,
interval '1 sec'
) FROM czces AS g (time)
LEFT JOIN
(
SELECT
date_trunc('second', ticktime) AS time ,
max(last_price) OVER w AS high ,
min(last_price) OVER w AS low
FROM czces
WHERE product_type ='TA' AND contract_month = '2014-08-01 00:00:00'::TIMESTAMP
WINDOW w AS (
PARTITION BY date_trunc('second', ticktime)
ORDER BY ticktime ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
) t USING (time)
ORDER BY 1
) AS t ;
Simply use date_trunc() before you aggregate. Works for the basic time units 1 second, 1 minute, 1 hour, 1 day, but not for 5 sec. Arbitrary intervals are slightly more complex; see the links below!
SELECT date_trunc('second', timestamp) AS timestamp -- or minute ...
, max(high) AS high, min(low) AS low, sum(volume) AS vol, count(*) AS ct
FROM tbl
GROUP BY 1
ORDER BY 1;
If there are no rows for a sample point, you get no row in the result. If you need one row for every sample point:
SELECT g.timestamp, t.high, t.low, t.volume, t.ct
FROM (SELECT generate_series(date_trunc('second', min(timestamp))
                            ,date_trunc('second', max(timestamp))
                            ,interval '1 sec') AS timestamp -- or minute ...
      FROM tbl) g
LEFT JOIN (
SELECT date_trunc('second', timestamp) AS timestamp -- or minute ...
, max(high) AS high, min(low) AS low, sum(volume) AS vol, count(*) AS ct
FROM tbl
GROUP BY 1
) t USING (timestamp)
ORDER BY 1;
The LEFT JOIN is essential.
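The truncate-then-aggregate idea is easy to prototype outside the database. A Python sketch over the question's sample rows (the dates are hypothetical, since the question only gives times):

```python
from datetime import datetime

rows = [  # (timestamp, high, low, volume)
    (datetime(2022, 1, 1, 10, 24, 22), 100, 99, 10),
    (datetime(2022, 1, 1, 10, 24, 23), 110, 97, 20),
    (datetime(2022, 1, 1, 10, 24, 33), 97, 89, 40),
    (datetime(2022, 1, 1, 10, 25, 33), 60, 40, 50),
    (datetime(2022, 1, 1, 10, 25, 56), 40, 20, 60),
]

buckets = {}  # truncated minute -> (high, low, volume, count)
for ts, high, low, vol in rows:
    key = ts.replace(second=0, microsecond=0)  # like date_trunc('minute', ts)
    h, l, v, c = buckets.get(key, (float("-inf"), float("inf"), 0, 0))
    buckets[key] = (max(h, high), min(l, low), v + vol, c + 1)
```

Each bucket ends up with max(high), min(low), sum(volume), and count(*), exactly as in the GROUP BY query; sample points with no rows simply have no bucket, which is what the LEFT JOIN against generate_series() fixes.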
For arbitrary intervals:
Best way to count records by arbitrary time intervals in Rails+Postgres
Retrieve aggregates for arbitrary time intervals
Aside: Don't use timestamp as column name. It's a basic type name and a reserved word in standard SQL. It's also misleading for data that's not actually a timestamp.