Group by arbitrary interval - sql

I have a column of type timestamp. I would like to dynamically group the results by an arbitrary time period (it can be 10 seconds or even 5 hours).
Suppose I have this kind of data:
[sample data was posted as an image]
If the user provides 2 hours and wants to get the max value of the air_pressure, I would like to have the first row combined with the second one. The result should look like this:
date | max air_pressure
2022-11-22 00:00:00:000 | 978.81666667
2022-11-22 02:00:00:000 | 978.53
2022-11-22 04:00:00:000 | 987.23333333
and so on. As I mentioned, the period must be easy to change, because the user may want to group by days or seconds instead.
The functionality should work like the date_trunc() function, but date_trunc() can only truncate to fixed units (seconds, minutes, hours, ...), while I would like to group by arbitrary intervals.

Basically:
SELECT g.start_time, max(air_pressure) AS max_air_pressure
FROM generate_series($start
, $end
, interval '15 min') g(start_time)
LEFT JOIN tbl t ON t.date_id >= g.start_time
AND t.date_id < g.start_time + interval '15 min' -- same interval
GROUP BY 1
ORDER BY 1;
$start and $end are timestamps delimiting your time frame of interest.
Returns all time slots, and NULL for max_air_pressure if no matching entries are found for the time slot.
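To try the generate-the-slots-then-LEFT-JOIN pattern without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module. It relies on a few assumptions. SQLite has no generate_series() or interval type, so a recursive CTE produces the slots, and all arithmetic is done in Unix epoch seconds, which makes any step (10 s, 2 h, 5 h) just an integer. The table tbl, its columns date_id and air_pressure, and the sample rows are hypothetical.

```python
import sqlite3


def max_per_slot(rows, start, end, step_seconds):
    """Return [(slot_start, max_air_pressure or None), ...] for the slots
    [start, start+step), [start+step, start+2*step), ... whose start is <= end.

    start/end are 'YYYY-MM-DD HH:MM:SS' strings (UTC assumed). The schema
    below is hypothetical and only mirrors the column names in the answer.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (date_id TEXT, air_pressure REAL)")
    con.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    query = """
    WITH RECURSIVE slots(s) AS (              -- stand-in for generate_series()
        SELECT CAST(strftime('%s', :start) AS INTEGER)
        UNION ALL
        SELECT s + :step FROM slots
        WHERE s + :step <= CAST(strftime('%s', :end) AS INTEGER)
    )
    SELECT datetime(slots.s, 'unixepoch') AS start_time,
           MAX(t.air_pressure)            AS max_air_pressure  -- NULL if slot is empty
    FROM slots
    LEFT JOIN tbl t
           ON CAST(strftime('%s', t.date_id) AS INTEGER) >= slots.s
          AND CAST(strftime('%s', t.date_id) AS INTEGER) <  slots.s + :step
    GROUP BY slots.s
    ORDER BY slots.s
    """
    result = con.execute(
        query, {"start": start, "end": end, "step": step_seconds}
    ).fetchall()
    con.close()
    return result
```

Changing the grouping period only means passing a different `step_seconds`, for example 10 or 5 * 3600. Empty slots come back with `None`, which matches the NULL behaviour described above.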
See:
Best way to count rows by arbitrary time intervals
Aside: "date_id" is an unfortunate column name for a timestamp.

Related

Averaging a variable over a period of time

I am currently having difficulty formulating this as an SQL query:
I would like to average the values of a column (here twa) over a duration of 10 minutes ending at the last value in the table, i.e. over the rows where:
last_date - 10 minutes <= date <= last_date
I tried a first query, but it does not return the right answer:
SELECT AVG(twa), horaire FROM OF50 WHERE ((SELECT horaire FROM of50 ORDER BY horaire DESC LIMIT 1)-INTERVAL '1 minutes'>horaire) ORDER BY horaire;
Regards,
Maybe this will do.
with t as (select max(horaire) maxhoraire from of50)
select AVG(of50.twa)
from of50, t
where of50.horaire between t.maxhoraire - interval '10 minutes' and t.maxhoraire;
Or this may also do, given that the last value cannot be 'younger' than now() and at least one event happened during the last 10 minutes. It is not exactly the same, though: it means 'the average over the last 10 minutes before now', not before the latest row.
select AVG(twa)
from of50
where horaire >= now() - interval '10 minutes';
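Here is a minimal sketch of the first query's logic, run against SQLite through Python's sqlite3 module so it can be executed without PostgreSQL. SQLite has no interval type, so `datetime(maxhoraire, '-10 minutes')` stands in for `maxhoraire - interval '10 minutes'`. This assumes timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text, which compares correctly as strings. The rows are made up.

```python
import sqlite3


def avg_last_window(rows, minutes=10):
    """Average twa over [latest horaire - minutes, latest horaire], both ends inclusive."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE of50 (horaire TEXT, twa REAL)")  # hypothetical schema
    con.executemany("INSERT INTO of50 VALUES (?, ?)", rows)
    (average,) = con.execute(
        """
        WITH t AS (SELECT max(horaire) AS maxhoraire FROM of50)
        SELECT avg(of50.twa)
        FROM of50, t
        WHERE of50.horaire BETWEEN datetime(t.maxhoraire, ?) AND t.maxhoraire
        """,
        (f"-{minutes} minutes",),  # SQLite date modifier, e.g. '-10 minutes'
    ).fetchone()
    con.close()
    return average
```

The window is anchored on the latest stored row, not on the current time, which is the difference between the two queries above.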

getting day wise query result for a certain time period in postgresql

I have a table in a PostgreSQL database called orders, where all order-related information is stored. If an order gets rejected, its row is moved from the orders table to the rejected_orders table. As a result, COUNT() does not give the correct number of orders.
So, to get the number of order requests on a certain day, I have to subtract the id of the first order of the day from the id of the last order of the day. Below is the query for the total number of requests on March 1st, 2022. Sadly, the previous employee forgot to save the timezone correctly in the database: data is saved in UTC+00, but fetched data needs to be in GMT+06.
select
(select id from orders
where created_at<'2022-03-02 00:00:00+06'
order by created_at desc limit 1
)
-
(select id from orders
where created_at>='2022-03-01 00:00:00+06'
order by created_at limit 1
) as march_1st;
march_1st
-----------
185
Now, if I want the total requests per day for a certain time period (let's say March 2021), how can I do that in one SQL query without having to write one query per day?
To wrap up:
total_request_per_day = id of last order of the day - id of first order of the day
How do I write a query based on that logic that gives me total_request_per_day for every day in a certain month?
Like this:
| Date       | total requests |
| 01-03-2022 | 187            |
| 02-03-2022 | 202            |
| 03-03-2022 | 227            |
| ...        | ...            |
With respect, using id numbers to determine the number of rows in a time period is incorrect. DELETEing rows leaves gaps in id sequences; they are not designed for this purpose.
This is a job for date_trunc(), COUNT(*), and GROUP BY.
The date_trunc('day', created_at) function turns an arbitrary timestamp into midnight on its day. For example, it turns 2022-03-02 16:41:00 into 2022-03-02 00:00:00. Using that, we can write the query this way:
SELECT COUNT(*) order_count,
date_trunc('day', created_at) day
FROM orders
WHERE created_at >= date_trunc('day', NOW()) - INTERVAL '7 day'
AND created_at < date_trunc('day', NOW())
GROUP BY date_trunc('day', created_at)
This query gives the number of orders on each day in the last 7 days.
Every minute you spend learning how to use SQL date arithmetic like this will pay off in hours saved in your work.
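Here is a sketch of the same COUNT(*) ... GROUP BY idea in SQLite via Python's sqlite3, where `date()` plays the role of `date_trunc('day', ...)`. The question says timestamps are stored in UTC but days should be counted in GMT+06. This sketch assumes a fixed '+6 hours' shift before truncating; in PostgreSQL one would more likely reach for AT TIME ZONE. The table contents are hypothetical.

```python
import sqlite3


def orders_per_local_day(created_at_utc, offset="+6 hours"):
    """Count orders per calendar day after shifting UTC timestamps by `offset`."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")  # hypothetical
    con.executemany(
        "INSERT INTO orders (created_at) VALUES (?)", [(ts,) for ts in created_at_utc]
    )
    rows = con.execute(
        """
        SELECT date(created_at, ?) AS day,   -- shift to local time, then truncate to the day
               COUNT(*)            AS order_count
        FROM orders
        GROUP BY day
        ORDER BY day
        """,
        (offset,),
    ).fetchall()
    con.close()
    return rows
```

Counting rows this way does not depend on ids being gap-free, so deleted or moved rows cannot skew the per-day numbers.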
Try this:
SELECT d.ref_date :: date AS "date"
, count(orders.id) AS "total requests"
FROM generate_series('20220301' :: timestamp, '20220331' :: timestamp, '1 day') AS d(ref_date)
LEFT JOIN orders
ON date_trunc('day', d.ref_date) = date_trunc('day', created_at)
GROUP BY d.ref_date
ORDER BY d.ref_date
generate_series() generates the list of reference days for which you want to count the orders.
Then you join with the orders table by comparing the reference date with the created_at date on year/month/day only. The LEFT JOIN keeps reference days that have no orders. Counting orders.id rather than * makes those days show 0 instead of 1, because count(*) would also count the NULL-extended row.
Finally, you count the number of orders per day by grouping by reference day.
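The sketch below mirrors this answer in SQLite (via Python's sqlite3), with a recursive CTE standing in for generate_series(). It shows why the count should be taken on a column of orders rather than with COUNT(*). With a LEFT JOIN, a day without orders still yields one row of NULLs, so COUNT(*) reports 1 for it while COUNT(o.id) reports 0. The table and data are hypothetical, and days are compared in UTC.

```python
import sqlite3


def orders_per_day(created_at, first_day, last_day):
    """Return (day, COUNT(o.id), COUNT(*)) for every day from first_day to last_day."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")  # hypothetical
    con.executemany(
        "INSERT INTO orders (created_at) VALUES (?)", [(ts,) for ts in created_at]
    )
    rows = con.execute(
        """
        WITH RECURSIVE days(d) AS (            -- stand-in for generate_series(..., '1 day')
            SELECT date(:first)
            UNION ALL
            SELECT date(d, '+1 day') FROM days WHERE d < date(:last)
        )
        SELECT days.d,
               COUNT(o.id) AS total_requests,  -- 0 for days without orders
               COUNT(*)    AS count_star       -- 1 for such days: the NULL-extended row
        FROM days
        LEFT JOIN orders o ON date(o.created_at) = days.d
        GROUP BY days.d
        ORDER BY days.d
        """,
        {"first": first_day, "last": last_day},
    ).fetchall()
    con.close()
    return rows
```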

Get count of matching time ranges for every minute of the day in Postgres

Problem
I have a table of records each containing id, in_datetime, and out_datetime. A record is considered "open" during the time between the in_datetime and out_datetime. I want to know how many time records were "open" for each minute of the day (regardless of date). For example, for the last 90 days I want to know how many records were "open" at 3:14 am, then 3:15 am, then 3:16 am, then... If no records were "open" at 2:00 am the query should return 0 or null instead of excluding the row, thus 1440 rows should always be returned (the number of minutes in a day). Datetimes are stored in UTC and need to be cast to a time zone.
Simplified example:
record_id | time_range
          | 0123456789  (minutes past midnight)
1         | =========
2         |    ===
3         | =======
4         |      ===
5         | ==
______________________
result    | 3323343210
Desired output
time | count of open records at this time
00:00 120
00:01 135
00:02 132
...
23:57 57
23:58 62
23:59 60
No more than 1440 records would ever be returned as there are only 1440 minutes in the day.
What I've tried
1.) In a subquery, I currently generate a minutely series of times for the entire range of each time record. I then group those by time and get a count of the records per minute.
Here is a db-fiddle using my current query:
select
trs.minutes,
count(trs.minutes)
from (
select
generate_series(
DATE_TRUNC('minute', (time_records.in_datetime::timestamptz AT TIME ZONE 'America/Denver')),
DATE_TRUNC('minute', (time_records.out_datetime::timestamptz AT TIME ZONE 'America/Denver')),
interval '1 min'
)::time as minutes
from
time_records
) trs
group by
trs.minutes
This works but is quite inefficient and takes several seconds to run due to the size of my table. Additionally, it excludes times when no records were open. I think somehow I could use window functions to count the number of overlapping time records for each minute of the day, but I don't quite understand how to do that.
2.) Modifying Gordon Linoff's query in his answer below, I came to this (db-fiddle link):
with tr as (
select
date_trunc('minute', (tr.in_datetime::timestamptz AT TIME ZONE 'America/Denver'))::time as m,
1 as inc
from
time_records tr
union all
select
(date_trunc('minute', (tr.out_datetime::timestamptz AT TIME ZONE 'America/Denver')) + interval '1 minute')::time as m,
-1 as inc
from
time_records tr
union all
select
minutes::time,
0
from
generate_series(timestamp '2000-01-01 00:00', timestamp '2000-01-01 23:59', interval '1 min') as minutes
)
select
m,
sum(inc) as changes_at_inc,
sum(sum(inc)) over (order by m) as running_count
from
tr
where
m is not null
group by
m
order by
m;
This runs reasonably quickly, but towards the end of the day (about 22:00 onwards in the linked example) the values turn negative for some reason. Additionally, this query doesn't seem to work correctly with records with time ranges that cross over midnight. It's a step in the right direction, but I unfortunately don't understand it enough to improve on it further.
Here is a faster method. Generate "in" and "out" records for when something gets counted. Then aggregate and use a running sum.
To get all minutes, throw in a generate_series() for the time period in question:
with tr as (
select date_trunc('minute', (tr.in_datetime::timestamptz AT TIME ZONE 'America/Denver')) as m,
1 as inc
from time_records tr
union all
select date_trunc('minute', (tr.out_datetime::timestamptz AT TIME ZONE 'America/Denver')) + interval '1 minute' as m,
-1 as inc
from time_records tr
union all
select generate_series(date_trunc('minute',
min(tr.in_datetime::timestamptz AT TIME ZONE 'America/Denver')),
date_trunc('minute',
max(tr.out_datetime::timestamptz AT TIME ZONE 'America/Denver')),
interval '1 minute'
), 0
from time_records tr
)
select m,
sum(inc) as changes_at_inc,
sum(sum(inc)) over (order by m) as running_count
from tr
group by m
order by m;
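Below is a Python sketch of the +1/-1 event and running-sum technique from this answer. As an assumption beyond the answer, which stops at absolute timestamps, it folds the counts onto minute-of-day and wraps records that cross midnight. It aims to reproduce the counting semantics of the question's generate_series() attempt: each covered minute counts once per day on which it occurs. This also suggests a plausible cause of the negative values in the second attempt. Casting the events to ::time before the running sum puts a midnight-crossing record's -1 earlier in the day than its +1. As far as I can tell, that lowers every minute by one per such record, which shows up as negative totals wherever few records are open. The sketch avoids this by splitting a wrapped range into two pieces in a 1441-slot difference array. Inputs are assumed to be already converted to local time (e.g. America/Denver) and already filtered to the period of interest.

```python
from datetime import datetime, timedelta

MINUTES_PER_DAY = 1440


def open_counts_by_minute_of_day(records):
    """Count, for each minute past midnight, how many (in, out) records were open.

    records: iterable of (in_dt, out_dt) datetimes already in local time, in_dt <= out_dt.
    Both ends are truncated to the minute and treated as inclusive, like the
    generate_series() attempt. Returns 1440 counts; index 0 is 00:00, 1439 is 23:59.
    """
    diff = [0] * (MINUTES_PER_DAY + 1)  # +1 where a range starts, -1 just past its end
    base = 0  # records covering every minute of the day (one per full 24h they span)
    for in_dt, out_dt in records:
        start = in_dt.replace(second=0, microsecond=0)
        end = out_dt.replace(second=0, microsecond=0)
        length = (end - start) // timedelta(minutes=1) + 1  # minutes covered, inclusive
        full_days, rest = divmod(length, MINUTES_PER_DAY)
        base += full_days
        if rest == 0:
            continue
        s = start.hour * 60 + start.minute
        e = s + rest  # exclusive end in minute-of-day space; may run past midnight
        diff[s] += 1
        if e <= MINUTES_PER_DAY:
            diff[e] -= 1
        else:  # wraps around midnight: [s, 1440) plus [0, e - 1440)
            diff[MINUTES_PER_DAY] -= 1
            diff[0] += 1
            diff[e - MINUTES_PER_DAY] -= 1
    counts, running = [], base
    for minute in range(MINUTES_PER_DAY):
        running += diff[minute]  # running sum, like sum(sum(inc)) over (order by m)
        counts.append(running)
    return counts


if __name__ == "__main__":
    day = datetime(2024, 1, 1)
    overnight = (day.replace(hour=23, minute=58), day + timedelta(days=1, minutes=1))
    counts = open_counts_by_minute_of_day([overnight])
    print(counts[:3], counts[-2:])  # [1, 1, 0] [1, 1]
```

Every one of the 1440 minutes is always present in the result (0 when nothing is open), and the work is linear in the number of records rather than in the total number of minutes they span.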

Efficient PostgreSQL Query for Mins and Maxs within equal intervals in a time period

I am using Postgres v9.2.6.
I have a system with lots of devices that take measurements. These measurements are stored in
a table with three fields:
device_id
measurement (Indexed)
time (Indexed)
There could be 10 million measurements in a single year. Most of the time the user is only interested in 100 min/max pairs over equal intervals of a certain period, for example the last 24 hours or the last 53 weeks. To get these 100 mins and maxs, the period is divided into 100 equal intervals, and the min and max are extracted from each interval. What would you recommend as the most efficient approach to query the data? So far I have tried the following query:
WITH periods AS (
SELECT time.start AS st, time.start + (interval '1 year' / 100) AS en FROM generate_series(now() - interval '1 year', now(), interval '1 year' / 100) AS time(start)
)
SELECT * FROM sample_data
JOIN periods
ON created_at BETWEEN periods.st AND periods.en AND
customer_id = 23
WHERE
sample_data.id = (SELECT id FROM sample_data WHERE created_at BETWEEN periods.st AND periods.en ORDER BY sample ASC LIMIT 1)
This test approach took over a minute for 1 million points on a MacBook Pro.
Thanks...
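The core requirement, one min/max pair for each of N equal sub-intervals, can be computed in a single pass. Each timestamp is mapped to a bucket index instead of running one subquery per interval. The Python sketch below assumes in-memory (time, measurement) pairs. In PostgreSQL the analogous bucket index might come from width_bucket() applied to extract(epoch from time), but that translation is an untested assumption here.

```python
from datetime import datetime, timedelta


def minmax_per_interval(samples, start, end, n=100):
    """Split [start, end) into n equal intervals and return a list of n entries,
    each the (min, max) of the measurements in that interval, or None if empty.

    samples: iterable of (timestamp, measurement) pairs, in any order.
    """
    span = (end - start).total_seconds()
    buckets = [None] * n
    for ts, value in samples:
        if not (start <= ts < end):
            continue  # outside the requested period
        # Bucket index = floor(offset / (span / n)); the clamp guards float rounding.
        idx = min(int((ts - start).total_seconds() * n // span), n - 1)
        current = buckets[idx]
        if current is None:
            buckets[idx] = (value, value)
        else:
            buckets[idx] = (min(current[0], value), max(current[1], value))
    return buckets


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    data = [(t0 + timedelta(minutes=m), v) for m, v in [(30, 5.0), (45, 2.0), (130, 7.0), (239, 1.0)]]
    print(minmax_per_interval(data, t0, t0 + timedelta(hours=4), n=4))
    # [(2.0, 5.0), None, (7.0, 7.0), (1.0, 1.0)]
```

Each sample is read once, so the cost grows with the number of rows in the period, not with rows times intervals as in a per-interval subquery.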
Sorry about that. It was actually my question, and it looks like the author of this post caught a cold, so I can't ask him to edit it. I've posted a better question here: Slow PostgreSQL Query for Mins and Maxs within equal intervals in a time period. Could you please close this question?

MySQL Sum based on date range and time of day

I have a large set of data collected every 15 minutes. I am trying to select data within a certain date range, divide that range into smaller date intervals, and then, within each interval, sum over certain times of day.
For example, I would like to be able to select data between 01/01/2009 and 01/01/2010 and group by date ranges 01/01/2009 - 05/01/2009, 05/02/2009 - 11/01/2009, 11/02/2009 - 01/01/2010 and then within each group select the data from time 00:00:01 - 12:00:00 and 12:00:01 - 23:59:59
SELECT SUM(Data.usage) AS total_usage
FROM Data
JOIN Meter ON Data.meter_id = Meter.id
WHERE Data.start_read >= '2009-01-01'
AND Data.end_read <= '2010-01-01 23:59:59'
How do I GROUP BY date range? I'm not sure how to separate the data. Thanks
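To group by date ranges, I often use CASE expressions:
GROUP BY CASE
WHEN start_read >= '2009-01-01' AND start_read < '2009-05-02' THEN 'Jan 1-May 1 09'
WHEN start_read >= '2009-05-02' AND start_read < '2009-11-02' THEN 'May 2-Nov 1 09'
...etc
END
Write the dates as 'YYYY-MM-DD' literals, which is the format MySQL expects. Use half-open ranges (>= first day AND < the day after the last day) so each boundary day falls into exactly one group and readings later on the last day are not dropped.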