Averaging a variable over a period of time - SQL

I am currently having difficulty formulating this as an SQL query:
I would like to average a column (here twa) over a 10-minute window ending at the last value of the table, i.e. the rows where:
last_date - 10 minutes <= date <= last_date
I tried a first query, but it does not return the right answer:
SELECT AVG(twa), horaire
FROM of50
WHERE ((SELECT horaire FROM of50 ORDER BY horaire DESC LIMIT 1) - INTERVAL '1 minutes' > horaire)
ORDER BY horaire;
Regards,

Maybe this will do.
with t as (select max(horaire) maxhoraire from of50)
select AVG(of50.twa)
from of50, t
where of50.horaire between t.maxhoraire - interval '1 minute' and t.maxhoraire;
Or even this may do, given that the last value cannot be 'younger' than now and at least one event happened during the last minute. It is not exactly the same, though: it says 'the average over the last 1 minute'.
select AVG(twa)
from of50
where horaire >= now() - interval '1 minute';
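For the 10-minute window the question actually asks for, the same pattern should work with just the interval changed (an untested sketch against the question's of50 table):
with t as (select max(horaire) as maxhoraire from of50)
select avg(of50.twa)
from of50, t
where of50.horaire between t.maxhoraire - interval '10 minutes' and t.maxhoraire;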

Related

Google BigQuery to look at data from 2 specific dates

I am new to BigQuery. I am trying to write a WHERE condition to select only yesterday's data and that of the same day last year (in this case, 10/25/2021 data and 10/25/2020 data). I know how to select a range of data, but I couldn't figure out a way to select only those 2 days of data. Any help is appreciated.
I recommend using BigQuery functions to define dates. You can read about them here.
WHERE DATE(your_date_field) IN (DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY),
                                DATE_SUB(DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY), INTERVAL 1 YEAR))
This is dynamic to whatever day you run the query. It takes the current date and subtracts 1 day; for the other date, it takes the current date and subtracts 1 day and then 1 year, giving yesterday's date 1 year prior. For example, run on 2021-10-26, it yields 2021-10-25 and 2020-10-25.
WHERE date_my_field IN (DATE('2021-10-25'), DATE('2020-10-25'))
Use IN, which is a shortcut for the OR operator.
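In other words, the IN list above is shorthand for:
WHERE date_my_field = DATE('2021-10-25')
   OR date_my_field = DATE('2020-10-25')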
Consider the below, less verbose approach (especially if you remove the time zone):
select
  current_date('America/Los_Angeles') - 1 as yesterday,
  date(current_date('America/Los_Angeles') - 1 - interval 1 year) as same_day_last_year
So now you can use it in your WHERE clause, as in the below example (with dummy data via a CTE):
with data as (
select your_date_field
from unnest(generate_date_array(current_date() - 1000, current_date())) your_date_field
)
select *
from data
where your_date_field in (
current_date('America/Los_Angeles') - 1,
date(current_date('America/Los_Angeles') - 1 - interval 1 year)
)

Simulate query over a range of dates

I have a fairly long query that looks over the past 13 weeks and determines if the current day's performance is an anomaly compared to the last 13 weeks. It just returns a single row that has the date, the performance of the current day and a flag saying if it is an anomaly or not. To make matters a little more complicated: The performance isn't just a single day but rather a running 24 hour window. This query is then run every hour to monitor the KPI over the last 24 hours. i.e. If it is 2pm on Tuesday, it will look from 2pm the previous day (Monday) to now, and compare it to every other 2pm-to-2pm for the last 13 weeks.
To test whether this code is working, I would like to simulate it running over the past month.
The code goes as follows:
WITH performance AS (
  SELECT TRUNC(dateColumn - to_number(to_char(sysdate, 'hh24'))/24) AS startdate,
         KPI_a,
         KPI_b,
         KPI_c
  FROM table
  WHERE someConditions
  GROUP BY TRUNC(dateColumn - to_number(to_char(sysdate, 'hh24'))/24)
),
compare_t AS (
  -- looks at relationships of the KPIs
),
variables AS (
  -- calculates the variables required for the anomaly detection
),
... OK, I don't know how much of the query needs to be given, but basically I need to simulate sysdate: instead of inputting the current date, input each hour of the last month, so this query runs approximately 720 times and returns a result for each hour of each day.
I'm thinking a FOR loop, but I'm not sure.
You can use a recursive subquery:
with times(time) as
(
select sysdate - interval '1' month as time from dual
union all
select time + interval '1' hour from times
where time < sysdate
)
, performance as ( /* the question's performance CTE, with sysdate replaced by times.time */ )
, compare_t as ( /* likewise */ )
, variables as ( /* likewise */ )
select *
from times
join ...
order by time;
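For example, the performance CTE could take its reference hour from times.time instead of sysdate, roughly like this (a sketch only; my_table stands in for the question's table, and KPI_b/KPI_c would be handled analogously):
, performance as (
    select t.time as sim_time,
           trunc(m.dateColumn - to_number(to_char(t.time, 'hh24'))/24) as startdate,
           sum(m.KPI_a) as KPI_a
    from my_table m
    cross join times t
    where m.dateColumn <= t.time  -- only rows visible at the simulated hour
    group by t.time,
             trunc(m.dateColumn - to_number(to_char(t.time, 'hh24'))/24)
)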
I don't understand your specific requirements, but I had to solve similar problems. To give you an idea, here are two proposals:
Calculate the average and standard deviation of the KPI value from the past 13 weeks up to yesterday. If the current value from today is lower than "AVG - 10*STDDEV", then select the record, i.e. mark it as an anomaly.
WITH t AS
(SELECT dateColumn, KPI_A,
AVG(KPI_A) OVER (ORDER BY dateColumn RANGE BETWEEN 13 * INTERVAL '7' DAY PRECEDING AND INTERVAL '1' DAY PRECEDING) AS REF_AVG,
STDDEV(KPI_A) OVER (ORDER BY dateColumn RANGE BETWEEN 13 * INTERVAL '7' DAY PRECEDING AND INTERVAL '1' DAY PRECEDING) AS REF_STDDEV
FROM TABLE
WHERE someConditions)
SELECT dateColumn, REF_AVG, KPI_A, REF_STDDEV
FROM t
WHERE TRUNC(dateColumn, 'HH') = TRUNC(LOCALTIMESTAMP, 'HH')
AND KPI_A < REF_AVG - 10 * REF_STDDEV;
Take hourly values from last week (i.e. the same weekday as yesterday) and correlate them with hourly values from yesterday. If the correlation is less than a certain value (I use 95%), then consider the day an anomaly.
WITH t AS
(SELECT dateColumn, KPI_A,
FIRST_VALUE(KPI_A) OVER (ORDER BY dateColumn RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW) AS KPI_A_LAST_WEEK,
dateColumn - FIRST_VALUE(dateColumn) OVER (ORDER BY dateColumn RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW) AS RANGE_INT
FROM table
WHERE ...)
SELECT 100*ROUND(CORR(KPI_A, KPI_A_LAST_WEEK), 2) AS CORR_VAL
FROM t
WHERE KPI_A_LAST_WEEK IS NOT NULL
AND RANGE_INT = INTERVAL '7' DAY
AND TRUNC(dateColumn) = TRUNC(LOCALTIMESTAMP - INTERVAL '1' DAY)
GROUP BY TRUNC(dateColumn);

Efficient PostgreSQL Query for Mins and Maxes within equal intervals in a time period

I am using Postgres v9.2.6.
I have a system with lots of devices that take measurements. These measurements are stored in a table with three fields:
device_id
measurement (Indexed)
time (Indexed)
There could be 10 million measurements in a single year. Most of the time the user is only interested in 100 min/max pairs within equal intervals for a certain period, for example the last 24 hours or the last 53 weeks. To get these 100 mins and maxes, the period is divided into 100 equal intervals, and the min and max are extracted from each interval. What would be the most efficient approach to query the data? So far I have tried the following query:
WITH periods AS (
  SELECT time.start AS st, time.start + (interval '1 year' / 100) AS en
  FROM generate_series(now() - interval '1 year', now(), interval '1 year' / 100) AS time(start)
)
SELECT *
FROM sample_data
JOIN periods
  ON created_at BETWEEN periods.st AND periods.en
 AND customer_id = 23
WHERE sample_data.id = (SELECT id FROM sample_data
                        WHERE created_at BETWEEN periods.st AND periods.en
                        ORDER BY sample ASC LIMIT 1)
This test approach took over a minute for 1 million points on a MacBook Pro.
Thanks...
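One common approach is to bucket the rows with width_bucket() and aggregate once per bucket; a sketch against the fields described above (the measurements table name and the device filter are assumptions, untested on v9.2):
SELECT width_bucket(extract(epoch FROM time),
                    extract(epoch FROM now() - interval '1 year'),
                    extract(epoch FROM now()), 100) AS bucket,
       min(measurement) AS min_measurement,
       max(measurement) AS max_measurement
FROM measurements
WHERE device_id = 23  -- assumed filter, mirroring customer_id = 23 above
  AND time BETWEEN now() - interval '1 year' AND now()
GROUP BY bucket
ORDER BY bucket;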
Sorry about that. It was actually my question, and it looks like the author of this post caught a cold, so I cannot ask him to edit it. I've posted a better question here: Slow PostgreSQL Query for Mins and Maxs within equal intervals in a time period. Could you please close this question?

Best way to count rows by arbitrary time intervals

My app has an Events table with time-stamped events.
I need to report the count of events during each of the most recent N time intervals. For different reports, the interval could be "each week" or "each day" or "each hour" or "each 15-minute interval".
For example, a user can display how many orders they received each week, day, or hour, or quarter-hour.
1) My preference is to dynamically do a single SQL query (I'm using Postgres) that groups by an arbitrary time interval. Is there a way to do that?
2) An easy but ugly brute force way is to do a single query for all records within the start/end timeframe sorted by timestamp, then have a method manually build a tally by whatever interval.
3) Another approach would be to add separate fields to the event table for each interval and statically store the_week, the_day, the_hour, and the_quarter_hour fields, so I take the 'hit' once at the time the record is created instead of every time I report on that field.
What's best practice here, given I could modify the model and pre-store interval data if required (although at the modest expense of doubling the table width)?
Luckily, you are using PostgreSQL. The set-returning function generate_series() is your friend.
Test case
Given the following test table (which you should have provided):
CREATE TABLE event(event_id serial, ts timestamp);
INSERT INTO event (ts)
SELECT generate_series(timestamp '2018-05-01'
, timestamp '2018-05-08'
, interval '7 min') + random() * interval '7 min';
One event for every 7 minutes (plus 0 to 7 minutes, randomly).
Basic solution
This query counts events for any arbitrary time interval. 17 minutes in the example:
WITH grid AS (
SELECT start_time
, lead(start_time, 1, 'infinity') OVER (ORDER BY start_time) AS end_time
FROM (
SELECT generate_series(min(ts), max(ts), interval '17 min') AS start_time
FROM event
) sub
)
SELECT start_time, count(e.ts) AS events
FROM grid g
LEFT JOIN event e ON e.ts >= g.start_time
AND e.ts < g.end_time
GROUP BY start_time
ORDER BY start_time;
The query retrieves minimum and maximum ts from the base table to cover the complete time range. You can use an arbitrary time range instead.
Provide any time interval as needed.
Produces one row for every time slot. If no event happened during that interval, the count is 0.
Be sure to handle upper and lower bound correctly. See:
Unexpected results from SQL query with BETWEEN timestamps
The window function lead() has an often overlooked feature: it can provide a default for when no leading row exists ('infinity' in the example). Otherwise the last interval would be cut off by a NULL upper bound.
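A minimal illustration of that third lead() argument (values invented):
SELECT ts, lead(ts, 1, 'infinity') OVER (ORDER BY ts) AS end_time
FROM  (VALUES (timestamp '2018-05-01 00:00')
            , (timestamp '2018-05-01 00:17')) v(ts);
-- the last row gets end_time 'infinity' instead of NULL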
Minimal equivalent
The above query uses a CTE and lead() and verbose syntax. Elegant and maybe easier to understand, but a bit more expensive. Here is a shorter, faster, minimal version:
SELECT start_time, count(e.ts) AS events
FROM (SELECT generate_series(min(ts), max(ts), interval '17 min') FROM event) g(start_time)
LEFT JOIN event e ON e.ts >= g.start_time
AND e.ts < g.start_time + interval '17 min'
GROUP BY 1
ORDER BY 1;
Example for "every 15 minutes in the past week"
Formatted with to_char().
SELECT to_char(start_time, 'YYYY-MM-DD HH24:MI'), count(e.ts) AS events
FROM generate_series(date_trunc('day', localtimestamp - interval '7 days')
, localtimestamp
, interval '15 min') g(start_time)
LEFT JOIN event e ON e.ts >= g.start_time
AND e.ts < g.start_time + interval '15 min'
GROUP BY start_time
ORDER BY start_time;
Still ORDER BY and GROUP BY on the underlying timestamp value, not on the formatted string. That's faster and more reliable.
db<>fiddle here
Related answer producing a running count over the time frame:
PostgreSQL: running count of rows for a query 'by minute'

SELECT using timestamp in Oracle SQL

My problem is trying to use a SELECT statement and order the top 10 by a certain column. I managed to compile something after searching through lots of forums; however, I need to confirm that the timestamp in one field is within the last week. I have gotten this to execute, but I'm not sure whether it is correct, as I can't print the value used in the WHERE clause:
SELECT itemid, count(itemid)
FROM Rateddate
WHERE TO_CHAR(CURRENT_TIMESTAMP - DATE_RATED) < TO_CHAR(7)
GROUP BY itemid;
TLDR:
TO_CHAR(CURRENT_TIMESTAMP - DATE_RATED) < TO_CHAR(7)
does this make sure the date_rated timestamp is less than a week old?
It would make more sense to say
WHERE date_rated > sysdate - interval '7' day
if you want data from the last 168 hours. You may want
WHERE date_rated > trunc(sysdate) - interval '7' day
if you want data from any point in the day 7 days ago rather than caring about what time of day it is currently.
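To see the difference, suppose it is currently 14:30 on 2021-10-26 (a made-up moment for illustration):
SELECT sysdate - interval '7' day        AS last_168_hours,  -- 2021-10-19 14:30:00
       trunc(sysdate) - interval '7' day AS whole_day_7_ago  -- 2021-10-19 00:00:00
FROM dual;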
Wouldn't this work for you:
trunc(SYSDATE - DATE_RATED) < 7
This assumes DATE_RATED is a DATE field, so the subtraction yields a number of days (with CURRENT_TIMESTAMP it would yield an INTERVAL, which TRUNC does not accept). If DATE_RATED is a string, you need to convert it first with TO_DATE().