SQL sliding window - finding max value over interval

I have a sliding window problem. Specifically, I do not know where my window should start and where it should end; I only know the size of my interval/window.
I need to find the start/end of the window that delivers the best (or worst, depending on how you look at it) case scenario.
Here is an example dataset:
value | tstamp
100 | 2013-02-20 00:01:00
200 | 2013-02-20 00:02:00
300 | 2013-02-20 00:03:00
400 | 2013-02-20 00:04:00
500 | 2013-02-20 00:05:00
600 | 2013-02-20 00:06:00
500 | 2013-02-20 00:07:00
400 | 2013-02-20 00:08:00
300 | 2013-02-20 00:09:00
200 | 2013-02-20 00:10:00
100 | 2013-02-20 00:11:00
Let's say I know that my interval needs to be 5 minutes. So I need to know the values and timestamps included in the 5-minute interval where the sum of 'value' is highest. In the example above, the rows from '2013-02-20 00:04:00' to '2013-02-20 00:08:00' give a sum of 400+500+600+500+400 = 2400, which is the highest value over 5 minutes in that table.
I'm not opposed to using multiple tables if needed, but I'm trying to find a "best case scenario" interval. The result can take either form, as long as it identifies the interval: if I get all data points in that interval, that works, and if I get the start and end points, I can use those as well.
I've found several sliding window problems for SQL, but haven't found any where the window size is the known factor and the starting point is unknown.

SELECT *,
(
SELECT SUM(value)
FROM mytable mi
WHERE mi.tstamp > m.tstamp - '5 minute'::INTERVAL AND mi.tstamp <= m.tstamp -- half-open window, so the row exactly 5 minutes back is excluded
) AS maxvalue
FROM mytable m
ORDER BY
maxvalue DESC
LIMIT 1
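Here tstamp is the end of the best window. In PostgreSQL 11 and above, a window function with a RANGE frame avoids the subquery. That frame includes the row exactly 5 minutes back, so with one reading per minute RANGE '5 minute' PRECEDING sums six rows; use '4 minute' to match the five-row window from the question: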
SELECT SUM(value) OVER (ORDER BY tstamp RANGE '5 minute' PRECEDING) AS maxvalue,
*
FROM mytable m
ORDER BY
maxvalue DESC
LIMIT 1
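You can check the window boundaries without a PostgreSQL server. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. It applies the correlated-subquery idea as a self join over a half-open window (end - 5 minutes, end]. It also returns the first timestamp inside the best window, so you get both the start and the end. A few things here are assumptions of the sketch, not part of the PostgreSQL answers: tstamp is stored as ISO-8601 text, it is shifted with SQLite's datetime() modifiers, and make_db and best_window are hypothetical helper names.

```python
import sqlite3

# Sample data from the question: one reading per minute, 00:01 .. 00:11.
VALUES = [100, 200, 300, 400, 500, 600, 500, 400, 300, 200, 100]
ROWS = [(v, f"2013-02-20 00:{i + 1:02d}:00") for i, v in enumerate(VALUES)]


def make_db() -> sqlite3.Connection:
    """Create an in-memory table shaped like the question's mytable."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (value INTEGER, tstamp TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", ROWS)
    return conn


def best_window(conn: sqlite3.Connection, minutes: int = 5):
    """Return (window_start, window_end, total) for the highest-sum window.

    For every row m the window is (m.tstamp - minutes, m.tstamp]: the row
    exactly `minutes` earlier is excluded, so a 5-minute window holds five
    one-minute readings. ISO text timestamps compare correctly as strings.
    """
    return conn.execute(
        """
        SELECT MIN(mi.tstamp) AS window_start,
               m.tstamp       AS window_end,
               SUM(mi.value)  AS total
        FROM mytable AS m
        JOIN mytable AS mi
          ON mi.tstamp >  datetime(m.tstamp, ?)
         AND mi.tstamp <= m.tstamp
        GROUP BY m.tstamp
        ORDER BY total DESC
        LIMIT 1
        """,
        (f"-{minutes} minutes",),
    ).fetchone()


print(best_window(make_db()))
```

With the sample data this prints ('2013-02-20 00:04:00', '2013-02-20 00:08:00', 2400). The self join compares each row with its neighbours inside the window. On large tables, the PostgreSQL 11 window-function form is likely the cheaper option.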

Related

Extract 30-minute intervals from a timestamp and group by them - PGSQL

In PostgreSQL, I am extracting the hour from a timestamp using the query below:
select count(*) as logged_users, EXTRACT(hour from login_time::timestamp) as Hour
from loginhistory
where login_time::date = '2021-04-21'
group by Hour order by Hour;
And the output is as follows:
logged_users | hour
--------------+------
27 | 7
82 | 8
229 | 9
1620 | 10
1264 | 11
1990 | 12
1027 | 13
1273 | 14
1794 | 15
1733 | 16
878 | 17
126 | 18
21 | 19
5 | 20
3 | 21
1 | 22
I want the same output, but grouped by 30-minute intervals. Please suggest how.
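SELECT to_timestamp(floor(extract(epoch FROM login_time::timestamp) / 1800) * 1800) AT TIME ZONE 'UTC' AS interval_30_min
, count(*) AS logged_users
FROM loginhistory
WHERE login_time::date = '2021-04-21' -- inefficient!
GROUP BY 1
ORDER BY 1;
Extracting the epoch gets the number of seconds since the epoch. Dividing by 1800 and taking floor() rounds down to a whole number of 30-minute steps. Multiplying back gives the start of the bucket, which achieves the same as date_trunc() for arbitrary time intervals. Casting to bigint before the division would round instead of truncate, and that could push a timestamp within half a second of a boundary into the next bucket.
1800 because 30 minutes contain 1800 seconds.
to_timestamp() returns timestamptz. AT TIME ZONE 'UTC' turns the result back into the original plain timestamp, so it is not shifted by the session's time zone setting.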
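Here is the same floor-to-bucket arithmetic sketched in Python for illustration. bucket_start is a hypothetical helper name. The naive timestamp is read as UTC, which mirrors what extract(epoch ...) does for a timestamp without time zone. It is floored rather than rounded, then converted back through UTC so that no local time zone shifts the result.

```python
from datetime import datetime, timezone


def bucket_start(ts: datetime, seconds: int = 1800) -> datetime:
    """Round a naive timestamp down to the start of its `seconds`-long bucket.

    The naive value is treated as UTC seconds since the epoch, floored to a
    multiple of `seconds`, and converted back to a naive timestamp.
    """
    epoch = ts.replace(tzinfo=timezone.utc).timestamp()
    floored = (epoch // seconds) * seconds
    return datetime.fromtimestamp(floored, tz=timezone.utc).replace(tzinfo=None)


# 10:44:59 falls in the 10:30 bucket; 10:29:59.7 stays in the 10:00 bucket
# instead of being rounded up into the next one.
print(bucket_start(datetime(2021, 4, 21, 10, 44, 59)))
print(bucket_start(datetime(2021, 4, 21, 10, 29, 59, 700000)))
```

Counting logins per bucket is then a matter of grouping by bucket_start(login_time), just as the SQL groups by its first column.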
Detailed explanation:
Truncate timestamp to arbitrary intervals
The cast to timestamp makes me wonder about the actual data type of login_time. If it's timestamptz, the cast depends on your current time zone setting and sets you up for surprises if that setting changes. See:
How do I match an entire day to a datetime field?
Subtract hours from the now() function
Ignoring time zones altogether in Rails and PostgreSQL
Depending on the actual data type, and exact definition of your date boundaries, there is a more efficient way to phrase your WHERE clause.
You can change the column on which you're aggregating to use the minute too:
select
count(*) as logged_users,
CONCAT(EXTRACT(hour from login_time::timestamp), '-', CASE WHEN EXTRACT(minute from login_time::timestamp) < 30 THEN 0 ELSE 30 END) as HalfHour
from loginhistory
where login_time::date = '2021-04-21'
group by HalfHour
order by min(login_time); -- chronological; ordering by the text label would put '10-0' before '7-0'
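Sorting by the HalfHour label can mislead because the labels are text. Text compares character by character, so '10-0' lands before '7-0'. Here is a small Python illustration that contrasts text order with time order. half_hour_label and chronological_key are hypothetical helper names.

```python
from typing import Tuple


def half_hour_label(hour: int, minute: int) -> str:
    """Build the 'H-M' label the CONCAT/CASE query produces (M is 0 or 30)."""
    return f"{hour}-{0 if minute < 30 else 30}"


def chronological_key(label: str) -> Tuple[int, int]:
    """Turn an 'H-M' label back into numbers so it sorts by time."""
    hour, half = label.split("-")
    return int(hour), int(half)


labels = [half_hour_label(h, m) for h, m in [(7, 5), (9, 40), (10, 10), (10, 31)]]
print(sorted(labels))                         # text order
print(sorted(labels, key=chronological_key))  # time order
```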

How to average data over periods from a table in SQL

I'm trying to average data over fixed periods of time, and also to average the dates within each of those periods.
Given data like:
value | datetime
-------+------------------------
15 | 2015-08-16 01:00:40+02
22 | 2015-08-16 01:01:40+02
16 | 2015-08-16 01:02:40+02
19 | 2015-08-16 01:03:40+02
21 | 2015-08-16 01:04:40+02
18 | 2015-08-16 01:05:40+02
29 | 2015-08-16 01:06:40+02
16 | 2015-08-16 01:07:40+02
16 | 2015-08-16 01:08:40+02
15 | 2015-08-16 01:09:40+02
I would like to obtain something like this in one query:
value | datetime
-------+------------------------
18.6 | 2015-08-16 01:03:00+02
18.8 | 2015-08-16 01:08:00+02
Here value is the average of the first 5 values, and datetime is the middle (or average) of the 5 initial datetimes, with 5 being the interval length n.
I saw some posts that put me on the right track with avg, group by and averaging dates in SQL, but I still can't work out exactly what to do.
I'm working with PostgreSQL 9.4.
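You would need to share more information, but here is one way to do it, where date1 and date2 are the bounds of one period. PostgreSQL has no avg() for timestamps, so average the epoch seconds and convert back:
SELECT AVG(value), to_timestamp(AVG(EXTRACT(epoch FROM datetime))) AS avg_datetime
FROM mytable
WHERE datetime > date1
AND datetime < date2;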
Something like
SELECT
to_timestamp(round(AVG(EXTRACT(epoch from datetime)))) as middleDate,
avg(value) AS avgValue
FROM
myTable
GROUP BY
(id) / ((SELECT Count(*) FROM myTable) / 100);
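roughly fulfilled my requirements, with 100 controlling the length of the averaged intervals (it is roughly the number of output rows). This grouping assumes id is a gap-free integer sequence and that the table has at least 100 rows.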
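For a fixed group size n, as in the question, one option is to number the rows in time order and group by position / n. The sketch below uses Python's sqlite3 as a stand-in for PostgreSQL 9.4, where row_number() OVER (ORDER BY datetime) would supply the position. Here the position comes from a correlated COUNT(*), so the sketch does not depend on window-function support. It averages the epoch seconds and converts them back to a timestamp. The table name readings, the column name ts and the helper names are assumptions for illustration, and the +02 offset is dropped so that the timestamps are naive.

```python
import sqlite3

# The question's ten readings, one per minute starting at 01:00:40.
VALUES = [15, 22, 16, 19, 21, 18, 29, 16, 16, 15]
ROWS = [(v, f"2015-08-16 01:{i:02d}:40") for i, v in enumerate(VALUES)]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Column named ts rather than datetime to avoid confusion with the SQLite function.
    conn.execute("CREATE TABLE readings (value INTEGER, ts TEXT)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", ROWS)
    return conn


def average_every_n(conn: sqlite3.Connection, n: int = 5):
    """Average value and timestamp over each run of n consecutive rows."""
    return conn.execute(
        """
        WITH numbered AS (
            SELECT value,
                   CAST(strftime('%s', ts) AS INTEGER) AS epoch,
                   -- 0-based position in ts order, integer-divided into groups of n
                   (SELECT COUNT(*) FROM readings AS r2 WHERE r2.ts < r.ts) / ? AS grp
            FROM readings AS r
        )
        SELECT AVG(value) AS avg_value,
               datetime(ROUND(AVG(epoch)), 'unixepoch') AS mid_ts
        FROM numbered
        GROUP BY grp
        ORDER BY grp
        """,
        (n,),
    ).fetchall()


for avg_value, mid_ts in average_every_n(make_db()):
    print(avg_value, mid_ts)
```

With n = 5 this gives 18.6 at 01:02:40 and 18.8 at 01:07:40. That matches the expected averages, and the timestamps are the exact midpoints rather than the rounded values in the question.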

T-SQL: reduce the amount of data returned by a query to a parametrically defined sample

I have a table containing a large amount of data which is stored on change.
tbl_bigOne
----------
timestamp | var01 | var02 | ...
2016-01-14 15:20:21 | 10.1 | 100.6 | ...
2016-01-14 15:20:26 | 11.2 | 110.3 | ...
2016-01-14 15:21:27 | 52.1 | 620.1 | ...
2016-01-14 15:35:00 | 13.5 | 230.6 | ...
...
2016-01-15 09:18:01 | 94.4 | 140.0 | ...
2016-01-15 10:01:15 | 105.3 | 188.7 | ...
...
and so on for years of data
What I would like to obtain is a query/stored procedure that, given two datetime references (date_from and date_to), returns the data in that range.
That query alone is pretty straightforward. What I would also like to achieve is a maximum number of rows returned per day (if data is available), with the values averaged within each group.
Here are a few examples:
date_from: 2016-01-14 00:00:00
date_to: 2016-01-20 23:59:59
max_points:12
In this case the time window is 7 days, and I would like at most 12 rows for each day of the 7-day window, giving at most 84 rows in total. Each row averages its group, since each day's data is now partitioned into 12 parts.
You can think of this partitioning as averaging every two hours' worth of data for that specific day, generating one of the 12 rows required for that day.
date_from: 2016-01-14 00:00:00
date_to: 2016-01-14 23:59:59
max_points:1440
In this case the time window is one day, and, if data is available, I would like at most 1440 rows (one per minute) for the selected period.
In this way the parameter defines the maximum number of rows for each day. The minimum time window is one day; nothing shorter is needed.
Can something like this be achieved just using TSQL?
Thank you.
Edit: addressed the observations raised by @Thorsten Kettner.
Use the analytic function ROW_NUMBER() to number the matching rows per day, then keep only the rows up to the given limit. If you want the rows chosen arbitrarily when more exist than needed, number them in random order using NEWID().
select timestmp, var01, var02, var03
from
(
select
mytable.*,
row_number() over (partition by convert(date, timestmp) order by newid()) as rn
from mytable
where convert(date, timestmp) between @start_date and @end_date
) numbered
where rn <= @limit
order by timestmp;
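The query above keeps a random sample of raw rows. If each returned row should instead be an average, as the question describes, one option is to cut every day into max_points equal time buckets and average each bucket. This sketch uses Python's sqlite3 as a stand-in for SQL Server. The ts column name, the reduced column list and the downsample helper are assumptions. In T-SQL, the seconds since midnight could come from something like DATEDIFF(second, CAST(ts AS date), ts) instead of strftime.

```python
import sqlite3

# Sample rows from the question (var03 and later columns omitted).
ROWS = [
    ("2016-01-14 15:20:21", 10.1, 100.6),
    ("2016-01-14 15:20:26", 11.2, 110.3),
    ("2016-01-14 15:21:27", 52.1, 620.1),
    ("2016-01-14 15:35:00", 13.5, 230.6),
    ("2016-01-15 09:18:01", 94.4, 140.0),
    ("2016-01-15 10:01:15", 105.3, 188.7),
]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl_bigOne (ts TEXT, var01 REAL, var02 REAL)")
    conn.executemany("INSERT INTO tbl_bigOne VALUES (?, ?, ?)", ROWS)
    return conn


def downsample(conn, date_from: str, date_to: str, max_points: int):
    """Average each day's rows into at most max_points equal time buckets.

    bucket = seconds_since_midnight * max_points / 86400 (integer division),
    so it lies in 0 .. max_points - 1; empty buckets produce no row.
    """
    return conn.execute(
        """
        SELECT day, bucket, MIN(ts) AS first_ts,
               AVG(var01) AS avg_var01, AVG(var02) AS avg_var02
        FROM (
            SELECT ts, var01, var02,
                   date(ts) AS day,
                   (CAST(strftime('%s', ts) AS INTEGER) % 86400) * ? / 86400 AS bucket
            FROM tbl_bigOne
            WHERE ts BETWEEN ? AND ?
        ) AS t
        GROUP BY day, bucket
        ORDER BY day, bucket
        """,
        (max_points, date_from, date_to),
    ).fetchall()


for row in downsample(make_db(), "2016-01-14 00:00:00", "2016-01-20 23:59:59", 12):
    print(row)
```

With max_points = 12 each bucket spans two hours, so the four 2016-01-14 rows collapse into one averaged row. With max_points = 1440 each bucket is one minute.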

Postgres count items by interval

I am trying to get the count of items within an interval, with no start or stop times specified. I imagine you could do it with window functions, but I am not sure how to go about it.
The problem is as follows: I would like to get the number of times people log in to a website within an arbitrary interval, say 20 minutes.
Example A
1. 2015-06-24 23:00:00
2. 2015-06-24 23:45:00
3. 2015-06-25 00:00:00
4. 2015-06-25 00:15:00
5. 2015-06-25 00:17:00
6. 2015-06-25 00:21:00
In the above example I would highlight items (2,3), (3,4,5), (4,5,6) and (5,6). The output I would like is:
start,end,count
2015-06-24 23:45:00,2015-06-25 00:00:00,2
2015-06-25 00:00:00,2015-06-25 00:17:00,3
2015-06-25 00:15:00,2015-06-25 00:21:00,3
Also, only keep groups where count >= 2; otherwise every single item would be a valid grouping.
Should I use a window function, a CTE, or is there another practice to adopt?
Try this query with a self join:
select a.id, a.log_at, max(b.log_at), count(1)
from logs a
join logs b on b.log_at >= a.log_at and b.log_at <= a.log_at + '20 minutes'::interval
group by 1, 2
having count(1) > 1
order by 1
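The self join can be checked against Example A. This sketch uses Python's sqlite3 as a stand-in for PostgreSQL, with datetime(a.log_at, '+20 minutes') playing the role of a.log_at + interval '20 minutes'. busy_windows and make_db are hypothetical helper names. Because every login starts its own window, the (5,6) group from the question also appears.

```python
import sqlite3

# Example A from the question.
LOGINS = [
    "2015-06-24 23:00:00", "2015-06-24 23:45:00", "2015-06-25 00:00:00",
    "2015-06-25 00:15:00", "2015-06-25 00:17:00", "2015-06-25 00:21:00",
]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, log_at TEXT)")
    conn.executemany("INSERT INTO logs VALUES (?, ?)", list(enumerate(LOGINS, start=1)))
    return conn


def busy_windows(conn, minutes: int = 20, min_count: int = 2):
    """For each login a, count logins b with a.log_at <= b.log_at <= a.log_at + minutes."""
    return conn.execute(
        """
        SELECT a.id, a.log_at, MAX(b.log_at) AS last_in_window, COUNT(*) AS cnt
        FROM logs AS a
        JOIN logs AS b
          ON b.log_at >= a.log_at
         AND b.log_at <= datetime(a.log_at, ?)
        GROUP BY a.id, a.log_at
        HAVING COUNT(*) >= ?
        ORDER BY a.id
        """,
        (f"+{minutes} minutes", min_count),
    ).fetchall()


for row in busy_windows(make_db()):
    print(row)
```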
You can get per-day groups with counts using a query like:
SELECT MIN(last_seen_at), MAX(last_seen_at), COUNT(*)
FROM user_kinds
GROUP BY DATE(last_seen_at)
ORDER BY DATE(last_seen_at) DESC LIMIT 5;
Which on my sample data set yields a result like:
2015-06-26 00:12:30.476548 | 2015-06-26 22:06:25.134322 | 69
2015-06-25 00:46:03.392651 | 2015-06-25 23:49:46.616964 | 14
2015-06-24 14:22:33.578176 | 2015-06-24 23:39:01.32241 | 10
2015-06-23 01:42:53.438663 | 2015-06-23 20:12:21.864601 | 2
(5 rows)

Group records by time

I have a table containing a datetime column and some other miscellaneous columns. The datetime column represents an event: it either contains a time (the event happened at that time) or NULL (the event didn't happen).
I now want to count the number of records that happen in specific intervals (15 minutes), but I do not know how to do that.
Example:
id | time | foreign_key
1 | 2012-01-01 00:00:01 | 2
2 | 2012-01-01 00:02:01 | 4
3 | 2012-01-01 00:16:00 | 1
4 | 2012-01-01 00:17:00 | 9
5 | 2012-01-01 00:31:00 | 6
I now want to create a query that creates a result set similar to:
interval | COUNT(id)
2012-01-01 00:00:00 | 2
2012-01-01 00:15:00 | 2
2012-01-01 00:30:00 | 1
Is this possible in SQL or can anyone advise what other tools I could use? (e.g. exporting the data to a spreadsheet program would not be a problem)
Give this a try:
select datetime((strftime('%s', time) / 900) * 900, 'unixepoch') interval,
count(*) cnt
from t
where time is not null -- NULL means the event didn't happen
group by interval
order by interval
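Here is the bucketing query above run against the question's example with Python's sqlite3. Two additions are assumptions made for illustration: one hypothetical row with a NULL time, to show that it stays out of the counts, and the count_per_bucket helper. The bucket is computed in a subquery, and the epoch text is cast to an integer before the division by 900 seconds.

```python
import sqlite3

# The question's example rows, plus one hypothetical row whose time is NULL.
EVENTS = [
    (1, "2012-01-01 00:00:01", 2),
    (2, "2012-01-01 00:02:01", 4),
    (3, "2012-01-01 00:16:00", 1),
    (4, "2012-01-01 00:17:00", 9),
    (5, "2012-01-01 00:31:00", 6),
    (6, None, 3),
]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, time TEXT, foreign_key INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", EVENTS)
    return conn


def count_per_bucket(conn, seconds: int = 900):
    """Count non-NULL events per `seconds`-long bucket (900 s = 15 minutes)."""
    return conn.execute(
        """
        SELECT bucket, COUNT(*) AS cnt
        FROM (
            SELECT datetime((CAST(strftime('%s', time) AS INTEGER) / ?) * ?,
                            'unixepoch') AS bucket
            FROM t
            WHERE time IS NOT NULL
        ) AS b
        GROUP BY bucket
        ORDER BY bucket
        """,
        (seconds, seconds),
    ).fetchall()


print(count_per_bucket(make_db()))
```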
I have limited SQLite background (and no practice instance), but I'd try grabbing the minutes using
strftime(FORMAT, TIMESTRING, MODIFIER, MODIFIER, ...)
with the %M format code (http://souptonuts.sourceforge.net/readme_sqlite_tutorial.html).
Then divide that by 15 and take the floor of the quotient to figure out which quarter-hour you're in (0, 1, 2, or 3), using
cast(x as int)
(see "Getting the floor value of a number in SQLite?").
Strung together, it might look something like:
select cast(strftime('%M', your_time_field) as int) / 15 from your_table
strftime returns a string, so cast it to an integer before dividing; integer division then floors the quotient.
Then group by the date and hour together with the quarter-hour, so that quarters from different hours are not merged.
Sorry, I don't have the exact syntax for you, but this approach should give you the functional groupings, after which you can massage the output to look how you want.
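Here is the quarter-hour arithmetic from this answer, checked in Python's sqlite3. strftime('%M', ...) yields the minutes as text, the cast makes them an integer, and integer division by 15 gives 0 to 3. Pairing that with strftime('%Y-%m-%d %H', ...) gives a grouping key that keeps different hours apart. quarter_key is a hypothetical helper name.

```python
import sqlite3


def quarter_key(conn: sqlite3.Connection, ts: str):
    """Return (date-and-hour, quarter 0-3) for an ISO timestamp string."""
    return conn.execute(
        "SELECT strftime('%Y-%m-%d %H', ?1), CAST(strftime('%M', ?1) AS INTEGER) / 15",
        (ts,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
for ts in ["2012-01-01 00:00:01", "2012-01-01 00:14:59", "2012-01-01 00:15:00",
           "2012-01-01 00:44:59", "2012-01-01 00:45:00", "2012-01-01 13:59:59"]:
    print(ts, quarter_key(conn, ts))
```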