Limit result rows for minimal time intervals for PostgreSQL - sql

Background: I am running TeslaMate/Grafana for monitoring my car status, one of the gauges plots the battery level fetched from database. My server is located remotely and running in a Dock from an old NAS, so both query performance and network overhead matters.
I found the koisk page frequently hangs and by investigation, it might caused by the query -- two of the plots returns 10~100k rows of results from database. I want to limit the number of rows returned by SQL queries, as the plots certainly don't have that much precision for drawing such detailed intervals.
I tried to follow this answer and use row_number() to pop only 100-th rows of results, but more complicated issues turned up, that is, the time intervals among rows are not consistent.
The car has 4 status, driving / online / asleep / offline.
If the car is at driving status, the time interval could be less than 200ms as the car pushes the status whenever it has new data.
If the car is at online status, the time interval could be several minutes as the system actively fetches the status from the car.
Even worse, if the system thinks the car is going to sleep and need to stop fetching status (to avoid preventing the car to sleep), the interval could be 40 minutes maximum depend on settings.
If the car is in asleep/offline status, no data is recorded at all.
This obviously makes skipping every n-th rows a bad idea, as for case 2-4 above, lots of data points might missing so that Grafana cannot plot correct graph representing the battery level at satisfactory precision.
I wonder if there's any possible to skip the rows by time interval from a datetime field rather than row_number() without much overhead from the query? i.e., fetch every row with minimal 1000ms from the previous row.
E.g., I have following data in the table, I want the rows returned are row 1, 4 and 5.
row date
[1] 1610000001000
[2] 1610000001100
[3] 1610000001200
[4] 1610000002000
[5] 1610000005000
The current (problematic) method I am using is as follows:
SELECT $__time(t.date), t.battery_level AS "SOC [%]"
FROM (
SELECT date, battery_level, row_number() OVER(ORDER BY date ASC) AS row
FROM (
SELECT battery_level, date
FROM positions
WHERE car_id = $car_id AND $__timeFilter(date)
UNION ALL
SELECT battery_level, date
FROM charges c
JOIN charging_processes p ON p.id = c.charging_process_id
WHERE $__timeFilter(date) AND p.car_id = $car_id) AS data
ORDER BY date ASC) as t
WHERE t.row % 100 = 0;
This method clearly gives problem that only returns alternate rows instead of what I wanted (given the last row reads t.row % 2 = 0)
PS: please ignore the table structures and UNION from the sample code, I haven't dig deep enough to the tables which could be other tweaks but irrelevant to this question anyway.
Thanks in advance!

You can use a recursive CTE:
WITH RECURSIVE rec(cur_row, cur_date) AS (
(
SELECT row, date
FROM t
ORDER BY date
LIMIT 1
)
UNION ALL
(
SELECT row, date
FROM t
JOIN rec
ON t.date >= cur_date + 1000
ORDER BY t.date
LIMIT 1
)
)
SELECT *
FROM rec;
cur_row
cur_date
1
1610000001000
4
1610000002000
5
1610000005000
View on DB Fiddle
Using a function instead would probably be faster:
CREATE OR REPLACE FUNCTION f() RETURNS SETOF t AS
$$
DECLARE
row t%ROWTYPE;
cur_date BIGINT;
BEGIN
FOR row IN
SELECT *
FROM t
ORDER BY date
LOOP
IF row.date >= cur_date + 1000 OR cur_date IS NULL
THEN
cur_date := row.date;
RETURN NEXT row;
END IF;
END LOOP;
END;
$$ LANGUAGE plpgsql;
SELECT *
FROM f();
row
date
1
1610000001000
4
1610000002000
5
1610000005000

Related

SQL script to find previous value, not necessarily previous row

is there a way in SQL to find a previous value, not necessarily in the previous row, within the same SELECT statement?
See picture below. I'd like to add another column, ELAPSED, that calculates the time difference between TIMERSTART, but only when DEVICEID is the same, and I_TYPE is viewDisplayed. e.g. subtract 1 from 2, store difference in 3, store 0 in 4 because i_type is not viewDisplayed, subtract 2 from 5, store difference in 6, and so on.
It has to be a statement, I can't use a stored procedure in this case.
SELECT DEVICEID, I_TYPE, TIMERSTART,
O AS ELAPSED -- CASE WHEN <CONDITION> THEN TIMEDIFF() ELSE 0 END AS ELAPSED
FROM CLIENT_USAGE
ORDER BY TIMERSTART ASC
I'm using SAP HANA DB, but it works pretty much like the latest version of MS-SQL. So, if you know how to make it work in SQL, I can make it work in HANA.
You can make a subquery to find the last time entered previous to the row in question.
select deviceid, i_type, timerstart, (timerstart - timerlast) as elapsed.
from CLIENT_USAGE CU
join ( select top 1 timerstart as timerlast
from CLIENT_USAGE C
where (C.i_type = CU.i_type) and
(C.deviceid = CU.deviceid) and (C.timerstart < CU.timerstart)
order by C.timerstart desc
) as temp1
on temp1.i_type = CU.i_type
order by timerstart asc
This is a rough sketch of what the sql should look like I do not know what your primary key is on this table if it is i_type or i_type and deviceid. But this should help with how to atleast calculate the field. I do not think it would be necessary to store the value unless this table is very large or the hardware being used is very slow. It can be calculated rather easily each time this query is run.
SAP HANA supports window functions:
select DEVICEID,
TIMERSTART,
lag(TIMERSTART) over (partition by DEVICEID order by TIMERSTART) as previous_start
from CLIENT_USAGE
Then you can wrap this in parentheses and manipulate the data to your hearts' content

Oracle SQL last n records

i have read tons of articles regarding last n records in Oracle SQL by using rownum functionality, but on my case it does not give me the correct rows.
I have 3 columns in my table: 1) message (varchar), mes_date (date) and mes_time (varchar2).
Inside lets say there is 3 records:
Hello world | 20-OCT-14 | 23:50
World Hello | 21-OCT-14 | 02:32
Hello Hello | 20-OCT-14 | 23:52
I want to get the last 2 records ordered by its date and time (first row the oldest, and second the newest date/time)
i am using this query:
SELECT *
FROM (SELECT message
FROM messages
ORDER
BY MES_DATE, MES_TIME DESC
)
WHERE ROWNUM <= 2 ORDER BY ROWNUM DESC;
Instead of getting row #3 as first and as second row #2 i get row #1 and then row #3
What should i do to get the older dates/times on top follow by the newest?
Maybe that helps:
SELECT *
FROM (SELECT message,
mes_date,
mes_time,
ROW_NUMBER() OVER (ORDER BY TO_DATE(TO_CHAR(mes_date, 'YYYY-MM-DD') || mes_time, 'YYYY-MM-DD HH24:MI') DESC) rank
FROM messages
)
WHERE rank <= 2
ORDER
BY rank
I am really sorry to disappoint - but in Oracle there's no such thing as "the last two records".
The table structure does not allocate data at the end, and does not keep a visible property of time (the only time being held is for the sole purpose of "flashback queries" - supplying results as of point in time, such as the time the query started...).
The last inserted record is not something you can query using the database.
What can you do? You can create a trigger that orders the inserted records using a sequence, and select based on it (so SELECT * from (SELECT * FROM table ORDER BY seq DESC) where rownum < 3) - that will assure order only if the sequence CACHE value is 1.
Notice that if the column that contains the message date does not have many events in a second, you can use that column, as the other solution suggested - e.g. if you have more than 2 events that arrive in a second, the query above will give you random two records, and not the actual last two.
AGAIN - Oracle will not be queryable for the last two rows inserted since its data structure do not managed orders of inserts, and the ordering you see when running "SELECT *" is independent of the actual inserts in some specific cases.
If you have any questions regarding any part of this answer - post it down here, and I'll focus on explaining it in more depth.
select * from table
minus
select * from table
where rownum<=(select count(*) from table)-n

Identify groups of rows in close proximity

I am doing a project for school and have been given a database over gps-recordings for three people during the course of a week. I am trying to group these recordings into trips based on the time between them. If a recording is within 300 seconds from the recording before it, they are considered to be part of the same trip, otherwise, they are considered part of different trips.
So far I have managed to calculate the time difference between a recording on row n and the one on row n-1 and I am now trying to create a function for merging the recordings intro trips. This would have been real easy in another programming language, but in this course we are using PostgreSQL which I am not that well versed in.
To solve this, I am trying to create a function with a variable that increases every time the time difference between two recordings is greater than 300 seconds and assigns each row to a trip based on the variable. This is as far as I have currently gotten, although at the moment, the variable resets X all the time, thus assigning all rows to trip 1...
CREATE OR REPLACE FUNCTION tripmerge(time_diff double precision)
RETURNs integer AS $$
declare
X integer := 1;
ID integer;
BEGIN
IF time_diff < 300 THEN
ID = X;
ELSE
ID =X;
X:=X+1;
END IF;
RETURN ID;
END;$$
LANGUAGE plpgsql;
How do I change so X does not reset all the time? I am using PostgreSQL 9.1.
EDIT:
This is the table I am working with:
curr_rec (timestamp), prev_rec (timestamp), time_diff (double precision)
With this being a sample of the dataset:
'2013-11-14 05:22:33.991+01',null ,null
'2013-11-14 09:15:40.485+01','2013-11-14 05:22:33.991+01',13986.494
'2013-11-14 09:17:04.837+01','2013-11-14 09:15:40.485+01',84.352
'2013-11-14 09:17:43.055+01','2013-11-14 09:17:04.837+01',38.218
'2013-11-14 09:23:24.205+01','2013-11-14 09:17:43.055+01',341.15
The expected result would add a column:
tripID
1
2
2
2
3
And I think this fiddle should be working: http://sqlfiddle.com/#!1/4e3e5/1/0
This query uses only curr_rec, not the other redundant, precomputed columns:
SELECT 1 + count(step OR NULL) OVER (ORDER BY curr_rec) AS trip_id
FROM (
SELECT curr_rec
,lag(curr_rec) OVER (ORDER BY curr_rec) AS prev_rec
,curr_rec - lag(curr_rec) OVER (ORDER BY curr_rec)
> interval '5 min' AS step
FROM timestamps
) x;
Key features are:
The window function lag(), which I use to see if the previous row is more than 5 minutes ago. (Just using an interval for the comparison, no need to extract seconds)
The window aggregate function count() - that's just the basic aggregate function with an OVER clause.
The expression step OR NULL only leaves TRUE or NULL, where only TRUE is counted in a running count, thereby arriving at your desired result.
SQL Fiddle (building on the one you provided).

Enhancing Performance

I'm not as clued up on shortcuts in SQL so I was hoping to utilize the brainpower on here to help speed up a query I'm using. I'm currently using Oracle 8i.
I have a query:
SELECT
NAME_CODE, ACTIVITY_CODE, GPS_CODE
FROM
(SELECT
a.NAME_CODE, b.ACTIVITY_CODE, a.GPS_CODE,
ROW_NUMBER() OVER (PARTITION BY a.GPS_DATE ORDER BY b.ACTIVITY_DATE DESC) AS RN
FROM GPS_TABLE a, ACTIVITY_TABLE b
WHERE a.NAME_CODE = b.NAME_CODE
AND a.GPS_DATE >= b.ACTIVITY_DATE
AND TRUNC(a.GPS_DATE) > TRUNC(SYSDATE) - 2)
WHERE
RN = 1
and this takes about 7 minutes give or take 10 seconds to run.
Now the GPS_TABLE is currently 6.586.429 rows and continues to grow as new GPS coordinates are put into the system, each day it grows by about 8.000 rows in 6 columns.
The ACTIVITY_TABLE is currently 1.989.093 rows and continues to grow as new activities are put into the system, each day it grows by about 2.000 rows in 31 columns.
So all in all these are not small tables and I understand that there will always be a time hit running this or similar queries. As you can see I'm already limiting it to only the last 2 days worth of data, but anything to speed it up would be appreciated.
Your strongest filter seems to be the filter on the last 2 days of GPS_TABLE. It should filter the GPS_TABLE to about 15k rows. Therefore one of the best candidate for improvement is an index on the column GPS_DATE.
You will find that your filter TRUNC(a.GPS_DATE) > TRUNC(SYSDATE) - 2 is equivalent to a.GPS_DATE > TRUNC(SYSDATE) - 2, therefore a simple index on your column will work if you change the query. If you can't change it, you could add a function-based index on TRUNC(GPS_DATE).
Once you have this index in place, we need to access the rows in ACTIVITY_TABLE. The problem with your join is that we will get all the old activities and therefore a good portion of the table. This means that the join as it is will not be efficient with index scans.
I suggest you define an index on ACTIVITY_TABLE(name_code, activity_date DESC) and a PL/SQL function that will retrieve the last activity in the least amount of work using this index specifically:
CREATE OR REPLACE FUNCTION get_last_activity (p_name_code VARCHAR2,
p_gps_date DATE)
RETURN ACTIVITY_TABLE.activity_code%type IS
l_result ACTIVITY_TABLE.activity_code%type;
BEGIN
SELECT activity_code
INTO l_result
FROM (SELECT activity_code
FROM activity_table
WHERE name_code = p_name_code
AND activity_date <= p_gps_date
ORDER BY activity_date DESC)
WHERE ROWNUM = 1;
RETURN l_result;
END;
Modify your query to use this function:
SELECT a.NAME_CODE,
a.GPS_CODE,
get_last_activity(a.name_code, a.gps_date)
FROM GPS_TABLE a
WHERE trunc(a.GPS_DATE) > trunc(sysdate) - 2
Optimising an SQL query is generally done by:
Add some indexes
Try a different way to get the same information
So, start by adding an index for ACTIVITY_DATE, and perhaps some other fields that are used in the conditions.

Postgres SQL select a range of records spaced out by a given interval

I am trying to determine if it is possible, using only sql for postgres, to select a range of time ordered records at a given interval.
Lets say I have 60 records, one record for each minute in a given hour. I want to select records at 5 minute intervals for that hour. The resulting rows should be 12 records each one 5 minutes apart.
This is currently accomplished by selecting the full range of records and then looping thru the results and pulling out the records at the given interval. I am trying to see if I can do this purly in sql as our db is large and we may be dealing with tens of thousands of records.
Any thoughts?
Yes you can. Its really easy once you get the hang of it. I think its one of jewels of SQL and its especially easy in PostgreSQL because of its excellent temporal support. Often, complex functions can turn into very simple queries in SQL that can scale and be indexed properly.
This uses generate_series to draw up sample time stamps that are spaced 1 minute apart. The outer query then extracts the minute and uses modulo to find the values that are 5 minutes apart.
select
ts,
extract(minute from ts)::integer as minute
from
( -- generate some time stamps - one minute apart
select
current_time + (n || ' minute')::interval as ts
from generate_series(1, 30) as n
) as timestamps
-- extract the minute check if its on a 5 minute interval
where extract(minute from ts)::integer % 5 = 0
-- only pick this hour
and extract(hour from ts) = extract(hour from current_time)
;
ts | minute
--------------------+--------
19:40:53.508836-07 | 40
19:45:53.508836-07 | 45
19:50:53.508836-07 | 50
19:55:53.508836-07 | 55
Notice how you could add an computed index on the where clause (where the value of the expression would make up the index) could lead to major speed improvements. Maybe not very selective in this case, but good to be aware of.
I wrote a reservation system once in PostgreSQL (which had lots of temporal logic where date intervals could not overlap) and never had to resort to iterative methods.
http://www.amazon.com/SQL-Design-Patterns-Programming-Focus/dp/0977671542 is an excellent book that goes has lots of interval examples. Hard to find in book stores now but well worth it.
Extract the minutes, convert to int4, and see, if the remainder from dividing by 5 is 0:
select *
from TABLE
where int4 (date_part ('minute', COLUMN)) % 5 = 0;
If the intervals are not time based, and you just want every 5th row; or
If the times are regular and you always have one record per minute
The below gives you one record per every 5
select *
from
(
select *, row_number() over (order by timecolumn) as rown
from tbl
) X
where mod(rown, 5) = 1
If your time records are not regular, then you need to generate a time series (given in another answer) and left join that into your table, group by the time column (from the series) and pick the MAX time from your table that is less than the time column.
Pseudo
select thetimeinterval, max(timecolumn)
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
And further join it back to the table for the full record (assuming unique times)
select t.* from
tbl inner join
(
select thetimeinterval, max(timecolumn) timecolumn
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval
) y on tbl.timecolumn = y.timecolumn
How about this:
select min(ts), extract(minute from ts)::integer / 5
as bucket group by bucket order by bucket;
This has the advantage of doing the right thing if you have two readings for the same minute, or your readings skip a minute. Instead of using min even better would be to use one of the the first() aggregate functions-- code for which you can find here:
http://wiki.postgresql.org/wiki/First_%28aggregate%29
This assumes that your five minute intervals are "on the fives", so to speak. That is, that you want 07:00, 07:05, 07:10, not 07:02, 07:07, 07:12. It also assumes you don't have two rows within the same minute, which might not be a safe assumption.
select your_timestamp
from your_table
where cast(extract(minute from your_timestamp) as integer) in (0,5);
If you might have two rows with timestamps within the same minute, like
2011-01-01 07:00:02
2011-01-01 07:00:59
then this version is safer.
select min(your_timestamp)
from your_table
group by (cast(extract(minute from your_timestamp) as integer) / 5)
Wrap either of those in a view, and you can join it to your base table.