How to iterate over table and delete rows based on specific condition on previous row - PostgreSQL - sql

I have a table of ships, which consists of:
row id (number)
ship id (character varying)
timestamp (timestamp in yyyy-mm-dd hh:mm:ss format)
Timestamp is the time that the specific ship (ship id) emitted a signal during its course. The table looks like this:
What I need to do (in PostgreSQL - pgAdmin) is for every ship_id, find if a signal has been emitted 5 seconds or less after another signal from the same ship, and then delete the row with the latter.
In the example table shown above, for the ship "foo" the signals are almost 9 minutes apart so it's all good, but for the ship "bar" the signal with row_id 4 was emitted 3 seconds after the previous one with row_id 3, so it needs to go.
Thanks a lot in advance.

Windowing functions Lag/Lead in this case will do the trick.
Add a LAG to calculate the difference between timestamps for the same ships. This will allow you to calculate the time difference for the same ship and its most recent posting.
Use that to filter out what to delete
SELECT ROW_ID, SHIP_ID, EXTRACT(EPOCH FROM (TIMESTAMP - LAG (TIMESTAMP,1) OVER (PARTITION BY SHIP_ID ORDER BY TIMESTAMP ASC))) AS SECONDS_DIFF
--THEN SOMETHING LIKE THIS TO FIND WHICH ROWS TO DELETE
DELETE FROM SHIP_TABLE WHERE ROW_ID IN
(SELECT ROW_ID FROM
(SELECT ROW_ID, SHIP_ID, EXTRACT(EPOCH FROM (TIMESTAMP - LAG (TIMESTAMP,1) OVER (PARTITION BY SHIP_ID ORDER BY TIMESTAMP ASC))) AS SECONDS_DIFF) SUB_1
WHERE SECONDS_DIFF <= 10 --THRESHOLD
) SUB_2

Related

How to compare the value of one row with the upper row in one column of an ordered table?

I have a table in PostgreSQL that contains the GPS points from cell phones. It has an integer column that stores epoch (the number of seconds from 1960). I want to order the table based on time (epoch column), then, break the trips to sub trips when there is no GPS record for more than 2 minutes.
I did it with GeoPandas. However, it is too slow. I want to do it inside the PostgreSQL. How can I compare each row of the ordered table with the previous row (to see if the epoch has a difference of 2 minutes or more)?
In fact, I do not know how to compare each row with the upper row.
You can use lag():
select t.*
from (select t.*,
lag(timestamp_epoch) over (partition by trip order by timestamp_epoch) as last_timestamp_epoch
from t
) t
where last_timestamp_epoch < timestamp_epoch - 120
I want to order the table based on time (epoch column), then, break the trips to sub trips when there is no GPS record for more than 2 minutes.
After comparing to the previous (or next) row, with the window function lag() (or lead()), form groups based on the gaps to get sub trip numbers:
SELECT *, count(*) FILTER (WHERE step) OVER (PARTITION BY trip ORDER BY timestamp_epoch) AS sub_trip
FROM (
SELECT *
, (timestamp_epoch - lag(timestamp_epoch) OVER (PARTITION BY trip ORDER BY timestamp_epoch)) > 120 AS step
FROM tbl
) sub;
Further reading:
Select longest continuous sequence

Obtain latest record for a given second Postgres

I have data with millisecond precision timestamp. I want to only filter for the most recent timestamp within a given second. Ie. records (2020-07-13 5:05.38.009, event1), (2020-07-13 5:05.38.012, event2) should only retrieve the latter.
I've tried the following:
SELECT
timestamp as time, event as value, event_type as metric
FROM
table
GROUP BY
date_trunc('second', time)
But then I'm asked to group by event as well and I see all the data (as if no group by was provided)
In Postgres, you can use distinct on:
select distinct on (date_trunc('second', time)) t.*
from t
order by time desc;

How to compare time stamps from consecutive rows

I have a table that I would like to sort by a timestamp desc and then compare all consecutive rows to determine the difference between each row. From there, I would like to find all the rows whose difference is greater than ~2hours.
I'm stuck on how to actually compare consecutive rows in a table. Any help would be much appreciated.
I'm using Oracle SQL Developer 3.2
You didn't show us your table definition, but something like this:
select *
from (
select t.*,
t.timestamp_column,
t.timestamp_column - lag(timestamp_column) over (order by timestamp_column) as diff
from the_table t
) x
where diff > interval '2' hour;
This assumes that timestamp_column is defined as timestamp not date (otherwise the result of the difference wouldn't be an interval)

examine if one time series column of table has two adjacent time points which have interval larger than certain length

I am dealing with data preprocessing on a table containing time series column
toy example Table A
timestamp value
12:30:24 1
12:32:21 3
12:33:21 4
timestamp is ordered and always go incrementally
Is that possible to define an function or something else to return "True expression" when table has two adjacent time points which have interval larger than certain length and return "False" otherwise?
I am using postgresql, thank you
SQL Fiddle
select bool_or(bigger_than) as bigger_than
from (
select
time - lag(time) over (order by time)
>
interval '1 minute' as bigger_than
from table_a
) s;
bigger_than
-------------
t
bool_or will stop searching as soon as it finds the first true value.
http://www.postgresql.org/docs/current/static/functions-aggregate.html
Your sample data shows a time value. But it works the same for a timestamp
Something like this:
select count(*) > 0
from (
select timestamp,
lag(timestamp) over (order by value) as prev_ts
from table_a
) t
where timestamp - prev_ts < interval '1' minute;
It calculates the difference between a timestamp and it's "previous" timestamp. The order of the timestamps is defined by the value column. The outer query then counts the number of rows where the difference is smaller than 1 minute.
lag() is called a window functions. More details on those can be found in the manual:
http://www.postgresql.org/docs/current/static/tutorial-window.html

Every 10th row based on timestamp

I have a table with signal name, value and timestamp. these signals where recorded at sampling rate of 1sample/sec. Now i want to plot a graph on values of months, and it is becoming very heavy for the system to perform it within seconds. So my question is " Is there any way to view 1 value/minute in other words i want to see every 60th row.?"
You can use the row_number() function to enumerate the rows, and then use modulo arithmetic to get the rows:
select signalname, value, timestamp
from (select t.*,
row_number() over (order by timestamp) as seqnum
from table t
) t
where seqnum % 60 = 0;
If your data really is regular, you can also extract the seconds value and check when that is 0:
select signalname, value, timestamp
from table t
where datepart(second, timestamp) = 0
This assumes that timestamp is stored in an appropriate date/time format.
Instead of sampling, you could use the one minute average for your plot:
select name
, min(timestamp)
, avg(value)
from Yourtable
group by
name
, datediff(minute, '2013-01-01', timestamp)
If you are charting months, even the hourly average might be detailed enough.