Unable to calculate difference between CTE subquery outputs for use in larger PostgreSQL query output column

Using PostgreSQL v9.4.5 from the shell, I created a database called moments by running create database moments in psql. I then created a moments table:
CREATE TABLE moments
(
id SERIAL4 PRIMARY KEY,
moment_type BIGINT NOT NULL,
flag BIGINT NOT NULL,
time TIMESTAMP NOT NULL,
UNIQUE(moment_type, time)
);
INSERT INTO moments (moment_type, flag, time) VALUES (1, 7, '2016-10-29 12:00:00');
INSERT INTO moments (moment_type, flag, time) VALUES (1, -30, '2016-10-29 13:00:00');
INSERT INTO moments (moment_type, flag, time) VALUES (3, 5, '2016-10-29 14:00:00');
INSERT INTO moments (moment_type, flag, time) VALUES (2, 9, '2016-10-29 18:00:00');
INSERT INTO moments (moment_type, flag, time) VALUES (2, -20, '2016-10-29 17:00:00');
INSERT INTO moments (moment_type, flag, time) VALUES (3, 10, '2016-10-29 16:00:00');
I run select * from moments to view the table:
Moments Table
id | moment_type | flag | time
----+-------------+------+---------------------
1 | 1 | 7 | 2016-10-29 12:00:00
2 | 1 | -30 | 2016-10-29 13:00:00
3 | 3 | 5 | 2016-10-29 14:00:00
4 | 2 | 9 | 2016-10-29 18:00:00
5 | 2 | -20 | 2016-10-29 17:00:00
6 | 3 | 10 | 2016-10-29 16:00:00
I then try to write an SQL query that produces the following output: for each pair of rows sharing a moment_type value, it returns the flag value of the row with the most recent time minus the flag value of the row with the second most recent time, with the results listed in ascending order by moment_type.
Expected SQL Query Output
moment_type | flag |
------------+------+
1 | -37 | (i.e. -30 - 7)
2 | 29 | (i.e. 9 - -20)
3 | 5 | (i.e. 10 - 5)
The SQL query that I came up with is as follows, which uses a WITH query to write multiple Common Table Expression (CTE) subqueries for use as temporary tables in the larger SELECT query at the end. I also use an SQL function to calculate the difference between two of the subquery outputs (alternatively I think I could have just used DIFFERENCE(most_recent_flag, second_most_recent_flag) AS flag instead of the function):
CREATE FUNCTION difference(most_recent_flag, second_most_recent_flag) RETURNS numeric AS $$
SELECT $1 - $2;
$$ LANGUAGE SQL;
-- get two flags that have the most recent timestamps
WITH two_most_recent_flags AS (
SELECT moments.flag
FROM moments
ORDER BY moments.time DESC
LIMIT 2
),
-- get one flag that has the most recent timestamp
most_recent_flag AS (
SELECT *
FROM two_most_recent_flags
ORDER BY flag DESC
LIMIT 1
),
-- get one flag that has the second most recent timestamp
second_most_recent_flag AS (
SELECT *
FROM two_most_recent_flags
ORDER BY flag ASC
LIMIT 1
)
SELECT DISTINCT ON (moments.moment_type)
moments.moment_type,
difference(most_recent_flag, second_most_recent_flag) AS flag
FROM moments
ORDER BY moment_type ASC
LIMIT 2;
But when I run the above SQL query in PostgreSQL, it returns the following error:
ERROR: column "most_recent_flag" does not exist
LINE 21: difference(most_recent_flag, second_most_recent_flag) AS fla...
Question
What techniques can I use and how may I apply them to overcome this error, and calculate and display the differences in the flag column to achieve the Expected SQL Query Output?
Note: Perhaps a window function may be used somehow, as it performs calculations across table rows

Use the lag() window function:
select moment_type, difference
from (
select *, flag - lag(flag) over w as difference
from moments
window w as (partition by moment_type order by time)
) s
where difference is not null
order by moment_type;
moment_type | difference
-------------+------------
1 | -37
2 | 29
3 | 5
(3 rows)
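If a moment_type could ever have more than two rows, the subquery above emits one difference per consecutive pair. A minimal sketch of keeping only the most recent difference per group (assuming that is the behaviour you want), using DISTINCT ON:
select distinct on (moment_type)
       moment_type,
       flag - lag(flag) over w as difference
from moments
window w as (partition by moment_type order by time)
order by moment_type, time desc;  -- groups with a single row yield NULL here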

One method is to use conditional aggregation. The window function row_number() can be used to identify the first and last time values:
select m.moment_type,
(max(case when seqnum_desc = 1 then flag end) -
min(case when seqnum_asc = 1 then flag end)
) as flag
from (select m.*,
row_number() over (partition by m.moment_type order by m.time) as seqnum_asc,
row_number() over (partition by m.moment_type order by m.time desc) as seqnum_desc
from moments m
) m
group by m.moment_type;
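On PostgreSQL 9.4+ the same conditional aggregation can also be written with the FILTER clause; a sketch equivalent to the query above:
select m.moment_type,
       max(flag) filter (where seqnum_desc = 1) -
       max(flag) filter (where seqnum_asc = 1) as flag
from (select m.*,
             row_number() over (partition by m.moment_type order by m.time) as seqnum_asc,
             row_number() over (partition by m.moment_type order by m.time desc) as seqnum_desc
      from moments m
     ) m
group by m.moment_type
order by m.moment_type;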

Related

Get max value of binned time-interval

I have a 'requests' table with a 'time_request' column which has a timestamp for each request. I want to know the maximum number of requests that I had in a single minute.
So I'm guessing I need to somehow 'group by' a 1-minute time interval, and then do some sort of MAX(COUNT(request_id))? Although nested aggregations are not allowed.
Will appreciate any help.
Table example:
request_id | time_request
------------------+---------------------
ab1 | 2021-03-29 16:20:05
ab2 | 2021-03-29 16:20:20
bc3 | 2021-03-31 20:34:07
fw3 | 2021-03-31 20:38:53
fe4 | 2021-03-31 20:39:53
Expected result: 2 (There were a maximum of 2 requests in a single minute)
Thanks!
You may use the window function count and specify a logical interval of one minute as the window boundary. It will calculate the count for each row, taking into account all the rows within the one minute preceding it.
Code for Postgres is below:
with a as (
select
id
, cast(ts as timestamp) as ts
from(values
('ab1', '2021-03-29 16:20:05'),
('ab2', '2021-03-29 16:20:20'),
('bc3', '2021-03-31 20:34:07'),
('fw3', '2021-03-31 20:38:53'),
('fe4', '2021-03-31 20:39:53')
) as t(id, ts)
)
, count_per_interval as (
select
a.*
, count(id) over (
order by ts asc
range between
interval '1' minute preceding
and current row
) as cnt_per_min
from a
)
select max(cnt_per_min)
from count_per_interval
 max
-----
   2
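Note that the sliding window above answers "the most requests in any 60-second window". If "a single minute" instead means a fixed calendar minute, truncating the timestamp is enough; a sketch, using the requests table and time_request column from the question:
select max(cnt) as max_per_minute
from (
  select date_trunc('minute', time_request) as minute, count(*) as cnt
  from requests
  group by 1
) m;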

Optimize the query of weekday statistics between two dates

I have a table with two fields: start_date and end_date. Now I want to count the total number of overtime records. I have created a new calendar table to maintain the working-day status of each date.
table: workdays
id status
2020-01-01 4
2020-01-02 1
2020-01-03 1
2020-01-04 2
4: holiday, 1: weekday, 2: weekend
I created a function to calculate the weekdays between two dates (excluding weekends, holidays).
create or replace function get_workday_count (start_date in date, end_date in date)
return number is
day_count int;
begin
select count(0) into day_count from WORKDAYS
where TRUNC(ID) >= TRUNC(start_date)
and TRUNC(ID) <= TRUNC(end_date)
and status in (1, 3, 5);
return day_count;
end;
When I execute the following query statement, it takes about 5 minutes to display the results; the erp_sj table has about 200,000 rows of data.
select count(0) from ERP_SJ where GET_WORKDAY_COUNT(start_date, end_date) > 5;
The fields used in query statements are indexed.
How to optimize? Or is there a better solution?
First of all, optimize your function:
1. Add pragma udf (for faster execution in SQL)
2. Add the deterministic clause (for caching)
3. Replace count(0) with count(*) (to allow the CBO to optimize the count)
4. Replace return number with return int
create or replace function get_workday_count (start_date in date, end_date in date)
return int deterministic is
pragma udf;
day_count int;
begin
select count(*) into day_count from WORKDAYS w
where w.ID >= TRUNC(start_date)
and w.ID <= TRUNC(end_date)
and status in (1, 3, 5);
return day_count;
end;
Then you don't need to call your function at all when (end_date - start_date) is less than the required number of days. Moreover, ideally you would use a scalar subquery instead of a function:
select count(*)
from ERP_SJ
where
case
when trunc(end_date) - trunc(start_date) > 5
then GET_WORKDAY_COUNT(trunc(start_date) , trunc(end_date))
else 0
end > 5
Or using subquery:
select count(*)
from ERP_SJ e
where
case
when trunc(end_date) - trunc(start_date) > 5
then (select count(*) from WORKDAYS w
where w.ID >= TRUNC(e.start_date)
and w.ID <= TRUNC(e.end_date)
and w.status in (1, 3, 5))
else 0
end > 5
WORKDAY_STATUSES table (referenced by WORKDAYS below):
create table workday_statuses
( status number(1) constraint workday_statuses_pk primary key
, status_name varchar2(10) not null constraint workday_status_name_uk unique );
insert all
into workday_statuses values (1, 'Weekday')
into workday_statuses values (2, 'Weekend')
into workday_statuses values (3, 'Unknown 1')
into workday_statuses values (4, 'Holiday')
into workday_statuses values (5, 'Unknown 2')
select * from dual;
WORKDAYS table: one row for each day in 2020:
create table workdays
( id date constraint workdays_pk primary key
, status references workday_statuses not null )
organization index;
insert into workdays (id, status)
select date '2019-12-31' + rownum
, case
when to_char(date '2019-12-31' + rownum, 'Dy', 'nls_language = English') like 'S%' then 2
when date '2019-12-31' + rownum in
( date '2020-01-01', date '2020-04-10', date '2020-04-13'
, date '2020-05-08', date '2020-05-25', date '2020-08-31'
, date '2020-12-25', date '2020-12-26', date '2020-12-28' ) then 4
else 1
end
from xmltable('1 to 366')
where date '2019-12-31' + rownum < date '2021-01-01';
ERP_SJ table containing 30K rows with random data:
create table erp_sj
( id integer generated always as identity
, start_date date not null
, end_date date not null
, filler varchar2(100) );
insert into erp_sj (start_date, end_date, filler)
select dt, dt + dbms_random.value(0,7), dbms_random.string('x',100)
from ( select date '2019-12-31' + dbms_random.value(1,366) as dt
from xmltable('1 to 30000') );
commit;
get_workday_count() function:
create or replace function get_workday_count
( start_date in date, end_date in date )
return integer
deterministic -- Cache some results
parallel_enable -- In case you want to use it in parallel queries
as
pragma udf; -- Tell compiler to optimise for SQL
day_count integer;
begin
select count(*) into day_count
from workdays w
where w.id between trunc(start_date) and end_date
and w.status in (1, 3, 5);
return day_count;
end;
Notice that you should not truncate w.id, because all values have the time as 00:00:00 already. (I'm assuming that if end_date falls somewhere in the middle of a day, you want to count that day, so I have not truncated the end_date parameter.)
Test:
select count(*) from erp_sj
where get_workday_count(start_date, end_date) > 5;
COUNT(*)
--------
1302
Results returned in around 1.4 seconds.
Execution plan for the query within the function:
select count(*)
from workdays w
where w.id between trunc(sysdate) and sysdate +10
and w.status in (1, 3, 5);
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 1 |
|* 2 | FILTER | | 1 | | 7 |00:00:00.01 | 1 |
|* 3 | INDEX RANGE SCAN| WORKDAYS_PK | 1 | 7 | 7 |00:00:00.01 | 1 |
--------------------------------------------------------------------------------------------
Now try adding the function as a virtual column and indexing it:
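The virtual column has to exist before it can be indexed; presumably it was added along these lines (the column name is taken from the index definition below):
alter table erp_sj add (
  workday_count as (get_workday_count(start_date, end_date))
);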
create index erp_sj_workday_count_ix on erp_sj(workday_count);
select count(*) from erp_sj
where workday_count > 5;
Same result in 0.035 seconds. Plan:
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 5 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 5 |
|* 2 | INDEX RANGE SCAN| ERP_SJ_WORKDAY_COUNT_IX | 1 | 1302 | 1302 |00:00:00.01 | 5 |
-------------------------------------------------------------------------------------------------------
Tested in 19.0.0.
Edit: As Sayan pointed out, the index on the virtual column won't be automatically updated if there are any changes in WORKDAYS, so there is a risk of wrong results with this approach. However, if performance is critical you could work around it by rebuilding the index on ERP_SJ every time you updated WORKDAYS. Maybe you could do this in a statement-level trigger on WORKDAYS, or just through scheduled IT maintenance processes if updates are very infrequent and ERP_SJ isn't so big that an index rebuild is impractical. If the index is partitioned, rebuilding affected partitions could be an option.
Or, don't have an index and live with the 1.4 seconds query execution time.
I understand that the columns ID and status have indexes on them (not a functional index on TRUNC(ID)). So use this query
SELECT count(0)
INTO day_count
FROM WORKDAYS
WHERE ID BETWEEN TRUNC(start_date) AND TRUNC(end_date)
AND status in (1, 3, 5);
in order to be able to exploit the index on the date column ID as well.
Maybe try Scalar Subquery Caching
(in case there are plenty of erp_sj records with the same start_date and end_date):
select count(0) from ERP_SJ where
(select GET_WORKDAY_COUNT(start_date, end_date) from dual) > 5
You are dealing with a data warehouse query (not an OLTP query).
Some best practices say you should
get rid of functions - avoid the context switch (this could be somewhat mitigated with the UDF pragma, but why use a function if you don't need it?)
get rid of indexes - quick for a few rows; slow for a large number of records
get rid of caching - caching is basically a workaround for repeating the same thing
So the data warehouse approach for the problem consists of two steps
Extend the Workday Table
With a small query, the workday table can be extended with a new column MIN_END_DAY that defines, for each (start) day, the minimum threshold to reach the limit of 5 working days.
The query uses the LEAD analytic function to get the 4th leading working day (check the PARTITION BY clause that distinguishes between the working days and other days).
For the non-working days you simply take the LAST_VALUE of the next working day.
Example
with wd as (
select ID, STATUS,
case when status in (1, 3, 5) then
lead(id,4) over (partition by case when status in (1, 3, 5) then 'workday' end order by id) /* 4 working days ahead */
end as min_stop_day
from workdays),
wd2 as (
select ID, STATUS,
last_value(MIN_STOP_DAY) ignore nulls over (order by id desc) MIN_END_DAY
from wd)
select ID, STATUS, MIN_END_DAY
from wd2
order by 1;
ID, STATUS, MIN_END_DAY
01.01.2020 00:00:00 4 08.01.2020 00:00:00
02.01.2020 00:00:00 1 08.01.2020 00:00:00
03.01.2020 00:00:00 1 09.01.2020 00:00:00
04.01.2020 00:00:00 2 10.01.2020 00:00:00
05.01.2020 00:00:00 2 10.01.2020 00:00:00
06.01.2020 00:00:00 1 10.01.2020 00:00:00
Join to the Base Table
Now you can simply join your base table with the extended workday table on the start_date and filter rows by comparing the end_date with the MIN_END_DAY
Query
with wd as (
select ID, STATUS,
case when status in (1, 3, 5) then
lead(id,4) over (partition by case when status in (1, 3, 5) then 'workday' end order by id)
end as min_stop_day
from workdays),
wd2 as (
select ID, STATUS,
last_value(MIN_STOP_DAY) ignore nulls over (order by id desc) MIN_END_DAY
from wd)
select count(*) from erp_sj
join wd2
on trunc(erp_sj.start_date) = wd2.ID
where trunc(end_date) >= min_end_day
For large tables this will lead to the expected HASH JOIN execution plan.
Note that I assume 1) the workday table is complete (otherwise you can't use inner join) and 2) contains enough future data (the last 5 rows are obviously not usable).

Determining total of consecutive values using SQL

I would like to determine the number of consecutive absences as per the following table. Initial research suggests I may be able to achieve this using a window function. For the data provided, the longest streak is four consecutive occurrences. Please can you advise how I can set a running absence total as a separate column.
create table events (eventdate date, absence int);
insert into events values ('2014-10-01', 0);
insert into events values ('2014-10-08', 1);
insert into events values ('2014-10-15', 1);
insert into events values ('2014-10-22', 0);
insert into events values ('2014-11-05', 0);
insert into events values ('2014-11-12', 1);
insert into events values ('2014-11-19', 1);
insert into events values ('2014-11-26', 1);
insert into events values ('2014-12-03', 1);
insert into events values ('2014-12-10', 0);
Based on Gordon Linoff's answer here, you could do:
SELECT TOP 1
MIN(eventdate) AS spanStart ,
MAX(eventdate) AS spanEnd,
COUNT(*) AS spanLength
FROM ( SELECT e.* ,
( ROW_NUMBER() OVER ( ORDER BY eventdate )
- ROW_NUMBER() OVER ( PARTITION BY absence ORDER BY eventdate ) ) AS grp
FROM events e
) t
GROUP BY grp ,
absence
HAVING absence = 1
ORDER BY COUNT(*) DESC;
Which returns:
spanStart | spanEnd | spanLength
---------------------------------------
2014-11-12 |2014-12-03 | 4
You don't specify which RDBMS you are using, but the following works under postgresql's window functions and should be translatable to similar SQL engines:
SELECT eventdate,
absence,
-- XXX We take advantage of the fact that absence is an int (1 or 0)
-- otherwise we'd COUNT(1) OVER (...) and only conditionally
-- display the count if absence = 1
SUM(absence) OVER (PARTITION BY span ORDER BY eventdate)
AS consecutive_absences
FROM (SELECT spanstarts.*,
SUM(newspan) OVER (ORDER BY eventdate) AS span
FROM (SELECT events.*,
CASE LAG(absence) OVER (ORDER BY eventdate)
WHEN absence THEN NULL
ELSE 1 END AS newspan
FROM events)
spanstarts
) eventsspans
ORDER BY eventdate;
which gives you:
eventdate | absence | consecutive_absences
------------+---------+----------------------
2014-10-01 | 0 | 0
2014-10-08 | 1 | 1
2014-10-15 | 1 | 2
2014-10-22 | 0 | 0
2014-11-05 | 0 | 0
2014-11-12 | 1 | 1
2014-11-19 | 1 | 2
2014-11-26 | 1 | 3
2014-12-03 | 1 | 4
2014-12-10 | 0 | 0
There is an excellent dissection of the above approach on the pgsql-general mailing list. The short of it is:
Innermost query (spanstarts) uses LAG to find the start of new
spans of absences, whether a span of 1's or a span of 0's
Next query (eventsspans) identifies those spans by summing the number of new spans that have come before us. So, we find span 1, then span 2, then 3, etc.
The outer query then counts the number of absences in each span.
As the SQL comment says, we cheat a little bit on #3, taking advantage of its data type, but the net effect is the same.
I don't know what your DBMS is but this is for SQL Server. Hopefully it is of some help : )
-------------------------------------------------------------------------------------------
Query:
--tableRN is used to get the row number of each event
;with tableRN as (SELECT a.*, ROW_NUMBER() OVER (ORDER BY a.eventdate) as rn,
                         COUNT(*) OVER () as maxRN
                  FROM events a),
--cte is a recursive query that returns the...
--absence value, the level (amount of times 1 appeared in a row),
--rn (row number), total (total count of streaks);
--the anchor member is a dummy row sitting before the first event
cte (absence, level, rn, total) AS (
SELECT 0, 0, 0, 0
UNION ALL
SELECT r.absence,
CASE WHEN c.absence = 1 AND r.absence = 1 THEN level + 1
ELSE 0
END,
c.rn + 1,
CASE WHEN c.level = 1 THEN total + 1
ELSE total
END
FROM cte c JOIN tableRN r ON c.rn + 1 = r.rn)
--This gets you the total count of times there
--was a consecutive absence (twice or more in a row).
SELECT MAX(c.total) AS Count FROM cte c
-------------------------------------------------------------------------------------------
Results:
|Count|
+-----+
| 2 |
Create a new column called consecutive_absence_count with default 0.
You may write a SQL procedure for insert - fetch the latest record, retrieve the absence value, and identify whether the new record to be inserted has a present or an absent value.
If the latest and the new record have consecutive dates and the absence value set to 1, increment the consecutive_absence_count, else set it to 0.
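A minimal sketch of that idea as a PostgreSQL BEFORE INSERT trigger, assuming the events table from the question gains the new column first (the consecutive-dates check is left out for brevity, and the trigger and function names are made up):
ALTER TABLE events ADD COLUMN consecutive_absence_count int DEFAULT 0;

CREATE OR REPLACE FUNCTION set_consecutive_absences() RETURNS trigger AS $$
DECLARE
  prev_count int;
BEGIN
  -- running count from the most recent earlier event, if any
  SELECT consecutive_absence_count INTO prev_count
  FROM events
  WHERE eventdate < NEW.eventdate
  ORDER BY eventdate DESC
  LIMIT 1;

  IF NEW.absence = 1 THEN
    NEW.consecutive_absence_count := COALESCE(prev_count, 0) + 1;
  ELSE
    NEW.consecutive_absence_count := 0;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_consecutive_absences
BEFORE INSERT ON events
FOR EACH ROW EXECUTE PROCEDURE set_consecutive_absences();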

SQL grouping by datetime with a maximum difference of x minutes

I have a problem with grouping my dataset in MS SQL Server.
My table looks like
# | CustomerID | SalesDate | Turnover
---| ---------- | ------------------- | ---------
1 | 1 | 2016-08-09 12:15:00 | 22.50
2 | 1 | 2016-08-09 12:17:00 | 10.00
3 | 1 | 2016-08-09 12:58:00 | 12.00
4 | 1 | 2016-08-09 13:01:00 | 55.00
5 | 1 | 2016-08-09 23:59:00 | 10.00
6 | 1 | 2016-08-10 00:02:00 | 5.00
Now I want to group the rows where the SalesDate difference to the next row is of a maximum of 5 minutes.
So that row 1 & 2, 3 & 4 and 5 & 6 are each one group.
My approach was getting the minutes with the DATEPART() function and dividing the result by 5:
(DATEPART(MINUTE, SalesDate) / 5)
For row 1 and 2 the result would be 3 and grouping here would work perfectly.
But for the other rows where there is a change in the hour or even in the day part of the SalesDate, the result cannot be used for grouping.
So this is where I'm stuck. I would really appreciate, if someone could point me in the right direction.
You want to group adjacent transactions based on the timing between them. The idea is to assign some sort of grouping identifier, and then use that for aggregation.
Here is an approach:
Identify group starts using lag() and date arithmetic.
Do a cumulative sum of the group starts to identify each group.
Aggregate
The query looks like this:
select customerid, min(salesdate), max(salesdate), sum(turnover)
from (select t.*,
sum(case when salesdate > dateadd(minute, 5, prev_salesdate)
then 1 else 0
end) over (partition by customerid order by salesdate) as grp
from (select t.*,
lag(salesdate) over (partition by customerid order by salesdate) as prev_salesdate
from t
) t
) t
group by customerid, grp;
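For the sample data this should produce something like:
customerid | min                 | max                 | sum
-----------|---------------------|---------------------|-------
1          | 2016-08-09 12:15:00 | 2016-08-09 12:17:00 | 32.50
1          | 2016-08-09 12:58:00 | 2016-08-09 13:01:00 | 67.00
1          | 2016-08-09 23:59:00 | 2016-08-10 00:02:00 | 15.00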
EDIT
Thanks to @JoeFarrell for pointing out I have answered the wrong question. The OP is looking for dynamic time differences between rows, but this approach creates fixed boundaries.
Original Answer
You could create a time table. This is a table that contains one record for each second of the day. Your table would have a second column that you can use to perform group bys on.
CREATE TABLE [Time]
(
TimeId TIME(0) PRIMARY KEY,
TimeGroup TIME
)
;
-- You could use a loop here instead.
INSERT INTO [Time]
(
TimeId,
TimeGroup
)
VALUES
('00:00:00', '00:00:00'), -- First group starts here.
('00:00:01', '00:00:00'),
('00:00:02', '00:00:00'),
('00:00:03', '00:00:00'),
...
('00:04:59', '00:00:00'),
('00:05:00', '00:05:00'), -- Second group starts here.
('00:05:01', '00:05:00')
;
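Instead of hand-typing all 86,400 rows (or looping), a set-based fill could look like this sketch, which derives each second's 5-minute group with integer division:
;WITH n AS
(
    SELECT TOP (86400)
        CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS int) AS s
    FROM sys.all_objects a CROSS JOIN sys.all_objects b
)
INSERT INTO [Time] (TimeId, TimeGroup)
SELECT CAST(DATEADD(SECOND, s, 0) AS TIME(0)),            -- the second itself
       CAST(DATEADD(MINUTE, (s / 300) * 5, 0) AS TIME(0)) -- start of its 5-minute group
FROM n;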
The approach works best when:
You need to reuse your custom grouping in several different queries.
You have two or more custom groups you often use.
Once populated you can simply join to the table and output the desired result.
/* Using the time table.
*/
SELECT
t.TimeGroup,
SUM(Turnover) AS SumOfTurnover
FROM
Sales AS s
INNER JOIN [Time] AS t ON t.TimeId = CAST(s.SalesDate AS Time(0))
GROUP BY
t.TimeGroup
;

Finding gaps in huge event streams?

I have about 1 million events in a PostgreSQL database that are of this format:
id | stream_id | timestamp
----------+-----------------+-----------------
1 | 7 | ....
2 | 8 | ....
There are about 50,000 unique streams.
I need to find all of the events where the time between any two of the events is over a certain time period. In other words, I need to find event pairs where there was no event in a certain period of time.
For example:
a b c d e f g h i j k
| | | | | | | | | | |
\____2 mins____/
In this scenario, I would want to find the pair (f, g) since those are the events immediately surrounding a gap.
I don't care if the query is (that) slow, i.e. on 1 million records it's fine if it takes an hour or so. However, the data set will keep growing, so hopefully if it's slow it scales sanely.
I also have the data in MongoDB.
What's the best way to perform this query?
You can do this with the lag() window function over a partition by the stream_id, ordered by the timestamp. The lag() function gives you access to previous rows in the partition; with no explicit offset, it returns the immediately previous row. So if the partition on stream_id is ordered by time, then the previous row is the previous event for that stream_id.
-- window functions cannot appear in WHERE, and neither can the diff
-- output alias, so the window query is wrapped in a subquery first
SELECT *
FROM (SELECT stream_id, lag(id) OVER pair AS start_id, id AS end_id,
             ("timestamp" - lag("timestamp") OVER pair) AS diff
      FROM my_table
      WINDOW pair AS (PARTITION BY stream_id ORDER BY "timestamp")) s
WHERE diff > interval '2 minutes';
In postgres it can be done very easily with the help of the lag() window function. Check the fiddle below as an example:
SQL Fiddle
PostgreSQL 9.3 Schema Setup:
CREATE TABLE Table1
("id" int, "stream_id" int, "timestamp" timestamp)
;
INSERT INTO Table1
("id", "stream_id", "timestamp")
VALUES
(1, 7, '2015-06-01 15:20:30'),
(2, 7, '2015-06-01 15:20:31'),
(3, 7, '2015-06-01 15:20:32'),
(4, 7, '2015-06-01 15:25:30'),
(5, 7, '2015-06-01 15:25:31')
;
Query 1:
with c as (select *,
lag("timestamp") over(partition by stream_id order by id) as pre_time,
lag(id) over(partition by stream_id order by id) as pre_id
from Table1
)
select * from c where "timestamp" - pre_time > interval '2 sec'
Results:
| id | stream_id | timestamp | pre_time | pre_id |
|----|-----------|------------------------|------------------------|--------|
| 4 | 7 | June, 01 2015 15:25:30 | June, 01 2015 15:20:32 | 3 |
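For the million-row case it also helps if an index supports the per-stream ordering that both queries rely on; a sketch, assuming the my_table name from the first answer:
CREATE INDEX ON my_table (stream_id, "timestamp");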