PostgreSQL: find entries where timestamp differences are within a range?

I have a table that has a session_id, user_id, start_time, and value.
Technically, a user should get a new session_id every 30 minutes, so there should never be a case where 2 entries have the same user_id but their start times are within 30 minutes of each other.
How do I run a query to look for these error cases?
I did something like this to see some of the time differences for entries for a given user:
select t1.start_time - t2.start_time
from user_sessions as t1 inner join
     user_sessions as t2
  on t1.user_id = 1 and t2.user_id = 1
I know that I'm looking for cases where:
((t1.start_time-t2.start_time) < 60*30*1000000 and (t1.start_time-t2.start_time) > 0) and t1.user_id = t2.user_id
I'm just not sure how to put the two pieces together into one query.

Does this do what you want?
select t1.start_time - t2.start_time
from user_sessions t1 inner join
     user_sessions t2
  on t1.user_id = t2.user_id
where (t1.start_time - t2.start_time) < 60*30*1000000 and
      (t1.start_time - t2.start_time) > 0;
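If start_time is a timestamp rather than an epoch value in microseconds (which is what the 60*30*1000000 comparison assumes), the same check can be written with interval arithmetic. A minimal sketch, assuming the user_sessions table above:
-- Variant assuming start_time is a timestamp, not microseconds since the epoch.
select t1.user_id, t2.start_time as earlier_start, t1.start_time as later_start
from user_sessions t1
inner join user_sessions t2
   on t1.user_id = t2.user_id
where t1.start_time - t2.start_time > interval '0'
  and t1.start_time - t2.start_time < interval '30 minutes';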

Using LAG() OVER() is a simple way to calculate the time difference between consecutive rows:
SELECT
  user_id, previous_start, start_time, minutes_diff
FROM (
  SELECT
    user_id
    , start_time
    , LAG(start_time) OVER(PARTITION BY user_id ORDER BY start_time) previous_start
    , EXTRACT(EPOCH FROM
        start_time - LAG(start_time) OVER(PARTITION BY user_id ORDER BY start_time)
      ) / 60 minutes_diff
  FROM user_sessions
) d
WHERE minutes_diff < 30
;

Related

BigQuery: 'join lateral' alternative for referencing value in subquery

I have a BigQuery table that holds append-only data - each time an entity is updated a new version of it is inserted. Each entity has its unique ID and each entry has a timestamp of when it was inserted.
When querying for the latest version of the entity, I order by rank, partition by id, and select the most recent version.
I want to take advantage of this and chart the progression of these entities over time. For example, I would like to generate a row for each day since Jan. 1st, with a summary of the entities as they were on that day. In postgres, I would do:
select
...
from generate_series('2022-01-01'::timestamp, '2022-09-01'::timestamp, '1 day'::interval) query_date
left join lateral (
select *
from (
with snapshot as (
select distinct on (id) *
from table
where "createdOn" <= query_date
order by id, "createdOn" desc
)
This basically behaves like a for-each: the subquery runs once for each query_date (a day, in this instance), which I can reference in its where clause. Each subquery run then filters the data so that it only uses data up to that point in time.
I know that I can create a saved query for the "subquery" logic and then schedule a prefill to run once for each day over the timeline, but I would like to understand how to write an exploratory query.
EDIT 1
Using a correlated subquery is a step in the right direction, but does not work when the subquery needs to join with another table (another append-only table holding a related entity).
So this works:
select
  day
  , (
    select count(*)
    from `table` t
    where date(createdOn) < day
  )
from unnest((select generate_date_array(date('2022-01-01'), current_date(), interval 1 day) as day)) day
order by day desc
But if I need the subquery to join with another table, like in:
select
  day
  , (
    select as struct *
    from (
      select
        id
        , status
        , rank() over (partition by id order by createdOn desc) as rank
      from `table1`
      where date(createdOn) < day
      qualify rank = 1
    ) t1
    left join (
      select
        id
        , other
        , rank() over (partition by id order by createdOn desc) as rank
      from `table2`
      where date(createdOn) < day
      qualify rank = 1
    ) t2 on t2.other = t1.id
  )
from unnest((select generate_date_array(date('2022-01-01'), current_date(), interval 1 day) as day)) day
order by day desc
I get an error saying Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN. Another SO question about that error (Avoid correlated subqueries error in BigQuery) solves the issue by moving the correlated query to a join in the top query - which misses what I am trying to achieve.
Took me a while, but I figured out a way to do this using the answer in Bigquery: WHERE clause using column from outside the subquery.
Basically, it requires flipping the order of the queries; here's how it's done:
select *
from (
  select *
  from `table1` t1
  JOIN (select day from unnest((select generate_timestamp_array(timestamp('2022-01-01'), current_timestamp(), interval 1 day) as day)) day) day
    ON (t1.createdOn) < day.day
  QUALIFY ROW_NUMBER() OVER (PARTITION BY day, t1.id ORDER BY t1.createdOn desc) = 1
) t1
left join (
  select
    * -- aggregate here
  from (
    SELECT
      id, other, createdOn, day
    FROM `table2` t2
    JOIN (select day from unnest((select generate_timestamp_array(timestamp('2022-01-01'), current_timestamp(), interval 1 day) as day)) day) day
      ON (t2.createdOn) < day.day
    QUALIFY ROW_NUMBER() OVER (PARTITION BY day, t2.id ORDER BY t2.createdOn desc) = 1
  ) snapshot
  group by snapshot.other, day
) t2 on t2.other = t1.id and t2.day = t1.day
group by t1.day
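For reference, the same flip applied to the simpler count query from EDIT 1 would look roughly like this (a sketch, assuming the same `table` and createdOn column as above):
-- Per-day count of rows created before each day, written as a plain join instead of
-- a correlated subquery (assumes the same `table` and createdOn column as above).
select d.day, count(*) as cnt
from `table` t
join (select day from unnest(generate_date_array(date('2022-01-01'), current_date(), interval 1 day)) as day) d
  on date(t.createdOn) < d.day
group by d.day
order by d.day desc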

Is there a way to calculate a difference within a specific time range using a ClickHouse MV?

For example, this is my source table
CREATE TABLE IF NOT EXISTS market
(
id UInt64,
price DECIMAL128(18),
create_time UInt64
) ENGINE = MergeTree()
partition by toYYYYMM(FROM_UNIXTIME(create_time))
order by create_time;
What I want to do is create an MV which asynchronously calculates the price difference within 30 minutes. Then I can get the difference from this MV by id at any time. Here is an SQL example. It works correctly when executed directly, but does not work when I create an MV (AS SELECT ...) and query the MV.
select id,
(t1.price - t2.price) as price_delta,
t2.price as start_price,
t1.price as end_price,
round(toFloat64(price_delta) / toFloat64(t2.price), 2) as price_delta_rate,
now() as update_time
from (select id, price
from market
where create_time >= date_sub(MINUTE, 30, now())
order by create_time desc
limit 1) t1
left join
(select id, price
from market
where create_time >= date_sub(MINUTE, 30, now())
order by create_time
limit 1) t2
ON t1.id = t2.id;
Here is my SQL to create a MV
CREATE MATERIALIZED VIEW IF NOT EXISTS market_stats_30min
ENGINE = ReplacingMergeTree()
order by id
POPULATE
AS
select id,
(t1.price - t2.price) as price_delta,
t2.price as start_price,
t1.price as end_price,
round(toFloat64(price_delta) / toFloat64(t2.price), 2) as price_delta_rate,
now() as update_time
from (select id, price
from market
where create_time >= date_sub(MINUTE, 30, now())
order by create_time desc
limit 1) t1
left join
(select id, price
from market
where create_time >= date_sub(MINUTE, 30, now())
order by create_time
limit 1) t2
ON t1.id = t2.id;
A materialized view in ClickHouse is just an AFTER INSERT trigger: it only sees the block of rows from the INSERT that fired it.
So your INSERT INTO market does not contain all the required data in most cases.
POPULATE recalculates the full table for the materialized view just once, at creation time.
Try computing the delta in a plain SELECT instead, without a materialized view, for example with window functions:
https://clickhouse.com/docs/en/sql-reference/window-functions/
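For example, here is a minimal sketch of such a direct SELECT, using argMin/argMax aggregates rather than window functions (it assumes the market table above, with create_time stored as a Unix timestamp):
-- Plain SELECT (no MV): trailing 30-minute price delta per id.
-- Assumes market(id, price, create_time) with create_time as a Unix timestamp.
select id,
       argMax(price, create_time) - argMin(price, create_time) as price_delta,
       argMin(price, create_time) as start_price,
       argMax(price, create_time) as end_price,
       round(toFloat64(price_delta) / toFloat64(start_price), 2) as price_delta_rate,
       now() as update_time
from market
where create_time >= toUnixTimestamp(now() - INTERVAL 30 MINUTE)
group by id;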

Self-referencing a table for previous record matching user ID

I'm trying to find the easiest way to calculate cycle times from SQL data. In the data source I have unique station IDs, user IDs, and a date/time stamp, along with other data about the work being performed.
What I want to do is join the table to itself so that for each date/time stamp I get:
- the date/time stamp of the most recent previous instance of that user ID within 3 minutes or null
- the difference between those two stamps (the cycle time = amount of time between records)
This should be simple but I can't wrap my brain around it. Any help?
Unfortunately SQL Server does not support date range specifications in window functions. I would recommend a lateral join here:
select
t.*,
t1.timestamp last_timestamp,
datediff(second, t1.timestamp, t.timestamp) diff_seconds
from mytable t
outer apply (
select top(1) t1.*
from mytable t1
where
t1.user_id = t.user_id
and t1.timestamp >= dateadd(minute, -3, t.timestamp)
and t1.timestamp < t.timestamp
order by t1.timestamp desc
) t1
The subquery brings the most recent row within 3 minutes for the same user_id (or an empty resultset, if there is no row within that timeframe). You can then use that information in the outer query to display the corresponding timestamp, and compute the difference with the current one.
Simply calculate the difference between the current and the LAG timestamp; if it's more than three minutes, return NULL instead:
with cte as
(
    select
        t.*
        ,datediff(second, lag(timestamp) over (partition by user_id order by timestamp), timestamp) as diff_seconds
    from mytable as t
)
select cte.*
    ,case when diff_seconds <= 180 then diff_seconds end
from cte

Grouping by ID and time interval in MS SQL

I am trying to write a query that groups like ids within a timespan.
Real world scenario:
I want to see rows created by the same ID within 5 seconds of each other.
SELECT top 10
Id,
CreatedOn
FROM Logs
where ((DATEPART(SECOND, CreatedOn) + 5) - DATEPART(SECOND, CreatedOn)) < 10
GROUP BY
DATEPART(SECOND, CreatedOn),
Id,
CreatedOn
order by CreatedOn desc
This isn't quite right, but I feel like I am on the right track.
Thanks in advance.
You may try a query with the condition that the ID matches and the creation times are within 5 seconds of each other:
SELECT
t1.Id,
t1.CreatedOn
FROM logs t1
WHERE EXISTS (SELECT 1 FROM logs t2
WHERE t1.Id = t2.Id AND
t1.CreatedOn <> t2.CreatedOn AND
ABS(DATEDIFF(SECOND, t1.CreatedOn, t2.CreatedOn)) <= 5)
ORDER BY
t1.CreatedOn DESC;
Could be further optimized this way:
SELECT t1.Id,
       t1.CreatedOn
FROM logs t1
WHERE EXISTS (
SELECT 1
FROM logs t2
WHERE t2.Id = t1.Id
AND t2.CreatedOn <> t1.CreatedOn
AND ABS(DATEDIFF(SECOND, t1.CreatedOn, t2.CreatedOn)) <= 5
)
ORDER BY
t1.CreatedOn DESC;
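A window-function alternative, as a sketch (it assumes SQL Server 2012+ and the same Logs(Id, CreatedOn) columns); note it flags only the later row of each close pair, unlike the EXISTS version, which returns both:
-- LAG-based sketch: rows created within 5 seconds of the previous row for the same Id.
-- Assumes Logs(Id, CreatedOn) and SQL Server 2012+ (for LAG).
WITH ordered AS (
    SELECT
        Id,
        CreatedOn,
        LAG(CreatedOn) OVER (PARTITION BY Id ORDER BY CreatedOn) AS prev_created
    FROM Logs
)
SELECT Id, CreatedOn, prev_created
FROM ordered
WHERE DATEDIFF(SECOND, prev_created, CreatedOn) <= 5
ORDER BY CreatedOn DESC;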

Filter rows by those created within a close timeframe

I have an application where users create orders that are stored in an Oracle database. I'm trying to find a bug that only happens when a user creates orders within 30 seconds of the last order they created.
Here is the structure of the order table:
order_id | user_id | creation_date
I would like to write a query that can give me a list of orders where the creation_date is within 30 seconds of the last order for the same user. The results will hopefully help me find the bug.
I tried using the Oracle LAG() function but it doesn't seem to work with the WHERE clause.
Any thoughts?
SELECT O.*
FROM YourTable O
WHERE EXISTS (
SELECT *
FROM YourTable O2
WHERE
O.creation_date > O2.creation_date
AND O.user_id = O2.user_id
AND O.creation_date - (30 / 86400) <= O2.creation_date
);
You can use the LAG function if you want, you would just have to wrap the query into a derived table and then put your WHERE condition in the outer query.
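A sketch of that wrapping, assuming the table from the question (YourTable with order_id, user_id, and creation_date stored as a DATE):
-- LAG wrapped in a derived table; the WHERE goes on the outer query.
-- Assumes YourTable(order_id, user_id, creation_date) with creation_date as a DATE,
-- so date subtraction yields a number of days.
SELECT *
FROM (
    SELECT o.*,
           LAG(creation_date) OVER (PARTITION BY user_id ORDER BY creation_date) AS prev_creation_date
    FROM YourTable o
)
WHERE creation_date - prev_creation_date <= 30 / 86400;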
SELECT distinct
t1.order_id, t1.user_id, t1.creation_date
FROM
YourTable t1
join YourTable t2
on t2.user_id = t1.user_id
and t2.creation_date between t1.creation_date - 30/86400 and t1.creation_date
and t2.rowid <> t1.rowid
order by 3 desc
Example of using LAG():
SELECT id, time_diff_in_seconds
     , creation_date, prev_date
FROM
(
  SELECT id, creation_date, prev_date
       -- DATE subtraction yields days; multiply by 86400 to get seconds
       , ROUND((CAST(creation_date AS DATE) - CAST(prev_date AS DATE)) * 86400) time_diff_in_seconds
  FROM
  (
    SELECT id, creation_date
         -- for the real table you would also PARTITION BY user_id here
         , LAG(creation_date, 1, creation_date) OVER (ORDER BY creation_date) prev_date
    FROM
    ( -- Table/data --
      SELECT 1 id, timestamp '2013-03-20 13:56:58' creation_date FROM dual
      UNION ALL
      SELECT 2, timestamp '2013-03-20 13:57:27' FROM dual
      UNION ALL
      SELECT 3, timestamp '2013-03-20 13:59:16' FROM dual
    )
  )
)
--WHERE time_diff_in_seconds <= 30
/
ID TIME_DIFF_IN_SECONDS
------------------------
 1                    0 <<-- kept if the WHERE is uncommented
 2                   29 <<-- kept if the WHERE is uncommented
 3                  109