Difference between last and second last event in a table of events - sql

I have the following table, which was created by:
create table events (
event_type integer not null,
value integer not null,
time timestamp not null,
unique (event_type, time)
);
Given the sample data from the picture, I want to write a query that, for each event_type registered more than once, returns the difference between the latest and the second-latest value.
Given the above data, the output should be:
event_type value
2 -5
3 4
I solved it using the following:
CREATE VIEW [max_date] AS
SELECT event_type, max(time) as time, value
FROM events
group by event_type
having count(event_type) >1
order by time desc;
select event_type, value
from
(
select event_type, value, max(time)
from(
Select E1.event_type, ([max_date].value - E1.value) as value, E1.time
From events E1, [max_date]
Where [max_date].event_type = E1.event_type
and [max_date].time > E1.time
)
group by event_type
)
But this seems like a very complicated query, and I wonder if there is an easier way.

Use window functions:
select e.*,
(value - prev_value) as diff
from (select e.*,
lag(value) over (partition by event_type order by time) as prev_value,
row_number() over (partition by event_type order by time desc) as seqnum
from events e
) e
where seqnum = 1 and prev_value is not null;
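A minimal way to try this answer locally is Python's built-in sqlite3 module, assuming the bundled SQLite is 3.25 or newer (the release that added window functions). The rows below are hypothetical, since the question's data was only shown in a picture; they were picked so that the query reproduces the expected output. The difference column is aliased as diff.

```python
import sqlite3

# Hypothetical rows (event_type, value, time); the question's data was an image.
ROWS = [
    (2, 5, "2015-05-09 12:42:00"),
    (4, -42, "2015-05-09 13:19:57"),
    (2, 2, "2015-05-09 14:48:39"),
    (2, 7, "2015-05-09 13:54:39"),
    (3, 16, "2015-05-09 13:19:57"),
    (3, 20, "2015-05-09 15:01:09"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """create table events (
           event_type integer not null,
           value integer not null,
           time timestamp not null,
           unique (event_type, time))"""
)
conn.executemany("insert into events values (?, ?, ?)", ROWS)

# lag() over ascending time gives each row's predecessor; row_number() over
# descending time marks the latest row, which is the only one kept.
QUERY = """
select event_type, value - prev_value as diff
from (select e.*,
             lag(value) over (partition by event_type order by time) as prev_value,
             row_number() over (partition by event_type order by time desc) as seqnum
      from events e
     ) e
where seqnum = 1 and prev_value is not null
order by event_type
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [(2, -5), (3, 4)]
```

Event type 4 has a single row, so its prev_value is NULL and it is filtered out.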

You could use lag() and row_number()
select event_type, val
from (
select
event_type,
value - lead(value) over(partition by event_type order by time desc) val,
row_number() over(partition by event_type order by time desc) rn
from events
) t
where rn = 1 and val is not null
The inner query ranks records having the same event_type by descending time and computes the difference between each value and the one recorded just before it.
Then, the outer query just filters on the top record per group.
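Because this window is ordered by time DESC, lag() looks at the row that is newer in time, so for the latest row (rn = 1) it is always NULL and the where clause filters every group out; lead() is what returns the next-older value. The sketch below uses Python's sqlite3 (SQLite 3.25+ assumed) with hypothetical rows and runs the query both ways to show the difference.

```python
import sqlite3

# Hypothetical rows (event_type, value, time); the question's data was an image.
ROWS = [
    (2, 5, "2015-05-09 12:42:00"),
    (4, -42, "2015-05-09 13:19:57"),
    (2, 2, "2015-05-09 14:48:39"),
    (2, 7, "2015-05-09 13:54:39"),
    (3, 16, "2015-05-09 13:19:57"),
    (3, 20, "2015-05-09 15:01:09"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table events (event_type integer, value integer, time text)")
conn.executemany("insert into events values (?, ?, ?)", ROWS)


def latest_diff(offset_fn: str) -> list:
    """Run the answer's query with lag() or lead() over a time-DESC window."""
    if offset_fn not in ("lag", "lead"):
        raise ValueError("offset_fn must be 'lag' or 'lead'")
    sql = f"""
        select event_type, val
        from (select event_type,
                     value - {offset_fn}(value)
                         over (partition by event_type order by time desc) as val,
                     row_number()
                         over (partition by event_type order by time desc) as rn
              from events) t
        where rn = 1 and val is not null
        order by event_type
    """
    return conn.execute(sql).fetchall()


print(latest_diff("lag"))   # [] because the newest row has no predecessor in DESC order
print(latest_diff("lead"))  # [(2, -5), (3, 4)]
```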

Here is a way to do this using a combination of analytic functions and aggregation. This approach works even if your database does not support LEAD and LAG.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY event_type ORDER BY time DESC) AS rn
FROM events
)
SELECT
event_type,
MAX(CASE WHEN rn = 1 THEN value END) - MAX(CASE WHEN rn = 2 THEN value END) AS value
FROM cte
GROUP BY
event_type
HAVING
COUNT(*) > 1;
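The conditional-aggregation query can be checked with Python's sqlite3 module (assuming SQLite 3.25+ for ROW_NUMBER). The sample rows are hypothetical; note that the CTE must alias the row number as rn for the outer query to reference it.

```python
import sqlite3

# Hypothetical rows (event_type, value, time); the question's data was an image.
ROWS = [
    (2, 5, "2015-05-09 12:42:00"),
    (4, -42, "2015-05-09 13:19:57"),
    (2, 2, "2015-05-09 14:48:39"),
    (2, 7, "2015-05-09 13:54:39"),
    (3, 16, "2015-05-09 13:19:57"),
    (3, 20, "2015-05-09 15:01:09"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table events (event_type integer, value integer, time text)")
conn.executemany("insert into events values (?, ?, ?)", ROWS)

# No LAG/LEAD: rank rows newest-first, then pick rank 1 and rank 2 with CASE.
QUERY = """
WITH cte AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY event_type ORDER BY time DESC) AS rn
  FROM events
)
SELECT event_type,
       MAX(CASE WHEN rn = 1 THEN value END) - MAX(CASE WHEN rn = 2 THEN value END) AS value
FROM cte
GROUP BY event_type
HAVING COUNT(*) > 1
ORDER BY event_type
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [(2, -5), (3, 4)]
```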

Related

How to get an incrementing number that resets whenever a column changes in BigQuery?

This table has date, id, and flag columns. How can I get a value column that is an incrementing number and resets to 1 whenever the flag column changes?
Consider the approach below:
select * except(changed, grp),
row_number() over(partition by id, grp order by date) value
from (
select *, countif(changed) over(partition by id order by date) grp
from (
select *,
ifnull(flag != lag(flag) over(partition by id order by date), true) changed
from `project.dataset.table`
))
If applied to the sample data in your question, the output restarts the count at 1 each time flag changes.
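Here is a sketch of the same idea outside BigQuery, using Python's sqlite3 (SQLite 3.25+ assumed). SQLite has no COUNTIF or SELECT * EXCEPT, so this version swaps in SUM over a 0/1 changed column and an explicit column list; the table name t and its rows are hypothetical, with flag stored as 0/1.

```python
import sqlite3

# Hypothetical rows (id, date, flag) with flag stored as 0/1.
ROWS = [
    (1, "2021-01-01", 0),
    (1, "2021-01-02", 0),
    (1, "2021-01-03", 1),
    (1, "2021-01-04", 1),
    (1, "2021-01-05", 1),
    (1, "2021-01-06", 0),
    (1, "2021-01-07", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, date text, flag integer)")
conn.executemany("insert into t values (?, ?, ?)", ROWS)

# changed = 1 on the first row and whenever flag differs from the previous row;
# a running sum of changed labels each run, and row_number() counts within it.
QUERY = """
select id, date, flag,
       row_number() over (partition by id, grp order by date) as value
from (select *, sum(changed) over (partition by id order by date) as grp
      from (select *,
                   ifnull(flag <> lag(flag) over (partition by id order by date), 1) as changed
            from t))
order by id, date
"""

values = [row[3] for row in conn.execute(QUERY)]
print(values)  # [1, 2, 1, 2, 3, 1, 2]
```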
You seem to want to count the number of falses since the last true. You can use:
select t.* except (grp),
(case when flag
then 1
else row_number() over (partition by id, grp order by date) - 1
end)
from (select t.*,
countif(flag) over (partition by id order by date) as grp
from t
) t;
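The counting logic can be sketched in SQLite through Python's sqlite3 module (SQLite 3.25+ assumed). BigQuery's COUNTIF(flag) becomes SUM(flag) because flag is stored as 0/1 here; the table t and its rows are hypothetical. Each true row gets 1, and each false row gets its position after the most recent true row.

```python
import sqlite3

# Hypothetical rows (id, date, flag) with flag stored as 0/1.
ROWS = [
    (1, "2021-01-01", 1),
    (1, "2021-01-02", 0),
    (1, "2021-01-03", 0),
    (1, "2021-01-04", 1),
    (1, "2021-01-05", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, date text, flag integer)")
conn.executemany("insert into t values (?, ?, ?)", ROWS)

# grp increases at every true row, so each group is one true row plus the
# false rows that follow it; row_number() - 1 counts those false rows.
QUERY = """
select id, date, flag,
       case when flag = 1 then 1
            else row_number() over (partition by id, grp order by date) - 1
       end as value
from (select t.*, sum(flag) over (partition by id order by date) as grp
      from t) t
order by id, date
"""

values = [row[3] for row in conn.execute(QUERY)]
print(values)  # [1, 1, 2, 1, 1]
```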
If you know that the dates have no gaps, you can actually do this without a subquery:
select t.*,
(case when flag then 1
else date_diff(date,
max(case when flag then date end) over (partition by id order by date),
day)
end)
from t;
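For the date-difference variant, the MAX must be a running max (ordered by date) so that it picks the latest true date up to the current row rather than the latest true date in the whole partition. This SQLite sketch, via Python's sqlite3, uses julianday() in place of BigQuery's DATE_DIFF and evaluates the expression with and without ORDER BY on hypothetical, gap-free dates.

```python
import sqlite3

# Hypothetical gap-free daily rows (id, date, flag) with flag stored as 0/1.
ROWS = [
    (1, "2021-01-01", 1),
    (1, "2021-01-02", 0),
    (1, "2021-01-03", 0),
    (1, "2021-01-04", 1),
    (1, "2021-01-05", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, date text, flag integer)")
conn.executemany("insert into t values (?, ?, ?)", ROWS)

# {window_order} is either "order by date" (running max) or "" (whole partition).
QUERY_TEMPLATE = """
select case when flag = 1 then 1
            else cast(julianday(date) - julianday(
                     max(case when flag = 1 then date end)
                         over (partition by id {window_order})) as integer)
       end as value
from t
order by id, date
"""


def days_since_true(window_order: str) -> list:
    sql = QUERY_TEMPLATE.format(window_order=window_order)
    return [row[0] for row in conn.execute(sql)]


print(days_since_true("order by date"))  # [1, 1, 2, 1, 1]
print(days_since_true(""))               # [1, -2, -1, 1, 1]
```

With the running max the result matches the subquery version; without it, false rows before the last true row get negative values.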

Redshift - Group Table based on consecutive rows

I am working right now with this table:
What I want to do is clean up this table a little by grouping consecutive rows together.
Is there any way to achieve this kind of result?
The first table already works fine; I just want to get rid of some rows to free up disk space.
One method is to peek at the previous row to see when the value changes. Assuming that valid_to and valid_from are really dates:
select id, class, min(valid_from), max(valid_to)
from (select t.*,
sum(case when prev_valid_to >= valid_from + interval '-1 day' then 0 else 1 end) over (partition by id, class order by valid_to rows between unbounded preceding and current row) as grp
from (select t.*,
lag(valid_to) over (partition by id, class order by valid_to) as prev_valid_to
from t
) t
) t
group by id, class, grp;
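A runnable approximation of this query in SQLite (via Python's sqlite3, SQLite 3.25+ assumed) follows. It replaces Redshift's interval arithmetic with date(valid_from, '-1 day'), partitions both windows by id and class, and takes min(valid_from) and max(valid_to) per group; the table t and its rows are hypothetical.

```python
import sqlite3

# Hypothetical rows (id, class, valid_from, valid_to) as ISO date strings.
ROWS = [
    (1, "A", "2021-01-01", "2021-01-10"),
    (1, "A", "2021-01-11", "2021-01-20"),
    (1, "B", "2021-01-21", "2021-01-31"),
    (1, "A", "2021-02-01", "2021-02-10"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, class text, valid_from text, valid_to text)")
conn.executemany("insert into t values (?, ?, ?, ?)", ROWS)

# A row starts a new group (1) unless the previous row of the same id/class
# ended no earlier than the day before this row starts (0).
QUERY = """
select id, class, min(valid_from) as start_date, max(valid_to) as end_date
from (select t.*,
             sum(case when prev_valid_to >= date(valid_from, '-1 day') then 0 else 1 end)
                 over (partition by id, class order by valid_to
                       rows between unbounded preceding and current row) as grp
      from (select t.*,
                   lag(valid_to) over (partition by id, class order by valid_to) as prev_valid_to
            from t) t
     ) t
group by id, class, grp
order by id, start_date
"""

merged = conn.execute(QUERY).fetchall()
for row in merged:
    print(row)
```

The two adjacent class A rows collapse into one, while the later class A row stays separate because a class B period sits between them.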
If they are not dates, then this gets trickier. You could convert them to dates, or you could use the difference of row_numbers:
select id, class, min(valid_from), max(valid_to)
from (select t.*,
row_number() over (partition by id order by valid_from) as seqnum,
row_number() over (partition by id, class order by valid_from) as seqnum_2
from t
) t
group by id, class, (seqnum - seqnum_2)
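The difference-of-row-numbers approach can be checked on hypothetical rows in SQLite through Python's sqlite3 (SQLite 3.25+ assumed). Rows in an unbroken run of the same class share a constant seqnum - seqnum_2, so each run collapses into one row; unlike the date-based query, this ignores whether the dates are contiguous.

```python
import sqlite3

# Hypothetical rows (id, class, valid_from, valid_to) as ISO date strings.
ROWS = [
    (1, "A", "2021-01-01", "2021-01-10"),
    (1, "A", "2021-01-11", "2021-01-20"),
    (1, "B", "2021-01-21", "2021-01-31"),
    (1, "A", "2021-02-01", "2021-02-10"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, class text, valid_from text, valid_to text)")
conn.executemany("insert into t values (?, ?, ?, ?)", ROWS)

# seqnum counts all rows per id; seqnum_2 counts rows per id and class.
# Their difference stays constant inside each consecutive run of one class.
QUERY = """
select id, class, min(valid_from) as start_date, max(valid_to) as end_date
from (select t.*,
             row_number() over (partition by id order by valid_from) as seqnum,
             row_number() over (partition by id, class order by valid_from) as seqnum_2
      from t) t
group by id, class, (seqnum - seqnum_2)
order by id, start_date
"""

merged = conn.execute(QUERY).fetchall()
for row in merged:
    print(row)
```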

How to select specific rows in "group by" groups using conditions on multiple columns?

I have the following table with many userIds (the example shows only one userId for demonstration purposes):
For every userId I want to extract two rows:
The first row should have isTransaction = 0 and the earliest date.
The second row should have isTransaction = 1, a device different from that of the first row, and the earliest date right after that of the first row.
That is, the output should be:
Time userId device isTransaction
2021-01-27 10187675 mobile 0
2021-01-30 10187675 web 1
I tried to rank rows with partitioning and ordering but it didn't work:
Select * from
(SELECT *, rank() over(partition by userId, device, isTransaction order by isTransaction, Time) as rnk
FROM table 1)
where rnk=1
order by Time
Please help! It would also be good to check that the time difference between these two rows does not exceed 30 days; otherwise, the userId should be dropped.
You can first identify the earliest time for 0. Then enumerate the rows and take only the first one:
select t.*
from (select t.*,
row_number() over (partition by userid, isTransaction order by time) as seqnum
from (select t.*,
min(case when isTransaction = 0 then time end) over (partition by userid order by time) as time_0
from t
) t
where time >= time_0
) t
where seqnum = 1;
This covers the isTransaction and date conditions you enumerated; it does not check the device condition.
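Here is a sketch of this approach in SQLite via Python's sqlite3 (SQLite 3.25+ assumed), with the column names from the question and hypothetical rows; only the two expected output rows come from the question. It filters on time >= time_0 so that the earliest isTransaction = 0 row itself is kept, and it does not enforce the different-device condition.

```python
import sqlite3

# Hypothetical rows (time, userId, device, isTransaction); only the
# 2021-01-27 and 2021-01-30 rows appear in the question's expected output.
ROWS = [
    ("2021-01-25", 10187675, "web", 1),
    ("2021-01-27", 10187675, "mobile", 0),
    ("2021-01-28", 10187675, "mobile", 0),
    ("2021-01-30", 10187675, "web", 1),
    ("2021-02-02", 10187675, "web", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (time text, userId integer, device text, isTransaction integer)"
)
conn.executemany("insert into t values (?, ?, ?, ?)", ROWS)

# time_0 is the earliest isTransaction = 0 time; rows before it are dropped,
# then the first row of each isTransaction value is kept.
QUERY = """
select time, userId, device, isTransaction
from (select t.*,
             row_number() over (partition by userId, isTransaction order by time) as seqnum
      from (select t.*,
                   min(case when isTransaction = 0 then time end)
                       over (partition by userId order by time) as time_0
            from t) t
      where time >= time_0) t
where seqnum = 1
order by time
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```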
Buried in the text, you also want to eliminate users where the difference is greater than 30 days. That is a little trickier, but not too hard:
select t.*
from (select t.*,
min(case when isTransaction = 1 then time end) over (partition by userid) as time_1,
row_number() over (partition by userid, isTransaction order by time) as seqnum
from (select t.*,
min(case when isTransaction = 0 then time end) over (partition by userid order by time) as time_0
from t
) t
where time >= time_0
) t
where seqnum = 1 and
time_1 < timestamp_add(time_0, interval 30 day);
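The 30-day check can be sketched in SQLite via Python's sqlite3 (SQLite 3.25+ assumed), with date(time_0, '+30 days') standing in for BigQuery's TIMESTAMP_ADD. The rows are hypothetical: user 222 is an invented user whose first transaction falls more than 30 days after the first non-transaction row, so it is dropped.

```python
import sqlite3

# Hypothetical rows (time, userId, device, isTransaction).
ROWS = [
    ("2021-01-27", 10187675, "mobile", 0),
    ("2021-01-30", 10187675, "web", 1),
    ("2021-01-01", 222, "mobile", 0),  # invented user: first 1 is > 30 days later
    ("2021-03-01", 222, "web", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (time text, userId integer, device text, isTransaction integer)"
)
conn.executemany("insert into t values (?, ?, ?, ?)", ROWS)

# time_1 is computed after the time >= time_0 filter, so it is the first
# transaction at or after the first non-transaction row.
QUERY = """
select time, userId, device, isTransaction
from (select t.*,
             min(case when isTransaction = 1 then time end)
                 over (partition by userId) as time_1,
             row_number() over (partition by userId, isTransaction order by time) as seqnum
      from (select t.*,
                   min(case when isTransaction = 0 then time end)
                       over (partition by userId order by time) as time_0
            from t) t
      where time >= time_0) t
where seqnum = 1 and time_1 < date(time_0, '+30 days')
order by userId, time
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```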

Returning 5 Most Recent Trips Per ID

I have a table with the number of trips taken and a station_id, and I want to return the 5 most recent trips made per ID (a sample image of the table is below).
The query I made below aggregates the station IDs and the most recent trip, but I am having a difficult time returning the 5 most recent trips:
SELECT start_station_id, MAX(start_time)
FROM `bpd.shop.trips`
group by start_station_id, start_time
Trips:
https://imgur.com/Ebh9FeZ
Any help would be much appreciated, thanks!
You can use row_number():
SELECT t.*
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY start_station_id ORDER BY start_time DESC) as seqnum
FROM `bpd.shop.trips` t
) t
WHERE seqnum <= 5;
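The ROW_NUMBER approach is portable enough to try in SQLite through Python's sqlite3 (SQLite 3.25+ assumed). The table is a hypothetical local trips table rather than `bpd.shop.trips`, with invented rows; a station with fewer than five trips simply returns all of them.

```python
import sqlite3

# Hypothetical trips: station 1 has seven trips, station 2 has two.
TRIPS = [(1, f"2021-01-0{day} 08:00:00") for day in range(1, 8)] + [
    (2, "2021-01-03 09:30:00"),
    (2, "2021-01-05 09:30:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table trips (start_station_id integer, start_time text)")
conn.executemany("insert into trips values (?, ?)", TRIPS)

# Number each station's trips newest-first and keep the first five.
QUERY = """
select start_station_id, start_time
from (select t.*,
             row_number() over (partition by start_station_id
                                order by start_time desc) as seqnum
      from trips t) t
where seqnum <= 5
order by start_station_id, start_time desc
"""

recent = conn.execute(QUERY).fetchall()
for station_id, start_time in recent:
    print(station_id, start_time)
```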
Below is for BigQuery Standard SQL
Option 1
#standardSQL
SELECT record.*
FROM (
SELECT ARRAY_AGG(t ORDER BY start_time DESC LIMIT 5) arr
FROM `bpd.shop.trips` t
GROUP BY start_station_id
), UNNEST(arr) record
Option 2
#standardSQL
SELECT * EXCEPT (pos) FROM (
SELECT *, ROW_NUMBER() OVER(win) AS pos
FROM `bpd.shop.trips`
WINDOW win AS (PARTITION BY start_station_id ORDER BY start_time DESC)
)
WHERE pos <= 5
I recommend Option 1 as the more scalable option.

Subtraction of values depending on time SQL

For each event_type that is repeated more than once, I need a SQL statement that returns the event_type and the difference between the last value registered for that event_type and the second-to-last value. I appreciate your help.
You can use LEAD() (or LAG() if you prefer) to get the next record in the series, calculate the difference only when such a record exists, and keep only the latest Time per Event_Type:
With Cte As
(
Select *,
Row_Number() Over (Partition By Event_Type Order By Time Desc) As Row_Number,
Lead(Value) Over (Partition By Event_Type Order By Time Desc) As Prev
From YourTable
)
Select Event_Type, Value - Prev As Value
From Cte
Where Prev Is Not Null
And Row_Number = 1
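This CTE runs essentially unchanged in SQLite. The sketch below uses Python's sqlite3 (SQLite 3.25+ assumed) with hypothetical rows, and renames the Row_Number alias to rn to avoid reusing the function's name as a column name. Because the window is ordered by Time DESC, Lead(Value) is the next-older value.

```python
import sqlite3

# Hypothetical rows (event_type, value, time); the question's data was an image.
ROWS = [
    (2, 5, "2015-05-09 12:42:00"),
    (4, -42, "2015-05-09 13:19:57"),
    (2, 2, "2015-05-09 14:48:39"),
    (2, 7, "2015-05-09 13:54:39"),
    (3, 16, "2015-05-09 13:19:57"),
    (3, 20, "2015-05-09 15:01:09"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table events (event_type integer, value integer, time text)")
conn.executemany("insert into events values (?, ?, ?)", ROWS)

# SQLite identifiers are case-insensitive, so Event_Type matches event_type.
QUERY = """
With Cte As
(
    Select *,
           Row_Number() Over (Partition By Event_Type Order By Time Desc) As rn,
           Lead(Value) Over (Partition By Event_Type Order By Time Desc) As Prev
    From events
)
Select Event_Type, Value - Prev As Value
From Cte
Where Prev Is Not Null
  And rn = 1
Order By Event_Type
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [(2, -5), (3, 4)]
```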
I would use row_number() and conditional aggregation:
select e.event_type,
sum(case when seqnum = 1 then value when seqnum = 2 then - value end) as diff
from (select e.*,
row_number() over (partition by e.event_type order by e.time desc) as seqnum
from events e
) e
group by e.event_type
having count(*) >= 2;
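This row_number() plus SUM answer can be verified in SQLite via Python's sqlite3 (SQLite 3.25+ assumed), using hypothetical rows. Rows beyond seqnum = 2 produce NULL inside the CASE, and SUM ignores NULLs, so only the latest value minus the second-latest remains.

```python
import sqlite3

# Hypothetical rows (event_type, value, time); the question's data was an image.
ROWS = [
    (2, 5, "2015-05-09 12:42:00"),
    (4, -42, "2015-05-09 13:19:57"),
    (2, 2, "2015-05-09 14:48:39"),
    (2, 7, "2015-05-09 13:54:39"),
    (3, 16, "2015-05-09 13:19:57"),
    (3, 20, "2015-05-09 15:01:09"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table events (event_type integer, value integer, time text)")
conn.executemany("insert into events values (?, ?, ?)", ROWS)

# +value for the newest row, -value for the second newest, NULL otherwise.
QUERY = """
select e.event_type,
       sum(case when seqnum = 1 then value when seqnum = 2 then -value end) as diff
from (select e.*,
             row_number() over (partition by e.event_type order by e.time desc) as seqnum
      from events e) e
group by e.event_type
having count(*) >= 2
order by e.event_type
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [(2, -5), (3, 4)]
```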