Returning 5 Most Recent Trips Per ID - sql

I have a table with the number of trips taken and a station_id, and I want to return the 5 most recent trips made per ID (a sample image of the table is below).
The query below aggregates the station IDs and the most recent trip, but I am having a difficult time returning the 5 most recent:
SELECT start_station_id, MAX(start_time)
FROM `bpd.shop.trips`
group by start_station_id, start_time
Trips:
https://imgur.com/Ebh9FeZ
Any help would be much appreciated, thanks!

You can use row_number():
SELECT t.*
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY start_station_id ORDER BY start_time DESC) as seqnum
FROM `bpd.shop.trips` t
) t
WHERE seqnum <= 5;
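To try the row_number() pattern without a BigQuery project, here is a minimal sketch that runs the same query shape against an in-memory SQLite database (window functions need SQLite 3.25 or later). The table name `trips`, the alias `ranked`, and the sample rows are assumptions made up for illustration, not the asker's data.

```python
import sqlite3

# Hypothetical sample data: 7 trips for station 1, 3 trips for station 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (start_station_id INTEGER, start_time TEXT)")
rows = [(1, f"2020-01-{day:02d} 08:00:00") for day in range(1, 8)]
rows += [(2, f"2020-02-{day:02d} 09:00:00") for day in range(1, 4)]
conn.executemany("INSERT INTO trips VALUES (?, ?)", rows)

# Number the trips of each station from newest to oldest, then keep the first 5.
query = """
SELECT start_station_id, start_time
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY start_station_id
                                ORDER BY start_time DESC) AS seqnum
      FROM trips t) AS ranked
WHERE seqnum <= 5
ORDER BY start_station_id, start_time DESC
"""
top5 = conn.execute(query).fetchall()
for row in top5:
    print(row)
```

Stations with fewer than 5 trips simply return all of their rows, since the filter only caps the per-station row number.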

Below is for BigQuery Standard SQL
Option 1
#standardSQL
SELECT record.*
FROM (
SELECT ARRAY_AGG(t ORDER BY start_time DESC LIMIT 5) arr
FROM `bpd.shop.trips` t
GROUP BY start_station_id
), UNNEST(arr) record
Option 2
#standardSQL
SELECT * EXCEPT (pos) FROM (
SELECT *, ROW_NUMBER() OVER(win) AS pos
FROM `bpd.shop.trips`
WINDOW win AS (PARTITION BY start_station_id ORDER BY start_time DESC)
)
WHERE pos <= 5
I recommend Option 1, as it is the more scalable option.

Related

How to return max date per month for user

I have the following table:
I would like to return the maximum threshold date in each month for every user, so my final result should look like this:
I wanted to use the analytic function ROW_NUMBER and keep one row per group, but how do I do it per month for each user? Is there a simpler way to do it in BigQuery?
You can partition the row_number by the user and the month, and then take the first one for each:
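SELECT user_id, threshold_date, net_deposits_usd
FROM (SELECT user_id, threshold_date, net_deposits_usd,
-- Note: EXTRACT(MONTH ...) alone puts the same month of different years in one partition
ROW_NUMBER () OVER (PARTITION BY user_id, EXTRACT (MONTH FROM threshold_date)
-- Order by date so that rk = 1 is the latest threshold date in the month
ORDER BY threshold_date DESC, net_deposits_usd DESC) AS rk
FROM mytable)
WHERE rk = 1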
BigQuery now supports qualify, which does everything you want. For the month, just use date_trunc():
select t.*
from t
qualify row_number() over (partition by user_id, date_trunc(threshold_date, month)
order by threshold_date desc, net_deposits_usd desc
) = 1;
A simple alternative uses arrays and group by:
select array_agg(t order by threshold_date desc, net_deposits_usd desc limit 1)[ordinal(1)].*
from t
group by user_id, date_trunc(threshold_date, month) ;
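QUALIFY and DATE_TRUNC as used above are BigQuery syntax, so as a rough, portable sketch the same "latest row per user per month" idea is shown here against SQLite, with a subquery filter in place of QUALIFY and strftime('%Y-%m', ...) standing in for date_trunc(..., month). The sample rows and the alias `ranked` are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (user_id INTEGER, threshold_date TEXT, net_deposits_usd REAL)"
)
# Hypothetical rows; user 2 has the same calendar month in two different years.
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "2020-01-05", 100.0),
    (1, "2020-01-20", 250.0),
    (1, "2020-02-03", 300.0),
    (2, "2020-01-11", 50.0),
    (2, "2021-01-09", 75.0),
])

# Partition by user and year-month, rank newest first, keep rank 1.
query = """
SELECT user_id, threshold_date, net_deposits_usd
FROM (SELECT t.*,
             ROW_NUMBER() OVER (
                 PARTITION BY user_id, strftime('%Y-%m', threshold_date)
                 ORDER BY threshold_date DESC, net_deposits_usd DESC) AS rk
      FROM t) AS ranked
WHERE rk = 1
ORDER BY user_id, threshold_date
"""
latest_per_month = conn.execute(query).fetchall()
for row in latest_per_month:
    print(row)
```

Because the partition key includes the year, January 2020 and January 2021 for user 2 are kept as separate months.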

Difference between last and second last event in a table of events

I have the following table
which was created with:
create table events (
event_type integer not null,
value integer not null,
time timestamp not null,
unique (event_type, time)
);
Given the data in the picture, I want to write a query that, for each event_type that has been registered more than once, returns the difference between the latest and the second latest value.
Given the above data, the output should be:
event_type value
2 -5
3 4
I solved it using the following :
CREATE VIEW [max_date] AS
SELECT event_type, max(time) as time, value
FROM events
group by event_type
having count(event_type) >1
order by time desc;
select event_type, value
from
(
select event_type, value, max(time)
from(
Select E1.event_type, ([max_date].value - E1.value) as value, E1.time
From events E1, [max_date]
Where [max_date].event_type = E1.event_type
and [max_date].time > E1.time
)
group by event_type
)
This seems like a very complicated query, though. Is there an easier way?
Use window functions:
select e.*,
(value - prev_value)
from (select e.*,
lag(value) over (partition by event_type order by time) as prev_value,
row_number() over (partition by event_type order by time desc) as seqnum
from events e
) e
where seqnum = 1 and prev_value is not null;
You could use lag() and row_number()
select event_type, val
from (
select
event_type,
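-- lag() has to look back in ascending time order; with "order by time desc" the latest row has no previous row
value - lag(value) over(partition by event_type order by time) val,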
row_number() over(partition by event_type order by time desc) rn
from events
) t
where rn = 1 and val is not null
The inner query ranks records having the same event_type by descending time, and computes the difference between each value and the previous one.
Then, the outer query just filters on the top record per group.
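As a quick check of the lag() plus row_number() idea, the sketch below runs it on SQLite against a small events table. The rows are hypothetical (the original picture is not available); they were chosen so that the result matches the expected output stated in the question. The alias `ranked` is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE events (
    event_type INTEGER NOT NULL,
    value      INTEGER NOT NULL,
    time       TEXT    NOT NULL,
    UNIQUE (event_type, time)
)""")
# Hypothetical data: type 1 occurs once, types 2 and 3 occur more than once.
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (1, 9, "2020-01-01 10:00:00"),
    (2, 7, "2020-01-01 10:00:00"),
    (2, 2, "2020-01-02 10:00:00"),
    (3, 1, "2020-01-01 10:00:00"),
    (3, 3, "2020-01-02 10:00:00"),
    (3, 7, "2020-01-03 10:00:00"),
])

# prev_value looks one row back in ascending time; seqnum = 1 marks the latest row.
query = """
SELECT event_type, value - prev_value AS value
FROM (SELECT e.*,
             LAG(value) OVER (PARTITION BY event_type ORDER BY time) AS prev_value,
             ROW_NUMBER() OVER (PARTITION BY event_type ORDER BY time DESC) AS seqnum
      FROM events e) AS ranked
WHERE seqnum = 1 AND prev_value IS NOT NULL
ORDER BY event_type
"""
diffs = conn.execute(query).fetchall()
print(diffs)
```

Event type 1 drops out because its only row has no previous value, which is what the "registered more than once" requirement asks for.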
Here is a way to do this using a combination of analytic functions and aggregation. This approach also works if your database supports ROW_NUMBER but not LEAD and LAG.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY event_type ORDER BY time DESC) AS rn
FROM events
)
SELECT
event_type,
MAX(CASE WHEN rn = 1 THEN value END) - MAX(CASE WHEN rn = 2 THEN value END) AS value
FROM cte
GROUP BY
event_type
HAVING
COUNT(*) > 1;

count consecutive record with timestamp interval requirement

Referring to this post (link), I used the answer provided by @Gordon Linoff:
select taxi, count(*)
from (select t.taxi, t.client, count(*) as num_times
from (select t.*,
row_number() over (partition by taxi order by time) as seqnum,
row_number() over (partition by taxi, client order by time) as seqnum_c
from t
) t
group by t.taxi, t.client, (seqnum - seqnum_c)
having count(*) >= 2
)
group by taxi;
and got my answer perfectly like this:
Tom 3 (AA count as 1, AAA count as 1 and BB count as 1, so total of 3 count)
Bob 1
But now I would like to add one more condition: the time between two consecutive clients for the same taxi should not be longer than 2 hours.
I know that I should probably use row_number() again and calculate the time difference with datediff, but I have no idea where to add it or how.
Any suggestions?
This requires a bit more logic. In this case, I would use lag() to calculate the groups:
select taxi, count(*)
from (select t.taxi, t.client, count(*) as num_times
from (select t.*,
sum(case when prev_client = client and
prev_time > time - interval '2 hour'
then 0 -- same client within 2 hours: stay in the current group
else 1 -- otherwise a new group starts here
end) over (partition by taxi order by time) as grp
from (select t.*,
lag(client) over (partition by taxi order by time) as prev_client,
lag(time) over (partition by taxi order by time) as prev_time
from t
) t
) t
group by t.taxi, t.client, grp
having count(*) >= 2
) t
group by taxi;
Note: You don't specify the database, so this uses ISO/ANSI standard syntax for date/time comparisons. You can adjust this for your actual database.
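The grouping trick can be checked on a small example. The sketch below is one possible reading of the requirement, run on SQLite: a running sum of "new group starts here" flags per taxi gives each run of the same client its own group id, and a gap of more than 2 hours also starts a new group. To avoid database-specific interval syntax, `time` is stored as minutes since an arbitrary start; that encoding, the sample rows, and the aliases `s`, `g`, and `islands` are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (taxi TEXT, client TEXT, time INTEGER)")  # time in minutes
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    # Tom: A,A | 3-hour gap | A,A | B,B  -> three runs of length >= 2
    ("Tom", "A", 0), ("Tom", "A", 60), ("Tom", "A", 240),
    ("Tom", "A", 300), ("Tom", "B", 360), ("Tom", "B", 420),
    # Bob: C | 3-hour gap | C | D,D      -> one run of length >= 2
    ("Bob", "C", 0), ("Bob", "C", 180), ("Bob", "D", 240), ("Bob", "D", 300),
])

query = """
SELECT taxi, COUNT(*) AS runs
FROM (SELECT taxi, client, grp, COUNT(*) AS num_times
      FROM (SELECT s.*,
                   -- flag = 1 whenever a new run starts; the running sum is the run id
                   SUM(CASE WHEN prev_client = client
                             AND time - prev_time <= 120
                            THEN 0 ELSE 1 END)
                       OVER (PARTITION BY taxi ORDER BY time) AS grp
            FROM (SELECT t.*,
                         LAG(client) OVER (PARTITION BY taxi ORDER BY time) AS prev_client,
                         LAG(time)   OVER (PARTITION BY taxi ORDER BY time) AS prev_time
                  FROM t) AS s) AS g
      GROUP BY taxi, client, grp
      HAVING COUNT(*) >= 2) AS islands
GROUP BY taxi
ORDER BY taxi
"""
counts = conn.execute(query).fetchall()
print(counts)
```

The first row of each taxi has no previous client, so the comparison is NULL and the CASE falls through to the ELSE branch, which correctly opens the first group.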

Subtraction of values depending on time SQL

For each EVENT_TYPE that is repeated more than once,
I need a SQL statement that returns the event_type and the difference between the last value registered for this event_type and the second-to-last value. I appreciate your help.
You can use LEAD() (or LAG() if you prefer) to get the next record in the series, and calculate the difference only when there is another record and only taking the latest Time per Event_Type:
With Cte As
(
Select *,
Row_Number() Over (Partition By Event_Type Order By Time Desc) As Row_Number,
Lead(Value) Over (Partition By Event_Type Order By Time Desc) As Prev
From YourTable
)
Select Event_Type, Value - Prev As Value
From Cte
Where Prev Is Not Null
And Row_Number = 1
I would use row_number() and conditional aggregation:
select e.event_type,
sum(case when seqnum = 1 then value when seqnum = 2 then - value end) as diff
from (select e.*,
row_number() over (partition by e.event_type order by e.time desc) as seqnum
from events e
) e
group by e.event_type
having count(*) >= 2;

Retrieve recent 5 days forecast for each cities with latest issue date

I need to retrieve the most recent 5 days of forecast info for each city.
My table looks like this:
The real problem is with the issue date: a city may have several forecasts for the same date, each with a distinct issue date.
I need to retrieve the 5 most recent records for each city, one per forecast date, taking the latest issue date for each.
I have tried something like the query below, but it does not give the expected result:
SELECT * FROM(
SELECT
ROW_NUMBER () OVER (PARTITION BY CITY_ID ORDER BY FORECAST_DATE DESC, ISSUE_DATE DESC) AS rn,
CITY_ID, FORECAST_DATE, ISSUE_DATE
FROM
FORECAST
GROUP BY FORECAST_DATE
) WHERE rn <= 5
Any suggestion or advice will be helpful
This will get the latest issued forecast per day over the most recent 5 days for each city:
SELECT *
FROM (
SELECT f.*,
DENSE_RANK() OVER ( PARTITION BY city_id ORDER BY forecast_date DESC )
AS forecast_rank,
ROW_NUMBER() OVER ( PARTITION BY city_id, forecast_date ORDER BY issue_date DESC )
AS issue_rn
FROM Forecast f
)
WHERE forecast_rank <= 5
AND issue_rn = 1;
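A small sketch of the DENSE_RANK plus ROW_NUMBER combination, run on SQLite, may make the two roles clearer: DENSE_RANK numbers the distinct forecast dates per city, while ROW_NUMBER picks the latest issue within each date. The sample rows and the alias `ranked` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE forecast (city_id INTEGER, forecast_date TEXT, issue_date TEXT)"
)
rows = []
for day in range(1, 7):                          # city 1: six forecast dates
    for issue in ("2020-02-01", "2020-02-02"):   # each issued twice
        rows.append((1, f"2020-03-{day:02d}", issue))
rows.append((2, "2020-03-01", "2020-02-01"))     # city 2: a single forecast
conn.executemany("INSERT INTO forecast VALUES (?, ?, ?)", rows)

query = """
SELECT city_id, forecast_date, issue_date
FROM (SELECT f.*,
             DENSE_RANK() OVER (PARTITION BY city_id
                                ORDER BY forecast_date DESC) AS forecast_rank,
             ROW_NUMBER() OVER (PARTITION BY city_id, forecast_date
                                ORDER BY issue_date DESC) AS issue_rn
      FROM forecast f) AS ranked
WHERE forecast_rank <= 5
  AND issue_rn = 1
ORDER BY city_id, forecast_date DESC
"""
latest_forecasts = conn.execute(query).fetchall()
for row in latest_forecasts:
    print(row)
```

DENSE_RANK is used rather than ROW_NUMBER for the date ranking because duplicate forecast dates must share one rank; otherwise the repeated issues would use up the 5 slots.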
PARTITION BY works like GROUP BY, but it applies only to the window function.
Try:
with CTE as
(
select t1.*,
row_number() over (partition by city_id, forecast_date order by issue_date desc) as r_ord
from Forecast t1
)
select CTE.*
from CTE
where r_ord <= 5
Try this
SELECT * FROM(
SELECT
ROW_NUMBER () OVER (PARTITION BY CITY_ID, FORECAST_DATE order by ISSUE_DATE DESC) AS rn,
CITY_ID, FORECAST_DATE, ISSUE_DATE
FROM
FORECAST
) WHERE rn <= 5