Max difference between update timestamps - sql

I have a table:
id | updated_at
1 | 2018-10-22T21:00:00Z
2 | 2018-10-22T21:02:00Z
I'd like to find the largest delta for a given day between closest updated timestamps. For example, if there were 5 rows:
id | updated_at
1 | 2018-10-22T21:00:00Z
2 | 2018-10-22T21:02:00Z
3 | 2018-10-22T21:05:00Z
4 | 2018-10-22T21:06:00Z
5 | 2018-10-22T21:16:00Z
The largest delta is between 4 and 5 (10 minutes). Note that when comparing, I really just want to find the next updated_at timestamp for each row and then take the max of those differences. I feel like I'm messing up the subquery to do this.
with nearest_time(time_diff)
as
(
select datediff('minute', updated_at as u1, (select updated_at from table where updated_at > u1 limit 1) as u2)
group by updated_at::date
)
select max(select time_diff from nearest_time);

SELECT
lead(updated_at) OVER (ORDER BY updated_at) - updated_at as diff
FROM dates
ORDER BY diff DESC NULLS LAST
LIMIT 1;
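The window function LEAD gives you the value of the next row, which in this case is the next timestamp.
With that you can do a subtraction, sort the results in descending order, and take the first value. In PostgreSQL, NULLs sort first in descending order by default, so NULLS LAST keeps the last row's NULL difference (it has no next row) from coming out on top.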
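To try the LEAD approach without a PostgreSQL server, here is a minimal sketch that runs it on the question's five rows in SQLite through Python's sqlite3 module (an assumption; SQLite needs version 3.25 or later for window functions). SQLite has no timestamp subtraction, so the sketch converts to epoch seconds with strftime('%s', ...), and it filters out the NULL difference instead of relying on NULLS LAST. The table name dates and the function largest_gap are illustrative.

```python
import sqlite3

# Rows from the question; SQLite stores the timestamps as ISO-8601 text.
ROWS = [
    (1, "2018-10-22T21:00:00Z"),
    (2, "2018-10-22T21:02:00Z"),
    (3, "2018-10-22T21:05:00Z"),
    (4, "2018-10-22T21:06:00Z"),
    (5, "2018-10-22T21:16:00Z"),
]

QUERY = """
SELECT id, diff
FROM (
    SELECT id,
           -- strftime('%s', ...) yields epoch seconds; LEAD reads the next row's value
           (CAST(strftime('%s', LEAD(updated_at) OVER (ORDER BY updated_at)) AS INTEGER)
            - CAST(strftime('%s', updated_at) AS INTEGER)) / 60 AS diff
    FROM dates
) AS t
WHERE diff IS NOT NULL  -- the last row has no successor, so its diff is NULL
ORDER BY diff DESC
LIMIT 1
"""


def largest_gap(rows):
    """Return (id, gap_in_minutes) for the row followed by the largest gap."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dates (id INTEGER, updated_at TEXT)")
    conn.executemany("INSERT INTO dates VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    print(largest_gap(ROWS))
```

On the sample rows, largest_gap(ROWS) returns (4, 10): the 10-minute gap that follows row 4.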

Use lag to get the updated_at from the previous row and then get the max difference per day.
select dt_updated_at,max(time_diff)
from (select updated_at::date as dt_updated_at
,updated_at - lag(updated_at) over(partition by updated_at::date order by updated_at) as time_diff
from tbl
) t
group by dt_updated_at
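The per-day LAG query can be sketched the same way. This is an SQLite translation under stated assumptions: date(updated_at) stands in for updated_at::date, epoch seconds stand in for interval subtraction, and rows 6 and 7 on 2018-10-23 are made up purely to show the partitioning.

```python
import sqlite3

# Question rows plus two hypothetical rows on a second day.
ROWS = [
    (1, "2018-10-22T21:00:00Z"),
    (2, "2018-10-22T21:02:00Z"),
    (3, "2018-10-22T21:05:00Z"),
    (4, "2018-10-22T21:06:00Z"),
    (5, "2018-10-22T21:16:00Z"),
    (6, "2018-10-23T09:00:00Z"),
    (7, "2018-10-23T09:30:00Z"),
]

QUERY = """
SELECT dt_updated_at, MAX(time_diff) AS max_gap_minutes
FROM (
    SELECT date(updated_at) AS dt_updated_at,  -- SQLite stand-in for updated_at::date
           (CAST(strftime('%s', updated_at) AS INTEGER)
            - CAST(strftime('%s', LAG(updated_at) OVER (
                  PARTITION BY date(updated_at) ORDER BY updated_at)) AS INTEGER)) / 60
           AS time_diff
    FROM tbl
) AS t
GROUP BY dt_updated_at  -- MAX ignores the NULL diff of each day's first row
ORDER BY dt_updated_at
"""


def max_gap_per_day(rows):
    """Return [(day, largest gap in minutes), ...] ordered by day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER, updated_at TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(max_gap_per_day(ROWS))
```

Because LAG is partitioned by day, the gap from 21:16 on the 22nd to 09:00 on the 23rd is never computed.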
One more option, using DISTINCT ON (this only works in PostgreSQL; I'm keeping the answer because the question was originally tagged PostgreSQL):
select distinct on
(updated_at::date)
updated_at::date as dt_updated_at
,updated_at-lag(updated_at) over(partition by updated_at::date order by updated_at) as diff
from dates
order by updated_at::date,diff desc
nulls last

Related

SQL Distinct / GroupBy

Ok, I’m stuck on an SQL query and tried long enough that it’s time to ask for help :) I'm using Objection.js – but that's not super relevant as I really just can't figure out how to structure the SQL.
I have the following example data set:
Items
id | name
1 | Test 1
2 | Test 2
3 | Test 3
Listings
id | item_id | price | created_at
1 | 1 | 100 | 1654640000
2 | 1 | 60 | 1654640001
3 | 1 | 80 | 1654640002
4 | 2 | 90 | 1654640003
5 | 2 | 90 | 1654640004
6 | 3 | 50 | 1654640005
What I’m trying to do:
Return the lowest priced listing for each item
If more than one listing for an item has that lowest price (as with the two listings for item 2), I want to return the newest of them
Overall, I want to return the resulting items by price
I’m trying to write a query that returns the data:
id | item_id | name | price | created_at
6 | 3 | Test 3 | 50 | 1654640005
2 | 1 | Test 1 | 60 | 1654640001
5 | 2 | Test 2 | 90 | 1654640004
Any help would be greatly appreciated! I'm also starting fresh, so I can add new columns to the data if that would help at all :)
An example of where my query is right now:
select * from "listings"
inner join (select "item_id", MIN(price) as "min_price" from "listings" group by "item_id") as "grouped_listings"
on "listings"."item_id" = "grouped_listings"."item_id"
and "listings"."price" = "grouped_listings"."min_price"
where "listings"."sold_at" is null and "listings"."expires_at" > ?
order by CAST(price AS DECIMAL) ASC
limit ?;
This gets me listings – but if two listings have the same price, it returns multiple listings with the same item_id – not ideal.
Given the postgresql tag, this should work:
with listings_numbered as (
select *, row_number() over (
partition by item_id
order by price asc, created_at desc
) as rownum
from listings
)
select l.id, l.item_id, i.name, l.price, l.created_at
from listings_numbered l
join items i on l.item_id=i.id
where l.rownum=1
order by price asc;
This is a bit of an advanced query, using window functions and a common table expression, but we can break it down.
with listings_numbered as (...) select simply means to run the query inside of the ..., and then we can refer to the results of that query as listings_numbered inside of the select, as though it were a table.
We're selecting all of the columns in listings, plus one more:
row_number() over (partition by item_id order by price asc, created_at desc). partition by item_id means that we would like the row number to reset for each new item_id, and the order by specifies the ordering that the rows should get within each partition before we number them: first increasing by price, then decreasing by creation time to break ties.
The result of the CTE listings_numbered looks like:
id | item_id | price | created_at | rownum
2 | 1 | 60 | 1654640001 | 1
3 | 1 | 80 | 1654640002 | 2
1 | 1 | 100 | 1654640000 | 3
5 | 2 | 90 | 1654640004 | 1
4 | 2 | 90 | 1654640003 | 2
6 | 3 | 50 | 1654640005 | 1
If you look at only the rows where rownum (the last column) is 1, then you can see that it's exactly the set of listings that you're interested in.
The outer query then selects from this dataset, joins on items to get the name, filters to only the listings where rownum is 1, and sorts by price to get the final result:
id | item_id | name | price | created_at
6 | 3 | Test 3 | 50 | 1654640005
2 | 1 | Test 1 | 60 | 1654640001
5 | 2 | Test 2 | 90 | 1654640004
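A quick way to check the walkthrough is to run the same query on the sample data. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL (an assumption; SQLite 3.25+ accepts this SQL unchanged), and the helper names load_sample and cheapest_listings are hypothetical.

```python
import sqlite3

ITEMS = [(1, "Test 1"), (2, "Test 2"), (3, "Test 3")]
LISTINGS = [  # (id, item_id, price, created_at) from the question
    (1, 1, 100, 1654640000),
    (2, 1, 60, 1654640001),
    (3, 1, 80, 1654640002),
    (4, 2, 90, 1654640003),
    (5, 2, 90, 1654640004),
    (6, 3, 50, 1654640005),
]

QUERY = """
WITH listings_numbered AS (
    SELECT *, row_number() OVER (
        PARTITION BY item_id
        ORDER BY price ASC, created_at DESC  -- cheapest first, newest wins a tie
    ) AS rownum
    FROM listings
)
SELECT l.id, l.item_id, i.name, l.price, l.created_at
FROM listings_numbered l
JOIN items i ON l.item_id = i.id
WHERE l.rownum = 1
ORDER BY l.price ASC
"""


def load_sample():
    """Build an in-memory database holding the question's sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE listings (id INTEGER PRIMARY KEY, item_id INTEGER,"
        " price INTEGER, created_at INTEGER)"
    )
    conn.executemany("INSERT INTO items VALUES (?, ?)", ITEMS)
    conn.executemany("INSERT INTO listings VALUES (?, ?, ?, ?)", LISTINGS)
    return conn


def cheapest_listings():
    conn = load_sample()
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in cheapest_listings():
        print(row)
```

Because row_number() never assigns the same number twice within a partition, exactly one listing per item survives the rownum = 1 filter.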
Aggregate functions, such as the MIN function you used in your query, are a viable option, but if you want an efficient query for this problem, window functions can be your best friends. This class of functions lets you compute values over "windows" (partitions) of your table defined by the columns you specify.
For this problem I'm going to compute two values using window functions:
the minimum value of "listings.price", partitioning on "listings.item_id",
the maximum value of "listings.created_at", partitioning on "listings.item_id" and "listings.price"
SELECT *,
MIN(price) OVER(PARTITION BY item_id) AS min_price,
MAX(created_at) OVER(PARTITION BY item_id, price) AS max_created_at
FROM listings
Once each listing is associated with its item's minimum price and with the latest created_at among listings of the same item and price, you need to select the records whose
price equals the minimum price
created_at equals the most recent created_at
WITH cte AS (
SELECT *,
MIN(price) OVER(PARTITION BY item_id) AS min_price,
MAX(created_at) OVER(PARTITION BY item_id, price) AS max_created_at
FROM listings
)
SELECT id,
item_id,
price,
created_at
FROM cte
WHERE price = min_price
AND created_at = max_created_at
If you need to order by price, it's sufficient to add an ORDER BY price clause.
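Here is a minimal SQLite sketch of this MIN/MAX window approach, with the ORDER BY price clause added, run on the question's listings through Python's sqlite3 module (SQLite as a stand-in is an assumption; the function name is hypothetical).

```python
import sqlite3

LISTINGS = [  # (id, item_id, price, created_at) from the question
    (1, 1, 100, 1654640000),
    (2, 1, 60, 1654640001),
    (3, 1, 80, 1654640002),
    (4, 2, 90, 1654640003),
    (5, 2, 90, 1654640004),
    (6, 3, 50, 1654640005),
]

QUERY = """
WITH cte AS (
    SELECT *,
           MIN(price) OVER (PARTITION BY item_id) AS min_price,              -- per item
           MAX(created_at) OVER (PARTITION BY item_id, price) AS max_created_at  -- per item and price
    FROM listings
)
SELECT id, item_id, price, created_at
FROM cte
WHERE price = min_price
  AND created_at = max_created_at
ORDER BY price
"""


def cheapest_listings_by_window():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE listings (id INTEGER PRIMARY KEY, item_id INTEGER,"
        " price INTEGER, created_at INTEGER)"
    )
    conn.executemany("INSERT INTO listings VALUES (?, ?, ?, ?)", LISTINGS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in cheapest_listings_by_window():
        print(row)
```

One design difference from the row_number() version: if two listings of an item tie on both price and created_at, this query returns both of them.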

Get highest value from column in where clause

I have a table:
id | date | day
2 | 21-07 | 1
2 | 10-07 | 3
2 | 11-07 | 2
I need to get the date that has the highest day; in the current example it should be 10-07.
The following query works fine, but is there a way to make it more optimised?
SELECT date
FROM (SELECT date, MAX(day)
FROM some_table
WHERE id = 2
GROUP BY date
LIMIT 1) x
I would not use a subquery or aggregation here. Just use:
select date
from some_table
where id = 2
order by day desc
limit 1;
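For completeness, a minimal sketch of the ORDER BY ... LIMIT 1 query on the question's three rows, using SQLite through Python's sqlite3 module (an assumption). The column name date is quoted as an identifier, and the function name is illustrative.

```python
import sqlite3

ROWS = [(2, "21-07", 1), (2, "10-07", 3), (2, "11-07", 2)]  # (id, date, day)


def date_with_highest_day(rows, wanted_id):
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE some_table (id INTEGER, "date" TEXT, day INTEGER)')
    conn.executemany("INSERT INTO some_table VALUES (?, ?, ?)", rows)
    # Sort this id's rows by day, highest first, and keep only the first one.
    row = conn.execute(
        'SELECT "date" FROM some_table WHERE id = ? ORDER BY day DESC LIMIT 1',
        (wanted_id,),
    ).fetchone()
    conn.close()
    return row[0] if row else None


if __name__ == "__main__":
    print(date_with_highest_day(ROWS, 2))
```

No grouping is needed: sorting and taking the first row directly answers "which date has the highest day".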

dense_rank in sql partition by id and session id but ordered by timestamp

I have a table as follows:
User ID | Session ID | Timestamp
100 | 7e938c4437a0 | 1:30:30
100 | 7e938c4437a0 | 1:30:33
100 | c1fcfd8b1a25 | 2:40:00
100 | 7b5e86d91103 | 3:20:00
200 | bda6c8743671 | 2:20:00
200 | bda6c8743671 | 2:25:00
200 | aac5d66421a0 | 3:10:00
200 | aac5d66421a0 | 3:11:00
I am trying to rank each session_id per user_id, ordered by timestamp. I want something like the second table below.
I am doing the following, but it does not order by timestamp:
dense_rank() over (partition by user_id order by session_id) as visit_number
It outputs the wrong order, and when I add timestamp to the order by it behaves like the row_number() function.
Below is what I am really looking for to get as a result:
User ID | Session ID | Timestamp | Rank
100 | 7e938c4437a0 | 1:30:30 | 1
100 | 7e938c4437a0 | 1:30:33 | 1
100 | c1fcfd8b1a25 | 2:40:00 | 2
100 | 7b5e86d91103 | 3:20:00 | 3
200 | bda6c8743671 | 2:20:00 | 1
200 | bda6c8743671 | 2:25:00 | 1
200 | aac5d66421a0 | 3:10:00 | 2
200 | aac5d66421a0 | 3:11:00 | 2
If you want to dense rank by the hour component of the timestamp, you can extract the hour. This should give the results you specify. In standard SQL, this looks like:
dense_rank() over (partition by user_id order by extract(hour from timestamp)) as visit_number
Of course, date/time functions are highly database dependent, so your database might have a different function for extracting the hour.
I wanted to do something similar, and since I found the answer I thought I would post it here. This is what I learned you can do:
SELECT user_id, session_id, session_timestamp,
-- This ranks the records by min_dt, which is the same for every row of a given user_id/session_id pair
DENSE_RANK() OVER (PARTITION BY tbl.user_id ORDER BY tbl.min_dt) AS rank
FROM (
SELECT user_id, session_id, session_timestamp,
-- We want to get the MIN or MAX session_timestamp but only for each group. This allows us to keep the ordering by timestamp, but still group by user_id and session_id.
MIN(session_timestamp) OVER (PARTITION BY user_id, session_id) AS min_dt
FROM sessions) tbl
ORDER BY user_id, rank, session_timestamp
These results look the same as the ones asked for.
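The MIN-then-DENSE_RANK idea can be checked against the question's data with a small SQLite sketch via Python's sqlite3 module. Assumptions: the timestamps are stored as zero-padded HH:MM:SS text so that text order matches time order, the table is called sessions, and the alias visit_number replaces rank.

```python
import sqlite3

SESSIONS = [  # (user_id, session_id, session_timestamp), zero-padded on purpose
    (100, "7e938c4437a0", "01:30:30"),
    (100, "7e938c4437a0", "01:30:33"),
    (100, "c1fcfd8b1a25", "02:40:00"),
    (100, "7b5e86d91103", "03:20:00"),
    (200, "bda6c8743671", "02:20:00"),
    (200, "bda6c8743671", "02:25:00"),
    (200, "aac5d66421a0", "03:10:00"),
    (200, "aac5d66421a0", "03:11:00"),
]

QUERY = """
SELECT user_id, session_id, session_timestamp,
       -- every row of a session shares min_dt, so the whole session gets one rank
       DENSE_RANK() OVER (PARTITION BY user_id ORDER BY min_dt) AS visit_number
FROM (
    SELECT user_id, session_id, session_timestamp,
           MIN(session_timestamp) OVER (PARTITION BY user_id, session_id) AS min_dt
    FROM sessions
) AS tbl
ORDER BY user_id, visit_number, session_timestamp
"""


def visit_numbers():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sessions (user_id INTEGER, session_id TEXT, session_timestamp TEXT)"
    )
    conn.executemany("INSERT INTO sessions VALUES (?, ?, ?)", SESSIONS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in visit_numbers():
        print(row)
```

Ranking by each session's first timestamp, rather than by session_id or by the row's own timestamp, is what gives one number per session in chronological order.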

SQL: How many rows have the largest value for a column

I am sure this has a very simple answer, though I have not turned anything up, mostly because I am probably phrasing the question wrong.
Anyway, let's say I have this very simple table:
Table: election_candidates
id | candidate_id | election_id | votes
---------------------------------------
1 | 2 | 1 | 3
2 | 5 | 1 | 3
3 | 3 | 1 | 2
I need to know if two candidates are tied, that is, whether more than one candidate has the most votes in an election.
I know I can use the MAX function to get the largest value for an election, but is there an easy query to get how many candidates have the MAX for a given election?
I'm using PHP and the Codeigniter framework, though just a general example of a query that could work is just fine.
Most databases support ANSI-standard window functions. One way to do this is using rank():
select ec.election_id, count(*) as NumTies
from (select ec.*, rank() over (partition by election_id order by votes desc) as seqnum
from election_candidates ec
) ec
where seqnum = 1
group by ec.election_id;
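Here is a minimal SQLite sketch of the tie count (via Python's sqlite3 module, an assumption), using rank() without arguments. The three rows for election 1 come from the question; election 2 is invented to show a race without a tie.

```python
import sqlite3

CANDIDATES = [  # (id, candidate_id, election_id, votes)
    (1, 2, 1, 3),
    (2, 5, 1, 3),
    (3, 3, 1, 2),
    (4, 2, 2, 5),  # hypothetical election 2
    (5, 3, 2, 1),
]

QUERY = """
SELECT ec.election_id, COUNT(*) AS NumTies
FROM (SELECT ec.*,
             -- everyone sharing the top vote count gets rank 1
             rank() OVER (PARTITION BY election_id ORDER BY votes DESC) AS seqnum
      FROM election_candidates ec
     ) ec
WHERE seqnum = 1
GROUP BY ec.election_id
ORDER BY ec.election_id
"""


def count_ties():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE election_candidates "
        "(id INTEGER, candidate_id INTEGER, election_id INTEGER, votes INTEGER)"
    )
    conn.executemany("INSERT INTO election_candidates VALUES (?, ?, ?, ?)", CANDIDATES)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(count_ties())
```

A NumTies value greater than 1 means the election is tied; rank() works here because, unlike row_number(), it gives equal values the same rank.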
Couldn't you just do something like:
select e.*
from election_candidates e
inner join (
select election_id, max(votes) as maxVotes
from election_candidates
group by election_id
) maxVotesPerElectionId on e.election_Id = maxVotesPerElectionId.election_id
and e.votes = maxVotesPerElectionId.maxVotes
This should get you the candidates (per election) with the max votes.
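The join approach can be sketched the same way on the question's rows (SQLite through Python's sqlite3 module is an assumption, and top_candidates is a hypothetical name). If the query returns more than one row for an election, those candidates are tied.

```python
import sqlite3

CANDIDATES = [(1, 2, 1, 3), (2, 5, 1, 3), (3, 3, 1, 2)]  # from the question

QUERY = """
SELECT e.*
FROM election_candidates e
INNER JOIN (
    SELECT election_id, MAX(votes) AS maxVotes  -- top vote count per election
    FROM election_candidates
    GROUP BY election_id
) maxVotesPerElectionId
    ON e.election_id = maxVotesPerElectionId.election_id
   AND e.votes = maxVotesPerElectionId.maxVotes
ORDER BY e.id
"""


def top_candidates():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE election_candidates "
        "(id INTEGER, candidate_id INTEGER, election_id INTEGER, votes INTEGER)"
    )
    conn.executemany("INSERT INTO election_candidates VALUES (?, ?, ?, ?)", CANDIDATES)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(top_candidates())
```

This version needs no window functions, so it also runs on databases that lack them.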
Just the winner:
SELECT *
from election_candidates
ORDER BY votes DESC
LIMIT 0,1
This lists all elections together: rank() orders the candidates in each election by votes cast, so the list follows the order of placement.
All candidates are listed, showing how they did in each election.
DECLARE @T AS TABLE (id INT,candidate_id INT,election_id INT,votes INT)
INSERT INTO @T VALUES
(1 ,2,1,3),(2 ,5,1,3),(3 ,3,1,2),(4 ,2,2,3),(5 ,5,3,1),(6 ,6,1,4),(7 ,2,3,3),(8 ,1,4,3),
(9 ,1,5,2),(10,4,5,3),(11,5,5,3),(12,6,5,4)
SELECT
election_id,
votes,
RANK() OVER (PARTITION BY election_id ORDER BY votes DESC) AS RANKING,
candidate_id
FROM @T
ORDER BY election_id,
RANKING

Multiple filters on SQL query

I have been reading many topics about filtering SQL queries, but none seems to apply to my case, so I'm in need of a bit of help. I have the following data on a SQL table.
Date | item | quantity moved | quantity in stock | sequence
13-03-2012 16:51:00 | xpto | 2 | 2 | 1
13-03-2012 16:51:00 | xpto | -2 | 0 | 2
21-03-2012 15:31:21 | zyx | 4 | 6 | 1
21-03-2012 16:20:11 | zyx | 6 | 12 | 2
22-03-2012 12:51:12 | zyx | -3 | 9 | 1
These are quantities moved in the warehouse. The problem is the first two rows, which were a reception and a return at the same time. I'm trying to write a query that gives me the stock of all items at a given time, but when I use max(date) I don't get the right quantity in the result.
SELECT item, qty_in_stock
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY item ORDER BY item_date DESC, sequence DESC) rn
FROM mytable
WHERE item_date <= @date_of_stock
) q
WHERE rn = 1
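A minimal SQLite sketch of this ROW_NUMBER approach on the question's warehouse data, run through Python's sqlite3 module. Assumptions: dates are rewritten to ISO format (YYYY-MM-DD HH:MM:SS) so that text comparison matches chronological order, the columns use the answer's names (item_date, qty_in_stock), and the cut-off is a query parameter in place of the date_of_stock variable.

```python
import sqlite3

MOVES = [  # (item_date, item, qty_moved, qty_in_stock, sequence)
    ("2012-03-13 16:51:00", "xpto", 2, 2, 1),
    ("2012-03-13 16:51:00", "xpto", -2, 0, 2),
    ("2012-03-21 15:31:21", "zyx", 4, 6, 1),
    ("2012-03-21 16:20:11", "zyx", 6, 12, 2),
    ("2012-03-22 12:51:12", "zyx", -3, 9, 1),
]

QUERY = """
SELECT item, qty_in_stock
FROM (
    SELECT *,
           -- newest movement first; sequence breaks ties between same-time rows
           ROW_NUMBER() OVER (PARTITION BY item ORDER BY item_date DESC, sequence DESC) AS rn
    FROM mytable
    WHERE item_date <= ?
) AS q
WHERE rn = 1
ORDER BY item
"""


def stock_at(as_of):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (item_date TEXT, item TEXT, qty_moved INTEGER,"
        " qty_in_stock INTEGER, sequence INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", MOVES)
    rows = conn.execute(QUERY, (as_of,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(stock_at("2012-03-21 23:59:59"))
```

stock_at("2012-03-21 23:59:59") returns 12 for zyx, and a later cut-off such as "2012-03-31 00:00:00" returns 9. Ordering by sequence as the tie-breaker is what resolves the two same-second xpto rows to 0.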
If you are on SQL Server 2012, there are several nice new features.
You can use the LAST_VALUE() (or FIRST_VALUE()) function in combination with a ROWS or RANGE window frame (see the OVER clause):
SELECT DISTINCT
item,
LAST_VALUE(quantity_in_stock) OVER (PARTITION BY item
ORDER BY date, sequence
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
AS quantity_in_stock
FROM tableX
WHERE date <= @date_of_stock
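SQLite (3.25 or later) also supports LAST_VALUE with an explicit ROWS frame, so the SQL Server query can be sketched the same way through Python's sqlite3 module. Assumptions: the date column is renamed item_date, dates are ISO text, and the cut-off is a query parameter. The full frame matters: the default frame stops at the current row and its peers, so LAST_VALUE would not see the later rows.

```python
import sqlite3

MOVES = [  # (item_date, item, quantity_moved, quantity_in_stock, sequence)
    ("2012-03-13 16:51:00", "xpto", 2, 2, 1),
    ("2012-03-13 16:51:00", "xpto", -2, 0, 2),
    ("2012-03-21 15:31:21", "zyx", 4, 6, 1),
    ("2012-03-21 16:20:11", "zyx", 6, 12, 2),
    ("2012-03-22 12:51:12", "zyx", -3, 9, 1),
]

QUERY = """
SELECT DISTINCT
    item,
    -- the frame spans the whole partition, so every row sees the latest movement
    LAST_VALUE(quantity_in_stock) OVER (PARTITION BY item
                                        ORDER BY item_date, sequence
                                        ROWS BETWEEN UNBOUNDED PRECEDING
                                                 AND UNBOUNDED FOLLOWING)
        AS quantity_in_stock
FROM tableX
WHERE item_date <= ?
ORDER BY item
"""


def stock_at_last_value(as_of):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tableX (item_date TEXT, item TEXT, quantity_moved INTEGER,"
        " quantity_in_stock INTEGER, sequence INTEGER)"
    )
    conn.executemany("INSERT INTO tableX VALUES (?, ?, ?, ?, ?)", MOVES)
    rows = conn.execute(QUERY, (as_of,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(stock_at_last_value("2012-03-31 00:00:00"))
```

SELECT DISTINCT then collapses the identical per-row results into one row per item.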
Add a where clause and do the summation:
select item, sum([quantity moved])
from t
where t.date <= @DESIREDDATETIME
group by item
If you pass a date without a time as the desired datetime, remember that it means midnight at the start of that day.
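A minimal SQLite sketch of the summation answer, with WHERE placed before GROUP BY, run through Python's sqlite3 module. Assumptions: quantity moved is renamed quantity_moved, dates are ISO text, and the cut-off is a query parameter. A running sum reproduces the quantity in stock column only if an item's history starts from zero; in the sample, zyx shows 6 in stock after moving 4, so the sum for zyx comes out as 7 rather than the 9 in the table.

```python
import sqlite3

MOVES = [  # (item_date, item, quantity_moved)
    ("2012-03-13 16:51:00", "xpto", 2),
    ("2012-03-13 16:51:00", "xpto", -2),
    ("2012-03-21 15:31:21", "zyx", 4),
    ("2012-03-21 16:20:11", "zyx", 6),
    ("2012-03-22 12:51:12", "zyx", -3),
]

QUERY = """
SELECT item, SUM(quantity_moved)
FROM t
WHERE item_date <= ?   -- filter rows first ...
GROUP BY item          -- ... then total the movements per item
ORDER BY item
"""


def total_moved(as_of):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (item_date TEXT, item TEXT, quantity_moved INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", MOVES)
    rows = conn.execute(QUERY, (as_of,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(total_moved("2012-03-31 00:00:00"))
```

total_moved("2012-03-21") leaves out every movement on 21 March: SQLite compares these values as text, so the bare date sorts before any time on that day, much like the midnight caveat.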