SQL - Datediff between rows with Rank Applied - sql

I am trying to work out how to apply a DATEDIFF between rows where a rank is applied per UserID.
An example of the data is below:
UserID  Order Number  ScanDateStart  ScanDateEnd  Minute Difference  Rank | Minute Difference Rank vs Rank+1
User1   10-24         10:20:00       10:40:00     20                 1    | 5
User1   10-25         10:45:00       10:50:00     5                  2    | 33
User1   10-26         11:12:00       11:45:00     33                 3    | NULL
User2   10-10         00:09:00       00:09:20     20                 1    | 4
User2   10-11         00:09:24       00:09:25     1                  2    | 15
User2   10-12         00:09:40       00:10:12     32                 3    | 3
User2   10-13         00:10:15       00:10:35     20                 4    | NULL
What I'm looking for is how to code the final column of this table.
The rank is applied per UserID, ordered by ScanDateStart.
Basically, I want to know the time between the ScanDateEnd of rank 1 and the ScanDateStart of rank 2, and so on, for each user (i.e. the time between processing one order and the next).
I appreciate the help.

This can be achieved with a LEFT JOIN of the table to itself, matching on the UserID column and on the Rank column plus 1.
The following (simplified) pseudo-code illustrates the idea:
SELECT R.UserID,
       R.Rank,
       R1.Diff
FROM Rank R
LEFT JOIN Rank R1
       ON R1.UserID = R.UserID
      AND R1.Rank = R.Rank + 1
Effectively, you are showing the UserID and Rank from the current row, but the Difference from the row of the same UserID with the Rank + 1.
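To make the join concrete, here is a minimal sketch that runs it in SQLite from Python on User2's sample rows. The table name scans, the column names, and the calendar date 2020-01-01 are assumptions made up for the example (the question only shows times). Instead of the pseudo-code's R1.Diff, it computes the gap explicitly, from the current row's end to the next row's start, in seconds.

```python
import sqlite3

# Hypothetical table and column names; the date part is invented because the
# question only lists times of day.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scans (user_id TEXT, rnk INTEGER, scan_start TEXT, scan_end TEXT)"
)
conn.executemany(
    "INSERT INTO scans VALUES (?, ?, ?, ?)",
    [
        ("User2", 1, "2020-01-01 00:09:00", "2020-01-01 00:09:20"),
        ("User2", 2, "2020-01-01 00:09:24", "2020-01-01 00:09:25"),
        ("User2", 3, "2020-01-01 00:09:40", "2020-01-01 00:10:12"),
        ("User2", 4, "2020-01-01 00:10:15", "2020-01-01 00:10:35"),
    ],
)

gap_rows = conn.execute(
    """
    SELECT r.user_id,
           r.rnk,
           -- next row's start minus this row's end, in seconds
           CAST(strftime('%s', r1.scan_start) AS INTEGER)
             - CAST(strftime('%s', r.scan_end) AS INTEGER) AS gap_seconds
    FROM scans r
    LEFT JOIN scans r1
           ON r1.user_id = r.user_id
          AND r1.rnk = r.rnk + 1
    ORDER BY r.user_id, r.rnk
    """
).fetchall()

for row in gap_rows:
    print(row)
```

The last rank of each user has no matching R1 row, so the LEFT JOIN leaves its gap as NULL (None in Python). On a server database the subtraction would be replaced by that dialect's own date-difference function.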

Related

LeetCode 534. Game Play Analysis III

The following is the leetcode question
Table: Activity
Column Name    Type
player_id      int
device_id      int
event_date     date
games_played   int
(player_id, event_date) is the primary key of this table.
This table shows the activity of players of some games.
Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on some day using some device.
Write an SQL query to report for each player and date, how many games played so far by the player. That is, the total number of games played by the player until that date. Check the example for clarity.
Return the result table in any order.
The query result format is in the following example.
Player_Id  Device_ID  Event_Date  Games_Played
1          2          2016-03-01  5
1          2          2016-05-02  6
1          3          2017-06-25  1
3          1          2016-03-02  0
3          4          2018-07-03  5
I have solved it using a window function. My code is below:
select player_id,
       event_date,
       sum(games_played) over (partition by player_id order by event_date) as games_played_so_far
from activity
and the output:
player_id  event_date  games_played_So_Far
1          2016-03-01  5
1          2016-05-02  11
1          2017-06-25  12
3          2016-03-02  0
3          2018-07-03  5
But when solving it with a JOIN, I don't understand why we need to sum over a2.games_played and not a1.games_played. The code is below:
SELECT a1.player_id, a1.event_date ,SUM(a2.games_played) AS games_played_so_far
FROM activity a1, activity a2
WHERE a1.player_id = a2.player_id
AND a1.event_date >=a2.event_date
GROUP BY a1.player_id, a1.event_date
ORDER BY a1.player_id, a1.event_date;
I also wrote the following query and got the result below. In the output, a1_Played looks aligned with each date, while a2_Played contains only the values 0 and 5. I can't make out why we are then summing over a2_Played.
SELECT a1.player_id, a1.event_date as a1_Date, a2.event_date as a2_Date,a1.games_played as a1played,a2.games_played as a2played,
SUM(a1.games_played) AS sum_a1,SUM(a2.games_played) AS sum_a2
FROM activity a1, activity a2
WHERE a1.player_id = a2.player_id
AND a1.event_date >=a2.event_date
GROUP BY a1.player_id, a1.event_date
ORDER BY a1.player_id, a1.event_date;
Player_Id  a1_Date     a2_Date     a1_Played  a2_Played  sum_a1_Played  Sum_a2_Played
1          2016-03-01  2016-03-01  5          5          5              5
1          2016-05-02  2016-03-01  6          5          12             11
1          2017-06-25  2016-03-01  1          5          3              12
3          2016-03-02  2016-03-02  0          0          0              0
3          2018-07-03  2016-03-02  5          0          10             5
If you rewrote your query in your dad's SQL rather than your granddad's SQL, that is with explicit JOINs, the answer would leap out at you. (Sorry to be snarky, but I personally switched to explicit JOINs in 1994 and never looked back.)
SELECT a1.player_id, a1.event_date, SUM(a2.games_played) AS games_so_far
FROM activity a1
JOIN activity a2
  ON a1.player_id = a2.player_id
 AND a1.event_date >= a2.event_date
GROUP BY a1.player_id, a1.event_date
ORDER BY a1.player_id, a1.event_date
The >= inequality in the second part of the ON clause does the trick. Each successive event_date from a1 joins to an ever-increasing number of rows. Then they get summed up.
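To see the "ever-increasing number of rows" directly, here is a small sketch using Python's sqlite3 on the sample data (device_id is left out because the query does not use it, and player 1's second date is taken as 2016-05-02, as in the outputs above). It first lists the joined pairs before grouping, then runs the grouped query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity (player_id INTEGER, event_date TEXT, games_played INTEGER)"
)
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?)",
    [
        (1, "2016-03-01", 5),
        (1, "2016-05-02", 6),
        (1, "2017-06-25", 1),
        (3, "2016-03-02", 0),
        (3, "2018-07-03", 5),
    ],
)

# Every (a1, a2) pair the join produces, before any GROUP BY.
pairs = conn.execute(
    """
    SELECT a1.player_id, a1.event_date, a1.games_played,
           a2.event_date, a2.games_played
    FROM activity a1
    JOIN activity a2
      ON a1.player_id = a2.player_id
     AND a1.event_date >= a2.event_date
    ORDER BY a1.player_id, a1.event_date, a2.event_date
    """
).fetchall()

# The same join, grouped per a1 row and summed over the a2 side.
running_totals = conn.execute(
    """
    SELECT a1.player_id, a1.event_date, SUM(a2.games_played)
    FROM activity a1
    JOIN activity a2
      ON a1.player_id = a2.player_id
     AND a1.event_date >= a2.event_date
    GROUP BY a1.player_id, a1.event_date
    ORDER BY a1.player_id, a1.event_date
    """
).fetchall()

for row in pairs:
    print(row)
print(running_totals)
```

Within one group, a1.games_played is the same value repeated once per matching a2 row, so summing it just multiplies the current day's count (1 x 3 = 3 for 2017-06-25). The a2 side holds the current and all earlier days (5, 6, 1), which is why its sum is the running total. The a2_Played column in the question shows only one arbitrary row per group because it is neither grouped nor aggregated.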

Creating a new calculated column in SQL

Is there a way to count how many dates have a given number of UDs? For example, June 24 appears twice, so that date has 2 UDs, while each of the remaining dates appears only once.
I am showing the expected output here:
Primary key UD Date
-------------------------------------------
1 123 2015-06-24 00:00:00.000
6 456 2015-06-24 00:00:00.000
2 123 2015-06-25 00:00:00.000
3 658 2015-06-26 00:00:00.000
4 598 2015-06-27 00:00:00.000
5 156 2015-06-28 00:00:00.000
No of times Number of days
-----------------------------
4 1
2 2
The logic is that 4 users used the application on 1 day and 2 users used the application on 2 days.
You can use two levels of aggregation:
select cnt, count(*)
from (select date, count(*) as cnt
      from t
      group by date
     ) d
group by cnt
order by cnt desc;
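As a quick check of what the two levels produce, here is a minimal sketch in Python's sqlite3 on the six sample rows. The column names id and ud are assumptions; t and date come from the query above. The inner level counts rows per date, and the outer level counts how many dates share each count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id and ud are assumed column names for "Primary key" and "UD".
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, ud INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 123, "2015-06-24"),
        (6, 456, "2015-06-24"),
        (2, 123, "2015-06-25"),
        (3, 658, "2015-06-26"),
        (4, 598, "2015-06-27"),
        (5, 156, "2015-06-28"),
    ],
)

# First level: number of rows per date.
per_date = conn.execute(
    "SELECT date, COUNT(*) AS cnt FROM t GROUP BY date ORDER BY date"
).fetchall()

# Second level: number of dates that share each per-date count.
histogram = conn.execute(
    """
    SELECT cnt, COUNT(*) AS num_dates
    FROM (SELECT date, COUNT(*) AS cnt
          FROM t
          GROUP BY date
         ) d
    GROUP BY cnt
    ORDER BY cnt DESC
    """
).fetchall()

print(per_date)
print(histogram)
```

If the count is meant to be per user rather than per date, the same pattern should work with the inner query grouped by the UD column instead.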

Getting date difference between consecutive rows in the same group

I have a database with the following data:
Group ID Time
1 1 16:00:00
1 2 16:02:00
1 3 16:03:00
2 4 16:09:00
2 5 16:10:00
2 6 16:14:00
I am trying to find the difference in time between consecutive rows within each group. Using LAG() and DATEDIFF() (i.e. https://stackoverflow.com/a/43055820), I currently get the following result set:
Group ID Difference
1 1 NULL
1 2 00:02:00
1 3 00:01:00
2 4 00:06:00
2 5 00:01:00
2 6 00:04:00
However I need the difference to reset when a new group is reached, as in below. Can anyone advise?
Group ID Difference
1 1 NULL
1 2 00:02:00
1 3 00:01:00
2 4 NULL
2 5 00:01:00
2 6 00:04:00
The code would look something like:
select t.*,
       datediff(second, lag(time) over (partition by group order by id), time)
from t;
This returns the difference as a number of seconds, but you seem to know how to convert that to a time representation. You also seem to know that group is not acceptable as a column name, because it is a SQL keyword.
Based on the question, you have put group in the order by clause of the lag(), not the partition by.
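For comparison, here is a small sketch in Python's sqlite3 (window functions need SQLite 3.25 or later) that runs both variants on the sample data. The column names grp and ts are stand-ins for group and time, and the strftime subtraction replaces DATEDIFF, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# grp and ts are assumed names, chosen to avoid the keywords group and time.
conn.execute("CREATE TABLE t (grp INTEGER, id INTEGER, ts TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 1, "16:00:00"),
        (1, 2, "16:02:00"),
        (1, 3, "16:03:00"),
        (2, 4, "16:09:00"),
        (2, 5, "16:10:00"),
        (2, 6, "16:14:00"),
    ],
)


def diffs(window_spec):
    """Seconds since the previous row, where 'previous' is set by window_spec."""
    sql = f"""
        SELECT grp, id,
               CAST(strftime('%s', ts) AS INTEGER)
                 - CAST(strftime('%s', LAG(ts) OVER ({window_spec})) AS INTEGER)
        FROM t
        ORDER BY grp, id
    """
    return conn.execute(sql).fetchall()


# Group only in ORDER BY: the first row of group 2 still sees group 1's last row.
unpartitioned = diffs("ORDER BY grp, id")
# Group in PARTITION BY: LAG restarts in each group, so the first row gets NULL.
partitioned = diffs("PARTITION BY grp ORDER BY id")

print(unpartitioned)
print(partitioned)
```

The only row that differs between the two results is id 4: 360 seconds without the partition, NULL with it.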

Postgres count items by interval

I am trying to get the count of items within an interval, with no start or stop times specified. I imagine it could be done with window functions, but I am not sure how to go about it.
The problem is as follows: I would like to get the number of times people log in to a website within an arbitrary interval, say 20 minutes.
Example A
1. 2015-06-24 23:00:00
2. 2015-06-24 23:45:00
3. 2015-06-25 00:00:00
4. 2015-06-25 00:15:00
5. 2015-06-25 00:17:00
6. 2015-06-25 00:21:00
In the above example I would highlight items (2,3), (3,4,5), (4,5,6), (5,6). The output I would like is:
start,end,count
2015-06-24 23:45:00,2015-06-25 00:00:00,2
2015-06-25 00:00:00,2015-06-25 00:17:00,3
2015-06-25 00:15:00,2015-06-25 00:21:00,3
Also, only keep the rows where count >= 2, otherwise everything will be a valid grouping.
Is a window function the way I should go, or a CTE, or is there another practice to adopt?
Try this query with self join:
select a.id, a.log_at, max(b.log_at), count(1)
from logs a
join logs b
  on b.log_at >= a.log_at
 and b.log_at <= a.log_at + '20 m'::interval
group by 1, 2
having count(1) > 1
order by 1
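Here is a rough sketch of the same self join run on Example A through Python's sqlite3, so it can be tried without a Postgres server. The logs table and its id and log_at columns follow the query above; SQLite's datetime(..., '+20 minutes') stands in for the Postgres interval arithmetic, so this is a port, not the original statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, log_at TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [
        (1, "2015-06-24 23:00:00"),
        (2, "2015-06-24 23:45:00"),
        (3, "2015-06-25 00:00:00"),
        (4, "2015-06-25 00:15:00"),
        (5, "2015-06-25 00:17:00"),
        (6, "2015-06-25 00:21:00"),
    ],
)

# For each login a, collect every login b in [a.log_at, a.log_at + 20 minutes].
# The timestamps share one text format, so string comparison orders them correctly.
windows = conn.execute(
    """
    SELECT a.id, a.log_at, MAX(b.log_at), COUNT(*)
    FROM logs a
    JOIN logs b
      ON b.log_at >= a.log_at
     AND b.log_at <= datetime(a.log_at, '+20 minutes')
    GROUP BY a.id, a.log_at
    HAVING COUNT(*) > 1
    ORDER BY a.id
    """
).fetchall()

for row in windows:
    print(row)
```

Each row is a window that starts at one login and ends at the last login within 20 minutes of it. Logins 1 and 6 drop out because their windows contain only themselves.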
You can get each "day" groups with counts by a query like:
SELECT MIN(last_seen_at), MAX(last_seen_at), COUNT(*)
FROM user_kinds
GROUP BY DATE(last_seen_at)
ORDER BY DATE(last_seen_at) DESC LIMIT 5;
Which on my sample data set yields a result like:
2015-06-26 00:12:30.476548 | 2015-06-26 22:06:25.134322 | 69
2015-06-25 00:46:03.392651 | 2015-06-25 23:49:46.616964 | 14
2015-06-24 14:22:33.578176 | 2015-06-24 23:39:01.32241 | 10
2015-06-23 01:42:53.438663 | 2015-06-23 20:12:21.864601 | 2
(5 rows)

Access SQL - Select only the last sequence

I have a table with an ID and multiple informative columns. Sometimes however, I can have multiple data for an ID, so I added a column called "Sequence". Here is a shortened example:
ID Sequence Name Tel Date Amount
124 1 Bob 873-4356 2001-02-03 10
124 2 Bob 873-4356 2002-03-12 7
124 3 Bob 873-4351 2006-07-08 24
125 1 John 983-4568 2007-02-01 3
125 2 John 983-4568 2008-02-08 13
126 1 Eric 345-9845 2010-01-01 18
So, I would like to obtain only these lines:
124 3 Bob 873-4351 2006-07-08 24
125 2 John 983-4568 2008-02-08 13
126 1 Eric 345-9845 2010-01-01 18
Could anyone give me a hand with building a SQL query to do this?
Thanks !
You can calculate the maximum sequence using group by. Then you can use join to get only the maximum in the original data.
Assuming your table is called t:
select t.*
from t join
     (select id, MAX(sequence) as maxs
      from t
      group by id
     ) tmax
     on t.id = tmax.id and
        t.sequence = tmax.maxs
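As a sanity check, here is a minimal sketch that runs this join on the sample rows using Python's sqlite3 in place of Access. The lowercase column names are assumptions based on the question's headers, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names are assumed from the headers in the question.
conn.execute(
    "CREATE TABLE t (id INTEGER, sequence INTEGER, name TEXT, tel TEXT,"
    " date TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)",
    [
        (124, 1, "Bob", "873-4356", "2001-02-03", 10),
        (124, 2, "Bob", "873-4356", "2002-03-12", 7),
        (124, 3, "Bob", "873-4351", "2006-07-08", 24),
        (125, 1, "John", "983-4568", "2007-02-01", 3),
        (125, 2, "John", "983-4568", "2008-02-08", 13),
        (126, 1, "Eric", "345-9845", "2010-01-01", 18),
    ],
)

# Keep only the rows whose sequence equals the per-id maximum.
last_rows = conn.execute(
    """
    SELECT t.*
    FROM t
    JOIN (SELECT id, MAX(sequence) AS maxs
          FROM t
          GROUP BY id
         ) tmax
      ON t.id = tmax.id
     AND t.sequence = tmax.maxs
    ORDER BY t.id
    """
).fetchall()

for row in last_rows:
    print(row)
```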