LeetCode 534. Game Play Analysis III - SQL

The following is the LeetCode question:
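Table: Activity
+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| player_id    | int     |
| device_id    | int     |
| event_date   | date    |
| games_played | int     |
+--------------+---------+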
(player_id, event_date) is the primary key of this table.
This table shows the activity of players of some games.
Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on some day using some device.
Write an SQL query to report, for each player and date, how many games the player has played so far, that is, the total number of games played by the player up to and including that date. Check the example for clarity.
Return the result table in any order.
The query result format is in the following example.
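Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-05-02 | 6            |
| 1         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+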
I have solved it using a window function. My code is below:
select player_id, event_date,
       sum(games_played) over (partition by player_id order by event_date) as games_played_so_far
from activity
and the output is:
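+-----------+------------+---------------------+
| player_id | event_date | games_played_so_far |
+-----------+------------+---------------------+
| 1         | 2016-03-01 | 5                   |
| 1         | 2016-05-02 | 11                  |
| 1         | 2017-06-25 | 12                  |
| 3         | 2016-03-02 | 0                   |
| 3         | 2018-07-03 | 5                   |
+-----------+------------+---------------------+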
But when solving it with a JOIN, I do not understand why we need to sum over a2.games_played and not a1.games_played. The code is below:
SELECT a1.player_id, a1.event_date ,SUM(a2.games_played) AS games_played_so_far
FROM activity a1, activity a2
WHERE a1.player_id = a2.player_id
AND a1.event_date >=a2.event_date
GROUP BY a1.player_id, a1.event_date
ORDER BY a1.player_id, a1.event_date;
I also wrote the following query and got the result below. Looking at that output, a1played looks aligned with each a1 row, while a2played contains only the values 0 and 5. I cannot make out why, then, we are summing over a2played.
SELECT a1.player_id, a1.event_date as a1_Date, a2.event_date as a2_Date,a1.games_played as a1played,a2.games_played as a2played,
SUM(a1.games_played) AS sum_a1,SUM(a2.games_played) AS sum_a2
FROM activity a1, activity a2
WHERE a1.player_id = a2.player_id
AND a1.event_date >=a2.event_date
GROUP BY a1.player_id, a1.event_date
ORDER BY a1.player_id, a1.event_date;
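+-----------+------------+------------+----------+----------+--------+--------+
| player_id | a1_Date    | a2_Date    | a1played | a2played | sum_a1 | sum_a2 |
+-----------+------------+------------+----------+----------+--------+--------+
| 1         | 2016-03-01 | 2016-03-01 | 5        | 5        | 5      | 5      |
| 1         | 2016-05-02 | 2016-03-01 | 6        | 5        | 12     | 11     |
| 1         | 2017-06-25 | 2016-03-01 | 1        | 5        | 3      | 12     |
| 3         | 2016-03-02 | 2016-03-02 | 0        | 0        | 0      | 0      |
| 3         | 2018-07-03 | 2016-03-02 | 5        | 0        | 10     | 5      |
+-----------+------------+------------+----------+----------+--------+--------+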

If you rewrote your query in your dad's SQL rather than your granddad's SQL, that is, with explicit JOINs, the answer would leap out at you. (Sorry to be snarky, but I personally switched to explicit JOINs in 1994 and never looked back.)
SELECT a1.player_id, a1.event_date, SUM(a2.games_played) AS games_so_far
FROM activity a1
JOIN activity a2 ON a1.player_id = a2.player_id
AND a1.event_date >=a2.event_date
GROUP BY a1.player_id, a1.event_date
ORDER BY a1.player_id, a1.event_date
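The >= inequality in the second part of the ON clause does the trick. Each a1 row joins to every a2 row of the same player dated on or before it, so each successive event_date from a1 joins to an ever-increasing number of rows. Summing a2.games_played over those rows adds each earlier day's games exactly once, which is the running total. Summing a1.games_played instead just repeats the current day's value once per matched row; that is why your sum_a1 shows 12 (6 x 2) for 2016-05-02 and 3 (1 x 3) for 2017-06-25.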
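To make this concrete, here is a minimal sketch that replays the example in an in-memory SQLite database through Python's sqlite3 module (assumptions: SQLite 3.25 or later for the window function, and ISO date strings standing in for the DATE type). It prints player 1's joined rows before grouping, then compares SUM(a1.games_played) with SUM(a2.games_played), and checks the latter against the window-function query from the question. As for the single a2_Date and a2played values in the debug output: those columns are neither grouped nor aggregated, so MySQL (with ONLY_FULL_GROUP_BY disabled) reports a value from an arbitrary row of each group. Here that happened to be the earliest a2 row, which is why a2played only ever shows 5 and 0.

```python
import sqlite3

# In-memory SQLite stand-in for the LeetCode Activity table.
# Assumption: dates are stored as ISO-8601 text, which compares correctly with >=.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity (player_id INTEGER, device_id INTEGER, "
    "event_date TEXT, games_played INTEGER)"
)
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?, ?)",
    [
        (1, 2, "2016-03-01", 5),
        (1, 2, "2016-05-02", 6),
        (1, 3, "2017-06-25", 1),
        (3, 1, "2016-03-02", 0),
        (3, 4, "2018-07-03", 5),
    ],
)

# 1) The join before GROUP BY: each a1 row is paired with every a2 row
#    of the same player dated on or before it (shown for player 1 only).
pairs = conn.execute(
    """
    SELECT a1.event_date, a2.event_date, a1.games_played, a2.games_played
    FROM activity a1
    JOIN activity a2 ON a1.player_id = a2.player_id
                    AND a1.event_date >= a2.event_date
    WHERE a1.player_id = 1
    ORDER BY a1.event_date, a2.event_date
    """
).fetchall()
for row in pairs:
    print(row)

# 2) Grouping per (player, a1 date): SUM(a2) is the running total,
#    SUM(a1) repeats the current row's value once per matched a2 row.
sums = conn.execute(
    """
    SELECT a1.player_id, a1.event_date,
           SUM(a1.games_played) AS sum_a1,
           SUM(a2.games_played) AS sum_a2
    FROM activity a1
    JOIN activity a2 ON a1.player_id = a2.player_id
                    AND a1.event_date >= a2.event_date
    GROUP BY a1.player_id, a1.event_date
    ORDER BY a1.player_id, a1.event_date
    """
).fetchall()
for row in sums:
    print(row)

# 3) The window-function version from the question, for comparison.
window = conn.execute(
    """
    SELECT player_id, event_date,
           SUM(games_played) OVER (PARTITION BY player_id ORDER BY event_date)
    FROM activity
    ORDER BY player_id, event_date
    """
).fetchall()
print(window == [(p, d, s2) for p, d, _s1, s2 in sums])  # True
```

For player 1 on 2016-05-02 the join yields two rows (a2 dates 2016-03-01 and 2016-05-02), so SUM(a2.games_played) = 5 + 6 = 11 while SUM(a1.games_played) = 6 + 6 = 12.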

Related

Creating a new calculated column in SQL

Is there a way to get a result showing that, for 2 days, there are 2 UDs (because June 24 appears twice), while the rest are single days?
Here is the data, followed by the expected output:
Primary key UD Date
-------------------------------------------
1 123 2015-06-24 00:00:00.000
6 456 2015-06-24 00:00:00.000
2 123 2015-06-25 00:00:00.000
3 658 2015-06-26 00:00:00.000
4 598 2015-06-27 00:00:00.000
5 156 2015-06-28 00:00:00.000
No of times Number of days
-----------------------------
4 1
2 2
The logic is that there are 4 users who used the application on 1 day and 2 users who used the application on 2 days.
You can use two levels of aggregation:
select cnt, count(*)
from (select date, count(*) as cnt
from t
group by date
) d
group by cnt
order by cnt desc;
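A minimal sketch of the two-level aggregation, run on the sample rows in an in-memory SQLite database through Python's sqlite3 module. The table name t follows the answer; the column names id, ud and "date" are hypothetical stand-ins for the question's columns, and the timestamps are shortened to plain dates. The inner query counts rows per date; the outer query counts how many dates share each count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical column names mirroring the question's "Primary key UD Date" table.
conn.execute('CREATE TABLE t (id INTEGER PRIMARY KEY, ud INTEGER, "date" TEXT)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 123, "2015-06-24"),
        (6, 456, "2015-06-24"),
        (2, 123, "2015-06-25"),
        (3, 658, "2015-06-26"),
        (4, 598, "2015-06-27"),
        (5, 156, "2015-06-28"),
    ],
)

# The answer's query: rows per date (inner), then dates per count (outer).
result = conn.execute(
    '''
    SELECT cnt, COUNT(*) AS num_dates
    FROM (SELECT "date", COUNT(*) AS cnt
          FROM t
          GROUP BY "date") d
    GROUP BY cnt
    ORDER BY cnt DESC
    '''
).fetchall()
print(result)  # [(2, 1), (1, 4)]

# Assumption: one reading of the expected "No of times / Number of days" layout,
# where "No of times" is the total number of rows for each per-date count.
expected_layout = conn.execute(
    '''
    SELECT cnt * COUNT(*) AS no_of_times, cnt AS number_of_days
    FROM (SELECT "date", COUNT(*) AS cnt
          FROM t
          GROUP BY "date") d
    GROUP BY cnt
    ORDER BY cnt
    '''
).fetchall()
print(expected_layout)  # [(4, 1), (2, 2)]
```

On this data the answer's query reports one date with two rows and four dates with one row; the second query is only a guess at the intended semantics, but it reproduces the 4 1 / 2 2 numbers from the question.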

Getting date difference between consecutive rows in the same group

I have a database with the following data:
Group ID Time
1 1 16:00:00
1 2 16:02:00
1 3 16:03:00
2 4 16:09:00
2 5 16:10:00
2 6 16:14:00
I am trying to find the difference in times between consecutive rows within each group. Using LAG() and DATEDIFF() (i.e. https://stackoverflow.com/a/43055820), right now I have the following result set:
Group ID Difference
1 1 NULL
1 2 00:02:00
1 3 00:01:00
2 4 00:06:00
2 5 00:01:00
2 6 00:04:00
However I need the difference to reset when a new group is reached, as in below. Can anyone advise?
Group ID Difference
1 1 NULL
1 2 00:02:00
1 3 00:01:00
2 4 NULL
2 5 00:01:00
2 6 00:04:00
The code would look something like:
select t.*,
datediff(second, lag(time) over (partition by group order by id), time)
from t;
This returns the difference as a number of seconds, but you seem to know how to convert that to a time representation. You also seem to know that group cannot be used as a plain column name, because it is a SQL keyword (it has to be quoted).
Based on your results, you have probably put group in the ORDER BY clause of the LAG() rather than in the PARTITION BY.
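Here is a minimal sketch of the PARTITION BY fix in SQLite via Python's sqlite3 (assumption: SQLite 3.25 or later for LAG()). SQLite has no DATEDIFF, so this swaps in strftime('%s', ...) to turn each time into seconds; the table name t comes from the answer, and the group column is quoted because GROUP is a keyword. A second query drops the PARTITION BY to show the behaviour the question describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "group" is a keyword, so it has to be quoted when used as a column name.
conn.execute('CREATE TABLE t ("group" INTEGER, id INTEGER, time TEXT)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 1, "16:00:00"), (1, 2, "16:02:00"), (1, 3, "16:03:00"),
        (2, 4, "16:09:00"), (2, 5, "16:10:00"), (2, 6, "16:14:00"),
    ],
)

# SQLite has no DATEDIFF: strftime('%s', ...) turns an HH:MM:SS value into
# seconds (on SQLite's default date of 2000-01-01), so subtracting gives seconds.
# PARTITION BY "group" restarts LAG() in each group, so each first row is NULL.
rows = conn.execute(
    """
    SELECT "group", id,
           CAST(strftime('%s', time) AS INTEGER)
           - CAST(strftime('%s', LAG(time) OVER (PARTITION BY "group" ORDER BY id)) AS INTEGER)
             AS diff_seconds
    FROM t
    ORDER BY id
    """
).fetchall()
for row in rows:
    print(row)

# Without PARTITION BY, LAG() runs across group boundaries (id 4 compares to id 3).
unpartitioned = conn.execute(
    """
    SELECT "group", id,
           CAST(strftime('%s', time) AS INTEGER)
           - CAST(strftime('%s', LAG(time) OVER (ORDER BY id)) AS INTEGER)
    FROM t
    ORDER BY id
    """
).fetchall()
print(unpartitioned[3])  # (2, 4, 360)
```

The partitioned query gives NULL, 120, 60 for group 1 and NULL, 60, 240 for group 2, which is the expected result from the question expressed in seconds.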

Sqlite query for fetching latest Exam date time with distinct patientID

In an SQLite database I have a table, Examination, with columns ExamID, InternalPID and ExamDateTime:
ExamID InternalPID ExamDateTime
1 2 2015-03-11
2 1 2015-11-11
3 4 2015-05-01
4 6 2015-08-10
5 2 2015-04-22
6 1 2014-12-11
7 2 2015-03-12
The query should return the latest examination date of each patient, i.e. each InternalPID should appear once, with its latest ExamDateTime.
Expected output:
ExamID InternalPID ExamDateTime
5 2 2015-04-22
2 1 2015-11-11
3 4 2015-05-01
4 6 2015-08-10
Thank you in advance
You can do this using a join and aggregation or a clever where clause:
select e.*
from examination e
where e.ExamDateTime = (select max(e2.ExamDateTime)
from examination e2
where e2.InternalPID = e.InternalPID
);
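A sketch of both approaches the answer mentions, run on the question's rows with Python's sqlite3: the correlated WHERE clause (correlating on InternalPID, the column the question's table actually has) and a join against a per-patient MAX. Both return every exam that ties for a patient's latest date, so a patient with two exams on the same latest date would appear twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Examination (ExamID INTEGER, InternalPID INTEGER, ExamDateTime TEXT)")
conn.executemany(
    "INSERT INTO Examination VALUES (?, ?, ?)",
    [
        (1, 2, "2015-03-11"), (2, 1, "2015-11-11"), (3, 4, "2015-05-01"),
        (4, 6, "2015-08-10"), (5, 2, "2015-04-22"), (6, 1, "2014-12-11"),
        (7, 2, "2015-03-12"),
    ],
)

# Correlated subquery: keep rows whose date equals that patient's maximum date.
correlated = conn.execute(
    """
    SELECT e.ExamID, e.InternalPID, e.ExamDateTime
    FROM Examination e
    WHERE e.ExamDateTime = (SELECT MAX(e2.ExamDateTime)
                            FROM Examination e2
                            WHERE e2.InternalPID = e.InternalPID)
    ORDER BY e.ExamID
    """
).fetchall()

# Join + aggregation: compute each patient's maximum once, then join back.
joined = conn.execute(
    """
    SELECT e.ExamID, e.InternalPID, e.ExamDateTime
    FROM Examination e
    JOIN (SELECT InternalPID, MAX(ExamDateTime) AS max_dt
          FROM Examination
          GROUP BY InternalPID) m
      ON m.InternalPID = e.InternalPID AND m.max_dt = e.ExamDateTime
    ORDER BY e.ExamID
    """
).fetchall()

print(correlated)
print(joined == correlated)  # True
```

Both queries return ExamIDs 2, 3, 4 and 5, matching the expected output (here sorted by ExamID).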

When Using OVER with COUNT, What Does It Mean to Use Two Arguments With PARTITION BY?

SELECT
M.Listing_ID,
COUNT(1) OVER (PARTITION BY M.User_ID,EXTRACT(MONTH FROM M.Start_Date)
ORDER BY M.Start_Date, M.Listing_ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) X
FROM LISTINGS M
Here is LISTINGS:
Listings
User_ID Listing_ID Start_Date
A 1 2014-02-14
A 2 2014-03-10
A 3 2014-03-22
B 4 2014-06-08
B 5 2014-10-02
C 6 2014-09-04
C 7 2014-09-04
C 8 2014-09-04
C 9 2014-09-05
C 10 2014-10-03
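I'm trying to decode what this code returns, but I don't really know what it means to partition by 2 categories. Can someone shed light?
You will get the number of rows, COUNT, by user and by month. The PARTITION BY tells the database when to reset the count: every distinct combination of User_ID and month is its own partition. Within each partition, the ORDER BY with ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW makes it a running count, so X is 1, 2, 3, ... for successive listings of that user in that month. Note that only the month is extracted, so the same month in different years would fall into the same partition.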
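To see the two-column partition in action, this sketch runs the query on the LISTINGS rows in SQLite via Python's sqlite3. SQLite has no EXTRACT, so strftime('%m', ...) is substituted for EXTRACT(MONTH FROM ...) (assumption: with ISO date strings this yields the same month key).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LISTINGS (User_ID TEXT, Listing_ID INTEGER, Start_Date TEXT)")
conn.executemany(
    "INSERT INTO LISTINGS VALUES (?, ?, ?)",
    [
        ("A", 1, "2014-02-14"), ("A", 2, "2014-03-10"), ("A", 3, "2014-03-22"),
        ("B", 4, "2014-06-08"), ("B", 5, "2014-10-02"),
        ("C", 6, "2014-09-04"), ("C", 7, "2014-09-04"), ("C", 8, "2014-09-04"),
        ("C", 9, "2014-09-05"), ("C", 10, "2014-10-03"),
    ],
)

# One partition per (User_ID, month); inside it, a running count ordered by date.
rows = conn.execute(
    """
    SELECT M.Listing_ID,
           COUNT(1) OVER (PARTITION BY M.User_ID, strftime('%m', M.Start_Date)
                          ORDER BY M.Start_Date, M.Listing_ID
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS X
    FROM LISTINGS M
    ORDER BY M.Listing_ID
    """
).fetchall()
for row in rows:
    print(row)
```

Listings 6, 7, 8 and 9 share User_ID C and September, so X counts 1 to 4; listing 10 is in October and starts again at 1. Listings 2 and 3 (user A, March) give 1 and 2.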

SQL - Datediff between rows with Rank Applied

I am trying to work out how to apply a DATEDIFF between rows, where a rank is applied per UserID.
Here is an example of the data:
UserID  Order Number  ScanDateStart  ScanDateEnd  Minute Difference  Rank | Minute Difference Rank vs Rank+1
User1   10-24         10:20:00       10:40:00     20                 1    | 5
User1   10-25         10:45:00       10:50:00     5                  2    | 33
User1   10-26         11:12:00       11:45:00     33                 3    | NULL
User2   10-10         00:09:00       00:09:20     20                 1    | 4
User2   10-11         00:09:24       00:09:25     1                  2    | 15
User2   10-12         00:09:40       00:10:12     32                 3    | 3
User2   10-13         00:10:15       00:10:35     20                 4    | NULL
What I'm looking for is how to compute the final column of this table.
The rank is applied per UserID, ordered by ScanDateStart.
Basically, I want to know the time between the ScanDateEnd of rank 1 and the ScanDateStart of rank 2, and so on, for each user (i.e. the time between processing consecutive orders).
Appreciate the help
This can be achieved by performing a LEFT JOIN of the table to itself on the UserID column and on Rank + 1.
The following (simplified) pseudo-code should illustrate how to achieve this:
SELECT R.UserID,
R.Rank,
R1.Diff
FROM Rank R
LEFT JOIN Rank R1 ON R1.UserID = R.UserID AND R1.Rank = R.Rank + 1
Effectively, you are showing the UserID and Rank from the current row, but the Difference from the row of the same UserID with the Rank + 1.
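A runnable sketch of the same self-join idea, using Python's sqlite3 and User1's rows from the question. The table and column names (scans, user_id, scan_start, scan_end, rnk) are hypothetical, ROW_NUMBER() stands in for the rank, and instead of picking up the next row's Diff as the pseudo-code does, it computes the gap the question describes: next ScanDateStart minus current ScanDateEnd, in minutes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table/column names; User1's rows from the question.
conn.execute(
    "CREATE TABLE scans (user_id TEXT, order_number TEXT, scan_start TEXT, scan_end TEXT)"
)
conn.executemany(
    "INSERT INTO scans VALUES (?, ?, ?, ?)",
    [
        ("User1", "10-24", "10:20:00", "10:40:00"),
        ("User1", "10-25", "10:45:00", "10:50:00"),
        ("User1", "10-26", "11:12:00", "11:45:00"),
    ],
)

# ranked numbers each user's scans 1, 2, 3, ... by scan_start (ROW_NUMBER used as
# the rank). The LEFT JOIN on rnk + 1 pairs each scan with the same user's next
# scan; the last scan has no partner, so its gap is NULL.
rows = conn.execute(
    """
    WITH ranked AS (
        SELECT user_id, scan_start, scan_end,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY scan_start) AS rnk
        FROM scans
    )
    SELECT r.user_id, r.rnk,
           (CAST(strftime('%s', r1.scan_start) AS INTEGER)
            - CAST(strftime('%s', r.scan_end) AS INTEGER)) / 60 AS gap_minutes
    FROM ranked r
    LEFT JOIN ranked r1 ON r1.user_id = r.user_id AND r1.rnk = r.rnk + 1
    ORDER BY r.user_id, r.rnk
    """
).fetchall()
for row in rows:
    print(row)
```

For rank 2 this gives 22 minutes (10:50 to 11:12), whereas the question's table shows 33 there, which is rank 3's own duration; if that duration is what is wanted, selecting the next row's difference as in the pseudo-code is the right choice.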