Get row for each unique user based on highest column value - sql

I have the following data
+--------+-----------+--------+
| UserId | Timestamp | Rating |
+--------+-----------+--------+
| 1 | 1 | 1202 |
| 2 | 1 | 1198 |
| 1 | 2 | 1204 |
| 2 | 2 | 1196 |
| 1 | 3 | 1206 |
| 2 | 3 | 1194 |
| 1 | 4 | 1198 |
| 2 | 4 | 1202 |
+--------+-----------+--------+
I am trying to find the distribution of each user's Rating, based on their latest row in the table (latest is determined by Timestamp). On the path to that, I am trying to get a list of user IDs and Ratings which would look like the following
+--------+--------+
| UserId | Rating |
+--------+--------+
| 1 | 1198 |
| 2 | 1202 |
+--------+--------+
Trying to get there, I sorted the rows by Timestamp (descending) and then UserId, which gives the following.
+--------+-----------+--------+
| UserId | Timestamp | Rating |
+--------+-----------+--------+
| 1 | 4 | 1198 |
| 2 | 4 | 1202 |
| 1 | 3 | 1206 |
| 2 | 3 | 1194 |
| 1 | 2 | 1204 |
| 2 | 2 | 1196 |
| 1 | 1 | 1202 |
| 2 | 1 | 1198 |
+--------+-----------+--------+
So now I just need to take the top N rows, where N is the number of players. However, I can't use LIMIT for this: it needs a constant expression, and using count(id) as its input doesn't seem to work.
Any suggestions on how I can get the data I need?
Cheers!
Andy

This should work:
SELECT test.UserId, Rating FROM test
JOIN
(select UserId, MAX(Timestamp) Timestamp FROM test GROUP BY UserId) m
ON test.UserId = m.UserId AND test.Timestamp = m.Timestamp
If you can use window functions, then you can use the following:
SELECT UserId, Rating FROM(
SELECT UserId, Rating, ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Timestamp DESC) row_num FROM test
)m WHERE row_num = 1
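To check both queries against the sample data from the question, here is a minimal sketch that runs them in an in-memory SQLite database through Python's sqlite3 module. The SQLite setup and the added ORDER BY clauses are assumptions made for the demo, not part of the answer, and the window-function query needs SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (UserId INTEGER, Timestamp INTEGER, Rating INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1, 1, 1202), (2, 1, 1198), (1, 2, 1204), (2, 2, 1196),
     (1, 3, 1206), (2, 3, 1194), (1, 4, 1198), (2, 4, 1202)],
)

# Approach 1: join each row to its user's latest Timestamp.
join_sql = """
SELECT test.UserId, Rating
FROM test
JOIN (SELECT UserId, MAX(Timestamp) AS Timestamp FROM test GROUP BY UserId) m
  ON test.UserId = m.UserId AND test.Timestamp = m.Timestamp
ORDER BY test.UserId
"""

# Approach 2: number each user's rows from newest to oldest and keep row 1.
window_sql = """
SELECT UserId, Rating
FROM (
    SELECT UserId, Rating,
           ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Timestamp DESC) AS row_num
    FROM test
) m
WHERE row_num = 1
ORDER BY UserId
"""

join_rows = conn.execute(join_sql).fetchall()
window_rows = conn.execute(window_sql).fetchall()
print(join_rows)    # [(1, 1198), (2, 1202)]
print(window_rows)  # [(1, 1198), (2, 1202)]
```

One difference worth knowing: if a user has two rows sharing the same latest Timestamp, the join version returns both of them, while ROW_NUMBER keeps exactly one (which one is arbitrary unless the ORDER BY has a tie-breaker).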

Related

SQL Count depending on certain conditions

I have two tables.
One has the userid and name (users table). The other has the payment information (payments table) for each userid in users.
users
+--------+------------+
| Userid | Name |
+--------+------------+
| 1 | Alex T |
| 2 | Jeremy T |
| 3 | Frederic A |
+--------+------------+
payments
+--------+-----------+------------+----------+
| Userid | ValuePaid | PaidMonths | Refunded |
+--------+-----------+------------+----------+
| 1 | 1 | 12 | null |
| 1 | 20 | 12 | null |
| 1 | 20 | 12 | null |
| 1 | 20 | 1 | null |
| 2 | 1 | 1 | null |
| 2 | 20 | 12 | 1 |
| 2 | 20 | 12 | null |
| 2 | 20 | 1 | null |
| 3 | 1 | 12 | null |
| 3 | 20 | 1 | 1 |
| 3 | 20 | 1 | null |
+--------+-----------+------------+----------+
I want to count the PaidMonths, taking the following rules into consideration:
If ValuePaid < 10, PaidMonths should be 0.23 (even if the column holds any other number).
If Refunded = 1, PaidMonths should be 0.
Based on this, when I join both tables by userid and sum the PaidMonths according to the previous rules, I expect to see this result:
+--------+------------+------------+
| userid | Name | paidMonths |
+--------+------------+------------+
| 1 | Alex T | 25.23 |
| 2 | Jeremy T | 13.23 |
| 3 | Frederic A | 1.23 |
+--------+------------+------------+
Can you help me to achieve this in the most elegant way? Should a temporary table be used?
The following gives your desired results, using apply with case expression to map your values:
select u.UserID, u.Name, Sum(pm) PaidMonths
from users u join payments p on p.userid=u.userid
cross apply (values(
case
when valuepaid <10 then 0.23
when Refunded=1 then 0
else PaidMonths end
))x(pm)
group by u.UserID, u.Name
See Working Fiddle
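cross apply is SQL Server syntax. If it is not available, the same CASE expression can go directly inside SUM. Below is a rough sketch of that variant, run against the question's data in an in-memory SQLite database via Python's sqlite3 module; the SQLite setup is an assumption made for the demo. No temporary table is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (Userid INTEGER, Name TEXT);
CREATE TABLE payments (Userid INTEGER, ValuePaid INTEGER, PaidMonths INTEGER, Refunded INTEGER);
INSERT INTO users VALUES (1, 'Alex T'), (2, 'Jeremy T'), (3, 'Frederic A');
INSERT INTO payments VALUES
  (1, 1, 12, NULL), (1, 20, 12, NULL), (1, 20, 12, NULL), (1, 20, 1, NULL),
  (2, 1, 1, NULL), (2, 20, 12, 1), (2, 20, 12, NULL), (2, 20, 1, NULL),
  (3, 1, 12, NULL), (3, 20, 1, 1), (3, 20, 1, NULL);
""")

# The CASE branches are tested in order, so a cheap payment counts as 0.23
# even if it was also refunded. A NULL Refunded falls through to ELSE.
sql = """
SELECT u.Userid, u.Name,
       SUM(CASE
             WHEN p.ValuePaid < 10 THEN 0.23
             WHEN p.Refunded = 1 THEN 0
             ELSE p.PaidMonths
           END) AS PaidMonths
FROM users u
JOIN payments p ON p.Userid = u.Userid
GROUP BY u.Userid, u.Name
ORDER BY u.Userid
"""

# Round in Python only to hide floating-point noise in the printed totals.
totals = [(uid, name, round(months, 2)) for uid, name, months in conn.execute(sql)]
print(totals)  # [(1, 'Alex T', 25.23), (2, 'Jeremy T', 13.23), (3, 'Frederic A', 1.23)]
```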

SQL - Create number of categories based on pre-defined number of splits

I am using BigQuery, and trying to assign categorical values to each of my records, based on the number of 'splits' assigned to it.
The table has a cumulative count of records, grouped at the STR level - i.e., if there are 4 SKUs at STR 2, the SKUs will be labeled 1,2,3,4. Each STR is assigned a SPLITS value, so if the STR has a SPLITS value of 2, I want it to split its SKUs into 2 categories. I want to create another column that would assign SKUs labeled 1-2 as '1', and SKUs labeled 3-4 as '2'. (The actual data is on a much larger scale, but I thought this would be easier.)
+-----+------+---------------+--------+
| STR | SKU | SKU_ROW_COUNT | SPLITS |
+-----+------+---------------+--------+
| 1 | 1230 | 1 | 3 |
| 1 | 1231 | 2 | 3 |
| 1 | 1232 | 3 | 3 |
| 1 | 1233 | 4 | 3 |
| 1 | 1234 | 5 | 3 |
| 1 | 1235 | 6 | 3 |
| 2 | 1310 | 1 | 2 |
| 2 | 1311 | 2 | 2 |
| 2 | 1312 | 3 | 2 |
| 2 | 1313 | 4 | 2 |
| 3 | 2345 | 1 | 1 |
| 3 | 2346 | 2 | 1 |
| 3 | 2347 | 3 | 1 |
+-----+------+---------------+--------+
The SPLITS column is dynamic, ranging from 1 to 3. The number of SKUs in each category should be relatively equal, but that's not a priority as much as just the number of groups that are created. Ideally, the final table with the new column (HOST_NUMBER) would look something like this:
+-----+------+---------------+--------+-------------+
| STR | SKU | SKU_ROW_COUNT | SPLITS | HOST_NUMBER |
+-----+------+---------------+--------+-------------+
| 1 | 1230 | 1 | 3 | 1 |
| 1 | 1231 | 2 | 3 | 1 |
| 1 | 1232 | 3 | 3 | 2 |
| 1 | 1233 | 4 | 3 | 2 |
| 1 | 1234 | 5 | 3 | 3 |
| 1 | 1235 | 6 | 3 | 3 |
| 2 | 1310 | 1 | 2 | 1 |
| 2 | 1311 | 2 | 2 | 1 |
| 2 | 1312 | 3 | 2 | 2 |
| 2 | 1313 | 4 | 2 | 2 |
| 3 | 2345 | 1 | 1 | 1 |
| 3 | 2346 | 2 | 1 | 1 |
| 3 | 2347 | 3 | 1 | 1 |
+-----+------+---------------+--------+-------------+
You can use window functions and arithmetic:
select
t.*,
1 + floor((sku_row_count - 1) * splits / count(*) over(partition by str)) host_number
from mytable t
order by sku
Actually, ntile() seems to do exactly what you want - and you don't even need the sku_row_count column (which basically mimics row_number() anyway):
select
t.*,
ntile(splits) over(partition by str order by sku) host_number
from mytable t
order by sku
If the ordering of the values in the groups doesn't matter, just use modulo arithmetic:
select t.*, (SKU_ROW_COUNT % SPLITS) as split_group
from t
Below is for BigQuery Standard SQL
#standardSQL
SELECT *, 1 + MOD(SKU_ROW_COUNT, SPLITS) AS HOST_NUMBER
FROM `project.dataset.table`
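The answers above do not all produce the same grouping, so here is a small sketch comparing the floor-based formula with the modulo one on the question's data. It runs in an in-memory SQLite database through Python's sqlite3 module rather than BigQuery, which is an assumption made for the demo: SQLite's integer division stands in for floor() here, and % stands in for MOD(). It needs SQLite 3.25 or newer for the window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (str INTEGER, sku INTEGER, sku_row_count INTEGER, splits INTEGER)"
)
rows = [(1, 1230 + i, i + 1, 3) for i in range(6)]
rows += [(2, 1310 + i, i + 1, 2) for i in range(4)]
rows += [(3, 2345 + i, i + 1, 1) for i in range(3)]
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)

sql = """
SELECT str,
       -- contiguous buckets: integer division floors these non-negative values
       1 + (sku_row_count - 1) * splits / COUNT(*) OVER (PARTITION BY str) AS contiguous,
       -- round-robin buckets: neighbouring SKUs land in different groups
       1 + sku_row_count % splits AS round_robin
FROM mytable
ORDER BY sku
"""
result = conn.execute(sql).fetchall()
contiguous = [r[1] for r in result]
round_robin = [r[2] for r in result]
print(contiguous)   # [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 1, 1, 1]
print(round_robin)  # [2, 3, 1, 2, 3, 1, 2, 1, 2, 1, 1, 1, 1]
```

Only the contiguous column matches the HOST_NUMBER layout shown in the question. The round-robin column still creates the right number of groups per STR, which is enough if the ordering inside the groups does not matter.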

Group by multiple columns and limit per group - Postgres

I'm creating a messaging app as a side project and I'm trying to query a user's conversations efficiently.
The messages table structure is basic right now with some dummy data:
| id | sender_id | receiver_id | message | created_at |
|------|-----------|-------------|---------|------------|
| 1 | 1 | 2 | text | time |
| 2 | 2 | 1 | text | time |
| 3 | 1 | 2 | text | time |
| 4 | 1 | 3 | text | time |
| 5 | 3 | 2 | text | time |
| 6 | 3 | 1 | text | time |
| 7 | 2 | 1 | text | time |
I'd like to be able to query the DB and group by "conversation" - A.K.A any rows that have the same sender_id or receiver_id in either column - rows (1, 2, 3, 7), (4, 6), (5). I'd like to be able to limit each "group" to n rows and order them by the created_at column. It would ideally look like (created_at values are arbitrary numbers to show descending values):
| id | sender_id | receiver_id | message | created_at |
|------|-----------|-------------|---------|------------|
| 1 | 1 | 2 | text | 400 |
| 2 | 2 | 1 | text | 300 |
| 3 | 1 | 2 | text | 200 |
| 7 | 2 | 1 | text | 100 |
| 4 | 1 | 3 | text | 700 |
| 6 | 3 | 1 | text | 500 |
| 5 | 3 | 2 | text | 300 |
Ideally there would be an additional column added that would number each group (easy to create a multi-dimensional array).
So far I've been able to "group" by sender/receiver ids, order by created_at, and limit the number per group. However, it's not quite right. Here's the query:
SELECT
filter.id, filter.sender_id, filter.receiver_id, filter.message, filter.created_at
FROM (
SELECT messages.*,
rank() OVER (
PARTITION BY sender_id
ORDER BY created_at DESC
)
FROM messages
WHERE messages.sender_id = 1 or messages.receiver_id = 1
) filter WHERE rank <= 50;
My result set looks like this:
| id | sender_id | receiver_id | message | created_at |
|------|-----------|-------------|---------|------------|
| 1 | 1 | 2 | text | 400 |
| 3 | 1 | 2 | text | 300 |
| 4 | 1 | 3 | text | 700 |
| 2 | 2 | 1 | text | 300 |
| 7 | 2 | 1 | text | 100 |
| 6 | 3 | 1 | text | 500 |
| 5 | 3 | 2 | text | 300 |
You can see that the third and sixth rows of the result (ids 4 and 6) should be grouped but aren't.
You can use rank(). To limit the number of records per conversation (ie sender/receiver or receiver/sender tuple), you can use a partition like least(sender_id, receiver_id), greatest(sender_id, receiver_id):
select t.id, t.sender_id, t.receiver_id, t.message, t.created_at
from (
    select
        m.*,
        -- one partition per unordered (sender, receiver) pair
        rank() over(
            partition by least(sender_id, receiver_id), greatest(sender_id, receiver_id)
            order by created_at desc
        ) rn
    from mytable m
) t
where rn <= 50
order by least(sender_id, receiver_id), greatest(sender_id, receiver_id), rn
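The question also asked for a column that numbers each conversation. One possible way, sketched here as an assumption rather than as part of the answer, is DENSE_RANK() ordered by the same least/greatest pair. The demo runs in an in-memory SQLite database via Python's sqlite3 module, where the two-argument MIN() and MAX() scalar functions play the role of Postgres's least() and greatest(); conversation_no is a made-up column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER, sender_id INTEGER, receiver_id INTEGER)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [(1, 1, 2), (2, 2, 1), (3, 1, 2), (4, 1, 3), (5, 3, 2), (6, 3, 1), (7, 2, 1)],
)

# Rows whose (smaller id, larger id) pair is equal belong to the same
# conversation, so they tie in the ordering and share a DENSE_RANK value.
sql = """
SELECT id,
       DENSE_RANK() OVER (
           ORDER BY MIN(sender_id, receiver_id), MAX(sender_id, receiver_id)
       ) AS conversation_no
FROM messages
ORDER BY id
"""
numbered = conn.execute(sql).fetchall()
print(numbered)
# [(1, 1), (2, 1), (3, 1), (4, 2), (5, 3), (6, 2), (7, 1)]
```

This reproduces the groups from the question: ids (1, 2, 3, 7), (4, 6) and (5).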

SQL aggregation over one column giving a result from another

I am trying (and failing) to join some tables in a SQLite database. The data itself is complicated but I think I have boiled it down to an illustrative example.
Here are the three tables I want to join.
Table: Events
+----+---------+-------+-----------+
| id | user_id | class | timestamp |
+----+---------+-------+-----------+
| 1 | 'user1' | 6 | 100 |
| 2 | 'user1' | 12 | 400 |
| 3 | 'user1' | 4 | 900 |
| 4 | 'user2' | 6 | 400 |
| 5 | 'user2' | 3 | 800 |
| 6 | 'user2' | 8 | 900 |
+----+---------+-------+-----------+
Table: Games
+---------+---------+------------+-----------+
| user_id | game_id | game_class | timestamp |
+---------+---------+------------+-----------+
| 'user1' | 1 | 'A' | 200 |
| 'user2' | 2 | 'A' | 300 |
| 'user1' | 3 | 'B' | 500 |
| 'user1' | 4 | 'A' | 600 |
| 'user1' | 5 | 'A' | 700 |
+---------+---------+------------+-----------+
Table: AScores
+---------+-------+
| game_id | score |
+---------+-------+
| 1 | 8 |
| 2 | 2 |
| 4 | 9 |
| 5 | 6 |
+---------+-------+
I would like to join these to provide an additional column on the first table containing the user's current score in game class A at the time of the event. I.e. I would like the result of the join to look like this:
Desired Result
+----+----------+-------+-----------+-----------------+
| id | user_id | class | timestamp | current_a_score |
+----+----------+-------+-----------+-----------------+
| 1 | 'user1' | 6 | 100 | (null) |
| 2 | 'user1' | 12 | 400 | 8 |
| 3 | 'user1' | 4 | 900 | 6 |
| 4 | 'user2' | 6 | 400 | 2 |
| 5 | 'user2' | 3 | 800 | 2 |
| 6 | 'user2' | 8 | 900 | 2 |
+----+----------+-------+-----------+-----------------+
The following simple join pulls together the two tables AScores and Games.
SELECT * FROM AScores
INNER JOIN Games
ON AScores.game_id = Games.game_id
And so I was hoping to join this to the Events table as a sub-query. Something like this:
SELECT Events.*, AScoredGames.time_stamp AS game_time_stamp, AScoredGames.score
FROM Events
LEFT OUTER JOIN (
SELECT AScores.score, Games.* FROM AScores
INNER JOIN Games
ON AScores.game_id = Games.game_id
) AS AScoredGames
ON Events.user_id = AScoredGames.user_id
AND Events.time_stamp >= AScoredGames.time_stamp
ORDER BY Events.time_stamp ASC
That results in the following:
+----+---------+-------+------------+-----------------+-------+
| id | user_id | class | time_stamp | game_time_stamp | score |
+----+---------+-------+------------+-----------------+-------+
| 1 | user1 | 6 | 100 | NULL | NULL |
| 2 | user1 | 12 | 400 | 200 | 8 |
| 4 | user2 | 6 | 400 | 300 | 2 |
| 5 | user2 | 3 | 800 | 300 | 2 |
| 6 | user2 | 8 | 900 | 300 | 2 |
| 3 | user1 | 4 | 900 | 200 | 8 |
| 3 | user1 | 4 | 900 | 600 | 9 |
| 3 | user1 | 4 | 900 | 700 | 6 |
+----+---------+-------+------------+-----------------+-------+
So I need to group by Events.id to get rid of the triplicated row with Events.id 3. But what I want to do is to choose the row with the maximum game_time_stamp but then use the row's score. If I do MAX(game_time_stamp) as my aggregation I still have to independently aggregate the score. Is there a way to tie the row choice in the score column's aggregation function to the result of the game_time_stamp column's aggregation function?
(N.B. Existing answers to questions like Select first record in a One-to-Many relation using left join and SQL Server: How to Join to first row seem to suggest I cannot and say one must use a WHERE clause over a sub-query. But I am struggling with that (I'll post another question about that) and I can think of at least one solution and I am hoping there are better ones.)
The following query should do it. It uses a NOT EXISTS condition with a correlated subquery to locate the relevant game record for each event.
SELECT e.*, s.score current_a_score
FROM
    events e
    LEFT JOIN games g
        ON g.user_id = e.user_id
        AND g.timestamp < e.timestamp
        AND NOT EXISTS (
            SELECT 1
            FROM games g1
            WHERE
                g1.user_id = e.user_id
                AND g1.timestamp < e.timestamp
                AND g1.timestamp > g.timestamp
        )
    LEFT JOIN ascores s
        ON s.game_id = g.game_id
ORDER BY e.id
This DB Fiddle demo with your test data returns :
| id | user_id | class | timestamp | current_a_score |
| --- | ------- | ----- | --------- | --------------- |
| 1 | user1 | 6 | 100 | |
| 2 | user1 | 12 | 400 | 8 |
| 3 | user1 | 4 | 900 | 6 |
| 4 | user2 | 6 | 400 | 2 |
| 5 | user2 | 3 | 800 | 2 |
| 6 | user2 | 8 | 900 | 2 |
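As far as I can tell, the NOT EXISTS query picks the latest game of any class, which happens to be a class A game for every event in the test data. If a user's most recent game before an event were class B, the join to ascores would find nothing and the score would come back null. A sketch of one way to restrict the lookup to class A is a correlated scalar subquery with ORDER BY ... LIMIT 1, shown here in an in-memory SQLite database via Python's sqlite3 module. The game_class filter and the strict < comparison are assumptions carried over from the question's description and the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER, user_id TEXT, class INTEGER, timestamp INTEGER);
CREATE TABLE games (user_id TEXT, game_id INTEGER, game_class TEXT, timestamp INTEGER);
CREATE TABLE ascores (game_id INTEGER, score INTEGER);
INSERT INTO events VALUES
  (1, 'user1', 6, 100), (2, 'user1', 12, 400), (3, 'user1', 4, 900),
  (4, 'user2', 6, 400), (5, 'user2', 3, 800), (6, 'user2', 8, 900);
INSERT INTO games VALUES
  ('user1', 1, 'A', 200), ('user2', 2, 'A', 300), ('user1', 3, 'B', 500),
  ('user1', 4, 'A', 600), ('user1', 5, 'A', 700);
INSERT INTO ascores VALUES (1, 8), (2, 2), (4, 9), (5, 6);
""")

# For each event, take the score of the user's most recent class A game
# played before the event; the subquery yields NULL when there is none.
sql = """
SELECT e.id,
       (SELECT s.score
        FROM games g
        JOIN ascores s ON s.game_id = g.game_id
        WHERE g.user_id = e.user_id
          AND g.game_class = 'A'
          AND g.timestamp < e.timestamp
        ORDER BY g.timestamp DESC
        LIMIT 1) AS current_a_score
FROM events e
ORDER BY e.id
"""
scores = conn.execute(sql).fetchall()
print(scores)  # [(1, None), (2, 8), (3, 6), (4, 2), (5, 2), (6, 2)]
```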
I have one work-around, but it feels hacky and relies on the specifics of my data. First, note that the time_stamps are all multiples of 100 while the scores are all below 10. I can combine these in a way that will not interfere with my comparison but will mean they are both encoded in one numeric column. This query gives the desired result:
SELECT Events.id, MIN(Events.user_id) AS user_id, MIN(Events.class) AS class, MIN(Events.time_stamp) AS time_stamp, MAX(AScoredGames.combination) % 10 AS current_a_score
FROM Events
LEFT OUTER JOIN (
SELECT AScores.score, AScores.score + (Games.time_stamp - 10) AS combination, Games.* FROM AScores
INNER JOIN Games
ON AScores.game_id = Games.game_id) AS AScoredGames
ON Events.user_id = AScoredGames.user_id AND Events.time_stamp >= AScoredGames.time_stamp
GROUP BY Events.id
ORDER BY id ASC
(The combining is done in AScores.score + (Games.time_stamp - 10) and so the aggregate function becomes MAX(AScoredGames.combination) % 10.)
Actual Result
+----+---------+-------+------------+-----------------+
| id | user_id | class | time_stamp | current_a_score |
+----+---------+-------+------------+-----------------+
| 1 | user1 | 6 | 100 | NULL |
| 2 | user1 | 12 | 400 | 8 |
| 3 | user1 | 4 | 900 | 6 |
| 4 | user2 | 6 | 400 | 2 |
| 5 | user2 | 3 | 800 | 2 |
| 6 | user2 | 8 | 900 | 2 |
+----+---------+-------+------------+-----------------+
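Since the database here is SQLite, there may be a way to avoid encoding two values in one column. SQLite documents that when a query contains exactly one MIN() or MAX() aggregate, bare (non-aggregated) columns take their values from the row that holds that minimum or maximum. The sketch below relies on that behaviour; it is SQLite-specific and should not be expected to work on other databases. It uses the column name timestamp as in the tables at the top of the question, and game_ts is a made-up alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER, user_id TEXT, class INTEGER, timestamp INTEGER);
CREATE TABLE games (user_id TEXT, game_id INTEGER, game_class TEXT, timestamp INTEGER);
CREATE TABLE ascores (game_id INTEGER, score INTEGER);
INSERT INTO events VALUES
  (1, 'user1', 6, 100), (2, 'user1', 12, 400), (3, 'user1', 4, 900),
  (4, 'user2', 6, 400), (5, 'user2', 3, 800), (6, 'user2', 8, 900);
INSERT INTO games VALUES
  ('user1', 1, 'A', 200), ('user2', 2, 'A', 300), ('user1', 3, 'B', 500),
  ('user1', 4, 'A', 600), ('user1', 5, 'A', 700);
INSERT INTO ascores VALUES (1, 8), (2, 2), (4, 9), (5, 6);
""")

# MAX(game_ts) is the only aggregate, so the bare column ag.score is read
# from the same joined row that supplies the maximum game_ts.
sql = """
SELECT e.id, MAX(ag.game_ts) AS game_ts, ag.score AS current_a_score
FROM events e
LEFT JOIN (
    SELECT g.user_id, g.timestamp AS game_ts, s.score
    FROM ascores s
    JOIN games g ON s.game_id = g.game_id
) ag
  ON ag.user_id = e.user_id AND e.timestamp >= ag.game_ts
GROUP BY e.id
ORDER BY e.id
"""
picked = conn.execute(sql).fetchall()
print(picked)
# [(1, None, None), (2, 200, 8), (3, 700, 6), (4, 300, 2), (5, 300, 2), (6, 300, 2)]
```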

Limit a sorted number of rows joined

I have two tables, A and B, and a join table M. For each A.id, I want to get the top 2 B.ids, sorted by the Value in table M, producing the results below. This is running on an Azure SQL database.
Table A Table M Table B
+-----+ +-----+-----+-------+ +-----+
| Id | | AId | BId | Value | | Id |
+-----+ +-----+-----+-------+ +-----+
| 1 | | 1 | 3 | 4 | | 1 |
| 2 | | 1 | 2 | 3 | | 2 |
| 3 | | 3 | 2 | 3 | | 3 |
| 4 | | 3 | 5 | 6 | | 4 |
+-----+ | 3 | 3 | 4 | | 5 |
| 4 | 1 | 2 | +-----+
| 4 | 2 | 1 |
| 4 | 4 | 3 |
+-----+-----+-------+
Result
+-----+-----+-------+
| AId | BId | Value |
+-----+-----+-------+
| 1 | 3 | 4 |
| 1 | 2 | 3 |
| 3 | 5 | 6 |
| 3 | 3 | 4 |
| 4 | 1 | 2 |
| 4 | 4 | 3 |
+-----+-----+-------+
I know that I can select all the M.AId rows where they equal 1, sort it, and limit by 2, but I need to do this for every row in Table A. I've made an attempt to use group by, but I wasn't sure how to sort and limit it. I've also tried to search for resources associated with this issue but I couldn't find any resources.
(I also wasn't sure how to word the title for this issue)
You can just use ROW_NUMBER:
SELECT
AId, BId, Value
FROM (
SELECT *,
Rn = ROW_NUMBER() OVER(PARTITION BY AId ORDER BY Value DESC)
FROM M
) t
WHERE Rn <= 2
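Here is a quick sketch that runs the same idea against the question's table M in an in-memory SQLite database via Python's sqlite3 module. The SQLite setup is an assumption made for the demo: the T-SQL alias form Rn = ... is written as ... AS Rn, and an ORDER BY is added so the output order is deterministic. It needs SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE M (AId INTEGER, BId INTEGER, Value INTEGER)")
conn.executemany(
    "INSERT INTO M VALUES (?, ?, ?)",
    [(1, 3, 4), (1, 2, 3), (3, 2, 3), (3, 5, 6),
     (3, 3, 4), (4, 1, 2), (4, 2, 1), (4, 4, 3)],
)

# Number the rows of each AId from highest Value to lowest, keep the first two.
sql = """
SELECT AId, BId, Value
FROM (
    SELECT AId, BId, Value,
           ROW_NUMBER() OVER (PARTITION BY AId ORDER BY Value DESC) AS Rn
    FROM M
) t
WHERE Rn <= 2
ORDER BY AId, Rn
"""
top_two = conn.execute(sql).fetchall()
print(top_two)
# [(1, 3, 4), (1, 2, 3), (3, 5, 6), (3, 3, 4), (4, 4, 3), (4, 1, 2)]
```

A.Id 2 has no rows in M, so it does not appear in the output, which matches the expected result in the question. The rows for AId 4 come out highest Value first, so they are in the opposite order from the question's table but are the same two rows.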