Optimize a query for creating a ranking in MS SQL Server

I'm creating an application where users do workouts. They pass on their results via an app, and these results are stored in an SQL Server database. Results are saved in this way in a SQL Server table:
I want to write a query to create a ranking based on the best score of each user. This is what I have so far:
SELECT id,
workout_id,
level_id,
a.user_id,
total_time,
score,
datetime_added
FROM nodefit_rankings_fitness as a INNER JOIN
(
SELECT user_id,
MAX(score) AS MAXSCORE
FROM nodefit_rankings_fitness
GROUP BY user_id
) AS lookup
ON lookup.user_id = a.user_id
AND
lookup.MAXSCORE = a.score
ORDER BY score DESC,
datetime_added DESC
This generates this ranking:
The problem is that if a user has achieved the same maximum score a number of times, he will appear multiple times in the ranking. The query must be adjusted so that when a user has the same maximum score a few times, only the result of the last attempt (based on the datetime_added column) is displayed in the rankings.
Unfortunately, I cannot find a solution myself. Help is certainly appreciated.

If you care about performance, you should also try a correlated subquery:
SELECT id, workout_id, level_id, nrf.user_id, total_time, score, datetime_added
FROM nodefit_rankings_fitness nrf
WHERE nrf.id = (SELECT TOP (1) nrf2.id
FROM nodefit_rankings_fitness nrf2
WHERE nrf2.user_id = nrf.user_id
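ORDER BY nrf2.score DESC, nrf2.datetime_added DESC
)
ORDER BY score DESC, datetime_added DESC;
The second sort key, datetime_added DESC, picks the latest attempt when a user has reached the same maximum score more than once.
In particular, this can take advantage of an index on nodefit_rankings_fitness(user_id, score desc, datetime_added desc, id).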
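To try the correlated-subquery idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. The trimmed-down table, the sample rows, and the datetime_added tie-breaker are assumptions for illustration, and SQLite spells TOP (1) as LIMIT 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodefit_rankings_fitness (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        score INTEGER,
        datetime_added TEXT
    );
    -- Hypothetical sample data: user 1 reaches the top score 15 twice.
    INSERT INTO nodefit_rankings_fitness VALUES
        (1, 1, 15, '2020-01-01 10:00:00'),
        (2, 1, 15, '2020-01-02 10:00:00'),
        (3, 2, 12, '2020-01-01 11:00:00'),
        (4, 2,  9, '2020-01-03 11:00:00');
""")

# For every row, the subquery looks up the id of that user's best
# (and, among ties, latest) attempt; only that row is kept.
best_per_user = conn.execute("""
    SELECT nrf.id, nrf.user_id, nrf.score, nrf.datetime_added
    FROM nodefit_rankings_fitness AS nrf
    WHERE nrf.id = (SELECT nrf2.id
                    FROM nodefit_rankings_fitness AS nrf2
                    WHERE nrf2.user_id = nrf.user_id
                    ORDER BY nrf2.score DESC, nrf2.datetime_added DESC
                    LIMIT 1)
    ORDER BY nrf.score DESC, nrf.datetime_added DESC
""").fetchall()

for row in best_per_user:
    print(row)
```

With this sample data, user 1 shows up once, with the later of the two attempts that scored 15.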

Window functions make stuff like this easy. Something like:
SELECT id, workout_id, level_id, user_id, total_time, score, datetime_added
FROM (SELECT *, row_number() OVER (PARTITION BY user_id ORDER BY score DESC, datetime_added DESC) AS rn
FROM nodefit_rankings_fitness) AS a
WHERE rn = 1
ORDER BY score DESC, datetime_added DESC;
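A quick way to check the row_number() approach is to run it against a toy table. The sketch below uses Python's sqlite3 module rather than SQL Server, and assumes the bundled SQLite is 3.25 or newer (the first version with window functions); the reduced column set and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodefit_rankings_fitness (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        score INTEGER,
        datetime_added TEXT
    );
    -- Hypothetical sample data: user 1 reaches the top score 15 twice.
    INSERT INTO nodefit_rankings_fitness VALUES
        (1, 1, 15, '2020-01-01 10:00:00'),
        (2, 1, 15, '2020-01-02 10:00:00'),
        (3, 2, 12, '2020-01-01 11:00:00'),
        (4, 2,  9, '2020-01-03 11:00:00');
""")

# rn = 1 marks each user's best score; ties go to the latest attempt.
ranking = conn.execute("""
    SELECT id, user_id, score, datetime_added
    FROM (SELECT *,
                 row_number() OVER (PARTITION BY user_id
                                    ORDER BY score DESC, datetime_added DESC) AS rn
          FROM nodefit_rankings_fitness) AS a
    WHERE rn = 1
    ORDER BY score DESC, datetime_added DESC
""").fetchall()

for row in ranking:
    print(row)
```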

Related

Show only the most frequent number SQL

How do I show only the highest value, even if there is a tie?
Is there any way to use something like MAX(COUNT(...))?
I am using SQLite.
SELECT GAMEID,
COUNT(GAMEID)
FROM GAMES
GROUP BY GAMEID
ORDER BY COUNT(GAMEID) DESC
If you expect more than one row with the same count, you can use window functions to do this and deal with the ties.
You didn't specify your DBMS product, but the following is 100% ANSI standard SQL:
SELECT *
FROM (
SELECT gameid,
count(*),
dense_rank() over (order by count(*) desc) as rnk
FROM games
GROUP BY gameid
) t
WHERE rnk = 1
Since it sounds like you're using an outdated sqlite version and can't use window functions, here's another approach that handles ties:
WITH counted AS (SELECT gameid, count(gameid) AS count FROM games GROUP BY gameid)
SELECT gameid, count
FROM counted
WHERE count = (SELECT max(count) FROM counted);
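Here is a small sketch of that CTE approach run through Python's sqlite3 module, in case you want to see the tie handling in action. The sample rows are invented, and the count column is renamed to cnt purely to avoid confusion with the count() function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (gameid INTEGER);
    -- Hypothetical data: games 1 and 2 both appear twice, game 3 once.
    INSERT INTO games VALUES (1), (1), (2), (2), (3);
""")

# Count per game once, then keep every game whose count equals the maximum.
top_games = conn.execute("""
    WITH counted AS (SELECT gameid, count(gameid) AS cnt
                     FROM games
                     GROUP BY gameid)
    SELECT gameid, cnt
    FROM counted
    WHERE cnt = (SELECT max(cnt) FROM counted)
    ORDER BY gameid
""").fetchall()

print(top_games)
```

Both tied games come back, which is the difference from the LIMIT 1 variant.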
You can add "LIMIT 1" to the end of the query and it will only show one result. However, if 2 entries have the same result it's arbitrary which one will be shown.
If you don't care about ties, then just use LIMIT:
SELECT GAMEID, COUNT(GAMEID) AS CNT
FROM GAMES
GROUP BY GAMEID
ORDER BY COUNT(GAMEID) DESC
LIMIT 1;
If you want to find the games having the highest count, including possible ties, then the RANK analytic function provides one option here:
WITH cte AS (
SELECT GAMEID, COUNT(GAMEID) AS CNT, RANK() OVER (ORDER BY COUNT(GAMEID) DESC) rnk
FROM GAMES
GROUP BY GAMEID
)
SELECT GAMEID, CNT
FROM cte
WHERE rnk = 1;

Create a MS SQL Server query to determine ranking

I'm creating an application where users do workouts. They pass on their results via an app, and these results are stored in an SQL Server database. Results are saved in this way in a SQL Server table:
I want to write a query to create a ranking based on the best score of each user. This is what I have so far (thanks to this post):
SELECT id, workout_id, level_id, user_id, total_time, score, datetime_added
FROM nodefit_rankings_fitness nrf
WHERE nrf.id = (SELECT TOP (1) nrf2.id
FROM nodefit_rankings_fitness nrf2
WHERE nrf2.user_id = nrf.user_id
ORDER BY nrf2.score DESC
)
ORDER BY score DESC, datetime_added DESC;
This generates the following, where a ranking is created based on the best score for a user:
When a certain user submits a new workout, I want to check his ranking based on the last submitted workout, compared to the best performances of other users. So suppose user_id 2 adds a new workout, and his score is, say, 12, what is his current ranking based on that new performance? In that case he has a second place in this table. Thanks.
Try this:
DECLARE @CurrentUser INT = 5;
WITH DataSource AS
(
-- best result of every other user
SELECT id,
workout_id,
level_id,
user_id,
total_time,
score,
datetime_added,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY score DESC, datetime_added DESC) rowID
FROM nodefit_rankings_fitness
WHERE user_id <> @CurrentUser
UNION ALL
-- latest result of the current user
SELECT id,
workout_id,
level_id,
user_id,
total_time,
score,
datetime_added,
ROW_NUMBER() OVER (ORDER BY datetime_added DESC) rowID
FROM nodefit_rankings_fitness
WHERE user_id = @CurrentUser
)
SELECT *
,ROW_NUMBER() OVER (ORDER BY score DESC) AS RankID
FROM DataSource
WHERE rowID = 1
ORDER BY RankID;
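As a rough check of the idea (best result of everyone else, latest result of the current user, then one overall row number), here is a sketch against SQLite via Python's sqlite3 module. It assumes SQLite 3.25+ for window functions; the sample rows, the reduced column set, the :uid parameter, and the rn / rank_id aliases are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodefit_rankings_fitness (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        score INTEGER,
        datetime_added TEXT
    );
    -- Hypothetical data: user 2 once scored 20, but the latest workout scored 12.
    INSERT INTO nodefit_rankings_fitness VALUES
        (1, 1, 15, '2020-01-01 10:00:00'),
        (2, 2, 20, '2020-01-01 11:00:00'),
        (3, 2, 12, '2020-01-05 11:00:00'),
        (4, 3, 10, '2020-01-02 12:00:00');
""")

ranking = conn.execute("""
    WITH DataSource AS (
        -- best result of every other user
        SELECT user_id, score,
               row_number() OVER (PARTITION BY user_id
                                  ORDER BY score DESC, datetime_added DESC) AS rn
        FROM nodefit_rankings_fitness
        WHERE user_id <> :uid
        UNION ALL
        -- latest result of the current user
        SELECT user_id, score,
               row_number() OVER (ORDER BY datetime_added DESC) AS rn
        FROM nodefit_rankings_fitness
        WHERE user_id = :uid
    )
    SELECT user_id, score, row_number() OVER (ORDER BY score DESC) AS rank_id
    FROM DataSource
    WHERE rn = 1
    ORDER BY rank_id
""", {"uid": 2}).fetchall()

for row in ranking:
    print(row)
```

With this data, user 2 lands in second place on the strength of the latest score of 12, even though an older attempt scored 20.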

SQL - Select maximum from each table, ordered by said maximum and then by created_at

SELECT id, author_id, max(result_score) as maxscore
FROM submissions
WHERE challenge_id = 10
GROUP BY author_id
ORDER BY maxscore DESC, created_at ASC
This query gets one submission from each author: the one with the highest score and, among equal scores, the earliest created_at. In the end, we should end up with one submission per author, ordered by maxscore and then by created_at.
This worked perfectly in SQLite3, but it fails to compile on PostgreSQL as it is more strict.
PostgreSQL requires id to either be used in a GROUP BY clause or in some sort of aggregate function.
I tried all sort of approaches, using DISTINCT ON or HAVING but could not get it to work. Is this sort of query possible and if yes, what is a way to achieve what I want here?
I managed to solve this using the rank function like this
SELECT id, author_id, result_score
FROM (
SELECT id, author_id, result_score, created_at,
rank() OVER (PARTITION BY author_id ORDER BY result_score DESC, created_at ASC) AS rank
FROM submissions
WHERE challenge_id = %s) as sub
WHERE sub.rank = 1
ORDER BY result_score DESC, created_at ASC
It only works in SQLite because SQLite accepts non-aggregated ("bare") columns such as id and created_at in a GROUP BY query. It would fail in almost any other database, because it is not standard SQL.
In Postgres, you can do what you want using DISTINCT ON:
SELECT DISTINCT ON (author_id) id, author_id, result_score
FROM submissions
WHERE challenge_id = 10
ORDER BY author_id, result_score DESC, created_at ASC;
EDIT:
If you want the final result sorted by result_score, then you can use a subquery:
SELECT s.*
FROM (SELECT DISTINCT ON (author_id) id, author_id, result_score, created_at
FROM submissions
WHERE challenge_id = 10
ORDER BY author_id, result_score DESC, created_at ASC
) s
ORDER BY result_score DESC, created_at ASC;

Excluding only one MIN value on Oracle SQL

I am trying to select all but the lowest value in a column (GameScore), but when there are two of this lowest value, my code excludes both (I know why it does this, I just don't know exactly how to correct it and include one of the two lowest values).
The code looks something like this:
SELECT Id, SUM(Score) / COUNT(Score) AS Score
FROM
(SELECT Id, Score
FROM GameScore
WHERE Game_No = 1
AND Score NOT IN
(SELECT MIN(Score)
FROM GameScore
WHERE Game_No = 1
GROUP BY Id))
GROUP BY Id
So if I am drawing from 5 values, but one of the rows only pulls 3 scores because the bottom two are the same, how do I include the 4th? Thanks.
In order to do this you have to tell the rows apart somehow; your current issue is that the 2 lowest scores are the same, so any (in)equality operation performed on one of them treats the other one identically.
You could use something like the analytic query ROW_NUMBER() to uniquely identify rows:
select id, sum(score) / count(score) as score
from ( select id, score, row_number() over (order by score) as score_rank
from gamescore
where game_no = 1
)
where score_rank <> 1
group by id
ROW_NUMBER():
assigns a unique number to each row to which it is applied (either each row in the partition or each row returned by the query), in the ordered sequence of rows specified in the order_by_clause, beginning with 1.
As the ORDER BY clause is on SCORE in ascending order, exactly one of the rows with the lowest score will be removed. Which of the tied rows is removed is arbitrary unless you add other tie-breaker conditions to the ORDER BY.
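To see that only one of the tied lowest scores is dropped, here is a small sketch run against SQLite through Python's sqlite3 module instead of Oracle (SQLite 3.25+ assumed for row_number()). The sample rows are invented, and the query returns the count and sum separately so the result does not depend on how the database handles integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gamescore (id INTEGER, game_no INTEGER, score INTEGER);
    -- Hypothetical data: five scores for one id, the two lowest are equal.
    INSERT INTO gamescore VALUES
        (1, 1, 2), (1, 1, 2), (1, 1, 4), (1, 1, 6), (1, 1, 8);
""")

# Row number 1 is one of the two rows with score 2; every other row is kept.
result = conn.execute("""
    SELECT id, count(score) AS kept, sum(score) AS total
    FROM (SELECT id, score,
                 row_number() OVER (ORDER BY score) AS score_rank
          FROM gamescore
          WHERE game_no = 1)
    WHERE score_rank <> 1
    GROUP BY id
""").fetchall()

print(result)
```

Four of the five scores survive (2, 4, 6, 8), whereas the NOT IN version from the question would keep only three.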
You can do this a few ways, including what @Ben shows. Coming from a mostly SQL Server background, I was curious whether just ROWNUM could be used and found this piece on ROWNUM vs ROW_NUMBER interesting. I'm not sure if it is dated.
All in a SQLFiddle.
Note: I'm using subquery factoring (a CTE) as I think it reads more clearly than in-line subqueries.
Using ROWNUM:
WITH OrderedScore AS (
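-- ROWNUM is assigned before ORDER BY is applied, so sort in an inner query first
SELECT id, game_no, score
,rownum as score_rank
FROM (SELECT id, game_no, score
FROM GameScore
WHERE game_no = 1
ORDER BY Score ASC)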
)
SELECT id
,sum(score)/count(score)
FROM OrderedScore
WHERE score_rank > 1
GROUP BY id;
Using ROW_NUMBER() OVER(ORDER BY...) as Ben does:
WITH OrderedScore AS (
SELECT id, game_no, score
,ROW_NUMBER() OVER(ORDER BY score ASC) as score_rank
FROM GameScore
WHERE game_no = 1
ORDER BY Score ASC
)
SELECT id
,sum(score)/count(score)
FROM OrderedScore
WHERE score_rank > 1
GROUP BY id;
Using ROW_NUMBER() OVER(PARTITION BY...ORDER BY...), which I think leads to more flexibility if you want to remove the low score by game_no or id at some point:
WITH OrderedScore AS (
SELECT id, game_no, score
,ROW_NUMBER() OVER(PARTITION BY id ORDER BY score ASC) as score_rank
FROM GameScore
WHERE game_no = 1
ORDER BY Score ASC
)
SELECT id
,sum(score)/count(score)
FROM OrderedScore
WHERE score_rank > 1
GROUP BY id;

How to query for records based on a group and ordering?

Given the following models:
User
id
UserPulses
id, user_id, group_id, created_at
What I would like to do is obtain all of a user's UserPulses grouped by (group_id) and only obtain the most recent UserPulse per group_id. I've been able to do this by looping through group by group, but that takes a large number of queries. Is this possible with one query?
Something like:
user.user_pulses.group("group_id")
Any ideas? Thanks
You can't do this reliably through the usual ActiveRecord interface but you can do it through SQL using a window function. You want some SQL like this:
select id, user_id, group_id, created_at
from (
select id, user_id, group_id, created_at,
row_number() over (partition by group_id order by created_at desc, id desc) as r
from user_pulses
where user_id = :user_id
) dt
where r = 1
and then wrap that in find_by_sql:
pulses = UserPulses.find_by_sql([%q{
select id, user_id, group_id, created_at
from (
select id, user_id, group_id, created_at,
row_number() over (partition by group_id order by created_at desc, id desc) as r
from user_pulses
where user_id = :user_id
) dt
where r = 1
}, :user_id => user.id])
The window function part essentially does a local GROUP BY on each group_id, sorts the rows in each group (with id desc as the secondary sort key as a "just in case" tie breaker), and stores the per-group row number in r. Then the outer query keeps only the first row in each group (where r = 1) and peels off the original user_pulses columns.
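If you want to poke at the SQL outside of Rails, here is a sketch that runs the same query through Python's sqlite3 module (SQLite 3.25+ assumed for window functions). The sample rows are made up, and an outer ORDER BY group_id is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_pulses (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        group_id INTEGER,
        created_at TEXT
    );
    -- Hypothetical data: user 7 has two pulses in each group;
    -- the two in group 2 share the same created_at.
    INSERT INTO user_pulses VALUES
        (1, 7, 1, '2020-01-01'),
        (2, 7, 1, '2020-01-03'),
        (3, 7, 2, '2020-01-02'),
        (4, 7, 2, '2020-01-02'),
        (5, 8, 1, '2020-01-09');
""")

# r = 1 is the most recent pulse per group; id desc breaks created_at ties.
latest = conn.execute("""
    SELECT id, user_id, group_id, created_at
    FROM (SELECT id, user_id, group_id, created_at,
                 row_number() OVER (PARTITION BY group_id
                                    ORDER BY created_at DESC, id DESC) AS r
          FROM user_pulses
          WHERE user_id = :user_id) AS dt
    WHERE r = 1
    ORDER BY group_id
""", {"user_id": 7}).fetchall()

for row in latest:
    print(row)
```

In group 2 the two pulses share a timestamp, so the id desc tie breaker decides and the row with id 4 is returned.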
You can use the PostgreSQL specific extension of the SQL feature DISTINCT: DISTINCT ON
SELECT DISTINCT ON (group_id)
id, user_id, group_id, created_at
FROM user_pulses
WHERE user_id = :user_id
ORDER BY group_id, created_at DESC, id; -- id just to break ties
Simpler than window functions (but not as portable) and probably fastest.
More details under this related question.
Something like this, perhaps. But there could be multiple records for a user/group_id combo if they share the same date.
SELECT p.id, p.user_id, p.group_id, p.created_at
FROM UserPulses p
,( SELECT user_id, group_id, MAX(created_at) as max_date
FROM UserPulses
GROUP BY user_id, group_id ) m
WHERE p.user_id = m.user_id
AND p.group_id = m.group_id
AND p.created_at = m.max_date