I'm having trouble ranking achievements in MySQL

I'm making a system in which users earn achievements for certain actions on the site.
The database has three tables: users, achievements, and a table linking the two.
For the ranking, what matters is the table that links users and achievements.
It's called user_achievements and is structured like this:
| id | user_id | achievement_id | created_at | updated_at |
The top of the ranking should be the person with the most achievements. Among users with the same count, whoever reached that count first should rank higher.
The query I put together:
SELECT user_id,
COUNT(achievement_id) AS 'total',
created_at
FROM `user_achievements`
GROUP BY user_id ORDER BY total DESC, created_at ASC LIMIT 10
This query runs, but there is a problem with the GROUP BY that counts each user's achievements (the "total"). The created_at column does not return the user's latest created_at. It returns the user's first created_at instead, so I can't use it to sort by who was first to earn that number of achievements.
My system uses Ruby on Rails, so a solution to this query in ActiveRecord would be a great help.
Thanks a lot!
Sorry for my English...

SELECT user_id,
COUNT(achievement_id) AS 'total',
MAX(created_at) as 'lastTS'
FROM `user_achievements`
GROUP BY user_id
ORDER BY total DESC, lastTS ASC LIMIT 10
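The following is a minimal sketch of why MAX(created_at) is the right tie-breaker. Each user's latest achievement is the moment they reached their current total. The sketch runs the answer's query against SQLite instead of MySQL. It assumes timestamps are stored as ISO-8601 text so they sort chronologically, and the sample rows are hypothetical.

```python
import sqlite3

def top_achievers(conn, limit=10):
    # Most achievements first; among equal totals, the user whose latest
    # achievement (the one that brought them to that total) is earliest wins.
    return conn.execute(
        """
        SELECT user_id,
               COUNT(achievement_id) AS total,
               MAX(created_at)       AS lastTS
        FROM user_achievements
        GROUP BY user_id
        ORDER BY total DESC, lastTS ASC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_achievements (id INTEGER PRIMARY KEY, user_id INTEGER,"
    " achievement_id INTEGER, created_at TEXT)"
)
# Hypothetical data: users 1 and 2 both have two achievements, but user 2
# got their second one first. User 3 has only one achievement.
conn.executemany(
    "INSERT INTO user_achievements (user_id, achievement_id, created_at) VALUES (?, ?, ?)",
    [
        (1, 10, "2024-01-01 09:00"), (1, 11, "2024-01-05 09:00"),
        (2, 10, "2024-01-02 09:00"), (2, 12, "2024-01-03 09:00"),
        (3, 10, "2024-01-01 08:00"),
    ],
)
print(top_achievers(conn))
```

User 2 ranks above user 1 here. With MIN(created_at), or an arbitrary first row, user 1 would win the tie because of their earlier first achievement, which is exactly the problem described in the question.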

I know you've selected your answer, but in case you still want a solution in ActiveRecord...
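class User < ActiveRecord::Base
  has_many :user_achievements
  # Most achievements first; ties go to whoever reached that count earliest.
  scope :ranked, -> { joins(:user_achievements).group('users.id')
    .order(Arel.sql('COUNT(user_achievements.id) DESC, MAX(user_achievements.created_at) ASC')).limit(10) }
end
Then you just need to call User.ranked (or @users.ranked) to get the result you need.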

Related

Finding entry count for each unique id in SQL

I have an SQL database that records each time a person submits an entry. I want to count how many times each person (identified by a unique id) makes a claim. Each unique id can make multiple entries in the table, and I want to find out how many entries each person has made.
I also want to filter people by the number of entries they have made, for example at most 10.
select id, entry, COUNT(ID) from Table where COUNT(entry) <=10 GROUP BY ID
This is my thinking so far, but I haven't had much success. If anyone could help, I would greatly appreciate it.
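This is all you need. Aggregates like COUNT can't be used in WHERE, which filters rows before grouping, so filter the groups with HAVING instead: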
select id, COUNT(*) cnt
from Table
GROUP BY ID
having COUNT(*) <= 10
order by 2 desc -- optional descending count
The ORDER BY 2 is optional. It sorts by the second column (the count), and you can use either DESC or ASC.
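Here is a minimal sketch, run against SQLite, that shows the WHERE-vs-HAVING difference. The table is called entries here because Table is a reserved word, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for "Table" in the question: one row per entry.
conn.execute("CREATE TABLE entries (id INTEGER, entry TEXT)")
conn.executemany(
    "INSERT INTO entries (id, entry) VALUES (?, ?)",
    [(1, "a")] * 3 + [(2, "b")] * 12 + [(3, "c")] * 10,
)

# An aggregate in WHERE is rejected: WHERE runs before rows are grouped.
where_rejected = False
try:
    conn.execute("SELECT id FROM entries WHERE COUNT(*) <= 10 GROUP BY id")
except sqlite3.OperationalError:
    where_rejected = True

# HAVING filters the groups after COUNT(*) has been computed.
rows = conn.execute(
    """
    SELECT id, COUNT(*) AS cnt
    FROM entries
    GROUP BY id
    HAVING COUNT(*) <= 10
    ORDER BY cnt DESC
    """
).fetchall()
print(where_rejected, rows)
```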

SQL schema Site Leader Board

I'm trying to set up a site that has challenges, with a leaderboard for each challenge and an all-time leaderboard.
I have a challenges table that looks like this:
| Challenge ID | Challenge Name | Challenge Date | Sport | Prize Pool |
Then I need a way for each challenge to have its own leaderboard of, say, 50 people, linked by the challenge ID (so the challenge ID is also the leaderboard ID).
Each challenge's leaderboard of 50 people will look something like this:
| Challenge ID | User | Place | Prize Won |
My question has two parts:
How can I have a table created automatically when a new challenge is added to the challenges table?
How can I get a site-wide leaderboard across every challenge that shows the following:
| Rank | User | Prize Money Won (total across every challenge placed) |
with users ranked by how much money they have won?
I know this is a lot of questions wrapped into one, covering both schema design and logic.
Any insights greatly appreciated.
A better approach than one table per challenge is one table for all of them. That way you can compute grand totals and individual challenge rankings all with the same table. You'd also want to not record the place directly but compute it on the fly with the appropriate window function depending on how you want to handle ties (rank(), dense_rank(), and row_number() will have different results in those cases); that way you don't have to keep adjusting it as you add new records.
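To make the tie-handling difference concrete, here is a small sketch on hypothetical data where two users won the same amount. It runs in SQLite, which needs version 3.25 or later for window functions (most current Python builds bundle a newer one). The row_number() column gets user_id as an explicit tie-breaker so its output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (user_id INTEGER, prize_amount NUMERIC)")
# Hypothetical data: users 2 and 3 tie on 300.
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [(1, 500), (2, 300), (3, 300), (4, 100)])

rows = conn.execute(
    """
    SELECT user_id,
           rank()       OVER (ORDER BY prize_amount DESC)          AS rnk,
           dense_rank() OVER (ORDER BY prize_amount DESC)          AS drnk,
           -- row_number() never ties; user_id decides the order within a tie
           row_number() OVER (ORDER BY prize_amount DESC, user_id) AS rn
    FROM scores
    ORDER BY prize_amount DESC, user_id
    """
).fetchall()
for row in rows:
    print(row)
```

rank() skips a place after a tie (1, 2, 2, 4), dense_rank() does not (1, 2, 2, 3), and row_number() gives every row a distinct place (1, 2, 3, 4).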
A table something like this (you didn't specify a database, so I'm going to assume SQLite; adjust as needed):
CREATE TABLE challenge_scores(user_id INTEGER REFERENCES users(id),
challenge_id INTEGER REFERENCES challenges(id),
prize_amount NUMERIC,
PRIMARY KEY(user_id, challenge_id));
will let you do things like
SELECT *
FROM (SELECT user_id,
sum(prize_amount) AS total,
rank() OVER (ORDER BY sum(prize_amount) DESC) AS place
FROM challenge_scores
GROUP BY user_id)
WHERE place <= 50
ORDER BY place;
for the global leaderboard, or the similar:
SELECT *
FROM (SELECT user_id,
prize_amount,
rank() OVER (ORDER BY prize_amount DESC) AS place
FROM challenge_scores
WHERE challenge_id = :some_challenge_id)
WHERE place <= 50
ORDER BY place;
for a specific challenge's leaderboard.
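Here is a sketch that runs both leaderboard queries against the single-table design, using hypothetical results for two challenges. It uses SQLite 3.25+ and drops the REFERENCES clauses so it stays self-contained. The per-challenge query has no GROUP BY, because the (user_id, challenge_id) primary key already guarantees one row per user within a challenge.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE challenge_scores(user_id INTEGER,
                                     challenge_id INTEGER,
                                     prize_amount NUMERIC,
                                     PRIMARY KEY(user_id, challenge_id))"""
)
# Hypothetical results: challenge 1 has three entrants, challenge 2 has two.
conn.executemany(
    "INSERT INTO challenge_scores VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 1, 50), (3, 1, 10),
     (2, 2, 200), (3, 2, 50)],
)

GLOBAL_BOARD = """
SELECT * FROM (SELECT user_id, sum(prize_amount) AS total,
                      rank() OVER (ORDER BY sum(prize_amount) DESC) AS place
               FROM challenge_scores GROUP BY user_id)
WHERE place <= 50 ORDER BY place
"""
CHALLENGE_BOARD = """
SELECT * FROM (SELECT user_id, prize_amount,
                      rank() OVER (ORDER BY prize_amount DESC) AS place
               FROM challenge_scores WHERE challenge_id = :some_challenge_id)
WHERE place <= 50 ORDER BY place
"""

global_board = conn.execute(GLOBAL_BOARD).fetchall()
challenge_1 = conn.execute(CHALLENGE_BOARD, {"some_challenge_id": 1}).fetchall()
print(global_board)
print(challenge_1)
```

Adding a new challenge is just more INSERTs into the same table. No per-challenge table has to be created.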

PostgreSQL: five stars rating, ordering objects

In my database, users can give each group a vote from 1 to 5 stars.
Then I have to display a leaderboard based on those votes.
Until now I have been ordering them by average vote, without any weighting. This isn't ideal, because a group with a 5.0 average from 20 votes ranks above a group with a 4.9 average from 10,000 votes.
This is my votes table:
CREATE TABLE IF NOT EXISTS votes(
user_id BIGINT,
group_id BIGINT,
vote SMALLINT,
vote_date timestamp,
PRIMARY KEY (user_id, group_id)
);
This is how I sort them now:
SELECT
group_id,
COUNT(vote) AS amount,
ROUND(AVG(vote), 1) AS average,
RANK() OVER (PARTITION BY s.lang ORDER BY ROUND(AVG(vote), 1) DESC, COUNT(vote) DESC)
FROM votes
LEFT OUTER JOIN supergroups AS s
USING (group_id)
GROUP BY group_id, s.lang, s.banned_until, s.bot_inside
HAVING
(s.banned_until IS NULL OR s.banned_until < now())
AND COUNT(vote) >= %s
AND s.bot_inside IS TRUE
How could I add some kind of weighting to solve the problem described above?
I read about a Bayesian approach here, but I'm not sure it's the right fit. From what I read, it's meant for sorting the top n elements, while I need a leaderboard that includes all of them.
You're going to have to fudge it somehow,
perhaps this way:
order by (0.0+sum(vote))/(count(vote)+log(count(vote))) desc
Or sqrt might work better than log. It depends on how much weight you want the number of votes to have.
order by (0.0+sum(vote))/(count(vote)+sqrt(count(vote))) desc
Basically, the fudge needs to be a function that grows more slowly than its input. You could even try a constant.
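The following sketch computes the same fudged score in Python for the two groups from the question, so you can compare fudge functions before putting one into the ORDER BY. It uses math.log10 because PostgreSQL's one-argument log() is base 10. The function name fudged_score is just illustrative.

```python
import math

def fudged_score(vote_sum, vote_count, fudge=math.log10):
    """Sum of votes divided by (count + fudge(count)).

    fudge should grow more slowly than its input, so it pulls down
    groups with few votes proportionally more than groups with many.
    """
    return vote_sum / (vote_count + fudge(vote_count))

# The groups from the question: 5.0 from 20 votes vs 4.9 from 10,000 votes.
small = (100, 20)
large = (49000, 10000)

for name, fudge in [("log10", math.log10), ("sqrt", math.sqrt),
                    ("constant", lambda n: 1.0)]:
    print(name, round(fudged_score(*small, fudge), 3),
          round(fudged_score(*large, fudge), 3))
```

All three fudges put the 10,000-vote group first. One caveat of log: since log10(1) = 0, a group with a single 5-star vote still scores 5.0. The sqrt and constant versions do penalize it, and a minimum vote count like the question's HAVING COUNT(vote) >= %s also helps.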

How to efficiently get a range of ranked users (for a leaderboard) using Postgresql

I have read many posts on this topic, such as
mysql-get-rank-from-leaderboards.
However, none of the solutions are efficient at scale for getting a range of ranks from the database.
The problem is simple. Suppose we have a Postgres table with an "id" column and another INTEGER column whose values are not unique, but we have an index for this column.
For example, the table could be:
CREATE TABLE my_game_users (id serial PRIMARY KEY, rating INTEGER NOT NULL);
The goal
Define a rank for users by ordering them on the "rating" column, descending
Be able to query for a list of ~50 users ordered by this new "rank", centered at any particular user
For example, we might return users with ranks { 15, 16, ..., 64, 65 } where the center user has rank #40
Performance must scale, e.g. be under 80 ms for 100,000 users.
Attempt #1: row_number() window function
WITH my_ranks AS
(SELECT my_game_users.*, row_number() OVER (ORDER BY rating DESC) AS rank
FROM my_game_users)
SELECT *
FROM my_ranks
WHERE rank >= 4000 AND rank <= 4050
ORDER BY rank ASC;
This "works", but the queries average 550ms with 100,000 users on a fast laptop without any other real work being done.
I tried adding indexes, and re-phrasing this query to not use the "WITH" syntax, and nothing worked to speed it up.
Attempt #2 - count the number of rows with a greater rating value
I tried a query like this:
SELECT t1.*,
(SELECT COUNT(*)
FROM my_game_users t2
WHERE (t1.rating, -t1.id) <= (t2.rating, -t2.id)
) AS rank
FROM my_game_users t1
WHERE id = 2000;
This is decent: this query takes about 120 ms with 100,000 users having random ratings. However, it only returns the rank for the user with a particular id (2000).
I can't see any efficient way to extend this query to get a range of ranks. Any attempt at extending this makes a very slow query.
I only know the ID of the "center" user, since the users have to be ordered by rank before we know which ones are in the range!
Attempt #3: in-memory ordered Tree
I ended up using a Java TreeSet to store the ranks. I can update the TreeSet whenever a new user is inserted into the database, or a user's rating changes.
This is super fast, around 25 ms with 100,000 users.
However, it has a serious drawback that it's only updated on the Webapp node that serviced the request. I'm using Heroku and will deploy multiple nodes for my app. So, I needed to add a scheduled task for the server to re-build this ranking tree every hour, to make sure the nodes don't get too out-of-sync!
If anyone knows of an efficient way to do this in Postgres with a full solution, then I am all ears!
You can get the same results by using ORDER BY rating DESC with OFFSET and LIMIT to get the users within a certain rank range.
WITH my_ranks AS
(SELECT my_game_users.*, row_number() OVER (ORDER BY rating DESC) AS rank FROM my_game_users)
SELECT * FROM my_ranks WHERE rank >= 4000 AND rank <= 4050 ORDER BY rank ASC;
The query above returns essentially the same rows as
select *, rank() over (order by rating desc) as rank
from my_game_users
order by rating desc
limit 51 offset 3999
OFFSET is the number of rows to skip, so skipping 3,999 rows starts at rank 4000, and ranks 4000 through 4050 inclusive are 51 rows.
If you want to select users around rank #40, you could select ranks #15-#65:
select *, rank() over (order by rating desc) as rank
from my_game_users
order by rating desc
limit 51 offset 14
Thanks, @FuzzyTree!
Your solution doesn't quite give me everything I need, but it nudged me in the right direction. Here's the full solution I'm going with for now.
The only limitation with your solution is that there's no way to get a unique rank for a particular user. All users with the same rating get the same rank, and the order among them is undefined by the SQL standard. If I knew the OFFSET ahead of time, your rank would be good enough, but I have to get the rank of a particular user first.
My solution is to do the following query to get a range of ranks:
SELECT * FROM my_game_users ORDER BY rating DESC, id ASC LIMIT ? OFFSET ?
This defines the ranks uniquely: by rating, then by who joined the game first (lower id).
To make this efficient, I'm creating an index on (rating DESC, id).
Then, I'm getting a particular user's rank to plug in to this query with:
SELECT COUNT(*) FROM my_game_users WHERE rating > ? OR (rating = ? AND id < ?)
I actually made this more efficient with:
SELECT (SELECT COUNT(*) FROM my_game_users WHERE rating > ?) + (SELECT COUNT(*) FROM my_game_users WHERE rating = ? AND id < ?) + 1
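Here is a sketch of the whole scheme end to end. It computes a user's unique rank with the two-COUNT query, then fetches the window centered on that rank with ORDER BY rating DESC, id ASC LIMIT ? OFFSET ?. It runs against SQLite rather than PostgreSQL, so it only checks correctness, not the timings above. The ratings are hypothetical and tie-heavy on purpose. The helper names rank_of and users_around are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_game_users (id INTEGER PRIMARY KEY, rating INTEGER NOT NULL)")
# Index matching the sort order, as in the post.
conn.execute("CREATE INDEX idx_rating_desc_and_id ON my_game_users (rating DESC, id)")
# Hypothetical data: 200 users, ratings 0..49, each rating shared by 4 users.
conn.executemany("INSERT INTO my_game_users (id, rating) VALUES (?, ?)",
                 [(i, (i * 37) % 50) for i in range(1, 201)])

def rank_of(user_id):
    """1-based rank: higher ratings, plus equal ratings with a lower id, plus one.

    Assumes the user exists.
    """
    (rating,) = conn.execute(
        "SELECT rating FROM my_game_users WHERE id = ?", (user_id,)).fetchone()
    (rank,) = conn.execute(
        """SELECT (SELECT COUNT(*) FROM my_game_users WHERE rating > ?)
                + (SELECT COUNT(*) FROM my_game_users WHERE rating = ? AND id < ?) + 1""",
        (rating, rating, user_id),
    ).fetchone()
    return rank

def users_around(user_id, radius=25):
    """Users ranked rank-radius .. rank+radius; the offset is clamped at the top."""
    offset = max(rank_of(user_id) - 1 - radius, 0)
    return conn.execute(
        "SELECT id, rating FROM my_game_users ORDER BY rating DESC, id ASC LIMIT ? OFFSET ?",
        (2 * radius + 1, offset),
    ).fetchall()

ordered = [r[0] for r in conn.execute(
    "SELECT id FROM my_game_users ORDER BY rating DESC, id ASC")]
print(rank_of(ordered[99]), len(users_around(ordered[99])))
```

The COUNT-based rank matches each user's position in the ORDER BY rating DESC, id ASC listing. That is why the rank minus one can be used directly as an OFFSET.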
Even with these queries, it takes about 78 ms (both average and median) to get the ranks around a user. If anyone has a good idea how to speed these up, I'm all ears!
For example, getting a range of ranks takes about 60 ms, and EXPLAIN shows:
EXPLAIN SELECT * FROM word_users ORDER BY rating DESC, id ASC LIMIT 50 OFFSET 50000;
"Limit (cost=6350.28..6356.63 rows=50 width=665)"
" -> Index Scan using idx_rating_desc_and_id on word_users (cost=0.29..12704.83 rows=100036 width=665)"
So it's using the rating and id index, yet it still has this highly variable cost from 0.29...12704.83. Any ideas on how to improve it?
If you order it in descending order, you already have it in the right order. Use the row_number() function (see "Select Row number in postgres").
You could also use an in-memory cache such as Redis. It's a separate application that can serve multiple app instances, even remotely.

Sql query - selecting top 5 rows and further selecting rows only if User is present

I'm kind of stuck on how to implement this query. It's pretty similar to a query I posted earlier, but I'm not able to crack it.
I have a shopping table where a record is inserted every time a user buys anything.
Some of the fields are
* shopping_id (primary key)
* store_id
* user_id
Now I need to pull only the list of stores where a given user (UserA) is among the top 5 visitors.
When I break it down - this is what I want to accomplish:
* Find all stores where this UserA has visited
* For each of these stores - see who the top 5 visitors are.
* Select the store only if UserA is among the top 5 visitors.
The corresponding queries would be:
select store_id from shopping where user_id = xxx
select user_id,count(*) as 'visits' from shopping
where store_id in (select store_id from shopping where user_id = xxx)
group by user_id
order by visits desc
limit 5
Now I need to check whether UserA is present in this result set and select the store only if he is.
For example, suppose he has visited a store 5 times, but 5 or more people have visited that store more than 5 times. Then that store should not be selected.
So I'm kind of lost here.
Thanks for your help
This should do it. It uses an intermediate VIEW to figure out how many times each user has shopped at each store. Also, it assumes you have a stores table somewhere with each store_id listed once. If that's not true, you can change SELECT store_id FROM stores to SELECT DISTINCT store_id FROM shopping for the same effect but slower results.
CREATE VIEW shop_results (store_id, user_id, purchase_count) AS
SELECT store_id, user_id, COUNT(*)
FROM shopping GROUP BY store_id, user_id;
SELECT store_id FROM stores
WHERE 'UserA' IN
(SELECT user_id FROM shop_results
WHERE shop_results.store_id = stores.store_id
ORDER BY purchase_count DESC LIMIT 5);
You can combine these into a single query by placing the SELECT from the VIEW inside the sub-query, but I think it's easier to read this way. You may also want that aggregated information elsewhere in the system, and it's more consistent to define it once in a view than to repeat it in multiple queries.
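Here is a sketch that runs the view-plus-subquery approach in SQLite, using hypothetical visit counts that include the question's example. In store 1, UserA has 5 visits but five other users have 6, so the store must not be selected. Two caveats. If several users tie for fifth place, which of them lands in the top 5 is arbitrary. On MySQL, LIMIT inside an IN subquery is rejected ("doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'"), so there the query would need rewriting, for example with a derived table or window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stores (store_id INTEGER PRIMARY KEY);
    CREATE TABLE shopping (shopping_id INTEGER PRIMARY KEY,
                           store_id INTEGER, user_id TEXT);
    CREATE VIEW shop_results (store_id, user_id, purchase_count) AS
        SELECT store_id, user_id, COUNT(*)
        FROM shopping GROUP BY store_id, user_id;
    """
)
# Hypothetical visits per store: {store_id: {user_id: number_of_purchases}}.
visits = {
    1: {"UserA": 5, "U1": 6, "U2": 6, "U3": 6, "U4": 6, "U5": 6},
    2: {"UserA": 3, "U1": 10, "U2": 1},
    3: {"U1": 4},
}
conn.executemany("INSERT INTO stores (store_id) VALUES (?)", [(s,) for s in visits])
conn.executemany(
    "INSERT INTO shopping (store_id, user_id) VALUES (?, ?)",
    [(store, user) for store, counts in visits.items()
     for user, n in counts.items() for _ in range(n)],
)

def stores_where_top5(user):
    # A store qualifies if the user is among its 5 highest purchase counts.
    return [row[0] for row in conn.execute(
        """SELECT store_id FROM stores
           WHERE ? IN (SELECT user_id FROM shop_results
                       WHERE shop_results.store_id = stores.store_id
                       ORDER BY purchase_count DESC LIMIT 5)
           ORDER BY store_id""",
        (user,),
    )]

print(stores_where_top5("UserA"))
```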