COUNT(*) function is returning multiple values - sql

I am writing a SQL query that needs to return the position of a particular entry, based on a grouped table.
Background info: I am coding a Golf Club Data Management system using Java and MS Access. In this system, the user can store their scores as new entries in a Scores table. Using this table, I have managed to extract a ranking of the top 3 golf players across all their recorded scores (I only used the top 3 to preserve screen space).
Select TOP 3 Username, Sum(Points)
FROM Scores
GROUP By Username
ORDER BY Sum(Points) desc
This produces the required result. However, if the current user falls outside of the top 3, I want to be able to tell the user where they currently sit in the complete ranking of all the players. So, I tried to write a query that counts the number of players having a sum of points below the current user. Here is my query:
Select COUNT(*)
From Scores
GROUP BY Username
HAVING Sum(Points) < (Select Sum(Points)
FROM Scores
WHERE Username = 'Golfer210'
GROUP By Username)
This does not produce the expected number 2; instead it returns multiple rows, one count per qualifying username.
I have tried removing the GROUP BY clause, but that returns null. COUNT(DISTINCT ...) refuses to work as well, and keeps returning a syntax error message no matter how I word it.
Questions: Is there a way to count the number of entries while using GROUP BY? If not, is there an easier, more practical way to select the position of an entry from the grouped table? Or can this only be done in Java, after the ranking has been extracted from the database? I have not been able to find a solution anywhere.

You need an additional level of aggregation:
SELECT COUNT(*)
FROM (SELECT COUNT(*)
      FROM Scores
      GROUP BY Username
      HAVING Sum(Points) < (SELECT Sum(Points)
                            FROM Scores
                            WHERE Username = 'Golfer210'
                           )
     ) AS s;
Note: You might want to check if your logic does what you expect when there are ties.
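If what you ultimately want is the user's position itself rather than the number of players below them, a similar hedged sketch (untested in MS Access; it gives tied players the same position by counting only strictly higher totals) would be:
SELECT COUNT(*) + 1 AS Position
FROM (SELECT Username
      FROM Scores
      GROUP BY Username
      HAVING Sum(Points) > (SELECT Sum(Points)
                            FROM Scores
                            WHERE Username = 'Golfer210'
                           )
     ) AS s;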

Related

SQL JOIN to select MAX value among multiple user attempts returns two values when both attempts have the same value

Good morning, everyone!
I have a pretty simple SELECT/JOIN statement that gets some imported data from a placement test and returns the highest-scored attempt a user made, the best score. Users can take this test multiple times, so we just use the best attempt. What if a user makes multiple attempts (say, takes it twice) and receives the SAME score both times?
My current query ends up returning BOTH of those records, as they're both equal, so MAX() returns both. There are no primary keys set up on this yet -- the query I'm using below is the one I hope to add into an INSERT statement for another table, once I only get a SINGLE best attempt per user (StudentID), and set that StudentID as the key. So you see my problem...
I've tried a few DISTINCT or TOP statements in my query but either I'm putting them into the wrong part of the query or they still return two records for a user who had identically scored attempts. Any suggestions?
SELECT p.*
FROM
(SELECT
StudentID, MAX(PlacementResults) AS PlacementResults
FROM AleksMathResults
GROUP BY StudentID)
AS mx
JOIN AleksMathResults p ON mx.StudentID = p.StudentID AND mx.PlacementResults = p.PlacementResults
ORDER BY
StudentID
Sounds like you want row_number():
SELECT amr.*
FROM (SELECT amr.*,
             ROW_NUMBER() OVER (PARTITION BY StudentID ORDER BY PlacementResults DESC) AS seqnum
      FROM AleksMathResults amr
     ) amr
WHERE seqnum = 1;
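If the winner among tied attempts also needs to be deterministic (say, keep the earliest attempt), a hedged variation adds a tie-breaking column to the ORDER BY; AttemptDate here is a hypothetical column name, not something from the original table:
SELECT amr.*
FROM (SELECT amr.*,
             ROW_NUMBER() OVER (PARTITION BY StudentID
                                ORDER BY PlacementResults DESC, AttemptDate ASC) AS seqnum
      FROM AleksMathResults amr  -- AttemptDate is assumed, not confirmed by the question
     ) amr
WHERE seqnum = 1;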

How to efficiently get a range of ranked users (for a leaderboard) using Postgresql

I have read many posts on this topic, such as
mysql-get-rank-from-leaderboards.
However, none of the solutions are efficient at scale for getting a range of ranks from the database.
The problem is simple. Suppose we have a Postgres table with an "id" column and another INTEGER column whose values are not unique, but we have an index for this column.
e.g. table could be:
CREATE TABLE my_game_users (id serial PRIMARY KEY, rating INTEGER NOT NULL);
The goal
Define a rank for users ordering users on the "rating" column descending
Be able to query for a list of ~50 users ordered by this new "rank", centered at any particular user
For example, we might return users with ranks { 15, 16, ..., 64, 65 } where the center user has rank #40
Performance must scale, e.g. be under 80 ms for 100,000 users.
Attempt #1: row_number() window function
WITH my_ranks AS
(SELECT my_game_users.*, row_number() OVER (ORDER BY rating DESC) AS rank
FROM my_game_users)
SELECT *
FROM my_ranks
WHERE rank >= 4000 AND rank <= 4050
ORDER BY rank ASC;
This "works", but the queries average 550ms with 100,000 users on a fast laptop without any other real work being done.
I tried adding indexes, and re-phrasing this query to not use the "WITH" syntax, and nothing worked to speed it up.
Attempt #2 - count the number of rows with a greater rating value
I tried a query like this:
SELECT t1.*,
(SELECT COUNT(*)
FROM my_game_users t2
WHERE (t1.rating, -t1.id) <= (t2.rating, -t2.id)
) AS rank
FROM my_game_users t1
WHERE id = 2000;
This is decent, this query takes about 120ms with 100,000 users having random ratings. However, this only returns the rank for user with a particular id (2000).
I can't see any efficient way to extend this query to get a range of ranks. Any attempt at extending this makes a very slow query.
I only know the ID of the "center" user, since the users have to be ordered by rank before we know which ones are in the range!
Attempt #3: in-memory ordered Tree
I ended up using a Java TreeSet to store the ranks. I can update the TreeSet whenever a new user is inserted into the database, or a user's rating changes.
This is super fast, around 25 ms with 100,000 users.
However, it has a serious drawback that it's only updated on the Webapp node that serviced the request. I'm using Heroku and will deploy multiple nodes for my app. So, I needed to add a scheduled task for the server to re-build this ranking tree every hour, to make sure the nodes don't get too out-of-sync!
If anyone knows of an efficient way to do this in Postgres with full solution, then I am all ears!
You can get the same results by using order by rating desc and offset and limit to get users between a certain rank.
WITH my_ranks AS
(SELECT my_game_users.*, row_number() OVER (ORDER BY rating DESC) AS rank FROM my_game_users)
SELECT * FROM my_ranks WHERE rank >= 4000 AND rank <= 4050 ORDER BY rank ASC;
The query above is roughly equivalent to
select *, rank() over (order by rating desc) rank
from my_game_users
order by rating desc
limit 50 offset 4000
If you want to select users around rank #40 you could select ranks #15-#65
select *, rank() over (order by rating desc) rank
from my_game_users
order by rating desc
limit 50 offset 15
Thanks, @FuzzyTree!
Your solution doesn't quite give me everything I need, but it nudged me in the right direction. Here's the full solution I'm going with for now.
The only limitation with your solution is that there's no way to get a unique rank for a particular user. All users with the same rating would have the same rank (or at least it is undefined by SQL standard). If I knew the OFFSET ahead of time, then your rank would be good enough, but I have to get the rank of a particular user first.
My solution is to do the following query to get a range of ranks:
SELECT * FROM my_game_users ORDER BY rating DESC, id ASC LIMIT ? OFFSET ?
This is basically uniquely defining the ranks by rating, then by who joined the Game first (lower id).
To make this efficient I'm creating an index on (rating DESC, id)
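In case it helps, a minimal sketch of that index DDL (the name matches the one in the EXPLAIN output further down):
CREATE INDEX idx_rating_desc_and_id ON my_game_users (rating DESC, id);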
Then, I'm getting a particular user's rank to plug in to this query with:
SELECT COUNT(*) FROM my_game_users WHERE rating > ? OR (rating = ? AND id < ?)
I actually made this more efficient with:
SELECT (SELECT COUNT(*) FROM my_game_users WHERE rating > ?) + (SELECT COUNT(*) FROM my_game_users WHERE rating = ? AND id < ?) + 1
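For illustration only, here is a hedged sketch of how the two queries chain together; the rating 1500, id 2000, and resulting rank 4025 are hypothetical values, and the centering arithmetic (offset = rank - 26 for a 50-row window) is my own assumption:
-- 1) unique rank of the user (hypothetical rating = 1500, id = 2000)
SELECT (SELECT COUNT(*) FROM my_game_users WHERE rating > 1500)
     + (SELECT COUNT(*) FROM my_game_users WHERE rating = 1500 AND id < 2000)
     + 1 AS user_rank;
-- 2) ~50 users centered on that rank, e.g. user_rank = 4025 -> OFFSET 3999
SELECT * FROM my_game_users ORDER BY rating DESC, id ASC LIMIT 50 OFFSET 3999;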
Now, even with these queries it takes about 78ms average and median time to get the ranks around a user. If anyone has a good idea how to speed these up I'm all ears!
For example, getting a range of ranks takes about 60ms, and explaining it yields:
EXPLAIN SELECT * FROM word_users ORDER BY rating DESC, id ASC LIMIT 50 OFFSET 50000;
"Limit (cost=6350.28..6356.63 rows=50 width=665)"
" -> Index Scan using idx_rating_desc_and_id on word_users (cost=0.29..12704.83 rows=100036 width=665)"
So, it's using the rating and id index, yet it still has this highly variable cost from 0.29...12704.83. Any ideas how to improve??
If you order by rating descending, the rows are already in rank order. Use the row_number() window function.
Select Row number in postgres
Also, you could use an in-memory cache such as Redis to store the rankings. It's a separate application that can serve multiple instances, even remotely.

Pull up login times in sql database

I'm having a little trouble getting info from my database. I have a table containing two columns, one the player name, and the other the login time. I want to get a list of players and the number of times they logged in during a time period. However, users can login multiple times a day, but I want all logins within the same day to be counted as once.
To put this into context, I want a list of player name and the number of days in which the users have logged in during 2013-11-18 to 2013-11-24.
Is there a query that can return this result in one go?
I have tried using GROUP BY player_name, but that just gives the total number of times each player has logged in (large numbers, because a player may log in multiple times a day).
Any help would be appreciated.
SELECT player_name, COUNT(DISTINCT DATE(login_time))
FROM login_table
GROUP BY player_name;
Nailed it, let me know if it doesn't work :)
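To restrict the count to the period the question asks about (2013-11-18 to 2013-11-24), a hedged variation adds a date filter, assuming login_time is a DATETIME column:
SELECT player_name, COUNT(DISTINCT DATE(login_time)) AS days_logged_in
FROM login_table
WHERE login_time >= '2013-11-18' AND login_time < '2013-11-25'
GROUP BY player_name;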
I didn't try it, but this should do the job:
Select player_name, count(logged_in)
From (Select player_name, Date(login_date) As logged_in
      From ...
      Group By player_name, Date(login_date)) as tmp
Group By player_name
EDIT: Derived table must have an alias

Grouping Minus Oracle Problems

I've just created this query and I get confused when grouping it, because I can't see the results as one grouping. The query runs, but not the way I want: I want to group the results by team name, but the problem occurs when the rows are counted using COUNT(*) and the count comes out the same for every group.
SELECT TEAM.NAMATEAM, PERSONAL.KODEPERSON
FROM TEAM, PERSONAL
WHERE TEAM.KODETEAM = PERSONAL.KODETEAM
GROUP BY PERSONAL.KODEPERSON, TEAM.NAMATEAM
MINUS
SELECT TEAM.NAMATEAM, PERSONAL.KODEPERSON
FROM TEAM, PERSONAL, AWARD_PERSON
WHERE TEAM.KODETEAM = PERSONAL.KODETEAM
AND AWARD_PERSON.PEMENANG = PERSONAL.KODEPERSON
GROUP BY TEAM.NAMATEAM, PERSONAL.KODEPERSON;
I want to group all of these by team name, but counting is a problem since I have no idea how to do the grouping in a way that runs smoothly, the way I want. Thank you.
Do I understand your question correctly? You are trying to make a table with columns NAMATEAM, X where NAMATEAM are the team names, and X is the number of people on each team who do not have awards (listed in AWARD_PERSON). If so, you should be able to use a sub-select:
SELECT T_NAME, COUNT(*)
FROM (
SELECT TEAM.NAMATEAM "T_NAME", PERSONAL.KODEPERSON
FROM TEAM, PERSONAL
WHERE TEAM.KODETEAM = PERSONAL.KODETEAM
MINUS
SELECT TEAM.NAMATEAM "T_NAME", PERSONAL.KODEPERSON
FROM TEAM, PERSONAL, AWARD_PERSON
WHERE TEAM.KODETEAM = PERSONAL.KODETEAM
AND AWARD_PERSON.PEMENANG = PERSONAL.KODEPERSON )
-- your original query without the GROUP BYs
GROUP BY T_NAME
The first SELECT in the subquery creates a full list of players, the second creates a list of players who have won awards (I assume), and the MINUS removes the award winners from the full list. Thus the full subquery returns a list of players and their teams, for all players without awards.
The main SELECT then summarizes on the team name only, to yield a per-team count of players without awards.
You should not need your original GROUP BY TEAM.NAMATEAM, PERSONAL.KODEPERSON, unless you have duplicate rows in your database, e.g., one player on one team has more than one row in the database.
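A hedged alternative sketch of the same logic, using ANSI join syntax and NOT EXISTS instead of MINUS (untested against the original schema):
SELECT t.NAMATEAM AS T_NAME, COUNT(*) AS PLAYERS_WITHOUT_AWARDS
FROM TEAM t
JOIN PERSONAL p ON t.KODETEAM = p.KODETEAM
WHERE NOT EXISTS (SELECT 1
                  FROM AWARD_PERSON a
                  WHERE a.PEMENANG = p.KODEPERSON)
GROUP BY t.NAMATEAM;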

Compute Users average weight

I have two tables, User and DoctorsVisit
User
- UserID
- Name
DoctorsVisit
- UserID
- Weight
- Date
The DoctorsVisit table contains all the visits a particular user made to the doctor.
The user's weight is recorded per visit.
Query: Sum up all the users' weights, using the numbers from each user's last doctor's visit (then divide by the number of users to get the average weight).
Note: some users may have not visited the doctor at all, while others may have visited many times.
I need the average weight of all users, but using the latest weight.
Update
I want the average weight across all users.
If I understand your question correctly, you should be able to get the average weight of all users based on their last visit from the following SQL statement. We use a subquery to get the last visit as a filter.
SELECT AVG(x.weight)
FROM (SELECT uv.weight FROM uservisit uv INNER JOIN
      (SELECT userid, MAX(dateVisited) AS DateVisited FROM uservisit GROUP BY userid) us
      ON us.UserID = uv.UserID AND us.DateVisited = uv.DateVisited) x;
I should point out that this does assume that there is a unique UserID that can be used to determine uniqueness. Also, if the DateVisited doesn't include a time but just a date, one patient who visits twice on the same day could skew the data.
This should get you the average weight per user if they have visited:
select user.name, temp.AvgWeight
from user left outer join (select userid, avg(weight) as AvgWeight
from doctorsvisit
group by userid) temp
on user.userid = temp.userid
Write a query to select the most recent weight for each user (QueryA), and use that query as an inner select of a query to select the average (QueryB), e.g.,
SELECT AVG(weight) FROM (QueryA)
I think there's a mistake in your specs.
If you divide by all the users, your average will be too low. Each user that has no doctor visits will tend to drag the average towards zero. I don't believe that's what you want.
I'm too lazy to come up with an actual query, but it's going to be one of these things where you use a self join between the base table and a query with a group by that pulls out all the relevant Id, Visit Date pairs from the base table. The only thing you need the User table for is the Name.
We had a sample of the same problem in here a couple of weeks ago, I think. By the "same problem", I mean the problem where we want an attribute of the representative of a group, but where the attribute we want isn't included in the group by clause.
I think this will work, though I could be wrong:
Use an inner select to make sure you have the most recent visit, then use AVG. Your User table in this example is superfluous: since you have no weight data there and you don't care about user names, it doesn't do you any good to examine it.
SELECT AVG(dv.Weight)
FROM DoctorsVisit dv
WHERE dv.Date = (
SELECT MAX(Date)
FROM DoctorsVisit innerdv
WHERE innerdv.UserID = dv.UserID
)
If you're using SQL Server 2005, you don't need the subquery with the GROUP BY.
You can use the new ROW_NUMBER and PARTITION BY functionality.
SELECT AVG(a.weight)
FROM (SELECT
          ROW_NUMBER() OVER (PARTITION BY dv.UserId ORDER BY Date DESC) AS ID,
          dv.weight
      FROM DoctorsVisit dv) a
WHERE a.ID = 1
As someone else has mentioned though, this is the average weight across all the users who have VISITED the doctor. If you want the average weight across ALL of the users then anyone not visiting the doctor will give a misleading average.
Here's my stab at the solution:
select
avg(a.Weight) as AverageWeight
from
DoctorsVisit as a
inner join
(select
UserID,
max (Date) as LatestDate
from
DoctorsVisit
group by
UserID) as b
on a.UserID = b.UserID and a.Date = b.LatestDate;
Note that the User table isn't used at all.
This average entirely omits users who have no doctor's visits at all, or whose weight is recorded as NULL in their latest doctor's visit. This average is skewed if any users have more than one visit on the same date, and if the latest date is one of those dates where the user got weighed more than once.