MySQL: Getting highest score for a user - sql

I have the following table (highscores),
id gameid userid name score date
1 38 2345 A 100 2009-07-23 16:45:01
2 39 2345 A 500 2009-07-20 16:45:01
3 31 2345 A 100 2009-07-20 16:45:01
4 38 2345 A 200 2009-10-20 16:45:01
5 38 2345 A 50 2009-07-20 16:45:01
6 32 2345 A 120 2009-07-20 16:45:01
7 32 2345 A 100 2009-07-20 16:45:01
Now in the above structure, a user can play a game multiple times but I want to display the "Games Played" by a specific user. So in games played section I can't display multiple games. So the concept should be like if a user played a game 3 times then the game with highest score should be displayed out of all.
I want result data like:
id gameid userid name score date
2 39 2345 A 500 2009-07-20 16:45:01
3 31 2345 A 100 2009-07-20 16:45:01
4 38 2345 A 200 2009-10-20 16:45:01
6 32 2345 A 120 2009-07-20 16:45:01
I tried following query but its not giving me the correct result:
SELECT id,
gameid,
userid,
date,
MAX(score) AS score
FROM highscores
WHERE userid='2345'
GROUP BY gameid
Please tell me what will be the query for this?
Thanks

Requirement is a bit vague/confusing but would something like this satisfy the need ?
(purposely added various aggregates that may be of interest).
SELECT gameid,
MIN(date) AS FirstTime,
MAX(date) AS LastTime,
MAX(score) AS TOPscore.
COUNT(*) AS NbOfTimesPlayed
FROM highscores
WHERE userid='2345'
GROUP BY gameid
-- ORDER BY COUNT(*) DESC -- for ex. to have games played most at top
Edit: New question about adding the id column to the the SELECT list
The short answer is: "No, id cannot be added, not within this particular construct". (Read further to see why) However, if the intent is to have the id of the game with the highest score, the query can be modified, using a sub-query, to achieve that.
As explained by Alex M on this page, all the column names referenced in the SELECT list and which are not used in the context of an aggregate function (MAX, MIN, AVG, COUNT and the like), MUST be included in the ORDER BY clause. The reason for this rule of the SQL language is simply that in gathering the info for the results list, SQL may encounter multiple values for such an column (listed in SELECT but not GROUP BY) and would then not know how to deal with it; rather than doing anything -possibly useful but possibly silly as well- with these extra rows/values, SQL standard dictates a error message, so that the user can modify the query and express explicitly his/her goals.
In our specific case, we could add the id in the SELECT and also add it in the GROUP BY list, but in doing so the grouping upon which the aggregation takes place would be different: the results list would include as many rows as we have id + gameid combinations the aggregate values for each of this row would be based on only the records from the table where the id and the gameid have the corresponding values (assuming id is the PK in table, we'd get a single row per aggregation, making the MAX() and such quite meaningless).
The way to include the id (and possibly other columns) corresponding to the game with the top score, is with a sub-query. The idea is that the subquery selects the game with TOP score (within a given group by), and the main query's SELECTs any column of this rows, even when the fieds wasn't (couldn't be) in the sub-query's group-by construct. BTW, do give credit on this page to rexem for showing this type of query first.
SELECT H.id,
H.gameid,
H.userid,
H.name,
H.score,
H.date
FROM highscores H
JOIN (
SELECT M.gameid, hs.userid, MAX(hs.score) MaxScoreByGameUser
FROM highscores H2
GROUP BY H2.gameid, H2.userid
) AS M
ON M.gameid = H.gameid
AND M.userid = H.userid
AND M.MaxScoreByGameUser = H.score
WHERE H.userid='2345'
A few important remarks about the query above
Duplicates: if there the user played several games that reached the same hi-score, the query will produce that many rows.
GROUP BY of the sub-query may need to change for different uses of the query. If rather than searching for the game's hi-score on a per user basis, we wanted the absolute hi-score, we would need to exclude userid from the GROUP BY (that's why I named the alias of the MAX with a long, explicit name)
The userid = '2345' may be added in the [now absent] WHERE clause of the sub-query, for efficiency purposes (unless MySQL's optimizer is very smart, currently all hi-scores for all game+user combinations get calculated, whereby we only need these for user '2345'); down side duplication; solution; variables.
There are several ways to deal with the issues mentioned above, but these seem to be out of scope for a [now rather lenghty] explanation about the GROUP BY constructs.

Every field you have in your SELECT (when a GROUP BY clause is present) must be either one of the fields in the GROUP BY clause, or else a group function such as MAX, SUM, AVG, etc. In your code, userid is technically violating that but in a pretty harmless fashion (you could make your code technically SQL standard compliant with a GROUP BY gameid, userid); fields id and date are in more serious violation - there will be many ids and dates within one GROUP BY set, and you're not telling how to make a single value out of that set (MySQL picks a more-or-less random ones, stricter SQL engines might more helpfully give you an error).
I know you want the id and date corresponding to the maximum score for a given grouping, but that's not explicit in your code. You'll need a subselect or a self-join to make it explicit!

Use:
SELECT t.id,
t.gameid,
t.userid,
t.name,
t.score,
t.date
FROM HIGHSCORES t
JOIN (SELECT hs.gameid,
hs.userid,
MAX(hs.score) 'max_score'
FROM HIGHSCORES hs
GROUP BY hs.gameid, hs.userid) mhs ON mhs.gameid = t.gameid
AND mhs.userid = t.userid
AND mhs.max_score = t.score
WHERE t.userid = '2345'

Related

Group by demands us to include all selected rows, when we need the results grouped by just one row

Following this programming exercise: SQL with Street Fighter, which statement is:
It's time to assess which of the world's greatest fighters are through
to the 6 coveted places in the semi-finals of the Street Fighter World
Fighting Championship. Every fight of the year has been recorded and
each fighter's wins and losses need to be added up.
Each row of the table fighters records, alongside the fighter's name,
whether they won (1) or lost (0), as well as the type of move that
ended the bout.
id
name
won
lost
move_id
winning_moves
id
move
However, due to new health and safety regulations, all ki blasts have
been outlawed as a potential fire hazard. Any bout that ended with
Hadoken, Shouoken or Kikoken should not be counted in the total wins
and losses.
So, your job:
Return name, won, and lost columns displaying the name, total number of wins and total number of losses. Group by the fighter's
name.
Do not count any wins or losses where the winning move was Hadoken, Shouoken or Kikoken.
Order from most-wins to least
Return the top 6. Don't worry about ties.
How could we group the fighters by their names?
We have tried:
select name, won, lost from fighters inner join winning_moves on fighters.id=winning_moves.id
group by name order by won desc limit 6;
However it displays:
There was an error with the SQL query:
PG::GroupingError: ERROR: column "fighters.won" must appear in the
GROUP BY clause or be used in an aggregate function LINE 3: select
name, won, lost from fighters inner join winning_move...
In addition we have also tried to include all selected rows:
select name, won, lost from fighters inner join winning_moves on fighters.id=winning_moves.id
group by name,won,lost order by won desc limit 6;
But the results differ from the expected.
Expected:
name won lost
Sakura 44 15
Cammy 44 17
Rose 42 19
Karin 42 13
Dhalsim 40 15
Ryu 39 16
Actual:
name won lost
Vega 2 1
Guile 2 1
Ryu 2 1
Rose 1 0
Vega 1 0
Zangief 1 0
Besides we have read:
https://www.w3schools.com/sql/sql_join.asp
MySql Inner Join with WHERE clause
How to limit rows in PostgreSQL SELECT
https://www.w3schools.com/sql/sql_groupby.asp
GROUP BY clause or be used in an aggregate function
PostgreSQL column must appear in the GROUP BY clause or be used in an aggregate function when using case statement
must appear in the GROUP BY clause or be used in an aggregate function
I guess you need to have sum() to aggregate the ids wins n loss. In addition to that you dont need join as you dont wanna show the move in the first query
select name, sum(won) as wins,
sum(lost)
from fighters
group by name order by sum(won)
desc limit 6;

Access SQL - Add Row Number to Query Result for a Multi-table Join

What I am trying to do is fairly simple. I just want to add a row number to a query. Since this is in Access is a bit more difficult than other SQL, but under normal circumstances is still doable using solutions such as DCount or Select Count(*), example here: How to show row number in Access query like ROW_NUMBER in SQL or Access SQL how to make an increment in SELECT query
My Issue
My issue is I'm trying to add this counter to a multi-join query that orders by fields from numerous tables.
Troubleshooting
My code is a bit ridiculous (19 fields, seven of which are long expressions, from 9 different joined tables, and ordered by fields from 5 of those tables). To make things simple, I have an simplified example query below:
Example Query
SELECT DCount("*","Requests_T","[Requests_T].[RequestID]<=" & [Requests_T].[RequestID]) AS counter, Requests_T.RequestHardDeadline AS Deadline, Requests_T.RequestOverridePriority AS Priority, Requests_T.RequestUserGroup AS [User Group], Requests_T.RequestNbrUsers AS [Nbr of Users], Requests_T.RequestSubmissionDate AS [Submitted on], Requests_T.RequestID
FROM (((((((Requests_T
INNER JOIN ENUM_UserGroups_T ON ENUM_UserGroups_T.UserGroups = Requests_T.RequestUserGroup)
INNER JOIN ENUM_RequestNbrUsers_T ON ENUM_RequestNbrUsers_T.NbrUsers = Requests_T.RequestNbrUsers)
INNER JOIN ENUM_RequestPriority_T ON ENUM_RequestPriority_T.Priority = Requests_T.RequestOverridePriority)
ORDER BY Requests_T.RequestHardDeadline, ENUM_RequestPriority_T.DisplayOrder DESC , ENUM_UserGroups_T.DisplayOrder, ENUM_RequestNbrUsers_T.DisplayOrder DESC , Requests_T.RequestSubmissionDate;
If the code above is trying to select a field from a table not included, I apologize - just trust the field comes from somewhere (lol i.e. one of the other joins I excluded to simply the query). A great example of this is the .DisplayOrder fields used in the ORDER BY expression. These are fields from a table that simply determines the "priority" of an enum. Example: Requests_T.RequestOverridePriority displays to the user as an combobox option of "Low", "Med", "High". So in a table, I assign a numerical priority to these of "1", "2", and "3" to these options, respectively. Thus when ENUM_RequestPriority_T.DisplayOrder DESC is called in order by, all "High" priority requests will display above "Medium" and "Low". Same holds true for ENUM_UserGroups_T.DisplayOrder and ENUM_RequestNbrUsers_T.DisplayOrder.
I'd also prefer to NOT use DCOUNT due to efficiency, and rather do something like:
select count(*) from Requests_T where Requests_T.RequestID>=RequestID) as counter
Due to the "Order By" expression however, my 'counter' doesn't actually count my resulting rows sequentially since both of my examples are tied to the RequestID.
Example Results
Based on my actual query results, I've made an example result of the query above.
Counter Deadline Priority User_Group Nbr_of_Users Submitted_on RequestID
5 12/01/2016 High IT 2-4 01/01/2016 5
7 01/01/2017 Low IT 2-4 05/06/2016 8
10 Med IT 2-4 07/13/2016 11
15 Low IT 10+ 01/01/2016 16
8 Low IT 2-4 01/01/2016 9
2 Low IT 2-4 05/05/2016 2
The query is displaying my results in the proper order (those with the nearest deadline at the top, then those with the highest priority, then user group, then # of users, and finally, if all else is equal, it is sorted by submission date). However, my "Counter" values are completely wrong! The counter field should simply intriment +1 for each new row. Thus if displaying a single request on a form for a user, I could say
"You are number: Counter [associated to RequestID] in the
development queue."
Meanwhile my results:
Aren't sequential (notice the first four display sequentially, but then the final two rows don't)! Even though the final two rows are lower in priority than the records above them, they ended up with a lower Counter value simply because they had the lower RequestID.
They don't start at "1" and increment +1 for each new record.
Ideal Results
Thus my ideal result from above would be:
Counter Deadline Priority User_Group Nbr_of_Users Submitted_on RequestID
1 12/01/2016 High IT 2-4 01/01/2016 5
2 01/01/2017 Low IT 2-4 05/06/2016 8
3 Med IT 2-4 07/13/2016 11
4 Low IT 10+ 01/01/2016 16
5 Low IT 2-4 01/01/2016 9
6 Low IT 2-4 05/05/2016 2
I'm spoiled by PLSQL and other software where this would be automatic lol. This is driving me crazy! Any help would be greatly appreciated.
FYI - I'd prefer an SQL option over VBA if possible. VBA is very much welcomed and will definitely get an up vote and my huge thanks if it works, but I'd like to mark an SQL option as the answer.
Unfortuantely, MS Access doesn't have the very useful ROW_NUMBER() function like other clients do. So we are left to improvise.
Because your query is so complicated and MS Access does not support common table expressions, I recommend you follow a two step process. First, name that query you already wrote IntermediateQuery. Then, write a second query called FinalQuery that does the following:
SELECT i1.field_primarykey, i1.field2, ... , i1.field_x,
(SELECT field_primarykey FROM IntermediateQuery i2
WHERE t2.field_primarykey <= t1.field_primarykey) AS Counter
FROM IntermediateQuery i1
ORDER BY Counter
The unfortunate side effect of this is the more data your table returns, the longer it will take for the inline subquery to calculate. However, this is the only way you'll get your row numbers. It does depend on having a primary key in the table. In this particular case, it doesn't have to be an explicitly defined primary key, it just needs to be a field or combination of fields that is completely unique for each record.

Selecting filtered values from Oracle using ROWNUM

I have a requirement wherein i need to find the record number of the records that are returned from the resultset. I know that i can use ROWNUM to get the record number from the resultset but my issue is slightly different. below are the details
Table : ProcessSummary
Columns:
PS_PK ProcessId StepId AsscoiateId ProcessName AssetAmount
145 25 50 Process1 3,500.00
267 26 45 Process2 4,400.00
356 27 70 Process3 2,400.00
456 28 80 90 Process4 780.00
556 29 56 67 Process5 4,500.00
656 45 70 Process6 6,000.00
789 31 75 Process7 8,000.00
Now what i need to do is fetch all the records from the ProcessSummary Table when either of ProcessId OR StepId OR AssociateId is NULL. I wrote the below query
select * from ProcessSummary where ProcessId IS NULL OR StepId IS NULL OR AsscoiateId IS NULL
As expected i got 1st, 2nd, 3rd, 6th and 7th records in the resultset that got returned.
Now what i need is to get the records numbers 1,2,3,6,7. I tried to use the ROWNUM as below but i got the values of 1,2,3,4,5 and not 1,2,3,6,7.
select ROWNUM from ProcessSummary where ProcessId IS NULL OR StepId IS NULL OR AsscoiateId IS NULL
Is it possible to get the ROWNUM values in the sequence that i want and if yes then can you please let me know how can i do this. Also if ROWNUM cannot be used then what would be the other option that i can use to get the result in the form that i want.
Any help would be greately appericiated as i could not find much on the net or SO regarding this sort of requirement.
Thanks
Vikeng21
rownum is an internal numbering that gives you a row number based on the current query results only, so that numbering is not tied to a specific record, and it will change when you change the data or the query.
But the numbering you ask for is already in your table. It looks like you just need to SELECT PS_PK .. instead. PS_PK is the field in your table that contains the actual number you want.
You can generate a numbering using an analytical function, and then filter that query. You need some fields to order by, though. In this case I've chosen PS_PK, but it can be another field, like ProcessName or a combination of other fields as well.
select
*
from
(select
dense_rank() over (order by PS_PK) as RANKING,
p.*
from
ProcessSummary p)
where
ProcessId IS NULL OR StepId IS NULL OR AsscoiateId IS NULL
So, in this query, first a numbering is calculated for each row that is returned from the inner query. The numbering is returned as the field RANKING. And then the other query filters further, but still will return the field RANKING with the original numbering.
Instead of dense_rank there is also rank and row_number. The differences are subtle, but you can just experiment and read some docs here and here to learn about the differences and see which one fits you best.
Note that this might slow down your query, because the inner query first generates a number for each row in the table (there is no filtering on that level now).

SQL rollup - prevent summing records multiple times

Firstly, I could not think of a better question title. Apologies for that.
So, I am writing a query and here is something(I think) it would return without aggregating functions and group by. I am using this as an example and actual query contains a lot more fields:
SUBJ CLASSROOM CLASSROOM_CAPACITY
A 1 25
B 2 50
C 3 60
A 2 50
A 1 25 <--Not actually duplicate
Now you would say there are duplicate records. But in fact they are not duplicate in a way that there are some extra fields(not shown here) which would have different values for those seemingly duplicate records.
What I want:
SUBJ CLASSROOM CLASSROOM_CAPACITY
A 1 25
2 50
TOTAL 75
B 2 50
TOTAL 50
C 3 60
TOTAL 60
//EDIT - Apparently following line is causing too much confusion. Ignore it. How can I get rest of the table correctly?
TOTAL 135 //It seems its quite difficult to get 135 here. Its ok if this total is messed up
What I am trying:
SELECT
SOME_FIELDS,
SUBJ,
CLASSROOM,
SUM(CLASSROOM_CAPACITY)
FROM
MYTABLE
WHERE .....
GROUP BY SOME_FIELDS, ROLLUP(SUBJ,CLASSROOM)
The problem:
Thanks to those "seemingly duplicate" records, classroom capacities are being summed up multiple times. How do I prevent that? Am I doing this the wrong way?
The actual query is lot more complicated but I think if I can get this right, I can apply it to bigger query.
PS: I know how to get text "Total" instead of blank entry with ROLLUP using GROUPING so you can skip that part.
The cardinality you're introducing is a little off and when you sort the that ROLLUP starts to work. Your saying that:
SUBJ CLASSROOM CLASSROOM_CAPACITY
A 1 25
is equal to:
SUBJ CLASSROOM CLASSROOM_CAPACITY
A 1 25
But the SOME_FIELDS could vary per row. When you aggregate up to just the columns above, what do you expect to happen to SOME_FIELDS?
If these can be ignore for the purposes of this query your best bet is to first find the DISTINCT records (i.e. records that contain a unique tuple of subj, classroom and classroom_capacity) and then do the ROLLUP on this data set. The following query achieves this:
WITH distinct_subj_classrm_capacity AS (
SELECT DISTINCT
subj
, classroom
, classroom_capacity
FROM mytable
)
SELECT
subj
, classroom
, SUM(classroom_capacity)
FROM distinct_subj_classrm_capacity
GROUP BY ROLLUP(subj, classroom)
If you're not interested in the break report results that ROLLUP gives you and you simply want the raw totals then you can use the analytic version of SUM (see here for more on Oracle analytic functions: http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions004.htm)
WITH distinct_subj_classrm_capacity AS (
SELECT DISTINCT
subj
, classroom
, classroom_capacity
FROM mytable
)
SELECT DISTINCT
subj
, SUM(classroom_capacity) OVER (PARTITION BY subj) classroom_capacity_per_subj
FROM distinct_subj_classrm_capacity
This gives results in the format:
SUBJ CLASSROOM_CAPACITY_PER_SUBJ
A 75
B 50
C 60

Retrieve names by ratio of their occurrence

I'm somewhat new to SQL queries, and I'm struggling with this particular problem.
Let's say I have query that returns the following 3 records (kept to one column for simplicity):
Tom
Jack
Tom
And I want to have those results grouped by the name and also include the fraction (ratio) of the occurrence of that name out of the total records returned.
So, the desired result would be (as two columns):
Tom | 2/3
Jack | 1/3
How would I go about it? Determining the numerator is pretty easy (I can just use COUNT() and GROUP BY name), but I'm having trouble translating that into a ratio out of the total rows returned.
SELECT name, COUNT(name)/(SELECT COUNT(1) FROM names) FROM names GROUP BY name;
Since the denominator is fixed, the "ratio" is directly proportional to the numerator. Unless you really need to show the denominator, it'll be a lot easier to just use something like:
select name, count(*) from your_table_name
group by name
order by count(*) desc
and you'll get the right data in the right order, but the number that's shown will be the count instead of the ratio.
If you really want that denominator, you'd do a count(*) on a non-grouped version of the same select -- but depending on how long the select takes, that could be pretty slow.