How to do order by and group by in this scenario? - sql

I need to select the top 3 sports based on TotalUsers (descending) and group the results by individual sport.
What I've done so far is:
SELECT *
FROM (
SELECT R.Sports, R.RoomID ,R.Name,
COUNT(C.ChatUserLogId) AS TotalUsers,
ROW_NUMBER()
OVER (PARTITION BY R.SPORTS ORDER BY R.SPORTS DESC ) AS Rank
FROM Room R JOIN ChatUserLog C
ON R.RoomID = C.RoomId
GROUP BY
R.RoomID,
R.Name,
R.Sports
) rs WHERE Rank IN (1, 2, 3)
ORDER BY Sports, TotalUsers DESC
Below is the output of the SQL
Sports RoomID Name TotalUsers Rank
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aerobics 6670 Aerobic vs. Anaerobic Exercise: Which is Best to Burn more Fat? 17 1
Aerobics 7922 Is it okay to be fat if you’re fit? 13 2
Aerobics 6669 What is the best time of the day to do an aerobic work out? 7 3
Archery 7924 Who were the best archers in history? 8 1
Archery 7925 Should I get into shooting or archery? 7 2
Archery 7926 What advantages, if any, do arrows have over bullets? 6 3
Badminton 6678 Which is more challenging, physically and mentally: badminton or tennis? 9 1
Badminton 6677 Who is your favorite - Lee chong wei or Lin dan? 8 2
Badminton 6794 Which single athlete most changed the sport? 7 3
Billiards  6691 How to get great at billiards? 34 1
Billiards  6692 Why is Efren Reyes the greatest billiards and pool player of all time? 31 2
Boxing 6697 Mike Tyson: The greatest heavyweight of all time? 13 1
Boxing 6700 Who is considered the greatest boxer of all time? Why? 13 2
Boxing 6699 What is the greatest, most exciting boxing fight of all-time? 12 3
But my query does not meet my requirement. I need output like the following, which shows TotalUsers grouped by sport:
Sports TotalUsers
-----------------------
Billiards 34
Billiards 31
Aerobics 17
Aerobics 13
Aerobics 7
Boxing 13
Boxing 13
Boxing 12
Any help is appreciated.

Your code looks very close, but there appear to be three issues.
Over clause
There looks to be an error in your OVER clause:
ROW_NUMBER() OVER(PARTITION BY R.SPORTS ORDER BY R.SPORTS DESC)
The PARTITION BY clause is correct in restarting the numbering for each partition. However, within each partition you are ordering by the partition column itself, which makes the numbering nondeterministic (R.SPORTS is necessarily equal for every row in the partition, so the ORDER BY has no effect). What you want instead is to order by the total users. The expression is then:
ROW_NUMBER() OVER(PARTITION BY R.SPORTS ORDER BY COUNT(C.CHATUSERLOGID) DESC)
(You can also use RANK() in place of ROW_NUMBER() if you want rooms with an equal number of users to share the same rank.)
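To see the difference on the question's own numbers, here is a minimal sketch using Python's built-in sqlite3 module (it assumes SQLite 3.25 or later for window functions). The table rs is a hypothetical stand-in for the already-aggregated inner query, holding only the three Boxing rooms, two of which tie at 13 users:

```python
import sqlite3


def rank_boxing_rooms():
    """Return (RoomID, TotalUsers, RowNum, Rnk, DenseRnk) for the Boxing rooms."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical pre-aggregated table standing in for the inner query rs.
    conn.execute("CREATE TABLE rs (Sports TEXT, RoomID INTEGER, TotalUsers INTEGER)")
    conn.executemany(
        "INSERT INTO rs VALUES (?, ?, ?)",
        [("Boxing", 6697, 13), ("Boxing", 6700, 13), ("Boxing", 6699, 12)],
    )
    rows = conn.execute(
        """
        SELECT RoomID, TotalUsers,
               -- always 1, 2, 3; which tied room gets 1 is not guaranteed
               ROW_NUMBER() OVER (PARTITION BY Sports ORDER BY TotalUsers DESC) AS RowNum,
               -- ties share a rank and leave a gap: 1, 1, 3
               RANK() OVER (PARTITION BY Sports ORDER BY TotalUsers DESC) AS Rnk,
               -- ties share a rank with no gap: 1, 1, 2
               DENSE_RANK() OVER (PARTITION BY Sports ORDER BY TotalUsers DESC) AS DenseRnk
        FROM rs
        ORDER BY TotalUsers DESC, RoomID
        """
    ).fetchall()
    conn.close()
    return rows


for row in rank_boxing_rooms():
    print(row)
```

If you keep ROW_NUMBER() but want a repeatable result, add a tiebreaker such as RoomID to the window's ORDER BY.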
Final query ordering
The question indicates you are seeking to order the result set as follows:
First, by sport; sports should be ordered by the largest room within that category
Second, the top 3 rooms for each sport in descending order of size
The first criterion requires a new column in your inner select statement: for each room, what is the highest number of users in any room of that sport? This can be written as:
MAX(COUNT(C.CHATUSERLOGID)) OVER (PARTITION BY R.SPORTS) MaxSportsUsers
With this column available, you can order by MaxSportsUsers descending followed by Rank ascending.
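A minimal sketch of this ordering, again assuming Python's sqlite3 with SQLite 3.25 or later. The hypothetical table rs holds the per-room counts from the question's output (room names omitted), so MAX(TotalUsers) OVER (PARTITION BY Sports) stands in for MAX(COUNT(C.CHATUSERLOGID)) OVER (PARTITION BY R.SPORTS):

```python
import sqlite3

# (Sports, RoomID, TotalUsers) taken from the question's output.
ROOM_COUNTS = [
    ("Aerobics", 6670, 17), ("Aerobics", 7922, 13), ("Aerobics", 6669, 7),
    ("Archery", 7924, 8), ("Archery", 7925, 7), ("Archery", 7926, 6),
    ("Badminton", 6678, 9), ("Badminton", 6677, 8), ("Badminton", 6794, 7),
    ("Billiards", 6691, 34), ("Billiards", 6692, 31),
    ("Boxing", 6697, 13), ("Boxing", 6700, 13), ("Boxing", 6699, 12),
]


def order_by_largest_room():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rs (Sports TEXT, RoomID INTEGER, TotalUsers INTEGER)")
    conn.executemany("INSERT INTO rs VALUES (?, ?, ?)", ROOM_COUNTS)
    rows = conn.execute(
        """
        SELECT Sports, TotalUsers
        FROM (
            SELECT Sports, TotalUsers,
                   ROW_NUMBER() OVER (PARTITION BY Sports ORDER BY TotalUsers DESC) AS RoomRank,
                   -- largest room of the same sport, repeated on every row of that sport
                   MAX(TotalUsers) OVER (PARTITION BY Sports) AS MaxSportsUsers
            FROM rs
        ) ranked
        WHERE RoomRank <= 3
        ORDER BY MaxSportsUsers DESC, Sports, RoomRank
        """
    ).fetchall()
    conn.close()
    return rows


for sport, users in order_by_largest_room():
    print(sport, users)
```

Sorting by Sports between MaxSportsUsers and RoomRank keeps the rooms of two sports with the same maximum from interleaving.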
Limiting to top 3 sports: a problem arises
The desired output indicates you only want the top three sports, ranked by the number of users in each sport's top room. Thus, you would need a ranking of the form:
DENSE_RANK() OVER (ORDER BY MAX(COUNT(C.ChatUserLogId)) OVER (PARTITION BY R.Sports) DESC) AS SportsRank
But SQL Server does not support this, and attempting it will raise the error "Windowed functions cannot be used in the context of another windowed function or aggregate".
There are a few alternatives. One is to note that running SELECT TOP 3 Sports, MAX(TotalUsers) AS MaxUsers FROM rs GROUP BY Sports ORDER BY 2 DESC against the inner query (rs) produces the top three sports and their highest user counts. Joining these records to rs on Sports limits the final output to the top three sports.
This approach requires rs to be referenced a second time, inside an inner join. The simplest way to do so is to convert the nested query (SELECT * FROM (SELECT...) rs) to Common Table Expression form (WITH RS AS (SELECT...) SELECT * FROM RS). This allows a query of the form WITH RS AS (SELECT...) SELECT * FROM RS JOIN (SELECT... FROM RS) R2...
Once the query is in CTE form, we can join on the CTE, i.e., INNER JOIN (SELECT TOP 3 SPORTS, MAX(TOTALUSERS) MaxSportsUsers FROM RS GROUP BY SPORTS ORDER BY 2 DESC) RS2 ON RS2.SPORTS = RS.SPORTS, keeping the ORDER BY clause the same. The inner join limits the final dataset to the top 3 sports.
With the MaxSportsUsers column moved to the inner join, it can be removed from RS (formerly the inner query).
Final query
Combining the above, we get the final query:
WITH RS AS
(
SELECT R.Sports, R.RoomID ,R.Name,
COUNT(C.ChatUserLogId) AS TotalUsers,
ROW_NUMBER() OVER (PARTITION BY R.SPORTS ORDER BY COUNT(C.ChatUserLogId) DESC ) AS Rank
FROM Room R
JOIN ChatUserLog C ON R.RoomID = C.RoomId
GROUP BY R.RoomID, R.Name, R.Sports
)
SELECT rs.Sports, rs.TotalUsers
FROM rs
INNER JOIN (
SELECT TOP 3 SPORTS, MAX(TOTALUSERS) MaxSportsUsers FROM RS GROUP BY SPORTS ORDER BY 2 DESC
) RS2 ON RS2.SPORTS = RS.SPORTS
WHERE Rank IN (1, 2, 3)
ORDER BY RS2.MaxSportsUsers DESC, rs.Sports, rs.Rank;
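The same logic can be checked end to end against the question's data. This is a sketch under stated assumptions: it uses Python's sqlite3 (SQLite 3.25+), replaces SQL Server's TOP 3 with LIMIT 3, drops the Name column, expands each room's count into that many hypothetical ChatUserLog rows, and splits the aggregation and the ROW_NUMBER() into two CTEs so the window function runs over already-grouped rows:

```python
import sqlite3

# (Sports, RoomID, TotalUsers) from the question's output.
ROOM_COUNTS = [
    ("Aerobics", 6670, 17), ("Aerobics", 7922, 13), ("Aerobics", 6669, 7),
    ("Archery", 7924, 8), ("Archery", 7925, 7), ("Archery", 7926, 6),
    ("Badminton", 6678, 9), ("Badminton", 6677, 8), ("Badminton", 6794, 7),
    ("Billiards", 6691, 34), ("Billiards", 6692, 31),
    ("Boxing", 6697, 13), ("Boxing", 6700, 13), ("Boxing", 6699, 12),
]

FINAL_QUERY = """
WITH counts AS (
    SELECT R.Sports, R.RoomID, COUNT(C.ChatUserLogId) AS TotalUsers
    FROM Room R
    JOIN ChatUserLog C ON R.RoomID = C.RoomId
    GROUP BY R.RoomID, R.Sports
),
rs AS (
    SELECT Sports, RoomID, TotalUsers,
           ROW_NUMBER() OVER (PARTITION BY Sports ORDER BY TotalUsers DESC) AS RoomRank
    FROM counts
)
SELECT rs.Sports, rs.TotalUsers
FROM rs
INNER JOIN (
    -- LIMIT 3 plays the part of SQL Server's TOP 3
    SELECT Sports, MAX(TotalUsers) AS MaxSportsUsers
    FROM rs
    GROUP BY Sports
    ORDER BY MaxSportsUsers DESC
    LIMIT 3
) rs2 ON rs2.Sports = rs.Sports
WHERE rs.RoomRank <= 3
ORDER BY rs2.MaxSportsUsers DESC, rs.Sports, rs.RoomRank
"""


def top3_sports_top3_rooms():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Room (RoomID INTEGER PRIMARY KEY, Sports TEXT)")
    conn.execute("CREATE TABLE ChatUserLog (ChatUserLogId INTEGER PRIMARY KEY, RoomId INTEGER)")
    for sports, room_id, users in ROOM_COUNTS:
        conn.execute("INSERT INTO Room VALUES (?, ?)", (room_id, sports))
        # One log row per user; ChatUserLogId is assigned automatically.
        conn.executemany("INSERT INTO ChatUserLog (RoomId) VALUES (?)", [(room_id,)] * users)
    rows = conn.execute(FINAL_QUERY).fetchall()
    conn.close()
    return rows


for sport, users in top3_sports_top3_rooms():
    print(sport, users)
```

The rows produced are the Sports/TotalUsers pairs from the desired output in the question.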

From the description of your desired data, you appear to only want to select two columns from the subquery:
SELECT rs.Sports, rs.TotalUsers
FROM (SELECT R.Sports, R.RoomID ,R.Name,
COUNT(C.ChatUserLogId) AS TotalUsers,
ROW_NUMBER() OVER (PARTITION BY R.SPORTS ORDER BY COUNT(C.ChatUserLogId) DESC ) AS Rank
FROM Room R JOIN
ChatUserLog C
ON R.RoomID = C.RoomId
GROUP BY R.RoomID, R.Name, R.Sports
) rs
WHERE Rank IN (1, 2, 3)
ORDER BY Sports, TotalUsers DESC;
The outer query now selects only the two columns you want, and ROW_NUMBER() orders by the user count so that each sport's three largest rooms are the ones kept.

If you want the top 3, start by getting the top 3. Something like this:
with top3Sports as (
select top 3 sports, count(chatUserLogId) users
from room r join chatUserLog c on r.roomId = c.roomId
group by sports
order by count(chatUserLogId) desc
)
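select r.sports, r.roomId, r.name, count(c.chatUserLogId) as totalUsers
from top3Sports t
join room r on r.sports = t.sports
join chatUserLog c on c.roomId = r.roomId
group by t.users, r.sports, r.roomId, r.name
order by t.users desc, r.sports, totalUsers desc;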
It's a lot simpler than the approach you tried. Bear in mind, however, that no matter what approach you take, ties will mess you up.
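To illustrate both the "top 3 first" idea and the tie problem, here is a sketch with Python's sqlite3 (SQLite 3.25+), LIMIT 3 standing in for TOP 3. The sports "A" to "D" and their user counts are hypothetical, with one room per sport and C and D tied for third:

```python
import sqlite3


def top_sports(users_per_sport):
    """Return (LIMIT-3 result, RANK()-based result) for a mapping of
    sport -> number of chat users, using one room per sport."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE room (roomId INTEGER PRIMARY KEY, sports TEXT)")
    conn.execute("CREATE TABLE chatUserLog (chatUserLogId INTEGER PRIMARY KEY, roomId INTEGER)")
    for room_id, (sport, users) in enumerate(users_per_sport.items(), start=1):
        conn.execute("INSERT INTO room VALUES (?, ?)", (room_id, sport))
        conn.executemany("INSERT INTO chatUserLog (roomId) VALUES (?)", [(room_id,)] * users)

    # Take the top 3 sports first; with a tie for 3rd, which sport survives is arbitrary.
    limited = conn.execute(
        """
        WITH top3Sports AS (
            SELECT sports, COUNT(chatUserLogId) AS users
            FROM room r JOIN chatUserLog c ON r.roomId = c.roomId
            GROUP BY sports
            ORDER BY COUNT(chatUserLogId) DESC
            LIMIT 3
        )
        SELECT sports, users FROM top3Sports ORDER BY users DESC, sports
        """
    ).fetchall()

    # One way to keep ties: rank the totals and keep every sport ranked 3 or better.
    with_ties = conn.execute(
        """
        WITH totals AS (
            SELECT sports, COUNT(chatUserLogId) AS users
            FROM room r JOIN chatUserLog c ON r.roomId = c.roomId
            GROUP BY sports
        )
        SELECT sports, users
        FROM (SELECT sports, users, RANK() OVER (ORDER BY users DESC) AS rnk FROM totals)
        WHERE rnk <= 3
        ORDER BY users DESC, sports
        """
    ).fetchall()
    conn.close()
    return limited, with_ties


limited, with_ties = top_sports({"A": 10, "B": 8, "C": 5, "D": 5})
print(limited)    # three rows; whether C or D is kept is not guaranteed
print(with_ties)  # [('A', 10), ('B', 8), ('C', 5), ('D', 5)]
```

Note that this answer ranks sports by their total users across all rooms, whereas the question's expected output ranks them by their single largest room; the two can pick different sports.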

Related

SQL: select max(A), B but don't want to group by or aggregate B

If I have a house with multiple rooms, but I want the color of the most recently created room, I would write:
select house.house_id, house.street_name, max(room.create_date), room.color
from house, room
where house.house_id = room.house_id
and house.house_id = 5
group by house.house_id, house.street_name
But I get the error:
Column 'room.color' is invalid in the select list because it is not
contained in either an aggregate function or the GROUP BY clause.
If I say max(room.color), then sure, it will give me the max(color) along with the max(create_date), but I want the COLOR OF THE ROOM WITH THE MAX CREATE DATE.
I only added street_name because I do need to do the join; I was just trying to simplify the query to clarify the question.
Expanding this to work for any number of houses (rather than only working for exactly one house)...
SELECT
house.*,
room.*
FROM
house
OUTER APPLY
(
SELECT TOP (1) room.create_date, room.color
FROM room
WHERE house.house_id = room.house_id
ORDER BY room.create_date DESC
)
AS room
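SQLite has no OUTER APPLY, so this sketch (Python's sqlite3, SQLite 3.25+) gets the same shape, one row per house with NULLs for a house without rooms, by numbering each house's rooms with ROW_NUMBER() and left-joining only row 1. The houses, streets, dates and colors are hypothetical sample data:

```python
import sqlite3


def latest_room_per_house():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE house (house_id INTEGER PRIMARY KEY, street_name TEXT)")
    conn.execute("CREATE TABLE room (house_id INTEGER, create_date TEXT, color TEXT)")
    # Hypothetical sample data; house 7 has no rooms at all.
    conn.executemany("INSERT INTO house VALUES (?, ?)",
                     [(5, "Elm St"), (6, "Oak Ave"), (7, "Pine Rd")])
    conn.executemany("INSERT INTO room VALUES (?, ?, ?)", [
        (5, "2020-01-01", "red"),
        (5, "2021-06-01", "blue"),
        (6, "2019-03-01", "green"),
    ])
    rows = conn.execute(
        """
        SELECT h.house_id, h.street_name, r.create_date, r.color
        FROM house h
        LEFT JOIN (
            SELECT house_id, create_date, color,
                   -- 1 = most recently created room within each house
                   ROW_NUMBER() OVER (PARTITION BY house_id ORDER BY create_date DESC) AS rn
            FROM room
        ) r ON r.house_id = h.house_id AND r.rn = 1
        ORDER BY h.house_id
        """
    ).fetchall()
    conn.close()
    return rows


for row in latest_room_per_house():
    print(row)
```

ISO-formatted date strings sort correctly as text, which is why create_date can be stored as TEXT here.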
One option is TOP 1 WITH TIES; it's also best to use an explicit JOIN.
If you want to see ties, change row_number() to dense_rank()
select top 1 with ties
house.house_id
,room.create_date
, room.color
from house
join room on house.house_id = room.house_id
Where house.house_id = 5
Order by row_number() over (partition by house.house_ID order by room.create_date desc)
In standard SQL, you would write this as:
select r.house_id, r.create_date, r.color
from room r
where r.house_id = 5
order by r.create_date desc
offset 0 row fetch first 1 row only;
Note that the house table is not needed. If you did need columns from it, then you would use join/on syntax.
Not all databases support the standard offset/fetch clause. You might need to use limit, select top or something else depending on your database.
The above works in SQL Server, but is probably more commonly written as:
select top (1) r.house_id, r.create_date, r.color
from room r
where r.house_id = 5
order by r.create_date desc;
You could order by create_date and then select top 1:
select top 1 house.house_id, room.create_date, room.color
from house, room
where house.house_id = room.house_id
and house.house_id = 5
order by room.create_date desc

Sql max trophy count

I created a database in SQL about basketball. My teacher gave me a task: print out the basketball players from my database with the maximum trophy count. So I wrote this bit of code:
select surname ,count(player_id) as trophy_count
from dbo.Players p
left join Trophies t on player_id=p.id
group by p.surname
and SQL gave me this:
But I want SQL to print only this:
I read about selects within selects (subqueries), but I don't understand how they work; I tried, but it didn't work.
Use TOP:
SELECT TOP 1 surname, COUNT(player_id) AS trophy_count -- or TOP 1 WITH TIES
FROM dbo.Players p
LEFT JOIN Trophies t
ON t.player_id = p.id
GROUP BY p.surname
ORDER BY COUNT(player_id) DESC;
If you want to get all ties for the highest count, then use SELECT TOP 1 WITH TIES.
;WITH CTE AS
(
select p.surname, count(t.player_id) as trophy_count
from dbo.Players p
left join Trophies t on t.player_id = p.id
group by p.surname
)
select *
from CTE
where trophy_count = (select max(trophy_count) from CTE)
While SELECT TOP 1 WITH TIES works (and is probably more efficient), I would say this code is more useful in the real world, because a very simple modification lets it find the max, min or a specific trophy count.
This basically does your GROUP BY first, then lets you specify which results you want back. In this instance you can use:
max(trophy_count) - get the maximum
min(trophy_count) - get the minimum
a literal, i.e. where trophy_count = 3 - get a specific trophy count
avg(trophy_count) - get the average trophy_count
There are many others. Google "SQL aggregate functions".
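A small sketch of the CTE approach with Python's sqlite3, showing that ties at the top are all returned and that swapping MAX for MIN is the "very simple modification" mentioned above. The dbo schema is dropped for SQLite, and the players and trophies are hypothetical:

```python
import sqlite3


def players_by_trophy_count(agg="MAX"):
    # agg is interpolated into the SQL text, so restrict it to known function names.
    assert agg in ("MAX", "MIN")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Players (id INTEGER PRIMARY KEY, surname TEXT)")
    conn.execute("CREATE TABLE Trophies (id INTEGER PRIMARY KEY, player_id INTEGER)")
    # Hypothetical data: Adams and Baker tie on 3 trophies, Clark has 1, Davis has none.
    conn.executemany("INSERT INTO Players VALUES (?, ?)",
                     [(1, "Adams"), (2, "Baker"), (3, "Clark"), (4, "Davis")])
    conn.executemany("INSERT INTO Trophies (player_id) VALUES (?)",
                     [(1,), (1,), (1,), (2,), (2,), (2,), (3,)])
    rows = conn.execute(
        f"""
        WITH CTE AS (
            SELECT p.surname, COUNT(t.player_id) AS trophy_count
            FROM Players p
            LEFT JOIN Trophies t ON t.player_id = p.id
            GROUP BY p.surname
        )
        SELECT surname, trophy_count
        FROM CTE
        WHERE trophy_count = (SELECT {agg}(trophy_count) FROM CTE)
        ORDER BY surname
        """
    ).fetchall()
    conn.close()
    return rows


print(players_by_trophy_count())       # [('Adams', 3), ('Baker', 3)]
print(players_by_trophy_count("MIN"))  # [('Davis', 0)]
```

Counting t.player_id rather than * matters with the LEFT JOIN: a player with no trophies gets 0 instead of 1.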
You will eventually go down the rabbit hole of needing to subdivide this (for example by week or by league). Then you are going to want to use window functions with a CTE or subquery.
For your example:
;with cte_base as
(
-- Set your detail here (this step is only needed if you are looking at aggregates)
select p.surname, count(t.player_id) as Ct
from dbo.Players p
left join Trophies t on t.player_id = p.id
group by p.surname
)
, cte_ranked as
-- Dense_rank is chosen because of ties
-- Add a PARTITION BY (e.g. by league) to rank within each group instead of overall
(
select *
, dr = DENSE_RANK() over (order by Ct desc)
from cte_base
)
select *
from cte_ranked
where dr = 1 -- Bring back only the #1 ranked row(s)
This is overkill, but it helps lay the foundation for handling much more complicated queries. Tim Biegeleisen's answer is more than adequate to answer your question.
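As an example of the "by league" breakdown mentioned above, here is a sketch in Python's sqlite3 (SQLite 3.25+). The league column and all data are hypothetical; the only change from the overall ranking is the PARTITION BY league, which restarts DENSE_RANK() for each league and keeps ties:

```python
import sqlite3


def top_player_per_league():
    conn = sqlite3.connect(":memory:")
    # "league" is a hypothetical column added to illustrate partitioning.
    conn.execute("CREATE TABLE Players (id INTEGER PRIMARY KEY, surname TEXT, league TEXT)")
    conn.execute("CREATE TABLE Trophies (id INTEGER PRIMARY KEY, player_id INTEGER)")
    conn.executemany("INSERT INTO Players VALUES (?, ?, ?)", [
        (1, "Adams", "East"), (2, "Baker", "East"),
        (3, "Clark", "West"), (4, "Davis", "West"), (5, "Evans", "West"),
    ])
    # Trophy counts: Adams 3, Baker 1, Clark 2, Davis 2, Evans 0.
    conn.executemany("INSERT INTO Trophies (player_id) VALUES (?)",
                     [(1,), (1,), (1,), (2,), (3,), (3,), (4,), (4,)])
    rows = conn.execute(
        """
        WITH cte_base AS (
            SELECT p.league, p.surname, COUNT(t.player_id) AS ct
            FROM Players p
            LEFT JOIN Trophies t ON t.player_id = p.id
            GROUP BY p.league, p.surname
        ),
        cte_ranked AS (
            -- ranking restarts for each league; tied players share rank 1
            SELECT *, DENSE_RANK() OVER (PARTITION BY league ORDER BY ct DESC) AS dr
            FROM cte_base
        )
        SELECT league, surname, ct
        FROM cte_ranked
        WHERE dr = 1
        ORDER BY league, surname
        """
    ).fetchall()
    conn.close()
    return rows


for row in top_player_per_league():
    print(row)
```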

Trying to figure out how to join these queries

I have a table named Grades with the columns Student, Practical and Written. I am trying to find the top 5 students by total score on the test. Here are the queries I have; I'm not sure how to join them correctly. I am using Oracle 11g.
This gets me the total for each student:
SELECT Student, Practical, Written, (Practical+Written) AS SumColumn
FROM Grades;
This gets the top 5 students:
SELECT Student
FROM ( SELECT Student,
, DENSE_RANK() OVER (ORDER BY Score DESC) as Score_dr
FROM Grades )
WHERE Student_dr <= 5
order by Student_dr;
The approach I prefer is data-centric, rather than row-position centric:
SELECT g.Student, g.Practical, g.Written, (g.Practical+g.Written) AS SumColumn
FROM Grades g
LEFT JOIN Grades g2 on g2.Practical+g2.Written > g.Practical+g.Written
GROUP BY g.Student, g.Practical, g.Written
HAVING COUNT(g2.Student) < 5
ORDER BY g.Practical+g.Written DESC
This works by joining each student with all students that have greater scores, then using a HAVING clause to keep only those with fewer than 5 higher-scoring students, giving you the top 5.
The left join is needed to return the top scorer(s), which have no other students with greater scores to join to.
Ties are all returned, leading to more than 5 rows in the case of a tie for 5th.
By not using row position logic, which varies from database to database, this query is also completely portable.
Note that the ORDER BY is optional.
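A quick check of the self-join in Python's sqlite3, with hypothetical grades in which two students tie for 5th place; both come back, so six rows are returned:

```python
import sqlite3

# Hypothetical grades: Eve and Fay tie for 5th place with a total of 75.
GRADES = [
    ("Ann", 50, 45), ("Ben", 40, 50), ("Cat", 40, 45), ("Dan", 40, 40),
    ("Eve", 35, 40), ("Fay", 30, 45), ("Gus", 30, 30),
]


def top5_by_self_join():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Grades (Student TEXT, Practical INTEGER, Written INTEGER)")
    conn.executemany("INSERT INTO Grades VALUES (?, ?, ?)", GRADES)
    rows = conn.execute(
        """
        SELECT g.Student, g.Practical + g.Written AS SumColumn
        FROM Grades g
        -- pair each student with every student who scored strictly higher
        LEFT JOIN Grades g2 ON g2.Practical + g2.Written > g.Practical + g.Written
        GROUP BY g.Student, g.Practical, g.Written
        -- keep students with fewer than 5 higher scorers (the top scorer has 0)
        HAVING COUNT(g2.Student) < 5
        ORDER BY SumColumn DESC, g.Student
        """
    ).fetchall()
    conn.close()
    return rows


for student, total in top5_by_self_join():
    print(student, total)
```

Because every student is joined to every higher scorer, the intermediate row count grows roughly with the square of the table size, so this is best suited to modest tables.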
With Oracle you can do:
SELECT g.Student, g.Practical, g.Written, (g.Practical+g.Written) AS SumColumn
FROM ( SELECT Student, DENSE_RANK() OVER (ORDER BY Practical+Written DESC) AS Score_dr
FROM Grades ) score, Grades g
WHERE score.Score_dr <= 5
AND score.Student = g.Student
ORDER BY score.Score_dr;
You can easily include the projection of the first query in the sub-query of the second.
SELECT Student
, Practical
, Written
, tot_score
FROM (
SELECT Student
, Practical
, Written
, (Practical+Written) AS tot_score
, DENSE_RANK() OVER (ORDER BY (Practical+Written) DESC) as Score_dr
FROM Grades
)
WHERE Score_dr <= 5
order by Score_dr;
One virtue of analytic functions is that we can just use them in any query. This distinguishes them from aggregate functions, where we need to include all non-aggregate columns in the GROUP BY clause (at least with Oracle).
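One detail worth checking is the choice of ranking function. The sketch below (Python's sqlite3, SQLite 3.25+, hypothetical grades with a tie at the top) runs the analytic query with DENSE_RANK() and with RANK(). DENSE_RANK() <= 5 keeps the top five distinct totals, so a tie higher up lets a sixth student in; RANK() <= 5 matches the self-join answer above:

```python
import sqlite3

# Hypothetical grades with a tie at the top (Ann and Ben both total 95).
GRADES = [
    ("Ann", 50, 45), ("Ben", 45, 50), ("Cat", 40, 45), ("Dan", 40, 40),
    ("Eve", 35, 40), ("Fay", 35, 35), ("Gus", 30, 30),
]


def top5(rank_function):
    # rank_function is interpolated into the SQL, so only allow the two known names.
    assert rank_function in ("DENSE_RANK", "RANK")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Grades (Student TEXT, Practical INTEGER, Written INTEGER)")
    conn.executemany("INSERT INTO Grades VALUES (?, ?, ?)", GRADES)
    rows = conn.execute(
        f"""
        SELECT Student, tot_score
        FROM (
            SELECT Student, Practical + Written AS tot_score,
                   {rank_function}() OVER (ORDER BY Practical + Written DESC) AS Score_dr
            FROM Grades
        )
        WHERE Score_dr <= 5
        ORDER BY Score_dr, Student
        """
    ).fetchall()
    conn.close()
    return [student for student, _ in rows]


print(top5("DENSE_RANK"))  # ['Ann', 'Ben', 'Cat', 'Dan', 'Eve', 'Fay']
print(top5("RANK"))        # ['Ann', 'Ben', 'Cat', 'Dan', 'Eve']
```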

TSQL - Sum of Top 3 records of multiple teams

I am trying to generate a TSQL query that will take the top 3 scores (out of about 50) for a group of teams, sum the total of just those 3 scores and give me a result set that has just the name of the team, and that total score ordered by the score descending. I'm pretty sure it is a nested query - but for the life of me can't get it to work!
Here are the specifics, there is only 1 table involved....
table = comp_lineup (this table holds a separate record for each athlete in a match)
* athlete
* team
* score
There are many athletes to a match - each one belongs to a team.
Example:
id athlete team score
1 1 1 24
2 2 1 23
3 3 2 21
4 4 2 25
5 5 1 20
Thank You!
It is indeed a subquery, which I often put in a CTE instead just for clarity. The trick is the use of the rank() function.
;with RankedScores as (
select
id,
athlete,
team,
score,
rank() over (partition by team order by score desc) ScoreRank
from
comp_lineup
)
select
Team,
sum(Score) TotalScore
from
RankedScores
where
ScoreRank <= 3
group by
team
order by
TotalScore desc
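Here is the same CTE run in Python's sqlite3 (SQLite 3.25+) on the question's sample rows, plus one hypothetical extra row (id 6) so that team 1 has four scores and the lowest one is actually dropped:

```python
import sqlite3

# The question's sample rows plus a hypothetical sixth row for team 1.
ROWS = [(1, 1, 1, 24), (2, 2, 1, 23), (3, 3, 2, 21),
        (4, 4, 2, 25), (5, 5, 1, 20), (6, 6, 1, 10)]


def top3_totals():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comp_lineup (id INTEGER, athlete INTEGER, team INTEGER, score INTEGER)")
    conn.executemany("INSERT INTO comp_lineup VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(
        """
        WITH RankedScores AS (
            SELECT team, score,
                   RANK() OVER (PARTITION BY team ORDER BY score DESC) AS ScoreRank
            FROM comp_lineup
        )
        SELECT team, SUM(score) AS TotalScore
        FROM RankedScores
        WHERE ScoreRank <= 3
        GROUP BY team
        ORDER BY TotalScore DESC
        """
    ).fetchall()
    conn.close()
    return rows


print(top3_totals())  # [(1, 67), (2, 46)]
```

With RANK(), a tie for third place would include more than three scores in the sum; ROW_NUMBER() would keep exactly three.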
To get the top n values for every group of data, a query template is:
Select group_value, sum(value) total_value
From mytable ext
Where id in (Select top *n* id
From mytable sub
Where ext.group_value = sub.group_value
Order By value desc)
Group By group_value
The subquery retrieves only the IDs of the valid rows for the current group_value; the connection between the two datasets is the Where ext.group_value = sub.group_value part, and the WHERE in the main query masks every other ID, like a cursor.
For the specific question, the template becomes:
Select team, sum(score) total_score
From comp_lineup ext
Where id in (Select top 3 id
From comp_lineup sub
Where ext.team = sub.team
Order By score desc)
Group By team
Order By sum(score) Desc
with the added Order By in the main query for the descending total score
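The correlated-subquery version can be tried in Python's sqlite3 as well. This sketch assumes LIMIT 3 as SQLite's stand-in for TOP 3 and reuses the question's rows plus the same hypothetical sixth row; the subquery is evaluated per outer row and returns that team's three best ids:

```python
import sqlite3

# The question's sample rows plus a hypothetical sixth row for team 1.
ROWS = [(1, 1, 1, 24), (2, 2, 1, 23), (3, 3, 2, 21),
        (4, 4, 2, 25), (5, 5, 1, 20), (6, 6, 1, 10)]


def top3_totals_correlated():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comp_lineup (id INTEGER, athlete INTEGER, team INTEGER, score INTEGER)")
    conn.executemany("INSERT INTO comp_lineup VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(
        """
        SELECT team, SUM(score) AS total_score
        FROM comp_lineup ext
        WHERE id IN (
            -- the three highest-scoring ids of the outer row's team
            SELECT id
            FROM comp_lineup sub
            WHERE ext.team = sub.team
            ORDER BY score DESC
            LIMIT 3
        )
        GROUP BY team
        ORDER BY SUM(score) DESC
        """
    ).fetchall()
    conn.close()
    return rows


print(top3_totals_correlated())  # [(1, 67), (2, 46)]
```

The result matches the window-function version above; the trade-off is that the subquery runs once per row instead of a single ranking pass.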

Fetch one row per account id from list, part 2

Not sure how to ask a followup on SO, but this is in reference to an earlier question:
Fetch one row per account id from list
The query I'm working with is:
SELECT *
FROM scores s1
WHERE accountid NOT IN (SELECT accountid FROM scores s2 WHERE s1.score < s2.score)
ORDER BY score DESC
This selects the top scores, and limits results to one row per accountid; their top score.
The last hurdle is that this query is returning multiple rows for accountids that have multiple occurrences of their top score. So if accountid 17 has scores of 40, 75, 30, 75 the query returns both rows with scores of 75.
Can anyone modify this query (or provide a better one) to fix this case, and truly limit it to one row per account id?
Thanks again!
If you're only interested in the accountid and the score, then you can use the simple GROUP BY query given by Paul above.
SELECT accountid, MAX(score)
FROM scores
GROUP BY accountid;
If you need other attributes from the scores table, you can get them from the matching row with a query like the following:
SELECT s1.*
FROM scores AS s1
LEFT OUTER JOIN scores AS s2 ON (s1.accountid = s2.accountid
AND s1.score < s2.score)
WHERE s2.accountid IS NULL;
But this still gives multiple rows, in your example where a given accountid has two scores matching its maximum value. To further reduce the result set to a single row, for example the row with the latest gamedate, try this:
SELECT s1.*
FROM scores AS s1
LEFT OUTER JOIN scores AS s2 ON (s1.accountid = s2.accountid
AND s1.score < s2.score)
LEFT OUTER JOIN scores AS s3 ON (s1.accountid = s3.accountid
AND s1.score = s3.score AND s1.gamedate < s3.gamedate)
WHERE s2.accountid IS NULL
AND s3.accountid IS NULL;
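A check of the double anti-join in Python's sqlite3: account 17 has the scores from the question (40, 75, 30, 75), while the scoreid values, game dates and account 18 are hypothetical. Only the later of the two 75s survives:

```python
import sqlite3

# Account 17 uses the question's scores; ids, dates and account 18 are hypothetical.
SCORES = [
    (1, 17, 40, "2020-01-01"),
    (2, 17, 75, "2020-02-01"),
    (3, 17, 30, "2020-03-01"),
    (4, 17, 75, "2020-04-01"),
    (5, 18, 50, "2020-01-15"),
]


def best_score_per_account():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (scoreid INTEGER, accountid INTEGER, score INTEGER, gamedate TEXT)")
    conn.executemany("INSERT INTO scores VALUES (?, ?, ?, ?)", SCORES)
    rows = conn.execute(
        """
        SELECT s1.scoreid, s1.accountid, s1.score, s1.gamedate
        FROM scores AS s1
        -- s2: any higher score for the same account
        LEFT OUTER JOIN scores AS s2 ON (s1.accountid = s2.accountid
                                         AND s1.score < s2.score)
        -- s3: an equal score for the same account from a later game
        LEFT OUTER JOIN scores AS s3 ON (s1.accountid = s3.accountid
                                         AND s1.score = s3.score AND s1.gamedate < s3.gamedate)
        WHERE s2.accountid IS NULL
          AND s3.accountid IS NULL
        ORDER BY s1.accountid
        """
    ).fetchall()
    conn.close()
    return rows


for row in best_score_per_account():
    print(row)
```

Two equal top scores on the same gamedate would still both be returned; breaking that tie needs another column, such as scoreid.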
select accountid, max(score) from scores group by accountid;
If your RDBMS supports them, then an analytic function would be a good approach particularly if you need all the columns of the row.
select ...
from (
select accountid,
score,
...
row_number() over
(partition by accountid
order by score desc) score_rank
from scores)
where score_rank = 1;
The row returned is indeterminate in the case you describe, but you can easily modify the analytic function, for example by ordering on (score desc, test_date desc) to get the more recent of two matching high scores.
Other analytic functions based on rank will achieve a similar purpose.
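A runnable version of the ROW_NUMBER() approach with the tiebreaker, in Python's sqlite3 (SQLite 3.25+). The answer's test_date is assumed here to be a column called gamedate, as in the previous answer; the data is the same hypothetical set, with account 17's scores taken from the question:

```python
import sqlite3

# Account 17 uses the question's scores; ids, dates and account 18 are hypothetical.
SCORES = [
    (1, 17, 40, "2020-01-01"),
    (2, 17, 75, "2020-02-01"),
    (3, 17, 30, "2020-03-01"),
    (4, 17, 75, "2020-04-01"),
    (5, 18, 50, "2020-01-15"),
]


def best_score_per_account():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (scoreid INTEGER, accountid INTEGER, score INTEGER, gamedate TEXT)")
    conn.executemany("INSERT INTO scores VALUES (?, ?, ?, ?)", SCORES)
    rows = conn.execute(
        """
        SELECT scoreid, accountid, score, gamedate
        FROM (
            SELECT scoreid, accountid, score, gamedate,
                   -- highest score first; among equal scores, the most recent game first
                   ROW_NUMBER() OVER (PARTITION BY accountid
                                      ORDER BY score DESC, gamedate DESC) AS score_rank
            FROM scores
        )
        WHERE score_rank = 1
        ORDER BY accountid
        """
    ).fetchall()
    conn.close()
    return rows


for row in best_score_per_account():
    print(row)
```

Unlike the self-join approach, ROW_NUMBER() always returns exactly one row per account, even if the score and gamedate both tie.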
If you don't mind duplicates, then the following would probably be more efficient than your current method:
select ...
from (
select accountid,
score,
...
max(score) over (partition by accountid) max_score
from scores)
where score = max_score;
If you are selecting a subset of columns then you can use the DISTINCT keyword to filter results.
SELECT DISTINCT accountid, score
FROM scores s1
WHERE accountid NOT IN (SELECT accountid FROM scores s2 WHERE s1.score < s2.score)
ORDER BY score DESC
Does your database support distinct? As in select distinct x from y?
This solution works in MS SQL, giving you the whole row.
SELECT *
FROM scores
WHERE scoreid in
(
SELECT max(scoreid)
FROM scores as s2
JOIN
(
SELECT max(score) as maxscore, accountid
FROM scores s1
GROUP BY accountid
) sub ON s2.score = sub.maxscore AND s2.accountid = sub.accountid
GROUP BY s2.score, s2.accountid
)