SQL: max(count(x))

I have a table of baseball fielding statistics for a project. There are many fields on this table, but the ones I care about for this are playerID, pos (position), G (games).
This table is historical so it contains multiple rows per playerID (one for each year/pos). What I want to be able to do is return the position that a player played the most for his career.
First, what I imagine I have to do is count the games per position per playerID, then return the max of it. How can this be done in SQL? I am using SQL Server. On a side note, there may be ties; what would max do then?

If the player played in the same position over multiple teams over multiple games, I'd be more apt to use the sum() function, instead of count, in addition to using a group by statement, as a sub-query. See code for explanation.
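-- Sum games per player and position, then take the best total per player.
-- Note: selecting pos without grouping by it relies on a lenient GROUP BY mode
-- (such as MySQL's); SQL Server rejects it, and also rejects ORDER BY in a derived table.
SELECT playerID, pos, MAX( g_sum )
FROM (
SELECT playerID, pos, SUM( G ) as g_sum
FROM player_stats
GROUP BY playerID, pos
ORDER BY 3 DESC
) game_sums
GROUP BY playerID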
It may not be the exact answer, but it's a decent starting point, and it worked on the quick testbed that I whipped up in 10 minutes.
As far as how max() acts with ties: It doesn't (as far as I can tell, at least). It's up to the actual GROUP BY statement itself, and where and how that max value shows up within the query or sub query.
If we were to include pos in the outer GROUP BY statement, in the event of a tie, it would show you both positions and the amount of games the player has played at said positions (which would be the same number). With it not in that GROUP BY statement, the query will go with the last given value for that column. So if position 2 showed up before position 3 in the sub query, the full query will show position 3 as the position that the player has played the most games in.

In SQL, I believe this will do it. Given that the same subquery is needed twice, I expect that doing this as a stored procedure would be more efficient.
SELECT MaxGamesInAnyPosition.playerID, GamesPerPosition.pos
FROM (
SELECT playerID, Max(totalGames) As maxGames
FROM (
SELECT playerID, pos, SUM(G) As totalGames
FROM tblStats
GROUP BY playerId, pos) Tallies
GROUP BY playerID) MaxGamesInAnyPosition
INNER JOIN (
SELECT playerID, pos, SUM(g) As totalGames
FROM tblStats
GROUP BY playerID, pos) GamesPerPosition
ON (MaxGamesInAnyPosition.playerID=GamesPerPosition.playerId
AND MaxGamesInAnyPosition.maxGames=GamesPerPosition.totalGames)
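To see how the join-on-max approach treats ties, here is a minimal sketch run against SQLite through Python's sqlite3 module rather than SQL Server. The sample rows are made up for illustration, and the table is reduced to the three columns from the question.

```python
import sqlite3

# Hypothetical sample data: player "a" is tied between 1B and 3B at 30 games each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblStats (playerID TEXT, pos TEXT, G INTEGER)")
conn.executemany(
    "INSERT INTO tblStats VALUES (?, ?, ?)",
    [
        ("a", "1B", 10),
        ("a", "1B", 20),
        ("a", "3B", 30),
        ("b", "SS", 50),
        ("b", "2B", 5),
    ],
)

# Career totals per (player, position), joined to each player's best total.
query = """
SELECT g.playerID, g.pos, g.totalGames
FROM (SELECT playerID, pos, SUM(G) AS totalGames
      FROM tblStats
      GROUP BY playerID, pos) AS g
JOIN (SELECT playerID, MAX(totalGames) AS maxGames
      FROM (SELECT playerID, pos, SUM(G) AS totalGames
            FROM tblStats
            GROUP BY playerID, pos) AS tallies
      GROUP BY playerID) AS m
  ON m.playerID = g.playerID AND m.maxGames = g.totalGames
ORDER BY g.playerID, g.pos
"""
top_positions = conn.execute(query).fetchall()
print(top_positions)  # [('a', '1B', 30), ('a', '3B', 30), ('b', 'SS', 50)]
```

Both tied positions come back for player "a", because MAX only yields the number and the join then matches every position that reaches it.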

It does not look pretty, but it is a direct translation of what I built in LINQ to SQL. Give it a try and see if that's what you want:
SELECT [t2].[playerID], (
SELECT TOP (1) [t7].[pos]
FROM (
SELECT [t4].[playerID], [t4].[pos], (
SELECT COUNT(*)
FROM (
SELECT DISTINCT [t5].[G]
FROM [players] AS [t5]
WHERE ([t4].[playerID] = [t5].[playerID]) AND ([t4].[pos] = [t5].[pos])
) AS [t6]
) AS [value]
FROM (
SELECT [t3].[playerID], [t3].[pos]
FROM [players] AS [t3]
GROUP BY [t3].[playerID], [t3].[pos]
) AS [t4]
) AS [t7]
WHERE [t2].[playerID] = [t7].[playerID]
ORDER BY [t7].[value] DESC
) AS [pos]
FROM (
SELECT [t1].[playerID]
FROM (
SELECT [t0].[playerID]
FROM [players] AS [t0]
GROUP BY [t0].[playerID], [t0].[pos]
) AS [t1]
GROUP BY [t1].[playerID]
) AS [t2]

Here is a second answer, much better (I think) than my first kick at the can last night. Certainly much easier to read and understand.
SELECT playerID, pos
FROM (
SELECT playerID, pos, SUM(G) As totGames
FROM tblStats
GROUP BY playerID, pos) Totals
WHERE NOT (Totals.totGames < ANY(
SELECT SUM(G)
FROM tblStats
WHERE Totals.playerID=tblStats.playerID
GROUP BY playerID, pos))
The subquery ensures that a row is thrown out if its games total at that position is smaller than the number of games that player played at any other position.
In case of ties, the player in question will have all tied rows appear, as none of the tied records will be thrown out.

Related

Sql max trophy count

I created a database in SQL about basketball. My teacher gave me a task: I need to print out the basketball players from my database with the max trophy count. So, I wrote this little bit of code:
select surname ,count(player_id) as trophy_count
from dbo.Players p
left join Trophies t on player_id=p.id
group by p.surname
and SQL gave me this:
but I want SQL to print only this:
I read about selects within selects (subqueries), but I don't know how they work. I tried, but it didn't work.
Use TOP:
SELECT TOP 1 surname, COUNT(player_id) AS trophy_count -- or TOP 1 WITH TIES
FROM dbo.Players p
LEFT JOIN Trophies t
ON t.player_id = p.id
GROUP BY p.surname
ORDER BY COUNT(player_id) DESC;
If you want to get all ties for the highest count, then use SELECT TOP 1 WITH TIES.
;WITH CTE AS
(
select p.surname, count(t.player_id) as trophy_count
from dbo.Players p
left join Trophies t on t.player_id = p.id
group by p.surname
)
select *
from CTE
where trophy_count = (select max(trophy_count) from CTE)
While SELECT TOP 1 WITH TIES works (and is probably more efficient), I would say this code is probably more useful in the real world, as it can be used to find the max, min, or a specific trophy count with a very simple modification.
This basically does your GROUP BY first, then lets you specify which results you want back. In this instance you can use:
max(trophy_count) - get the maximum
min(trophy_count) - get the minimum
trophy_count = 3 - get a specific trophy count (i.e. where trophy_count = 3)
avg(trophy_count) - get the average trophy_count
There are many others. Google "SQL aggregate functions".
You will eventually go down the rabbit hole of needing to subsection this (for example, by week or by league). Then you are going to want to use window functions with a CTE or subquery.
For your example:
;with cte_base as
(
-- Set your detail here (this step is only needed if you are looking at aggregates)
select p.surname, Count(t.player_id) Ct
from dbo.Players p
left join Trophies t on t.player_id = p.id
group by p.surname
)
, cte_ranked as
-- Dense_rank is chosen because of ties
-- Add a partition by (e.g. league) to rank within each group of your detail
(
select *
, dr = DENSE_RANK() over (order by Ct desc)
from cte_base
)
select *
from cte_ranked
where dr = 1 -- Bring back only the #1 (of each partition, if you add one)
This is overkill by far, but it helps you lay the foundation for handling much more complicated queries. Tim Biegeleisen's answer is more than adequate to answer your question.
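As a quick check of the group-first, filter-on-max idea, here is a minimal sketch run against SQLite through Python's sqlite3 module instead of SQL Server. The surnames and trophy rows are invented for illustration, and the Trophies table is assumed to have a player_id column as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Players (id INTEGER PRIMARY KEY, surname TEXT);
CREATE TABLE Trophies (id INTEGER PRIMARY KEY, player_id INTEGER);
-- Hypothetical data: Adams and Brown have two trophies each, Clark has none.
INSERT INTO Players VALUES (1, 'Adams'), (2, 'Brown'), (3, 'Clark');
INSERT INTO Trophies (player_id) VALUES (1), (1), (2), (2);
""")

# COUNT(t.player_id) ignores the NULLs produced by the LEFT JOIN,
# so a player without trophies gets 0 rather than 1.
query = """
WITH counts AS (
    SELECT p.surname, COUNT(t.player_id) AS trophy_count
    FROM Players p
    LEFT JOIN Trophies t ON t.player_id = p.id
    GROUP BY p.surname
)
SELECT surname, trophy_count
FROM counts
WHERE trophy_count = (SELECT MAX(trophy_count) FROM counts)
ORDER BY surname
"""
leaders = conn.execute(query).fetchall()
print(leaders)  # [('Adams', 2), ('Brown', 2)]
```

Swapping MAX for MIN in the last subquery would return Clark with a count of 0 instead.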

SQL FROM clause missing error

I just started learning SQL the other day and hit a stumbling block. I've got some code that looks like this:
SELECT player, SUM(wins) from (
SELECT win_counts.player, win_counts.wins
from win_counts
UNION
SELECT lose_counts.player, lose_counts.loses
from lose_counts
group by win_counts.player
) as temp_alias
Here is the error I get:
ERROR: missing FROM-clause entry for table "win_counts"
LINE 7: group by win_counts.player
The win_counts table contains a list of player ids and the number of matches they have won. The lose_counts table contains a list of player ids and the number of matches they have lost. Ultimately I want a table of player ids and the total number of matches each player has played.
Thank you for the help. Sorry I don't have more information... my understanding of sql is pretty rudimentary.
Group by appears to be in the wrong place.
SELECT player, SUM(wins) as SumWinsLoses
FROM(
SELECT win_counts.player, win_counts.wins
FROM win_counts
UNION ALL -- as Gordon points out 'ALL' is likely needed, otherwise your sum will be
-- off as the UNION alone will distinct the values before the sum
-- and if you have a player with the same wins and losses (2),
-- the sum will return only 2 instead of (4).
SELECT lose_counts.player, lose_counts.loses
FROM lose_counts) as temp_alias
GROUP BY player
Just so we are clear, though: SUM(wins) will sum wins and losses together as "wins", because the column names of a union come from its first SELECT. So a player's wins and losses are going to be aggregated.
Here's a working SQL Fiddle. Notice that without the UNION ALL, player #2 has an improper count.
You already have a good answer and comments from others. For your edification:
In some scenarios it might be more efficient to aggregate before the union.
select player, sum(wins) from (
select player, count(*) as wins
from win_counts
group by player
UNION ALL /* without ALL you'll eliminate duplicate rows */
select player, count(*) as losses
from lose_counts
group by player
) as t
group by player
This should also give equivalent results if each player has both wins and losses:
select wins.player, wins.wins + losses.losses as total_matches
from
(
select player, count(*) as wins
from win_counts
group by player
) as wins
inner join
(
select player, count(*) as losses
from lose_counts
group by player
) as losses
on losses.player = wins.player
The fix for missing wins/losses is a full outer join:
select
coalesce(wins.player, losses.player) as player,
coalesce(wins.wins, 0) + coalesce(losses.losses, 0) as total_matches
from
(
select player, count(*) as wins
from win_counts
group by player
) as wins
full outer join
(
select player, count(*) as losses
from lose_counts
group by player
) as losses
on losses.player = wins.player
These complicated queries should give you a taste of why it's a bad idea to use separate tables for data that belongs together. In this case you should probably prefer a single table that records all matches as wins, losses (or ties).
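The UNION versus UNION ALL difference is easy to reproduce. The following is a minimal sketch using SQLite through Python's sqlite3 module (the question itself is on Postgres); the counts are made up so that player 2 has the same number of wins and losses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE win_counts (player INTEGER, wins INTEGER);
CREATE TABLE lose_counts (player INTEGER, loses INTEGER);
-- Hypothetical data: player 2 has 2 wins and 2 losses.
INSERT INTO win_counts VALUES (1, 3), (2, 2);
INSERT INTO lose_counts VALUES (1, 1), (2, 2);
""")

# Same query twice; only the set operator changes.
template = """
SELECT player, SUM(wins) AS total_matches
FROM (SELECT player, wins FROM win_counts
      {op}
      SELECT player, loses FROM lose_counts) AS combined
GROUP BY player
ORDER BY player
"""
with_union = conn.execute(template.format(op="UNION")).fetchall()
with_union_all = conn.execute(template.format(op="UNION ALL")).fetchall()

print(with_union)      # [(1, 4), (2, 2)]  player 2's duplicate row was removed
print(with_union_all)  # [(1, 4), (2, 4)]
```

Plain UNION collapses the two identical (2, 2) rows into one before the sum runs, which is why player 2 comes out short.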

Pulling max values grouped by a variable with other columns in SQL

Say I have three columns in a very large table: a timestamp variable (last_time_started), a player name (Michael Jordan), and the team he was on the last time he started (Washington Wizards, Chicago Bulls), how do I pull the last time a player started, grouped by player, showing the team? For example:
if I did
select max(last_time_started), player, team
from table
group by 2
I would not know which team the player was on when he played his last game, which is important to me.
In Postgres the most efficient way is to use distinct on():
SELECT DISTINCT ON (player)
last_time_started,
player,
team
FROM the_table
ORDER BY player, last_time_started DESC;
Using a window function is usually the second fastest solution; using a join with a derived table is usually the slowest alternative.
Here are a couple of ways to do this in Postgres:
With windowing functions:
SELECT last_time_started, player, team
FROM
(
SELECT
last_time_started,
player,
team,
CASE WHEN max(last_time_started) OVER (PARTITION BY player) = last_time_started THEN 'X' END AS max_last_time_started
FROM table
) AS flagged
WHERE max_last_time_started = 'X';
Or with a correlated subquery:
SELECT last_time_started, player, team
FROM table t1
WHERE last_time_started = (SELECT max(last_time_started) FROM table WHERE table.player = t1.player);
Try this solution
select s.*
from table s
inner join (
select max(t.last_time_started) as last_time_started, t.player
from table t
group by t.player) v on s.player = v.player and s.last_time_started = v.last_time_started
This approach should also be faster, because it does not contain a join:
select v.last_time_started,
v.player,
v.team
from (
select t.last_time_started,
t.player,
t.team,
row_number() over (partition by t.player order by last_time_started desc) as n
from table t
) v
where v.n = 1
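For a runnable check of the row_number approach, here is a minimal sketch against SQLite (3.25 or later, for window functions) via Python's sqlite3 module rather than Postgres. The table name the_table follows the first answer; the dates and the second player are placeholders invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE the_table (last_time_started TEXT, player TEXT, team TEXT);
-- Hypothetical rows; ISO-8601 text sorts in chronological order.
INSERT INTO the_table VALUES
    ('2001-01-01', 'Michael Jordan', 'Chicago Bulls'),
    ('2002-01-01', 'Michael Jordan', 'Washington Wizards'),
    ('2001-06-01', 'Player B', 'Team X');
""")

# Number each player's rows from newest to oldest and keep only the newest,
# so the team comes from the same row as the latest timestamp.
query = """
SELECT last_time_started, player, team
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY player
                                ORDER BY last_time_started DESC) AS n
      FROM the_table t) AS ranked
WHERE n = 1
ORDER BY player
"""
latest = conn.execute(query).fetchall()
print(latest)
# [('2002-01-01', 'Michael Jordan', 'Washington Wizards'),
#  ('2001-06-01', 'Player B', 'Team X')]
```

Unlike max() with a group by on player alone, every returned column belongs to one actual row of the table.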

GROUP BY, value could be in either of 2 columns

Ok, I'm really not sure how to tackle this SQL problem so I'll just go ahead and explain it...
I have a table with the following columns :
gameId, winnerPlayerId, winnerFactionId, loserPlayerId, loserFactionId
Now, I'd like to build a query that, given a factionId, outputs the following data :
playerId, totalGamesPlayedAsFaction
What stumps me here is that each row in the table needs to be counted TWICE, once for the loser and once for the winner... Therefore I can't use a simple GROUP BY winnerPlayerId.
I feel as if my question is not particularly clear and that the solution is probably quite simple...
You need to "duplicate" the data somehow. The most typical way would use union all:
select . . .
from ((select gameId, winnerPlayerId as PlayerId, winnerFactionId as FactionId
from table t
) union all
(select gameId, loserPlayerId as PlayerId, loserFactionId as FactionId
from table t
)
) t
group by PlayerId, FactionId;
Looks like a job for the ever popular self join
select playerid, count(*) total
from player p left join game winner on p.playerid = winner.winnerplayerid
left join game loser on p.playerid = loser.loserplayerid
etc
This will get you started. You'll have to contend with filters and null values on your own.
The glamorous and multi-purpose way would be to do an unpivot of sorts, but the quick and dirty way is similar to the union all method. The full SQL for that would be:
select games.PlayerId, COUNT(games.gameId)
from ((select gameId, winnerPlayerId as PlayerId, winnerFactionId as FactionId
from tblGame winners
)
union all
(select gameId, loserPlayerId as PlayerId, loserFactionId as FactionId
from tblGame losers
)
) games
where games.FactionId = 'x'
group by games.PlayerId
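To check the union all method on a small case, here is a minimal sketch using SQLite through Python's sqlite3 module. The three game rows are invented, and the faction id is passed as a bound parameter instead of a literal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblGame (gameId INTEGER, winnerPlayerId INTEGER, winnerFactionId TEXT,
                      loserPlayerId INTEGER, loserFactionId TEXT);
-- Hypothetical games: in game 2 both players used faction 'x'.
INSERT INTO tblGame VALUES
    (1, 10, 'x', 20, 'y'),
    (2, 20, 'x', 10, 'x'),
    (3, 10, 'y', 30, 'x');
""")

# Each game row is turned into two rows (winner side, loser side),
# then filtered by faction and counted per player.
query = """
SELECT games.PlayerId, COUNT(*) AS totalGamesPlayedAsFaction
FROM (SELECT gameId, winnerPlayerId AS PlayerId, winnerFactionId AS FactionId
      FROM tblGame
      UNION ALL
      SELECT gameId, loserPlayerId AS PlayerId, loserFactionId AS FactionId
      FROM tblGame) AS games
WHERE games.FactionId = ?
GROUP BY games.PlayerId
ORDER BY games.PlayerId
"""
faction_counts = conn.execute(query, ("x",)).fetchall()
print(faction_counts)  # [(10, 2), (20, 1), (30, 1)]
```

Player 10 is counted once as a winner (game 1) and once as a loser (game 2), which a GROUP BY on winnerPlayerId alone would miss.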

Finding entry with maximum date range between two columns in SQL

Note: this is in Oracle not MySQL, limit/top won't work.
I want to return the name of the person who has stayed the longest in a hotel. The length of a stay can be found by subtracting the date in the checkin column from the date in the checkout column.
So far I have:
select fans.name
from fans
where fans.checkout-fans.checkin is not null
order by fans.checkout-fans.checkin desc;
but this only orders the people by length of stay, from highest to lowest. I want it to return only the name (or names, if they are tied) of the people who stayed the longest. Since more than one person could have stayed for the longest time, simply adding limit 1 to the end won't do.
Edit (for gbn): when adding a join to get checkin/checkout from another table, it won't work (no records returned).
Edit 2: solved now; the join below should have been players.team = teams.name.
select
x.name
from
(
select
players.name,
dense_rank() over (order by teams.checkout-teams.checkin desc) as rnk
from
players
join teams
on players.name = teams.name
where
teams.checkout-teams.checkin is not null
) x
where
x.rnk = 1
It should be this, using DENSE_RANK to get ties:
select
x.name
from
(
select
fans.name,
dense_rank() over (order by fans.checkout-fans.checkin desc) as rnk
from
fans
where
fans.checkout-fans.checkin is not null
) x
where
x.rnk = 1;
SQL Server has TOP..WITH TIES for this, but this is a generic solution for any RDBMS that has DENSE_RANK.
Longest is a fuzzy word; you should first define what long means for you. Using limit may not be a solution for this case. Instead, you can define a threshold and filter your results, for instance where fans.checkout-fans.checkin > 10.
Try this:
select name, (checkout-checkin) AS stay
from fans
where checkout-checkin is not null -- remove fans that never stayed at a hotel (a column alias cannot be used in WHERE)
order by stay desc;
For Oracle:
select * from
(
select fans.name
from fans
where fans.checkout-fans.checkin is not null
order by fans.checkout-fans.checkin desc)
where rownum=1
Another way that should work in all dbms (or almost all, at least those that support subqueries):
select fans.name
from fans
where fans.checkout-fans.checkin =
( select max(f.checkout-f.checkin)
from fans f
) ;
If both the columns are date fields you can use this query:
select fans.name from fans where fans.checkout-fans.checkin in (select max(fans.checkout-fans.checkin) from fans );
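The max-subquery variant can be tried out with a minimal sketch on SQLite through Python's sqlite3 module. This is not Oracle: SQLite has no date type, so the dates are stored as ISO-8601 text and subtracted with julianday(), which is an assumption of this example. The names and dates are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fans (name TEXT, checkin TEXT, checkout TEXT);
-- Hypothetical stays: Ann and Bob both stay 7 days, Dee has not checked out.
INSERT INTO fans VALUES
    ('Ann', '2020-01-01', '2020-01-08'),
    ('Bob', '2020-02-01', '2020-02-08'),
    ('Cy',  '2020-03-01', '2020-03-03'),
    ('Dee', '2020-04-01', NULL);
""")

# Keep every fan whose stay equals the longest stay. MAX skips NULL lengths,
# and a NULL length never equals anything, so Dee is excluded.
query = """
SELECT name
FROM fans
WHERE julianday(checkout) - julianday(checkin) =
      (SELECT MAX(julianday(checkout) - julianday(checkin)) FROM fans)
ORDER BY name
"""
longest = [row[0] for row in conn.execute(query)]
print(longest)  # ['Ann', 'Bob']
```

Both tied guests are returned, which is what a rownum = 1 or limit 1 filter would not give you.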