I just started learning SQL the other day and hit a stumbling block. I've got some code that looks like this:
SELECT player, SUM(wins) from (
SELECT win_counts.player, win_counts.wins
from win_counts
UNION
SELECT lose_counts.player, lose_counts.loses
from lose_counts
group by win_counts.player
) as temp_alias
Here is the error I get:
ERROR: missing FROM-clause entry for table "win_counts"
LINE 7: group by win_counts.player
The win_counts table contains a list of player ids and the number of matches they have won. The lose_counts table contains a list of player ids and the number of matches they have lost. Ultimately I want a table of player ids and the total number of matches each player has played.
Thank you for the help. Sorry I don't have more information... my understanding of sql is pretty rudimentary.
Group by appears to be in the wrong place.
SELECT player, SUM(wins) as SumWinsLoses
FROM(
SELECT win_counts.player, win_counts.wins
FROM win_counts
UNION ALL -- as Gordon points out 'ALL' is likely needed, otherwise your sum will be
-- off as the UNION alone will distinct the values before the sum
-- and if you have a player with the same wins and losses (2),
-- the sum will return only 2 instead of (4).
SELECT lose_counts.player, lose_counts.loses
FROM lose_counts) as temp_alias
GROUP BY player
Just so we are clear, though: SUM(wins) will sum both wins and losses under the name "wins", because the column name from the first SELECT in a union is the one used. So a player's wins and losses are going to be aggregated together.
Here's a working SQL Fiddle. Notice that without the UNION ALL, player #2 has an incorrect count.
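To see the UNION versus UNION ALL difference without a fiddle, here is a minimal sketch that runs the same query both ways against an in-memory SQLite database from Python. The sample rows are made up for illustration, and SQLite is only an assumption here; the original question appears to use Postgres.

```python
import sqlite3

# Made-up sample data: player 2 has 2 wins and 2 losses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE win_counts  (player INTEGER, wins  INTEGER);
    CREATE TABLE lose_counts (player INTEGER, loses INTEGER);
    INSERT INTO win_counts  VALUES (1, 3), (2, 2);
    INSERT INTO lose_counts VALUES (1, 1), (2, 2);
""")

QUERY = """
    SELECT player, SUM(wins) AS total_matches
    FROM (
        SELECT player, wins FROM win_counts
        {union_kind}
        SELECT player, loses FROM lose_counts
    ) AS temp_alias
    GROUP BY player
    ORDER BY player
"""

def totals(union_kind):
    """Run the query with either UNION or UNION ALL between the two selects."""
    return conn.execute(QUERY.format(union_kind=union_kind)).fetchall()

with_all = totals("UNION ALL")  # keeps both (2, 2) rows for player 2
without_all = totals("UNION")   # collapses the two identical (2, 2) rows into one

print(with_all)     # [(1, 4), (2, 4)]
print(without_all)  # [(1, 4), (2, 2)]
```

Player 2 comes out as 2 instead of 4 under plain UNION because the win row and the loss row are identical and get de-duplicated before the sum.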
You already have a good answer and comments from others. For your edification:
In some scenarios it might be more efficient to aggregate before the union.
select player, sum(wins) from (
select player, count(*) as wins
from win_counts
group by player
UNION ALL /* without ALL you'll eliminate duplicate rows */
select player, count(*) as losses
from lose_counts
group by player
) as t
group by player
This should also give equivalent results if each player has both wins and losses:
select wins.player, wins + losses as total_matches
from
(
select player, count(*) as wins
from win_counts
group by player
) as wins
inner join
(
select player, count(*) as losses
from lose_counts
group by player
) as losses
on losses.player = wins.player
The fix for missing wins/losses is a full outer join:
select
coalesce(wins.player, losses.player) as player,
coalesce(wins.wins, 0) + coalesce(losses.losses, 0) as total_matches
from
(
select player, count(*) as wins
from win_counts
group by player
) as wins
full outer join
(
select player, count(*) as losses
from lose_counts
group by player
) as losses
on losses.player = wins.player
These complicated queries should give you a taste of why it's a bad idea to use separate tables for data that belongs together. In this case you should probably prefer a single table that records all matches as wins, losses (or ties).
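As a rough illustration of the single-table idea, the sketch below uses a hypothetical match_results table (the table name, its columns, and the sample rows are all assumptions, not part of the original schema) and runs it in SQLite from Python. With one row per player per match, the total is a plain COUNT and no union or join is needed.

```python
import sqlite3

# Hypothetical layout: one row per player per match, with the outcome recorded.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE match_results (
        match_id INTEGER,
        player   INTEGER,
        result   TEXT CHECK (result IN ('win', 'loss', 'tie'))
    );
    INSERT INTO match_results VALUES
        (1, 1, 'win'), (1, 2, 'loss'),
        (2, 1, 'win'), (2, 3, 'loss'),
        (3, 2, 'tie'), (3, 3, 'tie');
""")

rows = conn.execute("""
    SELECT player,
           COUNT(*)                                         AS total_matches,
           SUM(CASE WHEN result = 'win'  THEN 1 ELSE 0 END) AS wins,
           SUM(CASE WHEN result = 'loss' THEN 1 ELSE 0 END) AS losses
    FROM match_results
    GROUP BY player
    ORDER BY player
""").fetchall()

print(rows)  # [(1, 2, 2, 0), (2, 2, 0, 1), (3, 2, 0, 1)]
```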
I created a database in SQL about basketball. My teacher gave me a task: I need to print out the basketball players from my database with the maximum trophy count. So I wrote this bit of code:
select surname ,count(player_id) as trophy_count
from dbo.Players p
left join Trophies t on player_id=p.id
group by p.surname
and SQL gave me this:
but I want SQL to print only this:
I read about selects inside selects (subqueries), but I don't know how they work. I tried, but it didn't work.
Use TOP:
SELECT TOP 1 surname, COUNT(player_id) AS trophy_count -- or TOP 1 WITH TIES
FROM dbo.Players p
LEFT JOIN Trophies t
ON t.player_id = p.id
GROUP BY p.surname
ORDER BY COUNT(player_id) DESC;
If you want to get all ties for the highest count, then use SELECT TOP 1 WITH TIES.
;WITH CTE AS
(
select p.surname, count(t.player_id) as trophy_count
from dbo.Players p
left join Trophies t on t.player_id = p.id
group by p.surname
)
select *
from CTE
where trophy_count = (select max(trophy_count) from CTE)
While select top with ties works (and is probably more efficient) I would say this code is probably more useful in the real world as it could be used to find the max, min or specific trophy count if needed with a very simple modification of the code.
This basically does your GROUP BY first, then lets you specify which results you want back. In this instance you can use:
max(trophy_count) - get the maximum
min(trophy_count) - get the minimum
avg(trophy_count) - get the average trophy_count
where trophy_count = 3 - get a specific trophy count
There are many others. Google "SQL aggregate functions".
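The "group first, then filter" idea can be tried out with the sketch below, which runs the CTE in SQLite from Python. The surnames and trophy rows are invented sample data, SQLite stands in for SQL Server purely for convenience, and players_matching is a hypothetical helper that only swaps the condition in the final WHERE clause.

```python
import sqlite3

# Invented sample data: Adams and Baker tie on 2 trophies, Davis has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Players  (id INTEGER PRIMARY KEY, surname TEXT);
    CREATE TABLE Trophies (id INTEGER PRIMARY KEY, player_id INTEGER);
    INSERT INTO Players VALUES (1, 'Adams'), (2, 'Baker'), (3, 'Clark'), (4, 'Davis');
    INSERT INTO Trophies (player_id) VALUES (1), (1), (2), (2), (3);
""")

def players_matching(condition):
    """Group once in the CTE, then filter on trophy_count.

    `condition` is trusted SQL text written in this script, not user input.
    """
    sql = f"""
        WITH CTE AS (
            SELECT p.surname, COUNT(t.player_id) AS trophy_count
            FROM Players p
            LEFT JOIN Trophies t ON t.player_id = p.id
            GROUP BY p.surname
        )
        SELECT surname, trophy_count
        FROM CTE
        WHERE trophy_count = {condition}
        ORDER BY surname
    """
    return conn.execute(sql).fetchall()

most = players_matching("(SELECT MAX(trophy_count) FROM CTE)")
fewest = players_matching("(SELECT MIN(trophy_count) FROM CTE)")
exactly_one = players_matching("1")

print(most)         # [('Adams', 2), ('Baker', 2)]
print(fewest)       # [('Davis', 0)]
print(exactly_one)  # [('Clark', 1)]
```

Note that the tied players both come back for the maximum, which is the same behaviour TOP 1 WITH TIES gives in SQL Server.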
You will eventually go down the rabbit hole of needing to break this down further (for example, by week or by league). Then you are going to want to use window functions with a CTE or subquery.
For your example:
;with cte_base as
(
-- Set your detail here (this step is only needed if you are looking at aggregates)
select p.surname, Count(t.player_id) Ct -- count trophies, so players with none get 0
from dbo.Players p
left join Trophies t on t.player_id = p.id
group by p.surname
)
, cte_ranked as
-- Dense_rank is chosen because of ties
-- Add a partition by clause to break out your detail, e.g. by league
(
select *
, dr = DENSE_RANK() over (order by Ct desc)
from cte_base
)
select *
from cte_ranked
where dr = 1 -- Bring back only the #1 of each partition
This is overkill by far, but it helps you lay the foundation for handling much more complicated queries. Tim Biegeleisen's answer is more than adequate to answer your question.
Say I have three columns in a very large table: a timestamp variable (last_time_started), a player name (Michael Jordan), and the team he was on the last time he started (Washington Wizards, Chicago Bulls), how do I pull the last time a player started, grouped by player, showing the team? For example:
if I did
select max(last_time_started), player, team
from table
group by 2
I would not know which team the player was on when he played his last game, which is important to me.
In Postgres the most efficient way is to use distinct on():
SELECT DISTINCT ON (player)
last_time_started,
player,
team
FROM the_table
ORDER BY player, last_time_started DESC;
Using a window function is usually the second-fastest solution; using a join with a derived table is usually the slowest alternative.
Here are a couple of ways to do this in Postgres:
With windowing functions:
SELECT last_time_started, player, team
FROM
(
SELECT
last_time_started,
player,
team,
CASE WHEN max(last_time_started) OVER (PARTITION BY PLAYER) = last_time_started then 'X' END as max_last_time_started
FROM table
) t
WHERE max_last_time_started = 'X';
Or with a correlated subquery:
SELECT last_time_started, player, team
FROM table t1
WHERE last_time_started = (SELECT max(last_time_started) FROM table WHERE table.player = t1.player);
Try this solution
select s.*
from table s
inner join (
select max(t.last_time_started) as last_time_started, t.player
from table t
group by t.player) v on s.player = v.player and s.last_time_started = v.last_time_started
This approach should also be faster, because it does not contain a join:
select v.last_time_started,
v.player,
v.team
from (
select t.last_time_started,
t.player,
t.team,
row_number() over (partition by t.player order by last_time_started desc) as n
from table t
) v
where v.n = 1
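To compare the join and row_number() approaches side by side, the following sketch runs both against an in-memory SQLite database from Python. The starts table name and its rows are placeholders invented for the example, and it assumes an SQLite build with window-function support (3.25 or later). It says nothing about relative speed; it only checks that both forms pick the same row per player.

```python
import sqlite3

# Placeholder data: player_a started for two teams, most recently team_y.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE starts (last_time_started TEXT, player TEXT, team TEXT);
    INSERT INTO starts VALUES
        ('2001-01-10', 'player_a', 'team_x'),
        ('2003-04-16', 'player_a', 'team_y'),
        ('2002-05-01', 'player_b', 'team_x');
""")

# Join against a derived table holding each player's latest timestamp.
join_rows = conn.execute("""
    SELECT s.last_time_started, s.player, s.team
    FROM starts s
    INNER JOIN (
        SELECT player, MAX(last_time_started) AS last_time_started
        FROM starts
        GROUP BY player
    ) v ON s.player = v.player
       AND s.last_time_started = v.last_time_started
    ORDER BY s.player
""").fetchall()

# Number each player's rows from newest to oldest and keep row 1.
window_rows = conn.execute("""
    SELECT v.last_time_started, v.player, v.team
    FROM (
        SELECT last_time_started, player, team,
               ROW_NUMBER() OVER (PARTITION BY player
                                  ORDER BY last_time_started DESC) AS n
        FROM starts
    ) v
    WHERE v.n = 1
    ORDER BY v.player
""").fetchall()

print(join_rows)
print(window_rows)
```

One difference worth keeping in mind: if a player has two rows with the same latest timestamp, the join returns both, while row_number() keeps only one of them.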
Ok, I'm really not sure how to tackle this SQL problem so I'll just go ahead and explain it...
I have a table with the following columns :
gameId, winnerPlayerId, winnerFactionId, loserPlayerId, loserFactionId
Now, I'd like to build a query that, given a factionId, outputs the following data :
playerId, totalGamesPlayedAsFaction
What stumps me here is that each row in the table needs to be counted TWICE, once for the loser and once for the winner. Therefore I can't use a simple GROUP BY winnerPlayerId.
I feel as if my question is not particularly clear and that the solution is probably quite simple...
You need to "duplicate" the data somehow. The most typical way would use union all:
select . . .
from ((select gameId, winnerPlayerId as PlayerId, winnerFactionId as FactionId
from table t
) union all
(select gameId, loserPlayerId as PlayerId, loserFactionId as FactionId
from table t
)
) t
group by PlayerId, FactionId;
Looks like a job for the ever popular self join
select playerid, count(*) total
from player p left join game winner on p.playerid = winner.winnerplayerid
left join game loser on p.playerid = loser.loserplayerid
etc
This will get you started. You'll have to contend with filters and null values on your own.
The glamorous and multi-purpose way would be to do an unpivot of sorts, but the quick and dirty way is similar to the union all method. The full SQL for that would be:
select games.PlayerId, COUNT(games.gameId)
from ((select gameId, winnerPlayerId as PlayerId, winnerFactionId as FactionId
from tblGame winners
)
union all
(select gameId, loserPlayerId as PlayerId, loserFactionId as FactionId
from tblGame losers
)
) games
where games.FactionId = 'x'
group by games.PlayerId
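Here is a small runnable check of the union all approach, using SQLite from Python. The three game rows are invented, and games_as_faction is a hypothetical helper that passes the faction id as a bound parameter instead of the literal 'x'.

```python
import sqlite3

# Invented sample games; player 10 plays faction 'x' in all three.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblGame (
        gameId INTEGER PRIMARY KEY,
        winnerPlayerId INTEGER, winnerFactionId TEXT,
        loserPlayerId  INTEGER, loserFactionId  TEXT
    );
    INSERT INTO tblGame VALUES
        (1, 10, 'x', 20, 'y'),
        (2, 20, 'x', 10, 'x'),
        (3, 30, 'y', 10, 'x');
""")

def games_as_faction(faction_id):
    """Count games per player for one faction, counting winner and loser sides."""
    return conn.execute("""
        SELECT games.PlayerId, COUNT(games.gameId) AS totalGamesPlayedAsFaction
        FROM (
            SELECT gameId, winnerPlayerId AS PlayerId, winnerFactionId AS FactionId
            FROM tblGame
            UNION ALL
            SELECT gameId, loserPlayerId AS PlayerId, loserFactionId AS FactionId
            FROM tblGame
        ) games
        WHERE games.FactionId = ?
        GROUP BY games.PlayerId
        ORDER BY games.PlayerId
    """, (faction_id,)).fetchall()

print(games_as_faction('x'))  # [(10, 3), (20, 1)]
print(games_as_faction('y'))  # [(20, 1), (30, 1)]
```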
I'm having a problem with a complex SELECT, so I hope some of you can help me out, because I'm really stuck with it... or maybe you can point me in a direction.
I have a table with the following columns:
score1, gamedate1, score2, gamedate2, score3, gamedate3
Basically I need to determine the ultimate winner of all the games, who got the SUMMED MAX score FIRST, based on the game times in ASCENDING order.
Assuming that the 1,2,3 are different players, something like this should work:
-- construct table as the_lotus suggests
WITH LotusTable AS
(
SELECT 'P1' AS Player, t.Score1 AS Score, t.GameDate1 as GameDate
FROM Tbl t
UNION ALL
SELECT 'P2' AS Player, t.Score2 AS Score, t.GameDate2 as GameDate
FROM Tbl t
UNION ALL
SELECT 'P3' AS Player, t.Score3 AS Score, t.GameDate3 as GameDate
FROM Tbl t
)
-- get running scores up through date for each player
, RunningScores AS
(
SELECT b.Player, b.GameDate, SUM(a.Score) AS Score
FROM LotusTable a
INNER JOIN LotusTable b -- self join
ON a.Player = b.Player
AND a.GameDate <= b.GameDate -- a is earlier dates
GROUP BY b.Player, b.GameDate
)
-- get max score for any player
, MaxScore AS
(
SELECT MAX(r.Score) AS Score
FROM RunningScores r
)
-- get min date for the max score
, MinGameDate AS
(
SELECT MIN(r.GameDate) AS GameDate
FROM RunningScores r
WHERE r.Score = (SELECT m.Score FROM MaxScore m)
)
-- get all players who got the max score on the min date
SELECT *
FROM RunningScores r
WHERE r.Score = (SELECT m.Score FROM MaxScore m)
AND r.GameDate = (SELECT d.GameDate FROM MinGameDate d)
;
There are more efficient ways of doing it; in particular, the self-join could be avoided.
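One way the self-join might be avoided, assuming a database with window functions, is a windowed SUM over each player's games in date order. The sketch below tries this in SQLite (3.25 or later) from Python with invented rows. It is a simplification rather than a drop-in replacement: the final ORDER BY with LIMIT 1 returns a single winner, whereas the query above returns every player tied on the earliest date.

```python
import sqlite3

# Invented data: all three players end on 7 points, but P1 gets there first.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tbl (
        Score1 INTEGER, GameDate1 TEXT,
        Score2 INTEGER, GameDate2 TEXT,
        Score3 INTEGER, GameDate3 TEXT
    );
    INSERT INTO Tbl VALUES
        (5, '2020-01-01', 3, '2020-01-02', 4, '2020-01-03'),
        (2, '2020-01-04', 4, '2020-01-05', 3, '2020-01-06');
""")

winner = conn.execute("""
    WITH LotusTable AS (
        SELECT 'P1' AS Player, Score1 AS Score, GameDate1 AS GameDate FROM Tbl
        UNION ALL
        SELECT 'P2', Score2, GameDate2 FROM Tbl
        UNION ALL
        SELECT 'P3', Score3, GameDate3 FROM Tbl
    ),
    RunningScores AS (
        -- running total per player, replacing the self-join
        SELECT Player, GameDate,
               SUM(Score) OVER (PARTITION BY Player ORDER BY GameDate) AS Score
        FROM LotusTable
    )
    SELECT Player, GameDate, Score
    FROM RunningScores
    ORDER BY Score DESC, GameDate ASC  -- highest running score, earliest date first
    LIMIT 1
""").fetchone()

print(winner)  # ('P1', '2020-01-04', 7)
```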
If your tables are set up with three columns (player_id, score1, time), then you just need a simple query to sum the scores and group them by player_ID, as follows:
SELECT gamedata1.player_ID as 'Player_ID',
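sum(gamedata1.score1 + coalesce(gamedata2.score1, 0) + coalesce(gamedata3.score1, 0)) as 'Total_Score' -- coalesce: a missing left-join row would otherwise make the whole sum NULL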
FROM gamedata1
LEFT JOIN gamedata2 ON (gamedata1.player_ID = gamedata2.player_ID)
LEFT JOIN gamedata3 ON (gamedata1.player_ID = gamedata3.player_ID)
GROUP BY gamedata1.player_ID
ORDER BY MIN(gamedata1.time) ASC
Explanation:
You are essentially grouping by each player so that you get one distinct player per row, then summing their scores and ordering the data accordingly. I treated "time" as a date type. That can of course be changed to any date/time type you prefer; the structure of the query would be the same.
I have a table of baseball fielding statistics for a project. There are many fields on this table, but the ones I care about for this are playerID, pos (position), G (games).
This table is historical so it contains multiple rows per playerID (one for each year/pos). What I want to be able to do is return the position that a player played the most for his career.
First, what I imagine I have to do is count the games per position per playerID, then return the max of it. How can this be done in SQL? I am using SQL Server. On a side note, there may be a situation where there are ties; what would max do then?
If the player played in the same position over multiple teams over multiple games, I'd be more apt to use the sum() function, instead of count, in addition to using a group by statement, as a sub-query. See code for explanation.
SELECT playerID, pos, MAX( g_sum )
FROM (
SELECT DISTINCT playerID, pos, SUM( G ) as g_sum
FROM player_stats
GROUP BY playerID, pos
ORDER BY 3 DESC
) game_sums
GROUP BY playerID
It may not be the exact answer, but it's at least a decent starting point, and it worked on the lame testbed that I whipped up in 10 minutes.
As far as how max() acts with ties: It doesn't (as far as I can tell, at least). It's up to the actual GROUP BY statement itself, and where and how that max value shows up within the query or sub query.
If we were to include pos in the outer GROUP BY statement, in the event of a tie, it would show you both positions and the amount of games the player has played at said positions (which would be the same number). With it not in that GROUP BY statement, the query will go with the last given value for that column. So if position 2 showed up before position 3 in the sub query, the full query will show position 3 as the position that the player has played the most games in.
In SQL, I believe this will do it. Given that the same subquery is needed twice, I expect that doing this as a stored procedure would be more efficient.
SELECT MaxGamesInAnyPosition.playerID, GamesPerPosition.pos
FROM (
SELECT playerID, Max(totalGames) As maxGames
FROM (
SELECT playerID, pos, SUM(G) As totalGames
FROM tblStats
GROUP BY playerId, pos) Tallies
GROUP BY playerID) MaxGamesInAnyPosition
INNER JOIN (
SELECT playerID, pos, SUM(g) As totalGames
FROM tblStats
GROUP BY playerID, pos) GamesPerPosition
ON (MaxGamesInAnyPosition.playerID=GamesPerPosition.playerId
AND MaxGamesInAnyPosition.maxGames=GamesPerPosition.totalGames)
It does not look pretty, but it is a direct translation of what I built in LINQ to SQL. Give it a try and see if that's what you want:
SELECT [t2].[playerID], (
SELECT TOP (1) [t7].[pos]
FROM (
SELECT [t4].[playerID], [t4].[pos], (
SELECT COUNT(*)
FROM (
SELECT DISTINCT [t5].[G]
FROM [players] AS [t5]
WHERE ([t4].[playerID] = [t5].[playerID]) AND ([t4].[pos] = [t5].[pos])
) AS [t6]
) AS [value]
FROM (
SELECT [t3].[playerID], [t3].[pos]
FROM [players] AS [t3]
GROUP BY [t3].[playerID], [t3].[pos]
) AS [t4]
) AS [t7]
WHERE [t2].[playerID] = [t7].[playerID]
ORDER BY [t7].[value] DESC
) AS [pos]
FROM (
SELECT [t1].[playerID]
FROM (
SELECT [t0].[playerID]
FROM [players] AS [t0]
GROUP BY [t0].[playerID], [t0].[pos]
) AS [t1]
GROUP BY [t1].[playerID]
) AS [t2]
Here is a second answer, much better (I think) than my first kick at the can last night. Certainly much easier to read and understand.
SELECT playerID, pos
FROM (
SELECT playerID, pos, SUM(G) As totGames
FROM tblStats
GROUP BY playerID, pos) Totals
WHERE NOT (Totals.totGames < ANY(
SELECT SUM(G)
FROM tblStats
WHERE Totals.playerID=tblStats.playerID
GROUP BY playerID, pos))
The subquery ensures that a row is thrown out if its games total at that position is smaller than the number of games that player played at any other position.
In case of ties, the player in question will have all tied rows appear, as none of the tied records will be thrown out.
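The tie behaviour can be checked on a tiny data set. SQLite has no ANY operator, so this sketch (run from Python) swaps in an equivalent form: a CTE for the per-position totals, compared against each player's maximum total. The yearID column and all sample rows are assumptions made up for the example.

```python
import sqlite3

# Made-up stats: p1 has a clear top position, p2 is tied between LF and RF.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblStats (playerID TEXT, yearID INTEGER, pos TEXT, G INTEGER);
    INSERT INTO tblStats VALUES
        ('p1', 2001, 'SS', 100), ('p1', 2002, 'SS', 50), ('p1', 2002, '2B', 120),
        ('p2', 2001, 'LF', 80),  ('p2', 2002, 'RF', 80);
""")

rows = conn.execute("""
    WITH Totals AS (
        -- career games per player per position
        SELECT playerID, pos, SUM(G) AS totGames
        FROM tblStats
        GROUP BY playerID, pos
    )
    SELECT t.playerID, t.pos, t.totGames
    FROM Totals t
    WHERE t.totGames = (SELECT MAX(m.totGames)
                        FROM Totals m
                        WHERE m.playerID = t.playerID)
    ORDER BY t.playerID, t.pos
""").fetchall()

print(rows)  # [('p1', 'SS', 150), ('p2', 'LF', 80), ('p2', 'RF', 80)]
```

Both of p2's tied positions appear, matching the behaviour described for the ANY version.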