SQL - DISTINCT Not Providing Unique Results for Designated Column

I'm currently learning SQL by working through these exercises: https://sqlzoo.net/wiki/The_JOIN_operation
I'm on Example 8 which asks: "Show the name of all players who scored a goal against Germany."
Here is what I currently have:
SELECT DISTINCT(goal.player), goal.gtime, game.team1, game.team2
FROM game JOIN goal ON (goal.matchid = game.id)
WHERE (game.team1='GER' OR game.team2='GER') AND (goal.teamid<>'GER')
I would expect the results to contain only unique names. However, that is not the case: "Mario Balotelli" is listed twice. Why doesn't DISTINCT work in this instance?
Thank you!

DISTINCT operates at the row level: it removes rows that are identical across every selected column, not rows that merely share a value in one column. So either select only the columns you want to be unique, or, if you need extra fields in your result, GROUP BY the player and bring the other fields along by aggregating them or by joining to the grouped result.
But I reckon the intended answer is only the player name, so the query would be something like this:
SELECT DISTINCT player
FROM game JOIN goal ON matchid = id
WHERE (game.team1='GER' OR game.team2='GER') AND (goal.teamid<>'GER')
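A minimal sketch of the row-level behaviour, run through Python's built-in sqlite3 module. The table and column names come from the question; the match ids, goal times and the players other than Mario Balotelli are made up for illustration and are not the real SQLZOO data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE game (id INTEGER PRIMARY KEY, team1 TEXT, team2 TEXT);
CREATE TABLE goal (matchid INTEGER, teamid TEXT, player TEXT, gtime INTEGER);
-- Hypothetical sample rows, not the real SQLZOO data
INSERT INTO game VALUES (1, 'GER', 'ITA'), (2, 'DEN', 'GER');
INSERT INTO goal VALUES
    (1, 'ITA', 'Mario Balotelli', 20),
    (1, 'ITA', 'Mario Balotelli', 36),
    (1, 'GER', 'German Scorer', 90),
    (2, 'DEN', 'Other Scorer', 24);
""")

from_where = """
FROM game JOIN goal ON goal.matchid = game.id
WHERE (game.team1 = 'GER' OR game.team2 = 'GER') AND goal.teamid <> 'GER'
"""

# DISTINCT compares whole rows: the two Balotelli rows differ in gtime, so both stay.
wide = conn.execute(
    "SELECT DISTINCT goal.player, goal.gtime, game.team1, game.team2"
    + from_where
    + "ORDER BY goal.player, goal.gtime"
).fetchall()

# With only the player column selected, repeated names collapse to one row each.
names = [row[0] for row in conn.execute(
    "SELECT DISTINCT goal.player" + from_where + "ORDER BY goal.player"
)]

print(wide)
print(names)
```

With these sample rows, the four-column query keeps both Balotelli rows, while the single-column query returns each name once.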

It looks like "Mario Balotelli" has more than one goal.gtime value, or appears with different team1 and team2 values. Either one makes the rows differ, so try removing the additional columns from your SELECT clause.

DISTINCT gets the distinct rows based on all selected columns. As the goal times differ selecting that column will make the rows different (distinct) from one another.
The question only asks you to select the player's name:
SELECT DISTINCT player
FROM game
JOIN goal ON matchid = id
WHERE (team1='GER' OR team2='GER')
AND (teamid <>'GER')
This link looks like useful further reading: https://www.designcise.com/web/tutorial/what-is-the-order-of-execution-of-an-sql-query
Edit: If you want more than one column but only a distinct list of players, you are in the realm of aggregation: you would apply MIN/MAX/SUM/AVG to the other columns for each group.
SELECT player, team1, team2, MIN(gtime) AS min_gtime, MAX(gtime) AS max_gtime, COUNT(1) AS goals_scored
FROM game
JOIN goal ON matchid = id
WHERE (team1='GER' OR team2='GER')
AND (teamid <>'GER')
GROUP BY player, team1, team2
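A small sketch of that aggregate query against made-up rows, using Python's sqlite3 module. The sample data is hypothetical; it deliberately includes a second fixture with the teams listed the other way round, to show what grouping by team1 and team2 as well as player does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE game (id INTEGER PRIMARY KEY, team1 TEXT, team2 TEXT);
CREATE TABLE goal (matchid INTEGER, teamid TEXT, player TEXT, gtime INTEGER);
-- Hypothetical sample rows
INSERT INTO game VALUES (1, 'GER', 'ITA'), (2, 'DEN', 'GER'), (3, 'ITA', 'GER');
INSERT INTO goal VALUES
    (1, 'ITA', 'Mario Balotelli', 20),
    (1, 'ITA', 'Mario Balotelli', 36),
    (2, 'DEN', 'Other Scorer', 24),
    (3, 'ITA', 'Mario Balotelli', 10);
""")

# One row per (player, team1, team2) combination, with the goal times summarised.
rows = conn.execute("""
SELECT player, team1, team2,
       MIN(gtime) AS min_gtime, MAX(gtime) AS max_gtime, COUNT(*) AS goals_scored
FROM game JOIN goal ON matchid = id
WHERE (team1 = 'GER' OR team2 = 'GER') AND teamid <> 'GER'
GROUP BY player, team1, team2
ORDER BY player, team1
""").fetchall()

# Grouping by player alone gives strictly one row per player.
per_player = conn.execute("""
SELECT player, COUNT(*) AS goals_scored
FROM game JOIN goal ON matchid = id
WHERE (team1 = 'GER' OR team2 = 'GER') AND teamid <> 'GER'
GROUP BY player
ORDER BY player
""").fetchall()

print(rows)
print(per_player)
```

Because team1 and team2 are part of the grouping, a player who scored in two different pairings still appears on two rows; dropping those columns from both the SELECT and the GROUP BY gives one row per player.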

Related

When to use multiple GROUP BY in SQL?

I'm practicing SQL on SQLZOO, and I'm working on Joins. Question 11 of that section asks: "For every match involving 'POL', show the matchid, date and the number of goals scored."
So I tried the following code:
SELECT matchid, mdate, COUNT(player)
FROM goal JOIN game ON matchid = id
WHERE (team1 = 'POL' OR team2 = 'POL')
GROUP BY matchid
But it throws an error:
'gisq.game.mdate' isn't in GROUP BY
So the answer is:
SELECT matchid, mdate, COUNT(player)
FROM goal JOIN game ON matchid = id
WHERE (team1 = 'POL' OR team2 = 'POL')
GROUP BY matchid, mdate
My question is, why is it required to also include mdate in the GROUP BY clause if it's not part of the aggregate function? Thank you and sorry for the newbie question. Here is the table's format: https://sqlzoo.net/wiki/The_JOIN_operation
The simple reason is that SQL requires the GROUP BY columns and the SELECT columns to be compatible: every selected column must either appear in the GROUP BY or be wrapped in an aggregate function. Those are the rules of the language.
Your query slightly simplified is:
SELECT matchid, mdate, COUNT(player)
FROM goal JOIN
game
ON matchid = id
WHERE 'POL' IN (team1, team2)
GROUP BY matchid;
The query is saying: return one row per matchid, because of the GROUP BY. But then which mdate gets returned? As far as the query is concerned, the rows in one group could carry several different dates.
SQL requires that you be explicit about what you want. You might intend the most recent date, in which case you would use MAX(mdate). Or you might want a separate row for each date, in which case you would include it in the GROUP BY. Or you might intend something else. The query needs to be clear.
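The two explicit options can be sketched with Python's sqlite3 module. The rows and dates below are invented for illustration. Note that SQLite is lenient about bare columns in aggregate queries, so this sketch does not reproduce the error message from the question; it only shows the two unambiguous forms side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE game (id INTEGER PRIMARY KEY, mdate TEXT, team1 TEXT, team2 TEXT);
CREATE TABLE goal (matchid INTEGER, teamid TEXT, player TEXT, gtime INTEGER);
-- Hypothetical sample rows
INSERT INTO game VALUES
    (1, '2012-06-08', 'POL', 'GRE'),
    (2, '2012-06-12', 'POL', 'RUS'),
    (3, '2012-06-09', 'GER', 'POR');
INSERT INTO goal VALUES
    (1, 'POL', 'Player A', 17),
    (1, 'GRE', 'Player B', 51),
    (2, 'RUS', 'Player C', 37),
    (2, 'POL', 'Player D', 57),
    (3, 'GER', 'Player E', 72);
""")

base = """
FROM goal JOIN game ON matchid = id
WHERE 'POL' IN (team1, team2)
"""

# Option 1: make mdate part of the grouping key.
by_both = conn.execute(
    "SELECT matchid, mdate, COUNT(player)" + base
    + "GROUP BY matchid, mdate ORDER BY matchid"
).fetchall()

# Option 2: keep one row per matchid and say which date you want.
by_max = conn.execute(
    "SELECT matchid, MAX(mdate), COUNT(player)" + base
    + "GROUP BY matchid ORDER BY matchid"
).fetchall()

print(by_both)
print(by_max)
```

Here each match has exactly one date, so both forms return the same rows; they would only differ if one matchid could be paired with several dates.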
When you use aggregate functions (COUNT, MAX, MIN, AVG, etc.) in the SELECT list together with plain (non-aggregated) columns, you must repeat every non-aggregated column from the SELECT list in the GROUP BY clause. The result, and this is what is required, is that every column is reduced to one value per group: some by the aggregate functions in the SELECT list, the rest by the GROUP BY clause.
Group by a single column: all rows with the same value in that one column are placed in one group.
Group by multiple columns, for example GROUP BY column1, column2: all rows with the same values in both column1 and column2 are placed in one group.
Since the question asks you to select the date as well, you have to put it in the GROUP BY clause. Suppose POL had multiple games on the same date: with both matchid and the date in the GROUP BY clause, each of those games still gets its own row.
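To make the single-column versus multi-column difference concrete, here is a small sketch using Python's sqlite3 module on a cut-down goal table. The rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE goal (matchid INTEGER, teamid TEXT, player TEXT);
-- Hypothetical sample rows
INSERT INTO goal VALUES
    (1, 'POL', 'Player A'),
    (1, 'GRE', 'Player B'),
    (1, 'POL', 'Player C'),
    (2, 'RUS', 'Player D');
""")

# One group per matchid: goals per match.
per_match = conn.execute(
    "SELECT matchid, COUNT(*) FROM goal GROUP BY matchid ORDER BY matchid"
).fetchall()

# One group per (matchid, teamid) pair: goals per team within each match.
per_match_team = conn.execute(
    "SELECT matchid, teamid, COUNT(*) FROM goal "
    "GROUP BY matchid, teamid ORDER BY matchid, teamid"
).fetchall()

print(per_match)
print(per_match_team)
```

Adding a column to the GROUP BY can only split groups further, never merge them, which is why match 1 goes from one row to two.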

SQL Join query brings multiple results

I have 2 tables. One lists all the goals scored in the English Premier League and who scored them; the other lists the squad numbers of each player in the league.
I want to do a join so that the table sums the total number of goals by player name, and then looks up the squad number of that player.
Table A [goal_scorer]
Table B [squads]
I have the SQL query below:
SELECT goal_scorer.*,sum(goal_scorer.number),squads.squad_number
FROM goal_scorer
Inner join squads on goal_scorer.name=squads.player
group by goal_scorer.name
The issue I have is that in the result, the sum of 'number' is too high and seems to include duplicate rows. For example, Aaron Lennon has scored 33 times, not the 264 that the query returns.
Maybe you want something like this?
SELECT goal_scorer.*, s.total, squads.squad_number
FROM goal_scorer
LEFT JOIN (
SELECT name, sum(number) as total
FROM goal_scorer
GROUP BY name
) s on s.name = goal_scorer.name
JOIN squads on goal_scorer.name=squads.player
There are other ways to do it, but here I'm using a sub-query to get the total by player. NB: Most modern SQL platforms support windowing functions to do this too.
Also, you probably don't need the LEFT on the sub-query join (since we know there will always be at least one matching name), but I put it in case your actual use case is more complicated.
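A sketch of why the sum inflates and how aggregating before the join avoids it, using Python's sqlite3 module. The question does not show the data, so this assumes one plausible cause: squads holds several rows per player (here one per season, with season being a hypothetical column). All rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE goal_scorer (name TEXT, number INTEGER);
-- Assumption: squads has one row per player per season (season is hypothetical)
CREATE TABLE squads (player TEXT, season INTEGER, squad_number INTEGER);
INSERT INTO goal_scorer VALUES ('Aaron Lennon', 3), ('Aaron Lennon', 5);
INSERT INTO squads VALUES
    ('Aaron Lennon', 2013, 7),
    ('Aaron Lennon', 2014, 7),
    ('Aaron Lennon', 2015, 25);
""")

# Join first, then sum: 2 goal rows x 3 squad rows = 6 joined rows,
# so the 8 goals are counted three times.
inflated = conn.execute("""
SELECT g.name, SUM(g.number)
FROM goal_scorer g
JOIN squads s ON g.name = s.player
GROUP BY g.name
""").fetchall()

# Sum first in a sub-query, then join: the total is computed once per player.
fixed = conn.execute("""
SELECT s.player, s.season, s.squad_number, t.total
FROM squads s
JOIN (SELECT name, SUM(number) AS total
      FROM goal_scorer
      GROUP BY name) t ON t.name = s.player
ORDER BY s.season
""").fetchall()

print(inflated)
print(fixed)
```

If the real squads table has a different shape, the cause may differ, but the general point holds: any join that multiplies rows before SUM runs will multiply the total too.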
Can you try this if you are using sql-server?
select *
from squads
outer apply(
select sum(goal_scorer.number) as score
from goal_scorer where goal_scorer.name=squads.player
)x

Return all entries in groups where a group member satisfies condition x

In MS Access, I'm trying to write a SQL query that answers a question like:
Return all player IDs on all teams in a list of football players where at least one member of the team is injured.
I'm new to SQL. I've tried something like
SELECT pid FROM players WHERE team_id IN
(SELECT team_id FROM players WHERE injury = 'yes')
Access won't accept this IN. Is there a simple way to do this? I'd rather run this as one query, instead of creating separate queries, so I can change it easily as necessary.
I find it hard to believe that it doesn't support IN, nevertheless you can do it using a join:
SELECT distinct players.pid
FROM players as injured
INNER JOIN players on players.team_id = injured.team_id
WHERE injured.injury = 'yes'
I used DISTINCT in case there are multiple injured players on the one team, which would otherwise result in the players from that team being returned multiple times.
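A quick sketch comparing the IN form with the self-join, run in SQLite through Python's sqlite3 module rather than in Access, so it says nothing about what Access itself accepts. The rows are hypothetical, and both sides of the join are aliased here for clarity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (pid INTEGER PRIMARY KEY, team_id INTEGER, injury TEXT);
-- Hypothetical rows: team 10 has two injured players, team 20 has none
INSERT INTO players VALUES
    (1, 10, 'yes'), (2, 10, 'no'), (3, 10, 'yes'),
    (4, 20, 'no'), (5, 20, 'no');
""")

with_in = [r[0] for r in conn.execute("""
SELECT pid FROM players
WHERE team_id IN (SELECT team_id FROM players WHERE injury = 'yes')
ORDER BY pid
""")]

join_sql = """
FROM players AS injured
INNER JOIN players AS p ON p.team_id = injured.team_id
WHERE injured.injury = 'yes'
ORDER BY p.pid
"""

# Without DISTINCT, each team-10 player is returned once per injured teammate.
join_plain = [r[0] for r in conn.execute("SELECT p.pid" + join_sql)]
join_distinct = [r[0] for r in conn.execute("SELECT DISTINCT p.pid" + join_sql)]

print(with_in, join_plain, join_distinct)
```

With two injured players on team 10, the plain join returns every team-10 player twice, and DISTINCT brings the result back in line with the IN version.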

Grouping Minus Oracle Problems

I've just written this query and I'm confused about how to group it, because I can't get the rows into one group per team. The query runs, but not the way I want: I want to group by team name, but when I count with COUNT(*), every row comes back with the same count.
SELECT TEAM.NAMATEAM, PERSONAL.KODEPERSON
FROM TEAM, PERSONAL
WHERE TEAM.KODETEAM = PERSONAL.KODETEAM
GROUP BY PERSONAL.KODEPERSON, TEAM.NAMATEAM
MINUS
SELECT TEAM.NAMATEAM, PERSONAL.KODEPERSON
FROM TEAM, PERSONAL, AWARD_PERSON
WHERE TEAM.KODETEAM = PERSONAL.KODETEAM
AND AWARD_PERSON.PEMENANG = PERSONAL.KODEPERSON
GROUP BY TEAM.NAMATEAM, PERSONAL.KODEPERSON;
I want to group all of this by team name, but I don't know how to combine the count with the grouping so that the query runs the way I want. Thank you.
Do I understand your question? You are trying to make a table with columns NAMATEAM, X, where NAMATEAM is the team name and X is the number of people on each team who do not have awards (listed in AWARD_PERSON). If so, you should be able to use a sub-select:
SELECT T_NAME, COUNT(*)
FROM (
SELECT TEAM.NAMATEAM "T_NAME", PERSONAL.KODEPERSON
FROM TEAM, PERSONAL
WHERE TEAM.KODETEAM = PERSONAL.KODETEAM
MINUS
SELECT TEAM.NAMATEAM "T_NAME", PERSONAL.KODEPERSON
FROM TEAM, PERSONAL, AWARD_PERSON
WHERE TEAM.KODETEAM = PERSONAL.KODETEAM
AND AWARD_PERSON.PEMENANG = PERSONAL.KODEPERSON )
-- your original query without the GROUP BYs
GROUP BY T_NAME
The first subselect SELECT creates a full list of players, the second subselect SELECT creates a list of players who have won awards (I assume), the MINUS removes the award winners from the full list. Thus the full subselect returns a list of players and their teams, for all players without awards.
The main SELECT then summarizes on the team name only, to yield a per-team count of players without awards.
You should not need your original GROUP BY TEAM.NAMATEAM, PERSONAL.KODEPERSON, unless you have duplicate rows in your database, e.g., one player on one team has more than one row in the database.
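The same shape can be tried out in SQLite via Python's sqlite3 module. SQLite has no MINUS, so this sketch swaps in EXCEPT, which is the standard spelling of the same set difference. The table and column names are from the question; the rows and team names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TEAM (KODETEAM INTEGER, NAMATEAM TEXT);
CREATE TABLE PERSONAL (KODEPERSON INTEGER, KODETEAM INTEGER);
CREATE TABLE AWARD_PERSON (PEMENANG INTEGER);
-- Hypothetical rows: person 201 has won two awards
INSERT INTO TEAM VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO PERSONAL VALUES (101, 1), (102, 1), (103, 1), (201, 2), (202, 2);
INSERT INTO AWARD_PERSON VALUES (101), (201), (201);
""")

counts = conn.execute("""
SELECT T_NAME, COUNT(*)
FROM (
    -- everyone, with their team name
    SELECT TEAM.NAMATEAM AS T_NAME, PERSONAL.KODEPERSON
    FROM TEAM, PERSONAL
    WHERE TEAM.KODETEAM = PERSONAL.KODETEAM
    EXCEPT  -- SQLite's equivalent of Oracle's MINUS
    -- award winners, with their team name
    SELECT TEAM.NAMATEAM AS T_NAME, PERSONAL.KODEPERSON
    FROM TEAM, PERSONAL, AWARD_PERSON
    WHERE TEAM.KODETEAM = PERSONAL.KODETEAM
      AND AWARD_PERSON.PEMENANG = PERSONAL.KODEPERSON
)
GROUP BY T_NAME
ORDER BY T_NAME
""").fetchall()

print(counts)
```

One thing to be aware of with this approach: a team whose members have all won awards has no rows left after the set difference, so it is missing from the output rather than listed with a count of 0.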

SQL Database SELECT question

Need some help with a homework assignment on SQL
Problem
Find out who (first name and last name) has played the most games in the chess tournament with an ID = 41
Background information
I got a table called Games, which contains information...
game ID
tournament ID
start_time
end_time
white_pieces_player_id
black_pieces_player_id
white_result
black_result
...about all the separate chess games that have taken place in three different tournaments ....
(tournaments having IDs of 41, 42 and 47)
...and the first and last names of the players are stored in a table called People....
person ID (same ID which comes up in the table 'Games' as white_pieces_player_id and
black_pieces_player_id)
first_name
last_name
...how to make a SELECT statement in SQL that would give me the answer?
sounds like you need to limit by tournamentID in your where clause, join with the people table on white_pieces_player_id and black_pieces_player_id, and use the max function on the count of white_result = win union black_result = win.
interesting problem.
what do you have so far?
hmm... responding to your comment
SELECT isik.eesnimi
FROM partii JOIN isik ON partii.valge=isik.id
WHERE turniir='41'
group by isik.eesnimi
having count(*)>4
consider using the max() function instead of the having count(*)> number
you can add the last name to the select clause if you also add it to the group by clause
sry, I only speak American. What language is this code in?
I would aggregate a join to that table to a derived table like this:
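-- Table and column names below are this answer's guesses; adjust them to the real schema.
SELECT a.last_name, a.first_name, SUM(b.gamecount) AS totalcount
FROM players a
JOIN (
-- games played as white, per player
SELECT white_player_id AS playerid, COUNT(*) AS gamecount
FROM games
WHERE tournamentid = 41
GROUP BY white_player_id
UNION ALL
-- games played as black, per player
SELECT black_player_id AS playerid, COUNT(*) AS gamecount
FROM games
WHERE tournamentid = 41
GROUP BY black_player_id
) b
ON b.playerid = a.playerid
GROUP BY a.playerid, a.last_name, a.first_name
ORDER BY totalcount DESC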
Something like this, so that you get the counts for both their white and their black games and then join and aggregate on that.
Then, if you only want the top one, order by totalcount descending and just select the TOP 1.
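A runnable sketch of that idea in SQLite, via Python's sqlite3 module. The question only describes the columns in words, so the exact names used here (people, games, person_id, tournament_id and so on) are assumptions, and the rows are invented. SQLite has no TOP, so LIMIT 1 stands in for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed table and column names; sample rows are hypothetical
CREATE TABLE people (person_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE games (
    game_id INTEGER PRIMARY KEY,
    tournament_id INTEGER,
    white_pieces_player_id INTEGER,
    black_pieces_player_id INTEGER
);
INSERT INTO people VALUES (1, 'Ann', 'Alpha'), (2, 'Bob', 'Beta'), (3, 'Cid', 'Gamma');
INSERT INTO games VALUES
    (1, 41, 1, 2), (2, 41, 3, 1), (3, 41, 1, 3),
    (4, 42, 2, 3), (5, 42, 2, 3);
""")

top = conn.execute("""
SELECT p.first_name, p.last_name, COUNT(*) AS games_played
FROM people p
JOIN (
    -- one row per appearance: once for the white player, once for the black player
    SELECT white_pieces_player_id AS player_id FROM games WHERE tournament_id = 41
    UNION ALL
    SELECT black_pieces_player_id FROM games WHERE tournament_id = 41
) g ON g.player_id = p.person_id
GROUP BY p.person_id, p.first_name, p.last_name
ORDER BY games_played DESC
LIMIT 1
""").fetchone()

print(top)
```

UNION ALL matters here: plain UNION would drop duplicate rows and undercount. If two players tie for the most games, LIMIT 1 returns only one of them, so a real solution may need to handle ties explicitly.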