SQL join query returns duplicate results

I have 2 tables. One lists all the goals scored in the English Premier League and who scored them; the other lists the squad number of each player in the league.
I want to do a join so that the result sums the total number of goals by player name and then looks up that player's squad number.
Table A [goal_scorer]
Table B [squads]
I have the SQL query below:
SELECT goal_scorer.*,sum(goal_scorer.number),squads.squad_number
FROM goal_scorer
Inner join squads on goal_scorer.name=squads.player
group by goal_scorer.name
The issue I have is that in the result, the sum of 'number' is too high and seems to include duplicate rows. For example, Aaron Lennon has scored 33 times, not the 264 that the query returns.

Maybe you want something like this?
SELECT goal_scorer.*, s.total, squads.squad_number
FROM goal_scorer
LEFT JOIN (
SELECT name, sum(number) as total
FROM goal_scorer
GROUP BY name
) s on s.name = goal_scorer.name
JOIN squads on goal_scorer.name=squads.player
There are other ways to do it, but here I'm using a sub-query to get the total per player. NB: most modern SQL platforms also support window functions for this.
You probably don't need the LEFT on the sub-query join (every name in goal_scorer is guaranteed to appear in it), but I put it in case your actual use case is more complicated.
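To see why the original sum is inflated, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The rows are hypothetical, and since the squads data isn't shown, it assumes squads can hold more than one row per player (for example, one per season). The naive join pairs every goal row with every matching squads row before SUM runs; aggregating first, or using a window function (SQLite 3.25 or later), keeps the total correct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: two goal rows, and two squads rows for the same player.
conn.executescript("""
CREATE TABLE goal_scorer (name TEXT, number INTEGER);
CREATE TABLE squads (player TEXT, squad_number INTEGER);
INSERT INTO goal_scorer VALUES ('Aaron Lennon', 20), ('Aaron Lennon', 13);
INSERT INTO squads VALUES ('Aaron Lennon', 7), ('Aaron Lennon', 7);
""")

# Naive: 2 goal rows x 2 squads rows = 4 joined rows, so the sum is doubled.
naive = conn.execute("""
SELECT g.name, SUM(g.number)
FROM goal_scorer g JOIN squads s ON g.name = s.player
GROUP BY g.name
""").fetchall()

# Aggregate first, then join; DISTINCT collapses repeated squads rows.
fixed = conn.execute("""
SELECT t.name, t.total, s.squad_number
FROM (SELECT name, SUM(number) AS total FROM goal_scorer GROUP BY name) t
JOIN (SELECT DISTINCT player, squad_number FROM squads) s ON s.player = t.name
""").fetchall()

# Window-function alternative: the per-name total is attached to every row,
# and DISTINCT then keeps one row per name.
windowed = conn.execute("""
SELECT DISTINCT name, SUM(number) OVER (PARTITION BY name) AS total
FROM goal_scorer
""").fetchall()

print(naive)     # [('Aaron Lennon', 66)]
print(fixed)     # [('Aaron Lennon', 33, 7)]
print(windowed)  # [('Aaron Lennon', 33)]
```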

If you are using SQL Server, can you try this?
select *
from squads
outer apply (
select sum(goal_scorer.number) as score
from goal_scorer where goal_scorer.name = squads.player
) x
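SQLite has no OUTER APPLY, so this sketch (Python's sqlite3 module, hypothetical rows) uses a correlated scalar subquery, which plays the same role here: it runs once per squads row, and a player with no goals gets NULL (None in Python), as the OUTER APPLY version would return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: one scorer and one player who never scored.
conn.executescript("""
CREATE TABLE squads (player TEXT, squad_number INTEGER);
CREATE TABLE goal_scorer (name TEXT, number INTEGER);
INSERT INTO squads VALUES ('Aaron Lennon', 7), ('Unused Keeper', 31);
INSERT INTO goal_scorer VALUES ('Aaron Lennon', 20), ('Aaron Lennon', 13);
""")

# Correlated subquery evaluated per squads row (stand-in for OUTER APPLY).
rows = conn.execute("""
SELECT s.player, s.squad_number,
       (SELECT SUM(g.number) FROM goal_scorer g WHERE g.name = s.player) AS score
FROM squads s
ORDER BY s.player
""").fetchall()

print(rows)  # [('Aaron Lennon', 7, 33), ('Unused Keeper', 31, None)]
```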

Related

SQL - Distinct Not Providing Unique Results for Designated Column

I'm currently learning SQL by working through these exercises: https://sqlzoo.net/wiki/The_JOIN_operation
I'm on Example 8 which asks: "Show the name of all players who scored a goal against Germany."
Here is what I currently have:
SELECT DISTINCT(goal.player), goal.gtime, game.team1, game.team2
FROM game JOIN goal ON (goal.matchid = game.id)
WHERE (game.team1='GER' OR game.team2='GER') AND (goal.teamid<>'GER')
I would expect the results to contain only unique names. However, that is not the case: "Mario Balotelli" is listed twice. Why doesn't DISTINCT work in this instance?
Thank you!
DISTINCT operates at the row level: it removes only rows that are identical in every selected column. If you need extra fields in your result, GROUP BY the player and bring along the other fields by joining to the grouped result.
But I reckon the intended answer is just the player name, so the query would be something like this:
SELECT DISTINCT player
FROM game JOIN goal ON matchid = id
WHERE (game.team1='GER' OR game.team2='GER') AND (goal.teamid<>'GER')
It looks like "Mario Balotelli" has more than one goal.gtime value, or different team1/team2 values, so try removing the additional columns from your SELECT clause.
DISTINCT gets the distinct rows based on all selected columns. As the goal times differ, selecting that column makes the rows different (distinct) from one another.
The question only asks you to select the player's name:
SELECT DISTINCT player
FROM game
JOIN goal ON matchid = id
WHERE (team1='GER' OR team2='GER')
AND (teamid <>'GER')
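A small sketch of the row-level behaviour, using Python's sqlite3 module with two hypothetical goal rows for the same player: DISTINCT keeps both rows while gtime is selected, and collapses them once only the player column is selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: same player, two different goal times.
conn.executescript("""
CREATE TABLE goal (player TEXT, gtime INTEGER);
INSERT INTO goal VALUES ('Mario Balotelli', 20), ('Mario Balotelli', 36);
""")

# The rows differ in gtime, so DISTINCT treats them as different rows.
with_time = conn.execute(
    "SELECT DISTINCT player, gtime FROM goal ORDER BY gtime").fetchall()
# With only the player selected, the rows are identical and collapse to one.
name_only = conn.execute("SELECT DISTINCT player FROM goal").fetchall()

print(with_time)  # [('Mario Balotelli', 20), ('Mario Balotelli', 36)]
print(name_only)  # [('Mario Balotelli',)]
```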
This link looks like it would be useful further reading: https://www.designcise.com/web/tutorial/what-is-the-order-of-execution-of-an-sql-query
Edit: If you want more than one column but only a distinct list of players, you are in the realm of aggregation: you would MIN/MAX/SUM/AVG the other data for the group.
SELECT player, team1, team2, MIN(gtime) AS min_gtime, MAX(gtime) AS max_gtime, COUNT(1) AS goals_scored
FROM game
JOIN goal ON matchid = id
WHERE (team1='GER' OR team2='GER')
AND (teamid <>'GER')
GROUP BY player, team1, team2
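The aggregated query can be tried against a tiny hypothetical dataset with Python's sqlite3 module. The game and goal rows below are made up; the sketch only shows that the two Balotelli rows collapse into one group carrying MIN, MAX and COUNT, and that the German goal is filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical match and goals, shaped like the sqlzoo game/goal tables.
conn.executescript("""
CREATE TABLE game (id INTEGER, team1 TEXT, team2 TEXT);
CREATE TABLE goal (matchid INTEGER, teamid TEXT, player TEXT, gtime INTEGER);
INSERT INTO game VALUES (1, 'GER', 'ITA');
INSERT INTO goal VALUES (1, 'ITA', 'Mario Balotelli', 20),
                        (1, 'ITA', 'Mario Balotelli', 36),
                        (1, 'GER', 'Some German Player', 92);
""")

# One row per (player, team1, team2) group; other columns are aggregated.
rows = conn.execute("""
SELECT player, team1, team2,
       MIN(gtime) AS min_gtime, MAX(gtime) AS max_gtime, COUNT(1) AS goals_scored
FROM game JOIN goal ON matchid = id
WHERE (team1 = 'GER' OR team2 = 'GER') AND teamid <> 'GER'
GROUP BY player, team1, team2
""").fetchall()

print(rows)  # [('Mario Balotelli', 'GER', 'ITA', 20, 36, 2)]
```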

Efficient approach to get two-dimensional data using

For the sake of example, let's say I have the following models:
teams
each team has an arbitrary number of fans
In SQL, this means you end up with the following tables:
team: identifier, name
fan: identifier, name
team_fan: team_identifier, fan_identifier
I am looking for an approach to retrieve:
all teams, and
for each team, the first 5 fans (ordered by name) whose names start with an 'A'.
What is an efficient approach to do this?
In my current naive approach, I do <# teams> + 1 queries, which is troublesome:
First: SELECT * FROM team
Then, for each team with identifier X:
SELECT *
FROM fan
INNER JOIN team_fan
ON fan.identifier = team_fan.fan_identifier AND team_fan.team_identifier = X
WHERE fan.name LIKE 'A%'
ORDER BY fan.name LIMIT 5
There should be a better way to do this.
I could first retrieve all teams, as I do now, and then do something like:
SELECT *
FROM fan
WHERE fan.name LIKE 'A%'
AND fan.identifier IN (
SELECT fan_identifier
FROM team_fan
WHERE team_identifier IN (<all team identifiers from first query>))
ORDER BY fan.name
However, this approach ignores the requirement that I need the first 5 fans per team whose names start with an 'A'. Just adding LIMIT 5 to the query above is not correct, because it limits the whole result rather than each team.
Also, with this approach, if I have a large number of teams, I am sending the corresponding team identifiers back to the database in the second query (for the IN (<all team identifiers from first query>)), which could hurt performance?
I am developing against PostgreSQL, Java, Spring and plain JDBC.
You need a three-table join:
SELECT team.*, fan.*
FROM team
JOIN team_fan
ON team.identifier = team_fan.team_identifier
JOIN fan
ON fan.identifier = team_fan.fan_identifier
Then, to keep only the first five 'A' fans per team, number the rows within each team and filter on that number:
with cte as (
SELECT team.identifier AS team_identifier, team.name AS team_name,
fan.identifier AS fan_identifier, fan.name AS fan_name,
row_number() over (partition by team.identifier
order by fan.name) as rn
FROM team
JOIN team_fan
ON team.identifier = team_fan.team_identifier
JOIN fan
ON fan.identifier = team_fan.fan_identifier
WHERE fan.name LIKE 'A%'
)
SELECT *
FROM cte
WHERE rn <= 5
Most RDBMSes support window functions, which are part of the SQL standard and let you number rows within groups according to some ordering.
PostgreSQL is no exception: it has the ROW_NUMBER() function.
What you need is to partition the row numbers by team, order them alphabetically by fan name, and restrict the query to row numbers < 6.
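The same ROW_NUMBER() pattern runs unchanged in SQLite 3.25 or later, so here is a sketch using Python's sqlite3 module and the team/fan/team_fan schema from the question. The teams and fans are hypothetical, and TOP_N is set to 2 instead of 5 so the cutoff is visible with small data ('Amy' is the third 'A' fan of the first team and gets dropped, 'Bob' fails the LIKE filter).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical teams and fans, using the column names from the question.
conn.executescript("""
CREATE TABLE team (identifier INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fan (identifier INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE team_fan (team_identifier INTEGER, fan_identifier INTEGER);
INSERT INTO team VALUES (1, 'Reds'), (2, 'Blues');
INSERT INTO fan VALUES (1, 'Alice'), (2, 'Adam'), (3, 'Bob'), (4, 'Anna'), (5, 'Amy');
INSERT INTO team_fan VALUES (1, 1), (1, 2), (1, 3), (1, 5), (2, 4);
""")

TOP_N = 2  # the question asks for 5; 2 shows the per-team cutoff

# Number the 'A' fans within each team alphabetically, then keep rn <= TOP_N.
rows = conn.execute("""
WITH ranked AS (
    SELECT team.identifier AS team_id, team.name AS team_name,
           fan.name AS fan_name,
           ROW_NUMBER() OVER (PARTITION BY team.identifier
                              ORDER BY fan.name) AS rn
    FROM team
    JOIN team_fan ON team.identifier = team_fan.team_identifier
    JOIN fan ON fan.identifier = team_fan.fan_identifier
    WHERE fan.name LIKE 'A%'
)
SELECT team_name, fan_name FROM ranked WHERE rn <= ? ORDER BY team_id, rn
""", (TOP_N,)).fetchall()

print(rows)  # [('Reds', 'Adam'), ('Reds', 'Alice'), ('Blues', 'Anna')]
```

Note that teams with no 'A' fans drop out of this result because of the inner joins; they still come back from the separate SELECT * FROM team.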

How to show count value as 0 on rows removed with WHERE (microsoft access)

I have two tables: one represents the survey with its location, and the other the people interviewed (there are many people for each survey). I'm trying to show the count of people over a certain age in each location; however, some provinces don't have anyone over that age and therefore don't appear in the resulting table. I would like the count to show zero when no one is over that age.
I have:
SELECT a.location, Count([b.age])
FROM Survey AS a LEFT JOIN person AS b ON a.surveyid = b.surveyid
Where b.age >= 85
GROUP BY a.location;
I realize that the WHERE clause is what is eliminating the zero count results but I can't figure out the subquery I would need.
Use conditional aggregation instead. That means moving the boolean condition into the argument of the aggregation function:
SELECT s.location,
SUM(IIF(p.age >= 85, 1, 0)) AS people_85_plus
FROM Survey AS s LEFT JOIN
person AS p
ON s.surveyid = p.surveyid
GROUP BY s.location;
Notice that I changed the table aliases to abbreviations of the table names, which makes the query easier to follow.
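The difference between filtering in WHERE and conditional aggregation can be checked with a small sketch in Python's sqlite3 module. SQLite stands in for Access here, so CASE WHEN replaces IIF (same logic); the surveys and ages are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical surveys: only North has anyone aged 85 or over.
conn.executescript("""
CREATE TABLE Survey (surveyid INTEGER, location TEXT);
CREATE TABLE person (surveyid INTEGER, age INTEGER);
INSERT INTO Survey VALUES (1, 'North'), (2, 'South');
INSERT INTO person VALUES (1, 90), (1, 40), (2, 50);
""")

# WHERE runs before grouping and removes every South row, so South vanishes.
filtered = conn.execute("""
SELECT s.location, COUNT(p.age)
FROM Survey s LEFT JOIN person p ON s.surveyid = p.surveyid
WHERE p.age >= 85
GROUP BY s.location
""").fetchall()

# Conditional aggregation keeps all rows and counts only the matching ones.
conditional = conn.execute("""
SELECT s.location, SUM(CASE WHEN p.age >= 85 THEN 1 ELSE 0 END) AS people_85_plus
FROM Survey s LEFT JOIN person p ON s.surveyid = p.surveyid
GROUP BY s.location
ORDER BY s.location
""").fetchall()

print(filtered)     # [('North', 1)]
print(conditional)  # [('North', 1), ('South', 0)]
```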

How to include zero results when querying one single table?

I have a table called Apartments that has three columns: apartment_type, person, date. Each row records the apartment type a person selected and the date. I need to count how many people picked each apartment type. Some apartment types have not been picked by anyone.
Here is my query:
SELECT apartment_type, COUNT(*) AS TOTAL
FROM Apartments
GROUP BY apartment_type
It works great, but it doesn't include apartment types with a value of 0. Please, help me to correct this query.
If an apartment_type has no one, your table will not contain any record with that type, so you must join from another table in which all apartment types exist, or use a UNION to create entries for the zero-population types.
Something like:
SELECT apartment_type, COUNT(person) AS TOTAL
FROM (SELECT * FROM Apartments UNION ALL SELECT apartment_type, NULL as person, NULL as date from SomeTableWithFullListOfTypes group by apartment_type) as tmp
GROUP BY apartment_type
I generally agree with Nosyara's answer, but I don't agree with his sample query with the union all. I'm not sure it works, and it's certainly too complicated.
As stated already, if you don't have a table with all the possible apartment types, create one. Then you can write your query using a simple left join:
select t.apartment_type, count(a.apartment_type) as total
from apartment_types t
left join apartments a
on a.apartment_type = t.apartment_type
group by t.apartment_type
Note how count(*) was replaced by count(a.apartment_type). That change is necessary to have an accurate count in the case where you don't have apartments for a certain apartment type.
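Here is a sketch of why count(a.apartment_type) matters, using Python's sqlite3 module. The apartment_types lookup table and its rows are hypothetical. For a type with no apartments, the LEFT JOIN still produces one row with NULLs on the apartments side: COUNT(*) counts that row, while COUNT(a.apartment_type) skips the NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical lookup table plus apartments; nobody picked 'penthouse'.
conn.executescript("""
CREATE TABLE apartment_types (apartment_type TEXT);
CREATE TABLE apartments (apartment_type TEXT, person TEXT, date TEXT);
INSERT INTO apartment_types VALUES ('studio'), ('loft'), ('penthouse');
INSERT INTO apartments VALUES ('studio', 'Ann', '2020-01-01'),
                              ('studio', 'Ben', '2020-01-02'),
                              ('loft', 'Cy', '2020-01-03');
""")

def totals(count_expr):
    # count_expr is a fixed literal from this script, never user input.
    return conn.execute(f"""
        SELECT t.apartment_type, {count_expr} AS total
        FROM apartment_types t
        LEFT JOIN apartments a ON a.apartment_type = t.apartment_type
        GROUP BY t.apartment_type
        ORDER BY t.apartment_type
    """).fetchall()

print(totals("COUNT(a.apartment_type)"))  # [('loft', 1), ('penthouse', 0), ('studio', 2)]
print(totals("COUNT(*)"))                 # [('loft', 1), ('penthouse', 1), ('studio', 2)]
```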
SELECT apartment_type.apartment_type, COUNT(Apartments.apartment_type) AS TOTAL
FROM apartment_type
left join Apartments
on apartment_type.apartment_type = Apartments.apartment_type
GROUP BY apartment_type.apartment_type
Using a left join will give you everything from the left side of the join (so all your types) and anything from the right that matches.

SQL Database SELECT question

Need some help with a homework assignment on SQL.
Problem
Find out who (first name and last name) has played the most games in the chess tournament with an ID = 41
Background information
I have a table called Games, which contains information...
game ID
tournament ID
start_time
end_time
white_pieces_player_id
black_pieces_player_id
white_result
black_result
...about all the separate chess games that have taken place in three different tournaments ....
(tournaments having ID's of 41,42 and 47)
...and the first and last names of the players are stored in a table called People....
person ID (same ID which comes up in the table 'Games' as white_pieces_player_id and
black_pieces_player_id)
first_name
last_name
...how do I write a SELECT statement in SQL that would give me the answer?
Sounds like you need to filter on the tournament ID in your WHERE clause, join to the People table on both white_pieces_player_id and black_pieces_player_id, and take the maximum of each player's game count, combining the white and black appearances (for example with UNION ALL).
Interesting problem.
What do you have so far?
Hmm... responding to your comment:
SELECT isik.eesnimi
FROM partii JOIN isik ON partii.valge=isik.id
WHERE turniir='41'
group by isik.eesnimi
having count(*)>4
Consider using the MAX() function instead of HAVING COUNT(*) > number.
You can add the last name to the SELECT clause if you also add it to the GROUP BY clause.
Sorry, I only speak American. What language is this code in?
I would join the people table to a derived table that aggregates the game counts, like this:
SELECT a.last_name, a.first_name, SUM(b.gamecount) AS totalcount
FROM People a
JOIN (
SELECT white_pieces_player_id AS player_id, COUNT(*) AS gamecount
FROM Games
WHERE tournament_id = 41
GROUP BY white_pieces_player_id
UNION ALL
SELECT black_pieces_player_id, COUNT(*)
FROM Games
WHERE tournament_id = 41
GROUP BY black_pieces_player_id
) b
ON b.player_id = a.person_id
GROUP BY a.person_id, a.last_name, a.first_name
ORDER BY totalcount DESC
Something like this gets separate counts of each player's games as white and as black, then joins and sums them per player.
Then, if you only want the top player, sort by totalcount descending and take the first row (TOP 1 in SQL Server, LIMIT 1 in MySQL, PostgreSQL, or SQLite).
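A runnable sketch of the same idea with Python's sqlite3 module, as a slight variation: it stacks every white and black appearance with UNION ALL and counts once, instead of counting per colour and summing. The column names (person_id, tournament_id, and so on) are assumed renderings of the fields listed in the question, and the games are hypothetical. The tournament 42 game is there to show that the WHERE filter excludes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical players and games; only tournament 41 should be counted.
conn.executescript("""
CREATE TABLE People (person_id INTEGER, first_name TEXT, last_name TEXT);
CREATE TABLE Games (game_id INTEGER, tournament_id INTEGER,
                    white_pieces_player_id INTEGER, black_pieces_player_id INTEGER);
INSERT INTO People VALUES (1, 'Ada', 'Lee'), (2, 'Bo', 'Kim'), (3, 'Cy', 'Roe');
INSERT INTO Games VALUES (1, 41, 1, 2), (2, 41, 3, 1), (3, 41, 1, 2), (4, 42, 2, 3);
""")

# UNION ALL (not UNION) keeps every appearance, so nothing is deduplicated.
row = conn.execute("""
SELECT p.first_name, p.last_name, COUNT(*) AS games_played
FROM (
    SELECT white_pieces_player_id AS player_id FROM Games WHERE tournament_id = 41
    UNION ALL
    SELECT black_pieces_player_id FROM Games WHERE tournament_id = 41
) appearances
JOIN People p ON p.person_id = appearances.player_id
GROUP BY p.person_id, p.first_name, p.last_name
ORDER BY games_played DESC
LIMIT 1
""").fetchone()

print(row)  # ('Ada', 'Lee', 3)
```

If two players tie for the most games, LIMIT 1 returns only one of them; compare against MAX(games_played) in a further subquery if ties matter.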