How to use GROUP BY and JOIN in a SQL Statement? - sql

I have two tables called "Player" and "Country":
Player
Person Goals Country
------- ----- -------
Pogba 1 France
Pavard 1 France
Griezmann 2 France
Neymar 3 Brazil
Silva 2 Brazil
Country
Name Continent
------- --------------
France Europe
Brazil South America
I want to show the sum of goals for each country and display the country name, continent, and total goals.
So, I would like my output to look like this:
Country Continent Goals
------- ------------- ------
France Europe 4
Brazil South America 5
I can display Country & Continent together and Country & Goals together but I can't do all three.
Here is what I tried:
SELECT Country.Name, Country.Continent, SUM(Player.Goals)
FROM Player
INNER JOIN Country ON Player.Country = Country.Name
GROUP BY Player.Country;
Maybe I'm oversimplifying it? I just don't know how I can get the desired result.

Try the query below, which also adds Country.Continent to the GROUP BY:
SELECT Country.Name, Country.Continent, SUM(Player.Goals)
FROM Player
INNER JOIN Country ON Player.Country = Country.Name
GROUP BY Country.Name, Country.Continent

SELECT Country.Continent, Player.Country, SUM(Player.Goals)
FROM Player
INNER JOIN Country ON Player.Country = Country.Name
GROUP BY Country.Continent, Player.Country

Try this:
SELECT Country.Name, Country.Continent, SUM(Player.Goals) as Goals
FROM Player
INNER JOIN Country ON Player.Country = Country.Name
GROUP BY Country.Name, Country.Continent;
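To try the corrected GROUP BY against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The choice of SQLite and the TotalGoals alias are assumptions made for illustration; the question does not say which database is in use.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Country (Name TEXT PRIMARY KEY, Continent TEXT);
    CREATE TABLE Player (Person TEXT, Goals INTEGER, Country TEXT);
    INSERT INTO Country VALUES ('France', 'Europe'), ('Brazil', 'South America');
    INSERT INTO Player VALUES
        ('Pogba', 1, 'France'), ('Pavard', 1, 'France'), ('Griezmann', 2, 'France'),
        ('Neymar', 3, 'Brazil'), ('Silva', 2, 'Brazil');
""")

# Every non-aggregated column in the SELECT list also appears in GROUP BY.
rows = conn.execute("""
    SELECT Country.Name, Country.Continent, SUM(Player.Goals) AS TotalGoals
    FROM Player
    INNER JOIN Country ON Player.Country = Country.Name
    GROUP BY Country.Name, Country.Continent
    ORDER BY TotalGoals
""").fetchall()

for row in rows:
    print(row)
```

With the sample data this gives one row per country, France with 4 goals and Brazil with 5, which matches the output the question asks for.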

Although adding continent to the GROUP BY is definitely one solution, there are others.
First, you can use an aggregation function on continent:
SELECT c.Name, MAX(c.Continent) as Continent, SUM(p.Goals)
FROM Player p JOIN
Country c
ON p.Country = c.Name
GROUP BY c.Name;
Or, you can use a correlated subquery:
SELECT c.*,
(SELECT SUM(p.Goals)
FROM Player p
WHERE p.Country = c.Name
) as goals
FROM Country c;
Note that both of these include table aliases, which make the query easier to write and read.
The advantage of the last version is that it avoids an aggregation in the outer query and also it allows you to easily choose as many columns from Country as you like. It can also take advantage of an index on Player(Country), if available -- although that is irrelevant on smaller amounts of data.
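The two alternatives above can be compared side by side. The sketch below runs them in SQLite through Python's sqlite3 module; the extra country "Japan" with no players is a hypothetical row added only to show where the two queries differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Country (Name TEXT PRIMARY KEY, Continent TEXT);
    CREATE TABLE Player (Person TEXT, Goals INTEGER, Country TEXT);
    -- 'Japan' is a hypothetical country without players, not in the question.
    INSERT INTO Country VALUES
        ('France', 'Europe'), ('Brazil', 'South America'), ('Japan', 'Asia');
    INSERT INTO Player VALUES
        ('Pogba', 1, 'France'), ('Pavard', 1, 'France'), ('Griezmann', 2, 'France'),
        ('Neymar', 3, 'Brazil'), ('Silva', 2, 'Brazil');
""")

# Variant 1: wrap Continent in an aggregate so GROUP BY needs only c.Name.
by_max = conn.execute("""
    SELECT c.Name, MAX(c.Continent) AS Continent, SUM(p.Goals) AS Goals
    FROM Player p
    JOIN Country c ON p.Country = c.Name
    GROUP BY c.Name
    ORDER BY c.Name
""").fetchall()

# Variant 2: correlated subquery, with no aggregation in the outer query.
by_subquery = conn.execute("""
    SELECT c.Name, c.Continent,
           (SELECT SUM(p.Goals) FROM Player p WHERE p.Country = c.Name) AS Goals
    FROM Country c
    ORDER BY c.Name
""").fetchall()

print(by_max)
print(by_subquery)
```

One difference worth knowing: the inner join drops countries that have no players, while the correlated subquery keeps them with a NULL total (None in Python).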

Related

Sql query for the lowest score per country

I have a database with 3 tables. The country table has id and name columns. The sport table also has id and name columns. Finally, the match table has id, player1, and player2 (the ids of the countries that play against each other), winner_id (the id of the country that won the match), and sport_id (the sport that was played). By "least wins" I mean only the sport in which a country had the fewest wins, no matter how many matches were played.
I want to show the sport per country with the least wins. It should look like this:
Country Sport      Wins
------- ---------- ----
France  Basketball 2
How can I construct this query? I'm using SQL Server.
The data in the tables looks like this. Table countries:
country_id name
---------- -------
1          France
2          England
Table sport:
sport_id name
-------- ----------
1        Footbal
2        Basketball
Table match:
match_id player1 player2 winner_id sport_id
-------- ------- ------- --------- --------
1        3       1       3         1
2        6       4       4         2
Note that "least wins" is ambiguous. In my solution it means the fewest wins and, among sports tied on wins, the one with the most matches played.
To get this ranking, we need to know how many matches a country has played in each sport and how many of those it has won.
SELECT
country.name AS country,
sport.name AS sport,
sport_wins.wins
FROM
country
OUTER APPLY (
SELECT TOP 1
t.match_count,
COALESCE(t.wins, 0) AS wins,
t.sport_id
FROM (
SELECT
COUNT(*) AS match_count,
m_c.sport_id,
t.wins
FROM match m_c
OUTER APPLY (
SELECT
COUNT(*) AS wins,
match.sport_id
FROM match
WHERE country.country_id = match.winner_id
AND match.sport_id = m_c.sport_id
GROUP BY match.sport_id
) t
WHERE country.country_id IN (m_c.player1, m_c.player2)
GROUP BY m_c.sport_id, t.wins
) t
ORDER BY t.wins ASC, t.match_count DESC
) sport_wins
JOIN sport ON sport.sport_id = sport_wins.sport_id
Please, check a demo.
If losses do not matter and only the number of wins is of interest, you can use a query like this one:
WITH cte AS (
SELECT
country.country_id,
sport.sport_id,
SUM(CASE WHEN match.winner_id = country.country_id THEN 1 ELSE 0 END) AS wins
FROM country
CROSS JOIN sport
JOIN match ON match.sport_id = sport.sport_id
AND country.country_id IN (match.player1, match.player2)
GROUP BY country.country_id, sport.sport_id
)
SELECT
country.name,
sport.name,
t.min_wins AS wins
FROM (
SELECT
country_id,
MIN(wins) AS min_wins
FROM cte
GROUP BY country_id
) t
JOIN cte ON cte.country_id = t.country_id AND cte.wins = min_wins
JOIN country ON cte.country_id = country.country_id
JOIN sport ON cte.sport_id = sport.sport_id
This query only considers sports in which the country has actually played a match. A sport the country never competed in is left out of the statistics; otherwise it would show 0 wins and would always be the minimum.
Please, check a demo
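The wins-only query above can be tried outside SQL Server too. The sketch below ports it to SQLite through Python's sqlite3 module. The match rows are hypothetical, since the sample rows in the question reference country ids that are not listed, and the table name is quoted as "match" because MATCH is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (country_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sport (sport_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "match" (match_id INTEGER PRIMARY KEY, player1 INTEGER,
                          player2 INTEGER, winner_id INTEGER, sport_id INTEGER);
    INSERT INTO country VALUES (1, 'France'), (2, 'England');
    INSERT INTO sport VALUES (1, 'Football'), (2, 'Basketball');
    -- Hypothetical results: France wins 2 football and 1 basketball match,
    -- England wins 1 football and 2 basketball matches.
    INSERT INTO "match" VALUES
        (1, 1, 2, 1, 1), (2, 1, 2, 1, 1), (3, 1, 2, 2, 1),
        (4, 1, 2, 1, 2), (5, 1, 2, 2, 2), (6, 1, 2, 2, 2);
""")

least_wins = conn.execute("""
    WITH cte AS (
        -- Wins per country and sport, limited to sports the country played.
        SELECT country.country_id, sport.sport_id,
               SUM(CASE WHEN m.winner_id = country.country_id
                        THEN 1 ELSE 0 END) AS wins
        FROM country
        CROSS JOIN sport
        JOIN "match" AS m ON m.sport_id = sport.sport_id
            AND country.country_id IN (m.player1, m.player2)
        GROUP BY country.country_id, sport.sport_id
    )
    SELECT country.name, sport.name, t.min_wins AS wins
    FROM (SELECT country_id, MIN(wins) AS min_wins
          FROM cte GROUP BY country_id) t
    JOIN cte ON cte.country_id = t.country_id AND cte.wins = t.min_wins
    JOIN country ON cte.country_id = country.country_id
    JOIN sport ON cte.sport_id = sport.sport_id
    ORDER BY country.name
""").fetchall()

print(least_wins)
```

If a country has two sports tied for the minimum, this join returns both of them, which may or may not be what you want.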
You need to first cross-join the sports with the countries, then get the total wins.
Then you can use a row-number approach to get the bottom country in each sport:
SELECT
c.Country,
c.Sport,
c.Wins
FROM (
SELECT
c.name Country,
s.name Sport,
COUNT(m.winner_id) Wins,
ROW_NUMBER() OVER (PARTITION BY s.sport_id, s.name ORDER BY COUNT(m.winner_id)) rn
FROM country c
CROSS JOIN sport s
LEFT JOIN [match] m
ON s.sport_id = m.sport_id AND m.winner_id = c.country_id
GROUP BY
s.sport_id,
s.name,
c.country_id,
c.name
) c
WHERE c.rn = 1;
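The query above ranks countries within each sport. If the goal is instead the weakest sport for each country, as the question's expected output suggests, the partition can be switched to the country. The following is a sketch of that variant, not the answer's own query, run in SQLite (3.25 or later for window functions) via Python's sqlite3 module. All rows are hypothetical, including a "Tennis" sport with no matches to show that the LEFT JOIN keeps sports with zero wins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (country_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sport (sport_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "match" (match_id INTEGER PRIMARY KEY, player1 INTEGER,
                          player2 INTEGER, winner_id INTEGER, sport_id INTEGER);
    INSERT INTO country VALUES (1, 'France'), (2, 'England');
    -- 'Tennis' has no matches at all (hypothetical data).
    INSERT INTO sport VALUES (1, 'Football'), (2, 'Basketball'), (3, 'Tennis');
    INSERT INTO "match" VALUES
        (1, 1, 2, 1, 1), (2, 1, 2, 1, 1), (3, 1, 2, 2, 1),
        (4, 1, 2, 1, 2), (5, 1, 2, 2, 2), (6, 1, 2, 2, 2);
""")

weakest_sport = conn.execute("""
    WITH wins_per_sport AS (
        -- Every country/sport pair, counting wins (0 when there are none).
        SELECT c.country_id, c.name AS country, s.name AS sport,
               COUNT(m.winner_id) AS wins
        FROM country c
        CROSS JOIN sport s
        LEFT JOIN "match" m
            ON s.sport_id = m.sport_id AND m.winner_id = c.country_id
        GROUP BY c.country_id, c.name, s.sport_id, s.name
    )
    SELECT country, sport, wins
    FROM (
        SELECT country, sport, wins,
               ROW_NUMBER() OVER (PARTITION BY country_id ORDER BY wins) AS rn
        FROM wins_per_sport
    ) ranked
    WHERE rn = 1
    ORDER BY country
""").fetchall()

print(weakest_sport)
```

Because ROW_NUMBER() returns exactly one row per partition, ties on wins are broken arbitrarily unless more columns are added to the ORDER BY.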

How to write the SQL QUERY for this question

Given the CITY and COUNTRY tables, query the names of all the continents (COUNTRY.Continent) and their respective average city populations (CITY.Population) rounded down to the nearest integer.
Note: CITY.CountryCode and COUNTRY.Code are matching key columns.
Try the following:
select
c.continent,
floor(avg(ci.population))
from country c
join city ci
on c.Code = ci.countrycode
group by
c.continent;
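As a quick check of the join and the rounding, here is a small sketch in SQLite via Python's sqlite3 module. The rows are hypothetical. FLOOR() is not guaranteed to be available in every SQLite build, so the sketch swaps in CAST(... AS INTEGER), which truncates toward zero and therefore matches FLOOR only for non-negative values such as populations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE COUNTRY (Code TEXT PRIMARY KEY, Continent TEXT);
    CREATE TABLE CITY (Name TEXT, CountryCode TEXT, Population INTEGER);
    -- Hypothetical rows chosen so that the averages are not whole numbers.
    INSERT INTO COUNTRY VALUES
        ('FRA', 'Europe'), ('DEU', 'Europe'), ('BRA', 'South America');
    INSERT INTO CITY VALUES
        ('A', 'FRA', 10), ('B', 'DEU', 15),
        ('C', 'BRA', 7), ('D', 'BRA', 10);
""")

averages = conn.execute("""
    SELECT c.Continent,
           CAST(AVG(ci.Population) AS INTEGER) AS AvgPopulation
    FROM COUNTRY c
    JOIN CITY ci ON c.Code = ci.CountryCode
    GROUP BY c.Continent
    ORDER BY c.Continent
""").fetchall()

print(averages)
```

Europe averages 12.5 and South America 8.5, so the rounded-down results are 12 and 8.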

How to join tables on column containing text

I have 2 tables to merge:
t1
Continent Country City
-----------------------
Europe Germany Munich
NA Canada Ontario
Asia Singapore (blank)
Asia Japan Tokyo
AND
t2
Country Status
-----------------
Germany Complete
Canada Incomplete
Singapore Complete
Japan Complete
I want to get the continent with 2nd highest "Complete" status. I am new to SQL and I am trying hard to learn the basics, but I cannot get this done.
I understand that you want to pull out the continent that has the second-highest number of countries marked as Complete.
If so, you can join, aggregate, order by the count of completed countries per continent, and then filter on the second row:
select continent
from
(select distinct country, continent from t1) t1
inner join t2 on t1.country = t2.country
group by continent
order by sum(case when status = 'Complete' then 1 else 0 end) desc
limit 1, 1
Note the use of distinct when retrieving the association of countries and continents from t1: this is because your sample data seems like it could have more than one row per country/continent tuple (since it is referencing cities). Without the distinct, we would potentially generate duplicate rows, causing sum() to be wrong.
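The approach can be checked against the sample tables with the sketch below, which uses SQLite through Python's sqlite3 module. The extra row for Berlin is hypothetical and is there only to show why the distinct matters. LIMIT 1 OFFSET 1 is used as the more portable spelling of MySQL's "limit 1, 1".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (Continent TEXT, Country TEXT, City TEXT);
    CREATE TABLE t2 (Country TEXT, Status TEXT);
    INSERT INTO t1 VALUES
        ('Europe', 'Germany', 'Munich'),
        ('Europe', 'Germany', 'Berlin'),   -- hypothetical second city
        ('NA', 'Canada', 'Ontario'),
        ('Asia', 'Singapore', NULL),
        ('Asia', 'Japan', 'Tokyo');
    INSERT INTO t2 VALUES
        ('Germany', 'Complete'), ('Canada', 'Incomplete'),
        ('Singapore', 'Complete'), ('Japan', 'Complete');
""")

second = conn.execute("""
    SELECT continent
    FROM (SELECT DISTINCT country, continent FROM t1) AS d
    INNER JOIN t2 ON d.country = t2.country
    GROUP BY continent
    ORDER BY SUM(CASE WHEN status = 'Complete' THEN 1 ELSE 0 END) DESC
    LIMIT 1 OFFSET 1
""").fetchone()

print(second)
```

Asia has two completed countries and Europe has one, so Europe comes second. Without the distinct, Germany would be counted once per city and Europe would tie with Asia.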
I understand you mean ranking countries by their number of cities with status Complete. You can use a CTE and a subquery:
with a as
(
select a.country,
sum(case when status = 'Complete' then 1 else 0 end) as CompleteCount
from t1 a inner join t2 b on a.country = b.country
group by a.country
)
select country from
(
select country,
ROW_NUMBER() OVER( ORDER BY CompleteCount desc) as OrderComplete
from a
)a where OrderComplete = 2

Select single row or multiple rows based on condition

I'm trying to identify a student's home district by joining the student's zip code to a district zip code. A given district may overlap several zip codes, so several possible home districts may appear for the student. For example, zip code 99999 may include the Houston and Sugarland school districts. I can narrow down the home district to a single record when the student's city has the same name as the district, for example when the student's city is Houston and the district name is Houston. In that case, I only want to retrieve the Houston district, not both Houston and Sugarland. However, if the student happened to live in Bayou with the zip code 99999, then I'd want to retrieve both the Houston and Sugarland districts, since I can't pin down the district. I've tried several approaches but cannot come up with a solution. Here's a primitive attempt:
Select S.Name, S.City, S.Zip, D.DistrictName
From tblStudent S
Left Join tblDistrict D on D.zip=S.zip
Where
(Case
When D.DistrictName=S.City then D.DistrictName
Else D.DistrictName
End)=D.DistrictName
Any suggestions are greatly appreciated!
You can try selecting each case separately and then making a union of the two queries.
Case 1: District Name equals City Name
Case 2: There is no District Name that is equal to the City Name
Something like this:
Select S.Name, S.City, S.Zip, D.DistrictName
From tblStudent S
Inner Join tblDistrict D on D.zip=S.zip and D.DistrictName = S.City
Union
Select S.Name, S.City, S.Zip, D.DistrictName
From tblStudent S
Inner Join tblDistrict D on D.zip=S.zip
Where not exists (
Select D2.* from tblDistrict D2
Where D2.DistrictName = S.City
And D2.zip = S.zip
)
SQL Fiddle: http://sqlfiddle.com/#!9/06d6d7/2/0
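For a self-contained check of the two cases, the sketch below runs the union in SQLite via Python's sqlite3 module. The student names Ann and Bob are hypothetical; the cities, districts, and zip code come from the question's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblStudent (Name TEXT, City TEXT, Zip TEXT);
    CREATE TABLE tblDistrict (DistrictName TEXT, zip TEXT);
    INSERT INTO tblDistrict VALUES ('Houston', '99999'), ('Sugarland', '99999');
    -- Hypothetical students: one whose city matches a district, one whose does not.
    INSERT INTO tblStudent VALUES
        ('Ann', 'Houston', '99999'), ('Bob', 'Bayou', '99999');
""")

districts = conn.execute("""
    -- Case 1: a district in the zip code has the same name as the city.
    SELECT S.Name, S.City, S.Zip, D.DistrictName
    FROM tblStudent S
    INNER JOIN tblDistrict D ON D.zip = S.Zip AND D.DistrictName = S.City
    UNION
    -- Case 2: no such district exists, so return every district in the zip code.
    SELECT S.Name, S.City, S.Zip, D.DistrictName
    FROM tblStudent S
    INNER JOIN tblDistrict D ON D.zip = S.Zip
    WHERE NOT EXISTS (
        SELECT 1 FROM tblDistrict D2
        WHERE D2.DistrictName = S.City AND D2.zip = S.Zip
    )
    ORDER BY 1, 4
""").fetchall()

for row in districts:
    print(row)
```

Ann gets only the Houston district, while Bob gets both Houston and Sugarland.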

SQL Query Not Working, Returning Nothing Back

The goal of my query is to return the country, capital, and number of languages spoken. It also needs to be ordered by descending number of languages spoken, and then by capital. Finally, the number of languages must be at least 5 and 10 or less.
Here is my query:
SELECT country.name AS Country,
city.name AS Capital,
Count(countrylanguage.language) AS NumLanguages
FROM country,
city,
countrylanguage
WHERE city.id = country.capital
GROUP BY city.name,
country.name
HAVING ( Count(countrylanguage.language) BETWEEN 5 AND 10 );
It returns nothing. The where clause is necessary in order to get the city name to display. In the country table is just an id number, and then the city table holds the id number and name.
If anyone could spot my error I'd be very grateful!
You are missing the relationship with countrylanguage. Without it, you have a Cartesian product, so Count(countrylanguage.language) is equal to the number of records in countrylanguage, which is most likely greater than 10.
Here's a proposed solution (adjust with field names/DB structure accordingly):
SELECT country.name AS Country,
city.name AS Capital,
Count(countrylanguage.language) AS NumLanguages
FROM country,
city,
countrylanguage
WHERE city.id = country.capital
AND countrylanguage.language_id = country.language_id
GROUP BY city.name,
country.name
HAVING ( Count(countrylanguage.language) BETWEEN 5 AND 10 )
ORDER BY NumLanguages desc, city.Name
That said, you should always try to avoid joins in the WHERE clause of the query (implicit joins). Favoring explicit (declarative) joins will give you more readability and also more flexibility.
Update
As suggested in the comments, here is a version of the query using ANSI-92 join syntax:
SELECT country.name AS Country,
city.name AS Capital,
Count(countrylanguage.language) AS NumLanguages
FROM country
INNER JOIN city on city.id = country.capital
INNER JOIN countrylanguage on countrylanguage.language_id = country.language_id
GROUP BY city.name,
country.name
HAVING ( Count(countrylanguage.language) BETWEEN 5 AND 10 )
ORDER BY NumLanguages desc, city.Name;
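The effect of the missing join condition can be reproduced on a tiny data set. The sketch below uses SQLite via Python's sqlite3 module with a hypothetical schema in which countrylanguage links to country through a countrycode column; as the answer says, adjust the join columns to your actual structure. All country, city, and language names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT, capital INTEGER);
    CREATE TABLE countrylanguage (countrycode TEXT, language TEXT);
    INSERT INTO city VALUES (1, 'Alphaville'), (2, 'Betatown');
    INSERT INTO country VALUES ('AAA', 'Alphaland', 1), ('BBB', 'Betaland', 2);
""")
# Alphaland gets 6 languages and Betaland 2, so 8 rows in total.
languages = [("AAA", f"lang{i}") for i in range(6)]
languages += [("BBB", f"lang{i}") for i in range(2)]
conn.executemany("INSERT INTO countrylanguage VALUES (?, ?)", languages)

# No condition on countrylanguage: every country is paired with all 8 rows.
cartesian = conn.execute("""
    SELECT country.name, city.name, COUNT(countrylanguage.language)
    FROM country, city, countrylanguage
    WHERE city.id = country.capital
    GROUP BY city.name, country.name
    ORDER BY country.name
""").fetchall()

# Explicit joins, including the link between country and countrylanguage.
joined = conn.execute("""
    SELECT country.name AS Country, city.name AS Capital,
           COUNT(countrylanguage.language) AS NumLanguages
    FROM country
    INNER JOIN city ON city.id = country.capital
    INNER JOIN countrylanguage ON countrylanguage.countrycode = country.code
    GROUP BY city.name, country.name
    HAVING COUNT(countrylanguage.language) BETWEEN 5 AND 10
    ORDER BY NumLanguages DESC, city.name
""").fetchall()

print(cartesian)
print(joined)
```

In the first result both countries report 8 languages, the size of the whole countrylanguage table. With a realistically large table that count exceeds 10 for every country, which is why the original query returned nothing.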