Find average GDP according to continent in SQL

I have two tables: Economics (land_code, gdp) and Continent (Land_code, Cont, Percentage). I need a query that calculates the average GDP for each continent. If a country lies in several continents at once, the percentage of its GDP that belongs to each continent must also be taken into account. As I understand it, if Egypt has a GDP of 100, then 90 belongs to Africa and 10 to Asia. How can I implement this?

Obviously,
select Economics.land_code, Economics.gdp * Continent.Percentage
from ...
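One possible way to complete the fragment above, sketched with Python's built-in sqlite3 module so it can be run as-is. It assumes that Percentage is stored on a 0 to 100 scale and that "average GDP for a continent" means the average of each country's GDP share in that continent; both are assumptions, as are the land codes XA and XB, which are made up for illustration. Only the Egypt figures come from the question.

```python
import sqlite3

def average_gdp_by_continent(economics, continent):
    """Average, per continent, of each country's GDP share in that continent.

    Assumes percentage is on a 0-100 scale (an assumption, not stated in the question).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Economics (land_code TEXT, gdp REAL)")
    conn.execute("CREATE TABLE Continent (land_code TEXT, cont TEXT, percentage REAL)")
    conn.executemany("INSERT INTO Economics VALUES (?, ?)", economics)
    conn.executemany("INSERT INTO Continent VALUES (?, ?, ?)", continent)
    rows = conn.execute(
        """
        SELECT c.cont, AVG(e.gdp * c.percentage / 100.0)
        FROM Economics AS e
        JOIN Continent AS c ON c.land_code = e.land_code
        GROUP BY c.cont
        """
    ).fetchall()
    conn.close()
    return dict(rows)

# EG follows the Egypt example from the question; XA and XB are hypothetical.
economics = [("EG", 100), ("XA", 50), ("XB", 30)]
continent = [
    ("EG", "Africa", 90), ("EG", "Asia", 10),
    ("XA", "Africa", 100), ("XB", "Asia", 100),
]
result = average_gdp_by_continent(economics, continent)
```

If the intended average is instead total continent GDP divided by a fractional country count, the AVG expression would need to become SUM(gdp * percentage) / SUM(percentage); the question does not say which one is wanted.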

Related

Needing Clarity on SQL Join Query

Having some trouble understanding this query, particularly the WHERE in the subquery. I don't really get what it is accomplishing. Any help would be appreciated. Thanks
# Find the largest country (by area) in each continent. Show the continent,
# name, and area.
SELECT continent, name, area
FROM countries AS a
WHERE area = (
SELECT MAX(area)
FROM countries AS b
WHERE a.continent = b.continent
)
Consider the following subset of the countries data:
Continent      Country  Area
North America  USA      3718691
North America  Canada   3855081
North America  Mexico   761602
Europe         France   211208
Europe         Germany  137846
Europe         UK       94525
Europe         Italy    116305
This is a correlated query that behaves as follows:
1. Reads the first row returned by the outer query (North America, USA, 3718691).
2. Runs the subquery, which is correlated on a.continent (North America) and returns 3855081, the maximum area in North America.
3. Evaluates the WHERE equality, which checks whether 3855081 matches the area of the row we're working on.
4. It doesn't match, so the next row in the outer query is read and we start over at step 1, this time working on the second row.
5. Repeats for all rows in the outer query.
When we're looking at rows 2 and 4, the check in step 3 will match, so those rows will be returned by the query.
You can check the results by using this data in your countries table and running the query.
Note that this is a very poor way to determine the country with the maximum area per continent because it repeats the subquery for every country. Using my sample data, it determines the maximum area for North America 3 times and the maximum area for Europe 4 times.
Since you asked in your comment, I would write this query as follows:
SELECT a.continent, a.name, a.area
FROM countries AS a
INNER JOIN (SELECT continent, MAX(area) AS max_area
            FROM countries
            GROUP BY continent) AS b ON a.continent = b.continent
WHERE a.area = b.max_area
In this version of the query, the maximum for each continent is only determined once. The original query was written to illustrate correlated queries and it's important to understand them. Correlated queries can often be used to resolve complex logic.
The subquery is finding the maximum area for countries. Which countries? All countries that match the continent of the country in the outer query.
So, for each country it gets the area of the largest country on the same continent.
The WHERE clause then says "are the two areas the same -- the maximum area and the area of this country?". It chooses only countries that have the maximum area.
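To check the explanations above against the sample data, here is a small sketch that loads the seven sample rows into an in-memory SQLite database and runs both the correlated subquery and the join version. SQLite is used only because it ships with Python; the original thread does not say which database is involved.

```python
import sqlite3

# The sample rows from the answer above.
ROWS = [
    ("North America", "USA", 3718691),
    ("North America", "Canada", 3855081),
    ("North America", "Mexico", 761602),
    ("Europe", "France", 211208),
    ("Europe", "Germany", 137846),
    ("Europe", "UK", 94525),
    ("Europe", "Italy", 116305),
]

CORRELATED = """
    SELECT continent, name, area
    FROM countries AS a
    WHERE area = (SELECT MAX(area)
                  FROM countries AS b
                  WHERE a.continent = b.continent)
"""

JOINED = """
    SELECT a.continent, a.name, a.area
    FROM countries AS a
    INNER JOIN (SELECT continent, MAX(area) AS max_area
                FROM countries
                GROUP BY continent) AS b ON a.continent = b.continent
    WHERE a.area = b.max_area
"""

def largest_per_continent(sql):
    """Run one of the two queries against the sample data, sorted for comparison."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE countries (continent TEXT, name TEXT, area INTEGER)")
    conn.executemany("INSERT INTO countries VALUES (?, ?, ?)", ROWS)
    result = sorted(conn.execute(sql).fetchall())
    conn.close()
    return result

print(largest_per_continent(CORRELATED))
print(largest_per_continent(JOINED))
```

Both calls return the Canada and France rows, matching rows 2 and 4 in the walkthrough.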

using 'groupby.count' with agg

df.head
Populous Continents
Australia 2.331602e+07 Australia
Brazil 2.059153e+08 South America
Canada 3.523986e+07 North America
China 1.367645e+09 Asia
France 6.383735e+07 Europe
Above are the first 5 entries of my dataframe.
I want to group them by Continents, then I want to perform some statistical analysis. I want to create a new dataframe with the Avg, Sum, STD of each Group's populous as well as the count of countries in each group, as its columns.
new_df =df.groupby('Continents')['Populous'].agg({ 'Avg': np.average, 'Sum':np.sum, 'STD': np.std}), takes care of three columns, but I don't know how to get count in there. I tried including 'Size': count , within the agg method, but it resulted in an error.
Thank you.
You might also find this useful:
df.groupby('Continents').Populous.describe().unstack()
Also see this answer if you want more stats.
You can use 'Size': len or 'Size': 'count' for this to work. However, as @DSM pointed out, len counts missing values whereas 'count' doesn't.
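The len versus 'count' distinction can be illustrated without pandas. The sketch below is plain Python that mimics the two behaviours (it is not pandas code and makes no claim about pandas internals): "size" counts every row in a group, while "count" skips missing values, represented here as NaN. The five populated rows come from the question; the extra Asia row with a missing value is hypothetical.

```python
import math
from collections import defaultdict

def size_and_count(rows):
    """Group (continent, populous) pairs and report size vs. non-missing count."""
    groups = defaultdict(list)
    for continent, populous in rows:
        groups[continent].append(populous)
    return {
        continent: {
            "size": len(values),  # counts every row, like len
            "count": sum(1 for v in values if not math.isnan(v)),  # skips NaN, like 'count'
        }
        for continent, values in groups.items()
    }

rows = [
    ("Australia", 2.331602e07),
    ("South America", 2.059153e08),
    ("North America", 3.523986e07),
    ("Asia", 1.367645e09),
    ("Europe", 6.383735e07),
    ("Asia", float("nan")),  # hypothetical row with a missing population
]
stats = size_and_count(rows)
```

Here the Asia group has a size of 2 but a count of 1, which is the difference the answer warns about.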

How to design table relationship where the foreign key can mean "all rows", "some rows" or "one row"?

I hope you can help me with this. I've used pseudocode to keep everything simple.
I have a table which describes locations.
location_table
location = charfield(200) # New York, London, Tokyo
A product manager now wants locations to be as follows:
Global = select every location
Asia = select every location in Asia
US = select every location in US
Current system = London (etc.)
This is my proposed redesign.
location_table
location = charfield(200) # New York, London, Tokyo
continent = foreign key to continent_table
continent_table
continent = charfield(50) # "None", "Global", Asia, Europe
But this seems horrible. It means in my code I'll always need to check if the customer is using "global" or "none", and then select the corresponding location records. For example, there will be code like this scattered everywhere:
get continent
if continent is global, select everything from location_table
else if continent is none, select location from location_table
else select location from location_table where foreign key is continent
My feeling is this is a known problem, and there is a known solution for it. Any ideas?
Thank you.
What you seem to have here is a set of locations, and then a set of location groups. Those groups might be all of the locations (global), or a subset of them.
You can build this with an intermediate table between the locations and a new location sets table which associates locations and location sets.
You might build the location set table and the join table so that the individual locations are also location sets, but ones which join only to one location. That way all location selections come from one table -- the location sets.
So you end up with three different types of location set:
Ones which map 1:1 with a location
One which maps 1:all ("global")
Ones which map 1:many (continents and other areas)
It's conceivable that this could be created as a hierarchy, but those queries can be inefficient because the join cardinalities tend to be obscured from the optimiser.
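A minimal sketch of the location-set idea described above, using an in-memory SQLite database. The table and column names (location, location_set, location_set_member) are made up for illustration, and the membership rows are assumptions based on the example cities in the question.

```python
import sqlite3

def build_db():
    """Create locations, location sets, and the join table between them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE location_set (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE location_set_member (
            set_id INTEGER REFERENCES location_set(id),
            location_id INTEGER REFERENCES location(id)
        );
        INSERT INTO location VALUES (1, 'New York'), (2, 'London'), (3, 'Tokyo');
        INSERT INTO location_set VALUES
            (1, 'Global'), (2, 'Asia'), (3, 'US'),
            (4, 'New York'), (5, 'London'), (6, 'Tokyo');
        INSERT INTO location_set_member VALUES
            (1, 1), (1, 2), (1, 3),   -- Global maps to every location
            (2, 3),                   -- Asia
            (3, 1),                   -- US
            (4, 1), (5, 2), (6, 3);   -- sets that map 1:1 to a location
        """
    )
    return conn

def locations_in(conn, set_name):
    """Return the locations of a set; the same query serves every kind of set."""
    rows = conn.execute(
        """
        SELECT l.name
        FROM location_set AS s
        JOIN location_set_member AS m ON m.set_id = s.id
        JOIN location AS l ON l.id = m.location_id
        WHERE s.name = ?
        ORDER BY l.name
        """,
        (set_name,),
    ).fetchall()
    return [name for (name,) in rows]

conn = build_db()
print(locations_in(conn, "Global"))
print(locations_in(conn, "Asia"))
```

The point of the design is that the calling code no longer branches on "global", "none", or a continent; it always selects by location set.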
You could do this using a hierarchy, and a self referencing foreign key, e.g.
LocationID  Name           ParentLocationID  LocationType
---------------------------------------------------------
1           Planet Earth   NULL              Planet
2           Africa         1                 Continent
3           Antarctica     1                 Continent
4           Asia           1                 Continent
5           Australasia    1                 Continent
6           Europe         1                 Continent
7           North America  1                 Continent
8           South America  1                 Continent
9           United States  7                 Country
10          Canada         7                 Country
11          Mexico         7                 Country
12          California     9                 State
13          San Diego      12                City
14          England        6                 Country
15          Cornwall       14                County
16          Truro          15                City
Hierarchical data usually requires either recursion or multiple joins to get all levels; this answer contains links to articles comparing performance on the major DBMS.
Many DBMS now support recursive common table expressions. Since no DBMS is specified, I will use SQL Server syntax because it is what I am most comfortable with. A quick example would be:
DECLARE @LocationID INT = 7; -- NORTH AMERICA
WITH LocationCTE AS
(   -- Anchor member: the location we start from
    SELECT l.LocationID, l.Name, l.ParentLocationID, l.LocationType
    FROM dbo.Location AS l
    WHERE l.LocationID = @LocationID
    UNION ALL
    -- Recursive member: children of rows already in the CTE
    SELECT l.LocationID, l.Name, l.ParentLocationID, l.LocationType
    FROM dbo.Location AS l
    INNER JOIN LocationCTE AS c
        ON c.LocationID = l.ParentLocationID
)
SELECT *
FROM LocationCTE;
Output based on above sample data
LocationID  Name           ParentLocationID  LocationType
---------------------------------------------------------
7           North America  1                 Continent
9           United States  7                 Country
10          Canada         7                 Country
11          Mexico         7                 Country
12          California     9                 State
13          San Diego      12                City
Supplying a value of 1 (Planet Earth) for the location ID will return the full table, or supplying a locationID of 11 (Mexico) would only return this one row, because there is nothing smaller than this in the sample data.
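For readers without SQL Server at hand, the same recursive query can be tried in SQLite through Python's sqlite3 module. This is a sketch under that substitution: SQLite spells the CTE as WITH RECURSIVE and takes the starting ID as a bound parameter instead of a declared variable. The rows are the sample data from the table above.

```python
import sqlite3

LOCATIONS = [
    (1, "Planet Earth", None, "Planet"),
    (2, "Africa", 1, "Continent"),
    (3, "Antarctica", 1, "Continent"),
    (4, "Asia", 1, "Continent"),
    (5, "Australasia", 1, "Continent"),
    (6, "Europe", 1, "Continent"),
    (7, "North America", 1, "Continent"),
    (8, "South America", 1, "Continent"),
    (9, "United States", 7, "Country"),
    (10, "Canada", 7, "Country"),
    (11, "Mexico", 7, "Country"),
    (12, "California", 9, "State"),
    (13, "San Diego", 12, "City"),
    (14, "England", 6, "Country"),
    (15, "Cornwall", 14, "County"),
    (16, "Truro", 15, "City"),
]

def descendants(location_id):
    """Return the names of a location and everything beneath it, ordered by ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Location (LocationID INTEGER PRIMARY KEY, Name TEXT, "
        "ParentLocationID INTEGER, LocationType TEXT)"
    )
    conn.executemany("INSERT INTO Location VALUES (?, ?, ?, ?)", LOCATIONS)
    rows = conn.execute(
        """
        WITH RECURSIVE LocationCTE AS (
            SELECT LocationID, Name, ParentLocationID, LocationType
            FROM Location
            WHERE LocationID = ?
            UNION ALL
            SELECT l.LocationID, l.Name, l.ParentLocationID, l.LocationType
            FROM Location AS l
            JOIN LocationCTE AS c ON c.LocationID = l.ParentLocationID
        )
        SELECT Name FROM LocationCTE ORDER BY LocationID
        """,
        (location_id,),
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]

print(descendants(7))   # North America and everything inside it
print(descendants(11))  # Mexico only
```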
I'll go with your approach and say that I don't find it all that horrible to check, every time, whether the customer is searching by city, by continent, or by nothing. That is the role of the backend code, and it will always lead to different queries depending on which option the customer chooses.
But I would remove "None" and "Global" from the continent table and just use other queries when those options are chosen. You would end up with the three possible SQL queries you have, and I don't find that to be bad design per se. Maybe other solutions perform better, but this one seems more readable and logical. It's just optional querying with join tables.
The other answers trade performance or duplication for readability (which isn't a bad thing, depending on how often you rely on this condition in your application, in how many queries you use it, and how many cities you have).
For readability and non-repetition, the best thing would be to concentrate these conditions in one SQL function which takes a string parameter and returns all locations depending on the input (but at the cost of performance).
Use levels:
0 -> None
00 -> Global
001 -> Europe
002 -> Asia
003 -> Africa
select location from location_table where continent like '[value]%'
Using a fixed length code, you can prefix regions, and then add one more digit for a region inside a region, and so on.
Ok, let me try to improve it.
Consider the world, it has the minimum level (or maximum depending on how you see it)
World ID = '0' (1 digit)
Now, select how you want to divide the world: (Continents, Half-Continents, ...) and assign the next level.
Europe ID = '01' (First digit World + Second digit Europe)
Asia ID = '02'
America ID = '03'
...
Next Level: Countries. (At least 2 digits)
England ID = '0101' (World + Continent + Country)
Deutschland ID = '0102'
....
Texas ID = '0301'
....
Next Level: Regions (2 digits)
Yorkshire ID = '010101' (World + Continent + Country + Region)
....
Next Level: Cities (2 or 3 digits)
London ID = '01010101' (World + Continent + Country + Region + City)
And so on.
Now, the same SELECT some_aggregate, statistics, ... FROM ... can be used for no matter what region, simply change:
WHERE Region like '0%' --> The whole world
WHERE Region like '02%' --> Asia
WHERE Region like '01010101%' --> London
WHERE Region like '02%' OR Region like '01%' --> Asia & Europe
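A small sketch of the prefix-code scheme above, run against an in-memory SQLite table. The table name region and its columns are assumptions for illustration; the codes are the ones listed in the answer. Selecting several regions at once combines the LIKE conditions with OR, since no single code can start with two different prefixes.

```python
import sqlite3

# Codes as listed in the answer above.
REGIONS = [
    ("0", "World"),
    ("01", "Europe"),
    ("02", "Asia"),
    ("03", "America"),
    ("0101", "England"),
    ("0102", "Deutschland"),
    ("0301", "Texas"),
    ("010101", "Yorkshire"),
    ("01010101", "London"),
]

def regions_under(*prefixes):
    """Return the names of all regions whose code starts with any given prefix."""
    if not prefixes:
        raise ValueError("at least one prefix is required")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE region (code TEXT PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO region VALUES (?, ?)", REGIONS)
    # One "code LIKE <prefix>%" condition per prefix, joined with OR.
    where = " OR ".join("code LIKE ? || '%'" for _ in prefixes)
    rows = conn.execute(
        f"SELECT name FROM region WHERE {where} ORDER BY code", prefixes
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]

print(regions_under("0"))         # the whole world
print(regions_under("02"))        # Asia
print(regions_under("02", "03"))  # Asia and America together
```

The sketch relies on the codes containing only digits; a code containing % or _ would need escaping before being used in a LIKE pattern.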

Sqlzoo SELECT within SELECT Tutorial #5

My Question is:
Germany (population 80 million) has the largest population of the
countries in Europe. Austria (population 8.5 million) has 11% of the
population of Germany.
Show the name and the population of each country in Europe. Show the
population as a percentage of the population of Germany.
My answer:
SELECT name,CONCAT(ROUND(population/80000000,-2),'%')
FROM world
WHERE population = (SELECT population
FROM world
WHERE continent='Europe')
What am I doing wrong?
Thanks.
The question was incomplete and was taken from here
This is the answer
SELECT
name,
CONCAT(ROUND((population*100)/(SELECT population
FROM world WHERE name='Germany'), 0), '%')
FROM world
WHERE population IN (SELECT population
FROM world
WHERE continent='Europe')
I was wondering about the subquery, as it wasn't clear from the OP's question (at least to me). The reason for it is that the "world" table (as the name suggests, I have to admit) contains every country in the world, whereas we're interested only in the European ones. Moreover, the population of Germany has to be retrieved from the DB because it's not exactly 80,000,000; if you use that number, you get back 101% as Germany's population.
When using the SQL Server engine in SQL Zoo, don't use CONCAT:
I think SQL Zoo uses a version of SQL Server that doesn't support CONCAT, and it also looks like you have to do a CAST. Instead, concatenate with '+'. Also see this post.
I figure the script should be something like the one below, though I haven't got it to my desired state: I want the result to look like 3%;0%;4%;etc. instead of 3.000000000000000%;0.000000000000000%;4.000000000000000%;etc. (I started a new topic for that one here.)
SELECT
name,
CAST(ROUND(population*100/(SELECT population FROM world WHERE name='Germany'), 0) as varchar(20)) +'%'
FROM world
WHERE population IN (SELECT population
FROM world
WHERE continent='Europe')
select name, CONCAT(ROUND((population/(select population from world where name = "Germany"))*100),"%")
from world
where continent= "Europe"
SELECT name, CONCAT(ROUND(population/(SELECT population FROM world WHERE name = 'Germany')*100,0), '%')
FROM world
WHERE continent = 'Europe'
As of this writing (Aug 16th, 2022), the accepted answer is a percentage without trailing 0s.
Using CONCAT and CAST solves this.
SELECT name, CONCAT(CAST(100*ROUND((population / (SELECT population FROM world WHERE name ='Germany')), 2) AS INT), '%')
FROM world
WHERE continent = 'Europe';
The subquery returns multiple rows, so compare against it with the IN operator instead of =, like this (the percentage expression is left unchanged from the question):
SELECT name,CONCAT(ROUND(population/80000000,-2),'%')
FROM world
WHERE population IN (SELECT population
FROM world
WHERE continent='Europe')
Below added code snippet
select name, concat (round(population/(select population from world where
name='germany')*100,0), '%') from world where continent='Europe'
The following worked for me:
SELECT name, concat(format(ROUND(population / (SELECT population FROM world WHERE name = 'Germany') * 100, 5), '0'), '%') AS percentage
FROM world
WHERE continent = 'Europe';
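The answers above differ mainly in how they round and format the ratio. The sketch below runs one variant in SQLite via Python's sqlite3 module, which is a stand-in and not the engine SQLZoo uses, so the string concatenation is written with || instead of CONCAT. The populations are the rounded figures quoted in the question (80 million and 8.5 million), not the real SQLZoo data, and Exampleland is a hypothetical row added only to show the continent filter.

```python
import sqlite3

WORLD = [
    ("Germany", "Europe", 80000000),    # rounded figure from the question
    ("Austria", "Europe", 8500000),     # rounded figure from the question
    ("Exampleland", "Asia", 50000000),  # hypothetical non-European row
]

def percent_of_germany():
    """Return {country: 'NN%'} for European countries, relative to Germany."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE world (name TEXT, continent TEXT, population INTEGER)")
    conn.executemany("INSERT INTO world VALUES (?, ?, ?)", WORLD)
    rows = conn.execute(
        """
        SELECT name,
               CAST(ROUND(population * 100.0 /
                          (SELECT population FROM world WHERE name = 'Germany'))
                    AS INTEGER) || '%'
        FROM world
        WHERE continent = 'Europe'
        ORDER BY name
        """
    ).fetchall()
    conn.close()
    return dict(rows)

print(percent_of_germany())
```

Multiplying by 100.0 forces floating-point division in SQLite; with integer operands the division would truncate before rounding.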

Select distinct values with count in PostgreSQL

This is a heavily simplified version of an SQL problem I'm dealing with. Let's say I've got a table of all the cities in the world, like this:
country city
------------
Canada Montreal
Cuba Havana
China Beijing
Canada Victoria
China Macau
I want to count how many cities each country has, so that I would end up with a table as such:
country city_count
------------------
Canada 50
Cuba 10
China 200
I know that I can get the distinct country values with SELECT distinct country FROM T1 and I suspect I need to construct a subquery for the city_count column. But my non-SQL brain is just telling me I need to loop through the results...
Thanks!
Assuming each row represents a distinct city:
select country, count(country) AS City_Count
from T1
group by country
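A quick way to try the GROUP BY approach on the five sample rows from the question. SQLite (via Python's sqlite3 module) stands in for PostgreSQL here; the query itself uses only standard SQL, and T1 is the table name mentioned in the question.

```python
import sqlite3

# The sample rows from the question.
CITIES = [
    ("Canada", "Montreal"),
    ("Cuba", "Havana"),
    ("China", "Beijing"),
    ("Canada", "Victoria"),
    ("China", "Macau"),
]

def city_counts():
    """Count cities per country with GROUP BY; no loop over the result is needed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T1 (country TEXT, city TEXT)")
    conn.executemany("INSERT INTO T1 VALUES (?, ?)", CITIES)
    rows = conn.execute(
        """
        SELECT country, COUNT(*) AS city_count
        FROM T1
        GROUP BY country
        ORDER BY country
        """
    ).fetchall()
    conn.close()
    return rows

print(city_counts())
```

If the same city could appear more than once for a country, COUNT(DISTINCT city) would be the safer aggregate.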