SQL Exercise part 2 - sql

I am stuck on another SQL exercise, here is the question:
Show the 20 biggest cities in the United States along with their rank in the state (with respect to their population) and percent of the city population in a state (call it: perc_pop_state).
Here is what I have so far. This produces the table I am looking for, but for some reason the percentages of city population to state population are 0 for all the states with multiple cities and 1 for all states with one city. Can anyone point out what is wrong with my code?
select
    city.name,
    city.population,
    city.district,
    rank() over (partition by district order by city.population desc),
    city.population / sum(city.population) over (partition by district) as perc_pop_state
from
    city
    inner join country on code = countrycode
where
    country.name = 'United States'
order by
    city.population desc

I don't know which database this is for, but most likely it's because the figures are not a decimal data type. They are whole numbers, so when you divide them the result is a whole number (0 or 1) rather than a fraction. So you should use something like this:
CAST(city.population AS DECIMAL(19,4))
/
CAST(sum(city.population) over (partition by district) AS DECIMAL(19,4))
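To see the effect in isolation, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in, since the original database is unknown. The three rows are made up, REAL replaces DECIMAL(19,4) because SQLite has no fixed-point type, and window functions need SQLite 3.25 or later. SQLite also truncates integer division, so it reproduces the 0 and 1 values from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, district TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [("A", "X", 300), ("B", "X", 100), ("C", "Y", 50)],  # made-up rows
)

rows = conn.execute("""
    SELECT name,
           -- integer / integer: the fraction is truncated
           population / SUM(population) OVER (PARTITION BY district) AS int_share,
           -- casting one operand forces a fractional division
           CAST(population AS REAL)
               / SUM(population) OVER (PARTITION BY district)      AS real_share
    FROM city
    ORDER BY name
""").fetchall()

for row in rows:
    print(row)
```

Casting only one operand is enough here; casting both, as in the answer, is equally valid.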

Related

using result of SELECT in another SELECT

I'm trying to do question 9 here:
https://sqlzoo.net/wiki/SELECT_within_SELECT_Tutorial
I currently have the code:
SELECT continent, SUM(y.population) as Population
FROM world AS y
GROUP BY(y.continent)
HAVING SUM(y.population) < 250000000;
This returns the continents with a sum of their respective populations less than 250000000. I know I need to encase this in another select to make use of the continent returned, but don't know how to do this?
I tried something like this:
SELECT A.continent from world A
INNER JOIN(
SELECT B.continent, SUM(B.population) as Population
FROM world B
GROUP BY(B.continent)
HAVING SUM(B.population) < 250000000
) ON A.continent = B.continent;
^This was to try and get a single list of the continents which I could then encase in another select to iterate through and print the country names, although I feel there must be a way to directly iterate through the continent column from the first example?
This is likely something pretty trivial, but regardless any help would be great.
There are multiple ways to solve this. I compared the count of all countries on the continent with the count of countries whose population is <= 25000000.
Your main mistake is in the logic: SUM totals the whole continent, but the condition has to hold for EACH country, not for the sum.
select name,w1.continent,population
from world w1
join
(
SELECT distinct continent, count(name) cnt
FROM world x
WHERE population<=25000000
group by continent
)w2
on w1.continent=w2.continent
where cnt=(select count(name) from world where w1.continent=continent)

Nested Select in SQL

I'm trying the 5th question in the Nested Select of SQL zoo (using Oracle engine) http://sqlzoo.net/wiki/SELECT_within_SELECT_Tutorial
Show the name and the population of each country in Europe. Show the
population as a percentage of the population of Germany.
I know the correct answer (given below), but something puzzled me.
SELECT name, CONCAT(ROUND(population/(SELECT population
FROM world WHERE name = 'Germany'),2)*100,'%')
FROM world WHERE continent = 'Europe'
When I run the following modified query, only one row (Albania) is returned.
SELECT name, population/(SELECT population
FROM world WHERE name = 'Germany')
FROM world WHERE continent = 'Europe'
Wondering if anyone can shed light on the inner workings of Oracle as to why only Albania is returned? It's puzzling to me why it doesn't work without ROUND().
The correct answer is actually:
SELECT name,
CONCAT(ROUND(population/(SELECT population FROM world WHERE name = 'Germany')*100,0),'%')
FROM world
WHERE continent = 'Europe'
Note that the result set is rounded to zero decimal places.
That said, I tried your exact code above and still get results for all countries:
SELECT name,
population/(SELECT population FROM world WHERE name = 'Germany')
FROM world
WHERE continent = 'Europe'
You're right to be puzzled as it should certainly return regardless of using ROUND() or not, but since I cannot recreate it, I can't explain it.
A more efficient Oracle query (that gets rid of the sub-query) is to use an analytic function:
SELECT name,
       ROUND(
         population
         / MAX( CASE name WHEN 'Germany' THEN population END ) OVER ()
         * 100
       ) || '%'
FROM world
WHERE continent = 'Europe';
However, sqlzoo appears to use MariaDB so you can't put that query into the website (but if you recreate the table in Oracle then you can test it).
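If you don't have an Oracle instance at hand, the same analytic-function idea can be tried as a rough sketch in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The table contents below are made-up round numbers, not real populations. Note that the window is evaluated after the WHERE clause, so the filter on Europe only works because Germany itself is a European row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [
        ("Germany", "Europe", 80_000_000),  # made-up round numbers
        ("France", "Europe", 60_000_000),
        ("Albania", "Europe", 3_000_000),
        ("Japan", "Asia", 120_000_000),
    ],
)

rows = conn.execute("""
    SELECT name,
           -- multiply by 100.0 first: SQLite would truncate integer division
           CAST(ROUND(population * 100.0
                      / MAX(CASE name WHEN 'Germany' THEN population END) OVER ())
                AS INTEGER) AS pct
    FROM world
    WHERE continent = 'Europe'
    ORDER BY name
""").fetchall()

for name, pct in rows:
    print(f"{name}: {pct}%")
```

The CASE expression is NULL for every row except Germany, and MAX ignores NULLs, so every row sees Germany's population without a sub-query.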

having clause without aggregate in select

Similar to this question: NOT IN vs IN Do Not Return Complimentary Results
Basically I am trying to answer this question: Find each country that belongs to a continent where all populations are less than 25000000. Show name, continent and population.
Number 7 here has all the table details: http://sqlzoo.net/wiki/SELECT_within_SELECT_Tutorial
This query works: it effectively takes all countries belonging to a continent that has no country with a population of 25 million or more.
SELECT name, continent, population
FROM world x
WHERE continent NOT IN(SELECT DISTINCT continent FROM world
WHERE population >= 25000000)
This query does not work. I am trying to use having without having an aggregate function in the select statement. Is this allowed? Currently my subquery returns no results, so I am obviously mistaken somewhere.
SELECT name, continent, population
FROM world x
WHERE continent in (SELECT continent FROM world
having max(population) < 25000000)
Figured it out, thank you @Michael Berkowsiki for the hint in the comment above.
SELECT name, continent, population
FROM world x
WHERE continent in (SELECT continent FROM world
group by continent
having max(population) < 25000000)
"Each column referenced in the SELECT statement must be referenced in the GROUP BY clause, unless the column is an argument for an aggregate function included in the SELECT clause." ("Modern Database Management" 10th Edition, Jefferey A. Hoffer, Page 276)
So in this case, I believe your answer needs to be modified as:
SELECT name, continent, population
FROM world x
WHERE continent in (SELECT continent FROM world
group by name, continent, population
having max(population) < 25000000)
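It may help to run both GROUP BY variants side by side before choosing one. The sketch below uses Python's sqlite3 module with four invented countries, and the helper name countries is hypothetical. Since the subquery selects only continent, grouping by continent alone already satisfies the quoted rule. Grouping by name, continent, population puts each country in its own group, so MAX(population) is just that country's population.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [
        ("Smallia", "Oceania", 5_000_000),    # invented countries
        ("Tinyland", "Oceania", 1_000_000),
        ("Bigland", "Asia", 900_000_000),
        ("Littlestan", "Asia", 2_000_000),
    ],
)

def countries(group_by):
    """Run the answer's query with the given GROUP BY column list."""
    sql = f"""
        SELECT name FROM world
        WHERE continent IN (SELECT continent FROM world
                            GROUP BY {group_by}
                            HAVING MAX(population) < 25000000)
        ORDER BY name
    """
    return [r[0] for r in conn.execute(sql)]

by_continent = countries("continent")
by_all_columns = countries("name, continent, population")

print(by_continent)     # only countries on continents where every country is small
print(by_all_columns)   # also includes continents with just one small country
```

With this data, the second variant also returns Bigland, because Littlestan alone is enough to make Asia pass the HAVING test.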

How do I use the MAX function over three tables?

So, I have a problem with a SQL Query.
It's about getting weather data for German cities. I have 4 tables: staedte (the cities, with primary key loc_id), gehoert_zu (contains the city key and the key of the weather station that is closest to this city (stations_id)), wettermessung (contains all the weather information and the station's key value) and wetterstation (contains the station's key and location). I'm using PostgreSQL.
Here is what the tables look like:
wetterstation
s_id[PK]  standort  lon    lat  hoehe
----------------------------------------
10224     Bremen    53.05  8.8  4
wettermessung
stations_id[PK]  datum[PK]  max_temp_2m  ......
----------------------------------------------------
10224            2013-3-24  -0.4
staedte
loc_id[PK]  name  lat   lon
-------------------------------
15          Asch  48.4  9.8
gehoert_zu
loc_id[PK]  stations_id[PK]
-----------------------------
15          10224
What I'm trying to do is to get the name of the city with (for example) the highest temperature in a specified period (could be a whole month, or a day). Since the weather data is bound to a station, I actually need to get the station's ID and then just choose one of the cities corresponding to this station. A possible question would be: "In which city was it hottest in June?" and, say, the highest measured temperature was at station number 10224. As a result I want to get the city Asch. What I've got so far is this:
SELECT name, MAX (max_temp_2m)
FROM wettermessung, staedte, gehoert_zu
WHERE wettermessung.stations_id = gehoert_zu.stations_id
AND gehoert_zu.loc_id = staedte.loc_id
AND wettermessung.datum BETWEEN '2012-8-1' AND '2012-12-1'
GROUP BY name
ORDER BY MAX (max_temp_2m) DESC
LIMIT 1
There are two problems with the results:
1) it's taking waaaay too long. The tables are not that big (cities has about 70k entries), but it needs between 1 and 7 minutes to get things done (depending on the time span)
2) it ALWAYS produces the same city and I'm pretty sure it's not the right one either.
I hope I managed to explain my problem clearly enough and I'd be happy for any kind of help. Thanks in advance ! :D
If you want to get the max temperature per city use this statement:
SELECT * FROM (
SELECT gz.loc_id, MAX(max_temp_2m) as temperature
FROM wettermessung as wm
INNER JOIN gehoert_zu as gz
ON wm.stations_id = gz.stations_id
WHERE wm.datum BETWEEN '2012-8-1' AND '2012-12-1'
GROUP BY gz.loc_id) as subselect
INNER JOIN staedte as std
ON std.loc_id = subselect.loc_id
ORDER BY subselect.temperature DESC
Use this statement to get the city with the highest temperature (only 1 city):
SELECT * FROM(
SELECT name, MAX(max_temp_2m) as temp
FROM wettermessung as wm
INNER JOIN gehoert_zu as gz
ON wm.stations_id = gz.stations_id
INNER JOIN staedte as std
ON gz.loc_id = std.loc_id
WHERE wm.datum BETWEEN '2012-8-1' AND '2012-12-1'
GROUP BY name
ORDER BY MAX(max_temp_2m) DESC
LIMIT 1) as subselect
ORDER BY temp desc
LIMIT 1
For performance reasons, always use explicit joins such as LEFT, RIGHT or INNER JOIN and avoid comma-separated table lists, so your SQL server does not have to guess your table references.
This is a general example of how to get the item with the highest, lowest, biggest, smallest, whatever value. You can adjust it to your particular situation.
select fred, barney, wilma
from bedrock join
(select fred, max(dino) maxdino
from bedrock
where whatever
group by fred ) flinstone on bedrock.fred = flinstone.fred
where dino = maxdino
and other conditions
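Here is the same pattern with concrete names, as a small sketch run through Python's sqlite3 module. The reading table and its rows are invented for illustration: the derived table computes the maximum per group, and the join back keeps only the rows that hit that maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (station INTEGER, day TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO reading VALUES (?, ?, ?)",
    [
        (1, "2012-08-01", 30.5),  # invented measurements
        (1, "2012-08-02", 28.0),
        (2, "2012-08-01", 25.0),
        (2, "2012-08-02", 33.0),
    ],
)

hottest = conn.execute("""
    SELECT r.station, r.day, r.temp
    FROM reading r
    JOIN (SELECT station, MAX(temp) AS max_temp   -- one row per station
          FROM reading
          GROUP BY station) m
      ON r.station = m.station
    WHERE r.temp = m.max_temp                     -- keep only the rows at the maximum
    ORDER BY r.station
""").fetchall()

for row in hottest:
    print(row)
```

If two rows in a group tie on the maximum, both come back, so add a tie breaker if exactly one row per group is needed.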
I propose you use a consistent naming convention. Singular terms for tables holding a single item per row is a good convention. Your only table breaking this is staedte. It should be stadt.
And I suggest using station_id consistently instead of s_id or stations_id.
Building on these premises, for your question:
... get the name of the city with the ... highest temperature at a specified date
SELECT s.name, w.max_temp_2m
FROM (
SELECT station_id, max_temp_2m
FROM wettermessung
WHERE datum >= '2012-8-1'::date
AND datum < '2012-12-1'::date -- exclude upper border
ORDER BY max_temp_2m DESC, station_id -- id as tie breaker
LIMIT 1
) w
JOIN gehoert_zu g USING (station_id) -- assuming normalized names
JOIN stadt s USING (loc_id)
Use explicit JOIN conditions for better readability and maintenance.
Use table aliases to simplify your query.
Use x >= a AND x < b to include the lower border and exclude the upper border, which is the common use case.
Aggregate first and pick your station with the highest temperature, before you join to the other tables to retrieve the city name. Much simpler and faster.
You did not specify what to do when multiple "wettermessungen" tie on max_temp_2m in the given time frame. I added station_id as tiebreaker, meaning the station with the lowest id will be picked consistently if there are multiple qualifying stations.
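The query can be tried on a toy data set, sketched here with Python's sqlite3 module instead of PostgreSQL. The rows are invented, the tables use the normalized names assumed above (stadt, station_id), and the dates are zero-padded ISO strings because SQLite compares them as text and has no ::date cast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wettermessung (station_id INTEGER, datum TEXT, max_temp_2m REAL);
    CREATE TABLE gehoert_zu (loc_id INTEGER, station_id INTEGER);
    CREATE TABLE stadt (loc_id INTEGER, name TEXT);
    -- invented rows; the 40.0 reading sits exactly on the excluded upper border
    INSERT INTO wettermessung VALUES
        (10224, '2012-08-15', 31.2),
        (10224, '2012-12-01', 40.0),
        (10300, '2012-09-01', 29.0);
    INSERT INTO gehoert_zu VALUES (15, 10224), (16, 10300);
    INSERT INTO stadt VALUES (15, 'Asch'), (16, 'Bremen');
""")

result = conn.execute("""
    SELECT s.name, w.max_temp_2m
    FROM (
        SELECT station_id, max_temp_2m
        FROM wettermessung
        WHERE datum >= '2012-08-01'
          AND datum <  '2012-12-01'               -- exclude upper border
        ORDER BY max_temp_2m DESC, station_id     -- id as tie breaker
        LIMIT 1
    ) w
    JOIN gehoert_zu g USING (station_id)
    JOIN stadt s USING (loc_id)
""").fetchall()

print(result)
```

In this toy data each station belongs to one city. If several cities share the winning station, the query returns one row per city, so an outer LIMIT 1 would be needed to pick just one.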

How do I eliminate duplicate city names while adding their total count in a SQL query?

SELECT city, COUNT(pNo) Total
FROM Zip z JOIN Property p ON (z.zipcode = p.zipcode)
WHERE state = 'AL' AND rent <= 500
GROUP BY city, p.zipcode HAVING COUNT(pNo) >= 15
ORDER BY Total DESC, city;
Above is my code. My goal is to not have multiple listings of the same city, but instead have each city display once and, if the city has duplicates, add their totals together. I have tried the DISTINCT clause, but it only eliminates the duplicates without doing any adding. I have tried sticking SUM in the code, too, but I can't quite put my finger on where it should go. Any suggestions?
The problem is you're grouping by zip code, thus creating duplicate city entries (presumably with different counts).
If you want just distinct cities, remove p.zipcode from your GROUP BY and you should be good to go.
Good luck.
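A small sketch of the difference, run through Python's sqlite3 module on invented rows. The helper name totals is hypothetical, and the HAVING threshold is lowered from 15 to 2 so that a handful of rows is enough. One thing to watch: once p.zipcode is gone from the GROUP BY, the HAVING condition applies to the city-wide total rather than to each zip code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Zip (zipcode TEXT, city TEXT, state TEXT);
    CREATE TABLE Property (pNo INTEGER, zipcode TEXT, rent INTEGER);
    -- invented rows: one city spans two zip codes
    INSERT INTO Zip VALUES
        ('35001', 'Birmingham', 'AL'),
        ('35002', 'Birmingham', 'AL'),
        ('36001', 'Mobile', 'AL');
    INSERT INTO Property VALUES
        (1, '35001', 400), (2, '35001', 450), (3, '35002', 300),
        (4, '36001', 500), (5, '36001', 900);
""")

def totals(group_by):
    """Run the question's query with the given GROUP BY column list."""
    sql = f"""
        SELECT city, COUNT(pNo) AS Total
        FROM Zip z JOIN Property p ON z.zipcode = p.zipcode
        WHERE state = 'AL' AND rent <= 500
        GROUP BY {group_by} HAVING COUNT(pNo) >= 2   -- 15 in the question
        ORDER BY Total DESC, city
    """
    return conn.execute(sql).fetchall()

per_zip = totals("city, p.zipcode")   # original grouping
per_city = totals("city")             # grouping suggested above

print(per_zip)
print(per_city)
```

With the original grouping, Birmingham's second zip code falls below the threshold and its property is dropped. Grouped by city alone, all three Birmingham properties are counted together.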