SQL: Redundant WHERE clause specifying column is > 0?

Help me understand this: In the sqlzoo tutorial for question 3a ("Find the largest country in each region"), why does attaching 'AND population > 0' to the nested SELECT statement make this correct?

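The reason is that
AND population > 0
...filters out the row with a NULL population (region "Europe", name "Vatican"), which otherwise breaks the:
WHERE population >= ALL (SELECT population
FROM ...)
...because NULL isn't a value. Comparing anything with NULL yields UNKNOWN, not TRUE or FALSE, so Russia won't be picked as the largest country in Europe. The ALL operator requires the value you are comparing to be greater than or equal to every value returned by the subquery, and that can never evaluate to TRUE when there's a NULL among them. The result is UNKNOWN, and WHERE discards the row. The extra filter works because NULL > 0 is also UNKNOWN, so the NULL row never reaches the ALL comparison.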
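To make the NULL point concrete, here is a minimal Python sketch of SQL's three-valued logic for `value op ALL (subquery)`. It is a toy model, not how any database engine is implemented. None stands in for NULL/UNKNOWN, the helpers `compare` and `quantified_all` are hypothetical names, and the population figures are invented.

```python
import operator
from typing import Callable, Iterable, Optional

# Toy model of SQL three-valued logic: None plays the role of NULL / UNKNOWN.
Comparison = Callable[[int, int], bool]


def compare(a: Optional[int], op: Comparison, b: Optional[int]) -> Optional[bool]:
    """One SQL comparison: if either operand is NULL, the result is UNKNOWN."""
    if a is None or b is None:
        return None
    return op(a, b)


def quantified_all(value: Optional[int], op: Comparison,
                   rows: Iterable[Optional[int]]) -> Optional[bool]:
    """value op ALL (rows): FALSE if any comparison is FALSE, UNKNOWN if none is
    FALSE but some are UNKNOWN, TRUE otherwise (including an empty subquery)."""
    saw_unknown = False
    for row in rows:
        result = compare(value, op, row)
        if result is False:
            return False
        if result is None:
            saw_unknown = True
    return None if saw_unknown else True


# Invented populations (millions) for the Europe region; None is Vatican's NULL.
europe = [140, 80, None]
russia = 140

# population >= ALL (SELECT population ...) gives UNKNOWN, so WHERE drops Russia.
print(quantified_all(russia, operator.ge, europe))    # None

# "AND population > 0" removes the NULL row, because NULL > 0 is not TRUE.
filtered = [p for p in europe if p is not None and p > 0]
print(quantified_all(russia, operator.ge, filtered))  # True
```

WHERE keeps a row only when its condition is TRUE, so the UNKNOWN from the first call removes Russia just as FALSE would.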
My query would've been either of the following (MAX() ignores NULLs, so neither needs the population > 0 filter):
SELECT region, name, population
FROM bbc x
WHERE population = (SELECT MAX(population)
                    FROM bbc y
                    WHERE y.region = x.region)
...or, using a JOIN:
SELECT x.region, x.name, x.population
FROM bbc x
JOIN (SELECT y.region,
             MAX(y.population) AS max_pop
      FROM bbc y
      GROUP BY y.region) z ON z.region = x.region
                          AND z.max_pop = x.population
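Both MAX()-based queries can be tried with Python's built-in sqlite3 module. This is only a sketch: the bbc table is a tiny invented stand-in (populations in arbitrary units), and SQLite is used for convenience. SQLite has no ALL operator, so only the MAX() versions are run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bbc (name TEXT, region TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO bbc VALUES (?, ?, ?)",
    [
        ("Russia", "Europe", 140),
        ("Germany", "Europe", 80),
        ("Vatican", "Europe", None),   # the NULL row that trips up >= ALL
        ("China", "Asia", 1300),
        ("India", "Asia", 1100),
    ],
)

# Correlated subquery: MAX() skips NULLs, so no "population > 0" guard is needed.
correlated = conn.execute("""
    SELECT region, name, population
    FROM bbc x
    WHERE population = (SELECT MAX(population)
                        FROM bbc y
                        WHERE y.region = x.region)
    ORDER BY region
""").fetchall()

# The JOIN version against per-region maxima.
joined = conn.execute("""
    SELECT x.region, x.name, x.population
    FROM bbc x
    JOIN (SELECT y.region, MAX(y.population) AS max_pop
          FROM bbc y
          GROUP BY y.region) z ON z.region = x.region
                              AND z.max_pop = x.population
    ORDER BY x.region
""").fetchall()

print(correlated)            # [('Asia', 'China', 1300), ('Europe', 'Russia', 140)]
print(correlated == joined)  # True
```

The Vatican row drops out without any extra filter: NULL = 140 is UNKNOWN, and MAX() only returns NULL for a region whose populations are all NULL.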

No, it doesn't. The largest country has, a priori, a nonzero population.
It's like checking whether the largest book has any pages in it.

Related

Why does this code work with a less than sign when the question requests for a greater than sign (nested selects from SQL zoo)?

I'm trying to learn SQL from the following website:
https://sqlzoo.net/wiki/Nested_SELECT_Quiz
In this quiz, the second question asks to find the countries belonging to regions where all populations are over 50000.
It says that this is the correct answer:
SELECT name,region,population FROM bbc x WHERE 50000 < ALL (SELECT population FROM bbc y WHERE x.region=y.region AND y.population>0)
Can anyone explain in plain English why this works? If we're looking for populations over 50,000, why does the code use a less-than sign? And how do nested selects work in general then?
Operand order matters: 50000 < population is the same as population > 50000.
Why write it this funny way? Because you have to.
Specifically, ALL is a quantified comparison predicate, and it must be of the form <value> <operator> ALL (<subquery>). So you write 50000 < ALL (subquery). I can't say why it cannot be reversed; possibly it makes parsing this special case easier.
And how do nested selects work in general then?
ALL is true if the comparison holds for every row the subquery returns. 50000 < ALL (subquery) means that 50,000 is less than every value in the subquery; in other words, every value in the subquery is over 50000.
SELECT name,region,population
FROM bbc x
WHERE 50000 < ALL (
SELECT population
FROM bbc y
WHERE x.region=y.region AND y.population>0
)
Conceptually, the subquery runs once for each row of the outer bbc. x is the alias for bbc in the outer query and y is the alias for bbc in the subquery. WHERE x.region=y.region restricts the subquery to rows in the same region as the current outer row, and y.population>0 drops NULL (and zero) populations.
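The per-row behaviour can be mimicked in plain Python. This is a conceptual sketch, not how a database executes the query, since optimizers are free to rewrite correlated subqueries. The rows are invented (name, region, population) tuples, and None stands for NULL.

```python
# Invented (name, region, population) rows; None stands for a NULL population.
rows = [
    ("A", "North", 60_000),
    ("B", "North", 70_000),
    ("C", "South", 90_000),
    ("D", "South", 40_000),
    ("E", "North", None),
]

result = []
for x_name, x_region, x_pop in rows:  # outer query: FROM bbc x
    # Inner query, re-evaluated for this x row:
    #   SELECT population FROM bbc y WHERE x.region = y.region AND y.population > 0
    subquery = [y_pop for _, y_region, y_pop in rows
                if y_region == x_region and y_pop is not None and y_pop > 0]
    # 50000 < ALL (subquery). The filter above removed NULLs, so Python's
    # two-valued all() is enough; all([]) is True, like ALL over no rows.
    if all(50_000 < p for p in subquery):
        result.append((x_name, x_region, x_pop))

print(result)  # [('A', 'North', 60000), ('B', 'North', 70000), ('E', 'North', None)]
```

Row E is returned even though its own population is NULL. The comparison is between 50000 and the populations the subquery finds in E's region, not E's own value.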

Question about #1 and #2 (ALL) in the Nested SELECT Quiz from sqlzoo.net

I'm a little confused by the use of ALL in the Nested SELECT quiz from sqlzoo (link here:)
Q1: Select the code that shows the name, region and population of the smallest country in each region
SELECT region, name, population FROM bbc x WHERE population <= ALL (SELECT population
FROM bbc y WHERE y.region=x.region AND population>0)
I thought this made sense, because we're looking for the population that is less than or equal to every population in the region (queried with the inner subquery first), which is the smallest country in each region.
But then, Q2 comes along: Select the code that shows the countries belonging to regions with all populations over 50000
And then code for that is:
SELECT name,region,population FROM bbc x WHERE 50000 < ALL (SELECT population
FROM bbc y WHERE x.region=y.region AND y.population>0)
If we're trying to get the countries with population > 50000, why is the sign not > but < instead?
I feel like I'm missing a basic understanding somewhere, but I'm not even sure where.
It would be more readable and easier to understand if it could be written like this:
WHERE ALL (SELECT population....) > 50000
but this is syntactically wrong.
From https://oracle-base.com/articles/misc/all-any-some-comparison-conditions-in-sql
The ALL comparison condition is used to compare a value to a list or
subquery. It must be preceded by =, !=, >, <, <=, >= and
followed by a list or subquery.
Also from https://learn.microsoft.com/en-us/sql/t-sql/language-elements/all-transact-sql?view=sql-server-2017
the syntax should be:
scalar_expression { = | <> | != | > | >= | !> | < | <= | !< } ALL (subquery)
so you can't avoid having ALL on the right side of the comparison,
but it's all the same, since 50000 must be less than every item in the subquery.
In the first question, population comes first, so when you ask for "the population that is <= (less than or equal to) all of the populations", that is the smallest one.
In the second question, 50000 comes first in the comparison.
"Populations of over 50000" also implies that 50000 < those populations, which is exactly what the query says.

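Because ALL has to sit on the right, reading the condition "the other way round" means swapping the operands and mirroring the operator. A small Python sketch of that idea. The subquery result is invented and the `MIRROR` table is a hypothetical helper, not part of any SQL API.

```python
import operator

# Swapping the two operands of a comparison means switching to the mirrored operator.
MIRROR = {
    operator.lt: operator.gt,
    operator.gt: operator.lt,
    operator.le: operator.ge,
    operator.ge: operator.le,
    operator.eq: operator.eq,
    operator.ne: operator.ne,
}

subquery = [60_000, 70_000, 90_000]  # invented populations, no NULLs

# Not valid SQL, but the natural reading:  ALL (subquery) > 50000
wanted = all(operator.gt(p, 50_000) for p in subquery)
# Valid SQL with the literal on the left:  50000 < ALL (subquery)
written = all(MIRROR[operator.gt](50_000, p) for p in subquery)
print(wanted, written)  # True True

# First question: population <= ALL (subquery) only holds for the smallest value.
smallest = [v for v in subquery if all(v <= p for p in subquery)]
print(smallest)  # [60000]
```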
Code that would show the countries with a greater GDP than any country

I am new to SQL and was trying to solve a question on SQLzoo
Select the code that would show the countries with a greater GDP than any country in Africa (some countries may have NULL gdp values).
My answer to the question was
SELECT name FROM bbc
WHERE gdp > ALL (SELECT gdp
FROM bbc
WHERE region = 'Africa'
AND gdp<>NULL)
But the correct answer on the site is
SELECT name FROM bbc
WHERE gdp > (SELECT MAX(gdp)
FROM bbc
WHERE region = 'Africa')
I don't understand why the answer I selected is wrong.
(Quiz question no. 5)
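The problem with the > ALL (...) solution you selected is how it handles NULL: some of the gdp values are NULL, and the subquery tries to filter them out with gdp<>NULL.
When you compare a non-null value with NULL, the result is NULL (unknown), not true, unless you use a null-safe test such as IS NULL / IS NOT NULL. So gdp<>NULL is never true and the subquery returns no rows at all. In standard SQL, > ALL over an empty subquery is true, so the query no longer compares against the African GDPs. Writing AND gdp IS NOT NULL instead would fix it.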
I also got this question wrong.
I found this explanation elsewhere:
Because the NULL value cannot be equal or unequal to any value, you cannot perform any comparison on it using operators such as '=' or '<>'.
When tested, 50 <> NULL doesn't return the usual boolean 0; it returns NULL.
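The NULL behaviour described in both answers can be checked with Python's built-in sqlite3 module. This sketch assumes SQLite's standard NULL comparison rules match the database used by the quiz; the table contents and country names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Comparing with NULL gives NULL (returned to Python as None), never true or false.
not_equal = conn.execute("SELECT 50 <> NULL").fetchone()[0]
is_null = conn.execute("SELECT NULL IS NULL").fetchone()[0]
print(not_equal, is_null)  # None 1

conn.execute("CREATE TABLE bbc (name TEXT, region TEXT, gdp INTEGER)")
conn.executemany("INSERT INTO bbc VALUES (?, ?, ?)", [
    ("Country A", "Africa", 500),
    ("Country B", "Africa", None),
    ("Country C", "Europe", 2500),
    ("Country D", "Asia", 20),
])

# The filter from the question: gdp <> NULL is never TRUE, so no African row survives.
with_ne_null = conn.execute(
    "SELECT gdp FROM bbc WHERE region = 'Africa' AND gdp <> NULL").fetchall()
print(with_ne_null)  # []

# IS NOT NULL is the null-safe test that drops only the NULL rows.
with_is_not_null = conn.execute(
    "SELECT gdp FROM bbc WHERE region = 'Africa' AND gdp IS NOT NULL").fetchall()
print(with_is_not_null)  # [(500,)]

# The accepted answer: MAX() ignores NULLs, so no extra filter is needed.
richer = conn.execute("""
    SELECT name FROM bbc
    WHERE gdp > (SELECT MAX(gdp) FROM bbc WHERE region = 'Africa')
""").fetchall()
print(richer)  # [('Country C',)]
```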

Difference between using >= and > over a list

SELECT name, area
FROM world
WHERE area > ALL (SELECT area FROM world
                  WHERE continent = 'Europe' AND area IS NOT NULL);
SELECT name, area
FROM world
WHERE area >= ALL (SELECT area FROM world
                   WHERE continent = 'Europe' AND area IS NOT NULL);
What is the difference between these two queries?
They give different results.
2 >= 2 is true.
2 > 2 is false.
Your first query returns all countries in world that are bigger than every country in Europe (among those whose area is set); in other words, all countries bigger than the biggest country in Europe. The second query returns all countries that are bigger than or equal to the biggest country in Europe, so it also includes that biggest European country itself.
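A quick Python illustration of the 2 >= 2 versus 2 > 2 point. The areas are invented, and the `europe` dict stands in for the subquery result after the NULL areas have been filtered out.

```python
# Invented areas; europe plays the role of the subquery (NULL areas already removed).
europe = {"Big": 17_000, "Mid": 640, "Small": 1}
world = {**europe, "Giant": 20_000, "Other": 9_000}

# area > ALL (...): strictly bigger than every European country.
gt_all = [name for name, area in world.items()
          if all(area > e for e in europe.values())]

# area >= ALL (...): the biggest European country also passes, since 17000 >= 17000.
ge_all = [name for name, area in world.items()
          if all(area >= e for e in europe.values())]

print(gt_all)  # ['Giant']
print(ge_all)  # ['Big', 'Giant']
```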

I don't understand the need for self-joins. Can someone please explain them to me?

SELECT region, name, population
FROM bbc x
WHERE population <= ALL (SELECT population FROM bbc y WHERE y.region=x.region AND population>0)
I don't understand the logic of using x and y for the same table.
x and y are taken to be two different instances of the table bbc. To list a table twice in the same query, you must provide a table alias for at least one of its instances. The alias tells the query processor whether a column should come from the outer (x) or the inner (y) copy of the table.
This query returns, for each region, the country with the smallest population. To do this without a self-join, you'd need two queries for each region (written here with MySQL user variables, @region being the region to look up):
1. SET @min = (SELECT MIN(population) FROM bbc WHERE population > 0 AND region = @region);
2. SELECT region, name, population FROM bbc WHERE population = @min AND region = @region;
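Both approaches can be compared with Python's built-in sqlite3 module. This is a sketch under some assumptions. The rows are invented. SQLite has no <= ALL, so the one-query version uses = (SELECT MIN(...)) with the same x/y aliases, which matches here because every non-NULL population is positive. Python parameters stand in for the per-region min and region variables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bbc (name TEXT, region TEXT, population INTEGER)")
conn.executemany("INSERT INTO bbc VALUES (?, ?, ?)", [
    ("P", "Europe", 80),
    ("Q", "Europe", 5),
    ("R", "Europe", None),
    ("S", "Asia", 1300),
    ("T", "Asia", 30),
])

# One query: x and y are two aliases for the same bbc table.
self_join = conn.execute("""
    SELECT region, name, population
    FROM bbc x
    WHERE population = (SELECT MIN(y.population)
                        FROM bbc y
                        WHERE y.region = x.region AND y.population > 0)
    ORDER BY region
""").fetchall()

# Without the self-join: two queries per region, with bound parameters.
two_step = []
regions = conn.execute("SELECT DISTINCT region FROM bbc ORDER BY region").fetchall()
for (region,) in regions:
    (min_pop,) = conn.execute(
        "SELECT MIN(population) FROM bbc WHERE population > 0 AND region = ?",
        (region,)).fetchone()
    two_step += conn.execute(
        "SELECT region, name, population FROM bbc WHERE population = ? AND region = ?",
        (min_pop, region)).fetchall()

print(self_join)              # [('Asia', 'T', 30), ('Europe', 'Q', 5)]
print(self_join == two_step)  # True
```

The self-join version lets the database do the per-region loop itself, instead of the client issuing two queries for every region.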