difference between acting >= and > over a list - sql

SELECT name, area
FROM world
WHERE area > ALL (SELECT area FROM world
                  WHERE continent='Europe' AND area IS NOT NULL);
SELECT name, area
FROM world
WHERE area >= ALL (SELECT area FROM world
                   WHERE continent='Europe' AND area IS NOT NULL);
What is the difference between these two queries?
They give different results.

2 >= 2 is true.
2 > 2 is false.
Your first query returns all countries in world that are bigger than every country in Europe (among those whose area is set). In other words, you get all countries that are bigger than the biggest country in Europe. The second query returns all countries that are bigger than or equal to the biggest country in Europe, so it also includes that biggest European country itself.
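A minimal sketch of that difference, run through Python's sqlite3 module. SQLite has no ALL operator, so the sketch assumes the equivalent MAX(area) form (valid here because the subquery is non-empty and filters out NULLs). The table contents and country names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, area INTEGER)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [
        ("Bigland", "Asia", 900),      # hypothetical rows, not real data
        ("Tieland", "Asia", 500),
        ("Euromax", "Europe", 500),    # biggest European country
        ("Eurosmall", "Europe", 10),
        ("Euronull", "Europe", None),  # area not set
    ],
)


def compared_to_biggest_in_europe(op):
    """Run the query with '>' or '>=' against the biggest European area."""
    sql = f"""
        SELECT name FROM world
        WHERE area {op} (SELECT MAX(area) FROM world
                         WHERE continent = 'Europe' AND area IS NOT NULL)
        ORDER BY name
    """
    return [row[0] for row in conn.execute(sql)]


strict = compared_to_biggest_in_europe(">")
inclusive = compared_to_biggest_in_europe(">=")
print(strict)     # ['Bigland']
print(inclusive)  # ['Bigland', 'Euromax', 'Tieland']
```

With >=, the biggest European country and anything tied with it show up as well; with > they do not.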

Related

Question about #1 and #2 (ALL) in the Nested SELECT Quiz from sqlzoo.net

I'm a little confused by the use of ALL in the Nested SELECT quiz from sqlzoo.
Q1: Select the code that shows the name, region and population of the smallest country in each region
SELECT region, name, population FROM bbc x WHERE population <= ALL (SELECT
population FROM bbc y WHERE y.region=x.region AND population>0)
I thought this made sense to me, because we're trying to get the country whose population is less than or equal to every population in its region (which the inner subquery supplies).
But then, Q2 comes along: Select the code that shows the countries belonging to regions with all populations over 50000
And then code for that is:
SELECT name,region,population FROM bbc x WHERE 50000 < ALL (SELECT population
FROM bbc y WHERE x.region=y.region AND y.population>0)
If we're trying to get the countries with population > 50000, why is the sign not > but < instead?
I feel like I'm missing a basic understanding somewhere, but I'm not even sure where.
It would be more readable and easier to understand if it could be written like this:
WHERE ALL (SELECT population....) > 50000
but this is syntactically wrong.
From https://oracle-base.com/articles/misc/all-any-some-comparison-conditions-in-sql
The ALL comparison condition is used to compare a value to a list or
subquery. It must be preceded by =, !=, >, <, <=, >= and
followed by a list or subquery.
Also from https://learn.microsoft.com/en-us/sql/t-sql/language-elements/all-transact-sql?view=sql-server-2017
the syntax should be:
scalar_expression { = | <> | != | > | >= | !> | < | <= | !< } ALL (subquery)
so you can't avoid having the ALL clause at the right side of the comparison,
but it's all the same since 50000 must be less than every item in the subquery.
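To check that the side of the comparison really does not matter, here is a small sketch using Python's sqlite3 module. SQLite does not implement ALL, so it assumes the MIN(population) equivalent: "50000 < every population in the region" is the same test as "50000 < the smallest population in the region". The rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bbc (name TEXT, region TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO bbc VALUES (?, ?, ?)",
    [
        ("A1", "North", 60000),   # hypothetical rows
        ("A2", "North", 900000),
        ("B1", "South", 40000),   # this one keeps South out
        ("B2", "South", 700000),
    ],
)

# 50000 on the left, as in the quiz answer
left_form = """
    SELECT name FROM bbc x
    WHERE 50000 < (SELECT MIN(y.population) FROM bbc y
                   WHERE y.region = x.region AND y.population > 0)
    ORDER BY name
"""
# Same condition with the subquery on the left and the sign flipped
right_form = """
    SELECT name FROM bbc x
    WHERE (SELECT MIN(y.population) FROM bbc y
           WHERE y.region = x.region AND y.population > 0) > 50000
    ORDER BY name
"""

left_result = [r[0] for r in conn.execute(left_form)]
right_result = [r[0] for r in conn.execute(right_form)]
print(left_result)   # ['A1', 'A2']
print(right_result)  # ['A1', 'A2']
```

Only the North countries come back, because South contains a country at or below 50000. With ALL itself the subquery has to stay on the right, which is why the quiz writes 50000 < ALL (...).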
For the first question, population comes first, so when you ask for "the population that is <= (less than or equal to) all of the populations", it implies it's the smallest one.
In the second question, 50000 comes first in the comparison.
"Populations of over 50000" also implies that 50000 < those populations, which is exactly what it says in the query.

Code that would show the countries with a greater GDP than any country

I am new to SQL and was trying to solve a question on SQLzoo
Select the code that would show the countries with a greater GDP than any country in Africa (some countries may have NULL gdp values).
My answer to the question was
SELECT name FROM bbc
WHERE gdp > ALL (SELECT gdp
FROM bbc
WHERE region = 'Africa'
AND gdp<>NULL)
But the correct answer on the site is
SELECT name FROM bbc
WHERE gdp > (SELECT MAX(gdp)
FROM bbc
WHERE region = 'Africa')
I am not getting why the answer selected by me is wrong
Quiz Question No 5
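The problem with the > ALL (...) solution you selected is the fact that some of the gdp values are NULL.
When you compare a non-NULL value with NULL, the result is NULL, unless you use a NULL-safe operator such as IS NULL. That also applies to the gdp<>NULL filter in your subquery: it never evaluates to true, so the test you need there is gdp IS NOT NULL.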
I also had the same question wrong.
Got this from somewhere
Because the NULL value cannot be equal or unequal to any value, you cannot perform any comparison on this value by using operators such as '=' or '<>'.
On testing, 50 <> null doesn't give the usual boolean 0 but gives null
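A quick way to see this for yourself, sketched with Python's sqlite3 module (an assumption on my part; sqlzoo runs a different engine, but the NULL rules shown here are standard SQL). The table rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Comparing with NULL yields NULL (None in Python), not 0 or 1
neq_null = conn.execute("SELECT 50 <> NULL").fetchone()[0]
# IS NOT NULL is the NULL-safe test and yields a real boolean
is_not_null = conn.execute("SELECT 50 IS NOT NULL").fetchone()[0]

conn.execute("CREATE TABLE bbc (name TEXT, region TEXT, gdp INTEGER)")
conn.executemany(
    "INSERT INTO bbc VALUES (?, ?, ?)",
    [
        ("Afri1", "Africa", 100),   # hypothetical rows
        ("Afri2", "Africa", None),
        ("Euro1", "Europe", 300),
    ],
)

# The filter from the question: no row ever passes it
kept_by_neq = conn.execute(
    "SELECT COUNT(*) FROM bbc WHERE region = 'Africa' AND gdp <> NULL"
).fetchone()[0]
# The NULL-safe filter keeps the one African row with a known gdp
kept_by_is_not_null = conn.execute(
    "SELECT COUNT(*) FROM bbc WHERE region = 'Africa' AND gdp IS NOT NULL"
).fetchone()[0]

print(neq_null, is_not_null)             # None 1
print(kept_by_neq, kept_by_is_not_null)  # 0 1
```

So a subquery filtered with gdp<>NULL comes back empty, and a > ALL comparison against an empty subquery is true for every row, which is not what the question asks for.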

I don't understand the need for self-joins. Can someone please explain them to me?

SELECT region, name, population
FROM bbc x
WHERE population <= ALL (SELECT population FROM bbc y WHERE y.region=x.region AND population>0)
I don't understand the logic of using x and y for the same table.
x and y are two different instances of the table bbc. To list a table twice in the same query, you must provide a table alias for at least one instance of the table name. The alias tells the query processor which instance of the table each column reference belongs to.
This query returns the countries with the smallest population in each region. To do this without the self-join, you would need to run two queries for each region (written here with MySQL user variables, with @region holding the region name):
1.
SET @min = (SELECT MIN(population) FROM bbc WHERE population > 0 AND region = @region);
2.
SELECT region, name, population FROM bbc WHERE population = @min AND region = @region;

NOT IN vs IN Do Not Return Complementary Results

Hi I am working through example #7 from the sql zoo tutorial: SELECT within SELECT. In the following question
"Find each country that belongs to a continent where all populations are less than 25000000. Show name, continent and population."
I get the right answer by using NOT IN and a sub query like this:
SELECT name, continent, population FROM world
WHERE continent NOT IN (
SELECT continent FROM world
WHERE population > 25000000)
If, on the other hand, I use "IN" instead of "NOT IN" with "population < 25000000", I do not get the right answer, and I cannot understand why. There is probably a simple reason that I just don't see. Can anyone explain it to me?
If I'm reading this correctly, the question asks to list every country in a continent where every country has a population below 25000000, correct?
If yes, look at your sub query:
SELECT continent FROM world
WHERE population > 25000000
You are pulling every continent that has at least one country w/ population over 25000000, so excluding those is why it works.
Example: Continent Alpha has 5 countries, four of them are small, but one of them, country Charlie has a population of 50000000.
So your sub query will return Continent Alpha because country Charlie fit the constraint of population > 25000000. This sub query will find everything that you don't want, that's why using the not in will work.
On the other hand:
SELECT continent FROM world
WHERE population < 25000000
If ANY country is below 25000000, it will display the continent, which is not what you want, because you want EVERY country to be below.
Example: Continent Alpha from before, with its four small countries. Those four are below 25000000, so Continent Alpha will be returned by your sub query, regardless of the fact that Country Charlie has 50000000.
Obviously, this is not the best way to go about it, but this is why the first query worked, and the second did not.
Because every other continent has at least one country with less than 25 million population. That is what this says:
SELECT name, continent, population FROM world
WHERE continent IN (
SELECT continent FROM world
WHERE population < 25000000)
Translating it into words: From the list of all countries (in table world), please find all countries where the continent has a country that has less than 25 million population.
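The two queries can be compared side by side with a small sketch in Python's sqlite3 module. The continents and countries are invented, following the "Continent Alpha / country Charlie" example used in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [
        ("Charlie", "Alpha", 50000000),  # the one big country in Alpha
        ("A1", "Alpha", 1000),
        ("A2", "Alpha", 2000),
        ("B1", "Beta", 3000),            # Beta has only small countries
        ("B2", "Beta", 4000),
    ],
)

# Excludes every continent that has at least one big country
not_in_sql = """
    SELECT name FROM world
    WHERE continent NOT IN (SELECT continent FROM world
                            WHERE population > 25000000)
    ORDER BY name
"""
# Keeps every continent that has at least one small country
in_sql = """
    SELECT name FROM world
    WHERE continent IN (SELECT continent FROM world
                        WHERE population < 25000000)
    ORDER BY name
"""

not_in_result = [r[0] for r in conn.execute(not_in_sql)]
in_result = [r[0] for r in conn.execute(in_sql)]
print(not_in_result)  # ['B1', 'B2']
print(in_result)      # ['A1', 'A2', 'B1', 'B2', 'Charlie']
```

The IN version lets Alpha through because Alpha has small countries too, so Charlie and its neighbours are listed even though not all populations there are below the limit.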
why use a sub query?
try using:
SELECT name, continent, population FROM world
WHERE population > 25000000
and/or
SELECT name, continent, population FROM world
WHERE population <= 25000000
the column of your condition: "population" is in the FROM table: "world". There is no need to use a sub query of the same table "world" again, just use the "population" column directly in the WHERE
or are you trying to do this:
SELECT name, continent, population FROM world
WHERE continent NOT IN (
SELECT continent FROM world
GROUP BY continent
HAVING SUM(population) > 25000000)
notice the: SUM(), GROUP BY, and HAVING
Show the table DECLARATION. It seems you use CONTINENT as the continent number. Then you should check it is marked with PRIMARY KEY and NOT NULL options.
I really suspect you just forgot about the very special meaning NULL has in SQL.
I make an example in Firebird 2.5.1 SQL server.
CREATE TABLE WORLD (
CONTINENT INTEGER,
NAME VARCHAR(20),
POPULATION INTEGER
);
INSERT INTO WORLD (CONTINENT, NAME, POPULATION) VALUES (NULL, 'null-id', 100);
INSERT INTO WORLD (CONTINENT, NAME, POPULATION) VALUES (1, 'normal 1', 10);
INSERT INTO WORLD (CONTINENT, NAME, POPULATION) VALUES (2, 'normal 2', 200);
INSERT INTO WORLD (CONTINENT, NAME, POPULATION) VALUES (3, 'null-pop', NULL);
INSERT INTO WORLD (CONTINENT, NAME, POPULATION) VALUES (4, 'normal 4', 110);
COMMIT WORK;
Now let's try your requests and see if the 1st row, having CONTINENT IS NULL would be present anywhere:
SELECT continent, population FROM world
WHERE continent IN (
SELECT continent FROM world
WHERE population > 100)
CONTINENT POPULATION
2 200
4 110
and then
SELECT continent, population FROM world
WHERE continent NOT IN (
SELECT continent FROM world
WHERE population > 100)
CONTINENT POPULATION
1 10
3 <NULL>
By the logic of the request, you treat CONTINENT as the row ID. In that case you should declare it NOT NULL, and then there would be no row that the [NOT] IN condition fails to see.
Now, let's rephrase this as a flat query:
SELECT continent, population FROM world
WHERE NOT (population > 100)
CONTINENT POPULATION
<NULL> 100
1 10
SELECT continent, population FROM world
WHERE population > 100
CONTINENT POPULATION
2 200
4 110
This time the missed row was the one having NULL for Population column.
Then FreshPrinceOfSO suggested using the EXISTS clause. While it may potentially end up with the slowest (least efficient) query plan, it at least masks the special meaning of the NULL value in SQL.
SELECT continent, population FROM world w_ext
WHERE EXISTS (
SELECT continent FROM world w_int
WHERE (w_int.population > 100) and (w_int.continent = w_ext.continent)
)
CONTINENT POPULATION
2 200
4 110
SELECT continent, population FROM world w_ext
WHERE NOT EXISTS (
SELECT continent FROM world w_int
WHERE (w_int.population > 100) and (w_int.continent = w_ext.continent)
)
CONTINENT POPULATION
<NULL> 100
1 10
3 <NULL>

SQL: Redundant WHERE clause specifying column is > 0?

Help me understand this: In the sqlzoo tutorial for question 3a ("Find the largest country in each region"), why does attaching 'AND population > 0' to the nested SELECT statement make this correct?
The reason is because the:
AND population > 0
...is filtering out the null row for the region "Europe", name "Vatican", which complicates the:
WHERE population >= ALL (SELECT population
FROM ...)
...because NULL isn't a value, so Russia won't be ranked properly. The ALL operator requires the value you are comparing to be greater than or equal to ALL the values returned from the subquery, which can never happen when there's a NULL in there.
My query would've been either:
SELECT region, name, population
FROM bbc x
WHERE population = (SELECT MAX(population)
FROM bbc y
WHERE y.region = x.region)
...or, using a JOIN:
SELECT x.region, x.name, x.population
FROM bbc x
JOIN (SELECT y.region,
MAX(y.population) AS max_pop
FROM bbc y
GROUP BY y.region) z ON z.region = x.region
AND z.max_pop = x.population
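Both MAX-based queries can be tried against a table that contains a NULL population, for example with Python's sqlite3 module. This is only a sketch: the population figures are placeholders, not real data, and SQLite is used because it is easy to run (it has no ALL operator, so only the MAX forms are shown).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bbc (region TEXT, name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO bbc VALUES (?, ?, ?)",
    [
        ("Europe", "Russia", 140),    # placeholder figures
        ("Europe", "France", 60),
        ("Europe", "Vatican", None),  # the NULL row from the tutorial
        ("Asia", "China", 1400),
        ("Asia", "Japan", 120),
    ],
)

# A comparison against NULL is NULL, never true
cmp_with_null = conn.execute("SELECT 140 >= NULL").fetchone()[0]

# Correlated subquery form: MAX skips the NULL population
correlated = conn.execute("""
    SELECT region, name, population
    FROM bbc x
    WHERE population = (SELECT MAX(population)
                        FROM bbc y
                        WHERE y.region = x.region)
    ORDER BY region
""").fetchall()

# JOIN form against a per-region MAX
joined = conn.execute("""
    SELECT x.region, x.name, x.population
    FROM bbc x
    JOIN (SELECT y.region, MAX(y.population) AS max_pop
          FROM bbc y
          GROUP BY y.region) z ON z.region = x.region
                              AND z.max_pop = x.population
    ORDER BY x.region
""").fetchall()

print(cmp_with_null)  # None
print(correlated)     # [('Asia', 'China', 1400), ('Europe', 'Russia', 140)]
print(joined)         # [('Asia', 'China', 1400), ('Europe', 'Russia', 140)]
```

Because aggregate functions such as MAX ignore NULL, neither form needs the extra population > 0 filter to cope with the Vatican row.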
No it doesn't. Largest country has a priori nonzero population.
It's like checking if a largest book has any pages in it.