Why does this code work with a less-than sign when the question asks for a greater-than sign (nested SELECTs from SQLZoo)?

(updated) I'm trying to learn SQL from the following website:
https://sqlzoo.net/wiki/Nested_SELECT_Quiz
In this quiz, the second question asks for the countries belonging to regions where every population is over 50000.
It says that this is the correct answer:
SELECT name,region,population FROM bbc x WHERE 50000 < ALL (SELECT population FROM bbc y WHERE x.region=y.region AND y.population>0)
That's the answer it gives me. Can anyone explain in plain English why this works? If we're looking for populations over 50,000, why is the code using a less-than sign? And how do nested selects work in general, then?

Order matters. 50000 < population is the same as population > 50000.
Why write it this funny way? Because you have to.
Specifically, ALL is a quantified comparison predicate, and it must take the form <value> <operator> ALL (<subquery>). Hence 50000 < ALL (subquery). I can't say why it cannot be reversed; possibly it makes parsing this special case easier.
And how do nested selects work in general then?
ALL is true if every row of the subquery meets the condition (and also if the subquery returns no rows at all). 50000 < ALL (subquery) means that 50,000 is less than every row in the subquery, or, put the other way round, every row in the subquery is over 50,000.
SELECT name, region, population
FROM bbc x
WHERE 50000 < ALL (
    SELECT population
    FROM bbc y
    WHERE x.region = y.region AND y.population > 0
)
The subquery runs once for each row in bbc. x is the bbc table in the original query and y is the bbc table in the subquery. where x.region=y.region filters the subquery results to only rows in the same region as the original row.
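To make the "once per outer row" behaviour concrete, here is a minimal sketch in Python using the standard sqlite3 module. The four rows are made up for illustration and are not the real SQLZoo data. SQLite has no ALL quantifier, so the sketch assumes the usual rewrite: "50000 < ALL (subquery)" becomes "there is no row in the subquery for which 50000 < population fails". The second function spells out the same logic with Python's all().

```python
import sqlite3

# Hypothetical sample rows; the real bbc table on SQLZoo is much larger.
ROWS = [
    ("Alpha", "North", 80000),
    ("Beta", "North", 120000),
    ("Gamma", "South", 60000),
    ("Delta", "South", 40000),
]


def regions_all_over(threshold):
    """Countries whose whole region has populations above threshold (SQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bbc (name TEXT, region TEXT, population INTEGER)")
    conn.executemany("INSERT INTO bbc VALUES (?, ?, ?)", ROWS)
    # "threshold < ALL (subquery)" rewritten as: no row of the correlated
    # subquery violates "threshold < population".
    sql = """
        SELECT name, region, population
        FROM bbc x
        WHERE NOT EXISTS (
            SELECT 1
            FROM bbc y
            WHERE y.region = x.region
              AND y.population > 0
              AND NOT (? < y.population)
        )
        ORDER BY name
    """
    result = conn.execute(sql, (threshold,)).fetchall()
    conn.close()
    return result


def regions_all_over_python(threshold):
    """Same logic with all(): one inner scan for each outer row."""
    kept = []
    for name, region, population in ROWS:
        same_region = [p for _, r, p in ROWS if r == region and p > 0]
        if all(threshold < p for p in same_region):
            kept.append((name, region, population))
    return sorted(kept)
```

With this sample data, the South region contains a country at 40000, so both South rows are dropped and only the two North rows remain.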

Related

Question about #1 and #2 (ALL) in the Nested SELECT Quiz from sqlzoo.net

I'm a little confused by the use of ALL. In the Nested SELECT quiz from sqlzoo (link here:)
Q1: Select the code that shows the name, region and population of the smallest country in each region
SELECT region, name, population FROM bbc x WHERE population <= ALL (SELECT
population FROM bbc y WHERE y.region=x.region AND population>0)
I thought this made sense to me, because we're trying to get the population that is less than the smallest country in each region (queried with the inner subquery first).
But then, Q2 comes along: Select the code that shows the countries belonging to regions with all populations over 50000
And then code for that is:
SELECT name,region,population FROM bbc x WHERE 50000 < ALL (SELECT population
FROM bbc y WHERE x.region=y.region AND y.population>0)
If we're trying to get the countries with population > 50000, why is the sign not > but < instead?
I feel like I'm missing a basic understanding somewhere, but I'm not even sure where.
It would be more readable and easier to understand if it could be written like this:
WHERE ALL (SELECT population....) > 50000
but this is syntactically wrong.
From https://oracle-base.com/articles/misc/all-any-some-comparison-conditions-in-sql
The ALL comparison condition is used to compare a value to a list or
subquery. It must be preceded by =, !=, >, <, <=, >= and
followed by a list or subquery.
Also from https://learn.microsoft.com/en-us/sql/t-sql/language-elements/all-transact-sql?view=sql-server-2017
the syntax should be:
scalar_expression { = | <> | != | > | >= | !> | < | <= | !< } ALL (subquery)
so you can't avoid having the ALL clause at the right side of the comparison,
but it's all the same since 50000 must be less than every item in the subquery.
For the first question, population comes first in the comparison, so when you query "the population that is <= (less than or equal to) all of the populations", it implies it's the smallest one.
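As a rough illustration of why "<= ALL" picks out the minimum, here is a small Python/sqlite3 sketch. The rows are invented, and because SQLite does not support ALL, the query assumes the equivalent rewrite "no country in the same region has a strictly smaller population".

```python
import sqlite3

# Hypothetical sample rows, not the real SQLZoo data.
ROWS = [
    ("Alpha", "North", 80000),
    ("Beta", "North", 120000),
    ("Gamma", "South", 60000),
    ("Delta", "South", 40000),
]


def smallest_per_region():
    """Rows whose population is <= every positive population in their region."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bbc (name TEXT, region TEXT, population INTEGER)")
    conn.executemany("INSERT INTO bbc VALUES (?, ?, ?)", ROWS)
    # "population <= ALL (subquery)" rewritten as: the subquery contains
    # no row with a population strictly below the outer row's population.
    sql = """
        SELECT region, name, population
        FROM bbc x
        WHERE NOT EXISTS (
            SELECT 1
            FROM bbc y
            WHERE y.region = x.region
              AND y.population > 0
              AND y.population < x.population
        )
        ORDER BY region
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result
```

A row survives only if nothing in its region is smaller, which is just another way of saying it holds the regional minimum.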
In the second question, 50000 comes first in the comparison.
"Populations of over 50000" also implies that 50000 < those populations, which is exactly what it says in the query.

Find outliers to each band of records

The goal is to find extremely small or large records for each band based on a formula.
Input:
Distance Rate
10 5
25 200
50 300
1000 5
2000 2000
Bands are defined by my input. For example, I want to have two bands for this input (actually there are more, like 10 bands) for distance: 1-100, 101-10000.
For each band, we want to find all records whose rates are outliers according to a formula f (two standard deviations away from the mean, if you are interested in the formula).
The formula f I want to use:
(Rate- avg(Rate) over ()) / (stddev(Rate) over ()) > 2
Output:
Distance Rate
10 5
1000 5 (this number is for illustrative purpose only.)
The difficult part is I do not know how to do it for each band, and it makes applying formula more difficult.
Without knowing how you intend to apply your formula (my guess would be UDF), you can create your "bands" by grouping by a CASE expression:
GROUP BY CASE
    WHEN Distance BETWEEN 1 AND 100 THEN 'Band1'
    WHEN Distance BETWEEN 101 AND 10000 THEN 'Band2'
    ETC
END
Similarly you use the same CASE expression in a RANK() OVER () function, if that works better for the rest of your query.
EDIT: based on your clarification, you need to handle this with a correlated sub-query in your WHERE clause. I would consider encapsulating it in a UDF to make the main query look cleaner. Something like:
WHERE (Rate - {correlated query selecting AVG(Rate) of all rows in this band, using the above CASE expression to determine "this band"}) / {correlated query selecting STDDEV(Rate) of the same rows} > 2
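The per-band logic can be sketched outside SQL as well. The following Python is only an illustration under a few assumptions: the bands are the two from the question, the sample standard deviation is used, and the absolute deviation is compared so that both very small and very large rates are caught (the formula in the question only tests the upper side). The names BANDS, band_of and find_outliers are made up for this sketch.

```python
from statistics import mean, stdev

# Inclusive (low, high) distance ranges, taken from the question.
BANDS = [(1, 100), (101, 10000)]

# The sample input from the question as (Distance, Rate) pairs.
SAMPLE = [(10, 5), (25, 200), (50, 300), (1000, 5), (2000, 2000)]


def band_of(distance, bands=BANDS):
    """Return the index of the band containing distance, or None."""
    for index, (low, high) in enumerate(bands):
        if low <= distance <= high:
            return index
    return None


def find_outliers(rows, threshold=2.0, bands=BANDS):
    """Rows whose rate is more than threshold std devs from their band mean."""
    grouped = {}
    for distance, rate in rows:
        band = band_of(distance, bands)
        if band is not None:
            grouped.setdefault(band, []).append((distance, rate))

    outliers = []
    for band in sorted(grouped):
        members = grouped[band]
        rates = [rate for _, rate in members]
        if len(rates) < 2:
            continue  # stdev needs at least two values
        avg, sd = mean(rates), stdev(rates)
        if sd == 0:
            continue  # all rates identical, nothing can be an outlier
        outliers.extend(
            (d, r) for d, r in members if abs(r - avg) / sd > threshold
        )
    return outliers
```

Note that with only two or three rows per band, as in the sample input, no row can be two sample standard deviations from the mean, so find_outliers(SAMPLE) returns an empty list; a lower threshold such as 1.0 flags the (10, 5) row.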

Very special join on float value

I got stuck with a tricky query (in MS Access 2013). I'd like to do a fairly simple thing:
I have got two Tables (see example below): Table "scores" with scores of an exam and table "grading_key".
The scores table has a field named "quotient" which contains a float value representing the percentage of success (1.0 being all questions answered correctly). The grading_key table has quotient limits which separate one grade level from the next. Thus the “grading_key” table can be used to get a grade for any quotient value.
A grade can be found by performing:
SELECT TOP 1 Grade FROM grading_key WHERE {ANY_QUOTIENT_VALUE} <= Quotient
Sample Tables:
|-grading_key-|        |-----scores-----|
Quotient  Grade        StudentId  Quotient
0,92      1            123        0,85
0,87      1,5          321        0,8
0,81      2            766        0,91
0,76      2,5          222        0,78
My problem is, I’d like to join scores and grades in a query that associates each quotient in table “scores” with one grade in table “grading_key” (see Desired_Result below). Unfortunately I can’t simply join, as the quotients in “scores” do not necessarily match the grade limits defined in “grading_key”.
Currently I used a VBA function (calculateScoreForQuotient()) but I want to remove the VBA dependency as the resulting table should be called from outside MS Access and in this case VBA functions do not work.
|--------Desired_Result-------|
StudentId  Quotient  Grade
123        0,85      2
321        0,8       2,5
Does anyone know a way to get the desired table with plain SQL? I have played around with different combinations of JOINs and WHEREs for quite a while now, but my best result was to associate all available grades with each student (not really meaningful).
Any help would save my day ;-)
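You can use a correlated sub-query to return the grade based on the quotient of the student. Among the grading limits that the student's quotient reaches, you want the highest one, which in your sample is the row with the smallest grade value. You could use Min() or TOP 1 with an ORDER BY Quotient DESC clause, whichever you prefer.
select
    StudentId,
    Quotient,
    (select Min(Grade) from grading_key where grading_key.Quotient <= scores.Quotient) as Grade
from scores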
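For anyone who wants to check the idea against the sample data, here is a small sketch using Python's sqlite3 module rather than Access. It uses the TOP 1 variant, which SQLite spells as ORDER BY ... LIMIT 1, and it assumes the quotients and grades are stored as plain numbers with decimal points.

```python
import sqlite3

# Sample data from the question, with decimal points instead of commas.
GRADING_KEY = [(0.92, 1.0), (0.87, 1.5), (0.81, 2.0), (0.76, 2.5)]
SCORES = [(123, 0.85), (321, 0.80), (766, 0.91), (222, 0.78)]


def grades_for_scores():
    """Attach to each score the grade of the highest limit it reaches."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE grading_key (Quotient REAL, Grade REAL)")
    conn.execute("CREATE TABLE scores (StudentId INTEGER, Quotient REAL)")
    conn.executemany("INSERT INTO grading_key VALUES (?, ?)", GRADING_KEY)
    conn.executemany("INSERT INTO scores VALUES (?, ?)", SCORES)
    # Correlated sub-query: for each score row, take the grading limit
    # closest below (or equal to) the student's quotient.
    sql = """
        SELECT s.StudentId,
               s.Quotient,
               (SELECT g.Grade
                FROM grading_key g
                WHERE g.Quotient <= s.Quotient
                ORDER BY g.Quotient DESC
                LIMIT 1) AS Grade
        FROM scores s
        ORDER BY s.StudentId
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result
```

Students 123 and 321 come out with grades 2 and 2.5, matching the desired result in the question. A quotient below the lowest limit would get NULL, which may or may not be what you want.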

I don't understand the need for self-joins. Can someone please explain them to me?

SELECT region, name, population
FROM bbc x
WHERE population <= ALL (SELECT population FROM bbc y WHERE y.region=x.region AND population>0)
I don't understand the logic of using x and y for the same table.
x and y are treated as two different instances of the table bbc. To list a table twice in the same query, you must provide a table alias for at least one instance of the table name. The alias lets the query processor determine which instance of the table each column refers to.
This query returns the country with the smallest population in each region. To do this without a self-join, you would need two queries for each region:
1. set #min = SELECT MIN(population) FROM bbc WHERE population > 0 AND region = #region
2. SELECT region, name, population FROM bbc WHERE population = #min AND region = #region
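The two approaches can be compared side by side in a short Python/sqlite3 sketch. The data is invented, and the single-query version uses a correlated MIN sub-query with the aliases x and y, since SQLite does not support ALL. The function names are made up for this example.

```python
import sqlite3

# Hypothetical sample rows, not the real SQLZoo data.
ROWS = [
    ("Alpha", "North", 80000),
    ("Beta", "North", 120000),
    ("Gamma", "South", 60000),
    ("Delta", "South", 40000),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bbc (name TEXT, region TEXT, population INTEGER)")
    conn.executemany("INSERT INTO bbc VALUES (?, ?, ?)", ROWS)
    return conn


def two_step(conn):
    """Two queries per region: find the minimum, then fetch the matching row."""
    regions = [r for (r,) in conn.execute(
        "SELECT DISTINCT region FROM bbc ORDER BY region")]
    results = []
    for region in regions:
        (min_pop,) = conn.execute(
            "SELECT MIN(population) FROM bbc WHERE population > 0 AND region = ?",
            (region,),
        ).fetchone()
        results.extend(conn.execute(
            "SELECT region, name, population FROM bbc "
            "WHERE population = ? AND region = ?",
            (min_pop, region),
        ).fetchall())
    return results


def one_query(conn):
    """One query: alias y scans bbc again for each row of alias x."""
    sql = """
        SELECT x.region, x.name, x.population
        FROM bbc x
        WHERE x.population = (
            SELECT MIN(y.population)
            FROM bbc y
            WHERE y.region = x.region AND y.population > 0
        )
        ORDER BY x.region
    """
    return conn.execute(sql).fetchall()
```

Both functions return the same rows; the aliases simply let the database do the per-region loop for you.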

Retrieve names by ratio of their occurrence

I'm somewhat new to SQL queries, and I'm struggling with this particular problem.
Let's say I have query that returns the following 3 records (kept to one column for simplicity):
Tom
Jack
Tom
And I want to have those results grouped by the name and also include the fraction (ratio) of the occurrence of that name out of the total records returned.
So, the desired result would be (as two columns):
Tom | 2/3
Jack | 1/3
How would I go about it? Determining the numerator is pretty easy (I can just use COUNT() and GROUP BY name), but I'm having trouble translating that into a ratio out of the total rows returned.
SELECT name, COUNT(name) * 1.0 / (SELECT COUNT(*) FROM names) AS ratio FROM names GROUP BY name;
(The * 1.0 avoids integer division, which would give 0 on many databases.)
Since the denominator is fixed, the "ratio" is directly proportional to the numerator. Unless you really need to show the denominator, it'll be a lot easier to just use something like:
select name, count(*) from your_table_name
group by name
order by count(*) desc
and you'll get the right data in the right order, but the number that's shown will be the count instead of the ratio.
If you really want that denominator, you'd do a count(*) on a non-grouped version of the same select -- but depending on how long the select takes, that could be pretty slow.
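As a quick check of the count-over-total idea, here is a sketch with Python's sqlite3 module. The table name names and the function name_ratios are assumptions made for this example; the multiplication by 1.0 forces floating-point division.

```python
import sqlite3


def name_ratios(names):
    """Return (name, occurrences, ratio) rows, most frequent name first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE names (name TEXT)")
    conn.executemany("INSERT INTO names VALUES (?)", [(n,) for n in names])
    # The scalar sub-query counts the non-grouped rows (the denominator).
    sql = """
        SELECT name,
               COUNT(*) AS occurrences,
               COUNT(*) * 1.0 / (SELECT COUNT(*) FROM names) AS ratio
        FROM names
        GROUP BY name
        ORDER BY occurrences DESC, name
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows
```

Calling name_ratios(["Tom", "Jack", "Tom"]) gives Tom with a ratio of about 0.667 and Jack with about 0.333. If you need the literal "2/3" text, you can build it from the occurrences column and the total instead.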