Question about #1 and #2 (ALL) in the Nested SELECT Quiz from sqlzoo.net

I'm a little confused by the use of ALL in the Nested SELECT quiz from sqlzoo.
Q1: Select the code that shows the name, region and population of the smallest country in each region
SELECT region, name, population FROM bbc x
WHERE population <= ALL (SELECT population FROM bbc y WHERE y.region=x.region AND population>0)
This made sense to me, because we're looking for the population that is less than or equal to every population in the same region (the list the inner subquery returns), i.e. the smallest country in each region.
But then, Q2 comes along: Select the code that shows the countries belonging to regions with all populations over 50000
And the code for that is:
SELECT name,region,population FROM bbc x WHERE 50000 < ALL (SELECT population
FROM bbc y WHERE x.region=y.region AND y.population>0)
If we're trying to get the countries with population > 50000, why is the sign not > but < instead?
I feel like I'm missing a basic understanding somewhere, but I'm not even sure where.

It would be more readable and easier to understand if it could be written like this:
WHERE ALL (SELECT population....) > 50000
but this is syntactically wrong.
From https://oracle-base.com/articles/misc/all-any-some-comparison-conditions-in-sql
The ALL comparison condition is used to compare a value to a list or
subquery. It must be preceded by =, !=, >, <, <=, >= and
followed by a list or subquery.
Also from https://learn.microsoft.com/en-us/sql/t-sql/language-elements/all-transact-sql?view=sql-server-2017
the syntax should be:
scalar_expression { = | <> | != | > | >= | !> | < | <= | !< } ALL (subquery)
so the ALL clause has to be on the right side of the comparison.
The meaning is the same, though: 50000 must be less than every value the subquery returns, which is just another way of saying that every population is over 50000.
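To see that the operand order changes only the wording, not the test, here is a minimal Python sketch of `value op ALL (list)`. It assumes the subquery result is just a Python list of non-NULL numbers (NULL handling is ignored), and the helper name `all_cmp` and the populations are hypothetical.

```python
import operator

def all_cmp(value, op, values):
    """Model of `value op ALL (values)` for non-NULL values.

    True when the comparison holds for every element. An empty list
    gives True, matching SQL's ALL over a subquery that returns no rows.
    """
    return all(op(value, v) for v in values)

region_a = [60000, 75000, 120000]   # every population over 50000
region_b = [60000, 40000, 90000]    # one population is only 40000

# 50000 < ALL (...): 50000 is smaller than every population
print(all_cmp(50000, operator.lt, region_a))  # True
print(all_cmp(50000, operator.lt, region_b))  # False

# The same test phrased the "readable" way: every p > 50000
print(all(p > 50000 for p in region_a))       # True
```

The only thing SQL fixes is where ALL goes; `50000 < p` and `p > 50000` are the same comparison for every element.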

In the first question, population comes first in the comparison, so the query asks for "the population that is <= (less than or equal to) all of the populations in the region", which means it is the smallest one.
In the second question, 50000 comes first in the comparison.
"Populations of over 50000" also implies that 50000 < those populations, which is exactly what it says in the query.

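Q1 can be sketched the same way in Python over a small made-up bbc-like table (the rows below are hypothetical, not the real sqlzoo data). For each row x, the inner list plays the role of the correlated subquery, and x is kept only when its population is <= ALL of them, which only the smallest country in the region can satisfy.

```python
# Hypothetical rows: (region, name, population); None stands for NULL
bbc = [
    ("Europe", "France", 65_000_000),
    ("Europe", "Malta", 500_000),
    ("Europe", "Vatican", None),
    ("Asia", "Japan", 125_000_000),
    ("Asia", "Bhutan", 800_000),
]

def smallest_per_region(rows):
    result = []
    for region, name, pop in rows:
        # Inner query: same region AND population > 0.
        # NULL > 0 is not TRUE in SQL, so None is filtered out here too.
        inner = [p for r, _, p in rows if r == region and p is not None and p > 0]
        # population <= ALL (inner). NULL <= p is never TRUE, so a None
        # population is skipped (the empty-subquery edge case is ignored).
        if pop is not None and all(pop <= p for p in inner):
            result.append((region, name, pop))
    return result

print(smallest_per_region(bbc))
# [('Europe', 'Malta', 500000), ('Asia', 'Bhutan', 800000)]
```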
Related

Why does this code work with a less than sign when the question requests for a greater than sign (nested selects from SQL zoo)?

I'm trying to learn SQL from the following website:
https://sqlzoo.net/wiki/Nested_SELECT_Quiz
In this quiz, the second question asks for the countries belonging to regions where all populations are over 50000.
It says that this is the correct answer:
SELECT name,region,population FROM bbc x WHERE 50000 < ALL (SELECT population FROM bbc y WHERE x.region=y.region AND y.population>0)
That's the answer it gives me. Can anyone explain in plain English why this works? If we're looking for populations over 50,000, why does the code use a less-than sign? And how do nested selects work in general?
Order matters. 50000 < population is the same as population > 50000.
Why write it this funny way? Because you have to.
Specifically, ALL is a quantified comparison predicate, and it must be of the form <value> <operator> ALL (<subquery>), so you write 50000 < ALL (subquery). I can't say why it cannot be reversed; possibly it makes parsing this special case easier.
And how do nested selects work in general then?
ALL is true if every row of the subquery meets the condition. 50000 < ALL (subquery) means that 50,000 is less than every row in the subquery (in other words, every row in the subquery is over 50000).
SELECT name,region,population
FROM bbc x
WHERE 50000 < ALL (
SELECT population
FROM bbc y
WHERE x.region=y.region AND y.population>0
)
Logically, the subquery runs once for each row of bbc in the outer query. x is the alias for bbc in the outer query and y is the alias for bbc in the subquery. WHERE x.region=y.region restricts the subquery to rows in the same region as the current outer row.
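To make "runs once for each row" concrete, here is a rough Python equivalent of that query as a nested loop: the outer loop is bbc x and the inner comprehension is the correlated subquery over bbc y. The rows are made up, and a real database may execute the query differently (for example, as a join); this only models the logical result.

```python
# Hypothetical bbc rows: (name, region, population)
bbc = [
    ("A1", "Region A", 60000),
    ("A2", "Region A", 90000),
    ("B1", "Region B", 70000),
    ("B2", "Region B", 30000),
]

def countries_in_big_regions(rows):
    out = []
    for x_name, x_region, x_pop in rows:                 # outer query: bbc x
        inner = [y_pop for _, y_region, y_pop in rows    # subquery: bbc y
                 if y_region == x_region and y_pop > 0]  # x.region = y.region
        if all(50000 < p for p in inner):                # 50000 < ALL (inner)
            out.append((x_name, x_region, x_pop))
    return out

print(countries_in_big_regions(bbc))
# [('A1', 'Region A', 60000), ('A2', 'Region A', 90000)]
```

Region B is excluded entirely because one of its countries (B2) has only 30000 people, even though B1 on its own would pass.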

Subquery Returns Multiple Values, but I am using "IN" as my operator

Long story short, I have a crosswalk table with a column named MgrFilterRacf. The table has a single row that I use to reference various values across a few queries, and this method has worked fine for me. However, when I had to reference more than one value, I got the error below. I am really at a loss as to why it is not working. I see a TON of topics/posts on this, but most of the solutions say to use the IN operator, which I already do. The mildly infuriating thing is that it works when I reference only one value (i.e., delete the second row of the crosswalk table), regardless of whether IN or = is used.
Error for reference:
Msg 512, Level 16, State 1, Line 2
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
Crosswalk table format:
Id | AttCeilingScore | AttFloorScore | MgrFilterRacf
----+-----------------+----------------+---------------
1 | 100 | 75 | Value1
2 | NULL | NULL | Value2
Query:
--\*Perform calculation For Attendance Score Sa*\
Select
MgrRacf,
MgrName,
Ctc.racf as EmpRacf,
Ctc.EmpName,
Case
when AttCntSegment is null then 0
else AttCntSegment
end as AttCntSegment,
Case
when AttSumDuration is null then 0
else AttSumDuration
end as AttSumDuration,
case
when AttCntSegment > 12
then (Select AttFloorScore from tblAvs1Scoring)
when AttCntSegment is null
then (Select AttCeilingScore from tblAvs1Scoring)
when 100 - ((AttCntSegment) * 2 + PercentReduction) < (Select AttFloorScore from tblAvs1Scoring)
then (Select AttFloorScore from tblAvs1Scoring)
else 100 - ((AttCntSegment)*2+PercentReduction)
end As AttScore,
Case
when AttCntSegment is null then 100
else 100 - ((AttCntSegment)*2+PercentReduction)
end as AttScoreRaw,
'RollSixSum' as ReportTag
From
(--\*Get Total Occurrences from Rolling 6 months per advocate*\
SELECT
EMP_ID,
COUNT(SEG_CODE) AS AttCntSegment,
SUM(DURATION) AS AttSumDuration
FROM
tblAttendance AS Att
WHERE
(START_DATE >= Getdate() - 180)
AND (SEG_CODE NOT IN ('FLEX2', 'FMLA'))
AND (DURATION > 7)
AND START_DATE IS NOT NULL
GROUP BY
EMP_ID) As Totals
INNER JOIN
tblCrosswalkAttendanceTime AS Time ON AttSumDuration BETWEEN Time.BegTotalTime AND Time.EndTotalTime
RIGHT JOIN
tblContactListFull AS Ctc ON Ctc.employeeID = Totals.EMP_ID
WHERE
--Ctc.Mgr2racf IN ('Value1','Value2') --This works
Ctc.Mgr2racf IN (SELECT MgrFilterRacf FROM tblAvs1Scoring) --This returns the same 2 values but doesn't work, note works with only 1 value present
AND (title LIKE '%IV%' OR Title LIKE '%III%' OR title LIKE '%Cust Relat%') --Going to apply same logic here once I have a solution
AND employeestatus2 <> 'Inactive'
Specific offending lines of code:
Ctc.Mgr2racf IN (Select MgrFilterRacf from tblAvs1Scoring) --This returns the same 2 values but doesn't work, note works with only 1 value present
AND (title LIKE '%IV%' or Title like '%III%' or title LIKE '%Cust Relat%') --Going to apply same logic here once I have a solution
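It's not the IN clause that's causing the error, since a subquery used with IN may return any number of rows. Instead, it's likely here:
CASE ... < (Select AttFloorScore from tblAvs1Scoring)
Try SELECT COUNT(*) FROM tblAvs1Scoring and see if it's > 1. I am sure it is. (COUNT(AttFloorScore) would not reveal this, because COUNT(column) skips NULLs, and the second crosswalk row has NULL in that column.)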
This can be mitigated by whatever method is appropriate for your data:
Using TOP 1
Using an aggregate like MAX()
Correlating the sub-query to limit the rows returned.
Using the appropriate WHERE clause
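A rough Python model of why the CASE subqueries fail and why MAX() fixes them, assuming the two crosswalk rows shown in the question. `scalar_subquery`, `sql_max` and `sql_count` are hypothetical helpers: the first mimics SQL Server's rule that a subquery used as an expression may return at most one row (Msg 512), the others mimic MAX() and COUNT(column), which skip NULLs.

```python
# AttFloorScore column of the two tblAvs1Scoring rows (None stands for NULL)
att_floor_score = [75, None]

def scalar_subquery(values):
    """Model of a subquery used as an expression: at most one row allowed."""
    if len(values) > 1:
        raise ValueError("Subquery returned more than 1 value.")
    return values[0] if values else None   # no rows -> NULL

def sql_max(values):
    """Model of MAX(): NULLs are ignored; all-NULL or empty -> NULL."""
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None

def sql_count(values):
    """Model of COUNT(column): counts only non-NULL values."""
    return sum(v is not None for v in values)

print(len(att_floor_score))        # COUNT(*)             -> 2
print(sql_count(att_floor_score))  # COUNT(AttFloorScore) -> 1
print(sql_max(att_floor_score))    # MAX(AttFloorScore)   -> 75

try:
    scalar_subquery(att_floor_score)   # (Select AttFloorScore from tblAvs1Scoring)
except ValueError as err:
    print(err)                         # Subquery returned more than 1 value.
```

The NULL row is still a row, so the bare subquery sees two values, while MAX() collapses them to the single non-NULL score.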
The error was not where I thought it was: it was in the subqueries at the top of the query, inside the CASE expression. Even though the second crosswalk row has no values (NULL) in those columns, it is still a row, so those subqueries returned two values and had to be told which one to use. I changed them to return the MAX value, and once I updated that, everything worked.
Bottom line: check all your subqueries.

Confused by a HAVING query in SQL

I am using SQL Server Management Studio 2012 and have to write a query that shows which subject students failed most often on their first attempt (a fail is Point < 5.0), using this table:
StudentID | SubjectID | First/Second_Time | Point
1 | 02 | 1 | 5.0
2 | 04 | 2 | 7.0
3 | 03 | 2 | 9
... etc
Here is my teacher's query:
SELECT SubjectID
FROM Result -- name of the table
WHERE [First/Second_Time] = 1 AND Point < 5
GROUP BY SubjectID
HAVING count(point) >= ALL
(
SELECT count(point)
FROM Result
WHERE [First/Second_Time] = 1 AND point < 5
GROUP BY SubjectID
)
I don't understand why the HAVING clause is needed. Isn't COUNT(point) always >= ALL (SELECT COUNT(point)
FROM Result
WHERE [First/Second_Time] = 1 AND point < 5
GROUP BY SubjectID)?
And how does it show the subject that the most students failed on their first attempt? Thanks in advance, and sorry for my bad English.
The subquery is returning a list of the number of times a subject was failed (on the first attempt). It might be easier for you to see what it's doing if you run it like this:
SELECT SubjectID, count(point)
FROM Result
WHERE [First/Second_Time] = 1 AND point < 5
GROUP BY SubjectID
So if someone failed math twice and science once, the subquery would return:
2
1
You want to know which subject was failed the most (in this case, which subject was failed 2 or more times, since that is the highest number of failures in your subquery). So you count again (also grouping by subject), and use having to return only subjects with 2 or more failures (greater than or equal to the highest value in your subquery).
SELECT SubjectID
FROM Result
WHERE [First/Second_Time] = 1 AND Point < 5
GROUP BY SubjectID
HAVING count(point)...
See https://msdn.microsoft.com/en-us/library/ms178543.aspx for more examples.
Sounds like you are working on a project for a class, so I'm not even sure I should answer this, but here goes. The question is why the HAVING clause is needed. Have you read the documentation for HAVING and ALL?
All "Compares a scalar value with a single-column set of values".
The scalar value in this case is count(point) or the number of occurrences of a subject id with point less than 5. The single-column set in this case is a list of the number of occurrences of every subject that has less than 5 points.
The net result of the comparison comes from the ">=". ALL evaluates to true only if the comparison is true for every value in the subquery. The subquery returns one count per subject meeting the Point < 5 and first-attempt requirements. If three subjects meet those criteria and were failed 1, 2 and 3 times respectively, the main query also computes three counts to test in HAVING: 1, 2 and 3. Each of those counts must be >= every subquery value for the HAVING condition to be true.
Step by step: the first count, 1, is >= 1 but not >= 2, so it is dropped. The second count, 2, is >= 1 and >= 2 but not >= 3, so it is dropped. The third count, 3, is >= 1, 2 and 3, so it is kept, and the query returns the subject with the highest frequency (all such subjects, if there is a tie).
This is fairly clear in the "remarks" section of the MSDN discussion of "All" keyword, but not as relates to your specific application.
Remember, MSDN is our friend!
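The step-by-step evaluation of HAVING COUNT(point) >= ALL (...) can be sketched in Python over a tiny made-up Result table (the rows are hypothetical). It groups first-attempt failures by subject, then keeps each subject whose count is >= ALL of the counts, which amounts to keeping the subject(s) with the maximum count.

```python
from collections import Counter

# Hypothetical Result rows: (StudentID, SubjectID, First/Second_Time, Point)
result = [
    (1, "01", 1, 4.0),
    (2, "01", 1, 3.5),
    (3, "01", 1, 2.0),
    (1, "02", 1, 4.5),
    (2, "02", 1, 1.0),
    (3, "03", 1, 4.9),
    (4, "03", 2, 3.0),   # second attempt: excluded by WHERE
    (4, "02", 1, 8.0),   # passed: excluded by WHERE
]

def most_failed_first_time(rows):
    # WHERE [First/Second_Time] = 1 AND Point < 5, then GROUP BY SubjectID
    counts = Counter(subj for _, subj, attempt, point in rows
                     if attempt == 1 and point < 5)
    # The subquery's result: one count per subject (here 3, 2, 1)
    all_counts = list(counts.values())
    # HAVING COUNT(point) >= ALL (all_counts)
    return [subj for subj, c in counts.items()
            if all(c >= other for other in all_counts)]

print(most_failed_first_time(result))  # ['01']
```

If two subjects tied for the highest count, both would pass the >= ALL test and both would be returned.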

Difference between >= ALL and > ALL over a list

SELECT name, area
FROM world
WHERE area > ALL (SELECT area FROM world
WHERE continent='Europe' AND area IS NOT NULL)
SELECT name, area
FROM world
WHERE area >= ALL (SELECT area FROM world
WHERE continent='Europe' AND area IS NOT NULL)
What is the difference between these 2 queries?
They give different results.
2 >= 2 is true.
2 > 2 is false.
Your first query returns all countries in world that are bigger than every country in Europe (among those whose area is set); in other words, all countries bigger than the biggest country in Europe. The second query returns all countries that are bigger than or equal to the biggest country in Europe, so it also includes that biggest European country itself.
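A small Python sketch of the difference, using made-up areas (not the real world table; "Giantland" is a hypothetical country). With > ALL the largest European country drops out, because its area is not strictly greater than its own; with >= ALL it stays in.

```python
# Hypothetical (name, continent, area) rows
world = [
    ("Russia", "Europe", 17_100_000),
    ("France", "Europe", 640_000),
    ("Canada", "North America", 9_980_000),
    ("Giantland", "Asia", 20_000_000),
]

# Subquery: areas of European countries with a non-NULL area
europe = [a for _, cont, a in world if cont == "Europe" and a is not None]

gt_all = [n for n, _, a in world if all(a > e for e in europe)]   # area > ALL
ge_all = [n for n, _, a in world if all(a >= e for e in europe)]  # area >= ALL

print(gt_all)  # ['Giantland']
print(ge_all)  # ['Russia', 'Giantland']
```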

SQL: Redundant WHERE clause specifying column is > 0?

Help me understand this: In the sqlzoo tutorial for question 3a ("Find the largest country in each region"), why does attaching 'AND population > 0' to the nested SELECT statement make this correct?
The reason is that the
AND population > 0
...filters out the row for region "Europe", name "Vatican", whose population is NULL. That NULL complicates the
WHERE population >= ALL (SELECT population
FROM ...)
...because comparing a number with NULL gives UNKNOWN rather than true or false, so Russia won't be ranked properly. The ALL operator requires the value being compared to be greater than or equal to ALL the values returned by the subquery, and that can never evaluate to true when there's a NULL among them.
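A rough Python model of SQL's three-valued logic for `value >= ALL (list)`, using None for both NULL and UNKNOWN (a simplification of this sketch; the function name `ge_all` and the populations are hypothetical). ALL is FALSE if any comparison is FALSE, TRUE only if every comparison is TRUE (or the list is empty), and UNKNOWN otherwise; WHERE keeps a row only when the result is TRUE.

```python
def ge_all(value, values):
    """Model of `value >= ALL (values)` with SQL three-valued logic.

    Returns True, False, or None (UNKNOWN). None inside `values` is NULL.
    """
    saw_unknown = False
    for v in values:
        if value is None or v is None:
            saw_unknown = True       # any comparison with NULL is UNKNOWN
        elif not value >= v:
            return False             # one FALSE comparison makes ALL FALSE
    return None if saw_unknown else True

# Hypothetical European populations; None plays the Vatican's NULL
europe = [146_000_000, 65_000_000, None]

print(ge_all(146_000_000, europe))  # None: UNKNOWN, so WHERE drops the row
print(ge_all(146_000_000, [p for p in europe if p is not None and p > 0]))  # True
```

Even the largest population only reaches UNKNOWN while the NULL is in the list; filtering it out (as population > 0 does) lets the comparison become TRUE.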
My query would've been either:
SELECT region, name, population
FROM bbc x
WHERE population = (SELECT MAX(population)
FROM bbc y
WHERE y.region = x.region)
...or, using a JOIN:
SELECT x.region, x.name, x.population
FROM bbc x
JOIN (SELECT y.region,
MAX(y.population) AS max_pop
FROM bbc y
GROUP BY y.region) z ON z.region = x.region
AND z.max_pop = x.population
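Both rewrites can be tried with Python's built-in sqlite3 module on a tiny made-up bbc table (the rows are hypothetical). Because MAX() ignores NULLs, neither query needs the population > 0 filter to cope with the NULL row. The ORDER BY clauses are added here only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bbc (name TEXT, region TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO bbc VALUES (?, ?, ?)",
    [
        ("Russia", "Europe", 146000000),
        ("France", "Europe", 65000000),
        ("Vatican", "Europe", None),   # NULL population
        ("Japan", "Asia", 125000000),
        ("Bhutan", "Asia", 800000),
    ],
)

# Correlated subquery with MAX()
correlated = """
    SELECT region, name, population
    FROM bbc x
    WHERE population = (SELECT MAX(population)
                        FROM bbc y
                        WHERE y.region = x.region)
    ORDER BY region
"""

# Join against per-region maximums
joined = """
    SELECT x.region, x.name, x.population
    FROM bbc x
    JOIN (SELECT y.region, MAX(y.population) AS max_pop
          FROM bbc y
          GROUP BY y.region) z ON z.region = x.region
                              AND z.max_pop = x.population
    ORDER BY x.region
"""

rows_a = conn.execute(correlated).fetchall()
rows_b = conn.execute(joined).fetchall()
print(rows_a)            # [('Asia', 'Japan', 125000000), ('Europe', 'Russia', 146000000)]
print(rows_a == rows_b)  # True
```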
No, it doesn't. The largest country has, a priori, a nonzero population.
It's like checking whether the largest book has any pages in it.