ALL operator VS Any on an empty query - sql

I'm reading the oracle documentation on the ANY and ALL operators. I pretty understand their uses except for one thing. It states:
ALL :
If a subquery returns zero rows, the condition evaluates to TRUE.
ANY :
If a subquery returns zero rows, the condition evaluates to FALSE.
It doesn't seem very logical to me. Why would ALL on an empty subquery would return TRUE but ANY returns FALSE?
I'm relatively new to SQL, so I assume it would have a use-case for this behavior which is really counter-intuitive to me.
ANY and ALL on an empty set should return the same value no?

Consider the example of the EMP table in that link.
Specifically this query -
SELECT e1.empno, e1.sal
FROM emp e1
WHERE e1.sal > ANY (SELECT e2.sal
FROM emp e2
WHERE e2.deptno = 20);
In case of ANY, the question you are asking is "Is my salary greater than anyone in department 20 (at least 1 person)". This means you are hoping at least one person has a salary less than you. When there are no rows, this returns FALSE because there is nobody whose salary is less than you, you were hoping for at least one.
In case of ALL, the obvious question you would be asking is "Is my salary greater than everyone?". Rephrasing that as "Is there nobody that has salary greater than me?" When there are no rows returned, your answer is TRUE, because "there is indeed nobody whose salary is greater than me.

Because ANY is to be interpreted as EXIST (if there is any, it means they exist). Therefore, it return false if no rows are found.
All does not certify that any values exist, it just certifies that it represents all possible values. Therefore, it return true even if no rows are found.

Related

COUNT Function: Result of a query

I'm having trouble with a part of the following question. Thank you in advance for your help. I have a hard time visualizing this "fake" database table. I was hoping someone could help me run through my logic and see if it's correct. If someone could just point me in the right direction that would be great!
About:
Sesame is a way to find online class for adults & activities for adults around you.
Imagine a database table named activities. It has four columns:
activity_id [int, non null]
activity_provider_id [int, non null]
area_id [int, nullable]
starts_at [timestamp, non null]
Question: Given the following query, which counts would you expect to return the highest and lowest values? Which counts would you expect to be the same? Why?
select
count(activity_id),
count(distinct activity_provider_id),
count(area_id),
count(distinct area_id),
count(*)

 from activities
My Solution
Highest values: count(*)
Reasoning: The Count(*) function returns the number of rows returned by a SELECT statement, including NULL and duplicates.
Lowest values: count(distinct activity_provider_id)
Reasoning: Less activity providers per activity per area*
Same: Unsure - Could someone just point me in the right direction?
count(*) takes in account all rows in the table, while count(some_col) only counts non-null values of some_col.
Since activity_id is a non nullable colum, one would expect the following expressions return the same, "highest" count:
count(activity_id)
count(*)
As for wich expression returns the lowest count out of the three remaining choices, it is not really possible to tell for sure from the information provided in the question. If actually depends whether that are more, or less, distinct areas than activity providers.
There even is an edge case where all expressions return the same case, if all activity providers (resp. areas) are not null and unique in the table.

Selecting only such groups that contain certain value

First of all, even though this SQL: How do you select only groups that do not contain a certain value? thread is almost identical to my problem, it doesn't fully dissipate my confusion about the problem.
Let's have a table "Contacts" like this one:
+----------------------+
| Department FirstName |
+----------------------+
| 100 Thomas |
| 200 Peter |
| 100 Jerry |
+----------------------+
First, I want to group the rows by the department number and show number of rows in each displayed group. This, I believe, can be easily done by the following query.
SELECT Department, Count(*) As "Rows_in_group"
FROM Contacts
GROUP BY Department
This outputs 2 groups. First with dep.no. 100 containing 2 rows, second with 200 containing only one row.
But then, I want to extend the query to exclude any group that doesn't contain certain value in certain column (e.g. Thomas in FirstName). Here are my questions:
1) Reading the above-mentioned thread I was able to come up with this, which seems to work correctly:
SELECT Department, Count(*) As "Rows_in_group"
FROM Contacts
WHERE Department IN (SELECT Department FROM Contacts WHERE FirstName = "Thomas")
GROUP BY Department
Q: How does this work? I understand the "WHERE Department IN" part, but then I'd expect a value, but instead another nested query is included, which to me doesn't make much sense as I'm only beginner with SQL.
2) By accident I was able to come up with another query that also seems to work, but feels weird and I also don't understand its workings.
SELECT Department, Count(*) As "Rows_in_group"
FROM Contacts
GROUP BY Department
HAVING NOT SUM(FirstName = "Thomas") = 0
Q: How does this work? Why alteration: HAVING SUM(FirstName = "Thomas") > 0 doesn't work?
3) Q: Is there any simple and correct way to do this using the HAVING clause?
I expected, that simple "HAVING FirstName='Thomas'" after the GROUP BY would do the trick as it seems to follow a common language, but it does not.
Note that I want the whole groups to be chosen by the query so "WHERE FirstName='Thomas'" isn't s solution for my problem as it excludes all the rows that don't satisfy the condition before the grouping takes place (at least the way I understand it).
Q: How does this work? I understand the "WHERE Department IN" part,
but then I'd expect a value, but instead another nested query is
included, which to me doesn't make much sense as I'm only beginner
with SQL.
The nested query returns values which are used to match against Department
2) By accident I was able to come up with another query that also
seems to work, but feels weird and I also don't understand its
workings.
HAVING NOT SUM(FirstName = "Thomas") = 0
"Feels weird" because, well, it is. This is not a place for the SUM function.
EDIT: Why does this work?
The expression FirstName = "Thomas" gets evaluated as true or false (known as a Boolean expression). True numerically is equal to 1 and False converts to 0 (zero). By including SUM you then calculated the totals so really zero (still) means false and "not zero" is true. Then to make it weird(er) you included NOT which negated the whole thing and it becomes NOT TRUE = 0 or FALSE = FALSE (which is of course... TRUE)!!
EDIT: I think what could be more helpful to you is consideration of when to use WHERE and when to use HAVING (instead of the Boolean magic taking place).
From this answer:
WHERE clause introduces a condition on individual rows; HAVING clause introduces a condition on aggregations, i.e. results of selection where a single result, such as count, average, min, max, or sum, has been produced from multiple rows.
WHERE was appropriate for your example because first you want to "only return rows WHERE Department IN (100)" and then you want to "group those rows by Department" and get a COUNT of how many rows had been selected.

Why doesn't SQL support "= null" instead of "is null"?

I'm not asking if it does. I know that it doesn't.
I'm curious as to the reason. I've read support docs such as as this one on Working With Nulls in MySQL but they don't really give any reason. They only repeat the mantra that you have to use "is null" instead.
This has always bothered me. When doing dynamic SQL (those rare times when it has to be done) it would be so much easier to pass "null" into where clause like this:
#where = "where GroupId = null"
Which would be a simple replacement for a regular variable. Instead we have to use if/else blocks to do stuff like:
if #groupId is null then
#where = "where GroupId is null"
else
#where = "where GroupId = #groupId"
end
In larger more-complicated queries, this is a huge pain in the neck. Is there a specific reason that SQL and all the major RDBMS vendors don't allow this? Some kind of keyword conflict or value conflict that it would create?
Edit:
The problem with a lot of the answers (in my opinion) is that everyone is setting up an equivalency between null and "I don't know what the value is". There's a huge difference between those two things. If null meant "there's a value but it's unknown" I would 100% agree that nulls couldn't be equal. But SQL null doesn't mean that. It means that there is no value. Any two SQL results that are null both have no value. No value does not equal unknown value. Two different things. That's an important distinction.
Edit 2:
The other problem I have is that other HLLs allow null=null perfectly fine and resolve it appropriately. In C# for instance, null=null returns true.
The reason why it's off by default is that null is really not equal to null in a business sense. For example, if you were joining orders and customers:
select * from orders o join customers c on c.name = o.customer_name
It wouldn't make a lot of sense to match orders with an unknown customer with customers with an unknown name.
Most databases allow you to customize this behaviour. For example, in SQL Server:
set ansi_nulls on
if null = null
print 'this will not print'
set ansi_nulls off
if null = null
print 'this should print'
Equality is something that can be absolutely determined. The trouble with null is that it's inherently unknown. If you follow the truth table for three-value logic, null combined with any other value is null - unknown. Asking SQL "Is my value equal to null?" would be unknown every single time, even if the input is null. I think the implementation of IS NULL makes it clear.
It's a language semantic.
Null is the lack of a value.
is null makes sense to me. It says, "is lacking a value" or "is unknown". Personally I've never asked somebody if something is, "equal to lacking a value".
I can't help but feel that you're still not satisfied with the answers that have been given so far, so I thought I'd try another tack. Let's have an example (no, I've no idea why this specific example has come into my head).
We have a table for employees, EMP:
EMP
---
EMPNO GIVENNAME
E0001 Boris
E0002 Chris
E0003 Dave
E0004 Steve
E0005 Tony
And, for whatever bizarre reason, we're tracking what colour trousers each employee chooses to wear on a particular day (TROUS):
TROUS
-----
EMPNO DATE COLOUR
E0001 20110806 Brown
E0002 20110806 Blue
E0003 20110806 Black
E0004 20110806 Brown
E0005 20110806 Black
E0001 20110807 Black
E0003 20110807 Black
E0004 20110807 Grey
I could go on. We write a query, where we want to know the name of every employee, and what colour trousers they had on on the 7th August:
SELECT e.GIVENNAME,t.COLOUR
FROM
EMP e
LEFT JOIN
TROUS t
ON
e.EMPNO = t.EMPNO and
t.DATE = '20110807'
And we get the result set:
GIVENNAME COLOUR
Chris NULL
Steve Grey
Dave Black
Boris Black
Tony NULL
Now, this result set could be in a view, or CTE, or whatever, and we might want to continue asking questions about these results, using SQL. What might some of these questions be?
Were Dave and Boris wearing the same colour trousers on that day? (Yes, Black==Black)
Were Dave and Steve wearing the same colour trousers on that day? (No, Black!=Grey)
Were Boris and Tony wearing the same colour trousers on that day? (Unknown - we're trying to compare with NULL, and we're following the SQL rules)
Were Boris and Tony not wearing the same colour trousers on that day? (Unknown - we're again comparing to NULL, and we're following SQL rules)
Were Chris and Tony wearing the same colour trousers on that day? (Unknown)
Note, that you're already aware of specific mechanisms (e.g. IS NULL) to force the outcomes you want, if you've designed your database to never use NULL as a marker for missing information.
But in SQL, NULL has been given two roles (at least) - to mark inapplicable information (maybe we have complete information in the database, and Chris and Tony didn't turn up for work that day, or did but weren't wearing trousers), and to mark missing information (Chris did turn up that day, we just don't have the information recorded in the database at this time)
If you're using NULL purely as a marker of inapplicable information, I assume you're avoiding such constructs as outer joins.
I find it interesting that you've brought up NaN in comments to other answers, without seeing that NaN and (SQL) NULL have a lot in common. The biggest difference between them is that NULL is intended for use across the system, no matter what data type is involved.
You're biggest issue seems to be that you've decided that NULL has a single meaning across all programming languages, and you seem to feel that SQL has broken that meaning. In fact, null in different languages frequently has subtly different meanings. In some languages, it's a synonym for 0. In others, not, so the comparison 0==null will succeed in some, and fail in others. You mentioned VB, but VB (assuming you're talking .NET versions) does not have null. It has Nothing, which again is subtly different (it's the equivalent in most respects of the C# construct default(T)).
The concept is that NULL is not an equitable value. It denotes the absence of a value.
Therefore, a variable or a column can only be checked if it IS NULL, but not if it IS EQUAL TO NULL.
Once you open up arithmetic comparisions, you may have to contend with IS GREATER THAN NULL, or IS LESS THAN OR EQUAL TO NULL
NULL is unknown. It is neither true nor false so when you are comparing anything to unknown, the only answer is "unknown" Much better article on wikipedia http://en.wikipedia.org/wiki/Null_(SQL)
Because in ANSI SQL, null means "unknown", which is not a value. As such, it doesn't equal anything; you can just evaluate the value's state (known or unknown).
a. Null is not the "lack of a value"
b. Null is not "empty"
c. Null is not an "unset value"
It's all of the above and none of the above.
By technical rights, NULL is an "unknown value". However, like uninitialized pointers in C/C++, you don't really know what your pointing at. With databases, they allocate the space but do not initialize the value in that space.
So, it is an "empty" space in the sense that it's not initialized. If you set a value to NULL, the original value stays in that storage location. If it was originally an empty string (for example), it will remain that.
It's a "lack of a value" in the fact that it hasn't been set to what the database deems a valid value.
It's an "unset value" in that if the space was just allocated, the value that is there has never been set.
"Unknown" is the closest that we can truly come to knowing what to expect when we examine a NULL.
Because of that, if we try to compare this "unknown" value, we will get a comparison that
a) may or may not be valid
b) may or may not have the result we expect
c) may or may not crash the database.
So, the DBMS systems (long ago) decided that it doesn't even make sense to use equality when it comes to NULL.
Therefore, "= null" makes no sense.
In addition to all that has already been said, I wish to stress that what you write in your first line is wrong. SQL does support the “= NULL” syntax, but it has a different semantic than “IS NULL” – as can be seen in the very piece of documentation you linked to.
I agree with the OP that
where column_name = null
should be syntactic sugar for
where column_name is null
However, I do understand why the creators of SQL wanted to make the distinction. In three-valued logic (IMO this is a misnomer), a predicate can return two values (true or false) OR unknown which is technically not a value but just a way to say "we don't know which of the two values this is". Think about the following predicate in terms of three-valued logic:
A == B
This predicate tests whether A is equal to B. Here's what the truth table looks like:
T U F
-----
T | T U F
U | U U U
F | F U T
If either A or B is unknown, the predicate itself always returns unknown, regardless of whether the other one is true or false or unknown.
In SQL, null is a synonym for unknown. So, the SQL predicate
column_name = null
tests whether the value of column_name is equal to something whose value is unknown, and returns unknown regardless of whether column_name is true or false or unknown or anything else, just like in three-valued logic above. SQL DML operations are restricted to operating on rows for which the predicate in the where clause returns true, ignoring rows for which the predicate returns false or unknown. That's why "where column_name = null" doesn't operate on any rows.
NULL doesn't equal NULL. It can't equal NULL. It doesn't make sense for them to be equal.
A few ways to think about it:
Imagine a contacts database, containing fields like FirstName, LastName, DateOfBirth and HairColor. If I looked for records WHERE DateOfBirth = HairColor, should it ever match anything? What if someone's DateOfBirth was NULL, and their HairColor was too? An unknown hair color isn't equal to an unknown anything else.
Let's join the contacts table with purchases and product tables. Let's say I want to find all the instances where a customer bought a wig that was the same color as their own hair. So I query WHERE contacts.HairColor = product.WigColor. Should I get matches between every customer I don't know the hair color of and products that don't have a WigColor? No, they're a different thing.
Let's consider that NULL is another word for unknown. What's the result of ('Smith' = NULL)? The answer is not false, it's unknown. Unknown is not true, therefore it behaves like false. What's the result of (NULL = NULL)? The answer is also unknown, therefore also effectively false. (This is also why concatenating a string with a NULL value makes the whole string become NULL -- the result really is unknown.)
Why don't you use the isnull function?
#where = "where GroupId = "+ isnull(#groupId,"null")

SQL comparison operators

are these two statements the same?
query 1: WHERE salary > 999;
query 2: WHERE salary >= 1000;
I thought they were, but apparently according to my peers they are not (Though they have failed to explain why).
That's not necessarily the same. If you're storing doubles, when you do:
WHERE salary >= 1000;
you're not taking in count all values that are between 999 and 1000 (eg. 999.50)
Otherwise, if you're working with integers, that's true, not only in programming, but also in maths.
n > k <=> n >= k+1
The two queries will give the same results, if salary is an integral type. But if salary is some real type, then the results will be different.
This is entirely dependant on the datatype of salary.
If salary is an INT or a BIGINT then yes they will yield the same results.
If salary is pretty much any other datatype, the first once will return results for 999.9, but the second one won't.

Oracle: '= ANY()' vs. 'IN ()'

I just stumbled upon something in ORACLE SQL (not sure if it's in others), that I am curious about. I am asking here as a wiki, since it's hard to try to search symbols in google...
I just found that when checking a value against a set of values you can do
WHERE x = ANY (a, b, c)
As opposed to the usual
WHERE x IN (a, b, c)
So I'm curious, what is the reasoning for these two syntaxes? Is one standard and one some oddball Oracle syntax? Or are they both standard? And is there a preference of one over the other for performance reasons, or ?
Just curious what anyone can tell me about that '= ANY' syntax.
ANY (or its synonym SOME) is a syntax sugar for EXISTS with a simple correlation:
SELECT *
FROM mytable
WHERE x <= ANY
(
SELECT y
FROM othertable
)
is the same as:
SELECT *
FROM mytable m
WHERE EXISTS
(
SELECT NULL
FROM othertable o
WHERE m.x <= o.y
)
With the equality condition on a not-nullable field, it becomes similar to IN.
All major databases, including SQL Server, MySQL and PostgreSQL, support this keyword.
IN- Equal to any member in the list
ANY- Compare value to **each** value returned by the subquery
ALL- Compare value to **EVERY** value returned by the subquery
<ANY() - less than maximum
>ANY() - more than minimum
=ANY() - equivalent to IN
>ALL() - more than the maximum
<ALL() - less than the minimum
eg:
Find the employees who earn the same salary as the minimum salary for each department:
SELECT last_name, salary,department_id
FROM employees
WHERE salary IN (SELECT MIN(salary)
FROM employees
GROUP BY department_id);
Employees who are not IT Programmers and whose salary is less than that of any IT programmer:
SELECT employee_id, last_name, salary, job_id
FROM employees
WHERE salary <ANY
(SELECT salary
FROM employees
WHERE job_id = 'IT_PROG')
AND job_id <> 'IT_PROG';
Employees whose salary is less than the salary ofall employees with a job ID of IT_PROG and whose job is not IT_PROG:
SELECT employee_id,last_name, salary,job_id
FROM employees
WHERE salary <ALL
(SELECT salary
FROM employees
WHERE job_id = 'IT_PROG')
AND job_id <> 'IT_PROG';
....................
Hope it helps.
-Noorin Fatima
To put it simply and quoting from O'Reilly's "Mastering Oracle SQL":
"Using IN with a subquery is functionally equivalent to using ANY, and returns TRUE if a match is found in the set returned by the subquery."
"We think you will agree that IN is more intuitive than ANY, which is why IN is almost always used in such situations."
Hope that clears up your question about ANY vs IN.
I believe that what you are looking for is this:
http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96533/opt_ops.htm#1005298
(Link found on Eddie Awad's Blog)
To sum it up here:
last_name IN ('SMITH', 'KING',
'JONES')
is transformed into
last_name = 'SMITH' OR last_name =
'KING' OR last_name = 'JONES'
while
salary > ANY (:first_sal,
:second_sal)
is transformed into
salary > :first_sal OR salary >
:second_sal
The optimizer transforms a condition
that uses the ANY or SOME operator
followed by a subquery into a
condition containing the EXISTS
operator and a correlated subquery
The ANY syntax allows you to write things like
WHERE x > ANY(a, b, c)
or event
WHERE x > ANY(SELECT ... FROM ...)
Not sure whether there actually is anyone on the planet who uses ANY (and its brother ALL).
A quick google found this http://theopensourcery.com/sqlanysomeall.htm
Any allows you to use an operator other than = , in most other respect (special cases for nulls) it acts like IN. You can think of IN as ANY with the = operator.
This is a standard. The SQL 1992 standard states
8.4 <in predicate>
[...]
<in predicate> ::=
<row value constructor>
[ NOT ] IN <in predicate value>
[...]
2) Let RVC be the <row value constructor> and let IPV be the <in predicate value>.
[...]
4) The expression
RVC IN IPV
is equivalent to
RVC = ANY IPV
So in fact, the <in predicate> behaviour definition is based on the 8.7 <quantified comparison predicate>. In Other words, Oracle correctly implements the SQL standard here
Perhaps one of the linked articles points this out, but isn't it true that when looking for a match (=) the two return the same thing. However, if looking for a range of answers (>, <, etc) you cannot use "IN" and would have to use "ANY"...
I'm a newb, forgive me if I've missed something obvious...
MySql clears up ANY in it's documentation pretty well:
The ANY keyword, which must follow a comparison operator, means
“return TRUE if the comparison is TRUE for ANY of the values in the
column that the subquery returns.” For example:
SELECT s1 FROM t1 WHERE s1 > ANY (SELECT s1 FROM t2);
Suppose that there is a row in table t1 containing (10). The
expression is TRUE if table t2 contains (21,14,7) because there is a
value 7 in t2 that is less than 10. The expression is FALSE if table
t2 contains (20,10), or if table t2 is empty. The expression is
unknown (that is, NULL) if table t2 contains (NULL,NULL,NULL).
https://dev.mysql.com/doc/refman/5.5/en/any-in-some-subqueries.html
Also Learning SQL by Alan Beaulieu states the following:
Although most people prefer to use IN, using = ANY is equivalent to
using the IN operator.
Why I always use any is because in some oracle or mssql versions IN list is limited by 1000/999 elements. While = any () is not limited by 1000.
Nobody likes their sql query crashing a web request.
So there is a practical difference.
Second reason it is the more modern form. As it correlates with expressions like > all (...).
Third reason is somehow for me as non-native English speaker it appears more natural to use "any" and "all" than to use IN.