why is null not equal to null false - sql

I was reading this article:
Get null == null in SQL
And the consensus is that when trying to test equality between two (nullable) sql columns, the right approach is:
where ((A=B) OR (A IS NULL AND B IS NULL))
When A and B are NULL, (A=B) still returns FALSE, since NULL is not equal to NULL. That is why the extra check is required.
What about when testing inequalities? Following from the above discussion, it made me think that to test inequality I would need to do something like:
WHERE ((A <> B) OR (A IS NOT NULL AND B IS NULL) OR (A IS NULL AND B IS NOT NULL))
However, I noticed that that is not necessary (at least not on informix 11.5), and I can just do:
where (A<>B)
If A and B are NULL, this returns FALSE. If NULL is not equal to NULL, then shouldn't this return TRUE?
EDIT
These are all good answers, but I think my question was a little vague. Allow me to rephrase:
Given that either A or B can be NULL, is it enough to check their inequality with
where (A<>B)
Or do I need to explicitly check it like this:
WHERE ((A <> B) OR (A IS NOT NULL AND B IS NULL) OR (A IS NULL AND B IS NOT NULL))
REFER to this thread for the answer to this question.

Because that behavior follows established ternary logic where NULL is considered an unknown value.
If you think of NULL as unknown, it becomes much more intuitive:
Is unknown a equal to unknown b? There's no way to know, so: unknown.

relational expressions involving NULL actually yield NULL again
edit
here, <> stands for arbitrary binary operator, NULL is the SQL placeholder, and value is any value (NULL is not a value):
NULL <> value -> NULL
NULL <> NULL -> NULL
the logic is: NULL means "no value" or "unknown value", and thus any comparison with any actual value makes no sense.
is X = 42 true, false, or unknown, given that you don't know what value (if any) X holds? SQL says it's unknown. is X = Y true, false, or unknown, given that both are unknown? SQL says the result is unknown. and it says so for any binary relational operation, which is only logical (even if having NULLs in the model is not in the first place).
SQL also provides two unary postfix operators, IS NULL and IS NOT NULL, these return TRUE or FALSE according to their operand.
NULL IS NULL -> TRUE
NULL IS NOT NULL -> FALSE

All comparisons involving null are undefined, and evaluate to false. This idea, which is what prevents null being evaluated as equivalent to null, also prevents null being evaluated as NOT equivalent to null.

The short answer is... NULLs are weird, they don't really behave like you'd expect.
Here's a great paper on how NULLs work in SQL. I think it will help improve your understanding of the topic. I think the sections on handling null values in expressions will be especially useful for you.
http://www.oracle.com/technology/oramag/oracle/05-jul/o45sql.html

The default (ANSI) behaviour of nulls within an expression will result in a null (there are enough other answers with the cases of that).
There are however some edge cases and caveats that I would place when dealing with MS Sql Server that are not being listed.
Nulls within a statement that is grouping values together will be considered equal and be grouped together.
Null values within a statement that is ordering them will be considered equal.
Null values selected within a statement that is using distinct will be considered equal when evaluating the distinct aspect of the query
It is possible in SQL Server to override the expression logic regarding the specific Null = Null test, using the SET ANSI_NULLS OFF, which will then give you equality between null values - this is not a recommended move, but does exist.
SET ANSI_NULLS OFF
select result =
case
when null=null then 'eq'
else 'ne'
end
SET ANSI_NULLS ON
select result =
case
when null=null then 'eq'
else 'ne'
end

Here is a Quick Fix
ISNULL(A,0)=ISNULL(B,0)
0 can be changed to something that can never happen in your data

"Is unknown a equal to unknown b? There's no way to know, so: unknown."
The question was : why does the comparison yield FALSE ?
Given three-valued logic, it would indeed be sensible for the comparison to yield UNKNOWN (not FALSE). But SQL does yield FALSE, and not UNKNOWN.
One of the myriads of perversities in the SQL language.
Furthermore, the following must be taken into account :
If "unkown" is a logical value in ternary logic, then it ought to be the case that an equality comparison between two logical values that both happen to be (the value for) "unknown", then that comparison ought to yield TRUE.
If the logical value is itself unknown, then obviously that cannot be represented by putting the value "unknown" there, because that would imply that the logical value is known (to be "unknown"). That is, a.o., how relational theory proves that implementing 3-valued logic raises the requirement for a 4-valued logic, that a 4 valued logic leads to the need for a 5-valued logic, etc. etc. ad infinitum.

Related

Why doesn't NULL propagate in boolean expression in SQL?

In SQL, every operation which involves an operand with NULL yields NULL (with the obvious exceptions of IS NULL or IS NOT NULL operators). However, NULL does not propagate with AND or OR operators which may return TRUE or FALSE. For example, the following in MariaDB 10.4 returns NULL and 0 respectively:
select 0 & null, 0 and null
The difference is that the first is a bitwise AND, the second is a boolean AND. Why NULL does not propagate in boolean operation?
A NULL value has a whole series of possible meanings. IIRC Chris Date found about 7 different interpretations.
A very common interpretation of NULL is: "I don't know". Another one is: "Not applicable".
So let's try to evaluate a condition with the "I don't know" interpretation of a NULL value.
As an example suppose there are two persons. And you want to compare their age. Person A happens to be 31 years old. In case of the other person, person B, you don't know.
The question if A is as old as B cannot be answered positively. But it can't be denied either. In fact, you don't know. Hence the truth value here is NULL.
If we add the ages of both persons, we run into the same problem. We don't have a clue about the sum of their ages. Again the resulting value is NULL.
This is why you'll have to define how to treat NULL values. A database system cannot know this.
We have 0 and null. In MariaDB, 0 and FALSE are synonymous. So we have FALSE AND NULL. But FALSE AND <anything> is always FALSE - there's no doubt that no matter what value might be substituted here, nothing can make this statement TRUE now.
So we short-circuit and return the FALSE/0 result. Similarly, 1 OR NULL should return 1.
NULL has the semantics of "unknown" value. It does not have the semantics of "missing". This is a nuance.
But, it is "propagated" by AND and OR, just not as you might expect. So:
true AND NULL --> NULL, because the value would depend on what NULL is
false AND NULL --> false, because the first value requires that the result is false
In WHERE and WHEN clauses, NULL is treated as "false". However, in CHECK constraints, NULL is treated as "true" -- that is, only explicitly false values fail the NULL constraint.
Otherwise, you are correct that almost all operations with NULL return NULL. The & operator is a bitwise operator that has nothing to do with boolean values. It is just another "mathematical" operator, such as +, or *, so the value is NULL when any operand is NULL.
One very important exception is the NULL-safe comparison operator, <=>.

Simple where clause condition involving NULL

I have a query that needs to exclude both Null and Blank Values, but for some reason I can't work out this simple logic in my head.
Currently, my code looks like this:
WHERE [Imported] = 0 AND ([Value] IS NOT NULL **OR** [Value] != '')
However, should my code look like this to exclude both condition:
WHERE [Imported] = 0 AND ([Value] IS NOT NULL **AND** [Value] != '')
For some reason I just can't sort this in my head properly. To me it seems like both would work.
In your question you wrote the following:
have a query that needs to exclude both Null and Blank Values
So you have answered yourself, the AND query is the right query:
WHERE [Imported] = 0 AND ([Value] IS NOT NULL AND [Value] != '')
Here is an extract from the ANSI SQL Draft 2003 that I borrowed from this question:
6.3.3.3 Rule evaluation order
[...]
Where the precedence is not determined by the Formats or by
parentheses, effective evaluation of expressions is generally
performed from left to right. However, it is
implementation-dependent whether expressions are actually evaluated left to right, particularly when operands or operators might
cause conditions to be raised or if the results of the expressions
can be determined without completely evaluating all parts of the
expression.
You don't specify what kind of database system you are using but the concept of short-circuit evaluation which is explained in the previous paragraph applies to all major SQL versions (T-SQL, PL/SQL etc...)
Short-circuit evaluation means that once an expression has been successfully evaluated it will immediately exit the condition and stop evaluating the other expressions, applied to your question:
If value is null you want to exit the condition, that's why it should be the first expression (from left to right) but if it isn't null it should also not be empty, so it has to be NOT NULL and NOT EMPTY.
This case is a bit tricky because you cannot have a non empty string that is also null so the OR condition will also work but you will do an extra evaluation because short-circuit evaluation will never exit in the first expression:
Value is null but we would always need to check that value is also not an empty string (value is null or value is not an empty string).
In this second case, you may get an exception because the expression [Value] != '' may be checked on a null object.
So I think AND is the right answer. Hope it helps.
If the value was numeric and you didn't want either 1 or 2, you would write that condition as
... WHERE value != 1 AND value != 2
An OR would always be true in this case. For instance a value of 1 would return true for the check against 2 - and then the OR-check would return true, as at least one of the conditions evaluated to true.
When yu also want to check against null values, the situation is a bit more complicated. A check against a null value always fails: value != '' is false when value is null. That is why there is a special IS NULL or IS NOT NULL test.

is null vs. equals null

I have a rough understanding of why = null in SQL and is null are not the same, from questions like this one.
But then, why is
update table
set column = null
a valid SQL statement (at least in Oracle)?
From that answer, I know that null can be seen as somewhat "UNKNOWN" and therefore and sql-statement with where column = null "should" return all rows, because the value of column is no longer an an unknown value. I set it to null explicitly ;)
Where am I wrong/ do not understand?
So, if my question is maybe unclear:
Why is = null valid in the set clause, but not in the where clause of an SQL statement?
SQL doesn't have different graphical signs for assignment and equality operators like languages such as c or java have. In such languages, = is the assignment operator, while == is the equality operator. In SQL, = is used for both cases, and interpreted contextually.
In the where clause, = acts as the equality operator (similar to == in C). I.e., it checks if both operands are equal, and returns true if they are. As you mentioned, null is not a value - it's the lack of a value. Therefore, it cannot be equal to any other value.
In the set clause, = acts as the assignment operator (similar to = in C). I.e., it sets the left operand (a column name) with the value of the right operand. This is a perfectly legal statement - you are declaring that you do not know the value of a certain column.
They completely different operators, even if you write them the same way.
In a where clause, is a comparsion operator
In a set, is an assignment operator
The assigment operator allosw to "clear" the data in the column and set it to the "null value" .
In the set clause, you're assigning the value to an unknown, as defined by NULL. In the where clause, you're querying for an unknown. When you don't know what an unknown is, you can't expect any results for it.

why does 10/NULL evaluate to null?

In SQL , why does 10/NULL evaluate to NULL (or unknown) ? Example :
if((10/NULL) is NULL)
DBMS_OUTPUT.PUT_LINE("Null.");
However , 1 = NULL being a COMPARISON is considered as FALSE. Shouldn't 10/NULL also be considered as FALSE ?
I am referring to SQL only . Not any DBMS in particular. And it might be a duplicate but I didn't know what keywords to put in search for this query.
Shouldn't 10/NULL also be considered as FALSE?
No, because:
Any arithmetic expression containing a null always evaluates to null. For example, null added to 10 is null. In fact, all operators (except concatenation) return null when given a null operand.
Emphasis mine, taken from the Oracle manual: http://docs.oracle.com/cd/E11882_01/server.112/e26088/sql_elements005.htm#i59110
And this is required by the SQL standard.
Edit, as the question was for RDBMS in general:
SQL Server
When SET ANSI_NULLS is ON, an operator that has one or two NULL expressions returns UNKNOWN
Link to the the manual:
MySQL
An expression that contains NULL always produces a NULL value unless otherwise indicated in the documentation for a particular function or operator
Link to the manual
DB2
if either operand can be null, the result can be null, and if either is null, the result is the null value
Link to the manual:
PostgreSQL
Unfortunately I could not find such an explicit statement in the PostgreSQL manual, although I sure it behaves the same.
Warning: The "(except concatenation)" is an Oracle only and non-standard exception. (The empty string and NULL are almost identical in Oracle). Concatenating nulls gives null in all other DBMS.
1 = null is not null. It is actually unknown. As well as any other null operation.
The equality predicate 1 = NULL evaluates to NULL. But NULL in a boolean comparison is considered false.
If you do something like NOT( 1 = NULL ), 1 = NULL evaluates to NULL, NOT( NULL ) evaluates to NULL and so the condition as a whole ends up evaluating to false.
Oracle has a section in their documentation on handling NULL values in comparisons and conditional statements-- other databases will handle things in an very similar manner.
10/something means that you are counting how much "something" will be in 10
in this case you're counting how much "nothing" will be in 10 - that's infinity, unknown..
1 = NULL is false because one does not equal nothing
The NULLIF function accepts two parameters. If the first parameter is equal to the second parameter, NULLIF returns Null. Otherwise, the value of the first parameter is returned.
NULLIF(value1, value2)
NVL
The NVL function accepts two parameters. It returns the first non-NULL parameter or NULL if all parameters are NULL.
also check this conditional outcomes:
This "null equals UNKNOWN truth value" proposition introduces an inconsistency into SQL 3VL. One major problem is that it contradicts a basic property of nulls, the property of propagation. Nulls, by definition, propagate through all SQL expressions. The Boolean truth values do not have this property. Consider the following scenarios in SQL:1999, in which two Boolean truth values are combined into a compound predicate. According to the rules of SQL 3VL, and as shown in the 3VL truth table shown earlier in this article, the following statements hold:
( TRUE OR UNKNOWN ) → TRUE
( FALSE AND UNKNOWN ) → FALSE
However, because nulls propagate, treating null as UNKNOWN results in the following logical inconsistencies in SQL 3VL:
( TRUE OR NULL ) → NULL ( = UNKNOWN )
( FALSE AND NULL ) → NULL ( = UNKNOWN )
The SQL:1999 standard does not define how to deal with this inconsistency, and results could vary between implementations. Because of these inconsistencies and lack of support from vendors the SQL Boolean datatype did not gain widespread acceptance. Most SQL DBMS platforms now offer their own platform-specific recommendations for storing Boolean-type data.
Note that in the PostgreSQL implementation of SQL, the null value is used to represent all UNKNOWN results and the following evaluations occur:
( TRUE OR NULL ) → TRUE
( FALSE AND NULL ) → FALSE
( FALSE OR NULL ) IS NULL → TRUE
( TRUE AND NULL ) IS NULL → TRUE

In most SQL implementations, as opposed to standard programming languages, why doesn't x != null return true?

Let's suppose that x is some variable that has any value other than null, say 4, as an example. What should the following expression return?
x != null
In just about every programming language I have ever worked with (C#, Javascript, PHP, Python), this expression, or an equivalent expression in that language, evaluates to true.
SQL implementations, on the other hand, all seem to handle this quite differently. If one or both operands of the inequality operator are NULL, either NULL or False will be returned. This is basically the opposite of the behavior that most programming languages use, and it is extremely unintuitive to me.
Why is the behavior in SQL like this? What is it about relationaly database logic that makes null behave so much differently than it does in general purpose programming?
The null in most programming languages is considered "known", while NULL in SQL is considered "unknown".
So X == null compares X with a known value and the result is known (true or false).
But X = NULL compares X with an unknown value and the result is unknown (i.e. NULL, again). As a consequence, we need a special operator IS [NOT] NULL to test for it.
I'm guessing at least part of the motivation for such NULLs would be the behavior of foreign keys. When a child endpoint of a foreign key is NULL, it shouldn't match any parent, even if the parent is NULL (which is possible if parent is UNIQUE instead of primary key). Unfortunately, this brings many more gotchas than it solves and I personally think SQL should have gone the route of the "known" null and avoided this monkey business altogether.
Even E. F. Codd, inventor or relational model, later indicated that the traditional NULL is not optimal. But for historical reasons, we are pretty much stuck with it.
the reason is that the concept of equality doesn't apply to null. it's not logically true to say that this null does or does not equal this other null.
so, that's all fine for a theoretical reason, but for the sake of convenience, why does sql not allow your to say (x != null)?
well, the reason is because sometimes you want to handle nulls differently.
if I say (columnA = columnB) for example, should that return true if both columns are null?
if I say (columnA != columnB) - should it give the same result when column A is "a" and column B is null, and when column A is "a" and column B is "b"?
the people who made sql decided that distinction was important and so they wrote it to treat the 2 cases differently.
the wikipedia page on this has a pretty decent writeup - http://en.wikipedia.org/wiki/Null_%28SQL%29
well in sql engines you usually don't use the "=" operator but "IS", which then makes it more intuitive.
SELECT 4 IS NULL FROM dual;
> 0
SELECT 4 IS NOT NULL FROM dual;
> 1
NULL doesn't stand for null pointer, it's just not the same concept at all.
sql NULL is a I don't know the value flag, it's not a "there's no pointer" flag. You just should not compare them, they shouldn't be used the same way. This is pretty unintuitive you're right, they should have named it differently.
In SQL, NULL means "an unknown value".
If you say x != NULL you are saying "is the value of x unequal to an unknown value". Well, since we don't know what unknown value is, we don't know if x is equal to it or not. So the answer is "I don't know".
Similarly:
x = NULL OR 1=2 -- Unknown. 1=2 is not true, but we don't know about x=NULL
x = NULL OR 1=1 -- True. We know that at least 1=1 is true, so the OR is fulfulled regardless.
x = NULL AND 1=1 -- Unknown. We want them both to be true to fulful the AND
x = NULL AND 1=2 -- False. We know 1=2 is false, so the AND is not fulfilled regardless.
Also
-- Neither statement will select rows where x is null
select x from T where x = 1
select x from T where x != 1
The only way to check a null is to specificaly ask "is it true that we don't know what the value of x is". That has a yes or no answer, and uses the IS keyword.
If you just want nulls to be treated as zero, or another value, you can use the COALESCE or ISNULL function.
COALESCE(NULL, 1) -- 1
COALESCE(NULL, NULL, 1) -- Also 1
COALESCE(x, y, z, 0) -- x, unless it is null, then y, unless it is null, then z, unless it is null in which case 0.