In most SQL implementations, as opposed to standard programming languages, why doesn't x != null return true? - sql

Let's suppose that x is some variable that has any value other than null, say 4, as an example. What should the following expression return?
x != null
In just about every programming language I have ever worked with (C#, Javascript, PHP, Python), this expression, or an equivalent expression in that language, evaluates to true.
SQL implementations, on the other hand, all seem to handle this quite differently. If one or both operands of the inequality operator are NULL, either NULL or False will be returned. This is basically the opposite of the behavior that most programming languages use, and it is extremely unintuitive to me.
Why is the behavior in SQL like this? What is it about relationaly database logic that makes null behave so much differently than it does in general purpose programming?

The null in most programming languages is considered "known", while NULL in SQL is considered "unknown".
So X == null compares X with a known value and the result is known (true or false).
But X = NULL compares X with an unknown value and the result is unknown (i.e. NULL, again). As a consequence, we need a special operator IS [NOT] NULL to test for it.
I'm guessing at least part of the motivation for such NULLs would be the behavior of foreign keys. When a child endpoint of a foreign key is NULL, it shouldn't match any parent, even if the parent is NULL (which is possible if parent is UNIQUE instead of primary key). Unfortunately, this brings many more gotchas than it solves and I personally think SQL should have gone the route of the "known" null and avoided this monkey business altogether.
Even E. F. Codd, inventor or relational model, later indicated that the traditional NULL is not optimal. But for historical reasons, we are pretty much stuck with it.

the reason is that the concept of equality doesn't apply to null. it's not logically true to say that this null does or does not equal this other null.
so, that's all fine for a theoretical reason, but for the sake of convenience, why does sql not allow your to say (x != null)?
well, the reason is because sometimes you want to handle nulls differently.
if I say (columnA = columnB) for example, should that return true if both columns are null?
if I say (columnA != columnB) - should it give the same result when column A is "a" and column B is null, and when column A is "a" and column B is "b"?
the people who made sql decided that distinction was important and so they wrote it to treat the 2 cases differently.
the wikipedia page on this has a pretty decent writeup - http://en.wikipedia.org/wiki/Null_%28SQL%29

well in sql engines you usually don't use the "=" operator but "IS", which then makes it more intuitive.
SELECT 4 IS NULL FROM dual;
> 0
SELECT 4 IS NOT NULL FROM dual;
> 1
NULL doesn't stand for null pointer, it's just not the same concept at all.
sql NULL is a I don't know the value flag, it's not a "there's no pointer" flag. You just should not compare them, they shouldn't be used the same way. This is pretty unintuitive you're right, they should have named it differently.

In SQL, NULL means "an unknown value".
If you say x != NULL you are saying "is the value of x unequal to an unknown value". Well, since we don't know what unknown value is, we don't know if x is equal to it or not. So the answer is "I don't know".
Similarly:
x = NULL OR 1=2 -- Unknown. 1=2 is not true, but we don't know about x=NULL
x = NULL OR 1=1 -- True. We know that at least 1=1 is true, so the OR is fulfulled regardless.
x = NULL AND 1=1 -- Unknown. We want them both to be true to fulful the AND
x = NULL AND 1=2 -- False. We know 1=2 is false, so the AND is not fulfilled regardless.
Also
-- Neither statement will select rows where x is null
select x from T where x = 1
select x from T where x != 1
The only way to check a null is to specificaly ask "is it true that we don't know what the value of x is". That has a yes or no answer, and uses the IS keyword.
If you just want nulls to be treated as zero, or another value, you can use the COALESCE or ISNULL function.
COALESCE(NULL, 1) -- 1
COALESCE(NULL, NULL, 1) -- Also 1
COALESCE(x, y, z, 0) -- x, unless it is null, then y, unless it is null, then z, unless it is null in which case 0.

Related

Why doesn't NULL propagate in boolean expression in SQL?

In SQL, every operation which involves an operand with NULL yields NULL (with the obvious exceptions of IS NULL or IS NOT NULL operators). However, NULL does not propagate with AND or OR operators which may return TRUE or FALSE. For example, the following in MariaDB 10.4 returns NULL and 0 respectively:
select 0 & null, 0 and null
The difference is that the first is a bitwise AND, the second is a boolean AND. Why NULL does not propagate in boolean operation?
A NULL value has a whole series of possible meanings. IIRC Chris Date found about 7 different interpretations.
A very common interpretation of NULL is: "I don't know". Another one is: "Not applicable".
So let's try to evaluate a condition with the "I don't know" interpretation of a NULL value.
As an example suppose there are two persons. And you want to compare their age. Person A happens to be 31 years old. In case of the other person, person B, you don't know.
The question if A is as old as B cannot be answered positively. But it can't be denied either. In fact, you don't know. Hence the truth value here is NULL.
If we add the ages of both persons, we run into the same problem. We don't have a clue about the sum of their ages. Again the resulting value is NULL.
This is why you'll have to define how to treat NULL values. A database system cannot know this.
We have 0 and null. In MariaDB, 0 and FALSE are synonymous. So we have FALSE AND NULL. But FALSE AND <anything> is always FALSE - there's no doubt that no matter what value might be substituted here, nothing can make this statement TRUE now.
So we short-circuit and return the FALSE/0 result. Similarly, 1 OR NULL should return 1.
NULL has the semantics of "unknown" value. It does not have the semantics of "missing". This is a nuance.
But, it is "propagated" by AND and OR, just not as you might expect. So:
true AND NULL --> NULL, because the value would depend on what NULL is
false AND NULL --> false, because the first value requires that the result is false
In WHERE and WHEN clauses, NULL is treated as "false". However, in CHECK constraints, NULL is treated as "true" -- that is, only explicitly false values fail the NULL constraint.
Otherwise, you are correct that almost all operations with NULL return NULL. The & operator is a bitwise operator that has nothing to do with boolean values. It is just another "mathematical" operator, such as +, or *, so the value is NULL when any operand is NULL.
One very important exception is the NULL-safe comparison operator, <=>.

Oracle sql null value is not selected

In NAME table FIRST column is having null but no rows are selected. Please help me to understand.
SELECT * FROM NAME WHERE FIRST != '1'
Any comparison with null is false - = and <>, >, <, and so on. You cannot use null in an IN list as well - it would be ignored. Moreover, two nulls are not even equal to each other.
To get the nulls, you need to ask for them explicitly, like this:
SELECT * FROM NAME WHERE FIRST IS NULL OR FIRST != '1'
Any comparison to NULL returns NULL, which is equivalent to FALSE. This is true eve of not-equals.
If you want to include NULL values, do one of the following:
where first <> '1' or first is null
or
where coalesce(first, '<null>') <> '1'
In Oracle, null is not considered a legal value to select unless you explicitly ask for it:
select * from name where (first != '1') or first is null
You could also use NVL (similar to coalesce):
select * from name where nvl(first,'0') != '1'
That is correct because NULL can never be compared with anything else....
The only option that you have is to include a NULL check as an or in the command
SELECT * FROM NAME WHERE FIRST!=1 OR FIRST IS NULL
According to Oracle Documentation NULL is defined as a value not knownm or when the value is not meaningful. That is solely the reason why Oracle mentions not consider a value of ZERO as NULL. This is just an FYI, an addon. Thanks!
NULL is dumb. Period.
NULL is evil.
If X is NULL and Y is NULL, then X does in fact equal Y because they are both NULL.
It's also a PITA that I can't say
IF X IN ('a','B','C', null)
Because this condition happens. But now I have to say
IF ( X IN ('a','B','C') or X is NULL )
which is a waste of time and a risk of error if I forget the parentheses.
What irks me further is that NULL shouldn't happen in the first place. Fields (er... ok kids, I'll call them Columns) should always be initialized. Period. Stop the nulls. Don't allow them. Default values should always be zeroes or blanks so that those folks that are too lazy to initialize columns in their software will have them initialized for them automatically.
There are many instances where a failure to define default values of zeroes and blanks makes life more difficult than it has to be.

why does 10/NULL evaluate to null?

In SQL , why does 10/NULL evaluate to NULL (or unknown) ? Example :
if((10/NULL) is NULL)
DBMS_OUTPUT.PUT_LINE("Null.");
However , 1 = NULL being a COMPARISON is considered as FALSE. Shouldn't 10/NULL also be considered as FALSE ?
I am referring to SQL only . Not any DBMS in particular. And it might be a duplicate but I didn't know what keywords to put in search for this query.
Shouldn't 10/NULL also be considered as FALSE?
No, because:
Any arithmetic expression containing a null always evaluates to null. For example, null added to 10 is null. In fact, all operators (except concatenation) return null when given a null operand.
Emphasis mine, taken from the Oracle manual: http://docs.oracle.com/cd/E11882_01/server.112/e26088/sql_elements005.htm#i59110
And this is required by the SQL standard.
Edit, as the question was for RDBMS in general:
SQL Server
When SET ANSI_NULLS is ON, an operator that has one or two NULL expressions returns UNKNOWN
Link to the the manual:
MySQL
An expression that contains NULL always produces a NULL value unless otherwise indicated in the documentation for a particular function or operator
Link to the manual
DB2
if either operand can be null, the result can be null, and if either is null, the result is the null value
Link to the manual:
PostgreSQL
Unfortunately I could not find such an explicit statement in the PostgreSQL manual, although I sure it behaves the same.
Warning: The "(except concatenation)" is an Oracle only and non-standard exception. (The empty string and NULL are almost identical in Oracle). Concatenating nulls gives null in all other DBMS.
1 = null is not null. It is actually unknown. As well as any other null operation.
The equality predicate 1 = NULL evaluates to NULL. But NULL in a boolean comparison is considered false.
If you do something like NOT( 1 = NULL ), 1 = NULL evaluates to NULL, NOT( NULL ) evaluates to NULL and so the condition as a whole ends up evaluating to false.
Oracle has a section in their documentation on handling NULL values in comparisons and conditional statements-- other databases will handle things in an very similar manner.
10/something means that you are counting how much "something" will be in 10
in this case you're counting how much "nothing" will be in 10 - that's infinity, unknown..
1 = NULL is false because one does not equal nothing
The NULLIF function accepts two parameters. If the first parameter is equal to the second parameter, NULLIF returns Null. Otherwise, the value of the first parameter is returned.
NULLIF(value1, value2)
NVL
The NVL function accepts two parameters. It returns the first non-NULL parameter or NULL if all parameters are NULL.
also check this conditional outcomes:
This "null equals UNKNOWN truth value" proposition introduces an inconsistency into SQL 3VL. One major problem is that it contradicts a basic property of nulls, the property of propagation. Nulls, by definition, propagate through all SQL expressions. The Boolean truth values do not have this property. Consider the following scenarios in SQL:1999, in which two Boolean truth values are combined into a compound predicate. According to the rules of SQL 3VL, and as shown in the 3VL truth table shown earlier in this article, the following statements hold:
( TRUE OR UNKNOWN ) → TRUE
( FALSE AND UNKNOWN ) → FALSE
However, because nulls propagate, treating null as UNKNOWN results in the following logical inconsistencies in SQL 3VL:
( TRUE OR NULL ) → NULL ( = UNKNOWN )
( FALSE AND NULL ) → NULL ( = UNKNOWN )
The SQL:1999 standard does not define how to deal with this inconsistency, and results could vary between implementations. Because of these inconsistencies and lack of support from vendors the SQL Boolean datatype did not gain widespread acceptance. Most SQL DBMS platforms now offer their own platform-specific recommendations for storing Boolean-type data.
Note that in the PostgreSQL implementation of SQL, the null value is used to represent all UNKNOWN results and the following evaluations occur:
( TRUE OR NULL ) → TRUE
( FALSE AND NULL ) → FALSE
( FALSE OR NULL ) IS NULL → TRUE
( TRUE AND NULL ) IS NULL → TRUE

testing inequality with columns that can be null

So, I asked a question this morning, which I did not phrase correctly, so I got a lot of responses as to why NULL compared to anything will give NULL/FALSE.
My actual question was, what is the time honored fashion in which db guys test inequalities for two columns that can both be NULL. My question is the exact opposite of this question.
The requirements are as follows, A and B are two columns:
a) if A and B are both NULL, they are equal, return FALSE
b) if A and B are both not NULL, then return A<>B
c) if either A or B are NULL, they are not equal, return TRUE
Depending on the data type and possible values for the columns:
COALESCE(A, -1) <> COALESCE(B, -1)
The trick is finding a value (here I used -1) that will NEVER appear in your data.
The other way would be:
(A <> B) OR (A IS NOT NULL AND B IS NULL) OR (A IS NULL AND B IS NOT NULL)
This can be a problem depending on how your particular RDBMS handles NULLs. By the ANSI standard, this should give you what you want, but who follows standards anyway. :)
P.S. - I should also point out that using the COALESCE function may invalidate the use of indexes in comparing the columns. Check your query plan and performance of the query to see if that's a problem.
P.P.S. - I just noticed that OMG Ponies mentioned that Informix doesn't support COALESCE. It's an ANSI standard function I believe, but see what I said above about standards...
I would personally write out the expression you came up with, especially if the table is expected to grow large. Wrapping the columns in function calls hurts performance by making it so the engine can't use any indexes you have on those columns. Of course, in a small table, this may not be any sort of issue, but I still like to do it the explicit way just in case a table ends up growing.
can you try something like this in informix?
CASE
WHEN a IS NULL AND B IS NULL THEN false
WHEN a IS NULL OR B IS NULL THEN true
ELSE a <> B
END
from IBM Informix Guide to SQL: Syntax , CASE Expressions
If you want to be sure about how NULLs are handled, you'll have to use whatever Informix supports for null checking. I haven't turned up much, other than the SE version doesn't support COALESCE, but it does support DECODE and possibly CASE.
WHERE COALESCE(t.a, 0) != COALESCE(t.b, 0)
WHERE DECODE(NULL, 0, t.a) != DECODE(NULL, 0, t.b)
For SQL Server, use:
WHERE ISNULL(A, '') <> ISNULL(B, '')
The trouble is that a<>b (or a=b) yields NULL, not 1 or 0 when one or both operands are NULL. This doesn't matter for the = case because NULL OR 1 is 1 and NULL OR 0 is NULL which behaves like 0 for selecting in a WHERE clause.
You could say:
a<>b OR (a IS NULL)<>(b IS NULL)
However needing to do it either way may be a sign that you're misusing NULL and should consider changing the schema to use some other NOT NULL value to signify this comparable condition.
For example if you've got a person table with a title column, don't use NULL to signify that they have no title; that's not a ‘missing’ datum, it's just that no title exists. So store it as an empty string '' that you can happily compare with other empty strings. (Well unless you run Oracle of course, with its Empty String Problem...)
IBM Informix Dynamic Server has a somewhat peculiar view of booleans for a variety of historical (aka 'bad') reasons. Adapting the idea suggested by #astander, this CASE expression 'works', but I'd be the first to say 'not obvious' (see - I said it before you did!). The setup phase:
create table x(a int, b int);
insert into x values(null, null);
insert into x values(null, 1);
insert into x values(1, null);
insert into x values(1, 1);
insert into x values(1, 2);
The SELECT statement:
SELECT *
FROM x
WHERE CASE
WHEN a IS NULL AND b IS NULL THEN 'f'::BOOLEAN
WHEN a IS NULL OR b IS NULL THEN 't'::BOOLEAN
WHEN a != b THEN 't'::BOOLEAN
ELSE 'f'::BOOLEAN
END
;
The result from this query is:
1
1
1 2
Issues:
IDS does not recognize FALSE or TRUE or UNKNOWN as keywords.
IDS does not recognize boolean expressions such as 'a != b' (or 'a <> b') as such.
Yes, it pains me greatly to have to state this.
If
where ((A=B) OR (A IS NULL AND B IS NULL))
is for equality, then why just not use:
where NOT (
((A=B) OR (A IS NULL AND B IS NULL))
)
for inequality?
A slight modification of #user3830747 answer, based on demorgans law:
NOT (NVL(a = b,FALSE) OR COALESCE(a,b) IS NULL)

why is null not equal to null false

I was reading this article:
Get null == null in SQL
And the consensus is that when trying to test equality between two (nullable) sql columns, the right approach is:
where ((A=B) OR (A IS NULL AND B IS NULL))
When A and B are NULL, (A=B) still returns FALSE, since NULL is not equal to NULL. That is why the extra check is required.
What about when testing inequalities? Following from the above discussion, it made me think that to test inequality I would need to do something like:
WHERE ((A <> B) OR (A IS NOT NULL AND B IS NULL) OR (A IS NULL AND B IS NOT NULL))
However, I noticed that that is not necessary (at least not on informix 11.5), and I can just do:
where (A<>B)
If A and B are NULL, this returns FALSE. If NULL is not equal to NULL, then shouldn't this return TRUE?
EDIT
These are all good answers, but I think my question was a little vague. Allow me to rephrase:
Given that either A or B can be NULL, is it enough to check their inequality with
where (A<>B)
Or do I need to explicitly check it like this:
WHERE ((A <> B) OR (A IS NOT NULL AND B IS NULL) OR (A IS NULL AND B IS NOT NULL))
REFER to this thread for the answer to this question.
Because that behavior follows established ternary logic where NULL is considered an unknown value.
If you think of NULL as unknown, it becomes much more intuitive:
Is unknown a equal to unknown b? There's no way to know, so: unknown.
relational expressions involving NULL actually yield NULL again
edit
here, <> stands for arbitrary binary operator, NULL is the SQL placeholder, and value is any value (NULL is not a value):
NULL <> value -> NULL
NULL <> NULL -> NULL
the logic is: NULL means "no value" or "unknown value", and thus any comparison with any actual value makes no sense.
is X = 42 true, false, or unknown, given that you don't know what value (if any) X holds? SQL says it's unknown. is X = Y true, false, or unknown, given that both are unknown? SQL says the result is unknown. and it says so for any binary relational operation, which is only logical (even if having NULLs in the model is not in the first place).
SQL also provides two unary postfix operators, IS NULL and IS NOT NULL, these return TRUE or FALSE according to their operand.
NULL IS NULL -> TRUE
NULL IS NOT NULL -> FALSE
All comparisons involving null are undefined, and evaluate to false. This idea, which is what prevents null being evaluated as equivalent to null, also prevents null being evaluated as NOT equivalent to null.
The short answer is... NULLs are weird, they don't really behave like you'd expect.
Here's a great paper on how NULLs work in SQL. I think it will help improve your understanding of the topic. I think the sections on handling null values in expressions will be especially useful for you.
http://www.oracle.com/technology/oramag/oracle/05-jul/o45sql.html
The default (ANSI) behaviour of nulls within an expression will result in a null (there are enough other answers with the cases of that).
There are however some edge cases and caveats that I would place when dealing with MS Sql Server that are not being listed.
Nulls within a statement that is grouping values together will be considered equal and be grouped together.
Null values within a statement that is ordering them will be considered equal.
Null values selected within a statement that is using distinct will be considered equal when evaluating the distinct aspect of the query
It is possible in SQL Server to override the expression logic regarding the specific Null = Null test, using the SET ANSI_NULLS OFF, which will then give you equality between null values - this is not a recommended move, but does exist.
SET ANSI_NULLS OFF
select result =
case
when null=null then 'eq'
else 'ne'
end
SET ANSI_NULLS ON
select result =
case
when null=null then 'eq'
else 'ne'
end
Here is a Quick Fix
ISNULL(A,0)=ISNULL(B,0)
0 can be changed to something that can never happen in your data
"Is unknown a equal to unknown b? There's no way to know, so: unknown."
The question was : why does the comparison yield FALSE ?
Given three-valued logic, it would indeed be sensible for the comparison to yield UNKNOWN (not FALSE). But SQL does yield FALSE, and not UNKNOWN.
One of the myriads of perversities in the SQL language.
Furthermore, the following must be taken into account :
If "unkown" is a logical value in ternary logic, then it ought to be the case that an equality comparison between two logical values that both happen to be (the value for) "unknown", then that comparison ought to yield TRUE.
If the logical value is itself unknown, then obviously that cannot be represented by putting the value "unknown" there, because that would imply that the logical value is known (to be "unknown"). That is, a.o., how relational theory proves that implementing 3-valued logic raises the requirement for a 4-valued logic, that a 4 valued logic leads to the need for a 5-valued logic, etc. etc. ad infinitum.