Handle null and empty in SQL Server [duplicate] - sql

This is obviously not going to return a row...
select 1 where null = ''
But why does this also not return a row?
select 1 where null <> ''
How can both of those WHEREs be "false"?

"How can both of those WHEREs be "false"?"
It's not!
The answer is not "true" either!
The answer is "we don't know".
Think of NULL as a value you don't know yet.
Would you bet it's '' ?
Would you bet it's not '' ?
So, safer is to declare you don't know yet. The answer to both questions, therefore, is not false but I don't know, e.g. NULL in SQL.

Because this is an instance of SQL Server conforming to ANSI SQL ;-)
NULL in SQL is somewhat similar to IEEE NaN in comparison rules: NaN != NaN and NaN == NaN are both false. It takes a special operator IS NULL in SQL (or "IsNaN" for IEEE FP) to detect these special values. (There are actually multiple ways to detect these special values: IS NULL/"IsNaN" are just clean and simple methods.)
However, NULL = x goes one step further: the result of NULL =/<> x is not false. Rather, the result of the expression is itself NULL UNKNOWN. So NOT(NULL = '') is also NULL UNKNOWN (or "false" in a where context -- see comment). Welcome to the world of SQL tri-state logic ;-)
Since the question is about SQL Server, then for completeness: If running run with "SET ANSI_NULLS OFF" -- but see remarks/warnings at top -- then the "original" TSQL behavior can be attained.
"Original" behavior (this is deprecated):
SET ANSI_NULLS OFF;
select 'eq' where null = ''; -- no output
select 'ne' where null <> ''; -- output: ne
select 'not' where not(null = ''); -- output: not; null = '' -> False, not(False) -> True
ANSI-NULLs behavior (default in anything recent, please use):
SET ANSI_NULLS ON;
select 'eq' where null = ''; -- no output
select 'ne' where null <> ''; -- no output
select 'not' where not(null = ''); -- no output; null = '' -> Unknown, not(Unknown) -> Unknown
Happy coding.

Null has some very odd behaviour when comparing - the wikipedia article explains it quite well. In a nutshell, as well as true and false, there is an unknown value, which SQL returns when doing a comparison.

The SQL standard specifies that NULL = x is false for all x (even if x is itself NULL) and SQL Server is just following the standard. If you want to check if something is or is not NULL, then you have to use x IS NULL or x IS NOT NULL.

Databses have what is known as three valued logic. The values of true, false, and unknown.
Read this http://www.simple-talk.com/sql/learn-sql-server/sql-and-the-snare-of-three-valued-logic/

Because any comparison with NULL returns false (or to be precise: returns NULL)
As NULL is the absence of information you cannot tell whether is equal to or not equal to something.

Probably getting a little repetitive here, but my two cents anyway:
A. a_horse_with_no_name's example (comment, above) is a very good one!
B. In non=mathmatical terms, NULL is an unknown value. An empty String is a string with a length of zero - hence, a "known" value. This is why NULL does not compare equally or not equally to an empty string.
C. Since NULL represents unknown, it is impossible to compare two NULL values for equality. If you don't know the value of X, and you don't know the value of Y, then you don't know if they are equal or not.

See SQL-92 8.2 comparison predicate saying:
General Rules
Let X and Y be any two corresponding <row value constructor element>s. Let XV and YV be the values represented by X and Y, respectively.
Case:
a) If XV or YV is the null value, then "X <comp op> Y" is unknown.

Related

In SQL, is there a difference between "IS" and "=" when returning values in where statements?

I am currently learning SQL utilizing Codecademy and am curious if there is a difference between using "IS" or "=".
In the current lesson, I wrote this code:
SELECT *
FROM nomnom
WHERE neighborhood IS 'Midtown'
OR neighborhood IS 'Downtown'
OR neighborhood IS 'Chinatown';
Which ran perfectly fine. I always like to look at the answer after to see if there was something I did wrong or could improve on. The answer had this code:
SELECT *
FROM nomnom
WHERE neighborhood = 'Midtown'
OR neighborhood = 'Downtown'
OR neighborhood = 'Chinatown';
Do IS and = function the same?
All that you want to know you can find it here:
The IS and IS NOT operators work like = and != except when one or both
of the operands are NULL. In this case, if both operands are NULL,
then the IS operator evaluates to 1 (true) and the IS NOT operator
evaluates to 0 (false). If one operand is NULL and the other is not,
then the IS operator evaluates to 0 (false) and the IS NOT operator is
1 (true). It is not possible for an IS or IS NOT expression to
evaluate to NULL. Operators IS and IS NOT have the same precedence as
=.
taken from: SQL As Understood By SQLite.
The important part is: ...except when one or both of the operands are NULL... because when using = or != (<>) and 1 (or both) of the operands is NULL then the result is also NULL and this is the difference to IS and IS NOT.
They work the same but "IS" is a keyword in MySQL and is generally used while comparing NULL values. While comparing NULL values "=" does not work.
SELECT * FROM nomnom WHERE neighborhood IS NULL
The above statement would run perfectly fine but
SELECT * FROM nomnom WHERE neighborhood = NULL
would result in an error.
They are the same for these cases, but further down the line you will discover one nifty little value called NULL.
NULL is a pain because... it doesn't exist.
0 = NULL returns FALSE;
Date <> [Column] will not return lines with NULL, only those with a value that is different.
Hell, even NULL = NULL returns false. And NULL <> NULL also returns false. That is why "IS" exists. Because NULL IS NULL will return true.
So as a general rule, use = for values.
Keep "IS" for null.
[Column] IS NULL
or
[Column] IS NOT NULL
And remember to always check if your column is nullable that you need to plan for null values in your WHERE or ON clauses.

why does 10/NULL evaluate to null?

In SQL , why does 10/NULL evaluate to NULL (or unknown) ? Example :
if((10/NULL) is NULL)
DBMS_OUTPUT.PUT_LINE("Null.");
However , 1 = NULL being a COMPARISON is considered as FALSE. Shouldn't 10/NULL also be considered as FALSE ?
I am referring to SQL only . Not any DBMS in particular. And it might be a duplicate but I didn't know what keywords to put in search for this query.
Shouldn't 10/NULL also be considered as FALSE?
No, because:
Any arithmetic expression containing a null always evaluates to null. For example, null added to 10 is null. In fact, all operators (except concatenation) return null when given a null operand.
Emphasis mine, taken from the Oracle manual: http://docs.oracle.com/cd/E11882_01/server.112/e26088/sql_elements005.htm#i59110
And this is required by the SQL standard.
Edit, as the question was for RDBMS in general:
SQL Server
When SET ANSI_NULLS is ON, an operator that has one or two NULL expressions returns UNKNOWN
Link to the the manual:
MySQL
An expression that contains NULL always produces a NULL value unless otherwise indicated in the documentation for a particular function or operator
Link to the manual
DB2
if either operand can be null, the result can be null, and if either is null, the result is the null value
Link to the manual:
PostgreSQL
Unfortunately I could not find such an explicit statement in the PostgreSQL manual, although I sure it behaves the same.
Warning: The "(except concatenation)" is an Oracle only and non-standard exception. (The empty string and NULL are almost identical in Oracle). Concatenating nulls gives null in all other DBMS.
1 = null is not null. It is actually unknown. As well as any other null operation.
The equality predicate 1 = NULL evaluates to NULL. But NULL in a boolean comparison is considered false.
If you do something like NOT( 1 = NULL ), 1 = NULL evaluates to NULL, NOT( NULL ) evaluates to NULL and so the condition as a whole ends up evaluating to false.
Oracle has a section in their documentation on handling NULL values in comparisons and conditional statements-- other databases will handle things in an very similar manner.
10/something means that you are counting how much "something" will be in 10
in this case you're counting how much "nothing" will be in 10 - that's infinity, unknown..
1 = NULL is false because one does not equal nothing
The NULLIF function accepts two parameters. If the first parameter is equal to the second parameter, NULLIF returns Null. Otherwise, the value of the first parameter is returned.
NULLIF(value1, value2)
NVL
The NVL function accepts two parameters. It returns the first non-NULL parameter or NULL if all parameters are NULL.
also check this conditional outcomes:
This "null equals UNKNOWN truth value" proposition introduces an inconsistency into SQL 3VL. One major problem is that it contradicts a basic property of nulls, the property of propagation. Nulls, by definition, propagate through all SQL expressions. The Boolean truth values do not have this property. Consider the following scenarios in SQL:1999, in which two Boolean truth values are combined into a compound predicate. According to the rules of SQL 3VL, and as shown in the 3VL truth table shown earlier in this article, the following statements hold:
( TRUE OR UNKNOWN ) → TRUE
( FALSE AND UNKNOWN ) → FALSE
However, because nulls propagate, treating null as UNKNOWN results in the following logical inconsistencies in SQL 3VL:
( TRUE OR NULL ) → NULL ( = UNKNOWN )
( FALSE AND NULL ) → NULL ( = UNKNOWN )
The SQL:1999 standard does not define how to deal with this inconsistency, and results could vary between implementations. Because of these inconsistencies and lack of support from vendors the SQL Boolean datatype did not gain widespread acceptance. Most SQL DBMS platforms now offer their own platform-specific recommendations for storing Boolean-type data.
Note that in the PostgreSQL implementation of SQL, the null value is used to represent all UNKNOWN results and the following evaluations occur:
( TRUE OR NULL ) → TRUE
( FALSE AND NULL ) → FALSE
( FALSE OR NULL ) IS NULL → TRUE
( TRUE AND NULL ) IS NULL → TRUE

In most SQL implementations, as opposed to standard programming languages, why doesn't x != null return true?

Let's suppose that x is some variable that has any value other than null, say 4, as an example. What should the following expression return?
x != null
In just about every programming language I have ever worked with (C#, Javascript, PHP, Python), this expression, or an equivalent expression in that language, evaluates to true.
SQL implementations, on the other hand, all seem to handle this quite differently. If one or both operands of the inequality operator are NULL, either NULL or False will be returned. This is basically the opposite of the behavior that most programming languages use, and it is extremely unintuitive to me.
Why is the behavior in SQL like this? What is it about relationaly database logic that makes null behave so much differently than it does in general purpose programming?
The null in most programming languages is considered "known", while NULL in SQL is considered "unknown".
So X == null compares X with a known value and the result is known (true or false).
But X = NULL compares X with an unknown value and the result is unknown (i.e. NULL, again). As a consequence, we need a special operator IS [NOT] NULL to test for it.
I'm guessing at least part of the motivation for such NULLs would be the behavior of foreign keys. When a child endpoint of a foreign key is NULL, it shouldn't match any parent, even if the parent is NULL (which is possible if parent is UNIQUE instead of primary key). Unfortunately, this brings many more gotchas than it solves and I personally think SQL should have gone the route of the "known" null and avoided this monkey business altogether.
Even E. F. Codd, inventor or relational model, later indicated that the traditional NULL is not optimal. But for historical reasons, we are pretty much stuck with it.
the reason is that the concept of equality doesn't apply to null. it's not logically true to say that this null does or does not equal this other null.
so, that's all fine for a theoretical reason, but for the sake of convenience, why does sql not allow your to say (x != null)?
well, the reason is because sometimes you want to handle nulls differently.
if I say (columnA = columnB) for example, should that return true if both columns are null?
if I say (columnA != columnB) - should it give the same result when column A is "a" and column B is null, and when column A is "a" and column B is "b"?
the people who made sql decided that distinction was important and so they wrote it to treat the 2 cases differently.
the wikipedia page on this has a pretty decent writeup - http://en.wikipedia.org/wiki/Null_%28SQL%29
well in sql engines you usually don't use the "=" operator but "IS", which then makes it more intuitive.
SELECT 4 IS NULL FROM dual;
> 0
SELECT 4 IS NOT NULL FROM dual;
> 1
NULL doesn't stand for null pointer, it's just not the same concept at all.
sql NULL is a I don't know the value flag, it's not a "there's no pointer" flag. You just should not compare them, they shouldn't be used the same way. This is pretty unintuitive you're right, they should have named it differently.
In SQL, NULL means "an unknown value".
If you say x != NULL you are saying "is the value of x unequal to an unknown value". Well, since we don't know what unknown value is, we don't know if x is equal to it or not. So the answer is "I don't know".
Similarly:
x = NULL OR 1=2 -- Unknown. 1=2 is not true, but we don't know about x=NULL
x = NULL OR 1=1 -- True. We know that at least 1=1 is true, so the OR is fulfulled regardless.
x = NULL AND 1=1 -- Unknown. We want them both to be true to fulful the AND
x = NULL AND 1=2 -- False. We know 1=2 is false, so the AND is not fulfilled regardless.
Also
-- Neither statement will select rows where x is null
select x from T where x = 1
select x from T where x != 1
The only way to check a null is to specificaly ask "is it true that we don't know what the value of x is". That has a yes or no answer, and uses the IS keyword.
If you just want nulls to be treated as zero, or another value, you can use the COALESCE or ISNULL function.
COALESCE(NULL, 1) -- 1
COALESCE(NULL, NULL, 1) -- Also 1
COALESCE(x, y, z, 0) -- x, unless it is null, then y, unless it is null, then z, unless it is null in which case 0.

Sql Server: CASE Statement does unexpected behavior when comparing to NULL

Given:
The following Select statement:
select case NULL
when NULL then 0
else 1
end
Problem:
I'm expecting this to return 0 but instead it returns 1. What gives?
Generally speaking, NULL is not something you should attempt to compare for equality, which is what a case statement does. You can use "Is NULL" to test for it. There is no expectation that NULL != NULL or that NULL = NULL. It's an indeterminate, undefined value, not a hard constant.
-- To encompass questions in the comments --
If you need to retrieve a value when you may encounter a NULL column, try this instead:
Case
When SomeColumn IS NULL
Then 0
Else 1
End
I believe that should work. As far as your original post is concerned:
Select Case NULL
When NULL then 0 // Checks for NULL = NULL
else 1 // NULL = NULL is not true (technically, undefined), else happens
end
The trouble is that your Case select automatically attempts to use equality operations. That simply doesn't work with NULL.
I was going to add this as a comment to Aaron's answer, but it was getting too long, so I'll add it as another (part of the) answer.
The CASE statement actually has two distinct modes, simple and searched.
From BOL:
The CASE expression has two formats:
The simple CASE expression compares an expression to a set of simple expressions to determine the result.
The searched CASE expression evaluates a set of Boolean expressions to determine the result.
When the simple CASE (your example) does what it describes as comparison it does an equality comparison - i.e. =
This is clarified in the later documentation:
The simple CASE expression operates by comparing the first expression
to the expression in each WHEN clause for equivalency. If these
expressions are equivalent, the expression in the THEN clause will be
returned.
Allows only an equality check.
Because anything = NULL is always false in ANSI SQL (and if you didn't know this, you need to read up on NULLs in SQL more generally, particularly also with the behavior in the other searched comparison - WHERE x IN (a, b, c)), you cannot use NULL in a simple case and have it ever be compared to a value, with a NULL either in the initial expression or in the list of expressions to be compared against.
If you want to check for NULL, you will have to use an IF/ELSE construct or the searched CASE with a full expression.
I agree that it's kind of unfortunate there is no version which supports an IS comparison to make it easier to write:
select case colname
when IS NULL then 0
else 1
end
Which would make writing certain long CASE statements easier:
select case colname
when IS NULL then ''
when 1 then 'a'
when 2 then 'b'
when 3 then 'c'
when 4 then 'd'
else 'z'
end
But that's just wishful thinking...
An option is to use ISNULL or COALESCE:
select case COALESCE(colname, 999999) -- 999999 is some value never used
when 999999 then ''
when 1 then 'a'
when 2 then 'b'
when 3 then 'c'
when 4 then 'd'
else 'z'
end
But it isn't always a great option.
In addition to the other answers, you need to change the syntax for CASE slightly to do this:
SELECT CASE
WHEN NULL IS NULL THEN 0
ELSE 1
END;
Using the value in your syntax implicitly uses an equals comparison. NULL is unknown, and so is NULL = NULL, so with your current code you will always get zero 1 (geez I did it too).
To get the behavior you want, you can use SET ANSI_NULLS ON; however note that this can change other code in ways you may not be able to predict, and the setting is deprecated - so it will stop working at all in a future version of SQL Server (see this SQL Server 2008 doc).
You need to use the IS NULL operator. Standard comparison operators do not work with NULL.
Check out these MSDN articles about Null that may be useful:
IS [NOT] NULL (Transact-SQL)
Null Values

why is null not equal to null false

I was reading this article:
Get null == null in SQL
And the consensus is that when trying to test equality between two (nullable) sql columns, the right approach is:
where ((A=B) OR (A IS NULL AND B IS NULL))
When A and B are NULL, (A=B) still returns FALSE, since NULL is not equal to NULL. That is why the extra check is required.
What about when testing inequalities? Following from the above discussion, it made me think that to test inequality I would need to do something like:
WHERE ((A <> B) OR (A IS NOT NULL AND B IS NULL) OR (A IS NULL AND B IS NOT NULL))
However, I noticed that that is not necessary (at least not on informix 11.5), and I can just do:
where (A<>B)
If A and B are NULL, this returns FALSE. If NULL is not equal to NULL, then shouldn't this return TRUE?
EDIT
These are all good answers, but I think my question was a little vague. Allow me to rephrase:
Given that either A or B can be NULL, is it enough to check their inequality with
where (A<>B)
Or do I need to explicitly check it like this:
WHERE ((A <> B) OR (A IS NOT NULL AND B IS NULL) OR (A IS NULL AND B IS NOT NULL))
REFER to this thread for the answer to this question.
Because that behavior follows established ternary logic where NULL is considered an unknown value.
If you think of NULL as unknown, it becomes much more intuitive:
Is unknown a equal to unknown b? There's no way to know, so: unknown.
relational expressions involving NULL actually yield NULL again
edit
here, <> stands for arbitrary binary operator, NULL is the SQL placeholder, and value is any value (NULL is not a value):
NULL <> value -> NULL
NULL <> NULL -> NULL
the logic is: NULL means "no value" or "unknown value", and thus any comparison with any actual value makes no sense.
is X = 42 true, false, or unknown, given that you don't know what value (if any) X holds? SQL says it's unknown. is X = Y true, false, or unknown, given that both are unknown? SQL says the result is unknown. and it says so for any binary relational operation, which is only logical (even if having NULLs in the model is not in the first place).
SQL also provides two unary postfix operators, IS NULL and IS NOT NULL, these return TRUE or FALSE according to their operand.
NULL IS NULL -> TRUE
NULL IS NOT NULL -> FALSE
All comparisons involving null are undefined, and evaluate to false. This idea, which is what prevents null being evaluated as equivalent to null, also prevents null being evaluated as NOT equivalent to null.
The short answer is... NULLs are weird, they don't really behave like you'd expect.
Here's a great paper on how NULLs work in SQL. I think it will help improve your understanding of the topic. I think the sections on handling null values in expressions will be especially useful for you.
http://www.oracle.com/technology/oramag/oracle/05-jul/o45sql.html
The default (ANSI) behaviour of nulls within an expression will result in a null (there are enough other answers with the cases of that).
There are however some edge cases and caveats that I would place when dealing with MS Sql Server that are not being listed.
Nulls within a statement that is grouping values together will be considered equal and be grouped together.
Null values within a statement that is ordering them will be considered equal.
Null values selected within a statement that is using distinct will be considered equal when evaluating the distinct aspect of the query
It is possible in SQL Server to override the expression logic regarding the specific Null = Null test, using the SET ANSI_NULLS OFF, which will then give you equality between null values - this is not a recommended move, but does exist.
SET ANSI_NULLS OFF
select result =
case
when null=null then 'eq'
else 'ne'
end
SET ANSI_NULLS ON
select result =
case
when null=null then 'eq'
else 'ne'
end
Here is a Quick Fix
ISNULL(A,0)=ISNULL(B,0)
0 can be changed to something that can never happen in your data
"Is unknown a equal to unknown b? There's no way to know, so: unknown."
The question was : why does the comparison yield FALSE ?
Given three-valued logic, it would indeed be sensible for the comparison to yield UNKNOWN (not FALSE). But SQL does yield FALSE, and not UNKNOWN.
One of the myriads of perversities in the SQL language.
Furthermore, the following must be taken into account :
If "unkown" is a logical value in ternary logic, then it ought to be the case that an equality comparison between two logical values that both happen to be (the value for) "unknown", then that comparison ought to yield TRUE.
If the logical value is itself unknown, then obviously that cannot be represented by putting the value "unknown" there, because that would imply that the logical value is known (to be "unknown"). That is, a.o., how relational theory proves that implementing 3-valued logic raises the requirement for a 4-valued logic, that a 4 valued logic leads to the need for a 5-valued logic, etc. etc. ad infinitum.