testing inequality with columns that can be null - sql

So, I asked a question this morning, which I did not phrase correctly, so I got a lot of responses as to why NULL compared to anything will give NULL/FALSE.
My actual question was, what is the time honored fashion in which db guys test inequalities for two columns that can both be NULL. My question is the exact opposite of this question.
The requirements are as follows, A and B are two columns:
a) if A and B are both NULL, they are equal, return FALSE
b) if A and B are both not NULL, then return A<>B
c) if exactly one of A or B is NULL, they are not equal, return TRUE

Depending on the data type and possible values for the columns:
COALESCE(A, -1) <> COALESCE(B, -1)
The trick is finding a value (here I used -1) that will NEVER appear in your data.
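A minimal sketch of why the sentinel has to be a value that never occurs. It assumes SQLite (through Python's sqlite3 module) as a stand-in engine and a hypothetical table x(a, b); other databases may behave differently in the details.

```python
import sqlite3

def sentinel_mismatches(rows, sentinel=-1):
    """Return rows for which COALESCE(a, s) <> COALESCE(b, s) is true."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE x (a INTEGER, b INTEGER)")  # hypothetical table
    conn.executemany("INSERT INTO x VALUES (?, ?)", rows)
    query = "SELECT a, b FROM x WHERE COALESCE(a, ?) <> COALESCE(b, ?)"
    result = conn.execute(query, (sentinel, sentinel)).fetchall()
    conn.close()
    return result

# The last row pairs NULL with a real -1, which collides with the sentinel.
rows = [(None, None), (None, 1), (1, None), (1, 2), (None, -1)]
found = sentinel_mismatches(rows)
print(found)
```

The row (NULL, -1) should count as different under the stated requirements, but it is silently treated as equal because the real -1 matches the sentinel.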
The other way would be:
(A <> B) OR (A IS NOT NULL AND B IS NULL) OR (A IS NULL AND B IS NOT NULL)
This can be a problem depending on how your particular RDBMS handles NULLs. By the ANSI standard, this should give you what you want, but who follows standards anyway. :)
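As a rough check of the explicit form, here is a sketch that runs it against the five interesting cases plus a row containing -1. SQLite via Python's sqlite3 is an assumption made for the sake of a runnable example, and the table x(a, b) is hypothetical.

```python
import sqlite3

PREDICATE = (
    "(a <> b) "
    "OR (a IS NOT NULL AND b IS NULL) "
    "OR (a IS NULL AND b IS NOT NULL)"
)

def differing_rows(rows):
    """Return the rows the explicit inequality predicate selects."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE x (a INTEGER, b INTEGER)")  # hypothetical table
    conn.executemany("INSERT INTO x VALUES (?, ?)", rows)
    result = conn.execute(f"SELECT a, b FROM x WHERE {PREDICATE}").fetchall()
    conn.close()
    return result

rows = [(None, None), (None, 1), (1, None), (1, 1), (1, 2), (None, -1)]
selected = differing_rows(rows)
print(selected)
```

No sentinel is involved, so the (NULL, -1) row is reported as different, and the both-NULL and equal rows are left out.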
P.S. - I should also point out that using the COALESCE function may invalidate the use of indexes in comparing the columns. Check your query plan and performance of the query to see if that's a problem.
P.P.S. - I just noticed that OMG Ponies mentioned that Informix doesn't support COALESCE. It's an ANSI standard function I believe, but see what I said above about standards...

I would personally write out the expression you came up with, especially if the table is expected to grow large. Wrapping the columns in function calls hurts performance by making it so the engine can't use any indexes you have on those columns. Of course, in a small table, this may not be any sort of issue, but I still like to do it the explicit way just in case a table ends up growing.

can you try something like this in informix?
CASE
WHEN a IS NULL AND B IS NULL THEN false
WHEN a IS NULL OR B IS NULL THEN true
ELSE a <> B
END
From the IBM Informix Guide to SQL: Syntax, CASE Expressions

If you want to be sure about how NULLs are handled, you'll have to use whatever Informix supports for null checking. I haven't turned up much, other than the SE version doesn't support COALESCE, but it does support DECODE and possibly CASE.
WHERE COALESCE(t.a, 0) != COALESCE(t.b, 0)
WHERE DECODE(t.a, NULL, 0, t.a) != DECODE(t.b, NULL, 0, t.b)

For SQL Server, use:
WHERE ISNULL(A, '') <> ISNULL(B, '')

The trouble is that a<>b (or a=b) yields NULL, not 1 or 0 when one or both operands are NULL. This doesn't matter for the = case because NULL OR 1 is 1 and NULL OR 0 is NULL which behaves like 0 for selecting in a WHERE clause.
You could say:
a<>b OR (a IS NULL)<>(b IS NULL)
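To see what that expression actually evaluates to, here is a small sketch. It assumes an engine where IS NULL yields a comparable 0/1 value; SQLite (via Python's sqlite3) is used here, and the table x(a, b) is hypothetical.

```python
import sqlite3

EXPR = "a <> b OR (a IS NULL) <> (b IS NULL)"

def evaluate(rows):
    """Return the expression's value per row and the rows a WHERE would keep."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE x (a INTEGER, b INTEGER)")  # hypothetical table
    conn.executemany("INSERT INTO x VALUES (?, ?)", rows)
    values = {(a, b): v for a, b, v in conn.execute(f"SELECT a, b, {EXPR} FROM x")}
    selected = conn.execute(f"SELECT a, b FROM x WHERE {EXPR}").fetchall()
    conn.close()
    return values, selected

rows = [(None, None), (None, 1), (1, None), (1, 1), (1, 2)]
values, selected = evaluate(rows)
print(values)
print(selected)
```

For the both-NULL row the expression comes out as NULL rather than 0, which is harmless in a WHERE clause because NULL is not selected.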
However needing to do it either way may be a sign that you're misusing NULL and should consider changing the schema to use some other NOT NULL value to signify this comparable condition.
For example if you've got a person table with a title column, don't use NULL to signify that they have no title; that's not a ‘missing’ datum, it's just that no title exists. So store it as an empty string '' that you can happily compare with other empty strings. (Well unless you run Oracle of course, with its Empty String Problem...)

IBM Informix Dynamic Server has a somewhat peculiar view of booleans for a variety of historical (aka 'bad') reasons. Adapting the idea suggested by @astander, this CASE expression 'works', but I'd be the first to say 'not obvious' (see - I said it before you did!). The setup phase:
create table x(a int, b int);
insert into x values(null, null);
insert into x values(null, 1);
insert into x values(1, null);
insert into x values(1, 1);
insert into x values(1, 2);
The SELECT statement:
SELECT *
FROM x
WHERE CASE
WHEN a IS NULL AND b IS NULL THEN 'f'::BOOLEAN
WHEN a IS NULL OR b IS NULL THEN 't'::BOOLEAN
WHEN a != b THEN 't'::BOOLEAN
ELSE 'f'::BOOLEAN
END
;
The result from this query is:
a     b
NULL  1
1     NULL
1     2
Issues:
IDS does not recognize FALSE or TRUE or UNKNOWN as keywords.
IDS does not recognize boolean expressions such as 'a != b' (or 'a <> b') as such.
Yes, it pains me greatly to have to state this.

If
where ((A=B) OR (A IS NULL AND B IS NULL))
is for equality, then why not just use:
where NOT (
((A=B) OR (A IS NULL AND B IS NULL))
)
for inequality?

A slight modification of @user3830747's answer, based on De Morgan's law:
NOT (NVL(a = b,FALSE) OR COALESCE(a,b) IS NULL)
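A sketch comparing the plain negated-equality form with a guarded one. SQLite (via Python's sqlite3) is assumed as the engine; it has no NVL and no FALSE-typed comparison result, so COALESCE(a = b, 0) stands in for NVL(a = b, FALSE). The table x(a, b) is hypothetical.

```python
import sqlite3

NAIVE = "NOT ((a = b) OR (a IS NULL AND b IS NULL))"
# COALESCE(a = b, 0) is a stand-in for NVL(a = b, FALSE).
GUARDED = "NOT (COALESCE(a = b, 0) OR COALESCE(a, b) IS NULL)"

def select_where(predicate, rows):
    """Return the rows selected by the given WHERE predicate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE x (a INTEGER, b INTEGER)")  # hypothetical table
    conn.executemany("INSERT INTO x VALUES (?, ?)", rows)
    result = conn.execute(f"SELECT a, b FROM x WHERE {predicate}").fetchall()
    conn.close()
    return result

ROWS = [(None, None), (None, 1), (1, None), (1, 1), (1, 2)]
naive_rows = select_where(NAIVE, ROWS)
guarded_rows = select_where(GUARDED, ROWS)
print(naive_rows)
print(guarded_rows)
```

Under three-valued logic, NOT applied to an unknown result is still unknown, so the plain negation skips the rows where exactly one side is NULL. Forcing the comparison to a definite 0 before negating brings those rows back.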

Related

Comparing two empty Strings in Oracle SQL

Hi, today I ran into a weird situation. I had a WHERE clause with a condition that returns a string, and I wanted to check whether it was empty. When it returns an empty string, Oracle still treats the two as different strings. So I went further and prepared some simple queries:
select 1 from dual where 1 = 1;
returns: 1
select 1 from dual where 'A' = 'A';
returns: 1
And now what I cannot understand:
select 1 from dual where '' = '';
No result.
Even if I check if they are different there is still no result.
select 1 from dual where '' != '';
No result.
Can someone explain this to me?
Oracle treats empty strings as NULL. It's a gotcha. Make a note of it and hope it never bites you in the butt in production.
The reason is as @Captain Kenpachi explained. If you want to compare two strings (or other values of the same type) and want to be tolerant of NULLs (or of empty strings in Oracle, which treats them the same), then you need to involve an IS test.
You could try the common cheat of using a rogue value that will never be used, but Murphy's Law dictates that one day someone will use it. This technique also has the drawback that the rogue value must match the type of the thing you are comparing, i.e. comparing strings you need a rogue string, while comparing dates you need a rogue date. This also means you can't cut-and-paste it liberally without applying a little thought. Example:
WHERE NVL(col1,'MyRogueValue')=NVL(col2,'MyRogueValue')
The standard version is to explicitly test for NULLs
WHERE (col1=col2 OR (col1 IS NULL AND col2 IS NULL))
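The opposite becomes WHERE NOT(col1=col2 OR (col1 IS NULL AND col2 IS NULL)), although under three-valued logic NOT of an unknown result is still unknown, so rows where exactly one column is NULL are not returned.
I have also seen a long-winded opposite version (as seen in Toad's data compare tool)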
WHERE (col1<>col2 OR (col1 IS NULL AND col2 IS NOT NULL) OR (col1 IS NOT NULL AND col2 IS NULL))
Oracle does have a handy DECODE function that is basically IF a IS b THEN c ELSE d, so equality is WHERE DECODE(col1,col2,1,0)=1 and the opposite is WHERE DECODE(col1,col2,1,0)=0. You may find this a little slower than the explicit IS test. It is proprietary to Oracle but helps make up for the empty string problem.

Oracle sql null value is not selected

In the NAME table, the FIRST column contains nulls, but no rows are selected. Please help me understand why.
SELECT * FROM NAME WHERE FIRST != '1'
Any comparison with null evaluates to unknown, which a WHERE clause treats as false; this applies to =, <>, >, < and so on. You cannot use null in an IN list either, as it would be ignored. Moreover, two nulls are not even equal to each other.
To get the nulls, you need to ask for them explicitly, like this:
SELECT * FROM NAME WHERE FIRST IS NULL OR FIRST != '1'
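A quick sketch of the difference, assuming SQLite via Python's sqlite3 rather than Oracle. The table people and column first_name are hypothetical stand-ins for NAME and FIRST.

```python
import sqlite3

def query_names(where_clause):
    """Run a WHERE clause against a tiny table holding '1', '2' and NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (first_name TEXT)")  # hypothetical table
    conn.executemany("INSERT INTO people VALUES (?)", [("1",), ("2",), (None,)])
    rows = conn.execute(
        f"SELECT first_name FROM people WHERE {where_clause}"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]

plain = query_names("first_name != '1'")
with_null_check = query_names("first_name != '1' OR first_name IS NULL")
with_coalesce = query_names("COALESCE(first_name, '<null>') <> '1'")
print(plain, with_null_check, with_coalesce)
```

The plain inequality returns only the '2' row. Both the explicit IS NULL test and the COALESCE wrapper also return the NULL row.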
Any comparison to NULL returns NULL, which is equivalent to FALSE. This is true even of not-equals.
If you want to include NULL values, do one of the following:
where first <> '1' or first is null
or
where coalesce(first, '<null>') <> '1'
In Oracle, null is not considered a legal value to select unless you explicitly ask for it:
select * from name where (first != '1') or first is null
You could also use NVL (similar to coalesce):
select * from name where nvl(first,'0') != '1'
That is correct because NULL can never be compared with anything else....
The only option that you have is to include a NULL check as an or in the command
SELECT * FROM NAME WHERE FIRST!=1 OR FIRST IS NULL
According to the Oracle documentation, NULL is defined as a value that is not known or not meaningful. That is why Oracle says not to treat a value of zero as NULL. This is just an FYI, an add-on. Thanks!
NULL is dumb. Period.
NULL is evil.
If X is NULL and Y is NULL, then X does in fact equal Y because they are both NULL.
It's also a PITA that I can't say
IF X IN ('a','B','C', null)
Because this condition happens. But now I have to say
IF ( X IN ('a','B','C') or X is NULL )
which is a waste of time and a risk of error if I forget the parentheses.
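For reference, a sketch of how the two conditions differ in practice. It assumes SQLite via Python's sqlite3, and the table t(x) is hypothetical.

```python
import sqlite3

def select_x(where_clause):
    """Run a WHERE clause against a table holding 'a', 'z' and NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x TEXT)")  # hypothetical table
    conn.executemany("INSERT INTO t VALUES (?)", [("a",), ("z",), (None,)])
    rows = conn.execute(f"SELECT x FROM t WHERE {where_clause}").fetchall()
    conn.close()
    return [r[0] for r in rows]

null_in_list = select_x("x IN ('a', 'B', 'C', NULL)")
explicit_check = select_x("(x IN ('a', 'B', 'C') OR x IS NULL)")
print(null_in_list, explicit_check)
```

The NULL inside the IN list never matches anything, including a NULL x, so only the explicit IS NULL test picks up that row.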
What irks me further is that NULL shouldn't happen in the first place. Fields (er... ok kids, I'll call them Columns) should always be initialized. Period. Stop the nulls. Don't allow them. Default values should always be zeroes or blanks so that those folks that are too lazy to initialize columns in their software will have them initialized for them automatically.
There are many instances where a failure to define default values of zeroes and blanks makes life more difficult than it has to be.

In most SQL implementations, as opposed to standard programming languages, why doesn't x != null return true?

Let's suppose that x is some variable that has any value other than null, say 4, as an example. What should the following expression return?
x != null
In just about every programming language I have ever worked with (C#, Javascript, PHP, Python), this expression, or an equivalent expression in that language, evaluates to true.
SQL implementations, on the other hand, all seem to handle this quite differently. If one or both operands of the inequality operator are NULL, either NULL or False will be returned. This is basically the opposite of the behavior that most programming languages use, and it is extremely unintuitive to me.
Why is the behavior in SQL like this? What is it about relational database logic that makes null behave so much differently than it does in general purpose programming?
The null in most programming languages is considered "known", while NULL in SQL is considered "unknown".
So X == null compares X with a known value and the result is known (true or false).
But X = NULL compares X with an unknown value and the result is unknown (i.e. NULL, again). As a consequence, we need a special operator IS [NOT] NULL to test for it.
I'm guessing at least part of the motivation for such NULLs would be the behavior of foreign keys. When a child endpoint of a foreign key is NULL, it shouldn't match any parent, even if the parent is NULL (which is possible if parent is UNIQUE instead of primary key). Unfortunately, this brings many more gotchas than it solves and I personally think SQL should have gone the route of the "known" null and avoided this monkey business altogether.
Even E. F. Codd, inventor of the relational model, later indicated that the traditional NULL is not optimal. But for historical reasons, we are pretty much stuck with it.
the reason is that the concept of equality doesn't apply to null. it's not logically true to say that this null does or does not equal this other null.
so, that's all fine for a theoretical reason, but for the sake of convenience, why does sql not allow you to say (x != null)?
well, the reason is because sometimes you want to handle nulls differently.
if I say (columnA = columnB) for example, should that return true if both columns are null?
if I say (columnA != columnB) - should it give the same result when column A is "a" and column B is null, and when column A is "a" and column B is "b"?
the people who made sql decided that distinction was important and so they wrote it to treat the 2 cases differently.
the wikipedia page on this has a pretty decent writeup - http://en.wikipedia.org/wiki/Null_%28SQL%29
well in sql engines you usually don't use the "=" operator but "IS", which then makes it more intuitive.
SELECT 4 IS NULL FROM dual;
> 0
SELECT 4 IS NOT NULL FROM dual;
> 1
NULL doesn't stand for null pointer, it's just not the same concept at all.
sql NULL is an "I don't know the value" flag, it's not a "there's no pointer" flag. You just should not compare them, they shouldn't be used the same way. This is pretty unintuitive, you're right; they should have named it differently.
In SQL, NULL means "an unknown value".
If you say x != NULL you are saying "is the value of x unequal to an unknown value". Well, since we don't know what unknown value is, we don't know if x is equal to it or not. So the answer is "I don't know".
Similarly:
x = NULL OR 1=2 -- Unknown. 1=2 is not true, but we don't know about x=NULL
x = NULL OR 1=1 -- True. We know that at least 1=1 is true, so the OR is fulfilled regardless.
x = NULL AND 1=1 -- Unknown. We want them both to be true to fulfill the AND
x = NULL AND 1=2 -- False. We know 1=2 is false, so the AND is not fulfilled regardless.
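These four cases can be checked directly. The sketch below assumes SQLite via Python's sqlite3, substitutes the literal 4 for x, and relies on SQLite reporting true as 1, false as 0 and unknown as NULL (None in Python).

```python
import sqlite3

def eval_sql(expr):
    """Evaluate a scalar SQL expression and return its value."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(f"SELECT {expr}").fetchone()[0]
    finally:
        conn.close()

EXPRESSIONS = [
    "4 = NULL OR 1 = 2",
    "4 = NULL OR 1 = 1",
    "4 = NULL AND 1 = 1",
    "4 = NULL AND 1 = 2",
]
results = {expr: eval_sql(expr) for expr in EXPRESSIONS}
for expr, value in results.items():
    print(expr, "->", value)
```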
Also
-- Neither statement will select rows where x is null
select x from T where x = 1
select x from T where x != 1
The only way to check a null is to specifically ask "is it true that we don't know what the value of x is". That has a yes or no answer, and uses the IS keyword.
If you just want nulls to be treated as zero, or another value, you can use the COALESCE or ISNULL function.
COALESCE(NULL, 1) -- 1
COALESCE(NULL, NULL, 1) -- Also 1
COALESCE(x, y, z, 0) -- x, unless it is null, then y, unless it is null, then z, unless it is null in which case 0.

What applications are there for NULLIF()?

I just had a trivial but genuine use for NULLIF(), for the first time in my career in SQL. Is it a widely used tool I've just ignored, or a nearly-forgotten quirk of SQL? It's present in all major database implementations.
If anyone needs a refresher, NULLIF(A, B) returns the first value, unless it's equal to the second in which case it returns NULL. It is equivalent to this CASE statement:
CASE WHEN A <> B OR B IS NULL THEN A END
or, in C-style syntax:
A == B || A == null ? null : A
So far the only non-trivial example I've found is to exclude a specific value from an aggregate function:
SELECT COUNT(NULLIF(Comment, 'Downvoted'))
This has the limitation of only allowing one to skip a single value; a CASE, while more verbose, would let you use an expression.
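A small sketch of the aggregate trick, assuming SQLite via Python's sqlite3. The posts table and its sample comments are made up for illustration.

```python
import sqlite3

def comment_counts(comments):
    """Return (all rows, non-null comments, non-null comments except 'Downvoted')."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (Comment TEXT)")  # hypothetical table
    conn.executemany("INSERT INTO posts VALUES (?)", [(c,) for c in comments])
    row = conn.execute(
        "SELECT COUNT(*), COUNT(Comment), COUNT(NULLIF(Comment, 'Downvoted')) "
        "FROM posts"
    ).fetchone()
    conn.close()
    return row

counts = comment_counts(["Nice", "Downvoted", "Downvoted", None, "ok"])
print(counts)
```

COUNT skips NULLs, so turning the unwanted value into NULL removes it from the tally.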
For the record, the use I found was to suppress the value of a "most recent change" column if it was equal to the first change:
SELECT Record, FirstChange, NULLIF(LatestChange, FirstChange) AS LatestChange
This was useful only in that it reduced visual clutter for human consumers.
I rather think that
NULLIF(A, B)
is syntactic sugar for
CASE WHEN A = B THEN NULL ELSE A END
But you are correct: it is mere syntactic sugar to aid the human reader.
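The two CASE spellings given in this thread can be compared against NULLIF over every NULL and non-NULL combination. This is only a sketch, assuming SQLite via Python's sqlite3, with bound parameters standing in for A and B.

```python
import sqlite3
from itertools import product

def nullif_variants(a, b):
    """Return NULLIF(a, b) alongside the two CASE spellings of it."""
    conn = sqlite3.connect(":memory:")
    row = conn.execute(
        "SELECT NULLIF(:a, :b),"
        " CASE WHEN :a = :b THEN NULL ELSE :a END,"
        " CASE WHEN :a <> :b OR :b IS NULL THEN :a END",
        {"a": a, "b": b},
    ).fetchone()
    conn.close()
    return row

table = {(a, b): nullif_variants(a, b) for a, b in product([None, 1, 2], repeat=2)}
for pair, row in table.items():
    print(pair, row)
```

On these inputs all three columns agree, including the cases where A or B is NULL.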
I often use it where I need to avoid the Division by Zero exception:
SELECT
COALESCE(Expression1 / NULLIF(Expression2, 0), 0) AS Result
FROM …
Three years later, I found a material use for NULLIF: using NULLIF(Field, '') translates empty strings into NULL, for equivalence with Oracle's peculiar idea about what "NULL" represents.
NULLIF is handy when you're working with legacy data that contains a mixture of null values and empty strings.
Example:
SELECT COALESCE(NULLIF(firstColumn, ''), secondColumn) FROM table WHERE this = that
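A sketch of the same idea on sample data. It assumes SQLite via Python's sqlite3, where an empty string is distinct from NULL. The legacy table and its rows are hypothetical.

```python
import sqlite3

def first_non_blank(rows):
    """Return firstColumn unless it is NULL or empty, else secondColumn."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE legacy (firstColumn TEXT, secondColumn TEXT)")
    conn.executemany("INSERT INTO legacy VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT COALESCE(NULLIF(firstColumn, ''), secondColumn) "
        "FROM legacy ORDER BY rowid"
    ).fetchall()
    conn.close()
    return [r[0] for r in result]

values = first_non_blank([("x", "y"), ("", "y"), (None, "y")])
print(values)
```

NULLIF folds the empty string into NULL first, so COALESCE can treat both kinds of "missing" the same way.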
SUM and COUNT skip nulls, and COUNT returns zero for a group that contains nothing but nulls. I could see NULLIF being handy when you want to undo that behavior. In fact, this came up in a recent answer I provided. If I had remembered NULLIF, I probably would have written the following:
SELECT student,
NULLIF(coursecount,0) as courseCount
FROM (SELECT cs.student,
COUNT(os.course) coursecount
FROM #CURRENTSCHOOL cs
LEFT JOIN #OTHERSCHOOLS os
ON cs.student = os.student
AND cs.school <> os.school
GROUP BY cs.student) t

Addition with NULL values

In a stored procedure (Oracle in my case), I want to add some values to an existing record. Problem is that both the existing value and the value to be added can be null. I only want the result to be NULL when both operands are null. If only one of them is null, I want the result to be the other operand. If both are non-null, I want the result to be "normal" addition.
Here's what I am using so far:
SELECT column INTO anz_old FROM aTable Where <someKeyCondition>;
IF anz_old IS NULL
THEN
anz_new := panzahl;
ELSE
anz_new := anz_old + NVL (panzahl, 0);
END IF;
UPDATE aTable set column = anz_new Where <someKeyCondition>;
Is there a more elegant way (preferably completely in SQL, i.e. just in an update statement, short of a long CASE statement with basically the same logic as the above code)?
If you want to add a and b and either may be null, you could use coalesce, which returns the first non-null parameter you pass it:
coalesce(a+b, a, b)
So in this case, if neither parameter is null, it will return the sum. If only b is null, it will skip a+b and return a. If a is null, it will skip a+b and a and return b, which will only be null if they are both null.
If you want the answer to be 0 rather than null if both a and b are null, you can pass 0 as the last parameter:
coalesce(a+b, a, b, 0)
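A sketch that tabulates both forms, plus the replace-NULL-with-zero sum, for every NULL and non-NULL combination. It assumes SQLite via Python's sqlite3 instead of Oracle, with bound parameters standing in for a and b.

```python
import sqlite3
from itertools import product

def add_variants(a, b):
    """Return the three NULL-tolerant sums discussed for a and b."""
    conn = sqlite3.connect(":memory:")
    row = conn.execute(
        "SELECT COALESCE(:a + :b, :a, :b),"
        " COALESCE(:a + :b, :a, :b, 0),"
        " COALESCE(:a, 0) + COALESCE(:b, 0)",
        {"a": a, "b": b},
    ).fetchone()
    conn.close()
    return row

sums = {(a, b): add_variants(a, b) for a, b in product([None, 1, 2], repeat=2)}
for pair, row in sums.items():
    print(pair, row)
```

The three forms differ only when both operands are NULL: the first yields NULL, while the other two yield 0.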
Do consider @erwin's answer - null might not be the right thing to be using.
I accomplished it this way:
coalesce("Column1",0.00) + coalesce("Column2",0.00)
I'm working with front end high level execs.... They don't understand why NULL and 0 aren't handled the same way.
In my case it works, just replacing NULLs with 0.00... may not in all though :)
You can also use ISNULL, so if you have 3 values
isnull(val1,0)+isnull(val2,0)+isnull(val3,0)
Whichever column holds a NULL will contribute 0; otherwise its original value is used.
In SQL, Null is supposed to be a state that says "I don't know".
If you don't know how much b is, then you also do not know how much a+b is, and it is misleading to pretend that a+b=a in that case.
In SQL terms, when adding numbers, a result of NULL means there were no non-null numbers added.
This suggests that a sensible answer in SQL terms would be
CASE WHEN A IS NULL AND B IS NULL THEN NULL ELSE ISNULL(A, 0) + ISNULL(B, 0) END