What applications are there for NULLIF()? - sql

I just had a trivial but genuine use for NULLIF(), for the first time in my career in SQL. Is it a widely used tool I've just ignored, or a nearly-forgotten quirk of SQL? It's present in all major database implementations.
If anyone needs a refresher, NULLIF(A, B) returns the first value, unless it's equal to the second in which case it returns NULL. It is equivalent to this CASE statement:
CASE WHEN A <> B OR B IS NULL THEN A END
or, in C-style syntax:
A == B || A == null ? null : A
So far the only non-trivial example I've found is to exclude a specific value from an aggregate function:
SELECT COUNT(NULLIF(Comment, 'Downvoted'))
This has the limitation of only allowing one to skip a single value; a CASE, while more verbose, would let you use an expression.
For the record, the use I found was to suppress the value of a "most recent change" column if it was equal to the first change:
SELECT Record, FirstChange, NULLIF(LatestChange, FirstChange) AS LatestChange
This was useful only in that it reduced visual clutter for human consumers.

I rather think that
NULLIF(A, B)
is syntactic sugar for
CASE WHEN A = B THEN NULL ELSE A END
But you are correct: it is mere syntactic sugar to aid the human reader.

I often use it where I need to avoid the Division by Zero exception:
SELECT
COALESCE(Expression1 / NULLIF(Expression2, 0), 0) AS Result
FROM …

Three years later, I found a material use for NULLIF: using NULLIF(Field, '') translates empty strings into NULL, for equivalence with Oracle's peculiar idea about what "NULL" represents.

NULLIF is handy when you're working with legacy data that contains a mixture of null values and empty strings.
Example:
SELECT(COALESCE(NULLIF(firstColumn, ''), secondColumn) FROM table WHERE this = that

SUM and COUNT have the behavior of turning nulls into zeros. I could see NULLIF being handy when you want to undo that behavior. If fact this came up in a recent answer I provided. If I had remembered NULLIF I probably would have written the following
SELECT student,
NULLIF(coursecount,0) as courseCount
FROM (SELECT cs.student,
COUNT(os.course) coursecount
FROM #CURRENTSCHOOL cs
LEFT JOIN #OTHERSCHOOLS os
ON cs.student = os.student
AND cs.school <> os.school
GROUP BY cs.student) t

Related

Does the concatenation of NULL + 'something' always yield NULL? Why?

I want to apply an OUTER APPLY computation on every row of a set. This computation concatenates a number of string fields into one but sometimes theses fields are null or filled with empty strings. The concatenation uses '-' as a delimiter between fields, so when they are empty ('') the result is '----' instead of a NULL, the result I'd like to get.
Before doing this computation I need to check the value of these fields.
How would you do it?
I thought about using NULLIF and it seems that it behaves as I expect, but I don't know why.
Will the concatenation of NULL +'something' always be NULL? Why?
SELECT
string_1,
string_2,
string_3,
string_4,
string_5,
string_concat,
FROM Table1
OUTER APPLY(VALUES(NULLIF(string_1,'')+'-'+NULLIF(string_2,'')+'-'+NULLIF(string_3,'')+'-'+NULLIF(string_4,'')+'-'+NULLIF(string_5,''))) v1(string_concat)
Table1 doesn't have any index and I can't implement any.
Is this code better in terms of performance than doing a CASE in the SELECT?
I like it because the code looks more clean, though.
Does the concatenation of NULL + NULL + 'something' always be NULL?
Why?
This depends on SET CONCAT_NULL_YIELDS_NULL setting. If it is ON (the default) then yes concatenating a NULL with the + operator will always yield NULL
I'd probably do it like this though
SELECT string_1,
string_2,
string_3,
string_4,
string_5,
string_concat,
FROM Table1
CROSS APPLY(VALUES (NULLIF(CONCAT(string_1, '-', string_2, '-', string_3, '-', string_4, '-', string_5), '----')) ) v1(string_concat)
so only one NULLIF is needed - on the CONCAT result.
As the VALUES clause always returns exactly one row you can use CROSS APPLY
Will the concatenation of NULL +'something' always be NULL? Why?
Aside the technical explanation that has already been given, you may be interested in the logical motivation on why it is like that.
NULL actually means you don't know something or that something is not applicable and therefore not usable in that context. So anything you do with something you don't know, or doesn't make sense, will yield an "I still don't know/doesn't make sense" result.
For example: let's say a friend of yours has 2 cars, and another friends of yours has some cars but you don't know how many. If I would ask you how many cars your friends have in total, that only answer you could give me is that you don't really know.
Hope this helps to understand. The issue with NULL is much bigger than this as it involves 3-valued-logic instead of the more common and well understood 2-valued-logic. Here's more detail on the issue: http://www.dbdebunk.com/2017/04/null-value-is-contradiction-in-terms.html

Oracle: True value in WHERE

I am building a SQL query inside a function. The functions parameters are my criterias for the WHERE clause. Parameters can be null too.
foo (1,2,3) => SELECT x FROM y WHERE a=1 AND b=2 AND c=3;
foo (null, 2, null) => SELECT x FROM y WHERE b=2;
My approach to do that in code is to add a very first alltime true in the WHERE-clause (e.g. 1=1 or NULL is NULL or 2 > 1)
So I do not need to handle the problem that the 1st WHERE condition is after a "WHERE" and all others are after a "AND".
String sql="SELECT x FROM y WHERE 1=1";
if (a!=null)
{
sql += " AND a="+a;
}
Is there a better term than 1=1 or my other samples to EXPLICITLY have a always true value? TRUE and FALSE is not working.
Oracle does not support a boolean type in SQL. It exists in PL/SQL, but can't be used in a query. The easiest thing to do is 1=1 for true and 0=1 for false. I'm not sure if the optimizer will optimize these out or not, but I suspect the performance impact is negligible. I've used this where the number of predicates is unknown until runtime.
I think this construct is superior to anything using nvl, coalesce, decode, or case, because those bypass normal indexes and I've had them lead to complicated and slow execution plans.
So, yes I'd do something like you have (not sure what language you're building this query in; this is not valid PL/SQL. Also you should use bind variables but that's another story):
sql = "SELECT x FROM y WHERE 1=1";
if (a != null)
{
sql += " AND a=" + a;
}
if (b != null)
{
sql += " AND b=" + b;
}
... etc.
I think this is better than Gordon Linoff's suggestion, because otherwise this would be more complicated because you'd have to have another clause for each predicate checking if there was a previous predicate, i.e. whether you needed to include the AND or not. It just makes the code more verbose to avoid a single, trivial clause in the query.
I've never really understood the approach of mandating a where clause even when there are no conditions. If you are constructing the logic, then combine the conditions. If the resulting string is empty, then leave out the where clause entirely.
Many application languages have the equivalent of concat_ws() -- string concatenation with a separator (join in Python, for instance). So leaving out the where clause does not even result in code that is much more complicated.
As for your question, Oracle doesn't have boolean values, so 1=1 is probably the most common approach.
use NVL so you can set a default value. Look up NVL or NVL2. Either might work for you.
check out: http://www.dba-oracle.com/t_nvl_vs_nvl2.htm
In sql the standard way to "default null" is to use coalesce. So lets say your input is #p1, #p2, and #p3
Then
select *
from table
where a = coalesce(#p1,a) and
b = coalesce(#p2,b) and
c = coalesce(#p3,c)
Thus if (as an example) #p1 is null then it will compare a = a (which is always true). If #p1 is not null then it will filter on a equals that value.
In this way null becomes a "wildcard" for all records.
you could also use decode to get the results you need since it is oracle
Select
decode(a, 1, x,null),
decode(b, 2, x,null),
decode(c, 3, x,null)
from y
What about using OR inside of the parentheses while ANDs exist at outside of them
SELECT x
FROM y
WHERE ( a=:prm1 OR :prm1 is null)
AND ( b=:prm2 OR :prm2 is null)
AND ( c=:prm3 OR :prm3 is null);

What does this SQL Query mean?

I have the following SQL query:
select AuditStatusId
from dbo.ABC_AuditStatus
where coalesce(AuditFrequency, 0) <> 0
I'm struggling a bit to understand it. It looks pretty simple, and I know what the coalesce operator does (more or less), but dont' seem to get the MEANING.
Without knowing anymore information except the query above, what do you think it means?
select AuditStatusId
from dbo.ABC_AuditStatus
where AuditFrequency <> 0 and AuditFrequency is not null
Note that the use of Coalesce means that it will not be possible to use an index properly to satisfy this query.
COALESCE is the ANSI standard function to deal with NULL values, by returning the first non-NULL value based on the comma delimited list. This:
WHERE COALESCE(AuditFrequency, 0) != 0
..means that if the AuditFrequency column is NULL, convert the value to be zero instead. Otherwise, the AuditFrequency value is returned.
Since the comparison is to not return rows where the AuditFrequency column value is zero, rows where AuditFrequency is NULL will also be ignored by the query.
It looks like it's designed to detect a null AuditFrequency as zero and thus hide those rows.
From what I can see, it checks for fields that aren't 0 or null.
I think it is more accurately described by this:
select AuditStatusId
from dbo.ABC_AuditStatus
where (AuditFrequency IS NOT NULL AND AuditFrequency != 0) OR 0 != 0
I'll admit the last part will never do anything and maybe i'm just being pedantic but to me this more accurately describes your query.
The idea is that it is desireable to express a single search condition using a single expression but it's merely style, a question of taste:
One expression:
WHERE age = COALESCE(#parameter_value, age);
Two expressions:
WHERE (
age = #parameter_value
OR
#parameter_value IS NULL
);
Here's another example:
One expression:
WHERE age BETWEEN 18 AND 65;
Two expressions
WHERE (
age >= 18
AND
age <= 65
);
Personally, I have a strong personal perference for single expressions and find them easier to read... if I am familiar with the pattern used ;) Whether they perform differently is another matter...

Matching BIT to DATETIME in CASE statement

I'm attempting to create a T-SQL case statement to filter a query based on whether a field is NULL or if it contains a value. It would be simple if you could assign NULL or NOT NULL as the result of a case but that doesn't appear possible.
Here's the psuedocode:
WHERE DateColumn = CASE #BitInput
WHEN 0 THEN (all null dates)
WHEN 1 THEN (any non-null date)
WHEN NULL THEN (return all rows)
From my understanding, the WHEN 0 condition can be achieved by not providing a WHEN condition at all (to return a NULL value).
The WHEN 1 condition seems like it could use a wildcard character but I'm getting an error regarding type conversion. Assigning the column to itself fixes this.
I have no idea what to do for the WHEN NULL condition. My internal logic seems to think assigning the column to itself should solve this but it does not as stated above.
I have recreated this using dynamic SQL but for various reasons I'd prefer to have it created in the native code.
I'd appreciate any input. Thanks.
The CASE expression (as OMG Ponies said) is mixing and matching datatypes (as you spotted), in addition you can not compare to NULL using = or WHEN.
WHERE
(#BitInput = 0 AND DateColumn IS NULL)
OR
(#BitInput = 1 AND DateColumn IS NOT NULL)
OR
#BitInput IS NULL
You could probably write it using CASE but what you want is an OR really.
You can also use IF..ELSE or UNION ALL to separate the 3 cases

testing inequality with columns that can be null

So, I asked a question this morning, which I did not phrase correctly, so I got a lot of responses as to why NULL compared to anything will give NULL/FALSE.
My actual question was, what is the time honored fashion in which db guys test inequalities for two columns that can both be NULL. My question is the exact opposite of this question.
The requirements are as follows, A and B are two columns:
a) if A and B are both NULL, they are equal, return FALSE
b) if A and B are both not NULL, then return A<>B
c) if either A or B are NULL, they are not equal, return TRUE
Depending on the data type and possible values for the columns:
COALESCE(A, -1) <> COALESCE(B, -1)
The trick is finding a value (here I used -1) that will NEVER appear in your data.
The other way would be:
(A <> B) OR (A IS NOT NULL AND B IS NULL) OR (A IS NULL AND B IS NOT NULL)
This can be a problem depending on how your particular RDBMS handles NULLs. By the ANSI standard, this should give you what you want, but who follows standards anyway. :)
P.S. - I should also point out that using the COALESCE function may invalidate the use of indexes in comparing the columns. Check your query plan and performance of the query to see if that's a problem.
P.P.S. - I just noticed that OMG Ponies mentioned that Informix doesn't support COALESCE. It's an ANSI standard function I believe, but see what I said above about standards...
I would personally write out the expression you came up with, especially if the table is expected to grow large. Wrapping the columns in function calls hurts performance by making it so the engine can't use any indexes you have on those columns. Of course, in a small table, this may not be any sort of issue, but I still like to do it the explicit way just in case a table ends up growing.
can you try something like this in informix?
CASE
WHEN a IS NULL AND B IS NULL THEN false
WHEN a IS NULL OR B IS NULL THEN true
ELSE a <> B
END
from IBM Informix Guide to SQL: Syntax , CASE Expressions
If you want to be sure about how NULLs are handled, you'll have to use whatever Informix supports for null checking. I haven't turned up much, other than the SE version doesn't support COALESCE, but it does support DECODE and possibly CASE.
WHERE COALESCE(t.a, 0) != COALESCE(t.b, 0)
WHERE DECODE(NULL, 0, t.a) != DECODE(NULL, 0, t.b)
For SQL Server, use:
WHERE ISNULL(A, '') <> ISNULL(B, '')
The trouble is that a<>b (or a=b) yields NULL, not 1 or 0 when one or both operands are NULL. This doesn't matter for the = case because NULL OR 1 is 1 and NULL OR 0 is NULL which behaves like 0 for selecting in a WHERE clause.
You could say:
a<>b OR (a IS NULL)<>(b IS NULL)
However needing to do it either way may be a sign that you're misusing NULL and should consider changing the schema to use some other NOT NULL value to signify this comparable condition.
For example if you've got a person table with a title column, don't use NULL to signify that they have no title; that's not a ‘missing’ datum, it's just that no title exists. So store it as an empty string '' that you can happily compare with other empty strings. (Well unless you run Oracle of course, with its Empty String Problem...)
IBM Informix Dynamic Server has a somewhat peculiar view of booleans for a variety of historical (aka 'bad') reasons. Adapting the idea suggested by #astander, this CASE expression 'works', but I'd be the first to say 'not obvious' (see - I said it before you did!). The setup phase:
create table x(a int, b int);
insert into x values(null, null);
insert into x values(null, 1);
insert into x values(1, null);
insert into x values(1, 1);
insert into x values(1, 2);
The SELECT statement:
SELECT *
FROM x
WHERE CASE
WHEN a IS NULL AND b IS NULL THEN 'f'::BOOLEAN
WHEN a IS NULL OR b IS NULL THEN 't'::BOOLEAN
WHEN a != b THEN 't'::BOOLEAN
ELSE 'f'::BOOLEAN
END
;
The result from this query is:
1
1
1 2
Issues:
IDS does not recognize FALSE or TRUE or UNKNOWN as keywords.
IDS does not recognize boolean expressions such as 'a != b' (or 'a <> b') as such.
Yes, it pains me greatly to have to state this.
If
where ((A=B) OR (A IS NULL AND B IS NULL))
is for equality, then why just not use:
where NOT (
((A=B) OR (A IS NULL AND B IS NULL))
)
for inequality?
A slight modification of #user3830747 answer, based on demorgans law:
NOT (NVL(a = b,FALSE) OR COALESCE(a,b) IS NULL)