SQL why use isnull(field, '') <> ''? - sql

I'm reviewing some code I've inherited, and I found a line like this:
And isnull(IH.CustomerItemNumber, '') <> ''
Which one of my predecessors seems to have used in a ton of where clause or join clauses. It looks to me like this is an unnecessary calling of a function, and therefor a performance hog, because NULL will never equal the empty string '', right?
Specifically I took this out of a join clause in a particular query and performance increased dramatically (from 46-49 secs to between 1-3).
Replaced it with AND IH.CustomerItemNumber <> ''
Is my assessment correct here? This is redundant and slow and can be removed? In what situation might this code be beneficial?
EDIT: So, can NULL ever equal the empty string?

This is semantically the same as:
And IH.CustomerItemNumber <> '' And IH.CustomerItemNumber Is Not Null
So it is checking that the column is both not null and not an empty string. Could be important.
UPDATE
In this case, because we're looking for non-equality of a string literal (empty string), you have at least three semantically correct options:
And IH.CustomerItemNumber <> ''
And IH.CustomerItemNumber <> '' And IH.CustomerItemNumber Is Not Null
And isnull(IH.CustomerItemNumber, '') <> ''
The first is going to return the same result set as the other two because <> '' will not match a null, regardless of the ansi_nulls setting.
In a quick test on a dev system, both the first and the second utilized an index seek. The first is very slightly outperforming the second in one of a few very simplified tests.
The third, since it adds a function call, may not utilize indexing like the others, so this is probably the worst choice. That said, in a quick test, isnull was able to use an index scan. Further adding Is Not Null to the third choice actually sped it up and moved it to an index seek. Go figure (GO! GO! Query optimizer!).
As with #Gordon, I would also choose the second option most times since it would better state my intent to other developers (or myself) and would be a better practice to follow if we were checking equality against another column which could be null (think of potential ansi_nulls off).
For completeness' sake:
And nullif(IH.CustomerItemNumber, '') is not null
And case when IH.CustomerItemNumber = '' then null else IH.CustomerItemNumber end is not null
And case IH.CustomerItemNumber when '' then null else IH.CustomerItemNumber end is not null
Are all interpreted exactly the same way (as far as I can tell) in SQL Server and perform the same as the third option above.

The reason that the code is there may be because of the history of the application. Perhaps at some point in time, NULLs were allowed in the field. Then, these were replaced with empty strings.
The reason the code is inefficient is because of the optimization of the join. ISNULL() and its ANSI standard equivalent COALESCE() generally add negligible overhead to the processing of a query. (It does seem that in some versions of SQl Server, COALESCE() evaluates the first argument twice, which is a problem if it is a subquery.)
My guess is that the field has an index on it. SQL Server knows to use the index for the join when the field is used alone. It is not smart enough to use the index when included in a function call. It is the join optimziation that is slowing down the query, not the overhead of the function call.
Personally, I would prefer the form with the explicit NULL check, if the performance is the same:
IH.CustomerItemNumber <> '' and IH.CustomerItemNumber is not null
Being explicit about NULL processing can only help you maintain the code in the future.

You can use for NULL Checking:
And (IH.CustomerItemNumber IS NOT NULL) AND (IH.CustomerItemNumber <> '')
BTW,
ISNULL ( check_expression , replacement_value ) - Replaces NULL with the specified replacement value.
In your case, if the value of IH.CustomerItemNumber is null then it will be replaced by empty value which will then be compared with empty string.

because NULL will never equal the empty string '', right?
NULL is also never not equal to the empty string... it's both at once and neither all at the same time. It communicates a state where you just don't know for sure.

Related

Check column IS NULL not working in Case Expression

I am trying to check a column which belongs to View "IS NULL" and it works good with select statement:
SELECT TOP (1000) [Invoice_Number]
,[Invoice_Date]
,[Invoice_Amount]
,[Invoice_CoCd]
,[Invoice_vendor]
,[Invoice_PBK]
,[Invoice_DType]
,[Invoice_DueDate]
,[Invoice_ClgDate] FROM [dbo].[viewABC] where Invoice_PBK IS NULL
Now I have an update statement, where I need to update a column in a table based on NULL in VIEW:
UPDATE cis
SET
cis.InvoiceStatus =
(
CASE
WHEN RTRIM(LTRIM(imd.[Invoice_PBK])) IS NULL THEN
'HELLO'
WHEN RTRIM(LTRIM(imd.Invoice_DType)) = 'RD'
THEN '233' END)
FROM
[dbo.[tblABC] cis,
[dbo].[viewABC] imd
WHERE [condition logic]
These is no issue with where condition, the IS NULL in the CASE expression causing the problem.
Can someone help please?
You may want to just test your Logic piece by piece.
Try
SELECT ISNULL(RTRIM(LTRIM(imd.[Invoice_PBK])),'Hey Im Null')
FROM
[dbo].[tblABC] cis,
[dbo].[viewABC] imd
WHERE [condition logic]
See if you get that String value on your returned column(s) from the Query. Then, build from there. It could be something as simple as the JOIN. Which I'm not a fan of that old syntax but, without more info on this table other than [conditions]. It's hard to just guess an answer for you. You have no detailed evidence that helps us. It very well could be your conditions but, you're saying "condition logic" and that does nothing for the group on this thread.
Looking at this expression:
cis.InvoiceStatus =
CASE
WHEN RTRIM(LTRIM(imd.[Invoice_PBK])) IS NULL THEN
'HELLO'
WHEN RTRIM(LTRIM(imd.Invoice_DType)) = 'RD' THEN
'233'
END
And this symptom:
CIS.InvoiceStatus gets updated with NULL
The obvious conclusion is neither WHEN condition is met, and therefore the result of the CASE expression is NULL.
Maybe you wanted this, which will preserve the original value in that situation:
cis.InvoiceStatus =
CASE
WHEN RTRIM(LTRIM(imd.[Invoice_PBK])) IS NULL THEN
'HELLO'
WHEN RTRIM(LTRIM(imd.Invoice_DType)) = 'RD' THEN
'233'
ELSE
cis.InvoiceStatus
END
Or maybe you wanted this, to also match an empty string value:
cis.InvoiceStatus =
CASE
WHEN NULLIF(RTRIM(LTRIM(imd.[Invoice_PBK])),'') IS NULL THEN
'HELLO'
WHEN RTRIM(LTRIM(imd.Invoice_DType)) = 'RD' THEN
'233'
END
It's also worth pointing out the two WHEN conditions are looking at two different columns.
Finally, it may be worth a data clean-up project here. Needing to do an LTRIM() will break any chance of using indexes on those fields (RTRIM() is slightly less bad), and index use cuts to the core of database performance.

Simple where clause condition involving NULL

I have a query that needs to exclude both Null and Blank Values, but for some reason I can't work out this simple logic in my head.
Currently, my code looks like this:
WHERE [Imported] = 0 AND ([Value] IS NOT NULL **OR** [Value] != '')
However, should my code look like this to exclude both condition:
WHERE [Imported] = 0 AND ([Value] IS NOT NULL **AND** [Value] != '')
For some reason I just can't sort this in my head properly. To me it seems like both would work.
In your question you wrote the following:
have a query that needs to exclude both Null and Blank Values
So you have answered yourself, the AND query is the right query:
WHERE [Imported] = 0 AND ([Value] IS NOT NULL AND [Value] != '')
Here is an extract from the ANSI SQL Draft 2003 that I borrowed from this question:
6.3.3.3 Rule evaluation order
[...]
Where the precedence is not determined by the Formats or by
parentheses, effective evaluation of expressions is generally
performed from left to right. However, it is
implementation-dependent whether expressions are actually evaluated left to right, particularly when operands or operators might
cause conditions to be raised or if the results of the expressions
can be determined without completely evaluating all parts of the
expression.
You don't specify what kind of database system you are using but the concept of short-circuit evaluation which is explained in the previous paragraph applies to all major SQL versions (T-SQL, PL/SQL etc...)
Short-circuit evaluation means that once an expression has been successfully evaluated it will immediately exit the condition and stop evaluating the other expressions, applied to your question:
If value is null you want to exit the condition, that's why it should be the first expression (from left to right) but if it isn't null it should also not be empty, so it has to be NOT NULL and NOT EMPTY.
This case is a bit tricky because you cannot have a non empty string that is also null so the OR condition will also work but you will do an extra evaluation because short-circuit evaluation will never exit in the first expression:
Value is null but we would always need to check that value is also not an empty string (value is null or value is not an empty string).
In this second case, you may get an exception because the expression [Value] != '' may be checked on a null object.
So I think AND is the right answer. Hope it helps.
If the value was numeric and you didn't want either 1 or 2, you would write that condition as
... WHERE value != 1 AND value != 2
An OR would always be true in this case. For instance a value of 1 would return true for the check against 2 - and then the OR-check would return true, as at least one of the conditions evaluated to true.
When yu also want to check against null values, the situation is a bit more complicated. A check against a null value always fails: value != '' is false when value is null. That is why there is a special IS NULL or IS NOT NULL test.

Oracle : IN and OR

I've a scenrio which process many data in Oracle database. In some cases, the variable Sec_email will contain many values and in some cases Sec_email will contain null or ' '.
so can please any one tell me how to write a query for this?
I tried with
(C.SECONDARY_EMAIL IN ('?,?') OR '' = '' )
where C is the Client table.
When I use this i get the count as 0.
You can perform a not null check before the IN comparison like
Sec_email is not null and C.SECONDARY_EMAIL IN (...
One obvious problem is that Oracle (by default) treats empty strings as NULL. So: '' = '' is the same as NULL = NULL, which is never true.
Arrgh.
In any case, you are probably constructing the query, so use is null instead:
(C.SECONDARY_EMAIL IN ('?,?') OR '' IS NULL
I think the real problem, though, is the first comparison. The IN list has one element with a constant, not two (but perhaps that is your intention). If you want to put a variable number of values for comparison, one method uses regular expressions. For instance:
C.SECONDARY_EMAIL REGEXP_LIKE '^val1|val2|val3$' or '' IS NULL
If you would like to get a list of values when some of them is null you should use:
("some other conditions" OR C.SECONDARY_EMAIL IS NULL)
The question is if it is not null and not ' ' value what you are expecting, if it should be some king of
pattern you should use regular expression:
regexp_like(C.SECONDARY_EMAIL, '^(.+?[,]+?)+$')
Also, if you have a few conditions in where clause use should use brackets to group you conditions null check and another one.
All conditions i this case will be divided by OR.
(C.SECONDARY_EMAIL IS NULL OR regexp_like(C.SECONDARY_EMAIL, '^(.+?[,]+?)+$'))
or
(C.SECONDARY_EMAIL IS NULL OR regexp_like(C.SECONDARY_EMAIL, '^(.+?[,]+?)+$')
OR C.SECONDARY_EMAIL = ' ')

What applications are there for NULLIF()?

I just had a trivial but genuine use for NULLIF(), for the first time in my career in SQL. Is it a widely used tool I've just ignored, or a nearly-forgotten quirk of SQL? It's present in all major database implementations.
If anyone needs a refresher, NULLIF(A, B) returns the first value, unless it's equal to the second in which case it returns NULL. It is equivalent to this CASE statement:
CASE WHEN A <> B OR B IS NULL THEN A END
or, in C-style syntax:
A == B || A == null ? null : A
So far the only non-trivial example I've found is to exclude a specific value from an aggregate function:
SELECT COUNT(NULLIF(Comment, 'Downvoted'))
This has the limitation of only allowing one to skip a single value; a CASE, while more verbose, would let you use an expression.
For the record, the use I found was to suppress the value of a "most recent change" column if it was equal to the first change:
SELECT Record, FirstChange, NULLIF(LatestChange, FirstChange) AS LatestChange
This was useful only in that it reduced visual clutter for human consumers.
I rather think that
NULLIF(A, B)
is syntactic sugar for
CASE WHEN A = B THEN NULL ELSE A END
But you are correct: it is mere syntactic sugar to aid the human reader.
I often use it where I need to avoid the Division by Zero exception:
SELECT
COALESCE(Expression1 / NULLIF(Expression2, 0), 0) AS Result
FROM …
Three years later, I found a material use for NULLIF: using NULLIF(Field, '') translates empty strings into NULL, for equivalence with Oracle's peculiar idea about what "NULL" represents.
NULLIF is handy when you're working with legacy data that contains a mixture of null values and empty strings.
Example:
SELECT(COALESCE(NULLIF(firstColumn, ''), secondColumn) FROM table WHERE this = that
SUM and COUNT have the behavior of turning nulls into zeros. I could see NULLIF being handy when you want to undo that behavior. If fact this came up in a recent answer I provided. If I had remembered NULLIF I probably would have written the following
SELECT student,
NULLIF(coursecount,0) as courseCount
FROM (SELECT cs.student,
COUNT(os.course) coursecount
FROM #CURRENTSCHOOL cs
LEFT JOIN #OTHERSCHOOLS os
ON cs.student = os.student
AND cs.school <> os.school
GROUP BY cs.student) t

why is null not equal to null false

I was reading this article:
Get null == null in SQL
And the consensus is that when trying to test equality between two (nullable) sql columns, the right approach is:
where ((A=B) OR (A IS NULL AND B IS NULL))
When A and B are NULL, (A=B) still returns FALSE, since NULL is not equal to NULL. That is why the extra check is required.
What about when testing inequalities? Following from the above discussion, it made me think that to test inequality I would need to do something like:
WHERE ((A <> B) OR (A IS NOT NULL AND B IS NULL) OR (A IS NULL AND B IS NOT NULL))
However, I noticed that that is not necessary (at least not on informix 11.5), and I can just do:
where (A<>B)
If A and B are NULL, this returns FALSE. If NULL is not equal to NULL, then shouldn't this return TRUE?
EDIT
These are all good answers, but I think my question was a little vague. Allow me to rephrase:
Given that either A or B can be NULL, is it enough to check their inequality with
where (A<>B)
Or do I need to explicitly check it like this:
WHERE ((A <> B) OR (A IS NOT NULL AND B IS NULL) OR (A IS NULL AND B IS NOT NULL))
REFER to this thread for the answer to this question.
Because that behavior follows established ternary logic where NULL is considered an unknown value.
If you think of NULL as unknown, it becomes much more intuitive:
Is unknown a equal to unknown b? There's no way to know, so: unknown.
relational expressions involving NULL actually yield NULL again
edit
here, <> stands for arbitrary binary operator, NULL is the SQL placeholder, and value is any value (NULL is not a value):
NULL <> value -> NULL
NULL <> NULL -> NULL
the logic is: NULL means "no value" or "unknown value", and thus any comparison with any actual value makes no sense.
is X = 42 true, false, or unknown, given that you don't know what value (if any) X holds? SQL says it's unknown. is X = Y true, false, or unknown, given that both are unknown? SQL says the result is unknown. and it says so for any binary relational operation, which is only logical (even if having NULLs in the model is not in the first place).
SQL also provides two unary postfix operators, IS NULL and IS NOT NULL, these return TRUE or FALSE according to their operand.
NULL IS NULL -> TRUE
NULL IS NOT NULL -> FALSE
All comparisons involving null are undefined, and evaluate to false. This idea, which is what prevents null being evaluated as equivalent to null, also prevents null being evaluated as NOT equivalent to null.
The short answer is... NULLs are weird, they don't really behave like you'd expect.
Here's a great paper on how NULLs work in SQL. I think it will help improve your understanding of the topic. I think the sections on handling null values in expressions will be especially useful for you.
http://www.oracle.com/technology/oramag/oracle/05-jul/o45sql.html
The default (ANSI) behaviour of nulls within an expression will result in a null (there are enough other answers with the cases of that).
There are however some edge cases and caveats that I would place when dealing with MS Sql Server that are not being listed.
Nulls within a statement that is grouping values together will be considered equal and be grouped together.
Null values within a statement that is ordering them will be considered equal.
Null values selected within a statement that is using distinct will be considered equal when evaluating the distinct aspect of the query
It is possible in SQL Server to override the expression logic regarding the specific Null = Null test, using the SET ANSI_NULLS OFF, which will then give you equality between null values - this is not a recommended move, but does exist.
SET ANSI_NULLS OFF
select result =
case
when null=null then 'eq'
else 'ne'
end
SET ANSI_NULLS ON
select result =
case
when null=null then 'eq'
else 'ne'
end
Here is a Quick Fix
ISNULL(A,0)=ISNULL(B,0)
0 can be changed to something that can never happen in your data
"Is unknown a equal to unknown b? There's no way to know, so: unknown."
The question was : why does the comparison yield FALSE ?
Given three-valued logic, it would indeed be sensible for the comparison to yield UNKNOWN (not FALSE). But SQL does yield FALSE, and not UNKNOWN.
One of the myriads of perversities in the SQL language.
Furthermore, the following must be taken into account :
If "unkown" is a logical value in ternary logic, then it ought to be the case that an equality comparison between two logical values that both happen to be (the value for) "unknown", then that comparison ought to yield TRUE.
If the logical value is itself unknown, then obviously that cannot be represented by putting the value "unknown" there, because that would imply that the logical value is known (to be "unknown"). That is, a.o., how relational theory proves that implementing 3-valued logic raises the requirement for a 4-valued logic, that a 4 valued logic leads to the need for a 5-valued logic, etc. etc. ad infinitum.