Negating SQL WHERE condition with nullable fields

I have a filter where the user can select operations like is, contains, etc.
bubu is xoxo translates into WHERE lower(bubu) = 'xoxo' SQL WHERE condition.
bubu contains xoxo translates into WHERE bubu ILIKE '%xoxo%' SQL WHERE condition.
Now I have added the negative variants: is not, does not contain, etc. I do not want to rewrite the WHERE conditions from scratch, so I prepend NOT to the already existing ones:
bubu is not xoxo translates into WHERE NOT lower(bubu) = 'xoxo' SQL WHERE condition.
bubu does not contain xoxo translates into WHERE NOT bubu ILIKE '%xoxo%' SQL WHERE condition.
However, there is a problem. If bubu is a nullable field and it actually has NULL in it, then the negative WHERE condition does not pick it, although from the human perspective (as opposed to SQL) the NULL value should satisfy the bubu is not xoxo filter.
I solve this problem by modifying the original positive WHERE condition like this:
bubu is xoxo translates into WHERE (lower(bubu) = 'xoxo' AND bubu IS NOT NULL) SQL WHERE condition.
Then, the negation yields:
bubu is not xoxo translates into WHERE NOT (lower(bubu) = 'xoxo' AND bubu IS NOT NULL) SQL WHERE condition.
And this time the NULL values are picked up correctly. The contains filter has the same problem.
Is there a more elegant solution to resolve this inconsistency between how humans treat NULL and how SQL does it?
I am using PostgreSQL 9.2 and I do not mind having a solution specific to this database.
P.S.
Please, note that I want the negative expression to be of the form NOT positive.
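To make the behaviour described above concrete, here is a minimal sketch that reproduces it. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, which is an assumption made only so the example is self-contained; the three-valued logic involved is the same. The table t and the helper rows are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (bubu TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("xoxo",), ("other",), (None,)])

def rows(where):
    """Return the bubu values selected by the given WHERE condition."""
    sql = f"SELECT bubu FROM t WHERE {where} ORDER BY rowid"
    return [r[0] for r in conn.execute(sql)]

positive = "lower(bubu) = 'xoxo'"
guarded = "(lower(bubu) = 'xoxo' AND bubu IS NOT NULL)"

# NOT applied to the plain condition: NOT NULL is still NULL, so the NULL row is dropped.
plain_negation = rows(f"NOT {positive}")

# NOT applied to the guarded condition: (NULL AND false) is false, so NOT makes it true.
guarded_negation = rows(f"NOT {guarded}")

print(plain_negation)    # ['other']
print(guarded_negation)  # ['other', None]
```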

I think you should be able to get away with using COALESCE to convert your NULLs to empty strings:
-- These skip NULLs
lower(coalesce(bubu, '')) = 'xoxo'
coalesce(bubu, '') ilike '%xo%'
-- These will find NULLs
not lower(coalesce(bubu, '')) = 'xoxo'
not coalesce(bubu, '') ilike '%xo%'
Of course, this sort of trickery will run into problems if you're searching for empty strings. In such cases you'll need a context-sensitive sentinel value, so that you can intelligently choose something that cannot possibly match your search term.
Demo: http://sqlfiddle.com/#!12/8bbd2/3
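For readers who cannot open the fiddle, the COALESCE approach can be sketched roughly as follows. This assumes SQLite (through Python's sqlite3) instead of PostgreSQL, so LIKE replaces ILIKE (SQLite's LIKE is case-insensitive for ASCII by default); the table t and the helper rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (bubu TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("XoXo",), ("other",), (None,)])

def rows(where):
    """Return the bubu values selected by the given WHERE condition."""
    sql = f"SELECT bubu FROM t WHERE {where} ORDER BY rowid"
    return [r[0] for r in conn.execute(sql)]

# COALESCE turns NULL into '' before the comparison, so the result is never NULL.
is_cond = "lower(coalesce(bubu, '')) = 'xoxo'"
contains_cond = "coalesce(bubu, '') LIKE '%xo%'"

is_rows = rows(is_cond)                          # positive form skips the NULL row
is_not_rows = rows(f"NOT {is_cond}")             # negated form finds it
contains_rows = rows(contains_cond)
not_contains_rows = rows(f"NOT {contains_cond}")

print(is_rows, is_not_rows)              # ['XoXo'] ['other', None]
print(contains_rows, not_contains_rows)  # ['XoXo'] ['other', None]
```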

For the = operator you can use the is [not] distinct from construct:
WHERE (lower(bubu) is not distinct from 'xoxo')
http://www.postgresql.org/docs/9.2/static/functions-comparison.html
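A rough way to try the null-safe comparison without a PostgreSQL server is SQLite's IS operator, which behaves like = except that NULL operands yield true or false instead of NULL. Treating it as an analogue of PostgreSQL's IS NOT DISTINCT FROM is an assumption of this sketch, as are the table t and the helper rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (bubu TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("xoxo",), ("other",), (None,)])

def rows(where):
    """Return the bubu values selected by the given WHERE condition."""
    sql = f"SELECT bubu FROM t WHERE {where} ORDER BY rowid"
    return [r[0] for r in conn.execute(sql)]

# SQLite's IS is used here in the role of PostgreSQL's IS NOT DISTINCT FROM.
positive = "(lower(bubu) IS 'xoxo')"

is_rows = rows(positive)               # NULL compares as "not equal", never as NULL
is_not_rows = rows(f"NOT {positive}")  # so plain NOT now includes the NULL row

print(is_rows)      # ['xoxo']
print(is_not_rows)  # ['other', None]
```

Because the comparison never evaluates to NULL, the negative filter keeps the NOT positive form asked for in the question.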

Related

Simple where clause condition involving NULL

I have a query that needs to exclude both Null and Blank Values, but for some reason I can't work out this simple logic in my head.
Currently, my code looks like this:
WHERE [Imported] = 0 AND ([Value] IS NOT NULL OR [Value] != '')
However, should my code look like this to exclude both conditions:
WHERE [Imported] = 0 AND ([Value] IS NOT NULL AND [Value] != '')
For some reason I just can't sort this in my head properly. To me it seems like both would work.
In your question you wrote the following:
have a query that needs to exclude both Null and Blank Values
So you have answered it yourself; the AND query is the right query:
WHERE [Imported] = 0 AND ([Value] IS NOT NULL AND [Value] != '')
Here is an extract from the ANSI SQL Draft 2003 that I borrowed from this question:
6.3.3.3 Rule evaluation order
[...]
Where the precedence is not determined by the Formats or by
parentheses, effective evaluation of expressions is generally
performed from left to right. However, it is
implementation-dependent whether expressions are actually evaluated left to right, particularly when operands or operators might
cause conditions to be raised or if the results of the expressions
can be determined without completely evaluating all parts of the
expression.
You don't specify which database system you are using, but the short-circuit evaluation touched on in the extract above is something the major SQL engines (T-SQL, PL/SQL, etc.) may perform; as the extract says, whether they actually do is implementation-dependent.
Short-circuit evaluation means that once the outcome of a condition is known, the engine stops evaluating the remaining expressions. Applied to your question:
If the value is NULL you want the row rejected, which is why that check comes first (from left to right); if it isn't NULL it must also not be empty, so the condition has to be NOT NULL AND NOT EMPTY.
This case is a bit tricky because a string cannot be both non-empty and NULL, so it is tempting to think the OR condition will also work. It does not: an empty string is not NULL, so [Value] IS NOT NULL is already true and the OR version keeps the blank rows.
When the value is NULL, [Value] != '' does not raise an exception; the comparison evaluates to UNKNOWN, which a WHERE clause treats as not matching.
So I think AND is the right answer. Hope it helps.
If the value was numeric and you didn't want either 1 or 2, you would write that condition as
... WHERE value != 1 AND value != 2
An OR would always be true in this case. For instance a value of 1 would return true for the check against 2 - and then the OR-check would return true, as at least one of the conditions evaluated to true.
When you also want to check against null values, the situation is a bit more complicated. A check against a null value always fails: value != '' does not evaluate to true when value is null (it evaluates to UNKNOWN). That is why there is a special IS NULL or IS NOT NULL test.
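The difference between the AND and OR versions can be checked with a small experiment. The sketch below uses SQLite via Python's sqlite3 as a stand-in for the asker's database (an assumption), with a hypothetical table items holding one normal, one blank and one NULL value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (imported INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(0, "abc"), (0, ""), (0, None), (1, "zzz")],
)

def values(where):
    """Return the value column of the rows matching the WHERE condition."""
    sql = f"SELECT value FROM items WHERE {where} ORDER BY rowid"
    return [r[0] for r in conn.execute(sql)]

and_rows = values("imported = 0 AND (value IS NOT NULL AND value != '')")
or_rows = values("imported = 0 AND (value IS NOT NULL OR value != '')")

# Comparing NULL with '' yields NULL (UNKNOWN), not an error and not true.
null_compare = conn.execute("SELECT NULL != ''").fetchone()[0]

print(and_rows)      # ['abc']      -> NULL and blank both excluded
print(or_rows)       # ['abc', '']  -> the blank row is kept
print(null_compare)  # None
```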

Oracle : IN and OR

I have a scenario that processes a lot of data in an Oracle database. In some cases the variable Sec_email will contain many values, and in other cases it will contain null or ' '.
Can anyone please tell me how to write a query for this?
I tried with
(C.SECONDARY_EMAIL IN ('?,?') OR '' = '' )
where C is the Client table.
When I use this I get the count as 0.
You can perform a not null check before the IN comparison like
Sec_email is not null and C.SECONDARY_EMAIL IN (...
One obvious problem is that Oracle (by default) treats empty strings as NULL. So: '' = '' is the same as NULL = NULL, which is never true.
Arrgh.
In any case, you are probably constructing the query, so use is null instead:
(C.SECONDARY_EMAIL IN ('?,?') OR '' IS NULL)
I think the real problem, though, is the first comparison. The IN list has one element with a constant, not two (but perhaps that is your intention). If you want to put a variable number of values for comparison, one method uses regular expressions. For instance:
REGEXP_LIKE(C.SECONDARY_EMAIL, '^(val1|val2|val3)$') OR '' IS NULL
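If the query is being constructed in application code, another option for a variable number of values is to generate one bind marker per value. The following is only a sketch under assumptions: it uses SQLite's ? markers through Python's sqlite3 (bind-variable syntax differs between database drivers), and the table client and the helpers email_filter and count are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (secondary_email TEXT)")
conn.executemany(
    "INSERT INTO client VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("c@example.com",), (None,)],
)

def email_filter(emails):
    """Return a WHERE fragment plus bind values for an optional list of emails."""
    if not emails:
        # No emails supplied: do not restrict the result at all.
        return "1 = 1", []
    placeholders = ", ".join("?" for _ in emails)  # one marker per value
    return f"secondary_email IN ({placeholders})", list(emails)

def count(emails):
    """Count the client rows that pass the optional email filter."""
    where, params = email_filter(emails)
    sql = f"SELECT COUNT(*) FROM client WHERE {where}"
    return conn.execute(sql, params).fetchone()[0]

two_values = count(["a@example.com", "b@example.com"])
no_filter = count([])

# '?,?' inside quotes is one string literal, not two bind markers.
literal = conn.execute(
    "SELECT COUNT(*) FROM client WHERE secondary_email IN ('?,?')"
).fetchone()[0]

print(two_values, no_filter, literal)  # 2 4 0
```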
If you would like to get a list of values when some of them are null, you should use:
("some other conditions" OR C.SECONDARY_EMAIL IS NULL)
The question is what you expect when the value is neither null nor ' '. If it should follow some kind of pattern, you should use a regular expression:
regexp_like(C.SECONDARY_EMAIL, '^(.+?[,]+?)+$')
Also, if you have several conditions in the where clause, you should use brackets to group the null check with the other condition.
All conditions in this case will be separated by OR.
(C.SECONDARY_EMAIL IS NULL OR regexp_like(C.SECONDARY_EMAIL, '^(.+?[,]+?)+$'))
or
(C.SECONDARY_EMAIL IS NULL OR regexp_like(C.SECONDARY_EMAIL, '^(.+?[,]+?)+$')
OR C.SECONDARY_EMAIL = ' ')

is null vs. equals null

I have a rough understanding of why = null in SQL and is null are not the same, from questions like this one.
But then, why is
update table
set column = null
a valid SQL statement (at least in Oracle)?
From that answer, I know that null can be seen as somewhat "UNKNOWN", and therefore an SQL statement with where column = null "should" return all rows, because the value of column is no longer an unknown value. I set it to null explicitly ;)
Where am I wrong, or what do I not understand?
So, if my question is maybe unclear:
Why is = null valid in the set clause, but not in the where clause of an SQL statement?
SQL doesn't have different graphical signs for assignment and equality operators like languages such as C or Java have. In such languages, = is the assignment operator, while == is the equality operator. In SQL, = is used for both cases, and interpreted contextually.
In the where clause, = acts as the equality operator (similar to == in C). I.e., it checks if both operands are equal, and returns true if they are. As you mentioned, null is not a value - it's the lack of a value. Therefore, it cannot be equal to any other value.
In the set clause, = acts as the assignment operator (similar to = in C). I.e., it sets the left operand (a column name) with the value of the right operand. This is a perfectly legal statement - you are declaring that you do not know the value of a certain column.
They are completely different operators, even if you write them the same way.
In a where clause, = is a comparison operator.
In a set clause, = is an assignment operator.
The assignment operator allows you to "clear" the data in the column and set it to the "null value".
In the set clause, you're assigning the value to an unknown, as defined by NULL. In the where clause, you're querying for an unknown. When you don't know what an unknown is, you can't expect any results for it.
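The two roles of = can be seen side by side in a short experiment. This sketch assumes SQLite through Python's sqlite3 instead of Oracle, and the table t is hypothetical; the behaviour of = NULL shown here follows standard SQL semantics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col TEXT)")
conn.executemany("INSERT INTO t (id, col) VALUES (?, ?)", [(1, "a"), (2, "b")])

# In SET, = is assignment: the column of row 1 is cleared to NULL.
conn.execute("UPDATE t SET col = NULL WHERE id = 1")

# In WHERE, = is comparison: col = NULL is never true, so no row matches.
equals_null = conn.execute("SELECT id FROM t WHERE col = NULL").fetchall()

# IS NULL is the test that actually finds the cleared row.
is_null = conn.execute("SELECT id FROM t WHERE col IS NULL").fetchall()

print(equals_null)  # []
print(is_null)      # [(1,)]
```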

What applications are there for NULLIF()?

I just had a trivial but genuine use for NULLIF(), for the first time in my career in SQL. Is it a widely used tool I've just ignored, or a nearly-forgotten quirk of SQL? It's present in all major database implementations.
If anyone needs a refresher, NULLIF(A, B) returns the first value, unless it's equal to the second in which case it returns NULL. It is equivalent to this CASE statement:
CASE WHEN A <> B OR B IS NULL THEN A END
or, in C-style syntax:
A == B || A == null ? null : A
So far the only non-trivial example I've found is to exclude a specific value from an aggregate function:
SELECT COUNT(NULLIF(Comment, 'Downvoted'))
This has the limitation of only allowing one to skip a single value; a CASE, while more verbose, would let you use an expression.
For the record, the use I found was to suppress the value of a "most recent change" column if it was equal to the first change:
SELECT Record, FirstChange, NULLIF(LatestChange, FirstChange) AS LatestChange
This was useful only in that it reduced visual clutter for human consumers.
I rather think that
NULLIF(A, B)
is syntactic sugar for
CASE WHEN A = B THEN NULL ELSE A END
But you are correct: it is mere syntactic sugar to aid the human reader.
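Both CASE rewrites quoted in this thread can be compared against NULLIF over every combination of NULL and non-NULL arguments. The sketch below does that with SQLite via Python's sqlite3 (an assumption; any engine with NULLIF should behave alike), binding the hypothetical parameters :a and :b.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

sql = """
SELECT NULLIF(:a, :b),
       CASE WHEN :a = :b THEN NULL ELSE :a END,
       CASE WHEN :a <> :b OR :b IS NULL THEN :a END
"""

# Equal, different, and every placement of NULL.
pairs = [(1, 1), (1, 2), (1, None), (None, 1), (None, None)]
results = [conn.execute(sql, {"a": a, "b": b}).fetchone() for a, b in pairs]

# True when all three expressions return the same value for every pair.
all_agree = all(r[0] == r[1] == r[2] for r in results)
nullif_values = [r[0] for r in results]

print(all_agree)      # True
print(nullif_values)  # [None, 1, 1, None, None]
```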
I often use it where I need to avoid the Division by Zero exception:
SELECT
COALESCE(Expression1 / NULLIF(Expression2, 0), 0) AS Result
FROM …
Three years later, I found a material use for NULLIF: using NULLIF(Field, '') translates empty strings into NULL, for equivalence with Oracle's peculiar idea about what "NULL" represents.
NULLIF is handy when you're working with legacy data that contains a mixture of null values and empty strings.
Example:
SELECT COALESCE(NULLIF(firstColumn, ''), secondColumn) FROM table WHERE this = that
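As an illustration of that pattern, here is a small sketch with a hypothetical table legacy that mixes empty strings and NULLs in firstColumn. It runs on SQLite via Python's sqlite3, which is an assumption made to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE legacy (firstColumn TEXT, secondColumn TEXT)")
conn.executemany(
    "INSERT INTO legacy VALUES (?, ?)",
    [("a", "x"), ("", "y"), (None, "z")],
)

# NULLIF maps '' to NULL, then COALESCE falls back to secondColumn for both cases.
merged = [
    r[0]
    for r in conn.execute(
        "SELECT COALESCE(NULLIF(firstColumn, ''), secondColumn) "
        "FROM legacy ORDER BY rowid"
    )
]

print(merged)  # ['a', 'y', 'z']
```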
COUNT has the behavior of turning nulls into zeros (SUM just skips them). I could see NULLIF being handy when you want to undo that behavior. In fact, this came up in a recent answer I provided. If I had remembered NULLIF, I probably would have written the following:
SELECT student,
NULLIF(coursecount,0) as courseCount
FROM (SELECT cs.student,
COUNT(os.course) coursecount
FROM #CURRENTSCHOOL cs
LEFT JOIN #OTHERSCHOOLS os
ON cs.student = os.student
AND cs.school <> os.school
GROUP BY cs.student) t
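A reduced version of that query can be tried as follows. This is a sketch under assumptions: it runs on SQLite via Python's sqlite3, and the temp tables are replaced by hypothetical ordinary tables current_school and other_schools with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE current_school (student TEXT, school TEXT)")
conn.execute("CREATE TABLE other_schools (student TEXT, school TEXT, course TEXT)")
conn.executemany(
    "INSERT INTO current_school VALUES (?, ?)",
    [("ann", "north"), ("bob", "north")],
)
conn.executemany(
    "INSERT INTO other_schools VALUES (?, ?, ?)",
    [("ann", "south", "math"), ("ann", "south", "art")],
)

# bob has no matching rows, so COUNT over the outer join gives 0 for him;
# NULLIF(..., 0) turns that 0 back into NULL.
counts = conn.execute(
    """
    SELECT cs.student,
           COUNT(os.course)            AS raw_count,
           NULLIF(COUNT(os.course), 0) AS course_count
    FROM current_school cs
    LEFT JOIN other_schools os
           ON cs.student = os.student
          AND cs.school <> os.school
    GROUP BY cs.student
    ORDER BY cs.student
    """
).fetchall()

print(counts)  # [('ann', 2, 2), ('bob', 0, None)]
```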

What does this SQL Query mean?

I have the following SQL query:
select AuditStatusId
from dbo.ABC_AuditStatus
where coalesce(AuditFrequency, 0) <> 0
I'm struggling a bit to understand it. It looks pretty simple, and I know what the coalesce operator does (more or less), but I don't seem to get the MEANING.
Without knowing any more information than the query above, what do you think it means?
select AuditStatusId
from dbo.ABC_AuditStatus
where AuditFrequency <> 0 and AuditFrequency is not null
Note that wrapping the column in COALESCE means the query will probably not be able to use an index on AuditFrequency properly.
COALESCE is the ANSI standard function to deal with NULL values, by returning the first non-NULL value based on the comma delimited list. This:
WHERE COALESCE(AuditFrequency, 0) != 0
..means that if the AuditFrequency column is NULL, convert the value to be zero instead. Otherwise, the AuditFrequency value is returned.
Since the comparison is to not return rows where the AuditFrequency column value is zero, rows where AuditFrequency is NULL will also be ignored by the query.
It looks like it's designed to detect a null AuditFrequency as zero and thus hide those rows.
From what I can see, it checks for fields that aren't 0 or null.
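The readings given in these answers can be checked against each other. The sketch below assumes SQLite via Python's sqlite3 in place of SQL Server, and a hypothetical table audit_status with one non-zero, one zero and one NULL frequency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_status (AuditStatusId INTEGER, AuditFrequency INTEGER)"
)
conn.executemany(
    "INSERT INTO audit_status VALUES (?, ?)",
    [(1, 5), (2, 0), (3, None)],
)

def ids(where):
    """Return the AuditStatusId values matching the WHERE condition."""
    sql = f"SELECT AuditStatusId FROM audit_status WHERE {where} ORDER BY AuditStatusId"
    return [r[0] for r in conn.execute(sql)]

with_coalesce = ids("coalesce(AuditFrequency, 0) <> 0")
explicit = ids("AuditFrequency <> 0 AND AuditFrequency IS NOT NULL")

# NULL <> 0 is UNKNOWN, so the bare comparison already drops the NULL row.
bare = ids("AuditFrequency <> 0")

print(with_coalesce, explicit, bare)  # [1] [1] [1]
```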
I think it is more accurately described by this:
select AuditStatusId
from dbo.ABC_AuditStatus
where (AuditFrequency IS NOT NULL AND AuditFrequency != 0) OR 0 != 0
I'll admit the last part will never do anything and maybe I'm just being pedantic, but to me this more accurately describes your query.
The idea is that it is desirable to express a single search condition using a single expression, but it's merely style, a question of taste:
One expression:
WHERE age = COALESCE(#parameter_value, age);
Two expressions:
WHERE (
age = #parameter_value
OR
#parameter_value IS NULL
);
Here's another example:
One expression:
WHERE age BETWEEN 18 AND 65;
Two expressions:
WHERE (
age >= 18
AND
age <= 65
);
Personally, I have a strong preference for single expressions and find them easier to read... if I am familiar with the pattern used ;) Whether they perform differently is another matter...
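One caveat may be worth checking before treating the two optional-parameter forms of the age filter as interchangeable: they can differ when the column itself holds NULL. The sketch below assumes SQLite via Python's sqlite3, a hypothetical table people, and a bound parameter :p in place of the parameter in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?)", [(20,), (30,), (None,)])

def ages(where, p):
    """Return the ages matching the WHERE condition for parameter value p."""
    sql = f"SELECT age FROM people WHERE {where} ORDER BY rowid"
    return [r[0] for r in conn.execute(sql, {"p": p})]

one_expr = "age = COALESCE(:p, age)"
two_expr = "(age = :p OR :p IS NULL)"

# With a real parameter value both forms agree.
one_with_30 = ages(one_expr, 30)
two_with_30 = ages(two_expr, 30)

# With a NULL parameter, age = age is UNKNOWN for the NULL row,
# while :p IS NULL is true for every row.
one_with_null = ages(one_expr, None)
two_with_null = ages(two_expr, None)

print(one_with_30, two_with_30)      # [30] [30]
print(one_with_null, two_with_null)  # [20, 30] [20, 30, None]
```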