Does the concatenation of NULL + 'something' always yield NULL? Why? - sql

I want to apply an OUTER APPLY computation on every row of a set. This computation concatenates a number of string fields into one, but sometimes these fields are NULL or filled with empty strings. The concatenation uses '-' as a delimiter between fields, so when they are empty ('') the result is '----' instead of NULL, which is the result I'd like to get.
Before doing this computation I need to check the value of these fields.
How would you do it?
I thought about using NULLIF and it seems that it behaves as I expect, but I don't know why.
Will the concatenation of NULL +'something' always be NULL? Why?
SELECT
string_1,
string_2,
string_3,
string_4,
string_5,
string_concat
FROM Table1
OUTER APPLY(VALUES(NULLIF(string_1,'')+'-'+NULLIF(string_2,'')+'-'+NULLIF(string_3,'')+'-'+NULLIF(string_4,'')+'-'+NULLIF(string_5,''))) v1(string_concat)
Table1 doesn't have any index and I can't implement any.
Is this code better in terms of performance than doing a CASE in the SELECT?
I like it because the code looks more clean, though.

Will the concatenation of NULL + NULL + 'something' always be NULL?
Why?
This depends on the SET CONCAT_NULL_YIELDS_NULL setting. If it is ON (the default), then yes, concatenating a NULL with the + operator will always yield NULL.
I'd probably do it like this, though:
SELECT string_1,
string_2,
string_3,
string_4,
string_5,
string_concat
FROM Table1
CROSS APPLY(VALUES (NULLIF(CONCAT(string_1, '-', string_2, '-', string_3, '-', string_4, '-', string_5), '----')) ) v1(string_concat)
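so only one NULLIF is needed, on the CONCAT result. This works because CONCAT treats NULL arguments as empty strings, so all-NULL and all-empty rows both produce '----'.
As the VALUES clause always returns exactly one row, you can use CROSS APPLY instead of OUTER APPLY.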
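A rough way to compare the two versions is to run both on a few sample rows. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for SQL Server (an assumption, not the asker's setup): SQLite's || operator propagates NULL the way + does with CONCAT_NULL_YIELDS_NULL ON, and COALESCE(x, '') imitates CONCAT's treatment of NULL as an empty string. Only three columns are used to keep it short.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (string_1 TEXT, string_2 TEXT, string_3 TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [("a", "b", "c"), ("a", "", "c"), ("", "", ""), (None, None, None)],
)

rows = conn.execute(
    """
    SELECT
        -- Question's approach: NULLIF on every field, NULL-propagating concatenation.
        NULLIF(string_1, '') || '-' || NULLIF(string_2, '') || '-' || NULLIF(string_3, '')
            AS per_field,
        -- Answer's approach: concatenate first (NULL treated as ''), one NULLIF at the end.
        NULLIF(COALESCE(string_1, '') || '-' || COALESCE(string_2, '') || '-'
               || COALESCE(string_3, ''), '--')
            AS whole
    FROM Table1
    ORDER BY rowid
    """
).fetchall()

for row in rows:
    print(row)
```

The per-field NULLIF version returns NULL as soon as any single field is empty, while the single NULLIF on the whole string returns NULL only when every field is empty, so the two are not interchangeable if partially filled rows matter.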

Will the concatenation of NULL +'something' always be NULL? Why?
Aside from the technical explanation that has already been given, you may be interested in the logical motivation for why it is like that.
NULL actually means you don't know something, or that something is not applicable and therefore not usable in that context. So anything you do with something you don't know, or that doesn't make sense, will yield an "I still don't know/doesn't make sense" result.
For example: let's say a friend of yours has 2 cars, and another friend of yours has some cars but you don't know how many. If I asked you how many cars your friends have in total, the only answer you could give me is that you don't really know.
Hope this helps to understand. The issue with NULL is much bigger than this as it involves 3-valued-logic instead of the more common and well understood 2-valued-logic. Here's more detail on the issue: http://www.dbdebunk.com/2017/04/null-value-is-contradiction-in-terms.html
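The cars example can be tried directly. This is a minimal sketch using Python's built-in sqlite3 module (chosen here only for convenience; other engines follow the same rule for these expressions): adding a known number to NULL gives NULL, comparing NULL with NULL gives NULL rather than true, and only IS NULL gives a definite answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 2 known cars + an unknown number of cars; NULL compared with NULL; NULL tested with IS NULL.
total_cars, null_equals_null, null_is_null = conn.execute(
    "SELECT 2 + NULL, NULL = NULL, NULL IS NULL"
).fetchone()

print(total_cars, null_equals_null, null_is_null)  # None None 1
```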

Related

Comparing two empty Strings in Oracle SQL

Hi, today I ran into a weird situation. I had a WHERE clause with a condition that returns a string, and I wanted to check whether it was empty. But when it returns an empty string, Oracle still treats the two values as different strings. So I went further and prepared some simple queries:
select 1 from dual where 1 = 1;
returns: 1
select 1 from dual where 'A' = 'A';
returns: 1
And now what I cannot understand:
select 1 from dual where '' = '';
No result.
Even if I check if they are different there is still no result.
select 1 from dual where '' != '';
No result.
Can someone explain this to me?
Oracle treats empty strings as NULL. It's a gotcha. Make a note of it and hope it never bites you in the butt in production.
The reason is as @Captain Kenpachi explained. If you want to compare two strings (or other values of the same type) and want to be tolerant of NULLs (or empty strings, which Oracle treats the same way), then you need to involve an IS test.
You could try the common cheat of using a rogue value that will never be used, but Murphy's Law dictates that one day someone will use it. This technique also has the drawback that the rogue value must match the type of the thing you are comparing, i.e. when comparing strings you need a rogue string, while when comparing dates you need a rogue date. This also means you can't cut-and-paste it liberally without applying a little thought. Example:
WHERE NVL(col1,'MyRogueValue')=NVL(col2,'MyRogueValue')
The standard version is to explicitly test for NULLs
WHERE (col1=col2 OR (col1 IS NULL AND col2 IS NULL))
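The opposite becomes WHERE NOT(col1=col2 OR (col1 IS NULL AND col2 IS NULL)), but note that this evaluates to NULL rather than true when exactly one of the columns is NULL, so those rows are not returned.
I have seen a long-winded opposite version (as seen in Toad's data compare tool) that does cover that case: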
WHERE (col1<>col2 OR (col1 IS NULL AND col2 IS NOT NULL) OR (col1 IS NOT NULL AND col2 IS NULL))
Oracle does have a handy DECODE function that is basically IF a IS b THEN c ELSE d, so equality is WHERE DECODE(col1,col2,1,0)=1 and the opposite is WHERE DECODE(col1,col2,1,0)=0. You may find this a little slower than the explicit IS test. It is proprietary to Oracle but helps make up for the empty string problem.
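To see how the explicit predicates behave on each combination of values, here is a small sketch using Python's built-in sqlite3 module. This is an assumption made for runnability: SQLite does not treat '' as NULL the way Oracle does, so real NULLs are inserted instead, and the table t and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", "a"), ("a", "b"), (None, None), ("a", None)],
)

def count(predicate):
    """Count the rows of t for which the given WHERE predicate is true."""
    return conn.execute(f"SELECT COUNT(*) FROM t WHERE {predicate}").fetchone()[0]

# NULL-tolerant equality: matches ('a','a') and (NULL,NULL).
same = count("(col1 = col2 OR (col1 IS NULL AND col2 IS NULL))")

# Plain NOT of the above: matches ('a','b') only, because ('a',NULL) evaluates to NULL.
negated = count("NOT (col1 = col2 OR (col1 IS NULL AND col2 IS NULL))")

# Long-winded inequality: matches ('a','b') and ('a',NULL).
long_form = count(
    "(col1 <> col2 OR (col1 IS NULL AND col2 IS NOT NULL)"
    " OR (col1 IS NOT NULL AND col2 IS NULL))"
)

print(same, negated, long_form)  # 2 1 2
```

Only the equality test and the long-winded inequality together account for all four rows.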

Can 2 character length variables cause SQL injection vulnerability?

I am taking a text input from the user, then converting it into 2 character length strings (2-Grams)
For example
RX480 becomes
"rx","x4","48","80"
Now if I query the server directly like below, could they somehow perform SQL injection?
select *
from myTable
where myVariable in ('rx', 'x4', '48', '80')
SQL injection is not a matter of length of anything.
It happens when someone adds code to your existing query. They do this by sending in the malicious extra code as a form submission (or something similar). When your SQL code executes, it doesn't realize that there is more than one thing to do. It just executes what it's told.
You could start with a simple query like:
select *
from thisTable
where something=$something
So you could end up with a query that looks like:
select *
from thisTable
where something=; DROP TABLE employees;
This is an odd example. But it does more or less show why it's dangerous. The first query will fail, but who cares? The second one will actually work. And if you have a table named "employees", well, you don't anymore.
Two characters are sufficient in this case to cause an error in the query and possibly reveal some information about it. For example, try the string ')480 and watch how your application behaves.
Although not much of an answer, this really doesn't fit in a comment.
Your code scans a table checking to see if a column value matches any pair of consecutive characters from a user supplied string. Expressed in another way:
declare @SearchString as VarChar(10) = 'Voot';
select Buffer, case
when DataLength( Buffer ) != 2 then 0 -- NB: Len() right trims.
when PatIndex( '%' + Buffer + '%', @SearchString ) != 0 then 1
else 0 end as Match
from ( values
( 'vo' ), ( 'go' ), ( 'n ' ), ( 'po' ), ( 'et' ), ( 'ry' ),
( 'oo' ) ) as Samples( Buffer );
In this case you could simply pass the value of @SearchString as a parameter and avoid the issue of the IN clause.
Alternatively, the character pairs could be passed as a table parameter and used with IN: where Buffer in ( select CharacterPair from @CharacterPairs ).
As far as SQL injection goes, limiting the text to character pairs does preclude adding complete statements. It does, as others have noted, allow for corrupting the query and causing it to fail. That, in my mind, constitutes a problem.
I'm still trying to imagine a use-case for this rather odd pattern matching. It won't match a column value longer (or shorter) than two characters against a search string.
There definitely should be a canonical answer to all these innumerable "if I have [some special kind of data treatment] will be my query still vulnerable?" questions.
First of all, you should ask yourself: why are you looking to buy yourself such an indulgence? What is the reason? Why do you want to add an exception to your data processing? Why separate your data into the sheep and the goats, telling yourself "this data is safe, so I won't process it properly, and that data is unsafe, so I'll have to do something"?
The only reason such a question could even appear is your application architecture, or rather the lack of it. Only in spaghetti code, where user input is added directly to the query, can such a question ever occur. Otherwise, your database layer should be able to process any kind of data while being totally ignorant of its nature, origin or alleged "safety".
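One way to keep the database layer ignorant of the data's nature is to pass the 2-grams as bound parameters instead of splicing them into the SQL text. The following is a minimal sketch using Python's built-in sqlite3 module. The helper names bigrams and find_matches are hypothetical, and the asker's actual language and driver are not stated in the question.

```python
import sqlite3

def bigrams(text):
    """Split text into lowercase overlapping 2-character strings."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]

def find_matches(conn, user_input):
    """Return matching myVariable values; the 2-grams travel as parameters only."""
    grams = bigrams(user_input)
    if not grams:
        return []
    # Only '?' markers are interpolated into the SQL text, never user data.
    placeholders = ", ".join("?" for _ in grams)
    sql = (
        f"SELECT myVariable FROM myTable "
        f"WHERE myVariable IN ({placeholders}) ORDER BY myVariable"
    )
    return [row[0] for row in conn.execute(sql, grams)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (myVariable TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?)", [("rx",), ("48",), ("zz",), ("')",)])

hits = find_matches(conn, "RX480")     # ordinary input
hostile = find_matches(conn, "')480")  # quote and parenthesis are treated as plain data
print(hits, hostile)
```

With this shape, the string ')480 mentioned above neither breaks the query nor changes its meaning; it is simply compared against the column like any other value.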

Negating SQL WHERE condition with nullable fields

I have a filter where user can select operations like is, contains, etc...
bubu is xoxo translates into WHERE lower(bubu) = 'xoxo' SQL WHERE condition.
bubu contains xoxo translates into WHERE bubu ILIKE '%xoxo%' SQL WHERE condition.
Now I have added the negative variants - is not, does not contain, etc.. I do not want to rewrite the WHERE conditions from scratch, so I prepend NOT to the already existing ones:
bubu is not xoxo translates into WHERE NOT lower(bubu) = 'xoxo' SQL WHERE condition.
bubu does not contain xoxo translates into WHERE NOT bubu ILIKE '%xoxo%' SQL WHERE condition.
However, there is a problem. If bubu is a nullable field and it actually has NULL in it, then the negative WHERE condition does not pick it, although from the human perspective (as opposed to SQL) the NULL value should satisfy the bubu is not xoxo filter.
I solve this problem by modifying the original positive WHERE condition like this:
bubu is xoxo translates into WHERE (lower(bubu) = 'xoxo' AND bubu IS NOT NULL) SQL WHERE condition.
Then, the negation yields:
bubu is not xoxo translates into WHERE NOT (lower(bubu) = 'xoxo' AND bubu IS NOT NULL) SQL WHERE condition.
And this time the NULL values are picked up correctly. The same problem is with the contains filter.
Is there a more elegant solution to resolve this inconsistency between how humans treat NULL and how SQL does it?
I am using PostgreSQL 9.2 and I do not mind having a solution specific to this database.
P.S.
Please, note that I want the negative expression to be of the form NOT positive.
I think you should be able to get away with using COALESCE to convert your NULLs to empty strings:
-- These skip NULLs
lower(coalesce(bubu, '')) = 'xoxo'
coalesce(bubu, '') ilike '%xo%'
-- These will find NULLs
not lower(coalesce(bubu, '')) = 'xoxo'
not coalesce(bubu, '') ilike '%xo%'
Of course, this sort of trickery will run into problems if you're searching for empty strings, in such cases you'll need a context-sensitive sentinel value so that you can intelligently choose something that cannot possibly match your search term.
Demo: http://sqlfiddle.com/#!12/8bbd2/3
For the = operator you can use the is [not] distinct from construct
WHERE (lower(bubu) is not distinct from 'xoxo')
http://www.postgresql.org/docs/9.2/static/functions-comparison.html
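The effect of the COALESCE approach on the negated filter can be checked with a few rows. This sketch uses Python's built-in sqlite3 module rather than PostgreSQL (an assumption made so it runs without a server; ILIKE and IS DISTINCT FROM are left out because they are not portable), and the table t is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (bubu TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("XoXo",), ("other",), (None,)])

def select(where):
    """Return the bubu values, in insertion order, that satisfy the WHERE clause."""
    return [row[0] for row in conn.execute(f"SELECT bubu FROM t WHERE {where} ORDER BY rowid")]

# Plain negation: the NULL row evaluates to NULL and is dropped.
plain_not = select("NOT (lower(bubu) = 'xoxo')")

# Negation over COALESCE: the NULL row becomes '' and satisfies "is not xoxo".
with_coalesce = select("NOT (lower(coalesce(bubu, '')) = 'xoxo')")

print(plain_not)      # ['other']
print(with_coalesce)  # ['other', None]
```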

What applications are there for NULLIF()?

I just had a trivial but genuine use for NULLIF(), for the first time in my career in SQL. Is it a widely used tool I've just ignored, or a nearly-forgotten quirk of SQL? It's present in all major database implementations.
If anyone needs a refresher, NULLIF(A, B) returns the first value, unless it's equal to the second in which case it returns NULL. It is equivalent to this CASE statement:
CASE WHEN A <> B OR B IS NULL THEN A END
or, in C-style syntax:
A == B || A == null ? null : A
So far the only non-trivial example I've found is to exclude a specific value from an aggregate function:
SELECT COUNT(NULLIF(Comment, 'Downvoted'))
This has the limitation of only allowing one to skip a single value; a CASE, while more verbose, would let you use an expression.
For the record, the use I found was to suppress the value of a "most recent change" column if it was equal to the first change:
SELECT Record, FirstChange, NULLIF(LatestChange, FirstChange) AS LatestChange
This was useful only in that it reduced visual clutter for human consumers.
I rather think that
NULLIF(A, B)
is syntactic sugar for
CASE WHEN A = B THEN NULL ELSE A END
But you are correct: it is mere syntactic sugar to aid the human reader.
I often use it where I need to avoid the Division by Zero exception:
SELECT
COALESCE(Expression1 / NULLIF(Expression2, 0), 0) AS Result
FROM …
Three years later, I found a material use for NULLIF: using NULLIF(Field, '') translates empty strings into NULL, for equivalence with Oracle's peculiar idea about what "NULL" represents.
NULLIF is handy when you're working with legacy data that contains a mixture of null values and empty strings.
Example:
SELECT COALESCE(NULLIF(firstColumn, ''), secondColumn) FROM table WHERE this = that
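As an illustration of that pattern, the sketch below runs it on three rows using Python's built-in sqlite3 module. The table name legacy and its contents are hypothetical; only the COALESCE(NULLIF(...)) expression comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE legacy (firstColumn TEXT, secondColumn TEXT)")
conn.executemany(
    "INSERT INTO legacy VALUES (?, ?)",
    [("Ann", "fallback"), ("", "fallback"), (None, "fallback")],
)

# NULLIF turns '' into NULL, then COALESCE falls through to the second column.
result = [
    row[0]
    for row in conn.execute(
        "SELECT COALESCE(NULLIF(firstColumn, ''), secondColumn) FROM legacy ORDER BY rowid"
    )
]
print(result)  # ['Ann', 'fallback', 'fallback']
```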
SUM and COUNT have the behavior of turning nulls into zeros. I could see NULLIF being handy when you want to undo that behavior. In fact, this came up in a recent answer I provided. If I had remembered NULLIF, I probably would have written the following:
SELECT student,
NULLIF(coursecount,0) as courseCount
FROM (SELECT cs.student,
COUNT(os.course) coursecount
FROM #CURRENTSCHOOL cs
LEFT JOIN #OTHERSCHOOLS os
ON cs.student = os.student
AND cs.school <> os.school
GROUP BY cs.student) t

What does this SQL Query mean?

I have the following SQL query:
select AuditStatusId
from dbo.ABC_AuditStatus
where coalesce(AuditFrequency, 0) <> 0
I'm struggling a bit to understand it. It looks pretty simple, and I know what the coalesce operator does (more or less), but I don't seem to get the MEANING.
Without knowing any more information except the query above, what do you think it means?
select AuditStatusId
from dbo.ABC_AuditStatus
where AuditFrequency <> 0 and AuditFrequency is not null
Note that the use of Coalesce means that it will not be possible to use an index properly to satisfy this query.
COALESCE is the ANSI standard function to deal with NULL values, by returning the first non-NULL value based on the comma delimited list. This:
WHERE COALESCE(AuditFrequency, 0) != 0
..means that if the AuditFrequency column is NULL, convert the value to be zero instead. Otherwise, the AuditFrequency value is returned.
Since the comparison is to not return rows where the AuditFrequency column value is zero, rows where AuditFrequency is NULL will also be ignored by the query.
It looks like it's designed to detect a null AuditFrequency as zero and thus hide those rows.
From what I can see, it checks for fields that aren't 0 or null.
I think it is more accurately described by this:
select AuditStatusId
from dbo.ABC_AuditStatus
where (AuditFrequency IS NOT NULL AND AuditFrequency != 0) OR 0 != 0
I'll admit the last part will never do anything, and maybe I'm just being pedantic, but to me this more accurately describes your query.
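The rewrites suggested in these answers can be checked against the original predicate on a handful of rows. This is a sketch using Python's built-in sqlite3 module, with made-up sample data and the dbo schema prefix dropped because SQLite has no such schema (both are assumptions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ABC_AuditStatus (AuditStatusId INTEGER, AuditFrequency INTEGER)")
conn.executemany(
    "INSERT INTO ABC_AuditStatus VALUES (?, ?)",
    [(1, 0), (2, None), (3, 5), (4, -1)],
)

def ids(where):
    """Return the AuditStatusId values selected by the given WHERE clause."""
    sql = f"SELECT AuditStatusId FROM ABC_AuditStatus WHERE {where} ORDER BY AuditStatusId"
    return [row[0] for row in conn.execute(sql)]

original = ids("coalesce(AuditFrequency, 0) <> 0")
explicit = ids("AuditFrequency <> 0 AND AuditFrequency IS NOT NULL")
# NULL <> 0 is itself NULL, so the IS NOT NULL test is implied by the comparison.
bare = ids("AuditFrequency <> 0")

print(original, explicit, bare)  # [3, 4] [3, 4] [3, 4]
```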
The idea is that it is desirable to express a single search condition using a single expression, but it's merely style, a question of taste:
One expression:
WHERE age = COALESCE(@parameter_value, age);
Two expressions:
WHERE (
age = @parameter_value
OR
@parameter_value IS NULL
);
);
Here's another example:
One expression:
WHERE age BETWEEN 18 AND 65;
Two expressions
WHERE (
age >= 18
AND
age <= 65
);
Personally, I have a strong preference for single expressions and find them easier to read... if I am familiar with the pattern used ;) Whether they perform differently is another matter...
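One caveat worth checking before treating the COALESCE form and the OR form as purely stylistic alternatives: they can differ when the column itself is nullable. The sketch below uses Python's built-in sqlite3 module with a made-up people table, and the named parameter :p stands in for @parameter_value (both are assumptions for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [("Ann", 20), ("Bob", 30), ("Cy", None)])

def names(where, p):
    """Return names matching the WHERE clause, with :p bound to the given value."""
    sql = f"SELECT name FROM people WHERE {where} ORDER BY rowid"
    return [row[0] for row in conn.execute(sql, {"p": p})]

one_expression = "age = COALESCE(:p, age)"
two_expressions = "(age = :p OR :p IS NULL)"

# With a concrete parameter, both forms agree.
print(names(one_expression, 30), names(two_expressions, 30))

# With a NULL parameter, the row whose age is NULL is returned only by the OR form,
# because NULL = NULL is not true.
print(names(one_expression, None), names(two_expressions, None))
```

If age can never be NULL, the two forms select the same rows and the choice is indeed one of taste.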