postgres: How to protect conditional expressions from null values - sql

After years of using PostgreSQL, I still don't know whether there is an established best practice for protecting conditional expressions from null values, given that SQL query planners are free to apply or ignore the most frequently used idiom for guarding against nulls: "var is null or var=0".
Allegedly, using the 'case when ... end' syntax removes any ambiguity, but it also hurts maintainability, since it buries a simple check under a lot of words.
Thanks, in advance.

I think you have a misconception that comes from comparing SQL to Java (or C, C++, or any language that deals with references or pointers).
You don't need to protect conditional expressions from NULL values when working with SQL.
In SQL, you do not have (hidden) pointers (or references) to objects that should be tested against NULL or otherwise they cannot be dereferenced. In SQL, every expression produces a certain value of a certain type. This value can be NULL (also called UNKNOWN).
If your var is NULL, then var = 0 evaluates to NULL (unknown = 0 gives unknown), while var IS NULL (unknown is unknown) evaluates to TRUE. According to three-valued logic, TRUE OR UNKNOWN evaluates to TRUE. Whatever the order of evaluation, the result is always the same.
You can check it just by evaluating:
SELECT
/* var */ NULL = 0 as null_equals_zero,
/* var */ NULL IS NULL as null_is_null,
TRUE or NULL AS true_or_null,
(NULL = 0) OR (NULL IS NULL) AS your_case_when_var_is_null,
(NULL IS NULL) OR (NULL = 0) AS the_same_reordered
;
Returns
null_equals_zero | null_is_null | true_or_null | your_case_when_var_is_null | the_same_reordered
:--------------- | :----------- | :----------- | :------------------------- | :-----------------
null | t | t | t | t
Given var = 0, NULL and 1 (<> 0); you'll get:
WITH vals(var) AS
(
VALUES
(0),
(NULL),
(1)
)
SELECT
var,
var = 0 OR var IS NULL AS var_equals_zero_or_var_is_null,
var IS NULL OR var = 0 AS var_is_null_or_var_equals_zero,
CASE WHEN var IS NULL then true
WHEN var = 0 then true
ELSE false
END AS the_same_with_protection
FROM
vals ;
var | var_equals_zero_or_var_is_null | var_is_null_or_var_equals_zero | the_same_with_protection
---: | :----------------------------- | :----------------------------- | :-----------------------
0 | t | t | t
null | t | t | t
1 | f | f | f
These are the basic truth tables for the different operators (NOT, AND, OR, IS NULL, XOR, IMPLIES) using three-valued logic, and checked with SQL:
WITH three_values(x) AS
(
VALUES
(NULL), (FALSE), (TRUE)
)
SELECT
a, b,
a = NULL AS a_equals_null, -- This is always NULL
a IS NULL AS a_is_null, -- This is NEVER NULL
a OR b AS a_or_b, -- UNKNOWN when neither is TRUE and at least one is UNKNOWN
a AND b AS a_and_b, -- UNKNOWN when neither is FALSE and at least one is UNKNOWN
NOT a AS not_a, -- This is UNKNOWN if a is
(a OR b) AND NOT (a AND b) AS a_xor_b, -- Unknown when any is unknown
/* (a AND NOT b) OR (NOT a AND b) a_xor_b_v2, */
NOT a OR b AS a_implies_b -- Kleene and Priest logics
FROM
three_values AS x(a)
CROSS JOIN
three_values AS y(b);
This is the truth table:
a | b | a_equals_null | a_is_null | a_or_b | a_and_b | not_a | a_xor_b | a_implies_b
:--- | :--- | :------------ | :-------- | :----- | :------ | :---- | :------ | :----------
null | null | null | t | null | null | null | null | null
null | f | null | t | null | f | null | null | null
null | t | null | t | t | null | null | null | t
f | null | null | f | null | f | t | null | t
f | f | null | f | f | f | t | f | t
f | t | null | f | t | f | t | t | t
t | null | null | f | t | null | f | null | null
t | f | null | f | t | f | f | t | f
t | t | null | f | t | t | f | f | t
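For readers who would rather see the rules than the table, here is a minimal sketch of the same three-valued logic in Python, with None standing in for NULL/UNKNOWN. The helper names (k_not, k_and, k_or, k_eq, is_null) are made up for this illustration and are not part of any library.

```python
# Three-valued (Kleene) logic, with None standing in for SQL NULL / UNKNOWN.
# All function names here are hypothetical helpers for illustration only.

def k_not(a):
    # NOT UNKNOWN is UNKNOWN.
    return None if a is None else (not a)

def k_and(a, b):
    # FALSE dominates: FALSE AND anything is FALSE, even if the other side is unknown.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def k_or(a, b):
    # TRUE dominates: TRUE OR anything is TRUE, even if the other side is unknown.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def k_eq(a, b):
    # SQL "=": the result is unknown as soon as either operand is NULL.
    if a is None or b is None:
        return None
    return a == b

def is_null(a):
    # SQL "IS NULL": always TRUE or FALSE, never unknown.
    return a is None

# The idiom from the question, in both operand orders.
for var in (0, None, 1):
    left = k_or(k_eq(var, 0), is_null(var))
    right = k_or(is_null(var), k_eq(var, 0))
    print(var, left, right)
```

Because OR is symmetric in these rules, the operand order (and therefore any reordering by the planner) cannot change the result.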

It seems I asked a question that has been around forever. So, regarding the problem of NULL propagation in SQL logical expressions, with the added danger of the SQL optimizer not honoring short-circuit constructs, and of evolving SQL standards, let me share what I've found so far:
Read Wikipedia's article on SQL null propagation.
Use coalesce() around any column that may contain null values and is involved in a calculation within a SQL statement (thanks Igor).
Also use 'is [not] distinct from' instead of '=' or '<>'.
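To illustrate the coalesce() advice for calculations, here is a small sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL (COALESCE is standard SQL and should behave the same way for this case). The items table and its columns are invented for the example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; the table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (price INTEGER, discount INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(100, 10), (50, None)])

# Without protection, a NULL discount makes the whole expression NULL.
plain = conn.execute(
    "SELECT price - discount FROM items ORDER BY price DESC"
).fetchall()

# COALESCE substitutes 0 for a NULL discount before the subtraction.
guarded = conn.execute(
    "SELECT price - COALESCE(discount, 0) FROM items ORDER BY price DESC"
).fetchall()

print(plain)    # the second row comes back as NULL (None in Python)
print(guarded)  # the second row keeps its price
```

Whether 0 is the right replacement depends on what a missing value means in your data; the sketch only shows the mechanics.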

Related

Update of value in array of jsonb returns error "invalid input syntax for type json"

I have a column of type jsonb which contains json arrays of the form
[
{
"Id": 62497,
"Text": "BlaBla"
}
]
I'd like to update the Id to the value of a column word_id (type uuid) from a different table word.
I tried this
update inflection_copy
SET inflectionlinks = s.json_array
FROM (
SELECT jsonb_agg(
CASE
WHEN elems->>'Id' = (
SELECT word_copy.id::text
from word_copy
where word_copy.id::text = elems->>'Id'
) THEN jsonb_set(
elems,
'{Id}'::text [],
(
SELECT jsonb(word_copy.word_id::text)
from word_copy
where word_copy.id::text = elems->>'Id'
)
)
ELSE elems
END
) as json_array
FROM inflection_copy,
jsonb_array_elements(inflectionlinks) elems
) s;
Until now I always get the following error:
invalid input syntax for type json
DETAIL: Token "c66a4353" is invalid.
CONTEXT: JSON data, line 1: c66a4353...
The c66a4353 is part of one of the uuids of the word table. I don't understand why it is flagged as invalid input.
EDIT:
To give an example of one of the uuids:
select to_jsonb(word_id::text) from word_copy limit(5);
returns
+----------------------------------------+
| to_jsonb |
|----------------------------------------|
| "078c979d-e479-4fce-b27c-d14087f467c2" |
| "ef288256-1599-4f0f-a932-aad85d666c9a" |
| "d1d95b60-623e-47cf-b770-de46b01042c5" |
| "f97464c6-b872-4be8-9d9d-83c0102fb26a" |
| "9bb19719-e014-4286-a2d1-4c0cf7f089fc" |
+----------------------------------------+
As requested the respective columns id and word_id from the word table:
+---------------------------------------------------+
| row |
|---------------------------------------------------|
| ('27733', '078c979d-e479-4fce-b27c-d14087f467c2') |
| ('72337', 'ef288256-1599-4f0f-a932-aad85d666c9a') |
| ('72340', 'd1d95b60-623e-47cf-b770-de46b01042c5') |
| ('27741', 'f97464c6-b872-4be8-9d9d-83c0102fb26a') |
| ('72338', '9bb19719-e014-4286-a2d1-4c0cf7f089fc') |
+---------------------------------------------------+
+----------------+----------+----------------------------+
| Column | Type | Modifiers |
|----------------+----------+----------------------------|
| id | bigint | |
| value | text | |
| homonymnumber | smallint | |
| pronounciation | text | |
| audio | text | |
| level | integer | |
| alpha | bigint | |
| frequency | bigint | |
| hanja | text | |
| typeeng | text | |
| typekr | text | |
| word_id | uuid | default gen_random_uuid() |
+----------------+----------+----------------------------+
I suggest modifying your subquery as follows:
update inflection_copy AS ic
SET inflectionlinks = s.json_array
FROM
(SELECT jsonb_agg(CASE WHEN wc.word_id IS NULL THEN e.elems ELSE jsonb_set(e.elems, array['Id'], to_jsonb(wc.word_id::text)) END ORDER BY e.id ASC) AS json_array
FROM inflection_copy AS ic
CROSS JOIN LATERAL jsonb_path_query(ic.inflectionlinks, '$[*]') WITH ORDINALITY AS e(elems, id)
LEFT JOIN word_copy AS wc
ON wc.id::text = e.elems->>'Id'
) AS s
The LEFT JOIN returns wc.word_id = NULL when no wc.id corresponds to e.elems->>'Id', so e.elems is left unchanged by the CASE.
The ORDER BY clause inside the aggregate function jsonb_agg ensures that the order of the elements in the jsonb array is unchanged.
jsonb_path_query is used instead of jsonb_array_elements so that no error is raised when ic.inflectionlinks is not a jsonb array; it runs in lax mode, which is the default behavior.
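As a rough illustration of what the rewritten query does per array, and of why the original attempt failed, here is a Python sketch. As far as I know, jsonb(text) parses its argument as JSON, so a bare uuid is rejected, while to_jsonb(text) wraps it as a JSON string; json.loads and json.dumps play those two roles below. The word_ids dictionary and the relink function are hypothetical stand-ins for word_copy and the CASE/jsonb_set logic.

```python
import json

# Hypothetical stand-in for word_copy: maps id to word_id.
word_ids = {27733: "078c979d-e479-4fce-b27c-d14087f467c2"}

def relink(inflectionlinks):
    """Replace "Id" with the matching word_id, keeping element order."""
    result = []
    for elem in inflectionlinks:
        word_id = word_ids.get(elem.get("Id"))
        if word_id is None:
            # No match (the LEFT JOIN case): leave the element unchanged.
            result.append(elem)
        else:
            # Match: same element with "Id" replaced (the jsonb_set case).
            result.append({**elem, "Id": word_id})
    return result

links = [{"Id": 27733, "Text": "BlaBla"}, {"Id": 62497, "Text": "Other"}]
print(json.dumps(relink(links)))

# Parsing a bare uuid as JSON fails, much like jsonb(word_id::text) did.
try:
    json.loads("078c979d-e479-4fce-b27c-d14087f467c2")
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False
print("bare uuid parses as JSON:", parsed_ok)

# Encoding it as a JSON string works, which is what to_jsonb(text) does.
print(json.dumps("078c979d-e479-4fce-b27c-d14087f467c2"))
```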

Difference between x = null vs. x IS NULL

In Snowflake, what is the difference between x = NULL and x IS NULL in a condition expression? It seems empirically that x IS NULL is what I want when I want to find rows where some column is blank. I ask because x = NULL is treated as valid syntax and I am curious whether there's a different application for this expression.
what is the difference between x = NULL and x IS NULL
In Snowflake, just like in other RDBMS, nothing is equal to NULL (not even NULL itself), so a condition x = NULL (which is valid SQL syntax) never evaluates to true. Strictly speaking, in most RDBMS it evaluates to NULL rather than false, but NULL is not true, so the condition never matches. This also holds for inequality comparisons: NULL <> NULL is not true either.
The typical way to check whether a value is NULL is the x IS NULL construct, which evaluates to true if x is NULL. You can use x IS NOT NULL too. This syntax is reserved for NULL, so something like x IS y is a syntax error.
Here is a small demo:
select
case when 1 = null then 1 else 0 end 1_equal_null,
case when 1 <> null then 1 else 0 end 1_not_equal_null,
case when null is null then 1 else 0 end null_is_null,
case when 1 is not null then 1 else 0 end 1_is_not_null
1_equal_null | 1_not_equal_null | null_is_null | 1_is_not_null
-----------: | ---------------: | -----------: | ------------:
0 | 0 | 1 | 1
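The practical consequence shows up in WHERE clauses, which keep a row only when the condition is true. Here is a small sketch using Python's built-in sqlite3 module as a stand-in for Snowflake (the comparison rules shown are standard SQL); the table t and the count helper are invented for the example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; table t is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (None,)])

def count(where):
    # Counts the rows for which the given condition is true.
    return conn.execute(f"SELECT COUNT(*) FROM t WHERE {where}").fetchone()[0]

eq_null = count("x = NULL")            # never true, so no rows match
ne_null = count("x <> NULL")           # never true either
is_null = count("x IS NULL")           # matches the two NULL rows
is_not_null = count("x IS NOT NULL")   # matches the one non-NULL row

# The comparison itself yields NULL, which Python receives as None.
raw = conn.execute("SELECT 1 = NULL").fetchone()[0]

print(eq_null, ne_null, is_null, is_not_null, raw)
```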
This particular case is well-described in Snowflake's documentation:
EQUAL_NULL
IS [ NOT ] DISTINCT FROM
Compares whether two expressions are equal. The function is NULL-safe, meaning it treats NULLs as known values for comparing equality. Note that this is different from the EQUAL comparison operator (=), which treats NULLs as unknown values.
+------+------+--------------------------------+------------------------------------------+----------------------------+--------------------------------------+
| X1_I | X2_I | X1.I IS NOT DISTINCT FROM X2.I | SELECT IF X1.I IS NOT DISTINCT FROM X2.I | X1.I IS DISTINCT FROM X2.I | SELECT IF X1.I IS DISTINCT FROM X2.I |
|------+------+--------------------------------+------------------------------------------+----------------------------+--------------------------------------|
| 1 | 1 | True | Selected | False | Not |
| 1 | 2 | False | Not | True | Selected |
| 1 | NULL | False | Not | True | Selected |
| 2 | 1 | False | Not | True | Selected |
| 2 | 2 | True | Selected | False | Not |
| 2 | NULL | False | Not | True | Selected |
| NULL | 1 | False | Not | True | Selected |
| NULL | 2 | False | Not | True | Selected |
| NULL | NULL | True | Selected | False | Not |
+------+------+--------------------------------+------------------------------------------+----------------------------+--------------------------------------+
Like most SQL dialects, Snowflake does not return TRUE for NULL = NULL. It returns NULL, as does ANY comparison to a NULL value. The reason is tied to the convoluted history of SQL, and whether this is a good feature has long been argued. Regardless, it's what we have.
As such, when you are comparing two values that may be NULL, here are a few different solutions you can typically use.
-- NVL will return the second value if the first value is NULL
-- So if both of your values are NULL, then an NVL around each of them will
-- return a value so that they are both equal.
-- This only works if you know that your values will never be equal to -1 for example
SELECT ...
WHERE NVL(x, -1) = NVL(y, -1)
-- A little messier, especially among more complicated filters,
-- but guaranteed to work regardless of values
SELECT ...
WHERE x = y OR (x is null and y is null)
-- My new favorite which works in Snowflake (thanks to @waldente)
SELECT x IS NOT DISTINCT FROM y;
-- For most SQL languages, this is a neat way to take advantage of how
-- INTERSECT compares values which does treat NULLs as equal
SELECT ...
WHERE exists (select x intersect select y)
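To make the caveat about the sentinel value concrete, here is a sketch comparing the first two approaches, using Python's built-in sqlite3 module as a stand-in for Snowflake. COALESCE is used in place of NVL because SQLite has no NVL; for two arguments they should be equivalent. The pairs table is invented for the example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; the pairs table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (x INTEGER, y INTEGER)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?)",
    [(1, 1), (1, None), (None, None), (-1, None)],
)

# Sentinel approach: NULL is replaced by -1 on both sides before comparing.
sentinel = conn.execute(
    "SELECT x, y FROM pairs "
    "WHERE COALESCE(x, -1) = COALESCE(y, -1) ORDER BY rowid"
).fetchall()

# Explicit approach: equal values, or both NULL.
explicit = conn.execute(
    "SELECT x, y FROM pairs "
    "WHERE x = y OR (x IS NULL AND y IS NULL) ORDER BY rowid"
).fetchall()

print(sentinel)  # also matches (-1, NULL), a false positive
print(explicit)  # matches only (1, 1) and (NULL, NULL)
```

The (-1, NULL) row is exactly the case the comment warns about: the sentinel form is only safe when the sentinel can never occur as a real value.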

Joining two tables and show data from one if there is any

I have these two tables that I need to join
fields_data
+------------+-----------+------+
| relationid | fieldname | data |
+------------+-----------+------+
| 2 | ftp | test |
| 2 | other | 1234 |
+------------+-----------+------+
fields
+------+-------------+----------+
| name | displayname | position |
+------+-------------+----------+
| user | Username | top |
| pass | Password | top |
| ftp | FTP | top |
| log | Log | top |
| txt | Text | mid |
+------+-------------+----------+
I want to get all the rows from the "fields" table that have the position "top", and if a row matches on name = fieldname in fields_data, it should also show the data. This is my join:
SELECT
fd.`data`,
fd.`relationid`,
fd.`fieldname`,
f.`name`,
f.`displayname`
FROM `fields` AS f
LEFT OUTER JOIN `fields_data` AS fd
ON fd.`fieldname` = f.`name`
WHERE f.`position`='top' AND (fd.`relationid`='3' OR fd.`relationid` IS NULL)
My problem is that the above query only gives me this result:
+------+------------+-----------+------+-------------+
| data | relationid | fieldname | name | displayname |
+------+------------+-----------+------+-------------+
| NULL | NULL | NULL | user | Username |
| NULL | NULL | NULL | pass | Password |
| NULL | NULL | NULL | log | Log |
+------+------------+-----------+------+-------------+
The field called "ftp" is missing because it has a relation to "2". However, I still want it in the result, with NULLs like the others. And if the SQL query had fd.relationid='2' instead of '3', it should give the same result, but with the ftp row holding data in the three fields.
I hope you get what I mean; my English is not the best. Here's the result I want:
with above query containing fd.`relationid`='3'
+------+------------+-----------+------+-------------+
| data | relationid | fieldname | name | displayname |
+------+------------+-----------+------+-------------+
| NULL | NULL | NULL | user | Username |
| NULL | NULL | NULL | pass | Password |
| NULL | NULL | NULL | ftp | FTP |
| NULL | NULL | NULL | log | Log |
+------+------------+-----------+------+-------------+
with above query containing fd.`relationid`='2'
+------+------------+-----------+------+-------------+
| data | relationid | fieldname | name | displayname |
+------+------------+-----------+------+-------------+
| NULL | NULL | NULL | user | Username |
| NULL | NULL | NULL | pass | Password |
| test | 2 | ftp | ftp | FTP |
| NULL | NULL | NULL | log | Log |
+------+------------+-----------+------+-------------+
You want to move the condition to the on clause:
SELECT fd.`data`, fd.`relationid`, fd.`fieldname`, f.`name`, f.`displayname`
FROM `fields` f LEFT OUTER JOIN
`fields_data` fd
ON fd.`fieldname` = f.`name` AND fd.`relationid` = '3'
WHERE f.`position`='top' ;
It is interesting that the semantics of your query and this query are different -- and you found the exact situation: when there is a match on another value, the where clause form filters out the row. This will still keep everything.
As a note, the following also does what you want:
SELECT fd.`data`, fd.`relationid`, fd.`fieldname`, f.`name`, f.`displayname`
FROM `fields` f LEFT OUTER JOIN
(SELECT fd.*
FROM `fields_data` fd
WHERE fd.`relationid` = '3'
) fd
ON fd.`fieldname` = f.`name`
WHERE f.`position` = 'top' ;
I wouldn't recommend writing the query this way, particularly in MySQL (because the subquery is materialized). However, understanding why your version is different from these versions (and why these are the same) is a big step forward in mastering outer joins.
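If you want to see the difference between the two forms side by side, here is a sketch that rebuilds the question's tables in an in-memory SQLite database through Python's sqlite3 module (used here only as a convenient stand-in for MySQL; the join semantics shown are standard). The names WHERE_FORM, ON_FORM and run are just labels for this example.

```python
import sqlite3

# Rebuild the two tables from the question in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fields (name TEXT, displayname TEXT, position TEXT);
CREATE TABLE fields_data (relationid INTEGER, fieldname TEXT, data TEXT);
INSERT INTO fields VALUES
  ('user', 'Username', 'top'), ('pass', 'Password', 'top'),
  ('ftp', 'FTP', 'top'), ('log', 'Log', 'top'), ('txt', 'Text', 'mid');
INSERT INTO fields_data VALUES (2, 'ftp', 'test'), (2, 'other', '1234');
""")

# Condition in WHERE: applied after the join, so a row matched to another
# relationid is filtered out entirely.
WHERE_FORM = """
SELECT f.name, fd.data
FROM fields AS f
LEFT OUTER JOIN fields_data AS fd ON fd.fieldname = f.name
WHERE f.position = 'top' AND (fd.relationid = ? OR fd.relationid IS NULL)
ORDER BY f.name
"""

# Condition in ON: applied while matching, so an unmatched row from
# fields is still kept, with NULLs on the fields_data side.
ON_FORM = """
SELECT f.name, fd.data
FROM fields AS f
LEFT OUTER JOIN fields_data AS fd
  ON fd.fieldname = f.name AND fd.relationid = ?
WHERE f.position = 'top'
ORDER BY f.name
"""

def run(sql, relationid):
    return conn.execute(sql, (relationid,)).fetchall()

print(run(WHERE_FORM, 3))  # ftp is missing
print(run(ON_FORM, 3))     # ftp is present with NULL data
print(run(ON_FORM, 2))     # ftp is present with its data
```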

Transaction management and temporary tables in SQL Server

Sorry for the title, perhaps it's not very clear.
I have some SQL queries in a script that depend on each other.
The script uses a temporary table in which the data is inserted (the #temp_data table).
This is the expected output:
___________________________________
| speed1 | speed2 | distance |
| 1 | NULL | 10 |
| 3 | NULL | 40 |
| 5 | NULL | 90 |
| NULL | 1 | 10 |
| NULL | 3 | 40 |
| NULL | 5 | 90 |
Here is the query structure (I didn't include the actual query since it's too big):
-- First group
queryForSpeed1
queryToUpdateDistanceBasedOnSpeed1
-- Second group
queryForSpeed2
queryToUpdateDistanceBasedOnSpeed2
If I run the first group of queries (queryForSpeed1 and queryToUpdateDistanceBasedOnSpeed1) separately from the second group then I get the expected output: only the speed1 and distance columns contain data:
___________________________________
| speed1 | speed2 | distance |
| 1 | NULL | 10 |
| 3 | NULL | 40 |
| 5 | NULL | 90 |
| NULL | NULL | NULL |
| NULL | NULL | NULL |
| NULL | NULL | NULL |
The same happens when I run the second group:
___________________________________
| speed1 | speed2 | distance |
| NULL | NULL | NULL |
| NULL | NULL | NULL |
| NULL | NULL | NULL |
| NULL | 1 | 10 |
| NULL | 2 | 40 |
| NULL | 3 | 90 |
BUT, when I run both groups: all the distances are NULL:
___________________________________
| speed1 | speed2 | distance |
| 1 | NULL | NULL |
| 3 | NULL | NULL |
| 5 | NULL | NULL |
| NULL | 1 | NULL |
| NULL | 2 | NULL |
| NULL | 3 | NULL |
I believe this is somehow related to transaction management and temporary tables, although I wasn't able to find anything relevant to solve the problem on Google.
From what I've read, SQL Server keeps a transaction log where it stores every update, insert and whatever... when it arrives at the end of the script it actually does all those insertions and updates.
So the update I did for the distance column finds all the speeds as being NULL because the data wasn't yet inserted in the temporary table from the previous updates, but at the end of the query the speeds are inserted in the table so that's why they are visible.
I played a bit with the GO statement to execute my script in batches, but no luck so far...
What am I doing wrong? Can someone point me in the right direction, please?
EDIT
Here is the actual query.
The problem is not related to transactions, but to the way you update #temp_speed_profile. The second pass through #temp_speed_profile retrieves all six records. Speed_new is null in the first record of Voyage_Id, so @distance becomes null. Because you carry the value of @distance over to the next iteration, it remains null.
The problem goes away when using different temporary tables because the second pass then works on the second set of data only.
A note on cursors: when defining one, make sure to add LOCAL and FAST_FORWARD. LOCAL limits the cursor's scope, and FAST_FORWARD optimizes fetches.
It is almost certainly caused by the way you have written your queries.
To confirm, just rewrite your queries using #temp_data1 and #temp_data2, rather than a single table #temp_data.
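Since the actual query is not shown here, the following is only a sketch of the mechanism described above: once a NULL is added to a carried variable, every later value is NULL as well. It uses Python with None standing in for NULL; sql_add, running_totals and the sample values are all hypothetical, and a plain running sum stands in for the real distance calculation.

```python
def sql_add(a, b):
    """SQL-style addition: if either operand is NULL (None), so is the result."""
    return None if a is None or b is None else a + b

def running_totals(values, skip_nulls=False):
    """Carry a total across rows, the way a variable is carried in a cursor loop."""
    total = 0
    out = []
    for v in values:
        if skip_nulls and v is None:
            # Leave this row's result empty and keep the carried total intact.
            out.append(None)
            continue
        total = sql_add(total, v)
        out.append(total)
    return out

# Hypothetical speed2 column as seen by a pass over all six rows:
# the first three rows belong to the other group and hold NULL.
speed2 = [None, None, None, 1, 2, 3]

poisoned = running_totals(speed2)
protected = running_totals(speed2, skip_nulls=True)

print(poisoned)   # NULL from the first row on
print(protected)  # totals only for the rows that have a speed
```

Restricting each pass to its own rows (or using two temporary tables, as suggested) has the same effect as skip_nulls here.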

Create a summary result with one query

I have a table with the following format.
mysql> describe unit_characteristics;
+----------------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| uut_id | int(10) unsigned | NO | PRI | NULL | |
| uut_sn | varchar(45) | NO | | NULL | |
| characteristic_name | varchar(80) | NO | PRI | NULL | |
| characteristic_value | text | NO | | NULL | |
| creation_time | datetime | NO | | NULL | |
| last_modified_time | datetime | NO | | NULL | |
+----------------------+------------------+------+-----+---------+----------------+
Each uut_sn has multiple characteristic_name/value pairs. I want to use MySQL to generate a table like this:
+--------+-------------+-------------+-------------+-------------+-----+
| uut_sn | char_name_1 | char_name_2 | char_name_3 | char_name_4 | ... |
+--------+-------------+-------------+-------------+-------------+-----+
| 00000 | char_val_1 | char_val_2 | char_val_3 | char_val_4 | ... |
| 00001 | char_val_1 | char_val_2 | char_val_3 | char_val_4 | ... |
| 00002 | char_val_1 | char_val_2 | char_val_3 | char_val_4 | ... |
| ..... | char_val_1 | char_val_2 | char_val_3 | char_val_4 | ... |
+--------+-------------+-------------+-------------+-------------+-----+
Is this possible with just one query?
Thanks,
-peter
This is a standard pivot query:
SELECT uc.uut_sn,
MAX(CASE
WHEN uc.characteristic_name = 'char_name_1' THEN uc.characteristic_value
ELSE NULL
END) AS char_name_1,
MAX(CASE
WHEN uc.characteristic_name = 'char_name_2' THEN uc.characteristic_value
ELSE NULL
END) AS char_name_2,
MAX(CASE
WHEN uc.characteristic_name = 'char_name_3' THEN uc.characteristic_value
ELSE NULL
END) AS char_name_3
FROM unit_characteristics uc
GROUP BY uc.uut_sn
To make it dynamic, you need to use MySQL's dynamic SQL syntax called Prepared Statements. It requires two queries - the first gets a list of the characteristic_name values, so you can concatenate the appropriate string into the CASE expressions like you see in my example as the ultimate query.
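To show the two-query idea in runnable form, here is a sketch that builds the pivot from application code instead of MySQL's PREPARE/EXECUTE, using Python's built-in sqlite3 module as a stand-in database. The sample rows, the quote_ident helper and the variable names are invented for the example.

```python
import sqlite3

# In-memory stand-in for the unit_characteristics table, with sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE unit_characteristics "
    "(uut_sn TEXT, characteristic_name TEXT, characteristic_value TEXT)"
)
conn.executemany(
    "INSERT INTO unit_characteristics VALUES (?, ?, ?)",
    [("00000", "color", "red"), ("00000", "weight", "12"),
     ("00001", "color", "blue")],
)

def quote_ident(name):
    # Quote a column alias so an odd characteristic name cannot break the SQL.
    return '"' + name.replace('"', '""') + '"'

# Query 1: collect the characteristic names that will become columns.
names = [row[0] for row in conn.execute(
    "SELECT DISTINCT characteristic_name "
    "FROM unit_characteristics ORDER BY characteristic_name"
)]

# Query 2: one MAX(CASE ...) expression per name; the name values themselves
# are passed as parameters rather than concatenated into the string.
columns = ", ".join(
    "MAX(CASE WHEN characteristic_name = ? THEN characteristic_value END) "
    f"AS {quote_ident(name)}"
    for name in names
)
sql = (
    f"SELECT uut_sn, {columns} "
    "FROM unit_characteristics GROUP BY uut_sn ORDER BY uut_sn"
)
pivot = conn.execute(sql, names).fetchall()

print(names)
print(pivot)
```

The sketch assumes at least one characteristic exists; with an empty names list the generated statement would not be valid SQL.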
You're using the EAV antipattern. There's no way to automatically generate the pivot table you describe without hardcoding the characteristics you want to include. As @OMG Ponies mentions, you need to use dynamic SQL to generate the query in a custom fashion for the set of characteristics you want to include in the result.
Instead, I recommend you fetch the characteristics one per row, as they are stored in the database, and if you want an application object to represent a single UUT with all its characteristics, you write code to loop over the rows as you fetch them in your application, collecting them into objects.
For example in PHP:
$sql = "SELECT uut_sn, characteristic_name, characteristic_value
FROM unit_characteristics";
$stmt = $pdo->query($sql);
$objects = array();
while ($row = $stmt->fetch()) {
// Create the Uut object the first time this serial number is seen
if (!isset($objects[ $row["uut_sn"] ])) {
$objects[ $row["uut_sn"] ] = new Uut();
}
// Store the value under a property named after the characteristic
$objects[ $row["uut_sn"] ]->{$row["characteristic_name"]}
= $row["characteristic_value"];
}
This has a few advantages over the solution of hardcoding characteristic names in your query:
This solution takes only one SQL query instead of two.
No complex code is needed to build your dynamic SQL query.
If you forget one of the characteristics, this solution automatically finds it anyway.
GROUP BY in MySQL is often slow, and this avoids the GROUP BY.