Difference between x = null vs. x IS NULL - sql

In Snowflake, what is the difference between x = NULL and x IS NULL in a condition expression? It seems empirically that x IS NULL is what I want when I want to find rows where some column is blank. I ask because x = NULL is treated as valid syntax and I am curious whether there's a different application for this expression.

what is the difference between x = NULL and x IS NULL
In Snowflake, just like in other RDBMSs, nothing is equal to NULL, not even NULL itself. The condition x = NULL is valid syntax, but it evaluates to NULL (unknown), which a WHERE clause treats as false, so it will never select any rows. Note that the same holds for non-equality comparisons: NULL <> NULL also evaluates to NULL, not true.
The standard way to check whether a value is NULL is the x IS NULL construct, which evaluates to TRUE when x is NULL. You can use x IS NOT NULL too. This syntax is reserved for NULL, so something like x IS y is a syntax error.
Here is a small demo:
select
    case when 1 = null then 1 else 0 end as one_equal_null,
    case when 1 <> null then 1 else 0 end as one_not_equal_null,
    case when null is null then 1 else 0 end as null_is_null,
    case when 1 is not null then 1 else 0 end as one_is_not_null;

one_equal_null | one_not_equal_null | null_is_null | one_is_not_null
-------------: | -----------------: | -----------: | --------------:
             0 |                  0 |            1 |               1

This particular case is well-described in Snowflake's documentation:
EQUAL_NULL
IS [ NOT ] DISTINCT FROM
Compares whether two expressions are equal. The function is NULL-safe, meaning it treats NULLs as known values for comparing equality. Note that this is different from the EQUAL comparison operator (=), which treats NULLs as unknown values.
+------+------+--------------------------------+------------------------------------------+----------------------------+--------------------------------------+
| X1_I | X2_I | X1.I IS NOT DISTINCT FROM X2.I | SELECT IF X1.I IS NOT DISTINCT FROM X2.I | X1.I IS DISTINCT FROM X2.I | SELECT IF X1.I IS DISTINCT FROM X2.I |
|------+------+--------------------------------+------------------------------------------+----------------------------+--------------------------------------|
| 1 | 1 | True | Selected | False | Not |
| 1 | 2 | False | Not | True | Selected |
| 1 | NULL | False | Not | True | Selected |
| 2 | 1 | False | Not | True | Selected |
| 2 | 2 | True | Selected | False | Not |
| 2 | NULL | False | Not | True | Selected |
| NULL | 1 | False | Not | True | Selected |
| NULL | 2 | False | Not | True | Selected |
| NULL | NULL | True | Selected | False | Not |
+------+------+--------------------------------+------------------------------------------+----------------------------+--------------------------------------+

Like most SQL dialects, Snowflake does not return TRUE when comparing NULL = NULL. It returns NULL, as does ANY comparison to a NULL value. The reason for this is tied to the convoluted history of SQL, and whether it is a good feature has been argued at length. Regardless, it's what we have.
As such, when you are comparing two values that may be NULL, here are a few different solutions you can typically use.
-- NVL will return the second value if the first value is NULL
-- So if both of your values are NULL, then an NVL around each of them will
-- return a value so that they are both equal.
-- This only works if you know that your values will never be equal to -1 for example
SELECT ...
WHERE NVL(x, -1) = NVL(y, -1)
-- A little messier, especially among more complicated filters,
-- but guaranteed to work regardless of values
SELECT ...
WHERE x = y OR (x is null and y is null)
-- My new favorite, which works in Snowflake (thanks to #waldente)
SELECT x IS NOT DISTINCT FROM y;
-- For most SQL languages, this is a neat way to take advantage of how
-- INTERSECT compares values which does treat NULLs as equal
SELECT ...
WHERE exists (select x intersect select y)
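These workarounds can be checked locally. A minimal sketch using Python's bundled sqlite3, where x IS y is SQLite's spelling of the NULL-safe comparison that Snowflake writes as IS NOT DISTINCT FROM (the pairs CTE and the -1 sentinel are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
rows = con.execute("""
    WITH pairs(x, y) AS (VALUES (1, 1), (1, 2), (NULL, NULL), (1, NULL))
    SELECT x, y,
           x = y                             AS plain_equal,  -- NULL when either side is NULL
           COALESCE(x, -1) = COALESCE(y, -1) AS nvl_style,    -- sentinel-based comparison
           x IS y                            AS null_safe     -- SQLite's NULL-safe equality
    FROM pairs
""").fetchall()
for row in rows:
    print(row)
```

Only the NULL-safe variants report the (NULL, NULL) pair as equal; plain = returns NULL (rendered as None) for it.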

Related

SQL - Retrieve only the total count of occurrences of a specific column [duplicate]

Say I have a table with two fields: ID and State. State is a boolean value (0 or 1), and ID isn't unique, so the table looks like this:
ID | State
---+-------
 1 | true
 1 | false
 2 | false
 3 | true
 1 | true
Now I want to count the rows grouped by the ID field, with each State value as its own column in the result set. It should look like this:
ID | TrueState | FalseState
---+-----------+-----------
 1 |         2 |          1
 2 |         0 |          1
 3 |         1 |          0
How to do that?
This is a pivot query, which MySQL doesn't support natively. The workarounds get ugly fast, but since you're only going to be generating two new columns, it won't be horribly ugly, just mildly unpleasant:
SELECT SUM(State = True) AS TrueState, SUM(State = False) AS FalseState,
SUM(State is NULL) AS FileNotFoundState
...
Basically, state = true evaluates to boolean true/false, which MySQL type-casts to an integer 0 or 1, which can then be SUM()med up.
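The same conditional-aggregation trick works in SQLite, which also evaluates comparisons to 0/1. A small sketch via Python's sqlite3 (table name t and 1/0 encoding of the boolean are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, state INTEGER)")  # 1 = true, 0 = false
con.executemany("INSERT INTO t VALUES (?, ?)",
                [(1, 1), (1, 0), (2, 0), (3, 1), (1, 1)])
rows = con.execute("""
    SELECT id,
           SUM(state = 1) AS TrueState,   -- comparison yields 0/1, summed per group
           SUM(state = 0) AS FalseState
    FROM t
    GROUP BY id
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 2, 1), (2, 0, 1), (3, 1, 0)]
```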

SQLITE - If a column is null then the value in another column is 0

I have two columns in a table. The table is constructed with an inner join and a group by; let's call it Joined. It has two columns, Present and Score. If Present is null, I want to assign 0 to the Score value.
+------------+--------+-------------+------------+--------+
| Student_Id | Course | ExamDate | Present | Score |
+------------+--------+-------------+------------+--------+
| 1 | Math | 04/05/2020 | Yes | 45 |
| 2 | Math | 04/05/2020 | NULL | 90 |
| 2 | Math | 04/05/2020 | NULL | 50 |
+------------+--------+-------------+------------+--------+
What I have up to now is
SELECT DISTINCT StudentID ,Course, ExamDate, Present, Score
CASE Present ISNULL
Score = 0
END
FROM Joined
I need the distinct because the inner join can give me some repetitions. What I need is
+------------+--------+-------------+------------+--------+
| Student_Id | Course | ExamDate | Present | Score |
+------------+--------+-------------+------------+--------+
| 1 | Math | 04/05/2020 | Yes | 45 |
| 2 | Math | 04/05/2020 | NULL | 0 |
+------------+--------+-------------+------------+--------+
It feels very, very wrong to me, but I haven't been able to figure out how to do it in one query. How can I do it?
If Present is null then, I want to assign 0 to the Score value.
The case expression goes like:
select
present,
case when present is null
then 0
else score
end as score
from ...
You didn't say what to do when Present is not null, so this returns the original score.
It is unclear why you would need DISTINCT. If you were to ask a question about the original query, which seems to produce (partial) duplicates, one might be able to help fix it.
You can try the below - note that the raw Score column must be replaced by the CASE expression, not selected alongside it, or DISTINCT will not collapse the duplicate rows:
SELECT DISTINCT StudentID, Course, ExamDate, Present,
    CASE WHEN Present IS NULL THEN 0 ELSE Score END AS Score
FROM Joined
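A quick way to see why DISTINCT then works: once the CASE maps both NULL-row scores to 0, the rows become identical and collapse. A sketch using Python's sqlite3 with the question's sample data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Joined (StudentID, Course, ExamDate, Present, Score)")
con.executemany("INSERT INTO Joined VALUES (?, ?, ?, ?, ?)", [
    (1, "Math", "04/05/2020", "Yes", 45),
    (2, "Math", "04/05/2020", None, 90),
    (2, "Math", "04/05/2020", None, 50),
])
rows = con.execute("""
    SELECT DISTINCT StudentID, Course, ExamDate, Present,
           CASE WHEN Present IS NULL THEN 0 ELSE Score END AS Score
    FROM Joined
""").fetchall()
print(rows)  # the two NULL rows collapse into one with Score = 0
```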

How To Check Numerical Format in SQL Server 2008

I am converting some existing Oracle queries to MSSQL Server (2008) and can't figure out how to replicate the following Regex check:
SELECT SomeField
FROM SomeTable
WHERE NOT REGEXP_LIKE(TO_CHAR(SomeField), '^[0-9]{2}[.][0-9]{7}$');
That finds all results where the format of the number starts with 2 positive digits, followed by a decimal point, and 7 decimal places of data: 12.3456789
I've tried using STR, CAST, CONVERT, but they all seem to truncate the decimal to 4 decimal places for some reason. The truncating has prevented me from getting reliable results using LEN and CHARINDEX. Manually adding size parameters to STR gets slightly closer, but I still don't know how to compare the original numerical representation to the converted value.
SELECT SomeField
    , STR(SomeField, 10, 7)
    , CAST(SomeField AS VARCHAR)
    , LEN(SomeField)
    , CHARINDEX(STR(SomeField), '.')
FROM SomeTable
+------------------+------------+---------+-----+-----------+
| Orig | STR | Cast | LEN | CHARINDEX |
+------------------+------------+---------+-----+-----------+
| 31.44650944 | 31.4465094 | 31.4465 | 7 | 0 |
| 35.85609 | 35.8560900 | 35.8561 | 7 | 0 |
| 54.589623 | 54.5896230 | 54.5896 | 7 | 0 |
| 31.92653899 | 31.9265390 | 31.9265 | 7 | 0 |
| 31.4523333333333 | 31.4523333 | 31.4523 | 7 | 0 |
| 31.40208955 | 31.4020895 | 31.4021 | 7 | 0 |
| 51.3047869443893 | 51.3047869 | 51.3048 | 7 | 0 |
| 51 | 51.0000000 | 51 | 2 | 0 |
| 32.220633 | 32.2206330 | 32.2206 | 7 | 0 |
| 35.769247 | 35.7692470 | 35.7692 | 7 | 0 |
| 35.071022 | 35.0710220 | 35.071 | 6 | 0 |
+------------------+------------+---------+-----+-----------+
What you want to do does not make sense in SQL Server.
Oracle supports a number data type with variable precision: if a precision is not specified, the column stores values as given.
There is no corresponding data type in SQL Server. You can have a floating-point number (float/real) or a fixed-precision number (decimal/numeric). However, either choice applies to ALL values in a column, not to individual values within a row.
The closest you could do is:
where somefield >= 0 and somefield < 100
Or if you wanted to insist that there is a decimal component:
where somefield >= 0 and somefield < 100 and floor(somefield) <> somefield
However, you might have valid integer values that this would filter out.
This answer gave me an option that works in conjunction with checking the decimal position first.
SELECT SomeField
FROM SomeTable
WHERE SomeField IS NOT NULL
    AND CHARINDEX('.', SomeField) = 3
    AND LEN(CAST(CAST(REVERSE(CONVERT(VARCHAR(50), SomeField, 128)) AS FLOAT) AS BIGINT)) = 7
While I understand this is terrible by nearly all metrics, it satisfies the requirements.
Checking formatting on this data type is inherently flawed, as pointed out by several posters, but for this very isolated use case I wanted to document the workaround.
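For completeness: SQL Server's LIKE supports [0-9] character classes, so NOT LIKE '[0-9][0-9].[0-9][0-9][0-9][0-9][0-9][0-9][0-9]' against a string-converted value mirrors the Oracle regex. The pattern idea can be sketched with SQLite's GLOB, which supports the same classes (values are stored as text here to sidestep the float-conversion truncation described above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE SomeTable (SomeField TEXT)")  # text for the demo
con.executemany("INSERT INTO SomeTable VALUES (?)",
                [("12.3456789",), ("31.4465",), ("51",), ("54.5896230",)])
# GLOB's [0-9] classes play the role of the character classes in
# SQL Server's LIKE (or of the Oracle regex): 2 digits, a dot, 7 digits.
bad = con.execute("""
    SELECT SomeField FROM SomeTable
    WHERE SomeField NOT GLOB '[0-9][0-9].[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
""").fetchall()
print(bad)  # rows that do NOT match the NN.NNNNNNN format
```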

Setting value of boolean columns based on existence of value in set

I have a SQL table of the format (INTEGER, json_array(INTEGER)).
I need to return results from the table that have two boolean columns. One is set to true iff a 1 appears in the json array, and the other to true iff a 2 appears in the array. Obviously they are not mutually exclusive.
For example, if the data were this:
---------------------
| ID | VALUES        |
---------------------
| 12 | [1, 4, 6, 11] |
| 74 | [0, 1, 2, 5]  |
---------------------
I would hope to get back:
---------------------
| ID | HAS1 | HAS2  |
---------------------
| 12 | true | false |
| 74 | true | true  |
---------------------
I have managed to extract the json data out of the values column using json_each, but am unsure how to proceed.
If I recall correctly, SQLite's max aggregate function works on boolean (0/1) values, so you can simply group your data (note that values is a reserved word in SQLite, so the column name needs quoting):
select
    t1.id,
    max(case json_each.value when 1 then true else false end) as has1,
    max(case json_each.value when 2 then true else false end) as has2
from
    yourtable t1,
    json_each(t1."values")
group by
    t1.id
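A compact check of this approach with Python's sqlite3 (requires SQLite's built-in JSON support; value = 1 is used as a shorthand for the CASE expression):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE yourtable (id INTEGER, "values" TEXT)')  # "values" is a keyword
con.executemany("INSERT INTO yourtable VALUES (?, ?)",
                [(12, "[1, 4, 6, 11]"), (74, "[0, 1, 2, 5]")])
rows = con.execute('''
    SELECT t1.id,
           max(json_each.value = 1) AS has1,   -- shorthand for the CASE expression
           max(json_each.value = 2) AS has2
    FROM yourtable AS t1, json_each(t1."values")
    GROUP BY t1.id
    ORDER BY t1.id
''').fetchall()
print(rows)  # [(12, 1, 0), (74, 1, 1)]
```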

postgres: How to protect conditional expressions from null values

After years of using PostgreSQL, I still don't know whether there is an established best practice for protecting conditional expressions from null values of variables, given that SQL query planners have full authority to apply or ignore short-circuiting in the most frequently used guard idiom: "var is null or var = 0".
Allegedly, the 'case when ... end' construct resolves any ambiguity, but it also reduces maintainability, since it obscures a simple check with a lot of words.
Thanks, in advance.
I think you have a misconception arising from comparing SQL to Java (or C, C++, or any language dealing with references or pointers).
You don't need to protect conditional expressions from NULL values when working with SQL.
In SQL, you do not have (hidden) pointers (or references) to objects that should be tested against NULL or otherwise they cannot be dereferenced. In SQL, every expression produces a certain value of a certain type. This value can be NULL (also called UNKNOWN).
If your var is NULL, then var = 0 will evaluate to NULL (unknown = 0 gives back unknown). Then var IS NULL (unknown is unknown) will evaluate to TRUE. And, according to three-value logic, TRUE or UNKNOWN evaluates to TRUE. No matter which is the order of evaluation, the result is always the same.
You can check it just by evaluating:
SELECT
/* var */ NULL = 0 as null_equals_zero,
/* var */ NULL IS NULL as null_is_null,
TRUE or NULL AS true_or_null,
(NULL = 0) OR (NULL IS NULL) AS your_case_when_var_is_null,
(NULL IS NULL) OR (NULL = 0) AS the_same_reordered
;
Returns
null_equals_zero | null_is_null | true_or_null | your_case_when_var_is_null | the_same_reordered
:--------------- | :----------- | :----------- | :------------------------- | :-----------------
null | t | t | t | t
Given var values of 0, NULL, and 1 (<> 0), you'll get:
WITH vals(var) AS
(
VALUES
(0),
(NULL),
(1)
)
SELECT
var,
var = 0 OR var IS NULL AS var_equals_zero_or_var_is_null,
var IS NULL OR var = 0 AS var_is_null_or_var_equals_zero,
CASE WHEN var IS NULL then true
WHEN var = 0 then true
ELSE false
END AS the_same_with_protection
FROM
vals ;
var | var_equals_zero_or_var_is_null | var_is_null_or_var_equals_zero | the_same_with_protection
---: | :----------------------------- | :----------------------------- | :-----------------------
0 | t | t | t
null | t | t | t
1 | f | f | f
These are the basic truth tables for the different operators (NOT, AND, OR, IS NULL, XOR, IMPLIES) using three-valued logic, and checked with SQL:
WITH three_values(x) AS
(
VALUES
(NULL), (FALSE), (TRUE)
)
SELECT
a, b,
a = NULL AS a_equals_null, -- This is always NULL
a IS NULL AS a_is_null, -- This is NEVER NULL
a OR b AS a_or_b, -- This is UNKNOWN if both are
a AND b AS a_and_b, -- This is UNKNOWN if any is
NOT a AS not_a, -- This is UNKNOWN if a is
(a OR b) AND NOT (a AND b) AS a_xor_b, -- Unknown when any is unknown
/* (a AND NOT b) OR (NOT a AND b) a_xor_b_v2, */
NOT a OR b AS a_implies_b -- Kleene and Priest logics
FROM
three_values AS x(a)
CROSS JOIN
three_values AS y(b);
This is the truth table:
a | b | a_equals_null | a_is_null | a_or_b | a_and_b | not_a | a_xor_b | a_implies_b
:--- | :--- | :------------ | :-------- | :----- | :------ | :---- | :------ | :----------
null | null | null | t | null | null | null | null | null
null | f | null | t | null | f | null | null | null
null | t | null | t | t | null | null | null | t
f | null | null | f | null | f | t | null | t
f | f | null | f | f | f | t | f | t
f | t | null | f | t | f | t | t | t
t | null | null | f | t | null | f | null | null
t | f | null | f | t | f | f | t | f
t | t | null | f | t | t | f | f | t
It seems I just asked a question that has been around forever. So, regarding the problem of NULL propagation in SQL logical expressions, with the added danger of the SQL optimizer not honoring short-circuit constructs, and of evolving SQL standards, let me share what I've found so far:
Read Wikipedia's article on SQL NULL propagation.
Use coalesce() around any column with possible null values that is involved in any calculation within a SQL statement (thanks Igor).
Also use 'is [not] distinct from' instead of '=' or '<>'.
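The coalesce() and distinct-from advice can be verified quickly. A sketch in Python's sqlite3, where var IS 0 is the SQLite spelling of var IS NOT DISTINCT FROM 0; note that the two guards differ on the NULL row, since coalescing equates NULL with the sentinel while IS merely avoids returning NULL:

```python
import sqlite3

con = sqlite3.connect(":memory:")
rows = con.execute("""
    WITH vals(var) AS (VALUES (0), (NULL), (1))
    SELECT var,
           var = 0              AS unprotected,  -- NULL row yields NULL, not false
           COALESCE(var, 0) = 0 AS coalesced,    -- NULL folded to the sentinel 0 first
           var IS 0             AS null_safe     -- never NULL, but NULL IS 0 is false
    FROM vals
""").fetchall()
for row in rows:
    print(row)
```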