I have a SQL query in PostgreSQL that checks whether particular fields, passed in the form of an array, fall within a range of bigint values. I would like to add the possibility of not filtering out null values. With the existing query, a null value in any of the fields causes the row to be filtered out:
select *
from table_test
where '[0,2147483647]'::int8range @> ALL(ARRAY[fields])
I would like to do something like the following, except that here the null check is made against the whole array, while I want it made against each field:
select count(*)
from dbm.inventory_source
where '[0,2147483647]'::int8range @> ALL(ARRAY[id, exchange_id, min_cpm_micros])
or (array[id, exchange_id, min_cpm_micros]) is null
Also, I do not want to write a separate null check for each field; I would like to check for nulls across the whole array of fields.
I pass the names of the fields into the query as a single string (called fields), which is why I do not want to check each field separately. The implementation was designed this way to keep the queries generic across multiple tables.
How can I fix this query?
I would like to add the possibility not to filter out null values.
Based on this, I would expect logic like:
where '[0,2147483647]'::int8range @> ALL(ARRAY[field_1, field_2, field_3]) or
(field_1 is null and field_2 is null and field_3 is null)
I am unclear if you want to allow all values to be NULL or any of them. The above is for all of them. If you want any, change the ands to ors.
If I understand correctly, presumably you're looking for something like this:
SELECT *
FROM table_test
WHERE '[0,2147483647]'::int8range @> ALL(ARRAY[fields]) IS NOT FALSE
(yeah, sorry, all I did was add three words and capitalize your keywords)
What's this doing? Let's start from the top.
Let's look at what we want from all the conditional stuff. Specifically, it seems we want the condition to return TRUE for every array wherein each value of the array satisfies one of these two conditions:
The value falls within the range [0,2147483647]
The value is NULL
It's useful here to keep in mind the exact meaning of NULL in SQL: it's a value that we don't know. This is why NULL propagates in most operations, and thinking of it that way makes it easier to predict how the database will treat it. In fact, let's replace it with ? for some examples. Why doesn't ? = ? return TRUE? It's because we don't know what either of those values are, so we don't know if they're equal, so the expression evaluates to some unknown value, NULL. What about something like ? + 1? Well, we don't know what the sum is, so it's also NULL. Similarly, ? AND TRUE depends entirely on what the first value is, and we don't know what it is, so we write NULL.
This is where it gets fun: ? AND FALSE will always be false, no matter what our unknown value is, so it evaluates to FALSE instead of NULL. Similarly, ? OR TRUE must evaluate to TRUE.
Now, revisiting our two conditions, we see that your code already checked for condition 1. What about condition 2? Well, think about how ALL works, and what it's really telling you. It's basically evaluating your condition for each entry in your array, then combining all of those with AND to tell you whether or not it's true for all of the entries. This means that your test, specifically the expression
'[0,2147483647]'::int8range @> ALL(ARRAY[fields])
returns TRUE, FALSE, or NULL for each of the entries in the array, then combines those results using AND. Since we know that
TRUE AND TRUE returns TRUE
NULL AND TRUE returns NULL
x AND FALSE returns FALSE for any x
we can safely say that your code will return FALSE if and only if the array contains a value outside of your given range; otherwise, it will return TRUE or NULL. On the other hand, we want to get TRUE regardless of whether your code says TRUE or NULL; in other words, whenever it evaluates to anything other than FALSE. Luckily, there's a predicate for that:
IS NOT FALSE
Well! That was pretty complicated to think about, but after finding the solution, it seems almost offensively simple! Way to make me feel stupid.
Check it out here.
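The reasoning above can be sketched outside the database. The following is only a model, not PostgreSQL itself: Python's None stands in for NULL, and the helper names (in_range, sql_and, all_in_range, is_not_false) are hypothetical names introduced for illustration.

```python
# Model of SQL three-valued logic; None plays the role of NULL.
LOW, HIGH = 0, 2147483647  # the range from the question

def in_range(value):
    """Containment test: TRUE/FALSE for known values, NULL for NULL."""
    if value is None:
        return None
    return LOW <= value <= HIGH

def sql_and(a, b):
    """Three-valued AND: FALSE wins, then NULL, otherwise TRUE."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def all_in_range(values):
    """Mimics `range @> ALL(ARRAY[...])`: AND over the per-element results."""
    result = True
    for v in values:
        result = sql_and(result, in_range(v))
    return result

def is_not_false(x):
    """Mimics the IS NOT FALSE predicate: TRUE and NULL both pass."""
    return x is not False

rows = [(1, 2, 3), (1, None, 3), (None, None, None), (1, -5, None)]
kept = [row for row in rows if is_not_false(all_in_range(row))]
print(kept)
```

Only the row containing -5 is dropped, because it is the only one for which the combined test is definitely FALSE; rows whose result is TRUE or NULL are kept.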
Excuse my ignorance about this... I'm taking a data analysis course and I stumbled upon this query in an exercise:
SELECT
CASE
WHEN MIN(REGEXP_CONTAINS(STRING(ActivityDate), DATE_REGEX)) = TRUE THEN
"Valid"
ELSE
"Not Valid"
END
AS valid_test
FROM
`tracker_data_clean.daily-activity-clean`;
ActivityDate is a field that contains date type data and DATE_REGEX is a regular expression variable for a date format string.
What I don't know is what taking the MIN() of the boolean expression REGEXP_CONTAINS does or means.
I would appreciate if any of you could help me understand the concept of doing this.
Thanks !
The query selects rows from the table and applies the REGEXP_CONTAINS() function to every (string-converted) value in the ActivityDate column. REGEXP_CONTAINS() will either return true or false based on whether the value matches the regex pattern in DATE_REGEX.
How MIN() behaves here can vary by implementation:
Booleans might be coerced as integers, so MIN() is evaluating 0's and 1's. If all the values are 1 (true), MIN() will be 1 (true), otherwise it will be 0 (false).
Other implementations might evaluate booleans directly, so MIN() returns true if all of the values are true, because the minimum value is true (true being "greater" than false), otherwise it returns false.
The result, based on the implementation, is that MIN() returns 0/1, or false/true. Either way, that result is compared to true in the CASE statement. If all values matched the regex, the comparison will be true.
Basically, the query is "does every row have a valid date in the ActivityDate column?" The result will be a table with a single column valid_test and one row, containing "Valid" if they all match, "Not Valid" otherwise.
Another way to look at it that would be relatable to some programming languages is that MIN(bool_function()) is analogous to all(), meaning return true if all values are true. Similarly, MAX(bool_function()) would be analogous to any(), meaning return true if any value is true.
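The MIN/MAX analogy can be tried out with a small sketch. It assumes SQLite (through Python's sqlite3 module) rather than BigQuery, so GLOB stands in for REGEXP_CONTAINS and booleans come back as 0/1; the table name daily_activity and the pattern are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_activity (ActivityDate TEXT)")

# Hypothetical stand-in for DATE_REGEX: YYYY-MM-DD written as a GLOB pattern.
DATE_GLOB = "[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]"

def min_max_match(dates):
    """Return (MIN, MAX) of the per-row match flag for the given dates."""
    conn.execute("DELETE FROM daily_activity")
    conn.executemany("INSERT INTO daily_activity VALUES (?)", [(d,) for d in dates])
    return conn.execute(
        "SELECT MIN(ActivityDate GLOB ?), MAX(ActivityDate GLOB ?) FROM daily_activity",
        (DATE_GLOB, DATE_GLOB),
    ).fetchone()

all_valid = min_max_match(["2016-04-12", "2016-04-13"])
one_bad = min_max_match(["2016-04-12", "not a date"])

print(all_valid)  # MIN is 1 only when every row matched
print(one_bad)    # MIN drops to 0, MAX stays 1 because one row matched
```

MIN behaves like Python's all() over the match flags and MAX behaves like any().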
I am a little bit confused and have no idea which of these two SELECT statements is correct:
SELECT Value FROM visibility WHERE site_info LIKE '%site_is_down%';
OR
SELECT Value FROM visibility WHERE site_info = 'site_is_down';
When I run both of these I get the same result, but I am interested in which one is correct, given that the Value column is of VARCHAR data type. Or are both of these SELECT statements incorrect?
Result set running first SELECT
Value
1. 0
Result set running second SELECT
Value
1. 0
The two statements do not do the same thing.
The first statement filters on rows whose site_info contains the string 'site_is_down'. The surrounding '%' are wildcards. So it would match on something like 'It looks like site_is_down right now'.
The second query, with the equality condition, filters on rows whose site_info is exactly 'site_is_down'.
Everything that the second query returns is also returned by the first query, but the opposite is not true.
Which statement is "correct" depends on your actual requirement.
If both queries are useful for you, I'd use the second query, as it is the simplest, and runs faster.
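The difference between the two statements shows up as soon as the table holds a row where the text is only part of site_info. A minimal sketch, assuming SQLite via Python's sqlite3 module and made-up sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visibility (site_info TEXT, Value TEXT)")
conn.executemany(
    "INSERT INTO visibility VALUES (?, ?)",
    [
        ("site_is_down", "0"),                          # exact match
        ("It looks like site_is_down right now", "1"),  # contains the text
        ("site_is_up", "2"),                            # unrelated row
    ],
)

like_rows = [r[0] for r in conn.execute(
    "SELECT Value FROM visibility WHERE site_info LIKE '%site_is_down%' ORDER BY Value"
)]
eq_rows = [r[0] for r in conn.execute(
    "SELECT Value FROM visibility WHERE site_info = 'site_is_down' ORDER BY Value"
)]

print(like_rows)  # rows that contain the text anywhere
print(eq_rows)    # only the row that equals the text exactly
```

With data like the asker's, where only the exact value is present, both queries return the same single row.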
I have a query that needs to exclude both Null and Blank Values, but for some reason I can't work out this simple logic in my head.
Currently, my code looks like this:
WHERE [Imported] = 0 AND ([Value] IS NOT NULL OR [Value] != '')
However, should my code look like this to exclude both conditions:
WHERE [Imported] = 0 AND ([Value] IS NOT NULL AND [Value] != '')
For some reason I just can't sort this in my head properly. To me it seems like both would work.
In your question you wrote the following:
have a query that needs to exclude both Null and Blank Values
So you have answered yourself, the AND query is the right query:
WHERE [Imported] = 0 AND ([Value] IS NOT NULL AND [Value] != '')
Here is an extract from the ANSI SQL Draft 2003 that I borrowed from this question:
6.3.3.3 Rule evaluation order
[...]
Where the precedence is not determined by the Formats or by
parentheses, effective evaluation of expressions is generally
performed from left to right. However, it is
implementation-dependent whether expressions are actually evaluated left to right, particularly when operands or operators might
cause conditions to be raised or if the results of the expressions
can be determined without completely evaluating all parts of the
expression.
You don't specify which database system you are using, but the concept of short-circuit evaluation described in the extract above is relevant to all major SQL dialects (T-SQL, PL/SQL, etc.), although, as the extract says, whether it actually happens is implementation-dependent.
Short-circuit evaluation means that once the result of a condition can be determined, the remaining expressions are not evaluated. Applied to your question:
If the value is null you want to exit the condition, which is why that check should be the first expression (from left to right); if it isn't null it must also not be empty, so the condition has to be NOT NULL and NOT EMPTY.
This case is a bit tricky. With OR, a null value is still excluded: [Value] IS NOT NULL is false and [Value] != '' evaluates to unknown, so the condition as a whole is not true.
An empty string, however, is not null, so [Value] IS NOT NULL is true and the OR lets the blank row through. In other words, the OR version excludes nulls but not blanks.
Note that comparing a null with '' does not raise an exception in SQL; the comparison simply yields unknown, which WHERE treats like false.
So I think AND is the right answer. Hope it helps.
If the value was numeric and you didn't want either 1 or 2, you would write that condition as
... WHERE value != 1 AND value != 2
An OR would always be true in this case. For instance a value of 1 would return true for the check against 2 - and then the OR-check would return true, as at least one of the conditions evaluated to true.
When you also want to check against null values, the situation is a bit more complicated. A comparison with a null value never succeeds: value != '' evaluates to unknown (not true) when value is null, so the row is filtered out. That is why there are the special IS NULL and IS NOT NULL tests.
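The two WHERE clauses from the question can be compared on a three-row table: one NULL, one blank, one real value. This is a sketch assuming SQLite through Python's sqlite3 module (the question's bracketed identifiers suggest SQL Server), with a hypothetical table name items.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (Imported INTEGER, Value TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(0, None), (0, ""), (0, "abc")],  # NULL, blank, real value
)

or_rows = [r[0] for r in conn.execute(
    "SELECT Value FROM items "
    "WHERE Imported = 0 AND (Value IS NOT NULL OR Value != '') ORDER BY Value"
)]
and_rows = [r[0] for r in conn.execute(
    "SELECT Value FROM items "
    "WHERE Imported = 0 AND (Value IS NOT NULL AND Value != '') ORDER BY Value"
)]

print(or_rows)   # the blank row is still returned
print(and_rows)  # only the real value remains
```

In this sketch the OR version drops the NULL row but keeps the blank one, while the AND version drops both.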
I understand that SQL uses three valued logic but I am having trouble understanding how to use this in practice, especially why TRUE || NULL = True and FALSE && NULL = False instead of evaluating to null.
Here are the three valued truth tables that apply to SQL Server:
I found a couple explanations of three valued logic online but I cannot find any real code examples of this in use. Can someone show me a code example using three valued logic to help me understand this a little better?
An example of TRUE || NULL = True would be
declare @x as int = null;
if 1=1 or @x/1=1
    print 'true'
An example of FALSE && NULL = False would be
declare @x as int = null;
if not(1=2 and @x/1=1)
    print 'false'
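The complete AND and OR truth tables can also be generated by asking a database to evaluate every combination. A sketch assuming SQLite via Python's sqlite3 module rather than SQL Server; SQLite represents TRUE and FALSE as 1 and 0, and a bound None becomes NULL. The evaluate helper is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

TRUTH_VALUES = (1, 0, None)  # TRUE, FALSE, NULL
LABEL = {1: "TRUE", 0: "FALSE", None: "NULL"}

def evaluate(op, a, b):
    """Let the database compute `a op b`, where op is AND or OR."""
    return conn.execute(f"SELECT ? {op} ?", (a, b)).fetchone()[0]

and_table = {(a, b): evaluate("AND", a, b) for a in TRUTH_VALUES for b in TRUTH_VALUES}
or_table = {(a, b): evaluate("OR", a, b) for a in TRUTH_VALUES for b in TRUTH_VALUES}

for (a, b), result in and_table.items():
    print(f"{LABEL[a]:5} AND {LABEL[b]:5} -> {LABEL[result]}")
for (a, b), result in or_table.items():
    print(f"{LABEL[a]:5} OR  {LABEL[b]:5} -> {LABEL[result]}")
```

The rows of interest are FALSE AND NULL, which comes out as FALSE, and TRUE OR NULL, which comes out as TRUE; the other combinations involving NULL stay NULL.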
True && NULL is neither True nor False. It's just NULL.
Whether that will evaluate as True, False, or an Error in a boolean expression depends on what happens on your system when you evaluate NULL by itself as a boolean. Sql Server will do everything it can to avoid choosing, but when forced you'll pretty much never see a positive (True) result.
Generally speaking from a user standpoint, you don't want a Boolean expression to evaluate to NULL.
Writing SQL typically involves writing queries that explicitly avoid NULL values in Boolean expressions. IMX, most developers would consider intentionally relying on three-valued logic to be an abuse of it. A properly written query should anticipate NULLs and handle them explicitly. You don't write queries in such a way that they merely happen to work right when something is NULL. Usually this involves COALESCE() or IS NULL or IS NOT NULL somewhere.
It is, however, vital that you understand the logic, because NULLs exist and are unavoidable for most real-world data.
For example, let's say I'm working on a table of students. The table has First, Middle, and Last name fields. I want to know the list of students that don't have a middle name. Now, some applications will store an empty string, '', and some applications will store a NULL value, and some applications might do both (and some RDBMSs like Oracle treat empty strings as NULLs). If you were unsure, you could write it as:
SELECT *
FROM Student
WHERE MiddleName = ''
OR MiddleName IS NULL;
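A quick way to see why both tests are needed is to run the query against a table that contains both conventions. The sketch below assumes SQLite via Python's sqlite3 module; the FirstName and LastName columns and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (FirstName TEXT, MiddleName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [
        ("Ann", None, "Adams"),  # application stored NULL
        ("Bob", "", "Brown"),    # application stored an empty string
        ("Cy", "Lee", "Clark"),  # has a middle name
    ],
)

empty_only = [r[0] for r in conn.execute(
    "SELECT FirstName FROM Student WHERE MiddleName = '' ORDER BY FirstName"
)]
empty_or_null = [r[0] for r in conn.execute(
    "SELECT FirstName FROM Student "
    "WHERE MiddleName = '' OR MiddleName IS NULL ORDER BY FirstName"
)]

print(empty_only)     # misses the student whose middle name is NULL
print(empty_or_null)  # finds both students without a middle name
```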
The other common scenario is when you're OUTER JOINing to another table. Let's say you're comparing the paychecks for teachers. You have a table for Checks, and a table for CheckDetail. You want to know how much teachers pay for Benefits. Your report needs to list all teachers, even if they're contractors who don't pay for benefits because they don't get any:
SELECT Check.Employee_Id,
SUM(CheckDetail.Amount) AS BenefitsDeductions
FROM Check
LEFT JOIN CheckDetail
ON Check.Id = CheckDetail.CheckId
AND CheckDetail.LineItemType = 'Benefits'
GROUP BY Check.Employee_Id;
You run your report, and you notice that your contractor teachers show NULL for BenefitsDeductions. Oops. You need to make sure that shows up as a zero:
SELECT Check.Employee_Id,
COALESCE(SUM(CheckDetail.Amount),0) AS BenefitsDeductions
FROM Check
LEFT JOIN CheckDetail
ON Check.Id = CheckDetail.CheckId
AND CheckDetail.LineItemType = 'Benefits'
GROUP BY Check.Employee_Id;
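The NULL-to-zero step described so far can be reproduced with a small sketch. It assumes SQLite via Python's sqlite3 module, and it renames the Check table to Checks (a hypothetical name) because CHECK is a reserved word; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Checks (Id INTEGER PRIMARY KEY, Employee_Id INTEGER);
CREATE TABLE CheckDetail (CheckId INTEGER, LineItemType TEXT, Amount REAL);
INSERT INTO Checks VALUES (1, 100), (2, 200);
-- Employee 100 pays for benefits; employee 200 (a contractor) does not.
INSERT INTO CheckDetail VALUES (1, 'Benefits', 50.0), (1, 'Tax', 20.0), (2, 'Tax', 10.0);
""")

QUERY = """
SELECT c.Employee_Id, {expr} AS BenefitsDeductions
FROM Checks c
LEFT JOIN CheckDetail d
  ON c.Id = d.CheckId
 AND d.LineItemType = 'Benefits'
GROUP BY c.Employee_Id
ORDER BY c.Employee_Id
"""

plain_sum = conn.execute(QUERY.format(expr="SUM(d.Amount)")).fetchall()
with_coalesce = conn.execute(QUERY.format(expr="COALESCE(SUM(d.Amount), 0)")).fetchall()

print(plain_sum)      # the contractor's total is NULL (None in Python)
print(with_coalesce)  # the contractor's total is reported as 0
```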
So you try that, and it works. No NULL values! But... a few days later, your users report that teachers who used to be contractors are showing up with 0s even though they're paying for benefits now. You've got to COALESCE before the SUM to keep those amounts:
SELECT Check.Employee_Id,
SUM(COALESCE(CheckDetail.Amount,0)) AS BenefitsDeductions
FROM Check
LEFT JOIN CheckDetail
ON Check.Id = CheckDetail.CheckId
AND CheckDetail.LineItemType = 'Benefits'
GROUP BY Check.Employee_Id;
Finding these kinds of corner cases and exceptions is what writing SQL is all about.
The code example by user4955163 is a great visualization of this, however I just wanted to step back in to address the first segment of the question:
...especially why TRUE || NULL = True and FALSE && NULL = False instead
of evaluating to null...
TRUE || NULL = True
This is because the or operator can short-circuit if one operand is already known to be true. No matter what the second operand is (even if unknown, i.e. "NULL"), it can't make the expression false, since we already know the other operand is true. or only needs one operand to be true to evaluate to true.
FALSE && NULL = False
This is because the and operator can short-circuit if one operand is already known to be false. No matter what the second operand is (even if unknown, i.e. "NULL"), it can't make the expression true, since we already know the other operand is false. and needs both operands to be true to evaluate to true.
To use a nullable variable you just need to check NULL conditions (using IS NULL) before checking the value.
e.g. IF @a IS NOT NULL AND @a = 1
I was reading this article:
Get null == null in SQL
And the consensus is that when trying to test equality between two (nullable) sql columns, the right approach is:
where ((A=B) OR (A IS NULL AND B IS NULL))
When A and B are NULL, (A=B) still returns FALSE, since NULL is not equal to NULL. That is why the extra check is required.
What about when testing inequalities? Following from the above discussion, it made me think that to test inequality I would need to do something like:
WHERE ((A <> B) OR (A IS NOT NULL AND B IS NULL) OR (A IS NULL AND B IS NOT NULL))
However, I noticed that that is not necessary (at least not on informix 11.5), and I can just do:
where (A<>B)
If A and B are NULL, this returns FALSE. If NULL is not equal to NULL, then shouldn't this return TRUE?
EDIT
These are all good answers, but I think my question was a little vague. Allow me to rephrase:
Given that either A or B can be NULL, is it enough to check their inequality with
where (A<>B)
Or do I need to explicitly check it like this:
WHERE ((A <> B) OR (A IS NOT NULL AND B IS NULL) OR (A IS NULL AND B IS NOT NULL))
REFER to this thread for the answer to this question.
Because that behavior follows established ternary logic where NULL is considered an unknown value.
If you think of NULL as unknown, it becomes much more intuitive:
Is unknown a equal to unknown b? There's no way to know, so: unknown.
relational expressions involving NULL actually yield NULL again
edit
here, <> stands for arbitrary binary operator, NULL is the SQL placeholder, and value is any value (NULL is not a value):
NULL <> value -> NULL
NULL <> NULL -> NULL
the logic is: NULL means "no value" or "unknown value", and thus any comparison with any actual value makes no sense.
is X = 42 true, false, or unknown, given that you don't know what value (if any) X holds? SQL says it's unknown. is X = Y true, false, or unknown, given that both are unknown? SQL says the result is unknown. and it says so for any binary relational operation, which is only logical (even if having NULLs in the model is not in the first place).
SQL also provides two unary postfix operators, IS NULL and IS NOT NULL, these return TRUE or FALSE according to their operand.
NULL IS NULL -> TRUE
NULL IS NOT NULL -> FALSE
All comparisons involving null are undefined, and evaluate to false. This idea, which is what prevents null being evaluated as equivalent to null, also prevents null being evaluated as NOT equivalent to null.
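The two forms of the inequality test from the question can be compared directly. This sketch assumes SQLite via Python's sqlite3 module rather than Informix, with a hypothetical table named pairs that covers every NULL combination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (A INTEGER, B INTEGER)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?)",
    [(1, 2), (1, 1), (1, None), (None, 1), (None, None)],
)

plain = conn.execute(
    "SELECT A, B FROM pairs WHERE (A <> B) ORDER BY rowid"
).fetchall()
explicit = conn.execute(
    "SELECT A, B FROM pairs "
    "WHERE ((A <> B) OR (A IS NOT NULL AND B IS NULL) OR (A IS NULL AND B IS NOT NULL)) "
    "ORDER BY rowid"
).fetchall()

print(plain)     # only pairs where both sides are known and different
print(explicit)  # also pairs where exactly one side is NULL
```

The plain A <> B silently drops every row in which either side is NULL, so the explicit form is the one to use if a NULL on one side and a value on the other should count as "different".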
The short answer is... NULLs are weird, they don't really behave like you'd expect.
Here's a great paper on how NULLs work in SQL. I think it will help improve your understanding of the topic. I think the sections on handling null values in expressions will be especially useful for you.
http://www.oracle.com/technology/oramag/oracle/05-jul/o45sql.html
The default (ANSI) behaviour of nulls within an expression will result in a null (there are enough other answers with the cases of that).
There are however some edge cases and caveats that I would place when dealing with MS Sql Server that are not being listed.
Nulls within a statement that is grouping values together will be considered equal and be grouped together.
Null values within a statement that is ordering them will be considered equal.
Null values selected within a statement that is using distinct will be considered equal when evaluating the distinct aspect of the query
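The grouping and DISTINCT behaviour listed above is not specific to SQL Server. A small sketch, assuming SQLite via Python's sqlite3 module and a hypothetical one-column table t, shows NULLs collapsing into a single group even though NULL = NULL itself is not true:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(None,), (None,), (1,), (1,), (2,)])

distinct_values = [r[0] for r in conn.execute("SELECT DISTINCT x FROM t")]
group_counts = dict(conn.execute("SELECT x, COUNT(*) FROM t GROUP BY x").fetchall())
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

print(distinct_values)   # NULL appears once
print(group_counts)      # both NULL rows land in the same group
print(null_equals_null)  # the comparison itself is still NULL (None)
```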
It is possible in SQL Server to override the expression logic regarding the specific Null = Null test, using the SET ANSI_NULLS OFF, which will then give you equality between null values - this is not a recommended move, but does exist.
SET ANSI_NULLS OFF
select result =
case
when null=null then 'eq'
else 'ne'
end
SET ANSI_NULLS ON
select result =
case
when null=null then 'eq'
else 'ne'
end
Here is a Quick Fix
ISNULL(A,0)=ISNULL(B,0)
0 can be changed to something that can never happen in your data
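The warning about the replacement value matters in practice. In the sketch below, which assumes SQLite via Python's sqlite3 module, COALESCE stands in for T-SQL's ISNULL and the table pairs is a hypothetical name; a sentinel of 0 wrongly matches a real 0 against NULL, while a sentinel that never occurs in the data (here -1, an assumption about this sample data) does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (A INTEGER, B INTEGER)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?)",
    [(None, None), (1, 1), (0, None), (1, 2)],
)

def equal_rows(sentinel):
    """Rows considered equal when NULL is replaced by the given sentinel."""
    return conn.execute(
        "SELECT A, B FROM pairs WHERE COALESCE(A, ?) = COALESCE(B, ?) ORDER BY rowid",
        (sentinel, sentinel),
    ).fetchall()

with_zero = equal_rows(0)
with_minus_one = equal_rows(-1)

print(with_zero)       # (0, NULL) is wrongly reported as equal
print(with_minus_one)  # only the genuinely matching rows
```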
"Is unknown a equal to unknown b? There's no way to know, so: unknown."
The question was : why does the comparison yield FALSE ?
Given three-valued logic, it would indeed be sensible for the comparison to yield UNKNOWN (not FALSE). But SQL does yield FALSE, and not UNKNOWN.
One of the myriads of perversities in the SQL language.
Furthermore, the following must be taken into account :
If "unknown" is a logical value in ternary logic, then an equality comparison between two logical values that both happen to be (the value for) "unknown" ought to yield TRUE.
If the logical value is itself unknown, then obviously that cannot be represented by putting the value "unknown" there, because that would imply that the logical value is known (to be "unknown"). That is, a.o., how relational theory proves that implementing 3-valued logic raises the requirement for a 4-valued logic, that a 4 valued logic leads to the need for a 5-valued logic, etc. etc. ad infinitum.