Excuse my ignorance about this... I'm taking a data analysis course and I stumbled upon this query in an exercise:
SELECT
CASE
WHEN MIN(REGEXP_CONTAINS(STRING(ActivityDate), DATE_REGEX)) = TRUE THEN
"Valid"
ELSE
"Not Valid"
END
AS valid_test
FROM
`tracker_data_clean.daily-activity-clean`;
ActivityDate is a field that contains date type data and DATE_REGEX is a regular expression variable for a date format string.
What I don't understand is what taking the MIN() of the boolean expression REGEXP_CONTAINS() does or means.
I would appreciate it if any of you could help me understand the concept behind this.
Thanks !
The query selects rows from the table and applies the REGEXP_CONTAINS() function to every (string-converted) value in the ActivityDate column. REGEXP_CONTAINS() will either return true or false based on whether the value matches the regex pattern in DATE_REGEX.
How MIN() behaves here can vary by implementation:
Booleans might be coerced as integers, so MIN() is evaluating 0's and 1's. If all the values are 1 (true), MIN() will be 1 (true), otherwise it will be 0 (false).
Other implementations might evaluate booleans directly, so MIN() returns true if all of the values are true, because the minimum value is true (true being "greater" than false), otherwise it returns false.
The result, based on the implementation, is that MIN() returns 0/1, or false/true. Either way, that result is compared to true in the CASE statement. If all values matched the regex, the comparison will be true.
Basically, the query is "does every row have a valid date in the ActivityDate column?" The result will be a table with a single column valid_test and one row, containing "Valid" if they all match, "Not Valid" otherwise.
Another way to look at it that would be relatable to some programming languages is that MIN(bool_function()) is analogous to all(), meaning return true if all values are true. Similarly, MAX(bool_function()) would be analogous to any(), meaning return true if any value is true.
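To make the all()/any() analogy concrete, here is a minimal sketch that uses Python's built-in sqlite3 module rather than BigQuery. SQLite has no REGEXP_CONTAINS, so a GLOB pattern stands in for DATE_REGEX; the table name daily_activity, the pattern, and the sample rows are all made up for illustration. SQLite returns comparisons as 0/1, so it behaves like the integer-coercion case described above.

```python
import sqlite3

# Hypothetical stand-in for DATE_REGEX: a GLOB pattern for YYYY-MM-DD.
DATE_GLOB = "[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]"

def validity(dates):
    """Return (valid_test, every_row_matches, some_row_matches)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_activity (ActivityDate TEXT)")
    conn.executemany(
        "INSERT INTO daily_activity VALUES (?)", [(d,) for d in dates]
    )
    row = conn.execute(
        """
        SELECT CASE WHEN MIN(ActivityDate GLOB :pat) = 1
                    THEN 'Valid' ELSE 'Not Valid' END AS valid_test,
               MIN(ActivityDate GLOB :pat) AS every_row_matches,  -- like all()
               MAX(ActivityDate GLOB :pat) AS some_row_matches    -- like any()
        FROM daily_activity
        """,
        {"pat": DATE_GLOB},
    ).fetchone()
    conn.close()
    return row

good = validity(["2016-04-12", "2016-04-13"])
mixed = validity(["2016-04-12", "4/13/2016"])
print(good)
print(mixed)
```

A single badly formatted row is enough to pull MIN() down to 0, while MAX() stays at 1 as long as at least one row matches.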
I have an SQL query in PostgreSQL which checks that particular fields, passed in the form of an array, are in bigint range. I would like to add the possibility not to filter out null values. With the existing queries, null values for any of the fields are filtered out:
select *
from table_test
where '[0,2147483647]'::int8range @> ALL(ARRAY[fields])
And I would like to do something like this, only here I check against the whole array while I would want to check against each field:
select count(*) from dbm.inventory_source where '[0,2147483647]'::int8range @> ALL(ARRAY[id, exchange_id, min_cpm_micros])
or (array[id, exchange_id, min_cpm_micros]) is null
Also, I do not want to check each field for null individually; instead, I would like to check for nulls across the whole array of fields.
I pass the names of the fields like one string into query (called fields) and it is the reason I do not want to check each field separately. Such implementation was created to have more generic queries for multiple tables.
How can I fix this query?
I would like to add the possibility not to filter out null values.
Based on this, I would expect logic like:
where '[0,2147483647]'::int8range @> ALL(ARRAY[field_1, field_2, field_3]) or
(field_1 is null and field_2 is null and field_3 is null)
I am unclear if you want to allow all values to be NULL or any of them. The above is for all of them. If you want any, change the ands to ors.
If I understand correctly, presumably you're looking for something like this:
SELECT *
FROM table_test
WHERE '[0,2147483647]'::int8range @> ALL(ARRAY[fields]) IS NOT FALSE
(yeah, sorry, all I did was add three words and capitalize your keywords)
What's this doing? Let's start from the top.
Let's look at what we want from all the conditional stuff. Specifically, it seems we want the condition to return TRUE for every array wherein each value of the array satisfies one of these two conditions:
The value falls within the range [0,2147483647]
The value is NULL
It's useful here to keep in mind the exact meaning of NULL in SQL: it's a value that we don't know. This is why NULL propagates in most operations, and thinking of it that way makes it easier to predict how the database will treat it. In fact, let's replace it with ? for some examples. Why doesn't ? = ? return TRUE? It's because we don't know what either of those values are, so we don't know if they're equal, so the expression evaluates to some unknown value, NULL. What about something like ? + 1? Well, we don't know what the sum is, so it's also NULL. Similarly, ? AND TRUE depends entirely on what the first value is, and we don't know what it is, so we write NULL.
This is where it gets fun: ? AND FALSE will always be false, no matter what our unknown value is, so it evaluates to FALSE instead of NULL. Similarly, ? OR TRUE must evaluate to TRUE.
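These rules are easy to check for yourself. The snippet below is a small sketch using Python's built-in sqlite3 module, not PostgreSQL; it writes 1 and 0 for TRUE and FALSE so that it also runs on older SQLite builds, and Python's None stands for SQL NULL. The evaluate helper is only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expr):
    """Evaluate a constant SQL expression; None in the result means NULL."""
    return conn.execute("SELECT " + expr).fetchone()[0]

results = {
    "NULL = NULL": evaluate("NULL = NULL"),  # unknown = unknown
    "NULL + 1": evaluate("NULL + 1"),        # unknown sum
    "NULL AND 1": evaluate("NULL AND 1"),    # depends on the unknown
    "NULL AND 0": evaluate("NULL AND 0"),    # false whatever the unknown is
    "NULL OR 1": evaluate("NULL OR 1"),      # true whatever the unknown is
}
for expr, value in results.items():
    print(f"{expr:12} -> {value}")
```

The first three come back as None (NULL), while the last two come back as 0 and 1 because the known operand already decides the answer.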
Now, revisiting our two conditions, we see that your code already checked for condition 1. What about condition 2? Well, think about how ALL works, and what it's really telling you. It's basically evaluating your condition for each entry in your array, then combining all of those with AND to tell you whether or not it's true for all of the entries. This means that your test, specifically the expression
'[0,2147483647]'::int8range @> ALL(ARRAY[fields])
returns TRUE, FALSE, or NULL for each of the entries in the array, then combines those results using AND. Since we know that
TRUE AND TRUE returns TRUE
NULL AND TRUE returns NULL
x AND FALSE returns FALSE for any x
we can safely say that your code will return FALSE if and only if the array contains a value outside of your given range; otherwise, it will return TRUE or NULL. On the other hand, we want to get TRUE regardless of whether your code says TRUE or NULL; in other words, whenever it evaluates to anything other than FALSE. Luckily, there's a predicate for that:
IS NOT FALSE
Well! That was pretty complicated to think about, but after finding the solution, it seems almost offensively simple! Way to make me feel stupid.
Check it out here.
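If it helps to see the reasoning spelled out, here is a pure-Python model of the argument above. It does not talk to PostgreSQL at all: None plays the role of NULL, and the function names (in_range, sql_and, sql_all, keep_row) are invented for this sketch.

```python
from typing import Iterable, Optional

def in_range(value: Optional[int], lo: int = 0, hi: int = 2147483647) -> Optional[bool]:
    """Model of the range-containment test: unknown (None) for a NULL value."""
    if value is None:
        return None
    return lo <= value <= hi

def sql_and(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Three-valued AND: FALSE wins, then NULL, otherwise TRUE."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def sql_all(results: Iterable[Optional[bool]]) -> Optional[bool]:
    """Model of ALL: fold the per-element results together with AND."""
    combined: Optional[bool] = True
    for result in results:
        combined = sql_and(combined, result)
    return combined

def keep_row(fields: Iterable[Optional[int]]) -> bool:
    """Model of `... IS NOT FALSE`: keep the row on TRUE or NULL."""
    return sql_all(in_range(v) for v in fields) is not False

print(keep_row([1, 2, 3]))      # all in range
print(keep_row([1, None, 3]))   # a NULL among in-range values
print(keep_row([1, None, -5]))  # one value out of range
```

Only the last row is dropped, because an out-of-range value is the only thing that can make the combined result FALSE.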
In SQL, every operation which involves an operand with NULL yields NULL (with the obvious exceptions of IS NULL or IS NOT NULL operators). However, NULL does not propagate with AND or OR operators which may return TRUE or FALSE. For example, the following in MariaDB 10.4 returns NULL and 0 respectively:
select 0 & null, 0 and null
The difference is that the first is a bitwise AND and the second is a boolean AND. Why does NULL not propagate in boolean operations?
A NULL value has a whole series of possible meanings. IIRC Chris Date found about 7 different interpretations.
A very common interpretation of NULL is: "I don't know". Another one is: "Not applicable".
So let's try to evaluate a condition with the "I don't know" interpretation of a NULL value.
As an example suppose there are two persons. And you want to compare their age. Person A happens to be 31 years old. In case of the other person, person B, you don't know.
The question if A is as old as B cannot be answered positively. But it can't be denied either. In fact, you don't know. Hence the truth value here is NULL.
If we add the ages of both persons, we run into the same problem. We don't have a clue about the sum of their ages. Again the resulting value is NULL.
This is why you'll have to define how to treat NULL values. A database system cannot know this.
We have 0 and null. In MariaDB, 0 and FALSE are synonymous. So we have FALSE AND NULL. But FALSE AND <anything> is always FALSE - there's no doubt that no matter what value might be substituted here, nothing can make this statement TRUE now.
So we short-circuit and return the FALSE/0 result. Similarly, 1 OR NULL should return 1.
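The same contrast can be reproduced outside MariaDB. The sketch below uses Python's built-in sqlite3 module, which is an assumption of this example rather than the database from the question; None in the output stands for NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Bitwise AND is ordinary arithmetic, so NULL propagates.
# Boolean AND/OR can be decided by the known operand alone.
bitwise_and, boolean_and, boolean_or = conn.execute(
    "SELECT 0 & NULL, 0 AND NULL, 1 OR NULL"
).fetchone()

print(bitwise_and, boolean_and, boolean_or)
```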
NULL has the semantics of "unknown" value. It does not have the semantics of "missing". This is a nuance.
But, it is "propagated" by AND and OR, just not as you might expect. So:
true AND NULL --> NULL, because the value would depend on what NULL is
false AND NULL --> false, because the first value requires that the result is false
In WHERE and WHEN clauses, NULL is treated as "false". However, in CHECK constraints, NULL is treated as "true" -- that is, only explicitly false values fail the CHECK constraint.
Otherwise, you are correct that almost all operations with NULL return NULL. The & operator is a bitwise operator that has nothing to do with boolean values. It is just another "mathematical" operator, such as +, or *, so the value is NULL when any operand is NULL.
One very important exception is the NULL-safe comparison operator, <=>.
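To illustrate how the same NULL result is treated as "false" by WHERE but lets a row through a CHECK constraint, here is a minimal sketch with Python's built-in sqlite3 module. SQLite is used only because it is easy to run; the table t and its values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER CHECK (x > 0))")

conn.execute("INSERT INTO t VALUES (5)")     # CHECK is true: accepted
conn.execute("INSERT INTO t VALUES (NULL)")  # CHECK is NULL: accepted

try:
    conn.execute("INSERT INTO t VALUES (-1)")  # CHECK is false: rejected
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

total = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
# In WHERE, the NULL row does not satisfy x > 0, so it is filtered out.
matching = conn.execute("SELECT COUNT(*) FROM t WHERE x > 0").fetchone()[0]

print(rejected, total, matching)
```

The NULL row is stored even though the constraint says x > 0, yet the query with the very same condition in WHERE does not return it.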
I have a rough understanding of why = null in SQL and is null are not the same, from questions like this one.
But then, why is
update table
set column = null
a valid SQL statement (at least in Oracle)?
From that answer, I know that null can be seen as somewhat "UNKNOWN", and therefore an SQL statement with where column = null "should" return all rows, because the value of column is no longer an unknown value. I set it to null explicitly ;)
Where am I wrong, or what am I not understanding?
So, if my question is maybe unclear:
Why is = null valid in the set clause, but not in the where clause of an SQL statement?
SQL doesn't have different graphical signs for the assignment and equality operators, as languages such as C or Java do. In such languages, = is the assignment operator, while == is the equality operator. In SQL, = is used for both cases, and interpreted contextually.
In the where clause, = acts as the equality operator (similar to == in C). I.e., it checks if both operands are equal, and returns true if they are. As you mentioned, null is not a value - it's the lack of a value. Therefore, it cannot be equal to any other value.
In the set clause, = acts as the assignment operator (similar to = in C). I.e., it sets the left operand (a column name) with the value of the right operand. This is a perfectly legal statement - you are declaring that you do not know the value of a certain column.
They are completely different operators, even if you write them the same way.
In a where clause, = is a comparison operator.
In a set clause, = is an assignment operator.
The assignment operator allows you to "clear" the data in the column and set it to the "null value".
In the set clause, you're assigning the value to an unknown, as defined by NULL. In the where clause, you're querying for an unknown. When you don't know what an unknown is, you can't expect any results for it.
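Here is a small sketch of both uses of = side by side, run through Python's built-in sqlite3 module instead of Oracle; the table demo and its column col are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, col TEXT)")
conn.executemany("INSERT INTO demo (col) VALUES (?)", [("a",), ("b",)])

# Assignment: every row's col is set to NULL.
conn.execute("UPDATE demo SET col = NULL")

# Comparison: col = NULL evaluates to NULL for every row, so nothing matches.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM demo WHERE col = NULL"
).fetchone()[0]

# IS NULL is the predicate that actually tests for NULL.
is_null = conn.execute(
    "SELECT COUNT(*) FROM demo WHERE col IS NULL"
).fetchone()[0]

print(equals_null, is_null)
```

Even though both rows were explicitly set to NULL, the = comparison finds none of them, while IS NULL finds both.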
What I'm trying to do is the following: I have 3 columns (Control_OpenDate, Control_RecordAge, Control_Stage2). Once the row is inserted, it will populate Control_OpenDate (01/27/2013) and Control_RecordAge (a computed column, see the formula below).
(datediff(day, [Control_OpenDate], getdate()))
which gives the days up to date.
Everything is working perfectly, but I want to add something like an IF condition to the computed column: calculate the value only when the column Control_Stage2 is populated; if it is not, do not calculate, or show some text instead...
How can I add a WHERE-statement to the above formula?
Note: I'm entering the formula directly into the column properties. I know there are queries that can do this, but is there a way to do it through a formula?
This can be done using a CASE-statement, as shown here.
Your logic will then look like:
(CASE
WHEN [Control_Stage2] IS NULL THEN NULL -- or -1 or what you like
ELSE datediff(day,[Control_OpenDate],getdate())
END)
This can also be written as a ternary statement (IIF)
IIF ( boolean_expression, true_value, false_value )
IIF is a shorthand way for writing a CASE expression. It evaluates the Boolean expression passed as the first argument, and then returns either of the other two arguments based on the result of the evaluation. That is, the true_value is returned if the Boolean expression is true, and the false_value is returned if the Boolean expression is false or unknown.
E.G.
iif([Control_Stage2] is null, null, datediff(day,[Control_OpenDate],getdate()))
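For readers without SQL Server at hand, the CASE logic can be tried out with Python's built-in sqlite3 module. This is only an approximation under stated assumptions: it is an ordinary query rather than a computed column, julianday() arithmetic stands in for datediff(day, ...), a fixed AS_OF date replaces getdate() so the result is reproducible, and the table name controls and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE controls (Control_OpenDate TEXT, Control_Stage2 TEXT)")
conn.executemany(
    "INSERT INTO controls VALUES (?, ?)",
    [("2013-01-27", "done"), ("2013-01-27", None)],
)

AS_OF = "2013-02-06"  # hypothetical fixed "today" instead of getdate()

rows = conn.execute(
    """
    SELECT Control_Stage2,
           CASE
               WHEN Control_Stage2 IS NULL THEN NULL  -- stage not populated
               ELSE CAST(julianday(:as_of) - julianday(Control_OpenDate) AS INTEGER)
           END AS Control_RecordAge
    FROM controls
    ORDER BY rowid
    """,
    {"as_of": AS_OF},
).fetchall()

for row in rows:
    print(row)
```

The row with Control_Stage2 populated gets an age in days, and the row where it is NULL gets NULL.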
What does this statement mean? Could somebody please explain it to me?
(@diplomaPercentage is null OR diploma_percentage>=@diplomaPercentage)
All rows will be returned if @diplomaPercentage is not specified (i.e. null is passed); otherwise, only rows where diploma_percentage is greater than or equal to @diplomaPercentage will be returned. :)
This kind of condition is usually used to avoid "dynamic SQL", but it makes the code ugly and ironically may result in worse performance compared to just using dynamic SQL. You can read more about this at:
http://www.sommarskog.se/dyn-search-2005.html
@diplomaPercentage is a variable.
@diplomaPercentage is null checks whether the variable is NULL, and
diploma_percentage>=@diplomaPercentage checks whether the column value diploma_percentage is greater than or equal to the variable's value.
If you give it a null value for @diplomaPercentage then it will return all records; otherwise it will return only the records with a diploma_percentage value greater than or equal to the one you supply.
If no value for @diplomaPercentage is passed, it will return all rows; otherwise it will return all the rows where diploma_percentage is greater than or equal to the value that has been passed in.
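A quick way to watch this optional-filter pattern in action is the sketch below, which uses Python's built-in sqlite3 module. The named parameter :p plays the role of the variable from the question; the students table, its rows, and the search function are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, diploma_percentage REAL)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Ann", 72.5), ("Bob", 55.0), ("Cy", 60.0)],
)

def search(min_pct=None):
    """Return names; passing None (NULL) disables the percentage filter."""
    cur = conn.execute(
        """
        SELECT name FROM students
        WHERE (:p IS NULL OR diploma_percentage >= :p)
        ORDER BY name
        """,
        {"p": min_pct},
    )
    return [row[0] for row in cur]

print(search())    # no filter value supplied
print(search(60))  # only rows at or above 60
```

With no value the first half of the OR is true for every row, so the comparison never matters; with a value the first half is false and the comparison does the filtering.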