This question already has an answer here:
How to use ANY instead of IN in a WHERE clause?
(1 answer)
Closed 5 years ago.
Recently I've read Quantified Comparison Predicates – Some of SQL’s Rarest Species:
In fact, the SQL standard defines the IN predicate as being just syntax sugar for the = ANY() quantified comparison predicate.
8.4 <in predicate>
Let RVC be the <row value predicand> and
let IPV be the <in predicate value>.
The expression RVC IN IPV
is equivalent to RVC = ANY IPV
Fair enough, based on other answers like: What is exactly “SOME / ANY” and “IN” or Oracle: '= ANY()' vs. 'IN ()'
I assumed that I could use them interchangeably.
Now here is my example:
select 'match'
where 1 = any( string_to_array('1,2,3', ',')::int[])
-- match
select 'match'
where 1 IN ( string_to_array('1,2,3', ',')::int[])
-- ERROR: operator does not exist: integer = integer[]
-- HINT: No operator matches the given name and argument type(s).
-- You might need to add explicit type casts.
DB Fiddle
Why does the first query work while the second returns an error?
That's because IN (unlike ANY) does not accept an array as input. Only a set (from a subquery) or a list of values. Detailed explanation:
How to use ANY instead of IN in a WHERE clause with Rails?
I tried a query that tests NULL NOT IN Empty_Relation on PostgreSQL and Spark, and I got different results.
select count(*) from
(select 1)
where null not in
(an empty relation)
PostgreSQL outputs 1.
Spark outputs 0.
I understand the NULL behaviour of NOT IN, but here the subquery is an empty relation, which makes the situation more interesting.
There are a lot of posts discussing NOT IN, but I can't find anything about NOT IN Empty_Relation.
So my question is: does ANSI SQL define this behavior, or is it a grey area where both answers are acceptable?
tl;dr: PostgreSQL is correct.
This is what the SQL specification says about this behavior:
4) The expression RVC NOT IN IPV is equivalent to NOT ( RVC IN IPV )
5) The expression RVC IN IPV is equivalent to RVC = ANY IPV
So, NULL NOT IN (<empty relation>) is equivalent to NOT (NULL = ANY (<empty relation>))
Then, it goes on to say:
The result of R <comp op> <quantifier> T is derived by the application of the implied <comparison predicate> R <comp op> RT to every row RT in T.
[...]
d) If T is empty or if the implied <comparison predicate> is False for every row RT in T, then R <comp op> <some> T is False.
(Note: <some> is either ANY or SOME -- they both mean the same).
By this rule, since T is empty, NULL = ANY (<empty relation>) is False, so NOT (NULL = ANY (<empty relation>)) is True.
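The quoted rules can be modeled outside the database. The following is a minimal Python sketch, not PostgreSQL's implementation: None stands in for NULL, and the helper names eq3, eq_any, not3 and not_in are made up for this illustration.

```python
from typing import Iterable, Optional


def eq3(a, b) -> Optional[bool]:
    """Three-valued '=': the result is None (unknown) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def eq_any(r, t: Iterable) -> Optional[bool]:
    """R = ANY T, applying the comparison to every row RT in T."""
    results = [eq3(r, rt) for rt in t]
    if any(res is True for res in results):
        return True
    # Rule d): T is empty, or the comparison is False for every row.
    if all(res is False for res in results):
        return False
    return None  # otherwise the result is unknown


def not3(x: Optional[bool]) -> Optional[bool]:
    """Three-valued NOT: unknown stays unknown."""
    return None if x is None else not x


def not_in(r, t: Iterable) -> Optional[bool]:
    """R NOT IN T, rewritten as NOT (R = ANY T)."""
    return not3(eq_any(r, t))


print(not_in(None, []))      # True
print(not_in(None, [1, 2]))  # None
print(not_in(3, [1, None]))  # None
print(not_in(3, [1, 2]))     # True
```

With an empty T there is no row to compare against, so no comparison ever yields unknown. That is why the NULL on the left-hand side does not matter in this one case.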
I'm pretty sure Postgres is correct.
Although almost every comparison with NULL returns NULL, you have found an exception. If the set is empty, then nothing is in the set. That is, any value is NOT in the set regardless of the value.
Remember, the semantics of NULL mean "unknown" value -- not missing value. "Unknown" means that it can take on any value. The expression <anything> not in (<empty set>) is true, regardless of the value of <anything>.
Incidentally, Postgres is not alone in this behavior. A cursory look shows that SQL Server and Oracle also return 1 for equivalent queries.
This question already has answers here:
String concatenation with a null seems to nullify the entire string - is that desired behavior in Postgres?
(4 answers)
Closed 1 year ago.
I'm trying to understand the differences between concat() and || in postgres when using nulls. Why does concat() return an empty string if both sides of the concat are null?
Take this query, for example:
SELECT concat(NULL,NULL) AS null_concat, NULL||NULL AS null_pipes,
concat(NULL,NULL) IS NULL is_concat_null, NULL||NULL IS NULL is_pipe_null
will return:
I understand that concat() ignores nulls, but if all the values in the concat are null, wouldn't the expected result be a null? Is this typical behavior of all functions in postgres? I couldn't find anything in the documentation about this scenario.
Edit:
I had a thought that maybe this was the expected result of any string function but that does not appear to be the case. Both upper() and left() return nulls if a null value is passed:
SELECT concat(NULL), NULL||NULL, UPPER(null), left(NULL,1)
Result:
In the concat() function:
text concat(str "any",...) Concatenate all arguments. NULL arguments are ignored.
Note: NULL arguments are ignored.
Think of it this way:
concat() takes a variable number of arguments and simply skips the NULL ones.
So writing concat('a',null,null,null,null) is the same as writing concat('a'). When every argument is NULL, nothing is left to concatenate and the result is the empty string.
(Unlike the || operator, where a single NULL wipes out the whole result.)
In the || operator:
the string concatenation operator (||) still accepts non-string input,
so long as at least one input is of a string type
In NULL||NULL neither operand has a declared string type, so you might expect an error.
Why is there none?
Because Postgres resolves the untyped NULLs as text, and || does not skip NULL the way concat() does: a single NULL operand makes the whole result NULL.
SELECT NULL ||'aaa'||'bbb'||'ccc'||'ddd'
output:
NULL
more info:
Note: Before PostgreSQL 8.3, these functions would silently accept
values of several non-string data types as well, due to the presence
of implicit coercions from those data types to text. Those coercions
have been removed because they frequently caused surprising behaviors.
However, the string concatenation operator (||) still accepts
non-string input, so long as at least one input is of a string type,
as shown in Table 9-6. For other cases, insert an explicit coercion to
text if you need to duplicate the previous behavior.
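The difference between the two behaviors can be sketched in a few lines of Python. This only models the NULL handling described above, with None standing in for NULL; pg_concat and pg_pipes are hypothetical names, not Postgres functions.

```python
from typing import Optional


def pg_concat(*args) -> str:
    """Model of concat(): NULL (None) arguments are skipped."""
    return "".join(str(a) for a in args if a is not None)


def pg_pipes(left, right) -> Optional[str]:
    """Model of ||: a NULL operand makes the whole result NULL."""
    if left is None or right is None:
        return None
    return str(left) + str(right)


print(repr(pg_concat(None, None)))        # ''
print(repr(pg_concat("a", None, None)))   # 'a'
print(repr(pg_pipes(None, None)))         # None

# NULL || 'aaa' || 'bbb' is evaluated left to right, so the NULL propagates.
print(repr(pg_pipes(pg_pipes(None, "aaa"), "bbb")))  # None
```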
I used to have a query like in Rails:
MyModel.where(id: ids)
Which generates sql query like:
SELECT "my_models".* FROM "my_models"
WHERE "my_models"."id" IN (1, 28, 7, 8, 12)
Now I want to change this to use ANY instead of IN. I created this:
MyModel.where("id = ANY(VALUES(#{ids.join '),('}))")
Now when I use an empty array, ids = [], I get the following error:
MyModel Load (53.0ms) SELECT "my_models".* FROM "my_models" WHERE (id = ANY(VALUES()))
ActiveRecord::JDBCError: org.postgresql.util.PSQLException: ERROR: syntax error at or near ")"
ActiveRecord::StatementInvalid: ActiveRecord::JDBCError: org.postgresql.util.PSQLException: ERROR: syntax error at or near ")"
Position: 75: SELECT "social_messages".* FROM "social_messages" WHERE (id = ANY(VALUES()))
from arjdbc/jdbc/RubyJdbcConnection.java:838:in `execute_query'
There are two variants of IN expressions:
expression IN (subquery)
expression IN (value [, ...])
Similarly, two variants with the ANY construct:
expression operator ANY (subquery)
expression operator ANY (array expression)
A subquery works for either technique, but for the second form of each, IN expects a list of values (as defined in standard SQL) while = ANY expects an array.
Which to use?
ANY is a later, more versatile addition; it can be combined with any binary operator returning a boolean value. IN boils down to a special case of ANY. In fact, its second form is rewritten internally:
IN is rewritten with = ANY
NOT IN is rewritten with <> ALL
Check the EXPLAIN output for any query to see for yourself. This proves two things:
IN can never be faster than = ANY.
= ANY is not going to be substantially faster.
The choice should be decided by what's easier to provide: a list of values or an array (possibly as array literal - a single value).
If the IDs you are going to pass come from within the DB anyway, it is much more efficient to select them directly (subquery) or integrate the source table into the query with a JOIN (like @mu commented).
To pass a long list of values from your client and get the best performance, use an array, unnest() and join, or provide it as table expression using VALUES (like @PinnyM commented). But note that a JOIN preserves possible duplicates in the provided array / set while IN or = ANY do not. More:
Optimizing a Postgres query with a large IN
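The point about duplicates is easy to miss, so here is a small Python sketch of it. The rows and ids values are hypothetical sample data, and the list comprehensions only imitate what the two query shapes return; they are not how Postgres executes them.

```python
rows = [(1, "a"), (2, "b"), (3, "c")]  # hypothetical table: (id, name)
ids = [1, 1, 3]                        # input list containing a duplicate

# Like "WHERE id = ANY(ids)": each table row is tested once.
via_any = [row for row in rows if row[0] in ids]

# Like joining the table to unnest(ids): one output row per matching pair.
via_join = [row for i in ids for row in rows if row[0] == i]

print(via_any)   # [(1, 'a'), (3, 'c')]
print(via_join)  # [(1, 'a'), (1, 'a'), (3, 'c')]
```

If the join form is used only as a filter, deduplicating the input first keeps the two results the same.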
In the presence of NULL values, NOT IN is often the wrong choice and NOT EXISTS would be right (and faster, too):
Select rows which are not present in other table
Syntax for = ANY
For the array expression Postgres accepts:
an array constructor (array is constructed from a list of values on the Postgres side) of the form: ARRAY[1,2,3]
or an array literal of the form '{1,2,3}'.
To avoid invalid type casts, you can cast explicitly:
ARRAY[1,2,3]::numeric[]
'{1,2,3}'::bigint[]
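On the client side, the array literal form can be built from a list of IDs and passed as one bind parameter. The following Python sketch assumes integer IDs; to_int_array_literal is a hypothetical helper, and the table name in the query string is only a placeholder. An empty list yields '{}', a valid empty array, so the VALUES() syntax error from the question does not occur.

```python
from typing import Iterable


def to_int_array_literal(ids: Iterable[int]) -> str:
    """Render integers as a Postgres array literal such as {1,2,3}."""
    # int() raises ValueError on non-numeric strings, so only digits
    # (and a sign) can end up inside the literal.
    return "{" + ",".join(str(int(i)) for i in ids) + "}"


# The literal is meant to be sent as a single parameter ($1), not pasted into the SQL.
QUERY = "SELECT * FROM my_models WHERE id = ANY($1::int[])"

print(to_int_array_literal([1, 28, 7, 8, 12]))  # {1,28,7,8,12}
print(to_int_array_literal([]))                 # {}
```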
Related:
PostgreSQL: Issue with passing array to procedure
How to pass custom type array to Postgres function
Or you could create a Postgres function taking a VARIADIC parameter, which takes individual arguments and forms an array from them:
Passing multiple values in single parameter
How to pass the array from Ruby?
Assuming id to be integer:
MyModel.where('id = ANY(ARRAY[?]::int[])', ids.map { |i| i})
But I am just dabbling in Ruby. @mu provides detailed instructions in this related answer:
Sending array of values to a sql query in ruby?
In mathematics and many programming languages (and I think standard SQL as well), parentheses are used to change precedence (grouping parts to be evaluated first) or to enhance readability (for human eyes).
Equivalent Examples:
SELECT array[1,2] @> array[1]
SELECT (array[1,2]) @> array[1]
SELECT array[1,2] @> (array[1])
SELECT ((array[1,2]) @> (array[1]))
But SELECT 1 = ANY array[1,2] is a syntax error (!), and SELECT 1 = ANY (array[1,2]) is valid. Why?
OK, because "the manual says so". But what is the logic that lets humans remember all the exceptions?
Is there a guide about it?
I do not understand why (expression) is the same as expression in some cases, but not in other cases.
PS1: parentheses are also used as value-list delimiters, as in expression IN (value [, ...]). But an array is not a value-list, and there does not seem to be a general rule in PostgreSQL when (array expression) is not the same as array expression.
Also, I used array as example, but this problem/question is not only about arrays.
"Is there a summarized guide?", well... The answer is no, so: hands-on! This answer is a Wiki, let's write.
Summarized guide
Let,
F() be an ordinary function (e.g. ROUND)
L() be a function-like operator (e.g. ANY)
f be an operator-like function (e.g. current_date)
Op be an operator
Op1, Op2 be distinct operators
A, B, C be values or expressions
S be an expression list, such as "(A,B,C)"
The rules, using these elements, are in the form
rule: notes.
"pure" mathematical expressions
When Op, Op1, Op2 are mathematical operators (e.g. +, -, *), and F() is a mathematical function (e.g. ROUND()).
Rules for scalar expressions and "pure array expressions":
A Op B = (A Op B): the parentheses are optional.
A Op1 B Op2 C: need to check precedence.
(A Op1 B) Op2 C: enforce "first (A Op1 B)".
A Op1 (B Op2 C): enforce "first (B Op2 C)".
F(A) = (F(A)) = F((A)) = (F((A))): the parentheses are optional.
S = (S): the external parentheses are optional.
f=(f): the parentheses are optional.
Expressions with function-like operators
Rules for operators such as ALL, ANY, ROW, SOME, etc.
L(A) = L((A)): the parentheses are optional in the argument.
(L(A)): SYNTAX ERROR.
...More rules? Please help editing here.
ANY is a function-like construct. Like (almost) any other function in Postgres, it requires parentheses around its parameters. This keeps the syntax consistent and helps the parser avoid ambiguities.
You can think of ANY() as shorthand for unnest() condensed to a single expression.
One might argue for an additional set of parentheses around the set-variant of ANY. But that would be ambiguous, since a list of values in parentheses is interpreted as a single ROW type.
I would like to create a function in PostgreSQL that receives an array of bigint (record ids) and uses it in a query with the "in" condition.
I know that I could simply write the query myself, but the point here is that the function will also do some other validations and processing.
The source that I was trying to use was something like this:
CREATE OR REPLACE FUNCTION func_test(VARIADIC arr bigint[])
RETURNS TABLE(record_id bigint,parent_id bigint)
AS $$ SELECT s.record_id, s.parent_id FROM TABLE s WHERE s.column in ($1);
$$ LANGUAGE SQL;
Using the above code I receive the following error:
ERROR: operator does not exist: bigint = bigint[]
LINE 3: ...ECT s.record_id, s.parent_id FROM TABLE s WHERE s.column in ($1)
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
How can I fix this?
IN doesn't work with arrays the way you think it does. IN wants to see either a list of values
expression IN (value [, ...])
or a subquery:
expression IN (subquery)
A single array will satisfy the first one but that form of IN will compare expression against each value using the equality operator (=); but, as the error message tells you, there is no equality operator that can compare a bigint with a bigint[].
You're looking for ANY:
9.23.3. ANY/SOME (array)
expression operator ANY (array expression)
expression operator SOME (array expression)
The right-hand side is a parenthesized expression, which must yield an array value. The left-hand expression is evaluated and compared to each element of the array using the given operator, which must yield a Boolean result. The result of ANY is "true" if any true result is obtained. The result is "false" if no true result is found (including the case where the array has zero elements).
So you want to say:
WHERE s.column = any ($1)
Also, you're not using the argument's name so you don't need to give it one, just this:
CREATE OR REPLACE FUNCTION func_test(VARIADIC bigint[]) ...
will be sufficient. You can leave the name there if you want, it won't hurt anything.
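The difference between the two forms can also be pictured with a rough Python analogy. It ignores NULL handling entirely, and sql_in and sql_any are hypothetical helpers written only to mirror the two syntaxes above.

```python
def sql_in(expr, *values) -> bool:
    """expression IN (value [, ...]): expr is compared with each listed value."""
    for value in values:
        if isinstance(value, list):
            # Mirrors "operator does not exist: bigint = bigint[]".
            raise TypeError("cannot compare a scalar with an array")
    return any(expr == value for value in values)


def sql_any(expr, array) -> bool:
    """expression = ANY (array): expr is compared with each array element."""
    return any(expr == element for element in array)


ids = [1, 2, 3]
print(sql_any(2, ids))   # True: the array is unpacked element by element
print(sql_in(2, *ids))   # True: three separate values in the list
print(sql_any(2, []))    # False: an array with zero elements never matches

try:
    sql_in(2, ids)       # a value list whose single value is an array
except TypeError as exc:
    print(exc)           # cannot compare a scalar with an array
```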