How to use ANY instead of IN in a WHERE clause? - sql

I used to have a query like in Rails:
MyModel.where(id: ids)
Which generates sql query like:
SELECT "my_models".* FROM "my_models"
WHERE "my_models"."id" IN (1, 28, 7, 8, 12)
Now I want to change this to use ANY instead of IN. I created this:
MyModel.where("id = ANY(VALUES(#{ids.join '),('}))"
Now when I use empty array ids = [] I get the folowing error:
MyModel Load (53.0ms) SELECT "my_models".* FROM "my_models" WHERE (id = ANY(VALUES()))
ActiveRecord::JDBCError: org.postgresql.util.PSQLException: ERROR: syntax error at or near ")"
ActiveRecord::StatementInvalid: ActiveRecord::JDBCError: org.postgresql.util.PSQLException: ERROR: syntax error at or near ")"
Position: 75: SELECT "social_messages".* FROM "social_messages" WHERE (id = ANY(VALUES()))
from arjdbc/jdbc/RubyJdbcConnection.java:838:in `execute_query'

There are two variants of IN expressions:
expression IN (subquery)
expression IN (value [, ...])
Similarly, two variants with the ANY construct:
expression operator ANY (subquery)
expression operator ANY (array expression)
A subquery works for either technique, but for the second form of each, IN expects a list of values (as defined in standard SQL) while = ANY expects an array.
Which to use?
ANY is a later, more versatile addition, it can be combined with any binary operator returning a boolean value. IN burns down to a special case of ANY. In fact, its second form is rewritten internally:
IN is rewritten with = ANY
NOT IN is rewritten with <> ALL
Check the EXPLAIN output for any query to see for yourself. This proves two things:
IN can never be faster than = ANY.
= ANY is not going to be substantially faster.
The choice should be decided by what's easier to provide: a list of values or an array (possibly as array literal - a single value).
If the IDs you are going to pass come from within the DB anyway, it is much more efficient to select them directly (subquery) or integrate the source table into the query with a JOIN (like #mu commented).
To pass a long list of values from your client and get the best performance, use an array, unnest() and join, or provide it as table expression using VALUES (like #PinnyM commented). But note that a JOIN preserves possible duplicates in the provided array / set while IN or = ANY do not. More:
Optimizing a Postgres query with a large IN
In the presence of NULL values, NOT IN is often the wrong choice and NOT EXISTS would be right (and faster, too):
Select rows which are not present in other table
Syntax for = ANY
For the array expression Postgres accepts:
an array constructor (array is constructed from a list of values on the Postgres side) of the form: ARRAY[1,2,3]
or an array literal of the form '{1,2,3}'.
To avoid invalid type casts, you can cast explicitly:
ARRAY[1,2,3]::numeric[]
'{1,2,3}'::bigint[]
Related:
PostgreSQL: Issue with passing array to procedure
How to pass custom type array to Postgres function
Or you could create a Postgres function taking a VARIADIC parameter, which takes individual arguments and forms an array from them:
Passing multiple values in single parameter
How to pass the array from Ruby?
Assuming id to be integer:
MyModel.where('id = ANY(ARRAY[?]::int[])', ids.map { |i| i})
But I am just dabbling in Ruby. #mu provides detailed instructions in this related answer:
Sending array of values to a sql query in ruby?

Related

WHERE clause returning no results

I have the following query:
SELECT *
FROM public."Matches"
WHERE 'Matches.Id' = '24e81894-2f1e-4654-bf50-b75e584ed3eb'
I'm certain there is an existing match with this Id (tried it on other Ids as well), but it returns 0 rows. I'm new to querying with PgAdmin so it's probably just a simple error, but I've read the docs up and down and can't seem to find why this is returning nothing.
Single quotes are only used for strings in SQL. So 'Matches.Id' is a string constant and obviously not the same as '24e81894-2f1e-4654-bf50-b75e584ed3eb' thus the WHERE condition is always false (it's like writing where 1 = 0)
You need to use double quotes for identifiers, the same way you did in the FROM clause.
WHERE "Matches"."Id" = ...
In general the use of quoted identifiers is strongly discouraged.

How to use regexp_matches() in an UPDATE statement?

I am trying to clean up a table that has a very messy varchar column, with entries of the sorts:
<u><font color="#0000FF">VA Lidar</font></u> OR <u><font color="#0000FF">InPort Metadata</font></u>
I would like to update the column by keeping only the html links, and separating them with a coma if there are more than one. Ideally I would do something like this:
UPDATE mytable
SET column = array_to_string(regexp_matches(column,'(?<=href=").+?(?=\")','g') , ',');
But unfortunately this returns an error in Postgres 10:
ERROR: set-returning functions are not allowed in UPDATE
I assume regexp_matches() is the said set-returning function. Any ideas on how I can achieve this?
Notes
1.
You don't need to base the correlated subquery on a separate instance of the base table (like other answers suggested). That would be doing more work for nothing.
2.
For simple cases an ARRAY constructor is cheaper than array_agg(). See:
Why is array_agg() slower than the non-aggregate ARRAY() constructor?
3.
I use a regular expression without lookahead and lookbehind constraints and parentheses instead: href="([^"]+)
See query 1.
This works because parenthesized subexpressions are captured by regexp_matches() (and several other Postgres regexp functions). So we can replace the more sophisticated constraints with plain parentheses. The manual on regexp_match():
If a match is found, and the pattern contains no parenthesized
subexpressions, then the result is a single-element text array
containing the substring matching the whole pattern. If a match is
found, and the *pattern* contains parenthesized subexpressions, then the
result is a text array whose n'th element is the substring matching
the n'th parenthesized subexpression of the pattern
And for regexp_matches():
This function returns no rows if there is no match, one row if there
is a match and the g flag is not given, or N rows if there are N
matches and the g flag is given. Each returned row is a text array
containing the whole matched substring or the substrings matching
parenthesized subexpressions of the pattern, just as described above
for regexp_match.
4.
regexp_matches() returns a set of arrays (setof text[]) for a reason: not only can a regular expression match several times in a single string (hence the set), it can also produce multiple strings for each single match with multiple capturing parentheses (hence the array). Does not occur with this regexp, every array in the result holds a single element. But future readers shall not be lead into a trap:
When feeding the resulting 1-D arrays to array_agg() (or an ARRAY constructor) that produces a 2-D array - which is only even possible since Postgres 9.5 added a variant of array_agg() accepting array input. See:
Is there something like a zip() function in PostgreSQL that combines two arrays?
However, quoting the manual:
inputs must all have same dimensionality, and cannot be empty or NULL
I think this can never fail as the same regexp always produces the same number of array elements. Ours always produces one element. But that may be different with other regexp. If so, there are various options:
Only take the first element with (regexp_matches(...))[1]. See query 2.
Unnest arrays and use string_agg() on base elements. See query 3.
Each approach works here, too.
Query 1
UPDATE tbl t
SET col = (
SELECT array_to_string(ARRAY(SELECT regexp_matches(col, 'href="([^"]+)', 'g')), ',')
);
Columns with no match are set to '' (empty string).
Query 2
UPDATE tbl
SET col = (
SELECT string_agg(t.arr[1], ',')
FROM regexp_matches(col, 'href="([^"]+)', 'g') t(arr)
);
Columns with no match are set to NULL.
Query 3
UPDATE tbl
SET col = (
SELECT string_agg(elem, ',')
FROM regexp_matches(col, 'href="([^"]+)', 'g') t(arr)
, unnest(t.arr) elem
);
Columns with no match are set to NULL.
db<>fiddle here (with extended test case)
You could use a correlated subquery to deal with the offending set-returning function (which is regexp_matches). Something like this:
update mytable
set column = (
select array_to_string(array_agg(x), ',')
from (
select regexp_matches(t2.c, '(?<=href=").+?(?=\")', 'g')
from t t2
where t2.id = t.id
) dt(x)
)
You're still stuck with the "CSV in a column" nastiness but that's a separate issue and presumably not a problem for you.
Building on the approach of mu is too short with slightly different regex and a COALESCE function to retain values that do not contain href-links:
UPDATE a
SET bad_data = COALESCE(
(SELECT Array_to_string(Array_agg(x), ',')
FROM (SELECT Regexp_matches(a.bad_data,
'(?<=href=")[^"]+', 'g'
) AS x
FROM a a2
WHERE a2.id = a.id) AS sub), bad_data
);
SQL Fiddle

Postgresql using the VARIADIC on a query inside a function

I would like to create a function on postgresql that receives an array of bigint (record ids) and to use the received information on a query using the "in" condition.
I know that I could simply fo the query by my self, but the point over here is that I'm going to create that function that will do some other validations and processes.
The source that I was tring to use was something like this:
CREATE OR REPLACE FUNCTION func_test(VARIADIC arr bigint[])
RETURNS TABLE(record_id bigint,parent_id bigint)
AS $$ SELECT s.record_id, s.parent_id FROM TABLE s WHERE s.column in ($1);
$$ LANGUAGE SQL;
Using the above code I receive the following error:
ERROR: operator does not exist: bigint = bigint[]
LINE 3: ...ECT s.record_id, s.parent_id FROM TABLE s WHERE s.column in ($1)
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
How can I fix this?
IN doesn't work with arrays the way you think it does. IN wants to see either a list of values
expression IN (value [, ...])
or a subquery:
expression IN (subquery)
A single array will satisfy the first one but that form of IN will compare expression against each value using the equality operator (=); but, as the error message tells you, there is no equality operator that can compare a bigint with a bigint[].
You're looking for ANY:
9.23.3. ANY/SOME (array)
expression operator ANY (array expression)
expression operator SOME (array expression)
The right-hand side is a parenthesized expression, which must yield an array value. The left-hand expression is evaluated and compared to each element of the array using the given operator, which must yield a Boolean result. The result of ANY is "true" if any true result is obtained. The result is "false" if no true result is found (including the case where the array has zero elements).
So you want to say:
WHERE s.column = any ($1)
Also, you're not using the argument's name so you don't need to give it one, just this:
CREATE OR REPLACE FUNCTION func_test(VARIADIC bigint[]) ...
will be sufficient. You can leave the name there if you want, it won't hurt anything.

How to use array_agg() for varchar[]

I have a column in our database called min_crew that has varying character arrays such as '{CA, FO, FA}'.
I have a query where I'm trying to get aggregates of these arrays without success:
SELECT use.user_sched_id, array_agg(se.sched_entry_id) AS seids
, array_agg(se.min_crew)
FROM base.sched_entry se
LEFT JOIN base.user_sched_entry use ON se.sched_entry_id = use.sched_entry_id
WHERE se.sched_entry_id = ANY(ARRAY[623, 625])
GROUP BY user_sched_id;
Both 623 and 625 have the same use.user_sched_id, so the result should be the grouping of the seids and the min_crew, but I just keep getting this error:
ERROR: could not find array type for data type character varying[]
If I remove the array_agg(se.min_crew) portion of the code, I do get a table returned with the user_sched_id = 2131 and seids = '{623, 625}'.
The standard aggregate function array_agg() only works for base types, not array types as input.
(But Postgres 9.5+ has a new variant of array_agg() that can!)
You could use the custom aggregate function array_agg_mult() as defined in this related answer:
Selecting data into a Postgres array
Create it once per database. Then your query could work like this:
SELECT use.user_sched_id, array_agg(se.sched_entry_id) AS seids
,array_agg_mult(ARRAY[se.min_crew]) AS min_crew_arr
FROM base.sched_entry se
LEFT JOIN base.user_sched_entry use USING (sched_entry_id)
WHERE se.sched_entry_id = ANY(ARRAY[623, 625])
GROUP BY user_sched_id;
There is a detailed rationale in the linked answer.
Extents have to match
In response to your comment, consider this quote from the manual on array types:
Multidimensional arrays must have matching extents for each dimension.
A mismatch causes an error.
There is no way around that, the array type does not allow such a mismatch in Postgres. You could pad your arrays with NULL values so that all dimensions have matching extents.
But I would rather translate the arrays to a comma-separated lists with array_to_string() for the purpose of this query and use string_agg() to aggregate the text - preferably with a different separator. Using a newline in my example:
SELECT use.user_sched_id, array_agg(se.sched_entry_id) AS seids
,string_agg(array_to_string(se.min_crew, ','), E'\n') AS min_crews
FROM ...
Normalize
You might want to consider normalizing your schema to begin with. Typically, you would implement such an n:m relationship with a separate table like outlined in this example:
How to implement a many-to-many relationship in PostgreSQL?

Check if value exists in Postgres array

Using Postgres 9.0, I need a way to test if a value exists in a given array. So far I came up with something like this:
select '{1,2,3}'::int[] #> (ARRAY[]::int[] || value_variable::int)
But I keep thinking there should be a simpler way to this, I just can't see it. This seems better:
select '{1,2,3}'::int[] #> ARRAY[value_variable::int]
I believe it will suffice. But if you have other ways to do it, please share!
Simpler with the ANY construct:
SELECT value_variable = ANY ('{1,2,3}'::int[])
The right operand of ANY (between parentheses) can either be a set (result of a subquery, for instance) or an array. There are several ways to use it:
SQLAlchemy: how to filter on PgArray column types?
IN vs ANY operator in PostgreSQL
Important difference: Array operators (<#, #>, && et al.) expect array types as operands and support GIN or GiST indices in the standard distribution of PostgreSQL, while the ANY construct expects an element type as left operand and can be supported with a plain B-tree index (with the indexed expression to the left of the operator, not the other way round like it seems to be in your example). Example:
Index for finding an element in a JSON array
None of this works for NULL elements. To test for NULL:
Check if NULL exists in Postgres array
Watch out for the trap I got into: When checking if certain value is not present in an array, you shouldn't do:
SELECT value_variable != ANY('{1,2,3}'::int[])
but use
SELECT value_variable != ALL('{1,2,3}'::int[])
instead.
but if you have other ways to do it please share.
You can compare two arrays. If any of the values in the left array overlap the values in the right array, then it returns true. It's kind of hackish, but it works.
SELECT '{1}' && '{1,2,3}'::int[]; -- true
SELECT '{1,4}' && '{1,2,3}'::int[]; -- true
SELECT '{4}' && '{1,2,3}'::int[]; -- false
In the first and second query, value 1 is in the right array
Notice that the second query is true, even though the value 4 is not contained in the right array
For the third query, no values in the left array (i.e., 4) are in the right array, so it returns false
unnest can be used as well.
It expands array to a set of rows and then simply checking a value exists or not is as simple as using IN or NOT IN.
e.g.
id => uuid
exception_list_ids => uuid[]
select * from table where id NOT IN (select unnest(exception_list_ids) from table2)
Hi that one works fine for me, maybe useful for someone
select * from your_table where array_column ::text ilike ANY (ARRAY['%text_to_search%'::text]);
"Any" works well. Just make sure that the any keyword is on the right side of the equal to sign i.e. is present after the equal to sign.
Below statement will throw error: ERROR: syntax error at or near "any"
select 1 where any('{hello}'::text[]) = 'hello';
Whereas below example works fine
select 1 where 'hello' = any('{hello}'::text[]);
When looking for the existence of a element in an array, proper casting is required to pass the SQL parser of postgres. Here is one example query using array contains operator in the join clause:
For simplicity I only list the relevant part:
table1 other_name text[]; -- is an array of text
The join part of SQL shown
from table1 t1 join table2 t2 on t1.other_name::text[] #> ARRAY[t2.panel::text]
The following also works
on t2.panel = ANY(t1.other_name)
I am just guessing that the extra casting is required because the parse does not have to fetch the table definition to figure the exact type of the column. Others please comment on this.