Sending ARRAY to VALUES clause fails - sql

If I want to construct a temporary valueset for testing, I can do something like this:
SELECT * FROM (VALUES (97.99), (98.01), (99.00))
which will result in this:
  COLUMN1
1 97.99
2 98.01
3 99.00
However, if I want to construct a result set where one of the columns contains an ARRAY, like this:
SELECT * FROM (VALUES (97.99, [14, 37]), (98.01, []), (99.00, [14]))
I would expect this:
  COLUMN1  COLUMN2
1 97.99    [14, 37]
2 98.01    []
3 99.00    [14]
but I actually get the following error:
Invalid expression [ARRAY_CONSTRUCT(14, 37)] in VALUES clause
I don't see anything in the documentation for the VALUES clause that explains why this is invalid. What am I doing wrong here and how can I generate a result set with an ARRAY column?

I think the VALUES clause only allows primitive types. You can define the array as a string in single quotes and use parse_json to turn it into an array:
SELECT $1 COL1, parse_json($2)::array COL2
FROM (VALUES (97.99, '[14, 37]'), (98.01, '[]'), (99.00, '[14]'));

VALUES() has some restrictions:
Each expression must be a constant, or an expression that can be evaluated as a constant during compilation of the SQL statement.
Most simple arithmetic expressions and string functions can be evaluated at compile time, but most other expressions cannot.
https://docs.snowflake.com/en/sql-reference/constructs/values.html
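To see that restriction in action: an expression that folds to a constant at compile time is accepted, while an array constructor is not (illustrative, based on the quoted rule and the error message from the question):
SELECT * FROM (VALUES (1 + 1), (2 * 3));            -- simple arithmetic folds to constants, so this works
SELECT * FROM (VALUES (ARRAY_CONSTRUCT(14, 37)));   -- fails: Invalid expression [ARRAY_CONSTRUCT(14, 37)] in VALUES clause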

From the documentation:
Each expression must be a constant, or an expression that can be evaluated as a constant during compilation of the SQL statement. Most simple arithmetic expressions and string functions can be evaluated at compile time, but most other expressions cannot.
The documentation doesn't explicitly say this, but given the ability of arrays to hold multiple data types and a varying number of elements, I want to say that arrays in most SQL-based databases are dynamic arrays that can't be evaluated at compile time. Maybe some experts can shed more light on this.
Back to your problem, I would just use explicit select statements like:
select 97.99, [14, 37] union all
select 98.01, [];
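For completeness, the same idea covering all three rows from the question, with column aliases on the first branch (the alias names are just illustrative):
select 97.99 as col1, [14, 37] as col2 union all
select 98.01, []                       union all
select 99.00, [14];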

Related

LIKE operator two columns clickhouse

I want to select rows in a ClickHouse table where two string columns are LIKE each other (for example where column1 is 'Hello' and column2 is '%llo').
I tried LIKE operator:
SELECT * FROM table_name WHERE column1 LIKE column2;
but it said:
Received exception from server (version 21.2.8):
Code: 44. DB::Exception: Received from localhost:9000. DB::Exception: Argument at index 1 for function like must be constant: while executing 'FUNCTION like(column1 : 17, column2 : 17) -> like(column1, column2) UInt8 : 28'.
it seems that the second argument should be a constant value. Is there any other way to apply this condition?
ClickHouse's LIKE supports only a constant pattern argument.
There is no general solution; the same problem exists with regex functions and so on (because ClickHouse applies the compiled expression to a column byte-stream before separating it into rows).
In some cases you can use the position or countSubstrings functions for this task.
You can use LOCATE or POSITION for this (https://clickhouse.tech/docs/en/sql-reference/functions/string-search-functions/). The query would look something like this:
SELECT *
FROM table_name
WHERE position(column1, column2, character_length(column1) - character_length(column2) + 1) > 0;
This may be flawed. It seems that in ClickHouse most string functions work on bytes or variable-length UTF-8 sequences rather than on characters, so one has to pay attention to how the functions work and how they should be combined. I am using the third parameter start_pos above and assume that it refers to the character position, but it could just as well be bytes - I have not been able to find this information in the docs.
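If the byte/character mismatch does matter for your data, a variant that keeps both sides in UTF-8 characters might look like this (a sketch only - it assumes positionUTF8 accepts a non-constant needle and a character-based start_pos, which I have not verified):
SELECT *
FROM table_name
WHERE positionUTF8(column1, column2,
                   character_length(column1) - character_length(column2) + 1) > 0;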

How to use regexp_matches() in an UPDATE statement?

I am trying to clean up a table that has a very messy varchar column, with entries of this sort:
<u><font color="#0000FF">VA Lidar</font></u> OR <u><font color="#0000FF">InPort Metadata</font></u>
I would like to update the column by keeping only the HTML links, and separating them with a comma if there is more than one. Ideally I would do something like this:
UPDATE mytable
SET column = array_to_string(regexp_matches(column,'(?<=href=").+?(?=\")','g') , ',');
But unfortunately this returns an error in Postgres 10:
ERROR: set-returning functions are not allowed in UPDATE
I assume regexp_matches() is the said set-returning function. Any ideas on how I can achieve this?
Notes
1.
You don't need to base the correlated subquery on a separate instance of the base table (like other answers suggested). That would be doing more work for nothing.
2.
For simple cases an ARRAY constructor is cheaper than array_agg(). See:
Why is array_agg() slower than the non-aggregate ARRAY() constructor?
3.
I use a regular expression without lookahead and lookbehind constraints and parentheses instead: href="([^"]+)
See query 1.
This works because parenthesized subexpressions are captured by regexp_matches() (and several other Postgres regexp functions). So we can replace the more sophisticated constraints with plain parentheses. The manual on regexp_match():
If a match is found, and the pattern contains no parenthesized subexpressions, then the result is a single-element text array containing the substring matching the whole pattern. If a match is found, and the pattern contains parenthesized subexpressions, then the result is a text array whose n'th element is the substring matching the n'th parenthesized subexpression of the pattern.
And for regexp_matches():
This function returns no rows if there is no match, one row if there is a match and the g flag is not given, or N rows if there are N matches and the g flag is given. Each returned row is a text array containing the whole matched substring or the substrings matching parenthesized subexpressions of the pattern, just as described above for regexp_match.
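To illustrate the capturing behavior with the regexp from note 3 (the input string is made up):
SELECT regexp_matches(
         '<a href="https://example.com">x</a> <a href="https://foo.org">y</a>',
         'href="([^"]+)', 'g');
-- two rows, each a single-element text array: {https://example.com} and {https://foo.org}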
4.
regexp_matches() returns a set of arrays (setof text[]) for a reason: not only can a regular expression match several times in a single string (hence the set), it can also produce multiple strings for each single match with multiple capturing parentheses (hence the array). That does not occur with this regexp: every array in the result holds a single element. But future readers shall not be led into a trap:
Feeding the resulting 1-D arrays to array_agg() (or an ARRAY constructor) produces a 2-D array - which is only possible since Postgres 9.5 added a variant of array_agg() accepting array input. See:
Is there something like a zip() function in PostgreSQL that combines two arrays?
However, quoting the manual:
inputs must all have same dimensionality, and cannot be empty or NULL
I think this can never fail, as the same regexp always produces the same number of array elements. Ours always produces one element. But that may be different with other regexps. If so, there are various options:
Only take the first element with (regexp_matches(...))[1]. See query 2.
Unnest arrays and use string_agg() on base elements. See query 3.
Each approach works here, too.
Query 1
UPDATE tbl t
SET    col = (
   SELECT array_to_string(ARRAY(SELECT regexp_matches(col, 'href="([^"]+)', 'g')), ',')
   );
Columns with no match are set to '' (empty string).
Query 2
UPDATE tbl
SET    col = (
   SELECT string_agg(t.arr[1], ',')
   FROM   regexp_matches(col, 'href="([^"]+)', 'g') t(arr)
   );
Columns with no match are set to NULL.
Query 3
UPDATE tbl
SET    col = (
   SELECT string_agg(elem, ',')
   FROM   regexp_matches(col, 'href="([^"]+)', 'g') t(arr)
        , unnest(t.arr) elem
   );
Columns with no match are set to NULL.
db<>fiddle here (with extended test case)
You could use a correlated subquery to deal with the offending set-returning function (which is regexp_matches). Something like this:
update mytable
set column = (
    select array_to_string(array_agg(x), ',')
    from (
        select regexp_matches(t2.column, '(?<=href=").+?(?=\")', 'g')
        from mytable t2
        where t2.id = mytable.id
    ) dt(x)
)
You're still stuck with the "CSV in a column" nastiness but that's a separate issue and presumably not a problem for you.
Building on the approach of mu is too short, with a slightly different regex and a COALESCE function to retain values that do not contain href links:
UPDATE a
SET    bad_data = COALESCE(
          (SELECT array_to_string(array_agg(x), ',')
           FROM  (SELECT regexp_matches(a.bad_data, '(?<=href=")[^"]+', 'g') AS x
                  FROM   a a2
                  WHERE  a2.id = a.id) AS sub),
          bad_data
       );
SQL Fiddle

How to use ANY instead of IN in a WHERE clause?

I used to have a query like this in Rails:
MyModel.where(id: ids)
Which generates sql query like:
SELECT "my_models".* FROM "my_models"
WHERE "my_models"."id" IN (1, 28, 7, 8, 12)
Now I want to change this to use ANY instead of IN. I created this:
MyModel.where("id = ANY(VALUES(#{ids.join '),('}))")
Now when I use an empty array ids = [] I get the following error:
MyModel Load (53.0ms) SELECT "my_models".* FROM "my_models" WHERE (id = ANY(VALUES()))
ActiveRecord::JDBCError: org.postgresql.util.PSQLException: ERROR: syntax error at or near ")"
ActiveRecord::StatementInvalid: ActiveRecord::JDBCError: org.postgresql.util.PSQLException: ERROR: syntax error at or near ")"
Position: 75: SELECT "social_messages".* FROM "social_messages" WHERE (id = ANY(VALUES()))
from arjdbc/jdbc/RubyJdbcConnection.java:838:in `execute_query'
There are two variants of IN expressions:
expression IN (subquery)
expression IN (value [, ...])
Similarly, two variants with the ANY construct:
expression operator ANY (subquery)
expression operator ANY (array expression)
A subquery works for either technique, but for the second form of each, IN expects a list of values (as defined in standard SQL) while = ANY expects an array.
Which to use?
ANY is a later, more versatile addition; it can be combined with any binary operator returning a boolean value. IN burns down to a special case of ANY. In fact, its second form is rewritten internally:
IN is rewritten with = ANY
NOT IN is rewritten with <> ALL
Check the EXPLAIN output for any query to see for yourself. This proves two things:
IN can never be faster than = ANY.
= ANY is not going to be substantially faster.
The choice should be decided by what's easier to provide: a list of values or an array (possibly as array literal - a single value).
If the IDs you are going to pass come from within the DB anyway, it is much more efficient to select them directly (subquery) or integrate the source table into the query with a JOIN (like #mu commented).
To pass a long list of values from your client and get the best performance, use an array, unnest() and join, or provide it as table expression using VALUES (like #PinnyM commented). But note that a JOIN preserves possible duplicates in the provided array / set while IN or = ANY do not. More:
Optimizing a Postgres query with a large IN
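A minimal sketch of the unnest-and-join variant, using the table name from the generated SQL above (the ID list is made up):
SELECT m.*
FROM   unnest('{1,28,7,8,12}'::int[]) AS ids(id)
JOIN   my_models m USING (id);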
In the presence of NULL values, NOT IN is often the wrong choice and NOT EXISTS would be right (and faster, too):
Select rows which are not present in other table
Syntax for = ANY
For the array expression Postgres accepts:
an array constructor (array is constructed from a list of values on the Postgres side) of the form: ARRAY[1,2,3]
or an array literal of the form '{1,2,3}'.
To avoid invalid type casts, you can cast explicitly:
ARRAY[1,2,3]::numeric[]
'{1,2,3}'::bigint[]
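Putting both forms into the query from the question (table name and IDs taken from the generated SQL above):
SELECT * FROM my_models WHERE id = ANY (ARRAY[1, 28, 7, 8, 12]);
SELECT * FROM my_models WHERE id = ANY ('{1,28,7,8,12}'::int[]);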
Related:
PostgreSQL: Issue with passing array to procedure
How to pass custom type array to Postgres function
Or you could create a Postgres function taking a VARIADIC parameter, which takes individual arguments and forms an array from them:
Passing multiple values in single parameter
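A hypothetical VARIADIC helper, just to sketch that last option (the function name is made up):
CREATE FUNCTION ids_to_array(VARIADIC arr int[])
  RETURNS int[]
  LANGUAGE sql IMMUTABLE
  AS 'SELECT $1';  -- $1 is the variadic parameter, already collected into an array

SELECT * FROM my_models WHERE id = ANY (ids_to_array(1, 28, 7, 8, 12));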
How to pass the array from Ruby?
Assuming id to be integer:
MyModel.where('id = ANY(ARRAY[?]::int[])', ids.map { |i| i})
But I am just dabbling in Ruby. #mu provides detailed instructions in this related answer:
Sending array of values to a sql query in ruby?

Arithmetic operation on array aggregation in postgres

I have 2 questions regarding array_agg in postgres
1) I have a column which is the result of array_agg. I need to divide each of the elements of the array_agg by a constant value. Is it possible? I checked http://www.postgresql.org/docs/9.1/static/functions-array.html, but could not find any reference to arithmetic operations on array_agg.
Edit:
An example of the desired operation.
select array_agg(value)/2 from some_table
Here, I create an array from the column value of the table some_table, and I have to divide each element of the array by a constant (2 in the example above).
2) Is it possible to use coalesce with array_agg? In my scenario, there may be cases where the array_agg of a column results in a NULL array. Can we use coalesce for array_agg?
Example:
select coalesce(array_agg(value1), 0)
Dividing is probably simpler than you thought:
SELECT array_agg(value/2)
FROM ...
However, what value/2 does exactly depends on the data type. If value is an integer, fractional digits are truncated. To preserve fractional digits use value/2.0 instead. The fractional digit forces the calculation to be done with numeric values.
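A quick illustration of the difference with made-up values:
SELECT array_agg(v / 2   ORDER BY v) AS int_div,   -- integer division: {0,1,1}
       array_agg(v / 2.0 ORDER BY v) AS num_div    -- numeric division: 0.5, 1.0, 1.5 (shown with trailing zeros)
FROM  (VALUES (1), (2), (3)) AS t(v);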
COALESCE won't make any difference to the array's elements. If there are input rows, you get an array, possibly with NULL elements, but the array value itself is not NULL. (Only an array_agg() over zero input rows and without GROUP BY yields a NULL array; with GROUP BY you simply get no row at all.)
To replace individual NULL values with 0:
SELECT array_agg(coalesce(value/2.0, 0))
FROM ...

SQL FTS and comparison statements

Short story. I am working on a project where I need to communicate with SQLite database. And there I have several problems:
There is one FTS table with nodeId and nodeName columns. I need to select all nodeIds whose nodeName contains some text pattern, for instance all node names with "Donald" inside. Something similar was discussed in this thread. The point is that I can't use the CONTAINS keyword; instead I use MATCH. And here is the question itself: how should this "Donald" string be "framed"? With a '*' or with a '%' character? Here is my query:
SELECT * FROM nodeFtsTable WHERE nodeName MATCH "Donald"
Is it OK to write multiple comparisons in a SELECT statement? I mean something like this:
SELECT * FROM distanceTable WHERE pointId = 1 OR pointId = 4 OR pointId = 203 AND distance<200
I hope that it does not sound very confusing. Thank you in advance!
Edit: Sorry, I missed the fact that you are using FTS4. It looks like you can just do this:
SELECT * FROM nodeFtsTable WHERE nodeName MATCH 'Donald'
Here is relevant documentation.
No wildcard characters are needed in order to match all entries in which Donald is a discrete word (e.g. the above will match Donald Duck). If you want to match Donald as part of a word (e.g. Donalds) then you need to use * in the appropriate place:
SELECT * FROM nodeFtsTable WHERE nodeName MATCH 'Donald*'
If your query wasn't working, it was probably because you used double quotes.
From the SQLite documentation:
The MATCH operator is a special syntax for the match() application-defined function. The default match() function implementation raises an exception and is not really useful for anything. But extensions can override the match() function with more helpful logic.
FTS4 is an extension that provides a match() function.
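A small end-to-end sketch of the two MATCH forms discussed above (column layout taken from the question, data made up):
CREATE VIRTUAL TABLE nodeFtsTable USING fts4(nodeId, nodeName);
INSERT INTO nodeFtsTable VALUES (1, 'Donald Duck'), (2, 'Donalds House'), (3, 'Mickey');

SELECT nodeId FROM nodeFtsTable WHERE nodeName MATCH 'Donald';   -- row 1 only
SELECT nodeId FROM nodeFtsTable WHERE nodeName MATCH 'Donald*';  -- rows 1 and 2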
Yes, it is OK to use multiple conditions as in your second query. When you have a complex set of conditions, it is important to understand the order in which the conditions will be evaluated. AND is always evaluated before OR (they are analogous to mathematical multiplication and addition, respectively). In practice, I think it is always best to use parentheses for clarity when combining AND and OR:
--This is the same as with no parentheses, but is clearer:
SELECT * FROM distanceTable WHERE
pointId = 1 OR
pointId = 4 OR
(pointId = 203 AND distance<200)
--This is something completely different:
SELECT * FROM distanceTable WHERE
(pointId = 1 OR pointId = 4 OR pointId = 203) AND
distance<200
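As a side note, the second reading can also be expressed with IN, which avoids relying on operator precedence at all (a minor rewrite, not part of the original answer):
SELECT * FROM distanceTable
WHERE pointId IN (1, 4, 203)
  AND distance < 200;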