Querying whole OBJECT or Document using LIKE condition in CrateDB

I have a table with one of the fields defined as "attributes" OBJECT (DYNAMIC).
My use case is to check whether a particular string appears anywhere in this OBJECT. In SQL terms, I want to run a LIKE condition against the whole OBJECT, or even the whole document. Is this feature currently supported?
Query I want to execute: select * from gra.test_table where attributes like '%53.22.232.27%';
When I execute this on Crate 4.5.1, I run into this error:
UnsupportedFeatureException[Unknown function: (gra.test_table.attributes LIKE '%53.22.232.27%'), no overload found for matching argument types: (object, text). Possible candidates: op_like(text, text):boolean]
When I execute this on Crate 3.x, I run into this error:
SQLActionException[SQLParseException: Cannot cast attributes to type string]
The table structure is below; attributes is the field in question:
CREATE TABLE IF NOT EXISTS "gra"."test_table" (
    "id" STRING,
    "accountname" STRING,
    "attributes" OBJECT (DYNAMIC) AS (
        "accesslist" STRING,
        "accesslistid" STRING,
        "accessmask" STRING,
        "accessreason" STRING
    ),
    "employeeid" STRING,
    "day" TIMESTAMP,
    PRIMARY KEY ("day", "id"),
    INDEX "all_columns_ft" USING FULLTEXT ("employeeid") WITH (
        analyzer = 'standard'
    )
)
CLUSTERED INTO 1 SHARDS
PARTITIONED BY ("day")
WITH (
    "allocation.max_retries" = 5
)

Using LIKE on a whole object is not supported.
A possible (slow) workaround is to use a regular expression operator on the text cast of the object-typed column:
select * from gra.test_table where attributes::text ~ '.*53.22.232.27.*';
Be aware that, due to the cast, no index can be utilized, so a full table scan plus filter is executed, resulting in slow query execution.
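Note that an unescaped . in a regular expression matches any character, so the pattern above would also match e.g. 53x22y232z27. To match the IP address literally, escape the dots:
select * from gra.test_table where attributes::text ~ '.*53\.22\.232\.27.*';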

Related

Dynamically cast type of array elements to match some expression type in PostgreSQL query

I want to use the array_position function in PostgreSQL (which takes an array of some type and an expression or value of the same type) to construct a query that returns rows in some arbitrary order. (Additional context: I want to enhance the Ruby on Rails in_order_of feature, which is currently implemented via an unreadable CASE statement.)
SELECT id, title, type
FROM posts
ORDER BY
array_position(ARRAY['SuperSpecial','Special','Ordinary']::varchar[], type),
published_at DESC;
The problem here is the requirement to do an explicit type cast from the type PostgreSQL infers from the array literal (ARRAY['doh'] is text[]) to the type of the expression (type is varchar here). While varchar and text are coercible to each other, PostgreSQL requires an explicit type cast; if it is omitted (as in array_position(ARRAY['doh'], type)), PostgreSQL will throw an error (see this answer for details):
ERROR: function array_position(text[], character varying) does not exist
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
While it is not a problem to specify an explicit type cast in static queries, it is a problem in autogenerated queries where the type of the expression is unknown beforehand: array_position(ARRAY[1,2,3], id * 2) (what type does id * 2 have?)
I thought that pg_typeof() could help me, but it seems it can be used neither with the :: operator nor with the CAST operator (I've seen information that both forms aren't function calls but syntax constructs; see this question for details):
SELECT id, title, type
FROM posts
ORDER BY array_position(CAST(ARRAY['SpecialPost','Post','Whatever'] AS pg_typeof(type)), type), id;
ERROR: type "pg_typeof" does not exist
LINE 1: ...on(CAST(ARRAY['SpecialPost','Post','Whatever'] AS pg_typeof(...
Question:
How can I do a dynamic typecast to an expression's type (say, to the type of "posts"."id" * 2) in the same SQL query?
I would prefer to avoid an extra roundtrip to the database server (like executing SELECT pg_typeof("id" * 2) FROM "posts" LIMIT 1 and then using its result to generate a new query) or writing custom functions. Is it possible?
Better query
I want to enhance Ruby on Rails in_order_of feature which is currently implemented via unreadable CASE statement:
For starters, neither the awkward CASE construct nor array_position() is an ideal solution.
SELECT id, title, type
FROM posts
ORDER BY
array_position(ARRAY['SuperSpecial','Special','Ordinary']::varchar[], type),
published_at DESC;
There is a superior solution in Postgres:
SELECT id, title, type
FROM posts
LEFT JOIN unnest(ARRAY['SuperSpecial','Special','Ordinary']::varchar[]) WITH ORDINALITY o(type, ord) USING (type)
ORDER BY o.ord, published_at DESC;
This avoids calling the function array_position() for every row and is cheaper.
Equivalent short syntax with array literal and implicit column name:
SELECT id, title, type
FROM posts
LEFT JOIN unnest('{SuperSpecial,Special,Ordinary}'::varchar[]) WITH ORDINALITY type USING (type)
ORDER BY ordinality, published_at DESC;
db<>fiddle here
Added "benefit": it works with type-mismatch in Postgres 13 - as long as array type and column type are compatible.
The only possible caveat I can think of: If the passed array has duplicate elements, joined rows are duplicated accordingly. That wouldn't happen with array_position(). But duplicates would be nonsense for the expressed purpose in any case. Make sure to pass unique array elements.
See:
ORDER BY the IN value list
PostgreSQL unnest() with element number
Improved functionality in Postgres 14
The error you report is going away with Postgres 14:
ERROR: function array_position(text[], character varying) does not exist
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
Quoting the release notes:
Allow some array functions to operate on a mix of compatible data
types (Tom Lane)
The functions array_append(), array_prepend(), array_cat(),
array_position(), array_positions(), array_remove(),
array_replace(), and width_bucket() now take anycompatiblearray
instead of anyarray arguments. This makes them less fussy about
exact matches of argument types.
And the manual on anycompatiblearray:
Indicates that a function accepts any array data type, with automatic promotion of multiple arguments to a common data type
So, while this raises the above error msg in Postgres 13:
SELECT array_position(ARRAY['a','b','c']::text[], 'd'::varchar);
.. the same just works in Postgres 14 (returning NULL, since 'd' is not an element of the array).
(Your query and error msg show flipped positions for text and varchar, but all the same.)
To be clear: calls with compatible types now just work; incompatible types still raise an exception:
SELECT array_position('{1,2,3}'::text[], 3);
(The numeric literal 3 defaults to type integer, which is incompatible with text.)
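Conversely, a sketch of a compatible mix that resolves fine in Postgres 14 (both arguments promote to text):
SELECT array_position(ARRAY['a','b','c']::varchar[], 'b'::text);  -- returns 2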
Answer to actual question
.. which may be irrelevant by now. But as proof of concept:
CREATE OR REPLACE FUNCTION posts_order_by(_order anyarray)
  RETURNS SETOF posts
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY EXECUTE format(
   $$
   SELECT p.*
   FROM   posts p
   LEFT   JOIN unnest($1) WITH ORDINALITY o(type, ord) ON (o.type::%s = p.type)
   ORDER  BY o.ord, published_at DESC
   $$
   , (SELECT atttypid::regtype
      FROM   pg_attribute
      WHERE  attrelid = 'posts'::regclass
      AND    attname = 'type')
   )
   USING _order;
END
$func$;
db<>fiddle here
Doesn't make a whole lot of sense, as the type of posts.type should be well-known at the time of writing the function, but there may be special cases ...
Now both of these calls work:
SELECT * FROM posts_order_by('{SuperSpecial,Special,Ordinary}'::varchar[]);
SELECT * FROM posts_order_by('{1,2,3}'::int[]);
Though the second typically doesn't make sense.
Related, with links to more:
Executing queries dynamically in PL/pgSQL

Text case-insensitive search with crate.io SQL

What is the proper SQL syntax to search an array of text in the Crate database?
My example table is:
create table tasks (user string, entry array(object as (taskid string, eTime timestamp)));
I tried the following, which gives a syntax error:
select * from tasks where any(entry['taskid']) ~* '.*cleanup.*';
The correct syntax for the ANY operator would be:
SELECT * FROM tasks WHERE '.*cleanup.*' ~* ANY(entry['taskid']);
However, regular expressions (PCRE) are currently not supported in combination with ANY. An alternative would be the LIKE predicate, but that is not case-insensitive (and can be quite slow if the pattern starts with a wildcard character).
So ultimately, you could ...
... either use a fulltext index on the entry['taskid'] column with a lowercase analyzer (probably not the best solution, because I assume taskid is a single word and you also want to use it "as is"; see the sketch at the end of this answer),
... or split up the array values into separate rows so you have a schema like:
CREATE TABLE tasks (
    user STRING,
    entry OBJECT AS (
        taskid STRING,
        etime TIMESTAMP
    )
) ...
Then you can use:
SELECT * FROM tasks WHERE entry['taskid'] ~* '.*cleanup.*';
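For reference, a rough sketch of the first option (the index name taskid_ft is illustrative, and the built-in 'simple' analyzer lowercases tokens), assuming your Crate version supports fulltext indices over object-array child columns:
CREATE TABLE tasks (
    "user" STRING,
    entry ARRAY(OBJECT AS (
        taskid STRING,
        etime TIMESTAMP
    )),
    INDEX taskid_ft USING FULLTEXT (entry['taskid']) WITH (analyzer = 'simple')
);
SELECT * FROM tasks WHERE MATCH(taskid_ft, 'cleanup');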

postgresql using json sub-element in where clause

This might be a very basic question but I am not able to find anything on this online.
If I create a sample table :
create table dummy ( id int not null, data json );
Then, if I query the table using the following query:
select * from dummy where data->'x' = 10;
Now since there are no records in the table yet and there is no such property as 'x' in any record, it should return zero results.
But I get the following error:
postgres=# select * from dummy where data->'x' = 10;
ERROR: operator does not exist: json = integer
LINE 1: select * from dummy where data->'x' = 10;
However, the following query works:
select * from dummy where cast(data->>'x' as integer) = 10;
Am I missing something here, or is typecasting the only way I can get an integer value from a json field? If that's the case, does it not affect performance when data becomes extremely large?
Am I missing something here or typecasting is the only way I can get an integer value from a json field?
You're correct, typecasting is the only way to read an integer value from a json field.
If that's the case, does it not affect the performance when data becomes extremely large?
Postgres allows you to index expressions, including casts, so the index below will let you quickly retrieve all rows where data->>'x' has some integer value:
CREATE INDEX dummy_x_idx ON dummy(cast("data"->>'x' AS int));
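Note that the planner can only use such an expression index when the query predicate matches the indexed expression:
SELECT * FROM dummy WHERE cast("data"->>'x' AS int) = 10;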
The JSON operator ->> means "get JSON array element (or object field) as text", so a type cast is necessary.
You could define your own JSON operator, but it would only simplify the code, without consequences for performance.
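For illustration, a sketch of such a custom operator (the names json_get_int and -># are made up for this example):
CREATE FUNCTION json_get_int(j json, k text) RETURNS integer
LANGUAGE sql IMMUTABLE AS
$$ SELECT (j ->> k)::integer $$;

CREATE OPERATOR -># (
    LEFTARG = json,
    RIGHTARG = text,
    PROCEDURE = json_get_int  -- FUNCTION = ... on newer Postgres
);

-- The query then reads:
SELECT * FROM dummy WHERE data -># 'x' = 10;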

PostgreSQL - best way to return an array of key-value pairs

I'm trying to select a number of fields, one of which needs to be an array with each element of the array containing two values. Each array item needs to contain a name (character varying) and an ID (numeric). I know how to return an array of single values (using the ARRAY keyword) but I'm unsure of how to return an array of an object which in itself contains two values.
The query is something like
SELECT
t.field1,
t.field2,
ARRAY(--with each element containing two values i.e. {'TheName', 1 })
FROM MyTable t
I read that one way to do this is by selecting the values into a type and then creating an array of that type. The problem is, the rest of the function already returns a type (which means I would then have nested types - is that OK? If so, how would you read this data back in application code, i.e. with a .NET data provider like Npgsql?)
Any help is much appreciated.
ARRAYs can only hold elements of the same type
Your example displays a text and an integer value (no single quotes around 1). It is generally impossible to mix types in an array. To get those values into an array you have to create a composite type and then form an ARRAY of that composite type, as you already mentioned yourself.
Alternatively you can use the data types json in Postgres 9.2+, jsonb in Postgres 9.4+ or hstore for key-value pairs.
Of course, you can cast the integer to text and work with a two-dimensional text array. Consider the two syntax variants for array input in the demo below and consult the manual on array input.
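Applied to the query in the question (casting the numeric value to text so both fit into one text array):
SELECT t.field1, t.field2,
       ARRAY[ARRAY['TheName', 1::text]] AS pairs
FROM MyTable t;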
There is a limitation to overcome. If you try to aggregate an ARRAY (built from key and value) into a two-dimensional array, the aggregate function array_agg() or the ARRAY constructor errors out:
ERROR: could not find array type for data type text[]
There are ways around it, though.
Aggregate key-value pairs into a 2-dimensional array
PostgreSQL 9.1, with standard_conforming_strings = on:
CREATE TEMP TABLE tbl (
   id int
 , txt text
 , txtarr text[]
);
The column txtarr is just there to demonstrate syntax variants in the INSERT command. The third row is spiked with meta-characters:
INSERT INTO tbl VALUES
(1, 'foo', '{{1,foo1},{2,bar1},{3,baz1}}')
,(2, 'bar', ARRAY[['1','foo2'],['2','bar2'],['3','baz2']])
,(3, '}b",a{r''', '{{1,foo3},{2,bar3},{3,baz3}}'); -- txt has meta-characters
SELECT * FROM tbl;
Simple case: aggregate two integers (I use the same one twice) into a two-dimensional int array - see the call below.
Update: Better with custom aggregate function
With the polymorphic type anyarray it works for all base types:
CREATE AGGREGATE array_agg_mult (anyarray) (
   SFUNC    = array_cat
 , STYPE    = anyarray
 , INITCOND = '{}'
);
Call:
SELECT array_agg_mult(ARRAY[ARRAY[id, id]]) AS x         -- for int
     , array_agg_mult(ARRAY[ARRAY[id::text, txt]]) AS y  -- or text
FROM   tbl;
Note the additional ARRAY[] layer to make it a multidimensional array.
Update for Postgres 9.5+
Postgres now ships a variant of array_agg() accepting array input, so you can replace the custom aggregate function above. The manual:
array_agg(expression)
...
input arrays concatenated into array of one higher dimension (inputs must all have same dimensionality, and cannot be empty or NULL)
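A minimal sketch with the built-in aggregate - note the extra ARRAY[] layer is no longer needed, since input arrays are stacked into one higher dimension:
SELECT array_agg(ARRAY[id, id]) AS x          -- for int
     , array_agg(ARRAY[id::text, txt]) AS y   -- or text
FROM   tbl;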
I suspect that without having more knowledge of your application I'm not going to be able to get you all the way to the result you need. But we can get pretty far. For starters, there is the ROW function:
# SELECT 'foo', ROW(3, 'Bob');
 ?column? |   row
----------+---------
 foo      | (3,Bob)
(1 row)
So that right there lets you bundle a whole row into a cell. You could also make things more explicit by making a type for it:
# CREATE TYPE person(id INTEGER, name VARCHAR);
CREATE TYPE
# SELECT now(), row(3, 'Bob')::person;
              now              |   row
-------------------------------+---------
 2012-02-03 10:46:13.279512-07 | (3,Bob)
(1 row)
Incidentally, whenever you make a table, PostgreSQL makes a type of the same name, so if you already have a table like this you also have a type. For example:
# DROP TYPE person;
DROP TYPE
# CREATE TABLE people (id SERIAL, name VARCHAR);
NOTICE: CREATE TABLE will create implicit sequence "people_id_seq" for serial column "people.id"
CREATE TABLE
# SELECT 'foo', row(3, 'Bob')::people;
 ?column? |   row
----------+---------
 foo      | (3,Bob)
(1 row)
See how in the third query I used people just like a type.
Now this is not likely to be as much help as you'd think for two reasons:
I can't find any convenient syntax for pulling data out of the nested row (though see the one-liner after this list).
I may be missing something, but I just don't see many people using this syntax. The only example I see in the documentation is a function taking a row value as an argument and doing something with it. I don't see an example of pulling the row out of the cell and querying against parts of it. It seems like you can package the data up this way, but it's hard to deconstruct after that. You'll wind up having to make a lot of stored procedures.
Your language's PostgreSQL driver may not be able to handle row-valued data nested in a row.
I can't speak for Npgsql, but since this is a very PostgreSQL-specific feature, you're not going to find support for it in libraries that support other databases. For example, Hibernate isn't going to be able to handle fetching an object stored as a cell value in a row. I'm not even sure JDBC would be able to give Hibernate the information usefully, so the problem could go quite deep.
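(For completeness: simple deconstruction is possible with parenthesized field access on a composite value, using the people type from above:
SELECT (ROW(3, 'Bob')::people).name;  -- returns 'Bob'
It gets awkward for anything beyond that.)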
So, what you're doing here is feasible provided you can live without a lot of the niceties. I would recommend against pursuing it though, because it's going to be an uphill battle the whole way, unless I'm really misinformed.
A simple way without hstore
SELECT jsonb_agg(to_jsonb(t))
FROM (
   SELECT unnest(ARRAY['foo', 'bar', 'baz']) AS table_name
) t;
>>> [{"table_name": "foo"}, {"table_name": "bar"}, {"table_name": "baz"}]

passing an array into oracle sql and using the array

I am running into the following problem: I am passing an array of strings into Oracle SQL, and I would like to retrieve all the data where the id is in the list.
Here's what I've tried:
OPEN O_default_values FOR
  SELECT ID AS "Header",
         VALUE AS "DisplayValue",
         VALUE_DESC AS "DisplayText"
  FROM TBL_VALUES
  WHERE ID IN I_id;
I_id is an array described as follows: TYPE gl_id IS TABLE OF VARCHAR2(15) INDEX BY PLS_INTEGER;
I've been getting the "expression is of wrong type" error.
The I_id array can sometimes be as large as 600 records.
My question is: is there a way to do what I just described, or do I need to create some sort of cursor and loop through the array?
What has been tried: creating the SQL string dynamically, concatenating the values to the end of the SQL string, and then executing it. This works for small amounts of data, but the size of the string is static, which causes other errors (like index out of range).
Have a look at this link: http://asktom.oracle.com/pls/asktom/f?p=100:11:620533477655526::::P11_QUESTION_ID:139812348065
Effectively, what you want is a variable in-list with bind variables.
Do note this:
THE is deprecated. There is no need for it today. TABLE is its replacement:
select * from TABLE( function );
Since you already have the type, all you need to do is something similar to the following:
OPEN O_default_values FOR
  SELECT ID AS "Header",
         VALUE AS "DisplayValue",
         VALUE_DESC AS "DisplayText"
  FROM TBL_VALUES
  WHERE ID IN (SELECT column_value FROM TABLE(I_id));
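One caveat: before Oracle 12c, TABLE() only works with a collection type declared at schema level (a nested table or varray), not with a PL/SQL associative array like the INDEX BY PLS_INTEGER type above. A sketch of such a type (the name gl_id_tab is illustrative):
CREATE OR REPLACE TYPE gl_id_tab AS TABLE OF VARCHAR2(15);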