How to use ARRAY contains operator with ANY - sql

I have a table where one column is array:
CREATE TABLE inherited_tags (
id serial,
tags text[]
);
Sample values:
INSERT INTO inherited_tags (tags) VALUES
(ARRAY['A','B','C']), -- id: 1
(ARRAY['D','E']), -- id: 2
(ARRAY['A','B']), -- id: 3
(ARRAY['C','D']), -- id: 4
(ARRAY['D','F']), -- id: 5
(ARRAY['A']); -- id: 6
I want to find rows whose tags column contains any one of several given subsets of words. For example, for the input:
ARRAY[ARRAY['A','C'], ARRAY['F'], ARRAY['E']]::text[][]
I want to find all rows that contain ('A' and 'C') OR ('F') OR ('E'). So for the example above I should get the rows with ids 1, 2 and 5.
I was hoping that I could use syntax like this:
SELECT * FROM inherited_tags WHERE
tags @> ANY(ARRAY[ARRAY['A','C'], ARRAY['F'], ARRAY['E']]::text[][])
but I get error:
ERROR: operator does not exist: text[] @> text
LINE 1: SELECT * FROM inherited_tags where tags @> ANY(ARRAY[ARRAY['...
Postgres 9.6
plpgsql solution is acceptable but SQL is preferred.
DB-FIDDLE: https://www.db-fiddle.com/f/cKCr7Sfab6u8rqaCHhJvPk/0

The problem comes from the fact that text[] and text[][] are internally the same data type. An array has a base type and dimensions, and ANY always compares the left-hand value against elements of the base type, which here is text, never text[]. It doesn't help that multidimensional arrays require every sub-array to have the same length: you can have ARRAY[ARRAY['A','C'],ARRAY['B','N']], but not ARRAY[ARRAY[2,3],ARRAY[1]].
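You can see that flattening directly; ANY compares against every base element regardless of dimensions (a small illustration, not from the original post):
SELECT 'A' = ANY(ARRAY[ARRAY['A','C'], ARRAY['F','E']]); -- true: compared against 'A','C','F','E'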
In short, there is no direct way to make that particular query work. I tried to create a function and an operator for this as well, and that doesn't work, either, for different reasons. See how that went:
CREATE OR REPLACE FUNCTION check_tag_matches(
IN leftside text[],
IN rightside text)
RETURNS BOOLEAN AS
$BODY$
DECLARE rightarr text[];
BEGIN
SELECT CAST(rightside as text[]) INTO rightarr;
RETURN leftside @> rightarr;
END;
$BODY$
LANGUAGE plpgsql STABLE;
CREATE OPERATOR public.>>(
PROCEDURE = check_tag_matches,
LEFTARG = text[],
RIGHTARG = text,
COMMUTATOR = >>);
Then when testing it:
test=# SELECT * FROM inherited_tags WHERE
tags >> ANY(ARRAY[ARRAY['A','M'], ARRAY['F','E'], ARRAY['E','R']]::text[][]);
ERROR: malformed array literal: "A"
DETAIL: Array value must start with "{" or dimension information.
CONTEXT: SQL statement "SELECT CAST(rightside as text[])"
PL/pgSQL function check_tag_matches(text[],text) line 4 at SQL statement
It seems that when you use a multidimensional array like ARRAY[ARRAY['A','M'], ARRAY['F','E'], ARRAY['E','R']]::text[][] in ANY(), it iterates not over ARRAY['A','M'], then ARRAY['F','E'], then ARRAY['E','R'], but over 'A','M','F','E','E','R'. The same thing happens with unnest:
test=# SELECT unnest(ARRAY[ARRAY['A','M'], ARRAY['F','E'], ARRAY['E','R']]::text[][]);
unnest
--------
A
M
F
E
E
R
(6 rows)
Your remaining options are to define a function that reads array_length(rightside,1) and array_length(rightside,2) and uses nested loops to check everything, to send a separate query per tag group, or to restructure your data somehow. You can't even access the ARRAY['A','M'] element using rightside[1] to iterate over it; subscripting forces you down to the deepest level.
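For completeness, here is a rough sketch of that nested-loop option (not from the original answer; it assumes the caller pads the shorter groups with NULLs, since Postgres forces all sub-arrays to the same length):
CREATE OR REPLACE FUNCTION tags_match_any(leftside text[], rightside text[])
RETURNS boolean AS
$BODY$
DECLARE
grp text[];
BEGIN
FOR i IN 1 .. coalesce(array_length(rightside, 1), 0) LOOP
grp := ARRAY[]::text[];
FOR j IN 1 .. coalesce(array_length(rightside, 2), 0) LOOP
IF rightside[i][j] IS NOT NULL THEN -- skip the NULL padding
grp := grp || rightside[i][j];
END IF;
END LOOP;
IF array_length(grp, 1) IS NOT NULL AND leftside @> grp THEN
RETURN true;
END IF;
END LOOP;
RETURN false;
END;
$BODY$
LANGUAGE plpgsql STABLE;

SELECT * FROM inherited_tags
WHERE tags_match_any(tags, ARRAY[ARRAY['A','C'], ARRAY['F',NULL], ARRAY['E',NULL]]::text[]);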

I don't think you can do that with a single condition because of the "contains A and C" requirement.
SELECT *
FROM inherited_tags
WHERE tags @> ARRAY['A','C']
OR tags && array['F', 'E'];
tags @> ARRAY['A','C'] selects the rows where tags contains all elements of ARRAY['A','C'], and tags && array['F', 'E'] selects the rows that contain at least one of the tags from array['F', 'E'].
Updated DB Fiddle: https://www.db-fiddle.com/f/rXsjqEN3ry67uxJtEs3GM9/0
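If you'd rather not hard-code each OR branch, one alternative (a sketch, not part of the original answer) is to pass the tag groups as separate text[] values instead of one text[][] and test them with EXISTS:
SELECT it.*
FROM inherited_tags it
WHERE EXISTS (
  SELECT 1
  FROM (VALUES (ARRAY['A','C']), (ARRAY['F']), (ARRAY['E'])) AS g(grp)
  WHERE it.tags @> g.grp
);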

You can try:
SELECT * FROM inherited_tags WHERE
tags @> ARRAY['A','C']::varchar[]
OR
tags @> ARRAY['E']::varchar[]
OR
tags @> ARRAY['F']::varchar[]

Related

How to retrieve the list of dynamic nested keys of BigQuery nested records

My ELT tool imports my data into BigQuery and automatically generates/extends the schema for dynamic nested keys (in the schema below, under properties).
It looks like this
How can I get the list of nested keys of a repeated record? For example, so that I can group by a property when the items have that property non-null?
I have tried
select column_name
from my_schema.INFORMATION_SCHEMA.COLUMNS
where
table_name = 'my_table'
But it only lists the first-level keys.
From the picture above, I want, as a first step, a SQL query that returns
message
user_id
seeker
liker_id
rateable_id
rateable_type
from_organization
likeable_type
company
existing_attempt
...
My real goal, though, is to group/count my data based on a non-null value of a second-level nested property, properties.filters.[filter_type].
The schema may evolve as our application adds more filters, so this needs to be generated dynamically; I can't just hard-code the list of nested keys.
Note: this is very similar to the question How to extract all the keys in a JSON object with BigQuery, but in my case my data is already in a schema and it's not a JSON object.
EDIT:
Suppose I have a list of such records with nested properties, how do I write a SQL query that adds a field "enabled_filters" which aggregates, for each item, the list of properties for which said property is not null?
Example input (properties.x are dynamic and not known by the programmer)
search_id | properties.filters.school | properties.filters.type
----------+---------------------------+-------------------------
1         | MIT                       | master
2         | Princetown                | null
3         | null                      | master
Example output
search_id | enabled_filters
----------+--------------------
1         | ["school", "type"]
2         | ["school"]
3         | ["type"]
Have you looked at COLUMN_FIELD_PATHS? It should give you the paths for all columns.
select field_path from my_schema.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS where table_name = '<table>'
[https://cloud.google.com/bigquery/docs/information-schema-column-field-paths]
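If you only want the keys nested directly under properties (as in the question), the same view can be filtered on field_path; a small sketch, reusing the my_schema / my_table names from the question:
select field_path
from my_schema.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
where table_name = 'my_table'
and field_path like 'properties.%'
and field_path not like 'properties.%.%';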
The properties field is not nested with arrays, only with structs, so a JavaScript UDF that parses this field should work fast enough.
CREATE TEMP FUNCTION jsonObjectKeys(input STRING, shownull BOOL, fullname BOOL)
RETURNS Array<String>
LANGUAGE js AS """
// Recursively collect the keys of the parsed object.
// shownull: also list keys whose value is null (suffixed with '==null').
// fullname: return dotted paths instead of bare key names.
function test(input, old){
var out=[];
for(let x in input){
let te=input[x];
out=out.concat(te==null ? (shownull?[x+'==null']:[]) : typeof te=='object' ? test(te,old+x+'.') : [fullname ? old+x : x] );
}
return out;
}
return test(JSON.parse(input),"");
""";
with tbl as (select struct(1 as alpha,struct(2 as x, 3 as y,[1,2,3] as z ) as B) A from unnest(generate_array(1,10*1))
union all select struct(null,struct(null,1,[999])) )
select *,
TO_JSON_STRING (A ) as string_output,
jsonObjectKeys(TO_JSON_STRING (A),true,false) as output1,
jsonObjectKeys(TO_JSON_STRING (A),false,true) as output2,
concat('["', array_to_string(jsonObjectKeys(TO_JSON_STRING (A),false,true),'","' ) ,'"]') as output_string,
jsonObjectKeys(TO_JSON_STRING (A.B),false,true) as output3
from tbl
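As a hypothetical follow-up (my_table, search_id and properties.filters are assumed names from the question, and the select has to run in the same script as the temp function), the same UDF can produce the enabled_filters column asked for in the edit, since shownull=false drops the null keys:
select search_id,
jsonObjectKeys(TO_JSON_STRING(properties.filters), false, false) as enabled_filters
from my_table;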

How to get the value of jsonb data using only the key number id?

I have a jsonb column with the following data:
{"oz": "2835", "cup": "229", "jar": "170"}
I have the key number 0 that represents the first item "oz". How can I pull this value using the 0?
I'm thinking something similar to:
SELECT units->[0] as test
I only have the key ID to reference this data. I do not have the key name "oz".
Sounds like a horrible idea: jsonb does not preserve the order in which keys were inserted (it keeps them in its own canonical order), so "the first item" is only meaningful relative to that internal ordering. But you can still create a function to implement this horrible idea:
create function jsonb_disaster(jsonb,int) returns jsonb language SQL as $$
select value from jsonb_each($1) with ordinality where ordinality=1+$2
$$;
select jsonb_disaster('{"oz": "2835", "cup": "229", "jar": "170"}',0);
jsonb_disaster
----------------
"2835"
You could also create your own operator to wrap up this disaster:
create operator !> ( function = jsonb_disaster, leftarg=jsonb, rightarg=int);
select '{"cup": "229", "jar": "170", "oz": "2835"}' !> 1;
?column?
----------
"229"

extract data from redshift

elements
[{"name":"email",
"value":"abc#gmail.com",
"nodeName":"INPUT",
"type":"text"},
{"name":"password",
"value":"*****",
"nodeName":"INPUT",
"type":"password"},
{"name":"checkbox",
"value":null,
"nodeName":"INPUT",
"type":"checkbox"}]
I have data like this in Redshift. How do I extract the value abc#gmail.com from this? This query is for Redshift. Please help me with the SQL. elements is a column name and the value starts with [].
Query I tried:
select
id,
json_extract_path_text(ELEMENTS, 'name') as name1
from table
error:[XX000][500310] Amazon Invalid operation: JSON parsing error Details: ----------------------------------------------- error: JSON parsing error code: 8 ...
You can create a UDF in Python. For your case I've created one; please test and edit as suits:
create or replace function f_py_json (jsonVar varchar(512),
jsonElem varchar(10), occ integer)
returns varchar(512)
stable
as $$
import json
# parse the JSON string and pull the requested field of the occ-th element
asJson = json.loads(jsonVar)
ret = str(asJson['elements'][occ][jsonElem])
return ret
$$ language plpythonu;
select f_py_json (id, 'value', 1) from test;
-- Input is {"elements":[{"name":"email","value":"abc#gmail.com"},{"name":"password","value":"*****"}]}
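If the elements column really holds the bare JSON array shown in the question (no surrounding object), Redshift's built-in JSON functions can also get at the value without a UDF; a minimal sketch, with the table name my_table assumed:
select json_extract_path_text(
json_extract_array_element_text(elements, 0), 'value') as email
from my_table;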

Select rows from table with jsonb column based on arbitrary jsonb filter expression

Test data
DROP TABLE t;
CREATE TABLE t(_id serial PRIMARY KEY, data jsonb);
INSERT INTO t(data) VALUES
('{"a":1,"b":2, "c":3}')
, ('{"a":11,"b":12, "c":13}')
, ('{"a":21,"b":22, "c":23}');
Problem statement: I want to receive an arbitrary JSONB parameter which acts as a filter on column t.data, such as
{ "b":{ "from":0, "to":20 }, "c":13 }
and use this to select matching rows from my test table t.
In this example, I want rows where b is between 0 and 20 and c = 13.
No error is required if the filter specifies a "column" (or "tag") which does not exist in t.data - it just fails to find a match.
I've used numeric values for simplicity but would like an approach which generalises to text as well.
What I have tried so far. I looked at the containment approach, which works for equality conditions, but am stumped on a generic way of handling range conditions:
select * from t
where t.data @> '{"c":13}'::jsonb;
Background: This problem arose when building a generic table-preview page on a website (for Admin users).
The page displays a filter based on various columns in whichever table is selected for preview.
The filter is then passed to a function in Postgres DB which applies this dynamic filter condition to the table.
It returns a jsonb array of the rows matching the filter specified by the user.
This jsonb array is then used to populate the Preview resultset.
The columns which make up the filter may change.
My Postgres version is 9.6 - thanks.
If you want to parse { "b":{ "from":0, "to":20 }, "c":13 } you need a parser. That is out of scope for the JSON functions, but you can write a "generic" query using AND and OR to filter by such JSON, e.g.:
https://www.db-fiddle.com/f/jAPBQggG3p7CxqbKLMbPKw/0
with filt(f) as (values('{ "b":{ "from":0, "to":20 }, "c":13 }'::json))
select *
from t
join filt on
(f->'b'->>'from')::int < (data->>'b')::int
and
(f->'b'->>'to')::int > (data->>'b')::int
and
(data->>'c')::int = (f->>'c')::int
;
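Since the background mentions passing the filter to a function in the database, the same hard-coded comparisons can be wrapped up as one; a minimal sketch (not from the original answer), assuming the filter always has the { "b": {"from","to"}, "c": ... } shape used above:
create or replace function filter_t(f jsonb)
returns setof t
language sql stable as $$
select t.*
from t
where (f->'b'->>'from')::int < (data->>'b')::int
and (f->'b'->>'to')::int > (data->>'b')::int
and (data->>'c')::int = (f->>'c')::int
$$;
select * from filter_t('{ "b":{ "from":0, "to":20 }, "c":13 }');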
Thanks for the comments/suggestions.
I will definitely look at GraphQL when I have more time - I'm working under a tight deadline at the moment.
It seems the consensus is that a fully generic solution is not achievable without a parser.
However, I got a workable first draft - it's far from ideal but we can work with it. Any comments/improvements are welcome ...
Test data (expanded to include dates & text fields)
DROP TABLE t;
CREATE TABLE t(_id serial PRIMARY KEY, data jsonb);
INSERT INTO t(data) VALUES
('{"a":1,"b":2, "c":3, "d":"2018-03-10", "e":"2018-03-10", "f":"Blah blah" }')
, ('{"a":11,"b":12, "c":13, "d":"2018-03-14", "e":"2018-03-14", "f":"Howzat!"}')
, ('{"a":21,"b":22, "c":23, "d":"2018-03-14", "e":"2018-03-14", "f":"Blah blah"}')
First draft of code to apply a jsonb filter dynamically, but with restrictions on what syntax is supported.
Also, it just fails silently if the syntax supplied does not match what it expects.
Timestamp handling is a bit kludgey, too.
-- Handle timestamp & text types as well as int
-- See is_timestamp(text) function at bottom
with cte as (
select t.data, f.filt, fk.key
from t
, ( values ('{ "a":11, "b":{ "from":0, "to":20 }, "c":13, "d":"2018-03-14", "e":{ "from":"2018-03-11", "to": "2018-03-14" }, "f":"Howzat!" }'::jsonb ) ) as f(filt) -- equiv to cross join
, lateral (select * from jsonb_each(f.filt)) as fk
)
select data, filt --, key, jsonb_typeof(filt->key), jsonb_typeof(filt->key->'from'), is_timestamp((filt->key)::text), is_timestamp((filt->key->'from')::text)
from cte
where
case when (filt->key->>'from') is null then
case jsonb_typeof(filt->key)
when 'number' then (data->>key)::numeric = (filt->>key)::numeric
when 'string' then
case is_timestamp( (filt->key)::text )
when true then (data->>key)::timestamp = (filt->>key)::timestamp
else (data->>key)::text = (filt->>key)::text
end
when 'boolean' then (data->>key)::boolean = (filt->>key)::boolean
else false
end
else
case jsonb_typeof(filt->key->'from')
when 'number' then (data->>key)::numeric between (filt->key->>'from')::numeric and (filt->key->>'to')::numeric
when 'string' then
case is_timestamp( (filt->key->'from')::text )
when true then (data->>key)::timestamp between (filt->key->>'from')::timestamp and (filt->key->>'to')::timestamp
else (data->>key)::text between (filt->key->>'from')::text and (filt->key->>'to')::text
end
when 'boolean' then false
else false
end
end
group by data, filt
having count(*) = ( select count(distinct key) from cte ) -- must match on all filter elements
;
create or replace function is_timestamp(s text) returns boolean as $$
begin
perform s::timestamp;
return true;
exception when others then
return false;
end;
$$ strict language plpgsql immutable;
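A quick sanity check of is_timestamp against the sample values (just illustrating the helper above):
select is_timestamp('2018-03-14') as d, is_timestamp('Howzat!') as f;
-- d = true, f = false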

Using Postgres JSON Functions on table columns

I have searched extensively (in Postgres docs and on Google and SO) to find examples of JSON functions being used on actual JSON columns in a table.
Here's my problem: I am trying to extract key values from an array of JSON objects in a column, using jsonb_to_recordset(), but get syntax errors. When I pass the object literally to the function, it works fine:
Passing JSON literally:
select *
from jsonb_to_recordset('[
{ "id": 0, "name": "400MB-PDF.pdf", "extension": ".pdf",
"transferId": "ap31fcoqcajjuqml6rng"},
{ "id": 0, "name": "1000MB-PDF.pdf", "extension": ".pdf",
"transferId": "ap31fcoqcajjuqml6rng"}
]') as f(name text);
results in:
400MB-PDF.pdf
1000MB-PDF.pdf
It extracts the value of the key "name".
Here's the JSON in the column, being extracted using:
select journal.data::jsonb#>>'{context,data,files}'
from journal
where id = 'ap32bbofopvo7pjgo07g';
resulting in:
[ { "id": 0, "name": "400MB-PDF.pdf", "extension": ".pdf",
"transferId": "ap31fcoqcajjuqml6rng"},
{ "id": 0, "name": "1000MB-PDF.pdf", "extension": ".pdf",
"transferId": "ap31fcoqcajjuqml6rng"}
]
But when I try to pass jsonb#>>'{context,data,files}' to jsonb_to_recordset() like this:
select id,
journal.data::jsonb#>>::jsonb_to_recordset('{context,data,files}') as f(name text)
from journal
where id = 'ap32bbofopvo7pjgo07g';
I get a syntax error. I have tried different ways, but each time it complains about a syntax error.
Version:
PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
The expressions after select must evaluate to a single value. Since jsonb_to_recordset returns a set of rows and columns, you can't use it there.
The solution is a cross join lateral, which allows you to expand one row into multiple rows using a function. That gives you single rows that select can act on. For example:
select *
from journal j
cross join lateral
jsonb_to_recordset(j.data#>'{context, data, files}') as d(id int, name text)
where j.id = 'ap32bbofopvo7pjgo07g'
Note that the #>> operator returns type text, and the #> operator returns type jsonb. As jsonb_to_recordset expects jsonb as its first parameter I'm using #>.
See it working at rextester.com
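To see the type difference mentioned above for yourself (a quick check, assuming journal.data is jsonb as in the query above):
select pg_typeof(data #>> '{context,data,files}') as with_double_arrow,
pg_typeof(data #> '{context,data,files}') as with_single_arrow
from journal limit 1;
-- with_double_arrow = text, with_single_arrow = jsonb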
jsonb_to_recordset is a set-valued function and can only be invoked in specific places. The FROM clause is one such place, which is why your first example works, but the SELECT clause is not.
In order to turn your JSON array into a "table" that you can query, you need to use a lateral join. The effect is rather like a foreach loop on the source recordset, and that's where you apply the jsonb_to_recordset function. Here's a sample dataset:
create table jstuff (id int, val jsonb);
insert into jstuff
values
(1, '[{"outer": {"inner": "a"}}, {"outer": {"inner": "b"}}]'),
(2, '[{"outer": {"inner": "c"}}]');
A simple lateral join query:
select id, r.*
from jstuff
join lateral jsonb_to_recordset(val) as r("outer" jsonb) on true;
id | outer
----+----------------
1 | {"inner": "a"}
1 | {"inner": "b"}
2 | {"inner": "c"}
(3 rows)
That's the hard part. Note that you have to define what your new recordset looks like in the AS clause -- since each element in our val array is a JSON object with a single field named "outer", that's what we give it. If your array elements contain multiple fields you're interested in, you declare those in a similar manner. Be aware also that your JSON schema needs to be consistent: if an array element doesn't contain a key named "outer", the resulting value will be null.
From here, you just need to pull the specific value you need out of each JSON object using the traversal operator as you were. If I wanted only the "inner" value from the sample dataset, I would specify select id, r.outer->>'inner'. Since it's already JSONB, it doesn't require casting.
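Written out against the sample dataset, that last step looks like this (just the sentence above in query form):
select id, r."outer"->>'inner' as inner_value
from jstuff
join lateral jsonb_to_recordset(val) as r("outer" jsonb) on true;
-- returns: 1|a, 1|b, 2|c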