In my PostgreSQL 11.11 I have one jsonb column that holds objects like this:
{
  "dynamicFields": [
    {
      "name": "200",
      "hidden": false,
      "subfields": [
        {
          "name": "a",
          "value": "Subfield a"
        },
        {
          "name": "b",
          "value": "Subfield b"
        }
      ]
    }
  ]
}
dynamicFields is an array and subfields is also an array, and I am having performance issues with selects like this:
select *
from my_table a
cross join lateral jsonb_array_elements(a.marc -> 'dynamicFields') df
cross join lateral jsonb_array_elements(df -> 'subfields') sf
where df ->> 'name' = '200' and sf ->> 'name' = 'a'
The performance issue lies mostly in the subfields. I have already added an index like this:
CREATE INDEX idx_my_index ON my_table USING gin ((marc->'dynamicFields') jsonb_path_ops);
How can I add an index for the subfields inside the dynamicFields?
The query above is just one example; I use it a lot in joins with other tables in the database. And I also know the @> operator.
You already have a very good index to support your query.
Make use of it with the jsonb "contains" operator @>:
SELECT *
FROM my_table
WHERE marc->'dynamicFields' @> '[{"name": "200", "subfields":[{"name": "a"}]}]';
Carefully match the structure of the JSON object in the table. Then rows are selected cheaply using the index.
You can then extract whatever parts you need from qualifying rows.
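For example, a combined query could look like this (a sketch reusing the names from the question; the @> predicate selects rows cheaply via the index, and the lateral unnesting then runs only on those qualifying rows):
SELECT a.*, sf ->> 'value' AS subfield_value
FROM   my_table a
CROSS  JOIN LATERAL jsonb_array_elements(a.marc -> 'dynamicFields') df
CROSS  JOIN LATERAL jsonb_array_elements(df -> 'subfields') sf
WHERE  a.marc -> 'dynamicFields' @> '[{"name": "200", "subfields": [{"name": "a"}]}]'
AND    df ->> 'name' = '200'
AND    sf ->> 'name' = 'a';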
Detailed instructions:
Index for finding an element in a JSON array
If one of the filters is very selective on its own, it might be faster to split the two conditions like in your original. Either way, both variants should be fast:
SELECT *
FROM my_table
WHERE marc->'dynamicFields' @> '[{"name": "200"}]'
AND marc->'dynamicFields' @> '[{"subfields":[{"name": "a"}]}]';
Indexes enhance query performance on tables. An index can only be created on table columns, and indexing pays off for the columns used in joins and WHERE clauses. For a jsonb column you can use CREATE INDEX ON table_name USING gin (column_name jsonb_path_ops).
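A minimal sketch, reusing the table and column names from the question above (the index name is illustrative):
CREATE INDEX my_table_marc_gin_idx ON my_table USING gin (marc jsonb_path_ops);
-- a jsonb_path_ops index supports containment (@>) queries such as:
SELECT * FROM my_table WHERE marc @> '{"dynamicFields": [{"name": "200"}]}';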
I have a jsonb column called data in a table called people. The json's values are arrays. It looks like this:
{"bar":["def"],"foo":["abc","hij"]}
In the above example, this jsonb has two keys, "bar" and "foo", and both values are arrays containing several elements. I am trying to query using several key-value pairs, where each value is a single string. The results must have exactly the keys in the query (no more, no fewer), and for each key the queried value must exist in the corresponding array.
For example, using {"bar":"def", "foo":"abc"} or {"bar":"def", "foo":"hij"}, I should get the result.
But using {"bar":"def"}, {"foo":"abc"}, or {"bar":"def", "foo":"abc", "xyz":"123"}, I shouldn't get the result, since the keys don't match exactly.
I have tried using data->'bar' @> '["def"]' AND data->'foo' @> '["abc"]' to make sure the key-value pairs in the query exist in the data jsonb, but I don't know how to filter out the rows that have more keys than the query. I was thinking about converting all the keys in the jsonb into an array and checking that array against the keys of the query, but couldn't figure out how to do it properly. If there is a better solution, please share your thoughts.
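To be concrete, the key-array idea I have in mind would look roughly like this (an untested sketch building on the predicates above):
select *
from people p
where (select array_agg(k order by k) from jsonb_object_keys(p.data) k)
    = (select array_agg(k order by k) from jsonb_object_keys('{"bar":"def", "foo":"abc"}'::jsonb) k)
  and p.data -> 'bar' @> '["def"]'
  and p.data -> 'foo' @> '["abc"]';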
You can full outer join the keys of your objects, check that a key match exists, and then verify the target value exists in the array of possibilities:
create or replace function js_match(record jsonb, template jsonb) returns bool as $$
  select not exists (
    select 1
    from jsonb_each(record) t1
    full outer join jsonb_each(template) t2 on t1.key = t2.key
    where t1.key is null    -- key only in the template
       or t2.key is null    -- key only in the record
       or not exists (      -- queried value missing from the record's array
            select 1 from jsonb_array_elements(t1.value) v where v = t2.value)
  )
$$ language sql;
Usage:
select * from people where js_match(data, '{"bar":"def", "foo":"abc"}'::jsonb)
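A few illustrative calls against the sample object from the question (expected results in comments):
select js_match('{"bar":["def"],"foo":["abc","hij"]}'::jsonb, '{"bar":"def", "foo":"hij"}'::jsonb); -- true
select js_match('{"bar":["def"],"foo":["abc","hij"]}'::jsonb, '{"bar":"def"}'::jsonb);              -- false: key "foo" not matched
select js_match('{"bar":["def"],"foo":["abc","hij"]}'::jsonb, '{"bar":"def", "foo":"xyz"}'::jsonb); -- false: "xyz" not in the array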
This answer uses a function to make the comparisons easier during the main selection; however, below is a pure query version:
select *
from people p
where not exists (
  select 1
  from jsonb_each(p.data) t1
  full outer join jsonb_each('{"bar":"def", "foo":"abc"}'::jsonb) t2 on t1.key = t2.key
  where t1.key is null
     or t2.key is null
     or not exists (select 1 from jsonb_array_elements(t1.value) v where v = t2.value)
)
I have two tables, ClaimPaymentHistory and RemittanceHistory, which I am currently joining with the following query.
select rh."EventHistory"
from "ClaimPaymentHistory" ph, jsonb_array_elements(ph."EventHistory") payments
inner join "RemittanceHistory" rh
on payments ->> 'rk' = rh."RemittanceRefKey"::text
where ph."ClaimRefKey" = @ClaimRefKey
I wanted to improve this query using the following index:
CREATE INDEX claim_payment_history_gin_idx ON "ClaimPaymentHistory"
USING gin ("EventHistory" jsonb_path_ops)
But I don't appear to get any improvement with this. However, I can see this index being leveraged if I query the EventHistory column of that table using the @> operator, for example like so:
select * from "ClaimPaymentHistory" where "EventHistory" @> '[{"rk": 637453920516771103}]';
So my question is, am I able to create a join using that contains operator? I've been playing with the syntax but can't get anything to work.
If I am unable to create a join with that operator, what would be my best options for indexing?
That index could be used if you wrote the query like this:
select rh."EventHistory"
from "RemittanceHistory" rh join "ClaimPaymentHistory" ph
on ph."EventHistory" @> jsonb_build_array(jsonb_build_object('rk', rh."RemittanceRefKey"))
where ph."ClaimRefKey" = 5;
However, this is unlikely to have good performance unless "RemittanceHistory" has few rows in it.
...what would be my best options for indexing?
The obvious choice, if you don't have them already, would be regular (btree) indexes on rh."RemittanceRefKey" and ph."ClaimRefKey".
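For instance (index names are illustrative):
CREATE INDEX remittance_history_refkey_idx ON "RemittanceHistory" ("RemittanceRefKey");
CREATE INDEX claim_payment_history_claimrefkey_idx ON "ClaimPaymentHistory" ("ClaimRefKey");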
Also, look at (and show us) the EXPLAIN (ANALYZE, BUFFERS) for the original query you want to make faster.
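For example, with the parameter replaced by the literal 5 used above:
EXPLAIN (ANALYZE, BUFFERS)
SELECT rh."EventHistory"
FROM "ClaimPaymentHistory" ph, jsonb_array_elements(ph."EventHistory") payments
JOIN "RemittanceHistory" rh ON payments ->> 'rk' = rh."RemittanceRefKey"::text
WHERE ph."ClaimRefKey" = 5;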
I wound up refactoring the table structure. Instead of a join through RemittanceRefKey I added a JSONB column to RemittanceHistory called ClaimRefKeys. This is simply an array of integer values, and now I can look up the desired rows with:
select "EventHistory" from "RemittanceHistory" where "ClaimRefKeys" @> @ClaimRefKey;
This combined with the following index gives pretty fantastic performance.
CREATE INDEX remittance_history_claimrefkeys_gin_idx ON "RemittanceHistory" USING gin ("ClaimRefKeys" jsonb_path_ops);
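With a concrete value (borrowing the ClaimRefKey 5 from the earlier example), the lookup reads:
SELECT "EventHistory"
FROM "RemittanceHistory"
WHERE "ClaimRefKeys" @> '5';  -- '5' is coerced to a jsonb number; scalar-in-array containment is indexable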
I created the following indexes on the jsonb columns in my table:
CREATE INDEX idx_gin_accounts ON t1 USING GIN (accounts jsonb_path_ops);
CREATE INDEX idx_gin_stocks ON t1 USING GIN (stocks jsonb_path_ops);
CREATE INDEX idx_gin_stocks_value ON t1 USING GIN ((stocks-> 'value'));
CREATE INDEX idx_gin_stocks_type ON t1 USING GIN ((stocks-> 'type'));
My query is like this:
SELECT
  t.accounts ->> 'name' as account_name
  -- other columns
FROM t1 t
left join lateral jsonb_array_elements(t.accounts) a(accounts)
  on 1 = 1 and a.accounts @> '{"role": "ADVISOR"}'
left join lateral jsonb_array_elements(t.stocks) s(stocks)
  on 1 = 1 and s.stocks @> '{"type": "RIC"}'
WHERE (s.stocks -> 'value' ? 'XXX')
When I analyse with EXPLAIN ANALYSE, I do not see these indexes being used in the query plan.
Should different indexes be created? Or how can I use these ones to speed up the search?
Say, when I pass in (s.stocks -> 'value' ? 'XXX') as the WHERE condition, I want the search to be optimal.
You cannot index the results of a set-returning function (other than by making a materialized view).
We can reason out that if a.accounts @> '{"role": "ADVISOR"}' then it is necessary that t.accounts @> '[{"role": "ADVISOR"}]'. PostgreSQL can't reason that out, but we can.
However, this also won't help, because you are doing left joins. If every single row of t1 is getting returned, what do you expect an index to accomplish?
With your added WHERE clause, you can use a jsonpath expression (PostgreSQL 12 or later) to get the rows of t1 that you seem to want. It will use the index on t1 (stocks), either with or without the jsonb_path_ops:
WHERE (s.stocks -> 'value' ? 'XXX') AND
      t.stocks @? '$[*] ? (@.type == "RIC") ? (exists (@.value.XXX))';
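Plugged into the full query from the question, that might look like this (a sketch with the same aliases):
SELECT t.accounts ->> 'name' as account_name
FROM t1 t
left join lateral jsonb_array_elements(t.accounts) a(accounts)
  on a.accounts @> '{"role": "ADVISOR"}'
left join lateral jsonb_array_elements(t.stocks) s(stocks)
  on s.stocks @> '{"type": "RIC"}'
WHERE (s.stocks -> 'value' ? 'XXX')
  AND t.stocks @? '$[*] ? (@.type == "RIC") ? (exists (@.value.XXX))';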
However, the index is not actually very efficient if almost all entries have type RIC, so this may be a Pyrrhic victory.
My data is in an array format like [1,2,3]; how do I join it with another table? Here is the query I am trying:
select
  RCM.header_details -> 'auditAssertion' as auditassertion
from masters."RCM" RCM
left join reference."AUDIT_ASSERTION_APPLICATION" as AAA
  on AAA.id = RCM.header_details -> 'auditAssertion'
You can use the ? operator to check if a value belongs to a json(b) array:
select m.header_details -> 'auditAssertion' as auditassertion
from masters.rcm m
left join reference.audit_assertion_application a
  on m.header_details -> 'auditAssertion' ? a.id::text
For performance, Postgres can support this query with the following index:
create index on masters.rcm using gin ((header_details->'auditAssertion'));
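Note that ? only matches string array elements, so this assumes the ids are stored as strings (e.g. ["1","2","3"]). A quick sanity check, plus a containment alternative for number elements like the [1,2,3] in the question:
select '["1","2","3"]'::jsonb ? '2';      -- true: string element found
select '[1,2,3]'::jsonb ? '2';            -- false: number elements never match ?
select '[1,2,3]'::jsonb @> to_jsonb(2);   -- true: @> containment works for numbers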
Using PostgreSQL 9.6+
Two tables (simplified to only the columns that matter, with example data):
Table 1:
----------------------------------------------------------------
key (PK) [Text] | resources [JSONB]
----------------------------------------------------------------
asdfaewdfas     | ["i0c1d1233s49f3fce", "z0k1d9921s49f3glk"]

Table 2:
----------------------------------------------------------------
resource (PK) [Text] | data [JSONB]
----------------------------------------------------------------
i0c1d1233s49f3fce    | {large json of data}
z0k1d9921s49f3glk    | {large json of data}
I am trying to access the data column of Table 2 via the resources column of Table 1.
Unnest the JSON array and join to the second table. Like:
SELECT t1.*, t2.data -- or just the bits you need
FROM table1 t1, jsonb_array_elements_text(t1.resources) r(resource)
JOIN table2 t2 USING (resource)
WHERE t1.key = ?
Or, to preserve all rows in table1 with empty / null / unmatched resources:
SELECT t1.*, t2.data -- or just the bits you need
FROM table1 t1
LEFT JOIN LATERAL jsonb_array_elements_text(t1.resources) r(resource) ON true
LEFT JOIN table2 t2 USING (resource)
WHERE t1.key = ?
About jsonb_array_elements_text():
How to turn json array into postgres array?
There is an implicit LATERAL join in the first query. See:
What is the difference between LATERAL and a subquery in PostgreSQL?
Consider a normalized DB design with a junction table, one row per linked resource, instead of the column table1.resources, implementing the m:n relation properly. That way you can enforce referential integrity, data integrity, etc. with relational features, and queries become simpler. jsonb for everything is simple at first, but if you work a lot with nested data, it may turn around on you.
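For illustration, such a junction table might look like this (the name table1_resource is made up; columns are taken from the tables above):
CREATE TABLE table1_resource (
   key      text REFERENCES table1 (key),
   resource text REFERENCES table2 (resource),
   PRIMARY KEY (key, resource)
);

-- the first query above then becomes a plain double join:
SELECT t1.*, t2.data
FROM   table1 t1
JOIN   table1_resource tr USING (key)
JOIN   table2 t2 USING (resource)
WHERE  t1.key = ?;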
Can PostgreSQL array be optimized for join?
How to implement a many-to-many relationship in PostgreSQL?
Can PostgreSQL have a uniqueness constraint on array elements?