I created the following indexes on the jsonb columns in my table:
CREATE INDEX idx_gin_accounts ON t1 USING GIN (accounts jsonb_path_ops);
CREATE INDEX idx_gin_stocks ON t1 USING GIN (stocks jsonb_path_ops);
CREATE INDEX idx_gin_stocks_value ON t1 USING GIN ((stocks -> 'value'));
CREATE INDEX idx_gin_stocks_type ON t1 USING GIN ((stocks -> 'type'));
My query is like this:
SELECT
    t.accounts ->> 'name' as account_name
    -- other columns
FROM t1 t
left join lateral jsonb_array_elements(t.accounts) a(accounts)
    on 1 = 1 and a.accounts @> '{"role": "ADVISOR"}'
left join lateral jsonb_array_elements(t.stocks) s(stocks)
    on 1 = 1 and s.stocks @> '{"type": "RIC"}'
WHERE (s.stocks -> 'value' ? 'XXX')
When I analyse with EXPLAIN ANALYSE I do not see these indexes being used in the query plan.
Should different indexes be created? Or how can I use these ones to speed up the search?
For example, when I pass (s.stocks -> 'value' ? 'XXX') in the WHERE condition, I want the search to be optimal.
You cannot index the results of a set-returning function (other than by making a materialized view).
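A minimal sketch of the materialized-view workaround (assuming t1 has a primary key id; the view and index names here are made up):

CREATE MATERIALIZED VIEW t1_stock_elems AS
SELECT t.id, s.stock
FROM t1 t
CROSS JOIN LATERAL jsonb_array_elements(t.stocks) s(stock);

-- The unnested elements are now plain rows, so they can be indexed:
CREATE INDEX ON t1_stock_elems USING GIN (stock jsonb_path_ops);

The view has to be refreshed (REFRESH MATERIALIZED VIEW) whenever t1 changes, which is the price of this approach.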
We can reason out that if a.accounts @> '{"role": "ADVISOR"}', then it is necessary that t.accounts @> '[{"role": "ADVISOR"}]'. PostgreSQL can't reason that out, but we can.
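A sketch of that manual rewrite, adding the implied top-level containment test so the GIN index on accounts can be considered:

SELECT t.*
FROM t1 t
WHERE t.accounts @> '[{"role": "ADVISOR"}]';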
However, this also won't help, because you are doing left joins. If every single row of t1 is getting returned, what do you expect an index to accomplish?
With your added WHERE clause, you can use a jsonpath expression (if you are on PostgreSQL 12 or later, which introduced jsonpath) to get the rows of t1 that you seem to want. It will use the index on t1 (stocks), either with or without the jsonb_path_ops:
WHERE (s.stocks -> 'value' ? 'XXX') AND
      t.stocks @? '$[*] ? (@.type == "RIC") ? (exists (@.value.XXX))';
However, the index is not actually very efficient if almost all entries have the type RIC, so this is a Pyrrhic victory.
In my PostgreSQL 11.11 I have one jsonb column that holds objects like this:
{
  "dynamicFields": [
    {
      "name": "200",
      "hidden": false,
      "subfields": [
        {
          "name": "a",
          "value": "Subfield a"
        },
        {
          "name": "b",
          "value": "Subfield b"
        }
      ]
    }
  ]
}
dynamicFields is an array and subfields is also an array, and I am having performance issues when hitting selects like this:
select *
from my_table a
cross join lateral jsonb_array_elements(marc -> 'dynamicFields') df
cross join lateral jsonb_array_elements(df -> 'subfields') sf
where df ->> 'name' = '200' and sf ->> 'name' = 'a'
The performance issues mostly involve the subfields. I have already added an index like this:
CREATE INDEX idx_my_index ON my_table USING gin ((marc->'dynamicFields') jsonb_path_ops);
How can I add an index for the subfields inside the dynamicFields?
The query above is just one example; I use this pattern a lot in joins with other tables in the database. I am also aware of the @> operator.
You already have a very good index to support your query.
Make use of it with the jsonb "contains" operator @>:
SELECT *
FROM my_table
WHERE marc->'dynamicFields' @> '[{"name": "200", "subfields":[{"name": "a"}]}]';
Carefully match the structure of the JSON object in the table. Then rows are selected cheaply using the index.
You can then extract whatever parts you need from qualifying rows.
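For example, combining the index-friendly filter with the lateral unnesting from the question (a sketch; rows are first selected via the index, then only the qualifying rows are unnested):

SELECT sf ->> 'value' AS subfield_value
FROM my_table
CROSS JOIN LATERAL jsonb_array_elements(marc -> 'dynamicFields') df
CROSS JOIN LATERAL jsonb_array_elements(df -> 'subfields') sf
WHERE marc -> 'dynamicFields' @> '[{"name": "200", "subfields": [{"name": "a"}]}]'
  AND df ->> 'name' = '200'
  AND sf ->> 'name' = 'a';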
Detailed instructions:
Index for finding an element in a JSON array
If one of the filters is very selective on its own, it might be faster to split the two conditions as in your original query. Either way, both variants should be fast:
SELECT *
FROM my_table
WHERE marc->'dynamicFields' @> '[{"name": "200"}]'
AND   marc->'dynamicFields' @> '[{"subfields":[{"name": "a"}]}]';
Indexes enhance query performance on tables. An index can only be created on table columns, and indexing matters most for the columns used in joins and in the WHERE clause. For a jsonb column you can use CREATE INDEX ON table_name USING gin (column_name jsonb_path_ops).
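For example (placeholder names; jsonb_path_ops yields a smaller, faster index but supports fewer operators than the default jsonb_ops):

CREATE INDEX idx_my_table_payload ON my_table USING gin (payload jsonb_path_ops);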
The question is for Firebird 2.5. Let's assume we have the following query:
SELECT
EVENTS.ID,
EVENTS.TS,
EVENTS.DEV_TS,
EVENTS.COMPLETE_TS,
EVENTS.OBJ_ID,
EVENTS.OBJ_CODE,
EVENTS.SIGNAL_CODE,
EVENTS.SIGNAL_EVENT,
EVENTS.REACTION,
EVENTS.PROT_TYPE,
EVENTS.GROUP_CODE,
EVENTS.DEV_TYPE,
EVENTS.DEV_CODE,
EVENTS.SIGNAL_LEVEL,
EVENTS.SIGNAL_INFO,
EVENTS.USER_ID,
EVENTS.MEDIA_ID,
SIGNALS.ID AS SIGNAL_ID,
SIGNALS.SIGNAL_TYPE,
SIGNALS.IMAGE AS SIGNAL_IMAGE,
SIGNALS.NAME AS SIGNAL_NAME,
REACTION.INFO,
USERS.NAME AS USER_NAME
FROM EVENTS
LEFT OUTER JOIN SIGNALS ON (EVENTS.SIGNAL_ID = SIGNALS.ID)
LEFT OUTER JOIN REACTION ON (EVENTS.ID = REACTION.EVENTS_ID)
LEFT OUTER JOIN USERS ON (EVENTS.USER_ID = USERS.ID)
WHERE (TS BETWEEN '27.07.2021 00:00:00' AND '28.07.2021 10:34:08')
AND (OBJ_ID = 8973)
AND (DEV_CODE IN (0, 1234))
AND (DEV_TYPE = 79)
AND (PROT_TYPE = 8)
ORDER BY TS;
EVENTS has about 190 million records by now, and this query takes too much time to complete. As I read here, the tables have to have indexes on all the columns that are used.
Here are the CREATE INDEX statements for the EVENTS table:
CREATE INDEX FK_EVENTS_OBJ ON EVENTS (OBJ_ID);
CREATE INDEX FK_EVENTS_SIGNALS ON EVENTS (SIGNAL_ID);
CREATE INDEX IDX_EVENTS_COMPLETE_TS ON EVENTS (COMPLETE_TS);
CREATE INDEX IDX_EVENTS_OBJ_SIGNAL_TS ON EVENTS (OBJ_ID,SIGNAL_ID,TS);
CREATE INDEX IDX_EVENTS_TS ON EVENTS (TS);
Here is the data from the PLAN analyzer:
PLAN JOIN (JOIN (JOIN (EVENTS ORDER IDX_EVENTS_TS INDEX (FK_EVENTS_OBJ, IDX_EVENTS_TS), SIGNALS INDEX (PK_SIGNALS)), REACTION INDEX (IDX_REACTION_EVENTS)), USERS INDEX (PK_USERS))
As requested, the speed of execution:
without LEFT JOIN -> 138ms
with LEFT JOIN -> 338ms
Is there another way to speed up the execution of the query besides indexing the columns or maybe add another index?
If I add another index will the optimizer choose to use it?
You can only optimize the joins themselves by making sure that the join keys are indexed in the joined tables. These all look like primary keys, so they should already have appropriate indexes.
For this WHERE clause:
WHERE TS BETWEEN '27.07.2021 00:00:00' AND '28.07.2021 10:34:08' AND
      OBJ_ID = 8973 AND
      DEV_CODE IN (0, 1234) AND
      DEV_TYPE = 79 AND
      PROT_TYPE = 8
You probably want an index on (OBJ_ID, DEV_TYPE, PROT_TYPE, TS, DEV_CODE). The order of the first three keys is not particularly important because they are all equality comparisons. I am guessing that one day of data is fewer rows than two device codes.
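As a sketch (the index name is made up):

CREATE INDEX IDX_EVENTS_OBJ_DEV_PROT_TS ON EVENTS (OBJ_ID, DEV_TYPE, PROT_TYPE, TS, DEV_CODE);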
First of all you want to find the table1 rows quickly. You are using several columns in your WHERE clause to get them. Provide an index on these columns. Which column is the most selective, i.e., which criterion narrows down the result rows the most? Let's say it's dt, so we put it first:
create index idx1 on table1 (dt, oid, pt, ts, dc);
I have put ts and dc last, because we are looking for more than one value in these columns. It may still be that putting ts or dc as the first column is a good choice. Sometimes we have to play around with this, i.e., provide several indexes with the column order changed and then see which one gets used by the DBMS.
Tables table2 and table4 get accessed by their primary keys, for which indexes exist. But table3 gets accessed by t1id, so provide an index on that, too:
create index idx2 on table3 (t1id);
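Assuming the generic names map back to the question's schema (table1 = EVENTS, dt = DEV_TYPE, oid = OBJ_ID, pt = PROT_TYPE, ts = TS, dc = DEV_CODE, table3 = REACTION, t1id = EVENTS_ID), the same two indexes would read:

CREATE INDEX IDX1 ON EVENTS (DEV_TYPE, OBJ_ID, PROT_TYPE, TS, DEV_CODE);
CREATE INDEX IDX2 ON REACTION (EVENTS_ID);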
I have two tables, ClaimPaymentHistory and RemittanceHistory which I am currently joining with the following query.
select rh."EventHistory"
from "ClaimPaymentHistory" ph, jsonb_array_elements(ph."EventHistory") payments
inner join "RemittanceHistory" rh
on payments ->> 'rk' = rh."RemittanceRefKey"::text
where ph."ClaimRefKey" = @ClaimRefKey
I wanted to improve this query using the following index:
CREATE INDEX claim_payment_history_gin_idx ON "ClaimPaymentHistory"
USING gin ("EventHistory" jsonb_path_ops)
But I don't appear to get any improvement with this. However, I can see this index being leveraged if I query the EventHistory column of that table using the @> operator, for example like so:
select * from "ClaimPaymentHistory" where "EventHistory" #> '[{"rk": 637453920516771103}]';
So my question is, am I able to create a join using that contains operator? I've been playing with the syntax but can't get anything to work.
If I am unable to create a join with that operator, what would be my best options for indexing?
That index could be used if you wrote the query like this:
select rh."EventHistory"
from "RemittanceHistory" rh join "ClaimPaymentHistory" ph
on ph."EventHistory" #> jsonb_build_array(jsonb_build_object('rk',rh."RemittanceRefKey"))
where ph."ClaimRefKey" = 5;
However, this is unlikely to have good performance unless "RemittanceHistory" has few rows in it.
...what would be my best options for indexing?
The obvious choice, if you don't have them already, would be regular (btree) indexes on rh."RemittanceRefKey" and ph."ClaimRefKey".
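A minimal sketch of those two btree indexes:

CREATE INDEX ON "RemittanceHistory" ("RemittanceRefKey");
CREATE INDEX ON "ClaimPaymentHistory" ("ClaimRefKey");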
Also, look at (and show us) the EXPLAIN (ANALYZE, BUFFERS) for the original query you want to make faster.
I wound up refactoring the table structure. Instead of a join through RemittanceRefKey, I added a JSONB column to RemittanceHistory called ClaimRefKeys. This is simply an array of integer values, and now I can look up the desired rows with:
select "EventHistory" from "RemittanceHistory" where "ClaimRefKeys" @> @ClaimRefKey;
This combined with the following index gives pretty fantastic performance.
CREATE INDEX remittance_history_claimrefkeys_gin_idx ON "RemittanceHistory" USING gin ("ClaimRefKeys" jsonb_path_ops);
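With a literal key (the value below is just an example), such a lookup reads:

SELECT "EventHistory"
FROM "RemittanceHistory"
WHERE "ClaimRefKeys" @> '637453920516771103';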
My data is in an array format like [1,2,3]. How do I join it with another table? Here is the query I am trying:
select
    RCM.header_details -> 'auditAssertion' as auditassertion
from masters."RCM" RCM
left join reference."AUDIT_ASSERTION_APPLICATION" as AAA on AAA.id = RCM.header_details -> 'auditAssertion'
You can use the ? operator to check if a value belongs to json(b) array:
select m.header_details -> 'auditAssertion' as auditassertion
from masters.rcm m
left join reference.audit_assertion_application a
    on m.header_details -> 'auditAssertion' ? a.id::text
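Note that ? only matches string array elements; a quick demo (made-up values):

SELECT '["1", "2", "3"]'::jsonb ? '2';  -- true: the string "2" is an element
SELECT '[1, 2, 3]'::jsonb ? '2';        -- false: numeric elements never match ?

So if header_details -> 'auditAssertion' holds numbers like [1,2,3], the values may need to be stored as strings for this join to match.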
For performance, Postgres can take advantage of the following index:
create index on masters.rcm using gin ((header_details->'auditAssertion'));
I have the following compound SQL statement for a lookup, and I am trying to understand which indexes (indices?) are optimal to create, which ones I should leave out because they aren't needed, and whether it is counterproductive to have multiple.
SELECT items.id, items.standard_part_number,
items.standard_price, items.quantity,
part_numbers.value, items.metadata,
items.image_file_name, items.updated_at
FROM items LEFT OUTER JOIN part_numbers ON items.id=part_numbers.item_id
AND part_numbers.account_id='#{account_id}'
WHERE items.standard_part_number LIKE '#{part_number}%'
UNION ALL
SELECT items.id, items.standard_part_number,
items.standard_price, items.quantity,
part_numbers.value, items.metadata,
items.image_file_name, items.updated_at
FROM items LEFT OUTER JOIN part_numbers ON items.id=part_numbers.item_id
AND part_numbers.account_id='#{account_id}'
WHERE part_numbers.value LIKE '#{part_number}%'
ORDER BY items.standard_part_number
LIMIT '#{limit}' OFFSET '#{offset}'
I have the following indices; some of them may not be necessary, or I could be missing an index. Or worse, could having too many be working against the optimal performance configuration?
for items:
CREATE INDEX index_items_standard_part_number ON items (standard_part_number);
for part_numbers:
CREATE INDEX index_part_numbers_item_id ON part_numbers (item_id);
CREATE INDEX index_part_numbers_item_id_and_account_id ON part_numbers (item_id,account_id);
CREATE INDEX index_part_numbers_item_id_and_account_id_and_value ON part_numbers (item_id,account_id,value);
CREATE INDEX index_part_numbers_item_id_and_value ON part_numbers (item_id,value);
CREATE INDEX index_part_numbers_value ON part_numbers (value);
Update:
The schema for the tables listed above
CREATE TABLE accounts (id INTEGER PRIMARY KEY,name TEXT,code TEXT UNIQUE,created_at INTEGER,updated_at INTEGER,company_id INTEGER,standard BOOLEAN,price_list_id INTEGER);
CREATE TABLE items (id INTEGER PRIMARY KEY,standard_part_number TEXT UNIQUE,standard_price INTEGER,part_number TEXT,price INTEGER,quantity INTEGER,unit_of_measure TEXT,metadata TEXT,image_file_name TEXT,created_at INTEGER,updated_at INTEGER,company_id INTEGER);
CREATE TABLE part_numbers (id INTEGER PRIMARY KEY,value TEXT,item_id INTEGER,account_id INTEGER,created_at INTEGER,updated_at INTEGER,company_id INTEGER,standard BOOLEAN);
Outer joins constrain the join order, so you should not use them unless necessary.
In the second subquery, the WHERE part_numbers.value LIKE ... clause would filter out any unmatched records anyway, so you should drop that LEFT OUTER.
SQLite can use at most one index per table per (sub)query.
So to be able to use the same index for both searching and sorting, both operations must use the same collation.
LIKE uses a case-insensitive collation, so the ORDER BY should be declared to use the same (ORDER BY items.standard_part_number COLLATE NOCASE).
This is not possible if the part numbers must be sorted case sensitively.
This is not needed if SQLite does not actually use the same index for both (check with EXPLAIN QUERY PLAN).
In the first subquery, there is no index that could be used for the items.standard_part_number LIKE '#{part_number}%' search.
You would need an index like this (NOCASE is needed for LIKE):
CREATE INDEX iii ON items(standard_part_number COLLATE NOCASE);
In the second subquery, SQLite is likely to use part_numbers as the outer table in the join because it has two filtered columns.
An index for these two searches must look like this (with NOCASE only for the second column):
CREATE INDEX ppp ON part_numbers(account_id, value COLLATE NOCASE);
With all these changes, the query and its EXPLAIN QUERY PLAN output look like this:
EXPLAIN QUERY PLAN
SELECT items.id, items.standard_part_number,
items.standard_price, items.quantity,
part_numbers.value, items.metadata,
items.image_file_name, items.updated_at
FROM items LEFT OUTER JOIN part_numbers ON items.id=part_numbers.item_id
AND part_numbers.account_id='#{account_id}'
WHERE items.standard_part_number LIKE '#{part_number}%'
UNION ALL
SELECT items.id, items.standard_part_number,
items.standard_price, items.quantity,
part_numbers.value, items.metadata,
items.image_file_name, items.updated_at
FROM items JOIN part_numbers ON items.id=part_numbers.item_id
AND part_numbers.account_id='#{account_id}'
WHERE part_numbers.value LIKE '#{part_number}%'
ORDER BY items.standard_part_number COLLATE NOCASE
LIMIT -1 OFFSET 0;
1|0|0|SEARCH TABLE items USING INDEX iii (standard_part_number>? AND standard_part_number<?)
1|1|1|SEARCH TABLE part_numbers USING COVERING INDEX index_part_numbers_item_id_and_account_id_and_value (item_id=? AND account_id=?)
2|0|1|SEARCH TABLE part_numbers USING INDEX ppp (account_id=? AND value>? AND value<?)
2|1|0|SEARCH TABLE items USING INTEGER PRIMARY KEY (rowid=?)
2|0|0|USE TEMP B-TREE FOR ORDER BY
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)
The second subquery cannot use an index for sorting because part_numbers is not the outer table in the join, but the speedup from looking up both account_id and value through an index is likely to be greater than the slowdown from doing an explicit sorting step.
For this query alone, you could drop all indexes not mentioned here.
If the part numbers can be searched case sensitively, you should remove all the COLLATE NOCASE stuff and replace the LIKE searches with a case-sensitive search (partnum BETWEEN 'abc' AND 'abcz').
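Applied to the first subquery, that rewrite could look like this (a sketch; the trailing 'z' as an upper bound assumes part numbers contain no characters sorting above 'z'):

SELECT * FROM items
WHERE standard_part_number BETWEEN '#{part_number}' AND '#{part_number}z';

This comparison uses the default BINARY collation, so a plain index on standard_part_number (without COLLATE NOCASE) can serve both the search and the sort.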