I am trying to optimize the following SQL query:
select * from begin_transaction where ("group"->>'id')::bigint = '5'
Without any additional index I get this plan:
Gather (cost=1000.00..91957.50 rows=4179 width=750) (actual time=0.158..218.972 rows=715002 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on begin_transaction (cost=0.00..90539.60 rows=1741 width=750) (actual time=0.020..127.525 rows=238334 loops=3)
Filter: ((("group" ->> 'id'::text))::bigint = '5'::bigint)
Rows Removed by Filter: 40250
Planning Time: 0.039 ms
Execution Time: 235.200 ms
If I add a btree index:
CREATE INDEX begin_transaction_group_id_idx
ON public.begin_transaction USING btree (((("group"->>'id'::text))::bigint))
I receive
Bitmap Heap Scan on begin_transaction (cost=80.81..13773.97 rows=4179 width=750) (actual time=43.647..414.756 rows=715002 loops=1)
Recheck Cond: ((("group" ->> 'id'::text))::bigint = '5'::bigint)
Rows Removed by Index Recheck: 52117
Heap Blocks: exact=50534 lossy=33026
-> Bitmap Index Scan on begin_transaction_group_id_idx (cost=0.00..79.77 rows=4179 width=0) (actual time=35.852..35.852 rows=715002 loops=1)
Index Cond: ((("group" ->> 'id'::text))::bigint = '5'::bigint)
Planning Time: 0.045 ms
Execution Time: 429.968 ms
Any ideas how to go about increasing performance? The group field is jsonb.
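Two observations that may help frame this: the filter matches roughly 715,000 of about 755,000 rows, so for this particular value a sequential scan may simply be the cheapest plan, and the bitmap heap scan reports lossy heap blocks, which usually means the bitmap overflowed work_mem. A hedged experiment (the setting below is only illustrative) is to raise work_mem for the session and re-run the plan:
-- Illustrative value only; size work_mem to the server's RAM and concurrency.
SET work_mem = '64MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM begin_transaction
WHERE ("group"->>'id')::bigint = '5';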
Related
How to optimize an index for a substring of a column?
For example, take a column postal_code storing a string of 5 characters. If most of my queries filter on the first 2 characters, an index on the full column is not useful.
What if I create an index only on the substring:
CREATE INDEX ON index.annonces_parsed (left(postal_code, 2))
Is it a good solution, or is it better to add a new column storing only the substring and having an index on it?
A query using this index could be:
select *
from index.cities
where left(postal_code, 2) = '83' -- will it use the index on the substring?
Thanks so much
I have a test table which has 20 million records.
Test 1
CREATE INDEX test_a1_idx ON test (a1)
explain analyze
select * from test
where left(a1, 2) = '58'
Gather (cost=1000.00..103565.05 rows=40000 width=12) (actual time=0.429..468.428 rows=89712 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on test (cost=0.00..98565.05 rows=16667 width=12) (actual time=0.114..407.330 rows=29904 loops=3)
Filter: ("left"(a1, 2) = '58'::text)
Rows Removed by Filter: 2636765
Planning Time: 0.424 ms
Execution Time: 470.472 ms
explain analyze
select * from test
where a1 like '58%'
Gather (cost=1000.00..99284.01 rows=80523 width=12) (actual time=0.990..337.339 rows=89712 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on test (cost=0.00..90231.71 rows=33551 width=12) (actual time=0.233..278.740 rows=29904 loops=3)
Filter: (a1 ~~ '58%'::text)
Rows Removed by Filter: 2636765
Planning Time: 0.092 ms
Execution Time: 339.259 ms
Test 2
CREATE INDEX test_a1_idx1 ON test (left(a1, 2))
explain analyze
select * from test
where left(a1, 2) = '58'
Bitmap Heap Scan on test (cost=446.43..49455.46 rows=40000 width=12) (actual time=10.507..206.800 rows=89712 loops=1)
Recheck Cond: ("left"(a1, 2) = '58'::text)
Heap Blocks: exact=38298
-> Bitmap Index Scan on test_a1_idx1 (cost=0.00..436.43 rows=40000 width=0) (actual time=5.450..5.450 rows=89712 loops=1)
Index Cond: ("left"(a1, 2) = '58'::text)
Planning Time: 0.501 ms
Execution Time: 209.217 ms
explain analyze
select * from test
where a1 like '58%'
Gather (cost=1000.00..99284.01 rows=80523 width=12) (actual time=0.341..334.759 rows=89712 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on test (cost=0.00..90231.71 rows=33551 width=12) (actual time=0.110..287.313 rows=29904 loops=3)
Filter: (a1 ~~ '58%'::text)
Rows Removed by Filter: 2636765
Planning Time: 0.067 ms
Execution Time: 336.762 ms
Result:
It should be noted that the database does not use a plain column index when a function is applied to the column in the condition. For these cases, an expression (functional) index gives very good performance.
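Applied back to the postal_code example from the question, the same expression-index technique would look like this (a sketch only; I assume the table is index.annonces_parsed as in the question's CREATE INDEX statement, and the index name is made up):
-- Hypothetical index name
CREATE INDEX annonces_parsed_postal_prefix_idx
    ON index.annonces_parsed (left(postal_code, 2));
-- The query must repeat the exact indexed expression for the planner to use it
SELECT *
FROM index.annonces_parsed
WHERE left(postal_code, 2) = '83';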
I have two queries that are the same except for the values in the WHERE condition:
explain analyse select survey_contact_id, relation_id, count(survey_contact_id), count(relation_id) from nomination where survey_id = 1565 and account_id = 225 and deleted_at is NULL group by survey_contact_id, relation_id;
explain analyse select survey_contact_id, relation_id, count(survey_contact_id), count(relation_id) from nomination where survey_id = 888 and account_id = 12 and deleted_at is NULL group by survey_contact_id, relation_id;
When I run these two queries, they produce different plans.
For the first query the plan is:
GroupAggregate (cost=0.28..8.32 rows=1 width=24) (actual time=0.016..0.021 rows=4 loops=1)
Group Key: survey_contact_id, relation_id
-> Index Only Scan using test on nomination (cost=0.28..8.30 rows=1 width=8) (actual time=0.010..0.012 rows=5 loops=1)
Index Cond: ((account_id = 225) AND (survey_id = 1565))
Heap Fetches: 5
Planning time: 0.148 ms
Execution time: 0.058 ms
and for the second one:
GroupAggregate (cost=11.08..11.12 rows=2 width=24) (actual time=0.015..0.015 rows=0 loops=1)
Group Key: survey_contact_id, relation_id
-> Sort (cost=11.08..11.08 rows=2 width=8) (actual time=0.013..0.013 rows=0 loops=1)
Sort Key: survey_contact_id, relation_id
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on nomination (cost=4.30..11.07 rows=2 width=8) (actual time=0.008..0.008 rows=0 loops=1)
Recheck Cond: ((account_id = 12) AND (survey_id = 888) AND (deleted_at IS NULL))
-> Bitmap Index Scan on test (cost=0.00..4.30 rows=2 width=0) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: ((account_id = 12) AND (survey_id = 888))
Planning time: 0.149 ms
Execution time: 0.052 ms
Can anyone explain to me why Postgres is doing a Bitmap Scan instead of an Index Only Scan?
The short version is that Postgres has a cost-based approach, so it has estimated that the cost of doing so is less in the second case, based on the statistics it has.
In your case, the total cost (estimated) of each of these queries is 8.32 and 11.12 respectively. You may be able to see the cost of the index-only scan for the second query by running set enable_bitmapscan = off.
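For example, a quick way to compare the two plans in a single session (the setting goes back to normal after RESET or when the session ends):
SET enable_bitmapscan = off;  -- discourage bitmap scans for this session only
EXPLAIN ANALYZE
SELECT survey_contact_id, relation_id,
       count(survey_contact_id), count(relation_id)
FROM nomination
WHERE survey_id = 888 AND account_id = 12 AND deleted_at IS NULL
GROUP BY survey_contact_id, relation_id;
RESET enable_bitmapscan;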
Note that, based on its statistics, Postgres estimated that the first query would return 1 row (actually 4), and that the second would return 2 rows (actually 0).
There are several ways to get better statistics, but if analyze (or autovacuum) hasn't been run on that table for a while, that is a common cause of bad estimates. Another tell-tale that vacuum may not have been run recently (at least on this table) is the Heap Fetches: 5 you can see in the first query plan.
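A minimal sketch of doing that by hand (ANALYZE refreshes the planner's row estimates, VACUUM updates the visibility map and so reduces heap fetches in index-only scans):
VACUUM ANALYZE nomination;  -- refresh statistics and the visibility map in one pass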
I'm confused by the "when data set increases" part of your question, please do add more context on that front if relevant.
Finally, if you're not already planning a PostgreSQL upgrade, I highly recommend doing so soon. 9.6 is nearly out of support, and versions 10, 11, 12, and 13 each contained a host of performance-focused features.
I have a weird case of query execution performance here. The query has date values in the WHERE clause, and the execution speed varies with the value of the date.
Actually:
for dates within the last 30 days, execution takes around 3 minutes
for dates before the last 30 days, execution takes a few seconds
The query is listed below, with the date in the last 30 days range:
select
sk2_.code as col_0_0_,
bra4_.code as col_1_0_,
st0_.quantity as col_2_0_,
bat1_.forecast as col_3_0_
from
TBL_st st0_,
TBL_bat bat1_,
TBL_sk sk2_,
TBL_bra bra4_
where
st0_.batc_id=bat1_.id
and bat1_.sku_id=sk2_.id
and bat1_.bran_id=bra4_.id
and not (exists (select
1
from
TBL_st st6_,
TBL_bat bat7_,
TBL_sk sk10_
where
st6_.batc_id=bat7_.id
and bat7_.sku_id=sk10_.id
and bat7_.bran_id=bat1_.bran_id
and sk10_.code=sk2_.code
and st6_.date>st0_.date
and sk10_.acco_id=1
and st6_.date>='2017-04-20'
and st6_.date<='2017-04-30'))
and sk2_.acco_id=1
and st0_.date>='2017-04-20'
and st0_.date<='2017-04-30'
and here is the plan for the query with the date in the last 30 days range:
Nested Loop (cost=289.06..19764.03 rows=1 width=430) (actual time=3482.062..326049.246 rows=249 loops=1)
-> Nested Loop Anti Join (cost=288.91..19763.86 rows=1 width=433) (actual time=3482.023..326048.023 rows=249 loops=1)
Join Filter: ((st6_.date > st0_.date) AND ((sk10_.code)::text = (sk2_.code)::text))
Rows Removed by Join Filter: 210558
-> Nested Loop (cost=286.43..13719.38 rows=1 width=441) (actual time=4.648..2212.042 rows=2474 loops=1)
-> Nested Loop (cost=286.00..6871.33 rows=13335 width=436) (actual time=4.262..657.823 rows=666738 loops=1)
-> Index Scan using uk_TBL_sk0_account_code on TBL_sk sk2_ (cost=0.14..12.53 rows=1 width=426) (actual time=1.036..1.084 rows=50 loops=1)
Index Cond: (acco_id = 1)
-> Bitmap Heap Scan on TBL_bat bat1_ (cost=285.86..6707.27 rows=15153 width=26) (actual time=3.675..11.308 rows=13335 loops=50)
Recheck Cond: (sku_id = sk2_.id)
Heap Blocks: exact=241295
-> Bitmap Index Scan on ix_al_batc_sku_id (cost=0.00..282.07 rows=15153 width=0) (actual time=3.026..3.026 rows=13335 loops=50)
Index Cond: (sku_id = sk2_.id)
-> Index Scan using ix_al_stle_batc_id on TBL_st st0_ (cost=0.42..0.50 rows=1 width=21) (actual time=0.002..0.002 rows=0 loops=666738)
Index Cond: (batc_id = bat1_.id)
Filter: ((date >= '2017-04-20 00:00:00'::timestamp without time zone) AND (date <= '2017-04-30 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 1
-> Nested Loop (cost=2.49..3023.47 rows=1 width=434) (actual time=111.345..130.883 rows=86 loops=2474)
-> Hash Join (cost=2.06..2045.18 rows=1905 width=434) (actual time=0.010..28.028 rows=54853 loops=2474)
Hash Cond: (bat7_.sku_id = sk10_.id)
-> Index Scan using ix_al_batc_bran_id on TBL_bat bat7_ (cost=0.42..1667.31 rows=95248 width=24) (actual time=0.009..11.045 rows=54853 loops=2474)
Index Cond: (bran_id = bat1_.bran_id)
-> Hash (cost=1.63..1.63 rows=1 width=426) (actual time=0.026..0.026 rows=50 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 11kB
-> Seq Scan on TBL_sk sk10_ (cost=0.00..1.63 rows=1 width=426) (actual time=0.007..0.015 rows=50 loops=1)
Filter: (acco_id = 1)
-> Index Scan using ix_al_stle_batc_id on TBL_st st6_ (cost=0.42..0.50 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=135706217)
Index Cond: (batc_id = bat7_.id)
Filter: ((date >= '2017-04-20 00:00:00'::timestamp without time zone) AND (date <= '2017-04-30 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 1
-> Index Scan using TBL_bra_pk on TBL_bra bra4_ (cost=0.14..0.16 rows=1 width=13) (actual time=0.003..0.003 rows=1 loops=249)
Index Cond: (id = bat1_.bran_id)
Planning time: 8.108 ms
Execution time: 326049.583 ms
Here is the same query with the date before the last 30 days range:
select
sk2_.code as col_0_0_,
bra4_.code as col_1_0_,
st0_.quantity as col_2_0_,
bat1_.forecast as col_3_0_
from
TBL_st st0_,
TBL_bat bat1_,
TBL_sk sk2_,
TBL_bra bra4_
where
st0_.batc_id=bat1_.id
and bat1_.sku_id=sk2_.id
and bat1_.bran_id=bra4_.id
and not (exists (select
1
from
TBL_st st6_,
TBL_bat bat7_,
TBL_sk sk10_
where
st6_.batc_id=bat7_.id
and bat7_.sku_id=sk10_.id
and bat7_.bran_id=bat1_.bran_id
and sk10_.code=sk2_.code
and st6_.date>st0_.date
and sk10_.acco_id=1
and st6_.date>='2017-01-20'
and st6_.date<='2017-01-30'))
and sk2_.acco_id=1
and st0_.date>='2017-01-20'
and st0_.date<='2017-01-30'
and here is the plan for the query with the date before the last 30 days range:
Hash Join (cost=576.33..27443.95 rows=48 width=430) (actual time=132.732..3894.554 rows=250 loops=1)
Hash Cond: (bat1_.bran_id = bra4_.id)
-> Merge Anti Join (cost=572.85..27439.82 rows=48 width=433) (actual time=132.679..3894.287 rows=250 loops=1)
Merge Cond: ((sk2_.code)::text = (sk10_.code)::text)
Join Filter: ((st6_.date > st0_.date) AND (bat7_.bran_id = bat1_.bran_id))
Rows Removed by Join Filter: 84521
-> Nested Loop (cost=286.43..13719.38 rows=48 width=441) (actual time=26.105..1893.523 rows=2491 loops=1)
-> Nested Loop (cost=286.00..6871.33 rows=13335 width=436) (actual time=1.159..445.683 rows=666738 loops=1)
-> Index Scan using uk_TBL_sk0_account_code on TBL_sk sk2_ (cost=0.14..12.53 rows=1 width=426) (actual time=0.035..0.084 rows=50 loops=1)
Index Cond: (acco_id = 1)
-> Bitmap Heap Scan on TBL_bat bat1_ (cost=285.86..6707.27 rows=15153 width=26) (actual time=1.741..7.148 rows=13335 loops=50)
Recheck Cond: (sku_id = sk2_.id)
Heap Blocks: exact=241295
-> Bitmap Index Scan on ix_al_batc_sku_id (cost=0.00..282.07 rows=15153 width=0) (actual time=1.119..1.119 rows=13335 loops=50)
Index Cond: (sku_id = sk2_.id)
-> Index Scan using ix_al_stle_batc_id on TBL_st st0_ (cost=0.42..0.50 rows=1 width=21) (actual time=0.002..0.002 rows=0 loops=666738)
Index Cond: (batc_id = bat1_.id)
Filter: ((date >= '2017-01-20 00:00:00'::timestamp without time zone) AND (date <= '2017-01-30 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 1
-> Materialize (cost=286.43..13719.50 rows=48 width=434) (actual time=15.584..1986.953 rows=84560 loops=1)
-> Nested Loop (cost=286.43..13719.38 rows=48 width=434) (actual time=15.577..1983.384 rows=2491 loops=1)
-> Nested Loop (cost=286.00..6871.33 rows=13335 width=434) (actual time=0.843..482.864 rows=666738 loops=1)
-> Index Scan using uk_TBL_sk0_account_code on TBL_sk sk10_ (cost=0.14..12.53 rows=1 width=426) (actual time=0.005..0.052 rows=50 loops=1)
Index Cond: (acco_id = 1)
-> Bitmap Heap Scan on TBL_bat bat7_ (cost=285.86..6707.27 rows=15153 width=24) (actual time=2.051..7.902 rows=13335 loops=50)
Recheck Cond: (sku_id = sk10_.id)
Heap Blocks: exact=241295
-> Bitmap Index Scan on ix_al_batc_sku_id (cost=0.00..282.07 rows=15153 width=0) (actual time=1.424..1.424 rows=13335 loops=50)
Index Cond: (sku_id = sk10_.id)
-> Index Scan using ix_al_stle_batc_id on TBL_st st6_ (cost=0.42..0.50 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=666738)
Index Cond: (batc_id = bat7_.id)
Filter: ((date >= '2017-01-20 00:00:00'::timestamp without time zone) AND (date <= '2017-01-30 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 1
-> Hash (cost=2.10..2.10 rows=110 width=13) (actual time=0.033..0.033 rows=110 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 14kB
-> Seq Scan on TBL_bra bra4_ (cost=0.00..2.10 rows=110 width=13) (actual time=0.004..0.013 rows=110 loops=1)
Planning time: 14.542 ms
Execution time: 3894.793 ms
Does anyone have an idea why this happens?
Has anyone had experience with anything similar?
Thank you very much.
Kind regards, Petar
I am not sure, but I had a similar case a while ago (on Oracle, but I guess that is not important).
In my case the difference came from the amount of data involved: if you have 1% of the data in the past 30 days, it uses the indexes; when you need "older" data (the remaining 99% of the data), it decides not to use the index and does a full scan (in the form of a nested loop rather than a hash join).
If you are sure that the data distribution is OK, then maybe try collecting statistics (that worked for me at the time). Eventually you can analyze every piece of this query to see exactly which part is the bottleneck and work from there.
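In PostgreSQL, collecting fresh statistics is just an ANALYZE on the tables involved; a minimal sketch using the table names from the query:
ANALYZE TBL_st;
ANALYZE TBL_bat;
ANALYZE TBL_sk;
ANALYZE TBL_bra;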
BTree indexes can have some issues with dates, especially if you're removing old data from the table (ie, deleting everything older than 90 days). It can cause the tables to get lopsided, with all of the rows being down one branch of the tree. Even without removing old dates, if there are many more "new" rows than "old" rows, it can still happen.
But I don't see your query plans using an index on st0_.date, so I don't think that's the issue. If you can afford a table lock on st0_, you can test this theory by running a REINDEX operation on any indexes that contain st0_.date.
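A sketch of that test; the index name below is hypothetical, since none of the plans shows an index on the date column, and note that REINDEX blocks writes to the table while it runs:
-- Hypothetical index name; substitute whichever index covers TBL_st.date
REINDEX INDEX ix_tbl_st_date;
-- or rebuild every index on the table
REINDEX TABLE TBL_st;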
Instead, I think you just have a lot more rows that match the 2017-01-20 to 2017-01-30 range vs. the 2017-04-20 to 2017-04-30 range. The first doubly indented Nested Loop is the same in both queries, so I'll ignore it. The second doubly indented stanza is different, and much more expensive in the slow query:
-> Materialize (cost=286.43..13719.50 rows=48 width=434) (actual time=15.584..1986.953 rows=84560 loops=1)
-> Nested Loop (cost=286.43..13719.38 rows=48 width=434) (actual time=15.577..1983.384 rows=2491 loops=1)
-> Nested Loop (cost=286.00..6871.33 rows=13335 width=434) (actual time=0.843..482.864 rows=666738
vs
-> Nested Loop (cost=2.49..3023.47 rows=1 width=434) (actual time=111.345..130.883 rows=86 loops=2474)
-> Hash Join (cost=2.06..2045.18 rows=1905 width=434) (actual time=0.010..28.028 rows=54853 loops=2474)
Materialize can be an expensive operation that doesn't necessarily scale with the estimated cost. Take a look at https://www.postgresql.org/docs/10/static/using-explain.html, and search for "Materialize". Also note that the estimated number of rows is much higher in the slow version.
I'm not 100% sure, but I believe that tweaking the "work_mem" parameter can have some effect in this area (https://www.postgresql.org/docs/9.4/static/runtime-config-resource.html#GUC-WORK-MEM). To test this theory, you can change that value per session using
SET LOCAL work_mem = '8MB';
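One detail worth adding: SET LOCAL only takes effect inside a transaction block, so a one-off test would look roughly like this (the value is illustrative):
BEGIN;
SET LOCAL work_mem = '8MB';   -- applies only until COMMIT or ROLLBACK
-- EXPLAIN ANALYZE <the query under test> goes here
ROLLBACK;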
Recently we changed the format of one of our tables from a single entry in a column to a JSONB column in the format of ["key1","key2","key3"] etc. Although we built a GIN index on the JSONB field, the queries we run against it are EXTREMELY slow (in the range of 50 minutes in the explain plan). I am trying to find a way to optimize the query and to correctly utilize the index. I pasted the query below, as well as its explain plan. The indexed fields are visit.visitor, launch.campaign_key, launch.launch_key and visit.store_key, plus a GIN index on the visit.stops JSONB field. We are using PostgreSQL 9.4.
explain (analyze on) select count(subselect.visitors) as visitors,
subselect.campaign as campaign
from (
select distinct visit.visitor as visitors,
launch.campaign_key as campaign
from visit
join launch on (jsonb_exists(visit.stops, launch.launch_key)) where
visit.store_key = 'ahBzfmdlYXJsYXVuY2gtaHVi'
and launch.state = 'PRODUCTION') as subselect group by subselect.campaign
Explain results:
HashAggregate (cost=63873548.47..63873550.47 rows=200 width=88) (actual time=248617.348..248617.365 rows=58 loops=1)
Group Key: launch.campaign_key
-> HashAggregate (cost=63519670.22..63661221.52 rows=14155130 width=88) (actual time=248587.320..248616.558 rows=1938 loops=1)
Group Key: visit.visitor, launch.campaign_key
-> HashAggregate (cost=63307343.27..63448894.57 rows=14155130 width=88) (actual time=248553.278..248584.868 rows=1938 loops=1)
Group Key: visit.visitor, launch.campaign_key
-> Nested Loop (cost=4903.09..56997885.96 rows=1261891461 width=88) (actual time=180648.410..248550.249 rows=2085 loops=1)
Join Filter: jsonb_exists(visit.stops, (launch.launch_key)::text)
Rows Removed by Join Filter: 624114512
-> Bitmap Heap Scan on launch (cost=3213.19..126084.38 rows=169389 width=123) (actual time=32.082..317.561 rows=166121 loops=1)
Recheck Cond: ((state)::text = 'PRODUCTION'::text)
Heap Blocks: exact=56635
-> Bitmap Index Scan on launch_state_idx (cost=0.00..3170.85 rows=169389 width=0) (actual time=21.172..21.172 rows=166121 loops=1)
Index Cond: ((state)::text = 'PRODUCTION'::text)
-> Materialize (cost=1689.89..86736.04 rows=22349 width=117) (actual time=0.000..0.487 rows=3757 loops=166121)
-> Bitmap Heap Scan on visit (cost=1689.89..86624.29 rows=22349 width=117) (actual time=1.324..14.381 rows=3757 loops=1)
Recheck Cond: ((store_key)::text = 'ahBzfmdlYXJsYXVuY2gtaHVicg8LEgVTdG9yZRinzbKcDQw'::text)
Heap Blocks: exact=3672
-> Bitmap Index Scan on visit_store_key_idx (cost=0.00..1684.31 rows=22349 width=0) (actual time=0.780..0.780 rows=3757 loops=1)
Index Cond: ((store_key)::text = 'ahBzfmdlYXJsYXVuY2gtaHVicg8LEgVTdG9yZRinzbKcDQw'::text)
Planning time: 0.232 ms
Execution time: 248708.088 ms
I should mention the index on stops is built as
CREATE INDEX ON visit USING GIN (stops)
I'm wondering whether switching to building it as
CREATE INDEX ON visit USING GIN ((stops->'value'))
will resolve the issue?
The wrapper function jsonb_exists() prevents the use of the GIN index on visit.stops. Instead of
from visit
join launch on (jsonb_exists(visit.stops, launch.launch_key))
try
from visit
join launch on visit.stops ? launch.launch_key::text
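The default GIN operator class on a jsonb column supports the ? operator, so the existing index on visit.stops can serve it. A sketch of the full query with only the join condition changed:
SELECT count(subselect.visitors) AS visitors,
       subselect.campaign AS campaign
FROM (
    SELECT DISTINCT visit.visitor AS visitors,
           launch.campaign_key AS campaign
    FROM visit
    JOIN launch ON visit.stops ? launch.launch_key::text
    WHERE visit.store_key = 'ahBzfmdlYXJsYXVuY2gtaHVi'
      AND launch.state = 'PRODUCTION'
) AS subselect
GROUP BY subselect.campaign;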
I have a table which stores data in a JSONB column.
Now, what I want to do is query that table and fetch records which have specific values for a key.
This works fine:
SELECT "documents".*
FROM "documents"
WHERE (data #> '{"type": "foo"}')
But what I want to do is fetch all the rows in the table which have type foo OR bar.
I tried this:
SELECT "documents".*
FROM "documents"
WHERE (data #> '{"type": ["foo", "bar"]}')
But this doesn't seem to work.
I also tried this:
SELECT "documents".*
FROM "documents"
WHERE (data->'type' ?| array['foo', 'bar'])
Which works, but if I specify a key like so data->'type' it takes away the dynamicity of the query.
BTW, I am using Ruby on Rails with Postgres, so all the queries are going through ActiveRecord. This is how:
Document.where("data #> ?", query)
if I specify a key like so data->'type' it takes away the dynamicity of the query.
I understand you have a gin index on the column data defined like this:
CREATE INDEX ON documents USING GIN (data);
The index works for this query:
EXPLAIN ANALYSE
SELECT "documents".*
FROM "documents"
WHERE data #> '{"type": "foo"}';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on documents (cost=30.32..857.00 rows=300 width=25) (actual time=0.639..0.640 rows=1 loops=1)
Recheck Cond: (data #> '{"type": "foo"}'::jsonb)
Heap Blocks: exact=1
-> Bitmap Index Scan on documents_data_idx (cost=0.00..30.25 rows=300 width=0) (actual time=0.581..0.581 rows=1 loops=1)
Index Cond: (data #> '{"type": "foo"}'::jsonb)
Planning time: 7.928 ms
Execution time: 0.841 ms
but not for this one:
EXPLAIN ANALYSE
SELECT "documents".*
FROM "documents"
WHERE (data->'type' ?| array['foo', 'bar']);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Seq Scan on documents (cost=0.00..6702.98 rows=300 width=25) (actual time=31.895..92.813 rows=2 loops=1)
Filter: ((data -> 'type'::text) ?| '{foo,bar}'::text[])
Rows Removed by Filter: 299997
Planning time: 1.836 ms
Execution time: 92.839 ms
Solution 1. Use the operator #> twice, the index will be used for both conditions:
EXPLAIN ANALYSE
SELECT "documents".*
FROM "documents"
WHERE data #> '{"type": "foo"}'
OR data #> '{"type": "bar"}';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on documents (cost=60.80..1408.13 rows=600 width=25) (actual time=0.222..0.233 rows=2 loops=1)
Recheck Cond: ((data #> '{"type": "foo"}'::jsonb) OR (data #> '{"type": "bar"}'::jsonb))
Heap Blocks: exact=2
-> BitmapOr (cost=60.80..60.80 rows=600 width=0) (actual time=0.204..0.204 rows=0 loops=1)
-> Bitmap Index Scan on documents_data_idx (cost=0.00..30.25 rows=300 width=0) (actual time=0.144..0.144 rows=1 loops=1)
Index Cond: (data #> '{"type": "foo"}'::jsonb)
-> Bitmap Index Scan on documents_data_idx (cost=0.00..30.25 rows=300 width=0) (actual time=0.059..0.059 rows=1 loops=1)
Index Cond: (data #> '{"type": "bar"}'::jsonb)
Planning time: 3.170 ms
Execution time: 0.289 ms
Solution 2. Create an additional index on (data->'type'):
CREATE INDEX ON documents USING GIN ((data->'type'));
EXPLAIN ANALYSE
SELECT "documents".*
FROM "documents"
WHERE (data->'type' ?| array['foo', 'bar']);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on documents (cost=30.32..857.75 rows=300 width=25) (actual time=0.056..0.067 rows=2 loops=1)
Recheck Cond: ((data -> 'type'::text) ?| '{foo,bar}'::text[])
Heap Blocks: exact=2
-> Bitmap Index Scan on documents_expr_idx (cost=0.00..30.25 rows=300 width=0) (actual time=0.035..0.035 rows=2 loops=1)
Index Cond: ((data -> 'type'::text) ?| '{foo,bar}'::text[])
Planning time: 2.951 ms
Execution time: 0.108 ms
Solution 3. In fact this is a variant of solution 1, with a different format of the condition, which may be more convenient for the client program:
EXPLAIN ANALYSE
SELECT "documents".*
FROM "documents"
WHERE data #> any(array['{"type": "foo"}', '{"type": "bar"}']::jsonb[]);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on documents (cost=60.65..1544.20 rows=600 width=26) (actual time=0.803..0.819 rows=2 loops=1)
Recheck Cond: (data #> ANY ('{"{\"type\": \"foo\"}","{\"type\": \"bar\"}"}'::jsonb[]))
Heap Blocks: exact=2
-> Bitmap Index Scan on documents_data_idx (cost=0.00..60.50 rows=600 width=0) (actual time=0.778..0.778 rows=2 loops=1)
Index Cond: (data #> ANY ('{"{\"type\": \"foo\"}","{\"type\": \"bar\"}"}'::jsonb[]))
Planning time: 2.080 ms
Execution time: 0.304 ms
(7 rows)
Read more in the documentation.