I have two identical queries that differ only in their WHERE condition values:
explain analyse select survey_contact_id, relation_id, count(survey_contact_id), count(relation_id) from nomination where survey_id = 1565 and account_id = 225 and deleted_at is NULL group by survey_contact_id, relation_id;
explain analyse select survey_contact_id, relation_id, count(survey_contact_id), count(relation_id) from nomination where survey_id = 888 and account_id = 12 and deleted_at is NULL group by survey_contact_id, relation_id;
When I run these two queries, they produce different plans.
For the first query the plan is:
GroupAggregate (cost=0.28..8.32 rows=1 width=24) (actual time=0.016..0.021 rows=4 loops=1)
Group Key: survey_contact_id, relation_id
-> Index Only Scan using test on nomination (cost=0.28..8.30 rows=1 width=8) (actual time=0.010..0.012 rows=5 loops=1)
Index Cond: ((account_id = 225) AND (survey_id = 1565))
Heap Fetches: 5
Planning time: 0.148 ms
Execution time: 0.058 ms
and for the second one:
GroupAggregate (cost=11.08..11.12 rows=2 width=24) (actual time=0.015..0.015 rows=0 loops=1)
Group Key: survey_contact_id, relation_id
-> Sort (cost=11.08..11.08 rows=2 width=8) (actual time=0.013..0.013 rows=0 loops=1)
Sort Key: survey_contact_id, relation_id
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on nomination (cost=4.30..11.07 rows=2 width=8) (actual time=0.008..0.008 rows=0 loops=1)
Recheck Cond: ((account_id = 12) AND (survey_id = 888) AND (deleted_at IS NULL))
-> Bitmap Index Scan on test (cost=0.00..4.30 rows=2 width=0) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: ((account_id = 12) AND (survey_id = 888))
Planning time: 0.149 ms
Execution time: 0.052 ms
Can anyone explain why Postgres is doing a Bitmap Scan instead of an Index Only Scan?
The short version is that Postgres uses a cost-based optimizer: based on the statistics it has, it estimated that the bitmap scan is cheaper in the second case.
In your case, the total cost (estimated) of each of these queries is 8.32 and 11.12 respectively. You may be able to see the cost of the index-only scan for the second query by running set enable_bitmapscan = off.
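For example, a rough way to compare (assuming you can change planner settings for your session; they revert on RESET or when the session ends):
set enable_bitmapscan = off;
explain analyse select survey_contact_id, relation_id, count(survey_contact_id), count(relation_id) from nomination where survey_id = 888 and account_id = 12 and deleted_at is NULL group by survey_contact_id, relation_id;
reset enable_bitmapscan;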
Note that, based on its statistics, Postgres estimated that the first query would return 1 row (actually 4), and that the second would return 2 rows (actually 0).
There are several ways to get better statistics, but if analyze (or autovacuum) hasn't been run on that table for a while, that is a common cause of bad estimates. Another tell-tale that vacuum may not have been run recently (at least on this table) is the Heap Fetches: 5 you can see in the first query plan.
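If that's the case, something along these lines should help (a sketch; the table name is taken from your plans, and pg_stat_user_tables is a standard view that shows when vacuum/analyze last ran):
vacuum (analyze) nomination;
select relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
from pg_stat_user_tables
where relname = 'nomination';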
I'm confused by the "when data set increases" part of your question; please add more context on that front if relevant.
Finally, if you're not already planning a PostgreSQL upgrade, I highly recommend doing so soon. 9.6 is nearly out of support, and versions 10, 11, 12, and 13 each contained a host of performance-focused features.
Setup
A table with one jsonb column attributes and a non unique numeric ID campaignid:
CREATE TABLE coupons (
id integer NOT NULL,
created timestamp with time zone DEFAULT now() NOT NULL,
campaignid bigint NOT NULL,
attributes jsonb NOT NULL
);
This table would have up to 500M rows, arbitrary key/values in attributes and hundreds of different campaignid values.
Two indexes exist on the table:
CREATE INDEX campaignid_attrs_idx ON coupons
USING gin (campaignid, attributes);
CREATE INDEX campaignid_idx ON coupons
USING btree (campaignid, deleted);
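(Note: the composite GIN index above assumes the btree_gin extension is installed, since plain GIN has no operator class for a bigint column like campaignid.)
-- assumed prerequisite for the composite GIN index above
CREATE EXTENSION IF NOT EXISTS btree_gin;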
What I did
I executed the query:
SELECT COUNT(*)
FROM coupons
WHERE
(campaignid = 97 AND
attributes #> '{"CountryId": 3}');
Expected Results
I expected the index campaignid_attrs_idx on (campaignid,attributes) to be fully used and the query to complete quite fast.
Actual Result
The query took a long time (~40 seconds) to execute.
Here's the output from explain (ANALYZE, COSTS):
Aggregate (cost=32337.78..32337.79 rows=1 width=8) (actual time=39726.410..39726.414 rows=1 loops=1)
-> Bitmap Heap Scan on coupons (cost=30164.40..32332.44 rows=2136 width=0) (actual time=16893.439..39549.891 rows=1088478 loops=1)
" Recheck Cond: ((attributes #> '{""CountryId"": 3}'::jsonb) AND (campaignid = 97))"
Rows Removed by Index Recheck: 10531586
Heap Blocks: exact=138344 lossy=583282
-> BitmapAnd (cost=30164.40..30164.40 rows=2136 width=0) (actual time=16837.885..16837.887 rows=0 loops=1)
-> Bitmap Index Scan on coupons_campaignid_attrs_index (cost=0.00..1465.15 rows=178954 width=0) (actual time=9872.279..9872.279 rows=81799565 loops=1)
" Index Cond: (attributes #> '{""CountryId"": 3}'::jsonb)"
-> Bitmap Index Scan on campaignid_idx (cost=0.00..28697.93 rows=2135515 width=0) (actual time=6454.972..6454.972 rows=3088167 loops=1)
Index Cond: (campaignid = 97)
Planning Time: 0.175 ms
Execution Time: 39726.480 ms
Conclusions
It seems the index campaignid_attrs_idx was used for the first part of the query, attributes #> '{"CountryId": 3}', returning ~80M rows, while the index campaignid_idx was used for the second part of the WHERE clause, campaignid = 97, returning ~3M rows in parallel. The results of both scans were then ANDed together to arrive at a set that satisfies both conditions. Finally, a Bitmap Heap Scan verified that the result set actually matches the conditions, and that step took most of the time (16893.439..39549.891).
My main question: why wasn't campaignid_attrs_idx used to filter both conditions?
EDIT: I removed the second index, campaignid_idx, to see whether the multicolumn GIN index would then be used for both conditions. Strangely, I still see only one of the conditions used in the index scan. Here's the plan:
Aggregate (cost=181951.27..181951.28 rows=1 width=8) (actual time=209633.017..209633.018 rows=1 loops=1)
-> Bitmap Heap Scan on coupons (cost=1424.30..181945.81 rows=2183 width=0) (actual time=8938.605..209401.433 rows=1091580 loops=1)
" Recheck Cond: (attributes #> '{""CountryId"": 3}'::jsonb)"
Rows Removed by Index Recheck: 31487517
Filter: (campaignid = 97)
Rows Removed by Filter: 80674951
Heap Blocks: exact=121875 lossy=5572599
-> Bitmap Index Scan on coupons_campaignid_attributes_idx (cost=0.00..1423.75 rows=179434 width=0) (actual time=8908.682..8908.682 rows=81802589 loops=1)
" Index Cond: (attributes #> '{""CountryId"": 3}'::jsonb)"
Planning Time: 6.885 ms
Execution Time: 209638.234 ms
I am trying to work out query optimisation on id and am not sure which approach I should use. Below are the query plans from EXPLAIN; cost-wise they look similar.
1. explain (analyze, buffers) SELECT * FROM table1 WHERE id = ANY (ARRAY['00e289b0-1ac8-451f-957f-e00bc289148e'::uuid,...]);
QUERY PLAN:
Index Scan using table1_pkey on table1 (cost=0.42..641.44 rows=76 width=835) (actual time=0.258..2.603 rows=76 loops=1)
Index Cond: (id = ANY ('{00e289b0-1ac8-451f-957f-e00bc289148e,...}'::uuid[]))
Buffers: shared hit=231 read=73
Planning Time: 0.487 ms
Execution Time: 2.715 ms
2. explain (analyze, buffers) SELECT * FROM table1 WHERE id = ANY (VALUES ('00e289b0-1ac8-451f-957f-e00bc289148e'::uuid),...);
QUERY PLAN:
Nested Loop (cost=1.56..644.10 rows=76 width=835) (actual time=0.058..0.297 rows=76 loops=1)
Buffers: shared hit=304
-> HashAggregate (cost=1.14..1.90 rows=76 width=16) (actual time=0.049..0.060 rows=76 loops=1)
Group Key: "*VALUES*".column1
-> Values Scan on "*VALUES*" (cost=0.00..0.95 rows=76 width=16) (actual time=0.006..0.022 rows=76 loops=1)
-> Index Scan using table1_pkey on table1 (cost=0.42..8.44 rows=1 width=835) (actual time=0.002..0.003 rows=1 loops=76)
Index Cond: (id = "*VALUES*".column1)
Buffers: shared hit=304
Planning Time: 0.437 ms
Execution Time: 0.389 ms
It looks like VALUES () does some hashing and a join to improve performance, but I am not sure.
NOTE: In my practical use case, id is generated with uuid_generate_v4(), e.g. d31cddc0-1771-4de8-ad41-e6c568b39a5d, but the column may not be indexed as such.
Also, the table has 5-10 million records.
Which way is for the better query performance?
Both options seem reasonable. I would, however, suggest avoiding casts on the column you filter on; instead, cast the literal values to uuid:
SELECT *
FROM table1
WHERE id = ANY (ARRAY['00e289b0-1ac8-451f-957f-e00bc289148e'::uuid, ...]);
This should allow the database to take advantage of an index on column id.
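For illustration, this is the difference meant (the first form is a hypothetical anti-pattern, not taken from your query):
-- avoid: casting the indexed column prevents the planner from using table1_pkey
-- SELECT * FROM table1 WHERE id::text = '00e289b0-1ac8-451f-957f-e00bc289148e';
-- prefer: cast the literal instead, so the index on id can be used directly
SELECT * FROM table1 WHERE id = '00e289b0-1ac8-451f-957f-e00bc289148e'::uuid;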
A few weeks ago our team ran into difficulties with an SQL query because the data volume had increased a lot.
We would appreciate any advice on how we can update the schema or optimize the query while keeping the status-filtering logic the same.
In a nutshell:
We have two tables, a and b. b has an FK to a (many-to-one).
Table a:

 id | processed
----+-----------
  1 | TRUE
  2 | TRUE

Table b:

 a_id | status | type_id | l_id
------+--------+---------+------
    1 | '1'    |       5 |  105
    1 | '3'    |       6 |  105
    2 | '2'    |       7 |  105
We can have only one status for a unique combination of (l_id, type_id, a_id).
We need to calculate the count of a rows, filtered by the statuses from b, grouped by a_id.
Table a has 5 300 000 rows and table b has 750 000 000 rows.
So we need to calculate the status for each a row by the following rules. For a given a_id there are x rows in b:
1) If at least one status among x equals '3', then the status for a_id is '3'.
2) If all statuses among x equal '1', then the status is '1'.
And so on.
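For illustration, the first two rules could be expressed directly with boolean aggregates (only a sketch of the rules, not the query we actually run; names are taken from the tables above and the remaining rules and filters are left out):
SELECT a_id,
       CASE
           WHEN bool_or(status = '3')  THEN '3'  -- rule 1: at least one '3'
           WHEN bool_and(status = '1') THEN '1'  -- rule 2: all '1'
           ELSE NULL                             -- further rules elided ("and so on")
       END AS derived_status
FROM b
WHERE l_id = 105
GROUP BY a_id;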
In the current approach we use the array_agg() function to filter the sub-selection, so our query looks like:
SELECT COUNT(*)
FROM (
SELECT
FROM (
SELECT at.id as id,
BOOL_AND(bt.processed) AS not_pending,
ARRAY_AGG(DISTINCT bt.status) AS status
FROM a AS at
LEFT OUTER JOIN b AS bt
ON (at.id = bt.a_id AND bt.l_id = 105 AND
bt.type_id IN (2,10,18,1,4,5,6))
WHERE at.processed = True
GROUP BY at.id) sub
WHERE not_pending = True
AND status <# ARRAY ['1']::"char"[]
) counter
;
Our plan looks like:
Aggregate (cost=14665999.33..14665999.34 rows=1 width=8) (actual time=1875987.846..1875987.846 rows=1 loops=1)
-> GroupAggregate (cost=14166691.70..14599096.58 rows=5352220 width=37) (actual time=1875987.844..1875987.844 rows=0 loops=1)
Group Key: at.id
Filter: (bool_and(bt.processed) AND (array_agg(DISTINCT bt.status) <# '{1}'::"char"[]))
Rows Removed by Filter: 5353930
-> Sort (cost=14166691.70..14258067.23 rows=36550213 width=6) (actual time=1860315.593..1864175.762 rows=37430745 loops=1)
Sort Key: at.id
Sort Method: external merge Disk: 586000kB
-> Hash Right Join (cost=1135654.48..8076230.39 rows=36550213 width=6) (actual time=55665.584..1846965.271 rows=37430745 loops=1)
Hash Cond: (bt.a_id = at.id)
-> Bitmap Heap Scan on b bt (cost=882095.79..7418660.65 rows=36704370 width=6) (actual time=51871.658..1826058.186 rows=37430378 loops=1)
Recheck Cond: ((l_id = 105) AND (type_id = ANY ('{2,10,18,1,4,5,6}'::integer[])))
Rows Removed by Index Recheck: 574462752
Heap Blocks: exact=28898 lossy=5726508
-> Bitmap Index Scan on db_page_index_atableobjects (cost=0.00..872919.69 rows=36704370 width=0) (actual time=51861.815..51861.815 rows=37586483 loops=1)
Index Cond: ((l_id = 105) AND (type_id = ANY ('{2,10,18,1,4,5,6}'::integer[])))
-> Hash (cost=165747.94..165747.94 rows=5352220 width=4) (actual time=3791.710..3791.710 rows=5353930 loops=1)
Buckets: 131072 Batches: 128 Memory Usage: 2507kB
-> Seq Scan on a at (cost=0.00..165747.94 rows=5352220 width=4) (actual time=0.528..2958.004 rows=5353930 loops=1)
Filter: processed
Rows Removed by Filter: 18659
Planning time: 0.328 ms
Execution time: 1876066.242 ms
As you can see, the query execution time is immense, and we would like to get it below 30 seconds at least.
We have already tried some approaches such as using bitor() instead of array_agg(), and a LATERAL JOIN, but they didn't give us the desired performance, so we decided to use materialized views for now. We are still looking for any other solution and would really appreciate suggestions!
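A sketch of what such a materialized view could look like (the view name is a placeholder; the body just mirrors the inner aggregation of the query above):
CREATE MATERIALIZED VIEW a_status_summary AS
SELECT at.id,
       BOOL_AND(bt.processed) AS not_pending,
       ARRAY_AGG(DISTINCT bt.status) AS status
FROM a AS at
LEFT OUTER JOIN b AS bt
  ON at.id = bt.a_id AND bt.l_id = 105 AND bt.type_id IN (2,10,18,1,4,5,6)
WHERE at.processed = True
GROUP BY at.id;

-- refreshed periodically; the count then runs against the much smaller summary
REFRESH MATERIALIZED VIEW a_status_summary;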
Plan with track_io_timing enabled:
Aggregate (cost=14665999.33..14665999.34 rows=1 width=8) (actual time=2820945.285..2820945.285 rows=1 loops=1)
Buffers: shared hit=23 read=5998844, temp read=414465 written=414880
I/O Timings: read=2655805.505
-> GroupAggregate (cost=14166691.70..14599096.58 rows=5352220 width=930) (actual time=2820945.283..2820945.283 rows=0 loops=1)
Group Key: at.id
Filter: (bool_and(bt.processed) AND (array_agg(DISTINCT bt.status) <# '{1}'::"char"[]))
Rows Removed by Filter: 5353930
Buffers: shared hit=23 read=5998844, temp read=414465 written=414880
I/O Timings: read=2655805.505
-> Sort (cost=14166691.70..14258067.23 rows=36550213 width=6) (actual time=2804900.123..2808826.358 rows=37430745 loops=1)
Sort Key: at.id
Sort Method: external merge Disk: 586000kB
Buffers: shared hit=18 read=5998840, temp read=414465 written=414880
I/O Timings: read=2655805.491
-> Hash Right Join (cost=1135654.48..8076230.39 rows=36550213 width=6) (actual time=55370.788..2791441.542 rows=37430745 loops=1)
Hash Cond: (bt.a_id = at.id)
Buffers: shared hit=15 read=5998840, temp read=142879 written=142625
I/O Timings: read=2655805.491
-> Bitmap Heap Scan on b bt (cost=882095.79..7418660.65 rows=36704370 width=6) (actual time=51059.047..2769127.810 rows=37430378 loops=1)
Recheck Cond: ((l_id = 105) AND (type_id = ANY ('{2,10,18,1,4,5,6}'::integer[])))
Rows Removed by Index Recheck: 574462752
Heap Blocks: exact=28898 lossy=5726508
Buffers: shared hit=13 read=5886842
I/O Timings: read=2653254.939
-> Bitmap Index Scan on db_page_index_atableobjects (cost=0.00..872919.69 rows=36704370 width=0) (actual time=51049.365..51049.365 rows=37586483 loops=1)
Index Cond: ((l_id = 105) AND (type_id = ANY ('{2,10,18,1,4,5,6}'::integer[])))
Buffers: shared hit=12 read=131437
I/O Timings: read=49031.671
-> Hash (cost=165747.94..165747.94 rows=5352220 width=4) (actual time=4309.761..4309.761 rows=5353930 loops=1)
Buckets: 131072 Batches: 128 Memory Usage: 2507kB
Buffers: shared hit=2 read=111998, temp written=15500
I/O Timings: read=2550.551
-> Seq Scan on a at (cost=0.00..165747.94 rows=5352220 width=4) (actual time=0.515..3457.040 rows=5353930 loops=1)
Filter: processed
Rows Removed by Filter: 18659
Buffers: shared hit=2 read=111998
I/O Timings: read=2550.551
Planning time: 0.347 ms
Execution time: 2821022.622 ms
In the current plan, substantially all of the time is going to reading the table pages for the Bitmap Heap Scan. You must already have an index on something like (l_id, type_id). If you change it (create a new one, then optionally drop the old one) to be on (l_id, type_id, processed, a_id, status) instead, or perhaps on (l_id, type_id, a_id, status) WHERE processed, then it can probably switch to an index-only scan, which avoids reading the table because all the data is present in the index. You will need to make sure the table is well-vacuumed for this strategy to be effective. I would just manually vacuum the table once before building the index; if it works, you can worry at that point about how to keep it well-vacuumed.
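A sketch of the first variant (the index name here is made up; CONCURRENTLY avoids blocking writes while it builds):
-- covering index so the scan no longer needs to visit the heap
CREATE INDEX CONCURRENTLY b_lid_type_covering_idx
    ON b (l_id, type_id, processed, a_id, status);

-- keep the visibility map current so an index-only scan can skip heap fetches
VACUUM b;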
Another option would be to jack up effective_io_concurrency (I'd just set it to 20; if it works, you can play with it more to find the optimal setting), so that more than one IO read request on the table can be outstanding at once. How effective this will be depends on your IO system, and I don't know the answer to that for db.r5.xlarge. The index-only scan is better, though, as it uses fewer resources, while this method just uses the same resources faster. (If you have multiple similar queries running simultaneously, that is important. Also, if you are paying per IO, you want fewer of them, not the same number faster.)
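For example (session level first, to test; it can also be set per tablespace or in postgresql.conf):
set effective_io_concurrency = 20;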
Another option is to try to change the shape of the plan completely by having a nested loop from a into b. For this to have any hope, you will need an index on b which has a_id and l_id as the leading columns (in either order). If you already have such an index and the planner doesn't naturally choose such a plan, you might be able to force it with set enable_hashjoin = off. My gut feeling is that a nested loop which needs to probe the other side 5,353,930 times is not going to be better than what you currently have, even if that other side has an efficient index.
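A sketch of that experiment (the index name is made up; remember to turn the setting back on afterwards):
CREATE INDEX CONCURRENTLY b_aid_lid_idx ON b (a_id, l_id);

set enable_hashjoin = off;
-- re-run the query with explain (analyze, buffers) here, then:
reset enable_hashjoin;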
You can filter and group table b before joining it with a, and order both tables by id, which can speed up the scans when the join is processed. Please check this code:
with at as (
  select distinct at.id, at.processed
  from a AS at
  WHERE at.processed = True
  order by at.id
),
bt as (
  select bt.a_id, bt.l_id, bt.type_id, --BOOL_AND(bt.processed) AS not_pending,
         ARRAY_AGG(DISTINCT bt.status) as status
  from b AS bt
  where bt.l_id = 105 AND bt.type_id IN (2,10,18,1,4,5,6)  -- filter rows before grouping
  group by bt.a_id, bt.l_id, bt.type_id
  order by bt.a_id
),
counter as (
  select at.id,
         case
           when '1' = all(status) then '1'  -- every status is '1'
           when '3' = any(status) then '3'  -- at least one status is '3'
           else null                        -- any other combination; filtered out below
         end as status
  from at inner join bt on at.id = bt.a_id
)
select count(*) from counter where status = '1'
In our database we have a table menus having 515502 rows. It has a column status which is of type smallint.
Currently, a simple count query takes about 700 ms for the set of rows having status = 2.
explain analyze select count(id) from menus where status = 2;
Aggregate (cost=72973.71..72973.72 rows=1 width=4) (actual time=692.564..692.565 rows=1 loops=1)
-> Bitmap Heap Scan on menus (cost=2510.63..72638.80 rows=133962 width=4) (actual time=28.179..623.077 rows=135429 loops=1)
Recheck Cond: (status = 2)
Rows Removed by Index Recheck: 199654
-> Bitmap Index Scan on menus_status (cost=0.00..2477.14 rows=133962 width=0) (actual time=26.211..26.211 rows=135429 loops=1)
Index Cond: (status = 2)
Total runtime: 692.705 ms
(7 rows)
For status values with far fewer rows, such as status = 4 below, the query runs very fast.
explain analyze select count(id) from menus where status = 4;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=7198.73..7198.74 rows=1 width=4) (actual time=24.926..24.926 rows=1 loops=1)
-> Bitmap Heap Scan on menus (cost=40.53..7193.53 rows=2079 width=4) (actual time=1.461..23.418 rows=2220 loops=1)
Recheck Cond: (status = 4)
-> Bitmap Index Scan on menus_status (cost=0.00..40.02 rows=2079 width=0) (actual time=0.858..0.858 rows=2220 loops=1)
Index Cond: (status = 4)
Total runtime: 25.089 ms
(6 rows)
I observed that a plain btree index is the best indexing strategy for simple equality-based queries; both GIN and hash were slower than btree.
Any tips for making count queries faster for any filter that is using an index?
I understand that this is a beginner level question, so apologies in advance for any kind of mistakes I might have made.
Maybe your table has more rows with status = 2 than with status = 4, so the total table access time is higher in that case.
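You can check the distribution the planner sees, for example:
-- most common values and their frequencies for menus.status (pg_stats is a standard view)
select most_common_vals, most_common_freqs
from pg_stats
where tablename = 'menus' and attname = 'status';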
For status = 2 there are too many rows to consider, so the bitmap for the Bitmap Heap Scan goes into "lossy" mode and a recheck is needed afterwards. There are two things to consider: either your result is simply too big (and you can't do much about that without reorganizing your tables, say, with partitioning), or your work_mem parameter is too small to hold the intermediate result. Try increasing it if you can.
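For example, a session-level experiment (assuming you are allowed to change the setting for your connection):
set work_mem = '64MB';
explain analyze select count(id) from menus where status = 2;
-- if the index recheck / lossy behaviour disappears from the Bitmap Heap Scan, work_mem was the limiting factor
reset work_mem;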