In PostgreSQL, I have an index on a date field on my tickets table.
When I compare the field against now(), the query is pretty efficient:
# explain analyze select count(1) as count from tickets where updated_at > now();
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=90.64..90.66 rows=1 width=0) (actual time=33.238..33.238 rows=1 loops=1)
-> Index Scan using tickets_updated_at_idx on tickets (cost=0.01..90.27 rows=74 width=0) (actual time=0.016..29.318 rows=40250 loops=1)
Index Cond: (updated_at > now())
Total runtime: 33.271 ms
It goes downhill and uses a Bitmap Heap Scan if I try to compare it against now() minus an interval.
# explain analyze select count(1) as count from tickets where updated_at > (now() - '24 hours'::interval);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=180450.15..180450.17 rows=1 width=0) (actual time=543.898..543.898 rows=1 loops=1)
-> Bitmap Heap Scan on tickets (cost=21296.43..175963.31 rows=897368 width=0) (actual time=251.700..457.916 rows=924373 loops=1)
Recheck Cond: (updated_at > (now() - '24:00:00'::interval))
-> Bitmap Index Scan on tickets_updated_at_idx (cost=0.00..20847.74 rows=897368 width=0) (actual time=238.799..238.799 rows=924699 loops=1)
Index Cond: (updated_at > (now() - '24:00:00'::interval))
Total runtime: 543.952 ms
Is there a more efficient way to query using date arithmetic?
The 1st query expects to find rows=74, but actually finds rows=40250.
The 2nd query expects to find rows=897368 and actually finds rows=924699.
Of course, processing roughly 23 times as many rows (the 2nd query reads ~924k rows versus the 1st query's ~40k) takes considerably more time. So your actual times are not surprising.
Statistics for data with updated_at > now() are outdated. Run:
ANALYZE tickets;
and repeat your queries. And you seriously have data with updated_at > now()? That sounds wrong.
It's not surprising, however, that statistics are outdated for data most recently changed. That's in the logic of things. If your query depends on current statistics, you have to run ANALYZE before you run your query.
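As a quick sanity check, you can also see when the table was last analyzed (pg_stat_user_tables is a standard statistics view):
SELECT last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'tickets';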
Also test with (in your session only):
SET enable_bitmapscan = off;
and repeat your second query to see times without bitmap index scan.
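When you are done comparing, restore the default in the same session:
RESET enable_bitmapscan;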
Why bitmap index scan for more rows?
A plain index scan fetches rows from the heap sequentially as found in the index. That's simple, dumb and without overhead. Fast for few rows, but may end up more expensive than a bitmap index scan with a growing number of rows.
A bitmap index scan collects rows from the index before looking up the table. If multiple rows reside on the same data page, that saves repeated visits and can make things considerably faster. The more rows, the greater the chance that a bitmap index scan will save time.
For even more rows (around 5% of the table, heavily depends on actual data), the planner switches to a sequential scan of the table and doesn't use the index at all.
The optimum would be an index-only scan, introduced with Postgres 9.2. That's only possible if some preconditions are met: if all relevant columns are included in the index, the index type supports it, and the visibility map indicates that all rows on a data page are visible to all transactions, that page doesn't have to be fetched from the heap (the table) and the information in the index is enough.
The decision depends on your statistics (how many rows Postgres expects to find and their distribution) and on cost settings, most importantly random_page_cost, cpu_index_tuple_cost and effective_cache_size.
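You can inspect the current values and experiment per session; the value below is only an illustration (e.g. for SSD storage), not a recommendation:
SHOW random_page_cost;
SHOW effective_cache_size;
SET random_page_cost = 1.1;  -- session only; pick a value that matches your hardware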
I have the following query:
devapp=> Explain SELECT DISTINCT "chaindata_tokentransfer"."emitting_contract" FROM "chaindata_tokentransfer" WHERE (("chaindata_tokentransfer"."to_addr" = 100 OR "chaindata_tokentransfer"."from_addr" = 100) AND "chaindata_tokentransfer"."chain_id" = 1 AND "chaindata_tokentransfer"."block_number" >= 10000);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=29062023.48..29062321.43 rows=8870 width=4)
-> Sort (cost=29062023.48..29062172.45 rows=59591 width=4)
Sort Key: emitting_contract
-> Bitmap Heap Scan on chaindata_tokentransfer (cost=28822428.06..29057297.07 rows=59591 width=4)
Recheck Cond: (((to_addr = 100) OR (from_addr = 100)) AND (chain_id = 1) AND (block_number >= 10000))
-> BitmapAnd (cost=28822428.06..28822428.06 rows=59591 width=0)
-> BitmapOr (cost=4209.94..4209.94 rows=351330 width=0)
-> Bitmap Index Scan on chaindata_tokentransfer_to_addr_284dc4bc (cost=0.00..1800.73 rows=150953 width=0)
Index Cond: (to_addr = 100)
-> Bitmap Index Scan on chaindata_tokentransfer_from_addr_ef8ecd8c (cost=0.00..2379.41 rows=200377 width=0)
Index Cond: (from_addr = 100)
-> Bitmap Index Scan on chaindata_tokentransfer_chain_id_block_number_tx_eeeac2a4_idx (cost=0.00..28818202.98 rows=1315431027 width=0)
Index Cond: ((chain_id = 1) AND (block_number >= 10000))
(13 rows)
As you can see, the cost of the last index scan on chaindata_tokentransfer_chain_id_block_number_tx_eeeac2a4_idx is very high, and the query is timing out. If I remove the filter on chain_id and block_number from the query, it executes in a reasonable amount of time. Since this less constrained query works, I'd expect the original, more constrained query to work too if the index weren't there and those conditions were just an additional filter. How can I achieve that without deleting the index?
You can probably disable the index by doing some dummy arithmetic on the indexed column.
...AND "chaindata_tokentransfer"."chain_id" + 0 = 1...
If you put that into production, make sure to add a code comment on why you are doing such an odd thing.
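Applied to the query from the question, that could look like the sketch below; the + 0 leaves the result unchanged but prevents the planner from matching chain_id against the composite index (if the planner then tries to use the index for block_number alone, the same trick applies there):
SELECT DISTINCT "chaindata_tokentransfer"."emitting_contract"
FROM "chaindata_tokentransfer"
WHERE ("chaindata_tokentransfer"."to_addr" = 100
       OR "chaindata_tokentransfer"."from_addr" = 100)
  AND "chaindata_tokentransfer"."chain_id" + 0 = 1   -- dummy arithmetic: planner can no longer use the index for this condition
  AND "chaindata_tokentransfer"."block_number" >= 10000;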
I'm curious why it chooses to use that index, despite apparently knowing how astonishingly awful it is. If you show the plan for the query with the index disabled, maybe we could figure that out.
If the dummy arithmetic doesn't work, what you could do is start a transaction, drop the index, execute the query (or just the EXPLAIN of it), then roll back the drop. That is probably not something you want to do often in production (especially since the table will be locked from when the index is dropped until the rollback, and because you might accidentally commit!), but getting the plan is probably worth doing it once.
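A minimal sketch of that experiment, using the names from the question (remember the table stays locked until the ROLLBACK):
BEGIN;
DROP INDEX chaindata_tokentransfer_chain_id_block_number_tx_eeeac2a4_idx;
EXPLAIN SELECT DISTINCT emitting_contract
FROM chaindata_tokentransfer
WHERE (to_addr = 100 OR from_addr = 100)
  AND chain_id = 1
  AND block_number >= 10000;
ROLLBACK;  -- undoes the DROP INDEX; nothing is permanently changed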
I'm running this query in our database:
select
(
select least(2147483647, sum(pb.nr_size))
from tb_pr_dc pd
inner join tb_pr_dc_bn pb on 1=1
and pb.id_pr_dc_bn = pd.id_pr_dc_bn
where 1=1
and pd.id_pr = pt.id_pr -- outer query column
)
from
(
select regexp_split_to_table('[list of 500 ids]', ',')::integer id_pr
) pt
;
It outputs 500 rows with a single result column and takes around 1 min 43 s to run. EXPLAIN (ANALYZE, VERBOSE, BUFFERS) produces the following plan:
Subquery Scan on pt (cost=0.00..805828.19 rows=1000 width=8) (actual time=96.791..103205.872 rows=500 loops=1)
Output: (SubPlan 1)
Buffers: shared hit=373771 read=153484
-> Result (cost=0.00..22.52 rows=1000 width=4) (actual time=0.434..3.729 rows=500 loops=1)
Output: ((regexp_split_to_table('[list of 500 ids]', ','))::integer)
-> ProjectSet (cost=0.00..5.02 rows=1000 width=32) (actual time=0.429..2.288 rows=500 loops=1)
Output: (regexp_split_to_table('[list of 500 ids]', ','))::integer
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
SubPlan 1
-> Aggregate (cost=805.78..805.80 rows=1 width=8) (actual time=206.399..206.400 rows=1 loops=500)
Output: LEAST('2147483647'::bigint, sum((pb.nr_size)::integer))
Buffers: shared hit=373771 read=153484
-> Nested Loop (cost=0.87..805.58 rows=83 width=4) (actual time=1.468..206.247 rows=219 loops=500)
Output: pb.nr_size
Inner Unique: true
Buffers: shared hit=373771 read=153484
-> Index Scan using tb_pr_dc_in05 on db.tb_pr_dc pd (cost=0.43..104.02 rows=83 width=4) (actual time=0.233..49.289 rows=219 loops=500)
Output: pd.id_pr_dc, pd.ds_pr_dc, pd.id_pr, pd.id_user_in, pd.id_user_ex, pd.dt_in, pd.dt_ex, pd.ds_mt_ex, pd.in_at, pd.id_tp_pr_dc, pd.id_pr_xz (...)
Index Cond: ((pd.id_pr)::integer = pt.id_pr)
Buffers: shared hit=24859 read=64222
-> Index Scan using tb_pr_dc_bn_pk on db.tb_pr_dc_bn pb (cost=0.43..8.45 rows=1 width=8) (actual time=0.715..0.715 rows=1 loops=109468)
Output: pb.id_pr_dc_bn, pb.ds_ex, pb.ds_md_dc, pb.ds_m5_dc, pb.nm_aq, pb.id_user, pb.dt_in, pb.ob_pr_dc, pb.nr_size, pb.ds_sg, pb.ds_cr_ch, pb.id_user_ (...)
Index Cond: ((pb.id_pr_dc_bn)::integer = (pd.id_pr_dc_bn)::integer)
Buffers: shared hit=348912 read=89262
Planning Time: 1.151 ms
Execution Time: 103206.243 ms
The logic is: for each id_pr chosen (in the list of 500 ids), calculate the sum of the integer column pb.nr_size associated with it, returning the lesser of that sum and 2,147,483,647. The result must contain 500 rows, one for each id, and we already know that each id matches at least one row in the subquery, so it will not produce null values.
The index tb_pr_dc_in05 is a b-tree on id_pr only, which is of integer type. The index tb_pr_dc_bn_pk is a b-tree on the primary key id_pr_dc_bn only, also of integer type. Table tb_pr_dc has many rows for each id_pr: 209,217 unique id_prs for a total of 13,910,855 rows. Table tb_pr_dc_bn has the same number of rows.
As can be seen, we defined 500 ids to query tb_pr_dc, finding 109,468 rows (less than 1% of the table size) and then finding the same amount looking in tb_pr_dc_bn. IMO, the indexes look fine and the number of rows to evaluate is minimal, so I can't understand why this query takes so much time to run. A lot of other queries that read far more data from other tables and do more calculations run fine. The DBA just ran a REINDEX and VACUUM ANALYZE, but it still runs just as slowly. We are running PostgreSQL 11 on Linux, and I'm running this query on a replica without concurrent access.
What could I be missing that could improve this query performance?
Thanks for your attention.
The time is spent jumping all over the table to find 109,468 randomly scattered rows, issuing random IO requests to do so. You can verify that by turning track_io_timing on and redoing the plans (probably just leave it turned on globally and by default; the overhead is low and the value it produces is high), but I'm sure enough that I don't need to see that output before reaching this conclusion. The other queries that are faster are probably accessing fewer disk pages because they access data that is more tightly packed, or is organized so that it can be read more sequentially. In fact, I would say your query is quite fast given how many pages it had to read.
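A minimal sketch of enabling it (track_io_timing is a regular setting; with it on, EXPLAIN (ANALYZE, BUFFERS) adds I/O timing lines to each node):
SET track_io_timing = on;                 -- this session only
-- or, to leave it on globally as suggested:
ALTER SYSTEM SET track_io_timing = on;
SELECT pg_reload_conf();                  -- make the global setting take effect
Then repeat the EXPLAIN (ANALYZE, VERBOSE, BUFFERS) from the question and look for the I/O timings in the output.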
You ask about why so many columns are output in the internal nodes of the plan. The reason for that is that PostgreSQL often just passes around pointers to where the tuple lives in the shared_buffers, and the tuple being pointed to has the columns that the table itself has. It could allocate memory in which to store a reformatted version of the tuple with the unnecessary columns stripped out, but that would generally be more work, not less. If there is a reason to copy and re-form the tuple anyway, it will remove the extraneous columns while it does so. But it won't do it without a reason.
One way to speed this up is to create indexes which will enable index-only scans. Those would be on tb_pr_dc (id_pr, id_pr_dc_bn) and on tb_pr_dc_bn (id_pr_dc_bn, nr_size).
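In SQL, with hypothetical index names, that would be:
CREATE INDEX tb_pr_dc_id_pr_dc_bn_idx ON tb_pr_dc (id_pr, id_pr_dc_bn);
CREATE INDEX tb_pr_dc_bn_nr_size_idx ON tb_pr_dc_bn (id_pr_dc_bn, nr_size);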
If this isn't enough, there might be other ways to improve this too; but I can't think through them if I keep getting distracted by the long strings of unmemorable unpronounceable gibberish you have for table and column names.
PostgreSQL 11 isn't smart enough to use indexes with included columns?
CREATE INDEX organization_locations__org_id_is_headquarters__inc_location_id_ix
ON organization_locations(org_id, is_headquarters) INCLUDE (location_id);
ANALYZE organization_locations;
ANALYZE organizations;
EXPLAIN VERBOSE
SELECT location_id
FROM organization_locations ol
WHERE org_id = (SELECT id FROM organizations WHERE code = 'akron')
AND is_headquarters = 1;
QUERY PLAN
Seq Scan on organization_locations ol (cost=8.44..14.61 rows=1 width=4)
Output: ol.location_id
Filter: ((ol.org_id = $0) AND (ol.is_headquarters = 1))
InitPlan 1 (returns $0)
-> Index Scan using organizations__code_ux on organizations (cost=0.42..8.44 rows=1 width=4)
Output: organizations.id
Index Cond: ((organizations.code)::text = 'akron'::text)
There are only 211 rows currently in organization_locations, average row length 91 bytes.
I get that only one data page is loaded. But the I/O is the same to grab the index page, and the target data would be right there (no extra lookup into the data page from the index). What is PG thinking with this plan?
This just creates a TODO for me to round back and check to make sure the right plan starts getting generated once the table burgeons.
EDIT: Here is the explain with buffers:
Seq Scan on organization_locations ol (cost=8.44..14.33 rows=1 width=4) (actual time=0.018..0.032 rows=1 loops=1)
Filter: ((org_id = $0) AND (is_headquarters = 1))
Rows Removed by Filter: 210
Buffers: shared hit=7
InitPlan 1 (returns $0)
-> Index Scan using organizations__code_ux on organizations (cost=0.42..8.44 rows=1 width=4) (actual time=0.008..0.009 rows=1 loops=1)
Index Cond: ((code)::text = 'akron'::text)
Buffers: shared hit=4
Planning Time: 0.402 ms
Execution Time: 0.048 ms
Reading one index page is not cheaper than reading a table page, so with tiny tables you cannot expect a gain from an index-only scan.
Besides, did you
VACUUM organization_locations;
Without that, the visibility map won't show that the table block is all-visible, so you cannot get an index-only scan no matter what.
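If you want to check what the visibility map says for the table, the pg_visibility contrib extension can show it (assuming contrib modules are installed on your server):
CREATE EXTENSION IF NOT EXISTS pg_visibility;
SELECT blkno, all_visible
FROM pg_visibility_map('organization_locations');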
In addition to the other answers, this is probably a silly index to have in the first place. INCLUDE is good when you need a unique index but you also want to tack on a column which is not part of the unique constraint, or when the included column doesn't have btree operators and so can't be in the main body of the index. In other cases, you should just put the extra column in the index itself.
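That is, instead of the INCLUDE variant, simply (hypothetical index name):
CREATE INDEX organization_locations__org_id_hq_location_id_ix
ON organization_locations (org_id, is_headquarters, location_id);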
This just creates a TODO for me to round back and check to make sure the right plan starts getting generated once the table burgeons.
This is a problem with your workflow, and you can't expect PostgreSQL to solve it for you. Do you really think PostgreSQL should create actual plans based on imaginary scenarios?
EXPLAIN SELECT a.name, m.name FROM Casting c JOIN Movie m ON c.m_id = m.m_id JOIN Actor a ON a.a_id = c.a_id AND c.a_id < 50;
Output
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=26.20..18354.49 rows=1090 width=27) (actual time=0.240..5.603 rows=1011 loops=1)
-> Nested Loop (cost=25.78..12465.01 rows=1090 width=15) (actual time=0.236..4.046 rows=1011 loops=1)
-> Bitmap Heap Scan on casting c (cost=25.35..3660.19 rows=1151 width=8) (actual time=0.229..1.059 rows=1011 loops=1)
Recheck Cond: (a_id < 50)
Heap Blocks: exact=989
-> Bitmap Index Scan on casting_a_id_index (cost=0.00..25.06 rows=1151 width=0) (actual time=0.114..0.114 rows=1011 loops=1)
Index Cond: (a_id < 50)
-> Index Scan using movie_pkey on movie m (cost=0.42..7.64 rows=1 width=15) (actual time=0.003..0.003 rows=1 loops=1011)
Index Cond: (m_id = c.m_id)
-> Index Scan using actor_pkey on actor a (cost=0.42..5.39 rows=1 width=20) (actual time=0.001..0.001 rows=1 loops=1011)
Index Cond: (a_id = c.a_id)
Planning time: 0.334 ms
Execution time: 5.672 ms
(13 rows)
I am trying to understand how the query planner works. I can follow the plan it chose, but I am not getting why it chose it.
Can someone explain the query optimizer's choices (choice of query processing algorithms, join order) in these queries, based on parameters like query selectivity, cost models, or anything else that affects the choice?
Also, why is there a Recheck Cond after the index scan?
There are two reasons why there has to be a Bitmap Heap Scan:
PostgreSQL has to check whether the rows found are visible for the current transaction or not. Remember that PostgreSQL keeps old row versions in the table until VACUUM removes them. This visibility information is not stored in the index.
If work_mem is not large enough to contain a bitmap with one bit per table row, PostgreSQL uses one bit per table page, which loses some information. PostgreSQL then needs to check all rows in these lossy blocks to see which of them really satisfy the condition.
You can see this when you use EXPLAIN (ANALYZE, BUFFERS), then PostgreSQL will show if there were lossy matches, see this example on rextester:
-> Bitmap Heap Scan on t (cost=177.14..4719.43 rows=9383 width=0)
(actual time=2.130..144.729 rows=10001 loops=1)
Recheck Cond: (val = 10)
Rows Removed by Index Recheck: 738586
Heap Blocks: exact=646 lossy=3305
Buffers: shared hit=1891 read=2090
-> Bitmap Index Scan on t_val_idx (cost=0.00..174.80 rows=9383 width=0)
(actual time=1.978..1.978 rows=10001 loops=1)
Index Cond: (val = 10)
Buffers: shared read=30
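If you see lossy blocks like that, raising work_mem can let the bitmap stay exact; the right value depends on your table size and memory budget, e.g. for the session only:
SET work_mem = '64MB';  -- illustrative value; the default is 4MB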
I cannot explain the whole of the PostgreSQL optimizer in this answer, but what it does is to try all possible ways to compute the result, estimate how much each one will cost and choose the cheapest plan.
To estimate how big the result set will be, it uses the object definitions and the table statistics, which contain detailed data about how the column values are distributed.
It then calculates how many disk blocks it will have to read sequentially and by random access (I/O cost), and how many table and index rows and function calls it will have to process (CPU cost), to come up with a grand total. The weights for each of these components in the total can be configured.
Usually the best plan is one that reduces the number of result rows as quickly as possible by applying the most selective condition first. In your case this seems to be casting.a_id < 50.
Nested loop joins are often preferred if the number of rows in the outer (upper in EXPLAIN output) table is small.
explain analyse
SELECT COUNT(*) FROM syo_event WHERE id_group = 'OPPORTUNITY' AND id_type = 'NEW'
My query have this plan:
Aggregate (cost=654.16..654.17 rows=1 width=0) (actual time=3.783..3.783 rows=1 loops=1)
-> Bitmap Heap Scan on syo_event (cost=428.76..654.01 rows=58 width=0) (actual time=2.774..3.686 rows=1703 loops=1)
Recheck Cond: ((id_group = 'OPPORTUNITY'::text) AND (id_type = 'NEW'::text))
-> BitmapAnd (cost=428.76..428.76 rows=58 width=0) (actual time=2.635..2.635 rows=0 loops=1)
-> Bitmap Index Scan on syo_list_group (cost=0.00..35.03 rows=1429 width=0) (actual time=0.261..0.261 rows=2187 loops=1)
Index Cond: (id_group = 'OPPORTUNITY'::text)
-> Bitmap Index Scan on syo_list_type (cost=0.00..393.45 rows=17752 width=0) (actual time=2.177..2.177 rows=17555 loops=1)
Index Cond: (id_type = 'NEW'::text)
Total runtime: 3.827 ms
In the first line:
(actual time=3.783..3.783 rows=1 loops=1)
(Why does the actual time not match the last line, Total runtime?)
In the second line:
cost=428.76..654.01
(Does the Bitmap Heap Scan start with cost 428.76 and end with 654.01?)
rows=58 width=0)
(What are rows and width? Are they important?)
rows=1703
(this is the result)
loops=1)
(Is this used in subqueries?)
From the postgres docs:
Note that the "actual time" values are in milliseconds of real time, whereas the cost estimates are expressed in arbitrary units; so they are unlikely to match up. The thing that's usually most important to look for is whether the estimated row counts are reasonably close to reality.
The estimated cost is computed as (disk pages read * seq_page_cost) + (rows scanned * cpu_tuple_cost). By default, seq_page_cost is 1.0 and cpu_tuple_cost is 0.01
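As a toy illustration with those defaults, a sequential scan of a table with 10,000 pages holding 1,000,000 rows would be estimated at 10,000 * 1.0 + 1,000,000 * 0.01 = 20,000 cost units.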
As for the first line, EXPLAIN executed the query and it took 3.783 ms, but presenting you with the output of the plan takes some time too, so the total runtime is increased by the time spent doing that.
Basically EXPLAIN ANALYZE displays estimates that a plain EXPLAIN would show you along with values collected from actually running the query, hence the difference in the second line.
Both rows and width are important. Respectively, they are the estimated number of output rows and the average width of a row in bytes. The estimated total cost will be lower if the estimated number of returned rows is smaller, and you need to take that into account.
To understand what loops actually presents you need to know that a query plan is actually a tree of plan nodes. There are different types of nodes that serve different purposes - a scan node for example is responsible for returning raw rows from a table. If your query is doing some operation on rows there will be additional nodes above the scan nodes to handle that.
The first line in your output from EXPLAIN is a summary from the level 1 (top) node, with the estimated cost for the entire plan.
Knowing that, loops represents the total number of executions of a particular node. In some plans a subplan node can be executed more than once, and when that happens, the reported time and rows values are averages per execution; to make the numbers comparable with other estimates, you multiply them by loops to get the total time spent in, and rows returned by, that node.
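For example, a node reporting (actual time=0.003..0.003 rows=1 loops=1011) took about 0.003 ms and returned 1 row per execution, so across all loops it accounts for roughly 0.003 × 1011 ≈ 3 ms and about 1011 rows in total.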
You can get more insight on the topic in the documentation.