Optimizing SELECT count(*) on large table

This is a basic count on a large table on PostgreSQL 14 with 64 GB RAM and 20 threads. Storage is an NVMe disk.
Questions:
How do I speed up this SELECT count(*) query? Which PostgreSQL configuration settings should I look into?
Four workers were planned but zero were launched. Is that normal?
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM public.product;
Finalize Aggregate (cost=2691545.69..2691545.70 rows=1 width=8) (actual time=330901.439..330902.951 rows=1 loops=1)
Buffers: shared hit=1963080 read=1140455 dirtied=1908 written=111146
I/O Timings: read=36692.273 write=6548.923
-> Gather (cost=2691545.27..2691545.68 rows=4 width=8) (actual time=330901.342..330902.861 rows=1 loops=1)
Workers Planned: 4
Workers Launched: 0
Buffers: shared hit=1963080 read=1140455 dirtied=1908 written=111146
I/O Timings: read=36692.273 write=6548.923
-> Partial Aggregate (cost=2690545.27..2690545.28 rows=1 width=8) (actual time=330898.747..330898.757 rows=1 loops=1)
Buffers: shared hit=1963080 read=1140455 dirtied=1908 written=111146
I/O Timings: read=36692.273 write=6548.923
-> Parallel Index Only Scan using points on products (cost=0.57..2634234.99 rows=22524114 width=0) (actual time=0.361..222958.361 rows=90993600 loops=1)
Heap Fetches: 46261956
Buffers: shared hit=1963080 read=1140455 dirtied=1908 written=111146
I/O Timings: read=36692.273 write=6548.923
Planning:
Buffers: shared hit=39 read=8
I/O Timings: read=0.398
Planning Time: 2.561 ms
JIT:
Functions: 4
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.691 ms, Inlining 104.789 ms, Optimization 24.169 ms, Emission 22.457 ms, Total 152.107 ms
Execution Time: 330999.777 ms

The workers planned is 4 but launched 0, is that normal?
It can happen when too many concurrent transactions compete for a limited number of allowed parallel workers. The manual:
The number of background workers that the planner will consider using
is limited to at most max_parallel_workers_per_gather. The
total number of background workers that can exist at any one time is
limited by both max_worker_processes and
max_parallel_workers. Therefore, it is possible for a
parallel query to run with fewer workers than planned, or even with no
workers at all. The optimal plan may depend on the number of workers
that are available, so this can result in poor query performance. If
this occurrence is frequent, consider increasing
max_worker_processes and max_parallel_workers so that more workers
can be run simultaneously or alternatively reducing
max_parallel_workers_per_gather so that the planner requests fewer
workers.
You can also optimize overall performance to free up resources, or get better hardware (in addition to ramping up max_parallel_workers).
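To check whether the worker pool is the limiting factor, the three settings named in the manual excerpt can be inspected and raised. This is a minimal sketch: the numbers are placeholder assumptions, not recommendations, and would need to be sized against the 20 available threads and the concurrent workload.

```sql
-- Inspect the limits that decide how many parallel workers can launch.
SHOW max_worker_processes;
SHOW max_parallel_workers;
SHOW max_parallel_workers_per_gather;

-- Example values only (assumption): enlarge the worker pool.
ALTER SYSTEM SET max_parallel_workers = 16;
ALTER SYSTEM SET max_worker_processes = 24;  -- takes effect only after a server restart

-- max_parallel_workers is picked up by a configuration reload.
SELECT pg_reload_conf();
```

If "Workers Launched" still stays below "Workers Planned" afterwards, other sessions are probably consuming the pool at the same time.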
What's also troubling:
Heap Fetches: 46261956
For 90993600 rows. That's way too many for comfort. An index-only scan is not supposed to do that many heap fetches.
Both of these symptoms indicate massive concurrent write access (or long-running transactions hogging resources and keeping autovacuum from doing its job). Look into that, and/or tune the per-table autovacuum settings for table product to be more aggressive, so that column statistics stay current and the visibility map can keep up. See:
Aggressive Autovacuum on PostgreSQL
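A rough sketch of how that could be investigated and tuned. The scale factors are illustrative assumptions, not tested values for this workload:

```sql
-- Long-running transactions that can hold back autovacuum, oldest first.
SELECT pid, xact_start, state, left(query, 60) AS query
FROM   pg_stat_activity
WHERE  xact_start IS NOT NULL
ORDER  BY xact_start;

-- Per-table autovacuum settings (thresholds are assumptions).
ALTER TABLE public.product SET (
    autovacuum_vacuum_scale_factor        = 0.01,
    autovacuum_vacuum_insert_scale_factor = 0.01,
    autovacuum_analyze_scale_factor       = 0.01
);

-- How much of the table the visibility map marks as all-visible.
-- The closer relallvisible is to relpages, the fewer heap fetches
-- an index-only scan needs.
SELECT relpages, relallvisible
FROM   pg_class
WHERE  oid = 'public.product'::regclass;
```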
Also, with halfway valid table statistics, a (blazingly fast!) estimate might be good enough? See:
Fast way to discover the row count of a table in PostgreSQL
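For reference, the simplest form of such an estimate reads the planner's row count from the system catalog. It is only as accurate as the last VACUUM or ANALYZE of the table:

```sql
-- Estimated row count from table statistics; returns instantly.
-- In PostgreSQL 14 the value is -1 if the table has never been
-- vacuumed or analyzed.
SELECT reltuples::bigint AS estimated_rows
FROM   pg_class
WHERE  oid = 'public.product'::regclass;
```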

Related

Postgresql recheck performed even there are no lossy blocks

I am running EXPLAIN (BUFFERS, ANALYZE, VERBOSE)
and I am getting this partial result:
-> Bitmap Heap Scan on public.d (cost=109.92..8479.81 rows=5871 width=40) (actual time=1.334..29.942 rows=5306 loops=1)
Output: d.id, d.pd, d.iid, d.dtid, d.bid
Recheck Cond: ((d.sid = 100) AND (d.pd >= '2020-01-28 10:24:40.034+00'::timestamp with time zone) AND (d.pd <= '2020-04-28 10:24:40.034+00'::timestamp with time zone))
Heap Blocks: exact=2014
Buffers: shared hit=3 read=2035
-> Bitmap Index Scan on idx_d_didpd (cost=0.00..108.45 rows=5871 width=0) (actual time=1.018..1.018 rows=5306 loops=1)
Index Cond: ((d.sid = 100) AND (d.pd >= '2020-01-28 10:24:40.034+00'::timestamp with time zone) AND (d.pd <= '2020-04-28 10:24:40.034+00'::timestamp with time zone))
Buffers: shared read=24
What puzzles me is that, in the whole result, the most "costly" parts are the ones performing the Bitmap Heap Scan (the other parts perform index scans and are pretty fast). But I've read that the recheck on a bitmap heap scan is performed only when there are lossy blocks, which I cannot see here.
Can anyone tell me why is this Heap Scan performed?
Note that "Recheck Cond" is also present with just an EXPLAIN without the ANALYZE. The value of this field does not depend on what actually happened during execution. It is telling you what condition will be used in a potential recheck, it does not tell you how often the recheck was actually performed (which in your case was probably zero).
Can anyone tell me why is this Heap Scan performed?
The Bitmap Heap Scan is not just there to do a recheck, its main purpose is to fetch the data you asked for.

Too many buffers hit+read during index scan

I have two tables, User and Info. I'm writing a simple query with an inner join and inserting the result into an unlogged table.
INSERT INTO Result (
iProfileId,email,format,content
)
SELECT
COALESCE(N1.iprofileId, 0),
Lower(N1.email),
W0.format,
W0.content
FROM
Info W0,
User N1
where
(N1.iprofileId = W0.iId);
The Info table has 30M rows and the User table has 158M rows. For some reason, this query takes too long on one of my prod setups. At first glance it looks like it's reading/hitting too many buffers:
Insert on Result (cost=152813.60..15012246.06 rows=31198136 width=1080) (actual time=5126063.502..5126063.502 rows=0 loops=1)
Buffers: shared hit=128815094 read=6103564 dirtied=599445 written=2088037
I/O Timings: read=2563306.517 write=570919.940
-> Merge Join (cost=152813.60..15012246.06 rows=31198136 width=1080) (actual time=0.097..5060947.922 rows=31191937 loops=1)
Merge Cond: (w0.iid = n1.iprofileid)
Buffers: shared hit=96480126 read=5574864 dirtied=70745 written=2009998
I/O Timings: read=2563298.981 write=562810.833
-> Index Scan using user_idx on info w0 (cost=0.56..2984094.60 rows=31198136 width=35) (actual time=0.012..246299.026 rows=31191937 loops=1)
Buffers: shared hit=481667 read=2490602 written=364347
I/O Timings: read=178000.987 write=38663.457
-> Index Scan using profile_id on user n1 (cost=0.57..14938848.88 rows=158842848 width=32) (actual time=0.020..4718272.082 rows=115378606 loops=1)
Buffers: shared hit=95998459 read=3084262 dirtied=70745 written=1645651
I/O Timings: read=2385297.994 write=524147.376
Planning Time: 11.531 ms
Execution Time: 5126063.577 ms
When I ran this query on a different setup with similar tables and a similar number of records, the profile_id scan touched only 5M pages (and ran in 3 min), whereas here it touched (read + hit) 100M buffers (and ran in 1.45 h). When I checked with VACUUM VERBOSE, this table has only 10M pages.
INFO: "User": found 64647 removable, 109184385 nonremovable row versions in 6876625 out of 10546400 pages
This is one of the good runs; we've also seen this query take up to 4-5 hours. My test system, which ran in under 3 minutes, also had iid distributed across the profile_id range, but it had fewer columns and indexes than the prod system. What could be the reason for this slowness?
The execution plan you are showing has a lot of dirtied and written pages. That indicates that the tables were freshly inserted, and your query was the first reader.
In PostgreSQL, the first reader of a new table row consults the commit log to see if that row is visible or not (did the transaction that created it commit?). It then sets flags in the row (the so-called hint bits) to save the next reader that trouble.
Setting the hint bits modifies the row, so the block is dirtied and has to be written to disk eventually. That writing is normally done by the checkpointer or the background writer, but they couldn't keep up, so the query had to clean out many dirty pages itself.
If you run the query a second time, it will be faster. For that reason, it is a good idea to VACUUM tables after bulk loading, which will also set the hint bits.
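A minimal sketch of that step, using the table names from the question. The quoting of "User" is an assumption based on the VACUUM VERBOSE output shown above:

```sql
-- Run once after the bulk load and before the first large read,
-- so the hint bits are set by VACUUM rather than by the query.
VACUUM (ANALYZE) "User";
VACUUM (ANALYZE) Info;
```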
However, a large query like that will always be slow. Things you can try to speed it up further are:
have lots of RAM and load the tables into shared buffers with pg_prewarm
crank up work_mem in the hope of getting a faster hash join
CLUSTER the tables using the indexes, so that heap fetches become more efficient
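The three suggestions could look roughly like this. Table and index names are taken from the question and its plan; the quoting of "User" and the work_mem value are assumptions:

```sql
-- 1. Load a table into shared buffers (needs the pg_prewarm extension
--    and enough shared_buffers to hold the table).
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('"User"');

-- 2. Give this session more memory per sort/hash operation.
SET work_mem = '1GB';

-- 3. Rewrite each table in index order. CLUSTER takes an
--    ACCESS EXCLUSIVE lock and needs free disk space for a full copy.
CLUSTER "User" USING profile_id;
CLUSTER Info USING user_idx;
```

CLUSTER is a one-time reordering; rows inserted later are not kept in index order, so it would have to be repeated after large loads.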

Postgresql performance issue when querying on time range

I'm trying to understand a strange performance issue on Postgres (v10.9).
We have a requests table and I want to get all requests made by a set of particular users in several time ranges. Here is the relevant info:
There is no user_id column in the table. Rather, there is a jsonb column named params, where the user_id field is stored as a string.
The set of users in question is very large, in the thousands.
There is a time column of type timestamptz and it's indexed with a standard BTREE index.
There is also a separate BTREE index on params->>'user_id'.
The queries I am running are based on the following template:
SELECT *
FROM requests
WHERE params->>'user_id' = ANY (VALUES ('id1'), ('id2'), ('id3')...)
AND time > 't1' AND time < 't2'
Where the ids and times here are placeholders for actual ids and times.
I am running a query like this for several consecutive time ranges of 2 weeks each. The queries for the first few time ranges take a couple of minutes each, which is obviously very long in terms of production but OK for research purposes. Then suddenly there is a dramatic spike in query runtime, and they start taking hours each, which begins to be untenable even for offline purposes.
This spike happens in the same time range every time. It's worth noting that total requests increase 1.5x in this range. That is certainly more than in the previous time range, but not enough to warrant a spike of a full order of magnitude.
Here is the output for EXPLAIN ANALYZE for the last time range with the reasonable running time:
Hash Join (cost=442.69..446645.35 rows=986171 width=1217) (actual time=66.305..203593.238 rows=445175 loops=1)
Hash Cond: ((requests.params ->> 'user_id'::text) = "*VALUES*".column1)
-> Index Scan using requests_time_idx on requests (cost=0.56..428686.19 rows=1976888 width=1217) (actual time=14.336..201643.439 rows=2139604 loops=1)
Index Cond: (("time" > '2019-02-12 22:00:00+00'::timestamp with time zone) AND ("time" < '2019-02-26 22:00:00+00'::timestamp with time zone))
-> Hash (cost=439.62..439.62 rows=200 width=32) (actual time=43.818..43.818 rows=29175 loops=1)
Buckets: 32768 (originally 1024) Batches: 1 (originally 1) Memory Usage: 2536kB
-> HashAggregate (cost=437.62..439.62 rows=200 width=32) (actual time=24.887..33.775 rows=29175 loops=1)
Group Key: "*VALUES*".column1
-> Values Scan on "*VALUES*" (cost=0.00..364.69 rows=29175 width=32) (actual time=0.006..10.303 rows=29175 loops=1)
Planning time: 133.807 ms
Execution time: 203697.360 ms
If I understand this correctly, it seems that most of the time is spent on filtering the requests by time range, even though:
The time index seems to be used.
When running the same queries without the filter on the users (basically just fetching all requests by time range only), they both run in OK times.
Any thoughts on how to solve this problem would be appreciated, thanks!
Since you are retrieving so many rows, the query will never be really fast.
Unfortunately there is no single index to cover both conditions, but you can use these two:
CREATE INDEX ON requests ((params->>'user_id'));
CREATE INDEX ON requests (time);
Then you can hope for two bitmap index scans whose results are combined by a "BitmapAnd", since both conditions must hold.
I am not sure if that will improve performance; PostgreSQL may still opt for the current plan, which is not a bad one. If your indexes are cached or random access to your storage is fast, set effective_cache_size or random_page_cost accordingly; that will make PostgreSQL lean towards an index scan.
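One way to try this out is at session level, so nothing changes for other connections. The two values below are assumptions (fast storage, a few GB of cache), and the time range is the one from the plan above:

```sql
-- Session-only experiment; adjust both values to the actual hardware.
SET random_page_cost = 1.1;
SET effective_cache_size = '8GB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   requests
WHERE  params->>'user_id' = ANY (VALUES ('id1'), ('id2'), ('id3'))
AND    time > '2019-02-12 22:00:00+00'
AND    time < '2019-02-26 22:00:00+00';
```

If the plan changes and the run time improves, the settings can be made permanent in postgresql.conf.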

Postgres on AWS performance issues on filter or aggregate

I'm working on a system that has a table with approx. 13 million records.
That does not seem like a big deal for Postgres, but I'm facing serious performance issues when hitting this particular table.
The table has approx. 60 columns (I know that's too many, but I can't change it for reasons beyond my control).
Hardware isn't the problem. It's running on AWS. I tested several configurations, even the new RDS for Postgres:
Instance     Arch     vCPU  ECU  Mem (GB)  Storage
m1.xlarge    64 bits     4    8        15
m2.xlarge    64 bits     2  6.5        17
hs1.8xlarge  64 bits    16   35       117  SSD
I tuned the PostgreSQL settings with pgtune and also set Ubuntu's kernel shmall and shmmax parameters.
Some "explain analyze" queries:
select count(*) from:
Aggregate (cost=350390.93..350390.94 rows=1 width=0) (actual time=24864.287..24864.288 rows=1 loops=1)
-> Index Only Scan using core_eleitor_sexo on core_eleitor (cost=0.00..319722.17 rows=12267505 width=0) (actual time=0.019..12805.292 rows=12267505 loops=1)
Heap Fetches: 9676
Total runtime: 24864.325 ms
select distinct core_eleitor_city from:
HashAggregate (cost=159341.37..159341.39 rows=2 width=516) (actual time=15965.740..15966.090 rows=329 loops=1)
-> Bitmap Heap Scan on core_eleitor (cost=1433.53..159188.03 rows=61338 width=516) (actual time=956.590..9021.457 rows=5886110 loops=1)
Recheck Cond: ((core_eleitor_city)::text = 'RIO DE JANEIRO'::text)
-> Bitmap Index Scan on core_eleitor_city (cost=0.00..1418.19 rows=61338 width=0) (actual time=827.473..827.473 rows=5886110 loops=1)
Index Cond: ((core_eleitor_city)::text = 'RIO DE JANEIRO'::text)
Total runtime: 15977.460 ms
I have btree indexes on columns frequently used for filter or aggregations.
So, given that I can't change my table design, is there something I can do to improve the performance?
Any help would be awesome.
Thanks
You're aggregating ~12.3M and ~5.9M rows on a VPS cluster that, if I am not mistaken, might span multiple physical servers, with data that is probably pulled from a SAN on yet another set of servers, separate from Postgres itself.
IMHO, there's little you can do to make it faster (on AWS anyway), besides a) not running queries that visit essentially the entire table to begin with, and b) maintaining a pre-count using triggers, if possible, if you persist in doing so.
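A trigger-maintained pre-count could be sketched as follows. The counter table, function, and trigger names are hypothetical; only core_eleitor comes from the question:

```sql
-- Single-row counter table, seeded once with the real count.
CREATE TABLE core_eleitor_count (n bigint NOT NULL);
INSERT INTO core_eleitor_count SELECT count(*) FROM core_eleitor;

-- Keep the counter in step with inserts and deletes.
CREATE FUNCTION core_eleitor_count_trg() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE core_eleitor_count SET n = n + 1;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE core_eleitor_count SET n = n - 1;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER row triggers
END;
$$;

CREATE TRIGGER core_eleitor_count_mod
AFTER INSERT OR DELETE ON core_eleitor
FOR EACH ROW EXECUTE PROCEDURE core_eleitor_count_trg();

-- The count is then a single-row read.
SELECT n FROM core_eleitor_count;
```

The trade-off is that every writer updates the same counter row, so concurrent inserts and deletes serialize on it, and TRUNCATE is not covered by this sketch.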
To improve performance on RDS, consider read replicas, as described here:
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html
Amazon RDS uses MySQL’s built-in replication functionality to create a special type of DB instance called a read replica from a source DB instance. Updates made to the source DB instance are copied to the read replica. You can reduce the load on your source DB instance by routing read queries from your applications to the read replica. Read replicas allow you to elastically scale out beyond the capacity constraints of a single DB instance for read-heavy database workloads.

PostgreSQL query not using index in production

I'm noticing something strange/weird:
The exact same query does not use the same query plan in development and production. In particular, the development version uses indexes that are skipped in production (in favor of a seqscan).
The only real difference is that the dataset in production is significantly larger: the index size is 1034 MB, vs 29 MB in development. Would PostgreSQL abstain from using indexes if they (or the table) are too big?
EDIT: EXPLAIN ANALYZE for both queries:
Development:
Limit (cost=41638.15..41638.20 rows=20 width=154) (actual time=159.576..159.581 rows=20 loops=1)
-> Sort (cost=41638.15..41675.10 rows=14779 width=154) (actual time=159.575..159.577 rows=20 loops=1)
Sort Key: (sum(scenario_ad_group_performances.clicks))
Sort Method: top-N heapsort Memory: 35kB
-> GroupAggregate (cost=0.00..41244.89 rows=14779 width=154) (actual time=0.040..151.535 rows=14197 loops=1)
-> Nested Loop Left Join (cost=0.00..31843.75 rows=93800 width=154) (actual time=0.022..82.509 rows=50059 loops=1)
-> Merge Left Join (cost=0.00..4203.46 rows=14779 width=118) (actual time=0.017..27.103 rows=14197 loops=1)
Merge Cond: (scenario_ad_groups.id = scenario_ad_group_vendor_instances.ad_group_id)
-> Index Scan using scenario_ad_groups_pkey on scenario_ad_groups (cost=0.00..2227.06 rows=14779 width=114) (actual time=0.009..12.085 rows=14197 loops=1)
Filter: (scenario_id = 22)
-> Index Scan using index_scenario_ad_group_vendor_instances_on_ad_group_id on scenario_ad_group_vendor_instances (cost=0.00..1737.02 rows=27447 width=8) (actual time=0.007..7.021 rows=16528 loops=1)
Filter: (ad_vendor_id = ANY ('{1,2,3}'::integer[]))
-> Index Scan using index_ad_group_performances_on_vendor_instance_id_and_date on scenario_ad_group_performances (cost=0.00..1.73 rows=11 width=44) (actual time=0.002..0.003 rows=3 loops=14197)
Index Cond: ((vendor_instance_id = scenario_ad_group_vendor_instances.id) AND (date >= '2012-02-01'::date) AND (date <= '2012-02-28'::date))
Total runtime: 159.710 ms
Production:
Limit (cost=822401.35..822401.40 rows=20 width=179) (actual time=21279.547..21279.591 rows=20 loops=1)
-> Sort (cost=822401.35..822488.42 rows=34828 width=179) (actual time=21279.543..21279.560 rows=20 loops=1)
Sort Key: (sum(scenario_ad_group_performances.clicks))
Sort Method: top-N heapsort Memory: 33kB
-> GroupAggregate (cost=775502.60..821474.59 rows=34828 width=179) (actual time=19126.783..21226.772 rows=34495 loops=1)
-> Sort (cost=775502.60..776739.48 rows=494751 width=179) (actual time=19125.902..19884.164 rows=675253 loops=1)
Sort Key: scenario_ad_groups.id
Sort Method: external merge Disk: 94200kB
-> Hash Right Join (cost=25743.86..596796.70 rows=494751 width=179) (actual time=1155.491..16720.460 rows=675253 loops=1)
Hash Cond: (scenario_ad_group_performances.vendor_instance_id = scenario_ad_group_vendor_instances.id)
-> Seq Scan on scenario_ad_group_performances (cost=0.00..476354.29 rows=4158678 width=44) (actual time=0.043..8949.640 rows=4307019 loops=1)
Filter: ((date >= '2012-02-01'::date) AND (date <= '2012-02-28'::date))
-> Hash (cost=24047.72..24047.72 rows=51371 width=143) (actual time=1123.896..1123.896 rows=34495 loops=1)
Buckets: 1024 Batches: 16 Memory Usage: 392kB
-> Hash Right Join (cost=6625.90..24047.72 rows=51371 width=143) (actual time=92.257..1070.786 rows=34495 loops=1)
Hash Cond: (scenario_ad_group_vendor_instances.ad_group_id = scenario_ad_groups.id)
-> Seq Scan on scenario_ad_group_vendor_instances (cost=0.00..11336.31 rows=317174 width=8) (actual time=0.020..451.496 rows=431770 loops=1)
Filter: (ad_vendor_id = ANY ('{1,2,3}'::integer[]))
-> Hash (cost=5475.55..5475.55 rows=34828 width=139) (actual time=88.311..88.311 rows=34495 loops=1)
Buckets: 1024 Batches: 8 Memory Usage: 726kB
-> Bitmap Heap Scan on scenario_ad_groups (cost=798.20..5475.55 rows=34828 width=139) (actual time=4.451..44.065 rows=34495 loops=1)
Recheck Cond: (scenario_id = 276)
-> Bitmap Index Scan on index_scenario_ad_groups_on_scenario_id (cost=0.00..789.49 rows=34828 width=0) (actual time=4.232..4.232 rows=37006 loops=1)
Index Cond: (scenario_id = 276)
Total runtime: 21306.697 ms
Disclaimer
I have used PostgreSQL very little. I'm answering based on my knowledge of SQL Server index usage and execution plans. I ask the PostgreSQL gods for mercy if I get something wrong.
Query Optimizers are Dynamic
You said your query plan has changed from your development to production environments. This is to be expected. Query optimizers are designed to generate the optimum execution plan based on the current data conditions. Under different conditions the optimizer may decide it is more efficient to use a table scan vs an index scan.
When would it be more efficient to use a table scan vs an index scan?
SELECT A, B
FROM someTable
WHERE A = 'SOME VALUE'
Let's say you have a non-clustered index on column A. In this case you are filtering on column A, which could potentially take advantage of the index. This would be efficient if the index is selective enough - basically, how many distinct values make up the index? The database keeps statistics on this selectivity info and uses these statistics when calculating costs for execution plans.
If you have a million rows in a table, but only 10 possible values for A, then your query would likely return about 100K rows. Because the index is non-clustered, and you are returning a column not included in the index (B), a lookup needs to be performed for each row returned. These lookups are random-access reads, which are much more expensive than the sequential reads used by a table scan. At a certain point it becomes more efficient for the database to just perform a table scan rather than an index scan.
This is just one scenario, there are many others. It's hard to know without knowing more about what your data is like, what your indexes look like and how you are trying to access the data.
To answer the original question:
Would PostgreSQL abstain from using indexes if they (or the table) are too big? No. It is more likely that in the way that you are accessing the data, it is less efficient for PostgreSQL to use the index vs using a table scan.
The PostgreSQL FAQ touches on this very subject (see: Why are my queries slow? Why don't they use my indexes?): https://wiki.postgresql.org/wiki/FAQ#Why_are_my_queries_slow.3F_Why_don.27t_they_use_my_indexes.3F
Postgres' query optimizer comes up with multiple scenarios (e.g. index vs seq-scan) and evaluates them using statistical information about your tables and the relative costs of disk/memory/index/table access set in configuration.
Did you use the EXPLAIN command to see why index use was omitted? Did you use EXPLAIN ANALYZE to find out if the decision was in error? Can we see the outputs, please?
edit:
As hard as analyzing two different singular queries on different systems is, I think I see a couple of things.
The production environment shows a cost-to-actual-time ratio of around 20-100 cost units per millisecond. I'm not even a DBA, but this seems consistent. The development environment shows 261 for the main query. Does this seem right? Would you expect the raw speed (memory/disk/CPU) of the production environment to be 2-10x faster than dev?
Since the production environment has a much more complex query plan, it looks like it's doing its job. Undoubtedly, the dev environment's plan and many more have been considered, and deemed too costly. And the 20-100 variance isn't that much in my experience (but again, not a DBA) and shows that there isn't anything way off the mark. Still, you may want to run a VACUUM on the DB just in case.
I'm not experienced and patient enough to decode the full query, but could there be a denormalization/NOSQL-ization point for optimization?
The biggest bottleneck seems to be the disk merge at 90 MB. If the production environment has enough memory, you may want to increase the relevant setting (working memory?) to do it in-memory. It seems to be the work_mem parameter here, though you'll want to read through the rest.
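As a quick experiment, work_mem can be raised for a single session before re-running the query. The value below is an assumption; an in-memory sort generally needs more memory than the 94200kB reported for the on-disk merge:

```sql
-- Session-only change; does not affect other connections.
SET work_mem = '256MB';
SHOW work_mem;

-- Then re-run EXPLAIN ANALYZE on the production query and check
-- whether "Sort Method" still reports "external merge  Disk: ...".
```

Keep in mind that work_mem applies per sort or hash operation, so a high global value can multiply across concurrent queries.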
I'd also suggest having a look at the index usage statistics. Many options with partial and functional indices exist.
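The usage statistics live in the pg_stat_user_indexes view. A small sketch, with the table-name pattern chosen to match the tables in the plans above:

```sql
-- Indexes on the scenario_ad_group* tables, least-used first.
SELECT relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM   pg_stat_user_indexes
WHERE  relname LIKE 'scenario_ad_group%'
ORDER  BY idx_scan;
```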
It seems to me that your dev data is much "simpler" than the production data. As an example:
Development:
-> Index Scan using index_scenario_ad_group_vendor_instances_on_ad_group_id on scenario_ad_group_vendor_instances
(cost=0.00..1737.02 rows=27447 width=8)
(actual time=0.007..7.021 rows=16528 loops=1)
Filter: (ad_vendor_id = ANY ('{1,2,3}'::integer[]))
Production:
-> Seq Scan on scenario_ad_group_vendor_instances
(cost=0.00..11336.31 rows=317174 width=8)
(actual time=0.020..451.496 rows=431770 loops=1)
Filter: (ad_vendor_id = ANY ('{1,2,3}'::integer[]))
This means that in dev 27447 matching rows were estimated upfront and 16528 rows were actually found. That's the same ballpark and OK.
In production 317174 matching rows were estimated upfront and 431770 rows were found. Also OK.
But comparing dev to prod shows that the numbers differ by a factor of more than 10. As other answers already indicate, doing 10 times more random seeks (due to index access) might indeed be worse than a plain table scan.
Hence the interesting question is: How many rows does this table contain both in dev and in prod? Is number_returned_rows / number_total_rows comparable between dev and prod?
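That ratio can be measured directly by running the same query in both environments. A minimal sketch using the filter from the plans above:

```sql
-- Matching rows vs. total rows for the filter shown in both plans.
SELECT sum(CASE WHEN ad_vendor_id = ANY ('{1,2,3}'::integer[])
                THEN 1 ELSE 0 END) AS matching_rows,
       count(*)                    AS total_rows
FROM   scenario_ad_group_vendor_instances;
```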
Edit: Don't forget that I picked one index access as an example. A quick glance shows that the other index accesses show the same symptoms.
Try
SET enable_seqscan TO 'off'
before EXPLAIN ANALYZE
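A sketch of that experiment, scoped to one transaction so the setting does not leak into the rest of the session. The full production query is not shown in the question, so the SELECT below is only a stand-in built from one of the filters in the plan:

```sql
BEGIN;
-- Diagnostic only: discourages sequential scans for this transaction.
SET LOCAL enable_seqscan = off;

EXPLAIN ANALYZE
SELECT *
FROM   scenario_ad_group_vendor_instances
WHERE  ad_vendor_id = ANY ('{1,2,3}'::integer[]);

ROLLBACK;
```

If the forced index plan turns out slower than the sequential scan, the planner's original choice was the right one.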