Too many buffers hit+read during index scan - sql

I have two tables, User and Info. I'm writing a simple query with an inner join and inserting the result into an unlogged table.
INSERT INTO Result (iProfileId, email, format, content)
SELECT
    COALESCE(N1.iprofileId, 0),
    lower(N1.email),
    W0.format,
    W0.content
FROM
    Info W0,
    User N1
WHERE
    N1.iprofileId = W0.iId;
The Info table has 30M rows and the User table has 158M rows. For some reason, this query is taking too long on one of my prod setups. At first glance it looks like it's reading/hitting too many buffers:
Insert on Result (cost=152813.60..15012246.06 rows=31198136 width=1080) (actual time=5126063.502..5126063.502 rows=0 loops=1)
Buffers: shared hit=128815094 read=6103564 dirtied=599445 written=2088037
I/O Timings: read=2563306.517 write=570919.940
-> Merge Join (cost=152813.60..15012246.06 rows=31198136 width=1080) (actual time=0.097..5060947.922 rows=31191937 loops=1)
Merge Cond: (w0.iid = n1.iprofileid)
Buffers: shared hit=96480126 read=5574864 dirtied=70745 written=2009998
I/O Timings: read=2563298.981 write=562810.833
-> Index Scan using user_idx on info w0 (cost=0.56..2984094.60 rows=31198136 width=35) (actual time=0.012..246299.026 rows=31191937 loops=1)
Buffers: shared hit=481667 read=2490602 written=364347
I/O Timings: read=178000.987 write=38663.457
-> Index Scan using profile_id on user n1 (cost=0.57..14938848.88 rows=158842848 width=32) (actual time=0.020..4718272.082 rows=115378606 loops=1)
Buffers: shared hit=95998459 read=3084262 dirtied=70745 written=1645651
I/O Timings: read=2385297.994 write=524147.376
Planning Time: 11.531 ms
Execution Time: 5126063.577 ms
When I ran this query on a different setup with similar tables and a similar number of records, the profile_id scan used only 5M pages (and ran in 3 minutes), whereas here it used (read + hit) 100M buffers (and ran in about 1.45 hours). When I checked using VACUUM VERBOSE, this table only has about 10M pages.
INFO: "User": found 64647 removable, 109184385 nonremovable row versions in 6876625 out of 10546400 pages
This is one of the good runs, but we've seen this query take up to 4-5 hours as well. My test system, which ran in under 3 minutes, also had iid values distributed across the profile_id range, but it had fewer columns and indexes than the prod system. What could be the reason for this slowness?

The execution plan you are showing has a lot of dirtied and written pages. That indicates that the tables were freshly inserted, and your query was the first reader.
In PostgreSQL, the first reader of a new table row consults the commit log to see if that row is visible or not (did the transaction that created it commit?). It then sets flags in the row (the so-called hint bits) to save the next reader that trouble.
Setting the hint bits modifies the row, so the block is dirtied and has to be written to disk eventually. That writing is normally done by the checkpointer or the background writer, but they couldn't keep up, so the query had to clean out many dirty pages itself.
If you run the query a second time, it will be faster. For that reason, it is a good idea to VACUUM tables after bulk loading, which will also set the hint bits.
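A minimal sketch of that post-load step, using the table names from the plan (quoting "user" because it is a reserved word):
VACUUM ANALYZE info;
VACUUM ANALYZE "user";
VACUUM ANALYZE sets the hint bits, updates the visibility map, and refreshes statistics in one pass, so the first real query no longer has to dirty and write all those pages.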
However, a large query like that will always be slow. Things you can try to speed it up further (a sketch follows this list) are:
have lots of RAM and load the tables into shared buffers with pg_prewarm
crank up work_mem in the hope of getting a faster hash join
CLUSTER the tables using the indexes, so that heap fetches become more efficient
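As a sketch of those three suggestions, assuming the table and index names from the plan (the work_mem value is a placeholder, and CLUSTER takes an exclusive lock and rewrites the table):
-- load both tables into shared buffers (needs enough RAM and the pg_prewarm extension)
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('info');
SELECT pg_prewarm('"user"');

-- allow a bigger in-memory hash for this session before running the INSERT ... SELECT
SET work_mem = '1GB';

-- physically order the heaps by the join keys
CLUSTER info USING user_idx;
CLUSTER "user" USING profile_id;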

Related

Optimizing SELECT count(*) on large table

A basic count on a large table on PostgreSQL 14 with 64 GB RAM & 20 threads. Storage is an NVMe disk.
Questions:
How do I improve this SELECT count query? What kind of optimizations should I look into in the Postgres configuration?
The workers planned is 4 but launched 0, is that normal?
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM public.product;
Finalize Aggregate (cost=2691545.69..2691545.70 rows=1 width=8) (actual time=330901.439..330902.951 rows=1 loops=1)
Buffers: shared hit=1963080 read=1140455 dirtied=1908 written=111146
I/O Timings: read=36692.273 write=6548.923
-> Gather (cost=2691545.27..2691545.68 rows=4 width=8) (actual time=330901.342..330902.861 rows=1 loops=1)
Workers Planned: 4
Workers Launched: 0
Buffers: shared hit=1963080 read=1140455 dirtied=1908 written=111146
I/O Timings: read=36692.273 write=6548.923
-> Partial Aggregate (cost=2690545.27..2690545.28 rows=1 width=8) (actual time=330898.747..330898.757 rows=1 loops=1)
Buffers: shared hit=1963080 read=1140455 dirtied=1908 written=111146
I/O Timings: read=36692.273 write=6548.923
-> Parallel Index Only Scan using points on products (cost=0.57..2634234.99 rows=22524114 width=0) (actual time=0.361..222958.361 rows=90993600 loops=1)
Heap Fetches: 46261956
Buffers: shared hit=1963080 read=1140455 dirtied=1908 written=111146
I/O Timings: read=36692.273 write=6548.923
Planning:
Buffers: shared hit=39 read=8
I/O Timings: read=0.398
Planning Time: 2.561 ms
JIT:
Functions: 4
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.691 ms, Inlining 104.789 ms, Optimization 24.169 ms, Emission 22.457 ms, Total 152.107 ms
Execution Time: 330999.777 ms
The workers planned is 4 but launched 0, is that normal?
It can happen when too many concurrent transactions compete for a limited number of allowed parallel workers. The manual:
The number of background workers that the planner will consider using
is limited to at most max_parallel_workers_per_gather. The
total number of background workers that can exist at any one time is
limited by both max_worker_processes and
max_parallel_workers. Therefore, it is possible for a
parallel query to run with fewer workers than planned, or even with no
workers at all. The optimal plan may depend on the number of workers
that are available, so this can result in poor query performance. If
this occurrence is frequent, consider increasing
max_worker_processes and max_parallel_workers so that more workers
can be run simultaneously or alternatively reducing
max_parallel_workers_per_gather so that the planner requests fewer
workers.
You can also optimize overall performance to free up resources, or get better hardware (in addition to ramping up max_parallel_workers).
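A sketch of what ramping those settings up could look like (the numbers are placeholders; max_worker_processes only takes effect after a server restart):
ALTER SYSTEM SET max_worker_processes = 16;
ALTER SYSTEM SET max_parallel_workers = 16;
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;
SELECT pg_reload_conf();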
What's also troubling:
Heap Fetches: 46261956
For 90993600 rows. That's way too many for comfort. An index-only scan is not supposed to do that many heap fetches.
Both of these symptoms would indicate massive concurrent write access (or long-running transactions hogging resources and keeping autovacuum from doing its job). Look into that, and/or tune per-table autovacuum settings for the table product to be more aggressive, so that column statistics stay valid and the visibility map can keep up. See:
Aggressive Autovacuum on PostgreSQL
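A minimal sketch of such per-table tuning (the thresholds are placeholders that merely illustrate the knobs, not recommendations):
ALTER TABLE public.product SET (
    autovacuum_vacuum_scale_factor  = 0.01,  -- vacuum after ~1% of rows change
    autovacuum_analyze_scale_factor = 0.02,  -- analyze after ~2% of rows change
    autovacuum_vacuum_cost_delay    = 1      -- let autovacuum work through the table faster
);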
Also, with halfway valid table statistics, a (blazingly fast!) estimate might be good enough? See:
Fast way to discover the row count of a table in PostgreSQL

Postgresql performance issue when querying on time range

I'm trying to understand a strange performance issue on Postgres (v10.9).
We have a requests table and I want to get all requests made by a set of particular users in several time ranges. Here is the relevant info:
There is no user_id column in the table. Rather, there is a jsonb column named params, where the user_id field is stored as a string.
The set of users in question is very large, in the thousands.
There is a time column of type timestamptz and it's indexed with a standard BTREE index.
There is also a separate BTREE index on params->>'user_id'.
The queries I am running are based on the following template:
SELECT *
FROM requests
WHERE params->>'user_id' = ANY (VALUES ('id1'), ('id2'), ('id3')...)
AND time > 't1' AND time < 't2'
Where the ids and times here are placeholders for actual ids and times.
I am running a query like this for several consecutive time ranges of 2 weeks each. The queries for the first few time ranges take a couple of minutes each, which is obviously very long in terms of production but OK for research purposes. Then suddenly there is a dramatic spike in query runtime, and they start taking hours each, which begins to be untenable even for offline purposes.
This spike happens in the same range every time. It's worth noting that in this time range there is a 1.5x increase in total requests; that is certainly more than in the previous range, but not enough to warrant a spike of a full order of magnitude.
Here is the output for EXPLAIN ANALYZE for the last time range with the reasonable running time:
Hash Join (cost=442.69..446645.35 rows=986171 width=1217) (actual time=66.305..203593.238 rows=445175 loops=1)
Hash Cond: ((requests.params ->> 'user_id'::text) = "*VALUES*".column1)
-> Index Scan using requests_time_idx on requests (cost=0.56..428686.19 rows=1976888 width=1217) (actual time=14.336..201643.439 rows=2139604 loops=1)
Index Cond: (("time" > '2019-02-12 22:00:00+00'::timestamp with time zone) AND ("time" < '2019-02-26 22:00:00+00'::timestamp with time zone))
-> Hash (cost=439.62..439.62 rows=200 width=32) (actual time=43.818..43.818 rows=29175 loops=1)
Buckets: 32768 (originally 1024) Batches: 1 (originally 1) Memory Usage: 2536kB
-> HashAggregate (cost=437.62..439.62 rows=200 width=32) (actual time=24.887..33.775 rows=29175 loops=1)
Group Key: "*VALUES*".column1
-> Values Scan on "*VALUES*" (cost=0.00..364.69 rows=29175 width=32) (actual time=0.006..10.303 rows=29175 loops=1)
Planning time: 133.807 ms
Execution time: 203697.360 ms
If I understand this correctly, it seems that most of the time is spent on filtering the requests by time range, even though:
The time index seems to be used.
When I run the same queries without the filter on the users (basically just fetching all requests by time range), both time ranges run in acceptable times.
Any thoughts on how to solve this problem would be appreciated, thanks!
Since you are retrieving so many rows, the query will never be really fast.
Unfortunately there is no single index to cover both conditions, but you can use these two:
CREATE INDEX ON requests ((params->>'user_id'));
CREATE INDEX ON requests (time);
Then you can hope for two bitmap index scans that get combined with a “BitmapAnd”.
I am not sure if that will improve performance; PostgreSQL may still opt for the current plan, which is not a bad one. If your indexes are cached or random access to your storage is fast, set effective_cache_size or random_page_cost accordingly; that will make PostgreSQL lean towards an index scan.
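For example, at the session level, with placeholder values that assume the data is mostly cached and random I/O is cheap (SSD):
SET effective_cache_size = '48GB';  -- roughly the memory available for caching
SET random_page_cost = 1.1;         -- close to seq_page_cost when random reads are cheap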

Why are both SELECT count(PK) and SELECT count(*) so slow?

I've got a simple table with a single-column PRIMARY KEY called id, of type serial. There are exactly 100,000,000 rows in there. The table takes 48 GB, the PK index ca. 2.1 GB. The machine it runs on is "dedicated" to Postgres only and is something like a Core i5, 500 GB HDD, 8 GB RAM. The Pg config was created by the pgtune utility (shared buffers ca. 2 GB, effective cache ca. 7 GB). The OS is Ubuntu Server 14.04, Postgres 9.3.6.
Why are both SELECT count(id) and SELECT count(*) so slow in this simple case (ca. 11 minutes)?
Why is the PostgreSQL planner choosing a full table scan instead of an index scan, which should be at least 25 times faster (even in the case where it would have to read the whole index from the HDD)? Or where am I wrong?
By the way, running the query several times in a row doesn't change anything; it's still ca. 11 minutes.
Execution plan here:
Aggregate (cost=7500001.00..7500001.01 rows=1 width=0) (actual time=698316.978..698316.979 rows=1 loops=1)
Buffers: shared hit=192 read=6249809
-> Seq Scan on transaction (cost=0.00..7250001.00 rows=100000000 width=0) (actual time=0.009..680594.049 rows=100000001 loops=1)
Buffers: shared hit=192 read=6249809
Total runtime: 698317.044 ms
Considering that the throughput of an HDD is usually somewhere between 50 MB/s and 100 MB/s, for 48 GB I would expect reading everything to take between 500 and 1000 seconds.
Since you have no WHERE clause, the planner sees that you are interested in the large majority of the records, so it does not use the index. The reason PostgreSQL cannot count from the index alone lies in MVCC, which PostgreSQL uses for transaction consistency: the table rows have to be fetched to check visibility and ensure accurate results. (See https://wiki.postgresql.org/wiki/Slow_Counting)
The cache, CPU, etc. will not affect this, nor will changing the caching settings. This is I/O bound, and the cache will be completely trashed after the query.
If you can live with an approximation you can use the reltuples field in the table metadata:
SELECT reltuples FROM pg_class WHERE relname = 'tbl';
Since this is just a single row this is blazing fast.
Update: since 9.2, a new way of storing visibility information allows index-only counts to happen. However, there are quite a few caveats, especially in the case where there is no predicate to limit the rows; see https://wiki.postgresql.org/wiki/Index-only_scans for more details.
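As a sketch (assuming the visibility map can be kept reasonably current, which is exactly the caveat above), vacuuming the table and re-running the count shows whether an index-only scan is chosen:
VACUUM (ANALYZE, VERBOSE) transaction;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM transaction;
With an up-to-date visibility map, the planner can pick an Index Only Scan on the primary key instead of the sequential scan shown above.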

Why are PostgreSQL queries slower on the first request after a new connection than during subsequent requests?

Why are PostgreSQL queries slower on the first request after a new connection than during subsequent requests?
We use several different technologies to connect to a PostgreSQL database. The first request might take 1.5 seconds; the exact same query will take 0.03 seconds the second time. If I open a second instance of my application (connecting to the same database), that first request takes 1.5 seconds and the second 0.03 seconds.
Because of the different technologies we are using, they connect at different points and use different connection methods, so I really don't think it has anything to do with any code I have written.
I'm thinking that opening a connection doesn't do 'everything' until the first request, so that request has some overhead.
Because I have used the database and kept the server up, everything is in memory, so indexes and the like should not be an issue.
Edit: EXPLAIN tells me about the query, and honestly the query looks pretty good (indexed, etc.). I really think PostgreSQL has some kind of overhead on the first query of a new connection.
I don't know how to prove/disprove that. If I use pgAdmin III (version 1.12.3), all the queries seem fast. With any of the other tools I have, the first query is slow. Most of the time it's not noticeably slower, and if it was, I always chalked it up to loading the index into RAM. But this is clearly NOT that. If I open my tool(s) and run any other query that returns results, the second query is fast regardless. If the first query doesn't return results, then the second is still slow and the third is fast.
Edit 2
Even though I don't think the query has anything to do with the delay (every first query is slow), here are two results from running EXPLAIN ANALYZE:
EXPLAIN ANALYZE
select * from company
where company_id = 39
Output:
"Seq Scan on company (cost=0.00..1.26 rows=1 width=54) (actual time=0.037..0.039 rows=1 loops=1)"
" Filter: (company_id = 39)"
"Total runtime: 0.085 ms"
and:
EXPLAIN ANALYZE
select * from group_devices
where device_name ilike 'html5_demo'
and group_id in ( select group_id from manager_groups
where company_id in (select company_id from company where company_name ='TRUTHPT'))
output:
"Nested Loop Semi Join (cost=1.26..45.12 rows=1 width=115) (actual time=1.947..2.457 rows=1 loops=1)"
" Join Filter: (group_devices.group_id = manager_groups.group_id)"
" -> Seq Scan on group_devices (cost=0.00..38.00 rows=1 width=115) (actual time=0.261..0.768 rows=1 loops=1)"
" Filter: ((device_name)::text ~~* 'html5_demo'::text)"
" -> Hash Semi Join (cost=1.26..7.09 rows=9 width=4) (actual time=0.297..1.596 rows=46 loops=1)"
" Hash Cond: (manager_groups.company_id = company.company_id)"
" -> Seq Scan on manager_groups (cost=0.00..5.53 rows=509 width=8) (actual time=0.003..0.676 rows=469 loops=1)"
" -> Hash (cost=1.26..1.26 rows=1 width=4) (actual time=0.035..0.035 rows=1 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 1kB"
" -> Seq Scan on company (cost=0.00..1.26 rows=1 width=4) (actual time=0.025..0.027 rows=1 loops=1)"
" Filter: ((company_name)::text = 'TRUTHPT'::text)"
"Total runtime: 2.566 ms"
I have observed the same behavior. If I start a new connection and run a query multiple times, the first execution is about 25% slower than the following executions. (This query has been run earlier in other connections, and I have verified that there is no disk I/O involved.) I profiled the process with perf during the first query execution, and a lot of the time turned out to be spent handling page faults.
If I profile the second execution, there are no page faults. AFAICT, these are what are called minor/soft page faults. This happens when a process accesses a page in shared memory for the first time. At that point, the process needs to map the page into its virtual address space (see https://en.wikipedia.org/wiki/Page_fault). If the page needs to be read from disk, it is called a major/hard page fault.
This explanation also fits with other observations that I have made: If I later run a different query in the same connection, the amount of overhead for its first execution seems to depend on how much overlap there is with the data accessed by the first query.
This is a very old question, but hopefully this may help.
First query
It doesn't seem like an index is being used, and the optimizer is resorting to a sequential scan of the table.
While scanning the table sequentially, the optimizer may cache the entire table in RAM, if the data fits into the buffer. See this article for more information.
Why the buffering is occurring for each connection I don't know. Regardless, a sequential scan is not desirable for this kind of query and can be avoided with correct indexing and statistics.
Check the structure of the company table. Make sure that company_id is part of a UNIQUE INDEX or PRIMARY KEY.
Make sure you run ANALYZE, so that the optimizer has the correct statistics. This will help to ensure that the index for company will be used in your queries instead of a sequential scan of the table.
See PostgreSQL documentation
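A minimal sketch of those two steps (this assumes company_id currently has no primary key, which may well not be the case; also note that for a table this small the planner may legitimately still prefer a sequential scan):
ALTER TABLE company ADD PRIMARY KEY (company_id);  -- or a UNIQUE index on company_id
ANALYZE company;                                   -- refresh the planner statistics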
Second query
Try using INNER JOIN to avoid the optimizer selecting a Hash Semi Join, to get more consistent performance and a simpler EXPLAIN plan:
select gd.*
from group_devices gd
inner join manager_groups mg on mg.group_id = gd.group_id
inner join company c on c.company_id = mg.company_id
where gd.device_name like 'html5_demo%'
and c.company_name = 'TRUTHPT';
See related question
The first request will read blocks from disk into the buffers.
The second request will read from the buffers.
It doesn't matter how many connections are made; the result depends on whether that query has already been parsed.
Please note that changing literals will cause the query to be reparsed.
Also note that if the query hasn't been executed in a while, physical reads may still occur, depending on many variables.

Postgres on AWS performance issues on filter or aggregate

I'm working on a system which has a table with approx. 13 million records.
That does not appear to be a big deal for Postgres, but I'm facing serious performance issues when hitting this particular table.
The table has approx. 60 columns (I know it's too many, but I can't change that for reasons beyond my control).
Hardware isn't the problem. It's running on AWS. I tested several configurations, even the new RDS for Postgres:
instance (all 64-bit)   vCPU   ECU    mem (GB)
m1.xlarge               4      8      15
m2.xlarge               2      6.5    17
hs1.8xlarge             16     35     117 (SSD storage)
I tuned the pg settings with pgtune, and also set Ubuntu's kernel shmall and shmmax.
Some "explain analyze" queries:
select count(*) from the table:
Aggregate (cost=350390.93..350390.94 rows=1 width=0) (actual time=24864.287..24864.288 rows=1 loops=1)
-> Index Only Scan using core_eleitor_sexo on core_eleitor (cost=0.00..319722.17 rows=12267505 width=0) (actual time=0.019..12805.292 rows=12267505 loops=1)
Heap Fetches: 9676
Total runtime: 24864.325 ms
select distinct core_eleitor_city from the table:
HashAggregate (cost=159341.37..159341.39 rows=2 width=516) (actual time=15965.740..15966.090 rows=329 loops=1)
-> Bitmap Heap Scan on core_eleitor (cost=1433.53..159188.03 rows=61338 width=516) (actual time=956.590..9021.457 rows=5886110 loops=1)
Recheck Cond: ((core_eleitor_city)::text = 'RIO DE JANEIRO'::text)
-> Bitmap Index Scan on core_eleitor_city (cost=0.00..1418.19 rows=61338 width=0) (actual time=827.473..827.473 rows=5886110 loops=1)
Index Cond: ((core_eleitor_city)::text = 'RIO DE JANEIRO'::text)
Total runtime: 15977.460 ms
I have btree indexes on the columns frequently used for filters or aggregations.
So, given that I can't change my table design, is there something I can do to improve the performance?
Any help would be awesome.
Thanks
You're aggregating ~12.3M and ~5.9M rows on a VPS cluster that, if I am not mistaken, might span multiple physical servers, with data that is probably pulled from a SAN on yet another set of servers, different from the one running Postgres itself.
IMHO, there's little you can do to make it faster (on AWS anyway), besides a) not running queries that visit essentially the entire table to begin with, and b) maintaining a pre-computed count using triggers if you insist on doing so (a sketch follows below).
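A minimal sketch of such a trigger-maintained count (the counter table, function, and trigger names are hypothetical; the single-row counter also serializes concurrent writers, which may matter under heavy load):
CREATE TABLE core_eleitor_rowcount (n bigint NOT NULL);
INSERT INTO core_eleitor_rowcount SELECT count(*) FROM core_eleitor;

CREATE FUNCTION core_eleitor_count_trg() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE core_eleitor_rowcount SET n = n + 1;
    ELSE  -- DELETE
        UPDATE core_eleitor_rowcount SET n = n - 1;
    END IF;
    RETURN NULL;  -- the return value of an AFTER trigger is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER core_eleitor_count
    AFTER INSERT OR DELETE ON core_eleitor
    FOR EACH ROW EXECUTE PROCEDURE core_eleitor_count_trg();

-- the exact count is then a single-row read:
SELECT n FROM core_eleitor_rowcount;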
Here is one way to improve read performance on RDS: read replicas, as described at the link here:
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html
Amazon RDS uses MySQL’s built-in replication functionality to create a special type of DB instance called a read replica from a source DB instance. Updates made to the source DB instance are copied to the read replica. You can reduce the load on your source DB instance by routing read queries from your applications to the read replica. Read replicas allow you to elastically scale out beyond the capacity constraints of a single DB instance for read-heavy database workloads.