Why is PostgreSQL count so slow even with Index Only Scan?

I have a simple count query that can use an Index Only Scan, but it still takes a long time in PostgreSQL.
I have a cars table that includes the columns type (bigint) and active (boolean), and a multi-column index on those two columns:
CREATE TABLE cars
(
id BIGSERIAL NOT NULL
CONSTRAINT cars_pkey PRIMARY KEY ,
type BIGINT NOT NULL ,
name VARCHAR(500) NOT NULL ,
active BOOLEAN DEFAULT TRUE NOT NULL,
created_at TIMESTAMP(0) WITH TIME ZONE default NOW(),
updated_at TIMESTAMP(0) WITH TIME ZONE default NOW(),
deleted_at TIMESTAMP(0) WITH TIME ZONE
);
CREATE INDEX cars_type_active_index ON cars(type, active);
I inserted 950k rows of test data, of which 600k have type = 1:
INSERT INTO cars (type, name) (SELECT 1, 'car-name' FROM generate_series(1,600000));
INSERT INTO cars (type, name) (SELECT 2, 'car-name' FROM generate_series(1,200000));
INSERT INTO cars (type, name) (SELECT 3, 'car-name' FROM generate_series(1,100000));
INSERT INTO cars (type, name) (SELECT 4, 'car-name' FROM generate_series(1,50000));
Let's run VACUUM ANALYZE and force PostgreSQL to use an Index Only Scan:
VACUUM ANALYSE;
SET enable_seqscan = OFF;
SET enable_bitmapscan = OFF;
OK, I have a simple query on type and active
EXPLAIN (VERBOSE, BUFFERS, ANALYSE)
SELECT count(*)
FROM cars
WHERE type = 1 AND active = true;
Result:
Aggregate (cost=24805.70..24805.71 rows=1 width=0) (actual time=4460.915..4460.918 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=2806
-> Index Only Scan using cars_type_active_index on public.cars (cost=0.42..23304.23 rows=600590 width=0) (actual time=0.051..2257.832 rows=600000 loops=1)
Output: type, active
Index Cond: ((cars.type = 1) AND (cars.active = true))
Filter: cars.active
Heap Fetches: 0
Buffers: shared hit=2806
Planning time: 0.213 ms
Execution time: 4461.002 ms
(11 rows)
Looking at the EXPLAIN output:
The query used an Index Only Scan. With an index-only scan, depending on the visibility map, PostgreSQL sometimes needs to fetch from the table heap to check tuple visibility. But I had already run VACUUM ANALYZE, so Heap Fetches is 0, and reading the index alone is enough to answer this query.
The index is quite small and fits entirely in the buffer cache (Buffers: shared hit=2806), so PostgreSQL does not need to fetch pages from disk.
Given that, I can't understand why PostgreSQL takes so long (4.5 s) to answer the query. 1M records is not a large number, everything is already cached in memory, and the index entries are all visible, so no heap access is needed.
PostgreSQL 9.5.10 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
I tested it on Docker 17.09.1-ce, on a 2015 MacBook Pro.
I am still new to PostgreSQL and trying to map my knowledge to real cases.
Thanks so much,

It seems I found the reason: it is not a PostgreSQL problem but a consequence of running in Docker. When I run it directly on my Mac, the query takes around 100 ms, which is fast enough.
Another thing I figured out is why PostgreSQL still prefers a seq scan over an index-only scan (which is why I had to disable enable_seqscan and enable_bitmapscan in my test):
The table is not much bigger than the index. If I add more columns to the table, or the columns get longer, the table grows and the index becomes more likely to be used.
random_page_cost is 4 by default. My disk is quite fast, so I can set it to 1-2, which helps the planner estimate costs more accurately.
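The two observations above can be checked with a short session-level sketch. It assumes the cars table and cars_type_active_index from the question. The value 1.1 for random_page_cost is only an illustrative choice for fast storage, not a figure from the original answer.

```sql
-- Compare the on-disk size of the table heap and of the index.
SELECT pg_size_pretty(pg_relation_size('cars'))                   AS table_size,
       pg_size_pretty(pg_relation_size('cars_type_active_index')) AS index_size;

-- Undo the forced settings from the test so the planner chooses freely.
RESET enable_seqscan;
RESET enable_bitmapscan;

-- Lower the random-I/O cost estimate for this session only (assumed value).
SET random_page_cost = 1.1;

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM cars
WHERE type = 1 AND active = true;

-- Restore the configured default afterwards.
RESET random_page_cost;
```

If the plan switches to the index without any enable_* overrides, the cost setting was likely the deciding factor. If it stays on a seq scan, the small size gap between table and index is the more probable explanation.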

Related

Very slow updates and deletes on unreferenced PostgreSQL table with low row count

I have a PostgreSQL 13 database with a table named cache_record, hosted on Amazon RDS.
This is the table's definition:
CREATE TABLE cache_record
(
key text NOT NULL,
type text NOT NULL,
value bytea NOT NULL,
expiration timestamptz NOT NULL,
created_at timestamptz NOT NULL DEFAULT NOW(),
updated_at timestamptz NOT NULL DEFAULT NOW(),
CONSTRAINT cache_record_pkey PRIMARY KEY (key)
)
WITH (
OIDS = FALSE
);
CREATE INDEX cache_record_expiration_idx
ON cache_record USING btree
(expiration ASC NULLS LAST);
The table itself is not referenced by any foreign key (so no indexing/trigger issue) and only contains ~ 30000 rows. The value field does not exceed 1 MB in length on each row, with less than 50 bytes for 50% of the rows. Normally, DELETEs are performed as such:
DELETE FROM cache_record
WHERE expiration < NOW();
There are ~ 10000 expired rows to delete in the table. But this query takes too long to execute and the batch that runs it times out. So I decided to split it in batches and execute it manually from a shell:
DELETE FROM cache_record
WHERE key IN (SELECT key
FROM cache_record
WHERE expiration < NOW()
ORDER BY created_at
LIMIT 100)
One batch of 100 rows takes ~ 30 s to execute, which is absurd. The nested SELECT itself executes a lot faster than the nesting DELETE (with or without LIMIT).
The query never caused any issue until yesterday, when the CRON batch that is supposed to purge entries from the table started to time out (30 s). It's entirely possible, though, that the query has always been slow but was just under the timeout threshold until yesterday.
What could be causing the slowness?
Edit 2023-01-20
I ran the query using EXPLAIN as suggested in the comments:
EXPLAIN (ANALYSE, BUFFERS) DELETE FROM cache_record WHERE expiration < NOW();
I purged the table yesterday so the query only had a few hits, but it's enough to show the speed issue (> 10 s of execution time):
Delete on cache_record (cost=14.28..501.73 rows=257 width=6) (actual time=10595.107..10595.109 rows=0 loops=1)
Buffers: shared hit=200819 read=43245 dirtied=42783 written=9470
I/O Timings: read=3037.437 write=73.217
-> Bitmap Heap Scan on cache_record (cost=14.28..501.73 rows=257 width=6) (actual time=0.528..29.769 rows=551 loops=1)
Recheck Cond: (expiration < now())
Heap Blocks: exact=88
Buffers: shared hit=10 read=85 dirtied=34 written=21
I/O Timings: read=2.006 write=0.161
-> Bitmap Index Scan on cache_record_expiration_idx (cost=0.00..14.22 rows=257 width=0) (actual time=0.030..0.031 rows=551 loops=1)
Index Cond: (expiration < now())
Buffers: shared hit=7
Planning:
Buffers: shared hit=56
Planning Time: 0.324 ms
Execution Time: 10595.676 ms
Based on the large number of buffers read and dirtied which show up only on the DELETE node, I would say your time is going to maintaining the TOAST table, deleting the huge "value" column. I don't know why it wasn't a problem before, maybe you were naturally deleting only a few records at a time before, or maybe you were principally deleting smaller records before. You said 50% are below 50 bytes, but maybe that 50% is not evenly distributed and you just hit a big slug of large ones.
As for the speed of the select, when you only select the "key" column, it doesn't need to access the TOAST records for the "value" column, so it doesn't spend any time doing so.
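One way to test the TOAST hypothesis is to look at how large the TOAST relation is and how many dead tuples it holds. This is a minimal sketch using the standard catalogs and statistics views. It assumes cache_record is resolvable on the current search_path, and it only reports sizes, so it cannot by itself prove where the DELETE time goes.

```sql
-- Size of the main heap versus its TOAST relation (including the TOAST index).
SELECT c.relname                                     AS table_name,
       t.relname                                     AS toast_table,
       pg_size_pretty(pg_relation_size(c.oid))       AS heap_size,
       pg_size_pretty(pg_total_relation_size(t.oid)) AS toast_size_with_index
FROM pg_class AS c
JOIN pg_class AS t ON t.oid = c.reltoastrelid
WHERE c.oid = 'cache_record'::regclass;

-- Live and dead tuple counts for the TOAST relation, plus last autovacuum time.
SELECT s.relname, s.n_live_tup, s.n_dead_tup, s.last_autovacuum
FROM pg_stat_all_tables AS s
WHERE s.relid = (SELECT reltoastrelid
                 FROM pg_class
                 WHERE oid = 'cache_record'::regclass);
```

A TOAST relation that is far larger than the heap, or that carries many dead tuples, would be consistent with the explanation above.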

SQL - can search performance depend on amount of columns?

I have something like the following table
CREATE TABLE mytable
(
id serial NOT NULL,
search_col int4 NULL,
a1 varchar NULL,
a2 varchar NULL,
...
a50 varchar NULL,
CONSTRAINT mytable_pkey PRIMARY KEY (id)
);
CREATE INDEX search_col_idx ON mytable USING btree (search_col);
This table has approximately 5 million rows and it takes about 15 seconds to perform a search operation like
select *
from mytable
where search_col = 83310
It is crucial for me to increase performance, but even clustering the table on search_col did not bring a major benefit.
However, I tried the following:
create table test as (select id, search_col, a1 from mytable);
A search on this table, which has the same number of rows as the original one, takes approximately 0.2 seconds. Why is that, and how can I use this for what I need?
Index Scan using search_col_idx on mytable (cost=0.43..2713.83 rows=10994 width=32802) (actual time=0.021..13.015 rows=12018 loops=1)
Seq Scan on test (cost=0.00..95729.46 rows=12347 width=19) (actual time=0.246..519.501 rows=12018 loops=1)
The result of DBeaver's Execution Plan
|Node Type|Entity|Cost|Rows|Time|Condition|
|Index Scan|mytable|0.43 - 3712.86|12018|13.141|(search_col = 83310)|
Execution Plan from psql:
Index Scan using mytable_search_col_idx on mytable (cost=0.43..3712.86 rows=15053 width=32032) (actual time=0.015..13.889 rows=12018 loops=1)
Index Cond: (search_col = 83310)
Planning time: 0.640 ms
Execution time: 23.910 ms
(4 rows)
One way that the columns would impact the timing would be if the columns were large. Really large.
In most cases, a row resides on a single data page. The index points to the page and the size of the row has little impact on the timing, because the timing is dominated by searching the index and fetching the row.
However, if the columns are really large, then that can require reading many more bytes from disk, which takes more time.
That said, another possibility is that the statistics are out-of-date and the index isn't being used on the first query.
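Both possibilities can be probed with a few statements. This is a sketch that assumes the mytable and test tables from the question. Note that EXPLAIN ANALYZE does not send result rows to the client, so for very wide rows it may report a much shorter time than the application sees.

```sql
-- Refresh planner statistics in case they are out of date.
ANALYZE mytable;

-- Compare total table sizes, including TOAST storage for large values.
SELECT pg_size_pretty(pg_table_size('mytable')) AS mytable_size,
       pg_size_pretty(pg_table_size('test'))    AS test_size;

-- Fetch only the narrow columns from the wide table and time it.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, search_col, a1
FROM mytable
WHERE search_col = 83310;
```

If the narrow select on mytable is fast while select * is slow, the time is probably going into reading and transferring the large column values rather than into the index lookup.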

Why isn't Postgres using the index with Distinct?

I have this table:
CREATE TABLE public.prodhistory (
curve_id int4 NOT NULL,
start_prod_date date NOT NULL,
prod_date date NOT NULL,
monthly_prod_rate float4 NOT NULL,
eff_date timestamp NOT NULL,
/* Keys */
CONSTRAINT prodhistorypk
PRIMARY KEY (curve_id, prod_date, start_prod_date, eff_date),
/* Foreign keys */
CONSTRAINT prodhistory2typecurves_fk
FOREIGN KEY (curve_id)
REFERENCES public.typecurves(curve_id)
) WITH (
OIDS = FALSE
);
CREATE INDEX prodhistory_idx_curve_id01
ON public.prodhistory
(curve_id);
with ~42M rows.
And I execute this query:
SELECT DISTINCT curve_id FROM prodhistory
Which I expect would be very quick, given the index. But no, 270 secs. So I explain, and I get:
HashAggregate (cost=824870.03..824873.08 rows=305 width=4) (actual time=211834.018..211834.097 rows=315 loops=1)
Output: curve_id
Group Key: prodhistory.curve_id
-> Seq Scan on public.prodhistory (cost=0.00..718003.22 rows=42746722 width=4) (actual time=12.751..200826.299 rows=43218808 loops=1)
Output: curve_id
Planning time: 0.115 ms
Execution time: 211848.137 ms
I'm not too experienced in reading these plans, but a Seq Scan on the DB seems bad.
Any thoughts? I'm sort of stumped.
This plan is chosen because PostgreSQL thinks it is cheaper.
You can compare by setting
SET enable_seqscan=off;
and then re-running your EXPLAIN (ANALYZE) statement. Compare cost and actual time in both cases and check if PostgreSQL estimated correctly or not.
If you find that using an Index Scan or Index Only Scan is actually cheaper, you could consider twiddling the cost parameters to match your machine better, e.g. lower random_page_cost or cpu_index_tuple_cost or raise cpu_tuple_cost.
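The comparison described above can be kept contained by using SET LOCAL inside a transaction, so the settings revert automatically. This is a sketch against the prodhistory table from the question. The random_page_cost value of 2 is an assumed example, not a tuned recommendation.

```sql
-- Plan and timing with sequential scans discouraged.
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT curve_id FROM prodhistory;
ROLLBACK;

-- Plan and timing with a lower random-I/O cost estimate (assumed value).
BEGIN;
SET LOCAL random_page_cost = 2;
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT curve_id FROM prodhistory;
ROLLBACK;
```

Comparing the estimated cost and the actual time of each run against the original plan shows whether the planner's default estimates fit the machine.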
PostgreSQL "index only scans" aren't always as cheap as you might think.
The reason is that each row needs to be checked as to whether it is visible to the MVCC snapshot or not.
Whether this is cheap or not depends on the table's visibility map.
If you force an index only scan (as per laurenz-albe's answer):
SET enable_seqscan=off;
Then run your query with:
EXPLAIN (ANALYZE ON, BUFFERS ON)
If the query plan output shows a nonzero "Heap Fetches" value, as below, the table's actual row data is being accessed, not just the index.
Index Only Scan using my_index on my_table (cost=0.42..17792.01 rows=595195 width=20) (actual time=37.942..2330.737 rows=539105 loops=1)
Heap Fetches: 234180
The official documentation describes this here:
https://www.postgresql.org/docs/current/indexes-index-only-scans.html
You may be able to resolve this by altering the way the table is updated, or by adjusting your auto vacuum settings.
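A minimal sketch of checking and improving visibility-map coverage follows. It uses my_table as the same placeholder name that appears in the plan above, and the scale factor 0.01 is an illustrative value only.

```sql
-- Vacuum the table so that all-visible pages are marked in the visibility map.
VACUUM (ANALYZE) my_table;

-- relallvisible close to relpages means most pages can skip the heap check.
SELECT relname, relpages, relallvisible
FROM pg_class
WHERE relname = 'my_table';

-- Make autovacuum trigger earlier on this table (assumed example value).
ALTER TABLE my_table SET (autovacuum_vacuum_scale_factor = 0.01);
```

After the vacuum, re-running the EXPLAIN should show a lower "Heap Fetches" count if the visibility map was the bottleneck.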

Why is PostgreSQL not using *just* the covering index in this query depending on the contents of its IN() clause?

I have a table with a covering index that should respond to a query using just the index, without checking the table at all. Postgres does, in fact, do that, if the IN() clause has 1 or a few elements in it. However, if the IN clause has lots of elements, it seems like it's doing the search on the index, and then going to the table and re-checking the conditions...
I can't figure out why Postgres would do that. It can either serve the query straight from the index or it can't, why would it go to the table if it (in theory) doesn't have anything else to add?
The table:
CREATE TABLE phone_numbers
(
id serial NOT NULL,
phone_number character varying,
hashed_phone_number character varying,
user_id integer,
created_at timestamp without time zone,
updated_at timestamp without time zone,
ghost boolean DEFAULT false,
CONSTRAINT phone_numbers_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
CREATE INDEX index_phone_numbers_covering_hashed_ghost_and_user
ON phone_numbers
USING btree
(hashed_phone_number COLLATE pg_catalog."default", ghost, user_id);
The query I'm running is :
SELECT "phone_numbers"."user_id"
FROM "phone_numbers"
WHERE "phone_numbers"."hashed_phone_number" IN (*several numbers*)
AND "phone_numbers"."ghost" = 'f'
As you can see, the index has all the fields it needs to reply to that query.
And if I have only one or a few numbers in the IN clause, it does:
1 number:
Index Scan using index_phone_numbers_on_hashed_phone_number on phone_numbers (cost=0.41..8.43 rows=1 width=4)
Index Cond: ((hashed_phone_number)::text = 'bebd43a6eb29b2fda3bcb63dcc7ffaf5433e78660ccd1a495c1180a3eaaf6b6a'::text)
Filter: (NOT ghost)
3 numbers:
Index Only Scan using index_phone_numbers_covering_hashed_ghost_and_user on phone_numbers (cost=0.42..17.29 rows=1 width=4)
Index Cond: ((hashed_phone_number = ANY ('{8228a8116f1fdb12e243102cb85ecd859ebf7873d9332dce5f1343a481ec72e8,43ddeebdca2ea829d468d5debc84d475c8322cf4bf6edca286c918b04216387e,1578bf773eb6eb8a9b57a130922a28c9c91f1bda67202ef5936b39630ca4cfe4}'::text[])) AND (...)
Filter: (NOT ghost)
However, when I have a lot of numbers in the IN clause, Postgres is using the Index, but then hitting the table, and I don't know why:
Bitmap Heap Scan on phone_numbers (cost=926.59..1255.81 rows=106 width=4)
Recheck Cond: ((hashed_phone_number)::text = ANY ('{b6459ce58f21d99c462b132cce7adc9ea947fa522a3849321e9fb65893006a5e,8228a8116f1fdb12e243102cb85ecd859ebf7873d9332dce5f1343a481ec72e8,ab3554acc1f287bb2e22ff20bb855e19a4177ef552676689d217dbb2a1a6177b,7ec9f58 (...)
Filter: (NOT ghost)
-> Bitmap Index Scan on index_phone_numbers_covering_hashed_ghost_and_user (cost=0.00..926.56 rows=106 width=0)
Index Cond: (((hashed_phone_number)::text = ANY ('{b6459ce58f21d99c462b132cce7adc9ea947fa522a3849321e9fb65893006a5e,8228a8116f1fdb12e243102cb85ecd859ebf7873d9332dce5f1343a481ec72e8,ab3554acc1f287bb2e22ff20bb855e19a4177ef552676689d217dbb2a1a6177b,7e (...)
This currently makes this query, which looks for 250 records in a table with 50k total rows, about twice as slow as a similar query on another table, which looks for 250 records in a table with 5 million rows. That doesn't make much sense.
Any ideas what could be happening, and whether I can do anything to improve this?
UPDATE: Changing the order of the columns in the covering index to have first ghost and then hashed_phone_number also doesn't solve it:
Bitmap Heap Scan on phone_numbers (cost=926.59..1255.81 rows=106 width=4)
Recheck Cond: ((hashed_phone_number)::text = ANY ('{b6459ce58f21d99c462b132cce7adc9ea947fa522a3849321e9fb65893006a5e,8228a8116f1fdb12e243102cb85ecd859ebf7873d9332dce5f1343a481ec72e8,ab3554acc1f287bb2e22ff20bb855e19a4177ef552676689d217dbb2a1a6177b,7ec9f58 (...)
Filter: (NOT ghost)
-> Bitmap Index Scan on index_phone_numbers_covering_ghost_hashed_and_user (cost=0.00..926.56 rows=106 width=0)
Index Cond: ((ghost = false) AND ((hashed_phone_number)::text = ANY ('{b6459ce58f21d99c462b132cce7adc9ea947fa522a3849321e9fb65893006a5e,8228a8116f1fdb12e243102cb85ecd859ebf7873d9332dce5f1343a481ec72e8,ab3554acc1f287bb2e22ff20bb855e19a4177ef55267668 (...)
The choice of indexes is based on what the optimizer says is the best solution for the query. Postgres is trying really hard with your index, but it is not the best index for the query.
The best index has ghost first:
CREATE INDEX index_phone_numbers_covering_hashed_ghost_and_user
ON phone_numbers
USING btree
(ghost, hashed_phone_number COLLATE pg_catalog."default", user_id);
I happen to think that MySQL documentation does a good job of explaining how composite indexes are used.
Essentially, what is happening is that Postgres needs to do an index seek for every element of the in list. This may be compounded by the use of strings -- because collations/encodings affect the comparisons. Eventually, Postgres decides that other approaches are more efficient. If you put ghost first, then it will just jump to the right part of the index and find the rows it needs there.
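To see whether the bitmap plan is really cheaper than an index-only scan for a long IN list, the two can be compared directly. This is a sketch that assumes the phone_numbers table from the question. The three hash literals are hypothetical placeholders standing in for the real list of values.

```sql
-- Refresh statistics and the visibility map that index-only scans rely on.
VACUUM (ANALYZE) phone_numbers;

-- Discourage bitmap scans for one transaction and look at the alternative plan.
BEGIN;
SET LOCAL enable_bitmapscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id
FROM phone_numbers
WHERE hashed_phone_number IN ('hash-1', 'hash-2', 'hash-3')  -- placeholder values
  AND ghost = 'f';
ROLLBACK;
```

If the forced plan is an Index Only Scan with a lower actual time, the planner's cost estimate for the bitmap plan is off for this data. If it is slower, the optimizer's choice was reasonable.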

Index doesn't improve performance

I have a simple table structure in my postgres database:
CREATE TABLE device
(
id bigint NOT NULL,
version bigint NOT NULL,
device_id character varying(255),
date_created timestamp without time zone,
last_updated timestamp without time zone,
CONSTRAINT device_pkey PRIMARY KEY (id )
)
I often query data by the device_id column. The table has 3.5 million rows, which leads to performance issues:
"Seq Scan on device (cost=0.00..71792.70 rows=109 width=8) (actual time=352.725..353.445 rows=2 loops=1)"
" Filter: ((device_id)::text = '352184052470420'::text)"
"Total runtime: 353.463 ms"
Hence I've created index on device_id column:
CREATE INDEX device_device_id_idx
ON device
USING btree
(device_id );
However, my problem is that the database still uses a sequential scan, not an index scan. The query plan after creating the index is the same:
"Seq Scan on device (cost=0.00..71786.33 rows=109 width=8) (actual time=347.133..347.508 rows=2 loops=1)"
" Filter: ((device_id)::text = '352184052470420'::text)"
"Total runtime: 347.538 ms"
The query returns 2 rows, so I'm not selecting a big portion of the table. I don't really understand why the index is disregarded. What can I do to improve the performance?
edit:
My query:
select id from device where device_id ='357560051102491A';
I've run ANALYZE on the device table, which didn't help.
device_id can also contain letters.
You may need to look at the queries. To use an index, the queries need to be sargable. That means certain ways of constructing queries are better than others. I am not familiar with PostgreSQL, but in SQL Server this would include such things (a very small sample of the bad constructs to avoid):
Not doing data transformations in the join - instead store the data properly
Not using correlated subqueries - use derived tables or temp tables instead
Not using OR conditions - use UNION ALL instead
Your first step should be to get a good book on performance tuning for your specific database. It will talk about what constructions to avoid for your particular database engine.
Indexes are not used when you cast a column to a different type:
((device_id)::text = '352184052470420'::text)
Instead, you can do this way:
(device_id = ('352184052470420'::character varying))
(or maybe you can change device_id to TEXT in the original table, if you wish.)
Also, remember to run ANALYZE device after the index has been created, or it will not be used anyway.
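A short sketch of the checks suggested here, using the device table and the query from the question. The pg_index lookup is an additional assumption on my part: it only confirms that the new index is marked valid and ready, which is one precondition for the planner to consider it.

```sql
-- Refresh planner statistics after creating the index.
ANALYZE device;

-- Confirm the index exists and is usable by the planner.
SELECT indexrelid::regclass AS index_name, indisvalid, indisready
FROM pg_index
WHERE indrelid = 'device'::regclass;

-- Re-check the plan for the actual query.
EXPLAIN ANALYZE
SELECT id FROM device WHERE device_id = '357560051102491A';
```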
It seems that time resolves everything. I'm not sure what happened, but currently it's working fine.
Since posting this question I haven't changed anything, and now I get this query plan:
"Bitmap Heap Scan on device (cost=5.49..426.77 rows=110 width=166)"
" Recheck Cond: ((device_id)::text = '357560051102491'::text)"
" -> Bitmap Index Scan on device_device_id_idx (cost=0.00..5.46 rows=110 width=0)"
" Index Cond: ((device_id)::text = '357560051102491'::text)"
Time breakdown (timezone GMT+2):
~15:50 I've created the index
~16:00 I've dropped and recreated the index several times, since it was not working
16:05 I've run analyse device (didn't help)
16:44:49 from app server request_log, I can see that the requests executing the query are still taking around 500 ms
16:56:59 I can see first request, which takes 23 ms (the index started to work!)
The question remains: why did it take around 1 hour 10 minutes for the index to be applied? When I was creating indexes in the same database a few days ago, the changes were immediate.