Postgres: huge table with (delayed) read and write access

I have a huge table (currently ~3mil rows, expected to increase by a factor of 1000) with lots of inserts every second. The table is never updated.
Now I have to run queries on that table which are pretty slow (as expected). These queries do not have to be 100% accurate, it is ok if the result is a day old (but not older).
There are currently two indexes on two single integer columns, and I would have to add two more indexes (on integer and timestamp columns) to speed up my queries.
The ideas I had so far:
1. Add the two missing indexes to the table.
2. Keep no indexes on the huge table at all and, as a daily task, copy the content (just the important rows) to a second table, then create the indexes on that second table and run the queries against it.
3. Partition the huge table.
4. Use a master/slave setup (writing to the master and reading from the slaves).
What option is the best in terms of performance? Do you have any other suggestions?
EDIT:
Here is the table (I have marked the foreign keys and prettified the query a bit):
CREATE TABLE client_log
(
    id serial NOT NULL,
    logid integer NOT NULL,
    client_id integer NOT NULL, -- FOREIGN KEY
    client_version varchar(16),
    sessionid varchar(100) NOT NULL,
    created timestamptz NOT NULL,
    filename varchar(256),
    funcname varchar(256),
    linenum integer,
    comment text,
    domain varchar(128),
    code integer,
    latitude float8,
    longitude float8,
    created_on_server timestamptz NOT NULL,
    message_id integer, -- FOREIGN KEY
    app_id integer NOT NULL, -- FOREIGN KEY
    result integer
);
CREATE INDEX client_log_code_idx ON client_log USING btree (code);
CREATE INDEX client_log_created_idx ON client_log USING btree (created);
CREATE INDEX clients_clientlog_app_id ON client_log USING btree (app_id);
CREATE INDEX clients_clientlog_client_id ON client_log USING btree (client_id);
CREATE UNIQUE INDEX clients_clientlog_logid_client_id_key ON client_log USING btree (logid, client_id);
CREATE INDEX clients_clientlog_message_id ON client_log USING btree (message_id);
And an example query:
SELECT
client_log.comment,
COUNT(client_log.comment) AS count
FROM
client_log
WHERE
client_log.app_id = 33 AND
client_log.code = 3 AND
client_log.client_id IN (SELECT client.id FROM client WHERE
client.app_id = 33 AND
client."replaced_id" IS NULL)
GROUP BY client_log.comment ORDER BY count DESC;
client_log_code_idx is the index needed for the query above. There are other queries that need the client_log_created_idx index.
And the query plan:
Sort (cost=2844.72..2844.75 rows=11 width=242) (actual time=4684.113..4684.180 rows=70 loops=1)
Sort Key: (count(client_log.comment))
Sort Method: quicksort Memory: 32kB
-> HashAggregate (cost=2844.42..2844.53 rows=11 width=242) (actual time=4683.830..4683.907 rows=70 loops=1)
-> Hash Semi Join (cost=1358.52..2844.32 rows=20 width=242) (actual time=303.515..4681.211 rows=1202 loops=1)
Hash Cond: (client_log.client_id = client.id)
-> Bitmap Heap Scan on client_log (cost=1108.02..2592.57 rows=387 width=246) (actual time=113.599..4607.568 rows=6962 loops=1)
Recheck Cond: ((app_id = 33) AND (code = 3))
-> BitmapAnd (cost=1108.02..1108.02 rows=387 width=0) (actual time=104.955..104.955 rows=0 loops=1)
-> Bitmap Index Scan on clients_clientlog_app_id (cost=0.00..469.96 rows=25271 width=0) (actual time=58.315..58.315 rows=40662 loops=1)
Index Cond: (app_id = 33)
-> Bitmap Index Scan on client_log_code_idx (cost=0.00..637.61 rows=34291 width=0) (actual time=45.093..45.093 rows=36310 loops=1)
Index Cond: (code = 3)
-> Hash (cost=248.06..248.06 rows=196 width=4) (actual time=61.069..61.069 rows=105 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 4kB
-> Bitmap Heap Scan on client (cost=10.95..248.06 rows=196 width=4) (actual time=27.843..60.867 rows=105 loops=1)
Recheck Cond: (app_id = 33)
Filter: (replaced_id IS NULL)
Rows Removed by Filter: 271
-> Bitmap Index Scan on clients_client_app_id (cost=0.00..10.90 rows=349 width=0) (actual time=15.144..15.144 rows=380 loops=1)
Index Cond: (app_id = 33)
Total runtime: 4684.843 ms

In general, in a system where time related data is constantly being inserted into the database, I'd recommend partitioning according to time.
This is not just because it might improve query times, but because otherwise it makes managing the data difficult. However big your hardware is, it will have a limit to its capacity, so you will eventually have to start removing rows that are older than a certain date. The rate at which you remove the rows will have to be equal to the rate they are going in.
If you just have one big table, and you remove old rows using DELETE, you will leave a lot of dead tuples that need to be vacuumed out. The autovacuum will be running constantly, using up valuable disk IO.
On the other hand, if you partition according to time, then removing out of date data is as easy as dropping the relevant child table.
In terms of indexes: with inheritance-based partitioning, indexes are not inherited by the child tables, so you can defer creating them until after a partition is loaded. In your use case you could have a partition size of 1 day. This means the indexes do not need to be constantly updated as data is being inserted, which makes it more practical to add whatever indexes your queries need.
Your sample query does not filter on the 'created' time field, but you say other queries do. If you partition by time, and are careful about how you construct your queries, constraint exclusion will kick in and it will only include the specific partitions that are relevant to the query.
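As a rough illustration of the time-based partitioning described above, here is a minimal sketch. It assumes PostgreSQL 10 or later and uses declarative partitioning rather than the inheritance-based child tables the answer refers to. The table name client_log_part, the partition names, the dates, and the abbreviated column list are all hypothetical.

```sql
-- Hypothetical partitioned variant of client_log (columns abbreviated).
CREATE TABLE client_log_part (
    client_id integer NOT NULL,
    app_id    integer NOT NULL,
    code      integer,
    comment   text,
    created   timestamptz NOT NULL
) PARTITION BY RANGE (created);

-- One partition per day (example dates).
CREATE TABLE client_log_2024_01_01 PARTITION OF client_log_part
    FOR VALUES FROM ('2024-01-01') TO ('2024-01-02');
CREATE TABLE client_log_2024_01_02 PARTITION OF client_log_part
    FOR VALUES FROM ('2024-01-02') TO ('2024-01-03');

-- Index a partition only once its day is over and it no longer receives inserts.
CREATE INDEX ON client_log_2024_01_01 (app_id, code);

-- A filter on the partition key lets the planner skip irrelevant partitions.
EXPLAIN
SELECT count(*)
FROM client_log_part
WHERE created >= '2024-01-02' AND created < '2024-01-03';

-- Retiring old data is a cheap metadata operation instead of a large DELETE.
DROP TABLE client_log_2024_01_01;
```

Whether daily partitions are the right granularity depends on the retention period and on how many partitions a typical query has to touch.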

Besides partitioning, I would consider splitting the table into many tables, also known as sharding.
I don't have the full picture of your domain but these are some suggestions:
Each client gets their own table in their own schema (or a set of clients shares a schema, depending on how many clients you have and how many new clients you expect to get).
create table client1.log(id, logid,.., code, app_id);
create table client2.log(id, logid,.., code, app_id);
Splitting the table like this should also reduce the contention on inserts.
The table can be split even more. Within each client-schema you can also split the table per "code" or "app_id" or something else that makes sense for you. This might be overdoing it but it is easy to implement if the number of "code" and/or "app_id" values do not change often.
Do keep the code/app_id columns even in the new smaller tables but do put a constraint on the column so that no other type of log record can be inserted. The constraint will also help the optimiser when searching, see this example:
create schema client1;
set search_path = 'client1';
create table error_log(id serial, code text check(code ='error'));
create table warning_log(id serial, code text check(code ='warning'));
create table message_log(id serial, code text check(code ='message'));
To get the full picture (all rows) of a client you can use a view on top of all tables:
create view client_log as
select * from error_log
union all
select * from warning_log
union all
select * from message_log;
The check constraints should allow the optimiser to only search the table where the "code" can exist.
explain
select * from client_log where code = 'error';
-- Output
Append (cost=0.00..25.38 rows=6 width=36)
-> Seq Scan on error_log (cost=0.00..25.38 rows=6 width=36)
Filter: (code = 'error'::text)

Speeding up sorting with filtering by joined column

I have the following database schema which is the base layer of an NFT marketplace I'm working on:
CREATE TABLE collections (
id TEXT PRIMARY KEY
);
-- Implicit collection_id -> collections fk
CREATE TABLE tokens (
contract BYTEA NOT NULL,
token_id NUMERIC NOT NULL,
collection_id TEXT,
PRIMARY KEY(contract, token_id)
);
CREATE INDEX tokens_collection_id_contract_token_index
ON tokens (collection_id, contract, token_id);
-- Implicit collection_id -> collections fk
CREATE TABLE attribute_keys (
id BIGSERIAL PRIMARY KEY,
collection_id TEXT NOT NULL,
key TEXT NOT NULL
);
-- Implicit attribute_key_id -> attribute_keys fk
CREATE TABLE attributes (
id BIGSERIAL PRIMARY KEY,
attribute_key_id BIGINT NOT NULL,
value TEXT NOT NULL
);
-- Implicit contract, token_id -> tokens fk
-- collection_id, key, value denormalized from the other attribute tables
CREATE TABLE token_attributes (
attribute_id BIGINT NOT NULL,
contract BYTEA NOT NULL,
token_id NUMERIC NOT NULL,
collection_id TEXT NOT NULL,
key TEXT NOT NULL,
value TEXT NOT NULL,
PRIMARY KEY(contract, token_id, attribute_id)
);
CREATE INDEX token_attributes_collection_id_key_value_contract_token_id_index
ON token_attributes (collection_id, key, value, contract, token_id);
-- Implicit address, token_id -> tokens fk
CREATE TABLE nft_transfer_events (
id BIGSERIAL PRIMARY KEY,
block INT NOT NULL,
address BYTEA NOT NULL,
token_id NUMERIC NOT NULL
);
CREATE INDEX nft_transfer_events_contract_token_id_block_index
ON nft_transfer_events (address, token_id, block DESC);
Two of the main use-cases we have are getting the latest events for a collection and getting the latest events for all tokens within a collection that match the given attribute filter. Unfortunately, both of these queries are quite inefficient since they involve filtering by columns in joined tables.
Here's the query to filter transfer events to a particular collection:
# explain analyze select * from nft_transfer_events nte join tokens t on nte.address = t.contract and nte.token_id = t.token_id where t.collection_id = '0xbc4ca0eda7647a8ab7c2061c2e118a18a936f13d' order by block desc limit 20;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=43312.03..43314.36 rows=20 width=1071) (actual time=52990.391..53040.926 rows=20 loops=1)
-> Gather Merge (cost=43312.03..180799.76 rows=1178384 width=1071) (actual time=52990.388..53040.919 rows=20 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=42312.00..43784.98 rows=589192 width=1071) (actual time=52789.050..52789.056 rows=16 loops=3)
Sort Key: nte.block DESC
Sort Method: top-N heapsort Memory: 49kB
Worker 0: Sort Method: top-N heapsort Memory: 47kB
Worker 1: Sort Method: top-N heapsort Memory: 43kB
-> Nested Loop (cost=1.26..26633.82 rows=589192 width=1071) (actual time=0.316..52759.414 rows=20586 loops=3)
-> Parallel Index Scan using tokens_collection_id_contract_token_id_index on tokens t (cost=0.69..12539.19 rows=5065 width=907) (actual time=0.137..343.623 rows=3333 loops=3)
Index Cond: (collection_id = '0xbc4ca0eda7647a8ab7c2061c2e118a18a936f13d'::text)
-> Index Scan using nft_transfer_events_address_token_id_block_index on nft_transfer_events nte (cost=0.57..2.77 rows=1 width=164) (actual time=3.580..15.719 rows=6 loops=10000)
Index Cond: ((address = t.contract) AND (token_id = t.token_id))
Planning Time: 12.243 ms
Execution Time: 53041.192 ms
(16 rows)
And here's the query to filter transfer events to tokens in a collection having a particular attribute (this can get even messier when I need to filter by multiple attributes - which would involve multiple joins on the token_attributes table):
# explain analyze select * from nft_transfer_events nte join token_attributes ta on nte.address = ta.contract and nte.token_id = ta.token_id where ta.collection_id = '0xbc4ca0eda7647a8ab7c2061c2e118a18a936f13d' and ta.key = 'Fur' and ta.value = 'Tan' order by block desc limit 20;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=5.72..5.73 rows=1 width=258) (actual time=18173.114..18173.122 rows=20 loops=1)
-> Sort (cost=5.72..5.73 rows=1 width=258) (actual time=18173.112..18173.117 rows=20 loops=1)
Sort Key: nte.block DESC
Sort Method: top-N heapsort Memory: 45kB
-> Nested Loop (cost=1.26..5.71 rows=1 width=258) (actual time=0.164..18166.759 rows=3974 loops=1)
-> Index Scan using token_attributes_collection_id_key_value_contract_token_id_index on token_attributes ta (cost=0.69..2.91 rows=1 width=94) (actual time=0.098..524.036 rows=626 loops=1)
Index Cond: ((collection_id = '0xbc4ca0eda7647a8ab7c2061c2e118a18a936f13d'::text) AND (key = 'Fur'::text) AND (value = 'Tan'::text))
-> Index Scan using nft_transfer_events_address_token_id_block_index on nft_transfer_events nte (cost=0.57..2.79 rows=1 width=164) (actual time=6.216..28.175 rows=6 loops=626)
Index Cond: ((address = ta.contract) AND (token_id = ta.token_id))
Planning Time: 61.328 ms
Execution Time: 18173.249 ms
What I'm looking for are ways to speed up these two particular queries. I would even be open to redesigning the schema to better fit these two use-cases.
In case it might be relevant, the table sizes are as follows:
nft_transfer_events: ~100 million rows
tokens: ~50 million rows
token_attributes: ~200 million rows
Thank you in advance!
In the title you mention speeding up sorting, but the sort itself takes a trivial amount of time. It is waiting for the inputs to the sort that takes all the time. And most of that is consumed at the bottom of the nested loop:
-> Index Scan using nft_transfer_events_address_token_id_block_index on nft_transfer_events nte (cost=0.57..2.77 rows=1 width=164) (actual time=3.580..15.719 rows=6 loops=10000)
Index Cond: ((address = t.contract) AND (token_id = t.token_id))
Multiplying loops by actual time (per loop) gets 157 seconds. Since it is happening over 3 processes in parallel, you can divide by 3 to see that this one thing is almost the entire amount of time spent on the query.
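The arithmetic above can be checked directly, using the figures from the plan node (loops=10000, upper actual time 15.719 ms per loop, 3 parallel processes):

```sql
SELECT 10000 * 15.719 / 1000     AS total_seconds,        -- about 157 s
       10000 * 15.719 / 1000 / 3 AS seconds_per_process;  -- about 52 s
```

The per-process figure is close to the roughly 53 seconds of total execution time reported by the plan.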
The index is well-suited to the way it is being used, so I have to assume it is bottlenecked by IO. For each loop it has to jump to a random index leaf page to find the indexed data, then to 6 random table pages to get the remaining data. The only data in the table which is not in the index is nft_transfer_events.id. So if you either added "id" to the end of the indexed columns in that index (by recreating the index), or refrained from selecting nft_transfer_events.id in the first place (by replacing * with just the list of columns you need), you should be able to get an index-only scan, which should be roughly 7 times faster if the table is well vacuumed.
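A minimal sketch of both options follows. The new index name is hypothetical, and the column list in the query is an assumption about which columns are actually needed.

```sql
-- Option 1: a new index that also carries id, so "SELECT nte.*" can be
-- answered from the index alone. CONCURRENTLY avoids blocking writes but
-- cannot be run inside a transaction block.
CREATE INDEX CONCURRENTLY nft_transfer_events_address_token_id_block_id_index
    ON nft_transfer_events (address, token_id, block DESC, id);

-- Option 2: keep the existing index and select only columns it contains.
SELECT nte.address, nte.token_id, nte.block
FROM nft_transfer_events nte
JOIN tokens t
  ON nte.address = t.contract AND nte.token_id = t.token_id
WHERE t.collection_id = '0xbc4ca0eda7647a8ab7c2061c2e118a18a936f13d'
ORDER BY nte.block DESC
LIMIT 20;

-- Index-only scans rely on the visibility map, which VACUUM maintains.
VACUUM (ANALYZE) nft_transfer_events;
```

Running the query under EXPLAIN (ANALYZE, BUFFERS) afterwards shows whether an Index Only Scan is chosen and how many heap fetches remain.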
If you can't do that because there are more columns in the table you need that you didn't tell us about, one way to speed it up would be to CLUSTER nft_transfer_events on the index. That way the reads into the table would not be random, but would hit the same block over and over because the rows for related NFTs would be next to each other. The problem here is that CLUSTER takes a strong lock on the table, and has to be repeated occasionally as it only clusters existing data, not future data.
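For reference, the CLUSTER step might look like the sketch below. The index name is taken from the query plan; the name in the posted DDL differs slightly, so use whichever name actually exists in the database.

```sql
-- Rewrites the table in index order. This takes an ACCESS EXCLUSIVE lock,
-- so reads and writes on the table are blocked until it finishes.
CLUSTER nft_transfer_events
    USING nft_transfer_events_address_token_id_block_index;

-- Refresh planner statistics after the rewrite.
ANALYZE nft_transfer_events;
```

Later runs of a plain "CLUSTER nft_transfer_events;" reuse the index recorded by the first run.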
Or, you could get faster storage so reading random pages doesn't take as long.
A radically different approach would be to try to get a plan that reads the data already in order, filtering out the rows it needs to, and stopping once it finds 20 which pass the filter. It is hard to say if this would be better or not, because I cannot predict how many of the most recent blocks belong to the collection (or other criteria) you care about. To do this, you would need an index on nft_transfer_events that starts with "block", so that it can be read in block order. (You might want to add the other columns to the end of that index so it can get an index-only scan on it.) You would then need to be able to efficiently determine whether an identified token meets your other criteria, but it looks like you already have a good index to do that.
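A sketch of that alternative, with a hypothetical index name. Whether the planner actually picks the ordered scan depends on its row estimates, so this is something to test rather than a guaranteed improvement.

```sql
-- Index that can be read in block order; the trailing columns allow an
-- index-only scan when only these columns are selected.
CREATE INDEX nft_transfer_events_block_address_token_id_index
    ON nft_transfer_events (block DESC, address, token_id);

-- Walk events newest-first and stop after 20 that belong to the collection.
SELECT nte.block, nte.address, nte.token_id
FROM nft_transfer_events nte
WHERE EXISTS (
    SELECT 1
    FROM tokens t
    WHERE t.contract = nte.address
      AND t.token_id = nte.token_id
      AND t.collection_id = '0xbc4ca0eda7647a8ab7c2061c2e118a18a936f13d'
)
ORDER BY nte.block DESC
LIMIT 20;
```

This works well when the collection has recent activity and badly when its last 20 events are far back in the chain, because every newer event has to be checked and discarded first.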

Why isn't Postgres using the index with Distinct?

I have this table:
CREATE TABLE public.prodhistory (
curve_id int4 NOT NULL,
start_prod_date date NOT NULL,
prod_date date NOT NULL,
monthly_prod_rate float4 NOT NULL,
eff_date timestamp NOT NULL,
/* Keys */
CONSTRAINT prodhistorypk
PRIMARY KEY (curve_id, prod_date, start_prod_date, eff_date),
/* Foreign keys */
CONSTRAINT prodhistory2typecurves_fk
FOREIGN KEY (curve_id)
REFERENCES public.typecurves(curve_id)
) WITH (
OIDS = FALSE
);
CREATE INDEX prodhistory_idx_curve_id01
ON public.prodhistory
(curve_id);
with ~42M rows.
And I execute this query:
SELECT DISTINCT curve_id FROM prodhistory
Which I expect would be very quick, given the index. But no, 270 secs. So I explain, and I get:
HashAggregate (cost=824870.03..824873.08 rows=305 width=4) (actual time=211834.018..211834.097 rows=315 loops=1)
Output: curve_id
Group Key: prodhistory.curve_id
-> Seq Scan on public.prodhistory (cost=0.00..718003.22 rows=42746722 width=4) (actual time=12.751..200826.299 rows=43218808 loops=1)
Output: curve_id
Planning time: 0.115 ms
Execution time: 211848.137 ms
I'm not too experienced in reading these plans, but a Seq Scan on the DB seems bad.
Any thoughts? I'm sort of stumped.
This plan is chosen because PostgreSQL thinks it is cheaper.
You can compare by setting
SET enable_seqscan=off;
and then re-running your EXPLAIN (ANALYZE) statement. Compare cost and actual time in both cases and check if PostgreSQL estimated correctly or not.
If you find that using an Index Scan or Index Only Scan is actually cheaper, you could consider twiddling the cost parameters to match your machine better, e.g. lower random_page_cost or cpu_index_tuple_cost or raise cpu_tuple_cost.
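One way to run that comparison without affecting other sessions is to scope the settings to a transaction, as in this sketch. The value 1.1 for random_page_cost is only an example for experimentation, not a recommendation.

```sql
-- Baseline plan and timing.
EXPLAIN (ANALYZE, BUFFERS) SELECT DISTINCT curve_id FROM prodhistory;

BEGIN;
-- SET LOCAL lasts only until the end of this transaction.
SET LOCAL enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS) SELECT DISTINCT curve_id FROM prodhistory;
ROLLBACK;

BEGIN;
-- Try a cost setting that models cheaper random reads (example value).
SET LOCAL random_page_cost = 1.1;
EXPLAIN (ANALYZE, BUFFERS) SELECT DISTINCT curve_id FROM prodhistory;
ROLLBACK;
```

If the index plan is faster in practice but still estimated as more expensive, that gap between cost and actual time is what the cost parameters are meant to close.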
PostgreSQL "index only scans" aren't always as cheap as you might think.
The reason is that each row needs to be checked as to whether it is visible to the MVCC snapshot or not.
Whether this is cheap or not depends on the table's visibility map.
If you force an index only scan (as per laurenz-albe's answer):
SET enable_seqscan=off;
Then run your query with:
EXPLAIN (ANALYZE ON, BUFFERS ON)
If the query plan output shows "Heap Fetches", as below, this means that the table's actual row data is being accessed, not just the index.
Index Only Scan using my_index on my_table (cost=0.42..17792.01 rows=595195 width=20) (actual time=37.942..2330.737 rows=539105 loops=1)
Heap Fetches: 234180
The official documentation describes this here:
https://www.postgresql.org/docs/current/indexes-index-only-scans.html
You may be able to resolve this by altering the way the table is updated, or by adjusting your auto vacuum settings.
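As a sketch of the vacuum-related option, applied to the prodhistory table from the question. The scale factor is an example value chosen for illustration only.

```sql
-- A manual vacuum updates the visibility map, which is what lets an
-- index-only scan skip the heap.
VACUUM (ANALYZE) prodhistory;

-- Ask autovacuum to visit this table more often than the global default.
ALTER TABLE prodhistory SET (autovacuum_vacuum_scale_factor = 0.02);

-- Re-check: "Heap Fetches" should drop if most pages are now all-visible.
EXPLAIN (ANALYZE, BUFFERS) SELECT DISTINCT curve_id FROM prodhistory;
```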

Why cost is increased by adding indexes?

I'm using postgresql 9.4.6.
There are the following entities:
-- "user" and "group" are reserved words in PostgreSQL, so they must be quoted
CREATE TABLE "user" (id CHARACTER VARYING NOT NULL PRIMARY KEY);
CREATE TABLE "group" (id CHARACTER VARYING NOT NULL PRIMARY KEY);
CREATE TABLE group_member (
id CHARACTER VARYING NOT NULL PRIMARY KEY,
gid CHARACTER VARYING REFERENCES "group"(id),
uid CHARACTER VARYING REFERENCES "user"(id));
I analyze that query:
explain analyze select x2."gid" from "group_member" x2 where x2."uid" = 'a1';
I have several results. Before each result I flushed OS-caches and restarted postgres:
# /etc/init.d/postgresql stop
# sync
# echo 3 > /proc/sys/vm/drop_caches
# /etc/init.d/postgresql start
The results of analyzing are:
1) cost=4.17..11.28 with indexes:
create index "group_member_gid_idx" on "group_member" ("gid");
create index "group_member_uid_idx" on "group_member" ("uid");
Bitmap Heap Scan on group_member x2 (cost=4.17..11.28 rows=3 width=32) (actual time=0.021..0.021 rows=0 loops=1)
Recheck Cond: ((uid)::text = 'a1'::text)
-> Bitmap Index Scan on group_member_uid_idx (cost=0.00..4.17 rows=3 width=0) (actual time=0.005..0.005 rows=0 loops=1)
Index Cond: ((uid)::text = 'a1'::text)
Planning time: 28.641 ms
Execution time: 0.359 ms
2) cost=7.97..15.08 with indexes:
create unique index "group_member_gid_uid_idx" on "group_member" ("gid","uid");
Bitmap Heap Scan on group_member x2 (cost=7.97..15.08 rows=3 width=32) (actual time=0.013..0.013 rows=0 loops=1)
Recheck Cond: ((uid)::text = 'a1'::text)
-> Bitmap Index Scan on group_member_gid_uid_idx (cost=0.00..7.97 rows=3 width=0) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: ((uid)::text = 'a1'::text)
Planning time: 0.132 ms
Execution time: 0.047 ms
3) cost=0.00..16.38 without any indexes:
Seq Scan on group_member x2 (cost=0.00..16.38 rows=3 width=32) (actual time=0.002..0.002 rows=0 loops=1)
Filter: ((uid)::text = 'a1'::text)
Planning time: 42.599 ms
Execution time: 0.402 ms
Is result #3 more efficient? And why?
EDIT
There will be many rows in tables (group, user, group_members) in practice. About > 1 Million.
When analyzing queries, the costs and query plans on small data sets are generally not a reliable guide to performance on larger data sets. And SQL is more concerned with larger data sets than with trivially small ones.
The reading of data from disk is often the driving factor in query performance. The main purpose of using an index is to reduce the number of data pages being read. If all the data in the table fits on a single data page, then there isn't much opportunity for reducing the number of page reads: It takes the same amount of time to read one page, whether the page has one record or 100 records. (Reading through a page to find the right record also incurs overhead, whereas an index would identify the specific record on the page.)
Indexes incur overhead, but typically much, much less than reading a data page. The index itself needs to be read into memory -- so that means that two pages are being read into memory rather than one. One could argue that for tables that fit on one or two pages, the use of an index is probably not a big advantage.
Although using the index (in this case) does take longer, differences in performance measured in fractions of a millisecond are generally not germane to most database tasks. If you want to see the index do its work, put 100,000 rows in the table and run the same tests. You'll see that the version without the index scales roughly in proportion to the amount of data in the table; the version with the index is relatively constant (well, actually scaling more like the log of the number of records in the table).
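To try the 100,000-row experiment suggested above, something like the following sketch could be used. It assumes the three tables exist with the reserved words user and group quoted, and the generated id values ('u1', 'g1', 'm1', ...) are made up for the test.

```sql
-- Parent rows first, so the foreign keys are satisfied.
INSERT INTO "user" (id)
SELECT 'u' || n FROM generate_series(1, 1000) AS n;

INSERT INTO "group" (id)
SELECT 'g' || n FROM generate_series(1, 100) AS n;

-- 100,000 memberships: every (gid, uid) pair is distinct, so this also
-- works with the unique index from variant 2.
INSERT INTO group_member (id, gid, uid)
SELECT 'm' || n,
       'g' || ((n - 1) / 1000 + 1),
       'u' || ((n - 1) % 1000 + 1)
FROM generate_series(1, 100000) AS n;

ANALYZE group_member;

-- Repeat this for each index variant and compare plans and timings.
EXPLAIN ANALYZE SELECT gid FROM group_member WHERE uid = 'u1';
```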

Prevent usage of index for a particular query in Postgres

I have a slow query in a Postgres DB. Using explain analyze, I can see that Postgres makes bitmap index scan on two different indexes followed by bitmap AND on the two resulting sets.
Deleting one of the indexes makes the evaluation ten times faster (bitmap index scan is still used on the first index). However, that deleted index is useful in other queries.
Query:
select
booking_id
from
booking
where
substitute_confirmation_token is null
and date_trunc('day', from_time) >= cast('01/25/2016 14:23:00.004' as date)
and from_time >= '01/25/2016 14:23:00.004'
and type = 'LESSON_SUBSTITUTE'
and valid
order by
booking_id;
Indexes:
"idx_booking_lesson_substitute_day" btree (date_trunc('day'::text, from_time)) WHERE valid AND type::text = 'LESSON_SUBSTITUTE'::text
"booking_substitute_confirmation_token_key" UNIQUE CONSTRAINT, btree (substitute_confirmation_token)
Query plan:
Sort (cost=287.26..287.26 rows=1 width=8) (actual time=711.371..711.377 rows=44 loops=1)
Sort Key: booking_id
Sort Method: quicksort Memory: 27kB
Buffers: shared hit=8 read=7437 written=1
-> Bitmap Heap Scan on booking (cost=275.25..287.25 rows=1 width=8) (actual time=711.255..711.294 rows=44 loops=1)
Recheck Cond: ((date_trunc('day'::text, from_time) >= '2016-01-25'::date) AND valid AND ((type)::text = 'LESSON_SUBSTITUTE'::text) AND (substitute_confirmation_token IS NULL))
Filter: (from_time >= '2016-01-25 14:23:00.004'::timestamp without time zone)
Buffers: shared hit=5 read=7437 written=1
-> BitmapAnd (cost=275.25..275.25 rows=3 width=0) (actual time=711.224..711.224 rows=0 loops=1)
Buffers: shared hit=5 read=7433 written=1
-> Bitmap Index Scan on idx_booking_lesson_substitute_day (cost=0.00..20.50 rows=594 width=0) (actual time=0.080..0.080 rows=72 loops=1)
Index Cond: (date_trunc('day'::text, from_time) >= '2016-01-25'::date)
Buffers: shared hit=5 read=1
-> Bitmap Index Scan on booking_substitute_confirmation_token_key (cost=0.00..254.50 rows=13594 width=0) (actual time=711.102..711.102 rows=2718734 loops=1)
Index Cond: (substitute_confirmation_token IS NULL)
Buffers: shared read=7432 written=1
Total runtime: 711.436 ms
Can I prevent using a particular index for a particular query in Postgres?
Your clever solution
You already found a clever solution for your particular case: A partial unique index that only covers rare values, so Postgres won't (can't) use the index for the common NULL value.
CREATE UNIQUE INDEX booking_substitute_confirmation_uni
ON booking (substitute_confirmation_token)
WHERE substitute_confirmation_token IS NOT NULL;
It's a textbook use-case for a partial index. Literally! The manual has a similar example and this perfectly matching advice to go with it:
Finally, a partial index can also be used to override the system's
query plan choices. Also, data sets with peculiar distributions might
cause the system to use an index when it really should not. In that
case the index can be set up so that it is not available for the
offending query. Normally, PostgreSQL makes reasonable choices about
index usage (e.g., it avoids them when retrieving common values, so
the earlier example really only saves index size, it is not required
to avoid index usage), and grossly incorrect plan choices are cause
for a bug report.
Keep in mind that setting up a partial index indicates that you know
at least as much as the query planner knows, in particular you know
when an index might be profitable. Forming this knowledge requires
experience and understanding of how indexes in PostgreSQL work. In
most cases, the advantage of a partial index over a regular index will
be minimal.
You commented: The table has few millions of rows and just few thousands of rows with not null values, so this is a perfect use-case. It will even speed up queries on non-null values for substitute_confirmation_token because the index is much smaller now.
Answer to question
To answer your original question: it's not possible to "disable" an existing index for a particular query. You would have to drop it, but that's way too expensive.
Fake drop index
You could drop an index inside a transaction, run your SELECT and then, instead of committing, use ROLLBACK. That's fast, but be aware that (per documentation):
A normal DROP INDEX acquires exclusive lock on the table, blocking
other accesses until the index drop can be completed.
So this is no good for multi-user environments.
BEGIN;
DROP INDEX big_user_id_created_at_idx;
SELECT ...;
ROLLBACK; -- so the index is preserved after all
More detailed statistics
Normally, though, it should be enough to raise the STATISTICS target for the column, so Postgres can more reliably identify common values and avoid the index for those. Try:
ALTER TABLE booking ALTER COLUMN substitute_confirmation_token SET STATISTICS 2000;
Then: ANALYZE booking; before you try your query again. 2000 is an example value. Related:
Keep PostgreSQL from sometimes choosing a bad query plan
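To see what the planner currently believes about the column, the pg_stats view can be queried before and after raising the statistics target. This is a sketch: the plan above estimated 13594 rows for the IS NULL condition against 2718734 actual rows, which suggests the recorded null_frac is far too low.

```sql
-- null_frac is the planner's estimate of the fraction of NULLs in the column.
SELECT null_frac, n_distinct
FROM pg_stats
WHERE tablename = 'booking'
  AND attname = 'substitute_confirmation_token';

ALTER TABLE booking
    ALTER COLUMN substitute_confirmation_token SET STATISTICS 2000;
ANALYZE booking;

-- Run the pg_stats query again and compare null_frac with the real share of NULLs.
```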

Why is PostgreSQL not using *just* the covering index in this query depending on the contents of its IN() clause?

I have a table with a covering index that should respond to a query using just the index, without checking the table at all. Postgres does, in fact, do that, if the IN() clause has 1 or a few elements in it. However, if the IN clause has lots of elements, it seems like it's doing the search on the index, and then going to the table and re-checking the conditions...
I can't figure out why Postgres would do that. It can either serve the query straight from the index or it can't, why would it go to the table if it (in theory) doesn't have anything else to add?
The table:
CREATE TABLE phone_numbers
(
id serial NOT NULL,
phone_number character varying,
hashed_phone_number character varying,
user_id integer,
created_at timestamp without time zone,
updated_at timestamp without time zone,
ghost boolean DEFAULT false,
CONSTRAINT phone_numbers_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
CREATE INDEX index_phone_numbers_covering_hashed_ghost_and_user
ON phone_numbers
USING btree
(hashed_phone_number COLLATE pg_catalog."default", ghost, user_id);
The query I'm running is :
SELECT "phone_numbers"."user_id"
FROM "phone_numbers"
WHERE "phone_numbers"."hashed_phone_number" IN (*several numbers*)
AND "phone_numbers"."ghost" = 'f'
As you can see, the index has all the fields it needs to reply to that query.
And if I have only one or a few numbers in the IN clause, it does:
1 number:
Index Scan using index_phone_numbers_on_hashed_phone_number on phone_numbers (cost=0.41..8.43 rows=1 width=4)
Index Cond: ((hashed_phone_number)::text = 'bebd43a6eb29b2fda3bcb63dcc7ffaf5433e78660ccd1a495c1180a3eaaf6b6a'::text)
Filter: (NOT ghost)
3 numbers:
Index Only Scan using index_phone_numbers_covering_hashed_ghost_and_user on phone_numbers (cost=0.42..17.29 rows=1 width=4)
Index Cond: ((hashed_phone_number = ANY ('{8228a8116f1fdb12e243102cb85ecd859ebf7873d9332dce5f1343a481ec72e8,43ddeebdca2ea829d468d5debc84d475c8322cf4bf6edca286c918b04216387e,1578bf773eb6eb8a9b57a130922a28c9c91f1bda67202ef5936b39630ca4cfe4}'::text[])) AND (...)
Filter: (NOT ghost)
However, when I have a lot of numbers in the IN clause, Postgres is using the Index, but then hitting the table, and I don't know why:
Bitmap Heap Scan on phone_numbers (cost=926.59..1255.81 rows=106 width=4)
Recheck Cond: ((hashed_phone_number)::text = ANY ('{b6459ce58f21d99c462b132cce7adc9ea947fa522a3849321e9fb65893006a5e,8228a8116f1fdb12e243102cb85ecd859ebf7873d9332dce5f1343a481ec72e8,ab3554acc1f287bb2e22ff20bb855e19a4177ef552676689d217dbb2a1a6177b,7ec9f58 (...)
Filter: (NOT ghost)
-> Bitmap Index Scan on index_phone_numbers_covering_hashed_ghost_and_user (cost=0.00..926.56 rows=106 width=0)
Index Cond: (((hashed_phone_number)::text = ANY ('{b6459ce58f21d99c462b132cce7adc9ea947fa522a3849321e9fb65893006a5e,8228a8116f1fdb12e243102cb85ecd859ebf7873d9332dce5f1343a481ec72e8,ab3554acc1f287bb2e22ff20bb855e19a4177ef552676689d217dbb2a1a6177b,7e (...)
This is currently making this query, which is looking for 250 records in a table with 50k total rows, about twice as slow as a similar query on another table, which looks for 250 records in a table with 5 million rows. That doesn't make much sense.
Any ideas what could be happening, and whether I can do anything to improve this?
UPDATE: Changing the order of the columns in the covering index to have first ghost and then hashed_phone_number also doesn't solve it:
Bitmap Heap Scan on phone_numbers (cost=926.59..1255.81 rows=106 width=4)
Recheck Cond: ((hashed_phone_number)::text = ANY ('{b6459ce58f21d99c462b132cce7adc9ea947fa522a3849321e9fb65893006a5e,8228a8116f1fdb12e243102cb85ecd859ebf7873d9332dce5f1343a481ec72e8,ab3554acc1f287bb2e22ff20bb855e19a4177ef552676689d217dbb2a1a6177b,7ec9f58 (...)
Filter: (NOT ghost)
-> Bitmap Index Scan on index_phone_numbers_covering_ghost_hashed_and_user (cost=0.00..926.56 rows=106 width=0)
Index Cond: ((ghost = false) AND ((hashed_phone_number)::text = ANY ('{b6459ce58f21d99c462b132cce7adc9ea947fa522a3849321e9fb65893006a5e,8228a8116f1fdb12e243102cb85ecd859ebf7873d9332dce5f1343a481ec72e8,ab3554acc1f287bb2e22ff20bb855e19a4177ef55267668 (...)
The choice of indexes is based on what the optimizer says is the best solution for the query. Postgres is trying really hard with your index, but it is not the best index for the query.
The best index has ghost first:
CREATE INDEX index_phone_numbers_covering_hashed_ghost_and_user
ON phone_numbers
USING btree
(ghost, hashed_phone_number COLLATE pg_catalog."default", user_id);
I happen to think that MySQL documentation does a good job of explaining how composite indexes are used.
Essentially, what is happening is that Postgres needs to do an index seek for every element of the IN list. This may be compounded by the use of strings, because collations/encodings affect the comparisons. Eventually, Postgres decides that other approaches are more efficient. If you put ghost first, then it will just jump to the right part of the index and find the rows it needs there.