I have the following structure:
create table bitmex
(
timestamp timestamp with time zone not null,
symbol varchar(255) not null,
side varchar(255) not null,
tid varchar(255) not null,
size numeric not null,
price numeric not null,
constraint bitmex_tid_symbol_pk
primary key (tid, symbol)
);
create index bitmex_timestamp_symbol_index on bitmex (timestamp, symbol);
create index bitmex_symbol_index on bitmex (symbol);
I need to know the exact count every time, so reltuples is not usable.
The table has more than 45,000,000 rows.
Running
explain analyze select count(*) from bitmex where symbol = 'XBTUSD';
gives
Finalize Aggregate (cost=1038428.56..1038428.57 rows=1 width=8)
-> Gather (cost=1038428.35..1038428.56 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=1037428.35..1037428.36 rows=1 width=8)
-> Parallel Seq Scan on bitmex (cost=0.00..996439.12 rows=16395690 width=0)
Filter: ((symbol)::text = 'XBTUSD'::text)
Running
explain analyze select count(*) from bitmex;
gives
Finalize Aggregate (cost=997439.34..997439.35 rows=1 width=8) (actual time=6105.463..6105.463 rows=1 loops=1)
-> Gather (cost=997439.12..997439.33 rows=2 width=8) (actual time=6105.444..6105.457 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=996439.12..996439.14 rows=1 width=8) (actual time=6085.960..6085.960 rows=1 loops=3)
-> Parallel Seq Scan on bitmex (cost=0.00..954473.50 rows=16786250 width=0) (actual time=0.364..4342.460 rows=13819096 loops=3)
Planning time: 0.080 ms
Execution time: 6108.277 ms
Why did it not use the indexes?
Thanks
If all rows have to be visited, an index scan is only cheaper if the table does not have to be consulted for most of the values found in the index.
Due to the way PostgreSQL is organized, the table has to be visited to determine if the entry found in the index is visible or not. This step can be skipped if the whole page is marked as “visible” in the visibility map of the table.
To update the visibility map, run VACUUM on the table. Maybe then an index only scan will be used.
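For example, a minimal illustration using the table from the question:
VACUUM bitmex;
EXPLAIN ANALYZE SELECT count(*) FROM bitmex WHERE symbol = 'XBTUSD';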
But counting the number of rows in a table is never cheap, even with an index scan. If you need to do that often, it may be a good idea to have a separate table that only contains a counter for the number of rows. Then you can write triggers that update the counter whenever rows are inserted or deleted.
That will slow down the performance during INSERT and DELETE, but you can count the rows with lightning speed.
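As an illustration only, here is a minimal sketch of such a counter; the table, function and trigger names are made up, and ON CONFLICT requires PostgreSQL 9.5 or later:
CREATE TABLE bitmex_row_count (
    symbol varchar(255) PRIMARY KEY,
    n bigint NOT NULL
);
-- seed it once from the existing data
INSERT INTO bitmex_row_count (symbol, n)
SELECT symbol, count(*) FROM bitmex GROUP BY symbol;

CREATE FUNCTION bitmex_row_count_fn() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO bitmex_row_count (symbol, n) VALUES (NEW.symbol, 1)
        ON CONFLICT (symbol) DO UPDATE SET n = bitmex_row_count.n + 1;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE bitmex_row_count SET n = n - 1 WHERE symbol = OLD.symbol;
    END IF;
    RETURN NULL;  -- AFTER trigger, the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER bitmex_row_count_trg
AFTER INSERT OR DELETE ON bitmex
FOR EACH ROW EXECUTE PROCEDURE bitmex_row_count_fn();
The per-symbol count is then a single-row lookup (SELECT n FROM bitmex_row_count WHERE symbol = 'XBTUSD';). Note that all sessions inserting the same symbol now contend on one counter row.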
I have the following database schema which is the base layer of an NFT marketplace I'm working on:
CREATE TABLE collections (
id TEXT PRIMARY KEY
);
-- Implicit collection_id -> collections fk
CREATE TABLE tokens (
contract BYTEA NOT NULL,
token_id NUMERIC NOT NULL,
collection_id TEXT,
PRIMARY KEY(contract, token_id)
);
CREATE INDEX tokens_collection_id_contract_token_id_index
ON tokens (collection_id, contract, token_id);
-- Implicit collection_id -> collections fk
CREATE TABLE attribute_keys (
id BIGSERIAL PRIMARY KEY,
collection_id TEXT NOT NULL,
key TEXT NOT NULL
);
-- Implicit attribute_key_id -> attribute_keys fk
CREATE TABLE attributes (
id BIGSERIAL PRIMARY KEY,
attribute_key_id BIGINT NOT NULL,
value TEXT NOT NULL
);
-- Implicit contract, token_id -> tokens fk
-- collection_id, key, value denormalized from the other attribute tables
CREATE TABLE token_attributes (
attribute_id BIGINT NOT NULL,
contract BYTEA NOT NULL,
token_id NUMERIC NOT NULL,
collection_id TEXT NOT NULL,
key TEXT NOT NULL,
value TEXT NOT NULL,
PRIMARY KEY(contract, token_id, attribute_id)
);
CREATE INDEX token_attributes_collection_id_key_value_contract_token_id_index
ON token_attributes (collection_id, key, value, contract, token_id);
-- Implicit address, token_id -> tokens fk
CREATE TABLE nft_transfer_events (
id BIGSERIAL PRIMARY KEY,
block INT NOT NULL,
address BYTEA NOT NULL,
token_id NUMERIC NOT NULL
);
CREATE INDEX nft_transfer_events_address_token_id_block_index
ON nft_transfer_events (address, token_id, block DESC);
Two of the main use-cases we have are getting the latest events for a collection and getting the latest events for all tokens within a collection that match a given attribute filter. Unfortunately, both of these queries are quite inefficient, since they involve filtering by columns in joined tables.
Here's the query to filter transfer events to a particular collection:
# explain analyze select * from nft_transfer_events nte join tokens t on nte.address = t.contract and nte.token_id = t.token_id where t.collection_id = '0xbc4ca0eda7647a8ab7c2061c2e118a18a936f13d' order by block desc limit 20;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=43312.03..43314.36 rows=20 width=1071) (actual time=52990.391..53040.926 rows=20 loops=1)
-> Gather Merge (cost=43312.03..180799.76 rows=1178384 width=1071) (actual time=52990.388..53040.919 rows=20 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=42312.00..43784.98 rows=589192 width=1071) (actual time=52789.050..52789.056 rows=16 loops=3)
Sort Key: nte.block DESC
Sort Method: top-N heapsort Memory: 49kB
Worker 0: Sort Method: top-N heapsort Memory: 47kB
Worker 1: Sort Method: top-N heapsort Memory: 43kB
-> Nested Loop (cost=1.26..26633.82 rows=589192 width=1071) (actual time=0.316..52759.414 rows=20586 loops=3)
-> Parallel Index Scan using tokens_collection_id_contract_token_id_index on tokens t (cost=0.69..12539.19 rows=5065 width=907) (actual time=0.137..343.623 rows=3333 loops=3)
Index Cond: (collection_id = '0xbc4ca0eda7647a8ab7c2061c2e118a18a936f13d'::text)
-> Index Scan using nft_transfer_events_address_token_id_block_index on nft_transfer_events nte (cost=0.57..2.77 rows=1 width=164) (actual time=3.580..15.719 rows=6 loops=10000)
Index Cond: ((address = t.contract) AND (token_id = t.token_id))
Planning Time: 12.243 ms
Execution Time: 53041.192 ms
(16 rows)
And here's the query to filter transfer events to tokens in a collection having a particular attribute (this can get even messier when I need to filter by multiple attributes - which would involve multiple joins on the token_attributes table):
# explain analyze select * from nft_transfer_events nte join token_attributes ta on nte.address = ta.contract and nte.token_id = ta.token_id where ta.collection_id = '0xbc4ca0eda7647a8ab7c2061c2e118a18a936f13d' and ta.key = 'Fur' and ta.value = 'Tan' order by block desc limit 20;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=5.72..5.73 rows=1 width=258) (actual time=18173.114..18173.122 rows=20 loops=1)
-> Sort (cost=5.72..5.73 rows=1 width=258) (actual time=18173.112..18173.117 rows=20 loops=1)
Sort Key: nte.block DESC
Sort Method: top-N heapsort Memory: 45kB
-> Nested Loop (cost=1.26..5.71 rows=1 width=258) (actual time=0.164..18166.759 rows=3974 loops=1)
-> Index Scan using token_attributes_collection_id_key_value_contract_token_id_index on token_attributes ta (cost=0.69..2.91 rows=1 width=94) (actual time=0.098..524.036 rows=626 loops=1)
Index Cond: ((collection_id = '0xbc4ca0eda7647a8ab7c2061c2e118a18a936f13d'::text) AND (key = 'Fur'::text) AND (value = 'Tan'::text))
-> Index Scan using nft_transfer_events_address_token_id_block_index on nft_transfer_events nte (cost=0.57..2.79 rows=1 width=164) (actual time=6.216..28.175 rows=6 loops=626)
Index Cond: ((address = ta.contract) AND (token_id = ta.token_id))
Planning Time: 61.328 ms
Execution Time: 18173.249 ms
What I'm looking for is ways to speed up these two particular queries. I would even be open to redesigning the schema to better fit these two use-cases.
In case it might be relevant, the table sizes are as follows:
nft_transfer_events: ~100 million rows
tokens: ~50 million rows
token_attributes: ~200 million rows
Thank you in advance!
In the title you mention speeding up sorting, but the sort itself takes a trivial amount of time. It is producing the input to the sort that takes all the time, and most of that is consumed at the bottom of the nested loop:
-> Index Scan using nft_transfer_events_address_token_id_block_index on nft_transfer_events nte (cost=0.57..2.77 rows=1 width=164) (actual time=3.580..15.719 rows=6 loops=10000)
Index Cond: ((address = t.contract) AND (token_id = t.token_id))
Multiplying loops by the actual time (per loop) gives about 157 seconds. Since it is happening over 3 processes in parallel, you can divide by 3 to see that this one step accounts for almost the entire time spent on the query.
The index is well suited to the way it is being used, so I have to assume it is bottlenecked by IO. For each loop it has to jump to a random index leaf page to find the indexed data, then to 6 random table pages to get the remaining data. The only data in the table which is not in the index is nft_transfer_events.id. So if you either added "id" to the end of the indexed columns (by recreating the index), or refrained from selecting nft_transfer_events.id in the first place (by replacing * with just the list of columns you need), you should be able to get an index-only scan, which should be something like 7 times faster if the table is well vacuumed.
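For example, a sketch of the first option (the new index name is made up; CONCURRENTLY avoids blocking writes while the replacement is built):
CREATE INDEX CONCURRENTLY nft_transfer_events_address_token_id_block_id_index
    ON nft_transfer_events (address, token_id, block DESC, id);
DROP INDEX CONCURRENTLY nft_transfer_events_address_token_id_block_index;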
If you can't do that because there are more columns in the table that you need but didn't tell us about, one way to speed it up would be to CLUSTER nft_transfer_events on the index. That way the reads into the table would not be random, but would hit the same block over and over, because related rows would be next to each other. The problem here is that CLUSTER takes a strong lock on the table and has to be repeated occasionally, as it only clusters existing data, not future data.
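A sketch of that (CLUSTER takes an ACCESS EXCLUSIVE lock and rewrites the whole table):
CLUSTER nft_transfer_events USING nft_transfer_events_address_token_id_block_index;
-- later runs reuse the index chosen above
CLUSTER nft_transfer_events;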
Or, you could get faster storage so reading random pages doesn't take as long.
A radically different approach would be to try to get a plan that reads the data already in order, filtering out the rows it doesn't need and stopping once it finds 20 that pass the filter. It is hard to say if this would be better or not, because I cannot predict how many of the most recent blocks belong to the collection (or other criteria) you care about. To do this, you would need an index on nft_transfer_events that starts with "block", so that it can be read in block order. (You might want to add the other columns to the end of that index so it can get an index-only scan.) You would then also need to be able to efficiently determine whether an identified token meets your other criteria, but it looks like you already have a good index for that.
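A sketch of such an index; the trailing columns are an assumption, added only so that an index-only scan is possible when nothing else is selected:
CREATE INDEX nft_transfer_events_block_covering_index
    ON nft_transfer_events (block DESC, address, token_id, id);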
I'm moving the Postgres DB to another server. After importing the data (dumped with pg_dump) I checked the performance and found out that the same query results in different query plans on the two DBs (given that the DBMS versions, DB structure and the data itself are the same):
the query is:
explain analyse select * from common.composite where id = 0176200005519000087
query plan of the production DB:
Index Scan using composite_id_idx on composite (cost=0.43..8.45 rows=1 width=222) (actual time=0.070..0.071 rows=1 loops=1)
Index Cond: (id = '176200005519000087'::bigint)
Planning time: 0.502 ms
Execution time: 0.102 ms
for the new one:
Bitmap Heap Scan on composite (cost=581.08..54325.66 rows=53916 width=76) (actual time=0.209..0.210 rows=1 loops=1)
Recheck Cond: (id = '176200005519000087'::bigint)
Heap Blocks: exact=1
-> Bitmap Index Scan on composite_id_idx (cost=0.00..567.61 rows=53916 width=0) (actual time=0.187..0.187 rows=1 loops=1)
Index Cond: (id = '176200005519000087'::bigint)
Planning time: 0.428 ms
Execution time: 0.305 ms
Obviously, there is a btree index for id in both DBs.
As far as I can tell, the new one uses bitmap index scans for some reason, while the btree index was imported from the dump. This results in a huge delay (up to 30x) in complex queries.
Is there something wrong with the index/dependency import, or is there a way to tell the planner which indexes to use?
Thank you.
I'm using postgresql 9.4.6.
There are the following entities:
CREATE TABLE "user" (id CHARACTER VARYING NOT NULL PRIMARY KEY);
CREATE TABLE "group" (id CHARACTER VARYING NOT NULL PRIMARY KEY);
CREATE TABLE group_member (
id CHARACTER VARYING NOT NULL PRIMARY KEY,
gid CHARACTER VARYING REFERENCES "group"(id),
uid CHARACTER VARYING REFERENCES "user"(id));
I analyzed this query:
explain analyze select x2."gid" from "group_member" x2 where x2."uid" = 'a1';
I got several results. Before each run I flushed the OS caches and restarted Postgres:
# /etc/init.d/postgresql stop
# sync
# echo 3 > /proc/sys/vm/drop_caches
# /etc/init.d/postgresql start
The results are:
1) cost=4.17..11.28 with indexes:
create index "group_member_gid_idx" on "group_member" ("gid");
create index "group_member_uid_idx" on "group_member" ("uid");
Bitmap Heap Scan on group_member x2 (cost=4.17..11.28 rows=3 width=32) (actual time=0.021..0.021 rows=0 loops=1)
Recheck Cond: ((uid)::text = 'a1'::text)
-> Bitmap Index Scan on group_member_uid_idx (cost=0.00..4.17 rows=3 width=0) (actual time=0.005..0.005 rows=0 loops=1)
Index Cond: ((uid)::text = 'a1'::text)
Planning time: 28.641 ms
Execution time: 0.359 ms
2) cost=7.97..15.08 with indexes:
create unique index "group_member_gid_uid_idx" on "group_member" ("gid","uid");
Bitmap Heap Scan on group_member x2 (cost=7.97..15.08 rows=3 width=32) (actual time=0.013..0.013 rows=0 loops=1)
Recheck Cond: ((uid)::text = 'a1'::text)
-> Bitmap Index Scan on group_member_gid_uid_idx (cost=0.00..7.97 rows=3 width=0) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: ((uid)::text = 'a1'::text)
Planning time: 0.132 ms
Execution time: 0.047 ms
3) cost=0.00..16.38 without any indexes:
Seq Scan on group_member x2 (cost=0.00..16.38 rows=3 width=32) (actual time=0.002..0.002 rows=0 loops=1)
Filter: ((uid)::text = 'a1'::text)
Planning time: 42.599 ms
Execution time: 0.402 ms
Is result #3 more efficient? And why?
EDIT
In practice there will be many rows in the tables (group, user, group_member): more than 1 million.
When analyzing queries, the costs and query plans on small data sets are generally not a reliable guide to performance on larger data sets. And SQL is more concerned with larger data sets than with trivially small ones.
The reading of data from disk is often the driving factor in query performance. The main purpose of using an index is to reduce the number of data pages being read. If all the data in the table fits on a single data page, then there isn't much opportunity for reducing the number of page reads: It takes the same amount of time to read one page, whether the page has one record or 100 records. (Reading through a page to find the right record also incurs overhead, whereas an index would identify the specific record on the page.)
Indexes incur overhead, but typically much, much less than reading a data page. The index itself needs to be read into memory -- so that means that two pages are being read into memory rather than one. One could argue that for tables that fit on one or two pages, the use of an index is probably not a big advantage.
Although using the index (in this case) does take longer, differences in performance measured in fractions of a millisecond are generally not germane to most database tasks. If you want to see the index do its work, put 100,000 rows in the table and run the same tests. You'll see that the version without the index scales roughly in proportion to the amount of data in the table; the version with the index is relatively constant (well, actually scaling more like the log of the number of records in the table).
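To see that for yourself, here is a hypothetical way to generate test data against the tables above and re-run the experiment (names, row counts and the looked-up uid are made up):
INSERT INTO "user" (id) SELECT 'u' || g FROM generate_series(1, 1000) g;
INSERT INTO "group" (id) SELECT 'g' || g FROM generate_series(1, 100) g;
INSERT INTO group_member (id, gid, uid)
SELECT 'm' || g, 'g' || (1 + g % 100), 'u' || (1 + g % 1000)
FROM generate_series(1, 100000) g;
ANALYZE group_member;
EXPLAIN ANALYZE SELECT x2."gid" FROM "group_member" x2 WHERE x2."uid" = 'u1';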
I'm having difficulty understanding what I perceive as an inconsistency in how Postgres chooses to use indexes. We have a query based on NOT IN against an indexed column that Postgres executes sequentially, but when we perform the same query as IN, it uses the index.
I've created a simplistic example that I believe demonstrates the issue; notice that this first query uses a sequential scan:
CREATE TABLE node
(
id SERIAL PRIMARY KEY,
vid INTEGER
);
CREATE INDEX x ON node(vid);
INSERT INTO node(vid) VALUES (1),(2);
EXPLAIN ANALYZE
SELECT *
FROM node
WHERE NOT vid IN (1);
Seq Scan on node (cost=0.00..36.75 rows=2129 width=8) (actual time=0.009..0.010 rows=1 loops=1)
Filter: (vid <> 1)
Rows Removed by Filter: 1
Total runtime: 0.025 ms
But if we invert the query to IN, you'll notice that it now decided to use the index
EXPLAIN ANALYZE
SELECT *
FROM node
WHERE vid IN (1);
Bitmap Heap Scan on node (cost=4.34..15.01 rows=11 width=8) (actual time=0.017..0.017 rows=1 loops=1)
Recheck Cond: (vid = 1)
-> Bitmap Index Scan on x (cost=0.00..4.33 rows=11 width=0) (actual time=0.012..0.012 rows=1 loops=1)
Index Cond: (vid = 1)
Total runtime: 0.039 ms
Can anyone shed any light on this? Specifically, is there a way to rewrite our NOT IN to work with the index (when, obviously, the real result set is not as simplistic as just 1 or 2)?
We are using Postgres 9.2 on CentOS 6.6
PostgreSQL is going to use an index when it makes sense. It is likely that the statistics indicate that your NOT IN returns too many tuples for an index to be effective.
You can test this by doing the following:
set enable_seqscan to false;
explain analyze .... NOT IN
set enable_seqscan to true;
explain analyze .... NOT IN
The results will tell you whether PostgreSQL is making the correct decision. If it isn't, you can adjust the statistics of the column and/or the costs (random_page_cost) to get the desired behavior.
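For example (the values are illustrative only, not a recommendation):
ALTER TABLE node ALTER COLUMN vid SET STATISTICS 1000;
ANALYZE node;
-- on fast storage, a lower random_page_cost makes index scans look cheaper
SET random_page_cost = 1.1;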
I have a huge table (currently ~3mil rows, expected to increase by a factor of 1000) with lots of inserts every second. The table is never updated.
Now I have to run queries on that table, which are pretty slow (as expected). These queries do not have to be 100% accurate; it is OK if the result is a day old (but not older).
There are currently two indexes on two single integer columns, and I would have to add two more indexes (on integer and timestamp columns) to speed up my queries.
The ideas I had so far:
Add the two missing indexes to the table
Keep no indexes on the huge table at all and copy the content (just the important rows, as a daily task) to a second table, then create the indexes on the second table and run the queries on that table?
Partitioning the huge table
Master/Slave setup (writing to the master and reading from the slaves).
What option is the best in terms of performance? Do you have any other suggestions?
EDIT:
Here is the table (I have marked the foreign keys and prettified the query a bit):
CREATE TABLE client_log
(
id serial NOT NULL,
logid integer NOT NULL,
client_id integer NOT NULL, -- FOREIGN KEY
client_version varchar(16),
sessionid varchar(100) NOT NULL,
created timestamptz NOT NULL,
filename varchar(256),
funcname varchar(256),
linenum integer,
comment text,
domain varchar(128),
code integer,
latitude float8,
longitude float8,
created_on_server timestamptz NOT NULL,
message_id integer, -- FOREIGN KEY
app_id integer NOT NULL, -- FOREIGN KEY
result integer
);
CREATE INDEX client_log_code_idx ON client_log USING btree (code);
CREATE INDEX client_log_created_idx ON client_log USING btree (created);
CREATE INDEX clients_clientlog_app_id ON client_log USING btree (app_id);
CREATE INDEX clients_clientlog_client_id ON client_log USING btree (client_id);
CREATE UNIQUE INDEX clients_clientlog_logid_client_id_key ON client_log USING btree (logid, client_id);
CREATE INDEX clients_clientlog_message_id ON client_log USING btree (message_id);
And an example query:
SELECT
client_log.comment,
COUNT(client_log.comment) AS count
FROM
client_log
WHERE
client_log.app_id = 33 AND
client_log.code = 3 AND
client_log.client_id IN (SELECT client.id FROM client WHERE
client.app_id = 33 AND
client."replaced_id" IS NULL)
GROUP BY client_log.comment ORDER BY count DESC;
client_log_code_idx is the index needed for the query above. There are other queries that need the client_log_created_idx index.
And the query plan:
Sort (cost=2844.72..2844.75 rows=11 width=242) (actual time=4684.113..4684.180 rows=70 loops=1)
Sort Key: (count(client_log.comment))
Sort Method: quicksort Memory: 32kB
-> HashAggregate (cost=2844.42..2844.53 rows=11 width=242) (actual time=4683.830..4683.907 rows=70 loops=1)
-> Hash Semi Join (cost=1358.52..2844.32 rows=20 width=242) (actual time=303.515..4681.211 rows=1202 loops=1)
Hash Cond: (client_log.client_id = client.id)
-> Bitmap Heap Scan on client_log (cost=1108.02..2592.57 rows=387 width=246) (actual time=113.599..4607.568 rows=6962 loops=1)
Recheck Cond: ((app_id = 33) AND (code = 3))
-> BitmapAnd (cost=1108.02..1108.02 rows=387 width=0) (actual time=104.955..104.955 rows=0 loops=1)
-> Bitmap Index Scan on clients_clientlog_app_id (cost=0.00..469.96 rows=25271 width=0) (actual time=58.315..58.315 rows=40662 loops=1)
Index Cond: (app_id = 33)
-> Bitmap Index Scan on client_log_code_idx (cost=0.00..637.61 rows=34291 width=0) (actual time=45.093..45.093 rows=36310 loops=1)
Index Cond: (code = 3)
-> Hash (cost=248.06..248.06 rows=196 width=4) (actual time=61.069..61.069 rows=105 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 4kB
-> Bitmap Heap Scan on client (cost=10.95..248.06 rows=196 width=4) (actual time=27.843..60.867 rows=105 loops=1)
Recheck Cond: (app_id = 33)
Filter: (replaced_id IS NULL)
Rows Removed by Filter: 271
-> Bitmap Index Scan on clients_client_app_id (cost=0.00..10.90 rows=349 width=0) (actual time=15.144..15.144 rows=380 loops=1)
Index Cond: (app_id = 33)
Total runtime: 4684.843 ms
In general, in a system where time-related data is constantly being inserted into the database, I'd recommend partitioning according to time.
This is not just because it might improve query times, but because otherwise it makes managing the data difficult. However big your hardware is, it will have a limit to its capacity, so you will eventually have to start removing rows that are older than a certain date. The rate at which you remove the rows will have to be equal to the rate they are going in.
If you just have one big table, and you remove old rows using DELETE, you will leave a lot of dead tuples that need to be vacuumed out. The autovacuum will be running constantly, using up valuable disk IO.
On the other hand, if you partition according to time, then removing out of date data is as easy as dropping the relevant child table.
In terms of indexes - the indexes are not inherited, so you can save on creating the indexes until after the partition is loaded. You could have a partition size of 1 day in your use case. This means the indexes do not need to be constantly updated as data is being inserted. It will be more practical to have additional indexes as needed to make your queries perform.
Your sample query does not filter on the 'created' time field, but you say other queries do. If you partition by time, and are careful about how you construct your queries, constraint exclusion will kick in and it will only include the specific partitions that are relevant to the query.
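A minimal sketch of what inheritance-based partitioning by day could look like; the child table name and dates are illustrative, and inserts have to be routed to the right child by the application or by a trigger on client_log:
CREATE TABLE client_log_20140501 (
    CHECK (created >= '2014-05-01' AND created < '2014-05-02')
) INHERITS (client_log);
-- indexes are created per child, e.g. after the day's data has been loaded
CREATE INDEX ON client_log_20140501 (code);
CREATE INDEX ON client_log_20140501 (created);
-- removing out-of-date data later is then simply
DROP TABLE client_log_20140501;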
Apart from partitioning, I would consider splitting the table into many tables, aka sharding.
I don't have the full picture of your domain but these are some suggestions:
Each client gets their own table in their own schema (or a set of clients shares a schema, depending on how many clients you have and how many new clients you expect to get).
create table client1.log(id, logid,.., code, app_id);
create table client2.log(id, logid,.., code, app_id);
Splitting the table like this should also reduce the contention on inserts.
The table can be split even more. Within each client-schema you can also split the table per "code" or "app_id" or something else that makes sense for you. This might be overdoing it but it is easy to implement if the number of "code" and/or "app_id" values do not change often.
Do keep the code/app_id columns even in the new smaller tables but do put a constraint on the column so that no other type of log record can be inserted. The constraint will also help the optimiser when searching, see this example:
create schema client1;
set search_path = 'client1';
create table error_log(id serial, code text check(code ='error'));
create table warning_log(id serial, code text check(code ='warning'));
create table message_log(id serial, code text check(code ='message'));
To get the full picture (all rows) of a client you can use a view on top of all tables:
create view client_log as
select * from error_log
union all
select * from warning_log
union all
select * from message_log;
The check constraints should allow the optimiser to only search the table where the "code" can exist.
explain
select * from client_log where code = 'error';
-- Output
Append (cost=0.00..25.38 rows=6 width=36)
-> Seq Scan on error_log (cost=0.00..25.38 rows=6 width=36)
Filter: (code = 'error'::text)