I'm trying to understand how an index scan is actually performed.
Consider the following plan for this query:
EXPLAIN ANALYZE SELECT * FROM tbl WHERE id = 46983
Index Scan using pk_tbl on tbl (cost=0.29..8.30 rows=1 width=1064) (actual time=0.012..0.013 rows=1 loops=1)
Index Cond: (id = 46983)
Planning time: 0.101 ms
Execution time: 0.050 ms
As far as I understand, the index scan process consists of two random page reads. In my case
SHOW random_page_cost
returns 4.
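As a sanity check, the plan's total cost of 8.30 is roughly consistent with two random page fetches. This is a sketch only, not the planner's exact formula, and cpu_tuple_cost = 0.01 is an assumed default:

```python
random_page_cost = 4.0   # from SHOW random_page_cost
cpu_tuple_cost = 0.01    # PostgreSQL default (assumption here)
startup_cost = 0.29      # the plan's reported startup cost (index descent etc.)

# two random page reads (index leaf + heap block) plus per-tuple CPU cost
total = 2 * random_page_cost + startup_cost + cpu_tuple_cost
print(f"{total:.2f}")  # 8.30, matching the plan's total cost
```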
So, I guess we need to find the block where the row with id = 46983 is stored (random access within the index), and then we need to read that block by its address (random access to the block in physical storage). That's clear: two random accesses actually occur. But on the wiki I read that
In data structures, direct access implies the ability to access any
entry in a list in constant time
But traversing a balanced tree obviously doesn't have constant-time complexity, because it depends on the depth of the tree.
Given that, how is it correct to say that requesting a block through the index is random access?
The reason is that database indexes are normally stored as B-trees or B+-trees: n-ary tree structures with a variable but very large number of children per node. A typical tree of this kind with three or four levels can address millions of records, and at least the root is almost certainly kept in the cache (buffer pool), so a typical access for a random key costs on the order of 1 or 2 disk accesses. For this reason, in the database field (and when costs are estimated), access to a B-tree index is treated as a small constant.
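As a rough illustration of why the height can be treated as a constant (a fanout of 500 is a made-up but plausible figure for an 8kB page), the height of such a tree grows extremely slowly:

```python
import math

def btree_levels(n_records, fanout=500):
    """Smallest number of levels L with fanout**L >= n_records."""
    return max(1, math.ceil(math.log(n_records, fanout)))

for n in (10**4, 10**6, 10**8):
    print(f"{n:>11,} records -> {btree_levels(n)} levels")
```

Going from ten thousand to a hundred million records adds only a single level, and the top one or two levels are almost always cached.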
Related
I have an UPDATE which takes a long time to finish: 10 million rows need to be updated. The execution ended after 6 hours.
This is the query:
update A
set a_top = 'N'
where (a_toto, a_num, a_titi) in
(select a_toto, a_num, a_titi
from A
where a_titi <> 'test' and a_top is null limit 10000000);
Two indexes have been created:
CREATE UNIQUE INDEX pk_A ON A USING btree (a_toto, a_num, a_titi)
CREATE INDEX id1_A ON A USING btree (a_num)
These are the things I already checked :
No locks
No triggers on A
The execution plan shows me that the indexes are not used. Would it change anything if I dropped the indexes, updated the rows, and then recreated the indexes afterwards?
Is there a way of improving the query itself ?
Here is the execution plan :
Update on A (cost=3561856.86..10792071.61 rows=80305304 width=200)
-> Hash Join (cost=3561856.86..10792071.61 rows=80305304 width=200)
Hash Cond: (((A.a_toto)::text = (("ANY_subquery".a_toto)::text)) AND ((A.a_num)::text = (("ANY_subquery".a_num)::text)) AND ((A.a_titi)::text = (("ANY_subquery".a_titi)::text)))
-> Seq Scan on A (cost=0.00..2509069.04 rows=80305304 width=126)
-> Hash (cost=3490830.00..3490830.00 rows=2082792 width=108)
-> Unique (cost=3390830.00..3490830.00 rows=2082792 width=108)
-> Sort (cost=3390830.00..3415830.00 rows=10000000 width=108)
Sort Key: (("ANY_subquery".a_toto)::text), (("ANY_subquery".a_num)::text), (("ANY_subquery".a_titi)::text)
-> Subquery Scan on "ANY_subquery" (cost=0.00..484987.17 rows=10000000 width=108)
-> Limit (cost=0.00..384987.17 rows=10000000 width=42)
-> Seq Scan on A A_1 (cost=0.00..2709832.30 rows=70387600 width=42)
Filter: ((a_top IS NULL) AND ((a_titi)::text <> 'test'::text))
(12 rows)
Thanks for your help.
The index I would have suggested is:
CREATE UNIQUE INDEX pk_A ON A USING btree (a_titi, a_top, a_toto, a_num);
This index covers the WHERE clause of the subquery, allowing Postgres to throw away records which don't meet the criteria. The index also includes the three columns which are needed in the SELECT clause.
One reason your current first index is not being used is that it doesn't cover the WHERE clause. That index, if used, might require a full index scan, during which Postgres would have to manually filter out non-matching records.
PostgreSQL does a pretty poor job of optimizing bulk updates, because it optimizes the statement (almost) just like a SELECT and throws an update on top. It doesn't consider how the order of the rows returned by the SELECT-like portion will affect the IO pattern of the update itself. This can have devastatingly poor performance on high-latency devices, like spinning disks, and is often bad even for SSDs.
My theory is that you could get greatly improved performance by injecting a Sort-by-ctid node just below the Update node. But it looks really hard to do this, even in a gross and hackish way, just to get a proof of concept.
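To see why the physical order of the rows fed into the Update matters so much, here's a small hypothetical simulation (nothing Postgres-specific; the page and slot counts are made up) that counts how often consecutive updates hop to a different heap page:

```python
import random

random.seed(0)
NUM_PAGES = 1_000
ROWS_PER_PAGE = 50
# ctid-like addresses: (page, slot) for every row in the table
rows = [(page, slot) for page in range(NUM_PAGES) for slot in range(ROWS_PER_PAGE)]

def page_switches(order):
    """Count how often consecutive updates land on a different heap page."""
    switches, last = 0, None
    for page, _ in order:
        if page != last:
            switches += 1
            last = page
    return switches

shuffled = rows[:]
random.shuffle(shuffled)                # order as a join might happen to emit them
print(page_switches(sorted(shuffled)))  # ctid order: one visit per page
print(page_switches(shuffled))          # random order: nearly one page hop per row
```

In ctid order each dirty page is written once; in random order nearly every row forces a jump to a different page, which is exactly the pattern that kills spinning disks.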
But if the Hash node can fit the entire hash table in work_mem, rather than spilling to disk, then the Hash Join should return tuples in physical order, so they can be updated efficiently. This would require a work_mem very much larger than 4MB, though. (It is hard to say how much larger without trial and error. But even if it spills to disk in 4 batches, that should be much better than hundreds.)
You can probably get it to use an index plan by setting both enable_hashjoin and enable_mergejoin to off. But whether this will actually be faster is another question; it might have the same random-IO problem as the current method does, unless the table is clustered or something like that.
You should really go back to your client and ask what they are trying to accomplish here. If they just updated the table in one shot, without the self-join, they wouldn't have this problem. If they are using the LIMIT to try to make the UPDATE run faster, it is probably backfiring spectacularly. If they are doing it for some other reason, well, what is it?
Despite what all the documentation says, I'm finding GIN indexes to be significantly slower than GiST indexes for pg_trgm-related searches. This is on a table of 25 million rows with a relatively short text field (average length of 21 characters). Most of the rows of text are addresses of the form "123 Main St, City".
A GiST index takes about 4 seconds with a search like:
select suggestion from search_suggestions where suggestion % 'seattle';
But GIN takes 90 seconds, giving the following result when run with EXPLAIN ANALYZE:
Bitmap Heap Scan on search_suggestions (cost=330.09..73514.15 rows=25043 width=22) (actual time=671.606..86318.553 rows=40482 loops=1)
Recheck Cond: ((suggestion)::text % 'seattle'::text)
Rows Removed by Index Recheck: 23214341
Heap Blocks: exact=7625 lossy=223807
-> Bitmap Index Scan on tri_suggestions_idx (cost=0.00..323.83 rows=25043 width=0) (actual time=669.841..669.841 rows=1358175 loops=1)
Index Cond: ((suggestion)::text % 'seattle'::text)
Planning time: 1.420 ms
Execution time: 86327.246 ms
Note that over a million rows are being selected by the index, even though only 40k rows actually match. Any ideas why this is performing so poorly? This is on PostgreSQL 9.4.
Some issues stand out:
First, consider upgrading to a current version of Postgres. At the time of writing that's pg 9.6 or pg 10 (currently beta). Since pg 9.4 there have been multiple improvements for GIN indexes, for the additional module pg_trgm, and for big data in general.
Next, you need much more RAM, in particular a higher work_mem setting. I can tell from this line in the EXPLAIN output:
Heap Blocks: exact=7625 lossy=223807
"lossy" in the details of a Bitmap Heap Scan (with your particular numbers) indicates a dramatic shortage of work_mem. Postgres collects only block addresses in the bitmap index scan, instead of row pointers, because that's expected to be faster with your low work_mem setting (it can't hold exact addresses in RAM). Many more non-qualifying rows then have to be filtered in the following Bitmap Heap Scan. This related answer has the details:
“Recheck Cond:” line in query plans with a bitmap index scan
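Plugging the figures from the plan above into a quick calculation shows how extreme the situation is:

```python
# Figures taken from the EXPLAIN ANALYZE output above
exact_blocks, lossy_blocks = 7_625, 223_807       # Heap Blocks: exact / lossy
rows_returned, rows_removed = 40_482, 23_214_341  # rows / Rows Removed by Index Recheck

lossy_share = lossy_blocks / (exact_blocks + lossy_blocks)
recheck_waste = rows_removed / (rows_removed + rows_returned)
print(f"lossy heap blocks:            {lossy_share:.1%}")    # ~96.7%
print(f"rows thrown away in recheck:  {recheck_waste:.1%}")  # ~99.8%
```

Almost every heap block had to be stored lossily and then rescanned row by row, which is where the 86 seconds go.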
But don't set work_mem too high without considering the whole situation:
Optimize simple query using ORDER BY date and text
There may be other problems, like index or table bloat, or more configuration bottlenecks. But if you fix just these two items, the query should already be much faster.
Also, do you really need to retrieve all 40k rows in the example? You probably want to add a small LIMIT to the query and make it a "nearest-neighbor" search (e.g. ORDER BY suggestion <-> 'seattle' LIMIT 10) - in which case a GiST index is the better choice after all, because that is expected to be faster with GiST. Example:
Best index for similarity function
I've got a simple table with a single-column PRIMARY KEY called id, of type serial. There are exactly 100,000,000 rows in it. The table takes 48GB, the PK index ca. 2.1GB. The machine it runs on is "dedicated" to Postgres only and is something like a Core i5, 500GB HDD, 8GB RAM. The pg config was created by the pgtune utility (shared buffers ca. 2GB, effective cache ca. 7GB). The OS is Ubuntu Server 14.04, Postgres 9.3.6.
Why are both SELECT count(id) and SELECT count(*) so slow in this simple case (ca. 11 minutes)?
Why is the PostgreSQL planner choosing a full table scan instead of an index scan, which should be at least 25 times faster (in the case where it would have to read the whole index from the HDD)? Or where am I wrong?
Btw, running the query several times in a row doesn't change anything - still ca. 11 minutes.
Execution plan here:
Aggregate (cost=7500001.00..7500001.01 rows=1 width=0) (actual time=698316.978..698316.979 rows=1 loops=1)
Buffers: shared hit=192 read=6249809
-> Seq Scan on transaction (cost=0.00..7250001.00 rows=100000000 width=0) (actual time=0.009..680594.049 rows=100000001 loops=1)
Buffers: shared hit=192 read=6249809
Total runtime: 698317.044 ms
Considering that the throughput of an HDD is usually somewhere between 50MB/s and 100MB/s, for 48GB I would expect everything to be read in between 500 and 1000 s.
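That estimate is simple to check (the throughput figures are the assumption here, not a measurement):

```python
table_gb = 48
for mb_per_s in (50, 100):
    seconds = table_gb * 1024 / mb_per_s  # time to sequentially read the whole table
    print(f"at {mb_per_s} MB/s: {seconds:.0f} s (~{seconds / 60:.1f} min)")
```

The plan's Total runtime of ~698 s falls inside that 500-1000 s window, which is consistent with a query bound by sequentially reading the table from disk.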
Since you have no WHERE clause, the planner sees that you are interested in the large majority of records, so it does not use the index, as that would only add extra page reads. The reason PostgreSQL cannot count from the index alone lies in MVCC, which PostgreSQL uses for transaction consistency: the rows themselves must be visited to ensure the results are accurate. (See https://wiki.postgresql.org/wiki/Slow_Counting)
The cache, CPU, etc. will not affect this, nor will changing the caching settings. This is IO-bound, and the cache will be completely trashed after the query.
If you can live with an approximation you can use the reltuples field in the table metadata:
SELECT reltuples FROM pg_class WHERE relname = 'tbl';
Since this reads just a single row, it is blazing fast.
Update: since 9.2, a new way of storing visibility information has allowed index-only counts to happen. However, there are quite a few caveats, especially in the case where there is no predicate to limit the rows. See https://wiki.postgresql.org/wiki/Index-only_scans for more details.
I have been trying out postgres 9.3 running on an Azure VM on Windows Server 2012. I was originally running it on a 7GB server... I am now running it on a 14GB Azure VM. I went up a size when trying to solve the problem described below.
I am quite new to PostgreSQL by the way, so I am only getting to know the configuration options bit by bit. Also, while I'd love to run it on Linux, my colleagues and I simply don't have the expertise to address issues when things go wrong in Linux, so Windows is our only option.
Problem description:
I have a table called test_table; it currently stores around 90 million rows. It will grow by around 3-4 million rows per month. There are 2 columns in test_table:
id (bigserial)
url (character varying(300))
I created indexes after importing the data from a few CSV files. Both columns are indexed.... the id is the primary key. The index on the url is a normal btree created using the defaults through pgAdmin.
When I ran:
SELECT sum(((relpages*8)/1024)) as MB FROM pg_class WHERE reltype=0;
... The total size is 5980MB
The individual sizes of the 2 indexes in question are as follows, and I got them by running:
# SELECT relname, ((relpages*8)/1024) as MB, reltype FROM pg_class WHERE
reltype=0 ORDER BY relpages DESC LIMIT 10;
relname | mb | reltype
----------------------------------+------+--------
test_url_idx | 3684 | 0
test_pk | 2161 | 0
There are other indexes on other, smaller tables, but they are tiny (< 5MB)... so I ignored them here.
The trouble when querying test_table using the url, particularly when using a wildcard in the search, is the speed (or lack of it), e.g.
select * from test_table where url like 'orange%' limit 20;
...would take anything from 20-40 seconds to run.
Running explain analyze on the above gives the following:
# explain analyze select * from test_table where
url like 'orange%' limit 20;
QUERY PLAN
-----------------------------------------------------------------
Limit (cost=0.00..4787.96 rows=20 width=57) (actual time=0.304..1898.583 rows=20 loops=1)
-> Seq Scan on test_table (cost=0.00..2303247.60 rows=9621 width=57) (actual time=0.302..1898.542 rows=20 loops=1)
Filter: ((url)::text ~~ 'orange%'::text)
Rows Removed by Filter: 210286
Total runtime: 1898.650 ms
(5 rows)
Taking another example... this time with the wildcard between "american" and ".com"...
# explain select * from test_table where url
like 'american%.com' limit 50;
QUERY PLAN
-------------------------------------------------------
Limit (cost=0.00..11969.90 rows=50 width=57)
-> Seq Scan on test_table (cost=0.00..2303247.60 rows=9621 width=57)
Filter: ((url)::text ~~ 'american%.com'::text)
(3 rows)
# explain analyze select * from test_table where url
like 'american%.com' limit 50;
QUERY PLAN
-----------------------------------------------------
Limit (cost=0.00..11969.90 rows=50 width=57) (actual time=83.470..3035.696 rows=50 loops=1)
-> Seq Scan on test_table (cost=0.00..2303247.60 rows=9621 width=57) (actual time=83.467..3035.614 rows=50 loops=1)
Filter: ((url)::text ~~ 'american%.com'::text)
Rows Removed by Filter: 276142
Total runtime: 3035.774 ms
(5 rows)
I then went from a 7GB to a 14GB server. Query speeds were no better.
Observations on the server
I can see that Memory usage never really goes beyond 2MB.
Disk reads go off the charts when running a query using a LIKE statement.
Query speed is perfectly fine when matching against the id (primary key)
The postgresql.conf file has had only a few changes from the defaults. Note that I took some of these suggestions from the following blog post: http://www.gabrielweinberg.com/blog/2011/05/postgresql.html.
Changes to conf:
shared_buffers = 512MB
checkpoint_segments = 10
(I changed checkpoint_segments as I got lots of warnings when loading in CSV files... although a production database will not be very write intensive so this can be changed back to 3 if necessary...)
cpu_index_tuple_cost = 0.0005
effective_cache_size = 10GB # recommendation in the blog post was 2GB...
On the server itself, in the Task Manager -> Performance tab, the following are probably the relevant bits for someone who can assist:
CPU: rarely over 2% (regardless of what queries are run... it hit 11% once when I was importing a 6GB CSV file)
Memory: 1.5/14.0GB (11%)
More details on Memory:
In use: 1.4GB
Available: 12.5GB
Committed 1.9/16.1 GB
Cached: 835MB
Paged Pool: 95.2MB
Non-paged pool: 71.2 MB
Questions
How can I ensure an index will sit in memory (providing it doesn't get too big for memory)? Is it just configuration tweaking I need here?
Is implementing my own search index (e.g. Lucene) a better option here?
Are the full-text indexing features in postgres going to improve performance dramatically, even if I can solve the index in memory issue?
Thanks for reading.
Those seq scans make it look like you didn't run analyze on the table after importing your data.
http://www.postgresql.org/docs/current/static/sql-analyze.html
During normal operation, scheduling vacuum analyze runs isn't useful, because autovacuum kicks in periodically. But it is important after massive writes, such as imports.
On a slightly related note, see the reversed-index tip on Pavel's PostgreSQL Tricks site if you ever need to run queries anchored at the end rather than at the beginning, e.g. LIKE '%.com':
http://postgres.cz/wiki/PostgreSQL_SQL_Tricks_I#section_20
Regarding your actual questions, be aware that some of the suggestions in the post you linked to are dubious at best. Changing the cost of index use is frequently dubious, and disabling seq scans is downright silly. (Sometimes it is cheaper to seq scan a table than it is to use an index.)
With that being said:
Postgres primarily caches indexes based on how often they're used, and it will not use an index if the stats suggest that it shouldn't - hence the need to ANALYZE after an import. Giving Postgres plenty of memory will, of course, increase the likelihood that an index is in memory too, but keep the latter point in mind.
As for question 3: full-text search works fine.
For further reading on fine-tuning, see the manual and:
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
Two last notes on your schema:
Last I checked, bigint (bigserial in your case) was slower than plain int. (This was a while ago, so the difference might now be negligible on modern 64-bit servers.) Unless you foresee that you'll actually need more than 2.1 billion entries, int is plenty and takes less space.
From an implementation standpoint, the only difference between a varchar(300) and a varchar without a specified length (or text, for that matter) is an extra check constraint on the length. If you don't actually need the data to fit that size and are merely doing so out of habit, your db inserts and updates will run faster without that constraint.
Unless your encoding or collation is C or POSIX, an ordinary btree index cannot efficiently satisfy an anchored LIKE query. You may have to declare a btree index with the varchar_pattern_ops operator class to benefit, e.g. CREATE INDEX test_url_pattern_idx ON test_table (url varchar_pattern_ops);
The problem is that you're getting hit with a full table scan for each of those lookups ("index in memory" isn't really the issue). Each time you run one of those queries, the database visits every single row, which is what causes the high disk usage. You might check the docs on operator classes and index types for more information. If you follow that advice, you should be able to get prefix lookups working fine, i.e. those situations where you're matching something like 'orange%'.
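The trick a btree with the right operator class can exploit is turning an anchored pattern into a range scan over sorted data. A toy sketch of the idea with a sorted list (the URLs are made up):

```python
import bisect

urls = sorted(["apple.com", "orangeade.org", "orange.com",
               "orangutan.net", "zebra.io"])

# 'orange%' is equivalent to: url >= 'orange' AND url < 'orangf'
prefix = "orange"
upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)  # "orangf"
lo = bisect.bisect_left(urls, prefix)
hi = bisect.bisect_left(urls, upper)
print(urls[lo:hi])  # ['orange.com', 'orangeade.org']
```

A locale-aware collation can break the "bump the last character" trick, which is exactly why Postgres needs the pattern_ops operator class (or a C collation) before it will use a btree for LIKE.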
Full-text search is nice for more natural text searches, like written documents, but it might be more difficult to get it working well for URL searching. There was also a thread on the mailing lists a few months back that might have more domain-specific information for what you're trying to do.
explain analyze select * from test_table where
url like 'orange%' limit 20;
You probably want to use a gin/gist index for LIKE queries. That should give you much better results than a plain btree - without a pattern operator class or a C collation, a btree index can't be used for LIKE queries at all.
I'm running PostgreSQL 8.3 on a 1.83 GHz Intel Core Duo Mac Mini with 1GB of RAM and Mac OS X 10.5.8. I have a stored a huge graph in my PostgreSQL database. It consists of 1.6 million nodes and 30 million edges. My database schema is like:
CREATE TABLE nodes (id INTEGER PRIMARY KEY,title VARCHAR(256));
CREATE TABLE edges (id INTEGER,link INTEGER,PRIMARY KEY (id,link));
CREATE INDEX id_idx ON edges (id);
CREATE INDEX link_idx ON edges (link);
The data in the table edges looks like
id link
1 234
1 88865
1 6
2 365
2 12
...
So for each node with id x it stores the outgoing link to node y.
The time for searching all the outgoing links is ok:
=# explain analyze select link from edges where id=4620;
QUERY PLAN
---------------------------------------------------------------------------------
Index Scan using id_idx on edges (cost=0.00..101.61 rows=3067 width=4) (actual time=135.507..157.982 rows=1052 loops=1)
Index Cond: (id = 4620)
Total runtime: 158.348 ms
(3 rows)
However, if I search for the incoming links to a node, the database is more than 100 times slower (although the resulting number of incoming links is only 5-10 times higher than the number of outgoing links):
=# explain analyze select id from edges where link=4620;
QUERY PLAN
----------------------------------------------------------------------------------
Bitmap Heap Scan on edges (cost=846.31..100697.48 rows=51016 width=4) (actual time=322.584..48983.478 rows=26887 loops=1)
Recheck Cond: (link = 4620)
-> Bitmap Index Scan on link_idx (cost=0.00..833.56 rows=51016 width=0) (actual time=298.132..298.132 rows=26887 loops=1)
Index Cond: (link = 4620)
Total runtime: 49001.936 ms
(5 rows)
I tried to force Postgres not to use a Bitmap Scan via
=# set enable_bitmapscan = false;
but the speed of the query for incoming links didn't improve:
=# explain analyze select id from edges where link=1588;
QUERY PLAN
-------------------------------------------------------------------------------------------
Index Scan using link_idx on edges (cost=0.00..4467.63 rows=1143 width=4) (actual time=110.302..51275.822 rows=43629 loops=1)
Index Cond: (link = 1588)
Total runtime: 51300.041 ms
(3 rows)
I also increased my shared buffers from 24MB to 512MB, but it didn't help. So I wonder why my queries for outgoing and incoming links show such asymmetric behaviour. Is something wrong with my choice of indexes? Or should I rather create a third table holding all the incoming links for a node with id x? But that would be quite a waste of disk space. Since I'm new to SQL databases, maybe I'm missing something basic here?
I guess it is because of the "density" of same-key records on the disk.
I think the records with the same id are stored densely (i.e., in a few blocks) while those with the same link are stored sparsely (i.e., distributed across a huge number of blocks).
If you inserted the records in id order, this situation can happen.
Assume that:
1. there are 10,000 records,
2. they're stored in the order (id, link) = (1, 1), (1, 2), ..., (1, 100), (2, 1), ..., and
3. 50 records can be stored in a block.
Under these assumptions, blocks #1-#3 consist of the records (1, 1)-(1, 50), (1, 51)-(1, 100) and (2, 1)-(2, 50) respectively.
When you SELECT * FROM edges WHERE id=1, only 2 blocks (#1, #2) have to be loaded and scanned.
On the other hand, SELECT * FROM edges WHERE link=1 requires 100 blocks (#1, #3, #5, ...), even though the number of rows returned is the same.
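The block counting above can be simulated directly (a toy model with made-up sizes: 100 ids with 100 links each, 50 rows per block, rows inserted in id order):

```python
ROWS_PER_BLOCK = 50
# rows inserted in id order: (1,1)...(1,100),(2,1)...(2,100),...
records = [(i, link) for i in range(1, 101) for link in range(1, 101)]
block_of = {rec: pos // ROWS_PER_BLOCK for pos, rec in enumerate(records)}

blocks_for_id = {block_of[(1, link)] for link in range(1, 101)}  # WHERE id = 1
blocks_for_link = {block_of[(i, 1)] for i in range(1, 101)}      # WHERE link = 1
print(len(blocks_for_id), len(blocks_for_link))  # 2 100
```

Both queries return 100 rows, but the id lookup touches 2 blocks while the link lookup touches 100 - a 50x difference in IO for the same row count.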
If you need good performance and can do without foreign key constraints (or use triggers to implement them manually), try the intarray and intagg extension modules. Instead of the edges table, have an outedges integer[] column on the nodes table. This will add about 140MB to the table, so the whole thing will still probably fit into memory. For reverse lookups, either create a GIN index on the outedges column (for an additional 280MB), or just add an inedges column.
PostgreSQL has pretty high per-row overhead, so the naive edges table will take about 1GB of space for the table alone, plus another 1.5GB for the indexes. Given your dataset size, you have a good chance of having most of it in cache if you use integer arrays to store the relations. This will make any lookup blazingly fast - I see around 0.08ms lookup times to get the edges in either direction for a given node. Even if it doesn't all fit in memory, you'll still have a larger fraction in memory and much better cache locality.
I think habe is right.
You can check this by running cluster link_idx on edges; analyze edges; after filling the table. Now the second query should be fast, and the first one slow.
To have both queries fast, you'll have to denormalize with a second table, as you proposed. Just remember to cluster and analyze this second table after loading your data, so that all edges linking to a node are physically grouped.
If you will not query this all the time and you do not want to store and backup this second table then you can create it temporarily before querying:
create temporary table edges_backwards
as select link, id from edges order by link, id;
create index edges_backwards_link_idx on edges_backwards(link);
You do not have to cluster this temporary table, as it will be physically ordered right at creation. It does not pay off for a single query, but can help when running several queries in a row.
Your issue seems to be disk-IO related. Postgres has to read the heap tuples for index matches in order to see whether or not the row is visible (this cannot be determined from the index alone, as it doesn't contain the necessary information).
VACUUM ANALYZE (or simply ANALYZE) will help if you have lots of deleted rows and/or updated rows. Run it first and see if you get any improvements.
CLUSTER might also help. Based on your examples, I'd use link_idx as the cluster key: CLUSTER edges USING link_idx. It might degrade the performance of your id queries, though (your id queries may be quick because the rows are already sorted on disk). Remember to run ANALYZE after CLUSTER.
Next steps include fine-tuning memory parameters, adding more memory, or adding a faster disk subsystem.
Have you tried doing this in www.neo4j.org? This is almost trivial in a graph database and should give you performance in the ms range for your use case.