PostgreSQL planner uses wrong index

I recently upgraded PostgreSQL from version 9.1 to 9.2. The new planner uses the wrong index and the query takes far too long to execute.
Query:
explain SELECT mentions.* FROM mentions WHERE (searches_id = 7646553) ORDER BY id ASC LIMIT 1000
EXPLAIN output in version 9.1:
Limit  (cost=5762.99..5765.49 rows=1000 width=184)
  ->  Sort  (cost=5762.99..5842.38 rows=31755 width=184)
        Sort Key: id
        ->  Index Scan using mentions_searches_id_idx on mentions  (cost=0.00..4021.90 rows=31755 width=184)
              Index Cond: (searches_id = 7646553)
EXPLAIN output in version 9.2:
Limit  (cost=0.00..450245.54 rows=1000 width=244)
  ->  Index Scan using mentions_pk on mentions  (cost=0.00..110469543.02 rows=245354 width=244)
        Index Cond: (id > 0)
        Filter: (searches_id = 7646553)
Version 9.1 takes the correct approach: the planner uses the index on searches_id. In 9.2 the planner does not use that index and instead filters rows by searches_id.
When I execute the query on 9.2 without ORDER BY id, the planner uses the index on searches_id, but I need the ordering by id.
I also tried selecting the rows in a subquery and ordering them in an outer query, but EXPLAIN shows that the planner does the same as with the plain query.
select * from (
  SELECT mentions.* FROM mentions WHERE (searches_id = 7646553)
) AS q1
order by id asc
What would you recommend?

If the rows with searches_id = 7646553 make up more than a few percent of the table, the index on that column will not be used, because a table scan would be faster. Run
select count(*) from mentions where searches_id = 7646553
and compare to the total rows.
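For example (a sketch; the counts below are illustrative, not taken from the question):
select count(*) from mentions where searches_id = 7646553;  -- suppose 31755, as in the 9.1 plan estimate
select count(*) from mentions;                              -- suppose 50000000
-- 31755 out of 50000000 is roughly 0.06%, well under "a few percent"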
If they make up less than a few percent of the table, try
with m as (
  SELECT *
  FROM mentions
  WHERE searches_id = 7646553
)
select *
from m
order by id asc
(From PostgreSQL v12 on, you have to use WITH ... AS MATERIALIZED, or the CTE no longer acts as an optimization fence.)
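A sketch of the v12+ form (identical query, only the keyword differs):
with m as materialized (
  SELECT *
  FROM mentions
  WHERE searches_id = 7646553
)
select *
from m
order by id asc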
Or create a composite index:
create index index_name on mentions (searches_id, id)
If searches_id has low cardinality then create the same index in the opposite order
create index index_name on mentions (id, searches_id)
Run
analyze mentions
after creating an index.
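Putting the pieces together, a quick verification sketch (the index name is illustrative):
create index mentions_searches_id_id_idx on mentions (searches_id, id);
analyze mentions;
explain SELECT mentions.* FROM mentions WHERE (searches_id = 7646553) ORDER BY id ASC LIMIT 1000;
The plan should now show a scan on the composite index rather than on mentions_pk.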

For me, I had indexes, but they were all based on three columns, and my query wasn't referencing one of those columns, so it was doing a seq scan across the entire table. Possible fix: more indexes that use fewer columns (and/or switch the column order).
Another problem we saw was that we had the right index, but apparently it was an "invalid" index (a poorly created CONCURRENTLY build?). Dropping and recreating it (or reindexing it) made the planner start using it. See:
What are the available options to identify and remove the invalid objects in Postgres (ex: corrupted indexes)
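You can list invalid indexes yourself via the pg_index system catalog, which has an indisvalid flag; a minimal sketch:
SELECT indexrelid::regclass AS index_name,
       indrelid::regclass  AS table_name
FROM pg_index
WHERE NOT indisvalid;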
See also http://www.postgresql.org/docs/8.4/static/indexes-multicolumn.html

Related

How to prevent changing of execution plan for certain values

I have a table in PostgreSQL 9.1.9. Here's the schema:
CREATE TABLE chpl_text
(
  id integer NOT NULL DEFAULT nextval('chpl_text_id_seq1'::regclass),
  page_id bigint NOT NULL,
  page_idx integer NOT NULL,
  ...
);
I have around 40000000 (40M) rows in this table.
Now, there's a query:
SELECT
...
FROM chpl_text
ORDER BY id
LIMIT 100000
OFFSET N
Everything runs smoothly for N <= 5300000. The execution plan looks like this:
Limit  (cost=12743731.26..12984179.02 rows=100000 width=52)
  ->  Index Scan using chpl_text_pkey on chpl_text t  (cost=0.00..96857560.86 rows=40282164 width=52)
But for N >= 5400000 it magically changes into
Limit  (cost=13042543.16..13042793.16 rows=100000 width=52)
  ->  Sort  (cost=13029043.16..13129748.57 rows=40282164 width=52)
        Sort Key: id
        ->  Seq Scan on chpl_text t  (cost=0.00..1056505.64 rows=40282164 width=52)
Resulting in very long runtime.
How can I prevent PostgreSQL from changing the query plan for higher offsets?
Note: I am aware, that big offsets are not good at all, but I am forced to use them here.
If Postgres is configured halfway decently, your statistics are up to date (ANALYZE or autovacuum) and the cost settings are sane, Postgres generally knows better when to do an index scan or a sequential scan. Details and links:
Keep PostgreSQL from sometimes choosing a bad query plan
To actually test performance without sequential scan, "disable" it (in a debug session only!)
SET enable_seqscan=OFF;
More in the manual.
Then run EXPLAIN ANALYZE again ...
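A minimal debug-session sketch (SET LOCAL confines the change to the enclosing transaction, so it cannot leak into other sessions):
BEGIN;
SET LOCAL enable_seqscan = OFF;
EXPLAIN ANALYZE
SELECT ... FROM chpl_text ORDER BY id LIMIT 100000 OFFSET 5400000;
ROLLBACK;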
Also, the release of Postgres 9.2 had a focus on "big data". With your given use case you should urgently consider upgrading to the current release.
You can also try this alternative query with a CTE and row_number() and see if the query plan turns out more favorable:
WITH cte AS (
  SELECT ..., row_number() OVER (ORDER BY id) AS rn
  FROM chpl_text
)
SELECT ...
FROM cte
WHERE rn BETWEEN N+1 AND N+100000
ORDER BY id;
That's not always the case, but it might be in your particular situation.

How does SQL actually run in memory? Row by row?

How does SQL actually run?
For example, if I want to find a row with row_id=123, will the SQL query search row by row from the top of memory?
This is a topic of query optimization. Briefly: based on your query, the database system first generates and optimizes a query plan that it expects to perform well, then executes that plan.
For selections like row_id = 123, the actual query plan depends on whether you have an index or not. If you do not, a table scan will be used to examine the table row by row. But if you do have an index on row_id, there is a chance to skip most of the rows by using it. In that case, the DB will not search row by row.
If you're running PostgreSQL or MySQL, you can use
EXPLAIN SELECT * FROM table WHERE row_id = 123;
to see the query plan generated by your system.
For an example table,
CREATE TABLE test(row_id INT); -- without index
COPY test FROM '/home/user/test.csv'; -- 40,000 rows
The EXPLAIN SELECT * FROM test WHERE row_id = 123 outputs:
QUERY PLAN
------------------------------------------------------
Seq Scan on test (cost=0.00..677.00 rows=5 width=4)
Filter: (row_id = 123)
(2 rows)
which means the database will do a sequential scan on the whole table and find the rows with row_id = 123.
However, if you create an index on the column row_id:
CREATE INDEX test_idx ON test(row_id);
then the same EXPLAIN will tell us that the database will use an index scan to avoid going through the whole table:
QUERY PLAN
--------------------------------------------------------------------------
Index Only Scan using test_idx on test (cost=0.00..8.34 rows=5 width=4)
Index Cond: (row_id = 123)
(2 rows)
You can also use EXPLAIN ANALYZE to see actual performance of your SQL queries. On my machine, the total runtimes for sequential scan and index scan are 14.738 ms and 0.171 ms, respectively.
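For example:
EXPLAIN ANALYZE SELECT * FROM test WHERE row_id = 123;
This prints the same plan annotated with the actual time spent in each node and the total runtime.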
For details of query optimization, refer to Chapters 15 and 16 of Database Systems: The Complete Book.

Will PostgreSQL optimize LIKE '%'?

I wonder if SELECT * FROM foo will execute faster than SELECT * FROM foo WHERE name LIKE '%' assuming name is NOT NULL?
Any references to documentation?
Both of your queries will scan the entire table. Whether or not name is NOT NULL is only important in extremely rare circumstances where there is (1) an index on name and (2) it is very, very sparse. Only then will PostgreSQL consider looking up the records from the name index.
In all other situations, this SQLFiddle shows that the LIKE version adds a filter, which must be checked against every row. PostgreSQL has no optimization to drop LIKE '%' against a not-null varchar column, however sensible that might seem.
Table SELECT * all rows:
QUERY PLAN
Seq Scan on foo  (cost=0.00..15.00 rows=1000 width=62)
Table SELECT * all rows with LIKE '%':
QUERY PLAN
Seq Scan on foo  (cost=0.00..17.50 rows=1000 width=62)
  Filter: ((name)::text ~~ '%'::text)
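A sketch to reproduce that comparison locally (the table and its 1000 md5-string rows are illustrative, not the fiddle's exact data):
CREATE TABLE foo (name varchar NOT NULL);
INSERT INTO foo (name)
SELECT md5(random()::text) FROM generate_series(1, 1000);
EXPLAIN SELECT * FROM foo;
EXPLAIN SELECT * FROM foo WHERE name LIKE '%';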
Yes. Using LIKE requires the database to check the filter against every row of a full table scan, in addition to simply returning the rows.

Efficiently query a column that includes a substring

Given a string column with a value similar to /123/12/34/56/5/, what is the optimal way of querying for all the records that include the given number (12 for example)?
The solution from top of my head is:
SELECT id FROM things WHERE things.path LIKE '%/12/%'
But AFAIK this query can't use indexes on the column due to the leading %.
There must be something better. What is it?
Using PostgreSQL, but would prefer the solution that would work across other DBs too.
If you're happy turning that column into an array of integers, like:
'/123/12/34/56/5/' becomes ARRAY[123,12,34,56,5]
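One way to populate such a column (a sketch; the path_arr name matches the assumption below, and string_to_array plus a cast does the conversion):
ALTER TABLE things ADD COLUMN path_arr integer[];
UPDATE things
SET path_arr = string_to_array(trim(both '/' from path), '/')::integer[];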
With path_arr as a column of type INTEGER[], you can then create a GIN index on that column:
CREATE INDEX ON things USING gin(path_arr);
A query for all items containing 12 then becomes:
SELECT * FROM things WHERE ARRAY[12] <@ path_arr;
Which will use the index. In my test (with a million rows), I get plans like:
EXPLAIN SELECT * FROM things WHERE ARRAY[12] <@ path_arr;
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Bitmap Heap Scan on things  (cost=5915.75..9216.99 rows=1000 width=92)
   Recheck Cond: (path_arr @> '{12}'::integer[])
   ->  Bitmap Index Scan on things_path_arr_idx  (cost=0.00..5915.50 rows=1000 width=0)
         Index Cond: ('{12}'::integer[] <@ path_arr)
(4 rows)
In PostgreSQL 9.1 you could utilize the pg_trgm module and build a GIN index with it.
CREATE EXTENSION pg_trgm; -- once per database
CREATE INDEX things_path_trgm_gin_idx ON things USING gin (path gin_trgm_ops);
Your LIKE expression can use this index even if it is not left-anchored.
See a detailed demo by depesz here.
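With such an index in place, the query from the question should be able to use it despite the leading wildcard (a sketch; the actual plan still depends on table size and statistics):
EXPLAIN SELECT id FROM things WHERE things.path LIKE '%/12/%';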
Normalize it if you can, though.

How to increase query speed without using full-text search?

This is my simple query; by searching for selectnothing I'm sure I'll have no hits.
SELECT nome_t FROM myTable WHERE nome_t ILIKE '%selectnothing%';
This is the EXPLAIN ANALYZE VERBOSE output:
Seq Scan on myTable  (cost=0.00..15259.04 rows=37 width=29) (actual time=2153.061..2153.061 rows=0 loops=1)
  Output: nome_t
  Filter: (nome_t ~~* '%selectnothing%'::text)
Total runtime: 2153.116 ms
myTable has around 350k rows and the table definition is something like:
CREATE TABLE myTable (
  nome_t text NOT NULL
);
I have an index on nome_t as stated below:
CREATE INDEX idx_m_nome_t ON myTable USING btree (nome_t);
Although this is clearly a good candidate for full-text search, I would like to rule that option out for now.
This query is meant to be run from a web application, and currently it's taking around 2 seconds, which is obviously too much.
Is there anything I can do, like using other index methods, to improve the speed of this query?
No, ILIKE '%selectnothing%' always needs a full table scan; a plain B-tree index is useless here. You need full-text search; it's not that hard to implement.
Edit: you could use Wildspeed, an option I forgot about. The indexes will be huge, but your performance will also be much better.
The Wildspeed extension provides GIN index support for wildcard search with the LIKE operator.
http://www.sai.msu.su/~megera/wiki/wildspeed
Another thing you can do is break the nome_t column out of myTable into its own table. Searching one column of a wide table is slow (if there are fifty other wide columns) because the other data effectively slows down the scan against that column (fewer records fit per page/extent). A sketch of that split follows.
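A minimal sketch of the split (the key column is hypothetical, since the schema shown has no primary key):
CREATE TABLE myTable_nome (
  mytable_id integer PRIMARY KEY,  -- hypothetical key linking back to myTable
  nome_t text NOT NULL
);
-- the search then scans only this narrow table:
SELECT nome_t FROM myTable_nome WHERE nome_t ILIKE '%selectnothing%';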