PostgreSQL 10 - IN and ANY performance inexplicable behaviour - sql

I'm selecting from a big table where the id is in a large array/list.
I checked several variants, and the results surprised me.
1. Use ANY and ARRAY
EXPLAIN (ANALYZE,BUFFERS)
SELECT * FROM cca_data_hours
WHERE
datetime = '2018-01-07 19:00:00'::timestamp without time zone AND
id_web_page = ANY (ARRAY[1, 2, 8, 3 /* ~50k ids */])
Result
"Index Scan using cca_data_hours_pri on cca_data_hours (cost=0.28..576.79 rows=15 width=188) (actual time=0.035..0.998 rows=6 loops=1)"
" Index Cond: (datetime = '2018-01-07 19:00:00'::timestamp without time zone)"
" Filter: (id_web_page = ANY ('{1,2,8,3, (...)"
" Rows Removed by Filter: 5"
" Buffers: shared hit=3"
"Planning time: 57.625 ms"
"Execution time: 1.065 ms"
2. Use IN and VALUES
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM cca_data_hours
WHERE
datetime = '2018-01-07 19:00:00'::timestamp without time zone AND
id_web_page IN (VALUES (1),(2),(8),(3) /* ~50k ids */)
Result
"Hash Join (cost=439.77..472.66 rows=8 width=188) (actual time=90.806..90.858 rows=6 loops=1)"
" Hash Cond: (cca_data_hours.id_web_page = "*VALUES*".column1)"
" Buffers: shared hit=3"
" -> Index Scan using cca_data_hours_pri on cca_data_hours (cost=0.28..33.06 rows=15 width=188) (actual time=0.035..0.060 rows=11 loops=1)"
" Index Cond: (datetime = '2018-01-07 19:00:00'::timestamp without time zone)"
" Buffers: shared hit=3"
" -> Hash (cost=436.99..436.99 rows=200 width=4) (actual time=90.742..90.742 rows=4 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 9kB"
" -> HashAggregate (cost=434.99..436.99 rows=200 width=4) (actual time=90.709..90.717 rows=4 loops=1)"
" Group Key: "*VALUES*".column1"
" -> Values Scan on "*VALUES*" (cost=0.00..362.49 rows=28999 width=4) (actual time=0.008..47.056 rows=28999 loops=1)"
"Planning time: 53.607 ms"
"Execution time: 91.681 ms"
I expected case #2 to be faster, but it isn't.
Why is IN with VALUES slower?

Comparing the EXPLAIN ANALYZE results, it looks like the older PostgreSQL version wasn't using the available index in the given examples. The reason ANY (ARRAY[...]) became faster is a change introduced in version 9.2: https://www.postgresql.org/docs/current/static/release-9-2.html
Allow indexed_col op ANY(ARRAY[...]) conditions to be used in plain index scans and index-only scans (Tom Lane)
The site where you got the suggestion was written about version 9.0.
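As an aside (this is not part of the answer above, just another common way to pass a very long id list): joining against unnest() of the array lets the planner choose a hash or merge join instead of a row-by-row filter. A minimal sketch, assuming the same table and columns:
EXPLAIN (ANALYZE, BUFFERS)
SELECT h.*
FROM cca_data_hours h
JOIN unnest(ARRAY[1, 2, 8, 3 /* ~50k ids */]) AS ids(id_web_page)
  ON h.id_web_page = ids.id_web_page
WHERE h.datetime = '2018-01-07 19:00:00'::timestamp without time zone;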

Related

Why does this INSERT query run slower when using indices

I'm currently running a query that inserts the values from table insert_values(a,b) into table insert_base(a,b):
INSERT INTO insert_base
SELECT DISTINCT *
FROM insert_values IV
WHERE NOT EXISTS (SELECT * FROM insert_base IB
WHERE IV.a = IB.a AND IV.b = IB.b);
However, when I put an index on insert_values on (a,b) (all the attributes in insert_values), the query actually runs slightly slower than when there is no index (by around 2 seconds). I'm quite confused as to why this is the case, since I thought that, at worst, an index wouldn't negatively affect performance. Any help would be much appreciated. I am using PostgreSQL, by the way.
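For reference, the index being compared is presumably something like this (a sketch; the index name is an assumption, the question only states that it covers insert_values(a, b)):
CREATE INDEX insert_values_a_b_idx ON insert_values (a, b);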
These are the access plans (indexed query first):
"Insert on insert_base (cost=34712.68..36712.68 rows=0 width=0) (actual time=3517.311..3517.346 rows=0 loops=1)"
" -> HashAggregate (cost=34712.68..35712.68 rows=100000 width=8) (actual time=897.690..1540.008 rows=949662 loops=1)"
" Group Key: iv.a, iv.b"
" Batches: 57 Memory Usage: 11065kB Disk Usage: 24288kB"
" -> Hash Anti Join (cost=100.08..29737.15 rows=995106 width=8) (actual time=12.398..444.112 rows=998050 loops=1)"
" Hash Cond: ((iv.a = ib.a) AND (iv.b = ib.b))"
" -> Seq Scan on insert_values iv (cost=0.00..14425.00 rows=1000000 width=8) (actual time=1.367..149.392 rows=1000000 loops=1)"
" -> Hash (cost=93.43..93.43 rows=443 width=8) (actual time=10.920..10.922 rows=20000 loops=1)"
" Buckets: 32768 (originally 1024) Batches: 1 (originally 1) Memory Usage: 1038kB"
" -> Seq Scan on insert_base ib (cost=0.00..93.43 rows=443 width=8) (actual time=0.020..2.565 rows=20000 loops=1)"
"Planning Time: 12.250 ms"
"Execution Time: 3548.398 ms"
"Insert on insert_base (cost=34682.06..36682.06 rows=0 width=0) (actual time=2821.450..2821.453 rows=0 loops=1)"
" -> HashAggregate (cost=34682.06..35682.06 rows=100000 width=8) (actual time=735.926..1348.614 rows=949662 loops=1)"
" Group Key: iv.a, iv.b"
" Batches: 57 Memory Usage: 11065kB Disk Usage: 24288kB"
" -> Hash Anti Join (cost=102.75..29719.59 rows=992495 width=8) (actual time=4.566..404.311 rows=998050 loops=1)"
" Hash Cond: ((iv.a = ib.a) AND (iv.b = ib.b))"
" -> Seq Scan on insert_values iv (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.050..96.755 rows=1000000 loops=1)"
" -> Hash (cost=94.50..94.50 rows=550 width=8) (actual time=4.491..4.492 rows=20000 loops=1)"
" Buckets: 32768 (originally 1024) Batches: 1 (originally 1) Memory Usage: 1038kB"
" -> Seq Scan on insert_base ib (cost=0.00..94.50 rows=550 width=8) (actual time=0.009..1.558 rows=20000 loops=1)"
"Planning Time: 0.280 ms"
"Execution Time: 2828.308 ms"

SELECT FOR UPDATE becomes slow with time

We have a table with 1B entries, and there are 4 processes which work on it simultaneously. They claim rows with their session IDs, 1000 rows at a time, and then update the table after every 10,000 rows. The query used for claiming is:
EXPLAIN (ANALYZE,BUFFERS) WITH b AS
(
SELECT
userid,
address
FROM
UserEvents
WHERE
deliveryId = 2108625
AND
(
tsExpire > GetDate()
OR tsExpire IS NULL
)
AND sendTime <= GetDate()
AND session_id = 0
AND level IN
(
'default'
)
ORDER BY
sendTime FOR
UPDATE
SKIP LOCKED LIMIT 1000
)
UPDATE
UserEvents e
SET
session_id = 1
FROM
b
WHERE
e.userid = b.userid RETURNING b.userid,
b.address
This query generally runs within 500ms when all 4 processes are running simultaneously. Suddenly in the last few runs, it has been slowing down significantly with time. Here are the explain plans:
"Update on UserEvents e (cost=5753.03..8567.46 rows=1000 width=1962) (actual time=1373.284..1422.244 rows=1000 loops=1)"
" Buffers: shared hit=1146920 read=59 dirtied=194"
" I/O Timings: read=13.916"
" CTE b"
" -> Limit (cost=0.56..5752.46 rows=1000 width=82) (actual time=1373.094..1380.853 rows=1000 loops=1)"
" Buffers: shared hit=1121721 read=27 dirtied=23"
" I/O Timings: read=3.440"
" -> LockRows (cost=0.56..179683.91 rows=31239 width=82) (actual time=1373.093..1380.775 rows=1000 loops=1)"
" Buffers: shared hit=1121721 read=27 dirtied=23"
" I/O Timings: read=3.440"
" -> Index Scan using UserEvents_nextpass2 on UserEvents (cost=0.56..179371.52 rows=31239 width=82) (actual time=1366.046..1373.339 rows=4186 loops=1)"
" Index Cond: ((deliveryId = 2108625) AND (sendTime <= '2020-04-15 08:33:57.372282+00'::timestamp with time zone))"
" Filter: (((tsexpire > '2020-04-15 08:33:57.372282+00'::timestamp with time zone) OR (tsexpire IS NULL)) AND (session_id = 0) AND ((level)::text = 'default'::text))"
" Rows Removed by Filter: 29614"
" Buffers: shared hit=1113493 read=27"
" I/O Timings: read=3.440"
" -> Nested Loop (cost=0.58..2815.00 rows=1000 width=1962) (actual time=1373.218..1389.995 rows=1000 loops=1)"
" Buffers: shared hit=1126728 read=27 dirtied=23"
" I/O Timings: read=3.440"
" -> CTE Scan on b (cost=0.00..20.00 rows=1000 width=1692) (actual time=1373.106..1382.263 rows=1000 loops=1)"
" Buffers: shared hit=1121721 read=27 dirtied=23"
" I/O Timings: read=3.440"
" -> Index Scan using UserEvents_id on UserEvents e (cost=0.58..2.79 rows=1 width=268) (actual time=0.007..0.007 rows=1 loops=1000)"
" Index Cond: (userid = b.userid)"
" Buffers: shared hit=5007"
"Planning Time: 0.331 ms"
"Execution Time: 1422.457 ms"
Surprisingly, the index scan on UserEvents_nextpass2 slows down significantly after this query has been called a few thousand times. This is a partial index on non-null sendTime values. sendTime is updated after each process updates the rows and removes their session ids. But this has been the case for the last 1B events; what could be the reason for this slowness? Any help would be appreciated.
Explain plan for relatively faster run with 700ms:
"Update on UserEvents e (cost=5707.45..8521.87 rows=1000 width=1962) (actual time=695.897..751.557 rows=1000 loops=1)"
" Buffers: shared hit=605921 read=68 dirtied=64"
" I/O Timings: read=27.139"
" CTE b"
" -> Limit (cost=0.56..5706.87 rows=1000 width=82) (actual time=695.616..707.835 rows=1000 loops=1)"
" Buffers: shared hit=580158 read=33 dirtied=29"
" I/O Timings: read=10.491"
" -> LockRows (cost=0.56..179686.41 rows=31489 width=82) (actual time=695.615..707.770 rows=1000 loops=1)"
" Buffers: shared hit=580158 read=33 dirtied=29"
" I/O Timings: read=10.491"
" -> Index Scan using UserEvents_nextpass2 on UserEvents (cost=0.56..179371.52 rows=31489 width=82) (actual time=691.529..704.076 rows=3000 loops=1)"
" Index Cond: ((deliveryId = 2108625) AND (sendTime <= '2020-04-15 07:42:42.856859+00'::timestamp with time zone))"
" Filter: (((tsexpire > '2020-04-15 07:42:42.856859+00'::timestamp with time zone) OR (tsexpire IS NULL)) AND (session_id = 0) AND ((level)::text = 'default'::text))"
" Rows Removed by Filter: 29722"
" Buffers: shared hit=573158 read=33"
" I/O Timings: read=10.491"
" -> Nested Loop (cost=0.58..2815.00 rows=1000 width=1962) (actual time=695.658..716.356 rows=1000 loops=1)"
" Buffers: shared hit=585165 read=33 dirtied=29"
" I/O Timings: read=10.491"
" -> CTE Scan on b (cost=0.00..20.00 rows=1000 width=1692) (actual time=695.628..709.116 rows=1000 loops=1)"
" Buffers: shared hit=580158 read=33 dirtied=29"
" I/O Timings: read=10.491"
" -> Index Scan using UserEvents_id on UserEvents e (cost=0.58..2.79 rows=1 width=268) (actual time=0.007..0.007 rows=1 loops=1000)"
" Index Cond: (userid = b.userid)"
" Buffers: shared hit=5007"
"Planning Time: 0.584 ms"
"Execution Time: 751.713 ms"
My index on this table is:
CREATE INDEX UserEvents_nextpass2 ON public.UserEvents USING btree (deliveryid ASC NULLS LAST, sendTime ASC NULLS LAST) WHERE sendTime IS NOT NULL;
Index Scan using UserEvents_nextpass2 on UserEvents (cost=0.56..179371.52 rows=31239 width=82) (actual time=1366.046..1373.339 rows=4186 loops=1)
Buffers: shared hit=1113493 read=27
It looks like there is a lot of obsolete data in the "UserEvents_nextpass2" index. Visiting 266 pages for every row returned is a bit ridiculous. Do you have any long-open transactions which are blocking VACUUM and the btree-specific microvacuum from doing their job effectively?
Look in pg_stat_activity. Also, is hot_standby_feedback on? Is vacuum_defer_cleanup_age non-zero?
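A minimal sketch of those checks (the one-hour cutoff is only an illustrative threshold):
-- long-open transactions that can hold back VACUUM and btree microvacuum
SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE xact_start < now() - interval '1 hour'
ORDER BY xact_start;
-- settings mentioned above
SHOW hot_standby_feedback;
SHOW vacuum_defer_cleanup_age;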
There is no easy way to reduce the pages accessed per row because all my indexed columns are getting updated simultaneously. Since my filter was discarding ~80% of the rows, I decided to add my filter columns to the multi-column index. So my index changed from:
CREATE INDEX UserEvents_nextpass2
ON public.UserEvents USING btree (deliveryid ASC NULLS LAST, sendTime ASC NULLS LAST)
WHERE sendTime IS NOT NULL;
to:
CREATE INDEX UserEvents_nextpass2
ON public.UserEvents USING btree (deliveryid ASC NULLS LAST, sendTime ASC NULLS LAST, tsexpire, session_id, level)
WHERE sendTime IS NOT NULL;
This reduced the filter-removed rows to 0, and I only access the rows I actually need. My buffer hits were reduced to <100,000 from 1,121,721. The query mean time came down to 200ms from 1.5 seconds.
Lesson learned:
Always prefer a multi-column index over filtering

Speeding up the query with multiple joins, group by and order by

I have a SQL query as:
SELECT
title,
(COUNT(DISTINCT A.id)) AS "count_title"
FROM
B
INNER JOIN D ON B.app = D.app
INNER JOIN A ON D.number = A.number
INNER JOIN C ON A.id = C.id
GROUP BY C.title
ORDER BY count_title DESC
LIMIT 10
;
Table D contains 50M records, A contains 30M records, and B and C contain 30k records each. Indexes are defined on all columns used in the joins, GROUP BY, and ORDER BY.
The query works fine without the ORDER BY clause and returns results in around 2-3 seconds.
But with the sort (ORDER BY), the query time increases to 10-12 seconds.
I understand the reason behind this: the executor has to traverse all the records for the sort, and an index will hardly help here.
Are there other ways to speed up this query?
Here is the explain analyze of this query:
"QUERY PLAN"
"Limit (cost=974652.20..974652.22 rows=10 width=54) (actual time=2817.579..2825.071 rows=10 loops=1)"
" Buffers: shared hit=120299 read=573195"
" -> Sort (cost=974652.20..974666.79 rows=5839 width=54) (actual time=2817.578..2817.578 rows=10 loops=1)"
" Sort Key: (count(DISTINCT A.id)) DESC"
" Sort Method: top-N heapsort Memory: 26kB"
" Buffers: shared hit=120299 read=573195"
" -> GroupAggregate (cost=974325.65..974526.02 rows=5839 width=54) (actual time=2792.465..2817.097 rows=3618 loops=1)"
" Group Key: C.title"
" Buffers: shared hit=120299 read=573195"
" -> Sort (cost=974325.65..974372.97 rows=18931 width=32) (actual time=2792.451..2795.161 rows=45175 loops=1)"
" Sort Key: C.title"
" Sort Method: quicksort Memory: 5055kB"
" Buffers: shared hit=120299 read=573195"
" -> Gather (cost=968845.30..972980.74 rows=18931 width=32) (actual time=2753.402..2778.648 rows=45175 loops=1)"
" Workers Planned: 1"
" Workers Launched: 1"
" Buffers: shared hit=120299 read=573195"
" -> Parallel Hash Join (cost=967845.30..970087.64 rows=11136 width=32) (actual time=2751.725..2764.832 rows=22588 loops=2)"
" Hash Cond: ((C.id)::text = (A.id)::text)"
" Buffers: shared hit=120299 read=573195"
" -> Parallel Seq Scan on C (cost=0.00..1945.87 rows=66687 width=32) (actual time=0.017..4.316 rows=56684 loops=2)"
" Buffers: shared read=1279"
" -> Parallel Hash (cost=966604.55..966604.55 rows=99260 width=9) (actual time=2750.987..2750.987 rows=20950 loops=2)"
" Buckets: 262144 Batches: 1 Memory Usage: 4032kB"
" Buffers: shared hit=120266 read=571904"
" -> Nested Loop (cost=219572.23..966604.55 rows=99260 width=9) (actual time=665.832..2744.270 rows=20950 loops=2)"
" Buffers: shared hit=120266 read=571904"
" -> Parallel Hash Join (cost=219571.79..917516.91 rows=99260 width=4) (actual time=665.804..2583.675 rows=20950 loops=2)"
" Hash Cond: ((D.app)::text = (B.app)::text)"
" Buffers: shared hit=8 read=524214"
" -> Parallel Bitmap Heap Scan on D (cost=217542.51..895848.77 rows=5126741 width=13) (actual time=661.254..1861.862 rows=6160441 loops=2)"
" Recheck Cond: ((action_type)::text = ANY ('{10,11}'::text[]))"
" Heap Blocks: exact=242152"
" Buffers: shared hit=3 read=523925"
" -> Bitmap Index Scan on D_index_action_type (cost=0.00..214466.46 rows=12304178 width=0) (actual time=546.470..546.471 rows=12320882 loops=1)"
" Index Cond: ((action_type)::text = ANY ('{10,11}'::text[]))"
" Buffers: shared hit=3 read=33669"
" -> Parallel Hash (cost=1859.36..1859.36 rows=13594 width=12) (actual time=4.337..4.337 rows=16313 loops=2)"
" Buckets: 32768 Batches: 1 Memory Usage: 1152kB"
" Buffers: shared hit=5 read=289"
" -> Parallel Index Only Scan using B_index_app on B (cost=0.29..1859.36 rows=13594 width=12) (actual time=0.015..2.218 rows=16313 loops=2)"
" Heap Fetches: 0"
" Buffers: shared hit=5 read=289"
" -> Index Scan using A_index_number on A (cost=0.43..0.48 rows=1 width=24) (actual time=0.007..0.007 rows=1 loops=41900)"
" Index Cond: ((number)::text = (D.number)::text)"
" Buffers: shared hit=120258 read=47690"
"Planning Time: 0.747 ms"
"Execution Time: 2825.118 ms"
You could try to aim for a nested loop join between b and d because b is so much smaller:
CREATE INDEX ON d (app);
If d is vacuumed frequently enough, you could see if an index-only scan is even faster. For that, include number in the index (in v11, use the INCLUDE clause for that!). The EXPLAIN output suggests that you have an extra condition on action_type; you'd have to include that column too for an index-only scan.
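A hedged sketch of that covering index (the column list is inferred from the EXPLAIN output; INCLUDE needs v11 or later):
CREATE INDEX ON d (app) INCLUDE (number, action_type);
-- before v11, a plain multi-column index gives a similar effect:
-- CREATE INDEX ON d (app, action_type, number);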

PostgreSQL query is not using an index

Environment
My PostgreSQL (9.2) schema looks like this:
CREATE TABLE first
(
id_first bigint NOT NULL,
first_date timestamp without time zone NOT NULL,
CONSTRAINT first_pkey PRIMARY KEY (id_first)
)
WITH (
OIDS=FALSE
);
CREATE INDEX first_first_date_idx
ON first
USING btree
(first_date);
CREATE TABLE second
(
id_second bigint NOT NULL,
id_first bigint NOT NULL,
CONSTRAINT second_pkey PRIMARY KEY (id_second),
CONSTRAINT fk_first FOREIGN KEY (id_first)
REFERENCES first (id_first) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
OIDS=FALSE
);
CREATE INDEX second_id_first_idx
ON second
USING btree
(id_first);
CREATE TABLE third
(
id_third bigint NOT NULL,
id_second bigint NOT NULL,
CONSTRAINT third_pkey PRIMARY KEY (id_third),
CONSTRAINT fk_second FOREIGN KEY (id_second)
REFERENCES second (id_second) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
OIDS=FALSE
);
CREATE INDEX third_id_second_idx
ON third
USING btree
(id_second);
So, I have 3 tables, each with its own PK. First has an index on first_date, Second has a FK to First with an index on it, and Third has a FK to Second with an index on it as well:
First (0 --> n) Second (0 --> n) Third
First table contains about 10 000 000 records.
Second table contains about 20 000 000 records.
Third table contains about 18 000 000 records.
Date range in column first_date is from 2016-01-01 till today.
random_page_cost is set to 2.0.
default_statistics_target is set to 100.
The STATISTICS target for all FK, PK and first_date columns is set to 5000.
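For reference, those settings correspond to statements along these lines (a sketch; which exact columns carry the 5000 target is only summarized above):
SET random_page_cost = 2.0;
SET default_statistics_target = 100;
ALTER TABLE first ALTER COLUMN first_date SET STATISTICS 5000;
-- ...and similarly for the PK and FK columns of first, second and third.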
Task to do
I want to count all Third rows connected (via Second) to First rows where first_date < X.
My query:
SELECT count(t.id_third) AS count
FROM first f
JOIN second s ON s.id_first = f.id_first
JOIN third t ON t.id_second = s.id_second
WHERE first_date < _my_date
Problem description
Asking for 2 days - _my_date = '2016-01-03'
Everything works pretty well; the query takes 1-2 seconds.
EXPLAIN ANALYZE:
"Aggregate (cost=8585512.55..8585512.56 rows=1 width=8) (actual time=67.310..67.310 rows=1 loops=1)"
" -> Merge Join (cost=4208477.49..8583088.04 rows=969805 width=8) (actual time=44.277..65.948 rows=17631 loops=1)"
" Merge Cond: (s.id_second = t.id_second)"
" -> Sort (cost=4208477.48..4211121.75 rows=1057709 width=8) (actual time=44.263..46.035 rows=19230 loops=1)"
" Sort Key: s.id_second"
" Sort Method: quicksort Memory: 1670kB"
" -> Nested Loop (cost=0.01..4092310.41 rows=1057709 width=8) (actual time=6.169..39.183 rows=19230 loops=1)"
" -> Index Scan using first_first_date_idx on first f (cost=0.01..483786.81 rows=492376 width=8) (actual time=6.159..12.223 rows=10346 loops=1)"
" Index Cond: (first_date < '2016-01-03 00:00:00'::timestamp without time zone)"
" -> Index Scan using second_id_first_idx on second s (cost=0.00..7.26 rows=7 width=16) (actual time=0.002..0.002 rows=2 loops=10346)"
" Index Cond: (id_first = f.id_first)"
" -> Index Scan using third_id_second_idx on third t (cost=0.00..4316649.89 rows=17193788 width=16) (actual time=0.008..7.293 rows=17632 loops=1)"
"Total runtime: 67.369 ms"
Asking for 10 days or more - _my_date = '2016-01-11' or later
The query does not use an index scan anymore - it is replaced by a seq scan and takes 3-4 minutes...
Query plan:
"Aggregate (cost=8731468.75..8731468.76 rows=1 width=8) (actual time=234411.229..234411.229 rows=1 loops=1)"
" -> Hash Join (cost=4352424.81..8728697.88 rows=1108348 width=8) (actual time=189670.068..234400.540 rows=138246 loops=1)"
" Hash Cond: (t.id_second = o.id_second)"
" -> Seq Scan on third t (cost=0.00..4128080.88 rows=17193788 width=16) (actual time=0.016..124111.453 rows=17570724 loops=1)"
" -> Hash (cost=4332592.69..4332592.69 rows=1208810 width=8) (actual time=98566.740..98566.740 rows=151263 loops=1)"
" Buckets: 16384 Batches: 16 Memory Usage: 378kB"
" -> Hash Join (cost=561918.25..4332592.69 rows=1208810 width=8) (actual time=6535.801..98535.915 rows=151263 loops=1)"
" Hash Cond: (s.id_first = f.id_first)"
" -> Seq Scan on second s (cost=0.00..3432617.48 rows=18752248 width=16) (actual time=6090.771..88891.691 rows=19132869 loops=1)"
" -> Hash (cost=552685.31..552685.31 rows=562715 width=8) (actual time=444.630..444.630 rows=81650 loops=1)"
" -> Index Scan using first_first_date_idx on first f (cost=0.01..552685.31 rows=562715 width=8) (actual time=7.987..421.087 rows=81650 loops=1)"
" Index Cond: (first_date < '2016-01-13 00:00:00'::timestamp without time zone)"
"Total runtime: 234411.303 ms"
For test purposes, I have set:
SET enable_seqscan = OFF;
My queries start using an index scan again and take 1-10 s (depending on the range).
Question
Why does it work like that? How can I convince the query planner to use an index scan?
EDIT
After reducing random_page_cost to 1.1, I can select about 30 days and still get an index scan. The query plan changed a little bit:
"Aggregate (cost=8071389.47..8071389.48 rows=1 width=8) (actual time=4915.196..4915.196 rows=1 loops=1)"
" -> Nested Loop (cost=0.01..8067832.28 rows=1422878 width=8) (actual time=14.402..4866.937 rows=399184 loops=1)"
" -> Nested Loop (cost=0.01..3492321.55 rows=1551849 width=8) (actual time=14.393..3012.617 rows=436794 loops=1)"
" -> Index Scan using first_first_date_idx on first f (cost=0.01..432541.99 rows=722404 width=8) (actual time=14.372..729.233 rows=236007 loops=1)"
" Index Cond: (first_date < '2016-02-01 00:00:00'::timestamp without time zone)"
" -> Index Scan using second_id_first_idx on second s (cost=0.00..4.17 rows=7 width=16) (actual time=0.008..0.009 rows=2 loops=236007)"
" Index Cond: (second = f.id_second)"
" -> Index Scan using third_id_second_idx on third t (cost=0.00..2.94 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=436794)"
" Index Cond: (id_second = s.id_second)"
"Total runtime: 4915.254 ms"
However, I still don't get why asking for a larger range causes a seq scan...
Interestingly, when I ask for a range just above some kind of limit, I get a query plan like this (here, a select for 40 days - asking for more will produce a full seq scan again):
"Aggregate (cost=8403399.27..8403399.28 rows=1 width=8) (actual time=138303.216..138303.217 rows=1 loops=1)"
" -> Hash Join (cost=3887619.07..8399467.63 rows=1572656 width=8) (actual time=44056.443..138261.203 rows=512062 loops=1)"
" Hash Cond: (t.id_second = s.id_second)"
" -> Seq Scan on third t (cost=0.00..4128080.88 rows=17193788 width=16) (actual time=0.004..119497.056 rows=17570724 loops=1)"
" -> Hash (cost=3859478.04..3859478.04 rows=1715203 width=8) (actual time=5695.077..5695.077 rows=560503 loops=1)"
" Buckets: 16384 Batches: 16 Memory Usage: 1390kB"
" -> Nested Loop (cost=0.01..3859478.04 rows=1715203 width=8) (actual time=65.250..5533.413 rows=560503 loops=1)"
" -> Index Scan using first_first_date_idx on first f (cost=0.01..477985.28 rows=798447 width=8) (actual time=64.927..1688.341 rows=302663 loops=1)"
" Index Cond: (first_date < '2016-02-11 00:00:00'::timestamp without time zone)"
" -> Index Scan using second_id_first_idx on second s (cost=0.00..4.17 rows=7 width=16) (actual time=0.010..0.012 rows=2 loops=302663)"
" Index Cond: (id_first = f.id_first)"
"Total runtime: 138303.306 ms"
UPDATE after Laurenz Albe's suggestions
After rewriting the query as Laurenz Albe suggested:
"Aggregate (cost=9102321.05..9102321.06 rows=1 width=8) (actual time=15237.830..15237.830 rows=1 loops=1)"
" -> Merge Join (cost=4578171.25..9097528.19 rows=1917143 width=8) (actual time=9111.694..15156.092 rows=803657 loops=1)"
" Merge Cond: (third.id_second = s.id_second)"
" -> Index Scan using third_id_second_idx on third (cost=0.00..4270478.19 rows=17193788 width=16) (actual time=23.650..5425.137 rows=803658 loops=1)"
" -> Materialize (cost=4577722.81..4588177.38 rows=2090914 width=8) (actual time=9088.030..9354.326 rows=879283 loops=1)"
" -> Sort (cost=4577722.81..4582950.09 rows=2090914 width=8) (actual time=9088.023..9238.426 rows=879283 loops=1)"
" Sort Key: s.id_second"
" Sort Method: external sort Disk: 15480kB"
" -> Merge Join (cost=673389.38..4341477.37 rows=2090914 width=8) (actual time=3662.239..8485.768 rows=879283 loops=1)"
" Merge Cond: (s.id_first = f.id_first)"
" -> Index Scan using second_id_first_idx on second s (cost=0.00..3587838.88 rows=18752248 width=16) (actual time=0.015..4204.308 rows=879284 loops=1)"
" -> Materialize (cost=672960.82..677827.55 rows=973345 width=8) (actual time=3662.216..3855.667 rows=892988 loops=1)"
" -> Sort (cost=672960.82..675394.19 rows=973345 width=8) (actual time=3662.213..3745.975 rows=476519 loops=1)"
" Sort Key: f.id_first"
" Sort Method: external sort Disk: 8400kB"
" -> Index Scan using first_first_date_idx on first f (cost=0.01..568352.90 rows=973345 width=8) (actual time=126.386..3233.134 rows=476519 loops=1)"
" Index Cond: (first_date < '2016-03-03 00:00:00'::timestamp without time zone)"
"Total runtime: 15244.404 ms"
First, it looks like some of the estimates are off.
Try to ANALYZE the tables and see if that changes the query plan chosen.
What might also help is to lower random_page_cost to a value just over 1 and see if that improves the plan.
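For example (the value 1.1 is only an illustration; SET changes it for the current session):
ANALYZE first;
ANALYZE second;
ANALYZE third;
SET random_page_cost = 1.1;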
It is interesting to note that the index scan on third_id_second_idx in the fast query produces only 17632 rows instead of over 17 million, which I can only explain by assuming that from that row on, the values of id_second no longer match any row in the join of first and second, i.e. the merge join is completed after that.
You can try to exploit that with a rewritten query. Try
JOIN (SELECT id_second, id_third FROM third ORDER BY id_second) t
instead of
JOIN third t
That may result in a better plan since PostgreSQL won't optimize the ORDER BY away, and the planner may decide that since it has to sort third anyway, it may be cheaper to use a merge join. That way you trick the planner into choosing a plan that it wouldn't recognize as ideal. With a different value distribution the planner's original choice would probably be better.
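Put together with the original query, the rewrite would look roughly like this (a sketch; _my_date stays a placeholder, as in the question):
SELECT count(t.id_third) AS count
FROM first f
JOIN second s ON s.id_first = f.id_first
JOIN (SELECT id_second, id_third FROM third ORDER BY id_second) t
  ON t.id_second = s.id_second
WHERE f.first_date < _my_date;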

Postgresql Explain Plan Differences

This is my first post....
I have a query that is taking longer than I would like (don't we all!).
Depending on what I put in the WHERE clause, it MAY run faster.
I am trying to understand why the query plan is different
AND
what I can do to speed the query up overall.
Here's Query #1:
SELECT date_observed, base_value
FROM device_read_data
WHERE fk_device_rw_id IN
(SELECT fk_device_rw_id FROM equipment_set_rw
WHERE fk_equipment_set_id = CAST('ed151028-1fc0-11e3-b79f-47c0fd87d2b4' AS uuid))
AND date_observed
BETWEEN '2013-12-01 07:45:00+00'::timestamptz
AND '2014-01-01 07:59:59+00'::timestamptz
AND base_value ~ '[0-9]+(\.[0-9]+)?'
;
Here's Query Plan #1:
"Hash Semi Join (cost=11.65..5640243.59 rows=92194 width=16) (actual time=34.947..132522.023 rows=43609 loops=1)"
" Hash Cond: (device_read_data.fk_device_rw_id = equipment_set_rw.fk_device_rw_id)"
" -> Seq Scan on device_read_data (cost=0.00..5449563.56 rows=72157042 width=32) (actual time=0.844..123760.331 rows=71764376 loops=1)"
" Filter: ((date_observed >= '2013-12-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-01-01 07:59:59+00'::timestamp with time zone) AND ((base_value)::text ~ '[0-9]+(\.[0-9]+)?'::text))"
" Rows Removed by Filter: 82135660"
" -> Hash (cost=11.61..11.61 rows=3 width=16) (actual time=0.018..0.018 rows=1 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 1kB"
" -> Bitmap Heap Scan on equipment_set_rw (cost=4.27..11.61 rows=3 width=16) (actual time=0.016..0.016 rows=1 loops=1)"
" Recheck Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
" -> Bitmap Index Scan on uc_fk_equipment_set_id_fk_device_rw_id (cost=0.00..4.27 rows=3 width=0) (actual time=0.011..0.011 rows=1 loops=1)"
" Index Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
"Total runtime: 132530.290 ms"
Here's Query #2:
SELECT date_observed, base_value
FROM device_read_data
WHERE fk_device_rw_id IN
(SELECT fk_device_rw_id FROM equipment_set_rw
WHERE fk_equipment_set_id = CAST('ed151028-1fc0-11e3-b79f-47c0fd87d2b4' AS uuid))
AND date_observed
BETWEEN '2014-01-01 07:45:00+00'::timestamptz
AND '2014-02-01 07:59:59+00'::timestamptz
AND base_value ~ '[0-9]+(\.[0-9]+)?'
;
Here's Query Plan #2:
"Nested Loop (cost=4.27..1869543.46 rows=20391 width=16) (actual time=0.041..2053.656 rows=12997 loops=1)"
" -> Bitmap Heap Scan on equipment_set_rw (cost=4.27..9.73 rows=2 width=16) (actual time=0.015..0.017 rows=1 loops=1)"
" Recheck Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
" -> Bitmap Index Scan on uc_fk_equipment_set_id_fk_device_rw_id (cost=0.00..4.27 rows=2 width=0) (actual time=0.010..0.010 rows=1 loops=1)"
" Index Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
" -> Index Scan using idx_device_read_data_date_observed_fk_device_rw_id on device_read_data (cost=0.00..934664.91 rows=10195 width=32) (actual time=0.024..2050.656 rows=12997 loops=1)"
" Index Cond: ((date_observed >= '2014-01-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-02-01 07:59:59+00'::timestamp with time zone) AND (fk_device_rw_id = equipment_set_rw.fk_device_rw_id))"
" Filter: ((base_value)::text ~ '[0-9]+(\.[0-9]+)?'::text)"
"Total runtime: 2055.068 ms"
I've only changed the Date Range in the Where clause.
You can see that in Query #1 there is a Seq Scan on the table VS an Index Scan in Query #2.
I'm trying to determine what is causing this, but I can't seem to find the answer.
Additional Information
There is a composite index on (date_observed, fk_device_rw_id)
There are never any deletes on this table. Autovacuum is not needed.
I vacuumed the table anyway....but this had no effect.
I've rebuilt the Index on this table
I've Analyzed this table
This system is a copy of Prod and is currently Idle
System Information
Running Postgres 9.2 on Linux
16GB System Ram
Shared_Buffers set to 4GB
What other information can I provide? I am sure there are things I have left out.
Thanks for your help.
Edit 1
I tried: set enable_seqscan = false
Here are the Explain Plan Results:
"Hash Semi Join (cost=2566484.50..7008502.81 rows=92194 width=16) (actual time=18587.453..182228.966 rows=43609 loops=1)"
" Hash Cond: (device_read_data.fk_device_rw_id = equipment_set_rw.fk_device_rw_id)"
" -> Bitmap Heap Scan on device_read_data (cost=2566472.85..6817822.78 rows=72157042 width=32) (actual time=18562.247..172074.048 rows=71764376 loops=1)"
" Recheck Cond: ((date_observed >= '2013-12-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-01-01 07:59:59+00'::timestamp with time zone))"
" Rows Removed by Index Recheck: 2102"
" Filter: ((base_value)::text ~ '[0-9]+(\.[0-9]+)?'::text)"
" Rows Removed by Filter: 12265137"
" -> Bitmap Index Scan on idx_device_read_data_date_observed_fk_device_rw_id (cost=0.00..2548433.59 rows=85430682 width=0) (actual time=18556.228..18556.228 rows=84029513 loops=1)"
" Index Cond: ((date_observed >= '2013-12-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-01-01 07:59:59+00'::timestamp with time zone))"
" -> Hash (cost=11.61..11.61 rows=3 width=16) (actual time=16.134..16.134 rows=1 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 1kB"
" -> Bitmap Heap Scan on equipment_set_rw (cost=4.27..11.61 rows=3 width=16) (actual time=16.128..16.129 rows=1 loops=1)"
" Recheck Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
" -> Bitmap Index Scan on uc_fk_equipment_set_id_fk_device_rw_id (cost=0.00..4.27 rows=3 width=0) (actual time=16.116..16.116 rows=1 loops=1)"
" Index Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
"Total runtime: 182244.181 ms"
As predicted, the query took longer.
Are there just too may records to make this faster?
What are my choices?
Thanks.
Edit 2
I tried the re-write approach. I'm afraid the results were similar to the original.
Here's the query Plan:
"Hash Join (cost=11.65..6013386.19 rows=90835 width=16) (actual time=35.272..127965.785 rows=43609 loops=1)"
" Hash Cond: (a.fk_device_rw_id = b.fk_device_rw_id)"
" -> Seq Scan on device_read_data a (cost=0.00..5565898.74 rows=71450793 width=32) (actual time=13.050..119667.814 rows=71764376 loops=1)"
" Filter: ((date_observed >= '2013-12-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-01-01 07:59:59+00'::timestamp with time zone) AND ((base_value)::text ~ '[0-9]+(\.[0-9]+)?'::text))"
" Rows Removed by Filter: 85426425"
" -> Hash (cost=11.61..11.61 rows=3 width=16) (actual time=0.018..0.018 rows=1 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 1kB"
" -> Bitmap Heap Scan on equipment_set_rw b (cost=4.27..11.61 rows=3 width=16) (actual time=0.015..0.016 rows=1 loops=1)"
" Recheck Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
" -> Bitmap Index Scan on uc_fk_equipment_set_id_fk_device_rw_id (cost=0.00..4.27 rows=3 width=0) (actual time=0.011..0.011 rows=1 loops=1)"
" Index Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
"Total runtime: 127992.849 ms"
It seems like a simple problem. Return records from a table that fall in a particular date range. Given my existing system architecture, perhaps there's a threshold of how many records that can exist in the table before performance is adversely affected.
Unless there are other suggestions, I may need to pursue the partitioning approach.
Thanks for the help thus far!
In your first query, the date range matches far more rows than in the second: about 72M of the roughly 154M rows in device_read_data, which is nearly half of the table.
Index scans are generally slower than full table scans for that many rows: because an index scan has to read both index pages and data pages, the total number of disk reads required to fetch that many rows is likely larger than just reading every data page.
You can set enable_seqscan = false before running the first query to see the difference, and if you're feeling adventurous, run your EXPLAIN as EXPLAIN (ANALYZE, BUFFERS) <query> to see how many block reads you get when doing a table scan versus an index scan.
Edit: For your specific problem you might have some luck using partial indexes. You'll have to figure out how to build these so that they cast as wide a net as possible (it's tempting but wasteful to write a partial index per problem) but you might start with something like this:
create index idx_device_read_data_date_observed_base_value
on device_read_data (date_observed)
where base_value ~ '[0-9]+(\.[0-9]+)?'
;
That index will only be built for those rows matching that base_value pattern. You'd know better than we would if that's a fairly restrictive condition or not (it'd be good for you if it did reduce the number of rows to consider).
You might also flip that idea and index on base_value matching that pattern, making your WHERE conditions something like date_observed between '2013-12-01' and '2013-12-31', and adding one such index for each month (this is likely to get out of hand with indexes alone - I'd switch to partitioning).
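One possible reading of that flipped idea, as a hedged sketch (the index name and month boundaries are illustrative, and whether this pays off depends on your data distribution):
CREATE INDEX idx_device_read_data_2013_12_numeric
ON device_read_data (fk_device_rw_id, date_observed)
WHERE base_value ~ '[0-9]+(\.[0-9]+)?'
  AND date_observed >= '2013-12-01 00:00:00+00'
  AND date_observed <  '2014-01-01 00:00:00+00';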
Another potential improvement could come from re-writing your query. Here's an approach that eliminates the IN condition, which provides the same results if there are no repeats of fk_device_rw_id in equipment_set_rw for the given fk_equipment_set_id.
SELECT a.date_observed, a.base_value
FROM device_read_data a
join equipment_set_rw b
on a.fk_device_rw_id = b.fk_device_rw_id
WHERE b.fk_equipment_set_id = CAST('ed151028-1fc0-11e3-b79f-47c0fd87d2b4' AS uuid)
AND a.date_observed BETWEEN '2014-01-01 07:45:00+00'::timestamptz
AND '2014-02-01 07:59:59+00'::timestamptz
AND a.base_value ~ '[0-9]+(\.[0-9]+)?'
;
I've tried a few things and I'm satisfied for now with the performance.
I changed the index on the device_read_data table to the reverse order of what it was.
Original Index:
CREATE UNIQUE INDEX idx_device_read_data_date_observed_fk_device_rw_id
ON device_read_data
USING btree (date_observed, fk_device_rw_id);
New Index:
CREATE UNIQUE INDEX idx_device_read_data_date_observed_fk_device_rw_id
ON device_read_data
USING btree (fk_device_rw_id, date_observed);
The fk_device_rw_id column has a much lower cardinality. Placing this column first in the index has helped to filter the records much faster.
Also, make sure the columns in the where clause are in the same order as the composite index. (Which is the case....now.)
I altered the statistics on the date_observed column, giving the query planner more information to work with.
Originally it was using the PostgreSQL default of 100. I set it to this:
ALTER TABLE device_read_data ALTER COLUMN date_observed SET STATISTICS 1000;
Below are the results of the query. Much...much faster.
I may be able to tweak this further with additional statistics...however, this works for now. I may be able to hold off on partitioning for a bit.
Thanks for all your help.
Query:
explain Analyze
SELECT date_observed, base_value
FROM device_read_data
WHERE fk_device_rw_id IN
(SELECT fk_device_rw_id FROM equipment_set_rw
WHERE fk_equipment_set_id = CAST('ed151028-1fc0-11e3-b79f-47c0fd87d2b4' AS uuid))
AND (date_observed >= '2013-12-01 07:45:00+00'::timestamptz AND date_observed <= '2014-01-01 07:59:59+00'::timestamptz)
AND base_value ~ '[0-9]+(\.[0-9]+)?'
;
New Query Plan:
"Nested Loop (cost=1197.25..264699.54 rows=59694 width=16) (actual time=25.876..493.073 rows=43609 loops=1)"
" -> Bitmap Heap Scan on equipment_set_rw (cost=4.27..9.73 rows=2 width=16) (actual time=0.018..0.019 rows=1 loops=1)"
" Recheck Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
" -> Bitmap Index Scan on uc_fk_equipment_set_id_fk_device_rw_id (cost=0.00..4.27 rows=2 width=0) (actual time=0.012..0.012 rows=1 loops=1)"
" Index Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
" -> Bitmap Heap Scan on device_read_data (cost=1192.99..132046.43 rows=29847 width=32) (actual time=25.849..486.701 rows=43609 loops=1)"
" Recheck Cond: ((fk_device_rw_id = equipment_set_rw.fk_device_rw_id) AND (date_observed >= '2013-12-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-01-01 07:59:59+00'::timestamp with time zone))"
" Rows Removed by Index Recheck: 2076173"
" Filter: ((base_value)::text ~ '[0-9]+(\.[0-9]+)?'::text)"
" -> Bitmap Index Scan on idx_device_read_data_date_observed_fk_device_rw_id (cost=0.00..1185.53 rows=35640 width=0) (actual time=24.000..24.000 rows=43609 loops=1)"
" Index Cond: ((fk_device_rw_id = equipment_set_rw.fk_device_rw_id) AND (date_observed >= '2013-12-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-01-01 07:59:59+00'::timestamp with time zone))"
"Total runtime: 495.506 ms"