PostgreSQL Explain Plan Differences

This is my first post...
I have a query that is taking longer than I would like (don't we all!).
Depending on what I put in the WHERE clause, it may run faster.
I am trying to understand why the query plan is different, and what I can do to speed the query up overall.
Here's Query #1:
SELECT date_observed, base_value
FROM device_read_data
WHERE fk_device_rw_id IN
(SELECT fk_device_rw_id FROM equipment_set_rw
WHERE fk_equipment_set_id = CAST('ed151028-1fc0-11e3-b79f-47c0fd87d2b4' AS uuid))
AND date_observed
BETWEEN '2013-12-01 07:45:00+00'::timestamptz
AND '2014-01-01 07:59:59+00'::timestamptz
AND base_value ~ '[0-9]+(\.[0-9]+)?'
;
Here's Query Plan #1:
"Hash Semi Join (cost=11.65..5640243.59 rows=92194 width=16) (actual time=34.947..132522.023 rows=43609 loops=1)"
" Hash Cond: (device_read_data.fk_device_rw_id = equipment_set_rw.fk_device_rw_id)"
" -> Seq Scan on device_read_data (cost=0.00..5449563.56 rows=72157042 width=32) (actual time=0.844..123760.331 rows=71764376 loops=1)"
" Filter: ((date_observed >= '2013-12-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-01-01 07:59:59+00'::timestamp with time zone) AND ((base_value)::text ~ '[0-9]+(\.[0-9]+)?'::text))"
" Rows Removed by Filter: 82135660"
" -> Hash (cost=11.61..11.61 rows=3 width=16) (actual time=0.018..0.018 rows=1 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 1kB"
" -> Bitmap Heap Scan on equipment_set_rw (cost=4.27..11.61 rows=3 width=16) (actual time=0.016..0.016 rows=1 loops=1)"
" Recheck Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
" -> Bitmap Index Scan on uc_fk_equipment_set_id_fk_device_rw_id (cost=0.00..4.27 rows=3 width=0) (actual time=0.011..0.011 rows=1 loops=1)"
" Index Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
"Total runtime: 132530.290 ms"
Here's Query #2:
SELECT date_observed, base_value
FROM device_read_data
WHERE fk_device_rw_id IN
(SELECT fk_device_rw_id FROM equipment_set_rw
WHERE fk_equipment_set_id = CAST('ed151028-1fc0-11e3-b79f-47c0fd87d2b4' AS uuid))
AND date_observed
BETWEEN '2014-01-01 07:45:00+00'::timestamptz
AND '2014-02-01 07:59:59+00'::timestamptz
AND base_value ~ '[0-9]+(\.[0-9]+)?'
;
Here's Query Plan #2:
"Nested Loop (cost=4.27..1869543.46 rows=20391 width=16) (actual time=0.041..2053.656 rows=12997 loops=1)"
" -> Bitmap Heap Scan on equipment_set_rw (cost=4.27..9.73 rows=2 width=16) (actual time=0.015..0.017 rows=1 loops=1)"
" Recheck Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
" -> Bitmap Index Scan on uc_fk_equipment_set_id_fk_device_rw_id (cost=0.00..4.27 rows=2 width=0) (actual time=0.010..0.010 rows=1 loops=1)"
" Index Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
" -> Index Scan using idx_device_read_data_date_observed_fk_device_rw_id on device_read_data (cost=0.00..934664.91 rows=10195 width=32) (actual time=0.024..2050.656 rows=12997 loops=1)"
" Index Cond: ((date_observed >= '2014-01-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-02-01 07:59:59+00'::timestamp with time zone) AND (fk_device_rw_id = equipment_set_rw.fk_device_rw_id))"
" Filter: ((base_value)::text ~ '[0-9]+(\.[0-9]+)?'::text)"
"Total runtime: 2055.068 ms"
I've only changed the date range in the WHERE clause.
You can see that Query #1 uses a Seq Scan on the table vs. an Index Scan in Query #2.
I'm trying to determine what is causing this, but I can't seem to find the answer.
Additional Information
There is a composite index on (date_observed, fk_device_rw_id)
There are never any deletes on this table. Autovacuum is not needed.
I vacuumed the table anyway....but this had no effect.
I've rebuilt the Index on this table
I've Analyzed this table
This system is a copy of Prod and is currently Idle
System Information
Running Postgres 9.2 on Linux
16 GB system RAM
shared_buffers set to 4GB
What other information can I provide? I am sure there are things I have left out.
Thanks for your help.
Edit 1
I tried: set enable_seqscan = false
Here are the Explain Plan Results:
"Hash Semi Join (cost=2566484.50..7008502.81 rows=92194 width=16) (actual time=18587.453..182228.966 rows=43609 loops=1)"
" Hash Cond: (device_read_data.fk_device_rw_id = equipment_set_rw.fk_device_rw_id)"
" -> Bitmap Heap Scan on device_read_data (cost=2566472.85..6817822.78 rows=72157042 width=32) (actual time=18562.247..172074.048 rows=71764376 loops=1)"
" Recheck Cond: ((date_observed >= '2013-12-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-01-01 07:59:59+00'::timestamp with time zone))"
" Rows Removed by Index Recheck: 2102"
" Filter: ((base_value)::text ~ '[0-9]+(\.[0-9]+)?'::text)"
" Rows Removed by Filter: 12265137"
" -> Bitmap Index Scan on idx_device_read_data_date_observed_fk_device_rw_id (cost=0.00..2548433.59 rows=85430682 width=0) (actual time=18556.228..18556.228 rows=84029513 loops=1)"
" Index Cond: ((date_observed >= '2013-12-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-01-01 07:59:59+00'::timestamp with time zone))"
" -> Hash (cost=11.61..11.61 rows=3 width=16) (actual time=16.134..16.134 rows=1 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 1kB"
" -> Bitmap Heap Scan on equipment_set_rw (cost=4.27..11.61 rows=3 width=16) (actual time=16.128..16.129 rows=1 loops=1)"
" Recheck Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
" -> Bitmap Index Scan on uc_fk_equipment_set_id_fk_device_rw_id (cost=0.00..4.27 rows=3 width=0) (actual time=16.116..16.116 rows=1 loops=1)"
" Index Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
"Total runtime: 182244.181 ms"
As predicted, the query took longer.
Are there just too many records to make this faster?
What are my choices?
Thanks.
Edit 2
I tried the re-write approach. I'm afraid the results were similar to the original.
Here's the query Plan:
"Hash Join (cost=11.65..6013386.19 rows=90835 width=16) (actual time=35.272..127965.785 rows=43609 loops=1)"
" Hash Cond: (a.fk_device_rw_id = b.fk_device_rw_id)"
" -> Seq Scan on device_read_data a (cost=0.00..5565898.74 rows=71450793 width=32) (actual time=13.050..119667.814 rows=71764376 loops=1)"
" Filter: ((date_observed >= '2013-12-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-01-01 07:59:59+00'::timestamp with time zone) AND ((base_value)::text ~ '[0-9]+(\.[0-9]+)?'::text))"
" Rows Removed by Filter: 85426425"
" -> Hash (cost=11.61..11.61 rows=3 width=16) (actual time=0.018..0.018 rows=1 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 1kB"
" -> Bitmap Heap Scan on equipment_set_rw b (cost=4.27..11.61 rows=3 width=16) (actual time=0.015..0.016 rows=1 loops=1)"
" Recheck Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
" -> Bitmap Index Scan on uc_fk_equipment_set_id_fk_device_rw_id (cost=0.00..4.27 rows=3 width=0) (actual time=0.011..0.011 rows=1 loops=1)"
" Index Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
"Total runtime: 127992.849 ms"
It seems like a simple problem: return records from a table that fall in a particular date range. Given my existing system architecture, perhaps there's a threshold for how many records can exist in the table before performance is adversely affected.
Unless there are other suggestions, I may need to pursue the partitioning approach.
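For reference, on 9.2 partitioning means inheritance-based child tables (declarative partitioning only arrived in PostgreSQL 10). A rough sketch, with an invented child-table name:
-- One child table per month; the CHECK constraint lets constraint_exclusion
-- skip child tables whose date range cannot match the query.
CREATE TABLE device_read_data_2013_12 (
    CHECK (date_observed >= '2013-12-01 00:00:00+00'::timestamptz
       AND date_observed <  '2014-01-01 00:00:00+00'::timestamptz)
) INHERITS (device_read_data);
CREATE INDEX idx_device_read_data_2013_12
    ON device_read_data_2013_12 (fk_device_rw_id, date_observed);
-- Inserts would also need to be routed to the right child table via a trigger
-- on the parent (omitted here).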
Thanks for the help thus far!

The date range in your first query matches about 72M of the roughly 154M rows in device_read_data, which is nearly half of the rows in that table, while the second query's range evidently matches far fewer rows.
Index scans are generally slower than full table scans for that many rows (an index scan has to read both index pages and data pages, so the total number of disk reads required to fetch that many rows is likely larger than simply reading every data page).
You can set enable_seqscan = false before running the first query to see the difference, and if you're feeling adventurous, run your explain as EXPLAIN (ANALYZE, BUFFERS) <query> to see how many block reads you get from a table scan versus an index scan.
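As a sketch of that experiment against the first query (the setting is session-local):
SET enable_seqscan = off;   -- discourage the sequential scan for this session only
EXPLAIN (ANALYZE, BUFFERS)
SELECT date_observed, base_value
FROM device_read_data
WHERE fk_device_rw_id IN
      (SELECT fk_device_rw_id FROM equipment_set_rw
       WHERE fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)
  AND date_observed BETWEEN '2013-12-01 07:45:00+00'::timestamptz
                        AND '2014-01-01 07:59:59+00'::timestamptz
  AND base_value ~ '[0-9]+(\.[0-9]+)?';
RESET enable_seqscan;       -- restore the default afterwards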
Edit: For your specific problem you might have some luck using partial indexes. You'll have to figure out how to build these so that they cast as wide a net as possible (it's tempting but wasteful to write a partial index per problem) but you might start with something like this:
create index idx_device_read_data_date_observed_base_value
on device_read_data (date_observed)
where base_value ~ '[0-9]+(\.[0-9]+)?'
;
That index will only be built for those rows matching that base_value pattern. You'd know better than we would if that's a fairly restrictive condition or not (it'd be good for you if it did reduce the number of rows to consider).
You might also flip that idea and index on base_value matching that pattern, making your WHERE conditions something like date_observed between '2013-12-01' and '2013-12-31', adding one such index for each month (this is likely to get out of hand with indexes alone - I'd switch to partitioning at that point).
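One possible shape of such a per-month partial index, as a sketch (the index name and column choice are guesses):
CREATE INDEX idx_device_read_data_2013_12
    ON device_read_data (fk_device_rw_id, date_observed)
    WHERE date_observed >= '2013-12-01 00:00:00+00'::timestamptz
      AND date_observed <  '2014-01-01 00:00:00+00'::timestamptz;
-- The planner only considers a partial index when the query's WHERE clause
-- provably implies the index predicate, so the queried date range has to fall
-- entirely inside the index's date range.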
Another potential improvement could come from re-writing your query. Here's an approach that eliminates the IN condition, which provides the same results if there are no repeats of fk_device_rw_id in equipment_set_rw for the given fk_equipment_set_id.
SELECT a.date_observed, a.base_value
FROM device_read_data a
join equipment_set_rw b
on a.fk_device_rw_id = b.fk_device_rw_id
WHERE b.fk_equipment_set_id = CAST('ed151028-1fc0-11e3-b79f-47c0fd87d2b4' AS uuid)
AND a.date_observed BETWEEN '2014-01-01 07:45:00+00'::timestamptz
AND '2014-02-01 07:59:59+00'::timestamptz
AND a.base_value ~ '[0-9]+(\.[0-9]+)?'
;

I've tried a few things and I'm satisfied for now with the performance.
I changed the index on the device_read_data table to the reverse order of what it was.
Original Index:
CREATE UNIQUE INDEX idx_device_read_data_date_observed_fk_device_rw_id
ON device_read_data
USING btree (date_observed, fk_device_rw_id);
New Index:
CREATE UNIQUE INDEX idx_device_read_data_date_observed_fk_device_rw_id
ON device_read_data
USING btree (fk_device_rw_id, date_observed);
The fk_device_rw_id column has a much lower cardinality. Placing this column first in the index has helped to filter the records much faster.
Also, make sure the columns in the where clause are in the same order as the composite index. (Which is the case....now.)
I altered the statistics target on the date_observed column, giving the query planner more information to work with.
Originally it was using the Postgres default of 100. I set it to this:
ALTER TABLE device_read_data ALTER COLUMN date_observed SET STATISTICS 1000;
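The new target only takes effect once fresh statistics are gathered, so the table has to be analyzed again afterwards:
-- re-gather statistics so the planner actually sees the larger sample
ANALYZE device_read_data (date_observed);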
Below are the results of the query. Much...much faster.
I may be able to tweak this further with additional statistics...however, this works for now. I may be able to hold off on partitioning for a bit.
Thanks for all your help.
Query:
explain Analyze
SELECT date_observed, base_value
FROM device_read_data
WHERE fk_device_rw_id IN
(SELECT fk_device_rw_id FROM equipment_set_rw
WHERE fk_equipment_set_id = CAST('ed151028-1fc0-11e3-b79f-47c0fd87d2b4' AS uuid))
AND (date_observed >= '2013-12-01 07:45:00+00'::timestamptz AND date_observed <= '2014-01-01 07:59:59+00'::timestamptz)
AND base_value ~ '[0-9]+(\.[0-9]+)?'
;
New Query Plan:
"Nested Loop (cost=1197.25..264699.54 rows=59694 width=16) (actual time=25.876..493.073 rows=43609 loops=1)"
" -> Bitmap Heap Scan on equipment_set_rw (cost=4.27..9.73 rows=2 width=16) (actual time=0.018..0.019 rows=1 loops=1)"
" Recheck Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
" -> Bitmap Index Scan on uc_fk_equipment_set_id_fk_device_rw_id (cost=0.00..4.27 rows=2 width=0) (actual time=0.012..0.012 rows=1 loops=1)"
" Index Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
" -> Bitmap Heap Scan on device_read_data (cost=1192.99..132046.43 rows=29847 width=32) (actual time=25.849..486.701 rows=43609 loops=1)"
" Recheck Cond: ((fk_device_rw_id = equipment_set_rw.fk_device_rw_id) AND (date_observed >= '2013-12-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-01-01 07:59:59+00'::timestamp with time zone))"
" Rows Removed by Index Recheck: 2076173"
" Filter: ((base_value)::text ~ '[0-9]+(\.[0-9]+)?'::text)"
" -> Bitmap Index Scan on idx_device_read_data_date_observed_fk_device_rw_id (cost=0.00..1185.53 rows=35640 width=0) (actual time=24.000..24.000 rows=43609 loops=1)"
" Index Cond: ((fk_device_rw_id = equipment_set_rw.fk_device_rw_id) AND (date_observed >= '2013-12-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-01-01 07:59:59+00'::timestamp with time zone))"
"Total runtime: 495.506 ms"

Related

Postgres query planner tuning

We have a db (9.6) that contains measurement data. The relevant query involves 3 tables:
aufnehmer (i.e. sensor), 5e+2 entries,
zeitpunkt (i.e. point in time), 4e+6 entries
wert (i.e. value), 6e+8 entries
aufnehmer : zeitpunkt = m : n with wert as the mapping table. All relevant columns are indexed.
The following query
select count(*) from wert w
inner join aufnehmer a on w.aufnehmer_id = a.id
inner join zeitpunkt z on z.id = w.zeitpunkt_id
where a.id = 12749
and z.zeitpunkt <= ('2018-05-07')::timestamp without time zone
and z.zeitpunkt >= ('2018-05-01')::timestamp without time zone;
produces this query plan:
Aggregate (cost=3429124.66..3429124.67 rows=1 width=8) (actual time=66.252..66.252 rows=1 loops=1)
-> Nested Loop (cost=571.52..3429084.29 rows=16149 width=0) (actual time=19.051..65.406 rows=15942 loops=1)
-> Index Only Scan using idx_aufnehmer_id on aufnehmer a (cost=0.28..8.29 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=1)
Index Cond: (id = 12749)
Heap Fetches: 1
-> Nested Loop (cost=571.24..3428914.50 rows=16149 width=4) (actual time=19.040..64.502 rows=15942 loops=1)
-> Bitmap Heap Scan on zeitpunkt z (cost=570.67..22710.60 rows=26755 width=4) (actual time=1.551..3.407 rows=24566 loops=1)
Recheck Cond: ((zeitpunkt <= '2018-05-07 00:00:00'::timestamp without time zone) AND (zeitpunkt >= '2018-05-01 00:00:00'::timestamp without time zone))
Heap Blocks: exact=135
-> Bitmap Index Scan on idx_zeitpunkt_zeitpunkt_order_desc (cost=0.00..563.98 rows=26755 width=0) (actual time=1.527..1.527 rows=24566 loops=1)
Index Cond: ((zeitpunkt <= '2018-05-07 00:00:00'::timestamp without time zone) AND (zeitpunkt >= '2018-05-01 00:00:00'::timestamp without time zone))
-> Index Only Scan using uq1_wert on wert w (cost=0.57..126.94 rows=37 width=8) (actual time=0.002..0.002 rows=1 loops=24566)
Index Cond: ((aufnehmer_id = 12749) AND (zeitpunkt_id = z.id))
Heap Fetches: 15942
Planning time: 0.399 ms
Execution time: 66.339 ms
and finishes in well under a second (66 ms in the plan above).
When the end date is extended by one day and the query is changed to:
... --same as above
and z.zeitpunkt <= ('2018-05-08')::timestamp without time zone
and z.zeitpunkt >= ('2018-05-01')::timestamp without time zone;
the query plan changes to
Aggregate (cost=3711151.24..3711151.25 rows=1 width=8) (actual time=35601.351..35601.351 rows=1 loops=1)
-> Nested Loop (cost=66264.74..3711104.14 rows=18840 width=0) (actual time=35348.705..35600.192 rows=17612 loops=1)
-> Index Only Scan using idx_aufnehmer_id on aufnehmer a (cost=0.28..8.29 rows=1 width=4) (actual time=0.007..0.010 rows=1 loops=1)
Index Cond: (id = 12749)
Heap Fetches: 1
-> Hash Join (cost=66264.47..3710907.45 rows=18840 width=4) (actual time=35348.693..35598.183 rows=17612 loops=1)
Hash Cond: (w.zeitpunkt_id = z.id)
-> Bitmap Heap Scan on wert w (cost=43133.18..3678947.46 rows=2304078 width=8) (actual time=912.086..35145.680 rows=2334815 loops=1)
Recheck Cond: (aufnehmer_id = 12749)
Rows Removed by Index Recheck: 205191912
Heap Blocks: exact=504397 lossy=1316875
-> Bitmap Index Scan on idx_wert_aufnehmer_id (cost=0.00..42557.16 rows=2304078 width=0) (actual time=744.144..744.144 rows=2334815 loops=1)
Index Cond: (aufnehmer_id = 12749)
-> Hash (cost=22741.12..22741.12 rows=31214 width=4) (actual time=8.909..8.909 rows=27675 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 1229kB
-> Bitmap Heap Scan on zeitpunkt z (cost=664.37..22741.12 rows=31214 width=4) (actual time=1.822..5.600 rows=27675 loops=1)
Recheck Cond: ((zeitpunkt <= '2018-05-08 00:00:00'::timestamp without time zone) AND (zeitpunkt >= '2018-05-01 00:00:00'::timestamp without time zone))
Heap Blocks: exact=152
-> Bitmap Index Scan on idx_zeitpunkt_zeitpunkt_order_desc (cost=0.00..656.57 rows=31214 width=0) (actual time=1.798..1.798 rows=27675 loops=1)
Index Cond: ((zeitpunkt <= '2018-05-08 00:00:00'::timestamp without time zone) AND (zeitpunkt >= '2018-05-01 00:00:00'::timestamp without time zone))
Planning time: 0.404 ms
Execution time: 35608.286 ms
and the execution takes roughly 500 times longer (about 35.6 s instead of 66 ms).
So it seems that the query planner switches to joining aufnehmer and wert first, which takes much longer than joining zeitpunkt and wert first.
Any idea if it can be forced to keep the first execution plan? We have already increased work_mem but it did not make any difference.
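For reference, this is the kind of per-session experiment that can be used to compare the two plans (the work_mem value is only an example, and toggling enable_hashjoin is purely a diagnostic to see whether the fast nested-loop plan comes back, not a production fix):
SET work_mem = '256MB';      -- example value; applies to this session only
SET enable_hashjoin = off;   -- diagnostic only
EXPLAIN (ANALYZE, BUFFERS)
select count(*) from wert w
inner join aufnehmer a on w.aufnehmer_id = a.id
inner join zeitpunkt z on z.id = w.zeitpunkt_id
where a.id = 12749
and z.zeitpunkt <= ('2018-05-08')::timestamp without time zone
and z.zeitpunkt >= ('2018-05-01')::timestamp without time zone;
RESET enable_hashjoin;
RESET work_mem;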

PostgreSQL 10 - IN and ANY performance inexplicable behaviour

I'm selecting from a big table where the id is in an array/list.
I checked several variants, and the results surprised me.
1. Use ANY and ARRAY
EXPLAIN (ANALYZE,BUFFERS)
SELECT * FROM cca_data_hours
WHERE
datetime = '2018-01-07 19:00:00'::timestamp without time zone AND
id_web_page = ANY (ARRAY[1, 2, 8, 3 /* ~50k ids */])
Result
"Index Scan using cca_data_hours_pri on cca_data_hours (cost=0.28..576.79 rows=15 width=188) (actual time=0.035..0.998 rows=6 loops=1)"
" Index Cond: (datetime = '2018-01-07 19:00:00'::timestamp without time zone)"
" Filter: (id_web_page = ANY ('{1,2,8,3, (...)"
" Rows Removed by Filter: 5"
" Buffers: shared hit=3"
"Planning time: 57.625 ms"
"Execution time: 1.065 ms"
2. Use IN and VALUES
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM cca_data_hours
WHERE
datetime = '2018-01-07 19:00:00'::timestamp without time zone AND
id_web_page IN (VALUES (1),(2),(8),(3) /* ~50k ids */)
Result
"Hash Join (cost=439.77..472.66 rows=8 width=188) (actual time=90.806..90.858 rows=6 loops=1)"
" Hash Cond: (cca_data_hours.id_web_page = "*VALUES*".column1)"
" Buffers: shared hit=3"
" -> Index Scan using cca_data_hours_pri on cca_data_hours (cost=0.28..33.06 rows=15 width=188) (actual time=0.035..0.060 rows=11 loops=1)"
" Index Cond: (datetime = '2018-01-07 19:00:00'::timestamp without time zone)"
" Buffers: shared hit=3"
" -> Hash (cost=436.99..436.99 rows=200 width=4) (actual time=90.742..90.742 rows=4 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 9kB"
" -> HashAggregate (cost=434.99..436.99 rows=200 width=4) (actual time=90.709..90.717 rows=4 loops=1)"
" Group Key: "*VALUES*".column1"
" -> Values Scan on "*VALUES*" (cost=0.00..362.49 rows=28999 width=4) (actual time=0.008..47.056 rows=28999 loops=1)"
"Planning time: 53.607 ms"
"Execution time: 91.681 ms"
I expected case #2 to be faster, but it isn't.
Why is IN with VALUES slower?
Comparing the EXPLAIN ANALYZE results, it looks like the older version wasn't able to use the available index for this kind of condition in the given examples. The reason ANY (ARRAY[...]) became faster is this change in version 9.2: https://www.postgresql.org/docs/current/static/release-9-2.html
Allow indexed_col op ANY(ARRAY[...]) conditions to be used in plain index scans and index-only scans (Tom Lane)
The site where you got the suggestion from was written about version 9.0.
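For illustration, the behaviour that release note describes looks like this on a small hypothetical table (on 9.2 and later the = ANY condition can appear directly in the Index Cond once there are enough rows for an index scan to win):
-- hypothetical table, used only to illustrate the release-note behaviour
CREATE TABLE demo (id integer PRIMARY KEY, payload text);
EXPLAIN SELECT * FROM demo WHERE id = ANY (ARRAY[1, 2, 8, 3]);
-- with enough rows the plan shows something like:
--   Index Scan using demo_pkey on demo
--     Index Cond: (id = ANY ('{1,2,8,3}'::integer[]))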

Significantly different time of execution of the query due to different date value in Postgres

I have a weird case of query execution performance here. The query has date values in the WHERE clause, and the execution speed varies with the value of the date.
Actually:
for dates within the last 30 days, execution takes around 3 minutes
for dates before the last 30 days, execution takes a few seconds
The query is listed below, with the date in the last 30 days range:
select
sk2_.code as col_0_0_,
bra4_.code as col_1_0_,
st0_.quantity as col_2_0_,
bat1_.forecast as col_3_0_
from
TBL_st st0_,
TBL_bat bat1_,
TBL_sk sk2_,
TBL_bra bra4_
where
st0_.batc_id=bat1_.id
and bat1_.sku_id=sk2_.id
and bat1_.bran_id=bra4_.id
and not (exists (select
1
from
TBL_st st6_,
TBL_bat bat7_,
TBL_sk sk10_
where
st6_.batc_id=bat7_.id
and bat7_.sku_id=sk10_.id
and bat7_.bran_id=bat1_.bran_id
and sk10_.code=sk2_.code
and st6_.date>st0_.date
and sk10_.acco_id=1
and st6_.date>='2017-04-20'
and st6_.date<='2017-04-30'))
and sk2_.acco_id=1
and st0_.date>='2017-04-20'
and st0_.date<='2017-04-30'
and here is the plan for the query with the date in the last 30 days range:
Nested Loop (cost=289.06..19764.03 rows=1 width=430) (actual time=3482.062..326049.246 rows=249 loops=1)
-> Nested Loop Anti Join (cost=288.91..19763.86 rows=1 width=433) (actual time=3482.023..326048.023 rows=249 loops=1)
Join Filter: ((st6_.date > st0_.date) AND ((sk10_.code)::text = (sk2_.code)::text))
Rows Removed by Join Filter: 210558
-> Nested Loop (cost=286.43..13719.38 rows=1 width=441) (actual time=4.648..2212.042 rows=2474 loops=1)
-> Nested Loop (cost=286.00..6871.33 rows=13335 width=436) (actual time=4.262..657.823 rows=666738 loops=1)
-> Index Scan using uk_TBL_sk0_account_code on TBL_sk sk2_ (cost=0.14..12.53 rows=1 width=426) (actual time=1.036..1.084 rows=50 loops=1)
Index Cond: (acco_id = 1)
-> Bitmap Heap Scan on TBL_bat bat1_ (cost=285.86..6707.27 rows=15153 width=26) (actual time=3.675..11.308 rows=13335 loops=50)
Recheck Cond: (sku_id = sk2_.id)
Heap Blocks: exact=241295
-> Bitmap Index Scan on ix_al_batc_sku_id (cost=0.00..282.07 rows=15153 width=0) (actual time=3.026..3.026 rows=13335 loops=50)
Index Cond: (sku_id = sk2_.id)
-> Index Scan using ix_al_stle_batc_id on TBL_st st0_ (cost=0.42..0.50 rows=1 width=21) (actual time=0.002..0.002 rows=0 loops=666738)
Index Cond: (batc_id = bat1_.id)
Filter: ((date >= '2017-04-20 00:00:00'::timestamp without time zone) AND (date <= '2017-04-30 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 1
-> Nested Loop (cost=2.49..3023.47 rows=1 width=434) (actual time=111.345..130.883 rows=86 loops=2474)
-> Hash Join (cost=2.06..2045.18 rows=1905 width=434) (actual time=0.010..28.028 rows=54853 loops=2474)
Hash Cond: (bat7_.sku_id = sk10_.id)
-> Index Scan using ix_al_batc_bran_id on TBL_bat bat7_ (cost=0.42..1667.31 rows=95248 width=24) (actual time=0.009..11.045 rows=54853 loops=2474)
Index Cond: (bran_id = bat1_.bran_id)
-> Hash (cost=1.63..1.63 rows=1 width=426) (actual time=0.026..0.026 rows=50 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 11kB
-> Seq Scan on TBL_sk sk10_ (cost=0.00..1.63 rows=1 width=426) (actual time=0.007..0.015 rows=50 loops=1)
Filter: (acco_id = 1)
-> Index Scan using ix_al_stle_batc_id on TBL_st st6_ (cost=0.42..0.50 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=135706217)
Index Cond: (batc_id = bat7_.id)
Filter: ((date >= '2017-04-20 00:00:00'::timestamp without time zone) AND (date <= '2017-04-30 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 1
-> Index Scan using TBL_bra_pk on TBL_bra bra4_ (cost=0.14..0.16 rows=1 width=13) (actual time=0.003..0.003 rows=1 loops=249)
Index Cond: (id = bat1_.bran_id)
Planning time: 8.108 ms
Execution time: 326049.583 ms
Here is the same query with the date before the last 30 days range:
select
sk2_.code as col_0_0_,
bra4_.code as col_1_0_,
st0_.quantity as col_2_0_,
bat1_.forecast as col_3_0_
from
TBL_st st0_,
TBL_bat bat1_,
TBL_sk sk2_,
TBL_bra bra4_
where
st0_.batc_id=bat1_.id
and bat1_.sku_id=sk2_.id
and bat1_.bran_id=bra4_.id
and not (exists (select
1
from
TBL_st st6_,
TBL_bat bat7_,
TBL_sk sk10_
where
st6_.batc_id=bat7_.id
and bat7_.sku_id=sk10_.id
and bat7_.bran_id=bat1_.bran_id
and sk10_.code=sk2_.code
and st6_.date>st0_.date
and sk10_.acco_id=1
and st6_.date>='2017-01-20'
and st6_.date<='2017-01-30'))
and sk2_.acco_id=1
and st0_.date>='2017-01-20'
and st0_.date<='2017-01-30'
and here is the plan for the query with the date before the last 30 days range:
Hash Join (cost=576.33..27443.95 rows=48 width=430) (actual time=132.732..3894.554 rows=250 loops=1)
Hash Cond: (bat1_.bran_id = bra4_.id)
-> Merge Anti Join (cost=572.85..27439.82 rows=48 width=433) (actual time=132.679..3894.287 rows=250 loops=1)
Merge Cond: ((sk2_.code)::text = (sk10_.code)::text)
Join Filter: ((st6_.date > st0_.date) AND (bat7_.bran_id = bat1_.bran_id))
Rows Removed by Join Filter: 84521
-> Nested Loop (cost=286.43..13719.38 rows=48 width=441) (actual time=26.105..1893.523 rows=2491 loops=1)
-> Nested Loop (cost=286.00..6871.33 rows=13335 width=436) (actual time=1.159..445.683 rows=666738 loops=1)
-> Index Scan using uk_TBL_sk0_account_code on TBL_sk sk2_ (cost=0.14..12.53 rows=1 width=426) (actual time=0.035..0.084 rows=50 loops=1)
Index Cond: (acco_id = 1)
-> Bitmap Heap Scan on TBL_bat bat1_ (cost=285.86..6707.27 rows=15153 width=26) (actual time=1.741..7.148 rows=13335 loops=50)
Recheck Cond: (sku_id = sk2_.id)
Heap Blocks: exact=241295
-> Bitmap Index Scan on ix_al_batc_sku_id (cost=0.00..282.07 rows=15153 width=0) (actual time=1.119..1.119 rows=13335 loops=50)
Index Cond: (sku_id = sk2_.id)
-> Index Scan using ix_al_stle_batc_id on TBL_st st0_ (cost=0.42..0.50 rows=1 width=21) (actual time=0.002..0.002 rows=0 loops=666738)
Index Cond: (batc_id = bat1_.id)
Filter: ((date >= '2017-01-20 00:00:00'::timestamp without time zone) AND (date <= '2017-01-30 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 1
-> Materialize (cost=286.43..13719.50 rows=48 width=434) (actual time=15.584..1986.953 rows=84560 loops=1)
-> Nested Loop (cost=286.43..13719.38 rows=48 width=434) (actual time=15.577..1983.384 rows=2491 loops=1)
-> Nested Loop (cost=286.00..6871.33 rows=13335 width=434) (actual time=0.843..482.864 rows=666738 loops=1)
-> Index Scan using uk_TBL_sk0_account_code on TBL_sk sk10_ (cost=0.14..12.53 rows=1 width=426) (actual time=0.005..0.052 rows=50 loops=1)
Index Cond: (acco_id = 1)
-> Bitmap Heap Scan on TBL_bat bat7_ (cost=285.86..6707.27 rows=15153 width=24) (actual time=2.051..7.902 rows=13335 loops=50)
Recheck Cond: (sku_id = sk10_.id)
Heap Blocks: exact=241295
-> Bitmap Index Scan on ix_al_batc_sku_id (cost=0.00..282.07 rows=15153 width=0) (actual time=1.424..1.424 rows=13335 loops=50)
Index Cond: (sku_id = sk10_.id)
-> Index Scan using ix_al_stle_batc_id on TBL_st st6_ (cost=0.42..0.50 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=666738)
Index Cond: (batc_id = bat7_.id)
Filter: ((date >= '2017-01-20 00:00:00'::timestamp without time zone) AND (date <= '2017-01-30 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 1
-> Hash (cost=2.10..2.10 rows=110 width=13) (actual time=0.033..0.033 rows=110 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 14kB
-> Seq Scan on TBL_bra bra4_ (cost=0.00..2.10 rows=110 width=13) (actual time=0.004..0.013 rows=110 loops=1)
Planning time: 14.542 ms
Execution time: 3894.793 ms
Does anyone have an idea why this happens?
Has anyone had experience with anything similar?
Thank you very much.
Kind regards, Petar
I am not sure, but I had a similar case a while ago (on Oracle, but I guess that is not important).
In my case the difference came from the difference in the amount of data, meaning: if you have 1% of the data from the past 30 days, it uses the indexes; when you need "older" data (the remaining 99% of the data), it decides not to use the index and does a full scan (in the form of a nested loop and not a hash join).
If you are sure that the data distribution is OK, then maybe try collecting statistics (that worked for me at the time). Eventually you can analyze every piece of this query, see which part exactly is the bottleneck, and work from there.
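In PostgreSQL terms, collecting statistics means running ANALYZE on the tables involved, e.g.:
-- refresh the planner statistics for the tables used in the query
ANALYZE TBL_st;
ANALYZE TBL_bat;
ANALYZE TBL_sk;
ANALYZE TBL_bra;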
BTree indexes can have some issues with dates, especially if you're removing old data from the table (i.e., deleting everything older than 90 days). That can cause the index to become lopsided, with all of the rows down one branch of the tree. Even without removing old dates, if there are many more "new" rows than "old" rows, it can still happen.
But I don't see your query plans using an index on st0_.date, so I don't think that's the issue. If you can afford a table lock on st0_, you can test this theory by running a REINDEX operation on any indexes that contain st0_.date.
Instead, I think you just have a lot more rows that match the 2017-01-20 to 2017-01-30 range vs. the 2017-04-20 to 2017-04-30 range. The first doubly indented Nested Loop is the same in both queries, so I'll ignore it. The second doubly indented stanza is different, and much more expensive in the slow query:
-> Materialize (cost=286.43..13719.50 rows=48 width=434) (actual time=15.584..1986.953 rows=84560 loops=1)
-> Nested Loop (cost=286.43..13719.38 rows=48 width=434) (actual time=15.577..1983.384 rows=2491 loops=1)
-> Nested Loop (cost=286.00..6871.33 rows=13335 width=434) (actual time=0.843..482.864 rows=666738
vs
-> Nested Loop (cost=2.49..3023.47 rows=1 width=434) (actual time=111.345..130.883 rows=86 loops=2474)
-> Hash Join (cost=2.06..2045.18 rows=1905 width=434) (actual time=0.010..28.028 rows=54853 loops=2474)
Materialize can be an expensive operation that doesn't necessarily scale with the estimated cost. Take a look at https://www.postgresql.org/docs/10/static/using-explain.html , and search for "Materialize". Also note that the estimated number of rows is much higher in the slow version.
I'm not 100% sure, but I believe that tweaking the "work_mem" parameter can have some effect in this area (https://www.postgresql.org/docs/9.4/static/runtime-config-resource.html#GUC-WORK-MEM). To test this theory, you can change that value per session using
SET LOCAL work_mem = '8MB';
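Note that SET LOCAL only lasts for the current transaction, so as a sketch it would be wrapped like this:
BEGIN;
SET LOCAL work_mem = '8MB';   -- only in effect until COMMIT/ROLLBACK
SELECT 1;                     -- placeholder for the slow query / EXPLAIN ANALYZE under test
ROLLBACK;                     -- afterwards work_mem is back to its previous value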

PostgreSQL query is not using an index

Environment
My PostgreSQL (9.2) schema looks like this:
CREATE TABLE first
(
id_first bigint NOT NULL,
first_date timestamp without time zone NOT NULL,
CONSTRAINT first_pkey PRIMARY KEY (id_first)
)
WITH (
OIDS=FALSE
);
CREATE INDEX first_first_date_idx
ON first
USING btree
(first_date);
CREATE TABLE second
(
id_second bigint NOT NULL,
id_first bigint NOT NULL,
CONSTRAINT second_pkey PRIMARY KEY (id_second),
CONSTRAINT fk_first FOREIGN KEY (id_first)
REFERENCES first (id_first) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
OIDS=FALSE
);
CREATE INDEX second_id_first_idx
ON second
USING btree
(id_first);
CREATE TABLE third
(
id_third bigint NOT NULL,
id_second bigint NOT NULL,
CONSTRAINT third_pkey PRIMARY KEY (id_third),
CONSTRAINT fk_second FOREIGN KEY (id_second)
REFERENCES second (id_second) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
OIDS=FALSE
);
CREATE INDEX third_id_second_idx
ON third
USING btree
(id_second);
So, I have 3 tables, each with its own PK. First has an index on first_date, Second has a FK to First with an index on it, and Third has a FK to Second with an index on it as well:
First (0 --> n) Second (0 --> n) Third
First table contains about 10 000 000 records.
Second table contains about 20 000 000 records.
Third table contains about 18 000 000 records.
Date range in column first_date is from 2016-01-01 till today.
random_page_cost is set to 2.0.
default_statistics_target is set to 100.
All FK, PK and first_date STATISTICS are set to 5000
Task to do
I want to count all Third rows connected with First, where first_date < X
My query:
SELECT count(t.id_third) AS count
FROM first f
JOIN second s ON s.id_first = f.id_first
JOIN third t ON t.id_second = s.id_second
WHERE first_date < _my_date
Problem description
Asking for 2 days - _my_date = '2016-01-03'
Everything works pretty well; the query takes 1-2 seconds.
EXPLAIN ANALYZE:
"Aggregate (cost=8585512.55..8585512.56 rows=1 width=8) (actual time=67.310..67.310 rows=1 loops=1)"
" -> Merge Join (cost=4208477.49..8583088.04 rows=969805 width=8) (actual time=44.277..65.948 rows=17631 loops=1)"
" Merge Cond: (s.id_second = t.id_second)"
" -> Sort (cost=4208477.48..4211121.75 rows=1057709 width=8) (actual time=44.263..46.035 rows=19230 loops=1)"
" Sort Key: s.id_second"
" Sort Method: quicksort Memory: 1670kB"
" -> Nested Loop (cost=0.01..4092310.41 rows=1057709 width=8) (actual time=6.169..39.183 rows=19230 loops=1)"
" -> Index Scan using first_first_date_idx on first f (cost=0.01..483786.81 rows=492376 width=8) (actual time=6.159..12.223 rows=10346 loops=1)"
" Index Cond: (first_date < '2016-01-03 00:00:00'::timestamp without time zone)"
" -> Index Scan using second_id_first_idx on second s (cost=0.00..7.26 rows=7 width=16) (actual time=0.002..0.002 rows=2 loops=10346)"
" Index Cond: (id_first = f.id_first)"
" -> Index Scan using third_id_second_idx on third t (cost=0.00..4316649.89 rows=17193788 width=16) (actual time=0.008..7.293 rows=17632 loops=1)"
"Total runtime: 67.369 ms"
Asking for 10 days or more - _my_date = '2016-01-11' or more
The query is not using an index scan anymore - it's replaced by a seq scan and takes 3-4 minutes...
Query plan:
"Aggregate (cost=8731468.75..8731468.76 rows=1 width=8) (actual time=234411.229..234411.229 rows=1 loops=1)"
" -> Hash Join (cost=4352424.81..8728697.88 rows=1108348 width=8) (actual time=189670.068..234400.540 rows=138246 loops=1)"
" Hash Cond: (t.id_second = o.id_second)"
" -> Seq Scan on third t (cost=0.00..4128080.88 rows=17193788 width=16) (actual time=0.016..124111.453 rows=17570724 loops=1)"
" -> Hash (cost=4332592.69..4332592.69 rows=1208810 width=8) (actual time=98566.740..98566.740 rows=151263 loops=1)"
" Buckets: 16384 Batches: 16 Memory Usage: 378kB"
" -> Hash Join (cost=561918.25..4332592.69 rows=1208810 width=8) (actual time=6535.801..98535.915 rows=151263 loops=1)"
" Hash Cond: (s.id_first = f.id_first)"
" -> Seq Scan on second s (cost=0.00..3432617.48 rows=18752248 width=16) (actual time=6090.771..88891.691 rows=19132869 loops=1)"
" -> Hash (cost=552685.31..552685.31 rows=562715 width=8) (actual time=444.630..444.630 rows=81650 loops=1)"
" -> Index Scan using first_first_date_idx on first f (cost=0.01..552685.31 rows=562715 width=8) (actual time=7.987..421.087 rows=81650 loops=1)"
" Index Cond: (first_date < '2016-01-13 00:00:00'::timestamp without time zone)"
"Total runtime: 234411.303 ms"
For test purposes, I have set:
SET enable_seqscan = OFF;
My queries start using an index scan again and take 1-10 s (depending on the range).
Question
Why does it work like this? How can I convince the query planner to use an index scan?
EDIT
After reducing random_page_cost to 1.1, I can now select about 30 days while still using an index scan. The query plan changed a little bit:
"Aggregate (cost=8071389.47..8071389.48 rows=1 width=8) (actual time=4915.196..4915.196 rows=1 loops=1)"
" -> Nested Loop (cost=0.01..8067832.28 rows=1422878 width=8) (actual time=14.402..4866.937 rows=399184 loops=1)"
" -> Nested Loop (cost=0.01..3492321.55 rows=1551849 width=8) (actual time=14.393..3012.617 rows=436794 loops=1)"
" -> Index Scan using first_first_date_idx on first f (cost=0.01..432541.99 rows=722404 width=8) (actual time=14.372..729.233 rows=236007 loops=1)"
" Index Cond: (first_date < '2016-02-01 00:00:00'::timestamp without time zone)"
" -> Index Scan using second_id_first_idx on second s (cost=0.00..4.17 rows=7 width=16) (actual time=0.008..0.009 rows=2 loops=236007)"
" Index Cond: (second = f.id_second)"
" -> Index Scan using third_id_second_idx on third t (cost=0.00..2.94 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=436794)"
" Index Cond: (id_second = s.id_second)"
"Total runtime: 4915.254 ms"
However, I still don't get why asking for more causes a seq scan...
Interestingly, when I ask for a range just above some kind of limit, I get a query plan like this (here a select for 40 days; asking for more produces the full seq scan again):
"Aggregate (cost=8403399.27..8403399.28 rows=1 width=8) (actual time=138303.216..138303.217 rows=1 loops=1)"
" -> Hash Join (cost=3887619.07..8399467.63 rows=1572656 width=8) (actual time=44056.443..138261.203 rows=512062 loops=1)"
" Hash Cond: (t.id_second = s.id_second)"
" -> Seq Scan on third t (cost=0.00..4128080.88 rows=17193788 width=16) (actual time=0.004..119497.056 rows=17570724 loops=1)"
" -> Hash (cost=3859478.04..3859478.04 rows=1715203 width=8) (actual time=5695.077..5695.077 rows=560503 loops=1)"
" Buckets: 16384 Batches: 16 Memory Usage: 1390kB"
" -> Nested Loop (cost=0.01..3859478.04 rows=1715203 width=8) (actual time=65.250..5533.413 rows=560503 loops=1)"
" -> Index Scan using first_first_date_idx on first f (cost=0.01..477985.28 rows=798447 width=8) (actual time=64.927..1688.341 rows=302663 loops=1)"
" Index Cond: (first_date < '2016-02-11 00:00:00'::timestamp without time zone)"
" -> Index Scan using second_id_first_idx on second s (cost=0.00..4.17 rows=7 width=16) (actual time=0.010..0.012 rows=2 loops=302663)"
" Index Cond: (id_first = f.id_first)"
"Total runtime: 138303.306 ms"
UPDATE after Laurenz Albe's suggestions
After rewriting the query as Laurenz Albe suggested:
"Aggregate (cost=9102321.05..9102321.06 rows=1 width=8) (actual time=15237.830..15237.830 rows=1 loops=1)"
" -> Merge Join (cost=4578171.25..9097528.19 rows=1917143 width=8) (actual time=9111.694..15156.092 rows=803657 loops=1)"
" Merge Cond: (third.id_second = s.id_second)"
" -> Index Scan using third_id_second_idx on third (cost=0.00..4270478.19 rows=17193788 width=16) (actual time=23.650..5425.137 rows=803658 loops=1)"
" -> Materialize (cost=4577722.81..4588177.38 rows=2090914 width=8) (actual time=9088.030..9354.326 rows=879283 loops=1)"
" -> Sort (cost=4577722.81..4582950.09 rows=2090914 width=8) (actual time=9088.023..9238.426 rows=879283 loops=1)"
" Sort Key: s.id_second"
" Sort Method: external sort Disk: 15480kB"
" -> Merge Join (cost=673389.38..4341477.37 rows=2090914 width=8) (actual time=3662.239..8485.768 rows=879283 loops=1)"
" Merge Cond: (s.id_first = f.id_first)"
" -> Index Scan using second_id_first_idx on second s (cost=0.00..3587838.88 rows=18752248 width=16) (actual time=0.015..4204.308 rows=879284 loops=1)"
" -> Materialize (cost=672960.82..677827.55 rows=973345 width=8) (actual time=3662.216..3855.667 rows=892988 loops=1)"
" -> Sort (cost=672960.82..675394.19 rows=973345 width=8) (actual time=3662.213..3745.975 rows=476519 loops=1)"
" Sort Key: f.id_first"
" Sort Method: external sort Disk: 8400kB"
" -> Index Scan using first_first_date_idx on first f (cost=0.01..568352.90 rows=973345 width=8) (actual time=126.386..3233.134 rows=476519 loops=1)"
" Index Cond: (first_date < '2016-03-03 00:00:00'::timestamp without time zone)"
"Total runtime: 15244.404 ms"
First, it looks like some of the estimates are off.
Try to ANALYZE the tables and see if that changes the query plan chosen.
What might also help is to lower random_page_cost to a value just over 1 and see if that improves the plan.
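As a session-local experiment, that could look like this (a sketch using the query from the question with one of its dates):
SET random_page_cost = 1.1;   -- closer to the cost of cached/SSD random reads; session only
EXPLAIN ANALYZE
SELECT count(t.id_third) AS count
FROM first f
JOIN second s ON s.id_first = f.id_first
JOIN third t ON t.id_second = s.id_second
WHERE first_date < '2016-01-13';
RESET random_page_cost;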
It is interesting to note that the index scan on third_id_second_idx in the fast query produces only 17632 rows instead of over 17 million, which I can only explain by assuming that from that row on, the values of id_second no longer match any row in the join of first and second, i.e. the merge join is completed after that.
You can try to exploit that with a rewritten query. Try
JOIN (SELECT id_second, id_third FROM third ORDER BY id_second) t
instead of
JOIN third t
That may result in a better plan since PostgreSQL won't optimize the ORDER BY away, and the planner may decide that since it has to sort third anyway, it may be cheaper to use a merge join. That way you trick the planner into choosing a plan that it wouldn't recognize as ideal. With a different value distribution the planner's original choice would probably be better.
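Folded back into the original query, that rewrite would look like this (a sketch; only the join to third changes):
SELECT count(t.id_third) AS count
FROM first f
JOIN second s ON s.id_first = f.id_first
JOIN (SELECT id_second, id_third
      FROM third
      ORDER BY id_second) t ON t.id_second = s.id_second
WHERE first_date < '2016-03-03';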

How do I speed up this Django PostgreSQL query?

I'm trying to speed up what seems to me to be quite a slow PostgreSQL query in Django (150ms)
This is the query:
SELECT ••• FROM "predictions_prediction"
INNER JOIN "minute_in_time_minute"
ON ( "predictions_prediction"."minute_id" = "minute_in_time_minute"."id" )
WHERE ("minute_in_time_minute"."datetime" >= '2014-08-21 13:12:00+00:00'
AND "predictions_prediction"."location_id" = 1
AND "minute_in_time_minute"."datetime" < '2014-08-24 13:12:00+00:00'
AND "predictions_prediction"."tide_level" >= 3.0)
ORDER BY "minute_in_time_minute"."datetime" ASC
Here's the result of the PostgreSQL EXPLAIN:
Sort (cost=17731.45..17739.78 rows=3331 width=32) (actual time=151.755..151.901 rows=3515 loops=1)
Sort Key: minute_in_time_minute.datetime
Sort Method: quicksort Memory: 371kB
-> Hash Join (cost=3187.44..17536.56 rows=3331 width=32) (actual time=96.757..150.693 rows=3515 loops=1)
Hash Cond: (predictions_prediction.minute_id = minute_in_time_minute.id)
-> Seq Scan on predictions_prediction (cost=0.00..11232.00 rows=411175 width=20) (actual time=0.017..88.063 rows=410125 loops=1)
Filter: ((tide_level >= 3::double precision) AND (location_id = 1))
Rows Removed by Filter: 115475
-> Hash (cost=3134.21..3134.21 rows=4258 width=12) (actual time=9.221..9.221 rows=4320 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 203kB
-> Bitmap Heap Scan on minute_in_time_minute (cost=92.07..3134.21 rows=4258 width=12) (actual time=1.147..8.220 rows=4320 loops=1)
Recheck Cond: ((datetime >= '2014-08-21 13:18:00+00'::timestamp with time zone) AND (datetime < '2014-08-24 13:18:00+00'::timestamp with time zone))
-> Bitmap Index Scan on minute_in_time_minute_datetime_key (cost=0.00..91.00 rows=4258 width=0) (actual time=0.851..0.851 rows=4320 loops=1)
Index Cond: ((datetime >= '2014-08-21 13:18:00+00'::timestamp with time zone) AND (datetime < '2014-08-24 13:18:00+00'::timestamp with time zone))
I've tried visualising it in an external tool (http://explain.depesz.com/s/CWW) which shows that the start of the problem is the Seq Scan on predictions_prediction
What I've tried so far:
Add an index on predictions_prediction.tide_level
Add a composite index on tide_level and location on predictions_prediction
But neither had any effect as far as I could see.
Can someone please help me interpret the query plan?
Thanks
Try creating the following composite index on predictions_prediction
(minute_id, location_id, tide_level)
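Written out as DDL (the index name here is just an example):
CREATE INDEX predictions_prediction_minute_location_tide_idx
    ON predictions_prediction (minute_id, location_id, tide_level);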
150ms for this type of query (a join, several where conditions with inequalities and an order by) is actually relatively normal, so you might not be able to speed it up much more.