PostgreSQL View Query Times Exponential with Integer Array Column - sql

I have a SQL view (CREATE OR REPLACE VIEW table_view AS ...) that has 20 columns, with a mixture of text, integer, boolean and integer array fields. In total, there are nine columns that are integer arrays.
This view works very well the overwhelming majority of the time. However, when a row shows up with ~50 total values within the integer array, the query slows to a crawl. These columns are not being queried against, they are simply just part of the resultset, but the magnitude of the query goes from ~250ms to ~7s if any rows are returned that contain the high number of integer array values.
Here is an example query:
SELECT
"table_view"."id", -- integer
"table_view"."start", -- date with tz
"table_view"."end", -- date with tz
"table_view"."a_id", -- integer
"table_view"."c_id", -- integer
"table_view"."generator_id", -- integer
"table_view"."ci_id", -- integer
"table_view"."kind", -- integer
"table_view"."hidden", -- boolean
"table_view"."is_done", -- integer
"table_view"."title", -- text
"table_view"."c_type", -- integer
"table_view"."ci_title", -- integer
"table_view"."status", -- text
"table_view"."cs", -- integer array
"table_view"."cts", -- integer array
"table_view"."ts", -- integer array
"table_view"."tas", -- integer array
"table_view"."bs", -- integer array
"table_view"."ms", -- integer array
"table_view"."pcs", -- integer array
"table_view"."pts", -- integer array
"table_view"."ks" -- integer array
FROM
"table_view"
WHERE (
"table_view"."a_id" = 3289
AND (
"table_view"."cs" && ARRAY[28890, 21166, 28891, 29581, 29583, 22378, 22380, 22733, 28924, 28925, 28926, 28927, 28478, 41014]::integer[]
OR "table_view"."ms" && ARRAY[11125]::integer[]
) AND NOT (
"table_view"."hidden" = true
AND NOT (
"table_view"."ms" && ARRAY[11125]::integer[]
)
) AND "table_view"."end" >= '2019-02-03T00:00:00+00:00'::timestamptz
AND "table_view"."start" < '2019-04-15T00:00:00+00:00'::timestamptz
);
If none of the rows have a sum of integer array values ~> 50, the query time is ~50ms to 250ms.
Time: 50.417 ms
However, if one or more of the rows have a sum of integer array values <~ 50, the query time is ~6.5s to 7s.
Time: 6737.154 ms
I have obfuscated some of the columns names from this query, but that shouldn't impact the debugging of this issue. I'm beating myself up over this. Nothing I am researching online mentioned anything about column length being an issue in the resulting query.
Here is the output of the EXPLAIN ANALYZE for the "bad" query:
Subquery Scan on table_view (cost=173.16..173.29 rows=1 width=436) (actual time=648.470..800.577 rows=60 loops=1)
-> GroupAggregate (cost=173.16..173.28 rows=1 width=103) (actual time=648.468..800.552 rows=60 loops=1)
Group Key: dhci.id, wfciwf.id, dutm.account_id
Filter: (((array_append('{}'::integer[], dhci.cid) && '{28890,21166,28891,29581,29583,22378,22380,22733,28924,28925,28926,28927,28478,41014}'::integer[]) OR (array_agg(DISTIN
CT dhamti.member_id) && '{11125}'::integer[])) AND ((NOT dhci.hidden) OR (array_agg(DISTINCT dhamti.member_id) && '{11125}'::integer[])))
-> Sort (cost=173.16..173.17 rows=1 width=103) (actual time=647.727..685.155 rows=172000 loops=1)
Sort Key: dhci.id, wfciwf.id, dutm.account_id
Sort Method: external merge Disk: 16392kB
-> Nested Loop Left Join (cost=68.45..173.15 rows=1 width=103) (actual time=1.033..478.844 rows=172000 loops=1)
-> Nested Loop Left Join (cost=68.16..171.71 rows=1 width=99) (actual time=1.017..308.168 rows=172000 loops=1)
-> Nested Loop Left Join (cost=67.87..170.20 rows=1 width=95) (actual time=0.997..126.331 rows=172000 loops=1)
-> Nested Loop Left Join (cost=67.58..168.77 rows=1 width=91) (actual time=0.978..26.478 rows=33400 loops=1)
-> Nested Loop Left Join (cost=67.16..165.68 rows=1 width=87) (actual time=0.960..4.997 rows=3700 loops=1)
-> Nested Loop Left Join (cost=66.73..162.71 rows=1 width=83) (actual time=0.949..2.572 rows=410 loops=1)
-> Nested Loop Left Join (cost=66.44..161.59 rows=1 width=79) (actual time=0.939..2.121 rows=60 loops=1)
-> Nested Loop (cost=66.02..154.64 rows=1 width=75) (actual time=0.926..1.818 rows=60 loops=1)
-> Nested Loop (cost=65.59..148.07 rows=1 width=66) (actual time=0.902..1.446 rows=60 loops=1)
-> Index Scan using dutm_8a089c2a on dutm (cost=0.29..9.40 rows=2 width=8) (actual time=0.017..0.028 rows=9 loops=1)
Index Cond: (account_id = 3289)
-> Bitmap Heap Scan on dhci (cost=65.31..69.33 rows=1 width=66) (actual time=0.137..0.148 rows=7 loops=9)
Recheck Cond: ((owner_member_id = dutm.id) AND (deadline IS NOT NULL) AND (deadline >= '2019-02-03 00:00:00+00'::timestamp with time zone) AND (deadline < '2019-04-15 00:00:00+00'::timestamp with time zone))
Heap Blocks: exact=36
-> BitmapAnd (cost=65.31..65.31 rows=1 width=0) (actual time=0.135..0.135 rows=0 loops=9)
-> Bitmap Index Scan on dhci_9adb17cb (cost=0.00..5.79 rows=182 width=0) (actual time=0.036..0.036 rows=167 loops=9)
Index Cond: (owner_member_id = dutm.id)
-> Bitmap Index Scan on dhci_deadline_ba8bd3addbfe7dc_uniq (cost=0.00..58.66 rows=2419 width=0) (actual time=0.263..0.263 rows=1727 loops=3)
Index Cond: ((deadline IS NOT NULL) AND (deadline >= '2019-02-03 00:00:00+00'::timestamp with time zone) AND (deadline < '2019-04-15 00:00:00+00'::timestamp with time zone))
-> Index Scan using wfciwf_pkey on wfciwf (cost=0.42..6.55 rows=1 width=13) (actual time=0.005..0.005 rows=1 loops=60)
Index Cond: (id = dhci.workflow_state_id)
Filter: (((current_status)::text <> 'killed'::text) AND ((current_status)::text <> 'parked'::text))
-> Index Scan using dhamti_d7bbcb82 on dhamti (cost=0.42..6.93 rows=2 width=8) (actual time=0.003..0.004 rows=1 loops=60)
Index Cond: (content_item_id = dhci.id)
-> Index Scan using dhcibs_6123fe8a on dhcibs (cost=0.29..1.10 rows=2 width=8) (actual time=0.003..0.005 rows=7 loops=60)
Index Cond: (ciid = dhci.id)
-> Index Scan using dhcit_6123fe8a on dhcit (cost=0.42..2.95 rows=2 width=8) (actual time=0.002..0.004 rows=9 loops=410)
Index Cond: (ciid = dhci.id)
-> Index Scan using ghcita_6123fe8a on ghcita (cost=0.42..3.07 rows=2 width=8) (actual time=0.002..0.003 rows=9 loops=3700)
Index Cond: (ciid = dhci.id)
-> Index Scan using dhcipc_6123fe8a on dhcipc (cost=0.29..1.41 rows=2 width=8) (actual time=0.001..0.002 rows=5 loops=33400)
Index Cond: (ciid = dhci.id)
-> Index Scan using dhcipt_6123fe8a on dhcipt (cost=0.29..1.49 rows=2 width=8) (actual time=0.001..0.001 rows=0 loops=172000)
Index Cond: (ciid = dhci.id)
-> Index Scan using dhcik_6123fe8a on dhcik (cost=0.29..1.41 rows=3 width=8) (actual time=0.001..0.001 rows=0 loops=172000)
Index Cond: (ciid = dhci.id)
Planning time: 8.219 ms
Execution time: 804.161 ms
(45 rows)

Related

Query Tuning in PostgreSQL [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 4 years ago.
Improve this question
I have a query that is running in 17s, but I can not think of a way to optimize this query. Some help is much needed.
EXPLAIN ANALYSE
CREATE materialized VIEW professores_fizeram_planejamentoTEST as
SELECT unities.id as id_escola,
unities.name as nome_escola,
teachers.id as id_professor,
teachers.name as nome_professor,
datas.dia,
COALESCE((SELECT true
FROM lesson_plans
WHERE lesson_plans.teacher_id = teachers.id and
datas.dia between lesson_plans.start_at and lesson_plans.end_at
LIMIT 1), false) as criou_plano_aula,
COALESCE((select true
from content_records
where content_records.teacher_id = teachers.id and
content_records.record_date = datas.dia
limit 1), false) as criou_registro_conteudo
FROM (SELECT i::date as dia,
EXTRACT(year FROM i::date) as ano
FROM generate_series(date_trunc('year', now()), now(), '1 day'::INTERVAL) i
WHERE EXTRACT(dow from i::timestamp) in (1,2,3,4,5)) datas
JOIN (SELECT distinct teacher_id, classroom_id, YEAR
FROM teacher_discipline_classrooms) teacher_discipline_classrooms ON (teacher_discipline_classrooms.year = datas.ano)
JOIN classrooms on (classrooms.id = teacher_discipline_classrooms.classroom_id)
JOIN unities on (unities.id = classrooms.unity_id)
JOIN teachers on (teachers.id = teacher_discipline_classrooms.teacher_id)
WHERE NOT EXISTS(SELECT 1
FROM school_calendars
JOIN school_calendar_events on (school_calendar_events.school_calendar_id = school_calendars.id and
school_calendar_events.event_type = 'no_school' and
datas.dia between school_calendar_events.start_date and school_calendar_events.end_date)
WHERE school_calendars.unity_id = unities.id)
This query returns the following analysis
Nested Loop (cost=143.840..3721.540 rows=38 width=66) (actual time=1.923..17270.125 rows=171231 loops=1)
-> Nested Loop (cost=143.690..1523.510 rows=38 width=41) (actual time=1.744..5996.571 rows=171231 loops=1)
Join Filter: (NOT (delta 3))
Rows Removed by Join Filter: 15249
-> Nested Loop (cost=143.550..203.530 rows=76 width=16) (actual time=1.661..568.049 rows=186480 loops=1)
-> Hash Join (cost=143.270..165.450 rows=76 width=16) (actual time=1.651..183.740 rows=186660 loops=1)
Hash Cond: ((victor.juliet_seven)::double precision = echo_tango('quebec_four'::text, ((alpha_quebec_whiskey.alpha_quebec_whiskey)::date)::timestamp without time zone))
-> HashAggregate (cost=121.700..127.820 rows=612 width=12) (actual time=1.384..3.336 rows=2388 loops=1)
Group Key: victor.foxtrot_six, victor.oscar_kilo, victor.juliet_seven
-> Seq Scan on victor (cost=0.000..94.400 rows=3640 width=12) (actual time=0.004..0.563 rows=3640 loops=1)
-> Hash (cost=21.260..21.260 rows=25 width=8) (actual time=0.256..0.256 rows=180 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 16kB
-> Function Scan on xray_yankee alpha_quebec_whiskey (cost=0.010..21.260 rows=25 width=8) (actual time=0.081..0.195 rows=180 loops=1)
Filter: (echo_tango('papa'::text, (alpha_quebec_whiskey)::timestamp without time zone) = ANY ('oscar_seven_charlie'::double precision[]))
Rows Removed by Filter: 72
-> Index Scan using echo_victor on uniform (cost=0.280..0.490 rows=1 width=8) (actual time=0.001..0.002 rows=1 loops=186660)
Index Cond: (quebec_seven = victor.oscar_kilo)
-> Index Scan using golf on four (cost=0.140..0.160 rows=1 width=29) (actual time=0.001..0.001 rows=1 loops=186480)
Index Cond: (quebec_seven = uniform.xray_victor)
SubPlan
-> Nested Loop (cost=0.280..34.110 rows=2 width=0) (actual time=0.027..0.027 rows=0 loops=186480)
-> Seq Scan on seven (cost=0.000..1.990 rows=2 width=4) (actual time=0.003..0.008 rows=2 loops=186480)
Filter: (xray_victor = four.quebec_seven)
Rows Removed by Filter: 75
-> Index Scan using alpha_quebec_papa on two (cost=0.280..16.050 rows=1 width=4) (actual time=0.008..0.008 rows=0 loops=372960)
Index Cond: (zulu = seven.quebec_seven)
Filter: (((xray_delta)::text = 'oscar_seven_golf'::text) AND ((alpha_quebec_whiskey.alpha_quebec_whiskey)::date >= foxtrot_three) AND ((alpha_quebec_whiskey.alpha_quebec_whiskey)::date <= lima))
Rows Removed by Filter: 14
-> Index Scan using tango on romeo (cost=0.150..0.200 rows=1 width=29) (actual time=0.001..0.001 rows=1 loops=171231)
Index Cond: (quebec_seven = victor.foxtrot_six)
SubPlan
-> Limit (cost=0.000..20.600 rows=1 width=0) (actual time=0.048..0.048 rows=0 loops=171231)
-> Seq Scan on five (cost=0.000..20.600 rows=1 width=0) (actual time=0.045..0.045 rows=0 loops=171231)
Filter: ((foxtrot_six = romeo.quebec_seven) AND ((alpha_quebec_whiskey.alpha_quebec_whiskey)::date >= oscar_echo) AND ((alpha_quebec_whiskey.alpha_quebec_whiskey)::date <= xray_three))
Rows Removed by Filter: 246
SubPlan
-> Limit (cost=4.810..37.030 rows=1 width=0) (actual time=0.015..0.015 rows=0 loops=171231)
-> Bitmap Heap Scan on whiskey (cost=4.810..37.030 rows=1 width=0) (actual time=0.011..0.011 rows=0 loops=171231)
Recheck Cond: (foxtrot_six = romeo.quebec_seven)
Filter: (foxtrot_tango = (alpha_quebec_whiskey.alpha_quebec_whiskey)::date)
Rows Removed by Filter: 28
Heap Blocks: exact=258248
-> Bitmap Index Scan on juliet_bravo (cost=0.000..4.810 rows=70 width=0) (actual time=0.003..0.003 rows=37 loops=171231)
Index Cond: (foxtrot_six = romeo.quebec_seven)
EXPLAIN 1 - RESULT
EXPLAIN 2 - RESULT
Thanks you.
no, we wont!
sanitize your query (add aliases, and use them, for instance)
COALESCE((SELECT true FROM lesson_plans WHERE lesson_plans.teacher_id = teachers.id and datas.dia between lesson_plans.start_at and lesson_plans.end_at LIMIT 1), false) as criou_plano_aula
... can be replaced by a simple EXISTS(subquery)
your outer query only refers to {unities,teachers,datas}, the rest of the tables are merely junction tables.
if there is a difference in the queryplan between expected <-->observed, your statistics are wrong.
the function scan on generate_series() spoils the queryplan. Better use a material calendartable, which could be indexed & countable.
always add the tuning parameters and an estimate of the cardinalities to the question. These are not details.
This is not an answer, but a comment that doesn't fit in the comments section. If you want to speed up your query please add some information:
First, please include the execution plan. This will tell us what's going on fast.
Also, please post:
Existing indexes on lessons_plans.
Existing indexes on content_records.
Rows on teacher_discipline_classrooms.
Existing indexes on teacher_discipline_classrooms.
Existing indexes on classrooms.
Existing indexes on unities.
indexes on teachers.
Existing indexes on shool_calendars.

Significantly different time of execution of the query due to different date value in Postgres

I have a weird case of query execution performance here. The query has a date values in the WHERE clause, and the speed of executing varies by values of the date.
Actually:
for the dates from the range of the last 30 days, execution takes around 3 min
for the dates before the range of the last 30 days, execution takes a few seconds
The query is listed below, with the date in the last 30 days range:
select
sk2_.code as col_0_0_,
bra4_.code as col_1_0_,
st0_.quantity as col_2_0_,
bat1_.forecast as col_3_0_
from
TBL_st st0_,
TBL_bat bat1_,
TBL_sk sk2_,
TBL_bra bra4_
where
st0_.batc_id=bat1_.id
and bat1_.sku_id=sk2_.id
and bat1_.bran_id=bra4_.id
and not (exists (select
1
from
TBL_st st6_,
TBL_bat bat7_,
TBL_sk sk10_
where
st6_.batc_id=bat7_.id
and bat7_.sku_id=sk10_.id
and bat7_.bran_id=bat1_.bran_id
and sk10_.code=sk2_.code
and st6_.date>st0_.date
and sk10_.acco_id=1
and st6_.date>='2017-04-20'
and st6_.date<='2017-04-30'))
and sk2_.acco_id=1
and st0_.date>='2017-04-20'
and st0_.date<='2017-04-30'
and here is the plan for the query with the date in the last 30 days range:
Nested Loop (cost=289.06..19764.03 rows=1 width=430) (actual time=3482.062..326049.246 rows=249 loops=1)
-> Nested Loop Anti Join (cost=288.91..19763.86 rows=1 width=433) (actual time=3482.023..326048.023 rows=249 loops=1)
Join Filter: ((st6_.date > st0_.date) AND ((sk10_.code)::text = (sk2_.code)::text))
Rows Removed by Join Filter: 210558
-> Nested Loop (cost=286.43..13719.38 rows=1 width=441) (actual time=4.648..2212.042 rows=2474 loops=1)
-> Nested Loop (cost=286.00..6871.33 rows=13335 width=436) (actual time=4.262..657.823 rows=666738 loops=1)
-> Index Scan using uk_TBL_sk0_account_code on TBL_sk sk2_ (cost=0.14..12.53 rows=1 width=426) (actual time=1.036..1.084 rows=50 loops=1)
Index Cond: (acco_id = 1)
-> Bitmap Heap Scan on TBL_bat bat1_ (cost=285.86..6707.27 rows=15153 width=26) (actual time=3.675..11.308 rows=13335 loops=50)
Recheck Cond: (sku_id = sk2_.id)
Heap Blocks: exact=241295
-> Bitmap Index Scan on ix_al_batc_sku_id (cost=0.00..282.07 rows=15153 width=0) (actual time=3.026..3.026 rows=13335 loops=50)
Index Cond: (sku_id = sk2_.id)
-> Index Scan using ix_al_stle_batc_id on TBL_st st0_ (cost=0.42..0.50 rows=1 width=21) (actual time=0.002..0.002 rows=0 loops=666738)
Index Cond: (batc_id = bat1_.id)
Filter: ((date >= '2017-04-20 00:00:00'::timestamp without time zone) AND (date <= '2017-04-30 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 1
-> Nested Loop (cost=2.49..3023.47 rows=1 width=434) (actual time=111.345..130.883 rows=86 loops=2474)
-> Hash Join (cost=2.06..2045.18 rows=1905 width=434) (actual time=0.010..28.028 rows=54853 loops=2474)
Hash Cond: (bat7_.sku_id = sk10_.id)
-> Index Scan using ix_al_batc_bran_id on TBL_bat bat7_ (cost=0.42..1667.31 rows=95248 width=24) (actual time=0.009..11.045 rows=54853 loops=2474)
Index Cond: (bran_id = bat1_.bran_id)
-> Hash (cost=1.63..1.63 rows=1 width=426) (actual time=0.026..0.026 rows=50 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 11kB
-> Seq Scan on TBL_sk sk10_ (cost=0.00..1.63 rows=1 width=426) (actual time=0.007..0.015 rows=50 loops=1)
Filter: (acco_id = 1)
-> Index Scan using ix_al_stle_batc_id on TBL_st st6_ (cost=0.42..0.50 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=135706217)
Index Cond: (batc_id = bat7_.id)
Filter: ((date >= '2017-04-20 00:00:00'::timestamp without time zone) AND (date <= '2017-04-30 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 1
-> Index Scan using TBL_bra_pk on TBL_bra bra4_ (cost=0.14..0.16 rows=1 width=13) (actual time=0.003..0.003 rows=1 loops=249)
Index Cond: (id = bat1_.bran_id)
Planning time: 8.108 ms
Execution time: 326049.583 ms
Here is the same query with the date before the last 30 days range:
select
sk2_.code as col_0_0_,
bra4_.code as col_1_0_,
st0_.quantity as col_2_0_,
bat1_.forecast as col_3_0_
from
TBL_st st0_,
TBL_bat bat1_,
TBL_sk sk2_,
TBL_bra bra4_
where
st0_.batc_id=bat1_.id
and bat1_.sku_id=sk2_.id
and bat1_.bran_id=bra4_.id
and not (exists (select
1
from
TBL_st st6_,
TBL_bat bat7_,
TBL_sk sk10_
where
st6_.batc_id=bat7_.id
and bat7_.sku_id=sk10_.id
and bat7_.bran_id=bat1_.bran_id
and sk10_.code=sk2_.code
and st6_.date>st0_.date
and sk10_.acco_id=1
and st6_.date>='2017-01-20'
and st6_.date<='2017-01-30'))
and sk2_.acco_id=1
and st0_.date>='2017-01-20'
and st0_.date<='2017-01-30'
and here is the plan for the query with the date before the last 30 days range:
Hash Join (cost=576.33..27443.95 rows=48 width=430) (actual time=132.732..3894.554 rows=250 loops=1)
Hash Cond: (bat1_.bran_id = bra4_.id)
-> Merge Anti Join (cost=572.85..27439.82 rows=48 width=433) (actual time=132.679..3894.287 rows=250 loops=1)
Merge Cond: ((sk2_.code)::text = (sk10_.code)::text)
Join Filter: ((st6_.date > st0_.date) AND (bat7_.bran_id = bat1_.bran_id))
Rows Removed by Join Filter: 84521
-> Nested Loop (cost=286.43..13719.38 rows=48 width=441) (actual time=26.105..1893.523 rows=2491 loops=1)
-> Nested Loop (cost=286.00..6871.33 rows=13335 width=436) (actual time=1.159..445.683 rows=666738 loops=1)
-> Index Scan using uk_TBL_sk0_account_code on TBL_sk sk2_ (cost=0.14..12.53 rows=1 width=426) (actual time=0.035..0.084 rows=50 loops=1)
Index Cond: (acco_id = 1)
-> Bitmap Heap Scan on TBL_bat bat1_ (cost=285.86..6707.27 rows=15153 width=26) (actual time=1.741..7.148 rows=13335 loops=50)
Recheck Cond: (sku_id = sk2_.id)
Heap Blocks: exact=241295
-> Bitmap Index Scan on ix_al_batc_sku_id (cost=0.00..282.07 rows=15153 width=0) (actual time=1.119..1.119 rows=13335 loops=50)
Index Cond: (sku_id = sk2_.id)
-> Index Scan using ix_al_stle_batc_id on TBL_st st0_ (cost=0.42..0.50 rows=1 width=21) (actual time=0.002..0.002 rows=0 loops=666738)
Index Cond: (batc_id = bat1_.id)
Filter: ((date >= '2017-01-20 00:00:00'::timestamp without time zone) AND (date <= '2017-01-30 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 1
-> Materialize (cost=286.43..13719.50 rows=48 width=434) (actual time=15.584..1986.953 rows=84560 loops=1)
-> Nested Loop (cost=286.43..13719.38 rows=48 width=434) (actual time=15.577..1983.384 rows=2491 loops=1)
-> Nested Loop (cost=286.00..6871.33 rows=13335 width=434) (actual time=0.843..482.864 rows=666738 loops=1)
-> Index Scan using uk_TBL_sk0_account_code on TBL_sk sk10_ (cost=0.14..12.53 rows=1 width=426) (actual time=0.005..0.052 rows=50 loops=1)
Index Cond: (acco_id = 1)
-> Bitmap Heap Scan on TBL_bat bat7_ (cost=285.86..6707.27 rows=15153 width=24) (actual time=2.051..7.902 rows=13335 loops=50)
Recheck Cond: (sku_id = sk10_.id)
Heap Blocks: exact=241295
-> Bitmap Index Scan on ix_al_batc_sku_id (cost=0.00..282.07 rows=15153 width=0) (actual time=1.424..1.424 rows=13335 loops=50)
Index Cond: (sku_id = sk10_.id)
-> Index Scan using ix_al_stle_batc_id on TBL_st st6_ (cost=0.42..0.50 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=666738)
Index Cond: (batc_id = bat7_.id)
Filter: ((date >= '2017-01-20 00:00:00'::timestamp without time zone) AND (date <= '2017-01-30 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 1
-> Hash (cost=2.10..2.10 rows=110 width=13) (actual time=0.033..0.033 rows=110 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 14kB
-> Seq Scan on TBL_bra bra4_ (cost=0.00..2.10 rows=110 width=13) (actual time=0.004..0.013 rows=110 loops=1)
Planning time: 14.542 ms
Execution time: 3894.793 ms
Does anyone have an idea why does this happens.
Did anyone had an experience with anything similar?
Thank you very much.
Kind regards, Petar
I am not sure, but I had a similar case a while ago(On ORACLE but i guess it is not important).
in my case the difference originated at the difference between the amount of data, meaning: if you have 1% of data from the past 30 days, it uses the indexs. when you need "older" data (the rest 99% of the data) it decides to not use the index and to do a full scan(in the form of nested loop and not hash join).
If you sure that the data distribution is ok then maybe try collecting statistics(worked for me at the time). eventually you can start to analyze every peace of this query and to see what part exactly is the bottleneck and work from there.
BTree indexes can have some issues with dates, especially if you're removing old data from the table (ie, deleting everything older than 90 days). It can cause the tables to get lopsided, with all of the rows being down one branch of the tree. Even without removing old dates, if there are many more "new" rows than "old" rows, it can still happen.
But I don't see your query plans using an index on st0_.date, so I don't think that's the issue. If you can afford a table lock on st0_, you can test this theory by running a REINDEX operation on any indexes that contain st0_.date.
Instead, I think you just have a lot more rows that match the 2017-01-20 to 2017-01-30 range vs. the 2017-04-20 to 2017-04-30 range. The first doubly indented Nested Loop is the same in both queries, so I'll ignore it. The second doubly indended stanza is different, and much more expensive in the slow query:
-> Materialize (cost=286.43..13719.50 rows=48 width=434) (actual time=15.584..1986.953 rows=84560 loops=1)
-> Nested Loop (cost=286.43..13719.38 rows=48 width=434) (actual time=15.577..1983.384 rows=2491 loops=1)
-> Nested Loop (cost=286.00..6871.33 rows=13335 width=434) (actual time=0.843..482.864 rows=666738
vs
-> Nested Loop (cost=2.49..3023.47 rows=1 width=434) (actual time=111.345..130.883 rows=86 loops=2474)
-> Hash Join (cost=2.06..2045.18 rows=1905 width=434) (actual time=0.010..28.028 rows=54853 loops=2474)
Materialize can be an expensive operation that doesn't necessarily scale with the estimated cost. Take a look at https://www.postgresql.org/docs/10/static/using-explain.html , and search for "Materialize". Also note that the estimated number of rows is much higher in the slow version.
I'm not 100% sure, but I believe that tweaking the "work_mem" parameter can have some effect in this area (https://www.postgresql.org/docs/9.4/static/runtime-config-resource.html#GUC-WORK-MEM). To test this theory, you can change that value per session using
SET LOCAL work_mem = '8MB';

Postgres SQL slow query aggregation

I have an aggregation query which is ends up to be slow, I am looking for any improvements in "query" or "index".
I indexed all the fieldsI use, maybe i missed something, or maybe you can suggest any ways I can execute this query
query:
EXPLAIN ANALYZE
SELECT HE.fs_perm_sec_id,
HE.TICKER_EXCHANGE,
HE.proper_name,
OP.shares_outstanding,
(SELECT factset_industry_desc
FROM factset_industry_map AS fim
WHERE fim.factset_industry_code = HES.industry_code) AS industry,
// slow aggregation
(SELECT SUM(OIH.current_holdings)
FROM own_inst_holdings OIH
WHERE OIH.fs_perm_sec_id = HE.fs_perm_sec_id) AS inst_holdings
FROM own_prices OP
JOIN h_security_ticker_exchange HE ON OP.fs_perm_sec_id = HE.fs_perm_sec_id
JOIN h_entity_sector HES ON HES.factset_entity_id = HE.factset_entity_id
WHERE HE.ticker_exchange = 'BUD-NYS'
ORDER BY OP.price_date DESC LIMIT 1
Where this piece slows down the query:
(SELECT SUM(OIH.current_holdings)
FROM own_inst_holdings OIH
WHERE OIH.fs_perm_sec_id = HE.fs_perm_sec_id) AS inst_holdings
EXPLAIN ANALYZE
Limit (cost=360.41..360.41 rows=1 width=100) (actual time=920.592..920.592 rows=1 loops=1)
-> Sort (cost=360.41..360.41 rows=1 width=100) (actual time=920.592..920.592 rows=1 loops=1)
Sort Key: op.price_date
Sort Method: top-N heapsort Memory: 25kB
-> Nested Loop (cost=0.26..360.41 rows=1 width=100) (actual time=867.898..920.493 rows=35 loops=1)
-> Nested Loop (cost=0.17..6.43 rows=1 width=104) (actual time=4.882..4.940 rows=35 loops=1)
-> Index Scan using h_sec_exch_factset_entity_id_idx on h_security_ticker_exchange he (cost=0.09..4.09 rows=1 width=92) (actual time=3.611..3.612 rows=1 loops=1)
Index Cond: ((ticker_exchange)::text = 'BUD-NYS'::text)
-> Index Only Scan using own_prices_multiple_idx_1 on own_prices op (cost=0.09..2.25 rows=32 width=23) (actual time=1.258..1.301 rows=35 loops=1)
Index Cond: (fs_perm_sec_id = (he.fs_perm_sec_id)::text)
Heap Fetches: 0
-> Index Scan using h_entity_sector_multiple_idx_3 on h_entity_sector hes (cost=0.09..4.09 rows=1 width=14) (actual time=0.083..0.085 rows=1 loops=35)
Index Cond: (factset_entity_id = he.factset_entity_id)
SubPlan 1
-> Seq Scan on factset_industry_map fim (cost=0.00..2.48 rows=1 width=20) (actual time=0.014..0.031 rows=1 loops=35)
Filter: (factset_industry_code = hes.industry_code)
Rows Removed by Filter: 137
SubPlan 2
-> Aggregate (cost=347.40..347.40 rows=1 width=6) (actual time=26.035..26.035 rows=1 loops=35)
-> Bitmap Heap Scan on own_inst_holdings oih (cost=4.36..347.31 rows=177 width=6) (actual time=0.326..25.658 rows=622 loops=35)
Recheck Cond: ((fs_perm_sec_id)::text = (he.fs_perm_sec_id)::text)
Heap Blocks: exact=22750
-> Bitmap Index Scan on own_inst_holdings_fs_perm_sec_id_idx (cost=0.00..4.35 rows=177 width=0) (actual time=0.232..0.232 rows=662 loops=35)
Index Cond: ((fs_perm_sec_id)::text = (he.fs_perm_sec_id)::text)
Planning time: 5.806 ms
Execution time: 920.778 ms
For this query:
SELECT HE.fs_perm_sec_id, HE.TICKER_EXCHANGE, HE.proper_name, OP.shares_outstanding,
(SELECT factset_industry_desc
FROM factset_industry_map AS fim
WHERE fim.factset_industry_code = HES.industry_code
) AS industry,
(SELECT SUM(OIH.current_holdings)
FROM own_inst_holdings OIH
WHERE OIH.fs_perm_sec_id = HE.fs_perm_sec_id
) AS inst_holdings
FROM own_prices OP JOIN
h_security_ticker_exchange HE
ON OP.fs_perm_sec_id = HE.fs_perm_sec_id JOIN
h_entity_sector HES
ON HES.factset_entity_id = HE.factset_entity_id
WHERE HE.ticker_exchange = 'BUD-NYS'
ORDER BY OP.price_date DESC
LIMIT 1;
You want the following indexes:
h_security_ticker_exchange(ticker_exchange, factset_entity_id, fs_perm_sec_id)
own_prices(fs_perm_sec_id)
h_entity_sector(factset_entity_id)
factset_industry_map(factset_industry_code, factset_industry_desc)
own_inst_holdings(fs_perm_sec_id, current_holdings)

Why PostgreSQL index is not used and suggest indexes [duplicate]

This question already has answers here:
Keep PostgreSQL from sometimes choosing a bad query plan
(6 answers)
Closed 8 years ago.
I am doing a query to select records that lie in a certain geographic region and I am doing some joins and couple of filtering as well.
This is my query:
SELECT "events".* FROM "events" INNER JOIN "albums" ON "albums"."event_id" = "events"."id" INNER JOIN "photos" ON "photos"."album_id" = "albums"."id" WHERE "events"."deleted_at" IS NULL AND "albums"."deleted_at" IS NULL AND "photos"."deleted_at" IS NULL AND (events.latitude BETWEEN -44.197088742316055 AND -23.22003941183816 AND events.longitude BETWEEN 133.226480859375 AND 165.570230859375) GROUP BY events.id HAVING count(albums.id) > 0 ORDER BY start_date DESC
I have the following indexes:
Events:
"events_pkey" PRIMARY KEY, btree (id)
"index_events_on_deleted_at" btree (deleted_at)
"index_events_on_latitude_and_longitude" btree (latitude, longitude)
Albums:
"albums_pkey" PRIMARY KEY, btree (id)
"index_albums_on_deleted_at" btree (deleted_at)
"index_albums_on_event_id" btree (event_id)
Photos:
"photos_pkey" PRIMARY KEY, btree (id)
"index_photos_on_album_id" btree (album_id)
"index_photos_on_deleted_at" btree (deleted_at)
Doing an EXPLAIN ANALYZE results in this and I dont see any usage of my indexes which. I am not sure how to force it to use the indexes. Can anyone help me optimize this?
Sort (cost=4057.46..4057.84 rows=150 width=668) (actual time=556.114..556.187 rows=76 loops=1)
Sort Key: events.start_date
Sort Method: quicksort Memory: 78kB
-> HashAggregate (cost=4050.16..4052.04 rows=150 width=668) (actual time=555.667..555.783 rows=76 loops=1)
Filter: (count(albums.id) > 0)
-> Hash Join (cost=76.14..3946.54 rows=20724 width=668) (actual time=3.675..467.578 rows=48050 loops=1)
Hash Cond: (photos.album_id = albums.id)
-> Seq Scan on photos (cost=0.00..3441.87 rows=59013 width=4) (actual time=0.008..169.206 rows=60599 loops=1)
Filter: (deleted_at IS NULL)
-> Hash (cost=74.10..74.10 rows=163 width=668) (actual time=3.633..3.633 rows=318 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 176kB
-> Hash Join (cost=49.80..74.10 rows=163 width=668) (actual time=1.195..2.519 rows=318 loops=1)
Hash Cond: (albums.event_id = events.id)
-> Seq Scan on albums (cost=0.00..21.47 rows=321 width=8) (actual time=0.011..0.458 rows=321 loops=1)
Filter: (deleted_at IS NULL)
-> Hash (cost=47.92..47.92 rows=150 width=664) (actual time=1.151..1.151 rows=195 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 126kB
-> Seq Scan on events (cost=0.00..47.92 rows=150 width=664) (actual time=0.007..0.488 rows=195 loops=1)
Filter: ((deleted_at IS NULL) AND (latitude >= (-44.1970887423161)::double precision) AND (latitude <= (-23.2200394118382)::double precision) AND (longitude >= 133.226480859375::double precision) AND (longitude <= 165.570230859375::double precision))
Total runtime: 556.459 ms
Thanks!!
EDIT: Thanks for the links. I have tried disabling seqscan. Now my plan is:
Sort (cost=5565.73..5566.10 rows=150 width=46) (actual time=451.208..451.290 rows=76 loops=1)
Sort Key: (date(events.start_date))
Sort Method: quicksort Memory: 31kB
-> GroupAggregate (cost=0.00..5560.31 rows=150 width=46) (actual time=2.990..450.850 rows=76 loops=1)
Filter: (count(albums.id) > 0)
-> Nested Loop (cost=0.00..5454.44 rows=20724 width=46) (actual time=0.077..278.319 rows=48050 loops=1)
-> Merge Join (cost=0.00..205.35 rows=163 width=46) (actual time=0.051..2.856 rows=318 loops=1)
Merge Cond: (events.id = albums.event_id)
-> Index Scan using events_pkey on events (cost=0.00..118.72 rows=150 width=42) (actual time=0.024..0.792 rows=195 loops=1)
Filter: ((deleted_at IS NULL) AND (latitude >= (-44.1970887423161)::double precision) AND (latitude <= (-23.2200394118382)::double precision)
AND (longitude >= 133.226480859375::double precision) AND (longitude <= 165.570230859375::double precision))
-> Index Scan using index_albums_on_event_id on albums (cost=0.00..83.83 rows=321 width=8) (actual time=0.017..0.832 rows=321 loops=1)
Filter: (deleted_at IS NULL)
-> Index Scan using index_photos_on_album_id on photos (cost=0.00..30.27 rows=155 width=4) (actual time=0.010..0.409 rows=151 loops=318)
Index Cond: (album_id = albums.id)
Filter: (deleted_at IS NULL)
Total runtime: 451.562 ms
Still Indexes are not being fully used especially for the latitude and long conditions. Do I have my indexes setup correctly?
EDIT: After looking at answers at http://stackoverflow.com/questions/8228326/how-can-i-avoid-postgresql-sometimes-choosing-a-bad-query-plan-for-one-of-two-ne, I assumed that its behaving like this because my query was returning all records, then I updated the conditions, and the new query plan is:
Sort (cost=786.18..786.22 rows=19 width=668) (actual time=3.754..3.755 rows=2 loops=1)
Sort Key: events.start_date
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=785.54..785.77 rows=19 width=668) (actual time=3.700..3.703 rows=2 loops=1)
Filter: (count(albums.id) > 0)
-> Nested Loop (cost=48.39..765.51 rows=2670 width=668) (actual time=1.116..2.968 rows=543 loops=1)
-> Hash Join (cost=48.39..89.25 rows=21 width=668) (actual time=1.093..1.128 rows=3 loops=1)
Hash Cond: (events.id = albums.event_id)
-> Bitmap Heap Scan on events (cost=9.42..49.44 rows=19 width=664) (actual time=0.061..0.080 rows=9 loops=1)
Recheck Cond: ((latitude >= (-33.7474111086624)::double precision) AND (latitude <= (-33.581678187556)::double precision) AND (longitude >= 151.193933862305::double precision) AND (longitude <= 151.44661940918::double precision))
Filter: (deleted_at IS NULL)
-> Bitmap Index Scan on index_events_on_latitude_and_longitude (cost=0.00..9.42 rows=28 width=0) (actual time=0.050..0.050 rows=9 loops=1)
Index Cond: ((latitude >= (-33.7474111086624)::double precision) AND (latitude <= (-33.581678187556)::double precision) AND (longitude >= 151.193933862305::double precision) AND (longitude <= 151.44661940918::double precision))
-> Hash (cost=34.95..34.95 rows=321 width=8) (actual time=0.992..0.992 rows=321 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 13kB
-> Bitmap Heap Scan on albums (cost=14.74..34.95 rows=321 width=8) (actual time=0.069..0.570 rows=321 loops=1)
Recheck Cond: (deleted_at IS NULL)
-> Bitmap Index Scan on index_albums_on_deleted_at (cost=0.00..14.66 rows=321 width=0) (actual time=0.056..0.056 rows=321 loops=1)
Index Cond: (deleted_at IS NULL)
-> Index Scan using index_photos_on_album_id on photos (cost=0.00..30.27 rows=155 width=4) (actual time=0.014..0.273 rows=181 loops=3)
Index Cond: (album_id = albums.id)
Filter: (deleted_at IS NULL)
Total runtime: 3.958 ms
And the time is verrry less!!
Any suggestions?
This is mostly because you did not set a LIMIT clause on your query. Without LIMIT you always requests all data from your tables, so looking at their indexes too would be insufficient.
SQLFiddle 1, 2 vs. 3, 4
Also note, that FOREIGN KEY does not add indexes (but UNIQUE & PRIMARY KEY constraint does). So you might want to add indexes for albums.event_id & photos.album_id.
SQLFiddle 3 vs. 4
Sorting can use an index, if present. At your query this means an index on events.start_date.

Extremely slow query on 1st run, even with indexes

I have an extremely slow query that is slow despite indexes being used(on the order of 1-3 minutes). Similar queries will be run 4-6 times by the user, so speed is critical.
QUERY:
SELECT SUM(bh.count) AS count,b.time AS batchtime
FROM
batchtimes AS b
INNER JOIN batchtimes_headlines AS bh ON b.hashed_id = bh.batchtime_hashed_id
INNER JOIN headlines_ngrams AS hn ON bh.headline_hashed_id = hn.headline_hashed_id
INNER JOIN ngrams AS n ON hn.ngram_hashed_id = n.hashed_id
INNER JOIN homepages_headlines AS hh ON bh.headline_hashed_id = hh.headline_hashed_id
INNER JOIN homepages AS hp ON hh.homepage_hashed_id = hp.hashed_id
WHERE
b.time IN (SELECT * FROM generate_series('2013-10-10 20:00:00.000000'::timestamp,'2014-02-16 20:00:00.000000'::timestamp,'1 hours'))
AND ( n.gram = 'a' )
AND hp.url = 'www.abcdefg.com'
GROUP BY
b.time
ORDER BY
b.time ASC;
EXPLAIN ANALYZE after very first run:
GroupAggregate (cost=6863.26..6863.79 rows=30 width=12) (actual time=90905.858..90908.889 rows=3039 loops=1)
-> Sort (cost=6863.26..6863.34 rows=30 width=12) (actual time=90905.853..90906.971 rows=19780 loops=1)
Sort Key: b."time"
Sort Method: quicksort Memory: 1696kB
-> Hash Join (cost=90.16..6862.52 rows=30 width=12) (actual time=378.784..90890.636 rows=19780 loops=1)
Hash Cond: (b."time" = generate_series.generate_series)
-> Nested Loop (cost=73.16..6845.27 rows=60 width=12) (actual time=375.644..90859.059 rows=22910 loops=1)
-> Nested Loop (cost=72.88..6740.51 rows=60 width=37) (actual time=375.624..90618.828 rows=22910 loops=1)
-> Nested Loop (cost=42.37..4391.06 rows=1 width=66) (actual time=368.993..54607.402 rows=1213 loops=1)
-> Nested Loop (cost=42.23..4390.18 rows=5 width=99) (actual time=223.681..53051.774 rows=294787 loops=1)
-> Nested Loop (cost=41.68..4379.19 rows=5 width=33) (actual time=223.643..49403.746 rows=294787 loops=1)
-> Index Scan using by_gram_ngrams on ngrams n (cost=0.56..8.58 rows=1 width=33) (actual time=17.001..17.002 rows=1 loops=1)
Index Cond: ((gram)::text = 'a'::text)
-> Bitmap Heap Scan on headlines_ngrams hn (cost=41.12..4359.59 rows=1103 width=66) (actual time=206.634..49273.363 rows=294787 loops=1)
Recheck Cond: ((ngram_hashed_id)::text = (n.hashed_id)::text)
-> Bitmap Index Scan on by_ngramhashedid_headlinesngrams (cost=0.00..40.84 rows=1103 width=0) (actual time=143.430..143.430 rows=294787 loops=1)
Index Cond: ((ngram_hashed_id)::text = (n.hashed_id)::text)
-> Index Scan using by_headlinehashedid_homepagesheadlines on homepages_headlines hh (cost=0.56..2.19 rows=1 width=66) (actual time=0.011..0.011 rows=1 loops=294787)
Index Cond: ((headline_hashed_id)::text = (hn.headline_hashed_id)::text)
-> Index Scan using by_hashedid_homepages on homepages hp (cost=0.14..0.17 rows=1 width=33) (actual time=0.005..0.005 rows=0 loops=294787)
Index Cond: ((hashed_id)::text = (hh.homepage_hashed_id)::text)
Filter: ((url)::text = 'www.abcdefg.com'::text)
Rows Removed by Filter: 1
-> Bitmap Heap Scan on batchtimes_headlines bh (cost=30.51..2333.86 rows=1560 width=70) (actual time=7.977..29.674 rows=19 loops=1213)
Recheck Cond: ((headline_hashed_id)::text = (hn.headline_hashed_id)::text)
-> Bitmap Index Scan on by_headlinehashedid_batchtimesheadlines (cost=0.00..30.12 rows=1560 width=0) (actual time=6.595..6.595 rows=19 loops=1213)
Index Cond: ((headline_hashed_id)::text = (hn.headline_hashed_id)::text)
-> Index Scan using by_hashedid_batchtimes on batchtimes b (cost=0.28..1.74 rows=1 width=41) (actual time=0.009..0.009 rows=1 loops=22910)
Index Cond: ((hashed_id)::text = (bh.batchtime_hashed_id)::text)
-> Hash (cost=14.50..14.50 rows=200 width=8) (actual time=3.130..3.130 rows=3097 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 121kB
-> HashAggregate (cost=12.50..14.50 rows=200 width=8) (actual time=1.819..2.342 rows=3097 loops=1)
-> Function Scan on generate_series (cost=0.00..10.00 rows=1000 width=8) (actual time=0.441..0.714 rows=3097 loops=1)
Total runtime: 90911.001 ms
EXPLAIN ANALYZE after 2nd run:
GroupAggregate (cost=6863.26..6863.79 rows=30 width=12) (actual time=3122.861..3125.796 rows=3039 loops=1)
-> Sort (cost=6863.26..6863.34 rows=30 width=12) (actual time=3122.857..3123.882 rows=19780 loops=1)
Sort Key: b."time"
Sort Method: quicksort Memory: 1696kB
-> Hash Join (cost=90.16..6862.52 rows=30 width=12) (actual time=145.396..3116.467 rows=19780 loops=1)
Hash Cond: (b."time" = generate_series.generate_series)
-> Nested Loop (cost=73.16..6845.27 rows=60 width=12) (actual time=142.406..3102.864 rows=22910 loops=1)
-> Nested Loop (cost=72.88..6740.51 rows=60 width=37) (actual time=142.395..3011.768 rows=22910 loops=1)
-> Nested Loop (cost=42.37..4391.06 rows=1 width=66) (actual time=142.229..2969.144 rows=1213 loops=1)
-> Nested Loop (cost=42.23..4390.18 rows=5 width=99) (actual time=135.799..2142.666 rows=294787 loops=1)
-> Nested Loop (cost=41.68..4379.19 rows=5 width=33) (actual time=135.768..437.824 rows=294787 loops=1)
-> Index Scan using by_gram_ngrams on ngrams n (cost=0.56..8.58 rows=1 width=33) (actual time=0.030..0.031 rows=1 loops=1)
Index Cond: ((gram)::text = 'a'::text)
-> Bitmap Heap Scan on headlines_ngrams hn (cost=41.12..4359.59 rows=1103 width=66) (actual time=135.732..405.943 rows=294787 loops=1)
Recheck Cond: ((ngram_hashed_id)::text = (n.hashed_id)::text)
-> Bitmap Index Scan on by_ngramhashedid_headlinesngrams (cost=0.00..40.84 rows=1103 width=0) (actual time=72.570..72.570 rows=294787 loops=1)
Index Cond: ((ngram_hashed_id)::text = (n.hashed_id)::text)
-> Index Scan using by_headlinehashedid_homepagesheadlines on homepages_headlines hh (cost=0.56..2.19 rows=1 width=66) (actual time=0.005..0.005 rows=1 loops=294787)
Index Cond: ((headline_hashed_id)::text = (hn.headline_hashed_id)::text)
-> Index Scan using by_hashedid_homepages on homepages hp (cost=0.14..0.17 rows=1 width=33) (actual time=0.003..0.003 rows=0 loops=294787)
Index Cond: ((hashed_id)::text = (hh.homepage_hashed_id)::text)
Filter: ((url)::text = 'www.abcdefg.com'::text)
Rows Removed by Filter: 1
-> Bitmap Heap Scan on batchtimes_headlines bh (cost=30.51..2333.86 rows=1560 width=70) (actual time=0.015..0.031 rows=19 loops=1213)
Recheck Cond: ((headline_hashed_id)::text = (hn.headline_hashed_id)::text)
-> Bitmap Index Scan on by_headlinehashedid_batchtimesheadlines (cost=0.00..30.12 rows=1560 width=0) (actual time=0.013..0.013 rows=19 loops=1213)
Index Cond: ((headline_hashed_id)::text = (hn.headline_hashed_id)::text)
-> Index Scan using by_hashedid_batchtimes on batchtimes b (cost=0.28..1.74 rows=1 width=41) (actual time=0.003..0.004 rows=1 loops=22910)
Index Cond: ((hashed_id)::text = (bh.batchtime_hashed_id)::text)
-> Hash (cost=14.50..14.50 rows=200 width=8) (actual time=2.982..2.982 rows=3097 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 121kB
-> HashAggregate (cost=12.50..14.50 rows=200 width=8) (actual time=1.771..2.311 rows=3097 loops=1)
-> Function Scan on generate_series (cost=0.00..10.00 rows=1000 width=8) (actual time=0.439..0.701 rows=3097 loops=1)
Total runtime: 3125.985 ms
I have a 32GB server. Here are the modifications to postgresql.conf:
default_statistics_target = 100
maintenance_work_mem = 1920MB
checkpoint_completion_target = 0.9
effective_cache_size = 16GB
work_mem = 160MB
wal_buffers = 16MB
checkpoint_segments = 32
shared_buffers = 7680MB
DB has recently been Vacuumed, re-indexed, and analyze.
Any suggestions for how to tune this query?
This may or may not answer to your question. i cannot comment above, since i dont have 50 rep's as per Stack overflow. :/
My first question is why Inner Join..? This will return you unwanted Columns in your Inner join result. For example in your query when you inner join
INNER JOIN headlines_ngrams AS hn ON bh.headline_hashed_id = hn.headline_hashed_id
The result will have two columns with same information which is redundant. so for example if you have 100,000,000 rows, you will have bh.headline_hashed_id and hh.headline_hashed_id 100,000,000 entries in each column. in your query above you are joining 5 tables. Plus you are interested in only
SELECT SUM(bh.count) AS count,b.time AS batchtime
so i belive you to use Natural join.
[link] (http://en.wikipedia.org/wiki/Inner_join#Inner_join)
The reason that i can think of why in second attempt you are getting a improved performance is due to cache. People have mentioned above to use temporary table for Generate_series which could be a good option. Plus if you think of using WITH in your query then, you should read this article. link