Why is this DISTINCT/INNER JOIN/ORDER BY PostgreSQL query so slow?

This query takes ~4 seconds to complete:
SELECT DISTINCT "resources_resource"."id",
"resources_resource"."heading",
"resources_resource"."name",
"resources_resource"."old_name",
"resources_resource"."clean_name",
"resources_resource"."sort_name",
"resources_resource"."see_also_id",
"resources_resource"."referenced_passages",
"resources_resource"."resource_type",
"resources_resource"."ord",
"resources_resource"."content",
"resources_resource"."thumb",
"resources_resource"."resource_origin"
FROM "resources_resource"
INNER JOIN "resources_passageresource" ON ("resources_resource"."id" = "resources_passageresource"."resource_id")
WHERE "resources_passageresource"."start_ref" >= 66001001
ORDER BY "resources_resource"."ord" ASC, "resources_resource"."sort_name" ASC LIMIT 5
By popular request, EXPLAIN ANALYZE:
Limit (cost=1125.50..1125.68 rows=5 width=803) (actual time=4434.076..4434.557 rows=5 loops=1)
-> Unique (cost=1125.50..1136.91 rows=326 width=803) (actual time=4434.076..4434.557 rows=5 loops=1)
-> Sort (cost=1125.50..1126.32 rows=326 width=803) (actual time=4434.075..4434.075 rows=6 loops=1)
Sort Key: resources_resource.ord, resources_resource.sort_name, resources_resource.id, resources_resource.heading, resources_resource.name, resources_resource.old_name, resources_resource.clean_name, resources_resource.see_also_id, resources_resource.referenced_passages, resources_resource.resource_type, resources_resource.content, resources_resource.thumb, resources_resource.resource_origin
Sort Method: quicksort Memory: 424kB
-> Hash Join (cost=697.00..1111.89 rows=326 width=803) (actual time=3.453..41.429 rows=424 loops=1)
Hash Cond: (resources_passageresource.resource_id = resources_resource.id)
-> Bitmap Heap Scan on resources_passageresource (cost=10.78..190.19 rows=326 width=4) (actual time=0.107..0.401 rows=424 loops=1)
Recheck Cond: (start_ref >= 66001001)
-> Bitmap Index Scan on resources_passageresource_start_ref (cost=0.00..10.70 rows=326 width=0) (actual time=0.086..0.086 rows=424 loops=1)
Index Cond: (start_ref >= 66001001)
-> Hash (cost=431.32..431.32 rows=2232 width=803) (actual time=3.228..3.228 rows=2232 loops=1)
Buckets: 1024 Batches: 2 Memory Usage: 947kB
-> Seq Scan on resources_resource (cost=0.00..431.32 rows=2232 width=803) (actual time=0.002..1.621 rows=2232 loops=1)
Total runtime: 4435.460 ms
This is ORM-generated SQL. I can work in SQL, but I'm definitely not proficient, and the EXPLAIN output here is mystifying to me. What about this query is dragging me down?
UPDATE: @Ybakos identified that the ORDER BY was causing trouble. Removing the ORDER BY clause altogether helps a bit, but the query still takes 800ms. Here's the EXPLAIN ANALYZE, sans ORDER BY:
HashAggregate (cost=1122.49..1125.75 rows=326 width=803) (actual time=787.519..787.559 rows=104 loops=1)
-> Hash Join (cost=697.00..1111.89 rows=326 width=803) (actual time=3.381..7.312 rows=424 loops=1)
Hash Cond: (resources_passageresource.resource_id = resources_resource.id)
-> Bitmap Heap Scan on resources_passageresource (cost=10.78..190.19 rows=326 width=4) (actual time=0.095..0.686 rows=424 loops=1)
Recheck Cond: (start_ref >= 66001001)
-> Bitmap Index Scan on resources_passageresource_start_ref (cost=0.00..10.70 rows=326 width=0) (actual time=0.079..0.079 rows=424 loops=1)
Index Cond: (start_ref >= 66001001)
-> Hash (cost=431.32..431.32 rows=2232 width=803) (actual time=3.173..3.173 rows=2232 loops=1)
Buckets: 1024 Batches: 2 Memory Usage: 947kB
-> Seq Scan on resources_resource (cost=0.00..431.32 rows=2232 width=803) (actual time=0.002..1.568 rows=2232 loops=1)
Total runtime: 787.678 ms

It seems to me that DISTINCT has to be used to remove duplicates produced by the join. So my question is: why produce the duplicates in the first place? I'm not entirely sure what constraints the query being ORM-generated imposes, but if rewriting it is an option, you could certainly rewrite it in such a way that duplicates never appear. For instance, using IN:
SELECT "resources_resource"."id",
"resources_resource"."heading",
"resources_resource"."name",
"resources_resource"."old_name",
"resources_resource"."clean_name",
"resources_resource"."sort_name",
"resources_resource"."see_also_id",
"resources_resource"."referenced_passages",
"resources_resource"."resource_type",
"resources_resource"."ord",
"resources_resource"."content",
"resources_resource"."thumb",
"resources_resource"."resource_origin"
FROM "resources_resource"
WHERE "resources_resource"."id" IN (
SELECT "resources_passageresource"."resource_id"
FROM "resources_passageresource"
WHERE "resources_passageresource"."start_ref" >= 66001001
)
ORDER BY "resources_resource"."ord" ASC, "resources_resource"."sort_name" ASC LIMIT 5
or using EXISTS:
SELECT "resources_resource"."id",
"resources_resource"."heading",
"resources_resource"."name",
"resources_resource"."old_name",
"resources_resource"."clean_name",
"resources_resource"."sort_name",
"resources_resource"."see_also_id",
"resources_resource"."referenced_passages",
"resources_resource"."resource_type",
"resources_resource"."ord",
"resources_resource"."content",
"resources_resource"."thumb",
"resources_resource"."resource_origin"
FROM "resources_resource"
WHERE EXISTS (
SELECT *
FROM "resources_passageresource"
WHERE "resources_passageresource"."resource_id" = "resources_resource"."id"
AND "resources_passageresource"."start_ref" >= 66001001
)
ORDER BY "resources_resource"."ord" ASC, "resources_resource"."sort_name" ASC LIMIT 5
And, of course, if it's acceptable to rewrite the query completely, I would also remove the long table names in front of column names. Consider the following, for instance (the IN query rewritten):
SELECT "id",
"heading",
"name",
"old_name",
"clean_name",
"sort_name",
"see_also_id",
"referenced_passages",
"resource_type",
"ord",
"content",
"thumb",
"resource_origin"
FROM "resources_resource"
WHERE "resources_resource"."id" IN (
SELECT "resource_id"
FROM "resources_passageresource"
WHERE "start_ref" >= 66001001
)
ORDER BY "ord" ASC, "sort_name" ASC LIMIT 5

It's the combination of ORDER BY with LIMIT.
If you don't have an index on (ord, sort_name) then I bet this is the cause of the slow performance. Or perhaps an index on (start_ref, ord, sort_name) is necessary for this particular query. Lastly, due to that join, perhaps have the left/first table be the one upon which your ORDER BY criteria applies.
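A sketch of what those indexes could look like, assuming the table and column names from the question (the index names are made up, and note that start_ref lives in a different table than ord/sort_name, so the (start_ref, ord, sort_name) variant cannot be a single index):
-- Supports ORDER BY ord, sort_name on resources_resource:
CREATE INDEX resources_resource_ord_sort_name ON resources_resource (ord, sort_name);
-- Supports the start_ref filter plus the join column on resources_passageresource:
CREATE INDEX resources_passageresource_start_ref_resource_id ON resources_passageresource (start_ref, resource_id);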

That seems like a long time in the JOIN. The default memory settings in postgresql.conf are too low for any modern computer. Have you remembered to bump them up?
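For reference, a sketch of the settings that comment is alluding to; the values below are purely illustrative and need to be sized to your machine:
-- In postgresql.conf (illustrative values, not recommendations):
--   shared_buffers = 1GB
--   work_mem = 16MB
--   effective_cache_size = 3GB
-- work_mem can also be raised per session to test its effect on hash joins and sorts:
SET work_mem = '16MB';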

Related

Explain analyze slower than actual query in postgres

I have the following query
select * from activity_feed where user_id in (select following_id from user_follow where follower_id=:user_id)
union
select * from activity_feed where project_id in (select project_id from user_project_follow where user_id=:user_id)
order by id desc limit 30
Which runs in approximately 14 ms according to Postico.
But when I do EXPLAIN ANALYZE on this query, the planning time is 0.5 ms and the execution time is around 800 ms (which is what I would actually expect). Is this because the query without EXPLAIN ANALYZE is returning cached results? I still get results in under 20 ms even when I use other values.
Which one is more indicative of the performance I'll get in production? I also realized that this is a rather inefficient query, and I can't seem to figure out an index that would make it more efficient. It's possible that I will have to drop the UNION.
Edit: the execution plan
Limit (cost=1380.94..1380.96 rows=10 width=148) (actual time=771.111..771.405 rows=10 loops=1)
-> Sort (cost=1380.94..1385.64 rows=1881 width=148) (actual time=771.097..771.160 rows=10 loops=1)
Sort Key: activity_feed."timestamp" DESC
Sort Method: top-N heapsort Memory: 27kB
-> HashAggregate (cost=1321.48..1340.29 rows=1881 width=148) (actual time=714.888..743.273 rows=4462 loops=1)
Group Key: activity_feed.id, activity_feed."timestamp", activity_feed.user_id, activity_feed.verb, activity_feed.object_type, activity_feed.object_id, activity_feed.project_id, activity_feed.privacy_level, activity_feed.local_time, activity_feed.local_date
-> Append (cost=5.12..1274.46 rows=1881 width=148) (actual time=0.998..682.466 rows=4487 loops=1)
-> Hash Join (cost=5.12..610.43 rows=1350 width=70) (actual time=0.982..326.089 rows=3013 loops=1)
Hash Cond: (activity_feed.user_id = user_follow.following_id)
-> Seq Scan on activity_feed (cost=0.00..541.15 rows=24215 width=70) (actual time=0.016..150.535 rows=24215 loops=1)
-> Hash (cost=4.78..4.78 rows=28 width=8) (actual time=0.911..0.922 rows=29 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> Index Only Scan using unique_user_follow_pair on user_follow (cost=0.29..4.78 rows=28 width=8) (actual time=0.022..0.334 rows=29 loops=1)
Index Cond: (follower_id = '17420532762804570'::bigint)
Heap Fetches: 0
-> Hash Join (cost=30.50..635.81 rows=531 width=70) (actual time=0.351..301.945 rows=1474 loops=1)
Hash Cond: (activity_feed_1.project_id = user_project_follow.project_id)
-> Seq Scan on activity_feed activity_feed_1 (cost=0.00..541.15 rows=24215 width=70) (actual time=0.027..143.896 rows=24215 loops=1)
-> Hash (cost=30.36..30.36 rows=11 width=8) (actual time=0.171..0.182 rows=11 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Index Only Scan using idx_user_project_follow_temp on user_project_follow (cost=0.28..30.36 rows=11 width=8) (actual time=0.020..0.102 rows=11 loops=1)
Index Cond: (user_id = '17420532762804570'::bigint)
Heap Fetches: 11
Planning Time: 0.571 ms
Execution Time: 771.774 ms
Thanks for the help in advance!
Very slow clock access like you show here (nearly 100 fold slower when TIMING defaults to ON!) usually indicates either old hardware or an old kernel IME. Not being able to trust EXPLAIN (ANALYZE) to get good data can be very frustrating if you are very particular about performance, so you should consider upgrading your hardware or your OS.
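One way to confirm that the clock, and not the query, is the problem (a sketch; both commands are standard PostgreSQL tooling): rerun the plan with per-node timing disabled, and measure the clock overhead directly with pg_test_timing.
-- Row counts and total time without the per-node clock calls:
EXPLAIN (ANALYZE, TIMING OFF)
select * from activity_feed where user_id in (select following_id from user_follow where follower_id=:user_id)
union
select * from activity_feed where project_id in (select project_id from user_project_follow where user_id=:user_id)
order by id desc limit 30;
-- From the shell, reports the per-call cost of reading the clock:
pg_test_timing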

Why is Postgres query planner affected by LIMIT?

EXPLAIN ANALYZE SELECT "alerts"."id",
"alerts"."created_at",
't1'::text AS src_table
FROM "alerts"
INNER JOIN "devices"
ON "devices"."id" = "alerts"."device_id"
INNER JOIN "sites"
ON "sites"."id" = "devices"."site_id"
WHERE "sites"."cloud_id" = 111
AND "alerts"."created_at" >= '2019-08-30'
ORDER BY "created_at" DESC limit 9;
Limit (cost=1.15..36021.60 rows=9 width=16) (actual time=30.505..29495.765 rows=9 loops=1)
-> Nested Loop (cost=1.15..232132.92 rows=58 width=16) (actual time=30.504..29495.755 rows=9 loops=1)
-> Nested Loop (cost=0.86..213766.42 rows=57231 width=24) (actual time=0.029..29086.323 rows=88858 loops=1)
-> Index Scan Backward using alerts_created_at_index on alerts (cost=0.43..85542.16 rows=57231 width=24) (actual time=0.014..88.137 rows=88858 loops=1)
Index Cond: (created_at >= '2019-08-30 00:00:00'::timestamp without time zone)
-> Index Scan using devices_pkey on devices (cost=0.43..2.23 rows=1 width=16) (actual time=0.016..0.325 rows=1 loops=88858)
Index Cond: (id = alerts.device_id)
-> Index Scan using sites_pkey on sites (cost=0.29..0.31 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=88858)
Index Cond: (id = devices.site_id)
Filter: (cloud_id = 7231)
Rows Removed by Filter: 1
Total runtime: 29495.816 ms
Now we change to LIMIT 10:
EXPLAIN ANALYZE SELECT "alerts"."id",
"alerts"."created_at",
't1'::text AS src_table
FROM "alerts"
INNER JOIN "devices"
ON "devices"."id" = "alerts"."device_id"
INNER JOIN "sites"
ON "sites"."id" = "devices"."site_id"
WHERE "sites"."cloud_id" = 111
AND "alerts"."created_at" >= '2019-08-30'
ORDER BY "created_at" DESC limit 10;
Limit (cost=39521.79..39521.81 rows=10 width=16) (actual time=1.557..1.559 rows=10 loops=1)
-> Sort (cost=39521.79..39521.93 rows=58 width=16) (actual time=1.555..1.555 rows=10 loops=1)
Sort Key: alerts.created_at
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=5.24..39520.53 rows=58 width=16) (actual time=0.150..1.543 rows=11 loops=1)
-> Nested Loop (cost=4.81..16030.12 rows=2212 width=8) (actual time=0.137..0.643 rows=31 loops=1)
-> Index Scan using sites_cloud_id_index on sites (cost=0.29..64.53 rows=31 width=8) (actual time=0.014..0.057 rows=23 loops=1)
Index Cond: (cloud_id = 7231)
-> Bitmap Heap Scan on devices (cost=4.52..512.32 rows=270 width=16) (actual time=0.020..0.025 rows=1 loops=23)
Recheck Cond: (site_id = sites.id)
-> Bitmap Index Scan on devices_site_id_index (cost=0.00..4.46 rows=270 width=0) (actual time=0.006..0.006 rows=9 loops=23)
Index Cond: (site_id = sites.id)
-> Index Scan using alerts_device_id_index on alerts (cost=0.43..10.59 rows=3 width=24) (actual time=0.024..0.028 rows=0 loops=31)
Index Cond: (device_id = devices.id)
Filter: (created_at >= '2019-08-30 00:00:00'::timestamp without time zone)
Rows Removed by Filter: 12
Total runtime: 1.603 ms
The alerts table has millions of records; the other tables have row counts in the thousands.
I can already optimize the query by simply not using limit < 10. What I don't understand is why the LIMIT affects the performance. Perhaps there's a better way than hardcoding this magic number "10".
The number of result rows affects the PostgreSQL optimizer, because plans that return the first few rows quickly are not necessarily plans that return the whole result as fast as possible.
In your case, PostgreSQL thinks that for small values of LIMIT, it will be faster to scan the alerts table in the order of the ORDER BY clause using an index and just join the other tables in a nested loop until it has found 9 rows.
The benefit of such a strategy is that it doesn't have to calculate the complete result of the join, then sort it and throw away all but the first few result rows.
The danger is that it takes longer than expected to find the 9 matching rows, and this is what hits you:
Index Scan Backward using alerts_created_at_index on alerts (cost=0.43..85542.16 rows=57231 width=24) (actual time=0.014..88.137 rows=88858 loops=1)
So PostgreSQL has to process 88858 rows and use a nested loop join (which is inefficient if it has to loop often) until it finds 9 result rows. This may be because it underestimates the selectivity of the conditions, or because the many matching rows all happen to have low created_at.
The number 10 just happens to be the cut-off point where PostgreSQL thinks it will no longer be more efficient to use that strategy; it is a value that will change as the data in the database change.
You can avoid using that plan altogether by using an ORDER BY clause that does not match the index:
ORDER BY (created_at + INTERVAL '0 days') DESC
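Applied to the query from the question, that rewrite looks roughly like this (same result set; qualifying created_at with its table avoids any ambiguity inside the expression):
EXPLAIN ANALYZE SELECT "alerts"."id",
"alerts"."created_at",
't1'::text AS src_table
FROM "alerts"
INNER JOIN "devices"
ON "devices"."id" = "alerts"."device_id"
INNER JOIN "sites"
ON "sites"."id" = "devices"."site_id"
WHERE "sites"."cloud_id" = 111
AND "alerts"."created_at" >= '2019-08-30'
ORDER BY ("alerts"."created_at" + INTERVAL '0 days') DESC LIMIT 9;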

Query Tuning in PostgreSQL [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 4 years ago.
I have a query that is running in 17s, and I cannot think of a way to optimize it. Some help is much needed.
EXPLAIN ANALYSE
CREATE materialized VIEW professores_fizeram_planejamentoTEST as
SELECT unities.id as id_escola,
unities.name as nome_escola,
teachers.id as id_professor,
teachers.name as nome_professor,
datas.dia,
COALESCE((SELECT true
FROM lesson_plans
WHERE lesson_plans.teacher_id = teachers.id and
datas.dia between lesson_plans.start_at and lesson_plans.end_at
LIMIT 1), false) as criou_plano_aula,
COALESCE((select true
from content_records
where content_records.teacher_id = teachers.id and
content_records.record_date = datas.dia
limit 1), false) as criou_registro_conteudo
FROM (SELECT i::date as dia,
EXTRACT(year FROM i::date) as ano
FROM generate_series(date_trunc('year', now()), now(), '1 day'::INTERVAL) i
WHERE EXTRACT(dow from i::timestamp) in (1,2,3,4,5)) datas
JOIN (SELECT distinct teacher_id, classroom_id, YEAR
FROM teacher_discipline_classrooms) teacher_discipline_classrooms ON (teacher_discipline_classrooms.year = datas.ano)
JOIN classrooms on (classrooms.id = teacher_discipline_classrooms.classroom_id)
JOIN unities on (unities.id = classrooms.unity_id)
JOIN teachers on (teachers.id = teacher_discipline_classrooms.teacher_id)
WHERE NOT EXISTS(SELECT 1
FROM school_calendars
JOIN school_calendar_events on (school_calendar_events.school_calendar_id = school_calendars.id and
school_calendar_events.event_type = 'no_school' and
datas.dia between school_calendar_events.start_date and school_calendar_events.end_date)
WHERE school_calendars.unity_id = unities.id)
This query returns the following analysis
Nested Loop (cost=143.840..3721.540 rows=38 width=66) (actual time=1.923..17270.125 rows=171231 loops=1)
-> Nested Loop (cost=143.690..1523.510 rows=38 width=41) (actual time=1.744..5996.571 rows=171231 loops=1)
Join Filter: (NOT (delta 3))
Rows Removed by Join Filter: 15249
-> Nested Loop (cost=143.550..203.530 rows=76 width=16) (actual time=1.661..568.049 rows=186480 loops=1)
-> Hash Join (cost=143.270..165.450 rows=76 width=16) (actual time=1.651..183.740 rows=186660 loops=1)
Hash Cond: ((victor.juliet_seven)::double precision = echo_tango('quebec_four'::text, ((alpha_quebec_whiskey.alpha_quebec_whiskey)::date)::timestamp without time zone))
-> HashAggregate (cost=121.700..127.820 rows=612 width=12) (actual time=1.384..3.336 rows=2388 loops=1)
Group Key: victor.foxtrot_six, victor.oscar_kilo, victor.juliet_seven
-> Seq Scan on victor (cost=0.000..94.400 rows=3640 width=12) (actual time=0.004..0.563 rows=3640 loops=1)
-> Hash (cost=21.260..21.260 rows=25 width=8) (actual time=0.256..0.256 rows=180 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 16kB
-> Function Scan on xray_yankee alpha_quebec_whiskey (cost=0.010..21.260 rows=25 width=8) (actual time=0.081..0.195 rows=180 loops=1)
Filter: (echo_tango('papa'::text, (alpha_quebec_whiskey)::timestamp without time zone) = ANY ('oscar_seven_charlie'::double precision[]))
Rows Removed by Filter: 72
-> Index Scan using echo_victor on uniform (cost=0.280..0.490 rows=1 width=8) (actual time=0.001..0.002 rows=1 loops=186660)
Index Cond: (quebec_seven = victor.oscar_kilo)
-> Index Scan using golf on four (cost=0.140..0.160 rows=1 width=29) (actual time=0.001..0.001 rows=1 loops=186480)
Index Cond: (quebec_seven = uniform.xray_victor)
SubPlan
-> Nested Loop (cost=0.280..34.110 rows=2 width=0) (actual time=0.027..0.027 rows=0 loops=186480)
-> Seq Scan on seven (cost=0.000..1.990 rows=2 width=4) (actual time=0.003..0.008 rows=2 loops=186480)
Filter: (xray_victor = four.quebec_seven)
Rows Removed by Filter: 75
-> Index Scan using alpha_quebec_papa on two (cost=0.280..16.050 rows=1 width=4) (actual time=0.008..0.008 rows=0 loops=372960)
Index Cond: (zulu = seven.quebec_seven)
Filter: (((xray_delta)::text = 'oscar_seven_golf'::text) AND ((alpha_quebec_whiskey.alpha_quebec_whiskey)::date >= foxtrot_three) AND ((alpha_quebec_whiskey.alpha_quebec_whiskey)::date <= lima))
Rows Removed by Filter: 14
-> Index Scan using tango on romeo (cost=0.150..0.200 rows=1 width=29) (actual time=0.001..0.001 rows=1 loops=171231)
Index Cond: (quebec_seven = victor.foxtrot_six)
SubPlan
-> Limit (cost=0.000..20.600 rows=1 width=0) (actual time=0.048..0.048 rows=0 loops=171231)
-> Seq Scan on five (cost=0.000..20.600 rows=1 width=0) (actual time=0.045..0.045 rows=0 loops=171231)
Filter: ((foxtrot_six = romeo.quebec_seven) AND ((alpha_quebec_whiskey.alpha_quebec_whiskey)::date >= oscar_echo) AND ((alpha_quebec_whiskey.alpha_quebec_whiskey)::date <= xray_three))
Rows Removed by Filter: 246
SubPlan
-> Limit (cost=4.810..37.030 rows=1 width=0) (actual time=0.015..0.015 rows=0 loops=171231)
-> Bitmap Heap Scan on whiskey (cost=4.810..37.030 rows=1 width=0) (actual time=0.011..0.011 rows=0 loops=171231)
Recheck Cond: (foxtrot_six = romeo.quebec_seven)
Filter: (foxtrot_tango = (alpha_quebec_whiskey.alpha_quebec_whiskey)::date)
Rows Removed by Filter: 28
Heap Blocks: exact=258248
-> Bitmap Index Scan on juliet_bravo (cost=0.000..4.810 rows=70 width=0) (actual time=0.003..0.003 rows=37 loops=171231)
Index Cond: (foxtrot_six = romeo.quebec_seven)
Thank you.
No, we won't!
Sanitize your query (add aliases and use them, for instance).
COALESCE((SELECT true FROM lesson_plans WHERE lesson_plans.teacher_id = teachers.id and datas.dia between lesson_plans.start_at and lesson_plans.end_at LIMIT 1), false) as criou_plano_aula
... can be replaced by a simple EXISTS(...) subquery (see the sketch after this list).
Your outer query only refers to {unities, teachers, datas}; the rest of the tables are merely junction tables.
If there is a difference in the query plan between expected and observed row counts, your statistics are wrong.
The function scan on generate_series() spoils the query plan. Better to use a physical calendar table, which can be indexed and whose cardinality is known.
Always add the tuning parameters and an estimate of the cardinalities to the question. These are not details.
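A rough sketch of those two suggestions, reusing the tables and column names from the question (untested). The two COALESCE(... LIMIT 1) subqueries in the SELECT list become EXISTS expressions, which are boolean and never NULL, so no COALESCE is needed:
EXISTS (SELECT 1
FROM lesson_plans
WHERE lesson_plans.teacher_id = teachers.id
AND datas.dia BETWEEN lesson_plans.start_at AND lesson_plans.end_at) AS criou_plano_aula,
EXISTS (SELECT 1
FROM content_records
WHERE content_records.teacher_id = teachers.id
AND content_records.record_date = datas.dia) AS criou_registro_conteudo
And the generate_series() derived table can be replaced by a physical calendar table that is built once and indexed (names here are illustrative):
CREATE TABLE calendar AS
SELECT i::date AS dia, EXTRACT(year FROM i)::int AS ano
FROM generate_series(date_trunc('year', now()), now(), '1 day'::interval) i
WHERE EXTRACT(dow FROM i) IN (1, 2, 3, 4, 5);
CREATE INDEX calendar_dia_idx ON calendar (dia);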
This is not an answer, but a comment that doesn't fit in the comments section. If you want to speed up your query, please add some information:
First, please include the execution plan. This will quickly tell us what's going on.
Also, please post:
Existing indexes on lesson_plans.
Existing indexes on content_records.
Number of rows in teacher_discipline_classrooms.
Existing indexes on teacher_discipline_classrooms.
Existing indexes on classrooms.
Existing indexes on unities.
Existing indexes on teachers.
Existing indexes on school_calendars.

How can I optimize this really slow query generated by Django?

Here's my Django ORM query:
Group.objects.filter(public = True)\
.annotate(num_members = Count('members', distinct = True))\
.annotate(num_images = Count('images', distinct = True))\
.order_by(sort)
Unfortunately this is taking over 30 seconds even with just a few dozen Groups. Removing the annotate statements makes the query significantly faster at only 3 ms...
My database backend is Postgres and here's the SQL and explain:
Executed SQL
SELECT ••• FROM "astrobin_apps_groups_group"
LEFT OUTER JOIN "astrobin_apps_groups_group_members" ON (
"astrobin_apps_groups_group"."id" = "astrobin_apps_groups_group_members"."group_id"
)
LEFT OUTER JOIN "astrobin_apps_groups_group_images" ON (
"astrobin_apps_groups_group"."id" = "astrobin_apps_groups_group_images"."group_id")
WHERE "astrobin_apps_groups_group"."public" = true
GROUP BY
"astrobin_apps_groups_group"."id",
"astrobin_apps_groups_group"."date_created",
"astrobin_apps_groups_group"."date_updated",
"astrobin_apps_groups_group"."creator_id",
"astrobin_apps_groups_group"."owner_id",
"astrobin_apps_groups_group"."name",
"astrobin_apps_groups_group"."description",
"astrobin_apps_groups_group"."category",
"astrobin_apps_groups_group"."public",
"astrobin_apps_groups_group"."moderated",
"astrobin_apps_groups_group"."autosubmission",
"astrobin_apps_groups_group"."forum_id"
ORDER BY "astrobin_apps_groups_group"."date_updated" ASC
Time
30455.9268951 ms
QUERY PLAN
GroupAggregate (cost=5910.49..8288.54 rows=216 width=242) (actual time=29255.329..30269.284 rows=27 loops=1)
-> Sort (cost=5910.49..6068.88 rows=63357 width=242) (actual time=29253.278..29788.601 rows=201888 loops=1)
Sort Key: astrobin_apps_groups_group.date_updated, astrobin_apps_groups_group.id, astrobin_apps_groups_group.date_created, astrobin_apps_groups_group.creator_id, astrobin_apps_groups_group.owner_id, astrobin_apps_groups_group.name, astrobin_apps_groups_group.description, astrobin_apps_groups_group.category, astrobin_apps_groups_group.public, astrobin_apps_groups_group.moderated, astrobin_apps_groups_group.autosubmission, astrobin_apps_groups_group.forum_id
Sort Method: external merge Disk: 70176kB
-> Hash Right Join (cost=15.69..857.39 rows=63357 width=242) (actual time=1.903..397.613 rows=201888 loops=1)
Hash Cond: (astrobin_apps_groups_group_images.group_id = astrobin_apps_groups_group.id)
-> Seq Scan on astrobin_apps_groups_group_images (cost=0.00..106.05 rows=6805 width=8) (actual time=0.024..12.510 rows=6837 loops=1)
-> Hash (cost=12.31..12.31 rows=270 width=238) (actual time=1.853..1.853 rows=323 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 85kB
-> Hash Right Join (cost=3.63..12.31 rows=270 width=238) (actual time=0.133..1.252 rows=323 loops=1)
Hash Cond: (astrobin_apps_groups_group_members.group_id = astrobin_apps_groups_group.id)
-> Seq Scan on astrobin_apps_groups_group_members (cost=0.00..4.90 rows=290 width=8) (actual time=0.004..0.348 rows=333 loops=1)
-> Hash (cost=3.29..3.29 rows=27 width=234) (actual time=0.103..0.103 rows=27 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 7kB
-> Seq Scan on astrobin_apps_groups_group (cost=0.00..3.29 rows=27 width=234) (actual time=0.004..0.049 rows=27 loops=1)
Filter: public
Total runtime: 30300.606 ms
It would be great if somebody could suggest a way to optimize this. I feel like I'm missing some really low-hanging fruit.
Thanks!
What indexes are present on the astrobin_apps_groups_group, astrobin_apps_groups_group_members and astrobin_apps_groups_group_images tables?
Are any aggregate functions like SUM or COUNT used in your SELECT? If not, you can remove all columns from GROUP BY.
The plan shows most of the time is taken by sorting. If you create an index on the date_updated field with NULLS LAST and the latest values first, the planner may use this index for sorting.
For the sort, disk is being used, which is the most costly option. This happens because the data collected for sorting does not fit in memory. Try increasing work_mem: SET work_mem = '10MB'; SELECT ...
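A sketch of those two suggestions using the names from the question (the index name is made up; the plan above shows the sort spilling roughly 70 MB to disk, so work_mem would need to be at least that large to keep the sort in memory):
-- Index on the sort column, as suggested above (a plain index on (date_updated) can also be scanned in either direction):
CREATE INDEX astrobin_apps_groups_group_date_updated ON astrobin_apps_groups_group (date_updated DESC NULLS LAST);
-- Raise work_mem for the current session before running the query:
SET work_mem = '100MB';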

How do I speed up this Django PostgreSQL query?

I'm trying to speed up what seems to me to be quite a slow PostgreSQL query in Django (150 ms).
This is the query:
SELECT ••• FROM "predictions_prediction"
INNER JOIN "minute_in_time_minute"
ON ( "predictions_prediction"."minute_id" = "minute_in_time_minute"."id" )
WHERE ("minute_in_time_minute"."datetime" >= '2014-08-21 13:12:00+00:00'
AND "predictions_prediction"."location_id" = 1
AND "minute_in_time_minute"."datetime" < '2014-08-24 13:12:00+00:00'
AND "predictions_prediction"."tide_level" >= 3.0)
ORDER BY "minute_in_time_minute"."datetime" ASC
Here's the result of the PostgreSQL EXPLAIN:
Sort (cost=17731.45..17739.78 rows=3331 width=32) (actual time=151.755..151.901 rows=3515 loops=1)
Sort Key: minute_in_time_minute.datetime
Sort Method: quicksort Memory: 371kB
-> Hash Join (cost=3187.44..17536.56 rows=3331 width=32) (actual time=96.757..150.693 rows=3515 loops=1)
Hash Cond: (predictions_prediction.minute_id = minute_in_time_minute.id)
-> Seq Scan on predictions_prediction (cost=0.00..11232.00 rows=411175 width=20) (actual time=0.017..88.063 rows=410125 loops=1)
Filter: ((tide_level >= 3::double precision) AND (location_id = 1))
Rows Removed by Filter: 115475
-> Hash (cost=3134.21..3134.21 rows=4258 width=12) (actual time=9.221..9.221 rows=4320 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 203kB
-> Bitmap Heap Scan on minute_in_time_minute (cost=92.07..3134.21 rows=4258 width=12) (actual time=1.147..8.220 rows=4320 loops=1)
Recheck Cond: ((datetime >= '2014-08-21 13:18:00+00'::timestamp with time zone) AND (datetime < '2014-08-24 13:18:00+00'::timestamp with time zone))
-> Bitmap Index Scan on minute_in_time_minute_datetime_key (cost=0.00..91.00 rows=4258 width=0) (actual time=0.851..0.851 rows=4320 loops=1)
Index Cond: ((datetime >= '2014-08-21 13:18:00+00'::timestamp with time zone) AND (datetime < '2014-08-24 13:18:00+00'::timestamp with time zone))
I've tried visualising it in an external tool (http://explain.depesz.com/s/CWW), which shows that the problem starts with the Seq Scan on predictions_prediction.
What I've tried so far:
Add an index on predictions_prediction.tide_level
Add a composite index on tide_level and location on predictions_prediction
But neither had any effect as far as I could see.
Can someone please help me interpret the query plan?
Thanks
Try creating the following composite index on predictions_prediction
(minute_id, location_id, tide_level)
150ms for this type of query (a join, several where conditions with inequalities and an order by) is actually relatively normal, so you might not be able to speed it up much more.
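In SQL, that suggestion would be along these lines (the index name is made up); putting the join/equality columns first and the range-filtered column last is the usual rule of thumb for composite b-tree indexes:
CREATE INDEX predictions_prediction_minute_location_tide ON predictions_prediction (minute_id, location_id, tide_level);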