I'm trying to optimize the following join query:
A notification is a record that says whether a user has read some activity. One notification points to one activity but many users can be notified about an activity. The activity record has some columns such as the workspace the activity is in and the type of activity.
This query gets the user's read, non-comment notifications in a specific workspace, ordered by time.
explain analyze
select activity.id from activity, notification
where notification.user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
and notification.read = true
and notification.activity_id = activity.id
and activity.space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'
and activity.type != 'commented'
order by activity.end_time desc
limit 20;
The problem is that this query has to run through every notification the user has ever gotten.
Limit (cost=4912.35..4912.36 rows=1 width=24) (actual time=138.767..138.779 rows=20 loops=1)
-> Sort (cost=4912.35..4912.36 rows=1 width=24) (actual time=138.766..138.770 rows=20 loops=1)
Sort Key: activity.end_time DESC
Sort Method: top-N heapsort Memory: 27kB
-> Nested Loop (cost=32.57..4912.34 rows=1 width=24) (actual time=1.354..138.606 rows=447 loops=1)
-> Bitmap Heap Scan on notification (cost=32.01..3847.48 rows=124 width=16) (actual time=1.341..6.639 rows=1218 loops=1)
Recheck Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
Filter: read
Rows Removed by Filter: 4101
Heap Blocks: exact=4774
-> Bitmap Index Scan on notification_user_id_idx (cost=0.00..31.98 rows=988 width=0) (actual time=0.719..0.719 rows=5355 loops=1)
Index Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
-> Index Scan using activity_pkey on activity (cost=0.56..8.59 rows=1 width=24) (actual time=0.108..0.108 rows=0 loops=1218)
Index Cond: (id = notification.activity_id)
Filter: ((type <> 'commented'::activity_type) AND (space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'::uuid))
Rows Removed by Filter: 1
Planning time: 0.428 ms
Execution time: 138.825 ms
Edit: Here is the performance after the cache has been warmed.
Limit (cost=4912.35..4912.36 rows=1 width=24) (actual time=13.618..13.629 rows=20 loops=1)
-> Sort (cost=4912.35..4912.36 rows=1 width=24) (actual time=13.617..13.621 rows=20 loops=1)
Sort Key: activity.end_time DESC
Sort Method: top-N heapsort Memory: 27kB
-> Nested Loop (cost=32.57..4912.34 rows=1 width=24) (actual time=1.365..13.447 rows=447 loops=1)
-> Bitmap Heap Scan on notification (cost=32.01..3847.48 rows=124 width=16) (actual time=1.352..6.606 rows=1218 loops=1)
Recheck Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
Filter: read
Rows Removed by Filter: 4101
Heap Blocks: exact=4774
-> Bitmap Index Scan on notification_user_id_idx (cost=0.00..31.98 rows=988 width=0) (actual time=0.729..0.729 rows=5355 loops=1)
Index Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
-> Index Scan using activity_pkey on activity (cost=0.56..8.59 rows=1 width=24) (actual time=0.005..0.005 rows=0 loops=1218)
Index Cond: (id = notification.activity_id)
Filter: ((type <> 'commented'::activity_type) AND (space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'::uuid))
Rows Removed by Filter: 1
Planning time: 0.438 ms
Execution time: 13.673 ms
I could create a multi-column index on user_id and read, but that doesn't solve the issue I'm trying to solve.
I could solve this problem myself by manually denormalizing the data, adding the space_id, type, and end_time columns to the notification record, but that seems like it should be unnecessary.
I would expect Postgres to be able to create an index across the two tables, but everything I've read so far says this isn't possible.
So my question: what is the best way to optimize this query?
Edit: After creating the suggested indexes:
create index tmp_index_1 on activity using btree (
space_id,
id,
end_time
) where (
type != 'commented'
);
create index tmp_index_2 on notification using btree (
user_id,
activity_id
) where (
read = true
);
The query performance improved 3x.
explain analyse
select activity.id from activity
INNER JOIN notification ON notification.user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
and notification.read = true
and notification.activity_id = activity.id
and activity.space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'
and activity.type != 'commented'
order by activity.end_time desc
limit 20;
Limit (cost=955.26..955.27 rows=1 width=24) (actual time=4.386..4.397 rows=20 loops=1)
-> Sort (cost=955.26..955.27 rows=1 width=24) (actual time=4.385..4.389 rows=20 loops=1)
Sort Key: activity.end_time DESC
Sort Method: top-N heapsort Memory: 27kB
-> Nested Loop (cost=1.12..955.25 rows=1 width=24) (actual time=0.035..4.244 rows=447 loops=1)
-> Index Only Scan using tmp_index_2 on notification (cost=0.56..326.71 rows=124 width=16) (actual time=0.017..1.039 rows=1218 loops=1)
Index Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
Heap Fetches: 689
-> Index Only Scan using tmp_index_1 on activity (cost=0.56..5.07 rows=1 width=24) (actual time=0.002..0.002 rows=0 loops=1218)
Index Cond: ((space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'::uuid) AND (id = notification.activity_id))
Heap Fetches: 1
Planning time: 0.484 ms
Execution time: 4.428 ms
The one thing that still bothers me about this query is the rows=1218 and loops=1218. This query is looping through all of the user's read notifications and probing the activity table for each one.
I would expect to be able to create a single index to read all of this in a manner that would mimic denormalizing this data. For example, if I add space_id, type, and end_time to the notification table, I could create the following index and read in fractions of a millisecond.
create index tmp_index_3 on notification using btree (
user_id,
space_id,
end_time desc
) where (
read = true
and type != 'commented'
);
Is this not currently possible within Postgres without denormalizing?
Add these indexes:
create index ix1_activity on activity (space_id, type, end_time, id);
create index ix2_notification on notification (activity_id, user_id, read);
These two "covering indexes" could make your query really fast.
Additionally, with a little bit of luck, it will read the activity table first (only 20 rows), and perform a Nested Loop Join (NLJ) on notification. That is, a very limited index walk.
Looking at your query, you should create composite indexes to support the filters:
on table notification, columns: user_id, read, activity_id
on table activity, columns: space_id, type, id
To also cover the ORDER BY, you could add end_time to the composite index on activity:
on table activity, columns: space_id, type, id, end_time
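Spelled out as DDL (the index names here are placeholders, not from the question), that suggestion would be roughly:

```sql
create index ix_notification_user_read_activity
    on notification (user_id, read, activity_id);

create index ix_activity_space_type_id_endtime
    on activity (space_id, type, id, end_time);
```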
You should also use explicit INNER JOIN syntax:
select activity.id from activity
INNER JOIN notification ON notification.user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
and notification.read = true
and notification.activity_id = activity.id
and activity.space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'
and activity.type != 'commented'
order by activity.end_time desc
limit 20;
Related
I'm trying to create some queries in order to implement cursor pagination (something like this: https://shopify.engineering/pagination-relative-cursors) on Postgres. In my implementation I'm trying to achieve efficient pagination even when ordering by non-unique columns.
I'm struggling to do that efficiently, in particular on the query that retrieves the previous page given a specific cursor.
The table that I'm using (>3M records) to test these queries is very simple, and it has this structure:
CREATE TABLE "placemarks" (
"id" serial NOT NULL,
"assetId" text,
"createdAt" timestamptz,
PRIMARY KEY ("id")
);
I clearly have an index on the id field (it's the primary key), and I also have an index on the assetId column.
This is the query I'm using for retrieving the next page given a cursor composed by the latest ID and the latest assetId:
SELECT
*
FROM
"placemarks"
WHERE
"assetId" > 'CURSOR_ASSETID'
or("assetId" = 'CURSOR_ASSETID'
AND id > CURSOR_INT_ID)
ORDER BY
"assetId",
id
LIMIT 5;
This query is actually pretty fast: it uses the indexes, and by falling back to the unique id field it also handles duplicated values on assetId without skipping rows that share the same CURSOR_ASSETID value.
-> Sort (cost=25709.62..25726.63 rows=6803 width=2324) (actual time=0.128..0.138 rows=5 loops=1)
Sort Key: "assetId", id
Sort Method: top-N heapsort Memory: 45kB
-> Bitmap Heap Scan on placemarks (cost=271.29..25596.63 rows=6803 width=2324) (actual time=0.039..0.088 rows=11 loops=1)
Recheck Cond: ((("assetId")::text > 'CURSOR_ASSETID'::text) OR (("assetId")::text = 'CURSOR_ASSETID'::text))
Filter: ((("assetId")::text > 'CURSOR_ASSETID'::text) OR ((("assetId")::text = 'CURSOR_ASSETID'::text) AND (id > CURSOR_INT_ID)))
Rows Removed by Filter: 1
Heap Blocks: exact=10
-> BitmapOr (cost=271.29..271.29 rows=6803 width=0) (actual time=0.030..0.034 rows=0 loops=1)
-> Bitmap Index Scan on "placemarks_assetId_key" (cost=0.00..263.45 rows=6802 width=0) (actual time=0.023..0.023 rows=11 loops=1)
Index Cond: (("assetId")::text > 'CURSOR_ASSETID'::text)
-> Bitmap Index Scan on "placemarks_assetId_key" (cost=0.00..4.44 rows=1 width=0) (actual time=0.005..0.005 rows=1 loops=1)
Index Cond: (("assetId")::text = 'CURSOR_ASSETID'::text)
Planning time: 0.201 ms
Execution time: 0.194 ms
The issue is when I try to get the same page but with the query that should return me the previous page:
SELECT
*
FROM
placemarks
WHERE
"assetId" < 'CURSOR_ASSETID'
or("assetId" = 'CURSOR_ASSETID'
AND id < CURSOR_INT_ID)
ORDER BY
"assetId" desc,
id desc
LIMIT 5;
With this query no indexes are used, even though using them would be much faster:
Limit (cost=933644.62..933644.63 rows=5 width=2324)
-> Sort (cost=933644.62..944647.42 rows=4401120 width=2324)
Sort Key: "assetId" DESC, id DESC
-> Seq Scan on placemarks (cost=0.00..860543.60 rows=4401120 width=2324)
Filter: ((("assetId")::text < 'CURSOR_ASSETID'::text) OR ((("assetId")::text = 'CURSOR_ASSETID'::text) AND (id < CURSOR_INT_ID)))
I've noticed that by forcing index usage with SET enable_seqscan = OFF; the query does use the indexes and performs much better. The resulting query plan:
Limit (cost=12.53..12.54 rows=5 width=108) (actual time=0.532..0.555 rows=5 loops=1)
-> Sort (cost=12.53..12.55 rows=6 width=108) (actual time=0.524..0.537 rows=5 loops=1)
Sort Key: assetid DESC, id DESC
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on "placemarks" (cost=8.33..12.45 rows=6 width=108) (actual time=0.274..0.340 rows=14 loops=1)
Recheck Cond: ((assetid < 'CURSOR_ASSETID'::text) OR (assetid = 'CURSOR_ASSETID'::text))
Filter: ((assetid < 'CURSOR_ASSETID'::text) OR ((assetid = 'CURSOR_ASSETID'::text) AND (id < 14)))
Rows Removed by Filter: 1
Heap Blocks: exact=1
-> BitmapOr (cost=8.33..8.33 rows=7 width=0) (actual time=0.152..0.159 rows=0 loops=1)
-> Bitmap Index Scan on "placemarks_assetid_idx" (cost=0.00..4.18 rows=6 width=0) (actual time=0.108..0.110 rows=12 loops=1)
Index Cond: (assetid < 'CURSOR_ASSETID'::text)
-> Bitmap Index Scan on "placemarks_assetid_idx" (cost=0.00..4.15 rows=1 width=0) (actual time=0.036..0.036 rows=3 loops=1)
Index Cond: (assetid = 'CURSOR_ASSETID'::text)
Planning time: 1.319 ms
Execution time: 0.918 ms
Any clue how to optimize the second query so that it always uses the indexes?
Postgres DB version: 10.20
The fast performance of your first query seems to be down to luck of where your constant 'CURSOR_ASSETID' falls in the distribution of that column. Or maybe this luck is not luck but is how it will always be?
For good performance more generally, including for reverse sorting, you need to write your query with a tuple comparator, not an OR comparator.
WHERE
("assetId",id) < ('something',500000)
If you are using a version before incremental sorting was introduced in v13, or if "assetId" can have a large number of ties, then you will need a multicolumn index on ("assetId",id) to get optimal performance.
And there is no reason to decorate the index with DESC, as PostgreSQL knows how to read the index backwards. Decorating the index is needed when the two columns have different ordering than each other, as then you would need to read the undecorated index "spirally" rather than either completely forward or completely backwards. (But that wouldn't work well here anyway, as tuple comparators can't have different orderings between the columns.)
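Putting both pieces together, the previous-page query could be rewritten along these lines (a sketch, keeping the placeholder cursor values from the question):

```sql
-- Multicolumn index to support the tuple comparison; no DESC decoration is
-- needed because PostgreSQL can scan the index backwards.
CREATE INDEX placemarks_assetid_id_idx ON placemarks ("assetId", id);

SELECT *
FROM placemarks
WHERE ("assetId", id) < ('CURSOR_ASSETID', CURSOR_INT_ID)
ORDER BY "assetId" DESC, id DESC
LIMIT 5;
```

The next-page query is symmetric: use > in the tuple comparison and drop the DESC from the ORDER BY.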
I am trying to execute the same SQL but with different values in the where clause. One query is taking significantly longer to process than the other. I have also observed that the execution plans for the two queries are different:
Query1 and Execution Plan:
explain analyze
select t."postal_code"
from dev."postal_master" t
left join dev."premise_master" f
on t."primary_code" = f."primary_code"
and t."name" = f."name"
and t."final_code" = f."final_code"
where 1 = 1 and t."region" = 'US'
and t."name" = 'UBQ'
and t."accountModCode" = 'LTI'
and t."modularity_code" = 'PHA'
group by t."postal_code", t."modularity_code", t."region",
t."feature", t."granularity"
Group (cost=4.19..4.19 rows=1 width=38) (actual time=76411.456..76414.348 rows=11871 loops=1)
Group Key: t."postal_code", t."modularity_code", t."region", t."feature", t.granularity
-> Sort (cost=4.19..4.19 rows=1 width=38) (actual time=76411.452..76412.045 rows=11879 loops=1)
Sort Key: t."postal_code", t."feature", t.granularity
Sort Method: quicksort Memory: 2055kB
-> Nested Loop Left Join (cost=0.17..4.19 rows=1 width=38) (actual time=45.373..76362.219 rows=11879 loops=1)
Join Filter: (((t."name")::text = (f."name")::text) AND ((t."primary_code")::text = (f."primary_code")::text) AND ((t."final_code")::text = (f."final_code")::text))
Rows Removed by Join Filter: 150642887
-> Index Scan using idx_postal_code_source on postal_master t (cost=0.09..2.09 rows=1 width=72) (actual time=36.652..154.339 rows=11871 loops=1)
Index Cond: (("name")::text = 'UBQ'::text)
Filter: ((("region")::text = 'US'::text) AND (("accountModCode")::text = 'LTI'::text) AND (("modularity_code")::text = 'PHA'::text))
Rows Removed by Filter: 550164
-> Index Scan using idx_postal_master_source on premise_master f (cost=0.08..2.09 rows=1 width=35) (actual time=0.016..3.720 rows=12690 loops=11871)
Index Cond: (("name")::text = 'UBQ'::text)
Planning Time: 1.196 ms
Execution Time: 76415.004 ms
Query2 and Execution plan:
explain analyze
select t."postal_code"
from dev."postal_master" t
left join dev."premise_master" f
on t."primary_code" = f."primary_code"
and t."name" = f."name"
and t."final_code" = f."final_code"
where 1 = 1 and t."region" = 'DE'
and t."name" = 'EME'
and t."accountModCode" = 'QEW'
and t."modularity_code" = 'NFX'
group by t."postal_code", t."modularity_code", t."region",
t."feature", t."granularity"
Group (cost=50302.96..50426.04 rows=1330 width=38) (actual time=170.687..184.772 rows=8230 loops=1)
Group Key: t."postal_code", t."modularity_code", t."region", t."feature", t.granularity
-> Gather Merge (cost=50302.96..50423.27 rows=1108 width=38) (actual time=170.684..182.965 rows=8230 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Group (cost=49302.95..49304.62 rows=554 width=38) (actual time=164.446..165.613 rows=2743 loops=3)
Group Key: t."postal_code", t."modularity_code", t."region", t."feature", t.granularity
-> Sort (cost=49302.95..49303.23 rows=554 width=38) (actual time=164.444..164.645 rows=3432 loops=3)
Sort Key: t."postal_code", t."feature", t.granularity
Sort Method: quicksort Memory: 550kB
Worker 0: Sort Method: quicksort Memory: 318kB
Worker 1: Sort Method: quicksort Memory: 322kB
-> Nested Loop Left Join (cost=1036.17..49297.90 rows=554 width=38) (actual time=2.143..148.372 rows=3432 loops=3)
-> Parallel Bitmap Heap Scan on territory_postal_mapping t (cost=1018.37..38323.78 rows=554 width=72) (actual time=1.898..11.849 rows=2743 loops=3)
Recheck Cond: ((("accountModCode")::text = 'QEW'::text) AND (("region")::text = 'DE'::text) AND (("name")::text = 'EME'::text))
Filter: (("modularity_code")::text = 'NFX'::text)
Rows Removed by Filter: 5914
Heap Blocks: exact=2346
-> Bitmap Index Scan on territorypostal__source_region_mod (cost=0.00..1018.31 rows=48088 width=0) (actual time=4.783..4.783 rows=25973 loops=1)
Index Cond: ((("accountModCode")::text = 'QEW'::text) AND (("region")::text = 'DE'::text) AND (("name")::text = 'EME'::text))
-> Bitmap Heap Scan on premise_master f (cost=17.80..19.81 rows=1 width=35) (actual time=0.047..0.048 rows=1 loops=8230)
Recheck Cond: (((t."primary_code")::text = ("primary_code")::text) AND ((t."final_code")::text = ("final_code")::text))
Filter: ((("name")::text = 'EME'::text) AND ((t."name")::text = ("name")::text))
Heap Blocks: exact=1955
-> BitmapAnd (cost=17.80..17.80 rows=1 width=0) (actual time=0.046..0.046 rows=0 loops=8230)
-> Bitmap Index Scan on premise_master__accountprimarypostal (cost=0.00..1.95 rows=105 width=0) (actual time=0.008..0.008 rows=24 loops=8230)
Index Cond: ((t."primary_code")::text = ("primary_code")::text)
-> Bitmap Index Scan on premise_master__accountfinalterritorycode (cost=0.00..15.80 rows=1403 width=0) (actual time=0.065..0.065 rows=559 loops=4568)
Index Cond: ((t."final_code")::text = ("final_code")::text)
Planning Time: 1.198 ms
Execution Time: 185.197 ms
I am aware that there will be a different number of rows depending on the where condition, but is that the only reason for the different execution plans? Also, how can I improve the performance of the first query?
The estimates are totally wrong for the first query, so it is no surprise that PostgreSQL picks a bad plan. Try these measures one after the other and see if they help:
Collect statistics:
ANALYZE premise_master, postal_master;
Calculate more precise statistics:
ALTER TABLE premise_master ALTER name SET statistics 1000;
ALTER TABLE postal_master ALTER name SET statistics 1000;
ANALYZE premise_master, postal_master;
The estimates in the first query are off in such a bad way that I suspect that there is an exceptional problem, like an upgrade with pg_upgrade where you forgot to run ANALYZE afterwards, or you are wiping the database statistics with pg_stat_reset().
If that is not the case, and a simple ANALYZE of the tables did the trick, the cause of the problem must be that autoanalyze does not run often enough on these tables. You can tune autovacuum to do that more often with a statement like this:
ALTER TABLE premise_master SET (autovacuum_analyze_scale_factor = 0.01);
That would make PostgreSQL collect statistics whenever 1% of the table has changed.
The first line of each EXPLAIN ANALYZE output suggests that the planner only expected 1 row from the first query, while it expected 1330 from the second, so that's probably why it chose a less efficient query plan. That usually means table statistics aren't up to date, and when they were last collected there weren't many rows that would have matched the first query (maybe the data was being loaded in alphabetical order?). In this case the fix is to execute an ANALYZE dev."postal_master"; query to refresh the statistics.
You could also try removing the GROUP BY clause entirely (if your tooling allows). I could be misreading but it doesn't look like it's affecting the output much. If that results in unwanted duplicates you can use select distinct t.postal_code instead of the group by.
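For example, a sketch of the first query rewritten with DISTINCT instead of GROUP BY (same filters as in the question; verify it returns the rows you expect before relying on it):

```sql
SELECT DISTINCT t."postal_code"
FROM dev."postal_master" t
LEFT JOIN dev."premise_master" f
       ON t."primary_code" = f."primary_code"
      AND t."name" = f."name"
      AND t."final_code" = f."final_code"
WHERE t."region" = 'US'
  AND t."name" = 'UBQ'
  AND t."accountModCode" = 'LTI'
  AND t."modularity_code" = 'PHA';
```

Note that DISTINCT on postal_code alone may collapse rows that the five-column GROUP BY kept separate, which is fine here since only postal_code is selected.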
EXPLAIN ANALYZE SELECT "alerts"."id",
"alerts"."created_at",
't1'::text AS src_table
FROM "alerts"
INNER JOIN "devices"
ON "devices"."id" = "alerts"."device_id"
INNER JOIN "sites"
ON "sites"."id" = "devices"."site_id"
WHERE "sites"."cloud_id" = 111
AND "alerts"."created_at" >= '2019-08-30'
ORDER BY "created_at" DESC limit 9;
Limit (cost=1.15..36021.60 rows=9 width=16) (actual time=30.505..29495.765 rows=9 loops=1)
-> Nested Loop (cost=1.15..232132.92 rows=58 width=16) (actual time=30.504..29495.755 rows=9 loops=1)
-> Nested Loop (cost=0.86..213766.42 rows=57231 width=24) (actual time=0.029..29086.323 rows=88858 loops=1)
-> Index Scan Backward using alerts_created_at_index on alerts (cost=0.43..85542.16 rows=57231 width=24) (actual time=0.014..88.137 rows=88858 loops=1)
Index Cond: (created_at >= '2019-08-30 00:00:00'::timestamp without time zone)
-> Index Scan using devices_pkey on devices (cost=0.43..2.23 rows=1 width=16) (actual time=0.016..0.325 rows=1 loops=88858)
Index Cond: (id = alerts.device_id)
-> Index Scan using sites_pkey on sites (cost=0.29..0.31 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=88858)
Index Cond: (id = devices.site_id)
Filter: (cloud_id = 7231)
Rows Removed by Filter: 1
Total runtime: 29495.816 ms
Now we change to LIMIT 10:
EXPLAIN ANALYZE SELECT "alerts"."id",
"alerts"."created_at",
't1'::text AS src_table
FROM "alerts"
INNER JOIN "devices"
ON "devices"."id" = "alerts"."device_id"
INNER JOIN "sites"
ON "sites"."id" = "devices"."site_id"
WHERE "sites"."cloud_id" = 111
AND "alerts"."created_at" >= '2019-08-30'
ORDER BY "created_at" DESC limit 10;
Limit (cost=39521.79..39521.81 rows=10 width=16) (actual time=1.557..1.559 rows=10 loops=1)
-> Sort (cost=39521.79..39521.93 rows=58 width=16) (actual time=1.555..1.555 rows=10 loops=1)
Sort Key: alerts.created_at
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=5.24..39520.53 rows=58 width=16) (actual time=0.150..1.543 rows=11 loops=1)
-> Nested Loop (cost=4.81..16030.12 rows=2212 width=8) (actual time=0.137..0.643 rows=31 loops=1)
-> Index Scan using sites_cloud_id_index on sites (cost=0.29..64.53 rows=31 width=8) (actual time=0.014..0.057 rows=23 loops=1)
Index Cond: (cloud_id = 7231)
-> Bitmap Heap Scan on devices (cost=4.52..512.32 rows=270 width=16) (actual time=0.020..0.025 rows=1 loops=23)
Recheck Cond: (site_id = sites.id)
-> Bitmap Index Scan on devices_site_id_index (cost=0.00..4.46 rows=270 width=0) (actual time=0.006..0.006 rows=9 loops=23)
Index Cond: (site_id = sites.id)
-> Index Scan using alerts_device_id_index on alerts (cost=0.43..10.59 rows=3 width=24) (actual time=0.024..0.028 rows=0 loops=31)
Index Cond: (device_id = devices.id)
Filter: (created_at >= '2019-08-30 00:00:00'::timestamp without time zone)
Rows Removed by Filter: 12
Total runtime: 1.603 ms
alerts table has millions of records, other tables are counted in thousands.
I can already optimize the query by simply not using limit < 10. What I don't understand is why the LIMIT affects the performance. Perhaps there's a better way than hardcoding this magic number "10".
The number of result rows affects the PostgreSQL optimizer, because plans that return the first few rows quickly are not necessarily plans that return the whole result as fast as possible.
In your case, PostgreSQL thinks that for small values of LIMIT, it will be faster by scanning the alerts table in the order of the ORDER BY clause using an index and just join the other tables using a nested loop until it has found 9 rows.
The benefit of such a strategy is that it doesn't have to calculate the complete result of the join, then sort it and throw away all but the first few result rows.
The danger is that it takes longer than expected to find the 9 matching rows, and this is what hits you:
Index Scan Backward using alerts_created_at_index on alerts (cost=0.43..85542.16 rows=57231 width=24) (actual time=0.014..88.137 rows=88858 loops=1)
So PostgreSQL has to process 88858 rows and use a nested loop join (which is inefficient if it has to loop often) until it finds 9 result rows. This may be because it underestimates the selectivity of the conditions, or because the many matching rows all happen to have low created_at.
The number 10 just happens to be the cut-off point where PostgreSQL thinks it will no longer be more efficient to use that strategy, it is a value that will change as the data in the database change.
You can avoid using that plan altogether by using an ORDER BY clause that does not match the index:
ORDER BY (created_at + INTERVAL '0 days') DESC
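Applied to the query from the question, that would look like this (a sketch; the added zero-length interval doesn't change the sort order, it only keeps the planner from matching the expression to the index):

```sql
EXPLAIN ANALYZE SELECT "alerts"."id",
       "alerts"."created_at",
       't1'::text AS src_table
FROM "alerts"
INNER JOIN "devices" ON "devices"."id" = "alerts"."device_id"
INNER JOIN "sites" ON "sites"."id" = "devices"."site_id"
WHERE "sites"."cloud_id" = 111
  AND "alerts"."created_at" >= '2019-08-30'
ORDER BY ("alerts"."created_at" + INTERVAL '0 days') DESC
LIMIT 9;
```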
I've been optimizing some SQL queries against a production database clone. Here is an example query where I've created two indexes that let us run really fast index-only scans using a hash join.
explain analyse
select activity.id from activity, notification
where notification.user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
and notification.received = false
and notification.invalid = false
and activity.id = notification.activity_id
and activity.space_id = 'e12b42ac-4e54-476f-a4f5-7d6bdb1e61e2'
order by activity.end_time desc
limit 21;
Limit (cost=985.58..985.58 rows=1 width=24) (actual time=0.017..0.017 rows=0 loops=1)
-> Sort (cost=985.58..985.58 rows=1 width=24) (actual time=0.016..0.016 rows=0 loops=1)
Sort Key: activity.end_time DESC
Sort Method: quicksort Memory: 25kB
-> Hash Join (cost=649.76..985.57 rows=1 width=24) (actual time=0.010..0.010 rows=0 loops=1)
Hash Cond: (notification.activity_id = activity.id)
-> Index Only Scan using unreceived_notifications_index on notification (cost=0.42..334.62 rows=127 width=16) (actual time=0.009..0.009 rows=0 loops=1)
Index Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
Heap Fetches: 0
-> Hash (cost=634.00..634.00 rows=1227 width=24) (never executed)
-> Index Only Scan using space_activity_index on activity (cost=0.56..634.00 rows=1227 width=24) (never executed)
Index Cond: (space_id = 'e12b42ac-4e54-476f-a4f5-7d6bdb1e61e2'::uuid)
Heap Fetches: 0
Planning time: 0.299 ms
Execution time: 0.046 ms
And here are the indexes.
create index unreceived_notifications_index on notification using btree (
user_id,
activity_id, -- index-only scan
id -- index-only scan
) where (
invalid = false
and received = false
);
create index space_activity_index on activity using btree (
space_id,
end_time desc,
id -- index-only scan
);
However, I'm noticing that these indexes are making our development database a LOT slower. Here's the same query against a user in our development database; you'll notice it's using a nested loop join this time, and the order of the loops is really inefficient.
explain analyse
select notification.id from notification, activity
where notification.user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'
and notification.received = false
and notification.invalid = false
and activity.id = notification.activity_id
and activity.space_id = '415fc269-e68f-4da0-b3e3-b1273b741a7f'
order by activity.end_time desc
limit 20;
Limit (cost=0.69..272.04 rows=20 width=24) (actual time=277.255..277.255 rows=0 loops=1)
-> Nested Loop (cost=0.69..71487.55 rows=5269 width=24) (actual time=277.253..277.253 rows=0 loops=1)
-> Index Only Scan using space_activity_index on activity (cost=0.42..15600.36 rows=155594 width=24) (actual time=0.016..59.433 rows=155666 loops=1)
Index Cond: (space_id = '415fc269-e68f-4da0-b3e3-b1273b741a7f'::uuid)
Heap Fetches: 38361
-> Index Only Scan using unreceived_notifications_index on notification (cost=0.27..0.35 rows=1 width=32) (actual time=0.001..0.001 rows=0 loops=155666)
Index Cond: ((user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'::uuid) AND (activity_id = activity.id))
Heap Fetches: 0
Planning time: 0.351 ms
Execution time: 277.286 ms
One thing to note here is that there are only two space_ids in our development database. I suspect this is causing Postgres to try to be clever, but it's actually making performance worse!
My question is:
Is there some way that I can force Postgres to run the hash join instead of the nested loop join?
Is there some way, in general, that I can make Postgres's query-planner more deterministic? Ideally, the query performance characteristics would be the exact same between these environments.
Thanks.
Edit: Note that when I leave out the space_id condition when querying my dev database, the result is faster.
explain analyse
select notification.id from notification, activity
where notification.user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'
and notification.received = false
and notification.invalid = false
and activity.id = notification.activity_id
--and activity.space_id = '415fc269-e68f-4da0-b3e3-b1273b741a7f'
order by activity.end_time desc
limit 20;
Limit (cost=17628.13..17630.43 rows=20 width=24) (actual time=2.730..2.730 rows=0 loops=1)
-> Gather Merge (cost=17628.13..17996.01 rows=3199 width=24) (actual time=2.729..2.729 rows=0 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Sort (cost=16628.12..16636.12 rows=3199 width=24) (actual time=0.126..0.126 rows=0 loops=2)
Sort Key: activity.end_time DESC
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=20.59..16441.88 rows=3199 width=24) (actual time=0.093..0.093 rows=0 loops=2)
-> Parallel Bitmap Heap Scan on notification (cost=20.17..2512.17 rows=3199 width=32) (actual time=0.092..0.092 rows=0 loops=2)
Recheck Cond: ((user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'::uuid) AND (NOT invalid) AND (NOT received))
-> Bitmap Index Scan on unreceived_notifications_index (cost=0.00..18.82 rows=5439 width=0) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: (user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'::uuid)
-> Index Scan using activity_pkey on activity (cost=0.42..4.35 rows=1 width=24) (never executed)
Index Cond: (id = notification.activity_id)
Planning time: 0.344 ms
Execution time: 3.433 ms
Edit: After reading about index hinting, I tried turning nested_loop off using set enable_nestloop=false; and the query is way faster!
Limit (cost=20617.76..20620.09 rows=20 width=24) (actual time=2.872..2.872 rows=0 loops=1)
-> Gather Merge (cost=20617.76..21130.20 rows=4392 width=24) (actual time=2.871..2.871 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=19617.74..19623.23 rows=2196 width=24) (actual time=0.086..0.086 rows=0 loops=3)
Sort Key: activity.end_time DESC
Sort Method: quicksort Memory: 25kB
-> Hash Join (cost=2609.20..19495.85 rows=2196 width=24) (actual time=0.062..0.062 rows=0 loops=3)
Hash Cond: (activity.id = notification.activity_id)
-> Parallel Seq Scan on activity (cost=0.00..14514.57 rows=64831 width=24) (actual time=0.006..0.006 rows=1 loops=3)
Filter: (space_id = '415fc269-e68f-4da0-b3e3-b1273b741a7f'::uuid)
-> Hash (cost=2541.19..2541.19 rows=5441 width=32) (actual time=0.007..0.007 rows=0 loops=3)
Buckets: 8192 Batches: 1 Memory Usage: 64kB
-> Bitmap Heap Scan on notification (cost=20.18..2541.19 rows=5441 width=32) (actual time=0.006..0.006 rows=0 loops=3)
Recheck Cond: ((user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'::uuid) AND (NOT invalid) AND (NOT received))
-> Bitmap Index Scan on unreceived_notifications_index (cost=0.00..18.82 rows=5441 width=0) (actual time=0.004..0.004 rows=0 loops=3)
Index Cond: (user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'::uuid)
Planning time: 0.375 ms
Execution time: 3.630 ms
It depends on how specialized you want to get. PostgreSQL has no built-in plan guides, but there are planner settings (and extensions such as pg_hint_plan) that you can use to push queries toward specific indexes. Query optimizers are strongly influenced by record counts in the choices they make, though. Maybe you add the extra indexes in the non-dev environment and move on?
https://docs.aws.amazon.com/dms/latest/sql-server-to-aurora-postgresql-migration-playbook/chap-sql-server-aurora-pg.tuning.queryplanning.html
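If you do end up forcing the plan with enable_nestloop the way the question's edit does, it's safer to scope the setting to a single transaction with SET LOCAL rather than changing it session-wide (a sketch using the query from the question):

```sql
BEGIN;
SET LOCAL enable_nestloop = off;  -- reverts automatically at COMMIT/ROLLBACK
SELECT notification.id
FROM notification
JOIN activity ON activity.id = notification.activity_id
WHERE notification.user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'
  AND notification.received = false
  AND notification.invalid = false
  AND activity.space_id = '415fc269-e68f-4da0-b3e3-b1273b741a7f'
ORDER BY activity.end_time desc
LIMIT 20;
COMMIT;
```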
I have written a query whose aim is to get 10 results including the current one, padding up to 9 entries on either side for an alphabetical list which can be sorted by the receiver.
This is the query I am using; my issue, however, is not with the result, but that neither of the queries uses the index.
(
SELECT
uid,
title
FROM
books
WHERE
lower(title) < lower('Frankenstein')
ORDER BY title desc
LIMIT 9
)
UNION
(
SELECT
uid,
title
FROM
books
WHERE
lower(title) >= lower('Frankenstein')
ORDER BY title
LIMIT 10
)
ORDER BY title;
The index I am trying to utilize is a simple btree, no text_pattern_ops etc as below:
CREATE INDEX books_title_idx ON books USING btree (lower(title));
If I run explain on the first part of the union, in spite of the limit and order, it performs a full table scan:
explain analyze
SELECT
uid,
title
FROM
books
WHERE
lower(title) < lower('Frankenstein')
ORDER BY title desc
LIMIT 9
Limit (cost=69.04..69.06 rows=9 width=152) (actual time=6.276..6.292 rows=9 loops=1)
-> Sort (cost=69.04..69.67 rows=251 width=152) (actual time=6.273..6.277 rows=9 loops=1)
Sort Key: ((title))
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on books (cost=0.00..63.80 rows=251 width=152) (actual time=0.056..5.227 rows=267 loops=1)
Filter: (lower((title)) < 'frankenstein'::text)
Rows Removed by Filter: 486
Total runtime: 6.359 ms
When I do an equality check in the same query, the index is used:
explain analyze
SELECT
uid,
title
FROM
books
WHERE
lower(title) = lower('Frankenstein')
ORDER BY title desc
Sort (cost=17.04..17.05 rows=4 width=152) (actual time=0.054..0.054 rows=0 loops=1)
Sort Key: ((title))
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on books (cost=4.31..17.00 rows=4 width=152) (actual time=0.041..0.041 rows=0 loops=1)
Recheck Cond: (lower((title)) = 'frankenstein'::text)
-> Bitmap Index Scan on books_title_idx (cost=0.00..4.31 rows=4 width=0) (actual time=0.036..0.036 rows=0 loops=1)
Index Cond: (lower((title)) = 'frankenstein'::text)
Total runtime: 0.129 ms
and the same applies when I do a between query
explain analyze
SELECT
uid,
title
FROM
books
WHERE
lower(title) > lower('Frankenstein') AND lower(title) < lower('Gulliver''s Travels')
ORDER BY title
Sort (cost=17.08..17.09 rows=4 width=152) (actual time=0.511..0.529 rows=25 loops=1)
Sort Key: (title)
Sort Method: quicksort Memory: 27kB
-> Bitmap Heap Scan on books (cost=4.33..17.04 rows=4 width=152) (actual time=0.118..0.213 rows=25 loops=1)
Recheck Cond: ((lower(title) > 'frankenstein'::text) AND (lower(title) < 'gulliver''s travels'::text))
-> Bitmap Index Scan on books_title_idx (cost=0.00..4.33 rows=4 width=0) (actual time=0.087..0.087 rows=25 loops=1)
Index Cond: ((lower(title) > 'frankenstein'::text) AND (lower(title) < 'gulliver''s travels'::text))
Total runtime: 0.621 ms
What I am obviously looking for here is not a between search because the beginning and end are unknown.
So is this a postgresql limitation or is there something other than manipulating the cost of a table scan to something silly that I can use to convince the query planner to use the index?
I am using PostgreSQL 9.3
Use:
ORDER BY lower(title) DESC
or
ORDER BY lower(title)
to match your functional index, so it can be utilized.
ORDER BY is irrelevant for the selection of rows in the other two queries. That's why the index can be used in those cases.
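Applied to the first branch of the UNION, the fixed query would be:

```sql
SELECT uid, title
FROM books
WHERE lower(title) < lower('Frankenstein')
ORDER BY lower(title) DESC
LIMIT 9;
```

The second branch changes the same way: ORDER BY lower(title) instead of ORDER BY title.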