Postgres similar query taking different time, not sure what's wrong

We have a query that runs against hourly child tables inherited from a base table, say tab_name1 and tab_name2. The query was working fine, but since a particular date it has suddenly started performing badly for all the child tables.
This is the query, which works fine up to tab_name_20180621*; I'm not sure what happened after that.
SELECT *
FROM tab_name_201806220300
WHERE id = 201806220300
  AND col1 IN (
    SELECT col1
    FROM tab_name2_201806220300
    WHERE uid = 5452
      AND id = 201806220300
  );
The EXPLAIN ANALYZE output looks like this; there's a huge difference in execution time.
#1
Nested Loop Semi Join  (cost=0.00..84762.11 rows=1 width=937) (actual time=117.599..117.599 rows=0 loops=1)
  Join Filter: (tab_name_201806210100.col1 = tab_name2_201806210100.col1)
  ->  Seq Scan on tab_name_201806210100  (cost=0.00..31603.56 rows=1 width=937) (actual time=117.596..117.596 rows=0 loops=1)
        Filter: (log_id = '201806220100'::bigint)
        Rows Removed by Filter: 434045
  ->  Materialize  (cost=0.00..53136.74 rows=1454 width=41) (never executed)
        ->  Seq Scan on tab_name2_201806210100  (cost=0.00..53129.47 rows=1454 width=41) (never executed)
              Filter: ((uid = 5452) AND (log_id = '201806210100'::bigint))
Planning time: 1.490 ms
Execution time: 117.723 ms
#2
Nested Loop Semi Join  (cost=0.00..10299.31 rows=48 width=1476) (actual time=1082.255..47041.945 rows=671 loops=1)
  Join Filter: (tab_name_201806220100.col1 = tab_name2_201806220100.col1)
  Rows Removed by Join Filter: 252444174
  ->  Seq Scan on tab_name_201806220100  (cost=0.00..4023.69 rows=95 width=1476) (actual time=0.008..36.292 rows=64153 loops=1)
        Filter: (log_id = '201806220100'::bigint)
  ->  Materialize  (cost=0.00..6274.19 rows=1 width=32) (actual time=0.000..0.264 rows=3935 loops=64153)
        ->  Seq Scan on tab_name2_201806220100  (cost=0.00..6274.19 rows=1 width=32) (actual time=0.464..55.475 rows=3960 loops=1)
              Filter: ((log_id = '201806220100'::bigint) AND (uid = 5452))
              Rows Removed by Filter: 140592
Planning time: 1.024 ms
Execution time: 47042.234 ms
I don't know what to infer from this or how to proceed from here. Could you help me?

Related

Why is Postgres execution plan changing vastly based on where condition

I am trying to execute the same SQL but with different values in the WHERE clause. One query is taking significantly longer to process than the other. I have also observed that the execution plans for the two queries are different.
Query1 and Execution Plan:
explain analyze
select t."postal_code"
from dev."postal_master" t
left join dev."premise_master" f
on t."primary_code" = f."primary_code"
and t."name" = f."name"
and t."final_code" = f."final_code"
where 1 = 1 and t."region" = 'US'
and t."name" = 'UBQ'
and t."accountModCode" = 'LTI'
and t."modularity_code" = 'PHA'
group by t."postal_code", t."modularity_code", t."region",
t."feature", t."granularity"
Group (cost=4.19..4.19 rows=1 width=38) (actual time=76411.456..76414.348 rows=11871 loops=1)
Group Key: t."postal_code", t."modularity_code", t."region", t."feature", t.granularity
-> Sort (cost=4.19..4.19 rows=1 width=38) (actual time=76411.452..76412.045 rows=11879 loops=1)
Sort Key: t."postal_code", t."feature", t.granularity
Sort Method: quicksort Memory: 2055kB
-> Nested Loop Left Join (cost=0.17..4.19 rows=1 width=38) (actual time=45.373..76362.219 rows=11879 loops=1)
Join Filter: (((t."name")::text = (f."name")::text) AND ((t."primary_code")::text = (f."primary_code")::text) AND ((t."final_code")::text = (f."final_code")::text))
Rows Removed by Join Filter: 150642887
-> Index Scan using idx_postal_code_source on postal_master t (cost=0.09..2.09 rows=1 width=72) (actual time=36.652..154.339 rows=11871 loops=1)
Index Cond: (("name")::text = 'UBQ'::text)
Filter: ((("region")::text = 'US'::text) AND (("accountModCode")::text = 'LTI'::text) AND (("modularity_code")::text = 'PHA'::text))
Rows Removed by Filter: 550164
-> Index Scan using idx_postal_master_source on premise_master f (cost=0.08..2.09 rows=1 width=35) (actual time=0.016..3.720 rows=12690 loops=11871)
Index Cond: (("name")::text = 'UBQ'::text)
Planning Time: 1.196 ms
Execution Time: 76415.004 ms
Query2 and Execution plan:
explain analyze
select t."postal_code"
from dev."postal_master" t
left join dev."premise_master" f
on t."primary_code" = f."primary_code"
and t."name" = f."name"
and t."final_code" = f."final_code"
where 1 = 1 and t."region" = 'DE'
and t."name" = 'EME'
and t."accountModCode" = 'QEW'
and t."modularity_code" = 'NFX'
group by t."postal_code", t."modularity_code", t."region",
t."feature", t."granularity"
Group (cost=50302.96..50426.04 rows=1330 width=38) (actual time=170.687..184.772 rows=8230 loops=1)
Group Key: t."postal_code", t."modularity_code", t."region", t."feature", t.granularity
-> Gather Merge (cost=50302.96..50423.27 rows=1108 width=38) (actual time=170.684..182.965 rows=8230 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Group (cost=49302.95..49304.62 rows=554 width=38) (actual time=164.446..165.613 rows=2743 loops=3)
Group Key: t."postal_code", t."modularity_code", t."region", t."feature", t.granularity
-> Sort (cost=49302.95..49303.23 rows=554 width=38) (actual time=164.444..164.645 rows=3432 loops=3)
Sort Key: t."postal_code", t."feature", t.granularity
Sort Method: quicksort Memory: 550kB
Worker 0: Sort Method: quicksort Memory: 318kB
Worker 1: Sort Method: quicksort Memory: 322kB
-> Nested Loop Left Join (cost=1036.17..49297.90 rows=554 width=38) (actual time=2.143..148.372 rows=3432 loops=3)
-> Parallel Bitmap Heap Scan on territory_postal_mapping t (cost=1018.37..38323.78 rows=554 width=72) (actual time=1.898..11.849 rows=2743 loops=3)
Recheck Cond: ((("accountModCode")::text = 'QEW'::text) AND (("region")::text = 'DE'::text) AND (("name")::text = 'EME'::text))
Filter: (("modularity_code")::text = 'NFX'::text)
Rows Removed by Filter: 5914
Heap Blocks: exact=2346
-> Bitmap Index Scan on territorypostal__source_region_mod (cost=0.00..1018.31 rows=48088 width=0) (actual time=4.783..4.783 rows=25973 loops=1)
Index Cond: ((("accountModCode")::text = 'QEW'::text) AND (("region")::text = 'DE'::text) AND (("name")::text = 'EME'::text))
-> Bitmap Heap Scan on premise_master f (cost=17.80..19.81 rows=1 width=35) (actual time=0.047..0.048 rows=1 loops=8230)
Recheck Cond: (((t."primary_code")::text = ("primary_code")::text) AND ((t."final_code")::text = ("final_code")::text))
Filter: ((("name")::text = 'EME'::text) AND ((t."name")::text = ("name")::text))
Heap Blocks: exact=1955
-> BitmapAnd (cost=17.80..17.80 rows=1 width=0) (actual time=0.046..0.046 rows=0 loops=8230)
-> Bitmap Index Scan on premise_master__accountprimarypostal (cost=0.00..1.95 rows=105 width=0) (actual time=0.008..0.008 rows=24 loops=8230)
Index Cond: ((t."primary_code")::text = ("primary_code")::text)
-> Bitmap Index Scan on premise_master__accountfinalterritorycode (cost=0.00..15.80 rows=1403 width=0) (actual time=0.065..0.065 rows=559 loops=4568)
Index Cond: ((t."final_code")::text = ("final_code")::text)
Planning Time: 1.198 ms
Execution Time: 185.197 ms
I am aware that there will be a different number of rows depending on the WHERE condition, but is that the only reason for the different execution plans? Also, how can I improve the performance of the first query?
The estimates are totally wrong for the first query, so it is no surprise that PostgreSQL picks a bad plan. Try these measures one after the other and see if they help:
Collect statistics:
ANALYZE premise_master, postal_master;
Calculate more precise statistics:
ALTER TABLE premise_master ALTER name SET statistics 1000;
ALTER TABLE postal_master ALTER name SET statistics 1000;
ANALYZE premise_master, postal_master;
The estimates in the first query are off in such a bad way that I suspect that there is an exceptional problem, like an upgrade with pg_upgrade where you forgot to run ANALYZE afterwards, or you are wiping the database statistics with pg_stat_reset().
If that is not the case, and a simple ANALYZE of the tables did the trick, the cause of the problem must be that autoanalyze does not run often enough on these tables. You can tune autovacuum to do that more often with a statement like this:
ALTER TABLE premise_master SET (autovacuum_analyze_scale_factor = 0.01);
That would make PostgreSQL collect statistics whenever 1% of the table has changed.
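To verify whether stale statistics are the culprit, you can check when the tables were last analyzed. This is a sketch using the standard pg_stat_user_tables view (the n_mod_since_analyze column is available from PostgreSQL 9.4 on):

```sql
-- When were statistics last gathered, and how much has changed since?
SELECT relname,
       last_analyze,          -- last manual ANALYZE
       last_autoanalyze,      -- last autoanalyze run
       n_mod_since_analyze    -- rows modified since the last analyze
FROM pg_stat_user_tables
WHERE relname IN ('premise_master', 'postal_master');
```

If last_autoanalyze is old and n_mod_since_analyze is large, lowering autovacuum_analyze_scale_factor as shown above should help.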
The first line of each EXPLAIN ANALYZE output suggests that the planner expected only 1 row from the first query, while it expected 1330 from the second, so that's probably why it chose a less efficient query plan. That usually means the table statistics aren't up to date, and that when they were last gathered there weren't many rows that would have matched the first query (maybe the data was being loaded in alphabetical order?). In this case the fix is to run ANALYZE dev."postal_master" to refresh the statistics.
You could also try removing the GROUP BY clause entirely (if your tooling allows); I could be misreading, but it doesn't look like it's affecting the output much. If that results in unwanted duplicates, you can use SELECT DISTINCT t."postal_code" instead of the GROUP BY.
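For illustration, that rewrite would look roughly like this (a sketch based on the first query; verify it returns the rows you expect before dropping the GROUP BY for good):

```sql
-- DISTINCT on the single selected column replaces the five-column GROUP BY
SELECT DISTINCT t."postal_code"
FROM dev."postal_master" t
LEFT JOIN dev."premise_master" f
       ON t."primary_code" = f."primary_code"
      AND t."name" = f."name"
      AND t."final_code" = f."final_code"
WHERE t."region" = 'US'
  AND t."name" = 'UBQ'
  AND t."accountModCode" = 'LTI'
  AND t."modularity_code" = 'PHA';
```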

PL/pgSQL Query Plan Worse Inside Function Than Outside

I have a function that is running too slow. I've isolated the piece of the function that is slow: a small SELECT statement:
SELECT image_group_id
FROM programs.image_family fam
JOIN programs.provider_file pf
ON (fam.provider_data_id = pf.provider_data_id
AND fam.family_id = $1 AND pf.image_group_id IS NOT NULL)
LIMIT 1
When I run the function this piece of SQL generates the following query plan:
Query Text: SELECT image_group_id FROM programs.image_family fam JOIN programs.provider_file pf ON (fam.provider_data_id = pf.provider_data_id AND fam.family_id = $1 AND pf.image_group_id IS NOT NULL) LIMIT 1
Limit (cost=0.56..6.75 rows=1 width=6) (actual time=3471.004..3471.004 rows=0 loops=1)
-> Nested Loop (cost=0.56..594054.42 rows=96017 width=6) (actual time=3471.002..3471.002 rows=0 loops=1)
-> Seq Scan on image_family fam (cost=0.00..391880.08 rows=96023 width=6) (actual time=3471.001..3471.001 rows=0 loops=1)
Filter: ((family_id)::numeric = '8419853'::numeric)
Rows Removed by Filter: 19204671
-> Index Scan using "IX_DBO_PROVIDER_FILE_1" on provider_file pf (cost=0.56..2.11 rows=1 width=12) (never executed)
Index Cond: (provider_data_id = fam.provider_data_id)
Filter: (image_group_id IS NOT NULL)
When I run the selected query in a query tool (outside of the function) the query plan looks like this:
Limit (cost=1.12..3.81 rows=1 width=6) (actual time=0.043..0.043 rows=1 loops=1)
Output: pf.image_group_id
Buffers: shared hit=11
-> Nested Loop (cost=1.12..14.55 rows=5 width=6) (actual time=0.041..0.041 rows=1 loops=1)
Output: pf.image_group_id
Inner Unique: true
Buffers: shared hit=11
-> Index Only Scan using image_family_family_id_provider_data_id_idx on programs.image_family fam (cost=0.56..1.65 rows=5 width=6) (actual time=0.024..0.024 rows=1 loops=1)
Output: fam.family_id, fam.provider_data_id
Index Cond: (fam.family_id = 8419853)
Heap Fetches: 2
Buffers: shared hit=6
-> Index Scan using "IX_DBO_PROVIDER_FILE_1" on programs.provider_file pf (cost=0.56..2.58 rows=1 width=12) (actual time=0.013..0.013 rows=1 loops=1)
Output: pf.provider_data_id, pf.provider_file_path, pf.posted_dt, pf.file_repository_id, pf.restricted_size, pf.image_group_id, pf.is_master, pf.is_biggest
Index Cond: (pf.provider_data_id = fam.provider_data_id)
Filter: (pf.image_group_id IS NOT NULL)
Buffers: shared hit=5
Planning time: 0.809 ms
Execution time: 0.100 ms
If I disable sequence scans in the function I can get a similar query plan:
Query Text: SELECT image_group_id FROM programs.image_family fam JOIN programs.provider_file pf ON (fam.provider_data_id = pf.provider_data_id AND fam.family_id = $1 AND pf.image_group_id IS NOT NULL) LIMIT 1
Limit (cost=1.12..8.00 rows=1 width=6) (actual time=3855.722..3855.722 rows=0 loops=1)
-> Nested Loop (cost=1.12..660217.34 rows=96017 width=6) (actual time=3855.721..3855.721 rows=0 loops=1)
-> Index Only Scan using image_family_family_id_provider_data_id_idx on image_family fam (cost=0.56..458043.00 rows=96023 width=6) (actual time=3855.720..3855.720 rows=0 loops=1)
Filter: ((family_id)::numeric = '8419853'::numeric)
Rows Removed by Filter: 19204671
Heap Fetches: 368
-> Index Scan using "IX_DBO_PROVIDER_FILE_1" on provider_file pf (cost=0.56..2.11 rows=1 width=12) (never executed)
Index Cond: (provider_data_id = fam.provider_data_id)
Filter: (image_group_id IS NOT NULL)
The query plans differ in the Filter attached to the Index Only Scan. The function's plan has more Heap Fetches and seems to treat the argument as a string cast to a numeric.
Things I've tried:
Increasing statistics (and running vacuum/analyze)
Calling the problematic piece of SQL in another function with language SQL
Add another index (the one it is now using to perform an Index Only Scan)
Create a CTE for the image_family table (this did help performance, but it would still do a sequential scan on image_family instead of using the index, so it was still too slow)
Change from executing raw SQL to using EXECUTE ... INTO ... USING in the function.
Makeup of the two tables:
image_family:
provider_data_id: numeric(16)
family_id: int4
(rest omitted for brevity)
unique index on provider_data_id
index on family_id
I recently added a unique index on (family_id, provider_data_id) as well
Approximately 20 million rows here. Families have many provider_data_ids but not all provider_data_ids are part of families and thus aren't all in this table.
provider_file:
provider_data_id numeric(16)
image_group_id numeric(16)
(rest omitted for brevity)
unique index on provider_data_id
Approximately 32 million rows in this table. Most rows (> 95%) have a Non-Null image_group_id.
Postgres Version 10
How can I get the query performance to match whether I call it from a function or as raw SQL in a query tool?
The problem is exhibited in this line:
Filter: ((family_id)::numeric = '8419853'::numeric)
The index on family_id cannot be used because family_id is compared to a numeric value. This requires a cast to numeric, and there is no index on family_id::numeric.
Even though integer and numeric both are types representing numbers, their internal representation is quite different, and so the indexes are incompatible. In other words, the cast to numeric is like a function for PostgreSQL, and since it has no index on that functional expression, it has to resort to a scan of the whole table (or index).
The solution is simple, however: use an integer instead of a numeric parameter for the query. If in doubt, use a cast like
fam.family_id = $1::integer
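As a sketch of how the fix could look inside a function (the function name and signature here are hypothetical; the point is that the parameter is declared integer, so the comparison stays integer-to-integer and the index on family_id remains usable):

```sql
CREATE OR REPLACE FUNCTION get_image_group_id(p_family_id integer)
RETURNS numeric AS $$
    SELECT pf.image_group_id
    FROM programs.image_family fam
    JOIN programs.provider_file pf
      ON fam.provider_data_id = pf.provider_data_id
    WHERE fam.family_id = p_family_id   -- integer = integer: no cast, index usable
      AND pf.image_group_id IS NOT NULL
    LIMIT 1;
$$ LANGUAGE sql STABLE;
```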

Improve running time of SQL query

I have the following table structure:
AdPerformance
id
ad_id
impressions
Targeting
value
AdActions
app_starts
Ad
id
name
parent_id
AdTargeting
id
targeting_
ad_id
Targeting
id
name
value
AdProduct
id
ad_id
name
I need to aggregate the data by targeting, restricted to a product name, so I wrote the following query:
SELECT ad_performance.ad_id, targeting.value AS targeting_value,
sum(impressions) AS impressions,
sum(app_starts) AS app_starts
FROM ad_performance
LEFT JOIN ad on ad.id = ad_performance.ad_id
LEFT JOIN ad_actions ON ad_performance.id = ad_actions.ad_performance_id
RIGHT JOIN (
SELECT ad_id, value from targeting, ad_targeting
WHERE targeting.id = ad_targeting.id and targeting.name = 'gender'
) targeting ON targeting.ad_id = ad.parent_id
WHERE ad_performance.ad_id IN
(SELECT ad_id FROM ad_product WHERE product = 'iphone')
GROUP BY ad_performance.ad_id, targeting_value
However, EXPLAIN ANALYZE shows the above query taking about 5 s for ~1M records.
Is there a way to improve it?
I do have indexes on foreign keys
UPDATED
See the output of EXPLAIN ANALYZE:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=5787.28..5789.87 rows=259 width=254) (actual time=3283.763..3286.015 rows=5998 loops=1)
Group Key: adobject_performance.ad_id, targeting.value
Buffers: shared hit=3400223
-> Nested Loop Left Join (cost=241.63..5603.63 rows=8162 width=254) (actual time=10.438..2774.664 rows=839720 loops=1)
Buffers: shared hit=3400223
-> Nested Loop (cost=241.21..1590.52 rows=8162 width=250) (actual time=10.412..703.818 rows=839720 loops=1)
Join Filter: (adobject.id = adobject_performance.ad_id)
Buffers: shared hit=36755
-> Hash Join (cost=240.78..323.35 rows=9 width=226) (actual time=10.380..20.332 rows=5998 loops=1)
Hash Cond: (ad_product.ad_id = ad.id)
Buffers: shared hit=190
-> HashAggregate (cost=128.98..188.96 rows=5998 width=4) (actual time=3.788..6.821 rows=5998 loops=1)
Group Key: ad_product.ad_id
Buffers: shared hit=39
-> Seq Scan on ad_product (cost=0.00..113.99 rows=5998 width=4) (actual time=0.011..1.726 rows=5998 loops=1)
Filter: ((product)::text = 'ft2_iPhone'::text)
Rows Removed by Filter: 1
Buffers: shared hit=39
-> Hash (cost=111.69..111.69 rows=9 width=222) (actual time=6.578..6.578 rows=5998 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 241kB
Buffers: shared hit=151
-> Hash Join (cost=30.26..111.69 rows=9 width=222) (actual time=0.154..4.660 rows=5998 loops=1)
Hash Cond: (adobject.parent_id = adobject_targeting.ad_id)
Buffers: shared hit=151
-> Seq Scan on adobject (cost=0.00..77.97 rows=897 width=8) (actual time=0.009..1.449 rows=6001 loops=1)
Buffers: shared hit=69
-> Hash (cost=30.24..30.24 rows=2 width=222) (actual time=0.132..0.132 rows=2 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
Buffers: shared hit=82
-> Nested Loop (cost=0.15..30.24 rows=2 width=222) (actual time=0.101..0.129 rows=2 loops=1)
Buffers: shared hit=82
-> Seq Scan on targeting (cost=0.00..13.88 rows=2 width=222) (actual time=0.015..0.042 rows=79 loops=1)
Filter: (name = 'age group'::targeting_name)
Rows Removed by Filter: 82
Buffers: shared hit=1
-> Index Scan using advertising_targeting_pkey on adobject_targeting (cost=0.15..8.17 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=79)
Index Cond: (id = targeting.id)
Buffers: shared hit=81
-> Index Scan using "fki_advertising_peformance_advertising_entity_id -> advertising" on adobject_performance (cost=0.42..89.78 rows=4081 width=32) (actual time=0.007..0.046 rows=140 loops=5998)
Index Cond: (ad_id = ad_product.ad_id)
Buffers: shared hit=36565
-> Index Scan using facebook_advertising_actions_pkey on facebook_adobject_actions (cost=0.42..0.48 rows=1 width=12) (actual time=0.001..0.002 rows=1 loops=839720)
Index Cond: (ad_performance.id = ad_performance_id)
Buffers: shared hit=3363468
Planning time: 1.525 ms
Execution time: 3287.324 ms
(46 rows)
I'm shooting blindly here, as we have not been provided with the result of the EXPLAIN, but still, Postgres should treat this query better if you pull your targeting subquery out into a CTE:
WITH targeting AS
(
SELECT ad_id, value from targeting, ad_targeting
WHERE targeting.id = ad_targeting.id and targeting.name = 'gender'
)
SELECT ad_performance.ad_id, targeting.value AS targeting_value,
sum(impressions) AS impressions,
sum(app_starts) AS app_starts
FROM ad_performance
LEFT JOIN ad on ad.id = ad_performance.ad_id
LEFT JOIN ad_actions ON ad_performance.id = ad_actions.ad_performance_id
RIGHT JOIN targeting ON targeting.ad_id = ad.parent_id
WHERE ad_performance.ad_id IN
(SELECT ad_id FROM ad_product WHERE product = 'iphone')
GROUP BY ad_performance.ad_id, targeting_value
Taken from the Documentation:
A useful property of WITH queries is that they are evaluated only once
per execution of the parent query, even if they are referred to more
than once by the parent query or sibling WITH queries. Thus, expensive
calculations that are needed in multiple places can be placed within a
WITH query to avoid redundant work. Another possible application is to
prevent unwanted multiple evaluations of functions with side-effects.
The execution plan does not seem to match the query any more (maybe you can update the query).
However, the problem now is here:
-> Hash Join (cost=30.26..111.69 rows=9 width=222)
(actual time=0.154..4.660 rows=5998 loops=1)
Hash Cond: (adobject.parent_id = adobject_targeting.ad_id)
Buffers: shared hit=151
-> Seq Scan on adobject (cost=0.00..77.97 rows=897 width=8)
(actual time=0.009..1.449 rows=6001 loops=1)
Buffers: shared hit=69
-> Hash (cost=30.24..30.24 rows=2 width=222)
(actual time=0.132..0.132 rows=2 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
Buffers: shared hit=82
-> Nested Loop (cost=0.15..30.24 rows=2 width=222)
(actual time=0.101..0.129 rows=2 loops=1)
Buffers: shared hit=82
-> Seq Scan on targeting (cost=0.00..13.88 rows=2 width=222)
(actual time=0.015..0.042 rows=79 loops=1)
Filter: (name = 'age group'::targeting_name)
Rows Removed by Filter: 82
Buffers: shared hit=1
-> Index Scan using advertising_targeting_pkey on adobject_targeting
(cost=0.15..8.17 rows=1 width=8)
(actual time=0.001..0.001 rows=0 loops=79)
Index Cond: (id = targeting.id)
Buffers: shared hit=81
This is a join between adobject and the result of
targeting JOIN adobject_targeting
USING (id)
WHERE targeting.name = 'age group'
The latter subquery is correctly estimated at 2 rows, but PostgreSQL fails to notice that almost all rows found in adobject will match one of those two rows, so the result of the join will be 6000 rows rather than the 9 it estimates.
This causes the optimizer to wrongly choose a nested loop join later on, where more than half of the query time is spent.
Unfortunately, since PostgreSQL doesn't have cross-table statistics, there is no way for PostgreSQL to know better.
One coarse measure is to SET enable_nestloop=off, but that will deteriorate the performance of the other (correctly chosen) nested loop join, so I don't know if it will be a net win.
If that helps, you could consider changing the parameter only for the duration of the query (with a transaction and SET LOCAL).
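A minimal sketch of that per-query approach:

```sql
BEGIN;
SET LOCAL enable_nestloop = off;  -- affects only the current transaction

-- ... run the aggregation query here ...

COMMIT;  -- the planner setting reverts automatically at transaction end
```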
Maybe there is a way to rewrite the query so that a better plan can be found, but that is hard to say without knowing the exact query.
I don't know if this query will solve your problem, but try it:
SELECT ad_performance.ad_id, targeting.value AS targeting_value,
sum(impressions) AS impressions,
sum(app_starts) AS app_starts
FROM ad_performance
LEFT JOIN ad on ad.id = ad_performance.ad_id
LEFT JOIN ad_actions ON ad_performance.id = ad_actions.ad_performance_id
RIGHT JOIN ad_targeting on ad_targeting.ad_id = ad.parent_id
INNER JOIN targeting on targeting.id = ad_targeting.id and targeting.name = 'gender'
INNER JOIN ad_product on ad_product.ad_id = ad_performance.ad_id
WHERE ad_product.product = 'iphone'
GROUP BY ad_performance.ad_id, targeting_value
Perhaps you could also create indexes on all the columns that appear in your ON or WHERE conditions.
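For example, the join and filter columns in this query could be covered by indexes along these lines (index names are illustrative; check with EXPLAIN whether the planner actually uses them):

```sql
CREATE INDEX ad_targeting_ad_id_idx   ON ad_targeting (ad_id);
CREATE INDEX targeting_name_idx       ON targeting (name);
CREATE INDEX ad_product_product_idx   ON ad_product (product);
```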

1 minute difference in almost identical PostgreSQL queries?

I have a Rails application with the ability to filter records by state_code. I noticed that when I pass 'CA' as the search term I get my results almost instantly. If I pass 'AZ', for example, it takes more than a minute though.
I have no idea why.
Below are the query plans from psql:
Fast one:
EXPLAIN ANALYZE SELECT
accounts.id
FROM "accounts"
LEFT OUTER JOIN "addresses"
ON "addresses"."addressable_id" = "accounts"."id"
AND "addresses"."address_type" = 'mailing'
AND "addresses"."addressable_type" = 'Account'
WHERE "accounts"."organization_id" = 16
AND (addresses.state_code IN ('CA'))
ORDER BY accounts.name DESC;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=4941.94..4941.94 rows=1 width=18) (actual time=74.810..74.969 rows=821 loops=1)
Sort Key: accounts.name
Sort Method: quicksort Memory: 75kB
-> Hash Join (cost=4.46..4941.93 rows=1 width=18) (actual time=70.044..73.148 rows=821 loops=1)
Hash Cond: (addresses.addressable_id = accounts.id)
-> Seq Scan on addresses (cost=0.00..4911.93 rows=6806 width=4) (actual time=0.027..65.547 rows=15244 loops=1)
Filter: (((address_type)::text = 'mailing'::text) AND ((addressable_type)::text = 'Account'::text) AND ((state_code)::text = 'CA'::text))
Rows Removed by Filter: 129688
-> Hash (cost=4.45..4.45 rows=1 width=18) (actual time=2.037..2.037 rows=1775 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 87kB
-> Index Scan using organization_id_index on accounts (cost=0.29..4.45 rows=1 width=18) (actual time=0.018..1.318 rows=1775 loops=1)
Index Cond: (organization_id = 16)
Planning time: 0.565 ms
Execution time: 75.224 ms
(14 rows)
Slow one:
EXPLAIN ANALYZE SELECT
accounts.id
FROM "accounts"
LEFT OUTER JOIN "addresses"
ON "addresses"."addressable_id" = "accounts"."id"
AND "addresses"."address_type" = 'mailing'
AND "addresses"."addressable_type" = 'Account'
WHERE "accounts"."organization_id" = 16
AND (addresses.state_code IN ('NV'))
ORDER BY accounts.name DESC;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=4917.27..4917.27 rows=1 width=18) (actual time=97091.270..97091.277 rows=25 loops=1)
Sort Key: accounts.name
Sort Method: quicksort Memory: 26kB
-> Nested Loop (cost=0.29..4917.26 rows=1 width=18) (actual time=844.250..97091.083 rows=25 loops=1)
Join Filter: (accounts.id = addresses.addressable_id)
Rows Removed by Join Filter: 915875
-> Index Scan using organization_id_index on accounts (cost=0.29..4.45 rows=1 width=18) (actual time=0.017..10.315 rows=1775 loops=1)
Index Cond: (organization_id = 16)
-> Seq Scan on addresses (cost=0.00..4911.93 rows=70 width=4) (actual time=0.110..54.521 rows=516 loops=1775)
Filter: (((address_type)::text = 'mailing'::text) AND ((addressable_type)::text = 'Account'::text) AND ((state_code)::text = 'NV'::text))
Rows Removed by Filter: 144416
Planning time: 0.308 ms
Execution time: 97091.325 ms
(13 rows)
The slow one returns 25 rows, the fast one 821 rows, which is even more confusing.
I solved it by running the VACUUM ANALYZE command from the psql command line.
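For reference, the command that fixed it, run against the two tables in the query (table names taken from the question):

```sql
-- Reclaim dead tuples and refresh planner statistics in one pass:
VACUUM ANALYZE accounts;
VACUUM ANALYZE addresses;
```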

Query plan & perf. variations on another machine

I'm not a PostgreSQL expert, and I've been struggling with this.
I have a rather simple query:
SELECT exists(SELECT 1 FROM res_users_log WHERE create_uid=u.id), count(1)
FROM res_users u
WHERE active=true
GROUP BY 1
Basically it counts the number of active users that have a log entry. Both tables are relatively big (~600k records each) and have indexes on their ids.
This query executes in ~500 ms on our server, but completely hangs on my machine (same PostgreSQL version, 9.3). My database is a restore of the server's dump, so the indexes were recreated on import.
When I do an EXPLAIN ANALYZE of the query, I get different results on the server and on my machine.
Locally I get
HashAggregate (cost=78496.43..88302.28 rows=2 width=4) (actual time=518.003..518.003 rows=1 loops=1)
-> Index Scan using res_users_pkey on res_users u (cost=0.42..78496.35 rows=16 width=4) (actual time=51.393..517.969 rows=11 loops=1)
Index Cond: (id < 20)
Filter: active
Rows Removed by Filter: 7
SubPlan 1
-> Seq Scan on res_users_log (cost=0.00..9805.83 rows=2 width=0) (actual time=47.078..47.078 rows=1 loops=11)
Filter: (create_uid = u.id)
Rows Removed by Filter: 516910
Total runtime: 518.034 ms
(10 rows)
(I had to add the id < 20 condition to get the query to finish at all)
and on the server I get
HashAggregate (cost=5389666981.78..5389687409.80 rows=2 width=4) (actual time=532.664..532.665 rows=2 loops=1)
-> Seq Scan on res_users u (cost=0.00..5389664343.42 rows=527672 width=4) (actual time=256.169..467.829 rows=527661 loops=1)
Filter: active
Rows Removed by Filter: 381
SubPlan 1
-> Seq Scan on res_users_log (cost=0.00..10214.00 rows=1 width=0) (never executed)
Filter: (create_uid = u.id)
SubPlan 2
-> Seq Scan on res_users_log res_users_log_1 (cost=0.00..8800.60 rows=565360 width=4) (actual time=0.006..45.697 rows=547108 loops=1)
Total runtime: 532.757 ms
(10 rows)
I've been trying to determine why the query plans are different (and I don't understand the SubPlan 2 entry), and what could possibly make this query take more than 2 hours (I killed it after that) on my laptop...
I've vacuumed both tables without any noticeable difference.
Any ideas of what could make this hang like that?
If you want active users that have a log entry, I would expect:
SELECT count(*)
FROM res_users u
WHERE u.active = true and
exists (SELECT 1 FROM res_users_log l WHERE l.create_uid = u.id);
Then indexes on res_users(active, id) and res_users_log(create_uid) would be optimal.
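A sketch of those two indexes (the index names are illustrative):

```sql
CREATE INDEX res_users_active_id_idx      ON res_users (active, id);
CREATE INDEX res_users_log_create_uid_idx ON res_users_log (create_uid);
```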