Postgres Materialize Node completing faster than its sub-node

I was analyzing a slow PostgreSQL query and noticed something that seemed quite odd to me (I'm new to analyzing queries in Postgres): the actual time for the Materialize node both starts and finishes before that of its sub-node.
->  Nested Loop  (cost=300.28..698.21 rows=1 width=54) (actual time=180.547..11022.591 rows=166 loops=1)
      Join Filter: (mytable1.category_id = mytable2.category_id)
      ->  Index Scan using mytable2_p_category_id on mytable2  (cost=0.00..3.48 rows=15 width=4) (actual time=0.012..0.037 rows=15 loops=1)
      ->  Materialize  (cost=300.28..694.51 rows=1 width=54) (actual time=12.036..734.653 rows=166 loops=15)
            ->  Nested Loop  (cost=300.28..694.50 rows=1 width=54) (actual time=180.520..11016.887 rows=166 loops=1)
Does anyone know when and where you might expect this to happen?
In case it's relevant, our Postgres server is running version 9.1.
Thanks

As Denis pointed out in the comments (so I can't give him the tick of approval), it seems the actual time for the Materialize node is best read as loops * actual time.
So in this example that would be:
Start - (12.036 * 15) = 180.54
End - (734.653 * 15) = 11019.795
Browsing online, I found other examples of looped Materialize nodes being reported this way as well.
So I guess the answer to the question of "when and where" is: almost always when your Materialize node is being looped over.
Unless someone who knows better chimes in, I'll make this the answer for now.

Related

PostgreSQL recheck performed even though there are no lossy blocks

I am running an explain (buffers, analyze, verbose) and I am getting this subresult:
->  Bitmap Heap Scan on public.d  (cost=109.92..8479.81 rows=5871 width=40) (actual time=1.334..29.942 rows=5306 loops=1)
      Output: d.id, d.pd, d.iid, d.dtid, d.bid
      Recheck Cond: ((d.sid = 100) AND (d.pd >= '2020-01-28 10:24:40.034+00'::timestamp with time zone) AND (d.pd <= '2020-04-28 10:24:40.034+00'::timestamp with time zone))
      Heap Blocks: exact=2014
      Buffers: shared hit=3 read=2035
      ->  Bitmap Index Scan on idx_d_didpd  (cost=0.00..108.45 rows=5871 width=0) (actual time=1.018..1.018 rows=5306 loops=1)
            Index Cond: ((d.sid = 100) AND (d.pd >= '2020-01-28 10:24:40.034+00'::timestamp with time zone) AND (d.pd <= '2020-04-28 10:24:40.034+00'::timestamp with time zone))
            Buffers: shared read=24
What I am wondering: in the whole result, the most "costly" parts are the ones performing the Bitmap Heap Scan (the other parts perform index scans and are pretty fast). But I've read that the recheck on a bitmap heap scan is only performed in case there are lossy blocks, which I cannot see here.
Can anyone tell me why this Heap Scan is performed?
Note that "Recheck Cond" is also present with just an EXPLAIN without the ANALYZE. The value of this field does not depend on what actually happened during execution. It is telling you what condition will be used in a potential recheck, it does not tell you how often the recheck was actually performed (which in your case was probably zero).
Can anyone tell me why this Heap Scan is performed?
The Bitmap Heap Scan is not there just to do a recheck; its main purpose is to fetch the data you asked for.
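For illustration: a plain EXPLAIN executes nothing, yet it already prints the Recheck Cond. A sketch with the query reconstructed from the plan above (the exact original query is an assumption):

-- Plain EXPLAIN: nothing is executed, yet the Bitmap Heap Scan node
-- already shows "Recheck Cond".
EXPLAIN
SELECT id, pd, iid, dtid, bid
FROM public.d
WHERE sid = 100
  AND pd >= '2020-01-28 10:24:40.034+00'
  AND pd <= '2020-04-28 10:24:40.034+00';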

Why does a pg query stop using an index after a while?

I have this query in Postgres 12.0:
SELECT "articles"."id"
FROM "articles"
WHERE ((jsonfields ->> 'etat') = '0' OR (jsonfields ->> 'etat') = '1' OR (jsonfields ->> 'etat') = '2')
ORDER BY ordre ASC;
At this time:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort  (cost=1274.09..1274.97 rows=354 width=8) (actual time=13.000..13.608 rows=10435 loops=1)
  Sort Key: ordre
  Sort Method: quicksort  Memory: 874kB
  ->  Bitmap Heap Scan on articles  (cost=15.81..1259.10 rows=354 width=8) (actual time=1.957..10.807 rows=10435 loops=1)
        Recheck Cond: (((jsonfields ->> 'etat'::text) = '1'::text) OR ((jsonfields ->> 'etat'::text) = '2'::text) OR ((jsonfields ->> 'etat'::text) = '0'::text))
        Heap Blocks: exact=6839
        ->  BitmapOr  (cost=15.81..15.81 rows=356 width=0) (actual time=1.171..1.171 rows=0 loops=1)
              ->  Bitmap Index Scan on myidx  (cost=0.00..5.18 rows=119 width=0) (actual time=0.226..0.227 rows=2110 loops=1)
                    Index Cond: ((jsonfields ->> 'etat'::text) = '1'::text)
              ->  Bitmap Index Scan on myidx  (cost=0.00..5.18 rows=119 width=0) (actual time=0.045..0.045 rows=259 loops=1)
                    Index Cond: ((jsonfields ->> 'etat'::text) = '2'::text)
              ->  Bitmap Index Scan on myidx  (cost=0.00..5.18 rows=119 width=0) (actual time=0.899..0.899 rows=8066 loops=1)
                    Index Cond: ((jsonfields ->> 'etat'::text) = '0'::text)
Planning Time: 0.382 ms
Execution Time: 14.234 ms
(15 rows)
After a while:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Sort  (cost=7044.04..7079.35 rows=14127 width=8) (actual time=613.445..614.679 rows=15442 loops=1)
  Sort Key: ordre
  Sort Method: quicksort  Memory: 1108kB
  ->  Seq Scan on articles  (cost=0.00..6070.25 rows=14127 width=8) (actual time=0.060..609.477 rows=15442 loops=1)
        Filter: (((jsonfields ->> 'etat'::text) = '1'::text) OR ((jsonfields ->> 'etat'::text) = '2'::text) OR ((jsonfields ->> 'etat'::text) = '3'::text))
        Rows Removed by Filter: 8288
Planning Time: 0.173 ms
Execution Time: 615.744 ms
(8 rows)
I need to re-create the index:
DROP INDEX myidx;
CREATE INDEX myidx ON articles ( (jsonfields->>'etat') );
Why? How can I fix this?
I tried decreasing memory to disable seqscans. It didn't work.
I tried running select pg_stat_reset();. It didn't work either.
pg_stat_reset() does not reset table statistics. It only resets counters (like how often an index was used); it has no effect on query plans.
To update table statistics, use ANALYZE (or VACUUM ANALYZE, while being at it). autovacuum should normally take care of this automatically.
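For the table from the question, that is:

ANALYZE articles;          -- refresh the planner's statistics
VACUUM ANALYZE articles;   -- same, plus dead-tuple cleanup while at it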
Your first query finds rows=10435, your second finds rows=15442. But Postgres expects rows=354 (!) for the first query and rows=14127 for the second. It largely under-estimates the number of result rows in the first, which favours indexes. So your first query was only fast by accident.
Table statistics have changed, and there may be table and index bloat. Most importantly, your cost settings are probably misleading. Consider a lower setting for random_page_cost (and possibly for cpu_index_tuple_cost and others).
Related:
Keep PostgreSQL from sometimes choosing a bad query plan
If recreating the index leads to a different query plan, the index may have been bloated. (A bloated index would also discourage Postgres from using it.) More aggressive autovacuum settings (generally, for just the table, or even for just the index) may help.
Also, expression indexes introduce additional statistics (the essential one on jsonfields->>'etat' in your case). Dropping the index drops those, too. And the new expression index starts out with empty statistics which are filled with the next manual ANALYZE or by autovacuum. So, typically, you should run ANALYZE on the table after creating an expression index - except that in your case you currently only seem to get the fast query when based on misleading stats, so fix that first.
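For example, following the re-creation from the question:

DROP INDEX myidx;
CREATE INDEX myidx ON articles ((jsonfields ->> 'etat'));
ANALYZE articles;  -- fill the fresh expression statistics right away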
Maybe revisit your database design. Does that etat value really have to be nested in a JSON column? It might be a lot cheaper overall to have it as a separate column.
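A sketch of what that migration might look like (the plain column name etat is an assumption; adapt to your schema):

-- One-off migration: promote the JSON field to a plain column.
ALTER TABLE articles ADD COLUMN etat text;
UPDATE articles SET etat = jsonfields ->> 'etat';
CREATE INDEX articles_etat_idx ON articles (etat);
ANALYZE articles;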
Be that as it may, the most expensive part of your first (fast) query plan is the Bitmap Heap Scan, where Postgres reads actual data pages to return id values. A shortcut with a "covering" index has been possible since Postgres 11:
CREATE INDEX myidx ON articles ((jsonfields->>'etat')) INCLUDE (ordre, id);
But this relies even more on autovacuum doing its job in a timely manner, as it requires the visibility map to be up to date.
Or, if your WHERE clause is constant (always filtering for (jsonfields ->> 'etat') = ANY ('{0,1,2}')), a partial index would reign supreme:
CREATE INDEX myidx ON articles (ordre, id)
WHERE (jsonfields ->> 'etat') = ANY ('{0,1,2}');
Immediately after you create the functional index, it doesn't have any statistics gathered on it, so PostgreSQL must make some generic assumptions. Once auto-analyze has had a chance to run, it has real stats to work with. Now it turns out the more accurate estimates actually lead to a worse plan, which is rather unfortunate.
The PostgreSQL planner generally assumes much of our data is not in cache. This assumption pushes it to favor seq scans over index scans when a query will return a large number of rows (your second plan is returning 2/3 of the table!). The reason it makes this assumption is that it is safer: assuming too little data is cached leads to merely bad plans, but assuming too much is cached leads to utterly catastrophic plans.
In general, the amount of data assumed to be cached is baked into the random_page_cost setting, so you can tweak that setting if you want. (Baking it into that setting, rather than having a separate setting, was a poor design decision in my opinion, but it was made a very long time ago.)
You could set random_page_cost equal to seq_page_cost, to see if that solves the problem. But that is probably not a change you would want to make permanently, as it is likely to create more problems than it solves. Perhaps the correct setting is lower than the default but still higher than seq_page_cost. You should also do EXPLAIN (ANALYZE, BUFFERS), and set track_io_timing = on, to give you more information to use in evaluating this.
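For instance, as a session-level experiment (the values are illustrative, not recommendations), using the query from the question:

SET random_page_cost = 1.0;  -- temporarily equal to the default seq_page_cost
SET track_io_timing = on;    -- superuser-only in some releases
EXPLAIN (ANALYZE, BUFFERS)
SELECT "articles"."id"
FROM "articles"
WHERE ((jsonfields ->> 'etat') = '0' OR (jsonfields ->> 'etat') = '1' OR (jsonfields ->> 'etat') = '2')
ORDER BY ordre ASC;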
Another issue is that the bitmap heap scan never needs to consult the actual JSON data; it gets all the data it needs from the index. The seq scan needs to consult the JSON data, and how slow that is depends on things like what type it is (json or jsonb) and how much other stuff is in that JSON. PostgreSQL rather ridiculously thinks that parsing a JSON document takes about the same amount of time as comparing two integers does.
You can more or less fix this problem (for json type) by running the following statement:
update pg_proc set procost = 100 where proname = 'json_object_field_text';
(This is imperfect, as the cost of this function gets charged to the recheck condition of the bitmap heap scan even though no recheck is done. But the recheck is charged for each tuple expected to be returned, not each tuple expected to be in the table, so this creates a distinction you can take advantage of).

Why are PostgreSQL queries slower in the first request after a new connection than during subsequent requests?

Why are PostgreSQL queries slower in the first request after a new connection than during subsequent requests?
We use several different technologies to connect to a PostgreSQL database. The first request might take 1.5 seconds; the exact same query takes 0.03 seconds the second time. If I open a second instance of my application (connecting to the same database), that first request takes 1.5 seconds and the second 0.03 seconds.
Because of the different technologies we are using, they connect at different points and use different connection methods, so I really don't think it has anything to do with any code I have written.
I'm thinking that opening a connection doesn't do 'everything' until the first request, so that request has some overhead.
Because I have used the database and kept the server up, everything is in memory, so indexes and the like should not be an issue.
Edit: EXPLAIN tells me about the query, and honestly the query looks pretty good (indexed, etc.). I really think PostgreSQL has some kind of overhead on the first query of a new connection.
I don't know how to prove/disprove that. If I use pgAdmin III (version 1.12.3), all the queries seem fast. With any of the other tools I have, the first query is slow. Most of the time it's not noticeably slower, and if it was I always chalked it up to loading the index into RAM. But this is clearly NOT that. If I open my tool(s) and run any other query that returns results, the second query is fast regardless. If the first query doesn't return results, then the second is still slow and the third is fast.
Edit 2: Even though I don't think the query has anything to do with the delay (every first query is slow), here are two results from running EXPLAIN ANALYZE:
EXPLAIN ANALYZE
select * from company
where company_id = 39
Output:
"Seq Scan on company (cost=0.00..1.26 rows=1 width=54) (actual time=0.037..0.039 rows=1 loops=1)"
" Filter: (company_id = 39)"
"Total runtime: 0.085 ms"
and:
EXPLAIN ANALYZE
select * from group_devices
where device_name ilike 'html5_demo'
and group_id in ( select group_id from manager_groups
where company_id in (select company_id from company where company_name ='TRUTHPT'))
output:
"Nested Loop Semi Join (cost=1.26..45.12 rows=1 width=115) (actual time=1.947..2.457 rows=1 loops=1)"
" Join Filter: (group_devices.group_id = manager_groups.group_id)"
" -> Seq Scan on group_devices (cost=0.00..38.00 rows=1 width=115) (actual time=0.261..0.768 rows=1 loops=1)"
" Filter: ((device_name)::text ~~* 'html5_demo'::text)"
" -> Hash Semi Join (cost=1.26..7.09 rows=9 width=4) (actual time=0.297..1.596 rows=46 loops=1)"
" Hash Cond: (manager_groups.company_id = company.company_id)"
" -> Seq Scan on manager_groups (cost=0.00..5.53 rows=509 width=8) (actual time=0.003..0.676 rows=469 loops=1)"
" -> Hash (cost=1.26..1.26 rows=1 width=4) (actual time=0.035..0.035 rows=1 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 1kB"
" -> Seq Scan on company (cost=0.00..1.26 rows=1 width=4) (actual time=0.025..0.027 rows=1 loops=1)"
" Filter: ((company_name)::text = 'TRUTHPT'::text)"
"Total runtime: 2.566 ms"
I have observed the same behavior. If I start a new connection and run a query multiple times, the first execution is about 25% slower than the following executions. (The query had been run earlier in other connections, and I verified that there was no disk I/O involved.) I profiled the process with perf during the first query execution.
The profile showed that a lot of time is spent handling page faults; if I profile the second execution, there are no page faults. AFAICT, these are what are called minor/soft page faults, which happen the first time a process accesses a page that is in shared memory. At that point, the process needs to map the page into its virtual address space (see https://en.wikipedia.org/wiki/Page_fault). If the page needs to be read from disk, it is called a major/hard page fault.
This explanation also fits with other observations that I have made: If I later run a different query in the same connection, the amount of overhead for its first execution seems to depend on how much overlap there is with the data accessed by the first query.
This is a very old question, but hopefully this may help.
First query
It doesn't seem like an index is being used, and the optimizer is resorting to a sequential scan of the table.
While scanning the table sequentially, the optimizer may cache the entire table in RAM, if the data fits into the buffer. See this article for more information.
Why the buffering is occurring for each connection I don't know. Regardless, a sequential scan is not desirable for this kind of query and can be avoided with correct indexing and statistics.
Check the structure of the company table. Make sure that company_id is part of a UNIQUE INDEX or PRIMARY KEY.
Make sure you run ANALYZE, so that the optimizer has the correct statistics. This will help to ensure that the index for company will be used in your queries instead of a sequential scan of the table.
See PostgreSQL documentation
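Concretely, something like this (whether company_id is already constrained is an assumption; check your schema first):

ALTER TABLE company ADD PRIMARY KEY (company_id);  -- skip if a PK/unique index exists
ANALYZE company;                                   -- refresh optimizer statistics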
Second query
Try using INNER JOIN to keep the optimizer from selecting a Hash Semi Join, to get more consistent performance and a simpler EXPLAIN plan:
select gd.*
from group_devices gd
inner join manager_groups mg on mg.group_id = gd.group_id
inner join company c on c.company_id = mg.company_id
where gd.device_name like 'html5_demo%'
and c.company_name = 'TRUTHPT';
See related question
The first request will read blocks from disk into buffers.
The second request will read from buffers.
It doesn't matter how many connections are made; the result depends on whether that query has already been parsed.
Please note that changing literals will cause the query to be re-parsed.
Also note that if the query hasn't been executed in a while, physical reads may still occur, depending on many variables.
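If you want to verify the buffering claim directly, the contrib extension pg_buffercache can count how many of a table's pages currently sit in shared buffers; a sketch using the company table from the question:

CREATE EXTENSION IF NOT EXISTS pg_buffercache;
-- Count this table's pages currently held in shared_buffers:
SELECT count(*) AS buffered_pages
FROM pg_buffercache
WHERE relfilenode = pg_relation_filenode('company')
  AND reldatabase = (SELECT oid FROM pg_database
                     WHERE datname = current_database());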

Postgres on AWS performance issues on filter or aggregate

I'm working on a system which has a table with approx. 13 million records.
That does not appear to be a big deal for Postgres, but I'm facing serious performance issues when hitting this particular table.
The table has approx. 60 columns (I know it's too much, but I can't change it for reasons beyond my will).
Hardware isn't the problem. It's running on AWS. I tested several configurations, even the new RDS for Postgres:
             arch     vCPU   ECU    mem (GB)
m1.xlarge    64-bit   4      8      15
m2.xlarge    64-bit   2      6.5    17
hs1.8xlarge  64-bit   16     35     117 (SSD)
I tuned the pg settings with pgtune, and also set Ubuntu's kernel shmall and shmmax.
Some "explain analyze" queries:
select count(*) from the table:
Aggregate  (cost=350390.93..350390.94 rows=1 width=0) (actual time=24864.287..24864.288 rows=1 loops=1)
  ->  Index Only Scan using core_eleitor_sexo on core_eleitor  (cost=0.00..319722.17 rows=12267505 width=0) (actual time=0.019..12805.292 rows=12267505 loops=1)
        Heap Fetches: 9676
Total runtime: 24864.325 ms
select distinct core_eleitor_city from the table:
HashAggregate  (cost=159341.37..159341.39 rows=2 width=516) (actual time=15965.740..15966.090 rows=329 loops=1)
  ->  Bitmap Heap Scan on core_eleitor  (cost=1433.53..159188.03 rows=61338 width=516) (actual time=956.590..9021.457 rows=5886110 loops=1)
        Recheck Cond: ((core_eleitor_city)::text = 'RIO DE JANEIRO'::text)
        ->  Bitmap Index Scan on core_eleitor_city  (cost=0.00..1418.19 rows=61338 width=0) (actual time=827.473..827.473 rows=5886110 loops=1)
              Index Cond: ((core_eleitor_city)::text = 'RIO DE JANEIRO'::text)
Total runtime: 15977.460 ms
I have btree indexes on the columns frequently used for filters or aggregations.
So, given that I can't change my table design: is there something I can do to improve the performance?
Any help would be awesome.
Thanks
You're aggregating ~12.3M and ~5.9M rows on a VPS cluster that, if I'm not mistaken, might span multiple physical servers, with data that is probably pulled from a SAN on yet another set of servers, different from the ones running Postgres itself.
Imho, there's little you can do to make it faster (on AWS, anyway), besides a) not running queries that basically visit the entire table to begin with, and b) maintaining a pre-computed count using triggers if you persist in doing so (as sketched below).
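A minimal sketch of such a trigger-maintained count, assuming an exact, always-current count of core_eleitor is what you need (all object names besides core_eleitor are illustrative):

-- Single-row counter table, seeded once:
CREATE TABLE core_eleitor_count AS SELECT count(*) AS n FROM core_eleitor;

CREATE OR REPLACE FUNCTION core_eleitor_count_trg() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE core_eleitor_count SET n = n + 1;
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE core_eleitor_count SET n = n - 1;
  END IF;
  RETURN NULL;  -- AFTER ROW trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

-- Caveat: concurrent writers now serialize on the single counter row.
CREATE TRIGGER core_eleitor_count_trg
AFTER INSERT OR DELETE ON core_eleitor
FOR EACH ROW EXECUTE PROCEDURE core_eleitor_count_trg();

-- SELECT n FROM core_eleitor_count;  -- instead of count(*) over ~12M rows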
For improving performance on RDS, see the read replica documentation linked here:
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html
Amazon RDS uses MySQL’s built-in replication functionality to create a special type of DB instance called a read replica from a source DB instance. Updates made to the source DB instance are copied to the read replica. You can reduce the load on your source DB instance by routing read queries from your applications to the read replica. Read replicas allow you to elastically scale out beyond the capacity constraints of a single DB instance for read-heavy database workloads.

PostgreSQL query not using index in production

I'm noticing something strange/weird:
The exact same query in development and production is not using the same query plan. In particular, the development version uses indexes which are omitted in production (in favor of a seqscan).
The only real difference is that the dataset in production is significantly larger: the index size is 1034 MB in production, vs 29 MB in development. Would PostgreSQL abstain from using indexes if they (or the table) are too big?
EDIT: EXPLAIN ANALYZE for both queries:
Development:
Limit  (cost=41638.15..41638.20 rows=20 width=154) (actual time=159.576..159.581 rows=20 loops=1)
  ->  Sort  (cost=41638.15..41675.10 rows=14779 width=154) (actual time=159.575..159.577 rows=20 loops=1)
        Sort Key: (sum(scenario_ad_group_performances.clicks))
        Sort Method: top-N heapsort  Memory: 35kB
        ->  GroupAggregate  (cost=0.00..41244.89 rows=14779 width=154) (actual time=0.040..151.535 rows=14197 loops=1)
              ->  Nested Loop Left Join  (cost=0.00..31843.75 rows=93800 width=154) (actual time=0.022..82.509 rows=50059 loops=1)
                    ->  Merge Left Join  (cost=0.00..4203.46 rows=14779 width=118) (actual time=0.017..27.103 rows=14197 loops=1)
                          Merge Cond: (scenario_ad_groups.id = scenario_ad_group_vendor_instances.ad_group_id)
                          ->  Index Scan using scenario_ad_groups_pkey on scenario_ad_groups  (cost=0.00..2227.06 rows=14779 width=114) (actual time=0.009..12.085 rows=14197 loops=1)
                                Filter: (scenario_id = 22)
                          ->  Index Scan using index_scenario_ad_group_vendor_instances_on_ad_group_id on scenario_ad_group_vendor_instances  (cost=0.00..1737.02 rows=27447 width=8) (actual time=0.007..7.021 rows=16528 loops=1)
                                Filter: (ad_vendor_id = ANY ('{1,2,3}'::integer[]))
                    ->  Index Scan using index_ad_group_performances_on_vendor_instance_id_and_date on scenario_ad_group_performances  (cost=0.00..1.73 rows=11 width=44) (actual time=0.002..0.003 rows=3 loops=14197)
                          Index Cond: ((vendor_instance_id = scenario_ad_group_vendor_instances.id) AND (date >= '2012-02-01'::date) AND (date <= '2012-02-28'::date))
Total runtime: 159.710 ms
Production:
Limit  (cost=822401.35..822401.40 rows=20 width=179) (actual time=21279.547..21279.591 rows=20 loops=1)
  ->  Sort  (cost=822401.35..822488.42 rows=34828 width=179) (actual time=21279.543..21279.560 rows=20 loops=1)
        Sort Key: (sum(scenario_ad_group_performances.clicks))
        Sort Method: top-N heapsort  Memory: 33kB
        ->  GroupAggregate  (cost=775502.60..821474.59 rows=34828 width=179) (actual time=19126.783..21226.772 rows=34495 loops=1)
              ->  Sort  (cost=775502.60..776739.48 rows=494751 width=179) (actual time=19125.902..19884.164 rows=675253 loops=1)
                    Sort Key: scenario_ad_groups.id
                    Sort Method: external merge  Disk: 94200kB
                    ->  Hash Right Join  (cost=25743.86..596796.70 rows=494751 width=179) (actual time=1155.491..16720.460 rows=675253 loops=1)
                          Hash Cond: (scenario_ad_group_performances.vendor_instance_id = scenario_ad_group_vendor_instances.id)
                          ->  Seq Scan on scenario_ad_group_performances  (cost=0.00..476354.29 rows=4158678 width=44) (actual time=0.043..8949.640 rows=4307019 loops=1)
                                Filter: ((date >= '2012-02-01'::date) AND (date <= '2012-02-28'::date))
                          ->  Hash  (cost=24047.72..24047.72 rows=51371 width=143) (actual time=1123.896..1123.896 rows=34495 loops=1)
                                Buckets: 1024  Batches: 16  Memory Usage: 392kB
                                ->  Hash Right Join  (cost=6625.90..24047.72 rows=51371 width=143) (actual time=92.257..1070.786 rows=34495 loops=1)
                                      Hash Cond: (scenario_ad_group_vendor_instances.ad_group_id = scenario_ad_groups.id)
                                      ->  Seq Scan on scenario_ad_group_vendor_instances  (cost=0.00..11336.31 rows=317174 width=8) (actual time=0.020..451.496 rows=431770 loops=1)
                                            Filter: (ad_vendor_id = ANY ('{1,2,3}'::integer[]))
                                      ->  Hash  (cost=5475.55..5475.55 rows=34828 width=139) (actual time=88.311..88.311 rows=34495 loops=1)
                                            Buckets: 1024  Batches: 8  Memory Usage: 726kB
                                            ->  Bitmap Heap Scan on scenario_ad_groups  (cost=798.20..5475.55 rows=34828 width=139) (actual time=4.451..44.065 rows=34495 loops=1)
                                                  Recheck Cond: (scenario_id = 276)
                                                  ->  Bitmap Index Scan on index_scenario_ad_groups_on_scenario_id  (cost=0.00..789.49 rows=34828 width=0) (actual time=4.232..4.232 rows=37006 loops=1)
                                                        Index Cond: (scenario_id = 276)
Total runtime: 21306.697 ms
Disclaimer
I have used PostgreSQL very little. I'm answering based on my knowledge of SQL Server index usage and execution plans. I ask the PostgreSQL gods for mercy if I get something wrong.
Query Optimizers are Dynamic
You said your query plan has changed from your development to production environments. This is to be expected. Query optimizers are designed to generate the optimum execution plan based on the current data conditions. Under different conditions the optimizer may decide it is more efficient to use a table scan vs an index scan.
When would it be more efficient to use a table scan vs an index scan?
SELECT A, B
FROM someTable
WHERE A = 'SOME VALUE'
Let's say you have a non-clustered index on column A. Since you are filtering on column A, the query could potentially take advantage of the index. This would be efficient if the index is selective enough; basically: how many distinct values make up the index? The database keeps statistics on this selectivity and uses them when calculating costs for execution plans.
If you have a million rows in a table, but only 10 possible values for A, then your query would likely return about 100K rows. Because the index is non-clustered, and you are returning a column not included in the index (B), a lookup needs to be performed for each row returned. These look-ups are random-access lookups, which are much more expensive than the sequential reads used by a table scan. At a certain point it becomes more efficient for the database to just perform a table scan rather than an index scan.
This is just one scenario, there are many others. It's hard to know without knowing more about what your data is like, what your indexes look like and how you are trying to access the data.
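That said, in PostgreSQL you can inspect the selectivity statistics the planner keeps; a small sketch (sometable and column a are the hypothetical names from the example above):

-- The planner's per-column statistics live in the pg_stats view:
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'sometable' AND attname = 'a';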
To answer the original question:
Would PostgreSQL abstain from using indexes if they (or the table) are too big? No. It is more likely that, given the way you are accessing the data, it is less efficient for PostgreSQL to use the index than to use a table scan.
The PostgreSQL FAQ touches on this very subject (see: Why are my queries slow? Why don't they use my indexes?): https://wiki.postgresql.org/wiki/FAQ#Why_are_my_queries_slow.3F_Why_don.27t_they_use_my_indexes.3F
Postgres' query optimizer comes up with multiple scenarios (e.g. index vs seq-scan) and evaluates them using statistical information about your tables and the relative costs of disk/memory/index/table access set in configuration.
Did you use the EXPLAIN command to see why index use was omitted? Did you use EXPLAIN ANALYZE to find out if the decision was in error? Can we see the outputs, please?
Edit:
As hard as analyzing two singular queries on different systems is, I think I see a couple of things.
The production environment has an actual/cost rate of around 20-100 milliseconds per cost unit. I'm not even a DBA, but this seems consistent. The development environment has 261 for the main query. Does this seem right? Would you expect the raw speed (memory/disk/CPU) of the production environment to be 2-10x faster than dev?
Since the production environment has a much more complex query plan, it looks like it's doing its job. Undoubtedly, the dev environment's plan and many more have been considered, and deemed too costly. And the 20-100 variance isn't that much in my experience (but again, not a DBA) and shows that there isn't anything way off the mark. Still, you may want to run a VACUUM on the DB just in case.
I'm not experienced and patient enough to decode the full query, but could there be a denormalization/NOSQL-ization point for optimization?
The biggest bottleneck seems to be the disk merge at ~94 MB. If the production environment has enough memory, you may want to increase the relevant setting (working memory?) to do it in memory. It seems to be the work_mem parameter here, though you'll want to read through the rest.
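A session-level sketch (the value is illustrative; size it to your workload and available RAM):

SET work_mem = '128MB';  -- needs to exceed the ~94 MB the sort spills to disk
-- Re-run EXPLAIN ANALYZE: "Sort Method: external merge  Disk: ..." should
-- become an in-memory sort method if the setting is high enough.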
I'd also suggest having a look at the index usage statistics. Many options with partial and functional indices exist.
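For the index usage statistics, something like:

-- idx_scan shows how often each index has actually been used:
SELECT relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan;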
It seems to me that your dev data is much "simpler" than the production data. As an example:
Development:
-> Index Scan using index_scenario_ad_group_vendor_instances_on_ad_group_id on scenario_ad_group_vendor_instances
(cost=0.00..1737.02 rows=27447 width=8)
(actual time=0.007..7.021 rows=16528 loops=1)
Filter: (ad_vendor_id = ANY ('{1,2,3}'::integer[]))
Production:
-> Seq Scan on scenario_ad_group_vendor_instances
(cost=0.00..11336.31 rows=317174 width=8)
(actual time=0.020..451.496 rows=431770 loops=1)
Filter: (ad_vendor_id = ANY ('{1,2,3}'::integer[]))
This means that in dev, 27447 matching rows were estimated upfront and 16528 rows were indeed found. That's the same ballpark, and OK.
In production, 317174 matching rows were estimated upfront and 431770 rows were found. Also OK.
But comparing dev to prod, the numbers are 10 times apart. As other answers already indicate, doing 10 times more random seeks (due to index access) might indeed be worse than a plain table scan.
Hence the interesting question is: How many rows does this table contain both in dev and in prod? Is number_returned_rows / number_total_rows comparable between dev and prod?
Edit: Don't forget, I picked one index access as an example. A quick glance shows that the other index accesses have the same symptoms.
Try
SET enable_seqscan TO 'off';
before running EXPLAIN ANALYZE.
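That is, as a session-local experiment (never a permanent setting):

SET enable_seqscan TO 'off';      -- makes seq scans look prohibitively expensive
-- EXPLAIN ANALYZE <your query>;  -- compare this plan against the default one
RESET enable_seqscan;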