I created an index on a nested JSONB field:
CREATE INDEX foo_idx ON some_table(cast(content->'meta'->>'version' AS int));
but the select query still does a full table scan:
select *
from some_table
where (content->'meta'->>'version')::INT <= 9000
LIMIT 1;
I also tried to express the query like this:
select *
from some_table
where cast(content->'meta'->>'version' AS INT) <= 9000
LIMIT 1;
with the same result.
Query plan:
Limit (cost=0.00..1.06 rows=10 width=52)
-> Seq Scan on some_table (cost=0.00..38429.27 rows=361441 width=52)
Filter: ((((content -> 'meta'::text) ->> 'version'::text))::integer <= 9000)
What am I missing here?
Edit: It was more of a coincidence. I added an ORDER BY random() to the query and got the following query plan:
Limit (cost=31644.83..31644.83 rows=1 width=52) (actual time=0.017..0.017 rows=0 loops=1)
-> Sort (cost=31644.83..32548.43 rows=361441 width=52) (actual time=0.016..0.016 rows=0 loops=1)
Sort Key: (random())
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on game_object_user (cost=6769.60..29837.62 rows=361441 width=52) (actual time=0.011..0.011 rows=0 loops=1)
Recheck Cond: ((((content -> 'meta'::text) ->> 'version'::text))::integer < 9000)
-> Bitmap Index Scan on foo_idx (cost=0.00..6679.23 rows=361441 width=0) (actual time=0.009..0.009 rows=0 loops=1)
Index Cond: ((((content -> 'meta'::text) ->> 'version'::text))::integer < 90000)
Planning time: 0.074 ms
Execution time: 0.040 ms
The index was used.
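(A side note for anyone debugging the same thing: a quick way to check whether the planner is able to use such an expression index at all is to temporarily discourage sequential scans for the current session. This is a diagnostic sketch only, not something to leave enabled.)
SET enable_seqscan = off;  -- session-local: makes sequential scans look very expensive to the planner
EXPLAIN
select *
from some_table
where (content->'meta'->>'version')::INT <= 9000
LIMIT 1;
RESET enable_seqscan;      -- restore the default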
I'm trying to get a count of 'visit' records created by joining values from 3 tables. I have a simple query below but it takes almost 30 min on the db. Is there a way to optimize this query any further?
select a."ClientID" as ClientID, b."ProviderID" as ProviderID, count(1) as VisitCount
from "Log" c
inner join "MessageDetail" b on c."MessageDetailID" = b."MessageDetailID"
inner join "Message" a on a."MessageID" = b."MessageID"
where a."CreatedUTCDate" >= NOW() - INTERVAL '1 HOUR'
group by a."ClientID", b."ProviderID"
Example Result
ClientID ProviderID VisitCount
3245cf64-test-4d05-9d5d-345653566455 677777 1
3245cf64-test-4d05-9d5d-345653566455 677777 1
0284a326-test-4757-b00e-34563465dfgg 9999 5
Explain plan
GroupAggregate (cost=6529150.62..6529160.28 rows=483 width=48)
Group Key: a."ClientID", b."ProviderID"
-> Sort (cost=6529150.62..6529151.83 rows=483 width=40)
Sort Key: a."ClientID", b."ProviderID"
-> Nested Loop (cost=1.00..6529129.09 rows=483 width=40)
-> Nested Loop (cost=0.56..6509867.54 rows=3924 width=48)
-> Seq Scan on "Message" a (cost=0.00..6274917.96 rows=3089 width=44)
Filter: ("CreatedUTCDate" >= (now() - '01:00:00'::interval))
-> Index Scan using "ix_MessageDetail_MessageId" on "MessageDetail" b (cost=0.56..75.40 rows=66 width=20)
Index Cond: ("MessageID" = a."MessageID")
-> Index Only Scan using "ix_Log_MessageDetailId" on "Log" c (cost=0.43..4.90 rows=1 width=8)
Index Cond: ("MessageDetailID" = b."MessageDetailID")
Explain Analyze Plan
GroupAggregate (cost=6529127.35..6529137.01 rows=483 width=48) (actual time=791639.382..791661.555 rows=118 loops=1)
Group Key: a."ClientID", b."ProviderID"
-> Sort (cost=6529127.35..6529128.56 rows=483 width=40) (actual time=791639.373..791649.235 rows=64412 loops=1)
Sort Key: a."ClientID", b."ProviderID"
Sort Method: external merge Disk: 3400kB
-> Nested Loop (cost=1.00..6529105.82 rows=483 width=40) (actual time=25178.920..791410.769 rows=64412 loops=1)
-> Nested Loop (cost=0.56..6509844.55 rows=3924 width=48) (actual time=25178.874..790954.577 rows=65760 loops=1)
-> Seq Scan on "Message" a (cost=0.00..6274894.96 rows=3089 width=44) (actual time=25178.799..790477.178 rows=25121 loops=1)
Filter: ("CreatedUTCDate" >= (now() - '01:00:00'::interval))
Rows Removed by Filter: 30839080
-> Index Scan using "ix_MessageDetail_MessageId" on "MessageDetail" b (cost=0.56..75.40 rows=66 width=20) (actual time=0.009..0.016 rows=3 loops=25121)
Index Cond: ("MessageID" = a."MessageID")
-> Index Only Scan using "ix_Log_MessageDetailId" on "Log" c (cost=0.43..4.90 rows=1 width=8) (actual time=0.005..0.006 rows=1 loops=65760)
Index Cond: ("MessageDetailID" = b."MessageDetailID")
Heap Fetches: 65590
Planning time: 38.501 ms
Execution time: 791662.728 ms
This part of the execution plan
-> Seq Scan on "Message" a (...) (actual time=25178.799..790477.178 rows=25121 loops=1)
Filter: ("CreatedUTCDate" >= (now() - '01:00:00'::interval))
Rows Removed by Filter: 30839080
proves that an index on "CreatedUTCDate" would speed up this query quite a lot:
almost the complete execution time is spent in this sequential scan
you scan over 30 million rows to find 25000, so the filter condition is highly selective
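Written out as DDL, a sketch of that suggested index (the index name is illustrative; the quoting matches the CamelCase identifiers used in the query):
CREATE INDEX "ix_Message_CreatedUTCDate" ON "Message" ("CreatedUTCDate");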
I have a SQL view (CREATE OR REPLACE VIEW table_view AS ...) that has 20 columns, with a mixture of text, integer, boolean and integer array fields. In total, there are nine columns that are integer arrays.
This view works very well the overwhelming majority of the time. However, when a row shows up with ~50 total values across the integer array columns, the query slows to a crawl. These columns are not being queried against; they are simply part of the result set. Still, the query time goes from ~250ms to ~7s if any rows are returned that contain that many integer array values.
Here is an example query:
SELECT
"table_view"."id", -- integer
"table_view"."start", -- date with tz
"table_view"."end", -- date with tz
"table_view"."a_id", -- integer
"table_view"."c_id", -- integer
"table_view"."generator_id", -- integer
"table_view"."ci_id", -- integer
"table_view"."kind", -- integer
"table_view"."hidden", -- boolean
"table_view"."is_done", -- integer
"table_view"."title", -- text
"table_view"."c_type", -- integer
"table_view"."ci_title", -- integer
"table_view"."status", -- text
"table_view"."cs", -- integer array
"table_view"."cts", -- integer array
"table_view"."ts", -- integer array
"table_view"."tas", -- integer array
"table_view"."bs", -- integer array
"table_view"."ms", -- integer array
"table_view"."pcs", -- integer array
"table_view"."pts", -- integer array
"table_view"."ks" -- integer array
FROM
"table_view"
WHERE (
"table_view"."a_id" = 3289
AND (
"table_view"."cs" && ARRAY[28890, 21166, 28891, 29581, 29583, 22378, 22380, 22733, 28924, 28925, 28926, 28927, 28478, 41014]::integer[]
OR "table_view"."ms" && ARRAY[11125]::integer[]
) AND NOT (
"table_view"."hidden" = true
AND NOT (
"table_view"."ms" && ARRAY[11125]::integer[]
)
) AND "table_view"."end" >= '2019-02-03T00:00:00+00:00'::timestamptz
AND "table_view"."start" < '2019-04-15T00:00:00+00:00'::timestamptz
);
If none of the rows have a total of roughly 50 or more integer array values, the query time is ~50ms to 250ms.
Time: 50.417 ms
However, if one or more of the rows have a total of roughly 50 or more integer array values, the query time is ~6.5s to 7s.
Time: 6737.154 ms
I have obfuscated some of the column names in this query, but that shouldn't impact the debugging of this issue. I'm beating myself up over this. Nothing I have researched online mentions anything about column length being an issue in the resulting query.
Here is the output of the EXPLAIN ANALYZE for the "bad" query:
Subquery Scan on table_view (cost=173.16..173.29 rows=1 width=436) (actual time=648.470..800.577 rows=60 loops=1)
-> GroupAggregate (cost=173.16..173.28 rows=1 width=103) (actual time=648.468..800.552 rows=60 loops=1)
Group Key: dhci.id, wfciwf.id, dutm.account_id
Filter: (((array_append('{}'::integer[], dhci.cid) && '{28890,21166,28891,29581,29583,22378,22380,22733,28924,28925,28926,28927,28478,41014}'::integer[]) OR (array_agg(DISTINCT dhamti.member_id) && '{11125}'::integer[])) AND ((NOT dhci.hidden) OR (array_agg(DISTINCT dhamti.member_id) && '{11125}'::integer[])))
-> Sort (cost=173.16..173.17 rows=1 width=103) (actual time=647.727..685.155 rows=172000 loops=1)
Sort Key: dhci.id, wfciwf.id, dutm.account_id
Sort Method: external merge Disk: 16392kB
-> Nested Loop Left Join (cost=68.45..173.15 rows=1 width=103) (actual time=1.033..478.844 rows=172000 loops=1)
-> Nested Loop Left Join (cost=68.16..171.71 rows=1 width=99) (actual time=1.017..308.168 rows=172000 loops=1)
-> Nested Loop Left Join (cost=67.87..170.20 rows=1 width=95) (actual time=0.997..126.331 rows=172000 loops=1)
-> Nested Loop Left Join (cost=67.58..168.77 rows=1 width=91) (actual time=0.978..26.478 rows=33400 loops=1)
-> Nested Loop Left Join (cost=67.16..165.68 rows=1 width=87) (actual time=0.960..4.997 rows=3700 loops=1)
-> Nested Loop Left Join (cost=66.73..162.71 rows=1 width=83) (actual time=0.949..2.572 rows=410 loops=1)
-> Nested Loop Left Join (cost=66.44..161.59 rows=1 width=79) (actual time=0.939..2.121 rows=60 loops=1)
-> Nested Loop (cost=66.02..154.64 rows=1 width=75) (actual time=0.926..1.818 rows=60 loops=1)
-> Nested Loop (cost=65.59..148.07 rows=1 width=66) (actual time=0.902..1.446 rows=60 loops=1)
-> Index Scan using dutm_8a089c2a on dutm (cost=0.29..9.40 rows=2 width=8) (actual time=0.017..0.028 rows=9 loops=1)
Index Cond: (account_id = 3289)
-> Bitmap Heap Scan on dhci (cost=65.31..69.33 rows=1 width=66) (actual time=0.137..0.148 rows=7 loops=9)
Recheck Cond: ((owner_member_id = dutm.id) AND (deadline IS NOT NULL) AND (deadline >= '2019-02-03 00:00:00+00'::timestamp with time zone) AND (deadline < '2019-04-15 00:00:00+00'::timestamp with time zone))
Heap Blocks: exact=36
-> BitmapAnd (cost=65.31..65.31 rows=1 width=0) (actual time=0.135..0.135 rows=0 loops=9)
-> Bitmap Index Scan on dhci_9adb17cb (cost=0.00..5.79 rows=182 width=0) (actual time=0.036..0.036 rows=167 loops=9)
Index Cond: (owner_member_id = dutm.id)
-> Bitmap Index Scan on dhci_deadline_ba8bd3addbfe7dc_uniq (cost=0.00..58.66 rows=2419 width=0) (actual time=0.263..0.263 rows=1727 loops=3)
Index Cond: ((deadline IS NOT NULL) AND (deadline >= '2019-02-03 00:00:00+00'::timestamp with time zone) AND (deadline < '2019-04-15 00:00:00+00'::timestamp with time zone))
-> Index Scan using wfciwf_pkey on wfciwf (cost=0.42..6.55 rows=1 width=13) (actual time=0.005..0.005 rows=1 loops=60)
Index Cond: (id = dhci.workflow_state_id)
Filter: (((current_status)::text <> 'killed'::text) AND ((current_status)::text <> 'parked'::text))
-> Index Scan using dhamti_d7bbcb82 on dhamti (cost=0.42..6.93 rows=2 width=8) (actual time=0.003..0.004 rows=1 loops=60)
Index Cond: (content_item_id = dhci.id)
-> Index Scan using dhcibs_6123fe8a on dhcibs (cost=0.29..1.10 rows=2 width=8) (actual time=0.003..0.005 rows=7 loops=60)
Index Cond: (ciid = dhci.id)
-> Index Scan using dhcit_6123fe8a on dhcit (cost=0.42..2.95 rows=2 width=8) (actual time=0.002..0.004 rows=9 loops=410)
Index Cond: (ciid = dhci.id)
-> Index Scan using ghcita_6123fe8a on ghcita (cost=0.42..3.07 rows=2 width=8) (actual time=0.002..0.003 rows=9 loops=3700)
Index Cond: (ciid = dhci.id)
-> Index Scan using dhcipc_6123fe8a on dhcipc (cost=0.29..1.41 rows=2 width=8) (actual time=0.001..0.002 rows=5 loops=33400)
Index Cond: (ciid = dhci.id)
-> Index Scan using dhcipt_6123fe8a on dhcipt (cost=0.29..1.49 rows=2 width=8) (actual time=0.001..0.001 rows=0 loops=172000)
Index Cond: (ciid = dhci.id)
-> Index Scan using dhcik_6123fe8a on dhcik (cost=0.29..1.41 rows=3 width=8) (actual time=0.001..0.001 rows=0 loops=172000)
Index Cond: (ciid = dhci.id)
Planning time: 8.219 ms
Execution time: 804.161 ms
(45 rows)
I've been optimizing some SQL queries against a production database clone. Here is an example query where I've created two indexes that let us run really fast index-only scans using a hash join.
explain analyse
select activity.id from activity, notification
where notification.user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
and notification.received = false
and notification.invalid = false
and activity.id = notification.activity_id
and activity.space_id = 'e12b42ac-4e54-476f-a4f5-7d6bdb1e61e2'
order by activity.end_time desc
limit 21;
Limit (cost=985.58..985.58 rows=1 width=24) (actual time=0.017..0.017 rows=0 loops=1)
-> Sort (cost=985.58..985.58 rows=1 width=24) (actual time=0.016..0.016 rows=0 loops=1)
Sort Key: activity.end_time DESC
Sort Method: quicksort Memory: 25kB
-> Hash Join (cost=649.76..985.57 rows=1 width=24) (actual time=0.010..0.010 rows=0 loops=1)
Hash Cond: (notification.activity_id = activity.id)
-> Index Only Scan using unreceived_notifications_index on notification (cost=0.42..334.62 rows=127 width=16) (actual time=0.009..0.009 rows=0 loops=1)
Index Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
Heap Fetches: 0
-> Hash (cost=634.00..634.00 rows=1227 width=24) (never executed)
-> Index Only Scan using space_activity_index on activity (cost=0.56..634.00 rows=1227 width=24) (never executed)
Index Cond: (space_id = 'e12b42ac-4e54-476f-a4f5-7d6bdb1e61e2'::uuid)
Heap Fetches: 0
Planning time: 0.299 ms
Execution time: 0.046 ms
And here are the indexes.
create index unreceived_notifications_index on notification using btree (
user_id,
activity_id, -- index-only scan
id -- index-only scan
) where (
invalid = false
and received = false
);
space_activity_index
create index space_activity_index on activity using btree (
space_id,
end_time desc,
id -- index-only scan
);
However, I'm noticing that these indexes are making our development database a LOT slower. Here's the same query against a user in our development database; you'll notice it's using a nested loop join this time, and the order of the loops is really inefficient.
explain analyse
select notification.id from notification, activity
where notification.user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'
and notification.received = false
and notification.invalid = false
and activity.id = notification.activity_id
and activity.space_id = '415fc269-e68f-4da0-b3e3-b1273b741a7f'
order by activity.end_time desc
limit 20;
Limit (cost=0.69..272.04 rows=20 width=24) (actual time=277.255..277.255 rows=0 loops=1)
-> Nested Loop (cost=0.69..71487.55 rows=5269 width=24) (actual time=277.253..277.253 rows=0 loops=1)
-> Index Only Scan using space_activity_index on activity (cost=0.42..15600.36 rows=155594 width=24) (actual time=0.016..59.433 rows=155666 loops=1)
Index Cond: (space_id = '415fc269-e68f-4da0-b3e3-b1273b741a7f'::uuid)
Heap Fetches: 38361
-> Index Only Scan using unreceived_notifications_index on notification (cost=0.27..0.35 rows=1 width=32) (actual time=0.001..0.001 rows=0 loops=155666)
Index Cond: ((user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'::uuid) AND (activity_id = activity.id))
Heap Fetches: 0
Planning time: 0.351 ms
Execution time: 277.286 ms
One thing to note here is that there are only 2 space_ids in our development database. I suspect this is causing Postgres to try to be clever, but it's actually making performance worse!
My question is:
Is there some way that I can force Postgres to run the hash join instead of the nested loop join?
Is there some way, in general, that I can make Postgres's query-planner more deterministic? Ideally, the query performance characteristics would be the exact same between these environments.
Thanks.
Edit: Note that when I leave out the space_id condition when querying my dev database, the result is faster.
explain analyse
select notification.id from notification, activity
where notification.user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'
and notification.received = false
and notification.invalid = false
and activity.id = notification.activity_id
--and activity.space_id = '415fc269-e68f-4da0-b3e3-b1273b741a7f'
order by activity.end_time desc
limit 20;
Limit (cost=17628.13..17630.43 rows=20 width=24) (actual time=2.730..2.730 rows=0 loops=1)
-> Gather Merge (cost=17628.13..17996.01 rows=3199 width=24) (actual time=2.729..2.729 rows=0 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Sort (cost=16628.12..16636.12 rows=3199 width=24) (actual time=0.126..0.126 rows=0 loops=2)
Sort Key: activity.end_time DESC
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=20.59..16441.88 rows=3199 width=24) (actual time=0.093..0.093 rows=0 loops=2)
-> Parallel Bitmap Heap Scan on notification (cost=20.17..2512.17 rows=3199 width=32) (actual time=0.092..0.092 rows=0 loops=2)
Recheck Cond: ((user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'::uuid) AND (NOT invalid) AND (NOT received))
-> Bitmap Index Scan on unreceived_notifications_index (cost=0.00..18.82 rows=5439 width=0) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: (user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'::uuid)
-> Index Scan using activity_pkey on activity (cost=0.42..4.35 rows=1 width=24) (never executed)
Index Cond: (id = notification.activity_id)
Planning time: 0.344 ms
Execution time: 3.433 ms
Edit: After reading about index hinting, I tried turning nested loops off using set enable_nestloop=false; and the query is way faster!
Limit (cost=20617.76..20620.09 rows=20 width=24) (actual time=2.872..2.872 rows=0 loops=1)
-> Gather Merge (cost=20617.76..21130.20 rows=4392 width=24) (actual time=2.871..2.871 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=19617.74..19623.23 rows=2196 width=24) (actual time=0.086..0.086 rows=0 loops=3)
Sort Key: activity.end_time DESC
Sort Method: quicksort Memory: 25kB
-> Hash Join (cost=2609.20..19495.85 rows=2196 width=24) (actual time=0.062..0.062 rows=0 loops=3)
Hash Cond: (activity.id = notification.activity_id)
-> Parallel Seq Scan on activity (cost=0.00..14514.57 rows=64831 width=24) (actual time=0.006..0.006 rows=1 loops=3)
Filter: (space_id = '415fc269-e68f-4da0-b3e3-b1273b741a7f'::uuid)
-> Hash (cost=2541.19..2541.19 rows=5441 width=32) (actual time=0.007..0.007 rows=0 loops=3)
Buckets: 8192 Batches: 1 Memory Usage: 64kB
-> Bitmap Heap Scan on notification (cost=20.18..2541.19 rows=5441 width=32) (actual time=0.006..0.006 rows=0 loops=3)
Recheck Cond: ((user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'::uuid) AND (NOT invalid) AND (NOT received))
-> Bitmap Index Scan on unreceived_notifications_index (cost=0.00..18.82 rows=5441 width=0) (actual time=0.004..0.004 rows=0 loops=3)
Index Cond: (user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'::uuid)
Planning time: 0.375 ms
Execution time: 3.630 ms
It depends on how specialized you want to get. PostgreSQL does not have plan guides the way SQL Server does, but there are planner settings (and extensions such as pg_hint_plan) that you can use to nudge queries toward specific plans. Query optimizers are strongly influenced by record counts in the choices they make, though. Maybe you add the extra indexes in the non-dev environment and move on?
https://docs.aws.amazon.com/dms/latest/sql-server-to-aurora-postgresql-migration-playbook/chap-sql-server-aurora-pg.tuning.queryplanning.html
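If you do need to pin down the plan for a single query, here is a minimal sketch building on the setting the asker already found. SET LOCAL scopes the change to one transaction, so the rest of the session keeps the default planner behaviour:
BEGIN;
SET LOCAL enable_nestloop = off;  -- lasts only until COMMIT/ROLLBACK
-- the original query; the planner will now prefer hash or merge joins
select notification.id from notification, activity
where notification.user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'
and notification.received = false
and notification.invalid = false
and activity.id = notification.activity_id
and activity.space_id = '415fc269-e68f-4da0-b3e3-b1273b741a7f'
order by activity.end_time desc
limit 20;
COMMIT;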
I have an aggregation query which ends up being slow; I am looking for any improvements to the query or the indexes.
I indexed all the fields I use; maybe I missed something, or maybe you can suggest better ways to execute this query.
query:
EXPLAIN ANALYZE
SELECT HE.fs_perm_sec_id,
HE.TICKER_EXCHANGE,
HE.proper_name,
OP.shares_outstanding,
(SELECT factset_industry_desc
FROM factset_industry_map AS fim
WHERE fim.factset_industry_code = HES.industry_code) AS industry,
-- slow aggregation
(SELECT SUM(OIH.current_holdings)
FROM own_inst_holdings OIH
WHERE OIH.fs_perm_sec_id = HE.fs_perm_sec_id) AS inst_holdings
FROM own_prices OP
JOIN h_security_ticker_exchange HE ON OP.fs_perm_sec_id = HE.fs_perm_sec_id
JOIN h_entity_sector HES ON HES.factset_entity_id = HE.factset_entity_id
WHERE HE.ticker_exchange = 'BUD-NYS'
ORDER BY OP.price_date DESC LIMIT 1
This is the piece that slows down the query:
(SELECT SUM(OIH.current_holdings)
FROM own_inst_holdings OIH
WHERE OIH.fs_perm_sec_id = HE.fs_perm_sec_id) AS inst_holdings
EXPLAIN ANALYZE
Limit (cost=360.41..360.41 rows=1 width=100) (actual time=920.592..920.592 rows=1 loops=1)
-> Sort (cost=360.41..360.41 rows=1 width=100) (actual time=920.592..920.592 rows=1 loops=1)
Sort Key: op.price_date
Sort Method: top-N heapsort Memory: 25kB
-> Nested Loop (cost=0.26..360.41 rows=1 width=100) (actual time=867.898..920.493 rows=35 loops=1)
-> Nested Loop (cost=0.17..6.43 rows=1 width=104) (actual time=4.882..4.940 rows=35 loops=1)
-> Index Scan using h_sec_exch_factset_entity_id_idx on h_security_ticker_exchange he (cost=0.09..4.09 rows=1 width=92) (actual time=3.611..3.612 rows=1 loops=1)
Index Cond: ((ticker_exchange)::text = 'BUD-NYS'::text)
-> Index Only Scan using own_prices_multiple_idx_1 on own_prices op (cost=0.09..2.25 rows=32 width=23) (actual time=1.258..1.301 rows=35 loops=1)
Index Cond: (fs_perm_sec_id = (he.fs_perm_sec_id)::text)
Heap Fetches: 0
-> Index Scan using h_entity_sector_multiple_idx_3 on h_entity_sector hes (cost=0.09..4.09 rows=1 width=14) (actual time=0.083..0.085 rows=1 loops=35)
Index Cond: (factset_entity_id = he.factset_entity_id)
SubPlan 1
-> Seq Scan on factset_industry_map fim (cost=0.00..2.48 rows=1 width=20) (actual time=0.014..0.031 rows=1 loops=35)
Filter: (factset_industry_code = hes.industry_code)
Rows Removed by Filter: 137
SubPlan 2
-> Aggregate (cost=347.40..347.40 rows=1 width=6) (actual time=26.035..26.035 rows=1 loops=35)
-> Bitmap Heap Scan on own_inst_holdings oih (cost=4.36..347.31 rows=177 width=6) (actual time=0.326..25.658 rows=622 loops=35)
Recheck Cond: ((fs_perm_sec_id)::text = (he.fs_perm_sec_id)::text)
Heap Blocks: exact=22750
-> Bitmap Index Scan on own_inst_holdings_fs_perm_sec_id_idx (cost=0.00..4.35 rows=177 width=0) (actual time=0.232..0.232 rows=662 loops=35)
Index Cond: ((fs_perm_sec_id)::text = (he.fs_perm_sec_id)::text)
Planning time: 5.806 ms
Execution time: 920.778 ms
For this query:
SELECT HE.fs_perm_sec_id, HE.TICKER_EXCHANGE, HE.proper_name, OP.shares_outstanding,
(SELECT factset_industry_desc
FROM factset_industry_map AS fim
WHERE fim.factset_industry_code = HES.industry_code
) AS industry,
(SELECT SUM(OIH.current_holdings)
FROM own_inst_holdings OIH
WHERE OIH.fs_perm_sec_id = HE.fs_perm_sec_id
) AS inst_holdings
FROM own_prices OP JOIN
h_security_ticker_exchange HE
ON OP.fs_perm_sec_id = HE.fs_perm_sec_id JOIN
h_entity_sector HES
ON HES.factset_entity_id = HE.factset_entity_id
WHERE HE.ticker_exchange = 'BUD-NYS'
ORDER BY OP.price_date DESC
LIMIT 1;
You want the following indexes:
h_security_ticker_exchange(ticker_exchange, factset_entity_id, fs_perm_sec_id)
own_prices(fs_perm_sec_id)
h_entity_sector(factset_entity_id)
factset_industry_map(factset_industry_code, factset_industry_desc)
own_inst_holdings(fs_perm_sec_id, current_holdings)
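Written out as DDL, a sketch of those suggested indexes (the index names are illustrative):
CREATE INDEX idx_hste_ticker_entity_sec ON h_security_ticker_exchange (ticker_exchange, factset_entity_id, fs_perm_sec_id);
CREATE INDEX idx_op_fs_perm_sec_id ON own_prices (fs_perm_sec_id);
CREATE INDEX idx_hes_factset_entity_id ON h_entity_sector (factset_entity_id);
CREATE INDEX idx_fim_code_desc ON factset_industry_map (factset_industry_code, factset_industry_desc);
CREATE INDEX idx_oih_sec_holdings ON own_inst_holdings (fs_perm_sec_id, current_holdings);
The last one matters most here: it should let the correlated SUM(current_holdings) subquery be satisfied by an index-only scan instead of the bitmap heap scan shown in the plan above.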
I am doing a query to select records that lie in a certain geographic region, with a couple of joins and some filtering as well.
This is my query:
SELECT "events".* FROM "events" INNER JOIN "albums" ON "albums"."event_id" = "events"."id" INNER JOIN "photos" ON "photos"."album_id" = "albums"."id" WHERE "events"."deleted_at" IS NULL AND "albums"."deleted_at" IS NULL AND "photos"."deleted_at" IS NULL AND (events.latitude BETWEEN -44.197088742316055 AND -23.22003941183816 AND events.longitude BETWEEN 133.226480859375 AND 165.570230859375) GROUP BY events.id HAVING count(albums.id) > 0 ORDER BY start_date DESC
I have the following indexes:
Events:
"events_pkey" PRIMARY KEY, btree (id)
"index_events_on_deleted_at" btree (deleted_at)
"index_events_on_latitude_and_longitude" btree (latitude, longitude)
Albums:
"albums_pkey" PRIMARY KEY, btree (id)
"index_albums_on_deleted_at" btree (deleted_at)
"index_albums_on_event_id" btree (event_id)
Photos:
"photos_pkey" PRIMARY KEY, btree (id)
"index_photos_on_album_id" btree (album_id)
"index_photos_on_deleted_at" btree (deleted_at)
Doing an EXPLAIN ANALYZE results in this, and I don't see any usage of my indexes. I am not sure how to force it to use them. Can anyone help me optimize this?
Sort (cost=4057.46..4057.84 rows=150 width=668) (actual time=556.114..556.187 rows=76 loops=1)
Sort Key: events.start_date
Sort Method: quicksort Memory: 78kB
-> HashAggregate (cost=4050.16..4052.04 rows=150 width=668) (actual time=555.667..555.783 rows=76 loops=1)
Filter: (count(albums.id) > 0)
-> Hash Join (cost=76.14..3946.54 rows=20724 width=668) (actual time=3.675..467.578 rows=48050 loops=1)
Hash Cond: (photos.album_id = albums.id)
-> Seq Scan on photos (cost=0.00..3441.87 rows=59013 width=4) (actual time=0.008..169.206 rows=60599 loops=1)
Filter: (deleted_at IS NULL)
-> Hash (cost=74.10..74.10 rows=163 width=668) (actual time=3.633..3.633 rows=318 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 176kB
-> Hash Join (cost=49.80..74.10 rows=163 width=668) (actual time=1.195..2.519 rows=318 loops=1)
Hash Cond: (albums.event_id = events.id)
-> Seq Scan on albums (cost=0.00..21.47 rows=321 width=8) (actual time=0.011..0.458 rows=321 loops=1)
Filter: (deleted_at IS NULL)
-> Hash (cost=47.92..47.92 rows=150 width=664) (actual time=1.151..1.151 rows=195 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 126kB
-> Seq Scan on events (cost=0.00..47.92 rows=150 width=664) (actual time=0.007..0.488 rows=195 loops=1)
Filter: ((deleted_at IS NULL) AND (latitude >= (-44.1970887423161)::double precision) AND (latitude <= (-23.2200394118382)::double precision) AND (longitude >= 133.226480859375::double precision) AND (longitude <= 165.570230859375::double precision))
Total runtime: 556.459 ms
Thanks!!
EDIT: Thanks for the links. I have tried disabling seqscan. Now my plan is:
Sort (cost=5565.73..5566.10 rows=150 width=46) (actual time=451.208..451.290 rows=76 loops=1)
Sort Key: (date(events.start_date))
Sort Method: quicksort Memory: 31kB
-> GroupAggregate (cost=0.00..5560.31 rows=150 width=46) (actual time=2.990..450.850 rows=76 loops=1)
Filter: (count(albums.id) > 0)
-> Nested Loop (cost=0.00..5454.44 rows=20724 width=46) (actual time=0.077..278.319 rows=48050 loops=1)
-> Merge Join (cost=0.00..205.35 rows=163 width=46) (actual time=0.051..2.856 rows=318 loops=1)
Merge Cond: (events.id = albums.event_id)
-> Index Scan using events_pkey on events (cost=0.00..118.72 rows=150 width=42) (actual time=0.024..0.792 rows=195 loops=1)
Filter: ((deleted_at IS NULL) AND (latitude >= (-44.1970887423161)::double precision) AND (latitude <= (-23.2200394118382)::double precision) AND (longitude >= 133.226480859375::double precision) AND (longitude <= 165.570230859375::double precision))
-> Index Scan using index_albums_on_event_id on albums (cost=0.00..83.83 rows=321 width=8) (actual time=0.017..0.832 rows=321 loops=1)
Filter: (deleted_at IS NULL)
-> Index Scan using index_photos_on_album_id on photos (cost=0.00..30.27 rows=155 width=4) (actual time=0.010..0.409 rows=151 loops=318)
Index Cond: (album_id = albums.id)
Filter: (deleted_at IS NULL)
Total runtime: 451.562 ms
Still, the indexes are not being fully used, especially for the latitude and longitude conditions. Do I have my indexes set up correctly?
EDIT: After looking at the answers at http://stackoverflow.com/questions/8228326/how-can-i-avoid-postgresql-sometimes-choosing-a-bad-query-plan-for-one-of-two-ne, I assumed it was behaving like this because my query was returning almost all records, so I updated the conditions to a narrower region, and the new query plan is:
Sort (cost=786.18..786.22 rows=19 width=668) (actual time=3.754..3.755 rows=2 loops=1)
Sort Key: events.start_date
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=785.54..785.77 rows=19 width=668) (actual time=3.700..3.703 rows=2 loops=1)
Filter: (count(albums.id) > 0)
-> Nested Loop (cost=48.39..765.51 rows=2670 width=668) (actual time=1.116..2.968 rows=543 loops=1)
-> Hash Join (cost=48.39..89.25 rows=21 width=668) (actual time=1.093..1.128 rows=3 loops=1)
Hash Cond: (events.id = albums.event_id)
-> Bitmap Heap Scan on events (cost=9.42..49.44 rows=19 width=664) (actual time=0.061..0.080 rows=9 loops=1)
Recheck Cond: ((latitude >= (-33.7474111086624)::double precision) AND (latitude <= (-33.581678187556)::double precision) AND (longitude >= 151.193933862305::double precision) AND (longitude <= 151.44661940918::double precision))
Filter: (deleted_at IS NULL)
-> Bitmap Index Scan on index_events_on_latitude_and_longitude (cost=0.00..9.42 rows=28 width=0) (actual time=0.050..0.050 rows=9 loops=1)
Index Cond: ((latitude >= (-33.7474111086624)::double precision) AND (latitude <= (-33.581678187556)::double precision) AND (longitude >= 151.193933862305::double precision) AND (longitude <= 151.44661940918::double precision))
-> Hash (cost=34.95..34.95 rows=321 width=8) (actual time=0.992..0.992 rows=321 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 13kB
-> Bitmap Heap Scan on albums (cost=14.74..34.95 rows=321 width=8) (actual time=0.069..0.570 rows=321 loops=1)
Recheck Cond: (deleted_at IS NULL)
-> Bitmap Index Scan on index_albums_on_deleted_at (cost=0.00..14.66 rows=321 width=0) (actual time=0.056..0.056 rows=321 loops=1)
Index Cond: (deleted_at IS NULL)
-> Index Scan using index_photos_on_album_id on photos (cost=0.00..30.27 rows=155 width=4) (actual time=0.014..0.273 rows=181 loops=3)
Index Cond: (album_id = albums.id)
Filter: (deleted_at IS NULL)
Total runtime: 3.958 ms
And the time is much, much lower!!
Any suggestions?
This is mostly because you did not set a LIMIT clause on your query. Without a LIMIT you always request all the matching data from your tables, so going through the indexes on top of that would just be extra work.
SQLFiddle 1, 2 vs. 3, 4
Also note that a FOREIGN KEY constraint does not add an index (but UNIQUE and PRIMARY KEY constraints do). So you might want to add indexes for albums.event_id and photos.album_id.
SQLFiddle 3 vs. 4
Sorting can use an index, if present. For your query this means an index on events.start_date.
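As DDL, a sketch of that index (the name is illustrative; DESC matches the ORDER BY in the query):
CREATE INDEX index_events_on_start_date ON events (start_date DESC);
A plain ascending index would also work, since PostgreSQL can scan a btree index backwards.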