I have the following query which takes a little too long to execute. I have posted the EXPLAIN ANALYZE for the query. Anything I can do to improve its speed?
EXPLAIN analyze SELECT c.*, match.user_json FROM match INNER JOIN conversation c
ON match.match_id = c.match_id WHERE c.from_id <> 142822281 AND c.to_id =
142822281 AND c.unix_timestamp = (SELECT max( unix_timestamp ) FROM conversation
WHERE match_id = c.match_id GROUP BY match_id)
EXPLAIN ANALYZE results
Nested Loop (cost=0.00..16183710.79 rows=2 width=805) (actual time=2455.133..2502.781 rows=34 loops=1)
Join Filter: (match.match_id = c.match_id)
Rows Removed by Join Filter: 71502
-> Seq Scan on match (cost=0.00..268.51 rows=2151 width=723) (actual time=0.006..4.973 rows=2104 loops=1)
-> Materialize (cost=0.00..16183377.75 rows=2 width=90) (actual time=0.034..1.168 rows=34 loops=2104)
-> Seq Scan on conversation c (cost=0.00..16183377.74 rows=2 width=90) (actual time=70.972..2421.949 rows=34 loops=1)
Filter: ((from_id <> 142822281) AND (to_id = 142822281) AND (unix_timestamp = (SubPlan 1)))
Rows Removed by Filter: 22010
SubPlan 1
-> GroupAggregate (cost=0.00..739.64 rows=10 width=16) (actual time=5.358..5.358 rows=1 loops=450)
Group Key: conversation.match_id
-> Seq Scan on conversation (cost=0.00..739.49 rows=10 width=16) (actual time=3.355..5.320 rows=17 loops=450)
Filter: (match_id = c.match_id)
Rows Removed by Filter: 22027
Planning Time: 1.132 ms
Execution Time: 2502.926 ms
This is your query:
SELECT c.*, m.user_json
FROM match m INNER JOIN
conversation c
ON m.match_id = c.match_id
WHERE c.from_id <> 142822281 AND
c.to_id = 142822281 AND
c.unix_timestamp = (SELECT max( c2.unix_timestamp )
FROM conversation c2
WHERE c2.match_id = c.match_id
GROUP BY c2.match_id
);
I would suggest writing it as:
SELECT DISTINCT ON (c.match_id) c.*, m.user_json
FROM match m INNER JOIN
conversation c
ON m.match_id = c.match_id
WHERE c.from_id <> 142822281 AND
c.to_id = 142822281 AND
ORDER BY c.match_id, c.unix_timestamp DESC;
Then try an index on: conversation(to_id, from_id, match_id). I assume you have an index on match(match_id).
Related
Here's the query:
with contrib as (
select
first_name,
last_name,
user_id,
photo_url
from contributors
where visible = true
group by 1,2,3,4
),
dwm as (
select * from dialogues_with_metadata
),
joined as (
select
c.*,
dwm.dialogue_id
from contrib c
left join dwm on c.user_id = dwm.contributor_one_user_id or c.user_id = dwm.contributor_two_user_id
)
select
first_name,
last_name,
user_id,
photo_url,
count(distinct dialogue_id) as dialogues
from joined
group by 1,2,3,4
order by 3 desc
PostgreSQL database.
CPU usage is 10%, so I don't think that's the problem!
I suspect what's slowing things down is the join statement. How might I reconfigure this query so that it doesn't take ~20 seconds to return less than 300 rows?
Here's the contributors table schema:
user_id (primary key - uuid)
username
first_name
last_name
hash
description
blurb
photo_url
blurb_updated_at
visible
And the dialogues table schema:
dialogue_id (uuid)
contributor_one_uuid
contributor_two_uuid
title
image_url
visible
categories
image_source
current_popularity
created_at
override_url
visible
Ran explain select * from dialogues_with_metadata; Here are the results:
Hash Left Join (cost=167.80..172.34 rows=137 width=880)
Hash Cond: (a.dialogue_id = b.writing_dialogue_id)
CTE main
-> Hash Right Join (cost=64.30..111.60 rows=137 width=504)
Hash Cond: (c2.user_id = d.contributor_two_uuid)
-> Seq Scan on contributors c2 (cost=0.00..43.60 rows=260 width=125)
-> Hash (cost=62.59..62.59 rows=137 width=325)
-> Hash Left Join (cost=46.85..62.59 rows=137 width=325)
Hash Cond: (d.contributor_one_uuid = c1.user_id)
-> Seq Scan on dialogues d (cost=0.00..15.37 rows=137 width=216)
-> Hash (cost=43.60..43.60 rows=260 width=125)
-> Seq Scan on contributors c1 (cost=0.00..43.60 rows=260 width=125)
CTE dialogues_with_installment_counts
-> HashAggregate (cost=50.39..52.00 rows=129 width=28)
Group Key: writings.dialogue_id
-> Seq Scan on writings (cost=0.00..46.82 rows=476 width=28)
Filter: finalized
-> CTE Scan on main a (cost=0.00..2.74 rows=137 width=868)
-> Hash (cost=2.58..2.58 rows=129 width=28)
-> CTE Scan on dialogues_with_installment_counts b (cost=0.00..2.58 rows=129 width=28)
EXPLAIN on updated query:
Seq Scan on contributors c (cost=0.00..4012.89 rows=247 width=109)
Filter: visible
SubPlan 1
-> Aggregate (cost=16.06..16.07 rows=1 width=8)
-> Seq Scan on dialogues (cost=0.00..16.05 rows=2 width=0)
Filter: ((c.user_id = contributor_one_uuid) OR (c.user_id = contributor_two_uuid))
explain (analyze, buffers) select on updated query:
Seq Scan on contributors c (cost=0.00..4205.86 rows=259 width=109) (actual time=0.073..16819.258 rows=260 loops=1)
Filter: visible
Rows Removed by Filter: 13
Buffers: shared hit=3681
SubPlan 1
-> Aggregate (cost=16.06..16.07 rows=1 width=8) (actual time=64.548..64.549 rows=1 loops=260)
Buffers: shared hit=3640
-> Seq Scan on dialogues (cost=0.00..16.05 rows=2 width=0) (actual time=49.155..64.547 rows=1 loops=260)
Filter: ((c.user_id = contributor_one_uuid) OR (c.user_id = contributor_two_uuid))
Rows Removed by Filter: 136
Buffers: shared hit=3640
Planning Time: 0.136 ms
Execution Time: 16819.365 ms
A second time (16 seconds faster!):
Seq Scan on contributors c (cost=0.00..4205.86 rows=259 width=109) (actual time=0.063..801.278 rows=260 loops=1)
Filter: visible
Rows Removed by Filter: 13
Buffers: shared hit=3681
SubPlan 1
-> Aggregate (cost=16.06..16.07 rows=1 width=8) (actual time=3.080..3.080 rows=1 loops=260)
Buffers: shared hit=3640
-> Seq Scan on dialogues (cost=0.00..16.05 rows=2 width=0) (actual time=0.009..3.079 rows=1 loops=260)
Filter: ((c.user_id = contributor_one_uuid) OR (c.user_id = contributor_two_uuid))
Rows Removed by Filter: 136
Buffers: shared hit=3640
Planning Time: 0.127 ms
Execution Time: 801.379 ms
This query should do the same without unnesesary grouping statements
select
c.first_name,
c.last_name,
c.user_id,
c.photo_url,
(
select
count(distinct dialogue_id) from dialogues_with_metadata dwm
where
c.user_id = dwm.contributor_one_user_id
or c.user_id = dwm.contributor_two_user_id
) as dialogues
from contributors c
where c.visible = true
Additionaly it's worth checking if counting number of conversations can be done in more efficient way (what indexes are on this table ?).
Query version with dialogues table
select
c.first_name,
c.last_name,
c.user_id,
c.photo_url,
(
select
count(*) from dialogues
where
c.user_id = dialogues.contributor_one_uuid
or c.user_id = dialogues.contributor_two_uuid
) as dialogues
from contributors c
where c.visible = true
Another version
select
c.first_name,
c.last_name,
c.user_id,
c.photo_url,
s.dialogues
from contributors c
join (
select count(*) dialogues, user_id from (
select contributor_one_uuid user_id from dialogues
union all
select contributor_two_uuid from dialogues
) stats group by user_id
) s on (s.user_id = c.user_id)
where c.visible = true
I need to JOIN a table inside a correlated subquery. However, the chosen query plan of postgres is very slow. How can I optimize the following query:
SELECT c.id
FROM customer c
WHERE EXISTS (
SELECT 1
FROM customer_communication cc
JOIN communication co on co.id = cc.communication_id and co.channel <> 'mobile'
WHERE cc.user_id = c.id
)
This is the EXPLAIN (ANALYZE) result:
Nested Loop (cost=3451561.57..3539012.42 rows=24509 width=8) (actual time=60913.294..64056.970 rows=1036309 loops=1)
-> HashAggregate (cost=3451561.14..3451806.23 rows=24509 width=8) (actual time=60913.264..61187.702 rows=1036310 loops=1)
Group Key: cc.customer_id
-> Hash Join (cost=2070834.75..3358538.60 rows=37209016 width=8) (actual time=32758.325..52752.383 rows=37209019 loops=1)
Hash Cond: (cc.communication_id = co.id)
-> Seq Scan on customer_communication cc (cost=0.00..755689.16 rows=37209016 width=16) (actual time=0.011..4949.315 rows=37209019 loops=1)
-> Hash (cost=1772758.38..1772758.38 rows=18168430 width=8) (actual time=32756.662..32756.663 rows=18108924 loops=1)
Buckets: 262144 Batches: 128 Memory Usage: 7557kB
-> Seq Scan on communication co (cost=0.00..1772758.38 rows=18168430 width=8) (actual time=0.007..30024.494 rows=18108924 loops=1)
Filter: (channel <> 'mobile')
-> Index Only Scan using customerxpk on customer c (cost=0.43..3.60 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1036310)
Index Cond: (id = cc.customer_id)
Heap Fetches: 525050
Planning Time: 0.391 ms
Execution Time: 64094.584 ms
I think you have mis-specified the query, because you have conflicting aliases. This might be better:
SELECT c.id
FROM customer c
WHERE EXISTS (SELECT 1
FROM customer_communication cc JOIN
communication co
ON co.id = cc.communication_id AND
co.channel <> 'mobile'
WHERE cc.user_id = c.id
);
Note that in the subquery c refers to the outer query's customer and co refers to communication.
I am observing that COUNT(*) from table is not an optimised query when it comes to deep SQLs.
Here's the sql I am working with
SELECT COUNT(*) FROM "items"
INNER JOIN (
SELECT c.* FROM companies c LEFT OUTER JOIN company_groups ON c.id = company_groups.company_id
WHERE company_groups.has_restriction IS NULL OR company_groups.has_restriction = 'f' OR company_groups.company_id = 1999 OR company_groups.group_id IN ('3','2')
GROUP BY c.id
) AS companies ON companies.id = stock_items.vendor_id
LEFT OUTER JOIN favs ON items.id = favs.item_id AND favs.user_id = 999 AND favs.is_visible = TRUE
WHERE "items"."type" IN ('Fashion') AND "items"."visibility" = 't' AND "items"."is_hidden" = 'f' AND (items.depth IS NULL OR (items.depth >= '0' AND items.depth <= '100')) AND (items.table IS NULL OR (items.table >= '0' AND items.table <= '100')) AND (items.company_id NOT IN (199,200,201))
This query is taking 4084.8ms to count from 0.35 Million records from database.
I am using Rails as framework, so the SQL I am composing fires a COUNT query of the original query whenever I call results.count
Since, I am using LIMIT and OFFSET so basic results are loading in less than 32.0ms (which is way too fast)
Here's the output of the EXPLAIN ANALYSE
Merge Join (cost=70743.22..184962.02 rows=7540499 width=4) (actual time=4018.351..4296.963 rows=360323 loops=1)
Merge Cond: (c.id = items.company_id)
-> Group (cost=0.56..216.21 rows=4515 width=4) (actual time=0.357..5.165 rows=4501 loops=1)
Group Key: c.id
-> Merge Left Join (cost=0.56..204.92 rows=4515 width=4) (actual time=0.303..2.590 rows=4504 loops=1)
Merge Cond: (c.id = company_groups.company_id)
Filter: ((company_groups.has_restriction IS NULL) OR (NOT company_groups.has_restriction) OR (company_groups.company_id = 1999) OR (company_groups.group_id = ANY ('{3,2}'::integer[])))
Rows Removed by Filter: 10
-> Index Only Scan using companies_pkey on companies c (cost=0.28..128.10 rows=4521 width=4) (actual time=0.155..0.941 rows=4508 loops=1)
Heap Fetches: 3
-> Index Scan using index_company_groups_on_company_id on company_groups (cost=0.28..50.14 rows=879 width=9) (actual time=0.141..0.480 rows=878 loops=1)
-> Materialize (cost=70742.66..72421.11 rows=335690 width=8) (actual time=4017.964..4216.381 rows=362180 loops=1)
-> Sort (cost=70742.66..71581.89 rows=335690 width=8) (actual time=4017.955..4140.168 rows=362180 loops=1)
Sort Key: items.company_id
Sort Method: external merge Disk: 6352kB
-> Hash Left Join (cost=1.05..35339.74 rows=335690 width=8) (actual time=0.617..3588.634 rows=362180 loops=1)
Hash Cond: (items.id = favs.item_id)
-> Seq Scan on items (cost=0.00..34079.84 rows=335690 width=8) (actual time=0.504..3447.355 rows=362180 loops=1)
Filter: (visibility AND (NOT is_hidden) AND ((type)::text = 'Fashion'::text) AND (company_id <> ALL ('{199,200,201}'::integer[])) AND ((depth IS NULL) OR ((depth >= '0'::numeric) AND (depth <= '100'::nume (...)
Rows Removed by Filter: 5814
-> Hash (cost=1.04..1.04 rows=1 width=4) (actual time=0.009..0.009 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on favs (cost=0.00..1.04 rows=1 width=4) (actual time=0.008..0.008 rows=0 loops=1)
Filter: (is_visible AND (user_id = 999))
Rows Removed by Filter: 3
Planning time: 3.526 ms
Execution time: 4397.849 ms
Please advise on how should I make it work faster!
P.S.: All the columns are indexed like type, visibility, is_hidden, table, depth etc.
Thanks in advance!
Well, you have two parts that select everything (SELECT *) in your query, maybe you could limit that and see if it helps, example:
SELECT COUNT(OneSpecificColumn)
FROM "items"
INNER JOIN
( SELECT c.(AnotherSpecificColumn)
FROM companies c
LEFT OUTER JOIN company_groups ON c.id = company_groups.company_id
WHERE company_groups.has_restriction IS NULL
OR company_groups.has_restriction = 'f'
OR company_groups.company_id = 1999
OR company_groups.group_id IN ('3',
'2')
GROUP BY c.id) AS companies ON companies.id = stock_items.vendor_id
LEFT OUTER JOIN favs ON items.id = favs.item_id
AND favs.user_id = 999
AND favs.is_visible = TRUE
WHERE "items"."type" IN ('Fashion')
AND "items"."visibility" = 't'
AND "items"."is_hidden" = 'f'
AND (items.depth IS NULL
OR (items.depth >= '0'
AND items.depth <= '100'))
AND (items.table IS NULL
OR (items.table >= '0'
AND items.table <= '100'))
AND (items.company_id NOT IN (199,
200,
201))
You could also check if those left joins are all necessary, inner joins are less costly and may speed up your search.
The lion's share of the time is spent in the sequential scan of items, and that cannot be improved, because you need almost all of the rows in the table.
So the only ways to improve the query are
see that items is cached in memory
get faster storage
I've written the following PostgreSQL query which works as it should. However, it seems to be awfully slow, sometimes taking up to 10 seconds to return a result. I'm sure there is something in my statement that is causing this to be slow.
Can anyone help determine why this query is slow?
SELECT DISTINCT ON (school_classes.class_id,attendance_calendar.school_date)
school_classes.class_id, school_classes.class_name, school_classes.grade_id
, school_gradelevels.linked_calendar, attendance_calendars.calendar_id
, attendance_calendar.school_date, attendance_calendar.minutes
, teacher_join_classes_subjects.staff_id, staff.first_name, staff.last_name
FROM school_classes
INNER JOIN school_gradelevels ON school_gradelevels.id=school_classes.grade_id
INNER JOIN teacher_join_classes_subjects ON teacher_join_classes_subjects.class_id=school_classes.class_id
INNER JOIN staff ON staff.staff_id=teacher_join_classes_subjects.staff_id
INNER JOIN attendance_calendars ON attendance_calendars.title=school_gradelevels.linked_calendar
INNER JOIN attendance_calendar ON attendance_calendar.calendar_id=attendance_calendars.calendar_id
WHERE teacher_join_classes_subjects.syear='2013'
AND staff.syear='2013'
AND attendance_calendars.syear='2013'
AND teacher_join_classes_subjects.does_attendance='Y'
AND teacher_join_classes_subjects.subject_id IS NULL
AND attendance_calendar.school_date<CURRENT_DATE
AND attendance_calendar.school_date NOT IN (
SELECT com.school_date FROM attendance_completed com
WHERE com.class_id=school_classes.class_id
AND (com.period_id='101' AND attendance_calendar.minutes>='151' OR
com.period_id='95' AND attendance_calendar.minutes='150') )
I replaced the NOT IN with the following:
AND NOT EXISTS (
SELECT com.school_date
FROM attendance_completed com
WHERE com.class_id=school_classes.class_id
AND com.school_date=attendance_calendar.school_date
AND (com.period_id='101' AND attendance_calendar.minutes>='151' OR
com.period_id='95' AND attendance_calendar.minutes='150') )
Result of EXPLAIN ANALYZE:
Unique (cost=2998.39..2998.41 rows=3 width=85) (actual time=10751.111..10751.118 rows=1 loops=1)
-> Sort (cost=2998.39..2998.40 rows=3 width=85) (actual time=10751.110..10751.110 rows=2 loops=1)
Sort Key: school_classes.class_id, attendance_calendar.school_date
Sort Method: quicksort Memory: 25kB
-> Hash Join (cost=2.03..2998.37 rows=3 width=85) (actual time=6409.471..10751.045 rows=2 loops=1)
Hash Cond: ((teacher_join_classes_subjects.class_id = school_classes.class_id) AND (school_gradelevels.id = school_classes.grade_id))
Join Filter: (NOT (SubPlan 1))
-> Nested Loop (cost=0.00..120.69 rows=94 width=81) (actual time=2.468..1187.397 rows=26460 loops=1)
Join Filter: (attendance_calendars.calendar_id = attendance_calendar.calendar_id)
-> Nested Loop (cost=0.00..42.13 rows=1 width=70) (actual time=0.087..3.247 rows=735 loops=1)
Join Filter: ((attendance_calendars.title)::text = (school_gradelevels.linked_calendar)::text)
-> Nested Loop (cost=0.00..40.80 rows=1 width=277) (actual time=0.077..1.005 rows=245 loops=1)
-> Nested Loop (cost=0.00..39.61 rows=1 width=27) (actual time=0.064..0.572 rows=49 loops=1)
-> Seq Scan on teacher_join_classes_subjects (cost=0.00..10.48 rows=4 width=14) (actual time=0.022..0.143 rows=49 loops=1)
Filter: ((subject_id IS NULL) AND (syear = 2013::numeric) AND ((does_attendance)::text = 'Y'::text))
-> Index Scan using staff_pkey on staff (cost=0.00..7.27 rows=1 width=20) (actual time=0.006..0.007 rows=1 loops=49)
Index Cond: (staff.staff_id = teacher_join_classes_subjects.staff_id)
Filter: (staff.syear = 2013::numeric)
-> Seq Scan on attendance_calendars (cost=0.00..1.18 rows=1 width=250) (actual time=0.003..0.006 rows=5 loops=49)
Filter: (attendance_calendars.syear = 2013::numeric)
-> Seq Scan on school_gradelevels (cost=0.00..1.15 rows=15 width=11) (actual time=0.001..0.005 rows=15 loops=245)
-> Seq Scan on attendance_calendar (cost=0.00..55.26 rows=1864 width=18) (actual time=0.003..1.129 rows=1824 loops=735)
Filter: (attendance_calendar.school_date Hash (cost=1.41..1.41 rows=41 width=18) (actual time=0.040..0.040 rows=41 loops=1)
-> Seq Scan on school_classes (cost=0.00..1.41 rows=41 width=18) (actual time=0.006..0.015 rows=41 loops=1)
SubPlan 1
-> Seq Scan on attendance_completed com (cost=0.00..958.28 rows=5 width=4) (actual time=0.228..5.411 rows=17 loops=1764)
Filter: ((class_id = $0) AND (((period_id = 101::numeric) AND ($1 >= 151::numeric)) OR ((period_id = 95::numeric) AND ($1 = 150::numeric))))
NOT EXISTS is an excellent choice. Almost always better than NOT IN. More details here.
I simplified your query a bit (which looks fine, generally):
SELECT DISTINCT ON (c.class_id, a.school_date)
c.class_id, c.class_name, c.grade_id
,g.linked_calendar, aa.calendar_id
,a.school_date, a.minutes
,t.staff_id, s.first_name, s.last_name
FROM school_classes c
JOIN teacher_join_classes_subjects t USING (class_id)
JOIN staff s USING (staff_id)
JOIN school_gradelevels g ON g.id = c.grade_id
JOIN attendance_calendars aa ON aa.title = g.linked_calendar
JOIN attendance_calendar a ON a.calendar_id = aa.calendar_id
WHERE t.syear = 2013
AND s.syear = 2013
AND aa.syear = 2013
AND t.does_attendance = 'Y' -- looks like it should be boolean!
AND t.subject_id IS NULL
AND a.school_date < CURRENT_DATE
AND NOT EXISTS (
SELECT 1
FROM attendance_completed x
WHERE x.class_id = c.class_id
AND x.school_date = a.school_date
AND (x.period_id = 101 AND a.minutes >= 151 OR -- actually numbers?
x.period_id = 95 AND a.minutes = 150)
)
ORDER BY c.class_id, a.school_date, ???
What seems to be missing is ORDER BY which should accompany your DISTINCT ON. Add more ORDER BY items in place of ???. If there are duplicates to pick from, you probably want to define which to pick.
Numeric literals don't need single quotes and boolean values should be coded as such.
You may want to revisit the chapter about data types.
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=32164.87..32164.89 rows=1 width=44) (actual time=221552.831..221552.831 rows=0 loops=1)
-> Sort (cost=32164.87..32164.87 rows=1 width=44) (actual time=221552.827..221552.827 rows=0 loops=1)
Sort Key: t.date_effective, t.acct_account_transaction_id, p.method, t.amount, c.business_name, t.amount
-> Nested Loop (cost=22871.67..32164.86 rows=1 width=44) (actual time=221552.808..221552.808 rows=0 loops=1)
-> Nested Loop (cost=22871.67..32160.37 rows=1 width=52) (actual time=221431.071..221546.619 rows=670 loops=1)
-> Nested Loop (cost=22871.67..32157.33 rows=1 width=43) (actual time=221421.218..221525.056 rows=2571 loops=1)
-> Hash Join (cost=22871.67..32152.80 rows=1 width=16) (actual time=221307.382..221491.019 rows=2593 loops=1)
Hash Cond: ("outer".acct_account_id = "inner".acct_account_fk)
-> Seq Scan on acct_account a (cost=0.00..7456.08 rows=365008 width=8) (actual time=0.032..118.369 rows=61295 loops=1)
-> Hash (cost=22871.67..22871.67 rows=1 width=16) (actual time=221286.733..221286.733 rows=2593 loops=1)
-> Nested Loop Left Join (cost=0.00..22871.67 rows=1 width=16) (actual time=1025.396..221266.357 rows=2593 loops=1)
Join Filter: ("inner".orig_acct_payment_fk = "outer".acct_account_transaction_id)
Filter: ("inner".link_type IS NULL)
-> Seq Scan on acct_account_transaction t (cost=0.00..18222.98 rows=1 width=16) (actual time=949.081..976.432 rows=2596 loops=1)
Filter: ((("type")::text = 'debit'::text) AND ((transaction_status)::text = 'active'::text) AND (date_effective >= '2012-03-01'::date) AND (date_effective < '2012-04-01 00:00:00'::timestamp without time zone))
-> Seq Scan on acct_payment_link l (cost=0.00..4648.68 rows=1 width=15) (actual time=1.073..84.610 rows=169 loops=2596)
Filter: ((link_type)::text ~~ 'return_%'::text)
-> Index Scan using contact_pk on contact c (cost=0.00..4.52 rows=1 width=27) (actual time=0.007..0.008 rows=1 loops=2593)
Index Cond: (c.contact_id = "outer".contact_fk)
-> Index Scan using acct_payment_transaction_fk on acct_payment p (cost=0.00..3.02 rows=1 width=13) (actual time=0.005..0.005 rows=0 loops=2571)
Index Cond: (p.acct_account_transaction_fk = "outer".acct_account_transaction_id)
Filter: ((method)::text <> 'trade'::text)
-> Index Scan using contact_role_pk on contact_role (cost=0.00..4.48 rows=1 width=4) (actual time=0.007..0.007 rows=0 loops=670)
Index Cond: ("outer".contact_id = contact_role.contact_fk)
Filter: (exchange_fk = 74)
Total runtime: 221553.019 ms
Your problem is here:
-> Nested Loop Left Join (cost=0.00..22871.67 rows=1 width=16) (actual time=1025.396..221266.357 rows=2593 loops=1)
Join Filter: ("inner".orig_acct_payment_fk = "outer".acct_account_transaction_id)
Filter: ("inner".link_type IS NULL)
-> Seq Scan on acct_account_transaction t (cost=0.00..18222.98 rows=1 width=16) (actual time=949.081..976.432 rows=2596 loops=1)
Filter: ((("type")::text = 'debit'::text) AND ((transaction_status)::text = 'active'::text) AND (date_effective >= '2012-03-01'::date) AND (date_effective
Seq Scan on acct_payment_link l (cost=0.00..4648.68 rows=1 width=15) (actual time=1.073..84.610 rows=169 loops=2596)
Filter: ((link_type)::text ~~ 'return_%'::text)
It expects to find 1 row in acct_account_transaction, while it finds 2596, and similarly for the other table.
You did not mention Your postgres version (could You?), but this should do the trick:
SELECT DISTINCT
t.date_effective,
t.acct_account_transaction_id,
p.method,
t.amount,
c.business_name,
t.amount
FROM
contact c inner join contact_role on (c.contact_id=contact_role.contact_fk and contact_role.exchange_fk=74),
acct_account a, acct_payment p,
acct_account_transaction t
WHERE
p.acct_account_transaction_fk=t.acct_account_transaction_id
and t.type = 'debit'
and transaction_status = 'active'
and p.method != 'trade'
and t.date_effective >= '2012-03-01'
and t.date_effective < (date '2012-03-01' + interval '1 month')
and c.contact_id=a.contact_fk and a.acct_account_id = t.acct_account_fk
and not exists(
select * from acct_payment_link l
where orig_acct_payment_fk == acct_account_transaction_id
and link_type like 'return_%'
)
ORDER BY
t.date_effective DESC
Also, try setting appropriate statistics target for relevant columns. Link to the friendly manual: http://www.postgresql.org/docs/current/static/sql-altertable.html
What are your indexes, and have you analysed lately? It's doing a table scan on acct_account_transaction even though there are several criteria on that table:
type
date_effective
If there are no indexes on those columns, then a compound one one (type, date_effective) could help (assuming there are lots of rows that don't meet the criteria on those columns).
I remove my first suggestion, as it changes the nature of the query.
I see that there's too much time spent in the LEFT JOIN.
First thing is to try to make only a single scan of the acct_payment_link table. Could you try rewriting your query to:
... LEFT JOIN (SELECT * FROM acct_payment_link
WHERE link_type LIKE 'return_%') AS l ...
You should check your statistics, as there's difference between planned and returned numbers of rows.
You haven't included tables' and indexes' definitions, it'd be good to take a look on those.
You might also want to use contrib/pg_tgrm extension to build index on the acct_payment_link.link_type, but I would make this a last option to try out.
BTW, what is the PostgreSQL version you're using?
Your statement rewritten and formatted:
SELECT DISTINCT
t.date_effective,
t.acct_account_transaction_id,
p.method,
t.amount,
c.business_name,
t.amount
FROM contact c
JOIN contact_role cr ON cr.contact_fk = c.contact_id
JOIN acct_account a ON a.contact_fk = c.contact_id
JOIN acct_account_transaction t ON t.acct_account_fk = a.acct_account_id
JOIN acct_payment p ON p.acct_account_transaction_fk
= t.acct_account_transaction_id
LEFT JOIN acct_payment_link l ON orig_acct_payment_fk
= acct_account_transaction_id
-- missing table-qualification!
AND link_type like 'return_%'
-- missing table-qualification!
WHERE transaction_status = 'active' -- missing table-qualification!
AND cr.exchange_fk = 74
AND t.type = 'debit'
AND t.date_effective >= '2012-03-01'
AND t.date_effective < (date '2012-03-01' + interval '1 month')
AND p.method != 'trade'
AND l.link_type IS NULL
ORDER BY t.date_effective DESC;
Explicit JOIN statements are preferable. I reordered your tables according to your JOIN logic.
Why (date '2012-03-01' + interval '1 month') instead of date '2012-04-01'?
Some table qualifications are missing. In a complex statement like this that's bad style. May be hiding a mistake.
The key to performance are indexes where appropriate, proper configuration of PostgreSQL and accurate statistics.
General advice on performance tuning in the PostgreSQL wiki.