What makes performance vary for the same query? My DB has just ~10 tables and no more than a few thousand rows.
Here's the query:
select
c.first_name,
c.last_name,
c.user_id,
c.photo_url,
s.dialogues
from contributors c
join (
select count(*) dialogues, user_id from (
select contributor_one_uuid user_id from dialogues
union all
select contributor_two_uuid from dialogues
) stats group by user_id
) s on (s.user_id = c.user_id)
where c.visible = true
#1: It takes almost 17 seconds to execute! explain (analyze, buffers) select on the query:
Seq Scan on contributors c (cost=0.00..4205.86 rows=259 width=109) (actual time=0.073..16819.258 rows=260 loops=1)
Filter: visible
Rows Removed by Filter: 13
Buffers: shared hit=3681
SubPlan 1
-> Aggregate (cost=16.06..16.07 rows=1 width=8) (actual time=64.548..64.549 rows=1 loops=260)
Buffers: shared hit=3640
-> Seq Scan on dialogues (cost=0.00..16.05 rows=2 width=0) (actual time=49.155..64.547 rows=1 loops=260)
Filter: ((c.user_id = contributor_one_uuid) OR (c.user_id = contributor_two_uuid))
Rows Removed by Filter: 136
Buffers: shared hit=3640
Planning Time: 0.136 ms
Execution Time: 16819.365 ms
#2. It takes not even a second to execute!
Seq Scan on contributors c (cost=0.00..4205.86 rows=259 width=109) (actual time=0.063..801.278 rows=260 loops=1)
Filter: visible
Rows Removed by Filter: 13
Buffers: shared hit=3681
SubPlan 1
-> Aggregate (cost=16.06..16.07 rows=1 width=8) (actual time=3.080..3.080 rows=1 loops=260)
Buffers: shared hit=3640
-> Seq Scan on dialogues (cost=0.00..16.05 rows=2 width=0) (actual time=0.009..3.079 rows=1 loops=260)
Filter: ((c.user_id = contributor_one_uuid) OR (c.user_id = contributor_two_uuid))
Rows Removed by Filter: 136
Buffers: shared hit=3640
Planning Time: 0.127 ms
Execution Time: 801.379 ms
The engine is performing a SeqScan on contributors and I think that's unavoidable (unless I'm quite mistaken). However, it's also performing a SeqScan on dialogues and this can be prevented with a lateral join.
If the table dialogues has indexes on contributor_one_uuid and also on contributor_two_uuid the query can be rephrased. Hopefully this change can speed it up:
select
c.first_name,
c.last_name,
c.user_id,
c.photo_url,
s.dialogues
from contributors c
cross join lateral (
select (select count(*) from dialogues d where d.contributor_one_uuid = c.user_id)
+ (select count(*) from dialogues d where d.contributor_two_uuid = c.user_id)
as dialogues
) s
where c.visible = true
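The rewrite above only pays off if both contributor columns are indexed. A minimal sketch of those indexes follows; the index names are hypothetical, and `if not exists` assumes PostgreSQL 9.5 or later. With only ~137 rows in dialogues, the planner may still prefer a sequential scan, so compare the plans before and after.

```sql
-- Hypothetical index names; the columns are the ones named in the answer above.
create index if not exists dialogues_contributor_one_uuid_idx
    on dialogues (contributor_one_uuid);
create index if not exists dialogues_contributor_two_uuid_idx
    on dialogues (contributor_two_uuid);

-- Refresh planner statistics so the new indexes are costed with current data.
analyze dialogues;
```

Re-running explain (analyze, buffers) on the query afterwards shows whether the planner actually uses them.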
Here's the query:
with contrib as (
select
first_name,
last_name,
user_id,
photo_url
from contributors
where visible = true
group by 1,2,3,4
),
dwm as (
select * from dialogues_with_metadata
),
joined as (
select
c.*,
dwm.dialogue_id
from contrib c
left join dwm on c.user_id = dwm.contributor_one_user_id or c.user_id = dwm.contributor_two_user_id
)
select
first_name,
last_name,
user_id,
photo_url,
count(distinct dialogue_id) as dialogues
from joined
group by 1,2,3,4
order by 3 desc
PostgreSQL database.
CPU usage is 10%, so I don't think that's the problem!
I suspect what's slowing things down is the join statement. How might I reconfigure this query so that it doesn't take ~20 seconds to return fewer than 300 rows?
Here's the contributors table schema:
user_id (primary key - uuid)
username
first_name
last_name
hash
description
blurb
photo_url
blurb_updated_at
visible
And the dialogues table schema:
dialogue_id (uuid)
contributor_one_uuid
contributor_two_uuid
title
image_url
visible
categories
image_source
current_popularity
created_at
override_url
visible
Ran explain select * from dialogues_with_metadata; Here are the results:
Hash Left Join (cost=167.80..172.34 rows=137 width=880)
Hash Cond: (a.dialogue_id = b.writing_dialogue_id)
CTE main
-> Hash Right Join (cost=64.30..111.60 rows=137 width=504)
Hash Cond: (c2.user_id = d.contributor_two_uuid)
-> Seq Scan on contributors c2 (cost=0.00..43.60 rows=260 width=125)
-> Hash (cost=62.59..62.59 rows=137 width=325)
-> Hash Left Join (cost=46.85..62.59 rows=137 width=325)
Hash Cond: (d.contributor_one_uuid = c1.user_id)
-> Seq Scan on dialogues d (cost=0.00..15.37 rows=137 width=216)
-> Hash (cost=43.60..43.60 rows=260 width=125)
-> Seq Scan on contributors c1 (cost=0.00..43.60 rows=260 width=125)
CTE dialogues_with_installment_counts
-> HashAggregate (cost=50.39..52.00 rows=129 width=28)
Group Key: writings.dialogue_id
-> Seq Scan on writings (cost=0.00..46.82 rows=476 width=28)
Filter: finalized
-> CTE Scan on main a (cost=0.00..2.74 rows=137 width=868)
-> Hash (cost=2.58..2.58 rows=129 width=28)
-> CTE Scan on dialogues_with_installment_counts b (cost=0.00..2.58 rows=129 width=28)
EXPLAIN on updated query:
Seq Scan on contributors c (cost=0.00..4012.89 rows=247 width=109)
Filter: visible
SubPlan 1
-> Aggregate (cost=16.06..16.07 rows=1 width=8)
-> Seq Scan on dialogues (cost=0.00..16.05 rows=2 width=0)
Filter: ((c.user_id = contributor_one_uuid) OR (c.user_id = contributor_two_uuid))
explain (analyze, buffers) select on updated query:
Seq Scan on contributors c (cost=0.00..4205.86 rows=259 width=109) (actual time=0.073..16819.258 rows=260 loops=1)
Filter: visible
Rows Removed by Filter: 13
Buffers: shared hit=3681
SubPlan 1
-> Aggregate (cost=16.06..16.07 rows=1 width=8) (actual time=64.548..64.549 rows=1 loops=260)
Buffers: shared hit=3640
-> Seq Scan on dialogues (cost=0.00..16.05 rows=2 width=0) (actual time=49.155..64.547 rows=1 loops=260)
Filter: ((c.user_id = contributor_one_uuid) OR (c.user_id = contributor_two_uuid))
Rows Removed by Filter: 136
Buffers: shared hit=3640
Planning Time: 0.136 ms
Execution Time: 16819.365 ms
A second time (16 seconds faster!):
Seq Scan on contributors c (cost=0.00..4205.86 rows=259 width=109) (actual time=0.063..801.278 rows=260 loops=1)
Filter: visible
Rows Removed by Filter: 13
Buffers: shared hit=3681
SubPlan 1
-> Aggregate (cost=16.06..16.07 rows=1 width=8) (actual time=3.080..3.080 rows=1 loops=260)
Buffers: shared hit=3640
-> Seq Scan on dialogues (cost=0.00..16.05 rows=2 width=0) (actual time=0.009..3.079 rows=1 loops=260)
Filter: ((c.user_id = contributor_one_uuid) OR (c.user_id = contributor_two_uuid))
Rows Removed by Filter: 136
Buffers: shared hit=3640
Planning Time: 0.127 ms
Execution Time: 801.379 ms
This query should do the same without the unnecessary grouping statements:
select
c.first_name,
c.last_name,
c.user_id,
c.photo_url,
(
select
count(distinct dialogue_id) from dialogues_with_metadata dwm
where
c.user_id = dwm.contributor_one_user_id
or c.user_id = dwm.contributor_two_user_id
) as dialogues
from contributors c
where c.visible = true
Additionally, it's worth checking whether the number of conversations can be counted more efficiently (what indexes are on this table?).
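One way to answer the index question is to read the pg_indexes system view. This is a sketch that assumes the table is named dialogues and lives in a schema on the current search path:

```sql
-- List every index defined on the dialogues table, with its full definition.
select indexname, indexdef
from pg_indexes
where tablename = 'dialogues';
```

If nothing covers contributor_one_uuid or contributor_two_uuid, each per-contributor count has to scan the whole table.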
Query version with dialogues table
select
c.first_name,
c.last_name,
c.user_id,
c.photo_url,
(
select
count(*) from dialogues
where
c.user_id = dialogues.contributor_one_uuid
or c.user_id = dialogues.contributor_two_uuid
) as dialogues
from contributors c
where c.visible = true
Another version
select
c.first_name,
c.last_name,
c.user_id,
c.photo_url,
s.dialogues
from contributors c
join (
select count(*) dialogues, user_id from (
select contributor_one_uuid user_id from dialogues
union all
select contributor_two_uuid from dialogues
) stats group by user_id
) s on (s.user_id = c.user_id)
where c.visible = true
This is my SQL script, I have to join 7 tables
SELECT concat_ws('-', it.item_id, it.model_id) AS product_id,
concat_ws('-', aip.partner_item_id, aip.partner_model_id) AS product_reseller_id,
i.name as item_name,
im.name AS model_name,
p.partner_code,
sum(it.quantity) AS transfer_total,
sum(isb.remaining_item) as remaining_stock,
sum(isb.sold_item) as partner_sold
FROM transfer t
INNER JOIN partner p ON p.reseller_store_id = t.reseller_store_id
INNER JOIN item_transfer it ON t.id = it.transfer_id
INNER JOIN item i ON i.id = it.item_id
INNER JOIN item_model im ON it.model_id = im.id
INNER JOIN affiliate_item_mapping aip on it.item_id = aip.seller_item_id and it.model_id = aip.seller_model_id
and t.reseller_store_id = aip.reseller_store_id
LEFT JOIN inventory_summary_branch isb on isb.inventory_summary_id = concat_ws('-', aip.partner_item_id, aip.partner_model_id)
WHERE p.store_id = 9805
GROUP BY it.item_id, it.model_id, p.partner_code, i.id, im.id, aip.id, isb.inventory_summary_id
This is the result of SQL EXPLAIN:
GroupAggregate (cost=13861.57..13861.62 rows=1 width=885) (actual time=1890.392..1890.525 rows=15 loops=1)
Group Key: it.item_id, it.model_id, p.partner_code, i.id, im.id, aip.id, isb.inventory_summary_id
Buffers: shared hit=118610
-> Sort (cost=13861.57..13861.58 rows=1 width=765) (actual time=1890.310..1890.338 rows=21 loops=1)
Sort Key: it.item_id, it.model_id, p.partner_code, aip.id, isb.inventory_summary_id
Sort Method: quicksort Memory: 28kB
Buffers: shared hit=118610
-> Nested Loop (cost=1.27..13861.56 rows=1 width=765) (actual time=73.156..1890.057 rows=21 loops=1)
Buffers: shared hit=118610
-> Nested Loop (cost=0.85..13853.14 rows=1 width=753) (actual time=73.134..1889.495 rows=21 loops=1)
Buffers: shared hit=118526
-> Nested Loop (cost=0.43..13845.32 rows=1 width=609) (actual time=73.099..1888.733 rows=21 loops=1)
Join Filter: ((p.reseller_store_id = t.reseller_store_id) AND (it.transfer_id = t.id))
Rows Removed by Join Filter: 2142
Buffers: shared hit=118442
-> Nested Loop (cost=0.43..13840.24 rows=1 width=633) (actual time=72.793..1879.961 rows=21 loops=1)
Join Filter: ((aip.seller_item_id = it.item_id) AND (aip.seller_model_id = it.model_id))
Rows Removed by Join Filter: 6003
Buffers: shared hit=118379
-> Nested Loop Left Join (cost=0.43..13831.47 rows=1 width=601) (actual time=72.093..1861.415 rows=24 loops=1)
Buffers: shared hit=118307
-> Nested Loop (cost=0.00..11.44 rows=1 width=572) (actual time=0.042..0.696 rows=24 loops=1)
Join Filter: (p.reseller_store_id = aip.reseller_store_id)
Rows Removed by Join Filter: 150
Buffers: shared hit=7
-> Seq Scan on partner p (cost=0.00..10.38 rows=1 width=524) (actual time=0.026..0.039 rows=6 loops=1)
Filter: (store_id = 9805)
Buffers: shared hit=1
-> Seq Scan on affiliate_item_mapping aip (cost=0.00..1.03 rows=3 width=48) (actual time=0.006..0.043 rows=29 loops=6)
Buffers: shared hit=6
-> Index Scan using branch_id_inventory_summary_id_inventory_summary_branch on inventory_summary_branch isb (cost=0.43..13820.01 rows=1 width=29) (actual time=77.498..77.498 rows=0 loops=24)
Index Cond: ((inventory_summary_id)::text = concat_ws('-'::text, aip.partner_item_id, aip.partner_model_id))
Buffers: shared hit=118300
-> Seq Scan on item_transfer it (cost=0.00..5.31 rows=231 width=32) (actual time=0.024..0.391 rows=251 loops=24)
Buffers: shared hit=72
-> Seq Scan on transfer t (cost=0.00..3.83 rows=83 width=16) (actual time=0.011..0.256 rows=103 loops=21)
Buffers: shared hit=63
-> Index Scan using pk_item on item i (cost=0.42..7.81 rows=1 width=152) (actual time=0.022..0.023 rows=1 loops=21)
Index Cond: (id = it.item_id)
Buffers: shared hit=84
-> Index Scan using pk_item_model on item_model im (cost=0.43..8.41 rows=1 width=20) (actual time=0.016..0.018 rows=1 loops=21)
Index Cond: (id = it.model_id)
Buffers: shared hit=84
Planning time: 10.051 ms
Execution time: 1890.943 ms
Of course, this statement works fine, but it's slow. Is there a better way to write this code?
How can I improve the performance? Is a join or a sub-query better in this case? Any help would be appreciated.
Two things can help you:
Run VACUUM ANALYZE on all the tables involved.
Create an index on item_transfer.item_id and model_id.
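The two suggestions above might look like the following. This is a sketch: the index name is hypothetical, and a single composite index is only one reading of the advice (two single-column indexes would be the other).

```sql
-- Reclaim dead rows and refresh planner statistics, one table per statement
-- (the multi-table form of VACUUM requires PostgreSQL 11 or later).
VACUUM ANALYZE transfer;
VACUUM ANALYZE partner;
VACUUM ANALYZE item_transfer;
VACUUM ANALYZE item;
VACUUM ANALYZE item_model;
VACUUM ANALYZE affiliate_item_mapping;
VACUUM ANALYZE inventory_summary_branch;

-- Hypothetical index name; covers the join to affiliate_item_mapping.
CREATE INDEX item_transfer_item_model_idx
    ON item_transfer (item_id, model_id);
```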
Essentially all of your time (77.498 ms * 24 loops, about 1860 ms) is spent on the index scan of branch_id_inventory_summary_id_inventory_summary_branch.
About the only explanation I can see for this is that the index isn't suited to the query, and it is being full-index scanned (in lieu of full scanning the table), rather than being efficiently scanned. This probably means the index includes the column inventory_summary_id, but it is not the leading column. (It would be nice if EXPLAIN were to make this inefficient type of usage clearer than it currently does).
You would probably benefit from an index such as on inventory_summary_branch (inventory_summary_id) which has a better chance of being used efficiently.
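A minimal sketch of that index, with a hypothetical name:

```sql
-- inventory_summary_id becomes the leading (and only) column, so the lookup
-- in the join can descend the B-tree instead of walking the whole index.
CREATE INDEX inventory_summary_branch_summary_id_idx
    ON inventory_summary_branch (inventory_summary_id);
```

After creating it, re-run EXPLAIN (ANALYZE, BUFFERS) and check whether the per-loop time and the shared hit count on that node drop.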
I don't know why it wouldn't just do a hash join of that table. Maybe your work_mem is too low.
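To test the work_mem theory without touching the server configuration, the setting can be raised for the current session only. The 64MB value below is purely illustrative, not a recommendation:

```sql
-- Show the current per-operation memory budget.
SHOW work_mem;

-- Raise it for this session only, then re-run EXPLAIN (ANALYZE, BUFFERS)
-- on the query to see whether the planner switches to a hash join.
SET work_mem = '64MB';

-- Return to the configured default when done.
RESET work_mem;
```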
Queries with this many inner joins tend to be slow.
You could change from an inner join on the whole table to just the columns you need and see if that improves it at all:
From:
INNER JOIN partner p ON p.reseller_store_id = t.reseller_store_id
To:
inner join (select reseller_store_id, store_id, partner_code from partner) as p ON p.reseller_store_id = t.reseller_store_id
See if that speeds things up at all.
If not, I would recommend indexes on the join keys.
We run a join query between 2 tables.
The query has an OR statement that compares one column from the left table and one column from the right table. The query performance is very poor, and we fixed it by changing the OR to UNION.
Why is this happening? I'm looking for a detailed explanation or a reference to the documentation that might shed light on the issue.
Query with OR statement:
db1=# explain analyze select count(*)
from conversations
join agents on conversations.agent_id=agents.id
where conversations.id=1 or agents.id = '123';
**Query plan**
----------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=**11017.95..11017.96** rows=1 width=8) (actual time=54.088..54.088 rows=1 loops=1)
-> Gather (cost=11017.73..11017.94 rows=2 width=8) (actual time=53.945..57.181 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=10017.73..10017.74 rows=1 width=8) (actual time=48.303..48.303 rows=1 loops=3)
-> Hash Join (cost=219.26..10016.69 rows=415 width=0) (actual time=5.292..48.287 rows=130 loops=3)
Hash Cond: (conversations.agent_id = agents.id)
Join Filter: ((conversations.id = 1) OR ((agents.id)::text = '123'::text))
Rows Removed by Join Filter: 80035
-> Parallel Seq Scan on conversations (cost=0.00..9366.95 rows=163995 width=8) (actual time=0.017..14.972 rows=131196 loops=3)
-> Hash (cost=143.56..143.56 rows=6056 width=16) (actual time=2.686..2.686 rows=6057 loops=3)
Buckets: 8192 Batches: 1 Memory Usage: 353kB
-> Seq Scan on agents (cost=0.00..143.56 rows=6056 width=16) (actual time=0.011..1.305 rows=6057 loops=3)
Planning time: 0.710 ms
Execution time: 57.276 ms
(15 rows)
Changing the OR to UNION:
db1=# explain analyze select count(*) from (
select *
from conversations
join agents on conversations.agent_id=agents.id
where conversations.installation_id=1
union
select *
from conversations
join agents on conversations.agent_id=agents.id
where agents.source_id = '123') as subquery;
**Query plan:**
----------------------------------------------------------------------------------------------------------------------------------
Aggregate (**cost=1114.31..1114.32** rows=1 width=8) (actual time=8.038..8.038 rows=1 loops=1)
-> HashAggregate (cost=1091.90..1101.86 rows=996 width=1437) (actual time=7.783..8.009 rows=390 loops=1)
Group Key: conversations.id, conversations.created, conversations.modified, conversations.source_created, conversations.source_id, conversations.installation_id, brain_conversation.resolution_reason, conversations.solve_time, conversations.agent_id, conversations.submission_reason, conversations.is_marked_as_duplicate, conversations.num_back_and_forths, conversations.is_closed, conversations.is_solved, conversations.conversation_type, conversations.related_ticket_source_id, conversations.channel, brain_conversation.last_updated_from_platform, conversations.csat, agents.id, agents.created, agents.modified, agents.name, agents.source_id, organization_agent.installation_id, agents.settings
-> Append (cost=219.68..1027.16 rows=996 width=1437) (actual time=5.517..6.307 rows=390 loops=1)
-> Hash Join (cost=219.68..649.69 rows=931 width=224) (actual time=5.516..6.063 rows=390 loops=1)
Hash Cond: (conversations.agent_id = agents.id)
-> Index Scan using conversations_installation_id_b3ff5c00 on conversations (cost=0.42..427.98 rows=931 width=154) (actual time=0.039..0.344 rows=879 loops=1)
Index Cond: (installation_id = 1)
-> Hash (cost=143.56..143.56 rows=6056 width=70) (actual time=5.394..5.394 rows=6057 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 710kB
-> Seq Scan on agents (cost=0.00..143.56 rows=6056 width=70) (actual time=0.014..1.938 rows=6057 loops=1)
-> Nested Loop (cost=0.70..367.52 rows=65 width=224) (actual time=0.210..0.211 rows=0 loops=1)
-> Index Scan using agents_source_id_106c8103_like on agents agents_1 (cost=0.28..8.30 rows=1 width=70) (actual time=0.210..0.210 rows=0 loops=1)
Index Cond: ((source_id)::text = '123'::text)
-> Index Scan using conversations_agent_id_de76554b on conversations conversations_1 (cost=0.42..358.12 rows=110 width=154) (never executed)
Index Cond: (agent_id = agents_1.id)
Planning time: 2.024 ms
Execution time: 9.367 ms
(18 rows)
Yes. OR has a way of killing query performance. For this query:
select count(*)
from conversations c join
agents a
on c.agent_id = a.id
where c.id = 1 or a.id = 123;
Note I removed the quotes around 123. It looks like a number so I assume it is. For this query, you want an index on conversations(agent_id).
Probably the most effective way to write the query is:
select count(*)
from ((select 1
from conversations c join
agents a
on c.agent_id = a.id
where c.id = 1
) union all
(select 1
from conversations c join
agents a
on c.agent_id = a.id
where a.id = 123 and c.id <> 1
)
) ac;
Note the use of union all rather than union. The additional where condition eliminates duplicates.
This can take advantage of the following indexes:
conversations(id, agent_id)
agents(id)
conversations(agent_id, id)
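The indexes listed above could be created as follows. This is a sketch with hypothetical index names; if agents.id is already the primary key, its index exists and the last statement can be skipped.

```sql
-- Serves the first branch: look up one conversation by id, then read agent_id.
CREATE INDEX conversations_id_agent_id_idx
    ON conversations (id, agent_id);

-- Serves the second branch: find conversations for one agent and
-- apply c.id <> 1 from the index alone.
CREATE INDEX conversations_agent_id_id_idx
    ON conversations (agent_id, id);

-- Only needed if agents.id is not already covered by a primary key or unique index.
CREATE INDEX agents_id_idx
    ON agents (id);
```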
I have a query like this, which joins against ~6000 values:
SELECT DISTINCT ON(user_id)
user_id,
finished_at as last_deposit_date,
CASE When currency = 'RUB' Then amount_cents END as last_deposit_amount_cents
FROM payments
JOIN (VALUES (5),(22),(26)) --~6000 values
AS v(user_id) USING (user_id)
WHERE action = 'deposit'
AND success = 't'
AND currency IN ('RUB')
ORDER BY user_id, finished_at DESC
QUERY PLAN for query with many VALUES:
Unique (cost=444606.97..449760.44 rows=19276 width=24) (actual time=6129.403..6418.317 rows=5991 loops=1)
Buffers: shared hit=2386527, temp read=7807 written=7808
-> Sort (cost=444606.97..447183.71 rows=1030695 width=24) (actual time=6129.401..6295.457 rows=1877039 loops=1)
Sort Key: payments.user_id, payments.finished_at DESC
Sort Method: external merge Disk: 62456kB
Buffers: shared hit=2386527, temp read=7807 written=7808
-> Nested Loop (cost=0.43..341665.35 rows=1030695 width=24) (actual time=0.612..5085.376 rows=1877039 loops=1)
Buffers: shared hit=2386521
-> Values Scan on "*VALUES*" (cost=0.00..75.00 rows=6000 width=4) (actual time=0.002..4.507 rows=6000 loops=1)
-> Index Scan using index_payments_on_user_id on payments (cost=0.43..54.78 rows=172 width=28) (actual time=0.010..0.793 rows=313 loops=6000)
Index Cond: (user_id = "*VALUES*".column1)
Filter: (success AND ((action)::text = 'deposit'::text) AND ((currency)::text = 'RUB'::text))
Rows Removed by Filter: 85
Buffers: shared hit=2386521
Planning time: 5.886 ms
Execution time: 6429.685 ms
I use PostgreSQL 10.8.0. Is there any way to speed up this query?
I tried replacing DISTINCT with recursion:
WITH RECURSIVE t AS (
(SELECT min(user_id) AS user_id FROM payments)
UNION ALL
SELECT (SELECT min(user_id) FROM payments
WHERE user_id > t.user_id
) AS user_id FROM
t
WHERE t.user_id IS NOT NULL
)
SELECT payments.* FROM t
JOIN (VALUES (5),(22),(26)) --~6000 VALUES
AS v(user_id) USING (user_id)
, LATERAL (
SELECT user_id,
finished_at as last_deposit_date,
CASE When currency = 'RUB' Then amount_cents END as last_deposit_amount_cents FROM payments
WHERE payments.user_id=t.user_id
AND action = 'deposit'
AND success = 't'
AND currency IN ('RUB')
ORDER BY finished_at DESC LIMIT 1
) AS payments
WHERE t.user_id IS NOT NULL;
But it turned out even slower.
Hash Join (cost=418.67..21807.22 rows=3000 width=24) (actual time=16.804..10843.174 rows=5991 loops=1)
Hash Cond: (t.user_id = "VALUES".column1)
Buffers: shared hit=6396763
CTE t
-> Recursive Union (cost=0.46..53.73 rows=101 width=8) (actual time=0.142..1942.351 rows=237029 loops=1)
Buffers: shared hit=864281
-> Result (cost=0.46..0.47 rows=1 width=8) (actual time=0.141..0.142 rows=1 loops=1)
Buffers: shared hit=4
InitPlan 3 (returns $1)
-> Limit (cost=0.43..0.46 rows=1 width=8) (actual time=0.138..0.139 rows=1 loops=1)
Buffers: shared hit=4
-> Index Only Scan using index_payments_on_user_id on payments payments_2 (cost=0.43..155102.74 rows=4858092 width=8) (actual time=0.137..0.138 rows=1 loops=1)
Index Cond: (user_id IS NOT NULL)
Heap Fetches: 0
Buffers: shared hit=4
-> WorkTable Scan on t t_1 (cost=0.00..5.12 rows=10 width=8) (actual time=0.008..0.008 rows=1 loops=237029)
Filter: (user_id IS NOT NULL)
Rows Removed by Filter: 0
Buffers: shared hit=864277
SubPlan 2
-> Result (cost=0.48..0.49 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=237028)
Buffers: shared hit=864277
InitPlan 1 (returns $3)
-> Limit (cost=0.43..0.48 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=237028)
Buffers: shared hit=864277
-> Index Only Scan using index_payments_on_user_id on payments payments_1 (cost=0.43..80786.25 rows=1619364 width=8) (actual time=0.007..0.007 rows=1 loops=237028)
Index Cond: ((user_id IS NOT NULL) AND (user_id > t_1.user_id))
Heap Fetches: 46749
Buffers: shared hit=864277
-> Nested Loop (cost=214.94..21498.23 rows=100 width=32) (actual time=0.475..10794.535 rows=167333 loops=1)
Buffers: shared hit=6396757
-> CTE Scan on t (cost=0.00..2.02 rows=100 width=8) (actual time=0.145..1998.788 rows=237028 loops=1)
Filter: (user_id IS NOT NULL)
Rows Removed by Filter: 1
Buffers: shared hit=864281
-> Limit (cost=214.94..214.94 rows=1 width=24) (actual time=0.037..0.037 rows=1 loops=237028)
Buffers: shared hit=5532476
-> Sort (cost=214.94..215.37 rows=172 width=24) (actual time=0.036..0.036 rows=1 loops=237028)
Sort Key: payments.finished_at DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=5532476
-> Index Scan using index_payments_on_user_id on payments (cost=0.43..214.08 rows=172 width=24) (actual time=0.003..0.034 rows=15 loops=237028)
Index Cond: (user_id = t.user_id)
Filter: (success AND ((action)::text = 'deposit'::text) AND ((currency)::text = 'RUB'::text))
Rows Removed by Filter: 6
Buffers: shared hit=5532473
-> Hash (cost=75.00..75.00 rows=6000 width=4) (actual time=2.255..2.255 rows=6000 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 275kB
-> Values Scan on "VALUES" (cost=0.00..75.00 rows=6000 width=4) (actual time=0.004..1.206 rows=6000 loops=1)
Planning time: 7.029 ms
Execution time: 10846.774 ms
For this query:
SELECT DISTINCT ON (user_id)
p.user_id,
p.finished_at as last_deposit_date,
(CASE WHEN p.currency = 'RUB' THEN p.amount_cents END) as last_deposit_amount_cents
FROM payments p JOIN
(VALUES (5),( 22), (26) --~6000 values
) v(user_id)
USING (user_id)
WHERE p.action = 'deposit' AND
p.success = 't' AND
p.currency = 'RUB'
ORDER BY p.user_id, p.finished_at DESC;
I don't fully understand the CASE expression, because the WHERE is filtering out all other values.
That said, I would expect an index on (action, success, currency, user_id, finished_at desc) to be helpful.
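A sketch of that index, with a hypothetical name. The three equality-filtered columns come first, followed by the join column and the sort column, in the order the answer lists them:

```sql
-- Equality filters first, then user_id for the join,
-- then finished_at DESC to match the ORDER BY within each user.
CREATE INDEX payments_deposit_lookup_idx
    ON payments (action, success, currency, user_id, finished_at DESC);
```

Whether the planner picks it up depends on the data distribution, so compare EXPLAIN (ANALYZE, BUFFERS) output before and after.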
Here's my sql, followed by the explanation. I need to improve the performance. Any ideas?
PostgreSQL 9.3.12 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4, 64-bit
explain analyze
SELECT DISTINCT "apts"."id", practices.name AS alias_0
FROM "apts"
LEFT OUTER JOIN "patients" ON "patients"."id" = "apts"."patient_id"
LEFT OUTER JOIN "practices" ON "practices"."id" = "apts"."practice_id"
LEFT OUTER JOIN "eligibility_messages" ON "eligibility_messages"."apt_id" = "apts"."id"
WHERE (apts.eligibility_status_id != 1)
AND (eligibility_messages.current = 't')
AND (practices.id = '104')
ORDER BY practices.name desc
LIMIT 25 OFFSET 0
Limit (cost=881321.34..881321.41 rows=25 width=20) (actual time=2928.225..2928.227 rows=25 loops=1)
-> Sort (cost=881321.34..881391.94 rows=28240 width=20) (actual time=2928.223..2928.224 rows=25 loops=1)
Sort Key: practices.name
Sort Method: top-N heapsort Memory: 26kB
-> HashAggregate (cost=880242.03..880524.43 rows=28240 width=20) (actual time=2927.213..2927.319 rows=520 loops=1)
-> Nested Loop (cost=286614.55..880100.83 rows=28240 width=20) (actual time=206.180..2926.791 rows=520 loops=1)
-> Seq Scan on practices (cost=0.00..6.36 rows=1 width=20) (actual time=0.018..0.031 rows=1 loops=1)
Filter: (id = 104)
Rows Removed by Filter: 108
-> Hash Join (cost=286614.55..879812.07 rows=28240 width=8) (actual time=206.159..2926.643 rows=520 loops=1)
Hash Cond: (eligibility_messages.apt_id = apts.id)
-> Seq Scan on eligibility_messages (cost=0.00..561275.63 rows=2029532 width=4) (actual time=0.691..2766.867 rows=67559 loops=1)
Filter: current
Rows Removed by Filter: 3924633
-> Hash (cost=284614.02..284614.02 rows=115082 width=12) (actual time=121.957..121.957 rows=91660 loops=1)
Buckets: 16384 Batches: 2 Memory Usage: 1974kB
-> Bitmap Heap Scan on apts (cost=8296.88..284614.02 rows=115082 width=12) (actual time=19.927..91.038 rows=91660 loops=1)
Recheck Cond: (practice_id = 104)
Filter: (eligibility_status_id <> 1)
Rows Removed by Filter: 80169
-> Bitmap Index Scan on index_apts_on_practice_id (cost=0.00..8268.11 rows=177540 width=0) (actual time=16.856..16.856 rows=179506 loops=1)
Index Cond: (practice_id = 104)
Total runtime: 2928.361 ms
First, rewrite the query to a more manageable form:
SELECT DISTINCT a."id", pr.name AS alias_0
FROM "apts" a JOIN
"practices" pr
ON pr."id" = a."practice_id" JOIN
"eligibility_messages" em
ON em."apt_id" = a."id"
WHERE (a.eligibility_status_id <> 1) AND
(em.current = 't') AND
(a.practice_id = 104)
ORDER BY pr.name desc ;
Notes:
The WHERE clause turns the outer joins into inner joins anyway, so you might as well express them correctly.
I doubt pr.id is actually a string, so I removed the quotes around 104.
The patients table isn't used, so I just removed it.
Perhaps you don't even need the select distinct any more.
Switched the condition in the where to apts rather than practices.
If this isn't fast enough, you want indexes, probably on apts(practice_id, eligibility_status_id, id), practices(id), and eligibility_messages(apt_id, current).
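The suggested indexes might be created like this. The index names are hypothetical, and the column named current is double-quoted as a precaution because it is also an SQL keyword. If practices.id is already the primary key, the last statement is redundant.

```sql
-- Filter by practice, then by status, with id available for the join.
CREATE INDEX apts_practice_status_id_idx
    ON apts (practice_id, eligibility_status_id, id);

-- Lets the join probe eligibility_messages by apt_id instead of
-- sequentially scanning roughly 4 million rows.
CREATE INDEX eligibility_messages_apt_id_current_idx
    ON eligibility_messages (apt_id, "current");

-- Only needed if practices.id has no primary key or unique index yet.
CREATE INDEX practices_id_idx
    ON practices (id);
```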