I want to check whether the order of JOINs in a SQL query matters for runtime and efficiency.
I'm using PostgreSQL, and for my check I used the example world db from MySQL (https://downloads.mysql.com/docs/world.sql.zip) and wrote these two statements:
Query 1:
EXPLAIN ANALYSE SELECT * FROM countrylanguage
JOIN city ON city.countrycode = countrylanguage.countrycode
JOIN country c ON city.countrycode = c.code
Query 2:
EXPLAIN ANALYSE SELECT * FROM city
JOIN country c ON c.code = city.countrycode
JOIN countrylanguage c2 ON c.code = c2.countrycode
Query Plan 1:
Hash Join (cost=41.14..484.78 rows=29946 width=161) (actual time=1.472..17.602 rows=30670 loops=1)
Hash Cond: (city.countrycode = countrylanguage.countrycode)
-> Seq Scan on city (cost=0.00..72.79 rows=4079 width=31) (actual time=0.062..1.220 rows=4079 loops=1)
-> Hash (cost=28.84..28.84 rows=984 width=130) (actual time=1.378..1.378 rows=984 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 172kB
-> Hash Join (cost=10.38..28.84 rows=984 width=130) (actual time=0.267..0.823 rows=984 loops=1)
Hash Cond: (countrylanguage.countrycode = c.code)
-> Seq Scan on countrylanguage (cost=0.00..15.84 rows=984 width=17) (actual time=0.029..0.158 rows=984 loops=1)
-> Hash (cost=7.39..7.39 rows=239 width=113) (actual time=0.220..0.220 rows=239 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 44kB
-> Seq Scan on country c (cost=0.00..7.39 rows=239 width=113) (actual time=0.013..0.137 rows=239 loops=1)
Planning Time: 3.818 ms
Execution Time: 18.801 ms
Query Plan 2:
Hash Join (cost=41.14..312.47 rows=16794 width=161) (actual time=2.415..18.628 rows=30670 loops=1)
Hash Cond: (city.countrycode = c.code)
-> Seq Scan on city (cost=0.00..72.79 rows=4079 width=31) (actual time=0.032..0.574 rows=4079 loops=1)
-> Hash (cost=28.84..28.84 rows=984 width=130) (actual time=2.364..2.364 rows=984 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 171kB
-> Hash Join (cost=10.38..28.84 rows=984 width=130) (actual time=0.207..1.307 rows=984 loops=1)
Hash Cond: (c2.countrycode = c.code)
-> Seq Scan on countrylanguage c2 (cost=0.00..15.84 rows=984 width=17) (actual time=0.027..0.204 rows=984 loops=1)
-> Hash (cost=7.39..7.39 rows=239 width=113) (actual time=0.163..0.163 rows=239 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 44kB
-> Seq Scan on country c (cost=0.00..7.39 rows=239 width=113) (actual time=0.015..0.049 rows=239 loops=1)
Planning Time: 1.901 ms
Execution Time: 19.694 ms
The estimated costs and rows are different, and the last hash condition is different. Does this mean that the query planner hasn't done the same thing for both queries, or am I on the wrong track?
Thanks for your help!
The issue is not the order of the joins but that the join conditions are different -- referring to different tables.
In the first query, you are joining to countrylanguage using the country code from city. In the second, you are using the country code from country.
With inner joins this should not make a difference to the final result. However, it clearly does affect how the optimizer considers different paths.
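For an apples-to-apples test of join order alone, you could keep Query 2's join conditions and only change the order in which the tables are listed. A minimal sketch (with identical conditions, the planner normally produces the same plan for either listing order, though that is not guaranteed):
-- Same join conditions as Query 2, tables merely listed in a different order
EXPLAIN ANALYSE SELECT * FROM countrylanguage c2
JOIN country c ON c.code = c2.countrycode
JOIN city ON city.countrycode = c.code;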
As said, the queries are not identical.
Though not identical, the plans are comparable.
Both queries execute in ~18 ms, so comparing them is close to useless.
Queries on tables with insufficient structure (keys, indexes, statistics), but with a footprint small enough to fit in work_mem, will almost always end up as hash joins.
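If the imported world tables really do lack keys and indexes, a hedged sketch of adding some structure (the column names come from the queries above; the original MySQL dump does define these keys, so whether they survived the import into PostgreSQL is an assumption):
-- Add a primary key and indexes on the join columns, then refresh statistics
ALTER TABLE country ADD PRIMARY KEY (code);
CREATE INDEX ON city (countrycode);
CREATE INDEX ON countrylanguage (countrycode);
ANALYZE;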
Related
We run a join query between 2 tables.
The query has an OR condition that compares one column from the left table with one column from the right table. The query performance was very poor, and we fixed it by changing the OR to a UNION.
Why is this happening? I'm looking for a detailed explanation or a reference to the documentation that might shed light on the issue.
Query with OR statement:
db1=# explain analyze select count(*)
from conversations
join agents on conversations.agent_id=agents.id
where conversations.id=1 or agents.id = '123';
Query plan:
----------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=11017.95..11017.96 rows=1 width=8) (actual time=54.088..54.088 rows=1 loops=1)
-> Gather (cost=11017.73..11017.94 rows=2 width=8) (actual time=53.945..57.181 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=10017.73..10017.74 rows=1 width=8) (actual time=48.303..48.303 rows=1 loops=3)
-> Hash Join (cost=219.26..10016.69 rows=415 width=0) (actual time=5.292..48.287 rows=130 loops=3)
Hash Cond: (conversations.agent_id = agents.id)
Join Filter: ((conversations.id = 1) OR ((agents.id)::text = '123'::text))
Rows Removed by Join Filter: 80035
-> Parallel Seq Scan on conversations (cost=0.00..9366.95 rows=163995 width=8) (actual time=0.017..14.972 rows=131196 loops=3)
-> Hash (cost=143.56..143.56 rows=6056 width=16) (actual time=2.686..2.686 rows=6057 loops=3)
Buckets: 8192 Batches: 1 Memory Usage: 353kB
-> Seq Scan on agents (cost=0.00..143.56 rows=6056 width=16) (actual time=0.011..1.305 rows=6057 loops=3)
Planning time: 0.710 ms
Execution time: 57.276 ms
(15 rows)
Changing the OR to UNION:
db1=# explain analyze select count(*) from (
select *
from conversations
join agents on conversations.agent_id=agents.id
where conversations.installation_id=1
union
select *
from conversations
join agents on conversations.agent_id=agents.id
where agents.source_id = '123') as subquery;
Query plan:
----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1114.31..1114.32 rows=1 width=8) (actual time=8.038..8.038 rows=1 loops=1)
-> HashAggregate (cost=1091.90..1101.86 rows=996 width=1437) (actual time=7.783..8.009 rows=390 loops=1)
Group Key: conversations.id, conversations.created, conversations.modified, conversations.source_created, conversations.source_id, conversations.installation_id, brain_conversation.resolution_reason, conversations.solve_time, conversations.agent_id, conversations.submission_reason, conversations.is_marked_as_duplicate, conversations.num_back_and_forths, conversations.is_closed, conversations.is_solved, conversations.conversation_type, conversations.related_ticket_source_id, conversations.channel, brain_conversation.last_updated_from_platform, conversations.csat, agents.id, agents.created, agents.modified, agents.name, agents.source_id, organization_agent.installation_id, agents.settings
-> Append (cost=219.68..1027.16 rows=996 width=1437) (actual time=5.517..6.307 rows=390 loops=1)
-> Hash Join (cost=219.68..649.69 rows=931 width=224) (actual time=5.516..6.063 rows=390 loops=1)
Hash Cond: (conversations.agent_id = agents.id)
-> Index Scan using conversations_installation_id_b3ff5c00 on conversations (cost=0.42..427.98 rows=931 width=154) (actual time=0.039..0.344 rows=879 loops=1)
Index Cond: (installation_id = 1)
-> Hash (cost=143.56..143.56 rows=6056 width=70) (actual time=5.394..5.394 rows=6057 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 710kB
-> Seq Scan on agents (cost=0.00..143.56 rows=6056 width=70) (actual time=0.014..1.938 rows=6057 loops=1)
-> Nested Loop (cost=0.70..367.52 rows=65 width=224) (actual time=0.210..0.211 rows=0 loops=1)
-> Index Scan using agents_source_id_106c8103_like on agents agents_1 (cost=0.28..8.30 rows=1 width=70) (actual time=0.210..0.210 rows=0 loops=1)
Index Cond: ((source_id)::text = '123'::text)
-> Index Scan using conversations_agent_id_de76554b on conversations conversations_1 (cost=0.42..358.12 rows=110 width=154) (never executed)
Index Cond: (agent_id = agents_1.id)
Planning time: 2.024 ms
Execution time: 9.367 ms
(18 rows)
Yes. OR has a way of killing the performance of queries. For this query:
select count(*)
from conversations c join
agents a
on c.agent_id = a.id
where c.id = 1 or a.id = 123;
Note I removed the quotes around 123. It looks like a number so I assume it is. For this query, you want an index on conversations(agent_id).
Probably the most effective way to write the query is:
select count(*)
from ((select 1
from conversations c join
agents a
on c.agent_id = a.id
where c.id = 1
) union all
(select 1
from conversations c join
agents a
on c.agent_id = a.id
where a.id = 123 and c.id <> 1
)
) ac;
Note the use of union all rather than union. The additional where condition (c.id <> 1 in the second branch) prevents the two branches from producing duplicate rows.
This can take advantage of the following indexes:
conversations(id, agent_id)
agents(id)
conversations(agent_id, id)
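For reference, a hedged sketch of creating those indexes (the index names are made up, and agents(id) is usually already covered by its primary key):
CREATE INDEX conversations_id_agent_id_idx ON conversations (id, agent_id);
CREATE INDEX conversations_agent_id_id_idx ON conversations (agent_id, id);
-- agents (id) normally already exists as the primary key index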
I have the following query:
SELECT
Sum(fact_individual_re.quality_hours) AS C0,
dim_gender.name AS C1,
dim_date.year AS C2
FROM
fact_individual_re
INNER JOIN dim_date ON fact_individual_re.dim_date_id = dim_date.id
INNER JOIN dim_gender ON fact_individual_re.dim_gender_id = dim_gender.id
GROUP BY dim_date.year,dim_gender.name
ORDER BY dim_date.year ASC,dim_gender.name ASC,Sum(fact_individual_re.quality_hours) ASC
When explaining its plan, the hash join is taking most of the time. Is there any way to minimize the time spent in the hash join?
Sort (cost=190370.50..190370.55 rows=20 width=18) (actual time=4005.152..4005.154 rows=20 loops=1)
Sort Key: dim_date.year, dim_gender.name, (sum(fact_individual_re.quality_hours))
Sort Method: quicksort Memory: 26kB
-> Finalize GroupAggregate (cost=190369.07..190370.07 rows=20 width=18) (actual time=4005.106..4005.135 rows=20 loops=1)
Group Key: dim_date.year, dim_gender.name
-> Sort (cost=190369.07..190369.27 rows=80 width=18) (actual time=4005.100..4005.103 rows=100 loops=1)
Sort Key: dim_date.year, dim_gender.name
Sort Method: quicksort Memory: 32kB
-> Gather (cost=190358.34..190366.54 rows=80 width=18) (actual time=4004.966..4005.020 rows=100 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Partial HashAggregate (cost=189358.34..189358.54 rows=20 width=18) (actual time=3885.254..3885.259 rows=20 loops=5)
Group Key: dim_date.year, dim_gender.name
-> Hash Join (cost=125.17..170608.34 rows=2500000 width=14) (actual time=2.279..2865.808 rows=2000000 loops=5)
Hash Cond: (fact_individual_re.dim_gender_id = dim_gender.id)
-> Hash Join (cost=124.13..150138.54 rows=2500000 width=12) (actual time=2.060..2115.234 rows=2000000 loops=5)
Hash Cond: (fact_individual_re.dim_date_id = dim_date.id)
-> Parallel Seq Scan on fact_individual_re (cost=0.00..118458.00 rows=2500000 width=12) (actual time=0.204..982.810 rows=2000000 loops=5)
-> Hash (cost=78.50..78.50 rows=3650 width=8) (actual time=1.824..1.824 rows=3650 loops=5)
Buckets: 4096 Batches: 1 Memory Usage: 175kB
-> Seq Scan on dim_date (cost=0.00..78.50 rows=3650 width=8) (actual time=0.143..1.030 rows=3650 loops=5)
-> Hash (cost=1.02..1.02 rows=2 width=10) (actual time=0.193..0.193 rows=2 loops=5)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on dim_gender (cost=0.00..1.02 rows=2 width=10) (actual time=0.181..0.182 rows=2 loops=5)
Planning time: 0.609 ms
Execution time: 4020.423 ms
(26 rows)
I am using PostgreSQL v10.
I'd recommend partially grouping the rows before the join:
select
sum(quality_hours_sum) AS C0,
dim_gender.name AS C1,
dim_date.year AS C2
from
(
select
sum(quality_hours) as quality_hours_sum,
dim_date_id,
dim_gender_id
from fact_individual_re
group by dim_date_id, dim_gender_id
) as fact_individual_re_sum
join dim_date on dim_date_id = dim_date.id
join dim_gender on dim_gender_id = dim_gender.id
group by dim_date.year, dim_gender.name
order by dim_date.year, dim_gender.name, sum(quality_hours_sum);
This way you will be joining only about 1460 rows (count(distinct dim_date_id) * count(distinct dim_gender_id)) instead of all 2M rows. It would still need to read and group all 2M rows, though; to avoid that you'd need something like a summary table maintained with a trigger, as sketched below.
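If you do go the summary-table route, here is a minimal sketch of that idea, assuming PostgreSQL 9.5+ for ON CONFLICT. The summary table, function and trigger names are made up, the numeric type for quality_hours is an assumption, and only INSERTs on the fact table are handled (UPDATE/DELETE would need similar triggers):
CREATE TABLE fact_individual_re_sum (
    dim_date_id       int     NOT NULL,
    dim_gender_id     int     NOT NULL,
    quality_hours_sum numeric NOT NULL DEFAULT 0,
    PRIMARY KEY (dim_date_id, dim_gender_id)
);

CREATE FUNCTION fact_individual_re_sum_ins() RETURNS trigger AS $$
BEGIN
    -- Fold the new fact row into the running per-(date, gender) total
    INSERT INTO fact_individual_re_sum AS s (dim_date_id, dim_gender_id, quality_hours_sum)
    VALUES (NEW.dim_date_id, NEW.dim_gender_id, NEW.quality_hours)
    ON CONFLICT (dim_date_id, dim_gender_id)
    DO UPDATE SET quality_hours_sum = s.quality_hours_sum + EXCLUDED.quality_hours_sum;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER fact_individual_re_sum_trg
AFTER INSERT ON fact_individual_re
FOR EACH ROW EXECUTE PROCEDURE fact_individual_re_sum_ins();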
There is no predicate on the fact table, so we can assume that, prior to any filtering via the joins, 100% of that table is required.
Indexes exist on the lookup tables, but from what you say they are not covering indexes. Given that 100% of the fact table is scanned, combined with the indexes not being covering, I would expect a hash join.
As an experiment, you could apply a covering index (index dim_date.id and dim_date.year in a single index) to see if it swaps away from a hash join against dim_date.
Given the overall lack of predicates, though, a hash join is not necessarily the wrong query plan even without a covering index.
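As a concrete form of that experiment, a hedged sketch (column names taken from the query; on PostgreSQL 10 a plain multicolumn index acts as the covering index, since INCLUDE only arrives in v11; whether the planner actually abandons the hash join is not guaranteed):
CREATE INDEX ON dim_date (id, year);
CREATE INDEX ON dim_gender (id, name);
ANALYZE dim_date;
ANALYZE dim_gender;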
I have the following two queries. Query 1 is fast since it uses indexes (a merge join over two index scans), while Query 2 uses a hash join and is slower.
Query 1 orders by a column of table 1 (users) and Query 2 orders by a column of table 2 (access_logs).
Query 1
learning=# explain analyze
select *
from users left join
access_logs
on users.userid = access_logs.userid
order by users.userid
limit 10 offset 90;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=14.00..15.46 rows=10 width=104) (actual time=1.330..1.504 rows=10 loops=1)
-> Merge Left Join (cost=0.85..291532.97 rows=1995958 width=104) (actual time=0.037..1.482 rows=100 loops=1)
Merge Cond: (users.userid = access_logs.userid)
-> Index Scan using users_pkey on users (cost=0.43..151132.75 rows=1995958 width=76) (actual time=0.018..1.135 rows=100 loops=1)
-> Index Scan using access_logs_userid_idx on access_logs (cost=0.43..110471.45 rows=1995958 width=28) (actual time=0.012..0.198 rows=100 loops=1)
Planning time: 0.469 ms
Execution time: 1.569 ms
Query 2
learning=# explain analyze
select *
from users left join
access_logs
on users.userid = access_logs.userid
order by access_logs.userid
limit 10 offset 90;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=293584.20..293584.23 rows=10 width=104) (actual time=3821.432..3821.439 rows=10 loops=1)
-> Sort (cost=293583.98..298573.87 rows=1995958 width=104) (actual time=3821.391..3821.415 rows=100 loops=1)
Sort Key: access_logs.userid
Sort Method: top-N heapsort Memory: 51kB
-> Hash Left Join (cost=73231.06..217299.90 rows=1995958 width=104) (actual time=539.859..3168.754 rows=1995958 loops=1)
Hash Cond: (users.userid = access_logs.userid)
-> Seq Scan on users (cost=0.00..44814.58 rows=1995958 width=76) (actual time=0.009..443.260 rows=1995958 loops=1)
-> Hash (cost=34636.58..34636.58 rows=1995958 width=28) (actual time=539.112..539.112 rows=1995958 loops=1)
Buckets: 262144 Batches: 2 Memory Usage: 58532kB
-> Seq Scan on access_logs (cost=0.00..34636.58 rows=1995958 width=28) (actual time=0.006..170.061 rows=1995958 loops=1)
Planning time: 0.480 ms
Execution time: 3832.245 ms
Questions
The second query is slow because, as the plan shows, the entire join has to be computed before the sort and limit can be applied.
Why does the sort on the second table's column not use the index? Below is the plan for just that sort.
Query - explain analyze select * from access_logs order by userid limit 10 offset 90;
Plan
Limit (cost=5.41..5.96 rows=10 width=28) (actual time=0.199..0.218 rows=10 loops=1)
-> Index Scan using access_logs_userid_idx on access_logs (cost=0.43..110471.45 rows=1995958 width=28) (actual time=0.029..0.201 rows=100 loops=1)
Planning time: 0.120 ms
Execution time: 0.252 ms
Edit 1:
My goal is not to compare the two queries; in fact I want the result of query 2. I only provided query 1 for comparison, to understand the difference.
The order by is not limited to the join column; the user can also order by another column of table 2. The plan is below.
learning=# explain analyze select * from users left join access_logs on users.userid=access_logs.userid order by access_logs.last_login limit 10;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=260431.83..260431.86 rows=10 width=104) (actual time=3846.625..3846.627 rows=10 loops=1)
-> Sort (cost=260431.83..265421.73 rows=1995958 width=104) (actual time=3846.623..3846.623 rows=10 loops=1)
Sort Key: access_logs.last_login
Sort Method: top-N heapsort Memory: 27kB
-> Hash Left Join (cost=73231.06..217299.90 rows=1995958 width=104) (actual time=567.104..3174.818 rows=1995958 loops=1)
Hash Cond: (users.userid = access_logs.userid)
-> Seq Scan on users (cost=0.00..44814.58 rows=1995958 width=76) (actual time=0.007..443.364 rows=1995958 loops=1)
-> Hash (cost=34636.58..34636.58 rows=1995958 width=28) (actual time=566.814..566.814 rows=1995958 loops=1)
Buckets: 262144 Batches: 2 Memory Usage: 58532kB
-> Seq Scan on access_logs (cost=0.00..34636.58 rows=1995958 width=28) (actual time=0.004..169.137 rows=1995958 loops=1)
Planning time: 0.490 ms
Execution time: 3857.171 ms
The sort in the second query does not use the index because the index is not guaranteed to contain all the values being sorted. If there are records in users with no match in access_logs, the left join produces NULLs that are referenced in the query as access_logs.userid but are not actually present in access_logs, and thus are not covered by the index.
A workaround can be to create a default initial record in access_logs for each user and use an inner join, as sketched below.
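A minimal sketch of that workaround, assuming access_logs tolerates a placeholder row per user (the inserted column list and the NULL default are assumptions; adjust them to the real schema and any NOT NULL constraints):
-- Give every user at least one access_logs row ...
INSERT INTO access_logs (userid, last_login)
SELECT u.userid, NULL
FROM users u
WHERE NOT EXISTS (SELECT 1 FROM access_logs a WHERE a.userid = u.userid);

-- ... so the query can use an inner join and walk access_logs_userid_idx
SELECT *
FROM users
JOIN access_logs ON users.userid = access_logs.userid
ORDER BY access_logs.userid
LIMIT 10 OFFSET 90;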
The queries are performed on a large table with 11 million rows. I have already performed an ANALYZE on the table prior to the query executions.
Query 1:
SELECT *
FROM accounts t1
LEFT OUTER JOIN accounts t2
ON (t1.account_no = t2.account_no
AND t1.effective_date < t2.effective_date)
WHERE t2.account_no IS NULL;
Explain Analyze:
Hash Anti Join (cost=480795.57..1201111.40 rows=7369854 width=292) (actual time=29619.499..115662.111 rows=1977871 loops=1)
Hash Cond: ((t1.account_no)::text = (t2.account_no)::text)
Join Filter: ((t1.effective_date)::text < (t2.effective_date)::text)
-> Seq Scan on accounts t1 (cost=0.00..342610.81 rows=11054781 width=146) (actual time=0.025..25693.921 rows=11034070 loops=1)
-> Hash (cost=342610.81..342610.81 rows=11054781 width=146) (actual time=29612.925..29612.925 rows=11034070 loops=1)
Buckets: 2097152 Batches: 1 Memory Usage: 1834187kB
-> Seq Scan on accounts t2 (cost=0.00..342610.81 rows=11054781 width=146) (actual time=0.006..22929.635 rows=11034070 loops=1)
Total runtime: 115870.788 ms
The estimated cost is ~1.2 million and the actual time taken is ~1.9 minutes.
Query 2:
SELECT t1.*
FROM accounts t1
LEFT OUTER JOIN accounts t2
ON (t1.account_no = t2.account_no
AND t1.effective_date < t2.effective_date)
WHERE t2.account_no IS NULL;
Explain Analyze:
Hash Anti Join (cost=480795.57..1201111.40 rows=7369854 width=146) (actual time=13365.808..65519.402 rows=1977871 loops=1)
Hash Cond: ((t1.account_no)::text = (t2.account_no)::text)
Join Filter: ((t1.effective_date)::text < (t2.effective_date)::text)
-> Seq Scan on accounts t1 (cost=0.00..342610.81 rows=11054781 width=146) (actual time=0.007..5032.778 rows=11034070 loops=1)
-> Hash (cost=342610.81..342610.81 rows=11054781 width=18) (actual time=13354.219..13354.219 rows=11034070 loops=1)
Buckets: 2097152 Batches: 1 Memory Usage: 545369kB
-> Seq Scan on accounts t2 (cost=0.00..342610.81 rows=11054781 width=18) (actual time=0.011..8964.571 rows=11034070 loops=1)
Total runtime: 65705.707 ms
The estimated cost is ~1.2 million (again) but the actual time taken is <1.1 minutes.
Query 3:
SELECT *
FROM accounts
WHERE (account_no,
effective_date) IN
(SELECT account_no,
max(effective_date)
FROM accounts
GROUP BY account_no);
Explain Analyze:
Nested Loop (cost=406416.19..502216.84 rows=2763695 width=146) (actual time=31779.457..917543.228 rows=1977871 loops=1)
-> HashAggregate (cost=406416.19..406757.45 rows=34126 width=43) (actual time=31774.877..33378.968 rows=1977425 loops=1)
-> Subquery Scan on "ANY_subquery" (cost=397884.72..404709.90 rows=341259 width=43) (actual time=27979.226..29841.217 rows=1977425 loops=1)
-> HashAggregate (cost=397884.72..401297.31 rows=341259 width=18) (actual time=27979.224..29315.346 rows=1977425 loops=1)
-> Seq Scan on accounts (cost=0.00..342610.81 rows=11054781 width=18) (actual time=0.851..16092.755 rows=11034070 loops=1)
-> Index Scan using accounts_idx2 on accounts (cost=0.00..2.78 rows=1 width=146) (actual time=0.443..0.445 rows=1 loops=1977425)
Index Cond: (((account_no)::text = ("ANY_subquery".account_no)::text) AND ((effective_date)::text = "ANY_subquery".max))
Total runtime: 918039.614 ms
The estimated cost is ~502,000 but the actual time taken is ~15.3 minutes!
How reliable is the EXPLAIN output?
Do we always have to EXPLAIN ANALYZE to see how our query is going to perform on real data, rather than trusting how much the query planner thinks it will cost?
They are reliable, except for when they are not. You can't really generalize.
It looks like it is dramatically underestimating the number of distinct account_no values it will find (it thinks it will find 34126 but actually found 1977425). Your default_statistics_target might not be high enough to get a good estimate for this column.
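A hedged sketch of acting on that, assuming the per-column statistics target is the knob you want to turn (1000 is just an example value):
ALTER TABLE accounts ALTER COLUMN account_no SET STATISTICS 1000;
ANALYZE accounts;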
Facts:
PostgreSQL 8.4.2, Linux
I make use of table inheritance
Each table contains 3 million rows
Indexes on the joining columns are in place
Table statistics (analyze, vacuum analyze) are up-to-date
The only table used is "node", with various partitioned sub-tables
Recursive query (pg >= 8.4)
Now here is the explained query:
WITH RECURSIVE
rows AS
(
SELECT *
FROM (
SELECT r.id, r.set, r.parent, r.masterid
FROM d_storage.node_dataset r
WHERE masterid = 3533933
) q
UNION ALL
SELECT *
FROM (
SELECT c.id, c.set, c.parent, r.masterid
FROM rows r
JOIN a_storage.node c
ON c.parent = r.id
) q
)
SELECT r.masterid, r.id AS nodeid
FROM rows r
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on rows r (cost=2742105.92..2862119.94 rows=6000701 width=16) (actual time=0.033..172111.204 rows=4 loops=1)
CTE rows
-> Recursive Union (cost=0.00..2742105.92 rows=6000701 width=28) (actual time=0.029..172111.183 rows=4 loops=1)
-> Index Scan using node_dataset_masterid on node_dataset r (cost=0.00..8.60 rows=1 width=28) (actual time=0.025..0.027 rows=1 loops=1)
Index Cond: (masterid = 3533933)
-> Hash Join (cost=0.33..262208.33 rows=600070 width=28) (actual time=40628.371..57370.361 rows=1 loops=3)
Hash Cond: (c.parent = r.id)
-> Append (cost=0.00..211202.04 rows=12001404 width=20) (actual time=0.011..46365.669 rows=12000004 loops=3)
-> Seq Scan on node c (cost=0.00..24.00 rows=1400 width=20) (actual time=0.002..0.002 rows=0 loops=3)
-> Seq Scan on node_dataset c (cost=0.00..55001.01 rows=3000001 width=20) (actual time=0.007..3426.593 rows=3000001 loops=3)
-> Seq Scan on node_stammdaten c (cost=0.00..52059.01 rows=3000001 width=20) (actual time=0.008..9049.189 rows=3000001 loops=3)
-> Seq Scan on node_stammdaten_adresse c (cost=0.00..52059.01 rows=3000001 width=20) (actual time=3.455..8381.725 rows=3000001 loops=3)
-> Seq Scan on node_testdaten c (cost=0.00..52059.01 rows=3000001 width=20) (actual time=1.810..5259.178 rows=3000001 loops=3)
-> Hash (cost=0.20..0.20 rows=10 width=16) (actual time=0.010..0.010 rows=1 loops=3)
-> WorkTable Scan on rows r (cost=0.00..0.20 rows=10 width=16) (actual time=0.002..0.004 rows=1 loops=3)
Total runtime: 172111.371 ms
(16 rows)
(END)
So far so bad, the planner decides to choose hash joins (good) but no indexes (bad).
Now after doing the following:
SET enable_hashjoin TO false;
The explained query looks like that:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on rows r (cost=15198247.00..15318261.02 rows=6000701 width=16) (actual time=0.038..49.221 rows=4 loops=1)
CTE rows
-> Recursive Union (cost=0.00..15198247.00 rows=6000701 width=28) (actual time=0.032..49.201 rows=4 loops=1)
-> Index Scan using node_dataset_masterid on node_dataset r (cost=0.00..8.60 rows=1 width=28) (actual time=0.028..0.031 rows=1 loops=1)
Index Cond: (masterid = 3533933)
-> Nested Loop (cost=0.00..1507822.44 rows=600070 width=28) (actual time=10.384..16.382 rows=1 loops=3)
Join Filter: (r.id = c.parent)
-> WorkTable Scan on rows r (cost=0.00..0.20 rows=10 width=16) (actual time=0.001..0.003 rows=1 loops=3)
-> Append (cost=0.00..113264.67 rows=3001404 width=20) (actual time=8.546..12.268 rows=1 loops=4)
-> Seq Scan on node c (cost=0.00..24.00 rows=1400 width=20) (actual time=0.001..0.001 rows=0 loops=4)
-> Bitmap Heap Scan on node_dataset c (cost=58213.87..113214.88 rows=3000001 width=20) (actual time=1.906..1.906 rows=0 loops=4)
Recheck Cond: (c.parent = r.id)
-> Bitmap Index Scan on node_dataset_parent (cost=0.00..57463.87 rows=3000001 width=0) (actual time=1.903..1.903 rows=0 loops=4)
Index Cond: (c.parent = r.id)
-> Index Scan using node_stammdaten_parent on node_stammdaten c (cost=0.00..8.60 rows=1 width=20) (actual time=3.272..3.273 rows=0 loops=4)
Index Cond: (c.parent = r.id)
-> Index Scan using node_stammdaten_adresse_parent on node_stammdaten_adresse c (cost=0.00..8.60 rows=1 width=20) (actual time=4.333..4.333 rows=0 loops=4)
Index Cond: (c.parent = r.id)
-> Index Scan using node_testdaten_parent on node_testdaten c (cost=0.00..8.60 rows=1 width=20) (actual time=2.745..2.746 rows=0 loops=4)
Index Cond: (c.parent = r.id)
Total runtime: 49.349 ms
(21 rows)
(END)
-> incredibly faster, because indexes were used.
Notice: the cost of the second plan is somewhat higher than that of the first.
So the main question is: why does the planner make the first decision instead of the second?
Also interesting:
Via
SET enable_seqscan TO false;
I temporarily disabled seq scans. Then the planner used indexes and hash joins, but the query was still slow. So the problem seems to be the hash join.
Maybe someone can help in this confusing situation?
thx, R.
If your EXPLAIN output differs significantly from reality (the estimated cost is lower, but the actual time is higher), it is likely that your statistics are out of date or based on a non-representative sample.
Try again with fresh statistics. If this does not help, increase the sample size and build stats again.
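A hedged sketch of both steps against the tables from the question (the schema-qualified names come from the query; the statistics target of 1000 is an example value):
-- Refresh the statistics on the tables involved in the join
ANALYZE a_storage.node;
ANALYZE d_storage.node_dataset;   -- repeat for the other node_* partitions

-- If the estimates stay far off, raise the sample size on the join column and re-analyze
ALTER TABLE d_storage.node_dataset ALTER COLUMN parent SET STATISTICS 1000;
ANALYZE d_storage.node_dataset;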
Try this:
set seq_page_cost = '4.0';
set random_page_cost = '1.0';
EXPLAIN ANALYZE ....
Do this in your session to see if it makes a difference.
The hash join was expecting 600070 resulting rows, but only got 4 (in 3 loops, averaging 1 row per loop). If 600070 had been accurate, the hash join would presumably have been appropriate.