Incorrect rows estimate for joins - sql

I have a simple query (Postgres 9.4):
SELECT count(*)
FROM bo_labels L
LEFT JOIN bo_party party ON (party.id = L.bo_party_fkey)
LEFT JOIN bo_document_base D ON (D.id = L.bo_doc_base_fkey)
LEFT JOIN bo_contract_hardwood_deal C ON (C.bo_document_fkey = D.id)
WHERE party.inn = '?'
The EXPLAIN ANALYZE output looks like this:
Aggregate (cost=2385.30..2385.30 rows=1 width=0) (actual time=31762.367..31762.367 rows=1 loops=1)
-> Nested Loop Left Join (cost=1.28..2385.30 rows=1 width=0) (actual time=7.621..31760.776 rows=1694 loops=1)
Join Filter: ((c.bo_document_fkey)::text = (
Rows Removed by Join Filter: 101658634
-> Nested Loop Left Join (cost=1.28..106.33 rows=1 width=10) (actual time=0.110..54.635 rows=1694 loops=1)
-> Nested Loop (cost=0.85..105.69 rows=1 width=9) (actual time=0.081..4.404 rows=1694 loops=1)
-> Index Scan using bo_party_inn_idx on bo_party party (cost=0.43..12.43 rows=3 width=10) (actual time=0.031..0.037 rows=3 loops=1)
Index Cond: (inn = '2534005760'::text)
-> Index Only Scan using bo_labels__party_fkey__docbase_fkey__tnved_fkey__idx on bo_labels l (cost=0.42..29.80 rows=1289 width=17) (actual time=0.013..1.041 rows=565 loops=3)
Index Cond: (bo_party_fkey = (
Heap Fetches: 0
-> Index Only Scan using bo_document_pkey on bo_document_base d (cost=0.43..0.64 rows=1 width=10) (actual time=0.022..0.025 rows=1 loops=1694)
Index Cond: (id = (l.bo_doc_base_fkey)::text)
Heap Fetches: 1134
-> Seq Scan on bo_contract_hardwood_deal c (cost=0.00..2069.77 rows=59770 width=9) (actual time=0.003..11.829 rows=60012 loops=1694)
Planning time: 13.484 ms
Execution time: 31762.885 ms
What is very annoying is the incorrect row estimate:
Nested Loop (cost=0.85..105.69 rows=1 width=9) (actual time=0.081..4.404 rows=1694 loops=1)
Because of that, Postgres chooses nested loops and the query runs for about 30 seconds.
With SET LOCAL enable_nestloop = OFF; it completes in about a second.
Interestingly, I have default_statistics_target = 10000 (the maximum value) and ran VACUUM VERBOSE ANALYZE on all 4 tables just before.
Since Postgres does not gather statistics across tables, this kind of misestimate is likely to happen for other joins too.
Without the external extension pg_hint_plan, it is not possible to change enable_nestloop for just that query.
Is there some other way I could force a faster plan for this query?
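As a point of reference, SET LOCAL only lasts until the end of the enclosing transaction, so wrapping just this one query in a transaction confines the setting to it. A minimal sketch, assuming the join columns are party.id and D.id (inferred from the plan, not confirmed) and using the inn value shown in the plan:

```sql
-- Assumption: the join columns party.id and D.id are inferred from the plan.
BEGIN;
SET LOCAL enable_nestloop = off;  -- reverts automatically at COMMIT or ROLLBACK
SELECT count(*)
FROM bo_labels L
LEFT JOIN bo_party party ON (party.id = L.bo_party_fkey)
LEFT JOIN bo_document_base D ON (D.id = L.bo_doc_base_fkey)
LEFT JOIN bo_contract_hardwood_deal C ON (C.bo_document_fkey = D.id)
WHERE party.inn = '2534005760';
COMMIT;
```

Other statements in the same session, outside this transaction, keep the default planner settings.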
Update in response to comments:
I can't eliminate the join in the usual way. What I am mainly looking for is whether it is possible to change the statistics (for example) to include desired values that deviate from the normal statistical distribution. Or maybe there is another way to make Postgres change the weight of nested loops so that it uses them less often?
Could someone also explain, or point to documentation on, how the Postgres planner, for a nested loop over two inputs with 3 rows (exactly correct) and 1289 rows (really 565, but that error is a different question), arrives at the assumption that the result will contain only 1 row? I am talking about this part of the plan:
-> Nested Loop (cost=0.85..105.69 rows=1 width=9) (actual time=0.081..4.404 rows=1694 loops=1)
-> Index Scan using bo_party_inn_idx on bo_party party (cost=0.43..12.43 rows=3 width=10) (actual time=0.031..0.037 rows=3 loops=1)
Index Cond: (inn = '2534005760'::text)
-> Index Only Scan using bo_labels__party_fkey__docbase_fkey__tnved_fkey__idx on bo_labels l (cost=0.42..29.80 rows=1289 width=17) (actual time=0.013..1.041 rows=565 loops=3)
Index Cond: (bo_party_fkey = (
At first glance this looks wrong from the start. Which statistics are used there, and how?
Does Postgres also maintain statistics for indexes?
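For anyone wanting to look at what the planner has to work with: the per-column statistics collected by ANALYZE are exposed through the pg_stats view. A small sketch of such a lookup, assuming the join column on bo_party is named id (an inference from the plan):

```sql
-- Per-column statistics gathered by ANALYZE for the columns in the misestimated join.
-- Assumption: bo_party's key column is called "id".
SELECT tablename, attname, null_frac, n_distinct,
       most_common_vals, most_common_freqs
FROM pg_stats
WHERE (tablename = 'bo_party'  AND attname IN ('id', 'inn'))
   OR (tablename = 'bo_labels' AND attname = 'bo_party_fkey');
```

The n_distinct and most_common_freqs values for bo_labels.bo_party_fkey are the ones most likely to be relevant to the join estimate discussed above.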

I don't have good sample data to test my answer, but I think it might help.
Based on your join columns I'm assuming the following relationship cardinality:
1) bo_party (id 1:N bo_party_fkey) bo_labels
2) bo_labels (bo_doc_base_fkey N:1 id) bo_document_base
3) bo_document_base (id 1:N bo_document_fkey) bo_contract_hardwood_deal
You want to count how many rows are selected. Based on the cardinalities in 1) and 2), the table "bo_labels" acts as a many-to-many link between "bo_party" and "bo_document_base". This means that joining it with "bo_party" and "bo_document_base" will produce no more rows than already exist in "bo_labels".
But after joining "bo_document_base", another join is done to "bo_contract_hardwood_deal", whose cardinality, described in 3), is one to many, so it may generate more rows in the final result.
So, to find the right row count you can simplify the join structure to just "bo_labels" and "bo_contract_hardwood_deal" through:
4) bo_labels (bo_doc_base_fkey 1:N bo_document_fkey) bo_contract_hardwood_deal
A sample query could be one of the following:
SELECT count(*)
FROM bo_labels L
LEFT JOIN bo_contract_hardwood_deal C ON (C.bo_document_fkey = L.bo_doc_base_fkey)
WHERE 1=1
and exists (
select 1
from bo_party party
where 1=1
and party.id = L.bo_party_fkey
and party.inn = '?'
);
SELECT sum((select COUNT(*) from bo_contract_hardwood_deal C where C.bo_document_fkey = L.bo_doc_base_fkey))
FROM bo_labels L
WHERE 1=1
and exists (
select 1
from bo_party party
where 1=1
and party.id = L.bo_party_fkey
and party.inn = '?'
);
I could not test with large tables, so I don't know exactly if it will improve performance against your original query, but I think it might help.


How to optimize filter for big data volume? PostgreSQL

A few weeks ago our team faced difficulties with our SQL query because the data volume has increased a lot.
We would appreciate any advice on how we can update schema or optimize the query in order to keep status filtering logic the same.
In a nutshell:
We have two tables, a and b. b has a many-to-one foreign key to a.
Table a:
id | processed
Table b:
a_id | status | type_id | l_id
1 | '1' | 5 | 105
1 | '3' | 6 | 105
2 | '2' | 7 | 105
We can have only one status for a unique combination of (l_id, type_id, a_id).
We need to count the rows of a, filtered by the statuses from b grouped by a_id.
In table a we have 5 300 000 rows.
In table b 750 000 000 rows.
So we need to calculate the status for each row of a by the following rules:
For a given a_id there are x rows in b:
1) If at least one status of x equals '3', then the status for a_id is '3'.
2) If all statuses of x equal '1', then the status is '1'.
And so on.
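The two rules spelled out above can be expressed directly with boolean aggregates instead of building an array per group. This is only a sketch covering rules 1) and 2); the remaining rules are not given here, so the CASE falls through to NULL for them, and derived_status is a made-up column name:

```sql
-- Sketch of rules 1) and 2) only; other rules ("and so on") are not specified.
SELECT bt.a_id,
       CASE
           WHEN bool_or(bt.status = '3')  THEN '3'  -- at least one '3'
           WHEN bool_and(bt.status = '1') THEN '1'  -- every status is '1'
       END AS derived_status
FROM b AS bt
WHERE bt.l_id = 105
  AND bt.type_id IN (2, 10, 18, 1, 4, 5, 6)
GROUP BY bt.a_id;
```

This changes how the status is computed, not how many rows of b have to be read, so on its own it is unlikely to fix an I/O-bound plan.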
In our current approach we use the array_agg() function and filter on it in a subselect. Our query looks like:
SELECT COUNT(*)
FROM (
SELECT *
FROM (
SELECT at.id AS id,
BOOL_AND(bt.processed) AS not_pending,
ARRAY_AGG(DISTINCT bt.status) AS status
FROM a AS at
LEFT OUTER JOIN b AS bt
ON (at.id = bt.a_id AND bt.l_id = 105 AND
bt.type_id IN (2,10,18,1,4,5,6))
WHERE at.processed = True
GROUP BY at.id
) sub
WHERE not_pending = True
AND status <@ ARRAY ['1']::"char"[]
) counter
Our plan looks like:
Aggregate (cost=14665999.33..14665999.34 rows=1 width=8) (actual time=1875987.846..1875987.846 rows=1 loops=1)
-> GroupAggregate (cost=14166691.70..14599096.58 rows=5352220 width=37) (actual time=1875987.844..1875987.844 rows=0 loops=1)
Group Key: at.id
Filter: (bool_and(bt.processed) AND (array_agg(DISTINCT bt.status) <@ '{1}'::"char"[]))
Rows Removed by Filter: 5353930
-> Sort (cost=14166691.70..14258067.23 rows=36550213 width=6) (actual time=1860315.593..1864175.762 rows=37430745 loops=1)
Sort Key: at.id
Sort Method: external merge Disk: 586000kB
-> Hash Right Join (cost=1135654.48..8076230.39 rows=36550213 width=6) (actual time=55665.584..1846965.271 rows=37430745 loops=1)
Hash Cond: (bt.a_id = at.id)
-> Bitmap Heap Scan on b bt (cost=882095.79..7418660.65 rows=36704370 width=6) (actual time=51871.658..1826058.186 rows=37430378 loops=1)
Recheck Cond: ((l_id = 105) AND (type_id = ANY ('{2,10,18,1,4,5,6}'::integer[])))
Rows Removed by Index Recheck: 574462752
Heap Blocks: exact=28898 lossy=5726508
-> Bitmap Index Scan on db_page_index_atableobjects (cost=0.00..872919.69 rows=36704370 width=0) (actual time=51861.815..51861.815 rows=37586483 loops=1)
Index Cond: ((l_id = 105) AND (type_id = ANY ('{2,10,18,1,4,5,6}'::integer[])))
-> Hash (cost=165747.94..165747.94 rows=5352220 width=4) (actual time=3791.710..3791.710 rows=5353930 loops=1)
Buckets: 131072 Batches: 128 Memory Usage: 2507kB
-> Seq Scan on a at (cost=0.00..165747.94 rows=5352220 width=4) (actual time=0.528..2958.004 rows=5353930 loops=1)
Filter: processed
Rows Removed by Filter: 18659
Planning time: 0.328 ms
Execution time: 1876066.242 ms
As you can see, the query execution time is immense, and we would like to bring it under 30 seconds.
We have already tried some approaches, such as using bitor() instead of array_agg(), and a LATERAL JOIN. But they didn't give us the desired performance, so we decided to use materialized views for now. We are still looking for any other solution and would really appreciate any suggestions!
Plan with track_io_timing enabled:
Aggregate (cost=14665999.33..14665999.34 rows=1 width=8) (actual time=2820945.285..2820945.285 rows=1 loops=1)
Buffers: shared hit=23 read=5998844, temp read=414465 written=414880
I/O Timings: read=2655805.505
-> GroupAggregate (cost=14166691.70..14599096.58 rows=5352220 width=930) (actual time=2820945.283..2820945.283 rows=0 loops=1)
Group Key: at.id
Filter: (bool_and(bt.processed) AND (array_agg(DISTINCT bt.status) <@ '{1}'::"char"[]))
Rows Removed by Filter: 5353930
Buffers: shared hit=23 read=5998844, temp read=414465 written=414880
I/O Timings: read=2655805.505
-> Sort (cost=14166691.70..14258067.23 rows=36550213 width=6) (actual time=2804900.123..2808826.358 rows=37430745 loops=1)
Sort Key: at.id
Sort Method: external merge Disk: 586000kB
Buffers: shared hit=18 read=5998840, temp read=414465 written=414880
I/O Timings: read=2655805.491
-> Hash Right Join (cost=1135654.48..8076230.39 rows=36550213 width=6) (actual time=55370.788..2791441.542 rows=37430745 loops=1)
Hash Cond: (bt.a_id = at.id)
Buffers: shared hit=15 read=5998840, temp read=142879 written=142625
I/O Timings: read=2655805.491
-> Bitmap Heap Scan on b bt (cost=882095.79..7418660.65 rows=36704370 width=6) (actual time=51059.047..2769127.810 rows=37430378 loops=1)
Recheck Cond: ((l_id = 105) AND (type_id = ANY ('{2,10,18,1,4,5,6}'::integer[])))
Rows Removed by Index Recheck: 574462752
Heap Blocks: exact=28898 lossy=5726508
Buffers: shared hit=13 read=5886842
I/O Timings: read=2653254.939
-> Bitmap Index Scan on db_page_index_atableobjects (cost=0.00..872919.69 rows=36704370 width=0) (actual time=51049.365..51049.365 rows=37586483 loops=1)
Index Cond: ((l_id = 105) AND (type_id = ANY ('{2,10,18,1,4,5,6}'::integer[])))
Buffers: shared hit=12 read=131437
I/O Timings: read=49031.671
-> Hash (cost=165747.94..165747.94 rows=5352220 width=4) (actual time=4309.761..4309.761 rows=5353930 loops=1)
Buckets: 131072 Batches: 128 Memory Usage: 2507kB
Buffers: shared hit=2 read=111998, temp written=15500
I/O Timings: read=2550.551
-> Seq Scan on a at (cost=0.00..165747.94 rows=5352220 width=4) (actual time=0.515..3457.040 rows=5353930 loops=1)
Filter: processed
Rows Removed by Filter: 18659
Buffers: shared hit=2 read=111998
I/O Timings: read=2550.551
Planning time: 0.347 ms
Execution time: 2821022.622 ms
In the current plan, substantially all of the time goes to reading the table pages for the Bitmap Heap Scan. You must already have an index on something like (l_id, type_id). If you change it (create a new one, then optionally drop the old one) to be on (l_id, type_id, processed, a_id, status) instead, or perhaps on (l_id, type_id, a_id, status) where processed, then the plan can probably switch to an index-only scan, which avoids reading the table because all the needed data is present in the index. You will need to make sure the table is well-vacuumed for this strategy to be effective. I would just manually vacuum the table once before building the index; then, if it works, you can worry about how to keep it well-vacuumed.
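A sketch of what that could look like for the first (non-partial) variant. The index name is made up, and CONCURRENTLY is my own addition to avoid blocking writes on a table of this size; it is slower to build and cannot run inside a transaction block:

```sql
-- Vacuum first so the visibility map is current; index-only scans rely on it.
VACUUM (ANALYZE) b;

-- Hypothetical index name; column list as suggested above.
CREATE INDEX CONCURRENTLY b_lid_type_processed_aid_status_idx
    ON b (l_id, type_id, processed, a_id, status);
```

Afterwards, a plain EXPLAIN of the original query should show whether the planner now picks an Index Only Scan on b.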
Another option would be to jack up effective_io_concurrency (I'd just set it to 20; if it works, you can play with it more to find the optimal setting), so that more than one IO read request on the table can be outstanding at once. How effective this will be depends on your IO system, and I don't know the answer to that for db.r5.xlarge. The index-only scan is better, though, as it uses fewer resources, while this method just uses the same resources faster. (If you have multiple similar queries running simultaneously, that is important. Also, if you are paying per IO, you want fewer of them, not the same number faster.)
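For a quick experiment, the setting can be changed for the current session only, which leaves other connections untouched. A minimal sketch:

```sql
-- Session-level change; affects bitmap heap scans issued from this connection.
SET effective_io_concurrency = 20;
SHOW effective_io_concurrency;
```

If the value helps, it could then be made persistent through the server configuration (on a managed service, presumably via its parameter settings).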
Another option is to try to change the shape of the plan completely by having a nested loop from a into b. For this to have a hope, you will need an index on b which contains a_id and l_id as the leading columns (in either order). If you already have such an index and the planner doesn't naturally choose such a plan, you might be able to force it with set enable_hashjoin=off. My gut feeling is that a nested loop which needs to kick the other side 5,353,930 times is not going to be better than what you currently have, even if that other side has an efficient index.
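To try this out without affecting other sessions, something along these lines could be used. The index name is made up, and the query is a simplified version of the inner aggregate from the question, so treat it as a sketch:

```sql
-- Hypothetical index with a_id and l_id as leading columns.
CREATE INDEX CONCURRENTLY b_aid_lid_idx ON b (a_id, l_id);

BEGIN;
SET LOCAL enable_hashjoin = off;  -- only for this transaction
-- Plain EXPLAIN shows the chosen plan without executing the query.
EXPLAIN
SELECT at.id,
       BOOL_AND(bt.processed) AS not_pending,
       ARRAY_AGG(DISTINCT bt.status) AS status
FROM a AS at
LEFT JOIN b AS bt
       ON (at.id = bt.a_id AND bt.l_id = 105
           AND bt.type_id IN (2, 10, 18, 1, 4, 5, 6))
WHERE at.processed
GROUP BY at.id;
ROLLBACK;
```

Note that with hash joins disabled the planner may still prefer a merge join over a nested loop, so the plan needs to be checked rather than assumed.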
You can filter and group table b before joining it with a, and order both tables by ID, since that can speed up the scans when the join is processed. Please check this code:
with at as (
select distinct at.id, at.processed
from a AS at
WHERE at.processed = True
order by at.id
),
bt as (
select bt.a_id, bt.l_id, bt.type_id, --BOOL_AND(bt.processed) AS not_pending,
ARRAY_AGG(DISTINCT bt.status) as status
from b AS bt
group by bt.a_id, bt.l_id, bt.type_id
having bt.l_id = 105 AND bt.type_id IN (2,10,18,1,4,5,6)
order by bt.a_id
),
counter as (
select at.id,
case
when '1' = all(status) then '1'
when '3' = any(status) then '3'
else null end as status -- the remaining status rules would go here
from at inner join bt on at.id = bt.a_id
)
select count (*) from counter where status='1'

Optimizing a recursive postgres query that uses where clause and a lot of ANDs / ORs

I want to create a graph whose nodes are chosen from a dataset (bidirectional_edges) using a recursive query. Basically, the resulting table starts with a source node (chosen by the user) and joins the target nodes related to it. Next, the recursive query selects the targets of the nodes already selected. When a node has more than 20 relations, I don’t want to show the nodes related to it, so I call nodes with more than 20 relations “censored”. There are two types of relations, one and two, and consequently two ways a node can be censored.
The problem is that, the way I built the code, it takes a very long time to run when the number of nodes and censored nodes is big. I've already tried storing the censored-node information as flags in the bidirectional_edges table to eliminate the left joins with the censored tables, but the running time didn't drop considerably. The bidirectional_edges table is indexed on SOURCE_ID and TARGET_ID.
Is there a way to optimize the query? I think that the problem is in the where clause with several ANDs and ORs.
Here is an example of the dataset bidirectional_edges, censored_nodes_one and censored_nodes_two:
CREATE TABLE bidirectional_edges (
CREATE TABLE censored_nodes_one (
node integer NOT NULL,
relations integer NOT NULL
INSERT INTO censored_nodes_one(node, relations)
CREATE TABLE censored_nodes_two (
node integer NOT NULL,
relations integer NOT NULL
INSERT INTO censored_nodes_two(node, relations)
In the code below, I took the node 1 as the first and want to bring only the nodes 1, 2, 3, 4, 5 and 6.
with recursive search_graph("NODE", "DEPTH", "PATH") as (
select 1 as NODE, 0 as DEPTH, ARRAY[1] as PATH
union all
select
be.TARGET_ID as NODE,
sg."DEPTH" + 1 as DEPTH,
sg."PATH" || be.TARGET_ID as PATH
from
bidirectional_edges as be
inner join
search_graph as sg on
sg."NODE" = be.SOURCE_ID
left join
censored_nodes_one as cno on
sg."NODE" = cno.node
left join
censored_nodes_two as cnt on
sg."NODE" = cnt.node
where
sg."DEPTH" < 2 and
not (be.TARGET_ID) = any(sg."PATH") and
((be.FLAG_ONE = 1 and cno.node is null) OR
(be.FLAG_TWO = 1 and cnt.node is null))
)
select *
from search_graph
Below, the query plan for this query:
CTE Scan on search_graph (cost=1705.74..1707.36 rows=81 width=40) (actual time=0.066..1.231 rows=6 loops=1)
CTE search_graph
-> Recursive Union (cost=0.00..1705.74 rows=81 width=40) (actual time=0.055..1.206 rows=6 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.030..0.031 rows=1 loops=1)
-> Hash Left Join (cost=102.95..170.41 rows=8 width=40) (actual time=0.226..0.370 rows=2 loops=3)
Hash Cond: (sg."NODE" = cnt.node)
Filter: (((be.flag_one = 1) AND (cno.node IS NULL)) OR ((be.flag_two = 1) AND (cnt.node IS NULL)))
Rows Removed by Filter: 16
-> Hash Join (cost=42.10..47.71 rows=136 width=56) (actual time=0.161..0.279 rows=18 loops=3)
Hash Cond: (be.source_id = sg."NODE")
Join Filter: (be.target_id <> ALL (sg."PATH"))
-> Seq Scan on teste_bidirectional_edges be (cost=0.00..1.81 rows=41 width=16) (actual time=0.084..0.197 rows=54 loops=2)
Filter: ((flag_one = 1) OR (flag_two = 1))
-> Hash (cost=41.68..41.68 rows=34 width=44) (actual time=0.084..0.084 rows=1 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Hash Right Join (cost=0.26..41.68 rows=34 width=44) (actual time=0.062..0.065 rows=1 loops=3)
Hash Cond: (cno.node = sg."NODE")
-> Seq Scan on teste_censored_nodes_one cno (cost=0.00..32.60 rows=2260 width=4) (actual time=0.029..0.034 rows=1 loops=2)
-> Hash (cost=0.22..0.22 rows=3 width=40) (actual time=0.021..0.021 rows=1 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> WorkTable Scan on search_graph sg (cost=0.00..0.22 rows=3 width=40) (actual time=0.009..0.010 rows=1 loops=3)
Rows Removed by Filter: 1
-> Hash (cost=32.60..32.60 rows=2260 width=4) (actual time=0.074..0.074 rows=1 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 33kB
-> Seq Scan on teste_censored_nodes_two cnt (cost=0.00..32.60 rows=2260 width=4) (actual time=0.057..0.060 rows=1 loops=1)
Planning time: 1.924 ms
Execution time: 1.707 ms
The table I need to use has about 740.000.000 rows, so when I try to run this recursive query, it takes a long time.
I cannot really provide a solution. I also tried something similar to what @wildplasser suggests, but couldn't get it to run.
However, I have a few ideas that I want to share; maybe they can help you to solve this problem:
It seems cumbersome to have to join (or refer to in a NOT EXISTS clause) two other tables in each iteration. If you can get rid of that, your query will be so much faster.
Since everything is joined on sg."NODE" (do yourself a favor and use lower case identifiers), you could just add another boolean field to bidirectional_edges and set it to the result of the OR condition ahead of time.
So first you run a query that joins with censored_nodes_one and censored_nodes_two and sets the boolean field accordingly, then you run a recursive query using only bidirectional_edges.
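A rough sketch of that precomputation step. The column name traversable and the index name are made up for illustration; the column names source_id, flag_one and flag_two are taken from the query in the question:

```sql
-- Hypothetical column holding the precomputed result of the OR condition.
ALTER TABLE bidirectional_edges ADD COLUMN traversable boolean;

-- The recursive query joins the censored tables on sg."NODE" = be.SOURCE_ID,
-- so the lookup can be keyed on source_id directly.
UPDATE bidirectional_edges AS be
SET traversable =
       (be.flag_one = 1 AND NOT EXISTS (
            SELECT 1 FROM censored_nodes_one AS cno WHERE cno.node = be.source_id))
    OR (be.flag_two = 1 AND NOT EXISTS (
            SELECT 1 FROM censored_nodes_two AS cnt WHERE cnt.node = be.source_id));

-- Optional: index only the edges the recursion is allowed to follow.
CREATE INDEX bidirectional_edges_traversable_idx
    ON bidirectional_edges (source_id) WHERE traversable;
```

The recursive part then only needs a condition such as be.traversable in place of the two left joins. On a table with hundreds of millions of rows the UPDATE itself will take a while and rewrite every row, so it is something to run once, ahead of time.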
If that does not work for some reason, you could do something similar with a CTE:
WITH RECURSIVE dbe_flagged AS (
-- perform the join and provide an "artificial" boolean flag
), search_graph AS (
-- now run your recursive query on "dbe_flagged"
)
SELECT * FROM search_graph;
The large size of the CTE result may be a problem here.
Don't forget to use indexes – for example, your query looks like it might profit from an index on bidirectional_edges(source_id).

Why does Postgres do a sequential scan where the index would return < 1% of the data?

I have 19 years of Oracle and MySQL experience (DBA and dev) and I am new to Postgres, so I may be missing something obvious. But I can not get this query to do what I want.
NOTE: This query is running on an EngineYard Postgres instance. I am not immediately aware of the parameters it has set up. Also, columns applicable_type and status in the items table are of extension type citext.
The following query can take in excess of 60 seconds to return rows:
SELECT items.item_id,
CASE when items.sku is null then items.title else concat(items.title, ' (SKU: ', items.sku, ')') END title,
items.listing_status, items.updated_at,,
items.sku, count(details.id) detail_count
FROM "items" LEFT OUTER JOIN details ON details.applicable_id = items.id
and details.applicable_type = 'Item'
and details.status = 'Valid'
LEFT OUTER JOIN products ON products.id = items.product_id
WHERE "items"."user_id" = 3
ORDER BY title asc
The details table contains 6.5M rows. The LEFT OUTER JOIN to it does a sequential scan on applicable_id. Cardinality-wise, that column has 120K distinct possibilities across 6.5M rows.
I have a btree index on details with the following columns:
but really, applicable_id and applicable_type have low cardinality.
My explain analyze looks like this:
Limit (cost=247701.59..247701.65 rows=25 width=118) (actual time=28781.090..28781.098 rows=25 loops=1)
-> Sort (cost=247701.59..247703.05 rows=585 width=118) (actual time=28781.087..28781.090 rows=25 loops=1)
Sort Key: (CASE WHEN (items.sku IS NULL) THEN (items.title)::text ELSE pg_catalog.concat(items.title, ' (SKU: ', items.sku, ')') END)
Sort Method: top-N heapsort Memory: 30kB
-> HashAggregate (cost=247677.77..247685.08 rows=585 width=118) (actual time=28779.658..28779.974 rows=664 loops=1)
-> Hash Right Join (cost=2069.47..247645.64 rows=6425 width=118) (actual time=17798.898..28742.395 rows=60047 loops=1)
Hash Cond: (details.applicable_id = items.id)
-> Seq Scan on details (cost=0.00..220591.65 rows=6645404 width=8) (actual time=6.272..27702.717 rows=6646205 loops=1)
Filter: ((applicable_type = 'Listing'::citext) AND (status = 'Valid'::citext))
Rows Removed by Filter: 942
-> Hash (cost=2062.16..2062.16 rows=585 width=118) (actual time=1.286..1.286 rows=664 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 90kB
-> Bitmap Heap Scan on items (cost=16.87..2062.16 rows=585 width=118) (actual time=0.157..0.748 rows=664 loops=1)
Recheck Cond: (user_id = 3)
-> Bitmap Index Scan on index_items_on_user_id (cost=0.00..16.73 rows=585 width=0) (actual time=0.141..0.141 rows=664 loops=1)
Index Cond: (user_id = 3)
Total runtime: 28781.238 ms
Do you have an index on the expression that yields the title? Better yet, one on (user_id, title_expression).
If not, that might be an excellent thing to add, so that Postgres can nestloop through the first 25 rows of an index scan. As it stands, it can't reasonably guess which 25 rows will be needed (hence the seq scan you're currently getting on the joined table).
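One caveat when building such an index: concat() is not marked IMMUTABLE, so it cannot appear in an index expression. A sketch using the || operator instead, assuming title and sku are text-like columns (their types are not shown in the question) and with a made-up index name:

```sql
-- Assumption: items.title and items.sku are text-like (varchar/text/citext).
CREATE INDEX items_user_id_title_expr_idx ON items (
    user_id,
    (CASE WHEN sku IS NULL THEN title
          ELSE title || ' (SKU: ' || sku || ')' END)
);
```

For the planner to consider this index, the query's title expression and ORDER BY would have to be written with exactly the same CASE expression rather than with concat().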
I think you need an index on the applicable_id column only (without the applicable_type and status columns).
You may also need to increase the default_statistics_target parameter (system-wide or, better, for the applicable_id column only) so that PostgreSQL can make a better estimate of the number of rows in the join.
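Both suggestions could be tried roughly as follows. The index name and the statistics target of 1000 are illustrative choices, not recommendations:

```sql
-- Single-column index on the join key.
CREATE INDEX details_applicable_id_idx ON details (applicable_id);

-- Raise the statistics target for this one column only, then re-gather statistics.
ALTER TABLE details ALTER COLUMN applicable_id SET STATISTICS 1000;
ANALYZE details;
```

The new per-column target only takes effect once ANALYZE has run again on the table.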

Slow PostgreSQL query in production - help me understand this explain analyze output

I have a query that is taking 9 minutes to run on PostgreSQL 9.0.0 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
This query is automatically generated by hibernate for my application. It's trying to find all of the "teacher members" in a school. A membership is a user with a role in a group. There are several types of groups, but here what matters are schools and services. If someone is a teacher member in a service and a member in this school (15499) then they are what we are looking for.
This query used to run fine in production and still runs fine in development, but in production it is now taking several minutes to run. Can you help me understand why?
Here's the query:
select distinct user1_.ID as ID14_, user1_.FIRST_NAME as FIRST2_14_, user1_.LAST_NAME as LAST3_14_, user1_.STREET_1 as STREET4_14_, user1_.STREET_2 as STREET5_14_, user1_.CITY as CITY14_, user1_.us_state_id as us7_14_, user1_.REGION as REGION14_, user1_.country_id as country9_14_, user1_.postal_code as postal10_14_, user1_.USER_NAME as USER11_14_, user1_.PASSWORD as PASSWORD14_, user1_.PROFESSION as PROFESSION14_, user1_.PHONE as PHONE14_, user1_.URL as URL14_, user1_.bio as bio14_, user1_.LAST_LOGIN as LAST17_14_, user1_.STATUS as STATUS14_, user1_.birthdate as birthdate14_, user1_.ageInYears as ageInYears14_, user1_.deleted as deleted14_, user1_.CREATEDATE as CREATEDATE14_, user1_.audit as audit14_, user1_.migrated2008 as migrated24_14_, user1_.creator as creator14_
from DIR_MEMBERSHIPS membership0_
inner join DIR_USERS user1_ on membership0_.USER_ID=user1_.ID, DIR_ROLES role2_, DIR_GROUPS group4_
where membership0_.role=role2_.ID
and membership0_.GROUP_ID=15499
and case when membership0_.expires is null
then 1
else case when (membership0_.expires > CURRENT_TIMESTAMP and (membership0_.startDate is null or membership0_.startDate < CURRENT_TIMESTAMP))
then 1
else 0 end
end =1
and membership0_.deleted=false
and role2_.deleted=false
and role2_.NAME='ROLE_MEMBER'
and group4_.deleted=false
and user1_.STATUS='active'
and user1_.deleted=false
and (membership0_.USER_ID in (
select membership7_.USER_ID
from DIR_MEMBERSHIPS membership7_, DIR_USERS user8_, DIR_ROLES role9_
where membership7_.USER_ID=user8_.ID
and membership7_.role=role9_.ID
and case when membership7_.expires is null
then 1
else case when (membership7_.expires > CURRENT_TIMESTAMP
and (membership7_.startDate is null or membership7_.startDate < CURRENT_TIMESTAMP))
then 1
else 0 end
end =1
and membership7_.deleted=false
Explain analyze output:
HashAggregate (cost=61755.63..61755.64 rows=1 width=3334) (actual time=652504.302..652504.307 rows=4 loops=1)
-> Nested Loop (cost=4355.35..61755.56 rows=1 width=3334) (actual time=304.450..652504.217 rows=6 loops=1)
-> Nested Loop (cost=4355.35..61747.28 rows=1 width=3342) (actual time=304.419..652504.060 rows=6 loops=1)
-> Nested Loop Semi Join (cost=4355.35..61738.97 rows=1 width=32) (actual time=304.385..652503.961 rows=6 loops=1)
Join Filter: (user_id = user_id)
-> Nested Loop (cost=0.00..32.75 rows=1 width=16) (actual time=0.190..26.703 rows=758 loops=1)
-> Seq Scan on dir_roles role2_ (cost=0.00..1.25 rows=1 width=8) (actual time=0.032..0.038 rows=1 loops=1)
Filter: ((NOT deleted) AND ((name)::text = 'ROLE_MEMBER'::text))
-> Index Scan using dir_memberships_role_group_id_index on dir_memberships membership0_ (cost=0.00..31.49 rows=1 width=24) (actual time=0.151..25.626 rows=758 loops=1)
Index Cond: ((role = AND (group_id = 15499))
Filter: ((NOT deleted) AND (CASE WHEN (expires IS NULL) THEN 1 ELSE CASE WHEN ((expires > now()) AND ((startdate IS NULL) OR (startdate < now()))) THEN 1 ELSE 0 END END = 1))
-> Nested Loop (cost=4355.35..61692.86 rows=1069 width=16) (actual time=91.088..843.967 rows=79986 loops=758)
-> Nested Loop (cost=4355.35..54185.33 rows=1069 width=8) (actual time=91.065..555.830 rows=79986 loops=758)
-> Seq Scan on dir_roles role9_ (cost=0.00..1.25 rows=1 width=8) (actual time=0.006..0.013 rows=1 loops=758)
Filter: ((name)::text = 'ROLE_TEACHER_MEMBER'::text)
-> Bitmap Heap Scan on dir_memberships membership7_ (cost=4355.35..53983.63 rows=16036 width=16) (actual time=91.047..534.236 rows=79986 loops=758)
Recheck Cond: (role =
Filter: ((NOT deleted) AND (CASE WHEN (expires IS NULL) THEN 1 ELSE CASE WHEN ((expires > now()) AND ((startdate IS NULL) OR (startdate < now()))) THEN 1 ELSE 0 END END = 1))
-> Bitmap Index Scan on dir_memberships_role_index (cost=0.00..4355.09 rows=214190 width=0) (actual time=87.050..87.050 rows=375858 loops=758)
Index Cond: (role =
-> Index Scan using dir_users_pkey on dir_users user8_ (cost=0.00..7.01 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=60629638)
Index Cond: (id = user_id)
-> Index Scan using dir_users_pkey on dir_users user1_ (cost=0.00..8.29 rows=1 width=3334) (actual time=0.011..0.011 rows=1 loops=6)
Index Cond: (id = user_id)
Filter: ((NOT deleted) AND ((status)::text = 'active'::text))
-> Index Scan using dir_groups_pkey on dir_groups group4_ (cost=0.00..8.28 rows=1 width=8) (actual time=0.023..0.023 rows=1 loops=6)
Index Cond: ( = 15499)
Filter: (NOT group4_.deleted)
Total runtime: 652504.827 ms
(29 rows)
I am reading and reading forum posts and the user manual, but I can't figure out what would make this run faster, except maybe if it were possible to make indexes for the select that uses the now() function.
I rewrote your query and assume this will be faster:
SELECT u.id AS id14_, u.first_name AS first2_14_, u.last_name AS last3_14_, u.street_1 AS street4_14_, u.street_2 AS street5_14_, u.city AS city14_, u.us_state_id AS us7_14_, u.region AS region14_, u.country_id AS country9_14_, u.postal_code AS postal10_14_, u.user_name AS user11_14_, u.password AS password14_, u.profession AS profession14_, u.phone AS phone14_, u.url AS url14_, u.bio AS bio14_, u.last_login AS last17_14_, u.status AS status14_, u.birthdate AS birthdate14_, u.ageinyears AS ageinyears14_, u.deleted AS deleted14_, u.createdate AS createdate14_, u.audit AS audit14_, u.migrated2008 AS migrated24_14_, u.creator AS creator14_
FROM dir_users u
WHERE u.status = 'active'
AND u.deleted = FALSE
AND EXISTS (
SELECT 1
FROM dir_memberships m
JOIN dir_roles r ON r.id = m.role
JOIN dir_groups g ON g.id = m.group_id
WHERE m.group_id = 15499
AND m.user_id = u.id
AND (m.expires IS NULL
OR m.expires > now() AND (m.startdate IS NULL OR m.startdate < now()))
AND m.deleted = FALSE
AND r.deleted = FALSE
AND r.name = 'ROLE_MEMBER'
AND g.deleted = FALSE
)
AND EXISTS (
SELECT 1
FROM dir_memberships m
JOIN dir_roles r ON r.id = m.role
WHERE (m.expires IS NULL
OR m.expires > now() AND (m.startDate IS NULL OR m.startDate < now()))
AND m.deleted = FALSE
AND r.name = 'ROLE_TEACHER_MEMBER'
AND m.user_id = u.id
);
Rewrite with EXISTS
Replaced the weird case ... end = 1 expressions with simple expressions.
Rewrote all JOINs with explicit join syntax to make it easier to read.
Transformed the big JOIN construct and the IN expression into two EXISTS semi-joins, which obviates the need for DISTINCT. This should be quite a bit faster.
Lots of minor edits to make the query simpler, but they don't change the substance.
In particular, use simpler aliases; what you had was noisy and confusing.
If this isn't fast enough yet, and your write performance can deal with more indexes, add this partial multi-column index:
CREATE INDEX dir_memberships_g_id_u_id_idx ON dir_memberships (group_id, user_id)
WHERE deleted = FALSE;
The WHERE conditions have to match your query for the index to be useful!
I assume that you already have primary keys and indexes on relevant foreign keys.
CREATE INDEX dir_memberships_u_id_role_idx ON dir_memberships (user_id, role)
WHERE deleted = FALSE;
Why user_id a second time? See:
Working of indexes in PostgreSQL
Is a composite index also good for queries on the first field?
Also, since user_id is already used in another index, you are not blocking HOT updates (which can only be used with columns not involved in any index).
Why role?
I assume both columns are of type integer (4 bytes). I have seen in your detailed question that you run a 64-bit OS where MAXALIGN is 8 bytes, so another integer will not make the index grow at all. I threw in role, which might be useful for the second EXISTS semi-join.
If you have many "dead" users, this might also help:
CREATE INDEX dir_users_id_idx ON dir_users (id)
WHERE status = 'active' AND deleted = FALSE;
As always, check with EXPLAIN to see whether the indexes actually get used. You wouldn't want useless indexes consuming resources.
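Besides reading EXPLAIN output, the cumulative statistics views give a longer-term picture of whether an index is ever chosen. A small sketch against the tables from the question:

```sql
-- idx_scan counts how often each index has been used since statistics were last reset.
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname IN ('dir_memberships', 'dir_users')
ORDER BY idx_scan;
```

An index whose idx_scan stays at zero under normal load is a candidate for dropping.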
Are we fast yet?
Of course, all the usual advice for performance optimization applies, too.
The query, minus the last 4 conditions, i.e.
and group4_.deleted=false
and user1_.STATUS='active'
and user1_.deleted=false
and (membership0_.USER_ID in (...))
returns 758 rows. Each of these 758 rows then goes through the select membership7_.USER_ID ... subquery, which takes 843.967 milliseconds to run.
843.967 * 758 = 639726.986 ms, and there go the 10 minutes.
As for tuning the query, I don't think you need DIR_USERS user8_ in the subquery. You can start by removing it, and also changing the subquery to use EXISTS instead of IN.
By the way, is the database being vacuumed? Even without any tuning, this doesn't look like a complex enough query, or enough data, to require 10 minutes.
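One way to check the vacuum and analyze history for the tables involved is the pg_stat_user_tables view. A minimal sketch:

```sql
-- When each table was last vacuumed/analyzed, and how many dead rows it carries.
SELECT relname, n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('dir_memberships', 'dir_users', 'dir_roles', 'dir_groups');
```

Empty timestamps or a large n_dead_tup relative to n_live_tup would suggest that stale statistics or bloat are contributing to the bad row estimates in the plan.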

Need help understanding the SQL explanation of a JOIN query versus a query with subselects

I posted a previous question here asking about what was better, JOIN queries or queries using subselects. Link: Queries within queries: Is there a better way?
This is an extension to that question. Can somebody explain to me why I'm seeing what I'm seeing here?
Query (Subselects):
SELECT article_seq, title, synopsis, body, lastmodified_date, (SELECT type_id FROM types WHERE kbarticles.type = type_seq), status, scope, images, archived, author, owner, (SELECT owner_description FROM owners WHERE kbarticles.owner = owner_seq), (SELECT review_date FROM kbreview WHERE kbarticles.article_seq = article_seq) FROM kbarticles WHERE article_seq = $1
Explain Analyze (Subselects)
Index Scan using article_seq_pkey on kbarticles (cost=0.00..32.24 rows=1 width=1241) (actual time=1.421..1.426 rows=1 loops=1)
Index Cond: (article_seq = 1511)
-> Seq Scan on kbreview (cost=0.00..14.54 rows=1 width=8) (actual time=0.243..1.158 rows=1 loops=1)
Filter: ($2 = article_seq)
-> Seq Scan on owners (cost=0.00..1.16 rows=1 width=24) (actual time=0.073..0.078 rows=1 loops=1)
Filter: ($1 = owner_seq)
-> Index Scan using types_type_seq_key on types (cost=0.00..8.27 rows=1 width=24) (actual time=0.044..0.050 rows=1 loops=1)
Index Cond: ($0 = type_seq)
Total runtime: 2.051 ms
Query (JOINs)
SELECT k.article_seq, k.title, k.synopsis, k.body, k.lastmodified_date, t.type_id, k.status, k.scope, k.images, k.archived, k.author, k.owner, o.owner_description, r.review_date FROM kbarticles k JOIN types t ON k.type = t.type_seq JOIN owners o ON k.owner = o.owner_seq JOIN kbreview r ON k.article_seq = r.article_seq WHERE k.article_seq = $1
Explain Analyze (JOINs)
Nested Loop (cost=0.00..32.39 rows=1 width=1293) (actual time=0.532..1.467 rows=1 loops=1)
Join Filter: (k.owner = o.owner_seq)
-> Nested Loop (cost=0.00..31.10 rows=1 width=1269) (actual time=0.419..1.345 rows=1 loops=1)
-> Nested Loop (cost=0.00..22.82 rows=1 width=1249) (actual time=0.361..1.277 rows=1 loops=1)
-> Index Scan using article_seq_pkey on kbarticles k (cost=0.00..8.27 rows=1 width=1241) (actual time=0.065..0.071 rows=1 loops=1)
Index Cond: (article_seq = 1511)
-> Seq Scan on kbreview r (cost=0.00..14.54 rows=1 width=12) (actual time=0.267..1.175 rows=1 loops=1)
Filter: (r.article_seq = 1511)
-> Index Scan using types_type_seq_key on types t (cost=0.00..8.27 rows=1 width=28) (actual time=0.048..0.055 rows=1 loops=1)
Index Cond: (t.type_seq = k.type)
-> Seq Scan on owners o (cost=0.00..1.13 rows=13 width=28) (actual time=0.022..0.038 rows=13 loops=1)
Total runtime: 2.256 ms
Based on the answers given (and accepted) in my previous question, JOINs should perform better. However, in all my tests the JOINs come out a few milliseconds slower. The JOIN plan also seems to be riddled with nested loops. All the tables I'm JOINing are indexed.
Am I doing something that I should be doing differently? Is there something I'm missing?
These queries are logically different.
The first one:
SELECT article_seq, title, synopsis, body, lastmodified_date,
       (
       SELECT type_id
       FROM types
       WHERE kbarticles.type = type_seq
       ),
       status, scope, images, archived, author, owner,
       (
       SELECT owner_description
       FROM owners
       WHERE kbarticles.owner = owner_seq
       ),
       (
       SELECT review_date
       FROM kbreview
       WHERE kbarticles.article_seq = article_seq
       )
FROM kbarticles
WHERE article_seq = $1
The second one:
SELECT k.article_seq, k.title, k.synopsis, k.body, k.lastmodified_date, t.type_id, k.status,
k.scope, k.images, k.archived, k.author, k.owner, o.owner_description, r.review_date
FROM kbarticles k
JOIN types t
ON k.type = t.type_seq
JOIN owners o
ON k.owner = o.owner_seq
JOIN kbreview r
ON k.article_seq = r.article_seq
WHERE k.article_seq = $1
If a kbarticle has more than one matching record in types, owners or kbreview, the first query will fail (a scalar subquery may return at most one row), while the second one will return duplicates from kbarticles.
If a kbarticle has no matching record in types, owners or kbreview, the first query will return NULL in the corresponding field, while the second one will omit that row altogether.
The *_seq fields seem to be PRIMARY KEY fields; if they are, there will never be duplicates and the query will never fail. In the same way, if kbarticles is constrained with FOREIGN KEY references to types, owners or kbreview, there can be no missing rows.
However, JOIN operators give the optimizer more room: it can make any table the leading one and use more advanced JOIN techniques like HASH JOIN or MERGE JOIN, which are not available if you are using subqueries.
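If the goal is a JOIN form that keeps the NULL-on-missing behaviour of the subselect version, a LEFT JOIN is the closer match. A sketch using the tables and columns from the question (k.author is taken from the subselect query's column list):

```sql
-- LEFT JOIN keeps the kbarticles row and yields NULLs when a lookup row is missing,
-- like the scalar subselects do.
SELECT k.article_seq, k.title, k.synopsis, k.body, k.lastmodified_date,
       t.type_id, k.status, k.scope, k.images, k.archived, k.author, k.owner,
       o.owner_description, r.review_date
FROM   kbarticles k
LEFT   JOIN types    t ON t.type_seq    = k.type
LEFT   JOIN owners   o ON o.owner_seq   = k.owner
LEFT   JOIN kbreview r ON r.article_seq = k.article_seq
WHERE  k.article_seq = $1;
```

One difference remains: if a lookup table has several matching rows, this version returns duplicates where the subselect version raises an error.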
Is the column r.article_seq indexed?
-> Seq Scan on kbreview r (cost=0.00..14.54 rows=1 width=12) (actual time=0.267..1.175 rows=1 loops=1)
This is where most of the time is spent.
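If it is not indexed, something along these lines would add an index and re-check the plan. The index name is made up, and 1511 is the article id from the plans above.

```sql
-- Hypothetical index name; the column comes from the question.
CREATE INDEX kbreview_article_seq_idx ON kbreview (article_seq);

-- Refresh planner statistics, then look at the plan for the lookup again.
ANALYZE kbreview;

EXPLAIN ANALYZE
SELECT review_date
FROM   kbreview
WHERE  article_seq = 1511;
```

On a table this small (estimated cost 14.54) the planner may still prefer the sequential scan, which would not be a sign that anything is wrong.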
Given that both plans are doing the same table scans, just arranged in a different way, I'd say there's no significant difference between the two. A "nested loop" where the lower arm produces a single row is pretty much the same as a single-row subselect.
Joins are more general, though: scalar subselects don't extend to fetching two columns from one of those auxiliary tables, for example.
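To illustrate that last point, suppose kbreview also had a reviewer column. That column is invented here purely for the example and does not appear in the question.

```sql
-- One join reaches any number of columns from kbreview.
SELECT k.article_seq, r.review_date, r.reviewer   -- "reviewer" is a made-up column
FROM   kbarticles k
LEFT   JOIN kbreview r ON r.article_seq = k.article_seq
WHERE  k.article_seq = $1;

-- Scalar subselects return one column each, so the lookup has to be written twice.
SELECT k.article_seq,
       (SELECT r.review_date FROM kbreview r WHERE r.article_seq = k.article_seq),
       (SELECT r.reviewer    FROM kbreview r WHERE r.article_seq = k.article_seq)
FROM   kbarticles k
WHERE  k.article_seq = $1;
```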