GroupAggregate for Subquery in Redshift/PostgreSQL

I've noticed some strange behavior in the query optimizer for Redshift, and I'm wondering if anyone can explain it or point out a workaround.
For large group by queries, it's pretty essential to get the optimizer to plan a GroupAggregate rather than a HashAggregate, so it doesn't try to fit the temporary results in memory. This works fine for me in general. But when I try to use that group by as a subquery, it switches to HashAggregate.
For example, consider the following query.
select install_app_version, user_id, max(platform) as plat
from daily_players
group by install_app_version, user_id;
The table daily_players has sortkeys (install_app_version, user_id) and distkey (user_id). Hence a GroupAggregate is possible, and the query plan looks like this, as it should.
XN GroupAggregate (cost=0.00..184375.32 rows=1038735 width=51)
-> XN Seq Scan on daily_players (cost=0.00..103873.42 rows=10387342 width=51)
In contrast, if I use the above in a subquery of any other query, I get a HashAggregate. For example, even something as simple as
select count(1) from
( select install_app_version, user_id, max(platform) as plat
from daily_players
group by install_app_version, user_id
);
has the query plan
XN Aggregate (cost=168794.32..168794.32 rows=1 width=0)
-> XN Subquery Scan derived_table1 (cost=155810.13..166197.48 rows=1038735 width=0)
-> XN HashAggregate (cost=155810.13..155810.13 rows=1038735 width=39)
-> XN Seq Scan on daily_players (cost=0.00..103873.42 rows=10387342 width=39)
The same pattern persists no matter what I do in the outer query. I can group by install_app_version and user_id, I can take aggregates, I can do no grouping at all externally. Even sorting the inner query does nothing.
In the cases I've shown it's not such a big deal, but I'm joining several subqueries with their own group by, doing aggregates over that - it quickly gets out of hand and very slow without GroupAggregate.
If anyone has wisdom about the query optimizer and can answer this, it'd be much appreciated! Thanks!

I don't know if your question is still open, but I'm putting this here because I think others could be interested.
Redshift seems to perform GROUP BY aggregation with HashAggregate by default (even when the conditions for GroupAggregate are right), and switches to GroupAggregate only when at least one aggregate computation actually needs to be resolved for the query's result. What I mean is that, in your example, the "max(platform) as plat" is of no use for the final "COUNT(1)" result of the query. I believe that, in such a case, the MAX() aggregate is not computed at all.
The workaround I use is to add a useless HAVING clause that does nothing but still needs to be computed (for example "HAVING COUNT(1)"). It always returns true (each group has COUNT(1) of at least 1, which is true as a boolean), but it makes the planner use GroupAggregate.
Example:
EXPLAIN SELECT COUNT(*) FROM (SELECT mycol FROM mytable GROUP BY 1);
XN Aggregate (cost=143754365.00..143754365.00 rows=1 width=0)
-> XN Subquery Scan derived_table1 (cost=141398732.80..143283238.56 rows=188450576 width=0)
-> XN HashAggregate (cost=141398732.80..141398732.80 rows=188450576 width=40)
-> XN Seq Scan on mytable (cost=0.00..113118986.24 rows=11311898624 width=40)
EXPLAIN SELECT COUNT(*) FROM (SELECT mycol FROM mytable GROUP BY 1 HAVING COUNT(1));
XN Aggregate (cost=171091871.18..171091871.18 rows=1 width=0)
-> XN Subquery Scan derived_table1 (cost=0.00..171091868.68 rows=1000 width=0)
-> XN GroupAggregate (cost=0.00..171091858.68 rows=1000 width=40)
Filter: ((count(1))::boolean = true)
-> XN Seq Scan on mytable (cost=0.00..113118986.24 rows=11311898624 width=40)
This works because 'mycol' is both the distkey and the sortkey of 'mytable'.
As you can see, the planner estimates the GroupAggregate query as more costly than the HashAggregate one (which is presumably why it picks HashAggregate in the first place). Don't trust that: in my example the second query runs up to 7 times faster than the first one! The nice thing is that the GroupAggregate needs very little memory, and so will almost never spill to a 'Disk Based Aggregate'.
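Applied to the query from the original question, the same workaround would look like this (untested sketch, assuming the daily_players table and sort/dist keys described above):
SELECT COUNT(1) FROM
( SELECT install_app_version, user_id, MAX(platform) AS plat
  FROM daily_players
  GROUP BY install_app_version, user_id
  HAVING COUNT(1) );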
In fact, I realised it's an even better option to compute COUNT(DISTINCT x) with a GroupAggregate subquery than with the standard COUNT(DISTINCT x) (in my example, 'mycol' is a NOT NULL column):
EXPLAIN SELECT COUNT(DISTINCT mycol) FROM mytable ;
XN Aggregate (cost=143754365.00..143754365.00 rows=1 width=72)
-> XN Subquery Scan volt_dt_0 (cost=141398732.80..143283238.56 rows=188450576 width=72)
-> XN HashAggregate (cost=141398732.80..141398732.80 rows=188450576 width=40)
-> XN Seq Scan on mytable (cost=0.00..113118986.24 rows=11311898624 width=40)
3 minutes 46 s
EXPLAIN SELECT COUNT(*) FROM (SELECT mycol FROM mytable GROUP BY 1 HAVING COUNT(1));
XN Aggregate (cost=171091871.18..171091871.18 rows=1 width=0)
-> XN Subquery Scan derived_table1 (cost=0.00..171091868.68 rows=1000 width=0)
-> XN GroupAggregate (cost=0.00..171091858.68 rows=1000 width=40)
Filter: ((count(1))::boolean = true)
-> XN Seq Scan on mytable (cost=0.00..113118986.24 rows=11311898624 width=40)
40 seconds
Hope that helps!

Related

Why does PostgreSQL sort on a boolean WHERE condition?

I am testing some queries over a bunch of materialized views. All of them have the same structure, like this one:
EXPLAIN ANALYZE SELECT mr.foo, ..., CAST(SUM(mr.bar) AS INTEGER) AS stuff
FROM foo.bar mr
WHERE
mr.a = 'TRUE' AND
mr.b = 'something' AND
mr.c = '12'
GROUP BY
mr.a,
mr.b,
mr.c;
Obviously the system is giving me a different query plan for each one of them, but if (and only if) a WHERE clause involves a boolean column (as in the example above), the planner always sorts the rows before aggregating. Example:
Finalize GroupAggregate (cost=16305.92..16317.98 rows=85 width=21) (actual time=108.301..108.301 rows=1 loops=1)
Group Key: festivo, nome_strada, ora
-> Gather Merge (cost=16305.92..16315.05 rows=70 width=77) (actual time=108.279..109.015 rows=2 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=15305.90..15306.95 rows=35 width=77) (actual time=101.422..101.422 rows=1 loops=3)
Group Key: festivo, nome_strada, ora
-> Sort (cost=15305.90..15305.99 rows=35 width=21) (actual time=101.390..101.395 rows=28 loops=3)
Sort Key: festivo
Sort Method: quicksort Memory: 25kB
-> Parallel Seq Scan on sft_vmv3_g3 mr (cost=0.00..15305.00 rows=35 width=21) (actual time=75.307..101.329 rows=28 loops=3)
Filter: (festivo AND ((nome_strada)::text = '16th St'::text) AND (ora = '12'::smallint))
Rows Removed by Filter: 277892
I am really curious about this behaviour, but I still haven't found an explanation for it.
I'm curious why you wouldn't phrase the logic as:
SELECT true as a, 'something' as b, '12' as c, CAST(SUM(mr.bar) as INTEGER)
FROM foo.bar as mr
WHERE mr.a AND
mr.b = 'something' AND
mr.c = '12';
This is an aggregation query (because of the SUM() in the SELECT) with no explicit GROUP BY. I think it should produce a more efficient execution plan. Note that it will always return one row, even if no rows match the condition.
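If you'd rather keep the original behaviour of returning no rows when nothing matches, a variant that keeps the GROUP BY but still filters on the boolean column directly would look like this (a sketch using the same table and columns as above):
SELECT mr.a, mr.b, mr.c, CAST(SUM(mr.bar) AS INTEGER) AS stuff
FROM foo.bar mr
WHERE mr.a AND
      mr.b = 'something' AND
      mr.c = '12'
GROUP BY mr.a, mr.b, mr.c;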

Optimizing a recursive Postgres query that uses a WHERE clause and a lot of ANDs / ORs

I want to create a graph whose nodes are chosen from a dataset (bidirectional_edges) using a recursive query. Basically, the resulting table starts with a source node (chosen by the user) and joins the target nodes related to it. Next, the recursive query selects the targets of the nodes already selected. When a node has more than 20 relations, I don't want to expand the nodes related to it, so I call nodes with more than 20 relations "censored". There are two types of relations, one and two, and consequently two ways a node can be censored.
The problem is that, as I've written it, the query takes a very long time when the number of nodes and censored nodes is large. I've already tried adding flags for the censored nodes to the bidirectional_edges table to eliminate the left joins with the censored tables, but the running time didn't decrease considerably. bidirectional_edges is indexed on SOURCE_ID and TARGET_ID.
Is there a way to optimize the query? I think the problem is the WHERE clause with its several ANDs and ORs.
Here is an example of the dataset bidirectional_edges, censored_nodes_one and censored_nodes_two:
CREATE TABLE bidirectional_edges (
SOURCE_ID integer NOT NULL,
TARGET_ID integer NOT NULL,
FLAG_ONE integer NOT NULL,
FLAG_TWO integer NOT NULL
);
INSERT INTO bidirectional_edges(SOURCE_ID, TARGET_ID, FLAG_ONE, FLAG_TWO)
VALUES
(1,2,1,0),
(1,3,0,1),
(2,5,1,1),
(2,6,0,1),
(2,7,1,0),
(2,8,1,0),
(2,9,1,0),
(2,10,1,0),
(2,11,1,0),
(2,12,1,0),
(2,13,1,0),
(2,14,1,0),
(2,15,1,0),
(2,16,1,0),
(2,17,1,0),
(2,18,1,0),
(2,19,1,0),
(2,20,1,0),
(2,21,1,0),
(2,22,1,0),
(2,23,1,0),
(2,24,1,0),
(2,25,1,0),
(2,26,1,0),
(2,27,1,0),
(2,28,1,0),
(2,29,1,0),
(2,30,1,0),
(3,4,1,1),
(3,31,0,1),
(3,32,0,1),
(3,33,0,1),
(3,34,0,1),
(3,35,0,1),
(3,36,0,1),
(3,37,0,1),
(3,38,0,1),
(3,39,0,1),
(3,40,0,1),
(3,41,0,1),
(3,42,0,1),
(3,43,0,1),
(3,44,0,1),
(3,45,0,1),
(3,46,0,1),
(3,47,0,1),
(3,48,0,1),
(3,49,0,1),
(3,50,0,1),
(3,51,0,1),
(3,52,0,1),
(3,53,0,1),
(3,54,0,1),
(3,55,0,1)
;
CREATE TABLE censored_nodes_one (
node integer NOT NULL,
relations integer NOT NULL
);
INSERT INTO censored_nodes_one(node, relations)
VALUES
(2,25)
;
CREATE TABLE censored_nodes_two (
node integer NOT NULL,
relations integer NOT NULL
);
INSERT INTO censored_nodes_two(node, relations)
VALUES
(3,26)
;
In the code below, I use node 1 as the starting node and want to return only nodes 1, 2, 3, 4, 5 and 6.
with recursive search_graph("NODE", "DEPTH", "PATH") as (
select
1 AS NODE,
0 AS DEPTH,
ARRAY[1] as PATH
union
select
be.TARGET_ID as NODE,
sg."DEPTH" + 1 as DEPTH,
sg."PATH" || be.TARGET_ID as PATH
from
bidirectional_edges as be
inner join
search_graph as sg on
sg."NODE" = be.SOURCE_ID
left join
censored_nodes_one as cno on
sg."NODE" = cno.node
left join
censored_nodes_two as cnt on
sg."NODE" = cnt.node
where
sg."DEPTH" < 2 and
not (be.TARGET_ID) = any(sg."PATH") and
(
(be.FLAG_ONE = 1 and cno.node is null) OR
(be.FLAG_TWO = 1 and cnt.node is null)
)
)
select *
from
search_graph
Below is the query plan for this query:
CTE Scan on search_graph (cost=1705.74..1707.36 rows=81 width=40) (actual time=0.066..1.231 rows=6 loops=1)
CTE search_graph
-> Recursive Union (cost=0.00..1705.74 rows=81 width=40) (actual time=0.055..1.206 rows=6 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.030..0.031 rows=1 loops=1)
-> Hash Left Join (cost=102.95..170.41 rows=8 width=40) (actual time=0.226..0.370 rows=2 loops=3)
Hash Cond: (sg."NODE" = cnt.node)
Filter: (((be.flag_one = 1) AND (cno.node IS NULL)) OR ((be.flag_two = 1) AND (cnt.node IS NULL)))
Rows Removed by Filter: 16
-> Hash Join (cost=42.10..47.71 rows=136 width=56) (actual time=0.161..0.279 rows=18 loops=3)
Hash Cond: (be.source_id = sg."NODE")
Join Filter: (be.target_id <> ALL (sg."PATH"))
-> Seq Scan on teste_bidirectional_edges be (cost=0.00..1.81 rows=41 width=16) (actual time=0.084..0.197 rows=54 loops=2)
Filter: ((flag_one = 1) OR (flag_two = 1))
-> Hash (cost=41.68..41.68 rows=34 width=44) (actual time=0.084..0.084 rows=1 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Hash Right Join (cost=0.26..41.68 rows=34 width=44) (actual time=0.062..0.065 rows=1 loops=3)
Hash Cond: (cno.node = sg."NODE")
-> Seq Scan on teste_censored_nodes_one cno (cost=0.00..32.60 rows=2260 width=4) (actual time=0.029..0.034 rows=1 loops=2)
-> Hash (cost=0.22..0.22 rows=3 width=40) (actual time=0.021..0.021 rows=1 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> WorkTable Scan on search_graph sg (cost=0.00..0.22 rows=3 width=40) (actual time=0.009..0.010 rows=1 loops=3)
Rows Removed by Filter: 1
-> Hash (cost=32.60..32.60 rows=2260 width=4) (actual time=0.074..0.074 rows=1 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 33kB
-> Seq Scan on teste_censored_nodes_two cnt (cost=0.00..32.60 rows=2260 width=4) (actual time=0.057..0.060 rows=1 loops=1)
Planning time: 1.924 ms
Execution time: 1.707 ms
The real table I need to use has about 740,000,000 rows, so when I run this recursive query on it, it takes a very long time.
I cannot really provide a solution; I also tried something similar to what @wildplasser suggests, but couldn't get it to run.
However, I have a few ideas that I want to share; maybe they can help you to solve this problem:
It seems cumbersome to have to join (or refer to in a NOT EXISTS clause) two other tables in each iteration. If you can get rid of that, your query will be so much faster.
Since everything is joined on sg."NODE" (do yourself a favor and use lower case identifiers), you could just add another boolean field to bidirectional_edges and set it to the result of the OR condition ahead of time.
So first you run a query that joins with censored_nodes_one and censored_nodes_two and sets the boolean field accordingly, then you run a recursive query using only bidirectional_edges.
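A sketch of that first idea against the sample tables above (the column name allowed is made up for illustration; note that the flag only depends on the edge's source_id):
ALTER TABLE bidirectional_edges ADD COLUMN allowed boolean;

UPDATE bidirectional_edges be
SET allowed =
       (be.flag_one = 1 AND NOT EXISTS (SELECT 1 FROM censored_nodes_one cno WHERE cno.node = be.source_id))
    OR (be.flag_two = 1 AND NOT EXISTS (SELECT 1 FROM censored_nodes_two cnt WHERE cnt.node = be.source_id));
The recursive query then only needs "AND be.allowed" in place of the two LEFT JOINs and the OR condition.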
If that does not work for some reason, you could do something similar with a CTE (note that RECURSIVE goes right after WITH, even though only the second CTE is recursive):
WITH RECURSIVE dbe_flagged AS (
    -- perform the join and provide an "artificial" boolean flag
),
search_graph AS (
    -- now run your recursive query on "dbe_flagged"
)
SELECT * FROM search_graph;
The large size of the CTE result may be a problem here.
Don't forget to use indexes – for example, your query looks like it might profit from an index on bidirectional_edges(source_id).
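Putting those pieces together, a runnable sketch of the CTE variant against the sample tables above (again with a made-up allowed column):
WITH RECURSIVE dbe_flagged AS (
    SELECT be.source_id, be.target_id,
           ((be.flag_one = 1 AND cno.node IS NULL) OR
            (be.flag_two = 1 AND cnt.node IS NULL)) AS allowed
    FROM bidirectional_edges be
    LEFT JOIN censored_nodes_one cno ON cno.node = be.source_id
    LEFT JOIN censored_nodes_two cnt ON cnt.node = be.source_id
),
search_graph(node, depth, path) AS (
    SELECT 1, 0, ARRAY[1]
    UNION
    SELECT f.target_id, sg.depth + 1, sg.path || f.target_id
    FROM dbe_flagged f
    JOIN search_graph sg ON sg.node = f.source_id
    WHERE sg.depth < 2
      AND NOT (f.target_id = ANY (sg.path))
      AND f.allowed
)
SELECT * FROM search_graph;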

Postgres: STABLE function called multiple times on constant

I have a PostgreSQL (version 9.4) performance puzzle. I have a function (prevd) declared as STABLE (see below). When I call this function on a constant in a WHERE clause, it is called multiple times instead of once.
If I understand the Postgres documentation correctly, the query should be optimized to call prevd only once:
A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement
Why doesn't it optimize the calls to prevd in this case?
I'm not expecting prevd to be called just once across all subsequent queries using prevd on the same argument (as if it were IMMUTABLE). I'm expecting Postgres to create a plan for my query with just one call to prevd('2015-12-12').
Please find the code below:
Schema
create table somedata(d date, number double precision);
create table dates(d date);
insert into dates
select generate_series::date
from generate_series('2015-01-01'::date, '2015-12-31'::date, '1 day');
insert into somedata
select '2015-01-01'::date + (random() * 365 + 1)::integer, random()
from generate_series(1, 100000);
create or replace function prevd(date_ date)
returns date
language sql
stable
as $$
select max(d) from dates where d < date_;
$$;
Slow Query
select avg(number) from somedata where d=prevd('2015-12-12');
Poor query plan of the query above
Aggregate (cost=28092.74..28092.75 rows=1 width=8) (actual time=3532.638..3532.638 rows=1 loops=1)
Output: avg(number)
-> Seq Scan on public.somedata (cost=0.00..28091.43 rows=525 width=8) (actual time=10.210..3532.576 rows=282 loops=1)
Output: d, number
Filter: (somedata.d = prevd('2015-12-12'::date))
Rows Removed by Filter: 99718
Planning time: 1.144 ms
Execution time: 3532.688 ms
(8 rows)
Performance
The query above runs in around 3.5 s on my machine. After changing prevd to IMMUTABLE, that drops to 0.035 s.
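For reference, the only change behind the second timing is the volatility marker, which can be switched with a one-liner such as:
ALTER FUNCTION prevd(date) IMMUTABLE;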
I started writing this as a comment, but it got a bit long, so I'm expanding it into an answer.
As discussed in this previous answer, Postgres does not promise to always optimise based on STABLE or IMMUTABLE annotations, only that it can sometimes do so. It does this by planning the query differently by taking advantage of certain assumptions. This part of the previous answer is directly analogous to your case:
This particular sort of rewriting depends upon immutability or stability. With where test_multi_calls1(30) != num query re-writing will happen for immutable but not for merely stable functions.
If you change the function to IMMUTABLE and look at the query plan, you will see that the rewriting it does is really rather radical:
Seq Scan on public.somedata (cost=0.00..1791.00 rows=272 width=12) (actual time=0.036..14.549 rows=270 loops=1)
Output: d, number
Filter: (somedata.d = '2015-12-11'::date)
Buffers: shared read=541 written=14
Total runtime: 14.589 ms
It actually runs the function while planning the query, and substitutes the value before the query is even executed. With a STABLE function, this optimisation would clearly not be appropriate - the data might change between planning and executing the query.
In a comment, it was mentioned that this query results in an optimised plan:
select avg(number) from somedata where d=(select prevd(date '2015-12-12'));
This is fast, but note that the plan doesn't look anything like what the IMMUTABLE version did:
Aggregate (cost=1791.69..1791.70 rows=1 width=8) (actual time=14.670..14.670 rows=1 loops=1)
Output: avg(number)
Buffers: shared read=541 written=21
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
Output: '2015-12-11'::date
-> Seq Scan on public.somedata (cost=0.00..1791.00 rows=273 width=8) (actual time=0.026..14.589 rows=270 loops=1)
Output: d, number
Filter: (somedata.d = $0)
Buffers: shared read=541 written=21
Total runtime: 14.707 ms
By putting it into a sub-query, you are moving the function call from the WHERE clause to the SELECT clause. More importantly, the sub-query can always be executed once and used by the rest of the query; so the function is run once in a separate node of the plan.
To confirm this, we can take the SQL out of a function altogether:
select avg(number) from somedata where d=(select max(d) from dates where d < '2015-12-12');
This gives a rather longer plan with very similar performance:
Aggregate (cost=1799.12..1799.13 rows=1 width=8) (actual time=14.174..14.174 rows=1 loops=1)
Output: avg(somedata.number)
Buffers: shared read=543 written=19
InitPlan 1 (returns $0)
-> Aggregate (cost=7.43..7.44 rows=1 width=4) (actual time=0.150..0.150 rows=1 loops=1)
Output: max(dates.d)
Buffers: shared read=2
-> Seq Scan on public.dates (cost=0.00..6.56 rows=347 width=4) (actual time=0.015..0.103 rows=345 loops=1)
Output: dates.d
Filter: (dates.d < '2015-12-12'::date)
Buffers: shared read=2
-> Seq Scan on public.somedata (cost=0.00..1791.00 rows=273 width=8) (actual time=0.190..14.098 rows=270 loops=1)
Output: somedata.d, somedata.number
Filter: (somedata.d = $0)
Buffers: shared read=543 written=19
Total runtime: 14.232 ms
The important thing to note is that the inner Aggregate (the max(d)) is executed once, on a separate node from the main Seq Scan (which is checking the where clause). In this position, even a VOLATILE function can be optimised in the same way.
In short, while you know that the query you've produced can be optimised by executing the function only once, it doesn't match any of the patterns that Postgres's query planner knows how to rewrite, so it uses a naive plan which runs the function multiple times.
[Note: all tests performed on Postgres 9.1, because it's what I happened to have to hand.]

Incorrect row estimate for joins

I have a simple query (Postgres 9.4):
EXPLAIN ANALYZE
SELECT
COUNT(*)
FROM
bo_labels L
LEFT JOIN bo_party party ON (party.id = L.bo_party_fkey)
LEFT JOIN bo_document_base D ON (D.id = L.bo_doc_base_fkey)
LEFT JOIN bo_contract_hardwood_deal C ON (C.bo_document_fkey = D.id)
WHERE
party.inn = '?'
The EXPLAIN output looks like:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2385.30..2385.30 rows=1 width=0) (actual time=31762.367..31762.367 rows=1 loops=1)
-> Nested Loop Left Join (cost=1.28..2385.30 rows=1 width=0) (actual time=7.621..31760.776 rows=1694 loops=1)
Join Filter: ((c.bo_document_fkey)::text = (d.id)::text)
Rows Removed by Join Filter: 101658634
-> Nested Loop Left Join (cost=1.28..106.33 rows=1 width=10) (actual time=0.110..54.635 rows=1694 loops=1)
-> Nested Loop (cost=0.85..105.69 rows=1 width=9) (actual time=0.081..4.404 rows=1694 loops=1)
-> Index Scan using bo_party_inn_idx on bo_party party (cost=0.43..12.43 rows=3 width=10) (actual time=0.031..0.037 rows=3 loops=1)
Index Cond: (inn = '2534005760'::text)
-> Index Only Scan using bo_labels__party_fkey__docbase_fkey__tnved_fkey__idx on bo_labels l (cost=0.42..29.80 rows=1289 width=17) (actual time=0.013..1.041 rows=565 loops=3)
Index Cond: (bo_party_fkey = (party.id)::text)
Heap Fetches: 0
-> Index Only Scan using bo_document_pkey on bo_document_base d (cost=0.43..0.64 rows=1 width=10) (actual time=0.022..0.025 rows=1 loops=1694)
Index Cond: (id = (l.bo_doc_base_fkey)::text)
Heap Fetches: 1134
-> Seq Scan on bo_contract_hardwood_deal c (cost=0.00..2069.77 rows=59770 width=9) (actual time=0.003..11.829 rows=60012 loops=1694)
Planning time: 13.484 ms
Execution time: 31762.885 ms
http://explain.depesz.com/s/V2wn
What is very annoying is the incorrect row estimate:
Nested Loop (cost=0.85..105.69 rows=1 width=9) (actual time=0.081..4.404 rows=1694 loops=1)
Because of that, Postgres chooses nested loops and the query runs for about 30 seconds.
With SET LOCAL enable_nestloop = OFF; it completes in about a second.
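For reference, SET LOCAL only takes effect inside a transaction block, so such a test looks roughly like this sketch:
BEGIN;
SET LOCAL enable_nestloop = OFF;
-- the EXPLAIN ANALYZE query from above
COMMIT;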
What is also interesting: I have default_statistics_target = 10000 (the maximum value) and ran VACUUM VERBOSE ANALYZE on all 4 tables just before.
Since Postgres does not gather cross-table statistics, this is likely to happen for other joins too.
Without the external extension pg_hint_plan it is not possible to change enable_nestloop for just that query.
Is there some other way I could force a faster plan for this query?
Update by comments
I can't eliminate the join in any straightforward way. My main question: is there any possibility of changing the statistics (for example) to include the values that break the normal statistical picture? Or maybe some other way to make Postgres weight nested loops so that it chooses them less often?
Could someone also explain, or point to documentation on, how the planner, for a nested loop over two inputs estimated at 3 rows (exactly correct) and 1289 rows (really 565, but that estimation error is a separate question), arrives at the assumption that the result will contain only 1 row? I'm talking about this part of the plan:
-> Nested Loop (cost=0.85..105.69 rows=1 width=9) (actual time=0.081..4.404 rows=1694 loops=1)
-> Index Scan using bo_party_inn_idx on bo_party party (cost=0.43..12.43 rows=3 width=10) (actual time=0.031..0.037 rows=3 loops=1)
Index Cond: (inn = '2534005760'::text)
-> Index Only Scan using bo_labels__party_fkey__docbase_fkey__tnved_fkey__idx on bo_labels l (cost=0.42..29.80 rows=1289 width=17) (actual time=0.013..1.041 rows=565 loops=3)
Index Cond: (bo_party_fkey = (party.id)::text)
At first glance it looks wrong. What statistics are used there, and how?
Does Postgres also maintain statistics for indexes?
Actually, I don't have good sample data to test my answer, but I think it might help.
Based on your join columns I'm assuming the following relationship cardinalities:
1) bo_party (id 1:N bo_party_fkey) bo_labels
2) bo_labels (bo_doc_base_fkey N:1 id) bo_document_base
3) bo_document_base (id 1:N bo_document_fkey) bo_contract_hardwood_deal
You want to count how many rows were selected. Based on the cardinalities in 1) and 2), the table "bo_labels" acts as a many-to-many link table. This means that joining it with "bo_party" and "bo_document_base" will produce no more rows than already exist in the table.
But after joining "bo_document_base", another join is done to "bo_contract_hardwood_deal", whose cardinality described in 3) is one-to-many, possibly generating more rows in the final result.
So, to get the right row count, you can simplify the join structure to just "bo_labels" and "bo_contract_hardwood_deal" via:
4) bo_labels (bo_doc_base_fkey 1:N bo_document_fkey) bo_contract_hardwood_deal
A sample query could be one of the following:
SELECT COUNT(*)
FROM bo_labels L
LEFT JOIN bo_contract_hardwood_deal C ON (C.bo_document_fkey = L.bo_doc_base_fkey)
WHERE 1=1
and exists
(
select 1
from bo_party party
where 1=1
and party.id = L.bo_party_fkey
and party.inn = '?'
)
;
or
SELECT sum((select COUNT(*) from bo_contract_hardwood_deal C where C.bo_document_fkey = L.bo_doc_base_fkey))
FROM bo_labels L
WHERE 1=1
and exists
(
select 1
from bo_party party
where 1=1
and party.id = L.bo_party_fkey
and party.inn = '?'
)
;
I could not test with large tables, so I don't know exactly if it will improve performance against your original query, but I think it might help.

Why does the following join increase the query time significantly?

I have a star schema here and I am querying the fact table and would like to join one very small dimension table. I can't really explain the following:
EXPLAIN ANALYZE SELECT
COUNT(impression_id), imp.os_id
FROM bi.impressions imp
GROUP BY imp.os_id;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=868719.08..868719.24 rows=16 width=10) (actual time=12559.462..12559.466 rows=26 loops=1)
-> Seq Scan on impressions imp (cost=0.00..690306.72 rows=35682472 width=10) (actual time=0.009..3030.093 rows=35682474 loops=1)
Total runtime: 12559.523 ms
(3 rows)
This takes ~12600ms, but of course there is no joined data, so I can't "resolve" the imp.os_id to something meaningful, so I add a join:
EXPLAIN ANALYZE SELECT
COUNT(impression_id), imp.os_id, os.os_desc
FROM bi.impressions imp, bi.os_desc os
WHERE imp.os_id=os.os_id
GROUP BY imp.os_id, os.os_desc;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1448560.83..1448564.99 rows=416 width=22) (actual time=25565.124..25565.127 rows=26 loops=1)
-> Hash Join (cost=1.58..1180942.29 rows=35682472 width=22) (actual time=0.046..15157.684 rows=35682474 loops=1)
Hash Cond: (imp.os_id = os.os_id)
-> Seq Scan on impressions imp (cost=0.00..690306.72 rows=35682472 width=10) (actual time=0.007..3705.647 rows=35682474 loops=1)
-> Hash (cost=1.26..1.26 rows=26 width=14) (actual time=0.028..0.028 rows=26 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 2kB
-> Seq Scan on os_desc os (cost=0.00..1.26 rows=26 width=14) (actual time=0.003..0.010 rows=26 loops=1)
Total runtime: 25565.199 ms
(8 rows)
This effectively doubles the execution time of my query. My question is: what am I missing from the picture? I would not expect such a small lookup table to cause a huge difference in query execution time.
Rewritten with (recommended) explicit ANSI JOIN syntax:
SELECT COUNT(impression_id), imp.os_id, os.os_desc
FROM bi.impressions imp
JOIN bi.os_desc os ON os.os_id = imp.os_id
GROUP BY imp.os_id, os.os_desc;
First of all, your second query might be wrong if more or fewer than exactly one match is found in os_desc for some row in impressions.
This can be ruled out if you have a foreign key constraint on os_id in place that guarantees referential integrity, plus a NOT NULL constraint on bi.impressions.os_id. If so, as a first step, simplify to:
SELECT COUNT(*) AS ct, imp.os_id, os.os_desc
FROM bi.impressions imp
JOIN bi.os_desc os USING (os_id)
GROUP BY imp.os_id, os.os_desc;
count(*) is faster than count(column) and equivalent here since the column is NOT NULL. Also add a column alias for the count.
Faster yet:
SELECT os_id, os.os_desc, sub.ct
FROM (
SELECT os_id, COUNT(*) AS ct
FROM bi.impressions
GROUP BY 1
) sub
JOIN bi.os_desc os USING (os_id)
Aggregate first, join later. More here:
Aggregate a single column in query with many columns
PostgreSQL - order by an array
HashAggregate (cost=868719.08..868719.24 rows=16 width=10)
HashAggregate (cost=1448560.83..1448564.99 rows=416 width=22)
Hmm, width from 10 to 22 is a doubling. Perhaps you should join after grouping instead of before?
The following query solves the problem without increasing the query execution time. The question of why the execution time increases so significantly with a very simple join still stands, but it may be a Postgres-specific question that somebody with extensive experience in the area will answer eventually.
WITH OSES AS (SELECT os_id, os_desc FROM bi.os_desc)
SELECT
    COUNT(impression_id) AS imp_count,
    os_desc
FROM bi.impressions imp,
     OSES os
WHERE os.os_id = imp.os_id
GROUP BY os_desc
ORDER BY imp_count;