Optimize query with multiple "between" conditions - sql

I have a table playground with a column val, which is indexed.
I have a list of ranges [(min1, max1), (min2, max2), ... , (minN, maxN)]
and I want to select all rows with val that fit in any of those ranges.
E.g., my ranges look like this: [(1,5), (20,25), (200,400)]
Here is the simple query that extracts corresponding rows:
select p.*
from playground p
where (val between 1 AND 5) or (val between 20 and 25) or
(val between 200 and 400);
The problem here is that this list of ranges is dynamic: my application generates it and sends it along with the query to Postgres.
I tried to rewrite the query to accept a dynamic list of ranges:
select p.*
from playground p,
unnest(ARRAY [(1, 5),(20, 25),(200, 400)]) as r(min_val INT, max_val INT)
where p.val between r.min_val and r.max_val;
It extracts the same rows, but I don't know whether it is an effective query or not.
This is what the explain looks like for the first query:
Bitmap Heap Scan on playground p (cost=12.43..16.45 rows=1 width=36) (actual time=0.017..0.018 rows=4 loops=1)
Recheck Cond: (((val >= 1) AND (val <= 5)) OR ((val >= 20) AND (val <= 25)) OR ((val >= 200) AND (val <= 400)))
Heap Blocks: exact=1
-> BitmapOr (cost=12.43..12.43 rows=1 width=0) (actual time=0.012..0.012 rows=0 loops=1)
-> Bitmap Index Scan on playground_val_index (cost=0.00..4.14 rows=1 width=0) (actual time=0.010..0.010 rows=3 loops=1)
Index Cond: ((val >= 1) AND (val <= 5))
-> Bitmap Index Scan on playground_val_index (cost=0.00..4.14 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=1)
Index Cond: ((val >= 20) AND (val <= 25))
-> Bitmap Index Scan on playground_val_index (cost=0.00..4.14 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
Index Cond: ((val >= 200) AND (val <= 400))
Planning Time: 0.071 ms
Execution Time: 0.057 ms
And here is the explain for the second:
Nested Loop (cost=0.14..12.52 rows=2 width=36) (actual time=0.033..0.065 rows=4 loops=1)
-> Function Scan on unnest r (cost=0.00..0.03 rows=3 width=8) (actual time=0.011..0.012 rows=3 loops=1)
-> Index Scan using playground_val_index on playground p (cost=0.13..4.15 rows=1 width=36) (actual time=0.008..0.015 rows=1 loops=3)
Index Cond: ((val >= r.min_val) AND (val <= r.max_val))
Planning Time: 0.148 ms
Execution Time: 0.714 ms
NOTE: In both cases I did set enable_seqscan = false; to make the index work.
I am worried about the "Nested Loop" stage. Is it okay? Or are there more effective ways to pass a dynamic list of ranges into a query?
My Postgres version is 12.1.

You added more information, but much more is still relevant: exact table and index definition, cardinality, data distribution, row size stats, number of ranges in the predicate, purpose of the table, write patterns, ... Performance optimization needs all the input it can get.
Shot in the dark: with non-overlapping ranges, a UNION ALL query may deliver the best performance:
SELECT * FROM playground WHERE val BETWEEN 1 AND 5
UNION ALL
SELECT * FROM playground WHERE val BETWEEN 20 AND 25
UNION ALL
SELECT * FROM playground WHERE val BETWEEN 200 AND 400;
We know that the ranges don't overlap, but Postgres doesn't, so it has to do extra work in your attempts. This query should avoid both the BitmapOr of the first plan and the Nested Loop of the second: just fetch each range and append it to the output. It should result in a plan like:
Append (cost=0.13..24.50 rows=3 width=40)
-> Index Scan using playground_val_idx on playground (cost=0.13..8.15 rows=1 width=40)
Index Cond: ((val >= 1) AND (val <= 5))
-> Index Scan using playground_val_idx on playground playground_1 (cost=0.13..8.15 rows=1 width=40)
Index Cond: ((val >= 20) AND (val <= 25))
-> Index Scan using playground_val_idx on playground playground_2 (cost=0.13..8.15 rows=1 width=40)
Index Cond: ((val >= 200) AND (val <= 400))
Plus, each sub-SELECT will be based on actual statistics for the given range, not generic estimates, even for a longer list of ranges. See (recommended!):
How to use index for simple time range join?
You can generate the query in your client or write a server-side function to generate and execute dynamic SQL (applicable as the result type is known).
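A server-side generator might look like this (an untested sketch; the function name foo_sql is made up, and the two-dimensional array format matches the LOOP variant below). Since the bounds are integers, format() with %s cannot inject anything here; for arbitrary input you would use %L:

```sql
CREATE OR REPLACE FUNCTION foo_sql(_ranges int[])
  RETURNS SETOF playground
  LANGUAGE plpgsql AS
$func$
DECLARE
   _sql text;
BEGIN
   -- build one UNION ALL branch per range in the 2-D input array
   SELECT string_agg(
             format('SELECT * FROM playground WHERE val BETWEEN %s AND %s'
                  , _ranges[i][1], _ranges[i][2])
           , ' UNION ALL ')
   INTO   _sql
   FROM   generate_subscripts(_ranges, 1) i;

   RETURN QUERY EXECUTE _sql;
END
$func$;

-- call like:
-- SELECT * FROM foo_sql('{{1,5},{20,25},{200,400}}');
```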
You might even test a server-side function using a LOOP (which is often less efficient, but this may be an exception):
CREATE OR REPLACE FUNCTION foo(_ranges int[])
  RETURNS SETOF playground
  LANGUAGE plpgsql PARALLEL SAFE STABLE AS
$func$
DECLARE
   _range int[];
BEGIN
   FOREACH _range SLICE 1 IN ARRAY _ranges
   LOOP
      RETURN QUERY
      SELECT * FROM playground WHERE val BETWEEN _range[1] AND _range[2];
   END LOOP;
END
$func$;
The overhead may not pay off for only a few ranges per call. But it is very convenient to call, if nothing else:
SELECT * FROM foo('{{1,5},{20,25},{200,400}}');
Related:
Loop over array dimension in plpgsql
db<>fiddle here
Physical order of rows may help a lot. If rows are stored in sequence, (much) fewer data pages need to be processed. Depends on undisclosed details. Built-in CLUSTER or the extensions pg_repack or pg_squeeze may help with that. Related:
Optimize Postgres timestamp query range
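For example, a one-off re-ordering with the built-in command (a sketch using the index name from the plans above; note that CLUSTER takes an exclusive lock on the table and the ordering is not maintained automatically for rows written later):

```sql
CLUSTER playground USING playground_val_index;
ANALYZE playground;  -- refresh statistics after rewriting the table
```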
And it's recommended to use the latest available minor release for whatever major version is in use. That would be 12.2 at the time of writing (released 2020-02-13).


Equivalent for FETCH FIRST WITH TIES in Postgres 11 with comparable performance

Given the following DDL
CREATE TABLE numbers (
id BIGSERIAL PRIMARY KEY,
number BIGINT
);
CREATE INDEX ON numbers (number);
In PostgreSQL 13 I can use the following query:
SELECT *
FROM numbers
WHERE number > 1
ORDER BY number
FETCH FIRST 1000 ROWS WITH TIES
It produces a very effective query plan and performs well enough with large tables:
Limit (cost=...)
-> Index Scan using numbers_number_idx on numbers (cost=...)
Index Cond: (number > 1)
Is there an equivalent for that in PostgreSQL 11 that gives a similar query plan and comparable performance for large (10TB+) tables?
This answer suggests to use the following query:
WITH cte AS (
SELECT *, RANK() OVER (ORDER BY number) AS rnk
FROM numbers
WHERE number > 1
)
SELECT *
FROM cte
WHERE rnk <= 1000;
But it is not really usable for large tables because performance is many times worse.
Subquery Scan on cte (cost=...)
Filter: (cte.rnk <= 1000)
-> WindowAgg (cost=...)
-> Index Scan using numbers_number_idx on numbers (cost=...)
Index Cond: (number > 1)
You won't get the performance of the extremely convenient WITH TIES in Postgres 13 or later. See:
Get top row(s) with highest value, with ties
But there are always options ...
Faster SQL variant
It should be faster for big tables because it avoids ranking the whole table like the simple solution in the referenced answer does.
WITH cte AS (
SELECT *, row_number() OVER (ORDER BY number, id) AS rn -- ORDER must match
FROM numbers
WHERE number > 1
ORDER BY number, id -- tiebreaker to make it deterministic
LIMIT 1000
)
TABLE cte
UNION ALL
(
SELECT n.*, 1000 + row_number() OVER (ORDER BY n.id) AS rn
FROM numbers n, (SELECT number, id FROM cte WHERE rn = 1000) AS max
WHERE n.number = max.number
AND n.id > max.id
ORDER BY n.id
);
id is any UNIQUE NOT NULL column in your table that lends itself as tiebreaker, typically the PRIMARY KEY.
Sort by that additionally, which has the (maybe welcome) side effect that you get a deterministically ordered result.
Ideally, you have a multicolumn index on (number, id) for this. But it will use the existing index on just (number), too. Resulting performance depends on cardinalities and data types.
Related:
Is a composite index also good for queries on the first field?
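The multicolumn index mentioned above could be created like this (a sketch; the index name is arbitrary):

```sql
CREATE INDEX numbers_number_id_idx ON numbers (number, id);
```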
Since the CTE query counts as one command, there is no race condition under concurrent write load. All parts of the command see the same snapshot of the table in default isolation level READ COMMITTED.
I might wrap this into a simple SQL function (or prepared statement) and pass LIMIT to it for convenience.
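Such a wrapper might look like this (an untested sketch; the function name is made up, and the rn column is kept internal so the function can return SETOF numbers):

```sql
CREATE OR REPLACE FUNCTION first_rows_with_ties(_bound bigint, _limit bigint)
  RETURNS SETOF numbers
  LANGUAGE sql STABLE AS
$func$
WITH cte AS (
   SELECT id, number, row_number() OVER (ORDER BY number, id) AS rn
   FROM   numbers
   WHERE  number > _bound
   ORDER  BY number, id
   LIMIT  _limit
   )
SELECT id, number FROM cte
UNION ALL
(
SELECT n.id, n.number
FROM   numbers n, (SELECT number, id FROM cte WHERE rn = _limit) AS max
WHERE  n.number = max.number
AND    n.id > max.id
ORDER  BY n.id
);
$func$;

-- call like:
-- SELECT * FROM first_rows_with_ties(1, 1000);
```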
Alternative with PL/pgSQL
The benefit of running only a single SELECT probably outweighs the added overhead. It works optimally with the existing index on just (number):
CREATE OR REPLACE FUNCTION foo(_bound int, _limit int, OUT _row public.numbers)
  RETURNS SETOF public.numbers
  LANGUAGE plpgsql AS
$func$
DECLARE
   _ct     int := 0;
   _number bigint;  -- must match the type of numbers.number!
BEGIN
   FOR _row IN
      SELECT *
      FROM   public.numbers
      WHERE  number > _bound
      ORDER  BY number
   LOOP
      _ct := _ct + 1;
      CASE
      WHEN _ct < _limit THEN
         NULL;  -- do nothing yet (each CASE arm needs a statement)
      WHEN _ct > _limit THEN
         EXIT WHEN _row.number > _number;  -- done once ties are exhausted
      WHEN _ct = _limit THEN
         _number := _row.number;  -- remember the boundary value
      END CASE;
      RETURN NEXT;
   END LOOP;
END
$func$;
But it's tailored for just the one query. Gets tedious for varying queries.
I got pretty good results with the following select query. Not as fast as the other answer, but the query is also a lot simpler. The OFFSET value has to be 1 less than the number of rows desired (999 for 1000), since OFFSET starts at 0.
-- query 1
select *
from numbers
where number <= (
select number
from numbers
where number > 1
order by number
offset 999 limit 1
)
and number > 1
Performance
The test was run with the table as defined in the question, and the following insert statement.
INSERT INTO numbers (number)
SELECT floor(random() * 100000000)
FROM generate_series(1, 100000000);
I'll admit it's not very scientific: this was done on my puny Windows 10 laptop, running a PostgreSQL 13 cluster on WSL with all default settings, on battery power. I didn't restart the server between runs or do anything to prevent cached data from skewing the results.
EXPLAIN ANALYZE results, where query 1 refers to the query above, query 2 to the simple query in the question, query 3 to the query in the other answer, and query 4 to the elegant WITH TIES variant:
query   execution time   factor slower
1           80.642 ms         63.6
2       142142.026 ms     112011
3           59.551 ms         46.9
4            1.269 ms          1
Explain analyze query plans
-- query 1
Gather (cost=10965.26..1178206.10 rows=500000 width=16) (actual time=64.157..79.186 rows=1000 loops=1)
Workers Planned: 2
Params Evaluated: $0
Workers Launched: 2
InitPlan 1 (returns $0)
-> Limit (cost=27.66..27.69 rows=1 width=8) (actual time=63.613..63.615 rows=1 loops=1)
-> Index Only Scan using numbers_n_idx on numbers numbers_1 (cost=0.57..2712066.05 rows=100000085 width=8) (actual time=0.029..0.318 rows=1000 loops=1)
Index Cond: (number > 1)
Heap Fetches: 0
-> Parallel Bitmap Heap Scan on numbers (cost=9937.57..1127178.41 rows=208333 width=16) (actual time=0.090..0.498 rows=333 loops=3)
Recheck Cond: ((number <= $0) AND (number > 1))
Heap Blocks: exact=1000
-> Bitmap Index Scan on numbers_n_idx (cost=0.00..9812.57 rows=500000 width=0) (actual time=0.135..0.135 rows=1000 loops=1)
Index Cond: ((number <= $0) AND (number > 1))
Planning Time: 0.230 ms
JIT:
Functions: 18
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 1.936 ms, Inlining 44.863 ms, Optimization 12.045 ms, Emission 6.182 ms, Total 65.026 ms
Execution Time: 80.642 ms
(20 rows)
-- query 2
Subquery Scan on cte (cost=8472029.70..22868689.85 rows=33333362 width=24) (actual time=40001.966..142074.323 rows=1000 loops=1)
Filter: (cte.rnk <= 1000)
Rows Removed by Filter: 99998999
-> WindowAgg (cost=8472029.70..21618688.79 rows=100000085 width=24) (actual time=40001.964..129209.141 rows=99999999 loops=1)
-> Gather Merge (cost=8472029.70..20118687.52 rows=100000085 width=16) (actual time=40001.921..75176.460 rows=99999999 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=8471029.68..8575196.43 rows=41666702 width=16) (actual time=39707.316..46741.888 rows=33333333 loops=3)
Sort Key: numbers.number
Sort Method: external merge Disk: 856888kB
Worker 0: Sort Method: external merge Disk: 839696kB
Worker 1: Sort Method: external merge Disk: 847688kB
-> Parallel Seq Scan on numbers (cost=0.00..1061374.79 rows=41666702 width=16) (actual time=60.334..13404.824 rows=33333333 loops=3)
Filter: (number > 1)
Rows Removed by Filter: 0
Planning Time: 0.179 ms
JIT:
Functions: 16
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 1.773 ms, Inlining 90.193 ms, Optimization 55.148 ms, Emission 34.709 ms, Total 181.823 ms
Execution Time: 142142.026 ms
(21 rows)
-- query 3
Append (cost=2497.62..2628.54 rows=1005 width=24) (actual time=12.881..59.360 rows=1000 loops=1)
CTE cte
-> Limit (cost=1004.33..2497.62 rows=1000 width=24) (actual time=12.877..58.683 rows=1000 loops=1)
-> WindowAgg (cost=1004.33..149329948.33 rows=100000085 width=24) (actual time=12.875..58.435 rows=1000 loops=1)
-> Gather Merge (cost=1004.33..147579946.84 rows=100000085 width=16) (actual time=12.858..57.988 rows=1001 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Incremental Sort (cost=4.31..136036455.76 rows=41666702 width=16) (actual time=0.147..29.746 rows=366 loops=3)
Sort Key: numbers.number, numbers.id
Presorted Key: numbers.number
Full-sort Groups: 13 Sort Method: quicksort Average Memory: 26kB Peak Memory: 26kB
Worker 0: Full-sort Groups: 15 Sort Method: quicksort Average Memory: 26kB Peak Memory: 26kB
Worker 1: Full-sort Groups: 8 Sort Method: quicksort Average Memory: 26kB Peak Memory: 26kB
-> Parallel Index Scan using numbers_n_idx on numbers (cost=0.57..134359882.50 rows=41666702 width=16) (actual time=0.042..29.576 rows=393 loops=3)
Index Cond: (number > 1)
-> CTE Scan on cte (cost=0.00..20.00 rows=1000 width=24) (actual time=12.880..14.618 rows=1000 loops=1)
-> WindowAgg (cost=105.75..105.85 rows=5 width=24) (actual time=0.079..0.081 rows=0 loops=1)
-> Sort (cost=105.75..105.76 rows=5 width=16) (actual time=0.077..0.079 rows=0 loops=1)
Sort Key: n.id
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=0.57..105.69 rows=5 width=16) (actual time=0.070..0.071 rows=0 loops=1)
-> CTE Scan on cte cte_1 (cost=0.00..22.50 rows=5 width=16) (actual time=0.051..0.051 rows=1 loops=1)
Filter: (rn = 1000)
Rows Removed by Filter: 999
-> Index Scan using numbers_n_idx on numbers n (cost=0.57..16.63 rows=1 width=16) (actual time=0.015..0.016 rows=0 loops=1)
Index Cond: (number = cte_1.number)
Filter: (id > cte_1.id)
Rows Removed by Filter: 1
Planning Time: 0.202 ms
Execution Time: 59.551 ms
(30 rows)
-- query 4
Limit (cost=0.57..1350.00 rows=1000 width=16) (actual time=0.013..1.113 rows=1000 loops=1)
-> Index Scan using numbers_n_idx on numbers (cost=0.57..134943216.33 rows=100000085 width=16) (actual time=0.012..0.869 rows=1001 loops=1)
Index Cond: (number > 1)
Planning Time: 0.142 ms
Execution Time: 1.269 ms
(5 rows)

PL/pgSQL Query Plan Worse Inside Function Than Outside

I have a function that is running too slow. I've isolated which piece of the function is slow: a small SELECT statement:
SELECT image_group_id
FROM programs.image_family fam
JOIN programs.provider_file pf
ON (fam.provider_data_id = pf.provider_data_id
AND fam.family_id = $1 AND pf.image_group_id IS NOT NULL)
LIMIT 1
When I run the function this piece of SQL generates the following query plan:
Query Text: SELECT image_group_id FROM programs.image_family fam JOIN programs.provider_file pf ON (fam.provider_data_id = pf.provider_data_id AND fam.family_id = $1 AND pf.image_group_id IS NOT NULL) LIMIT 1
Limit (cost=0.56..6.75 rows=1 width=6) (actual time=3471.004..3471.004 rows=0 loops=1)
-> Nested Loop (cost=0.56..594054.42 rows=96017 width=6) (actual time=3471.002..3471.002 rows=0 loops=1)
-> Seq Scan on image_family fam (cost=0.00..391880.08 rows=96023 width=6) (actual time=3471.001..3471.001 rows=0 loops=1)
Filter: ((family_id)::numeric = '8419853'::numeric)
Rows Removed by Filter: 19204671
-> Index Scan using "IX_DBO_PROVIDER_FILE_1" on provider_file pf (cost=0.56..2.11 rows=1 width=12) (never executed)
Index Cond: (provider_data_id = fam.provider_data_id)
Filter: (image_group_id IS NOT NULL)
When I run the selected query in a query tool (outside of the function) the query plan looks like this:
Limit (cost=1.12..3.81 rows=1 width=6) (actual time=0.043..0.043 rows=1 loops=1)
Output: pf.image_group_id
Buffers: shared hit=11
-> Nested Loop (cost=1.12..14.55 rows=5 width=6) (actual time=0.041..0.041 rows=1 loops=1)
Output: pf.image_group_id
Inner Unique: true
Buffers: shared hit=11
-> Index Only Scan using image_family_family_id_provider_data_id_idx on programs.image_family fam (cost=0.56..1.65 rows=5 width=6) (actual time=0.024..0.024 rows=1 loops=1)
Output: fam.family_id, fam.provider_data_id
Index Cond: (fam.family_id = 8419853)
Heap Fetches: 2
Buffers: shared hit=6
-> Index Scan using "IX_DBO_PROVIDER_FILE_1" on programs.provider_file pf (cost=0.56..2.58 rows=1 width=12) (actual time=0.013..0.013 rows=1 loops=1)
Output: pf.provider_data_id, pf.provider_file_path, pf.posted_dt, pf.file_repository_id, pf.restricted_size, pf.image_group_id, pf.is_master, pf.is_biggest
Index Cond: (pf.provider_data_id = fam.provider_data_id)
Filter: (pf.image_group_id IS NOT NULL)
Buffers: shared hit=5
Planning time: 0.809 ms
Execution time: 0.100 ms
If I disable sequence scans in the function I can get a similar query plan:
Query Text: SELECT image_group_id FROM programs.image_family fam JOIN programs.provider_file pf ON (fam.provider_data_id = pf.provider_data_id AND fam.family_id = $1 AND pf.image_group_id IS NOT NULL) LIMIT 1
Limit (cost=1.12..8.00 rows=1 width=6) (actual time=3855.722..3855.722 rows=0 loops=1)
-> Nested Loop (cost=1.12..660217.34 rows=96017 width=6) (actual time=3855.721..3855.721 rows=0 loops=1)
-> Index Only Scan using image_family_family_id_provider_data_id_idx on image_family fam (cost=0.56..458043.00 rows=96023 width=6) (actual time=3855.720..3855.720 rows=0 loops=1)
Filter: ((family_id)::numeric = '8419853'::numeric)
Rows Removed by Filter: 19204671
Heap Fetches: 368
-> Index Scan using "IX_DBO_PROVIDER_FILE_1" on provider_file pf (cost=0.56..2.11 rows=1 width=12) (never executed)
Index Cond: (provider_data_id = fam.provider_data_id)
Filter: (image_group_id IS NOT NULL)
The query plans differ in where the filter is applied for the Index Only Scan. The function's plan has more Heap Fetches and seems to treat the argument as a string cast to a numeric.
Things I've tried:
Increasing statistics (and running vacuum/analyze)
Calling the problematic piece of SQL in another function with language SQL
Add another index (the one it is using now to perform an INDEX ONLY scan)
Create a CTE for the image_family table (this did help performance but would still do a sequence scan on the image_family instead of using the index so still, too slow)
Change from executing raw SQL to using an EXECUTE ... INTO .. USING in the function.
Makeup of the two tables:
image_family:
provider_data_id: numeric(16)
family_id: int4
(rest omitted for brevity)
unique index on provider_data_id
index on family_id
I recently added a unique index on (family_id, provider_data_id) as well
Approximately 20 million rows here. Families have many provider_data_ids but not all provider_data_ids are part of families and thus aren't all in this table.
provider_file:
provider_data_id numeric(16)
image_group_id numeric(16)
(rest omitted for brevity)
unique index on provider_data_id
Approximately 32 million rows in this table. Most rows (> 95%) have a Non-Null image_group_id.
Postgres Version 10
How can I get the query performance to match whether I call it from a function or as raw SQL in a query tool?
The problem is exhibited in this line:
Filter: ((family_id)::numeric = '8419853'::numeric)
The index on family_id cannot be used because family_id is compared to a numeric value. This requires a cast to numeric, and there is no index on family_id::numeric.
Even though integer and numeric both are types representing numbers, their internal representation is quite different, and so the indexes are incompatible. In other words, the cast to numeric is like a function for PostgreSQL, and since it has no index on that functional expression, it has to resort to a scan of the whole table (or index).
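If the parameter type cannot be changed on the caller's side, an expression index on the cast would also make the comparison indexable (a sketch; note the double parentheses required around an expression):

```sql
CREATE INDEX ON programs.image_family ((family_id::numeric));
```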
The solution is simple, however: use an integer instead of a numeric parameter for the query. If in doubt, use a cast like
fam.family_id = $1::integer
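Applied to the query from the question, the only change needed is the explicit cast on the parameter (a sketch):

```sql
SELECT image_group_id
FROM   programs.image_family fam
JOIN   programs.provider_file pf
  ON   fam.provider_data_id = pf.provider_data_id
WHERE  fam.family_id = $1::integer  -- cast the parameter, not the column
AND    pf.image_group_id IS NOT NULL
LIMIT  1;
```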

Postgres: STABLE function called multiple times on constant

I have a PostgreSQL (version 9.4) performance puzzle. I have a function (prevd) declared as STABLE (see below). When I run this function on a constant in the WHERE clause, it is called multiple times instead of once.
If I understand postgres documentation correctly, the query should be optimized to call prevd only once.
A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement
Why doesn't it optimize the calls to prevd in this case?
I'm not expecting prevd to be called only once for all subsequent queries using prevd on the same argument (as if it were IMMUTABLE). I'm expecting Postgres to create a plan for my query with just one call to prevd('2015-12-12').
Please find the code below:
Schema
create table somedata(d date, number double precision);
create table dates(d date);
insert into dates
select generate_series::date
from generate_series('2015-01-01'::date, '2015-12-31'::date, '1 day');
insert into somedata
select '2015-01-01'::date + (random() * 365 + 1)::integer, random()
from generate_series(1, 100000);
create or replace function prevd(date_ date)
returns date
language sql
stable
as $$
select max(d) from dates where d < date_;
$$
Slow Query
select avg(number) from somedata where d=prevd('2015-12-12');
Poor query plan of the query above
Aggregate (cost=28092.74..28092.75 rows=1 width=8) (actual time=3532.638..3532.638 rows=1 loops=1)
Output: avg(number)
-> Seq Scan on public.somedata (cost=0.00..28091.43 rows=525 width=8) (actual time=10.210..3532.576 rows=282 loops=1)
Output: d, number
Filter: (somedata.d = prevd('2015-12-12'::date))
Rows Removed by Filter: 99718
Planning time: 1.144 ms
Execution time: 3532.688 ms
(8 rows)
Performance
The query above runs in around 3.5 s on my machine. After changing prevd to IMMUTABLE, that drops to 0.035 s.
I started writing this as a comment, but it got a bit long, so I'm expanding it into an answer.
As discussed in this previous answer, Postgres does not promise to always optimise based on STABLE or IMMUTABLE annotations, only that it can sometimes do so. It does this by planning the query differently by taking advantage of certain assumptions. This part of the previous answer is directly analogous to your case:
This particular sort of rewriting depends upon immutability or stability. With where test_multi_calls1(30) != num query re-writing will happen for immutable but not for merely stable functions.
If you change the function to IMMUTABLE and look at the query plan, you will see that the rewriting it does is really rather radical:
Seq Scan on public.somedata (cost=0.00..1791.00 rows=272 width=12) (actual time=0.036..14.549 rows=270 loops=1)
Output: d, number
Filter: (somedata.d = '2015-12-11'::date)
Buffers: shared read=541 written=14
Total runtime: 14.589 ms
It actually runs the function while planning the query, and substitutes the value before the query is even executed. With a STABLE function, this optimisation would clearly not be appropriate - the data might change between planning and executing the query.
In a comment, it was mentioned that this query results in an optimised plan:
select avg(number) from somedata where d=(select prevd(date '2015-12-12'));
This is fast, but note that the plan doesn't look anything like what the IMMUTABLE version did:
Aggregate (cost=1791.69..1791.70 rows=1 width=8) (actual time=14.670..14.670 rows=1 loops=1)
Output: avg(number)
Buffers: shared read=541 written=21
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
Output: '2015-12-11'::date
-> Seq Scan on public.somedata (cost=0.00..1791.00 rows=273 width=8) (actual time=0.026..14.589 rows=270 loops=1)
Output: d, number
Filter: (somedata.d = $0)
Buffers: shared read=541 written=21
Total runtime: 14.707 ms
By putting it into a sub-query, you are moving the function call from the WHERE clause to the SELECT clause. More importantly, the sub-query can always be executed once and used by the rest of the query; so the function is run once in a separate node of the plan.
To confirm this, we can take the SQL out of a function altogether:
select avg(number) from somedata where d=(select max(d) from dates where d < '2015-12-12');
This gives a rather longer plan with very similar performance:
Aggregate (cost=1799.12..1799.13 rows=1 width=8) (actual time=14.174..14.174 rows=1 loops=1)
Output: avg(somedata.number)
Buffers: shared read=543 written=19
InitPlan 1 (returns $0)
-> Aggregate (cost=7.43..7.44 rows=1 width=4) (actual time=0.150..0.150 rows=1 loops=1)
Output: max(dates.d)
Buffers: shared read=2
-> Seq Scan on public.dates (cost=0.00..6.56 rows=347 width=4) (actual time=0.015..0.103 rows=345 loops=1)
Output: dates.d
Filter: (dates.d < '2015-12-12'::date)
Buffers: shared read=2
-> Seq Scan on public.somedata (cost=0.00..1791.00 rows=273 width=8) (actual time=0.190..14.098 rows=270 loops=1)
Output: somedata.d, somedata.number
Filter: (somedata.d = $0)
Buffers: shared read=543 written=19
Total runtime: 14.232 ms
The important thing to note is that the inner Aggregate (the max(d)) is executed once, on a separate node from the main Seq Scan (which is checking the where clause). In this position, even a VOLATILE function can be optimised in the same way.
In short, while you know that the query you've produced can be optimised by executing the function only once, it doesn't match any of the patterns that Postgres's query planner knows how to rewrite, so it uses a naive plan which runs the function multiple times.
[Note: all tests performed on Postgres 9.1, because it's what I happened to have to hand.]

Postgres where query optimization

In our database we have a table menus having 515502 rows. It has a column status which is of type smallint.
Currently, a simple count query takes 700 ms for the set of rows having a status value of 2.
explain analyze select count(id) from menus where status = 2;
Aggregate (cost=72973.71..72973.72 rows=1 width=4) (actual time=692.564..692.565 rows=1 loops=1)
-> Bitmap Heap Scan on menus (cost=2510.63..72638.80 rows=133962 width=4) (actual time=28.179..623.077 rows=135429 loops=1)
Recheck Cond: (status = 2)
Rows Removed by Index Recheck: 199654
-> Bitmap Index Scan on menus_status (cost=0.00..2477.14 rows=133962 width=0) (actual time=26.211..26.211 rows=135429 loops=1)
Index Cond: (status = 2)
Total runtime: 692.705 ms
(7 rows)
For less frequent status values, the same query runs very fast:
explain analyze select count(id) from menus where status = 4;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=7198.73..7198.74 rows=1 width=4) (actual time=24.926..24.926 rows=1 loops=1)
-> Bitmap Heap Scan on menus (cost=40.53..7193.53 rows=2079 width=4) (actual time=1.461..23.418 rows=2220 loops=1)
Recheck Cond: (status = 4)
-> Bitmap Index Scan on menus_status (cost=0.00..40.02 rows=2079 width=0) (actual time=0.858..0.858 rows=2220 loops=1)
Index Cond: (status = 4)
Total runtime: 25.089 ms
(6 rows)
I observed that the general-purpose btree index is the best indexing strategy for simple equality-based queries. Both GIN and hash were slower than btree.
Any tips for making count queries faster for any filter that is using an index?
I understand that this is a beginner level question, so apologies in advance for any kind of mistakes I might have made.
Your table has many more rows with status = 2 than with status = 4, so the total table access time is much higher in the first case.
For status = 2 there are too many rows for the bitmap to fit in memory, so the bitmap of the Bitmap Heap Scan degrades to "lossy" mode, and the condition has to be rechecked for every row on the fetched pages. There are two things to consider: either your result is too big (and you can't do much about that without reorganizing your tables, say, with partitioning), or your work_mem parameter is too small to hold the intermediate bitmap. Try increasing its value if you can.
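For example, to test with a larger work_mem for the current session only (64MB is an arbitrary example value; if the bitmap then fits in memory, the "Rows Removed by Index Recheck" line should disappear from the plan):

```sql
SET work_mem = '64MB';  -- session-local; the default is typically 4MB
EXPLAIN ANALYZE SELECT count(id) FROM menus WHERE status = 2;
RESET work_mem;
```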

Partitioned table query still scanning all partitions

I have a table with over a billion records. In order to improve performance, I partitioned it to 30 partitions. The most frequent queries have (id = ...) in their where clause, so I decided to partition the table on the id column.
Basically, the partitions were created in this way:
CREATE TABLE foo_0 (CHECK (id % 30 = 0)) INHERITS (foo);
CREATE TABLE foo_1 (CHECK (id % 30 = 1)) INHERITS (foo);
CREATE TABLE foo_2 (CHECK (id % 30 = 2)) INHERITS (foo);
CREATE TABLE foo_3 (CHECK (id % 30 = 3)) INHERITS (foo);
.
.
.
I ran ANALYZE for the entire database and in particular, I made it collect extra statistics for this table's id column by running:
ALTER TABLE foo ALTER COLUMN id SET STATISTICS 10000;
However when I run queries that filter on the id column the planner shows that it's still scanning all the partitions. constraint_exclusion is set to partition, so that's not the problem.
EXPLAIN ANALYZE SELECT * FROM foo WHERE (id = 2);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.00..8106617.40 rows=3620981 width=54) (actual time=30.544..215.540 rows=171477 loops=1)
-> Append (cost=0.00..8106617.40 rows=3620981 width=54) (actual time=30.539..106.446 rows=171477 loops=1)
-> Seq Scan on foo (cost=0.00..0.00 rows=1 width=203) (actual time=0.002..0.002 rows=0 loops=1)
Filter: (id = 2)
-> Bitmap Heap Scan on foo_0 foo (cost=3293.44..281055.75 rows=122479 width=52) (actual time=0.020..0.020 rows=0 loops=1)
Recheck Cond: (id = 2)
-> Bitmap Index Scan on foo_0_idx_1 (cost=0.00..3262.82 rows=122479 width=0) (actual time=0.018..0.018 rows=0 loops=1)
Index Cond: (id = 2)
-> Bitmap Heap Scan on foo_1 foo (cost=3312.59..274769.09 rows=122968 width=56) (actual time=0.012..0.012 rows=0 loops=1)
Recheck Cond: (id = 2)
-> Bitmap Index Scan on foo_1_idx_1 (cost=0.00..3281.85 rows=122968 width=0) (actual time=0.010..0.010 rows=0 loops=1)
Index Cond: (id = 2)
-> Bitmap Heap Scan on foo_2 foo (cost=3280.30..272541.10 rows=121903 width=56) (actual time=30.504..77.033 rows=171477 loops=1)
Recheck Cond: (id = 2)
-> Bitmap Index Scan on foo_2_idx_1 (cost=0.00..3249.82 rows=121903 width=0) (actual time=29.825..29.825 rows=171477 loops=1)
Index Cond: (id = 2)
.
.
.
What could I do to make the planer have a better plan? Do I need to run ALTER TABLE foo ALTER COLUMN id SET STATISTICS 10000; for all the partitions as well?
EDIT
After using Erwin's suggested change to the query, the planner only scans the correct partition; however, the execution time is actually worse than a full scan (at least of the index).
EXPLAIN ANALYZE select * from foo where (id = 2);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.00..8106617.40 rows=3620981 width=54) (actual time=32.611..224.934 rows=171477 loops=1)
-> Append (cost=0.00..8106617.40 rows=3620981 width=54) (actual time=32.606..116.565 rows=171477 loops=1)
-> Seq Scan on foo (cost=0.00..0.00 rows=1 width=203) (actual time=0.002..0.002 rows=0 loops=1)
Filter: (id = 2)
-> Bitmap Heap Scan on foo_0 foo (cost=3293.44..281055.75 rows=122479 width=52) (actual time=0.046..0.046 rows=0 loops=1)
Recheck Cond: (id = 2)
-> Bitmap Index Scan on foo_0_idx_1 (cost=0.00..3262.82 rows=122479 width=0) (actual time=0.044..0.044 rows=0 loops=1)
Index Cond: (id = 2)
-> Bitmap Heap Scan on foo_1 foo (cost=3312.59..274769.09 rows=122968 width=56) (actual time=0.021..0.021 rows=0 loops=1)
Recheck Cond: (id = 2)
-> Bitmap Index Scan on foo_1_idx_1 (cost=0.00..3281.85 rows=122968 width=0) (actual time=0.020..0.020 rows=0 loops=1)
Index Cond: (id = 2)
-> Bitmap Heap Scan on foo_2 foo (cost=3280.30..272541.10 rows=121903 width=56) (actual time=32.536..86.730 rows=171477 loops=1)
Recheck Cond: (id = 2)
-> Bitmap Index Scan on foo_2_idx_1 (cost=0.00..3249.82 rows=121903 width=0) (actual time=31.842..31.842 rows=171477 loops=1)
Index Cond: (id = 2)
-> Bitmap Heap Scan on foo_3 foo (cost=3475.87..285574.05 rows=129032 width=52) (actual time=0.035..0.035 rows=0 loops=1)
Recheck Cond: (id = 2)
-> Bitmap Index Scan on foo_3_idx_1 (cost=0.00..3443.61 rows=129032 width=0) (actual time=0.031..0.031 rows=0 loops=1)
.
.
.
-> Bitmap Heap Scan on foo_29 foo (cost=3401.84..276569.90 rows=126245 width=56) (actual time=0.019..0.019 rows=0 loops=1)
Recheck Cond: (id = 2)
-> Bitmap Index Scan on foo_29_idx_1 (cost=0.00..3370.28 rows=126245 width=0) (actual time=0.018..0.018 rows=0 loops=1)
Index Cond: (id = 2)
Total runtime: 238.790 ms
Versus:
EXPLAIN ANALYZE select * from foo where (id % 30 = 2) and (id = 2);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.00..273120.30 rows=611 width=56) (actual time=31.519..257.051 rows=171477 loops=1)
-> Append (cost=0.00..273120.30 rows=611 width=56) (actual time=31.516..153.356 rows=171477 loops=1)
-> Seq Scan on foo (cost=0.00..0.00 rows=1 width=203) (actual time=0.002..0.002 rows=0 loops=1)
Filter: ((id = 2) AND ((id % 30) = 2))
-> Bitmap Heap Scan on foo_2 foo (cost=3249.97..273120.30 rows=610 width=56) (actual time=31.512..124.177 rows=171477 loops=1)
Recheck Cond: (id = 2)
Filter: ((id % 30) = 2)
-> Bitmap Index Scan on foo_2_idx_1 (cost=0.00..3249.82 rows=121903 width=0) (actual time=30.816..30.816 rows=171477 loops=1)
Index Cond: (id = 2)
Total runtime: 270.384 ms
For non-trivial expressions, you have to repeat the condition more or less verbatim in your queries to make the Postgres query planner understand that it can rely on the CHECK constraint. Even if it seems redundant!
Per documentation:
With constraint exclusion enabled, the planner will examine the
constraints of each partition and try to prove that the partition need
not be scanned because it could not contain any rows meeting the
query's WHERE clause. When the planner can prove this, it excludes
the partition from the query plan.
Bold emphasis mine. The planner does not understand complex expressions.
Of course, this has to be met, too:
Ensure that the constraint_exclusion configuration parameter is not
disabled in postgresql.conf. If it is, queries will not be optimized as desired.
Instead of
SELECT * FROM foo WHERE (id = 2);
Try:
SELECT * FROM foo WHERE id % 30 = 2 AND id = 2;
And:
The default (and recommended) setting of constraint_exclusion is
actually neither on nor off, but an intermediate setting called
partition, which causes the technique to be applied only to queries
that are likely to be working on partitioned tables. The on setting
causes the planner to examine CHECK constraints in all queries, even
simple ones that are unlikely to benefit.
You can experiment with constraint_exclusion = on to see whether the planner catches on without the redundant verbatim condition. But you have to weigh the cost and benefit of this setting.
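For a quick, session-local experiment (a sketch only; foo is the table from the question), you can change the setting without touching postgresql.conf and inspect the resulting plan:

SET constraint_exclusion = on;              -- affects this session only
EXPLAIN SELECT * FROM foo WHERE id = 2;     -- check whether partitions are excluded
RESET constraint_exclusion;                 -- back to the configured default

If the plan still appends all partitions, the planner could not prove exclusion from the CHECK constraints alone, and you are back to repeating the modulus condition in the query.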
The alternative would be simpler conditions for your partitions, as already outlined by @harmic.
And no, increasing the STATISTICS target will not help in this case. Only the CHECK constraints and the WHERE conditions of your query matter.
Unfortunately, partitioning in PostgreSQL is fairly primitive. Constraint exclusion only works for range- and list-based constraints. Your partition constraints are too complex for the query planner to use when deciding whether to exclude partitions.
In the manual it says:
Keep the partitioning constraints simple, else the planner may not be
able to prove that partitions don't need to be visited. Use simple
equality conditions for list partitioning, or simple range tests for
range partitioning, as illustrated in the preceding examples. A good
rule of thumb is that partitioning constraints should contain only
comparisons of the partitioning column(s) to constants using
B-tree-indexable operators.
You might get away with changing your WHERE clause so that the modulus expression is explicitly mentioned, as Erwin suggested. I haven't had much luck with that in the past, although I have not tried recently and as he says, there have been improvements in the planner. That is probably the first thing to try.
Otherwise, you will have to rearrange your partitions to use ranges of id values instead of the modulus method you are using now. Not a great solution, I know.
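For illustration only (the boundary values are made up), range-based partitions built with the inheritance method would carry constraints like:

CREATE TABLE foo_0 (CHECK (id >= 0       AND id < 1000000)) INHERITS (foo);
CREATE TABLE foo_1 (CHECK (id >= 1000000 AND id < 2000000)) INHERITS (foo);
-- ... one child table per range

With constraints of this shape, a plain WHERE id = 2 is enough for the planner to exclude every partition except foo_0; no redundant modulus term is needed in the query.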
One other solution is to store the modulus of the id in a separate column, which you can then use in a simple equality check for the partition constraint. It is a bit of a waste of disk space, though, and you would also need to add a term to the WHERE clauses to boot.
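A sketch of that approach (the column name id_mod and constraint name are hypothetical):

ALTER TABLE foo ADD COLUMN id_mod int;   -- holds id % 30, filled in on insert
-- on each partition, e.g. foo_2:
ALTER TABLE foo_2 ADD CONSTRAINT foo_2_id_mod_check CHECK (id_mod = 2);
-- queries then carry the extra, planner-friendly term:
SELECT * FROM foo WHERE id_mod = 2 AND id = 2;

Because the constraint is now a simple equality on a plain column, constraint exclusion can prove which partition to scan without any expression matching.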
In addition to Erwin's answer about the details of the how the planner works with partitions, there is a larger issue here.
Partitioning is not a magic bullet. There are a handful of very specific things for which partitioning is very useful. If none of those very specific things apply to you, then you cannot expect a performance improvement from partitioning, and most likely will get a decrease instead.
To do partitioning correctly, you need a thorough understanding of your usage patterns, or your data loading and unloading patterns.