Small result query with LIMIT 1000x slower than queries with >100 rows - sql

I am trying to debug a query that runs faster the more records it returns, but whose performance degrades severely (>10x slower) when it returns few rows (i.e. <10) and uses a small LIMIT (e.g. 10).
Example:
Fast query with 5 results out of 1M rows - no LIMIT
SELECT *
FROM transaction_internal_by_addresses
WHERE address = 'foo'
ORDER BY block_number desc;
Explain:
Sort (cost=7733.14..7749.31 rows=6468 width=126) (actual time=0.030..0.031 rows=5 loops=1)
" Output: address, block_number, log_index, transaction_hash"
Sort Key: transaction_internal_by_addresses.block_number
Sort Method: quicksort Memory: 26kB
Buffers: shared hit=10
-> Index Scan using transaction_internal_by_addresses_pkey on public.transaction_internal_by_addresses (cost=0.69..7323.75 rows=6468 width=126) (actual time=0.018..0.021 rows=5 loops=1)
" Output: address, block_number, log_index, transaction_hash"
Index Cond: (transaction_internal_by_addresses.address = 'foo'::text)
Buffers: shared hit=10
Query Identifier: -8912211611755432198
Planning Time: 0.051 ms
Execution Time: 0.041 ms
Fast query with 5 results out of 1M rows: - High LIMIT
SELECT *
FROM transaction_internal_by_addresses
WHERE address = 'foo'
ORDER BY block_number desc
LIMIT 100;
Limit (cost=7570.95..7571.20 rows=100 width=126) (actual time=0.024..0.025 rows=5 loops=1)
" Output: address, block_number, log_index, transaction_hash"
Buffers: shared hit=10
-> Sort (cost=7570.95..7587.12 rows=6468 width=126) (actual time=0.023..0.024 rows=5 loops=1)
" Output: address, block_number, log_index, transaction_hash"
Sort Key: transaction_internal_by_addresses.block_number DESC
Sort Method: quicksort Memory: 26kB
Buffers: shared hit=10
-> Index Scan using transaction_internal_by_addresses_pkey on public.transaction_internal_by_addresses (cost=0.69..7323.75 rows=6468 width=126) (actual time=0.016..0.020 rows=5 loops=1)
" Output: address, block_number, log_index, transaction_hash"
Index Cond: (transaction_internal_by_addresses.address = 'foo'::text)
Buffers: shared hit=10
Query Identifier: 3421253327669991203
Planning Time: 0.042 ms
Execution Time: 0.034 ms
Slow query: - Low LIMIT
SELECT *
FROM transaction_internal_by_addresses
WHERE address = 'foo'
ORDER BY block_number desc
LIMIT 10;
Explain result:
Limit (cost=1000.63..6133.94 rows=10 width=126) (actual time=10277.845..11861.269 rows=0 loops=1)
" Output: address, block_number, log_index, transaction_hash"
Buffers: shared hit=56313576
-> Gather Merge (cost=1000.63..3333036.90 rows=6491 width=126) (actual time=10277.844..11861.266 rows=0 loops=1)
" Output: address, block_number, log_index, transaction_hash"
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=56313576
-> Parallel Index Scan Backward using transaction_internal_by_address_idx_block_number on public.transaction_internal_by_addresses (cost=0.57..3331263.70 rows=1623 width=126) (actual time=10256.995..10256.995 rows=0 loops=5)
" Output: address, block_number, log_index, transaction_hash"
Filter: (transaction_internal_by_addresses.address = 'foo'::text)
Rows Removed by Filter: 18485480
Buffers: shared hit=56313576
Worker 0: actual time=10251.822..10251.823 rows=0 loops=1
Buffers: shared hit=11387166
Worker 1: actual time=10250.971..10250.972 rows=0 loops=1
Buffers: shared hit=10215941
Worker 2: actual time=10252.269..10252.269 rows=0 loops=1
Buffers: shared hit=10191990
Worker 3: actual time=10252.513..10252.514 rows=0 loops=1
Buffers: shared hit=10238279
Query Identifier: 2050754902087402293
Planning Time: 0.081 ms
Execution Time: 11861.297 ms
DDL
create table transaction_internal_by_addresses
(
address text not null,
block_number bigint,
log_index bigint not null,
transaction_hash text not null,
primary key (address, log_index, transaction_hash)
);
alter table transaction_internal_by_addresses
owner to "icon-worker";
create index transaction_internal_by_address_idx_block_number
on transaction_internal_by_addresses (block_number);
So my questions
Should I just be looking at ways to force the query planner to apply the WHERE on the address (primary key)?
As you can see in the explain, the row block_number is scanned in the slow query but I am not sure why. Can anyone explain?
Is this normal? Seems like the more data, the harder the query, not the other way around as in this case.
Update
Apologies for A, the delay in responding and B, some of the inconsistencies in this question.
I have updated the EXPLAIN clearly showing the 1000x performance degradation

A multicolumn BTREE index on (address, block_number DESC) is exactly what the query planner needs to generate the result sets you mentioned. It will random-access the index to the first eligible row, then read the rows out in sequential order until it hits the LIMIT. You can also omit the DESC with no ill effects.
create index address_block_number
on transaction_internal_by_addresses
(address, block_number DESC);
As for asking "why" about query planner results, that's often an enduring mystery.
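The numbers in the slow plan do hint at one plausible reason, though. PostgreSQL prices a Limit node by interpolating linearly between the startup and total cost of its input, on the assumption that matching rows are spread evenly through the scan. The sketch below reproduces that arithmetic in Python from the figures in the question's plans. The function name limit_cost is made up for illustration, and reusing the LIMIT 10 plan's Gather Merge estimates for the LIMIT 100 case is an assumption.

```python
def limit_cost(startup: float, total: float, est_rows: float, limit: int) -> float:
    """Estimated cost of stopping a scan after `limit` of its `est_rows` rows."""
    fraction = min(1.0, limit / est_rows)
    return startup + (total - startup) * fraction


# Gather Merge over the backward scan of the block_number index (slow plan).
GM_STARTUP, GM_TOTAL, GM_ROWS = 1000.63, 3333036.90, 6491
# Total cost of the primary-key index scan that feeds the Sort in the fast plans.
# A Sort has to read all of its input first, so that plan cannot cost less.
PKEY_SCAN_TOTAL = 7323.75

cost_limit_10 = limit_cost(GM_STARTUP, GM_TOTAL, GM_ROWS, 10)
cost_limit_100 = limit_cost(GM_STARTUP, GM_TOTAL, GM_ROWS, 100)

print(f"LIMIT 10 : {cost_limit_10:.2f} vs sort plan >= {PKEY_SCAN_TOTAL}")
print(f"LIMIT 100: {cost_limit_100:.2f} vs sort plan >= {PKEY_SCAN_TOTAL}")
```

With LIMIT 10 the interpolated cost matches the 6133.94 shown in the plan to within rounding and undercuts the sort-based plan; with LIMIT 100 it does not. The estimate only holds if roughly 6491 matching rows really exist. With 5 or fewer, the scan never collects 10 rows and has to walk the whole block_number index, which is consistent with the millions of rows removed by the filter.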

Sub-millisecond differences are hardly predictable, so you're pretty much staring at noise: random minuscule differences caused by other things happening on the system. Your fastest query runs in tens of microseconds, your slowest in a single millisecond. All of these are below typical network, mouse click, and screen refresh latencies.
The planner already applies a where on your address: Index Cond: (a_table.address = 'foo'::text)
You're ordering by block_number, so it makes sense to scan it. It's also taking place in all three of your queries because they all do that.
It is normal - here's an online demo with similar differences. If what you're after is some reliable time estimation, use pgbench to run your queries multiple times and average out the timings.
Your third query plan seems to be coming from a different query, against a different table: a_table, compared to the initial two: transaction_internal_by_addresses.
If you were just wondering why these timings look like that, it's pretty much random and/or irrelevant at this level. If you're facing some kind of problem because of how these queries behave, it'd be better to focus on describing that problem - the queries themselves all do the same thing and the difference in their execution times is negligible.

Should I just be looking at ways to force the query planner to apply the WHERE on the address (primary key)?
Yes, it can improve performance.
As you can see in the explain, the row block_number is scanned in the slow query but I am not sure why. Can anyone explain?
Because Sort keys are different. Look carefully:
Sort Key: transaction_internal_by_addresses.block_number DESC
Sort Key: a_table.a_indexed_row DESC
It seems a_table.a_indexed_row is less performant (e.g. more columns, a more complex structure, etc.).
Is this normal? Seems like the more data, the harder the query, not the other way around as in this case.
Normally, more data takes more time. But as I mentioned above, maybe a_table.a_indexed_row returns more values, has more columns, etc.

Related

How to speed up query execution in a table that has more than 100 000 000 rows and has a WHERE and ORDER (order by timestamp) clause?

I have a problem related to SQL query optimization. I'm working in a PostgreSQL database, but I assume it's more or less the same for all databases.
The problem is this:
I have a table with more than 100 000 000 rows and I need to select only the first 20/25 rows that match the condition in the WHERE clause. Additionally, the rows must be the most recently added to the table. The query is simple:
SELECT *
FROM transactions
WHERE transactions.from = '0xfbde4acae6c489197280635f0fa172148c61838b'
OR transactions.to = '0xfbde4acae6c489197280635f0fa172148c61838b'
ORDER BY transactions.timestamp DESC
LIMIT 25;
EXPLAIN (ANALYZE, VERBOSE, BUFFERS) for the above query:
"QUERY PLAN"
"Limit (cost=2174753.21..2174756.12 rows=25 width=324) (actual time=5225.725..5242.938 rows=25 loops=1)"
" Output: hash, block_hash, block_number, ""from"", ""to"", gas, gas_used, gas_price, nonce, transaction_index, value, contract_address, status, ""timestamp"""
" Buffers: shared hit=17 read=146499 dirtied=6 written=379"
" -> Gather Merge (cost=2174753.21..2214827.74 rows=343472 width=324) (actual time=5225.723..5242.935 rows=25 loops=1)"
" Output: hash, block_hash, block_number, ""from"", ""to"", gas, gas_used, gas_price, nonce, transaction_index, value, contract_address, status, ""timestamp"""
" Workers Planned: 2"
" Workers Launched: 2"
" Buffers: shared hit=17 read=146499 dirtied=6 written=379"
" -> Sort (cost=2173753.18..2174182.52 rows=171736 width=324) (actual time=5212.330..5212.332 rows=19 loops=3)"
" Output: hash, block_hash, block_number, ""from"", ""to"", gas, gas_used, gas_price, nonce, transaction_index, value, contract_address, status, ""timestamp"""
" Sort Key: transactions.""timestamp"" DESC"
" Sort Method: top-N heapsort Memory: 40kB"
" Buffers: shared hit=17 read=146499 dirtied=6 written=379"
" Worker 0: actual time=5205.779..5205.781 rows=25 loops=1"
" Sort Method: top-N heapsort Memory: 39kB"
" Buffers: shared hit=5 read=49090 dirtied=2 written=117"
" Worker 1: actual time=5205.776..5205.778 rows=25 loops=1"
" Sort Method: top-N heapsort Memory: 42kB"
" Buffers: shared hit=7 read=49181 dirtied=2 written=134"
" -> Parallel Bitmap Heap Scan on public.transactions (cost=6130.59..2168906.92 rows=171736 width=324) (actual time=33.562..5167.871 rows=131846 loops=3)"
" Output: hash, block_hash, block_number, ""from"", ""to"", gas, gas_used, gas_price, nonce, transaction_index, value, contract_address, status, ""timestamp"""
" Recheck Cond: ((transactions.""from"" = '0xfbde4acae6c489197280635f0fa172148c61838b'::bpchar) OR (transactions.""to"" = '0xfbde4acae6c489197280635f0fa172148c61838b'::bpchar))"
" Rows Removed by Index Recheck: 663904"
" Heap Blocks: exact=13510 lossy=33729"
" Buffers: shared hit=5 read=146497 dirtied=6 written=379"
" Worker 0: actual time=26.980..5162.413 rows=133634 loops=1"
" Buffers: shared read=49088 dirtied=2 written=117"
" Worker 1: actual time=27.062..5162.932 rows=132710 loops=1"
" Buffers: shared read=49181 dirtied=2 written=134"
" -> BitmapOr (cost=6130.59..6130.59 rows=412182 width=0) (actual time=40.743..40.744 rows=0 loops=1)"
" Buffers: shared hit=5 read=349"
" -> Bitmap Index Scan on from_idx (cost=0.00..5840.70 rows=406417 width=0) (actual time=40.487..40.487 rows=401210 loops=1)"
" Index Cond: (transactions.""from"" = '0xfbde4acae6c489197280635f0fa172148c61838b'::bpchar)"
" Buffers: shared hit=3 read=347"
" -> Bitmap Index Scan on to_idx (cost=0.00..83.81 rows=5765 width=0) (actual time=0.254..0.254 rows=124 loops=1)"
" Index Cond: (transactions.""to"" = '0xfbde4acae6c489197280635f0fa172148c61838b'::bpchar)"
" Buffers: shared hit=2 read=2"
"Planning Time: 0.108 ms"
"Execution Time: 5243.004 ms"
The problem is that this query takes too long to execute (4+ seconds; sometimes more than 8 seconds). Of course, the reason is the order of execution of the clauses (FROM -> WHERE -> SELECT -> ORDER BY -> LIMIT).
So it goes through all 100 000 000 rows, selects the rows that match the WHERE clause, orders them, and finally takes the first 25 rows.
It is important to note that I have indexes over the transactions.from and transactions.to columns
I am also running the following query to see how many rows match the WHERE clause in total:
SELECT count(*)
FROM transactions
WHERE transactions.from = '0xfbde4acae6c489197280635f0fa172148c61838b'
OR transactions.to = '0xfbde4acae6c489197280635f0fa172148c61838b'
The strange thing is that when the second query returns a small number, the first query executes fast (up to 300 milliseconds). When it returns a large count, the first query is slow. So, I guess the ORDER is the problem, because if the WHERE clause gives back, for example, 200 000 matches, it must order it and that takes some time. So, my question is: is there a way to optimize this? Using a non-clustered index on the timestamp would not make sense to me (or maybe it would?). Has anyone had a similar problem, and does anyone know how to speed up query execution?
PostgreSQL is rather weak in this regard. I don't know if other DBMS are better, but it is at least questionable to assume they are all the same. You can get good performance, but it requires you to do the heavy lifting yourself, by writing the query in a rather contorted way:
EXPLAIN (ANALYZE, BUFFERS)
(select * from transactions WHERE transactions.from = 871 order by timestamp desc limit 25)
union all
(select * from transactions where transactions.to = 871 order by timestamp desc limit 25)
ORDER BY timestamp DESC LIMIT 25;
This will give you a fast plan as long as you have two two-column indexes, on ("from", timestamp) and ("to", timestamp). It will read the tuples already ordered using each index, then merge them with a Merge Append. Note that if a row qualifies on both the "from" and the "to", that row will be returned twice, so this query is not formally identical to your current one.
Note that the 2 "inner" ORDER BY and LIMIT specifications are not necessary to get the correct answer, but they are necessary to get the fast plan. Otherwise it resorts to a slow plan similar to the one you already have.
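The idea behind that plan can be sketched outside the database. The Python below is only an analogy, not what PostgreSQL executes: two lists stand in for the ("from", timestamp) and ("to", timestamp) index scans, already in descending timestamp order, and heapq.merge plays the role of Merge Append. The sample rows and the name newest are invented for illustration.

```python
import heapq
from itertools import islice

# (timestamp, transaction id), newest first, as an Index Scan Backward would return them.
from_matches = [(90, "tx9"), (70, "tx7"), (50, "tx5"), (10, "tx1")]
to_matches = [(80, "tx8"), (70, "tx7"), (20, "tx2")]  # tx7 has from = to


def newest(limit: int) -> list:
    """Merge two pre-sorted inputs lazily and stop after `limit` rows."""
    # Each branch needs at most `limit` rows (the inner LIMIT in the query).
    branches = (islice(from_matches, limit), islice(to_matches, limit))
    merged = heapq.merge(*branches, reverse=True)  # inputs are sorted descending
    return list(islice(merged, limit))  # the outer LIMIT


print(newest(3))
print(newest(4))
```

newest(4) returns tx7 twice, which is the duplicate mentioned above for a row that qualifies on both columns.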
This is the fast plan:
Limit (cost=0.88..101.07 rows=25 width=16) (actual time=15.738..62.608 rows=25 loops=1)
Buffers: shared hit=10 read=22
I/O Timings: shared/local read=61.744
-> Merge Append (cost=0.88..201.26 rows=50 width=16) (actual time=15.734..62.579 rows=25 loops=1)
Sort Key: transactions."timestamp" DESC
Buffers: shared hit=10 read=22
I/O Timings: shared/local read=61.744
-> Limit (cost=0.43..100.37 rows=25 width=16) (actual time=14.636..50.620 rows=16 loops=1)
Buffers: shared hit=4 read=15
I/O Timings: shared/local read=50.054
-> Index Scan Backward using transactions_from_timestamp_idx on transactions (cost=0.43..4009.98 rows=1003 width=16) (actual time=14.634..50.589 rows=16 loops=1)
Index Cond: ("from" = 19)
Buffers: shared hit=4 read=15
I/O Timings: shared/local read=50.054
-> Limit (cost=0.43..100.37 rows=25 width=16) (actual time=1.092..11.910 rows=10 loops=1)
Buffers: shared hit=6 read=7
I/O Timings: shared/local read=11.690
-> Index Scan Backward using transactions_to_timestamp_idx on transactions transactions_1 (cost=0.43..3997.92 rows=1000 width=16) (actual time=1.089..11.900 rows=10 loops=1)
Index Cond: ("to" = 19)
Buffers: shared hit=6 read=7
I/O Timings: shared/local read=11.690
Planning Time: 0.898 ms
Execution Time: 62.808 ms
For demo purposes I converted the column types to int, for ease of random data generation. But that should have no meaningful effect on the performance.
So, I guess the ORDER is the problem, because if the WHERE clause gives back, for example, 200 000 matches, it must order it and that takes some time
If you look at your original plan, the Sort only takes a trivial amount of time, less than 1% of the total time: (5212.332-5167.871) / 5242.938. The ORDER is the problem only indirectly, because, as a precursor to sorting 131846 rows, you first need to read 131846 rows scattered randomly across your table, and that takes a lot of random IO, which is slow. By reading them already in order and preserving that order with Merge Append, the fast plan gets to stop early once it reaches the LIMIT.
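As a quick check of that arithmetic, the figures can be plugged in directly. The timings and the page count come from the original plan; the 8 kB page size is PostgreSQL's default and is an assumption about this server.

```python
# Timings (ms) copied from the original plan.
sort_done = 5212.332        # Sort: time at which its last row was emitted
heap_scan_done = 5167.871   # Parallel Bitmap Heap Scan: time at which it finished
limit_done = 5242.938       # Limit: end of the top-level node

sort_share = (sort_done - heap_scan_done) / limit_done

# Pages fetched from outside shared buffers, from "Buffers: ... read=146499".
pages_read = 146499
page_bytes = 8192  # assumed default block size
gib_read = pages_read * page_bytes / 2**30

print(f"sort share of runtime: {sort_share:.2%}")
print(f"data read: {gib_read:.2f} GiB")
```

The second figure gives a feel for how much I/O sits behind the heap scan compared with the few dozen pages touched by the fast plan.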

Understanding SQL EXPLAIN on a JOIN query

I'm struggling to make sense of postgres EXPLAIN to figure out why my query is slow. Can someone help? This is my query, it's a pretty simple join:
SELECT DISTINCT graph_concepts.*
FROM graph_concepts
INNER JOIN graph_family_links
ON graph_concepts.id = graph_family_links.descendent_concept_id
WHERE graph_family_links.ancestor_concept_id = 1016
AND graph_family_links.generation = 1
AND graph_concepts.state != 2
It's starting from a concept and it's getting a bunch of related concepts through the links table.
Notably, I have an index on graph_family_links.descendent_concept_id, yet this query takes about 3 seconds to return a result. This is way too long for my purposes.
This is the SQL explain:
Unique (cost=46347.01..46846.16 rows=4485 width=108) (actual time=27.406..33.667 rows=13 loops=1)
Buffers: shared hit=13068 read=5
I/O Timings: read=0.074
-> Gather Merge (cost=46347.01..46825.98 rows=4485 width=108) (actual time=27.404..33.656 rows=13 loops=1)
Workers Planned: 1
Workers Launched: 1
Buffers: shared hit=13068 read=5
I/O Timings: read=0.074
-> Sort (cost=45347.01..45348.32 rows=2638 width=108) (actual time=23.618..23.621 rows=6 loops=2)
Sort Key: graph_concepts.id, graph_concepts.definition, graph_concepts.checkvist_task_id, graph_concepts.primary_question_id, graph_concepts.created_at, graph_concepts.updated_at, graph_concepts.tsn, graph_concepts.state, graph_concepts.search_phrases
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=13068 read=5
I/O Timings: read=0.074
Worker 0: Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=301.97..45317.02 rows=2638 width=108) (actual time=8.890..23.557 rows=6 loops=2)
Buffers: shared hit=13039 read=5
I/O Timings: read=0.074
-> Parallel Bitmap Heap Scan on graph_family_links (cost=301.88..39380.60 rows=2640 width=4) (actual time=8.766..23.293 rows=6 loops=2)
Recheck Cond: (ancestor_concept_id = 1016)
Filter: (generation = 1)
Rows Removed by Filter: 18850
Heap Blocks: exact=2558
Buffers: shared hit=12985
-> Bitmap Index Scan on index_graph_family_links_on_ancestor_concept_id (cost=0.00..301.66 rows=38382 width=0) (actual time=4.744..4.744 rows=47346 loops=1)
Index Cond: (ancestor_concept_id = 1016)
Buffers: shared hit=67
-> Index Scan using graph_concepts_pkey on graph_concepts (cost=0.08..2.25 rows=1 width=108) (actual time=0.036..0.036 rows=1 loops=13)
Index Cond: (id = graph_family_links.descendent_concept_id)
Filter: (state <> 2)
Buffers: shared hit=54 read=5
I/O Timings: read=0.074
Planning:
Buffers: shared hit=19
Planning Time: 0.306 ms
Execution Time: 33.747 ms
(35 rows)
I'm doing lots of googling to help me figure out how to read this EXPLAIN and I'm struggling. Can someone help translate this into plain english for me?
Answering myself (for the benefit of future people):
My question was primarily how to understand EXPLAIN. Many people below contributed to my understanding but no one really gave me the beginner unpacking I was looking for. I want to teach myself to fish rather than simply having other people read this and give me advice on solving this specific issue, although I do greatly appreciate the specific suggestions!
For others trying to understand EXPLAIN, this is the important context you need to know, which was holding me back:
"Cost" is an arbitrary unit for how expensive the planner estimates each step to be. The two numbers in cost=A..B are the estimated startup cost (before the first row can be returned) and the estimated total cost; they are estimates, not stopwatch readings.
Look near the end of your EXPLAIN for the most deeply indented nodes; execution starts there. In my case, the Bitmap Index Scan (cost=0.00..301.66) feeds the Bitmap Heap Scan, and the Index Scan on graph_concepts (cost=0.08..2.25) is the inner side of the Nested Loop, so its cost is paid once per outer row.
Find the step that contributes the most cost of its own. A parent's cost includes its children's, which confused me at first: cost=301.97..45317.02 on the Nested Loop already contains the cost=301.88..39380.60 of the Parallel Bitmap Heap Scan beneath it, so the heap scan is the main contributor.
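To make the cost figures easier to compare, the pairs can be pulled out of the plan text. This is a minimal sketch in Python, on the understanding that PostgreSQL documents the pair as estimated startup cost and estimated total cost, and that a parent node's total includes its children. The helper name parse_cost and the regular expression are my own.

```python
import re

COST_RE = re.compile(r"cost=(\d+\.\d+)\.\.(\d+\.\d+)")


def parse_cost(plan_line: str) -> tuple:
    """Return (startup_cost, total_cost) from one EXPLAIN line."""
    match = COST_RE.search(plan_line)
    if match is None:
        raise ValueError("no cost=... found in line")
    return float(match.group(1)), float(match.group(2))


# Lines copied from the plan above, outermost node first.
plan_lines = [
    "Nested Loop (cost=301.97..45317.02 rows=2638 width=108)",
    "Parallel Bitmap Heap Scan on graph_family_links (cost=301.88..39380.60 rows=2640 width=4)",
    "Bitmap Index Scan on index_graph_family_links_on_ancestor_concept_id (cost=0.00..301.66 rows=38382 width=0)",
]

totals = [parse_cost(line)[1] for line in plan_lines]
loop_total, heap_total, index_total = totals

# Share of the Nested Loop's estimated total that is already spent in the heap scan.
heap_share = heap_total / loop_total
print(f"heap scan accounts for {heap_share:.0%} of the join's estimated cost")
```

Read this way, the heap scan on graph_family_links is where most of the estimated cost accumulates, and the Bitmap Index Scan beneath it is comparatively cheap.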
SELECT DISTINCT
graph_concepts.*
FROM
graph_concepts
INNER JOIN graph_family_links ON graph_concepts.id = graph_family_links.descendent_concept_id
WHERE
graph_family_links.ancestor_concept_id = 1016
AND graph_family_links.generation = 1
AND graph_concepts.state != 2
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=46347.01..46846.16 rows=4485 width=108)
## (Merge records DISTINCT)
-> Gather Merge (cost=46347.01..46825.98 rows=4485 width=108)
Workers Planned: 1
## (Sort table graph_concepts.* )
-> Sort (cost=45347.01..45348.32 rows=2638 width=108)
Sort Key: graph_concepts.id, graph_concepts.definition, graph_concepts.checkvist_task_id, graph_concepts.primary_question_id, graph_concepts.created_at, graph_concepts.updated_at, graph_concepts.tsn, graph_concepts.state, graph_concepts.search_phrases
-> Nested Loop (cost=301.97..45317.02 rows=2638 width=108)
## WHERE graph_family_links.ancestor_concept_id = 1016 (Use Parallel Bitmap Heap Scan table and filter record)
-> Parallel Bitmap Heap Scan on graph_family_links (cost=301.88..39380.60 rows=2640 width=4)
Recheck Cond: (ancestor_concept_id = 1016)
Filter: (generation = 1)
## AND graph_family_links.generation = 1 (Use Bitmap Index Scan table and filter record)
-> Bitmap Index Scan on index_graph_family_links_on_ancestor_concept_id (cost=0.00..301.66 rows=38382 width=0)
Index Cond: (ancestor_concept_id = 1016)
## AND graph_concepts.state != 2 (Use Index Scan table and filter record)
-> Index Scan using graph_concepts_pkey on graph_concepts (cost=0.08..2.25 rows=1 width=108)
Index Cond: (id = graph_family_links.descendent_concept_id)
Filter: (state <> 2)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Please refer to the below sql script.
SELECT DISTINCT graph_concepts.*
FROM graph_concepts
INNER JOIN (select descendent_concept_id from graph_family_links where ancestor_concept_id = 1016 and generation = 1) A ON graph_concepts.id = A.descendent_concept_id
WHERE graph_concepts.state != 2
EXPLAIN output is generally read from the bottom up.
https://www.postgresql.org/docs/current/sql-explain.html
This command displays the execution plan that the PostgreSQL planner
generates for the supplied statement. The execution plan shows how the
table(s) referenced by the statement will be scanned — by plain
sequential scan, index scan, etc. — and if multiple tables are
referenced, what join algorithms will be used to bring together the
required rows from each input table.
If you run plain EXPLAIN on your SELECT, the output is only the execution plan that the planner generates for the query. If you run EXPLAIN ANALYZE, the query is planned and also executed.
First, there are two tables. For each table, PostgreSQL uses some method to find the rows that meet the WHERE criteria. An Index Scan (one way of locating rows) finds the rows in graph_concepts that meet the condition graph_concepts.state != 2.
Meanwhile (in parallel), a Bitmap Heap Scan finds the rows in graph_family_links that meet the criterion graph_family_links.ancestor_concept_id = 1016.
After that comes the join operation. In this case, it's a Nested Loop.
After the join comes the Sort. Why do we need a sort operation? Because you specified SELECT DISTINCT: https://www.postgresql.org/docs/current/sql-select.html
After the sort, since you specified the keyword DISTINCT, the duplicates are eliminated.
People have given you general links and advice on understanding plans. To focus on one relevant part of the plan: it expects to find 38382 rows satisfying ancestor_concept_id = 1016, but then well over 90% of them are expected to fail the generation = 1 filter. That is expensive, because for each of those rows it has to jump to some random table page to fetch the "generation" value before it can apply the filter.
If you had a combined index on (ancestor_concept_id, generation), it could apply both restrictions efficiently at the same time. Alternatively, if you had separate single-column indexes on those columns, it could combine them with a BitmapAnd operation. That would be more efficient than what you are currently doing, but less efficient than the combined index.

Simple aggregate query running slow

I am trying to determine why a fairly simple aggregate query is taking so long to perform on a single table. The table is called plots, and it is [id, device_id, time, ...] There are two indices, UNIQUE(id) and UNIQUE(device_id, time).
The query is simply:
SELECT device_id, MIN(time)
FROM plots
GROUP BY device_id
To me, this should be very fast, but it is taking 3+ minutes. The table has ~45 million rows, divided roughly equally among 1200 or so device_id's.
EXPLAIN for the query:
Finalize GroupAggregate (cost=1502955.41..1503055.97 rows=906 width=12)
Group Key: device_id
-> Gather Merge (cost=1502955.41..1503052.35 rows=906 width=12)
Workers Planned: 1
-> Sort (cost=1501955.41..1501955.86 rows=906 width=12)
Sort Key: device_id
-> Partial HashAggregate (cost=1501943.79..1501946.51 rows=906 width=12)
Group Key: device_id
-> Parallel Seq Scan on plots (cost=0.00..1476417.34 rows=25526447 width=12)
EXPLAIN for query with a where device_id = xxx:
GroupAggregate (cost=398.86..78038.77 rows=906 width=12)
Group Key: device_id
-> Bitmap Heap Scan on plots (cost=398.86..77992.99 rows=43065 width=12)
Recheck Cond: (device_id = 6780)
-> Bitmap Index Scan on index_plots_on_device_id_and_time (cost=0.00..396.71 rows=43065 width=0)
Index Cond: (device_id = 6780)
I have done VACUUM (FULL, ANALYZE) as well as REINDEX DATABASE.
I have also tried doing partition queries to accomplish the same.
Any pointers on making this faster? Or am I just boned on the table size. It seems like it should be fine with the index though. Maybe I am missing something...
EDIT / UPDATE:
The problem seems to be resolved at this point, though I am not sure why. I have dropped and rebuilt the index many times, and suddenly the query is only taking ~7 seconds, which is acceptable. Of note, this morning I dropped the index and created a new one with the reverse column order (time, device_id) and I was surprised to see good results. I then reverted to the previous index, and the results were improved further. I will refork the production database and try to retrace my steps and post an update. Should I be worried about the query planner wonking out in the future?
Current EXPLAIN with analysis (as requested):
Finalize GroupAggregate (cost=1000.12..480787.58 rows=905 width=12) (actual time=36.299..7530.403 rows=916 loops=1)
Group Key: device_id
Buffers: shared hit=135087 read=40325
I/O Timings: read=138.419
-> Gather Merge (cost=1000.12..480783.96 rows=905 width=12) (actual time=36.226..7552.052 rows=1829 loops=1)
Workers Planned: 1
Workers Launched: 1
Buffers: shared hit=509502 read=160807
I/O Timings: read=639.797
-> Partial GroupAggregate (cost=0.11..479687.58 rows=905 width=12) (actual time=15.779..5026.094 rows=914 loops=2)
Group Key: device_id
Buffers: shared hit=509502 read=160807
I/O Timings: read=639.797
-> Parallel Index Only Scan using index_plots_time_and_device_id on plots (cost=0.11..454158.41 rows=25526447 width=12) (actual time=0.033..2999.764 rows=21697480 loops=2)
Heap Fetches: 0
Buffers: shared hit=509502 read=160807
I/O Timings: read=639.797
Planning Time: 0.092 ms
Execution Time: 7554.100 ms
(19 rows)
Approach 1:
You can try replacing your UNIQUE index with a plain index. CREATE UNIQUE INDEX and CREATE INDEX have different behaviors. I believe that you can get benefits from CREATE INDEX.
Approach 2:
You can create a materialized view. If you can tolerate some delay in your data, you can do the following:
CREATE MATERIALIZED VIEW myreport AS
SELECT device_id,
MIN(time) AS mintime
FROM plots
GROUP BY device_id;
-- REFRESH ... CONCURRENTLY requires a unique index on the materialized view.
CREATE UNIQUE INDEX myreport_device_id ON myreport(device_id);
Also, you need to remember to regularly do:
REFRESH MATERIALIZED VIEW CONCURRENTLY myreport;
And less regularly do:
VACUUM ANALYZE myreport;
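Going back to the intuition in the question that the (device_id, time) index ought to make this cheap: the minimum time for each device is the first entry for that device in index order, so in principle one probe per device is enough. The Python sketch below imitates that with a sorted list and bisect. It is an analogy with made-up data, not something the plans in the question show PostgreSQL doing; both of those plans read every row.

```python
from bisect import bisect_right

# Stand-in for the UNIQUE(device_id, time) index: tuples kept in sorted order.
index = sorted([(3, 105), (3, 101), (7, 230), (7, 200), (7, 250), (9, 50)])


def min_time_per_device(entries: list) -> dict:
    """Jump from the first entry of one device to the first entry of the next."""
    result = {}
    pos = 0
    while pos < len(entries):
        device_id, first_time = entries[pos]  # smallest time for this device
        result[device_id] = first_time
        # Skip the rest of this device's entries with a binary search.
        pos = bisect_right(entries, (device_id, float("inf")), lo=pos)
    return result


print(min_time_per_device(index))
```

With roughly 1200 devices that would be on the order of 1200 probes instead of 45 million rows. In SQL this pattern usually has to be written by hand, for example with a recursive CTE; I have not tested that against this table.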

Improve PostgresSQL aggregation query performance

I am aggregating data from a Postgres table, the query is taking approx 2 seconds which I want to reduce to less than a second.
Please find below the execution details:
Query
select
a.search_keyword,
hll_cardinality( hll_union_agg(a.users) ):: int as user_count,
hll_cardinality( hll_union_agg(a.sessions) ):: int as session_count,
sum(a.total) as keyword_count
from
rollup_day a
where
a.created_date between '2018-09-01' and '2019-09-30'
and a.tenant_id = '62850a62-19ac-477d-9cd7-837f3d716885'
group by
a.search_keyword
order by
session_count desc
limit 100;
Table metadata
Total number of rows - 506527
Composite Index on columns : tenant_id and created_date
Query plan
Custom Scan (cost=0.00..0.00 rows=0 width=0) (actual time=1722.685..1722.694 rows=100 loops=1)
Task Count: 1
Tasks Shown: All
-> Task
Node: host=localhost port=5454 dbname=postgres
-> Limit (cost=64250.24..64250.49 rows=100 width=42) (actual time=1783.087..1783.106 rows=100 loops=1)
-> Sort (cost=64250.24..64558.81 rows=123430 width=42) (actual time=1783.085..1783.093 rows=100 loops=1)
Sort Key: ((hll_cardinality(hll_union_agg(sessions)))::integer) DESC
Sort Method: top-N heapsort Memory: 33kB
-> GroupAggregate (cost=52933.89..59532.83 rows=123430 width=42) (actual time=905.502..1724.363 rows=212633 loops=1)
Group Key: search_keyword
-> Sort (cost=52933.89..53636.53 rows=281055 width=54) (actual time=905.483..1351.212 rows=280981 loops=1)
Sort Key: search_keyword
Sort Method: external merge Disk: 18496kB
-> Seq Scan on rollup_day a (cost=0.00..17890.22 rows=281055 width=54) (actual time=29.720..112.161 rows=280981 loops=1)
Filter: ((created_date >= '2018-09-01'::date) AND (created_date <= '2019-09-30'::date) AND (tenant_id = '62850a62-19ac-477d-9cd7-837f3d716885'::uuid))
Rows Removed by Filter: 225546
Planning Time: 0.129 ms
Execution Time: 1786.222 ms
Planning Time: 0.103 ms
Execution Time: 1722.718 ms
What I've tried
I've tried indexes on tenant_id and created_date, but as the data is huge, it always does a sequential scan rather than an index scan for the filters. I've read that the Postgres query engine switches to a sequential scan if the data returned is > 5-10% of the total rows. Please follow the link for more reference.
I've increased the work_mem to 100MB but it only improved the performance a little bit.
Any help would be really appreciated.
Update
Query plan after setting work_mem to 100MB
Custom Scan (cost=0.00..0.00 rows=0 width=0) (actual time=1375.926..1375.935 rows=100 loops=1)
Task Count: 1
Tasks Shown: All
-> Task
Node: host=localhost port=5454 dbname=postgres
-> Limit (cost=48348.85..48349.10 rows=100 width=42) (actual time=1307.072..1307.093 rows=100 loops=1)
-> Sort (cost=48348.85..48633.55 rows=113880 width=42) (actual time=1307.071..1307.080 rows=100 loops=1)
Sort Key: (sum(total)) DESC
Sort Method: top-N heapsort Memory: 35kB
-> GroupAggregate (cost=38285.79..43996.44 rows=113880 width=42) (actual time=941.504..1261.177 rows=172945 loops=1)
Group Key: search_keyword
-> Sort (cost=38285.79..38858.52 rows=229092 width=54) (actual time=941.484..963.061 rows=227261 loops=1)
Sort Key: search_keyword
Sort Method: quicksort Memory: 32982kB
-> Seq Scan on rollup_day_104290 a (cost=0.00..17890.22 rows=229092 width=54) (actual time=38.803..104.350 rows=227261 loops=1)
Filter: ((created_date >= '2019-01-01'::date) AND (created_date <= '2019-12-30'::date) AND (tenant_id = '62850a62-19ac-477d-9cd7-837f3d716885'::uuid))
Rows Removed by Filter: 279266
Planning Time: 0.131 ms
Execution Time: 1308.814 ms
Planning Time: 0.112 ms
Execution Time: 1375.961 ms
Update 2
After creating an index on created_date and increasing work_mem to 120MB
create index date_idx on rollup_day(created_date);
The total number of rows is: 12,124,608
Query Plan is:
Custom Scan (cost=0.00..0.00 rows=0 width=0) (actual time=2635.530..2635.540 rows=100 loops=1)
Task Count: 1
Tasks Shown: All
-> Task
Node: host=localhost port=9702 dbname=postgres
-> Limit (cost=73545.19..73545.44 rows=100 width=51) (actual time=2755.849..2755.873 rows=100 loops=1)
-> Sort (cost=73545.19..73911.25 rows=146424 width=51) (actual time=2755.847..2755.858 rows=100 loops=1)
Sort Key: (sum(total)) DESC
Sort Method: top-N heapsort Memory: 35kB
-> GroupAggregate (cost=59173.97..67948.97 rows=146424 width=51) (actual time=2014.260..2670.732 rows=296537 loops=1)
Group Key: search_keyword
-> Sort (cost=59173.97..60196.85 rows=409152 width=55) (actual time=2013.885..2064.775 rows=410618 loops=1)
Sort Key: search_keyword
Sort Method: quicksort Memory: 61381kB
-> Index Scan using date_idx_102913 on rollup_day_102913 a (cost=0.42..21036.35 rows=409152 width=55) (actual time=0.026..183.370 rows=410618 loops=1)
Index Cond: ((created_date >= '2018-01-01'::date) AND (created_date <= '2018-12-31'::date))
Filter: (tenant_id = '12850a62-19ac-477d-9cd7-837f3d716885'::uuid)
Planning Time: 0.135 ms
Execution Time: 2760.667 ms
Planning Time: 0.090 ms
Execution Time: 2635.568 ms
You should experiment with higher settings of work_mem until you get an in-memory sort. Of course you can only be generous with memory if your machine has enough of it.
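A minimal sketch of how this could be tried for a single session. The query is reconstructed from the plans above and from the query quoted in a later answer, so treat its exact shape as an assumption. The 256MB value is only an illustrative starting point.

```sql
-- Raise work_mem for this session only; the value is an arbitrary example.
SET work_mem = '256MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT search_keyword,
       hll_cardinality(hll_union_agg(users))::int    AS user_count,
       hll_cardinality(hll_union_agg(sessions))::int AS session_count,
       sum(total)                                     AS keyword_count
FROM rollup_day
WHERE tenant_id = '62850a62-19ac-477d-9cd7-837f3d716885'
  AND created_date BETWEEN '2019-01-01' AND '2019-12-30'
GROUP BY search_keyword
ORDER BY keyword_count DESC
LIMIT 100;

-- "Sort Method: quicksort  Memory: ..." means the sort fit in memory.
-- "Sort Method: external merge  Disk: ..." means it spilled to disk.
RESET work_mem;
```

Note that the plans posted in the updates already report quicksort with a Memory figure, so those sorts are in memory and a further increase is unlikely to change much.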
What would make your query way faster is if you store pre-aggregated data, either using a materialized view or a second table and a trigger on your original table that keeps the sums in the other table updated. I don't know if that is possible with your data, as I don't know what hll_cardinality and hll_union_agg are.
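A rough sketch of the materialized-view variant. It assumes that rollup_day has the columns named in the index suggestions in this thread (tenant_id, created_date, search_keyword, users, sessions, total) and that users and sessions are hll values from the postgresql-hll extension, which can be merged with hll_union_agg. The view name rollup_month and the monthly granularity are made up for illustration. The Custom Scan nodes in the plans suggest a distributed setup, and whether a materialized view fits there is untested.

```sql
-- Hypothetical monthly pre-aggregation; name and granularity are assumptions.
CREATE MATERIALIZED VIEW rollup_month AS
SELECT tenant_id,
       date_trunc('month', created_date)::date AS month,
       search_keyword,
       hll_union_agg(users)    AS users,
       hll_union_agg(sessions) AS sessions,
       sum(total)              AS total
FROM rollup_day
GROUP BY tenant_id, date_trunc('month', created_date), search_keyword;

CREATE INDEX ON rollup_month (tenant_id, month);

-- Re-run periodically (for example from a scheduled job) to pick up new rows.
REFRESH MATERIALIZED VIEW rollup_month;
```

A report over whole months could then aggregate rollup_month instead of rollup_day, which reads far fewer rows per keyword.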
Have you tried a covering index, so that the optimizer uses the index instead of doing a sequential scan?
create index covering on rollup_day(tenant_id, created_date, search_keyword, users, sessions, total);
On Postgres 11 or later:
create index covering on rollup_day(tenant_id, created_date) INCLUDE (search_keyword, users, sessions, total);
But since you also sort and group by search_keyword, maybe:
create index covering on rollup_day(tenant_id, created_date, search_keyword);
create index covering on rollup_day(tenant_id, search_keyword, created_date);
Or:
create index covering on rollup_day(tenant_id, created_date, search_keyword) INCLUDE (users, sessions, total);
create index covering on rollup_day(tenant_id, search_keyword, created_date) INCLUDE (users, sessions, total);
One of these indexes should make the query faster. Add only one of them.
Even if it makes this query faster, a big index will probably slow down your write operations (in particular, HOT updates are not possible when an indexed column changes), and it will use more storage.
The idea came from here; there is also a hint about sizing work_mem.
Another example where the index was not used
Use table partitioning and create a composite index. It will bring down the total cost because:
it will save a large share of the scan cost.
partitions segregate the data, which will also be very helpful for future purge operations.
I have personally tried and tested table partitions in cases like this, and the throughput from combining partitions and composite indexes is very good.
Partitioning can be done by range on the created date, with composite indexes on date and tenant.
Remember that you can always create a composite index with a condition in it (a partial index) if your query has a very specific condition. The index entries are then already sorted, which can also save a lot on sort operations.
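The partitioning idea could look roughly like the sketch below. It uses declarative partitioning (Postgres 10 or later). The table name rollup_day_part, the partition names, the yearly ranges and the column types are all assumptions made for illustration, and the users and sessions columns are left out for brevity.

```sql
-- Hypothetical partitioned copy of the table; names and types are assumptions.
CREATE TABLE rollup_day_part (
    tenant_id      uuid   NOT NULL,
    created_date   date   NOT NULL,
    search_keyword text   NOT NULL,
    total          bigint NOT NULL
) PARTITION BY RANGE (created_date);

-- One partition per year; the upper bound is exclusive.
CREATE TABLE rollup_day_part_2018 PARTITION OF rollup_day_part
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');
CREATE TABLE rollup_day_part_2019 PARTITION OF rollup_day_part
    FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');

-- Composite index on tenant and date, created on each partition.
CREATE INDEX ON rollup_day_part_2018 (tenant_id, created_date);
CREATE INDEX ON rollup_day_part_2019 (tenant_id, created_date);

-- Purging a whole year later means dropping one partition
-- instead of deleting rows one by one:
-- DROP TABLE rollup_day_part_2018;
```

A query restricted to dates in 2019 then only needs to touch rollup_day_part_2019.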
Hope this helps.
PS: Also, is it possible to share some sample test data?
My suggestion would be to break up the select.
In combination with this, I would also try setting up two indexes on the table: one on the dates and the other on the ID. One problem with unusual IDs is that they take time to compare, and they may be compared as strings in the background. That is the reason for the break-up: to prefilter the data before the BETWEEN condition is evaluated. The BETWEEN condition can make a select slow. Here I would suggest breaking it up into two selects and an inner join (I know memory consumption is a problem).
Here is an example of what I mean. I hope the optimizer is smart enough to restructure your query.
-- The outer query must reference the derived table alias t1;
-- the inner alias a is not visible outside the subquery.
SELECT
t1.search_keyword,
hll_cardinality(hll_union_agg(t1.users))::int AS user_count,
hll_cardinality(hll_union_agg(t1.sessions))::int AS session_count,
sum(t1.total) AS keyword_count
FROM
(SELECT
*
FROM
rollup_day a
WHERE
a.tenant_id = '62850a62-19ac-477d-9cd7-837f3d716885') t1
WHERE
t1.created_date BETWEEN '2018-09-01' AND '2019-09-30'
GROUP BY
t1.search_keyword
ORDER BY
session_count DESC
If this does not work, then you need more specific optimizations. For example, can the total be equal to 0? If so, you need a filtered (partial) index covering only the rows where total > 0. Are there any other criteria that make it easy to exclude rows from the select?
The next consideration would be to create a column with a short ID (for example 62850 instead of 62850a62-19ac-477d-9cd7-837f3d716885) that can be a number; that would make preselection very easy and reduce memory consumption.
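A filtered index of that kind could be sketched as follows. The index name is hypothetical, and the sketch assumes that rows with total = 0 can safely be left out of the result.

```sql
-- Partial index: only rows with total > 0 are indexed.
CREATE INDEX rollup_day_nonzero_idx
    ON rollup_day (tenant_id, created_date)
    WHERE total > 0;

-- The query has to repeat the predicate so the planner can match the index.
SELECT search_keyword, sum(total) AS keyword_count
FROM rollup_day
WHERE tenant_id = '62850a62-19ac-477d-9cd7-837f3d716885'
  AND created_date BETWEEN '2018-09-01' AND '2019-09-30'
  AND total > 0
GROUP BY search_keyword;
```

How much this helps depends on what share of the rows actually has total = 0.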

Order of columns in INNER JOIN condition affects the performance badly

I have two tables which links to each other like this:
Table answered_questions with the following columns and indexes:
id: primary key
taken_test_id: integer (foreign key)
question_id: integer (foreign key, links to another table called questions)
indexes: (taken_test_id, question_id)
Table taken_tests
id: primary key
user_id: (foreign key, links to table Users)
indexes: user_id column
First query (with EXPLAIN ANALYZE output):
EXPLAIN ANALYZE
SELECT
"answered_questions".*
FROM
"answered_questions"
INNER JOIN "taken_tests" ON "answered_questions"."taken_test_id" = "taken_tests"."id"
WHERE
"taken_tests"."user_id" = 1;
Output:
Nested Loop (cost=0.99..116504.61 rows=1472 width=61) (actual time=0.025..2.208 rows=653 loops=1)
-> Index Scan using index_taken_tests_on_user_id on taken_tests (cost=0.43..274.18 rows=91 width=4) (actual time=0.014..0.483 rows=371 loops=1)
Index Cond: (user_id = 1)
-> Index Scan using index_answered_questions_on_taken_test_id_and_question_id on answered_questions (cost=0.56..1273.61 rows=365 width=61) (actual time=0.002..0.003 rows=2 loops=371)
Index Cond: (taken_test_id = taken_tests.id)
Planning time: 0.276 ms
Execution time: 2.365 ms
(7 rows)
Another query (this is generated automatically by Rails when using joins method in ActiveRecord)
EXPLAIN ANALYZE
SELECT
"answered_questions".*
FROM
"answered_questions"
INNER JOIN "taken_tests" ON "taken_tests"."id" = "answered_questions"."taken_test_id"
WHERE
"taken_tests"."user_id" = 1;
And here is the output
Nested Loop (cost=0.99..116504.61 rows=1472 width=61) (actual time=23.611..1257.807 rows=653 loops=1)
-> Index Scan using index_taken_tests_on_user_id on taken_tests (cost=0.43..274.18 rows=91 width=4) (actual time=10.451..71.474 rows=371 loops=1)
Index Cond: (user_id = 1)
-> Index Scan using index_answered_questions_on_taken_test_id_and_question_id on answered_questions (cost=0.56..1273.61 rows=365 width=61) (actual time=2.071..3.195 rows=2 loops=371)
Index Cond: (taken_test_id = taken_tests.id)
Planning time: 0.302 ms
Execution time: 1258.035 ms
(7 rows)
The only difference is the order of columns in the INNER JOIN condition. In the first query, it is ON "answered_questions"."taken_test_id" = "taken_tests"."id" while in the second query, it is ON "taken_tests"."id" = "answered_questions"."taken_test_id". But the query time is hugely different.
Do you have any idea why this happens? I read some articles, and they say that the order of columns in a JOIN condition should not affect the execution time (e.g. Best practices for the order of joined columns in a sql join?)
I am using Postgres 9.6. There are more than 40 million rows in answered_questions table and more than 3 million rows in taken_tests table
Update 1:
When I ran the EXPLAIN with (analyze true, verbose true, buffers true), I got a much better result for the second query (quite similar to the first query)
EXPLAIN (ANALYZE TRUE, VERBOSE TRUE, BUFFERS TRUE)
SELECT
"answered_questions".*
FROM
"answered_questions"
INNER JOIN "taken_tests" ON "taken_tests"."id" = "answered_questions"."taken_test_id"
WHERE
"taken_tests"."user_id" = 1;
Output
Nested Loop (cost=0.99..116504.61 rows=1472 width=61) (actual time=0.030..2.192 rows=653 loops=1)
Output: answered_questions.id, answered_questions.question_id, answered_questions.answer_text, answered_questions.created_at, answered_questions.updated_at, answered_questions.taken_test_id, answered_questions.correct, answered_questions.answer
Buffers: shared hit=1986
-> Index Scan using index_taken_tests_on_user_id on public.taken_tests (cost=0.43..274.18 rows=91 width=4) (actual time=0.014..0.441 rows=371 loops=1)
Output: taken_tests.id
Index Cond: (taken_tests.user_id = 1)
Buffers: shared hit=269
-> Index Scan using index_answered_questions_on_taken_test_id_and_question_id on public.answered_questions (cost=0.56..1273.61 rows=365 width=61) (actual time=0.002..0.003 rows=2 loops=371)
Output: answered_questions.id, answered_questions.question_id, answered_questions.answer_text, answered_questions.created_at, answered_questions.updated_at, answered_questions.taken_test_id, answered_questions.correct, answered_questions.answer
Index Cond: (answered_questions.taken_test_id = taken_tests.id)
Buffers: shared hit=1717
Planning time: 0.238 ms
Execution time: 2.335 ms
As you can see from the initial EXPLAIN ANALYZE results, both queries produce an equivalent query plan and are executed in exactly the same way.
The difference comes from the execution time of the very same node:
-> Index Scan using index_taken_tests_on_user_id on taken_tests (cost=0.43..274.18 rows=91 width=4) (actual time=0.014..0.483 rows=371 loops=1)
and
-> Index Scan using index_taken_tests_on_user_id on taken_tests (cost=0.43..274.18 rows=91 width=4) (actual time=10.451..71.474 rows=371 loops=1)
As the commenters already pointed out (see the documentation links in the question comments), the query plan for an inner join is expected to be the same regardless of the table order. The join order is chosen by the query planner. This means that you should really look at other aspects of query execution that affect performance. One of those is the memory used for caching (shared buffers). It looks like the query time depends a lot on whether the data has already been loaded into memory. Just as you noticed, the execution time grows after you have waited for some time. This points to pages being evicted from the cache rather than to a plan problem.
Increasing the size of the shared buffers may help resolve it, but the initial execution of the query will always take longer -- this is just your disk access speed.
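One way to check this is to compare the buffer counters of a slow run and a fast run. The sketch below reuses the query from the question. The pg_prewarm step is optional and assumes that the contrib extension is installed and that the index fits into shared buffers.

```sql
-- "shared hit" counts pages found in shared buffers,
-- "read" counts pages that had to be fetched from outside them.
EXPLAIN (ANALYZE, BUFFERS)
SELECT "answered_questions".*
FROM "answered_questions"
INNER JOIN "taken_tests" ON "taken_tests"."id" = "answered_questions"."taken_test_id"
WHERE "taken_tests"."user_id" = 1;

-- Current setting; changing it requires a server restart.
SHOW shared_buffers;

-- Optional: preload the index used by the inner scan into shared buffers.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('index_answered_questions_on_taken_test_id_and_question_id');
```

If the slow run shows mostly "read" pages and the fast run shows only "shared hit" (as in Update 1), the difference is caching and not the join condition.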
For more hints on memory configuration of Pg database see here: https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
Note: VACUUM or ANALYZE commands will be unlikely to help here. Both queries are using the same plan already. Keep in mind, though, that due to PostgreSQL transaction isolation mechanism (MVCC) it may have to read the underlying table rows to validate that they are still visible to the current transaction after getting the results from the index. This could be improved by updating the visibility map (see https://www.postgresql.org/docs/10/storage-vm.html), which is done during vacuuming.
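For completeness, a small sketch of how the visibility map state could be inspected. The visibility map mainly benefits index-only scans, while the plans shown here are regular index scans, so treat this as a diagnostic rather than a fix.

```sql
-- Vacuuming updates the visibility map as a side effect.
VACUUM (VERBOSE, ANALYZE) answered_questions;

-- relallvisible compared to relpages gives a rough idea of how much
-- of each table is marked all-visible.
SELECT relname, relpages, relallvisible
FROM pg_class
WHERE relname IN ('answered_questions', 'taken_tests');
```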