Full text search query is slow on first query

I have an airports table which contains a list of nearly 4k airports. The table has a searchable column, which is a tsvector column, and an index airports_searchable_index:
searchable tsvector NULL
CREATE INDEX airports_searchable_index ON airports USING gin (searchable)
Given I have an indexed document in the searchable column and I run a query against that column, I get very quick responses on my dev machine (around 3ms for the query) but around 650ms on production (using the exact same data!). The weird part is that my production machine is much more powerful than my local dev machine. An example query:
select * from "airports" where searchable @@ to_tsquery('public.hebrew', 'ltn:*') order by "popularity" desc limit 100
I've opened PGAdmin and run some tests. What I saw is that the first time I run the query above in a new "Query Tool Panel", it takes anywhere between 650-800ms to execute. However, on the second run, it takes 30-60ms, even if I change the query term. I concluded from that that Postgres is possibly opening the document in memory for each connection and running the query against that. Since I'm using PHP to talk to my backend, every request opens its own connection to the DB, causing Postgres to constantly reopen the document.
Could it be a misconfiguration on my production server?
Here is the EXPLAIN output (for the production server):
Limit (cost=24.03..24.04 rows=1 width=8) (actual time=0.048..0.048 rows=1 loops=1)
Output: id, popularity
Buffers: shared hit=4
-> Sort (cost=24.03..24.04 rows=1 width=8) (actual time=0.047..0.047 rows=1 loops=1)
Output: id, popularity
Sort Key: airports.popularity DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=4
-> Bitmap Heap Scan on lametayel.airports (cost=20.01..24.02 rows=1 width=8) (actual time=0.040..0.040 rows=1 loops=1)
Output: id, popularity
Recheck Cond: (airports.searchable @@ '''ltn'':*'::tsquery)
Heap Blocks: exact=1
Buffers: shared hit=4
-> Bitmap Index Scan on airports_searchable_index (cost=0.00..20.01 rows=1 width=0) (actual time=0.036..0.036 rows=1 loops=1)
Index Cond: (airports.searchable @@ '''ltn'':*'::tsquery)
Buffers: shared hit=3
Planning time: 0.304 ms
Execution time: 0.078 ms
Here is the EXPLAIN output (for the development server):
Limit (cost=28.03..28.04 rows=1 width=8) (actual time=0.065..0.067 rows=1 loops=1)
Output: id, popularity
Buffers: shared hit=5
-> Sort (cost=28.03..28.04 rows=1 width=8) (actual time=0.064..0.065 rows=1 loops=1)
Output: id, popularity
Sort Key: airports.popularity DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=5
-> Bitmap Heap Scan on lametayel.airports (cost=24.01..28.02 rows=1 width=8) (actual time=0.046..0.047 rows=1 loops=1)
Output: id, popularity
Recheck Cond: (airports.searchable @@ '''ltn'':*'::tsquery)
Heap Blocks: exact=1
Buffers: shared hit=5
-> Bitmap Index Scan on airports_searchable_index (cost=0.00..24.01 rows=1 width=0) (actual time=0.038..0.038 rows=1 loops=1)
Index Cond: (airports.searchable @@ '''ltn'':*'::tsquery)
Buffers: shared hit=4
Planning time: 0.534 ms
Execution time: 0.122 ms
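The per-connection hypothesis in the question could be checked by opening a brand-new session and timing only the tsquery construction, separately from the table lookup. The sketch below assumes the public.hebrew configuration from the question. Whether loading that configuration is what makes the first call slow is an assumption to verify, not something the plans above show.

```sql
-- Run in a freshly opened connection, before any other statement.
-- First call: any per-session setup for the text search configuration
-- would be paid here.
EXPLAIN (ANALYZE) SELECT to_tsquery('public.hebrew', 'ltn:*');

-- Second call in the same session: if this is much faster, the cost is
-- per-connection initialization rather than the index or the table.
EXPLAIN (ANALYZE) SELECT to_tsquery('public.hebrew', 'ltn:*');
```

Because the argument is a constant, the work may be reported under planning time rather than execution time, so compare the totals of both. If the first call dominates, keeping connections open between PHP requests (for example through a connection pooler) would be one way to avoid paying it every time.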

Related

Inconsistent cold start query performance in Postgres query

We're having issues building queries for a Postgres-hosted datamart. Our query is simple and touches a modest amount of data. We've seen vast differences in the execution time of this query between runs, sometimes around 20 seconds and other times just 3 seconds, but we cannot see what causes these differences, and we're aiming for consistent results. There are only 2 tables involved in the query: one representing order rows (OrderItemTransactionFact, 2,937,264 rows) and the other recording current stock levels for each item (stocklevels, 62,353 rows). There are no foreign keys because this is a datamart that we run ETL processes against, so we require fast loading.
The query is;
select
oitf."SKUId",
sum(oitf."ConvertedLineTotal") as "totalrevenue",
sum(oitf."Quantity") as "quantitysold",
coalesce (sl."Available",0) as "availablestock"
from "OrderItemTransactionFact" oitf
left join stocklevels sl on sl."SKUId" = oitf."SKUId"
where
oitf."transactionTypeId" = 2
and oitf."hasComposite" = false
and oitf."ReceivedDate" >= extract(epoch from timestamp '2020-07-01 00:00:00')
and oitf."ReceivedDate" <= extract(epoch from timestamp '2021-10-01 00:00:00')
group by
oitf."SKUId", sl."Available"
order by oitf."SKUId";
The OrderItemTransactionFact table has a couple indexes;
create index IX_OrderItemTransactionFact_ReceivedDate on public."OrderItemTransactionFact" ("ReceivedDate" DESC);
create index IX_OrderItemTransactionFact_ReceivedDate_transactionTypeId on public."OrderItemTransactionFact" ("ReceivedDate" desc, "transactionTypeId");
Execution plan output for a 26 second run is;
GroupAggregate (cost=175096.24..195424.66 rows=813137 width=52) (actual time=24100.268..24874.065 rows=26591 loops=1)
Group Key: oitf."SKUId", sl."Available"
Buffers: shared hit=659 read=43311 written=1042
-> Sort (cost=175096.24..177129.08 rows=813137 width=19) (actual time=24100.249..24275.594 rows=916772 loops=1)
Sort Key: oitf."SKUId", sl."Available"
Sort Method: quicksort Memory: 95471kB
Buffers: shared hit=659 read=43311 written=1042
-> Hash Left Join (cost=20671.85..95274.08 rows=813137 width=19) (actual time=239.392..23127.993 rows=916772 loops=1)
Hash Cond: (oitf."SKUId" = sl."SKUId")
Buffers: shared hit=659 read=43311 written=1042
-> Bitmap Heap Scan on "OrderItemTransactionFact" oitf (cost=18091.90..73485.91 rows=738457 width=15) (actual time=200.178..22413.601 rows=701397 loops=1)
Recheck Cond: (("ReceivedDate" >= '1585699200'::double precision) AND ("ReceivedDate" <= '1625097600'::double precision))
Filter: ((NOT "hasComposite") AND ("transactionTypeId" = 2))
Rows Removed by Filter: 166349
Heap Blocks: exact=40419
Buffers: shared hit=55 read=42738 written=1023
-> Bitmap Index Scan on ix_orderitemtransactionfact_receiveddate (cost=0.00..17907.29 rows=853486 width=0) (actual time=191.274..191.274 rows=867746 loops=1)
Index Cond: (("ReceivedDate" >= '1585699200'::double precision) AND ("ReceivedDate" <= '1625097600'::double precision))
Buffers: shared hit=9 read=2365 written=181
-> Hash (cost=1800.53..1800.53 rows=62353 width=8) (actual time=38.978..38.978 rows=62353 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 2948kB
Buffers: shared hit=604 read=573 written=19
-> Seq Scan on stocklevels sl (cost=0.00..1800.53 rows=62353 width=8) (actual time=0.031..24.301 rows=62353 loops=1)
Buffers: shared hit=604 read=573 written=19
Planning Time: 0.543 ms
Execution Time: 24889.522 ms
But here is the execution plan for the same query when it took just 3 seconds;
GroupAggregate (cost=173586.52..193692.59 rows=804243 width=52) (actual time=2616.588..3220.394 rows=26848 loops=1)
Group Key: oitf."SKUId", sl."Available"
Buffers: shared hit=2 read=43929
-> Sort (cost=173586.52..175597.13 rows=804243 width=19) (actual time=2616.570..2813.571 rows=889937 loops=1)
Sort Key: oitf."SKUId", sl."Available"
Sort Method: quicksort Memory: 93001kB
Buffers: shared hit=2 read=43929
-> Hash Left Join (cost=20472.48..94701.25 rows=804243 width=19) (actual time=185.018..1512.626 rows=889937 loops=1)
Hash Cond: (oitf."SKUId" = sl."SKUId")
Buffers: shared hit=2 read=43929
-> Bitmap Heap Scan on "OrderItemTransactionFact" oitf (cost=17892.54..73123.18 rows=730380 width=15) (actual time=144.000..960.232 rows=689090 loops=1)
Recheck Cond: (("ReceivedDate" >= '1593561600'::double precision) AND ("ReceivedDate" <= '1633046400'::double precision))
Filter: ((NOT "hasComposite") AND ("transactionTypeId" = 2))
Rows Removed by Filter: 159949
Heap Blocks: exact=40431
Buffers: shared read=42754
-> Bitmap Index Scan on ix_orderitemtransactionfact_receiveddate (cost=0.00..17709.94 rows=844151 width=0) (actual time=134.806..134.806 rows=849039 loops=1)
Index Cond: (("ReceivedDate" >= '1593561600'::double precision) AND ("ReceivedDate" <= '1633046400'::double precision))
Buffers: shared read=2323
-> Hash (cost=1800.53..1800.53 rows=62353 width=8) (actual time=40.500..40.500 rows=62353 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 2948kB
Buffers: shared hit=2 read=1175
-> Seq Scan on stocklevels sl (cost=0.00..1800.53 rows=62353 width=8) (actual time=0.025..24.620 rows=62353 loops=1)
Buffers: shared hit=2 read=1175
Planning Time: 0.565 ms
Execution Time: 3235.300 ms
The server config is;
version: PostgreSQL 12.1, compiled by Visual C++ build 1914, 64-bit
work_mem : 1048576kb
shared_buffers : 16384 (x8kb)
Thanks in advance!
It is the filesystem cache. The slow one had to read the data off disk. The fast one just had to fetch the data from memory, probably because the slow one already read it and left it there. You can make this show up explicitly in the plans by turning on track_io_timing.
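As a sketch of the track_io_timing suggestion, the setting can be turned on for the session and the query from the question re-run under EXPLAIN (ANALYZE, BUFFERS). Changing it per session may require elevated privileges, depending on the server version and role, which is an assumption to check on your installation.

```sql
-- Enable per-node I/O timing for this session only.
SET track_io_timing = on;

-- Same query as in the question; nodes that read from disk should now
-- report how long those reads took.
EXPLAIN (ANALYZE, BUFFERS)
SELECT oitf."SKUId",
       sum(oitf."ConvertedLineTotal") AS "totalrevenue",
       sum(oitf."Quantity") AS "quantitysold",
       coalesce(sl."Available", 0) AS "availablestock"
FROM "OrderItemTransactionFact" oitf
LEFT JOIN stocklevels sl ON sl."SKUId" = oitf."SKUId"
WHERE oitf."transactionTypeId" = 2
  AND oitf."hasComposite" = false
  AND oitf."ReceivedDate" >= extract(epoch FROM timestamp '2020-07-01 00:00:00')
  AND oitf."ReceivedDate" <= extract(epoch FROM timestamp '2021-10-01 00:00:00')
GROUP BY oitf."SKUId", sl."Available"
ORDER BY oitf."SKUId";
```

Comparing the I/O timing lines of a slow run and a fast run should show whether the extra seconds are spent in reads on the Bitmap Heap Scan node.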
An index on ("transactionTypeId","hasComposite","ReceivedDate") should help a little, and raising effective_io_concurrency may help a lot (depending on your storage system).
But mostly, get faster disks.
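The index suggested above could be created roughly as follows. The index name is hypothetical. The column order is the one given in the answer, with the two equality columns ahead of the range column.

```sql
-- Hypothetical index name; columns and order as suggested in the answer.
CREATE INDEX ix_oitf_type_composite_received
    ON public."OrderItemTransactionFact"
       ("transactionTypeId", "hasComposite", "ReceivedDate");

-- Inspect the current prefetch setting before deciding whether to raise it.
SHOW effective_io_concurrency;
```

Raising effective_io_concurrency only helps where the platform supports prefetching. The PostgreSQL documentation notes that systems lacking posix_fadvise reject non-zero values, so this is worth checking on the Windows build listed in the question.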

Explain analyze slower than actual query in postgres

I have the following query
select * from activity_feed where user_id in (select following_id from user_follow where follower_id=:user_id)
union
select * from activity_feed where project_id in (select project_id from user_project_follow where user_id=:user_id)
order by id desc limit 30
This runs in approximately 14 ms according to Postico.
But when I run EXPLAIN ANALYZE on this query, the planning time is 0.5 ms and the execution time is around 800 ms (which is what I would actually expect). Is this because the query without EXPLAIN ANALYZE is returning cached results? I still get results in less than 20 ms even when I use other values.
Which one is more indicative of the performance I'll get in production? I also realize that this is a rather inefficient query, but I can't figure out an index that would make it more efficient. It's possible that I will have to avoid using UNION.
Edit: the execution plan
Limit (cost=1380.94..1380.96 rows=10 width=148) (actual time=771.111..771.405 rows=10 loops=1)
-> Sort (cost=1380.94..1385.64 rows=1881 width=148) (actual time=771.097..771.160 rows=10 loops=1)
Sort Key: activity_feed."timestamp" DESC
Sort Method: top-N heapsort Memory: 27kB
-> HashAggregate (cost=1321.48..1340.29 rows=1881 width=148) (actual time=714.888..743.273 rows=4462 loops=1)
Group Key: activity_feed.id, activity_feed."timestamp", activity_feed.user_id, activity_feed.verb, activity_feed.object_type, activity_feed.object_id, activity_feed.project_id, activity_feed.privacy_level, activity_feed.local_time, activity_feed.local_date
-> Append (cost=5.12..1274.46 rows=1881 width=148) (actual time=0.998..682.466 rows=4487 loops=1)
-> Hash Join (cost=5.12..610.43 rows=1350 width=70) (actual time=0.982..326.089 rows=3013 loops=1)
Hash Cond: (activity_feed.user_id = user_follow.following_id)
-> Seq Scan on activity_feed (cost=0.00..541.15 rows=24215 width=70) (actual time=0.016..150.535 rows=24215 loops=1)
-> Hash (cost=4.78..4.78 rows=28 width=8) (actual time=0.911..0.922 rows=29 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> Index Only Scan using unique_user_follow_pair on user_follow (cost=0.29..4.78 rows=28 width=8) (actual time=0.022..0.334 rows=29 loops=1)
Index Cond: (follower_id = '17420532762804570'::bigint)
Heap Fetches: 0
-> Hash Join (cost=30.50..635.81 rows=531 width=70) (actual time=0.351..301.945 rows=1474 loops=1)
Hash Cond: (activity_feed_1.project_id = user_project_follow.project_id)
-> Seq Scan on activity_feed activity_feed_1 (cost=0.00..541.15 rows=24215 width=70) (actual time=0.027..143.896 rows=24215 loops=1)
-> Hash (cost=30.36..30.36 rows=11 width=8) (actual time=0.171..0.182 rows=11 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Index Only Scan using idx_user_project_follow_temp on user_project_follow (cost=0.28..30.36 rows=11 width=8) (actual time=0.020..0.102 rows=11 loops=1)
Index Cond: (user_id = '17420532762804570'::bigint)
Heap Fetches: 11
Planning Time: 0.571 ms
Execution Time: 771.774 ms
Thanks for the help in advance!
Very slow clock access like you show here (the query is nearly 100-fold slower under EXPLAIN ANALYZE, where TIMING defaults to ON!) usually indicates either old hardware or an old kernel, in my experience. Not being able to trust EXPLAIN (ANALYZE) to give good data can be very frustrating if you are particular about performance, so you should consider upgrading your hardware or your OS.
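To check whether clock overhead is what inflates the EXPLAIN ANALYZE figure, the same query can be run with per-node timing switched off. The sketch below uses the query from the question, with the user id taken from the posted plan in place of the :user_id placeholder.

```sql
-- TIMING OFF skips the per-node clock reads; actual row counts and the
-- total execution time are still reported.
EXPLAIN (ANALYZE, TIMING OFF)
SELECT * FROM activity_feed
WHERE user_id IN (SELECT following_id FROM user_follow
                  WHERE follower_id = 17420532762804570)
UNION
SELECT * FROM activity_feed
WHERE project_id IN (SELECT project_id FROM user_project_follow
                     WHERE user_id = 17420532762804570)
ORDER BY id DESC
LIMIT 30;
```

If the reported execution time drops to something close to the 14 ms seen in the client, the difference comes from timing instrumentation rather than from caching. The pg_test_timing utility shipped with PostgreSQL can measure the clock overhead directly.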

Why is Postgres EXPLAIN ANALYZE execution_time different than when I run the actual query?

I'm using a database client to test.
Using EXPLAIN ANALYZE:
Hash Join (cost=5.02..287015.54 rows=3400485 width=33) (actual time=0.023..1725.842 rows=3327845 loops=1)
Hash Cond: ((fact_orders.financial_status)::text = (include_list.financial_status)::text)
CTE include_list
-> Result (cost=0.00..1.77 rows=100 width=32) (actual time=0.003..0.004 rows=4 loops=1)
-> ProjectSet (cost=0.00..0.52 rows=100 width=32) (actual time=0.002..0.003 rows=4 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=1)
-> Seq Scan on fact_orders (cost=0.00..240253.85 rows=3400485 width=38) (actual time=0.006..551.558 rows=3400485 loops=1)
-> Hash (cost=2.00..2.00 rows=100 width=32) (actual time=0.009..0.009 rows=4 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> CTE Scan on include_list (cost=0.00..2.00 rows=100 width=32) (actual time=0.004..0.007 rows=4 loops=1)
Planning time: 0.163 ms
Execution time: 1852.226 ms
According to the query above, I have an execution time of 1852.226 ms.
There are approximately 3.3 million records returned.
But when I run the query without the EXPLAIN ANALYZE, it takes roughly ~30 seconds to get the results back from my database client.
Is the extra 28 seconds the transfer time from the server to my client? Or is that the actual time to execute the query?
Edit: Client is Navicat. Using the time elapsed after the results are yielded to the screen.
The documentation says:
Keep in mind that the statement is actually executed when the ANALYZE option is used. Although EXPLAIN will discard any output that a SELECT would return, other side effects of the statement will happen as usual.
So the only difference between running EXPLAIN ANALYZE on a SELECT query and running the actual query is that the data is not actually fetched. Your query returns a huge number of records, so that alone can very well explain the difference that you are seeing.

How do I reduce the cost of this query while keeping the query results the same?

I have the query below running on a Postgres and a SQL Server DB (using TOP for SQL Server). Sorting on the "_change_sequence" value is causing a high cost in my query. Is there any way to reduce the cost but maintain the same results?
Query:
SELECT tablename,
CAST(primary_key_values AS VARCHAR),
primary_key_fields,
CAST(min_sequence AS NUMERIC),
_changed_fieldlist,
_operation,
min_sequence
FROM (
SELECT 'memdep' AS tablename,
CONCAT_WS(',',dependant,mem_num) AS primary_key_values,
'dependant,mem_num,' AS primary_key_fields,
_change_sequence AS min_sequence,
ROW_NUMBER() OVER(partition by dependant,mem_num order by _change_sequence) AS rn,
_changed_fieldlist,
_operation
FROM mipbi_ods.memdep
WHERE mipbi_status = 'NEW'
) main
WHERE rn = 1
LIMIT 100
In essence, what I'm looking for is the records from "memdep" that have a "mipbi_status" of 'NEW', with the lowest "_change_sequence". I've tried using a MIN() function instead of ROW_NUMBER(); the speed is about the same and the cost is about 5 higher.
Is there a way to reduce the cost and run time of the query? I have around 400 million records in this table, if that helps.
Here is the query explained:
Limit (cost=3080.03..3080.53 rows=100 width=109) (actual time=17.633..17.648 rows=35 loops=1)
-> Unique (cost=3080.03..3089.04 rows=1793 width=109) (actual time=17.632..17.644 rows=35 loops=1)
-> Sort (cost=3080.03..3084.53 rows=1803 width=109) (actual time=17.631..17.634 rows=36 loops=1)
Sort Key: (concat_ws(','::text, memdet.mem_num))
Sort Method: quicksort Memory: 29kB
-> Bitmap Heap Scan on memdet (cost=54.39..2982.52 rows=1803 width=109) (actual time=16.853..17.542 rows=36 loops=1)
Recheck Cond: ((mipbi_status)::text = 'NEW'::text)
Heap Blocks: exact=8
-> Bitmap Index Scan on idx_mipbi_status_memdet (cost=0.00..53.94 rows=1803 width=0) (actual time=10.396..10.396 rows=38 loops=1)
Index Cond: ((mipbi_status)::text = 'NEW'::text)
Planning time: 0.201 ms
Execution time: 17.700 ms
I'm using a smaller table to show here, this isn't the 400 million record table, but indexes and all will be the same.
Here is the query plan for the large table:
Limit (cost=47148422.27..47149122.27 rows=100 width=113) (actual time=2407976.293..2407977.112 rows=100 loops=1)
Output: main.tablename, ((main.primary_key_values)::character varying), main.primary_key_fields, main.min_sequence, main._changed_fieldlist, main._operation, main.min_sequence
Buffers: shared hit=6269554 read=12205028 dirtied=1893 written=4566983, temp read=443831 written=1016025
-> Subquery Scan on main (cost=47148422.27..52102269.25 rows=707692 width=113) (actual time=2407976.292..2407977.100 rows=100 loops=1)
Output: main.tablename, (main.primary_key_values)::character varying, main.primary_key_fields, main.min_sequence, main._changed_fieldlist, main._operation, main.min_sequence
Filter: (main.rn = 1)
Buffers: shared hit=6269554 read=12205028 dirtied=1893 written=4566983, temp read=443831 written=1016025
-> WindowAgg (cost=47148422.27..50333038.19 rows=141538485 width=143) (actual time=2407976.288..2407977.080 rows=100 loops=1)
Output: 'claim', concat_ws(','::text, claim.gen_claimnum), 'gen_claimnum,', claim._change_sequence, row_number() OVER (?), claim._changed_fieldlist, claim._operation, claim.gen_claimnum
Buffers: shared hit=6269554 read=12205028 dirtied=1893 written=4566983, temp read=443831 written=1016025
-> Sort (cost=47148422.27..47502268.49 rows=141538485 width=39) (actual time=2407976.236..2407976.905 rows=100 loops=1)
Output: claim._change_sequence, claim.gen_claimnum, claim._changed_fieldlist, claim._operation
Sort Key: claim.gen_claimnum, claim._change_sequence
Sort Method: external merge Disk: 4588144kB
Buffers: shared hit=6269554 read=12205028 dirtied=1893 written=4566983, temp read=443831 written=1016025
-> Seq Scan on mipbi_ods.claim (cost=0.00..20246114.01 rows=141538485 width=39) (actual time=0.028..843181.418 rows=88042077 loops=1)
Output: claim._change_sequence, claim.gen_claimnum, claim._changed_fieldlist, claim._operation
Filter: ((claim.mipbi_status)::text = 'NEW'::text)
Rows Removed by Filter: 356194
Buffers: shared hit=6269554 read=12205028 dirtied=1893 written=4566983
Planning time: 8.796 ms
Execution time: 2408702.464 ms
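One possible direction, offered only as an untested sketch for the Postgres side: the large plan spends its time in a sequential scan followed by an on-disk sort on the partition and ordering columns. An index on those columns might let the planner read rows already in order and stop early at the LIMIT. The index name is hypothetical, DISTINCT ON is PostgreSQL-only, and the casts from the original select list are left out for brevity.

```sql
-- Hypothetical index name; columns match PARTITION BY / ORDER BY in the query.
CREATE INDEX idx_memdep_new_first_change
    ON mipbi_ods.memdep (dependant, mem_num, _change_sequence)
    WHERE mipbi_status = 'NEW';

-- First row per (dependant, mem_num), i.e. the one with the lowest
-- _change_sequence among rows with status 'NEW'.
SELECT DISTINCT ON (dependant, mem_num)
       'memdep' AS tablename,
       CONCAT_WS(',', dependant, mem_num) AS primary_key_values,
       'dependant,mem_num,' AS primary_key_fields,
       _change_sequence AS min_sequence,
       _changed_fieldlist,
       _operation
FROM mipbi_ods.memdep
WHERE mipbi_status = 'NEW'
ORDER BY dependant, mem_num, _change_sequence
LIMIT 100;
```

Unlike the original, this returns the first 100 keys in key order rather than an arbitrary 100. Whether the planner actually uses the index would need to be confirmed with EXPLAIN on the real table.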

Why is my query cost so high?

When I execute EXPLAIN ANALYZE on some query, I get normal costs, ranging from some low value to some higher value. But when I try to force the use of the index on the table by setting enable_seqscan to false, the query cost jumps to insane values like:
Merge Join (cost=10064648609.460..10088218360.810 rows=564249 width=21) (actual time=341699.323..370702.969 rows=3875328 loops=1)
Merge Cond: ((foxtrot.two = ((five_hotel.two)::numeric)) AND (foxtrot.alpha_two07 = ((five_hotel.alpha_two07)::numeric)))
-> Merge Append (cost=10000000000.580..10023064799.260 rows=23522481 width=24) (actual time=0.049..19455.320 rows=23522755 loops=1)
Sort Key: foxtrot.two, foxtrot.alpha_two07
-> Sort (cost=10000000000.010..10000000000.010 rows=1 width=76) (actual time=0.005..0.005 rows=0 loops=1)
Sort Key: foxtrot.two, foxtrot.alpha_two07
Sort Method: quicksort Memory: 25kB
-> Seq Scan on foxtrot (cost=10000000000.000..10000000000.000 rows=1 width=76) (actual time=0.001..0.001 rows=0 loops=1)
Filter: (kilo_sierra_oscar = 'oscar'::date)
-> Index Scan using alpha_five on five_uniform (cost=0.560..22770768.220 rows=23522480 width=24) (actual time=0.043..17454.619 rows=23522755 loops=1)
Filter: (kilo_sierra_oscar = 'oscar'::date)
As you can see, I'm trying to retrieve values by index, so they don't need to be sorted once they're loaded.
It is a simple query:
select *
from foxtrot a
where foxtrot.kilo_sierra_oscar = date'2015-01-01'
order by foxtrot.two, foxtrot.alpha_two07
Index scan: "Execution time: 19009.569 ms"
Sequential scan: "Execution time: 127062.802 ms"
Setting enable_seqscan to false improves the execution time of the query, but I would like the optimizer to work that out by itself.
EDIT:
Seq plan with buffers:
Sort (cost=4607555.110..4666361.310 rows=23522481 width=24) (actual time=101094.754..120740.190 rows=23522756 loops=1)
Sort Key: foxtrot.two, foxtrot.alpha07
Sort Method: external merge Disk: 805304kB
Buffers: shared hit=468690, temp read=100684 written=100684
-> Append (cost=0.000..762721.000 rows=23522481 width=24) (actual time=0.006..12018.725 rows=23522756 loops=1)
Buffers: shared hit=468690
-> Seq Scan on foxtrot (cost=0.000..0.000 rows=1 width=76) (actual time=0.001..0.001 rows=0 loops=1)
Filter: (kilo = 'oscar'::date)
-> Seq Scan on foxtrot (cost=0.000..762721.000 rows=23522480 width=24) (actual time=0.005..9503.851 rows=23522756 loops=1)
Filter: (kilo = 'oscar'::date)
Buffers: shared hit=468690
Index plan with buffers:
Merge Append (cost=10000000000.580..10023064799.260 rows=23522481 width=24) (actual time=0.046..19302.855 rows=23522756 loops=1)
Sort Key: foxtrot.two, foxtrot.alpha_two07
Buffers: shared hit=17855133
-> Sort (cost=10000000000.010..10000000000.010 rows=1 width=76) (actual time=0.009..0.009 rows=0 loops=1)
Sort Key: foxtrot.two, foxtrot.alpha_two07
Sort Method: quicksort Memory: 25kB
-> Seq Scan on foxtrot (cost=10000000000.000..10000000000.000 rows=1 width=76) (actual time=0.000..0.000 rows=0 loops=1)
Filter: (kilo = 'oscar'::date)
-> Index Scan using alpha_five on five (cost=0.560..22770768.220 rows=23522480 width=24) (actual time=0.036..17035.903 rows=23522756 loops=1)
Filter: (kilo = 'oscar'::date)
Buffers: shared hit=17855133
Why does the cost of the query jump so high? How can I avoid it?
The high cost is a direct consequence of set enable_seqscan=false.
The planner implements this "hint" by assigning an arbitrary, extremely high cost (10 000 000 000) to the sequential scan technique. Then it computes the different potential execution strategies with their associated costs.
If the best result still has a super-high cost, it means that the planner found no strategy to avoid the sequential scan, even when trying at all costs.
In the plan shown in the question under "Index plan with buffers" this happens at the Seq Scan on foxtrot node.
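The effect described above can be reproduced on a small scale. The sketch below uses a throwaway table (the name seqscan_demo is made up for illustration) with no indexes, so a sequential scan is the only strategy available.

```sql
-- Throwaway table with no indexes: Seq Scan is the only possible plan.
CREATE TEMP TABLE seqscan_demo (x int);

SET enable_seqscan = off;
-- The planner still has to choose Seq Scan, and the penalty shows up
-- in the estimated cost of that node.
EXPLAIN SELECT * FROM seqscan_demo;

RESET enable_seqscan;
-- Same plan shape, now with an ordinary cost estimate.
EXPLAIN SELECT * FROM seqscan_demo;
```

On servers of the vintage shown in the question, the first EXPLAIN is expected to report a cost starting near 10000000000 for the Seq Scan node, matching the figures in the posted plans. Newer releases may report disabled nodes differently.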