Postgresql Query slow if empty table in IN clause

Postgresql Query slow if empty table in IN clause - sql

I have the following SQL
WITH filtered_users_pre as (
SELECT value as username,row_number() OVER (partition by value) AS rk
FROM "user-stats".tag_table
WHERE _at_timestamp = 1626955200
AND tag in ('commercial','marketing')
),
filtered_users as (
SELECT username
FROM filtered_users_pre
WHERE rk = 2
),
valid_users as (
SELECT aa.username, aa.rank, aa.points, aa.version
FROM "users-results".ai_algo aa
WHERE aa._at_timestamp = 1626955200
AND aa.rank_timeframe = '7d'
AND aa.username IN (SELECT * FROM filtered_users)
ORDER BY aa.rank ASC
LIMIT 15
OFFSET 0
)
select * from valid_users;
"user-stats".tag_table is a table with around 60 million rows, with proper indexes.
"users-results".ai_algo is a table with around 10 million rows, with proper indexes.
With proper indexes I mean all the fields that appear in a WHERE clause above.
If filtered_users is empty, the query takes 4 seconds to run. If filtered_users has at least one row, it takes 400ms.
Anyone can explain me why? Is there I way I can have the query running with the same performance (400ms) also with filtered_users empty? I was expecting to get better performance with the reducing of number of rows in filtered_users. That's what happens up to 1 row. When the rows are 0, it takes 10 times more.
Of couse same happens if instead of IN clause in the WHERE, I put a INNER JOIN between ai_algo and filtered_users
Update
This is the EXPLAIN (ANALYZE, BUFFERS) output query when filtered_users has 0 rows (4 secs of execution)
Limit (cost=14592.13..15870.39 rows=15 width=35) (actual time=3953.945..3953.949 rows=0 loops=1)
Buffers: shared hit=7456641
-> Nested Loop Semi Join (cost=14592.13..1795382.62 rows=20897 width=35) (actual time=3953.944..3953.947 rows=0 loops=1)
Join Filter: (aa.username = filtered_users_pre.username)
Buffers: shared hit=7456641
-> Index Scan using ai_algo_202107_rank_timeframe_rank_idx on ai_algo_202107 aa (cost=0.56..1718018.61 rows=321495 width=35) (actual time=0.085..3885.547 rows=313611 loops=1)
" Index Cond: (rank_timeframe = '7d'::""valid-users-timeframe"")"
Filter: (_at_timestamp = 1626955200)
Rows Removed by Filter: 7793096
Buffers: shared hit=7456533
-> Materialize (cost=14591.56..14672.51 rows=13 width=21) (actual time=0.000..0.000 rows=0 loops=313611)
Buffers: shared hit=108
-> Subquery Scan on filtered_users_pre (cost=14591.56..14672.44 rows=13 width=21) (actual time=3.543..3.545 rows=0 loops=1)
Filter: (filtered_users_pre.rk = 2)
Rows Removed by Filter: 2415
Buffers: shared hit=108
-> WindowAgg (cost=14591.56..14638.74 rows=2696 width=29) (actual time=1.996..3.356 rows=2415 loops=1)
Buffers: shared hit=108
-> Sort (cost=14591.56..14598.30 rows=2696 width=21) (actual time=1.990..2.189 rows=2415 loops=1)
Sort Key: tag_table_20210722.value
Sort Method: quicksort Memory: 285kB
Buffers: shared hit=108
-> Bitmap Heap Scan on tag_table_20210722 (cost=146.24..14437.94 rows=2696 width=21) (actual time=0.612..1.080 rows=2415 loops=1)
" Recheck Cond: ((tag)::text = ANY ('{commercial,marketing}'::text[]))"
Filter: (_at_timestamp = 1626955200)
Rows Removed by Filter: 2415
Heap Blocks: exact=72
Buffers: shared hit=105
-> Bitmap Index Scan on tag_table_20210722_tag_idx (cost=0.00..145.57 rows=5428 width=0) (actual time=0.292..0.292 rows=4830 loops=1)
" Index Cond: ((tag)::text = ANY ('{commercial,marketing}'::text[]))"
Buffers: shared hit=33
Planning Time: 0.914 ms
Execution Time: 3954.035 ms
This is when filtered_users has at least 1 row (300ms)
Limit (cost=14592.13..15870.39 rows=15 width=35) (actual time=15.958..300.759 rows=15 loops=1)
Buffers: shared hit=11042
-> Nested Loop Semi Join (cost=14592.13..1795382.62 rows=20897 width=35) (actual time=15.957..300.752 rows=15 loops=1)
Join Filter: (aa.username = filtered_users_pre.username)
Rows Removed by Join Filter: 1544611
Buffers: shared hit=11042
-> Index Scan using ai_algo_202107_rank_timeframe_rank_idx on ai_algo_202107 aa (cost=0.56..1718018.61 rows=321495 width=35) (actual time=0.075..10.455 rows=645 loops=1)
" Index Cond: (rank_timeframe = '7d'::""valid-users-timeframe"")"
Filter: (_at_timestamp = 1626955200)
Rows Removed by Filter: 16124
Buffers: shared hit=10937
-> Materialize (cost=14591.56..14672.51 rows=13 width=21) (actual time=0.003..0.174 rows=2395 loops=645)
Buffers: shared hit=105
-> Subquery Scan on filtered_users_pre (cost=14591.56..14672.44 rows=13 width=21) (actual time=1.895..3.680 rows=2415 loops=1)
Filter: (filtered_users_pre.rk = 1)
Buffers: shared hit=105
-> WindowAgg (cost=14591.56..14638.74 rows=2696 width=29) (actual time=1.894..3.334 rows=2415 loops=1)
Buffers: shared hit=105
-> Sort (cost=14591.56..14598.30 rows=2696 width=21) (actual time=1.889..2.102 rows=2415 loops=1)
Sort Key: tag_table_20210722.value
Sort Method: quicksort Memory: 285kB
Buffers: shared hit=105
-> Bitmap Heap Scan on tag_table_20210722 (cost=146.24..14437.94 rows=2696 width=21) (actual time=0.604..1.046 rows=2415 loops=1)
" Recheck Cond: ((tag)::text = ANY ('{commercial,marketing}'::text[]))"
Filter: (_at_timestamp = 1626955200)
Rows Removed by Filter: 2415
Heap Blocks: exact=72
Buffers: shared hit=105
-> Bitmap Index Scan on tag_table_20210722_tag_idx (cost=0.00..145.57 rows=5428 width=0) (actual time=0.287..0.287 rows=4830 loops=1)
" Index Cond: ((tag)::text = ANY ('{commercial,marketing}'::text[]))"
Buffers: shared hit=33
Planning Time: 0.310 ms
Execution Time: 300.954 ms

The problem is that if there are no matching filtered_users, PostgreSQL has to go through all "users-results".ai_algo without finding 15 result rows. If the subquery contains elements, it quickly finds 15 matching "users-results".ai_algo rows and can terminate processing.
There is nothing you can do about that, but you can speed up the scan of "users-results".ai_algo. Currently, you have
-> Index Scan using ai_algo_202107_rank_timeframe_rank_idx on ai_algo_202107 aa
... (actual time=0.085..3885.547 rows=313611 loops=1)
Index Cond: (rank_timeframe = '7d'::"valid-users-timeframe")
Filter: (_at_timestamp = 1626955200)
Rows Removed by Filter: 7793096
Buffers: shared hit=7456533
You see that the index scan is not as effective as it could be: it reads 313611 + 7793096 = 8106707 rows from the table and discards all but the 313611 that match the filter condition.
You can do better by creating an index that can find only the result rows directly:
CREATE INDEX ON "users-results".ai_algo (rank_timeframe, _at_timestamp);
Then you can drop the index ai_algo_rank_timeframe_rank_idx, because the new index can do everything that the old one could do (and more).

Related

Inconsistent cold start query performance in Postgres query

We're having issues trying to build queries for a Postgres hosted datamart. Our query is simple, and contains a modest amount of data. We've seen some vast differences in the execution time of this query between runs- sometimes taking around 20 seconds, other times taking just 3 seconds- but we cannot seem to see what causes these differences and we're aiming to get consistent results. There are only 2 tables involved in the query, one representing order rows (OrderItemTransactionFact 2,937,264 rows) and the other recording current stock levels for each item (stocklevels 62,353 rows). There are no foreign keys due to this being a datamart which we run ETL processes against so require fast loading.
The query is;
select
oitf."SKUId",
sum(oitf."ConvertedLineTotal") as "totalrevenue",
sum(oitf."Quantity") as "quantitysold",
coalesce (sl."Available",0) as "availablestock"
from "OrderItemTransactionFact" oitf
left join stocklevels sl on sl."SKUId" = oitf."SKUId"
where
oitf."transactionTypeId" = 2
and oitf."hasComposite" = false
and oitf."ReceivedDate" >= extract(epoch from timestamp '2020-07-01 00:00:00')
and oitf."ReceivedDate" <= extract(epoch from timestamp '2021-10-01 00:00:00')
group by
oitf."SKUId", sl."Available"
order by oitf."SKUId";
The OrderItemTransactionFact table has a couple indexes;
create index IX_OrderItemTransactionFact_ReceivedDate on public."OrderItemTransactionFact" ("ReceivedDate" DESC);
create index IX_OrderItemTransactionFact_ReceivedDate_transactionTypeId on public."OrderItemTransactionFact" ("ReceivedDate" desc, "transactionTypeId");
Execution plan output for a 26 second run is;
GroupAggregate (cost=175096.24..195424.66 rows=813137 width=52) (actual time=24100.268..24874.065 rows=26591 loops=1)
Group Key: oitf."SKUId", sl."Available"
Buffers: shared hit=659 read=43311 written=1042
-> Sort (cost=175096.24..177129.08 rows=813137 width=19) (actual time=24100.249..24275.594 rows=916772 loops=1)
Sort Key: oitf."SKUId", sl."Available"
Sort Method: quicksort Memory: 95471kB
Buffers: shared hit=659 read=43311 written=1042
-> Hash Left Join (cost=20671.85..95274.08 rows=813137 width=19) (actual time=239.392..23127.993 rows=916772 loops=1)
Hash Cond: (oitf."SKUId" = sl."SKUId")
Buffers: shared hit=659 read=43311 written=1042
-> Bitmap Heap Scan on "OrderItemTransactionFact" oitf (cost=18091.90..73485.91 rows=738457 width=15) (actual time=200.178..22413.601 rows=701397 loops=1)
Recheck Cond: (("ReceivedDate" >= '1585699200'::double precision) AND ("ReceivedDate" <= '1625097600'::double precision))
Filter: ((NOT "hasComposite") AND ("transactionTypeId" = 2))
Rows Removed by Filter: 166349
Heap Blocks: exact=40419
Buffers: shared hit=55 read=42738 written=1023
-> Bitmap Index Scan on ix_orderitemtransactionfact_receiveddate (cost=0.00..17907.29 rows=853486 width=0) (actual time=191.274..191.274 rows=867746 loops=1)
Index Cond: (("ReceivedDate" >= '1585699200'::double precision) AND ("ReceivedDate" <= '1625097600'::double precision))
Buffers: shared hit=9 read=2365 written=181
-> Hash (cost=1800.53..1800.53 rows=62353 width=8) (actual time=38.978..38.978 rows=62353 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 2948kB
Buffers: shared hit=604 read=573 written=19
-> Seq Scan on stocklevels sl (cost=0.00..1800.53 rows=62353 width=8) (actual time=0.031..24.301 rows=62353 loops=1)
Buffers: shared hit=604 read=573 written=19
Planning Time: 0.543 ms
Execution Time: 24889.522 ms
But then execution plan for the same query when it took just 3 seconds;
GroupAggregate (cost=173586.52..193692.59 rows=804243 width=52) (actual time=2616.588..3220.394 rows=26848 loops=1)
Group Key: oitf."SKUId", sl."Available"
Buffers: shared hit=2 read=43929
-> Sort (cost=173586.52..175597.13 rows=804243 width=19) (actual time=2616.570..2813.571 rows=889937 loops=1)
Sort Key: oitf."SKUId", sl."Available"
Sort Method: quicksort Memory: 93001kB
Buffers: shared hit=2 read=43929
-> Hash Left Join (cost=20472.48..94701.25 rows=804243 width=19) (actual time=185.018..1512.626 rows=889937 loops=1)
Hash Cond: (oitf."SKUId" = sl."SKUId")
Buffers: shared hit=2 read=43929
-> Bitmap Heap Scan on "OrderItemTransactionFact" oitf (cost=17892.54..73123.18 rows=730380 width=15) (actual time=144.000..960.232 rows=689090 loops=1)
Recheck Cond: (("ReceivedDate" >= '1593561600'::double precision) AND ("ReceivedDate" <= '1633046400'::double precision))
Filter: ((NOT "hasComposite") AND ("transactionTypeId" = 2))
Rows Removed by Filter: 159949
Heap Blocks: exact=40431
Buffers: shared read=42754
-> Bitmap Index Scan on ix_orderitemtransactionfact_receiveddate (cost=0.00..17709.94 rows=844151 width=0) (actual time=134.806..134.806 rows=849039 loops=1)
Index Cond: (("ReceivedDate" >= '1593561600'::double precision) AND ("ReceivedDate" <= '1633046400'::double precision))
Buffers: shared read=2323
-> Hash (cost=1800.53..1800.53 rows=62353 width=8) (actual time=40.500..40.500 rows=62353 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 2948kB
Buffers: shared hit=2 read=1175
-> Seq Scan on stocklevels sl (cost=0.00..1800.53 rows=62353 width=8) (actual time=0.025..24.620 rows=62353 loops=1)
Buffers: shared hit=2 read=1175
Planning Time: 0.565 ms
Execution Time: 3235.300 ms
The server config is;
version: PostgreSQL 12.1, compiled by Visual C++ build 1914, 64-bit
work_mem : 1048576kb
shared_buffers : 16384 (x8kb)
Thanks in advance!

It is the filesystem cache. The slow one had to read the data off disk. The fast one just had to fetch the data from memory, probably because the slow one already read it and left it there. You can make this show up explicitly in the plans by turning on track_io_timing.
It should help a little to have an index on ("transactionTypeId","hasComposite","ReceivedDate"), perhaps a lot to crank up effective_io_concurrency (depending on your storage system).
But mostly, get faster disks.

Can I optimize this query with an index?

given this query:
SELECT count(u.*)
FROM res_users u
WHERE active=true AND
share=false AND
NOT exists(SELECT 1 FROM res_users_log WHERE create_uid=u.id);
It currently takes 10 seconds.
I tried to make it faster with these 2 index commands, but it didn't help.
CREATE INDEX CONCURRENTLY id_active_share_index ON res_users (id,active,share);
CREATE INDEX CONCURRENTLY create_uid_index ON res_users_log (create_uid);
I guess it's because of the "NOT exists" line, but I have no idea how to include it into an index.
EXPLAIN (ANALYZE, BUFFERS) gives me this output:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2815437.14..2815437.15 rows=1 width=8) (actual time=39174.365..39174.367 rows=1 loops=1)
Buffers: shared hit=124 read=112875 dirtied=70, temp read=98788 written=99211
-> Merge Anti Join (cost=2678572.70..2815437.09 rows=20 width=1064) (actual time=39174.360..39174.361 rows=0 loops=1)
Merge Cond: (u.id = res_users_log.create_uid)
Buffers: shared hit=124 read=112875 dirtied=70, temp read=98788 written=99211
-> Sort (cost=11.92..11.97 rows=20 width=1068) (actual time=5.577..5.602 rows=16 loops=1)
Sort Key: u.id
Sort Method: quicksort Memory: 79kB
Buffers: shared hit=53 read=5
-> Seq Scan on res_users u (cost=0.00..11.49 rows=20 width=1068) (actual time=0.050..5.519 rows=16 loops=1)
Filter: (active AND (NOT share))
Rows Removed by Filter: 33
Buffers: shared hit=49 read=5
-> Sort (cost=2678560.78..2716236.90 rows=15070449 width=4) (actual time=36258.796..38013.471 rows=15069209 loops=1)
Sort Key: res_users_log.create_uid
Sort Method: external merge Disk: 206464kB
Buffers: shared hit=71 read=112870 dirtied=70, temp read=98788 written=99211
-> Seq Scan on res_users_log (cost=0.00..263645.49 rows=15070449 width=4) (actual time=1.755..29961.086 rows=15069319 loops=1)
Buffers: shared hit=71 read=112870 dirtied=70
Planning Time: 0.889 ms
Execution Time: 39202.694 ms
(21 rows)

For this query:
SELECT count(*)
FROM res_users u
WHERE active = true AND
share = false AND
NOT exists (SELECT 1 FROM res_users_log rul WHERE rul.create_uid = u.id);
You want indexes on:
res_users(active, share, id)
res_users_log(create_uid)
Note that the ordering of the columns matters.

This index will make the query fast as lightning:
CREATE INDEX ON res_users_log (create_uid);

How do i reduce the cost of this query while keeping the query results the same?

I have the below query running on a postgres and sqlserver DB (Use top for SQL server). The sorting of the "change_sequence" value is causing a high cost in my query, is there any way to reduce the cost but maintain the same results?
Query:
SELECT tablename,
CAST(primary_key_values AS VARCHAR),
primary_key_fields,
CAST(min_sequence AS NUMERIC),
_changed_fieldlist,
_operation,
min_sequence
FROM (
SELECT 'memdep' AS tablename,
CONCAT_WS(',',dependant,mem_num) AS primary_key_values,
'dependant,mem_num,' AS primary_key_fields,
_change_sequence AS min_sequence,
ROW_NUMBER() OVER(partition by dependant,mem_num order by _change_sequence) AS rn,
_changed_fieldlist,
_operation
FROM mipbi_ods.memdep
WHERE mipbi_status = 'NEW'
) main
WHERE rn = 1
LIMIT 100
In essence what i'm looking for is the records from "memdep" where they have a "mipbi_status" of 'NEW' with the lowest "_change_sequence". Ive tried using a MIN() function instead of the ROW_NUMBER the speed is about the same cost is about 5 more.
Is there a way to reduce the cost/speed of the query. I have around 400 million records in this table if that helps.
Here is the query explained:
Limit (cost=3080.03..3080.53 rows=100 width=109) (actual time=17.633..17.648 rows=35 loops=1)
-> Unique (cost=3080.03..3089.04 rows=1793 width=109) (actual time=17.632..17.644 rows=35 loops=1)
-> Sort (cost=3080.03..3084.53 rows=1803 width=109) (actual time=17.631..17.634 rows=36 loops=1)
Sort Key: (concat_ws(','::text, memdet.mem_num))
Sort Method: quicksort Memory: 29kB
-> Bitmap Heap Scan on memdet (cost=54.39..2982.52 rows=1803 width=109) (actual time=16.853..17.542 rows=36 loops=1)
Recheck Cond: ((mipbi_status)::text = 'NEW'::text)
Heap Blocks: exact=8
-> Bitmap Index Scan on idx_mipbi_status_memdet (cost=0.00..53.94 rows=1803 width=0) (actual time=10.396..10.396 rows=38 loops=1)
Index Cond: ((mipbi_status)::text = 'NEW'::text)
Planning time: 0.201 ms
Execution time: 17.700 ms
I'm using a smaller table to show here, this isn't the 400 million record table, but indexes and all will be the same.
Here is the query plan for the large table:
Limit (cost=47148422.27..47149122.27 rows=100 width=113) (actual time=2407976.293..2407977.112 rows=100 loops=1)
Output: main.tablename, ((main.primary_key_values)::character varying), main.primary_key_fields, main.min_sequence, main._changed_fieldlist, main._operation, main.min_sequence
Buffers: shared hit=6269554 read=12205028 dirtied=1893 written=4566983, temp read=443831 written=1016025
-> Subquery Scan on main (cost=47148422.27..52102269.25 rows=707692 width=113) (actual time=2407976.292..2407977.100 rows=100 loops=1)
Output: main.tablename, (main.primary_key_values)::character varying, main.primary_key_fields, main.min_sequence, main._changed_fieldlist, main._operation, main.min_sequence
Filter: (main.rn = 1)
Buffers: shared hit=6269554 read=12205028 dirtied=1893 written=4566983, temp read=443831 written=1016025
-> WindowAgg (cost=47148422.27..50333038.19 rows=141538485 width=143) (actual time=2407976.288..2407977.080 rows=100 loops=1)
Output: 'claim', concat_ws(','::text, claim.gen_claimnum), 'gen_claimnum,', claim._change_sequence, row_number() OVER (?), claim._changed_fieldlist, claim._operation, claim.gen_claimnum
Buffers: shared hit=6269554 read=12205028 dirtied=1893 written=4566983, temp read=443831 written=1016025
-> Sort (cost=47148422.27..47502268.49 rows=141538485 width=39) (actual time=2407976.236..2407976.905 rows=100 loops=1)
Output: claim._change_sequence, claim.gen_claimnum, claim._changed_fieldlist, claim._operation
Sort Key: claim.gen_claimnum, claim._change_sequence
Sort Method: external merge Disk: 4588144kB
Buffers: shared hit=6269554 read=12205028 dirtied=1893 written=4566983, temp read=443831 written=1016025
-> Seq Scan on mipbi_ods.claim (cost=0.00..20246114.01 rows=141538485 width=39) (actual time=0.028..843181.418 rows=88042077 loops=1)
Output: claim._change_sequence, claim.gen_claimnum, claim._changed_fieldlist, claim._operation
Filter: ((claim.mipbi_status)::text = 'NEW'::text)
Rows Removed by Filter: 356194
Buffers: shared hit=6269554 read=12205028 dirtied=1893 written=4566983
Planning time: 8.796 ms
Execution time: 2408702.464 ms

Why my query cost is so high?

When I execute explain analyze on some query I've got the normal cost from some low value to some higher value. But when I'm trying to force to use the index on table by switching enable_seqscan to false, the query cost jumps to insane values like:
Merge Join (cost=10064648609.460..10088218360.810 rows=564249 width=21) (actual time=341699.323..370702.969 rows=3875328 loops=1)
Merge Cond: ((foxtrot.two = ((five_hotel.two)::numeric)) AND (foxtrot.alpha_two07 = ((five_hotel.alpha_two07)::numeric)))
-> Merge Append (cost=10000000000.580..10023064799.260 rows=23522481 width=24) (actual time=0.049..19455.320 rows=23522755 loops=1)
Sort Key: foxtrot.two, foxtrot.alpha_two07
-> Sort (cost=10000000000.010..10000000000.010 rows=1 width=76) (actual time=0.005..0.005 rows=0 loops=1)
Sort Key: foxtrot.two, foxtrot.alpha_two07
Sort Method: quicksort Memory: 25kB
-> Seq Scan on foxtrot (cost=10000000000.000..10000000000.000 rows=1 width=76) (actual time=0.001..0.001 rows=0 loops=1)
Filter: (kilo_sierra_oscar = 'oscar'::date)
-> Index Scan using alpha_five on five_uniform (cost=0.560..22770768.220 rows=23522480 width=24) (actual time=0.043..17454.619 rows=23522755 loops=1)
Filter: (kilo_sierra_oscar = 'oscar'::date)
As you can see I'm trying to retrive values by index, so they doesn't need to be sorted once they're loaded.
It is a simple query:
select *
from foxtrot a
where foxtrot.kilo_sierra_oscar = date'2015-01-01'
order by foxtrot.two, foxtrot.alpha_two07
Index scan: "Execution time: 19009.569 ms"
Sequential scan: "Execution time: 127062.802 ms"
Setting the enable_seqscan to false improves execution time of query, but I would like optimizer to calculate that.
EDIT:
Seq plan with buffers:
Sort (cost=4607555.110..4666361.310 rows=23522481 width=24) (actual time=101094.754..120740.190 rows=23522756 loops=1)
Sort Key: foxtrot.two, foxtrot.alpha07
Sort Method: external merge Disk: 805304kB
Buffers: shared hit=468690, temp read=100684 written=100684
-> Append (cost=0.000..762721.000 rows=23522481 width=24) (actual time=0.006..12018.725 rows=23522756 loops=1)
Buffers: shared hit=468690
-> Seq Scan on foxtrot (cost=0.000..0.000 rows=1 width=76) (actual time=0.001..0.001 rows=0 loops=1)
Filter: (kilo = 'oscar'::date)
-> Seq Scan on foxtrot (cost=0.000..762721.000 rows=23522480 width=24) (actual time=0.005..9503.851 rows=23522756 loops=1)
Filter: (kilo = 'oscar'::date)
Buffers: shared hit=468690
Index plan with buffers:
Merge Append (cost=10000000000.580..10023064799.260 rows=23522481 width=24) (actual time=0.046..19302.855 rows=23522756 loops=1)
Sort Key: foxtrot.two, foxtrot.alpha_two07
Buffers: shared hit=17855133 -> Sort (cost=10000000000.010..10000000000.010 rows=1 width=76) (actual time=0.009..0.009 rows=0 loops=1)
Sort Key: foxtrot.two, foxtrot.alpha_two07
Sort Method: quicksort Memory: 25kB
-> Seq Scan on foxtrot (cost=10000000000.000..10000000000.000 rows=1 width=76) (actual time=0.000..0.000 rows=0 loops=1)
Filter: (kilo = 'oscar'::date)
-> Index Scan using alpha_five on five (cost=0.560..22770768.220 rows=23522480 width=24) (actual time=0.036..17035.903 rows=23522756 loops=1)
Filter: (kilo = 'oscar'::date)
Buffers: shared hit=17855133
Why the cost of the query jumps so high? How can I avoid it?

The high cost is a direct consequence of set enable_seqscan=false.
The planner implements this "hint" by setting an arbitrary super-high cost (10 000 000 000) to the sequential scan technique. Then it computes the different potential execution strategies with their associated costs.
If the best result still has a super-high cost, it means that the planner found no strategy to avoid the sequential scan, even when trying at all costs.
In the plan shown in the question under "Index plan with buffers" this happens at the Seq Scan on foxtrot node.

Slow SQL execution when querying from view

I execute select * from a view on 3 tables (gis_wcdma_cells, wcdma_cells, wcdma_cells_statistic):
CREATE OR REPLACE VIEW gisview_wcdma_cell_statistic AS
SELECT gis_wcdma_cells.wcdma_cell_id,
gis_wcdma_cells.geometry,
wcdma_cells."Cellname",
wcdma_cells."Longitude",
wcdma_cells."Latitude",
wcdma_cells."Orientation",
wcdma_cell_statistic.avg_ecio,
wcdma_cell_statistic.avg_rssi,
wcdma_cell_statistic.message_count,
wcdma_cell_statistic.measurement_count
FROM gis_wcdma_cells,
wcdma_cells
LEFT JOIN wcdma_cell_statistic ON wcdma_cells.id = wcdma_cell_statistic.cell_id
WHERE gis_wcdma_cells.wcdma_cell_id = wcdma_cells.id;
However, I find it's quite slow although I have created indexes for 3 tables in my view:
hash index created in wcdma_cells.id
wcdma_cell_statistic.cell_id
gis_wcdma_cells.wcdma_cell_id
I find select * from view is much slower than select specific columns from the view. While SQL explanation for select * looks faster (I mean the cost is much lower in select * case.). Could you help to explain why? And if there is any suggestion to make the query faster?
SQL explaination for "select *" from view
Nested Loop Left Join (cost=0.00..11501.93 rows=41036 width=479) (actual time=0.034..326.680 rows=41036 loops=1)
Buffers: shared hit=238761
-> Nested Loop (cost=0.00..9017.44 rows=41036 width=467) (actual time=0.026..183.752 rows=41036 loops=1)
Buffers: shared hit=125390
-> Seq Scan on gis_wcdma_cells (cost=0.00..2690.36 rows=41036 width=396) (actual time=0.007..13.359 rows=41036 loops=1)
Buffers: shared hit=2280
-> Index Scan using wcdma_cells_id_idx on wcdma_cells (cost=0.00..0.14 rows=1 width=71) (actual time=0.002..0.003 rows=1 loops=41036)
Index Cond: (id = gis_wcdma_cells.wcdma_cell_id)
Rows Removed by Index Recheck: 0
Buffers: shared hit=123110
-> Index Scan using wcdma_cell_statistic_cell_id_idx on wcdma_cell_statistic (cost=0.00..0.05 rows=1 width=20) (actual time=0.002..0.002 rows=1 loops=41036)
Index Cond: (wcdma_cells.id = cell_id)
Rows Removed by Index Recheck: 0
Buffers: shared hit=113371
Total runtime: 334.551 ms
SQL explaination for select specific columns from my view
Nested Loop Left Join (cost=2340.31..9007.59 rows=41036 width=58) (actual time=46.498..263.589 rows=41036 loops=1)
Buffers: shared hit=116667, temp read=400 written=394
-> Hash Join (cost=2340.31..6523.10 rows=41036 width=58) (actual time=46.471..119.983 rows=41036 loops=1)
Hash Cond: (gis_wcdma_cells.wcdma_cell_id = wcdma_cells.id)
Buffers: shared hit=3296, temp read=400 written=394
-> Seq Scan on gis_wcdma_cells (cost=0.00..2690.36 rows=41036 width=4) (actual time=0.007..14.688 rows=41036 loops=1)
Buffers: shared hit=2280
-> Hash (cost=1426.36..1426.36 rows=41036 width=54) (actual time=46.358..46.358 rows=41036 loops=1)
Buckets: 2048 Batches: 4 Memory Usage: 927kB
Buffers: shared hit=1016, temp written=299
-> Seq Scan on wcdma_cells (cost=0.00..1426.36 rows=41036 width=54) (actual time=0.004..24.612 rows=41036 loops=1)
Buffers: shared hit=1016
-> Index Scan using wcdma_cell_statistic_cell_id_idx on wcdma_cell_statistic (cost=0.00..0.05 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=41036)
Index Cond: (wcdma_cells.id = cell_id)
Rows Removed by Index Recheck: 0
Buffers: shared hit=113371
Total runtime: 271.146 ms
I am running the SQL query in Navicat or PgAdmin tool. The query execution time is: 15 sec for selecting specific column and a few minutes for select *.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Postgresql Query slow if empty table in IN clause - sql

Related

Inconsistent cold start query performance in Postgres query

Can I optimize this query with an index?

How do i reduce the cost of this query while keeping the query results the same?

Why my query cost is so high?

Slow SQL execution when querying from view

Categories

Resources