SELECT DISTINCT very slow - SQL

I have a table where I store rows with external IDs. Quite often I need to select the latest timestamp for a given set of external IDs, and this is now a bottleneck in my app.
Query:
SELECT DISTINCT ON ("T1"."external_id") "T1"."external_id", "T1"."timestamp"
FROM "T1"
WHERE "T1"."external_id" IN ('825889935', '825904511')
ORDER BY "T1"."external_id" ASC, "T1"."timestamp" DESC
Explain:
Unique (cost=169123.13..169123.19 rows=12 width=18) (actual time=1327.443..1334.118 rows=2 loops=1)
-> Sort (cost=169123.13..169123.16 rows=12 width=18) (actual time=1327.441..1334.112 rows=2 loops=1)
Sort Key: external_id, timestamp DESC
Sort Method: quicksort Memory: 25kB
-> Gather (cost=1000.00..169122.91 rows=12 width=18) (actual time=752.577..1334.056 rows=2 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on T1 (cost=0.00..168121.71 rows=5 width=18) (actual time=921.649..1300.556 rows=1 loops=3)
Filter: ((external_id)::text = ANY ('{825889935,825904511}'::text[]))
Rows Removed by Filter: 1168882
Planning Time: 0.592 ms
Execution Time: 1334.159 ms
What could I do to make this query faster? Or should I use a completely different query instead?
UPDATE:
Added a new query plan, as @jahrl asked. It looks like the query is faster, but the previous plan was captured under load and the query now runs in a similar time.
Finalize GroupAggregate (cost=169121.80..169123.21 rows=12 width=18) (actual time=321.009..322.410 rows=2 loops=1)
Group Key: external_id
-> Gather Merge (cost=169121.80..169123.04 rows=10 width=18) (actual time=321.003..322.403 rows=2 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=168121.77..168121.86 rows=5 width=18) (actual time=318.671..318.672 rows=1 loops=3)
Group Key: external_id
-> Sort (cost=168121.77..168121.78 rows=5 width=18) (actual time=318.664..318.665 rows=1 loops=3)
Sort Key: external_id
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: quicksort Memory: 25kB
Worker 1: Sort Method: quicksort Memory: 25kB
-> Parallel Seq Scan on T1 (cost=0.00..168121.71 rows=5 width=18) (actual time=144.338..318.611 rows=1 loops=3)
Filter: ((external_id)::text = ANY ('{825889935,825904511}'::text[]))
Rows Removed by Filter: 1170827
Planning Time: 0.093 ms
Execution Time: 322.441 ms

Perhaps a basic GROUP BY query will perform better?
SELECT "T1"."external_id", MAX("T1"."timestamp") as "timestamp"
FROM "T1"
WHERE "T1"."external_id" IN ('825889935', '825904511')
GROUP BY "T1"."external_id"
ORDER BY "T1"."external_id" ASC
And, as @melcher said, don't forget an ("external_id", "timestamp") index!
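For example (the index name is illustrative; DESC on "timestamp" matches the query's ORDER BY, so the sort can be satisfied directly by the index):
CREATE INDEX t1_external_id_timestamp_idx ON "T1" ("external_id", "timestamp" DESC);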

Look at the number of rows removed by the filter and create an index on external_id.
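For example (index name illustrative):
CREATE INDEX t1_external_id_idx ON "T1" ("external_id");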

Totalling Unique Serial numbers every week

I'm trying to write a query that searches my database and, on a weekly basis, finds the total number of unique serial numbers of devices. My current code is:
SELECT date_part('week', "timestamp") , count(DISTINCT serialno)
FROM eddi_minute em
GROUP BY date_part('week', "timestamp")
Unfortunately, the dataset I'm searching is huge (~600 GB), so it's taking an incredibly long time. I want to search once a week, every week, but only over a short window, i.e. one minute, like:
select count(distinct serialno) as Devices
from eddi_minute em where "timestamp" >= '2021-06-23 00:01:00' and "timestamp" < '2021-06-23 00:02:00';
but for every week over a whole year, so that I can press Enter once, have it run over the whole database, and avoid counting unnecessarily.
In an ideal world, my idea would be to create a table of the times I want to search and then do a left join between that and my database to cut down on the data I'm searching, but I only have read permissions on the server, so that is not an option. Is there an easy way I can do this? Apologies if anything here is unclear; I'll elaborate if any of it is not properly explained.
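(As an illustration, a minimal sketch of that idea that needs no write access: generate_series produces the weekly one-minute windows as a derived rowset, so no table has to be created; the date range and window start below are placeholders based on the example above.)
select w.week_start::date as week, count(distinct em.serialno) as devices
from generate_series(timestamp '2021-01-06 00:01:00',
                     timestamp '2021-12-29 00:01:00',
                     interval '1 week') as w(week_start)
left join eddi_minute em
       on em."timestamp" >= w.week_start
      and em."timestamp" <  w.week_start + interval '1 minute'
group by w.week_start
order by w.week_start;
With an index on "timestamp" (see the answer below), each weekly probe becomes a small index range scan rather than a full scan.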
The indexes for the table are
CREATE UNIQUE INDEX "PK_4c94f05e4de575488f4a0c2905d" ON ONLY public.eddi_minute USING btree (serialno, "timestamp")
The explain analyse result was:
GroupAggregate (cost=41219561.55..90787854.96 rows=200 width=16) (actual time=7065790.406..8172419.446 rows=53 loops=1)
Group Key: (date_part('week'::text, em."timestamp"))
-> Gather Merge (cost=41219561.55..88747442.16 rows=408082059 width=16) (actual time=7052726.256..7834672.575 rows=408057194 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=41218561.53..41643646.99 rows=170034187 width=16) (actual time=6956066.331..7201252.404 rows=136019065 loops=3)
Sort Key: (date_part('week'::text, em."timestamp"))
Sort Method: external merge Disk: 3368720kB
Worker 0: Sort Method: external merge Disk: 3640792kB
Worker 1: Sort Method: external merge Disk: 3371808kB
-> Parallel Append (cost=0.00..9256242.79 rows=170034187 width=16) (actual time=0.435..2825202.379 rows=136019065 loops=3)
-> Parallel Seq Scan on eddi_minute_p2021_05 em_11 (cost=0.00..1725776.58 rows=34898767 width=16) (actual time=0.011..1722528.987 rows=83740195 loops=1)
-> Parallel Seq Scan on eddi_minute_p2021_06 em_12 (cost=0.00..1488905.33 rows=30102507 width=16) (actual time=1.266..1488189.219 rows=72252984 loops=1)
-> Parallel Seq Scan on eddi_minute_p2021_04 em_10 (cost=0.00..1428581.36 rows=28905149 width=16) (actual time=149.934..1290294.249 rows=69366177 loops=1)
-> Parallel Seq Scan on eddi_minute_p2021_03 em_9 (cost=0.00..1290438.50 rows=26110040 width=16) (actual time=69.475..483281.530 rows=20887814 loops=3)
-> Parallel Seq Scan on eddi_minute_p2021_02 em_8 (cost=0.00..922294.02 rows=18661202 width=16) (actual time=195.734..931653.840 rows=44786882 loops=1)
-> Parallel Seq Scan on eddi_minute_p2021_01 em_7 (cost=0.00..823415.96 rows=16660557 width=16) (actual time=102.708..834900.144 rows=39985282 loops=1)
-> Parallel Seq Scan on eddi_minute_p2020_12 em_6 (cost=0.00..293130.95 rows=5931036 width=16) (actual time=182.465..296634.818 rows=14234537 loops=1)
-> Parallel Seq Scan on eddi_minute_p2020_11 em_5 (cost=0.00..111271.35 rows=2251388 width=16) (actual time=195.367..110910.685 rows=5403366 loops=1)
-> Parallel Seq Scan on eddi_minute_p2020_10 em_4 (cost=0.00..105311.10 rows=2130808 width=16) (actual time=146.920..109340.586 rows=5113938 loops=1)
-> Parallel Seq Scan on eddi_minute_p2020_09 em_3 (cost=0.00..93692.39 rows=1895711 width=16) (actual time=87.456..94169.812 rows=4549714 loops=1)
-> Parallel Seq Scan on eddi_minute_p2020_08 em_2 (cost=0.00..86189.97 rows=1743918 width=16) (actual time=0.007..88029.891 rows=4185403 loops=1)
-> Parallel Seq Scan on eddi_minute_p2020_07 em_1 (cost=0.00..33400.45 rows=675796 width=16) (actual time=1.046..14190.279 rows=1621911 loops=1)
-> Parallel Seq Scan on eddi_minute_p2021_07 em_13 (cost=0.00..3438.66 rows=88773 width=16) (actual time=0.006..51.229 rows=150887 loops=1)
-> Parallel Seq Scan on eddi_minute_default em_26 (cost=0.00..45.20 rows=1456 width=16) (actual time=0.016..0.639 rows=2477 loops=1)
-> Parallel Seq Scan on eddi_minute_p2021_08 em_14 (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.000 rows=0 loops=1)
-> Parallel Seq Scan on eddi_minute_p2021_09 em_15 (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.515 rows=0 loops=1)
-> Parallel Seq Scan on eddi_minute_p2021_10 em_16 (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.000 rows=0 loops=1)
-> Parallel Seq Scan on eddi_minute_p2021_11 em_17 (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.000 rows=0 loops=1)
-> Parallel Seq Scan on eddi_minute_p2021_12 em_18 (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.000 rows=0 loops=1)
-> Parallel Seq Scan on eddi_minute_p2022_01 em_19 (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.000 rows=0 loops=1)
-> Parallel Seq Scan on eddi_minute_p2022_02 em_20 (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.000 rows=0 loops=1)
-> Parallel Seq Scan on eddi_minute_p2022_03 em_21 (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.001 rows=0 loops=1)
-> Parallel Seq Scan on eddi_minute_p2022_04 em_22 (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.000 rows=0 loops=1)
-> Parallel Seq Scan on eddi_minute_p2022_05 em_23 (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.000 rows=0 loops=1)
-> Parallel Seq Scan on eddi_minute_p2022_06 em_24 (cost=0.00..15.00 rows=400 width=16) (actual time=0.000..0.000 rows=0 loops=1)
-> Parallel Seq Scan on eddi_minute_p2022_07 em_25 (cost=0.00..15.00 rows=400 width=16) (actual time=0.002..0.003 rows=0 loops=1)
Planning Time: 35.809 ms
Execution Time: 8172556.078 ms
A few thoughts:
Although "timestamp" is a valid column name, it is considered bad practice to use reserved words for object names. It might seem harmless, but it can get pretty annoying in the long run.
I believe an index on the column "timestamp" should significantly improve the performance of the second query:
CREATE INDEX idx_timestamp ON eddi_minute ("timestamp");
Regarding the first query: considering you have a 600 GB (!) table, it might be worth creating an expression index on the column "timestamp", so that the timestamps are indexed by the value you will use in your queries, e.g. the week:
CREATE INDEX idx_timestamp_week ON eddi_minute (date_part('week', "timestamp"));
(Note that the planner can only use an expression index when the query uses the exact same expression, here date_part('week', "timestamp").)
Note: although indexes speed up queries, they slow down other operations, like inserts, updates and deletes. If you create new indexes, test the performance of all relevant operations.

Why does Postgres EXPLAIN ANALYSE report a huge performance difference compared to real query execution?

I have been tasked with rewriting some low-performance SQL in our system, for which I have this query:
select
    "aggtable".id as t_id,
    count(joined.packages)::integer as t_package_count,
    sum(coalesce((joined.packages ->> 'weight'::text)::double precision, 0::double precision)) as t_total_weight
from "aggtable"
join (
    select
        "unnested".myid,
        json_array_elements("jsontable".jsondata) as packages
    from (
        select distinct unnest("tounnest".arrayofid) as myid
        from "aggtable" "tounnest"
    ) "unnested"
    join "jsontable"
        on "jsontable".id = "unnested".myid
) joined
    on joined.myid = any("aggtable".arrayofid)
group by "aggtable".id
The EXPLAIN ANALYSE result is:
Sort Method: quicksort Memory: 611kB
-> Nested Loop (cost=30917.16..31333627.69 rows=27270 width=69) (actual time=4.028..2054.470 rows=3658 loops=1)
Join Filter: ((unnest(tounnest.arrayofid)) = ANY (aggtable.arrayofid))
Rows Removed by Join Filter: 9055436
-> ProjectSet (cost=30917.16..36645.61 rows=459000 width=48) (actual time=3.258..13.846 rows=3322 loops=1)
-> Hash Join (cost=30917.16..34316.18 rows=4590 width=55) (actual time=3.246..7.079 rows=1661 loops=1)
Hash Cond: ((unnest(tounnest.arrayofid)) = jsontable.id)
-> Unique (cost=30726.88..32090.38 rows=144700 width=16) (actual time=1.901..3.720 rows=1664 loops=1)
-> Sort (cost=30726.88..31408.63 rows=272700 width=16) (actual time=1.900..2.711 rows=1845 loops=1)
Sort Key: (unnest(tounnest.arrayofid))
Sort Method: quicksort Memory: 135kB
-> ProjectSet (cost=0.00..1444.22 rows=272700 width=16) (actual time=0.011..1.110 rows=1845 loops=1)
-> Seq Scan on aggtable tounnest (cost=0.00..60.27 rows=2727 width=30) (actual time=0.007..0.311 rows=2727 loops=1)
-> Hash (cost=132.90..132.90 rows=4590 width=55) (actual time=1.328..1.329 rows=4590 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 454kB
-> Seq Scan on jsontable (cost=0.00..132.90 rows=4590 width=55) (actual time=0.006..0.497 rows=4590 loops=1)
-> Materialize (cost=0.00..73.91 rows=2727 width=67) (actual time=0.000..0.189 rows=2727 loops=3322)
-> Seq Scan on aggtable (cost=0.00..60.27 rows=2727 width=67) (actual time=0.012..0.317 rows=2727 loops=1)
Planning Time: 0.160 ms
Execution Time: 2065.268 ms
I tried rewriting this query from scratch, both to profile performance and to understand the original intention:
select
    joined.joinid,
    count(joined.packages)::integer as t_package_count,
    sum(coalesce((joined.packages ->> 'weight'::text)::double precision, 0::double precision)) as t_total_weight
from (
    select
        joinid,
        json_array_elements(jsondata) as packages
    from (
        select distinct unnest(at2.arrayofid) as joinid, at2.id as rootid
        from aggtable at2
    ) unnested
    join jsontable jt
        on jt.id = unnested.joinid
) joined
group by joined.joinid
for which EXPLAIN ANALYSE returns:
HashAggregate (cost=873570.28..873572.78 rows=200 width=28) (actual time=18.379..18.741 rows=1661 loops=1)
Group Key: (unnest(at2.arrayofid))
-> ProjectSet (cost=44903.16..191820.28 rows=27270000 width=48) (actual time=3.019..14.684 rows=3658 loops=1)
-> Hash Join (cost=44903.16..53425.03 rows=272700 width=55) (actual time=3.010..4.999 rows=1829 loops=1)
Hash Cond: ((unnest(at2.arrayofid)) = jt.id)
-> Unique (cost=44712.88..46758.13 rows=272700 width=53) (actual time=1.825..2.781 rows=1845 loops=1)
-> Sort (cost=44712.88..45394.63 rows=272700 width=53) (actual time=1.824..2.135 rows=1845 loops=1)
Sort Key: (unnest(at2.arrayofid)), at2.id
Sort Method: quicksort Memory: 308kB
-> ProjectSet (cost=0.00..1444.22 rows=272700 width=53) (actual time=0.009..1.164 rows=1845 loops=1)
-> Seq Scan on aggtable at2 (cost=0.00..60.27 rows=2727 width=67) (actual time=0.005..0.311 rows=2727 loops=1)
-> Hash (cost=132.90..132.90 rows=4590 width=55) (actual time=1.169..1.169 rows=4590 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 454kB
-> Seq Scan on jsontable jt (cost=0.00..132.90 rows=4590 width=55) (actual time=0.007..0.462 rows=4590 loops=1)
Planning Time: 0.144 ms
Execution Time: 18.889 ms
I see a huge difference in query performance (20 ms vs. 2000 ms) as evaluated by Postgres. However, the real query performance is nowhere near that different (the fast one takes about 500 ms and the slow one about 1 s).
My questions:
1. Is it normal for EXPLAIN to show such a drastic difference in performance when the real-life difference is much smaller?
2. Is the second, optimized query correct? What did the first query do wrong?
I also supply the credentials to a sample database so that everyone can try the queries out:
postgres://birylwwg:X6EM3Al9Jhqzz0w6EaSSx79pa4aXRBZq@arjuna.db.elephantsql.com:5432/birylwwg
The password is:
X6EM3Al9Jhqzz0w6EaSSx79pa4aXRBZq

Can I optimize this query with an index?

Given this query:
SELECT count(u.*)
FROM res_users u
WHERE active=true AND
share=false AND
NOT exists(SELECT 1 FROM res_users_log WHERE create_uid=u.id);
It currently takes 10 seconds.
I tried to make it faster with these two index commands, but they didn't help.
CREATE INDEX CONCURRENTLY id_active_share_index ON res_users (id,active,share);
CREATE INDEX CONCURRENTLY create_uid_index ON res_users_log (create_uid);
I guess it's because of the NOT EXISTS clause, but I have no idea how to cover it with an index.
EXPLAIN (ANALYZE, BUFFERS) gives me this output:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2815437.14..2815437.15 rows=1 width=8) (actual time=39174.365..39174.367 rows=1 loops=1)
Buffers: shared hit=124 read=112875 dirtied=70, temp read=98788 written=99211
-> Merge Anti Join (cost=2678572.70..2815437.09 rows=20 width=1064) (actual time=39174.360..39174.361 rows=0 loops=1)
Merge Cond: (u.id = res_users_log.create_uid)
Buffers: shared hit=124 read=112875 dirtied=70, temp read=98788 written=99211
-> Sort (cost=11.92..11.97 rows=20 width=1068) (actual time=5.577..5.602 rows=16 loops=1)
Sort Key: u.id
Sort Method: quicksort Memory: 79kB
Buffers: shared hit=53 read=5
-> Seq Scan on res_users u (cost=0.00..11.49 rows=20 width=1068) (actual time=0.050..5.519 rows=16 loops=1)
Filter: (active AND (NOT share))
Rows Removed by Filter: 33
Buffers: shared hit=49 read=5
-> Sort (cost=2678560.78..2716236.90 rows=15070449 width=4) (actual time=36258.796..38013.471 rows=15069209 loops=1)
Sort Key: res_users_log.create_uid
Sort Method: external merge Disk: 206464kB
Buffers: shared hit=71 read=112870 dirtied=70, temp read=98788 written=99211
-> Seq Scan on res_users_log (cost=0.00..263645.49 rows=15070449 width=4) (actual time=1.755..29961.086 rows=15069319 loops=1)
Buffers: shared hit=71 read=112870 dirtied=70
Planning Time: 0.889 ms
Execution Time: 39202.694 ms
(21 rows)
For this query:
SELECT count(*)
FROM res_users u
WHERE active = true AND
share = false AND
NOT exists (SELECT 1 FROM res_users_log rul WHERE rul.create_uid = u.id);
You want indexes on:
res_users(active, share, id)
res_users_log(create_uid)
Note that the ordering of the columns matters.
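For example (the index name is illustrative; the res_users_log index appears verbatim in the next answer):
CREATE INDEX res_users_active_share_id_idx ON res_users (active, share, id);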
This index will make the query fast as lightning:
CREATE INDEX ON res_users_log (create_uid);

How do I reduce the cost of this query while keeping the query results the same?

I have the below query running on Postgres and SQL Server databases (use TOP instead of LIMIT for SQL Server). The sort on the "_change_sequence" value causes a high cost in my query; is there any way to reduce the cost while maintaining the same results?
Query:
SELECT tablename,
CAST(primary_key_values AS VARCHAR),
primary_key_fields,
CAST(min_sequence AS NUMERIC),
_changed_fieldlist,
_operation,
min_sequence
FROM (
SELECT 'memdep' AS tablename,
CONCAT_WS(',',dependant,mem_num) AS primary_key_values,
'dependant,mem_num,' AS primary_key_fields,
_change_sequence AS min_sequence,
ROW_NUMBER() OVER(partition by dependant,mem_num order by _change_sequence) AS rn,
_changed_fieldlist,
_operation
FROM mipbi_ods.memdep
WHERE mipbi_status = 'NEW'
) main
WHERE rn = 1
LIMIT 100
In essence, what I'm looking for is the records from "memdep" that have a "mipbi_status" of 'NEW' with the lowest "_change_sequence". I've tried using a MIN() function (sketched below) instead of ROW_NUMBER(); the speed is about the same and the cost is about 5 higher.
Is there a way to reduce the cost/speed of the query? I have around 400 million records in this table, if that helps.
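(For reference, a minimal sketch of that MIN() variant; note that a plain GROUP BY cannot also return _changed_fieldlist and _operation, which is presumably why ROW_NUMBER() was used in the first place:)
SELECT 'memdep' AS tablename,
       CONCAT_WS(',', dependant, mem_num) AS primary_key_values,
       'dependant,mem_num,' AS primary_key_fields,
       MIN(_change_sequence) AS min_sequence
FROM mipbi_ods.memdep
WHERE mipbi_status = 'NEW'
GROUP BY dependant, mem_num
LIMIT 100;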
Here is the original ROW_NUMBER() query explained:
Limit (cost=3080.03..3080.53 rows=100 width=109) (actual time=17.633..17.648 rows=35 loops=1)
-> Unique (cost=3080.03..3089.04 rows=1793 width=109) (actual time=17.632..17.644 rows=35 loops=1)
-> Sort (cost=3080.03..3084.53 rows=1803 width=109) (actual time=17.631..17.634 rows=36 loops=1)
Sort Key: (concat_ws(','::text, memdet.mem_num))
Sort Method: quicksort Memory: 29kB
-> Bitmap Heap Scan on memdet (cost=54.39..2982.52 rows=1803 width=109) (actual time=16.853..17.542 rows=36 loops=1)
Recheck Cond: ((mipbi_status)::text = 'NEW'::text)
Heap Blocks: exact=8
-> Bitmap Index Scan on idx_mipbi_status_memdet (cost=0.00..53.94 rows=1803 width=0) (actual time=10.396..10.396 rows=38 loops=1)
Index Cond: ((mipbi_status)::text = 'NEW'::text)
Planning time: 0.201 ms
Execution time: 17.700 ms
I'm using a smaller table for the plan above; it isn't the 400-million-record table, but the indexes and everything else are the same.
Here is the query plan for the large table:
Limit (cost=47148422.27..47149122.27 rows=100 width=113) (actual time=2407976.293..2407977.112 rows=100 loops=1)
Output: main.tablename, ((main.primary_key_values)::character varying), main.primary_key_fields, main.min_sequence, main._changed_fieldlist, main._operation, main.min_sequence
Buffers: shared hit=6269554 read=12205028 dirtied=1893 written=4566983, temp read=443831 written=1016025
-> Subquery Scan on main (cost=47148422.27..52102269.25 rows=707692 width=113) (actual time=2407976.292..2407977.100 rows=100 loops=1)
Output: main.tablename, (main.primary_key_values)::character varying, main.primary_key_fields, main.min_sequence, main._changed_fieldlist, main._operation, main.min_sequence
Filter: (main.rn = 1)
Buffers: shared hit=6269554 read=12205028 dirtied=1893 written=4566983, temp read=443831 written=1016025
-> WindowAgg (cost=47148422.27..50333038.19 rows=141538485 width=143) (actual time=2407976.288..2407977.080 rows=100 loops=1)
Output: 'claim', concat_ws(','::text, claim.gen_claimnum), 'gen_claimnum,', claim._change_sequence, row_number() OVER (?), claim._changed_fieldlist, claim._operation, claim.gen_claimnum
Buffers: shared hit=6269554 read=12205028 dirtied=1893 written=4566983, temp read=443831 written=1016025
-> Sort (cost=47148422.27..47502268.49 rows=141538485 width=39) (actual time=2407976.236..2407976.905 rows=100 loops=1)
Output: claim._change_sequence, claim.gen_claimnum, claim._changed_fieldlist, claim._operation
Sort Key: claim.gen_claimnum, claim._change_sequence
Sort Method: external merge Disk: 4588144kB
Buffers: shared hit=6269554 read=12205028 dirtied=1893 written=4566983, temp read=443831 written=1016025
-> Seq Scan on mipbi_ods.claim (cost=0.00..20246114.01 rows=141538485 width=39) (actual time=0.028..843181.418 rows=88042077 loops=1)
Output: claim._change_sequence, claim.gen_claimnum, claim._changed_fieldlist, claim._operation
Filter: ((claim.mipbi_status)::text = 'NEW'::text)
Rows Removed by Filter: 356194
Buffers: shared hit=6269554 read=12205028 dirtied=1893 written=4566983
Planning time: 8.796 ms
Execution time: 2408702.464 ms

Why is my query cost so high?

When I run EXPLAIN ANALYZE on a query, I get a normal cost, from some low value to some higher value. But when I try to force the use of the index on the table by switching enable_seqscan to false, the query cost jumps to insane values like:
Merge Join (cost=10064648609.460..10088218360.810 rows=564249 width=21) (actual time=341699.323..370702.969 rows=3875328 loops=1)
Merge Cond: ((foxtrot.two = ((five_hotel.two)::numeric)) AND (foxtrot.alpha_two07 = ((five_hotel.alpha_two07)::numeric)))
-> Merge Append (cost=10000000000.580..10023064799.260 rows=23522481 width=24) (actual time=0.049..19455.320 rows=23522755 loops=1)
Sort Key: foxtrot.two, foxtrot.alpha_two07
-> Sort (cost=10000000000.010..10000000000.010 rows=1 width=76) (actual time=0.005..0.005 rows=0 loops=1)
Sort Key: foxtrot.two, foxtrot.alpha_two07
Sort Method: quicksort Memory: 25kB
-> Seq Scan on foxtrot (cost=10000000000.000..10000000000.000 rows=1 width=76) (actual time=0.001..0.001 rows=0 loops=1)
Filter: (kilo_sierra_oscar = 'oscar'::date)
-> Index Scan using alpha_five on five_uniform (cost=0.560..22770768.220 rows=23522480 width=24) (actual time=0.043..17454.619 rows=23522755 loops=1)
Filter: (kilo_sierra_oscar = 'oscar'::date)
As you can see, I'm trying to retrieve values via the index so they don't need to be sorted once they're loaded.
It is a simple query:
select *
from foxtrot
where foxtrot.kilo_sierra_oscar = date '2015-01-01'
order by foxtrot.two, foxtrot.alpha_two07
Index scan: "Execution time: 19009.569 ms"
Sequential scan: "Execution time: 127062.802 ms"
Setting enable_seqscan to false improves the execution time of the query, but I would like the optimizer to work that out on its own.
EDIT:
Seq plan with buffers:
Sort (cost=4607555.110..4666361.310 rows=23522481 width=24) (actual time=101094.754..120740.190 rows=23522756 loops=1)
Sort Key: foxtrot.two, foxtrot.alpha07
Sort Method: external merge Disk: 805304kB
Buffers: shared hit=468690, temp read=100684 written=100684
-> Append (cost=0.000..762721.000 rows=23522481 width=24) (actual time=0.006..12018.725 rows=23522756 loops=1)
Buffers: shared hit=468690
-> Seq Scan on foxtrot (cost=0.000..0.000 rows=1 width=76) (actual time=0.001..0.001 rows=0 loops=1)
Filter: (kilo = 'oscar'::date)
-> Seq Scan on foxtrot (cost=0.000..762721.000 rows=23522480 width=24) (actual time=0.005..9503.851 rows=23522756 loops=1)
Filter: (kilo = 'oscar'::date)
Buffers: shared hit=468690
Index plan with buffers:
Merge Append (cost=10000000000.580..10023064799.260 rows=23522481 width=24) (actual time=0.046..19302.855 rows=23522756 loops=1)
Sort Key: foxtrot.two, foxtrot.alpha_two07
Buffers: shared hit=17855133
-> Sort (cost=10000000000.010..10000000000.010 rows=1 width=76) (actual time=0.009..0.009 rows=0 loops=1)
Sort Key: foxtrot.two, foxtrot.alpha_two07
Sort Method: quicksort Memory: 25kB
-> Seq Scan on foxtrot (cost=10000000000.000..10000000000.000 rows=1 width=76) (actual time=0.000..0.000 rows=0 loops=1)
Filter: (kilo = 'oscar'::date)
-> Index Scan using alpha_five on five (cost=0.560..22770768.220 rows=23522480 width=24) (actual time=0.036..17035.903 rows=23522756 loops=1)
Filter: (kilo = 'oscar'::date)
Buffers: shared hit=17855133
Why does the cost of the query jump so high? How can I avoid it?
The high cost is a direct consequence of set enable_seqscan=false.
The planner implements this "hint" by assigning an arbitrary, super-high cost (10,000,000,000) to the sequential scan technique. It then computes the different potential execution strategies with their associated costs.
If the best result still has a super-high cost, it means that the planner found no strategy to avoid the sequential scan, even when trying at all costs.
In the plan shown in the question under "Index plan with buffers" this happens at the Seq Scan on foxtrot node.
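To see the mechanism in action, the setting can be toggled per session; a minimal sketch using the query from the question (only standard EXPLAIN options are used):
SET enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM foxtrot
WHERE kilo_sierra_oscar = date '2015-01-01'
ORDER BY two, alpha_two07;
RESET enable_seqscan;
Every sequential-scan node in the resulting plan starts from a cost of about 10000000000, and that penalty propagates to every parent node above it, which is why the top-level estimate looks absurd even though the actual time is fine.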