I have the following PostgreSQL table with about 67 million rows, which stores the EOD prices for all the US stocks starting in 1985:
Table "public.eods"
Column | Type | Collation | Nullable | Default
--------+-----------------------+-----------+----------+---------
stk | character varying(16) | | not null |
dt | date | | not null |
o | integer | | not null |
hi | integer | | not null |
lo | integer | | not null |
c | integer | | not null |
v | integer | | |
Indexes:
"eods_pkey" PRIMARY KEY, btree (stk, dt)
"eods_dt_idx" btree (dt)
I would like to query the table above efficiently, based on either the stock name or the date. The primary key of the table is (stock name, date). I have also defined an index on the date column, hoping to improve performance for queries that retrieve all the records for a specific date.
Unfortunately, I see a big difference in performance for the queries below. While getting all the records for a specific stock takes a decent amount of time to complete (2 seconds), getting all the records for a specific date takes much longer (about 56 seconds). I have tried to analyze these queries using explain analyze, and I have got the results below:
explain analyze select * from eods where stk='MSFT';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on eods (cost=169.53..17899.61 rows=4770 width=36) (actual time=207.218..2142.215 rows=8364 loops=1)
Recheck Cond: ((stk)::text = 'MSFT'::text)
Heap Blocks: exact=367
-> Bitmap Index Scan on eods_pkey (cost=0.00..168.34 rows=4770 width=0) (actual time=187.844..187.844 rows=8364 loops=1)
Index Cond: ((stk)::text = 'MSFT'::text)
Planning Time: 577.906 ms
Execution Time: 2143.101 ms
(7 rows)
explain analyze select * from eods where dt='2010-02-22';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Index Scan using eods_dt_idx on eods (cost=0.56..25886.45 rows=7556 width=36) (actual time=40.047..56963.769 rows=8143 loops=1)
Index Cond: (dt = '2010-02-22'::date)
Planning Time: 67.876 ms
Execution Time: 56970.499 ms
(4 rows)
I really cannot understand why the second query runs 28 times slower than the first one. They retrieve a similar number of records and both seem to be using an index. So could somebody please explain why there is such a difference in performance, and whether I can do something to improve the performance of the queries that retrieve all the records for a specific date?
I would guess that this has to do with the data layout. My guess is that you are loading the data by stk, so the rows for a given stk are packed onto a relatively small number of pages that contain little else.
So, for the first query the execution engine only reads a few hundred pages (the plan reports 367 heap blocks).
On the other hand, hardly any page contains two records for the same date. When you read by date, you have to read roughly one page per row, which is about 8,000 pages - more than twenty times as many.
The total work must also account for reading the index itself, which should be about the same for the two queries, so the observed runtime ratio of roughly 28x is in line with this difference in page counts.
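If you want to check this guess, pg_stats exposes a physical-ordering correlation per column; a value near 1 for stk and a much lower value for dt would confirm that the table is laid out by stock (a minimal sketch, assuming the table is on the default search path):
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'eods'
  AND attname IN ('stk', 'dt');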
There can be several issues, so it is hard to say where the problem lies. An index scan should usually be faster than a bitmap heap scan; if it is not, the following problems are possible:
a bloated ("unhealthy") index - try running REINDEX INDEX indexname
stale statistics - try running ANALYZE tablename
a suboptimal state of the table - try running VACUUM tablename
a setting of effective_cache_size that is too low or too high
I/O issues - some systems have trouble with heavy random I/O; try increasing random_page_cost
Investigating the cause is a little bit of alchemy, but it is feasible - there is only a closed set of likely culprits (a combined sketch of these commands follows below). A good start is:
VACUUM ANALYZE tablename
benchmarking your I/O if possible (e.g. with bonnie++)
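A minimal sketch of the checks above, using the table and index names from the question (the setting values are placeholders to experiment with, not recommendations):
REINDEX INDEX eods_dt_idx;           -- rebuild a possibly bloated index
VACUUM ANALYZE eods;                 -- clean up dead rows and refresh statistics
SET effective_cache_size = '4GB';    -- roughly the memory available to cache the database
SET random_page_cost = 6;            -- try increasing this if random I/O is expensive on your system
-- then re-run the EXPLAIN ANALYZE queries and compare the plans and timings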
To find the difference, you'll probably have to run EXPLAIN (ANALYZE, BUFFERS) on the query so that you see how many blocks are touched and where they come from.
I can think of two reasons:
Bad statistics that make PostgreSQL believe that dt has a high correlation while it does not. If the correlation is low, a bitmap index scan is often more efficient.
To see if that is the problem, run
ANALYZE eods;
and see if that changes the execution plans chosen.
Caching effects: perhaps the first query finds all required blocks already cached, while the second doesn't.
At any rate, it might be worth experimenting to see if a bitmap index scan would be cheaper for the second query:
SET enable_indexscan = off;
Then repeat the query.
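Putting both suggestions together, a minimal sketch of the experiment (the query is the one from the question):
ANALYZE eods;
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM eods WHERE dt = '2010-02-22';
SET enable_indexscan = off;          -- steer the planner away from the plain index scan
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM eods WHERE dt = '2010-02-22';
RESET enable_indexscan;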
Hi, I'm curious why the index stops being used once the table holds more data, even just 100 rows.
Here's the SELECT when the table contains 10 rows:
mydb> explain select * from data where user_id=1;
+-----------------------------------------------------------------------------------+
| QUERY PLAN |
|-----------------------------------------------------------------------------------|
| Index Scan using ix_data_user_id on data (cost=0.14..8.15 rows=1 width=2043) |
| Index Cond: (user_id = 1) |
+-----------------------------------------------------------------------------------+
EXPLAIN
Here's the SELECT when the table contains 100 rows:
mydb> explain select * from data where user_id=1;
+------------------------------------------------------------+
| QUERY PLAN |
|------------------------------------------------------------|
| Seq Scan on data (cost=0.00..44.67 rows=1414 width=945) |
| Filter: (user_id = 1) |
+------------------------------------------------------------+
EXPLAIN
How can I get the index to be used when the table has 100 rows?
100 is not a large amount of data. Think 10,000 or 100,000 rows for a respectable amount.
To put it simply, records in a table are stored on data pages. A data page typically has about 8k bytes (it depends on the database and on settings). A major purpose of indexes is to reduce the number of data pages that need to be read.
If all the records in a table fit on one page, there is no need to reduce the number of pages being read. The one page will be read. Hence, the index may not be particularly useful.
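If you want to see this for your own table (PostgreSQL assumed, table name taken from the question), the page count is visible in pg_class:
SELECT relpages,                               -- number of 8 kB pages the table occupies
       pg_relation_size('data') AS size_bytes  -- the same figure in bytes
FROM pg_class
WHERE relname = 'data';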
In Oracle, is there any way to determine how long a SQL query will take to fetch all the records, and how large the result will be, without actually executing it and waiting for the entire result?
I am repeatedly asked to download and provide the data to users using a plain Oracle SQL SELECT (not Data Pump/import etc.). Sometimes the result runs to millions of rows.
The actual run time will not be known unless you run the query, but you can try to estimate it.
First, you can run EXPLAIN PLAN only; this will NOT run the query. Based on your current statistics it will show you, more or less, how the query will be executed.
It will not, however, reflect the actual time and effort needed to read the data from the data blocks. Other factors to consider:
do you have a large block size?
is the schema normalized or de-normalized for querying/reporting?
how large is a row - does it fit in a single block, so that only one fetch is needed?
the number of rows you are expecting
the amount of data combined with your network latency and throughput
Based on these factors you can try to estimate the time. For example, 10 million rows of roughly 200 bytes each is about 2 GB; at an effective 100 MB/s that is at least 20 seconds of transfer alone (numbers purely illustrative).
This requires good statistics, EXPLAIN PLAN FOR ..., adjusting sys.aux_stats$, and then adjusting your expectations.
Good statistics: The explain plan estimates are based on optimizer statistics. Make sure that tables and indexes have up-to-date statistics. On 11g this usually means sticking with the default settings and tasks, and only manually gathering statistics after large data loads.
Explain plan for ...: Use a statement like this to create and store the explain plan for any SQL statement. It even works for creating indexes and tables.
explain plan set statement_id = 'SOME_UNIQUE_STRING' for
select * from dba_tables cross join dba_tables;
This is usually the best way to visualize an explain plan:
select * from table(dbms_xplan.display);
Plan hash value: 2788227900
-------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 12M| 5452M| 00:00:19 |
|* 1 | HASH JOIN RIGHT OUTER | | 12M| 5452M| 00:00:19 |
| 2 | TABLE ACCESS FULL | SEG$ | 7116 | 319K| 00:00:01 |
...
The raw data is stored in PLAN_TABLE. The first row of the plan usually sums up the estimates for the other steps:
select cardinality, bytes, time
from plan_table
where statement_id = 'SOME_UNIQUE_STRING'
and id = 0;
CARDINALITY BYTES TIME
12934699 5717136958 19
Adjust sys.aux_stats$: The time estimate is based on system statistics stored in sys.aux_stats$. These are numbers for metrics like CPU speed, single-block I/O read time, etc. For example, on my system:
select * from sys.aux_stats$ order by sname
SNAME PNAME PVAL1 PVAL2
SYSSTATS_INFO DSTART 09-11-2014 11:18
SYSSTATS_INFO DSTOP 09-11-2014 11:18
SYSSTATS_INFO FLAGS 1
SYSSTATS_INFO STATUS COMPLETED
SYSSTATS_MAIN CPUSPEED
SYSSTATS_MAIN CPUSPEEDNW 3201.10192837466
SYSSTATS_MAIN IOSEEKTIM 10
SYSSTATS_MAIN IOTFRSPEED 4096
SYSSTATS_MAIN MAXTHR
SYSSTATS_MAIN MBRC
SYSSTATS_MAIN MREADTIM
SYSSTATS_MAIN SLAVETHR
SYSSTATS_MAIN SREADTIM
The numbers can be gathered automatically by dbms_stats.gather_system_stats. They can also be modified manually. It's a SYS table, but relatively safe to modify. Create some sample queries, compare the estimated time with the actual time, and adjust the numbers until they match.
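For example (a sketch only; the values are placeholders, and changing system statistics affects every plan on the instance, so test carefully):
-- gather no-workload system statistics
EXEC DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD');
-- or set individual values by hand, then compare estimated vs. actual times again
EXEC DBMS_STATS.SET_SYSTEM_STATS('SREADTIM', 8);    -- single-block read time in ms
EXEC DBMS_STATS.SET_SYSTEM_STATS('MREADTIM', 30);   -- multi-block read time in ms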
Discover you probably wasted a lot of time
Predicting run time is theoretically impossible to get right in all cases, and in practice it is horribly difficult to forecast for non-trivial queries. Jonathan Lewis wrote a whole book about those predictions, and that book only covers the "basics".
Complex explain plans are typically "good enough" if the estimates are off by one or two orders of magnitude. But that kind of difference is typically not good enough to show to a user, or use for making any important decisions.
I look after a single Postgres 9.3.3 (Amazon RDS instance: db.m3.2xlarge), which is the back-end of a system that logs incoming statistics and provides reports based on those data - yes, from the same DB node.
Performance is generally very good, but upon adding an extra index on table R to improve reporting performance, logging performance collapsed, as both INSERTs and UPDATEs on a different table L used by the logging process immediately began to lock - seemingly on one another, according to pg_locks, although no deadlocks were reported. Immediately, all available connections (according to pg_stat_activity) locked in the same way, DB CPU rose quickly to 100%. The logger's load-balancer took all of its nodes out of use, but as the INSERTs and UPDATEs refused to complete or to time out, all connections stayed locked.
Note that this isn't a problem during index creation, only during usage. Nor is this an issue of load: throttling logging by 90% and starting the system completely afresh again immediately locked it up. No reporting whatsoever was happening at the same time.
Dropping the R index immediately releases all L locks.
I create the index with:
CREATE INDEX idxForGroup ON R (group,article_id,month);
where the columns are:
'group' type: VARCHAR(64) defaultValue: "" nullable: false
'month' type: TIMESTAMP nullable: false
'article_id' type: BIGINT defaultValue: 0 nullable: false
There is already a composite primary key, of which the above is just a subset:
customer_resource_id (a FK), subtype (a VARCHAR), group, article_id, month
I should add that there is a relationship between R and L: a trigger updates the reporting table R based upon updates to L:
CREATE TRIGGER on_event_report AFTER INSERT OR UPDATE ON L FOR EACH ROW EXECUTE PROCEDURE resource_event_trigger();
I accept that adding any index imposes a small (microseconds?) cost/load, but there are already indexes on R, so I don't understand how a 'little' extra indexing on R could have such a huge impact as to cause lockups for L.
Update:
If I investigate the L queries that are getting locked:
EXPLAIN (analyze,buffers) update L set count=count+1 where customer_resource_id=911657 and item_type_id='type' and event_subtype='subtype' and reporting_date='2014-04-13 00:00:00' AND group='';
Update on L (cost=0.57..20.18 rows=5 width=49) (actual time=70.968..70.968 rows=0 loops=1)
Buffers: shared hit=170 read=16 dirtied=15
-> Index Scan using L_pkey on L (cost=0.57..20.18 rows=5 width=49) (actual time=0.067..0.525 rows=19 loops=1)
Index Cond: ((customer_resource_id = 911657) AND ((group)::text = ''::text) AND ((item_type_id)::text = 'type'::text) AND ((event_subtype)::text = 'subtype'::text) AND (article_id = 0))
Buffers: shared hit=24
Trigger on_L: time=11626.219 calls=19 <---
Total runtime: 11697.285 ms
So, you'd think the trigger that updates R must be the problem - and yet when I EXPLAIN the trigger queries, they all check out fine: indexes hit, no scans, etc.
Update 2:
Not sure if this is really a locking issue, or just a massive performance degradation, but here's pg_locks with the index present:
SELECT mode, granted, COUNT(*) FROM pg_locks GROUP BY mode, granted;
mode | granted | count
------------------+---------+-------
AccessShareLock | t | 24715
ExclusiveLock | t | 1504
ExclusiveLock | f | 138
RowExclusiveLock | t | 5901
RowShareLock | t | 185
ShareLock | f | 95
Drop the index, and within seconds:
mode | count
-----------------+-------
ExclusiveLock | 3
AccessShareLock | 31
Update 3:
Here's the source of the trigger on the logging table L that updates the reporting table R:
CREATE OR REPLACE FUNCTION resource_event_trigger()
RETURNS TRIGGER AS $$
DECLARE
cre_row R%ROWTYPE;
delta INTEGER;
BEGIN
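-- Look up the matching row in the reporting table R; insert a stub row if none exists yet.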
SELECT * INTO cre_row FROM R cre WHERE cre.customer_resource_id = NEW.customer_resource_id AND cre.group = NEW.group_id AND cre.subtype = NEW.event_subtype AND cre.date = date_trunc('month', NEW.date) AND cre.article_id = NEW.article_id;
IF cre_row IS null THEN
INSERT INTO R (customer_resource_id, group, subtype, article_id, date) VALUES (NEW.customer_resource_id, NEW.group_id, NEW.event_subtype, NEW.article_id, date_trunc('month', NEW.date));
END IF;
IF TG_OP = 'INSERT' THEN
delta = NEW.event_count;
ELSE
delta = NEW.event_count - OLD.event_count;
END IF;
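-- Apply the delta to the counter column that matches the event's item type.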
CASE
WHEN NEW.item_type_id = 'typeA' THEN
UPDATE R SET count_A = count_A + delta WHERE customer_resource_id = NEW.customer_resource_id AND group = NEW.group_id AND subtype = NEW.event_subtype AND article_id = NEW.article_id AND date = date_trunc('month', NEW.date);
[...]
END CASE;
RETURN NEW;
END;
$$
LANGUAGE plpgsql;
It's long-ish, but pretty straightforward. When 'EXPLAIN'ed individually, all the individual queries use primary keys / indexes, use few buffers, etc.
Update 4:
If I examine the created index, I notice:
SELECT tablename, attname, n_distinct, correlation from pg_stats where tablename='R' AND attname IN ('group','article_id','date','customer_resource_id','subtype') ORDER BY attname;
tablename | attname | n_distinct | correlation
-----------+----------------------+------------+-------------
R | article_id | 25886 | 0.756468
R | group | 165 | 0.227023
R | customer_resource_id | -0.304469 | 0.729134
R | date | 53 | 0.943593
R | subtype | 2 | 0.657429
... which looks plausible. And if I look at cardinality I get:
SELECT relname, relkind, reltuples as cardinality, relpages FROM pg_class where relkind='i' [...] order by relname;
relname | relkind | cardinality | relpages
-------------+---------+-------------+----------
R_pkey | i | 2.69955e+07 | 293035
idxForGroup | i | 2.70333e+07 | 134149
L_pkey | i | 7.14889e+07 | 771581
Both the PK and the newly added index have values that are almost the same as the row count which, again, should be fine...
Well, while I can't say EXACTLY what your problem is, it does obviously seem to hinge on this new index. It would certainly be easier if I could look through everything thoroughly, but I will offer a couple of shots in the dark.
Indexes and postgres performance can be a big subject, but fundamentally there are a few things I see could be wrong that you should check:
When you change or add an index on a table, the query optimizer (which fires milliseconds before the query runs and decides how best to execute it) of course looks at the table in a different way. It sees: "Hey, there is a new index here, and maybe it is better than the old index I was using" - and in some cases the query optimizer can be wrong.
So instead of using a past index that was working just fine, the optimizer starts doing full table scans or something equally bad. That is an extreme case, but it can happen.
The other thing is that you may be dealing with a lot of "dead rows". Whenever you do an update to a table, it creates a "dead row" and inserts a new one (your updated row). The "dead row" doesn't really go anywhere; sometimes it can bloat your table, and the Postgres query optimizer can go a little haywire trying to account for it.
One time I had a table that just went CRAZY like this and I could not, for the life of me, understand why it was going SO SLOW. I then looked at the table statistics and saw hundreds of thousands of dead rows. I took a shot in the dark and just dropped and re-created the table (which admittedly takes some work on a live database) and magically everything worked fine after that (NOTHING was changed in the table structure or the data - other than that the dead rows were released when that instance of the table was dropped).
While that is a bit extreme, what I would do immediately is:
a) run VACUUM on the table, which does a pretty good job of getting the "dead rows" out of the way
b) then run ANALYZE on the table. This rebuilds the table's summary statistics, which is where the optimizer gets much of the information it uses to decide how best to query the table (a sketch of both steps follows below).
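A minimal sketch of those two steps, plus a quick check of the dead-row count beforehand (R is the anonymized table name from the question):
SELECT relname, n_live_tup, n_dead_tup   -- how bloated is the reporting table?
FROM pg_stat_user_tables
WHERE relname = 'R';
VACUUM R;      -- reclaim the dead rows
ANALYZE R;     -- refresh the planner statistics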
In PostgreSQL 9.2, I have a table of items that are being rated by users:
id | userid | itemid | rating | timestamp | update_time
--------+--------+--------+---------------+---------------------+------------------------
522241 | 3991 | 6887 | 0.1111111111 | 2005-06-20 03:13:56 | 2013-10-11 17:50:24.545
522242 | 3991 | 6934 | 0.1111111111 | 2005-04-05 02:25:21 | 2013-10-11 17:50:24.545
522243 | 3991 | 6936 | -0.1111111111 | 2005-03-31 03:17:25 | 2013-10-11 17:50:24.545
522244 | 3991 | 6942 | -0.3333333333 | 2005-03-24 04:38:02 | 2013-10-11 17:50:24.545
522245 | 3991 | 6951 | -0.5555555556 | 2005-06-20 03:15:35 | 2013-10-11 17:50:24.545
... | ... | ... | ... | ... | ...
I want to perform a very simple query: for each user, select the total number of ratings in the database.
I'm using the following straightforward approach:
SELECT userid, COUNT(*) AS rcount
FROM ratings
GROUP BY userid
The table contains 10M records. The query takes... well, about 2 or 3 minutes. Honestly, I'm not satisfied with that, and I believe that 10M is not such a large number that the query should take this long. (Or is it?)
So I asked PostgreSQL to show me the execution plan:
EXPLAIN SELECT userid, COUNT(*) AS rcount
FROM ratings
GROUP BY userid
This results in:
GroupAggregate (cost=1756177.54..1831423.30 rows=24535 width=5)
-> Sort (cost=1756177.54..1781177.68 rows=10000054 width=5)
Sort Key: userid
-> Seq Scan on ratings (cost=0.00..183334.54 rows=10000054 width=5)
I read this as follows: first, the whole table is read from disk (seq scan). Second, it is sorted by userid in n*log(n) time (sort). Finally, the sorted table is read row by row and aggregated in linear time. Not exactly the optimal algorithm, I think; if I were to implement it myself, I would use a hash table and build the result in a single pass. Never mind.
It seems that it is the sorting by userid that takes so long, so I added an index:
CREATE INDEX ratings_userid_index ON ratings (userid)
Unfortunately, this didn't help and the performance remained the same. I definitely do not consider myself an advanced user and I believe I'm doing something fundamentally wrong, but this is where I got stuck. I would appreciate any ideas on how to make the query execute in a reasonable time. One more note: the PostgreSQL worker process utilizes 100% of one of my CPU cores during the execution, suggesting that disk access is not the main bottleneck.
EDIT
As requested by @a_horse_with_no_name. Wow, quite advanced for me:
EXPLAIN (analyze on, buffers on, verbose on)
SELECT userid,COUNT(userid) AS rcount
FROM movielens_10m.ratings
GROUP BY userId
Outputs:
GroupAggregate (cost=1756177.54..1831423.30 rows=24535 width=5) (actual time=110666.899..127168.304 rows=69878 loops=1)
Output: userid, count(userid)
Buffers: shared hit=906 read=82433, temp read=19358 written=19358
-> Sort (cost=1756177.54..1781177.68 rows=10000054 width=5) (actual time=110666.838..125180.683 rows=10000054 loops=1)
Output: userid
Sort Key: ratings.userid
Sort Method: external merge Disk: 154840kB
Buffers: shared hit=906 read=82433, temp read=19358 written=19358
-> Seq Scan on movielens_10m.ratings (cost=0.00..183334.54 rows=10000054 width=5) (actual time=0.019..2889.583 rows=10000054 loops=1)
Output: userid
Buffers: shared hit=901 read=82433
Total runtime: 127193.524 ms
EDIT 2
@a_horse_with_no_name's comment solved the problem. I am happy to share my findings:
SET work_mem = '1MB';
EXPLAIN SELECT userid,COUNT(userid) AS rcount
FROM movielens_10m.ratings
GROUP BY userId
produces the same as above:
GroupAggregate (cost=1756177.54..1831423.30 rows=24535 width=5)
-> Sort (cost=1756177.54..1781177.68 rows=10000054 width=5)
Sort Key: userid
-> Seq Scan on ratings (cost=0.00..183334.54 rows=10000054 width=5)
However,
SET work_mem = '10MB';
EXPLAIN SELECT userid,COUNT(userid) AS rcount
FROM movielens_10m.ratings
GROUP BY userId
gives
HashAggregate (cost=233334.81..233580.16 rows=24535 width=5)
-> Seq Scan on ratings (cost=0.00..183334.54 rows=10000054 width=5)
The query now only takes about 3.5 seconds to complete.
Consider how your query could possibly return a result... You could build a hash table keyed by userid and increment its counters, or you could sort all rows by userid and count them. Computationally, the latter option is cheaper here. That is what Postgres does.
Then consider how to sort the data, taking disk IO into account. One option is to read disk pages A, B, C, D, etc., and then sort the rows by userid in memory; in other words, a seq scan followed by a sort. The other option, called an index scan, would be to pull the rows in order by walking the index: visit page B, then D, then A, then B again, A again, C, ad nauseam.
An index scan is efficient when pulling a handful of rows in order; much less so when fetching many rows in order, let alone all rows in order. As such, the plan you're getting is the optimal one:
Plough through all rows (seq scan)
Sort the rows by the grouping criterion
Count the rows per group
The trouble is, you're sorting roughly 10 million rows in order to count them by userid. Nothing will make this faster short of investing in more RAM and very fast SSDs.
You can, however, avoid this query altogether. Either:
Count ratings for the handful of users that you actually need — using a where clause — instead of pulling the entire set; or
Add a ratings_count field to your users table and use triggers on ratings to maintain the count (a sketch of this option follows after this list); or
Use a materialized view, if a precise count is less important than a rough idea of it.
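A hedged sketch of the trigger-maintained counter (the second option above); the users table, its id column, and the function/trigger names are assumptions, not part of the original schema:
ALTER TABLE users ADD COLUMN ratings_count integer NOT NULL DEFAULT 0;

CREATE OR REPLACE FUNCTION bump_ratings_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE users SET ratings_count = ratings_count + 1 WHERE id = NEW.userid;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE users SET ratings_count = ratings_count - 1 WHERE id = OLD.userid;
    END IF;
    RETURN NULL;   -- AFTER row trigger, so the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ratings_count_trg
AFTER INSERT OR DELETE ON ratings
FOR EACH ROW EXECUTE PROCEDURE bump_ratings_count();
The per-user total then becomes a cheap primary-key lookup on users instead of an aggregate over 10M rows.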
Try the query below, because COUNT(*) and COUNT(userid) can make a difference.
SELECT userid, COUNT(userid) AS rcount
FROM ratings
GROUP BY userid
You can also try running VACUUM ANALYZE ratings to update the statistics, so the optimizer can choose a better plan for executing the SQL.
I have 2 tables in PostgreSQL 9.1: flight_2012_09_12, containing approx 500,000 rows, and position_2012_09_12, containing about 5.5 million rows. I'm running a simple join query and it's taking a long time to complete; although the tables aren't small, I'm convinced there are some major gains to be made in the execution.
The query is:
SELECT f.departure, f.arrival,
p.callsign, p.flightkey, p.time, p.lat, p.lon, p.altitude_ft, p.speed
FROM position_2012_09_12 AS p
JOIN flight_2012_09_12 AS f
ON p.flightkey = f.flightkey
WHERE p.lon < 0
AND p.time BETWEEN '2012-9-12 0:0:0' AND '2012-9-12 23:0:0'
The output of explain analyze is:
Hash Join (cost=239891.03..470396.82 rows=4790498 width=51) (actual time=29203.830..45777.193 rows=4403717 loops=1)
Hash Cond: (f.flightkey = p.flightkey)
-> Seq Scan on flight_2012_09_12 f (cost=0.00..1934.31 rows=70631 width=12) (actual time=0.014..220.494 rows=70631 loops=1)
-> Hash (cost=158415.97..158415.97 rows=3916885 width=43) (actual time=29201.012..29201.012 rows=3950815 loops=1)
Buckets: 2048 Batches: 512 (originally 256) Memory Usage: 1025kB
-> Seq Scan on position_2012_09_12 p (cost=0.00..158415.97 rows=3916885 width=43) (actual time=0.006..14630.058 rows=3950815 loops=1)
Filter: ((lon < 0::double precision) AND ("time" >= '2012-09-12 00:00:00'::timestamp without time zone) AND ("time" <= '2012-09-12 23:00:00'::timestamp without time zone))
Total runtime: 58522.767 ms
I think the problem lies with the sequential scan on the position table but I can't figure out why it's there. The table structures with indexes are below:
Table "public.flight_2012_09_12"
Column | Type | Modifiers
--------------------+-----------------------------+-----------
callsign | character varying(8) |
flightkey | integer |
source | character varying(16) |
departure | character varying(4) |
arrival | character varying(4) |
original_etd | timestamp without time zone |
original_eta | timestamp without time zone |
enroute | boolean |
etd | timestamp without time zone |
eta | timestamp without time zone |
equipment | character varying(6) |
diverted | timestamp without time zone |
time | timestamp without time zone |
lat | double precision |
lon | double precision |
altitude | character varying(7) |
altitude_ft | integer |
speed | character varying(4) |
asdi_acid | character varying(4) |
enroute_eta | timestamp without time zone |
enroute_eta_source | character varying(1) |
Indexes:
"flight_2012_09_12_flightkey_idx" btree (flightkey)
"idx_2012_09_12_altitude_ft" btree (altitude_ft)
"idx_2012_09_12_arrival" btree (arrival)
"idx_2012_09_12_callsign" btree (callsign)
"idx_2012_09_12_departure" btree (departure)
"idx_2012_09_12_diverted" btree (diverted)
"idx_2012_09_12_enroute_eta" btree (enroute_eta)
"idx_2012_09_12_equipment" btree (equipment)
"idx_2012_09_12_etd" btree (etd)
"idx_2012_09_12_lat" btree (lat)
"idx_2012_09_12_lon" btree (lon)
"idx_2012_09_12_original_eta" btree (original_eta)
"idx_2012_09_12_original_etd" btree (original_etd)
"idx_2012_09_12_speed" btree (speed)
"idx_2012_09_12_time" btree ("time")
Table "public.position_2012_09_12"
Column | Type | Modifiers
-------------+-----------------------------+-----------
callsign | character varying(8) |
flightkey | integer |
time | timestamp without time zone |
lat | double precision |
lon | double precision |
altitude | character varying(7) |
altitude_ft | integer |
course | integer |
speed | character varying(4) |
trackerkey | integer |
the_geom | geometry |
Indexes:
"index_2012_09_12_altitude_ft" btree (altitude_ft)
"index_2012_09_12_callsign" btree (callsign)
"index_2012_09_12_course" btree (course)
"index_2012_09_12_flightkey" btree (flightkey)
"index_2012_09_12_speed" btree (speed)
"index_2012_09_12_time" btree ("time")
"position_2012_09_12_flightkey_idx" btree (flightkey)
"test_index" btree (lon)
"test_index_lat" btree (lat)
I can't think of any other way to rewrite the query, so I'm stumped at this point. If the current setup is as good as it gets, so be it, but it seems to me that it should be much faster than it currently is. Any help would be much appreciated.
The row count estimates are pretty reasonable, so I doubt this is a stats issue.
I'd try:
Creating an index on position_2012_09_12(lon,"time") or possibly a partial index on position_2012_09_12("time") WHERE (lon < 0) if you routinely search for lon < 0.
Setting random_page_cost lower, maybe 1.1. See if (a) this changes the plan and (b) the new plan is actually faster. For testing purposes, to see whether avoiding a seqscan would be faster, you can SET enable_seqscan = off; if it is, change the cost parameters.
Increasing work_mem for this query: SET work_mem = '10MB' or similar before running it. (A combined sketch of these suggestions follows after this answer.)
Running the latest PostgreSQL if you aren't already. Always specify your PostgreSQL version in questions. (Update after edit): You're on 9.1; that's fine. The biggest performance improvement in 9.2 was index-only scans, and it doesn't seem likely that you'd benefit massively from index-only scans for this query.
You'll also somewhat improve performance if you can get rid of columns to narrow the rows. It won't make tons of difference, but it'll make some.
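A combined sketch of the experiments above (the index name is made up; the SET commands are session-local and meant only for testing):
CREATE INDEX position_2012_09_12_time_lon_neg_idx
    ON position_2012_09_12 ("time") WHERE lon < 0;   -- partial index for the lon < 0 filter
SET random_page_cost = 1.1;
SET work_mem = '10MB';
SET enable_seqscan = off;   -- only to check whether avoiding the seq scan is actually faster
EXPLAIN ANALYZE
SELECT f.departure, f.arrival,
       p.callsign, p.flightkey, p.time, p.lat, p.lon, p.altitude_ft, p.speed
FROM position_2012_09_12 AS p
JOIN flight_2012_09_12 AS f ON p.flightkey = f.flightkey
WHERE p.lon < 0
  AND p.time BETWEEN '2012-9-12 0:0:0' AND '2012-9-12 23:0:0';
RESET enable_seqscan;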
The reason you are getting a sequential scan is that Postgres believes it will read fewer disk pages that way than by using indexes, and it is probably right. Consider: with a non-covering index, you first read all the matching index pages, which essentially gives you a list of row identifiers. The DB engine then needs to read each of the matching data pages.
Your position table uses 71 bytes per row, plus whatever a geom type takes (I'll assume 16 bytes for illustration), making 87 bytes. A Postgres page is 8192 bytes, so you have approximately 90 rows per page.
Your query matches 3,950,815 out of 5,563,070 rows, or about 70% of the total. Assuming the data is randomly distributed with regard to your WHERE filters, the chance of a data page containing no matching row is about 0.3^90, which is essentially zero. So regardless of how good your indexes are, you're still going to read all the data pages, and if you're going to read all the pages anyway, a table scan is usually a good approach.
The one way out here is the phrase "non-covering index". If you are prepared to create indexes that can answer the query by themselves, you can avoid looking up the data pages at all, so you are back in the game. I'd suggest the following are worth looking at:
flight_2012_09_12 (flightkey, departure, arrival)
position_2012_09_12 (flightkey, time, lon, ...)
position_2012_09_12 (lon, time, flightkey, ...)
position_2012_09_12 (time, lon, flightkey, ...)
The dots here represent the rest of the columns you are selecting. You'll only need one of the indexes on position, but it's hard to tell which will prove best. The first approach may permit a merge join on presorted data, at the cost of reading the whole second index for the filtering. The second and third allow the data to be prefiltered, but require a hash join. Given how much of the cost appears to be in the hash join, the merge join might well be a good option.
As your query requires 52 of the 87 bytes per row, and indexes have overhead, you may not end up with the index taking much, if any, less space than the table itself.
Another approach is to attack the "randomly distributed" aspect by looking at clustering, as sketched below.
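A hedged sketch of these last two ideas - one of the suggested covering indexes on position and physical clustering (the covering index name is made up; CLUSTER takes an exclusive lock and rewrites the table, so try it on a copy first):
CREATE INDEX position_2012_09_12_covering_idx
    ON position_2012_09_12 (flightkey, "time", lon, callsign, lat, altitude_ft, speed);
CLUSTER position_2012_09_12 USING index_2012_09_12_time;   -- physically reorder the heap by time
ANALYZE position_2012_09_12;                                -- refresh statistics afterwards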