Covering index to improve performance of a SELECT query?

I have the following Postgres query that runs over a large table, and whenever it does, I observe a CPU load spike that reaches 100%:
SELECT id, user_id, type
FROM table ss
WHERE end_timestamp > CURRENT_TIMESTAMP
AND (notification_channel1 = true OR notification_channel2 = true)
AND last_notification_timestamp < CURRENT_TIMESTAMP - interval '1 days'
ORDER BY user_id, type, title
Table schema:
CREATE TABLE saved_search (
id BIGINT PRIMARY KEY,
user_id varchar(40) NOT NULL,
title varchar(500) NOT NULL,
start_timestamp timestamp with time zone NOT NULL,
end_timestamp timestamp with time zone NOT NULL,
last_notification_timestamp timestamp with time zone NOT NULL,
notification_channel1 boolean NOT NULL DEFAULT TRUE,
notification_channel2 boolean NOT NULL DEFAULT TRUE
);
I am thinking of using a covering index to speed up this query and hopefully avoid the CPU spike.
First question: would that be a valid path to follow?
The index I'm thinking of would be something like:
CREATE INDEX IX_COVERING
ON table (end_timestamp, notification_channel1, notification_channel2, last_notification_timestamp)
INCLUDE (id, user_id, type)
Would that be useful? Is the INCLUDE actually needed?
Should I change the order of the columns in the index?
Are there other / better approaches?
Gather Merge (cost=387647.26..737625.73 rows=2999606 width=40) (actual time=36085.232..47805.119 rows=3634052 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=386647.24..390396.74 rows=1499803 width=40) (actual time=35820.825..40497.683 rows=1211351 loops=3)
Sort Key: user_id, type, title
Sort Method: external merge Disk: 63640kB
Worker 0: Sort Method: external merge Disk: 63944kB
Worker 1: Sort Method: external merge Disk: 57896kB
-> Parallel Seq Scan on table (cost=0.00..150768.88 rows=1499803 width=40) (actual time=0.200..4176.269 rows=1211351 loops=3)
Filter: ((notification_channel1 OR notification_channel2) AND (end_timestamp > CURRENT_TIMESTAMP) AND (last_notification_timestamp < CURRENT_TIMESTAMP - interval '1 days'))
Rows Removed by Filter: 136960
Planning Time: 3.632 ms
Execution Time: 48292.788 ms
With SET work_mem = '100MB';, here is the output I'm getting:
Gather Merge (cost=305621.26..655599.73 rows=2999606 width=40) (actual time=48856.376..55606.264 rows=3634097 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=304621.24..308370.74 rows=1499803 width=40) (actual time=46436.173..47563.732 rows=1211366 loops=3)
Sort Key: user_id, type, title
Sort Method: external merge Disk: 72232kB
Worker 0: Sort Method: external merge Disk: 55288kB
Worker 1: Sort Method: external merge Disk: 57816kB
-> Parallel Seq Scan on table (cost=0.00..150768.88 rows=1499803 width=40) (actual time=0.911..4643.228 rows=1211366 loops=3)
Filter: ((notification_channel1 OR notification_channel2) AND (end_timestamp > CURRENT_TIMESTAMP) AND (last_notification_timestamp < CURRENT_TIMESTAMP - interval '1 days'))
Rows Removed by Filter: 136960
Planning Time: 0.450 ms
Execution Time: 56035.995 ms

You have "like 4 million rows" in the table and the query selects rows=3634052. That's 90% of all rows.
Long story short: an index is not going to help (much). Probably not worth the added storage and maintenance cost.
More work_mem would help performance - as indicated by:
Sort Method: external merge Disk: ...
See:
Configuration parameter work_mem in PostgreSQL on Linux
You may not need more work_mem if you can optimize the ORDER BY.
user_id is something like: '95068145', '401229705', '407722349'
That column should be integer or bigint, most likely. Much more efficient for storage and sorting.
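If all stored values really are digit-only strings, the conversion could be sketched like this (a hedged example, not from the original answer; the cast fails if any value is non-numeric, and the ALTER rewrites and locks the table):

```sql
-- Assumes every user_id value casts cleanly to bigint
ALTER TABLE saved_search
    ALTER COLUMN user_id TYPE bigint USING user_id::bigint;
```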
The big column title varchar(500) in your ORDER BY is probably expensive. Maybe something like left(title, 15) is good enough?
If you can bring down the size of ordering columns, an index (or partial index) on just those ordering columns may help some.
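A sketch of such an index on shortened ordering columns (illustrative only: it assumes the type column that the posted CREATE TABLE omits, and the query's ORDER BY would have to use the same left(title, 15) expression for the index to match):

```sql
CREATE INDEX saved_search_order_idx
    ON saved_search (user_id, type, left(title, 15));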
Either way, an upgrade to Postgres 15 will instantly improve performance. The release notes:
Performance improvements, particularly for in-memory and on-disk sorting.

Your sort is incredibly slow. If I simulate data with those date types and the same row count and data volume as yours, my parallel seq scan is twice as slow but my sort is 10 times faster (although you don't list the "type" column in your CREATE, but do use it in the query, so I had to improvise). So maybe you are using a very slow collation, or there is something pretty sad about your hardware. Another thought, how long does it take if you do EXPLAIN (ANALYZE, TIMING OFF)? Maybe your system has slow access to the clock, and doing all those clock calls slows things down.
If you can't do anything about your sort being so slow, then you would want an index only scan which can avoid the sort. Which means the index must start with the columns being sorted on. The columns other than those ones can be in the INCLUDE, but there is generally no point in putting them there rather than the body. The non-sort columns can be listed in any order.
(user_id, type, title, end_timestamp, notification_channel1, notification_channel2, last_notification_timestamp, id)
If your supplied params of the channels never change, then a partial index might be slightly better:
(user_id, type, title, end_timestamp, last_notification_timestamp, id) WHERE notification_channel1 = true OR notification_channel2 = true
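Spelled out as DDL, the partial variant might look like the following (the index name is illustrative, and it again assumes the type column the posted CREATE TABLE does not show):

```sql
CREATE INDEX saved_search_sort_partial_idx
    ON saved_search (user_id, type, title, end_timestamp, last_notification_timestamp, id)
    WHERE notification_channel1 OR notification_channel2;
```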
You didn't increase your work_mem by enough to do the sorts in memory. The on-disk format is more compact than the in-memory format, so a sort that took 72,232kB of disk might take ~250MB of memory to do entirely in memory. Whether that comes with a financial cost we can't say, as we don't know how much RAM you currently have, how much of that RAM typically goes unused, or how many concurrent sessions you expect to have. Increasing work_mem, but not by enough to get rid of the external merge sort altogether, can make the sorting slower (not dramatically, but noticeably), I think because it makes poorer use of the layers of caching above registers but below main memory (L1, L2, L3, etc.)
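A sketch of raising work_mem for just this workload (300MB is a guess derived from the ~250MB in-memory estimate above, not a tested value; SET LOCAL confines the change to one transaction):

```sql
-- Session level:
SET work_mem = '300MB';

-- Or scoped to a single transaction:
BEGIN;
SET LOCAL work_mem = '300MB';
-- ... run the ORDER BY query here ...
COMMIT;
```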


Optimize aggregate query on massive table to refresh materialized view

Let's say I have the following PostgreSQL database schema:
Group:
  id: int
Task:
  id: int
  created_at: datetime
  group: FK Group
I have the following Materialized View to calculate the number of tasks and the most recent Task.created_at value per group:
CREATE MATERIALIZED VIEW group_statistics AS (
SELECT
"group".id AS group_id,
MAX(task.created_at) AS latest_task_created_at,
COUNT(task.id) AS task_count
FROM "group"
LEFT OUTER JOIN task ON ("group".id = task.group_id)
GROUP BY "group".id
);
The Task table currently has 20 million records, so refreshing this materialized view takes a long time (20-30 seconds). We've also been experiencing some short but major DB performance issues ever since we started refreshing the materialized view every 10 min, even with CONCURRENTLY:
REFRESH MATERIALIZED VIEW CONCURRENTLY group_statistics;
Is there a more performant way to calculate these values? Note, they do NOT need to be exact. Approximate values are totally fine, e.g. latest_task_created_at can be 10-20 min delayed.
I'm thinking of caching these values on every write to the Task table. Either in Redis or in PostgreSQL itself.
Update
People are requesting the execution plan. EXPLAIN doesn't work on REFRESH but I ran EXPLAIN on the actual query. Note, it's different than my theoretical data model above. In this case, Database is Group and Record is Task. Also note, I'm on PostgreSQL 12.10.
EXPLAIN (analyze, buffers, verbose)
SELECT
store_database.id as database_id,
MAX(store_record.updated_at) AS latest_record_updated_at,
COUNT(store_record.id) AS record_count
FROM store_database
LEFT JOIN store_record ON (store_database.id = store_record.database_id)
GROUP BY store_database.id;
Output:
HashAggregate (cost=1903868.71..1903869.22 rows=169 width=32) (actual time=18227.016..18227.042 rows=169 loops=1)
Output: store_database.id, max(store_record.updated_at), count(store_record.id)
Group Key: store_database.id
Buffers: shared hit=609211 read=1190704
I/O Timings: read=3385.027
-> Hash Right Join (cost=41.28..1872948.10 rows=20613744 width=40) (actual time=169.766..14572.558 rows=20928339 loops=1)
Output: store_database.id, store_record.updated_at, store_record.id
Inner Unique: true
Hash Cond: (store_record.database_id = store_database.id)
Buffers: shared hit=609211 read=1190704
I/O Timings: read=3385.027
-> Seq Scan on public.store_record (cost=0.00..1861691.23 rows=20613744 width=40) (actual time=0.007..8607.425 rows=20928316 loops=1)
Output: store_record.id, store_record.key, store_record.data, store_record.created_at, store_record.updated_at, store_record.database_id, store_record.organization_id, store_record.user_id
Buffers: shared hit=609146 read=1190704
I/O Timings: read=3385.027
-> Hash (cost=40.69..40.69 rows=169 width=16) (actual time=169.748..169.748 rows=169 loops=1)
Output: store_database.id
Buckets: 1024 Batches: 1 Memory Usage: 16kB
Buffers: shared hit=65
-> Index Only Scan using store_database_pkey on public.store_database (cost=0.05..40.69 rows=169 width=16) (actual time=0.012..0.124 rows=169 loops=1)
Output: store_database.id
Heap Fetches: 78
Buffers: shared hit=65
Planning Time: 0.418 ms
JIT:
Functions: 14
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 2.465 ms, Inlining 15.728 ms, Optimization 92.852 ms, Emission 60.694 ms, Total 171.738 ms
Execution Time: 18229.600 ms
Note, the large execution time. It sometimes takes 5-10 minutes to run. I would love to bring this down to consistently a few seconds max.
Update #2
People are requesting the execution plan when the query takes minutes. Here it is:
HashAggregate (cost=1905790.10..1905790.61 rows=169 width=32) (actual time=128442.799..128442.825 rows=169 loops=1)
Output: store_database.id, max(store_record.updated_at), count(store_record.id)
Group Key: store_database.id
Buffers: shared hit=114011 read=1685876 dirtied=367
I/O Timings: read=112953.619
-> Hash Right Join (cost=15.32..1874290.39 rows=20999810 width=40) (actual time=323.497..124809.521 rows=21448762 loops=1)
Output: store_database.id, store_record.updated_at, store_record.id
Inner Unique: true
Hash Cond: (store_record.database_id = store_database.id)
Buffers: shared hit=114011 read=1685876 dirtied=367
I/O Timings: read=112953.619
-> Seq Scan on public.store_record (cost=0.00..1862849.43 rows=20999810 width=40) (actual time=0.649..119522.406 rows=21448739 loops=1)
Output: store_record.id, store_record.key, store_record.data, store_record.created_at, store_record.updated_at, store_record.database_id, store_record.organization_id, store_record.user_id
Buffers: shared hit=113974 read=1685876 dirtied=367
I/O Timings: read=112953.619
-> Hash (cost=14.73..14.73 rows=169 width=16) (actual time=322.823..322.824 rows=169 loops=1)
Output: store_database.id
Buckets: 1024 Batches: 1 Memory Usage: 16kB
Buffers: shared hit=37
-> Index Only Scan using store_database_pkey on public.store_database (cost=0.05..14.73 rows=169 width=16) (actual time=0.032..0.220 rows=169 loops=1)
Output: store_database.id
Heap Fetches: 41
Buffers: shared hit=37
Planning Time: 5.390 ms
JIT:
Functions: 14
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 1.306 ms, Inlining 82.966 ms, Optimization 176.787 ms, Emission 62.561 ms, Total 323.620 ms
Execution Time: 128474.490 ms
Your MV currently has 169 rows, so write costs are negligible (unless you have locking issues). It's all about the expensive sequential scan over the big table.
Full counts are slow
Getting exact counts per group ("database") is expensive. There is no magic bullet for that in Postgres. Postgres has to count all rows. If the table is all-visible (visibility map is up to date), Postgres can shorten the procedure somewhat by only traversing a covering index. (You did not provide indexes ...)
There are possible shortcuts with an estimate for the total row count in the whole table. But the same is not easily available per group. See:
Fast way to discover the row count of a table in PostgreSQL
But not that slow
That said, your query can still be substantially faster. Aggregate before the join:
SELECT id AS database_id
, r.latest_record_updated_at
, COALESCE(r.record_count, 0) AS record_count
FROM store_database d
LEFT JOIN (
SELECT r.database_id AS id
, max(r.updated_at) AS latest_record_updated_at
, count(*) AS record_count
FROM store_record r
GROUP BY 1
) r USING (id);
See:
Query with LEFT JOIN not returning rows for count of 0
And use the slightly faster (and equivalent in this case) count(*). Related:
PostgreSQL: running count of rows for a query 'by minute'
Also - visibility provided - count(*) can use any non-partial index, preferably the smallest, while count(store_record.id) is limited to an index on that column (and has to inspect values, too).
I/O is your bottleneck
You added the EXPLAIN plan for an expensive execution, and the skyrocketing I/O cost stands out. It dominates the cost of your query.
Fast plan:
Buffers: shared hit=609146 read=1190704
I/O Timings: read=3385.027
Slow plan:
Buffers: shared hit=113974 read=1685876 dirtied=367
I/O Timings: read=112953.619
Your Seq Scan on public.store_record spent 112953.619 ms on reading data file blocks. 367 dirtied buffers represent under 3MB and are only a tiny fraction of total I/O. Either way, I/O dominates the cost.
Either your storage system is abysmally slow or, more likely since I/O of the fast query costs 30x less, there is too much contention for I/O from concurrent work load (on an inappropriately configured system). One or more of these can help:
faster storage
better (more appropriate) server configuration
more RAM (and server config that allows more cache memory)
less concurrent workload
more efficient table design with smaller disk footprint
smarter query that needs to read fewer data blocks
upgrade to a current version of Postgres
Hugely faster without count
If there was no count, just latest_record_updated_at, this query would deliver that in close to no time:
SELECT d.id
, (SELECT r.updated_at
FROM store_record r
WHERE r.database_id = d.id
ORDER BY r.updated_at DESC NULLS LAST
LIMIT 1) AS latest_record_updated_at
FROM store_database d;
In combination with a matching index! Ideally:
CREATE INDEX store_record_database_id_idx ON store_record (database_id, updated_at DESC NULLS LAST);
See:
Optimize GROUP BY query to retrieve latest row per user
The same index can also help the complete query above, even if not as dramatically. If the table is vacuumed enough (visibility map up to date) Postgres can do a sequential scan on the smaller index without involving the bigger table. Obviously matters more for wider table rows - especially easing your I/O problem.
(Of course, index maintenance adds costs, too ...)
Upgrade to use parallelism
Upgrade to the latest version of Postgres if at all possible. Postgres 14 or 15 have received various performance improvements over Postgres 12. Most importantly, quoting the release notes for Postgres 14:
Allow REFRESH MATERIALIZED VIEW to use parallelism (Bharath Rupireddy)
Could be massive for your use case. Related:
Materialized view refresh in parallel
Estimates?
Warning: experimental stuff.
You stated:
Approximate values are totally fine
I see only 169 groups ("databases") in the query plan. Postgres maintains column statistics. While the distinct count of groups is tiny and stays below the "statistics target" for the column store_record.database_id (which you have to make sure of!), we can work with this. See:
How to check statistics targets used by ANALYZE?
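A hedged sketch of "making sure of" the statistics target for that column (the value 500 is an arbitrary illustration comfortably above the 169 groups; the default per-column target is 100, which the MCV list cannot exceed):

```sql
-- Raise the per-column statistics target above the number of distinct groups,
-- then refresh the column statistics so the MCV list can hold all of them.
ALTER TABLE public.store_record ALTER COLUMN database_id SET STATISTICS 500;
ANALYZE public.store_record (database_id);
```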
Unless you have very aggressive autovacuum settings, run ANALYZE on database_id to update column statistics before running the query below, to get better estimates. (This also updates reltuples and relpages in pg_class.):
ANALYZE public.store_record(database_id);
Or even (to also update the visibility map for above query):
VACUUM ANALYZE public.store_record(database_id);
This was the most expensive part (with collateral benefits). And it's optional.
WITH ct(total_est) AS (
SELECT reltuples / relpages * (pg_relation_size(oid) / 8192)
FROM pg_class
WHERE oid = 'public.store_record'::regclass -- your table here
)
SELECT v.database_id, (ct.total_est * v.freq)::bigint AS estimate
FROM pg_stats s
, ct
, unnest(most_common_vals::text::int[], most_common_freqs) v(database_id, freq)
WHERE s.schemaname = 'public'
AND s.tablename = 'store_record'
AND s.attname = 'database_id';
The query relies on various Postgres internals and may break in future major versions (though that's unlikely). Tested with Postgres 14; works with Postgres 12, too. It's basically black magic. You need to know what you are doing. You have been warned.
But the query costs close to nothing.
Take exact values for latest_record_updated_at from the fast query above, and join to these estimates for the count.
Basic explanation: Postgres maintains column statistics in the system catalog pg_statistic. pg_stats is a view on it, easier to access. Among other things, "most common values" and their relative frequency are gathered. Represented in most_common_vals and most_common_freqs. Multiplied with the current (estimated) total count, we get estimates per group. You could do all of it manually, but Postgres is probably much faster and better at this.
For the computation of the total estimate ct.total_est see:
Fast way to discover the row count of a table in PostgreSQL
(Note the "Safe and explicit" form for this query.)
Given the explain plan, the sequential scan seems to be causing the slowness. An index can definitely help there.
You can also utilize index-only scans, as the query references only a few columns. So you can use something like this for the store_record table:
CREATE INDEX idx_store_record_db_id ON store_record USING btree (database_id) INCLUDE (id, updated_at);
An index on the id column of the store_database table is also needed.
CREATE INDEX idx_db_id ON store_database USING btree (id);
Sometimes cases like this call for completely different business-logic solutions.
For example, count is a very slow operation, and it cannot be accelerated by any means inside the DB. What can be done in such cases? Since I do not know your business logic in full detail, I will outline several options, though these options also have disadvantages. For example:
group_id | id
---------+----
       1 | 12
       1 | 145
       1 | 100
       3 | 652
       3 | 102
We group it once and insert the numbers into a table.
group_id | count_id
---------+---------
       1 | 3
       3 | 2
Then, whenever a record is inserted into the main table, we update the group table with triggers, like this:
update group_table set count_id = count_id + 1 where group_id = new.group_id
Or like that:
update group_table set count_id = (select count(id) from main_table where group_id = new.group_id)
I am not covering every small detail here. To update the row properly, we can use the FOR UPDATE clause, which locks the row against other transactions.
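Wired into an actual trigger, the first variant might look like this (a sketch only: the table and column names group_table, main_table, count_id, group_id come from the example above, and the function and trigger names are illustrative):

```sql
CREATE FUNCTION bump_group_count() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    -- Increment the pre-aggregated counter for the inserted row's group
    UPDATE group_table
    SET count_id = count_id + 1
    WHERE group_id = NEW.group_id;
    RETURN NULL;  -- AFTER trigger: the return value is ignored
END $$;

CREATE TRIGGER main_table_count
AFTER INSERT ON main_table
FOR EACH ROW EXECUTE FUNCTION bump_group_count();
```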
So, the main solution is: functions like count need to be executed separately on grouped data, not on the entire table at once. Similar solutions can be applied; I explained the idea for general understanding.
The disadvantage of this solution is that if you have many insert operations on the main table, insert performance will decrease.
Parallel plan
If you first collect the store_record statistics and then join that with the store_database, you'll get a better, parallelisable plan.
EXPLAIN (analyze, buffers, verbose)
SELECT
store_database.id as database_id,
s.latest_record_updated_at as latest_record_updated_at,
coalesce(s.record_count,0) as record_count
FROM store_database
LEFT JOIN
( SELECT
store_record.database_id as database_id,
MAX(store_record.updated_at) as latest_record_updated_at,
COUNT(store_record.id) as record_count
FROM store_record
GROUP BY store_record.database_id)
AS s ON (store_database.id = s.database_id);
Here's a demo - at the end you can see both queries return the exact same results, but the one I suggest runs faster and has a more flexible plan. The number of workers dispatched depends on your max_worker_processes, max_parallel_workers, max_parallel_workers_per_gather settings as well as some additional logic inside the planner.
With more rows in store_record the difference will be more pronounced. On my system with 40 million test rows it went down from 14 seconds to 3 seconds with one worker, 1.4 seconds when it caps out dispatching six workers out of 16 available.
Caching
I'm thinking of caching these values on every write to the Task table. Either in Redis or in PostgreSQL itself.
If it's an option, it's worth a try - you can maintain proper accuracy and instantly available statistics at the cost of some (deferrable) table throughput overhead. You can replace your materialized view with a regular table, or add the statistics columns to store_database:
create table store_record_statistics(
database_id smallint unique references store_database(id)
on update cascade,
latest_record_updated_at timestamptz,
record_count integer default 0);
insert into store_record_statistics --initializes table with view definition
SELECT g.id, MAX(s.updated_at), COUNT(*)
FROM store_database g LEFT JOIN store_record s ON g.id = s.database_id
GROUP BY g.id;
create index store_record_statistics_idx
on store_record_statistics (database_id)
include (latest_record_updated_at,record_count);
cluster verbose store_record_statistics using store_record_statistics_idx;
And leave keeping the table up to date to a trigger that fires each time store_record changes.
CREATE FUNCTION maintain_store_record_statistics_trigger()
RETURNS TRIGGER LANGUAGE plpgsql AS
$$ BEGIN
IF TG_OP IN ('UPDATE', 'DELETE') THEN --decrement and find second most recent updated_at
UPDATE store_record_statistics srs
SET (record_count,
latest_record_updated_at)
= (record_count - 1,
(SELECT s.updated_at
FROM store_record s
WHERE s.database_id = srs.database_id
ORDER BY s.updated_at DESC NULLS LAST
LIMIT 1))
WHERE database_id = old.database_id;
END IF;
IF TG_OP in ('INSERT','UPDATE') THEN --increment and pick most recent updated_at
UPDATE store_record_statistics
SET (record_count,
latest_record_updated_at)
= (record_count + 1,
greatest(
latest_record_updated_at,
new.updated_at))
WHERE database_id=new.database_id;
END IF;
RETURN NULL;
END $$;
Making the trigger deferrable decouples its execution time from the main operation, but it will still incur its cost at the end of the transaction.
CREATE CONSTRAINT TRIGGER maintain_store_record_statistics
AFTER INSERT OR UPDATE OF database_id OR DELETE ON store_record
INITIALLY DEFERRED FOR EACH ROW
EXECUTE PROCEDURE maintain_store_record_statistics_trigger();
A TRUNCATE trigger cannot be declared FOR EACH ROW along with the other events, so it has to be defined separately:
CREATE FUNCTION maintain_store_record_statistics_truncate_trigger()
RETURNS TRIGGER LANGUAGE plpgsql AS
$$ BEGIN
update store_record_statistics
set (record_count, latest_record_updated_at)
= (0 , null);--wipes/resets all stats
RETURN NULL;
END $$;
CREATE TRIGGER maintain_store_record_statistics_truncate
AFTER TRUNCATE ON store_record
EXECUTE PROCEDURE maintain_store_record_statistics_truncate_trigger();
In my test, an update or delete of 10000 random rows in a 100-million-row table ran in seconds. A single insert of 1000 new, randomly generated rows took 25ms without the trigger and 200ms with it. A million rows took 30s and 3 minutes, respectively.
A demo.
Partitioning-backed parallel plan
store_record might be a good fit for partitioning:
create table store_record(
id serial not null,
updated_at timestamptz default now(),
database_id smallint references store_database(id)
) partition by range (database_id);
DO $$
declare
vi_database_max_id smallint:=0;
vi_database_id smallint:=0;
vi_database_id_per_partition smallint:=40;--tweak for lower/higher granularity
begin
select max(id) from store_database into vi_database_max_id;
for vi_database_id in 1 .. vi_database_max_id by vi_database_id_per_partition loop
execute format ('
drop table if exists store_record_p_%1$s;
create table store_record_p_%1$s
partition of store_record
for VALUES from (%1$s) to (%1$s + %2$s)
with (parallel_workers=16);
', vi_database_id, vi_database_id_per_partition);
end loop;
end $$ ;
Splitting objects in this manner lets the planner split their scans accordingly, which works best with parallel workers, but doesn't require them. Even your initial, unaltered query behind the view is able to take advantage of this structure:
HashAggregate (cost=60014.27..60041.47 rows=2720 width=18) (actual time=910.657..910.698 rows=169 loops=1)
  Output: store_database.id, max(store_record_p_1.updated_at), count(store_record_p_1.id)
  Group Key: store_database.id
  Buffers: shared hit=827 read=9367 dirtied=5099 written=4145
  -> Hash Right Join (cost=71.20..45168.91 rows=1979382 width=14) (actual time=0.064..663.603 rows=1600020 loops=1)
        Output: store_database.id, store_record_p_1.updated_at, store_record_p_1.id
        Inner Unique: true
        Hash Cond: (store_record_p_1.database_id = store_database.id)
        Buffers: shared hit=827 read=9367 dirtied=5099 written=4145
        -> Append (cost=0.00..39893.73 rows=1979382 width=14) (actual time=0.014..390.152 rows=1600000 loops=1)
              Buffers: shared hit=826 read=9367 dirtied=5099 written=4145
              -> Seq Scan on public.store_record_p_1 (cost=0.00..8035.02 rows=530202 width=14) (actual time=0.014..77.130 rows=429068 loops=1)
                    Output: store_record_p_1.updated_at, store_record_p_1.id, store_record_p_1.database_id
                    Buffers: shared read=2733 dirtied=1367 written=1335
              -> Seq Scan on public.store_record_p_41 (cost=0.00..8067.36 rows=532336 width=14) (actual time=0.017..75.193 rows=430684 loops=1)
                    Output: store_record_p_41.updated_at, store_record_p_41.id, store_record_p_41.database_id
                    Buffers: shared read=2744 dirtied=1373 written=1341
              -> Seq Scan on public.store_record_p_81 (cost=0.00..8029.14 rows=529814 width=14) (actual time=0.017..74.583 rows=428682 loops=1)
                    Output: store_record_p_81.updated_at, store_record_p_81.id, store_record_p_81.database_id
                    Buffers: shared read=2731 dirtied=1366 written=1334
              -> Seq Scan on public.store_record_p_121 (cost=0.00..5835.90 rows=385090 width=14) (actual time=0.016..45.407 rows=311566 loops=1)
                    Output: store_record_p_121.updated_at, store_record_p_121.id, store_record_p_121.database_id
                    Buffers: shared hit=826 read=1159 dirtied=993 written=135
              -> Seq Scan on public.store_record_p_161 (cost=0.00..29.40 rows=1940 width=14) (actual time=0.008..0.008 rows=0 loops=1)
                    Output: store_record_p_161.updated_at, store_record_p_161.id, store_record_p_161.database_id
        -> Hash (cost=37.20..37.20 rows=2720 width=2) (actual time=0.041..0.042 rows=169 loops=1)
              Output: store_database.id
              Buckets: 4096 Batches: 1 Memory Usage: 38kB
              Buffers: shared hit=1
              -> Seq Scan on public.store_database (cost=0.00..37.20 rows=2720 width=2) (actual time=0.012..0.021 rows=169 loops=1)
                    Output: store_database.id
                    Buffers: shared hit=1
Planning Time: 0.292 ms
Execution Time: 910.811 ms
Demo. It's best to test what granularity gives the best performance on your setup. You can even test sub-partitioning, giving each store_record.database_id its own partition, which is then sub-partitioned into date ranges, simplifying access to the most recent entries.
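A sub-partitioning sketch along those lines (table names and date ranges are illustrative; it assumes the partitioned store_record definition shown above):

```sql
-- One partition per database_id, itself range-partitioned by updated_at
CREATE TABLE store_record_p_1
    PARTITION OF store_record FOR VALUES FROM (1) TO (2)
    PARTITION BY RANGE (updated_at);

CREATE TABLE store_record_p_1_2022
    PARTITION OF store_record_p_1
    FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
```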
MATERIALIZED VIEW is not a good idea for that ...
If you just want to "calculate the number of tasks and the most recent Task.created_at value per group", then I suggest you simply:
Add two columns to the group table:
ALTER TABLE IF EXISTS "group" ADD COLUMN IF NOT EXISTS task_count integer DEFAULT 0 ;
ALTER TABLE IF EXISTS "group" ADD COLUMN IF NOT EXISTS last_created_date timestamp ; -- instead of datetime which does not really exist in postgres ...
Update these two columns from trigger functions defined on table task:
CREATE OR REPLACE FUNCTION task_insert() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
UPDATE "group" AS g
SET task_count = task_count + 1
, last_created_date = NEW.created_at -- assuming the last task inserted has the latest created_at of the group; if not, reuse the solution proposed in task_delete()
WHERE g.id = NEW."group" ;
RETURN NEW ;
END ; $$ ;
CREATE OR REPLACE TRIGGER task_insert AFTER INSERT ON task
FOR EACH ROW EXECUTE FUNCTION task_insert () ;
CREATE OR REPLACE FUNCTION task_delete () RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
UPDATE "group" AS g
SET task_count = task_count - 1
, last_created_date = u.last_created_at
FROM
( SELECT max(created_at) AS last_created_at
FROM task t
WHERE t."group" = OLD."group"
) AS u
WHERE g.id = OLD."group" ;
RETURN OLD ;
END ; $$ ;
CREATE OR REPLACE TRIGGER task_delete AFTER DELETE ON task
FOR EACH ROW EXECUTE FUNCTION task_delete () ;
You will need to perform a setup action at the beginning ...
UPDATE "group" AS g
SET task_count = ref.count
, last_created_date = ref.last_created_at
FROM
( SELECT "group"
, max(created_at) AS last_created_at
, count(*) AS count
FROM task
GROUP BY "group"
) AS ref
WHERE g.id = ref."group" ;
... but then you will have no more performance issue with the queries !!!
SELECT * FROM "group"
and you will optimize the size of your database ...

How to optimize large database query accessed with an item ID?

I'm making a site that stores a large amount of data (8 data points for 313 item_ids every 10 seconds over 24 hr) and I serve that data to users on demand. The request is supplied with an item ID with which I query the database with something along the lines of SELECT * FROM current_day_data WHERE item_id = <supplied ID> (assuming the id is valid).
CREATE TABLE current_day_data (
"time" bigint,
item_id text NOT NULL,
-- some data,
id integer NOT NULL
);
CREATE INDEX item_id_lookup ON public.current_day_data USING btree (item_id);
This works fine, but the request takes about a third of a second, so I'm looking into either other database options to help optimize this, or some way to optimize the query itself.
My current setup is a PostgreSQL database with an index on the item ID column, but I feel like there are options in the realm of NoSQL (an area I'm unfamiliar with) due to its similarity to a hash table.
My ideal solution would be a hash table with the item IDs as the key and the data as a JSON-like object but I don't know what options could achieve that.
tl;dr how to optimize SELECT * FROM current_day_data WHERE item_id = <supplied ID> through better querying or new database solution?
edit: here's the EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM current_day_data
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Seq Scan on current_day_data (cost=0.00..46811.09 rows=2584364 width=75) (actual time=0.013..291.667 rows=2700251 loops=1)
Buffers: shared hit=39058
Planning:
Buffers: shared hit=112
Planning Time: 0.584 ms
Execution Time: 446.622 ms
(6 rows)
EXPLAIN with a specified item id EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM current_day_data WHERE item_id = 'SUGAR_CANE';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on current_day_info (cost=33.40..12099.27 rows=8592 width=75) (actual time=2.949..12.236 rows=8627 loops=1)
Recheck Cond: (product_id = 'SUGAR_CANE'::text)
Heap Blocks: exact=8570
Buffers: shared hit=8619
-> Bitmap Index Scan on prod_id_lookup (cost=0.00..32.97 rows=8592 width=0) (actual time=1.751..1.751 rows=8665 loops=1)
Index Cond: (product_id = 'SUGAR_CANE'::text)
Buffers: shared hit=12
Planning:
Buffers: shared hit=68
Planning Time: 0.339 ms
Execution Time: 12.686 ms
(11 rows)
Now this says 12.7 ms, which makes me think the 300 ms has something to do with the library I'm using (SQLAlchemy), but that wouldn't really make sense, since it's a popular library. More specifically, the line I'm using is:
results = CurrentDayData.query.filter(CurrentDayData.item_id == item_id).all()
That's a very simple query that already uses an index, so the only way to possibly speed it up on the database side would be to improve the specification of your hardware.
Moving to a different kind of database, on the same hardware, is not going to make a significant difference in performance for this type of query.
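One way to test the asker's SQLAlchemy suspicion: the 12.7 ms in EXPLAIN is server time only, while `.all()` also builds one mapped object per returned row on the client. Below is a stdlib-only sketch of that effect; sqlite3 stands in for Postgres, and the `Row` class is an illustrative stand-in for an ORM-mapped class, not SQLAlchemy itself.

```python
import sqlite3
import time

class Row:
    """Stand-in for an ORM-mapped object: one Python object built per row."""
    def __init__(self, id_, item_id, value):
        self.id, self.item_id, self.value = id_, item_id, value

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE current_day_data (id INTEGER, item_id TEXT, value REAL)")
conn.executemany("INSERT INTO current_day_data VALUES (?, ?, ?)",
                 [(i, "SUGAR_CANE", i * 0.5) for i in range(10_000)])

query = "SELECT * FROM current_day_data WHERE item_id = ?"

t0 = time.perf_counter()
tuples = conn.execute(query, ("SUGAR_CANE",)).fetchall()   # raw driver rows
t_raw = time.perf_counter() - t0

t0 = time.perf_counter()
objects = [Row(*r) for r in conn.execute(query, ("SUGAR_CANE",)).fetchall()]
t_obj = time.perf_counter() - t0   # same fetch plus per-row object construction

print(len(tuples), len(objects))   # 10000 10000; t_obj > t_raw on most runs
```

If the object-building pass dominates, fetching plain tuples (SQLAlchemy Core instead of the ORM) would shrink the client-side gap without any database change.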

PostgreSQL query against millions of rows takes long on UUIDs

I have a reference table for UUIDs that is roughly 200M rows. I have ~5000 UUIDs that I want to look up in the reference table. Reference table looks like:
CREATE TABLE object_store (
project_id UUID,
object_id UUID,
object_name VARCHAR(20),
description VARCHAR(80)
);
CREATE INDEX object_store_project_idx ON object_store(project_id);
CREATE INDEX object_store_id_idx ON object_store(object_id);
* Edit #2 *
As requested, here is the temp_objects table definition.
CREATE TEMPORARY TABLE temp_objects (
object_id UUID
)
ON COMMIT DELETE ROWS;
The reason for the separate index is that object_id is not unique and can belong to many different projects. The lookup list is just a temp table of UUIDs (temp_objects) holding the 5000 object_ids I want to check.
If I query the above reference table with 1 object_id literal value, it's almost instantaneous (2ms). If the temp table only has 1 row, again, instantaneous (2ms). But with 5000 rows it takes 25 minutes to even return. Granted it pulls back >3M rows of matches.
* Edited *
This is for 1 row comparison (4.198 ms):
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)SELECT O.project_id
FROM temp_objects T JOIN object_store O ON T.object_id = O.object_id;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.57..475780.22 rows=494005 width=65) (actual time=0.038..2.631 rows=1194 loops=1)
Buffers: shared hit=1202, local hit=1
-> Seq Scan on temp_objects t (cost=0.00..13.60 rows=360 width=16) (actual time=0.007..0.009 rows=1 loops=1)
Buffers: local hit=1
-> Index Scan using object_store_id_idx on object_store l (cost=0.57..1307.85 rows=1372 width=81) (actual time=0.027..1.707 rows=1194 loops=1)
Index Cond: (object_id = t.object_id)
Buffers: shared hit=1202
Planning time: 0.173 ms
Execution time: 3.096 ms
(9 rows)
Time: 4.198 ms
This is for 4911 row comparison (1579082.974 ms (26:19.083)):
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)SELECT O.project_id
FROM temp_objects T JOIN object_store O ON T.object_id = O.object_id;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.57..3217316.86 rows=3507438 width=65) (actual time=0.041..1576913.100 rows=8043500 loops=1)
Buffers: shared hit=5185078 read=2887548, local hit=71
-> Seq Scan on temp_objects d (cost=0.00..96.56 rows=2556 width=16) (actual time=0.009..3.945 rows=4911 loops=1)
Buffers: local hit=71
-> Index Scan using object_store_id_idx on object_store l (cost=0.57..1244.97 rows=1372 width=81) (actual time=1.492..320.081 rows=1638 loops=4911)
Index Cond: (object_id = t.object_id)
Buffers: shared hit=5185078 read=2887548
Planning time: 0.169 ms
Execution time: 1579078.811 ms
(9 rows)
Time: 1579082.974 ms (26:19.083)
Eventually I want to group and get a count of the matching object_ids by project_id, using standard grouping. The aggregate is at the upper end (of course) of the cost. It took just about 25 minutes again to complete the below query. Yet, when I limit the temp table to only 1 row, it comes back in 21ms. Something is not adding up...
EXPLAIN SELECT O.project_id, count(*)
FROM temp_objects T JOIN object_store O ON T.object_id = O.object_id GROUP BY O.project_id;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
HashAggregate (cost=6189484.10..6189682.84 rows=19874 width=73)
Group Key: o.project_id
-> Nested Loop (cost=0.57..6155795.69 rows=6737683 width=65)
-> Seq Scan on temp_objects t (cost=0.00..120.10 rows=4910 width=16)
-> Index Scan using object_store_id_idx on object_store o (cost=0.57..1239.98 rows=1372 width=81)
Index Cond: (object_id = t.object_id)
(6 rows)
I'm on PostgreSQL 10.6, running 2 CPUs and 8GB of RAM on an SSD. I have ANALYZEd the tables, I have set the work_mem to 50MB, shared_buffers to 2GB, and have set the random_page_cost to 1. All helped the queries actually to come back in several minutes, but still not as fast as I feel it should be.
I have the option to go to cloud computing if CPUs/RAM/parallelization make a big difference. Just looking for suggestions on how to get this simple query to return in < few seconds (if possible).
* UPDATE *
Taking the hint from Jürgen Zornig, I changed both object_id fields to bigint, using just the top half of the UUID and cutting my data size in half. The aggregate query above now runs in ~16 min.
Next, taking jjane's suggestion of set enable_nestloop to off, my aggregate query jumped to 6 min! Unfortunately, none of the other suggestions have pushed it below 6 min. Interestingly, changing my "TEMPORARY" table to a permanent one allowed 2 workers to work on it, but it didn't change the time. I think jjane is right in saying that I/O is the binding factor here. Here is the latest explain plan from the 6-min run (I wish it were faster still, but it's better!):
explain (analyze, buffers, format text) select project_id, count(*) from object_store natural join temp_object group by project_id;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=3966899.86..3967396.69 rows=19873 width=73) (actual time=368124.126..368744.157 rows=153633 loops=1)
Group Key: object_store.project_id
Buffers: shared hit=243022 read=2423215, temp read=3674 written=3687
I/O Timings: read=870720.440
-> Sort (cost=3966899.86..3966999.23 rows=39746 width=73) (actual time=368124.116..368586.497 rows=333427 loops=1)
Sort Key: object_store.project_id
Sort Method: external merge Disk: 29392kB
Buffers: shared hit=243022 read=2423215, temp read=3674 written=3687
I/O Timings: read=870720.440
-> Gather (cost=3959690.23..3963863.56 rows=39746 width=73) (actual time=366476.369..366827.313 rows=333427 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=243022 read=2423215
I/O Timings: read=870720.440
-> Partial HashAggregate (cost=3958690.23..3958888.96 rows=19873 width=73) (actual time=366472.712..366568.313 rows=111142 loops=3)
Group Key: object_store.project_id
Buffers: shared hit=243022 read=2423215
I/O Timings: read=870720.440
-> Hash Join (cost=132.50..3944473.09 rows=2843429 width=65) (actual time=7.880..363848.830 rows=2681167 loops=3)
Hash Cond: (object_store.object_id = temp_object.object_id)
Buffers: shared hit=243022 read=2423215
I/O Timings: read=870720.440
-> Parallel Seq Scan on object_store (cost=0.00..3499320.53 rows=83317153 width=73) (actual time=0.467..324932.880 rows=66653718 loops=3)
Buffers: shared hit=242934 read=2423215
I/O Timings: read=870720.440
-> Hash (cost=71.11..71.11 rows=4911 width=8) (actual time=7.349..7.349 rows=4911 loops=3)
Buckets: 8192 Batches: 1 Memory Usage: 256kB
Buffers: shared hit=66
-> Seq Scan on temp_object (cost=0.00..71.11 rows=4911 width=8) (actual time=0.014..2.101 rows=4911 loops=3)
Buffers: shared hit=66
Planning time: 0.247 ms
Execution time: 368779.757 ms
(32 rows)
Time: 368780.532 ms (06:08.781)
So I'm at 6min per query now. I think with I/O costs, I may try for an in-memory store on this table if possible to see if getting it off SSD makes it even better.
UUIDs are (EDIT) working against adaptive cache management: because of their random nature they effectively drop the cache hit ratio once the index space is larger than memory. The ids are equally distributed over a numerically wide range, so in effect every id lands on its own leaf of the index tree. And since the index leaf determines which data page the row is saved on disk, pretty much every row gets its own page, resulting in a whole lot of extremely expensive I/O operations to get all these rows read in.
That's the reason why it's generally not recommended to use UUIDs, and if you really need UUIDs, then at least generate timestamp/MAC-prefixed ones (have a look at uuid_generate_v1(), https://www.postgresql.org/docs/9.4/uuid-ossp.html) that are numerically close to each other. That way, chances are higher that data rows are clustered together on fewer data pages, so fewer I/O operations bring in more data.
Long story short: randomness over a large range kills your index (well, actually not the index itself; it results in a lot of expensive I/O to fetch data on reads and to maintain the index on writes) and therefore slows queries down to the point where it is as good as having no index at all.
Here is also an article for reference
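The locality argument can be sketched outside the database: consecutively generated v1 UUIDs land close together in the 128-bit key space (their leading bits come from a timestamp), while v4 UUIDs scatter uniformly, which is exactly what spreads lookups across many index leaves and heap pages. A stdlib-only Python illustration:

```python
import statistics
import uuid

def median_gap(ids):
    """Median absolute distance between consecutively generated IDs,
    as a fraction of the 128-bit key space."""
    ints = [u.int for u in ids]
    gaps = [abs(b - a) for a, b in zip(ints, ints[1:])]
    return statistics.median(gaps) / 2**128

v1 = [uuid.uuid1() for _ in range(1000)]  # timestamp-based
v4 = [uuid.uuid4() for _ in range(1000)]  # fully random

# v1 ids generated back-to-back sit close together in key space, so they
# tend to share index leaves; v4 ids scatter across the whole range.
print(f"v1 median gap: {median_gap(v1):.2e}")
print(f"v4 median gap: {median_gap(v4):.2e}")
```

The v1 gaps come out many orders of magnitude smaller than the v4 gaps, which is the clustering effect the answer describes.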
It looks like the centerpiece of your question is why it doesn't scale up from one input row to 5000 input rows linearly. But I think that this is a red herring. How are you choosing the one row? If you choose the same one row each time, then the data will stay in cache and will be very fast. I bet this is what you are doing. If you choose a different random one row each time you do a one-row plan, you will probably find the scaling to be more linear.
You should turn on track_io_timing. I have little doubt that IO is actually the bottleneck, but it is always nice to see it actually measured and reported, I have been surprised before.
The use of temporary table will inhibit parallel query. You might want to test with a permanent table, to see if you do get use of parallel workers, and if so, whether that actually helps. If you do this test, you should use your aggregation version of the query. They parallelize more efficiently than non-aggregation queries do, and if that is your ultimate goal that is what you should initially test with.
Another thing you could try is a large setting of effective_io_concurrency. But, that will only help if your plan uses bitmap scans to start with, which the plans you show do not. Setting random_page_cost from 1 to a slightly higher value might encourage it to use bitmap scans. (effective_io_concurrency is weird because bitmap plans can get a substantial realistic benefit from a higher setting, but the planner doesn't give bitmap plans any credit for that benefit they receive. So you must be "accidentally" using that plan already in order to get the benefit)
At some point (as you increase the number of rows in temp_objects) it is going to be faster to hash that table, and hashjoin it to a seq-scan of the object_store table. Is 5000 already past the point at which that would be faster? The planner clearly doesn't think so, but the planner never gets the cut-over point exactly right, and is often off by quite a bit. What happens if you set enable_nestloop TO off; before running your query?
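The enable_nestloop experiment suggested above is session-local and easy to undo, for example:

```sql
SET enable_nestloop TO off;  -- session-only planner setting, not server-wide

EXPLAIN (ANALYZE, BUFFERS)
SELECT O.project_id
FROM temp_objects T
JOIN object_store O ON T.object_id = O.object_id;

RESET enable_nestloop;
```

If the hash-join plan this forces is much faster, that confirms the planner's nested-loop cut-over estimate is off for this data.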
Have you done low-level benchmarking of your SSD (outside of the database)? Assuming substantially all of your time is spent on IO reads and nearly none of those are fulfilled by the filesystem cache, you are getting 1576913/2887548 = 0.55ms per read. That seems pretty long. That is about what I get on a bottom-dollar laptop where the SSD is being exposed through a VM layer. I'd expect better than that from server-grade hardware.
Be sure you also have a proper index on the temp_objects table:
CREATE INDEX temp_object_id_idx ON temp_objects(object_id);
SELECT O.project_id
FROM temp_objects T
JOIN object_store O ON T.object_id = O.object_id;
Firstly: I would try to get the index into memory. What is shared_buffers set to? If it is small, let's make that bigger first and see if we can reduce the index-scan IO.
Next: are parallel queries enabled? I'm not sure that will help very much here because you have only 2 CPUs, but it wouldn't hurt.
Even though the object_id column is completely random, I'd also bump up the statistics target on that column from the default (100) to a few thousand, then run ANALYZE again (or, for thoroughness, VACUUM ANALYZE).
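A sketch of that statistics bump (5000 is an arbitrary illustrative target, not a tuned value):

```sql
-- Raise the per-column statistics target, then refresh planner stats
ALTER TABLE object_store ALTER COLUMN object_id SET STATISTICS 5000;
ANALYZE object_store;          -- or: VACUUM ANALYZE object_store;
```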
work_mem at 50MB may be low too. It could be larger if you don't have a lot of concurrent users and you have GBs of RAM to work with. Too large can be counterproductive, but you could go up a bit more to see if it helps.
You could try CREATE TABLE AS (CTAS) to copy the big table into a new table ordered by object_id, so that the physical layout isn't completely random.
There might be a crazy partitioning scheme you could come up with if you were using PostgreSQL 12 that would group the object ids into some even partition distribution.
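The CTAS suggestion above could look like this (it needs disk space for a full copy, and indexes and constraints have to be recreated on the new table; the new names are illustrative):

```sql
-- Rewrite the table so rows sharing an object_id sit on adjacent pages
CREATE TABLE object_store_sorted AS
SELECT * FROM object_store
ORDER BY object_id;

CREATE INDEX object_store_sorted_id_idx ON object_store_sorted (object_id);
ANALYZE object_store_sorted;
```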

With more than 30 millions entries in a table, a select count on it takes 19 minutes to perform

I am using PostgreSQL 9.4.8 32 bits on a Windows 7 64 bits OS.
I am using a RAID 5 on 3 disks of 2TB each. The CPU is a Xeon E3-1225v3 with 8GB of RAM.
In a table, I have inserted more than 30 millions entries (I want to go up to 50 millions).
Performing a select count(*) on this table takes more than 19 minutes. Performing the query a second time reduces it to 14 minutes, but it is still slow. Indexes don't seem to help.
My postgresql.conf is setup like this at the end of the file :
max_connections = 100
shared_buffers = 512MB
effective_cache_size = 6GB
work_mem = 13107kB
maintenance_work_mem = 512MB
checkpoint_segments = 32
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.2
Here is the schema of this table :
CREATE TABLE recorder.records
(
recorder_id smallint NOT NULL DEFAULT 200,
rec_start timestamp with time zone NOT NULL,
rec_end timestamp with time zone NOT NULL,
deleted boolean NOT NULL DEFAULT false,
channel_number smallint NOT NULL,
channel_name text,
from_id text,
from_name text,
to_id text,
to_name text,
type character varying(32),
hash character varying(128),
codec character varying(16),
id uuid NOT NULL,
status smallint,
duration interval,
CONSTRAINT records_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
)
CREATE INDEX "idxRecordChanName"
ON recorder.records
USING btree
(channel_name COLLATE pg_catalog."default");
CREATE INDEX "idxRecordChanNumber"
ON recorder.records
USING btree
(channel_number);
CREATE INDEX "idxRecordEnd"
ON recorder.records
USING btree
(rec_end);
CREATE INDEX "idxRecordFromId"
ON recorder.records
USING btree
(from_id COLLATE pg_catalog."default");
CREATE INDEX "idxRecordStart"
ON recorder.records
USING btree
(rec_start);
CREATE INDEX "idxRecordToId"
ON recorder.records
USING btree
(to_id COLLATE pg_catalog."default");
CREATE INDEX "idxRecordsStart"
ON recorder.records
USING btree
(rec_start);
CREATE TRIGGER trig_update_duration
AFTER INSERT
ON recorder.records
FOR EACH ROW
EXECUTE PROCEDURE recorder.fct_update_duration();
My query is like this :
select count(*) from recorder.records as rec where rec.rec_start < '2016-01-01' and channel_number != 42;
Explain analyse of this query :
Aggregate (cost=1250451.14..1250451.15 rows=1 width=0) (actual time=956017.494..956017.494 rows=1 loops=1)
-> Seq Scan on records rec (cost=0.00..1195534.66 rows=21966592 width=0) (actual time=34.581..950947.593 rows=23903295 loops=1)
Filter: ((rec_start < '2016-01-01 00:00:00-06'::timestamp with time zone) AND (channel_number <> 42))
Rows Removed by Filter: 7377886
Planning time: 0.348 ms
Execution time: 956017.586 ms
The same now, but by disabling seqscan :
Aggregate (cost=1456272.87..1456272.88 rows=1 width=0) (actual time=929963.288..929963.288 rows=1 loops=1)
-> Bitmap Heap Scan on records rec (cost=284158.85..1401356.39 rows=21966592 width=0) (actual time=118685.228..925629.113 rows=23903295 loops=1)
Recheck Cond: (rec_start < '2016-01-01 00:00:00-06'::timestamp with time zone)
Rows Removed by Index Recheck: 2798893
Filter: (channel_number <> 42)
Rows Removed by Filter: 612740
Heap Blocks: exact=134863 lossy=526743
-> Bitmap Index Scan on "idxRecordStart" (cost=0.00..278667.20 rows=22542169 width=0) (actual time=118628.930..118628.930 rows=24516035 loops=1)
Index Cond: (rec_start < '2016-01-01 00:00:00-06'::timestamp with time zone)
Planning time: 0.279 ms
Execution time: 929965.547 ms
How can I make this kind of query faster ?
Added :
I have created an index on (rec_start, channel_number), and after a VACUUM ANALYZE that took 57 minutes, the query now completes in a little more than 3 minutes:
CREATE INDEX "plopLindex"
ON recorder.records
USING btree
(rec_start, channel_number);
Explain buffers verbose of the same query :
explain (analyse, buffers, verbose) select count(*) from recorder.records as rec where rec.rec_start < '2016-01-01' and channel_number != 42;
Aggregate (cost=875328.61..875328.62 rows=1 width=0) (actual time=199610.874..199610.874 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=69490 read=550462 dirtied=75118 written=51880
-> Index Only Scan using "plopLindex" on recorder.records rec (cost=0.56..814734.15 rows=24237783 width=0) (actual time=66.115..197609.019 rows=23903295 loops=1)
Output: rec_start, channel_number
Index Cond: (rec.rec_start < '2016-01-01 00:00:00-06'::timestamp with time zone)
Filter: (rec.channel_number <> 42)
Rows Removed by Filter: 612740
Heap Fetches: 5364345
Buffers: shared hit=69490 read=550462 dirtied=75118 written=51880
Planning time: 12.416 ms
Execution time: 199610.988 ms
Then performing this query a second time (without EXPLAIN): 11 secs! A great improvement.
Seeing your number of rows, this doesn't sound abnormal to me, and it would be the same on other RDBMSs.
You have too many rows to get the results fast and since you have a WHERE clause, the only solution to get your row count fast is to create specific table(s) to keep track of that, populated with either a TRIGGER on INSERT, or a batch job.
The TRIGGER solution is 100% accurate but more intensive; the batch solution is approximate but more flexible, and the more you increase the batch job frequency, the more accurate your stats are.
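For completeness, a trigger-maintained counter could look roughly like this (all names here are illustrative; note that ON CONFLICT requires PostgreSQL 9.5+, so on the asker's 9.4 you would need an UPDATE-then-INSERT fallback instead):

```sql
CREATE TABLE records_daily_cnt (
    rec_date date NOT NULL,
    channel_number smallint NOT NULL,
    cnt bigint NOT NULL DEFAULT 0,
    PRIMARY KEY (rec_date, channel_number)
);

CREATE OR REPLACE FUNCTION recorder.fct_bump_cnt() RETURNS trigger AS $$
BEGIN
    -- Upsert one counter row per (day, channel) on every insert
    INSERT INTO records_daily_cnt (rec_date, channel_number, cnt)
    VALUES (NEW.rec_start::date, NEW.channel_number, 1)
    ON CONFLICT (rec_date, channel_number)
    DO UPDATE SET cnt = records_daily_cnt.cnt + 1;
    RETURN NEW;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER trig_bump_cnt
AFTER INSERT ON recorder.records
FOR EACH ROW EXECUTE PROCEDURE recorder.fct_bump_cnt();
```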
In your case I would go for the second solution and create one or several aggregation tables.
You can have, for instance, a batch job that counts all rows grouped by date and channel.
An example of an aggregation table for this specific need would be
CREATE TABLE agr_table (AGR_TYPE CHAR(50), AGR_DATE DATE, AGR_CHAN SMALLINT, AGR_CNT INT)
Your batch job would do:
DELETE FROM agr_table WHERE AGR_TYPE='group_by_date_and_channel';
INSERT INTO agr_table
SELECT 'group_by_date_and_channel', rec_start::date, channel_number, count(*) as cnt
FROM recorder.records
GROUP BY rec_start::date, channel_number
;
Then you can retrieve stats fast by doing :
SELECT SUM(cnt)
FROM agr_table
WHERE AGR_DATE < '2016-01-01' and AGR_CHAN != 42
That's a very simple example, of course. You should design your aggregation table(s) depending on the stats you need to retrieve fast.
I would suggest to read carefully Postgres Slow Counting and Postgres Count Estimate
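If an approximate whole-table count is ever enough, the estimate approach from those articles reads straight from the catalog (kept roughly current by autovacuum/ANALYZE; it cannot honor a WHERE clause):

```sql
-- Planner's row estimate for the table, returned in milliseconds
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE oid = 'recorder.records'::regclass;
```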
Yes, you've created the correct index to cover your query arguments. Thomas G also suggested a nice workaround, and I totally agree with it.
But there is another thing I want to share with you as well: the fact that the second run took only 11 sec (against 3 min for the first one) suggests to me that you are seeing a caching effect.
When you ran the first execution, Postgres grabbed table pages from disk into RAM; when you did the second run, everything it needed was already in memory, and it took only 11 sec.
I used to have exactly the same problem, and my "best" solution was simply to give Postgres more shared_buffers. I don't rely on the OS's file cache; I reserve as much memory as I can for Postgres. But it's a pain to make that simple change on Windows: you run into OS limitations, and Windows "wastes" too much memory just running itself. It's a shame.
Believe me, you don't have to change your hardware by adding more RAM (although more memory is always a good thing!). The most effective change is to change your OS. If you have a "dedicated" server, why waste such precious memory on video/sound/drivers/services/AV/etc., things you don't (and won't) ever use?
Go with a Linux OS (Ubuntu Server, perhaps?) and get much more performance out of exactly the same hardware.
Change kernel.shmmax to a greater value:
sysctl -w kernel.shmmax=14294967296
echo kernel.shmmax = 14294967296 >>/etc/sysctl.conf
and then you can change postgresql.conf to:
shared_buffers = 6GB
effective_cache_size = 7GB
work_mem = 128MB
You're gonna feel the difference right away.

How can I optimize this SQL query in Postgres?

I've got a pretty large table with nearly 1 million rows and some of the queries are taking a long time (over a minute).
Here is one that's giving me a particularly hard time...
EXPLAIN ANALYZE SELECT "apps".* FROM "apps" WHERE "apps"."kind" = 'software' ORDER BY itunes_release_date DESC, rating_count DESC LIMIT 12;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Limit (cost=153823.03..153823.03 rows=12 width=2091) (actual time=162681.166..162681.194 rows=12 loops=1)
-> Sort (cost=153823.03..154234.66 rows=823260 width=2091) (actual time=162681.159..162681.169 rows=12 loops=1)
Sort Key: itunes_release_date, rating_count
Sort Method: top-N heapsort Memory: 48kB
-> Seq Scan on apps (cost=0.00..150048.41 rows=823260 width=2091) (actual time=0.718..161561.149 rows=808554 loops=1)
Filter: (kind = 'software'::text)
Total runtime: 162682.143 ms
(7 rows)
So, how would I optimize that? PG version is 9.2.4, FWIW.
There are already indexes on (kind) and on (kind, itunes_release_date).
Looks like you're missing an index, e.g. on (kind, itunes_release_date desc, rating_count desc).
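A sketch of that index (the DESC keywords make the index order match the query's ORDER BY, so Postgres can stop after the first 12 rows; the index name is illustrative):

```sql
CREATE INDEX apps_kind_release_rating_idx
ON apps (kind, itunes_release_date DESC, rating_count DESC);
```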
How big is the apps table? Do you have at least this much memory allocated to postgres? If it's having to read from disk every time, query speed will be much slower.
Another thing that may help is to cluster the table on the index on kind. This may speed up disk access since all the 'software' rows will be stored sequentially on disk.
The only way to speed up this query is to create a composite index on (itunes_release_date, rating_count). It will allow Postgres to pick the first N rows straight from the index.