I have a table of events whose schema and data distribution are very similar to this artificial table, which can easily be generated locally:
CREATE TABLE events AS
WITH args AS (
SELECT
300 AS scale_factor, -- feel free to reduce this to speed up local testing
1000 AS pa_count,
1 AS l_count_min,
29 AS l_count_rand,
10 AS c_count,
10 AS pr_count,
3 AS r_count,
'10 days'::interval AS time_range -- edit 2017-05-02: the real data set has years worth of data here, but the query time ranges stay small (a couple days)
)
SELECT
p.c_id,
'ABC'||lpad(p.pa_id::text, 13, '0') AS pa_id,
'abcdefgh-'||((random()*(SELECT pr_count-1 FROM args)+1))::int AS pr_id,
((random()*(SELECT r_count-1 FROM args)+1))::int AS r,
'2017-01-01Z00:00:00'::timestamp without time zone + random()*(SELECT time_range FROM args) AS t
FROM (
SELECT
pa_id,
((random()*(SELECT c_count-1 FROM args)+1))::int AS c_id,
(random()*(SELECT l_count_rand FROM args)+(SELECT l_count_min FROM args))::int AS l_count
FROM generate_series(1, (SELECT pa_count*scale_factor FROM args)) pa_id
) p
JOIN LATERAL (
SELECT generate_series(1, p.l_count)
) l(id) ON (true);
Excerpt from SELECT * FROM events:
What I need is a query that selects all rows for a given c_id in a given time range of t, then filters them down to only include the most recent rows (by t) for each unique pr_id and pa_id combination, and then counts the number of pr_id and r combinations of those rows.
That's quite a mouthful, so here are three SQL queries I came up with that produce the desired results:
WITH query_a AS (
SELECT
pr_id,
r,
count(1) AS quantity
FROM (
SELECT DISTINCT ON (pr_id, pa_id)
pr_id,
pa_id,
r
FROM events
WHERE
c_id = 5 AND
t >= '2017-01-03Z00:00:00' AND
t < '2017-01-06Z00:00:00'
ORDER BY pr_id, pa_id, t DESC
) latest
GROUP BY
1,
2
ORDER BY 3, 2, 1 DESC
),
query_b AS (
SELECT
pr_id,
r,
count(1) AS quantity
FROM (
SELECT
pr_id,
pa_id,
first_not_null(r ORDER BY t DESC) AS r
FROM events
WHERE
c_id = 5 AND
t >= '2017-01-03Z00:00:00' AND
t < '2017-01-06Z00:00:00'
GROUP BY
1,
2
) latest
GROUP BY
1,
2
ORDER BY 3, 2, 1 DESC
),
query_c AS (
SELECT
pr_id,
r,
count(1) AS quantity
FROM (
SELECT
pr_id,
pa_id,
first_not_null(r) AS r
FROM events
WHERE
c_id = 5 AND
t >= '2017-01-03Z00:00:00' AND
t < '2017-01-06Z00:00:00'
GROUP BY
1,
2
) latest
GROUP BY
1,
2
ORDER BY 3, 2, 1 DESC
)
And here is the custom aggregate function used by query_b and query_c, along with what I believe to be the best index, settings and conditions:
CREATE FUNCTION first_not_null_agg(before anyelement, value anyelement) RETURNS anyelement
LANGUAGE sql IMMUTABLE STRICT
AS $_$
SELECT $1;
$_$;
CREATE AGGREGATE first_not_null(anyelement) (
SFUNC = first_not_null_agg,
STYPE = anyelement
);
CREATE INDEX events_idx ON events USING btree (c_id, t DESC, pr_id, pa_id, r);
VACUUM ANALYZE events;
SET work_mem='128MB';
My dilemma is that query_c outperforms query_a and query_b by a factor of more than 6, but is technically not guaranteed to produce the same result as the other queries (notice the missing ORDER BY in the first_not_null aggregate). In practice, however, it seems to pick a query plan that I believe to be correct and optimal.
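To make the concern concrete, here is a minimal Python sketch (not PostgreSQL code) of what an aggregate like first_not_null does: it keeps the first non-null value it is fed, so its result depends entirely on the order in which rows arrive. The sample rows and the Python helper are made up for illustration.

```python
def first_not_null(values):
    """Return the first non-None value, mimicking the STRICT SQL aggregate."""
    for v in values:
        if v is not None:
            return v
    return None

# Hypothetical rows for a single (pr_id, pa_id) group: (t, r)
rows = [("2017-01-03", 1), ("2017-01-05", 3), ("2017-01-04", 2)]

# Fed in t DESC order, the aggregate returns r of the most recent row.
by_t_desc = sorted(rows, key=lambda row: row[0], reverse=True)
latest_r = first_not_null(r for _, r in by_t_desc)

# Fed in any other order, it returns whatever happens to come first.
arbitrary_r = first_not_null(r for _, r in rows)
```

Here latest_r is the r of the 2017-01-05 row, while arbitrary_r is the r of the 2017-01-03 row. query_c is only correct as long as the plan happens to feed rows in t DESC order within each group.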
Below are the EXPLAIN (ANALYZE, VERBOSE) outputs for all 3 queries on my local machine:
query_a:
CTE Scan on query_a (cost=25810.77..26071.25 rows=13024 width=44) (actual time=3329.921..3329.934 rows=30 loops=1)
Output: query_a.pr_id, query_a.r, query_a.quantity
CTE query_a
-> Sort (cost=25778.21..25810.77 rows=13024 width=23) (actual time=3329.918..3329.921 rows=30 loops=1)
Output: events.pr_id, events.r, (count(1))
Sort Key: (count(1)), events.r, events.pr_id DESC
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=24757.86..24888.10 rows=13024 width=23) (actual time=3329.849..3329.892 rows=30 loops=1)
Output: events.pr_id, events.r, count(1)
Group Key: events.pr_id, events.r
-> Unique (cost=21350.90..22478.71 rows=130237 width=40) (actual time=3168.656..3257.299 rows=116547 loops=1)
Output: events.pr_id, events.pa_id, events.r, events.t
-> Sort (cost=21350.90..21726.83 rows=150375 width=40) (actual time=3168.655..3209.095 rows=153795 loops=1)
Output: events.pr_id, events.pa_id, events.r, events.t
Sort Key: events.pr_id, events.pa_id, events.t DESC
Sort Method: quicksort Memory: 18160kB
-> Index Only Scan using events_idx on public.events (cost=0.56..8420.00 rows=150375 width=40) (actual time=0.038..101.584 rows=153795 loops=1)
Output: events.pr_id, events.pa_id, events.r, events.t
Index Cond: ((events.c_id = 5) AND (events.t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (events.t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Planning time: 0.316 ms
Execution time: 3331.082 ms
query_b:
CTE Scan on query_b (cost=67140.75..67409.53 rows=13439 width=44) (actual time=3761.077..3761.090 rows=30 loops=1)
Output: query_b.pr_id, query_b.r, query_b.quantity
CTE query_b
-> Sort (cost=67107.15..67140.75 rows=13439 width=23) (actual time=3761.074..3761.081 rows=30 loops=1)
Output: events.pr_id, (first_not_null(events.r ORDER BY events.t DESC)), (count(1))
Sort Key: (count(1)), (first_not_null(events.r ORDER BY events.t DESC)), events.pr_id DESC
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=66051.24..66185.63 rows=13439 width=23) (actual time=3760.997..3761.049 rows=30 loops=1)
Output: events.pr_id, (first_not_null(events.r ORDER BY events.t DESC)), count(1)
Group Key: events.pr_id, first_not_null(events.r ORDER BY events.t DESC)
-> GroupAggregate (cost=22188.98..63699.49 rows=134386 width=32) (actual time=2961.471..3671.669 rows=116547 loops=1)
Output: events.pr_id, events.pa_id, first_not_null(events.r ORDER BY events.t DESC)
Group Key: events.pr_id, events.pa_id
-> Sort (cost=22188.98..22578.94 rows=155987 width=40) (actual time=2961.436..3012.440 rows=153795 loops=1)
Output: events.pr_id, events.pa_id, events.r, events.t
Sort Key: events.pr_id, events.pa_id
Sort Method: quicksort Memory: 18160kB
-> Index Only Scan using events_idx on public.events (cost=0.56..8734.27 rows=155987 width=40) (actual time=0.038..97.336 rows=153795 loops=1)
Output: events.pr_id, events.pa_id, events.r, events.t
Index Cond: ((events.c_id = 5) AND (events.t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (events.t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Planning time: 0.385 ms
Execution time: 3761.852 ms
query_c:
CTE Scan on query_c (cost=51400.06..51660.54 rows=13024 width=44) (actual time=524.382..524.395 rows=30 loops=1)
Output: query_c.pr_id, query_c.r, query_c.quantity
CTE query_c
-> Sort (cost=51367.50..51400.06 rows=13024 width=23) (actual time=524.380..524.384 rows=30 loops=1)
Output: events.pr_id, (first_not_null(events.r)), (count(1))
Sort Key: (count(1)), (first_not_null(events.r)), events.pr_id DESC
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=50347.14..50477.38 rows=13024 width=23) (actual time=524.311..524.349 rows=30 loops=1)
Output: events.pr_id, (first_not_null(events.r)), count(1)
Group Key: events.pr_id, first_not_null(events.r)
-> HashAggregate (cost=46765.62..48067.99 rows=130237 width=32) (actual time=401.480..459.962 rows=116547 loops=1)
Output: events.pr_id, events.pa_id, first_not_null(events.r)
Group Key: events.pr_id, events.pa_id
-> Index Only Scan using events_idx on public.events (cost=0.56..8420.00 rows=150375 width=32) (actual time=0.027..109.459 rows=153795 loops=1)
Output: events.c_id, events.t, events.pr_id, events.pa_id, events.r
Index Cond: ((events.c_id = 5) AND (events.t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (events.t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Planning time: 0.296 ms
Execution time: 525.566 ms
Broadly speaking, I believe that the index above should allow query_a and query_b to be executed without the Sort nodes that slow them down, but so far I've failed to convince the Postgres query optimizer to do my bidding.
I'm also somewhat confused about the t column not being included in the Sort key for query_b, considering that quicksort is not stable. It seems like this could yield the wrong results.
I've verified that all 3 queries generate the same results running the following queries and verifying they produce an empty result set:
SELECT * FROM query_a
EXCEPT
SELECT * FROM query_b;
and
SELECT * FROM query_a
EXCEPT
SELECT * FROM query_c;
I'd consider query_a to be the canonical query when in doubt.
I greatly appreciate any input on this. I've actually found a terribly hacky workaround to achieve acceptable performance in my application, but this problem continues to haunt me in my sleep (and in fact on vacation, which I'm currently on) ... 😬.
FWIW, I've looked at many similar questions and answers which have guided my current thinking, but I believe there is something unique about the two column grouping (pr_id, pa_id) and having to sort by a 3rd column (t) that doesn't make this a duplicate question.
Edit: The outer queries in the example may be entirely irrelevant to the question, so feel free to ignore them if it helps.
You wrote: "I'd consider query_a to be the canonical query when in doubt."
I found a way to make query_a run in about half a second.
Your inner query from query_a
SELECT DISTINCT ON (pr_id, pa_id)
needs to go with
ORDER BY pr_id, pa_id, t DESC
especially with pr_id and pa_id listed first.
c_id = 5 is a constant, but you cannot use your index events_idx (c_id, t DESC, pr_id, pa_id, r) to avoid the sort, because its columns are not organized as (pr_id, pa_id, t DESC), which your ORDER BY clause demands.
If you had an index on at least (pr_id, pa_id, t DESC) then the sorting does not have to happen, because the ORDER BY condition aligns with this index.
So here is what I did.
CREATE INDEX events_idx2 ON events (c_id, pr_id, pa_id, t DESC, r);
This index can be used by your inner query - at least in theory.
Unfortunately the query planner thinks that it's better to reduce the number of rows by using index events_idx with c_id and x <= t < y.
Postgres does not have index hints, so we need a different way to convince the query planner to take the new index events_idx2.
One way to force the use of events_idx2 is to make the other index more expensive.
This can be done by removing the last column r from events_idx, which makes it unusable for query_a (at least unusable without loading the pages from the heap).
It is counter-intuitive to move the t column later in the index layout, because usually the first columns will be chosen for = and ranges, which c_id and t qualify well for.
However, your ORDER BY (pr_id, pa_id, t DESC) mandates at least this subset as-is in your index. Of course we still put the c_id first to reduce the rows as soon as possible.
You can still have an index on (c_id, t DESC, pr_id, pa_id), if you need, but it cannot be used in query_a.
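To illustrate why the column order matters, here is a small Python sketch (not PostgreSQL; the sample rows are invented) that mimics the order in which each index layout would deliver the matching rows. With (c_id, pr_id, pa_id, t DESC, r), each (pr_id, pa_id) group arrives as one contiguous run with its latest row first, so no sort is needed. With (c_id, t DESC, ...), the groups arrive interleaved.

```python
from itertools import groupby

# Hypothetical rows: (c_id, pr_id, pa_id, t, r); t is an integer for brevity
rows = [
    (5, "p1", "a1", 3, 10),
    (5, "p1", "a1", 5, 11),
    (5, "p1", "a2", 4, 12),
    (5, "p2", "a1", 4, 13),
    (5, "p2", "a1", 2, 14),
    (7, "p1", "a1", 4, 15),
]

def matches(row):
    """WHERE c_id = 5 AND t >= 2 AND t < 6"""
    return row[0] == 5 and 2 <= row[3] < 6

def group_key(row):
    return (row[1], row[2])  # (pr_id, pa_id)

# Row order delivered by an index on (c_id, t DESC, pr_id, pa_id, r)
idx_old = sorted(rows, key=lambda x: (x[0], -x[3], x[1], x[2], x[4]))
# Row order delivered by an index on (c_id, pr_id, pa_id, t DESC, r)
idx_new = sorted(rows, key=lambda x: (x[0], x[1], x[2], -x[3], x[4]))

# Consecutive runs of equal (pr_id, pa_id), as a scan would encounter them
old_groups = [k for k, _ in groupby(filter(matches, idx_old), key=group_key)]
new_groups = [k for k, _ in groupby(filter(matches, idx_new), key=group_key)]

# With the new order, the first row of each run is already the latest one
latest_r = [next(g)[4] for _, g in groupby(filter(matches, idx_new), key=group_key)]
```

In old_groups the same pair shows up in several separate runs, which is why a Sort node is needed. In new_groups every pair appears exactly once. The flip side, as far as I understand it, is that with t no longer directly after c_id the scan has to walk all c_id = 5 entries and check the t range on each of them.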
Here is the query plan for query_a with events_idx2 used and events_idx deleted.
Look for events_c_id_pr_id_pa_id_t_r_idx, which is how PG names indices automatically, when you don't give them a name.
I like it this way, because I can see the order of the columns in the name of the index in every query plan.
Sort (cost=30076.71..30110.75 rows=13618 width=23) (actual time=426.898..426.914 rows=30 loops=1)
Sort Key: (count(1)), events.r, events.pr_id DESC
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=29005.43..29141.61 rows=13618 width=23) (actual time=426.820..426.859 rows=30 loops=1)
Group Key: events.pr_id, events.r
-> Unique (cost=0.56..26622.33 rows=136177 width=40) (actual time=0.037..328.828 rows=117204 loops=1)
-> Index Only Scan using events_c_id_pr_id_pa_id_t_r_idx on events (cost=0.56..25830.50 rows=158366 width=40) (actual time=0.035..178.594 rows=154940 loops=1)
Index Cond: ((c_id = 5) AND (t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Planning time: 0.201 ms
Execution time: 427.017 ms
(11 rows)
Planning is practically instantaneous and execution takes well under a second, because the index matches the ORDER BY of the inner query.
With good performance on query_a there is no need for an additional function to make alternative queries query_b and query_c faster.
Remarks:
Somehow I could not find a primary key in your relation.
The aforementioned proposed solution works without any primary key assumption.
I still think that you have some primary key, but maybe forgot to mention it.
The natural key is pa_id. Each pa_id refers to "a thing" that has ~1...30 events recorded about it.
If pa_id is in relation to multiple c_id's, then pa_id alone cannot be key.
If pr_id and r are data, then maybe (c_id, pa_id, t) is unique key?
Also your index events_idx is not created unique, but spans all columns of the relation, so you could have multiple equal rows - do you want to allow that?
If you really need both indices events_idx and the proposed events_idx2, then you will have the data stored 3 times in total (twice in indices, once on the heap).
Since this really is a tricky query optimization, I kindly ask you to at least consider adding a bounty for whoever answers your question, especially since it has been sitting on SO without an answer for quite some time.
EDIT A
I inserted another set of data using your excellently generated setup above, basically doubling the number of rows.
The dates started from '2017-01-10' this time.
All other parameters stayed the same.
Here is a partial index on the time attribute and its query behaviour.
CREATE INDEX events_timerange ON events (c_id, pr_id, pa_id, t DESC, r) WHERE '2017-01-03' <= t AND t < '2017-01-06';
Sort (cost=12510.07..12546.55 rows=14591 width=23) (actual time=361.579..361.595 rows=30 loops=1)
Sort Key: (count(1)), events.r, events.pr_id DESC
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=11354.99..11500.90 rows=14591 width=23) (actual time=361.503..361.543 rows=30 loops=1)
Group Key: events.pr_id, events.r
-> Unique (cost=0.55..8801.60 rows=145908 width=40) (actual time=0.026..265.084 rows=118571 loops=1)
-> Index Only Scan using events_timerange on events (cost=0.55..8014.70 rows=157380 width=40) (actual time=0.024..115.265 rows=155800 loops=1)
Index Cond: (c_id = 5)
Heap Fetches: 0
Planning time: 0.214 ms
Execution time: 361.692 ms
(11 rows)
Without the index events_timerange (that is, using the regular full index):
Sort (cost=65431.46..65467.93 rows=14591 width=23) (actual time=472.809..472.824 rows=30 loops=1)
Sort Key: (count(1)), events.r, events.pr_id DESC
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=64276.38..64422.29 rows=14591 width=23) (actual time=472.732..472.776 rows=30 loops=1)
Group Key: events.pr_id, events.r
-> Unique (cost=0.56..61722.99 rows=145908 width=40) (actual time=0.024..374.392 rows=118571 loops=1)
-> Index Only Scan using events_c_id_pr_id_pa_id_t_r_idx on events (cost=0.56..60936.08 rows=157380 width=40) (actual time=0.021..222.987 rows=155800 loops=1)
Index Cond: ((c_id = 5) AND (t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Planning time: 0.171 ms
Execution time: 472.925 ms
(11 rows)
With the partial index the runtime is about 100 ms lower, even though the whole table is now twice as big.
(Note: on a second run it was only 50 ms faster. The advantage should grow as more events are recorded, though, because the queries that rely on the full index will become slower, as you suspect (and I agree).)
Also, on my machine, the full index is 810 MB for two inserts (create table + additional from 2017-01-10).
The partial index WHERE 2017-01-03 <= t < 2017-01-06 is only 91 MB.
Maybe you can create partial indices on a monthly or yearly basis?
Depending on what time range is queried, maybe only recent data needs to be indexed, or otherwise only old data partially?
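As a rough sketch of the monthly idea, the following Python snippet generates one CREATE INDEX statement per calendar month. The function name and the events_YYYY_MM naming pattern are made up for illustration, and the column list is simply copied from the partial index above.

```python
from datetime import date

def monthly_partial_indexes(first, last):
    """Yield one CREATE INDEX statement per month, from first's month to last's month."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        start = date(year, month, 1)
        # Move to the first day of the following month (exclusive upper bound)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
        end = date(year, month, 1)
        yield (
            f"CREATE INDEX events_{start:%Y_%m} ON events "
            f"(c_id, pr_id, pa_id, t DESC, r) "
            f"WHERE t >= '{start}' AND t < '{end}';"
        )

statements = list(monthly_partial_indexes(date(2016, 12, 1), date(2017, 1, 31)))
for statement in statements:
    print(statement)
```

Keep in mind that Postgres only considers a partial index when the query's WHERE clause implies the index predicate, so a query range that straddles two months would not be covered by any single monthly index.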
I also tried partial indexing with WHERE c_id = 5, so partitioning by c_id.
Sort (cost=51324.27..51361.47 rows=14880 width=23) (actual time=550.579..550.592 rows=30 loops=1)
Sort Key: (count(1)), events.r, events.pr_id DESC
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=50144.21..50293.01 rows=14880 width=23) (actual time=550.481..550.528 rows=30 loops=1)
Group Key: events.pr_id, events.r
-> Unique (cost=0.42..47540.21 rows=148800 width=40) (actual time=0.050..443.393 rows=118571 loops=1)
-> Index Only Scan using events_cid on events (cost=0.42..46736.42 rows=160758 width=40) (actual time=0.047..269.676 rows=155800 loops=1)
Index Cond: ((t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Planning time: 0.366 ms
Execution time: 550.706 ms
(11 rows)
So partial indexing may also be a viable option.
If you get ever more data, then you may also consider partitioning, for example moving all rows aged two years and older into a separate table.
I don't think BRIN (Block Range Index) indexes would help here, though.
If your machine is more beefy than mine, then you can just insert 10 times the amount of data and first check how the regular full index behaves on a growing table.
[EDITED]
OK, since this depends on your data distribution, here is another way to do it.
First add the following index:
CREATE INDEX events_idx2 ON events (c_id, t DESC, pr_id, pa_id, r);
The query below extracts MAX(t) per group as quickly as it can, on the assumption that the resulting subset is much smaller than the parent table it is joined back to. It will probably be slower if that subset is not small.
SELECT
e.pr_id,
e.r,
count(1) AS quantity
FROM events e
JOIN (
SELECT
pr_id,
pa_id,
MAX(t) last_t
FROM events e
WHERE
c_id = 5
AND t >= '2017-01-03Z00:00:00'
AND t < '2017-01-06Z00:00:00'
GROUP BY
pr_id,
pa_id
) latest
ON (
c_id = 5
AND latest.pr_id = e.pr_id
AND latest.pa_id = e.pa_id
AND latest.last_t = e.t
)
GROUP BY
e.pr_id,
e.r
ORDER BY 3, 2, 1 DESC
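One thing to watch with the MAX(t)/JOIN approach: if two rows in the same (pr_id, pa_id) group share the latest t, the join returns both of them, whereas DISTINCT ON keeps exactly one. Here is a small sketch using Python's sqlite3 module as a stand-in for PostgreSQL, with made-up rows and timestamps stored as text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (c_id INT, pa_id TEXT, pr_id TEXT, r INT, t TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [
        (5, "a1", "p1", 1, "2017-01-03 10:00:00"),
        (5, "a1", "p1", 2, "2017-01-04 10:00:00"),  # latest row for (p1, a1)
        (5, "a2", "p1", 3, "2017-01-05 09:00:00"),  # tie for (p1, a2) ...
        (5, "a2", "p1", 2, "2017-01-05 09:00:00"),  # ... same t, different r
    ],
)

query = """
SELECT e.pr_id, e.r, count(1) AS quantity
FROM events e
JOIN (
    SELECT pr_id, pa_id, MAX(t) AS last_t
    FROM events
    WHERE c_id = 5
      AND t >= '2017-01-03 00:00:00'
      AND t <  '2017-01-06 00:00:00'
    GROUP BY pr_id, pa_id
) latest
  ON e.c_id = 5
 AND latest.pr_id = e.pr_id
 AND latest.pa_id = e.pa_id
 AND latest.last_t = e.t
GROUP BY e.pr_id, e.r
ORDER BY 3, 2, 1 DESC
"""
result = conn.execute(query).fetchall()
total_counted = sum(quantity for _, _, quantity in result)
```

There are only two (pr_id, pa_id) groups here, but total_counted comes out as 3 because the tied rows are both counted. Whether that matters depends on whether your data can contain identical timestamps within a group.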
Full Fiddle
PostgreSQL 9.3 Schema Setup:
--PostgreSQL 9.6
--'\\' is a delimiter
-- CREATE TABLE events AS...
VACUUM ANALYZE events;
CREATE INDEX idx_events_idx ON events (c_id, t DESC, pr_id, pa_id, r);
Query 1:
-- query A
explain analyze SELECT
pr_id,
r,
count(1) AS quantity
FROM (
SELECT DISTINCT ON (pr_id, pa_id)
pr_id,
pa_id,
r
FROM events
WHERE
c_id = 5 AND
t >= '2017-01-03Z00:00:00' AND
t < '2017-01-06Z00:00:00'
ORDER BY pr_id, pa_id, t DESC
) latest
GROUP BY
1,
2
ORDER BY 3, 2, 1 DESC
Results:
QUERY PLAN
Sort (cost=2170.24..2170.74 rows=200 width=15) (actual time=358.239..358.245 rows=30 loops=1)
Sort Key: (count(1)), events.r, events.pr_id
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=2160.60..2162.60 rows=200 width=15) (actual time=358.181..358.189 rows=30 loops=1)
-> Unique (cost=2012.69..2132.61 rows=1599 width=40) (actual time=327.345..353.750 rows=12098 loops=1)
-> Sort (cost=2012.69..2052.66 rows=15990 width=40) (actual time=327.344..348.686 rows=15966 loops=1)
Sort Key: events.pr_id, events.pa_id, events.t
Sort Method: external merge Disk: 792kB
-> Index Only Scan using idx_events_idx on events (cost=0.42..896.20 rows=15990 width=40) (actual time=0.059..5.475 rows=15966 loops=1)
Index Cond: ((c_id = 5) AND (t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Total runtime: 358.610 ms
Query 2:
-- query max/JOIN
explain analyze SELECT
e.pr_id,
e.r,
count(1) AS quantity
FROM events e
JOIN (
SELECT
pr_id,
pa_id,
MAX(t) last_t
FROM events e
WHERE
c_id = 5
AND t >= '2017-01-03Z00:00:00'
AND t < '2017-01-06Z00:00:00'
GROUP BY
pr_id,
pa_id
) latest
ON (
c_id = 5
AND latest.pr_id = e.pr_id
AND latest.pa_id = e.pa_id
AND latest.last_t = e.t
)
GROUP BY
e.pr_id,
e.r
ORDER BY 3, 2, 1 DESC
Results:
QUERY PLAN
Sort (cost=4153.31..4153.32 rows=1 width=15) (actual time=68.398..68.402 rows=30 loops=1)
Sort Key: (count(1)), e.r, e.pr_id
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=4153.29..4153.30 rows=1 width=15) (actual time=68.363..68.371 rows=30 loops=1)
-> Merge Join (cost=1133.62..4153.29 rows=1 width=15) (actual time=35.083..64.154 rows=12098 loops=1)
Merge Cond: ((e.t = (max(e_1.t))) AND (e.pr_id = e_1.pr_id))
Join Filter: (e.pa_id = e_1.pa_id)
-> Index Only Scan Backward using idx_events_idx on events e (cost=0.42..2739.72 rows=53674 width=40) (actual time=0.010..8.073 rows=26661 loops=1)
Index Cond: (c_id = 5)
Heap Fetches: 0
-> Sort (cost=1133.19..1137.19 rows=1599 width=36) (actual time=29.778..32.885 rows=12098 loops=1)
Sort Key: (max(e_1.t)), e_1.pr_id
Sort Method: external sort Disk: 640kB
-> HashAggregate (cost=1016.12..1032.11 rows=1599 width=36) (actual time=12.731..16.738 rows=12098 loops=1)
-> Index Only Scan using idx_events_idx on events e_1 (cost=0.42..896.20 rows=15990 width=36) (actual time=0.029..5.084 rows=15966 loops=1)
Index Cond: ((c_id = 5) AND (t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Total runtime: 68.736 ms
Query 3:
DROP INDEX idx_events_idx;
CREATE INDEX idx_events_flutter ON events (c_id, pr_id, pa_id, t DESC, r);
Query 5:
-- query A + index by flutter
explain analyze SELECT
pr_id,
r,
count(1) AS quantity
FROM (
SELECT DISTINCT ON (pr_id, pa_id)
pr_id,
pa_id,
r
FROM events
WHERE
c_id = 5 AND
t >= '2017-01-03Z00:00:00' AND
t < '2017-01-06Z00:00:00'
ORDER BY pr_id, pa_id, t DESC
) latest
GROUP BY
1,
2
ORDER BY 3, 2, 1 DESC
Results:
QUERY PLAN
Sort (cost=2744.82..2745.32 rows=200 width=15) (actual time=20.915..20.916 rows=30 loops=1)
Sort Key: (count(1)), events.r, events.pr_id
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=2735.18..2737.18 rows=200 width=15) (actual time=20.883..20.892 rows=30 loops=1)
-> Unique (cost=0.42..2707.20 rows=1599 width=40) (actual time=0.037..16.488 rows=12098 loops=1)
-> Index Only Scan using idx_events_flutter on events (cost=0.42..2627.25 rows=15990 width=40) (actual time=0.036..10.893 rows=15966 loops=1)
Index Cond: ((c_id = 5) AND (t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Total runtime: 20.964 ms
Just two different methods (YMMV):
-- using a window function to find the record with the most recent t:
EXPLAIN ANALYZE
SELECT pr_id, r, count(1) AS quantity
FROM (
SELECT DISTINCT ON (pr_id, pa_id)
pr_id, pa_id,
first_value(r) OVER www AS r
-- last_value(r) OVER www AS r
FROM events
WHERE c_id = 5
AND t >= '2017-01-03Z00:00:00'
AND t < '2017-01-06Z00:00:00'
WINDOW www AS (PARTITION BY pr_id, pa_id ORDER BY t DESC)
ORDER BY 1, 2, t DESC
) sss
GROUP BY 1, 2
ORDER BY 3, 2, 1 DESC
;
-- Avoiding the window function; find the MAX via NOT EXISTS():
EXPLAIN ANALYZE
SELECT pr_id, r, count(1) AS quantity
FROM (
SELECT DISTINCT ON (pr_id, pa_id)
pr_id, pa_id, r
FROM events e
WHERE c_id = 5
AND t >= '2017-01-03Z00:00:00'
AND t < '2017-01-06Z00:00:00'
AND NOT EXISTS ( SELECT * FROM events nx
WHERE nx.c_id = 5 AND nx.pr_id =e.pr_id AND nx.pa_id =e.pa_id
AND nx.t >= '2017-01-03Z00:00:00'
AND nx.t < '2017-01-06Z00:00:00'
AND nx.t > e.t
)
) sss
GROUP BY 1, 2
ORDER BY 3, 2, 1 DESC
;
Note: the DISTINCT ON can be omitted from the second query, the results are already unique.
I'd try to use a standard ROW_NUMBER() function with a matching index instead of Postgres-specific DISTINCT ON to find the "latest" rows.
Index
CREATE INDEX ix_events ON events USING btree (c_id, pa_id, pr_id, t DESC, r);
Query
WITH
CTE_RN
AS
(
SELECT
pa_id
,pr_id
,r
,ROW_NUMBER() OVER (PARTITION BY c_id, pa_id, pr_id ORDER BY t DESC) AS rn
FROM events
WHERE
c_id = 5
AND t >= '2017-01-03Z00:00:00'
AND t < '2017-01-06Z00:00:00'
)
SELECT
pr_id
,r
,COUNT(*) AS quantity
FROM CTE_RN
WHERE rn = 1
GROUP BY
pr_id
,r
ORDER BY quantity, r, pr_id DESC
;
I don't have Postgres at hand, so I'm using http://rextester.com for testing. I set the scale_factor to 30 in the data generation script, otherwise it takes too long for rextester. I'm getting the following query plan. The time component should be ignored, but you can see that there are no intermediate sorts, only the sort for the final ORDER BY. See http://rextester.com/GUFXY36037
Please try this query on your hardware and your data. It would be interesting to see how it compares to your query. I noticed that the optimizer doesn't choose this index if the table also has the index that you defined. If you see the same on your server, please try dropping or disabling the other indexes to get the plan that I got.
Sort (cost=158.07..158.08 rows=1 width=44) (actual time=81.445..81.448 rows=30 loops=1)
Output: cte_rn.pr_id, cte_rn.r, (count(*))
Sort Key: (count(*)), cte_rn.r, cte_rn.pr_id DESC
Sort Method: quicksort Memory: 27kB
CTE cte_rn
-> WindowAgg (cost=0.42..157.78 rows=12 width=88) (actual time=0.204..56.215 rows=15130 loops=1)
Output: events.pa_id, events.pr_id, events.r, row_number() OVER (?), events.t, events.c_id
-> Index Only Scan using ix_events3 on public.events (cost=0.42..157.51 rows=12 width=80) (actual time=0.184..28.688 rows=15130 loops=1)
Output: events.c_id, events.pa_id, events.pr_id, events.t, events.r
Index Cond: ((events.c_id = 5) AND (events.t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (events.t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 15130
-> HashAggregate (cost=0.28..0.29 rows=1 width=44) (actual time=81.363..81.402 rows=30 loops=1)
Output: cte_rn.pr_id, cte_rn.r, count(*)
Group Key: cte_rn.pr_id, cte_rn.r
-> CTE Scan on cte_rn (cost=0.00..0.27 rows=1 width=36) (actual time=0.214..72.841 rows=11491 loops=1)
Output: cte_rn.pa_id, cte_rn.pr_id, cte_rn.r, cte_rn.rn
Filter: (cte_rn.rn = 1)
Rows Removed by Filter: 3639
Planning time: 0.452 ms
Execution time: 83.234 ms
There is one more optimisation you could do that relies on the external knowledge of your data.
If you can guarantee that each pair of pa_id, pr_id has values for each, say, day, then you can safely reduce the user-defined range of t to just the last day of that range.
This will reduce the number of rows that the engine reads and sorts if the user usually specifies a range of t longer than one day.
If you can't provide this kind of guarantee in your data for all values, but you still know that the latest rows of all pa_id, pr_id pairs are usually close to each other (by t) and the user usually provides a wide range for t, you can run a preliminary query to narrow down the range of t for the main query.
Something like this:
SELECT
MIN(MaxT) AS StartT
,MAX(MaxT) AS EndT
FROM
(
SELECT
pa_id
,pr_id
,MAX(t) AS MaxT
FROM events
WHERE
c_id = 5
AND t >= '2017-01-03Z00:00:00'
AND t < '2017-01-06Z00:00:00'
GROUP BY
pa_id
,pr_id
) AS T
And then use the found StartT, EndT in the main query, hoping that the new range is much narrower than the one originally defined by the user.
The query above doesn't have to sort rows, so it should be fast. The main query has to sort rows, but there will be fewer rows to sort, so the overall run time may be better.
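The reasoning behind the two-step approach can be sketched in plain Python (invented rows, integer timestamps for brevity): every group's latest row has t equal to that group's MAX(t), so it necessarily falls between the smallest and the largest of those maxima.

```python
# Hypothetical rows, already filtered to c_id = 5 and the user's t range:
# (pa_id, pr_id, t)
rows = [
    ("a1", "p1", 1), ("a1", "p1", 8),
    ("a2", "p1", 2), ("a2", "p1", 7),
    ("a3", "p2", 3), ("a3", "p2", 9),
]

# Step 1: MAX(t) per (pa_id, pr_id), as in the preliminary query
max_t = {}
for pa_id, pr_id, t in rows:
    key = (pa_id, pr_id)
    max_t[key] = max(t, max_t.get(key, t))

# Step 2: the narrowed range; note that BOTH ends are inclusive
start_t, end_t = min(max_t.values()), max(max_t.values())

# Step 3: the main query only has to look at rows inside the narrowed range
narrowed = [row for row in rows if start_t <= row[2] <= end_t]
latest = {}
for pa_id, pr_id, t in narrowed:
    key = (pa_id, pr_id)
    latest[key] = max(t, latest.get(key, t))
```

Only half of the sample rows survive the narrowing, and latest still equals max_t. If you feed StartT and EndT back into the main query, the upper bound has to be t <= EndT rather than t < EndT, otherwise the most recent row of all is lost.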
So I've taken a slightly different tack and tried moving your grouping and distinct data into their own tables, so that we can leverage indexes on multiple tables. Note that this solution only works if you have control over the way data gets inserted into the database, i.e. you can change the application that is the data source. If not, alas, this is moot.
In practice, instead of inserting into the events table immediately, you would first check if the relational date and prpa exist in their relevant tables. If not, create them. Then get their ids and use that for your insert statement to the events table.
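A minimal sketch of that insert flow for the (pr_id, pa_id) lookup table, using Python's sqlite3 module as a stand-in for PostgreSQL. The helper names are hypothetical, and the UNIQUE constraint on (pr_id, pa_id) is an assumption of this sketch that the pattern relies on. In PostgreSQL 9.5 or later, the counterpart of INSERT OR IGNORE would be INSERT ... ON CONFLICT DO NOTHING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: (pr_id, pa_id) is declared UNIQUE so duplicates are rejected
conn.execute(
    "CREATE TABLE prpa (id INTEGER PRIMARY KEY, pr_id TEXT NOT NULL, "
    "pa_id TEXT NOT NULL, UNIQUE (pr_id, pa_id))"
)
conn.execute("CREATE TABLE events (c_id INT, r INT, t TEXT, prpa_id INT)")

def get_or_create_prpa(pr_id, pa_id):
    """Return the id of the (pr_id, pa_id) pair, inserting it first if missing."""
    conn.execute(
        "INSERT OR IGNORE INTO prpa (pr_id, pa_id) VALUES (?, ?)", (pr_id, pa_id)
    )
    row = conn.execute(
        "SELECT id FROM prpa WHERE pr_id = ? AND pa_id = ?", (pr_id, pa_id)
    ).fetchone()
    return row[0]

def insert_event(c_id, pr_id, pa_id, r, t):
    """Resolve the lookup id first, then insert the event that references it."""
    prpa_id = get_or_create_prpa(pr_id, pa_id)
    conn.execute("INSERT INTO events VALUES (?, ?, ?, ?)", (c_id, r, t, prpa_id))

insert_event(5, "abcdefgh-1", "ABC0000000000001", 2, "2017-01-03 10:00:00")
insert_event(5, "abcdefgh-1", "ABC0000000000001", 3, "2017-01-04 10:00:00")

pair_count = conn.execute("SELECT count(*) FROM prpa").fetchone()[0]
event_count = conn.execute("SELECT count(*) FROM events").fetchone()[0]
distinct_ids = [row[0] for row in conn.execute("SELECT DISTINCT prpa_id FROM events")]
```

Both events end up pointing at the same prpa row. The dates table would follow the same pattern.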
Before I start: on my machine query_c was about 10x faster than query_a, and my final result for the rewritten query_a is about 4x faster than the original query_a. If that's not good enough, feel free to switch off.
Given the initial data seeding queries you gave in the first instance, I calculated the following benchmarks:
query_a: 5228.518 ms
query_b: 5708.962 ms
query_c: 538.329 ms
So, about a 10x increase in performance, give or take.
I'm going to alter the data that's generated in events, and this alteration takes quite a while. You would not need to do this in practice, as your INSERTs to the tables would be covered already.
For my optimisation, the first step is to create a table that houses dates and then transfer the data over, and relate back to it in the events table, like so:
CREATE TABLE dates (
id SERIAL,
year_part INTEGER NOT NULL,
month_part INTEGER NOT NULL,
day_part INTEGER NOT NULL
);
-- Total runtime: 8.281 ms
INSERT INTO dates(year_part, month_part, day_part) SELECT DISTINCT
EXTRACT(YEAR FROM t), EXTRACT(MONTH FROM t), EXTRACT(DAY FROM t)
FROM events;
-- Total runtime: 12802.900 ms
CREATE INDEX dates_ymd ON dates USING btree(year_part, month_part, day_part);
-- Total runtime: 13.750 ms
ALTER TABLE events ADD COLUMN date_id INTEGER;
-- Total runtime: 2.468ms
UPDATE events SET date_id = dates.id
FROM dates
WHERE EXTRACT(YEAR FROM t) = dates.year_part
AND EXTRACT(MONTH FROM t) = dates.month_part
AND EXTRACT(DAY FROM T) = dates.day_part
;
-- Total runtime: 388024.520 ms
Next, we do the same, but with the key pair (pr_id, pa_id), which doesn't reduce the cardinality too much, but when we're talking large sets it can help with memory usage and swapping in and out:
CREATE TABLE prpa (
id SERIAL,
pr_id TEXT NOT NULL,
pa_id TEXT NOT NULL
);
-- Total runtime: 5.451 ms
CREATE INDEX events_prpa ON events USING btree(pr_id, pa_id);
-- Total runtime: 218908.894 ms
INSERT INTO prpa(pr_id, pa_id) SELECT DISTINCT pr_id, pa_id FROM events;
-- Total runtime: 5566.760 ms
CREATE INDEX prpa_idx ON prpa USING btree(pr_id, pa_id);
-- Total runtime: 84185.057 ms
ALTER TABLE events ADD COLUMN prpa_id INTEGER;
-- Total runtime: 2.067 ms
UPDATE events SET prpa_id = prpa.id
FROM prpa
WHERE events.pr_id = prpa.pr_id
AND events.pa_id = prpa.pa_id;
-- Total runtime: 757915.192
DROP INDEX events_prpa;
-- Total runtime: 1041.556 ms
Finally, let's get rid of the old indexes and the now defunct columns, and then vacuum up the new tables:
DROP INDEX events_idx;
-- Total runtime: 1139.508 ms
ALTER TABLE events
DROP COLUMN pr_id,
DROP COLUMN pa_id
;
-- Total runtime: 5.376 ms
VACUUM ANALYSE prpa;
-- Total runtime: 1030.142
VACUUM ANALYSE dates;
-- Total runtime: 6652.151
So we now have the following tables:
events (c_id, r, t, prpa_id, date_id)
dates (id, year_part, month_part, day_part)
prpa (id, pr_id, pa_id)
Let's toss on an index now, pushing t DESC to the end where it belongs, which we can do now because we're filtering results on dates before ORDERing, which cuts down the need for t DESC to be so prominent in the index:
CREATE INDEX events_idx_new ON events USING btree (c_id, date_id, prpa_id, t DESC);
-- Total runtime: 27697.795
VACUUM ANALYSE events;
Now we rewrite the query, (using a table to store intermediary results - I find this works well with large datasets) and awaaaaaay we go!
DROP TABLE IF EXISTS temp_results;
SELECT DISTINCT ON (prpa_id)
prpa_id,
r
INTO TEMPORARY temp_results
FROM events
INNER JOIN dates
ON dates.id = events.date_id
WHERE c_id = 5
AND dates.year_part BETWEEN 2017 AND 2017
AND dates.month_part BETWEEN 1 AND 1
AND dates.day_part BETWEEN 3 AND 5
ORDER BY prpa_id, t DESC;
SELECT
prpa.pr_id,
r,
count(1) AS quantity
FROM temp_results
INNER JOIN prpa ON prpa.id = temp_results.prpa_id
GROUP BY
1,
2
ORDER BY 3, 2, 1 DESC;
-- Total runtime: 1233.281 ms
So, not a 10x increase in performance, but 4x which is still alright.
This solution is a combination of a couple of techniques I have found work well with large datasets and with date ranges. Even if it's not good enough for your purposes, there might be some gems in here you can repurpose throughout your career.
EDIT:
EXPLAIN ANALYSE on SELECT INTO query:
Unique (cost=171839.95..172360.53 rows=51332 width=16) (actual time=819.385..857.777 rows=117471 loops=1)
-> Sort (cost=171839.95..172100.24 rows=104117 width=16) (actual time=819.382..836.924 rows=155202 loops=1)
Sort Key: events.prpa_id, events.t
Sort Method: external sort Disk: 3944kB
-> Hash Join (cost=14340.24..163162.92 rows=104117 width=16) (actual time=126.929..673.293 rows=155202 loops=1)
Hash Cond: (events.date_id = dates.id)
-> Bitmap Heap Scan on events (cost=14338.97..160168.28 rows=520585 width=20) (actual time=126.572..575.852 rows=516503 loops=1)
Recheck Cond: (c_id = 5)
Heap Blocks: exact=29610
-> Bitmap Index Scan on events_idx2 (cost=0.00..14208.82 rows=520585 width=0) (actual time=118.769..118.769 rows=516503 loops=1)
Index Cond: (c_id = 5)
-> Hash (cost=1.25..1.25 rows=2 width=4) (actual time=0.326..0.326 rows=3 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Seq Scan on dates (cost=0.00..1.25 rows=2 width=4) (actual time=0.320..0.323 rows=3 loops=1)
Filter: ((year_part >= 2017) AND (year_part <= 2017) AND (month_part >= 1) AND (month_part <= 1) AND (day_part >= 3) AND (day_part <= 5))
Rows Removed by Filter: 7
Planning time: 3.091 ms
Execution time: 913.543 ms
EXPLAIN ANALYSE on SELECT query:
(Note: I had to alter the first query to select into an actual table, not a temporary table, in order to get the query plan for this one. AFAIK EXPLAIN ANALYSE only works on single queries.)
Sort (cost=89590.66..89595.66 rows=2000 width=15) (actual time=1248.535..1248.537 rows=30 loops=1)
Sort Key: (count(1)), temp_results.r, prpa.pr_id
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=89461.00..89481.00 rows=2000 width=15) (actual time=1248.460..1248.468 rows=30 loops=1)
Group Key: prpa.pr_id, temp_results.r
-> Hash Join (cost=73821.20..88626.40 rows=111280 width=15) (actual time=798.861..1213.494 rows=117471 loops=1)
Hash Cond: (temp_results.prpa_id = prpa.id)
-> Seq Scan on temp_results (cost=0.00..1632.80 rows=111280 width=8) (actual time=0.024..17.401 rows=117471 loops=1)
-> Hash (cost=36958.31..36958.31 rows=2120631 width=15) (actual time=798.484..798.484 rows=2120631 loops=1)
Buckets: 16384 Batches: 32 Memory Usage: 3129kB
-> Seq Scan on prpa (cost=0.00..36958.31 rows=2120631 width=15) (actual time=0.126..350.664 rows=2120631 loops=1)
Planning time: 1.073 ms
Execution time: 1248.660 ms