Query on timestamptz field with index is 3 times slower than without - sql

I have an integration_event table with ~300k rows in Postgres.
Indexes added for:
pk,
committed_date_time (timestamptz) - btree,
response_body(text) + response_business_description(varchar 5000) + response_business_code(varchar100) + response_headers(text) - gin (gin_trgm_ops).
This EF Core-generated query completes in ~10 seconds:
SELECT i.committed_date_time AS "CommittedDateTime", i.external_object_id AS "ExternalObjectId", i.external_system AS "ExternalSystem", i.id AS "Id", i.internal_object_id AS "InternalObjectId", i.is_successful_request AS "IsSuccessfulRequest", i.object_type AS "ObjectType", i.request_direction AS "RequestDirection", i.request_http_method AS "RequestHttpMethod", i.request_url AS "RequestUrl", i.response_business_code AS "ResponseBusinessCode", i.response_business_description AS "ResponseBusinessDescription", i.response_http_code AS "ResponseHttpCode", i.task_id AS "TaskId"
FROM events.integration_event AS i
WHERE (i.committed_date_time >= '2021-12-14 03:00:00.000 +0300') AND (((((('00294961' = '') OR
(strpos(lower(i.response_body), '00294961') > 0)) OR (('00294961' = '') OR
(strpos(lower(i.response_business_code), '00294961') > 0))) OR (('00294961' = '') OR
(strpos(lower(i.response_business_description), '00294961') > 0))) OR (('00294961' = '') OR
(strpos(lower(i.response_headers), '00294961') > 0)))
ORDER BY i.committed_date_time DESC
LIMIT 10 OFFSET 0
In the execution plan I see only the committed_date_time index being used.
If I delete this index, no index is used at all and the query completes in ~3 seconds.
If I delete the multicolumn index, nothing changes.
EXPLAIN output without any indexes:
QUERY PLAN
Limit (cost=64058.96..64060.13 rows=10 width=276) (actual time=2810.118..2815.614 rows=0 loops=1)
Output: committed_date_time, external_object_id, external_system, id, internal_object_id, is_successful_request, object_type, request_direction, request_http_method, request_url, response_business_code, response_business_description, response_http_code, task_id
Buffers: shared hit=113016 read=92539 dirtied=239 written=68
-> Gather Merge (cost=64058.96..79140.58 rows=129262 width=276) (actual time=2810.117..2815.612 rows=0 loops=1)
Output: committed_date_time, external_object_id, external_system, id, internal_object_id, is_successful_request, object_type, request_direction, request_http_method, request_url, response_business_code, response_business_description, response_http_code, task_id
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=113016 read=92539 dirtied=239 written=68
-> Sort (cost=63058.94..63220.52 rows=64631 width=276) (actual time=2794.379..2794.380 rows=0 loops=3)
Output: committed_date_time, external_object_id, external_system, id, internal_object_id, is_successful_request, object_type, request_direction, request_http_method, request_url, response_business_code, response_business_description, response_http_code, task_id
Sort Key: i.committed_date_time DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=113016 read=92539 dirtied=239 written=68
Worker 0: actual time=2786.424..2786.425 rows=0 loops=1
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=36144 read=30162 dirtied=52 written=13
Worker 1: actual time=2786.908..2786.909 rows=0 loops=1
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=36415 read=29682 dirtied=77 written=19
-> Parallel Seq Scan on events.integration_event i (cost=0.00..61662.29 rows=64631 width=276) (actual time=2794.338..2794.338 rows=0 loops=3)
Output: committed_date_time, external_object_id, external_system, id, internal_object_id, is_successful_request, object_type, request_direction, request_http_method, request_url, response_business_code, response_business_description, response_http_code, task_id
Filter: ((i.committed_date_time >= '2021-12-14 03:00:00+03'::timestamp with time zone) AND ((strpos(lower(i.response_body), '00294961'::text) > 0) OR (strpos(lower((i.response_business_code)::text), '00294961'::text) > 0) OR (strpos(lower((i.response_business_description)::text), '00294961'::text) > 0) OR (strpos(lower(i.response_headers), '00294961'::text) > 0)))
Rows Removed by Filter: 64464
Buffers: shared hit=112940 read=92539 dirtied=239 written=68
Worker 0: actual time=2786.353..2786.354 rows=0 loops=1
Buffers: shared hit=36106 read=30162 dirtied=52 written=13
Worker 1: actual time=2786.858..2786.858 rows=0 loops=1
Buffers: shared hit=36377 read=29682 dirtied=77 written=19
Planning Time: 0.070 ms
Execution Time: 2815.637 ms
query plan without indexes
With Indexes
QUERY PLAN
Limit (cost=0.42..15.67 rows=10 width=276) (actual time=7687.437..7687.438 rows=0 loops=1)
Output: committed_date_time, external_object_id, external_system, id, internal_object_id, is_successful_request, object_type, request_direction, request_http_method, request_url, response_business_code, response_business_description, response_http_code, task_id
Buffers: shared hit=168221 read=152857
-> Index Scan Backward using ix_commited_date_time on events.integration_event i (cost=0.42..236678.30 rows=155194 width=276) (actual time=7687.436..7687.436 rows=0 loops=1)
Output: committed_date_time, external_object_id, external_system, id, internal_object_id, is_successful_request, object_type, request_direction, request_http_method, request_url, response_business_code, response_business_description, response_http_code, task_id
Index Cond: (i.committed_date_time >= '2021-12-14 03:00:00+03'::timestamp with time zone)
Filter: ((strpos(lower(i.response_body), '00294961'::text) > 0) OR (strpos(lower((i.response_business_code)::text), '00294961'::text) > 0) OR (strpos(lower((i.response_business_description)::text), '00294961'::text) > 0) OR (strpos(lower(i.response_headers), '00294961'::text) > 0))
Rows Removed by Filter: 193323
Buffers: shared hit=168221 read=152857
Planning:
Buffers: shared hit=1 read=3
Planning Time: 0.142 ms
Execution Time: 7687.460 ms
query plan with indexes
First question: why is it slower with the index?
Second: why is the multicolumn index not used?
Third: is there any way to include the timestamptz column in the multicolumn GIN index?
I have tried other index types for committed_date_time, and a GIN index on response_body alone (the largest text column).
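On the third question: the btree_gin extension lets a GIN index cover scalar types such as timestamptz, so it can be combined with trigram columns in one index. Note also that `strpos(lower(col), 'x') > 0` is not an indexable expression; a trigram index is only considered for operators such as `LIKE`/`ILIKE` or `%`. A sketch, assuming the table and column names from the question (the index name is made up):

```sql
-- Allow GIN to index scalar types like timestamptz.
CREATE EXTENSION IF NOT EXISTS btree_gin;

-- Hypothetical multicolumn GIN index combining the timestamp with a trigram column.
CREATE INDEX ix_integration_event_ts_body
    ON events.integration_event
    USING gin (committed_date_time, response_body gin_trgm_ops);

-- strpos(lower(col), '...') > 0 cannot use a trigram index;
-- an ILIKE pattern expresses the same containment test in an indexable form.
SELECT id
FROM events.integration_event
WHERE committed_date_time >= '2021-12-14 03:00:00+03'
  AND response_body ILIKE '%00294961%';
```

Whether the planner actually chooses the combined index still depends on selectivity, so test with EXPLAIN (ANALYZE, BUFFERS) against your data.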


Can I put "SELECT result with multiple rows and multiple column" as function parameters?

I want to write a common function to convert SELECT/table result as text/jsonb
Convert
SELECT COALESCE(jsonb_agg(tmp)::text, '[]') FROM (
SELECT id, balance FROM student LIMIT 5) AS tmp
to
SELECT my_to_json_string((SELECT id, balance FROM student LIMIT 5));
Expect:
[{"id": 21543, "balance": 80}, {"id": 21542, "balance": 100}, {"id": 21541, "balance": 5980}, {"id": 21540, "balance": 10}, {"id": 21539, "balance": 15}]
Can I? If not, what should I do to avoid repeating COALESCE(jsonb_agg(tmp)::text, '[]') FROM (...) AS tmp so often?
CREATE OR REPLACE FUNCTION build_details( mysqlquery text )
RETURNS jsonb AS $$
DECLARE
  c jsonb;
BEGIN
  EXECUTE format('SELECT COALESCE(jsonb_agg(tmp)::text, ''[]'') FROM (%s) as tmp', mysqlquery) INTO c;
  RETURN c;
END;
$$ LANGUAGE plpgsql;
To call this, pass the SQL as a string:
select * from build_details('SELECT * FROM foo LIMIT 5');
build_details
------------------------------------------
[{"i": 1, "s": "f"}, {"i": 2, "s": "s"}]
(1 row)
Answering the question in the comments regarding performance: yes, it would be slower.
To measure it, first turn on the nested-statement analyzer:
load 'auto_explain';
set auto_explain.log_min_duration=0;
set auto_explain.log_nested_statements=ON;
SET auto_explain.log_analyze = true;
set client_min_messages=DEBUG;
Now let's run it.
explain analyze SELECT COALESCE(jsonb_agg(tmp)::text, '[]') FROM ( SELECT * FROM foo LIMIT 5) as tmp;
LOG: duration: 0.030 ms plan:
Query Text: explain analyze SELECT COALESCE(jsonb_agg(tmp)::text, '[]') FROM ( SELECT * FROM foo LIMIT 5) as tmp;
Aggregate (cost=0.15..0.17 rows=1 width=32) (actual time=0.029..0.029 rows=1 loops=1)
-> Subquery Scan on tmp (cost=0.00..0.14 rows=5 width=60) (actual time=0.008..0.009 rows=2 loops=1)
-> Limit (cost=0.00..0.09 rows=5 width=36) (actual time=0.006..0.006 rows=2 loops=1)
-> Seq Scan on foo (cost=0.00..22.70 rows=1270 width=36) (actual time=0.005..0.005 rows=2 loops=1)
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Aggregate (cost=0.15..0.17 rows=1 width=32) (actual time=0.029..0.029 rows=1 loops=1)
-> Subquery Scan on tmp (cost=0.00..0.14 rows=5 width=60) (actual time=0.008..0.009 rows=2 loops=1)
-> Limit (cost=0.00..0.09 rows=5 width=36) (actual time=0.006..0.006 rows=2 loops=1)
-> Seq Scan on foo (cost=0.00..22.70 rows=1270 width=36) (actual time=0.005..0.005 rows=2 loops=1)
Planning time: 0.042 ms
Execution time: 0.143 ms
(6 rows)
vs
explain analyze select * from build_details('SELECT * FROM foo LIMIT 5');
LOG: duration: 0.043 ms plan:
Query Text: SELECT COALESCE(jsonb_agg(tmp)::text, '[]') FROM (SELECT * FROM foo LIMIT 5) as tmp
Aggregate (cost=0.15..0.17 rows=1 width=32) (actual time=0.041..0.041 rows=1 loops=1)
-> Subquery Scan on tmp (cost=0.00..0.14 rows=5 width=60) (actual time=0.007..0.009 rows=2 loops=1)
-> Limit (cost=0.00..0.09 rows=5 width=36) (actual time=0.006..0.007 rows=2 loops=1)
-> Seq Scan on foo (cost=0.00..22.70 rows=1270 width=36) (actual time=0.005..0.006 rows=2 loops=1)
LOG: duration: 0.297 ms plan:
Query Text: explain analyze select * from build_details('SELECT * FROM foo LIMIT 5');
Function Scan on build_details (cost=0.25..0.26 rows=1 width=32) (actual time=0.296..0.296 rows=1 loops=1)
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Function Scan on build_details (cost=0.25..0.26 rows=1 width=32) (actual time=0.296..0.296 rows=1 loops=1)
Planning time: 0.024 ms
Execution time: 0.385 ms
(3 rows)
My guess is that with enough volume the difference will be negligible; most of the overhead comes from planning. But you really need to test this against your own data and requirements.
Regarding
SELECT my_to_json_string((SELECT id, balance FROM student LIMIT 5));
A scalar sub-select must return a single value, and I don't think there's a trick to make PostgreSQL accept this syntax.
It seems to me you can accomplish what you want by introducing a custom aggregate:
create or replace function jsonify_add(jsonb, anyelement) returns jsonb
language sql
as $$
select $1 || to_jsonb($2);
$$;
create or replace function jsonify_final(jsonb) returns text
language sql
as $$
select coalesce($1::text, '[]');
$$;
CREATE AGGREGATE jsonify (anyelement)
(
sfunc = jsonify_add,
stype = jsonb,
finalfunc = jsonify_final,
initcond = '[]'
);
You may use it like:
select jsonify( emp ) from employee as emp;
Returning:
[{"name": "Barry", "number": 1, "surname": "White"}, {"name": "Wayne", "number": 2, "surname": "Black"}]

Delete query on a very large table running extremely slowly - SQL

I have this SQL query:
delete from scans
where scandatetime>(current_timestamp - interval '21 days') and
scandatetime <> (select min(tt.scandatetime) from scans tt where tt.imb = scans.imb) and
scandatetime <> (select max(tt.scandatetime) from scans tt where tt.imb = scans.imb)
;
That I use to delete records from the following table:
|imb |scandatetime |status |scanfacilityzip|
+-----------+-------------------+---------+---------------+
|isdijh23452|2020-01-01 13:45:12|Intake |12345 |
|isdijh23452|2020-01-01 13:45:12|Intake |12345 |
|isdijh23452|2020-01-01 19:30:32|Received |12345 |
|isdijh23452|2020-01-02 04:50:22|Confirmed|12345 |
|isdijh23452|2020-01-03 19:32:18|Processed|45867 |
|awgjnh09864|2020-01-01 10:24:16|Intake |84676 |
|awgjnh09864|2020-01-01 19:30:32|Received |84676 |
|awgjnh09864|2020-01-01 19:30:32|Received |84676 |
|awgjnh09864|2020-01-02 02:15:52|Processed|84676 |
such that only 2 records remain per IMB, the one with the minimum scandatetime and the maximum scandatetime. I also limit this so it only performs this operation for records that are less than 3 weeks old. The resultant table looks like this:
|imb |scandatetime |status |scanfacilityzip|
+-----------+-------------------+---------+---------------+
|isdijh23452|2020-01-01 13:45:12|Intake |12345 |
|isdijh23452|2020-01-03 19:32:18|Processed|45867 |
|awgjnh09864|2020-01-01 10:24:16|Intake |84676 |
|awgjnh09864|2020-01-02 02:15:52|Processed|84676 |
This table has a few indexes and has tens of millions of rows, so the query usually takes forever to run. How can I speed this up?
Explain output:
Delete on scans (cost=0.57..115934571.45 rows=10015402 width=6)
-> Index Scan using scans_staging_scandatetime_idx on scans (cost=0.57..115934571.45 rows=10015402 width=6)
Index Cond: (scandatetime > (CURRENT_TIMESTAMP - '21 days'::interval))
Filter: ((scandatetime <> (SubPlan 2)) AND (scandatetime <> (SubPlan 4)))
SubPlan 2
-> Result (cost=3.91..3.92 rows=1 width=8)
InitPlan 1 (returns $1)
-> Limit (cost=0.70..3.91 rows=1 width=8)
-> Index Only Scan using scans_staging_imb_scandatetime_idx on scans tt (cost=0.70..16.79 rows=5 width=8)
Index Cond: ((imb = scans.imb) AND (scandatetime IS NOT NULL))
SubPlan 4
-> Result (cost=3.91..3.92 rows=1 width=8)
InitPlan 3 (returns $3)
-> Limit (cost=0.70..3.91 rows=1 width=8)
-> Index Only Scan Backward using scans_staging_imb_scandatetime_idx on scans tt_1 (cost=0.70..16.79 rows=5 width=8)
Index Cond: ((imb = scans.imb) AND (scandatetime IS NOT NULL))
Table DDL:
-- Table Definition ----------------------------------------------
CREATE TABLE scans (
imb text,
scandatetime timestamp without time zone,
status text,
scanfacilityzip text
);
-- Indices -------------------------------------------------------
CREATE INDEX scans_staging_scandatetime_idx ON scans(scandatetime timestamp_ops);
CREATE INDEX scans_staging_imb_idx ON scans(imb text_ops);
CREATE INDEX scans_staging_status_idx ON scans(status text_ops);
CREATE INDEX scans_staging_scandatetime_status_idx ON scans(scandatetime timestamp_ops,status text_ops);
CREATE INDEX scans_staging_imb_scandatetime_idx ON scans(imb text_ops,scandatetime timestamp_ops);
Edit:
Here is the explain analyze output (note, I changed the interval to 1 day to make it run faster):
Delete on scans (cost=0.58..3325615.74 rows=278811 width=6) (actual time=831562.877..831562.877 rows=0 loops=1)
-> Index Scan using scans_staging_scandatetime_idx on scans (cost=0.58..3325615.74 rows=278811 width=6) (actual time=831562.875..831562.875 rows=0 loops=1)
Index Cond: (scandatetime > (CURRENT_TIMESTAMP - '1 day'::interval))
Filter: ((scandatetime <> (SubPlan 2)) AND (scandatetime <> (SubPlan 4)))
Rows Removed by Filter: 277756
SubPlan 2
-> Result (cost=3.92..3.93 rows=1 width=8) (actual time=1.675..1.675 rows=1 loops=277756)
InitPlan 1 (returns $1)
-> Limit (cost=0.70..3.92 rows=1 width=8) (actual time=1.673..1.674 rows=1 loops=277756)
-> Index Only Scan using scans_staging_imb_scandatetime_idx on scans tt (cost=0.70..16.80 rows=5 width=8) (actual time=1.672..1.672 rows=1 loops=277756)
Index Cond: ((imb = scans.imb) AND (scandatetime IS NOT NULL))
Heap Fetches: 277761
SubPlan 4
-> Result (cost=3.92..3.93 rows=1 width=8) (actual time=0.086..0.086 rows=1 loops=164210)
InitPlan 3 (returns $3)
-> Limit (cost=0.70..3.92 rows=1 width=8) (actual time=0.084..0.085 rows=1 loops=164210)
-> Index Only Scan Backward using scans_staging_imb_scandatetime_idx on scans tt_1 (cost=0.70..16.80 rows=5 width=8) (actual time=0.083..0.083 rows=1 loops=164210)
Index Cond: ((imb = scans.imb) AND (scandatetime IS NOT NULL))
Heap Fetches: 164210
Planning Time: 11.360 ms
Execution Time: 831562.956 ms
EDIT: Result with explain analyze buffers:
Delete on scans (cost=0.57..1274693.83 rows=103787 width=6) (actual time=19309.026..19309.027 rows=0 loops=1)
Buffers: shared hit=743430 read=46033
I/O Timings: read=15917.966
-> Index Scan using scans_staging_scandatetime_idx on scans (cost=0.57..1274693.83 rows=103787 width=6) (actual time=19309.025..19309.025 rows=0 loops=1)
Index Cond: (scandatetime > (CURRENT_TIMESTAMP - '1 day'::interval))
Filter: ((scandatetime <> (SubPlan 2)) AND (scandatetime <> (SubPlan 4)))
Rows Removed by Filter: 74564
Buffers: shared hit=743430 read=46033
I/O Timings: read=15917.966
SubPlan 2
-> Result (cost=4.05..4.06 rows=1 width=8) (actual time=0.232..0.233 rows=1 loops=74564)
Buffers: shared hit=458108 read=27849
I/O Timings: read=15114.478
InitPlan 1 (returns $1)
-> Limit (cost=0.70..4.05 rows=1 width=8) (actual time=0.231..0.231 rows=1 loops=74564)
Buffers: shared hit=458108 read=27849
I/O Timings: read=15114.478
-> Index Only Scan using scans_staging_imb_scandatetime_idx on scans tt (cost=0.70..20.81 rows=6 width=8) (actual time=0.230..0.230 rows=1 loops=74564)
Index Cond: ((imb = scans.imb) AND (scandatetime IS NOT NULL))
Heap Fetches: 74583
Buffers: shared hit=458108 read=27849
I/O Timings: read=15114.478
SubPlan 4
-> Result (cost=4.05..4.06 rows=1 width=8) (actual time=0.042..0.042 rows=1 loops=34497)
Buffers: shared hit=228637 read=701
I/O Timings: read=507.724
InitPlan 3 (returns $3)
-> Limit (cost=0.70..4.05 rows=1 width=8) (actual time=0.041..0.041 rows=1 loops=34497)
Buffers: shared hit=228637 read=701
I/O Timings: read=507.724
-> Index Only Scan Backward using scans_staging_imb_scandatetime_idx on scans tt_1 (cost=0.70..20.81 rows=6 width=8) (actual time=0.040..0.040 rows=1 loops=34497)
Index Cond: ((imb = scans.imb) AND (scandatetime IS NOT NULL))
Heap Fetches: 34497
Buffers: shared hit=228637 read=701
I/O Timings: read=507.724
Planning Time: 5.350 ms
Execution Time: 19313.242 ms
Without the pre-aggregation (and avoiding the CTE):
DELETE FROM scans del
WHERE del.scandatetime > (current_timestamp - interval '21 days')
AND EXISTS (SELECT *
FROM scans x
WHERE x.imb = del.imb
AND x.scandatetime < del.scandatetime
)
AND EXISTS (SELECT *
FROM scans x
WHERE x.imb = del.imb
AND x.scandatetime > del.scandatetime
)
;
The idea is: you only delete a row if there is (at least) one record before it and (at least) one after it (with the same imb). This is not true for the first and last records, only for the ones in between.
Consider running aggregation once and incorporating it in an EXISTS clause.
with agg as (
select imb
, min(scandatetime) as min_dt
, max(scandatetime) as max_dt
from scans
group by imb
)
delete from scans s
where s.scandatetime > (current_timestamp - interval '21 days')
and exists
(select 1
from agg
where s.imb = agg.imb
and (s.scandatetime > agg.min_dt and
s.scandatetime < agg.max_dt)
);
In the question comments you say that the table contains no rows older than 21 days. The condition scandatetime > (current_timestamp - interval '21 days') is hence superfluous. This also means that you delete almost all rows from the table; you only keep one or two rows per imb.
DELETE on so many rows (you mention tens of millions of rows) can be very slow. Not only must the table rows be deleted one by one, but also all the indexes updated.
This said, you may be better off copying those few desired rows into a temporary table, truncating the original table, and copying the rows back. TRUNCATE doesn't look at individual rows like DELETE does. It simply empties the whole table and its indexes in one go and immediately reclaims disk space.
The script would look something like this:
create table temp_desired_scans as
select *
from scans s
where (imb, scandatetime) in
(
select imb, min(scandatetime) from scans group by imb
union all
select imb, max(scandatetime) from scans group by imb
);
truncate table scans;
insert into scans
select * from temp_desired_scans;
drop table temp_desired_scans;
(Another common option for such mass deletes is to keep the temp table, drop the original table, rename the temp table to the original table's name, and install all constraints and indexes on the new table.)
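That swap could look roughly like this (a sketch only; the replacement table name is made up, the index definitions are copied from the question's DDL, and any constraints, triggers, or grants on the original table would need recreating too):

```sql
-- Build a replacement table holding only the first and last scan per imb.
CREATE TABLE scans_new AS
SELECT *
FROM scans
WHERE (imb, scandatetime) IN (
    SELECT imb, min(scandatetime) FROM scans GROUP BY imb
    UNION ALL
    SELECT imb, max(scandatetime) FROM scans GROUP BY imb
);

-- Swap the tables inside one transaction so readers never see a missing table.
BEGIN;
DROP TABLE scans;
ALTER TABLE scans_new RENAME TO scans;
-- Recreate the indexes from the original DDL on the new table.
CREATE INDEX scans_staging_scandatetime_idx ON scans (scandatetime);
CREATE INDEX scans_staging_imb_scandatetime_idx ON scans (imb, scandatetime);
COMMIT;
```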
Given that the select is the problem, I would focus on just the select; you can turn it into a delete at any time. You may try this form to see if it helps:
select *
from (
    select *,
           row_number() over (partition by imb order by scandatetime asc)  ar,
           row_number() over (partition by imb order by scandatetime desc) dr
    from scans
) s
where ar > 1 and dr > 1 and scandatetime > (current_timestamp - interval '21 days')
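To turn that select into a delete, one option (my sketch, not part of the original answer; it relies on the physical row id because the table has no declared primary key) is to match on ctid:

```sql
-- Delete every row that is neither the first nor the last scan for its imb.
DELETE FROM scans
WHERE ctid IN (
    SELECT ctid
    FROM (
        SELECT ctid,
               scandatetime,
               row_number() OVER (PARTITION BY imb ORDER BY scandatetime ASC)  AS ar,
               row_number() OVER (PARTITION BY imb ORDER BY scandatetime DESC) AS dr
        FROM scans
    ) s
    WHERE ar > 1
      AND dr > 1
      AND scandatetime > (current_timestamp - interval '21 days')
);
```

Note that ctid matching over tens of millions of rows can itself be slow, so compare its plan against the EXISTS variants before committing to it.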

Postgres trigrams and ordering very slow

I am trying to build up a fuzzy search against multiple columns with weightings for each column's distance from the corresponding search term.
I have the following query -
select sf_id
from (
    select *
    from (
        select sf_id,
               (1.0 - cast(coalesce(mailingcity, '') <->> 'san ant' as float)) * 3.0 as score
        from contacts
        order by score desc
        limit 1000
    ) as mailingcity
    union
    select *
    from (
        select sf_id,
               (1.0 - cast(coalesce(lastname, '') <->> 'anders' as float)) * 5.0 as score
        from contacts
        order by score desc
        limit 1000
    ) as lastname
) as agg
group by sf_id
order by sum(score) desc
with indexes such as the following -
create index contact_lastname_trgrm_idx on contacts using gin (lastname gin_trgm_ops)
on the columns used to match on.
We have ~ 500,000 records in the table and the query is taking over three seconds.
I also have expression indexes on the coalesce function
Is there any way to speed this up?
Result of EXPLAIN -
Sort (cost=212791.05..212791.55 rows=200 width=154) (actual time=3165.154..3165.247 rows=2000 loops=1)
Sort Key: (sum(((('1'::double precision - (((COALESCE(contacts.mailingcity, ''::character varying))::text <->> 'san ant'::text))::double precision) * '3'::double precision)))) DESC
Sort Method: quicksort Memory: 205kB
-> GroupAggregate (cost=212766.41..212783.41 rows=200 width=154) (actual time=3163.855..3164.621 rows=2000 loops=1)
Group Key: contacts.sf_id
-> Sort (cost=212766.41..212771.41 rows=2000 width=154) (actual time=3163.847..3163.966 rows=2000 loops=1)
Sort Key: contacts.sf_id
Sort Method: quicksort Memory: 205kB
-> HashAggregate (cost=212616.75..212636.75 rows=2000 width=154) (actual time=3155.719..3156.055 rows=2000 loops=1)
Group Key: contacts.sf_id, ((('1'::double precision - (((COALESCE(contacts.mailingcity, ''::character varying))::text <->> 'san ant'::text))::double precision) * '3'::double precision))
-> Append (cost=106166.70..212606.75 rows=2000 width=154) (actual time=1629.241..3154.841 rows=2000 loops=1)
-> Limit (cost=106166.70..106283.37 rows=1000 width=27) (actual time=1629.241..1629.798 rows=1000 loops=1)
-> Gather Merge (cost=106166.70..154807.03 rows=416888 width=27) (actual time=1629.239..1629.730 rows=1000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=105166.68..105687.79 rows=208444 width=27) (actual time=1589.059..1589.232 rows=1021 loops=3)
Sort Key: ((('1'::double precision - (((COALESCE(contacts.mailingcity, ''::character varying))::text <->> 'san ant'::text))::double precision) * '3'::double precision)) DESC
Sort Method: external merge Disk: 7256kB
-> Parallel Seq Scan on contacts (cost=0.00..81763.88 rows=208444 width=27) (actual time=0.145..1405.681 rows=166755 loops=3)
-> Limit (cost=106166.70..106283.37 rows=1000 width=27) (actual time=1524.305..1524.912 rows=1000 loops=1)
-> Gather Merge (cost=106166.70..154807.03 rows=416888 width=27) (actual time=1524.304..1524.842 rows=1000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=105166.68..105687.79 rows=208444 width=27) (actual time=1455.159..1455.386 rows=1016 loops=3)
Sort Key: ((('1'::double precision - (((COALESCE(contacts_1.lastname, ''::character varying))::text <->> 'anders'::text))::double precision) * '5'::double precision)) DESC
Sort Method: external merge Disk: 7280kB
-> Parallel Seq Scan on contacts contacts_1 (cost=0.00..81763.88 rows=208444 width=27) (actual time=0.373..1290.368 rows=166755 loops=3)
Planning time: 0.855 ms
Execution time: 3218.589 ms
As per the advice from @a_horse_with_no_name below, the following query now runs in ~250 ms:
select sf_id
from (
    select *
    from (
        select sf_id,
               (1.0 - cast(coalesce(mailingcity, '') <->> 'san ant' as float)) * 3.0 as score
        from contacts
        where mailingcity % 'san ant'
        order by score desc
        limit 1000
    ) as mailingcity
    union all
    select *
    from (
        select sf_id,
               (1.0 - cast(coalesce(lastname, '') <->> 'anders' as float)) * 5.0 as score
        from contacts
        where lastname % 'anders'
        order by score desc
        limit 1000
    ) as lastname
) as agg
group by sf_id
order by sum(score) desc
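One caveat worth noting (my addition, not part of the original answer): the `%` operator only passes rows whose trigram similarity exceeds pg_trgm's threshold, so the filter trades recall for speed. The threshold can be inspected and tuned per session:

```sql
-- GUC interface (available once pg_trgm is installed); default is 0.3.
SHOW pg_trgm.similarity_threshold;
SET pg_trgm.similarity_threshold = 0.2;  -- admit fuzzier matches: more rows, slower

-- The same knob via pg_trgm's legacy function interface.
SELECT show_limit();
SELECT set_limit(0.2);
```

Lowering the threshold lets the `%` prefilter keep more candidates for the `<->>` scoring; raising it speeds things up at the risk of dropping near-misses.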

Tricky postgresql query optimization (distinct row aggregation with ordering)

I have a table of events that has a very similar schema and data distribution as this artificial table that can easily be generated locally:
CREATE TABLE events AS
WITH args AS (
SELECT
300 AS scale_factor, -- feel free to reduce this to speed up local testing
1000 AS pa_count,
1 AS l_count_min,
29 AS l_count_rand,
10 AS c_count,
10 AS pr_count,
3 AS r_count,
'10 days'::interval AS time_range -- edit 2017-05-02: the real data set has years worth of data here, but the query time ranges stay small (a couple days)
)
SELECT
p.c_id,
'ABC'||lpad(p.pa_id::text, 13, '0') AS pa_id,
'abcdefgh-'||((random()*(SELECT pr_count-1 FROM args)+1))::int AS pr_id,
((random()*(SELECT r_count-1 FROM args)+1))::int AS r,
'2017-01-01Z00:00:00'::timestamp without time zone + random()*(SELECT time_range FROM args) AS t
FROM (
SELECT
pa_id,
((random()*(SELECT c_count-1 FROM args)+1))::int AS c_id,
(random()*(SELECT l_count_rand FROM args)+(SELECT l_count_min FROM args))::int AS l_count
FROM generate_series(1, (SELECT pa_count*scale_factor FROM args)) pa_id
) p
JOIN LATERAL (
SELECT generate_series(1, p.l_count)
) l(id) ON (true);
Excerpt from SELECT * FROM events:
What I need is a query that selects all rows for a given c_id in a given time range of t, then filters them down to only include the most recent rows (by t) for each unique pr_id and pa_id combination, and then counts the number of pr_id and r combinations of those rows.
That's a quite a mouthful, so here are 3 SQL queries that I came up with that produce the desired results:
WITH query_a AS (
SELECT
pr_id,
r,
count(1) AS quantity
FROM (
SELECT DISTINCT ON (pr_id, pa_id)
pr_id,
pa_id,
r
FROM events
WHERE
c_id = 5 AND
t >= '2017-01-03Z00:00:00' AND
t < '2017-01-06Z00:00:00'
ORDER BY pr_id, pa_id, t DESC
) latest
GROUP BY
1,
2
ORDER BY 3, 2, 1 DESC
),
query_b AS (
SELECT
pr_id,
r,
count(1) AS quantity
FROM (
SELECT
pr_id,
pa_id,
first_not_null(r ORDER BY t DESC) AS r
FROM events
WHERE
c_id = 5 AND
t >= '2017-01-03Z00:00:00' AND
t < '2017-01-06Z00:00:00'
GROUP BY
1,
2
) latest
GROUP BY
1,
2
ORDER BY 3, 2, 1 DESC
),
query_c AS (
SELECT
pr_id,
r,
count(1) AS quantity
FROM (
SELECT
pr_id,
pa_id,
first_not_null(r) AS r
FROM events
WHERE
c_id = 5 AND
t >= '2017-01-03Z00:00:00' AND
t < '2017-01-06Z00:00:00'
GROUP BY
1,
2
) latest
GROUP BY
1,
2
ORDER BY 3, 2, 1 DESC
)
And here is the custom aggregate function used by query_b and query_c, as well as what I believe to be the most optimal index, settings and conditions:
CREATE FUNCTION first_not_null_agg(before anyelement, value anyelement) RETURNS anyelement
LANGUAGE sql IMMUTABLE STRICT
AS $_$
SELECT $1;
$_$;
CREATE AGGREGATE first_not_null(anyelement) (
SFUNC = first_not_null_agg,
STYPE = anyelement
);
CREATE INDEX events_idx ON events USING btree (c_id, t DESC, pr_id, pa_id, r);
VACUUM ANALYZE events;
SET work_mem='128MB';
My dilemma is that query_c outperforms query_a and query_b by a factor of > 6x, but is technically not guaranteed to produce the same result as the other queries (notice the missing ORDER BY in the first_not_null aggregate). However, in practice it seems to pick a query plan that I believe to be correct and most optimal.
Below are the EXPLAIN (ANALYZE, VERBOSE) outputs for all 3 queries on my local machine:
query_a:
CTE Scan on query_a (cost=25810.77..26071.25 rows=13024 width=44) (actual time=3329.921..3329.934 rows=30 loops=1)
Output: query_a.pr_id, query_a.r, query_a.quantity
CTE query_a
-> Sort (cost=25778.21..25810.77 rows=13024 width=23) (actual time=3329.918..3329.921 rows=30 loops=1)
Output: events.pr_id, events.r, (count(1))
Sort Key: (count(1)), events.r, events.pr_id DESC
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=24757.86..24888.10 rows=13024 width=23) (actual time=3329.849..3329.892 rows=30 loops=1)
Output: events.pr_id, events.r, count(1)
Group Key: events.pr_id, events.r
-> Unique (cost=21350.90..22478.71 rows=130237 width=40) (actual time=3168.656..3257.299 rows=116547 loops=1)
Output: events.pr_id, events.pa_id, events.r, events.t
-> Sort (cost=21350.90..21726.83 rows=150375 width=40) (actual time=3168.655..3209.095 rows=153795 loops=1)
Output: events.pr_id, events.pa_id, events.r, events.t
Sort Key: events.pr_id, events.pa_id, events.t DESC
Sort Method: quicksort Memory: 18160kB
-> Index Only Scan using events_idx on public.events (cost=0.56..8420.00 rows=150375 width=40) (actual time=0.038..101.584 rows=153795 loops=1)
Output: events.pr_id, events.pa_id, events.r, events.t
Index Cond: ((events.c_id = 5) AND (events.t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (events.t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Planning time: 0.316 ms
Execution time: 3331.082 ms
query_b:
CTE Scan on query_b (cost=67140.75..67409.53 rows=13439 width=44) (actual time=3761.077..3761.090 rows=30 loops=1)
Output: query_b.pr_id, query_b.r, query_b.quantity
CTE query_b
-> Sort (cost=67107.15..67140.75 rows=13439 width=23) (actual time=3761.074..3761.081 rows=30 loops=1)
Output: events.pr_id, (first_not_null(events.r ORDER BY events.t DESC)), (count(1))
Sort Key: (count(1)), (first_not_null(events.r ORDER BY events.t DESC)), events.pr_id DESC
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=66051.24..66185.63 rows=13439 width=23) (actual time=3760.997..3761.049 rows=30 loops=1)
Output: events.pr_id, (first_not_null(events.r ORDER BY events.t DESC)), count(1)
Group Key: events.pr_id, first_not_null(events.r ORDER BY events.t DESC)
-> GroupAggregate (cost=22188.98..63699.49 rows=134386 width=32) (actual time=2961.471..3671.669 rows=116547 loops=1)
Output: events.pr_id, events.pa_id, first_not_null(events.r ORDER BY events.t DESC)
Group Key: events.pr_id, events.pa_id
-> Sort (cost=22188.98..22578.94 rows=155987 width=40) (actual time=2961.436..3012.440 rows=153795 loops=1)
Output: events.pr_id, events.pa_id, events.r, events.t
Sort Key: events.pr_id, events.pa_id
Sort Method: quicksort Memory: 18160kB
-> Index Only Scan using events_idx on public.events (cost=0.56..8734.27 rows=155987 width=40) (actual time=0.038..97.336 rows=153795 loops=1)
Output: events.pr_id, events.pa_id, events.r, events.t
Index Cond: ((events.c_id = 5) AND (events.t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (events.t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Planning time: 0.385 ms
Execution time: 3761.852 ms
query_c:
CTE Scan on query_c (cost=51400.06..51660.54 rows=13024 width=44) (actual time=524.382..524.395 rows=30 loops=1)
Output: query_c.pr_id, query_c.r, query_c.quantity
CTE query_c
-> Sort (cost=51367.50..51400.06 rows=13024 width=23) (actual time=524.380..524.384 rows=30 loops=1)
Output: events.pr_id, (first_not_null(events.r)), (count(1))
Sort Key: (count(1)), (first_not_null(events.r)), events.pr_id DESC
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=50347.14..50477.38 rows=13024 width=23) (actual time=524.311..524.349 rows=30 loops=1)
Output: events.pr_id, (first_not_null(events.r)), count(1)
Group Key: events.pr_id, first_not_null(events.r)
-> HashAggregate (cost=46765.62..48067.99 rows=130237 width=32) (actual time=401.480..459.962 rows=116547 loops=1)
Output: events.pr_id, events.pa_id, first_not_null(events.r)
Group Key: events.pr_id, events.pa_id
-> Index Only Scan using events_idx on public.events (cost=0.56..8420.00 rows=150375 width=32) (actual time=0.027..109.459 rows=153795 loops=1)
Output: events.c_id, events.t, events.pr_id, events.pa_id, events.r
Index Cond: ((events.c_id = 5) AND (events.t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (events.t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Planning time: 0.296 ms
Execution time: 525.566 ms
Broadly speaking, I believe that the index above should allow query_a and query_b to be executed without the Sort nodes that slow them down, but so far I've failed to convince the postgres query optimizer to do my bidding.
I'm also somewhat confused about the t column not being included in the Sort key for query_b, considering that quicksort is not stable. It seems like this could yield the wrong results.
I've verified that all 3 queries generate the same results running the following queries and verifying they produce an empty result set:
SELECT * FROM query_a
EXCEPT
SELECT * FROM query_b;
and
SELECT * FROM query_a
EXCEPT
SELECT * FROM query_c;
I'd consider query_a to be the canonical query when in doubt.
I greatly appreciate any input on this. I've actually found a terribly hacky workaround to achieve acceptable performance in my application, but this problem continues to haunt me in my sleep (and in fact on vacation, which I'm currently on) ... 😬.
FWIW, I've looked at many similar questions and answers which have guided my current thinking, but I believe there is something unique about the two column grouping (pr_id, pa_id) and having to sort by a 3rd column (t) that doesn't make this a duplicate question.
Edit: The outer queries in the example may be entirely irrelevant to the question, so feel free to ignore them if it helps.
I'd consider query_a to be the canonical query when in doubt.
I found a way to make query_a run in half a second.
Your inner query from query_a
SELECT DISTINCT ON (pr_id, pa_id)
needs to go with
ORDER BY pr_id, pa_id, t DESC
especially with pr_id and pa_id listed first.
c_id = 5 is constant, but you cannot use your index events_idx (c_id, t DESC, pr_id, pa_id, r), because its columns are not organized as (pr_id, pa_id, t DESC), which your ORDER BY clause demands.
If you had an index on at least (pr_id, pa_id, t DESC) then the sorting does not have to happen, because the ORDER BY condition aligns with this index.
So here is what I did.
CREATE INDEX events_idx2 ON events (c_id, pr_id, pa_id, t DESC, r);
This index can be used by your inner query - at least in theory.
Unfortunately the query planner thinks that it's better to reduce the number of rows by using index events_idx with c_id and x <= t < y.
Postgres does not have index hints, so we need a different way to convince the query planner to take the new index events_idx2.
One way to force the use of events_idx2 is to make the other index more expensive.
This can be done by removing the last column r from events_idx, making it unusable for query_a (at least not usable without fetching the pages from the heap).
It is counter-intuitive to move the t column later in the index layout, because usually the first columns will be chosen for = and ranges, which c_id and t qualify well for.
However, your ORDER BY (pr_id, pa_id, t DESC) mandates at least this subset as-is in your index. Of course we still put the c_id first to reduce the rows as soon as possible.
You can still have an index on (c_id, t DESC, pr_id, pa_id), if you need, but it cannot be used in query_a.
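The "index layout must match the ORDER BY" effect can be watched outside Postgres too. Below is a minimal sketch using SQLite (whose planner also elides the sort when an index provides the requested order); the table and index names mirror the question's schema, the data is made up. EXPLAIN QUERY PLAN reports "USE TEMP B-TREE FOR ORDER BY" when an explicit sort step is needed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE events (c_id INT, pr_id INT, pa_id INT, t TEXT, r TEXT);
    INSERT INTO events VALUES
        (5, 1, 10, '2017-01-03 01:00', 'a'),
        (5, 1, 10, '2017-01-04 01:00', 'b'),
        (5, 2, 20, '2017-01-03 02:00', 'c'),
        (6, 1, 10, '2017-01-05 01:00', 'd');
""")

query = """
    SELECT pr_id, pa_id, r FROM events
    WHERE c_id = 5
    ORDER BY pr_id, pa_id, t DESC
"""

def plan(sql):
    # Column 4 of EXPLAIN QUERY PLAN output is the human-readable detail.
    return " | ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# Without a matching index: full scan plus an explicit sorting step.
before = plan(query)
assert "TEMP B-TREE" in before, before

# Index laid out like the ORDER BY (equality column first): the sort disappears.
con.execute("CREATE INDEX events_idx2 ON events (c_id, pr_id, pa_id, t DESC, r)")
after = plan(query)
assert "TEMP B-TREE" not in after, after
print("before:", before)
print("after: ", after)
```

The same reasoning applies in Postgres: c_id first (equality), then exactly the ORDER BY columns in order.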
Here is the query plan for query_a with events_idx2 used and events_idx deleted.
Look for events_c_id_pr_id_pa_id_t_r_idx, which is how PG names indices automatically, when you don't give them a name.
I like it this way, because I can see the order of the columns in the name of the index in every query plan.
Sort (cost=30076.71..30110.75 rows=13618 width=23) (actual time=426.898..426.914 rows=30 loops=1)
Sort Key: (count(1)), events.r, events.pr_id DESC
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=29005.43..29141.61 rows=13618 width=23) (actual time=426.820..426.859 rows=30 loops=1)
Group Key: events.pr_id, events.r
-> Unique (cost=0.56..26622.33 rows=136177 width=40) (actual time=0.037..328.828 rows=117204 loops=1)
-> Index Only Scan using events_c_id_pr_id_pa_id_t_r_idx on events (cost=0.56..25830.50 rows=158366 width=40) (actual time=0.035..178.594 rows=154940 loops=1)
Index Cond: ((c_id = 5) AND (t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Planning time: 0.201 ms
Execution time: 427.017 ms
(11 rows)
Planning is instantaneous and performance is sub-second, because the index matches the ORDER BY of the inner query.
With good performance on query_a there is no need for an additional function to make alternative queries query_b and query_c faster.
Remarks:
Somehow I could not find a primary key in your relation.
The proposed solution works without assuming any primary key.
I still think that you have some primary key, but maybe forgot to mention it.
The natural key is pa_id. Each pa_id refers to "a thing" that has
~1...30 events recorded about it.
If pa_id is in relation to multiple c_id's, then pa_id alone cannot be key.
If pr_id and r are data, then maybe (c_id, pa_id, t) is unique key?
Also your index events_idx is not created unique, but spans all columns of the relation, so you could have multiple equal rows - do you want to allow that?
If you really need both indices events_idx and the proposed events_idx2, then you will have the data stored 3 times in total (twice in indices, once on the heap).
Since this really is a tricky query optimization, I kindly ask you to at least consider adding a bounty for whoever answers your question, also since it has been sitting on SO without answer for quite some time.
EDIT A
I inserted another set of data using your excellently generated setup above, basically doubling the number of rows.
The dates started from '2017-01-10' this time.
All other parameters stayed the same.
Here is a partial index on the time attribute and its query behaviour.
CREATE INDEX events_timerange ON events (c_id, pr_id, pa_id, t DESC, r) WHERE '2017-01-03' <= t AND t < '2017-01-06';
Sort (cost=12510.07..12546.55 rows=14591 width=23) (actual time=361.579..361.595 rows=30 loops=1)
Sort Key: (count(1)), events.r, events.pr_id DESC
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=11354.99..11500.90 rows=14591 width=23) (actual time=361.503..361.543 rows=30 loops=1)
Group Key: events.pr_id, events.r
-> Unique (cost=0.55..8801.60 rows=145908 width=40) (actual time=0.026..265.084 rows=118571 loops=1)
-> Index Only Scan using events_timerange on events (cost=0.55..8014.70 rows=157380 width=40) (actual time=0.024..115.265 rows=155800 loops=1)
Index Cond: (c_id = 5)
Heap Fetches: 0
Planning time: 0.214 ms
Execution time: 361.692 ms
(11 rows)
Without the index events_timerange (that is, using the regular full index):
Sort (cost=65431.46..65467.93 rows=14591 width=23) (actual time=472.809..472.824 rows=30 loops=1)
Sort Key: (count(1)), events.r, events.pr_id DESC
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=64276.38..64422.29 rows=14591 width=23) (actual time=472.732..472.776 rows=30 loops=1)
Group Key: events.pr_id, events.r
-> Unique (cost=0.56..61722.99 rows=145908 width=40) (actual time=0.024..374.392 rows=118571 loops=1)
-> Index Only Scan using events_c_id_pr_id_pa_id_t_r_idx on events (cost=0.56..60936.08 rows=157380 width=40) (actual time=0.021..222.987 rows=155800 loops=1)
Index Cond: ((c_id = 5) AND (t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Planning time: 0.171 ms
Execution time: 472.925 ms
(11 rows)
With the partial index the runtime is about 100 ms faster, even though the whole table is twice as big.
(Note: the second time around it was only 50 ms faster. The advantage should grow as more events are recorded, though, because the queries relying on the full index will become slower, as you suspect (and I agree).)
Also, on my machine, the full index is 810 MB for two inserts (create table + additional from 2017-01-10).
The partial index WHERE 2017-01-03 <= t < 2017-01-06 is only 91 MB.
Maybe you can create partial indices on a monthly or yearly basis?
Depending on what time range is queried, maybe only recent data needs to be indexed, or otherwise only old data partially?
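The per-time-slice partial index idea can be sketched with SQLite as well, which also supports partial indexes; the index name and data below are invented for illustration, and the MAX(t)-with-bare-column trick in the query is SQLite-specific shorthand for "the row with the latest t".

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (c_id INT, pr_id INT, pa_id INT, t TEXT, r TEXT)")
rows = [(5, p, a, f"2017-01-{d:02d} 12:00", f"r{p}{a}{d}")
        for p in (1, 2) for a in (10, 20) for d in range(1, 10)]
con.executemany("INSERT INTO events VALUES (?,?,?,?,?)", rows)

# One slim index covering only the hot window; in practice you might keep
# one of these per month or per year.
con.execute("""
    CREATE INDEX events_jan_w1 ON events (c_id, pr_id, pa_id, t DESC, r)
    WHERE t >= '2017-01-03' AND t < '2017-01-06'
""")

q = """
    SELECT pr_id, pa_id, MAX(t) AS last_t, r FROM events
    WHERE c_id = 5 AND t >= '2017-01-03' AND t < '2017-01-06'
    GROUP BY pr_id, pa_id
"""
# A query whose WHERE repeats the index's WHERE can use the partial index.
plan = " | ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + q))
assert "events_jan_w1" in plan, plan

got = sorted(con.execute(q).fetchall())
# Cross-check against a plain-Python computation over the same window.
latest = {}
for c, p, a, t, r in rows:
    if c == 5 and '2017-01-03' <= t < '2017-01-06':
        if (p, a) not in latest or t > latest[(p, a)][0]:
            latest[(p, a)] = (t, r)
want = sorted((p, a, t, r) for (p, a), (t, r) in latest.items())
assert got == want, (got, want)
print(plan)
```

The caveat carries over to Postgres: the query's range predicate must imply the index's WHERE clause, or the partial index is ignored.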
I also tried partial indexing with WHERE c_id = 5, so partitioning by c_id.
Sort (cost=51324.27..51361.47 rows=14880 width=23) (actual time=550.579..550.592 rows=30 loops=1)
Sort Key: (count(1)), events.r, events.pr_id DESC
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=50144.21..50293.01 rows=14880 width=23) (actual time=550.481..550.528 rows=30 loops=1)
Group Key: events.pr_id, events.r
-> Unique (cost=0.42..47540.21 rows=148800 width=40) (actual time=0.050..443.393 rows=118571 loops=1)
-> Index Only Scan using events_cid on events (cost=0.42..46736.42 rows=160758 width=40) (actual time=0.047..269.676 rows=155800 loops=1)
Index Cond: ((t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Planning time: 0.366 ms
Execution time: 550.706 ms
(11 rows)
So partial indexing may also be a viable option.
If you get ever more data, then you may also consider partitioning, for example all rows aged two years and older into a separate table or something.
I don't think Block Range Indexes (BRIN) would help here, though.
If your machine is beefier than mine, you can just insert 10 times the amount of data and check how the regular full index behaves on a growing table.
[EDITED]
OK, as this depends on your data distribution, here is another way to do it.
First add the following index :
CREATE INDEX events_idx2 ON events (c_id, t DESC, pr_id, pa_id, r);
This extracts MAX(t) as quickly as possible, on the assumption that the resulting subset will be much smaller to join back against the parent table. It may, however, be slower if the dataset is not that small.
SELECT
e.pr_id,
e.r,
count(1) AS quantity
FROM events e
JOIN (
SELECT
pr_id,
pa_id,
MAX(t) last_t
FROM events e
WHERE
c_id = 5
AND t >= '2017-01-03Z00:00:00'
AND t < '2017-01-06Z00:00:00'
GROUP BY
pr_id,
pa_id
) latest
ON (
c_id = 5
AND latest.pr_id = e.pr_id
AND latest.pa_id = e.pa_id
AND latest.last_t = e.t
)
GROUP BY
e.pr_id,
e.r
ORDER BY 3, 2, 1 DESC
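To convince myself the MAX(t)-subquery-plus-join rewrite returns the same counts as "latest r per (pr_id, pa_id)", here is a small SQLite sketch checked against a plain-Python computation. The data is made up, and it assumes t is unique within each (pr_id, pa_id) group; with ties on t the join would keep several rows per group.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (c_id INT, pr_id INT, pa_id INT, t TEXT, r TEXT)")
rows = [
    (5, 1, 10, '2017-01-03', 'x'), (5, 1, 10, '2017-01-04', 'y'),
    (5, 1, 11, '2017-01-05', 'y'), (5, 2, 20, '2017-01-04', 'z'),
    (6, 1, 10, '2017-01-04', 'q'),   # different c_id, must be ignored
]
con.executemany("INSERT INTO events VALUES (?,?,?,?,?)", rows)

sql = """
    SELECT e.pr_id, e.r, COUNT(*) AS quantity
    FROM events e
    JOIN (SELECT pr_id, pa_id, MAX(t) AS last_t
          FROM events WHERE c_id = 5
          GROUP BY pr_id, pa_id) latest
      ON e.c_id = 5 AND latest.pr_id = e.pr_id
     AND latest.pa_id = e.pa_id AND latest.last_t = e.t
    GROUP BY e.pr_id, e.r
    ORDER BY 3, 2, 1 DESC
"""
got = con.execute(sql).fetchall()

# Reference computation: latest r per (pr_id, pa_id), counted per (pr_id, r).
latest = {}
for c, p, a, t, r in rows:
    if c == 5 and ((p, a) not in latest or t > latest[(p, a)][0]):
        latest[(p, a)] = (t, r)
counts = {}
for (p, _), (_, r) in latest.items():
    counts[(p, r)] = counts.get((p, r), 0) + 1
want = sorted(((p, r, n) for (p, r), n in counts.items()),
              key=lambda x: (x[2], x[1], -x[0]))  # quantity, r asc, pr_id desc
assert got == want, (got, want)
print(got)
```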
Full Fiddle
SQL Fiddle
PostgreSQL 9.3 Schema Setup:
--PostgreSQL 9.6
--'\\' is a delimiter
-- CREATE TABLE events AS...
VACUUM ANALYZE events;
CREATE INDEX idx_events_idx ON events (c_id, t DESC, pr_id, pa_id, r);
Query 1:
-- query A
explain analyze SELECT
pr_id,
r,
count(1) AS quantity
FROM (
SELECT DISTINCT ON (pr_id, pa_id)
pr_id,
pa_id,
r
FROM events
WHERE
c_id = 5 AND
t >= '2017-01-03Z00:00:00' AND
t < '2017-01-06Z00:00:00'
ORDER BY pr_id, pa_id, t DESC
) latest
GROUP BY
1,
2
ORDER BY 3, 2, 1 DESC
Results:
QUERY PLAN
Sort (cost=2170.24..2170.74 rows=200 width=15) (actual time=358.239..358.245 rows=30 loops=1)
Sort Key: (count(1)), events.r, events.pr_id
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=2160.60..2162.60 rows=200 width=15) (actual time=358.181..358.189 rows=30 loops=1)
-> Unique (cost=2012.69..2132.61 rows=1599 width=40) (actual time=327.345..353.750 rows=12098 loops=1)
-> Sort (cost=2012.69..2052.66 rows=15990 width=40) (actual time=327.344..348.686 rows=15966 loops=1)
Sort Key: events.pr_id, events.pa_id, events.t
Sort Method: external merge Disk: 792kB
-> Index Only Scan using idx_events_idx on events (cost=0.42..896.20 rows=15990 width=40) (actual time=0.059..5.475 rows=15966 loops=1)
Index Cond: ((c_id = 5) AND (t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Total runtime: 358.610 ms
Query 2:
-- query max/JOIN
explain analyze SELECT
e.pr_id,
e.r,
count(1) AS quantity
FROM events e
JOIN (
SELECT
pr_id,
pa_id,
MAX(t) last_t
FROM events e
WHERE
c_id = 5
AND t >= '2017-01-03Z00:00:00'
AND t < '2017-01-06Z00:00:00'
GROUP BY
pr_id,
pa_id
) latest
ON (
c_id = 5
AND latest.pr_id = e.pr_id
AND latest.pa_id = e.pa_id
AND latest.last_t = e.t
)
GROUP BY
e.pr_id,
e.r
ORDER BY 3, 2, 1 DESC
Results:
QUERY PLAN
Sort (cost=4153.31..4153.32 rows=1 width=15) (actual time=68.398..68.402 rows=30 loops=1)
Sort Key: (count(1)), e.r, e.pr_id
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=4153.29..4153.30 rows=1 width=15) (actual time=68.363..68.371 rows=30 loops=1)
-> Merge Join (cost=1133.62..4153.29 rows=1 width=15) (actual time=35.083..64.154 rows=12098 loops=1)
Merge Cond: ((e.t = (max(e_1.t))) AND (e.pr_id = e_1.pr_id))
Join Filter: (e.pa_id = e_1.pa_id)
-> Index Only Scan Backward using idx_events_idx on events e (cost=0.42..2739.72 rows=53674 width=40) (actual time=0.010..8.073 rows=26661 loops=1)
Index Cond: (c_id = 5)
Heap Fetches: 0
-> Sort (cost=1133.19..1137.19 rows=1599 width=36) (actual time=29.778..32.885 rows=12098 loops=1)
Sort Key: (max(e_1.t)), e_1.pr_id
Sort Method: external sort Disk: 640kB
-> HashAggregate (cost=1016.12..1032.11 rows=1599 width=36) (actual time=12.731..16.738 rows=12098 loops=1)
-> Index Only Scan using idx_events_idx on events e_1 (cost=0.42..896.20 rows=15990 width=36) (actual time=0.029..5.084 rows=15966 loops=1)
Index Cond: ((c_id = 5) AND (t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Total runtime: 68.736 ms
Query 3:
DROP INDEX idx_events_idx
CREATE INDEX idx_events_flutter ON events (c_id, pr_id, pa_id, t DESC, r)
Query 5:
-- query A + index by flutter
explain analyze SELECT
pr_id,
r,
count(1) AS quantity
FROM (
SELECT DISTINCT ON (pr_id, pa_id)
pr_id,
pa_id,
r
FROM events
WHERE
c_id = 5 AND
t >= '2017-01-03Z00:00:00' AND
t < '2017-01-06Z00:00:00'
ORDER BY pr_id, pa_id, t DESC
) latest
GROUP BY
1,
2
ORDER BY 3, 2, 1 DESC
Results:
QUERY PLAN
Sort (cost=2744.82..2745.32 rows=200 width=15) (actual time=20.915..20.916 rows=30 loops=1)
Sort Key: (count(1)), events.r, events.pr_id
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=2735.18..2737.18 rows=200 width=15) (actual time=20.883..20.892 rows=30 loops=1)
-> Unique (cost=0.42..2707.20 rows=1599 width=40) (actual time=0.037..16.488 rows=12098 loops=1)
-> Index Only Scan using idx_events_flutter on events (cost=0.42..2627.25 rows=15990 width=40) (actual time=0.036..10.893 rows=15966 loops=1)
Index Cond: ((c_id = 5) AND (t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Total runtime: 20.964 ms
Just two different methods (YMMV):
-- using a window function to find the record with the most recent t::
EXPLAIN ANALYZE
SELECT pr_id, r, count(1) AS quantity
FROM (
SELECT DISTINCT ON (pr_id, pa_id)
pr_id, pa_id,
first_value(r) OVER www AS r
-- last_value(r) OVER www AS r
FROM events
WHERE c_id = 5
AND t >= '2017-01-03Z00:00:00'
AND t < '2017-01-06Z00:00:00'
WINDOW www AS (PARTITION BY pr_id, pa_id ORDER BY t DESC)
ORDER BY 1, 2, t DESC
) sss
GROUP BY 1, 2
ORDER BY 3, 2, 1 DESC
;
-- Avoiding the window function; find the MAX via NOT EXISTS() ::
EXPLAIN ANALYZE
SELECT pr_id, r, count(1) AS quantity
FROM (
SELECT DISTINCT ON (pr_id, pa_id)
pr_id, pa_id, r
FROM events e
WHERE c_id = 5
AND t >= '2017-01-03Z00:00:00'
AND t < '2017-01-06Z00:00:00'
AND NOT EXISTS ( SELECT * FROM events nx
WHERE nx.c_id = 5 AND nx.pr_id =e.pr_id AND nx.pa_id =e.pa_id
AND nx.t >= '2017-01-03Z00:00:00'
AND nx.t < '2017-01-06Z00:00:00'
AND nx.t > e.t
)
) sss
GROUP BY 1, 2
ORDER BY 3, 2, 1 DESC
;
Note: the DISTINCT ON can be omitted from the second query, the results are already unique.
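That the NOT EXISTS "no strictly later row" formulation picks the same rows as a window-function approach can be checked with a minimal SQLite example (window functions need SQLite >= 3.25); data is invented, and it assumes t is unique within each (pr_id, pa_id) group, since with ties NOT EXISTS keeps every tied row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (c_id INT, pr_id INT, pa_id INT, t TEXT, r TEXT)")
con.executemany("INSERT INTO events VALUES (?,?,?,?,?)", [
    (5, 1, 10, '2017-01-03', 'a'), (5, 1, 10, '2017-01-05', 'b'),
    (5, 2, 20, '2017-01-04', 'c'), (5, 2, 21, '2017-01-03', 'd'),
])

not_exists = """
    SELECT pr_id, pa_id, r FROM events e
    WHERE c_id = 5 AND NOT EXISTS (
        SELECT 1 FROM events nx
        WHERE nx.c_id = 5 AND nx.pr_id = e.pr_id
          AND nx.pa_id = e.pa_id AND nx.t > e.t)
"""
row_number = """
    SELECT pr_id, pa_id, r FROM (
        SELECT pr_id, pa_id, r,
               ROW_NUMBER() OVER (PARTITION BY pr_id, pa_id
                                  ORDER BY t DESC) AS rn
        FROM events WHERE c_id = 5)
    WHERE rn = 1
"""
a = sorted(con.execute(not_exists).fetchall())
b = sorted(con.execute(row_number).fetchall())
assert a == b == [(1, 10, 'b'), (2, 20, 'c'), (2, 21, 'd')]
print(a)
```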
I'd try to use a standard ROW_NUMBER() function with a matching index instead of Postgres-specific DISTINCT ON to find the "latest" rows.
Index
CREATE INDEX ix_events ON events USING btree (c_id, pa_id, pr_id, t DESC, r);
Query
WITH
CTE_RN
AS
(
SELECT
pa_id
,pr_id
,r
,ROW_NUMBER() OVER (PARTITION BY c_id, pa_id, pr_id ORDER BY t DESC) AS rn
FROM events
WHERE
c_id = 5
AND t >= '2017-01-03Z00:00:00'
AND t < '2017-01-06Z00:00:00'
)
SELECT
pr_id
,r
,COUNT(*) AS quantity
FROM CTE_RN
WHERE rn = 1
GROUP BY
pr_id
,r
ORDER BY quantity, r, pr_id DESC
;
I don't have Postgres at hand, so I'm using http://rextester.com for testing. I set the scale_factor to 30 in the data generation script, otherwise it takes too long for rextester. I'm getting the following query plan. The time component should be ignored, but you can see that there are no intermediate sorts, only the sort for the final ORDER BY. See http://rextester.com/GUFXY36037
Please try this query on your hardware and your data. It would be interesting to see how it compares to your query. I noticed that the optimizer doesn't choose this index if the table has the index that you defined. If you see the same on your server, please try to drop or disable other indexes to get the plan that I got.
Sort (cost=158.07..158.08 rows=1 width=44) (actual time=81.445..81.448 rows=30 loops=1)
Output: cte_rn.pr_id, cte_rn.r, (count(*))
Sort Key: (count(*)), cte_rn.r, cte_rn.pr_id DESC
Sort Method: quicksort Memory: 27kB
CTE cte_rn
-> WindowAgg (cost=0.42..157.78 rows=12 width=88) (actual time=0.204..56.215 rows=15130 loops=1)
Output: events.pa_id, events.pr_id, events.r, row_number() OVER (?), events.t, events.c_id
-> Index Only Scan using ix_events3 on public.events (cost=0.42..157.51 rows=12 width=80) (actual time=0.184..28.688 rows=15130 loops=1)
Output: events.c_id, events.pa_id, events.pr_id, events.t, events.r
Index Cond: ((events.c_id = 5) AND (events.t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (events.t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 15130
-> HashAggregate (cost=0.28..0.29 rows=1 width=44) (actual time=81.363..81.402 rows=30 loops=1)
Output: cte_rn.pr_id, cte_rn.r, count(*)
Group Key: cte_rn.pr_id, cte_rn.r
-> CTE Scan on cte_rn (cost=0.00..0.27 rows=1 width=36) (actual time=0.214..72.841 rows=11491 loops=1)
Output: cte_rn.pa_id, cte_rn.pr_id, cte_rn.r, cte_rn.rn
Filter: (cte_rn.rn = 1)
Rows Removed by Filter: 3639
Planning time: 0.452 ms
Execution time: 83.234 ms
There is one more optimisation you could do that relies on the external knowledge of your data.
If you can guarantee that each pair of pa_id, pr_id has values for each, say, day, then you can safely reduce the user-defined range of t to just one day.
This will reduce the number of rows that engine reads and sorts if user usually specifies range of t longer than 1 day.
If you can't provide this kind of guarantee in your data for all values, but you still know that usually all pa_id, pr_id are close to each other (by t) and user usually provides a wide range for t, you can run a preliminary query to narrow down the range of t for the main query.
Something like this:
SELECT
MIN(MaxT) AS StartT
,MAX(MaxT) AS EndT
FROM
(
SELECT
pa_id
,pr_id
,MAX(t) AS MaxT
FROM events
WHERE
c_id = 5
AND t >= '2017-01-03Z00:00:00'
AND t < '2017-01-06Z00:00:00'
GROUP BY
pa_id
,pr_id
) AS T
And then use the found StartT,EndT in the main query hoping that new range would be much narrower than original defined by the user.
The query above doesn't have to sort rows, so it should be fast. The main query has to sort rows, but there will be less rows to sort, so overall run-time may be better.
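The reason the narrowing is safe can be checked directly: StartT is the minimum of the per-group maxima, so every group's latest t is >= StartT, and trivially <= EndT, meaning the latest row of every group survives the narrowed scan (note the main query's upper bound then needs to be inclusive of EndT). A plain-Python sanity check with made-up data:

```python
# (pr_id, pa_id, t) triples already filtered to the user-requested range.
rows = [
    (1, 10, 3), (1, 10, 9), (2, 20, 5), (2, 21, 4), (2, 21, 6),
]

def latest_per_group(rs):
    # Max t per (pr_id, pa_id) group.
    best = {}
    for p, a, t in rs:
        if (p, a) not in best or t > best[(p, a)]:
            best[(p, a)] = t
    return best

maxes = latest_per_group(rows)
start_t, end_t = min(maxes.values()), max(maxes.values())

# Narrow the scan to [StartT, EndT]; every group's max is inside by construction.
narrowed = [r for r in rows if start_t <= r[2] <= end_t]
assert latest_per_group(narrowed) == maxes
print(start_t, end_t, len(narrowed), "of", len(rows), "rows kept")
```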
So I've taken a bit of a different tack and tried moving your grouping and distinct data into their own tables, so that we can leverage multiple table indexes. Note that this solution only works if you have control over the way data gets inserted into the database, i.e. you can change the data source application. If not, alas this is moot.
In practice, instead of inserting into the events table immediately, you would first check whether the related date and prpa rows exist in their respective tables. If not, create them. Then fetch their ids and use those in your insert statement for the events table.
Before I start, I was generating a 10x increase in performance on query_c over query_a, and my final result for the rewritten query_a is about a 4x performance. If that's not good enough, feel free to switch off.
Given the initial data seeding queries you gave in the first instance, I calculated the following benchmarks:
query_a: 5228.518 ms
query_b: 5708.962 ms
query_c: 538.329 ms
So, about a 10x increase in performance, give or take.
I'm going to alter the data that's generated in events, and this alteration takes quite a while. You would not need to do this in practice, as your INSERTs to the tables would be covered already.
For my optimisation, the first step is to create a table that houses dates and then transfer the data over, and relate back to it in the events table, like so:
CREATE TABLE dates (
id SERIAL,
year_part INTEGER NOT NULL,
month_part INTEGER NOT NULL,
day_part INTEGER NOT NULL
);
-- Total runtime: 8.281 ms
INSERT INTO dates(year_part, month_part, day_part) SELECT DISTINCT
EXTRACT(YEAR FROM t), EXTRACT(MONTH FROM t), EXTRACT(DAY FROM t)
FROM events;
-- Total runtime: 12802.900 ms
CREATE INDEX dates_ymd ON dates USING btree(year_part, month_part, day_part);
-- Total runtime: 13.750 ms
ALTER TABLE events ADD COLUMN date_id INTEGER;
-- Total runtime: 2.468ms
UPDATE events SET date_id = dates.id
FROM dates
WHERE EXTRACT(YEAR FROM t) = dates.year_part
AND EXTRACT(MONTH FROM t) = dates.month_part
AND EXTRACT(DAY FROM T) = dates.day_part
;
-- Total runtime: 388024.520 ms
Next, we do the same, but with the key pair (pr_id, pa_id), which doesn't reduce the cardinality too much, but when we're talking large sets it can help with memory usage and swapping in and out:
CREATE TABLE prpa (
id SERIAL,
pr_id TEXT NOT NULL,
pa_id TEXT NOT NULL
);
-- Total runtime: 5.451 ms
CREATE INDEX events_prpa ON events USING btree(pr_id, pa_id);
-- Total runtime: 218,908.894 ms
INSERT INTO prpa(pr_id, pa_id) SELECT DISTINCT pr_id, pa_id FROM events;
-- Total runtime: 5566.760 ms
CREATE INDEX prpa_idx ON prpa USING btree(pr_id, pa_id);
-- Total runtime: 84185.057 ms
ALTER TABLE events ADD COLUMN prpa_id INTEGER;
-- Total runtime: 2.067 ms
UPDATE events SET prpa_id = prpa.id
FROM prpa
WHERE events.pr_id = prpa.pr_id
AND events.pa_id = prpa.pa_id;
-- Total runtime: 757915.192
DROP INDEX events_prpa;
-- Total runtime: 1041.556 ms
Finally, let's get rid of the old indexes and the now defunct columns, and then vacuum up the new tables:
DROP INDEX events_idx;
-- Total runtime: 1139.508 ms
ALTER TABLE events
DROP COLUMN pr_id,
DROP COLUMN pa_id
;
-- Total runtime: 5.376 ms
VACUUM ANALYSE prpa;
-- Total runtime: 1030.142
VACUUM ANALYSE dates;
-- Total runtime: 6652.151
So we now have the following tables:
events (c_id, r, t, prpa_id, date_id)
dates (id, year_part, month_part, day_part)
prpa (id, pr_id, pa_id)
Let's toss on an index now, pushing t DESC to the end where it belongs, which we can do now because we're filtering results on dates before ORDERing, which cuts down the need for t DESC to be so prominent in the index:
CREATE INDEX events_idx_new ON events USING btree (c_id, date_id, prpa_id, t DESC);
-- Total runtime: 27697.795
VACUUM ANALYSE events;
Now we rewrite the query, (using a table to store intermediary results - I find this works well with large datasets) and awaaaaaay we go!
DROP TABLE IF EXISTS temp_results;
SELECT DISTINCT ON (prpa_id)
prpa_id,
r
INTO TEMPORARY temp_results
FROM events
INNER JOIN dates
ON dates.id = events.date_id
WHERE c_id = 5
AND dates.year_part BETWEEN 2017 AND 2017
AND dates.month_part BETWEEN 1 AND 1
AND dates.day_part BETWEEN 3 AND 5
ORDER BY prpa_id, t DESC;
SELECT
prpa.pr_id,
r,
count(1) AS quantity
FROM temp_results
INNER JOIN prpa ON prpa.id = temp_results.prpa_id
GROUP BY
1,
2
ORDER BY 3, 2, 1 DESC;
-- Total runtime: 1233.281 ms
So, not a 10x increase in performance, but 4x which is still alright.
This solution is a combination of a couple of techniques I have found work well with large datasets and with date ranges. Even if it's not good enough for your purposes, there might be some gems in here you can repurpose throughout your career.
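The shape of this normalisation can be compressed into a toy SQLite sketch: split (pr_id, pa_id) out into a surrogate-keyed prpa table, resolve the latest r per prpa_id, then join back for the final grouping. Table and column names follow the answer, the data is made up, and the bare-column-with-MAX trick in the CTE is SQLite-specific shorthand for "r from the row with the latest t".

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE raw (c_id INT, pr_id TEXT, pa_id TEXT, t TEXT, r TEXT);
    INSERT INTO raw VALUES
        (5, 'p1', 'a1', '2017-01-03', 'x'),
        (5, 'p1', 'a1', '2017-01-04', 'y'),
        (5, 'p1', 'a2', '2017-01-05', 'y'),
        (5, 'p2', 'a3', '2017-01-04', 'z');

    -- Surrogate-key dimension table for the (pr_id, pa_id) pairs.
    CREATE TABLE prpa (id INTEGER PRIMARY KEY, pr_id TEXT, pa_id TEXT);
    INSERT INTO prpa(pr_id, pa_id) SELECT DISTINCT pr_id, pa_id FROM raw;

    -- Fact table keeps only the narrow surrogate key.
    CREATE TABLE events AS
        SELECT raw.c_id, raw.t, raw.r, prpa.id AS prpa_id
        FROM raw JOIN prpa
          ON raw.pr_id = prpa.pr_id AND raw.pa_id = prpa.pa_id;
""")

got = con.execute("""
    WITH latest AS (
        SELECT prpa_id, r, MAX(t) FROM events
        WHERE c_id = 5 GROUP BY prpa_id)
    SELECT prpa.pr_id, latest.r, COUNT(*) AS quantity
    FROM latest JOIN prpa ON prpa.id = latest.prpa_id
    GROUP BY 1, 2 ORDER BY 3, 2, 1 DESC
""").fetchall()
assert got == [('p2', 'z', 1), ('p1', 'y', 2)], got
print(got)
```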
EDIT:
EXPLAIN ANALYSE on SELECT INTO query:
Unique (cost=171839.95..172360.53 rows=51332 width=16) (actual time=819.385..857.777 rows=117471 loops=1)
-> Sort (cost=171839.95..172100.24 rows=104117 width=16) (actual time=819.382..836.924 rows=155202 loops=1)
Sort Key: events.prpa_id, events.t
Sort Method: external sort Disk: 3944kB
-> Hash Join (cost=14340.24..163162.92 rows=104117 width=16) (actual time=126.929..673.293 rows=155202 loops=1)
Hash Cond: (events.date_id = dates.id)
-> Bitmap Heap Scan on events (cost=14338.97..160168.28 rows=520585 width=20) (actual time=126.572..575.852 rows=516503 loops=1)
Recheck Cond: (c_id = 5)
Heap Blocks: exact=29610
-> Bitmap Index Scan on events_idx2 (cost=0.00..14208.82 rows=520585 width=0) (actual time=118.769..118.769 rows=516503 loops=1)
Index Cond: (c_id = 5)
-> Hash (cost=1.25..1.25 rows=2 width=4) (actual time=0.326..0.326 rows=3 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Seq Scan on dates (cost=0.00..1.25 rows=2 width=4) (actual time=0.320..0.323 rows=3 loops=1)
Filter: ((year_part >= 2017) AND (year_part <= 2017) AND (month_part >= 1) AND (month_part <= 1) AND (day_part >= 3) AND (day_part <= 5))
Rows Removed by Filter: 7
Planning time: 3.091 ms
Execution time: 913.543 ms
EXPLAIN ANALYSE on SELECT query:
(Note: I had to alter the first query to select into an actual table, not a temporary table, in order to get the query plan for this one. AFAIK EXPLAIN ANALYSE only works on single queries.)
Sort (cost=89590.66..89595.66 rows=2000 width=15) (actual time=1248.535..1248.537 rows=30 loops=1)
Sort Key: (count(1)), temp_results.r, prpa.pr_id
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=89461.00..89481.00 rows=2000 width=15) (actual time=1248.460..1248.468 rows=30 loops=1)
Group Key: prpa.pr_id, temp_results.r
-> Hash Join (cost=73821.20..88626.40 rows=111280 width=15) (actual time=798.861..1213.494 rows=117471 loops=1)
Hash Cond: (temp_results.prpa_id = prpa.id)
-> Seq Scan on temp_results (cost=0.00..1632.80 rows=111280 width=8) (actual time=0.024..17.401 rows=117471 loops=1)
-> Hash (cost=36958.31..36958.31 rows=2120631 width=15) (actual time=798.484..798.484 rows=2120631 loops=1)
Buckets: 16384 Batches: 32 Memory Usage: 3129kB
-> Seq Scan on prpa (cost=0.00..36958.31 rows=2120631 width=15) (actual time=0.126..350.664 rows=2120631 loops=1)
Planning time: 1.073 ms
Execution time: 1248.660 ms

Postgresql JSON index strange query time

Let's say we have a table like this:
CREATE TABLE user_device_infos
(
id integer NOT NULL DEFAULT nextval('user_device_infos_id_seq1'::regclass),
user_id integer,
data jsonb,
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL,
CONSTRAINT user_device_infos_pkey PRIMARY KEY (id),
CONSTRAINT fk_rails_e4001464ba FOREIGN KEY (user_id)
REFERENCES public.users (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
CREATE INDEX index_user_device_infos_imei_user_id
ON public.user_device_infos
USING btree
(((data -> 'Network'::text) ->> 'IMEI No'::text) COLLATE pg_catalog."default", user_id);
CREATE INDEX index_user_device_infos_on_user_id
ON public.user_device_infos
USING btree
(user_id);
Now I try to select the user_id of the first device with a given IMEI:
SELECT user_id FROM user_device_infos WHERE (data->'Network'->>'IMEI No' = 'xxxx') order by id LIMIT 1
This query takes 5 seconds on my table (152,000 entries).
But if I write
SELECT user_id FROM user_device_infos WHERE (data->'Network'->>'IMEI No' = 'xxxx') order by created_at asc LIMIT 1
SELECT user_id FROM user_device_infos WHERE (data->'Network'->>'IMEI No' = 'xxxx') order by created_at desc LIMIT 1
the query takes less than 1 ms.
Why is this query so much faster than the first variant? There are no indexes on created_at, and id is the primary key.
Update:
As suggested in the comments, I ran EXPLAIN ANALYZE, but I still don't understand what is wrong with the "order by id" query. Sorry, I am not an SQL dev.
# explain analyze SELECT user_id FROM user_device_infos WHERE (data->'Network'->>'IMEI No' = 'xxxx') order by id LIMIT 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..416.84 rows=1 width=8) (actual time=5289.784..5289.784 rows=0 loops=1)
-> Index Scan using user_device_infos_pkey on user_device_infos (cost=0.42..316483.14 rows=760 width=8) (actual time=5289.782..5289.782 rows=0 loops=1)
Filter: (((data -> 'Network'::text) ->> 'IMEI No'::text) = 'xxxx'::text)
Rows Removed by Filter: 152437
Planning time: 0.153 ms
Execution time: 5289.817 ms
(6 rows)
# explain analyze SELECT user_id FROM user_device_infos WHERE (data->'Network'->>'IMEI No' = 'xxxx') order by created_at LIMIT 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2823.73..2823.74 rows=1 width=12) (actual time=0.064..0.064 rows=0 loops=1)
-> Sort (cost=2823.73..2825.63 rows=760 width=12) (actual time=0.062..0.062 rows=0 loops=1)
Sort Key: created_at
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on user_device_infos (cost=22.31..2819.93 rows=760 width=12) (actual time=0.039..0.039 rows=0 loops=1)
Recheck Cond: (((data -> 'Network'::text) ->> 'IMEI No'::text) = 'xxxx'::text)
-> Bitmap Index Scan on index_user_device_infos_imei_user_id (cost=0.00..22.12 rows=760 width=0) (actual time=0.037..0.037 rows=0 loops=1)
Index Cond: (((data -> 'Network'::text) ->> 'IMEI No'::text) = 'xxxx'::text)
Planning time: 0.144 ms
Execution time: 0.092 ms
(10 rows)