Improve PostgreSQL aggregation query performance - sql

I am aggregating data from a Postgres table. The query takes approximately 2 seconds, which I want to reduce to less than a second.
Please find below the execution details:
Query
select
a.search_keyword,
hll_cardinality( hll_union_agg(a.users) ):: int as user_count,
hll_cardinality( hll_union_agg(a.sessions) ):: int as session_count,
sum(a.total) as keyword_count
from
rollup_day a
where
a.created_date between '2018-09-01' and '2019-09-30'
and a.tenant_id = '62850a62-19ac-477d-9cd7-837f3d716885'
group by
a.search_keyword
order by
session_count desc
limit 100;
Table metadata
Total number of rows - 506527
Composite index on columns: tenant_id and created_date
Query plan
Custom Scan (cost=0.00..0.00 rows=0 width=0) (actual time=1722.685..1722.694 rows=100 loops=1)
Task Count: 1
Tasks Shown: All
-> Task
Node: host=localhost port=5454 dbname=postgres
-> Limit (cost=64250.24..64250.49 rows=100 width=42) (actual time=1783.087..1783.106 rows=100 loops=1)
-> Sort (cost=64250.24..64558.81 rows=123430 width=42) (actual time=1783.085..1783.093 rows=100 loops=1)
Sort Key: ((hll_cardinality(hll_union_agg(sessions)))::integer) DESC
Sort Method: top-N heapsort Memory: 33kB
-> GroupAggregate (cost=52933.89..59532.83 rows=123430 width=42) (actual time=905.502..1724.363 rows=212633 loops=1)
Group Key: search_keyword
-> Sort (cost=52933.89..53636.53 rows=281055 width=54) (actual time=905.483..1351.212 rows=280981 loops=1)
Sort Key: search_keyword
Sort Method: external merge Disk: 18496kB
-> Seq Scan on rollup_day a (cost=0.00..17890.22 rows=281055 width=54) (actual time=29.720..112.161 rows=280981 loops=1)
Filter: ((created_date >= '2018-09-01'::date) AND (created_date <= '2019-09-30'::date) AND (tenant_id = '62850a62-19ac-477d-9cd7-837f3d716885'::uuid))
Rows Removed by Filter: 225546
Planning Time: 0.129 ms
Execution Time: 1786.222 ms
Planning Time: 0.103 ms
Execution Time: 1722.718 ms
What I've tried
I've tried indexes on tenant_id and created_date, but as the data is huge, it always does a sequential scan rather than an index scan for the filters. I've read about it and found that the Postgres query planner switches to a sequential scan if the data returned is more than 5-10% of the total rows. Please follow the link for more reference.
I've increased work_mem to 100MB, but it only improved the performance a little.
Any help would be really appreciated.
Update
Query plan after setting work_mem to 100MB
Custom Scan (cost=0.00..0.00 rows=0 width=0) (actual time=1375.926..1375.935 rows=100 loops=1)
Task Count: 1
Tasks Shown: All
-> Task
Node: host=localhost port=5454 dbname=postgres
-> Limit (cost=48348.85..48349.10 rows=100 width=42) (actual time=1307.072..1307.093 rows=100 loops=1)
-> Sort (cost=48348.85..48633.55 rows=113880 width=42) (actual time=1307.071..1307.080 rows=100 loops=1)
Sort Key: (sum(total)) DESC
Sort Method: top-N heapsort Memory: 35kB
-> GroupAggregate (cost=38285.79..43996.44 rows=113880 width=42) (actual time=941.504..1261.177 rows=172945 loops=1)
Group Key: search_keyword
-> Sort (cost=38285.79..38858.52 rows=229092 width=54) (actual time=941.484..963.061 rows=227261 loops=1)
Sort Key: search_keyword
Sort Method: quicksort Memory: 32982kB
-> Seq Scan on rollup_day_104290 a (cost=0.00..17890.22 rows=229092 width=54) (actual time=38.803..104.350 rows=227261 loops=1)
Filter: ((created_date >= '2019-01-01'::date) AND (created_date <= '2019-12-30'::date) AND (tenant_id = '62850a62-19ac-477d-9cd7-837f3d716885'::uuid))
Rows Removed by Filter: 279266
Planning Time: 0.131 ms
Execution Time: 1308.814 ms
Planning Time: 0.112 ms
Execution Time: 1375.961 ms
Update 2
After creating an index on created_date and increasing work_mem to 120MB
create index date_idx on rollup_day(created_date);
The total number of rows is: 12,124,608
Query Plan is:
Custom Scan (cost=0.00..0.00 rows=0 width=0) (actual time=2635.530..2635.540 rows=100 loops=1)
Task Count: 1
Tasks Shown: All
-> Task
Node: host=localhost port=9702 dbname=postgres
-> Limit (cost=73545.19..73545.44 rows=100 width=51) (actual time=2755.849..2755.873 rows=100 loops=1)
-> Sort (cost=73545.19..73911.25 rows=146424 width=51) (actual time=2755.847..2755.858 rows=100 loops=1)
Sort Key: (sum(total)) DESC
Sort Method: top-N heapsort Memory: 35kB
-> GroupAggregate (cost=59173.97..67948.97 rows=146424 width=51) (actual time=2014.260..2670.732 rows=296537 loops=1)
Group Key: search_keyword
-> Sort (cost=59173.97..60196.85 rows=409152 width=55) (actual time=2013.885..2064.775 rows=410618 loops=1)
Sort Key: search_keyword
Sort Method: quicksort Memory: 61381kB
-> Index Scan using date_idx_102913 on rollup_day_102913 a (cost=0.42..21036.35 rows=409152 width=55) (actual time=0.026..183.370 rows=410618 loops=1)
Index Cond: ((created_date >= '2018-01-01'::date) AND (created_date <= '2018-12-31'::date))
Filter: (tenant_id = '12850a62-19ac-477d-9cd7-837f3d716885'::uuid)
Planning Time: 0.135 ms
Execution Time: 2760.667 ms
Planning Time: 0.090 ms
Execution Time: 2635.568 ms

You should experiment with higher settings of work_mem until you get an in-memory sort. Of course you can only be generous with memory if your machine has enough of it.
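For example, at the session level (the value here is an arbitrary illustration; tune it to your machine and re-check the plan):

```sql
-- arbitrary example value; raise it until EXPLAIN ANALYZE shows
-- "Sort Method: quicksort" instead of "external merge Disk: ..."
SET work_mem = '256MB';
```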
What would make your query way faster is if you store pre-aggregated data, either using a materialized view or a second table and a trigger on your original table that keeps the sums in the other table updated. I don't know if that is possible with your data, as I don't know what hll_cardinality and hll_union_agg are.
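If pre-aggregation is possible, a sketch might look like this (assuming the hll values can be unioned again after storage, which is the usual property of HLL sketches; all names here are illustrative):

```sql
-- hypothetical pre-aggregation by month, so arbitrary month ranges
-- can still be queried while the per-day rows are collapsed
CREATE MATERIALIZED VIEW rollup_month AS
SELECT tenant_id,
       date_trunc('month', created_date) AS created_month,
       search_keyword,
       hll_union_agg(users)    AS users,
       hll_union_agg(sessions) AS sessions,
       sum(total)              AS total
FROM rollup_day
GROUP BY tenant_id, date_trunc('month', created_date), search_keyword;

-- refresh as often as your freshness requirements allow
REFRESH MATERIALIZED VIEW rollup_month;
```

The query would then union the monthly sketches instead of the daily ones, touching far fewer rows.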

Have you tried a covering index, so the optimizer will use the index instead of doing a sequential scan?
create index covering on rollup_day(tenant_id, created_date, search_keyword, users, sessions, total);
If you are on Postgres 11 or later:
create index covering on rollup_day(tenant_id, created_date) INCLUDE (search_keyword, users, sessions, total);
But since you also sort/group by search_keyword, maybe:
create index covering on rollup_day(tenant_id, created_date, search_keyword);
create index covering on rollup_day(tenant_id, search_keyword, created_date);
Or :
create index covering on rollup_day(tenant_id, created_date, search_keyword) INCLUDE (users, sessions, total);
create index covering on rollup_day(tenant_id, search_keyword, created_date) INCLUDE (users, sessions, total);
One of these indexes should make the query faster. You should only add one of these indexes.
Even if one of them makes this query faster, big indexes will/might make your write operations slower (especially since HOT updates are not available on indexed columns), and you will use more storage.
The idea came from here; there is also a hint about sizing work_mem.
Another example where the index was not used

Use table partitions and create a composite index; it will bring down the total cost because:
it will save a huge cost on scans for you;
partitions will segregate data and be very helpful for future purge operations as well.
I have personally tried and tested table partitions in such cases, and the throughput is amazing with the combination of partitions and composite indexes.
Partitioning can be done on the range of created_date, with composite indexes on date and tenant.
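As a sketch, declarative range partitioning (Postgres 10+; the hll column type and partition bounds are assumptions based on the query above):

```sql
CREATE TABLE rollup_day_part (
    tenant_id      uuid,
    created_date   date,
    search_keyword text,
    users          hll,
    sessions       hll,
    total          bigint
) PARTITION BY RANGE (created_date);

CREATE TABLE rollup_day_2018 PARTITION OF rollup_day_part
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');
CREATE TABLE rollup_day_2019 PARTITION OF rollup_day_part
    FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');

-- composite index on tenant & date; on Postgres 11+ this cascades
-- to every partition automatically
CREATE INDEX ON rollup_day_part (tenant_id, created_date);
```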
Remember that you can always have a composite index with a condition in it (a partial index) if your query has a very specific requirement. That way the data is already sorted in the index, which also saves a huge cost on sort operations.
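For example, a partial ("conditional") composite index; the predicate below is purely illustrative, for the case where one tenant dominates the workload:

```sql
-- only rows for this tenant are indexed, so the index stays small
-- and rows arrive pre-sorted for the filter and group by
CREATE INDEX rollup_day_hot_tenant
    ON rollup_day (created_date, search_keyword)
    WHERE tenant_id = '62850a62-19ac-477d-9cd7-837f3d716885';
```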
Hope this helps.
PS: Also, is it possible to share any test sample data for the same?

My suggestion would be to break up the select.
In combination with this, I would also try setting up 2 indices on the table: one on the dates, the other on the ID. One of the problems with weird IDs is that they take time to compare and can be treated as string comparisons in the background. That's why I break the query up: to pre-filter the data before the between condition is evaluated. A between condition can make a select slow, so I would suggest breaking it up into 2 selects and an inner join (I know the memory consumption is a problem).
Here is an example of what I mean. I hope the optimizer is smart enough to restructure your query.
SELECT
t1.search_keyword,
hll_cardinality( hll_union_agg(t1.users) ):: int as user_count,
hll_cardinality( hll_union_agg(t1.sessions) ):: int as session_count,
sum(t1.total) as keyword_count
FROM
(SELECT
*
FROM
rollup_day a
WHERE
a.tenant_id = '62850a62-19ac-477d-9cd7-837f3d716885') t1
WHERE
t1.created_date between '2018-09-01' and '2019-09-30'
group by
t1.search_keyword
order by
session_count desc
Now, if this does not work, you need more specific optimizations. For example: can total be equal to 0? Then you need a filtered (partial) index on the data where total > 0. Are there any other criteria that make it easy to exclude rows from the select?
The next consideration would be to create a column with a short ID (instead of 62850a62-19ac-477d-9cd7-837f3d716885 -> 62850); that can be a number, which makes pre-selection very easy and memory consumption lower.
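A sketch of that short-key idea (the mapping table and column names are hypothetical):

```sql
-- map each 16-byte uuid tenant to a 4-byte integer key
CREATE TABLE tenant_map (
    tenant_key serial PRIMARY KEY,
    tenant_id  uuid UNIQUE NOT NULL
);

ALTER TABLE rollup_day ADD COLUMN tenant_key integer;

UPDATE rollup_day r
SET    tenant_key = m.tenant_key
FROM   tenant_map m
WHERE  m.tenant_id = r.tenant_id;

-- integer comparisons are cheaper and the index is smaller
CREATE INDEX ON rollup_day (tenant_key, created_date);
```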

Related

Small result query with LIMIT 1000x slower than queries with >100 rows

I am trying to debug a query that runs faster the more records it returns, but whose performance severely degrades (>10x slower) with smaller result sets (i.e. <10 rows) using a small LIMIT (e.g. 10).
Example:
Fast query with 5 results out of 1M rows - no LIMIT
SELECT *
FROM transaction_internal_by_addresses
WHERE address = 'foo'
ORDER BY block_number desc;
Explain:
Sort (cost=7733.14..7749.31 rows=6468 width=126) (actual time=0.030..0.031 rows=5 loops=1)
Output: address, block_number, log_index, transaction_hash
Sort Key: transaction_internal_by_addresses.block_number
Sort Method: quicksort Memory: 26kB
Buffers: shared hit=10
-> Index Scan using transaction_internal_by_addresses_pkey on public.transaction_internal_by_addresses (cost=0.69..7323.75 rows=6468 width=126) (actual time=0.018..0.021 rows=5 loops=1)
Output: address, block_number, log_index, transaction_hash
Index Cond: (transaction_internal_by_addresses.address = 'foo'::text)
Buffers: shared hit=10
Query Identifier: -8912211611755432198
Planning Time: 0.051 ms
Execution Time: 0.041 ms
Fast query with 5 results out of 1M rows: - High LIMIT
SELECT *
FROM transaction_internal_by_addresses
WHERE address = 'foo'
ORDER BY block_number desc
LIMIT 100;
Limit (cost=7570.95..7571.20 rows=100 width=126) (actual time=0.024..0.025 rows=5 loops=1)
Output: address, block_number, log_index, transaction_hash
Buffers: shared hit=10
-> Sort (cost=7570.95..7587.12 rows=6468 width=126) (actual time=0.023..0.024 rows=5 loops=1)
Output: address, block_number, log_index, transaction_hash
Sort Key: transaction_internal_by_addresses.block_number DESC
Sort Method: quicksort Memory: 26kB
Buffers: shared hit=10
-> Index Scan using transaction_internal_by_addresses_pkey on public.transaction_internal_by_addresses (cost=0.69..7323.75 rows=6468 width=126) (actual time=0.016..0.020 rows=5 loops=1)
Output: address, block_number, log_index, transaction_hash
Index Cond: (transaction_internal_by_addresses.address = 'foo'::text)
Buffers: shared hit=10
Query Identifier: 3421253327669991203
Planning Time: 0.042 ms
Execution Time: 0.034 ms
Slow query: - Low LIMIT
SELECT *
FROM transaction_internal_by_addresses
WHERE address = 'foo'
ORDER BY block_number desc
LIMIT 10;
Explain result:
Limit (cost=1000.63..6133.94 rows=10 width=126) (actual time=10277.845..11861.269 rows=0 loops=1)
Output: address, block_number, log_index, transaction_hash
Buffers: shared hit=56313576
-> Gather Merge (cost=1000.63..3333036.90 rows=6491 width=126) (actual time=10277.844..11861.266 rows=0 loops=1)
Output: address, block_number, log_index, transaction_hash
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=56313576
-> Parallel Index Scan Backward using transaction_internal_by_address_idx_block_number on public.transaction_internal_by_addresses (cost=0.57..3331263.70 rows=1623 width=126) (actual time=10256.995..10256.995 rows=0 loops=5)
Output: address, block_number, log_index, transaction_hash
Filter: (transaction_internal_by_addresses.address = 'foo'::text)
Rows Removed by Filter: 18485480
Buffers: shared hit=56313576
Worker 0: actual time=10251.822..10251.823 rows=0 loops=1
Buffers: shared hit=11387166
Worker 1: actual time=10250.971..10250.972 rows=0 loops=1
Buffers: shared hit=10215941
Worker 2: actual time=10252.269..10252.269 rows=0 loops=1
Buffers: shared hit=10191990
Worker 3: actual time=10252.513..10252.514 rows=0 loops=1
Buffers: shared hit=10238279
Query Identifier: 2050754902087402293
Planning Time: 0.081 ms
Execution Time: 11861.297 ms
DDL
create table transaction_internal_by_addresses
(
address text not null,
block_number bigint,
log_index bigint not null,
transaction_hash text not null,
primary key (address, log_index, transaction_hash)
);
alter table transaction_internal_by_addresses
owner to "icon-worker";
create index transaction_internal_by_address_idx_block_number
on transaction_internal_by_addresses (block_number);
So my questions
Should I just be looking at ways to force the query planner to apply the WHERE on the address (primary key)?
As you can see in the explain, the row block_number is scanned in the slow query but I am not sure why. Can anyone explain?
Is this normal? Seems like the more data, the harder the query, not the other way around as in this case.
Update
Apologies for A, the delay in responding and B, some of the inconsistencies in this question.
I have updated the EXPLAIN clearly showing the 1000x performance degradation
A multicolumn BTREE index on (address, block_number DESC) is exactly what the query planner needs to generate the result sets you mentioned. It will random-access the index to the first eligible row, then read the rows out in sequential order until it hits the LIMIT. You can also omit the DESC with no ill effects.
create index address_block_number
on transaction_internal_by_addresses
(address, block_number DESC);
As for asking "why" about query planner results, that's often an enduring mystery.
Sub-millisecond differences are hardly predictable, so you're pretty much staring at noise: random, minuscule differences caused by other things happening on the system. Your fastest query runs in tens of microseconds, the slowest in a single millisecond; all of these are below typical network, mouse-click, or screen-refresh latencies.
The planner already applies a where on your address: Index Cond: (a_table.address = 'foo'::text)
You're ordering by block_number, so it makes sense to scan it. It's also taking place in all three of your queries because they all do that.
It is normal - here's an online demo with similar differences. If what you're after is some reliable time estimation, use pgbench to run your queries multiple times and average out the timings.
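For example, assuming the query is saved in a file called query.sql (-n skips vacuuming pgbench's own tables, -f supplies a custom script, -T sets the duration in seconds):

```shell
# run the query repeatedly for 30 seconds and report average latency
pgbench -n -f query.sql -T 30 postgres
```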
Your third query plan seems to be coming from a different query, against a different table: a_table, compared to the initial two: transaction_internal_by_addresses.
If you were just wondering why these timings look like that, it's pretty much random and/or irrelevant at this level. If you're facing some kind of problem because of how these queries behave, it'd be better to focus on describing that problem - the queries themselves all do the same thing and the difference in their execution times is negligible.
Should I just be looking at ways to force the query planner to apply the WHERE on the address (primary key)?
Yes, it can improve performance.
As you can see in the explain, the row block_number is scanned in the slow query but I am not sure why. Can anyone explain?
Because the sort keys are different. Look carefully:
Sort Key: transaction_internal_by_addresses.block_number DESC
Sort Key: a_table.a_indexed_row DESC
It seems a_table.a_indexed_row is less performant (e.g. more columns, a more complex structure, etc.)
Is this normal? Seems like the more data, the harder the query, not the other way around as in this case.
Normally, processing more data takes more time. But as I mentioned above, maybe a_table.a_indexed_row returns more values, has more columns, etc.

Postgres: which index to add

I have a table mainly used by this query (only 3 columns are in use here: meter, timeStampUtc and createdOnUtc, but there are others in the table), which is starting to take too long:
select
rank() over (order by mr.meter, mr."timeStampUtc") as row_name
, max(mr."createdOnUtc") over (partition by mr.meter, mr."timeStampUtc") as "createdOnUtc"
from
"MeterReading" mr
where
"createdOnUtc" >= '2021-01-01'
order by row_name
;
(this is the minimal query to show my issue. It might not make too much sense on its own, or could be rewritten)
I am wondering which index (or other technique) to use to optimise this particular query.
A basic index on createdOnUtc helps already.
I am mostly wondering about those 2 window functions. They are very similar, so I factorised them (a named window with identical partition by and order by); it had no effect. Adding an index on (meter, "timeStampUtc") had no effect either (query plan unchanged).
Is there no way to use an index on those 2 columns inside a window function?
Edit - explain analyze output: using the createdOnUtc index
Sort (cost=8.51..8.51 rows=1 width=40) (actual time=61.045..62.222 rows=26954 loops=1)
Sort Key: (rank() OVER (?))
Sort Method: quicksort Memory: 2874kB
-> WindowAgg (cost=8.46..8.50 rows=1 width=40) (actual time=18.373..57.892 rows=26954 loops=1)
-> WindowAgg (cost=8.46..8.48 rows=1 width=40) (actual time=18.363..32.444 rows=26954 loops=1)
-> Sort (cost=8.46..8.46 rows=1 width=32) (actual time=18.353..19.663 rows=26954 loops=1)
Sort Key: meter, "timeStampUtc"
Sort Method: quicksort Memory: 2874kB
-> Index Scan using "MeterReading_createdOnUtc_idx" on "MeterReading" mr (cost=0.43..8.45 rows=1 width=32) (actual time=0.068..8.059 rows=26954 loops=1)
Index Cond: ("createdOnUtc" >= '2021-01-01 00:00:00'::timestamp without time zone)
Planning Time: 0.082 ms
Execution Time: 63.698 ms
Is there no way to use an index on those 2 columns inside a window function?
That is correct; a window function cannot use an index, as window functions operate only on what would otherwise be the final result: by that point, all data selection has already finished. From the documentation:
The rows considered by a window function are those of the “virtual
table” produced by the query's FROM clause as filtered by its WHERE,
GROUP BY, and HAVING clauses if any. For example, a row removed
because it does not meet the WHERE condition is not seen by any window
function. A query can contain multiple window functions that slice up
the data in different ways using different OVER clauses, but they all
act on the same collection of rows defined by this virtual table.
The purpose of an index is to speed up the creation of that "virtual table". Applying an index afterwards would just slow things down: the data is already in memory, and scanning it there is orders of magnitude faster than any index.
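If one row per (meter, "timeStampUtc") pair is acceptable, a grouped rewrite moves the max() out of the window layer entirely. Note this is a sketch, not a drop-in replacement: the original query returns one row per input row, while this returns one row per pair:

```sql
SELECT rank() OVER (ORDER BY meter, "timeStampUtc") AS row_name,
       max("createdOnUtc")                          AS "createdOnUtc"
FROM   "MeterReading"
WHERE  "createdOnUtc" >= '2021-01-01'
GROUP  BY meter, "timeStampUtc"
ORDER  BY row_name;
```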

Difference between ANY(ARRAY[..]) vs ANY(VALUES (), () ..) in PostgreSQL

I am trying to work out query optimisation on id, and am not sure which way I should use. Below are the query plans using explain; cost-wise they look similar.
1. explain (analyze, buffers) SELECT * FROM table1 WHERE id = ANY (ARRAY['00e289b0-1ac8-451f-957f-e00bc289148e'::uuid,...]);
QUERY PLAN:
Index Scan using table1_pkey on table1 (cost=0.42..641.44 rows=76 width=835) (actual time=0.258..2.603 rows=76 loops=1)
Index Cond: (id = ANY ('{00e289b0-1ac8-451f-957f-e00bc289148e,...}'::uuid[]))
Buffers: shared hit=231 read=73
Planning Time: 0.487 ms
Execution Time: 2.715 ms
2. explain (analyze, buffers) SELECT * FROM table1 WHERE id = ANY (VALUES ('00e289b0-1ac8-451f-957f-e00bc289148e'::uuid),...);
QUERY PLAN:
Nested Loop (cost=1.56..644.10 rows=76 width=835) (actual time=0.058..0.297 rows=76 loops=1)
Buffers: shared hit=304
-> HashAggregate (cost=1.14..1.90 rows=76 width=16) (actual time=0.049..0.060 rows=76 loops=1)
Group Key: "*VALUES*".column1
-> Values Scan on "*VALUES*" (cost=0.00..0.95 rows=76 width=16) (actual time=0.006..0.022 rows=76 loops=1)
-> Index Scan using table1_pkey on table1 (cost=0.42..8.44 rows=1 width=835) (actual time=0.002..0.003 rows=1 loops=76)
Index Cond: (id = "*VALUES*".column1)
Buffers: shared hit=304
Planning Time: 0.437 ms
Execution Time: 0.389 ms
It looks like VALUES () does some hashing and a join to improve performance, but I am not sure.
NOTE: In my practical use case, id is uuid_generate_v4(), e.g. d31cddc0-1771-4de8-ad41-e6c568b39a5d, but the column may not be indexed as such.
Also, I have a table with 5-10 million records.
Which way gives better query performance?
Both options seem reasonable. I would, however, suggest avoiding a cast on the column you filter on. Instead, cast the literal values to uuid:
SELECT *
FROM table1
WHERE id = ANY (ARRAY['00e289b0-1ac8-451f-957f-e00bc289148e'::uuid, ...]);
This should allow the database to take advantage of an index on column id.

Simple aggregate query running slow

I am trying to determine why a fairly simple aggregate query is taking so long to perform on a single table. The table is called plots, and its columns are [id, device_id, time, ...]. There are two indices: UNIQUE(id) and UNIQUE(device_id, time).
The query is simply:
SELECT device_id, MIN(time)
FROM plots
GROUP BY device_id
To me, this should be very fast, but it is taking 3+ minutes. The table has ~45 million rows, divided roughly equally among 1200 or so device_ids.
EXPLAIN for the query:
Finalize GroupAggregate (cost=1502955.41..1503055.97 rows=906 width=12)
Group Key: device_id
-> Gather Merge (cost=1502955.41..1503052.35 rows=906 width=12)
Workers Planned: 1
-> Sort (cost=1501955.41..1501955.86 rows=906 width=12)
Sort Key: device_id
-> Partial HashAggregate (cost=1501943.79..1501946.51 rows=906 width=12)
Group Key: device_id
-> Parallel Seq Scan on plots (cost=0.00..1476417.34 rows=25526447 width=12)
EXPLAIN for query with a where device_id = xxx:
GroupAggregate (cost=398.86..78038.77 rows=906 width=12)
Group Key: device_id
-> Bitmap Heap Scan on plots (cost=398.86..77992.99 rows=43065 width=12)
Recheck Cond: (device_id = 6780)
-> Bitmap Index Scan on index_plots_on_device_id_and_time (cost=0.00..396.71 rows=43065 width=0)
Index Cond: (device_id = 6780)
I have done VACUUM (FULL, ANALYZE) as well as REINDEX DATABASE.
I have also tried doing partition queries to accomplish the same.
Any pointers on making this faster? Or am I just boned by the table size? It seems like it should be fine with the index, though. Maybe I am missing something...
EDIT / UPDATE:
The problem seems to be resolved at this point, though I am not sure why. I have dropped and rebuilt the index many times, and suddenly the query is only taking ~7 seconds, which is acceptable. Of note, this morning I dropped the index and created a new one with the reverse column order (time, device_id) and I was surprised to see good results. I then reverted to the previous index, and the results were improved further. I will refork the production database and try to retrace my steps and post an update. Should I be worried about the query planner wonking out in the future?
Current EXPLAIN with analysis (as requested):
Finalize GroupAggregate (cost=1000.12..480787.58 rows=905 width=12) (actual time=36.299..7530.403 rows=916 loops=1)
Group Key: device_id
Buffers: shared hit=135087 read=40325
I/O Timings: read=138.419
-> Gather Merge (cost=1000.12..480783.96 rows=905 width=12) (actual time=36.226..7552.052 rows=1829 loops=1)
Workers Planned: 1
Workers Launched: 1
Buffers: shared hit=509502 read=160807
I/O Timings: read=639.797
-> Partial GroupAggregate (cost=0.11..479687.58 rows=905 width=12) (actual time=15.779..5026.094 rows=914 loops=2)
Group Key: device_id
Buffers: shared hit=509502 read=160807
I/O Timings: read=639.797
-> Parallel Index Only Scan using index_plots_time_and_device_id on plots (cost=0.11..454158.41 rows=25526447 width=12) (actual time=0.033..2999.764 rows=21697480 loops=2)
Heap Fetches: 0
Buffers: shared hit=509502 read=160807
I/O Timings: read=639.797
Planning Time: 0.092 ms
Execution Time: 7554.100 ms
(19 rows)
Approach 1:
You can try changing your UNIQUE constraint into a plain index on your database. CREATE UNIQUE INDEX and CREATE INDEX have different behaviors. I believe you may get a benefit from CREATE INDEX.
Approach 2:
You can create a materialized view. If you can get some delay on your information, you can do the following:
CREATE MATERIALIZED VIEW myreport AS
SELECT device_id,
MIN(time) AS mintime
FROM plots
GROUP BY device_id;
CREATE UNIQUE INDEX myreport_device_id ON myreport(device_id);
Also, you need to remember to regularly do (note that REFRESH ... CONCURRENTLY requires a unique index on the materialized view):
REFRESH MATERIALIZED VIEW CONCURRENTLY myreport;
And less regularly do:
VACUUM ANALYZE myreport

PostgreSQL copy big data from table to table

I have an architecture where data is sometimes copied to a temp table and then, on command, must be copied with a condition into one of the other tables; before this, counts, deletes and updates run based on object_id, which is the same in all tables.
The longest operation is the copying: it takes up to 10 minutes for 300,000 rows.
insert into t1 (t1_f1, t1_f2, name, value) SELECT DISTINCT ON (object_id) t1_f1, t1_f2, name, value from main_loadingprocessobjects where loading_process_id = 695 - this is an example.
Can I speed up the process? Or is this bad architecture that I have to change?
Some more details: the heap table can contain a lot of data; a copy can be several million rows. Some fields (used for counting or filtering) are indexed in the heap table and in the other tables.
And this is plan for not so big data
Insert on main_like (cost=2993.63..3115.51 rows=6094 width=797) (actual time=6143.194..6143.194 rows=0 loops=1)
-> Subquery Scan on "*SELECT*" (cost=2993.63..3115.51 rows=6094 width=797) (actual time=55.995..125.081 rows=6094 loops=1)
-> Unique (cost=2993.63..3024.10 rows=6094 width=796) (actual time=55.909..79.237 rows=6094 loops=1)
-> Sort (cost=2993.63..3008.86 rows=6094 width=796) (actual time=55.904..69.195 rows=6094 loops=1)
Sort Key: main_loadingprocessobjects.object_id
Sort Method: quicksort Memory: 3321kB
-> Seq Scan on main_loadingprocessobjects (cost=0.00..465.02 rows=6094 width=796) (actual time=0.578..8.285 rows=6094 loops=1)
Filter: (loading_process_id = 695)
Rows Removed by Filter: 1428
Planning time: 0.394 ms
Execution time: 6143.631 ms
Explain without insert -
Unique (cost=2993.63..3024.10 rows=6094 width=796) (actual time=48.915..52.902 rows=6094 loops=1)
-> Sort (cost=2993.63..3008.86 rows=6094 width=796) (actual time=48.911..49.959 rows=6094 loops=1)
Sort Key: object_id
Sort Method: quicksort Memory: 3321kB
-> Seq Scan on main_loadingprocessobjects (cost=0.00..465.02 rows=6094 width=796) (actual time=0.401..5.516 rows=6094 loops=1)
Filter: (loading_process_id = 695)
Rows Removed by Filter: 1428
Planning time: 0.214 ms
Execution time: 53.694 ms
main_loadingprocessobjects - is heap
main_like - is t1
There are several points that may help with this issue:
A COPY statement in PostgreSQL is faster than an insert into ... select statement.
Create a composite index for the following query, e.g. on (type, category):
SELECT DISTINCT ON (object_id) t1_f1, t1_f2, name, value
from main_loadingprocessobjects
where type='ti' and category='added'
A GROUP BY statement is faster than a DISTINCT statement.
Increase temp_buffers in postgresql.conf if you see high usage of temp tables.
Try a CTE (Common Table Expression) instead of a temp table.
I hope these points help you in your future development.
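The CTE idea from the last point, sketched against the tables in the question (column names are taken from the example insert; the ORDER BY is needed to make DISTINCT ON deterministic):

```sql
WITH batch AS (
    SELECT DISTINCT ON (object_id)
           t1_f1, t1_f2, name, value
    FROM   main_loadingprocessobjects
    WHERE  loading_process_id = 695
    ORDER  BY object_id
)
INSERT INTO main_like (t1_f1, t1_f2, name, value)
SELECT t1_f1, t1_f2, name, value FROM batch;
```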