Postgres query becomes extremely slow with one single change

I have this query where, among other things, I need discussions.client_first_responded_at converted to a given time zone, which is different for every row. If I replace the reference to users.time_zone_offset with a static '-06:00'::INTERVAL, the query executes within a second; with the dynamic reference to users.time_zone_offset it takes ~120 seconds.
What am I missing?
SELECT client_id
FROM programs
INNER JOIN users ON users.id = programs.coach_id
WHERE programs.id IN (
  SELECT COALESCE(discussions.parent_id, calls.parent_id) AS program_id
  FROM categorizations
  LEFT OUTER JOIN discussions ON discussions.id = categorizations.categorizable_id AND categorizations.categorizable_type = 'Discussion'
  LEFT OUTER JOIN calls ON calls.id = categorizations.categorizable_id AND categorizations.categorizable_type = 'Call'
  WHERE categorizations.categorizable_type = 'Discussion' OR categorizations.categorizable_type = 'Call'
  AND COALESCE(
    discussions.client_first_responded_at::timestamptz AT TIME ZONE users.time_zone_offset::INTERVAL,
    calls.start_time::timestamptz
  ) BETWEEN '2020-09-01' AND '2020-09-30'
);
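The slow plan below shows why the dynamic reference hurts: referencing users.time_zone_offset correlates the IN-subquery to the outer query, so it becomes SubPlan 1 and is re-executed once per joined row (loops=7087). One possible rewrite — a sketch only, assuming the same schema as the question — is to flatten the subquery into plain joins so users is just another hash-joined relation. Note also that in the original WHERE, AND binds tighter than OR, so the date range only applied to the 'Call' branch (visible in the plan's Filter line); the sketch parenthesizes the presumably intended grouping.

```sql
-- Sketch: flatten the IN (...) subquery into joins so users.time_zone_offset
-- no longer forces a correlated SubPlan per outer row.
SELECT DISTINCT programs.client_id
FROM programs
INNER JOIN users ON users.id = programs.coach_id
INNER JOIN categorizations
        ON categorizations.categorizable_type IN ('Discussion', 'Call')
LEFT OUTER JOIN discussions
        ON discussions.id = categorizations.categorizable_id
       AND categorizations.categorizable_type = 'Discussion'
LEFT OUTER JOIN calls
        ON calls.id = categorizations.categorizable_id
       AND categorizations.categorizable_type = 'Call'
WHERE programs.id = COALESCE(discussions.parent_id, calls.parent_id)
  AND COALESCE(
        discussions.client_first_responded_at::timestamptz
          AT TIME ZONE users.time_zone_offset::INTERVAL,
        calls.start_time::timestamptz
      ) BETWEEN '2020-09-01' AND '2020-09-30';
```

The DISTINCT compensates for duplicates the flattened joins can introduce; if duplicate client_ids across different programs are meaningful to you, group by programs.id instead.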
UPD: EXPLAIN (ANALYZE, BUFFERS) output. First, the slow plan with the dynamic users.time_zone_offset reference:
Hash Join (cost=1.47..250840.61 rows=3542 width=8) (actual time=35.419..61810.027 rows=2266 loops=1)
Hash Cond: (programs.coach_id = users.id)
Join Filter: (SubPlan 1)
Rows Removed by Join Filter: 4821
Buffers: shared hit=3087792
-> Seq Scan on programs (cost=0.00..368.84 rows=7084 width=20) (actual time=0.008..11.890 rows=7087 loops=1)
Buffers: shared hit=298
-> Hash (cost=1.21..1.21 rows=21 width=40) (actual time=0.015..0.015 rows=21 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on users (cost=0.00..1.21 rows=21 width=40) (actual time=0.004..0.008 rows=21 loops=1)
Buffers: shared hit=1
SubPlan 1
-> Hash Right Join (cost=777.19..2168.01 rows=7477 width=4) (actual time=0.007..7.887 rows=7516 loops=7087)
Hash Cond: (discussions.id = categorizations.categorizable_id)
Join Filter: ((categorizations.categorizable_type)::text = 'Discussion'::text)
Rows Removed by Join Filter: 3473
Filter: (((categorizations.categorizable_type)::text = 'Discussion'::text) OR (((categorizations.categorizable_type)::text = 'Call'::text) AND (COALESCE((timezone((users.time_zone_offset)::interval, (discussions.client_first_responded_at)::timestamp with time zone))::timestamp with time zone, (calls.start_time)::timestamp with time zone) >= '2020-09-01 00:00:00+02'::timestamp with time zone) AND (COALESCE((timezone((users.time_zone_offset)::interval, (discussions.client_first_responded_at)::timestamp with time zone))::timestamp with time zone, (calls.start_time)::timestamp with time zone) <= '2020-09-30 00:00:00+02'::timestamp with time zone)))
Rows Removed by Filter: 2578
Buffers: shared hit=3087493
-> Seq Scan on discussions (cost=0.00..751.46 rows=18746 width=20) (actual time=0.003..1.842 rows=15668 loops=7087)
Buffers: shared hit=3087252
-> Hash (cost=647.28..647.28 rows=10393 width=25) (actual time=13.355..13.355 rows=13090 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 763kB
Buffers: shared hit=241
-> Hash Left Join (cost=300.68..647.28 rows=10393 width=25) (actual time=3.476..10.083 rows=13090 loops=1)
Hash Cond: (categorizations.categorizable_id = calls.id)
Join Filter: ((categorizations.categorizable_type)::text = 'Call'::text)
Rows Removed by Join Filter: 2303
Buffers: shared hit=241
-> Seq Scan on categorizations (cost=0.00..319.32 rows=10393 width=13) (actual time=0.006..2.455 rows=13090 loops=1)
Filter: (((categorizable_type)::text = 'Discussion'::text) OR ((categorizable_type)::text = 'Call'::text))
Buffers: shared hit=123
-> Hash (cost=199.19..199.19 rows=8119 width=20) (actual time=3.462..3.462 rows=8119 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 509kB
Buffers: shared hit=118
-> Seq Scan on calls (cost=0.00..199.19 rows=8119 width=20) (actual time=0.005..1.766 rows=8119 loops=1)
Buffers: shared hit=118
Planning Time: 0.643 ms
Execution Time: 61811.118 ms
And the fast plan with the static '-06:00'::INTERVAL:
Hash Join (cost=3537.47..4289.98 rows=4825 width=8) (actual time=111.743..122.572 rows=4371 loops=1)
Hash Cond: (programs.coach_id = users.id)
Buffers: shared hit=1931
-> Hash Join (cost=3535.95..4273.46 rows=4825 width=12) (actual time=111.627..120.637 rows=4371 loops=1)
Hash Cond: (programs.id = COALESCE(discussions.parent_id, calls.parent_id))
Buffers: shared hit=1930
-> Seq Scan on programs (cost=0.00..658.50 rows=9650 width=20) (actual time=0.011..4.880 rows=9656 loops=1)
Buffers: shared hit=562
-> Hash (cost=3350.83..3350.83 rows=14810 width=8) (actual time=111.573..111.580 rows=4371 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 282kB
Buffers: shared hit=1368
-> HashAggregate (cost=3202.73..3350.83 rows=14810 width=8) (actual time=107.495..109.868 rows=4371 loops=1)
Group Key: COALESCE(discussions.parent_id, calls.parent_id)
Buffers: shared hit=1368
-> Hash Left Join (cost=1688.78..3165.70 rows=14810 width=8) (actual time=34.242..97.144 rows=19275 loops=1)
Hash Cond: (categorizations.categorizable_id = calls.id)
Join Filter: ((categorizations.categorizable_type)::text = 'Call'::text)
Rows Removed by Join Filter: 6869
Filter: (((categorizations.categorizable_type)::text = 'Discussion'::text) OR (((categorizations.categorizable_type)::text = 'Call'::text) AND (COALESCE((timezone('-06:00:00'::interval, (discussions.client_first_responded_at)::timestamp with time zone))::timestamp with time zone, (calls.start_time)::timestamp with time zone) >= '2020-09-01 00:00:00+00'::timestamp with time zone) AND (COALESCE((timezone('-06:00:00'::interval, (discussions.client_first_responded_at)::timestamp with time zone))::timestamp with time zone, (calls.start_time)::timestamp with time zone) <= '2020-09-30 00:00:00+00'::timestamp with time zone)))
Rows Removed by Filter: 7674
Buffers: shared hit=1368
-> Hash Right Join (cost=1046.24..2467.57 rows=21181 width=25) (actual time=21.970..62.581 rows=26949 loops=1)
Hash Cond: (discussions.id = categorizations.categorizable_id)
Join Filter: ((categorizations.categorizable_type)::text = 'Discussion'::text)
Rows Removed by Join Filter: 8461
Buffers: shared hit=1057
-> Seq Scan on discussions (cost=0.00..997.71 rows=31771 width=20) (actual time=0.183..8.541 rows=32064 loops=1)
Buffers: shared hit=680
-> Hash (cost=781.47..781.47 rows=21181 width=13) (actual time=21.720..21.721 rows=26949 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 1444kB
Buffers: shared hit=377
-> Seq Scan on categorizations (cost=0.00..781.47 rows=21181 width=13) (actual time=0.012..11.631 rows=26949 loops=1)
Filter: (((categorizable_type)::text = 'Discussion'::text) OR ((categorizable_type)::text = 'Call'::text))
Buffers: shared hit=377
-> Hash (cost=458.35..458.35 rows=14735 width=20) (actual time=12.205..12.206 rows=14720 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 876kB
Buffers: shared hit=311
-> Seq Scan on calls (cost=0.00..458.35 rows=14735 width=20) (actual time=0.016..6.914 rows=14720 loops=1)
Buffers: shared hit=311
-> Hash (cost=1.23..1.23 rows=23 width=8) (actual time=0.065..0.066 rows=23 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on users (cost=0.00..1.23 rows=23 width=8) (actual time=0.048..0.054 rows=23 loops=1)
Buffers: shared hit=1
Planning Time: 1.010 ms
Execution Time: 123.511 ms

Related

Postgresql Query Performance drop down from 4 secs to 16 minutes just by adding one filter criteria

I wrote a simple query that involves 2 views.
Using a self join on "contratti_gas_attivi" (which doesn't contain information on any offer, settled or not), I need to find the supplies that don't have a settled offer (settled offers are defined in "offerte_valide_dettaglio", which contains ONLY the information for supplies where an offer is settled) but for which an offer exists for the same supply, ending the day before the new end-user's contract starts.
The following query returns almost 50 rows in 4 to 6 seconds (which I don't consider fast, but this is not something that will run frequently). However, it only finds the supplies that don't have an offer associated with them, and only 3 of those have a previous settled offer.
select ovd2.cod_offerta, ovd2.id_contratto, ovd2.id_offerta, cga.*, cga2.*
from contratti_gas_attivi cga
left join offerte_valide_dettaglio ovd on ovd.id_contratto = cga.id
left join contratti_gas_attivi cga2 on cga.cod_pdr = cga2.cod_pdr and cga.data_inizio = (cga2.data_fine + '1 day'::interval)::date
left join offerte_valide_dettaglio ovd2 on ovd2.id_contratto = cga2.id
left join crm.customer_offers co on co.id = ovd2.id_offerta
where cga.tipo_inizio_contratto = 'VOLTURA' and ovd.id_offerta is null;
So at the end I just add "and ovd2.cod_offerta is not null" to take out the other 47 rows, and the query now takes almost 16 minutes to complete — just by adding one filtering clause at the end!
How is that possible? The query plan gets completely screwed...
Before last filter
Nested Loop Left Join (cost=4472.03..6947.83 rows=1 width=251) (actual time=3052.330..4196.372 rows=54 loops=1)
-> Nested Loop Left Join (cost=3000.74..4218.60 rows=1 width=112) (actual time=1652.151..1655.099 rows=49 loops=1)
Join Filter: ((cc2.piva)::text = (cg.piva_cc)::text)
Rows Removed by Join Filter: 2349
Filter: (cc2.data_replaced IS NULL)
Rows Removed by Filter: 3
-> Nested Loop Left Join (cost=3000.74..4215.59 rows=1 width=94) (actual time=1652.133..1653.982 rows=49 loops=1)
Join Filter: ((cc.piva)::text = (cg.piva_cliente)::text)
Rows Removed by Join Filter: 2349
Filter: ((cc.data_replaced IS NULL) AND (cc.data_deleted IS NULL) AND (cc.data_deleted IS NULL))
Rows Removed by Filter: 3
-> Hash Right Join (cost=3000.74..4212.58 rows=1 width=76) (actual time=1652.098..1652.268 rows=49 loops=1)
Hash Cond: (ocg.id_contratto = cg.id)
Filter: (ocg.id_offerta IS NULL)
Rows Removed by Filter: 2352
-> Hash Left Join (cost=1457.68..2561.97 rows=40966 width=16) (actual time=1479.736..1501.558 rows=39890 loops=1)
Hash Cond: (ocg.id_contratto = cg_1.id)
-> Seq Scan on offerte_contratti_gas ocg (cost=0.00..950.66 rows=40966 width=16) (actual time=0.033..8.184 rows=39890 loops=1)
-> Hash (cost=1457.66..1457.66 rows=1 width=8) (actual time=1479.633..1479.635 rows=36517 loops=1)
Buckets: 65536 (originally 1024) Batches: 1 (originally 1) Memory Usage: 1939kB
-> Nested Loop Left Join (cost=0.00..1457.66 rows=1 width=8) (actual time=0.080..1466.558 rows=36517 loops=1)
Join Filter: ((cc2_1.piva)::text = (cg_1.piva_cc)::text)
Rows Removed by Join Filter: 1747843
Filter: (cc2_1.data_replaced IS NULL)
Rows Removed by Filter: 4973
-> Nested Loop Left Join (cost=0.00..1454.65 rows=1 width=20) (actual time=0.052..743.117 rows=36517 loops=1)
Join Filter: ((cc_1.piva)::text = (cg_1.piva_cliente)::text)
Rows Removed by Join Filter: 1747827
Filter: ((cc_1.data_replaced IS NULL) AND (cc_1.data_deleted IS NULL) AND (cc_1.data_deleted IS NULL))
Rows Removed by Filter: 4989
-> Seq Scan on contratti_gas cg_1 (cost=0.00..1451.64 rows=1 width=32) (actual time=0.024..13.750 rows=36517 loops=1)
Filter: ((delated_at IS NULL) AND (replaced_at IS NULL))
Rows Removed by Filter: 47
-> Seq Scan on controparti_commerciali cc_1 (cost=0.00..2.45 rows=45 width=20) (actual time=0.001..0.005 rows=49 loops=36517)
-> Seq Scan on controparti_commerciali cc2_1 (cost=0.00..2.45 rows=45 width=16) (actual time=0.001..0.005 rows=49 loops=36517)
-> Hash (cost=1543.05..1543.05 rows=1 width=76) (actual time=145.878..145.878 rows=2104 loops=1)
Buckets: 4096 (originally 1024) Batches: 1 (originally 1) Memory Usage: 253kB
-> Seq Scan on contratti_gas cg (cost=0.00..1543.05 rows=1 width=76) (actual time=0.035..144.183 rows=2104 loops=1)
Filter: ((delated_at IS NULL) AND (replaced_at IS NULL) AND ((tipo_inizio_contratto)::text = 'VOLTURA'::text))
Rows Removed by Filter: 34460
-> Seq Scan on controparti_commerciali cc (cost=0.00..2.45 rows=45 width=38) (actual time=0.002..0.006 rows=49 loops=49)
-> Seq Scan on controparti_commerciali cc2 (cost=0.00..2.45 rows=45 width=34) (actual time=0.001..0.006 rows=49 loops=49)
-> Hash Right Join (cost=1471.29..2729.21 rows=1 width=139) (actual time=51.523..51.845 rows=1 loops=49)
Hash Cond: (ocg_1.id_contratto = cg_2.id)
-> Hash Left Join (cost=1457.68..2561.97 rows=40966 width=27) (actual time=28.165..48.297 rows=39890 loops=49)
Hash Cond: (ocg_1.id_contratto = cg_3.id)
-> Seq Scan on offerte_contratti_gas ocg_1 (cost=0.00..950.66 rows=40966 width=27) (actual time=0.002..7.578 rows=39890 loops=49)
-> Hash (cost=1457.66..1457.66 rows=1 width=8) (actual time=1379.882..1379.883 rows=36517 loops=1)
Buckets: 65536 (originally 1024) Batches: 1 (originally 1) Memory Usage: 1939kB
-> Nested Loop Left Join (cost=0.00..1457.66 rows=1 width=8) (actual time=0.041..1368.418 rows=36517 loops=1)
Join Filter: ((cc2_3.piva)::text = (cg_3.piva_cc)::text)
Rows Removed by Join Filter: 1747843
Filter: (cc2_3.data_replaced IS NULL)
Rows Removed by Filter: 4973
-> Nested Loop Left Join (cost=0.00..1454.65 rows=1 width=20) (actual time=0.026..696.145 rows=36517 loops=1)
Join Filter: ((cc_3.piva)::text = (cg_3.piva_cliente)::text)
Rows Removed by Join Filter: 1747827
Filter: ((cc_3.data_replaced IS NULL) AND (cc_3.data_deleted IS NULL) AND (cc_3.data_deleted IS NULL))
Rows Removed by Filter: 4989
-> Seq Scan on contratti_gas cg_3 (cost=0.00..1451.64 rows=1 width=32) (actual time=0.013..12.852 rows=36517 loops=1)
Filter: ((delated_at IS NULL) AND (replaced_at IS NULL))
Rows Removed by Filter: 47
-> Seq Scan on controparti_commerciali cc_3 (cost=0.00..2.45 rows=45 width=20) (actual time=0.001..0.005 rows=49 loops=36517)
-> Seq Scan on controparti_commerciali cc2_3 (cost=0.00..2.45 rows=45 width=16) (actual time=0.001..0.005 rows=49 loops=36517)
-> Hash (cost=13.60..13.60 rows=1 width=112) (actual time=0.460..0.460 rows=1 loops=49)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Hash Right Join (cost=10.97..13.60 rows=1 width=112) (actual time=0.439..0.449 rows=1 loops=49)
Hash Cond: ((cc2_2.piva)::text = (cg_2.piva_cc)::text)
Filter: (cc2_2.data_replaced IS NULL)
Rows Removed by Filter: 0
-> Seq Scan on controparti_commerciali cc2_2 (cost=0.00..2.45 rows=45 width=34) (actual time=0.002..0.006 rows=49 loops=49)
-> Hash (cost=10.96..10.96 rows=1 width=94) (actual time=0.427..0.427 rows=1 loops=49)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Hash Right Join (cost=8.33..10.96 rows=1 width=94) (actual time=0.405..0.417 rows=1 loops=49)
Hash Cond: ((cc_2.piva)::text = (cg_2.piva_cliente)::text)
Filter: ((cc_2.data_replaced IS NULL) AND (cc_2.data_deleted IS NULL) AND (cc_2.data_deleted IS NULL))
Rows Removed by Filter: 0
-> Seq Scan on controparti_commerciali cc_2 (cost=0.00..2.45 rows=45 width=38) (actual time=0.003..0.008 rows=49 loops=49)
-> Hash (cost=8.31..8.31 rows=1 width=76) (actual time=0.391..0.391 rows=1 loops=49)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Index Scan using pk_codpdr_datainizio_iserror on contratti_gas cg_2 (cost=0.29..8.31 rows=1 width=76) (actual time=0.382..0.384 rows=1 loops=49)
Index Cond: ((cod_pdr)::text = (cg.cod_pdr)::text)
Filter: ((delated_at IS NULL) AND (replaced_at IS NULL) AND (cg.data_inizio = ((data_fine + '1 day'::interval))::date))
Rows Removed by Filter: 2
Planning Time: 5.997 ms
Execution Time: 4196.992 ms
After last filter:
Nested Loop (cost=4461.49..6786.31 rows=1 width=251) (actual time=483943.882..1016665.139 rows=43 loops=1)
Join Filter: (((cg_2.cod_pdr)::text = (cg.cod_pdr)::text) AND (((cg_2.data_fine + '1 day'::interval))::date = cg.data_inizio))
Rows Removed by Join Filter: 1954567
-> Nested Loop Left Join (cost=1460.75..2567.69 rows=1 width=139) (actual time=1416.580..5211.218 rows=39890 loops=1)
-> Hash Join (cost=1457.68..2561.97 rows=1 width=139) (actual time=1416.502..1649.064 rows=39890 loops=1)
Hash Cond: (ocg_1.id_contratto = cg_2.id)
-> Seq Scan on offerte_contratti_gas ocg_1 (cost=0.00..950.66 rows=40966 width=27) (actual time=0.030..31.748 rows=39890 loops=1)
Filter: (cod_offerta IS NOT NULL)
-> Hash (cost=1457.66..1457.66 rows=1 width=112) (actual time=1416.000..1419.044 rows=36517 loops=1)
Buckets: 32768 (originally 1024) Batches: 2 (originally 1) Memory Usage: 3841kB
-> Nested Loop Left Join (cost=0.00..1457.66 rows=1 width=112) (actual time=0.072..1383.415 rows=36517 loops=1)
Join Filter: ((cc2_2.piva)::text = (cg_2.piva_cc)::text)
Rows Removed by Join Filter: 1747843
Filter: (cc2_2.data_replaced IS NULL)
Rows Removed by Filter: 4973
-> Nested Loop Left Join (cost=0.00..1454.65 rows=1 width=94) (actual time=0.043..704.608 rows=36517 loops=1)
Join Filter: ((cc_2.piva)::text = (cg_2.piva_cliente)::text)
Rows Removed by Join Filter: 1747827
Filter: ((cc_2.data_replaced IS NULL) AND (cc_2.data_deleted IS NULL) AND (cc_2.data_deleted IS NULL))
Rows Removed by Filter: 4989
-> Seq Scan on contratti_gas cg_2 (cost=0.00..1451.64 rows=1 width=76) (actual time=0.016..14.948 rows=36517 loops=1)
Filter: ((delated_at IS NULL) AND (replaced_at IS NULL))
Rows Removed by Filter: 47
-> Seq Scan on controparti_commerciali cc_2 (cost=0.00..2.45 rows=45 width=38) (actual time=0.001..0.005 rows=49 loops=36517)
-> Seq Scan on controparti_commerciali cc2_2 (cost=0.00..2.45 rows=45 width=34) (actual time=0.001..0.005 rows=49 loops=36517)
-> Hash Right Join (cost=3.08..5.71 rows=1 width=8) (actual time=0.061..0.073 rows=1 loops=39890)
Hash Cond: ((cc2_3.piva)::text = (cg_3.piva_cc)::text)
Filter: (cc2_3.data_replaced IS NULL)
Rows Removed by Filter: 0
-> Seq Scan on controparti_commerciali cc2_3 (cost=0.00..2.45 rows=45 width=16) (actual time=0.001..0.005 rows=49 loops=39890)
-> Hash (cost=3.06..3.06 rows=1 width=20) (actual time=0.048..0.048 rows=1 loops=39890)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Hash Right Join (cost=0.44..3.06 rows=1 width=20) (actual time=0.027..0.036 rows=1 loops=39890)
Hash Cond: ((cc_3.piva)::text = (cg_3.piva_cliente)::text)
Filter: ((cc_3.data_replaced IS NULL) AND (cc_3.data_deleted IS NULL) AND (cc_3.data_deleted IS NULL))
Rows Removed by Filter: 0
-> Seq Scan on controparti_commerciali cc_3 (cost=0.00..2.45 rows=45 width=20) (actual time=0.001..0.005 rows=49 loops=39890)
-> Hash (cost=0.42..0.42 rows=1 width=32) (actual time=0.018..0.018 rows=1 loops=39890)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Index Scan using uq_id on contratti_gas cg_3 (cost=0.29..0.42 rows=1 width=32) (actual time=0.013..0.013 rows=1 loops=39890)
Index Cond: (id = ocg_1.id_contratto)
Filter: ((delated_at IS NULL) AND (replaced_at IS NULL))
-> Nested Loop Left Join (cost=3000.74..4218.60 rows=1 width=112) (actual time=23.552..25.302 rows=49 loops=39890)
Join Filter: ((cc2.piva)::text = (cg.piva_cc)::text)
Rows Removed by Join Filter: 2349
Filter: (cc2.data_replaced IS NULL)
Rows Removed by Filter: 3
-> Nested Loop Left Join (cost=3000.74..4215.59 rows=1 width=94) (actual time=23.533..24.446 rows=49 loops=39890)
Join Filter: ((cc.piva)::text = (cg.piva_cliente)::text)
Rows Removed by Join Filter: 2349
Filter: ((cc.data_replaced IS NULL) AND (cc.data_deleted IS NULL) AND (cc.data_deleted IS NULL))
Rows Removed by Filter: 3
-> Hash Right Join (cost=3000.74..4212.58 rows=1 width=76) (actual time=23.493..23.561 rows=49 loops=39890)
Hash Cond: (ocg.id_contratto = cg.id)
Filter: (ocg.id_offerta IS NULL)
Rows Removed by Filter: 2352
-> Hash Left Join (cost=1457.68..2561.97 rows=40966 width=16) (actual time=0.034..19.423 rows=39890 loops=39890)
Hash Cond: (ocg.id_contratto = cg_1.id)
-> Seq Scan on offerte_contratti_gas ocg (cost=0.00..950.66 rows=40966 width=16) (actual time=0.002..7.327 rows=39890 loops=39890)
-> Hash (cost=1457.66..1457.66 rows=1 width=8) (actual time=1236.665..1238.748 rows=36517 loops=1)
Buckets: 65536 (originally 1024) Batches: 1 (originally 1) Memory Usage: 1939kB
-> Nested Loop Left Join (cost=0.00..1457.66 rows=1 width=8) (actual time=0.052..1230.365 rows=36517 loops=1)
Join Filter: ((cc2_1.piva)::text = (cg_1.piva_cc)::text)
Rows Removed by Join Filter: 1747843
Filter: (cc2_1.data_replaced IS NULL)
Rows Removed by Filter: 4973
-> Nested Loop Left Join (cost=0.00..1454.65 rows=1 width=20) (actual time=0.035..625.434 rows=36517 loops=1)
Join Filter: ((cc_1.piva)::text = (cg_1.piva_cliente)::text)
Rows Removed by Join Filter: 1747827
Filter: ((cc_1.data_replaced IS NULL) AND (cc_1.data_deleted IS NULL) AND (cc_1.data_deleted IS NULL))
Rows Removed by Filter: 4989
-> Seq Scan on contratti_gas cg_1 (cost=0.00..1451.64 rows=1 width=32) (actual time=0.012..11.876 rows=36517 loops=1)
Filter: ((delated_at IS NULL) AND (replaced_at IS NULL))
Rows Removed by Filter: 47
-> Seq Scan on controparti_commerciali cc_1 (cost=0.00..2.45 rows=45 width=20) (actual time=0.001..0.004 rows=49 loops=36517)
-> Seq Scan on controparti_commerciali cc2_1 (cost=0.00..2.45 rows=45 width=16) (actual time=0.001..0.004 rows=49 loops=36517)
-> Hash (cost=1543.05..1543.05 rows=1 width=76) (actual time=8.053..8.053 rows=2104 loops=1)
Buckets: 4096 (originally 1024) Batches: 1 (originally 1) Memory Usage: 253kB
-> Seq Scan on contratti_gas cg (cost=0.00..1543.05 rows=1 width=76) (actual time=0.014..7.482 rows=2104 loops=1)
Filter: ((delated_at IS NULL) AND (replaced_at IS NULL) AND ((tipo_inizio_contratto)::text = 'VOLTURA'::text))
Rows Removed by Filter: 34460
-> Seq Scan on controparti_commerciali cc (cost=0.00..2.45 rows=45 width=38) (actual time=0.001..0.004 rows=49 loops=1954610)
-> Seq Scan on controparti_commerciali cc2 (cost=0.00..2.45 rows=45 width=34) (actual time=0.001..0.004 rows=49 loops=1954610)
Planning Time: 7.946 ms
Execution Time: 1016666.928 ms
I am not very familiar with reading query plans, but I can see that adding the last clause completely ruins the "fast" query plan that was running before! Any advice?
edit:
Holy moly, jjanes, that worked wonderfully!
I read that VACUUM permanently deletes obsolete deleted/updated tuples, and VACUUM ANALYZE updates the statistics used by the planner. Still, I'm a bit confused about why that worked! The DB is local and I'm the only one working on it; I did some CRUD operations on the underlying tables, but going from 4 seconds to 16 minutes just because I didn't run VACUUM sounds strange to me! Now the full query only takes 187 ms!!! Do you have any resources to share, besides the official docs, to better understand VACUUM? Many thanks — unfortunately I cannot upvote the answer yet.
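For reference, the accepted fix amounts to refreshing planner statistics. The plans above show why it mattered: the Seq Scan on contratti_gas was estimated at rows=1 but returned 36517 rows, and estimates that wrong push the planner into nested loops. A sketch of the commands (table names taken from the plans; adjust to your schemas):

```sql
-- Reclaim dead tuples and refresh planner statistics for the tables involved.
VACUUM (ANALYZE, VERBOSE) contratti_gas;
VACUUM (ANALYZE, VERBOSE) offerte_contratti_gas;
VACUUM (ANALYZE, VERBOSE) controparti_commerciali;

-- Check when (auto)vacuum and analyze last ran, and how many dead tuples remain.
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze, n_dead_tup
FROM pg_stat_user_tables
WHERE relname IN ('contratti_gas', 'offerte_contratti_gas', 'controparti_commerciali');
```

On a mostly idle local database, autovacuum may simply not have triggered yet after your CRUD operations, which is why a manual VACUUM ANALYZE made such a difference.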

SQL (Postgres) Optimal Number of Joins

This query runs on Postgres version 12. It contains 8 joins and takes approximately 5 seconds.
Query 1
select *
from "public"."products" "P"
inner join "system"."categories" "C" on "C"."id" = "P"."id_category"
inner join "public"."businesses" "E" on "E"."id" = "P"."id_business"
left join "public"."product_files" "pf" on "pf"."id_product" = "P"."id"
left join "system"."files" "f" on "f"."name" = "pf"."img_code"
left join "public"."product_variations" "pv" on ("pv"."id_product" = "P"."id" and "pv"."status" <> 'Deleted')
left join "public"."product_stocks" "ps" on ("ps"."id_product_variation" = "pv"."id" and "ps"."status" <> 'Deleted')
left join "public"."product_stocks" "pps" on ("pps"."id_product" = "P"."id" and "pps"."status" <> 'Deleted')
inner join search_products( array['tires'], 8, 1, 'es') "search" on search.id = "P"."id"
where "P"."status" <> 'Deleted'
Postgres Query EXPLAIN(ANALYZE, BUFFERS) for Query 1
Merge Join (cost=112948.60..121805.61 rows=4996 width=1145) (actual time=2003.599..2426.892 rows=40 loops=1)
Merge Cond: ("P".id = search.id)
Buffers: shared hit=760531, temp read=16912 written=18837
-> Merge Left Join (cost=112888.52..120950.73 rows=287945 width=1105) (actual time=1607.013..2093.722 rows=380961 loops=1)
Merge Cond: ("P".id = pf.id_product)
Buffers: shared hit=752079, temp read=15561 written=15606
-> Merge Left Join (cost=16288.22..19167.29 rows=57631 width=771) (actual time=165.803..271.662 rows=76193 loops=1)
Merge Cond: ("P".id = pps.id_product)
Buffers: shared hit=3820, temp read=2706 written=2733
-> Merge Left Join (cost=16287.81..16577.01 rows=57631 width=686) (actual time=165.787..217.878 rows=56921 loops=1)
Merge Cond: ("P".id = pv.id_product)
Buffers: shared hit=2058, temp read=2706 written=2733
-> Sort (cost=14888.93..15033.01 rows=57631 width=514) (actual time=156.825..175.154 rows=56920 loops=1)
Sort Key: "P".id
Sort Method: external merge Disk: 21840kB
Buffers: shared hit=1430, temp read=2706 written=2733
-> Hash Join (cost=43.49..2484.49 rows=57631 width=514) (actual time=0.266..64.052 rows=57631 loops=1)
Hash Cond: ("P".id_business = "E".id)
Buffers: shared hit=1430
-> Hash Join (cost=37.81..2322.14 rows=57631 width=374) (actual time=0.214..39.402 rows=57631 loops=1)
Hash Cond: ("P".id_category = "C".id)
Buffers: shared hit=1427
-> Seq Scan on products "P" (cost=0.00..2132.41 rows=57631 width=252) (actual time=0.009..12.754 rows=57631 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 2
Buffers: shared hit=1412
-> Hash (cost=25.14..25.14 rows=1014 width=122) (actual time=0.201..0.201 rows=1014 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 124kB
Buffers: shared hit=15
-> Seq Scan on categories "C" (cost=0.00..25.14 rows=1014 width=122) (actual time=0.007..0.078 rows=1014 loops=1)
Buffers: shared hit=15
-> Hash (cost=4.19..4.19 rows=119 width=140) (actual time=0.047..0.048 rows=119 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 29kB
Buffers: shared hit=3
-> Seq Scan on businesses "E" (cost=0.00..4.19 rows=119 width=140) (actual time=0.013..0.024 rows=119 loops=1)
Buffers: shared hit=3
-> Sort (cost=1398.88..1399.05 rows=70 width=172) (actual time=8.956..8.958 rows=3 loops=1)
Sort Key: pv.id_product
Sort Method: quicksort Memory: 43kB
Buffers: shared hit=628
-> Hash Right Join (cost=4.58..1396.73 rows=70 width=172) (actual time=8.853..8.912 rows=70 loops=1)
Hash Cond: (ps.id_product_variation = pv.id)
Buffers: shared hit=628
-> Seq Scan on product_stocks ps (cost=0.00..1259.35 rows=50589 width=85) (actual time=0.009..7.030 rows=50595 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 73
Buffers: shared hit=626
-> Hash (cost=3.70..3.70 rows=70 width=87) (actual time=0.048..0.049 rows=70 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 16kB
Buffers: shared hit=2
-> Seq Scan on product_variations pv (cost=0.00..3.70 rows=70 width=87) (actual time=0.020..0.039 rows=70 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 66
Buffers: shared hit=2
-> Index Scan using product_stocks_id_product_id_product_variation_id_location_key on product_stocks pps (cost=0.41..1819.97 rows=50589 width=85) (actual time=0.013..17.822 rows=49924 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 1
Buffers: shared hit=1762
-> Materialize (cost=96600.25..98040.03 rows=287955 width=334) (actual time=1441.203..1613.160 rows=380961 loops=1)
Buffers: shared hit=748259, temp read=12855 written=12873
-> Sort (cost=96600.25..97320.14 rows=287955 width=334) (actual time=1441.198..1567.183 rows=284596 loops=1)
Sort Key: pf.id_product
Sort Method: external merge Disk: 102840kB
Buffers: shared hit=748259, temp read=12855 written=12873
-> Merge Left Join (cost=0.84..44546.48 rows=287955 width=334) (actual time=0.021..1013.742 rows=287955 loops=1)
Merge Cond: ((pf.img_code)::text = (f.name)::text)
Buffers: shared hit=748259
-> Index Scan using product_files_pkey on product_files pf (cost=0.42..10516.05 rows=287955 width=66) (actual time=0.005..184.173 rows=287955 loops=1)
Buffers: shared hit=289884
-> Index Scan using files_pkey on files f (cost=0.42..29304.42 rows=455180 width=268) (actual time=0.005..338.206 rows=455178 loops=1)
Buffers: shared hit=458375
-> Sort (cost=60.08..62.58 rows=1000 width=40) (actual time=313.554..313.558 rows=36 loops=1)
Sort Key: search.id
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=8452, temp read=1351 written=3231
-> Function Scan on search_products search (cost=0.25..10.25 rows=1000 width=40) (actual time=313.544..313.545 rows=8 loops=1)
Buffers: shared hit=8452, temp read=1351 written=3231
Planning Time: 2.632 ms
Execution Time: 2440.414 ms
I was reviewing how to optimize the query, so I added the joins one by one to see where the problem was. Among the many permutations of join order, I noticed that from join number 7 onward, Postgres apparently stops finding the best way to run the query. When I delete any one join (chosen at random), the query takes 300 ms.
Query 2
select *
from "public"."products" "P"
inner join "system"."categories" "C" on "C"."id" = "P"."id_category"
left join "public"."product_files" "pf" on "pf"."id_product" = "P"."id"
left join "system"."files" "f" on "f"."name" = "pf"."img_code"
left join "public"."product_variations" "pv" on ("pv"."id_product" = "P"."id" and "pv"."status" <> 'Deleted')
left join "public"."product_stocks" "ps" on ("ps"."id_product_variation" = "pv"."id" and "ps"."status" <> 'Deleted')
left join "public"."product_stocks" "pps" on ("pps"."id_product" = "P"."id" and "pps"."status" <> 'Deleted')
inner join search_products( array['tires'], 8, 1, 'es') "search" on search.id = "P"."id"
where "P"."status" <> 'Deleted'
Postgres Query EXPLAIN(ANALYZE, BUFFERS) for Query 2
Nested Loop Left Join (cost=1365.30..6482.09 rows=4996 width=1005) (actual time=349.888..350.399 rows=40 loops=1)
Buffers: shared hit=9339, temp read=1351 written=3231
-> Nested Loop Left Join (cost=1364.88..3893.89 rows=4996 width=737) (actual time=349.866..349.957 rows=40 loops=1)
Buffers: shared hit=9179, temp read=1351 written=3231
-> Nested Loop Left Join (cost=1364.46..3250.90 rows=1000 width=671) (actual time=349.857..349.899 rows=8 loops=1)
Buffers: shared hit=9147, temp read=1351 written=3231
-> Hash Join (cost=1364.04..2759.11 rows=1000 width=586) (actual time=349.839..349.853 rows=8 loops=1)
Hash Cond: ("P".id_category = "C".id)
Buffers: shared hit=9119, temp read=1351 written=3231
-> Hash Right Join (cost=1326.23..2718.65 rows=1000 width=464) (actual time=349.566..349.574 rows=8 loops=1)
Hash Cond: (pv.id_product = "P".id)
Buffers: shared hit=9104, temp read=1351 written=3231
-> Hash Right Join (cost=4.58..1396.73 rows=70 width=172) (actual time=8.953..9.013 rows=70 loops=1)
Hash Cond: (ps.id_product_variation = pv.id)
Buffers: shared hit=628
-> Seq Scan on product_stocks ps (cost=0.00..1259.35 rows=50589 width=85) (actual time=0.008..7.060 rows=50595 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 73
Buffers: shared hit=626
-> Hash (cost=3.70..3.70 rows=70 width=87) (actual time=0.047..0.048 rows=70 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 16kB
Buffers: shared hit=2
-> Seq Scan on product_variations pv (cost=0.00..3.70 rows=70 width=87) (actual time=0.015..0.033 rows=70 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 66
Buffers: shared hit=2
-> Hash (cost=1309.15..1309.15 rows=1000 width=292) (actual time=340.542..340.543 rows=8 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
Buffers: shared hit=8476, temp read=1351 written=3231
-> Nested Loop (cost=0.54..1309.15 rows=1000 width=292) (actual time=340.505..340.535 rows=8 loops=1)
Buffers: shared hit=8476, temp read=1351 written=3231
-> Function Scan on search_products search (cost=0.25..10.25 rows=1000 width=40) (actual time=340.483..340.485 rows=8 loops=1)
Buffers: shared hit=8452, temp read=1351 written=3231
-> Index Scan using products_pkey on products "P" (cost=0.29..1.30 rows=1 width=252) (actual time=0.005..0.005 rows=1 loops=8)
Index Cond: (id = search.id)
Filter: ((status)::text <> 'Deleted'::text)
Buffers: shared hit=24
-> Hash (cost=25.14..25.14 rows=1014 width=122) (actual time=0.268..0.268 rows=1014 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 124kB
Buffers: shared hit=15
-> Seq Scan on categories "C" (cost=0.00..25.14 rows=1014 width=122) (actual time=0.012..0.110 rows=1014 loops=1)
Buffers: shared hit=15
-> Index Scan using product_stocks_id_product_id_product_variation_id_location_key on product_stocks pps (cost=0.41..0.47 rows=2 width=85) (actual time=0.005..0.005 rows=1 loops=8)
Index Cond: (id_product = "P".id)
Filter: ((status)::text <> 'Deleted'::text)
Buffers: shared hit=28
-> Index Scan using idx_product_files_product on product_files pf (cost=0.42..0.59 rows=5 width=66) (actual time=0.004..0.005 rows=5 loops=8)
Index Cond: (id_product = "P".id)
Buffers: shared hit=32
-> Index Scan using files_pkey on files f (cost=0.42..0.52 rows=1 width=268) (actual time=0.010..0.010 rows=1 loops=40)
Index Cond: ((name)::text = (pf.img_code)::text)
Buffers: shared hit=160
Planning Time: 2.581 ms
Execution Time: 350.525 ms
Is there an article that explains this behavior, and how can I fix it?
That is because join_collapse_limit has a default value of 8. The optimizer tries all permutations only for the first 8 tables, the rest is joined as written. The rationale is to keep planning time reasonably short, which increases exponentially with the number of tables.
Options:
increase the parameter
figure out a good join order and rewrite the query to join in that order
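If you go with the first option, the parameter can be raised for just the current session. A minimal sketch (the value 12 is illustrative; planning time grows steeply as the limit increases):

```sql
SHOW join_collapse_limit;          -- default is 8
SET join_collapse_limit = 12;      -- let the planner reorder up to 12 joins
SET from_collapse_limit = 12;      -- analogous limit for flattened subqueries
-- run the query here, then restore the defaults:
RESET join_collapse_limit;
RESET from_collapse_limit;
```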

How to optimize an SQL query with OR in the WHERE clause to filter on values from different columns?

General context: I have a Postgres database that includes a knowledge system (SKOS). The knowledge system lives in a separate schema. SKOS describes all the links between concepts and how they are related, and also includes notes, notations, labels, and so on. Each concept is mapped to this SKOS.
When querying the actual business logic, which is about Cascades, CascadeSteps, Technologies, Additives, and so on, I want to include the translations for each of these concepts in the response.
I'm adding views to simplify the queries.
In one of the queries, this is the bottleneck (it is part of the full query):
SELECT trcs.cascade,
"Translations".language,
jsonb_object_agg("Translations".code, "Translations".skos) AS resource
FROM "CascadeStep" trcs
LEFT JOIN "CascadeUnit" trcu
ON trcu.cuid = trcs.from
LEFT JOIN "TechnologyAdditive" trta on trta.technology = trcu.technology
LEFT JOIN skos."Translations"
ON (skos."Translations".id = trcs.from) OR
(skos."Translations".id = trcs.flow) OR
(skos."Translations".id = trcs.product) OR
(skos."Translations".id = trta.additive)
WHERE skos."Translations".notation = 'SimpleNotation'
GROUP BY trcs.cascade, "Translations".language
The reason seems to be the OR in the filter. When I analyze:
HASH JOIN | 29 %
Left join on ((trcs."from")::bpchar = (trcu.cuid)::bpchar)
Hash Join Node joins to record sets by hashing one of them (using a Hash Scan).
Node Type Hash Join
Parent Relationship Outer
Parallel Aware false
Join Type Left
Startup Cost 43.02
Total Cost 191.84
Plan Rows 1620
Plan Width 72
Actual Startup Time 0.468
Actual Total Time 12.035
Actual Rows 598
Actual Loops 1
Output trcs.cascade,"Translations".language,"Translations".code,"Translations".skos
Inner Unique false
Hash Cond ((trcs."from")::bpchar = (trcu.cuid)::bpchar)
Filter (("Translations".id = (trcs."from")::bpchar) OR ("Translations".id = (trcs.flow)::bpchar) OR ("Translations".id = (trcs.product)::bpchar) OR ("Translations".id = (trta.additive)::bpchar))
I tried different strategies in that join condition, but none of them produces better results:
... on skos."Translations".id = any(array[trcs.from, trcs.flow, trcs.product, trta.additive])
is about 1.25x slower.
... on skos."Translations".id in (select trcs.from union all select trcs.flow union all select trcs.product union all select trta.additive)
is 2-5x slower
... on skos."Translations".id in (select trcs.from
union all
select trcs.flow
union all
select trcs.product
union all
select trta.additive
from "TechnologyAdditive" trta
left join "CascadeUnit" trcu on trcu.cuid=trcs.from
where trta.technology = trcu.technology)
10-30x slower
The actual question: is it possible to efficiently use different columns in the where clause of the query, to replace the ORs, or can the query be rewritten more efficiently?
== EDIT: added the (relevant) output of EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
Hash (cost=200.55..200.55 rows=966 width=61) (actual time=13.467..13.467 rows=21 loops=1)
Output: i18n.language, i18n.resource, i18n.cascade
Buckets: 1024 Batches: 1 Memory Usage: 29kB
Buffers: shared hit=12
-> Subquery Scan on i18n (cost=194.27..200.55 rows=966 width=61) (actual time=13.340..13.450 rows=21 loops=1)
Output: i18n.language, i18n.resource, i18n.cascade
Buffers: shared hit=12
-> HashAggregate (cost=194.27..197.66 rows=966 width=61) (actual time=13.339..13.447 rows=21 loops=1)
Output: trcs.cascade, "Translations".language, jsonb_object_agg("Translations".code, "Translations".skos)
Group Key: trcs.cascade, "Translations".language
Buffers: shared hit=12
-> Hash Left Join (cost=43.02..191.84 rows=1620 width=72) (actual time=0.452..12.193 rows=598 loops=1)
Output: trcs.cascade, "Translations".language, "Translations".code, "Translations".skos
Hash Cond: ((trcs."from")::bpchar = (trcu.cuid)::bpchar)
Filter: (("Translations".id = (trcs."from")::bpchar) OR ("Translations".id = (trcs.flow)::bpchar) OR ("Translations".id = (trcs.product)::bpchar) OR ("Translations".id = (trta.additive)::bpchar))
Rows Removed by Filter: 19802
Buffers: shared hit=12
-> Nested Loop (cost=23.32..55.40 rows=7498 width=176) (actual time=0.344..4.213 rows=13040 loops=1)
Output: trcs.cascade, trcs."from", trcs.flow, trcs.product, "Translations".language, "Translations".code, "Translations".skos, "Translations".id
Buffers: shared hit=10
-> Seq Scan on public."CascadeStep" trcs (cost=0.00..5.49 rows=163 width=104) (actual time=0.004..0.076 rows=163 loops=1)
Output: trcs.cascade, trcs."from", trcs.flow, trcs.product
Buffers: shared hit=5
-> Materialize (cost=23.32..23.69 rows=46 width=72) (actual time=0.002..0.009 rows=80 loops=163)
Output: "Translations".language, "Translations".code, "Translations".skos, "Translations".id
Buffers: shared hit=5
-> Subquery Scan on "Translations" (cost=23.32..23.64 rows=46 width=72) (actual time=0.337..0.520 rows=80 loops=1)
Output: "Translations".language, "Translations".code, "Translations".skos, "Translations".id
Buffers: shared hit=5
-> Group (cost=23.32..23.50 rows=46 width=117) (actual time=0.336..0.508 rows=80 loops=1)
Output: l.concept_nss, l.language, n."notationType", n.value, jsonb_strip_nulls(jsonb_build_object('prefLabel', l.label, 'definition', v.object)), v.object
Group Key: l.concept_nss, n."notationType", l.language, v.object, n.value
Buffers: shared hit=5
-> Sort (cost=23.32..23.34 rows=46 width=108) (actual time=0.329..0.337 rows=80 loops=1)
Output: l.concept_nss, l.language, n."notationType", n.value, v.object, l.label
Sort Key: l.concept_nss, l.language, v.object, n.value
Sort Method: quicksort Memory: 37kB
Buffers: shared hit=5
-> Hash Right Join (cost=10.41..23.07 rows=46 width=108) (actual time=0.155..0.182 rows=80 loops=1)
Output: l.concept_nss, l.language, n."notationType", n.value, v.object, l.label
Hash Cond: ((n_1.concept_nss = n.concept_nss) AND (v.lang = l.language))
Buffers: shared hit=5
-> Hash Join (cost=7.53..20.18 rows=4 width=96) (actual time=0.025..0.031 rows=12 loops=1)
Output: n_1.concept_nss, v.object, v.lang
Inner Unique: true
Hash Cond: (v.note_id = n_1.id)
Buffers: shared hit=3
-> Seq Scan on skos."NoteValue" v (cost=0.00..12.25 rows=750 width=80) (actual time=0.003..0.004 rows=12 loops=1)
Output: v.lang, v.object, v.note_id
Buffers: shared hit=1
-> Hash (cost=7.52..7.52 rows=4 width=48) (actual time=0.016..0.016 rows=12 loops=1)
Output: n_1.id, n_1.concept_nss
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=2
-> Bitmap Heap Scan on skos."Note" n_1 (cost=2.04..7.52 rows=4 width=48) (actual time=0.010..0.012 rows=12 loops=1)
Output: n_1.id, n_1.concept_nss
Recheck Cond: (n_1.type = 'skos_definition'::text)
Heap Blocks: exact=1
Buffers: shared hit=2
-> Bitmap Index Scan on "Note_property_concept_key" (cost=0.00..2.04 rows=4 width=0) (actual time=0.006..0.006 rows=13 loops=1)
Index Cond: (n_1.type = 'skos_definition'::text)
Buffers: shared hit=1
-> Hash (cost=2.69..2.69 rows=46 width=102) (actual time=0.124..0.124 rows=80 loops=1)
Output: n."notationType", n.value, n.concept_nss, l.concept_nss, l.language, l.label
Buckets: 1024 Batches: 1 Memory Usage: 19kB
Buffers: shared hit=2
-> Hash Right Join (cost=1.34..2.69 rows=46 width=102) (actual time=0.051..0.093 rows=80 loops=1)
Output: n."notationType", n.value, n.concept_nss, l.concept_nss, l.language, l.label
Hash Cond: (l.concept_nss = n.concept_nss)
Buffers: shared hit=2
-> Seq Scan on skos."ConceptPrefLabel" l (cost=0.00..1.16 rows=53 width=52) (actual time=0.004..0.010 rows=80 loops=1)
Output: l.label, l.language, l.concept_nss
Buffers: shared hit=1
-> Hash (cost=1.18..1.18 rows=46 width=50) (actual time=0.041..0.041 rows=80 loops=1)
Output: n."notationType", n.value, n.concept_nss
Buckets: 1024 Batches: 1 Memory Usage: 15kB
Buffers: shared hit=1
-> Seq Scan on skos."Notation" n (cost=0.00..1.18 rows=46 width=50) (actual time=0.004..0.018 rows=80 loops=1)
Output: n."notationType", n.value, n.concept_nss
Filter: (n."notationType" = 'NutricasCode'::text)
Rows Removed by Filter: 5
Buffers: shared hit=1
-> Hash (cost=18.85..18.85 rows=242 width=58) (actual time=0.102..0.102 rows=137 loops=1)
Output: trcu.cuid, trta.additive
Buckets: 1024 Batches: 1 Memory Usage: 19kB
Buffers: shared hit=2
-> Hash Left Join (cost=15.72..18.85 rows=242 width=58) (actual time=0.021..0.067 rows=137 loops=1)
Output: trcu.cuid, trta.additive
Hash Cond: ((trcu.technology)::bpchar = (trta.technology)::bpchar)
Buffers: shared hit=2
-> Seq Scan on public."CascadeUnit" trcu (cost=0.00..1.17 rows=55 width=52) (actual time=0.006..0.013 rows=82 loops=1)
Output: trcu.cuid, trcu.technology
Buffers: shared hit=1
-> Hash (cost=12.64..12.64 rows=880 width=64) (actual time=0.009..0.009 rows=16 loops=1)
Output: trta.technology, trta.additive
Buckets: 1024 Batches: 1 Memory Usage: 10kB
Buffers: shared hit=1
-> Seq Scan on public."TechnologyAdditive" trta (cost=0.00..12.64 rows=880 width=64) (actual time=0.003..0.004 rows=16 loops=1)
Output: trta.technology, trta.additive
Buffers: shared hit=1
Use multiple left joins. I don't even think you need aggregation:
SELECT . . .
FROM "CascadeStep" trcs
LEFT JOIN "CascadeUnit" trcu
       ON trcu.cuid = trcs.from
LEFT JOIN "TechnologyAdditive" trta
       ON trta.technology = trcu.technology
LEFT JOIN skos."Translations" tfrom
       ON tfrom.id = trcs.from AND tfrom.notation = 'SimpleNotation'
LEFT JOIN skos."Translations" tflow
       ON tflow.id = trcs.flow AND tflow.notation = 'SimpleNotation'
LEFT JOIN skos."Translations" tproduct
       ON tproduct.id = trcs.product AND tproduct.notation = 'SimpleNotation'
LEFT JOIN skos."Translations" tadditive
       ON tadditive.id = trta.additive AND tadditive.notation = 'SimpleNotation'
WHERE tfrom.id IS NOT NULL OR tflow.id IS NOT NULL OR
      tproduct.id IS NOT NULL OR tadditive.id IS NOT NULL;
I'm not sure exactly what your results look like. You may need to unpivot the data (using a lateral join) to get your exact results. But this should fix your performance problem.
Note: This assumes that each join to Translations has one match. This seems reasonable but you might need to take duplicates into account if it is not the case.
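The unpivot via a lateral join mentioned above could look roughly like this. A sketch only, against the question's tables and not verified; it turns the four OR-ed columns into rows so each one becomes a plain equality join (one index probe per id):

```sql
-- Sketch: unpivot the four candidate columns into rows with a LATERAL
-- VALUES list, then join each row to "Translations" on a single equality.
-- Note: if two columns hold the same id for one row, it will be counted twice.
SELECT trcs.cascade,
       t.language,
       jsonb_object_agg(t.code, t.skos) AS resource
FROM "CascadeStep" trcs
LEFT JOIN "CascadeUnit" trcu ON trcu.cuid = trcs.from
LEFT JOIN "TechnologyAdditive" trta ON trta.technology = trcu.technology
CROSS JOIN LATERAL (VALUES (trcs.from), (trcs.flow),
                           (trcs.product), (trta.additive)) AS ids(id)
JOIN skos."Translations" t
  ON t.id = ids.id
 AND t.notation = 'SimpleNotation'
GROUP BY trcs.cascade, t.language;
```

Whether this beats the OR version depends on an index on "Translations"(id) (or (id, notation)) being available for the probes.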

How to optimize an SQL query with a JOIN on many values?

I have a query like this that joins ~6000 values:
SELECT MAX(user_id) as user_id, SUM(sum_amount_win) as sum_amount_win
FROM (
SELECT
a1.user_id
,CASE When MAX(currency) = 'RUB' Then SUM(d1.amount_cents) END as sum_amount_win
FROM dense_balance_transactions as d1
JOIN accounts a1 ON a1.id = d1.account_id
JOIN (
VALUES (5),(22),(26) -- ~6000 values
) AS v(user_id) USING (user_id)
WHERE d1.created_at BETWEEN '2019-06-01 00:00:00' AND '2019-06-20 23:59:59'
AND d1.action='win'
GROUP BY a1.user_id, a1.currency
) as t
GROUP BY user_id
QUERY PLAN for query with many VALUES:
GroupAggregate (cost=266816.48..266816.54 rows=1 width=48) (actual time=5024.201..5102.633 rows=5745 loops=1)
Group Key: a1.user_id
Buffers: shared hit=12205927
-> GroupAggregate (cost=266816.48..266816.51 rows=1 width=44) (actual time=5024.185..5099.621 rows=5774 loops=1)
Group Key: a1.user_id, a1.currency
Buffers: shared hit=12205927
-> Sort (cost=266816.48..266816.49 rows=1 width=20) (actual time=5024.170..5041.840 rows=291122 loops=1)
Sort Key: a1.user_id, a1.currency
Sort Method: quicksort Memory: 35032kB
Buffers: shared hit=12205927
-> Gather (cost=214410.62..266816.47 rows=1 width=20) (actual time=292.828..5204.320 rows=291122 loops=1)
Workers Planned: 5
Workers Launched: 5
Buffers: shared hit=12205921
-> Nested Loop (cost=213410.62..265816.37 rows=1 width=20) (actual time=255.028..3939.300 rows=48520 loops=6)
Buffers: shared hit=12205921
-> Merge Join (cost=213410.19..214522.45 rows=1269 width=20) (actual time=253.545..274.872 rows=1136 loops=6)
Merge Cond: (a1.user_id = "*VALUES*".column1)
Buffers: shared hit=191958
-> Sort (cost=212958.66..213493.45 rows=213914 width=20) (actual time=251.991..263.828 rows=82468 loops=6)
Sort Key: a1.user_id
Sort Method: quicksort Memory: 24322kB
Buffers: shared hit=191916
-> Parallel Seq Scan on accounts a1 (cost=0.00..194020.14 rows=213914 width=20) (actual time=0.042..196.052 rows=179242 loops=6)
Buffers: shared hit=191881
-> Sort (cost=451.52..466.52 rows=6000 width=4) (actual time=1.547..2.429 rows=6037 loops=6)
Sort Key: "*VALUES*".column1
Sort Method: quicksort Memory: 474kB
Buffers: shared hit=42
-> Values Scan on "*VALUES*" (cost=0.00..75.00 rows=6000 width=4) (actual time=0.002..0.928 rows=6000 loops=6)
-> Index Scan using index_dense_balance_transactions_on_account_id on dense_balance_transactions d1 (cost=0.44..40.41 rows=1 width=16) (actual time=0.160..3.220 rows=43 loops=6816)
Index Cond: (account_id = a1.id)
Filter: ((created_at >= '2019-06-01 00:00:00'::timestamp without time zone) AND (created_at <= '2019-06-20 23:59:59'::timestamp without time zone) AND ((action)::text = 'win'::text))
Rows Removed by Filter: 1942
Buffers: shared hit=12013963
Planning time: 10.239 ms
Execution time: 5387.523 ms
I use PostgreSQL 10.8.0.
Is there any chance to speed up this query?
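One rewrite that is often suggested for long VALUES lists (a sketch only, not verified against this schema; the array element type `bigint` is an assumption) is to pass the ids as a single array parameter, so the planner sees one `= ANY` condition instead of a 6000-row join input:

```sql
-- Sketch: replace the VALUES join with an array membership test.
-- The array literal stands in for the ~6000 ids; bigint is assumed
-- for accounts.user_id.
SELECT MAX(user_id) AS user_id, SUM(sum_amount_win) AS sum_amount_win
FROM (
    SELECT a1.user_id,
           CASE WHEN MAX(currency) = 'RUB'
                THEN SUM(d1.amount_cents) END AS sum_amount_win
    FROM dense_balance_transactions AS d1
    JOIN accounts a1 ON a1.id = d1.account_id
    WHERE a1.user_id = ANY ('{5,22,26}'::bigint[])  -- ~6000 values
      AND d1.created_at BETWEEN '2019-06-01 00:00:00' AND '2019-06-20 23:59:59'
      AND d1.action = 'win'
    GROUP BY a1.user_id, a1.currency
) AS t
GROUP BY user_id;
```

How much this helps depends on an index on accounts(user_id); without one the planner still has to scan accounts, as the plan above shows.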

How to optimize an SQL query with DISTINCT ON and a JOIN on many values?

I have a query like this that joins ~6000 values:
SELECT DISTINCT ON(user_id)
user_id,
finished_at as last_deposit_date,
CASE When currency = 'RUB' Then amount_cents END as last_deposit_amount_cents
FROM payments
JOIN (VALUES (5),(22),(26)) --~6000 values
AS v(user_id) USING (user_id)
WHERE action = 'deposit'
AND success = 't'
AND currency IN ('RUB')
ORDER BY user_id, finished_at DESC
QUERY PLAN for query with many VALUES:
Unique (cost=444606.97..449760.44 rows=19276 width=24) (actual time=6129.403..6418.317 rows=5991 loops=1)
Buffers: shared hit=2386527, temp read=7807 written=7808
-> Sort (cost=444606.97..447183.71 rows=1030695 width=24) (actual time=6129.401..6295.457 rows=1877039 loops=1)
Sort Key: payments.user_id, payments.finished_at DESC
Sort Method: external merge Disk: 62456kB
Buffers: shared hit=2386527, temp read=7807 written=7808
-> Nested Loop (cost=0.43..341665.35 rows=1030695 width=24) (actual time=0.612..5085.376 rows=1877039 loops=1)
Buffers: shared hit=2386521
-> Values Scan on "*VALUES*" (cost=0.00..75.00 rows=6000 width=4) (actual time=0.002..4.507 rows=6000 loops=1)
-> Index Scan using index_payments_on_user_id on payments (cost=0.43..54.78 rows=172 width=28) (actual time=0.010..0.793 rows=313 loops=6000)
Index Cond: (user_id = "*VALUES*".column1)
Filter: (success AND ((action)::text = 'deposit'::text) AND ((currency)::text = 'RUB'::text))
Rows Removed by Filter: 85
Buffers: shared hit=2386521
Planning time: 5.886 ms
Execution time: 6429.685 ms
I use PostgreSQL 10.8.0. Is there any chance to speed up this query?
I tried replacing DISTINCT with recursion:
WITH RECURSIVE t AS (
(SELECT min(user_id) AS user_id FROM payments)
UNION ALL
SELECT (SELECT min(user_id) FROM payments
WHERE user_id > t.user_id
) AS user_id FROM
t
WHERE t.user_id IS NOT NULL
)
SELECT payments.* FROM t
JOIN (VALUES (5),(22),(26)) --~6000 VALUES
AS v(user_id) USING (user_id)
, LATERAL (
SELECT user_id,
finished_at as last_deposit_date,
CASE When currency = 'RUB' Then amount_cents END as last_deposit_amount_cents FROM payments
WHERE payments.user_id=t.user_id
AND action = 'deposit'
AND success = 't'
AND currency IN ('RUB')
ORDER BY finished_at DESC LIMIT 1
) AS payments
WHERE t.user_id IS NOT NULL;
But it turned out even slower.
Hash Join (cost=418.67..21807.22 rows=3000 width=24) (actual time=16.804..10843.174 rows=5991 loops=1)
Hash Cond: (t.user_id = "VALUES".column1)
Buffers: shared hit=6396763
CTE t
-> Recursive Union (cost=0.46..53.73 rows=101 width=8) (actual time=0.142..1942.351 rows=237029 loops=1)
Buffers: shared hit=864281
-> Result (cost=0.46..0.47 rows=1 width=8) (actual time=0.141..0.142 rows=1 loops=1)
Buffers: shared hit=4
InitPlan 3 (returns $1)
-> Limit (cost=0.43..0.46 rows=1 width=8) (actual time=0.138..0.139 rows=1 loops=1)
Buffers: shared hit=4
-> Index Only Scan using index_payments_on_user_id on payments payments_2 (cost=0.43..155102.74 rows=4858092 width=8) (actual time=0.137..0.138 rows=1 loops=1)
Index Cond: (user_id IS NOT NULL)
Heap Fetches: 0
Buffers: shared hit=4
-> WorkTable Scan on t t_1 (cost=0.00..5.12 rows=10 width=8) (actual time=0.008..0.008 rows=1 loops=237029)
Filter: (user_id IS NOT NULL)
Rows Removed by Filter: 0
Buffers: shared hit=864277
SubPlan 2
-> Result (cost=0.48..0.49 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=237028)
Buffers: shared hit=864277
InitPlan 1 (returns $3)
-> Limit (cost=0.43..0.48 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=237028)
Buffers: shared hit=864277
-> Index Only Scan using index_payments_on_user_id on payments payments_1 (cost=0.43..80786.25 rows=1619364 width=8) (actual time=0.007..0.007 rows=1 loops=237028)
Index Cond: ((user_id IS NOT NULL) AND (user_id > t_1.user_id))
Heap Fetches: 46749
Buffers: shared hit=864277
-> Nested Loop (cost=214.94..21498.23 rows=100 width=32) (actual time=0.475..10794.535 rows=167333 loops=1)
Buffers: shared hit=6396757
-> CTE Scan on t (cost=0.00..2.02 rows=100 width=8) (actual time=0.145..1998.788 rows=237028 loops=1)
Filter: (user_id IS NOT NULL)
Rows Removed by Filter: 1
Buffers: shared hit=864281
-> Limit (cost=214.94..214.94 rows=1 width=24) (actual time=0.037..0.037 rows=1 loops=237028)
Buffers: shared hit=5532476
-> Sort (cost=214.94..215.37 rows=172 width=24) (actual time=0.036..0.036 rows=1 loops=237028)
Sort Key: payments.finished_at DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=5532476
-> Index Scan using index_payments_on_user_id on payments (cost=0.43..214.08 rows=172 width=24) (actual time=0.003..0.034 rows=15 loops=237028)
Index Cond: (user_id = t.user_id)
Filter: (success AND ((action)::text = 'deposit'::text) AND ((currency)::text = 'RUB'::text))
Rows Removed by Filter: 6
Buffers: shared hit=5532473
-> Hash (cost=75.00..75.00 rows=6000 width=4) (actual time=2.255..2.255 rows=6000 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 275kB
-> Values Scan on "VALUES" (cost=0.00..75.00 rows=6000 width=4) (actual time=0.004..1.206 rows=6000 loops=1)
Planning time: 7.029 ms
Execution time: 10846.774 ms
For this query:
SELECT DISTINCT ON (user_id)
p.user_id,
p.finished_at as last_deposit_date,
(CASE WHEN p.currency = 'RUB' THEN p.amount_cents END) as last_deposit_amount_cents
FROM payments p JOIN
(VALUES (5),( 22), (26) --~6000 values
) v(user_id)
USING (user_id)
WHERE p.action = 'deposit' AND
p.success = 't' AND
p.currency = 'RUB'
ORDER BY p.user_id, p.finished_at DESC;
I don't fully understand the CASE expression, because the WHERE clause already filters out all other currencies, so the condition is always true.
That said, I would expect an index on (action, success, currency, user_id, finished_at desc) to be helpful.
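The suggested index, sketched as DDL (the index name is made up; `DESC` on the last column matters for matching the `ORDER BY ... finished_at DESC`):

```sql
-- Hypothetical index name; columns follow the suggestion above.
CREATE INDEX idx_payments_deposit_lookup
    ON payments (action, success, currency, user_id, finished_at DESC);

-- A partial index is a possible alternative if deposits in RUB are a small
-- fraction of the table:
-- CREATE INDEX idx_payments_rub_deposits
--     ON payments (user_id, finished_at DESC)
--     WHERE action = 'deposit' AND success AND currency = 'RUB';
```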