SQL (Postgres): Optimal Number of Joins

This query runs on PostgreSQL 12. It has 8 joins and takes approximately 5 seconds.
Query 1
select *
from "public"."products" "P"
inner join "system"."categories" "C" on "C"."id" = "P"."id_category"
inner join "public"."businesses" "E" on "E"."id" = "P"."id_business"
left join "public"."product_files" "pf" on "pf"."id_product" = "P"."id"
left join "system"."files" "f" on "f"."name" = "pf"."img_code"
left join "public"."product_variations" "pv" on ("pv"."id_product" = "P"."id" and "pv"."status" <> 'Deleted')
left join "public"."product_stocks" "ps" on ("ps"."id_product_variation" = "pv"."id" and "ps"."status" <> 'Deleted')
left join "public"."product_stocks" "pps" on ("pps"."id_product" = "P"."id" and "pps"."status" <> 'Deleted')
inner join search_products( array['tires'], 8, 1, 'es') "search" on search.id = "P"."id"
where "P"."status" <> 'Deleted'
Postgres Query EXPLAIN(ANALYZE, BUFFERS) for Query 1
Merge Join (cost=112948.60..121805.61 rows=4996 width=1145) (actual time=2003.599..2426.892 rows=40 loops=1)
Merge Cond: ("P".id = search.id)
Buffers: shared hit=760531, temp read=16912 written=18837
-> Merge Left Join (cost=112888.52..120950.73 rows=287945 width=1105) (actual time=1607.013..2093.722 rows=380961 loops=1)
Merge Cond: ("P".id = pf.id_product)
Buffers: shared hit=752079, temp read=15561 written=15606
-> Merge Left Join (cost=16288.22..19167.29 rows=57631 width=771) (actual time=165.803..271.662 rows=76193 loops=1)
Merge Cond: ("P".id = pps.id_product)
Buffers: shared hit=3820, temp read=2706 written=2733
-> Merge Left Join (cost=16287.81..16577.01 rows=57631 width=686) (actual time=165.787..217.878 rows=56921 loops=1)
Merge Cond: ("P".id = pv.id_product)
Buffers: shared hit=2058, temp read=2706 written=2733
-> Sort (cost=14888.93..15033.01 rows=57631 width=514) (actual time=156.825..175.154 rows=56920 loops=1)
Sort Key: "P".id
Sort Method: external merge Disk: 21840kB
Buffers: shared hit=1430, temp read=2706 written=2733
-> Hash Join (cost=43.49..2484.49 rows=57631 width=514) (actual time=0.266..64.052 rows=57631 loops=1)
Hash Cond: ("P".id_business = "E".id)
Buffers: shared hit=1430
-> Hash Join (cost=37.81..2322.14 rows=57631 width=374) (actual time=0.214..39.402 rows=57631 loops=1)
Hash Cond: ("P".id_category = "C".id)
Buffers: shared hit=1427
-> Seq Scan on products "P" (cost=0.00..2132.41 rows=57631 width=252) (actual time=0.009..12.754 rows=57631 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 2
Buffers: shared hit=1412
-> Hash (cost=25.14..25.14 rows=1014 width=122) (actual time=0.201..0.201 rows=1014 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 124kB
Buffers: shared hit=15
-> Seq Scan on categories "C" (cost=0.00..25.14 rows=1014 width=122) (actual time=0.007..0.078 rows=1014 loops=1)
Buffers: shared hit=15
-> Hash (cost=4.19..4.19 rows=119 width=140) (actual time=0.047..0.048 rows=119 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 29kB
Buffers: shared hit=3
-> Seq Scan on businesses "E" (cost=0.00..4.19 rows=119 width=140) (actual time=0.013..0.024 rows=119 loops=1)
Buffers: shared hit=3
-> Sort (cost=1398.88..1399.05 rows=70 width=172) (actual time=8.956..8.958 rows=3 loops=1)
Sort Key: pv.id_product
Sort Method: quicksort Memory: 43kB
Buffers: shared hit=628
-> Hash Right Join (cost=4.58..1396.73 rows=70 width=172) (actual time=8.853..8.912 rows=70 loops=1)
Hash Cond: (ps.id_product_variation = pv.id)
Buffers: shared hit=628
-> Seq Scan on product_stocks ps (cost=0.00..1259.35 rows=50589 width=85) (actual time=0.009..7.030 rows=50595 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 73
Buffers: shared hit=626
-> Hash (cost=3.70..3.70 rows=70 width=87) (actual time=0.048..0.049 rows=70 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 16kB
Buffers: shared hit=2
-> Seq Scan on product_variations pv (cost=0.00..3.70 rows=70 width=87) (actual time=0.020..0.039 rows=70 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 66
Buffers: shared hit=2
-> Index Scan using product_stocks_id_product_id_product_variation_id_location_key on product_stocks pps (cost=0.41..1819.97 rows=50589 width=85) (actual time=0.013..17.822 rows=49924 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 1
Buffers: shared hit=1762
-> Materialize (cost=96600.25..98040.03 rows=287955 width=334) (actual time=1441.203..1613.160 rows=380961 loops=1)
Buffers: shared hit=748259, temp read=12855 written=12873
-> Sort (cost=96600.25..97320.14 rows=287955 width=334) (actual time=1441.198..1567.183 rows=284596 loops=1)
Sort Key: pf.id_product
Sort Method: external merge Disk: 102840kB
Buffers: shared hit=748259, temp read=12855 written=12873
-> Merge Left Join (cost=0.84..44546.48 rows=287955 width=334) (actual time=0.021..1013.742 rows=287955 loops=1)
Merge Cond: ((pf.img_code)::text = (f.name)::text)
Buffers: shared hit=748259
-> Index Scan using product_files_pkey on product_files pf (cost=0.42..10516.05 rows=287955 width=66) (actual time=0.005..184.173 rows=287955 loops=1)
Buffers: shared hit=289884
-> Index Scan using files_pkey on files f (cost=0.42..29304.42 rows=455180 width=268) (actual time=0.005..338.206 rows=455178 loops=1)
Buffers: shared hit=458375
-> Sort (cost=60.08..62.58 rows=1000 width=40) (actual time=313.554..313.558 rows=36 loops=1)
Sort Key: search.id
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=8452, temp read=1351 written=3231
-> Function Scan on search_products search (cost=0.25..10.25 rows=1000 width=40) (actual time=313.544..313.545 rows=8 loops=1)
Buffers: shared hit=8452, temp read=1351 written=3231
Planning Time: 2.632 ms
Execution Time: 2440.414 ms
While trying to optimize the query, I added the joins one at a time to see where the problem was. After trying many permutations of the join order, I realized that from the 7th join onward, Postgres apparently stops finding the best way to run the query. When I remove any one join (it doesn't matter which), the query takes about 300 ms.
Query 2
select *
from "public"."products" "P"
inner join "system"."categories" "C" on "C"."id" = "P"."id_category"
left join "public"."product_files" "pf" on "pf"."id_product" = "P"."id"
left join "system"."files" "f" on "f"."name" = "pf"."img_code"
left join "public"."product_variations" "pv" on ("pv"."id_product" = "P"."id" and "pv"."status" <> 'Deleted')
left join "public"."product_stocks" "ps" on ("ps"."id_product_variation" = "pv"."id" and "ps"."status" <> 'Deleted')
left join "public"."product_stocks" "pps" on ("pps"."id_product" = "P"."id" and "pps"."status" <> 'Deleted')
inner join search_products( array['tires'], 8, 1, 'es') "search" on search.id = "P"."id"
where "P"."status" <> 'Deleted'
Postgres Query EXPLAIN(ANALYZE, BUFFERS) for Query 2
Nested Loop Left Join (cost=1365.30..6482.09 rows=4996 width=1005) (actual time=349.888..350.399 rows=40 loops=1)
Buffers: shared hit=9339, temp read=1351 written=3231
-> Nested Loop Left Join (cost=1364.88..3893.89 rows=4996 width=737) (actual time=349.866..349.957 rows=40 loops=1)
Buffers: shared hit=9179, temp read=1351 written=3231
-> Nested Loop Left Join (cost=1364.46..3250.90 rows=1000 width=671) (actual time=349.857..349.899 rows=8 loops=1)
Buffers: shared hit=9147, temp read=1351 written=3231
-> Hash Join (cost=1364.04..2759.11 rows=1000 width=586) (actual time=349.839..349.853 rows=8 loops=1)
Hash Cond: ("P".id_category = "C".id)
Buffers: shared hit=9119, temp read=1351 written=3231
-> Hash Right Join (cost=1326.23..2718.65 rows=1000 width=464) (actual time=349.566..349.574 rows=8 loops=1)
Hash Cond: (pv.id_product = "P".id)
Buffers: shared hit=9104, temp read=1351 written=3231
-> Hash Right Join (cost=4.58..1396.73 rows=70 width=172) (actual time=8.953..9.013 rows=70 loops=1)
Hash Cond: (ps.id_product_variation = pv.id)
Buffers: shared hit=628
-> Seq Scan on product_stocks ps (cost=0.00..1259.35 rows=50589 width=85) (actual time=0.008..7.060 rows=50595 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 73
Buffers: shared hit=626
-> Hash (cost=3.70..3.70 rows=70 width=87) (actual time=0.047..0.048 rows=70 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 16kB
Buffers: shared hit=2
-> Seq Scan on product_variations pv (cost=0.00..3.70 rows=70 width=87) (actual time=0.015..0.033 rows=70 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 66
Buffers: shared hit=2
-> Hash (cost=1309.15..1309.15 rows=1000 width=292) (actual time=340.542..340.543 rows=8 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
Buffers: shared hit=8476, temp read=1351 written=3231
-> Nested Loop (cost=0.54..1309.15 rows=1000 width=292) (actual time=340.505..340.535 rows=8 loops=1)
Buffers: shared hit=8476, temp read=1351 written=3231
-> Function Scan on search_products search (cost=0.25..10.25 rows=1000 width=40) (actual time=340.483..340.485 rows=8 loops=1)
Buffers: shared hit=8452, temp read=1351 written=3231
-> Index Scan using products_pkey on products "P" (cost=0.29..1.30 rows=1 width=252) (actual time=0.005..0.005 rows=1 loops=8)
Index Cond: (id = search.id)
Filter: ((status)::text <> 'Deleted'::text)
Buffers: shared hit=24
-> Hash (cost=25.14..25.14 rows=1014 width=122) (actual time=0.268..0.268 rows=1014 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 124kB
Buffers: shared hit=15
-> Seq Scan on categories "C" (cost=0.00..25.14 rows=1014 width=122) (actual time=0.012..0.110 rows=1014 loops=1)
Buffers: shared hit=15
-> Index Scan using product_stocks_id_product_id_product_variation_id_location_key on product_stocks pps (cost=0.41..0.47 rows=2 width=85) (actual time=0.005..0.005 rows=1 loops=8)
Index Cond: (id_product = "P".id)
Filter: ((status)::text <> 'Deleted'::text)
Buffers: shared hit=28
-> Index Scan using idx_product_files_product on product_files pf (cost=0.42..0.59 rows=5 width=66) (actual time=0.004..0.005 rows=5 loops=8)
Index Cond: (id_product = "P".id)
Buffers: shared hit=32
-> Index Scan using files_pkey on files f (cost=0.42..0.52 rows=1 width=268) (actual time=0.010..0.010 rows=1 loops=40)
Index Cond: ((name)::text = (pf.img_code)::text)
Buffers: shared hit=160
Planning Time: 2.581 ms
Execution Time: 350.525 ms
Is there an article that explains this behavior, and how can I fix it?

That is because join_collapse_limit has a default value of 8. The optimizer considers all join orders only for the first 8 tables; the rest are joined in the order written. The rationale is to keep planning time reasonably short, since it grows exponentially with the number of tables.
Options:
increase the parameter
figure out a good join order and rewrite the query to join in that order
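This is a rough illustration of why planning time explodes. Query 1 lists 9 relations: products, categories, businesses, product_files, files, product_variations, product_stocks twice, and the search_products function. That is one more than the default join_collapse_limit of 8, while Query 2 lists exactly 8.

The sketch below only counts candidate join orders. PostgreSQL's planner actually uses dynamic programming over subsets of relations rather than enumerating every tree. Treat these numbers as the size of the search space, not as the work the planner really does.

```python
from math import factorial

def left_deep_orders(n: int) -> int:
    """Join orders for n relations if only left-deep trees are considered (all permutations)."""
    return factorial(n)

def bushy_trees(n: int) -> int:
    """Join trees for n relations when any tree shape and either operand order are allowed."""
    return factorial(2 * (n - 1)) // factorial(n - 1)

# 8 = default join_collapse_limit, 9 = relations in Query 1
for n in (8, 9):
    print(f"{n} relations: {left_deep_orders(n):,} left-deep orders, {bushy_trees(n):,} bushy trees")
# 8 relations: 40,320 left-deep orders, 17,297,280 bushy trees
# 9 relations: 362,880 left-deep orders, 518,918,400 bushy trees
```

To try the first option for a single session, run SET join_collapse_limit = 10; before Query 1. The value 10 is an arbitrary choice above its 9 relations.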

Related

PostgreSQL query performance drops from 4 seconds to 16 minutes just by adding one filter criterion

I wrote a simple query that involves 2 views.
Using a self-join on "contratti_gas_attivi" (which contains no information about offers, settled or not), I need to find the supplies that don't have a settled offer but for which an offer exists for the previous end-user at the same supply point, whose supply ended the day before the new end-user's started. Settled offers are defined in "offerte_valide_dettaglio", which contains ONLY the information for supplies where an offer is settled.
The following query returns almost 50 rows in 4 to 6 seconds (which I don't consider fast, but it doesn't need to run frequently). However, it returns all the supplies that have no offer associated with them, and only 3 of those have a previous settled offer.
select ovd2.cod_offerta, ovd2.id_contratto, ovd2.id_offerta, cga.*, cga2.*
from contratti_gas_attivi cga
left join offerte_valide_dettaglio ovd on ovd.id_contratto = cga.id
left join contratti_gas_attivi cga2 on cga.cod_pdr = cga2.cod_pdr and cga.data_inizio = (cga2.data_fine + '1 day'::interval)::date
left join offerte_valide_dettaglio ovd2 on ovd2.id_contratto = cga2.id
left join crm.customer_offers co on co.id = ovd2.id_offerta
where cga.tipo_inizio_contratto = 'VOLTURA' and ovd.id_offerta is null;
So I just added "and ovd2.cod_offerta is not null" at the end to filter out the other 47 rows, and now the query takes almost 16 minutes to complete, just from adding one filter condition!
How is that possible? The query plan gets completely messed up.
Before the last filter:
Nested Loop Left Join (cost=4472.03..6947.83 rows=1 width=251) (actual time=3052.330..4196.372 rows=54 loops=1)
-> Nested Loop Left Join (cost=3000.74..4218.60 rows=1 width=112) (actual time=1652.151..1655.099 rows=49 loops=1)
Join Filter: ((cc2.piva)::text = (cg.piva_cc)::text)
Rows Removed by Join Filter: 2349
Filter: (cc2.data_replaced IS NULL)
Rows Removed by Filter: 3
-> Nested Loop Left Join (cost=3000.74..4215.59 rows=1 width=94) (actual time=1652.133..1653.982 rows=49 loops=1)
Join Filter: ((cc.piva)::text = (cg.piva_cliente)::text)
Rows Removed by Join Filter: 2349
Filter: ((cc.data_replaced IS NULL) AND (cc.data_deleted IS NULL) AND (cc.data_deleted IS NULL))
Rows Removed by Filter: 3
-> Hash Right Join (cost=3000.74..4212.58 rows=1 width=76) (actual time=1652.098..1652.268 rows=49 loops=1)
Hash Cond: (ocg.id_contratto = cg.id)
Filter: (ocg.id_offerta IS NULL)
Rows Removed by Filter: 2352
-> Hash Left Join (cost=1457.68..2561.97 rows=40966 width=16) (actual time=1479.736..1501.558 rows=39890 loops=1)
Hash Cond: (ocg.id_contratto = cg_1.id)
-> Seq Scan on offerte_contratti_gas ocg (cost=0.00..950.66 rows=40966 width=16) (actual time=0.033..8.184 rows=39890 loops=1)
-> Hash (cost=1457.66..1457.66 rows=1 width=8) (actual time=1479.633..1479.635 rows=36517 loops=1)
Buckets: 65536 (originally 1024) Batches: 1 (originally 1) Memory Usage: 1939kB
-> Nested Loop Left Join (cost=0.00..1457.66 rows=1 width=8) (actual time=0.080..1466.558 rows=36517 loops=1)
Join Filter: ((cc2_1.piva)::text = (cg_1.piva_cc)::text)
Rows Removed by Join Filter: 1747843
Filter: (cc2_1.data_replaced IS NULL)
Rows Removed by Filter: 4973
-> Nested Loop Left Join (cost=0.00..1454.65 rows=1 width=20) (actual time=0.052..743.117 rows=36517 loops=1)
Join Filter: ((cc_1.piva)::text = (cg_1.piva_cliente)::text)
Rows Removed by Join Filter: 1747827
Filter: ((cc_1.data_replaced IS NULL) AND (cc_1.data_deleted IS NULL) AND (cc_1.data_deleted IS NULL))
Rows Removed by Filter: 4989
-> Seq Scan on contratti_gas cg_1 (cost=0.00..1451.64 rows=1 width=32) (actual time=0.024..13.750 rows=36517 loops=1)
Filter: ((delated_at IS NULL) AND (replaced_at IS NULL))
Rows Removed by Filter: 47
-> Seq Scan on controparti_commerciali cc_1 (cost=0.00..2.45 rows=45 width=20) (actual time=0.001..0.005 rows=49 loops=36517)
-> Seq Scan on controparti_commerciali cc2_1 (cost=0.00..2.45 rows=45 width=16) (actual time=0.001..0.005 rows=49 loops=36517)
-> Hash (cost=1543.05..1543.05 rows=1 width=76) (actual time=145.878..145.878 rows=2104 loops=1)
Buckets: 4096 (originally 1024) Batches: 1 (originally 1) Memory Usage: 253kB
-> Seq Scan on contratti_gas cg (cost=0.00..1543.05 rows=1 width=76) (actual time=0.035..144.183 rows=2104 loops=1)
Filter: ((delated_at IS NULL) AND (replaced_at IS NULL) AND ((tipo_inizio_contratto)::text = 'VOLTURA'::text))
Rows Removed by Filter: 34460
-> Seq Scan on controparti_commerciali cc (cost=0.00..2.45 rows=45 width=38) (actual time=0.002..0.006 rows=49 loops=49)
-> Seq Scan on controparti_commerciali cc2 (cost=0.00..2.45 rows=45 width=34) (actual time=0.001..0.006 rows=49 loops=49)
-> Hash Right Join (cost=1471.29..2729.21 rows=1 width=139) (actual time=51.523..51.845 rows=1 loops=49)
Hash Cond: (ocg_1.id_contratto = cg_2.id)
-> Hash Left Join (cost=1457.68..2561.97 rows=40966 width=27) (actual time=28.165..48.297 rows=39890 loops=49)
Hash Cond: (ocg_1.id_contratto = cg_3.id)
-> Seq Scan on offerte_contratti_gas ocg_1 (cost=0.00..950.66 rows=40966 width=27) (actual time=0.002..7.578 rows=39890 loops=49)
-> Hash (cost=1457.66..1457.66 rows=1 width=8) (actual time=1379.882..1379.883 rows=36517 loops=1)
Buckets: 65536 (originally 1024) Batches: 1 (originally 1) Memory Usage: 1939kB
-> Nested Loop Left Join (cost=0.00..1457.66 rows=1 width=8) (actual time=0.041..1368.418 rows=36517 loops=1)
Join Filter: ((cc2_3.piva)::text = (cg_3.piva_cc)::text)
Rows Removed by Join Filter: 1747843
Filter: (cc2_3.data_replaced IS NULL)
Rows Removed by Filter: 4973
-> Nested Loop Left Join (cost=0.00..1454.65 rows=1 width=20) (actual time=0.026..696.145 rows=36517 loops=1)
Join Filter: ((cc_3.piva)::text = (cg_3.piva_cliente)::text)
Rows Removed by Join Filter: 1747827
Filter: ((cc_3.data_replaced IS NULL) AND (cc_3.data_deleted IS NULL) AND (cc_3.data_deleted IS NULL))
Rows Removed by Filter: 4989
-> Seq Scan on contratti_gas cg_3 (cost=0.00..1451.64 rows=1 width=32) (actual time=0.013..12.852 rows=36517 loops=1)
Filter: ((delated_at IS NULL) AND (replaced_at IS NULL))
Rows Removed by Filter: 47
-> Seq Scan on controparti_commerciali cc_3 (cost=0.00..2.45 rows=45 width=20) (actual time=0.001..0.005 rows=49 loops=36517)
-> Seq Scan on controparti_commerciali cc2_3 (cost=0.00..2.45 rows=45 width=16) (actual time=0.001..0.005 rows=49 loops=36517)
-> Hash (cost=13.60..13.60 rows=1 width=112) (actual time=0.460..0.460 rows=1 loops=49)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Hash Right Join (cost=10.97..13.60 rows=1 width=112) (actual time=0.439..0.449 rows=1 loops=49)
Hash Cond: ((cc2_2.piva)::text = (cg_2.piva_cc)::text)
Filter: (cc2_2.data_replaced IS NULL)
Rows Removed by Filter: 0
-> Seq Scan on controparti_commerciali cc2_2 (cost=0.00..2.45 rows=45 width=34) (actual time=0.002..0.006 rows=49 loops=49)
-> Hash (cost=10.96..10.96 rows=1 width=94) (actual time=0.427..0.427 rows=1 loops=49)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Hash Right Join (cost=8.33..10.96 rows=1 width=94) (actual time=0.405..0.417 rows=1 loops=49)
Hash Cond: ((cc_2.piva)::text = (cg_2.piva_cliente)::text)
Filter: ((cc_2.data_replaced IS NULL) AND (cc_2.data_deleted IS NULL) AND (cc_2.data_deleted IS NULL))
Rows Removed by Filter: 0
-> Seq Scan on controparti_commerciali cc_2 (cost=0.00..2.45 rows=45 width=38) (actual time=0.003..0.008 rows=49 loops=49)
-> Hash (cost=8.31..8.31 rows=1 width=76) (actual time=0.391..0.391 rows=1 loops=49)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Index Scan using pk_codpdr_datainizio_iserror on contratti_gas cg_2 (cost=0.29..8.31 rows=1 width=76) (actual time=0.382..0.384 rows=1 loops=49)
Index Cond: ((cod_pdr)::text = (cg.cod_pdr)::text)
Filter: ((delated_at IS NULL) AND (replaced_at IS NULL) AND (cg.data_inizio = ((data_fine + '1 day'::interval))::date))
Rows Removed by Filter: 2
Planning Time: 5.997 ms
Execution Time: 4196.992 ms
After the last filter:
Nested Loop (cost=4461.49..6786.31 rows=1 width=251) (actual time=483943.882..1016665.139 rows=43 loops=1)
Join Filter: (((cg_2.cod_pdr)::text = (cg.cod_pdr)::text) AND (((cg_2.data_fine + '1 day'::interval))::date = cg.data_inizio))
Rows Removed by Join Filter: 1954567
-> Nested Loop Left Join (cost=1460.75..2567.69 rows=1 width=139) (actual time=1416.580..5211.218 rows=39890 loops=1)
-> Hash Join (cost=1457.68..2561.97 rows=1 width=139) (actual time=1416.502..1649.064 rows=39890 loops=1)
Hash Cond: (ocg_1.id_contratto = cg_2.id)
-> Seq Scan on offerte_contratti_gas ocg_1 (cost=0.00..950.66 rows=40966 width=27) (actual time=0.030..31.748 rows=39890 loops=1)
Filter: (cod_offerta IS NOT NULL)
-> Hash (cost=1457.66..1457.66 rows=1 width=112) (actual time=1416.000..1419.044 rows=36517 loops=1)
Buckets: 32768 (originally 1024) Batches: 2 (originally 1) Memory Usage: 3841kB
-> Nested Loop Left Join (cost=0.00..1457.66 rows=1 width=112) (actual time=0.072..1383.415 rows=36517 loops=1)
Join Filter: ((cc2_2.piva)::text = (cg_2.piva_cc)::text)
Rows Removed by Join Filter: 1747843
Filter: (cc2_2.data_replaced IS NULL)
Rows Removed by Filter: 4973
-> Nested Loop Left Join (cost=0.00..1454.65 rows=1 width=94) (actual time=0.043..704.608 rows=36517 loops=1)
Join Filter: ((cc_2.piva)::text = (cg_2.piva_cliente)::text)
Rows Removed by Join Filter: 1747827
Filter: ((cc_2.data_replaced IS NULL) AND (cc_2.data_deleted IS NULL) AND (cc_2.data_deleted IS NULL))
Rows Removed by Filter: 4989
-> Seq Scan on contratti_gas cg_2 (cost=0.00..1451.64 rows=1 width=76) (actual time=0.016..14.948 rows=36517 loops=1)
Filter: ((delated_at IS NULL) AND (replaced_at IS NULL))
Rows Removed by Filter: 47
-> Seq Scan on controparti_commerciali cc_2 (cost=0.00..2.45 rows=45 width=38) (actual time=0.001..0.005 rows=49 loops=36517)
-> Seq Scan on controparti_commerciali cc2_2 (cost=0.00..2.45 rows=45 width=34) (actual time=0.001..0.005 rows=49 loops=36517)
-> Hash Right Join (cost=3.08..5.71 rows=1 width=8) (actual time=0.061..0.073 rows=1 loops=39890)
Hash Cond: ((cc2_3.piva)::text = (cg_3.piva_cc)::text)
Filter: (cc2_3.data_replaced IS NULL)
Rows Removed by Filter: 0
-> Seq Scan on controparti_commerciali cc2_3 (cost=0.00..2.45 rows=45 width=16) (actual time=0.001..0.005 rows=49 loops=39890)
-> Hash (cost=3.06..3.06 rows=1 width=20) (actual time=0.048..0.048 rows=1 loops=39890)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Hash Right Join (cost=0.44..3.06 rows=1 width=20) (actual time=0.027..0.036 rows=1 loops=39890)
Hash Cond: ((cc_3.piva)::text = (cg_3.piva_cliente)::text)
Filter: ((cc_3.data_replaced IS NULL) AND (cc_3.data_deleted IS NULL) AND (cc_3.data_deleted IS NULL))
Rows Removed by Filter: 0
-> Seq Scan on controparti_commerciali cc_3 (cost=0.00..2.45 rows=45 width=20) (actual time=0.001..0.005 rows=49 loops=39890)
-> Hash (cost=0.42..0.42 rows=1 width=32) (actual time=0.018..0.018 rows=1 loops=39890)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Index Scan using uq_id on contratti_gas cg_3 (cost=0.29..0.42 rows=1 width=32) (actual time=0.013..0.013 rows=1 loops=39890)
Index Cond: (id = ocg_1.id_contratto)
Filter: ((delated_at IS NULL) AND (replaced_at IS NULL))
-> Nested Loop Left Join (cost=3000.74..4218.60 rows=1 width=112) (actual time=23.552..25.302 rows=49 loops=39890)
Join Filter: ((cc2.piva)::text = (cg.piva_cc)::text)
Rows Removed by Join Filter: 2349
Filter: (cc2.data_replaced IS NULL)
Rows Removed by Filter: 3
-> Nested Loop Left Join (cost=3000.74..4215.59 rows=1 width=94) (actual time=23.533..24.446 rows=49 loops=39890)
Join Filter: ((cc.piva)::text = (cg.piva_cliente)::text)
Rows Removed by Join Filter: 2349
Filter: ((cc.data_replaced IS NULL) AND (cc.data_deleted IS NULL) AND (cc.data_deleted IS NULL))
Rows Removed by Filter: 3
-> Hash Right Join (cost=3000.74..4212.58 rows=1 width=76) (actual time=23.493..23.561 rows=49 loops=39890)
Hash Cond: (ocg.id_contratto = cg.id)
Filter: (ocg.id_offerta IS NULL)
Rows Removed by Filter: 2352
-> Hash Left Join (cost=1457.68..2561.97 rows=40966 width=16) (actual time=0.034..19.423 rows=39890 loops=39890)
Hash Cond: (ocg.id_contratto = cg_1.id)
-> Seq Scan on offerte_contratti_gas ocg (cost=0.00..950.66 rows=40966 width=16) (actual time=0.002..7.327 rows=39890 loops=39890)
-> Hash (cost=1457.66..1457.66 rows=1 width=8) (actual time=1236.665..1238.748 rows=36517 loops=1)
Buckets: 65536 (originally 1024) Batches: 1 (originally 1) Memory Usage: 1939kB
-> Nested Loop Left Join (cost=0.00..1457.66 rows=1 width=8) (actual time=0.052..1230.365 rows=36517 loops=1)
Join Filter: ((cc2_1.piva)::text = (cg_1.piva_cc)::text)
Rows Removed by Join Filter: 1747843
Filter: (cc2_1.data_replaced IS NULL)
Rows Removed by Filter: 4973
-> Nested Loop Left Join (cost=0.00..1454.65 rows=1 width=20) (actual time=0.035..625.434 rows=36517 loops=1)
Join Filter: ((cc_1.piva)::text = (cg_1.piva_cliente)::text)
Rows Removed by Join Filter: 1747827
Filter: ((cc_1.data_replaced IS NULL) AND (cc_1.data_deleted IS NULL) AND (cc_1.data_deleted IS NULL))
Rows Removed by Filter: 4989
-> Seq Scan on contratti_gas cg_1 (cost=0.00..1451.64 rows=1 width=32) (actual time=0.012..11.876 rows=36517 loops=1)
Filter: ((delated_at IS NULL) AND (replaced_at IS NULL))
Rows Removed by Filter: 47
-> Seq Scan on controparti_commerciali cc_1 (cost=0.00..2.45 rows=45 width=20) (actual time=0.001..0.004 rows=49 loops=36517)
-> Seq Scan on controparti_commerciali cc2_1 (cost=0.00..2.45 rows=45 width=16) (actual time=0.001..0.004 rows=49 loops=36517)
-> Hash (cost=1543.05..1543.05 rows=1 width=76) (actual time=8.053..8.053 rows=2104 loops=1)
Buckets: 4096 (originally 1024) Batches: 1 (originally 1) Memory Usage: 253kB
-> Seq Scan on contratti_gas cg (cost=0.00..1543.05 rows=1 width=76) (actual time=0.014..7.482 rows=2104 loops=1)
Filter: ((delated_at IS NULL) AND (replaced_at IS NULL) AND ((tipo_inizio_contratto)::text = 'VOLTURA'::text))
Rows Removed by Filter: 34460
-> Seq Scan on controparti_commerciali cc (cost=0.00..2.45 rows=45 width=38) (actual time=0.001..0.004 rows=49 loops=1954610)
-> Seq Scan on controparti_commerciali cc2 (cost=0.00..2.45 rows=45 width=34) (actual time=0.001..0.004 rows=49 loops=1954610)
Planning Time: 7.946 ms
Execution Time: 1016666.928 ms
I am not very familiar with reading query plans, but I can see that adding the last clause completely ruins the "fast" query plan that was used before! Any advice?
edit:
Holy moly, jjanes, that worked wonderfully!
I read that VACUUM permanently removes obsolete (deleted or updated) tuples and that VACUUM ANALYZE also updates the statistics used by the planner. Still, I'm a bit confused about why that worked! The database is local and I'm the only one working on it. I did some CRUD operations on the underlying tables, but going from 4 seconds to 16 minutes just because I hadn't run VACUUM seems strange to me. Now the full query takes only 187 ms! Do you have any resources besides the official documentation on VACUUM that would help me understand it better? Many thanks; unfortunately I cannot upvote the answer yet.
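Several nodes in the "before" plan estimate rows=1 where the actual count is in the tens of thousands. For example, Seq Scan on contratti_gas cg_1 is estimated at 1 row but returns 36517. With estimates that low, nested loops look almost free. In the "after" plan, the extra filter was enough for the planner to pick a plan that re-runs a large subtree 39890 times (loops=39890). Running ANALYZE, which VACUUM ANALYZE includes, refreshes the statistics behind those estimates.

The sketch below is a hypothetical helper, not part of PostgreSQL. It scans EXPLAIN ANALYZE text and lists nodes whose estimated and actual row counts differ by at least a chosen factor. It assumes the plain-text plan format shown in this post.

```python
import re
from typing import List, Tuple

# Matches one EXPLAIN ANALYZE node line, e.g.
# "-> Seq Scan on t (cost=0.00..1.00 rows=1 width=8) (actual time=0.1..0.2 rows=500 loops=1)"
NODE_RE = re.compile(
    r"^\s*(?:->\s*)?(?P<node>.+?)\s+"
    r"\(cost=[\d.]+\.\.[\d.]+ rows=(?P<est>\d+) width=\d+\)\s+"
    r"\(actual time=[\d.]+\.\.[\d.]+ rows=(?P<act>\d+) loops=\d+\)"
)

def flag_misestimates(plan_text: str, factor: float = 10.0) -> List[Tuple[str, int, int]]:
    """Return (node, estimated_rows, actual_rows) for nodes whose estimate is off by >= factor.

    Note: PostgreSQL reports actual rows as a per-loop average, so this compares per-loop counts.
    """
    flagged = []
    for line in plan_text.splitlines():
        m = NODE_RE.match(line)
        if not m:
            continue  # Filter:, Buffers:, Hash Cond: and similar detail lines
        est, act = int(m["est"]), int(m["act"])
        ratio = max(est, act) / max(min(est, act), 1)  # avoid dividing by zero
        if ratio >= factor:
            flagged.append((m["node"], est, act))
    return flagged

sample = """\
-> Seq Scan on contratti_gas cg_1 (cost=0.00..1451.64 rows=1 width=32) (actual time=0.024..13.750 rows=36517 loops=1)
-> Seq Scan on controparti_commerciali cc_1 (cost=0.00..2.45 rows=45 width=20) (actual time=0.001..0.005 rows=49 loops=36517)
"""
for node, est, act in flag_misestimates(sample):
    print(f"{node}: estimated {est}, actual {act}")
# Seq Scan on contratti_gas cg_1: estimated 1, actual 36517
```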

Postgres query becomes extremely slow with one single change

In this query, among other things, I need discussions.client_first_responded_at to be converted to the given time zone, which is different for every row. If I replace the reference to users.time_zone_offset with a static '-06:00'::INTERVAL, the query executes within a second, but with the dynamic reference to users.time_zone_offset it takes ~120 seconds.
What am I missing?
SELECT client_id
FROM programs
INNER JOIN users ON users.id = programs.coach_id
WHERE programs.id IN (
  SELECT COALESCE(discussions.parent_id, calls.parent_id) AS program_id
  FROM categorizations
  LEFT OUTER JOIN discussions ON discussions.id = categorizations.categorizable_id AND categorizations.categorizable_type = 'Discussion'
  LEFT OUTER JOIN calls ON calls.id = categorizations.categorizable_id AND categorizations.categorizable_type = 'Call'
  WHERE categorizations.categorizable_type = 'Discussion' OR categorizations.categorizable_type = 'Call'
    AND COALESCE(
      discussions.client_first_responded_at::timestamptz AT TIME ZONE users.time_zone_offset::INTERVAL,
      calls.start_time::timestamptz
    ) BETWEEN '2020-09-01' AND '2020-09-30'
);
Update: query plan with the dynamic users.time_zone_offset reference:
Hash Join (cost=1.47..250840.61 rows=3542 width=8) (actual time=35.419..61810.027 rows=2266 loops=1)
Hash Cond: (programs.coach_id = users.id)
Join Filter: (SubPlan 1)
Rows Removed by Join Filter: 4821
Buffers: shared hit=3087792
-> Seq Scan on programs (cost=0.00..368.84 rows=7084 width=20) (actual time=0.008..11.890 rows=7087 loops=1)
Buffers: shared hit=298
-> Hash (cost=1.21..1.21 rows=21 width=40) (actual time=0.015..0.015 rows=21 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on users (cost=0.00..1.21 rows=21 width=40) (actual time=0.004..0.008 rows=21 loops=1)
Buffers: shared hit=1
SubPlan 1
-> Hash Right Join (cost=777.19..2168.01 rows=7477 width=4) (actual time=0.007..7.887 rows=7516 loops=7087)
Hash Cond: (discussions.id = categorizations.categorizable_id)
Join Filter: ((categorizations.categorizable_type)::text = 'Discussion'::text)
Rows Removed by Join Filter: 3473
Filter: (((categorizations.categorizable_type)::text = 'Discussion'::text) OR (((categorizations.categorizable_type)::text = 'Call'::text) AND (COALESCE((timezone((users.time_zone_offset)::interval, (discussions.client_first_responded_at)::timestamp with time zone))::timestamp with time zone, (calls.start_time)::timestamp with time zone) >= '2020-09-01 00:00:00+02'::timestamp with time zone) AND (COALESCE((timezone((users.time_zone_offset)::interval, (discussions.client_first_responded_at)::timestamp with time zone))::timestamp with time zone, (calls.start_time)::timestamp with time zone) <= '2020-09-30 00:00:00+02'::timestamp with time zone)))
Rows Removed by Filter: 2578
Buffers: shared hit=3087493
-> Seq Scan on discussions (cost=0.00..751.46 rows=18746 width=20) (actual time=0.003..1.842 rows=15668 loops=7087)
Buffers: shared hit=3087252
-> Hash (cost=647.28..647.28 rows=10393 width=25) (actual time=13.355..13.355 rows=13090 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 763kB
Buffers: shared hit=241
-> Hash Left Join (cost=300.68..647.28 rows=10393 width=25) (actual time=3.476..10.083 rows=13090 loops=1)
Hash Cond: (categorizations.categorizable_id = calls.id)
Join Filter: ((categorizations.categorizable_type)::text = 'Call'::text)
Rows Removed by Join Filter: 2303
Buffers: shared hit=241
-> Seq Scan on categorizations (cost=0.00..319.32 rows=10393 width=13) (actual time=0.006..2.455 rows=13090 loops=1)
Filter: (((categorizable_type)::text = 'Discussion'::text) OR ((categorizable_type)::text = 'Call'::text))
Buffers: shared hit=123
-> Hash (cost=199.19..199.19 rows=8119 width=20) (actual time=3.462..3.462 rows=8119 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 509kB
Buffers: shared hit=118
-> Seq Scan on calls (cost=0.00..199.19 rows=8119 width=20) (actual time=0.005..1.766 rows=8119 loops=1)
Buffers: shared hit=118
Planning Time: 0.643 ms
Execution Time: 61811.118 ms
Query plan with the static '-06:00'::INTERVAL:
Hash Join (cost=3537.47..4289.98 rows=4825 width=8) (actual time=111.743..122.572 rows=4371 loops=1)
Hash Cond: (programs.coach_id = users.id)
Buffers: shared hit=1931
-> Hash Join (cost=3535.95..4273.46 rows=4825 width=12) (actual time=111.627..120.637 rows=4371 loops=1)
Hash Cond: (programs.id = COALESCE(discussions.parent_id, calls.parent_id))
Buffers: shared hit=1930
-> Seq Scan on programs (cost=0.00..658.50 rows=9650 width=20) (actual time=0.011..4.880 rows=9656 loops=1)
Buffers: shared hit=562
-> Hash (cost=3350.83..3350.83 rows=14810 width=8) (actual time=111.573..111.580 rows=4371 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 282kB
Buffers: shared hit=1368
-> HashAggregate (cost=3202.73..3350.83 rows=14810 width=8) (actual time=107.495..109.868 rows=4371 loops=1)
Group Key: COALESCE(discussions.parent_id, calls.parent_id)
Buffers: shared hit=1368
-> Hash Left Join (cost=1688.78..3165.70 rows=14810 width=8) (actual time=34.242..97.144 rows=19275 loops=1)
Hash Cond: (categorizations.categorizable_id = calls.id)
Join Filter: ((categorizations.categorizable_type)::text = 'Call'::text)
Rows Removed by Join Filter: 6869
Filter: (((categorizations.categorizable_type)::text = 'Discussion'::text) OR (((categorizations.categorizable_type)::text = 'Call'::text) AND (COALESCE((timezone('-06:00:00'::interval, (discussions.client_first_responded_at)::timestamp with time zone))::timestamp with time zone, (calls.start_time)::timestamp with time zone) >= '2020-09-01 00:00:00+00'::timestamp with time zone) AND (COALESCE((timezone('-06:00:00'::interval, (discussions.client_first_responded_at)::timestamp with time zone))::timestamp with time zone, (calls.start_time)::timestamp with time zone) <= '2020-09-30 00:00:00+00'::timestamp with time zone)))
Rows Removed by Filter: 7674
Buffers: shared hit=1368
-> Hash Right Join (cost=1046.24..2467.57 rows=21181 width=25) (actual time=21.970..62.581 rows=26949 loops=1)
Hash Cond: (discussions.id = categorizations.categorizable_id)
Join Filter: ((categorizations.categorizable_type)::text = 'Discussion'::text)
Rows Removed by Join Filter: 8461
Buffers: shared hit=1057
-> Seq Scan on discussions (cost=0.00..997.71 rows=31771 width=20) (actual time=0.183..8.541 rows=32064 loops=1)
Buffers: shared hit=680
-> Hash (cost=781.47..781.47 rows=21181 width=13) (actual time=21.720..21.721 rows=26949 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 1444kB
Buffers: shared hit=377
-> Seq Scan on categorizations (cost=0.00..781.47 rows=21181 width=13) (actual time=0.012..11.631 rows=26949 loops=1)
Filter: (((categorizable_type)::text = 'Discussion'::text) OR ((categorizable_type)::text = 'Call'::text))
Buffers: shared hit=377
-> Hash (cost=458.35..458.35 rows=14735 width=20) (actual time=12.205..12.206 rows=14720 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 876kB
Buffers: shared hit=311
-> Seq Scan on calls (cost=0.00..458.35 rows=14735 width=20) (actual time=0.016..6.914 rows=14720 loops=1)
Buffers: shared hit=311
-> Hash (cost=1.23..1.23 rows=23 width=8) (actual time=0.065..0.066 rows=23 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on users (cost=0.00..1.23 rows=23 width=8) (actual time=0.048..0.054 rows=23 loops=1)
Buffers: shared hit=1
Planning Time: 1.010 ms
Execution Time: 123.511 ms

How to optimize an SQL query with OR in the WHERE clause to filter on values from different columns?

General context: I have a Postgres database that includes a knowledge system (SKOS). The knowledge system lives in a separate schema. SKOS describes all the links between concepts and how they are related, and also includes notes, notations, labels, and so on. Each concept is mapped into this skos schema.
When querying the actual business logic, which is about Cascades, CascadeSteps, Technologies, Additives, ..., I want to include the translations for each of these concepts in the response.
I'm adding views to simplify the requests.
In one of the queries, this part is the bottleneck (it is only an excerpt of the full query):
SELECT trcs.cascade,
"Translations".language,
jsonb_object_agg("Translations".code, "Translations".skos) AS resource
FROM "CascadeStep" trcs
LEFT JOIN "CascadeUnit" trcu
ON trcu.cuid = trcs.from
LEFT JOIN "TechnologyAdditive" trta on trta.technology = trcu.technology
LEFT JOIN skos."Translations"
ON (skos."Translations".id = trcs.from) OR
(skos."Translations".id = trcs.flow) OR
(skos."Translations".id = trcs.product) OR
(skos."Translations".id = trta.additive)
WHERE skos."Translations".notation = 'SimpleNotation'
GROUP BY trcs.cascade, "Translations".language
The cause seems to be the OR in the filter. When I analyze the query, I get:
HASH JOIN | 29 %
Left join on ((trcs."from")::bpchar = (trcu.cuid)::bpchar)
Hash Join Node joins two record sets by hashing one of them (using a Hash Scan).
Node Type Hash Join
Parent Relationship Outer
Parallel Aware false
Join Type Left
Startup Cost 43.02
Total Cost 191.84
Plan Rows 1620
Plan Width 72
Actual Startup Time 0.468
Actual Total Time 12.035
Actual Rows 598
Actual Loops 1
Output trcs.cascade,"Translations".language,"Translations".code,"Translations".skos
Inner Unique false
Hash Cond ((trcs."from")::bpchar = (trcu.cuid)::bpchar)
Filter (("Translations".id = (trcs."from")::bpchar) OR ("Translations".id = (trcs.flow)::bpchar) OR ("Translations".id = (trcs.product)::bpchar) OR ("Translations".id = (trta.additive)::bpchar))
I tried different strategies for that condition, but none of them produced better results:
... on skos."Translations".id = any(array[trcs.from, trcs.flow, trcs.product, trta.additive])
is about 1.25x slower.
... on skos."Translations".id in (select trcs.from union all select trcs.flow union all select trcs.product union all select trta.additive)
is 2-5x slower
... on skos."Translations".id in (select trcs.from
union all
select trcs.flow
union all
select trcs.product
union all
select trta.additive
from "TechnologyAdditive" trta
left join "CascadeUnit" trcu on trcu.cuid=trcs.from
where trta.technology = trcu.technology)
is 10-30x slower
The actual question: is it possible to efficiently use different columns in the where clause of the query, to replace the ORs, or can the query be rewritten more efficiently?
EDIT: added the relevant output of EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS):
Hash (cost=200.55..200.55 rows=966 width=61) (actual time=13.467..13.467 rows=21 loops=1)
Output: i18n.language, i18n.resource, i18n.cascade
Buckets: 1024 Batches: 1 Memory Usage: 29kB
Buffers: shared hit=12
-> Subquery Scan on i18n (cost=194.27..200.55 rows=966 width=61) (actual time=13.340..13.450 rows=21 loops=1)
Output: i18n.language, i18n.resource, i18n.cascade
Buffers: shared hit=12
-> HashAggregate (cost=194.27..197.66 rows=966 width=61) (actual time=13.339..13.447 rows=21 loops=1)
Output: trcs.cascade, "Translations".language, jsonb_object_agg("Translations".code, "Translations".skos)
Group Key: trcs.cascade, "Translations".language
Buffers: shared hit=12
-> Hash Left Join (cost=43.02..191.84 rows=1620 width=72) (actual time=0.452..12.193 rows=598 loops=1)
Output: trcs.cascade, "Translations".language, "Translations".code, "Translations".skos
Hash Cond: ((trcs."from")::bpchar = (trcu.cuid)::bpchar)
Filter: (("Translations".id = (trcs."from")::bpchar) OR ("Translations".id = (trcs.flow)::bpchar) OR ("Translations".id = (trcs.product)::bpchar) OR ("Translations".id = (trta.additive)::bpchar))
Rows Removed by Filter: 19802
Buffers: shared hit=12
-> Nested Loop (cost=23.32..55.40 rows=7498 width=176) (actual time=0.344..4.213 rows=13040 loops=1)
Output: trcs.cascade, trcs."from", trcs.flow, trcs.product, "Translations".language, "Translations".code, "Translations".skos, "Translations".id
Buffers: shared hit=10
-> Seq Scan on public."CascadeStep" trcs (cost=0.00..5.49 rows=163 width=104) (actual time=0.004..0.076 rows=163 loops=1)
Output: trcs.cascade, trcs."from", trcs.flow, trcs.product
Buffers: shared hit=5
-> Materialize (cost=23.32..23.69 rows=46 width=72) (actual time=0.002..0.009 rows=80 loops=163)
Output: "Translations".language, "Translations".code, "Translations".skos, "Translations".id
Buffers: shared hit=5
-> Subquery Scan on "Translations" (cost=23.32..23.64 rows=46 width=72) (actual time=0.337..0.520 rows=80 loops=1)
Output: "Translations".language, "Translations".code, "Translations".skos, "Translations".id
Buffers: shared hit=5
-> Group (cost=23.32..23.50 rows=46 width=117) (actual time=0.336..0.508 rows=80 loops=1)
Output: l.concept_nss, l.language, n."notationType", n.value, jsonb_strip_nulls(jsonb_build_object('prefLabel', l.label, 'definition', v.object)), v.object
Group Key: l.concept_nss, n."notationType", l.language, v.object, n.value
Buffers: shared hit=5
-> Sort (cost=23.32..23.34 rows=46 width=108) (actual time=0.329..0.337 rows=80 loops=1)
Output: l.concept_nss, l.language, n."notationType", n.value, v.object, l.label
Sort Key: l.concept_nss, l.language, v.object, n.value
Sort Method: quicksort Memory: 37kB
Buffers: shared hit=5
-> Hash Right Join (cost=10.41..23.07 rows=46 width=108) (actual time=0.155..0.182 rows=80 loops=1)
Output: l.concept_nss, l.language, n."notationType", n.value, v.object, l.label
Hash Cond: ((n_1.concept_nss = n.concept_nss) AND (v.lang = l.language))
Buffers: shared hit=5
-> Hash Join (cost=7.53..20.18 rows=4 width=96) (actual time=0.025..0.031 rows=12 loops=1)
Output: n_1.concept_nss, v.object, v.lang
Inner Unique: true
Hash Cond: (v.note_id = n_1.id)
Buffers: shared hit=3
-> Seq Scan on skos."NoteValue" v (cost=0.00..12.25 rows=750 width=80) (actual time=0.003..0.004 rows=12 loops=1)
Output: v.lang, v.object, v.note_id
Buffers: shared hit=1
-> Hash (cost=7.52..7.52 rows=4 width=48) (actual time=0.016..0.016 rows=12 loops=1)
Output: n_1.id, n_1.concept_nss
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=2
-> Bitmap Heap Scan on skos."Note" n_1 (cost=2.04..7.52 rows=4 width=48) (actual time=0.010..0.012 rows=12 loops=1)
Output: n_1.id, n_1.concept_nss
Recheck Cond: (n_1.type = 'skos_definition'::text)
Heap Blocks: exact=1
Buffers: shared hit=2
-> Bitmap Index Scan on "Note_property_concept_key" (cost=0.00..2.04 rows=4 width=0) (actual time=0.006..0.006 rows=13 loops=1)
Index Cond: (n_1.type = 'skos_definition'::text)
Buffers: shared hit=1
-> Hash (cost=2.69..2.69 rows=46 width=102) (actual time=0.124..0.124 rows=80 loops=1)
Output: n."notationType", n.value, n.concept_nss, l.concept_nss, l.language, l.label
Buckets: 1024 Batches: 1 Memory Usage: 19kB
Buffers: shared hit=2
-> Hash Right Join (cost=1.34..2.69 rows=46 width=102) (actual time=0.051..0.093 rows=80 loops=1)
Output: n."notationType", n.value, n.concept_nss, l.concept_nss, l.language, l.label
Hash Cond: (l.concept_nss = n.concept_nss)
Buffers: shared hit=2
-> Seq Scan on skos."ConceptPrefLabel" l (cost=0.00..1.16 rows=53 width=52) (actual time=0.004..0.010 rows=80 loops=1)
Output: l.label, l.language, l.concept_nss
Buffers: shared hit=1
-> Hash (cost=1.18..1.18 rows=46 width=50) (actual time=0.041..0.041 rows=80 loops=1)
Output: n."notationType", n.value, n.concept_nss
Buckets: 1024 Batches: 1 Memory Usage: 15kB
Buffers: shared hit=1
-> Seq Scan on skos."Notation" n (cost=0.00..1.18 rows=46 width=50) (actual time=0.004..0.018 rows=80 loops=1)
Output: n."notationType", n.value, n.concept_nss
Filter: (n."notationType" = 'NutricasCode'::text)
Rows Removed by Filter: 5
Buffers: shared hit=1
-> Hash (cost=18.85..18.85 rows=242 width=58) (actual time=0.102..0.102 rows=137 loops=1)
Output: trcu.cuid, trta.additive
Buckets: 1024 Batches: 1 Memory Usage: 19kB
Buffers: shared hit=2
-> Hash Left Join (cost=15.72..18.85 rows=242 width=58) (actual time=0.021..0.067 rows=137 loops=1)
Output: trcu.cuid, trta.additive
Hash Cond: ((trcu.technology)::bpchar = (trta.technology)::bpchar)
Buffers: shared hit=2
-> Seq Scan on public."CascadeUnit" trcu (cost=0.00..1.17 rows=55 width=52) (actual time=0.006..0.013 rows=82 loops=1)
Output: trcu.cuid, trcu.technology
Buffers: shared hit=1
-> Hash (cost=12.64..12.64 rows=880 width=64) (actual time=0.009..0.009 rows=16 loops=1)
Output: trta.technology, trta.additive
Buckets: 1024 Batches: 1 Memory Usage: 10kB
Buffers: shared hit=1
-> Seq Scan on public."TechnologyAdditive" trta (cost=0.00..12.64 rows=880 width=64) (actual time=0.003..0.004 rows=16 loops=1)
Output: trta.technology, trta.additive
Buffers: shared hit=1
Use multiple left joins. I don't think you even need aggregation:
SELECT . . .
FROM "CascadeStep" trcs LEFT JOIN
"CascadeUnit" trcu
ON trcu.cuid = trcs.from LEFT JOIN
"TechnologyAdditive" trta
ON trta.technology = trcu.technology LEFT JOIN
skos."Translations" tfrom
ON tfrom.id = trcs.from and tfrom.notation = 'SimpleNotation' LEFT JOIN
skos."Translations" tflow
ON tflow.id = trcs.flow and tflow.notation = 'SimpleNotation' LEFT JOIN
skos."Translations" tproduct
ON tproduct.id = trcs.product and tproduct.notation = 'SimpleNotation' LEFT JOIN
skos."Translations" tadditive
ON tadditive.id = trta.additive and tadditive.notation = 'SimpleNotation'
WHERE tfrom.id is not null or tflow.id is not null or
tproduct.id is not null or tadditive.id is not null;
I'm not sure exactly what your results look like. You may need to unpivot the data (using a lateral join) to get your exact results. But this should fix your performance problem.
Note: This assumes that each join to Translations has one match. This seems reasonable but you might need to take duplicates into account if it is not the case.
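The answer mentions unpivoting the data as an alternative. Below is a minimal sketch of that idea, using Python's built-in sqlite3 as a stand-in for Postgres. SQLite has no LATERAL, so the four candidate id columns are unpivoted with UNION ALL. Each branch yields one (cascade, id) pair, so Translations is matched on a single equality, which a hash join or an index can serve, instead of on an OR of four columns. The tables are tiny hypothetical copies of the question's tables with invented data. The check only shows that both forms return the same rows; it says nothing about speed in Postgres. In Postgres, the unpivot could instead be written as CROSS JOIN LATERAL (VALUES (trcs.from), (trcs.flow), (trcs.product), (trta.additive)) AS ids(id).

```python
import sqlite3

# Hypothetical miniature copies of the question's tables (invented data).
# "cascade" and "from" are quoted because they are SQL keywords.
SETUP = """
CREATE TABLE CascadeStep ("cascade" TEXT, "from" TEXT, flow TEXT, product TEXT);
CREATE TABLE CascadeUnit (cuid TEXT, technology TEXT);
CREATE TABLE TechnologyAdditive (technology TEXT, additive TEXT);
CREATE TABLE Translations (id TEXT, language TEXT, code TEXT, notation TEXT);
INSERT INTO CascadeStep VALUES ('c1', 'u1', 'f1', 'p1'), ('c2', 'u2', 'f2', 'p2');
INSERT INTO CascadeUnit VALUES ('u1', 't1'), ('u2', 't2');
INSERT INTO TechnologyAdditive VALUES ('t1', 'a1');
INSERT INTO Translations VALUES
  ('u1', 'en', 'U1', 'SimpleNotation'),
  ('f1', 'en', 'F1', 'SimpleNotation'),
  ('p2', 'de', 'P2', 'SimpleNotation'),
  ('a1', 'en', 'A1', 'SimpleNotation'),
  ('f2', 'en', 'F2', 'OtherNotation');
"""

# Original shape: one join whose condition ORs four columns together.
# (The question's WHERE on notation already makes its LEFT JOIN an inner join.)
OR_QUERY = """
SELECT DISTINCT s."cascade", t.language, t.code
FROM CascadeStep s
LEFT JOIN CascadeUnit u ON u.cuid = s."from"
LEFT JOIN TechnologyAdditive ta ON ta.technology = u.technology
JOIN Translations t
  ON t.id = s."from" OR t.id = s.flow OR t.id = s.product OR t.id = ta.additive
WHERE t.notation = 'SimpleNotation'
"""

# Unpivoted shape: turn the four id columns into rows, then join on one equality.
UNPIVOT_QUERY = """
SELECT DISTINCT ids."cascade", t.language, t.code
FROM (
  SELECT "cascade", "from" AS id FROM CascadeStep
  UNION ALL SELECT "cascade", flow FROM CascadeStep
  UNION ALL SELECT "cascade", product FROM CascadeStep
  UNION ALL
  SELECT s."cascade", ta.additive
  FROM CascadeStep s
  JOIN CascadeUnit u ON u.cuid = s."from"
  JOIN TechnologyAdditive ta ON ta.technology = u.technology
) AS ids
JOIN Translations t ON t.id = ids.id
WHERE t.notation = 'SimpleNotation'
"""


def run(query):
    """Run a query against a fresh in-memory copy of the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = set(conn.execute(query).fetchall())
    conn.close()
    return rows


if __name__ == "__main__":
    assert run(OR_QUERY) == run(UNPIVOT_QUERY)
    # prints [('c1', 'en', 'A1'), ('c1', 'en', 'F1'), ('c1', 'en', 'U1'), ('c2', 'de', 'P2')]
    print(sorted(run(UNPIVOT_QUERY)))
```

DISTINCT is used because the unpivot can produce the same translation twice when two columns hold the same id. In the question's real query, jsonb_object_agg keyed on code would absorb such duplicates as well.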

Strange pgsql query performance

I have a relation like this
R ( EDGE INTEGER, DIHEDRAL INTEGER, FACE INTEGER , VALENCY INTEGER)
I tested it twice, once with a 64-row table R and once with a 128-row table R, but the smaller case takes much longer than the larger one (159268 ms versus 1263 ms total runtime). The EXPLAIN output is below (explain.depesz.com shows an error when I paste it). Could anyone help me figure out why? Thanks.
Plan for 64 rows:
HashAggregate (cost=260.16..260.17 rows=1 width=12) (actual rows=64 loops=1)
-> Nested Loop (cost=89.44..260.15 rows=1 width=12) (actual rows=256 loops=1)
Join Filter: ((f1.face < f2.face) AND (e3.edge <> f1.edge) AND (e4.edge <> e3.edge) AND (f1.edge = f2.edge) AND (f1.face = e3.face))
Rows Removed by Join Filter: 142606080
-> Nested Loop (cost=41.91..167.59 rows=1 width=16) (actual rows=557056 loops=1)
-> Nested Loop (cost=41.91..125.71 rows=1 width=8) (actual rows=256 loops=1)
Join Filter: ((e5.edge <> f2.edge) AND (e5.edge <> e2.edge) AND (e2.face = e5.face))
Rows Removed by Join Filter: 1113856
-> Hash Join (cost=41.91..83.73 rows=1 width=16) (actual rows=512 loops=1)
Hash Cond: (f2.face = e2.face)
Join Filter: (e2.edge <> f2.edge)
Rows Removed by Join Filter: 256
-> Seq Scan on r f2 (cost=0.00..41.76 rows=12 width=8) (actual rows=384 loops=1)
Filter: (valency = 3)
Rows Removed by Filter: 1920
-> Hash (cost=41.76..41.76 rows=12 width=8) (actual rows=2176 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 85kB
-> Seq Scan on r e2 (cost=0.00..41.76 rows=12 width=8) (actual rows=2176 loops=1)
Filter: (dihedral = 2)
Rows Removed by Filter: 128
-> Seq Scan on r e5 (cost=0.00..41.76 rows=12 width=8) (actual rows=2176 loops=512)
Filter: (dihedral = 2)
Rows Removed by Filter: 128
-> Seq Scan on r e3 (cost=0.00..41.76 rows=12 width=8) (actual rows=2176 loops=256)
Filter: (dihedral = 2)
Rows Removed by Filter: 128
-> Hash Join (cost=47.53..92.32 rows=11 width=16) (actual rows=256 loops=557056)
Hash Cond: (e4.face = f1.face)
Join Filter: (e4.edge <> f1.edge)
Rows Removed by Join Filter: 128
-> Seq Scan on r e4 (cost=0.00..36.01 rows=2301 width=8) (actual rows=2304 loops=557056)
-> Hash (cost=47.52..47.52 rows=1 width=8) (actual rows=128 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 5kB
-> Seq Scan on r f1 (cost=0.00..47.52 rows=1 width=8) (actual rows=128 loops=1)
Filter: ((valency = 3) AND (dihedral = 1))
Rows Removed by Filter: 2176
Total runtime: 159268.541 ms
(37 rows)
Plan for 128 rows:
HashAggregate (cost=501.28..501.29 rows=1 width=12) (actual rows=128 loops=1)
-> Nested Loop (cost=171.98..501.27 rows=2 width=12) (actual rows=512 loops=1)
Join Filter: ((e3.edge <> f1.edge) AND (e4.edge <> e3.edge) AND (f1.face = e3.face))
Rows Removed by Join Filter: 2227712
-> Seq Scan on r e3 (cost=0.00..80.31 rows=22 width=8) (actual rows=4352 loops=1)
Filter: (dihedral = 2)
Rows Removed by Filter: 256
-> Materialize (cost=171.98..420.08 rows=2 width=20) (actual rows=512 loops=4352)
-> Nested Loop (cost=171.98..420.07 rows=2 width=20) (actual rows=512 loops=1)
Join Filter: ((f1.face < f2.face) AND (f1.edge = f2.edge))
Rows Removed by Join Filter: 261632
-> Nested Loop (cost=80.59..242.23 rows=1 width=8) (actual rows=512 loops=1)
Join Filter: ((e5.edge <> f2.edge) AND (e5.edge <> e2.edge) AND (e2.face = e5.face))
Rows Removed by Join Filter: 4455936
-> Seq Scan on r e5 (cost=0.00..80.31 rows=22 width=8) (actual rows=4352 loops=1)
Filter: (dihedral = 2)
Rows Removed by Filter: 256
-> Materialize (cost=80.59..161.05 rows=2 width=16) (actual rows=1024 loops=4352)
-> Hash Join (cost=80.59..161.04 rows=2 width=16) (actual rows=1024 loops=1)
Hash Cond: (f2.face = e2.face)
Join Filter: (e2.edge <> f2.edge)
Rows Removed by Join Filter: 512
-> Seq Scan on r f2 (cost=0.00..80.31 rows=22 width=8) (actual rows=768 loops=1)
Filter: (valency = 3)
Rows Removed by Filter: 3840
-> Hash (cost=80.31..80.31 rows=22 width=8) (actual rows=4352 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 170kB
-> Seq Scan on r e2 (cost=0.00..80.31 rows=22 width=8) (actual rows=4352 loops=1)
Filter: (dihedral = 2)
Rows Removed by Filter: 256
-> Hash Join (cost=91.39..177.51 rows=22 width=16) (actual rows=512 loops=512)
Hash Cond: (e4.face = f1.face)
Join Filter: (e4.edge <> f1.edge)
Rows Removed by Join Filter: 256
-> Seq Scan on r e4 (cost=0.00..69.25 rows=4425 width=8) (actual rows=4608 loops=512)
-> Hash (cost=91.38..91.38 rows=1 width=8) (actual rows=256 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> Seq Scan on r f1 (cost=0.00..91.38 rows=1 width=8) (actual rows=256 loops=1)
Filter: ((valency = 3) AND (dihedral = 1))
Rows Removed by Filter: 4352
Total runtime: 1262.761 ms
(41 rows)
The query planner uses statistics (row counts, index sizes, etc.) to estimate the cheapest way to execute a query. If you bulk-insert rows and query immediately afterwards, the plan may be poor because these statistics can be out of date.
To make sure the planner makes informed choices, run ANALYZE (for example ANALYZE r;) before running your EXPLAIN query.
In your specific scenario, chances are the planner made a bad choice in the first case (the 64 rows) and a good one in the second case (the 128 rows). In the 64-row plan, the inner Hash Join on e4 and f1 is executed 557056 times, rescanning e4 on every loop, and the top-level join filter discards 142606080 rows. In the 128-row plan, the comparable Hash Join runs only 512 times.
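One way to check the stale-statistics explanation is to compare each plan node's estimated row count with its actual row count. Large gaps, such as rows=1 estimated versus 128 actual for the f1 scan, are the usual symptom. The sketch below is a minimal Python helper for that comparison. The name misestimates, the regular expression, and the 10x threshold are hypothetical choices for illustration, not a standard tool. It relies on the fact that both numbers in EXPLAIN ANALYZE output are per loop, so they can be compared directly.

```python
import re

# One EXPLAIN ANALYZE node line: the planner's estimate ("rows=" inside cost=...)
# followed by the measured per-loop row count. "time=..." is optional because
# the question's plans were captured without timing information.
NODE_RE = re.compile(
    r"\s*(?:->\s*)?(?P<node>[^(]+?)\s+"
    r"\(cost=[\d.]+\.\.[\d.]+ rows=(?P<est>\d+) width=\d+\)\s+"
    r"\(actual (?:time=[\d.]+\.\.[\d.]+ )?rows=(?P<act>\d+) loops=\d+\)"
)


def misestimates(plan_text, threshold=10.0):
    """Return (node, estimated, actual, ratio) for nodes off by >= threshold."""
    flagged = []
    for line in plan_text.splitlines():
        m = NODE_RE.match(line)
        if m is None:
            continue  # Filter:, Hash Cond:, Buckets: ... lines carry no estimate
        est, act = int(m["est"]), int(m["act"])
        # Symmetric ratio, so over- and under-estimates are both caught.
        ratio = max(est, act) / max(min(est, act), 1)
        if ratio >= threshold:
            flagged.append((m["node"], est, act, ratio))
    return flagged


# A few lines copied from the 64-row plan above.
SAMPLE_PLAN = """\
HashAggregate (cost=260.16..260.17 rows=1 width=12) (actual rows=64 loops=1)
-> Seq Scan on r f2 (cost=0.00..41.76 rows=12 width=8) (actual rows=384 loops=1)
Filter: (valency = 3)
-> Seq Scan on r e4 (cost=0.00..36.01 rows=2301 width=8) (actual rows=2304 loops=557056)
-> Seq Scan on r f1 (cost=0.00..47.52 rows=1 width=8) (actual rows=128 loops=1)
Filter: ((valency = 3) AND (dihedral = 1))
"""

if __name__ == "__main__":
    for node, est, act, ratio in misestimates(SAMPLE_PLAN):
        print(f"{node}: estimated {est}, actual {act} ({ratio:.0f}x)")
```

If the gaps remain after running ANALYZE, stale statistics are not the only cause of the poor estimates.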

QSqlQuery bindValue slow

I have a query like this
select Count(1) as Count, pt.Name as TypeName, pt.ID as TypeID, pc.ID as CatID,
o.Name as OffName, o.ID as OffID, pc.Color as Color, s.ID, s.ActType,
s.EndTime, pt.Size, pt.Price, pt.Unit, pt.OffID as ProdOffID
from sess s
inner join off o on o.id = s.offid
inner join act a on a.sessid = s.id
inner join prod p on p.tagid = a.prodid
inner join ProdType pt on pt.id = p.prodtypeid and pt.offid = p.Offid
left join prodcat pc on pc.id = pt.prodcatid and pc.offid = pt.offid
where s.offid = ? and s.acttype in (?, ?)
Group By pt.Name, pt.ID, pc.ID, o.Name,
o.ID, pc.Color, s.ID, s.ActType,
s.EndTime, pt.Size, pt.Price, pt.Unit, pt.OffID
If I use bindValue for the parameters, the code block below takes a long time (about 2 seconds):
QSqlQuery newQuery(db);
newQuery.prepare(queryString);
for (int parameterIndex=0;parameterIndex<values.count();parameterIndex++) {
newQuery.bindValue(parameterIndex,values[parameterIndex]);
}
newQuery.exec();
But if I substitute the values for the ?s directly in the query string and don't call bindValue, the code block below takes about 50 ms:
QSqlQuery newQuery(db);
newQuery.prepare(queryString);
newQuery.exec();
Is this normal? What makes this difference?
Note that these tables have B-tree indexes on their foreign keys.
I'm using Qt 4.7.4 compiled with VC2008 SP1; the database is PostgreSQL.
Answering my own question (thanks to Mat):
PostgreSQL optimizes this query's plan according to the parameter values. The prepared statement is planned generically, with $1, $2 and $3 in place of the actual values, which blocks this kind of optimization and gives this query plan:
GroupAggregate (cost=581209.52..615986.02 rows=695530 width=72) (actual time=4067.645..4069.321 rows=101 loops=1)
-> Sort (cost=581209.52..582948.35 rows=695530 width=72) (actual time=4067.637..4067.719 rows=1832 loops=1)
Sort Key: pt.name, pt.id, pc.id, o.name, o.id, pc.color, s.id, s.actiontype, s.endtime, pt.size, pt.price, pt.unit, pt.officeid
Sort Method: quicksort Memory: 276kB
-> Hash Join (cost=49529.53..456659.15 rows=695530 width=72) (actual time=765.864..4047.298 rows=1832 loops=1)
Hash Cond: ((a.productid)::text = (p.tagid)::text)
-> Hash Join (cost=10640.07..391699.07 rows=555317 width=48) (actual time=41.884..3236.878 rows=2197 loops=1)
Hash Cond: (a.sessionid = s.id)
-> Seq Scan on action a (cost=0.00..280038.20 rows=15274820 width=29) (actual time=0.026..1586.065 rows=15274820 loops=1)
-> Hash (cost=10603.35..10603.35 rows=2938 width=23) (actual time=0.787..0.787 rows=116 loops=1)
-> Nested Loop (cost=208.16..10603.35 rows=2938 width=23) (actual time=0.234..0.747 rows=116 loops=1)
-> Seq Scan on office o (cost=0.00..4.26 rows=1 width=7) (actual time=0.012..0.019 rows=1 loops=1)
Filter: (id = $1)
-> Bitmap Heap Scan on session s (cost=208.16..10569.70 rows=2938 width=20) (actual time=0.216..0.701 rows=116 loops=1)
Recheck Cond: (s.officeid = $1)
Filter: (s.actiontype = ANY (ARRAY[$2, $3]))
-> Bitmap Index Scan on idx_session_officeid (cost=0.00..207.43 rows=11075 width=0) (actual time=0.103..0.103 rows=862 loops=1)
Index Cond: (s.officeid = $1)
-> Hash (cost=32726.06..32726.06 rows=244592 width=74) (actual time=707.589..707.589 rows=195238 loops=1)
-> Merge Join (cost=26994.35..32726.06 rows=244592 width=74) (actual time=383.882..595.784 rows=195238 loops=1)
Merge Cond: ((p.officeid = pt.officeid) AND (p.producttypeid = pt.id))
-> Sort (cost=26468.63..26956.84 rows=195284 width=33) (actual time=376.428..476.264 rows=195284 loops=1)
Sort Key: p.officeid, p.producttypeid
Sort Method: external merge Disk: 8776kB
-> Seq Scan on product p (cost=0.00..3966.84 rows=195284 width=33) (actual time=0.031..40.185 rows=195284 loops=1)
-> Sort (cost=525.72..536.77 rows=4421 width=49) (actual time=7.447..23.291 rows=199050 loops=1)
Sort Key: pt.officeid, pt.id
Sort Method: quicksort Memory: 618kB
-> Hash Left Join (cost=15.15..258.02 rows=4421 width=49) (actual time=0.194..3.094 rows=4421 loops=1)
Hash Cond: ((pt.productcategoryid = pc.id) AND (pt.officeid = pc.officeid))
-> Seq Scan on producttype pt (cost=0.00..112.21 rows=4421 width=41) (actual time=0.008..0.412 rows=4421 loops=1)
-> Hash (cost=8.46..8.46 rows=446 width=16) (actual time=0.175..0.175 rows=446 loops=1)
-> Seq Scan on productcategory pc (cost=0.00..8.46 rows=446 width=16) (actual time=0.005..0.075 rows=446 loops=1)
Total runtime: 4073.490 ms
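But the ordinary query, with the values written inline, gets an optimized plan. Knowing the values, the planner estimates 4 matching session rows instead of 2938 and uses index scans on action and product instead of a sequential scan of the whole 15274820-row action table: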
HashAggregate (cost=14152.70..14164.53 rows=947 width=72) (actual time=38.517..38.555 rows=101 loops=1)
-> Hash Left Join (cost=247.52..14119.55 rows=947 width=72) (actual time=3.163..35.021 rows=1832 loops=1)
Hash Cond: ((pt.productcategoryid = pc.id) AND (pt.officeid = pc.officeid))
-> Hash Join (cost=232.37..14076.41 rows=947 width=64) (actual time=2.984..33.823 rows=1832 loops=1)
Hash Cond: ((p.producttypeid = pt.id) AND (p.officeid = pt.officeid))
-> Nested Loop (cost=53.85..13699.42 rows=756 width=31) (actual time=0.288..29.579 rows=1833 loops=1)
-> Nested Loop (cost=53.85..8111.65 rows=756 width=48) (actual time=0.222..2.292 rows=2197 loops=1)
-> Nested Loop (cost=53.85..6293.69 rows=4 width=23) (actual time=0.216..0.661 rows=116 loops=1)
-> Seq Scan on office o (cost=0.00..4.26 rows=1 width=7) (actual time=0.013..0.020 rows=1 loops=1)
Filter: (id = 1)
-> Bitmap Heap Scan on session s (cost=53.85..6289.39 rows=4 width=20) (actual time=0.196..0.613 rows=116 loops=1)
Recheck Cond: (s.officeid = 1)
Filter: (s.actiontype = ANY ('{0,2}'::integer[]))
-> Bitmap Index Scan on idx_session_officeid (cost=0.00..53.84 rows=2864 width=0) (actual time=0.099..0.099 rows=862 loops=1)
Index Cond: (s.officeid = 1)
-> Index Scan using idx_action_sessionid on action a (cost=0.00..452.13 rows=189 width=29) (actual time=0.004..0.010 rows=19 loops=116)
Index Cond: (a.sessionid = s.id)
-> Index Scan using product_pkey on product p (cost=0.00..7.38 rows=1 width=33) (actual time=0.011..0.011 rows=1 loops=2197)
Index Cond: ((p.tagid)::text = (a.productid)::text)
-> Hash (cost=112.21..112.21 rows=4421 width=41) (actual time=2.686..2.686 rows=4421 loops=1)
-> Seq Scan on producttype pt (cost=0.00..112.21 rows=4421 width=41) (actual time=0.003..1.169 rows=4421 loops=1)
-> Hash (cost=8.46..8.46 rows=446 width=16) (actual time=0.173..0.173 rows=446 loops=1)
-> Seq Scan on productcategory pc (cost=0.00..8.46 rows=446 width=16) (actual time=0.003..0.067 rows=446 loops=1)
Total runtime: 38.728 ms
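The estimates in the two plans suggest where the difference comes from. Below is a back-of-the-envelope check in Python. It assumes the join estimate is roughly (matching session rows) × (actions per session). The 189 actions per session is taken from the index scan estimate in the literal-value plan. Postgres's real join-size estimator is more involved than this simple product.

```python
ACTION_ROWS = 15_274_820     # rows in "action" (Seq Scan actual rows, generic plan)
ACTIONS_PER_SESSION = 189    # planner estimate from "Index Scan ... rows=189"

# Estimated matching "session" rows in each plan.
SESSIONS_GENERIC = 2938      # with $1, $2, $3 placeholders
SESSIONS_CUSTOM = 4          # with officeid = 1 and actiontype in (0, 2)


def estimated_action_rows(sessions):
    """Rough join estimate: each session contributes ACTIONS_PER_SESSION actions."""
    return sessions * ACTIONS_PER_SESSION


if __name__ == "__main__":
    for label, sessions in (("generic", SESSIONS_GENERIC), ("custom", SESSIONS_CUSTOM)):
        rows = estimated_action_rows(sessions)
        print(f"{label}: ~{rows} action rows, {rows / ACTION_ROWS:.4%} of the table")
```

The first product (555282) lands within a few dozen rows of the generic plan's rows=555317 estimate. The second (756) matches the custom plan's rows=756 exactly. At about 3.6% of a 15-million-row table, a sequential scan plus hash join looks competitive to the planner. At about 0.005%, index scans are the obvious choice, which is consistent with the two plans above.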