How to optimize an SQL query that JOINs many values?

I have a query like this, which joins against ~6000 values:
SELECT MAX(user_id) as user_id, SUM(sum_amount_win) as sum_amount_win
FROM (
SELECT
a1.user_id
,CASE When MAX(currency) = 'RUB' Then SUM(d1.amount_cents) END as sum_amount_win
FROM dense_balance_transactions as d1
JOIN accounts a1 ON a1.id = d1.account_id
JOIN (
VALUES (5),(22),(26) -- ~6000 values
) AS v(user_id) USING (user_id)
WHERE d1.created_at BETWEEN '2019-06-01 00:00:00' AND '2019-06-20 23:59:59'
AND d1.action='win'
GROUP BY a1.user_id, a1.currency
) as t
GROUP BY user_id
QUERY PLAN for query with many VALUES:
GroupAggregate (cost=266816.48..266816.54 rows=1 width=48) (actual time=5024.201..5102.633 rows=5745 loops=1)
Group Key: a1.user_id
Buffers: shared hit=12205927
-> GroupAggregate (cost=266816.48..266816.51 rows=1 width=44) (actual time=5024.185..5099.621 rows=5774 loops=1)
Group Key: a1.user_id, a1.currency
Buffers: shared hit=12205927
-> Sort (cost=266816.48..266816.49 rows=1 width=20) (actual time=5024.170..5041.840 rows=291122 loops=1)
Sort Key: a1.user_id, a1.currency
Sort Method: quicksort Memory: 35032kB
Buffers: shared hit=12205927
-> Gather (cost=214410.62..266816.47 rows=1 width=20) (actual time=292.828..5204.320 rows=291122 loops=1)
Workers Planned: 5
Workers Launched: 5
Buffers: shared hit=12205921
-> Nested Loop (cost=213410.62..265816.37 rows=1 width=20) (actual time=255.028..3939.300 rows=48520 loops=6)
Buffers: shared hit=12205921
-> Merge Join (cost=213410.19..214522.45 rows=1269 width=20) (actual time=253.545..274.872 rows=1136 loops=6)
Merge Cond: (a1.user_id = "*VALUES*".column1)
Buffers: shared hit=191958
-> Sort (cost=212958.66..213493.45 rows=213914 width=20) (actual time=251.991..263.828 rows=82468 loops=6)
Sort Key: a1.user_id
Sort Method: quicksort Memory: 24322kB
Buffers: shared hit=191916
-> Parallel Seq Scan on accounts a1 (cost=0.00..194020.14 rows=213914 width=20) (actual time=0.042..196.052 rows=179242 loops=6)
Buffers: shared hit=191881
-> Sort (cost=451.52..466.52 rows=6000 width=4) (actual time=1.547..2.429 rows=6037 loops=6)
Sort Key: "*VALUES*".column1
Sort Method: quicksort Memory: 474kB
Buffers: shared hit=42
-> Values Scan on "*VALUES*" (cost=0.00..75.00 rows=6000 width=4) (actual time=0.002..0.928 rows=6000 loops=6)
-> Index Scan using index_dense_balance_transactions_on_account_id on dense_balance_transactions d1 (cost=0.44..40.41 rows=1 width=16) (actual time=0.160..3.220 rows=43 loops=6816)
Index Cond: (account_id = a1.id)
Filter: ((created_at >= '2019-06-01 00:00:00'::timestamp without time zone) AND (created_at <= '2019-06-20 23:59:59'::timestamp without time zone) AND ((action)::text = 'win'::text))
Rows Removed by Filter: 1942
Buffers: shared hit=12013963
Planning time: 10.239 ms
Execution time: 5387.523 ms
I use PostgreSQL 10.8.0.
Is there any chance to speed up this query?

Related

Need to improve count performance in PostgreSQL for this query

I have this query in PostgreSQL:
SELECT COUNT("contacts"."id")
FROM "contacts"
INNER JOIN "phone_numbers" ON "phone_numbers"."id" = "contacts"."phone_number_id"
INNER JOIN "companies" ON "companies"."id" = "contacts"."company_id"
WHERE (
(
(
CAST("phone_numbers"."value" AS VARCHAR) ILIKE '%a%'
OR CAST("contacts"."first_name" AS VARCHAR) ILIKE '%a%'
)
OR CAST("contacts"."last_name" AS VARCHAR) ILIKE '%a%'
)
OR CAST("companies"."name" AS VARCHAR) ILIKE '%a%'
)
When I run the query, it takes 19 seconds. I need to improve the performance.
Note: I already have indexes on these columns.
EXPLAIN ANALYZE report
Finalize Aggregate (cost=209076.49..209076.54 rows=1 width=8) (actual time=6117.381..6646.477 rows=1 loops=1)
-> Gather (cost=209076.42..209076.48 rows=4 width=8) (actual time=6117.370..6646.473 rows=5 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Partial Aggregate (cost=209066.42..209066.47 rows=1 width=8) (actual time=5952.710..5952.723 rows=1 loops=5)
-> Hash Join (cost=137685.37..208438.42 rows=251200 width=8) (actual time=3007.055..5945.571 rows=39193 loops=5)
Hash Cond: (contacts.company_id = companies.id)
Join Filter: (((phone_numbers.value)::text ~~* '%as%'::text) OR ((contacts.first_name)::text ~~* '%as%'::text) OR ((contacts.last_name)::text ~~* '%as%'::text) OR ((companies.name)::text ~~* '%as%'::text))
Rows Removed by Join Filter: 763817
-> Parallel Hash Join (cost=137684.86..201964.34 rows=1003781 width=41) (actual time=3006.633..4596.987 rows=803010 loops=5)
Hash Cond: (contacts.phone_number_id = phone_numbers.id)
-> Parallel Seq Scan on contacts (cost=0.00..59316.85 rows=1003781 width=37) (actual time=11.032..681.124 rows=803010 loops=5)
-> Parallel Hash (cost=68914.22..68914.22 rows=1295458 width=20) (actual time=1632.770..1632.770 rows=803184 loops=5)
Buckets: 65536 Batches: 64 Memory Usage: 4032kB
-> Parallel Seq Scan on phone_numbers (cost=0.00..68914.22 rows=1295458 width=20) (actual time=10.780..1202.242 rows=803184 loops=5)
-> Hash (cost=0.30..0.30 rows=4 width=40) (actual time=0.258..0.258 rows=4 loops=5)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on companies (cost=0.00..0.30 rows=4 width=40) (actual time=0.247..0.248 rows=4 loops=5)
Planning Time: 1.895 ms
Execution Time: 6646.558 ms
Please help me with this performance issue.
I tried FUNCTION row_count_estimate (query text), but it does not give the exact count.
Solution tried:
I tried Robert's solution, and it took 16 seconds to run.
My query is:
SELECT Count(id) AS id
FROM (
SELECT contacts.id AS id
FROM contacts
WHERE (
contacts.last_name ilike '%as%')
OR (
contacts.last_name ilike '%as%')
UNION
SELECT contacts.id AS id
FROM contacts
WHERE contacts.phone_number_id IN
(
SELECT phone_numbers.id AS phone_number_id
FROM phone_numbers
WHERE phone_numbers.value ilike '%as%')
UNION
SELECT contacts.id AS id
FROM contacts
WHERE contacts.company_id IN
(
SELECT companies.id AS company_id
FROM companies
WHERE companies.name ilike '%as%' )) AS ID
Report:
Aggregate (cost=395890.08..395890.13 rows=1 width=8) (actual time=5942.601..5942.667 rows=1 loops=1)
-> Unique (cost=332446.76..337963.57 rows=1103362 width=8) (actual time=5929.800..5939.658 rows=101989 loops=1)
-> Sort (cost=332446.76..335205.17 rows=1103362 width=8) (actual time=5929.799..5933.823 rows=101989 loops=1)
Sort Key: contacts.id
Sort Method: external merge Disk: 1808kB
-> Append (cost=10.00..220843.02 rows=1103362 width=8) (actual time=1.158..5900.926 rows=101989 loops=1)
-> Gather (cost=10.00..61935.48 rows=99179 width=8) (actual time=1.158..569.412 rows=101989 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Parallel Seq Scan on contacts (cost=0.00..61826.30 rows=24795 width=8) (actual time=0.446..477.276 rows=20398 loops=5)
Filter: ((last_name)::text ~~* '%as%'::text)
Rows Removed by Filter: 782612
-> Nested Loop (cost=0.84..359.91 rows=402 width=8) (actual time=5292.088..5292.089 rows=0 loops=1)
-> Index Scan using idx_phone_value on phone_numbers (cost=0.41..64.13 rows=402 width=8) (actual time=5292.087..5292.087 rows=0 loops=1)
Index Cond: ((value)::text ~~* '%as%'::text)
Rows Removed by Index Recheck: 4015921
-> Index Scan using index_contacts_on_phone_number_id on contacts contacts_1 (cost=0.43..0.69 rows=1 width=16) (never executed)
Index Cond: (phone_number_id = phone_numbers.id)
-> Gather (cost=10.36..75795.48 rows=1003781 width=8) (actual time=26.298..26.331 rows=0 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Hash Join (cost=0.36..74781.70 rows=250945 width=8) (actual time=3.758..3.758 rows=0 loops=5)
Hash Cond: (contacts_2.company_id = companies.id)
-> Parallel Seq Scan on contacts contacts_2 (cost=0.00..59316.85 rows=1003781 width=16) (actual time=0.128..0.128 rows=1 loops=5)
-> Hash (cost=0.31..0.31 rows=1 width=8) (actual time=0.726..0.727 rows=0 loops=5)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on companies (cost=0.00..0.31 rows=1 width=8) (actual time=0.726..0.726 rows=0 loops=5)
Filter: ((name)::text ~~* '%as%'::text)
Rows Removed by Filter: 4
Planning Time: 0.846 ms
Execution Time: 5948.330 ms
I also tried the following:
EXPLAIN ANALYZE SELECT
count(id) AS id
FROM
(SELECT
contacts.id AS id
FROM
contacts
WHERE
(
position('as' in LOWER(last_name)) > 0
)
UNION
SELECT
contacts.id AS id
FROM
contacts
WHERE
EXISTS (
SELECT
1
FROM
phone_numbers
WHERE
(
position('as' in LOWER(phone_numbers.value)) > 0
)
AND (
contacts.phone_number_id = phone_numbers.id
)
)
UNION
SELECT
contacts.id AS id
FROM
contacts
WHERE
EXISTS (
SELECT
1
FROM
companies
WHERE
(
position('as' in LOWER(companies.name)) > 0
)
AND (
contacts.company_id = companies.id
)
)
UNION DISTINCT SELECT
contacts.id AS id
FROM
contacts
WHERE
(
position('as' in LOWER(first_name)) > 0
)
) AS ID;
Report
Aggregate (cost=1609467.66..1609467.71 rows=1 width=8) (actual time=1039.249..1039.330 rows=1 loops=1)
-> Unique (cost=1320886.03..1345980.09 rows=5018811 width=8) (actual time=999.363..1030.500 rows=195963 loops=1)
-> Sort (cost=1320886.03..1333433.06 rows=5018811 width=8) (actual time=999.362..1013.818 rows=198421 loops=1)
Sort Key: contacts.id
Sort Method: external merge Disk: 3520kB
-> Gather (cost=10.00..754477.62 rows=5018811 width=8) (actual time=0.581..941.210 rows=198421 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Parallel Append (cost=0.00..749448.80 rows=5018811 width=8) (actual time=290.521..943.736 rows=39684 loops=5)
-> Parallel Hash Join (cost=101469.35..164569.24 rows=334587 width=8) (actual time=724.841..724.843 rows=0 loops=2)
Hash Cond: (contacts.phone_number_id = phone_numbers.id)
-> Parallel Seq Scan on contacts (cost=0.00..59315.91 rows=1003762 width=16) (never executed)
-> Parallel Hash (cost=78630.16..78630.16 rows=431819 width=8) (actual time=723.735..723.735 rows=0 loops=2)
Buckets: 131072 Batches: 32 Memory Usage: 0kB
-> Parallel Seq Scan on phone_numbers (cost=0.00..78630.16 rows=431819 width=8) (actual time=723.514..723.514 rows=0 loops=2)
Filter: ("position"(lower((value)::text), 'as'::text) > 0)
Rows Removed by Filter: 2007960
-> Hash Join (cost=0.38..74780.48 rows=250940 width=8) (actual time=0.888..0.888 rows=0 loops=1)
Hash Cond: (contacts_1.company_id = companies.id)
-> Parallel Seq Scan on contacts contacts_1 (cost=0.00..59315.91 rows=1003762 width=16) (actual time=0.009..0.009 rows=1 loops=1)
-> Hash (cost=0.33..0.33 rows=1 width=8) (actual time=0.564..0.564 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on companies (cost=0.00..0.33 rows=1 width=8) (actual time=0.563..0.563 rows=0 loops=1)
Filter: ("position"(lower((name)::text), 'as'::text) > 0)
Rows Removed by Filter: 4
-> Parallel Seq Scan on contacts contacts_2 (cost=0.00..66844.13 rows=334588 width=8) (actual time=0.119..315.032 rows=20398 loops=5)
Filter: ("position"(lower((last_name)::text), 'as'::text) > 0)
Rows Removed by Filter: 782612
-> Parallel Seq Scan on contacts contacts_3 (cost=0.00..66844.13 rows=334588 width=8) (actual time=0.510..558.791 rows=32144 loops=3)
Filter: ("position"(lower((first_name)::text), 'as'::text) > 0)
Rows Removed by Filter: 1306206
Planning Time: 2.115 ms
Execution Time: 1040.620 ms

It's hard to help without access to your data, but let me try.
The EXPLAIN ANALYZE report shows that:
Your query doesn't use indexes. The full scan on the phone_numbers table took 1.202 seconds, and the one on the contacts table took 0.681 seconds.
"Rows Removed by Join Filter: 763817".
"Parallel Hash Join (cost=137684.86..201964.34 rows=1003781 width=41) (actual time=3006.633..4596.987 rows=803010 loops=5)". So this query joins ~800k rows and then filters out 763k of them.
Maybe you can reverse that: filter each table first, then join. This should speed things up (but that needs to be checked).
For example, try rewriting your query along these lines:
SELECT COUNT(id)
FROM
(
SELECT "contacts"."id"
FROM "contacts"
where <filters on contacts here>
union
SELECT "contacts"."id"
FROM "contacts"
where phone_number_id in ( select "phone_numbers"."id"
from "phone_numbers"
where <filters on phone_numbers here>
)
union
SELECT "contacts"."id"
FROM "contacts"
where company_id in ( select "companies"."id"
from "companies"
where <filters on companies here> )
) as B
Two indexes: one on column contacts.phone_number_id and another on contacts.company_id might help.
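A minimal sketch of those two indexes follows. The plans in the question suggest that an index named index_contacts_on_phone_number_id may already exist, so IF NOT EXISTS is used. The name index_contacts_on_company_id is hypothetical.

```sql
-- Index to support lookups of contacts by phone number.
-- Skipped if an index with this name already exists.
CREATE INDEX IF NOT EXISTS index_contacts_on_phone_number_id
    ON contacts (phone_number_id);

-- Index to support lookups of contacts by company.
-- The index name is a hypothetical choice.
CREATE INDEX IF NOT EXISTS index_contacts_on_company_id
    ON contacts (company_id);

-- Refresh planner statistics so the new indexes are costed with current data.
ANALYZE contacts;
```

Whether the planner actually uses them depends on how selective the subqueries turn out to be, so compare EXPLAIN ANALYZE output before and after.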
EDIT:
It is using the index on "phone_numbers"."id" with a nested loop, and that took 5 seconds.
Try to avoid this.
Please check what the plan looks like for this:
SELECT Count(id) AS id
FROM (
SELECT contacts.id AS id
FROM contacts
WHERE (
contacts.last_name ilike '%as%')
OR (
contacts.last_name ilike '%as%')
UNION
SELECT contacts.id AS id
FROM contacts
WHERE contacts.phone_number_id IN
(
SELECT NULLIF(CAST(phone_numbers.id AS text), '')::int /* just to disable the index scan on that column */ AS phone_number_id
FROM phone_numbers
WHERE phone_numbers.value ilike '%as%')
UNION
SELECT contacts.id AS id
FROM contacts
WHERE contacts.company_id IN
(
SELECT companies.id AS company_id
FROM companies
WHERE companies.name ilike '%as%' )) AS ID
Aggregate (cost=419095.35..419095.40 rows=1 width=8) (actual time=13235.986..13236.335 rows=1 loops=1)
-> Unique (cost=346875.23..353155.24 rows=1256002 width=8) (actual time=13211.350..13230.729 rows=195963 loops=1)
-> Sort (cost=346875.23..350015.24 rows=1256002 width=8) (actual time=13211.349..13219.607 rows=195963 loops=1)
Sort Key: contacts.id
Sort Method: external merge Disk: 3472kB
-> Append (cost=2249.63..218658.27 rows=1256002 width=8) (actual time=5927.019..13164.421 rows=195963 loops=1)
-> Gather (cost=2249.63..48279.58 rows=251838 width=8) (actual time=5927.019..6911.795 rows=195963 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Parallel Bitmap Heap Scan on contacts (cost=2239.63..48017.74 rows=62960 width=8) (actual time=5861.480..6865.957 rows=39193 loops=5)
Recheck Cond: (((first_name)::text ~~* '%as%'::text) OR ((last_name)::text ~~* '%as%'::text))
Rows Removed by Index Recheck: 763815
Heap Blocks: exact=10860 lossy=6075
-> BitmapOr (cost=2239.63..2239.63 rows=255705 width=0) (actual time=5917.966..5917.966 rows=0 loops=1)
-> Bitmap Index Scan on idx_trgm_contacts_first_name (cost=0.00..1291.57 rows=156527 width=0) (actual time=2972.404..2972.404 rows=4015039 loops=1)
Index Cond: ((first_name)::text ~~* '%as%'::text)
-> Bitmap Index Scan on idx_trgm_contacts_last_name (cost=0.00..822.14 rows=99177 width=0) (actual time=2945.560..2945.560 rows=4015038 loops=1)
Index Cond: ((last_name)::text ~~* '%as%'::text)
-> Nested Loop (cost=81.96..384.33 rows=402 width=8) (actual time=6213.028..6213.028 rows=0 loops=1)
-> Unique (cost=81.52..83.53 rows=402 width=8) (actual time=6213.027..6213.027 rows=0 loops=1)
-> Sort (cost=81.52..82.52 rows=402 width=8) (actual time=6213.027..6213.027 rows=0 loops=1)
Sort Key: ((NULLIF((phone_numbers.id)::text, ''::text))::integer)
Sort Method: quicksort Memory: 25kB
-> Index Scan using idx_trgm_phone_value on phone_numbers (cost=0.41..64.13 rows=402 width=8) (actual time=6213.006..6213.006 rows=0 loops=1)
Index Cond: ((value)::text ~~* '%as%'::text)
Rows Removed by Index Recheck: 4015921
-> Index Scan using index_contacts_on_phone_number_id on contacts contacts_1 (cost=0.44..0.70 rows=1 width=16) (never executed)
Index Cond: (phone_number_id = (NULLIF((phone_numbers.id)::text, ''::text))::integer)
-> Gather (cost=10.36..75794.22 rows=1003762 width=8) (actual time=25.691..25.709 rows=0 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Hash Join (cost=0.36..74780.46 rows=250940 width=8) (actual time=2.653..2.653 rows=0 loops=5)
Hash Cond: (contacts_2.company_id = companies.id)
-> Parallel Seq Scan on contacts contacts_2 (cost=0.00..59315.91 rows=1003762 width=16) (actual time=0.244..0.244 rows=1 loops=5)
-> Hash (cost=0.31..0.31 rows=1 width=8) (actual time=0.244..0.244 rows=0 loops=5)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on companies (cost=0.00..0.31 rows=1 width=8) (actual time=0.244..0.244 rows=0 loops=5)
Filter: ((name)::text ~~* '%as%'::text)
Rows Removed by Filter: 4
Planning Time: 1.458 ms
Execution Time: 13236.949 ms
I also tried the following:
SELECT Count(id) AS id
FROM (
SELECT contacts.id AS id
FROM contacts
WHERE (substring(LOWER(contacts.first_name), position('as' in LOWER(first_name)), 2) = 'as')
OR (substring(LOWER(contacts.last_name), position('as' in LOWER(last_name)), 2) = 'as')
UNION
SELECT contacts.id AS id
FROM contacts
WHERE contacts.phone_number_id IN
(
SELECT NULLIF(CAST(phone_numbers.id AS text), '')::int AS phone_number_id
FROM phone_numbers
WHERE (substring(LOWER(phone_numbers.value), position('as' in LOWER(phone_numbers.value)), 2) = 'as'))
UNION
SELECT contacts.id AS id
FROM contacts
WHERE contacts.company_id IN
(
SELECT companies.id AS company_id
FROM companies
WHERE (substring(LOWER(companies.name), position('as' in LOWER(companies.name)), 2) = 'as') )) AS ID
Aggregate (cost=508646.88..508646.93 rows=1 width=8) (actual time=1455.892..1455.995 rows=1 loops=1)
-> Unique (cost=447473.09..452792.55 rows=1063892 width=8) (actual time=1431.464..1450.434 rows=195963 loops=1)
-> Sort (cost=447473.09..450132.82 rows=1063892 width=8) (actual time=1431.464..1439.267 rows=195963 loops=1)
Sort Key: contacts.id
Sort Method: external merge Disk: 3472kB
-> Append (cost=10.00..340141.41 rows=1063892 width=8) (actual time=0.391..1370.557 rows=195963 loops=1)
-> Gather (cost=10.00..84460.02 rows=40050 width=8) (actual time=0.391..983.457 rows=195963 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Parallel Seq Scan on contacts (cost=0.00..84409.97 rows=10012 width=8) (actual time=1.696..987.285 rows=39193 loops=5)
Filter: (("substring"(lower((first_name)::text), "position"(lower((first_name)::text), 'as'::text), 2) = 'as'::text) OR ("substring"(lower((last_name)::text), "position"(lower((last_name)::text), 'as'::text), 2) = 'as'::text))
Rows Removed by Filter: 763817
-> Nested Loop (cost=85188.17..100095.23 rows=20080 width=8) (actual time=364.076..364.125 rows=0 loops=1)
-> HashAggregate (cost=85187.73..86191.73 rows=20080 width=8) (actual time=364.074..364.123 rows=0 loops=1)
Group Key: (NULLIF((phone_numbers.id)::text, ''::text))::integer
Batches: 1 Memory Usage: 793kB
-> Gather (cost=10.00..85137.53 rows=20080 width=8) (actual time=363.976..364.025 rows=0 loops=1)
Workers Planned: 3
Workers Launched: 3
-> Parallel Seq Scan on phone_numbers (cost=0.00..85107.45 rows=6477 width=8) (actual time=357.030..357.031 rows=0 loops=4)
Filter: ("substring"(lower((value)::text), "position"(lower((value)::text), 'as'::text), 2) = 'as'::text)
Rows Removed by Filter: 1003980
-> Index Scan using index_contacts_on_phone_number_id on contacts contacts_1 (cost=0.44..0.64 rows=1 width=16) (never executed)
Index Cond: (phone_number_id = (NULLIF((phone_numbers.id)::text, ''::text))::integer)
-> Gather (cost=10.40..75794.26 rows=1003762 width=8) (actual time=6.889..6.910 rows=0 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Hash Join (cost=0.40..74780.50 rows=250940 width=8) (actual time=0.138..0.139 rows=0 loops=5)
Hash Cond: (contacts_2.company_id = companies.id)
-> Parallel Seq Scan on contacts contacts_2 (cost=0.00..59315.91 rows=1003762 width=16) (actual time=0.004..0.004 rows=1 loops=5)
-> Hash (cost=0.35..0.35 rows=1 width=8) (actual time=0.081..0.081 rows=0 loops=5)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on companies (cost=0.00..0.35 rows=1 width=8) (actual time=0.081..0.081 rows=0 loops=5)
Filter: ("substring"(lower((name)::text), "position"(lower((name)::text), 'as'::text), 2) = 'as'::text)
Rows Removed by Filter: 4
Planning Time: 0.927 ms
Execution Time: 1456.742 ms

SQL (Postgres) Optimal Number of Joins

This query runs on PostgreSQL 12. It has 8 joins and takes approximately 5 seconds.
Query 1
select *
from "public"."products" "P"
inner join "system"."categories" "C" on "C"."id" = "P"."id_category"
inner join "public"."businesses" "E" on "E"."id" = "P"."id_business"
left join "public"."product_files" "pf" on "pf"."id_product" = "P"."id"
left join "system"."files" "f" on "f"."name" = "pf"."img_code"
left join "public"."product_variations" "pv" on ("pv"."id_product" = "P"."id" and "pv"."status" <> 'Deleted')
left join "public"."product_stocks" "ps" on ("ps"."id_product_variation" = "pv"."id" and "ps"."status" <> 'Deleted')
left join "public"."product_stocks" "pps" on ("pps"."id_product" = "P"."id" and "pps"."status" <> 'Deleted')
inner join search_products( array['tires'], 8, 1, 'es') "search" on search.id = "P"."id"
where "P"."status" <> 'Deleted'
Postgres Query EXPLAIN(ANALYZE, BUFFERS) for Query 1
Merge Join (cost=112948.60..121805.61 rows=4996 width=1145) (actual time=2003.599..2426.892 rows=40 loops=1)
Merge Cond: ("P".id = search.id)
Buffers: shared hit=760531, temp read=16912 written=18837
-> Merge Left Join (cost=112888.52..120950.73 rows=287945 width=1105) (actual time=1607.013..2093.722 rows=380961 loops=1)
Merge Cond: ("P".id = pf.id_product)
Buffers: shared hit=752079, temp read=15561 written=15606
-> Merge Left Join (cost=16288.22..19167.29 rows=57631 width=771) (actual time=165.803..271.662 rows=76193 loops=1)
Merge Cond: ("P".id = pps.id_product)
Buffers: shared hit=3820, temp read=2706 written=2733
-> Merge Left Join (cost=16287.81..16577.01 rows=57631 width=686) (actual time=165.787..217.878 rows=56921 loops=1)
Merge Cond: ("P".id = pv.id_product)
Buffers: shared hit=2058, temp read=2706 written=2733
-> Sort (cost=14888.93..15033.01 rows=57631 width=514) (actual time=156.825..175.154 rows=56920 loops=1)
Sort Key: "P".id
Sort Method: external merge Disk: 21840kB
Buffers: shared hit=1430, temp read=2706 written=2733
-> Hash Join (cost=43.49..2484.49 rows=57631 width=514) (actual time=0.266..64.052 rows=57631 loops=1)
Hash Cond: ("P".id_business = "E".id)
Buffers: shared hit=1430
-> Hash Join (cost=37.81..2322.14 rows=57631 width=374) (actual time=0.214..39.402 rows=57631 loops=1)
Hash Cond: ("P".id_category = "C".id)
Buffers: shared hit=1427
-> Seq Scan on products "P" (cost=0.00..2132.41 rows=57631 width=252) (actual time=0.009..12.754 rows=57631 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 2
Buffers: shared hit=1412
-> Hash (cost=25.14..25.14 rows=1014 width=122) (actual time=0.201..0.201 rows=1014 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 124kB
Buffers: shared hit=15
-> Seq Scan on categories "C" (cost=0.00..25.14 rows=1014 width=122) (actual time=0.007..0.078 rows=1014 loops=1)
Buffers: shared hit=15
-> Hash (cost=4.19..4.19 rows=119 width=140) (actual time=0.047..0.048 rows=119 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 29kB
Buffers: shared hit=3
-> Seq Scan on businesses "E" (cost=0.00..4.19 rows=119 width=140) (actual time=0.013..0.024 rows=119 loops=1)
Buffers: shared hit=3
-> Sort (cost=1398.88..1399.05 rows=70 width=172) (actual time=8.956..8.958 rows=3 loops=1)
Sort Key: pv.id_product
Sort Method: quicksort Memory: 43kB
Buffers: shared hit=628
-> Hash Right Join (cost=4.58..1396.73 rows=70 width=172) (actual time=8.853..8.912 rows=70 loops=1)
Hash Cond: (ps.id_product_variation = pv.id)
Buffers: shared hit=628
-> Seq Scan on product_stocks ps (cost=0.00..1259.35 rows=50589 width=85) (actual time=0.009..7.030 rows=50595 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 73
Buffers: shared hit=626
-> Hash (cost=3.70..3.70 rows=70 width=87) (actual time=0.048..0.049 rows=70 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 16kB
Buffers: shared hit=2
-> Seq Scan on product_variations pv (cost=0.00..3.70 rows=70 width=87) (actual time=0.020..0.039 rows=70 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 66
Buffers: shared hit=2
-> Index Scan using product_stocks_id_product_id_product_variation_id_location_key on product_stocks pps (cost=0.41..1819.97 rows=50589 width=85) (actual time=0.013..17.822 rows=49924 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 1
Buffers: shared hit=1762
-> Materialize (cost=96600.25..98040.03 rows=287955 width=334) (actual time=1441.203..1613.160 rows=380961 loops=1)
Buffers: shared hit=748259, temp read=12855 written=12873
-> Sort (cost=96600.25..97320.14 rows=287955 width=334) (actual time=1441.198..1567.183 rows=284596 loops=1)
Sort Key: pf.id_product
Sort Method: external merge Disk: 102840kB
Buffers: shared hit=748259, temp read=12855 written=12873
-> Merge Left Join (cost=0.84..44546.48 rows=287955 width=334) (actual time=0.021..1013.742 rows=287955 loops=1)
Merge Cond: ((pf.img_code)::text = (f.name)::text)
Buffers: shared hit=748259
-> Index Scan using product_files_pkey on product_files pf (cost=0.42..10516.05 rows=287955 width=66) (actual time=0.005..184.173 rows=287955 loops=1)
Buffers: shared hit=289884
-> Index Scan using files_pkey on files f (cost=0.42..29304.42 rows=455180 width=268) (actual time=0.005..338.206 rows=455178 loops=1)
Buffers: shared hit=458375
-> Sort (cost=60.08..62.58 rows=1000 width=40) (actual time=313.554..313.558 rows=36 loops=1)
Sort Key: search.id
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=8452, temp read=1351 written=3231
-> Function Scan on search_products search (cost=0.25..10.25 rows=1000 width=40) (actual time=313.544..313.545 rows=8 loops=1)
Buffers: shared hit=8452, temp read=1351 written=3231
Planning Time: 2.632 ms
Execution Time: 2440.414 ms
While looking for a way to optimize the query, I added the joins one by one to see where the problem was. After trying many permutations of the join order, I realized that from join number 7 onward PostgreSQL apparently stops finding the best way to run the query. When I delete any one join (chosen at random), the query takes about 300 ms.
Query 2
select *
from "public"."products" "P"
inner join "system"."categories" "C" on "C"."id" = "P"."id_category"
left join "public"."product_files" "pf" on "pf"."id_product" = "P"."id"
left join "system"."files" "f" on "f"."name" = "pf"."img_code"
left join "public"."product_variations" "pv" on ("pv"."id_product" = "P"."id" and "pv"."status" <> 'Deleted')
left join "public"."product_stocks" "ps" on ("ps"."id_product_variation" = "pv"."id" and "ps"."status" <> 'Deleted')
left join "public"."product_stocks" "pps" on ("pps"."id_product" = "P"."id" and "pps"."status" <> 'Deleted')
inner join search_products( array['tires'], 8, 1, 'es') "search" on search.id = "P"."id"
where "P"."status" <> 'Deleted'
Postgres Query EXPLAIN(ANALYZE, BUFFERS) for Query 2
Nested Loop Left Join (cost=1365.30..6482.09 rows=4996 width=1005) (actual time=349.888..350.399 rows=40 loops=1)
Buffers: shared hit=9339, temp read=1351 written=3231
-> Nested Loop Left Join (cost=1364.88..3893.89 rows=4996 width=737) (actual time=349.866..349.957 rows=40 loops=1)
Buffers: shared hit=9179, temp read=1351 written=3231
-> Nested Loop Left Join (cost=1364.46..3250.90 rows=1000 width=671) (actual time=349.857..349.899 rows=8 loops=1)
Buffers: shared hit=9147, temp read=1351 written=3231
-> Hash Join (cost=1364.04..2759.11 rows=1000 width=586) (actual time=349.839..349.853 rows=8 loops=1)
Hash Cond: ("P".id_category = "C".id)
Buffers: shared hit=9119, temp read=1351 written=3231
-> Hash Right Join (cost=1326.23..2718.65 rows=1000 width=464) (actual time=349.566..349.574 rows=8 loops=1)
Hash Cond: (pv.id_product = "P".id)
Buffers: shared hit=9104, temp read=1351 written=3231
-> Hash Right Join (cost=4.58..1396.73 rows=70 width=172) (actual time=8.953..9.013 rows=70 loops=1)
Hash Cond: (ps.id_product_variation = pv.id)
Buffers: shared hit=628
-> Seq Scan on product_stocks ps (cost=0.00..1259.35 rows=50589 width=85) (actual time=0.008..7.060 rows=50595 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 73
Buffers: shared hit=626
-> Hash (cost=3.70..3.70 rows=70 width=87) (actual time=0.047..0.048 rows=70 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 16kB
Buffers: shared hit=2
-> Seq Scan on product_variations pv (cost=0.00..3.70 rows=70 width=87) (actual time=0.015..0.033 rows=70 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 66
Buffers: shared hit=2
-> Hash (cost=1309.15..1309.15 rows=1000 width=292) (actual time=340.542..340.543 rows=8 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
Buffers: shared hit=8476, temp read=1351 written=3231
-> Nested Loop (cost=0.54..1309.15 rows=1000 width=292) (actual time=340.505..340.535 rows=8 loops=1)
Buffers: shared hit=8476, temp read=1351 written=3231
-> Function Scan on search_products search (cost=0.25..10.25 rows=1000 width=40) (actual time=340.483..340.485 rows=8 loops=1)
Buffers: shared hit=8452, temp read=1351 written=3231
-> Index Scan using products_pkey on products "P" (cost=0.29..1.30 rows=1 width=252) (actual time=0.005..0.005 rows=1 loops=8)
Index Cond: (id = search.id)
Filter: ((status)::text <> 'Deleted'::text)
Buffers: shared hit=24
-> Hash (cost=25.14..25.14 rows=1014 width=122) (actual time=0.268..0.268 rows=1014 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 124kB
Buffers: shared hit=15
-> Seq Scan on categories "C" (cost=0.00..25.14 rows=1014 width=122) (actual time=0.012..0.110 rows=1014 loops=1)
Buffers: shared hit=15
-> Index Scan using product_stocks_id_product_id_product_variation_id_location_key on product_stocks pps (cost=0.41..0.47 rows=2 width=85) (actual time=0.005..0.005 rows=1 loops=8)
Index Cond: (id_product = "P".id)
Filter: ((status)::text <> 'Deleted'::text)
Buffers: shared hit=28
-> Index Scan using idx_product_files_product on product_files pf (cost=0.42..0.59 rows=5 width=66) (actual time=0.004..0.005 rows=5 loops=8)
Index Cond: (id_product = "P".id)
Buffers: shared hit=32
-> Index Scan using files_pkey on files f (cost=0.42..0.52 rows=1 width=268) (actual time=0.010..0.010 rows=1 loops=40)
Index Cond: ((name)::text = (pf.img_code)::text)
Buffers: shared hit=160
Planning Time: 2.581 ms
Execution Time: 350.525 ms
Is there an article that explains this behavior, and how can I fix it?

That is because join_collapse_limit has a default value of 8. The optimizer tries all permutations only for the first 8 tables, the rest is joined as written. The rationale is to keep planning time reasonably short, which increases exponentially with the number of tables.
Options:
increase the parameter (join_collapse_limit)
figure out a good join order and rewrite the query to join in that order
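Both options can be sketched as follows, using the tables from Query 1. The value 12 is only an illustrative choice. The join order in the second statement is an assumption modelled on the fast plan of Query 2 (search function first, then the product lookups), not a verified optimum.

```sql
-- Option 1: raise the limit for the current session so the planner
-- may consider reordering all relations in the query.
-- 12 is an illustrative value; expect planning time to grow with it.
SET join_collapse_limit = 12;

-- Option 2: make the planner join in exactly the order written.
-- With join_collapse_limit = 1, explicit JOINs are never reordered.
BEGIN;
SET LOCAL join_collapse_limit = 1;  -- applies to this transaction only

select *
from search_products(array['tires'], 8, 1, 'es') "search"
inner join "public"."products" "P" on "P"."id" = "search"."id"
inner join "system"."categories" "C" on "C"."id" = "P"."id_category"
inner join "public"."businesses" "E" on "E"."id" = "P"."id_business"
left join "public"."product_variations" "pv" on ("pv"."id_product" = "P"."id" and "pv"."status" <> 'Deleted')
left join "public"."product_stocks" "ps" on ("ps"."id_product_variation" = "pv"."id" and "ps"."status" <> 'Deleted')
left join "public"."product_stocks" "pps" on ("pps"."id_product" = "P"."id" and "pps"."status" <> 'Deleted')
left join "public"."product_files" "pf" on "pf"."id_product" = "P"."id"
left join "system"."files" "f" on "f"."name" = "pf"."img_code"
where "P"."status" <> 'Deleted';

COMMIT;
```

Because of select *, the rewritten query returns its columns in a different order than Query 1. Check the result with EXPLAIN (ANALYZE, BUFFERS) before relying on either variant.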

Postgres query becomes extremely slow with one single change

I have this query where, among other things, I need discussions.client_first_responded_at to be converted to the given time zone, which is different for every row. If I replace the reference to users.time_zone_offset with a static '-06:00'::INTERVAL, the query executes within a second, but with the dynamic reference to users.time_zone_offset it takes ~120 seconds.
What am I missing?
SELECT client_id
FROM programs
INNER JOIN users ON users.id = programs.coach_id
WHERE programs.id IN (
SELECT COALESCE(discussions.parent_id, calls.parent_id) AS program_id
FROM categorizations
LEFT OUTER JOIN discussions ON discussions.id = categorizations.categorizable_id AND categorizations.categorizable_type = 'Discussion'
LEFT OUTER JOIN calls ON calls.id = categorizations.categorizable_id AND categorizations.categorizable_type = 'Call'
WHERE categorizations.categorizable_type = 'Discussion' OR categorizations.categorizable_type = 'Call'
AND COALESCE(
discussions.client_first_responded_at::timestamptz AT TIME ZONE users.time_zone_offset::INTERVAL,
calls.start_time::timestamptz
) BETWEEN '2020-09-01' AND '2020-09-30'
);
UPD:
Hash Join (cost=1.47..250840.61 rows=3542 width=8) (actual time=35.419..61810.027 rows=2266 loops=1)
Hash Cond: (programs.coach_id = users.id)
Join Filter: (SubPlan 1)
Rows Removed by Join Filter: 4821
Buffers: shared hit=3087792
-> Seq Scan on programs (cost=0.00..368.84 rows=7084 width=20) (actual time=0.008..11.890 rows=7087 loops=1)
Buffers: shared hit=298
-> Hash (cost=1.21..1.21 rows=21 width=40) (actual time=0.015..0.015 rows=21 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on users (cost=0.00..1.21 rows=21 width=40) (actual time=0.004..0.008 rows=21 loops=1)
Buffers: shared hit=1
SubPlan 1
-> Hash Right Join (cost=777.19..2168.01 rows=7477 width=4) (actual time=0.007..7.887 rows=7516 loops=7087)
Hash Cond: (discussions.id = categorizations.categorizable_id)
Join Filter: ((categorizations.categorizable_type)::text = 'Discussion'::text)
Rows Removed by Join Filter: 3473
Filter: (((categorizations.categorizable_type)::text = 'Discussion'::text) OR (((categorizations.categorizable_type)::text = 'Call'::text) AND (COALESCE((timezone((users.time_zone_offset)::interval, (discussions.client_first_responded_at)::timestamp with time zone))::timestamp with time zone, (calls.start_time)::timestamp with time zone) >= '2020-09-01 00:00:00+02'::timestamp with time zone) AND (COALESCE((timezone((users.time_zone_offset)::interval, (discussions.client_first_responded_at)::timestamp with time zone))::timestamp with time zone, (calls.start_time)::timestamp with time zone) <= '2020-09-30 00:00:00+02'::timestamp with time zone)))
Rows Removed by Filter: 2578
Buffers: shared hit=3087493
-> Seq Scan on discussions (cost=0.00..751.46 rows=18746 width=20) (actual time=0.003..1.842 rows=15668 loops=7087)
Buffers: shared hit=3087252
-> Hash (cost=647.28..647.28 rows=10393 width=25) (actual time=13.355..13.355 rows=13090 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 763kB
Buffers: shared hit=241
-> Hash Left Join (cost=300.68..647.28 rows=10393 width=25) (actual time=3.476..10.083 rows=13090 loops=1)
Hash Cond: (categorizations.categorizable_id = calls.id)
Join Filter: ((categorizations.categorizable_type)::text = 'Call'::text)
Rows Removed by Join Filter: 2303
Buffers: shared hit=241
-> Seq Scan on categorizations (cost=0.00..319.32 rows=10393 width=13) (actual time=0.006..2.455 rows=13090 loops=1)
Filter: (((categorizable_type)::text = 'Discussion'::text) OR ((categorizable_type)::text = 'Call'::text))
Buffers: shared hit=123
-> Hash (cost=199.19..199.19 rows=8119 width=20) (actual time=3.462..3.462 rows=8119 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 509kB
Buffers: shared hit=118
-> Seq Scan on calls (cost=0.00..199.19 rows=8119 width=20) (actual time=0.005..1.766 rows=8119 loops=1)
Buffers: shared hit=118
Planning Time: 0.643 ms
Execution Time: 61811.118 ms
Query plan with the static '-06:00'::INTERVAL:
Hash Join (cost=3537.47..4289.98 rows=4825 width=8) (actual time=111.743..122.572 rows=4371 loops=1)
Hash Cond: (programs.coach_id = users.id)
Buffers: shared hit=1931
-> Hash Join (cost=3535.95..4273.46 rows=4825 width=12) (actual time=111.627..120.637 rows=4371 loops=1)
Hash Cond: (programs.id = COALESCE(discussions.parent_id, calls.parent_id))
Buffers: shared hit=1930
-> Seq Scan on programs (cost=0.00..658.50 rows=9650 width=20) (actual time=0.011..4.880 rows=9656 loops=1)
Buffers: shared hit=562
-> Hash (cost=3350.83..3350.83 rows=14810 width=8) (actual time=111.573..111.580 rows=4371 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 282kB
Buffers: shared hit=1368
-> HashAggregate (cost=3202.73..3350.83 rows=14810 width=8) (actual time=107.495..109.868 rows=4371 loops=1)
Group Key: COALESCE(discussions.parent_id, calls.parent_id)
Buffers: shared hit=1368
-> Hash Left Join (cost=1688.78..3165.70 rows=14810 width=8) (actual time=34.242..97.144 rows=19275 loops=1)
Hash Cond: (categorizations.categorizable_id = calls.id)
Join Filter: ((categorizations.categorizable_type)::text = 'Call'::text)
Rows Removed by Join Filter: 6869
Filter: (((categorizations.categorizable_type)::text = 'Discussion'::text) OR (((categorizations.categorizable_type)::text = 'Call'::text) AND (COALESCE((timezone('-06:00:00'::interval, (discussions.client_first_responded_at)::timestamp with time zone))::timestamp with time zone, (calls.start_time)::timestamp with time zone) >= '2020-09-01 00:00:00+00'::timestamp with time zone) AND (COALESCE((timezone('-06:00:00'::interval, (discussions.client_first_responded_at)::timestamp with time zone))::timestamp with time zone, (calls.start_time)::timestamp with time zone) <= '2020-09-30 00:00:00+00'::timestamp with time zone)))
Rows Removed by Filter: 7674
Buffers: shared hit=1368
-> Hash Right Join (cost=1046.24..2467.57 rows=21181 width=25) (actual time=21.970..62.581 rows=26949 loops=1)
Hash Cond: (discussions.id = categorizations.categorizable_id)
Join Filter: ((categorizations.categorizable_type)::text = 'Discussion'::text)
Rows Removed by Join Filter: 8461
Buffers: shared hit=1057
-> Seq Scan on discussions (cost=0.00..997.71 rows=31771 width=20) (actual time=0.183..8.541 rows=32064 loops=1)
Buffers: shared hit=680
-> Hash (cost=781.47..781.47 rows=21181 width=13) (actual time=21.720..21.721 rows=26949 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 1444kB
Buffers: shared hit=377
-> Seq Scan on categorizations (cost=0.00..781.47 rows=21181 width=13) (actual time=0.012..11.631 rows=26949 loops=1)
Filter: (((categorizable_type)::text = 'Discussion'::text) OR ((categorizable_type)::text = 'Call'::text))
Buffers: shared hit=377
-> Hash (cost=458.35..458.35 rows=14735 width=20) (actual time=12.205..12.206 rows=14720 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 876kB
Buffers: shared hit=311
-> Seq Scan on calls (cost=0.00..458.35 rows=14735 width=20) (actual time=0.016..6.914 rows=14720 loops=1)
Buffers: shared hit=311
-> Hash (cost=1.23..1.23 rows=23 width=8) (actual time=0.065..0.066 rows=23 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on users (cost=0.00..1.23 rows=23 width=8) (actual time=0.048..0.054 rows=23 loops=1)
Buffers: shared hit=1
Planning Time: 1.010 ms
Execution Time: 123.511 ms

How optimize SQL query with DISTINCT ON and JOIN many values?

I have a query like this that joins ~6000 values:
SELECT DISTINCT ON(user_id)
user_id,
finished_at as last_deposit_date,
CASE When currency = 'RUB' Then amount_cents END as last_deposit_amount_cents
FROM payments
JOIN (VALUES (5),(22),(26)) --~6000 values
AS v(user_id) USING (user_id)
WHERE action = 'deposit'
AND success = 't'
AND currency IN ('RUB')
ORDER BY user_id, finished_at DESC
QUERY PLAN for query with many VALUES:
Unique (cost=444606.97..449760.44 rows=19276 width=24) (actual time=6129.403..6418.317 rows=5991 loops=1)
Buffers: shared hit=2386527, temp read=7807 written=7808
-> Sort (cost=444606.97..447183.71 rows=1030695 width=24) (actual time=6129.401..6295.457 rows=1877039 loops=1)
Sort Key: payments.user_id, payments.finished_at DESC
Sort Method: external merge Disk: 62456kB
Buffers: shared hit=2386527, temp read=7807 written=7808
-> Nested Loop (cost=0.43..341665.35 rows=1030695 width=24) (actual time=0.612..5085.376 rows=1877039 loops=1)
Buffers: shared hit=2386521
-> Values Scan on "*VALUES*" (cost=0.00..75.00 rows=6000 width=4) (actual time=0.002..4.507 rows=6000 loops=1)
-> Index Scan using index_payments_on_user_id on payments (cost=0.43..54.78 rows=172 width=28) (actual time=0.010..0.793 rows=313 loops=6000)
Index Cond: (user_id = "*VALUES*".column1)
Filter: (success AND ((action)::text = 'deposit'::text) AND ((currency)::text = 'RUB'::text))
Rows Removed by Filter: 85
Buffers: shared hit=2386521
Planning time: 5.886 ms
Execution time: 6429.685 ms
I use PostgreSQL 10.8.0. Is there any chance to speed up this query?
I tried replacing DISTINCT with recursion:
WITH RECURSIVE t AS (
(SELECT min(user_id) AS user_id FROM payments)
UNION ALL
SELECT (SELECT min(user_id) FROM payments
WHERE user_id > t.user_id
) AS user_id FROM
t
WHERE t.user_id IS NOT NULL
)
SELECT payments.* FROM t
JOIN (VALUES (5),(22),(26)) --~6000 VALUES
AS v(user_id) USING (user_id)
, LATERAL (
SELECT user_id,
finished_at as last_deposit_date,
CASE When currency = 'RUB' Then amount_cents END as last_deposit_amount_cents FROM payments
WHERE payments.user_id=t.user_id
AND action = 'deposit'
AND success = 't'
AND currency IN ('RUB')
ORDER BY finished_at DESC LIMIT 1
) AS payments
WHERE t.user_id IS NOT NULL;
But it turned out even slower.
Hash Join (cost=418.67..21807.22 rows=3000 width=24) (actual time=16.804..10843.174 rows=5991 loops=1)
Hash Cond: (t.user_id = "*VALUES*".column1)
Buffers: shared hit=6396763
CTE t
-> Recursive Union (cost=0.46..53.73 rows=101 width=8) (actual time=0.142..1942.351 rows=237029 loops=1)
Buffers: shared hit=864281
-> Result (cost=0.46..0.47 rows=1 width=8) (actual time=0.141..0.142 rows=1 loops=1)
Buffers: shared hit=4
InitPlan 3 (returns $1)
-> Limit (cost=0.43..0.46 rows=1 width=8) (actual time=0.138..0.139 rows=1 loops=1)
Buffers: shared hit=4
-> Index Only Scan using index_payments_on_user_id on payments payments_2 (cost=0.43..155102.74 rows=4858092 width=8) (actual time=0.137..0.138 rows=1 loops=1)
Index Cond: (user_id IS NOT NULL)
Heap Fetches: 0
Buffers: shared hit=4
-> WorkTable Scan on t t_1 (cost=0.00..5.12 rows=10 width=8) (actual time=0.008..0.008 rows=1 loops=237029)
Filter: (user_id IS NOT NULL)
Rows Removed by Filter: 0
Buffers: shared hit=864277
SubPlan 2
-> Result (cost=0.48..0.49 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=237028)
Buffers: shared hit=864277
InitPlan 1 (returns $3)
-> Limit (cost=0.43..0.48 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=237028)
Buffers: shared hit=864277
-> Index Only Scan using index_payments_on_user_id on payments payments_1 (cost=0.43..80786.25 rows=1619364 width=8) (actual time=0.007..0.007 rows=1 loops=237028)
Index Cond: ((user_id IS NOT NULL) AND (user_id > t_1.user_id))
Heap Fetches: 46749
Buffers: shared hit=864277
-> Nested Loop (cost=214.94..21498.23 rows=100 width=32) (actual time=0.475..10794.535 rows=167333 loops=1)
Buffers: shared hit=6396757
-> CTE Scan on t (cost=0.00..2.02 rows=100 width=8) (actual time=0.145..1998.788 rows=237028 loops=1)
Filter: (user_id IS NOT NULL)
Rows Removed by Filter: 1
Buffers: shared hit=864281
-> Limit (cost=214.94..214.94 rows=1 width=24) (actual time=0.037..0.037 rows=1 loops=237028)
Buffers: shared hit=5532476
-> Sort (cost=214.94..215.37 rows=172 width=24) (actual time=0.036..0.036 rows=1 loops=237028)
Sort Key: payments.finished_at DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=5532476
-> Index Scan using index_payments_on_user_id on payments (cost=0.43..214.08 rows=172 width=24) (actual time=0.003..0.034 rows=15 loops=237028)
Index Cond: (user_id = t.user_id)
Filter: (success AND ((action)::text = 'deposit'::text) AND ((currency)::text = 'RUB'::text))
Rows Removed by Filter: 6
Buffers: shared hit=5532473
-> Hash (cost=75.00..75.00 rows=6000 width=4) (actual time=2.255..2.255 rows=6000 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 275kB
-> Values Scan on "*VALUES*" (cost=0.00..75.00 rows=6000 width=4) (actual time=0.004..1.206 rows=6000 loops=1)
Planning time: 7.029 ms
Execution time: 10846.774 ms

For this query:
SELECT DISTINCT ON (user_id)
p.user_id,
p.finished_at as last_deposit_date,
(CASE WHEN p.currency = 'RUB' THEN p.amount_cents END) as last_deposit_amount_cents
FROM payments p JOIN
(VALUES (5),( 22), (26) --~6000 values
) v(user_id)
USING (user_id)
WHERE p.action = 'deposit' AND
p.success = 't' AND
p.currency = 'RUB'
ORDER BY p.user_id, p.finished_at DESC;
I don't fully understand the CASE expression, because the WHERE is filtering out all other values.
That said, I would expect an index on (action, success, currency, user_id, finished_at desc) to be helpful.
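A minimal sketch of that index, assuming a plain B-tree is suitable for these columns (the index name is made up for illustration):

```sql
-- Equality-filtered columns come first, followed by user_id and
-- finished_at DESC, so that within each user_id the most recent
-- deposit is the first entry and DISTINCT ON can read it in order.
CREATE INDEX payments_deposit_lookup_idx
    ON payments (action, success, currency, user_id, finished_at DESC);
```

Re-running EXPLAIN (ANALYZE, BUFFERS) on the query afterwards should show whether the planner actually chooses it.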

Different Explain on same Query

I have created an index on the derived_tstamp column of the events table, which has over 4 million records:
CREATE INDEX derived_tstamp_date_index ON atomic.events ( date(derived_tstamp) );
When I run the query with two different values for domain_userid, I get different EXPLAIN results. Query 1 uses the index, but Query 2 does not. How can I make sure the index is always used for faster results?
Query 1:
EXPLAIN ANALYZE SELECT
SUM(duration) as "total_time_spent"
FROM (
SELECT
domain_sessionidx,
MIN(derived_tstamp) as "start_time",
MAX(derived_tstamp) as "finish_time",
MAX(derived_tstamp) - min(derived_tstamp) as "duration"
FROM "atomic".events
WHERE date(derived_tstamp) >= date('2017-07-01') AND date(derived_tstamp) <= date('2017-08-02') AND domain_userid = 'd01ee409-ebff-4f37-bc97-9bbda45a7225'
GROUP BY 1
) v;
Explain of query 1
Aggregate (cost=1834.00..1834.01 rows=1 width=16) (actual time=138.619..138.619 rows=1 loops=1)
-> GroupAggregate (cost=1830.83..1832.93 rows=85 width=34) (actual time=137.096..138.563 rows=186 loops=1)
Group Key: events.domain_sessionidx
-> Sort (cost=1830.83..1831.09 rows=104 width=10) (actual time=137.063..137.681 rows=2726 loops=1)
Sort Key: events.domain_sessionidx
Sort Method: quicksort Memory: 224kB
-> Bitmap Heap Scan on events (cost=1412.95..1827.35 rows=104 width=10) (actual time=108.764..136.053 rows=2726 loops=1)
Recheck Cond: ((date(derived_tstamp) >= '2017-07-01'::date) AND (date(derived_tstamp) <= '2017-08-02'::date) AND ((domain_userid)::text = 'd01ee409-ebff-4f37-bc97-9bbda45a7225'::text))
Rows Removed by Index Recheck: 19704
Heap Blocks: exact=466 lossy=3331
-> BitmapAnd (cost=1412.95..1412.95 rows=104 width=0) (actual time=108.474..108.474 rows=0 loops=1)
-> Bitmap Index Scan on derived_tstamp_date_index (cost=0.00..448.34 rows=21191 width=0) (actual time=94.371..94.371 rows=818461 loops=1)
Index Cond: ((date(derived_tstamp) >= '2017-07-01'::date) AND (date(derived_tstamp) <= '2017-08-02'::date))
-> Bitmap Index Scan on events_domain_userid_index (cost=0.00..964.31 rows=20767 width=0) (actual time=3.044..3.044 rows=16834 loops=1)
Index Cond: ((domain_userid)::text = 'd01ee409-ebff-4f37-bc97-9bbda45a7225'::text)
Planning time: 0.166 ms
Query 2:
EXPLAIN ANALYZE SELECT
SUM(duration) as "total_time_spent"
FROM (
SELECT
domain_sessionidx,
MIN(derived_tstamp) as "start_time",
MAX(derived_tstamp) as "finish_time",
MAX(derived_tstamp) - min(derived_tstamp) as "duration"
FROM "atomic".events
WHERE date(derived_tstamp) >= date('2017-07-01') AND date(derived_tstamp) <= date('2017-08-02') AND domain_userid = 'e4c94f3e-9841-4b65-9031-ca4aa03809e7'
GROUP BY 1
) v;
Explain of query 2:
Aggregate (cost=226.12..226.13 rows=1 width=16) (actual time=0.402..0.402 rows=1 loops=1)
-> GroupAggregate (cost=226.08..226.10 rows=1 width=34) (actual time=0.394..0.397 rows=2 loops=1)
Group Key: events.domain_sessionidx
-> Sort (cost=226.08..226.08 rows=1 width=10) (actual time=0.381..0.386 rows=13 loops=1)
Sort Key: events.domain_sessionidx
Sort Method: quicksort Memory: 25kB
-> Index Scan using events_domain_userid_index on events (cost=0.56..226.07 rows=1 width=10) (actual time=0.030..0.368 rows=13 loops=1)
Index Cond: ((domain_userid)::text = 'e4c94f3e-9841-4b65-9031-ca4aa03809e7'::text)
Filter: ((date(derived_tstamp) >= '2017-07-01'::date) AND (date(derived_tstamp) <= '2017-08-02'::date))
Rows Removed by Filter: 184
Planning time: 0.162 ms
Execution time: 0.440 ms

The index is not used in the second case because there are so few rows matching the condition domain_userid = 'e4c94f3e-9841-4b65-9031-ca4aa03809e7' (only 197) that it is cheaper to filter those rows than to perform a bitmap index scan using your new index.
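If the aim is for a single index to serve both conditions, one possible sketch is a composite index with the equality column first. The index name is hypothetical, and whether the planner prefers it still depends on its cost estimates for each domain_userid value.

```sql
-- Equality column first, then the same date expression used in the
-- WHERE clause, so one index scan can cover both predicates.
CREATE INDEX events_userid_date_idx
    ON atomic.events (domain_userid, date(derived_tstamp));
```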
