How to optimize an SQL query that JOINs many values?

I have a query like this, which joins against ~6000 values:
SELECT MAX(user_id) as user_id, SUM(sum_amount_win) as sum_amount_win
FROM (
SELECT
a1.user_id
,CASE When MAX(currency) = 'RUB' Then SUM(d1.amount_cents) END as sum_amount_win
FROM dense_balance_transactions as d1
JOIN accounts a1 ON a1.id = d1.account_id
JOIN (
VALUES (5),(22),(26) -- ~6000 values
) AS v(user_id) USING (user_id)
WHERE d1.created_at BETWEEN '2019-06-01 00:00:00' AND '2019-06-20 23:59:59'
AND d1.action='win'
GROUP BY a1.user_id, a1.currency
) as t
GROUP BY user_id
QUERY PLAN for query with many VALUES:
GroupAggregate (cost=266816.48..266816.54 rows=1 width=48) (actual time=5024.201..5102.633 rows=5745 loops=1)
Group Key: a1.user_id
Buffers: shared hit=12205927
-> GroupAggregate (cost=266816.48..266816.51 rows=1 width=44) (actual time=5024.185..5099.621 rows=5774 loops=1)
Group Key: a1.user_id, a1.currency
Buffers: shared hit=12205927
-> Sort (cost=266816.48..266816.49 rows=1 width=20) (actual time=5024.170..5041.840 rows=291122 loops=1)
Sort Key: a1.user_id, a1.currency
Sort Method: quicksort Memory: 35032kB
Buffers: shared hit=12205927
-> Gather (cost=214410.62..266816.47 rows=1 width=20) (actual time=292.828..5204.320 rows=291122 loops=1)
Workers Planned: 5
Workers Launched: 5
Buffers: shared hit=12205921
-> Nested Loop (cost=213410.62..265816.37 rows=1 width=20) (actual time=255.028..3939.300 rows=48520 loops=6)
Buffers: shared hit=12205921
-> Merge Join (cost=213410.19..214522.45 rows=1269 width=20) (actual time=253.545..274.872 rows=1136 loops=6)
Merge Cond: (a1.user_id = "*VALUES*".column1)
Buffers: shared hit=191958
-> Sort (cost=212958.66..213493.45 rows=213914 width=20) (actual time=251.991..263.828 rows=82468 loops=6)
Sort Key: a1.user_id
Sort Method: quicksort Memory: 24322kB
Buffers: shared hit=191916
-> Parallel Seq Scan on accounts a1 (cost=0.00..194020.14 rows=213914 width=20) (actual time=0.042..196.052 rows=179242 loops=6)
Buffers: shared hit=191881
-> Sort (cost=451.52..466.52 rows=6000 width=4) (actual time=1.547..2.429 rows=6037 loops=6)
Sort Key: "*VALUES*".column1
Sort Method: quicksort Memory: 474kB
Buffers: shared hit=42
-> Values Scan on "*VALUES*" (cost=0.00..75.00 rows=6000 width=4) (actual time=0.002..0.928 rows=6000 loops=6)
-> Index Scan using index_dense_balance_transactions_on_account_id on dense_balance_transactions d1 (cost=0.44..40.41 rows=1 width=16) (actual time=0.160..3.220 rows=43 loops=6816)
Index Cond: (account_id = a1.id)
Filter: ((created_at >= '2019-06-01 00:00:00'::timestamp without time zone) AND (created_at <= '2019-06-20 23:59:59'::timestamp without time zone) AND ((action)::text = 'win'::text))
Rows Removed by Filter: 1942
Buffers: shared hit=12013963
Planning time: 10.239 ms
Execution time: 5387.523 ms
I am using PostgreSQL 10.8.0.
Is there any way to speed up this query?
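Almost all of the buffer hits in this plan (12,013,963 of 12,205,927) come from the inner Index Scan on dense_balance_transactions, where the created_at and action conditions are applied only as a Filter after the account_id lookup. A rough calculation with that node's figures shows how much work the Filter throws away. Actual rows in the plan are averages per loop, so the totals are approximate; the 6816 loops are the 1136 Merge Join rows times 6 processes.

```python
# Figures copied from the node
# "Index Scan using index_dense_balance_transactions_on_account_id".
loops = 6816                  # index probes: 1136 account rows x 6 processes
rows_kept_per_loop = 43       # average rows per probe that pass the Filter
rows_removed_per_loop = 1942  # average "Rows Removed by Filter" per probe

# Every probe reads the kept rows plus the rows the Filter discards.
rows_read = loops * (rows_kept_per_loop + rows_removed_per_loop)
rows_kept = loops * rows_kept_per_loop
kept_share = rows_kept / rows_read

print(f"rows read via index: {rows_read:,}")
print(f"rows kept:           {rows_kept:,}")
print(f"share kept:          {kept_share:.1%}")
```

This gives about 13.5 million rows read to keep about 293,000 (close to the 291,122 rows the Gather node reports), so roughly 98% of the fetched rows are discarded. If that reading is right, an index that also covers action and created_at, for example on (account_id, action, created_at), would be a candidate to test with EXPLAIN (ANALYZE, BUFFERS); that is an untested suggestion, not something the plan proves.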

Related

Need to improve count performance in PostgreSQL for this query

I have this query in PostgreSQL:
SELECT COUNT("contacts"."id")
FROM "contacts"
INNER JOIN "phone_numbers" ON "phone_numbers"."id" = "contacts"."phone_number_id"
INNER JOIN "companies" ON "companies"."id" = "contacts"."company_id"
WHERE (
(
(
CAST("phone_numbers"."value" AS VARCHAR) ILIKE '%a%'
OR CAST("contacts"."first_name" AS VARCHAR) ILIKE '%a%'
)
OR CAST("contacts"."last_name" AS VARCHAR) ILIKE '%a%'
)
OR CAST("companies"."name" AS VARCHAR) ILIKE '%a%'
)
When I run the query, it takes 19 seconds. I need to improve its performance.
Note: I already have indexes on these columns.
EXPLAIN ANALYZE report
Finalize Aggregate (cost=209076.49..209076.54 rows=1 width=8) (actual time=6117.381..6646.477 rows=1 loops=1)
-> Gather (cost=209076.42..209076.48 rows=4 width=8) (actual time=6117.370..6646.473 rows=5 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Partial Aggregate (cost=209066.42..209066.47 rows=1 width=8) (actual time=5952.710..5952.723 rows=1 loops=5)
-> Hash Join (cost=137685.37..208438.42 rows=251200 width=8) (actual time=3007.055..5945.571 rows=39193 loops=5)
Hash Cond: (contacts.company_id = companies.id)
Join Filter: (((phone_numbers.value)::text ~~* '%as%'::text) OR ((contacts.first_name)::text ~~* '%as%'::text) OR ((contacts.last_name)::text ~~* '%as%'::text) OR ((companies.name)::text ~~* '%as%'::text))
Rows Removed by Join Filter: 763817
-> Parallel Hash Join (cost=137684.86..201964.34 rows=1003781 width=41) (actual time=3006.633..4596.987 rows=803010 loops=5)
Hash Cond: (contacts.phone_number_id = phone_numbers.id)
-> Parallel Seq Scan on contacts (cost=0.00..59316.85 rows=1003781 width=37) (actual time=11.032..681.124 rows=803010 loops=5)
-> Parallel Hash (cost=68914.22..68914.22 rows=1295458 width=20) (actual time=1632.770..1632.770 rows=803184 loops=5)
Buckets: 65536 Batches: 64 Memory Usage: 4032kB
-> Parallel Seq Scan on phone_numbers (cost=0.00..68914.22 rows=1295458 width=20) (actual time=10.780..1202.242 rows=803184 loops=5)
-> Hash (cost=0.30..0.30 rows=4 width=40) (actual time=0.258..0.258 rows=4 loops=5)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on companies (cost=0.00..0.30 rows=4 width=40) (actual time=0.247..0.248 rows=4 loops=5)
Planning Time: 1.895 ms
Execution Time: 6646.558 ms
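Reading this plan, the Hash Join first produces every contact/phone/company combination and only then applies the OR conditions as a Join Filter. In a parallel plan the actual rows are averages per loop, so multiplying by loops=5 gives approximate totals. A quick calculation with the plan's figures:

```python
# Per-loop averages copied from the EXPLAIN ANALYZE output above (loops=5).
rows_joined_per_loop = 803_010   # output of the Parallel Hash Join
rows_removed_per_loop = 763_817  # "Rows Removed by Join Filter"
loops = 5                        # leader + 4 workers

# Rows that survive the OR filter, per loop and in total.
rows_kept_per_loop = rows_joined_per_loop - rows_removed_per_loop
total_joined = rows_joined_per_loop * loops
total_kept = rows_kept_per_loop * loops
kept_fraction = rows_kept_per_loop / rows_joined_per_loop

print(f"rows kept per loop: {rows_kept_per_loop:,}")
print(f"total joined: {total_joined:,}, total kept: {total_kept:,}")
print(f"fraction kept: {kept_fraction:.1%}")
```

About 4 million rows are joined to keep roughly 196,000 of them, so around 95% of the join work is discarded by the filter.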
Please help me with this performance issue.
I tried a row_count_estimate(query text) function, but it does not give an exact count.
Solutions tried:
I tried Robert's solution, and it took 16 seconds to run.
My query is:
SELECT Count(id) AS id
FROM (
SELECT contacts.id AS id
FROM contacts
WHERE (
contacts.last_name ilike '%as%')
OR (
contacts.last_name ilike '%as%')
UNION
SELECT contacts.id AS id
FROM contacts
WHERE contacts.phone_number_id IN
(
SELECT phone_numbers.id AS phone_number_id
FROM phone_numbers
WHERE phone_numbers.value ilike '%as%')
UNION
SELECT contacts.id AS id
FROM contacts
WHERE contacts.company_id IN
(
SELECT companies.id AS company_id
FROM companies
WHERE companies.name ilike '%as%' )) AS ID
Report:
Aggregate (cost=395890.08..395890.13 rows=1 width=8) (actual time=5942.601..5942.667 rows=1 loops=1)
-> Unique (cost=332446.76..337963.57 rows=1103362 width=8) (actual time=5929.800..5939.658 rows=101989 loops=1)
-> Sort (cost=332446.76..335205.17 rows=1103362 width=8) (actual time=5929.799..5933.823 rows=101989 loops=1)
Sort Key: contacts.id
Sort Method: external merge Disk: 1808kB
-> Append (cost=10.00..220843.02 rows=1103362 width=8) (actual time=1.158..5900.926 rows=101989 loops=1)
-> Gather (cost=10.00..61935.48 rows=99179 width=8) (actual time=1.158..569.412 rows=101989 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Parallel Seq Scan on contacts (cost=0.00..61826.30 rows=24795 width=8) (actual time=0.446..477.276 rows=20398 loops=5)
Filter: ((last_name)::text ~~* '%as%'::text)
Rows Removed by Filter: 782612
-> Nested Loop (cost=0.84..359.91 rows=402 width=8) (actual time=5292.088..5292.089 rows=0 loops=1)
-> Index Scan using idx_phone_value on phone_numbers (cost=0.41..64.13 rows=402 width=8) (actual time=5292.087..5292.087 rows=0 loops=1)
Index Cond: ((value)::text ~~* '%as%'::text)
Rows Removed by Index Recheck: 4015921
-> Index Scan using index_contacts_on_phone_number_id on contacts contacts_1 (cost=0.43..0.69 rows=1 width=16) (never executed)
Index Cond: (phone_number_id = phone_numbers.id)
-> Gather (cost=10.36..75795.48 rows=1003781 width=8) (actual time=26.298..26.331 rows=0 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Hash Join (cost=0.36..74781.70 rows=250945 width=8) (actual time=3.758..3.758 rows=0 loops=5)
Hash Cond: (contacts_2.company_id = companies.id)
-> Parallel Seq Scan on contacts contacts_2 (cost=0.00..59316.85 rows=1003781 width=16) (actual time=0.128..0.128 rows=1 loops=5)
-> Hash (cost=0.31..0.31 rows=1 width=8) (actual time=0.726..0.727 rows=0 loops=5)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on companies (cost=0.00..0.31 rows=1 width=8) (actual time=0.726..0.726 rows=0 loops=5)
Filter: ((name)::text ~~* '%as%'::text)
Rows Removed by Filter: 4
Planning Time: 0.846 ms
Execution Time: 5948.330 ms
I also tried the following:
EXPLAIN ANALYZE SELECT
count(id) AS id
FROM
(SELECT
contacts.id AS id
FROM
contacts
WHERE
(
position('as' in LOWER(last_name)) > 0
)
UNION
SELECT
contacts.id AS id
FROM
contacts
WHERE
EXISTS (
SELECT
1
FROM
phone_numbers
WHERE
(
position('as' in LOWER(phone_numbers.value)) > 0
)
AND (
contacts.phone_number_id = phone_numbers.id
)
)
UNION
SELECT
contacts.id AS id
FROM
contacts
WHERE
EXISTS (
SELECT
1
FROM
companies
WHERE
(
position('as' in LOWER(companies.name)) > 0
)
AND (
contacts.company_id = companies.id
)
)
UNION DISTINCT SELECT
contacts.id AS id
FROM
contacts
WHERE
(
position('as' in LOWER(first_name)) > 0
)
) AS ID;
Report
Aggregate (cost=1609467.66..1609467.71 rows=1 width=8) (actual time=1039.249..1039.330 rows=1 loops=1)
-> Unique (cost=1320886.03..1345980.09 rows=5018811 width=8) (actual time=999.363..1030.500 rows=195963 loops=1)
-> Sort (cost=1320886.03..1333433.06 rows=5018811 width=8) (actual time=999.362..1013.818 rows=198421 loops=1)
Sort Key: contacts.id
Sort Method: external merge Disk: 3520kB
-> Gather (cost=10.00..754477.62 rows=5018811 width=8) (actual time=0.581..941.210 rows=198421 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Parallel Append (cost=0.00..749448.80 rows=5018811 width=8) (actual time=290.521..943.736 rows=39684 loops=5)
-> Parallel Hash Join (cost=101469.35..164569.24 rows=334587 width=8) (actual time=724.841..724.843 rows=0 loops=2)
Hash Cond: (contacts.phone_number_id = phone_numbers.id)
-> Parallel Seq Scan on contacts (cost=0.00..59315.91 rows=1003762 width=16) (never executed)
-> Parallel Hash (cost=78630.16..78630.16 rows=431819 width=8) (actual time=723.735..723.735 rows=0 loops=2)
Buckets: 131072 Batches: 32 Memory Usage: 0kB
-> Parallel Seq Scan on phone_numbers (cost=0.00..78630.16 rows=431819 width=8) (actual time=723.514..723.514 rows=0 loops=2)
Filter: ("position"(lower((value)::text), 'as'::text) > 0)
Rows Removed by Filter: 2007960
-> Hash Join (cost=0.38..74780.48 rows=250940 width=8) (actual time=0.888..0.888 rows=0 loops=1)
Hash Cond: (contacts_1.company_id = companies.id)
-> Parallel Seq Scan on contacts contacts_1 (cost=0.00..59315.91 rows=1003762 width=16) (actual time=0.009..0.009 rows=1 loops=1)
-> Hash (cost=0.33..0.33 rows=1 width=8) (actual time=0.564..0.564 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on companies (cost=0.00..0.33 rows=1 width=8) (actual time=0.563..0.563 rows=0 loops=1)
Filter: ("position"(lower((name)::text), 'as'::text) > 0)
Rows Removed by Filter: 4
-> Parallel Seq Scan on contacts contacts_2 (cost=0.00..66844.13 rows=334588 width=8) (actual time=0.119..315.032 rows=20398 loops=5)
Filter: ("position"(lower((last_name)::text), 'as'::text) > 0)
Rows Removed by Filter: 782612
-> Parallel Seq Scan on contacts contacts_3 (cost=0.00..66844.13 rows=334588 width=8) (actual time=0.510..558.791 rows=32144 loops=3)
Filter: ("position"(lower((first_name)::text), 'as'::text) > 0)
Rows Removed by Filter: 1306206
Planning Time: 2.115 ms
Execution Time: 1040.620 ms
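The position('as' in LOWER(...)) > 0 rewrite keeps the meaning of ILIKE '%as%' while hiding the columns from the trigram indexes, which the planner can only use for LIKE/ILIKE-style operators. That is probably why it is faster here: with a two-character pattern, pg_trgm most likely cannot extract any trigram, so the trigram index scans in these plans end up rechecking every row (note the roughly 4 million "Rows Removed by Index Recheck"), while a parallel sequential scan is cheaper. The sketch below checks the equivalence using SQLite as a stand-in for PostgreSQL: SQLite's LIKE is case-insensitive for ASCII text (playing the role of ILIKE), its instr(haystack, needle) plays the role of position(needle in haystack), and the sample strings are made up.

```python
import sqlite3

# In-memory SQLite database, used only to evaluate the two expressions.
conn = sqlite3.connect(":memory:")

# Made-up sample values, including mixed case, an empty string and NULL.
samples = ["Jonas", "CASSIDY", "Bob", "", "As", "glass works", None]

results = []
for value in samples:
    like_match, position_match = conn.execute(
        # LIKE stands in for ILIKE; instr() stands in for position().
        "SELECT ? LIKE '%as%', instr(lower(?), 'as') > 0",
        (value, value),
    ).fetchone()
    results.append((value, like_match, position_match))
    print(repr(value), like_match, position_match)

# Both expressions give 1, 0 or NULL (None) in exactly the same cases.
all_agree = all(like == pos for _, like, pos in results)
print("equivalent on these samples:", all_agree)
```

Because both forms return NULL for NULL input, they also behave the same inside a WHERE clause.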
It's hard to help without access to your data, but let me try.
The EXPLAIN ANALYZE output shows that:
Your query doesn't use any indexes. The full scan of phone_numbers takes 1.202 seconds, and the scan of contacts takes 0.681 seconds.
"Rows Removed by Join Filter: 763817".
"Parallel Hash Join (cost=137684.86..201964.34 rows=1003781 width=41) (actual time=3006.633..4596.987 rows=803010 loops=5)". So this query joins ~800k rows per worker and then filters out ~763k of them.
Maybe you can reverse that order: filter each table first, then combine the results. This should be faster, but that needs to be checked.
For example, you could test rewriting your query along these lines:
SELECT COUNT(id)
FROM
(
SELECT "contacts"."id"
FROM "contacts"
WHERE "contacts"."first_name" ILIKE '%a%' OR "contacts"."last_name" ILIKE '%a%'
UNION
SELECT "contacts"."id"
FROM "contacts"
WHERE phone_number_id IN ( SELECT "phone_numbers"."id"
FROM "phone_numbers"
WHERE "phone_numbers"."value" ILIKE '%a%'
)
UNION
SELECT "contacts"."id"
FROM "contacts"
WHERE company_id IN ( SELECT "companies"."id"
FROM "companies"
WHERE "companies"."name" ILIKE '%a%' )
) AS B
Two indexes, one on contacts.phone_number_id and another on contacts.company_id, might help.
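Whether such a rewrite still counts the same rows can be checked on a toy dataset. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL: SQLite's LIKE is case-insensitive for ASCII text, so it plays the role of ILIKE, and the tables and rows are hypothetical. It runs the original OR-across-joins count and the UNION rewrite side by side.

```python
import sqlite3

# Hypothetical miniature versions of the three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE phone_numbers (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE contacts (
    id INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name TEXT,
    phone_number_id INTEGER REFERENCES phone_numbers (id),
    company_id INTEGER REFERENCES companies (id)
);
INSERT INTO phone_numbers VALUES (1, '555-0100'), (2, 'ask-me');
INSERT INTO companies VALUES (1, 'Acme'), (2, 'Glass Works');
INSERT INTO contacts VALUES
    (1, 'Jonas', 'Smith',   1, 1),  -- matches on first_name
    (2, 'Bob',   'Lee',     2, 1),  -- matches on the phone value
    (3, 'Eve',   'Ng',      1, 2),  -- matches on the company name
    (4, 'Tom',   'Cassidy', 2, 2),  -- matches on several columns
    (5, 'Ann',   'Ito',     1, 1);  -- matches nothing
""")

# Original shape: join everything, then filter with OR across tables.
OR_QUERY = """
SELECT COUNT(contacts.id)
FROM contacts
JOIN phone_numbers ON phone_numbers.id = contacts.phone_number_id
JOIN companies ON companies.id = contacts.company_id
WHERE phone_numbers.value LIKE '%as%'
   OR contacts.first_name LIKE '%as%'
   OR contacts.last_name LIKE '%as%'
   OR companies.name LIKE '%as%'
"""

# Rewritten shape: filter each table first, then UNION the matching ids.
UNION_QUERY = """
SELECT COUNT(id) FROM (
    SELECT id FROM contacts
    WHERE first_name LIKE '%as%' OR last_name LIKE '%as%'
    UNION
    SELECT id FROM contacts
    WHERE phone_number_id IN (SELECT id FROM phone_numbers WHERE value LIKE '%as%')
    UNION
    SELECT id FROM contacts
    WHERE company_id IN (SELECT id FROM companies WHERE name LIKE '%as%')
) AS matched
"""

or_count = conn.execute(OR_QUERY).fetchone()[0]
union_count = conn.execute(UNION_QUERY).fetchone()[0]
print(or_count, union_count)
```

Both queries count the same four contacts here, with contact 4 counted once even though it matches in three branches, because UNION removes duplicates. One caveat: the forms are only equivalent while every contact has a matching phone_numbers and companies row. The original INNER JOINs drop contacts whose phone_number_id or company_id has no match (or is NULL), while the first UNION branch would still count them if their names match.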
EDIT:
The slow part is now the nested loop over phone_numbers, which uses an index scan (idx_phone_value) and takes about 5 seconds.
Try to avoid this.
Please check what this does:
SELECT Count(id) AS id
FROM (
SELECT contacts.id AS id
FROM contacts
WHERE (
contacts.first_name ilike '%as%')
OR (
contacts.last_name ilike '%as%')
UNION
SELECT contacts.id AS id
FROM contacts
WHERE contacts.phone_number_id IN
(
SELECT NULLIF(CAST(phone_numbers.id AS text), '')::int /* just to disable the index scan for that column */ AS phone_number_id
FROM phone_numbers
WHERE phone_numbers.value ilike '%as%')
UNION
SELECT contacts.id AS id
FROM contacts
WHERE contacts.company_id IN
(
SELECT companies.id AS company_id
FROM companies
WHERE companies.name ilike '%as%' )) AS ID
Aggregate (cost=419095.35..419095.40 rows=1 width=8) (actual time=13235.986..13236.335 rows=1 loops=1)
-> Unique (cost=346875.23..353155.24 rows=1256002 width=8) (actual time=13211.350..13230.729 rows=195963 loops=1)
-> Sort (cost=346875.23..350015.24 rows=1256002 width=8) (actual time=13211.349..13219.607 rows=195963 loops=1)
Sort Key: contacts.id
Sort Method: external merge Disk: 3472kB
-> Append (cost=2249.63..218658.27 rows=1256002 width=8) (actual time=5927.019..13164.421 rows=195963 loops=1)
-> Gather (cost=2249.63..48279.58 rows=251838 width=8) (actual time=5927.019..6911.795 rows=195963 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Parallel Bitmap Heap Scan on contacts (cost=2239.63..48017.74 rows=62960 width=8) (actual time=5861.480..6865.957 rows=39193 loops=5)
Recheck Cond: (((first_name)::text ~~* '%as%'::text) OR ((last_name)::text ~~* '%as%'::text))
Rows Removed by Index Recheck: 763815
Heap Blocks: exact=10860 lossy=6075
-> BitmapOr (cost=2239.63..2239.63 rows=255705 width=0) (actual time=5917.966..5917.966 rows=0 loops=1)
-> Bitmap Index Scan on idx_trgm_contacts_first_name (cost=0.00..1291.57 rows=156527 width=0) (actual time=2972.404..2972.404 rows=4015039 loops=1)
Index Cond: ((first_name)::text ~~* '%as%'::text)
-> Bitmap Index Scan on idx_trgm_contacts_last_name (cost=0.00..822.14 rows=99177 width=0) (actual time=2945.560..2945.560 rows=4015038 loops=1)
Index Cond: ((last_name)::text ~~* '%as%'::text)
-> Nested Loop (cost=81.96..384.33 rows=402 width=8) (actual time=6213.028..6213.028 rows=0 loops=1)
-> Unique (cost=81.52..83.53 rows=402 width=8) (actual time=6213.027..6213.027 rows=0 loops=1)
-> Sort (cost=81.52..82.52 rows=402 width=8) (actual time=6213.027..6213.027 rows=0 loops=1)
Sort Key: ((NULLIF((phone_numbers.id)::text, ''::text))::integer)
Sort Method: quicksort Memory: 25kB
-> Index Scan using idx_trgm_phone_value on phone_numbers (cost=0.41..64.13 rows=402 width=8) (actual time=6213.006..6213.006 rows=0 loops=1)
Index Cond: ((value)::text ~~* '%as%'::text)
Rows Removed by Index Recheck: 4015921
-> Index Scan using index_contacts_on_phone_number_id on contacts contacts_1 (cost=0.44..0.70 rows=1 width=16) (never executed)
Index Cond: (phone_number_id = (NULLIF((phone_numbers.id)::text, ''::text))::integer)
-> Gather (cost=10.36..75794.22 rows=1003762 width=8) (actual time=25.691..25.709 rows=0 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Hash Join (cost=0.36..74780.46 rows=250940 width=8) (actual time=2.653..2.653 rows=0 loops=5)
Hash Cond: (contacts_2.company_id = companies.id)
-> Parallel Seq Scan on contacts contacts_2 (cost=0.00..59315.91 rows=1003762 width=16) (actual time=0.244..0.244 rows=1 loops=5)
-> Hash (cost=0.31..0.31 rows=1 width=8) (actual time=0.244..0.244 rows=0 loops=5)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on companies (cost=0.00..0.31 rows=1 width=8) (actual time=0.244..0.244 rows=0 loops=5)
Filter: ((name)::text ~~* '%as%'::text)
Rows Removed by Filter: 4
Planning Time: 1.458 ms
Execution Time: 13236.949 ms
I also tried the following:
SELECT Count(id) AS id
FROM (
SELECT contacts.id AS id
FROM contacts
WHERE (substring(LOWER(contacts.first_name), position('as' in LOWER(first_name)), 2) = 'as')
OR (substring(LOWER(contacts.last_name), position('as' in LOWER(last_name)), 2) = 'as')
UNION
SELECT contacts.id AS id
FROM contacts
WHERE contacts.phone_number_id IN
(
SELECT NULLIF(CAST(phone_numbers.id AS text), '')::int AS phone_number_id
FROM phone_numbers
WHERE (substring(LOWER(phone_numbers.value), position('as' in LOWER(phone_numbers.value)), 2) = 'as'))
UNION
SELECT contacts.id AS id
FROM contacts
WHERE contacts.company_id IN
(
SELECT companies.id AS company_id
FROM companies
WHERE (substring(LOWER(companies.name), position('as' in LOWER(companies.name)), 2) = 'as') )) AS ID
Aggregate (cost=508646.88..508646.93 rows=1 width=8) (actual time=1455.892..1455.995 rows=1 loops=1)
-> Unique (cost=447473.09..452792.55 rows=1063892 width=8) (actual time=1431.464..1450.434 rows=195963 loops=1)
-> Sort (cost=447473.09..450132.82 rows=1063892 width=8) (actual time=1431.464..1439.267 rows=195963 loops=1)
Sort Key: contacts.id
Sort Method: external merge Disk: 3472kB
-> Append (cost=10.00..340141.41 rows=1063892 width=8) (actual time=0.391..1370.557 rows=195963 loops=1)
-> Gather (cost=10.00..84460.02 rows=40050 width=8) (actual time=0.391..983.457 rows=195963 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Parallel Seq Scan on contacts (cost=0.00..84409.97 rows=10012 width=8) (actual time=1.696..987.285 rows=39193 loops=5)
Filter: (("substring"(lower((first_name)::text), "position"(lower((first_name)::text), 'as'::text), 2) = 'as'::text) OR ("substring"(lower((last_name)::text), "position"(lower((last_name)::text), 'as'::text), 2) = 'as'::text))
Rows Removed by Filter: 763817
-> Nested Loop (cost=85188.17..100095.23 rows=20080 width=8) (actual time=364.076..364.125 rows=0 loops=1)
-> HashAggregate (cost=85187.73..86191.73 rows=20080 width=8) (actual time=364.074..364.123 rows=0 loops=1)
Group Key: (NULLIF((phone_numbers.id)::text, ''::text))::integer
Batches: 1 Memory Usage: 793kB
-> Gather (cost=10.00..85137.53 rows=20080 width=8) (actual time=363.976..364.025 rows=0 loops=1)
Workers Planned: 3
Workers Launched: 3
-> Parallel Seq Scan on phone_numbers (cost=0.00..85107.45 rows=6477 width=8) (actual time=357.030..357.031 rows=0 loops=4)
Filter: ("substring"(lower((value)::text), "position"(lower((value)::text), 'as'::text), 2) = 'as'::text)
Rows Removed by Filter: 1003980
-> Index Scan using index_contacts_on_phone_number_id on contacts contacts_1 (cost=0.44..0.64 rows=1 width=16) (never executed)
Index Cond: (phone_number_id = (NULLIF((phone_numbers.id)::text, ''::text))::integer)
-> Gather (cost=10.40..75794.26 rows=1003762 width=8) (actual time=6.889..6.910 rows=0 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Hash Join (cost=0.40..74780.50 rows=250940 width=8) (actual time=0.138..0.139 rows=0 loops=5)
Hash Cond: (contacts_2.company_id = companies.id)
-> Parallel Seq Scan on contacts contacts_2 (cost=0.00..59315.91 rows=1003762 width=16) (actual time=0.004..0.004 rows=1 loops=5)
-> Hash (cost=0.35..0.35 rows=1 width=8) (actual time=0.081..0.081 rows=0 loops=5)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on companies (cost=0.00..0.35 rows=1 width=8) (actual time=0.081..0.081 rows=0 loops=5)
Filter: ("substring"(lower((name)::text), "position"(lower((name)::text), 'as'::text), 2) = 'as'::text)
Rows Removed by Filter: 4
Planning Time: 0.927 ms
Execution Time: 1456.742 ms

SQL (Postgres) Optimal Number of Joins

This query runs on PostgreSQL 12. It has 8 joins and takes approximately 5 seconds.
Query 1
select *
from "public"."products" "P"
inner join "system"."categories" "C" on "C"."id" = "P"."id_category"
inner join "public"."businesses" "E" on "E"."id" = "P"."id_business"
left join "public"."product_files" "pf" on "pf"."id_product" = "P"."id"
left join "system"."files" "f" on "f"."name" = "pf"."img_code"
left join "public"."product_variations" "pv" on ("pv"."id_product" = "P"."id" and "pv"."status" <> 'Deleted')
left join "public"."product_stocks" "ps" on ("ps"."id_product_variation" = "pv"."id" and "ps"."status" <> 'Deleted')
left join "public"."product_stocks" "pps" on ("pps"."id_product" = "P"."id" and "pps"."status" <> 'Deleted')
inner join search_products( array['tires'], 8, 1, 'es') "search" on search.id = "P"."id"
where "P"."status" <> 'Deleted'
Postgres Query EXPLAIN(ANALYZE, BUFFERS) for Query 1
Merge Join (cost=112948.60..121805.61 rows=4996 width=1145) (actual time=2003.599..2426.892 rows=40 loops=1)
Merge Cond: ("P".id = search.id)
Buffers: shared hit=760531, temp read=16912 written=18837
-> Merge Left Join (cost=112888.52..120950.73 rows=287945 width=1105) (actual time=1607.013..2093.722 rows=380961 loops=1)
Merge Cond: ("P".id = pf.id_product)
Buffers: shared hit=752079, temp read=15561 written=15606
-> Merge Left Join (cost=16288.22..19167.29 rows=57631 width=771) (actual time=165.803..271.662 rows=76193 loops=1)
Merge Cond: ("P".id = pps.id_product)
Buffers: shared hit=3820, temp read=2706 written=2733
-> Merge Left Join (cost=16287.81..16577.01 rows=57631 width=686) (actual time=165.787..217.878 rows=56921 loops=1)
Merge Cond: ("P".id = pv.id_product)
Buffers: shared hit=2058, temp read=2706 written=2733
-> Sort (cost=14888.93..15033.01 rows=57631 width=514) (actual time=156.825..175.154 rows=56920 loops=1)
Sort Key: "P".id
Sort Method: external merge Disk: 21840kB
Buffers: shared hit=1430, temp read=2706 written=2733
-> Hash Join (cost=43.49..2484.49 rows=57631 width=514) (actual time=0.266..64.052 rows=57631 loops=1)
Hash Cond: ("P".id_business = "E".id)
Buffers: shared hit=1430
-> Hash Join (cost=37.81..2322.14 rows=57631 width=374) (actual time=0.214..39.402 rows=57631 loops=1)
Hash Cond: ("P".id_category = "C".id)
Buffers: shared hit=1427
-> Seq Scan on products "P" (cost=0.00..2132.41 rows=57631 width=252) (actual time=0.009..12.754 rows=57631 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 2
Buffers: shared hit=1412
-> Hash (cost=25.14..25.14 rows=1014 width=122) (actual time=0.201..0.201 rows=1014 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 124kB
Buffers: shared hit=15
-> Seq Scan on categories "C" (cost=0.00..25.14 rows=1014 width=122) (actual time=0.007..0.078 rows=1014 loops=1)
Buffers: shared hit=15
-> Hash (cost=4.19..4.19 rows=119 width=140) (actual time=0.047..0.048 rows=119 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 29kB
Buffers: shared hit=3
-> Seq Scan on businesses "E" (cost=0.00..4.19 rows=119 width=140) (actual time=0.013..0.024 rows=119 loops=1)
Buffers: shared hit=3
-> Sort (cost=1398.88..1399.05 rows=70 width=172) (actual time=8.956..8.958 rows=3 loops=1)
Sort Key: pv.id_product
Sort Method: quicksort Memory: 43kB
Buffers: shared hit=628
-> Hash Right Join (cost=4.58..1396.73 rows=70 width=172) (actual time=8.853..8.912 rows=70 loops=1)
Hash Cond: (ps.id_product_variation = pv.id)
Buffers: shared hit=628
-> Seq Scan on product_stocks ps (cost=0.00..1259.35 rows=50589 width=85) (actual time=0.009..7.030 rows=50595 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 73
Buffers: shared hit=626
-> Hash (cost=3.70..3.70 rows=70 width=87) (actual time=0.048..0.049 rows=70 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 16kB
Buffers: shared hit=2
-> Seq Scan on product_variations pv (cost=0.00..3.70 rows=70 width=87) (actual time=0.020..0.039 rows=70 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 66
Buffers: shared hit=2
-> Index Scan using product_stocks_id_product_id_product_variation_id_location_key on product_stocks pps (cost=0.41..1819.97 rows=50589 width=85) (actual time=0.013..17.822 rows=49924 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 1
Buffers: shared hit=1762
-> Materialize (cost=96600.25..98040.03 rows=287955 width=334) (actual time=1441.203..1613.160 rows=380961 loops=1)
Buffers: shared hit=748259, temp read=12855 written=12873
-> Sort (cost=96600.25..97320.14 rows=287955 width=334) (actual time=1441.198..1567.183 rows=284596 loops=1)
Sort Key: pf.id_product
Sort Method: external merge Disk: 102840kB
Buffers: shared hit=748259, temp read=12855 written=12873
-> Merge Left Join (cost=0.84..44546.48 rows=287955 width=334) (actual time=0.021..1013.742 rows=287955 loops=1)
Merge Cond: ((pf.img_code)::text = (f.name)::text)
Buffers: shared hit=748259
-> Index Scan using product_files_pkey on product_files pf (cost=0.42..10516.05 rows=287955 width=66) (actual time=0.005..184.173 rows=287955 loops=1)
Buffers: shared hit=289884
-> Index Scan using files_pkey on files f (cost=0.42..29304.42 rows=455180 width=268) (actual time=0.005..338.206 rows=455178 loops=1)
Buffers: shared hit=458375
-> Sort (cost=60.08..62.58 rows=1000 width=40) (actual time=313.554..313.558 rows=36 loops=1)
Sort Key: search.id
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=8452, temp read=1351 written=3231
-> Function Scan on search_products search (cost=0.25..10.25 rows=1000 width=40) (actual time=313.544..313.545 rows=8 loops=1)
Buffers: shared hit=8452, temp read=1351 written=3231
Planning Time: 2.632 ms
Execution Time: 2440.414 ms
While optimizing the query, I added the joins one at a time to see where the problem was. After trying many join orders, I realized that from roughly the seventh join on, PostgreSQL apparently stops finding the best way to run the query. When I remove any one join (it doesn't matter which), the query takes about 300 ms.
Query 2
select *
from "public"."products" "P"
inner join "system"."categories" "C" on "C"."id" = "P"."id_category"
left join "public"."product_files" "pf" on "pf"."id_product" = "P"."id"
left join "system"."files" "f" on "f"."name" = "pf"."img_code"
left join "public"."product_variations" "pv" on ("pv"."id_product" = "P"."id" and "pv"."status" <> 'Deleted')
left join "public"."product_stocks" "ps" on ("ps"."id_product_variation" = "pv"."id" and "ps"."status" <> 'Deleted')
left join "public"."product_stocks" "pps" on ("pps"."id_product" = "P"."id" and "pps"."status" <> 'Deleted')
inner join search_products( array['tires'], 8, 1, 'es') "search" on search.id = "P"."id"
where "P"."status" <> 'Deleted'
Postgres Query EXPLAIN(ANALYZE, BUFFERS) for Query 2
Nested Loop Left Join (cost=1365.30..6482.09 rows=4996 width=1005) (actual time=349.888..350.399 rows=40 loops=1)
Buffers: shared hit=9339, temp read=1351 written=3231
-> Nested Loop Left Join (cost=1364.88..3893.89 rows=4996 width=737) (actual time=349.866..349.957 rows=40 loops=1)
Buffers: shared hit=9179, temp read=1351 written=3231
-> Nested Loop Left Join (cost=1364.46..3250.90 rows=1000 width=671) (actual time=349.857..349.899 rows=8 loops=1)
Buffers: shared hit=9147, temp read=1351 written=3231
-> Hash Join (cost=1364.04..2759.11 rows=1000 width=586) (actual time=349.839..349.853 rows=8 loops=1)
Hash Cond: ("P".id_category = "C".id)
Buffers: shared hit=9119, temp read=1351 written=3231
-> Hash Right Join (cost=1326.23..2718.65 rows=1000 width=464) (actual time=349.566..349.574 rows=8 loops=1)
Hash Cond: (pv.id_product = "P".id)
Buffers: shared hit=9104, temp read=1351 written=3231
-> Hash Right Join (cost=4.58..1396.73 rows=70 width=172) (actual time=8.953..9.013 rows=70 loops=1)
Hash Cond: (ps.id_product_variation = pv.id)
Buffers: shared hit=628
-> Seq Scan on product_stocks ps (cost=0.00..1259.35 rows=50589 width=85) (actual time=0.008..7.060 rows=50595 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 73
Buffers: shared hit=626
-> Hash (cost=3.70..3.70 rows=70 width=87) (actual time=0.047..0.048 rows=70 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 16kB
Buffers: shared hit=2
-> Seq Scan on product_variations pv (cost=0.00..3.70 rows=70 width=87) (actual time=0.015..0.033 rows=70 loops=1)
Filter: ((status)::text <> 'Deleted'::text)
Rows Removed by Filter: 66
Buffers: shared hit=2
-> Hash (cost=1309.15..1309.15 rows=1000 width=292) (actual time=340.542..340.543 rows=8 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
Buffers: shared hit=8476, temp read=1351 written=3231
-> Nested Loop (cost=0.54..1309.15 rows=1000 width=292) (actual time=340.505..340.535 rows=8 loops=1)
Buffers: shared hit=8476, temp read=1351 written=3231
-> Function Scan on search_products search (cost=0.25..10.25 rows=1000 width=40) (actual time=340.483..340.485 rows=8 loops=1)
Buffers: shared hit=8452, temp read=1351 written=3231
-> Index Scan using products_pkey on products "P" (cost=0.29..1.30 rows=1 width=252) (actual time=0.005..0.005 rows=1 loops=8)
Index Cond: (id = search.id)
Filter: ((status)::text <> 'Deleted'::text)
Buffers: shared hit=24
-> Hash (cost=25.14..25.14 rows=1014 width=122) (actual time=0.268..0.268 rows=1014 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 124kB
Buffers: shared hit=15
-> Seq Scan on categories "C" (cost=0.00..25.14 rows=1014 width=122) (actual time=0.012..0.110 rows=1014 loops=1)
Buffers: shared hit=15
-> Index Scan using product_stocks_id_product_id_product_variation_id_location_key on product_stocks pps (cost=0.41..0.47 rows=2 width=85) (actual time=0.005..0.005 rows=1 loops=8)
Index Cond: (id_product = "P".id)
Filter: ((status)::text <> 'Deleted'::text)
Buffers: shared hit=28
-> Index Scan using idx_product_files_product on product_files pf (cost=0.42..0.59 rows=5 width=66) (actual time=0.004..0.005 rows=5 loops=8)
Index Cond: (id_product = "P".id)
Buffers: shared hit=32
-> Index Scan using files_pkey on files f (cost=0.42..0.52 rows=1 width=268) (actual time=0.010..0.010 rows=1 loops=40)
Index Cond: ((name)::text = (pf.img_code)::text)
Buffers: shared hit=160
Planning Time: 2.581 ms
Execution Time: 350.525 ms
Is there an article that explains this behavior, and how can I fix it?
That is because join_collapse_limit has a default value of 8. The optimizer considers all join orders only for up to 8 tables; the remaining joins are performed in the order written. The rationale is to keep planning time reasonably short, since it grows exponentially with the number of tables.
Options:
increase join_collapse_limit
figure out a good join order and rewrite the query to join in that order
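To get a feel for the growth mentioned above, here is the number of possible orderings of n tables (n!). This is only an illustration: PostgreSQL's standard planner uses dynamic programming rather than trying every permutation, but its effort also grows exponentially with the number of relations. Query 1 joins 9 relations (8 joins), one more than the default join_collapse_limit of 8.

```python
import math


def join_orders(n_tables: int) -> int:
    """Number of distinct orderings of n tables (n!), a rough proxy for search-space size."""
    return math.factorial(n_tables)


for n in range(6, 11):
    print(f"{n:2d} tables: {join_orders(n):>9,} orderings")
```

Going from 8 to 9 tables multiplies the number of orderings by 9 (40,320 to 362,880). For the first option, the limit can be raised for the current session, for example with SET join_collapse_limit = 10; before running Query 1. If a join problem ends up with geqo_threshold (12 by default) or more items, PostgreSQL switches to the genetic query optimizer instead of exhaustive search; with 9 relations, Query 1 stays below that.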

Postgres query becomes extremely slow with one single change

I have this query where, among other things, I need discussions.client_first_responded_at to be converted to the user's time zone, which is different for every row. If I replace the reference to users.time_zone_offset with a static '-06:00'::INTERVAL, the query executes within a second, but with the dynamic reference to users.time_zone_offset it takes ~120 seconds.
What am I missing?
SELECT client_id
FROM programs
INNER JOIN users ON users.id = programs.coach_id
WHERE programs.id IN (
SELECT COALESCE(discussions.parent_id, calls.parent_id) AS program_id
FROM categorizations
LEFT OUTER JOIN discussions ON discussions.id = categorizations.categorizable_id AND categorizations.categorizable_type = 'Discussion'
LEFT OUTER JOIN calls ON calls.id = categorizations.categorizable_id AND categorizations.categorizable_type = 'Call'
WHERE categorizations.categorizable_type = 'Discussion' OR categorizations.categorizable_type = 'Call'
AND COALESCE(
discussions.client_first_responded_at::timestamptz AT TIME ZONE users.time_zone_offset::INTERVAL,
calls.start_time::timestamptz
) BETWEEN '2020-09-01' AND '2020-09-30'
);
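One more thing stands out in the subquery's WHERE clause: AND binds more tightly than OR, so it is evaluated as categorizable_type = 'Discussion' OR (categorizable_type = 'Call' AND COALESCE(...) BETWEEN ...). The Filter line of SubPlan 1 in the EXPLAIN output reads it exactly that way, so the date range only applies to calls. The sketch uses SQLite, which has the same precedence rule, with a hypothetical table in which a plain day column stands in for the COALESCE(...) expression.

```python
import sqlite3

# Hypothetical stand-in table; "day" replaces the COALESCE(...) date expression.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categorizations (id INTEGER PRIMARY KEY, categorizable_type TEXT, day TEXT);
INSERT INTO categorizations VALUES
    (1, 'Discussion', '2020-09-15'),  -- in range
    (2, 'Discussion', '2021-01-01'),  -- out of range
    (3, 'Call',       '2020-09-15'),  -- in range
    (4, 'Call',       '2021-01-01');  -- out of range
""")

# Same shape as the original: the date check only binds to the 'Call' test.
as_written = """
SELECT id FROM categorizations
WHERE categorizable_type = 'Discussion' OR categorizable_type = 'Call'
  AND day BETWEEN '2020-09-01' AND '2020-09-30'
ORDER BY id
"""

# Parenthesized: the date check applies to both types.
parenthesized = """
SELECT id FROM categorizations
WHERE (categorizable_type = 'Discussion' OR categorizable_type = 'Call')
  AND day BETWEEN '2020-09-01' AND '2020-09-30'
ORDER BY id
"""

rows_as_written = [row[0] for row in conn.execute(as_written)]
rows_parenthesized = [row[0] for row in conn.execute(parenthesized)]
print(rows_as_written)
print(rows_parenthesized)
```

The out-of-range discussion (id 2) passes the query as written but not the parenthesized one. If the date range is meant to apply to both types, the two type checks need parentheses; whether that is the intent is an assumption.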
UPD:
Hash Join (cost=1.47..250840.61 rows=3542 width=8) (actual time=35.419..61810.027 rows=2266 loops=1)
Hash Cond: (programs.coach_id = users.id)
Join Filter: (SubPlan 1)
Rows Removed by Join Filter: 4821
Buffers: shared hit=3087792
-> Seq Scan on programs (cost=0.00..368.84 rows=7084 width=20) (actual time=0.008..11.890 rows=7087 loops=1)
Buffers: shared hit=298
-> Hash (cost=1.21..1.21 rows=21 width=40) (actual time=0.015..0.015 rows=21 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on users (cost=0.00..1.21 rows=21 width=40) (actual time=0.004..0.008 rows=21 loops=1)
Buffers: shared hit=1
SubPlan 1
-> Hash Right Join (cost=777.19..2168.01 rows=7477 width=4) (actual time=0.007..7.887 rows=7516 loops=7087)
Hash Cond: (discussions.id = categorizations.categorizable_id)
Join Filter: ((categorizations.categorizable_type)::text = 'Discussion'::text)
Rows Removed by Join Filter: 3473
Filter: (((categorizations.categorizable_type)::text = 'Discussion'::text) OR (((categorizations.categorizable_type)::text = 'Call'::text) AND (COALESCE((timezone((users.time_zone_offset)::interval, (discussions.client_first_responded_at)::timestamp with time zone))::timestamp with time zone, (calls.start_time)::timestamp with time zone) >= '2020-09-01 00:00:00+02'::timestamp with time zone) AND (COALESCE((timezone((users.time_zone_offset)::interval, (discussions.client_first_responded_at)::timestamp with time zone))::timestamp with time zone, (calls.start_time)::timestamp with time zone) <= '2020-09-30 00:00:00+02'::timestamp with time zone)))
Rows Removed by Filter: 2578
Buffers: shared hit=3087493
-> Seq Scan on discussions (cost=0.00..751.46 rows=18746 width=20) (actual time=0.003..1.842 rows=15668 loops=7087)
Buffers: shared hit=3087252
-> Hash (cost=647.28..647.28 rows=10393 width=25) (actual time=13.355..13.355 rows=13090 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 763kB
Buffers: shared hit=241
-> Hash Left Join (cost=300.68..647.28 rows=10393 width=25) (actual time=3.476..10.083 rows=13090 loops=1)
Hash Cond: (categorizations.categorizable_id = calls.id)
Join Filter: ((categorizations.categorizable_type)::text = 'Call'::text)
Rows Removed by Join Filter: 2303
Buffers: shared hit=241
-> Seq Scan on categorizations (cost=0.00..319.32 rows=10393 width=13) (actual time=0.006..2.455 rows=13090 loops=1)
Filter: (((categorizable_type)::text = 'Discussion'::text) OR ((categorizable_type)::text = 'Call'::text))
Buffers: shared hit=123
-> Hash (cost=199.19..199.19 rows=8119 width=20) (actual time=3.462..3.462 rows=8119 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 509kB
Buffers: shared hit=118
-> Seq Scan on calls (cost=0.00..199.19 rows=8119 width=20) (actual time=0.005..1.766 rows=8119 loops=1)
Buffers: shared hit=118
Planning Time: 0.643 ms
Execution Time: 61811.118 ms

Plan with the static '-06:00'::INTERVAL offset instead:
Hash Join (cost=3537.47..4289.98 rows=4825 width=8) (actual time=111.743..122.572 rows=4371 loops=1)
Hash Cond: (programs.coach_id = users.id)
Buffers: shared hit=1931
-> Hash Join (cost=3535.95..4273.46 rows=4825 width=12) (actual time=111.627..120.637 rows=4371 loops=1)
Hash Cond: (programs.id = COALESCE(discussions.parent_id, calls.parent_id))
Buffers: shared hit=1930
-> Seq Scan on programs (cost=0.00..658.50 rows=9650 width=20) (actual time=0.011..4.880 rows=9656 loops=1)
Buffers: shared hit=562
-> Hash (cost=3350.83..3350.83 rows=14810 width=8) (actual time=111.573..111.580 rows=4371 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 282kB
Buffers: shared hit=1368
-> HashAggregate (cost=3202.73..3350.83 rows=14810 width=8) (actual time=107.495..109.868 rows=4371 loops=1)
Group Key: COALESCE(discussions.parent_id, calls.parent_id)
Buffers: shared hit=1368
-> Hash Left Join (cost=1688.78..3165.70 rows=14810 width=8) (actual time=34.242..97.144 rows=19275 loops=1)
Hash Cond: (categorizations.categorizable_id = calls.id)
Join Filter: ((categorizations.categorizable_type)::text = 'Call'::text)
Rows Removed by Join Filter: 6869
Filter: (((categorizations.categorizable_type)::text = 'Discussion'::text) OR (((categorizations.categorizable_type)::text = 'Call'::text) AND (COALESCE((timezone('-06:00:00'::interval, (discussions.client_first_responded_at)::timestamp with time zone))::timestamp with time zone, (calls.start_time)::timestamp with time zone) >= '2020-09-01 00:00:00+00'::timestamp with time zone) AND (COALESCE((timezone('-06:00:00'::interval, (discussions.client_first_responded_at)::timestamp with time zone))::timestamp with time zone, (calls.start_time)::timestamp with time zone) <= '2020-09-30 00:00:00+00'::timestamp with time zone)))
Rows Removed by Filter: 7674
Buffers: shared hit=1368
-> Hash Right Join (cost=1046.24..2467.57 rows=21181 width=25) (actual time=21.970..62.581 rows=26949 loops=1)
Hash Cond: (discussions.id = categorizations.categorizable_id)
Join Filter: ((categorizations.categorizable_type)::text = 'Discussion'::text)
Rows Removed by Join Filter: 8461
Buffers: shared hit=1057
-> Seq Scan on discussions (cost=0.00..997.71 rows=31771 width=20) (actual time=0.183..8.541 rows=32064 loops=1)
Buffers: shared hit=680
-> Hash (cost=781.47..781.47 rows=21181 width=13) (actual time=21.720..21.721 rows=26949 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 1444kB
Buffers: shared hit=377
-> Seq Scan on categorizations (cost=0.00..781.47 rows=21181 width=13) (actual time=0.012..11.631 rows=26949 loops=1)
Filter: (((categorizable_type)::text = 'Discussion'::text) OR ((categorizable_type)::text = 'Call'::text))
Buffers: shared hit=377
-> Hash (cost=458.35..458.35 rows=14735 width=20) (actual time=12.205..12.206 rows=14720 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 876kB
Buffers: shared hit=311
-> Seq Scan on calls (cost=0.00..458.35 rows=14735 width=20) (actual time=0.016..6.914 rows=14720 loops=1)
Buffers: shared hit=311
-> Hash (cost=1.23..1.23 rows=23 width=8) (actual time=0.065..0.066 rows=23 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on users (cost=0.00..1.23 rows=23 width=8) (actual time=0.048..0.054 rows=23 loops=1)
Buffers: shared hit=1
Planning Time: 1.010 ms
Execution Time: 123.511 ms
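Comparing the two plans: in the slow one the IN subquery survives as SubPlan 1, used as a Join Filter and executed once per programs row (loops=7087), each time rescanning discussions (shared hit=3087252). In the fast one it is flattened into a HashAggregate feeding a Hash Join. PostgreSQL only converts IN (subquery) into a join when the subquery does not reference columns of the outer query, and users.time_zone_offset is such a reference. A correlated EXISTS whose outer references appear only in its WHERE clause can still be converted into a semi-join, so one thing to try is the sketch below. It runs against hypothetical miniature tables (only the columns the query uses, types guessed from the casts in the question); on the real data, confirm with EXPLAIN (ANALYZE, BUFFERS) that the SubPlan line is gone.

```sql
-- Hypothetical miniature tables. Run in a scratch session:
-- temp tables shadow real tables of the same name.
CREATE TEMP TABLE users (id int PRIMARY KEY, time_zone_offset text);
CREATE TEMP TABLE programs (id int PRIMARY KEY, coach_id int, client_id int);
CREATE TEMP TABLE discussions (id int PRIMARY KEY, parent_id int, client_first_responded_at timestamp);
CREATE TEMP TABLE calls (id int PRIMARY KEY, parent_id int, start_time timestamp);
CREATE TEMP TABLE categorizations (categorizable_id int, categorizable_type text);

INSERT INTO users VALUES (1, '-06:00');
INSERT INTO programs VALUES (10, 1, 100), (11, 1, 101), (12, 1, 102);
INSERT INTO discussions VALUES (1, 10, '2020-09-15 12:00'), (2, 11, '2020-10-15 12:00');
INSERT INTO calls VALUES (1, 12, '2020-09-15 12:00');
INSERT INTO categorizations VALUES (1, 'Discussion'), (2, 'Discussion'), (1, 'Call');

-- EXISTS instead of IN: the outer references (programs.id, users.time_zone_offset)
-- live only in the WHERE clause, which makes a semi-join possible.
SELECT programs.client_id
FROM programs
JOIN users ON users.id = programs.coach_id
WHERE EXISTS (
  SELECT 1
  FROM categorizations c
  LEFT JOIN discussions d ON d.id = c.categorizable_id AND c.categorizable_type = 'Discussion'
  LEFT JOIN calls ca ON ca.id = c.categorizable_id AND c.categorizable_type = 'Call'
  WHERE COALESCE(d.parent_id, ca.parent_id) = programs.id
    -- assumed intent: the date range applies to both types
    AND c.categorizable_type IN ('Discussion', 'Call')
    AND COALESCE(
          d.client_first_responded_at::timestamptz AT TIME ZONE users.time_zone_offset::interval,
          ca.start_time::timestamptz
        ) BETWEEN '2020-09-01' AND '2020-09-30'
)
ORDER BY programs.client_id;
```

With these sample rows the query returns client_id 100 and 102. With the question's original AND/OR grouping, client 101 would be returned as well, because its discussion falls outside the date range but discussions bypass the range check there.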

How to optimize a SQL query with DISTINCT ON and a JOIN on many values?

I have a query like this that joins ~6000 values:
SELECT DISTINCT ON(user_id)
user_id,
finished_at as last_deposit_date,
CASE When currency = 'RUB' Then amount_cents END as last_deposit_amount_cents
FROM payments
JOIN (VALUES (5),(22),(26)) --~6000 values
AS v(user_id) USING (user_id)
WHERE action = 'deposit'
AND success = 't'
AND currency IN ('RUB')
ORDER BY user_id, finished_at DESC
QUERY PLAN for query with many VALUES:
Unique (cost=444606.97..449760.44 rows=19276 width=24) (actual time=6129.403..6418.317 rows=5991 loops=1)
Buffers: shared hit=2386527, temp read=7807 written=7808
-> Sort (cost=444606.97..447183.71 rows=1030695 width=24) (actual time=6129.401..6295.457 rows=1877039 loops=1)
Sort Key: payments.user_id, payments.finished_at DESC
Sort Method: external merge Disk: 62456kB
Buffers: shared hit=2386527, temp read=7807 written=7808
-> Nested Loop (cost=0.43..341665.35 rows=1030695 width=24) (actual time=0.612..5085.376 rows=1877039 loops=1)
Buffers: shared hit=2386521
-> Values Scan on "*VALUES*" (cost=0.00..75.00 rows=6000 width=4) (actual time=0.002..4.507 rows=6000 loops=1)
-> Index Scan using index_payments_on_user_id on payments (cost=0.43..54.78 rows=172 width=28) (actual time=0.010..0.793 rows=313 loops=6000)
Index Cond: (user_id = "*VALUES*".column1)
Filter: (success AND ((action)::text = 'deposit'::text) AND ((currency)::text = 'RUB'::text))
Rows Removed by Filter: 85
Buffers: shared hit=2386521
Planning time: 5.886 ms
Execution time: 6429.685 ms
I use PostgreSQL 10.8.0. Is there any chance to speed up this query?
I tried replacing DISTINCT ON with a recursive CTE:
WITH RECURSIVE t AS (
(SELECT min(user_id) AS user_id FROM payments)
UNION ALL
SELECT (SELECT min(user_id) FROM payments
WHERE user_id > t.user_id
) AS user_id FROM
t
WHERE t.user_id IS NOT NULL
)
SELECT payments.* FROM t
JOIN (VALUES (5),(22),(26)) --~6000 VALUES
AS v(user_id) USING (user_id)
, LATERAL (
SELECT user_id,
finished_at as last_deposit_date,
CASE When currency = 'RUB' Then amount_cents END as last_deposit_amount_cents FROM payments
WHERE payments.user_id=t.user_id
AND action = 'deposit'
AND success = 't'
AND currency IN ('RUB')
ORDER BY finished_at DESC LIMIT 1
) AS payments
WHERE t.user_id IS NOT NULL;
But it turned out to be even slower:
Hash Join (cost=418.67..21807.22 rows=3000 width=24) (actual time=16.804..10843.174 rows=5991 loops=1)
Hash Cond: (t.user_id = "VALUES".column1)
Buffers: shared hit=6396763
CTE t
-> Recursive Union (cost=0.46..53.73 rows=101 width=8) (actual time=0.142..1942.351 rows=237029 loops=1)
Buffers: shared hit=864281
-> Result (cost=0.46..0.47 rows=1 width=8) (actual time=0.141..0.142 rows=1 loops=1)
Buffers: shared hit=4
InitPlan 3 (returns $1)
-> Limit (cost=0.43..0.46 rows=1 width=8) (actual time=0.138..0.139 rows=1 loops=1)
Buffers: shared hit=4
-> Index Only Scan using index_payments_on_user_id on payments payments_2 (cost=0.43..155102.74 rows=4858092 width=8) (actual time=0.137..0.138 rows=1 loops=1)
Index Cond: (user_id IS NOT NULL)
Heap Fetches: 0
Buffers: shared hit=4
-> WorkTable Scan on t t_1 (cost=0.00..5.12 rows=10 width=8) (actual time=0.008..0.008 rows=1 loops=237029)
Filter: (user_id IS NOT NULL)
Rows Removed by Filter: 0
Buffers: shared hit=864277
SubPlan 2
-> Result (cost=0.48..0.49 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=237028)
Buffers: shared hit=864277
InitPlan 1 (returns $3)
-> Limit (cost=0.43..0.48 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=237028)
Buffers: shared hit=864277
-> Index Only Scan using index_payments_on_user_id on payments payments_1 (cost=0.43..80786.25 rows=1619364 width=8) (actual time=0.007..0.007 rows=1 loops=237028)
Index Cond: ((user_id IS NOT NULL) AND (user_id > t_1.user_id))
Heap Fetches: 46749
Buffers: shared hit=864277
-> Nested Loop (cost=214.94..21498.23 rows=100 width=32) (actual time=0.475..10794.535 rows=167333 loops=1)
Buffers: shared hit=6396757
-> CTE Scan on t (cost=0.00..2.02 rows=100 width=8) (actual time=0.145..1998.788 rows=237028 loops=1)
Filter: (user_id IS NOT NULL)
Rows Removed by Filter: 1
Buffers: shared hit=864281
-> Limit (cost=214.94..214.94 rows=1 width=24) (actual time=0.037..0.037 rows=1 loops=237028)
Buffers: shared hit=5532476
-> Sort (cost=214.94..215.37 rows=172 width=24) (actual time=0.036..0.036 rows=1 loops=237028)
Sort Key: payments.finished_at DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=5532476
-> Index Scan using index_payments_on_user_id on payments (cost=0.43..214.08 rows=172 width=24) (actual time=0.003..0.034 rows=15 loops=237028)
Index Cond: (user_id = t.user_id)
Filter: (success AND ((action)::text = 'deposit'::text) AND ((currency)::text = 'RUB'::text))
Rows Removed by Filter: 6
Buffers: shared hit=5532473
-> Hash (cost=75.00..75.00 rows=6000 width=4) (actual time=2.255..2.255 rows=6000 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 275kB
-> Values Scan on "VALUES" (cost=0.00..75.00 rows=6000 width=4) (actual time=0.004..1.206 rows=6000 loops=1)
Planning time: 7.029 ms
Execution time: 10846.774 ms
For this query:
SELECT DISTINCT ON (user_id)
p.user_id,
p.finished_at as last_deposit_date,
(CASE WHEN p.currency = 'RUB' THEN p.amount_cents END) as last_deposit_amount_cents
FROM payments p JOIN
(VALUES (5),( 22), (26) --~6000 values
) v(user_id)
USING (user_id)
WHERE p.action = 'deposit' AND
p.success = 't' AND
p.currency = 'RUB'
ORDER BY p.user_id, p.finished_at DESC;
I don't fully understand the CASE expression, because the WHERE clause already filters out every currency other than 'RUB'.
That said, I would expect an index on (action, success, currency, user_id, finished_at desc) to be helpful.
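The recursive CTE was slower because it enumerated every distinct user_id in payments (237,029 rows from the Recursive Union) and ran the LATERAL lookup for each of them (loops=237028) before the hash join discarded all but the requested users. Driving the same ORDER BY ... LIMIT 1 lookup from the VALUES list keeps it to one probe per requested user, and with the index suggested above each probe should only need the first matching index entry. A sketch on a hypothetical miniature payments table (column types are assumptions; success is boolean, as the plan's Filter line implies). The CASE is dropped because the WHERE clause already restricts currency to 'RUB'.

```sql
-- Hypothetical miniature payments table. Run in a scratch session:
-- a temp table shadows a real table of the same name.
CREATE TEMP TABLE payments (
  user_id      int,
  action       text,
  success      boolean,
  currency     text,
  amount_cents bigint,
  finished_at  timestamp
);
INSERT INTO payments VALUES
  (5,  'deposit', true,  'RUB', 100, '2019-06-01'),
  (5,  'deposit', true,  'RUB', 250, '2019-06-10'),
  (5,  'deposit', false, 'RUB', 999, '2019-06-15'),  -- failed, ignored
  (22, 'deposit', true,  'RUB', 300, '2019-05-01'),
  (22, 'win',     true,  'RUB', 700, '2019-06-20');  -- not a deposit

-- The index suggested above: equality columns first, then the sort column.
CREATE INDEX payments_deposit_lookup_idx
  ON payments (action, success, currency, user_id, finished_at DESC);

-- One "latest deposit" lookup per requested user instead of sorting every match.
SELECT v.user_id,
       p.finished_at  AS last_deposit_date,
       p.amount_cents AS last_deposit_amount_cents
FROM (VALUES (5), (22), (26)) AS v(user_id)  -- ~6000 values in practice
CROSS JOIN LATERAL (
  SELECT finished_at, amount_cents
  FROM payments
  WHERE payments.user_id = v.user_id
    AND action = 'deposit' AND success AND currency = 'RUB'
  ORDER BY finished_at DESC
  LIMIT 1
) AS p
ORDER BY v.user_id;
-- Returns (5, 2019-06-10, 250) and (22, 2019-05-01, 300).
```

Like the original inner join, CROSS JOIN LATERAL drops users with no matching deposit (26 here); LEFT JOIN LATERAL (...) AS p ON true keeps them with NULLs. If these filters never change, a partial index ON payments (user_id, finished_at DESC) WHERE action = 'deposit' AND success AND currency = 'RUB' is a smaller alternative to the five-column index.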

Different EXPLAIN plans for the same query

I have created an index on the expression date(derived_tstamp) of the events table, which has over 4 million records:
CREATE INDEX derived_tstamp_date_index ON atomic.events ( date(derived_tstamp) );
When I run the query with two different values of domain_userid, I get different EXPLAIN results: Query 1 uses the index, but Query 2 does not. How can I make sure the index is always used, for faster results?
Query 1:
EXPLAIN ANALYZE SELECT
SUM(duration) as "total_time_spent"
FROM (
SELECT
domain_sessionidx,
MIN(derived_tstamp) as "start_time",
MAX(derived_tstamp) as "finish_time",
MAX(derived_tstamp) - min(derived_tstamp) as "duration"
FROM "atomic".events
WHERE date(derived_tstamp) >= date('2017-07-01') AND date(derived_tstamp) <= date('2017-08-02') AND domain_userid = 'd01ee409-ebff-4f37-bc97-9bbda45a7225'
GROUP BY 1
) v;
EXPLAIN output of Query 1:
Aggregate (cost=1834.00..1834.01 rows=1 width=16) (actual time=138.619..138.619 rows=1 loops=1)
-> GroupAggregate (cost=1830.83..1832.93 rows=85 width=34) (actual time=137.096..138.563 rows=186 loops=1)
Group Key: events.domain_sessionidx
-> Sort (cost=1830.83..1831.09 rows=104 width=10) (actual time=137.063..137.681 rows=2726 loops=1)
Sort Key: events.domain_sessionidx
Sort Method: quicksort Memory: 224kB
-> Bitmap Heap Scan on events (cost=1412.95..1827.35 rows=104 width=10) (actual time=108.764..136.053 rows=2726 loops=1)
Recheck Cond: ((date(derived_tstamp) >= '2017-07-01'::date) AND (date(derived_tstamp) <= '2017-08-02'::date) AND ((domain_userid)::text = 'd01ee409-ebff-4f37-bc97-9bbda45a7225'::text))
Rows Removed by Index Recheck: 19704
Heap Blocks: exact=466 lossy=3331
-> BitmapAnd (cost=1412.95..1412.95 rows=104 width=0) (actual time=108.474..108.474 rows=0 loops=1)
-> Bitmap Index Scan on derived_tstamp_date_index (cost=0.00..448.34 rows=21191 width=0) (actual time=94.371..94.371 rows=818461 loops=1)
Index Cond: ((date(derived_tstamp) >= '2017-07-01'::date) AND (date(derived_tstamp) <= '2017-08-02'::date))
-> Bitmap Index Scan on events_domain_userid_index (cost=0.00..964.31 rows=20767 width=0) (actual time=3.044..3.044 rows=16834 loops=1)
Index Cond: ((domain_userid)::text = 'd01ee409-ebff-4f37-bc97-9bbda45a7225'::text)
Planning time: 0.166 ms
Query 2:
EXPLAIN ANALYZE SELECT
SUM(duration) as "total_time_spent"
FROM (
SELECT
domain_sessionidx,
MIN(derived_tstamp) as "start_time",
MAX(derived_tstamp) as "finish_time",
MAX(derived_tstamp) - min(derived_tstamp) as "duration"
FROM "atomic".events
WHERE date(derived_tstamp) >= date('2017-07-01') AND date(derived_tstamp) <= date('2017-08-02') AND domain_userid = 'e4c94f3e-9841-4b65-9031-ca4aa03809e7'
GROUP BY 1
) v;
EXPLAIN output of Query 2:
Aggregate (cost=226.12..226.13 rows=1 width=16) (actual time=0.402..0.402 rows=1 loops=1)
-> GroupAggregate (cost=226.08..226.10 rows=1 width=34) (actual time=0.394..0.397 rows=2 loops=1)
Group Key: events.domain_sessionidx
-> Sort (cost=226.08..226.08 rows=1 width=10) (actual time=0.381..0.386 rows=13 loops=1)
Sort Key: events.domain_sessionidx
Sort Method: quicksort Memory: 25kB
-> Index Scan using events_domain_userid_index on events (cost=0.56..226.07 rows=1 width=10) (actual time=0.030..0.368 rows=13 loops=1)
Index Cond: ((domain_userid)::text = 'e4c94f3e-9841-4b65-9031-ca4aa03809e7'::text)
Filter: ((date(derived_tstamp) >= '2017-07-01'::date) AND (date(derived_tstamp) <= '2017-08-02'::date))
Rows Removed by Filter: 184
Planning time: 0.162 ms
Execution time: 0.440 ms
The index is not used in the second case because so few rows match the condition domain_userid = 'e4c94f3e-9841-4b65-9031-ca4aa03809e7' (only 197: the 13 returned plus the 184 removed by the filter) that it is cheaper to fetch them through events_domain_userid_index and filter them than to perform a bitmap index scan using your new index.
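PostgreSQL has no hints to force an index, and for the user with 197 rows the existing plan (0.440 ms) is already the better choice. If the goal is one index that serves both predicates for any user, an option to test is a multicolumn index with the equality column first and the date expression second, so both conditions can appear in a single Index Cond. A sketch on a hypothetical stand-in for atomic.events; it assumes derived_tstamp is timestamp without time zone, which the successful date(derived_tstamp) index suggests (date() of a timestamptz is not immutable and could not be indexed).

```sql
-- Hypothetical stand-in for atomic.events with only the columns used here.
-- Run in a scratch session: a temp table can shadow a real "events" on the search_path.
CREATE TEMP TABLE events (
  domain_userid     text,
  domain_sessionidx int,
  derived_tstamp    timestamp
);

-- Equality column first, then the date expression used in the range filter.
-- On the real table this would be ON atomic.events (domain_userid, date(derived_tstamp)).
CREATE INDEX events_userid_date_idx
  ON events (domain_userid, date(derived_tstamp));

ANALYZE events;

-- If the planner picks the new index, both conditions show up in its Index Cond.
EXPLAIN
SELECT domain_sessionidx,
       max(derived_tstamp) - min(derived_tstamp) AS duration
FROM events
WHERE date(derived_tstamp) >= date '2017-07-01'
  AND date(derived_tstamp) <= date '2017-08-02'
  AND domain_userid = 'd01ee409-ebff-4f37-bc97-9bbda45a7225'
GROUP BY domain_sessionidx;
```

Whether the planner chooses this index still depends on its row estimates, so check the plan on the real table for both a heavy and a light user.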