Slow SQL query (has_many through)

I am facing a slow query on our production machine.
The setup is as follows.
The join model (polymorphic):
# Table name: util_participants
#
# id :bigint not null, primary key
# assignable_type :string not null
# assignable_id :bigint not null
# contact_id :bigint
#
# Indexes
#
# assignable_index (assignable_id,assignable_type)
#
# Foreign Keys
#
# fk_rails_... (contact_id => client_contacts.id)
This class assigns contacts to different elements, such as a documentation (the assignable).
One query I run is
documentation.contacts
> SELECT COUNT(*) FROM "client_contacts" INNER JOIN "util_participants" ON "client_contacts"."id" = "util_participants"."contact_id" WHERE "client_contacts"."account_id" = $1 AND "util_participants"."assignable_id" = $2 AND "util_participants"."assignable_type" = $3
The account_id column is used for multi-tenancy.
For one specific element the query takes 7 seconds, and I cannot track down the cause. If I switch the account_id to another customer, execution is fast.
For example,
SELECT count(*)
FROM client_contacts
INNER JOIN util_participants
ON client_contacts.id = util_participants.contact_id
WHERE client_contacts.account_id = 35
AND util_participants.assignable_id = 1;
is very slow.
SELECT count(*)
FROM client_contacts
INNER JOIN util_participants
ON client_contacts.id = util_participants.contact_id
WHERE client_contacts.account_id = 27
AND util_participants.assignable_id = 1;
is not slow, even though account 27 has even more contact rows.
EXPLAIN in Postgres gives:
Aggregate (cost=3688.79..3688.80 rows=1 width=8)
  -> Nested Loop (cost=0.28..3688.79 rows=1 width=0)
        Join Filter: (client_contacts.id = util_participants.contact_id)
        -> Index Scan using index_client_contacts_on_account_id_and_client_id on client_contacts (cost=0.28..8.30 rows=1 width=8)
              Index Cond: (account_id = 35)
        -> Seq Scan on util_participants (cost=0.00..3680.48 rows=1 width=8)
              Filter: (assignable_id = 1)
(7 rows)
The total row count is 130k for participants and 6,500 for contacts.
Update:
Here is the EXPLAIN (ANALYZE, VERBOSE, BUFFERS) output for both cases.
The slow one:
> EXPLAIN(ANALYZE, VERBOSE, BUFFERS) SELECT count(*) FROM client_contacts INNER JOIN util_participants ON client_contacts.id = util_participants.contact_id WHERE client_contacts.account_id = 35 AND util_participants.assignable_id = 1;
Aggregate (cost=3688.79..3688.80 rows=1 width=8) (actual time=6991.704..6991.706 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=893735
  -> Nested Loop (cost=0.28..3688.79 rows=1 width=0) (actual time=6991.699..6991.700 rows=0 loops=1)
        Join Filter: (client_contacts.id = util_participants.contact_id)
        Rows Removed by Join Filter: 428
        Buffers: shared hit=893735
        -> Index Scan using index_client_contacts_on_account_id_and_client_id on public.client_contacts (cost=0.28..8.30 rows=1 width=8) (actual time=0.015..1.160 rows=428 loops=1)
              Output: ....
              Index Cond: (client_contacts.account_id = 35)
              Buffers: shared hit=71
        -> Seq Scan on public.util_participants (cost=0.00..3680.48 rows=1 width=8) (actual time=0.002..16.325 rows=1 loops=428)
              Output: ....
              Filter: (util_participants.assignable_id = 1)
              Rows Removed by Filter: 127268
              Buffers: shared hit=893664
Planning Time: 0.183 ms
Execution Time: 6991.741 ms
The fast one (different account_id):
EXPLAIN(ANALYZE, VERBOSE, BUFFERS) SELECT count(*) FROM client_contacts INNER JOIN util_participants ON client_contacts.id = util_participants.contact_id WHERE client_contacts.account_id = 33 AND util_participants.assignable_id = 1;
Aggregate (cost=3688.79..3688.80 rows=1 width=8) (actual time=16.882..16.884 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=2088
  -> Nested Loop (cost=0.28..3688.78 rows=1 width=0) (actual time=16.876..16.878 rows=0 loops=1)
        Inner Unique: true
        Buffers: shared hit=2088
        -> Seq Scan on public.util_participants (cost=0.00..3680.48 rows=1 width=8) (actual time=0.007..16.873 rows=1 loops=1)
              Output: ...
              Filter: (util_participants.assignable_id = 1)
              Rows Removed by Filter: 127268
              Buffers: shared hit=2088
        -> Index Scan using client_contacts_pkey on public.client_contacts (cost=0.28..8.30 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=1)
              Output: ...
              Index Cond: (client_contacts.id = util_participants.contact_id)
              Filter: (client_contacts.account_id = 33)
Planning Time: 0.176 ms
Execution Time: 16.923 ms
Why are the query plans different for almost the same query?
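
The two plans differ mainly in how often the sequential scan on util_participants runs. As a rough sanity check, the sketch below multiplies out figures copied from the slow plan's EXPLAIN output; the variable names are made up for illustration and nothing here queries a database.

```python
# Figures copied from the slow plan's EXPLAIN (ANALYZE, BUFFERS) output.
loops = 428                 # rows returned by the index scan on client_contacts
ms_per_seq_scan = 16.325    # actual time of one Seq Scan on util_participants
buffers_total = 893664      # shared buffer hits reported for the Seq Scan node
execution_ms = 6991.741     # reported Execution Time

# The nested loop repeats the inner Seq Scan once per outer row.
estimated_ms = loops * ms_per_seq_scan
buffers_per_scan = buffers_total / loops

print(f"seq scan time over all loops: {estimated_ms:.1f} ms")
print(f"share of execution time: {estimated_ms / execution_ms:.1%}")
print(f"buffers per scan: {buffers_per_scan:.0f}")
```

The repeated scan accounts for nearly all of the 7 seconds, and the per-scan buffer count equals the 2088 buffers of the single scan in the fast plan. The planner expected rows=1 from the index scan for account 35 but got 428, so it costed the inner scan as if it ran once. Whether refreshing statistics with ANALYZE would change the plan is an assumption that would need testing on the real data.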

Related

Postgres: Slow query when using OR statement in a join query

We run a join query between two tables.
The query has an OR condition that compares one column from the left table and one column from the right table. The query performance is very poor, and we fixed it by changing the OR to a UNION.
Why is this happening? I'm looking for a detailed explanation or a reference to documentation that might shed light on the issue.
Query with the OR condition:
db1=# explain analyze select count(*)
from conversations
join agents on conversations.agent_id=agents.id
where conversations.id=1 or agents.id = '123';
Query plan:
----------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=11017.95..11017.96 rows=1 width=8) (actual time=54.088..54.088 rows=1 loops=1)
-> Gather (cost=11017.73..11017.94 rows=2 width=8) (actual time=53.945..57.181 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=10017.73..10017.74 rows=1 width=8) (actual time=48.303..48.303 rows=1 loops=3)
-> Hash Join (cost=219.26..10016.69 rows=415 width=0) (actual time=5.292..48.287 rows=130 loops=3)
Hash Cond: (conversations.agent_id = agents.id)
Join Filter: ((conversations.id = 1) OR ((agents.id)::text = '123'::text))
Rows Removed by Join Filter: 80035
-> Parallel Seq Scan on conversations (cost=0.00..9366.95 rows=163995 width=8) (actual time=0.017..14.972 rows=131196 loops=3)
-> Hash (cost=143.56..143.56 rows=6056 width=16) (actual time=2.686..2.686 rows=6057 loops=3)
Buckets: 8192 Batches: 1 Memory Usage: 353kB
-> Seq Scan on agents (cost=0.00..143.56 rows=6056 width=16) (actual time=0.011..1.305 rows=6057 loops=3)
Planning time: 0.710 ms
Execution time: 57.276 ms
(15 rows)
Changing the OR to UNION:
db1=# explain analyze select count(*) from (
select *
from conversations
join agents on conversations.agent_id=agents.id
where conversations.installation_id=1
union
select *
from conversations
join agents on conversations.agent_id=agents.id
where agents.source_id = '123') as subquery;
Query plan:
----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1114.31..1114.32 rows=1 width=8) (actual time=8.038..8.038 rows=1 loops=1)
-> HashAggregate (cost=1091.90..1101.86 rows=996 width=1437) (actual time=7.783..8.009 rows=390 loops=1)
Group Key: conversations.id, conversations.created, conversations.modified, conversations.source_created, conversations.source_id, conversations.installation_id, brain_conversation.resolution_reason, conversations.solve_time, conversations.agent_id, conversations.submission_reason, conversations.is_marked_as_duplicate, conversations.num_back_and_forths, conversations.is_closed, conversations.is_solved, conversations.conversation_type, conversations.related_ticket_source_id, conversations.channel, brain_conversation.last_updated_from_platform, conversations.csat, agents.id, agents.created, agents.modified, agents.name, agents.source_id, organization_agent.installation_id, agents.settings
-> Append (cost=219.68..1027.16 rows=996 width=1437) (actual time=5.517..6.307 rows=390 loops=1)
-> Hash Join (cost=219.68..649.69 rows=931 width=224) (actual time=5.516..6.063 rows=390 loops=1)
Hash Cond: (conversations.agent_id = agents.id)
-> Index Scan using conversations_installation_id_b3ff5c00 on conversations (cost=0.42..427.98 rows=931 width=154) (actual time=0.039..0.344 rows=879 loops=1)
Index Cond: (installation_id = 1)
-> Hash (cost=143.56..143.56 rows=6056 width=70) (actual time=5.394..5.394 rows=6057 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 710kB
-> Seq Scan on agents (cost=0.00..143.56 rows=6056 width=70) (actual time=0.014..1.938 rows=6057 loops=1)
-> Nested Loop (cost=0.70..367.52 rows=65 width=224) (actual time=0.210..0.211 rows=0 loops=1)
-> Index Scan using agents_source_id_106c8103_like on agents agents_1 (cost=0.28..8.30 rows=1 width=70) (actual time=0.210..0.210 rows=0 loops=1)
Index Cond: ((source_id)::text = '123'::text)
-> Index Scan using conversations_agent_id_de76554b on conversations conversations_1 (cost=0.42..358.12 rows=110 width=154) (never executed)
Index Cond: (agent_id = agents_1.id)
Planning time: 2.024 ms
Execution time: 9.367 ms
(18 rows)

Yes, OR has a way of killing the performance of queries. For this query:
select count(*)
from conversations c join
agents a
on c.agent_id = a.id
where c.id = 1 or a.id = 123;
Note that I removed the quotes around 123: it looks like a number, so I assume it is one. For this query, you want an index on conversations(agent_id).
Probably the most effective way to write the query is:
select count(*)
from ((select 1
from conversations c join
agents a
on c.agent_id = a.id
where c.id = 1
) union all
(select 1
from conversations c join
agents a
on c.agent_id = a.id
where a.id = 123 and c.id <> 1
)
) ac;
Note the use of UNION ALL rather than UNION. The additional WHERE condition (c.id <> 1) eliminates duplicates, so no deduplication step is needed.
This can take advantage of the following indexes:
conversations(id, agent_id)
agents(id)
conversations(agent_id, id)
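
To check that the UNION ALL rewrite counts the same rows as the OR version, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Postgres. The table contents are invented for illustration, and the inner parentheses around each branch are dropped because SQLite does not accept them; it demonstrates equivalence only, not performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agents (id INTEGER PRIMARY KEY);
    CREATE TABLE conversations (id INTEGER PRIMARY KEY, agent_id INTEGER);
    INSERT INTO agents (id) VALUES (7), (123), (200);
    INSERT INTO conversations (id, agent_id) VALUES
        (1, 123),   -- matches both conditions
        (2, 123),   -- matches only a.id = 123
        (3, 7),     -- matches neither
        (4, 200);   -- matches neither
""")

or_sql = """
    SELECT count(*)
    FROM conversations c JOIN agents a ON c.agent_id = a.id
    WHERE c.id = 1 OR a.id = 123
"""

union_sql = """
    SELECT count(*)
    FROM (SELECT 1
          FROM conversations c JOIN agents a ON c.agent_id = a.id
          WHERE c.id = 1
          UNION ALL
          SELECT 1
          FROM conversations c JOIN agents a ON c.agent_id = a.id
          WHERE a.id = 123 AND c.id <> 1   -- skip the row the first branch already counted
         ) AS ac
"""

or_count = conn.execute(or_sql).fetchone()[0]
union_count = conn.execute(union_sql).fetchone()[0]
print(or_count, union_count)
```

Conversation 1 satisfies both conditions, so without the `c.id <> 1` guard the UNION ALL version would count it twice.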

How to dump data spread across multiple tables efficiently

I have a main table with worklogs and some related tables, like users, teams, etc. I have to provide export functionality that fetches up to 100,000 rows and joins with 3-4 tables. Currently this task runs in 30-40 seconds, which is too long. Are there any ways to improve performance for this query?
The query itself:
SELECT "trapping_snaring_trapwork"."id",
"trapping_snaring_trapwork"."added",
"trapping_snaring_trapwork"."added_by_id",
"trapping_snaring_trapwork"."import_id",
"trapping_snaring_trapwork"."team_id",
"trapping_snaring_trapwork"."date",
"contacts_contact"."id",
"contacts_contact"."added",
"contacts_contact"."added_by_id",
"contacts_contact"."team_id",
"contacts_contact"."first_name",
"contacts_contact"."last_name",
"trapping_snaring_trapfeature"."id",
"trapping_snaring_trapfeature"."added",
"trapping_snaring_trapfeature"."added_by_id",
"trapping_snaring_trapfeature"."team_id",
"trapping_snaring_trapfeature"."name",
"trapping_snaring_trapfeature"."description",
"lists_snaringtraptype"."id",
"lists_snaringtraptype"."added",
"lists_snaringtraptype"."added_by_id",
"lists_snaringtraptype"."team_id",
"lists_snaringtraptype"."name",
"lists_snaringtraptype"."description",
"trapping_snaring_groupfeature"."id",
"trapping_snaring_groupfeature"."added",
"trapping_snaring_groupfeature"."added_by_id",
"trapping_snaring_groupfeature"."team_id",
"trapping_snaring_groupfeature"."name",
"trapping_snaring_groupfeature"."description",
"trapping_snaring_groupfeature"."end_date",
"trapping_session"."id",
"trapping_session"."added",
FROM "trapping_snaring_trapwork"
INNER JOIN "contacts_contact" ON ("trapping_snaring_trapwork"."checker_id" = "contacts_contact"."id")
INNER JOIN "trapping_snaring_trapfeature" ON ("trapping_snaring_trapwork"."trap_id" = "trapping_snaring_trapfeature"."id")
INNER JOIN "lists_snaringtraptype" ON ("trapping_snaring_trapfeature"."trap_type_id" = "lists_snaringtraptype"."id")
LEFT OUTER JOIN "trapping_snaring_groupfeature" ON ("trapping_snaring_trapfeature"."group_id" = "trapping_snaring_groupfeature"."id")
LEFT OUTER JOIN "trapping_session" ON ("trapping_snaring_trapwork"."session_id" = "trapping_session"."id")
WHERE "trapping_snaring_trapwork"."team_id" = 11;
Execution plan:
Merge Join (cost=2358.83..2473.45 rows=11334 width=998) (actual time=22.810..140.677 rows=11336 loops=1)
Merge Cond: (trapping_snaring_trapwork.checker_id = contacts_contact.id)
-> Nested Loop Left Join (cost=1.00..12439.20 rows=11334 width=820) (actual time=0.033..103.113 rows=11336 loops=1)
-> Nested Loop Left Join (cost=0.86..10436.08 rows=11334 width=513) (actual time=0.029..89.731 rows=11336 loops=1)
-> Nested Loop (cost=0.71..7170.02 rows=11334 width=401) (actual time=0.024..63.432 rows=11336 loops=1)
-> Nested Loop (cost=0.57..5033.83 rows=11334 width=231) (actual time=0.020..39.480 rows=11336 loops=1)
-> Index Scan using trapping_snaring_trapwork_checker_id_ae914a8a on trapping_snaring_trapwork (cost=0.29..796.63 rows=11334 width=74) (actual time=0.012..6.637 rows=11336 loops=1)
Filter: (team_id = 11)
Rows Removed by Filter: 1
-> Index Scan using trapping_snaring_trapfeature_pkey on trapping_snaring_trapfeature (cost=0.28..0.36 rows=1 width=157) (actual time=0.002..0.002 rows=1 loops=11336)
Index Cond: (id = trapping_snaring_trapwork.trap_id)
-> Index Scan using lists_snaringtraptype_pkey on lists_snaringtraptype (cost=0.15..0.18 rows=1 width=170) (actual time=0.001..0.001 rows=1 loops=11336)
Index Cond: (id = trapping_snaring_trapfeature.trap_type_id)
-> Index Scan using trapping_snaring_groupfeature_pkey on trapping_snaring_groupfeature (cost=0.14..0.28 rows=1 width=112) (actual time=0.001..0.001 rows=1 loops=11336)
Index Cond: (trapping_snaring_trapfeature.group_id = id)
-> Index Scan using trapping_session_pkey on trapping_session (cost=0.14..0.17 rows=1 width=307) (actual time=0.000..0.000 rows=0 loops=11336)
Index Cond: (trapping_snaring_trapwork.session_id = id)
-> Index Scan using contacts_contact_pkey on contacts_contact (cost=0.29..2257.76 rows=40098 width=178) (actual time=0.006..17.350 rows=50661 loops=1)
Planning time: 19.557 ms
Execution time: 143.044 ms

postgresql query performance enhancement

I am trying to get the row with the highest popularity. Ordering by descending popularity slows the query down significantly.
Is there a better way to optimize this query?
PostgreSQL 9.5
explain analyse
SELECT v.cosmo_id, v.resource_id, k.gid, k.popularity, v.cropinfo_id
FROM rmsg.verifications V
INNER JOIN rmip.resourceinfo R ON (R.id=V.resource_id AND R.source_id=54)
INNER JOIN rmpp.kgidinfo K ON (K.cosmo_id=V.cosmo_id)
WHERE V.status=1 AND v.crop_Status=1 AND V.locked_time isnull
ORDER BY k.popularity desc, (v.cosmo_id, v.resource_id, v.cropinfo_id)
LIMIT 1;
QUERY PLAN
Limit (cost=470399.99..470399.99 rows=1 width=31) (actual time=19655.552..19655.553 rows=1 loops=1)
Sort (cost=470399.99..470434.80 rows=13923 width=31) (actual time=19655.549..19655.549 rows=1 loops=1)
Sort Key: k.popularity DESC, (ROW(v.cosmo_id, v.resource_id, v.cropinfo_id))
Sort Method: top-N heapsort Memory: 25kB
-> Nested Loop (cost=19053.91..470330.37 rows=13923 width=31) (actual time=58.365..19627.405 rows=23006 loops=1)
-> Hash Join (cost=19053.48..459008.74 rows=13188 width=16) (actual time=58.275..19268.339 rows=19165 loops=1)
Hash Cond: (v.resource_id = r.id)
-> Seq Scan on verifications v (cost=0.00..409876.92 rows=7985725 width=16) (actual time=0.035..11097.163 rows=9908140 loops=1)
Filter: ((locked_time IS NULL) AND (status = 1) AND (crop_status = 1))
Rows Removed by Filter: 1126121
-> Hash (cost=18984.23..18984.23 rows=5540 width=4) (actual time=57.101..57.101 rows=5186 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 247kB
-> Bitmap Heap Scan on resourceinfo r (cost=175.37..18984.23 rows=5540 width=4) (actual time=2.827..51.318 rows=5186 loops=1)
Recheck Cond: (source_id = 54)
Heap Blocks: exact=5907
-> Bitmap Index Scan on resourceinfo_source_id_key (cost=0.00..173.98 rows=5540 width=0) (actual time=1.742..1.742 rows=6483 loops=1)
Index Cond: (source_id = 54)
Index Scan using kgidinfo_cosmo_id_idx on kgidinfo k (cost=0.43..0.85 rows=1 width=23) (actual time=0.013..0.014 rows=1 loops=19165)
Index Cond: (cosmo_id = v.cosmo_id)
Planning time: 1.083 ms
Execution time: 19655.638 ms
(21 rows)

This is your query, simplified by removing parentheses:
SELECT v.cosmo_id, v.resource_id, k.gid, k.popularity, v.cropinfo_id
FROM rmsg.verifications V INNER JOIN
rmip.resourceinfo R
ON R.id = V.resource_id AND R.source_id = 54 INNER JOIN
rmpp.kgidinfo K
ON K.cosmo_id = V.cosmo_id
WHERE V.status = 1 AND v.crop_Status = 1 AND
V.locked_time is null
ORDER BY k.popularity desc, v.cosmo_id, v.resource_id, v.cropinfo_id
LIMIT 1;
For this query, I would think in terms of indexes on verifications(status, crop_status, locked_time, resource_id, cosmo_id, cropinfo_id), resourceinfo(id, source_id), and kgidinfo(cosmo_id). I don't see an easy way to remove the ORDER BY.
In looking at the query, I wonder if you might have a Cartesian product problem between the two tables.
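
The plan hints at this: the nested loop into kgidinfo grows the row count from 19165 to 23006, so some cosmo_id values match more than one kgidinfo row. One way to look for such fan-out is a GROUP BY ... HAVING check. The sketch below runs it with Python's sqlite3 module on invented sample rows; the column types and data are assumptions, and the schema prefix is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical miniature of kgidinfo; the real column types are unknown.
    CREATE TABLE kgidinfo (gid INTEGER PRIMARY KEY, cosmo_id INTEGER, popularity INTEGER);
    INSERT INTO kgidinfo (gid, cosmo_id, popularity) VALUES
        (1, 10, 5),
        (2, 10, 9),   -- second row for cosmo_id 10: each matching verification joins twice
        (3, 20, 3);
""")

dup_sql = """
    SELECT cosmo_id, COUNT(*) AS matches
    FROM kgidinfo
    GROUP BY cosmo_id
    HAVING COUNT(*) > 1
    ORDER BY cosmo_id
"""

# Every cosmo_id listed here multiplies the joined rows by its match count.
duplicates = conn.execute(dup_sql).fetchall()
print(duplicates)
```

If the same check on the real table returns rows, each verification with one of those cosmo_id values appears several times in the join result.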

DISTINCT INNER JOIN slow

I've written the following PostgreSQL query which works as it should. However, it seems to be awfully slow, sometimes taking up to 10 seconds to return a result. I'm sure there is something in my statement that is causing this to be slow.
Can anyone help determine why this query is slow?
SELECT DISTINCT ON (school_classes.class_id,attendance_calendar.school_date)
school_classes.class_id, school_classes.class_name, school_classes.grade_id
, school_gradelevels.linked_calendar, attendance_calendars.calendar_id
, attendance_calendar.school_date, attendance_calendar.minutes
, teacher_join_classes_subjects.staff_id, staff.first_name, staff.last_name
FROM school_classes
INNER JOIN school_gradelevels ON school_gradelevels.id=school_classes.grade_id
INNER JOIN teacher_join_classes_subjects ON teacher_join_classes_subjects.class_id=school_classes.class_id
INNER JOIN staff ON staff.staff_id=teacher_join_classes_subjects.staff_id
INNER JOIN attendance_calendars ON attendance_calendars.title=school_gradelevels.linked_calendar
INNER JOIN attendance_calendar ON attendance_calendar.calendar_id=attendance_calendars.calendar_id
WHERE teacher_join_classes_subjects.syear='2013'
AND staff.syear='2013'
AND attendance_calendars.syear='2013'
AND teacher_join_classes_subjects.does_attendance='Y'
AND teacher_join_classes_subjects.subject_id IS NULL
AND attendance_calendar.school_date<CURRENT_DATE
AND attendance_calendar.school_date NOT IN (
SELECT com.school_date FROM attendance_completed com
WHERE com.class_id=school_classes.class_id
AND (com.period_id='101' AND attendance_calendar.minutes>='151' OR
com.period_id='95' AND attendance_calendar.minutes='150') )
I replaced the NOT IN with the following:
AND NOT EXISTS (
SELECT com.school_date
FROM attendance_completed com
WHERE com.class_id=school_classes.class_id
AND com.school_date=attendance_calendar.school_date
AND (com.period_id='101' AND attendance_calendar.minutes>='151' OR
com.period_id='95' AND attendance_calendar.minutes='150') )
Result of EXPLAIN ANALYZE:
Unique (cost=2998.39..2998.41 rows=3 width=85) (actual time=10751.111..10751.118 rows=1 loops=1)
-> Sort (cost=2998.39..2998.40 rows=3 width=85) (actual time=10751.110..10751.110 rows=2 loops=1)
Sort Key: school_classes.class_id, attendance_calendar.school_date
Sort Method: quicksort Memory: 25kB
-> Hash Join (cost=2.03..2998.37 rows=3 width=85) (actual time=6409.471..10751.045 rows=2 loops=1)
Hash Cond: ((teacher_join_classes_subjects.class_id = school_classes.class_id) AND (school_gradelevels.id = school_classes.grade_id))
Join Filter: (NOT (SubPlan 1))
-> Nested Loop (cost=0.00..120.69 rows=94 width=81) (actual time=2.468..1187.397 rows=26460 loops=1)
Join Filter: (attendance_calendars.calendar_id = attendance_calendar.calendar_id)
-> Nested Loop (cost=0.00..42.13 rows=1 width=70) (actual time=0.087..3.247 rows=735 loops=1)
Join Filter: ((attendance_calendars.title)::text = (school_gradelevels.linked_calendar)::text)
-> Nested Loop (cost=0.00..40.80 rows=1 width=277) (actual time=0.077..1.005 rows=245 loops=1)
-> Nested Loop (cost=0.00..39.61 rows=1 width=27) (actual time=0.064..0.572 rows=49 loops=1)
-> Seq Scan on teacher_join_classes_subjects (cost=0.00..10.48 rows=4 width=14) (actual time=0.022..0.143 rows=49 loops=1)
Filter: ((subject_id IS NULL) AND (syear = 2013::numeric) AND ((does_attendance)::text = 'Y'::text))
-> Index Scan using staff_pkey on staff (cost=0.00..7.27 rows=1 width=20) (actual time=0.006..0.007 rows=1 loops=49)
Index Cond: (staff.staff_id = teacher_join_classes_subjects.staff_id)
Filter: (staff.syear = 2013::numeric)
-> Seq Scan on attendance_calendars (cost=0.00..1.18 rows=1 width=250) (actual time=0.003..0.006 rows=5 loops=49)
Filter: (attendance_calendars.syear = 2013::numeric)
-> Seq Scan on school_gradelevels (cost=0.00..1.15 rows=15 width=11) (actual time=0.001..0.005 rows=15 loops=245)
-> Seq Scan on attendance_calendar (cost=0.00..55.26 rows=1864 width=18) (actual time=0.003..1.129 rows=1824 loops=735)
Filter: (attendance_calendar.school_date < ...)
-> Hash (cost=1.41..1.41 rows=41 width=18) (actual time=0.040..0.040 rows=41 loops=1)
-> Seq Scan on school_classes (cost=0.00..1.41 rows=41 width=18) (actual time=0.006..0.015 rows=41 loops=1)
SubPlan 1
-> Seq Scan on attendance_completed com (cost=0.00..958.28 rows=5 width=4) (actual time=0.228..5.411 rows=17 loops=1764)
Filter: ((class_id = $0) AND (((period_id = 101::numeric) AND ($1 >= 151::numeric)) OR ((period_id = 95::numeric) AND ($1 = 150::numeric))))

NOT EXISTS is an excellent choice; it is almost always better than NOT IN.
I simplified your query a bit (it looks fine, generally):
SELECT DISTINCT ON (c.class_id, a.school_date)
c.class_id, c.class_name, c.grade_id
,g.linked_calendar, aa.calendar_id
,a.school_date, a.minutes
,t.staff_id, s.first_name, s.last_name
FROM school_classes c
JOIN teacher_join_classes_subjects t USING (class_id)
JOIN staff s USING (staff_id)
JOIN school_gradelevels g ON g.id = c.grade_id
JOIN attendance_calendars aa ON aa.title = g.linked_calendar
JOIN attendance_calendar a ON a.calendar_id = aa.calendar_id
WHERE t.syear = 2013
AND s.syear = 2013
AND aa.syear = 2013
AND t.does_attendance = 'Y' -- looks like it should be boolean!
AND t.subject_id IS NULL
AND a.school_date < CURRENT_DATE
AND NOT EXISTS (
SELECT 1
FROM attendance_completed x
WHERE x.class_id = c.class_id
AND x.school_date = a.school_date
AND (x.period_id = 101 AND a.minutes >= 151 OR -- actually numbers?
x.period_id = 95 AND a.minutes = 150)
)
ORDER BY c.class_id, a.school_date, ???
What seems to be missing is ORDER BY which should accompany your DISTINCT ON. Add more ORDER BY items in place of ???. If there are duplicates to pick from, you probably want to define which to pick.
Numeric literals don't need single quotes and boolean values should be coded as such.
You may want to revisit the chapter about data types.
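
One commonly cited reason to prefer NOT EXISTS, which the answer does not spell out, is NULL handling: if the subquery of a NOT IN returns a NULL, no row can pass the test. The sketch below shows the difference with Python's sqlite3 module on two invented single-column tables. It assumes attendance_completed.school_date can be NULL, which the question does not confirm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, heavily reduced versions of the tables in the question.
    CREATE TABLE attendance_calendar (school_date TEXT);
    CREATE TABLE attendance_completed (school_date TEXT);
    INSERT INTO attendance_calendar VALUES ('2013-09-02'), ('2013-09-03');
    INSERT INTO attendance_completed VALUES ('2013-09-02'), (NULL);
""")

# NOT IN: comparing against a NULL yields NULL, so no row passes the filter.
not_in_rows = conn.execute("""
    SELECT a.school_date
    FROM attendance_calendar a
    WHERE a.school_date NOT IN (SELECT x.school_date FROM attendance_completed x)
""").fetchall()

# NOT EXISTS: the NULL row never matches, so it is simply ignored.
not_exists_rows = conn.execute("""
    SELECT a.school_date
    FROM attendance_calendar a
    WHERE NOT EXISTS (SELECT 1
                      FROM attendance_completed x
                      WHERE x.school_date = a.school_date)
""").fetchall()

print(not_in_rows)
print(not_exists_rows)
```

With the NULL row present, the NOT IN version returns nothing, while NOT EXISTS still returns the uncompleted date.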

Optimise the PG query

The query is used very often in the app and is too expensive.
What can I do to optimise it and bring the total time down to a few milliseconds (rather than hundreds of ms)?
NOTES:
Removing DISTINCT improves it (down to ~460 ms), but I need DISTINCT to get rid of the Cartesian product :( (yes, please show a better way of avoiding it).
Removing ORDER BY name improves it, but not significantly.
The query:
SELECT DISTINCT properties.*
FROM properties JOIN developments ON developments.id = properties.development_id
-- Development allocations
LEFT JOIN allocation_items AS dev_items ON dev_items.development_id = properties.development_id
LEFT JOIN allocations AS dev_allocs ON dev_items.allocation_id = dev_allocs.id
-- Group allocations
LEFT JOIN properties_property_groupings ppg ON ppg.property_id = properties.id
LEFT JOIN property_groupings pg ON pg.id = ppg.property_grouping_id
LEFT JOIN allocation_items prop_items ON prop_items.property_grouping_id = pg.id
LEFT JOIN allocations prop_allocs ON prop_allocs.id = prop_items.allocation_id
WHERE
(properties.status <> 'deleted') AND ((
properties.status <> 'inactive'
AND (
(dev_allocs.receiving_company_id = 175 OR prop_allocs.receiving_company_id = 175)
AND developments.status = 'active'
)
OR developments.company_id = 175
)
AND EXISTS (
SELECT 1 FROM development_participations dp
JOIN participations p ON p.id = dp.participation_id
WHERE dp.allowed
AND p.user_id = 387 AND p.company_id = 175
AND dp.development_id = properties.development_id
LIMIT 1
)
)
ORDER BY properties.name
EXPLAIN ANALYZE
Unique (cost=72336.86..72517.53 rows=1606 width=4336) (actual time=703.766..710.920 rows=219 loops=1)
-> Sort (cost=72336.86..72340.87 rows=1606 width=4336) (actual time=703.765..704.698 rows=5091 loops=1)
Sort Key: properties.name, properties.id, properties.status, properties.level, etc etc (all columns)
Sort Method: external sort Disk: 1000kB
-> Nested Loop Left Join (cost=0.00..69258.84 rows=1606 width=4336) (actual time=25.230..366.489 rows=5091 loops=1)
Filter: ((((properties.status)::text <> 'inactive'::text) AND ((dev_allocs.receiving_company_id = 175) OR (prop_allocs.receiving_company_id = 175)) AND ((developments.status)::text = 'active'::text)) OR (developments.company_id = 175))
-> Nested Loop Left Join (cost=0.00..57036.99 rows=41718 width=4355) (actual time=25.122..247.587 rows=99567 loops=1)
-> Nested Loop Left Join (cost=0.00..47616.39 rows=21766 width=4355) (actual time=25.111..163.827 rows=39774 loops=1)
-> Nested Loop Left Join (cost=0.00..41508.16 rows=21766 width=4355) (actual time=25.101..112.452 rows=39774 loops=1)
-> Nested Loop Left Join (cost=0.00..34725.22 rows=21766 width=4351) (actual time=25.087..68.104 rows=19887 loops=1)
-> Nested Loop Left Join (cost=0.00..28613.00 rows=21766 width=4351) (actual time=25.076..39.360 rows=19887 loops=1)
-> Nested Loop (cost=0.00..27478.54 rows=1147 width=4347) (actual time=25.059..29.966 rows=259 loops=1)
-> Index Scan using developments_pkey on developments (cost=0.00..25.17 rows=49 width=15) (actual time=0.048..0.127 rows=48 loops=1)
Filter: (((status)::text = 'active'::text) OR (company_id = 175))
-> Index Scan using index_properties_on_development_id on properties (cost=0.00..559.95 rows=26 width=4336) (actual time=0.534..0.618 rows=5 loops=48)
Index Cond: (development_id = developments.id)
Filter: (((status)::text <> 'deleted'::text) AND (SubPlan 1))
SubPlan 1
-> Limit (cost=0.00..10.00 rows=1 width=0) (actual time=0.011..0.011 rows=0 loops=2420)
-> Nested Loop (cost=0.00..10.00 rows=1 width=0) (actual time=0.011..0.011 rows=0 loops=2420)
Join Filter: (dp.participation_id = p.id)
-> Seq Scan on development_participations dp (cost=0.00..1.71 rows=1 width=4) (actual time=0.004..0.008 rows=1 loops=2420)
Filter: (allowed AND (development_id = properties.development_id))
-> Index Scan using index_participations_on_user_id on participations p (cost=0.00..8.27 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=3148)
Index Cond: (user_id = 387)
Filter: (company_id = 175)
-> Index Scan using index_allocation_items_on_development_id on allocation_items dev_items (cost=0.00..0.70 rows=23 width=8) (actual time=0.003..0.016 rows=77 loops=259)
Index Cond: (development_id = properties.development_id)
-> Index Scan using allocations_pkey on allocations dev_allocs (cost=0.00..0.27 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=19887)
Index Cond: (dev_items.allocation_id = id)
-> Index Scan using index_properties_property_groupings_on_property_id on properties_property_groupings ppg (cost=0.00..0.29 rows=2 width=8) (actual time=0.001..0.001 rows=2 loops=19887)
Index Cond: (property_id = properties.id)
-> Index Scan using property_groupings_pkey on property_groupings pg (cost=0.00..0.27 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=39774)
Index Cond: (id = ppg.property_grouping_id)
-> Index Scan using index_allocation_items_on_property_grouping_id on allocation_items prop_items (cost=0.00..0.36 rows=6 width=8) (actual time=0.001..0.001 rows=2 loops=39774)
Index Cond: (property_grouping_id = pg.id)
-> Index Scan using allocations_pkey on allocations prop_allocs (cost=0.00..0.27 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=99567)
Index Cond: (id = prop_items.allocation_id)
Total runtime: 716.692 ms
(39 rows)

Answering my own question.
This query has two big issues:
6 LEFT JOINs that produce a Cartesian product (resulting in billions of records even on a small dataset).
DISTINCT, which has to sort that billion-record dataset.
So I had to eliminate those.
I did it by replacing the JOINs with 2 subqueries (I won't provide them here since they should be pretty obvious).
As a result, the actual time went from ~700-800 ms down to ~45 ms, which is more or less acceptable.
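
Since the rewritten query is not shown, here is a minimal sketch of the general idea: replacing a fan-out LEFT JOIN plus DISTINCT with an EXISTS subquery. It is not the author's actual query. The three tables are cut down to a few hypothetical columns, and Python's sqlite3 module stands in for Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE properties (id INTEGER PRIMARY KEY, name TEXT, development_id INTEGER);
    CREATE TABLE allocations (id INTEGER PRIMARY KEY, receiving_company_id INTEGER);
    CREATE TABLE allocation_items (id INTEGER PRIMARY KEY, allocation_id INTEGER, development_id INTEGER);
    INSERT INTO properties VALUES (1, 'Alpha', 1), (2, 'Beta', 2), (3, 'Gamma', 1);
    INSERT INTO allocations VALUES (10, 175), (11, 175), (12, 999);
    INSERT INTO allocation_items VALUES (100, 10, 1), (101, 11, 1), (102, 12, 1), (103, 12, 2);
""")

join_body = """
    FROM properties p
    LEFT JOIN allocation_items i ON i.development_id = p.development_id
    LEFT JOIN allocations a ON a.id = i.allocation_id
    WHERE a.receiving_company_id = 175
"""

# Rows produced by the joins before DISTINCT collapses them.
raw_count = conn.execute("SELECT COUNT(*) " + join_body).fetchone()[0]

# Original shape: join everything, then deduplicate.
join_rows = conn.execute(
    "SELECT DISTINCT p.id, p.name " + join_body + " ORDER BY p.name"
).fetchall()

# Rewritten shape: each property is tested once, so no duplicates arise.
exists_rows = conn.execute("""
    SELECT p.id, p.name
    FROM properties p
    WHERE EXISTS (SELECT 1
                  FROM allocation_items i
                  JOIN allocations a ON a.id = i.allocation_id
                  WHERE i.development_id = p.development_id
                    AND a.receiving_company_id = 175)
    ORDER BY p.name
""").fetchall()

print(raw_count, join_rows, exists_rows)
```

The join version produces more intermediate rows than it returns, which is the work that DISTINCT then has to sort away; the EXISTS version never creates those duplicates.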

Most of the time is spent in the disk sort; you should use RAM instead by raising work_mem:
SET work_mem TO '20MB';
Then check EXPLAIN ANALYZE again.
