I have tracked the cause of a slow API endpoint to an SQL query and cannot for the life of me optimise it.
The query is as follows:
SELECT "Views".*, "Show".*, "Episode".*, "User".* FROM "Views"
LEFT OUTER JOIN "Shows" AS "Show" ON "Show"."id" = "Views"."ShowId"
LEFT OUTER JOIN "Episodes" AS "Episode" ON "Episode"."id" = "Views"."EpisodeId"
LEFT OUTER JOIN "Users" AS "User" ON "User"."id" = "Views"."UserId"
WHERE "Show"."tvdb_id" IN (259063,82066,258823,265766,261742,82283,205281,182061,121361)
AND "User"."id"=29;
It takes between 3000-4200ms to complete. The result of an EXPLAIN ANALYZE on the query can be found here: http://explain.depesz.com/s/J9R
EDIT: I have also tried separating the IN() into ORs:
SELECT "Views".*, "Show".*, "Episode".*, "User".* FROM "Views"
LEFT OUTER JOIN "Shows" AS "Show" ON "Show"."id" = "Views"."ShowId"
LEFT OUTER JOIN "Episodes" AS "Episode" ON "Episode"."id" = "Views"."EpisodeId"
LEFT OUTER JOIN "Users" AS "User" ON "User"."id" = "Views"."UserId"
WHERE
"Show"."tvdb_id"=259063 OR
"Show"."tvdb_id"=82066 OR
"Show"."tvdb_id"=258823 OR
"Show"."tvdb_id"=265766 OR
"Show"."tvdb_id"=261742 OR
"Show"."tvdb_id"=82283 OR
"Show"."tvdb_id"=205281 OR
"Show"."tvdb_id"=182061 OR
"Show"."tvdb_id"=121361
AND "User"."id"=29;
I have tried creating indexes on the columns referenced in the LEFT OUTER JOIN but that resulted in marginal gains (if anything). What's interesting is I also tried reducing the WHERE x IN to a single WHERE x = y and this instantly improved the situation, responding instead in 200-300ms. So how would I go about optimizing the IN statement?
Thanks!
Try to make combined index on table Views on fields ShowId, EpisodeId, UserId together.
CREATE INDEX Views_idx ON Views
USING btree (ShowId, EpisodeId, UserId);
ps. explain link is wrong, it's for some other query
Best regards,
nele
Related
This query has so many joins and i have to apply conditions on joined table columns to get the desired results but query becomes slow when i apply condition on a datetime column.
Here is the query
select
distinct v0_.id as id_0,
MIN(v4_.price) as sclr_4
from
venue v0_
left join facility f5_ on
v0_.id = f5_.venue_id
and (f5_.deleted_at is null)
left join sport_facility_types s6_ on
f5_.id = s6_.facility_id
left join taxonomy_term t7_ on
s6_.sport_id = t7_.id
and (t7_.deleted_at is null)
left join term t8_ on
t7_.term_id = t8_.id
left join sport_facility_types_taxonomy_term s10_ on
s6_.id = s10_.sport_facility_types_id
left join taxonomy_term t9_ on
t9_.id = s10_.taxonomy_term_id
and (t9_.deleted_at is null)
left join term t11_ on
t9_.term_id = t11_.id
left join facility_venue_item_price f12_ on
f5_.id = f12_.facility_id
left join venue_item_price v4_ on
f12_.venue_item_price_id = v4_.id
left join calendar_entry c13_ on
v4_.calendar_entry_id = c13_.id
where
(v0_.status = 'active'
and f5_.status = 'active')
and (v0_.deleted_at is null)
and c13_.start_at >= '2022-10-21 19:00:00' --- this slows down the query
group by
v0_.id
And here is the query plan https://explain.dalibo.com/plan/46h0fb3343e246a5.
The query plan is so big that i cannot paste it here
Plain query plan https://explain.depesz.com/s/7qnD
Plain query plan without where condition https://explain.depesz.com/s/3sK3
The query shouldn't take much time as there are not many rows in tables.
calendar_entry table has ~350000 rows
venue_item_price table has also ~320000 rows
Your WHERE condition turns all the outer joins into inner joins (because c13_.start_at cannot be NULL any more), which changes the whole game. Usually that is an advantage, because it gives the optimizer more freedom, but that seems to have worked in the wrong direction in your case. One reason for that may be that you join more than 8 tables, and the default value for join_collapse_limit is 8.
That's pretty much all I can say without a readable execution plan.
When you don't reference c13_ anywhere in the output or WHERE clause, it can just skip doing that join altogether, because it can't change anything. That is, the join is (presumably) on a unique key, so it can't cause multiple rows to be returned; and it is a left join, so it can't cause zero rows to be returned.
Once you reference it in the WHERE, however, the join has to be executed. That join only contributes about half the time, so it isn't like the query was blazingly fast to start with.
Is there any reason on an INNER JOIN to have a condition on the main table vs in the WHERE clause?
Example in INNER JOIN:
SELECT
(various columns here from each table)
FROM dbo.MainTable AS m
INNER JOIN dbo.JohnDataRecord AS jdr
ON m.ibID = jdr.ibID
AND m.MainID = #MainId -- question here
AND jdr.SentDate IS NULL
LEFT JOIN dbo.PTable AS p1
ON jdr.RecordID = p1.RecordID
LEFT JOIN dbo.DataRecipient AS dr
ON jdr.RecipientID = dr.RecipientID
(more left joins here)
WHERE
dr.lastRecordID IS NOT NULL;
Query with condition in WHERE clause:
SELECT
(various columns here from each table)
FROM dbo.MainTable AS m
INNER JOIN dbo.JohnDataRecord AS jdr
ON m.ibID = jdr.ibID
AND jdr.SentDate IS NULL
LEFT JOIN dbo.PTable AS p1
ON jdr.RecordID = p1.RecordID
LEFT JOIN dbo.DataRecipient AS dr
ON jdr.RecipientID = dr.RecipientID
(more left joins here)
WHERE
m.MainID = #MainId -- question here
AND dr.lastRecordID IS NOT NULL;
Difference in other similar questions that are more general whereas this is specific to SQL Server.
In the scope of the question,
Is there any reason on an INNER JOIN to have a condition on the main
table vs in the WHERE clause?
This is a STYLE choice for the INNER JOIN.
From a pure style reflection point of view:
While there is no hard and fast rule for STYLE, it is generally observed that this is a less often used style choice. For example that might generally lead to more challenging maintenance such as if someone where to remove the INNER JOIN and all the subsequent ON clause conditions, it would effect the primary table result set, OR make the query more difficult to debug/understand when it is a very complex set of joins.
It might also be noted that this line might be placed on many INNER JOINS further adding to the confusion.
I have the following SQL I want to create with activerecord. My problem is that I am stuck in a logic loop where I can't LEFT OUTER JOIN a table which has yet to be joined, and I can't find my entry point to the join fiasco
in activerecord I am trying to do
AdMsgs.joins("LEFT OUTER JOIN shows ON ad_msgs.user_id = shows.id OR ad_msgs.user_id = shows.b_id ")
.joins("LEFT OUTER JOIN m ON m.user_id = users.id OR m.m_id = shops.id OR m.m_id = shows.b_id")
.joins("LEFT OUTER JOIN users ON ad_msgs.to = users.email OR ad_msgs.user_id = users.id OR users.id = m.user_id")
.where("shows.id = ?", self.id)
.distinct("ad_msgs.id")
the query outputs an error saying it doesn't know what users is on the second join (probably since I haven't joined it yet) but I need to select the m records according the the users
AdMsgs doesn't have an association with neither of the tables.
Is there a way to full outer join these 3 tables and then select the ones relevant (or any better ways?)
use find_by_sql to implement such scenarios.
otherwise as a rule of thumb, if you can't use joins like this Blog.joins(articles: :comments) you are probably doing something bad or use find_by_sql instead.
in a complex case, i'm writing my query in SQL first to verify the logic involved. often times it's trivial to replace one complex query with 2 simple ones (using IN(*ids)).
I have a query in my WordPress plugin like this:
SELECT users.*, U.`meta_value` AS first_name,M.`meta_value` AS last_name
FROM `nwp_users` AS users
LEFT JOIN `nwp_usermeta` U
ON users.`ID`=U.`user_id`
LEFT JOIN `nwp_usermeta` M
ON users.`ID`=M.`user_id`
LEFT JOIN `nwp_usermeta` C
ON users.`ID`=C.`user_id`
WHERE U.meta_key = 'first_name'
AND M.meta_key = 'last_name'
AND C.meta_key = 'nwp_capabilities'
ORDER BY users.`user_login` ASC
LIMIT 0,10
I'm new to using JOIN and I'm wondering how efficient it is to use so many JOIN in one query. Is it better to split it up into multiple queries?
The database schema can be found here.
JOIN usually isn't so bad if the keys are indexed. LEFT JOIN is almost always a performance hit and you should avoid it if possible. The difference is that LEFT JOIN will join all rows in the joined table even if the column you're joining is NULL. While a regular (straight) JOIN just joins the rows that match.
Post your table structure and we can give you a better query.
See this comment:
http://forums.mysql.com/read.php?24,205080,205274#msg-205274
For what it's worth, to find out what MySQL is doing and to see if you have indexed properly, always check the EXPLAIN plan. You do this by putting EXPLAIN before your query (literally add the word EXPLAIN before the query), then run it.
In your query, you have a filter AND C.meta_key = 'nwp_capabilities' which means that all the LEFT JOINs above it can be equally written as INNER JOINs. Because if the LEFT JOINS fail (LEFT OUTER is intended to preserve the results from the left side), the result will 100% be filtered out by the WHERE clause.
So a more optimal query would be
SELECT users.*, U.`meta_value` AS first_name,M.`meta_value` AS last_name
FROM `nwp_users` AS users
JOIN `nwp_usermeta` U
ON users.`ID`=U.`user_id`
JOIN `nwp_usermeta` M
ON users.`ID`=M.`user_id`
JOIN `nwp_usermeta` C
ON users.`ID`=C.`user_id`
WHERE U.meta_key = 'first_name'
AND M.meta_key = 'last_name'
AND C.meta_key = 'nwp_capabilities'
ORDER BY users.`user_login` ASC
LIMIT 0,10
(note: "JOIN" (alone) = "INNER JOIN")
Try explaining the query to see what is going on and if your select if optimized. If you haven't used explain before read some tutorials:
http://www.learn-mysql-tutorial.com/OptimizeQueries.cfm
http://www.databasejournal.com/features/mysql/article.php/1382791/Optimizing-MySQL-Queries-and-Indexes.htm
I've got a pretty complex query in MySQL that slows down drastically when one of the joins is done using an OR. How can I speed this up? the relevant join is:
LEFT OUTER JOIN publications p ON p.id = virtual_performances.publication_id
OR p.shoot_id = shoots.id
Removing either condition in the OR decreases the query time from 1.5s to 0.1s. There are already indexes on all the relevant columns I can think of. Any ideas? The columns in use all have indexes on them. Using EXPLAIN I've discovered that once the OR comes into play MySQL ends up not using any of the indexes. Is there a special kind of index I can make that it will use?
This is a common difficulty with MySQL. Using OR baffles the optimizer because it doesn't know how to use an index to find a row where either condition is true.
I'll try to explain: Suppose I ask you to search a telephone book and find every person whose last name is 'Thomas' OR whose first name is 'Thomas'. Even though the telephone book is essentially an index, you don't benefit from it -- you have to search through page by page because it's not sorted by first name.
Keep in mind that in MySQL, any instance of a table in a given query can make use of only one index, even if you have defined multiple indexes in that table. A different query on that same table may use another index if the optimizer reasons that it's more helpful.
One technique people have used to help in situations like your is to do a UNION of two simpler queries that each make use of separate indexes:
SELECT ...
FROM virtual_performances v
JOIN shoots s ON (...)
LEFT OUTER JOIN publications p ON (p.id = v.publication_id)
UNION ALL
SELECT ...
FROM virtual_performances v
JOIN shoots s ON (...)
LEFT OUTER JOIN publications p ON p.shoot_id = s.id;
Make two joins on the same table (adding aliases to separate them) for the two conditions, and see if that is faster.
select ..., coalesce(p1.field, p2.field) as field
from ...
left join publications p1 on p1.id = virtual_performances.publication_id
left join publications p2 on p2.shoot_id = shoots.id
You can also try something like this on for size:
SELECT * FROM tablename WHERE id IN
(SELECT p.id FROM tablename LEFT OUTER JOIN publications p ON p.id IN virtual_performances.publication_id)
OR
p.id IN
(SELECT p.id FROM tablename LEFT OUTER JOIN publications p ON p.shoot_id = shoots.id);
It's a bit messier, and won't be faster in every case, but MySQL is good at selecting from straight data sets, so repeating yourself isn't so bad.