Optimizing for an OR in a Join in MySQL - sql

I've got a pretty complex query in MySQL that slows down drastically when one of the joins is done using an OR. How can I speed this up? the relevant join is:
LEFT OUTER JOIN publications p ON p.id = virtual_performances.publication_id
OR p.shoot_id = shoots.id
Removing either condition in the OR decreases the query time from 1.5s to 0.1s. There are already indexes on all the relevant columns I can think of. Any ideas? The columns in use all have indexes on them. Using EXPLAIN I've discovered that once the OR comes into play MySQL ends up not using any of the indexes. Is there a special kind of index I can make that it will use?

This is a common difficulty with MySQL. Using OR baffles the optimizer because it doesn't know how to use an index to find a row where either condition is true.
I'll try to explain: Suppose I ask you to search a telephone book and find every person whose last name is 'Thomas' OR whose first name is 'Thomas'. Even though the telephone book is essentially an index, you don't benefit from it -- you have to search through page by page because it's not sorted by first name.
Keep in mind that in MySQL, any instance of a table in a given query can make use of only one index, even if you have defined multiple indexes in that table. A different query on that same table may use another index if the optimizer reasons that it's more helpful.
One technique people have used to help in situations like your is to do a UNION of two simpler queries that each make use of separate indexes:
SELECT ...
FROM virtual_performances v
JOIN shoots s ON (...)
LEFT OUTER JOIN publications p ON (p.id = v.publication_id)
UNION ALL
SELECT ...
FROM virtual_performances v
JOIN shoots s ON (...)
LEFT OUTER JOIN publications p ON p.shoot_id = s.id;

Make two joins on the same table (adding aliases to separate them) for the two conditions, and see if that is faster.
select ..., coalesce(p1.field, p2.field) as field
from ...
left join publications p1 on p1.id = virtual_performances.publication_id
left join publications p2 on p2.shoot_id = shoots.id

You can also try something like this on for size:
SELECT * FROM tablename WHERE id IN
(SELECT p.id FROM tablename LEFT OUTER JOIN publications p ON p.id IN virtual_performances.publication_id)
OR
p.id IN
(SELECT p.id FROM tablename LEFT OUTER JOIN publications p ON p.shoot_id = shoots.id);
It's a bit messier, and won't be faster in every case, but MySQL is good at selecting from straight data sets, so repeating yourself isn't so bad.

Related

Access SQL subquery access field from parent

I have a SQL query that works in Access 2016:
SELECT
Count(*) AS total_tests,
Sum(IIf(score>=securing_threshold And score<mastering_threshold,1,0)) AS total_securing,
Sum(IIf(score>=mastering_threshold,1,0)) AS total_mastering,
total_securing/Count(*) AS percent_securing,
total_mastering/Count(*) AS percent_mastering,
(Count(*)-total_securing-total_mastering)/Count(*) AS percent_below,
subjects.subject,
students.year_entered,
IIf(Month(Date())<9,Year(Date())-students.year_entered+6,Year(Date())-students.year_entered+7) AS current_form,
groups.group
FROM
((subjects
INNER JOIN tests ON subjects.ID = tests.subject)
INNER JOIN (students
INNER JOIN test_results ON students.ID = test_results.student) ON tests.ID = test_results.test)
LEFT JOIN
(SELECT * FROM group_membership LEFT JOIN groups ON group_membership.group = groups.ID) As g
ON students.ID = g.student
GROUP BY subjects.subject, students.year_entered, groups.group;
However, I wish to filter out irrelevant groups before joining them to my table. The table groups has a column subject which is a foreign key.
When I try changing ON students.ID = g.student to ON students.ID = g.student And subjects.ID = g.subject I get the error 'JOIN expression not supported'.
Alternatively, when I try adding WHERE subjects.ID = groups.subject to the subquery, it asks me for the parameter value of subjects.ID, although it is a column in the parent query.
Googling reveals many similar errors but they were all resolved by changing the brackets. That didn't help.
Just in case the table relationships help:
Thank you.
EDIT: Sample database at https://www.dropbox.com/s/yh80oooem6gsni7/student%20tracker.ACCDB?dl=0
MS Access queries with many joins are difficult to update by SQL alone as parenetheses pairings are required unlike other RDBMS's and these pairings must follow an order. Moreover, some pairings can even be nested. Hence, for beginners it is advised to build queries with complex and many joins using the query design GUI in the MS Office application and let it build out the SQL.
For a simple filter on the g derived table, you could filter subject on the derived table, g, but likely you want want all subjects:
...
(SELECT * FROM group_membership
LEFT JOIN groups ON group_membership.group = groups.ID
WHERE groups.subject='Earth Science') As g
...
So for all subjects, consider re-building query from scratch in GUI that nearly mirrors your table relationships which actually auto-links joins in the GUI. Then, drop unneeded tables.
Usually you want to begin with the join table or set like groups and group_membership or tests and test_results. In fact, consider saving the g derived table as its own query.
Then add the distinct record primary source tables like students and subjects.
You may even need to play around with order in FROM and JOIN clauses to attain desired results, and maybe even add the same table in query. And be careful with adding join tables like group_membership (two one-to-many links), to GROUP BY queries as it leads to the duplicate record aggregation. So you may need to join aggregates queries by subject.
Unless you can post content of all tables, from our perspective it is difficult to help from here.
Your subquery g uses a LEFT JOIN, but there is a enforced 1:n relation between the two tables, so there will always be a matching group. Use a INNER JOIN instead.
With g.subject you are trying to join on a column that is on the right side of a left join, that cannot really work.
Also you shouldn't use SELECT * on a join of tables with identical column names. Include only the qualified column names that you need.
LEFT JOIN
(SELECT group_membership.student, groups.group, groups.subject
FROM group_membership INNER JOIN groups
ON group_membership.group = groups.ID) As g
ON (students.ID = g.student AND subjects.ID = g.subject)
I would call the columns in group_membership group_ID and student_ID to avoid confusion.
I don't have the database to test, but I would use subject table as subquery:
(SELECT * FROM subject WHERE filter out what you don't need) Subj
Then INNER JOIN this new Subj Table in your query which would exclude irrelevant groups.
Also I would never create join in WHERE clause (WHERE subjects.ID = groups.subject), what this does it creates cartesian product (table with all the possible combinations of subjects.ID and groups.subject) then it filters out records to satisfy your join. When dealing with huge data it might take forever or crash.
Error related to "Join expression may not be supported"; do datatypes match in those fields?
I solved it by (a lot of trial and error and) taking the advice here to make queries in the GUI and joining them. The end result is 4 queries deep! If the database was bigger, performance would be awful... but now its working I can tweak it eventually.
Thank you everybody!

Sub query select statment vs inner join

I'm confused about these two statements who's faster and more common to use and best for memory
select p.id, p.name, w.id, w.name
from person p
inner join work w on w.id = p.wid
where p.id in (somenumbers)
vs
select p.id, p.name, (select id from work where id=p.wid) , (select name from work where id=p.wid)
from person p
where p.id in (somenumbers)
The whole idea of this is that if I have I huge database and I want to make inner join it will take memory and less performance to johin work table and person table but the sub query select statments it will only select one statment at the time so which is the best here
First, the two queries are not the same. The first filters out any rows that have no matching rows in work.
The equivalent first query uses a left join:
select p.id, p.name, w.id, w.name
from person p left join
work w
on w.id = p.wid
where p.id in (somenumbers);
Then, the second query can be simplified to:
select p.id, p.name, p.wid,
(select name from work where w.id = p.wid)
from person p
where p.id in (somenumbers);
There is no reason to look up the id in work when it is already present in person.
If you want optimized queries, then you want indexes on person(id, wid, name) and work(id, name).
With these indexes, the two queries should have basically the same performance. The subquery will use the index on work for fetching the rows from work and the where clause will use the index on person. Either query should be fast and scalable.
The subqueries in your second example will execute once for every row, which will perform badly. That said, some optimizers may be able to convert it to a join for you - YMMV.
A good rule to follow in general is: much prefer joins to subqueries.
joins give better performance as comparison with sub-query .if there is join on Int column or have index on join column gives best performance .
select p.id, p.name, w.id, w.name
from person p
inner join work w on w.id = p.wid
where p.id in (somenumbers)
It really depends on how you want to optimaze the query (includie but not limited to add/removing/reordering the index),
I found the setup which makes join soars might let subquery suffer, the opposite may also be true. Thus there is not that much point to compare them with the same setup.
I choose to use and optimize with join. In my experince join at its best condition setup, rarely loses to subquery, but a lot eaiser to read.
When the vendor stuff an extreme load of queries with subqueries to the system. Unless the performance start to crawl, due to my other work's query optimization, it simply doesn't worth the effort to change them.

Excluding rows from result set, LEFT JOIN and EXCEPT

When you have two tables, and want to exclude rows from the second one, there are a multitude of options including EXISTS, NOT IN, LEFT JOIN and EXCEPT.
I've always used left join:
select N.ProductID from NewProducts N
left join Products P on P.ProductID = N.ProductID
where P.ProductID is null
Now I'm thinking it's cleaner to to use EXCEPT:
select ProductID from NewProducts
except
select ProductID from Products
Are there performance issues of using EXCEPT?
You can check execution plan and SQL profiler to choose the suitable query.
But, for me, NOT EXISTS is good. Reference here
The answer to your question is all up to you, depending on how large the data.
You can use any of that (EXISTS, NOT IN, LEFT JOIN and EXCEPT.) depending on your requirement.
you said that you always use LEFT JOIN , and that is good.. because joining the two tables will minimize the execution time of the query, especially when you are holding large amount of data.
JOIN is advisable but it is always depends on you.
You can see the difference of execution time using the execution plan of sql.

SQL Joins on varchar fields timing out

I have a join which deletes rows that match another table but the joining fields have to be a large varchar (250 chars). I know this isn't ideal but I can't think of a better way. Here's my query:
DELETE P
FROM dbo.FeedPhotos AS P
INNER JOIN dbo.ListingPhotos AS P1 ON P.photo = P1.feedImage
INNER JOIN dbo.Listings AS L ON P.accountID = L.accountID
WHERE P.feedID = #feedID
This query is constantly timing out even though there are less than 1000 rows in the ListingPhotos table.
Any help would be appreciated.
I'd probably start by removing this line, as it doesn't seem to be doing anything:
INNER JOIN dbo.Listings AS L ON P.accountID = L.accountID
There might not be a lot of rows in ListingPhotos, but if there are a lot of rows in Listings then the join won't be optimized out.
Also check your indexing, as any join is bound to be slow without the appropriate indexes. Although you should generally try to avoid joining on character fields anyway, it's usually a sign that the data is not normalized properly.
I would consider:
rewriting to use EXISTS. This will stop processing if one row is found more reliably then relying on JOIN which may have many more intermediate rows (which is what Aaronaught said)
ensure all datatypes match exactly. All differences in length or type will mean no indexes will be used
speaking of which, do you have an index (rough guess) on feedid, photo and accountid?
Something like:
DELETE
P
FROM
dbo.FeedPhotos AS P
WHERE
P.feedID = #feedID
AND
EXISTS (SELECT * FROM
dbo.ListingPhotos P1
WHERE P.photo = P1.feedImage)
AND
EXISTS (SELECT * FROM
dbo.Listings L
WHERE P.accountID = L.accountID)
Simply add an index.
CREATE INDEX idx_feedPhotos_feedid
ON dbo.FeedPhotos (feedId)

Does "Right Outer Join" have any useful purpose?

I use INNER JOIN and LEFT OUTER JOINs all the time. However, I never seem to need RIGHT OUTER JOINs, ever.
I've seen plenty of nasty auto-generated SQL that uses right joins, but to me, that code is impossible to get my head around. I always need to rewrite it using inner and left joins to make heads or tails of it.
Does anyone actually write queries using Right joins?
In SQL Server one edge case where I have found right joins useful is when used in conjunction with join hints.
The following queries have the same semantics but differ in which table is used as the build input for the hash table (it would be more efficient to build the hash table from the smaller input than the larger one which the right join syntax achieves)
SELECT #Large.X
FROM #Small
RIGHT HASH JOIN #Large ON #Small.X = #Large.X
WHERE #Small.X IS NULL
SELECT #Large.X
FROM #Large
LEFT HASH JOIN #Small ON #Small.X = #Large.X
WHERE #Small.X IS NULL
Aside from that (product specific) edge case there are other general examples where a RIGHT JOIN may be useful.
Suppose that there are three tables for People, Pets, and Pet Accessories. People may optionally have pets and these pets may optionally have accessories
CREATE TABLE Persons
(
PersonName VARCHAR(10) PRIMARY KEY
);
INSERT INTO Persons
VALUES ('Alice'),
('Bob'),
('Charles');
CREATE TABLE Pets
(
PetName VARCHAR(10) PRIMARY KEY,
PersonName VARCHAR(10)
);
INSERT INTO Pets
VALUES ('Rover',
'Alice'),
('Lassie',
'Alice'),
('Fifi',
'Charles');
CREATE TABLE PetAccessories
(
AccessoryName VARCHAR(10) PRIMARY KEY,
PetName VARCHAR(10)
);
INSERT INTO PetAccessories
VALUES ('Ball', 'Rover'),
('Bone', 'Rover'),
('Mouse','Fifi');
If the requirement is to get a result listing all people irrespective of whether or not they own a pet and information about any pets they own that also have accessories.
This doesn't work (Excludes Bob)
SELECT P.PersonName,
Pt.PetName,
Pa.AccessoryName
FROM Persons P
LEFT JOIN Pets Pt
ON P.PersonName = Pt.PersonName
INNER JOIN PetAccessories Pa
ON Pt.PetName = Pa.PetName;
This doesn't work (Includes Lassie)
SELECT P.PersonName,
Pt.PetName,
Pa.AccessoryName
FROM Persons P
LEFT JOIN Pets Pt
ON P.PersonName = Pt.PersonName
LEFT JOIN PetAccessories Pa
ON Pt.PetName = Pa.PetName;
This does work (but the syntax is much less commonly understood as it requires two ON clauses in succession to achieve the desired logical join order)
SELECT P.PersonName,
Pt.PetName,
Pa.AccessoryName
FROM Persons P
LEFT JOIN Pets Pt
INNER JOIN PetAccessories Pa
ON Pt.PetName = Pa.PetName
ON P.PersonName = Pt.PersonName;
All in all probably easiest to use a RIGHT JOIN
SELECT P.PersonName,
Pt.PetName,
Pa.AccessoryName
FROM Pets Pt
JOIN PetAccessories Pa
ON Pt.PetName = Pa.PetName
RIGHT JOIN Persons P
ON P.PersonName = Pt.PersonName;
Though if determined to avoid this another option would be to introduce a derived table that can be left joined to
SELECT P.PersonName,
T.PetName,
T.AccessoryName
FROM Persons P
LEFT JOIN (SELECT Pt.PetName,
Pa.AccessoryName,
Pt.PersonName
FROM Pets Pt
JOIN PetAccessories Pa
ON Pt.PetName = Pa.PetName) T
ON T.PersonName = P.PersonName;
SQL Fiddles: MySQL, PostgreSQL, SQL Server
It depends on what side of the join you put each table.
If you want to return all rows from the left table, even if there are no matches in the right table... you use left join.
If you want to return all rows from the right table, even if there are no matches in the left table, you use right join.
Interestingly enough, I rarely used right joins.
You usually use RIGHT OUTER JOINS to find orphan items in other tables.
No, I don't for the simple reason I can accomplish everything with inner or left joins.
I only use left, but let me say they are really the same depending how how you order things. I worked with some people that only used right, becasue they built queries from the inside out and liked to keep their main items at the bottom thus in their minds it made sense to only use right.
I.e.
Got main thing here
need more junk
More Junk right outer join Main Stuff
I prefer to do main stuff then junk... So left outer works for me.
So whatever floats your boat.
The only time I use a Right outer join is when I am working on an existing query and need to change it (normally from an inner). I could reverse the join and make it a left and probably be ok, but I try and reduce the amount of things I change when I modify code.
Our standard practice here is to write everything in terms of LEFT JOINs if possible. We've occasionally used FULL OUTER JOINs if we've needed to, but never RIGHT JOINs.
You can accomplish the same thing using LEFT or RIGHT joins. Generally most people think in terms of a LEFT join probably because we read from left to right. It really comes down to being consistent. Your team should focus on using either LEFT or RIGHT joins, not both, as they are essentially the same exact thing, written differently.
Rarely, as stated you can usually reorder and use a left join. Also I naturally tend to order the data, so that left joins work for getting the data I require. I think the same can be said of full outer and cross joins, most people tend to stay away from them.