I was told in an interview that a right join is typically faster than a left join.
Is this true?
That depends on the RDBMS, of course, but in general there is no reason for that to be true. A right join can easily be rewritten to a left join automatically, so if it were true, even a primitive query optimizer could do that transformation.
Semantically, you don't get to pick anyway: correctness dictates which side's rows must be preserved.
There is one case where this is generally true, though. When you have a data warehouse style query like this:
select aggregates...
from Facts
left join Dim1 on ...
left join Dim2 on ...
left join Dim3 on ...
left join Dim4 on ...
group by ...
You want to get a hash join plan with physical right joins. Left joins would use the huge Facts table to build a hash table, which is terrible. You would rather build small hash tables from the dimension inputs and then stream the huge Facts table through them, probing each one.
Of course all good query optimizers do that for you (at least in databases that are meant for DW use).
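To make the build/probe idea concrete, here is a minimal sketch in Python rather than a real executor. The table contents, the column names, and the helpers `build` and `probe_left_outer` are all made up for illustration; an actual database does this inside its hash join operator.

```python
# Sketch of a hash join for a star-schema query: build on the small
# dimension tables, probe with the large fact table.
# All rows and names here are hypothetical.

facts = [  # imagine millions of rows here
    {"product_id": 1, "store_id": 10, "amount": 5.0},
    {"product_id": 2, "store_id": 10, "amount": 7.5},
    {"product_id": 1, "store_id": 99, "amount": 2.0},  # store 99 has no dimension row
]
dim_product = [
    {"product_id": 1, "category": "toys"},
    {"product_id": 2, "category": "food"},
]
dim_store = [{"store_id": 10, "region": "north"}]


def build(rows, key):
    """Build phase: one small hash table per dimension (keys assumed unique)."""
    return {row[key]: row for row in rows}


def probe_left_outer(fact_rows, hash_tables):
    """Probe phase: stream the fact rows once, keeping rows without a match."""
    for fact in fact_rows:
        out = dict(fact)
        for key, (table, columns) in hash_tables.items():
            match = table.get(fact[key])
            for column in columns:
                out[column] = match[column] if match else None
        yield out


hash_tables = {
    "product_id": (build(dim_product, "product_id"), ["category"]),
    "store_id": (build(dim_store, "store_id"), ["region"]),
}

# Equivalent of: select region, sum(amount) ... group by region
totals = {}
for row in probe_left_outer(facts, hash_tables):
    totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]

print(totals)
```

Only the two small dictionaries are held in memory; the fact rows are read once and never hashed, which is the whole point of putting the big table on the probe side.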
I have a collection which can be of several types: Cat, Dog, Bird.
If it is a Cat then I need to join with Cat-related tables, same with Dog and Bird.
I end up with quite a lot of LEFT JOINs, and when the tables have lots of records performance suffers.
SELECT Animal.*, CatDetail1.*, CatDetail2.*, DogDetail1.*, DogDetail2.*, BirdDetail1.*, BirdDetail2.*
FROM Animal
LEFT JOIN CatDetail1 on CatDetail1.id = Animal.id
LEFT JOIN CatDetail2 on CatDetail2.id = Animal.id
LEFT JOIN DogDetail1 on DogDetail1.id = Animal.id
LEFT JOIN DogDetail2 on DogDetail2.id = Animal.id
LEFT JOIN BirdDetail1 on BirdDetail1.id = Animal.id
LEFT JOIN BirdDetail2 on BirdDetail2.id = Animal.id
ORDER BY Animal.sequence
I was thinking a View might make it run faster but there is no official documentation supporting that.
Is there a way to reduce the LEFT JOIN, and use more INNER JOIN to improve performance?
If you need an outer join, you cannot use an inner join. And you clearly need an outer join here.
There is no better way to write that query, and there is no substantially different way to model a situation where you want different types of objects stored in one table.
If that query is slow, that is not surprising. After all, you want everything from seven tables, and want the result sorted. If the tables are large, that will take a while.
A view won't make any difference here, since a view is just a named SQL statement, and the view name will be replaced by its definition when the query is executed.
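As a small illustration of "a view is just a named SQL statement", the sketch below uses SQLite through Python's sqlite3 module as a stand-in for whatever RDBMS you run. The `claws` column and the view name `AnimalView` are hypothetical, and only one detail table is included to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Animal (id INTEGER PRIMARY KEY, sequence INTEGER);
    CREATE TABLE CatDetail1 (id INTEGER PRIMARY KEY, claws TEXT);
    INSERT INTO Animal VALUES (1, 20), (2, 10);
    INSERT INTO CatDetail1 VALUES (1, 'sharp');

    -- The view stores the query text, not the rows.
    CREATE VIEW AnimalView AS
        SELECT Animal.id, Animal.sequence, CatDetail1.claws
        FROM Animal
        LEFT JOIN CatDetail1 ON CatDetail1.id = Animal.id;
""")

direct = conn.execute("""
    SELECT Animal.id, Animal.sequence, CatDetail1.claws
    FROM Animal
    LEFT JOIN CatDetail1 ON CatDetail1.id = Animal.id
    ORDER BY Animal.sequence
""").fetchall()

via_view = conn.execute(
    "SELECT * FROM AnimalView ORDER BY sequence"
).fetchall()

print(direct == via_view)
```

Both statements do the same join work at query time, so wrapping the joins in a view changes how the query reads, not how much work it is.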
The question you should ask yourself is if you really need everything from seven tables. Perhaps you don't need all columns? Perhaps you don't need all rows?
How to optimize the speed of SQL queries looking like this:
select ... from TABLE
left join TABLE2 on TABLE2.COL2 = TABLE.COL
left join TABLE3 on TABLE3.COL2 = TABLE2.COL
etc.
I am asking from a SQL (precisely Postgres) point of view, e.g.: does the order of the joins matter? Do subqueries or CTE help? Does the type of join matter?
I am not asking from a database implementation point of view, e.g. indexes, tablespaces, configuration variables, etc.
In theory the order of the joins should not matter, since the built-in query optimizer should put the joins that restrict the result set the most before those that have less effect on its volume.
However, in practice I have learned that it is always best to help the optimizer as much as you can and put the more restrictive joins before the less restrictive ones.
So, generally speaking, the less you rely on the query optimizer, the better the performance will be in the edge cases.
Here you can learn more about the query optimizer: http://www.postgresql.org/docs/9.1/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-GEQO
As a rule of thumb, using a join should be faster than a CTE or a subquery, but this is just a rule and exceptions are still possible.
Also, some problems need both joins and CTEs.
This is the killer question: does the type of join matter?
Yes it does! Actually this matters most of all! :)
Here you can see the idea behind the different join types: http://en.wikipedia.org/wiki/Join_(SQL)
For left and right joins, these two statements are equivalent:
... table1 LEFT JOIN table2 ...
... table2 RIGHT JOIN table1 ...
Right and left outer joins are functionally equivalent. Neither provides any functionality that the other does not, so right and left outer joins may replace each other as long as the table order is switched.
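If you want to convince yourself of the equivalence, here is a toy model in Python. The functions `left_join` and `right_join` are hypothetical helpers written independently of each other for this sketch (they are not how a database implements joins), and the rows are invented.

```python
def left_join(left, right, key):
    """Keep every row of `left`; pad right-hand columns with None if unmatched."""
    pad = {c: None for row in right for c in row if c != key}
    out = []
    for l_row in left:
        matches = [r for r in right if r[key] == l_row[key]]
        for r_row in matches or [pad]:
            out.append({**r_row, **l_row})
    return out


def right_join(left, right, key):
    """Keep every row of `right`; pad left-hand columns with None if unmatched."""
    pad = {c: None for row in left for c in row if c != key}
    out = []
    for r_row in right:
        matches = [l for l in left if l[key] == r_row[key]]
        for l_row in matches or [pad]:
            out.append({**l_row, **r_row})
    return out


table1 = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
table2 = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}]

# ... table1 LEFT JOIN table2 ...   versus   ... table2 RIGHT JOIN table1 ...
print(left_join(table1, table2, "id") == right_join(table2, table1, "id"))
```

Rows are dictionaries here, so column order does not enter the comparison; in SQL the only visible difference with `SELECT *` would be the order of the columns.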
Which is better: joining a table, or selecting from multiple tables?
For instance, let's assume the following scenario:
Using join:
SELECT COALESCE(SUM(SALARY),0) FROM X
JOIN Y ON X.X_ID=Y.Y_X_ID
OR
By selecting from multiple tables
SELECT COALESCE(SUM(SALARY),0) FROM X, Y
WHERE X.X_ID=Y.Y_X_ID
Both are joins. The first is an explicit join and the second one is an implicit join and is a SQL antipattern.
The second one is bad because it is easy to get an accidental cross join. It is also bad because, when you do want a cross join, it is not clear whether you intended it or have an accidental one.
Further, in the second style, if you need to convert to an outer join, you need to change all joins in the query or risk getting incorrect results. So the second style is harder to maintain.
Explicit joins were instituted in the last century; why anyone is still using error-prone and hard-to-maintain implicit joins is beyond me.
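Here is a quick way to see the accidental cross join happen, sketched with SQLite via Python's sqlite3 module. The table layout is an assumption (the question does not say which table holds SALARY, so it is put in Y here) and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (X_ID INTEGER PRIMARY KEY);
    CREATE TABLE Y (Y_X_ID INTEGER, SALARY INTEGER);
    INSERT INTO X VALUES (1), (2);
    INSERT INTO Y VALUES (1, 100), (2, 200);
""")

# Explicit join: the join condition lives in the ON clause.
explicit = conn.execute(
    "SELECT COALESCE(SUM(SALARY), 0) FROM X JOIN Y ON X.X_ID = Y.Y_X_ID"
).fetchone()[0]

# Implicit join: same result as long as the WHERE predicate is present.
implicit = conn.execute(
    "SELECT COALESCE(SUM(SALARY), 0) FROM X, Y WHERE X.X_ID = Y.Y_X_ID"
).fetchone()[0]

# Forget the predicate and the query still runs, but as a cross join:
# every X row pairs with every Y row, so each salary is counted twice.
accidental = conn.execute(
    "SELECT COALESCE(SUM(SALARY), 0) FROM X, Y"
).fetchone()[0]

print(explicit, implicit, accidental)
```

The third query raises no error and returns a plausible-looking number, which is exactly why the mistake is easy to miss.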
Mainly, a join is used to retrieve data from multiple tables.
In SQL the following types of join are available:
Equi join
    Inner join
    Outer join (left, right, full)
Non-equi join
Self join
Cross join
You should use the JOIN syntax for a lot of reasons which can be found here.
Moreover, this syntax has the advantage of giving some hints to the query optimizer (when weights are computed, those derived directly from the facts mentioned in this syntax are weighted more favorably than the others).
JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN?
I'm guessing the size of the datasets on each side of the join may make LEFT vs RIGHT a hard call, but how do the others compare?
Also am I correct in assuming JOIN & INNER JOIN are one and the same? If not, how does this fit into the order/ranking.
Yes, JOIN and INNER JOIN are the same. In general the ranking is JOIN is fastest, followed closely by LEFT JOIN which is equivalent to RIGHT JOIN, and then followed very far in the distance by FULL JOIN.
But this ranking is so variable that it can be largely ignored. Your actual performance is highly dependent upon the size of the datasets, availability of proper indexes, and exact query plan chosen. One LEFT JOIN may be fast and the next INNER JOIN might be glacially slow.
That notwithstanding, I would advise avoiding FULL JOIN unless you absolutely need it. (At least in Oracle, which is where I've had bad experiences with it.)
INNER is an optional word when INNER JOIN is desired => so they are one and the same. This is the same as the word OUTER being optional in LEFT/RIGHT/FULL OUTER JOIN
In terms of efficiency, it completely depends on what else is happening. If it is a LEFT JOIN with an IS NULL test on the right side (anti-semi join), then it is very efficient and works like a NOT EXISTS clause.
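For reference, the anti-semi join pattern is a LEFT JOIN whose right side is then tested for NULL, and it returns the same rows as NOT EXISTS. A minimal sketch with SQLite via Python's sqlite3 module; the tables A and B and their columns are hypothetical, and whether the two forms get the same plan depends on your RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY);
    CREATE TABLE B (a_id INTEGER);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (1), (1), (3);
""")

# Rows of A that have no partner in B, written as an outer join.
via_left_join = conn.execute("""
    SELECT A.id
    FROM A
    LEFT JOIN B ON B.a_id = A.id
    WHERE B.a_id IS NULL
    ORDER BY A.id
""").fetchall()

# The same question asked with NOT EXISTS.
via_not_exists = conn.execute("""
    SELECT A.id
    FROM A
    WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.a_id = A.id)
    ORDER BY A.id
""").fetchall()

print(via_left_join, via_not_exists)
```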
Absent other factors, and considering only
SELECT .. FROM A X-JOIN B ON <condition>
1. If results need to be preserved from A, B, or both, then efficiency is not a factor. You need a LEFT/RIGHT/FULL join because it provides the correct results.
2. If you need results that match on both sides, and not all data is available from either side, then the same applies: you need an INNER JOIN.
3. Only if the join is bound to find rows on both sides does a LEFT/RIGHT/FULL join become an option. In most cases, the INNER JOIN will be faster because it gives the optimizer the option to start from the smaller (or better indexed) table and hash match to the larger table.
"In most cases" in point 3 because different RDBMSs may optimize queries differently.
Ranking them for efficiency would be pointless, as they return different results. If you need a left join, an inner join won't do the job.
Efficiency in a join has more to do with the size of the tables, the indexing, and how the rest of the query is written than with whether it is an INNER, OUTER, CROSS or FULL JOIN. A CROSS JOIN on two small tables might be fast, but an INNER JOIN on two large tables with a WHERE clause that is not sargable would not be.
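In case "sargable" is unfamiliar: a predicate is sargable when the engine can use an index to search for it. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a stand-in; the `people` table, its index, and the `plan` helper are hypothetical, and the exact plan wording differs between SQLite versions and between database products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE INDEX idx_people_name ON people (name);
""")


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Sargable: the bare column is compared, so the index can be searched.
sargable = plan("SELECT * FROM people WHERE name = 'Ann'")

# Not sargable: wrapping the column in a function hides it from the index.
not_sargable = plan("SELECT * FROM people WHERE upper(name) = 'ANN'")

print(sargable)
print(not_sargable)
```

The first plan should report a search using `idx_people_name`, while the second falls back to scanning the table, which is the part that hurts on large inputs regardless of join type.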
I'm doing some refactoring of legacy code I've found in a project. This is for MSSQL. The thing is, I can't understand why we're using mixed left and right joins and collating some of the joining conditions together.
My question is this: doesn't this create implicit inner joins in some places and implicit full joins in others?
I'm of the school that just about anything can be written using just left (and inner/full) or just right (and inner/full), but that's because I like to keep things simple where possible.
As an aside, we convert all this stuff to work on Oracle databases as well, so maybe there are some optimization rules that work differently with Oracle?
For instance, here's the FROM part of one of the queries:
FROM Table1
RIGHT OUTER JOIN Table2
ON Table1.T2FK = Table2.T2PK
LEFT OUTER JOIN Table3
RIGHT OUTER JOIN Table4
LEFT OUTER JOIN Table5
ON Table4.T3FK = Table5.T3FK
AND Table4.T2FK = Table5.T2FK
LEFT OUTER JOIN Table6
RIGHT OUTER JOIN Table7
ON Table6.T6PK = Table7.T6FK
LEFT OUTER JOIN Table8
RIGHT OUTER JOIN Table9
ON Table8.T8PK= Table9.T8FK
ON Table7.T9FK= Table9.T9PK
ON Table4.T7FK= Table7.T7PK
ON Table3.T3PK= Table4.T3PK
RIGHT OUTER JOIN ( SELECT *
FROM TableA
WHERE ( TableA.PK = #PK )
AND ( TableA.Date BETWEEN #StartDate
AND #EndDate )
) Table10
ON Table4.T4PK= Table10.T4FK
ON Table2.T2PK = Table4.T2PK
One thing I would do is make sure you know what results you are expecting before messing with this. Wouldn't want to "fix" it and have different results returned. Although honestly, with a query that poorly designed, I'm not sure that you are actually getting correct results right now.
To me this looks like something that someone did over time maybe even originally starting with inner joins, realizing they wouldn't work and changing to outer joins but not wanting to bother changing the order the tables were referenced in the query.
Of particular concern to me for maintenance purposes is to put the ON clauses next to the tables you are joining as well as converting all the joins to left joins rather than mixing right and left joins. Having the ON clause for table 4 and table 3 down next to table 9 makes no sense at all to me and should contribute to confusion as to what the query should actually return. You may also need to change the order of the joins in order to convert to all left joins. Personally I prefer to start with the main table that the others will join to (which appears to be table2) and then work down the food chain from there.
It could probably be converted to use all LEFT joins: I'd look at moving the right-hand table in each RIGHT join above all the existing LEFTs, and then you might be able to turn every RIGHT join into a LEFT join. I'm not sure you'll get any FULL joins behind the scenes -- if the query looks like it is, it might be a quirk of this specific query rather than a SQL Server "rule": the query you've provided does seem to be mixing it up in a rather confusing way.
As for Oracle optimisation -- that's certainly possible. I have no experience of Oracle myself, but according to a friend who's knowledgeable in this area, Oracle (no idea what version) is/was fussy about the order of predicates. For example, with SQL Server you can write your WHERE clause so that columns are in any order and indexes will get used, but with Oracle you end up having to specify the columns in the order they appear in the index to get the best performance from the index. As stated, I have no idea if this is the case with newer Oracle versions, but it was the case with older ones (apparently).
Whether this explains this particular construction, I can't say. It could simply be less-than-optimal code that has changed over the years, and a clean-up is what it's begging for.
LEFT and RIGHT joins are pure syntactic sugar.
Any LEFT JOIN can be transformed into a RIGHT JOIN merely by switching the sets.
Pre-9i Oracle used this construct:
WHERE table1.col(+) = table2.col
, (+) here denoting the nullable column, and LEFT and RIGHT joins could be emulated by mere switching:
WHERE table1.col = table2.col(+)
In MySQL, there is no FULL OUTER JOIN and it needs to be emulated.
Usually it is done this way:
SELECT *
FROM table1
LEFT JOIN
table2
ON table1.col = table2.col
UNION ALL
SELECT *
FROM table1
RIGHT JOIN
table2
ON table1.col = table2.col
WHERE table1.col IS NULL
, and it's more convenient to copy the JOIN and replace LEFT with RIGHT, than to swap the tables.
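A runnable version of this emulation, sketched with SQLite via Python's sqlite3 module and invented rows. One deliberate difference from the MySQL query above: the second branch is written as a LEFT JOIN with the tables swapped instead of a RIGHT JOIN, because older SQLite builds do not accept RIGHT JOIN. It selects the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col INTEGER);
    CREATE TABLE table2 (col INTEGER);
    INSERT INTO table1 VALUES (1), (2);
    INSERT INTO table2 VALUES (2), (3);
""")

rows = conn.execute("""
    -- Branch 1: everything from table1, matched or not.
    SELECT table1.col, table2.col
    FROM table1
    LEFT JOIN table2 ON table1.col = table2.col
    UNION ALL
    -- Branch 2: only the table2 rows that found no partner in table1.
    SELECT table1.col, table2.col
    FROM table2
    LEFT JOIN table1 ON table1.col = table2.col
    WHERE table1.col IS NULL
""").fetchall()

print(rows)
```

The IS NULL filter in the second branch is what keeps matched rows from appearing twice; without it, UNION ALL would duplicate every matching pair.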
Note that in SQL Server plans, Hash Left Semi Join and Hash Right Semi Join are different operators.
For the query like this:
SELECT *
FROM table1
WHERE table1.col IN
(
SELECT col
FROM table2
)
, Hash Match (Left Semi Join) hashes table1 and removes the matched elements from the hash table in runtime (so that they cannot match more than one time).
Hash Match (Right Semi Join) hashes table2 and removes the duplicate elements from the hash table while building it.
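To picture the difference between the two operators, here is a toy model in Python that follows the description above. It is only a sketch of the idea, not SQL Server's actual implementation; the function names and the sample values are made up.

```python
table1 = [3, 1, 2, 2, 5]  # values of table1.col
table2 = [2, 2, 5, 7]     # values of table2.col


def hash_left_semi_join(outer, inner):
    """Hash `outer`, probe with `inner`, and evict each bucket on first match."""
    buckets = {}
    for value in outer:
        buckets.setdefault(value, []).append(value)
    result = []
    for value in inner:
        # pop() removes the bucket, so a later duplicate in `inner`
        # finds nothing and cannot emit the same outer rows again.
        result.extend(buckets.pop(value, []))
    return result


def hash_right_semi_join(outer, inner):
    """Hash `inner`, collapsing duplicates while building, then probe with `outer`."""
    distinct_inner = set(inner)
    return [value for value in outer if value in distinct_inner]


left_result = hash_left_semi_join(table1, table2)
right_result = hash_right_semi_join(table1, table2)
print(sorted(left_result), sorted(right_result))
```

Both variants return each qualifying table1 row exactly once; what changes is which input gets hashed, so the choice comes down to which side is cheaper to hold in memory.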
I may be missing something here, but the only difference between LEFT and RIGHT joins is the order in which the source tables were written, so having multiple LEFT joins or multiple RIGHT joins is no different from having a mix. The equivalent of a FULL OUTER could be achieved just as easily with all LEFT or all RIGHT joins as with a mix, n'est-ce pas?
We have some LEFT OUTER JOINs and RIGHT OUTER JOINs in the same query. Typically such queries are large, have been around a long time, probably badly written in the first place and have received infrequent maintenance. I assume the RIGHT OUTER JOINs were introduced as a means of maintaining the query without taking on the inevitable risk when refactoring a query significantly.
I think most SQL coders are most comfortable with using all LEFT OUTER JOINs, probably because a FROM clause is read left-to-right in the English way.
The only time I use a RIGHT OUTER JOIN myself is when writing a new query based on an existing query (no need to reinvent the wheel) and I need to change an INNER JOIN to an OUTER JOIN. Rather than change the order of the JOINs in the FROM clause just to be able to use a LEFT OUTER JOIN, I would instead use a RIGHT OUTER JOIN, and this would not bother me. This is quite rare though. If the original query had LEFT OUTER JOINs then I'd end up with a mix of LEFT and RIGHT OUTER JOINs, which again wouldn't bother me. Hasn't happened to me yet, though.
Note that for SQL products such as the Access database engine that do not support FULL OUTER JOIN, one workaround is to UNION a LEFT OUTER JOIN and a RIGHT OUTER JOIN in the same query.
The bottom line is that this is a very poorly formatted SQL statement and should be re-written. Many of the ON clauses are located far from their JOIN statements, which I am not sure is even valid SQL.
For clarity's sake, I would rewrite the query using all LEFT JOINs (rather than RIGHT), and place the ON clauses directly underneath their corresponding JOIN clauses. Otherwise, this is a bit of a train wreck that obfuscates the purpose of the query, making errors during future modifications more likely.
"doesn't this create implicit inner joins in some places and implicit full joins in others?"
Perhaps you are assuming that because you don't see the ON clause right after some joins, e.g., RIGHT OUTER JOIN Table4. It is there, just located further down: ON Table4.T7FK= Table7.T7PK. I don't see any implicit inner joins, which could occur if there were a WHERE clause like WHERE Table3.T3PK is not null.
The fact that you are asking questions like this is a testament to the opaqueness of the query.
To answer another portion of this question that hasn't been answered yet: the reason this query is formatted so oddly is that it was likely built using the Query Designer inside SQL Management Studio. The giveaway is the combined ON clauses that appear many lines after the table is mentioned. Essentially, tables get added in the build query window and their order is kept, even if the way things are connected would favor moving a table up, so to speak, and keeping all the joins in one direction.