This question already has answers here:
Odd INNER JOIN syntax and encapsulation
(1 answer)
Join statement order of operation
(1 answer)
Closed 9 years ago.
I came across this horrifyingly freakish SQL query today that another developer generated with the SQL Server query designer tool. I hate the query designer, but I'm stuck trying to figure out what it did. I've never seen syntax like it before and don't understand it. How does it work?
In particular, it is the multiple ON clauses stacked together, separate from their JOIN clauses, that are throwing me off.
SELECT *
FROM dbo.tblDealStatus
RIGHT OUTER JOIN dbo.tblUser
RIGHT OUTER JOIN dbo.tblOwnerLocation
INNER JOIN dbo.tblOwner
INNER JOIN dbo.tblDeal
ON dbo.tblOwner.OwnerID = dbo.tblDeal.OwnerID
ON dbo.tblOwnerLocation.DealID = dbo.tblDeal.DealID
ON dbo.tblUser.UserID = dbo.tblDeal.CHK_Contact
LEFT OUTER JOIN dbo.tblCompany AS tblCompany_1
INNER JOIN dbo.tblParticipation
ON tblCompany_1.CompanyID = dbo.tblParticipation.CompanyID
ON dbo.tblDeal.ParticipationID = dbo.tblParticipation.ParticipationID
ON /*...
....so on and so forth...*/
First, I make it a rule, for clarity, never to mix right and left joins in the same query. Every right join can be rewritten as a left join, and that alone will make it easier to figure out what is going on.
Next, abandon SELECT *. It is never appropriate in a query with joins, because you return the same data in two or more fields (the join fields), which wastes valuable network and database processing time.
I believe the weird ONs are forcing the joins to be evaluated in a particular order. In my opinion they are bad and should not be used: they are uncommon, hard for developers to understand, hard to maintain, and totally unneeded. Just reversing the right joins and putting the tables in the order you need to join them may fix this. If not, you may need a few derived tables to get the right data. Note that in reversing it, you may need to change those inner joins to something else. Right now it is such a mess that it is highly likely it does not return the correct results. So when rewriting it, check whether your changes alter the results, but also use judgement to decide whether each difference is a fix for a bad query or a mistake made while translating the query into something maintainable.
If the developer who wrote this mess is still there, I would force him to rewrite it in more standard SQL and tell him he is forbidden to ever use the query designer again. This fails code review as far as I am concerned.
If I were rewriting this, I would look for the table that should be first in the query and work down from there. My personal guess right now is that it would be the tblDeal table but I don't know your data model so I could be wrong.
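For reference, SQL Server reads a deferred ON as if the joins in between were parenthesized: A LEFT JOIN B INNER JOIN C ON (B-C condition) ON (A-B condition) means A LEFT JOIN (B INNER JOIN C ON ...) ON .... The sketch below is only an illustration of why that grouping matters. It runs on SQLite through Python's sqlite3 module as a stand-in, uses a derived table to express the nested form, and the deal, participation and company tables are hypothetical miniatures, not the real schema.

```python
import sqlite3

# Hypothetical miniature schema; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deal (deal_id INTEGER PRIMARY KEY);
    CREATE TABLE company (company_id INTEGER PRIMARY KEY);
    CREATE TABLE participation (
        participation_id INTEGER PRIMARY KEY,
        deal_id INTEGER,
        company_id INTEGER
    );
    INSERT INTO deal VALUES (1), (2);
    INSERT INTO company VALUES (10);
    -- Participation 101 points at company 99, which does not exist.
    INSERT INTO participation VALUES (100, 1, 10), (101, 2, 99);
""")

# Nested form: the inner join is evaluated as a unit, then left-joined to deal.
nested_rows = conn.execute("""
    SELECT d.deal_id, pc.company_id
    FROM deal AS d
    LEFT JOIN (
        SELECT p.deal_id, c.company_id
        FROM participation AS p
        INNER JOIN company AS c ON c.company_id = p.company_id
    ) AS pc ON pc.deal_id = d.deal_id
    ORDER BY d.deal_id
""").fetchall()

# Flat form: the trailing inner join discards the deal whose company is missing.
flat_rows = conn.execute("""
    SELECT d.deal_id, c.company_id
    FROM deal AS d
    LEFT JOIN participation AS p ON p.deal_id = d.deal_id
    INNER JOIN company AS c ON c.company_id = p.company_id
    ORDER BY d.deal_id
""").fetchall()

print(nested_rows)
print(flat_rows)
```

With this data the nested form keeps deal 2 with a NULL company, while the flat form drops it. That is the kind of difference to watch for when flattening the designer's output into ordinary left joins.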
I feel a bit stupid for asking this, because I'm sure there is a simple other type of join I should be using, however, I can't seem to find the answer, so I'm hoping one of you can point me in the right direction.
I have a big query in Access 2007 that pulls in records, but in some cases, I can't use an INNER JOIN on some tables, because linking records may not exist, so the main record rightly drops off. I can get round this problem by using IIF statements, checking if an entry exists first, but this makes the query terribly slow. I simplify the scenario below. Many thanks in advance:
Use a LEFT JOIN instead of an INNER JOIN.
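Since the simplified scenario is not shown, here is a minimal sketch with hypothetical orders and customers tables. It runs on SQLite through Python's sqlite3 module rather than Access, but the LEFT JOIN keyword itself is also accepted by Access.

```python
import sqlite3

# Hypothetical tables: order 101 has no linked customer record.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann');
    INSERT INTO orders VALUES (100, 1), (101, NULL);
""")

# INNER JOIN drops the main record when the linked record is missing.
inner_rows = conn.execute("""
    SELECT o.order_id, c.name
    FROM orders AS o
    INNER JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

# LEFT JOIN keeps every main record and fills the missing side with NULL.
left_rows = conn.execute("""
    SELECT o.order_id, c.name
    FROM orders AS o
    LEFT JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

print(inner_rows)
print(left_rows)
```

The inner join returns only order 100, while the left join returns both orders, with None in place of the missing customer name. No IIF check is needed.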
I'm using a database that requires optimized queries, and I'm wondering which of these two queries is the more optimized one. I used a timer, but the results are too close, so I have no clue which one to use.
QUERY 1:
SELECT A.MIG_ID_ACTEUR, A.FL_FACTURE_FSL , B.VAL_NOM,
B.VAL_PRENOM, C.VAL_CDPOSTAL, C.VAL_NOM_COMMUNE, D.CCB_ID_ACTEUR
FROM MIG_FACTURE A
INNER JOIN MIG_ACTEUR B
ON A.MIG_ID_ACTEUR= B.MIG_ID_ACTEUR
INNER JOIN MIG_ADRESSE C
ON C.MIG_ID_ADRESSE = B.MIG_ID_ADRESSE
INNER JOIN MIG_CORR_REF_ACTEUR D
ON A.MIG_ID_ACTEUR= D.MIG_ID_ACTEUR;
QUERY 2:
SELECT A.MIG_ID_ACTEUR, A.FL_FACTURE_FSL , B.VAL_NOM, B.VAL_PRENOM,
C.VAL_CDPOSTAL, C.VAL_NOM_COMMUNE, D.CCB_ID_ACTEUR
FROM MIG_FACTURE A , MIG_ACTEUR B, MIG_ADRESSE C, MIG_CORR_REF_ACTEUR D
WHERE A.MIG_ID_ACTEUR= B.MIG_ID_ACTEUR
AND C.MIG_ID_ADRESSE = B.MIG_ID_ADRESSE
AND A.MIG_ID_ACTEUR= D.MIG_ID_ACTEUR;
If you are asking whether it is more efficient to use the ANSI join syntax introduced in SQL-92 (a inner join b) or the older syntax of listing the join predicates in the WHERE clause, it shouldn't matter. I'd expect the query plans for the two queries to be identical. If the query plans are identical, performance will be identical. If the plans are not identical, that would generally imply that you had encountered a bug in the database's query parsing engine.
Personally, I'd use the ANSI join syntax (query 1), both because it is more portable when you want to do an outer join and because it generally makes the query more readable and decreases the probability that you'll accidentally leave out a join condition. That's solely a readability and maintainability consideration, though, not a performance consideration.
First things first:
"I used a timer but the result are too close" -- This is actually not a good way to test performance. Databases have caches. The results you get back won't be comparable with a stopwatch. You have system load to contend with, caching, and a million other things that make that particular comparison worthless. Instead of that, try using EXPLAIN to figure out the execution plan. Use SHOW PROFILES and SHOW STATUS to see where and how the queries are spending time. Check last_query_cost. But don't check your stopwatch. That won't tell you anything.
Second: this question can't be answered with the info you provided. In point of fact, the queries are identical (verify that with EXPLAIN) and simply boil down to implicit vs explicit joins. That doesn't make either one of them optimized, though. Again, you need to dig into the join itself and see if it's making use of indices, for example, or if it's creating a lot of temp tables or doing file sorts.
Optimizing the query is a good thing... but these two are the same. A stop watch won't help you. Use explain, show profiles, show status.... not a stop watch :-)
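As a rough illustration of comparing plans instead of timings, the sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. That is an assumption made for the sake of a runnable example: the EXPLAIN, SHOW PROFILES and SHOW STATUS commands mentioned above are MySQL's, and the facture and acteur tables here are cut-down hypothetical versions of the ones in the question.

```python
import sqlite3

# Hypothetical cut-down tables, loosely modelled on the question's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE acteur (acteur_id INTEGER PRIMARY KEY, nom TEXT);
    CREATE TABLE facture (facture_id INTEGER PRIMARY KEY, acteur_id INTEGER);
    INSERT INTO acteur VALUES (1, 'Durand'), (2, 'Martin');
    INSERT INTO facture VALUES (10, 1), (11, 2), (12, 1);
""")

explicit_sql = """
    SELECT f.facture_id, a.nom
    FROM facture AS f
    INNER JOIN acteur AS a ON a.acteur_id = f.acteur_id
"""
implicit_sql = """
    SELECT f.facture_id, a.nom
    FROM facture AS f, acteur AS a
    WHERE a.acteur_id = f.acteur_id
"""


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable plan step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


explicit_plan = plan(explicit_sql)
implicit_plan = plan(implicit_sql)
same_plan = explicit_plan == implicit_plan
same_rows = sorted(conn.execute(explicit_sql)) == sorted(conn.execute(implicit_sql))

print(explicit_plan)
print(implicit_plan)
print(same_plan, same_rows)
```

If the two plan listings match, the choice between the two spellings is about readability, and any real tuning work lies in indexes and the plan steps themselves.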
These two SQL syntaxes produce the same result. Which one is better to use, and why?
1st:
SELECT c.Id,c.Name,s.Id,s.Name,s.ClassId
FROM dbo.ClassSet c,dbo.StudentSet s WHERE c.Id=s.ClassId
2nd:
SELECT c.Id,c.Name,s.Id,s.Name,s.ClassId
FROM dbo.ClassSet c JOIN dbo.StudentSet s ON c.Id=s.ClassId
The 2nd one is better.
The way you're joining in the first query is considered outdated. Avoid using , and use JOIN instead.
"In terms of precedence, a JOIN's ON clause happens before the WHERE clause. This allows things like a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL to check for cases where there is NOT a matching row in b."
"Using , notation is similar to processing the WHERE and ON conditions at the same time"
Found the details about it here, MySQL - SELECT, JOIN
Read more about SQL standards
http://en.wikipedia.org/wiki/SQL-92
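The LEFT JOIN ... WHERE b.id IS NULL pattern from the quote above can be tried out as follows. This is a minimal sketch on SQLite through Python's sqlite3 module, with two throwaway tables named a and b as in the quote. The comma version is included only to show that it cannot express the same check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2);
""")

# ON is applied while joining; WHERE then keeps only the rows of a
# for which the LEFT JOIN found no partner in b.
unmatched = [row[0] for row in conn.execute("""
    SELECT a.id
    FROM a
    LEFT JOIN b ON a.id = b.id
    WHERE b.id IS NULL
    ORDER BY a.id
""")]

# With the comma notation both conditions act as one filter on an inner join,
# so b.id can never be NULL and nothing is returned.
comma_version = [row[0] for row in conn.execute("""
    SELECT a.id
    FROM a, b
    WHERE a.id = b.id AND b.id IS NULL
""")]

print(unmatched)
print(comma_version)
```

Here the LEFT JOIN query returns ids 1 and 3, the rows of a without a match in b, while the comma version returns nothing.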
SELECT c.Id,c.Name,s.Id,s.Name,s.ClassId FROM dbo.ClassSet c JOIN dbo.StudentSet s ON c.Id=s.ClassId
Without any doubt, the query above is better than your first one. In the order of precedence, ON comes second and WHERE comes fourth.
For a query this simple you don't need to rack your brains over it, but at the project level JOIN is always recommended.
Check this link Is a JOIN faster than a WHERE?
Answer by #MehrdadAfshari
Theoretically, no, it shouldn't be any faster. The query optimizer
should be able to generate an identical execution plan. However, some
DB engines can produce better execution plans for one of them (not
likely to happen for such a simple query but for complex enough ones).
You should test both and see (on your DB engine).
The second because it's more readable. That is all.
Hi,
I've always retrieved data without joins...
but is there a benefit to one method over the other?
select * from a INNER JOIN b on a.a = b.b;
select a.*,b.* from a,b where a.a = b.b;
Thanks!
The first method using the INNER JOIN keyword is:
ANSI SQL standard
much cleaner and more expressive
Therefore, I always cringe when I see the second option used - it just bloats up your WHERE clause, and you can't really see at one glance how the tables are joined (on what fields).
Should you happen to forget one of the JOIN conditions in a long list of WHERE clause expressions, you suddenly get a messy cartesian product..... can't do that with the INNER JOIN keyword (you must express what field(s) to join on).
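To make the forgotten-condition problem concrete, here is a small sketch on SQLite through Python's sqlite3 module, with three hypothetical three-row tables. Only the comma-style failure is shown; SQLite itself tolerates a JOIN without ON, so it is not a good place to demonstrate the error that SQL Server would raise for an INNER JOIN with no ON clause.

```python
import sqlite3

# Three hypothetical tables with three rows each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a INTEGER);
    CREATE TABLE b (b INTEGER);
    CREATE TABLE c (c INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1), (2), (3);
    INSERT INTO c VALUES (1), (2), (3);
""")

# Intended query: every table is tied to another by a condition.
intended_count = conn.execute("""
    SELECT COUNT(*)
    FROM a, b, c
    WHERE a.a = b.b
      AND b.b = c.c
""").fetchone()[0]

# Same query with the second condition forgotten: the statement is still valid,
# but c is now combined with every matched a/b pair.
forgotten_count = conn.execute("""
    SELECT COUNT(*)
    FROM a, b, c
    WHERE a.a = b.b
""").fetchone()[0]

print(intended_count, forgotten_count)
```

The intended query returns 3 rows and the one with the missing condition returns 9, with no warning from the database. On real tables the blow-up is proportionally larger.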
I'd say the biggest benefit is readability. The version with the explicitly named join types is much easier for me to comprehend.
You are using a different syntax for a JOIN, basically. As a matter of best practices, it is best to use the first syntax (explicit JOIN) because it is clearer what the intention of the query is and makes the code easier to maintain.
These are both joins; they are just two different syntactical representations of a join. The first one (using the "Join" keyword) is the current ANSI standard (as of 1992, I think).
In the case of inner joins only, the two different representations are functionally identical, but the newer ANSI SQL92 standard syntax is much more readable once you get used to it, because each individual join condition is associated with the pair of intermediate result sets being joined together. In the older representation, the join conditions are all together, along with the overall query's filter conditions, in the WHERE clause, and it is not as clear which is which. This makes identifying bad join conditions (where, for example, an unintended cartesian product will be generated) much more difficult.
But more important, perhaps, is that when performing an outer join, in certain scenarios, the older syntax is NOT equivalent, and in fact will generate the WRONG result set.
You should transition to the newer syntax for all your queries.
You've always retrieved the data with joins. The second query is using old syntax, but in the background it is still a join :)
This depends on the RDBMS, but in the case of SQL Server I understand that using the former syntax allows for better optimization. This is less of a SQL question and more of a vendor-specific question.
You can also use the EXPLAIN (SQL Server: Query Execution Plan) type functions to help you understand if there is a difference. Each query is unique and I imagine that the stored statistics can (and will) alter the behavior.
SELECT * FROM TableA
INNER JOIN TableB
ON TableA.name = TableB.name
SELECT * FROM TableA, TableB
where TableA.name = TableB.name
Which is the preferred way and why?
Will there be any performance difference when keywords like JOIN are used?
Thanks
The second way is the classical way of doing it, from before the join keyword existed.
Normally the query processor generates the same database operations from the two queries, so there would be no difference in performance.
Using join better describes what you are doing in the query. If you have many joins, it's also better because the joined table and its condition are beside each other, instead of putting all tables in one place and all conditions in another.
Another aspect is that it's easier to do an unbounded join by mistake using the second way, resulting in a cross join containing all combinations from the two tables.
Use the first one, as it is:
More explicit
Is the Standard way
As for performance - there should be no difference.
find out by using EXPLAIN SELECT …
it depends on the engine used, on the query optimizer, on the keys, on the table; on pretty much everything
In some SQL engines the second form (associative joins) is deprecated. Use the first form.
The second form is less explicit and causes beginners to SQL to pause when writing code. It is much more difficult to manage in complex SQL, because the sequence of the join matches has to agree with the sequence of the WHERE clause: they (the sequence in the code) must match or the results returned will change, which goes against the idea that sequence should not change the results when elements at the same level are considered.
When joins containing multiple tables are created, it gets REALLY difficult to code, quite fast using the second form.
EDIT: Performance: I consider ease of coding and debugging part of personal performance, so the first form performs better for editing, debugging and maintenance. It just takes me less time to do and understand things during the development and maintenance cycles.
Most current databases will optimize both of those queries into the exact same execution plan. However, use the first syntax, as it is the current standard. Learning and using this join syntax will help when you write queries with LEFT OUTER JOIN and RIGHT OUTER JOIN, which become tricky and problematic using the older syntax with the joins in the WHERE clause.
Filtering joins solely using WHERE can be extremely inefficient in some common scenarios. For example:
SELECT * FROM people p, companies c WHERE p.companyID = c.id AND p.firstName = 'Daniel'
A database that executes this query quite literally will first take the Cartesian product of the people and companies tables and then filter it down to the rows that have matching companyID and id fields. While the fully unconstrained product does not exist anywhere but in memory, and then only for a moment, its calculation does take some time.
A better approach is to group the constraints with the JOINs where relevant. This is not only subjectively easier to read but also far more efficient. Thusly:
SELECT * FROM people p JOIN companies c ON p.companyID = c.id
WHERE p.firstName = 'Daniel'
It's a little longer, but the database is able to look at the ON clause and use it to compute the fully-constrained JOIN directly, rather than starting with everything and then limiting down. This is faster to compute (especially with large data sets and/or many-table joins) and requires less memory.
I change every query I see which uses the "comma JOIN" syntax. In my opinion, the only purpose for its existence is conciseness. Considering the performance impact, I don't think this is a compelling reason.