In MySQL queries, why use join instead of where?

It seems like to combine two or more tables, we can either use join or where. What are the advantages of one over the other?

Any query involving more than one table requires some form of association to link the results from table "A" to table "B". The traditional (ANSI-89) means of doing this is to:
List the tables involved in a comma separated list in the FROM clause
Write the association between the tables in the WHERE clause
SELECT *
FROM TABLE_A a,
TABLE_B b
WHERE a.id = b.id
Here's the query re-written using ANSI-92 JOIN syntax:
SELECT *
FROM TABLE_A a
JOIN TABLE_B b ON b.id = a.id
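To make the equivalence of the two forms concrete, here is a minimal sketch. It assumes SQLite driven from Python's sqlite3 module rather than MySQL, and the columns and sample rows are made up for illustration; only the shape of the two queries comes from the answer above.

```python
import sqlite3

# In-memory database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO table_a VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO table_b VALUES (2, 'Oslo'), (3, 'Rome'), (4, 'Lima');
""")

# ANSI-89: tables in a comma list, association in WHERE.
ansi89 = conn.execute(
    "SELECT a.id, a.name, b.city FROM table_a a, table_b b "
    "WHERE a.id = b.id ORDER BY a.id"
).fetchall()

# ANSI-92: association in the ON clause of an explicit JOIN.
ansi92 = conn.execute(
    "SELECT a.id, a.name, b.city FROM table_a a "
    "JOIN table_b b ON b.id = a.id ORDER BY a.id"
).fetchall()

print(ansi89 == ansi92)  # both forms return the same rows here
```

Only ids 2 and 3 exist in both tables, so each query returns those two rows.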
From a Performance Perspective:
Where supported (Oracle 9i+, PostgreSQL 7.2+, MySQL 3.23+, SQL Server 2000+), there is no performance benefit to using either syntax over the other. The optimizer sees them as the same query. But more complex queries can benefit from using ANSI-92 syntax:
Ability to control JOIN order - the order in which tables are scanned
Ability to apply filter criteria on a table prior to joining
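One way to read "filter criteria prior to joining" is the difference between putting a condition in ON and putting it in WHERE when the join is an outer join. The sketch below assumes SQLite via Python's sqlite3; the customers and orders tables are hypothetical and not from the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1, 'open'), (11, 1, 'closed'), (12, 2, 'closed');
""")

# Filter in ON: only open orders take part in the match, and customers
# without an open order are still kept, with NULL for the order columns.
filter_in_on = conn.execute("""
    SELECT c.name, o.id
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id AND o.status = 'open'
    ORDER BY c.id
""").fetchall()

# Filter in WHERE: applied after the join, so rows whose order is not
# open (or is NULL) are dropped from the result.
filter_in_where = conn.execute("""
    SELECT c.name, o.id
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.status = 'open'
    ORDER BY c.id
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```

With the comma syntax there is no ON clause, so this distinction cannot be written down at all.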
From a Maintenance Perspective:
There are numerous reasons to use ANSI-92 JOIN syntax over ANSI-89:
More readable, as the JOIN criteria is separate from the WHERE clause
Less likely to miss JOIN criteria
Consistent syntax support for JOIN types other than INNER, making queries easy to use on other databases
WHERE clause only serves as filtration of the cartesian product of the tables joined
From a Design Perspective:
ANSI-92 JOIN syntax is a pattern, not an anti-pattern:
The purpose of the query is more obvious; the columns used by the application are clear
It follows the modularity rule about using strict typing whenever possible. Explicit is almost universally better.
Conclusion
Short of familiarity and/or comfort, I don't see any benefit to continuing to use the ANSI-89 WHERE clause instead of the ANSI-92 JOIN syntax. Some might complain that ANSI-92 syntax is more verbose, but that's what makes it explicit. The more explicit, the easier it is to understand and maintain.

These are the problems with using the where syntax (otherwise known as the implicit join):
First, it is all too easy to get accidental cross joins because the join conditions are not right next to the table names. If you have 6 tables being joined together, it is easy to miss one in the where clause. You will see this "fixed" all too often by using the distinct keyword. This is a huge performance hit for the database. You can't get an accidental cross join using the explicit join syntax as it will fail the syntax check.
Right and left joins are problematic in the old syntax in some databases (in SQL Server you are not guaranteed to get the correct results). Furthermore, as far as I know, the old outer join syntax is deprecated in SQL Server.
If you intend to use a cross join, that is not clear from the old syntax. It is clear using the current ANSI standard.
It is much harder for the maintainer to see exactly which fields are part of the join or even which tables join together in what order using the implicit syntax. This means it might take more time to revise the queries. I have known very few people who, once they took the time to feel comfortable with the explicit join syntax, ever went back to the old way.
I've also noticed that some people who use these implicit joins don't actually understand how joins work and thus are getting incorrect results in their queries.
Honestly, would you use any other kind of code that was replaced with a better method 18 years ago?
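The accidental cross join and the DISTINCT "fix" described in the first point can be sketched as follows. This assumes SQLite via Python's sqlite3, and the authors, books and reviews tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    CREATE TABLE reviews (id INTEGER PRIMARY KEY, book_id INTEGER, stars INTEGER);
    INSERT INTO authors VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO books VALUES (1, 1, 'A1'), (2, 2, 'B1');
    INSERT INTO reviews VALUES (1, 1, 5), (2, 1, 4), (3, 2, 3);
""")

# Implicit join where the reviews condition (r.book_id = b.id) was forgotten:
# every author/book pair is combined with every review.
leaky = conn.execute("""
    SELECT a.name, b.title
    FROM authors a, books b, reviews r
    WHERE b.author_id = a.id
""").fetchall()

# The tempting patch: DISTINCT hides the duplicates, but the join
# condition is still missing.
masked = conn.execute("""
    SELECT DISTINCT a.name, b.title
    FROM authors a, books b, reviews r
    WHERE b.author_id = a.id
""").fetchall()

# Explicit joins keep each condition next to the table it belongs to.
fixed = conn.execute("""
    SELECT a.name, b.title
    FROM authors a
    JOIN books b ON b.author_id = a.id
    JOIN reviews r ON r.book_id = b.id
""").fetchall()

print(len(leaky), len(masked), len(fixed))  # 6 2 3
```

Note that the DISTINCT version does not even return the same number of rows as the correctly joined query; it only looks tidy.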

Most people tend to find the JOIN syntax a bit clearer as to what is being joined to what. Additionally, it has the benefit of being a standard.
Personally, I "grew up" on WHEREs, but the more I use the JOIN syntax the more I'm starting to see how it's more clear.

Explicit joins convey intent, leaving the where clause to do the filtering. It is cleaner and it is standard, and you can do things such as left outer or right outer which is harder to do only with where.

You can't use WHERE to combine two tables. What you can do though is to write:
SELECT * FROM A, B
WHERE ...
The comma here is equivalent to writing:
SELECT *
FROM A
CROSS JOIN B
WHERE ...
Would you write that? No - because it's not what you mean at all. You don't want a cross join, you want an INNER JOIN. But when you write comma, you're saying CROSS JOIN and that's confusing.
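A quick check of the comma/CROSS JOIN equivalence, as a sketch assuming SQLite via Python's sqlite3 (tables a and b and their contents are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (y INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (10), (20);
""")

# Comma-separated FROM list with no WHERE condition.
comma = conn.execute("SELECT x, y FROM a, b ORDER BY x, y").fetchall()

# The same thing spelled out as an explicit CROSS JOIN.
cross = conn.execute("SELECT x, y FROM a CROSS JOIN b ORDER BY x, y").fetchall()

print(comma == cross, len(comma))  # True 6 (3 rows x 2 rows)
```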

Actually you often need both "WHERE" and "JOIN".
"JOIN" is used to retrieve data from two tables - based ON the values of a common column. If you then want to further filter this result, use the WHERE clause.
For example, "LEFT JOIN" retrieves ALL rows from the left table, plus the matching rows from the right table. But that does not filter the records on any specific value or on other columns that are not part of the JOIN. Thus, if you want to further filter this result, specify the extra filters in the WHERE clause.
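As a small illustration of using both together, the sketch below first runs a LEFT JOIN on its own and then adds a WHERE filter on a column that is not part of the join. It assumes SQLite via Python's sqlite3, and the departments and employees tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO departments VALUES (1, 'eng'), (2, 'ops'), (3, 'hr');
    INSERT INTO employees VALUES (1, 'ann', 1), (2, 'bob', 1), (3, 'cy', 2);
""")

# JOIN only: every department appears, even 'hr', which has no employees.
joined = conn.execute("""
    SELECT d.name, e.name
    FROM departments d
    LEFT JOIN employees e ON e.dept_id = d.id
    ORDER BY d.id, e.id
""").fetchall()

# JOIN plus WHERE: the ON clause still says how rows are matched,
# while WHERE narrows the joined result on an unrelated column.
filtered = conn.execute("""
    SELECT d.name, e.name
    FROM departments d
    LEFT JOIN employees e ON e.dept_id = d.id
    WHERE d.name <> 'ops'
    ORDER BY d.id, e.id
""").fetchall()

print(joined)
print(filtered)
```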

Related

correct query design? cross joins driving ad-hoc reporting interface

I'm hoping some of the more experienced database/dwh developers or DBAs can weigh in on this one:
My team is using OBIEE as a front-end tool to drive ad-hoc reporting being done by our business units.
There is a lot of latency when generating sets that are relatively small. We are facing ~1 hour to produce ~50k records.
I looked into one of the queries that is behaving this way, and I was surprised to find that all of the tables being referenced are being cross-joined, and then filters are being applied in the WHERE clause.
So, to illustrate, the queries tend to look like this:
SELECT ...
FROM tbl1
,tbl2
,tbl3
,tbl4
WHERE tbl1.col1 = tbl2.col1
and tbl3.col2 = tbl2.col2
and tbl4.col3 = tbl3.col3
instead of like this:
SELECT ...
FROM tbl1
INNER JOIN tbl2
ON tbl1.col1 = tbl2.col1
INNER JOIN tbl3
ON tbl3.col2 = tbl2.col2
INNER JOIN tbl4
ON tbl4.col3 = tbl3.col3
Now, from what I know about the order of query operations, the FROM clause gets performed before the WHERE clause, so the first example would perform much more slowly than the latter example. Am I correct (please answer only if you know the answer in the context of Oracle DB)? Unfortunately, I don't have the admin rights to run a trace against the 2 different versions of the query.
Is there a reason to set up the query the first way, related to how the OBIEE interface works? Remember, the query is the result of a user drag-and-dropping attributes into a sandbox, from a 'bank' of attributes. Selecting any combination of the attributes is supposed to generate output (if the data exists). The attributes come from many different tables. I don't have any experience in designing the mechanism that generates the SQL based on this kind of ad-hoc attribute selection, so I don't know whether the query design in the first example is required to service this kind of reporting tool.
Don't worry, historically Oracle used the first notation for inner joins but later on adopted ANSI SQL standards.
The results, in terms of both performance and returned record sets, are exactly the same: the implicit 'comma' joins do not build a cross product first, because the optimizer treats the WHERE conditions as join conditions. If you doubt it, run EXPLAIN PLAN FOR on both queries and you will see that the predicted execution plans are identical.
As an extension of this answer: you may also come across the analogous (+) notation used in place of outer joins. The same reasoning holds in that context.
The real issue comes when both notations (implicit and explicit joins) are mixed in the same query. This would be asking for trouble big time, but I doubt you find such a case in OBIEE.
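The plan comparison suggested above can be sketched without Oracle access. The example below is only an analogy: it assumes SQLite via Python's sqlite3 and uses SQLite's EXPLAIN QUERY PLAN, not Oracle's EXPLAIN PLAN; the column layout and indexes are invented, and plan() is a hypothetical helper. It says nothing about what OBIEE or Oracle do, it only shows the kind of check that settles the question on a given engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (col1 INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE tbl2 (col1 INTEGER, col2 INTEGER);
    CREATE TABLE tbl3 (col2 INTEGER, col3 INTEGER);
    CREATE INDEX idx_tbl2_col1 ON tbl2 (col1);
    CREATE INDEX idx_tbl3_col2 ON tbl3 (col2);
""")

def plan(sql):
    """Return the human-readable steps of SQLite's query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # last column holds the step description

implicit_sql = """
    SELECT * FROM tbl1, tbl2, tbl3
    WHERE tbl1.col1 = tbl2.col1
      AND tbl3.col2 = tbl2.col2
"""

explicit_sql = """
    SELECT * FROM tbl1
    INNER JOIN tbl2 ON tbl1.col1 = tbl2.col1
    INNER JOIN tbl3 ON tbl3.col2 = tbl2.col2
"""

for step in plan(implicit_sql):
    print(step)

# For inner joins, SQLite plans both spellings the same way.
print(plan(implicit_sql) == plan(explicit_sql))
```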
Those are inner joins, not cross joins; they just use the old syntax rather than the ANSI syntax you were expecting.
Most join queries contain at least one join condition, either in the FROM clause or in the WHERE clause. (Oracle Documentation)
For a simple query such as in your example the execution should be exactly the same.
Where you have set outer joins (in the business model join) you will see OBI produce a query where the inner joins are made in the WHERE clause and the outer joins are done ANSI in the FROM statement – just to make things really hard to debug!
SELECT ...
FROM tbl1
,tbl2
,tbl3 left outer join
tbl4 on tbl3.col1 = tbl4.col2
WHERE tbl1.col1 = tbl2.col1
and tbl3.col2 = tbl2.col2
and tbl4.col3 = tbl3.col3

What is difference between ANSI and non-ANSI joins, and which do you recommend? [duplicate]

I have searched many websites for an answer about which one is better, ANSI or non-ANSI syntax. What is the difference between these two queries?
select a.name,a.empno,b.loc
from tab a, tab b
where a.deptno=b.deptno(+);
and:
select a.name,a.empno,b.loc
from tab a
left outer join tab b on a.deptno=b.deptno;
The result is the same in both cases. The second query is also longer. Which one is better?
Suppose we added another table, Salgrade, to the query above: on what conditions would we need to join it?
Can anyone assume such a table and give me an explanation?
Both syntaxes usually work without problems, but if you try to add a WHERE condition you will see that with the second one it is much simpler to tell which is the join condition and which is the filter.
1)
SELECT a.name,
a.empno,
b.loc
FROM tab a,
tab b
WHERE a.deptno = b.deptno(+)
AND a.empno = 190;
2)
SELECT a.name,
a.empno,
b.loc
FROM tab a
LEFT OUTER JOIN tab b
ON a.deptno = b.deptno
WHERE a.empno = 190;
Also, it's much easier to recognize an outer join, and you cannot forget to include the (+). Overall you could say it's just a question of taste, but the truth is that the second syntax is much more readable and less prone to errors.
The first is a legacy Oracle specific way of writing joins, the second is ANSI SQL-92+ standard and is the preferred one.
Extensively discussed many a times, including one by me.
A reason to use explicit JOINs rather than implicit ones (regardless of whether they are outer joins or not) is that it is much easier to accidentally create a cartesian product with implicit joins.
With explicit JOINs you cannot "by accident" create one. The more tables are involved the higher the risk is that you miss one join condition.
Basically (+) is severely limited compared to ANSI joins. Furthermore it is only available in Oracle whereas the ANSI join syntax is supported by all major DBMS
SQL will not start to perform better after migration to ANSI syntax - it's just different syntax.
Oracle strongly recommends that you use the more flexible FROM clause join syntax shown in the former example. In the past there were some bugs with ANSI syntax but if you go with latest 11.2 or 12.1 that should be fixed already.
Using the JOIN operators ensures your SQL code is ANSI compliant, and thus allows a front-end application to be ported more easily to other database platforms.
Join conditions have a very low selectivity on each table and a high selectivity on the tuples in the theoretical cross product. Conditions in the where statement usually have a much higher selectivity.
Oracle internally converts ANSI syntax to the (+) syntax, you can see this happening in the execution plan's Predicate Information section.
If you are using 11.2 I advise ANSI join. If you use 12C, there are some new bugs unearthed on OUTER JOINS.
I also remember some bugs in Oracle when using ANSI syntax before 11.2; they were fixed in 11.2.
Personally, I am not a big fan of ANSI syntax: although Oracle does conform to the ANSI standards, its implementation is not totally bug free.
please, read this article about joins.
The result of your example is not the same if you have data in table B that is not in table A.

What's the difference between INNER JOINing a table and including it in the FROM clause?

What is the difference between the two SQL statements below (one uses INNER JOIN, the second uses the FROM clause) in terms of performance and execution time, and are there any cases where I must use one instead of the other?
SELECT Tbl1_Fld1, Tbl2_Fld1 FROM DB1..TABLE1
INNER JOIN DB2..TABLE1
on DB1..TABLE1 .Tbl1_Fld1 = DB2..TABLE1.Tbl2_Fld1
SELECT Tbl1_Fld1, Tbl2_Fld1 FROM DB1..TABLE1,DB2..TABLE1
WHERE DB1..TABLE1 .Tbl1_Fld1 = DB2..TABLE1.Tbl2_Fld1
In a perfect world, those should be equivalent except that the first one better documents what you want to achieve (join two tables and then search the result).
Alas, history, bugs, features, optimizers and other obstacles make this much more complicated than it needs to be.
Some databases simply don't support INNER JOIN even though it's a SQL standard syntax.
Others have bugs for certain data types, so the join won't work or will be very slow.
So in reality, you will have to run these with suitable test data to find out. There is no way to say for sure just by looking at the SQL. Sometimes, there isn't even a way to say for sure when you can run it because changes in the underlying data can have a huge impact (for example, Oracle can suddenly decide to ignore the index because too many rows in the table have been changed).
Given a sane query optimizer, there shouldn't be a difference between the two.
Use INNER JOIN if your engine supports it, basically for clarity of join condition vs filter separation
SELECT Tbl1_Fld1, Tbl2_Fld1
FROM
DB1..TABLE1,DB2..TABLE1
WHERE
DB1..TABLE1 .Tbl1_Fld1 = DB2..TABLE1.Tbl2_Fld1 AND
DB1..TABLE1 .Tbl1_Fld1 = 'foo' AND DB2..TABLE1.Tbl2_Fld1 = 1
vs
SELECT Tbl1_Fld1, Tbl2_Fld1
FROM
DB1..TABLE1
INNER JOIN
DB2..TABLE1 ON DB1..TABLE1 .Tbl1_Fld1 = DB2..TABLE1.Tbl2_Fld1
WHERE
DB1..TABLE1 .Tbl1_Fld1 = 'foo' AND DB2..TABLE1.Tbl2_Fld1 = 1
There should be no difference in execution for most engines
However, and there's always a "however", not all joins are INNER. What about an OUTER join?
The *= and =* operators are deprecated in SQL Server, for example.
This was stated in Books Online for SQL Server 2000. Here is the quote:
In earlier versions of Microsoft® SQL Server™ 2000, left and right outer join conditions were specified in the WHERE clause using the *= and =* operators. In some cases, this syntax results in an ambiguous query that can be interpreted in more than one way. SQL-92 compliant outer joins are specified in the FROM clause and do not result in this ambiguity. Because the SQL-92 syntax is more precise, detailed information about using the old Transact-SQL outer join syntax in the WHERE clause is not included with this release. The syntax may not be supported in a future version of SQL Server. Any statements using the Transact-SQL outer joins should be changed to use the SQL-92 syntax.
So if you have a more complex query with both inner and outer joins, it is going to become confusing if you mix styles. Add consistency to the clarity argument mentioned above.
No new queries should be written using the "From " syntax (really called an implicit join). And personally I would rewrite any I came into contact with because it is simply a bad coding practice.
This syntax is highly prone to accidental cross joins, and it is harder to maintain and harder to read correctly. If you need to add a left join later you may get incorrect results if you don't convert to explicit joins. And sometimes you need a cross join, but with the implicit syntax you don't know whether the cross join was a mistake (a common one using this syntax) or deliberate, so the maintainer doesn't know whether to change it or not. It was replaced in 1992, for goodness' sake. There is no excuse for not using the explicit join syntax in 2010.

SQL - Benefits of JOINs? [duplicate]

Hi,
I've always retrieved data without joins...
but is there a benefit to one method over the other?
select * from a INNER JOIN b on a.a = b.b;
select a.*,b.* from a,b where a.a = b.b;
Thanks!
The first method using the INNER JOIN keyword is:
ANSI SQL standard
much cleaner and more expressive
Therefore, I always cringe when I see the second option used - it just bloats up your WHERE clause, and you can't really see at one glance how the tables are joined (on what fields).
Should you happen to forget one of the JOIN conditions in a long list of WHERE clause expressions, you suddenly get a messy cartesian product..... can't do that with the INNER JOIN keyword (you must express what field(s) to join on).
I'd say the biggest benefit is readability. The version with the explicitly named join types is much easier for me to comprehend.
You are using a different syntax for a JOIN, basically. As a matter of best practices, it is best to use the first syntax (explicit JOIN) because it is clearer what the intention of the query is and makes the code easier to maintain.
These are both joins. They are just two different syntactical representations for joins. The first one (using the "JOIN" keyword) is the current ANSI standard (as of 1992, I think).
In the case of inner joins only, the two different representations are functionally identical, but the newer ANSI SQL-92 standard syntax is much more readable once you get used to it, because each individual join condition is associated with the pair of intermediate result sets being joined together. In the older representation, the join conditions are all together, along with the overall query's filter conditions, in the WHERE clause, and it is not as clear which is which. This makes identifying bad join conditions (where, for example, an unintended cartesian product will be generated) much more difficult.
But more important, perhaps, is that when performing an outer join, in certain scenarios, the older syntax is NOT equivalent, and in fact will generate the WRONG result set.
You should transition to the newer syntax for all your queries.
You've always retrieved the data with joins. The second query is using old syntax, but in the background it is still join :)
This depends on the RDBMS, but in the case of SQL Server I understand that using the former syntax allows for better optimization. This is less of a SQL question and more of a vendor-specific question.
You can also use the EXPLAIN (SQL Server: Query Execution Plan) type functions to help you understand if there is a difference. Each query is unique and I imagine that the stored statistics can (and will) alter the behavior.

INNER JOIN keywords | with and without using them

SELECT * FROM TableA
INNER JOIN TableB
ON TableA.name = TableB.name
SELECT * FROM TableA, TableB
where TableA.name = TableB.name
Which is the preferred way and why?
Will there be any performance difference when keywords like JOIN are used?
Thanks
The second way is the classical way of doing it, from before the join keyword existed.
Normally the query processor generates the same database operations from the two queries, so there would be no difference in performance.
Using join better describes what you are doing in the query. If you have many joins, it's also better because the joined table and its condition are beside each other, instead of putting all tables in one place and all conditions in another.
Another aspect is that it's easier to do an unbounded join by mistake using the second way, resulting in a cross join containing all combinations from the two tables.
Use the first one, as it is:
More explicit
Is the Standard way
As for performance - there should be no difference.
find out by using EXPLAIN SELECT …
it depends on the engine used, on the query optimizer, on the keys, on the table; on pretty much everything
In some SQL engines the second form (associative joins) is deprecated. Use the first form.
The second is less explicit and makes SQL beginners pause when writing code. It is also much more difficult to manage in complex SQL, because the sequence of the joined tables has to line up with the sequence of the conditions in the WHERE clause. If the two sequences do not match, the returned data set can change, which goes against the expectation that ordering should not affect the results when the elements are at the same level.
When joins containing multiple tables are created, it gets REALLY difficult to code, quite fast using the second form.
EDIT: Performance: I consider ease of coding and debugging part of personal performance, so in terms of editing, debugging and maintenance the first form performs better: it just takes me less time to do and understand things during the development and maintenance cycles.
Most current databases will optimize both of those queries into the exact same execution plan. However, use the first syntax; it is the current standard. Learning and using this join syntax will also help when you write queries with LEFT OUTER JOIN and RIGHT OUTER JOIN, which become tricky and problematic using the older syntax with the joins in the WHERE clause.
Filtering joins solely using WHERE can be extremely inefficient in some common scenarios. For example:
SELECT * FROM people p, companies c WHERE p.companyID = c.id AND p.firstName = 'Daniel'
Most databases will execute this query quite literally, first taking the Cartesian product of the people and companies tables and then filtering by those which have matching companyID and id fields. While the fully-unconstrained product does not exist anywhere but in memory and then only for a moment, its calculation does take some time.
A better approach is to group the constraints with the JOINs where relevant. This is not only subjectively easier to read but also far more efficient. Thusly:
SELECT * FROM people p JOIN companies c ON p.companyID = c.id
WHERE p.firstName = 'Daniel'
It's a little longer, but the database is able to look at the ON clause and use it to compute the fully-constrained JOIN directly, rather than starting with everything and then limiting down. This is faster to compute (especially with large data sets and/or many-table joins) and requires less memory.
I change every query I see which uses the "comma JOIN" syntax. In my opinion, the only purpose for its existence is conciseness. Considering the performance impact, I don't think this is a compelling reason.