INNER JOIN vs multiple table names in "FROM" [duplicate] - sql

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
INNER JOIN versus WHERE clause — any difference?
What is the difference between an INNER JOIN query and an implicit join query (i.e. listing multiple tables after the FROM keyword)?
For example, given the following two tables:
CREATE TABLE Statuses(
id INT PRIMARY KEY,
description VARCHAR(50)
);
INSERT INTO Statuses VALUES (1, 'status');
CREATE TABLE Documents(
id INT PRIMARY KEY,
statusId INT REFERENCES Statuses(id)
);
INSERT INTO Documents VALUES (9, 1);
What is the difference between the below two SQL queries?
From the testing I've done, they return the same result. Do they do the same thing? Are there situations where they will return different result sets?
-- Using implicit join (listing multiple tables)
SELECT s.description
FROM Documents d, Statuses s
WHERE d.statusId = s.id
AND d.id = 9;
-- Using INNER JOIN
SELECT s.description
FROM Documents d
INNER JOIN Statuses s ON d.statusId = s.id
WHERE d.id = 9;

There is no reason to ever use an implicit join (the one with the commas). Yes for inner joins it will return the same results. However, it is subject to inadvertent cross joins especially in complex queries and it is harder for maintenance because the left/right outer join syntax (deprecated in SQL Server, where it doesn't work correctly right now anyway) differs from vendor to vendor. Since you shouldn't mix implicit and explict joins in the same query (you can get wrong results), needing to change something to a left join means rewriting the entire query.

If you do it the first way, people under the age of 30 will probably chuckle at you, but as long as you're doing an inner join, they produce the same result and the optimizer will generate the same execution plan (at least as far as I've ever been able to tell).
This does of course presume that the where clause in the first query is how you would be joining in the second query.
This will probably get closed as a duplicate, btw.

The nice part of the second method is that it helps separates the join condition (on ...) from the filter condition (where ...). This can help make the intent of the query more readable.
The join condition will typically be more descriptive of the structure of the database and the relation between the tables. e.g., the salary table is related to the employee table by the EmployeeID column, and queries involving those two tables will probably always join on that column.
The filter condition is more descriptive of the specific task being performed by the query. If the query is FindRichPeople, the where clause might be "where salaries.Salary > 1000000"... thats describing the task at hand, not the database structure.
Note that the SQL compiler doesn't see it that way... if it decides that it will be faster to cross join and then filter the results, it will cross join and filter the results. It doesn't care what is in the ON clause and whats in the WHERE clause. But, that typically wont happen if the on clause matches a foreign key or joins to a primary key or indexed column. As far as operating correctly, they are identical; as far as writing readable, maintainable code, the second way is probably a little better.

there is no difference as far as I know is the second one with the inner join the new way to write such statements and the first one the old method.

The first one does a Cartesian product on all record within those two tables then filters by the where clause.
The second only joins on records that meet the requirements of your ON clause.
EDIT: As others have indicated, the optimization engine will take care of an attempt on a Cartesian product and will result in the same query more or less.

A bit same. Can help you out.
Left join vs multiple tables in SQL (a)
Left join vs multiple tables in SQL (b)

In the example you've given, the queries are equivalent; if you're using SQL Server, run the query and display the actual exection plan to see what the server's doing internally.

Related

Using "From Multiple Tables" or "Join" performance difference [duplicate]

Most SQL dialects accept both the following queries:
SELECT a.foo, b.foo
FROM a, b
WHERE a.x = b.x
SELECT a.foo, b.foo
FROM a
LEFT JOIN b ON a.x = b.x
Now obviously when you need an outer join, the second syntax is required. But when doing an inner join why should I prefer the second syntax to the first (or vice versa)?
The old syntax, with just listing the tables, and using the WHERE clause to specify the join criteria, is being deprecated in most modern databases.
It's not just for show, the old syntax has the possibility of being ambiguous when you use both INNER and OUTER joins in the same query.
Let me give you an example.
Let's suppose you have 3 tables in your system:
Company
Department
Employee
Each table contain numerous rows, linked together. You got multiple companies, and each company can have multiple departments, and each department can have multiple employees.
Ok, so now you want to do the following:
List all the companies, and include all their departments, and all their employees. Note that some companies don't have any departments yet, but make sure you include them as well. Make sure you only retrieve departments that have employees, but always list all companies.
So you do this:
SELECT * -- for simplicity
FROM Company, Department, Employee
WHERE Company.ID *= Department.CompanyID
AND Department.ID = Employee.DepartmentID
Note that the last one there is an inner join, in order to fulfill the criteria that you only want departments with people.
Ok, so what happens now. Well, the problem is, it depends on the database engine, the query optimizer, indexes, and table statistics. Let me explain.
If the query optimizer determines that the way to do this is to first take a company, then find the departments, and then do an inner join with employees, you're not going to get any companies that don't have departments.
The reason for this is that the WHERE clause determines which rows end up in the final result, not individual parts of the rows.
And in this case, due to the left join, the Department.ID column will be NULL, and thus when it comes to the INNER JOIN to Employee, there's no way to fulfill that constraint for the Employee row, and so it won't appear.
On the other hand, if the query optimizer decides to tackle the department-employee join first, and then do a left join with the companies, you will see them.
So the old syntax is ambiguous. There's no way to specify what you want, without dealing with query hints, and some databases have no way at all.
Enter the new syntax, with this you can choose.
For instance, if you want all companies, as the problem description stated, this is what you would write:
SELECT *
FROM Company
LEFT JOIN (
Department INNER JOIN Employee ON Department.ID = Employee.DepartmentID
) ON Company.ID = Department.CompanyID
Here you specify that you want the department-employee join to be done as one join, and then left join the results of that with the companies.
Additionally, let's say you only want departments that contains the letter X in their name. Again, with old style joins, you risk losing the company as well, if it doesn't have any departments with an X in its name, but with the new syntax, you can do this:
SELECT *
FROM Company
LEFT JOIN (
Department INNER JOIN Employee ON Department.ID = Employee.DepartmentID
) ON Company.ID = Department.CompanyID AND Department.Name LIKE '%X%'
This extra clause is used for the joining, but is not a filter for the entire row. So the row might appear with company information, but might have NULLs in all the department and employee columns for that row, because there is no department with an X in its name for that company. This is hard with the old syntax.
This is why, amongst other vendors, Microsoft has deprecated the old outer join syntax, but not the old inner join syntax, since SQL Server 2005 and upwards. The only way to talk to a database running on Microsoft SQL Server 2005 or 2008, using the old style outer join syntax, is to set that database in 8.0 compatibility mode (aka SQL Server 2000).
Additionally, the old way, by throwing a bunch of tables at the query optimizer, with a bunch of WHERE clauses, was akin to saying "here you are, do the best you can". With the new syntax, the query optimizer has less work to do in order to figure out what parts goes together.
So there you have it.
LEFT and INNER JOIN is the wave of the future.
The JOIN syntax keeps conditions near the table they apply to. This is especially useful when you join a large amount of tables.
By the way, you can do an outer join with the first syntax too:
WHERE a.x = b.x(+)
Or
WHERE a.x *= b.x
Or
WHERE a.x = b.x or a.x not in (select x from b)
Basically, when your FROM clause lists tables like so:
SELECT * FROM
tableA, tableB, tableC
the result is a cross product of all the rows in tables A, B, C. Then you apply the restriction WHERE tableA.id = tableB.a_id which will throw away a huge number of rows, then further ... AND tableB.id = tableC.b_id and you should then get only those rows you are really interested in.
DBMSs know how to optimise this SQL so that the performance difference to writing this using JOINs is negligible (if any). Using the JOIN notation makes the SQL statement more readable (IMHO, not using joins turns the statement into a mess). Using the cross product, you need to provide join criteria in the WHERE clause, and that's the problem with the notation. You are crowding your WHERE clause with stuff like
tableA.id = tableB.a_id
AND tableB.id = tableC.b_id
which is only used to restrict the cross product. WHERE clause should only contain RESTRICTIONS to the resultset. If you mix table join criteria with resultset restrictions, you (and others) will find your query harder to read. You should definitely use JOINs and keep the FROM clause a FROM clause, and the WHERE clause a WHERE clause.
The first way is the older standard. The second method was introduced in SQL-92, http://en.wikipedia.org/wiki/SQL. The complete standard can be viewed at http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt .
It took many years before database companies adopted the SQL-92 standard.
So the reason why the second method is preferred, it is the SQL standard according the ANSI and ISO standards committee.
The second is preferred because it is far less likely to result in an accidental cross join by forgetting to put inthe where clause. A join with no on clause will fail the syntax check, an old style join with no where clause will not fail, it will do a cross join.
Additionally when you later have to a left join, it is helpful for maintenance that they all be in the same structure. And the old syntax has been out of date since 1992, it is well past time to stop using it.
Plus I have found that many people who exclusively use the first syntax don't really understand joins and understanding joins is critical to getting correct results when querying.
I think there are some good reasons on this page to adopt the second method -using explicit JOINs. The clincher though is that when the JOIN criteria are removed from the WHERE clause it becomes much easier to see the remaining selection criteria in the WHERE clause.
In really complex SELECT statements it becomes much easier for a reader to understand what is going on.
The SELECT * FROM table1, table2, ... syntax is ok for a couple of tables, but it becomes exponentially (not necessarily a mathematically accurate statement) harder and harder to read as the number of tables increases.
The JOIN syntax is harder to write (at the beginning), but it makes it explicit what criteria affects which tables. This makes it much harder to make a mistake.
Also, if all the joins are INNER, then both versions are equivalent. However, the moment you have an OUTER join anywhere in the statement, things get much more complicated and it's virtually guarantee that what you write won't be querying what you think you wrote.
When you need an outer join the second syntax is not always required:
Oracle:
SELECT a.foo, b.foo
FROM a, b
WHERE a.x = b.x(+)
MSSQLServer (although it's been deprecated in 2000 version)/Sybase:
SELECT a.foo, b.foo
FROM a, b
WHERE a.x *= b.x
But returning to your question. I don't know the answer, but it is probably related to the fact that a join is more natural (syntactically, at least) than adding an expression to a where clause when you are doing exactly that: joining.
I hear a lot of people complain the first one is too difficult to understand and that it is unclear. I don't see a problem with it, but after having that discussion, I use the second one even on INNER JOINS for clarity.
To the database, they end up being the same. For you, though, you'll have to use that second syntax in some situations. For the sake of editing queries that end up having to use it (finding out you needed a left join where you had a straight join), and for consistency, I'd pattern only on the 2nd method. It'll make reading queries easier.
Well the first and second queries may yield different results because a LEFT JOIN includes all records from the first table, even if there are no corresponding records in the right table.
If both are inner joins there is no difference in the semantics or the execution of the SQL or performance. Both are ANSI Standard SQL It is purely a matter of preference, of coding standards within your work group.
Over the last 25 years, I've developed the habit that if I have a fairly complicated SQL I will use the INNER JOIN syntax because it is easier for the reader to pick out the structure of the query at a glance. It also gives more clarity by singling out the join conditions from the residual conditions, which can save time (and mistakes) if you ever come back to your query months later.
However for outer joins, for the purpose of clarity I would not under any circumstances use non-ansi extensions.

Left join or Select in select (SQL - Speed of query)

I have something like this:
SELECT CompanyId
FROM Company
WHERE CompanyId not in
(SELECT CompanyId
FROM Company
WHERE (IsPublic = 0) and CompanyId NOT IN
(SELECT ShoppingLike.WhichId
FROM Company
INNER JOIN
ShoppingLike ON Company.CompanyId = ShoppingLike.UserId
WHERE (ShoppingLike.IsWaiting = 0) AND
(ShoppingLike.ShoppingScoreTypeId = 2) AND
(ShoppingLike.UserId = 75)
)
)
It has 3 select, I want to know how could I have it without making 3 selects, and which one has better speed for 1 million record? "select in select" or "left join"?
My experiences are from Oracle. There is never a correct answer to optimising tricky queries, it's a collaboration between you and the optimiser. You need to check explain plans and sometimes traces, often at each stage of writing the query, to find out what the optimiser in thinking. Having said that:
You could remove the outer SELECT by putting the entire contents of it's subquery WHERE clause in a NOT(...). On the face of it will prevent that outer full scan of Company (or it's index of CompanyId). Try it, check the output is the same and get timings, then remove it temporarily before trying the below. The NOT() may well cause the optimiser to stop considering an ANTI-JOIN against the ShoppingLike subquery due to an implicit OR being created.
Ensure that CompanyId and WhichId are defined as NOT NULL columns. Without this (or the likes of an explicit CompanyId IS NOT NULL) then ANTI-JOIN options are often discarded.
The inner most subquery is not correlated (does not reference anything from it's outer query) so can be extracted and tuned separately. As a matter of style I'd swap the table names round the INNER JOIN as you want ShoppingLike scanned first as it has all the filters against it. It wont make any difference but it reads easier and makes it possible to use a hint to scan tables in the order specified. I would even question the need for the Company table in this subquery.
You've used NOT IN when sometimes the very similar NOT EXISTS gives the optimiser more/alternative options.
All the above is just trial and error unless you start trying the explain plan. Oracle can, with a following wind, convert between LEFT JOIN and IN SELECT. 1M+ rows will create time to invest.

When to use SQL natural join instead of join .. on?

I'm studying SQL for a database exam and the way I've seen SQL is they way it looks on this page:
http://en.wikipedia.org/wiki/Star_schema
IE join written the way Join <table name> On <table attribute> and then the join condition for the selection. My course book and my exercises given to me from the academic institution however, use only natural join in their examples. So when is it right to use natural join? Should natural join be used if the query can also be written using JOIN .. ON ?
Thanks for any answer or comment
A natural join will find columns with the same name in both tables and add one column in the result for each pair found. The inner join lets you specify the comparison you want to make using any column.
IMO, the JOIN ON syntax is much more readable and maintainable than the natural join syntax. Natural joins is a leftover of some old standards, and I try to avoid it like the plague.
A natural join will find columns with the same name in both tables and add one column in the result for each pair found. The inner join lets you specify the comparison you want to make using any column.
The JOIN keyword is used in an SQL statement to query data from two or more tables, based on a relationship between certain columns in these tables.
Different Joins
* JOIN: Return rows when there is at least one match in both tables
* LEFT JOIN: Return all rows from the left table, even if there are no matches in the right table
* RIGHT JOIN: Return all rows from the right table, even if there are no matches in the left table
* FULL JOIN: Return rows when there is a match in one of the tables
INNER JOIN
http://www.w3schools.com/sql/sql_join_inner.asp
FULL JOIN
http://www.w3schools.com/sql/sql_join_full.asp
A natural join is said to be an abomination because it does not allow qualifying key columns, which makes it confusing. Because you never know which "common" columns are being used to join two tables simply by looking at the sql statement.
A NATURAL JOIN matches on any shared column names between the tables, whereas an INNER JOIN only matches on the given ON condition.
The joins often interchangeable and usually produce the same results. However, there are some important considerations to make:
If a NATURAL JOIN finds no matching columns, it returns the cross
product. This could produce disastrous results if the schema is
modified. On the other hand, an INNER JOIN will return a 'column does
not exist' error. This is much more fault tolerant.
An INNER JOIN self-documents with its ON clause, resulting in a
clearer query that describes the table schema to the reader.
An INNER JOIN results in a maintainable and reusable query in
which the column names can be swapped in and out with changes in the
use case or table schema.
The programmer can notice column name mis-matches (e.g. item_ID vs itemID) sooner if they are forced to define the ON predicate.
Otherwise, a NATURAL JOIN is still a good choice for a quick, ad-hoc query.

Translating Oracle SQL to Access Jet SQL, Left Join

There must be something I'm missing here. I have this nice, pretty Oracle SQL statement in Toad that gives me back a list of all active personnel with the IDs that I want:
SELECT PERSONNEL.PERSON_ID,
PERSONNEL.NAME_LAST_KEY,
PERSONNEL.NAME_FIRST_KEY,
PA_EID.ALIAS EID,
PA_IDTWO.ALIAS IDTWO,
PA_LIC.ALIAS LICENSENO
FROM PERSONNEL
LEFT JOIN PERSONNEL_ALIAS PA_EID
ON PERSONNEL.PERSON_ID = PA_EID.PERSON_ID
AND PA_EID.PERSONNEL_ALIAS_TYPE_CD = 1086
AND PA_EID.ALIAS_POOL_CD = 3796547
AND PERSONNEL.ACTIVE_IND = 1
LEFT JOIN PERSONNEL_ALIAS PA_IDTWO
ON PERSONNEL.PERSON_ID = PA_IDTWO.PERSON_ID
AND PA_IDTWO.PERSONNEL_ALIAS_TYPE_CD = 3839085
AND PA_IDTWO.ACTIVE_IND = 1
LEFT JOIN PERSONNEL_ALIAS PA_LIC
ON PERSONNEL.PERSON_ID = PA_LIC.PERSON_ID
AND PA_LIC.PERSONNEL_ALIAS_TYPE_CD = 1087
AND PA_LIC.ALIAS_POOL_CD = 683988
AND PA_LIC.ACTIVE_IND = 1
WHERE PERSONNEL.ACTIVE_IND = 1 AND PERSONNEL.PHYSICIAN_IND = 1;
This works very nicely. Where I run into problems is when I put it into Access. I know, I know, Access Sucks. Sometimes one needs to use it, especially if one has multiple database types that they just want to store a few queries in, and especially if one's boss only knows Access. Anyway, I was having trouble with the ANDs inside the FROM, so I moved those to the WHERE, but for some odd reason, Access isn't doing the LEFT JOINs, returning only those personnel with EID, IDTWO, and LICENSENO's. Not everybody has all three of these.
Best shot in Access so far is:
SELECT PERSONNEL.PERSON_ID,
PERSONNEL.NAME_LAST_KEY,
PERSONNEL.NAME_FIRST_KEY,
PA_EID.ALIAS AS EID,
PA_IDTWO.ALIAS AS ID2,
PA_LIC.ALIAS AS LICENSENO
FROM ((PERSONNEL
LEFT JOIN PERSONNEL_ALIAS AS PA_EID ON PERSONNEL.PERSON_ID=PA_EID.PERSON_ID)
LEFT JOIN PERSONNEL_ALIAS AS PA_IDTWO ON PERSONNEL.PERSON_ID=PA_IDTWO.PERSON_ID)
LEFT JOIN PERSONNEL_ALIAS AS PA_LIC ON PERSONNEL.PERSON_ID=PA_LIC.PERSON_ID
WHERE (((PERSONNEL.ACTIVE_IND)=1)
AND ((PERSONNEL.PHYSICIAN_IND)=1)
AND ((PA_EID.PRSNL_ALIAS_TYPE_CD)=1086)
AND ((PA_EID.ALIAS_POOL_CD)=3796547)
AND ((PA_IDTWO.PRSNL_ALIAS_TYPE_CD)=3839085)
AND ((PA_IDTWO.ACTIVE_IND)=1)
AND ((PA_LIC.PRSNL_ALIAS_TYPE_CD)=1087)
AND ((PA_LIC.ALIAS_POOL_CD)=683988)
AND ((PA_LIC.ACTIVE_IND)=1));
I think that part of the problem could be that I'm using the same alias (lookup) table for all three joins. Maybe there's a more efficient way of doing this? Still new to SQL land, so any tips as far as that goes would be great. I feel like these should be equivalent, but the Toad query gives me back many many tens of thousands of imperfect rows, and Access gives me fewer than 500. I need to find everybody so that nobody is left out. It's almost as if the LEFT JOINs aren't working at all in Access.
To understand what you are doing, let's look at simplified version of your query:
SELECT PERSONNEL.PERSON_ID,
PA_EID.ALIAS AS EID
FROM PERSONNEL
LEFT JOIN PERSONNEL_ALIAS AS PA_EID ON PERSONNEL.PERSON_ID=PA_EID.PERSON_ID
WHERE PERSONNEL.ACTIVE_IND=1
AND PERSONNEL.PHYSICIAN_IND=1
AND PA_EID.PRSNL_ALIAS_TYPE_CD=1086
AND PA_EID.ALIAS_POOL_CD=3796547
If the LEFT JOIN finds match, your row might look like this:
Person_ID EID
12345 JDB
If it doesn't find a match, (disregard the WHERE clause for a second), it could look like:
Person_ID EID
12345 NULL
When you add the WHERE clauses above, you are telling it to only find records in the PERSONNEL_ALIAS table that meet the condition, but if no records are found, then the values are considered NULL, so they will never satisfy the WHERE condition and no records will come back...
As Joe Stefanelli said in his comment, adding a WHERE clause to a LEFT JOIN'ed table make it act as an INNER JOIN instead...
Further to #Sparky's answer, to get the equivalent of what you're doing in Oracle, you need to filter rows from the tables on the "outer" side of the joins before you join them. One way to do this might be:
For each table on the "outer" side of a join that you need to filter rows from (that is, the three instances of PERSONNEL_ALIAS), create a query that filters the rows you want. For example, the first query (say, named PA_EID) might look something like this:SELECT PERSONNEL_ALIAS.* FROM PERSONNEL_ALIAS WHERE PERSONNEL_ALIAS.PERSONNEL_ALIAS_TYPE_CD = 1086 AND PERSONNEL_ALIAS.ALIAS_POOL_CD = 3796547
In your "best shot in Access so far" query in the original post: a) replace each instance of PERSONNEL_ALIAS with the corresponding query created in Step 1, and, b) remove the corresponding conditions (on PA_EID, PA_IDTWO, and PA_LIC) from the WHERE clause.

INNER JOIN ON vs WHERE clause

For simplicity, assume all relevant fields are NOT NULL.
You can do:
SELECT
table1.this, table2.that, table2.somethingelse
FROM
table1, table2
WHERE
table1.foreignkey = table2.primarykey
AND (some other conditions)
Or else:
SELECT
table1.this, table2.that, table2.somethingelse
FROM
table1 INNER JOIN table2
ON table1.foreignkey = table2.primarykey
WHERE
(some other conditions)
Do these two work on the same way in MySQL?
INNER JOIN is ANSI syntax that you should use.
It is generally considered more readable, especially when you join lots of tables.
It can also be easily replaced with an OUTER JOIN whenever a need arises.
The WHERE syntax is more relational model oriented.
A result of two tables JOINed is a cartesian product of the tables to which a filter is applied which selects only those rows with joining columns matching.
It's easier to see this with the WHERE syntax.
As for your example, in MySQL (and in SQL generally) these two queries are synonyms.
Also, note that MySQL also has a STRAIGHT_JOIN clause.
Using this clause, you can control the JOIN order: which table is scanned in the outer loop and which one is in the inner loop.
You cannot control this in MySQL using WHERE syntax.
Others have pointed out that INNER JOIN helps human readability, and that's a top priority, I agree.
Let me try to explain why the join syntax is more readable.
A basic SELECT query is this:
SELECT stuff
FROM tables
WHERE conditions
The SELECT clause tells us what we're getting back; the FROM clause tells us where we're getting it from, and the WHERE clause tells us which ones we're getting.
JOIN is a statement about the tables, how they are bound together (conceptually, actually, into a single table).
Any query elements that control the tables - where we're getting stuff from - semantically belong to the FROM clause (and of course, that's where JOIN elements go). Putting joining-elements into the WHERE clause conflates the which and the where-from, that's why the JOIN syntax is preferred.
Applying conditional statements in ON / WHERE
Here I have explained the logical query processing steps.
Reference: Inside Microsoft® SQL Server™ 2005 T-SQL Querying
Publisher: Microsoft Press
Pub Date: March 07, 2006
Print ISBN-10: 0-7356-2313-9
Print ISBN-13: 978-0-7356-2313-2
Pages: 640
Inside Microsoft® SQL Server™ 2005 T-SQL Querying
(8) SELECT (9) DISTINCT (11) TOP <top_specification> <select_list>
(1) FROM <left_table>
(3) <join_type> JOIN <right_table>
(2) ON <join_condition>
(4) WHERE <where_condition>
(5) GROUP BY <group_by_list>
(6) WITH {CUBE | ROLLUP}
(7) HAVING <having_condition>
(10) ORDER BY <order_by_list>
The first noticeable aspect of SQL that is different than other programming languages is the order in which the code is processed. In most programming languages, the code is processed in the order in which it is written. In SQL, the first clause that is processed is the FROM clause, while the SELECT clause, which appears first, is processed almost last.
Each step generates a virtual table that is used as the input to the following step. These virtual tables are not available to the caller (client application or outer query). Only the table generated by the final step is returned to the caller. If a certain clause is not specified in a query, the corresponding step is simply skipped.
Brief Description of Logical Query Processing Phases
Don't worry too much if the description of the steps doesn't seem to make much sense for now. These are provided as a reference. Sections that come after the scenario example will cover the steps in much more detail.
FROM: A Cartesian product (cross join) is performed between the first two tables in the FROM clause, and as a result, virtual table VT1 is generated.
ON: The ON filter is applied to VT1. Only rows for which the <join_condition> is TRUE are inserted to VT2.
OUTER (join): If an OUTER JOIN is specified (as opposed to a CROSS JOIN or an INNER JOIN), rows from the preserved table or tables for which a match was not found are added to the rows from VT2 as outer rows, generating VT3. If more than two tables appear in the FROM clause, steps 1 through 3 are applied repeatedly between the result of the last join and the next table in the FROM clause until all tables are processed.
WHERE: The WHERE filter is applied to VT3. Only rows for which the <where_condition> is TRUE are inserted to VT4.
GROUP BY: The rows from VT4 are arranged in groups based on the column list specified in the GROUP BY clause. VT5 is generated.
CUBE | ROLLUP: Supergroups (groups of groups) are added to the rows from VT5, generating VT6.
HAVING: The HAVING filter is applied to VT6. Only groups for which the <having_condition> is TRUE are inserted to VT7.
SELECT: The SELECT list is processed, generating VT8.
DISTINCT: Duplicate rows are removed from VT8. VT9 is generated.
ORDER BY: The rows from VT9 are sorted according to the column list specified in the ORDER BY clause. A cursor is generated (VC10).
TOP: The specified number or percentage of rows is selected from the beginning of VC10. Table VT11 is generated and returned to the caller.
Therefore, (INNER JOIN) ON will filter the data (the data count of VT will be reduced here itself) before applying the WHERE clause. The subsequent join conditions will be executed with filtered data which improves performance. After that, only the WHERE condition will apply filter conditions.
(Applying conditional statements in ON / WHERE will not make much difference in few cases. This depends on how many tables you have joined and the number of rows available in each join tables)
The implicit join ANSI syntax is older, less obvious, and not recommended.
In addition, the relational algebra allows interchangeability of the predicates in the WHERE clause and the INNER JOIN, so even INNER JOIN queries with WHERE clauses can have the predicates rearranged by the optimizer.
I recommend you write the queries in the most readable way possible.
Sometimes this includes making the INNER JOIN relatively "incomplete" and putting some of the criteria in the WHERE simply to make the lists of filtering criteria more easily maintainable.
For example, instead of:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
AND c.State = 'NY'
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
AND a.Status = 1
Write:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
AND a.Status = 1
But it depends, of course.
Implicit joins (which is what your first query is known as) become much much more confusing, hard to read, and hard to maintain once you need to start adding more tables to your query. Imagine doing that same query and type of join on four or five different tables ... it's a nightmare.
Using an explicit join (your second example) is much more readable and easy to maintain.
I'll also point out that using the older syntax is more subject to error. If you use inner joins without an ON clause, you will get a syntax error. If you use the older syntax and forget one of the join conditions in the where clause, you will get a cross join. The developers often fix this by adding the distinct keyword (rather than fixing the join because they still don't realize the join itself is broken) which may appear to cure the problem but will slow down the query considerably.
Additionally for maintenance if you have a cross join in the old syntax, how will the maintainer know if you meant to have one (there are situations where cross joins are needed) or if it was an accident that should be fixed?
Let me point you to this question to see why the implicit syntax is bad if you use left joins.
Sybase *= to Ansi Standard with 2 different outer tables for same inner table
Plus (personal rant here), the standard using the explicit joins is over 20 years old, which means implicit join syntax has been outdated for those 20 years. Would you write application code using a syntax that has been outdated for 20 years? Why do you want to write database code that is?
The SQL:2003 standard changed some precedence rules so a JOIN statement takes precedence over a "comma" join. This can actually change the results of your query depending on how it is setup. This cause some problems for some people when MySQL 5.0.12 switched to adhering to the standard.
So in your example, your queries would work the same. But if you added a third table:
SELECT ... FROM table1, table2 JOIN table3 ON ... WHERE ...
Prior to MySQL 5.0.12, table1 and table2 would be joined first, then table3. Now (5.0.12 and on), table2 and table3 are joined first, then table1. It doesn't always change the results, but it can and you may not even realize it.
I never use the "comma" syntax anymore, opting for your second example. It's a lot more readable anyway, the JOIN conditions are with the JOINs, not separated into a separate query section.
They have a different human-readable meaning.
However, depending on the query optimizer, they may have the same meaning to the machine.
You should always code to be readable.
That is to say, if this is a built-in relationship, use the explicit join. if you are matching on weakly related data, use the where clause.
I know you're talking about MySQL, but anyway:
In Oracle 9 explicit joins and implicit joins would generate different execution plans. AFAIK that has been solved in Oracle 10+: there's no such difference anymore.
If you are often programming dynamic stored procedures, you will fall in love with your second example (using where). If you have various input parameters and lots of morph mess, then that is the only way. Otherwise, they both will run the same query plan so there is definitely no obvious difference in classic queries.
ANSI join syntax is definitely more portable.
I'm going through an upgrade of Microsoft SQL Server, and I would also mention that the =* and *= syntax for outer joins in SQL Server is not supported (without compatibility mode) for 2005 SQL server and later.
I have two points for the implicit join (The second example):
Tell the database what you want, not what it should do.
You can write all tables in a clear list that is not cluttered by join conditions. Then you can much easier read what tables are all mentioned. The conditions come all in the WHERE part, where they are also all lined up one below the other. Using the JOIN keyword mixes up tables and conditions.