Using "From Multiple Tables" or "Join" performance difference [duplicate] - sql

Most SQL dialects accept both the following queries:
SELECT a.foo, b.foo
FROM a, b
WHERE a.x = b.x
SELECT a.foo, b.foo
FROM a
LEFT JOIN b ON a.x = b.x
Now obviously when you need an outer join, the second syntax is required. But when doing an inner join why should I prefer the second syntax to the first (or vice versa)?

The old syntax, with just listing the tables, and using the WHERE clause to specify the join criteria, is being deprecated in most modern databases.
It's not just for show, the old syntax has the possibility of being ambiguous when you use both INNER and OUTER joins in the same query.
Let me give you an example.
Let's suppose you have 3 tables in your system:
Company
Department
Employee
Each table contain numerous rows, linked together. You got multiple companies, and each company can have multiple departments, and each department can have multiple employees.
Ok, so now you want to do the following:
List all the companies, and include all their departments, and all their employees. Note that some companies don't have any departments yet, but make sure you include them as well. Make sure you only retrieve departments that have employees, but always list all companies.
So you do this:
SELECT * -- for simplicity
FROM Company, Department, Employee
WHERE Company.ID *= Department.CompanyID
AND Department.ID = Employee.DepartmentID
Note that the last one there is an inner join, in order to fulfill the criteria that you only want departments with people.
Ok, so what happens now. Well, the problem is, it depends on the database engine, the query optimizer, indexes, and table statistics. Let me explain.
If the query optimizer determines that the way to do this is to first take a company, then find the departments, and then do an inner join with employees, you're not going to get any companies that don't have departments.
The reason for this is that the WHERE clause determines which rows end up in the final result, not individual parts of the rows.
And in this case, due to the left join, the Department.ID column will be NULL, and thus when it comes to the INNER JOIN to Employee, there's no way to fulfill that constraint for the Employee row, and so it won't appear.
On the other hand, if the query optimizer decides to tackle the department-employee join first, and then do a left join with the companies, you will see them.
So the old syntax is ambiguous. There's no way to specify what you want, without dealing with query hints, and some databases have no way at all.
Enter the new syntax, with this you can choose.
For instance, if you want all companies, as the problem description stated, this is what you would write:
SELECT *
FROM Company
LEFT JOIN (
Department INNER JOIN Employee ON Department.ID = Employee.DepartmentID
) ON Company.ID = Department.CompanyID
Here you specify that you want the department-employee join to be done as one join, and then left join the results of that with the companies.
Additionally, let's say you only want departments that contains the letter X in their name. Again, with old style joins, you risk losing the company as well, if it doesn't have any departments with an X in its name, but with the new syntax, you can do this:
SELECT *
FROM Company
LEFT JOIN (
Department INNER JOIN Employee ON Department.ID = Employee.DepartmentID
) ON Company.ID = Department.CompanyID AND Department.Name LIKE '%X%'
This extra clause is used for the joining, but is not a filter for the entire row. So the row might appear with company information, but might have NULLs in all the department and employee columns for that row, because there is no department with an X in its name for that company. This is hard with the old syntax.
This is why, amongst other vendors, Microsoft has deprecated the old outer join syntax, but not the old inner join syntax, since SQL Server 2005 and upwards. The only way to talk to a database running on Microsoft SQL Server 2005 or 2008, using the old style outer join syntax, is to set that database in 8.0 compatibility mode (aka SQL Server 2000).
Additionally, the old way, by throwing a bunch of tables at the query optimizer, with a bunch of WHERE clauses, was akin to saying "here you are, do the best you can". With the new syntax, the query optimizer has less work to do in order to figure out what parts goes together.
So there you have it.
LEFT and INNER JOIN is the wave of the future.

The JOIN syntax keeps conditions near the table they apply to. This is especially useful when you join a large amount of tables.
By the way, you can do an outer join with the first syntax too:
WHERE a.x = b.x(+)
Or
WHERE a.x *= b.x
Or
WHERE a.x = b.x or a.x not in (select x from b)

Basically, when your FROM clause lists tables like so:
SELECT * FROM
tableA, tableB, tableC
the result is a cross product of all the rows in tables A, B, C. Then you apply the restriction WHERE tableA.id = tableB.a_id which will throw away a huge number of rows, then further ... AND tableB.id = tableC.b_id and you should then get only those rows you are really interested in.
DBMSs know how to optimise this SQL so that the performance difference to writing this using JOINs is negligible (if any). Using the JOIN notation makes the SQL statement more readable (IMHO, not using joins turns the statement into a mess). Using the cross product, you need to provide join criteria in the WHERE clause, and that's the problem with the notation. You are crowding your WHERE clause with stuff like
tableA.id = tableB.a_id
AND tableB.id = tableC.b_id
which is only used to restrict the cross product. WHERE clause should only contain RESTRICTIONS to the resultset. If you mix table join criteria with resultset restrictions, you (and others) will find your query harder to read. You should definitely use JOINs and keep the FROM clause a FROM clause, and the WHERE clause a WHERE clause.

The first way is the older standard. The second method was introduced in SQL-92, http://en.wikipedia.org/wiki/SQL. The complete standard can be viewed at http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt .
It took many years before database companies adopted the SQL-92 standard.
So the reason why the second method is preferred, it is the SQL standard according the ANSI and ISO standards committee.

The second is preferred because it is far less likely to result in an accidental cross join by forgetting to put inthe where clause. A join with no on clause will fail the syntax check, an old style join with no where clause will not fail, it will do a cross join.
Additionally when you later have to a left join, it is helpful for maintenance that they all be in the same structure. And the old syntax has been out of date since 1992, it is well past time to stop using it.
Plus I have found that many people who exclusively use the first syntax don't really understand joins and understanding joins is critical to getting correct results when querying.

I think there are some good reasons on this page to adopt the second method -using explicit JOINs. The clincher though is that when the JOIN criteria are removed from the WHERE clause it becomes much easier to see the remaining selection criteria in the WHERE clause.
In really complex SELECT statements it becomes much easier for a reader to understand what is going on.

The SELECT * FROM table1, table2, ... syntax is ok for a couple of tables, but it becomes exponentially (not necessarily a mathematically accurate statement) harder and harder to read as the number of tables increases.
The JOIN syntax is harder to write (at the beginning), but it makes it explicit what criteria affects which tables. This makes it much harder to make a mistake.
Also, if all the joins are INNER, then both versions are equivalent. However, the moment you have an OUTER join anywhere in the statement, things get much more complicated and it's virtually guarantee that what you write won't be querying what you think you wrote.

When you need an outer join the second syntax is not always required:
Oracle:
SELECT a.foo, b.foo
FROM a, b
WHERE a.x = b.x(+)
MSSQLServer (although it's been deprecated in 2000 version)/Sybase:
SELECT a.foo, b.foo
FROM a, b
WHERE a.x *= b.x
But returning to your question. I don't know the answer, but it is probably related to the fact that a join is more natural (syntactically, at least) than adding an expression to a where clause when you are doing exactly that: joining.

I hear a lot of people complain the first one is too difficult to understand and that it is unclear. I don't see a problem with it, but after having that discussion, I use the second one even on INNER JOINS for clarity.

To the database, they end up being the same. For you, though, you'll have to use that second syntax in some situations. For the sake of editing queries that end up having to use it (finding out you needed a left join where you had a straight join), and for consistency, I'd pattern only on the 2nd method. It'll make reading queries easier.

Well the first and second queries may yield different results because a LEFT JOIN includes all records from the first table, even if there are no corresponding records in the right table.

If both are inner joins there is no difference in the semantics or the execution of the SQL or performance. Both are ANSI Standard SQL It is purely a matter of preference, of coding standards within your work group.
Over the last 25 years, I've developed the habit that if I have a fairly complicated SQL I will use the INNER JOIN syntax because it is easier for the reader to pick out the structure of the query at a glance. It also gives more clarity by singling out the join conditions from the residual conditions, which can save time (and mistakes) if you ever come back to your query months later.
However for outer joins, for the purpose of clarity I would not under any circumstances use non-ansi extensions.

Related

SQL Multi-Table View Joins [duplicate]

I have my business-logic in ~7000 lines of T-SQL stored procedures, and most of them has next JOIN syntax:
SELECT A.A, B.B, C.C
FROM aaa AS A, bbb AS B, ccc AS C
WHERE
A.B = B.ID
AND B.C = C.ID
AND C.ID = #param
Will I get performance growth if I will replace such query with this:
SELECT A.A, B.B, C.C
FROM aaa AS A
JOIN bbb AS B
ON A.B = B.ID
JOIN ccc AS C
ON B.C = C.ID
AND C.ID = #param
Or they are the same?
The two queries are the same, except the second is ANSI-92 SQL syntax and the first is the older SQL syntax which didn't incorporate the join clause. They should produce exactly the same internal query plan, although you may like to check.
You should use the ANSI-92 syntax for several of reasons
The use of the JOIN clause separates
the relationship logic from the
filter logic (the WHERE) and is thus
cleaner and easier to understand.
It doesn't matter with this particular query, but there are a few circumstances where the older outer join syntax (using + ) is ambiguous and the query results are hence implementation dependent - or the query cannot be resolved at all. These do not occur with ANSI-92
It's good practice as most developers and dba's will use ANSI-92 nowadays and you should follow the standard. Certainly all modern query tools will generate ANSI-92.
As pointed out by #gbn, it does tend to avoid accidental cross joins.
Myself I resisted ANSI-92 for some time as there is a slight conceptual advantage to the old syntax as it's a easier to envisage the SQL as a mass Cartesian join of all tables used followed by a filtering operation - a mental technique that can be useful for grasping what a SQL query is doing. However I decided a few years ago that I needed to move with the times and after a relatively short adjustment period I now strongly prefer it - predominantly because of the first reason given above. The only place that one should depart from the ANSI-92 syntax, or rather not use the option, is with natural joins which are implicitly dangerous.
The second construct is known as the "infixed join syntax" in the SQL community. The first construct AFAIK doesn't have widely accepted name so let's call it the 'old style' inner join syntax.
The usual arguments go like this:
Pros of the 'Traditional' syntax: the
predicates are physically grouped together in the WHERE clause in
whatever order which makes the query generally, and n-ary relationships particularly, easier to read and understand (the ON clauses of the infixed syntax can spread out the predicates so you have to look for the appearance of one table or column over a visual distance).
Cons of the 'Traditional' syntax: There is no parse error when omitting one of the 'join' predicates and the result is a Cartesian product (known as a CROSS JOIN in the infixed syntax) and such an error can be tricky to detect and debug. Also, 'join' predicates and 'filtering' predicates are physically grouped together in the WHERE clause, which can cause them to be confused for one another.
The two queries are equal - the first is using non-ANSI JOIN syntax, the 2nd is ANSI JOIN syntax. I recommend sticking with the ANSI JOIN syntax.
And yes, LEFT OUTER JOINs (which, btw are also ANSI JOIN syntax) are what you want to use when there's a possibility that the table you're joining to might not contain any matching records.
Reference: Conditional Joins in SQL Server
OK, they execute the same. That's agreed.
Unlike many I use the older convention. That SQL-92 is "easier to understand" is debatable. Having written programming languages for pushing 40 years (gulp) I know that 'easy to read' begins first, before any other convention, with 'visual acuity' (misapplied term here but it's the best phrase I can use).
When reading SQL the FIRST thing you mind cares about is what tables are involved and then which table (most) defines the grain. Then you care about relevant constraints on the data, then the attributes selected. While SQL-92 mostly separates these ideas out, there are so many noise words, the mind's eye has to interpret and deal with these and it makes reading the SQL slower.
SELECT Mgt.attrib_a AS attrib_a
,Sta.attrib_b AS attrib_b
,Stb.attrib_c AS attrib_c
FROM Main_Grain_Table Mgt
,Surrounding_TabA Sta
,Surrounding_tabB Stb
WHERE Mgt.sta_join_col = Sta.sta_join_col
AND Mgt.stb_join_col = Stb.stb_join_col
AND Mgt.bus_logic_col = 'TIGHT'
Visual Acuity!
Put the commas for new attributes in front It makes commenting code easier too
Use a specific case for functions and keywords
Use a specific case for tables
Use a specific case for attributes
Vertically Line up operators and operations
Make the first table(s) in the FROM represent the grain of the data
Make the first tables of the WHERE be join constraints and let the specific, tight constraints float to the bottom.
Select 3 character alias for ALL tables in your database and use the alias EVERYWHERE you reference the table. You should use that alias as a prefix for (many) indexes on that table as well.
6 of 1 1/2 dozen of another, right? Maybe. But even if you're using ANSI-92 convention (as I have and in cases will continue to do) use visual acuity principles, verticle alignment to let your mind's eye avert to the places you want to see and and easily avoid things (particularly noise words) you don't need to.
Execute both and check their query plans. They should be equal.
In my mind the FROM clause is where I decide what columns I need in the rows for my SELECT clause to work on. It is where a business rule is expressed that will bring onto the same row, values needed in calculations. The business rule can be customers who have invoices, resulting in rows of invoices including the customer responsible. It could also be venues in the same postcode as clients, resulting in a list of venues and clients that are close together.
It is where I work out the centricity of the rows in my result set. After all, we are simply shown the metaphor of a list in RDBMSs, each list having a topic (the entity) and each row being an instance of the entity. If the row centricity is understood, the entity of the result set is understood.
The WHERE clause, which conceptually executes after the rows are defined in the from clause, culls rows not required (or includes rows that are required) for the SELECT clause to work on.
Because join logic can be expressed in both the FROM clause and the WHERE clause, and because the clauses exist to divide and conquer complex logic, I choose to put join logic that involves values in columns in the FROM clause because that is essentially expressing a business rule that is supported by matching values in columns.
i.e. I won't write a WHERE clause like this:
WHERE Column1 = Column2
I will put that in the FROM clause like this:
ON Column1 = Column2
Likewise, if a column is to be compared to external values (values that may or may not be in a column) such as comparing a postcode to a specific postcode, I will put that in the WHERE clause because I am essentially saying I only want rows like this.
i.e. I won't write a FROM clause like this:
ON PostCode = '1234'
I will put that in the WHERE clause like this:
WHERE PostCode = '1234'
ANSI syntax does enforce neither predicate placement in the proper clause (be that ON or WHERE), nor the affinity of the ON clause to adjacent table reference. A developer is free to write a mess like this
SELECT
C.FullName,
C.CustomerCode,
O.OrderDate,
O.OrderTotal,
OD.ExtendedShippingNotes
FROM
Customer C
CROSS JOIN Order O
INNER JOIN OrderDetail OD
ON C.CustomerID = O.CustomerID
AND C.CustomerStatus = 'Preferred'
AND O.OrderTotal > 1000.0
WHERE
O.OrderID = OD.OrderID;
Speaking of query tools who "will generate ANSI-92", I'm commenting here because it generated
SELECT 1
FROM DEPARTMENTS C
JOIN EMPLOYEES A
JOIN JOBS B
ON C.DEPARTMENT_ID = A.DEPARTMENT_ID
ON A.JOB_ID = B.JOB_ID
The only syntax that escapes conventional "restrict-project-cartesian product" is outer join. This operation is more complicated because it is not associative (both with itself and with normal join). One have to judiciously parenthesize query with outer join, at least. However, it is an exotic operation; if you are using it too often I suggest taking relational database class.

select * from a,b where a.id=b.id VS select * from a inner join b on a.id=b.id [duplicate]

I have my business-logic in ~7000 lines of T-SQL stored procedures, and most of them has next JOIN syntax:
SELECT A.A, B.B, C.C
FROM aaa AS A, bbb AS B, ccc AS C
WHERE
A.B = B.ID
AND B.C = C.ID
AND C.ID = #param
Will I get performance growth if I will replace such query with this:
SELECT A.A, B.B, C.C
FROM aaa AS A
JOIN bbb AS B
ON A.B = B.ID
JOIN ccc AS C
ON B.C = C.ID
AND C.ID = #param
Or they are the same?
The two queries are the same, except the second is ANSI-92 SQL syntax and the first is the older SQL syntax which didn't incorporate the join clause. They should produce exactly the same internal query plan, although you may like to check.
You should use the ANSI-92 syntax for several of reasons
The use of the JOIN clause separates
the relationship logic from the
filter logic (the WHERE) and is thus
cleaner and easier to understand.
It doesn't matter with this particular query, but there are a few circumstances where the older outer join syntax (using + ) is ambiguous and the query results are hence implementation dependent - or the query cannot be resolved at all. These do not occur with ANSI-92
It's good practice as most developers and dba's will use ANSI-92 nowadays and you should follow the standard. Certainly all modern query tools will generate ANSI-92.
As pointed out by #gbn, it does tend to avoid accidental cross joins.
Myself I resisted ANSI-92 for some time as there is a slight conceptual advantage to the old syntax as it's a easier to envisage the SQL as a mass Cartesian join of all tables used followed by a filtering operation - a mental technique that can be useful for grasping what a SQL query is doing. However I decided a few years ago that I needed to move with the times and after a relatively short adjustment period I now strongly prefer it - predominantly because of the first reason given above. The only place that one should depart from the ANSI-92 syntax, or rather not use the option, is with natural joins which are implicitly dangerous.
The second construct is known as the "infixed join syntax" in the SQL community. The first construct AFAIK doesn't have widely accepted name so let's call it the 'old style' inner join syntax.
The usual arguments go like this:
Pros of the 'Traditional' syntax: the
predicates are physically grouped together in the WHERE clause in
whatever order which makes the query generally, and n-ary relationships particularly, easier to read and understand (the ON clauses of the infixed syntax can spread out the predicates so you have to look for the appearance of one table or column over a visual distance).
Cons of the 'Traditional' syntax: There is no parse error when omitting one of the 'join' predicates and the result is a Cartesian product (known as a CROSS JOIN in the infixed syntax) and such an error can be tricky to detect and debug. Also, 'join' predicates and 'filtering' predicates are physically grouped together in the WHERE clause, which can cause them to be confused for one another.
The two queries are equal - the first is using non-ANSI JOIN syntax, the 2nd is ANSI JOIN syntax. I recommend sticking with the ANSI JOIN syntax.
And yes, LEFT OUTER JOINs (which, btw are also ANSI JOIN syntax) are what you want to use when there's a possibility that the table you're joining to might not contain any matching records.
Reference: Conditional Joins in SQL Server
OK, they execute the same. That's agreed.
Unlike many I use the older convention. That SQL-92 is "easier to understand" is debatable. Having written programming languages for pushing 40 years (gulp) I know that 'easy to read' begins first, before any other convention, with 'visual acuity' (misapplied term here but it's the best phrase I can use).
When reading SQL the FIRST thing you mind cares about is what tables are involved and then which table (most) defines the grain. Then you care about relevant constraints on the data, then the attributes selected. While SQL-92 mostly separates these ideas out, there are so many noise words, the mind's eye has to interpret and deal with these and it makes reading the SQL slower.
SELECT Mgt.attrib_a AS attrib_a
,Sta.attrib_b AS attrib_b
,Stb.attrib_c AS attrib_c
FROM Main_Grain_Table Mgt
,Surrounding_TabA Sta
,Surrounding_tabB Stb
WHERE Mgt.sta_join_col = Sta.sta_join_col
AND Mgt.stb_join_col = Stb.stb_join_col
AND Mgt.bus_logic_col = 'TIGHT'
Visual Acuity!
Put the commas for new attributes in front It makes commenting code easier too
Use a specific case for functions and keywords
Use a specific case for tables
Use a specific case for attributes
Vertically Line up operators and operations
Make the first table(s) in the FROM represent the grain of the data
Make the first tables of the WHERE be join constraints and let the specific, tight constraints float to the bottom.
Select 3 character alias for ALL tables in your database and use the alias EVERYWHERE you reference the table. You should use that alias as a prefix for (many) indexes on that table as well.
6 of 1 1/2 dozen of another, right? Maybe. But even if you're using ANSI-92 convention (as I have and in cases will continue to do) use visual acuity principles, verticle alignment to let your mind's eye avert to the places you want to see and and easily avoid things (particularly noise words) you don't need to.
Execute both and check their query plans. They should be equal.
In my mind the FROM clause is where I decide what columns I need in the rows for my SELECT clause to work on. It is where a business rule is expressed that will bring onto the same row, values needed in calculations. The business rule can be customers who have invoices, resulting in rows of invoices including the customer responsible. It could also be venues in the same postcode as clients, resulting in a list of venues and clients that are close together.
It is where I work out the centricity of the rows in my result set. After all, we are simply shown the metaphor of a list in RDBMSs, each list having a topic (the entity) and each row being an instance of the entity. If the row centricity is understood, the entity of the result set is understood.
The WHERE clause, which conceptually executes after the rows are defined in the from clause, culls rows not required (or includes rows that are required) for the SELECT clause to work on.
Because join logic can be expressed in both the FROM clause and the WHERE clause, and because the clauses exist to divide and conquer complex logic, I choose to put join logic that involves values in columns in the FROM clause because that is essentially expressing a business rule that is supported by matching values in columns.
i.e. I won't write a WHERE clause like this:
WHERE Column1 = Column2
I will put that in the FROM clause like this:
ON Column1 = Column2
Likewise, if a column is to be compared to external values (values that may or may not be in a column) such as comparing a postcode to a specific postcode, I will put that in the WHERE clause because I am essentially saying I only want rows like this.
i.e. I won't write a FROM clause like this:
ON PostCode = '1234'
I will put that in the WHERE clause like this:
WHERE PostCode = '1234'
ANSI syntax does enforce neither predicate placement in the proper clause (be that ON or WHERE), nor the affinity of the ON clause to adjacent table reference. A developer is free to write a mess like this
SELECT
C.FullName,
C.CustomerCode,
O.OrderDate,
O.OrderTotal,
OD.ExtendedShippingNotes
FROM
Customer C
CROSS JOIN Order O
INNER JOIN OrderDetail OD
ON C.CustomerID = O.CustomerID
AND C.CustomerStatus = 'Preferred'
AND O.OrderTotal > 1000.0
WHERE
O.OrderID = OD.OrderID;
Speaking of query tools who "will generate ANSI-92", I'm commenting here because it generated
SELECT 1
FROM DEPARTMENTS C
JOIN EMPLOYEES A
JOIN JOBS B
ON C.DEPARTMENT_ID = A.DEPARTMENT_ID
ON A.JOB_ID = B.JOB_ID
The only syntax that escapes conventional "restrict-project-cartesian product" is outer join. This operation is more complicated because it is not associative (both with itself and with normal join). One have to judiciously parenthesize query with outer join, at least. However, it is an exotic operation; if you are using it too often I suggest taking relational database class.

Joins vs Join like queries - Are they equivalent?

I expect this question has been asked multiple times, with different twists.. I want to try and get a generic and comprehensive understanding of this topic though. (does it belong in programming SO? ..)
Lets say I have a table for sports and a table for matches. matches, among other fields has a sport_id column, and this is a 1:many relationship.
Lets say I want to list out sports which have matches on day X. I could do this in 3 ways that I can think of..
Nested queries - easy to reason?
SELECT *
FROM sports
WHERE id IN (SELECT sport_id FROM matches WHERE <DATE CHECK>)
From/where - easy to write?
SELECT sports.*
FROM sports, matches
WHERE sports.id = matches.sport_id
AND <DATE CHECK>
Joins - I am not too familiar, so forgive any mistakes
SELECT *
FROM sports
JOIN matches ON sports.id = matches.sport_id
WHERE <DATE CHECK>
There might be other methods based on variations of Join which might be better suited here, inner join perhaps..
What I want to know is how I can compare these 3 on the basis of
Equivalent response (same rows returned?)
Performance on DB
Are all of them 1 query/network call or ?
Are any of these answers db engine dependent?
How might I choose among these?
Is #2 syntactic sugar for #3? is #1? Or are they optimized to #3 on some/all cases?
The second and third forms are completely equivalent (except you have an extra comma in the third version). FROM sports, matches is an implicit join, FROM sports JOIN matches is an explicit join. Implicit joins are the earlier form, explicit joins are more modern and generally preferred by database experts.
The version with WHERE IN is almost the same, but there are some differences. First, SELECT * will return columns from both tables in the join, but will only return columns from sports in the WHERE IN query. Second, if a row in sports matches multiple rows in matches, the joins will return a row for each pair of matches (it performs a cross product), while WHERE IN will just return the row from sports once regardless of how many matches there are.
Performance differences are implementation dependent. There shouldn't be any difference between the explicit and implicit joins, they're just syntactic sugar. However, databases don't always optimize the WHERE IN queries the same. For instance, when I've used EXPLAIN with MySQL, WHERE IN queries often perform a full scan over the outer table, matching the column against the index of the table in the subquery, even though the subquery might only return a small number of rows. I think some people have told me that recent MySQL versions are better at this.
They will all be just 1 network call. All queries are just a single call to the database server.
BTW, there's another form that you didn't list, using WHERE EXISTS with a correlated subquery.
SELECT *
FROM sports s
WHERE EXISTS (SELECT 1
FROM matches m
WHERE s.id = m.sport_id AND <DATE CHECK>)
Performance differences between this and JOIN will again be implementation-dependent.
Here is what I think about your questions
1.Equivalent response (same rows returned?)
for first QUERY where you usered IN Oprator my answer is NO (you get same number of rows but only columns from table sports )
and second and third is almost same
2.Performance on DB
First In oprator is Slower then join beacause
The IN is evaluated (and the select from b re-run) for each row in a, whereas the JOIN is optimized to use indices and other neat paging tricks...
ANSI JOIN Syntax
SELECT fname, lname, department
FROM names INNER JOIN departments ON names.employeeid = departments.employeeid
Former Microsoft JOIN Syntax
SELECT fname, lname, department
FROM names, departments
WHERE names.employeeid = departments.employeeid
If written correctly, either format will produce identical results. But that is a big if. The older Microsoft join syntax lends itself to mistakes because the syntax is a little less obvious. On the other hand, the ANSI syntax is very explicit and there is little chance you can make a mistake.
3.Are all of them 1 query/network call or ?
-Trial 1 result for IN
-Trial 2 result for Microsoft JOIN,
-Trial 3 result for ANSI JOIN
4.Are any of these answers db engine dependant?
(Sorry I didt got ans for this question)
5.How might I choose among these?
I sugges you shuold use ANSI JOIN
6.Is #2 syntactic sugar for #3? is #1? Or are they optimized to #3 on some/all cases?
-I think NO as I mentioned above #3 syntex is more batter
as per my past experience
I ran across a slow-performing query from an ERP program. After reviewing the code, which used the Microsoft JOIN syntax, I noticed that instead of creating a LEFT JOIN, the developer had accidentally created a CROSS JOIN instead. In this particular example, less than 10,000 rows should have resulted from the LEFT JOIN, but because a CROSS JOIN was used, over 11 million rows were returned instead. Then the developer used a SELECT DISTINCT to get rid of all the unnecessary rows created by the CROSS JOIN. As you can guess, this made for a very lengthy query. I notified the vendor’s support department about it, and they fixed their code.
The moral of this story is that you probably should be using the ANSI syntax, not the old Microsoft syntax. Besides reducing the odds of making silly mistakes, this code is more portable between database, and eventually, I imagine Microsoft will eventually stop supporting the old format, making the ANSI syntax the only option

Is there any difference between using innerjoin and writing all the tables directly in the from segment?

Do these two queries differ from each other?
Query 1:
SELECT * FROM Table1, Table2 WHERE Table1.Id = Table2.RefId
Query 2:
SELECT * FROM Table1 INNER JOIN Table2 ON Table1.Id = Table2.RefId
I analysed both methods and they clearly produced the same actual execution plans. Do you know any cases where using inner joins would work in a more efficient way. What is the real advantage of using inner joins rather than approaching the manner of "Query 1"?
The two statements you have provided are functionally equivalent to one another.
The variation is caused by differing SQL syntax standards.
For a really exciting read, you can lookup the various SQL standards by visiting the following Wikipedia link. On the right hand side are references and links to the various dialects/standards of SQL.
http://en.wikipedia.org/wiki/SQL
These SQL statements are synonymous, though specifying the INNER JOIN is the preferred method and follows ISO format. I prefer it as well because it limits the plumbing of joining the tables from your where clause and makes the goal of your query clearer.
These will result in an identical query plan, but the INNER JOIN, OUTER JOIN, CROSS JOIN keywords are prefered because they add clarity to the code.
While you have the ability to specifiy join hints using the keywords in the FROM clause, you can do more complicated joins in the WHERE clause. But otherwise, there will be no difference in query plan.
I will also add that the first syntax is much more subject to inadvertent cross joins as the queries get complicated. Further the left and right joins in this syntax do not work properly in SQL server and should never be used. Mixing the syntax when you add a left join can also cause problems where the query does not correctly return the results. The syntax in the first example has been outdated for 17 years, I see no reason to ever use it.
Query 1 is considered an old syntax style and its use is discouraged. You will run into problems with you use LEFT and Right joins using that syntax style. Also on SQL Server you can have problems mixing those two different syles together in queries that use view of different formats.
I have found a significant difference using the LEFT OUTER JOINS and putting the conditions on the joined table in the ON clause rather than the WHERE clause. Once you put a condition on the joined table in the WHERE clause, you defeat the left outer join.
When I was using Oracle, I used the archaic (+) after the joined table (with all conditions including join conditions in the WHERE clause)because that's what I knew. When we became a SQL Server shop, I was forced to use LEFT OUTER JOINs, and I found they didn't work as before until I discovered this behavior. Here's an example:
select NC.*,
IsNull(F.STRING_VAL, 'NONE') as USER_ID,
CO.TOTAL_AMT_ORDERED
from customer_order CO
INNER JOIN VTG_CO_NET_CHANGE NC
ON NC.CUST_ORDER_ID=CO.ID
LEFT OUTER JOIN USER_DEF_FIELDS F
ON F.DOCUMENT_ID = CO.ID and
F.PROGRAM_ID='VMORDENT' and
F.ID='UDF-0000072' and
F.DOCUMENT_ID is not null
where NC.acct_year=2017

INNER JOIN ON vs WHERE clause

For simplicity, assume all relevant fields are NOT NULL.
You can do:
SELECT
table1.this, table2.that, table2.somethingelse
FROM
table1, table2
WHERE
table1.foreignkey = table2.primarykey
AND (some other conditions)
Or else:
SELECT
table1.this, table2.that, table2.somethingelse
FROM
table1 INNER JOIN table2
ON table1.foreignkey = table2.primarykey
WHERE
(some other conditions)
Do these two work on the same way in MySQL?
INNER JOIN is ANSI syntax that you should use.
It is generally considered more readable, especially when you join lots of tables.
It can also be easily replaced with an OUTER JOIN whenever a need arises.
The WHERE syntax is more relational model oriented.
A result of two tables JOINed is a cartesian product of the tables to which a filter is applied which selects only those rows with joining columns matching.
It's easier to see this with the WHERE syntax.
As for your example, in MySQL (and in SQL generally) these two queries are synonyms.
Also, note that MySQL also has a STRAIGHT_JOIN clause.
Using this clause, you can control the JOIN order: which table is scanned in the outer loop and which one is in the inner loop.
You cannot control this in MySQL using WHERE syntax.
Others have pointed out that INNER JOIN helps human readability, and that's a top priority, I agree.
Let me try to explain why the join syntax is more readable.
A basic SELECT query is this:
SELECT stuff
FROM tables
WHERE conditions
The SELECT clause tells us what we're getting back; the FROM clause tells us where we're getting it from, and the WHERE clause tells us which ones we're getting.
JOIN is a statement about the tables, how they are bound together (conceptually, actually, into a single table).
Any query elements that control the tables - where we're getting stuff from - semantically belong to the FROM clause (and of course, that's where JOIN elements go). Putting joining-elements into the WHERE clause conflates the which and the where-from, that's why the JOIN syntax is preferred.
Applying conditional statements in ON / WHERE
Here I have explained the logical query processing steps.
Reference: Inside Microsoft® SQL Server™ 2005 T-SQL Querying
Publisher: Microsoft Press
Pub Date: March 07, 2006
Print ISBN-10: 0-7356-2313-9
Print ISBN-13: 978-0-7356-2313-2
Pages: 640
Inside Microsoft® SQL Server™ 2005 T-SQL Querying
(8) SELECT (9) DISTINCT (11) TOP <top_specification> <select_list>
(1) FROM <left_table>
(3) <join_type> JOIN <right_table>
(2) ON <join_condition>
(4) WHERE <where_condition>
(5) GROUP BY <group_by_list>
(6) WITH {CUBE | ROLLUP}
(7) HAVING <having_condition>
(10) ORDER BY <order_by_list>
The first noticeable aspect of SQL that is different than other programming languages is the order in which the code is processed. In most programming languages, the code is processed in the order in which it is written. In SQL, the first clause that is processed is the FROM clause, while the SELECT clause, which appears first, is processed almost last.
Each step generates a virtual table that is used as the input to the following step. These virtual tables are not available to the caller (client application or outer query). Only the table generated by the final step is returned to the caller. If a certain clause is not specified in a query, the corresponding step is simply skipped.
Brief Description of Logical Query Processing Phases
Don't worry too much if the description of the steps doesn't seem to make much sense for now. These are provided as a reference. Sections that come after the scenario example will cover the steps in much more detail.
FROM: A Cartesian product (cross join) is performed between the first two tables in the FROM clause, and as a result, virtual table VT1 is generated.
ON: The ON filter is applied to VT1. Only rows for which the <join_condition> is TRUE are inserted to VT2.
OUTER (join): If an OUTER JOIN is specified (as opposed to a CROSS JOIN or an INNER JOIN), rows from the preserved table or tables for which a match was not found are added to the rows from VT2 as outer rows, generating VT3. If more than two tables appear in the FROM clause, steps 1 through 3 are applied repeatedly between the result of the last join and the next table in the FROM clause until all tables are processed.
WHERE: The WHERE filter is applied to VT3. Only rows for which the <where_condition> is TRUE are inserted to VT4.
GROUP BY: The rows from VT4 are arranged in groups based on the column list specified in the GROUP BY clause. VT5 is generated.
CUBE | ROLLUP: Supergroups (groups of groups) are added to the rows from VT5, generating VT6.
HAVING: The HAVING filter is applied to VT6. Only groups for which the <having_condition> is TRUE are inserted to VT7.
SELECT: The SELECT list is processed, generating VT8.
DISTINCT: Duplicate rows are removed from VT8. VT9 is generated.
ORDER BY: The rows from VT9 are sorted according to the column list specified in the ORDER BY clause. A cursor is generated (VC10).
TOP: The specified number or percentage of rows is selected from the beginning of VC10. Table VT11 is generated and returned to the caller.
Therefore, (INNER JOIN) ON will filter the data (the data count of VT will be reduced here itself) before applying the WHERE clause. The subsequent join conditions will be executed with filtered data which improves performance. After that, only the WHERE condition will apply filter conditions.
(Applying conditional statements in ON / WHERE will not make much difference in few cases. This depends on how many tables you have joined and the number of rows available in each join tables)
The implicit join ANSI syntax is older, less obvious, and not recommended.
In addition, the relational algebra allows interchangeability of the predicates in the WHERE clause and the INNER JOIN, so even INNER JOIN queries with WHERE clauses can have the predicates rearranged by the optimizer.
I recommend you write the queries in the most readable way possible.
Sometimes this includes making the INNER JOIN relatively "incomplete" and putting some of the criteria in the WHERE simply to make the lists of filtering criteria more easily maintainable.
For example, instead of:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
AND c.State = 'NY'
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
AND a.Status = 1
Write:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
AND a.Status = 1
But it depends, of course.
Implicit joins (which is what your first query is known as) become much much more confusing, hard to read, and hard to maintain once you need to start adding more tables to your query. Imagine doing that same query and type of join on four or five different tables ... it's a nightmare.
Using an explicit join (your second example) is much more readable and easy to maintain.
I'll also point out that using the older syntax is more subject to error. If you use inner joins without an ON clause, you will get a syntax error. If you use the older syntax and forget one of the join conditions in the where clause, you will get a cross join. The developers often fix this by adding the distinct keyword (rather than fixing the join because they still don't realize the join itself is broken) which may appear to cure the problem but will slow down the query considerably.
Additionally for maintenance if you have a cross join in the old syntax, how will the maintainer know if you meant to have one (there are situations where cross joins are needed) or if it was an accident that should be fixed?
Let me point you to this question to see why the implicit syntax is bad if you use left joins.
Sybase *= to Ansi Standard with 2 different outer tables for same inner table
Plus (personal rant here), the standard using the explicit joins is over 20 years old, which means implicit join syntax has been outdated for those 20 years. Would you write application code using a syntax that has been outdated for 20 years? Why do you want to write database code that is?
The SQL:2003 standard changed some precedence rules so a JOIN statement takes precedence over a "comma" join. This can actually change the results of your query depending on how it is setup. This cause some problems for some people when MySQL 5.0.12 switched to adhering to the standard.
So in your example, your queries would work the same. But if you added a third table:
SELECT ... FROM table1, table2 JOIN table3 ON ... WHERE ...
Prior to MySQL 5.0.12, table1 and table2 would be joined first, then table3. Now (5.0.12 and on), table2 and table3 are joined first, then table1. It doesn't always change the results, but it can and you may not even realize it.
I never use the "comma" syntax anymore, opting for your second example. It's a lot more readable anyway, the JOIN conditions are with the JOINs, not separated into a separate query section.
They have a different human-readable meaning.
However, depending on the query optimizer, they may have the same meaning to the machine.
You should always code to be readable.
That is to say, if this is a built-in relationship, use the explicit join. if you are matching on weakly related data, use the where clause.
I know you're talking about MySQL, but anyway:
In Oracle 9 explicit joins and implicit joins would generate different execution plans. AFAIK that has been solved in Oracle 10+: there's no such difference anymore.
If you are often programming dynamic stored procedures, you will fall in love with your second example (using where). If you have various input parameters and lots of morph mess, then that is the only way. Otherwise, they both will run the same query plan so there is definitely no obvious difference in classic queries.
ANSI join syntax is definitely more portable.
I'm going through an upgrade of Microsoft SQL Server, and I would also mention that the =* and *= syntax for outer joins in SQL Server is not supported (without compatibility mode) for 2005 SQL server and later.
I have two points for the implicit join (The second example):
Tell the database what you want, not what it should do.
You can write all tables in a clear list that is not cluttered by join conditions. Then you can much easier read what tables are all mentioned. The conditions come all in the WHERE part, where they are also all lined up one below the other. Using the JOIN keyword mixes up tables and conditions.