When to use SQL natural join instead of join .. on? - sql

I'm studying SQL for a database exam and the way I've seen SQL is they way it looks on this page:
http://en.wikipedia.org/wiki/Star_schema
IE join written the way Join <table name> On <table attribute> and then the join condition for the selection. My course book and my exercises given to me from the academic institution however, use only natural join in their examples. So when is it right to use natural join? Should natural join be used if the query can also be written using JOIN .. ON ?
Thanks for any answer or comment

A natural join will find columns with the same name in both tables and add one column in the result for each pair found. The inner join lets you specify the comparison you want to make using any column.

IMO, the JOIN ON syntax is much more readable and maintainable than the natural join syntax. Natural joins is a leftover of some old standards, and I try to avoid it like the plague.

A natural join will find columns with the same name in both tables and add one column in the result for each pair found. The inner join lets you specify the comparison you want to make using any column.
The JOIN keyword is used in an SQL statement to query data from two or more tables, based on a relationship between certain columns in these tables.
Different Joins
* JOIN: Return rows when there is at least one match in both tables
* LEFT JOIN: Return all rows from the left table, even if there are no matches in the right table
* RIGHT JOIN: Return all rows from the right table, even if there are no matches in the left table
* FULL JOIN: Return rows when there is a match in one of the tables
INNER JOIN
http://www.w3schools.com/sql/sql_join_inner.asp
FULL JOIN
http://www.w3schools.com/sql/sql_join_full.asp

A natural join is said to be an abomination because it does not allow qualifying key columns, which makes it confusing. Because you never know which "common" columns are being used to join two tables simply by looking at the sql statement.

A NATURAL JOIN matches on any shared column names between the tables, whereas an INNER JOIN only matches on the given ON condition.
The joins often interchangeable and usually produce the same results. However, there are some important considerations to make:
If a NATURAL JOIN finds no matching columns, it returns the cross
product. This could produce disastrous results if the schema is
modified. On the other hand, an INNER JOIN will return a 'column does
not exist' error. This is much more fault tolerant.
An INNER JOIN self-documents with its ON clause, resulting in a
clearer query that describes the table schema to the reader.
An INNER JOIN results in a maintainable and reusable query in
which the column names can be swapped in and out with changes in the
use case or table schema.
The programmer can notice column name mis-matches (e.g. item_ID vs itemID) sooner if they are forced to define the ON predicate.
Otherwise, a NATURAL JOIN is still a good choice for a quick, ad-hoc query.

Related

ORA-01417 - Two outer joins error. New join syntax?

I never got my head around the "new" SQL join syntax, and therefore use the "old" join system, with the (+). I know it's about time I learned it - however I just find the old syntax a lot more intuitive, especially when working with multiple tables with multiple joins.
However I now have an operation which requires two outer joins on the same table. My code is:
SELECT
C.ID,
R.VALUE,
R.LOG_ID,
LOG.ACTION
FROM
C,
R,
LOG
WHERE
C.DELETED IS NULL
AND R.DELETED IS NULL
-- Two joins below
AND R.C_ID(+) = C.ID
AND R.LOG_ID(+) = LOG.ID
However this results in an error:
ORA-01417 - A table may be outer joined to at most one table.
Searching for this error I find that the solution is to use the new syntax For example this answer on SO:
Outer join between three tables causing Oracle ORA-01417 error
So I am aware that some may consider this question a duplicate as it technically already has an answer. However the "old" syntax posed in that question does not contain exactly the same number of tables and joins as I have here, and try as I might, I'm not sure how I would factor this in to my own code.
Is anyone able to assist? Thanks.
I think you want:
SELECT C.ID, R.VALUE, R.LOG_ID, LOG.ACTION
FROM C LEFT JOIN
R
ON R.C_ID = C.ID LEFT JOIN
LOG
ON R.LOG_ID = LOG.I
WHERE C.DELETED IS NULL AND
R.DELETED IS NULL;
The "new" (it is 25 years old) outer join syntax is actually very easy to follow, particularly for a simple example with just LEFT JOIN.
The idea is you want to keep all rows from one table (perhaps subject to filters in the WHERE clause). That is the first table. Then you use a chain of LEFT JOIN to bring in other tables.
All rows from the first table are in the result set. If there are matching rows in the other tables, then columns from those tables come from matching rows. If there are no matches, then the row from the first table is kept.

Using "From Multiple Tables" or "Join" performance difference [duplicate]

Most SQL dialects accept both the following queries:
SELECT a.foo, b.foo
FROM a, b
WHERE a.x = b.x
SELECT a.foo, b.foo
FROM a
LEFT JOIN b ON a.x = b.x
Now obviously when you need an outer join, the second syntax is required. But when doing an inner join why should I prefer the second syntax to the first (or vice versa)?
The old syntax, with just listing the tables, and using the WHERE clause to specify the join criteria, is being deprecated in most modern databases.
It's not just for show, the old syntax has the possibility of being ambiguous when you use both INNER and OUTER joins in the same query.
Let me give you an example.
Let's suppose you have 3 tables in your system:
Company
Department
Employee
Each table contain numerous rows, linked together. You got multiple companies, and each company can have multiple departments, and each department can have multiple employees.
Ok, so now you want to do the following:
List all the companies, and include all their departments, and all their employees. Note that some companies don't have any departments yet, but make sure you include them as well. Make sure you only retrieve departments that have employees, but always list all companies.
So you do this:
SELECT * -- for simplicity
FROM Company, Department, Employee
WHERE Company.ID *= Department.CompanyID
AND Department.ID = Employee.DepartmentID
Note that the last one there is an inner join, in order to fulfill the criteria that you only want departments with people.
Ok, so what happens now. Well, the problem is, it depends on the database engine, the query optimizer, indexes, and table statistics. Let me explain.
If the query optimizer determines that the way to do this is to first take a company, then find the departments, and then do an inner join with employees, you're not going to get any companies that don't have departments.
The reason for this is that the WHERE clause determines which rows end up in the final result, not individual parts of the rows.
And in this case, due to the left join, the Department.ID column will be NULL, and thus when it comes to the INNER JOIN to Employee, there's no way to fulfill that constraint for the Employee row, and so it won't appear.
On the other hand, if the query optimizer decides to tackle the department-employee join first, and then do a left join with the companies, you will see them.
So the old syntax is ambiguous. There's no way to specify what you want, without dealing with query hints, and some databases have no way at all.
Enter the new syntax, with this you can choose.
For instance, if you want all companies, as the problem description stated, this is what you would write:
SELECT *
FROM Company
LEFT JOIN (
Department INNER JOIN Employee ON Department.ID = Employee.DepartmentID
) ON Company.ID = Department.CompanyID
Here you specify that you want the department-employee join to be done as one join, and then left join the results of that with the companies.
Additionally, let's say you only want departments that contains the letter X in their name. Again, with old style joins, you risk losing the company as well, if it doesn't have any departments with an X in its name, but with the new syntax, you can do this:
SELECT *
FROM Company
LEFT JOIN (
Department INNER JOIN Employee ON Department.ID = Employee.DepartmentID
) ON Company.ID = Department.CompanyID AND Department.Name LIKE '%X%'
This extra clause is used for the joining, but is not a filter for the entire row. So the row might appear with company information, but might have NULLs in all the department and employee columns for that row, because there is no department with an X in its name for that company. This is hard with the old syntax.
This is why, amongst other vendors, Microsoft has deprecated the old outer join syntax, but not the old inner join syntax, since SQL Server 2005 and upwards. The only way to talk to a database running on Microsoft SQL Server 2005 or 2008, using the old style outer join syntax, is to set that database in 8.0 compatibility mode (aka SQL Server 2000).
Additionally, the old way, by throwing a bunch of tables at the query optimizer, with a bunch of WHERE clauses, was akin to saying "here you are, do the best you can". With the new syntax, the query optimizer has less work to do in order to figure out what parts goes together.
So there you have it.
LEFT and INNER JOIN is the wave of the future.
The JOIN syntax keeps conditions near the table they apply to. This is especially useful when you join a large amount of tables.
By the way, you can do an outer join with the first syntax too:
WHERE a.x = b.x(+)
Or
WHERE a.x *= b.x
Or
WHERE a.x = b.x or a.x not in (select x from b)
Basically, when your FROM clause lists tables like so:
SELECT * FROM
tableA, tableB, tableC
the result is a cross product of all the rows in tables A, B, C. Then you apply the restriction WHERE tableA.id = tableB.a_id which will throw away a huge number of rows, then further ... AND tableB.id = tableC.b_id and you should then get only those rows you are really interested in.
DBMSs know how to optimise this SQL so that the performance difference to writing this using JOINs is negligible (if any). Using the JOIN notation makes the SQL statement more readable (IMHO, not using joins turns the statement into a mess). Using the cross product, you need to provide join criteria in the WHERE clause, and that's the problem with the notation. You are crowding your WHERE clause with stuff like
tableA.id = tableB.a_id
AND tableB.id = tableC.b_id
which is only used to restrict the cross product. WHERE clause should only contain RESTRICTIONS to the resultset. If you mix table join criteria with resultset restrictions, you (and others) will find your query harder to read. You should definitely use JOINs and keep the FROM clause a FROM clause, and the WHERE clause a WHERE clause.
The first way is the older standard. The second method was introduced in SQL-92, http://en.wikipedia.org/wiki/SQL. The complete standard can be viewed at http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt .
It took many years before database companies adopted the SQL-92 standard.
So the reason why the second method is preferred, it is the SQL standard according the ANSI and ISO standards committee.
The second is preferred because it is far less likely to result in an accidental cross join by forgetting to put inthe where clause. A join with no on clause will fail the syntax check, an old style join with no where clause will not fail, it will do a cross join.
Additionally when you later have to a left join, it is helpful for maintenance that they all be in the same structure. And the old syntax has been out of date since 1992, it is well past time to stop using it.
Plus I have found that many people who exclusively use the first syntax don't really understand joins and understanding joins is critical to getting correct results when querying.
I think there are some good reasons on this page to adopt the second method -using explicit JOINs. The clincher though is that when the JOIN criteria are removed from the WHERE clause it becomes much easier to see the remaining selection criteria in the WHERE clause.
In really complex SELECT statements it becomes much easier for a reader to understand what is going on.
The SELECT * FROM table1, table2, ... syntax is ok for a couple of tables, but it becomes exponentially (not necessarily a mathematically accurate statement) harder and harder to read as the number of tables increases.
The JOIN syntax is harder to write (at the beginning), but it makes it explicit what criteria affects which tables. This makes it much harder to make a mistake.
Also, if all the joins are INNER, then both versions are equivalent. However, the moment you have an OUTER join anywhere in the statement, things get much more complicated and it's virtually guarantee that what you write won't be querying what you think you wrote.
When you need an outer join the second syntax is not always required:
Oracle:
SELECT a.foo, b.foo
FROM a, b
WHERE a.x = b.x(+)
MSSQLServer (although it's been deprecated in 2000 version)/Sybase:
SELECT a.foo, b.foo
FROM a, b
WHERE a.x *= b.x
But returning to your question. I don't know the answer, but it is probably related to the fact that a join is more natural (syntactically, at least) than adding an expression to a where clause when you are doing exactly that: joining.
I hear a lot of people complain the first one is too difficult to understand and that it is unclear. I don't see a problem with it, but after having that discussion, I use the second one even on INNER JOINS for clarity.
To the database, they end up being the same. For you, though, you'll have to use that second syntax in some situations. For the sake of editing queries that end up having to use it (finding out you needed a left join where you had a straight join), and for consistency, I'd pattern only on the 2nd method. It'll make reading queries easier.
Well the first and second queries may yield different results because a LEFT JOIN includes all records from the first table, even if there are no corresponding records in the right table.
If both are inner joins there is no difference in the semantics or the execution of the SQL or performance. Both are ANSI Standard SQL It is purely a matter of preference, of coding standards within your work group.
Over the last 25 years, I've developed the habit that if I have a fairly complicated SQL I will use the INNER JOIN syntax because it is easier for the reader to pick out the structure of the query at a glance. It also gives more clarity by singling out the join conditions from the residual conditions, which can save time (and mistakes) if you ever come back to your query months later.
However for outer joins, for the purpose of clarity I would not under any circumstances use non-ansi extensions.

using parenthesis in SQL

What's the differences between these SQLs:
SELECT *
FROM COURS NATURAL JOIN COMPOSITIONCOURS NATURAL JOIN PARCOURS;
SELECT *
FROM COURS NATURAL JOIN (COMPOSITIONCOURS C JOIN PARCOURS P ON C.IDPARCOURS = P.IDPARCOURS) ;
SELECT *
FROM (COURS NATURAL JOIN COMPOSITIONCOURS C) JOIN PARCOURS P ON C.IDPARCOURS = P.IDPARCOURS ;
They have different results.
It's difficult to tell precisely without a sample result set, but I would imagine its your utilization of natural JOINs which is typically bad practice. Reasons being:
I would avoid using natural joins, because natural joins are:
Not standard SQL.
Not informative. You aren't specifying what columns are being joined without referring to the schema.
Your conditions are vulnerable to schema changes. If there are multiple natural join columns and one such column is removed from a table, the query will still execute, but probably not correctly and this change in behavior will be silent.
Thanks for all of yours answers! I finally find out that the problem is because of the use of natural JoIn.
When using natural join, it will join every column with the same name! When I am using NATURAL JOIN for several times , it's possible those columns, which have same names but you don't wants to combine , could be joined automatically!
Here is an example,
Table a (IDa, name,year,IDb)
Table b (IDb, bonjour,salute,IDc)
Table c (IDc, year, merci)
If I wrote like From a Natural Join b Natural c,because IDb and IDc are common names for Table a,b,c. It seems Ok! But attention please!
a.year and b.year can also be joined ! That's what we don't want!

INNER JOIN vs multiple table names in "FROM" [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
INNER JOIN versus WHERE clause — any difference?
What is the difference between an INNER JOIN query and an implicit join query (i.e. listing multiple tables after the FROM keyword)?
For example, given the following two tables:
CREATE TABLE Statuses(
id INT PRIMARY KEY,
description VARCHAR(50)
);
INSERT INTO Statuses VALUES (1, 'status');
CREATE TABLE Documents(
id INT PRIMARY KEY,
statusId INT REFERENCES Statuses(id)
);
INSERT INTO Documents VALUES (9, 1);
What is the difference between the below two SQL queries?
From the testing I've done, they return the same result. Do they do the same thing? Are there situations where they will return different result sets?
-- Using implicit join (listing multiple tables)
SELECT s.description
FROM Documents d, Statuses s
WHERE d.statusId = s.id
AND d.id = 9;
-- Using INNER JOIN
SELECT s.description
FROM Documents d
INNER JOIN Statuses s ON d.statusId = s.id
WHERE d.id = 9;
There is no reason to ever use an implicit join (the one with the commas). Yes for inner joins it will return the same results. However, it is subject to inadvertent cross joins especially in complex queries and it is harder for maintenance because the left/right outer join syntax (deprecated in SQL Server, where it doesn't work correctly right now anyway) differs from vendor to vendor. Since you shouldn't mix implicit and explict joins in the same query (you can get wrong results), needing to change something to a left join means rewriting the entire query.
If you do it the first way, people under the age of 30 will probably chuckle at you, but as long as you're doing an inner join, they produce the same result and the optimizer will generate the same execution plan (at least as far as I've ever been able to tell).
This does of course presume that the where clause in the first query is how you would be joining in the second query.
This will probably get closed as a duplicate, btw.
The nice part of the second method is that it helps separates the join condition (on ...) from the filter condition (where ...). This can help make the intent of the query more readable.
The join condition will typically be more descriptive of the structure of the database and the relation between the tables. e.g., the salary table is related to the employee table by the EmployeeID column, and queries involving those two tables will probably always join on that column.
The filter condition is more descriptive of the specific task being performed by the query. If the query is FindRichPeople, the where clause might be "where salaries.Salary > 1000000"... thats describing the task at hand, not the database structure.
Note that the SQL compiler doesn't see it that way... if it decides that it will be faster to cross join and then filter the results, it will cross join and filter the results. It doesn't care what is in the ON clause and whats in the WHERE clause. But, that typically wont happen if the on clause matches a foreign key or joins to a primary key or indexed column. As far as operating correctly, they are identical; as far as writing readable, maintainable code, the second way is probably a little better.
there is no difference as far as I know is the second one with the inner join the new way to write such statements and the first one the old method.
The first one does a Cartesian product on all record within those two tables then filters by the where clause.
The second only joins on records that meet the requirements of your ON clause.
EDIT: As others have indicated, the optimization engine will take care of an attempt on a Cartesian product and will result in the same query more or less.
A bit same. Can help you out.
Left join vs multiple tables in SQL (a)
Left join vs multiple tables in SQL (b)
In the example you've given, the queries are equivalent; if you're using SQL Server, run the query and display the actual exection plan to see what the server's doing internally.

What are the uses of the different join operations?

What are the uses of the different join operations in SQL? Like I want to know why do we need the different inner and outer joins?
The only type of join you really need is LEFT OUTER JOIN. Every other type of join can be rewritten in terms of one or more left outer joins, and possibly some filtering. So why do we need all the others? Is it just to confuse people? Wouldn't it be simpler if there were only one type of join?
You could also ask: Why have both a <= b and b >= a? Don't these just do the same thing? Can't we just get rid of one of them? It would simplify things!
Sometimes it's easier to swap <= to >= instead of swapping the arguments round. Similarly, a left join and a right join are the same thing just with the operands swapped. But again it's practical to have both options instead of requiring people to write their queries in a specific order.
Another thing you could ask is: In logic why do we have AND, OR, NOT, XOR, NAND, NOR, etc? All these can be rewritten in terms of NANDs! Why not just have NAND? Well it's awkward to write an OR in terms of NANDs, and it's not as obvious what the intention is - if you write OR, people know immediately what you mean. If you write a bunch of NANDs, it is not obvious what you are trying to achieve.
Similarly, if you want to do a FULL OUTER JOIN b you could make a left join and a right join, remove duplicated results, and then union all. But that's a pain and so there's a shorthand for it.
When do you use each one? Here's a simplified rule:
If you always want a result row for each row in the LEFT table, use a LEFT OUTER JOIN.
If you always want a result row for each row in the RIGHT table, use a RIGHT OUTER JOIN.
If you always want a result row for each row in either table, use a FULL OUTER JOIN.
If you only want a result row when there's a row in both tables, use an INNER JOIN.
If you want all possible pairs of rows, one row from each table, use a CROSS JOIN.
inner join - joins rows from both sets of the match based on specified criteria.
outer join - selects all of one set, along with matching or empty (if not matched) elements from the other set. Outer joins can be left or right, to specify which set is returned in its entirety.
To make the other answers clearer - YOU GET DIFFERENT RESULTS according to the join you choose, when the columns you're joining on contain null values - for example.
So - for each Real-life scenario there is a join that suits it (either you want the lines without the data or not in the null values example).
My answer assumes 2 tables joined on a single key:
INNER JOIN - get the results that are in both join tables (according to the join rule)
FULL OUTER JOIN - get all results from both table (Cartesian product)
LEFT OUTER JOIN - get all the results from left table and the matching results from the right
You can add WHERE clauses in order to further constrain the results.
Use these in order to only get what you want to get.