How is a natural join different from a JOIN ON clause? - sql

So for a fun little lab my Professor assigned, he wants us to create our own queries using different join operations. The ones that I'm curious about are NATURAL JOIN and JOIN ON.
The textbook definition of a natural join - "returns all rows with matching values in the matching columns and eliminates duplicates columns." So, say I have two tables, Customers and Orders. I list all orders submitted by the customer with an id = 1 as follows:
Select Customers.Name
From Customers, Orders
Where Customers.ID = 1
AND Customers.ID = Orders.CID
I want to know how that is different from JOIN ON, which according to the textbook "returns rows that meet the indicated join condition, and typically includes an equality comparison of two expressed columns" i.e. a primary key of one table and a foreign key of another. So a JOIN ON clause essentially does the same thing as a natural join. It returns all rows with matching values according to the parameters specified in the ON clause.
Select Customers.Name
From Customers JOIN Orders ON Customers.ID = Orders.CID
Same results. Is the latter just an easier way to write a natural join, or is there something I'm missing here?
Kinda like how in JavaScript, I can say:
var array = new Array(1, 2, 3);
OR I could just use the quicker and easier literal, without the constructor:
var array = [1, 2, 3];
Edit: Didn't even realize that the natural join uses a JOIN keyword in the FROM clause, and omits the WHERE clause. That just shows how little I know about this language. I'll keep the error for the sake of tracking my own progress.

NATURAL JOIN is :
always an equi-join
always matches by equality of all of the same-named attributes
which in essence boils down to there being no way at all to specify the JOIN condition. So you can only specify T1 NATURAL JOIN T2 and that's it, SQL will derive the entire matching condition from just that.
JOIN ON is :
not always an equi-JOIN (you can also specify JOIN ON T1.A > T2.A)
not always involving all attributes that correspond by name (if both tables have an attribute named A, you can still leave out ON T1.A = T2.A).
Your ID/CID example is not suitable for using NATURAL JOIN directly. You would have to rename attributes to get the column/attribute matching you want by stating [something like] :
SELECT Customers.Name
From Customers
NATURAL JOIN
(SELECT CID as ID FROM Orders)
(And as you stated in the question yourself, there is the thing about duplicate removal, which no other form of JOIN does by and of itself. It's an issue of scrutinous conformance to relational theory, which SQL as a whole doesn't exactly excel at, to put it mildly.)

Related

Why does FULL JOIN order make a difference in these queries?

I'm using PostgreSQL. Everything I read here suggests that in a query using nothing but full joins on a single column, the order of tables joined basically doesn't matter.
My intuition says this should also go for multiple columns, so long as every common column is listed in the query where possible (that is, wherever both joined tables have the column in common). But this is not the case, and I'm trying to figure out why.
Simplified to three tables a, b, and c.
Columns in table a: id, name_a
Columns in table b: id, id_x
Columns in table c: id, id_x
This query:
SELECT *
FROM a
FULL JOIN b USING(id)
FULL JOIN c USING(id, id_x);
returns a different number of rows than this one:
SELECT *
FROM a
FULL JOIN c USING(id)
FULL JOIN b USING(id, id_x);
What I want/expect is hard to articulate, but basically, a I'd like a "complete" full merger. I want no null fields anywhere unless that is unavoidable.
For example, whenever there is a not-null id, I want the corresponding name column to always have the name_a and not be null. Instead, one of those example queries returns semi-redundant results, with one row having a name_a but no id, and another having an id but no name_a, rather than a single merged row.
When the joins are listed in the other order, I do get that desired result (but I'm not sure what other problems might occur, because future data is unknown).
Your queries are different.
In the first, you are doing a full join to b using a single column, id.
In the second, you are doing a full join to b using two columns.
Although the two queries could return the same results under some circumstances, there is not reason to think that the results would be comparable.
Argument order matters in OUTER JOINs, except that FULL NATURAL JOIN is symmetric. They return what an INNER JOIN (ON, USING or NATURAL) does but also the unmatched rows from the left (LEFT JOIN), right (RIGHT JOIN) or both (FULL JOIN) tables extended by NULLs.
USING returns the single shared value for each specified column in INNER JOIN rows; in NULL-extended rows another common column can have NULL in one table's version and a value in the other's.
Join order matters too. Even FULL NATURAL JOIN is not associative, since with multiple tables each pair of tables (either operand being an original or join result) can have a unique set of common columns, ie in general (A ⟗ B) ⟗ C ≠ A ⟗ (B ⟗ C).
There are a lot of special cases where certain additional identities hold. Eg FULL JOIN USING all common column names and OUTER JOIN ON equality of same-named columns are symmetric. Some cases involve CKs (candidate keys), FKs (foreign keys) and other constraints on arguments.
Your question doesn't make clear exactly what input conditions you are assuming or what output conditions you are seeking.

using parenthesis in SQL

What's the differences between these SQLs:
SELECT *
FROM COURS NATURAL JOIN COMPOSITIONCOURS NATURAL JOIN PARCOURS;
SELECT *
FROM COURS NATURAL JOIN (COMPOSITIONCOURS C JOIN PARCOURS P ON C.IDPARCOURS = P.IDPARCOURS) ;
SELECT *
FROM (COURS NATURAL JOIN COMPOSITIONCOURS C) JOIN PARCOURS P ON C.IDPARCOURS = P.IDPARCOURS ;
They have different results.
It's difficult to tell precisely without a sample result set, but I would imagine its your utilization of natural JOINs which is typically bad practice. Reasons being:
I would avoid using natural joins, because natural joins are:
Not standard SQL.
Not informative. You aren't specifying what columns are being joined without referring to the schema.
Your conditions are vulnerable to schema changes. If there are multiple natural join columns and one such column is removed from a table, the query will still execute, but probably not correctly and this change in behavior will be silent.
Thanks for all of yours answers! I finally find out that the problem is because of the use of natural JoIn.
When using natural join, it will join every column with the same name! When I am using NATURAL JOIN for several times , it's possible those columns, which have same names but you don't wants to combine , could be joined automatically!
Here is an example,
Table a (IDa, name,year,IDb)
Table b (IDb, bonjour,salute,IDc)
Table c (IDc, year, merci)
If I wrote like From a Natural Join b Natural c,because IDb and IDc are common names for Table a,b,c. It seems Ok! But attention please!
a.year and b.year can also be joined ! That's what we don't want!

JOIN on references

Why doesn't SQL contain a keyword for specifying a join on columns in one table that references columns in another table?
Like NATURAL in a join specifies that it's on columns with the same name.
I often find myself doing something like SELECT ... FROM a JOIN b ON a.b_id=b.id; where the b_id column in a is defined to reference the id column in b. That seems like an awful lot of typing for something quite natural?
Would such a feature be particularly hard to implement or undesirable for some reason that I haven't thought of?
I mostly know SQL from postgresql, so if most other RDBMS'es has such a feature, the questions is just why postgresql doesn't have it.
In fact, SQL does. It is the USING clause:
select . . .
from a join
b
using (id);
As the documentation explains:
The USING clause is a shorthand that allows you to take advantage of
the specific situation where both sides of the join use the same name
for the joining column(s). It takes a comma-separated list of the
shared column names and forms a join condition that includes an
equality comparison for each one. For example, joining T1 and T2 with
USING (a, b) produces the join condition ON T1.a = T2.a AND T1.b =
T2.b.
It does still require that the names be the same in the two tables. But, because the names are explicitly specified in the join condition, this is a safe alternative to natural join (which I consider to be a bug waiting to happen).

Is it better to use where instead of adding conditions into join clause in SQL queries? [duplicate]

This question already has answers here:
INNER JOIN ON vs WHERE clause
(12 answers)
Closed 8 years ago.
Hello :) I've got a question on MySQL queries.
Which one's faster and why?
Is there any difference at all?
select tab1.something, tab2.smthelse
from tab1 inner join tab2 on tab1.pk=tab2.fk
WHERE tab2.somevalue = 'value'
Or this one:
select tab1.something, tab2.smthelse
from tab1 inner join tab2 on tab1.pk=tab2.fk
AND tab2.somevalue = 'value'
As Simon noted, the difference in performance should be negligible. The main concern would be ensuring your query correctly expresses your intent, and (especially) you get the expected results.
Generally, you want to add filters to the JOIN clause only if the filter is a condition of the join. In most (not all) cases, a filter should be applied to the WHERE clause, as it is a filter of the overall query, not of the join itself.
AFAIK, the only instance where this really affects the outcome of the query is when using an OUTER JOIN.
Consider the following queries:
SELECT *
FROM Customer c
LEFT JOIN Orders o ON c.CustomerId = o.CustomerId
WHERE o.OrderType = "InternetOrder"
vs.
SELECT *
FROM Customer c
LEFT JOIN Orders o ON c.CustomerId = o.CustomerId AND o.OrderType = "InternetOrder"
The first will return one row for each customer order that has an order type of "Internet Order". In effect, your left join has become an inner join because of the filter that was applied to the whole query (i.e. customers who do not have an "InternetOrder" will not be returned at all).
The second will return at least one row for each customer. If the customer has no orders of order type "Internet Order", it will return null values for all order table fields. Otherwise it will return one row for each customer order of type "Internet Order".
If the constraint is based off the joined table (as yours is) then it makes sense to specify the constraint when you join.
This way MySQL is able to reduce the rows from the joined table at the time it joins, as otherwise it will need to be able to select all data that fulfills the basic JOIN criteria prior to applying the WHERE logic.
In reality you'll see little difference in performance until you get to more complex queries or larger datasets, however limiting the data at each JOIN will be more efficient overall if done well especially if there are good indexes on the joined table.

When to use SQL natural join instead of join .. on?

I'm studying SQL for a database exam and the way I've seen SQL is they way it looks on this page:
http://en.wikipedia.org/wiki/Star_schema
IE join written the way Join <table name> On <table attribute> and then the join condition for the selection. My course book and my exercises given to me from the academic institution however, use only natural join in their examples. So when is it right to use natural join? Should natural join be used if the query can also be written using JOIN .. ON ?
Thanks for any answer or comment
A natural join will find columns with the same name in both tables and add one column in the result for each pair found. The inner join lets you specify the comparison you want to make using any column.
IMO, the JOIN ON syntax is much more readable and maintainable than the natural join syntax. Natural joins is a leftover of some old standards, and I try to avoid it like the plague.
A natural join will find columns with the same name in both tables and add one column in the result for each pair found. The inner join lets you specify the comparison you want to make using any column.
The JOIN keyword is used in an SQL statement to query data from two or more tables, based on a relationship between certain columns in these tables.
Different Joins
* JOIN: Return rows when there is at least one match in both tables
* LEFT JOIN: Return all rows from the left table, even if there are no matches in the right table
* RIGHT JOIN: Return all rows from the right table, even if there are no matches in the left table
* FULL JOIN: Return rows when there is a match in one of the tables
INNER JOIN
http://www.w3schools.com/sql/sql_join_inner.asp
FULL JOIN
http://www.w3schools.com/sql/sql_join_full.asp
A natural join is said to be an abomination because it does not allow qualifying key columns, which makes it confusing. Because you never know which "common" columns are being used to join two tables simply by looking at the sql statement.
A NATURAL JOIN matches on any shared column names between the tables, whereas an INNER JOIN only matches on the given ON condition.
The joins often interchangeable and usually produce the same results. However, there are some important considerations to make:
If a NATURAL JOIN finds no matching columns, it returns the cross
product. This could produce disastrous results if the schema is
modified. On the other hand, an INNER JOIN will return a 'column does
not exist' error. This is much more fault tolerant.
An INNER JOIN self-documents with its ON clause, resulting in a
clearer query that describes the table schema to the reader.
An INNER JOIN results in a maintainable and reusable query in
which the column names can be swapped in and out with changes in the
use case or table schema.
The programmer can notice column name mis-matches (e.g. item_ID vs itemID) sooner if they are forced to define the ON predicate.
Otherwise, a NATURAL JOIN is still a good choice for a quick, ad-hoc query.