Purpose of joins in structured query language - sql

What is the purpose of joins if we can collect data from multiple tables through
SELECT table1.a, table2.b FROM table1, table2 ...

The syntax you've shown is in fact a join. It's called an implicit join. The join syntax is called an explicit join, and has the same effect, with a few advantages:
It's the standard, ANSI, way of doing things
As the name says, it's explicit - it's easier to understand where the join is, and to separate the join conditions (in the on clause) from the logical conditions.
It's easier to specify different types of joins (inner/outer/cross), which some databases may not allow in the implicit form, at least not with a standard syntax.
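To make the equivalence concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables table1/table2 come from the question, but their columns (id, table1_id) and the sample rows are made up for illustration. Both spellings should return the same rows.

```python
import sqlite3

# Hypothetical schema and data, invented only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, a TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, table1_id INTEGER, b TEXT);
    INSERT INTO table1 VALUES (1, 'x'), (2, 'y');
    INSERT INTO table2 VALUES (10, 1, 'p'), (11, 1, 'q'), (12, 3, 'r');
""")

# Implicit join: tables separated by a comma, join condition in WHERE.
implicit_rows = conn.execute("""
    SELECT table1.a, table2.b
    FROM table1, table2
    WHERE table1.id = table2.table1_id
    ORDER BY table2.id
""").fetchall()

# Explicit join: the same condition, moved into the ON clause.
explicit_rows = conn.execute("""
    SELECT table1.a, table2.b
    FROM table1
    INNER JOIN table2 ON table1.id = table2.table1_id
    ORDER BY table2.id
""").fetchall()

print(implicit_rows)
print(explicit_rows)
```

Row 'y' from table1 and row 'r' from table2 have no partner, so neither query returns them.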

A JOIN allows you to return all or selected data from multiple tables into a single temporary table. Using single SELECT commands per table would leave you with multiple datasets rather than a single source.

Joins are useful for bringing data together from different tables based on their database relations.
To learn more about this topic, see http://www.codeproject.com/Articles/435694/Understanding-Table-Joins-using-SQL

The comma predates the explicit JOIN syntax. It is a cross join of two tables: all possible combinations of a row from each. WHERE keeps only the rows that meet its condition. Given comma, WHERE and nested SELECTs, the alternative spellings CROSS JOIN and JOIN with its own condition in ON are not strictly needed.
However, an OUTER JOIN ON a condition returns the rows that comma plus a WHERE-like restriction on that condition would return, plus the unmatched rows from the left table (LEFT JOIN), the right table (RIGHT JOIN) or both (FULL JOIN), each extended with NULLs for the columns of the other table. This requires specifying a match condition (which determines the unmatched rows that get NULL-extended and added) separately from any later restriction via WHERE.
But once an ON condition exists for OUTER JOINs, one might as well allow it for inner joins. An inner join returns what an outer join ON that condition returns, less any unmatched rows. That is the same as doing a cross join and then (like WHERE) restricting per the condition. So (INNER) JOIN is a comma with its own condition from ON, and CROSS JOIN is comma with no ON. (CROSS JOIN is like (INNER) JOIN ON 1=1.) (Also, comma has lower precedence than the explicit non-comma joins.)
TL;DR Comma is a low-precedence JOIN; we don't need explicit INNER JOIN or ON; OUTER JOINs need an ON distinct from WHERE; we might as well then add explicit inner JOIN ON syntax.
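The relationships described above can be checked with a small sketch. It uses Python's sqlite3 module and two hypothetical one-column tables l and r (names and data are assumptions, not from the answer). It compares comma against CROSS JOIN, comma plus WHERE against JOIN ON, and shows the extra NULL-extended row that LEFT JOIN adds.

```python
import sqlite3

# Hypothetical tables l and r, each with a single column k.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE l (k INTEGER);
    CREATE TABLE r (k INTEGER);
    INSERT INTO l VALUES (1), (2);
    INSERT INTO r VALUES (2), (3);
""")

def rows(sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Comma and CROSS JOIN: every combination of one row from each table.
comma = rows("SELECT l.k, r.k FROM l, r ORDER BY l.k, r.k")
cross = rows("SELECT l.k, r.k FROM l CROSS JOIN r ORDER BY l.k, r.k")

# Comma restricted by WHERE versus (INNER) JOIN with the condition in ON.
comma_where = rows("SELECT l.k, r.k FROM l, r WHERE l.k = r.k")
inner_on = rows("SELECT l.k, r.k FROM l JOIN r ON l.k = r.k")

# LEFT JOIN: the inner join rows plus unmatched left rows padded with NULL.
left_on = rows("SELECT l.k, r.k FROM l LEFT JOIN r ON l.k = r.k ORDER BY l.k")

print(comma)        # [(1, 2), (1, 3), (2, 2), (2, 3)]
print(comma_where)  # [(2, 2)]
print(left_on)      # [(1, None), (2, 2)]
```

The row (1, None) is the one that cannot be produced by comma plus WHERE alone, which is why outer joins need their own ON.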

Related

What if I don't use the JOIN keyword in a query?

I have a query where I am retrieving data from more than two tables. I am using the filter criteria in the WHERE clause but not using any JOIN keyword:
select
d.proc_code,
d.dos,
s.svc_type
from
claim_detail d, h_claim_hdr hh, car_svc s
where
d.bu_id="$inp_bu_id"
and
hh.bu_id="$inp_bu_id"
and
s.bu_id="$inp_bu_id"
and
d.audit_nbr="$inp_audit_nbr"
and
hh.audit_nbr="$inp_audit_nbr"
and
d.audit_nbr=hh.audit_nbr
and
s.car_svc_nbr=hh.aut_nbr
Is there a better way of writing this?
Although you are not using a JOIN keyword, your query does perform a JOIN.
A more "modern" way of writing your query (i.e. one following the ANSI SQL standard) would be as follows:
select
d.proc_code,
d.dos,
s.svc_type
from
claim_detail d
join
h_claim_hdr hh on d.audit_nbr=hh.audit_nbr
join
car_svc s on s.car_svc_nbr=hh.aut_nbr
where
d.bu_id="$inp_bu_id"
and
hh.bu_id="$inp_bu_id"
and
s.bu_id="$inp_bu_id"
and
d.audit_nbr="$inp_audit_nbr"
and
hh.audit_nbr="$inp_audit_nbr"
Note that this is simply a modern syntax. It expresses the same query, and it will not impact the performance.
Note that in order for a row to appear in the output of this query, the corresponding rows must exist in all three tables (i.e. it's an inner join). If you would like to return rows of claim_detail for which no h_claim_hdr and / or car_svc existed, use left outer join instead.
A comma in the from clause is essentially the same as a cross join. You really don't want to use a cross join, unless you really know what you are doing.
Proper join syntax has several advantages, the most important of which is the ability to express other types of joins easily and compatibly across databases.
Most people would probably find this version easier to follow and maintain:
select d.proc_code, d.dos, s.svc_type
from claim_detail d join
h_claim_hdr hh
on d.bu_id = hh.bu_id and d.audit_nbr = hh.audit_nbr join
car_svc s
on d.bu_id = s.bu_id and s.car_svc_nbr = hh.aut_nbr
where d.bu_id = "$inp_bu_id" and
d.audit_nbr = "$inp_audit_nbr";
Using the WHERE clause instead of the JOIN keyword is essentially a different syntax for doing a join. I believe it is called Theta syntax, where using the JOIN clause is called ANSI syntax.
I believe ANSI syntax is almost universally recommended, and some databases require ANSI syntax for outer JOINs.
If you do not use JOIN, it will be an implicit inner join, as in your example with the join criteria in your WHERE clause. So you could be missing records. Let's say you want all records from the first table even if there is no corresponding record in the second. Your current code would only return the records from the first table that have a matching record in the second.

When to use SQL natural join instead of join .. on?

I'm studying SQL for a database exam, and the way I've seen SQL written is the way it looks on this page:
http://en.wikipedia.org/wiki/Star_schema
I.e. a join written as JOIN <table name> ON <table attribute>, followed by the join condition for the selection. My course book and the exercises given to me by the academic institution, however, use only natural join in their examples. So when is it right to use natural join? Should natural join be used if the query can also be written using JOIN .. ON?
Thanks for any answer or comment
A natural join will find columns with the same name in both tables and add one column in the result for each pair found. The inner join lets you specify the comparison you want to make using any column.
IMO, the JOIN ON syntax is much more readable and maintainable than the natural join syntax. Natural joins are a leftover of some old standards, and I try to avoid them like the plague.
The JOIN keyword is used in an SQL statement to query data from two or more tables, based on a relationship between certain columns in these tables.
Different Joins
* JOIN: Return rows when there is a match in both tables
* LEFT JOIN: Return all rows from the left table, even if there are no matches in the right table
* RIGHT JOIN: Return all rows from the right table, even if there are no matches in the left table
* FULL JOIN: Return all rows from both tables, matched where possible, with NULLs where there is no match in the other table
INNER JOIN
http://www.w3schools.com/sql/sql_join_inner.asp
FULL JOIN
http://www.w3schools.com/sql/sql_join_full.asp
A natural join is said to be an abomination because it does not allow qualifying key columns, which makes it confusing: you never know which "common" columns are being used to join two tables simply by looking at the SQL statement.
A NATURAL JOIN matches on any shared column names between the tables, whereas an INNER JOIN only matches on the given ON condition.
The joins are often interchangeable and usually produce the same results. However, there are some important considerations to make:
* If a NATURAL JOIN finds no matching columns, it returns the cross product. This could produce disastrous results if the schema is modified. On the other hand, an INNER JOIN will return a 'column does not exist' error, which makes the mistake visible immediately. This is much more fault tolerant.
* An INNER JOIN self-documents with its ON clause, resulting in a clearer query that describes the table schema to the reader.
* An INNER JOIN results in a maintainable and reusable query in which the column names can be swapped in and out with changes in the use case or table schema.
* The programmer can notice column name mismatches (e.g. item_ID vs itemID) sooner if they are forced to define the ON predicate.
Otherwise, a NATURAL JOIN is still a good choice for a quick, ad-hoc query.
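The schema-change hazard mentioned above can be reproduced with a short sketch using Python's sqlite3 module. The item/sale tables and their columns are hypothetical, and the exact error text is SQLite's; other databases word it differently. The sale_renamed table stands in for a schema where the key column was renamed from item_id to itemID.

```python
import sqlite3

# Hypothetical tables, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (item_id INTEGER, label TEXT);
    CREATE TABLE sale (sale_id INTEGER, item_id INTEGER);
    INSERT INTO item VALUES (1, 'pen'), (2, 'ink');
    INSERT INTO sale VALUES (100, 1), (101, 1), (102, 2);

    -- Same data, but the key column is now spelled itemID.
    CREATE TABLE sale_renamed (sale_id INTEGER, itemID INTEGER);
    INSERT INTO sale_renamed SELECT sale_id, item_id FROM sale;
""")

# NATURAL JOIN silently picks the shared column name (item_id).
natural_rows = conn.execute(
    "SELECT label, sale_id FROM item NATURAL JOIN sale ORDER BY sale_id"
).fetchall()

# With no shared column name, NATURAL JOIN degrades to a cross product:
# 2 items x 3 sales = 6 rows, and no error is raised.
silent_cross_count = conn.execute(
    "SELECT COUNT(*) FROM item NATURAL JOIN sale_renamed"
).fetchone()[0]

# The explicit ON clause names the column, so the mismatch fails loudly.
try:
    conn.execute(
        "SELECT label, sale_id FROM item "
        "JOIN sale_renamed ON item.item_id = sale_renamed.item_id"
    )
    explicit_error = None
except sqlite3.OperationalError as exc:
    explicit_error = str(exc)

print(natural_rows)
print(silent_cross_count)
print(explicit_error)
```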

What are the uses of the different join operations?

What are the uses of the different join operations in SQL? Like I want to know why do we need the different inner and outer joins?
The only type of join you really need is LEFT OUTER JOIN. Every other type of join can be rewritten in terms of one or more left outer joins, and possibly some filtering. So why do we need all the others? Is it just to confuse people? Wouldn't it be simpler if there were only one type of join?
You could also ask: Why have both a <= b and b >= a? Don't these just do the same thing? Can't we just get rid of one of them? It would simplify things!
Sometimes it's easier to swap <= to >= instead of swapping the arguments round. Similarly, a left join and a right join are the same thing just with the operands swapped. But again it's practical to have both options instead of requiring people to write their queries in a specific order.
Another thing you could ask is: In logic why do we have AND, OR, NOT, XOR, NAND, NOR, etc? All these can be rewritten in terms of NANDs! Why not just have NAND? Well it's awkward to write an OR in terms of NANDs, and it's not as obvious what the intention is - if you write OR, people know immediately what you mean. If you write a bunch of NANDs, it is not obvious what you are trying to achieve.
Similarly, if you want to do a FULL OUTER JOIN b, you could make a left join and a right join and combine the two with a union, making sure the matched rows are not counted twice. But that's a pain and so there's a shorthand for it.
When do you use each one? Here's a simplified rule:
If you always want a result row for each row in the LEFT table, use a LEFT OUTER JOIN.
If you always want a result row for each row in the RIGHT table, use a RIGHT OUTER JOIN.
If you always want a result row for each row in either table, use a FULL OUTER JOIN.
If you only want a result row when there's a row in both tables, use an INNER JOIN.
If you want all possible pairs of rows, one row from each table, use a CROSS JOIN.
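As a rough illustration of the points above, the sketch below writes a RIGHT JOIN as a LEFT JOIN with the operands swapped, and builds a FULL OUTER JOIN from two left joins. It uses Python's sqlite3 module; the emp/dept tables are hypothetical. The emulation is used here only because older SQLite versions lack RIGHT and FULL joins, not because it is the preferred spelling.

```python
import sqlite3

# Hypothetical tables: 'bob' has no department, 'lab' has no employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER, dname TEXT);
    CREATE TABLE emp (ename TEXT, dept_id INTEGER);
    INSERT INTO dept VALUES (1, 'ops'), (2, 'lab');
    INSERT INTO emp VALUES ('ann', 1), ('bob', NULL);
""")

def rows(sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# A result row for every emp row: emp LEFT JOIN dept.
left = rows("""
    SELECT e.ename, d.dname
    FROM emp e LEFT JOIN dept d ON e.dept_id = d.id
    ORDER BY e.ename
""")

# A result row for every dept row: what "emp RIGHT JOIN dept" would give,
# written as a LEFT JOIN with the operands swapped.
right = rows("""
    SELECT e.ename, d.dname
    FROM dept d LEFT JOIN emp e ON e.dept_id = d.id
    ORDER BY d.id
""")

# A result row for every row of either table: the left join, plus only
# those dept rows that found no employee (so matches are not repeated).
full = rows("""
    SELECT e.ename, d.dname
    FROM emp e LEFT JOIN dept d ON e.dept_id = d.id
    UNION ALL
    SELECT e.ename, d.dname
    FROM dept d LEFT JOIN emp e ON e.dept_id = d.id
    WHERE e.dept_id IS NULL
""")

print(left)   # [('ann', 'ops'), ('bob', None)]
print(right)  # [('ann', 'ops'), (None, 'lab')]
print(full)
```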
inner join - returns the rows from both sets that match based on the specified criteria.
outer join - selects all of one set, along with matching or empty (if not matched) elements from the other set. Outer joins can be left or right, to specify which set is returned in its entirety.
To make the other answers clearer - YOU GET DIFFERENT RESULTS according to the join you choose, when the columns you're joining on contain null values - for example.
So - for each Real-life scenario there is a join that suits it (either you want the lines without the data or not in the null values example).
My answer assumes 2 tables joined on a single key:
INNER JOIN - get the results that are in both join tables (according to the join rule)
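FULL OUTER JOIN - get all results from both tables, matched where possible and filled with NULLs where not (this is not a Cartesian product; that is what CROSS JOIN returns)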
LEFT OUTER JOIN - get all the results from left table and the matching results from the right
You can add WHERE clauses in order to further constrain the results.
Use these in order to only get what you want to get.

Is there any difference between using innerjoin and writing all the tables directly in the from segment?

Do these two queries differ from each other?
Query 1:
SELECT * FROM Table1, Table2 WHERE Table1.Id = Table2.RefId
Query 2:
SELECT * FROM Table1 INNER JOIN Table2 ON Table1.Id = Table2.RefId
I analysed both methods and they clearly produced the same actual execution plans. Do you know of any cases where using inner joins would be more efficient? What is the real advantage of using inner joins rather than the approach in "Query 1"?
The two statements you have provided are functionally equivalent to one another.
The variation is caused by differing SQL syntax standards.
For a really exciting read, you can look up the various SQL standards by visiting the following Wikipedia link. On the right hand side are references and links to the various dialects/standards of SQL.
http://en.wikipedia.org/wiki/SQL
These SQL statements are synonymous, though specifying the INNER JOIN is the preferred method and follows ISO format. I prefer it as well because it limits the plumbing of joining the tables from your where clause and makes the goal of your query clearer.
These will result in an identical query plan, but the INNER JOIN, OUTER JOIN, CROSS JOIN keywords are preferred because they add clarity to the code.
While you have the ability to specify join hints using the keywords in the FROM clause, you can do more complicated joins in the WHERE clause. But otherwise, there will be no difference in query plan.
I will also add that the first syntax is much more subject to inadvertent cross joins as the queries get complicated. Further, the left and right joins in this syntax do not work properly in SQL Server and should never be used. Mixing the syntax when you add a left join can also cause problems where the query does not correctly return the results. The syntax in the first example has been outdated for 17 years; I see no reason to ever use it.
Query 1 is considered an old syntax style and its use is discouraged. You will run into problems when you use LEFT and RIGHT joins using that syntax style. Also, on SQL Server you can have problems mixing those two different styles together in queries that use views of different formats.
I have found a significant difference using the LEFT OUTER JOINS and putting the conditions on the joined table in the ON clause rather than the WHERE clause. Once you put a condition on the joined table in the WHERE clause, you defeat the left outer join.
When I was using Oracle, I used the archaic (+) after the joined table (with all conditions, including join conditions, in the WHERE clause) because that's what I knew. When we became a SQL Server shop, I was forced to use LEFT OUTER JOINs, and I found they didn't work as before until I discovered this behavior. Here's an example:
select NC.*,
IsNull(F.STRING_VAL, 'NONE') as USER_ID,
CO.TOTAL_AMT_ORDERED
from customer_order CO
INNER JOIN VTG_CO_NET_CHANGE NC
ON NC.CUST_ORDER_ID=CO.ID
LEFT OUTER JOIN USER_DEF_FIELDS F
ON F.DOCUMENT_ID = CO.ID and
F.PROGRAM_ID='VMORDENT' and
F.ID='UDF-0000072' and
F.DOCUMENT_ID is not null
where NC.acct_year=2017
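The behavior described above can be seen in a scaled-down sketch. It uses Python's sqlite3 module with two hypothetical tables loosely modeled on the query above (orders and user_fields are invented names), and SQLite's IFNULL in place of T-SQL's IsNull. The same filter on the joined table is placed once in ON and once in WHERE.

```python
import sqlite3

# Hypothetical, simplified tables; order 2 has no 'VMORDENT' field row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, amount INTEGER);
    CREATE TABLE user_fields (document_id INTEGER, program_id TEXT, string_val TEXT);
    INSERT INTO orders VALUES (1, 50), (2, 75);
    INSERT INTO user_fields VALUES (1, 'VMORDENT', 'alice'), (2, 'OTHER', 'bob');
""")

# Filter on the joined table inside ON: every order is preserved.
cond_in_on = conn.execute("""
    SELECT o.id, IFNULL(f.string_val, 'NONE')
    FROM orders o
    LEFT OUTER JOIN user_fields f
        ON f.document_id = o.id AND f.program_id = 'VMORDENT'
    ORDER BY o.id
""").fetchall()

# Same filter in WHERE: the NULL-extended row fails the test and is
# dropped, so the query behaves like an inner join.
cond_in_where = conn.execute("""
    SELECT o.id, IFNULL(f.string_val, 'NONE')
    FROM orders o
    LEFT OUTER JOIN user_fields f ON f.document_id = o.id
    WHERE f.program_id = 'VMORDENT'
    ORDER BY o.id
""").fetchall()

print(cond_in_on)     # [(1, 'alice'), (2, 'NONE')]
print(cond_in_where)  # [(1, 'alice')]
```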

INNER JOIN ON vs WHERE clause

For simplicity, assume all relevant fields are NOT NULL.
You can do:
SELECT
table1.this, table2.that, table2.somethingelse
FROM
table1, table2
WHERE
table1.foreignkey = table2.primarykey
AND (some other conditions)
Or else:
SELECT
table1.this, table2.that, table2.somethingelse
FROM
table1 INNER JOIN table2
ON table1.foreignkey = table2.primarykey
WHERE
(some other conditions)
Do these two work in the same way in MySQL?
INNER JOIN is ANSI syntax that you should use.
It is generally considered more readable, especially when you join lots of tables.
It can also be easily replaced with an OUTER JOIN whenever a need arises.
The WHERE syntax is more relational model oriented.
A result of two tables JOINed is a cartesian product of the tables to which a filter is applied which selects only those rows with joining columns matching.
It's easier to see this with the WHERE syntax.
As for your example, in MySQL (and in SQL generally) these two queries are synonyms.
Also, note that MySQL also has a STRAIGHT_JOIN clause.
Using this clause, you can control the JOIN order: which table is scanned in the outer loop and which one is in the inner loop.
You cannot control this in MySQL using WHERE syntax.
Others have pointed out that INNER JOIN helps human readability, and that's a top priority, I agree.
Let me try to explain why the join syntax is more readable.
A basic SELECT query is this:
SELECT stuff
FROM tables
WHERE conditions
The SELECT clause tells us what we're getting back; the FROM clause tells us where we're getting it from, and the WHERE clause tells us which ones we're getting.
JOIN is a statement about the tables, how they are bound together (conceptually, actually, into a single table).
Any query elements that control the tables - where we're getting stuff from - semantically belong to the FROM clause (and of course, that's where JOIN elements go). Putting joining-elements into the WHERE clause conflates the which and the where-from, that's why the JOIN syntax is preferred.
Applying conditional statements in ON / WHERE
Here I have explained the logical query processing steps.
Reference: Inside Microsoft® SQL Server™ 2005 T-SQL Querying
Publisher: Microsoft Press
Pub Date: March 07, 2006
Print ISBN-10: 0-7356-2313-9
Print ISBN-13: 978-0-7356-2313-2
Pages: 640
(8) SELECT (9) DISTINCT (11) TOP <top_specification> <select_list>
(1) FROM <left_table>
(3) <join_type> JOIN <right_table>
(2) ON <join_condition>
(4) WHERE <where_condition>
(5) GROUP BY <group_by_list>
(6) WITH {CUBE | ROLLUP}
(7) HAVING <having_condition>
(10) ORDER BY <order_by_list>
The first noticeable aspect of SQL that is different than other programming languages is the order in which the code is processed. In most programming languages, the code is processed in the order in which it is written. In SQL, the first clause that is processed is the FROM clause, while the SELECT clause, which appears first, is processed almost last.
Each step generates a virtual table that is used as the input to the following step. These virtual tables are not available to the caller (client application or outer query). Only the table generated by the final step is returned to the caller. If a certain clause is not specified in a query, the corresponding step is simply skipped.
Brief Description of Logical Query Processing Phases
Don't worry too much if the description of the steps doesn't seem to make much sense for now. These are provided as a reference. Sections that come after the scenario example will cover the steps in much more detail.
(1) FROM: A Cartesian product (cross join) is performed between the first two tables in the FROM clause, and as a result, virtual table VT1 is generated.
(2) ON: The ON filter is applied to VT1. Only rows for which the <join_condition> is TRUE are inserted to VT2.
(3) OUTER (join): If an OUTER JOIN is specified (as opposed to a CROSS JOIN or an INNER JOIN), rows from the preserved table or tables for which a match was not found are added to the rows from VT2 as outer rows, generating VT3. If more than two tables appear in the FROM clause, steps 1 through 3 are applied repeatedly between the result of the last join and the next table in the FROM clause until all tables are processed.
(4) WHERE: The WHERE filter is applied to VT3. Only rows for which the <where_condition> is TRUE are inserted to VT4.
(5) GROUP BY: The rows from VT4 are arranged in groups based on the column list specified in the GROUP BY clause. VT5 is generated.
(6) CUBE | ROLLUP: Supergroups (groups of groups) are added to the rows from VT5, generating VT6.
(7) HAVING: The HAVING filter is applied to VT6. Only groups for which the <having_condition> is TRUE are inserted to VT7.
(8) SELECT: The SELECT list is processed, generating VT8.
(9) DISTINCT: Duplicate rows are removed from VT8. VT9 is generated.
(10) ORDER BY: The rows from VT9 are sorted according to the column list specified in the ORDER BY clause. A cursor is generated (VC10).
(11) TOP: The specified number or percentage of rows is selected from the beginning of VC10. Table VT11 is generated and returned to the caller.
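The first four phases can be mimicked in a few lines of plain Python. This is only a toy model of the logical steps for a LEFT OUTER JOIN, not how any engine actually executes a query. The function name logical_left_join, the sample rows, and the predicates are all invented for illustration.

```python
def logical_left_join(left, right, on, where):
    """Toy model of logical phases 1-4 for 'left LEFT OUTER JOIN right'."""
    # (1) FROM: Cartesian product of the two inputs -> VT1
    vt1 = [(i, l, r) for i, l in enumerate(left) for r in right]
    # (2) ON: keep only pairs for which the join condition is true -> VT2
    vt2 = [(i, l, r) for i, l, r in vt1 if on(l, r)]
    # (3) OUTER: add back preserved (left) rows that found no match,
    #     with None standing in for the NULL-filled right side -> VT3
    matched = {i for i, _, _ in vt2}
    vt3 = vt2 + [(i, l, None) for i, l in enumerate(left) if i not in matched]
    # (4) WHERE: filter VT3 -> VT4
    return [(l, r) for _, l, r in vt3 if where(l, r)]


# Hypothetical data: customer 1 has no orders, customer 2 has two.
customers = [{"id": 1}, {"id": 2}]
orders = [
    {"customer_id": 2, "status": "open"},
    {"customer_id": 2, "status": "closed"},
]

# Status test in ON: customer 1 survives as an outer row in phase 3.
in_on = logical_left_join(
    customers, orders,
    on=lambda c, o: c["id"] == o["customer_id"] and o["status"] == "open",
    where=lambda c, o: True,
)

# Status test in WHERE: the outer row is added in phase 3, then removed
# in phase 4 because a comparison against NULL is not true.
in_where = logical_left_join(
    customers, orders,
    on=lambda c, o: c["id"] == o["customer_id"],
    where=lambda c, o: o is not None and o["status"] == "open",
)

print(len(in_on))     # 2
print(len(in_where))  # 1
```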
Therefore, in this logical order, (INNER JOIN) ON filters the data (the row count of the virtual table is reduced at that step) before the WHERE clause is applied. Subsequent joins then operate on the already filtered rows, which can reduce the work done by later steps. Only after that are the WHERE conditions applied. Keep in mind that this is the logical order; the optimizer may evaluate the predicates in a different physical order as long as the result is the same.
(Whether a condition is placed in ON or in WHERE will make little difference in many cases. It depends on how many tables are joined and on the number of rows in each of the joined tables.)
The implicit join ANSI syntax is older, less obvious, and not recommended.
In addition, the relational algebra allows interchangeability of the predicates in the WHERE clause and the INNER JOIN, so even INNER JOIN queries with WHERE clauses can have the predicates rearranged by the optimizer.
I recommend you write the queries in the most readable way possible.
Sometimes this includes making the INNER JOIN relatively "incomplete" and putting some of the criteria in the WHERE simply to make the lists of filtering criteria more easily maintainable.
For example, instead of:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
AND c.State = 'NY'
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
AND a.Status = 1
Write:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
AND a.Status = 1
But it depends, of course.
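For inner joins, moving the filters between ON and WHERE should not change the result. The sketch below checks that on a tiny data set. It uses Python's sqlite3 module and the table names from the example above, but the column types and the sample rows are assumptions.

```python
import sqlite3

# Table names follow the example above; the data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER, State TEXT);
    CREATE TABLE CustomerAccounts (CustomerID INTEGER, AccountID INTEGER);
    CREATE TABLE Accounts (AccountID INTEGER, Status INTEGER);
    INSERT INTO Customers VALUES (1, 'NY'), (2, 'CA');
    INSERT INTO CustomerAccounts VALUES (1, 10), (1, 11), (2, 12);
    INSERT INTO Accounts VALUES (10, 1), (11, 0), (12, 1);
""")

# Filters folded into the ON clauses.
filters_in_on = conn.execute("""
    SELECT c.CustomerID, a.AccountID
    FROM Customers c
    INNER JOIN CustomerAccounts ca
        ON ca.CustomerID = c.CustomerID AND c.State = 'NY'
    INNER JOIN Accounts a
        ON ca.AccountID = a.AccountID AND a.Status = 1
    ORDER BY a.AccountID
""").fetchall()

# Join conditions in ON, filters collected in WHERE.
filters_in_where = conn.execute("""
    SELECT c.CustomerID, a.AccountID
    FROM Customers c
    INNER JOIN CustomerAccounts ca ON ca.CustomerID = c.CustomerID
    INNER JOIN Accounts a ON ca.AccountID = a.AccountID
    WHERE c.State = 'NY' AND a.Status = 1
    ORDER BY a.AccountID
""").fetchall()

print(filters_in_on)
print(filters_in_where)
```

This equivalence holds for inner joins only; with outer joins the placement of a filter does change the result.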
Implicit joins (which is what your first query is known as) become much much more confusing, hard to read, and hard to maintain once you need to start adding more tables to your query. Imagine doing that same query and type of join on four or five different tables ... it's a nightmare.
Using an explicit join (your second example) is much more readable and easy to maintain.
I'll also point out that using the older syntax is more subject to error. In many databases (SQL Server, for example), an inner join without an ON clause is a syntax error. If you use the older syntax and forget one of the join conditions in the WHERE clause, you will get a cross join. Developers often "fix" this by adding the DISTINCT keyword (rather than fixing the join, because they still don't realize the join itself is broken), which may appear to cure the problem but will slow down the query considerably.
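A small sketch of that failure mode, using Python's sqlite3 module and three hypothetical tables (customers, orders, shipments; all names and rows are made up). One join condition is left out of the comma-style query, and DISTINCT is then added as the "fix". Note that SQLite itself accepts JOIN without ON, so the syntax-error safeguard depends on the database.

```python
import sqlite3

# Hypothetical data: order 2 was never shipped.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER);
    CREATE TABLE shipments (id INTEGER, order_id INTEGER);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO shipments VALUES (500, 1), (501, 3);
""")

# Intended query: shipped orders with their customer.
correct = conn.execute("""
    SELECT c.name, o.id
    FROM customers c, orders o, shipments s
    WHERE c.id = o.customer_id
      AND o.id = s.order_id
    ORDER BY o.id
""").fetchall()

# The condition on shipments is forgotten: every order is paired with
# every shipment (an accidental cross join).
forgotten = conn.execute("""
    SELECT c.name, o.id
    FROM customers c, orders o, shipments s
    WHERE c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

# DISTINCT hides the duplicates but not the wrong row for order 2.
patched = conn.execute("""
    SELECT DISTINCT c.name, o.id
    FROM customers c, orders o, shipments s
    WHERE c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

print(correct)         # [('ann', 1), ('bob', 3)]
print(len(forgotten))  # 6
print(patched)         # [('ann', 1), ('ann', 2), ('bob', 3)]
```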
Additionally for maintenance if you have a cross join in the old syntax, how will the maintainer know if you meant to have one (there are situations where cross joins are needed) or if it was an accident that should be fixed?
Let me point you to this question to see why the implicit syntax is bad if you use left joins.
Sybase *= to Ansi Standard with 2 different outer tables for same inner table
Plus (personal rant here), the standard using the explicit joins is over 20 years old, which means implicit join syntax has been outdated for those 20 years. Would you write application code using a syntax that has been outdated for 20 years? Why do you want to write database code that is?
The SQL:2003 standard changed some precedence rules so a JOIN statement takes precedence over a "comma" join. This can actually change the results of your query depending on how it is set up. This caused some problems for some people when MySQL 5.0.12 switched to adhering to the standard.
So in your example, your queries would work the same. But if you added a third table:
SELECT ... FROM table1, table2 JOIN table3 ON ... WHERE ...
Prior to MySQL 5.0.12, table1 and table2 would be joined first, then table3. Now (5.0.12 and on), table2 and table3 are joined first, then table1. It doesn't always change the results, but it can and you may not even realize it.
I never use the "comma" syntax anymore, opting for your second example. It's a lot more readable anyway, the JOIN conditions are with the JOINs, not separated into a separate query section.
They have a different human-readable meaning.
However, depending on the query optimizer, they may have the same meaning to the machine.
You should always code to be readable.
That is to say, if this is a built-in relationship, use the explicit join. If you are matching on weakly related data, use the WHERE clause.
I know you're talking about MySQL, but anyway:
In Oracle 9 explicit joins and implicit joins would generate different execution plans. AFAIK that has been solved in Oracle 10+: there's no such difference anymore.
If you are often programming dynamic stored procedures, you will fall in love with your second example (using where). If you have various input parameters and lots of morph mess, then that is the only way. Otherwise, they both will run the same query plan so there is definitely no obvious difference in classic queries.
ANSI join syntax is definitely more portable.
I'm going through an upgrade of Microsoft SQL Server, and I would also mention that the =* and *= syntax for outer joins in SQL Server is not supported (without compatibility mode) for 2005 SQL server and later.
I have two points for the implicit join (The second example):
Tell the database what you want, not what it should do.
You can write all tables in a clear list that is not cluttered by join conditions. Then you can much more easily read which tables are mentioned. The conditions all come in the WHERE part, where they are also lined up one below the other. Using the JOIN keyword mixes up tables and conditions.