SQL: SELECT from different tables [duplicate] - sql

What is a SQL JOIN and what are different types?

An illustration from W3schools:

What is SQL JOIN ?
SQL JOIN is a method to retrieve data from two or more database tables.
What are the different SQL JOINs ?
There are a total of five JOINs. They are :
1. JOIN or INNER JOIN
2. OUTER JOIN
2.1 LEFT OUTER JOIN or LEFT JOIN
2.2 RIGHT OUTER JOIN or RIGHT JOIN
2.3 FULL OUTER JOIN or FULL JOIN
3. NATURAL JOIN
4. CROSS JOIN
5. SELF JOIN
1. JOIN or INNER JOIN :
In this kind of a JOIN, we get all records that match the condition in both tables, and records in both tables that do not match are not reported.
In other words, INNER JOIN is based on the single fact that: ONLY the matching entries in BOTH the tables SHOULD be listed.
Note that a JOIN without any other JOIN keywords (like INNER, OUTER, LEFT, etc) is an INNER JOIN. In other words, JOIN is
a Syntactic sugar for INNER JOIN (see: Difference between JOIN and INNER JOIN).
2. OUTER JOIN :
OUTER JOIN retrieves
Either,
the matched rows from one table and all rows in the other table
Or,
all rows in all tables (it doesn't matter whether or not there is a match).
There are three kinds of Outer Join :
2.1 LEFT OUTER JOIN or LEFT JOIN
This join returns all the rows from the left table in conjunction with the matching rows from the
right table. If there are no columns matching in the right table, it returns NULL values.
2.2 RIGHT OUTER JOIN or RIGHT JOIN
This JOIN returns all the rows from the right table in conjunction with the matching rows from the
left table. If there are no columns matching in the left table, it returns NULL values.
2.3 FULL OUTER JOIN or FULL JOIN
This JOIN combines LEFT OUTER JOIN and RIGHT OUTER JOIN. It returns rows from either table when the conditions are met and returns NULL value when there is no match.
In other words, OUTER JOIN is based on the fact that: ONLY the matching entries in ONE OF the tables (RIGHT or LEFT) or BOTH of the tables(FULL) SHOULD be listed.
Note that `OUTER JOIN` is a loosened form of `INNER JOIN`.
3. NATURAL JOIN :
It is based on the two conditions :
the JOIN is made on all the columns with the same name for equality.
Removes duplicate columns from the result.
This seems to be more of theoretical in nature and as a result (probably) most DBMS
don't even bother supporting this.
4. CROSS JOIN :
It is the Cartesian product of the two tables involved. The result of a CROSS JOIN will not make sense
in most of the situations. Moreover, we won't need this at all (or needs the least, to be precise).
5. SELF JOIN :
It is not a different form of JOIN, rather it is a JOIN (INNER, OUTER, etc) of a table to itself.
JOINs based on Operators
Depending on the operator used for a JOIN clause, there can be two types of JOINs. They are
Equi JOIN
Theta JOIN
1. Equi JOIN :
For whatever JOIN type (INNER, OUTER, etc), if we use ONLY the equality operator (=), then we say that
the JOIN is an EQUI JOIN.
2. Theta JOIN :
This is same as EQUI JOIN but it allows all other operators like >, <, >= etc.
Many consider both EQUI JOIN and Theta JOIN similar to INNER, OUTER
etc JOINs. But I strongly believe that its a mistake and makes the
ideas vague. Because INNER JOIN, OUTER JOIN etc are all connected with
the tables and their data whereas EQUI JOIN and THETA JOIN are only
connected with the operators we use in the former.
Again, there are many who consider NATURAL JOIN as some sort of
"peculiar" EQUI JOIN. In fact, it is true, because of the first
condition I mentioned for NATURAL JOIN. However, we don't have to
restrict that simply to NATURAL JOINs alone. INNER JOINs, OUTER JOINs
etc could be an EQUI JOIN too.

Definition:
JOINS are way to query the data that combined together from multiple tables simultaneously.
Types of JOINS:
Concern to RDBMS there are 5-types of joins:
Equi-Join: Combines common records from two tables based on equality condition.
Technically, Join made by using equality-operator (=) to compare values of Primary Key of one table and Foreign Key values of another table, hence result set includes common(matched) records from both tables. For implementation see INNER-JOIN.
Natural-Join: It is enhanced version of Equi-Join, in which SELECT
operation omits duplicate column. For implementation see INNER-JOIN
Non-Equi-Join: It is reverse of Equi-join where joining condition is uses other than equal operator(=) e.g, !=, <=, >=, >, < or BETWEEN etc. For implementation see INNER-JOIN.
Self-Join:: A customized behavior of join where a table combined with itself; This is typically needed for querying self-referencing tables (or Unary relationship entity).
For implementation see INNER-JOINs.
Cartesian Product: It cross combines all records of both tables without any condition. Technically, it returns the result set of a query without WHERE-Clause.
As per SQL concern and advancement, there are 3-types of joins and all RDBMS joins can be achieved using these types of joins.
INNER-JOIN: It merges(or combines) matched rows from two tables. The matching is done based on common columns of tables and their comparing operation. If equality based condition then: EQUI-JOIN performed, otherwise Non-EQUI-Join.
OUTER-JOIN: It merges(or combines) matched rows from two tables and unmatched rows with NULL values. However, can customized selection of un-matched rows e.g, selecting unmatched row from first table or second table by sub-types: LEFT OUTER JOIN and RIGHT OUTER JOIN.
2.1. LEFT Outer JOIN (a.k.a, LEFT-JOIN): Returns matched rows from two tables and unmatched from the LEFT table(i.e, first table) only.
2.2. RIGHT Outer JOIN (a.k.a, RIGHT-JOIN): Returns matched rows from two tables and unmatched from the RIGHT table only.
2.3. FULL OUTER JOIN (a.k.a OUTER JOIN): Returns matched and unmatched from both tables.
CROSS-JOIN: This join does not merges/combines instead it performs Cartesian product.
Note: Self-JOIN can be achieved by either INNER-JOIN, OUTER-JOIN and CROSS-JOIN based on requirement but the table must join with itself.
For more information:
Examples:
1.1: INNER-JOIN: Equi-join implementation
SELECT *
FROM Table1 A
INNER JOIN Table2 B ON A.<Primary-Key> =B.<Foreign-Key>;
1.2: INNER-JOIN: Natural-JOIN implementation
Select A.*, B.Col1, B.Col2 --But no B.ForeignKeyColumn in Select
FROM Table1 A
INNER JOIN Table2 B On A.Pk = B.Fk;
1.3: INNER-JOIN with NON-Equi-join implementation
Select *
FROM Table1 A INNER JOIN Table2 B On A.Pk <= B.Fk;
1.4: INNER-JOIN with SELF-JOIN
Select *
FROM Table1 A1 INNER JOIN Table1 A2 On A1.Pk = A2.Fk;
2.1: OUTER JOIN (full outer join)
Select *
FROM Table1 A FULL OUTER JOIN Table2 B On A.Pk = B.Fk;
2.2: LEFT JOIN
Select *
FROM Table1 A LEFT OUTER JOIN Table2 B On A.Pk = B.Fk;
2.3: RIGHT JOIN
Select *
FROM Table1 A RIGHT OUTER JOIN Table2 B On A.Pk = B.Fk;
3.1: CROSS JOIN
Select *
FROM TableA CROSS JOIN TableB;
3.2: CROSS JOIN-Self JOIN
Select *
FROM Table1 A1 CROSS JOIN Table1 A2;
//OR//
Select *
FROM Table1 A1,Table1 A2;

Interestingly most other answers suffer from these two problems:
They focus on basic forms of join only
They (ab)use Venn diagrams, which are an inaccurate tool for visualising joins (they're much better for unions).
I've recently written an article on the topic: A Probably Incomplete, Comprehensive Guide to the Many Different Ways to JOIN Tables in SQL, which I'll summarise here.
First and foremost: JOINs are cartesian products
This is why Venn diagrams explain them so inaccurately, because a JOIN creates a cartesian product between the two joined tables. Wikipedia illustrates it nicely:
The SQL syntax for cartesian products is CROSS JOIN. For example:
SELECT *
-- This just generates all the days in January 2017
FROM generate_series(
'2017-01-01'::TIMESTAMP,
'2017-01-01'::TIMESTAMP + INTERVAL '1 month -1 day',
INTERVAL '1 day'
) AS days(day)
-- Here, we're combining all days with all departments
CROSS JOIN departments
Which combines all rows from one table with all rows from the other table:
Source:
+--------+ +------------+
| day | | department |
+--------+ +------------+
| Jan 01 | | Dept 1 |
| Jan 02 | | Dept 2 |
| ... | | Dept 3 |
| Jan 30 | +------------+
| Jan 31 |
+--------+
Result:
+--------+------------+
| day | department |
+--------+------------+
| Jan 01 | Dept 1 |
| Jan 01 | Dept 2 |
| Jan 01 | Dept 3 |
| Jan 02 | Dept 1 |
| Jan 02 | Dept 2 |
| Jan 02 | Dept 3 |
| ... | ... |
| Jan 31 | Dept 1 |
| Jan 31 | Dept 2 |
| Jan 31 | Dept 3 |
+--------+------------+
If we just write a comma separated list of tables, we'll get the same:
-- CROSS JOINing two tables:
SELECT * FROM table1, table2
INNER JOIN (Theta-JOIN)
An INNER JOIN is just a filtered CROSS JOIN where the filter predicate is called Theta in relational algebra.
For instance:
SELECT *
-- Same as before
FROM generate_series(
'2017-01-01'::TIMESTAMP,
'2017-01-01'::TIMESTAMP + INTERVAL '1 month -1 day',
INTERVAL '1 day'
) AS days(day)
-- Now, exclude all days/departments combinations for
-- days before the department was created
JOIN departments AS d ON day >= d.created_at
Note that the keyword INNER is optional (except in MS Access).
(look at the article for result examples)
EQUI JOIN
A special kind of Theta-JOIN is equi JOIN, which we use most. The predicate joins the primary key of one table with the foreign key of another table. If we use the Sakila database for illustration, we can write:
SELECT *
FROM actor AS a
JOIN film_actor AS fa ON a.actor_id = fa.actor_id
JOIN film AS f ON f.film_id = fa.film_id
This combines all actors with their films.
Or also, on some databases:
SELECT *
FROM actor
JOIN film_actor USING (actor_id)
JOIN film USING (film_id)
The USING() syntax allows for specifying a column that must be present on either side of a JOIN operation's tables and creates an equality predicate on those two columns.
NATURAL JOIN
Other answers have listed this "JOIN type" separately, but that doesn't make sense. It's just a syntax sugar form for equi JOIN, which is a special case of Theta-JOIN or INNER JOIN. NATURAL JOIN simply collects all columns that are common to both tables being joined and joins USING() those columns. Which is hardly ever useful, because of accidental matches (like LAST_UPDATE columns in the Sakila database).
Here's the syntax:
SELECT *
FROM actor
NATURAL JOIN film_actor
NATURAL JOIN film
OUTER JOIN
Now, OUTER JOIN is a bit different from INNER JOIN as it creates a UNION of several cartesian products. We can write:
-- Convenient syntax:
SELECT *
FROM a LEFT JOIN b ON <predicate>
-- Cumbersome, equivalent syntax:
SELECT a.*, b.*
FROM a JOIN b ON <predicate>
UNION ALL
SELECT a.*, NULL, NULL, ..., NULL
FROM a
WHERE NOT EXISTS (
SELECT * FROM b WHERE <predicate>
)
No one wants to write the latter, so we write OUTER JOIN (which is usually better optimised by databases).
Like INNER, the keyword OUTER is optional, here.
OUTER JOIN comes in three flavours:
LEFT [ OUTER ] JOIN: The left table of the JOIN expression is added to the union as shown above.
RIGHT [ OUTER ] JOIN: The right table of the JOIN expression is added to the union as shown above.
FULL [ OUTER ] JOIN: Both tables of the JOIN expression are added to the union as shown above.
All of these can be combined with the keyword USING() or with NATURAL (I've actually had a real world use-case for a NATURAL FULL JOIN recently)
Alternative syntaxes
There are some historic, deprecated syntaxes in Oracle and SQL Server, which supported OUTER JOIN already before the SQL standard had a syntax for this:
-- Oracle
SELECT *
FROM actor a, film_actor fa, film f
WHERE a.actor_id = fa.actor_id(+)
AND fa.film_id = f.film_id(+)
-- SQL Server
SELECT *
FROM actor a, film_actor fa, film f
WHERE a.actor_id *= fa.actor_id
AND fa.film_id *= f.film_id
Having said so, don't use this syntax. I just list this here so you can recognise it from old blog posts / legacy code.
Partitioned OUTER JOIN
Few people know this, but the SQL standard specifies partitioned OUTER JOIN (and Oracle implements it). You can write things like this:
WITH
-- Using CONNECT BY to generate all dates in January
days(day) AS (
SELECT DATE '2017-01-01' + LEVEL - 1
FROM dual
CONNECT BY LEVEL <= 31
),
-- Our departments
departments(department, created_at) AS (
SELECT 'Dept 1', DATE '2017-01-10' FROM dual UNION ALL
SELECT 'Dept 2', DATE '2017-01-11' FROM dual UNION ALL
SELECT 'Dept 3', DATE '2017-01-12' FROM dual UNION ALL
SELECT 'Dept 4', DATE '2017-04-01' FROM dual UNION ALL
SELECT 'Dept 5', DATE '2017-04-02' FROM dual
)
SELECT *
FROM days
LEFT JOIN departments
PARTITION BY (department) -- This is where the magic happens
ON day >= created_at
Parts of the result:
+--------+------------+------------+
| day | department | created_at |
+--------+------------+------------+
| Jan 01 | Dept 1 | | -- Didn't match, but still get row
| Jan 02 | Dept 1 | | -- Didn't match, but still get row
| ... | Dept 1 | | -- Didn't match, but still get row
| Jan 09 | Dept 1 | | -- Didn't match, but still get row
| Jan 10 | Dept 1 | Jan 10 | -- Matches, so get join result
| Jan 11 | Dept 1 | Jan 10 | -- Matches, so get join result
| Jan 12 | Dept 1 | Jan 10 | -- Matches, so get join result
| ... | Dept 1 | Jan 10 | -- Matches, so get join result
| Jan 31 | Dept 1 | Jan 10 | -- Matches, so get join result
The point here is that all rows from the partitioned side of the join will wind up in the result regardless if the JOIN matched anything on the "other side of the JOIN". Long story short: This is to fill up sparse data in reports. Very useful!
SEMI JOIN
Seriously? No other answer got this? Of course not, because it doesn't have a native syntax in SQL, unfortunately (just like ANTI JOIN below). But we can use IN() and EXISTS(), e.g. to find all actors who have played in films:
SELECT *
FROM actor a
WHERE EXISTS (
SELECT * FROM film_actor fa
WHERE a.actor_id = fa.actor_id
)
The WHERE a.actor_id = fa.actor_id predicate acts as the semi join predicate. If you don't believe it, check out execution plans, e.g. in Oracle. You'll see that the database executes a SEMI JOIN operation, not the EXISTS() predicate.
ANTI JOIN
This is just the opposite of SEMI JOIN (be careful not to use NOT IN though, as it has an important caveat)
Here are all the actors without films:
SELECT *
FROM actor a
WHERE NOT EXISTS (
SELECT * FROM film_actor fa
WHERE a.actor_id = fa.actor_id
)
Some folks (especially MySQL people) also write ANTI JOIN like this:
SELECT *
FROM actor a
LEFT JOIN film_actor fa
USING (actor_id)
WHERE film_id IS NULL
I think the historic reason is performance.
LATERAL JOIN
OMG, this one is too cool. I'm the only one to mention it? Here's a cool query:
SELECT a.first_name, a.last_name, f.*
FROM actor AS a
LEFT OUTER JOIN LATERAL (
SELECT f.title, SUM(amount) AS revenue
FROM film AS f
JOIN film_actor AS fa USING (film_id)
JOIN inventory AS i USING (film_id)
JOIN rental AS r USING (inventory_id)
JOIN payment AS p USING (rental_id)
WHERE fa.actor_id = a.actor_id -- JOIN predicate with the outer query!
GROUP BY f.film_id
ORDER BY revenue DESC
LIMIT 5
) AS f
ON true
It will find the TOP 5 revenue producing films per actor. Every time you need a TOP-N-per-something query, LATERAL JOIN will be your friend. If you're a SQL Server person, then you know this JOIN type under the name APPLY
SELECT a.first_name, a.last_name, f.*
FROM actor AS a
OUTER APPLY (
SELECT f.title, SUM(amount) AS revenue
FROM film AS f
JOIN film_actor AS fa ON f.film_id = fa.film_id
JOIN inventory AS i ON f.film_id = i.film_id
JOIN rental AS r ON i.inventory_id = r.inventory_id
JOIN payment AS p ON r.rental_id = p.rental_id
WHERE fa.actor_id = a.actor_id -- JOIN predicate with the outer query!
GROUP BY f.film_id
ORDER BY revenue DESC
LIMIT 5
) AS f
OK, perhaps that's cheating, because a LATERAL JOIN or APPLY expression is really a "correlated subquery" that produces several rows. But if we allow for "correlated subqueries", we can also talk about...
MULTISET
This is only really implemented by Oracle and Informix (to my knowledge), but it can be emulated in PostgreSQL using arrays and/or XML and in SQL Server using XML.
MULTISET produces a correlated subquery and nests the resulting set of rows in the outer query. The below query selects all actors and for each actor collects their films in a nested collection:
SELECT a.*, MULTISET (
SELECT f.*
FROM film AS f
JOIN film_actor AS fa USING (film_id)
WHERE a.actor_id = fa.actor_id
) AS films
FROM actor
As you have seen, there are more types of JOIN than just the "boring" INNER, OUTER, and CROSS JOIN that are usually mentioned. More details in my article. And please, stop using Venn diagrams to illustrate them.

I have created an illustration that explains better than words, in my opinion:

I'm going to push my pet peeve: the USING keyword.
If both tables on both sides of the JOIN have their foreign keys properly named (ie, same name, not just "id) then this can be used:
SELECT ...
FROM customers JOIN orders USING (customer_id)
I find this very practical, readable, and not used often enough.

Related

SQL column added twice during INNER JOIN

I am trying to join two tables from a database; energyImport and sunAlt.
energyImport has three columns: timestamp_id, energy, duration.
sunAlt has two columns: timestamp_id, altitude
I am doing an inner join on these two tables using the SQL:
SELECT *
FROM energyImport
INNER JOIN sunAz ON sunAz.timestamp_id = energyImport.timestamp_id;
The output from this is:
timestamp_id,duration,energy,timestamp_id,altitude
1601769600,1800,81310,1601769600,0.0
1601771400,1800,78915,1601771400,0.0
1601773200,1800,78305,1601773200,0.0
The problem is that the timestamp_id column is repeated. How can I join these columns and only include the first timestamp_id?
Replace the * with
energyImport.timestamp_id,energyImport.duration, energyImport.energy,
sunAz.altitude
Either you specify the columns that you want in the results:
SELECT e.timestamp_id, e.duration, e.energy, s.altitude
FROM energyImport e INNER JOIN sunAz s
ON s.timestamp_id = e.timestamp_id;
Or, use NATURAL instead of INNER join, so that the join is based on the columns(s) with the same names of the 2 tables (if this fits your requirement), because NATURAL join returns only 1 of each pair of these columns:
SELECT *
FROM energyImport NATURAL JOIN sunAz;
See the demo.

BigQuery Left Join based on st_dwithin condition acting like an Inner Join

I need help with the following...
I've created a query that should join the records from another table based on a certain distance between two coordinates. I end up with a table that only has records with matching location names (like an inner join). I need every record in the table_customer_x and locationname should be a null if the distance between any location for that customer is > 250.
The query that I created:
SELECT t.customerid, t.geolatitude, t.geolongitude, tt.locationname
FROM `table_customer_x` t
LEFT JOIN `table_location` tt
on ST_DWITHIN(ST_GEOGPOINT(t.geoLatitude,t.geoLongitude), ST_GEOGPOINT(tt.latitude, tt.longitude), 250)
where tt.customer_id= 204
and t.timestamp > "2016-01-01"
and tt.latitude <= 90 and tt.latitude >= -90
table_customer_x looks like:
timestamp geoLatitude geoLongitude
2018-01-01 00:00:00 52.000 4.000
table_location looks like:
latitude longitude name customer_id
52.010 4.010 hospital x 204
[Why] BigQuery Left Join based on st_dwithin condition acting like an Inner Join
In BigQuery, Spatial JOINs are implemented for INNER JOIN and CROSS JOIN operators with the following standard SQL predicate functions:
ST_DWithin
ST_Intersects
ST_Contains
ST_Within
ST_Covers
ST_CoveredBy
ST_Equals
ST_Touches
So, you cannot expect LEFT JOIN to work properly in your case - instead - your left JOIN is "converted" into CROSS JOIN with filter in ON clause moved into Where clause
So result you see is as expected
Summary - you just need to rewrite your query :o)
You can try something like below to workaround (not tested - just possible direction for you)
#standardSQL
SELECT tt.customer_id, t.geolatitude, t.geolongitude, tt.name
FROM `project.dataset.table_customer_x` t
JOIN `project.dataset.table_location` tt
ON ST_DWITHIN(ST_GEOGPOINT(t.geoLatitude,t.geoLongitude), ST_GEOGPOINT(tt.latitude, tt.longitude), 250)
UNION ALL
SELECT tt.customer_id, t.geolatitude, t.geolongitude, tt.name
FROM `project.dataset.table_customer_x` t
JOIN `project.dataset.table_location` tt
ON NOT ST_DWITHIN(ST_GEOGPOINT(t.geoLatitude,t.geoLongitude), ST_GEOGPOINT(tt.latitude, tt.longitude), 250)
WHERE tt.customer_id= 204
AND t.timestamp > "2016-01-01"
AND tt.latitude <= 90 AND tt.latitude >= -90
It could have been a BigQuery bug, seems to be fixed now.
Geospatial outer join is not yet implemented, so this query should fail with message LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
The workaround is to simulate outer join using inner join: do inner join, then union with unmatched rows on the left side. It requires some unique key on the outer side to work properly, I'm not sure if you have one in table_customer_x.

SQL: Proper JOIN Protocol

I have the following tables with the following attributes:
Op(OpNo, OpName, Date)
OpConvert(OpNo, M_OpNo, Source_ID, Date)
Source(Source_ID, Source_Name, Date)
Fleet(OpNo, S_No, Date)
I have the current multiple JOIN query which gives me the results that I want:
SELECT O.OpNo AS Op_NO, O.OpName, O.Date AS Date_Entered, C.*
FROM Op O
LEFT OUTER JOIN OpConvert C
ON O.OpNo = C.OpNo
LEFT OUTER JOIN Source D
ON C.Source_ID = D.Source_ID
WHERE C.OpNo IS NOT NULL
The problem is this. I need to join the Fleet table on the previous multiple JOIN statement to attach the relevant S_No to the multiple JOIN table. Would I still be able to accomplish this using a LEFT OUTER JOIN or would I have to use a different JOIN statement? Also, which table would I JOIN on?
Please note that I am only familiar with LEFT OUTER JOINS.
Thanks.
I guess in your case you could use INNER JOIN or LEFT JOIN (which is the same thing as LEFT OUTER JOIN in SQL Server.
INNER JOIN means that it will only return records from other tables only if there are corresponding records (based on the join condition) in the Fleet table.
LEFT JOIN means that it will return records from other tables even if there are no corresponding records (based on the join condition) in the Fleet table. All columns from Fleet will return NULL in this case.
As for which table to join, you should really join the table that makes more logical sense based on your data structure.
Yes, you can use all tables mentioned before in your join conditions. Actually, JOINS (no matter of INNER, LEFT OUTER, RIGHT OUTER, CROSS, FULL OUTER or whatever) are left- associative, i. e. they are implicitly evaluated as if they would have been included in parentheses from the left as follows:
FROM ( ( ( Op O
LEFT OUTER JOIN OpConvert C
ON O.OpNo = C.OpNo
)
LEFT OUTER JOIN Source D
ON C.Source_ID = D.Source_ID
)
LEFT OUTER JOIN Fleet
ON ...
)
This is similar to how + or - would implicitly use parentheses, i. e.
2 + 3 - 4 - 5
is evaluated as
(((2 + 3) - 4) - 5)
By the way: If you use C.OpNo IS NOT NULL, then the LEFT OUTER JOIN Source D is treated as if it were an INNER JOIN, as you are explicitly removing all the "OUTER" rows.

SQL JOIN and different types of JOINs

What is a SQL JOIN and what are different types?
An illustration from W3schools:
What is SQL JOIN ?
SQL JOIN is a method to retrieve data from two or more database tables.
What are the different SQL JOINs ?
There are a total of five JOINs. They are :
1. JOIN or INNER JOIN
2. OUTER JOIN
2.1 LEFT OUTER JOIN or LEFT JOIN
2.2 RIGHT OUTER JOIN or RIGHT JOIN
2.3 FULL OUTER JOIN or FULL JOIN
3. NATURAL JOIN
4. CROSS JOIN
5. SELF JOIN
1. JOIN or INNER JOIN :
In this kind of a JOIN, we get all records that match the condition in both tables, and records in both tables that do not match are not reported.
In other words, INNER JOIN is based on the single fact that: ONLY the matching entries in BOTH the tables SHOULD be listed.
Note that a JOIN without any other JOIN keywords (like INNER, OUTER, LEFT, etc) is an INNER JOIN. In other words, JOIN is
a Syntactic sugar for INNER JOIN (see: Difference between JOIN and INNER JOIN).
2. OUTER JOIN :
OUTER JOIN retrieves
Either,
the matched rows from one table and all rows in the other table
Or,
all rows in all tables (it doesn't matter whether or not there is a match).
There are three kinds of Outer Join :
2.1 LEFT OUTER JOIN or LEFT JOIN
This join returns all the rows from the left table in conjunction with the matching rows from the
right table. If there are no columns matching in the right table, it returns NULL values.
2.2 RIGHT OUTER JOIN or RIGHT JOIN
This JOIN returns all the rows from the right table in conjunction with the matching rows from the
left table. If there are no columns matching in the left table, it returns NULL values.
2.3 FULL OUTER JOIN or FULL JOIN
This JOIN combines LEFT OUTER JOIN and RIGHT OUTER JOIN. It returns rows from either table when the conditions are met and returns NULL value when there is no match.
In other words, OUTER JOIN is based on the fact that: ONLY the matching entries in ONE OF the tables (RIGHT or LEFT) or BOTH of the tables(FULL) SHOULD be listed.
Note that `OUTER JOIN` is a loosened form of `INNER JOIN`.
3. NATURAL JOIN :
It is based on the two conditions :
the JOIN is made on all the columns with the same name for equality.
Removes duplicate columns from the result.
This seems to be more of theoretical in nature and as a result (probably) most DBMS
don't even bother supporting this.
4. CROSS JOIN :
It is the Cartesian product of the two tables involved. The result of a CROSS JOIN will not make sense
in most of the situations. Moreover, we won't need this at all (or needs the least, to be precise).
5. SELF JOIN :
It is not a different form of JOIN, rather it is a JOIN (INNER, OUTER, etc) of a table to itself.
JOINs based on Operators
Depending on the operator used for a JOIN clause, there can be two types of JOINs. They are
Equi JOIN
Theta JOIN
1. Equi JOIN :
For whatever JOIN type (INNER, OUTER, etc), if we use ONLY the equality operator (=), then we say that
the JOIN is an EQUI JOIN.
2. Theta JOIN :
This is same as EQUI JOIN but it allows all other operators like >, <, >= etc.
Many consider both EQUI JOIN and Theta JOIN similar to INNER, OUTER
etc JOINs. But I strongly believe that its a mistake and makes the
ideas vague. Because INNER JOIN, OUTER JOIN etc are all connected with
the tables and their data whereas EQUI JOIN and THETA JOIN are only
connected with the operators we use in the former.
Again, there are many who consider NATURAL JOIN as some sort of
"peculiar" EQUI JOIN. In fact, it is true, because of the first
condition I mentioned for NATURAL JOIN. However, we don't have to
restrict that simply to NATURAL JOINs alone. INNER JOINs, OUTER JOINs
etc could be an EQUI JOIN too.
Definition:
JOINS are way to query the data that combined together from multiple tables simultaneously.
Types of JOINS:
Concern to RDBMS there are 5-types of joins:
Equi-Join: Combines common records from two tables based on equality condition.
Technically, Join made by using equality-operator (=) to compare values of Primary Key of one table and Foreign Key values of another table, hence result set includes common(matched) records from both tables. For implementation see INNER-JOIN.
Natural-Join: It is enhanced version of Equi-Join, in which SELECT
operation omits duplicate column. For implementation see INNER-JOIN
Non-Equi-Join: It is reverse of Equi-join where joining condition is uses other than equal operator(=) e.g, !=, <=, >=, >, < or BETWEEN etc. For implementation see INNER-JOIN.
Self-Join:: A customized behavior of join where a table combined with itself; This is typically needed for querying self-referencing tables (or Unary relationship entity).
For implementation see INNER-JOINs.
Cartesian Product: It cross combines all records of both tables without any condition. Technically, it returns the result set of a query without WHERE-Clause.
As per SQL concern and advancement, there are 3-types of joins and all RDBMS joins can be achieved using these types of joins.
INNER-JOIN: It merges(or combines) matched rows from two tables. The matching is done based on common columns of tables and their comparing operation. If equality based condition then: EQUI-JOIN performed, otherwise Non-EQUI-Join.
OUTER-JOIN: It merges(or combines) matched rows from two tables and unmatched rows with NULL values. However, can customized selection of un-matched rows e.g, selecting unmatched row from first table or second table by sub-types: LEFT OUTER JOIN and RIGHT OUTER JOIN.
2.1. LEFT Outer JOIN (a.k.a, LEFT-JOIN): Returns matched rows from two tables and unmatched from the LEFT table(i.e, first table) only.
2.2. RIGHT Outer JOIN (a.k.a, RIGHT-JOIN): Returns matched rows from two tables and unmatched from the RIGHT table only.
2.3. FULL OUTER JOIN (a.k.a OUTER JOIN): Returns matched and unmatched from both tables.
CROSS-JOIN: This join does not merges/combines instead it performs Cartesian product.
Note: Self-JOIN can be achieved by either INNER-JOIN, OUTER-JOIN and CROSS-JOIN based on requirement but the table must join with itself.
For more information:
Examples:
1.1: INNER-JOIN: Equi-join implementation
SELECT *
FROM Table1 A
INNER JOIN Table2 B ON A.<Primary-Key> =B.<Foreign-Key>;
1.2: INNER-JOIN: Natural-JOIN implementation
Select A.*, B.Col1, B.Col2 --But no B.ForeignKeyColumn in Select
FROM Table1 A
INNER JOIN Table2 B On A.Pk = B.Fk;
1.3: INNER-JOIN with NON-Equi-join implementation
Select *
FROM Table1 A INNER JOIN Table2 B On A.Pk <= B.Fk;
1.4: INNER-JOIN with SELF-JOIN
Select *
FROM Table1 A1 INNER JOIN Table1 A2 On A1.Pk = A2.Fk;
2.1: OUTER JOIN (full outer join)
Select *
FROM Table1 A FULL OUTER JOIN Table2 B On A.Pk = B.Fk;
2.2: LEFT JOIN
Select *
FROM Table1 A LEFT OUTER JOIN Table2 B On A.Pk = B.Fk;
2.3: RIGHT JOIN
Select *
FROM Table1 A RIGHT OUTER JOIN Table2 B On A.Pk = B.Fk;
3.1: CROSS JOIN
Select *
FROM TableA CROSS JOIN TableB;
3.2: CROSS JOIN-Self JOIN
Select *
FROM Table1 A1 CROSS JOIN Table1 A2;
//OR//
Select *
FROM Table1 A1,Table1 A2;
Interestingly most other answers suffer from these two problems:
They focus on basic forms of join only
They (ab)use Venn diagrams, which are an inaccurate tool for visualising joins (they're much better for unions).
I've recently written an article on the topic: A Probably Incomplete, Comprehensive Guide to the Many Different Ways to JOIN Tables in SQL, which I'll summarise here.
First and foremost: JOINs are cartesian products
This is why Venn diagrams explain them so inaccurately, because a JOIN creates a cartesian product between the two joined tables. Wikipedia illustrates it nicely:
The SQL syntax for cartesian products is CROSS JOIN. For example:
SELECT *
-- This just generates all the days in January 2017
FROM generate_series(
'2017-01-01'::TIMESTAMP,
'2017-01-01'::TIMESTAMP + INTERVAL '1 month -1 day',
INTERVAL '1 day'
) AS days(day)
-- Here, we're combining all days with all departments
CROSS JOIN departments
Which combines all rows from one table with all rows from the other table:
Source:
+--------+ +------------+
| day | | department |
+--------+ +------------+
| Jan 01 | | Dept 1 |
| Jan 02 | | Dept 2 |
| ... | | Dept 3 |
| Jan 30 | +------------+
| Jan 31 |
+--------+
Result:
+--------+------------+
| day | department |
+--------+------------+
| Jan 01 | Dept 1 |
| Jan 01 | Dept 2 |
| Jan 01 | Dept 3 |
| Jan 02 | Dept 1 |
| Jan 02 | Dept 2 |
| Jan 02 | Dept 3 |
| ... | ... |
| Jan 31 | Dept 1 |
| Jan 31 | Dept 2 |
| Jan 31 | Dept 3 |
+--------+------------+
If we just write a comma separated list of tables, we'll get the same:
-- CROSS JOINing two tables:
SELECT * FROM table1, table2
INNER JOIN (Theta-JOIN)
An INNER JOIN is just a filtered CROSS JOIN where the filter predicate is called Theta in relational algebra.
For instance:
SELECT *
-- Same as before
FROM generate_series(
'2017-01-01'::TIMESTAMP,
'2017-01-01'::TIMESTAMP + INTERVAL '1 month -1 day',
INTERVAL '1 day'
) AS days(day)
-- Now, exclude all days/departments combinations for
-- days before the department was created
JOIN departments AS d ON day >= d.created_at
Note that the keyword INNER is optional (except in MS Access).
(look at the article for result examples)
EQUI JOIN
A special kind of Theta-JOIN is equi JOIN, which we use most. The predicate joins the primary key of one table with the foreign key of another table. If we use the Sakila database for illustration, we can write:
SELECT *
FROM actor AS a
JOIN film_actor AS fa ON a.actor_id = fa.actor_id
JOIN film AS f ON f.film_id = fa.film_id
This combines all actors with their films.
Or also, on some databases:
SELECT *
FROM actor
JOIN film_actor USING (actor_id)
JOIN film USING (film_id)
The USING() syntax allows for specifying a column that must be present on either side of a JOIN operation's tables and creates an equality predicate on those two columns.
NATURAL JOIN
Other answers have listed this "JOIN type" separately, but that doesn't make sense. It's just a syntax sugar form for equi JOIN, which is a special case of Theta-JOIN or INNER JOIN. NATURAL JOIN simply collects all columns that are common to both tables being joined and joins USING() those columns. Which is hardly ever useful, because of accidental matches (like LAST_UPDATE columns in the Sakila database).
Here's the syntax:
SELECT *
FROM actor
NATURAL JOIN film_actor
NATURAL JOIN film
OUTER JOIN
Now, OUTER JOIN is a bit different from INNER JOIN as it creates a UNION of several cartesian products. We can write:
-- Convenient syntax:
SELECT *
FROM a LEFT JOIN b ON <predicate>
-- Cumbersome, equivalent syntax:
SELECT a.*, b.*
FROM a JOIN b ON <predicate>
UNION ALL
SELECT a.*, NULL, NULL, ..., NULL
FROM a
WHERE NOT EXISTS (
SELECT * FROM b WHERE <predicate>
)
No one wants to write the latter, so we write OUTER JOIN (which is usually better optimised by databases).
Like INNER, the keyword OUTER is optional, here.
OUTER JOIN comes in three flavours:
LEFT [ OUTER ] JOIN: The left table of the JOIN expression is added to the union as shown above.
RIGHT [ OUTER ] JOIN: The right table of the JOIN expression is added to the union as shown above.
FULL [ OUTER ] JOIN: Both tables of the JOIN expression are added to the union as shown above.
All of these can be combined with the keyword USING() or with NATURAL (I've actually had a real world use-case for a NATURAL FULL JOIN recently)
Alternative syntaxes
There are some historic, deprecated syntaxes in Oracle and SQL Server, which supported OUTER JOIN already before the SQL standard had a syntax for this:
-- Oracle
SELECT *
FROM actor a, film_actor fa, film f
WHERE a.actor_id = fa.actor_id(+)
AND fa.film_id = f.film_id(+)
-- SQL Server
SELECT *
FROM actor a, film_actor fa, film f
WHERE a.actor_id *= fa.actor_id
AND fa.film_id *= f.film_id
Having said so, don't use this syntax. I just list this here so you can recognise it from old blog posts / legacy code.
Partitioned OUTER JOIN
Few people know this, but the SQL standard specifies partitioned OUTER JOIN (and Oracle implements it). You can write things like this:
WITH
-- Using CONNECT BY to generate all dates in January
days(day) AS (
SELECT DATE '2017-01-01' + LEVEL - 1
FROM dual
CONNECT BY LEVEL <= 31
),
-- Our departments
departments(department, created_at) AS (
SELECT 'Dept 1', DATE '2017-01-10' FROM dual UNION ALL
SELECT 'Dept 2', DATE '2017-01-11' FROM dual UNION ALL
SELECT 'Dept 3', DATE '2017-01-12' FROM dual UNION ALL
SELECT 'Dept 4', DATE '2017-04-01' FROM dual UNION ALL
SELECT 'Dept 5', DATE '2017-04-02' FROM dual
)
SELECT *
FROM days
LEFT JOIN departments
PARTITION BY (department) -- This is where the magic happens
ON day >= created_at
Parts of the result:
+--------+------------+------------+
| day | department | created_at |
+--------+------------+------------+
| Jan 01 | Dept 1 | | -- Didn't match, but still get row
| Jan 02 | Dept 1 | | -- Didn't match, but still get row
| ... | Dept 1 | | -- Didn't match, but still get row
| Jan 09 | Dept 1 | | -- Didn't match, but still get row
| Jan 10 | Dept 1 | Jan 10 | -- Matches, so get join result
| Jan 11 | Dept 1 | Jan 10 | -- Matches, so get join result
| Jan 12 | Dept 1 | Jan 10 | -- Matches, so get join result
| ... | Dept 1 | Jan 10 | -- Matches, so get join result
| Jan 31 | Dept 1 | Jan 10 | -- Matches, so get join result
The point here is that all rows from the partitioned side of the join will wind up in the result regardless if the JOIN matched anything on the "other side of the JOIN". Long story short: This is to fill up sparse data in reports. Very useful!
SEMI JOIN
Seriously? No other answer got this? Of course not, because it doesn't have a native syntax in SQL, unfortunately (just like ANTI JOIN below). But we can use IN() and EXISTS(), e.g. to find all actors who have played in films:
SELECT *
FROM actor a
WHERE EXISTS (
SELECT * FROM film_actor fa
WHERE a.actor_id = fa.actor_id
)
The WHERE a.actor_id = fa.actor_id predicate acts as the semi join predicate. If you don't believe it, check out execution plans, e.g. in Oracle. You'll see that the database executes a SEMI JOIN operation, not the EXISTS() predicate.
ANTI JOIN
This is just the opposite of SEMI JOIN (be careful not to use NOT IN though, as it has an important caveat)
Here are all the actors without films:
SELECT *
FROM actor a
WHERE NOT EXISTS (
SELECT * FROM film_actor fa
WHERE a.actor_id = fa.actor_id
)
Some folks (especially MySQL people) also write ANTI JOIN like this:
SELECT *
FROM actor a
LEFT JOIN film_actor fa
USING (actor_id)
WHERE film_id IS NULL
I think the historic reason is performance.
LATERAL JOIN
OMG, this one is too cool. I'm the only one to mention it? Here's a cool query:
SELECT a.first_name, a.last_name, f.*
FROM actor AS a
LEFT OUTER JOIN LATERAL (
SELECT f.title, SUM(amount) AS revenue
FROM film AS f
JOIN film_actor AS fa USING (film_id)
JOIN inventory AS i USING (film_id)
JOIN rental AS r USING (inventory_id)
JOIN payment AS p USING (rental_id)
WHERE fa.actor_id = a.actor_id -- JOIN predicate with the outer query!
GROUP BY f.film_id
ORDER BY revenue DESC
LIMIT 5
) AS f
ON true
It will find the TOP 5 revenue producing films per actor. Every time you need a TOP-N-per-something query, LATERAL JOIN will be your friend. If you're a SQL Server person, then you know this JOIN type under the name APPLY
SELECT a.first_name, a.last_name, f.*
FROM actor AS a
OUTER APPLY (
SELECT f.title, SUM(amount) AS revenue
FROM film AS f
JOIN film_actor AS fa ON f.film_id = fa.film_id
JOIN inventory AS i ON f.film_id = i.film_id
JOIN rental AS r ON i.inventory_id = r.inventory_id
JOIN payment AS p ON r.rental_id = p.rental_id
WHERE fa.actor_id = a.actor_id -- JOIN predicate with the outer query!
GROUP BY f.film_id
ORDER BY revenue DESC
LIMIT 5
) AS f
OK, perhaps that's cheating, because a LATERAL JOIN or APPLY expression is really a "correlated subquery" that produces several rows. But if we allow for "correlated subqueries", we can also talk about...
MULTISET
This is only really implemented by Oracle and Informix (to my knowledge), but it can be emulated in PostgreSQL using arrays and/or XML and in SQL Server using XML.
MULTISET produces a correlated subquery and nests the resulting set of rows in the outer query. The below query selects all actors and for each actor collects their films in a nested collection:
SELECT a.*, MULTISET (
SELECT f.*
FROM film AS f
JOIN film_actor AS fa USING (film_id)
WHERE a.actor_id = fa.actor_id
) AS films
FROM actor
As you have seen, there are more types of JOIN than just the "boring" INNER, OUTER, and CROSS JOIN that are usually mentioned. More details in my article. And please, stop using Venn diagrams to illustrate them.
I have created an illustration that explains better than words, in my opinion:
I'm going to push my pet peeve: the USING keyword.
If both tables on both sides of the JOIN have their foreign keys properly named (ie, same name, not just "id) then this can be used:
SELECT ...
FROM customers JOIN orders USING (customer_id)
I find this very practical, readable, and not used often enough.

What's the difference between "where" clause and "on" clause when table left join?

SQL1:
select t1.f1,t2.f2
from t1
left join t2 on t1.f1 = t2.f2 and t1.f2=1 and t1.f3=0
SQL2:
select t1.f1,t2.f2
from t1
left join t2 on t1.f1 = t2.f2
where t1.f2=1 and t1.f3=0
The difference is where and on clause, is it returning same result? and what's the difference? does DBMS run them in same way? Thanks.
The where clause applies to the whole resultset; the on clause only applies to the join in question.
In the example supplied, all of the additional conditions related to fields on the inner side of the join - so in this example, the two queries are effectively identical.
However, if you had included a condition on a value in the table in the outer side of the join, it would have made a significant difference.
You can get more from this link: http://ask.sqlservercentral.com/questions/80067/sql-data-filter-condition-in-join-vs-where-clause
For example:
select t1.f1,t2.f2 from t1 left join t2 on t1.f1 = t2.f2 and t2.f4=1
select t1.f1,t2.f2 from t1 left join t2 on t1.f1 = t2.f2 where t2.f4=1
- do different things - the former will left join to t2 records where f4 is 1, while the latter has effectively been turned back into an inner join to t2.
The first query is quicker than the second one as the join condition is more specific than the second one: it does not makes sense to return records that you will filter with the where clause (it would be better do not return them at all- query1)
Anyway it really depends by the query optimizer.
have a look at the below:
Is a JOIN faster than a WHERE?
It is important to understand the logical order of SQL operations when thinking about SQL syntax. JOIN is an operator (and ON belongs to the relevant JOIN) in the FROM clause. The FROM clause is the first operation to be executed logically (optimisers can still choose to reorder things).
In your example, there isn't really a difference, but it is easy to construct one, as I've shown in this blog post about the difference between ON and WHERE in OUTER JOIN (the example from the blog post uses the Sakila database):
First query
SELECT a.actor_id, a.first_name, a.last_name, count(fa.film_id)
FROM actor a
LEFT JOIN film_actor fa ON a.actor_id = fa.actor_id
WHERE fa.film_id < 10
GROUP BY a.actor_id, a.first_name, a.last_name
ORDER BY count(fa.film_id) ASC;
Yields:
ACTOR_ID FIRST_NAME LAST_NAME COUNT
--------------------------------------
194 MERYL ALLEN 1
198 MARY KEITEL 1
30 SANDRA PECK 1
85 MINNIE ZELLWEGER 1
123 JULIANNE DENCH 1
Because we filtered the outer joined table in the WHERE clause, the LEFT JOIN was effectively turned into an INNER JOIN. Why? Because if we had an actor that didn't play in a film, that actor's only row would have fa.film_id IS NULL, and the fa.film_id < 10 predicate would thus yield NULL. Such actors are excluded from the result, just as with an INNER JOIN.
Second query
SELECT a.actor_id, a.first_name, a.last_name, count(fa.film_id)
FROM actor a
LEFT JOIN film_actor fa ON a.actor_id = fa.actor_id
AND fa.film_id < 10
GROUP BY a.actor_id, a.first_name, a.last_name
ORDER BY count(fa.film_id) ASC;
Yields:
ACTOR_ID FIRST_NAME LAST_NAME COUNT
-----------------------------------------
3 ED CHASE 0
4 JENNIFER DAVIS 0
5 JOHNNY LOLLOBRIGIDA 0
6 BETTE NICHOLSON 0
...
1 PENELOPE GUINESS 1
200 THORA TEMPLE 1
2 NICK WAHLBERG 1
198 MARY KEITEL 1
Now, the actors without films are included in the result, because the fa.film_id < 10 predicate is part of the LEFT JOIN's ON predicate
Conclusion
Always place predicates where they make most sense logically.
Are they part of your JOIN operation? Place them in ON
Are they filters on your entire JOIN product? Place them in WHERE
The two queries are NOT identical.
Mark Bannister was right in pointing out that the where clause is applied to the whole result set but the on clause applies to the join.
In your case, for SQL 1 LEFT JOIN conditions filter joins on the right but the left side is always returned before any WHERE filtering. Since there are no WHERE conditions all of t1 is always returned.
In SQL 2, the LEFT JOIN conditions filter some results showing up on the right but again all t1 is returned. But this time the WHERE conditions may filter some records of t1 away.
INSERT INTO `t1` (`f1`,`f2`,`f3`) VALUES (1,1,1);
INSERT INTO `t2` (`f3`) VALUES (1);
Since they point to different logic the query must be written based on that and it gives us great power and flexibility.
An INNER JOIN however returns the same result so yes check the optimiser.
The relational algebra allows interchangeability of the predicates in the WHERE clause and the INNER JOIN, so even INNER JOIN queries with WHERE clauses can have the predicates rearrranged by the optimizer so that they may already be excluded during the JOIN process.
I recommend you write the queries in the most readble way possible.
Sometimes this includes making the INNER JOIN relatively "incomplete" and putting some of the criteria in the WHERE simply to make the lists of filtering criteria more easily maintainable.
You can get more from this link:
http://ask.sqlservercentral.com/questions/80067/sql-data-filter-condition-in-join-vs-where-clause
For example, instead of:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
AND c.State = 'NY'
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
AND a.Status = 1
Write:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
AND a.Status = 1
But it depends, of course.
1)
SQL1: select t1.f1,t2.f2 from t1 left join t2 on t1.f1 = t2.f2 **and** t1.f2=1 and t1.f3=0
In this, parser will check each row of t1 with each row of t2 with these 3 conditions. Getting faster result.
2) SQL2: select t1.f1,t2.f2 from t1 left join t2 on t1.f1 = t2.f2 **where** t1.f2=1 and t1.f3=0
In this, join only take 1st condition and then the result got from join is filtered with those 2 conditions. And will take more time than 1st query.
You can get more from this link: http://ask.sqlservercentral.com/questions/80067/sql-data-filter-condition-in-join-vs-where-clause