What's the difference between Join and UNION in SQL - sql

I'm just wondering what's the difference between Join and Union in SQL and in Join, what's the difference between Join and Cross Join? THANKS!

Join: combines rows from the tables based on a condition. Let's say Table A has 2 rows, rowA1 and rowA2, and you join it with Table B, which has 3 rows, rowB1, rowB2 and rowB3. If no condition restricts which rows match (a cross join), the result will be:
rowA1.data RowB1.data
rowA1.data RowB2.data
rowA1.data RowB3.data
rowA2.data RowB1.data
rowA2.data RowB2.data
rowA2.data RowB3.data
But in a union, the result will be:
rowA1.data
rowA2.data
rowB1.data
rowB2.data
rowB3.data
Union also removes duplicate rows (use UNION ALL to keep them). Both queries must return the same number of columns, in the same order, and with compatible data types.
Join is a concept. It comes in various types, like Inner Join, Outer Join and Cross Join. A Cross Join has no join condition to restrict which rows are paired, so every row of one table is combined with every row of the other.
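To make the row counts above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names a and b and their contents are made up for illustration, and SQLite is only an assumption; the same statements should behave the same way in other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (data TEXT);
    CREATE TABLE b (data TEXT);
    INSERT INTO a VALUES ('A1'), ('A2');
    INSERT INTO b VALUES ('B1'), ('B2'), ('B3');
""")

# CROSS JOIN: every row of a paired with every row of b (2 x 3 = 6 rows).
cross_rows = conn.execute("SELECT a.data, b.data FROM a CROSS JOIN b").fetchall()

# UNION: rows of both tables stacked into one column (2 + 3 = 5 rows).
union_rows = conn.execute("SELECT data FROM a UNION SELECT data FROM b").fetchall()

# UNION removes duplicate rows, UNION ALL keeps them.
dedup = conn.execute("SELECT data FROM a UNION SELECT data FROM a").fetchall()
keep_all = conn.execute("SELECT data FROM a UNION ALL SELECT data FROM a").fetchall()

print(len(cross_rows), len(union_rows), len(dedup), len(keep_all))
```

A join widens the result (columns from both tables side by side), while a union lengthens it (rows from both queries in the same columns).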

Related

Terminology for when a join propagates out additional rows

When joining between two tables/queries:
with
cte1 (id) as (
select 1 from dual),
cte2 (id) as (
select 1 from dual union all
select 1 from dual)
select
cte1.id as cte1_id,
cte2.id as cte2_id
from
cte1
left join
cte2
on cte1.id = cte2.id
CTE1_ID CTE2_ID
1 1
1 1
Unsurprisingly, that join propagates out additional rows. The query on the left side of the join only had one row. But the resultset has two rows due to the join.
I suspect “propagate” isn’t quite the right word for describing that scenario.
What’s the proper term?
For example, when talking to people who are new to SQL, I often say, “Be careful with that join. It looks like you’re accidentally propagating out additional rows, since the join is 1:many.”
In this example you are not propagating rows (at least in my understanding anyway). You have two rows in the table on the right side of the join and you have two rows in the result.
However, if you had this:
WITH cte AS
(
SELECT 1 AS id FROM dual
UNION ALL
SELECT 1 FROM dual
)
SELECT x.id, y.id
FROM cte x
INNER JOIN cte y ON x.id = y.id;
You would start with 2 rows and the query would return 4, because the join is partial. To me, this is propagating data.
When every row from one side of the join is joined with every row on the other, the term you are looking for is a "cartesian product", which is achieved in SQL using a "cross join". In cases where the join is not unique but is partially limited, you could use "partial cartesian product" (though I don't recommend it) or, more commonly, "partial cross-join". I think the latter is more likely to be readily appreciated by SQL developers.
In either case, there are times where both can be appropriate but a lot of the time they are the result of an error in a join clause.
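Both queries discussed in this thread can be reproduced in a small sketch. It assumes SQLite through Python's sqlite3 module instead of Oracle's dual, and the table names one_row and two_rows are hypothetical stand-ins for the CTEs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE one_row (id INTEGER);
    CREATE TABLE two_rows (id INTEGER);
    INSERT INTO one_row VALUES (1);
    INSERT INTO two_rows VALUES (1), (1);
""")

# The question's case: 1 row on the left, 2 matching rows on the right.
left_join = conn.execute(
    "SELECT x.id, y.id FROM one_row x LEFT JOIN two_rows y ON x.id = y.id"
).fetchall()

# The answer's case: a non-unique key joined to itself, 2 x 2 matches.
self_join = conn.execute(
    "SELECT x.id, y.id FROM two_rows x INNER JOIN two_rows y ON x.id = y.id"
).fetchall()

print(len(left_join), len(self_join))
```

The first query returns 2 rows and the second returns 4, because every pair of rows sharing the same id is returned once.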
What’s the proper term?
"Cartesian Product" could be one term you can use.
I.e. "Be careful of that join. It looks like you are accidentally returning the cartesian product of the two tables."
A CROSS JOIN will return the cartesian product of the two joined tables; it is also called a "Cartesian Join".
An INNER JOIN will return the cartesian product of the two joined tables, filtered by some relationship (the join condition(s)) between columns of the two tables; when the condition uses only equality comparisons it is also called an "Equi Join".
An OUTER JOIN is similar to the INNER JOIN but will also return the non-matched rows on one (for LEFT or RIGHT joins) or both (for FULL joins) sides of the join condition.
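The relationship between the three join types can be checked with a short sketch. It assumes SQLite through Python's sqlite3 module; the tables l and r and the helper rows() are hypothetical names used only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE l (id INTEGER);
    CREATE TABLE r (id INTEGER);
    INSERT INTO l VALUES (1), (2), (3);
    INSERT INTO r VALUES (2), (3), (4);
""")

def rows(sql):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Full cartesian product: 3 x 3 = 9 rows.
cross = rows("SELECT l.id, r.id FROM l CROSS JOIN r")

# An inner join gives the same rows as the cartesian product plus a filter.
inner = rows("SELECT l.id, r.id FROM l INNER JOIN r ON l.id = r.id ORDER BY l.id")
filtered_cross = rows(
    "SELECT l.id, r.id FROM l CROSS JOIN r WHERE l.id = r.id ORDER BY l.id"
)

# A left outer join additionally keeps the unmatched left row, padded with NULL.
left = rows("SELECT l.id, r.id FROM l LEFT JOIN r ON l.id = r.id ORDER BY l.id")

print(len(cross), inner, left)
```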
Why a LEFT JOIN? An INNER JOIN would do the same here!
And no, it doesn't propagate: cte2 contains two rows with id 1 (that is what UNION ALL produces), so when you join both tables on the same id you receive 2 rows in the joined result set.
A LEFT JOIN also takes all rows of the left table and tries to join them, in your case by id; if it doesn't find a match, it adds the row with the right table's columns as NULL.
So no wonders and no miracles, just simple SQL.

Altering Order of tables in JOIN condition

Given the below scenario:
Table A has 1000 rows and Table B has 5000 rows.
Q1: Select * from Table_A Left Outer Join Table_B
ON condition
Q2: Select * from Table_B Left Outer Join Table_A
ON condition
Does this make any difference ? Would there be any performance difference in these situations?
Yes, it makes a big difference for a LEFT JOIN. The two statements are not the same, and the execution paths are likely to be different.
The first query keeps all rows in Table A, plus any matching values from Table B. So this version returns at least 1000 rows.
The second keeps all rows in Table B, plus any matching values from Table A. This is not the same thing. This version returns at least 5000 rows.
For an INNER JOIN (or FULL OUTER JOIN), the order of the tables in the FROM clause does not affect the result set. However, depending on the optimizer, it could affect how the joins are processed (I am thinking of long chains of joins where optimizers take short-cuts).
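A scaled-down sketch of the same situation, assuming SQLite through Python's sqlite3 module, with 3 and 5 rows instead of 1000 and 5000. The column name k and the join condition a.k = b.k are assumptions, since the question only says "ON condition".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (k INTEGER);
    CREATE TABLE table_b (k INTEGER);
    INSERT INTO table_a VALUES (1), (2), (3);
    INSERT INTO table_b VALUES (2), (3), (4), (5), (6);
""")

# Q1: keeps every row of table_a.
q1 = conn.execute(
    "SELECT a.k, b.k FROM table_a a LEFT OUTER JOIN table_b b ON a.k = b.k"
).fetchall()

# Q2: keeps every row of table_b.
q2 = conn.execute(
    "SELECT a.k, b.k FROM table_b b LEFT OUTER JOIN table_a a ON a.k = b.k"
).fetchall()

# For an inner join, swapping the tables does not change the set of rows.
inner_ab = conn.execute(
    "SELECT a.k, b.k FROM table_a a INNER JOIN table_b b ON a.k = b.k"
).fetchall()
inner_ba = conn.execute(
    "SELECT a.k, b.k FROM table_b b INNER JOIN table_a a ON a.k = b.k"
).fetchall()

print(len(q1), len(q2), sorted(inner_ab) == sorted(inner_ba))
```

This only illustrates the difference in results; any performance difference depends on the database and its optimizer and is not measured here.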
Does this make any difference ?
Yes it does. LEFT JOIN definition: returns all rows from the left table, plus the matching rows from the right table. Rows of the left table with no match are still returned, with NULLs in the right table's columns.
So in your case, the number of rows returned will be very different.
Q1: Select * from Table_A Left Outer Join Table_B ON condition
In this case at least 1000 rows will be returned (since your tableA has 1000 rows and is on the left side of the JOIN); a row of tableA that matches several rows of tableB appears once per match.
Q2: Select * from Table_B Left Outer Join Table_A ON condition
In this case at least 5000 rows will be returned (since your tableB has 5000 rows and is on the left side of the JOIN); a row of tableB that matches several rows of tableA appears once per match.
See the visual representation of the same [Image taken from This CodeProject Post]:
The two queries will have different results.
See W3 Schools Left Join
and go to the Try It Yourself page. The SQL can be edited for a LEFT OUTER JOIN.

SQL Server JOINS: Are 'JOIN' Statements 'LEFT OUTER' Associated by Default in SQL Server? [duplicate]

I have about 6 months novice experience in SQL, TSQL, SSIS, ETL. As I find myself using JOIN statements more and more in my intern project I have been experimenting with the different JOIN statements. I wanted to confirm my findings. Are the following statements accurate pertaining to the conclusion of JOIN statements in SQL Server?:
1) I did a LEFT OUTER JOIN query and did the same query using JOIN, which yielded the same results; are all JOIN statements LEFT OUTER associated in SQL Server?
2) I did a LEFT OUTER JOIN WHERE 2nd table PK (joined to) IS NOT NULL and did the same query using an INNER JOIN, which yielded the same results; is it safe to say that the INNER JOIN statement will yield only matched records, and is the same as a LEFT OUTER JOIN where the joined records are NOT NULL?
The reason I'm asking is because I have been only using LEFT OUTER JOINS because that is what I was comfortable with. However, I want to eliminate as much code as possible when writing queries to be more efficient. I just wanted to make sure my observations are correct.
Also, are there any tips that you could provide on easily figuring out which JOIN statement is appropriate for specific queries? For instance, what JOIN would you use if you wanted to yield non-matching records?
Thanks.
A join or inner join (same thing) between table A and table B on, for instance, field1, would narrow in on all rows of table A and B sharing the same field1 value.
A left outer join between A and B, on field1, would show all rows of table A, and only those rows of table B that have a field1 existing in table A.
Where a row of table A has a field1 value that doesn't exist in table B, the table B columns would show null, but the row of table A would be retained because it is an outer join. These are rows that wouldn't show up in a plain join, which is an implied inner join.
If you get the same results doing a join between table A and table B as you do a left outer join between table A and B, then whatever fields you're joining on have values that exist in both tables. No value for any of the joined fields in A or B exist exclusively in A or B, they all exist in both A and B.
It is also possible you're putting criteria into the where clause that belongs in the on clause of the outer join, which may be causing your confusion. In my example above of tables A and B, where A is being left outer joined with B, you would put any criteria related to table B in the on clause, not the where clause, otherwise you would essentially be turning the outer join into an inner join. For example if you had b.field4 = 12 in the WHERE clause, and table B didn't have a match with A, it would be null and that criteria would fail, and it'd no longer come back even though you used a left outer join. That may be what you are referring to.
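The ON versus WHERE point can be seen in a small sketch, assuming SQLite through Python's sqlite3 module. The tables a and b and the columns field1 and field4 follow the example above; the column id and the sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, field1 TEXT);
    CREATE TABLE b (field1 TEXT, field4 INTEGER);
    INSERT INTO a VALUES (1, 'x'), (2, 'y');
    INSERT INTO b VALUES ('x', 12), ('y', 99);
""")

# Criterion on b inside ON: rows of a without a qualifying b row are kept.
in_on = conn.execute("""
    SELECT a.id, b.field4
    FROM a LEFT OUTER JOIN b ON a.field1 = b.field1 AND b.field4 = 12
    ORDER BY a.id
""").fetchall()

# Same criterion in WHERE: rows where b.field4 is NULL or not 12 are filtered out,
# so the outer join effectively behaves like an inner join.
in_where = conn.execute("""
    SELECT a.id, b.field4
    FROM a LEFT OUTER JOIN b ON a.field1 = b.field1
    WHERE b.field4 = 12
    ORDER BY a.id
""").fetchall()

print(in_on, in_where)
```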
A plain JOIN is treated as an INNER JOIN by default.

Performance comparison with postgresql : left join vs union all

I want to know what is the best and the fastest solution between a "left outer join" and an "union all".
The database is a PostgreSQL.
Query with an UNION ALL :
SELECT * FROM element, user WHERE elm_usr_id = usr_id
UNION ALL
SELECT * FROM element WHERE elm_usr_id ISNULL;
Query with a LEFT OUTER JOIN :
SELECT * FROM element LEFT OUTER JOIN user ON elm_usr_id = usr_id;
Your two queries may not produce the same result.
Your query with UNION ALL returns the rows that match, plus the rows that do not match because elm_usr_id is null.
The query with LEFT JOIN (same as LEFT OUTER JOIN), on the other hand, returns the rows that match, plus the rows that do not match for any reason, including a non-null elm_usr_id with no corresponding usr_id.
Given this, the query with LEFT JOIN is safer if you expect to see all rows.
Back to your original question: the query with LEFT JOIN is likely the better one for taking advantage of indexes. For example, if you'd like to have a sorted result, the UNION query will probably be much slower. Or if your query is a subquery in a main query, the UNION may prevent the planner from using the indexes on table [element], so a JOIN or WHERE on such a subquery will be slow.
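The difference in results described in this answer can be reproduced with a small sketch. It assumes SQLite through Python's sqlite3 module rather than PostgreSQL; the user table is renamed to users, the column elm_id and the sample rows are made up, and the second UNION ALL branch is padded with NULL so that both branches return the same number of columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (usr_id INTEGER);
    CREATE TABLE element (elm_id INTEGER, elm_usr_id INTEGER);
    INSERT INTO users VALUES (1);
    -- element 1 has a user, element 2 has none, element 3 points to a missing user
    INSERT INTO element VALUES (1, 1), (2, NULL), (3, 99);
""")

union_all = conn.execute("""
    SELECT elm_id, usr_id FROM element, users WHERE elm_usr_id = usr_id
    UNION ALL
    SELECT elm_id, NULL FROM element WHERE elm_usr_id IS NULL
    ORDER BY 1
""").fetchall()

left_join = conn.execute("""
    SELECT elm_id, usr_id
    FROM element LEFT OUTER JOIN users ON elm_usr_id = usr_id
    ORDER BY 1
""").fetchall()

print(union_all)
print(left_join)
```

Element 3 is missing from the UNION ALL result but present in the LEFT JOIN result. If a foreign key guarantees that every non-null elm_usr_id has a matching user, the two results coincide.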
I would suggest LEFT OUTER JOIN over union all in this particular scenario,
as in union all you have to read the tables twice, whereas in LEFT OUTER JOIN only once
Probably the LEFT JOIN, but you can see the query plan by running EXPLAIN ANALYSE SELECT.... The UNION ALL form might be clearer if you were modifying columns based on the null-ness of elm_usr_id but you could always use CASE to do column modifications with a LEFT JOIN.

Compare inner join and outer join SQL statements

What is the difference between an inner join and outer join? What's the precise meaning of these two kinds of joins?
Check out Jeff Atwood's excellent:
A Visual Explanation of SQL Joins
Marc
Wikipedia has a nice long article on the topic [here](http://en.wikipedia.org/wiki/Join_(SQL))
But basically :
Inner joins return only the rows for which the join condition is satisfied in ALL of the joined tables
Outer joins also return rows that have no match in the other table, filling the missing side with NULLs
You use INNER JOIN to return all rows from both tables where there is a match. ie. in the resulting table all the rows and columns will have values.
In OUTER JOIN the resulting table may have empty (NULL) columns. An outer join may be LEFT, RIGHT or FULL
LEFT OUTER JOIN returns all the rows from the first table, even if there are no matches in the second table.
RIGHT OUTER JOIN returns all the rows from the second table, even if there are no matches in the first table.
INNER JOIN returns rows that exist in both tables
OUTER JOIN returns rows that exist in either table: all rows of one side for LEFT or RIGHT, all rows of both sides for FULL
Inner join only returns a joined row if the record appears in both tables.
Outer join, depending on direction, will show all records from one table, joined to the data from the other table where a corresponding row exists
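Using mathematical sets,
Inner Join is A ^ B (the intersection);
a Left Outer Join is A ^ B plus A - B, i.e. all of A, with NULLs where B has no match.
In Oracle's legacy (+) syntax, the (+) goes on the columns of the optional table, the side that may come back as NULL.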
Assume an example schema with customers and order:
INNER JOIN: Retrieves customers with orders only.
LEFT OUTER JOIN: Retrieves all customers with or without orders.
RIGHT OUTER JOIN: Retrieves all orders with or without matching customer records.
For slightly more detailed information, see Inner and Outer Join SQL Statements
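The customers and orders example can be sketched as follows, assuming SQLite through Python's sqlite3 module. The table layouts and sample rows are hypothetical. Because older SQLite versions do not support RIGHT OUTER JOIN, the right join is written here as the equivalent LEFT OUTER JOIN with the tables swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    -- order 10 belongs to Ann, order 11 has no matching customer record
    INSERT INTO orders VALUES (10, 1), (11, 99);
""")

# INNER JOIN: customers with orders only.
inner = conn.execute("""
    SELECT c.name, o.id FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
""").fetchall()

# LEFT OUTER JOIN: all customers, with or without orders.
left = conn.execute("""
    SELECT c.name, o.id FROM customers c
    LEFT OUTER JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id
""").fetchall()

# Equivalent of customers RIGHT OUTER JOIN orders: all orders,
# with or without a matching customer.
right = conn.execute("""
    SELECT c.name, o.id FROM orders o
    LEFT OUTER JOIN customers c ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

print(inner, left, right)
```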