Difference between join and no join select? - sql

I see no difference between the two queries below:
query_join = select a.id, b.name, a.telephone, a.description from tb_industry a left outer join tb_sector b on a.sector_id = b.id
query_select = select a.id, b.name, a.telephone, a.description from tb_industry a , tb_sector b WHERE a.sector_id = b.id
The result is exactly the same.
Now I guess this could happen, but I would like some enlightenment on the situations where query_join is the better choice, and the situations where query_select is.

The first is an OUTER join. This shows rows from table A even if there is no matching row in table B. Suppose tables contain the following data:
select a.name, a.sector_id from tb_industry a;
name sector_id
---- ---------
A 1
B 2
C 3
Select b.id, b.name from tb_sector b;
id name
-- ----
1 X
2 Y
(Note that there is no tb_sector row with id 3.)
The outer join still returns all rows from table A, with NULLs for values that should have come from table B:
select a.name, a.sector_id, b.name as sector_name
from tb_industry a left outer join tb_sector b on a.sector_id = b.id;
name sector_id sector_name
---- --------- -----------
A 1 X
B 2 Y
C 3
The other query (an INNER join) misses the unmatched row:
select a.name, a.sector_id, b.name as sector_name
from tb_industry a, tb_sector b where a.sector_id = b.id;
name sector_id sector_name
---- --------- -----------
A 1 X
B 2 Y
The following query is also an inner join, using the newer ANSI join syntax:
select a.name, a.sector_id, b.name as sector_name
from tb_industry a
join tb_sector b on a.sector_id = b.id;
name sector_id sector_name
---- --------- -----------
A 1 X
B 2 Y
A plain JOIN with no LEFT, RIGHT, or FULL keyword is an inner join; the OUTER keyword itself is optional (LEFT JOIN means LEFT OUTER JOIN).
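The comparison above can be reproduced with a minimal sketch. It assumes SQLite through Python's standard sqlite3 module (not necessarily the asker's database) and loads the same sample rows; the variable names are only illustrative.

```python
import sqlite3

# In-memory SQLite database holding the sample data from the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tb_sector (id integer primary key, name text);
    create table tb_industry (name text, sector_id integer);
    insert into tb_sector values (1, 'X'), (2, 'Y');
    insert into tb_industry values ('A', 1), ('B', 2), ('C', 3);
""")

# Left outer join: every tb_industry row survives.
outer_rows = conn.execute("""
    select a.name, a.sector_id, b.name as sector_name
    from tb_industry a left outer join tb_sector b on a.sector_id = b.id
    order by a.name
""").fetchall()

# Comma join with a WHERE condition: behaves as an inner join.
inner_rows = conn.execute("""
    select a.name, a.sector_id, b.name as sector_name
    from tb_industry a, tb_sector b where a.sector_id = b.id
    order by a.name
""").fetchall()

print(outer_rows)  # row C is kept, with None (NULL) as sector_name
print(inner_rows)  # row C is missing
```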

They're not the same, although they may return the same result depending on your data.
The first is a left outer join, so it will return rows from the left table even if the other table doesn't have a matching entry.
The second is essentially an inner join, so it will not return rows unless both tables have matching entries.
It depends on your preference, but the first type of syntax is easier to read when queries are complex.

You use outer joins (LEFT JOIN, RIGHT JOIN) when you want to retrieve all rows from the table you are selecting from, plus the values from the other table where there are matches.
When you only want rows that match in both tables, you can use the query condition style, which behaves as an inner join.
Hope it helps! w3schools has some simple and basic examples on this.

If they return the same results for you, then every row in tb_industry has a matching row in tb_sector. Your second query is not the equivalent of a left join; in the old non-ANSI syntax that would be written as a.sector_id *= b.id.
The *= syntax is deprecated and being phased out in newer RDBMS.
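One way to check whether every tb_industry row really has a sector is an anti-join: a left outer join that keeps only the rows padded with NULLs. The sketch below assumes SQLite via Python's sqlite3 module and made-up sample ids (10, 11, 12); only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tb_sector (id integer primary key, name text);
    create table tb_industry (id integer primary key, sector_id integer);
    insert into tb_sector values (1, 'X'), (2, 'Y');
    insert into tb_industry values (10, 1), (11, 2), (12, 3);  -- hypothetical rows
""")

# The left join pads unmatched tb_industry rows with NULLs;
# the WHERE clause then keeps only those padded rows.
unmatched = conn.execute("""
    select a.id, a.sector_id
    from tb_industry a left outer join tb_sector b on a.sector_id = b.id
    where b.id is null
""").fetchall()

print(unmatched)
```

If this list is empty, the left outer join and the inner join return the same rows for that data.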

Joins are the newer syntax to express relations in queries. They offer the benefit of outer joins, which are not really possible in a where clause (Oracle had a language extension for this, by adding a (+) to the filter, but it was very limited and not very easy to understand). When using inner joins, it doesn't matter, the result is the same.
This is subjective, but in my opinion, joins are much easier to read.

A Left Outer Join is not the same as a normal Join.
The result of a left outer join (or simply left join) for table A and B always contains all records of the "left" table (A), even if the join-condition does not find any matching record in the "right" table (B).

Your join query with left outer join will bring in even un-matched records from the database.
Your second query will only bring in matched results.

Related

What's the purpose of a JOIN where no column from 2nd table is being used?

I am looking through some hive queries we are running as part of analytics on our hadoop cluster, but I am having trouble understanding one. This is the Hive QL query
SELECT
c_id, v_id, COUNT(DISTINCT(m_id)) AS participants,
cast(date_sub(current_date, ${window}) as string) as event_date
from (
select
a.c_id, a.v_id, a.user_id,
case
when c.id1 is not null and a.timestamp <= c.stitching_ts then c.id2 else a.m_id
end as m_id
from (
select * from first
where event_date <= cast(date_sub(current_date, ${window}) as string)
) a
join (
select * from second
) b on a.c_id = b.c_id
left join third c
on a.user_id = c.id1
) dx
group by c_id, v_id;
I have changed the names but otherwise this is the select statement being used to insert overwrite to another table.
Regarding the join
join (
select * from second
) b on a.c_id = b.c_id
b is not used anywhere except for join condition, so is this join serving any purpose at all?
Is it for making sure that this join only has entries where c_id is present in the second table? Would a WHERE ... IN condition be better if that's all this is doing?
Or I can just remove this join and it won't make any difference at all.
Thanks.
A join (inner, left, or right) can duplicate rows if the join key in the joined dataset is not unique. For example, if a contains a single row with c_id=1 and b contains two rows with c_id=1, the result will be two rows with a.c_id=1.
An inner join can also filter rows if the join key is absent in the joined dataset. I believe this is what it is meant to do here.
If the goal is to keep only rows with keys present in both datasets (a filter), you do not want duplication, and you do not use columns from the joined dataset, then it is better to use LEFT SEMI JOIN instead of JOIN. It works as a filter only, even if there are duplicated keys in the joined dataset:
left semi join (
select c_id from second
) b on a.c_id = b.c_id
This is a much safer way to keep only the rows that exist in both a and b while avoiding unintended duplication.
You can replace the join with WHERE IN/EXISTS, but it makes no difference: it is implemented as the same JOIN. Check the EXPLAIN output and you will see the same query plan. It is better to use LEFT SEMI JOIN, which implements uncorrelated IN/EXISTS in an efficient way.
If you prefer to move it to the WHERE:
WHERE a.c_id IN (select c_id from second)
or correlated EXISTS:
WHERE EXISTS (select 1 from second b where a.c_id=b.c_id)
But as I said, all of them are implemented internally using JOIN operator.
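The duplication risk described above can be seen in a small sketch. It assumes SQLite through Python's sqlite3 module rather than Hive, so LEFT SEMI JOIN itself is not available and the IN and EXISTS forms stand in for it; the tables a and b and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table a (c_id integer, user_id text);
    create table b (c_id integer);
    insert into a values (1, 'u1'), (2, 'u2');
    insert into b values (1), (1);   -- key 1 appears twice, key 2 is absent
""")

# Plain inner join: filters out c_id = 2 but duplicates c_id = 1.
joined = conn.execute(
    "select a.c_id, a.user_id from a join b on a.c_id = b.c_id"
).fetchall()

# Semi-join style filters: each row of a appears at most once.
filtered_in = conn.execute(
    "select c_id, user_id from a where c_id in (select c_id from b)"
).fetchall()
filtered_exists = conn.execute(
    "select c_id, user_id from a "
    "where exists (select 1 from b where a.c_id = b.c_id)"
).fetchall()

print(joined)
print(filtered_in)
print(filtered_exists)
```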

Joins in oracle

If I write simply this query -
SELECT * FROM AA,BB WHERE AA.ID = BB.ID
then what kind of join will it be? If we apply any kind of join, we need to specify it, like inner join, outer join, or cross join. So what kind of join is this?
And what's the actual difference between Inner Join and Cross join?
Your query represents inner join. For example, on Scott's sample schema, it'll join rows from EMP and DEPT tables on DEPTNO (which is a common column):
SQL> select count(*)
2 from emp e inner join dept d on e.deptno = d.deptno;
COUNT(*)
----------
14
SQL>
(you can omit inner keyword).
You asked what is a difference between inner and cross join; cross join represents a Cartesian product, which means that the result will be pairs of all rows from the first table with all rows from the second table:
SQL> select count(*)
2 from emp e cross join dept d;
COUNT(*)
----------
56
SQL>
Using "your" old syntax, that's a join without WHERE clause:
SQL> select count(*)
2 from emp e, dept d;
COUNT(*)
----------
56
SQL>
Outer join will take rows that don't have a "pair" in another table. In Scott's schema, it is department 40 in DEPT table as there are no employees who work there:
SQL> select count(*)
2 from emp e right outer join dept d on e.deptno = d.deptno;
COUNT(*)
----------
15
SQL>
The old Oracle outer join operator ((+)) is something you might still see somewhere:
SQL> select count(*)
2 from emp e, dept d
3 where e.deptno (+) = d.deptno;
COUNT(*)
----------
15
SQL>
Basically, you should switch to modern joins. More about the subject in documentation about joins, here: https://docs.oracle.com/cd/B28359_01/server.111/b28286/queries006.htm#SQLRF52331 (11g version; find the one related to your database version, although there shouldn't be anything revolutionary in more recent versions).
You asked:
If I write simply this query -
SELECT * FROM AA,BB WHERE AA.ID = BB.ID
then what kind of join it will be?
it's an INNER JOIN using pre ANSI-92 syntax.
If we apply any kind of join then we need to specify like inner join, outer join, cross join.
Only if using the current ANSI join syntax (which you should be in 2019).
Then what kind of join it is?
Really it's a cross join whose records get eliminated based on the where clause, making it behave like an inner join. Eliminate the where clause and you get every record in AA paired with every record in BB.
And what's the actual difference between Inner Join and Cross join?
An inner join only returns records that EXIST & RELATE IN both tables involved
A cross join relates EVERY record in one table to EVERY record in the other.
Take for example two tables holding the values (A, B, C) and (A, B, D).
An INNER JOIN would return the set {A.A, B.B} (2 rows).
A cross join would return {A.A, A.B, A.D, B.A, B.B, B.D, C.A, C.B, C.D} (9 rows).
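The row counts in this example can be checked with a short sketch, assuming SQLite via Python's sqlite3 module; the table names t1 and t2 and the helper count_rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1 (id text);
    create table t2 (id text);
    insert into t1 values ('A'), ('B'), ('C');
    insert into t2 values ('A'), ('B'), ('D');
""")

def count_rows(sql):
    """Run a `select count(*)` query and return the single number."""
    return conn.execute(sql).fetchone()[0]

# Old comma syntax with a WHERE condition versus explicit INNER JOIN.
comma_with_where = count_rows("select count(*) from t1, t2 where t1.id = t2.id")
inner_join = count_rows("select count(*) from t1 inner join t2 on t1.id = t2.id")

# Old comma syntax without WHERE versus explicit CROSS JOIN.
comma_no_where = count_rows("select count(*) from t1, t2")
cross_join = count_rows("select count(*) from t1 cross join t2")

print(comma_with_where, inner_join, comma_no_where, cross_join)
```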
It looks like you don't quite understand the various types of joins. Overall, there are 5 different join types: INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER, CROSS. I'll show you an example of all 5 using these 2 tables
Tables AA and BB will only have the field ID. Table AA will have 'A', and 'B' as values for ID. Table BB will have 'B', 'C', and 'D' as values for ID.
For an INNER JOIN, only those rows with matching values are included. So the INNER join of AA and BB would be:
'B','B'
For a LEFT OUTER JOIN, every row in the left table is returned, and for those rows without a matching row in the right table, NULL is returned for the right table's columns. So we get:
'A',NULL
'B','B'
For a RIGHT OUTER JOIN, every row in the right table is returned, and for those rows without a matching row in the left table, NULL is returned for the left table's columns. So we get:
'B','B'
NULL,'C'
NULL,'D'
For a FULL OUTER JOIN, every row in both tables is returned, and missing values from either table are NULL. So we get:
'A',NULL
'B','B'
NULL,'C'
NULL,'D'
And finally, for a CROSS JOIN, every row in each table is returned with every row in the other table. So we get:
'A','B'
'A','C'
'A','D'
'B','B'
'B','C'
'B','D'
For the example SQL statement you gave, you're returning an INNER JOIN. As for the difference between an INNER JOIN and a CROSS JOIN, the above example should have illustrated the differences, but for an INNER JOIN you're examining the minimum number of rows and for a CROSS JOIN, you're examining the maximum possible number of rows. In general if you examine the plan for a SQL query and find out that a CROSS JOIN is being used in the plan, more often than not, you have an error in your SQL since cross joins tend to be extremely processor and I/O intensive.
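The AA and BB tables above are small enough to try directly. The sketch below assumes SQLite through Python's sqlite3 module. Because older SQLite builds lack RIGHT and FULL OUTER JOIN, those two are emulated here (a swapped LEFT JOIN, and a LEFT JOIN plus the unmatched BB rows), which is a substitution and not the syntax the answer describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table AA (id text);
    create table BB (id text);
    insert into AA values ('A'), ('B');
    insert into BB values ('B'), ('C'), ('D');
""")

def rows(sql):
    return conn.execute(sql).fetchall()

inner = rows("select AA.id, BB.id from AA inner join BB on AA.id = BB.id")
left = rows("select AA.id, BB.id from AA left join BB on AA.id = BB.id "
            "order by AA.id")
# RIGHT OUTER JOIN emulated by swapping the tables of a LEFT JOIN.
right = rows("select AA.id, BB.id from BB left join AA on AA.id = BB.id "
             "order by BB.id")
# FULL OUTER JOIN emulated: left join, plus BB rows that have no match in AA.
full = rows("""
    select AA.id, BB.id from AA left join BB on AA.id = BB.id
    union all
    select AA.id, BB.id from BB left join AA on AA.id = BB.id
    where AA.id is null
""")
cross = rows("select AA.id, BB.id from AA cross join BB "
             "order by AA.id, BB.id")

for name, result in [("inner", inner), ("left", left), ("right", right),
                     ("full", full), ("cross", cross)]:
    print(name, result)
```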

Filter on the column on which two tables are joined

Are next two queries going to return same result set?
SELECT * FROM tableA a
JOIN tableB b
ON a.id = b.id
WHERE a.id = '5'
--------------------------------
SELECT * FROM tableA a
JOIN tableB b
ON a.id = b.id
WHERE b.id = '5'
Also, will answer be different if LEFT JOIN is used instead of JOIN?
As written, they will return the same result.
The two will not necessarily return the same result with a left join.
Yes, the result will be the same.
With a left join you get every row of the left table, plus the matching rows of the right table where they exist.
With a join (inner join) you get only the rows where a.id = b.id.
This site explains how joins work: https://www.w3schools.com/sql/sql_join.asp
Yes they will. A simple join works like an inner join by default. It checks for instances where the item you're joining on exist on both tables. Since you're joining on where a.id=b.id the results will be the same.
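If you change the join to a LEFT JOIN, the two queries can differ: filtering on a.id = '5' keeps the tableA row even when tableB has no match, whereas filtering on b.id = '5' discards it, because b.id is NULL for unmatched rows.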
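The effect of filtering on a.id versus b.id can be tried with a small sketch. It assumes SQLite via Python's sqlite3 module and sample rows chosen so that tableA has id '5' but tableB does not; the helper run is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tableA (id text);
    create table tableB (id text);
    insert into tableA values ('5'), ('6');
    insert into tableB values ('6');      -- no row with id '5' in tableB
""")

def run(join_kind, filter_alias):
    """Build and run the query for one join kind and one filtered alias."""
    sql = (
        "select a.id, b.id from tableA a "
        f"{join_kind} tableB b on a.id = b.id "
        f"where {filter_alias}.id = '5'"
    )
    return conn.execute(sql).fetchall()

inner_on_a = run("join", "a")
inner_on_b = run("join", "b")
left_on_a = run("left join", "a")
left_on_b = run("left join", "b")

print(inner_on_a, inner_on_b)  # identical for the inner join
print(left_on_a, left_on_b)    # differ for the left join on this data
```

If tableB also contained a row with id '5', all four queries would return the same matched row.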

Can the order of Inner Joins Change the results of a query

I have the following scenario on a SQL Server 2008 R2:
The following queries return:
select * from TableA where ID = '123'; -- 1 rows
select * from TableB where ID = '123'; -- 5 rows
select * from TableC where ID = '123'; -- 0 rows
When joining these tables the following way, it returns 1 row
SELECT A.ID
FROM TableA A
INNER JOIN ( SELECT DISTINCT ID
FROM TableB ) AS D
ON D.ID = A.ID
INNER JOIN TableC C
ON A.ID = C.ID
ORDER BY A.ID
But when switching the order of the inner joins, it does not return any rows
SELECT A.ID
FROM TableA A
INNER JOIN TableC C
ON A.ID = C.ID
INNER JOIN ( SELECT DISTINCT ID
FROM TableB ) AS D
ON D.ID = A.ID
ORDER BY A.ID
Can this be possible?
For inner joins, the order of the join operations does not affect the query (it can affect the ordering of the rows and columns, but the same data is returned).
In this case, the result set is a subset of the Cartesian product of all the tables. The ordering doesn't matter.
The order can and does matter for outer joins.
In your case, one of the tables is empty. So, the Cartesian product is empty and the result set is empty. It is that simple.
As Gordon mentioned, for inner joins the order of joins doesn't matter, whereas it does matter when there's at least one outer join involved; however, in your case, none of this is pertinent as you are inner joining 3 tables, one of which will return zero rows - hence all combinations will result in zero rows.
You cannot reproduce the erratic behavior with the queries as they are shown in this question since they will always return zero records. You can try it again on your end to see what you come up with, and if you do find a difference, please share it with us then.
For the future, whenever you have something like this, create some dummy data, either as insert statements or in rextester or the like; that makes it much easier for someone to help you.
Best of luck.
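The claim that join order does not matter for inner joins can be checked against the scenario from the question. This is a sketch assuming SQLite via Python's sqlite3 module instead of SQL Server 2008 R2, with row counts matching the question (1, 5, and 0 rows for ID '123').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table TableA (ID text);
    create table TableB (ID text);
    create table TableC (ID text);
    insert into TableA values ('123');
    insert into TableB values ('123'), ('123'), ('123'), ('123'), ('123');
""")

B_THEN_C = """
    select A.ID from TableA A
    inner join (select distinct ID from TableB) as D on D.ID = A.ID
    inner join TableC C on A.ID = C.ID
    order by A.ID
"""
C_THEN_B = """
    select A.ID from TableA A
    inner join TableC C on A.ID = C.ID
    inner join (select distinct ID from TableB) as D on D.ID = A.ID
    order by A.ID
"""

# TableC is empty: both join orders return no rows.
empty_c = (conn.execute(B_THEN_C).fetchall(),
           conn.execute(C_THEN_B).fetchall())

# After adding a matching TableC row, both orders return the same single row.
conn.execute("insert into TableC values ('123')")
filled_c = (conn.execute(B_THEN_C).fetchall(),
            conn.execute(C_THEN_B).fetchall())

print(empty_c)
print(filled_c)
```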

Factor where clauses into subqueries

I'm wondering if it's always possible in SQL to factor a where condition through a join to a subquery. For instance, if I have
select ... from a join b on ... where p and q
and p pertains only to a and q only to b, can I always rewrite it as follows?
select ... from (select ... from a where p) as a join (select ... from b where q) as b on ...
Thanks!
[Notes: 1) I'm using postgres in case this affects the answer. 2) Readability is not an important consideration, as these are automatically generated queries. Edit: 3) I'm not only interested in inner join but other joins as well.]
In general the query 1:
SELECT ...
FROM TableA
JOIN TableB ON <SomeForeignKey>
JOIN TableC ON <SomeForeignKey>
WHERE <SomeConditionOnTableA> AND
<SomeConditionOnTableB> AND
<SomeConditionOnTableC>
... is equivalent to the query 2:
SELECT ...
FROM TableA
JOIN TableB ON <SomeForeignKey> AND <SomeConditionOnTableB>
JOIN TableC ON <SomeForeignKey> AND <SomeConditionOnTableC>
WHERE <SomeConditionOnTableA>
But the same is not true if instead of (INNER) JOINs you use OUTER JOINs. With OUTER JOINs the equivalency holds for very simple conditions that match NOT NULL column values, like:
name='value'
name LIKE '%value%'
number < const
field IN (...)
Notice that these are all conditions that make the OUTER JOINs moot anyway, as they filter out rows that have NULL values in the involved columns... so they would also filter out the rows added by an OUTER JOIN that retrieved nothing from the joined table.
But the equivalency breaks if you use OUTER JOINs and start comparing column values with NULLs or comparing expressions that may involve NULLs.
For example, taking this query (formatted as query 1):
SELECT ...
FROM TableA a
LEFT JOIN TableB b ON <SomeForeignKey>
LEFT JOIN TableC c ON <SomeForeignKey>
WHERE a.somefield = 'whatever'
AND b.name IS NOT NULL
AND c.somenumber >100
In this case the filter is applied after the OUTER JOIN has been resolved. It eliminates the rows that exist in TableB and have a NULL name, and it also removes the rows that were added by the OUTER JOIN when no matching row was found in TableB. This is not equivalent to the query 2 format:
SELECT ...
FROM TableA a
LEFT JOIN TableB b ON <SomeForeignKey> AND b.name IS NOT NULL
LEFT JOIN TableC c ON <SomeForeignKey> AND c.somenumber >100
WHERE a.somefield = 'whatever'
In this case the filter is applied to TableB before resolving the OUTER JOIN. TableB rows that have a NULL name are eliminated by the filter, but the corresponding TableA rows are still returned by the LEFT JOIN, with NULLs in the TableB columns. So this query might contain rows that the former does not.
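The difference between the two formats can be reproduced with a reduced sketch. It assumes SQLite via Python's sqlite3 module, drops TableC for brevity, and uses hypothetical columns (id, a_id) for the foreign key that the answer leaves as a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table TableA (id integer, somefield text);
    create table TableB (a_id integer, name text);
    insert into TableA values (1, 'whatever'), (2, 'whatever'), (3, 'whatever');
    -- id 1 has a named match, id 2 has a match with NULL name, id 3 has none
    insert into TableB values (1, 'n1'), (2, null);
""")

# Query 1 format: the condition on b sits in WHERE, applied after the join.
filter_in_where = conn.execute("""
    select a.id, b.name
    from TableA a
    left join TableB b on b.a_id = a.id
    where a.somefield = 'whatever' and b.name is not null
    order by a.id
""").fetchall()

# Query 2 format: the condition on b sits in ON, applied while joining.
filter_in_on = conn.execute("""
    select a.id, b.name
    from TableA a
    left join TableB b on b.a_id = a.id and b.name is not null
    where a.somefield = 'whatever'
    order by a.id
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```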
I would say yes, I can't think of a situation where it is not possible. WHERE in itself can be replaced with a join:
select ... from A where x=10
<=>
select ... from A join ( values (10) ) B (x) on A.x = B.x
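A runnable variant of this rewrite, as a sketch assuming SQLite via Python's sqlite3 module. The derived table is written as (select 10 as x) instead of ( values (10) ) B (x), a substitution made only to keep the example portable to SQLite; the table A and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table A (id integer, x integer);
    insert into A values (1, 10), (2, 20), (3, 10);
""")

# Ordinary WHERE filter.
with_where = conn.execute(
    "select id from A where x = 10 order by id"
).fetchall()

# Same filter expressed as an inner join against a one-row derived table.
with_join = conn.execute(
    "select A.id from A join (select 10 as x) B on A.x = B.x order by A.id"
).fetchall()

print(with_where)
print(with_join)
```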
Perhaps off topic, but for transformations in general Vadim Tropashko (http://arxiv.org/abs/cs/0501053) shows that it is possible to reduce the set of classic relational algebra operators to two binary operations: natural join and generalized union