SQL: difference between LEFT JOIN ... ON ... and LEFT JOIN ... ON ... WHERE

I have two SQL queries that join two tables together:
select top 100 a.XXX
,a.YYY
,a.ZZZ
,b.GGG
,b.JJJ
from table_01 a
left join table_02 b
on a.XXX = b.GGG
and b.JJJ = 'abc'
and a.YYY between '01/08/2009 13:18:00' and '12/08/2009 13:18:00'

select top 100 a.XXX
,a.YYY
,a.ZZZ
,b.GGG
,b.JJJ
from table_01 a
left join table_02 b
on a.XXX = b.GGG
where b.JJJ = 'abc'
and a.YYY between '01/08/2009 13:18:00' and '12/08/2009 13:18:00'
The two queries return different results, but I don't understand why.
I would be grateful for some help here.

Whenever you use a LEFT JOIN, conditions on the right table's columns should normally go in the ON clause; otherwise you effectively convert your LEFT JOIN into an INNER JOIN.
The reason is that with a LEFT JOIN, all rows from the left table are returned. If a row is matched by the right table, the values of the matching row(s) are returned as well; if it is not matched by any row in the right table, it is still returned, with NULLs in place of the right table's columns.
Since any comparison with NULL (even NULL = NULL) evaluates to UNKNOWN rather than TRUE, a WHERE condition such as b.JJJ = 'abc' discards every NULL-extended row, so you are basically telling your database to return only the rows that matched in both tables.
However, when the condition is in the ON clause, your database treats it as part of the join condition: it only restricts which right-table rows count as a match, and left-table rows without a match are still returned with NULLs.
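To see the difference concretely, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in database. The tables a and b and their rows are made-up toy data (not the asker's table_01/table_02), and SQLite's dialect differs from SQL Server's (for example, it has no TOP), so treat this as an illustration of standard LEFT JOIN semantics rather than a reproduction of the original queries.

```python
import sqlite3

# Hypothetical toy tables standing in for table_01 (a) and table_02 (b).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (xxx INTEGER);
    CREATE TABLE b (ggg INTEGER, jjj TEXT);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1, 'abc'), (2, 'xyz');
""")

# Filter in ON: every row of a survives; b's columns are NULL where no 'abc' row matched.
on_rows = conn.execute("""
    SELECT a.xxx, b.jjj FROM a
    LEFT JOIN b ON a.xxx = b.ggg AND b.jjj = 'abc'
    ORDER BY a.xxx
""").fetchall()

# Filter in WHERE: NULL-extended rows fail b.jjj = 'abc', so only fully matched rows remain.
where_rows = conn.execute("""
    SELECT a.xxx, b.jjj FROM a
    LEFT JOIN b ON a.xxx = b.ggg
    WHERE b.jjj = 'abc'
    ORDER BY a.xxx
""").fetchall()

print(on_rows)     # [(1, 'abc'), (2, None), (3, None)]
print(where_rows)  # [(1, 'abc')]
```

Row 2 is the interesting one: it has a match in b, but the match fails the 'abc' test, so the ON version keeps it with NULLs while the WHERE version drops it.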

Related

Why is an 'ON' clause required in a left outer join?

As far as I understand, in a left outer join between two tables (say a and b), all the rows of the table on the left side of the join are retrieved regardless of the values in the rows of the right table. Then why do we need an ON clause specifying a condition, something like this:
select * from a LEFT OUTER JOIN b on a.some_column1 = b.some_column2;
Why is the condition "a.some_column1 = b.some_column2" needed?
A left join would return all the rows from table a, and for each row the matching row in table b, if it exists - if it doesn't, nulls would be returned instead of b's columns. The on clause defines how this matching is done.
An ON clause is required because you are "joining", and you need to specify which columns you want to join by. Otherwise you would just list both tables in FROM without any WHERE condition and get all possible row combinations. But you wanted a join, right?
Yes, that's pretty much it.
As far as I understand in a left outer join between two tables (say a & b) all the rows of the table on the left side of the join are retrieved regardless of the values in the rows on the right table.
That is correct in the sense that it says something about what LEFT JOIN ... ON returns, but it isn't a definition of what it returns. LEFT JOIN ... ON returns the INNER JOIN ... ON rows plus (UNION ALL) the unmatched left-table rows extended by NULLs.
INNER JOIN ... ON returns the rows of the CROSS JOIN that satisfy the ON condition, which can be any condition on the columns. CROSS JOIN returns every combination of a row from the left table and a row from the right table.
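The definition above can be checked on toy data. This sketch (Python's sqlite3; the tables l and r and their rows are hypothetical) builds the LEFT JOIN result a second way, as the INNER JOIN rows UNION ALL the unmatched left rows padded with NULL, and compares the two results as multisets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE l (id INTEGER);
    CREATE TABLE r (id INTEGER, v TEXT);
    INSERT INTO l VALUES (1), (2), (3);
    INSERT INTO r VALUES (1, 'x'), (1, 'y'), (4, 'z');
""")

left_join = conn.execute(
    "SELECT l.id, r.v FROM l LEFT JOIN r ON l.id = r.id").fetchall()

# INNER JOIN rows, plus each left row with no row satisfying the ON condition, padded with NULL.
rebuilt = conn.execute("""
    SELECT l.id, r.v FROM l INNER JOIN r ON l.id = r.id
    UNION ALL
    SELECT l.id, NULL FROM l
    WHERE NOT EXISTS (SELECT 1 FROM r WHERE l.id = r.id)
""").fetchall()

# Compare as multisets: without ORDER BY, row order is not guaranteed.
same = sorted(left_join, key=repr) == sorted(rebuilt, key=repr)
print(same)  # True
```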
What would you expect an outer join without ON to mean? In standard SQL, outer and inner joins must have a join condition (ON or USING) unless they are NATURAL joins. INNER JOIN on an always-true condition is the same as CROSS JOIN, which has no unmatched left-table rows as long as the right table is non-empty. So if you want an outer join with no ON to mean an outer join on a true condition, then (for a non-empty right table) there are no unmatched rows in the inner join on that condition, and the result is also just the CROSS JOIN. (MySQL allows INNER JOIN to be used without ON, but then it just means CROSS JOIN.)

What does "table A left outer join table B ON TRUE" mean?

I know conditions are used when joining tables, but I came across a specific situation where the SQL code reads like "Table A join table B ON TRUE".
What happens with the "ON TRUE" condition? Is it just a full cross join without any filtering?
Actually, the original expression is:
Table A LEFT outer join table B on TRUE
Let's say A has m rows and B has n rows. Is there any conflict between "left outer join" and "on true"? It seems that "on true" results in a cross join.
My guess is that the result will be m*n rows. So there is no need to write "left outer join"; just "join" would give the same output, right?
Yes. That's the same thing as a CROSS JOIN.
In MySQL, we can omit the [optional] CROSS keyword. We can also omit the ON clause.
The condition in the ON clause is evaluated as a boolean, so we could also have written something like ON 1=1.
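A quick check of this answer, assuming hypothetical three-row and two-row tables in an in-memory SQLite database accessed through Python's sqlite3. ON 1=1 is used instead of ON TRUE because older SQLite versions do not recognize the TRUE keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (i INTEGER);
    CREATE TABLE b (j INTEGER);
    INSERT INTO a VALUES (1), (2), (3);   -- m = 3
    INSERT INTO b VALUES (10), (20);      -- n = 2
""")

cross = conn.execute("SELECT a.i, b.j FROM a CROSS JOIN b").fetchall()
inner_true = conn.execute("SELECT a.i, b.j FROM a JOIN b ON 1=1").fetchall()
left_true = conn.execute("SELECT a.i, b.j FROM a LEFT JOIN b ON 1=1").fetchall()

print(len(cross), len(inner_true), len(left_true))               # 6 6 6
print(sorted(cross) == sorted(inner_true) == sorted(left_true))  # True
```

With a non-empty b, all three forms return the same m*n rows.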
UPDATE:
(The question was edited to add another question about a LEFT [OUTER] JOIN b, which is different from the original construct, a JOIN b.)
The "LEFT [OUTER] JOIN" is slightly different, in that rows from the table on the left side will be returned even when there are no matching rows found in the table on the right side.
As noted, a CROSS JOIN between table a (containing m rows) and table b (containing n rows), absent any other predicates, produces a result set of m x n rows.
The LEFT [OUTER] JOIN will produce a different resultset in the special case where table b contains 0 rows.
CREATE TABLE a (i INT);
CREATE TABLE b (i INT);
INSERT INTO a VALUES (1),(2),(3);
SELECT a.i, b.i FROM a LEFT JOIN b ON TRUE ;
Note that the LEFT JOIN returns the rows from table a (a total of m rows) even when table b contains 0 rows, whereas a CROSS JOIN would return no rows at all.
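The script above can be run against SQLite from Python to confirm the empty-table case. This sketch writes ON 1=1 rather than ON TRUE (older SQLite versions lack the TRUE keyword) and compares the result with a CROSS JOIN on the same tables.

```python
import sqlite3

# Same schema as the example above: a has 3 rows, b stays empty.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (i INTEGER);
    CREATE TABLE b (i INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
""")

left_rows = conn.execute(
    "SELECT a.i, b.i FROM a LEFT JOIN b ON 1=1 ORDER BY a.i").fetchall()
cross_rows = conn.execute("SELECT a.i, b.i FROM a CROSS JOIN b").fetchall()

print(left_rows)   # [(1, None), (2, None), (3, None)]
print(cross_rows)  # []
```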
A cross join produces a cartesian product between the two tables, returning all possible combinations of all rows. It has no on clause because you're just joining everything to everything.
A cross join does not match rows: if you have 100 rows in each table with a 1-to-1 match, you get 10,000 results, whereas an inner join returns only 100 rows in the same situation.
These 2 examples will return the same result:
Cross join
select * from table1 cross join table2 where table1.id = table2.fk_id
Inner join
select * from table1 join table2 on table1.id = table2.fk_id
Use the second (explicit JOIN ... ON) form.
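The 100-row scenario described above can be reproduced with a sketch like this one, using Python's sqlite3 and the table1/table2 names from the examples. The data is generated for illustration, not taken from any real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER)")
conn.execute("CREATE TABLE table2 (fk_id INTEGER)")
# 100 rows per table; each table1.id matches exactly one table2.fk_id (1-to-1).
conn.executemany("INSERT INTO table1 VALUES (?)", [(k,) for k in range(100)])
conn.executemany("INSERT INTO table2 VALUES (?)", [(k,) for k in range(100)])


def count(sql: str) -> int:
    """Run a SELECT COUNT(*) query and return the single number."""
    return conn.execute(sql).fetchone()[0]


cross_count = count("SELECT COUNT(*) FROM table1 CROSS JOIN table2")
cross_where_count = count(
    "SELECT COUNT(*) FROM table1 CROSS JOIN table2 WHERE table1.id = table2.fk_id")
inner_count = count(
    "SELECT COUNT(*) FROM table1 JOIN table2 ON table1.id = table2.fk_id")

print(cross_count, cross_where_count, inner_count)  # 10000 100 100
```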
The join syntax's general form:
SELECT *
FROM table_a
JOIN table_b ON condition
The condition is used to tell the database how to match rows from table_a to table_b, and would usually look like table_a.some_id = table_b.some_id.
If you just specify true, you will match every row from table_a with every row of table_b, so if table_a contains n rows and table_b contains m rows the result would have m*n rows.
Standard SQL, and most modern databases, offer a cleaner syntax for this, though:
SELECT *
FROM table_a
CROSS JOIN table_b
The difference between a pure cross join and a left join whose condition is always true (as with ON TRUE) is that the left join also returns left-table rows that have no match, padded with NULLs where the right table's columns would have been. With an always-true condition, that only happens when the right table is empty.

SQL trying to do a JOIN to include results from multiple Tables

I'm a complete novice teaching myself SQL by writing and modifying a few queries and reports at work.
I've got something of a handle on the various types of JOINs and I've used INNER JOIN a few times with decent success.
What I'm stuck on should be a simple task, but my Google-Fu must be weak. Here's what I'm trying to do.
Say I have 3 tables, Table_A, Table_B, and Table_C, and each table has a column called [Serial_Number].
What I want is to select 3 of the other columns if A.Serial_Number matches either B.Serial_Number or C.Serial_Number.
I've tried doing:
SELECT
*
FROM
Table_A AS A
INNER JOIN Table_B AS B ON A.Serial_Number = B.Serial_Number
INNER JOIN Table_C AS C ON A.Serial_Number = C.Serial_Number
But this always yields 0 results as the nature of the data dictates that if A matches B, it will never match C and vice versa. I also tried a LEFT OUTER JOIN as the second clause, but this just includes NULLs from Table_C that have already matched on Table_B.
All the searches I have done relating to JOINs on multiple tables seem to be about using JOINS to further exclude records, where I'm actually wanting to INCLUDE more records.
Like I said, I'm sure this is really simple, just needing a nudge in right direction.
Thanks!
The use of two inner joins here is akin to saying
If A.Serial_Number = B.Serial_Number AND
A.Serial_Number = C.Serial_Number
Using a left outer join on the second clause (by which I presume you mean the second join) would perform a left join on a result set already filtered by A.Serial_Number = B.Serial_Number in the first inner join. Since, in your data, a serial number that matches Table_B never matches Table_C, you wouldn't expect that equijoin to return any values from tablec.
What you want is a left outer join like you tried but for both tableb and tablec.
Select *
From tablea
Left join tableb on tableb.Serial_Number = tablea.Serial_Number
Left join tablec on tablec.Serial_Number = tablea.Serial_Number
This way, whether or not tablea.Serial_Number is in tableb, the row is still returned and is thus available to be joined to tablec. Rows that match neither table are returned too, with NULLs for both tableb and tablec.
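A minimal sketch of the two-LEFT-JOIN approach, run through Python's sqlite3. The columns b_info and c_info and the sample serial numbers are hypothetical; the WHERE clause is an optional addition that keeps only serial numbers found in tableb or tablec, which is what the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (Serial_Number TEXT);
    CREATE TABLE tableb (Serial_Number TEXT, b_info TEXT);
    CREATE TABLE tablec (Serial_Number TEXT, c_info TEXT);
    INSERT INTO tablea VALUES ('S1'), ('S2'), ('S3');
    INSERT INTO tableb VALUES ('S1', 'in B');  -- S1 only in B
    INSERT INTO tablec VALUES ('S2', 'in C');  -- S2 only in C; S3 in neither
""")

rows = conn.execute("""
    SELECT tablea.Serial_Number, tableb.b_info, tablec.c_info
    FROM tablea
    LEFT JOIN tableb ON tableb.Serial_Number = tablea.Serial_Number
    LEFT JOIN tablec ON tablec.Serial_Number = tablea.Serial_Number
    -- optional: keep only serial numbers found in at least one of tableb or tablec
    WHERE tableb.Serial_Number IS NOT NULL OR tablec.Serial_Number IS NOT NULL
    ORDER BY tablea.Serial_Number
""").fetchall()

print(rows)  # [('S1', 'in B', None), ('S2', None, 'in C')]
```

Without the WHERE clause, S3 would also appear, as ('S3', None, None).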
Agreed. Your second INNER JOIN finds no row in Table_C for the rows that matched Table_B, so every row is eliminated, which is why you get 0 results. I would suggest changing your INNER JOINs to LEFT JOINs.

Left Outer join and an additional where clause

I have a join on two tables defined as a left outer join, so that all records are returned from the left-hand table even if they don't have a record in the right-hand table. However, I also need to include a where clause on a field from the right-hand table, but I still want a row to be returned for each record in the left-hand table even if the condition in the where clause isn't met. Is there a way of doing this?
Yes, put the condition (called a predicate) in the join condition:
Select [stuff]
From TableA a
Left Join TableB b
On b.Pk = a.Pk
-- [Put your condition here, like this]
And b.Column = somevalue
This works because, logically, the query processor applies the conditions in a WHERE clause after all joins are completed and the joined result set has been constructed. At that point, a row whose outer-side (right table) columns are NULL fails any comparison predicate on those columns and is excluded.
Predicates in the join clause are applied while rows are being matched, before unmatched left rows are padded with NULLs. They only restrict which right-hand rows count as a match, so every left-hand row is still preserved.
You just need to put the predicate into the JOIN condition. Putting it into the WHERE clause would effectively convert your query to an inner join.
For example:
...
From a
Left Join b on a.id = b.id and b.condition = 'x'
You can use
WHERE (right_table.column=value OR right_table.column IS NULL)
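This keeps a joined row only when table 1 has no corresponding row in table 2, or the corresponding row in table 2 matches your criteria (or its column is genuinely NULL). A table 1 row whose matching table 2 rows all fail the criteria is dropped, so this is not equivalent to putting the predicate in the ON clause.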
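The difference between these two answers shows up when a left row does have a match, but the match fails the criteria. This sketch uses Python's sqlite3 with hypothetical tables t1/t2 and a made-up value 'wanted':

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER, col TEXT);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1, 'wanted'), (2, 'other');  -- id 3 has no row in t2
""")

# Predicate in ON: all three t1 rows survive.
in_on = conn.execute("""
    SELECT t1.id, t2.col FROM t1
    LEFT JOIN t2 ON t2.id = t1.id AND t2.col = 'wanted'
    ORDER BY t1.id
""").fetchall()

# Predicate in WHERE with OR IS NULL: id 2 is dropped, because its match is 'other'.
in_where = conn.execute("""
    SELECT t1.id, t2.col FROM t1
    LEFT JOIN t2 ON t2.id = t1.id
    WHERE (t2.col = 'wanted' OR t2.col IS NULL)
    ORDER BY t1.id
""").fetchall()

print(in_on)     # [(1, 'wanted'), (2, None), (3, None)]
print(in_where)  # [(1, 'wanted'), (3, None)]
```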
SELECT x.fieldA, y.fieldB
FROM x
LEFT OUTER JOIN (select fieldb, fieldc from Y where condition = some_condition) y
ON x.fieldc = y.fieldc
select *
from table1 t1
left outer join table2 t2 on t1.id = t2.id
where t1.some_field = nvl(t2.some_field, t1.some_field)
Update: errr... no. Do it this way:
select *
from table1 t1
left outer join table2 t2 on t1.id = t2.id
where some_required_value = nvl(t2.some_field, some_required_value)
NVL is an Oracle function that returns its second argument when the first is null (which is common for outer joins). You can use IFNULL or COALESCE in other databases.
Thus, you compare t2.some_field with your search criteria if the row met the join predicate; if it did not, you just return the row from table1, because some_required_value compared to itself is always true (unless it is null, since null = null yields null, which is neither true nor false).
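Assuming some_required_value is not NULL, this NVL/COALESCE filter behaves like the WHERE (column = value OR column IS NULL) form, including dropping a table1 row whose matching table2 row fails the criteria. A sketch with Python's sqlite3, using COALESCE because SQLite has no NVL; the data and the value 'req' are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER, some_field TEXT);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1, 'req'), (2, 'other');
""")

required = "req"  # stands in for some_required_value; assumed non-NULL

coalesce_rows = conn.execute("""
    SELECT t1.id, t2.some_field FROM table1 t1
    LEFT OUTER JOIN table2 t2 ON t1.id = t2.id
    WHERE ? = COALESCE(t2.some_field, ?)
    ORDER BY t1.id
""", (required, required)).fetchall()

or_null_rows = conn.execute("""
    SELECT t1.id, t2.some_field FROM table1 t1
    LEFT OUTER JOIN table2 t2 ON t1.id = t2.id
    WHERE (t2.some_field = ? OR t2.some_field IS NULL)
    ORDER BY t1.id
""", (required,)).fetchall()

print(coalesce_rows)                  # [(1, 'req'), (3, None)]
print(coalesce_rows == or_null_rows)  # True
```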

How can a LEFT OUTER JOIN return more records than exist in the left table?

I have a very basic LEFT OUTER JOIN to return all results from the left table and some additional information from a much bigger table. The left table contains 4935 records yet when I LEFT OUTER JOIN it to an additional table the record count is significantly larger.
As far as I'm aware it is absolute gospel that a LEFT OUTER JOIN will return all records from the left table with matched records from the right table and null values for any rows which cannot be matched, as such it's my understanding that it should be impossible to return more rows than exist in the left table, but it's happening all the same!
SQL Query follows:
SELECT SUSP.Susp_Visits.SuspReason, SUSP.Susp_Visits.SiteID
FROM SUSP.Susp_Visits LEFT OUTER JOIN
DATA.Dim_Member ON SUSP.Susp_Visits.MemID = DATA.Dim_Member.MembershipNum
Perhaps I have made a mistake in the syntax or my understanding of LEFT OUTER JOIN is incomplete, hopefully someone can explain how this could be occurring?
The LEFT OUTER JOIN returns all records from the LEFT table, joined with the RIGHT table where possible.
If there are matches, though, it returns every matching row, so one row in LEFT that matches two rows in RIGHT is returned as two rows, just like with an INNER JOIN.
EDIT:
In response to your edit, I've just had a further look at your query and it looks like you are only returning data from the LEFT table. Therefore, if you only want data from the LEFT table, and you only want one row returned for each row in the LEFT table, then you have no need to perform a JOIN at all and can just do a SELECT directly from the LEFT table.
Table1   Table2
------   ------
1        2
2        2
3        5
4        6
SELECT Table1.Id,
Table2.Id
FROM Table1
LEFT OUTER JOIN Table2 ON Table1.Id=Table2.Id
Results:
1,null
2,2
2,2
3,null
4,null
It isn't impossible. The number of records in the left table is the minimum number of records it will return. If the right table has two records that match to one record in the left table, it will return two records.
In response to your postscript, that depends on what you would like.
You are getting (possibly) multiple rows for each row in your left table because there are multiple matches for the join condition. If you want your result to have the same number of rows as the left part of the query, you need to make sure your join condition produces a 1-to-1 match.
Alternatively, depending on what you actually want, you can use aggregate functions (if, for example, you just want a string from the right part, you could generate a column that is a comma-delimited string of the right-side results for that left row).
If you are only looking at 1 or 2 columns from the outer join, you might consider using a scalar subquery, since it yields exactly one value per left row (NULL if there is no match; most databases raise an error if the subquery would return more than one row).
Each record from the left table will be returned as many times as there are matching records on the right table -- at least 1, but could easily be more than 1.
Could it be a one to many relationship between the left and right tables?
LEFT OUTER JOIN, just like INNER JOIN (a normal join), returns as many results for each row in the left table as it finds matches in the right table. Hence you can get a lot of results: up to N x M, where N is the number of rows in the left table and M is the number of rows in the right table.
The only extra guarantee LEFT OUTER JOIN gives is that the number of results is at least N.
If you need just any one row from the right side:
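SELECT SuspReason, SiteID FROM(
SELECT SUSP.Susp_Visits.SuspReason, SUSP.Susp_Visits.SiteID, ROW_NUMBER()
-- ROW_NUMBER needs an ORDER BY in SQL Server; (SELECT NULL) means "any order"
-- PARTITION BY should list whatever uniquely identifies one Susp_Visits row
OVER(PARTITION BY SUSP.Susp_Visits.SiteID ORDER BY (SELECT NULL)) AS rn
FROM SUSP.Susp_Visits
LEFT OUTER JOIN DATA.Dim_Member ON SUSP.Susp_Visits.MemID = DATA.Dim_Member.MembershipNum
) AS t
WHERE rn=1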
or, if you only want the visits that have at least one matching member, just:
SELECT SUSP.Susp_Visits.SuspReason, SUSP.Susp_Visits.SiteID
FROM SUSP.Susp_Visits WHERE EXISTS(
SELECT 1 FROM DATA.Dim_Member WHERE SUSP.Susp_Visits.MemID = DATA.Dim_Member.MembershipNum
)
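The EXISTS version is a semi-join: it returns each left row at most once, but unlike the LEFT OUTER JOIN it also drops left rows that have no match at all. A sketch with Python's sqlite3, where the SUSP/DATA schema prefixes are dropped and the rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Susp_Visits (MemID INTEGER, SuspReason TEXT);
    CREATE TABLE Dim_Member (MembershipNum INTEGER);
    INSERT INTO Susp_Visits VALUES (1, 'r1'), (2, 'r2'), (3, 'r3');
    INSERT INTO Dim_Member VALUES (1), (1), (2);  -- member 1 twice, member 3 missing
""")

joined = conn.execute("""
    SELECT v.SuspReason FROM Susp_Visits v
    LEFT OUTER JOIN Dim_Member m ON v.MemID = m.MembershipNum
    ORDER BY v.SuspReason
""").fetchall()

semi = conn.execute("""
    SELECT v.SuspReason FROM Susp_Visits v
    WHERE EXISTS (SELECT 1 FROM Dim_Member m WHERE v.MemID = m.MembershipNum)
    ORDER BY v.SuspReason
""").fetchall()

print(joined)  # [('r1',), ('r1',), ('r2',), ('r3',)]
print(semi)    # [('r1',), ('r2',)]
```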
Pay attention if you have a WHERE clause on the "right side" table of a query containing a left outer join:
if no record on the right side satisfies the WHERE clause, the corresponding record of the "left side" table will not appear in the result of your query.
It seems as though there are multiple rows in the DATA.Dim_Member table per SUSP.Susp_Visits row.
If multiple (x) rows in Dim_Member are associated with a single row in Susp_Visits, there will be x rows for it in the result set.
Since the left table contains 4935 records, I suspect you want your results to return 4935 records. Try this:
create table table1
(siteID int,
SuspReason int)
create table table2
(siteID int,
SuspReason int)
insert into table1(siteID, SuspReason) values
(1, 678),
(1, 186),
(1, 723)
insert into table2(siteID, SuspReason) values
(1, 678),
(1, 965)
select distinct t1.siteID, t1.SuspReason
from table1 t1 left join table2 t2 on t1.siteID = t2.siteID and t1.SuspReason = t2.SuspReason
union
select distinct t2.siteID, t2.SuspReason
from table1 t1 right join table2 t2 on t1.siteID = t2.siteID and t1.SuspReason = t2.SuspReason
The only way your query can return more rows than the left table (SUSP.Susp_Visits in your case) is if the condition (SUSP.Susp_Visits.MemID = DATA.Dim_Member.MembershipNum) matches multiple rows in the right table, DATA.Dim_Member. In other words, DATA.Dim_Member contains multiple rows with identical values in DATA.Dim_Member.MembershipNum. You can verify this by executing the query below:
select DATA.Dim_Member.MembershipNum, count(DATA.Dim_Member.MembershipNum) from DATA.Dim_Member group by DATA.Dim_Member.MembershipNum
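Adding HAVING COUNT(*) > 1 narrows that verification query to the duplicated keys only. This sketch runs it through Python's sqlite3 on hypothetical data (schema prefixes dropped) and also shows how the duplicates inflate the LEFT JOIN row count:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Susp_Visits (MemID INTEGER, SuspReason TEXT);
    CREATE TABLE Dim_Member (MembershipNum INTEGER);
    INSERT INTO Susp_Visits VALUES (1, 'r1'), (2, 'r2'), (3, 'r3');
    INSERT INTO Dim_Member VALUES (1), (2), (2), (2);  -- member 2 appears 3 times
""")

# Only membership numbers that occur more than once.
dupes = conn.execute("""
    SELECT MembershipNum, COUNT(*) FROM Dim_Member
    GROUP BY MembershipNum
    HAVING COUNT(*) > 1
""").fetchall()

n_left = conn.execute("SELECT COUNT(*) FROM Susp_Visits").fetchone()[0]
n_joined = conn.execute("""
    SELECT COUNT(*) FROM Susp_Visits v
    LEFT OUTER JOIN Dim_Member m ON v.MemID = m.MembershipNum
""").fetchone()[0]

print(dupes)             # [(2, 3)]
print(n_left, n_joined)  # 3 5
```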
Simply, LEFT OUTER JOIN is the Cartesian product within each join key, along with the unmatched rows of the left table
(i.e. for each key_x that has N records in table_L and M records in table_R the result will have N*M records if M>0, or N records if M=0)
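The per-key formula can be checked numerically. This sketch (Python's sqlite3 plus collections.Counter; the keys are made up, and none are NULL, since NULL keys never match) compares the actual LEFT JOIN row count with the sum of N*M, or N when M = 0, over the left table's keys:

```python
import sqlite3
from collections import Counter

left_keys = ["k1", "k1", "k2", "k3"]         # k1: N=2, k2: N=1, k3: N=1
right_keys = ["k1", "k1", "k1", "k2", "k4"]  # k1: M=3, k2: M=1, k3: M=0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_L (join_key TEXT)")
conn.execute("CREATE TABLE table_R (join_key TEXT)")
conn.executemany("INSERT INTO table_L VALUES (?)", [(k,) for k in left_keys])
conn.executemany("INSERT INTO table_R VALUES (?)", [(k,) for k in right_keys])

actual = conn.execute("""
    SELECT COUNT(*) FROM table_L
    LEFT JOIN table_R ON table_L.join_key = table_R.join_key
""").fetchone()[0]

n_per_key = Counter(left_keys)
m_per_key = Counter(right_keys)  # missing keys count as 0
# N*M rows when the key has matches (M > 0), otherwise N NULL-extended rows.
expected = sum(n * m_per_key[k] if m_per_key[k] > 0 else n
               for k, n in n_per_key.items())

print(actual, expected)  # 8 8
```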