How can a LEFT OUTER JOIN return more records than exist in the left table? - sql

I have a very basic LEFT OUTER JOIN to return all results from the left table and some additional information from a much bigger table. The left table contains 4935 records yet when I LEFT OUTER JOIN it to an additional table the record count is significantly larger.
As far as I'm aware, it is gospel that a LEFT OUTER JOIN returns all records from the left table, with matched records from the right table and NULL values for any rows that cannot be matched. As such, my understanding is that it should be impossible to return more rows than exist in the left table, but it's happening all the same!
SQL Query follows:
SELECT SUSP.Susp_Visits.SuspReason, SUSP.Susp_Visits.SiteID
FROM SUSP.Susp_Visits LEFT OUTER JOIN
DATA.Dim_Member ON SUSP.Susp_Visits.MemID = DATA.Dim_Member.MembershipNum
Perhaps I have made a mistake in the syntax or my understanding of LEFT OUTER JOIN is incomplete, hopefully someone can explain how this could be occurring?

The LEFT OUTER JOIN will return all records from the LEFT table joined with the RIGHT table where possible.
If a left row has several matches, though, the join returns every matching combination. One row in LEFT that matches two rows in RIGHT is therefore returned as two rows, just as with an INNER JOIN.
EDIT:
In response to your edit, I've just had a further look at your query and it looks like you are only returning data from the LEFT table. Therefore, if you only want data from the LEFT table, and you only want one row returned for each row in the LEFT table, then you have no need to perform a JOIN at all and can just do a SELECT directly from the LEFT table.

Table1 Table2
_______ _________
1 2
2 2
3 5
4 6
SELECT Table1.Id,
Table2.Id
FROM Table1
LEFT OUTER JOIN Table2 ON Table1.Id=Table2.Id
Results:
1,null
2,2
2,2
3,null
4,null

It isn't impossible. The number of records in the left table is the minimum number of records it will return. If the right table has two records that match to one record in the left table, it will return two records.

In response to your postscript, that depends on what you would like.
You are getting (possibly) multiple rows for each row in your left table because there are multiple matches for the join condition. If you want the result to have the same number of rows as the left side of the query, you need to make sure your join conditions produce at most one match per left row.
Alternatively, depending on what you actually want, you can use aggregate functions. For example, if you just want a string from the right side, you could generate a column that is a comma-delimited string of the right-side results for that left row.
If you are only looking at 1 or 2 columns from the outer join, you might consider a scalar subquery, which must yield a single value per left row (most databases raise an error if it returns more than one row, so aggregate or limit inside it).
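A minimal sketch of the scalar-subquery idea, using small stand-in tables. The table names visits and members and the column member_name are made up for illustration and are not part of the question's schema. MAX() is one possible way to force a single value when several members share a number.

```sql
-- Hypothetical stand-ins for SUSP.Susp_Visits and DATA.Dim_Member.
CREATE TABLE visits  (mem_id INT, susp_reason VARCHAR(20));
CREATE TABLE members (membership_num INT, member_name VARCHAR(20));

INSERT INTO visits  VALUES (1, 'late'), (2, 'unpaid'), (3, 'other');
INSERT INTO members VALUES (1, 'Ann'), (1, 'Anne'), (2, 'Bob');  -- number 1 is duplicated

-- A plain LEFT JOIN returns 4 rows: visit 1 matches two member rows.
SELECT v.mem_id, v.susp_reason, m.member_name
FROM visits v
LEFT OUTER JOIN members m ON v.mem_id = m.membership_num;

-- The scalar subquery returns exactly one row per visit (3 rows).
-- MAX() collapses the duplicates; visit 3 gets NULL because nothing matches.
SELECT v.mem_id,
       v.susp_reason,
       (SELECT MAX(m.member_name)
        FROM members m
        WHERE m.membership_num = v.mem_id) AS member_name
FROM visits v;
```

Which aggregate to use (MAX, MIN, a string aggregate) depends on which right-side value you actually want to keep.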

Each record from the left table will be returned as many times as there are matching records on the right table -- at least 1, but could easily be more than 1.

Could it be a one to many relationship between the left and right tables?

LEFT OUTER JOIN, just like INNER JOIN (a normal join), returns one result row for each match it finds in the right table. Hence you can get a lot of results: up to N x M, where N is the number of rows in the left table and M is the number of rows in the right table.
The minimum number of results from a LEFT OUTER JOIN is always guaranteed to be at least N.

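If you need just any one row from the right side:
SELECT SuspReason, SiteID FROM (
SELECT SUSP.Susp_Visits.SuspReason, SUSP.Susp_Visits.SiteID,
-- ORDER BY added because some engines (e.g. SQL Server) require it for ROW_NUMBER().
-- Note: partitioning by SiteID keeps one row per site; partition by the key of Susp_Visits to keep one row per visit.
ROW_NUMBER() OVER (PARTITION BY SUSP.Susp_Visits.SiteID ORDER BY SUSP.Susp_Visits.MemID) AS rn
FROM SUSP.Susp_Visits
LEFT OUTER JOIN DATA.Dim_Member ON SUSP.Susp_Visits.MemID = DATA.Dim_Member.MembershipNum
) AS t
WHERE rn = 1
or just
SELECT SUSP.Susp_Visits.SuspReason, SUSP.Susp_Visits.SiteID
FROM SUSP.Susp_Visits WHERE EXISTS (
-- EXISTS keeps only visits with at least one matching member, and never duplicates them.
SELECT 1 FROM DATA.Dim_Member WHERE SUSP.Susp_Visits.MemID = DATA.Dim_Member.MembershipNum
)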

Pay attention if you have a WHERE clause on the 'right side' table of a query containing a LEFT OUTER JOIN.
If no record on the right side satisfies the WHERE clause, the corresponding record of the 'left side' table will not appear in the result of your query.

It seems as though there are multiple rows in the DATA.Dim_Member table per SUSP.Susp_Visits row.

If multiple (x) rows in Dim_Member are associated with a single row in Susp_Visits, there will be x rows in the result set for that row.

Since the left table contains 4935 records, I suspect you want your results to return 4935 records. Try this:
create table table1
(siteID int,
SuspReason int)
create table table2
(siteID int,
SuspReason int)
insert into table1(siteID, SuspReason) values
(1, 678),
(1, 186),
(1, 723)
insert into table2(siteID, SuspReason) values
(1, 678),
(1, 965)
select distinct t1.siteID, t1.SuspReason
from table1 t1 left join table2 t2 on t1.siteID = t2.siteID and t1.SuspReason = t2.SuspReason
union
select distinct t2.siteID, t2.SuspReason
from table1 t1 right join table2 t2 on t1.siteID = t2.siteID and t1.SuspReason = t2.SuspReason

The only way your query can return more rows than the left table (SUSP.Susp_Visits in your case) is if the condition (SUSP.Susp_Visits.MemID = DATA.Dim_Member.MembershipNum) matches multiple rows in the right table, DATA.Dim_Member. In other words, DATA.Dim_Member contains multiple rows with the same MembershipNum value. You can verify this by running the query below and looking for counts greater than 1:
select DATA.Dim_Member.MembershipNum, count(DATA.Dim_Member.MembershipNum) from DATA.Dim_Member group by DATA.Dim_Member.MembershipNum
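As a variation on the check above, a HAVING clause can filter the output down to only the duplicated keys. This sketch uses a small stand-in table (dim_member and its sample values are assumptions for illustration, not the real DATA.Dim_Member).

```sql
-- Stand-in for DATA.Dim_Member; only the join column matters here.
CREATE TABLE dim_member (MembershipNum INT);
INSERT INTO dim_member VALUES (100), (101), (101), (102), (102), (102);

-- Keep only membership numbers that occur more than once.
SELECT MembershipNum, COUNT(*) AS copies
FROM dim_member
GROUP BY MembershipNum
HAVING COUNT(*) > 1
ORDER BY MembershipNum;
-- Rows returned: (101, 2) and (102, 3)
```

An empty result from this kind of query would suggest the extra rows come from somewhere other than duplicate keys in the right table.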

Simply, LEFT OUTER JOIN is the Cartesian product within each join key, along with the unmatched rows of the left table
(i.e. for each key_x that has N records in table_L and M records in table_R the result will have N*M records if M>0, or N records if M=0)

Related

sql query error no data displayed

I need to select all data from 2 tables in an SQL database.
I searched the site and tried numerous ways, but with no success.
One table has no data but the other is full of it.
If I select from each one individually I get good results, but if I use, for instance:
select * from relatorio cross join temp
or
select * from relatorio r,temp t
or even:
select t.*, r.* from temp t inner join relatorio r on 1=1
The join works, but none of them shows data.
Can anyone help?
Thanks in advance.
All three select statements in the question are cross joins.
A cross join returns data only if both tables have at least one row.
It returns a cartesian product of both tables, meaning that every row in one table will be joined to every row in the other table.
"One table has no data but the other is full of it."
Since one of your tables is empty, the cross join returns no results at all. You can think of it as multiplying by 0.
Now you have two options: use a full join or use a left join. In this case both return the same results, since one table is empty:
select *
from relatorio
left join temp on <join condition> -- assuming temp is the empty table
or
select *
from relatorio
full join temp on <join condition> -- in this case, it doesn't matter what table is empty
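The multiply-by-zero behaviour can be checked with a small sketch. The table names come from the question, but the single id column and the sample rows are assumptions; ON 1 = 1 stands in for the real join condition.

```sql
-- Hypothetical columns; only the row counts matter.
CREATE TABLE relatorio (id INT);
CREATE TABLE temp (id INT);          -- left empty on purpose
INSERT INTO relatorio VALUES (1), (2), (3);

-- Cross join: 3 rows x 0 rows = 0 rows.
SELECT COUNT(*) FROM relatorio CROSS JOIN temp;          -- 0

-- Left join: every relatorio row survives, padded with NULLs for temp.
SELECT COUNT(*) FROM relatorio LEFT JOIN temp ON 1 = 1;  -- 3
```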
If you want to return all matched and unmatched rows, use a FULL OUTER JOIN. The FULL OUTER JOIN keyword returns all rows from the left table (table1) and from the right table (table2).
It combines the results of both LEFT and RIGHT outer joins, returning all (matched or unmatched) rows from the tables on both sides of the join clause.
SQL FULL OUTER JOIN Syntax:
SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2
ON table1.column_name=table2.column_name;
The SQL CROSS JOIN produces a result set whose row count is the number of rows in the first table multiplied by the number of rows in the second table, if no WHERE clause is used along with the CROSS JOIN. This kind of result is called a Cartesian product.
If a WHERE clause that relates the two tables is used with CROSS JOIN, it functions like an INNER JOIN.
An alternative way of achieving the same result is to list the table names involved, separated by commas, after the FROM clause.
CROSS JOIN SYNTAX
SELECT *
FROM table1
CROSS JOIN table2;

find the number of records that left outer join,right outer join and full outer join return

Is it possible to find the number of records that a left outer join, right outer join and full outer join return, given the number of records in the left-hand table, the number in the right-hand table, and the number of matching records?
I am trying to work out the relationship between them. I have tried with two tables of sample data, but could not find any relationship.
However, if I know the number of unmatched entries in the left-hand table, I can add that number to the matching records to get the left outer join count. Likewise, if I know the number of unmatched records on the right-hand side, adding that number to the matched records gives the right outer join count.
Is it possible without knowing the unmatched records? Can we find the number of records that a left outer join, right outer join and full outer join return?
CREATE table table1(
id integer,
name varchar(40)
);
CREATE table table2(
id integer,
name varchar(40)
);
insert into table1(id,name)values(1,'ABC');
insert into table1(id,name)values(2,'DEF');
insert into table1(id,name)values(3,'GHI');
insert into table1(id,name)values(4,'JKL');
insert into table1(id,name)values(5,'JKL');
insert into table1(id,name)values(6,'JKL');
insert into table2(id,name)values(2,'ABC');
insert into table2(id,name)values(2,'ABC');
insert into table2(id,name)values(1,'ABC');
insert into table2(id,name)values(1,'ABC');
insert into table2(id,name)values(3,'ABC');
insert into table2(id,name)values(3,'ABC');
insert into table2(id,name)values(4,'ABC');
insert into table2(id,name)values(4,'ABC');
insert into table2(id,name)values(5,'ABC');
insert into table2(id,name)values(5,'ABC');
insert into table2(id,name)values(11,'ABC');
insert into table2(id,name)values(12,'ABC');
insert into table2(id,name)values(13,'ABC');
insert into table2(id,name)values(14,'ABC');
select count(*) from table1; -- 6
select count(*) from table2; -- 14
select count(*) from table1 inner join table2
on table1.id=table2.id; -- 10
select count(*) from table1 left outer join table2
on table1.id=table2.id; -- 11
select count(*) from table1 right outer join table2
on table1.id=table2.id; -- 14
select count(*) from table1 full outer join table2
on table1.id=table2.id; -- 15
-- Unmatched records
select count(*) from table1 left outer join table2
on table1.id=table2.id
where table2.id is null; -- 1
select count(*) from table1 right outer join table2
on table1.id=table2.id
where table1.id is null; -- 4
Since you know the total number of records, knowing the number of unmatched records is equivalent to knowing the number of matched records. (Perhaps!)
The answer to your question is NO, you can't determine the number of records in different types of joins by only knowing the cardinalities of the base tables, without knowing how many records are matched (or unmatched). Simple mental exercise: both tables have 100 records. If all are perfectly matched, one-to-one, then all joins are the same as the inner join, and they all have 100 records. If there are no matches at all, the inner join has zero rows, the one-sided joins have 100 rows and the full outer join has 200 rows. And the only difference between these cases is the number of matched (or unmatched) records, there is absolutely nothing else that you might know that will allow you to get the answer without this piece of information.
ADDED after the OP asked a follow-up question:
In fact knowing "how many records are matched" is not well defined, and insufficient anyway. Suppose ALL records from both tables are matched. At one extreme, the matches may be in pairs: there is an id column in both tables, and the values in both tables are all the possible values from 1 to 100. Then the result has 100 rows. On the other hand, suppose the "id" is not unique in either table. Instead, it has the value 1 IN ALL 100 ROWS IN BOTH TABLES. Then every row in the first table matches every row in the second table, and the result set will have 100 x 100 = 10,000 rows.
This is just to suggest the following: "how many rows are matched" is not a well-defined concept. To have a count of the resulting joins (of different kinds), one needs to know what the joins are on, and for each tuple in the join condition, how many rows have that specific tuple in each table. Then the number of rows in the result set of an inner join is a sum of products of such tuple-grouped counts, and additional rows for unmatched rows from the left (or right, or both) table(s) for outer joins.
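The sum-of-products rule can be sketched directly in SQL. The tables t1 and t2 below are a compact copy of the id values from the question (the names t1 and t2 are chosen here to avoid clashing with the original tables).

```sql
-- Compact copy of the question's data (only the id column matters).
CREATE TABLE t1 (id INT);
CREATE TABLE t2 (id INT);
INSERT INTO t1 VALUES (1),(2),(3),(4),(5),(6);
INSERT INTO t2 VALUES (1),(1),(2),(2),(3),(3),(4),(4),(5),(5),(11),(12),(13),(14);

-- Inner join size = sum over shared ids of (count in t1) * (count in t2).
SELECT SUM(c1.n * c2.n) AS inner_rows            -- 10
FROM (SELECT id, COUNT(*) AS n FROM t1 GROUP BY id) c1
JOIN (SELECT id, COUNT(*) AS n FROM t2 GROUP BY id) c2 ON c1.id = c2.id;

-- Left outer join size = inner_rows + t1 rows whose id is absent from t2.
SELECT COUNT(*) AS unmatched_left                -- 1 (id 6), so 10 + 1 = 11
FROM t1
WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.id = t1.id);
```

The right outer join count follows the same pattern with the tables swapped (10 + 4 = 14), and the full outer join adds both unmatched counts (10 + 1 + 4 = 15), which agrees with the counts in the question.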

sql left join returns

I am trying to run a left join on 2 tables. I do not have a GROUP BY, and the only WHERE condition I have is on the second table. But fewer rows are returned than the first table contains. Isn't the left join supposed to bring back all the data from the first table?
Here is my SQL:
select *
from tbl_a A left join tbl_b B
ON
A.Cnumber=B.Cnumber
and A.CDNUmber=B.CDNumber
and abs(A.duration - B.Duration)<2
and substr(A.text,1,3)||substr(A.text,5,8)||substr(A.text,9,2)=substr(B.text,1,8)
where B.fixed = 'b580'
There are 140,000 records in table A but the result returned is less than 100,000 records. What is the problem and how can I solve it?
As soon as you put a condition in the WHERE clause that references the right table and doesn't accommodate the NULLs that will be produced when the join is unsuccessful, you've transformed it (effectively) back into an INNER JOIN.
Try:
where B.fixed = 'b580' OR B.fixed IS NULL
Or add this condition to the ON clause for the JOIN.
You should add the where clause to the join:
select *
from tbl_a A left join tbl_b B
ON
A.Cnumber=B.Cnumber
and A.CDNUmber=B.CDNumber
and abs(A.duration - B.Duration)<2
and substr(A.text,1,3)||substr(A.text,5,8)||substr(A.text,9,2)=substr(B.text,1,8)
and B.fixed = 'b580'
If you use a WHERE clause instead, all records of A for which no matching B row exists will not be returned.
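A small sketch of how the placement of the filter changes the row count. The table names follow the question, but the reduced column set and the sample rows are assumptions made for illustration.

```sql
-- Hypothetical three-row example; columns loosely follow the question.
CREATE TABLE tbl_a (cnumber INT);
CREATE TABLE tbl_b (cnumber INT, fixed VARCHAR(10));
INSERT INTO tbl_a VALUES (1), (2), (3);
INSERT INTO tbl_b VALUES (1, 'b580'), (2, 'zzzz');

-- Filter in WHERE: rows without a 'b580' match are discarded afterwards.
SELECT COUNT(*) FROM tbl_a A LEFT JOIN tbl_b B ON A.cnumber = B.cnumber
WHERE B.fixed = 'b580';                          -- 1

-- Filter in ON: every row of tbl_a survives; non-matching rows get NULLs for B.
SELECT COUNT(*) FROM tbl_a A LEFT JOIN tbl_b B
ON A.cnumber = B.cnumber AND B.fixed = 'b580';   -- 3

-- WHERE ... OR IS NULL: keeps unmatched rows, but still drops row 2,
-- whose only match has fixed = 'zzzz'.
SELECT COUNT(*) FROM tbl_a A LEFT JOIN tbl_b B ON A.cnumber = B.cnumber
WHERE B.fixed = 'b580' OR B.fixed IS NULL;       -- 2
```

Only the ON-clause version preserves every row of the left table, so the two suggested fixes are not interchangeable when the right table holds rows with other values of fixed.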

SQL Inner Join and further requirements

I would like to return table entries from an inner join which do not have any matching entries in the second column.
Let's consider the following two tables:
Table one:
Name Number
A 1
A 2
A 4
Table two:
Name ID
A 3
The query should return Name=A ID=3. If ID were 4, the query should not return anything. Is this even possible in SQL? Thanks for any hints!
Edit:
the joined table would look like this:
Name Number ID
A 1 3
A 2 3
A 4 3
So if I do this query I get no entries in the result set:
SELECT * FROM TABLE_ONE INNER JOIN TABLE_TWO ON TABLE_ONE.NAME=TABLE_TWO.NAME WHERE NUMBER=ID
Exactly in this situation I would like to get the Name returned!
Yes, instead of using an INNER join, use a LEFT or a FULL OUTER join. This will allow null values from the other table to appear when you have a value in one of your tables.
The FULL OUTER JOIN keyword returns all rows from the left table (table1) and from the right table (table2).
The LEFT JOIN keyword returns all rows from the left table (table1), with the matching rows in the right table (table2). The result is NULL in the right side when there is no match. (There is also a RIGHT join, but it does the same thing as the left join, just returning all rows from the RIGHT table instead of the left).
SELECT *
FROM Table2
WHERE NOT EXISTS (
SELECT *
FROM Table1
WHERE Table1.Name = Table2.Name AND Table1.Number = Table2.ID
)
As @rhealitycheck has said, a full outer join would work. I found this blog post helpful in explaining joins. P.S. I can't leave comments (Otherwise I would have).

What does "table A left outer join table B ON TRUE" mean?

I know conditions are used when joining tables. But I ran into a specific situation where the SQL code is written like "Table A join table B ON TRUE".
What will happen based on the "ON TRUE" condition? Is that just a full cross join without any filtering condition?
Actually, the original expression is like:
Table A LEFT outer join table B on TRUE
Let's say A has m rows and B has n rows. Is there any conflict between "left outer join" and "on true"? Because it seems "on true" results in a cross join.
My guess is that the result will be m*n rows. So there is no need to write "left outer join"; just a "join" will give the same output, right?
Yes. That's the same thing as a CROSS JOIN.
In MySQL, we can omit the [optional] CROSS keyword. We can also omit the ON clause.
The condition in the ON clause is evaluated as a boolean, so we could also have written something like ON 1=1.
UPDATE:
(The question was edited, to add another question about a LEFT [OUTER] JOIN b which is different than the original construct: a JOIN b)
The "LEFT [OUTER] JOIN" is slightly different, in that rows from the table on the left side will be returned even when there are no matching rows found in the table on the right side.
As noted, a CROSS JOIN between tables a (containing m rows) and table b containing n rows, absent any other predicates, will produce a resultset of m x n rows.
The LEFT [OUTER] JOIN will produce a different resultset in the special case where table b contains 0 rows.
CREATE TABLE a (i INT);
CREATE TABLE b (i INT);
INSERT INTO a VALUES (1),(2),(3);
SELECT a.i, b.i FROM a LEFT JOIN b ON TRUE ;
Note that the LEFT JOIN will return rows from table a (a total of m rows) even when table b contains 0 rows.
A cross join produces a cartesian product between the two tables, returning all possible combinations of all rows. It has no on clause because you're just joining everything to everything.
A cross join does not match rows on a condition: if you have 100 rows in each table with a 1-to-1 match, you get 10,000 results, whereas an inner join returns only 100 rows in the same situation.
These 2 examples will return the same result:
Cross join
select * from table1 cross join table2 where table1.id = table2.fk_id
Inner join
select * from table1 join table2 on table1.id = table2.fk_id
Use the last method
The join syntax's general form:
SELECT *
FROM table_a
JOIN table_b ON condition
The condition is used to tell the database how to match rows from table_a to table_b, and would usually look like table_a.some_id = table_b.some_id.
If you just specify true, you will match every row from table_a with every row of table_b, so if table_a contains n rows and table_b contains m rows the result would have m*n rows.
Most(?) modern databases have a cleaner syntax for this, though:
SELECT *
FROM table_a
CROSS JOIN table_b
The difference between the pure cross join and the left join (where the condition is forced to be always true, as when using ON TRUE) shows up when the right table is empty: the cross join returns no rows, while the left join still returns every left-table row next to a bunch of NULLs where the right table's columns would have been.
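A short sketch comparing the two forms with an empty and a non-empty right table. The table names ta and tb are made up for this example, and ON 1 = 1 is used in place of ON TRUE because not every database accepts a bare TRUE there.

```sql
CREATE TABLE ta (i INT);
CREATE TABLE tb (i INT);
INSERT INTO ta VALUES (1), (2), (3);

-- tb is empty: the cross join has nothing to pair with,
-- while the left join pads each ta row with NULLs.
SELECT COUNT(*) FROM ta CROSS JOIN tb;           -- 0
SELECT COUNT(*) FROM ta LEFT JOIN tb ON 1 = 1;   -- 3

INSERT INTO tb VALUES (10), (20);

-- tb is non-empty: both produce m x n = 3 x 2 rows, with no NULL padding.
SELECT COUNT(*) FROM ta CROSS JOIN tb;           -- 6
SELECT COUNT(*) FROM ta LEFT JOIN tb ON 1 = 1;   -- 6
```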