Where vs AND in LEFT JOIN [duplicate] - sql

I normally avoid putting an AND on the same line as ON when performing a LEFT JOIN, because it has caused me problems in the past. I prefer to stay out of that pickle and put any additional condition in a WHERE clause, which behaves reliably. But today, out of curiosity, I would like to ask this question and be clear once and for all.
Question: What really goes on in a LEFT JOIN when I use the "extra" conditions? Why doesn't it behave in the same manner as WHERE?
SAMPLE QUERIES
create table #a
(
id int,
name varchar(3)
)
create table #b
(
id int,
name varchar(3)
)
insert into #a
select 1, 'abc'
union
select 2, 'def'
union
select 3, 'ghi'
insert into #b
select 1, 'abc'
union
select 2, 'def'
select * from #a a left join #b b on a.id = b.id
where a.id = 3
select * from #a a left join #b b on a.id = b.id
and a.id = 3

This version filters on a.id:
select *
from #a a left join
#b b
on a.id = b.id
where a.id = 3
This version does not filter on a.id:
select *
from #a a left join
#b b
on a.id = b.id and a.id = 3;
Why not? Go to the definition of the left join. It takes all rows from the first table, regardless of whether the on clause evaluates to true, false, or NULL. So a filter on the first table placed in the on clause cannot remove rows from the first table; it only controls which rows of the second table are matched.
Filters on the first table should be in the where clause. Filters on the second table should be in the on clause.
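To see the difference concretely, here is a minimal sketch that replays the two sample queries. It assumes SQLite driven from Python's sqlite3 module instead of SQL Server, and plain tables named a and b instead of the temp tables #a and #b; the join semantics shown are the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 'abc'), (2, 'def'), (3, 'ghi');
    INSERT INTO b VALUES (1, 'abc'), (2, 'def');
""")

# Filter in WHERE: applied after the join, so only the a.id = 3 row survives.
where_rows = conn.execute("""
    SELECT a.id, a.name, b.id, b.name
    FROM a LEFT JOIN b ON a.id = b.id
    WHERE a.id = 3
    ORDER BY a.id
""").fetchall()

# Filter in ON: it only decides which b rows match; every a row is kept.
on_rows = conn.execute("""
    SELECT a.id, a.name, b.id, b.name
    FROM a LEFT JOIN b ON a.id = b.id AND a.id = 3
    ORDER BY a.id
""").fetchall()

print(where_rows)  # one row, for a.id = 3
print(on_rows)     # all three rows of a, each with NULLs for b
```

In the second query no row of b can satisfy both a.id = b.id and a.id = 3, so every row of a comes back padded with NULLs.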

You can put conditions on the right table's columns in the ON clause.

Related

LEFT JOIN with OR clause without UNION

I know this shouldn't happen in a database, but it happened and we have to deal with it. We need to insert new rows into a table if they don't exist, based on the values in another table. This is easy enough (just do a LEFT JOIN and check for NULL values in the 1st table). But the join isn't very straightforward: we need to search the 1st table on 2 conditions combined with OR, not AND. So if a match is found on either of the 2 attributes, we consider that the corresponding row in the 1st table exists and we don't have to insert a new one. If there are no matches on either of the 2 attributes, we consider it a new row. We can use an OR condition in the LEFT JOIN, but from what I understand it does a full table scan, and the query takes a very long time to complete even though it yields the right results. We cannot use UNION either, because it will not give us what we're looking for.
For simplicity, consider the scenario below (we need to insert data into tableA).
If(OBJECT_ID('tempdb..#tableA') Is Not Null) Begin
Drop Table #tableA End
If(OBJECT_ID('tempdb..#tableB') Is Not Null) Begin
Drop Table #tableB End
create table #tableA ( email nvarchar(50), id int )
create table #tableB ( email nvarchar(50), id int )
insert into #tableA (email, id) values ('123#abc.com', 1), ('456#abc.com', 2), ('789#abc.com', 3), ('012#abc.com', 4)
insert into #tableB (email, id) values ('234#abc.com', 1), ('456#abc.com', 2), ('567#abc.com', 3), ('012#abc.com', 4), ('345#abc.com', 5)
--THIS QUERY IS CORRECTLY RETURNING 1 RECORD
select B.email, B.id
from #tableB B
left join #tableA A on A.email = B.email or B.id = A.id
where A.id is null
--THIS QUERY IS INCORRECTLY RETURNING 3 RECORDS SINCE THERE ARE ALREADY RECORDS WITH ID's 1 & 3 in tableA though the email addresses of these records don't match
select B.email, B.id
from #tableB B
left join #tableA A on A.email = B.email
where A.id is null
union
select B.email, B.id
from #tableB B
left join #tableA A on B.id = A.id
where A.id is null
If(OBJECT_ID('tempdb..#tableA') Is Not Null) Begin
Drop Table #tableA End
If(OBJECT_ID('tempdb..#tableB') Is Not Null) Begin
Drop Table #tableB End
The 1st query works correctly and only returns 1 record, but the tables hold just a few records, so it completes in under 1 second. When the 2 tables have thousands of records, the query may take 10 minutes to complete. The 2nd query of course returns records we don't want to insert, because we consider them existing. Is there a way to optimize this query so it takes an acceptable time to complete?
You are using an anti join, which is another way of writing the straight-forward NOT EXISTS:
where not exists
(
select null
from #tableA A
where A.email = B.email or B.id = A.id
)
I.e. where not exists a row in table A with the same email or the same id. In other words: where not exists a row with the same email and not exists a row with the same id.
where not exists (select null from #tableA A where A.email = B.email)
and not exists (select null from #tableA A where B.id = A.id)
With the appropriate indexes on #tableA (id) and on #tableA (email), this should be very fast.
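As a quick check that the split NOT EXISTS form returns the same rows as the OR anti join, here is a small sketch on the question's sample data. It assumes SQLite via Python's sqlite3 rather than SQL Server, plain table names instead of temp tables, and hypothetical index names; it only verifies the result, not the speed-up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (email TEXT, id INTEGER);
    CREATE TABLE tableB (email TEXT, id INTEGER);
    INSERT INTO tableA VALUES ('123#abc.com', 1), ('456#abc.com', 2),
                              ('789#abc.com', 3), ('012#abc.com', 4);
    INSERT INTO tableB VALUES ('234#abc.com', 1), ('456#abc.com', 2),
                              ('567#abc.com', 3), ('012#abc.com', 4),
                              ('345#abc.com', 5);
    -- Hypothetical index names; one index per lookup column.
    CREATE INDEX ix_tableA_id ON tableA (id);
    CREATE INDEX ix_tableA_email ON tableA (email);
""")

# The original anti join with OR in the join condition.
or_anti_join = conn.execute("""
    SELECT B.email, B.id
    FROM tableB B
    LEFT JOIN tableA A ON A.email = B.email OR B.id = A.id
    WHERE A.id IS NULL
""").fetchall()

# The same condition split into two NOT EXISTS probes, one per indexed column.
split_not_exists = conn.execute("""
    SELECT B.email, B.id
    FROM tableB B
    WHERE NOT EXISTS (SELECT 1 FROM tableA A WHERE A.email = B.email)
      AND NOT EXISTS (SELECT 1 FROM tableA A WHERE A.id = B.id)
""").fetchall()

print(or_anti_join)
print(split_not_exists)
```

Each NOT EXISTS compares a single column, which is what lets a single-column index be used for it.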
It's hard to tune something you can't see. Another option to get the data is to:
SELECT B.email
, B.id
FROM #TableB B
EXCEPT
(
SELECT B.email
, B.id
FROM #tableB B
INNER JOIN #tableA A
ON A.email = B.email
UNION ALL
SELECT B.email
, B.id
FROM #tableB B
INNER JOIN #tableA A
ON B.id = A.id
)
This way you don't have to use OR, you can use INNER JOIN rather than LEFT JOIN and you can use UNION ALL instead of UNION (though this advantage may well be negated by the EXCEPT). All of which may help your performance. Perhaps the joins can be more efficient when replaced with EXISTS.
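A runnable sketch of the EXCEPT idea on the sample data follows. It assumes SQLite via Python's sqlite3 rather than SQL Server; SQLite does not accept a parenthesised compound query directly after EXCEPT, so the UNION ALL is wrapped in a derived table here, which is a small deviation from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (email TEXT, id INTEGER);
    CREATE TABLE tableB (email TEXT, id INTEGER);
    INSERT INTO tableA VALUES ('123#abc.com', 1), ('456#abc.com', 2),
                              ('789#abc.com', 3), ('012#abc.com', 4);
    INSERT INTO tableB VALUES ('234#abc.com', 1), ('456#abc.com', 2),
                              ('567#abc.com', 3), ('012#abc.com', 4),
                              ('345#abc.com', 5);
""")

# All rows of tableB, minus those that match tableA on email or on id.
new_rows = conn.execute("""
    SELECT B.email AS email, B.id AS id
    FROM tableB B
    EXCEPT
    SELECT email, id FROM (
        SELECT B.email AS email, B.id AS id
        FROM tableB B INNER JOIN tableA A ON A.email = B.email
        UNION ALL
        SELECT B.email AS email, B.id AS id
        FROM tableB B INNER JOIN tableA A ON B.id = A.id
    )
""").fetchall()

print(new_rows)
```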
You didn't mention how this problem occurred (where the data from both tables is coming from, and why they are out of sync when they shouldn't be), but it would be preferable to fix it at the source.
No, the second query correctly returns 3 rows,
because
select B.email, B.id
from #tableB B
left join #tableA A on A.email = B.email
where A.id is null
alone returns those 3 rows.
As for your "problem",
select B.email, B.id
from #tableB B
left join #tableA A on A.email = B.email or B.id = A.id
where A.id is null
checks the combined condition for every pair of rows to decide whether they are joined.
So for example, take
('123#abc.com', 1) ('234#abc.com', 1)
As the ids are the same, the rows are joined by the OR condition,
but when you join by the emails only, the condition is false, so the row from #tableB is included in the result set.
You can only use the UNION approach when you are comparing only the emails or only the ids; when both are compared, the queries are not equivalent.
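The reason the two forms differ is De Morgan's law: "no match on email OR id" means "no match on email AND no match on id", so the two single-column anti joins would have to be intersected, not unioned. The sketch below illustrates this; INTERSECT is not used anywhere in the thread and is shown only to make the logic visible. It assumes SQLite via Python's sqlite3 and plain table names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (email TEXT, id INTEGER);
    CREATE TABLE tableB (email TEXT, id INTEGER);
    INSERT INTO tableA VALUES ('123#abc.com', 1), ('456#abc.com', 2),
                              ('789#abc.com', 3), ('012#abc.com', 4);
    INSERT INTO tableB VALUES ('234#abc.com', 1), ('456#abc.com', 2),
                              ('567#abc.com', 3), ('012#abc.com', 4),
                              ('345#abc.com', 5);
""")

# Rows of tableB with no email match in tableA.
EMAIL_ANTI = """
    SELECT B.email, B.id FROM tableB B
    LEFT JOIN tableA A ON A.email = B.email
    WHERE A.id IS NULL
"""
# Rows of tableB with no id match in tableA.
ID_ANTI = """
    SELECT B.email, B.id FROM tableB B
    LEFT JOIN tableA A ON B.id = A.id
    WHERE A.id IS NULL
"""

# UNION keeps rows that fail EITHER comparison (too many rows).
union_rows = conn.execute(EMAIL_ANTI + " UNION " + ID_ANTI + " ORDER BY 2").fetchall()
# INTERSECT keeps rows that fail BOTH comparisons (same as the OR anti join).
intersect_rows = conn.execute(EMAIL_ANTI + " INTERSECT " + ID_ANTI + " ORDER BY 2").fetchall()

print(union_rows)
print(intersect_rows)
```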

SQL conditional joins

I have a table-valued function with joins where I want to choose which join I use depending on a local variable like:
DECLARE @type int;
Then I do some logic with @type and set it to 1.
SELECT ...
FROM table t
inner join ... a on a.id = t.id and @type = 1 -- Only trigger this join if @type is 1
inner join ... b on b.id = t.id and @type = 2 -- Only trigger this join if @type is 2
So my question is: how can I choose which join to trigger depending on the value of @type (if that is even possible)?
The reason I want to do this is that the SELECT statement is massive, and I don't want repetitive code in the script.
Use left join instead:
SELECT ...
FROM table t LEFT JOIN
a
ON a.id = t.id AND @type = 1 LEFT JOIN
b
ON b.id = t.id AND @type = 2 ;
You might need WHERE @type IN (1, 2) if you want an empty result set for other values.
You will need COALESCE() in the SELECT to combine the columns:
COALESCE(a.col1, b.col1) as col1
This should be quite efficient. However, you might want to simply use UNION ALL:
SELECT ...
FROM table t JOIN
a
ON a.id = t.id
WHERE @type = 1
UNION ALL
SELECT ...
FROM table t JOIN
b
ON b.id = t.id
WHERE @type = 2 ;
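Here is a small sketch of the LEFT JOIN plus COALESCE variant with the switch passed in as a parameter. It assumes SQLite via Python's sqlite3, so the T-SQL variable becomes the named parameter :type; the tables t, a, b and the column col1 are hypothetical. The extra "a.id IS NOT NULL OR b.id IS NOT NULL" filter is an addition not present in the answer: it drops unmatched rows of t so the result mimics the original inner joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER);
    CREATE TABLE a (id INTEGER, col1 TEXT);
    CREATE TABLE b (id INTEGER, col1 TEXT);
    INSERT INTO t VALUES (1), (2);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (1, 'b1');
""")

QUERY = """
    SELECT t.id, COALESCE(a.col1, b.col1) AS col1
    FROM t
    LEFT JOIN a ON a.id = t.id AND :type = 1   -- only matches when :type is 1
    LEFT JOIN b ON b.id = t.id AND :type = 2   -- only matches when :type is 2
    WHERE :type IN (1, 2)
      AND (a.id IS NOT NULL OR b.id IS NOT NULL)  -- emulate the inner join
    ORDER BY t.id
"""

def run(join_type):
    """Run the query with the given switch value and return all rows."""
    return conn.execute(QUERY, {"type": join_type}).fetchall()

print(run(1))  # rows joined through a
print(run(2))  # rows joined through b
print(run(3))  # empty: neither join is enabled
```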
You could union your two tables within a subquery. For any similar columns (i.e. would be in the same column in the outer select) you can place them above each other, for columns unique to each source you'd need to pad the other side of the union with NULL, e.g.
SELECT t.id,
a.SimilarCol,
a.UniqueToA,
a.UniqueToB
FROM Table AS t
INNER JOIN
( SELECT a.id,
a.SimilarCol, -- Column you would want to consider the same in each table
a.UniqueToA, -- Column Unique to this table
UniqueToB = NULL -- Column Unique to the other table
FROM SomeTable AS a
WHERE @Type = 1
UNION ALL
SELECT b.id,
b.SimilarCol,
UniqueToA = NULL,
b.UniqueToB
FROM SomeOtherTable AS b
WHERE @type = 2
) AS a
ON a.id = t.id;
Example on db<>Fiddle

"On" left join order

I've read through 20+ posts with a similar title, but failed to find an answer, so apologies in advance if one is available.
I have always believed that
select * FROM A LEFT JOIN B ON A.ID = B.ID
was equivalent to
select * FROM A LEFT JOIN B ON B.ID = A.ID
but was told today that "since you have a left join, you must have it as A = B, because flipped it will act as an inner join."
Any truth to this?
Whoever told you that does not understand how JOINs and join conditions work. He/She is completely wrong.
The order of the tables matters for a left join. a left join b is different than b left join a, but the order of the join condition is meaningless.
A.ID = B.ID is the condition on which the tables are joined and returns TRUE or FALSE.
Since equality(=) is commutative, the order of the operands does not affect the result.
They are completely incorrect and it is trivial to prove.
DECLARE @A TABLE (ID INT)
DECLARE @B TABLE (ID INT)
INSERT INTO @A(ID) SELECT 1
INSERT INTO @A(ID) SELECT 2
INSERT INTO @B(ID) SELECT 1
SELECT *
FROM @A a
LEFT JOIN @B b ON a.ID=b.ID
SELECT *
FROM @A a
LEFT JOIN @B b ON b.ID=a.ID
The order of the tables matters (A LEFT JOIN B versus B LEFT JOIN A). The grouping of conditions matters when an OR is used, because AND binds more tightly than OR: A=B OR A IS NULL AND A IS NOT NULL is evaluated as A=B OR (A IS NULL AND A IS NOT NULL), so always use parentheses with OR. But within a single condition (a.ID=b.ID, for example) the order of the operands doesn't matter.
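The same proof can be run outside SQL Server. The sketch below assumes SQLite via Python's sqlite3 and ordinary tables A and B in place of the table variables; it checks that swapping the operands changes nothing, that swapping the tables does, and that AND is evaluated before OR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER);
    CREATE TABLE B (ID INTEGER);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (1);
""")

# Same tables, operands of the join condition swapped.
a_eq_b = conn.execute(
    "SELECT a.ID, b.ID FROM A a LEFT JOIN B b ON a.ID = b.ID ORDER BY a.ID"
).fetchall()
b_eq_a = conn.execute(
    "SELECT a.ID, b.ID FROM A a LEFT JOIN B b ON b.ID = a.ID ORDER BY a.ID"
).fetchall()

# Same condition, but the tables swapped: now B is the preserved side.
swapped_tables = conn.execute(
    "SELECT a.ID, b.ID FROM B b LEFT JOIN A a ON a.ID = b.ID ORDER BY b.ID"
).fetchall()

# AND binds tighter than OR, so parentheses change the result.
no_parens = conn.execute("SELECT 1 OR 0 AND 0").fetchone()[0]
with_parens = conn.execute("SELECT (1 OR 0) AND 0").fetchone()[0]

print(a_eq_b, b_eq_a, swapped_tables, no_parens, with_parens)
```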

Please suggest how to change join condition. Need to replace OR condition in JOIN [closed]

This OR join is slow and I need to avoid it. Please suggest an alternative.
Select * from [A] a
LEFT JOIN [B] b
ON A.ID = b.ID OR a.ID = b.ID_REF
Your query can be re-written with a UNION. You'd get the results for the first condition, then the results for the second, then merge the two.
SELECT * FROM [A] a LEFT JOIN [B] b ON a.ID = b.ID
UNION
SELECT * FROM [A] a LEFT JOIN [B] b ON a.ID = b.ID_REF;
This, however, is likely to be slower than your original query, because of the task to eliminate duplicates.
In case there can be no duplicates (i.e. b.ID never equals b.ID_REF) or you just don't care if you get some, then you can use UNION ALL instead. This simply glues both results without removing duplicates, so this may actually be faster than your query. You should have indexes on A(ID), B(ID), and B(ID_REF).
SELECT * FROM [A] a LEFT JOIN [B] b ON a.ID = b.ID
UNION ALL
SELECT * FROM [A] a LEFT JOIN [B] b ON a.ID = b.ID_REF;
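One caveat worth checking on your own data before adopting the UNION rewrite: each LEFT JOIN pads its own unmatched rows with NULLs, so a row of A that matches on only one of the two columns can come back twice, once matched and once NULL-padded. The sketch below shows this on made-up data; it assumes SQLite via Python's sqlite3 and hypothetical sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER);
    CREATE TABLE B (ID INTEGER, ID_REF INTEGER);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (1, NULL), (NULL, 2);
""")

# The original single join with OR.
or_join = conn.execute("""
    SELECT a.ID, b.ID, b.ID_REF
    FROM A a LEFT JOIN B b ON a.ID = b.ID OR a.ID = b.ID_REF
""").fetchall()

# The UNION rewrite: two separate LEFT JOINs, duplicates removed.
union_rows = conn.execute("""
    SELECT a.ID, b.ID, b.ID_REF FROM A a LEFT JOIN B b ON a.ID = b.ID
    UNION
    SELECT a.ID, b.ID, b.ID_REF FROM A a LEFT JOIN B b ON a.ID = b.ID_REF
""").fetchall()

# Rows produced by the rewrite that the OR join never returns.
extra_rows = set(union_rows) - set(or_join)

print(len(or_join), len(union_rows))
print(extra_rows)
```

On this data the OR join returns 3 rows and the UNION returns 5; the two extra rows are NULL-padded copies of A rows 1 and 2, which would need to be filtered out for the results to match.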
Without having any information about your tables, indexes, etc., this is naturally going to be an incomplete answer. Having warned you about that, I would say that you could conceivably split this into two separate JOIN conditions and then alter your WHERE clause to reflect the importance/relevance of the ID fields, something like this:
SELECT *
FROM TableA A
LEFT JOIN TableB B1
ON A.ID = B1.ID
LEFT JOIN TableB B2
ON A.ID = B2.ID_REF
WHERE COALESCE(B1.ID,B2.ID_REF) = A.ID
;
This, of course, has the downside of potentially returning duplicate rows, and may even work more slowly than your original query, depending on your indexes/keys - but as stated, we're going to have a hard time giving a good answer with so little information about the schema you're working with.
Just using some super-simple dummy data to show you how this works:
DECLARE @TABLEA TABLE
(
ID INT
);
DECLARE @TABLEB TABLE
(
ID INT
,ID_REF INT
);
INSERT INTO @TABLEA
SELECT 1
UNION ALL
SELECT 2
UNION ALL
SELECT 3
UNION ALL
SELECT 4
UNION ALL
SELECT 5
UNION ALL
SELECT 6
;
INSERT INTO @TABLEB (ID, ID_REF)
SELECT 1,NULL
UNION ALL
SELECT NULL,2
UNION ALL
SELECT 3,3
UNION ALL
SELECT 4,NULL
UNION ALL
SELECT NULL,5
UNION ALL
SELECT 6,NULL
;
SELECT
A.ID AS [A_ID]
,COALESCE(B1.ID,B2.ID_REF) AS [B_Final_ID]
FROM @TABLEA A
LEFT JOIN @TABLEB B1
ON A.ID = B1.ID
LEFT JOIN @TABLEB B2
ON A.ID = B2.ID_REF
WHERE COALESCE(B1.ID,B2.ID_REF) = A.ID
;
Returns:
A_ID | B_Final_ID
-----+-----------
1 | 1
2 | 2
3 | 3
4 | 4
5 | 5
6 | 6
I would seriously recommend that you consider supplying some dummy data, which would give nothing away about your actual database other than the structure of the tables. That's the only way you're going to be able to get a better answer, in my opinion.

Inner join 2 tables but return all if 1 table empty

I have 2 tables say A and B, and I want to do a join on them.
Table A will always have records in it.
When table B has rows in it, I want the query to return all the rows in which table A and table B match (i.e. behave like an inner join).
However, if table B is empty, I'd like to return everything from table A.
Is this possible to do in 1 query?
Thanks.
Yes, for results like this, use LEFT JOIN.
Basically, an INNER JOIN only returns a row when it has at least one match in the other table. A LEFT JOIN, on the other hand, returns all records from the left-hand table whether or not they have a match in the other table.
To learn more about joins, see the link below:
Visual Representation of SQL Joins
I came across the same question and, as it was never answered, I post a solution given to this problem somewhere else in case it helps someone in the future.
See the source.
select *
from TableA as a
left join TableB as b
on b.A_Id = a.A_Id
where
b.A_Id is not null or
not exists (select top 1 A_Id from TableB)
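To check that this query really switches between the two behaviours, here is a sketch that runs it once with a populated TableB and once with an empty one. It assumes SQLite via Python's sqlite3, so "select top 1" becomes "SELECT 1"; the helper name matching_or_all and the sample ids are hypothetical.

```python
import sqlite3

def matching_or_all(b_ids):
    """Build TableA with ids 1..3 and TableB with b_ids, then run the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TableA (A_Id INTEGER);
        CREATE TABLE TableB (A_Id INTEGER);
        INSERT INTO TableA VALUES (1), (2), (3);
    """)
    conn.executemany("INSERT INTO TableB VALUES (?)", [(i,) for i in b_ids])
    return conn.execute("""
        SELECT a.A_Id, b.A_Id
        FROM TableA AS a
        LEFT JOIN TableB AS b ON b.A_Id = a.A_Id
        WHERE b.A_Id IS NOT NULL                 -- keep matches only ...
           OR NOT EXISTS (SELECT 1 FROM TableB)  -- ... unless TableB is empty
        ORDER BY a.A_Id
    """).fetchall()

print(matching_or_all([2]))  # behaves like an inner join
print(matching_or_all([]))   # TableB empty: every row of TableA
```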
Here is another one, but you need to add one "null" row to table B if it's empty
-- In case B is empty
Insert into TableB (col1,col2) values (null,null)
select *
from TableA as a inner join TableB as b
on
b.A_Id = a.A_Id
or b.A_Id is null
I would use an if-else block to solve it like below:
if (select count(*) from tableB) > 0
begin
Select * from TableA a Inner Join TableB b on a.ID = b.A_ID
end
else
begin
Select * from TableA
end
Try This
SELECT t1.* FROM table1 AS t1 INNER JOIN table2 AS t2 ON t1.something = t2.someotherthing UNION SELECT * FROM table1 WHERE something = somethingelse;
This is a solution:
CREATE TABLE MyData(Id INT, Something VARCHAR(10), OwnerId INT);
CREATE TABLE OwnerFilter(OwnerId INT);
SELECT *
FROM
(SELECT NULL AS Gr) AS Dummy
LEFT JOIN OwnerFilter F ON (1 = 1)
JOIN MyData D ON (F.OwnerId IS NULL OR D.OwnerId = F.OwnerId);
Link to sqlfiddle: http://sqlfiddle.com/#!6/0f9d9/7
I did the following:
DECLARE @TableB TABLE (id INT)
-- INSERT INTO @TableB
-- VALUES (some ids to filter by)
SELECT TOP 10 *
FROM [TableA] A
LEFT JOIN @TableB B
ON A.ID = B.id
WHERE B.id IS NOT NULL
OR iif(exists(SELECT *
FROM @TableB), 1, 0) = 0
Now:
If TableB is empty (leave the commented lines commented) you'll get the top 10.
If TableB has some ids in it, you'll only join by those.
I do not know how efficient this is. Comments are welcome.
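Maybe use a CTE. Note that a CTE must be followed directly by the statement that uses it, so it has to be defined inside each branch of the IF:
IF EXISTS (SELECT 1 FROM TableB)
BEGIN
;WITH ctetable AS (
SELECT * FROM TableA
)
SELECT * FROM ctetable a
INNER JOIN TableB b ON b.A_Id = a.A_Id -- join columns are an assumption, taken from the earlier answer
END
ELSE
BEGIN
;WITH ctetable AS (
SELECT * FROM TableA
)
SELECT * FROM ctetable
END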
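or dynamic SQL
DECLARE @Query NVARCHAR(max);
SET @Query = 'SELECT * FROM TableA';
IF (EXISTS (SELECT 1 FROM TableB))
BEGIN
-- The ON clause is an assumption (columns as in the earlier answer); the original omitted it.
SET @Query = CONCAT(@Query, ' INNER JOIN TableB ON TableB.A_Id = TableA.A_Id');
END
EXEC sp_executesql @Query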