LEFT JOIN with OR clause without UNION - sql

I know this shouldn't happen in a database, but it happened and we have to deal with it. We need to insert new rows into a table if they don't exist based on the values in another table. This is easy enough (just do LEFT JOIN and check for NULL values in 1st table). But...the join isn't very straight forward and we need to search 1st table on 2 conditions with an OR and not AND. So basically if it finds a match on either of the 2 attributes, we consider that the corresponding row in 1st table exists and we don't have to insert a new one. If there are no matches on either of the 2 attributes, then we consider it as a new row. We can use OR condition in the LEFT JOIN statement but from what I understand, it does full table scan and the query takes a very long time to complete even though it yields the right results. We cannot use UNION either because it will not give us what we're looking for.
Just for simplicity purpose consider the scenario below (we need to insert data into tableA).
If(OBJECT_ID('tempdb..#tableA') Is Not Null) Begin
Drop Table #tableA End
If(OBJECT_ID('tempdb..#tableB') Is Not Null) Begin
Drop Table #tableB End
create table #tableA ( email nvarchar(50), id int )
create table #tableB ( email nvarchar(50), id int )
insert into #tableA (email, id) values ('123#abc.com', 1), ('456#abc.com', 2), ('789#abc.com', 3), ('012#abc.com', 4)
insert into #tableB (email, id) values ('234#abc.com', 1), ('456#abc.com', 2), ('567#abc.com', 3), ('012#abc.com', 4), ('345#abc.com', 5)
--THIS QUERY IS CORRECTLY RETURNING 1 RECORD
select B.email, B.id
from #tableB B
left join #tableA A on A.email = B.email or B.id = A.id
where A.id is null
--THIS QUERY IS INCORRECTLY RETURNING 3 RECORDS SINCE THERE ARE ALREADY RECORDS WITH ID's 1 & 3 in tableA though the email addresses of these records don't match
select B.email, B.id
from #tableB B
left join #tableA A on A.email = B.email
where A.id is null
union
select B.email, B.id
from #tableB B
left join #tableA A on B.id = A.id
where A.id is null
If(OBJECT_ID('tempdb..#tableA') Is Not Null) Begin
Drop Table #tableA End
If(OBJECT_ID('tempdb..#tableB') Is Not Null) Begin
Drop Table #tableB End
The 1st query works correctly and only returns 1 record, but the table size is just few records and it completes under 1 sec. When the 2 tables have thousands or records, the query may take 10 min to complete. The 2nd query of course returns the records we don't want to insert because we consider them existing. Is there a way to optimize this query so it takes an acceptable time to complete?

You are using an anti join, which is another way of writing the straight-forward NOT EXISTS:
where not exists
(
select null
from #tableA A
where A.email = B.email or B.id = A.id
)
I.e. where not exists a row in table A with the same email or the same id. In other words: where not exists a row with the same email and not exists a row with the same id.
where not exists (select null from #tableA A where A.email = B.email)
and not exists (select null from #tableA A where B.id = A.id)
With the appropriate indexes
on #tableA (id);
on #tableA (email);
this should be very fast.

It's hard to tune something you can't see. Another option to get the data is to:
SELECT B.email
, B.id
FROM #TableB B
EXCEPT
(
SELECT B.email
, B.id
FROM #tableB B
INNER JOIN #tableA A
ON A.email = B.email
UNION ALL
SELECT B.email
, B.id
FROM #tableB B
INNER JOIN #tableA A
ON B.id = A.id
)
This way you don't have to use OR, you can use INNER JOIN rather than LEFT JOIN and you can use UNION ALL instead of UNION (though this advantage may well be negated by the EXCEPT). All of which may help your performance. Perhaps the joins can be more efficient when replaced with EXISTS.
You didn't mention how this problem occurred (where the data from both tables is coming from, and why they are out of sync when they shouldn't be), but it would be preferable to fix it at the source.

No the query returns correctly 3 rows
because
select B.email, B.id
from #tableB B
left join #tableA A on A.email = B.email
where A.id is null
Allone reurns the 3 rows.
For your "problemm"
select B.email, B.id
from #tableB B
left join #tableA A on A.email = B.email or B.id = A.id
where A.id is null
will che3kc for every row, if it is true to be included
So for example
('123#abc.com', 1) ('234#abc.com', 1)
as the Ids are the same it will be joined
but when you join by the emails the condition is false and so is included in the result set
You can only use the UNION approach, when you are comparing only the emails or the ids, but with both the queries are not equivalent

Related

Full Outer Join failing to return all records from both tables

I have a pair of tables I need to join, I want to return any record that's in tableA, tableB or both. I think I need a FULL OUTER JOIN
This query return 1164 records
SELECT name FROM tableA
WHERE reportDay = '2022-Apr-05'
And this one return 3339 records
SELECT name FROM tableB
WHERE reportDay = '2022-Apr-05'
And this one returns 3369 records (so there must be 30 records in tableA that aren't in tableB)
select distinct name FROM tableA where reportDay = '2022-Apr-05'
union distinct
select distinct name FROM tableB where reportDay = '2022-Apr-05'
I want to obtain a list of all matching records in either table. The query above returns 3369 records, so a FULL OUTER JOIN should also return 3369 rows (I think). My best effort so far is shown below. It returns 1164 rows and returns what looks to me to be a left join between tableA and tableB.
SELECT tableA.name.*, tableB.name.*
FROM tableA
FULL OUTER JOIN tableB
ON (tableA.name = tableB.name and tableB.reportDay = '2022-Apr-05')
WHERE tableA.reportDay = '2022-Apr-05'
Help appreciated. (if this looks question looks familiar, it's a follow-on question to this one )
UPDATE - Sorry (#forpas) to keep moving the goalposts - I'm trying to match test data to real-data scenario's.
DROP TABLE tableA;
DROP TABLE tableB;
CREATE TABLE tableA (name VARCHAR(10),
reportDay DATE,
val1 INTEGER,
val2 INTEGER);
CREATE TABLE tableB (name VARCHAR(10),
reportDay DATE,
test1 INTEGER,
test2 INTEGER);
INSERT INTO tableA values ('A','2022-Apr-05',1,2),
('B','2022-Apr-05',3,4), ('C','2022-Apr-05',5,6),
('A','2022-Apr-06',1,2), ('B','2022-Apr-06',3,4),
('C','2022-Apr-06',5,6), ('Z','2022-Apr-04',5,6),
('Z','2022-Apr-06',5,6) ;
INSERT INTO tableB values ('A','2022-Apr-03',5,6),
('B','2022-Apr-04',11,22), ('B','2022-Apr-05',11,22),
('C','2022-Apr-05',33,44), ('D','2022-Apr-05',55,66),
('B','2022-Apr-06',11,22), ('C','2022-Apr-06',33,44),
('D','2022-Apr-06',55,66), ('Q','2022-Apr-06',5,6);
SELECT tableA.*, tableB.*
FROM tableA
FULL OUTER JOIN tableB
ON (tableA.name = tableB.name and tableB.reportDay = '2022-Apr-05'
AND tableA.reportDay = '2022-Apr-05' )
For this data, I'd hope to see 4 rows of data 'A' from tableA only, 'B' and 'C' from both tables, and 'D' from table B only. I'm after the 5th April records only! The query (shown above) suggested by #forpas works except that the 'A' record in tableA doesn't get returned.
UPDATE - FINAL EDIT AND ANSWER!
Ok, the solution seem to be to concetenate the two fields together before joining....
SELECT a.*, b.*
FROM tableA a FULL OUTER JOIN tableB b
ON (b.name || b.reportDay) = (a.name || a.reportDay)
WHERE (a.reportDay = '2022-Apr-05' OR a.reportDay IS NULL)
AND (b.reportDay = '2022-Apr-05' OR b.reportDay IS NULL);
The condition for the date should be placed in a WHERE clause:
SELECT a.*, b.*
FROM tableA a FULL OUTER JOIN tableB b
ON b.name = a.name AND a.reportDay = b.reportDay
WHERE '2022-Apr-05' IN (a.reportDay, b.reportDay);
or:
SELECT a.*, b.*
FROM tableA a FULL OUTER JOIN tableB b
ON b.name = a.name
WHERE (a.reportDay = '2022-Apr-05' OR a.reportDay IS NULL)
AND (b.reportDay = '2022-Apr-05' OR b.reportDay IS NULL);
See the demo.

"On" left join order

I've read through 20+ posts with a similar title, but failed to find an answer, so apologies in advance if one is available.
I have always believed that
select * FROM A LEFT JOIN B on ON A.ID = B.ID
was equivalent to
select * FROM A LEFT JOIN B on ON B.ID = A.ID
but was told today that "since you have a left join, you must have it as A = B, because flipped it will act as an inner join.
Any truth to this?
Whoever told you that does not understand how JOINs and join conditions work. He/She is completely wrong.
The order of the tables matters for a left join. a left join b is different than b left join a, but the order of the join condition is meaningless.
A.ID = B.ID is the condition on which the tables are joined and returns TRUE or FALSE.
Since equality(=) is commutative, the order of the operands does not affect the result.
They are completely incorrect and it is trivial to prove.
DECLARE #A TABLE (ID INT)
DECLARE #B TABLE (ID INT)
INSERT INTO #A(ID) SELECT 1
INSERT INTO #A(ID) SELECT 2
INSERT INTO #B(ID) SELECT 1
SELECT *
FROM #A a
LEFT JOIN #B b ON a.ID=b.ID
SELECT *
FROM #A a
LEFT JOIN #B b ON b.ID=a.ID
The order of the tables matter (A Left JOIN B versus B LEFT JOIN A), the order of the join condition group matter if an OR is used (A=B OR A IS NULL AND A IS NOT NULL - always use parentheses with OR), but within a condition group(a.ID=b.ID for example) it doesn't matter.

Sorting data from one table based on another table

Here are my tables:
create table tableA (id int, type varchar(5))
insert into tableA (ID,TYPE)
values
(101,'A'),
(101,'C'),
(101,'D'),
(102,'A'),
(102,'B'),
(103,'A'),
(103,'C')
create table tableB (id int, type varchar(5), isActive bit)
insert into tableB (id, type, isActive)
Values
(101,'A',1),
(101,'B',0),
(101,'C',0),
(101,'D',1),
(102,'A',1),
(102,'B',0),
(102,'C',1),
(102,'D',1),
(103,'A',1),
(103,'B',1),
(103,'C',1),
(103,'D',1)
Now, I want to do two things here:
1) Find the rows that are present in tableA but isActive = 0 in tableB. (done)
select A.* from tableA A
join tableB B
on A.id = B.id and A.type = B.type
where B.isactive = 0
2) Find the rows that are missing in tableA but isActive = 1 in tableB.
For example ID 103, type B is active in tableB but is missing in tableA.
I don't care about existance of type D in tableA because I am only checking the last entry in tableA which is C.
Also, ABCD is in order for all IDs, they can be active or inactive.
Thanks for your help!
My effort: (not working)
select A.* from tableA A
where exists (
select B.* from tableA A
join tableB B
on a.id = b.id
where b.isActive = 1
order by b.id,b.type
)
SQLFiddle
I think you are looking for something like the following:
select B.* from tableB B
left join tableA A
on B.id = A.id and B.type = A.type
where B.isActive = 1
and A.id is null
order by B.id, B.type
By using the left join, it means that rows in tableB that have no rows to join with in tableA will have all A.* columns null. This then allows you to add the where clause to check where the tableA records are null thus determining what is contained within tableB that is active and not in tableA

SQL Query efficiency (JOIN or Cartesian Product )

Hello I am Confused with three scenarios which commonly every one use in almost every project.
I wanted to Know which one of these will be Efficient accordinng to - Less Time Complexity - Efficiency - effectiveness
TableA (userid ,username, email , phone)
TableB (username,TestField)
.
Case 1
select email, TestField from TableA , TableB
where TableA.username = TableB.username and
TableB.username = 'ABC'
group by email, TestField
Case 2
select email, TestField from TableA
inner join TableB on TableB.username = 'ABC'
Case 3
declare #uname nvarchar(20);
set #uname = 'ABC';
declare #Email nvarchar(20);
select #Email= email from TableA where username = #uname;
select #Email as email , TestField from TableB
where username = #uname
Case 2 will give you a different output anyway, as you are not joining TableA and TableB in any way so you get a Cartesian product.
Since all of a sudden email came up, you will need a join in case 1:
In Case 1 you can simply rewrite the query to
SELECT DISTINCT A.Email , B.TestField
FROM TableA A join TableB B on A.username = B.Username
WHERE B.username = 'ABC'
Which is more readable and easier to maintain as you do not ave a superfluous GROUP BY clause.
In Case 3 you have userId in your where clause, which is not even in your tableB according to your post.
In general, for maintainability and readibility:
Use explicit joins
SELECT * FROM A JOIN B ON A.id = B.id
is preferable over
SELECT * FROM A, B WHERE A.id = B.id
And use DISTINCT when you want distinct values, instead of GROUP BY over all columns:
SELECT DISTINCT a, b, b FROM TABLE
is preferable over
SELECT a, b, c FROM TABLE GROUP BY a, b, c
Most database experts will tell you that cross products are evil and to be avoided. Your first example would work just fine. It is an implicit inner join.
Your second example is syntactically incorrect. I suspect you'd get an error from MSSQL Server Manager. What you probably meant was:
select a.email, b.TestField
from TableA a inner join TableB b
on (b.username = a.username)
where b.username = 'ABC'
Your first example will probably be the more efficient, since MSSQL Server is smart enough to do the projection on TableB.username before doing the join. I'm not so certain that this would be the case in the above version of case 2.
To be sure you could do it like this:
select a.email, b.TestField
from TableA a inner join
(select * from TableB where TableB.username = 'ABC') b
on (b.username = a.username)
where b.username = 'ABC'
Hope that helps.

Inner join 2 tables but return all if 1 table empty

I have 2 tables say A and B, and I want to do a join on them.
Table A will always have records in it.
When table B has rows in it, I want the query to turn all the rows in which table A and table B matches. (i.e. behave like inner join)
However, if table B is empty, I'd like to return everything from table A.
Is this possible to do in 1 query?
Thanks.
Yes, for results like this, use LEFT JOIN.
Basically what INNER JOIN does is it only returns row where it has atleast one match on the other table. The LEFT JOIN, on the other hand, returns all records on the left hand side table whether it has not match on the other table.
To further gain more knowledge about joins, kindly visit the link below:
Visual Representation of SQL Joins
I came across the same question and, as it was never answered, I post a solution given to this problem somewhere else in case it helps someone in the future.
See the source.
select *
from TableA as a
left join TableB as b
on b.A_Id = a.A_Id
where
b.A_Id is not null or
not exists (select top 1 A_Id from TableB)
Here is another one, but you need to add one "null" row to table B if it's empty
-- In case B is empty
Insert into TableB (col1,col2) values (null,null)
select *
from TableA as a inner join TableB as b
on
b.A_Id = a.A_Id
or b.A_Id is null
I would use an if-else block to solve it like below:
if (select count(*) from tableB) > 0
begin
Select * from TableA a Inner Join TableB b on a.ID = b.A_ID
end
else
begin
Select * from TableA
end
Try This
SELECT t1.* FROM table1 AS t1 INNER JOIN table2 AS t2 ON t1.something = t2.someotherthing UNION SELECT * FROM table1 WHERE something = somethingelse;
This is solution:
CREATE TABLE MyData(Id INT, Something VARCHAR(10), OwnerId INT);
CREATE TABLE OwnerFilter(OwnerId INT);
SELECT *
FROM
(SELECT NULL AS Gr) AS Dummy
LEFT JOIN OwnerFilter F ON (1 = 1)
JOIN MyData D ON (F.OwnerId IS NULL OR D.OwnerId = F.OwnerId);
Link to sqlfiddle: http://sqlfiddle.com/#!6/0f9d9/7
I did the following:
DECLARE #TableB TABLE (id INT)
-- INSERT INTO #TableB
-- VALUES (some ids to filter by)
SELECT TOP 10 *
FROM [TableA] A
LEFT JOIN #TableB B
ON A.ID = B.id
WHERE B.id IS NOT NULL
OR iif(exists(SELECT *
FROM TableB), 1, 0) = 0
Now:
If TableB is empty (leave the commented lines commented) you'll get the top 10.
If TableB has some ids in it, you'll only join by those.
I do not know how efficient this is. Comments are welcome.
Maybe use a CTE
;WITH ctetable(
Select * from TableA
)
IF(EXISTS(SELECT 1 FROM TableB))
BEGIN
Select * from ctetable
Inner join TableB
END
ELSE
BEGIN
Select * from ctetable
END
or dynamic SQL
DECLARE #Query NVARCHAR(max);
SET #QUERY = 'Select * FROM TableA';
IF(EXISTS(SELECT 1 FROM TableB))
BEGIN
SET #QUERY = CONCAT(#QUERY,' INNER JOIN TableB');
END
EXEC sp_executesql #Query