inner join on null value - sql

I'm not sure if I made a mistake in my logic.
If I have a query and I do an inner join on a null value, will I always get no results, or will it ignore the join and succeed? Example:
user { id PK, name NVARCHAR NOT NULL, banStatus nullable reference }
If I join on u.banStatus, will I receive no rows?
select * from user as u
join banstatus as b on u.banStatus=b.id
where u.id=1

You don't get the row if the join is null because NULL cannot be equal to anything, even NULL.
If you change it to a LEFT JOIN, then you will get the row.
With an inner join:
select * from user as u
join banstatus as b on u.banStatus=b.id
1, '1', 1, 'Banned'
With a left join:
select * from user as u
left join banstatus as b on u.banStatus=b.id
1, '1', 1, 'Banned'
2, NULL, NULL, NULL
Using this test data:
CREATE TABLE user (id int, banstatus nvarchar(100));
INSERT INTO user (id, banstatus) VALUES
(1, '1'),
(2, NULL);
CREATE TABLE banstatus (id int, text nvarchar(100));
INSERT INTO banstatus (id, text) VALUES
(1, 'Banned');

When you do an INNER JOIN, NULL values do not match with anything. Not even with each other. That is why your query is not returning any rows. (Source)

This is an inner join on nulls (Oracle syntax):
select *
from user uu
join banstatus bb
on uu.banstatus = bb.id
or (uu.banstatus is null and bb.id is null)

Nulls are not equal to any other value, so the join condition is not true for nulls. You can achieve the desired result by choosing a different join condition. Instead of
u.banStatus = b.id
use
u.banStatus = b.id OR (u.banStatus IS NULL AND b.id IS NULL)
Some SQL dialects have a more concise syntax for this kind of comparison:
-- PostgreSQL
u.banStatus IS NOT DISTINCT FROM b.id
-- SQLite
u.banStatus IS b.id
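To see the three behaviours side by side, here is a minimal sketch using Python's built-in sqlite3 module. The table contents, the `label` column name and the banstatus row with a NULL id are assumptions made up for illustration; only the join conditions come from the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT NOT NULL, banStatus INTEGER);
    CREATE TABLE banstatus (id INTEGER, label TEXT);
    -- hypothetical sample rows: bob has no ban status
    INSERT INTO user VALUES (1, 'alice', 1), (2, 'bob', NULL);
    -- the row with a NULL id exists only to show the null-safe comparison
    INSERT INTO banstatus VALUES (1, 'Banned'), (NULL, 'No status');
""")

# Plain equality: NULL = anything is never true, so bob is dropped.
inner = conn.execute("""
    SELECT u.id, b.label FROM user AS u
    JOIN banstatus AS b ON u.banStatus = b.id
    ORDER BY u.id
""").fetchall()

# LEFT JOIN keeps bob and fills the banstatus columns with NULL.
left = conn.execute("""
    SELECT u.id, b.label FROM user AS u
    LEFT JOIN banstatus AS b ON u.banStatus = b.id
    ORDER BY u.id
""").fetchall()

# SQLite's IS operator treats two NULLs as equal.
null_safe = conn.execute("""
    SELECT u.id, b.label FROM user AS u
    JOIN banstatus AS b ON u.banStatus IS b.id
    ORDER BY u.id
""").fetchall()

print(inner)      # [(1, 'Banned')]
print(left)       # [(1, 'Banned'), (2, None)]
print(null_safe)  # [(1, 'Banned'), (2, 'No status')]
```

The null-safe join only returns a row for bob because banstatus happens to contain a row whose id is NULL; without such a row, LEFT JOIN is the way to keep him.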

Related

LEFT JOIN with OR clause without UNION

I know this shouldn't happen in a database, but it happened and we have to deal with it. We need to insert new rows into a table if they don't exist based on the values in another table. This is easy enough (just do LEFT JOIN and check for NULL values in 1st table). But...the join isn't very straight forward and we need to search 1st table on 2 conditions with an OR and not AND. So basically if it finds a match on either of the 2 attributes, we consider that the corresponding row in 1st table exists and we don't have to insert a new one. If there are no matches on either of the 2 attributes, then we consider it as a new row. We can use OR condition in the LEFT JOIN statement but from what I understand, it does full table scan and the query takes a very long time to complete even though it yields the right results. We cannot use UNION either because it will not give us what we're looking for.
Just for simplicity purpose consider the scenario below (we need to insert data into tableA).
If(OBJECT_ID('tempdb..#tableA') Is Not Null) Begin
Drop Table #tableA End
If(OBJECT_ID('tempdb..#tableB') Is Not Null) Begin
Drop Table #tableB End
create table #tableA ( email nvarchar(50), id int )
create table #tableB ( email nvarchar(50), id int )
insert into #tableA (email, id) values ('123#abc.com', 1), ('456#abc.com', 2), ('789#abc.com', 3), ('012#abc.com', 4)
insert into #tableB (email, id) values ('234#abc.com', 1), ('456#abc.com', 2), ('567#abc.com', 3), ('012#abc.com', 4), ('345#abc.com', 5)
--THIS QUERY IS CORRECTLY RETURNING 1 RECORD
select B.email, B.id
from #tableB B
left join #tableA A on A.email = B.email or B.id = A.id
where A.id is null
--THIS QUERY IS INCORRECTLY RETURNING 3 RECORDS SINCE THERE ARE ALREADY RECORDS WITH ID's 1 & 3 in tableA though the email addresses of these records don't match
select B.email, B.id
from #tableB B
left join #tableA A on A.email = B.email
where A.id is null
union
select B.email, B.id
from #tableB B
left join #tableA A on B.id = A.id
where A.id is null
If(OBJECT_ID('tempdb..#tableA') Is Not Null) Begin
Drop Table #tableA End
If(OBJECT_ID('tempdb..#tableB') Is Not Null) Begin
Drop Table #tableB End
The 1st query works correctly and only returns 1 record, but the tables hold just a few records and it completes in under 1 second. When the 2 tables have thousands of records, the query may take 10 minutes to complete. The 2nd query of course returns the records we don't want to insert because we consider them existing. Is there a way to optimize this query so it takes an acceptable time to complete?
You are using an anti join, which is another way of writing the straight-forward NOT EXISTS:
where not exists
(
select null
from #tableA A
where A.email = B.email or B.id = A.id
)
I.e. where not exists a row in table A with the same email or the same id. In other words: where not exists a row with the same email and not exists a row with the same id.
where not exists (select null from #tableA A where A.email = B.email)
and not exists (select null from #tableA A where B.id = A.id)
With the appropriate indexes
create index ix_tableA_id on #tableA (id); -- index names are placeholders
create index ix_tableA_email on #tableA (email);
this should be very fast.
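As a quick check that the two NOT EXISTS predicates select the same rows as the OR anti join, here is a minimal sketch using Python's sqlite3 module with the sample data from the question. Plain tables stand in for the temp tables and the index names are made up; it demonstrates equivalence of results only and says nothing about SQL Server timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (email TEXT, id INTEGER);
    CREATE TABLE tableB (email TEXT, id INTEGER);
    INSERT INTO tableA VALUES
        ('123#abc.com', 1), ('456#abc.com', 2), ('789#abc.com', 3), ('012#abc.com', 4);
    INSERT INTO tableB VALUES
        ('234#abc.com', 1), ('456#abc.com', 2), ('567#abc.com', 3),
        ('012#abc.com', 4), ('345#abc.com', 5);
    -- one index per lookup column (names are placeholders)
    CREATE INDEX ix_tableA_id ON tableA (id);
    CREATE INDEX ix_tableA_email ON tableA (email);
""")

# Original form: anti join with OR in the join condition.
or_join = conn.execute("""
    SELECT B.email, B.id
    FROM tableB B
    LEFT JOIN tableA A ON A.email = B.email OR B.id = A.id
    WHERE A.id IS NULL
""").fetchall()

# Rewritten form: one NOT EXISTS per column, combined with AND.
not_exists = conn.execute("""
    SELECT B.email, B.id
    FROM tableB B
    WHERE NOT EXISTS (SELECT NULL FROM tableA A WHERE A.email = B.email)
      AND NOT EXISTS (SELECT NULL FROM tableA A WHERE A.id = B.id)
""").fetchall()

print(or_join)     # [('345#abc.com', 5)]
print(not_exists)  # [('345#abc.com', 5)]
```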
It's hard to tune something you can't see. Another option to get the data is to:
SELECT B.email
, B.id
FROM #TableB B
EXCEPT
(
SELECT B.email
, B.id
FROM #tableB B
INNER JOIN #tableA A
ON A.email = B.email
UNION ALL
SELECT B.email
, B.id
FROM #tableB B
INNER JOIN #tableA A
ON B.id = A.id
)
This way you don't have to use OR, you can use INNER JOIN rather than LEFT JOIN and you can use UNION ALL instead of UNION (though this advantage may well be negated by the EXCEPT). All of which may help your performance. Perhaps the joins can be more efficient when replaced with EXISTS.
You didn't mention how this problem occurred (where the data from both tables is coming from, and why they are out of sync when they shouldn't be), but it would be preferable to fix it at the source.
No, the second query correctly returns 3 rows,
because
select B.email, B.id
from #tableB B
left join #tableA A on A.email = B.email
where A.id is null
alone returns the 3 rows.
As for your "problem":
select B.email, B.id
from #tableB B
left join #tableA A on A.email = B.email or B.id = A.id
where A.id is null
checks, for every row, whether the combined condition is true before deciding if the row is included.
So for example, take
('123#abc.com', 1) ('234#abc.com', 1)
As the ids are the same, the two rows are joined by the OR condition,
but when you join by the emails alone the condition is false, so the row from tableB is included in the result set.
You can only use the UNION approach when you are comparing only the emails or only the ids; with both, the queries are not equivalent.

Conditional Join or Filter based on Variable

I have a stored procedure that I'm passing 22 string variables to from SSRS. Each variable can either be null or be a string. When a null value is passed to the stored procedure, I need to return all records. If a string is passed, I need to filter to only those values. In my example, I'm using string_split and saving the values into a temporary table.
Drop table if exists #Authors
Drop table if exists #Temp
CREATE TABLE #Authors
(
Id INT PRIMARY KEY,
Names VARCHAR (50) NOT NULL,
)
INSERT INTO #Authors
VALUES (1, 'AuthorA'),
(2, 'AuthorB'),
(3, 'AuthorC'),
(10, 'AuthorD'),
(12, 'AuthorE')
DECLARE
@vartest1 AS VARCHAR(20)
SET @vartest1 = 'AuthorA,AuthorB'
SELECT VALUE AS Names INTO #Temp FROM string_split(@vartest1, ',')
SELECT * FROM #Temp
SELECT * FROM #Authors a
INNER JOIN #TEMP t
ON a.Names=t.Names
SELECT *
FROM #Authors a
INNER JOIN
CASE
WHEN len(@vartest1)>0 #Temp t
ELSE #Authors a
END
ON
CASE
WHEN len(@vartest1)>0 Then #Temp.Names
Else a.Names
END = a.Names
Then I try to create a case join where it either joins to the temporary table or on itself to return all. I've read where people have used unions, but I don't think that'd work for 22 parameters.
Have you considered doing a left join with a condition that checks whether the variable was null?
Something like
SELECT a.* FROM #Authors a
LEFT JOIN #TEMP t
ON a.Names=t.Names
WHERE
@vartest1 is null -- Include all if input is null...
or t.names is not null -- ... Otherwise exclude where no match in #TEMP (treat like inner join)
I think this meets your requirements. The left join ensures we have all the records in #Authors, then the WHERE handles the two possible cases (@vartest1 is null to get them all or t.names is not null to check if there was actually a match).
You are likely aware of this, but there's no need to use a temporary table to handle your split values. You can do:
SELECT a.* FROM #Authors a
LEFT JOIN string_split(@vartest1, ',') t
ON a.Names=t.value
WHERE @vartest1 is null or t.value is not null
I think the above is very reasonable, but something more along your original thoughts would be to use IF like:
IF @vartest1 is null
SELECT a.* FROM #Authors a;
ELSE
SELECT a.* FROM #Authors a inner join #Temp t ON a.Names=t.Names;
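The LEFT JOIN plus "input is null OR match found" pattern can be tried out with a minimal sketch in Python's sqlite3 module. SQLite has no string_split, so the comma-separated input is split in Python and loaded into a temp table; the `find_authors` helper and the `wanted` table are hypothetical names introduced for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Authors (Id INTEGER PRIMARY KEY, Names TEXT NOT NULL);
    INSERT INTO Authors VALUES
        (1, 'AuthorA'), (2, 'AuthorB'), (3, 'AuthorC'), (10, 'AuthorD'), (12, 'AuthorE');
""")

def find_authors(conn, names_csv):
    """Return every author when names_csv is None, otherwise only the listed ones."""
    # Rebuild the temp table of requested names (stand-in for string_split).
    conn.execute("DROP TABLE IF EXISTS wanted")
    conn.execute("CREATE TEMP TABLE wanted (value TEXT)")
    if names_csv is not None:
        conn.executemany(
            "INSERT INTO wanted VALUES (?)",
            [(name,) for name in names_csv.split(",")],
        )
    # Keep a row if no filter was given, or if the left join found a match.
    return conn.execute("""
        SELECT a.Id, a.Names
        FROM Authors a
        LEFT JOIN wanted t ON a.Names = t.value
        WHERE ? IS NULL OR t.value IS NOT NULL
        ORDER BY a.Id
    """, (names_csv,)).fetchall()

print(find_authors(conn, None))
# [(1, 'AuthorA'), (2, 'AuthorB'), (3, 'AuthorC'), (10, 'AuthorD'), (12, 'AuthorE')]
print(find_authors(conn, "AuthorA,AuthorB"))
# [(1, 'AuthorA'), (2, 'AuthorB')]
```

With 22 parameters, the same WHERE shape would be repeated once per parameter, each with its own join; whether that performs acceptably in SQL Server is something to measure rather than assume.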

max function does not work when having case when clause

i have two tables.
one is as below
table a
ID, count
1, 123
2, 123
3, 123
table b
ID, count
table b is empty
when using
SELECT CASE
WHEN isnotnull(max(b.count)) THEN max(a.count) + max(b.count)
ELSE max(a.count)
END
FROM a, b
the only result is always NULL.
I am very confused. Why?
You don't need to use a JOIN, a simple SUM of two sub-queries will give you your desired result. Since you only add MAX(b.count) when it is non-NULL, we can just add it all the time but COALESCE it to 0 when it is NULL.
SELECT COALESCE((SELECT MAX(count) FROM b), 0) + (SELECT MAX(count) FROM a)
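A minimal sketch of the sub-query approach, using Python's sqlite3 module with the data from the question. The column is renamed to `cnt` here purely to sidestep any clash with the COUNT function name; that renaming is an assumption of this example, not part of the answer. It also shows what the comma join returns when b is empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (ID INTEGER, cnt INTEGER);
    CREATE TABLE b (ID INTEGER, cnt INTEGER);   -- b stays empty, as in the question
    INSERT INTO a VALUES (1, 123), (2, 123), (3, 123);
""")

# The comma join is a cross join; with b empty it yields no rows,
# so both aggregates come back as NULL.
cross = conn.execute("SELECT MAX(a.cnt), MAX(b.cnt) FROM a, b").fetchone()

# Two independent sub-queries; only the b side needs a default.
fixed = conn.execute("""
    SELECT COALESCE((SELECT MAX(cnt) FROM b), 0) + (SELECT MAX(cnt) FROM a)
""").fetchone()[0]

print(cross)  # (None, None)
print(fixed)  # 123

# Once b has a row, its maximum is added in.
conn.execute("INSERT INTO b VALUES (4, 456)")
with_b = conn.execute("""
    SELECT COALESCE((SELECT MAX(cnt) FROM b), 0) + (SELECT MAX(cnt) FROM a)
""").fetchone()[0]
print(with_b)  # 579
```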
Another way to make this work is to UNION the count values from each table:
SELECT COALESCE(MAX(bcount), 0) + MAX(acount)
FROM (SELECT count AS acount, NULL AS bcount FROM a
UNION
SELECT NULL AS acount, count AS bcount FROM b) u
Note that if you use a JOIN it must be a FULL JOIN. If you use a LEFT JOIN you risk not seeing all the values from table b. For example, consider the case where table b has one entry: ID=4, count=456. A LEFT JOIN on ID will not include this value in the result table (since table a only has ID values of 1,2 and 3) so you will get the wrong result:
CREATE TABLE a (ID INT, count INT);
INSERT INTO a VALUES (1, 123), (2, 123), (3, 123);
CREATE TABLE b (ID INT, count INT);
INSERT INTO b VALUES (4, 456);
SELECT COALESCE(MAX(b.count), 0) + MAX(a.count)
FROM a
LEFT JOIN b ON a.ID = b.ID
Output
123 (should be 579)
To use a FULL JOIN you would write
SELECT COALESCE(MAX(b.count), 0) + MAX(a.count)
FROM a
FULL JOIN b ON a.ID = b.ID
Since table b is empty, the comma join FROM a, b (a cross join) produces no rows at all, so max(b.count) returns NULL, and so does max(a.count). Any arithmetic operation involving NULL results in NULL.
So max(a.count) + max(b.count) is NULL, and the ELSE branch max(a.count) is NULL as well. Hence, your query returns NULL.
Just use COALESCE to assign a default value whenever a NULL comes up.
Use the coalesce() function and an explicit join; avoid the old comma-separated table list style of join.
select coalesce(max(a.count)+max(b.count),max(a.count))
from a left join b on a.id=b.id
Use a left join:
SELECT coalesce(max(a.count) + max(b.count),max(a.count))
FROM a left join b ON a.id=b.id

Compare two tables, find missing rows and mismatched data

I'd like to compare two tables and get a set of results where the lookup values are mismatched as well as where the key values are missing from the other table. The first part works fine with the following query:
SELECT * FROM (
SELECT mID, mLookup
FROM m) t1
FULL OUTER JOIN (
SELECT aID, aLookup
FROM a) t2
ON t1.mID = t2.aID
WHERE
t1.mID = t2.aID AND
t1.mLookup <> t2.aLookup
However, it doesn't return rows from t1 and t2 where there is no corresponding ID in the other table (because of the ON t1.mID = t2.aID).
How can I achieve both in the same query?
Remove the ID part of the WHERE clause. The FULL OUTER JOIN ON t1.mID = t2.aID is enough to link the tables together. The FULL OUTER JOIN will return rows from both tables even if one side does not have a match.
However, the WHERE t1.mID = t2.aID clause limits the results to IDs that exist in both tables. This effectively causes the FULL OUTER JOIN to act like an INNER JOIN.
In other words:
SELECT * FROM (
SELECT mID, mLookup
FROM m) t1
FULL OUTER JOIN (
SELECT aID, aLookup
FROM a) t2
ON t1.mID = t2.aID
WHERE
--t1.mID = t2.aID AND -- remove this line
t1.mLookup <> t2.aLookup
-- EDIT --
Re-reading your question, you wanted only the mismatches. In that case, you need to search on where either side's ID is NULL:
SELECT * FROM (
SELECT mID, mLookup
FROM m) t1
FULL OUTER JOIN (
SELECT aID, aLookup
FROM a) t2
ON t1.mID = t2.aID
WHERE
t1.mID IS NULL OR
t2.aID IS NULL OR
t1.mLookup <> t2.aLookup
The where clause of your query filters out those rows that don't have matching "Ids". Try this:
SELECT m.mId, m.mLookup, a.aId, a.aLookup
from m
full outer join a
on a.aId = m.mId
where m.mId is null
or a.aID is null
or m.mLookup <> a.aLookup
The full outer join gets all possible rows, and the where clause keeps all rows where one side or the other is null and, where they match (neither null), keeps only those rows where the "lookup" values differ.
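The expected output of that filter can be sketched with Python's sqlite3 module. Two things here are assumptions of the example rather than part of the answer: the sample rows are invented, and because older SQLite builds lack FULL OUTER JOIN, the full join is emulated with a LEFT JOIN plus a UNION ALL of the rows that exist only in a.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE m (mID INTEGER, mLookup TEXT);
    CREATE TABLE a (aID INTEGER, aLookup TEXT);
    -- hypothetical data: ID 1 matches, ID 2 differs, ID 3 only in m, ID 4 only in a
    INSERT INTO m VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO a VALUES (1, 'x'), (2, 'q'), (4, 'w');
""")

rows = conn.execute("""
    -- rows of m that are missing from a, or whose lookup differs
    SELECT m.mID, m.mLookup, a.aID, a.aLookup
    FROM m
    LEFT JOIN a ON a.aID = m.mID
    WHERE a.aID IS NULL OR m.mLookup <> a.aLookup
    UNION ALL
    -- rows of a that are missing from m (the other half of the full join)
    SELECT NULL, NULL, a.aID, a.aLookup
    FROM a
    WHERE NOT EXISTS (SELECT 1 FROM m WHERE m.mID = a.aID)
""").fetchall()

# Sort by whichever ID is present so the output order is stable.
rows.sort(key=lambda r: r[0] if r[0] is not None else r[2])
for row in rows:
    print(row)
# (2, 'y', 2, 'q')
# (3, 'z', None, None)
# (None, None, 4, 'w')
```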
This works starting from SQL Server 2008 and is also valid for Azure SQL Database, Azure SQL Data Warehouse and Parallel Data Warehouse.
The SQL queries are as follows:
USE [test]
GO
CREATE TABLE [dbo].[Student1](
[Id] [int] NOT NULL,
[Name] [nvarchar](256) NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Student2](
[Id] [int] NOT NULL,
[Name] [nvarchar](256) NOT NULL
) ON [PRIMARY]
GO
---- You can re-run from here with your data
truncate table [Student1]
truncate table [Student2]
insert into [Student1] values (1, N'سید حیدر')
insert into [Student1] values (2, N'Syed Ali')
insert into [Student1] values (3, N'Misbah Arfin')
insert into [Student2] values (2, N'Syed Ali')
insert into [Student2] values (3, N'Misbah Arfin');
with StudentsAll (Id, [Name]) as
(
select s1.Id, s1.[Name] from Student1 s1
left outer join Student2 s2
on
s1.Id = s2.Id
),
StudentsMatched (Id, [Name]) as
(
select s1.Id, s1.[Name] from Student1 s1
inner join Student2 s2
on
s1.Id = s2.Id
)
select * from StudentsAll
except
select * from StudentsMatched

Can a where clause on a join be used in a view

I'm attempting to create an SQL view that consolidates a number of separate select queries. I've encountered some difficulty putting clauses from the individual select statements into the database view.
A simplified version of my view is:
create or replace view TestView as
select
A.Name,
B.Subscription,
C.Expiry
from
TestTableA as A left outer join TestTableB as B on A.ID = B.A_ID
left outer join TestTableC as C on A.ID = C.A_ID;
I've got two problems with the view:
On the first join, how can I select only records where the Subscription is a specific value AND, if it is not that value, still retrieve the Name and Expiry columns (in which case the Subscription would be null)?
On the second join how can I specify I only want the record with the most recent expiry date?
Below is my test schema, sample data and desired result set:
create table TestTableA
(
ID int,
Name varchar(32),
Primary Key(ID)
);
create table TestTableB
(
ID int,
A_ID int,
Subscription varchar(32),
Primary Key(ID),
Foreign Key(A_ID) references TestTableA(ID)
);
create table TestTableC
(
ID int,
A_ID int,
Expiry date,
Primary Key(ID),
Foreign Key(A_ID) references TestTableA(ID)
);
create or replace view TestView as
select
A.Name,
B.Subscription,
C.Expiry
from
TestTableA as A left outer join TestTableB as B on A.ID = B.A_ID
left outer join TestTableC as C on A.ID = C.A_ID;
insert into TestTableA values (1, 'Joe');
insert into TestTableB values (1, 1, 'abcd');
insert into TestTableB values (2, 1, 'efgh');
insert into TestTableC values (1, 1, '2012-10-25');
insert into TestTableC values (2, 1, '2012-10-24');
insert into TestTableA values (2, 'Jane');
Desired Results 1:
select * from TestView where Subscription is null or Subscription = 'efgh';
Joe, efgh, 2012-10-25
Jane, ,
Desired Results 2:
select * from TestView where Subscription is null or Subscription = 'xxxx';
Joe, , 2012-10-25
Jane, ,
I'll write the query in plain SQL.
If you have SQL Server 2005 or higher, you can use OUTER APPLY instead of a join on a subquery with max().
select
A.Name,
B.Subscription,
C.Expiry
from TestTableA as A
left outer join TestTableB as B on A.ID = B.A_ID and B.Subscription in ('abcd', 'efgh')
left outer join
(
-- most recent expiry per A_ID
select max(T.Expiry) as Expiry, T.A_ID
from TestTableC as T
group by T.A_ID
) as C on A.ID = C.A_ID
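Moving the Subscription filter into the ON clause and taking the latest expiry per A_ID can be checked against the question's desired results with a minimal sketch in Python's sqlite3 module. The `subscriptions` helper is a hypothetical name, dates are stored as ISO text (so MAX picks the most recent), and the parameterized filter is shown as a plain query rather than as a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TestTableA (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE TestTableB (ID INTEGER PRIMARY KEY,
                             A_ID INTEGER REFERENCES TestTableA(ID),
                             Subscription TEXT);
    CREATE TABLE TestTableC (ID INTEGER PRIMARY KEY,
                             A_ID INTEGER REFERENCES TestTableA(ID),
                             Expiry TEXT);
    INSERT INTO TestTableA VALUES (1, 'Joe'), (2, 'Jane');
    INSERT INTO TestTableB VALUES (1, 1, 'abcd'), (2, 1, 'efgh');
    INSERT INTO TestTableC VALUES (1, 1, '2012-10-25'), (2, 1, '2012-10-24');
""")

def subscriptions(wanted):
    """One row per name: the wanted subscription (or NULL) and the latest expiry."""
    return conn.execute("""
        SELECT A.Name, B.Subscription, C.Expiry
        FROM TestTableA AS A
        -- filtering in ON keeps names that have no matching subscription
        LEFT JOIN TestTableB AS B
               ON A.ID = B.A_ID AND B.Subscription = ?
        -- pre-aggregate so each A_ID contributes only its latest expiry
        LEFT JOIN (
            SELECT A_ID, MAX(Expiry) AS Expiry
            FROM TestTableC
            GROUP BY A_ID
        ) AS C ON A.ID = C.A_ID
        ORDER BY A.ID
    """, (wanted,)).fetchall()

print(subscriptions("efgh"))  # [('Joe', 'efgh', '2012-10-25'), ('Jane', None, None)]
print(subscriptions("xxxx"))  # [('Joe', None, '2012-10-25'), ('Jane', None, None)]
```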
create or replace view TestView as
select
A.Name,
B.Subscription,
C.Expiry
from
TestTableA as A left outer join TestTableB as B on A.ID = B.A_ID
left outer join TestTableC as C on A.ID = C.A_ID
where
B.Subscription is not null
and C.Expiry between (now() - interval 1 minute) and now();