JOIN and SELECT values not included in table2 - sql

I appreciate this might be very simple for you guys but sometimes the logic behind JOIN can be difficult for beginners. I want to select "ID" from table1 but only those "ID"s which do NOT appear in table2."ID". I tested LEFT and RIGHT but cannot get it to work the way I need to. I am using dashDB.

You can use NOT IN and subquery
Select * from table1 where id NOT IN (select id from table2);

try this...
SELECT *
FROM table1
LEFT JOIN table2 ON table1.ID = table2.ID
WHERE table2.ID IS NULL

I always prefer NOT EXISTS to do this
Select * from table1 a
where NOT EXISTS (select 1 from table2 b where a.id = b.id);
Here is a excellent article by Aaron Bertrand that compares the performance of all the methods
Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?

Use the below script.
SELECT t1.ID
FROM table1 t1
LEFT JOIN table2 t2 ON t1.ID = t2.ID
WHERE t2.ID IS NULL

Related

SQL Join tables with empty values

I have 2 tables
Lets say Table1 and Table2
They both have one shared value(id)
What I'm looking for is whether there is any function to combine them both based on that key, however if table2 has more elements, i want columns of table1 to be empty, and if table1 has more elements, table 2 columns to be empty
I tried a lot of different joins, but most of the time I end up with a lot of duplicate values as it tries to fill in both sides.
Tried Full outer join, Full join, etc
You are looking for full join:
select t1.*, t2.*
from t1 full join
t2
on t1.id = t2.id;
The above code from Gordon is right. However, since you have not specified the database and its version, I will post an alternate version for MySQL, which should also work for other databases.
Without duplicates:
SELECT * FROM Table1
LEFT JOIN Table2 ON Table1.id = Table2.id
UNION
SELECT * FROM Table1
RIGHT JOIN Table2 ON Table1.id = Table2.id
With duplicates:
SELECT * FROM Table1
LEFT JOIN Table2 ON Table1.id = Table2.id
UNION ALL
SELECT * FROM Table1
RIGHT JOIN Table2 ON Table1.id = Table2.id

SQL beginner question: unexpected behavior with where exists select 1

I started using SQL a week ago. I am sorry but I have a "why my code does not work" question.
Please look at the following three queries on table1 and table2.
A. Inner join (returned 2 row results)
select t1.*, t2.* from table1 t1, table2 t2
where t1.item = t2.item
and t1.something = t2.something
B. Subquery (returned 2 row results)
select t1.* from table1 t1
where exists (select 1 from table2 t2
where t1.item = t2.item
and t1.something = t2.something)
C. My code (Expected the same results as in A. "Inner join" but takes forever to return results)
select t1.*, t2.* from table1 t1, table2 t2
where exists (select 1 from table2 t2
where t1.item = t2.item
and t1.something = t2.something)
For your reference, # of rows for each table is the following.
select count(*) from table1 -- (100K)
select count(*) from table2 -- (10K)
Would somebody kindly educate me know why my code (C) does not work?
Thank you for your help in advance.
The problem with your (C) query is that the outer reference to table2 is completed unconstrained1. This means that you're effectively writing query B again but also cross joining that result to table2, meaning that you'll get not 2 results but 20000.
You should be using explicit join syntax. One of the advantages of this is that it forces you to think about the join conditions at the point of joining rather than having to remember to include them in the general where clause.
select t1.*, t2.*
from table1 t1
inner join table2 t2
on t1.item = t2.item
and t1.something = t2.something
It's an error to omit the on clause. It's never an error to forget to constrain a column in the where clause2.
1Just because you refer to table2 again inside your exists subquery, and even though you assign it the same t2 alias, that doesn't mean that they are the same reference. The two references to table2 are unrelated in any way.
2Of course, it's often a logical error to do this, but what I mean in this paragraph is specifically about error messages that the system will raise.

Applying joins conditionally in SQL Server

I have some set of records, but now i have to select only those records from this set which have theeir Id in either of the two tables.
Suppose I have table1 which contains
Id Name
----------
1 Name1
2 Name2
Now I need to select only those records from table one
which have either their id in table2 or in table3
I was trying to apply or operator witin inner join like:
select *
from table1
inner join table2 on table2.id = table1.id or
inner join table3 on table3.id = table1.id.
Is it possible? What is the best method to approach this? Actually I am also not able to use
if exist(select 1 from table2 where id=table1.id) then select from table1
Could someone help me to get over this?
Use left join and then check if at least one of the joins has found a relation
select t1.*
from table1 t1
left join table2 t2 on t2.id = t1.id
left join table3 t3 on t3.id = t1.id
where t2.id is not null
or t3.is is not null
I would be inclined to use exists:
select t1.*
from table1 t1
where exists (select 1 from table2 t2 where t2.id = t1.id) or
exists (select 1 from table3 t3 where t3.id = t1.id) ;
The advantage to using exists (or in) over a join involves duplicate rows. If table2 or table3 have multiple rows for a given id, then a version using join will produce multiple rows in the result set.
I think the most efficient way is to use UNION on table2 and table3 and join to it :
SELECT t1.*
FROM table1 t1
INNER JOIN(SELECT id FROM Table2
UNION
SELECT id FROM Table3) s
ON(t.id = s.id)
Alternatively, you can use below SQL as well:
SELECT *
FROM dbo.Table1
WHERE id Table1.IN ( SELECT table2.id
FROM dbo.table2 )
OR Table1.id IN ( SELECT table3.id
FROM Table3 )

Performance of two left joins versus union

I have searched but have not found a definitive answer. Which of these is better for performance in SQL Server:
SELECT T.*
FROM dbo.Table1 T
LEFT JOIN Table2 T2 ON T.ID = T2.Table1ID
LEFT JOIN Table3 T3 ON T.ID = T3.Table1ID
WHERE T2.Table1ID IS NOT NULL
OR T3.Table1ID IS NOT NULL
or...
SELECT T.*
FROM dbo.Table1 T
JOIN Table2 T2 ON T.ID = T2.Table1ID
UNION
SELECT T.*
FROM dbo.Table1 T
JOIN Table3 T3 ON T.ID = T3.Table1ID
I have tried running both but it's hard to tell for sure. I'd appreciate an explanation of why one is faster than the other, or if it depends on the situation.
Your two queries do not do the same things. In particular, the first will return duplicate rows if values are duplicated in either table.
If you are looking for rows in Table1 that are in either of the other two tables, I would suggest using exists:
select t1.*
from Table1 t1
where exists (select 1 from Table2 t2 where t2.Table1Id = t1.id) or
exists (select 1 from Table3 t3 where t3.Table1Id = t1.id);
And, create indexes on Table1Id in both Table2 and Table3.
Which of your original queries is faster depends a lot on the data. The second has an extra step to remove duplicates (union verses union all). On the other hand, the first might end up creating many duplicate rows.

Join where does not equal

I have two queries I have created in Access and I'm a step away from arriving at the data I need. I need to subset in some way.
Table1 and Table2 (Actually query1 and query2). They both have 3 fields: Email, Matcher and List.
I need to get all the results from Table2 where Email does not exist in Table1.
I found some posts about using an outer join and where null clause. I could not get it to work though. Didn't post what I tried here in case I was off course.
select t2.*
from table2 t2
left join table1 t1 on t2.email = t1.email
where t1.email is null
SELECT t2.*
FROM table2 t2
WHERE NOT EXISTS (
SELECT *
FROM table1 t1
WHERE t1.email = t2.email
)