Performance of two left joins versus union - sql

I have searched but have not found a definitive answer. Which of these is better for performance in SQL Server:
SELECT T.*
FROM dbo.Table1 T
LEFT JOIN Table2 T2 ON T.ID = T2.Table1ID
LEFT JOIN Table3 T3 ON T.ID = T3.Table1ID
WHERE T2.Table1ID IS NOT NULL
OR T3.Table1ID IS NOT NULL
or...
SELECT T.*
FROM dbo.Table1 T
JOIN Table2 T2 ON T.ID = T2.Table1ID
UNION
SELECT T.*
FROM dbo.Table1 T
JOIN Table3 T3 ON T.ID = T3.Table1ID
I have tried running both but it's hard to tell for sure. I'd appreciate an explanation of why one is faster than the other, or if it depends on the situation.

Your two queries do not do the same things. In particular, the first will return duplicate rows if values are duplicated in either table.
If you are looking for rows in Table1 that are in either of the other two tables, I would suggest using exists:
select t1.*
from Table1 t1
where exists (select 1 from Table2 t2 where t2.Table1Id = t1.id) or
exists (select 1 from Table3 t3 where t3.Table1Id = t1.id);
Also, create indexes on Table1Id in both Table2 and Table3.
Which of your original queries is faster depends a lot on the data. The second has an extra step to remove duplicates (union versus union all). On the other hand, the first might end up creating many duplicate rows.
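To make the duplicate-row point concrete, here is a minimal sketch that runs the three query shapes against an in-memory SQLite database from Python. The sample rows and the Name column are made up for illustration, and SQLite stands in for SQL Server, so this shows the difference in results only, not performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Table2 (Table1ID INTEGER);
    CREATE TABLE Table3 (Table1ID INTEGER);
    INSERT INTO Table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    -- Hypothetical data: ID 1 is referenced twice in Table2.
    INSERT INTO Table2 VALUES (1), (1);
    INSERT INTO Table3 VALUES (1), (2);
""")

def ids(sql):
    """Run a query and return the first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

left_join_ids = ids("""
    SELECT T.ID
    FROM Table1 T
    LEFT JOIN Table2 T2 ON T.ID = T2.Table1ID
    LEFT JOIN Table3 T3 ON T.ID = T3.Table1ID
    WHERE T2.Table1ID IS NOT NULL OR T3.Table1ID IS NOT NULL
""")

union_ids = ids("""
    SELECT T.ID FROM Table1 T JOIN Table2 T2 ON T.ID = T2.Table1ID
    UNION
    SELECT T.ID FROM Table1 T JOIN Table3 T3 ON T.ID = T3.Table1ID
""")

exists_ids = ids("""
    SELECT t1.ID
    FROM Table1 t1
    WHERE EXISTS (SELECT 1 FROM Table2 t2 WHERE t2.Table1ID = t1.ID)
       OR EXISTS (SELECT 1 FROM Table3 t3 WHERE t3.Table1ID = t1.ID)
""")

print(left_join_ids)  # [1, 1, 2]  (ID 1 is duplicated by the join)
print(union_ids)      # [1, 2]
print(exists_ids)     # [1, 2]
```

On this sample, only the two-left-join form repeats ID 1, which is why the two original queries are not interchangeable.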

Related

How to fetch * records from a table which are not present in other two tables

I want to fetch all records, with all columns, from a table where those records are not present in two other tables. Please help.
I have tried the query below; it works fine for comparing one column, but I want to compare 5 columns.
select * from A
WHERE NOT EXISTS
(select * from B b where b.id=a.id) AND
NOT EXISTS
(select * from C c where c.id=a.id)
A general solution might look like:
SELECT t1.*
FROM table1 t1
WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.t2_id) AND
NOT EXISTS (SELECT 1 FROM table3 t3 WHERE t3.id = t1.t3_id);
This assumes that you want to target table1 for records, ensuring that no matches can be found in table2 and table3.
I prefer this approach:
SELECT t1.*
FROM table1 AS t1
LEFT JOIN table2 AS t2
ON t1.id = t2.t1_id
LEFT JOIN table3 AS t3
ON t1.id = t3.t1_id
WHERE t2.id IS NULL
AND t3.id IS NULL;
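While this might be a bit less intuitive than using subqueries, I think it makes mistakes with aliases less likely. The example below shows such a mistake (the unqualified id on both sides resolves to the subquery's own table, so the condition is true for every non-NULL id), and it happens more often than you might expect: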
SELECT *
FROM table1
WHERE NOT EXISTS (SELECT 1 FROM table2 WHERE id = id) AND
NOT EXISTS (SELECT 1 FROM table3 WHERE id = id);
To your question about checks on 5 columns, that can still be done using either of these methods by adding conditions either in the left joins or in the where clause of each sub query.
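As a rough illustration of both points, the sketch below extends the NOT EXISTS form to several columns and then reproduces the unqualified-alias mistake. It uses Python's sqlite3 with made-up data; the columns col1 and col2 are hypothetical stand-ins for the extra columns you want to compare.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, col1 TEXT, col2 TEXT);
    CREATE TABLE B (id INTEGER, col1 TEXT, col2 TEXT);
    CREATE TABLE C (id INTEGER, col1 TEXT, col2 TEXT);
    INSERT INTO A VALUES (1, 'x', 'y'), (2, 'x', 'z'), (3, 'p', 'q');
    INSERT INTO B VALUES (1, 'x', 'y');          -- full match for A row 1
    INSERT INTO C VALUES (2, 'x', 'DIFFERENT');  -- same id as A row 2, col2 differs
""")

# Multi-column check: every compared column is added to the subquery's WHERE.
multi_column = [row[0] for row in conn.execute("""
    SELECT a.id
    FROM A a
    WHERE NOT EXISTS (SELECT 1 FROM B b
                      WHERE b.id = a.id AND b.col1 = a.col1 AND b.col2 = a.col2)
      AND NOT EXISTS (SELECT 1 FROM C c
                      WHERE c.id = a.id AND c.col1 = a.col1 AND c.col2 = a.col2)
    ORDER BY a.id
""")]

# Alias mistake: inside each subquery, "id = id" compares the inner
# table's column with itself, so EXISTS is true whenever that table
# holds a non-NULL id and NOT EXISTS filters out every row.
alias_mistake = [row[0] for row in conn.execute("""
    SELECT id
    FROM A
    WHERE NOT EXISTS (SELECT 1 FROM B WHERE id = id)
      AND NOT EXISTS (SELECT 1 FROM C WHERE id = id)
""")]

print(multi_column)   # [2, 3]
print(alias_mistake)  # []
```

Row 2 survives the multi-column check because C matches it on id and col1 but not on col2, which is the behaviour you get from comparing all columns rather than id alone.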

SQL beginner question: unexpected behavior with where exists select 1

I started using SQL a week ago. I am sorry but I have a "why my code does not work" question.
Please look at the following three queries on table1 and table2.
A. Inner join (returned 2 row results)
select t1.*, t2.* from table1 t1, table2 t2
where t1.item = t2.item
and t1.something = t2.something
B. Subquery (returned 2 row results)
select t1.* from table1 t1
where exists (select 1 from table2 t2
where t1.item = t2.item
and t1.something = t2.something)
C. My code (Expected the same results as in A. "Inner join" but takes forever to return results)
select t1.*, t2.* from table1 t1, table2 t2
where exists (select 1 from table2 t2
where t1.item = t2.item
and t1.something = t2.something)
For your reference, # of rows for each table is the following.
select count(*) from table1 -- (100K)
select count(*) from table2 -- (10K)
Would somebody kindly explain why my code (C) does not work?
Thank you for your help in advance.
The problem with your (C) query is that the outer reference to table2 is completely unconstrained [1]. This means that you're effectively writing query B again but also cross joining that result to table2, meaning that you'll get not 2 results but 20,000 (the 2 rows from B times the 10K rows in table2).
You should be using explicit join syntax. One of the advantages of this is that it forces you to think about the join conditions at the point of joining rather than having to remember to include them in the general where clause.
select t1.*, t2.*
from table1 t1
inner join table2 t2
on t1.item = t2.item
and t1.something = t2.something
With explicit join syntax, it's an error to omit the on clause. It's never an error to forget to constrain a column in the where clause [2].
[1] Just because you refer to table2 again inside your exists subquery, and even though you assign it the same t2 alias, that doesn't mean that they are the same reference. The two references to table2 are not related in any way.
[2] Of course, it's often a logical error to do this, but what I mean in this paragraph is specifically about error messages that the system will raise.
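The row-count effect can be reproduced on a small scale. The following sketch uses Python's sqlite3 with invented sample rows (4 rows in table1, 3 in table2, 2 of which match), so the numbers are much smaller than in the question but the pattern is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (item TEXT, something TEXT);
    CREATE TABLE table2 (item TEXT, something TEXT);
    INSERT INTO table1 VALUES ('i1','s1'), ('i2','s2'), ('i3','s3'), ('i4','s4');
    INSERT INTO table2 VALUES ('i1','s1'), ('i2','s2'), ('i9','s9');
""")

# A: inner join written with explicit join syntax.
rows_a = conn.execute("""
    SELECT t1.*, t2.*
    FROM table1 t1
    INNER JOIN table2 t2
        ON t1.item = t2.item AND t1.something = t2.something
""").fetchall()

# B: EXISTS subquery, only table1 in the outer FROM.
rows_b = conn.execute("""
    SELECT t1.*
    FROM table1 t1
    WHERE EXISTS (SELECT 1 FROM table2 t2
                  WHERE t1.item = t2.item AND t1.something = t2.something)
""").fetchall()

# C: the outer table2 has no condition at all; the inner t2 is a
# separate reference that merely shadows the outer alias.
rows_c = conn.execute("""
    SELECT t1.*, t2.*
    FROM table1 t1, table2 t2
    WHERE EXISTS (SELECT 1 FROM table2 t2
                  WHERE t1.item = t2.item AND t1.something = t2.something)
""").fetchall()

table2_count = conn.execute("SELECT COUNT(*) FROM table2").fetchone()[0]

print(len(rows_a), len(rows_b), len(rows_c))  # 2 2 6
```

Query C returns the rows of B multiplied by every row of table2 (2 × 3 here), which matches the 2 × 10K estimate above.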

JOIN and SELECT values not included in table2

I appreciate this might be very simple for you guys but sometimes the logic behind JOIN can be difficult for beginners. I want to select "ID" from table1 but only those "ID"s which do NOT appear in table2."ID". I tested LEFT and RIGHT but cannot get it to work the way I need to. I am using dashDB.
You can use NOT IN and subquery
Select * from table1 where id NOT IN (select id from table2);
try this...
SELECT *
FROM table1
LEFT JOIN table2 ON table1.ID = table2.ID
WHERE table2.ID IS NULL
I always prefer NOT EXISTS to do this
Select * from table1 a
where NOT EXISTS (select 1 from table2 b where a.id = b.id);
Here is an excellent article by Aaron Bertrand that compares the performance of all the methods:
Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?
Use the below script.
SELECT t1.ID
FROM table1 t1
LEFT JOIN table2 t2 ON t1.ID = t2.ID
WHERE t2.ID IS NULL
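For anyone who wants to check that the suggested forms agree, here is a small sketch using Python's sqlite3 rather than dashDB. The sample ids are invented, and table2.id deliberately contains a repeated value but no NULLs; that is an assumption of this example, not something stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3), (4);
    INSERT INTO table2 VALUES (2), (4), (4);
""")

def ids(sql):
    """Run a query and return the first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

not_in = ids("""
    SELECT id FROM table1
    WHERE id NOT IN (SELECT id FROM table2)
""")

left_join = ids("""
    SELECT table1.id
    FROM table1
    LEFT JOIN table2 ON table1.id = table2.id
    WHERE table2.id IS NULL
""")

not_exists = ids("""
    SELECT a.id FROM table1 a
    WHERE NOT EXISTS (SELECT 1 FROM table2 b WHERE a.id = b.id)
""")

print(not_in, left_join, not_exists)  # [1, 3] [1, 3] [1, 3]
```

All three return the ids that are missing from table2 on this data; which one performs best depends on the engine, which is what the linked article measures.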

Applying joins conditionally in SQL Server

I have a set of records, but now I have to select only those records from this set which have their Id in either of the two tables.
Suppose I have table1 which contains
Id Name
----------
1 Name1
2 Name2
Now I need to select only those records from table one
which have either their id in table2 or in table3
I was trying to apply the or operator within an inner join like:
select *
from table1
inner join table2 on table2.id = table1.id or
inner join table3 on table3.id = table1.id.
Is it possible? What is the best method to approach this? Actually I am also not able to use
if exist(select 1 from table2 where id=table1.id) then select from table1
Could someone help me to get over this?
Use left join and then check if at least one of the joins has found a relation
select t1.*
from table1 t1
left join table2 t2 on t2.id = t1.id
left join table3 t3 on t3.id = t1.id
where t2.id is not null
or t3.id is not null
I would be inclined to use exists:
select t1.*
from table1 t1
where exists (select 1 from table2 t2 where t2.id = t1.id) or
exists (select 1 from table3 t3 where t3.id = t1.id) ;
The advantage to using exists (or in) over a join involves duplicate rows. If table2 or table3 have multiple rows for a given id, then a version using join will produce multiple rows in the result set.
I think the most efficient way is to use UNION on table2 and table3 and join to it :
SELECT t1.*
FROM table1 t1
INNER JOIN(SELECT id FROM Table2
UNION
SELECT id FROM Table3) s
ON (t1.id = s.id)
Alternatively, you can use below SQL as well:
SELECT *
FROM dbo.Table1
WHERE Table1.id IN ( SELECT table2.id
FROM dbo.table2 )
OR Table1.id IN ( SELECT table3.id
FROM Table3 )
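As a quick sanity check on the alternatives above, this sketch runs the EXISTS form, the join to a UNION of ids, and the IN ... OR IN form against the same made-up data using Python's sqlite3. The Name values and the duplicate id in table2 are invented for the example, and SQLite is used in place of SQL Server, so it says nothing about which form is fastest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, Name TEXT);
    CREATE TABLE table2 (id INTEGER);
    CREATE TABLE table3 (id INTEGER);
    INSERT INTO table1 VALUES (1, 'Name1'), (2, 'Name2'), (3, 'Name3');
    INSERT INTO table2 VALUES (1), (1);   -- id 1 appears twice
    INSERT INTO table3 VALUES (1), (3);
""")

def ids(sql):
    """Run a query and return the first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

exists_form = ids("""
    SELECT t1.id
    FROM table1 t1
    WHERE EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.id)
       OR EXISTS (SELECT 1 FROM table3 t3 WHERE t3.id = t1.id)
""")

# UNION (not UNION ALL) removes repeated ids before the join,
# so the join cannot multiply table1 rows.
union_join_form = ids("""
    SELECT t1.id
    FROM table1 t1
    INNER JOIN (SELECT id FROM table2
                UNION
                SELECT id FROM table3) s
        ON t1.id = s.id
""")

in_form = ids("""
    SELECT table1.id
    FROM table1
    WHERE table1.id IN (SELECT table2.id FROM table2)
       OR table1.id IN (SELECT table3.id FROM table3)
""")

print(exists_form, union_join_form, in_form)  # [1, 3] [1, 3] [1, 3]
```

Each form returns id 1 once even though it appears three times across table2 and table3.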

Rewrite SQL code SELECT block to simplify logic

I am trying to rewrite this block with simpler logic if this can be done. I am using it within a larger SELECT statement and I think IF I can simplify this block, I might be able to improve performance of my query.
proj_catg_type_id, proj_catg_id and proj_id are all PKs in their tables.
select t1.proj_catg_name
from table1 t1, table2 t2, table3 t3
where t2.proj_catg_type_id = t1.proj_catg_type_id
and t2.proj_catg_type_id = 213
and t3.proj_id = t2.proj_id
Without knowing the referential integrity rules and the logic behind the tables, it is difficult to give a 100% correct answer. But just by looking at this statement, the most simplified logic would be
select t1.proj_catg_name
from table1 t1
where t1.proj_catg_type_id = 213;
select t1.proj_catg_name
from table1 t1 inner join table2 t2
on t2.proj_catg_type_id=t1.proj_catg_type_id
where t2.proj_catg_type_id=213
and t3.proj_id=t2.proj_id
maybe? is t3 used outside this subselect?
If t3 is a table outside the select you showed, then this is a correlated subquery which you should not be using at all, ever! That turns your query into a row-by-agonizing-row cursor.
Use derived tables or joins to get the results.
You don't give me enough code to write a specific solution for your problem, but let me give you an example:
SELECT
field1
, field2
, (SELECT t3.field3
FROM table2 t2
JOIN table3 t3 ON t2.id = t3.id
WHERE t4.somefield = t2.somefield)
FROM table1 t1
JOIN table4 t4 ON t1.id = t4.id;
SELECT
field1
, field2
, a.field3
FROM table1 t1
JOIN table4 t4
ON t1.id = t4.id
JOIN (SELECT t3.field3, t2.somefield
FROM table2 t2
JOIN table3 t3 ON t2.id = t3.id) a
ON t4.somefield = a.somefield
The first query runs one row at a time, which is extremely slow. The second should give the same results but runs in a set-based fashion, which is much faster. It is important to make sure the derived table has an alias. You could also use a CTE.
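Here is a minimal sketch of the two shapes side by side, run through Python's sqlite3. The table layouts and rows are hypothetical (field1 is placed in table1 and field2 in table4 purely for illustration), and the data is chosen so that every outer row has exactly one match; without that, a scalar subquery would return NULL where the join would drop the row. In this version the derived table exposes somefield under its alias a so that the outer ON clause can reference it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, field1 TEXT);
    CREATE TABLE table4 (id INTEGER, field2 TEXT, somefield TEXT);
    CREATE TABLE table2 (id INTEGER, somefield TEXT);
    CREATE TABLE table3 (id INTEGER, field3 TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO table4 VALUES (1, 'x', 'k1'), (2, 'y', 'k2');
    INSERT INTO table2 VALUES (10, 'k1'), (20, 'k2');
    INSERT INTO table3 VALUES (10, 'f10'), (20, 'f20');
""")

# Correlated scalar subquery in the SELECT list.
correlated = conn.execute("""
    SELECT t1.field1, t4.field2,
           (SELECT t3.field3
            FROM table2 t2
            JOIN table3 t3 ON t2.id = t3.id
            WHERE t4.somefield = t2.somefield) AS field3
    FROM table1 t1
    JOIN table4 t4 ON t1.id = t4.id
    ORDER BY t1.id
""").fetchall()

# Same result expressed as a join to an aliased derived table.
derived = conn.execute("""
    SELECT t1.field1, t4.field2, a.field3
    FROM table1 t1
    JOIN table4 t4 ON t1.id = t4.id
    JOIN (SELECT t3.field3, t2.somefield
          FROM table2 t2
          JOIN table3 t3 ON t2.id = t3.id) a
        ON t4.somefield = a.somefield
    ORDER BY t1.id
""").fetchall()

print(correlated)  # [('a', 'x', 'f10'), ('b', 'y', 'f20')]
print(derived)     # [('a', 'x', 'f10'), ('b', 'y', 'f20')]
```

This only confirms that the two forms return the same rows on this sample; how each one is executed, and how fast, depends on the database engine and its optimizer.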