SQL beginner question: unexpected behavior with where exists select 1

I started using SQL a week ago. I am sorry, but I have a "why does my code not work" question.
Please look at the following three queries on table1 and table2.
A. Inner join (returned 2 rows)
select t1.*, t2.* from table1 t1, table2 t2
where t1.item = t2.item
and t1.something = t2.something
B. Subquery (returned 2 rows)
select t1.* from table1 t1
where exists (select 1 from table2 t2
where t1.item = t2.item
and t1.something = t2.something)
C. My code (I expected the same results as A, "Inner join", but it takes forever to return results)
select t1.*, t2.* from table1 t1, table2 t2
where exists (select 1 from table2 t2
where t1.item = t2.item
and t1.something = t2.something)
For reference, the row count of each table is:
select count(*) from table1 -- (100K)
select count(*) from table2 -- (10K)
Could somebody kindly explain why my code (C) does not work?
Thank you for your help in advance.

The problem with your (C) query is that the outer reference to table2 is completely unconstrained [1]. This means that you're effectively writing query B again but also cross joining that result to table2, so you get not 2 results but 2 × 10K = 20,000.
You should be using explicit join syntax. One of its advantages is that it forces you to think about the join conditions at the point of joining, rather than having to remember to include them in the general WHERE clause.
select t1.*, t2.*
from table1 t1
inner join table2 t2
on t1.item = t2.item
and t1.something = t2.something
It's an error to omit the ON clause. It's never an error to forget to constrain a column in the WHERE clause [2].
[1] Just because you refer to table2 again inside your EXISTS subquery, and even though you give it the same t2 alias, that doesn't mean the two are the same reference. The two references to table2 are entirely unrelated.
[2] Of course, it's often a logical error to do this, but what I mean in this paragraph is specifically about error messages that the system will raise.
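To see the cross join happen, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The data is hypothetical and scaled down (100 rows in table1, 10 in table2, 2 matching pairs), so query C returns 2 × 10 = 20 rows instead of 2 × 10K. SQLite resolves the inner t2 alias to the subquery's own table2, just as described in footnote [1].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (item INTEGER, something INTEGER);
CREATE TABLE table2 (item INTEGER, something INTEGER);
""")
# Hypothetical, scaled-down data: only items 98 and 99 exist in both tables.
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(i, i) for i in range(100)])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(i, i) for i in range(98, 108)])

def row_count(sql):
    return len(conn.execute(sql).fetchall())

query_a = """SELECT t1.*, t2.* FROM table1 t1, table2 t2
WHERE t1.item = t2.item AND t1.something = t2.something"""
query_b = """SELECT t1.* FROM table1 t1
WHERE EXISTS (SELECT 1 FROM table2 t2
              WHERE t1.item = t2.item AND t1.something = t2.something)"""
# Query C: the outer table2 is never constrained, so every row that passes
# the EXISTS test is paired with every row of table2.
query_c = """SELECT t1.*, t2.* FROM table1 t1, table2 t2
WHERE EXISTS (SELECT 1 FROM table2 t2
              WHERE t1.item = t2.item AND t1.something = t2.something)"""
query_join = """SELECT t1.*, t2.* FROM table1 t1
INNER JOIN table2 t2
  ON t1.item = t2.item AND t1.something = t2.something"""

counts = {name: row_count(sql) for name, sql in
          [("A", query_a), ("B", query_b), ("C", query_c), ("JOIN", query_join)]}
print(counts)  # {'A': 2, 'B': 2, 'C': 20, 'JOIN': 2}
```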

Related

How to fetch all records (all columns) from a table which are not present in two other tables

I want to fetch all records, with all columns, from a table whose records are not in the other 2 tables. Please help.
I have tried the query below; it works fine when comparing one column, but I want to compare 5 columns.
select * from A
WHERE NOT EXISTS
(select * from B b where b.id=a.id) AND
NOT EXISTS
(select * from C c where c.id=a.id)
A general solution might look like:
SELECT t1.*
FROM table1 t1
WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.t2_id) AND
NOT EXISTS (SELECT 1 FROM table3 t3 WHERE t3.id = t1.t3_id);
This assumes that you want to target table1 for records, ensuring that no matches can be found in table2 and table3.
I prefer this approach:
SELECT t1.*
FROM table1 AS t1
LEFT JOIN table2 AS t2
ON t1.id = t2.t1_id
LEFT JOIN table3 AS t3
ON t1.id = t3.t1_id
WHERE t2.id IS NULL
AND t3.id IS NULL;
While this might be a bit less intuitive than using subqueries, I think it makes alias mistakes less likely, such as the one in the example below, which happens more often than you might expect:
SELECT *
FROM table1
WHERE NOT EXISTS (SELECT 1 FROM table2 WHERE id = id) AND
NOT EXISTS (SELECT 1 FROM table3 WHERE id = id);
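A minimal sketch of why that example goes wrong, using Python's sqlite3 module as a stand-in database with hypothetical data. An unqualified id inside each subquery resolves to the subquery's own table, so id = id is true for every non-NULL row and NOT EXISTS filters out everything whenever table2 is non-empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER);
CREATE TABLE table2 (id INTEGER);
CREATE TABLE table3 (id INTEGER);
INSERT INTO table1 VALUES (1), (2), (3), (4), (5);
INSERT INTO table2 VALUES (1), (2);
INSERT INTO table3 VALUES (3);
""")

# Buggy: "id = id" compares table2.id (or table3.id) with itself,
# so the subqueries are not correlated with table1 at all.
buggy = conn.execute("""
SELECT * FROM table1
WHERE NOT EXISTS (SELECT 1 FROM table2 WHERE id = id)
  AND NOT EXISTS (SELECT 1 FROM table3 WHERE id = id)
""").fetchall()

# Fixed: qualify both sides so each subquery refers back to table1.
fixed = conn.execute("""
SELECT t1.* FROM table1 t1
WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.id)
  AND NOT EXISTS (SELECT 1 FROM table3 t3 WHERE t3.id = t1.id)
ORDER BY t1.id
""").fetchall()

print(buggy)  # []
print(fixed)  # [(4,), (5,)]
```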
As for your question about checking 5 columns, that can still be done with either of these methods by adding conditions either to the LEFT JOINs or to the WHERE clause of each subquery.
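A minimal sketch of the five-column version, again using Python's sqlite3 module as a stand-in database. The tables A, B, C and the columns c1 to c5 are hypothetical; both forms simply AND together one equality per column, and both return the A rows that have no full five-column match in B or C.

```python
import sqlite3

COLS = ["c1", "c2", "c3", "c4", "c5"]  # hypothetical column names

def match(left, right):
    """Build 'right.c1 = left.c1 AND ... AND right.c5 = left.c5'."""
    return " AND ".join(f"{right}.{c} = {left}.{c}" for c in COLS)

conn = sqlite3.connect(":memory:")
for table in ("A", "B", "C"):
    conn.execute(f"CREATE TABLE {table} ({', '.join(COLS)})")
conn.executemany("INSERT INTO A VALUES (?, ?, ?, ?, ?)",
                 [(n,) * 5 for n in (1, 2, 3, 4)])
conn.executemany("INSERT INTO B VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 1, 1), (2, 2, 2, 2, 9)])  # second row differs in c5
conn.execute("INSERT INTO C VALUES (3, 3, 3, 3, 3)")

not_exists_rows = conn.execute(f"""
SELECT a.* FROM A a
WHERE NOT EXISTS (SELECT 1 FROM B b WHERE {match('a', 'b')})
  AND NOT EXISTS (SELECT 1 FROM C c WHERE {match('a', 'c')})
ORDER BY a.c1
""").fetchall()

# LEFT JOIN anti-join: A rows without a match get NULLs on the B/C side.
# Testing b.c1 / c.c1 assumes c1 is never NULL in B or C.
left_join_rows = conn.execute(f"""
SELECT a.* FROM A a
LEFT JOIN B b ON {match('a', 'b')}
LEFT JOIN C c ON {match('a', 'c')}
WHERE b.c1 IS NULL AND c.c1 IS NULL
ORDER BY a.c1
""").fetchall()

print(not_exists_rows)  # [(2, 2, 2, 2, 2), (4, 4, 4, 4, 4)]
print(left_join_rows == not_exists_rows)  # True
```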

Adding rows of one table to rows of another table where two tables are matched by ID

In an Access 2013 database, I have a table t1 and another table t2. They both have the same number of columns, and the column names are also the same. Some ID values in t2 overlap with ID values in t1. I am trying to make a new table t3 containing all the rows of t1 plus only those rows of t2 whose ID is not present in t1. I used something like
Create Table t3 As Select * From (Select t1.* From t1 Inner Join t2 on t1.ID_Number = t2. ID_Number)
This throws a syntax error. However, even if it worked, it would select only the rows whose ID_Number matches in both tables. I have tried various other queries and browsed many relevant Stack Overflow posts but could not resolve it.
Try this:
SELECT t1.*
INTO t3
FROM t1
INNER JOIN t2
ON t1.ID_Number = t2.ID_Number
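I am not sure about Access syntax, but could this 2-step solution work? The first step creates t3 from all of t1; the second appends the t2 rows whose ID_Number is not in t1. Because SELECT INTO always creates a new table, the second step has to be an INSERT INTO:
select t1.* into t3 from t1
insert into t3 select t2.* from t2 where t2.ID_Number not in (select t1.ID_Number from t1)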
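A minimal sketch of the two steps, using Python's sqlite3 module as a stand-in because Access cannot be scripted here. SQLite has no SELECT INTO, so step 1 uses CREATE TABLE ... AS instead (the Access equivalent is noted in a comment); the name column and all data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (ID_Number INTEGER, name TEXT);
CREATE TABLE t2 (ID_Number INTEGER, name TEXT);
INSERT INTO t1 VALUES (1, 't1-a'), (2, 't1-b'), (3, 't1-c');
INSERT INTO t2 VALUES (2, 't2-b'), (3, 't2-c'), (4, 't2-d'), (5, 't2-e');
""")

# Step 1: copy every row of t1 into a new table t3.
# (Access would use: SELECT t1.* INTO t3 FROM t1)
conn.execute("CREATE TABLE t3 AS SELECT * FROM t1")

# Step 2: append only the t2 rows whose ID_Number does not occur in t1.
# The IS NOT NULL filter guards NOT IN: a single NULL in the subquery
# would make NOT IN return no rows at all.
conn.execute("""
INSERT INTO t3
SELECT t2.* FROM t2
WHERE t2.ID_Number NOT IN (SELECT t1.ID_Number FROM t1
                           WHERE t1.ID_Number IS NOT NULL)
""")

rows = conn.execute("SELECT * FROM t3 ORDER BY ID_Number").fetchall()
print(rows)  # [(1, 't1-a'), (2, 't1-b'), (3, 't1-c'), (4, 't2-d'), (5, 't2-e')]
```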

Performance of two left joins versus union

I have searched but have not found a definitive answer. Which of these is better for performance in SQL Server:
SELECT T.*
FROM dbo.Table1 T
LEFT JOIN Table2 T2 ON T.ID = T2.Table1ID
LEFT JOIN Table3 T3 ON T.ID = T3.Table1ID
WHERE T2.Table1ID IS NOT NULL
OR T3.Table1ID IS NOT NULL
or...
SELECT T.*
FROM dbo.Table1 T
JOIN Table2 T2 ON T.ID = T2.Table1ID
UNION
SELECT T.*
FROM dbo.Table1 T
JOIN Table3 T3 ON T.ID = T3.Table1ID
I have tried running both but it's hard to tell for sure. I'd appreciate an explanation of why one is faster than the other, or if it depends on the situation.
Your two queries do not do the same thing. In particular, the first will return duplicate rows if values are duplicated in either table.
If you are looking for rows in Table1 that are in either of the other two tables, I would suggest using exists:
select t1.*
from Table1 t1
where exists (select 1 from Table2 t2 where t2.Table1Id = t1.id) or
exists (select 1 from Table3 t3 where t3.Table1Id = t1.id);
And, create indexes on Table1Id in both Table2 and Table3.
Which of your original queries is faster depends a lot on the data. The second has an extra step to remove duplicates (UNION versus UNION ALL). On the other hand, the first might end up creating many duplicate rows.
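A small sketch of the duplicate-row difference, using Python's sqlite3 module as a stand-in for SQL Server (so there is no dbo schema). The data is hypothetical: Table1 ID 1 appears twice in Table2, which makes the LEFT JOIN version return it twice, while the UNION and EXISTS versions return it once. The indexes mirror the advice above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Table2 (Table1ID INTEGER);
CREATE TABLE Table3 (Table1ID INTEGER);
CREATE INDEX ix_table2_table1id ON Table2 (Table1ID);
CREATE INDEX ix_table3_table1id ON Table3 (Table1ID);
INSERT INTO Table1 VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO Table2 VALUES (1), (1), (2);  -- ID 1 appears twice
INSERT INTO Table3 VALUES (2), (3);
""")

left_joins = sorted(conn.execute("""
SELECT T.* FROM Table1 T
LEFT JOIN Table2 T2 ON T.ID = T2.Table1ID
LEFT JOIN Table3 T3 ON T.ID = T3.Table1ID
WHERE T2.Table1ID IS NOT NULL OR T3.Table1ID IS NOT NULL
""").fetchall())

union = sorted(conn.execute("""
SELECT T.* FROM Table1 T JOIN Table2 T2 ON T.ID = T2.Table1ID
UNION
SELECT T.* FROM Table1 T JOIN Table3 T3 ON T.ID = T3.Table1ID
""").fetchall())

exists = sorted(conn.execute("""
SELECT t1.* FROM Table1 t1
WHERE EXISTS (SELECT 1 FROM Table2 t2 WHERE t2.Table1ID = t1.ID)
   OR EXISTS (SELECT 1 FROM Table3 t3 WHERE t3.Table1ID = t1.ID)
""").fetchall())

print(left_joins)  # [(1, 'a'), (1, 'a'), (2, 'b'), (3, 'c')]
print(union)       # [(1, 'a'), (2, 'b'), (3, 'c')]
print(exists)      # [(1, 'a'), (2, 'b'), (3, 'c')]
```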

Joining selected column to a table

I am trying to run this query, and it takes a long time because of the join I am using:
SELECT T1.Id,T2.T2Id,T2.Col2
FROM Table1 T1
LEFT OUTER JOIN (SELECT TOP 1 Id, TT.T2Id,TT.Col2
FROM Table2 TT
WHERE TT.TypeId=3
ORDER BY TT.OrderId
) AS T2 ON T2.Id = T1.Id
The thing is, it doesn't let me write something like TT.Id = T1.Id inside the derived table.
Is there any other way I can do this?
Try it with OUTER APPLY, which lets the inner query reference T1:
SELECT T1.Id, T2.T2Id, T2.Col2
FROM Table1 T1
OUTER APPLY (SELECT TOP 1 TT.T2Id, TT.Col2
FROM Table2 TT
WHERE TT.TypeId = 3 AND TT.Id = T1.Id
ORDER BY TT.OrderId) T2
SELECT T1.Id, T2.T2Id, T2.Col2
FROM Table1 T1
OUTER APPLY (SELECT TOP 1 TT.T2Id, TT.Col2
FROM Table2 TT
WHERE TT.TypeId = 3 AND T1.Id = TT.Id
ORDER BY TT.T2Id DESC) T2
I would use OUTER APPLY with T1.Id = TT.Id in the WHERE condition, since T1 is the parent table, and add an ORDER BY if you need to control which row TOP 1 returns.
Well, first of all, your derived table will produce non-deterministic results, as the TOP 1 row you return may differ each time you run it, even if the data in the table remains the same. You could put an ORDER BY clause in the derived table to prevent that.
Is there an index on Table1.id? What exactly are you trying to achieve, though? Is it to return all rows from Table1, each with just one of the many rows from Table2 that have the same ID?
If so, I would look into using CROSS APPLY instead, or in this case maybe OUTER APPLY. If I get a chance later I'll write up an example, but in the meantime just Google OUTER APPLY for SQL Server.
Dan
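SQLite has neither TOP nor OUTER APPLY, so this sketch (Python's sqlite3 module, hypothetical data) swaps in a different technique with the same effect: a LEFT JOIN whose ON clause picks one Table2 row per Table1 row through a correlated subquery on SQLite's rowid. It also shows the original derived-table problem: LIMIT 1 (like TOP 1) is applied once to the whole of Table2, so at most one Table1 row can get a match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Id INTEGER);
CREATE TABLE Table2 (Id INTEGER, T2Id INTEGER, Col2 TEXT, TypeId INTEGER, OrderId INTEGER);
INSERT INTO Table1 VALUES (1), (2), (3);
INSERT INTO Table2 VALUES
  (1, 10, 'a', 3, 2),
  (1, 11, 'b', 3, 1),
  (2, 20, 'c', 3, 5),
  (2, 21, 'd', 4, 0),
  (3, 30, 'e', 4, 1);
""")

# Original shape: the derived table is evaluated once, not per Table1 row.
derived = conn.execute("""
SELECT T1.Id, T2.T2Id, T2.Col2
FROM Table1 T1
LEFT JOIN (SELECT Id, T2Id, Col2 FROM Table2
           WHERE TypeId = 3 ORDER BY OrderId LIMIT 1) AS T2
  ON T2.Id = T1.Id
ORDER BY T1.Id
""").fetchall()

# APPLY-like: for each Table1 row, pick the TypeId = 3 row with the lowest
# OrderId; if there is none, the subquery yields NULL and the LEFT JOIN keeps T1.
per_row = conn.execute("""
SELECT T1.Id, TT.T2Id, TT.Col2
FROM Table1 T1
LEFT JOIN Table2 TT
  ON TT.rowid = (SELECT x.rowid FROM Table2 x
                 WHERE x.TypeId = 3 AND x.Id = T1.Id
                 ORDER BY x.OrderId LIMIT 1)
ORDER BY T1.Id
""").fetchall()

print(derived)  # [(1, 11, 'b'), (2, None, None), (3, None, None)]
print(per_row)  # [(1, 11, 'b'), (2, 20, 'c'), (3, None, None)]
```

In SQL Server itself, the OUTER APPLY queries above are the more direct way to express this.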

Rewrite SQL code SELECT block to simplify logic

I am trying to rewrite this block with simpler logic, if that can be done. I am using it within a larger SELECT statement, and I think that if I can simplify this block, I might be able to improve the performance of my query.
proj_catg_type_id, proj_catg_id and proj_id are all PKs in their tables.
select t1.proj_catg_name
from table1 t1, table2 t2, table3 t3
where t2.proj_catg_type_id = t1.proj_catg_type_id
and t2.proj_catg_type_id = 213
and t3.proj_id = t2.proj_id
Without knowing the referential integrity rules and the logic behind the tables, it is difficult to give a 100% correct answer. But just looking at this statement, the most simplified logic would be
select t1.proj_catg_name
from table1 t1
where t1.proj_catg_type_id = 213;
select t1.proj_catg_name
from table1 t1 inner join table2 t2
on t2.proj_catg_type_id=t1.proj_catg_type_id
where t2.proj_catg_type_id=213
and t3.proj_id=t2.proj_id
Maybe? Is t3 used outside this subselect?
If t3 is a table outside the SELECT you showed, then this is a correlated subquery, which you should not be using at all, ever! That turns your query into a row-by-agonizing-row cursor.
Use derived tables or joins to get the results.
You don't give me enough code to write a specific solution for your problem, but let me give you an example:
SELECT
field1
, field2
, (SELECT t3.field3
FROM table2 t2
JOIN table3 t3 ON t2.id = t3.id
WHERE t4.somefield = t2.somefield)
FROM table1 t1
JOIN table4 t4 ON t1.id = t4.id
SELECT
field1
, field2
, a.field3
FROM table1 t1
JOIN table4 t4
ON t1.id = t4.id
JOIN (SELECT t2.somefield, t3.field3
FROM table2 t2
JOIN table3 t3 ON t2.id = t3.id) a
ON t4.somefield = a.somefield
The first query runs one row at a time, which is extremely slow. The second should give the same results (as long as each t4.somefield matches exactly one row of the derived table; the inner JOIN drops rows with no match) but runs in a set-based fashion, which is much faster. It is important to make sure the derived table has an alias and exposes the columns you join on and select. You could also use a CTE.
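A minimal sketch comparing the two shapes, using Python's sqlite3 module as a stand-in database with hypothetical data. To mirror the scalar subquery, which returns NULL when nothing matches, this version uses a LEFT JOIN to the derived table rather than the inner JOIN above; each somefield matches at most one row, so the results are identical. (Note that SQLite silently takes the first row when a scalar subquery returns several, whereas SQL Server raises an error.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, field1 TEXT, field2 TEXT);
CREATE TABLE table4 (id INTEGER, somefield TEXT);
CREATE TABLE table2 (id INTEGER, somefield TEXT);
CREATE TABLE table3 (id INTEGER, field3 TEXT);
INSERT INTO table1 VALUES (1, 'x1', 'y1'), (2, 'x2', 'y2'), (3, 'x3', 'y3');
INSERT INTO table4 VALUES (1, 'k1'), (2, 'k2'), (3, 'k9');
INSERT INTO table2 VALUES (10, 'k1'), (20, 'k2');
INSERT INTO table3 VALUES (10, 'F10'), (20, 'F20');
""")

# Correlated scalar subquery in the SELECT list (row-by-row shape).
correlated = conn.execute("""
SELECT t1.field1, t1.field2,
       (SELECT t3.field3
        FROM table2 t2 JOIN table3 t3 ON t2.id = t3.id
        WHERE t4.somefield = t2.somefield) AS field3
FROM table1 t1
JOIN table4 t4 ON t1.id = t4.id
ORDER BY t1.id
""").fetchall()

# Set-based shape: the derived table "a" exposes somefield for the join.
derived = conn.execute("""
SELECT t1.field1, t1.field2, a.field3
FROM table1 t1
JOIN table4 t4 ON t1.id = t4.id
LEFT JOIN (SELECT t2.somefield, t3.field3
           FROM table2 t2 JOIN table3 t3 ON t2.id = t3.id) a
  ON t4.somefield = a.somefield
ORDER BY t1.id
""").fetchall()

print(correlated)  # [('x1', 'y1', 'F10'), ('x2', 'y2', 'F20'), ('x3', 'y3', None)]
print(derived == correlated)  # True
```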