Find deleted rows: Not EXISTS vs Not IN - sql

In my case, I have two tables with the same structure, TableA and TableB, and I am trying to find any records that exist in A but not in B.
My script was
SELECT * FROM TableA
WHERE NOT EXISTS (
SELECT * FROM TableB
)
Although there are 2 records that exist only in A and not in B, this script returns nothing. Then I changed it to the following:
SELECT ID FROM TableA
WHERE ID NOT IN (
SELECT ID FROM TableB
)
This script works and returns the IDs of the 2 records.
My question is: Is this behavior normal? What is the mechanism behind NOT EXISTS and NOT IN?
I have read some other posts comparing NOT EXISTS and NOT IN, and most people suggest using NOT EXISTS in 99.9% of scenarios. Does this case fall into the 0.1% where NOT EXISTS is not applicable? (I believe it is due to my incorrect usage, though; please correct me if that's the case.)

If you want to look at all the values in the rows, then use EXCEPT:
SELECT *
FROM TableA
EXCEPT
SELECT *
FROM TableB;
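To illustrate what "all the values in the rows" means, here is a minimal sketch run through Python's sqlite3 module, assuming SQLite's EXCEPT behaves like your database's. The name column and the sample rows are made up for the example. EXCEPT compares entire rows, so a row whose ID exists in both tables but whose other values differ is still reported, whereas a NOT EXISTS on ID alone skips it.

```python
import sqlite3

# Hypothetical data: ID 2 exists in both tables but with a different name,
# ID 3 exists only in TableA.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID INTEGER, name TEXT);
    CREATE TABLE TableB (ID INTEGER, name TEXT);
    INSERT INTO TableA VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO TableB VALUES (1, 'x'), (2, 'changed');
""")

# EXCEPT compares every column of the row.
except_rows = conn.execute(
    "SELECT * FROM TableA EXCEPT SELECT * FROM TableB ORDER BY ID"
).fetchall()

# NOT EXISTS correlated on ID only looks at the key.
not_exists_rows = conn.execute(
    "SELECT a.* FROM TableA a "
    "WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.ID = a.ID) "
    "ORDER BY a.ID"
).fetchall()

print(except_rows)      # [(2, 'y'), (3, 'z')]
print(not_exists_rows)  # [(3, 'z')]
```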
If you want to use NOT EXISTS correctly, then you need a correlation clause:
SELECT a.*
FROM TableA a
WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.id = a.id);
I strongly recommend using NOT EXISTS over NOT IN with a subquery. NOT IN will return no rows at all if b.id is ever NULL. That is usually not what is intended. NOT EXISTS matches the expected semantics.
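Here is a small sketch of the three cases, run through Python's sqlite3 module on the assumption that SQLite follows the same semantics as your database here. The sample IDs are hypothetical. Without a correlation clause, the subquery returns rows no matter which row of TableA is being tested, so NOT EXISTS is false for all of them.

```python
import sqlite3

# Hypothetical sample data: IDs 4 and 5 exist only in TableA.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID INTEGER);
    CREATE TABLE TableB (ID INTEGER);
    INSERT INTO TableA VALUES (1), (2), (3), (4), (5);
    INSERT INTO TableB VALUES (1), (2), (3);
""")

CORRELATED = (
    "SELECT a.ID FROM TableA a "
    "WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.ID = a.ID) "
    "ORDER BY a.ID"
)

# Uncorrelated: TableB is non-empty, so NOT EXISTS is false for every row.
uncorrelated = conn.execute(
    "SELECT ID FROM TableA WHERE NOT EXISTS (SELECT * FROM TableB)"
).fetchall()

# Correlated: the subquery is evaluated per outer row.
correlated = conn.execute(CORRELATED).fetchall()

# Add a single NULL to TableB.ID and compare NOT IN with NOT EXISTS.
conn.execute("INSERT INTO TableB VALUES (NULL)")
not_in_with_null = conn.execute(
    "SELECT ID FROM TableA WHERE ID NOT IN (SELECT ID FROM TableB)"
).fetchall()
correlated_with_null = conn.execute(CORRELATED).fetchall()

print(uncorrelated)          # []
print(correlated)            # [(4,), (5,)]
print(not_in_with_null)      # []
print(correlated_with_null)  # [(4,), (5,)]
```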

You need to be careful with the NOT IN expression.
The A NOT IN (B, C, D) expression basically means (A<>B AND A<>C AND A<>D). If any of the values is NULL, the expression can never be true: it evaluates to NULL (or to false if A equals one of the other values), so the row is filtered out either way.
So, applied to your example, the correct NOT IN expression should be (unless ID is a non-nullable column):
SELECT ID FROM TableA
WHERE ID NOT IN (
SELECT ID FROM TableB WHERE ID IS NOT NULL
)
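The three-valued logic can be checked directly. This is a sketch using Python's sqlite3 module, assuming SQLite evaluates NOT IN the way the SQL standard describes; the literal values and sample rows are only illustrative. SQLite reports NULL as None, true as 1 and false as 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Scalar checks of A NOT IN (...) with and without a NULL in the list.
no_match_with_null = conn.execute("SELECT 1 NOT IN (2, 3, NULL)").fetchone()[0]
match_with_null = conn.execute("SELECT 1 NOT IN (1, NULL)").fetchone()[0]
no_match_no_null = conn.execute("SELECT 1 NOT IN (2, 3)").fetchone()[0]

print(no_match_with_null)  # None (NULL: unknown)
print(match_with_null)     # 0 (false: 1 = 1 matched)
print(no_match_no_null)    # 1 (true)

# The same effect on tables; the rows are hypothetical.
conn.executescript("""
    CREATE TABLE TableA (ID INTEGER);
    CREATE TABLE TableB (ID INTEGER);
    INSERT INTO TableA VALUES (1), (2), (3);
    INSERT INTO TableB VALUES (1), (NULL);
""")
unfiltered = conn.execute(
    "SELECT ID FROM TableA WHERE ID NOT IN (SELECT ID FROM TableB)"
).fetchall()
filtered = conn.execute(
    "SELECT ID FROM TableA WHERE ID NOT IN "
    "(SELECT ID FROM TableB WHERE ID IS NOT NULL) ORDER BY ID"
).fetchall()

print(unfiltered)  # []
print(filtered)    # [(2,), (3,)]
```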

Related

Postgresql select from based on condition

How to run a given select statement based on condition?
If a condition (which comes from table_A) is true then select from table_B otherwise from table_C. Tables have no common column.
Something like this
select case when table_A.flag=true then
(select * from table_B )
else
(select * from table_C )
end
from table_A where ...
The above will fail, of course, with: more than one row returned by a subquery used as an expression
Since the columns are the same, you could use a UNION. Something like:
SELECT *
FROM Table_B
WHERE (SELECT flag FROM Table_A) = true
UNION ALL
SELECT *
FROM Table_C
WHERE (SELECT flag FROM Table_A) <> true
I'm assuming here that Table_A has only one row, but you could adjust the subquery in the WHERE conditions to get the flag however you need it.
The basic idea is that you set up the two conditions so that only one of them is true at a time (based on your flag). So, even though it is a UNION, only one part of the query will return results and you either end up with Table_B or Table_C.
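A rough sketch of the idea, run with Python's sqlite3 module rather than PostgreSQL. The val column and the rows are invented, and the flag is stored as the integer 1 or 0 because this sketch does not rely on a boolean type. One thing it makes visible: if the flag is NULL, neither condition is true and the query returns nothing.

```python
import sqlite3

# Hypothetical single-row Table_A holding the flag (1 = true, 0 = false).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_A (flag INTEGER);
    CREATE TABLE Table_B (val TEXT);
    CREATE TABLE Table_C (val TEXT);
    INSERT INTO Table_A VALUES (1);
    INSERT INTO Table_B VALUES ('b1'), ('b2');
    INSERT INTO Table_C VALUES ('c1');
""")

QUERY = """
    SELECT val FROM Table_B WHERE (SELECT flag FROM Table_A) = 1
    UNION ALL
    SELECT val FROM Table_C WHERE (SELECT flag FROM Table_A) <> 1
"""

when_true = sorted(conn.execute(QUERY).fetchall())

conn.execute("UPDATE Table_A SET flag = 0")
when_false = sorted(conn.execute(QUERY).fetchall())

# With a NULL flag both comparisons are unknown, so both branches are empty.
conn.execute("UPDATE Table_A SET flag = NULL")
when_null = conn.execute(QUERY).fetchall()

print(when_true)   # [('b1',), ('b2',)]
print(when_false)  # [('c1',)]
print(when_null)   # []
```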

(SQL) How to check if a value is in another table?

I'm not very good with SQL so I apologise.
I want to be able to go through each row on Table A and check if a specific value exists in an entire column in Table B.
I want to see all rows from table A where value is NOT in specific column in table B.
I hope that makes sense.
You can use not exists. Your question is a bit theoretical, but the logic would be:
select a.*
from tablea a
where not exists (select 1 from tableb b where b.col1 = a.col1)
Where values in tablea(col1) should correspond to values in tableb(col1).
It sounds like not exists:
select a.*
from a
where not exists (select 1 from b where b.col = a.col);

Select Table A minus Table B where condition consists of two columns

I have two tables. Table A and table B.
I would like to select everything from table A which is NOT in table B.
Sounds easy, but the catch is that I need to select based on two values (two columns):
revision AND casetype. Something like this.
select a.revision, a.casetype from A a
minus
select b.revision, b.casetype from B b;
The problem is I won't get back ID from table A.
Is it possible to select whole table A minus table B where conditions consist of two columns ? I would like to stick to SQL (no PL/SQL)
I also tried to write something like query below but I guess I can't do it since I need to check revision AND casetype altogether
select * from A a where a.casetype IN (select...) and a.revision IN (select...)
Any idea how to work around ? Thanks
Sure, I believe a basic not exists check should work.
select a.id, a.revision, a.casetype
from A a
where not exists (
select 1
from B
where revision = a.revision and casetype = a.casetype
);
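As a quick check of why the pair has to be tested together, here is a sketch using Python's sqlite3 module instead of Oracle, with invented rows. Testing each column against B separately looks for values that appear nowhere in B, which is a different question from whether the (revision, casetype) combination appears in B.

```python
import sqlite3

# Hypothetical data: only the pair (1, 'x') from A also exists in B.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, revision INTEGER, casetype TEXT);
    CREATE TABLE B (revision INTEGER, casetype TEXT);
    INSERT INTO A VALUES (1, 1, 'x'), (2, 1, 'y'), (3, 2, 'x');
    INSERT INTO B VALUES (1, 'x'), (2, 'y');
""")

# Pair-wise check: both columns must match in the same row of B.
pair_check = conn.execute("""
    SELECT a.id, a.revision, a.casetype
    FROM A a
    WHERE NOT EXISTS (
        SELECT 1 FROM B b
        WHERE b.revision = a.revision AND b.casetype = a.casetype
    )
    ORDER BY a.id
""").fetchall()

# Column-by-column check: every revision and casetype of A occurs
# somewhere in B, so nothing is returned.
per_column_check = conn.execute("""
    SELECT a.id FROM A a
    WHERE a.revision NOT IN (SELECT revision FROM B)
      AND a.casetype NOT IN (SELECT casetype FROM B)
""").fetchall()

print(pair_check)        # [(2, 1, 'y'), (3, 2, 'x')]
print(per_column_check)  # []
```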
Oracle supports tuples, so if you wanted you could do:
select a.*
from a
where (a.revision, a.casetype) in (select a.revision, a.casetype from A a
minus
select b.revision, b.casetype from B b
);
I would normally go for not exists, but this is the solution that builds on what you have already done.
except should work
select a.revision, a.casetype from A a
except
select b.revision, b.casetype from B b;

SQL 0 results for 'Not In' and 'In' when row does exist

I have a table (A) with a list of order numbers. It contains a single row.
Once this order has been processed it should be deleted. However, it is failing to be deleted.
I began investigating, a really simple query is performed for the deletion.
delete from table(A) where orderno not in (select distinct orderno from tableB)
The order number absolutely does not exist in tableB.
I changed the query in SSMS to :
select * from table(A) where orderno not in (select distinct orderno from tableB)
This returned 0 rows. Bear in mind the orderno does exist in tableA.
I then changed the query from "not in" to "In". It still returned 0 rows. How can this be possible that a value is not in a list of values but also not show for the opposite?
Things I have tried:
Two additional developers to look over it.
ltrim(rtrim()) on both the select values.
Various char casts and casting the number as an int.
Has anyone experienced this?
Don't use NOT IN with a subquery. Use NOT EXISTS instead:
delete from tableA
where not exists (select 1 from tableB where tableA.orderno = tableB.orderno);
What is the difference? If any orderno in TableB is NULL, then NOT IN returns NULL for every row without a match, so no row passes the filter. This is correct behavior based on how NULL is defined in SQL, but it is counterintuitive. NOT EXISTS does what you want.
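The symptom from the question (0 rows for both IN and NOT IN) can be reproduced with a sketch like this one. It uses Python's sqlite3 module rather than SQL Server, and the order numbers are made up; the only assumption that matters is that tableB contains a NULL orderno.

```python
import sqlite3

# Hypothetical data: order 1001 is not in tableB, and tableB holds a NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (orderno INTEGER);
    CREATE TABLE tableB (orderno INTEGER);
    INSERT INTO tableA VALUES (1001);
    INSERT INTO tableB VALUES (2002), (NULL);
""")

# IN finds no match, so the row is not returned.
in_rows = conn.execute(
    "SELECT * FROM tableA WHERE orderno IN (SELECT orderno FROM tableB)"
).fetchall()

# NOT IN evaluates to NULL because of the NULL in tableB, so again no row.
not_in_rows = conn.execute(
    "SELECT * FROM tableA WHERE orderno NOT IN (SELECT orderno FROM tableB)"
).fetchall()

# NOT EXISTS only asks whether a matching row exists.
not_exists_rows = conn.execute(
    "SELECT * FROM tableA WHERE NOT EXISTS "
    "(SELECT 1 FROM tableB WHERE tableA.orderno = tableB.orderno)"
).fetchall()

# The delete from the answer then removes the row.
deleted = conn.execute(
    "DELETE FROM tableA WHERE NOT EXISTS "
    "(SELECT 1 FROM tableB WHERE tableA.orderno = tableB.orderno)"
).rowcount
remaining = conn.execute("SELECT COUNT(*) FROM tableA").fetchone()[0]

print(in_rows, not_in_rows)  # [] []
print(not_exists_rows)       # [(1001,)]
print(deleted, remaining)    # 1 0
```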
You can use not exists
select *
from tableA a
where not exists (select 1 from tableB where orderno = a.orderno);
I have experienced the same.
Try joining the two tables, tableA and TableB:
select * from TableA a
inner join TableB b on a.orderno =b.orderno
This should allow you to get the records and then you can delete the same.

Why can I use a column from a different table in a subquery?

In this example, I feel like I shouldn't be able to make this mistake:
create table A (A_ID int);
create table B (B_ID int, OTHER_ID int);
insert into A values (123);
insert into B values (456, 123);
select * from A where A_ID in (select A_ID from B);
The correct query would be this:
select * from A where A_ID in (select OTHER_ID from B);
Since A_ID does not exist in table B, why doesn't the query throw an error, or at least fail?
Edit: Thanks for the replies! However, to be clear, my question isn't "what's the right way to do this?", I was just curious why this would work.
You should always use qualified column names when you write queries with more than one table. Your first query is interpreted as:
select a.*
from A a
where a.A_ID in (select a.A_ID from B b);
This is called a correlated subquery. They are allowed everywhere, except in the FROM clause.
You should be writing the query as:
select a.*
from A a
where a.A_ID in (select b.OTHER_ID from B b);
This prevents any errors. If you had qualified the column names originally, then your query would have (presumably) generated an error:
select a.*
from A a
where a.A_ID in (select b.A_ID from B b);
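A sketch of that difference, using Python's sqlite3 module on the assumption that SQLite resolves names the same way here. The unqualified query runs because A_ID is found in the outer query; the qualified one is rejected because B has no such column.

```python
import sqlite3

# Tables and rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table A (A_ID int);
    create table B (B_ID int, OTHER_ID int);
    insert into A values (123);
    insert into B values (456, 123);
""")

# Unqualified: A_ID inside the subquery resolves to the outer table A.
unqualified = conn.execute(
    "select * from A where A_ID in (select A_ID from B)"
).fetchall()

# Qualified with b.: the database must look in B, where A_ID is missing.
try:
    conn.execute(
        "select a.* from A a where a.A_ID in (select b.A_ID from B b)"
    )
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

print(unqualified)        # [(123,)]
print(error is not None)  # True
```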
The subqueries that you are talking about are called correlated subqueries: these are queries that run under the context of the main query, and hence these provide access to any field that is part of the main query.
It would not make much sense to disallow this use of the main query's fields, as SQL would otherwise lose a lot of its power.
You can find further information in the Oracle Help Center.
It works because of scope.
All columns of the outer query are in the scope (visible to) subqueries.
You don't need to qualify columns, for example A.A_ID in:
select * from A where A_ID in (select A.A_ID from B)
if there's no ambiguity in the narrowest scope in which the column is found. For example, if B had a column A_ID you wouldn't need to qualify it, but if there were multiple columns with the same name in outer queries, you would need to qualify it to disambiguate the reference.
For your query, this is how it's functioning. A_ID is working like a constant.
select * from dual
where 123 in (select 123 from dual);
123 doesn't exist in dual, but since a row exists in dual, you can select any value you like.
select * from dual;
Output -
Dummy
X
As per OP's query -
select * from A where A_ID in (select A_ID from B);
If the above query was
select * from A where A_ID in (select A_ID from B where 1=2);
It wouldn't return any records.
The scope of column names comes into play when the same column name appears in more than one table; such references need to be qualified with a table alias.
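The "constant" behaviour can be seen in a sketch like the following, run with Python's sqlite3 module instead of Oracle. The extra row 999 in A is a made-up addition to show that every row of A comes back, whether or not its value appears anywhere in B.

```python
import sqlite3

# Tables from the question, plus a hypothetical extra row 999 in A.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table A (A_ID int);
    create table B (B_ID int, OTHER_ID int);
    insert into A values (123), (999);
    insert into B values (456, 123);
""")

# A_ID in the subquery is the outer row's own value, so the test is
# effectively "A_ID in (A_ID)" as long as B has at least one row.
mistaken = conn.execute(
    "select * from A where A_ID in (select A_ID from B) order by A_ID"
).fetchall()

# With an empty subquery result there is nothing to match against.
empty_subquery = conn.execute(
    "select * from A where A_ID in (select A_ID from B where 1=2)"
).fetchall()

# The query that was actually intended.
intended = conn.execute(
    "select * from A where A_ID in (select OTHER_ID from B)"
).fetchall()

print(mistaken)        # [(123,), (999,)]
print(empty_subquery)  # []
print(intended)        # [(123,)]
```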