Select Table A minus Table B where condition consists of two columns - sql

I have two tables. Table A and table B.
I would like to select everything from table A which is NOT in table B.
Sounds easy, but the catch is that I need to select based on two values (two columns):
revision AND casetype. Something like this:
select a.revision, a.casetype from A a
minus
select b.revision, b.casetype from B b;
The problem is I won't get back ID from table A.
Is it possible to select whole rows of table A minus table B where the condition consists of two columns? I would like to stick to SQL (no PL/SQL).
I also tried to write something like the query below, but I guess I can't do it this way since I need to check revision AND casetype together, as a pair:
select * from A a where a.casetype IN (select...) and a.revision IN (select...)
Any idea how to work around this? Thanks

Sure, I believe a basic not exists check should work.
select a.id, a.revision, a.casetype
from A a
where not exists (
select 1
from B
where revision = a.revision and casetype = a.casetype
);
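To see why the pair has to be checked together, here is a minimal sketch using Python's built-in sqlite3 module rather than Oracle. The sample rows are made up for illustration; the NOT EXISTS query itself is the one above.

```python
import sqlite3

# In-memory database with made-up sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, revision INTEGER, casetype TEXT);
    CREATE TABLE B (revision INTEGER, casetype TEXT);
    INSERT INTO A VALUES (1, 1, 'x'), (2, 1, 'y'), (3, 2, 'x'), (4, 2, 'y');
    INSERT INTO B VALUES (1, 'x'), (2, 'y');
""")

# Pairwise check: a row of A is dropped only if B holds the same
# (revision, casetype) combination.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT a.id
    FROM A a
    WHERE NOT EXISTS (
        SELECT 1
        FROM B b
        WHERE b.revision = a.revision AND b.casetype = a.casetype
    )
    ORDER BY a.id
""")]

# Column-by-column check: each value is looked up on its own, so the
# pair (1, 'y') counts as present although B never contains that pair.
separate_ids = [row[0] for row in conn.execute("""
    SELECT a.id
    FROM A a
    WHERE NOT (a.revision IN (SELECT revision FROM B)
               AND a.casetype IN (SELECT casetype FROM B))
    ORDER BY a.id
""")]

print(not_exists_ids)  # [2, 3]
print(separate_ids)    # []
```

With this data the column-by-column version finds nothing, because every revision and every casetype of A appears somewhere in B, just not in the same row.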

Oracle supports tuples, so if you wanted you could do:
select a.*
from a
where (a.revision, a.casetype) in (select a.revision, a.casetype from A a
minus
select b.revision, b.casetype from B b
);
I would normally go for not exists, but this is the solution that builds on what you have already done.
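A rough sketch of the tuple version, again with Python's sqlite3 instead of Oracle. Assumptions: SQLite spells MINUS as EXCEPT and needs version 3.15 or later for row values, and the sample rows are invented.

```python
import sqlite3

# Invented sample data; ids 2 and 3 share the same (revision, casetype) pair.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, revision INTEGER, casetype TEXT);
    CREATE TABLE B (revision INTEGER, casetype TEXT);
    INSERT INTO A VALUES (1, 1, 'x'), (2, 1, 'y'), (3, 1, 'y'), (4, 2, 'x');
    INSERT INTO B VALUES (1, 'x');
""")

# The set difference yields the distinct pairs missing from B; the outer
# query then fetches every full row of A that carries one of those pairs.
rows = conn.execute("""
    SELECT a.id, a.revision, a.casetype
    FROM A a
    WHERE (a.revision, a.casetype) IN (
        SELECT revision, casetype FROM A
        EXCEPT
        SELECT revision, casetype FROM B
    )
    ORDER BY a.id
""").fetchall()

print(rows)  # [(2, 1, 'y'), (3, 1, 'y'), (4, 2, 'x')]
```

Note that both rows with the pair (1, 'y') come back, so the ids are preserved even though the set operation itself removes duplicates.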

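EXCEPT, the standard SQL counterpart of Oracle's MINUS, should work too, although like the MINUS query it returns only the two compared columns: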
select a.revision, a.casetype from A a
except
select b.revision, b.casetype from B b;

Related

Why can I use a column from a different table in a subquery?

In this example, I feel like I shouldn't be able to make this mistake:
create table A (A_ID int);
create table B (B_ID int, OTHER_ID int);
insert into A values (123);
insert into B values (456, 123);
select * from A where A_ID in (select A_ID from B);
The correct query would be this:
select * from A where A_ID in (select OTHER_ID from B);
Since A_ID does not exist in table B, why doesn't the query throw an error, or at least fail?
Edit: Thanks for the replies! However, to be clear, my question isn't "what's the right way to do this?", I was just curious why this would work.
You should always use qualified column names when you write queries with more than one table. Your first query is interpreted as:
select a.*
from A a
where a.A_ID in (select a.A_ID from B b);
This is called a correlated subquery. Correlated subqueries are allowed almost everywhere; in the FROM clause they require a lateral join.
You should be writing the query as:
select a.*
from A a
where a.A_ID in (select b.OTHER_ID from B b);
This prevents any errors. If you had qualified the column names originally, then your query would have (presumably) generated an error:
select a.*
from A a
where a.A_ID in (select b.A_ID from B b);
The subqueries that you are talking about are called correlated subqueries: they run in the context of the main query, and hence they have access to any field that is part of the main query.
It would not make much sense to forbid this use of the main query's fields, as SQL would lose a lot of its power otherwise.
You can find further information in the Oracle Help Center.
It works because of scope.
All columns of the outer query are in the scope (visible to) subqueries.
You don't need to qualify columns, for example A.A_ID in:
select * from A where A_ID in (select A.A_ID from B)
if there's no ambiguity in the narrowest scope in which the column is found. For example, if B had a column A_ID you wouldn't need to qualify it, but if there were multiple columns in outer queries called A_ID, you would need to qualify it to disambiguate the reference.
For your query, this is how it's functioning: A_ID is working like a constant.
select * from dual
where 123 in (select 123 from dual);
123 doesn't exist in dual, but since a row exists in dual, you can select any value you like.
select * from dual;
Output -
Dummy
X
As per OP's query -
select * from A where A_ID in (select A_ID from B);
If the above query was
select * from A where A_ID in (select A_ID from B where 1=2);
It wouldn't return any records.
The scope of column names comes into the picture when the same column name exists in more than one table; such columns must be qualified with a table alias.
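The scoping behaviour described in these answers can be reproduced with a small sketch in Python's sqlite3 (an assumption: SQLite stands in for Oracle here, and the extra row 999 is added only to make the effect visible).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (A_ID INT);
    CREATE TABLE B (B_ID INT, OTHER_ID INT);
    INSERT INTO A VALUES (123), (999);
    INSERT INTO B VALUES (456, 123);
""")

# Unqualified A_ID is not found in B, so it resolves to the outer table A.
# The subquery then returns the current row's own A_ID once per row of B,
# which makes the IN test true for every row of A.
unqualified = conn.execute(
    "SELECT A_ID FROM A WHERE A_ID IN (SELECT A_ID FROM B) ORDER BY A_ID"
).fetchall()

# The intended query compares against B.OTHER_ID.
intended = conn.execute(
    "SELECT A_ID FROM A WHERE A_ID IN (SELECT OTHER_ID FROM B)"
).fetchall()

# Qualifying the column with B's alias turns the silent mistake into an error.
try:
    conn.execute("SELECT a.A_ID FROM A a WHERE a.A_ID IN (SELECT b.A_ID FROM B b)")
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)

print(unqualified)      # [(123,), (999,)]
print(intended)         # [(123,)]
print(qualified_error)
```

The unqualified query returns 999 as well, even though 999 appears nowhere in B, which is exactly the "A_ID works like a constant" effect.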

Find deleted rows: Not EXISTS vs Not IN

In my case, I have two tables with the same structure, TableA and TableB, and what I was trying to do is find any records that exist in A but not in B.
My script was:
SELECT * FROM TableA
WHERE NOT EXISTS (
SELECT * FROM TableB
)
Although there are 2 records which exist only in A and not in B, this script returns nothing. Then I changed it to the following:
SELECT ID FROM TableA
WHERE ID NOT IN (
SELECT ID FROM TableB
)
This script works and returns the IDs of the 2 records.
My question is: is this behavior normal? What is the mechanism behind NOT EXISTS and NOT IN?
I have read some other posts comparing NOT EXISTS and NOT IN, and most people suggest using NOT EXISTS in 99.9% of scenarios. Does this case fall into the 0.1% where NOT EXISTS is not applicable? (I believe it's due to my incorrect usage, though; please correct me if that's the case.)
If you want to look at all the values in the rows, then use EXCEPT:
SELECT *
FROM TableA
EXCEPT
SELECT *
FROM TableB;
If you want to use NOT EXISTS correctly, then you need a correlation clause:
SELECT a.*
FROM TableA a
WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.id = a.id);
I strongly recommend using NOT EXISTS over NOT IN with a subquery. NOT IN will return no rows at all if b.id is ever NULL. That is usually not what is intended. NOT EXISTS matches the expected semantics.
You need to be careful with the NOT IN expression.
The A NOT IN (B, C, D) expression basically means (A<>B AND A<>C AND A<>D). If any of the values is NULL, the expression can never be TRUE: it is FALSE when A equals one of the other values and NULL otherwise, so the row is filtered out either way.
So, applied to your example, the correct NOT IN expression should be (unless ID is a non-nullable column):
SELECT ID FROM TableA
WHERE ID NOT IN (
SELECT ID FROM TableB WHERE ID IS NOT NULL
)
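The four variants discussed in this thread can be compared side by side. This is a sketch with Python's sqlite3 and invented data; TableB deliberately contains a NULL ID to trigger the NOT IN pitfall.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID INTEGER);
    CREATE TABLE TableB (ID INTEGER);
    INSERT INTO TableA VALUES (1), (2), (3);
    INSERT INTO TableB VALUES (1), (NULL);
""")

def ids(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# No correlation clause: the subquery returns rows whenever TableB is
# non-empty, so NOT EXISTS is false for every row of TableA.
uncorrelated = ids(
    "SELECT ID FROM TableA WHERE NOT EXISTS (SELECT * FROM TableB)")

# Correlated NOT EXISTS: checked per row of TableA, unaffected by the NULL.
correlated = ids("""
    SELECT a.ID FROM TableA a
    WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.ID = a.ID)
    ORDER BY a.ID""")

# NOT IN against a list containing NULL never evaluates to true.
not_in = ids(
    "SELECT ID FROM TableA WHERE ID NOT IN (SELECT ID FROM TableB)")

# Filtering the NULLs out restores the expected result.
not_in_filtered = ids("""
    SELECT ID FROM TableA
    WHERE ID NOT IN (SELECT ID FROM TableB WHERE ID IS NOT NULL)
    ORDER BY ID""")

print(uncorrelated)     # []
print(correlated)       # [2, 3]
print(not_in)           # []
print(not_in_filtered)  # [2, 3]
```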

Returning only duplicate rows from two tables

Every thread I've seen so far has been to check for duplicate rows and avoiding them. I'm trying to get a query to only return the duplicate rows. I thought it would be as simple as a subquery, but I was wrong. Then I tried the following:
SELECT * FROM a
WHERE EXISTS
(
SELECT * FROM b
WHERE b.id = a.id
)
Was a bust too. How do I return only the duplicate rows? I'm currently going through two tables, but I'm afraid there are a large amount of duplicates.
Use this query; it may be better to compare only the relevant columns instead of *.
SELECT * FROM a
INTERSECT
SELECT * FROM b
I am sure your posted code would work too, like:
SELECT * FROM a
WHERE EXISTS
(
SELECT 1 FROM b WHERE id = a.id
)
You can also do an INNER JOIN like:
SELECT a.* FROM a
JOIN b on a.id = b.id;
You can also use the IN operator:
SELECT * FROM a where id in (select id from b);
If none of these fit, you can use UNION ALL (if both tables satisfy the union restrictions) along with the ROW_NUMBER() function. Assuming id is unique within each table, the second row for an id marks a duplicate:
SELECT * FROM (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY id) AS rn
FROM (
select * from a
union all
select * from b) xx ) yy
WHERE rn = 2;
Note: there's an ambiguity as to what you mean by a duplicate row, and whether you're talking about duplicate keys, or all fields being the same. My answer deals with all fields being the same; some of the others are assuming it's just the keys. It's unclear which you intend.
You might try
SELECT a.id, a.col1, a.col2 FROM a INNER JOIN b ON a.id = b.id
WHERE a.col1 = b.col1 AND a.col2 = b.col2
adding in other columns as necessary. The database engine should be intelligent enough to do the comparisons on the indexed columns first, so it'll be efficient as long as you don't have rows that are different only on lots of non-indexed fields. (If you do, then I don't think anything will do it particularly efficiently.)
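The difference between "duplicate key" and "duplicate row" is easy to see on a tiny example. A sketch with Python's sqlite3 and invented data, where id 2 exists in both tables but with different col1 values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, col1 TEXT);
    CREATE TABLE b (id INTEGER, col1 TEXT);
    INSERT INTO a VALUES (1, 'same'), (2, 'left'), (3, 'only a');
    INSERT INTO b VALUES (1, 'same'), (2, 'right'), (4, 'only b');
""")

# INTERSECT compares whole rows, so only fully identical rows survive.
whole_row = conn.execute(
    "SELECT * FROM a INTERSECT SELECT * FROM b").fetchall()

# Joining on the key alone also reports id 2, whose other column differs.
key_only = conn.execute("""
    SELECT a.id, a.col1
    FROM a JOIN b ON a.id = b.id
    ORDER BY a.id""").fetchall()

print(whole_row)  # [(1, 'same')]
print(key_only)   # [(1, 'same'), (2, 'left')]
```

Which of the two results counts as "the duplicates" depends on the definition the asker has in mind.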

Postgis/SQL Select tuples such that the first tuple item is unique and the items geometries intersect

This question is particularly for Postgres 9.4
Lets say I have two tables:
CREATE TABLE A(id INT);
CREATE TABLE B(id INT);
I'd like to have all tuples (A, B) with a certain condition such that
among selected tuples all have different A column:
SELECT DISTINCT ON (A.id) A.id, B.id FROM A, B WHERE condition(A,B);
However, DISTINCT ON will perform sorting in memory after all the tuples have been selected, and I would like to not select tuples with a duplicate A.id at all.
How can this be done in an efficient way?
EDIT:
both A and B have unique ids
EDIT2:
Here is the complete setup:
CREATE EXTENSION postgis;
DROP TABLE A;
DROP TABLE B;
CREATE TABLE A(shape Geometry, id INT);
CREATE TABLE B(shape Geometry, id INT, kind INT);
CREATE INDEX ON A USING GIST (shape);
I would like to do the following:
SELECT A.id, B.id FROM A, B
WHERE B.id = (SELECT B.id FROM B WHERE
ST_Intersects(A.shape, B.shape)
AND ST_Length(ST_Intersection(A.shape, B.shape)) / ST_Length(A.shape) >= 0.5 AND B.kind != 1 LIMIT 1)
which works (I believe), but is not necessarily the most efficient way. Table A has orders of magnitude more rows than table B, so I am not even sure the GiST index is on the right table.
I am also aware that the order of arguments in ST_Intersects can have a significant effect on run time. What should the correct order be?
If you want just one row for each "A", you can use a correlated subquery (or lateral join):
select a.id,
(select b.id
from b
where condition(a, b)
limit 1
) as b_id
from a;
This should stop testing for rows from b when the first one is found -- which I imagine is the best approach performance-wise.
If none are found, you will get a NULL value. You can wrap this in a subquery and filter out NULLs.
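A minimal sketch of this pattern, including the NULL filter, using Python's sqlite3. Everything specific here is an assumption for illustration: the val, lo and hi columns and the range test stand in for condition(a, b), since PostGIS functions are not available in SQLite. An ORDER BY is added before LIMIT 1 only to make the picked row deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, val INTEGER);
    CREATE TABLE B (id INTEGER, lo INTEGER, hi INTEGER);
    INSERT INTO A VALUES (1, 5), (2, 15), (3, 99);
    INSERT INTO B VALUES (10, 0, 10), (11, 0, 20), (12, 10, 20);
""")

# For each row of A, the scalar subquery picks at most one matching row
# of B. The outer query drops rows of A that had no match (b_id is NULL).
pairs = conn.execute("""
    SELECT a_id, b_id
    FROM (
        SELECT a.id AS a_id,
               (SELECT b.id
                FROM B b
                WHERE a.val BETWEEN b.lo AND b.hi  -- hypothetical condition(a, b)
                ORDER BY b.id
                LIMIT 1) AS b_id
        FROM A a
    ) AS t
    WHERE b_id IS NOT NULL
    ORDER BY a_id
""").fetchall()

print(pairs)  # [(1, 10), (2, 11)]
```

Each A.id appears at most once by construction, so no DISTINCT step is needed afterwards.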
Try something like:
WITH distinct_a AS (
SELECT DISTINCT a.id
FROM A a)
SELECT da.id, B.id
FROM distinct_a da, B
WHERE condition(da, B)
The CTE (WITH ...) will select all distinct values first. Then selected values will be used in the next query.

Join SQL query to get data from two tables

I'm a newbie, just learning SQL and have this question: I have two tables with the same columns. Some registers are in the two tables but others only are in one of the tables. To illustrate, suppose table A = (1,2,3,4), table B=(3,4,5,6), numbers are registers. I need to select all registers in table B if they are not in table A, that is result=(5,6). What query should I use? Maybe a join. Thanks.
You can either use a NOT IN query like this:
SELECT col FROM B WHERE col NOT IN (SELECT col FROM A)
or use an outer join:
select B.col
from B LEFT OUTER JOIN A on A.col=B.col
where A.col is NULL
The first is easier to understand, but the second is easier to use with more tables in the query.
Select register from TABLE_B b
Where not exists (Select register from TABLE_A a where a.register = b.register)
I assumed you have a column named register in TABLE_A and TABLE_B
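For the example in the question (A = 1,2,3,4 and B = 3,4,5,6), the three formulations can be checked against each other. A sketch with Python's sqlite3, assuming a single column named col in both tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (col INTEGER);
    CREATE TABLE B (col INTEGER);
    INSERT INTO A VALUES (1), (2), (3), (4);
    INSERT INTO B VALUES (3), (4), (5), (6);
""")

# Three ways to ask for "rows of B that have no partner in A".
queries = {
    "not_in": """
        SELECT col FROM B
        WHERE col NOT IN (SELECT col FROM A)
        ORDER BY col""",
    "left_join": """
        SELECT B.col
        FROM B LEFT OUTER JOIN A ON A.col = B.col
        WHERE A.col IS NULL
        ORDER BY B.col""",
    "not_exists": """
        SELECT b.col FROM B b
        WHERE NOT EXISTS (SELECT 1 FROM A a WHERE a.col = b.col)
        ORDER BY b.col""",
}

results = {name: [row[0] for row in conn.execute(sql)]
           for name, sql in queries.items()}

for name, values in results.items():
    print(name, values)  # each prints [5, 6]
```

All three agree here because neither column contains NULLs; with NULLs in A.col, the NOT IN form would return nothing.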