Three-Way Diff in SQL - sql

I have three SQL tables (A, B, and C), representing three different versions of a dataset. I want to devise an SQL query that extracts the ids of rows/tuples whose values are different in all three tables. (Two records are different if there exists a field where the records do not share the same value.) For simplicity, let's just assume that A, B, and C each have N records with record ids ranging from 1 to N, so for every id from 1 to N, there is a record in each table with that id.
What might be the most efficient way to do this in SQL? One way would be to do something like
(SELECT id FROM
((SELECT * FROM A EXCEPT SELECT * FROM B) EXCEPT (SELECT * FROM C)) result)
EXCEPT
(SELECT id FROM
((SELECT * FROM B) INTERSECT (SELECT * FROM C)) result2)
Basically, what I've done above is first find the ids of records where the version in A differs from the version in B and from the version in C (in the first two lines of the SQL query I've written). What's left is to filter out the ids of records where the version in B matches the version in C (which is done in the last two lines). But this seems horribly inefficient; is there a better, more concise way?
Note: I'm using PostgreSQL syntax here.

I would do it like this:
select id,
a.id is null as "missing in a",
b.id is null as "missing in b",
c.id is null as "missing in c",
a is distinct from b as "a and b different",
a is distinct from c as "a and c different",
b is distinct from c as "b and c different"
from a
full join b using (id)
full join c using (id)
where a is distinct from b
or b is distinct from c
or a is distinct from c
The id column is assumed to be a primary (or unique) key.
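The question asks for ids whose three versions are pairwise different, which corresponds to combining the three comparisons with AND rather than OR. Here is a minimal sketch of that variant, run through Python's built-in sqlite3 module as a stand-in for PostgreSQL. The single payload column `val`, the function name and the sample rows are assumptions for illustration. Inner joins replace the full joins because every id is taken to exist in all three tables, and SQLite's null-safe `IS NOT` plays the role of `IS DISTINCT FROM`.

```python
import sqlite3


def ids_differing_in_all_three(rows_a, rows_b, rows_c):
    """Return ids whose versions in a, b and c are pairwise different."""
    conn = sqlite3.connect(":memory:")
    for name, rows in (("a", rows_a), ("b", rows_b), ("c", rows_c)):
        # Hypothetical schema: an id plus a single payload column.
        conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, val TEXT)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)
    query = """
        SELECT a.id
        FROM a
        JOIN b ON b.id = a.id
        JOIN c ON c.id = a.id
        WHERE a.val IS NOT b.val   -- null-safe inequality in SQLite
          AND a.val IS NOT c.val
          AND b.val IS NOT c.val
        ORDER BY a.id
    """
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids
```

With more than one payload column, each comparison would need to cover every column (or compare whole rows, as the PostgreSQL query does).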

You can use GROUP BY and HAVING as follows:
select id from
(select * from A
union select * from B
union select * from C) t
group by id
-- use a separator that you know will not appear in these columns for the concat
-- note: || yields NULL as soon as one of the columns is NULL
having count(distinct column1 || '#~#' || column2 || '#~#' || column3) = 3
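A runnable sketch of the same grouping idea, using Python's sqlite3 module in place of PostgreSQL. It swaps the concatenation for a plain `COUNT(*)`: `UNION` already collapses identical rows, so an id that still has three rows after the union has three distinct versions, and NULL values need no special handling. The two payload columns, the function name and the sample data are assumptions for illustration.

```python
import sqlite3


def three_way_diff_ids(rows_a, rows_b, rows_c):
    """Return ids that have three distinct row versions across a, b and c."""
    conn = sqlite3.connect(":memory:")
    for name, rows in (("a", rows_a), ("b", rows_b), ("c", rows_c)):
        # Hypothetical schema with two payload columns.
        conn.execute(
            f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT)"
        )
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    query = """
        SELECT id FROM (
            SELECT * FROM a
            UNION SELECT * FROM b   -- UNION removes identical rows
            UNION SELECT * FROM c
        ) AS t
        GROUP BY id
        HAVING COUNT(*) = 3         -- three surviving rows = three versions
        ORDER BY id
    """
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids
```

This relies on id being unique within each table, so that each table contributes at most one row per id.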

Related

SQL query to append values not contained in second table

I have table A and table B with different numbers of columns, but both contain a column with IDs. Table A contains a more complete list of IDs, and table B contains some of the IDs from table A.
I would like to return table B with its original information plus appended rows for the IDs that are missing in B but contained in A. For these appended rows, the ID column should contain the missing ID value and all other columns should be blank.
Simple solution UNION ALL, with NOT EXISTS:
select b.id, b.c1, ..., b.cn
from b
UNION ALL
select distinct a.id, null, ..., null -- should be same number of columns as in the above select
from a
where not exists (select 1 from b where b.id = a.id)
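To see the UNION ALL / NOT EXISTS query on concrete rows, here is a small sketch using Python's sqlite3 module. The column `c1`, the function name and the sample data are made up for illustration; table a carries only the id column here.

```python
import sqlite3


def b_with_missing_ids(ids_a, rows_b):
    """Return b's rows plus one NULL-padded row per id found only in a."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (id INTEGER)")
    conn.execute("CREATE TABLE b (id INTEGER, c1 TEXT)")  # hypothetical column
    conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in ids_a])
    conn.executemany("INSERT INTO b VALUES (?, ?)", rows_b)
    query = """
        SELECT b.id, b.c1
        FROM b
        UNION ALL
        SELECT DISTINCT a.id, NULL   -- pad with NULL to match b's column count
        FROM a
        WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
        ORDER BY 1
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows
```

The `DISTINCT` in the second branch keeps an id that appears several times in a from being appended more than once.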
I think you are describing a left join:
select *
from a left join
b
using (id)

Value present in more than one table

I have 3 tables. All of them have a column, id. I want to find if there is any value that is common across the tables. Assuming that the tables are named a, b and c, if id value 3 is present in a and b, there is a problem. The query can/should exit at the first such occurrence. There is no need to probe further. What I have now is something like
( select id from a intersect select id from b )
union
( select id from b intersect select id from c )
union
( select id from a intersect select id from c )
Obviously, this is not very efficient. Database is PostgreSQL, version 9.0
id is not unique in the individual tables. It is OK to have duplicates in the same table. But if a value is present in just 2 of the 3 tables, that also needs to be flagged, and there is no need to check for existence in the third table or to check if there are more such values. One value, present in more than one table, and I can stop.
Although id is not unique within any given table, it should be unique across the tables; a union of distinct id should be unique, so:
select id from (
select distinct id from a
union all
select distinct id from b
union all
select distinct id from c) x
group by id
having count(*) > 1
Note the use of union all, which preserves duplicates (plain union removes duplicates).
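A quick way to try the query above is Python's sqlite3 module (used here as a stand-in for PostgreSQL). The function name and sample ids are made up for illustration.

```python
import sqlite3


def ids_in_more_than_one_table(ids_a, ids_b, ids_c):
    """Return ids that occur in at least two of the three tables."""
    conn = sqlite3.connect(":memory:")
    for name, ids in (("a", ids_a), ("b", ids_b), ("c", ids_c)):
        conn.execute(f"CREATE TABLE {name} (id INTEGER)")  # id is not unique
        conn.executemany(f"INSERT INTO {name} VALUES (?)", [(i,) for i in ids])
    query = """
        SELECT id FROM (
            SELECT DISTINCT id FROM a   -- DISTINCT: one row per id per table
            UNION ALL
            SELECT DISTINCT id FROM b
            UNION ALL
            SELECT DISTINCT id FROM c
        ) AS x
        GROUP BY id
        HAVING COUNT(*) > 1             -- id came from more than one table
        ORDER BY id
    """
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids
```

In the test data used below, id 1 is duplicated inside a single table and is therefore not reported, while ids shared between two tables are.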
I would suggest a simple join:
select a.id
from a join
b
on a.id = b.id join
c
on a.id = c.id
limit 1;
If you have a query that uses union or group by (or order by, but that is not relevant here), then you need to process all the data before returning a single row. A join can start returning rows as soon as the first values are found.
An alternative, but similar method is:
select a.id
from a
where exists (select 1 from b where a.id = b.id) and
exists (select 1 from c where a.id = c.id);
If a is the smallest table and id is indexed in b and c, then this could be quite fast.
Try this
select id from
(
select distinct id, 1 as t from a
union all
select distinct id, 2 as t from b
union all
select distinct id, 3 as t from c
) as t
group by id having count(t)=3
It is OK to have duplicates in the same table.
The query can/should exit at the first such occurrence.
SELECT 'OMG!' AS danger_bill_robinson
WHERE EXISTS (SELECT 1
FROM a,b,c -- maybe there is a place for old-style joins ...
WHERE a.id = b.id
OR a.id = c.id
OR c.id = b.id
);
Update: it appears the optimiser does not like Cartesian joins with 3 OR conditions. The query below is a bit faster:
SELECT 'WTF!' AS danger_bill_robinson
WHERE exists (select 1 from a JOIN b USING (id))
OR exists (select 1 from a JOIN c USING (id))
OR exists (select 1 from c JOIN b USING (id))
;
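The three-EXISTS form can be tried out with Python's sqlite3 module; this is only a sketch with made-up data and a made-up function name, and it says nothing about how PostgreSQL 9.0 will plan the query. Each EXISTS can stop at the first matching pair, and the OR can stop at the first true branch.

```python
import sqlite3


def any_shared_id(ids_a, ids_b, ids_c):
    """Return True if some id occurs in at least two of the tables."""
    conn = sqlite3.connect(":memory:")
    for name, ids in (("a", ids_a), ("b", ids_b), ("c", ids_c)):
        conn.execute(f"CREATE TABLE {name} (id INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?)", [(i,) for i in ids])
    query = """
        SELECT EXISTS (SELECT 1 FROM a JOIN b USING (id))
            OR EXISTS (SELECT 1 FROM a JOIN c USING (id))
            OR EXISTS (SELECT 1 FROM c JOIN b USING (id))
    """
    (flag,) = conn.execute(query).fetchone()  # SQLite returns 0 or 1
    conn.close()
    return bool(flag)
```

Duplicates inside one table do not trigger the flag, because every join pairs two different tables.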

Oracle SQL to subtract 2 values from different table joins

I am trying to subtract sequences MN_SEQ from Table C generated based on join with other tables.
Here is the problem.
Query 1 -
Select M_Seq from Table C, Table A, Table B where C.date_sk=A.MTH_END_DT
and B.Loan_seq=A.Loan_seq
Query 2 -
Select M_Seq from Table C, Table B where C.date_sk=B.ORIG_DT
I have to get difference between 2 M_SEQ generated from the result set of query 1 and Query 2.
Below is what I tried, but I am getting an error.
select mn_seq -mn_seq from
((select mn_seq from Table C, Table A, Table B where B.MTH_END_DT=C.DATE_SK and B.LOAN_SEQ=A.LOAN_SEQ)a,
(select mn_seq from Table C , Table B where B.ORIG_DT=C.DATE_SK
)b)
T
Kindly provide inputs. I am not sure if this is the right way to do it. I tried just using "-" between the queries, but it didn't work. Thanks!
Try this..
SELECT (SELECT mn_seq
FROM TABLE c, TABLE a, TABLE b
WHERE b.mth_end_dt = c.date_sk
AND b.loan_seq = a.loan_seq) -
(SELECT mn_seq FROM TABLE c, TABLE b WHERE b.orig_dt = c.date_sk)
FROM dual
I assume both mn_seq values are NUMBER and that the WHERE clause of each inner query returns only one record.
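The idea of subtracting two scalar subqueries can be sketched with Python's sqlite3 module. Everything concrete here is an assumption: the table layouts, the integer date keys, the sample rows, and the extra filter on a single loan_seq that makes each subquery return one row. SQLite needs no FROM dual, and unlike Oracle (which raises ORA-01427 when a scalar subquery returns several rows) it would silently take the first row, so the single-row assumption matters.

```python
import sqlite3


def mn_seq_difference(loan_seq):
    """Return mn_seq at month end minus mn_seq at origination for one loan."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema and data, loosely modelled on the question.
    conn.executescript("""
        CREATE TABLE c (date_sk INTEGER PRIMARY KEY, mn_seq INTEGER);
        CREATE TABLE a (loan_seq INTEGER PRIMARY KEY);
        CREATE TABLE b (loan_seq INTEGER, mth_end_dt INTEGER, orig_dt INTEGER);
        INSERT INTO c VALUES (20200131, 10), (20211231, 33);
        INSERT INTO a VALUES (1);
        INSERT INTO b VALUES (1, 20211231, 20200131);
    """)
    query = """
        SELECT (SELECT c.mn_seq
                FROM c
                JOIN b ON b.mth_end_dt = c.date_sk
                JOIN a ON a.loan_seq = b.loan_seq
                WHERE b.loan_seq = :loan)
             - (SELECT c.mn_seq
                FROM c
                JOIN b ON b.orig_dt = c.date_sk
                WHERE b.loan_seq = :loan)
    """
    (diff,) = conn.execute(query, {"loan": loan_seq}).fetchone()
    conn.close()
    return diff
```

If either subquery finds no row, the subtraction yields NULL, which Python sees as None.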

Issues with SQL Select utilizing Except and UNION All

Select *
From (
Select a
Except
Select b
) x
UNION ALL
Select *
From (
Select b
Except
Select a
) y
This sql statement returns an extremely wrong amount of data. If Select a returns a million, how does this entire statement return 100,000? In this instance, Select b contains mutually exclusive data, so there should be no elimination due to the except.
As already stated in the comment, EXCEPT does an implicit DISTINCT, and the ALL in your UNION ALL cannot re-create the duplicates. Hence you cannot use your approach if you want to keep duplicates.
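The effect of the implicit DISTINCT is easy to reproduce on a tiny data set. A minimal sketch with Python's sqlite3 module, where the single column `x`, the function name and the data are made up for illustration:

```python
import sqlite3


def except_union_all_rows(values_a, values_b):
    """Run the EXCEPT / UNION ALL pattern and return the resulting values."""
    conn = sqlite3.connect(":memory:")
    for name, values in (("a", values_a), ("b", values_b)):
        conn.execute(f"CREATE TABLE {name} (x INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?)", [(v,) for v in values])
    query = """
        SELECT * FROM (SELECT x FROM a EXCEPT SELECT x FROM b) AS only_a
        UNION ALL
        SELECT * FROM (SELECT x FROM b EXCEPT SELECT x FROM a) AS only_b
    """
    rows = sorted(row[0] for row in conn.execute(query))
    conn.close()
    return rows
```

With three identical rows in a and one unrelated row in b, the two tables hold four rows in total but the query returns only two, because each EXCEPT collapses its duplicates before UNION ALL sees them.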
As you want to get the data that is contained in exactly one of the tables a and b, but not in both, a more efficient way to achieve that would be the following (I am just assuming the tables have columns id and c where id is the primary key, as you did not state any column names):
SELECT CASE WHEN a.id IS NULL THEN 'from b' ELSE 'from a' END as source_table
,coalesce(a.id, b.id) as id
,coalesce(a.c, b.c) as c
FROM a
FULL OUTER JOIN b ON a.id = b.id AND a.c = b.c -- use all columns of both tables here!
WHERE a.id IS NULL OR b.id IS NULL
This makes use of a FULL OUTER JOIN, excluding the matching records via the WHERE conditions, as the primary key cannot be null except if it comes from the OUTER side.
If your tables do not have primary keys - which is bad practice anyway - you would have to check across all columns for NULL, not just the one primary key column.
And if you have records completely consisting of NULLs, this method would not work.
Then you could use an approach similar to your original one, just using
SELECT ...
FROM a
WHERE NOT EXISTS (SELECT 1 FROM b WHERE <join by all columns>)
UNION ALL
SELECT ...
FROM b
WHERE NOT EXISTS (SELECT 1 FROM a WHERE <join by all columns>)
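A sketch of the NOT EXISTS variant on the same kind of toy data, again using Python's sqlite3 module with a made-up single column `x` and function name. The join inside each subquery uses SQLite's null-safe `IS` so that NULL values match each other; in PostgreSQL that would be `IS NOT DISTINCT FROM`.

```python
import sqlite3


def not_exists_rows(values_a, values_b):
    """Return rows found in only one of the tables, keeping duplicates."""
    conn = sqlite3.connect(":memory:")
    for name, values in (("a", values_a), ("b", values_b)):
        conn.execute(f"CREATE TABLE {name} (x INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?)", [(v,) for v in values])
    query = """
        SELECT a.x FROM a
        WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.x IS a.x)
        UNION ALL
        SELECT b.x FROM b
        WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.x IS b.x)
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    # Sort in Python, placing NULL (None) first, to get a stable order.
    return sorted((row[0] for row in rows), key=lambda v: (v is not None, v or 0))
```

Here nothing de-duplicates the rows, so three identical rows in a that have no match in b all come back.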
If you're trying to get any data that is in one table and not in the other regardless of which table, I would try something like the following:
select id, 'table a data not in b' from a where id not in (select id from b)
union
select id, 'table b data not in a' from b where id not in (select id from a)

Is there alternative way to write this query?

I have tables A, B, C, where A represents items which can have zero or more sub-items stored in C. B table only has 2 foreign keys to connect A and C.
I have this sql query:
select * from A
where not exists (select * from B natural join C where B.id = A.id and C.value > 10);
Which says: "Give me every item from table A where no sub-item has a value greater than 10."
Is there a way to optimize this? And is there a way to write this not using exists operator?
There are three commonly used ways to test if a value is in one table but not another:
NOT EXISTS
NOT IN
LEFT JOIN ... WHERE ... IS NULL
You have already shown code for the first. Here is the second:
SELECT *
FROM A
WHERE id NOT IN (
SELECT b.id
FROM B
NATURAL JOIN C
WHERE C.value > 10
)
And with a left join:
SELECT *
FROM A
LEFT JOIN (
SELECT b.id
FROM B
NATURAL JOIN C
WHERE C.value > 10
) BC
ON A.id = BC.id
WHERE BC.id IS NULL
Depending on the database type and version, the three different methods can result in different query plans with different performance characteristics.
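To check that the three forms agree, here is a small sketch using Python's sqlite3 module. The schema is an assumption chosen to keep the joins explicit: a(id), c(id, value), and a link table b(a_id, c_id); the natural join from the question is replaced by an explicit join on those hypothetical keys.

```python
import sqlite3

# Shared subquery: items of a that have at least one sub-item with value > 10.
OFFENDERS = """
    SELECT b.a_id FROM b JOIN c ON c.id = b.c_id WHERE c.value > 10
"""

QUERIES = {
    "not_exists": """
        SELECT a.id FROM a
        WHERE NOT EXISTS (SELECT 1 FROM b JOIN c ON c.id = b.c_id
                          WHERE b.a_id = a.id AND c.value > 10)
        ORDER BY a.id""",
    "not_in": f"""
        SELECT a.id FROM a
        WHERE a.id NOT IN ({OFFENDERS})
        ORDER BY a.id""",
    "left_join": f"""
        SELECT a.id FROM a
        LEFT JOIN ({OFFENDERS}) AS bc ON a.id = bc.a_id
        WHERE bc.a_id IS NULL
        ORDER BY a.id""",
}


def run_all_forms():
    """Run the three anti-join forms on hypothetical data; return their ids."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (id INTEGER PRIMARY KEY);
        CREATE TABLE c (id INTEGER PRIMARY KEY, value INTEGER);
        CREATE TABLE b (a_id INTEGER, c_id INTEGER);
        INSERT INTO a VALUES (1), (2), (3);
        INSERT INTO c VALUES (10, 5), (11, 20), (12, 7);
        INSERT INTO b VALUES (1, 10), (1, 12), (2, 11), (2, 10);
    """)
    results = {
        name: [row[0] for row in conn.execute(sql)]
        for name, sql in QUERIES.items()
    }
    conn.close()
    return results
```

One difference beyond performance: if the NOT IN subquery can return a NULL, the NOT IN form returns no rows at all, whereas the other two forms are unaffected.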