Find unmatched rows between two tables - sql

Given this setup:
CREATE TABLE table1 (column1 text, column2 text);
CREATE TABLE table2 (column1 text, column2 text);
INSERT INTO table1 VALUES
('A', 'A')
, ('B', 'N')
, ('C', 'C')
, ('B', 'A');
INSERT INTO table2 VALUES
('A', 'A')
, ('B', 'N')
, ('C', 'X')
, ('B', 'Y');
How can I find missing combinations of (column1, column2) between these two tables? Rows not matched in the other table.
The desired result for the given example would be:
C | C
B | A
C | X
B | Y
There can be duplicate entries so we'd want to omit those.

One method is union all:
select t1.col1, t1.col2
from t1
where (t1.col1, t1.col2) not in (select t2.col1, t2.col2 from t2)
union all
select t2.col1, t2.col2
from t2
where (t2.col1, t2.col2) not in (select t1.col1, t1.col2 from t1);
If there are duplicates within a table, you can remove them by using select distinct. There is no danger of duplicates between the tables.

Seems to be a perfect task for set operations:
( --all rows from table 1 missing in table 2
select *
from table1
except
select *
from table2
)
union all -- both select return distinct rows
( -- all rows in table 2 missing in table 1
select *
from table2
except
select *
from table1
)
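To check this against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database (an assumption, the question does not name a DBMS). SQLite does not accept parenthesized compound selects, so each EXCEPT is wrapped in a derived table; the logic is otherwise the same.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 text, column2 text);
    CREATE TABLE table2 (column1 text, column2 text);
    INSERT INTO table1 VALUES ('A','A'), ('B','N'), ('C','C'), ('B','A');
    INSERT INTO table2 VALUES ('A','A'), ('B','N'), ('C','X'), ('B','Y');
""")

# Each EXCEPT yields the distinct rows of one table that are missing
# from the other; UNION ALL just concatenates the two halves.
unmatched = conn.execute("""
    SELECT * FROM (SELECT column1, column2 FROM table1
                   EXCEPT
                   SELECT column1, column2 FROM table2)
    UNION ALL
    SELECT * FROM (SELECT column1, column2 FROM table2
                   EXCEPT
                   SELECT column1, column2 FROM table1)
""").fetchall()

print(sorted(unmatched))
```

Sorted, this gives the four pairs the question asks for: B|A, B|Y, C|C and C|X.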

You can try NOT EXISTS with a correlated subquery that matches on both columns, then combine the two directions with UNION ALL:
select Column1, Column2 from table1 t1
where NOT exists
(
select 1
FROM table2 t2
where t1.Column1 = t2.Column1 and t1.Column2 = t2.Column2
)
UNION ALL
select Column1, Column2 from table2 t2
where NOT exists
(
select 1
FROM table1 t1
where t1.Column1 = t2.Column1 and t1.Column2 = t2.Column2
)
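When the pair (Column1, Column2) is the unit of comparison, the correlated subquery has to match both columns together. A small sketch of how AND and OR behave on the question's sample data, run through Python's sqlite3 module (an assumed stand-in database; the helper name `unmatched_in_table1` is made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 text, column2 text);
    CREATE TABLE table2 (column1 text, column2 text);
    INSERT INTO table1 VALUES ('A','A'), ('B','N'), ('C','C'), ('B','A');
    INSERT INTO table2 VALUES ('A','A'), ('B','N'), ('C','X'), ('B','Y');
""")

def unmatched_in_table1(operator):
    """Rows of table1 without a partner in table2, where the two column
    comparisons are combined with the given operator ('AND' or 'OR')."""
    sql = f"""
        SELECT column1, column2 FROM table1 t1
        WHERE NOT EXISTS (
            SELECT 1 FROM table2 t2
            WHERE t1.column1 = t2.column1 {operator} t1.column2 = t2.column2
        )
    """
    return sorted(conn.execute(sql).fetchall())

with_and = unmatched_in_table1("AND")  # pair must match as a whole
with_or = unmatched_in_table1("OR")    # a match on either column hides the row

print(with_and)
print(with_or)
```

With AND, the rows C|C and B|A come back as unmatched. With OR, nothing comes back, because every row of table1 shares at least one column value with some row of table2.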

You can try set operations: EXCEPT to find the rows in one table but not in the other, and UNION to combine the partial results into one.
(SELECT column1,
column2
FROM table1
EXCEPT
SELECT column1,
column2
FROM table2)
UNION
(SELECT column1,
column2
FROM table2
EXCEPT
SELECT column1,
column2
FROM table1);
If you don't need duplicate elimination you can try to use the ALL variants (EXCEPT ALL and UNION ALL). They are generally faster, as the DBMS doesn't have to look for and eliminate duplicates.
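Note that the ALL variants also change the result, not only the speed: with EXCEPT ALL, each copy of a row on the right cancels a single copy on the left. A rough Python illustration of the two semantics, using sets and `collections.Counter` as stand-ins for the SQL operators (the sample rows are made up):

```python
from collections import Counter

# Made-up rows: ('A', 'A') occurs three times on the left, once on the right.
left = [("A", "A"), ("A", "A"), ("A", "A"), ("B", "N")]
right = [("A", "A"), ("C", "X")]

# EXCEPT: distinct left rows that do not occur on the right at all.
except_distinct = sorted(set(left) - set(right))

# EXCEPT ALL: multiplicities are subtracted, so 3 - 1 = 2 copies survive.
except_all = sorted((Counter(left) - Counter(right)).elements())

print(except_distinct)
print(except_all)
```

So for the question as asked (duplicates should be omitted, and a row counts as matched if it appears in the other table at all), plain EXCEPT is the variant that fits.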

The devil is in the details with this seemingly simple task.
Short and among the fastest:
SELECT col1, col2
FROM (SELECT col1, col2, TRUE AS x1 FROM t1) t1
FULL JOIN (SELECT col1, col2, TRUE AS x2 FROM t2) t2 USING (col1, col2)
WHERE (x1 AND x2) IS NULL;
The FULL [OUTER] JOIN includes all rows from both sides, but fills in NULL values for columns of missing rows. The WHERE condition (x1 AND x2) IS NULL identifies these unmatched rows. Equivalent: WHERE x1 IS NULL OR x2 IS NULL.
To eliminate duplicate pairs, add DISTINCT (or GROUP BY) at the end - cheaper for few dupes:
SELECT DISTINCT col1, col2
FROM ...
If you have many dupes on either side, it's cheaper to fold before the join:
SELECT col1, col2
FROM (SELECT DISTINCT col1, col2, TRUE AS x1 FROM t1) t1
FULL JOIN (SELECT DISTINCT col1, col2, TRUE AS x2 FROM t2) t2 USING (col1, col2)
WHERE (x1 AND x2) IS NULL;
It's more complicated if there can be NULL values. DISTINCT / DISTINCT ON or GROUP BY treat them as equal (so dupes with NULL values are folded in the subqueries above). But JOIN or WHERE conditions must evaluate to TRUE for rows to pass. NULL values are not considered equal there, so the FULL [OUTER] JOIN never finds a match for pairs containing NULL. This may or may not be desirable. You just have to be aware of the difference and define your requirements.
Consider the added demo in the SQL Fiddle
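The difference between folding and joining on NULL can be reproduced in a few lines. This is only a sketch with Python's sqlite3 module standing in for the database (an assumption; the answer targets PostgreSQL), with made-up rows, but both behaviors shown here follow standard SQL NULL handling:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (col1 text, col2 text);
    CREATE TABLE t2 (col1 text, col2 text);
    INSERT INTO t1 VALUES ('A', NULL), ('A', NULL);
    INSERT INTO t2 VALUES ('A', NULL);
""")

# DISTINCT treats the two ('A', NULL) rows as duplicates and folds them.
distinct_rows = conn.execute("SELECT DISTINCT col1, col2 FROM t1").fetchall()

# A join on equality finds no partner: NULL = NULL does not evaluate to TRUE.
joined_rows = conn.execute("""
    SELECT t1.col1, t1.col2
    FROM t1
    JOIN t2 ON t1.col1 = t2.col1 AND t1.col2 = t2.col2
""").fetchall()

print(distinct_rows)
print(joined_rows)
```

The DISTINCT query returns a single row, while the join returns none, even though both tables contain the "same" pair.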
If there are no NULL values, no duplicates, but an additional column defined NOT NULL in each table, like the primary key, let's name each id, then it can be as simple as:
SELECT col1, col2
FROM t1
FULL JOIN t2 USING (col1, col2)
WHERE t1.id IS NULL OR t2.id IS NULL;
Related:
Select rows which are not present in other table
PostgreSQL - Create table as select with distinct on specific columns

Related

Compare a two column with the same table to remove duplicate

I have a sample table with two columns. I need to compare column 1 and column 2 against the other records of the same table and remove the reversed duplicates, i.e. rows where (column 1, column 2) of one row equals (column 2, column 1) of another.
I tried a self join with a CASE condition, but it's not working.
If I understand correctly, you can run a simple select like this if you have all reversed pairs in the table:
select col1, col2
from t
where col1 < col2;
If you have some singletons, then:
select col1, col2
from t
where col1 < col2 or
(col1 > col2 and
not exists (select 1
from t t2
where t2.col1 = t.col2 and
t2.col2 = t.col1
)
);
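A quick way to sanity-check the second query is to run it on a tiny table that has one reversed pair and one singleton stored in "descending" order. A sketch with Python's sqlite3 module as an assumed stand-in database and made-up integer data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (col1 integer, col2 integer);
    -- (1,2)/(2,1) is a reversed pair; (5,3) has no reversed partner.
    INSERT INTO t VALUES (1, 2), (2, 1), (3, 4), (5, 3);
""")

kept = conn.execute("""
    SELECT col1, col2
    FROM t
    WHERE col1 < col2
       OR (col1 > col2
           AND NOT EXISTS (SELECT 1
                           FROM t t2
                           WHERE t2.col1 = t.col2
                             AND t2.col2 = t.col1))
    ORDER BY col1, col2
""").fetchall()

print(kept)
```

The row (2, 1) is dropped because (1, 2) exists, while (5, 3) is kept. Rows with col1 = col2 satisfy neither branch, so they would be dropped as well; add a third condition if you need them.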
You can use the except operator.
"EXCEPT returns distinct rows from the left input query that aren't output by the right input query."
SELECT C1, C2 FROM table
Except
SELECT C2, C1 FROM table
Example with your given data set : dbfiddle
I am posting the answer based on oracle database and also the columns are string/varchar:
delete from table where rowid in (
select rowid from table
where column1 || column2 =column2 || column1 )
Feel free to provide more input and we can tweak the answer.
Okay. There might be a simpler way of doing this but this might work as well. {table} is to be replaced with your table name.
;with orderedtable as (select t1.col1, t1.col2, ROW_NUMBER() OVER(ORDER BY t1.col1, t1.col2 ASC) AS rownum
from (select distinct t2.col1, t2.col2 from {table} t2) as t1)
select f1.col1, f1.col2
from orderedtable f1
left join orderedtable f2 on f1.col1 = f2.col2 and f1.col2 = f2.col1 and f1.rownum < f2.rownum
where f2.rownum is null
The SQL below will get the reversed col1 and col2 rows:
select
distinct t2.col1,t1.col2
from
table t1
join
table t2 on t1.col1 = t2.col2 and t1.col2 = t2.col1
And when we get these reversed rows, we can except them with the left join clause, the complete SQL is:
select
t.col1,t.col2
from
table t
left join
(
select
distinct t2.col1,t1.col2
from
table t1
join
table t2 on t1.col1 = t2.col2 and t1.col2 = t2.col1
) tmp on t.col1 = tmp.col1 and t.col2 = tmp.col2
where
tmp.col1 is null
Is it clear?

efficient way to compare two tables in bigquery

I am interested in comparing, whether two tables contain the same data.
I could do it like this:
#standardSQL
SELECT
key1, key2
FROM
(
SELECT
table1.key1,
table1.key2,
table1.column1 - table2.column1 as col1,
table1.col2 - table2.col2 as col2
FROM
`table1` AS table1
LEFT JOIN
`table2` AS table2
ON
table1.key1 = table2.key1
AND
table1.key2 = table2.key2
)
WHERE
col1 != 0
OR
col2 != 0
But when I want to compare all numerical columns, this is kind of hard, especially if I want to do it for multiple table combinations.
Therefore my question: Is someone aware of a way to iterate over all numerical columns and restrict the result set to those keys where any of these differences were not zero?
In Standard SQL, we found using a UNION ALL of two EXCEPT DISTINCT's works for our use cases:
(
SELECT * FROM table1
EXCEPT DISTINCT
SELECT * from table2
)
UNION ALL
(
SELECT * FROM table2
EXCEPT DISTINCT
SELECT * from table1
)
This will produce differences in both directions:
rows in table1 that are not in table2
rows in table2 that are not in table1
Notes and caveats:
table1 and table2 must be of the same width and have columns in the same order and type.
this does not work directly with STRUCT or ARRAY data types. You should either UNNEST, or use TO_JSON_STRING to convert these data types first.
this does not work directly with GEOGRAPHY either; you must cast to text first using ST_AsText
First, I want to bring up issues with your original query
The main issues are 1) using LEFT JOIN ; 2) using col != 0
Below is how it should be modified to really capture ALL differences from both tables
Run your original query and below one - and hopefully you will see the difference
#standardSQL
SELECT key1, key2
FROM
(
SELECT
IFNULL(table1.key1, table2.key1) key1,
IFNULL(table1.key2, table2.key2) key2,
table1.column1 - table2.column1 AS col1,
table1.col2 - table2.col2 AS col2
FROM `table1` AS table1
FULL OUTER JOIN `table2` AS table2
ON table1.key1 = table2.key1
AND table1.key2 = table2.key2
)
WHERE IFNULL(col1, 1) != 0
OR IFNULL(col2, 1) != 0
or you can just try to run your original and above version against dummy data to see the difference
#standardSQL
WITH `table1` AS (
SELECT 1 key1, 1 key2, 1 column1, 2 col2 UNION ALL
SELECT 2, 2, 3, 4 UNION ALL
SELECT 3, 3, 5, 6
), `table2` AS (
SELECT 1 key1, 1 key2, 1 column1, 29 col2 UNION ALL
SELECT 2, 2, 3, 4 UNION ALL
SELECT 4, 4, 7, 8
)
SELECT key1, key2
FROM
(
SELECT
IFNULL(table1.key1, table2.key1) key1,
IFNULL(table1.key2, table2.key2) key2,
table1.column1 - table2.column1 AS col1,
table1.col2 - table2.col2 AS col2
FROM `table1` AS table1
FULL OUTER JOIN `table2` AS table2
ON table1.key1 = table2.key1
AND table1.key2 = table2.key2
)
WHERE IFNULL(col1, 1) != 0
OR IFNULL(col2, 1) != 0
Secondly, below will highly simplify your overall query
#standardSQL
SELECT
IFNULL(table1.key1, table2.key1) key1,
IFNULL(table1.key2, table2.key2) key2
FROM `table1` AS table1
FULL OUTER JOIN `table2` AS table2
ON table1.key1 = table2.key1
AND table1.key2 = table2.key2
WHERE TO_JSON_STRING(table1) != TO_JSON_STRING(table2)
You can test it with the same dummy data example as above
Note: in this solution you don't need to pick specific columns; it just compares all columns. If you need to compare only specific columns, you will still need to cherry-pick them, as in the example below:
#standardSQL
SELECT
IFNULL(table1.key1, table2.key1) key1,
IFNULL(table1.key2, table2.key2) key2
FROM `table1` AS table1
FULL OUTER JOIN `table2` AS table2
ON table1.key1 = table2.key1
AND table1.key2 = table2.key2
WHERE TO_JSON_STRING((table1.column1, table1.col2)) != TO_JSON_STRING((table2.column1, table2.col2))
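The idea behind this approach (serialize each row and compare the strings, instead of comparing column by column) can be sketched outside BigQuery too. The following is only a Python analogy using `json.dumps`, not BigQuery code; the dictionaries mirror the dummy data above, and `differing_keys` is a made-up helper name:

```python
import json

# Rows keyed by (key1, key2); the values are the remaining columns.
table1 = {(1, 1): (1, 2), (2, 2): (3, 4), (3, 3): (5, 6)}
table2 = {(1, 1): (1, 29), (2, 2): (3, 4), (4, 4): (7, 8)}

def differing_keys(left, right):
    """Keys whose serialized rows differ, or that exist on one side only."""
    # The union of keys plays the role of the full outer join.
    all_keys = sorted(set(left) | set(right))
    return [
        key for key in all_keys
        # A missing row serializes to "null", which never equals a real row.
        if json.dumps(left.get(key)) != json.dumps(right.get(key))
    ]

diff = differing_keys(table1, table2)
print(diff)
```

Key (1, 1) is reported because one value differs, and keys (3, 3) and (4, 4) because each exists on one side only.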
You will need to specify which are the numerical columns, but looking at a representation of all of them will do the fast compare:
#standardSQL
WITH table_a AS (
SELECT 1 id, 2 n1, 3 n2
), table_b AS (
SELECT 1 id, 2 n1, 4 n2
)
SELECT id
FROM table_a a
JOIN table_b b
USING(id)
WHERE TO_JSON_STRING([a.n1, a.n2]) != TO_JSON_STRING([b.n1, b.n2])

SQL using minus function to eliminate rows from specific columns but returning all columns from table

I'm trying to return all columns from a table after certain rows, identified by the values in certain columns, have been eliminated. Is there any possible way to do this? I tried using code like this, but I'm unsure what I am missing to make it work:
Select * from table1
where (select column1 from table1
minus select column1 from table2);
You can do this with a WHERE NOT EXISTS:
Select T1.*
From Table1 T1
Where Not Exists
(
Select *
From Table2 T2
Where T2.Column1 = T1.Column1
)
Alternatively, you could use a LEFT OUTER JOIN:
Select T1.*
From Table1 T1
Left Join Table2 T2 On T2.Column1 = T1.Column1
Where T2.Column1 Is Null
Or even a WHERE NOT IN:
Select *
From Table1
Where Column1 Not In
(
Select Column1
From Table2
)
I would recommend the WHERE NOT EXISTS approach, but to fix the query you have in the question, you just need to add a WHERE IN:
Select *
From Table1
Where Column1 In
(
Select Column1
From Table1
Minus
Select Column1
From Table2
)
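One reason to prefer NOT EXISTS over NOT IN is NULL handling: if the subquery returns even one NULL, NOT IN never evaluates to TRUE and the query returns no rows. A small sketch with Python's sqlite3 module as an assumed stand-in database and made-up values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 integer);
    CREATE TABLE table2 (column1 integer);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (2), (NULL);
""")

# NOT EXISTS only asks whether a matching row exists; the NULL is harmless.
not_exists = conn.execute("""
    SELECT column1 FROM table1 t1
    WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.column1 = t1.column1)
    ORDER BY column1
""").fetchall()

# NOT IN compares against every value, including NULL, so the predicate
# is never TRUE for any row of table1.
not_in = conn.execute("""
    SELECT column1 FROM table1
    WHERE column1 NOT IN (SELECT column1 FROM table2)
""").fetchall()

print(not_exists)
print(not_in)
```

NOT EXISTS returns 1 and 3 as expected; NOT IN returns an empty result. If Column1 is declared NOT NULL in Table2, the two forms give the same rows.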
Try this:
select * from table1 where column1 in
(select column1 from table1
minus
select column1 from table2);

SQL to get the common rows from two tables

I have two tables T1 and T2.
Can any one please help with a SQL query which will fetch the common rows from these two tables? (Assume T1 and T2 has 100 columns each)
P.S : I guess INNER JOIN on each of the columns will not be a good idea.
Thanks
If you are using SQL Server 2005 or later, you can use the INTERSECT keyword, which gives you the common records.
SELECT column1
FROM table1
INTERSECT
SELECT column1
FROM table2
If you want both column1 and column2 from table1 in the output, for the rows whose column1 value is common to both tables:
SELECT column1, column2
FROM table1
WHERE column1 IN
(
SELECT column1
FROM table1
INTERSECT
SELECT column1
FROM table2
)
Use INTERSECT
SELECT * FROM T1
INTERSECT
SELECT * FROM T2
Yes, INNER JOIN will work.
eg. SELECT (column_1, column_2, ... column_n) FROM T1 JOIN T2 ON (condition) WHERE (condition)
This query will fetch the common records (intersection of) in both the tables according to ON condition.
select
t1.*
from
t1, t2
where
(
(t1.col1 is null and t2.col1 is null) or
(t1.col1 = t2.col1)
) and
(
(t1.col2 is null and t2.col2 is null) or
(t1.col2 = t2.col2)
) and
(
(t1.col3 is null and t2.col3 is null) or
(t1.col3 = t2.col3)
) and
....
(
(t1.col100 is null and t2.col100 is null) or
(t1.col100 = t2.col100)
)
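The explicit IS NULL checks matter because a plain equality join never matches NULL to NULL, whereas INTERSECT treats two NULLs as the same value. A minimal sketch comparing the three variants, with Python's sqlite3 module as an assumed stand-in database and a made-up two-column table instead of 100 columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (col1 integer, col2 text);
    CREATE TABLE t2 (col1 integer, col2 text);
    INSERT INTO t1 VALUES (1, NULL), (2, 'b');
    INSERT INTO t2 VALUES (1, NULL), (2, 'x');
""")

# INTERSECT compares whole rows and considers NULLs equal.
intersect_rows = conn.execute(
    "SELECT * FROM t1 INTERSECT SELECT * FROM t2").fetchall()

# Plain equality: the (1, NULL) rows do not match each other.
plain_join = conn.execute("""
    SELECT t1.* FROM t1 JOIN t2
      ON t1.col1 = t2.col1 AND t1.col2 = t2.col2
""").fetchall()

# Equality plus an explicit "both NULL" branch per column.
null_safe_join = conn.execute("""
    SELECT t1.* FROM t1 JOIN t2
      ON (t1.col1 = t2.col1 OR (t1.col1 IS NULL AND t2.col1 IS NULL))
     AND (t1.col2 = t2.col2 OR (t1.col2 IS NULL AND t2.col2 IS NULL))
""").fetchall()

print(intersect_rows, plain_join, null_safe_join)
```

INTERSECT and the NULL-safe join both return the row (1, NULL); the plain join returns nothing. With 100 columns, INTERSECT is by far the shorter way to get the NULL-safe behavior.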
SELECT NAME FROM Sample1
UNION
SELECT NAME FROM Sample2;
EX: Table Sample1
ID NAME
-------
1 A
-------
2 B
-------
Table Sample 2
ID NAME
--------
1 C
--------
2 B
------
Output
NAME
----
A
---
B
---
C
---

SQL Server Query to find different names in two tables

I have a situation here.
I have two tables:
I need a sql query which will print the Col names which are different in two tables.
For example, in this case the query should print the result as:
The reason is clear that m is present in Table-1 but not present in Table-2. Similar is the case with z which is in Table-2 but not in Table-1.
I am really stuck here, please help.
The column names are not case-sensitive.
Thanks.
You could also use NOT EXISTS to get the result:
select col1
from table1 t1
where not exists (select 1
from table2 t2
where t1.col1 = t2.col1)
union all
select col1
from table2 t2
where not exists (select 1
from table1 t1
where t1.col1 = t2.col1);
See SQL Fiddle with Demo
Or even NOT IN:
select col1
from table1 t1
where col1 not in (select col1
from table2 t2)
union all
select col1
from table2 t2
where col1 not in (select col1
from table1 t1);
See SQL Fiddle with Demo
Try:
select coalesce(t1.Col1, t2.Col1)
from [Table-1] t1
full outer join [Table-2] t2 on t1.Col1 = t2.Col1
where t1.Col1 is null or t2.Col1 is null
SQLFiddle here.
Alternatively:
select Col1 from
(select Col1 from [Table-1] union all select Col1 from [Table-2]) sq
group by Col1 having count(*) = 1
SQLFiddle here.
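The GROUP BY ... HAVING COUNT(*) = 1 variant relies on each value appearing at most once per table. A sketch of it on hypothetical sample values, run with Python's sqlite3 module as a stand-in (an assumption; the question is about SQL Server). SQLite compares text case-sensitively by default, so COLLATE NOCASE is added to imitate a case-insensitive collation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 text);
    CREATE TABLE table2 (col1 text);
    -- Hypothetical values: 'c' and 'C' should count as the same name.
    INSERT INTO table1 VALUES ('A'), ('B'), ('m'), ('c');
    INSERT INTO table2 VALUES ('A'), ('B'), ('C'), ('z');
""")

# Stack both tables, group case-insensitively, and keep the values that
# occur exactly once, i.e. in only one of the two tables.
only_in_one = conn.execute("""
    SELECT MIN(col1)
    FROM (SELECT col1 FROM table1
          UNION ALL
          SELECT col1 FROM table2)
    GROUP BY col1 COLLATE NOCASE
    HAVING COUNT(*) = 1
    ORDER BY 1
""").fetchall()

print(only_in_one)
```

This returns m and z. If a value can repeat within one table, deduplicate each side first (SELECT DISTINCT), otherwise the count no longer tells you in how many tables the value occurs.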
I think the simplest one is this:
SELECT COL1 AS ResultCol FROM TABLE1 where COL1 not in (select COL2 from TABLE2) UNION SELECT COL2 AS ResultCol FROM TABLE2 where COL2 not in (select COL1 from table1)
-- Table variables are declared with @ (a # prefix denotes a temp table, which needs CREATE TABLE)
declare @tab1 table(id int,col1 varchar(1))
declare @tab2 table(id int,col1 varchar(1))
INSERT INTO @tab1
([id], [Col1])
VALUES
(1, 'A'),
(2, 'B'),
(3, 'm'),
(4, 'c')
INSERT INTO @tab2
([id], [Col1])
VALUES
(1, 'A'),
(2, 'B'),
(3, 'C'),
(4, 'z')
select b.id,b.col1 from
(
select a.id,a.col1,b.col1 x from @tab1 a left join @tab2 b on a.col1 = b.col1
union
select b.id,b.col1,a.col1 x from @tab1 a right join @tab2 b on a.col1 = b.col1
) b
where b.x is null
There's a feature specifically for this operation: EXCEPT and INTERSECT.
Find which values (single column result or multi-column result) are not present in the following queries
--What's in table A that isn't in table B
SELECT col1 FROM TableA
EXCEPT
SELECT col1 FROM TableB
--What's in table B that isn't in table A
SELECT col1 FROM TableB
EXCEPT
SELECT col1 FROM TableA
Likewise, the INTERSECT keyword tells you what is shared
--What's in table A and table B
SELECT col1 FROM TableA
INTERSECT
SELECT col1 FROM TableB
You can also use FULL OUTER JOIN operator.
Visual Representation of SQL Joins
SELECT ROW_NUMBER() OVER(ORDER BY COALESCE(t1.Col1, t2.Col1)) AS id,
COALESCE(t1.Col1, t2.Col1) AS ResultCol
FROM Table1 t1 FULL JOIN Table2 t2 ON t1.Col1 = t2.Col1
WHERE t1.Col1 IS NULL OR t2.Col1 IS NULL
See example on SQLFiddle