How to find different rows in two tables with same columns? - sql

I have two tables that have exact same set of columns. I'd like to select all rows that don't exactly match. Is there a way to do that without joining by every column or typing every column's name in any other way (I have a large number of them)?

If the number, type and order of the columns are exactly the same, you can use the EXCEPT operator (MINUS in some DBMSs) to remove from the first table all rows that match a row in the second table (comparing every column).
SELECT *
FROM table1
EXCEPT
SELECT *
FROM table2;
(Use EXCEPT ALL if you don't want or need duplicate elimination. If you also want the rows that exist only in the second table, run a second EXCEPT with the operands interchanged and combine both results with UNION (or UNION ALL). When in doubt, use parentheses to group the operations as needed.)
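Here is a minimal sketch of that two-way comparison. It runs against an in-memory SQLite database through Python's standard sqlite3 module, so it is self-contained. The tables a and b and their data are hypothetical. SQLite does not accept a parenthesized compound SELECT as an operand, and it has no EXCEPT ALL. For that reason each EXCEPT is wrapped in a derived table and duplicates are eliminated. A tag column records which side each row came from.

```python
import sqlite3


def symmetric_difference(conn: sqlite3.Connection) -> list[tuple]:
    """Return rows found in only one of the hypothetical tables a and b.

    Each row is prefixed with a tag naming the side it came from.
    """
    query = """
        SELECT 'only_in_a' AS side, * FROM (SELECT * FROM a EXCEPT SELECT * FROM b)
        UNION ALL
        SELECT 'only_in_b' AS side, * FROM (SELECT * FROM b EXCEPT SELECT * FROM a)
        ORDER BY 1, 2
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT, qty INTEGER);
    CREATE TABLE b (id INTEGER, name TEXT, qty INTEGER);
    INSERT INTO a VALUES (1, 'apple', 5), (2, 'pear', 3), (3, 'plum', 7);
    INSERT INTO b VALUES (1, 'apple', 5), (2, 'pear', 4), (4, 'fig', 1);
""")
for row in symmetric_difference(conn):
    print(row)
```

A row whose non-key values changed (id 2 here) shows up once on each side. A row that exists in only one table shows up once.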

Use MINUS (the operator Oracle uses for EXCEPT):
select * from tableA
minus
select * from tableB
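If the query returns no rows, every row of tableA also appears in tableB. That alone does not prove the data is exactly the same, because tableB may contain extra rows. Run the query in the other direction as well (tableB minus tableA). MINUS also removes duplicates, so a row that appears a different number of times in each table is not detected.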

You could join on the primary key and compare all other columns using:
SELECT *
FROM src s
FULL OUTER JOIN trg t
ON s.id = t.id
WHERE NOT EXISTS (SELECT s.col1, s.col2, s.col3, s.col4
INTERSECT
SELECT t.col1, t.col2, t.col3, t.col4);
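Note that this approach lets you compare the data side by side. INTERSECT treats two NULLs as equal, so a NULL in the same column on both sides is not reported as a difference. A plain s.col1 <> t.col1 comparison, by contrast, would silently skip any pair where either value is NULL.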
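The side-by-side comparison can be tried out with a sketch like the following, again using SQLite through Python's sqlite3 module. The names src, trg, col1 and col2 are hypothetical. Two things differ from the query above. First, SQLite only supports FULL OUTER JOIN from version 3.39.0, so this sketch uses a LEFT JOIN and reports only rows that changed or are missing from trg. Second, the key is included in the INTERSECT so that a src row without a trg partner is always reported.

```python
import sqlite3


def changed_rows(conn: sqlite3.Connection) -> list[tuple]:
    """Return src rows joined to trg rows on id where any column differs.

    INTERSECT treats two NULLs as equal, so a NULL on both sides is not a
    difference. t.id is compared too, so a src row with no trg partner
    (all t.* columns NULL) is always reported.
    """
    query = """
        SELECT s.id, s.col1, s.col2, t.id, t.col1, t.col2
        FROM src s
        LEFT JOIN trg t ON s.id = t.id
        WHERE NOT EXISTS (SELECT s.id, s.col1, s.col2
                          INTERSECT
                          SELECT t.id, t.col1, t.col2)
        ORDER BY s.id
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER PRIMARY KEY, col1 TEXT, col2 INTEGER);
    CREATE TABLE trg (id INTEGER PRIMARY KEY, col1 TEXT, col2 INTEGER);
    INSERT INTO src VALUES (1, 'a', 10), (2, 'b', NULL), (3, 'c', 30), (4, 'd', 40);
    INSERT INTO trg VALUES (1, 'a', 10), (2, 'b', NULL), (3, 'c', 31), (5, 'e', 50);
""")
for row in changed_rows(conn):
    print(row)
```

Row 2 (NULL on both sides) is treated as equal, row 3 differs in col2, and row 4 has no match in trg. Row 5 exists only in trg, so a LEFT JOIN cannot see it. Where FULL OUTER JOIN is available, use it instead.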
EDIT:
That still requires explicitly mentioning every column? I'd rather not.
Yes, but you could drag and drop the columns from the object explorer (SSMS/TOAD/Oracle SQL Developer) to avoid typing them manually.
There is SELECT * EXCEPT (only in Google BigQuery):
SELECT *
FROM src s
FULL OUTER JOIN trg t
ON s.id = t.id
WHERE NOT EXISTS (SELECT s.* EXCEPT (id)
INTERSECT DISTINCT
SELECT t.* EXCEPT (id));

Related

Database Table Content Comparison

We use SAP HANA as our database.
How can I check whether two tables have the same content?
I already compared the primary keys using this SQL:
select COUNT (*) from Schema.table1;
select COUNT (*) from Schema.table2;
select COUNT (*)
from Schema.table1 p
join schema.table2 r
on p.keyPart1 = r.keyPart1
and p.keyPart2 = r.keyPart2
and p.keyPart3 = r.keypart3;
So I compared the row counts of both tables and of the join. All row counts are the same.
But I still don't know whether the content of all rows is exactly the same. One or more cells of a non-key column could differ.
I thought about putting all columns in the join condition, but that did not feel right.
You might want to use EXCEPT in both directions. If both queries return no rows, the two tables contain the same set of rows:
SELECT * FROM A
EXCEPT
SELECT * FROM B;
SELECT * FROM B
EXCEPT
SELECT * FROM A;
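Both tables have a primary key, so neither contains duplicate rows, and the row counts already match. Under those conditions a single direction is actually enough. If table1 EXCEPT table2 is empty, every row of table1 is in table2, and two duplicate-free tables of equal size with that property hold the same rows. Below is a minimal sketch of that check, using SQLite via Python's sqlite3 module as a stand-in for SAP HANA. The helper name same_content, the simplified tables and their data are hypothetical. The table names are interpolated into the SQL, so they must come from trusted code.

```python
import sqlite3


def same_content(conn: sqlite3.Connection, first: str, second: str) -> bool:
    """Return True if two tables hold exactly the same rows.

    Assumes both tables have a primary key, so neither has duplicate rows,
    and that their columns are in the same order. With equal row counts, an
    empty `first EXCEPT second` then implies the two sets of rows are equal.
    Table names are interpolated, so they must be trusted.
    """
    count_first = conn.execute(f"SELECT COUNT(*) FROM {first}").fetchone()[0]
    count_second = conn.execute(f"SELECT COUNT(*) FROM {second}").fetchone()[0]
    if count_first != count_second:
        return False
    not_in_second = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT * FROM {first} EXCEPT SELECT * FROM {second})"
    ).fetchone()[0]
    return not_in_second == 0


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (k1 INT, k2 INT, k3 INT, val TEXT, PRIMARY KEY (k1, k2, k3));
    CREATE TABLE table2 (k1 INT, k2 INT, k3 INT, val TEXT, PRIMARY KEY (k1, k2, k3));
    INSERT INTO table1 VALUES (1, 1, 1, 'x'), (1, 1, 2, 'y');
    INSERT INTO table2 VALUES (1, 1, 1, 'x'), (1, 1, 2, 'y');
""")
print(same_content(conn, "table1", "table2"))  # True
conn.execute("UPDATE table2 SET val = 'z' WHERE k3 = 2")
print(same_content(conn, "table1", "table2"))  # False: one non-key cell differs
```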

SQL insert into, where not exists (select 1... what this "1" stands for?

INSERT INTO table1
SELECT * FROM table2
WHERE NOT EXISTS
(SELECT 1 FROM table1
WHERE table2.id = table1.id)
What is the role of that 1 in the fourth line of code? I want to make an incremental update of table1 with records from table2. A friendly soul advised me to use the above query, which I see very often on the web for incremental updates of a table. Can someone please explain how this mechanism works?
Exists checks for the presence of rows in the sub-select, not for the data returned by those rows.
So we are only interested if there is a row or not.
But as you can't have a select without selecting something, you need to put an expression into the select list.
That could be any expression; the actual expression is of no interest. You could use select some_column, select *, select null or select 42, and they would all behave the same.
With EXISTS (sub-select) you can select anything. The only thing that matters is whether a row is found (EXISTS is true) or no row is found (EXISTS is false).
The EXISTS keyword, as the name suggests, is used to determine whether or not any rows exist in a table that meet the specified condition. Since we only need to filter out those rows which meet the condition, but do not need to actually retrieve the values of individual columns, we use select 1 instead. For what it's worth, you can also write it as
INSERT INTO table1
SELECT * FROM table2
WHERE NOT EXISTS
(SELECT id FROM table1
WHERE table2.id = table1.id)
without affecting the filtering logic.
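To see that the selected expression really does not matter, the incremental insert can be run with several different select lists. This is a sketch using SQLite via Python's sqlite3 module. The table contents and the helper names make_db and incremental_insert are hypothetical. select_expr is spliced into the SQL text for demonstration only.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build two small example tables with made-up data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO table1 VALUES (1, 'one'), (2, 'two');
        INSERT INTO table2 VALUES (2, 'two'), (3, 'three'), (4, 'four');
    """)
    return conn


def incremental_insert(conn: sqlite3.Connection, select_expr: str = "1") -> int:
    """Copy rows of table2 whose id is not in table1 yet; return how many.

    select_expr is whatever the EXISTS subquery selects (illustration only,
    never pass untrusted text). It has no effect on the result, because
    EXISTS only asks whether the subquery returns any row.
    """
    cur = conn.execute(f"""
        INSERT INTO table1
        SELECT * FROM table2
        WHERE NOT EXISTS (SELECT {select_expr} FROM table1
                          WHERE table2.id = table1.id)
    """)
    return cur.rowcount


for expr in ("1", "id", "NULL", "42", "*"):
    conn = make_db()
    inserted = incremental_insert(conn, expr)
    ids = [row[0] for row in conn.execute("SELECT id FROM table1 ORDER BY id")]
    print(expr, inserted, ids)  # each variant inserts 2 rows, leaving ids [1, 2, 3, 4]
```

Running incremental_insert a second time on the same connection inserts nothing. That is what makes the pattern suitable for incremental updates.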

Different results from using IN and EXISTS

I have a query that has been giving me fits. Basically I want a left outer join, but without using a join.
I started off using IN and got back about 13,000 rows. If I use EXISTS, I then get about 11,000 rows. Even if I use GROUP BY to make sure duplicates aren't counted, there's still a difference.
Here's some code.
This one uses EXISTS:
SELECT upper(EMAIL_ADDRESS)
FROM DATA.CRM_CONTACTS
WHERE EXISTS
(
SELECT upper(Email_address)
FROM DATA.MMBI
WHERE DATA.CRM_CONTACTS.Email_address = DATA.MMBI.Email_Address
)
group by 1
order by 1
And this is code that uses IN:
SELECT upper(EMAIL_ADDRESS)
FROM DATA.CRM_CONTACTS
WHERE upper(EMAIL_ADDRESS) IN
(
SELECT upper(Email_address)
FROM DATA.MMBI
)
group by 1
order by 1
Is there any reason that would explain why I'm getting different results?
Assuming that you're using SQL Server:
In your IN version, you're making a case-insensitive comparison by uppercasing both values being compared:
WHERE upper(EMAIL_ADDRESS) IN ( SELECT upper(Email_address)
FROM DATA.MMBI
)
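In your EXISTS version, the join criterion for the correlated subquery is this:
WHERE DATA.CRM_CONTACTS.Email_address = DATA.MMBI.Email_Address
This means the comparison uses the collation in effect for those columns, which might be case-sensitive. If it is, addresses that differ only in letter case match in the IN version but not in the EXISTS version. That would explain why EXISTS returns fewer rows.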
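The effect of collation can be reproduced with SQLite. Its default BINARY collation is case-sensitive, so it plays the role of a case-sensitive SQL Server collation here. The lowercase tables crm_contacts and mmbi are hypothetical stand-ins for DATA.CRM_CONTACTS and DATA.MMBI, and the data is made up. The third query shows one possible fix: apply upper() on both sides inside the EXISTS as well. In SQL Server, specifying a case-insensitive collation with COLLATE would be an alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crm_contacts (email_address TEXT);
    CREATE TABLE mmbi (email_address TEXT);
    INSERT INTO crm_contacts VALUES ('ann@example.com'), ('Bob@Example.com');
    INSERT INTO mmbi VALUES ('ann@example.com'), ('bob@example.com');
""")

# EXISTS with a plain "=": the default BINARY collation is case-sensitive,
# so 'Bob@Example.com' does not match 'bob@example.com'.
exists_rows = conn.execute("""
    SELECT upper(c.email_address) FROM crm_contacts c
    WHERE EXISTS (SELECT 1 FROM mmbi m
                  WHERE c.email_address = m.email_address)
    ORDER BY 1
""").fetchall()

# IN with upper() on both sides: effectively case-insensitive.
in_rows = conn.execute("""
    SELECT upper(c.email_address) FROM crm_contacts c
    WHERE upper(c.email_address) IN (SELECT upper(email_address) FROM mmbi)
    ORDER BY 1
""").fetchall()

# Uppercasing inside the EXISTS as well makes both versions agree.
exists_ci_rows = conn.execute("""
    SELECT upper(c.email_address) FROM crm_contacts c
    WHERE EXISTS (SELECT 1 FROM mmbi m
                  WHERE upper(c.email_address) = upper(m.email_address))
    ORDER BY 1
""").fetchall()

print(exists_rows)     # [('ANN@EXAMPLE.COM',)]
print(in_rows)         # [('ANN@EXAMPLE.COM',), ('BOB@EXAMPLE.COM',)]
print(exists_ci_rows)  # [('ANN@EXAMPLE.COM',), ('BOB@EXAMPLE.COM',)]
```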

SQL: how do you look for missing ids?

Suppose I have a table with lots of rows identified by a unique ID. Now I have a (rather large) user-input list of ids (not a table) that I want to check are already in the database.
So I want to output the ids that are in my list, but not in the table. How do I do that with SQL?
EDIT: I know I can do that with a temporary table, but I'd really like to avoid that if possible.
EDIT: Same comment for using an external programming language.
Try this:
SELECT t1.id FROM your_list t1
LEFT JOIN your_table t2
ON t1.id = t2.id
WHERE t2.id IS NULL
It is hardly possible to write a single, pure and general SQL query for your task, since it requires working with a list. A list is not a relational concept, and the standard set of list operations is too limited. For some DBMSs it is possible to write a single SQL query, but it will use that DBMS's SQL dialect and be specific to it.
You haven't mentioned:
which RDBMS will be used;
what is the source of the IDs.
So I will assume PostgreSQL is used and that the IDs to be checked are loaded into a (temporary) table.
Consider the following:
CREATE TABLE test (id integer, value char(1));
INSERT INTO test VALUES (1,'1'), (2,'2'), (3,'3');
CREATE TABLE temp_table (id integer);
INSERT INTO temp_table VALUES (1),(5),(10);
You can get your results like this:
SELECT * FROM temp_table WHERE NOT EXISTS (
SELECT id FROM test WHERE id = temp_table.id);
or
SELECT * FROM temp_table WHERE id NOT IN (SELECT id FROM test);
or
SELECT * FROM temp_table LEFT JOIN test USING (id) WHERE test.id IS NULL;
You can pick any of these options; depending on your data volumes, their performance may differ.
Just a note: some RDBMSs limit the number of expressions that can be listed literally inside an IN() construct, so keep this in mind. I hit this several times with Oracle, which allows at most 1000 expressions in a list.
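The three variants can be checked against each other with the test data above. This sketch runs them on SQLite through Python's sqlite3 module instead of PostgreSQL; for these queries the semantics are the same. It also shows a caveat worth knowing. Once the NOT IN subquery returns a NULL, NOT IN returns no rows at all, while NOT EXISTS and the LEFT JOIN are unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (id INTEGER, value CHAR(1));
    INSERT INTO test VALUES (1, '1'), (2, '2'), (3, '3');
    CREATE TEMP TABLE temp_table (id INTEGER);
    INSERT INTO temp_table VALUES (1), (5), (10);
""")

queries = {
    "not_exists": """SELECT id FROM temp_table WHERE NOT EXISTS (
                         SELECT id FROM test WHERE id = temp_table.id)""",
    "not_in": "SELECT id FROM temp_table WHERE id NOT IN (SELECT id FROM test)",
    "left_join": """SELECT temp_table.id FROM temp_table
                    LEFT JOIN test USING (id) WHERE test.id IS NULL""",
}

results = {name: sorted(r[0] for r in conn.execute(sql)) for name, sql in queries.items()}
print(results)  # every variant returns [5, 10]

# Add a NULL id to test: "5 NOT IN (1, 2, 3, NULL)" evaluates to NULL, not true,
# so the NOT IN variant now returns nothing.
conn.execute("INSERT INTO test VALUES (NULL, 'x')")
print(sorted(r[0] for r in conn.execute(queries["not_in"])))      # []
print(sorted(r[0] for r in conn.execute(queries["not_exists"])))  # [5, 10]
```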
EDIT: To satisfy the constraints of no temp tables and no external languages, you can use the following construct:
SELECT DISTINCT b.id
FROM test a RIGHT JOIN (
SELECT 1 id UNION ALL
SELECT 5 UNION ALL
SELECT 10) b ON a.id=b.id
WHERE a.id IS NULL;
Unfortunately, you'll have to generate lots of SELECT x UNION ALL entries to build a single-column, many-row table here. I use UNION ALL rather than UNION to avoid an unnecessary duplicate-elimination (sorting) step.
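Here is a variation on the same idea, sketched in Python with SQLite. Instead of UNION ALL, a VALUES row constructor (supported, for example, by PostgreSQL and SQLite) builds the single-column table. The ids are passed as bound parameters rather than pasted into the SQL text. The helper name missing_ids is hypothetical. SQLite also caps the number of bound parameters per statement (999 by default before version 3.32.0), which is similar to the IN() limit mentioned above.

```python
import sqlite3


def missing_ids(conn: sqlite3.Connection, ids: list[int]) -> list[int]:
    """Return the ids from `ids` that have no row in table test.

    The list becomes a one-column derived table via VALUES. Only "(?)"
    placeholders are generated; the values themselves are bound parameters.
    """
    if not ids:
        return []
    placeholders = ", ".join("(?)" for _ in ids)
    query = f"""
        WITH b(id) AS (VALUES {placeholders})
        SELECT DISTINCT b.id
        FROM b LEFT JOIN test a ON a.id = b.id
        WHERE a.id IS NULL
        ORDER BY b.id
    """
    return [row[0] for row in conn.execute(query, ids)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (id INTEGER, value CHAR(1));
    INSERT INTO test VALUES (1, '1'), (2, '2'), (3, '3');
""")
print(missing_ids(conn, [1, 5, 10]))  # [5, 10]
```

The LEFT JOIN from the list side is equivalent to the RIGHT JOIN in the query above.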

SQL select from data in query where this data is not already in the database?

Before making a web service call, I want to check my database for records I have already stored.
Here is what I imagine the query looks like; I just can't seem to figure out the syntax:
SELECT *
FROM (1,2,3,4) as temp_table
WHERE temp_table.id
LEFT JOIN table ON id IS NULL
Is there a way to do this? What is a query like this called?
I want to pass a list of ids to MySQL and have it return the ids that are not already in the database.
Use:
SELECT x.id
FROM (SELECT #param_1 AS id
FROM DUAL
UNION ALL
SELECT #param_2
FROM DUAL
UNION ALL
SELECT #param_3
FROM DUAL
UNION ALL
SELECT #param_4
FROM DUAL) x
LEFT JOIN TABLE t ON t.id = x.id
WHERE t.id IS NULL
If you need to support a varying number of parameters, you can either use:
a temporary table to populate & join to
MySQL's Prepared Statements to dynamically construct the UNION ALL statement
To confirm I've understood correctly, you want to pass in a list of numbers and see which of those numbers isn't present in the existing table? In effect:
SELECT Item
FROM IDList I
LEFT JOIN TABLE T ON I.Item=T.ID
WHERE T.ID IS NULL
It looks like you're OK with building this query on the fly. In that case you can do this with a numbers / tally table by changing the above into:
SELECT Number
FROM (SELECT Number FROM Numbers WHERE Number IN (1,2,3,4)) I
LEFT JOIN TABLE T ON I.Number=T.ID
WHERE T.ID IS NULL
This is relatively prone to SQL Injection attacks though because of the way the query is being built. It'd be better if you could pass in '1,2,3,4' as a string and split it into sections to generate your numbers list to join against in a safer way - for an example of how to do that, see http://www.sqlteam.com/article/parsing-csv-values-into-multiple-rows
All of this presumes you've got a numbers / tally table in your database, but they're sufficiently useful in general that I'd strongly recommend you do.
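If a numbers table is not available, the string-splitting approach can also be done in SQL. The sketch below assumes SQLite (via Python's sqlite3 module). It uses a recursive common table expression to split a single bound parameter such as '1,2,3,4' into rows, then anti-joins against a hypothetical items table. This is a swapped-in technique, not the one from the linked article. It assumes the string contains plain comma-separated integers.

```python
import sqlite3

# Recursive CTE: each step peels the text before the first comma off `rest`.
# The anchor row has id NULL and is filtered out in the final WHERE clause.
SPLIT_AND_FIND_MISSING = """
    WITH RECURSIVE split(id, rest) AS (
        SELECT NULL, ? || ','
        UNION ALL
        SELECT CAST(substr(rest, 1, instr(rest, ',') - 1) AS INTEGER),
               substr(rest, instr(rest, ',') + 1)
        FROM split
        WHERE rest <> ''
    )
    SELECT DISTINCT s.id
    FROM split s
    LEFT JOIN items t ON t.id = s.id
    WHERE s.id IS NOT NULL AND t.id IS NULL
    ORDER BY s.id
"""


def missing_from_csv(conn: sqlite3.Connection, csv_ids: str) -> list[int]:
    """Return ids listed in csv_ids (e.g. '1,2,3,4') that are absent from items."""
    if not csv_ids.strip():
        return []
    return [row[0] for row in conn.execute(SPLIT_AND_FIND_MISSING, (csv_ids,))]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY);
    INSERT INTO items VALUES (1), (3);
""")
print(missing_from_csv(conn, "1,2,3,4"))  # [2, 4]
```

The whole list travels as one string parameter, so nothing from the user's input is spliced into the SQL text.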
SELECT * FROM table where id NOT IN (1,2,3,4)
I would probably just do:
SELECT id
FROM table
WHERE id IN (1,2,3,4);
And then process the list of results, removing any returned by the query from your list of "records to submit".
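Here is a minimal sketch of that application-side approach in Python with SQLite. The records table and the ids_to_submit helper are hypothetical. The query only returns the ids the database already has, and Python keeps the others in their original order. The placeholders are generated, but the values themselves are bound parameters.

```python
import sqlite3


def ids_to_submit(conn: sqlite3.Connection, candidate_ids: list[int]) -> list[int]:
    """Return candidate ids that are not yet stored in table records."""
    if not candidate_ids:
        return []
    placeholders = ", ".join("?" for _ in candidate_ids)
    found = {
        row[0]
        for row in conn.execute(
            f"SELECT id FROM records WHERE id IN ({placeholders})", candidate_ids
        )
    }
    # Keep the caller's order and drop every id the query returned.
    return [i for i in candidate_ids if i not in found]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (id INTEGER PRIMARY KEY);
    INSERT INTO records VALUES (2), (4);
""")
print(ids_to_submit(conn, [1, 2, 3, 4]))  # [1, 3]
```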
How about a nested query? This may work. If not, it may point you in the right direction.
SELECT * FROM table WHERE id NOT IN (
SELECT id FROM table WHERE 1
);