How to Identify matching records in two tables? - sql

I have two tables with the same column names, 40 columns in each. Both tables have the same unique IDs. If I perform an inner join on the ID columns, I get a match on 80% of the data. However, I would like to see whether the matched rows have exactly the same data in every column.
If there were only a few rows, say 50-100, I could have done a simple union ordered by ID and checked the data manually. But both tables contain more than 5000 records.
Is a join on each of the columns a valid solution for this, or do I need to concatenate the columns?

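Suppose you have N columns. Combine the two tables with UNION ALL inside a subquery, then GROUP BY COL1, COL2, ... , COLN on the combined rows. The subquery is needed because a GROUP BY written directly after the second SELECT applies only to table2, not to the union. Any row returned appears more than once in the combined data, so, given unique IDs within each table, it is present in both tables with identical values in every column.
select COL1, COL2, ... , COLN
from (
select * from table1
union all
select * from table2
) t
group by COL1, COL2, ... , COLN
having count(*)>1;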
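As a rough illustration of the grouped UNION ALL approach, here is a minimal sketch that runs the query against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for whatever database is actually in use, and the three-column tables and their contents are hypothetical substitutes for the 40-column originals.

```python
import sqlite3

# Hypothetical three-column tables standing in for the 40-column originals.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT, city TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT, city TEXT);
    INSERT INTO table1 VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Rome'), (3, 'Cy', 'Lima');
    INSERT INTO table2 VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Paris'), (4, 'Di', 'Kyiv');
""")

# Stack both tables, then group on every column. A group with more than
# one member means the same full row occurs in both tables (IDs are
# assumed unique within each table).
matching_rows = conn.execute("""
    SELECT id, name, city
    FROM (
        SELECT * FROM table1
        UNION ALL
        SELECT * FROM table2
    ) AS combined
    GROUP BY id, name, city
    HAVING COUNT(*) > 1
    ORDER BY id
""").fetchall()

print(matching_rows)  # [(1, 'Ann', 'Oslo')]
```

ID 2 exists in both tables but differs in one column, so it is not reported. One property that makes grouping attractive here is that GROUP BY places NULLs in the same group, whereas a join on every column with = would not treat two NULLs as equal.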

Related

How to merge data of two tables with different column name in Big Query

How can I get the final output based on Table 1 and Table 2 in BigQuery?
[Images: Table 1, Table 2, Final Output]
You can use union all. If the columns are in the same order:
select *
from table1
union all
select *
from table2;
In general, though, it is better to list out the column names instead of using *. Note that the result set takes its column names from the first select.
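To make the point about listing columns concrete, here is a small sketch using Python's sqlite3 module with an in-memory SQLite database as a stand-in for BigQuery. The table layouts and the column names (emp_id, emp_name, staff_no, full_name) are invented for illustration, since the original tables are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables whose columns mean the same thing but are named differently.
    CREATE TABLE table1 (emp_id INTEGER, emp_name TEXT);
    CREATE TABLE table2 (staff_no INTEGER, full_name TEXT);
    INSERT INTO table1 VALUES (1, 'Ann');
    INSERT INTO table2 VALUES (2, 'Bob');
""")

# Columns are matched by position, so each SELECT lists them in the same order.
# The aliases in the first SELECT name the columns of the combined result.
cursor = conn.execute("""
    SELECT emp_id AS id, emp_name AS name FROM table1
    UNION ALL
    SELECT staff_no, full_name FROM table2
    ORDER BY id
""")
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(column_names)  # ['id', 'name']
print(rows)          # [(1, 'Ann'), (2, 'Bob')]
```

Listing the columns explicitly also protects the query if one of the tables later gains a column or has its columns reordered.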

INTERSECT and UNION giving different counts of duplicate rows

I have two tables, A and B, with the same column names. I have to combine them into table C.
When I run the following query, the counts do not match:
select * into C
from
(
select * from A
union
select * from B
)X
The record count of C does not equal the combined record count of A and B; there is a difference of 89 rows. So I figured there must be duplicates.
I used the following query to find the duplicates:
select * from A
INTERSECT
select * from B
-- 80 rows returned
Can anybody tell me why INTERSECT returns 80 duplicates whereas the count difference when using UNION is 89?
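There are probably duplicates inside of A and/or B as well. UNION, INTERSECT, and EXCEPT (when written without ALL) perform an implicit DISTINCT on the result (logically, not necessarily physically). UNION therefore also collapses rows that are repeated within A or within B alone, and INTERSECT does not report those.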
Duplicate rows are usually a data-quality issue or an outright bug. I usually mitigate this risk by adding unique indexes on all columns and column sets that are supposed to be unique. I especially make sure that every table has a primary key if that is at all possible.
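The gap between the two counts can be reproduced on a tiny scale. The sketch below uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for the original system; the single-column tables and their values are made up so that A contains one internal duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: 'x' is duplicated inside A, 'y' occurs in both tables.
    CREATE TABLE A (val TEXT);
    CREATE TABLE B (val TEXT);
    INSERT INTO A VALUES ('x'), ('x'), ('y'), ('z');
    INSERT INTO B VALUES ('y'), ('w');
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

count_a = scalar("SELECT COUNT(*) FROM A")
count_b = scalar("SELECT COUNT(*) FROM B")
union_count = scalar(
    "SELECT COUNT(*) FROM (SELECT * FROM A UNION SELECT * FROM B)")
intersect_count = scalar(
    "SELECT COUNT(*) FROM (SELECT * FROM A INTERSECT SELECT * FROM B)")

# Rows lost by UNION: both the cross-table duplicate and the one inside A.
rows_lost_by_union = count_a + count_b - union_count

# Duplicates that live inside A alone, which INTERSECT never reports.
duplicates_inside_a = conn.execute(
    "SELECT val, COUNT(*) FROM A GROUP BY val HAVING COUNT(*) > 1").fetchall()

print(rows_lost_by_union, intersect_count)  # 2 1
print(duplicates_inside_a)                  # [('x', 2)]
```

Running the same GROUP BY ... HAVING check against each real table separately should show where the remaining rows in the count difference come from.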

SQL queries producing unexpected results

I've got a strange situation with two SQL queries that aren't producing the expected results. Here are the queries:
Query 1:
SELECT DISTINCT SomeCharValue
FROM Table1
JOIN Table2
ON Table1.SomeCharValue = Table2.SomeCharValue
ORDER BY SomeCharValue
Query 2:
SELECT DISTINCT SomeCharValue
FROM Table1
JOIN Table2
ON Table1.SomeCharValue <> Table2.SomeCharValue
ORDER BY SomeCharValue
I have two tables with columns of varchar(15). Table2 is essentially a small subset of the values in Table1, thus Table1 has all values stored in Table2.
The problem is, the two queries should never produce the same results, yet they do. Both queries will produce the same result for certain values; for example, if Table1 and Table2 contain the word 'hello', then Query 1 should return it, while Query 2 should not. However, BOTH queries return 'hello'. It doesn't make sense that 'hello' in both tables is equal and not equal at the same time.
I ran a length query to test the values, and some were a different size with trailing white spaces, but even after changing these to be an exact match, and verifying the hexadecimal value of the characters to be the same, the same results occur. I can't compare numeric key fields since there is no key relationship between these tables. I can only compare the exact character values in the columns. Any ideas?
Imagine you have table1 containing a and b as separate rows, and table2 has the exact same contents.
Now for your second query, table1's row a will be compared to both the rows in table2. It will pass the ON clause when comparing to row b in table2, and hence a will be in your result set. Similarly for the b row in table1 which will pass the ON clause when compared to the a row in table2.
You could rewrite the query as
SELECT DISTINCT SomeCharValue
FROM TABLE1
WHERE SomeCharValue NOT IN (SELECT DISTINCT SomeCharValue FROM Table2)
ORDER BY SomeCharValue
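The behaviour described in this answer can be checked with a minimal sketch. It uses Python's sqlite3 module and an in-memory SQLite database as a stand-in for the original system, with invented sample values. The NOT EXISTS variant is an addition that is not part of the answer above; it is included because NOT IN returns no rows at all if the subquery produces a NULL, while NOT EXISTS is unaffected by that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: Table2 holds a subset of the values in Table1.
    CREATE TABLE Table1 (SomeCharValue VARCHAR(15));
    CREATE TABLE Table2 (SomeCharValue VARCHAR(15));
    INSERT INTO Table1 VALUES ('hello'), ('world'), ('extra');
    INSERT INTO Table2 VALUES ('hello'), ('world');
""")

def values(sql):
    """Return the first column of every row produced by the query."""
    return [row[0] for row in conn.execute(sql)]

# A <> join keeps a Table1 row if ANY Table2 row differs from it, so
# 'hello' survives because it differs from 'world', and vice versa.
not_equal_join = values("""
    SELECT DISTINCT Table1.SomeCharValue
    FROM Table1
    JOIN Table2 ON Table1.SomeCharValue <> Table2.SomeCharValue
    ORDER BY Table1.SomeCharValue
""")

# The rewrite from the answer: keep values that match NO Table2 row.
not_in = values("""
    SELECT DISTINCT SomeCharValue
    FROM Table1
    WHERE SomeCharValue NOT IN (SELECT SomeCharValue FROM Table2)
    ORDER BY SomeCharValue
""")

# Equivalent here, and not affected by NULLs in Table2.
not_exists = values("""
    SELECT DISTINCT t1.SomeCharValue
    FROM Table1 AS t1
    WHERE NOT EXISTS (
        SELECT 1 FROM Table2 AS t2 WHERE t2.SomeCharValue = t1.SomeCharValue
    )
    ORDER BY t1.SomeCharValue
""")

print(not_equal_join)  # ['extra', 'hello', 'world']
print(not_in)          # ['extra']
print(not_exists)      # ['extra']
```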
Did you try using NOT LIKE instead of <>?

Proper way of querying table columns in SQL?

I have about 6 tables where some of the columns are identical. Do I have to know which tables contain the column I'm querying on, or is there a way to write an SQL query such that I can reference a column and the database will scan the tables looking for that column?
For example, assume table1, table3, and table5 all contain the column 'Population'. Do I have to specify in my query that I want to retrieve 'Population' from each of those tables, or is there a way to only specify that I want information from the 'Population' column without specifying any tables?
select table1.population as pop1, table2.population as pop2, table5.population as pop3
from table1, table2, table5;
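This will return 3 columns showing the population from each table. Note that a comma-separated FROM list with no join condition is a cross join, so the result contains every combination of rows from the three tables.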
select population
from table1
union
select population
from table2
union
select population
from table5;
This will return a long list of populations in one column. UNION removes duplicate values; use UNION ALL if every row from every table should be kept.
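For a sense of how the two queries above differ in shape, here is a small sketch run through Python's sqlite3 module against an in-memory SQLite database. The table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: 200 appears in both table1 and table2.
    CREATE TABLE table1 (population INTEGER);
    CREATE TABLE table2 (population INTEGER);
    CREATE TABLE table5 (population INTEGER);
    INSERT INTO table1 VALUES (100), (200);
    INSERT INTO table2 VALUES (200), (300);
    INSERT INTO table5 VALUES (400);
""")

# Comma-separated FROM list: a cross join, one row per combination (2 * 2 * 1).
cross_rows = conn.execute("""
    SELECT table1.population AS pop1,
           table2.population AS pop2,
           table5.population AS pop3
    FROM table1, table2, table5
""").fetchall()

# UNION: a single column, with the repeated 200 collapsed to one row.
union_values = [row[0] for row in conn.execute("""
    SELECT population FROM table1
    UNION
    SELECT population FROM table2
    UNION
    SELECT population FROM table5
    ORDER BY population
""")]

# UNION ALL: a single column that keeps every source row.
union_all_values = [row[0] for row in conn.execute("""
    SELECT population FROM table1
    UNION ALL
    SELECT population FROM table2
    UNION ALL
    SELECT population FROM table5
    ORDER BY population
""")]

print(len(cross_rows))   # 4
print(union_values)      # [100, 200, 300, 400]
print(union_all_values)  # [100, 200, 200, 300, 400]
```

In every variant the tables still have to be named explicitly in the query.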

How to compare two tables each having 500 columns using PL-SQL

I need to compare two tables in different databases and check whether the data in the two tables match.
The comparison should return the rows that don't match, using an exact column-to-column data check.
Is this possible in PL-SQL?
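To return all rows in table1 that do not exactly match a row in table2 (in Oracle this operator has traditionally been written MINUS; EXCEPT is accepted only in newer releases):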
select * from table1 except select * from table2
And to return all rows in table1 that match exactly what is in table2:
select * from table1 intersect select * from table2
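The two set operations can be tried out with a minimal sketch like the one below, which uses Python's sqlite3 module and an in-memory SQLite database rather than Oracle. The three-column tables are hypothetical stand-ins for the 500-column originals, and the reverse EXCEPT is an addition to show that mismatches need to be checked in both directions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: row 1 is identical, row 2 differs in one column.
    CREATE TABLE table1 (id INTEGER, name TEXT, city TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT, city TEXT);
    INSERT INTO table1 VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Rome');
    INSERT INTO table2 VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Paris');
""")

# Rows of table1 that have no exact counterpart in table2.
only_in_table1 = conn.execute(
    "SELECT * FROM table1 EXCEPT SELECT * FROM table2").fetchall()

# EXCEPT is one-directional, so run it the other way round as well.
only_in_table2 = conn.execute(
    "SELECT * FROM table2 EXCEPT SELECT * FROM table1").fetchall()

# Rows that are identical in every column in both tables.
in_both = conn.execute(
    "SELECT * FROM table1 INTERSECT SELECT * FROM table2").fetchall()

print(only_in_table1)  # [(2, 'Bob', 'Rome')]
print(only_in_table2)  # [(2, 'Bob', 'Paris')]
print(in_both)         # [(1, 'Ann', 'Oslo')]
```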