SQL - Compare all column's data from two tables efficiently - sql

I have two tables say Table1 and Table2 with same columns and same schema structure.
Now I want to select data from Table1 which is present in Table2. However, when comparing data, I want to compare all the columns present in both these table. Like, entire record in Table1 should be present in Table2. What is the fastest and most efficient way to achieve this in SQL Server 2008? Both the tables contain around 1000-2000 records and these tables will get accessed very frequently.

The intersect operator does just that:
SELECT *
FROM table1
INTERSECT
SELECT *
FROM table2

With an " exists", you have a solution :
select * from Table1 t1 where exists (select 1 from Table2 t2
where t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.col3 = t2.col3
and ... // check here all columns ...
)
There is however a little problem in this solution in the case of null values, which can only be tested via a "IS NOT NULL" or "IS NULL", hence the complementary solution:
select * from Table1 t1 where exists (select 1 from Table2 t2
where (t1.col1 = t2.col1 or (t1.col1 IS NULL and t2.col1 IS NULL))
and (t1.col2 = t2.col2 or (t1.col2 IS NULL and t2.col2 IS NULL))
and (t1.col3 = t2.col3 or (t1.col3 IS NULL and t2.col3 IS NULL))
and ... // check here all columns ...
)

Related

Compare Two SQL identical Table to find missing records

I am working on SQL 2008. I have two identical tables with same column names.
On Table2, i am missing some records. Some records got deleted in the Table2.
I have to compare Table1 and Table2 and retrieve only missing records from table1.
Use a LEFT JOIN and check for IS NULL like below. where t2.col2 is null will be TRUE only when there are records in table1 which are not present in table2. Which is what you are looking for. [This is a sample code and have no resemblance with your original query]
select t1.col1,t1.col2,t1.col3
from table1 t1
left join table2 t2 on t1.some_column = t2.some_column
where t2.col2 is null
You should use SQL Except. There is no Join involved.
Select * from dbo.TableA
Except
Select * from dbo.TableB
In set theory, the difference of sets A, B (A-B) is the set of elements that belong to A and do not belong to B.
With an " not exists", you have a solution :
select * from Table1 t1 where not exists (select 1 from Table2 t2
where t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.col3 = t2.col3
and ... // check here all columns ...
)
There is however a little problem in this solution in the case of null values, which can only be tested via a "IS NOT NULL" or "IS NULL", hence the complementary solution:
select * from Table1 t1 where not exists (select 1 from Table2 t2
where (t1.col1 = t2.col1 or (t1.col1 IS NULL and t2.col1 IS NULL))
and (t1.col2 = t2.col2 or (t1.col2 IS NULL and t2.col2 IS NULL))
and (t1.col3 = t2.col3 or (t1.col3 IS NULL and t2.col3 IS NULL))
and ... // check here all columns ...
)

Delete Rows in a table Based on column value in third table

I have three tables Table1, Table2 and Table3 and the following query which deletes rows in Table1
delete from Table1
where EXISTS
(select (1) from Table2
where Table1.col1=Table2.col1
AND Table1.col2=Table2.col2
AND Table1.col3=(select **Table3.col3 from Table3** inner join Table2 on Table3.col1=Table2.col1)
Is this query correct? If not, how to use a third table inside the where condition?
Edit : Also, please explain how to rewrite the query if we want to delete the rows from table2 which itself is joined with table3?
Here's one way to do it:
delete from table1
where (col1, col2, col3) in (
select t1.col1, t1.col2, t1.col3
from table1 t1
join table2 t2 on t1.col1 = t2.col1 and t1.col2 = t2.col2
join table3 t3 on t1.col3 = t3.col3 and t2.col1 = t3.col1
);
SQL Fiddle Demo
Or using EXISTS might be faster:
delete table1
where exists (
select *
from table2
join table3 on table2.col1 = table3.col1
where table1.col3 = table3.col3 and
table1.col1 = table2.col1 and
table1.col2 = table2.col2);

fields that don't match

I have 3 fields per row in table 1 that I want to compare to the exact same fields per row in table 2
I have been playing around with NOT IN, but I am not having any luck. Can anyone help?
Basically I want to see all records from both tables where field 1, field 2, field 3 don't match in table 2
How would this be written?
SELECT *
FROM Table1 T1
FULL OUTER JOIN Table2 T2
ON T1.col1 = T2.col1
AND T1.col2 = T2.col2
AND T1.col3 = T2.col3
WHERE T1.col1 <> T2.col1
OR T1.col2 <> T2.col2
OR T1.col3 <> T2.col3
jyparask's answer is close, but it may not work if some of your columns are nullable and the where clause is off a bit. In that case you need to either coalesce the columns to a value they couldn't actually contain, or check for nulls manually.
I'm going to assume col1 is not nullable, and cols 2 and 3 are
SELECT *
FROM Table1 T1
FULL OUTER JOIN Table2 T2
ON T1.col1 = T2.col1
AND (T1.col2 = T2.col2 or (T1.col2 is null and T2.col2 is null))
AND (T1.col3 = T2.col3 or (T1.col3 is null and T2.col3 is null))
WHERE T1.col1 is null or T2.col1 is null # assuming col1 is not null
Or with a coalesce:
SELECT *
FROM Table1 T1
FULL OUTER JOIN Table2 T2
ON T1.col1 = T2.col1
AND COALESCE(T1.col2,-1) = COALESCE(T2.col2,-1)
AND COALESCE(T1.col3,-1) = COALESCE(T2.col3,-1)
WHERE T1.col1 is null or T2.col1 is null # assuming col1 is not null

compare two table values

I have 2 tables table A and table B. In table B we have to check if all the column entered is exactly as in table A, means if a row exists in table B then the same row will be there in table A too. also table A may have rows which are not in table B. if there is a row which is not in table A and is there in table B, an alert should be displayed showing which element is extra in table B.
Can we do this using join? if so what will be the sql code?
this is the best picture about joins i've ever seen :)
You probably want to have a look at the following article
SQL SERVER – Introduction to JOINs – Basic of JOINs
This should give you a very clear understanding of JOINs in Sql.
From there you should be able to find the solution.
As an example, you would have to look at something like
TABLE1
Col1
Col2
Col3
Col4
TABLE2
Col1
Col2
Col3
Col4
--all rows that match
SELECT *
FROM TABLE1 t1 INNER JOIN
TABLE2 t2 ON t1.Col1 = t2.Col1
AND t1.Col2 = t2.Col2
...
AND t1.Col3 = t2.Col3
--rows only in TABLE1
SELECT *
FROM TABLE1 t1 LEFT JOIN
TABLE2 t2 ON t1.Col1 = t2.Col1
AND t1.Col2 = t2.Col2
...
AND t1.Col3 = t2.Col3
WHERE t2.Col1 IS NULL
--rows only in TABLE2
SELECT *
FROM TABLE1 t2 LEFT JOIN
TABLE2 t1 ON t1.Col1 = t2.Col1
AND t1.Col2 = t2.Col2
...
AND t1.Col3 = t2.Col3
WHERE t1.Col1 IS NULL
If you want to compare based on single column, then you can do something like this:
SELECT ID FROM B LEFT JOIN A ON B.ID = A.ID WHERE A.ID IS NULL;
The above query will give you the list of records that are not present in A but in B.
Instead if you want to compare the entire row, you can use the following approach:
SELECT COUNT(*) FROM B;
SELECT COUNT(*) FROM A;
SELECT COUNT(*) FROM (
SELECT * FROM B UNION SELECT * FROM A
)
If all the queries returns the same count then you can assume that both the tables are exactly equal.

Getting next row using SQL

I have 1 table with data thus:
Col1 Col2
------- --------
Admin001 A
Admin001 B
Admin002 C
Admin002 C
Admin003 A
Admin003 C
I need to find all instances of Col2 values with 'A' immediately followed by 'B'. 'A' followed by any other symbol does not count. Is there a way to use SQL to accomplish this?
Environment is DB2 LUW v9.5
Update:
How can I do this if I make the table like below?
Col1 Col2 Col3
---- ------- --------
1 Admin001 A
2 Admin002 C
3 Admin002 C
4 Admin003 A
5 Admin003 C
6 Admin001 B
7 Admin001 A
8 Admin001 C
9 Admin001 B
Given that there is no implicit ordering of a set, then no, there isn't any reliable way to do this. Your data will need to be ordered (perhaps by a third column, or by column 1) for this to make any sense.
SELECT DISTINCT T1.Col2
FROM Table T1 INNER JOIN Table T2
ON T2.Col2 = T1.Col2 AND T2.Col1 = (T1.Col1 + 1)
WHERE T1.Col3 = 'A' AND T2.Col3 = 'B'
Update: As mentioned by Peter Lang, below, this will not work if the sequence in Col1 is interrupted. This version handles that situation and is more guaranteed to produce the correct result although if you're 100% certain the sequence will not be interrupted (that is, if you generate the sequence yourself in the same transaction as the analysis) the first should be faster:
SELECT DISTINCT T1.Col2
FROM Table T1 INNER JOIN Table T2
ON T2.Col2 = T1.Col2
AND T2.Col1 = (SELECT MIN(Col1) FROM Table T3 WHERE T3.Col1 > T1.Col1)
WHERE T1.Col3 = 'A' AND T2.Col3 = 'B'
It looks like you're trying to find out who's grade dropped from A to B, so we'll also assume that you want the results where B follows A for the same admin.
SELECT DISTINCT t1.Col2
FROM table t1
INNER JOIN table t2 ON t1.Col2 = t2.Col2
LEFT OUTER JOIN table t3 ON t1.Col2 = t3.Col2
AND t3.Col1 < t2.Col1 AND t3.Col1 > t1.Col1
WHERE t1.Col3 = 'A'
AND t2.Col3 = 'B' AND t2.Col1 > t1.Col1
AND t3.Col1 IS NULL
This yields any admin who has 'A' followed by 'B'.
The INNER JOIN and the first two expressions in the WHERE clause finds all records where 'B' occurs after 'A'. The left OUTER join and the last expression in the WHERE clause finds all records where there are grades between the A and B, and only takes the records without.
You asked to get these results, one per row, like this:
Col1 Col2 Col3
---- ------- --------
1 Admin001 A
6 Admin001 B
I'm going to adapt the above query the easy way.
I'll simply get the A records, get the B records, and union them:
(SELECT t1.Col1, t1.Col2, t1.Col3
FROM table t1
INNER JOIN table t2 ON t1.Col2 = t2.Col2
LEFT OUTER JOIN table t3 ON t1.Col2 = t3.Col2
AND t3.Col1 < t2.Col1 AND t3.Col1 > t1.Col1
WHERE t1.Col3 = 'A'
AND t2.Col3 = 'B' AND t2.Col1 > t1.Col1
AND t3.Col1 IS NULL)
UNION
(SELECT t2.Col1, t2.Col2, t2.Col3
FROM table t1
INNER JOIN table t2 ON t1.Col2 = t2.Col2
LEFT OUTER JOIN table t3 ON t1.Col2 = t3.Col2
AND t3.Col1 < t2.Col1 AND t3.Col1 > t1.Col1
WHERE t1.Col3 = 'A'
AND t2.Col3 = 'B' AND t2.Col1 > t1.Col1
AND t3.Col1 IS NULL)
ORDER BY Col2, Col1
Notice that we're ordering by Col2 first, then Col1. You may also get more than one set of records for each user.
How are you sorting the columns? If you aren't sorting them, you could get different results each time, as sometimes A would follow B, and sometimes B would follow A. If you are sorting them, you may be able to use an 'exists' test with the sorting expression.
There is no general method of getting the next (or previous) row in SQL, but many implementations provide their own built-in functions to help with that kind of thing. Unfortunately I'm not familiar with DB2.
This query assumes that col1 is some sort of sequence or timestamp for each row. Without it, there's no way to determine if A happened before or after B.
WITH sorted AS
(SELECT col1, col2, col3, ROWNUMBER()
OVER (PARTITION BY col2 ORDER BY col1) AS col4
FROM sometable
)
SELECT a.col1, a.col2, a.col3, b.col1, b.col3
FROM sorted a INNER JOIN sorted b
ON a.col2 = b.col2
WHERE a.col3 = 'A' AND b.col3 = 'B'
AND b.col4 = a.col4 + 1
;
I think the following should work, assuming your updated table layout with 3 columns. (Otherwise it's impossible, because no ordering is available):
select t1.col2
from yourtable t1, yourtable t2
where t1.col3 = 'A'
and t2.col3 = 'B'
and t1.col1 + 1 = t2.col1;