Comparing all data in two tables whilst ignoring certain differences - sql

I am currently trying to compare two SQL Server database tables. I've found various methods online, some of which seem to work and others which don't.
The one I used which worked was:
select * from table1
except
select * from table2
The only issue with this (as far as I am aware) is that the query reports a difference between a NULL value and a 0. That is correct: there is a difference.
However, is there a way to do the comparison while ignoring certain differences, such as NULL versus 0?

You can use
select isnull(column1, 0) as column1 from table1
except
select isnull(column1, 0) as column1 from table2
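to treat 0 and NULL as the same.
Also, if you want to treat more values as the same, e.g. NULL = 0 = '' (empty string), you can use CASE. Keep the ELSE branch; without it, every other value would also become NULL:
select case when (column1 is null or column1 = '') then 0 else column1 end as column1 from table1
except
select case when (column1 is null or column1 = '') then 0 else column1 end as column1 from table2
If column1 is a character column, use '0' instead of 0, so the CASE does not try to convert every value to int.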
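A minimal, runnable sketch of the idea above. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption: SQLite supports EXCEPT and CASE the same way here), with hypothetical tables that differ only in NULL versus 0. A plain EXCEPT reports the NULL row; normalizing both sides first makes the difference disappear.

```python
import sqlite3

def diff_rows(conn, normalize):
    """Rows of table1 with no match in table2, computed with EXCEPT.

    With normalize=True, NULL and '' are mapped to 0 on both sides first,
    mirroring CASE ... THEN 0 ELSE column1 END.
    """
    expr = ("CASE WHEN column1 IS NULL OR column1 = '' THEN 0 ELSE column1 END"
            if normalize else "column1")
    sql = f"SELECT {expr} FROM table1 EXCEPT SELECT {expr} FROM table2"
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (column1 INTEGER)")
conn.execute("CREATE TABLE table2 (column1 INTEGER)")
# Hypothetical data: table1 has NULL where table2 has 0.
conn.executemany("INSERT INTO table1 VALUES (?)", [(None,), (5,)])
conn.executemany("INSERT INTO table2 VALUES (?)", [(0,), (5,)])

raw = diff_rows(conn, normalize=False)        # NULL has no partner in table2
normalized = diff_rows(conn, normalize=True)  # NULL becomes 0 and matches
print(raw, normalized)
```

To compare whole rows, the same normalizing expression would be applied to each nullable column in both SELECT lists.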

Related

SQL Server inconsistent results over 2 columns using = and <>

I am trying to replace a manual process with an automated one based on SQL Server 2012. Before doing this, I need to analyse the data in question over time to produce some data quality measures and statistics.
Part of this entails comparing the values in two columns: I need to count where they match and where they do not, so that I can prove my various statistics tally. This should be simple but seems not to be.
Basically, I have a table containing two columns, both defined identically as type INT with NULL values permitted.
SELECT * FROM TABLE
WHERE COLUMN1 is NULL
returns zero rows
SELECT * FROM TABLE
WHERE COLUMN2 is NULL
also returns zero rows.
SELECT COUNT(*) FROM TABLE
returns 3780
and
SELECT * FROM TABLE
returns 3780 rows.
So I have established that there are 3780 rows in my table and that there are no NULL values in the columns I am interested in.
SELECT * FROM TABLE
WHERE COLUMN1=COLUMN2
returns zero rows as expected.
Conversely, in a table of 3780 rows with no NULL values in the compared columns, I expect the following SQL
SELECT * FROM TABLE
WHERE COLUMN1<>COLUMN2
or in desperation
SELECT * FROM TABLE
WHERE NOT (COLUMN1=COLUMN2)
to return 3780 rows but it doesn't. It returns 3709!
I have tried SELECT * instead of SELECT COUNT(*) in case NULL values in other columns were affecting the result, but this made no difference: I still got 3709 rows.
Also, COLUMN1 has negative values in 73 rows. Is this what causes the issue? (But 73 + 3709 = 3782, not my 3780 rows.)
What is a better way of proving the values in these numeric columns never match?
Update 09/09/2016: At Lamak's suggestion, I isolated the 71 missing rows and found that in each one COLUMN1 was NULL and COLUMN2 was -99. So the issue is NULL values, but why doesn't
SELECT * FROM TABLE WHERE COLUMN1 is NULL
pick them up? Here is the information from the Information Schema views and system views:
ORDINAL_POSITION  COLUMN_NAME  DATA_TYPE  CHARACTER_MAXIMUM_LENGTH  IS_NULLABLE
1                 ID           int        NULL                      NO
..                ..           ..         ..                        ..
7                 COLUMN1      int        NULL                      YES
8                 COLUMN2      int        NULL                      YES
CONSTRAINT_NAME
PK__TABLE___...
name             type_desc  is_unique  is_primary_key
PK__TABLE___...  CLUSTERED  1          1
Could the NULL in CHARACTER_MAXIMUM_LENGTH be the issue?
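The missing rows can be reproduced with a small sketch of SQL's three-valued logic. It uses Python's sqlite3 module as a stand-in for SQL Server (an assumption; both evaluate a comparison with NULL as UNKNOWN) and a hypothetical table t with made-up rows. A row with a NULL satisfies neither COLUMN1 = COLUMN2 nor COLUMN1 <> COLUMN2, and NOT (COLUMN1 = COLUMN2) is still UNKNOWN, so only the three buckets together add up to the total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, column1 INTEGER, column2 INTEGER)")
# Hypothetical rows; row 3 mimics the NULL / -99 rows found in the update.
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(1, 10, 20), (2, -5, 7), (3, None, -99)])

def count(where):
    return conn.execute(f"SELECT COUNT(*) FROM t WHERE {where}").fetchone()[0]

total = count("1 = 1")
equal = count("column1 = column2")              # only rows where the test is TRUE
not_equal = count("column1 <> column2")         # NULL rows are UNKNOWN, so excluded
not_of_equal = count("NOT (column1 = column2)") # NOT UNKNOWN is still UNKNOWN
unknown = count("column1 IS NULL OR column2 IS NULL")

print(total, equal, not_equal, not_of_equal, unknown)
```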
You can find the counts with the queries below, which LEFT JOIN each row to itself (on ID) only when COLUMN1 = COLUMN2. A comparison with NULL is never true, so rows with a NULL are counted as non-matching.
--To find COLUMN1=COLUMN2 Count
--------------------------------
SELECT COUNT(T1.ID)
FROM TABLE T1
LEFT JOIN TABLE T2 ON T1.ID = T2.ID AND T1.COLUMN1 = T2.COLUMN2
WHERE T2.ID IS NOT NULL
--To find COLUMN1<>COLUMN2 Count
--------------------------------
SELECT COUNT(T1.ID)
FROM TABLE T1
LEFT JOIN TABLE T2 ON T1.ID = T2.ID AND T1.COLUMN1 = T2.COLUMN2
WHERE T2.ID IS NULL
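A runnable sketch of the LEFT JOIN counting idea, using Python's sqlite3 module as a stand-in for SQL Server and a hypothetical table t. The join is restricted to the same row (matching on id as well as on the column comparison), an assumption made here so that each row's COLUMN1 is compared only with its own COLUMN2. The two counts then add up to the row count even when NULLs are present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, column1 INTEGER, column2 INTEGER)")
# Hypothetical rows: a mismatch, a match, and a NULL row.
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(1, 10, 20), (2, 7, 7), (3, None, -99)])

# Join each row to itself only when column1 = column2.
# NULL = anything is never TRUE, so the NULL row lands in the "no match" count.
QUERY = """
    SELECT COUNT(t1.id)
    FROM t AS t1
    LEFT JOIN t AS t2 ON t1.id = t2.id AND t1.column1 = t2.column2
    WHERE t2.id IS {} NULL
"""
matches = conn.execute(QUERY.format("NOT")).fetchone()[0]
mismatches = conn.execute(QUERY.format("")).fetchone()[0]
print(matches, mismatches)
```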
After the exhaustive comment chain above (all help gratefully received), I suspect this is a problem with the data types that the table creation script defines for the columns in question. From a SQL point of view, I have no explanation for why IS NULL only intermittently picked up NULL values.
I was able to identify the 71 rows that were not being picked up as expected by using EXCEPT.
That is, I inverted the query that was missing 71 rows, namely:
SELECT * FROM TABLE WHERE COLUMN1 <> COLUMN2
by subtracting it from the whole table with EXCEPT:
SELECT * FROM TABLE
EXCEPT
SELECT * FROM TABLE WHERE COLUMN1 <> COLUMN2
This showed that COLUMN1 was always NULL in the 71 missing rows, even though IS NULL had not picked them up when I ran
SELECT * FROM TABLE WHERE COLUMN1 IS NULL
which returned zero rows.
Regarding the comparison of the values stored in the columns: as my data volumes are low (3780 records), I am simply forcing the issue by using ISNULL to replace NULL with 9999 (a numeric value I know my data will never contain).
SELECT * FROM TABLE
WHERE ISNULL(COLUMN1, 9999) <> COLUMN2
I then get the 3780 rows as expected. It's not ideal, but it will have to do, and it is more or less appropriate: there are NULL values in the data, so they have to be handled.
Also, using Bertrand's tip above, I could view the table creation script, and the columns were definitely set up as INT.
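A sketch comparing the 9999 sentinel with an explicit NULL-aware "not equal" predicate that needs no magic value. It uses Python's sqlite3 module as a stand-in for SQL Server (an assumption; SQLite's IFNULL plays the role of ISNULL) and a hypothetical table t. Row 4, with a NULL in column2 instead of column1, is made up to show that the sentinel only covers NULLs on the side it wraps; the explicit predicate treats "exactly one side is NULL" as a difference and two NULLs as equal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, column1 INTEGER, column2 INTEGER)")
# Hypothetical rows; no row has equal values.
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(1, 10, 20), (2, -5, 7), (3, None, -99), (4, 4, None)])

def ids(where):
    rows = conn.execute(f"SELECT id FROM t WHERE {where} ORDER BY id")
    return [r[0] for r in rows]

# Sentinel: NULL in column1 becomes 9999, a value the data never contains.
sentinel = ids("IFNULL(column1, 9999) <> column2")

# NULL-aware "not equal": the values differ, or exactly one side is NULL.
null_aware = ids("column1 <> column2"
                 " OR (column1 IS NULL AND column2 IS NOT NULL)"
                 " OR (column1 IS NOT NULL AND column2 IS NULL)")
print(sentinel, null_aware)
```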

Reduce the use of AND operator in Oracle while comparing multiple NULL values in different columns

I want to check multiple columns for NULL values. E.g. assume I have 3 columns in my table and I need to find the rows where all of them are NOT NULL. I am using the following code:
select * from table1 where
column1 is not null
and column2 is not null
and column3 is not null
I don't want to use this code, because it needs another AND for every additional column.
Does anybody have an alternative for this in Oracle 11g?
I agree with the comment that your query is fine as is. If the columns that you are checking are all of a numeric variety then you can use Oracle's behavior with null values to your advantage to shorten the query like this:
select * from table1
where
(
column1
+ column2
+ column3
) is not null;
If any of the listed columns is null, the sum will also be null. Unfortunately, this does not carry over to strings: in Oracle, null strings concatenate just fine, so the same approach doesn't work with them.
You can use
COALESCE (expr1, expr2)
which is equivalent to
CASE WHEN expr1 IS NOT NULL THEN expr1 ELSE expr2 END
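Your syntax would be
coalesce(column1,....,columnn) is not null
Note, however, that this condition is true when at least one of the columns is not null, so it is not equivalent to the AND chain in the question, which requires all of them to be not null.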
You can use this instead of the COALESCE:
SELECT *
FROM table1
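WHERE column1 || column2 || column3 || column4 IS NOT NULL;
Be aware that Oracle's || treats NULL as an empty string, so this is true as soon as any one of the columns is not null, not only when all of them are.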
Tim Rhyne answers well. If you had all string columns, your where clause could be:
WHERE LENGTH(COLUMN1)+LENGTH(COLUMN2)+LENGTH(COLUMN3) IS NOT NULL
If you had a mix of string and numeric:
WHERE COLUMN_INTEGER1+COLUMN_INTEGER2+LENGTH(COLUMN_STRING3) IS NOT NULL
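A small sketch of the "all columns" versus "any column" distinction, using Python's sqlite3 module as a stand-in for Oracle 11g with a hypothetical table1. Arithmetic on NULL yields NULL in both databases, so the sum trick matches the AND chain. String concatenation is where they differ (SQLite's || returns NULL if either operand is NULL, unlike Oracle), so the || and LENGTH variants are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY,"
             " column1 INTEGER, column2 INTEGER, column3 INTEGER)")
# Hypothetical rows: all set, one NULL, all NULL.
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)",
                 [(1, 1, 2, 3), (2, 1, None, 3), (3, None, None, None)])

def ids(where):
    rows = conn.execute(f"SELECT id FROM table1 WHERE {where} ORDER BY id")
    return [r[0] for r in rows]

and_chain = ids("column1 IS NOT NULL AND column2 IS NOT NULL AND column3 IS NOT NULL")
arithmetic = ids("(column1 + column2 + column3) IS NOT NULL")         # NULL + x is NULL
any_not_null = ids("COALESCE(column1, column2, column3) IS NOT NULL")  # first non-NULL
print(and_chain, arithmetic, any_not_null)
```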

SQL Server compare data

I am attempting to compare 2 columns in one SQL Server table. Column1 has 012-0000430-001 and Column2 has 0120000430001; both are of data type nvarchar. I would like to run a comparison to make sure both columns match.
select Column1,substring(Column2,1,3)
+ substring(Column2,5,7)
+substring(Column2,13,3)
from Table1
This query gives me the data, but what could I do next to see which data matches and which does not? I would eventually like to create a trigger that finds the mismatches and then corrects them.
Thanks in advance!
If you want to compare them, how about something like this?
select column1, column2,
(case when column2 = replace(column1, '-', '') then 'same'
else 'diff'
end)
from table1;
select CASE WHEN replace(Column1,'-','')= Column2 then
'Equals' else 'Not Equals' end from Table_Name
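A runnable sketch of the REPLACE-based comparison, using Python's sqlite3 module as a stand-in for SQL Server (an assumption; REPLACE(string, pattern, replacement) takes its arguments in the same order in both, and nvarchar is approximated by TEXT). The second sample row is made up to produce a mismatch; the last query lists only the mismatching rows, which is the kind of set a later correcting step would work on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Column1 TEXT, Column2 TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [("012-0000430-001", "0120000430001"),   # from the question
                  ("012-0000431-001", "0120000430001")])  # hypothetical mismatch

# Strip the dashes from Column1 and compare with Column2.
rows = conn.execute("""
    SELECT Column1, Column2,
           CASE WHEN REPLACE(Column1, '-', '') = Column2
                THEN 'same' ELSE 'diff' END AS status
    FROM Table1
    ORDER BY Column1
""").fetchall()

# Only the rows that do not match.
mismatches = conn.execute(
    "SELECT Column1, Column2 FROM Table1"
    " WHERE REPLACE(Column1, '-', '') <> Column2"
).fetchall()

for row in rows:
    print(row)
print(mismatches)
```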

Is there a single SQL (or its variations) function to check not equals for multiple columns at once?

Just as I can check if a column does not equal one of the strings given in a set.
SELECT * FROM table1 WHERE column1 NOT IN ('string1','string2','string3');
Is there a single function I can use to make sure that multiple columns do not equal a single string? Maybe something like this:
SELECT * FROM table1 WHERE EACH(column1,column2,column3) <> 'string1';
Such that it gives the same effect as:
SELECT * FROM table1 WHERE column1 <> 'string1'
AND column2 <> 'string1'
AND column3 <> 'string1';
If not, what's the most concise way to do so?
I believe you can just reverse the columns and constants in your first example:
SELECT * FROM table1 WHERE 'string1' NOT IN (column1, column2, column3);
This assumes you are using SQL Server.
UPDATE:
A few people have pointed out potential null comparison problems (even though your desired query would have the same potential problem). This could be worked around by using COALESCE in the following way:
SELECT * FROM table1 WHERE 'string1' NOT IN (
COALESCE(column1,'NA'),
COALESCE(column2,'NA'),
COALESCE(column3,'NA')
);
You should replace 'NA' with a value that will not match whatever 'string1' is. If you do not allow nulls for columns 1,2 and 3 this is not even an issue.
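A runnable sketch of the reversed NOT IN and its NULL pitfall, using Python's sqlite3 module as a stand-in for SQL Server (an assumption; both accept column expressions inside the IN list) with a hypothetical table1. Row 3 holds a NULL and no 'string1'; the plain NOT IN drops it because 'string1' <> NULL is UNKNOWN, while the COALESCE version keeps it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY,"
             " column1 TEXT, column2 TEXT, column3 TEXT)")
# Hypothetical rows.
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)",
                 [(1, "a", "b", "c"),        # no column is 'string1'
                  (2, "a", "string1", "c"),  # one column is 'string1'
                  (3, "a", None, "c")])      # a NULL, no 'string1'

def ids(where):
    rows = conn.execute(f"SELECT id FROM table1 WHERE {where} ORDER BY id")
    return [r[0] for r in rows]

plain = ids("'string1' NOT IN (column1, column2, column3)")
with_coalesce = ids("'string1' NOT IN (COALESCE(column1, 'NA'),"
                    " COALESCE(column2, 'NA'), COALESCE(column3, 'NA'))")
print(plain, with_coalesce)
```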
No, there is no standard SQL way to do this. Barring any special constraints on what the string fields contain, there's no more concise way to do it than what you've already hit upon (col1 <> 'string1' AND col2 <> 'string1').
Additionally, this kind of requirement is often an indication that you have a flaw in your database design and that you're storing the same information in several different columns. If that is true in your case then consider refactoring if possible into a separate table where each column becomes its own row.
The most concise way to do this is
SELECT * FROM table1 WHERE column1 <> 'string1'
AND column2 <> 'string1'
AND column3 <> 'string1';
Yes, I cut & pasted that from your original question. :-)
I'm more concerned about why you want to compare against all three columns. It sounds like you might have a table that needs normalization. What are the actual names of column1, column2 and column3? Are they something like phone1, phone2 and phone3? Perhaps those three columns should actually be in a subtable.

Select rows where column is null

How do you write a SELECT statement that only returns rows where the value for a certain column is null?
Do you mean something like:
SELECT COLUMN1, COLUMN2 FROM MY_TABLE WHERE COLUMN1 = 'Value' OR COLUMN1 IS NULL
?
I'm not sure if this answers your question, but using the IS NULL construct, you can test whether any given scalar expression is NULL:
SELECT * FROM customers WHERE first_name IS NULL
On MS SQL Server, the ISNULL() function returns the first argument if it's not NULL, otherwise it returns the second. You can effectively use this to make sure a query always yields a value instead of NULL, e.g.:
SELECT ISNULL(column1, 'No value found') FROM mytable WHERE column2 = 23
Other DBMSes have similar functionality, e.g. COALESCE (standard SQL) or NVL (Oracle).
If you want to know whether a column can be null (i.e., is defined to be nullable), without querying for actual data, you should look into information_schema.
Use Is Null
select * from tblName where clmnName is null
You want to know if the column is null
select * from foo where bar is null
If you want to check for a value not equal to something and the column also contains NULL values, you will not get the rows with NULL in them:
does not work:
select * from foo where bar <> 'value'
does work:
select * from foo where bar <> 'value' or bar is null
In Oracle (I don't know about other DBMSs), some people use this:
select * from foo where NVL(bar,'n/a') <> 'value'
If I read tdammers's answer correctly, in MS SQL Server this would be:
select * from foo where ISNULL(bar,'n/a') <> 'value'
In my opinion it is a bit of a hack: as soon as 'value' becomes a variable, the statement becomes buggy if the variable contains 'n/a'.
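A sketch of the failure mode described above, using Python's sqlite3 module as a stand-in (an assumption; SQLite's IFNULL plays the role of NVL/ISNULL) with a hypothetical table foo. Both queries agree for an ordinary parameter, but when the parameter happens to equal the 'n/a' placeholder, the sentinel version silently drops the NULL row, while the explicit OR bar IS NULL version keeps it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, bar TEXT)")
# Hypothetical rows: a normal value, a NULL, and a literal 'n/a'.
conn.executemany("INSERT INTO foo VALUES (?, ?)", [(1, "x"), (2, None), (3, "n/a")])

def sentinel_query(value):
    # NULL is replaced by the placeholder before comparing.
    sql = "SELECT id FROM foo WHERE IFNULL(bar, 'n/a') <> ? ORDER BY id"
    return [r[0] for r in conn.execute(sql, (value,))]

def explicit_query(value):
    # NULL rows are included explicitly; no placeholder involved.
    sql = "SELECT id FROM foo WHERE bar <> ? OR bar IS NULL ORDER BY id"
    return [r[0] for r in conn.execute(sql, (value,))]

print(sentinel_query("x"), explicit_query("x"))
print(sentinel_query("n/a"), explicit_query("n/a"))
```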
select Column from Table where Column is null;
select * from tableName where columnName is null
IS NULL on its own may not be enough, because an empty string ('') is not NULL. I needed to get all the employees whose English full name is missing, so I used:
SELECT emp_id, Full_Name_Ar, Full_Name_En
FROM employees
WHERE Full_Name_En = '' or Full_Name_En is null
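A small sketch showing why the extra = '' test is needed. It uses Python's sqlite3 module as a stand-in for SQL Server with a hypothetical employees table: IS NULL matches only the NULL row, while the combined test (or NULLIF, which turns '' into NULL first and exists in both SQLite and SQL Server) also catches the empty string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, Full_Name_En TEXT)")
# Hypothetical rows: a name, a NULL, and an empty string.
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(1, "Jane Doe"), (2, None), (3, "")])

def ids(where):
    rows = conn.execute(f"SELECT emp_id FROM employees WHERE {where} ORDER BY emp_id")
    return [r[0] for r in rows]

only_null = ids("Full_Name_En IS NULL")
null_or_empty = ids("Full_Name_En = '' OR Full_Name_En IS NULL")
via_nullif = ids("NULLIF(Full_Name_En, '') IS NULL")  # '' is mapped to NULL first
print(only_null, null_or_empty, via_nullif)
```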