mismatch not picked up when one value is null - sql

I have a simple SQL query where a comparison is done between two tables for mismatching value.
Yesterday, we picked up an issue where one field was null and the other wasn't, but a mismatch was not detected.
As far as I can determine,the logic has been working all along until yesterday.
Here is the logic of the SQL:
CREATE TABLE Table1
(UserID INT,PlayDate DATETIME)
CREATE TABLE Table2
(UserID INT,PlayDate DATETIME)
INSERT INTO Table1 (UserID)
SELECT 5346
INSERT INTO Table2 (UserID,PlayDate)
SELECT 5346,'2012-11-01'
SELECT a.UserID
FROM Table1 a
INNER JOIN
Table2 b
ON a.UserID = b.UserID
WHERE a.PlayDate <> b.PlayDate
No values are returned even though the PlayDate values are different.
I have now updated the WHERE to read:
WHERE ISNULL(a.PlayDate,'') <> ISNULL(b.PlayDate,'')
Is there a setting in SQL which someone could have changed to cause the original code to no longer pick up the difference in fields?
Thanks

NULL <> anything
is unknown not true. SQL uses three valued logic (false/true/unknown) and the predicate needs to evaluate to true in a where clause for the row to be returned.
In fact in standard SQL any comparison with NULL except for IS [NOT] NULL yields unknown. Including WHERE NULL = NULL
You don't state RDBMS but if it supports IS DISTINCT FROM you could use that or if you are using MySQL it has a null safe equality operator <=> you could negate.
You say you think it previously behaved differently. If you are on SQL Server you might be using a different setting for ANSI_NULLS somehow but this setting is deprecated and you should rewrite any code that depends on it anyway.
You can simulate IS DISTINCT FROM in SQL Server with WHERE EXISTS (SELECT a.PlayDate EXCEPT SELECT b.PlayDate)

Not even a NULL can be equal to NULL.
Here are two common queries that just don’t work:
select * from table where column = null;
select * from table where column <> null;
there is no concept of equality or inequality, greater than or less
than with NULLs. Instead, one can only say “is” or “is not”
(without the word “equal”) when discussing NULLs.
- The correct way to write the queries
select * from table where column IS NULL;
select * from table where column IS NOT NULL;

Related

SQL NOT IN failed

I am working on a query that will check the temp table if there is a record that do not exist on the main table. My query looks like this
SELECT * FROM [Telemarketing].[dbo].[PDCampaignBatch_temp]
WHERE [StartDateTime] NOT IN (SELECT [StartDateTime] FROM [Telemarketing].[dbo].PDCampaignBatch GROUP BY [StartDateTime])
but the problem is it does not display this row
even if that data does not exist in my main table. What seems to be the problem?
NOT IN has strange semantics. If any values in the subquery are NULL, then the query returns no rows at all. For this reason, I strongly recommend using NOT EXISTS instead:
SELECT t.*
FROM [Telemarketing].[dbo].[PDCampaignBatch_temp] t
WHERE NOT EXISTS (SELECT 1
FROM [Telemarketing].[dbo].PDCampaignBatch cb
WHERE t.StartDateTime = cb.StartDateTime
);
If the set is evaluated by the SQL NOT IN condition contains any values that are null, then the outer query here will return an empty set, even if there are many [StartDateTime]s that match [StartDateTime]s in the PDCampaignBatch table.
To avoid such issue,
SELECT *
FROM [Telemarketing].[dbo].[PDCampaignBatch_temp]
WHERE [StartDateTime] NOT IN (
SELECT DISTINCT [StartDateTime]
FROM [Telemarketing].[dbo].PDCampaignBatch
WHERE [StartDateTime] IS NOT NULL
);
Let's say PDCampaignBatch_temp and PDCampaignBatch happen to have the same structure (same columns in the same order) and you're tasked with getting the set of all rows in PDCampaignBatch_temp that aren't in PDCampaignBatch. The most effective way to do that is to make use of the EXCEPT operator, which will deal with NULL in the expected way as well:
SELECT * FROM [Telemarketing].[dbo].[PDCampaignBatch_temp]
EXCEPT
SELECT * FROM [Telemarketing].[dbo].[PDCampaignBatch]
In production code that is not a one-off, don't use SELECT *, write out the column names instead.
Most likely your issue is with the datetime. You may be only displaying a certain degree of percision like the year/month/date. The data may be stored as year/month/date/hour/minute/second/milisecond. If so you have to match down the the most granluar measurement of the data. If one field is a date and the other is a date time they also will likely never match up. Thus you always get no responses.

SQL Server where column in where clause is null

Let's say that we have a table named Data with Id and Weather columns. Other columns in that table are not important to this problem. The Weather column can be null.
I want to display all rows where Weather fits a condition, but if there is a null value in weather then display null value.
My SQL so far:
SELECT *
FROM Data d
WHERE (d.Weather LIKE '%'+COALESCE(NULLIF('',''),'sunny')+'%' OR d.Weather IS NULL)
My results are wrong, because that statement also shows values where Weather is null if condition is not correct (let's say that users mistyped wrong).
I found similar topic, but there I do not find appropriate answer.
SQL WHERE clause not returning rows when field has NULL value
Please help me out.
Your query is correct for the general task of treating NULLs as a match. If you wish to suppress NULLs when there are no other results, you can add an AND EXISTS ... condition to your query, like this:
SELECT *
FROM Data d
WHERE d.Weather LIKE '%'+COALESCE(NULLIF('',''),'sunny')+'%'
OR (d.Weather IS NULL AND EXISTS (SELECT * FROM Data dd WHERE dd.Weather LIKE '%'+COALESCE(NULLIF('',''),'sunny')+'%'))
The additional condition ensures that NULLs are treated as matches only if other matching records exist.
You can also use a common table expression to avoid duplicating the query, like this:
WITH cte (id, weather) AS
(
SELECT *
FROM Data d
WHERE d.Weather LIKE '%'+COALESCE(NULLIF('',''),'sunny')+'%'
)
SELECT * FROM cte
UNION ALL
SELECT * FROM Data WHERE weather is NULL AND EXISTS (SELECT * FROM cte)
statement show also values where Wether is null if condition is not correct (let say that users typed wrong sunny).
This suggests that the constant 'sunny' is coming from end-user's input. If that is the case, you need to parameterize your query to avoid SQL injection attacks.

Excluding a Null value returns 0 rows in a sub query

I'm trying to clean up some data in SQL server and add a foreign key between the two tables.
I have a large quantity of orphaned rows in one of the tables that I would like to delete. I don't know why the following query would return 0 rows in MS SQL server.
--This Query returns no Rows
select * from tbl_A where ID not in ( select distinct ID from tbl_B
)
When I include IS NOT NULL in the subquery I get the results that I expect.
-- Rows are returned that contain all of the records in tbl_A but Not in tbl_B
select * from tbl_A where ID not in ( select distinct ID from tbl_B
where ID is not null )
The ID column is nullable and does contain null values. IF I run just the subquery I get the exact same results except the first query returns one extra NULL row as expected.
This is the expected behavior of the NOT IN subquery. When a subquery returns a single null value NOT IN will not match any rows.
If you don't exclusively want to do a null check, then you will want to use NOT EXISTS:
select *
from tbl_A A
where not exists (select distinct ID
from tbl_B b
where a.id = b.id)
As to why the NOT IN is causing issues, here are some posts that discuss it:
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL
NOT EXISTS vs NOT IN
What's the difference between NOT EXISTS vs. NOT IN vs. LEFT JOIN WHERE IS NULL?
Matching on NULL with equals (=) will return NULL or UNKNOWN as opposed to true/false from a logic standpoint. E.g. see http://msdn.microsoft.com/en-us/library/aa196339(v=sql.80).aspx for discussion.
If you want to include finding NULL values in table A where there is no NULL in table B (if B is the "parent" and A is the "child" in the "foreign key" relationship you desire) then you would need a second statement, something like the following. Also I would recommend qualifying the ID field with a table prefix or alias since the field names are the same in both tables. Finally, I would not recommend having NULL values as the key. But in any case:
select * from tbl_A as A where (A.ID not in ( select distinct B.ID from tbl_B as B ))
or (A.ID is NULL and not exists(select * from tbl_B as B where B.ID is null))
The problem is the non-comparability of nulls. If you are asking "not in" and there are nulls in the subquery it cannot say that anything anything is definitely not in becuase it is looking at those nulls as "unknown" and so the answer is always "unknown" in the three value logic that SQL uses.
Now of course that is all assuming you have ANSI_NULLS ON (which is the default) If you turn that off then suddenly NULLS become comparable and it will give you results, and probably the results you expect.
If the ids are never negative, you might consider something like:
select *
from tbl_A
where coalesce(ID, -1) not in ( select distinct coalesce(ID, -1) from tbl_B )
(Or if id is a string, use something line coalesce(id, '<null>')).
This may not work in all cases, but it has the virtue of simplicity on the coding level.
You probably have ANSI NULLs switched off. This compares null values so null=null will return true.
Prefix the first query with
SET ANSI_NULLS ON
GO

TSQL NOT EXISTS Why is this query so slow?

Debugging an app which queries SQL Server 05, can't change the query but need to optimise things.
Running all the selects seperately are quick <1sec, eg: select * from acscard, select id from employee... When joined together it takes 50 seconds.
Is it better to set uninteresting accesscardid fields to null or to '' when using EXISTS?
SELECT * FROM ACSCard
WHERE NOT EXISTS
( SELECT Id FROM Employee
WHERE Employee.AccessCardId = ACSCard.acs_card_number )
AND NOT EXISTS
( SELECT Id FROM Visit
WHERE Visit.AccessCardId = ACSCard.acs_card_number )
ORDER by acs_card_id
Do you have indexes on Employee.AccessCardId, Visit.AccessCardId, and ACSCard.acs_card_number?
The SELECT clause is not evaluated in an EXISTS clause. This:
WHERE EXISTS(SELECT 1/0
FROM EMPLOYEE)
...should raise an error for dividing by zero, but it won't. But you need to put something in the SELECT clause for it to be a valid query - it doesn't matter if it's NULL or a zero length string.
In SQL Server, NOT EXISTS (and NOT IN) are better than the LEFT JOIN/IS NULL approach if the columns being compared are not nullable (the values on either side can not be NULL). The columns compared should be indexed, if they aren't already.

Can Anyone explain why NULL is used in this query?

Also what will be the scenarios where this query is used
select * from TableA where exists
(select null from TableB where TableB.Col1=TableA.Col1)
As the query is in an EXISTS then you can return anything. It is not even evaluated.
In fact, you can replace the null with (1/0) and it will not even produce a divide by zero error.
The NULL makes no sense. It's simply bad SQL.
The exists clause is supposed to use SELECT *.
People make up stories about the cost of SELECT *. They claim it does an "extra" metadata query. It doesn't. They claim it's a "macro expansion" and requires lots of extra parse time. It doesn't.
The EXISTS condition is considered "to be met" if the subquery returns at least one row.
The syntax for the EXISTS condition is:
SELECT columns
FROM tables
WHERE EXISTS ( subquery );
Please note that "Select Null from mytable" will return number of rows in mytable but all will contain only one column with null in the cell as the requirement of outer query is just to check whether any row fall in the given given condition like in your case it is "TableB.Col1=TableA.Col1"
you can change null to 1, 0 or any column name available in the table. 1/0 may not be a good idea :)
It's a tacky way of selecting all records in TableA, which have a matching record (Col1=Col1) in TableB. They might equally well have selected '1', or '*', for instance.
A more human-readable way of achieving the same would be
SELECT * FROM TableA WHERE Col1 IN ( SELECT Col1 IN TableB )
Please, please, all ....
EXISTS returns a BOOLEAN i.e. TRUE or FALSE. If the result set is non empty then return TRUE. Correlation of the sub-query is important as in the case above.
i.e Give me all the rows in A where AT LEAST one col1 exists in B.
It does not matter what is in the select list. Its just a matter of style.