SQL query with conditional WHERE only works sometimes

I'm creating a report (in Crystal Reports XI) based on a SQL stored procedure in a database. The query accepts a few parameters, and returns records within the specified date range. If parameters are passed in, they are used to determine which records to return. If one or more parameters are not passed in, that field is not used to limit the types of records returned. It's a bit complicated, so here's my WHERE clause:
WHERE ((Date > #start_date) AND (Date < #end_date))
AND (#EmployeeID IS NULL OR emp_id = #EmployeeID)
AND (#ClientID IS NULL OR client_id = #ClientID)
AND (#ProjectID IS NULL OR project_id = #ProjectID)
AND (#Group IS NULL OR group = #Group)
Now, for the problem:
The query (and report) works beautifully for old data, within the range of years 2000-2005. However, the WHERE clause is not filtering the data properly for more recent years: it only returns records when the parameter #Group is NULL (i.e., not passed in).
Any hints, tips, or leads are appreciated!

Solved!
It actually had nothing to do with the WHERE clause, after all. I had let SQL Server generate an inner join for me, which should have been a LEFT join: many records from recent years do not contain entries in the joined table (expenses), so they weren't showing up. Interestingly, the few recent records that do have entries in the expenses table have a NULL value for group, which is why I got records only when #Group was NULL.
Morals of the story: 1. Double check anything that is automatically generated; and 2. Look out for NULL values! (n8wl - thanks for giving me the hint to look closely at NULLs.)
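To make the failure mode above concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The table and column names (timesheet, expenses, grp) are hypothetical, not the poster's real schema; only the join behaviour is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: row 2 has no expenses entry, row 3 has one with a NULL group.
    CREATE TABLE timesheet (id INTEGER PRIMARY KEY, emp_id INTEGER, work_date TEXT);
    CREATE TABLE expenses  (timesheet_id INTEGER, grp TEXT);
    INSERT INTO timesheet VALUES (1, 10, '2003-05-01'), (2, 10, '2008-05-01'), (3, 11, '2009-05-01');
    INSERT INTO expenses  VALUES (1, 'A'), (3, NULL);
""")

def ids(join_kind, group):
    """Run the optional-parameter filter with the given join type."""
    sql = f"""
        SELECT t.id
        FROM timesheet t
        {join_kind} JOIN expenses e ON e.timesheet_id = t.id
        WHERE (:grp IS NULL OR e.grp = :grp)
        ORDER BY t.id
    """
    return [row[0] for row in conn.execute(sql, {"grp": group})]

inner_all = ids("INNER", None)  # row 2 is lost because it has no expenses row
left_all = ids("LEFT", None)    # every timesheet row survives the join
left_a = ids("LEFT", "A")       # NULL groups never equal 'A', so only row 1 matches

print(inner_all, left_all, left_a)
```

With the INNER join, the row without an expenses entry disappears before the WHERE clause is even evaluated, which is why the filter looked broken.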

What are the chances that your newer data (post-2005) has some rows with NULLs in emp_id, client_id, project_id, or group? If those columns are NULL, they can't match the parameters you're passing.

Since Date and group are reserved words, you might try putting square brackets around those column names so they are treated as identifiers rather than keywords. Doing so can get rid of "odd" issues like this. That would make it:
WHERE (([Date] > #start_date) AND ([Date] < #end_date))
AND (#EmployeeID IS NULL OR emp_id = #EmployeeID)
AND (#ClientID IS NULL OR client_id = #ClientID)
AND (#ProjectID IS NULL OR project_id = #ProjectID)
AND (#Group IS NULL OR [group] = #Group)

Related

Passing in parameter to where clause using IS NULL or Coalesce

I would like to pass in a parameter #CompanyID into a where clause to filter results. But sometimes this value may be null so I want all records to be returned. I have found two ways of doing this, but am not sure which one is the safest.
Version 1
SELECT ProductName, CompanyID
FROM Products
WHERE (#CompanyID IS NULL OR CompanyID = #CompanyID)
Version 2
SELECT ProductName, CompanyID
FROM Products
WHERE CompanyID = COALESCE(#CompanyID, CompanyID)
I have found that the first version is the quickest, but I have also found in other tables using a similar method that I get different result sets back. I don't quite understand the difference between the two.
Can anyone please explain?
Well, both queries are handling the same two scenarios -
In one scenario #CompanyID contains a value,
and in the second #CompanyID contains NULL.
For both queries, the first scenario will return the same result set: if #CompanyID contains a value, both will return all rows where CompanyID = #CompanyID. However, the first query might return it faster (more on that at the end of my answer).
The second scenario, however, is where the queries start to behave differently.
First, this is why you get different result sets:
Difference in result sets
Version 1
WHERE (#CompanyID IS NULL OR CompanyID = #CompanyID)
When #CompanyID is null, the where clause will not filter out any rows whatsoever, and all the records in the table will be returned.
Version 2
WHERE CompanyID = COALESCE(#CompanyID, CompanyID)
When #CompanyID is null, the where clause will filter out all the rows where CompanyID is null, since the result of null = null is actually unknown, and any query with null = null as its where clause will return no results, unless ANSI_NULLS is set to OFF (which you really should not do since it's deprecated).
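The difference between the two versions can be reproduced with a small sketch. It uses Python's sqlite3 module rather than SQL Server (an assumption made so the example is runnable anywhere), a named parameter :cid in place of the stored-procedure parameter, and made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductName TEXT, CompanyID INTEGER);
    -- Sample data (invented): one product has no company at all.
    INSERT INTO Products VALUES ('Widget', 1), ('Gadget', 2), ('Orphan', NULL);
""")

VERSION_1 = """SELECT ProductName FROM Products
               WHERE (:cid IS NULL OR CompanyID = :cid)
               ORDER BY ProductName"""
VERSION_2 = """SELECT ProductName FROM Products
               WHERE CompanyID = COALESCE(:cid, CompanyID)
               ORDER BY ProductName"""

def names(sql, company_id):
    return [row[0] for row in conn.execute(sql, {"cid": company_id})]

v1_null = names(VERSION_1, None)  # no filtering at all
v2_null = names(VERSION_2, None)  # NULL = NULL is unknown, so 'Orphan' is dropped
v1_one = names(VERSION_1, 1)
v2_one = names(VERSION_2, 1)

print(v1_null, v2_null, v1_one, v2_one)
```

When a value is supplied the two versions agree; they only diverge on rows whose CompanyID is NULL when the parameter itself is NULL.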
Index usage
You might get faster results from the first version, since applying a function to a column in the where clause can prevent SQL Server from using an index seek on that column.
You can read more about it on this article in MSSql Tips.
Conclusion
Version 1 is better than version 2.
Even if you do not want to return records where companyId is null it's still better to write as WHERE (#CompanyID IS NULL OR CompanyID = #CompanyID) AND CompanyID IS NOT NULL than to use the second version.
It's worth noting that using the syntax ([Column] = #Value OR [Column] IS NULL) is a much better idea than using ISNULL([Column],#Value) = #Value (or using COALESCE).
This is because using the function causes the query to become un-SARGable; so indexes won't be used. The first expression is SARGable, and thus, will perform better.
Just adding this because the OP states "I have found that the first version is the quickest", and I wanted to elaborate on why (even though the second statement is currently incomplete, which I am guessing is down to user error).
The second version is not correct SQL (for SQL Server). It needs an operator. Presumably:
SELECT ProductName, CompanyID
FROM Products
WHERE COALESCE(#CompanyID, CompanyID) = CompanyID;
The first version is correct as written. If you have an index on CompanyID, you might find this faster:
SELECT *
FROM Products
WHERE CompanyID = #CompanyID
UNION ALL
SELECT *
FROM Products
WHERE #CompanyID IS NULL;

SQL NOT IN failed

I am working on a query that checks whether the temp table contains records that do not exist in the main table. My query looks like this:
SELECT * FROM [Telemarketing].[dbo].[PDCampaignBatch_temp]
WHERE [StartDateTime] NOT IN (SELECT [StartDateTime] FROM [Telemarketing].[dbo].PDCampaignBatch GROUP BY [StartDateTime])
but the problem is that it does not display this row, even if that data does not exist in my main table. What seems to be the problem?
NOT IN has strange semantics. If any values in the subquery are NULL, then the query returns no rows at all. For this reason, I strongly recommend using NOT EXISTS instead:
SELECT t.*
FROM [Telemarketing].[dbo].[PDCampaignBatch_temp] t
WHERE NOT EXISTS (SELECT 1
FROM [Telemarketing].[dbo].PDCampaignBatch cb
WHERE t.StartDateTime = cb.StartDateTime
);
If the set evaluated by the NOT IN condition contains any NULL values, the outer query will return an empty set, even if there are many [StartDateTime] values in PDCampaignBatch_temp that do not match any [StartDateTime] in the PDCampaignBatch table.
To avoid this issue, filter the NULLs out of the subquery:
SELECT *
FROM [Telemarketing].[dbo].[PDCampaignBatch_temp]
WHERE [StartDateTime] NOT IN (
SELECT DISTINCT [StartDateTime]
FROM [Telemarketing].[dbo].PDCampaignBatch
WHERE [StartDateTime] IS NOT NULL
);
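The NOT IN behaviour described in these answers can be checked with a short sketch. It runs against SQLite via Python's sqlite3 module instead of SQL Server, and the shortened table names (batch_temp, batch) and sample timestamps are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE batch_temp (StartDateTime TEXT);
    CREATE TABLE batch      (StartDateTime TEXT);
    INSERT INTO batch_temp VALUES ('2016-01-01 08:00:00'), ('2016-01-02 08:00:00');
    -- The main table contains one matching value and one NULL.
    INSERT INTO batch VALUES ('2016-01-01 08:00:00'), (NULL);
""")

def rows(sql):
    return [row[0] for row in conn.execute(sql)]

# Plain NOT IN: the NULL in the subquery makes every comparison unknown.
not_in = rows("""
    SELECT StartDateTime FROM batch_temp
    WHERE StartDateTime NOT IN (SELECT StartDateTime FROM batch)
""")

# NOT EXISTS: the NULL row simply never matches, so it does no harm.
not_exists = rows("""
    SELECT t.StartDateTime FROM batch_temp t
    WHERE NOT EXISTS (SELECT 1 FROM batch b WHERE b.StartDateTime = t.StartDateTime)
""")

# NOT IN with the NULLs filtered out of the subquery.
not_in_filtered = rows("""
    SELECT StartDateTime FROM batch_temp
    WHERE StartDateTime NOT IN (
        SELECT StartDateTime FROM batch WHERE StartDateTime IS NOT NULL
    )
""")

print(not_in, not_exists, not_in_filtered)
```

The unfiltered NOT IN returns nothing, while the other two forms return the one timestamp that is missing from the main table.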
Let's say PDCampaignBatch_temp and PDCampaignBatch happen to have the same structure (same columns in the same order) and you're tasked with getting the set of all rows in PDCampaignBatch_temp that aren't in PDCampaignBatch. The most effective way to do that is to make use of the EXCEPT operator, which will deal with NULL in the expected way as well:
SELECT * FROM [Telemarketing].[dbo].[PDCampaignBatch_temp]
EXCEPT
SELECT * FROM [Telemarketing].[dbo].[PDCampaignBatch]
In production code that is not a one-off, don't use SELECT *, write out the column names instead.
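As a rough illustration of how EXCEPT handles NULL, the sketch below uses SQLite through Python's sqlite3 module (an assumption; the original question targets SQL Server) with two invented two-column tables. A row whose StartDateTime is NULL in both tables is treated as the same row and therefore excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical cut-down versions of the two tables, with explicit columns.
    CREATE TABLE batch_temp (Name TEXT, StartDateTime TEXT);
    CREATE TABLE batch      (Name TEXT, StartDateTime TEXT);
    INSERT INTO batch_temp VALUES ('a', NULL), ('b', '2016-01-02');
    INSERT INTO batch      VALUES ('a', NULL);
""")

# Column names are written out instead of SELECT *.
missing = conn.execute("""
    SELECT Name, StartDateTime FROM batch_temp
    EXCEPT
    SELECT Name, StartDateTime FROM batch
""").fetchall()

print(missing)
```

Only the row that truly has no counterpart comes back; the ('a', NULL) row is matched even though NULL = NULL would not be true in a join condition.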
Most likely your issue is with the datetime. You may be displaying only a certain degree of precision, such as year/month/day, while the data is stored as year/month/day/hour/minute/second/millisecond. If so, you have to match down to the most granular unit of the data. If one field is a date and the other is a datetime, they will also likely never match up, so you always get no rows.

SQL Server Comparing 2 tables having Millions of Records

I have 2 tables in SQL Server: Table 1 and Table 2.
Table 1 has 500 Records and Table 2 has Millions of Records.
Table 2 may/may not have the 500 Records of Table 1 in it.
I have to compare Table 1 and Table 2, but the result should give me only the records of Table 1 whose data has changed in Table 2. That means the result should contain at most 500 records.
I don't have a primary key, but the columns in the 2 tables are the same. I have written the following query, but I am getting a timeout exception and it takes a long time to process. Please help.
With CTE_DUPLICATE(OLD_FIRSTNAME ,New_FirstName,
OLD_LASTNAME ,New_LastName,
OLD_MINAME ,New_MIName ,
OLD_FAMILYID,NEW_FAMILYID,ROWNUMBER)
as (
Select distinct
OLD.FIRST_NAME AS 'OLD_FIRSTNAME' ,New.First_Name AS 'NEW_FIRSTNAME',
OLD.LAST_NAME AS 'OLD_LASTNAME',New.Last_Name AS 'NEW_LASTNAME',
OLD.MI_NAME AS 'OLD_MINAME',New.MI_Name AS 'NEW_MINAME',
OLD.FAMILY_ID AS 'OLD_FAMILYID',NEW.FAMILY_ID AS 'NEW_FAMILYID',
row_number()over(partition by OLD.FIRST_NAME ,New.First_Name,
OLD.LAST_NAME ,New.Last_Name,
OLD.MI_NAME ,New.MI_Name ,
OLD.FAMILY_ID,NEW.FAMILY_ID
order by OLD.FIRST_NAME ,New.First_Name,
OLD.LAST_NAME ,New.Last_Name,
OLD.MI_NAME ,New.MI_Name ,
OLD.FAMILY_ID,NEW.FAMILY_ID )as rank
From EEMSCDBStatic OLD,EEMS_VIPFILE New where
OLD.MPID <> New.MPID and old.FIRST_NAME <> New.First_Name
and OLD.LAST_NAME <> New.Last_Name and OLD.MI_NAME <> New.MI_Name
and old.Family_Id<>New.Family_id
)
SELECT OLD_FIRSTNAME ,New_FirstName,
OLD_LASTNAME ,New_LastName,
OLD_MINAME ,New_MIName ,
OLD_FAMILYID,NEW_FAMILYID FROM CTE_DUPLICATE where rownumber=1
I think the main problem here is that your query is forcing the DB to fully multiply your tables, which means processing ~500M combinations. It happens because you're connecting any record from T1 with any record from T2 that has at least one different value, including MPID that looks like the unique identifier that must be used to connect records.
If MPID is really the column that identifies records in both tables then your query should have a bit different structure:
SELECT old.FIRST_NAME, new.First_Name,
old.LAST_NAME, new.Last_Name,
old.MI_NAME, new.MI_Name,
old.FAMILY_ID, new.FAMILY_ID
FROM EEMSCDBStatic old
INNER JOIN EEMS_VIPFILE new ON old.MPID = new.MPID
WHERE old.FIRST_NAME <> new.First_Name
AND old.LAST_NAME <> new.Last_Name
AND old.MI_NAME <> new.MI_Name
AND old.FAMILY_ID <> new.FAMILY_ID
ORDER BY old.FIRST_NAME, new.First_Name,
old.LAST_NAME, new.Last_Name,
old.MI_NAME, new.MI_Name,
old.FAMILY_ID, new.FAMILY_ID
A couple of other thoughts:
If you're looking for any change in a record (even if only one column has different values), you should use ORs in the WHERE clause, not ANDs. Now you're only looking for records that changed values in all columns. For instance, you'll fail to find a person who changed his or her first name but decided to keep last name.
You should obviously consider indexing your tables if it's possible.
Surely it is pointless to use DISTINCT keyword together with ROWNUMBER.
See this sql query distinct with Row_Number.
You are doing CROSS JOIN, which is terribly big in your case.
Perhaps in the condition
where OLD.MPID <> New.MPID and old.FIRST_NAME <> New.First_Name and ...
you wanted to have OR instead of AND?
It is also not entirely clear why you use ROWNUMBER at all - perhaps to find the best match.
All this is because, as @Shnugo correctly remarked, the logic behind your comparison is faulty: you must define some logic that JOINs the tables (for example, first and second name must be the same).
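The two suggestions made in these answers (join on a key, and use OR rather than AND between the comparisons) can be sketched as follows. This is only an illustration under assumptions: it runs on SQLite via Python's sqlite3 module, it assumes MPID really identifies a person in both tables, and it uses a reduced set of columns with invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced, hypothetical versions of the two tables from the question.
    CREATE TABLE EEMSCDBStatic (MPID INTEGER, FIRST_NAME TEXT, LAST_NAME TEXT);
    CREATE TABLE EEMS_VIPFILE  (MPID INTEGER, FIRST_NAME TEXT, LAST_NAME TEXT);
    INSERT INTO EEMSCDBStatic VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Orr');
    INSERT INTO EEMS_VIPFILE  VALUES (1, 'Ann', 'Lee'), (2, 'Rob', 'Ray'),
                                     (3, 'Di', 'Fox'), (4, 'Eve', 'Kim');
""")

# Joining on the key pairs each old row with at most one new row,
# instead of comparing every row with every other row.
JOINED = """
    SELECT o.MPID
    FROM EEMSCDBStatic o
    JOIN EEMS_VIPFILE n ON n.MPID = o.MPID
    WHERE {condition}
    ORDER BY o.MPID
"""

def changed(condition):
    return [row[0] for row in conn.execute(JOINED.format(condition=condition))]

# OR finds a change in any column; AND only finds rows where every column changed.
any_change = changed("o.FIRST_NAME <> n.FIRST_NAME OR o.LAST_NAME <> n.LAST_NAME")
all_change = changed("o.FIRST_NAME <> n.FIRST_NAME AND o.LAST_NAME <> n.LAST_NAME")

print(any_change, all_change)
```

Person 2 changed only a first name, so the AND version misses that row. Columns that can be NULL would need extra handling, since `<>` against NULL is never true.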

How can I ensure a null is assigned to a variable when using MAX in Oracle?

I have another question dealing with Nulls in Oracle.
I have a small table that is used as a session table. Each row is a specific session.
If a session finishes successfully, the final column holds a version number; it is null if the session is dropped.
This column can be null, and I want to select either the max version number or null into a sessionNumber variable.
This was the original query I had set up:
SELECT MAX (VerNumber) INTO sessionNumber
FROM Table_A
WHERE sessionDate = -- some date
AND VerNumber IS NOT NULL;
This returns one row when the column is null.
I tried using NVL like this:
SELECT NVL(MAX (VerNumber),NULL) INTO sessionNumber
FROM Table_A
WHERE sessionDate = -- some date
AND VerNumber IS NOT NULL;
but it doesn't work. I've also seen that I could use an exception handler, but that seems like a convoluted fix.
To hopefully clear up confusion: the NOT NULL condition exists to prevent extra sessions by the same user for the same date. TBH, in this situation, even if the NOT NULL condition is removed I still get an (empty) row returned. I think there is something I'm missing with nulls and aggregate functions.
Any guidance is greatly appreciated.
Thanks.
Further clarification as requested.
My table has four columns user, date, state, and verNumber
user and date are identifier columns ( not officially a PK but you can look at them that way).
state indicates if session was completed or not
verNumber indicates the number of times that a session for that date was completed and can be null if no session for that date was completed.
I have a variable sessionNumber that I would like to assign either the MAX(VerNumber) available for that date or null when the date is unavailable or the value in VerNumber is null.
sample rows in Table_A
USER | DATE | STATE | VERNUMBER
'AName' | 2012-06-25 | 'YES' | 1
'CName' | 2012-06-25 | 'YES' | 2
'BName' | 2012-06-26 | 'NO' | --NULL
so for date 06-25 I would expect 2 to be the value
and for 06-26 I would expect null.
Is this close to what you want?
http://sqlfiddle.com/#!4/ca465/1
Barking up wrong tree?
Edit: forget that last one
...does this look right to you!!?
http://sqlfiddle.com/#!4/44e50/21
I'm thinking maybe you want a single entry for each date though with the session which was last (no matter the user) - either way this gives you some hints
Edit: Here's that query:
http://sqlfiddle.com/#!4/4163d/1
Which I think mirrors your output!
For everyone's benefit (and in case SQL fiddle ever explodes!), the final query I did was:
SELECT Table_A.* FROM
(
SELECT
SessionDate,
MAX(CASE WHEN VerNumber IS NULL THEN 'A' ELSE VerNumber END) as Ver
FROM Table_A
GROUP BY SessionDate
) TD
INNER JOIN Table_A
ON NVL(Table_A.VerNumber, 'A') = TD.Ver
AND Table_A.SessionDate = TD.SessionDate
So basically just used a CASE to get the MAX of an expression on the VerNumber column but use an alpha character to ensure that the NULLs in that column were selected by the MAX. The outer query joins to the inner on an expression using NVL() which allows the NULL to be joined to the 'A' in the inner query. Not sure if collation would cause issues here (does collation ever change the sort order of alpha vs numeric??)
If you use an aggregate function alone in a select you'll always get a row. If you want no rows returned, you need to add another column and use GROUP BY and filter on that other column, something along the lines of:
SELECT MAX (VerNumber)
INTO sessionNumber
FROM Table_A
WHERE sessionDate = -- some date
AND VerNumber IS NOT NULL
GROUP BY sessionDate;
I don't know how your data is structured, so this may not be the exact solution. I'm also a bit confused, like the others who have commented, about exactly what you're trying to do.
Bear in mind, however, that if you do this you WILL need an exception handler, because you'll get a NO_DATA_FOUND exception.
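The point about aggregates always returning a row can be seen in a small sketch. It uses SQLite through Python's sqlite3 module rather than Oracle (so there is no SELECT ... INTO or NO_DATA_FOUND here), and the column is called UserName because the sample's USER and DATE headings are awkward as identifiers; both choices are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_A (UserName TEXT, SessionDate TEXT, State TEXT, VerNumber INTEGER);
    INSERT INTO Table_A VALUES
        ('AName', '2012-06-25', 'YES', 1),
        ('CName', '2012-06-25', 'YES', 2),
        ('BName', '2012-06-26', 'NO',  NULL);
""")

def plain_max(session_date):
    # An aggregate with no GROUP BY always yields exactly one row.
    return conn.execute(
        "SELECT MAX(VerNumber) FROM Table_A WHERE SessionDate = ?",
        (session_date,),
    ).fetchall()

def grouped_max(session_date):
    # With GROUP BY, no matching rows means no groups and therefore no result row.
    return conn.execute(
        """SELECT MAX(VerNumber) FROM Table_A
           WHERE SessionDate = ? AND VerNumber IS NOT NULL
           GROUP BY SessionDate""",
        (session_date,),
    ).fetchall()

with_versions = plain_max("2012-06-25")   # a real maximum
only_nulls = plain_max("2012-06-26")      # one row holding NULL
unknown_date = plain_max("2012-07-01")    # still one row holding NULL
grouped_empty = grouped_max("2012-06-26") # no row at all

print(with_versions, only_nulls, unknown_date, grouped_empty)
```

In this sketch the plain MAX already produces NULL both for a date whose only version is NULL and for a date that does not exist, which matches the outcome the question asks for.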

SQL Server - Excluding records from a view if two certain fields are both null

I'm writing a view in SQL Server and I need the results filtered so that if two particular fields on a record have null values then that record is excluded.
As an example a table Customer has fields Code, Name, Address, Payment, Payment_Date.
If both Payment & Payment_Date are null then exclude the record from the result set, however if just one is null (or any other field) then it's fine to return the record.
Is this functionality possible in SQL Server?? Any help would be appreciated.
SELECT *
FROM mytable
WHERE other_conditions
…
AND (payment IS NOT NULL OR payment_date IS NOT NULL)
Take the COALESCE of the two fields, and check that value for null:
select * from yourtable where coalesce(field1, field2 /* , field3, ... */) is not null
this is somewhat easier on the eyes than a string of OR clauses (imho)
You can do this in the where clause, simply turn it around and use an OR instead:
WHERE
(
PAYMENT IS NOT NULL
OR
PAYMENT_DATE IS NOT NULL
)
AND
-- ...rest of where clause here...
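A quick way to convince yourself that the OR form and the COALESCE form select the same rows is a sketch like the one below. It uses SQLite via Python's sqlite3 module instead of SQL Server, with a cut-down Customer table and invented rows. SQLite is dynamically typed, so COALESCE over a number and a date string works here; a strictly typed engine may need the arguments converted to a common type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Code TEXT, Payment REAL, Payment_Date TEXT);
    INSERT INTO Customer VALUES
        ('C1', 10.0, '2012-01-01'),  -- both present
        ('C2', NULL, '2012-01-02'),  -- only the date
        ('C3', 5.0,  NULL),          -- only the amount
        ('C4', NULL, NULL);          -- both NULL: should be excluded
""")

def codes(sql):
    return [row[0] for row in conn.execute(sql)]

with_or = codes("""
    SELECT Code FROM Customer
    WHERE (Payment IS NOT NULL OR Payment_Date IS NOT NULL)
    ORDER BY Code
""")

with_coalesce = codes("""
    SELECT Code FROM Customer
    WHERE COALESCE(Payment, Payment_Date) IS NOT NULL
    ORDER BY Code
""")

print(with_or, with_coalesce)
```

Only the record with both fields NULL is dropped in either form.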