Alternative to SET ANSI_NULLS OFF in a WHERE clause - sql

I have a SP that has a very complex SQL statement(s) where I need to be able to compare some column to NULL e.g.
...
FROM Categories
WHERE PID = @parentID
@parentID is a SP parameter which can be valid NULL.
PID (parent ID) is uniqueidentifier which can also be valid NULL (top level category).
I could use SET ANSI_NULLS OFF but the documentation says:
In a future version of SQL Server, ANSI_NULLS will always be ON and
any applications that explicitly set the option to OFF will generate
an error. Avoid using this feature in new development work, and plan
to modify applications that currently use this feature.
What would be an elegant alternative to repeating the same query(s) with IS NULL when @parentID is NULL (and without using dynamic SQL)?
IF @parentID IS NULL
SELECT...WHERE PID IS NULL
ELSE
SELECT...WHERE PID = @parentID
EDIT: I want to avoid an IF because I hate repeating (huge) code.

Something like:
select ..
FROM Categories
WHERE PID = @parentID or (PID is null and @parentID is null)
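A minimal, self-contained sketch of that predicate, assuming SQL Server with ANSI_NULLS ON. The table variable @Categories, its ID column, and the sample rows are made up for illustration and stand in for the real Categories table:

```sql
-- Hypothetical sample data standing in for the Categories table.
DECLARE @Categories TABLE (ID int, PID uniqueidentifier NULL);
DECLARE @root uniqueidentifier = NEWID();

INSERT INTO @Categories (ID, PID)
VALUES (1, NULL),   -- top-level category
       (2, @root),  -- child of @root
       (3, NULL);   -- another top-level category

-- Case 1: the parameter is NULL, so only the top-level rows (IDs 1 and 3) match.
DECLARE @parentID uniqueidentifier = NULL;

SELECT ID
FROM @Categories
WHERE PID = @parentID
   OR (PID IS NULL AND @parentID IS NULL);

-- Case 2: the parameter has a value, so only its child (ID 2) matches.
SET @parentID = @root;

SELECT ID
FROM @Categories
WHERE PID = @parentID
   OR (PID IS NULL AND @parentID IS NULL);
```

The same statement serves both cases, so nothing has to be duplicated in an IF/ELSE.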

I think the if version is the clearest. A big issue with multiple queries, though, is that the stored procedure compiles the code when it is first run -- and it might make the wrong decision about the execution plan.
One option is to force a recompile (for example, with OPTION (RECOMPILE)).
Another is to combine the queries, but in a way where each part should use indexes effectively:
select c.* from categories c where pid is null and @parentID is null
union all
select c.* from categories c where pid = @parentID;
This is a tiny bit less efficient than the if version.

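Here is a concise way to code this, as long as the sentinel is a value that can never occur as a real PID. Because PID is a uniqueidentifier, the sentinel must itself be a valid GUID (a string such as 'NULL-MATCH' would fail to convert); the all-zero GUID is used here as an assumed placeholder. Note that wrapping the column in COALESCE is likely to prevent an index seek on PID.
SELECT *
FROM Categories
WHERE COALESCE(PID, '00000000-0000-0000-0000-000000000000') = COALESCE(@parentID, '00000000-0000-0000-0000-000000000000')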

Related

Passing in parameter to where clause using IS NULL or Coalesce

I would like to pass in a parameter @CompanyID into a where clause to filter results. But sometimes this value may be null so I want all records to be returned. I have found two ways of doing this, but am not sure which one is the safest.
Version 1
SELECT ProductName, CompanyID
FROM Products
WHERE (@CompanyID IS NULL OR CompanyID = @CompanyID)
Version 2
SELECT ProductName, CompanyID
FROM Products
WHERE CompanyID = COALESCE(@CompanyID, CompanyID)
I have found that the first version is the quickest, but I have also found in other tables using a similar method that I get different result sets back. I don't quite understand the difference between the two.
Can anyone please explain?
Well, both queries are handling the same two scenarios -
In one scenario @CompanyID contains a value,
and in the second @CompanyID contains NULL.
For both queries, the first scenario will return the same result set - since
if @CompanyID contains a value, both will return all rows where CompanyID = @CompanyID, however the first query might return it faster (more on that at the end of my answer).
The second scenario, however, is where the queries start to behave differently.
First, this is why you get different result sets:
Difference in result sets
Version 1
WHERE (@CompanyID IS NULL OR CompanyID = @CompanyID)
When @CompanyID is null, the where clause will not filter out any rows whatsoever, and all the records in the table will be returned.
Version 2
WHERE CompanyID = COALESCE(@CompanyID, CompanyID)
When @CompanyID is null, the where clause will filter out all the rows where CompanyID is null, since the result of null = null is actually unknown, and any query with null = null as its where clause will return no results unless ANSI_NULLS is set to OFF (which you really should not do, since it's deprecated).
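The difference can be reproduced with a small, self-contained sketch. It assumes SQL Server with ANSI_NULLS ON; the table variable @Products and its two rows are invented for illustration:

```sql
-- Hypothetical sample data: one product with a company, one without.
DECLARE @Products TABLE (ProductName varchar(20), CompanyID int NULL);

INSERT INTO @Products (ProductName, CompanyID)
VALUES ('A', 1),
       ('B', NULL);

DECLARE @CompanyID int = NULL;  -- "no filter" case

-- Version 1: the first operand of the OR is TRUE, so both rows (A and B) are returned.
SELECT ProductName
FROM @Products
WHERE (@CompanyID IS NULL OR CompanyID = @CompanyID);

-- Version 2: the predicate becomes CompanyID = CompanyID, which is UNKNOWN
-- for row B, so only row A is returned.
SELECT ProductName
FROM @Products
WHERE CompanyID = COALESCE(@CompanyID, CompanyID);
```

With a non-NULL @CompanyID both statements return the same rows, which is why the discrepancy only shows up on tables whose filtered column actually contains NULLs.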
Index usage
You might get faster results from the first version, since wrapping a column in a function in the where clause (as version 2 does with COALESCE) generally prevents SQL Server from seeking on any index that you might have on this column.
You can read more about it in this article on MSSQLTips.
Conclusion
Version 1 is better than version 2.
Even if you do not want to return records where CompanyID is null, it's still better to write WHERE (@CompanyID IS NULL OR CompanyID = @CompanyID) AND CompanyID IS NOT NULL than to use the second version.
It's worth noting that using the syntax ([Column] = @Value OR [Column] IS NULL) is a much better idea than using ISNULL([Column],@Value) = @Value (or using COALESCE).
This is because using the function causes the query to become un-SARGable, so an index seek on the column generally can't be used. The first expression is SARGable, and thus will usually perform better.
Just adding this, as the OP states "I have found that the first version is the quickest", and wanted to elaborate why (even though, currently the statement is incomplete, I am guessing this was more due to user error and ignorance).
The second version is not correct SQL (for SQL Server). It needs an operator. Presumably:
SELECT ProductName, CompanyID
FROM Products
WHERE COALESCE(@CompanyID, CompanyID) = CompanyID;
The first version is correct as written. If you have an index on CompanyID, you might find this faster:
SELECT *
FROM Products
WHERE CompanyID = @CompanyID
UNION ALL
SELECT *
FROM Products
WHERE @CompanyID IS NULL;

NULLs and SET ANSI_NULLS OFF

I have a bit of SQL that queries a table that has one column that can take NULL.
In the code below, @term_type_id could be NULL, and the column term_type_id could have NULL values.
So the code below works if @term_type_id has a value, but does not work if @term_type_id is NULL.
Setting SET ANSI_NULLS OFF is a way around the problem, but I do know that this will be deprecated at some stage.
SELECT [date],value FROM history
WHERE id=@id
AND data_type_id=@data_type_id
AND quote_type_id=@quote_type_id
AND update_type_id=@update_type_id
AND term_type_id=@term_type_id
AND source_id=@source_id
AND ([date]>=@temp_from_date and [date]<=@temp_to_date)
ORDER BY [date]
What I have done in the past is to have something like this
if @term_type_id is NULL
BEGIN
SELECT ......
WHERE .....
AND term_type_id IS NULL
END
ELSE
BEGIN
SELECT ......
WHERE .....
AND term_type_id = @term_type_id
END
While this works, it is very verbose and makes the code hard to read and maintain.
Does anyone have a better solution than using SET ANSI_NULLS OFF or having to write conditional code just to manage the case when something could be a value or NULL?
BTW - When I use SET ANSI_NULLS OFF, I only do it for the specific query then turn it back on afterwards. I do understand the reasons why this is frowned upon, but it is at the expense of writing pointless code to get around a 'pure' view of NULL.
Ben
Since both the column and the parameter can be null, you should treat both cases:
SELECT [date],value FROM history
WHERE id=@id
AND data_type_id=@data_type_id
AND quote_type_id=@quote_type_id
AND update_type_id=@update_type_id
AND ((term_type_id IS NULL AND @term_type_id IS NULL) OR term_type_id = @term_type_id)
AND source_id=@source_id
AND ([date]>=@temp_from_date and [date]<=@temp_to_date)
ORDER BY [date]
Note that this will only return results when both the column and the parameter are null, or when neither is null and the two are equal.
Ben, a better solution to if @term_type_id is NULL SELECT @term_type_id=-1
would be to use isnull(@term_type_id,-1)
Ahh - yes - thank you for the prompt responses. I think I have just come up with a better solution (well in this case)...
if @term_type_id is NULL SELECT @term_type_id=-1
SELECT [date],value FROM history
WHERE id=@id
AND data_type_id=@data_type_id
AND quote_type_id=@quote_type_id
AND update_type_id=@update_type_id
AND ISNULL(term_type_id,-1)=@term_type_id
AND source_id=@source_id
AND ([date]>=@temp_from_date and [date]<=@temp_to_date)
ORDER BY [date]
This works in this case as term_type_id is the result of an identity (1,1) and thus can not be -1.
Try this; it will work in both cases.
SELECT ......
WHERE .....
AND ISNULL(term_type_id,-1) = ISNULL(@term_type_id,-1)
You can use any static value instead of -1, as long as it can never occur as a real value in the column.
Or you can use something like below
SELECT ......
WHERE .....
AND ( (@term_type_id IS NULL AND term_type_id IS NULL)
OR term_type_id = @term_type_id
)
If it's the only nullable column among your search criteria, the best way would be to split the conditions within a single UNION ALL statement:
select date, value from dbo.History
where term_type_id is null and @term_type_id is null
-- Remaining search criteria
and ...
union all
select date, value from dbo.History
where term_type_id = @term_type_id
-- Remaining search criteria
and ...
This is usually the fastest option in your case, basically because SQL Server doesn't have a particular knack for OR-ed conditions. However, another nullable column will turn this into a rather unpleasant mess.
If you think you can sacrifice performance, there is a useful function in T-SQL that comes close to that - NULLIF():
select date, value from dbo.History
where nullif(@term_type_id, term_type_id) is null
-- Remaining search criteria
and ...
However, this type of condition will be non-SARGable, most likely. Also, note that the order of arguments does matter in NULLIF(), and that when @term_type_id is NULL the expression is NULL for every row, so all rows match rather than only those where the column is NULL. Alternatively, you can devise CASE constructs of various complexity that might be semantically more suitable to your exact requirements.
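To see how these predicates differ, here is a small sketch run against invented data. It assumes SQL Server with ANSI_NULLS ON; the table variable @History and its rows are hypothetical. The last query uses INTERSECT, which is a different technique from the ones suggested in this thread and is included only for comparison:

```sql
-- Hypothetical sample data: two rows with a term type, one without.
DECLARE @History TABLE (id int, term_type_id int NULL);

INSERT INTO @History (id, term_type_id)
VALUES (1, 5),
       (2, NULL),
       (3, 7);

DECLARE @term_type_id int = NULL;

-- Explicit two-case predicate: returns only id 2 (the NULL row).
SELECT id
FROM @History
WHERE (term_type_id IS NULL AND @term_type_id IS NULL)
   OR term_type_id = @term_type_id;

-- NULLIF form: returns ids 1, 2 and 3, because the first argument is NULL
-- and NULLIF therefore yields NULL for every row.
SELECT id
FROM @History
WHERE NULLIF(@term_type_id, term_type_id) IS NULL;

-- INTERSECT form: set operators treat two NULLs as equal,
-- so this also returns only id 2.
SELECT id
FROM @History
WHERE EXISTS (SELECT term_type_id INTERSECT SELECT @term_type_id);
```

Setting @term_type_id to 5 makes all three queries return id 1 only, so the variants disagree only in the NULL case.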

Ignore other results if a resultset has been found

To start, take this snippet as an example:
SELECT *
FROM StatsVehicle
WHERE ((ReferenceMakeId = @referenceMakeId)
OR @referenceMakeId IS NULL)
This will fetch and filter the records if the variable @referenceMakeId is not null, and if it is null, will fetch all the records. In other words, it is taking the first one into consideration if @referenceMakeId is not null.
I would like to add a further restriction to this, how can I achieve this?
For instance
(ReferenceModelId = @referenceModeleId) OR
(
(ReferenceMakeId = @referenceMakeId) OR
(@referenceMakeId IS NULL)
)
If @referenceModelId is not null, it will only need to filter by ReferenceModelId, and ignore the other statements inside it. If I actually do this as such, it returns all the records. Is there anything that can be done to achieve such a thing?
Maybe something like this?
SELECT * FROM StatsVehicle WHERE
(
-- Removed the following, as it's not clear if this is beneficial
-- (@referenceModeleId IS NOT NULL) AND
(ReferenceModelId = @referenceModeleId)
) OR
(@referenceModeleId IS NULL AND
(
(ReferenceMakeId = @referenceMakeId) OR
(@referenceMakeId IS NULL)
)
)
This should do the trick.
SELECT * FROM StatsVehicle
WHERE ReferenceModelId = @referenceModeleId OR
(
@referenceModeleId IS NULL AND
(
@referenceMakeId IS NULL OR
ReferenceMakeId = @referenceMakeId
)
)
However, you should note that this type of query (known as a catch-all query) tends to be less efficient than writing a single query for every case.
This is due to the fact that SQL Server will cache the first query plan, which might not be optimal for other parameter values.
You might want to consider using the OPTION (RECOMPILE) query hint, or breaking the stored procedure down into pieces that each handle a specific condition (i.e. one select for null variables, one select for non-null).
For more information, read this article.
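As a rough sketch of the first suggestion, the hint goes at the end of the catch-all statement. The table variable @StatsVehicle, its Id column, and the sample rows below are invented for illustration:

```sql
-- Hypothetical sample data standing in for the StatsVehicle table.
DECLARE @StatsVehicle TABLE (Id int, ReferenceMakeId int NULL, ReferenceModelId int NULL);

INSERT INTO @StatsVehicle (Id, ReferenceMakeId, ReferenceModelId)
VALUES (1, 10, 100),
       (2, 10, 101),
       (3, 20, 200);

-- No model supplied, so the make filter applies.
DECLARE @referenceMakeId int = 10,
        @referenceModeleId int = NULL;

SELECT Id
FROM @StatsVehicle
WHERE ReferenceModelId = @referenceModeleId
   OR (@referenceModeleId IS NULL
       AND (@referenceMakeId IS NULL OR ReferenceMakeId = @referenceMakeId))
OPTION (RECOMPILE);  -- compile a plan for the current values instead of reusing a cached one
-- Returns Ids 1 and 2 for the values above.
```

The trade-off is a compilation on every execution, which tends to be acceptable for queries that run infrequently but may matter for ones called many times per second.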
If @referenceModelId is not null, it will only need to filter by
ReferenceModelId, and ignore the other statements inside it. If I
actually do this as such, it returns all the records. Is there
anything that can be done to achieve such a thing?
You could use a CASE expression, which gives you a short-circuit mechanism:
WHERE
CASE
WHEN @referenceModelId is not null AND ReferenceModelId = @referenceModelId THEN 1
WHEN @referenceModelId is null AND @referenceMakeId is not null AND ReferenceMakeId = @referenceMakeId THEN 1
WHEN @referenceModelId is null AND @referenceMakeId is null THEN 1
ELSE 0
END = 1

sql cte if statement

This might be a bad idea but just wondering?
;with Products
AS
(
/* can you have an if statement in here? */
select * from products
/* or */
select * from products
where condition
)
What I'm thinking is: what if you sometimes have a search string and sometimes don't? How do you account for that in a CTE?
Or maybe it would be a better idea to have 2 cte in one procedure?
If you're passing in a search string as a parameter, you can check that it is null or not all in one statement. For example:
select *
from MyTable
where MyColumn = @SearchString
or @SearchString is null;
This will return records that match when the parameter is not null, and return all records when it is null.
As another option, you can always put case statements in your where clause.
Beyond that, if you truly need different queries, you can certainly branch with if BUT your query must be the very next statement after you declare the CTE. Thus you'd have to have a copy of your CTE and query in each branch of the if statement.
If you're thinking of passing an entire where clause and running it all as dynamic SQL (edit: meaning a non-parameterized concatenated string, not ORM-type sp_executesql), I would try refactoring to use any of the above methods first, as there are inherent problems with dynamic SQL. Dynamic SQL often looks clever and elegant at the outset, but should more often be seen as a last resort only when other options somehow turn out to be worse.
A table variable might also help you:
DECLARE @tbl TABLE(id int,name varchar(500), .... )
if <@booleanexpression = 1>
INSERT INTO @tbl select * from products
else
INSERT INTO @tbl select * from products where condition..
;with cte as
( select * from @tbl )
select * from cte

Is there any way of improving the performance of this SQL Function?

I have a table which looks something like
Event ID Date Instructor
1 1/1/2000 Person 1
1 1/1/2000 Person 2
Now what I want to do is return this data so that each event is on one row and the Instructors are all in one column split with a <br> tag like 'Person 1 <br> Person 2'
Currently the way I have done this is to use a function
CREATE FUNCTION fnReturnInstructorNamesAsHTML
(
@EventID INT
)
RETURNS VARCHAR(max)
BEGIN
DECLARE @Result VARCHAR(MAX)
SELECT
@Result = coalesce(@Result + '<br>', '') + inst.InstructorName
FROM
[OpsInstructorEventsView] inst
WHERE
inst.EventID = @EventID
RETURN @Result
END
Then my main stored procedure calls it like
SELECT
ev.[BGcolour],
ev.[Event] AS name,
ev.[eventid] AS ID,
ev.[eventstart],
ev.[CourseType],
ev.[Type],
ev.[OtherType],
ev.[OtherTypeDesc],
ev.[eventend],
ev.[CourseNo],
ev.[Confirmed],
ev.[Cancelled],
ev.[DeviceID] AS resource_id,
ev.Crew,
ev.CompanyName ,
ev.Notes,
dbo.fnReturnInstructorNamesAsHTML(ev.EventID) as Names
FROM
[OpsSimEventsView] ev
JOIN
[OpsInstructorEventsView] inst
ON
ev.EventID = inst.EventID
This is very slow; I'm looking at 4 seconds per call to the DB. Is there a way for me to improve the performance of the function? It's a fairly small function, so I'm not sure what I can do here, and I couldn't see a way to work the COALESCE into the SELECT of the main procedure.
Any help would be really appreciated, thanks.
You could try something like this.
SELECT
ev.[BGcolour],
ev.[Event] AS name,
ev.[eventid] AS ID,
ev.[eventstart],
ev.[CourseType],
ev.[Type],
ev.[OtherType],
ev.[OtherTypeDesc],
ev.[eventend],
ev.[CourseNo],
ev.[Confirmed],
ev.[Cancelled],
ev.[DeviceID] AS resource_id,
ev.Crew,
ev.CompanyName ,
ev.Notes,
STUFF((SELECT '<br>'+inst.InstructorName
FROM [OpsInstructorEventsView] inst
WHERE ev.EventID = inst.EventID
FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 4, '') as Names
FROM
[OpsSimEventsView] ev
Not sure why you have joined OpsInstructorEventsView in the main query. I removed it here but if you needed you can just add it again.
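On SQL Server 2017 or later, STRING_AGG is another way to build the same list without FOR XML PATH. This is a sketch of a different technique, not part of the answer above; the table variable @Instructors and its rows are invented to stand in for OpsInstructorEventsView:

```sql
-- Requires SQL Server 2017 or later.
-- Hypothetical sample data standing in for OpsInstructorEventsView.
DECLARE @Instructors TABLE (EventID int, InstructorName varchar(100));

INSERT INTO @Instructors (EventID, InstructorName)
VALUES (1, 'Person 1'),
       (1, 'Person 2'),
       (2, 'Person 3');

-- One row per event, with the instructor names joined by <br>.
SELECT EventID,
       STRING_AGG(InstructorName, '<br>') WITHIN GROUP (ORDER BY InstructorName) AS Names
FROM @Instructors
GROUP BY EventID;
-- Event 1 yields 'Person 1<br>Person 2'; event 2 yields 'Person 3'.
```

The aggregated result could then be joined to the events view on EventID, which keeps the concatenation set-based instead of calling a function for every row.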
A few things to look at:
1) The overhead of functions makes them expensive to call, especially in the select statement of a query that could potentially be returning thousands of rows. It will have to execute that function for every one of them. Consider merging the behavior of the function into your main stored procedure, where the SQL Server can make better use of its optimizer.
2) Since you are joining on event id in both tables, make sure you have an index on those two columns. I would expect that you do, given that those both appear to be primary key columns, but make sure. An index can make a huge difference.
3) Convert your coalesce call into its equivalent case statements to remove the overhead of calling that function.
Yes make it an INLINE Table-Valued SQL function:
CREATE FUNCTION fnReturnInstructorNamesAsHTML
( @EventID INT )
RETURNS Table
As
Return
SELECT InstructorName + '<br>' result
FROM OpsInstructorEventsView
WHERE EventID = @EventID
Go
Then, in your SQL Statement, use it like this
SELECT [Other stuff],
(Select result from dbo.fnReturnInstructorNamesAsHTML(ev.EventID)) as Names
FROM OpsSimEventsView ev
JOIN OpsInstructorEventsView inst
ON ev.EventID = inst.EventID
I'm not exactly clear how the query you show in your question is concatenating data from multiple rows in one row of the result, but the problem is that ordinary UDFs are compiled on use, on EVERY use, so for each row in your output result the query processor has to recompile the UDF again. This is NOT true for an "inline table valued" UDF, as its SQL is folded into the outer SQL before it is passed to the SQL optimizer (the subsystem that generates the statement cache plan), and so the UDF is only compiled once.