Performance implications of SQL 'OR' conditions when one alternative is trivial?

I'm creating a stored procedure for searching some data in my database according to some criteria input by the user.
My SQL code looks like this:
Create Procedure mySearchProc
(
@IDCriteria bigint=null,
...
@MaxDateCriteria datetime=null
)
as
select Col1,...,Coln from MyTable
where (@IDCriteria is null or ID=@IDCriteria)
...
and (@MaxDateCriteria is null or Date<@MaxDateCriteria)
Edit: I have around 20 possible parameters, and any combination of non-null parameters can occur.
Is it ok performance-wise to write this kind of code? (I'm using MS SQL Server 2008)
Would generating SQL code containing only the needed where clauses be notably faster?

OR clauses are notorious for causing performance issues, mainly because they often prevent index seeks and force scans. If you can write the query without ORs, you'll be better off.

where (@IDCriteria is null or ID=@IDCriteria)
and (@MaxDateCriteria is null or Date<@MaxDateCriteria)
If you write the criteria like this, SQL Server will not know whether it is better to use the index on ID or the index on Date.
For proper optimization, it is far better to write separate queries for each case and use IF to guide you to the correct one.
IF @IDCriteria is not null and @MaxDateCriteria is not null
    --query
    WHERE ID = @IDCriteria and Date < @MaxDateCriteria
ELSE IF @IDCriteria is not null
    --query
    WHERE ID = @IDCriteria
ELSE IF @MaxDateCriteria is not null
    --query
    WHERE Date < @MaxDateCriteria
ELSE
    --query
    WHERE 1 = 1
If you expect to need different plans out of the optimizer, you need to write different queries to get them!!
Would generating SQL code containing only the needed where clauses be notably faster?
Yes - if you expect the optimizer to choose between different plans.
Edit:
DECLARE @CustomerNumber int, @CustomerName varchar(30)
SET @CustomerNumber = 123
SET @CustomerName = '123'
SELECT * FROM Customers
WHERE (CustomerNumber = @CustomerNumber OR @CustomerNumber is null)
AND (CustomerName = @CustomerName OR @CustomerName is null)
CustomerName and CustomerNumber are indexed. The optimizer says: "Clustered Index Scan with parallelization". You can't write a worse single-table query.
Edit: I have around 20 possible parameters, and any combination of non-null parameters can occur.
We had a similar "search" functionality in our database. When we looked at the actual queries issued, 99.9% of them used an AccountIdentifier. In your case, I suspect either one column is always supplied, or one of two columns is always supplied. This would lead to 2 or 3 cases respectively.
It's not important to remove OR's from the whole structure. It is important to remove OR's from the column/s that you expect the optimizer to use to access the indexes.
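For example (a sketch; AccountIdentifier comes from the situation described above, while @StatusCriteria and Status are illustrative names), keep a hard, seekable predicate on the column whose index you expect the optimizer to use, and leave the OR pattern only on the remaining, less selective filters:
WHERE AccountIdentifier = @AccountIdentifier                     -- seekable: this drives the index
  AND (@MaxDateCriteria IS NULL OR Date < @MaxDateCriteria)      -- optional filters keep the OR form
  AND (@StatusCriteria IS NULL OR Status = @StatusCriteria)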

So, to boil down the above comments:
Create a separate sub-procedure for each of the most popular combinations of parameters, and have a dispatcher procedure call the appropriate one from an IF ... ELSE structure, with a final catch-all ELSE that builds a query dynamically to cover the remaining cases.
Perhaps only one or two cases will be specifically coded at first, but as time goes by and particular combinations of parameters are identified as statistically significant, implementation procedures can be written and the master IF ... ELSE construct extended to identify those cases and call the appropriate sub-procedure.
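A hedged sketch of that dispatcher shape, with made-up sub-procedure names, might look like:
CREATE PROCEDURE mySearchProc
(
    @IDCriteria bigint = null,
    -- ... the other optional parameters
    @MaxDateCriteria datetime = null
)
AS
BEGIN
    IF @IDCriteria IS NOT NULL
        EXEC mySearchProc_ByID @IDCriteria;                        -- hand-tuned, indexed path
    ELSE IF @MaxDateCriteria IS NOT NULL
        EXEC mySearchProc_ByMaxDate @MaxDateCriteria;              -- another statistically common path
    ELSE
        EXEC mySearchProc_Generic @IDCriteria, @MaxDateCriteria;   -- catch-all (dynamic SQL or WITH RECOMPILE)
END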

Regarding "Would generating SQL code containing only the needed where clauses be notably faster?"
I don't think so, because this way you effectively remove the positive effects of query plan caching.

You could run selective queries first, in order of the most common / most efficient (indexed, etc.) parameters, and add the matching PK(s) to a temporary table.
That would create a (hopefully small!) subset of data.
Then join that temporary table with the main table, using the full WHERE clause in a
SELECT ...
FROM #TempTable AS T
JOIN dbo.MyTable AS M
ON M.ID = T.ID
WHERE (@IDCriteria IS NULL OR M.ID=@IDCriteria)
...
AND (@MaxDateCriteria IS NULL OR M.Date<@MaxDateCriteria)
style to refine the (small) subset.
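Building the temporary table itself might look roughly like this (a sketch; only two of the selective, indexed criteria are shown, and the case where no selective parameter is supplied still needs a fallback):
CREATE TABLE #TempTable (ID bigint PRIMARY KEY);

IF @IDCriteria IS NOT NULL
    INSERT #TempTable (ID)
    SELECT ID FROM dbo.MyTable WHERE ID = @IDCriteria;
ELSE IF @MaxDateCriteria IS NOT NULL
    INSERT #TempTable (ID)
    SELECT ID FROM dbo.MyTable WHERE Date < @MaxDateCriteria;
-- ELSE: no selective criterion supplied; fall back to the generic query instead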

What if constructs like these were replaced:
WHERE (@IDCriteria IS NULL OR @IDCriteria=ID)
AND (@MaxDateCriteria IS NULL OR Date<@MaxDateCriteria)
AND ...
with ones like these:
WHERE ID = ISNULL(@IDCriteria, ID)
AND Date < ISNULL(@MaxDateCriteria, DATEADD(millisecond, 1, Date))
AND ...
or is this just coating the same unoptimizable query in syntactic sugar?

Choosing the right index is hard for the optimizer. IMO, this is one of the few cases where dynamic SQL is the best option.

This is one of the cases where I use code building, or a separate sproc for each search option.
Since your search is so complex, I'd go with code building.
You can do this either in application code or with dynamic SQL.
Just be careful about SQL injection.

I suggest one step further than some of the other suggestions - think about degeneralizing at a much higher abstraction level, preferably the UI structure. Usually this seems to happen when the problem is being pondered in data mode rather than user domain mode.
In practice, I've found that almost every such query has one or more non-null, fairly selective columns that would be reasonably optimizable, if one (or more) were specified. Furthermore, these are usually reasonable assumptions that users can understand.
Example: Find Orders by Customer; or Find Orders by Date Range; or Find Orders By Salesperson.
If this pattern applies, then you can decompose your hypergeneralized query into more purposeful subqueries that also make sense to users, and you can reasonably prompt for required values (or ranges), and not worry too much about crafting efficient expressions for subsidiary columns.
You may still end up with an "All Others" category. But at least then if you provide what is essentially an open-ended Query By Example form, then users will have some idea what they're getting into. Doing what you describe really puts you in the role of trying to out-think the query optimizer, which is folly IMHO.

I'm currently working with SQL 2005, so I don't know if the 2008 optimizer acts differently. That being said, I've found that you need to do a couple of things...
Make sure that you are using OPTION (RECOMPILE) on the query (or WITH RECOMPILE on the procedure)
Use CASE statements to cause short-circuiting of the logic. At least in 2005 this is NOT done with OR statements. For example:
SELECT
...
FROM
...
WHERE
(1 =
    CASE
        WHEN @my_column IS NULL THEN 1
        WHEN my_column = @my_column THEN 1
        ELSE 0
    END
)
The CASE statement will cause the SQL Server optimizer to recognize that it doesn't need to continue past the first WHEN. In this example it's not a big deal, but in my search procs a non-null parameter often meant searching in another table through a subquery for existence of a matching row, which got costly. Once I made this change the search procs started running much faster.
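For the costly-subquery case described above, the same CASE guard might look roughly like this (a sketch; the table and column names are made up):
SELECT o.OrderID
FROM dbo.Orders AS o                                    -- hypothetical outer table
WHERE
    (1 = CASE
             WHEN @CustomerName IS NULL THEN 1          -- parameter not supplied: skip the subquery
             WHEN EXISTS (SELECT 1
                          FROM dbo.Customers AS c       -- hypothetical related table
                          WHERE c.CustomerID = o.CustomerID
                            AND c.CustomerName = @CustomerName) THEN 1
             ELSE 0
         END)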

My suggestion is to build the SQL string. You will get the maximum benefit from indexes and still reuse execution plans.
DECLARE @sql nvarchar(4000);
SET @sql = N''
IF @param1 IS NOT NULL
    SET @sql = @sql + CASE WHEN @sql = N'' THEN N'' ELSE N' AND ' END + N'param1 = @param1';
IF @param2 IS NOT NULL
    SET @sql = @sql + CASE WHEN @sql = N'' THEN N'' ELSE N' AND ' END + N'param2 = @param2';
...
IF @paramN IS NOT NULL
    SET @sql = @sql + CASE WHEN @sql = N'' THEN N'' ELSE N' AND ' END + N'paramN = @paramN';
IF @sql <> N''
    SET @sql = N' WHERE ' + @sql;
SET @sql = N'SELECT ... FROM myTable' + @sql;
EXEC sp_executesql @sql, N'@param1 type, @param2 type, ..., @paramN type', @param1, @param2, ..., @paramN;
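Because each distinct combination of non-null parameters produces a different SQL text, each combination gets its own cached, reusable plan through sp_executesql. Filled in for two of the question's parameters (a sketch; the select list is abbreviated), the pattern reads:
DECLARE @sql nvarchar(4000) = N'';

IF @IDCriteria IS NOT NULL
    SET @sql = @sql + CASE WHEN @sql = N'' THEN N'' ELSE N' AND ' END + N'ID = @IDCriteria';
IF @MaxDateCriteria IS NOT NULL
    SET @sql = @sql + CASE WHEN @sql = N'' THEN N'' ELSE N' AND ' END + N'Date < @MaxDateCriteria';

IF @sql <> N''
    SET @sql = N' WHERE ' + @sql;

SET @sql = N'SELECT Col1, Coln FROM dbo.MyTable' + @sql;

EXEC sp_executesql @sql,
     N'@IDCriteria bigint, @MaxDateCriteria datetime',
     @IDCriteria, @MaxDateCriteria;   -- parameters not referenced in the built text can simply be passed as NULL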

Each time the procedure is called with different parameters, there is a different optimal execution plan for getting the data. The problem is that SQL Server has cached an execution plan for your procedure and will use a sub-optimal (read: terrible) execution plan.
I would recommend:
Create specific SPs for frequently run execution paths (i.e. passed parameter sets) optimised for each scenario.
Keep your main generic SP for edge cases (presuming they are rarely run), but use the WITH RECOMPILE clause to cause a new execution plan to be created each time the procedure is run.
We use OR clauses checking against NULLs for optional parameters to great effect. It works very well without the RECOMPILE option so long as the execution path is not drastically altered by passing different parameters.
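For the generic catch-all in the second recommendation, the WITH RECOMPILE form might look like this (a sketch; the procedure name is made up and the column list is abbreviated; a statement-level OPTION (RECOMPILE) hint is an alternative with the same intent):
CREATE PROCEDURE mySearchProc_Generic   -- hypothetical name for the edge-case catch-all
(
    @IDCriteria bigint = NULL,
    -- ... the remaining optional parameters
    @MaxDateCriteria datetime = NULL
)
WITH RECOMPILE   -- compile a fresh plan on every execution
AS
SELECT Col1, /* ..., */ Coln
FROM MyTable
WHERE (@IDCriteria IS NULL OR ID = @IDCriteria)
  -- ... the remaining optional filters
  AND (@MaxDateCriteria IS NULL OR Date < @MaxDateCriteria);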

Related

SQL: short-circuiting not working. "Null or empty full-text predicate" after upgrading to SQL Server 2012

I have the following query in SQL Server 2005 which works fine:
DECLARE @venuename NVARCHAR(100)
DECLARE @town NVARCHAR(100)
SET @venuename = NULL -- normally these are parameters in the stored proc.
SET @town = 'London'
SELECT COUNT(*) FROM dbo.Venue
WHERE
(@VenueName IS NULL OR CONTAINS((Venue.VenueName), @VenueName))
AND
(@Town IS NULL OR Town LIKE @Town + '%')
It uses short-circuiting when null values are passed for the parameters (there are many more in the real SP than shown in my example).
However, after upgrading to SQL 2012, running this query with NULL passed for @VenueName fails with the error "Null or empty full-text predicate", as SQL Server seems to be running (or evaluating) the CONTAINS statement for @VenueName even when @VenueName is set to NULL.
Is there a way to use short-circuiting in 2012 or is this no longer possible? I'd hate to have to rewrite all of my SPs as we've used this technique in dozens of stored procedures across multiple projects over the years.
I do not know much about SQL 2012, but can you please try the following:
DECLARE @venuename NVARCHAR(100)
DECLARE @town NVARCHAR(100)
SET @venuename = '""' -- yes, '""' instead of NULL
SET @town = 'London'
SELECT COUNT(*) FROM dbo.Venue
WHERE
(@VenueName = '""' OR CONTAINS((Venue.VenueName), @VenueName))
AND
(@Town IS NULL OR Town LIKE @Town + '%')
Check out this thread: OR Operator Short-circuit in SQL Server. Within SQL Server, there is no guarantee that an OR clause breaks early. It has always been that way, so I guess you've just been lucky that it worked with SQL Server 2005.
To work around your problem, consider applying the ISNULL function to every parameter value that might be NULL before it is supplied to the CONTAINS function.
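One way to apply that suggestion (a sketch; @VenueNameSearch and the N'"dummy"' sentinel are made-up names, and the sentinel only exists so the full-text predicate is never NULL or empty even when the IS NULL branch makes the OR true):
DECLARE @VenueNameSearch NVARCHAR(100) = ISNULL(@VenueName, N'"dummy"');  -- never NULL or empty

SELECT COUNT(*)
FROM dbo.Venue
WHERE (@VenueName IS NULL OR CONTAINS(Venue.VenueName, @VenueNameSearch))
  AND (@Town IS NULL OR Town LIKE @Town + '%');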
This is the perfect answer:
Let's examine these two statements:
IF (CONDITION 1) OR (CONDITION 2)
..
IF (CONDITION 3) AND (CONDITION 4)
...
If CONDITION 1 is TRUE, will CONDITION 2 be checked?
If CONDITION 3 is FALSE, will CONDITION 4 be checked?
What about conditions on WHERE: does the SQL Server engine optimize all conditions in a WHERE clause? Should programmers place conditions in the right order to be sure that the SQL Server optimizer resolves it in the right manner?
ADDED:
Thanks to Jack for the link; a surprising result from the T-SQL code:
IF 1/0 = 1 OR 1 = 1
SELECT 'True' AS result
ELSE
SELECT 'False' AS result
IF 1/0 = 1 AND 1 = 0
SELECT 'True' AS result
ELSE
SELECT 'False' AS result
No divide-by-zero exception is raised in either case.
CONCLUSION:
If C++/C#/VB has short-circuiting why can't SQL Server have it?
To truly answer this, let's take a look at how both work with conditions. C++/C#/VB all have short-circuiting defined in their language specifications to speed up code execution. Why bother evaluating N OR conditions when the first one is already true, or M AND conditions when the first one is already false?
We as developers have to be aware that SQL Server works differently. It is a cost-based system. To get the optimal execution plan for our query, the query processor has to evaluate every WHERE condition and assign it a cost. These costs are then evaluated as a whole to produce an overall plan cost, which must be lower than the threshold SQL Server has defined for a good plan. If the cost is lower than the defined threshold, the plan is used; if not, the whole process is repeated with a different mix of condition costs. Cost here is a scan, a seek, a merge join, a hash join, etc. Because of this, the short-circuiting that is available in C++/C#/VB simply isn't possible. You might think that forcing the use of an index on a column counts as short-circuiting, but it doesn't: it only forces the use of that index and thereby shortens the list of possible execution plans. The system is still cost based.
As a developer you must be aware that SQL Server does not do short-circuiting like it is done in other programming languages and there's nothing you can do to force it to.

SQL Server 2008, different WHERE clauses with one query

I have a stored procedure which takes the same columns but with different WHERE clause.
Something like this.
SELECT
alarms.startt, alarms.endt, clients.code, clients.Plant,
alarms.controller, alarmtype.atype, alarmstatus.[text]
FROM alarms
INNER JOIN clients ON alarms.clientid = clients.C_id
INNER JOIN alarmstatus ON alarms.statusid = alarmstatus.AS_id
INNER JOIN alarmtype ON alarms.typeid = alarmtype.AT_id
and I put the same query in 3 IFs (conditions) where the WHERE clause changes according to the parameter passed in a variable.
Do I have to write the whole string over and over for each condition in every if?
Or can I optimize it to one time and the only thing what will change will be the WHERE clause?
You don't have to; you can get around it by doing something like
SELECT *
FROM [Query]
WHERE (@Parameter = 1 AND Column1 = 8)
OR (@Parameter = 2 AND Column2 = 8)
OR (@Parameter = 3 AND Column3 = 8)
However, just because you can do something, does not mean you should. Less verbose SQL does not mean better performance, so using something like:
IF @Parameter = 1
BEGIN
SELECT *
FROM [Query]
WHERE Column1 = 8
END
ELSE IF @Parameter = 2
BEGIN
SELECT *
FROM [Query]
WHERE Column2 = 8
END
ELSE IF @Parameter = 3
BEGIN
SELECT *
FROM [Query]
WHERE Column3 = 8
END
while equivalent to the first query, should result in better performance, as each branch can be optimised separately.
You can avoid repeating the code if you do something like:
WHERE (col1 = @var1 AND @var1 IS NOT NULL)
OR ...
OPTION (RECOMPILE);
You can also have some effect on this behavior with the parameterization setting of the database (simple vs. forced).
Something that avoids repeating the code and avoids sub-optimal plans due to parameter sniffing is to use dynamic SQL:
DECLARE @sql NVARCHAR(MAX) = N'SELECT ...';
IF @var1 IS NOT NULL
SET @sql = @sql + ' WHERE ...';
This may work better if you have the server setting "optimize for ad hoc queries" enabled.
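For reference, that setting is a server-level sp_configure option (the actual option name is "optimize for ad hoc workloads"); a sketch of enabling it:
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'optimize for ad hoc workloads', 1;  -- cache only a plan stub until a query is seen a second time
RECONFIGURE;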
I would probably stick with repeating the whole SQL Statement, but have resorted to this in the past...
WHERE (@whichwhere=1 AND mytable.field1=@id)
OR (@whichwhere=2 AND mytable.field2=@id)
OR (@whichwhere=3 AND mytable.field3=@id)
Not particularly readable, and you will have to check the execution plan if it is slow, but it keeps you from repeating the code.
Since no one has suggested this: you can put the original query in a view and then access the view with different WHERE clauses.
To improve performance, you can even add indexes to the view if you know what columns will be commonly used in the WHERE clause (check out http://msdn.microsoft.com/en-us/library/dd171921(v=sql.100).aspx).
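A hedged sketch of what an indexed view over the query above might look like (the view name, index name, and the assumption that alarms has a unique ID column are all illustrative; indexed views require SCHEMABINDING, two-part table names, inner joins only, and a unique clustered index created first):
CREATE VIEW dbo.vAlarmDetails
WITH SCHEMABINDING
AS
SELECT alarms.ID,                                   -- assumed unique key of alarms
       alarms.startt, alarms.endt, clients.code, clients.Plant,
       alarms.controller, alarmtype.atype, alarmstatus.[text]
FROM dbo.alarms
INNER JOIN dbo.clients ON alarms.clientid = clients.C_id
INNER JOIN dbo.alarmstatus ON alarms.statusid = alarmstatus.AS_id
INNER JOIN dbo.alarmtype ON alarms.typeid = alarmtype.AT_id;
GO
CREATE UNIQUE CLUSTERED INDEX IX_vAlarmDetails ON dbo.vAlarmDetails (ID);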
Well, like most things in SQL: it depends. There are a few considerations here.
1. Would the different WHEREs lead to substantially different query plans for execution, e.g. one of the columns indexed but not the other two?
2. Is the query likely to change over time, i.e. customer requirements needing other columns?
3. Is the WHERE likely to grow to 4, then 8, then 16, etc. options?
One approach is to EXEC different procs, inserting the results into a temp table. Each proc would then have its own query plan.
Another approach would be to use dynamic SQL; once again each "query" would be assigned its own plan.
A third approach would be to write an app that generates the SQL for each option; this could be either a stored proc or a SQL string.
Then have a data set and do test-driven development against it (this is true for each approach).
In the end the best learning solution is probably to:
a) read about SQL Server internals (Kalen Delaney's Inside SQL Server is an acknowledged reference), and
b) test your own solutions against your own data.
I would go this way:
WHERE 8 = CASE @parameter
              WHEN 1 THEN Column1
              WHEN 2 THEN Column2
              ...
          END

Reduce dynamic SQL using CASE to use "IN" or not

I am converting a stored procedure which I had previously written by building a string and then, using BIT parameters, deciding whether to append certain WHERE/ON clauses.
This SP is passed a number of comma-separated strings, and then some of the dynamic WHERE clauses are built like:
IF @pUse_Clause_A = 1 SET @WhereClause = @WhereClause + ' AND [FIELD_A] IN (' + @pComma_Separated_List_A + ')'
In this case, @pComma_Separated_List_A is something like '1,3,6,66,22' ... a list of the things I want included.
Now I am changing these from strings into TVPs, so I can just use "real" SQL like
AND [FIELD_A] IN (SELECT [TVP_FIELD] FROM @pTVP_A)
When I do this, I don't like the string-building method
However, I also don't like having to nest the IF statements.
IF A
ENTIRE SQL WHERE A
ELSE
ENTIRE SQL WITHOUT WHERE CLAUSE
The more parameters I add, the more complicated it gets:
IF A
IF B
SQL WHERE A AND B
ELSE
SQL WHERE A
ELSE
IF B
SQL WHERE B
ELSE
SQL
What I would rather do is something like this:
SELECT * FROM TABLE
WHERE 1=1
CASE USE_A WHEN 1 THEN
AND [FIELD_A] IN (SELECT A FROM TVP_A)
END
CASE USE_B WHEN 1 THEN
AND [FIELD_B] IN (SELECT B FROM TVP_B)
END
I know the IF approach only runs the SQL in the chosen branch, but having all that duplicated statement text seems sloppy.
Dynamically changing searches based on the given parameters is a complicated subject, and doing it one way over another, even with only a very slight difference, can have massive performance implications. The key is to get a good query execution plan that uses an index; don't worry about compact code or about repeating code.
Read this and consider all the methods. Your best method will depend on your parameters, your data, your schema, and your actual usage:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
The Curse and Blessings of Dynamic SQL by Erland Sommarskog
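As a concrete starting point before diving into those articles, the closest valid T-SQL to the pseudocode in the question is probably the guarded-OR pattern discussed earlier on this page, ideally with a statement-level recompile (a sketch only; @Use_A / @Use_B stand for the BIT parameters, @pTVP_A / @pTVP_B for the table-valued parameters, and the table name is illustrative):
SELECT *
FROM dbo.MyTable
WHERE (@Use_A = 0 OR [FIELD_A] IN (SELECT A FROM @pTVP_A))
  AND (@Use_B = 0 OR [FIELD_B] IN (SELECT B FROM @pTVP_B))
OPTION (RECOMPILE);   -- without this hint, one shared plan has to cover every flag combination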

Conditional Joins - Dynamic SQL

The DBA here at work is trying to turn my straightforward stored procs into a dynamic sql monstrosity. Admittedly, my stored procedure might not be as fast as they'd like, but I can't help but believe there's an adequate way to do what is basically a conditional join.
Here's an example of my stored proc:
SELECT
*
FROM
table
WHERE
(
@Filter IS NULL OR table.FilterField IN
(SELECT Value FROM dbo.udfGetTableFromStringList(@Filter, ','))
)
The UDF turns a comma delimited list of filters (for example, bank names) into a table.
Obviously, having the filter condition in the where clause isn't ideal. Any suggestions of a better way to conditionally join based on a stored proc parameter are welcome. Outside of that, does anyone have any suggestions for or against the dynamic sql approach?
Thanks
You could INNER JOIN on the table returned from the UDF instead of using it in an IN clause
Your UDF might be something like
CREATE FUNCTION [dbo].[csl_to_table] (@list varchar(8000))
RETURNS @list_table TABLE ([id] INT)
AS
BEGIN
    DECLARE @index INT,
            @start_index INT,
            @id INT
    SELECT @index = 1
    SELECT @start_index = 1
    WHILE @index <= DATALENGTH(@list)
    BEGIN
        IF SUBSTRING(@list, @index, 1) = ','
        BEGIN
            SELECT @id = CAST(SUBSTRING(@list, @start_index, @index - @start_index) AS INT)
            INSERT @list_table ([id]) VALUES (@id)
            SELECT @start_index = @index + 1
        END
        SELECT @index = @index + 1
    END
    SELECT @id = CAST(SUBSTRING(@list, @start_index, @index - @start_index) AS INT)
    INSERT @list_table ([id]) VALUES (@id)
    RETURN
END
and then INNER JOIN on the ids in the returned table. This UDF assumes that you're passing in INTs in your comma separated list
EDIT:
In order to handle a NULL or no value being passed in for @filter, the most straightforward way that I can see would be to execute a different query within the sproc based on the @filter value. I'm not certain how this affects the cached execution plan (will update if someone can confirm), or whether the end result would be faster than your original sproc; I think the answer here lies in testing.
Looks like the rewrite of the code is being addressed in another answer, but a good argument against dynamic SQL in a stored procedure is that it breaks the ownership chain.
That is, when you call a stored procedure normally, it executes under the permissions of the stored procedure owner, EXCEPT when executing dynamic SQL with the EXECUTE command; for the context of the dynamic SQL it reverts back to the permissions of the caller, which may be undesirable depending on your security model.
In the end, you are probably better off compromising and rewriting it to address the concerns of the DBA while avoiding dynamic SQL.
I am not sure I understand your aversion to dynamic SQL. Perhaps it is that your UDF has nicely abstracted away some of the messiness of the problem, and you feel dynamic SQL will bring that back. Well, consider that most if not all DAL or ORM tools rely extensively on dynamic SQL, and I think your problem could be restated as "how can I nicely abstract away the messiness of dynamic SQL".
For my part, dynamic SQL gives me exactly the query I want, and subsequently the performance and behavior I am looking for.
I don't see anything wrong with your approach. Rewriting it to use dynamic SQL to execute two different queries based on whether #Filter is null seems silly to me, honestly.
The only potential downside I can see of what you have is that it could cause some difficulty in determining a good execution plan. But if the performance is good enough as it is, there's no reason to change it.
No matter what you do (and the answers here all have good points), be sure to compare the performance and execution plans of each option.
Sometimes, hand optimization is simply pointless if it impacts your code maintainability and really produces no difference in how the code executes.
I would first simply look at changing the IN to a simple LEFT JOIN with NULL check (this doesn't get rid of your udf, but it should only get called once):
SELECT *
FROM table
LEFT JOIN dbo.udfGetTableFromStringList(@Filter, ',') AS filter
ON table.FilterField = filter.Value
WHERE @Filter IS NULL
OR filter.Value IS NOT NULL
It appears that you are trying to write a single query to deal with two scenarios:
1. @filter = "x,y,z"
2. @filter IS NULL
To optimise scenario 1, I would INNER JOIN on the UDF, rather than use an IN clause...
SELECT * FROM table
INNER JOIN dbo.udfGetTableFromStringList(@Filter, ',') AS filter
ON table.FilterField = filter.Value
To optimise for scenario 2, I would NOT try to adapt the existing query; instead I would deliberately keep those cases separate, using either an IF statement or a UNION that simulates the IF with a WHERE clause...
TSQL IF
IF (@filter IS NULL)
    SELECT * FROM table
ELSE
    SELECT * FROM table
    INNER JOIN dbo.udfGetTableFromStringList(@Filter, ',') AS filter
    ON table.FilterField = filter.Value
UNION to Simulate IF
SELECT * FROM table
INNER JOIN dbo.udfGetTableFromStringList(@Filter, ',') AS filter
ON table.FilterField = filter.Value
UNION ALL
SELECT * FROM table WHERE @filter IS NULL
The advantage of such designs is that each case is simple, and determining which case applies is itself simple. Combining the two into a single query, however, leads to compromises such as LEFT JOINs, and so introduces a significant performance loss for both.

Stored procedure bit parameter activating additional where clause to check for null

I have a stored procedure that looks like:
CREATE PROCEDURE dbo.usp_TestFilter
@AdditionalFilter BIT = 1
AS
SELECT *
FROM dbo.SomeTable T
WHERE
T.Column1 IS NOT NULL
AND CASE WHEN @AdditionalFilter = 1 THEN
T.Column2 IS NOT NULL
Needless to say, this doesn't work. How can I activate the additional where clause that checks for the #AdditionalFilter parameter? Thanks for any help.
CREATE PROCEDURE dbo.usp_TestFilter
@AdditionalFilter BIT = 1
AS
SELECT *
FROM dbo.SomeTable T
WHERE
T.Column1 IS NOT NULL
AND (@AdditionalFilter = 0 OR
T.Column2 IS NOT NULL)
If @AdditionalFilter is 0, the column won't be evaluated since it can't affect the outcome of the part between brackets. If it's anything other than 0, the column condition will be evaluated.
This practice tends to confuse the query optimizer. I've seen SQL Server 2000 build the execution plan exactly the opposite way round and use an index on Column1 when the flag was set and vice-versa. SQL Server 2005 seemed to at least get the execution plan right on first compilation, but you then have a new problem. The system caches compiled execution plans and tries to reuse them. If you first use the query one way, it will still execute the query that way even if the extra parameter changes, and different indexes would be more appropriate.
You can force a stored procedure to be recompiled on this execution by using WITH RECOMPILE in the EXEC statement, or every time by specifying WITH RECOMPILE on the CREATE PROCEDURE statement. There will be a penalty as SQL Server re-parses and optimizes the query each time.
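Both forms, sketched against the procedure from the question:
-- Per-call recompile, chosen by the caller:
EXEC dbo.usp_TestFilter @AdditionalFilter = 0 WITH RECOMPILE;

-- Or always recompile, declared on the procedure itself:
CREATE PROCEDURE dbo.usp_TestFilter
    @AdditionalFilter BIT = 1
WITH RECOMPILE
AS
SELECT *
FROM dbo.SomeTable T
WHERE T.Column1 IS NOT NULL
  AND (@AdditionalFilter = 0 OR T.Column2 IS NOT NULL);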
In general, if the form of your query is going to change, use dynamic SQL generation with parameters. SQL Server will also cache execution plans for parameterized queries and auto-parameterized queries (where it tries to deduce which arguments are parameters), and even regular queries, but it gives most weight to stored procedure execution plans, then parameterized, auto-parameterized and regular queries in that order. The higher the weight, the longer it can stay in RAM before the plan is discarded, if the server needs the memory for something else.
CREATE PROCEDURE dbo.usp_TestFilter
@AdditionalFilter BIT = 1
AS
SELECT *
FROM dbo.SomeTable T
WHERE
T.Column1 IS NOT NULL
AND (@AdditionalFilter = 0 OR T.Column2 IS NOT NULL)
select *
from SomeTable t
where t.Column1 is not null
and (@AdditionalFilter = 0 or t.Column2 is not null)