Use case: I am going to be using SQL Server to retrieve values from a large table (1,000,000+ rows) where many different columns can be used as filter criteria, some more frequently used than others.
Questions
Would it be faster to utilize short-circuiting in the WHERE clause so that less comparisons are done?
Should the most commonly used criteria be filtered first to do even less comparisons?
Should the most commonly used criteria be indexed?
Example
No short circuiting
SELECT value
FROM AssignmentTable
WHERE (criteriaOne = <criteriaOneValue> OR criteriaOne IS NULL)
AND (criteriaTwo = <criteriaTwoValue> OR criteriaTwo IS NULL)
AND (criteriaThree = <criteriaThreeValue> OR criteriaThree IS NULL)
AND ... for all criteria (roughly 15)
With short circuiting
SELECT value
FROM AssignmentTable
WHERE 1 =
CASE
WHEN (criteriaOne = <criteriaOneValue> OR criteriaOne IS NULL) THEN
CASE
WHEN (criteriaTwo = <criteriaTwoValue> OR criteriaTwo IS NULL) THEN
CASE
WHEN (criteriaThree = <criteriaThreeValue> OR criteriaThree IS NULL) THEN 1
ELSE 0
END
ELSE 0
END
ELSE 0
END
The pattern for doing this without dynamic SQL in SQL Server is to use OPTION (RECOMPILE) to prune the un-needed predicates before the query optimizer generates a query plan.
EG:
SELECT value
FROM AssignmentTable
WHERE (Column1 = #column1 OR #column1 IS NULL)
AND (Column2 = #column2 OR #column2 IS NULL)
AND (Column3 = #column3 OR #column3 IS NULL)
AND ... for all criteria (roughly 15)
OPTION (RECOMPILE)
See the classic Dynamic Search Conditions in T-SQL for a complete discussion of the alternatives.
Related
I have a question for following 2 SQL:
declare #i1 bit, #b1 bit
declare #i2 bit, #b2 bit
declare #t table (Seq int)
insert into #t values (1)
-- verify data
select case when (select count(1) from #t n2 where 1 = 2) > 0 then 1 else 0 end
-- result 0
select #i1 = 1, #b1 = case when #i1 = 1 or ((select count(1) from #t n2 where 1 = 2) > 0) then 1 else 0 end from #t n where n.Seq = 1
select #i1, #b1
-- result 1, 0
select #i2 = 1, #b2 = case when #i2 = 1 or (0 > 0) then 1 else 0 end from #t n where n.Seq = 1
select #i2, #b2
-- result 1, 1
SQL Fiddle Here
Before the execute, I thought the case part should be null = 1 or (0 > 0), and it will return 0.
But now, I wondering why the 2nd SQL will return 1
Just to extend #Giorgi's answer:
See this execution plan:
Since #i2 is evaluated first (#i2=1), case when #i2 = 1 or anything returns 1.
See also this msdn entry: https://msdn.microsoft.com/en-us/library/ms187953.aspx and Caution section
If there are multiple assignment clauses in a single SELECT statement,
SQL Server does not guarantee the order of evaluation of the
expressions. Note that effects are only visible if there are
references among the assignments.
It's all related to internal optimization.
I will post this as an answer as it is quite large text from Training Kit (70-461):
WHERE propertytype = 'INT' AND CAST(propertyval AS INT) > 10
Some assume that unless precedence rules dictate otherwise, predicates
will be evaluated from left to right, and that short circuiting will
take place when possible. In other words, if the first predicate
propertytype = 'INT' evaluates to false, SQL Server won’t evaluate the
second predicate CAST(propertyval AS INT) > 10 because the result is
already known. Based on this assumption, the expectation is that the
query should never fail trying to convert something that isn’t
convertible.
The reality, though, is different. SQL Server does
internally support a short-circuit concept; however, due to the
all-at-once concept in the language, it is not necessarily going to
evaluate the expressions in left-to-right order. It could decide,
based on cost-related reasons, to start with the second expression,
and then if the second expression evaluates to true, to evaluate the
first expression as well. This means that if there are rows in the
table where propertytype is different than 'INT', and in those rows
propertyval isn’t convertible to INT, the query can fail due to a
conversion error.
Just to extend both answers.
From Dirty Secrets of the CASE Expression:
CASE will not always short circuit
The official documentation implies that the entire expression will short-circuit, meaning it will evaluate the expression from left-to-right, and stop evaluating when it hits a match:
The CASE statement evaluates its conditions sequentially and stops with the
first condition whose condition is satisfied.
And MS Connect:
CASE / COALESCE won't always evaluate in textual order
Aggregates Don't Follow the Semantics Of CASE
CASE Transact-SQL
The CASE statement evaluates its conditions sequentially and stops with the first condition whose condition is satisfied. In some situations, an expression is evaluated before a CASE statement receives the results of the expression as its input. Errors in evaluating these expressions are possible.
I am creating a SQL query in which I need a conditional where clause.
It should be something like this:
SELECT
DateAppr,
TimeAppr,
TAT,
LaserLTR,
Permit,
LtrPrinter,
JobName,
JobNumber,
JobDesc,
ActQty,
(ActQty-LtrPrinted) AS L,
(ActQty-QtyInserted) AS M,
((ActQty-LtrPrinted)-(ActQty-QtyInserted)) AS N
FROM
[test].[dbo].[MM]
WHERE
DateDropped = 0
--This is where i need the conditional clause
AND CASE
WHEN #JobsOnHold = 1 THEN DateAppr >= 0
ELSE DateAppr != 0
END
The above query is not working. Is this not the correct syntax or is there another way to do this that I don't know?
I don't want to use dynamic SQL, so is there any other way or do I have to use a workaround like using if else and using the same query with different where clauses?
Try this
SELECT
DateAppr,
TimeAppr,
TAT,
LaserLTR,
Permit,
LtrPrinter,
JobName,
JobNumber,
JobDesc,
ActQty,
(ActQty-LtrPrinted) AS L,
(ActQty-QtyInserted) AS M,
((ActQty-LtrPrinted)-(ActQty-QtyInserted)) AS N
FROM
[test].[dbo].[MM]
WHERE
DateDropped = 0
AND (
(ISNULL(#JobsOnHold, 0) = 1 AND DateAppr >= 0)
OR
(ISNULL(#JobsOnHold, 0) != 1 AND DateAppr != 0)
)
You can read more about conditional WHERE here.
Try this one -
WHERE DateDropped = 0
AND (
(ISNULL(#JobsOnHold, 0) = 1 AND DateAppr >= 0)
OR
(ISNULL(#JobsOnHold, 0) != 1 AND DateAppr != 0)
)
To answer the underlying question of how to use a CASE expression in the WHERE clause:
First remember that the value of a CASE expression has to have a normal data type value, not a boolean value. It has to be a varchar, or an int, or something. It's the same reason you can't say SELECT Name, 76 = Age FROM [...] and expect to get 'Frank', FALSE in the result set.
Additionally, all expressions in a WHERE clause need to have a boolean value. They can't have a value of a varchar or an int. You can't say WHERE Name; or WHERE 'Frank';. You have to use a comparison operator to make it a boolean expression, so WHERE Name = 'Frank';
That means that the CASE expression must be on one side of a boolean expression. You have to compare the CASE expression to something. It can't stand by itself!
Here:
WHERE
DateDropped = 0
AND CASE
WHEN #JobsOnHold = 1 AND DateAppr >= 0 THEN 'True'
WHEN DateAppr != 0 THEN 'True'
ELSE 'False'
END = 'True'
Notice how in the end the CASE expression on the left will turn the boolean expression into either 'True' = 'True' or 'False' = 'True'.
Note that there's nothing special about 'False' and 'True'. You can use 0 and 1 if you'd rather, too.
You can typically rewrite the CASE expression into boolean expressions we're more familiar with, and that's generally better for performance. However, sometimes is easier or more maintainable to use an existing expression than it is to convert the logic.
The problem with your query is that in CASE expressions, the THEN and ELSE parts have to have an expression that evaluates to a number or a varchar or any other datatype but not to a boolean value.
You just need to use boolean logic (or rather the ternary logic that SQL uses) and rewrite it:
WHERE
DateDropped = 0
AND ( #JobsOnHold = 1 AND DateAppr >= 0
OR (#JobsOnHold <> 1 OR #JobsOnHold IS NULL) AND DateAppr <> 0
)
Often when you use conditional WHERE clauses you end upp with a vastly inefficient query, which is noticeable for large datasets where indexes are used. A great way to optimize the query for different values of your parameter is to make a different execution plan for each value of the parameter. You can achieve this using OPTION (RECOMPILE).
In this example it would probably not make much difference, but say the condition should only be used in one of two cases, then you could notice a big impact.
In this example:
WHERE
DateDropped = 0
AND (
(ISNULL(#JobsOnHold, 0) = 1 AND DateAppr >= 0)
OR
(ISNULL(#JobsOnHold, 0) <> 1 AND DateAppr <> 0)
)
OPTION (RECOMPILE)
Source Parameter Sniffing, Embedding, and the RECOMPILE Options
This seemed easier to think about where either of two parameters could be passed into a stored procedure. It seems to work:
SELECT *
FROM x
WHERE CONDITION1
AND ((#pol IS NOT NULL AND x.PolicyNo = #pol) OR (#st IS NOT NULL AND x.State = #st))
AND OTHERCONDITIONS
I am trying to optimize a Stored procedure which is slow at the moment. It takes few parameters to search on which can be null. The query inside the SP looks like this.
SELECT
*some fields from the table*
FROM
[hrmCase]
WHERE
BrkId = #BrkId
AND
(ChannelId IN ('TO','TD'))
AND
case
when #PsId is null then 1
when (#PsId is not null) and ((SELECT SUBSTRING(UPPER(DATENAME(MONTH, Created)),1,3) + CAST(PSId AS varchar) PSId) like ('%' + #PsId + '%')) then 1 else 0
end = 1
AND
case
when #ACaseId is null then 1
when (#ACaseId is not null) and (AId like ('%' + #ACaseId + '%')) then 1 else 0
end = 1
AND
case
when #DateCreated is null then 1
when (#DateCreated is not null) and (dbo.StripTime(Created) = dbo.StripTime(#DateCreated)) then 1 else 0
end = 1
AND
case
when #Clients is null then 1
when (#Clients is not null) and (Client like ('%' + #Clients + '%')) then 1 else 0
end = 1
Is this the best way to do it or should I build a dynamic query based on the input parameters like below
Declare #SQLQuery AS NVarchar(4000)
Declare #ParamDefinition AS NVarchar(2000)
Set #SQLQuery = 'SELECT *some fields from the table* FROM [hrmCase] WHERE (1=1)'
If #PsId Is Not Null
Set #SQLQuery = #SQLQuery + ' And (SELECT SUBSTRING(UPPER(DATENAME(MONTH, Created)),1,3) + CAST(PSId AS varchar) PSId) like (''%''' + #PsId + '''%'')'
etc..
Which of the above two queries deemed more professional and will be quicker. Please suggest if there is a better way of doing the same thing.
Cheers,
DS
SQL Server is very bad at dealing with parameters like this--we've had many cases where we get it working acceptably and then 4 months later, something changes and it turns into table scans (or worse--I've seen performance FAR worse than simple table scans) all over again. (We have a 1.5TB database on SSD on mirrored 40 core servers). The best solution we've found is to use separate stored procedures or to use dynamic SQL. Many "experts" frown on dynamic SQL, but the reality is that in any decent environment these days, recompilation of dynamic SQL is insignificant in terms of CPU use and performance delay, and it eliminates the types of issues you're seeing because you eliminate the conditionals in the WHERE clauses that confuse the query optimizer.
The leading wildcards also will result in table scans in every case. Fulltext indexing can do this type of search much more efficiently than regular SQL. Some flavors of SQL Server support fulltext indexing and queries and some do not.
So maybe someone can point me in the right direction of what is causing this error? I've been fighting with this for a couple of hours and searching the web, and I can't figure out what I'm doing wrong here. It's included as part of a stored procedure, I don't know if that matters, if it does I can include that as well. Tables and field names have been changed to protect the innocent... meaning my job. Thanks.
SELECT
/* The fields are here*/
FROM
/* my joins are here */
WHERE
(Table.Field = stuff)
AND
(Table.Field2 = otherstuff)
AND
(Table2.Field3 = someotherstuff)
AND
CASE #param1
WHEN 0 THEN 'Table.Field IS NULL'
WHEN 1 THEN 'Table.Field2 IS NOT NULL'
ELSE ''
END
Thanks for the responses. Technically egrunin was the correct answer for this question, but OMG Ponies and Mark Byers were pretty much the same thing just missing that last piece. Thanks again.
I'm pretty sure the other answers leave out a case:
WHERE
(Table.Field = stuff)
AND
(Table.Field2 = otherstuff)
AND
(Table2.Field3 = someotherstuff)
AND
(
(#param1 = 0 and Table.Field IS NULL)
OR
(#param1 = 1 and NOT Table.Field2 IS NULL)
OR
(#param1 <> 0 AND #param1 <> 1) -- isn't this needed?
)
You are returning a string from your case expression, but only a boolean can be used. The string is not evaluated. You could do what you want using dynamic SQL, or you could write it like this instead:
AND (
(#param1 = 0 AND Table.Field IS NULL) OR
(#param1 = 1 AND Table.Field IS NOT NULL)
)
You can't use CASE in the WHERE clause like you are attempting
The text you provided in the CASE would only run if you were using dynamic SQL
Use:
WHERE Table.Field = stuff
AND Table.Field2 = otherstuff
AND Table2.Field3 = someotherstuff
AND ( (#param1 = 0 AND table.field IS NULL)
OR (#param1 = 1 AND table.field2 IS NOT NULL))
...which doesn't make sense if you already have Table.Field = stuff, etc...
Options that would perform better would be to either make the entire query dynamic SQL, or if there's only one parameter - use an IF/ELSE statement with separate queries & the correct WHERE clauses.
Is there an equivalent to VB's AndAlso/OrElse and C#'s &&/|| in SQL (SQL Server 2005). I am running a select query similar to the following:
SELECT a,b,c,d
FROM table1
WHERE
(#a IS NULL OR a = #a)
AND (#b IS NULL OR b = #b)
AND (#c IS NULL OR c = #c)
AND (#d IS NULL OR d = #d)
For example, if the "#a" parameter passed in as NULL there is no point in evaluating the 2nd part of the WHERE clause (a = #a). Is there a way to avoid this either by using special syntax or rewriting the query?
Thanks,
James.
The only way to guarantee the order of evaluation is to use CASE
WHERE
CASE
WHEN #a IS NULL THEN 1
WHEN a = #a THEN 1
ELSE 0
END = 1
AND /*repeat*/
In my experience this is usually slower then just letting the DB engine sort it out.
TerrorAustralis's answer is usually the best option for non-nullable columns
Try this:
AND a = ISNULL(#a,a)
This function looks at #a. If it is not null it equates the expression
AND a = #a
If it is null it equates the expression
AND a = a
(Since this is always true, it replaces the #b is null statement)
The query engine will take care of this for you. Your query, as written, is fine. All operators will "short circuit" if they can.
Another way is to do:
IF (#a > 0) IF (#a = 5)
BEGIN
END
Another if after the condition will do an "AndAlso" logic.
I want to emphesise that this is just a short way to write:
IF (#a > 0)
IF (#a = 5)
BEGIN
END
Take this example:
SELECT * FROM Orders
WHERE orderId LIKE '%[0-9]%'
AND dbo.JobIsPending(OrderId) = 1
Orders.OrderId is varchar(25)
dbo.JobIsPending(OrderId) UDF with int parameter
No short circuit is made as the conversion fails in dbo.JobIsPending(OrderId) when
Orders.OrderId NOT LIKE '%[0-9]%'
tested on SQL Server 2008 R2