Reduce dynamic SQL using CASE to use "IN" or not

I am converting a stored procedure which I had previously built as a string; using BIT parameters, I decided whether to append certain WHERE/ON clauses.
The sp is passed a number of comma-separated strings, and some of the dynamic WHERE clauses are built like:
IF @pUse_Clause_A = 1 SET @WhereClause = @WhereClause + ' AND [FIELD_A] IN (' + @pComma_Separated_List_A + ')'
In this case, @pComma_Separated_List_A is something like '1,3,6,66,22' ... a list of the values I want included.
Now I am changing these strings into TVPs, so I can just use "real" SQL like:
AND [FIELD_A] IN (SELECT [TVP_FIELD] FROM @pTVP_A)
When I do this, I don't like the string-building method.
However, I also don't like having to nest the IF statements.
IF A
ENTIRE SQL WHERE A
ELSE
ENTIRE SQL WITHOUT WHERE CLAUSE
The more parameters I add, the more complicated it gets:
IF A
IF B
SQL WHERE A AND B
ELSE
SQL WHERE A
ELSE
IF B
SQL WHERE B
ELSE
SQL
What I would rather do is something like this:
SELECT * FROM TABLE
WHERE 1=1
CASE USE_A WHEN 1 THEN
AND [FIELD_A] IN (SELECT A FROM TVP_A)
END
CASE USE_B WHEN 1 THEN
AND [FIELD_B] IN (SELECT B FROM TVP_B)
END
I know SQL ignores everything outside the chosen "IF" branch, but duplicating the entire statement like that seems sloppy.

Dynamically changing searches based on the given parameters is a complicated subject, and doing it one way over another, even with only a very slight difference, can have massive performance implications. The key is to use an index and get a good query execution plan; ignore compact code, and don't worry about repeating code.
Read this and consider all the methods. Your best method will depend on your parameters, your data, your schema, and your actual usage:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
The Curse and Blessings of Dynamic SQL by Erland Sommarskog

Related

USE WHERE 1=1 SQL [duplicate]

Why would someone use WHERE 1=1 AND <conditions> in a SQL clause (Either SQL obtained through concatenated strings, either view definition)
I've seen somewhere that this would be used to protect against SQL Injection, but it seems very weird.
If there is injection, "WHERE 1=1 AND injected OR 1=1" would have the same result as "injected OR 1=1".
Later edit: What about the usage in a view definition?
Thank you for your answers.
Still,
I don't understand why someone would use this construction for defining a view, or use it inside a stored procedure.
Take this for example:
CREATE VIEW vTest AS
SELECT * FROM Table WHERE 1=1 AND Table.Field=Value
If the list of conditions is not known at compile time and is instead built at run time, you don't have to worry about whether you have one or more than one condition. You can generate them all like:
and <condition>
and concatenate them all together. With the 1=1 at the start, the initial and has something to associate with.
I've never seen this used for any kind of injection protection, as you say it doesn't seem like it would help much. I have seen it used as an implementation convenience. The SQL query engine will end up ignoring the 1=1 so it should have no performance impact.
Just adding an example code to Greg's answer:
Dim sqlstmt As New StringBuilder
sqlstmt.Append("SELECT * FROM Products")
sqlstmt.Append(" WHERE 1=1")
''// From now on you don't have to worry whether you must
''// append AND or WHERE, because you know the WHERE is there
If ProductCategoryID <> 0 Then
    sqlstmt.AppendFormat(" AND ProductCategoryID = {0}", ProductCategoryID)
End If
If MinimumPrice > 0 Then
    sqlstmt.AppendFormat(" AND Price >= {0}", MinimumPrice)
End If
I've seen it used when the number of conditions can be variable.
You can concatenate conditions using an " AND " string. Then, instead of counting the number of conditions you're passing in, you place a "WHERE 1=1" at the end of your stock SQL statement and throw on the concatenated conditions.
Basically, it saves you having to do a test for conditions and then add a "WHERE" string before them.
Seems like a lazy way to always know that your WHERE clause is already defined and allow you to keep adding conditions without having to check if it is the first one.
Indirectly Relevant: when 1=2 is used:
CREATE TABLE New_table_name
as
select *
FROM Old_table_name
WHERE 1 = 2;
this will create a new table with the same schema as the old table. (Very handy if you want to load some data for compares.)
I found this pattern useful when I'm testing or double checking things on the database, so I can very quickly comment other conditions:
CREATE VIEW vTest AS
SELECT FROM Table WHERE 1=1
AND Table.Field=Value
AND Table.IsValid=true
turns into:
CREATE VIEW vTest AS
SELECT FROM Table WHERE 1=1
--AND Table.Field=Value
--AND Table.IsValid=true
The 1 = 1 expression is commonly used in generated SQL code. It can simplify the generating code by reducing the number of conditional statements.
Actually, I've seen this sort of thing used in BIRT reports. The query passed to the BIRT runtime is of the form:
select a,b,c from t where a = ?
and the '?' is replaced at runtime by an actual parameter value selected from a drop-down box. The choices in the drop-down are given by:
select distinct a from t
union all
select '*' from sysibm.sysdummy1
so that you get all possible values plus "*". If the user selects "*" from the drop down box (meaning all values of a should be selected), the query has to be modified (by Javascript) before being run.
Since the "?" is a positional parameter and MUST remain there for other things to work, the Javascript modifies the query to be:
select a,b,c from t where ((a = ?) or (1=1))
That basically removes the effect of the where clause while still leaving the positional parameter in place.
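The rewrite can be sketched in a few lines of Python (the helper name is hypothetical; BIRT itself does this rewriting in JavaScript): it keeps the positional placeholder in place while neutralizing its effect.

```python
def neutralize_predicate(query: str, predicate: str) -> str:
    """Wrap a positional predicate in '(pred) or (1=1)' so it no longer
    filters anything, while keeping the '?' placeholder the driver expects."""
    return query.replace(predicate, f"(({predicate}) or (1=1))")

base = "select a,b,c from t where a = ?"
# The user picked "*" in the drop-down, so every value of a should match:
rewritten = neutralize_predicate(base, "a = ?")
print(rewritten)  # select a,b,c from t where ((a = ?) or (1=1))
```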
I've also seen the AND case used by lazy coders whilst dynamically creating an SQL query.
Say you have to dynamically create a query that starts with select * from t and checks:
the name is Bob; and
the salary is > $20,000
some people would add the first with a WHERE and subsequent ones with an AND thus:
select * from t where name = 'Bob' and salary > 20000
Lazy programmers (and that's not necessarily a bad trait) wouldn't distinguish between the added conditions, they'd start with select * from t where 1=1 and just add AND clauses after that.
select * from t where 1=1 and name = 'Bob' and salary > 20000
WHERE 1=0 is done to check if the table exists: the query returns no rows, but still fails if the table is missing. I don't know why 1=1 is used.
While I can see that 1=1 would be useful for generated SQL, a technique I use in PHP is to create an array of clauses and then do
implode (" AND ", $clauses);
thus avoiding the problem of having a leading or trailing AND. Obviously this is only useful if you know that you are going to have at least one clause!
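The same technique in Python (the clause strings are illustrative): join the clauses with " AND " and only prepend WHERE when the list is non-empty, which also handles the zero-clause case the PHP snippet leaves open.

```python
def build_where(clauses):
    """Join optional conditions with AND; emit no WHERE at all when empty."""
    return (" WHERE " + " AND ".join(clauses)) if clauses else ""

print("SELECT * FROM products" + build_where(["price > 100", "stock > 0"]))
# SELECT * FROM products WHERE price > 100 AND stock > 0
print("SELECT * FROM products" + build_where([]))
# SELECT * FROM products
```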
Here's a closely related example: using a SQL MERGE statement to update the target table using all values from the source table, when there is no common attribute to join on, e.g.
MERGE INTO Circles
USING
(
SELECT pi
FROM Constants
) AS SourceTable
ON 1 = 1
WHEN MATCHED THEN
UPDATE
SET circumference = 2 * SourceTable.pi * radius;
If you came here searching for WHERE 1, note that WHERE 1 and WHERE 1=1 are identical. WHERE 1 is rarely used because some database systems reject it, considering WHERE 1 not to be a proper boolean expression.
Why would someone use WHERE 1=1 AND <proper conditions>
I've seen homespun frameworks do stuff like this (blush), as it allows lazy parsing practices to be applied to both the WHERE and AND SQL keywords.
For example (I'm using C# as an example here), consider the conditional parsing of the following predicates in a SQL query string builder:
var sqlQuery = "SELECT * FROM FOOS WHERE 1 = 1";
if (shouldFilterForBars)
{
sqlQuery = sqlQuery + " AND Bars > 3";
}
if (shouldFilterForBaz)
{
sqlQuery = sqlQuery + " AND Baz < 12";
}
The "benefit" of WHERE 1 = 1 means that no special code is needed:
For AND - whether zero, one, or both predicates (Bars and Baz) should be applied would normally determine whether the first AND is required. Since we already have at least one predicate with the 1 = 1, AND is always OK.
For no predicates at all - in the case where there are zero predicates, the WHERE would have to be dropped. But again, we can be lazy, because we are guaranteed at least one predicate.
This is obviously a bad idea; I would recommend using an established data access framework or ORM for parsing optional and conditional predicates instead.
Having reviewed all the answers, I decided to perform an experiment:
SELECT
*
FROM MyTable
WHERE 1=1
Then I checked with other numbers:
WHERE 2=2
WHERE 10=10
WHERE 99=99
etc.
Having done all the checks, the query run time is the same, even without the WHERE clause. I am not a fan of the syntax.
This is useful in cases where you have to use a dynamic query to which you append filter options in the WHERE clause. For example, if status 0 means inactive and 1 means active, there are only two available options (0 and 1), but if you want to display All records, it is handy to include WHERE 1=1.
See below sample:
Declare @SearchValue varchar(8)
Declare @SQLQuery varchar(max) = '
Select [FirstName]
      ,[LastName]
      ,[MiddleName]
      ,[BirthDate]
      ,Case
          when [Status] = 0 then ''Inactive''
          when [Status] = 1 then ''Active''
       end as [Status]
From [SomeTable] a'
Declare @SearchOption nvarchar(100)
If (@SearchValue = 'Active')
Begin
    Set @SearchOption = ' Where a.[Status] = 1'
End
If (@SearchValue = 'Inactive')
Begin
    Set @SearchOption = ' Where a.[Status] = 0'
End
If (@SearchValue = 'All')
Begin
    Set @SearchOption = ' Where 1=1'
End
Set @SQLQuery = @SQLQuery + @SearchOption
Exec(@SQLQuery);
Saw this in production code and asked seniors for help.
Their answer:
We use 1=1 so that when we have to add a new condition we can just type
and <condition>
and get on with it.
I usually do this when building dynamic SQL for a report with many dropdown values a user can select. Since the user may or may not select a value from each dropdown, it is hard to know which condition will be the first in the WHERE clause. So we pad the query with a WHERE 1=1 and add all the conditions after that.
Something like
select column1, column2 from my_table where 1=1 {name} {age};
Then we would build the where clause like this and pass it as a parameter value
string name_whereClause= ddlName.SelectedIndex > 0 ? "AND name ='"+ ddlName.SelectedValue+ "'" : "";
As the where clause selections are unknown to us until runtime, this helps us a great deal in deciding whether to include an 'AND' or a 'WHERE'.
Making "where 1=1" the standard for all your queries also makes it trivially easy to validate the SQL by replacing it with "where 1=0" - handy when you have batches of commands/files.
It also makes it trivially easy to find the end of the from/join section of any query - even queries with sub-queries, if properly indented.
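A minimal Python sketch of that validation trick (the query text is illustrative): because every query carries the same "where 1=1" marker, flipping it to "where 1=0" produces a statement that parses and compiles identically but returns no rows.

```python
def to_validation_probe(sql: str) -> str:
    """Flip the conventional 'where 1=1' into a zero-row probe."""
    return sql.replace("where 1=1", "where 1=0")

q = "select * from orders where 1=1 and status = 'open'"
print(to_validation_probe(q))
# select * from orders where 1=0 and status = 'open'
```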
I first came across this back with ADO and classic ASP; the answer I got was: performance.
If you do a straight
Select * from tablename
and pass that in as SQL command/text, you will get a noticeable performance increase with
Where 1=1
added - it was a visible difference. Something to do with table headers being returned as soon as the first condition is met, or some other craziness; anyway, it did speed things up.
Using a predicate like 1=1 is a hint sometimes used to force the access plan to use or not use an index scan. This comes up with multi-nested join queries with many predicates in the WHERE clause, where sometimes, even with all the indexes in place, the access plan reads each table with a full table scan. It is just one of many hints used by DBAs to trick a DBMS into using a more efficient path. Just don't throw one in; you need a DBA to analyze the query, since it doesn't always work.
Here is a use case... however, I am not too concerned with the technicalities of why I should or shouldn't use 1 = 1.
I am writing a function that uses pyodbc to retrieve some data from SQL Server. I was looking for a way to force a filler after the where keyword in my code. This was a great suggestion indeed:
if _where == '': _where = '1=1'
...
...
...
cur.execute(f'select {predicate} from {table_name} where {_where}')
The reason is that I could not put the keyword 'where' inside the _where clause variable itself. So any dummy condition that evaluates to true will do as a filler.


Does SQL Server optimize LIKE ('%%') query?

I have a Stored Proc which performs search on records.
The problem is that some of the search criteria, which come from the UI, may be empty strings.
So, when a criterion is not specified, the LIKE statement becomes redundant.
How can I effectively perform that search on SQL Server? Or does it optimize a LIKE ('%%') query, since it means there is nothing to compare?
The Stored proc is like this:
ALTER PROC [FRA].[MCC_SEARCH]
@MCC_Code varchar(4),
@MCC_Desc nvarchar(50),
@Detail nvarchar(50)
AS
BEGIN
SELECT
MCC_Code,
MCC_Desc,
CreateDate,
CreatingUser
FROM
FRA.MCC (NOLOCK)
WHERE
MCC_Code LIKE ('%' + @MCC_Code + '%')
AND MCC_Desc LIKE ('%' + @MCC_Desc + '%')
AND Detail LIKE ('%' + @Detail + '%')
ORDER BY MCC_Code
END
With regard to an optimal, index-using execution plan - no. The prefixing wildcard prevents an index from being used, resulting in a scan instead.
If you do not have a wildcard on the end of the search term as well, then that scenario can be optimised - something I blogged about a while back: Optimising wildcard prefixed LIKE conditions
Update
To clarify my point:
LIKE 'Something%' - is able to use an index
LIKE '%Something' - is not able to use an index out-of-the-box. But you can optimise this to allow it to use an index by following the "REVERSE technique" I linked to.
LIKE '%Something%' - is not able to use an index. Nothing you can do to optimise for LIKE.
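The string identity behind the REVERSE technique can be checked in a few lines of Python (a sketch of the idea only; the SQL version indexes a persisted REVERSE(column)): a suffix match on a value is exactly a prefix match on the reversed value, and prefix matches are what a B-tree index can seek on.

```python
def suffix_match(value: str, term: str) -> bool:
    # column LIKE '%term'
    return value.endswith(term)

def reversed_prefix_match(value: str, term: str) -> bool:
    # REVERSE(column) LIKE REVERSE('term') + '%'
    return value[::-1].startswith(term[::-1])

samples = [("WidgetPro", "Pro"), ("WidgetPro", "Widget"), ("abc", "abc"), ("abc", "")]
# The two predicates agree on every input:
assert all(suffix_match(v, t) == reversed_prefix_match(v, t) for v, t in samples)
```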
The short answer is - no
The long answer is - absolutely not
Does it optimize LIKE('%%') query since it means there is nothing to compare?
The statement is untrue, because there is something to compare. The following are equivalent
WHERE column LIKE '%%'
WHERE column IS NOT NULL
IS NOT NULL requires a table scan, unless there are very few non-null values in the column and it is well indexed.
EDIT
Resource on Dynamic Search procedures in SQL Server:
You simply must read this article by Erland Sommarskog, SQL Server MVP http://www.sommarskog.se/dyn-search.html (pick your version, or read both)
Otherwise if you need good performance on CONTAINS style searches, consider using SQL Server Fulltext engine.
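The equivalence above can be illustrated with a minimal LIKE matcher in Python (a sketch handling only % and _, not full T-SQL semantics): '%%' matches every non-NULL value, so the predicate filters out nothing but NULLs.

```python
import re

def like_to_regex(pattern: str) -> str:
    """Translate a LIKE pattern into a regex: % -> .*, _ -> . (no ESCAPE support)."""
    return "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )

def sql_like(value, pattern):
    if value is None:  # NULL never satisfies a LIKE predicate
        return False
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL) is not None

assert sql_like("anything at all", "%%")
assert sql_like("", "%%")
assert not sql_like(None, "%%")  # the same rows IS NOT NULL would exclude
```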
If you use a LIKE clause and specify a wildcard character (%) as a prefix of the search string, SQL Server (and all other DBMSes, I guess) will not be able to use indexes that might exist on that column.
I don't know if it optimizes the query if you use an empty search argument... Perhaps your question may be answered if you look at the execution plan?
Edit: I've just checked this out, and the execution plan of this statement:
select * from mytable
is exactly the same as this the exec plan of this statement:
select * from mytable where description like '%'
Both SQL statements simply use a clustered index scan.

sql where clause in select statement issue

I am using SQL Server 2008 Enterprise with Windows Server 2008 Enterprise. I have a database table called "Book", which has many columns and three columns are important in this question, they are
Author, varchar;
Country, varchar;
Domain, varchar.
I want to write a stored procedure with the following logic, but I do not know how to write it (because of the complex query conditions); I would appreciate it if anyone could write a sample for me.
Input parameter: p_author as varchar, p_country as varchar, and p_domain as varchar
Query conditions:
if p_author is specified from input, then any row whose Author column LIKE %p_author% is satisfied with condition, if p_author is not specified from input every row is satisfied with this condition;
if p_country is specified from input, then any row whose Country column = p_country is satisfied with condition, if p_country is not specified from input every row is satisfied with this condition;
if p_domain is specified from input, then any row whose Domain column LIKE %p_domain% is satisfied, if p_domain is not specified from input every row is satisfied with this condition;
The results I want to return (must meet all of the following conditions):
records meet either condition 1 or condition 2;
records must meet condition 3;
return distinct rows.
For example, records which meet conditions 1 and 3 are OK to return, and records which meet conditions 2 and 3 are OK to return.
thanks in advance,
George
Dynamically changing searches based on the given parameters is a complicated subject, and doing it one way over another, even with only a very slight difference, can have massive performance implications. The key is to use an index and get a good query execution plan; ignore compact code, and don't worry about repeating code.
Read this and consider all the methods. Your best method will depend on your parameters, your data, your schema, and your actual usage:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
The Curse and Blessings of Dynamic SQL by Erland Sommarskog
If you have the proper SQL Server 2008 version (SQL 2008 SP1 CU5 (10.0.2746) and later), you can use this little trick to actually use an index:
There isn't much you can do since you are using LIKE, but if you were using equality, you could add OPTION (RECOMPILE) onto your query (see Erland's article), and SQL Server would resolve the OR from within (Column = @Param + '%' OR @Param = '') AND ... before the query plan is created, based on the run-time values of the local variables, so an index could be used.
If I understand correctly, the following should work:
SELECT DISTINCT *
FROM Book
WHERE (
    ((Author LIKE '%' + @p_author + '%' OR @p_author = '') OR
     (Country = @p_country OR @p_country = ''))
    AND (@p_author <> '' OR @p_country <> '')
) AND
(Domain LIKE '%' + @p_domain + '%' OR @p_domain = '')

Performance implications of sql 'OR' conditions when one alternative is trivial?

I'm creating a stored procedure for searching some data in my database according to some criteria input by the user.
My sql code looks like this:
Create Procedure mySearchProc
(
@IDCriteria bigint=null,
...
@MaxDateCriteria datetime=null
)
as
select Col1,...,Coln from MyTable
where (@IDCriteria is null or ID=@IDCriteria)
...
and (@MaxDateCriteria is null or Date<@MaxDateCriteria)
Edit: I have around 20 possible parameters, and each combination of n non-null parameters can happen.
Is it ok performance-wise to write this kind of code? (I'm using MS SQL Server 2008)
Would generating SQL code containing only the needed where clauses be notably faster?
OR clauses are notorious for causing performance issues mainly because they require table scans. If you can write the query without ORs you'll be better off.
where (@IDCriteria is null or ID=@IDCriteria)
and (@MaxDateCriteria is null or Date<@MaxDateCriteria)
If you write this criteria, then SQL server will not know whether it is better to use the index for IDs or the index for Dates.
For proper optimization, it is far better to write separate queries for each case and use IF to guide you to the correct one.
IF @IDCriteria is not null and @MaxDateCriteria is not null
--query
WHERE ID = @IDCriteria and Date < @MaxDateCriteria
ELSE IF @IDCriteria is not null
--query
WHERE ID = @IDCriteria
ELSE IF @MaxDateCriteria is not null
--query
WHERE Date < @MaxDateCriteria
ELSE
--query
WHERE 1 = 1
If you expect to need different plans out of the optimizer, you need to write different queries to get them!!
Would generating SQL code containing only the needed where clauses be notably faster?
Yes - if you expect the optimizer to choose between different plans.
Edit:
DECLARE @CustomerNumber int, @CustomerName varchar(30)
SET @CustomerNumber = 123
SET @CustomerName = '123'
SELECT * FROM Customers
WHERE (CustomerNumber = @CustomerNumber OR @CustomerNumber is null)
AND (CustomerName = @CustomerName OR @CustomerName is null)
CustomerName and CustomerNumber are indexed. Optimizer says: "Clustered Index Scan with parallelization". You can't write a worse single-table query.
Edit: I have around 20 possible parameters, and each combination of n non-null parameters can happen.
We had a similar "search" functionality in our database. When we looked at the actual queries issued, 99.9% of them used an AccountIdentifier. In your case, I suspect either one column is always supplied, or one of two columns is always supplied. This would lead to 2 or 3 cases respectively.
It's not important to remove ORs from the whole structure. It is important to remove ORs from the columns that you expect the optimizer to use to access the indexes.
So, to boil down the above comments:
Create a separate sub-procedure for each of the most popular variations of specific combinations of parameters, and within a dispatcher procedure call the appropriate one from an IF ELSE structure, the penultimate ELSE clause of which builds a query dynamically to cover the remaining cases.
Perhaps only one or two cases may be specifically coded at first, but as time goes by and particular combinations of parameters are identified as being statistically significant, implementation procedures may be written and the master IF ELSE construct extended to identify those cases and call the appropriate sub-procedure.
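A host-language sketch of that dispatcher idea in Python (table, column, and parameter names are illustrative): build one AND-clause per parameter actually supplied, and keep the values out of the SQL text as placeholders, so each distinct combination compiles to its own plan and no values are concatenated into the string.

```python
def build_search(criteria):
    """Return (sql, params) containing only the clauses for supplied criteria."""
    templates = {"id": "ID = ?", "max_date": "Date < ?"}
    clauses, params = [], []
    for name, value in criteria.items():
        if value is not None:
            clauses.append(templates[name])
            params.append(value)
    sql = "SELECT Col1, Coln FROM MyTable"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

sql, params = build_search({"id": 42, "max_date": None})
print(sql)     # SELECT Col1, Coln FROM MyTable WHERE ID = ?
print(params)  # [42]
```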
Regarding "Would generating SQL code containing only the needed where clauses be notably faster?"
I don't think so, because this way you effectively remove the positive effects of query plan caching.
You could first run selective queries, in order of the most common / most efficient (indexed, etc.) parameters, and add the matching PK(s) to a temporary table.
That would create a (hopefully small!) subset of data.
Then join that temporary table with the main table, using a full WHERE clause in the
SELECT ...
FROM #TempTable AS T
JOIN dbo.MyTable AS M
ON M.ID = T.ID
WHERE (@IDCriteria IS NULL OR M.ID=@IDCriteria)
...
AND (@MaxDateCriteria IS NULL OR M.Date<@MaxDateCriteria)
style to refine the (small) subset.
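Put together, the two-step approach might look like this (a sketch; the table, columns, and the choice of `CustomerNumber` as the selective pre-filter are illustrative):

```sql
-- Step 1: pre-filter on the most selective, indexed criterion only.
CREATE TABLE #TempTable (ID int PRIMARY KEY);

INSERT INTO #TempTable (ID)
SELECT ID
FROM dbo.MyTable
WHERE CustomerNumber = @CustomerNumber;  -- sargable: index seek possible

-- Step 2: apply the remaining optional criteria to the small subset.
SELECT M.*
FROM #TempTable AS T
JOIN dbo.MyTable AS M
    ON M.ID = T.ID
WHERE (@MaxDateCriteria IS NULL OR M.Date < @MaxDateCriteria)
  AND (@NameCriteria    IS NULL OR M.Name = @NameCriteria);
```

The non-sargable OR pattern is confined to step 2, where it only has to scan the (hopefully small) pre-filtered subset rather than the whole table.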
What if constructs like these were replaced:
WHERE (@IDCriteria IS NULL OR @IDCriteria=ID)
AND (@MaxDateCriteria IS NULL OR Date<@MaxDateCriteria)
AND ...
with ones like these:
WHERE ID = ISNULL(@IDCriteria, ID)
AND Date < ISNULL(@MaxDateCriteria, DATEADD(millisecond, 1, Date))
AND ...
or is this just coating the same unoptimizable query in syntactic sugar?
Choosing the right index is hard for the optimizer. IMO, this is one of few cases where dynamic SQL is the best option.
This is one of the cases where I use code building, or a sproc for each search option.
Since your search is so complex, I'd go with code building.
You can do this either in code or with dynamic SQL.
Just be careful of SQL injection.
I suggest one step further than some of the other suggestions - think about degeneralizing at a much higher abstraction level, preferably the UI structure. Usually this seems to happen when the problem is being pondered in data mode rather than user domain mode.
In practice, I've found that almost every such query has one or more non-null, fairly selective columns that would be reasonably optimizable, if one (or more) were specified. Furthermore, these are usually reasonable assumptions that users can understand.
Example: Find Orders by Customer; or Find Orders by Date Range; or Find Orders By Salesperson.
If this pattern applies, then you can decompose your hypergeneralized query into more purposeful subqueries that also make sense to users, and you can reasonably prompt for required values (or ranges), and not worry too much about crafting efficient expressions for subsidiary columns.
You may still end up with an "All Others" category. But at least then if you provide what is essentially an open-ended Query By Example form, then users will have some idea what they're getting into. Doing what you describe really puts you in the role of trying to out-think the query optimizer, which is folly IMHO.
I'm currently working with SQL 2005, so I don't know if the 2008 optimizer acts differently. That being said, I've found that you need to do a couple of things...
Make sure that you are using OPTION (RECOMPILE) on your query
Use CASE statements to cause short-circuiting of the logic. At least in 2005 this is NOT done with OR statements. For example:
SELECT
...
FROM
...
WHERE
(1 =
CASE
WHEN @my_column IS NULL THEN 1
WHEN my_column = @my_column THEN 1
ELSE 0
END
)
The CASE statement will cause the SQL Server optimizer to recognize that it doesn't need to continue past the first WHEN. In this example it's not a big deal, but in my search procs a non-null parameter often meant searching in another table through a subquery for existence of a matching row, which got costly. Once I made this change the search procs started running much faster.
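For the subquery case described above, the pattern might look like this (a sketch; `OtherTable` and the column names are illustrative):

```sql
SELECT M.*
FROM dbo.MyTable AS M
WHERE (1 =
    CASE
        -- Parameter not supplied: succeed immediately, skipping the costly probe.
        WHEN @AccountId IS NULL THEN 1
        -- Parameter supplied: only now probe the other table for a matching row.
        WHEN EXISTS (SELECT 1
                     FROM dbo.OtherTable AS O
                     WHERE O.MyTableID = M.ID
                       AND O.AccountId = @AccountId) THEN 1
        ELSE 0
    END
);
```

The CASE evaluates its WHEN branches in order, so the EXISTS probe is never reached when `@AccountId` is NULL.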
My suggestion is to build the SQL string. You will get the maximum benefit from indexes and from execution plan reuse.
DECLARE @sql nvarchar(4000);
SET @sql = N'';
IF @param1 IS NOT NULL
    SET @sql = @sql + CASE WHEN @sql = N'' THEN N'' ELSE N' AND ' END + N'param1 = @param1';
IF @param2 IS NOT NULL
    SET @sql = @sql + CASE WHEN @sql = N'' THEN N'' ELSE N' AND ' END + N'param2 = @param2';
...
IF @paramN IS NOT NULL
    SET @sql = @sql + CASE WHEN @sql = N'' THEN N'' ELSE N' AND ' END + N'paramN = @paramN';
IF @sql <> N''
    SET @sql = N' WHERE ' + @sql;
SET @sql = N'SELECT ... FROM myTable' + @sql;
EXEC sp_executesql @sql, N'@param1 type, @param2 type, ..., @paramN type', @param1, @param2, ..., @paramN;
Each time the procedure is called with different parameters, there is a different optimal execution plan for getting the data. The problem is that SQL Server has cached an execution plan for your procedure and will reuse a sub-optimal (read: terrible) execution plan.
I would recommend:
Create specific SPs for frequently run execution paths (i.e. passed parameter sets) optimised for each scenario.
Keep your main generic SP for edge cases (presuming they are rarely run) but use the WITH RECOMPILE clause to cause a new execution plan to be created each time the procedure is run.
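A sketch of that generic fallback, assuming a hypothetical `SearchOrders` procedure over an `Orders` table:

```sql
CREATE PROCEDURE SearchOrders
    @CustomerNumber  int      = NULL,
    @MaxDateCriteria datetime = NULL
WITH RECOMPILE  -- build a fresh plan on every call; never reuse a cached one
AS
BEGIN
    SELECT *
    FROM dbo.Orders
    WHERE (@CustomerNumber  IS NULL OR CustomerNumber = @CustomerNumber)
      AND (@MaxDateCriteria IS NULL OR OrderDate < @MaxDateCriteria);
END
```

The recompile cost is paid on every execution, which is why this is reserved for the rarely-run edge cases rather than the frequent parameter sets.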
We use OR clauses checking against NULLs for optional parameters to great effect. It works very well without the RECOMPILE option, so long as the execution path is not drastically altered by passing different parameters.