I have a parameterized query that looks like this (with ? being the application's parameter):
SELECT * FROM tbl WHERE tbl_id = ?
What are the performance implications of adding a variable like so:
DECLARE @id INT = ?;
SELECT * FROM tbl WHERE tbl_id = @id
I have tried to investigate this myself but found nothing beyond query plans taking slightly longer to compile the first time the query runs.
If tbl_id is unique, there is no difference at all. I'll try to explain why.
SQL Server can usually solve a query with many different execution plans, and it has to choose one. It tries to find the most efficient one without spending too much effort. Once SQL Server chooses a plan, it usually caches it for later reuse. Cardinality plays a key role in the efficiency of an execution plan, i.e. how many rows in tbl have a given value of tbl_id? SQL Server stores column-value frequency statistics to estimate cardinality.
First, let's assume tbl_id is not unique and has a non-uniform distribution.
In the first case we have tbl_id = ?. Let's figure out its cardinality. The first thing we need is the value of the parameter ?. Is it unknown? Not really: we have a value the first time the query is executed. SQL Server takes this value, looks it up in the stored statistics, and estimates the cardinality for this specific value. It then estimates the cost of a number of possible execution plans, taking that estimated cardinality into account, chooses the most efficient one, and caches it for later reuse. This approach works most of the time. However, if you later execute the query with another parameter value that has a very different cardinality, the cached execution plan might be very inefficient.
In the second case we have tbl_id = @id, where @id is a variable declared in the query, not a query parameter. What is the value of @id? SQL Server treats it as an unknown value, and for unknown values it picks the mean frequency from the stored statistics as the estimated cardinality. Then SQL Server does the same as before: it estimates the cost of a number of possible execution plans, taking the estimated cardinality into account, chooses the most efficient one, and caches it for later reuse. Again, this approach works most of the time. However, if you execute the query with a value whose cardinality is very different from the mean, the execution plan might be very inefficient.
When all values have the same cardinality, they all have the mean cardinality, so there is no difference between a parameter and a variable. This is the case for unique values; therefore, there is no difference when values are unique.
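A minimal sketch to see both estimates side by side, assuming a scratch session where the hypothetical temp table #tbl can be created. tbl_id = 1 is deliberately common (10,000 rows) and 1,000 other values appear once each. Run it with the actual execution plan enabled and compare Estimated Number of Rows: the parameterized statement should be estimated for the sniffed value 1, while the local-variable statement should get the average-density estimate (11,000 / 1,001, roughly 11 rows). The GO keeps the queries in a separate batch from the table creation, so they are compiled with their batch rather than deferred until the table exists.

```sql
-- Hypothetical scratch table: tbl_id = 1 has 10,000 rows,
-- tbl_id 2..1001 have one row each (11,000 rows, 1,001 distinct values).
CREATE TABLE #tbl (tbl_id int NOT NULL);

INSERT INTO #tbl (tbl_id)
SELECT TOP (10000) 1
FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;

INSERT INTO #tbl (tbl_id)
SELECT TOP (1000) 1 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;

-- Building the index also builds full-scan statistics on tbl_id.
CREATE INDEX ix_tbl_id ON #tbl (tbl_id);
GO

-- Case 1: a real parameter. The plan is built for the sniffed value (1)
-- and reused by later calls with the same statement text.
EXEC sys.sp_executesql
    N'SELECT COUNT(*) FROM #tbl WHERE tbl_id = @id',
    N'@id int',
    @id = 1;

-- Case 2: a local variable. Its value is unknown when the batch is compiled,
-- so the estimate comes from the average density instead.
DECLARE @id int = 1;
SELECT COUNT(*) FROM #tbl WHERE tbl_id = @id;
```

Both statements return the same count; only the estimates (and, on bigger or wider tables, possibly the plan shape) differ.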
One advantage of the second approach is that it reduces the number of plans SQL Server will store.
In the first version it will create a different plan for every data type (tinyint, smallint, int and bigint).
That's assuming it's an ad hoc statement.
If it's in a stored procedure, you might run into parameter sniffing as mentioned above.
You could try adding
OPTION (OPTIMIZE FOR (@id = [some good value]))
to the SELECT to see if that helps, but it is usually not considered good practice to couple your queries to specific values.
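A small sketch of the hint applied to a local variable, assuming a hypothetical temp table #t in which 7 stands in for a "typical" value. OPTIMIZE FOR only changes which value the optimizer assumes while building the plan; the query still filters on the variable's actual value, so the result is unaffected.

```sql
-- Hypothetical table; 7 plays the role of [some good value].
CREATE TABLE #t (tbl_id int NOT NULL);
INSERT INTO #t (tbl_id) VALUES (1), (7), (7), (42);
CREATE INDEX ix_tbl_id ON #t (tbl_id);
GO

DECLARE @id int = 42, @rows int;

-- The plan is compiled as if @id were 7, but at run time
-- the predicate still uses the real value of @id (42).
SELECT @rows = COUNT(*)
FROM #t
WHERE tbl_id = @id
OPTION (OPTIMIZE FOR (@id = 7));
```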
I'm not sure if this helps, but I have to account for parameter sniffing for a lot of the stored procedures I write. I do this by creating local variables, setting those to the parameter values, and then using the local variables in the stored procedure. If you look at the stored execution plan, you can see that this prevents the parameter values from being used in the plan.
This is what I do:
CREATE PROCEDURE dbo.Test ( @var int )
AS
DECLARE
    @_var int
SELECT
    @_var = @var
SELECT *
FROM dbo.SomeTable
WHERE
    Id = @_var
I do this mostly for SSRS. For example, I've had a query/stored procedure return in under a second while the report took several minutes. Doing the trick above fixed that.
There are also query hints for optimizing for specific or unknown values (e.g. OPTION (OPTIMIZE FOR (@var UNKNOWN))), but I've found these usually do not help me and do not have the same effect as the trick above. I haven't been able to investigate why they differ, but in my experience OPTIMIZE FOR UNKNOWN did not help, whereas using local variables in place of parameters did.
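For reference, the hint is written OPTION (OPTIMIZE FOR (@var UNKNOWN)) for a single parameter, or OPTION (OPTIMIZE FOR UNKNOWN) for every parameter. A sketch mirroring dbo.Test above but using the hint instead of a local copy; the procedure name dbo.TestOptimizeUnknown is hypothetical, and dbo.SomeTable is assumed to exist by the time the procedure runs.

```sql
-- Hypothetical variant of dbo.Test: the hint replaces the local-variable copy.
-- dbo.SomeTable only has to exist when the procedure runs (deferred name resolution).
CREATE PROCEDURE dbo.TestOptimizeUnknown ( @var int )
AS
SELECT *
FROM dbo.SomeTable
WHERE Id = @var
-- Build the plan from the average density instead of the sniffed value of @var.
-- OPTION (OPTIMIZE FOR UNKNOWN) would do the same for every parameter.
OPTION (OPTIMIZE FOR (@var UNKNOWN));
GO
```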
I have a complex query in .NET. Here is a simple query for explanation purposes:
SELECT * FROM Person WHERE Name='Ian' AND DateOfBirth='1961-04-04'
and this:
SELECT * FROM Person WHERE Name=@Ian AND DateOfBirth=@DateOfBirth
The table is indexed (name and date of birth).
The first query takes a fraction of a second to run from .NET. The second query takes about 48 seconds. Is this something to do with the execution plan? Is there anything I can do to force SQL Server to recreate the execution plan?
I have seen this question: https://dba.stackexchange.com/questions/530/how-do-you-clear-out-all-old-query-plans-from-within-microsoft-sql-server. However, this is more for stored procedures.
First, you want a composite index on Person(Name, DateOfBirth), not two indexes (the columns can be in either order).
Second, this probably has to do with the execution plan.
I am going to suggest the RECOMPILE option:
SELECT p.*
FROM Person p
WHERE Name = @Ian AND DateOfBirth = @DateOfBirth
OPTION (RECOMPILE);
What can happen with parameterized queries is that the execution plan is cached the first time it is run -- but that execution plan may not be the best for subsequent invocations.
If that doesn't work, then the problem may be data types, because data type incompatibility can prevent the use of indexes. Be sure that the data types and collations are the same. You might need to cast() the parameters to the appropriate type and/or use COLLATE.
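A sketch of matching parameter types to column types, assuming (hypothetically) that Name is varchar(50) and DateOfBirth is date; the temp table #Person stands in for the real table. A common source of mismatch from .NET is sending strings as nvarchar (AddWithValue infers NVarChar for a string); against a varchar column the conversion then has to be applied to the column side, which can prevent an index seek.

```sql
-- Hypothetical stand-in for Person; adjust the types to the real columns.
CREATE TABLE #Person (Name varchar(50) NOT NULL, DateOfBirth date NOT NULL);
CREATE INDEX ix_name_dob ON #Person (Name, DateOfBirth);
INSERT INTO #Person (Name, DateOfBirth) VALUES ('Ian', '19610404'), ('Ann', '19700101');
GO

-- Parameters declared with exactly the column types, so no implicit
-- conversion has to be applied to the indexed columns.
EXEC sys.sp_executesql
    N'SELECT p.Name, p.DateOfBirth
      FROM #Person AS p
      WHERE p.Name = @Name AND p.DateOfBirth = @DateOfBirth',
    N'@Name varchar(50), @DateOfBirth date',
    @Name = 'Ian',
    @DateOfBirth = '19610404';
```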
Is there any difference, with regards to performance, when there are many queries running with (different) constant values inside a where clause, as opposed to having a query with declared parameters on top, where instead the parameter value is changing?
Sample query with a constant value in the where clause:
select
*
from [table]
where [guid_field] = '00000000-0000-0000-0000-000000000000' --value changes
Proposed (improved?) query with declared parameters:
declare @var uniqueidentifier = '00000000-0000-0000-0000-000000000000' --value changes
select
*
from [table]
where [guid_field] = @var
Is there any difference? I'm looking at the execution plans of something similar to the two above queries and I don't see any difference. However, I seem to recall that if you use constant values in SQL statements that SQL server won't reuse the same query execution plans, or something to that effect that causes worse performance -- but is that actually true?
It is important to distinguish between parameters and variables here. Parameters are passed to procedures and functions, variables are declared.
Addressing variables first, which is what the SQL in the question uses: when compiling an ad hoc batch, SQL Server compiles each statement in its own right.
So when compiling the query with a variable, it does not go back to check any assignment, and it compiles an execution plan optimised for an unknown value.
On first run, this execution plan is added to the plan cache, and future executions of the same batch can, and will, reuse this cached plan whatever value the variable holds.
When you pass a constant, the query is compiled for that specific value, so SQL Server can create a more optimal plan, but with the added cost of recompilation.
So to specifically answer your question:
However, I seem to recall that if you use constant values in SQL statements that SQL server won't reuse the same query execution plans, or something to that effect that causes worse performance -- but is that actually true?
Yes, it is true that the same plan cannot be reused for different constant values, but that does not necessarily cause worse performance. A more appropriate plan may be chosen for a particular constant (e.g. choosing a bookmark lookup over an index scan for sparse data), and this plan change may outweigh the cost of recompilation. So, as is almost always the case with SQL performance questions, the answer is: it depends.
For parameters, the default behaviour is that the execution plan is compiled based on the parameter value(s) used when the procedure or function is first executed.
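To check plan reuse directly instead of inferring it from execution plans, the plan-cache DMVs can be queried. A sketch (it needs VIEW SERVER STATE permission); the guid_field filter is just a convenient way to find the statements from the question. Ad hoc batches are matched on their exact text, so a batch whose DECLARE assigns a different literal is a different cache entry; several Adhoc rows with usecounts = 1 are the signature of no reuse. Depending on the query, simple parameterization may also produce a shared Prepared entry.

```sql
-- Cached plans for statements mentioning guid_field, with their reuse counts.
-- Requires VIEW SERVER STATE permission.
SELECT cp.objtype,    -- 'Adhoc', 'Prepared' (parameterized), 'Proc', ...
       cp.usecounts,  -- number of times this cache entry has been used
       st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%guid_field%'
  AND st.text NOT LIKE N'%dm_exec_cached_plans%'  -- skip this query itself
ORDER BY cp.usecounts DESC;
```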
I have answered similar questions before in much more detail with examples, that cover a lot of the above, so rather than repeat various aspects of it I will just link the questions:
Does assigning stored procedure input parameters to local variables help optimize the query?
Ensure cold cache when running query
Why is SQL Server using index scan instead of index seek when WHERE clause contains parameterized values
There are many things involved in your question, and they all have to do with statistics.
SQL Server compiles execution plans even for ad hoc queries and stores them in the plan cache for reuse, if they are deemed safe.
select * into test from sys.objects
select schema_id,count(*) from test
group by schema_id
-- schema_id 1 has 15 rows
-- schema_id 4 has 44 rows
First case:
We use a different literal each time, so SQL Server saves the plan if it deems it safe. You can see that the second query's estimates are the same as for the literal 4, since SQL Server saved the plan for 4.
-- clear the plan cache first (not on a production server!)
dbcc freeproccache
select * from test
where schema_id=4
select * from test where
schema_id=1
Second case:
Using a local variable instead of a literal, with the same value of 4:
-- pass 4, which we know has 44 rows; the estimate was 44 when we used a literal
declare @id int
set @id=4
select * from test
where schema_id=@id
Using the local variable, the estimate dropped to roughly 29.5 rows, which has to do with statistics: because the value is unknown at compile time, the estimate is based on the average density rather than on the histogram entry for 4.
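The 29.5 can be reproduced by hand. For an equality predicate on an unknown value, the estimate is roughly the row count multiplied by the column's all density (1 / number of distinct values). Assuming schema_id 1 and 4 are the only two values in the copy, that is (15 + 44) × 1/2 = 59 / 2 = 29.5. A sketch that computes the same figure and then shows the density as SQL Server stores it; st_schema_id is a hypothetical statistics name.

```sql
-- The table from the answer; created here only if it does not exist yet.
IF OBJECT_ID(N'test', N'U') IS NULL
    SELECT * INTO test FROM sys.objects;
GO

-- Estimate for "schema_id = <unknown value>": row count x all density,
-- i.e. the average number of rows per distinct value.
SELECT COUNT(*)                                   AS total_rows,
       COUNT(DISTINCT schema_id)                  AS distinct_values,
       1.0 / COUNT(DISTINCT schema_id)            AS all_density,
       COUNT(*) * 1.0 / COUNT(DISTINCT schema_id) AS estimated_rows
FROM test;

-- The stored density vector (st_schema_id is a hypothetical name).
IF NOT EXISTS (SELECT 1 FROM sys.stats
               WHERE object_id = OBJECT_ID(N'test') AND name = N'st_schema_id')
    CREATE STATISTICS st_schema_id ON test (schema_id) WITH FULLSCAN;
DBCC SHOW_STATISTICS (N'test', st_schema_id) WITH DENSITY_VECTOR;
```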
So, in summary, statistics are crucial in choosing a query plan (nested loops, a scan, or a seek); from the examples you can see how the estimates differ for each method.
From a plan-cache bloat perspective, you might also wonder what happens if you send many ad hoc queries, since SQL Server generates a new plan for the same query even if the only change is whitespace. The links below will help with that.
Further reading:
http://www.sqlskills.com/blogs/kimberly/plan-cache-adhoc-workloads-and-clearing-the-single-use-plan-cache-bloat/
http://sqlperformance.com/2012/11/t-sql-queries/ten-common-threats-to-execution-plan-quality
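To see how much of the cache is taken up by single-use ad hoc plans (the bloat discussed in the first link), a read-only sketch against the plan-cache DMV (it needs VIEW SERVER STATE). The commented-out lines show the server-wide 'optimize for ad hoc workloads' setting, which caches only a small plan stub the first time an ad hoc batch runs; it is worth testing before enabling on a real server.

```sql
-- Single-use ad hoc plans and the memory they occupy.
-- Requires VIEW SERVER STATE permission.
SELECT COUNT(*)                                       AS single_use_adhoc_plans,
       SUM(CAST(size_in_bytes AS bigint)) / 1048576.0 AS size_mb
FROM sys.dm_exec_cached_plans
WHERE objtype = N'Adhoc'
  AND usecounts = 1;

-- Server-wide option that caches only a small stub on first execution.
-- Evaluate it on a test server before enabling it anywhere else:
-- EXEC sys.sp_configure N'show advanced options', 1; RECONFIGURE;
-- EXEC sys.sp_configure N'optimize for ad hoc workloads', 1; RECONFIGURE;
```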
First, note that a local variable is not the same as a parameter.
Assuming the column is indexed or has statistics, SQL Server uses the statistics histogram to estimate the qualifying row count based on the constant value supplied. The query will also be auto-parameterized and cached if it is trivial (yields the same plan regardless of values), so that subsequent executions avoid query compilation costs.
A parameterized query also generates a plan using the stats histogram with the initially supplied parameter value. The plan is cached and reused for subsequent executions regardless of whether or not it is trivial.
With a local variable, SQL Server uses the overall average density from the statistics to generate the plan, because the actual value is unknown at compile time. This plan may be good for some values but suboptimal for others when the query is not trivial.
I have a really simple stored procedure, GetData, which has two parameters and looks like this:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[GetData]
    @Id int = NULL,
    @ExternalId nvarchar(500) = NULL
AS
BEGIN
    SET NOCOUNT ON;
    SELECT
        t.Id,
        t.ExternalId,
        t.Column1 -- other columns omitted for brevity
    FROM [SomeTable] t
    WHERE
        (t.Id = @Id OR @Id IS NULL) AND
        (t.ExternalId = @ExternalId OR @ExternalId IS NULL)
END
It's very simple: just one SELECT statement from one table. What concerns me is that executing this procedure takes 0.505369 seconds on average, but if I extract the SELECT query and execute it on its own, it takes about 0.023923 seconds on average. This procedure is called very often and is one of the critical procedures in my application, which is why it is kept so minimal; for now, 0.5 s is just about acceptable.
The table currently contains just 4.95 million rows. There is a clustered index on the Id column and a nonclustered index on the ExternalId column. The table is expected to grow to 45 million rows in the coming weeks, after which the insertion rate will decrease. At 45 million rows, I don't think the procedure above will give reasonable times.
I don't really understand what the problem is here, or whether it is supposed to be like this. As I understand it, after the procedure executes, its plan is cached and not re-created the next time, so shouldn't it be faster than the ad hoc query? In this case, is it better to use an ad hoc query instead of the stored procedure? The database is SQL Server 2012. Thanks in advance.
My first suggestion would be to use the option "Recompile". What happens is that the queries are compiled the first time the stored procedure is run (this is called "parameter sniffing"). This might have an impact on performance, if the execution path for the first run is different from the optimal execution path. For instance, sometimes the stored procedure is tested on super small tables, so indexes do not get used.
The syntax is:
SELECT
    t.Id,
    t.ExternalId,
    t.Column1 -- other columns omitted for brevity
FROM [SomeTable] t
WHERE
    (t.Id = @Id OR @Id IS NULL) AND
    (t.ExternalId = @ExternalId OR @ExternalId IS NULL)
OPTION (RECOMPILE);
However, your query uses OR in the WHERE clause, which makes it difficult for the optimizer to use indexes at all. One option is to switch to dynamic SQL, something like this:
declare @sql nvarchar(max) = N'
SELECT t.Id, t.ExternalId,
       t.Column1 -- other columns omitted for brevity
FROM [SomeTable] t
WHERE 1=1 ';
set @sql = @sql + (case when @Id is not null then N' and t.Id = @Id' else N'' end) +
                  (case when @ExternalId is not null then N' and t.ExternalId = @ExternalId' else N'' end);
exec sp_executesql @sql, N'@Id int, @ExternalId nvarchar(500)', @Id = @Id, @ExternalId = @ExternalId;
There are a few factors which help determine the best course of action:
The distribution of data in the ExternalId column:
If ExternalId values are fairly evenly spread out, then outside of the possible NULL value not even using that field, one value should not produce a plan that would not work for other values. You don't need to worry about the Id field, assuming it is the PK, because by the very nature of a PK there is only one row per value.
The actual variability of input parameter values:
How often is either one of them NULL, a particular value, or any value? Meaning, are 90% of the executions of this proc coming in with @Id of NULL and @ExternalId one of 5 different values? Or is it that 90% of the time a different @Id value is passed in? And/or is there 1 particular value of @ExternalId that is used or is it usually different?
First
Before considering any structural changes, please make sure that the datatype of the ExternalId field matches the datatype of the @ExternalId input parameter. The @ExternalId input param is defined as NVARCHAR(500), so if the ExternalId field is actually declared as VARCHAR, then you are probably getting an "Index Scan" instead of an "Index Seek" due to the implicit conversion from VARCHAR to NVARCHAR.
Options
Using OPTION (RECOMPILE): This has been mentioned already, and I am including it both for the sake of completeness and to say that it should be your last resort. This option ensures that you don't get a bad cached plan by never caching a plan at all, which means you can never benefit from plan caching either. In most cases there are better choices.
AND, and this is very important in situations like yours where the stored procedure is executed frequently: the reason that execution plans are cached is due to the expense of figuring them out, hence telling SQL Server to figure it out again and again, for every execution, will take its toll on the process.
Using OPTION (OPTIMIZE FOR...): This option allows you to tell the Query Optimizer to assume an average distribution, based on current statistics, for all input parameters (when using OPTIMIZE FOR UNKNOWN) or assume a distribution based on a particular value for one or more input parameters (when using OPTIMIZE FOR ( @variable_name { UNKNOWN | = literal_constant } [ , ...n ] )). Please note that you can still use the UNKNOWN keyword to assume an average distribution for specific parameters while also using specific values for other parameters. For more info, please see the MSDN page for Query Hints.
Parameterized Dynamic SQL (i.e. 'Field = @Param'): This option solves the problem of various combinations of parameters (which you have tried to solve using the Field = @Param OR @Param IS NULL method). And this might be all you need if the data in the ExternalId field is fairly evenly distributed. But if it is very uneven, then you can still fall into the problem of getting a bad cached plan.
Literal (i.e. non-Parameterized) Dynamic SQL (i.e. 'Field = ' + CONVERT(NVARCHAR(50), @Param)): In this method you would concatenate the appropriate parameter values into the Dynamic SQL (after making sure that @ExternalId does not have any single-quotes in it so as to avoid SQL Injection). This will give you a query plan that is tailored to the specific values and can be reused if those values are passed in again (in the exact combination of both input parameters). The main downside here is that if there is a high variability of values passed in for either input parameter, you will generate quite a lot of execution plans, and they do take up memory. But in the case of highly varied data distributions (i.e. one @ExternalId pulls 50 rows while another value pulls 2 million), this is probably the way to go.
Combination of Parameterized and non-Parameterized Dynamic SQL: In the case where an input parameter's values are highly varied but the data distribution in the table is fairly even, you can parameterize this parameter in the Dynamic SQL while concatenating in the input parameter that has a highly varied data distribution. Of course, in this particular situation, we know that Id is very evenly distributed, so if ExternalId is also evenly distributed then you should stick with Parameterized Dynamic SQL (as noted above). This would result in fewer execution plans than going with the fully Literal option.
I have had great success using this technique with stored procedures that get called every second for several hours, and that hit several tables that have well over 10 million rows each. This is after I initially tried using OPTION (RECOMPILE) only to find that it made things worse.
Multiple Stored Procedures: Assuming that you don't ever call this proc with both input parameters being NULL at the same time, you could create three stored procedures for the combinations of: @Id-only, @ExternalId-only, and both @Id and @ExternalId. And it would then be up to the app code to determine which stored procedure to execute. This would seem to be great for the @Id-only proc since the data is evenly distributed. But depending on how evenly or unevenly the values for ExternalId are distributed, the two stored procedures with the @ExternalId input parameter could still run into the problem of getting a bad cached plan.
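A sketch of the "Combination of Parameterized and non-Parameterized Dynamic SQL" option, with hypothetical inputs and a temp table #SomeTable standing in for the real table: @Id stays a parameter, while the value of @ExternalId is embedded as a literal. Doubling single quotes with REPLACE keeps the literal from breaking out of the string; QUOTENAME(@ExternalId, '''') would also work, but it returns NULL for inputs longer than 128 characters.

```sql
-- Hypothetical stand-in for SomeTable.
CREATE TABLE #SomeTable (Id int NOT NULL PRIMARY KEY, ExternalId nvarchar(500) NOT NULL);
INSERT INTO #SomeTable (Id, ExternalId) VALUES (5, N'O''Brien-42'), (6, N'abc');

-- Hypothetical inputs; in the real procedure these are its parameters.
DECLARE @Id int = 5,
        @ExternalId nvarchar(500) = N'O''Brien-42';

DECLARE @sql nvarchar(max) = N'SELECT t.Id, t.ExternalId FROM #SomeTable AS t WHERE 1 = 1';

IF @Id IS NOT NULL
    -- Evenly distributed column: keep it a parameter so one plan serves all values.
    SET @sql += N' AND t.Id = @Id';

IF @ExternalId IS NOT NULL
    -- Skewed column: embed the value as a literal so each value gets its own plan.
    -- REPLACE doubles any single quotes, so the value cannot end the string early.
    SET @sql += N' AND t.ExternalId = N''' + REPLACE(@ExternalId, N'''', N'''''') + N'''';

-- @Id is always declared and passed, even when the text does not reference it.
EXEC sys.sp_executesql @sql, N'@Id int', @Id = @Id;
```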
Notes
When I say "bad cached plan", I mean "bad" for some values. Execution Plans are cached the first time the stored procedure is executed. They are cached until SQL Server restarts, or someone executes DBCC FREEPROCCACHE, or if the system is experiencing memory pressure and needs to free up some memory to be used for queries, it can dump plans that haven't been used in a while. But the plan that is cached was intended to be the optimal plan for the parameter values it first ran with. Different values used in subsequent executions might be horribly inefficient with that same plan. So the "bad" refers to a sometimes condition, not an always condition. If the plan is always bad, then it is most likely the query itself is the issue and not the parameter values.
A downside to Dynamic SQL is that it breaks ownership chaining. This means that, typically, direct table permissions need to be granted to the user since the permissions cannot be assumed via owner of the stored procedure. The good news, however, is that you don't need to grant direct table permissions to the User(s) executing the stored proc. You can do the following to maintain proper security when using Dynamic SQL:
I assume that we are only dealing with a single database.
Create a Certificate in the database
Create a User based on that Certificate
Grant the table permissions to this new Certificate-based User
Use ADD SIGNATURE to sign the stored procedure, which effectively grants the stored procedure -- not the User executing the stored procedure -- the permissions assigned to the new Certificate-based User.
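The four steps above can be sketched in T-SQL as follows, assuming dbo.GetData and dbo.SomeTable already exist; the certificate name, user name, and password are placeholders. Protecting the certificate with a password avoids needing a database master key. Note that ALTER PROCEDURE removes the signature, so the ADD SIGNATURE step has to be repeated after every change to the procedure.

```sql
-- 1. A certificate in the database (password-protected, so no database master key is needed).
CREATE CERTIFICATE GetDataSigningCert
    ENCRYPTION BY PASSWORD = N'Replace-With-A-Strong-Passw0rd!'
    WITH SUBJECT = N'Signs dbo.GetData for dynamic SQL';

-- 2. A user created from that certificate (it cannot log in by itself).
CREATE USER GetDataSigningUser FROM CERTIFICATE GetDataSigningCert;

-- 3. Table permissions go to the certificate-based user, not to the callers.
GRANT SELECT ON dbo.SomeTable TO GetDataSigningUser;

-- 4. Sign the procedure; re-run after every ALTER PROCEDURE.
ADD SIGNATURE TO dbo.GetData
    BY CERTIFICATE GetDataSigningCert
    WITH PASSWORD = N'Replace-With-A-Strong-Passw0rd!';
```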
I've had a SQL performance review done on a project we're working on, and one 'Critical' item that has come up is this:
This kind of wildcard query pattern will cause a table scan, resulting
in poor query performance.
SELECT *
FROM TabFoo
WHERE ColBar = @someparam OR @someparam IS NULL
Their recommendation is:
In many cases, an OPTION (RECOMPILE) hint can be a quick workaround.
From a design point of view, you can also consider using separate If
clauses or (not recommended) use a dynamic SQL statement.
Dynamic SQL surely isn't the right way forward. Basically, the procedure is one where I am searching for something, OR something else. Two parameters come into the procedure, and I am filtering on one or the other.
A better example than what they showed is:
SELECT ..
FROM...
WHERE (ColA = @ParA OR @ParA IS NULL)
AND (ColB = @ParB OR @ParB IS NULL)
Is that bad practice? And besides dynamic SQL (because I thought dynamic SQL can't really be compiled and be efficient in its execution plan?), how would this best be done?
A query like
select *
from foo
where foo.bar = @p OR @p is null
might or might not cause a table scan. My experience is that it will not: the optimizer is perfectly able to do an index seek on the expression foo.bar = @p, assuming a suitable index exists. Further, it's perfectly able to short-circuit things if the variable is null. You won't know what your execution plan looks like until you try it and examine the bound execution plan. A better technique, however, is this:
select *
from foo
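where foo.bar = coalesce(@p,foo.bar)
which will give you the same behavior, provided foo.bar is not nullable (when foo.bar is NULL, foo.bar = foo.bar is not true, so those rows are filtered out).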
If you are using a stored procedure, one thing that can and will bite you in the tookus is something like this:
create procedure dbo.spFoo
  @p varchar(32)
as
select *
from dbo.foo
where foo.bar = @p or @p is null
return @@rowcount
The direct use of the stored procedure parameter in the where clause will cause the cached execution plan to be based on the value of @p on its first execution. That means that if the first execution of your stored procedure has an outlier value for @p, you may get a cached execution plan that performs really poorly for the 95% of "normal" executions and really well only for the oddball cases. To prevent this from occurring, you want to do this:
create procedure dbo.spFoo
  @p varchar(32)
as
declare @pMine varchar(32)
set @pMine = @p
select *
from dbo.foo
where foo.bar = @pMine or @pMine is null
return @@rowcount
That simple assignment of the parameter to a local variable makes it an expression, so the cached execution plan is not bound to the initial value of @p. Don't ask how I know this.
Further the recommendation you received:
In many cases, an OPTION (RECOMPILE) hint can be a quick workaround.
From a design point of view, you can also consider using separate
If clauses or (not recommended) use a dynamic SQL statement.
is hogwash. OPTION (RECOMPILE) means that the statement is recompiled on every execution. During compilation, compile-time locks are taken out on dependent objects. Further, nobody else is going to be able to execute the stored procedure until the compilation is completed. This has, shall we say, a negative impact on concurrency and performance. Use of OPTION (RECOMPILE) should be a measure of last resort.
Write clean SQL and vet your execution plans using production data, or as close as you can get to it: the execution plan you get is affected by the size and shape/distribution of the data.
I could be wrong, but I'm pretty sure a table scan will occur no matter what if the column you have in your where clause isn't indexed. Also, you could probably get better performance by reordering your OR clauses so that if @ParA IS NULL is true, it evaluates first and would not require evaluating the value in the column. Something to remember is that the where clause is evaluated for every row that comes back from the from clause. I would not recommend dynamic SQL, and honestly, even under relatively heavy load I'd find it difficult to believe that this form of filter would cause a significant performance hit, since a table scan is required anytime the column isn't indexed.
We did a Microsoft engagement where they noted that we had a ton of this "Wildcard Pattern Usage", and their suggestion was to convert the query to an IF/ELSE structure...
IF (@SomeParam is null) BEGIN
    SELECT *
    FROM TabFoo
END
ELSE BEGIN
    SELECT *
    FROM TabFoo
    WHERE ColBar = @SomeParam
END
They preferred this approach over recompile (adds to execution time) or dynamic code (can't plan ahead, so kind of the same thing, having to figure out the plan every time); and I seem to recall that it is still an issue even with local variables (plus, you need extra memory regardless).
You can see that things get a bit crazy if you write queries with multiple WPU issues, but at least for the smaller ones, MS recommends the IF/ELSE approach.
In all the examples I saw, NULL was involved, but I can't help but think if you had a parameter utilizing a default, whether on the parameter itself or set with an ISNULL(), and essentially the same pattern used, that might also be bad (well, as long as the default is something an "actual value" would never be, that is).
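One refinement often suggested for the IF/ELSE approach: when a single procedure is compiled, every branch is optimized using the parameter values of that first call, including the branch that is not taken, so the filtered query can end up with a plan built for @SomeParam = NULL. Moving each branch into its own procedure means each plan is compiled the first time that branch actually runs. A sketch with hypothetical procedure names, assuming ColBar is an int and TabFoo lives in the dbo schema:

```sql
-- Hypothetical names; ColBar is assumed to be int.
-- Tables only need to exist when the procedures run.
CREATE PROCEDURE dbo.TabFoo_All
AS
SELECT * FROM dbo.TabFoo;
GO

CREATE PROCEDURE dbo.TabFoo_ByColBar (@SomeParam int)
AS
SELECT * FROM dbo.TabFoo WHERE ColBar = @SomeParam;
GO

-- The dispatcher contains no queries of its own, so a NULL first call
-- cannot leave the filtered query with a plan built for NULL.
CREATE PROCEDURE dbo.TabFoo_Search (@SomeParam int = NULL)
AS
IF @SomeParam IS NULL
BEGIN
    EXEC dbo.TabFoo_All;
END
ELSE
BEGIN
    EXEC dbo.TabFoo_ByColBar @SomeParam = @SomeParam;
END
GO
```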
I have a table with almost 800,000 records, and I am currently using dynamic SQL to generate the query on the back end. The front end is a search page that takes about 20 parameters, and depending on whether a parameter was chosen, it adds an " AND ..." to the base query. I'm curious whether dynamic SQL is the right way to go (it doesn't seem like it, because it runs slowly). I am contemplating just creating a denormalized table with all my data. Is this a good idea, or should I build the query all at once instead of piece by piece using dynamic SQL? Last thing: is there a way to speed up dynamic SQL?
It is more likely that your indexing (or lack thereof) is causing the slowness than the dynamic SQL.
What does the execution plan look like? Is the same query slow when executed in SSMS? What about when it's in a stored procedure?
If your table is an unindexed heap, it will perform poorly as the number of records grows. This is regardless of the query, and a dynamic query can actually perform better as the nature of the table changes, because a dynamic query is more likely to have its query plan re-evaluated when it's not in the cache. This is not normally an issue (and I would not classify it as a design advantage of dynamic queries) except in the early stages of a system, when SPs have not been recompiled, statistics and query plans are out of date, and the volume of data has just changed drastically.
Not with the static one yet. I have tried it with the dynamic query, but it does not give any optimizations. If I ran it with the static query and it gave suggestions, would applying them affect the dynamic query? – Xaisoft
Yes, the dynamic query (EXEC (@sql)) is probably not going to be analyzed unless you analyzed a workload file. – Cade Roux
When you have a search query across multiple tables that are joined, the columns with indexes need to be the search columns as well as the primary key/foreign key columns, but it depends on the cardinality of the various tables. The tuning analyzer should show this. – Cade Roux
I'd just like to point out that if you use this style of optional parameters:
AND (@EarliestDate is Null OR PublishedDate < @EarliestDate)
The query optimizer will have no idea whether the parameter is there or not when it produces the query plan. I have seen cases where the optimizer makes bad choices in these cases. A better solution is to build the sql that uses only the parameters you need. The optimizer will make the most efficient execution plan in these cases. Be sure to use parameterized queries so that they are reusable in the plan cache.
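A sketch of building only the predicates you need while keeping the statement parameterized, using a hypothetical temp table #Articles in place of Articles. Every parameter is declared in the sp_executesql call even when the text does not use it, which is allowed, so the call shape stays the same; each distinct query text gets its own cached plan.

```sql
-- Hypothetical stand-in for the Articles table.
CREATE TABLE #Articles (ArticleId int NOT NULL PRIMARY KEY,
                        AuthorId int NOT NULL,
                        PublishedDate datetime NOT NULL);
INSERT INTO #Articles (ArticleId, AuthorId, PublishedDate)
VALUES (1, 10, '20200101'), (2, 10, '20220101'), (3, 11, '20210101');

DECLARE @AuthorId int = 10,
        @EarliestDate datetime = '20210101';

DECLARE @sql nvarchar(max) =
    N'SELECT a.ArticleId FROM #Articles AS a WHERE a.AuthorId = @AuthorId';

-- Append the optional predicate only when the caller supplied a value, so the
-- optimizer never has to plan for "maybe there is a date filter".
IF @EarliestDate IS NOT NULL
    SET @sql += N' AND a.PublishedDate < @EarliestDate';

-- Declare every parameter every time; unused ones are simply ignored.
EXEC sys.sp_executesql @sql,
     N'@AuthorId int, @EarliestDate datetime',
     @AuthorId = @AuthorId, @EarliestDate = @EarliestDate;
```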
As previous answer, check your indexes and plan.
The question is whether you are using a stored procedure. It's not obvious from the way you worded it. A stored procedure creates a query plan when run, and keeps that plan until recompiled. With varying SQL, you may be stuck with a bad query plan. You could do several things:
1) Add WITH RECOMPILE to the SP definition, which will cause a new plan to be generated with every execution. This includes some overhead, which may be acceptable.
2) Use separate SPs, depending on the parameters provided. This will allow better query plan caching.
3) Use client generated SQL. This will create a query plan each time. If you use parameterized queries, this may allow you to use cached query plans.
The only difference between "dynamic" and "static" SQL is the parsing/optimization phase. Once those are done, the query will run identically.
For simple queries, this parsing phase plus the network traffic turns out to be a significant percentage of the total transaction time, so it's good practice to try and reduce these times.
But for large, complicated queries, this processing is overall insignificant compared to the actual path chosen by the optimizer.
I would focus on optimizing the query itself, including perhaps denormalization if you feel that it's appropriate, though I wouldn't do that on a first go around myself.
Sometimes the denormalization can be done at "run time" in the application using cached lookup tables, for example, rather than maintaining this in the database.
Not a fan of dynamic SQL, but if you are stuck with it, you should probably read this article:
http://www.sommarskog.se/dynamic_sql.html
He really goes in depth on the best ways to use dynamic SQL and the issues using it can create.
As others have said, indexing is the most likely culprit. With indexing, one thing people often forget to do is put an index on the FK fields. Since a PK creates an index automatically, many assume an FK will as well. Unfortunately, creating an FK does not create an index. So make sure that any fields you join on are indexed.
There may be better ways to create your dynamic SQL, but without seeing the code it is hard to say. I would at least look to see whether it uses subqueries and replace them with derived-table joins instead. Also, any dynamic SQL that uses a cursor is bound to be slow.
If the parameters are optional, a trick that's often used is to create a procedure like this:
CREATE PROCEDURE GetArticlesByAuthor (
@AuthorId int,
@EarliestDate datetime = Null )
AS
SELECT * --not in production code!
FROM Articles
WHERE AuthorId = @AuthorId
AND (@EarliestDate is Null OR PublishedDate < @EarliestDate)
There are some good examples of queries with optional search criteria here: How do I create a stored procedure that will optionally search columns?
As noted, if you are doing a massive query, Indexes are the first bottleneck to look at. Make sure that heavily queried columns are indexed. Also, make sure that your query checks all indexed parameters before it checks un-indexed parameters. This makes sure that the results are filtered down using indexes first and then does the slow linear search only if it has to. So if col2 is indexed but col1 is not, it should look as follows:
WHERE col2 = @col2 AND col1 = @col1
You may be tempted to go overboard with indexes as well, but keep in mind that too many indexes can cause slow writes and massive disk usage, so don't go too too crazy.
I avoid dynamic queries if I can for two reasons. One, they do not save the query plan, so the statement gets compiled each time. The other is that they are hard to manipulate, test, and troubleshoot. (They just look ugly).
I like Dave Kemp's answer above.
I've had some success (in a limited number of instances) with the following logic:
CREATE PROCEDURE GetArticlesByAuthor (
@AuthorId int,
@EarliestDate datetime = Null
) AS
SELECT SomeColumn
FROM Articles
WHERE AuthorId = @AuthorId
AND @EarliestDate is Null
UNION
SELECT SomeColumn
FROM Articles
WHERE AuthorId = @AuthorId
AND PublishedDate < @EarliestDate
If you are trying to optimize below the 1 s range, it may be important to gauge approximately how long it takes to parse and compile the dynamic SQL relative to the actual query execution time:
SET STATISTICS TIME ON;
and then execute the dynamic SQL string "statically" and check the "Messages" tab. I was surprised by these results for a ~10-line dynamic SQL query that returns two rows from a 1M-row table:
SQL Server parse and compile time:
CPU time = 199 ms, elapsed time = 199 ms.
(2 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 4 ms.
Index optimization is unlikely to move the 199 ms barrier much (except perhaps for some analysis/optimization included in the compile time).
However, if the dynamic SQL uses parameters or is repeated, then the compile results may be cached (see Caching Query Plans), which would eliminate the compile time. It would be interesting to know how long cache entries live, how large they are, whether they are shared between sessions, etc.