Parameterized query is too slow

Parameterized query is too slow - sql

I have a complex query in .NET. Please see the query (simple query for explanation purposes) below:
SELECT * FROM Person WHERE Name='Ian' AND DateOfBirth='1961-04-04'
and this:
SELECT * FROM Person WHERE Name=#Ian AND DateOfBirth=#DateOfBirth
The table is indexed (name and date of birth).
The first query takes a fraction of a second to run from .NET. The second query takes about 48 seconds. Is this something to do with the execution plan? Is there anything I can do to force SQL Server to recreate the execution plan?
I have seen this question: https://dba.stackexchange.com/questions/530/how-do-you-clear-out-all-old-query-plans-from-within-microsoft-sql-server. However, this is more for stored procedures.

First, you want a composite index on Person(Name, DateOfBirth), not two indexes (the columns can be in either order).
Second, this probably has to do with the execution plan.
I am going to suggest the RECOMPILE option:
SELECT p.*
FROM Person p
WHERE Name = #Ian AND DateOfBirth = #DateOfBirth
OPTION (RECOMPILE);
What can happen with parameterized queries is that the execution plan is cached the first time it is run -- but that execution plan may not be the best for subsequent invocations.
If that doesn't work, then the problem may be data types, because data type incompatibility can prevent the use of indexes. Be sure that the data types and collations are the same. You might need to cast() the parameters to the appropriate type and/or use COLLATE.

Related

Impact of variables in a paramatized SQL query

I have a parametrized query which looks like (With ? being the applications parameter):
SELECT * FROM tbl WHERE tbl_id = ?
What are the performance implications of adding a variable like so:
DECLARE #id INT = ?;
SELECT * FROM tbl WHERE tbl_id = #id
I have attempted to investigate myself but have had no luck other than query plans taking slightly longer to compile when the query is first run.

If tbl_id is unique there is no difference at all. I'm trying to explain why.
SQL Server usually can solve a query with many different execution plans. SQL Server has to choose one. It tries to find the the most efficient one without too much effort. Once SQL Server chooses one plan it usually caches it for later reuse. Cardinality plays a key role in the efficiency of an execution plan, i.e How many rows there are on tbl with a given value of tbl_id?. SQL Server stores column values frequency statistics to estimate cardinality.
Firstly, lets assume tbl_id is not unique and has a non uniform distribution.
In the first case we have tbl_id = ?. Lets figure out its cardinality. The first thing we need to do to figure it out is knowing the value of the parameter ?. Is it unknown? Not really. We have a value the first time the query is executed. SQL Server takes this value, it goes to stored statistics and estimates cadinality for this specific value, it estimates the cost of a bunch of possible execution plans taking into account the estimated cardinality, chooses the most efficient one and cache it for later reuse. This approach works most of the time. However if you execute the query later with other parameter value that has a very different cardinality, the cached execution plan might be very inefficient.
In the second case we have tbl_id = #id being #id a variable declared in the query, it isn't a query parameter. Which is the value of #id?. SQL Server treats it as an unknown value. SQL Server peaks the mean frequency from stored statistics as the estimated cardinality for unknown values. Then SQL Server do the same as before: it estimates the cost of a bunch of possible execution plans taking into account the estimated cardinality, chooses the most efficient one and cache it for later reuse. Again, this approach works most of the time. However if you execute the query with one parameter value that has a very different cardinality than the mean, the execution plan might be very inefficient.
When all values have the same cardinality they have the mean cardinality, so there is no difference between parameter and variable. This is the case of unique values, therefore there are no difference when values are unique.

one advantage of the 2nd approach is that is reduces the number of plans SQL will store.
in the first version it will create a different plan for every datatype (tinyint, smallint,int & bigint)
thats assuming its an adhoc statement.
If its in a stored proc - you might run into p-sniffing as mentioned above.
You could try adding
OPTION ( OPTIMIZE FOR (#id = [some good value]))
to the select to see if that helps - but is is usually not considered good practice to couple your queries to values.

I'm not sure if this helps, but I have to account for parameter sniffing for a lot of the stored procedures I write. I do this by creating local variables, setting those to the parameter values, and then using the local variables in the stored procedure. If you look at the stored execution plan, you can see that this prevents the parameter values from being used in the plan.
This is what I do:
CREATE PROCEDURE dbo.Test ( #var int )
AS
DECLARE
#_var int
SELECT
#_var = #var
SELECT *
FROM dbo.SomeTable
WHERE
Id = #_var
I do this mostly for SSRS. I've had a query/stored procedure return <1sec, but the report takes several minutes, for example. Doing the trick above fixed that.
There are also options for optimizing specific values (e.g. OPTION (OPTIMIZE #var FOR UNKNOWN)), but I've found this usually does not help me and will not have the same effects as the trick above. I haven't been able to investigate the specifics into why they are different, but I have experienced the OPTIMIZE FOR UNKNOWN did not help, where as using local variables in place of variables did.

SQL - any performance difference using constant values vs parameters?

Is there any difference, with regards to performance, when there are many queries running with (different) constant values inside a where clause, as opposed to having a query with declared parameters on top, where instead the parameter value is changing?
Sample query with with constant value in where clause:
select
*
from [table]
where [guid_field] = '00000000-0000-0000-000000000000' --value changes
Proposed (improved?) query with declared parameters:
declare #var uniqueidentifier = '00000000-0000-0000-000000000000' --value changes
select
*
from [table]
where [guid_field] = #var
Is there any difference? I'm looking at the execution plans of something similar to the two above queries and I don't see any difference. However, I seem to recall that if you use constant values in SQL statements that SQL server won't reuse the same query execution plans, or something to that effect that causes worse performance -- but is that actually true?

It is important to distinguish between parameters and variables here. Parameters are passed to procedures and functions, variables are declared.
Addressing variables, which is what the SQL in the question has, when compiling an ad-hoc batch, SQL Server compiles each statement within it's own right.
So when compiling the query with a variable it does not go back to check any assignment, so it will compile an execution plan optimised for an unknown variable.
On first run, this execution plan will be added to the plan cache, then future executions can, and will reuse this cache for all variable values.
When you pass a constant the query is compiled based on that specific value, so can create a more optimum plan, but with the added cost of recompilation.
So to specifically answer your question:
However, I seem to recall that if you use constant values in SQL statements that SQL server won't reuse the same query execution plans, or something to that effect that causes worse performance -- but is that actually true?
Yes it is true that the same plan cannot be reused for different constant values, but that does not necessarily cause worse performance. It is possible that a more appropriate plan can be used for that particular constant (e.g. choosing bookmark lookup over index scan for sparse data), and this query plan change may outweigh the cost of recompilation. So as is almost always the case regarding SQL performance questions. The answer is it depends.
For parameters, the default behaviour is that the execution plan is compiled based on when the parameter(s) used when the procedure or function is first executed.
I have answered similar questions before in much more detail with examples, that cover a lot of the above, so rather than repeat various aspects of it I will just link the questions:
Does assigning stored procedure input parameters to local variables help optimize the query?
Ensure cold cache when running query
Why is SQL Server using index scan instead of index seek when WHERE clause contains parameterized values

There are so many things involved in your question and all has to do with statistics..
SQL compiles execution plan for even Adhoc queries and stores them in plan cache for Reuse,if they are deemed safe.
select * into test from sys.objects
select schema_id,count(*) from test
group by schema_id
--schema_id 1 has 15
--4 has 44 rows
First ask:
we are trying a different literal every time,so sql saves the plan if it deems as safe..You can see second query estimates are same as literla 4,since SQL saved the plan for 4
--lets clear cache first--not for prod
dbcc freeproccache
select * from test
where schema_id=4
output:
select * from test where
schema_id=1
output:
second ask :
Passing local variable as param,lets use same value of 4
--lets pass 4 which we know has 44 rows,estimates are 44 whem we used literals
declare #id int
set #id=4
select * from test
As you can see below screenshot,using local variables estimated less some rough 29.5 rows which has to do with statistics ..
output:
So in summary ,statistics are crucial in choosing query plan(nested loops or doing a scan or seek)
,from the examples,you can see how estimates are different for each method.further from a plan cache bloat perspective
You might also wonder ,what happens if i pass many adhoc queries,since SQL generates a new plan for same query even if there is change in space,below are the links which will help you
Further readings:
http://www.sqlskills.com/blogs/kimberly/plan-cache-adhoc-workloads-and-clearing-the-single-use-plan-cache-bloat/
http://sqlperformance.com/2012/11/t-sql-queries/ten-common-threats-to-execution-plan-quality

First, note that a local variable is not the same as a parameter.
Assuming the column is indexed or has statistics, SQL Server uses the statistics histogram to glean an estimate the qualifying row count based on the constant value supplied. The query will also be auto-parameterized and cached if it is trivial (yield the same plan regardless of values) so that subsequent executions avoid query compilation costs.
A parameterized query also generates a plan using the stats histogram with the initially supplied parameter value. The plan is cached and reused for subsequent executions regardless of whether or not it is trivial.
With a local variable, SQL Server uses the overall statistics cardinality to generate the plan because the actual value is unknown at compile time. This plan may be good for some values but suboptimal for others when the query is not trivial.

SQL Server 2014 performance - parameterized SQL vs. literals

I have a simple query like
select count(distinct key)
from table
where date between '2014-01-01' and '2014-12-31'
It's fast (about 1 second), but becomes much slower (about 4 seconds) when I try to parameterize it within sp_executesql:
exec sp_executesql
N'select count(distinct key) from table where date between #start and #end',
N'#start date, #end date',
#start = '2014-01-01', #end='2014-12-31'
Why the difference in performance?
UPDATE The plan difference appears to be because of type conversion. When I change the parameter types to N'#start datetime, #end datetime', to match the columns exactly, the discrepancy disappears, and the plans for parameters vs. constants are practically identical (same costs, etc.) (Facepalm.)
I'll accept an answer that explains why the type conversion results in such a dramatic plan difference rather than just converting the parameters up front and proceeding as usual.
END UPDATE
The plans are very similar - same index, same estimated cardinalities, same row counts, and same I/O - although the CPU cost estimates in the parameterized version are higher.
Specific discrepancies:
The same Index Seek appears in both plans, but it is parallel in the (fast) version with literals, and NOT parallel in the (slow) version with parameters. The non-parallel one has higher CPU estimate.
The version with parameters has an Nested Loop join connecting the index scan with some logic around parameters. This appears to add some CPU overhead of its own (CPU estimate = 7.6 on the nested loop).
Both versions have "Parallelism" as a child of the Hash Match; in the parameterized version it is Distribute Streams (CPU estimate = 16.5) but in the literal version it is Repartition Streams (CPU estimate = 8.3)
How can I get the parameterized version to perform similarly as the version with literals?
All I could find in my research about performance of parameterized queries has to do with estimates -- either plans cached with different parameter values, or local variables being treated as unknowns. Neither is a factor here; my estimates are correct.

Why the difference in performance?
Because you have two different queries and they have two different execution plans (even though they are similar).
Why do you have two different plans?
There is a great detailed article by Erland Sommarskog Slow in the Application, Fast in SSMS? Understanding Performance Mysteries, where he explains how query optimizer works. The key points applicable to your example are:
A constant is a constant, and when a query includes a constant, SQL Server can use the value of the constant with full trust, and even
take such shortcuts to not access a table at all, if it can infer from
constraints that no rows will be returned.
For a parameter, SQL Server does not know the run-time value, but it "sniffs" the input value when compiling the query.
For a local variable, SQL Server has no idea at all of the run-time value, and applies standard assumptions. (Which the assumptions are
depends on the operator and what can be deduced from the presence of
unique indexes.)
And there is a corollary of this: if you take out a query from a
stored procedure and replace variables and parameters with constants,
you now have quite a different query.
As shown in the article quick answer to:
How can I get the parameterized version to perform similarly as the
version with literals?
is to use OPTION(RECOMPILE) / OPTIMIZE FOR.
RECOMPILE
Instructs the SQL Server Database Engine to discard the plan generated
for the query after it executes, forcing the query optimizer to
recompile a query plan the next time the same query is executed.
Without specifying RECOMPILE, the Database Engine caches query plans
and reuses them. When compiling query plans, the RECOMPILE query hint
uses the current values of any local variables in the query and, if
the query is inside a stored procedure, the current values passed to
any parameters.
So, option RECOMPILE behaves as if query had literal values instead of parameters.
OPTIMIZE FOR
Instructs the query optimizer to use a particular value for a local
variable when the query is compiled and optimized. The value is used
only during query optimization, and not during query execution.

Faster querying with temp table creation (SQL SERVER) [duplicate]

I am re-iterating the question asked by Mongus Pong Why would using a temp table be faster than a nested query? which doesn't have an answer that works for me.
Most of us at some point find that when a nested query reaches a certain complexity it needs to broken into temp tables to keep it performant. It is absurd that this could ever be the most practical way forward and means these processes can no longer be made into a view. And often 3rd party BI apps will only play nicely with views so this is crucial.
I am convinced there must be a simple queryplan setting to make the engine just spool each subquery in turn, working from the inside out. No second guessing how it can make the subquery more selective (which it sometimes does very successfully) and no possibility of correlated subqueries. Just the stack of data the programmer intended to be returned by the self-contained code between the brackets.
It is common for me to find that simply changing from a subquery to a #table takes the time from 120 seconds to 5. Essentially the optimiser is making a major mistake somewhere. Sure, there may be very time consuming ways I could coax the optimiser to look at tables in the right order but even this offers no guarantees. I'm not asking for the ideal 2 second execute time here, just the speed that temp tabling offers me within the flexibility of a view.
I've never posted on here before but I have been writing SQL for years and have read the comments of other experienced people who've also just come to accept this problem and now I would just like the appropriate genius to step forward and say the special hint is X...

There are a few possible explanations as to why you see this behavior. Some common ones are
The subquery or CTE may be being repeatedly re-evaluated.
Materialising partial results into a #temp table may force a more optimum join order for that part of the plan by removing some possible options from the equation.
Materialising partial results into a #temp table may improve the rest of the plan by correcting poor cardinality estimates.
The most reliable method is simply to use a #temp table and materialize it yourself.
Failing that regarding point 1 see Provide a hint to force intermediate materialization of CTEs or derived tables. The use of TOP(large_number) ... ORDER BY can often encourage the result to be spooled rather than repeatedly re evaluated.
Even if that works however there are no statistics on the spool.
For points 2 and 3 you would need to analyse why you weren't getting the desired plan. Possibly rewriting the query to use sargable predicates, or updating statistics might get a better plan. Failing that you could try using query hints to get the desired plan.

I do not believe there is a query hint that instructs the engine to spool each subquery in turn.
There is the OPTION (FORCE ORDER) query hint which forces the engine to perform the JOINs in the order specified, which could potentially coax it into achieving that result in some instances. This hint will sometimes result in a more efficient plan for a complex query and the engine keeps insisting on a sub-optimal plan. Of course, the optimizer should usually be trusted to determine the best plan.
Ideally there would be a query hint that would allow you to designate a CTE or subquery as "materialized" or "anonymous temp table", but there is not.

Another option (for future readers of this article) is to use a user-defined function. Multi-statement functions (as described in How to Share Data between Stored Procedures) appear to force the SQL Server to materialize the results of your subquery. In addition, they allow you to specify primary keys and indexes on the resulting table to help the query optimizer. This function can then be used in a select statement as part of your view. For example:
CREATE FUNCTION SalesByStore (#storeid varchar(30))
RETURNS #t TABLE (title varchar(80) NOT NULL PRIMARY KEY,
qty smallint NOT NULL) AS
BEGIN
INSERT #t (title, qty)
SELECT t.title, s.qty
FROM sales s
JOIN titles t ON t.title_id = s.title_id
WHERE s.stor_id = #storeid
RETURN
END
CREATE VIEW SalesData As
SELECT * FROM SalesByStore('6380')

Having run into this problem, I found out that (in my case) SQL Server was evaluating the conditions in incorrect order, because I had an index that could be used (IDX_CreatedOn on TableFoo).
SELECT bar.*
FROM
(SELECT * FROM TableFoo WHERE Deleted = 1) foo
JOIN TableBar bar ON (bar.FooId = foo.Id)
WHERE
foo.CreatedOn > DATEADD(DAY, -7, GETUTCDATE())
I managed to work around it by forcing the subquery to use another index (i.e. one that would be used when the subquery was executed without the parent query). In my case I switched to PK, which was meaningless for the query, but allowed the conditions from the subquery to be evaluated first.
SELECT bar.*
FROM
(SELECT * FROM TableFoo WITH (INDEX([PK_Id]) WHERE Deleted = 1) foo
JOIN TableBar bar ON (bar.FooId = foo.Id)
WHERE
foo.CreatedOn > DATEADD(DAY, -7, GETUTCDATE())
Filtering by the Deleted column was really simple and filtering the few results by CreatedOn afterwards was even easier. I was able to figure it out by comparing the Actual Execution Plan of the subquery and the parent query.
A more hacky solution (and not really recommended) is to force the subquery to get executed first by limiting the results using TOP, however this could lead to weird problems in the future if the results of the subquery exceed the limit (you could always set the limit to something ridiculous). Unfortunately TOP 100 PERCENT can't be used for this purpose since SQL Server just ignores it.

Is a dynamic sql stored procedure a bad thing for lots of records?

I have a table with almost 800,000 records and I am currently using dynamic sql to generate the query on the back end. The front end is a search page which takes about 20 parameters and depending on if a parameter was chosen, it adds an " AND ..." to the base query. I'm curious as to if dynamic sql is the right way to go ( doesn't seem like it because it runs slow). I am contemplating on just creating a denormalized table with all my data. Is this a good idea or should I just build the query all together instead of building it piece by piece using the dynamic sql. Last thing, is there a way to speed up dynamic sql?

It is more likely that your indexing (or lack thereof) is causing the slowness than the dynamic SQL.
What does the execution plan look like? Is the same query slow when executed in SSMS? What about when it's in a stored procedure?
If your table is an unindexed heap, it will perform poorly as the number of records grows - this is regardless of the query, and a dynamic query can actually perform better as the table nature changes because a dynamic query is more likely to have its query plan re-evaluated when it's not in the cache. This is not normally an issue (and I would not classify it as a design advantage of dynamic queries) except in the early stages of a system when SPs have not been recompiled, but statistics and query plans are out of date, but the volume of data has just drastically changed.
Not the static one yet. I have with the dynamic query, but it does not give any optimizations. If I ran it with the static query and it gave suggestions, would applying them affect the dynamic query? – Xaisoft (41 mins ago)
Yes, the dynamic query (EXEC (#sql)) is probably not going to be analyzed unless you analyzed a workload file. – Cade Roux (33 mins ago)
When you have a search query across multiple tables that are joined, the columns with indexes need to be the search columns as well as the primary key/foreign key columns - but it depends on the cardinality of the various tables. The tuning analyzer should show this. – Cade Roux (22 mins ago)

I'd just like to point out that if you use this style of optional parameters:
AND (#EarliestDate is Null OR PublishedDate < #EarliestDate)
The query optimizer will have no idea whether the parameter is there or not when it produces the query plan. I have seen cases where the optimizer makes bad choices in these cases. A better solution is to build the sql that uses only the parameters you need. The optimizer will make the most efficient execution plan in these cases. Be sure to use parameterized queries so that they are reusable in the plan cache.

As previous answer, check your indexes and plan.
The question is whether you are using a stored procedure. It's not obvious from the way you worded it. A stored procedure creates a query plan when run, and keeps that plan until recompiled. With varying SQL, you may be stuck with a bad query plan. You could do several things:
1) Add WITH RECOMPILE to the SP definition, which will cause a new plan to be generated with every execution. This includes some overhead, which may be acceptable.
2) Use separate SP's, depending on the parameters provided. This will allow better query plan caching
3) Use client generated SQL. This will create a query plan each time. If you use parameterized queries, this may allow you to use cached query plans.

The only difference between "dynamic" and "static" SQL is the parsing/optimization phase. Once those are done, the query will run identically.
For simple queries, this parsing phase plus the network traffic turns out to be a significant percentage of the total transaction time, so it's good practice to try and reduce these times.
But for large, complicated queries, this processing is overall insignificant compared to the actual path chosen by the optimizer.
I would focus on optimizing the query itself, including perhaps denormalization if you feel that it's appropriate, though I wouldn't do that on a first go around myself.
Sometimes the denormalization can be done at "run time" in the application using cached lookup tables, for example, rather than maintaining this o the database.

Not a fan of dynamic Sql but if you are stuck with it, you should probably read this article:
http://www.sommarskog.se/dynamic_sql.html
He really goes in depth on the best ways to use dynamic SQL and the isues using it can create.
As others have said, indexing is the most likely culprit. In indexing, one thing people often forget to do is put an index on the FK fields. Since a PK creates an index automatically, many assume an FK will as well. Unfortunately creating an FK does nto create an index. So make sure that any fields you join on are indexed.
There may be better ways to create your dynamic SQL but without seeing the code it is hard to say. I would at least look to see if it is using subqueries and replace them with derived table joins instead. Also any dynamic SQl that uses a cursor is bound to be slow.

If the parameters are optional, a trick that's often used is to create a procedure like this:
CREATE PROCEDURE GetArticlesByAuthor (
#AuthorId int,
#EarliestDate datetime = Null )
AS
SELECT * --not in production code!
FROM Articles
WHERE AuthorId = #AuthorId
AND (#EarliestDate is Null OR PublishedDate < #EarliestDate)

There are some good examples of queries with optional search criteria here: How do I create a stored procedure that will optionally search columns?

As noted, if you are doing a massive query, Indexes are the first bottleneck to look at. Make sure that heavily queried columns are indexed. Also, make sure that your query checks all indexed parameters before it checks un-indexed parameters. This makes sure that the results are filtered down using indexes first and then does the slow linear search only if it has to. So if col2 is indexed but col1 is not, it should look as follows:
WHERE col2 = #col2 AND col1 = #col1
You may be tempted to go overboard with indexes as well, but keep in mind that too many indexes can cause slow writes and massive disk usage, so don't go too too crazy.
I avoid dynamic queries if I can for two reasons. One, they do not save the query plan, so the statement gets compiled each time. The other is that they are hard to manipulate, test, and troubleshoot. (They just look ugly).
I like Dave Kemp's answer above.

I've had some success (in a limited number of instances) with the following logic:
CREATE PROCEDURE GetArticlesByAuthor (
#AuthorId int,
#EarliestDate datetime = Null
) AS
SELECT SomeColumn
FROM Articles
WHERE AuthorId = #AuthorId
AND #EarliestDate is Null
UNION
SELECT SomeColumn
FROM Articles
WHERE AuthorId = #AuthorId
AND PublishedDate < #EarliestDate

If you are trying to optimize to below the 1s range, it may be important to gauge approximately how long it takes to parse and compile the dynamic sql relative to the actual query execution time:
SET STATISTICS TIME ON;
and then execute the dynamic SQL string "statically" and check the "Messages" tab. I was surprised by these results for a ~10 line dynamic sql query that returns two rows from a 1M row table:
SQL Server parse and compile time:
CPU time = 199 ms, elapsed time = 199 ms.
(2 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 4 ms.
Index optimization will doubtfully move the 199ms barrier much (except perhaps due to some analyzation/optimization included within the compile time).
However if the dynamic SQL uses parameters or is repeating than the compile results may be cached according to: See Caching Query Plans which would eliminate the compile time. Would be interesting to know how long cache entries live, size, shared between sessions, etc.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas