T-SQL query massive performance difference between using variables & constants - sql-server-2005

All,
I am seeing some really weird performance behavior when I run a query, depending on whether I use a variable whose value is set at the beginning or use that value directly as a constant in the query.
What I am seeing is that
DECLARE @ID BIGINT
SET @ID = 5
SELECT * FROM tblEmployee WHERE ID = @ID
runs much faster than when I run
SELECT * FROM tblEmployee WHERE ID = 5
This is obviously a simplified version of the actual query, but does anyone know of any known issues with the way SQL Server 2005 parses queries that would explain this behavior? My original query goes from 13 seconds to 8 minutes between the two approaches.
Thanks,
Ashish

Are you sure it's that way around?
Normally the parameterised query will be slower because SQL Server doesn't know in advance what the parameter will be. A constant can be optimised right away.
One thing to note here about datatypes, though... what does this do:
SELECT * FROM tblEmployee WHERE ID = CAST(5 as bigint)
Also, reverse the execution order. We saw something odd the other day and the plans changed when we changed order.
Another thing to try: mask the ID to remove "parameter sniffing" effects from the first query. Any difference?
DECLARE @ID BIGINT
SET @ID = 5
DECLARE @MaskedID BIGINT
SET @MaskedID = @ID
SELECT * FROM tblEmployee WHERE ID = @MaskedID
Finally, add OPTION (RECOMPILE) to each query. It means the plan is discarded and not re-used so it compiles differently.
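For example, applied to the simplified queries from the question:
DECLARE @ID BIGINT;
SET @ID = 5;
SELECT * FROM tblEmployee WHERE ID = @ID OPTION (RECOMPILE);

SELECT * FROM tblEmployee WHERE ID = 5 OPTION (RECOMPILE);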

Have you checked the query plans for each? That's always the first thing I do when I'm trying to analyze a performance issue.
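For example, you can turn on "Include Actual Execution Plan" in Management Studio and also compare timings and I/O; a quick sketch against the simplified query from the question:
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

DECLARE @ID BIGINT;
SET @ID = 5;
SELECT * FROM tblEmployee WHERE ID = @ID;

SELECT * FROM tblEmployee WHERE ID = 5;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;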

If values get cached, you could be drawing an unwarranted conclusion that one approach is faster than another. Is there always this difference?

From what I understand it's to do with cached query plans.
When you run Select * from A Where B = @C it's one query plan regardless of the value of @C, so if you run it 10 times with different values for @C, it's still a single query plan.
When you run:
Select * from A Where B = 1 it creates a query plan
Select * from A Where B = 2 creates another
Select * from A Where B = 3 creates another
etc.
All this does is eat up memory.
Google "query plan caching" and literals and I'm sure you'll turn up detailed explanations.
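If you want to see this for yourself, the plan cache DMVs (available from SQL Server 2005 onwards) show one row per cached plan; something along these lines:
SELECT st.text, cp.usecounts, cp.cacheobjtype, cp.objtype
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.text LIKE '%tblEmployee%';  -- filter on whatever table your queries touch
Running the literal version with several different values should leave several entries behind, while the variable version keeps reusing a single plan with a climbing usecounts value.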

Related

Stored procedure using too many selects?

I recently started doing some performance tuning on a client's stored procedures and I bumped into this chunk of code and couldn't find a way to make it work more efficiently.
declare @StationListCount int;
select @StationListCount = count(*) from #StationList;
declare @FleetsCnt int;
select @FleetsCnt = COUNT(*) from #FleetIds;
declare @StationCnt int;
select @StationCnt = COUNT(*) from #StationIds;
declare @VehiclesCnt int;
select @VehiclesCnt = COUNT(*) from #VehicleIds;
declare @TrIds table(VehicleId bigint, TrId bigint, InRange bit);
insert into @TrIds(VehicleId, TrId, InRange)
select t.VehicleID, t.FuelTransactionId, 1
from dbo.FuelTransaction t
join dbo.Fleet f on f.FleetID = t.FleetID and f.CompanyID = @ActorCompanyID
where t.TransactionTime >= @From and (@To is null or t.TransactionTime < @To)
and (@StationListCount = 0 or exists (select ID from #StationList where t.FuelStationID = ID))
and (@FleetsCnt = 0 or exists (select ID from #FleetIds where ID = t.FleetID))
and (@StationCnt = 0 or exists (select ID from #StationIds where ID = t.FuelStationID))
and (@VehiclesCnt = 0 or exists (select ID from #VehicleIds where ID = t.VehicleID))
and t.VehicleID is not null
The insert command slows down the whole procedure and takes 99% of the resources.
I am not sure, but I think the nested loops in the execution plan come from the queries inside the WHERE clause.
I would very much appreciate any help I can get on this.
Thank you!
There are a couple of things that you should go over and compare for performance differences. First of all, as the previous answer suggests, you should avoid the COUNT(*)-style aggregates as much as possible. If the tables are large, the cost of these aggregates grows with table size. You could even consider storing those counts in a separate table with proper index constraints.
I also suggest splitting the SELECT statement into multiple statements, because when you combine so many NULL checks and OR/AND conditions, your indexes may be bypassed and the query cost increases a lot. Sometimes using UNIONs can provide far better performance than such conditions; a sketch of the idea follows.
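For example, handling just one of the optional filters (a hedged, untested sketch; the other filters would be split the same way):
if @FleetsCnt = 0
begin
    insert into @TrIds(VehicleId, TrId, InRange)
    select t.VehicleID, t.FuelTransactionId, 1
    from dbo.FuelTransaction t
    join dbo.Fleet f on f.FleetID = t.FleetID and f.CompanyID = @ActorCompanyID
    where t.TransactionTime >= @From and (@To is null or t.TransactionTime < @To)
    and t.VehicleID is not null
end
else
begin
    insert into @TrIds(VehicleId, TrId, InRange)
    select t.VehicleID, t.FuelTransactionId, 1
    from dbo.FuelTransaction t
    join dbo.Fleet f on f.FleetID = t.FleetID and f.CompanyID = @ActorCompanyID
    join #FleetIds fi on fi.ID = t.FleetID   -- the optional filter becomes a plain join
    where t.TransactionTime >= @From and (@To is null or t.TransactionTime < @To)
    and t.VehicleID is not null
end
The point is that each branch has a simpler WHERE clause, so the optimizer can pick an index-friendly plan for it.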
You should really try all of these and see what fits your needs best.
Hope it helps.
The insert only uses one table for the vehicle ID, so joining the other tables isn't required.
I don't see the declarations of the ID tables (#StationList, #FleetIds, #StationIds, #VehicleIds), but (assuming the IDs in them are unique) consider communicating this information to the optimizer; in other words, add primary key constraints to them.
Also, add OPTION (RECOMPILE) to the end of the query.
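A sketch of what that could look like for one of them (assuming the IDs really are unique bigints):
-- if the ID lists are table variables:
declare @FleetIds table (ID bigint not null primary key);

-- or, if they are temp tables, create them with the constraint:
create table #FleetIds (ID bigint not null primary key);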

SQL Server which query runs faster

UPDATE User
SET Name = (SELECT NameSpace.NameId
FROM NameSpace
WHERE NameSpace.Name = 'BlaBlaBla')
WHERE UserId = 1453
Is this faster, or this:
int Value = Select NameSpace.NameId from NameSpace
where NameSpace.Name = 'BlaBlaBla';
UPDATE User
SET Name = "+Value +"
WHERE UserId = 1453
and
Select
UserName,
UserAge,
(Select * from AdressesTable where Adresses.AdresID=User.AdresID)
from
UserTable
where
UserId='123'
OR
Select *
from AdressesTable, UserTable
where Adresses.AdresID = User.AdresID AND UserID = '123'
There are a variety of assumptions to be made in determining which is faster.
First, if you are concerned about speed, then you want indexes on users(userid) and namespace(name).
Second, the assignment query should look like this in SQL Server:
declare @Value int;
select @Value = NameSpace.NameId
from NameSpace
where NameSpace.Name = 'BlaBlaBla';
Your variable declarations and subqueries are not correct for SQL Server.
Finally, even with everything set up correctly, it is not possible to say which is faster. If I assume that there is only one matching record for UserId, then the single update is probably faster -- although perhaps by so little that it is not noticeable. It may not be faster. The update may cause some sort of lock to be taken on NameSpace that would not otherwise be taken. I would actually expect the two to be quite comparable in speed.
However, if many users have the same userid (which is unlikely given the name of the column), then you are doing updates on multiple rows. Storing the calculated result once and using that is probably better than running the subquery multiple times. Even so, with the right indexes, I would expect the difference in performance to be negligible.
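Putting that together, the two-statement version written as valid T-SQL would look something like this (a sketch reusing the question's table and column names; the brackets around User are needed because USER is a reserved word):
declare @Value int;

select @Value = NameSpace.NameId
from NameSpace
where NameSpace.Name = 'BlaBlaBla';

update [User]
set Name = @Value
where UserId = 1453;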

Select top 4 with order by, but only if actually required?

I have part of a stored proc that is called thousands and thousands of times and as a result takes up the bulk of the whole thing. Having run it through the execution plan, it looks like the TOP 4 and ORDER BY part is taking up a lot of that. The ORDER BY uses a function that, although streamlined, still gets called a fair bit.
This is an odd situation in that for 99.5% of the data there will be 4 or fewer results returned anyway; it's only for the 0.5% of cases that we need the TOP 4. This is a requirement of the data algorithm, so eliminating the TOP 4 entirely is not an option.
So let's say my syntax is
SELECT SomeField * SomeOtherField as MainField, SomeOtherField
FROM
(
SELECT TOP 4
SomeField, 1/dbo.[Myfunction](Param1, Param2, 34892) as SomeOtherField
FROM #MytempTable
WHERE
Param1 > @NextMargin1 AND Param1 < @NextMargin1End
AND Param2 > @NextMargin2 AND Param2 < @NextMargin2End
ORDER BY dbo.[MyFunction](Param1, Param2, 34892)
) d
Is there a way I can tell SQL Server to do the ORDER BY if and only if there are more than 4 results returned after the WHERE takes place? I don't need the order otherwise. Perhaps a table variable and a count of the table in an IF?
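Something along these lines, perhaps (an untested sketch of the count-then-branch idea):
DECLARE @matches int;

SELECT @matches = COUNT(*)
FROM #MytempTable
WHERE Param1 > @NextMargin1 AND Param1 < @NextMargin1End
  AND Param2 > @NextMargin2 AND Param2 < @NextMargin2End;

IF @matches <= 4
BEGIN
    -- 4 or fewer rows qualify, so TOP 4 / ORDER BY adds nothing
    SELECT SomeField * SomeOtherField AS MainField, SomeOtherField
    FROM (
        SELECT SomeField,
               1/dbo.[MyFunction](Param1, Param2, 34892) AS SomeOtherField
        FROM #MytempTable
        WHERE Param1 > @NextMargin1 AND Param1 < @NextMargin1End
          AND Param2 > @NextMargin2 AND Param2 < @NextMargin2End
    ) d
END
ELSE
BEGIN
    -- more than 4 rows: fall back to the original TOP 4 ... ORDER BY query
    SELECT SomeField * SomeOtherField AS MainField, SomeOtherField
    FROM (
        SELECT TOP 4 SomeField,
               1/dbo.[MyFunction](Param1, Param2, 34892) AS SomeOtherField
        FROM #MytempTable
        WHERE Param1 > @NextMargin1 AND Param1 < @NextMargin1End
          AND Param2 > @NextMargin2 AND Param2 < @NextMargin2End
        ORDER BY dbo.[MyFunction](Param1, Param2, 34892)
    ) d
END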
--- Update based on David's answer, to try to work out why it was slower:
I did a check and can confirm that 96.5% of the time there are 4 or fewer results, so it's not a case of more data than expected.
[Execution plan screenshots were attached here: the insert into #FunctionResults, the breakdown of the insert and spool, and the TOP 4 / ORDER BY selection.]
Please let me know if any further information or breakdowns are required. The size of #MytempTable is typically around 28,000 rows, and it has this index:
CREATE INDEX MyIndex on #MyTempTable (Param1, Param2) INCLUDE ([SomeField])
This answer has been updated based on continued feedback from the question asker. The original suggestion was to attempt to use a table variable to store pre-calculations and select the top 4 from the results. However, in practice it appears that the optimizer was over-estimating the number of rows and choosing a bad execution plan.
In addition to the previous recommendations, I would also recommend updating statistics periodically after any change to this process to provide the query optimizer with updated information to make more informed decisions.
As this is a performance tuning process without direct access to the source environment, this answer is expected to change based on user feedback. Per the recommendation of @SteveFord above, the sample query below reflects the use of a CROSS APPLY to attempt to avoid multiple unnecessary function calls.
SELECT TOP 4
M.SomeField,
M.SomeField * 1/F.FunctionResults [SomeOtherField]
FROM #MytempTable M
CROSS APPLY (SELECT dbo.Myfunction(M.Param1, M.Param2, 34892)) F(FunctionResults)
ORDER BY F.FunctionResults

SQL Server 2008: Optimize Query Performance with known empty result

I have an application in which a user has a mask that runs SQL statements against a SQL Server 2008 database. In addition, the user can set parameters in the mask. Consider a mask with one parameter, which is a dropdown with 2 selections: "Planes" and "Cars".
When the user selects "Cars" and hits the "Execute" button, the following SQL statement, which I configured in the mask beforehand, hits the database.
SELECT cars.id, cars.name
FROM cars
WHERE 'Cars' = 'Cars'
UNION ALL
SELECT planes.id, planes.name
FROM planes
WHERE 'Planes' = 'Cars'
(This is a rather contrived example, as the queries in my application are far more complex, with lots of JOINs and so on...)
Even if I take the second part and paste it into SQL Server Management Studio, set some parameters, and hit execute, the query takes several seconds to complete... with an empty result.
My question now: How can I optimize the second part, so the SQL Server recognizes that in the second SELECT statement, there really is nothing to do?
EDIT:
The reason why my second ("dead") query executes for some time is the following: Inside the query there are JOINS, along with a Sub-SELECT in the WHERE clause. Let's say
SELECT planes.id, planes.name
FROM planes
INNER JOIN very_complex_colour_view colours
ON colours.id = planes.colour_id
WHERE 'Planes' = 'Cars'
In fact even the "planes" table is a complex view itself.
Depending on the parameter, you are selecting records from one table or the other, so there is no need to use UNION ALL.
Use an IF ... ELSE construct instead:
DECLARE @Input VARCHAR(20) = 'Cars'
IF (@Input = 'Cars')
BEGIN
SELECT cars.id, cars.name
FROM cars
END
ELSE IF (@Input = 'Planes')
BEGIN
SELECT planes.id, planes.name
FROM planes
END
This would also help the SQL Server optimizer use parameter sniffing and pick the best execution plan, which would improve your query performance.
More on Parameter Sniffing -
Parameter Sniffing
Parameter Sniffing (or Spoofing) in SQL Server
When I run the following query on my system:
select *
from <really big table that is not in the cache>
where 'planes' = 'cars'
The results came back in about 1 second the first time and immediately on subsequent runs.
I suspect that your actual query is more complicated than this example. And, the simplification is eliminating the problem. Even when I try the above with a join, there is basically no performance hit.
Based on the execution plans I'm seeing, SQL Server recognizes that the constant is always false. The following query:
select *
from Published_prev2..table1 sv join
Published_prev2..table2 sl
on sv.id= sl.id
where 'planes' = 'cars'
Produces a constant scan for the execution plan. The same query with 'cars' = 'cars' produces a more complex plan that has joins and so on.
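You can check the same thing against the dead branch from the question by looking at the estimated plan, for example:
SET SHOWPLAN_TEXT ON;
GO
SELECT planes.id, planes.name
FROM planes
WHERE 'Planes' = 'Cars';
GO
SET SHOWPLAN_TEXT OFF;
GO
If the optimizer folds the contradiction away, the output shows a Constant Scan instead of any access to planes or the views behind it.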

T-SQL query performance puzzle: Why does using a variable make a difference?

I'm trying to optimize a complex SQL query and getting wildly different results when I make seemingly inconsequential changes.
For example, this takes 336 ms to run:
Declare @InstanceID int set @InstanceID=1;
With myResults as (
Select
Row = Row_Number() Over (Order by sv.LastFirst),
ContactID
From DirectoryContactsByContact(1) sv
Join ContainsTable(_s_Contacts, SearchText, 'john') fulltext on (fulltext.[Key]=ContactID)
Where IsNull(sv.InstanceID,1) = @InstanceID
and len(sv.LastFirst)>1
) Select * From myResults Where Row between 1 and 20;
If I replace the @InstanceID with a hard-coded number, it takes over 13 seconds (13890 ms) to run:
Declare @InstanceID int set @InstanceID=1;
With myResults as (
Select
Row = Row_Number() Over (Order by sv.LastFirst),
ContactID
From DirectoryContactsByContact(1) sv
Join ContainsTable(_s_Contacts, SearchText, 'john') fulltext on (fulltext.[Key]=ContactID)
Where IsNull(sv.InstanceID,1) = 1
and len(sv.LastFirst)>1
) Select * From myResults Where Row between 1 and 20;
In other cases I get the exact opposite effect: For example, using a variable @s instead of the literal 'john' makes the query run more slowly by an order of magnitude.
Can someone help me tie this together? When does a variable make things faster, and when does it make things slower?
The cause might be that IsNull(sv.InstanceID,1) = @InstanceID is very selective for some values of @InstanceID, but not very selective for others. For example, there could be millions of rows with InstanceID = null, so for @InstanceID = 1 a scan might be quicker.
But if you explicitly provide the value of @InstanceID, SQL Server knows based on the table statistics whether it's selective or not.
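To see how skewed the column really is, you could count rows per value on the underlying table (the table name below is a placeholder, since the question only exposes the DirectoryContactsByContact function):
SELECT InstanceID, COUNT(*) AS RowsPerValue
FROM dbo.Contacts   -- placeholder: whichever base table the function reads InstanceID from
GROUP BY InstanceID
ORDER BY RowsPerValue DESC;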
First, make sure your statistics are up to date:
UPDATE STATISTICS table_or_indexed_view_name
Then, if the problem still occurs, compare the query execution plan for both methods. You can then enforce the fastest method using query hints.
With hard-coded values, the optimizer knows what to base the execution plan on when it builds it.
When you use variables, it tries to "guess" the value, and in many cases the guess is not the best one.
You can help it pick a value for optimization in two ways:
"I know better" - this forces it to use the value you provide:
OPTION (OPTIMIZE FOR (@InstanceID=1))
"See what I do", this will instruct it to sniff the values you pass and use average (or most popular for some data types) value of those supplied over time.
OPTION (OPTIMIZE FOR UNKNOWN)
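Applied to the query from the question, either hint goes at the end of the statement that uses the variable, for example:
Declare @InstanceID int set @InstanceID=1;
With myResults as (
Select
Row = Row_Number() Over (Order by sv.LastFirst),
ContactID
From DirectoryContactsByContact(1) sv
Join ContainsTable(_s_Contacts, SearchText, 'john') fulltext on (fulltext.[Key]=ContactID)
Where IsNull(sv.InstanceID,1) = @InstanceID
and len(sv.LastFirst)>1
) Select * From myResults Where Row between 1 and 20
Option (Optimize For (@InstanceID = 1));
Note that OPTIMIZE FOR UNKNOWN requires SQL Server 2008 or later; OPTIMIZE FOR a specific value also works on 2005.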