I have an application in which a user has a mask (an input form) that runs SQL statements against a SQL Server 2008 database. In addition, the user can set parameters in the mask. Consider a mask with one parameter, a dropdown with two options: "Planes" and "Cars".
When the user selects "Cars" and hits the "Execute" button, the following SQL statement, which I configured in the mask beforehand, hits the database.
SELECT cars.id, cars.name
FROM cars
WHERE 'Cars' = 'Cars'
UNION ALL
SELECT planes.id, planes.name
FROM planes
WHERE 'Planes' = 'Cars'
(This is a made-up example; the queries in my application are far more complex, with lots of JOINs and so on...)
Even if I take the second part and paste it into SQL Server Management Studio, set some parameters and hit execute, the query takes several seconds to complete... with an empty result.
My question now: How can I optimize the second part so that SQL Server recognizes that in the second SELECT statement there really is nothing to do?
EDIT:
The reason why my second ("dead") query executes for some time is the following: inside the query there are JOINs, along with a sub-SELECT in the WHERE clause. Let's say:
SELECT planes.id, planes.name
FROM planes
INNER JOIN very_complex_colour_view colours
ON colours.id = planes.colour_id
WHERE 'Planes' = 'Cars'
In fact even the "planes" table is a complex view itself.
Depending on the parameter, you are selecting records from the respective table, so there is no need to use UNION ALL.
Use an IF ELSE construct instead -
DECLARE @Input VARCHAR(20) = 'Cars'
IF (@Input = 'Cars')
BEGIN
SELECT cars.id, cars.name
FROM cars
END
ELSE IF (@Input = 'Planes')
BEGIN
SELECT planes.id, planes.name
FROM planes
END
This also lets the SQL Server optimizer apply parameter sniffing and choose the best execution plan for each branch, which improves query performance.
More on Parameter Sniffing -
Parameter Sniffing
Parameter Sniffing (or Spoofing) in SQL Server
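If the mask builds the statement as a string, one way to keep that benefit is to run it as a parameterized batch. This is only a sketch, under the assumption that the statement is executed with sp_executesql and that the dropdown value is passed in as @Input (the names are illustrative, not from the original post):
DECLARE @Input VARCHAR(20) = 'Cars'  -- value chosen in the mask

EXEC sp_executesql
    N'IF (@Input = ''Cars'')
          SELECT cars.id, cars.name FROM cars
      ELSE IF (@Input = ''Planes'')
          SELECT planes.id, planes.name FROM planes',
    N'@Input VARCHAR(20)',
    @Input = @Input
This way each SELECT keeps its own plan, and only the matching branch is executed.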
When I run the following query on my system:
select *
from <really big table that is not in the cache>
where 'planes' = 'cars'
The results came back in about 1 second the first time and immediately on subsequent runs.
I suspect that your actual query is more complicated than this example, and that the simplification is eliminating the problem. Even when I try the above with a join, there is basically no performance hit.
Based on the execution plans I'm seeing, SQL Server recognizes that the constant is always false. The following query:
select *
from Published_prev2..table1 sv join
Published_prev2..table2 sl
on sv.id= sl.id
where 'planes' = 'cars'
Produces a Constant Scan in the execution plan. The same query with 'cars' = 'cars' produces a more complex plan that has joins and so on.
Related
INSERT INTO #TEMP(ID,CID,STS,ETL_NBR,T_ID)
SELECT STG.ID,STG.CID,STG.STS,STG.ETL_NBR,STG.T_ID
FROM DBO.A_STAGE STG(NOLOCK)
INNER JOIN DBO.A_PRE PRE(NOLOCK)
ON PRE.ID=STG.ID AND PRE.CID=STG.CID
WHERE PRE.STS = 'D'
AND STG.ETL_NBR < PRE.ETL_NBR
The above query is constructed from dynamic SQL inside a stored procedure; the tables involved in the joins are actually passed in through variables. This query hangs even for small volumes of data.
1) If I run just the SELECT with the above conditions, the query still returns 0 records. As an INSERT it hangs for hours.
2) There is no blocking on this query.
Note: Since this is a dynamic query, it runs smoothly when other tables are passed in through the variables. The issue occurs with this specific table only. I updated statistics and rebuilt the indexes on that table, to no avail.
This can be classic parameter sniffing; you can read more here: what-is-parameter-sniffing
There's a way to improve performance with such queries: use OPTION (RECOMPILE) in your query.
You need to include it at the end of the query.
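As a sketch, applied to the INSERT from the question (assuming the statement is built exactly as shown above, with the table names substituted in by the dynamic SQL), it would look like this:
INSERT INTO #TEMP (ID, CID, STS, ETL_NBR, T_ID)
SELECT STG.ID, STG.CID, STG.STS, STG.ETL_NBR, STG.T_ID
FROM DBO.A_STAGE STG (NOLOCK)
INNER JOIN DBO.A_PRE PRE (NOLOCK)
    ON PRE.ID = STG.ID AND PRE.CID = STG.CID
WHERE PRE.STS = 'D'
  AND STG.ETL_NBR < PRE.ETL_NBR
OPTION (RECOMPILE)  -- compile a fresh plan for the current tables and values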
You can also use OPTIMIZE FOR UNKNOWN for each variable, like this:
OPTION (OPTIMIZE FOR (@variable1 UNKNOWN, @variable2 UNKNOWN, ...))
You can read more here: improving-query-performance-with-option-recompile-constant-folding-and-avoiding-parameter-sniffing-issues/
PRE.CID = STG.CID
This particular join column contains NULLs and blanks, and it is not a unique or primary key column; hence the slowness. I excluded those rows from the join by adding PRE.CID IS NOT NULL.
The query now runs fast.
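For illustration, the extra predicate sits directly in the join; the rest of the statement is assumed to be the one from the question, and a similar filter on blank values could be added the same way:
INNER JOIN DBO.A_PRE PRE (NOLOCK)
    ON PRE.ID = STG.ID
   AND PRE.CID = STG.CID
   AND PRE.CID IS NOT NULL  -- skip NULLs on the non-unique join column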
I have a relatively simple query
SELECT
  db1.something
  , COALESCE(db2.something_else, 'NA') AS something2
FROM dwh.db_1 AS db1
LEFT JOIN dwh.db_2 AS db2 ON db1.some_id = db2.some_id
EXPLAIN gives an estimated time of something more than 15 seconds.
On the other hand, EXPLAIN on the following, where we basically replaced the alias with the table name:
SELECT
  db1.something
  , COALESCE(db_2.something_else, 'NA') AS something2
FROM dwh.db_1 AS db1
LEFT JOIN dwh.db_2 AS db2 ON db1.some_id = db2.some_id
gives an estimated time of over 4 hours, where it seems like the system is trying to execute a product join on some spool (I can't really follow the sequence of planning steps).
I always thought that aliases are just aliases and have no impact on performance.
The estimated time is probably correct :-)
A table alias is not really an alias; it replaces the table name within that query. In Teradata, using the original table name doesn't result in an error message (as it does in most other DBMSes), but it causes a CROSS join.
Why? Well, Teradata was implemented before there was Standard SQL; the initial query language was called TEQUEL (TEradata QUEry Language), whose syntax didn't require listing tables in a FROM clause. A simple RETRIEVE TableName.ColumnName carried enough information for the parser/optimizer to resolve the table and column names. There's no flag to switch it off; some client tools refuse to submit it, but you can still submit RETRIEVE in BTEQ.
In the example above you're mixing old TEQUEL and SQL: there are three tables for the optimizer but only one join condition, and this results in a CROSS join to the third table.
At least it's easy to spot in EXPLAIN. The optimizer will do this stupid join as the last step, so scroll to the end and you will see "joined using a product join, with a join condition of ( (1=1) )".
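The fix is simply to reference the alias consistently. A sketch of the corrected second query, reusing the names from the question:
SELECT
  db1.something
  , COALESCE(db2.something_else, 'NA') AS something2  -- alias db2, not the table name db_2
FROM dwh.db_1 AS db1
LEFT JOIN dwh.db_2 AS db2 ON db1.some_id = db2.some_id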
I have a stored procedure where I use a Common Table Expression to build a hierarchical path up a menu (so it can display something like Parent Menu -> Sub Menu -> Sub Sub Menu -> ...)
It works great for what I want to use it for; the issue comes when I join the information I get from the recursive CTE to the information I really want. I do an INNER JOIN from my data to the CTE and get the hierarchical path out. For something that returns ~300 rows, the stored procedure takes 15-20 seconds on average.
When I insert the results from the CTE into a Temp Table and do the join based on that, the procedure takes less than a second.
I was just wondering why it takes so long to join using only the CTE, or if I am misusing CTEs in some way.
Edit: this is the stored procedure, essentially
With Hierarchical_Path (Menu_ID, Parent_ID, Path)
As
(
Select
EM.Menu_Id, Parent_ID,
Convert(varchar(max),
EM.Description) as Path
From
Menu EM
Where
--EM.Topic_No is null
EM.Parent_ID = 0 and EM.Disabled = 0
Union All
Select
EM.Menu_ID,
EM.Parent_ID,
Convert(Varchar(max),E.Path + ' -> ' + EM.Description) as Path
From
Menu EM
Inner Join
Hierarchical_Path E
On
EM.Parent_ID = E.Menu_ID
)
SELECT distinct
EM.Description
,EMS.Path
FROM
dbo.Menu em
INNER JOIN
Hierarchical_Path EMS
ON
EMS.Menu_ID = em.Menu_Id
-- 2 more INNER JOINs
-- 2 LEFT JOINs
-- WHERE clause
When I run the query like this (joining onto the CTE) the performance is around 20 seconds.
When I insert the CTE results into a temp table, and join onto that, the performance is instantaneous.
Taking apart my query a bit more, it seems like it gets hung up on the WHERE clause. I guess my question is more to the point of when exactly a CTE runs and whether it gets stored in memory. I was running under the assumption that it gets evaluated once and then sticks around in memory, but under some circumstances could it be evaluated multiple times?
The difference is that a CTE is not persisted while a temporary table is (at least for the session). Joining on a non-persisted column means SQL Server has no statistics on the data at all, compared with the same column in a temporary table, which has already been evaluated. Basically, the temp table caches what you would use, and SQL Server can optimize for it better. You run into the same issues when joining on the result of a function or on a table variable.
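A minimal sketch of the temp-table variant, reusing the CTE from the question (the extra joins and the WHERE clause are left out here):
With Hierarchical_Path (Menu_ID, Parent_ID, Path)
As
(
    -- same anchor and recursive members as in the question
    Select EM.Menu_Id, EM.Parent_ID, Convert(varchar(max), EM.Description) as Path
    From Menu EM
    Where EM.Parent_ID = 0 and EM.Disabled = 0
    Union All
    Select EM.Menu_ID, EM.Parent_ID, Convert(varchar(max), E.Path + ' -> ' + EM.Description)
    From Menu EM
    Inner Join Hierarchical_Path E On EM.Parent_ID = E.Menu_ID
)
Select Menu_ID, Parent_ID, Path
Into #Hierarchical_Path   -- materialized once; the temp table gets statistics
From Hierarchical_Path

SELECT distinct
    EM.Description
    ,EMS.Path
FROM dbo.Menu em
INNER JOIN #Hierarchical_Path EMS ON EMS.Menu_ID = em.Menu_Id
-- remaining joins and WHERE clause as in the original procedure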
My guess is that your CTE execution plan is running with a single thread while your temp table version can use multiple threads. You can check this by including the actual execution plan when you run the queries and looking for two horizontal arrows pointing in opposite directions on an operator; that indicates parallelism.
P.S. - Try setting "set statistics io on" and "set statistics time on" to see if the actual cost of running the queries is the same regardless of run duration.
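For reference, these are session settings you turn on before running each variant, and then you compare the Messages output:
SET STATISTICS IO ON;    -- logical and physical reads per table
SET STATISTICS TIME ON;  -- CPU time and elapsed time per statement
-- run the CTE version and the temp-table version and compare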
I have the following two SQL statements
First one:
IF(@User_Id IS NULL)
BEGIN
SELECT *
FROM [UserTable]
END
ELSE
BEGIN
SELECT *
FROM [UserTable] AS u
WHERE u.[Id] = @User_Id
END
Second one:
SELECT *
FROM [UserTable] AS u
WHERE (@User_Id IS NULL OR u.[Id] = @User_Id)
Each of those queries would be wrapped in its own stored procedure. I suspect that the IF statement is causing a lot of recompilations in SQL Server. I am faced with either separating each branch of the IF statement into its own stored procedure, or replacing the entire IF statement with a WHERE clause (illustrated above in the second SQL statement).
My question is: What is the difference between the two statements from a performance perspective, and how would SQL treat each statement?
Thanks.
Both solutions will generate an identical number of compilations.
With the first solution the query optimizer is free to come up with the best plan for each of the two different queries. For the first query (on the NULL branch of the IF) there is not much that can be optimized, but the second one (on the NOT NULL branch of the IF) can be optimized if an index on the Id column exists.
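For illustration, such an index might look like the following; the name is made up, and if [Id] is already the clustered primary key nothing extra is needed:
CREATE NONCLUSTERED INDEX IX_UserTable_Id ON [UserTable] ([Id]);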
But the second solution is an optimization disaster. No matter the value of the @User_Id parameter, the optimizer has to come up with a plan that works for any value of the parameter. As such, no matter the value of @User_Id, the plan will always use the suboptimal table scan. There is just no way around this issue, and this is not parameter sniffing as some might think. It is simply plan correctness: even if the value at plan generation time is NOT NULL, the plan has to work when the parameter is NULL, so it cannot use the index on Id.
Always, always, always, use the first form with the explicit IF.
All,
I am seeing some really weird performance behavior when I run a query, between using a variable whose value is set at the beginning and using the value as a constant in the query.
What I am seeing is that
DECLARE @ID BIGINT
SET @ID = 5
SELECT * FROM tblEmployee WHERE ID = @ID
runs much faster than when I run
SELECT * FROM tblEmployee WHERE ID = 5
This is obviously a simpler version of the actual query, but does anyone know of issues with the way SQL Server 2005 parses queries that would explain this behavior? My original query goes from 13 seconds to 8 minutes between the two approaches.
Thanks,
Ashish
Are you sure it's that way around?
Normally the parameterised query will be slower because SQL Server doesn't know in advance what the parameter will be. A constant can be optimised right away.
One thing to note here about datatypes, though... what does this do:
SELECT * FROM tblEmployee WHERE ID = CAST(5 as bigint)
Also, reverse the execution order. We saw something odd the other day and the plans changed when we changed order.
Another way: mask the ID to remove "parameter sniffing" effects from the first query. Any difference?
DECLARE @ID BIGINT
SET @ID = 5
DECLARE @MaskedID BIGINT
SET @MaskedID = @ID
SELECT * FROM tblEmployee WHERE ID = @MaskedID
Finally, add OPTION (RECOMPILE) to each query. It means the plan is discarded and not reused, so each one compiles fresh.
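For example, on the same hypothetical table as above:
SELECT * FROM tblEmployee WHERE ID = @ID OPTION (RECOMPILE)
SELECT * FROM tblEmployee WHERE ID = 5 OPTION (RECOMPILE)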
Have you checked the query plans for each? That's always the first thing I do when I'm trying to analyze a performance issue.
If values get cached, you could be drawing an unwarranted conclusion that one approach is faster than another. Is there always this difference?
From what I understand it's to do with cached query plans.
When you run Select * from A Where B = @C it's one query plan regardless of the value of @C. So if you run it 10 times with different values for @C, it's a single query plan.
When you run:
Select * from A Where B = 1 creates a query plan
Select * from A Where B = 2 creates another
Select * from A Where B = 3 creates another
etc.
All this does is eat up memory.
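A small sketch of the difference (the table and column names are hypothetical): the parameterized form reuses one cached plan, while each distinct literal gets its own ad-hoc plan:
-- one cached plan, reused for every value of @C
EXEC sp_executesql N'SELECT * FROM A WHERE B = @C', N'@C INT', @C = 1;
EXEC sp_executesql N'SELECT * FROM A WHERE B = @C', N'@C INT', @C = 2;

-- a separate ad-hoc plan per literal
SELECT * FROM A WHERE B = 1;
SELECT * FROM A WHERE B = 2;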
Google "query plan caching" and "literals" and I'm sure you'll turn up detailed explanations.