Do parentheses in a SQL Server View impact performance? - sql

When I added a set of parentheses around an OR condition in a SQL Server view query, performance suffered dramatically.
I ran across a view that had a join with a complicated set of and / or clauses in it. I thought the statement would be easier to read and maintain with the addition of parentheses. After I added the parentheses, though, the view performance tanked.
I ran this statement: SELECT COUNT(*) FROM [ViewName]. The result was 630,644 and the query took 14 seconds to process.
Then I added a set of parentheses to the view definition and ran the same COUNT(*) statement. I stopped the query after 75 seconds; it had still not completed.
I removed the parentheses from the view and ran COUNT(*) again. The result took 15 seconds.
I added the parentheses again. The COUNT(*) query ran for 90 seconds before I stopped it.
This leads me to believe that the parentheses were definitely impacting the performance of the view. But everything I have read says that parentheses in SQL don't impact performance.
What am I missing?

"This leads me to believe that the parentheses were definitely impacting the performance of the view. But everything I have read says that parentheses in SQL don't impact performance."
It depends on where the parentheses are applied. Sometimes they have no effect on the logic, and sometimes they do. In Example 1 below, the placement changes the logic, and with it the query processing and ultimately the performance; in Example 2, adding parentheses changes nothing. (A small sketch illustrating Example 1 follows Example 2.)
Example 1
ColumnA = 0 and ColumnB = 1 or ColumnB = 2
ColumnA = 0 and (ColumnB = 1 or ColumnB = 2)
ColumnA = 0 and ColumnB = 1 or (ColumnB = 2)
Example 2
ColumnA = 0 and ColumnB = 1
ColumnA = 0 and (ColumnB = 1)
(ColumnA = 0) and ColumnB = 1
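The difference in Example 1 comes from operator precedence: in T-SQL, AND is evaluated before OR, so the unparenthesized form means (ColumnA = 0 and ColumnB = 1) or ColumnB = 2. Parentheses around the OR therefore change which rows qualify, and the optimizer is solving a different problem. A minimal sketch with a hypothetical table (names made up purely for illustration) showing the two forms returning different rows:

-- Hypothetical table for illustration only
CREATE TABLE #Precedence (ColumnA int, ColumnB int);
INSERT INTO #Precedence (ColumnA, ColumnB) VALUES (0, 1), (9, 2);

-- Without parentheses: (ColumnA = 0 AND ColumnB = 1) OR ColumnB = 2 -> returns both rows
SELECT * FROM #Precedence WHERE ColumnA = 0 AND ColumnB = 1 OR ColumnB = 2;

-- With parentheses around the OR -> returns only the (0, 1) row
SELECT * FROM #Precedence WHERE ColumnA = 0 AND (ColumnB = 1 OR ColumnB = 2);

DROP TABLE #Precedence;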
"with a complicated set of and / or clauses in it"
Furthermore, I would suggest looking at the execution plan for a clearer understanding. You can enable it in SSMS (shortcut Ctrl+M, "Include Actual Execution Plan") before executing the query, and compare the two executions on points such as the following (a statistics-based alternative is sketched after this list):
How many operations the query had to perform to get the result
What types of operations the query had to perform
The estimated versus the actual number of rows in each operation
Whether index usage changed, whether any new indexes are suggested, etc.
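If you also want numbers to compare between runs, the same COUNT(*) test from the question can be repeated with statistics output turned on; a minimal sketch (run once against each version of the view and compare reads and elapsed time):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT COUNT(*) FROM [ViewName];  -- compare logical reads and CPU/elapsed time per view definition

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;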

Related

SQL Query Performance with case statement

I have a simple SELECT that is running very slowly, and I have narrowed it down to one particular WHERE condition.
I am not sure if you need to see the whole query, or whether you will be able to help me understand why the CASE is affecting performance so much. I feel like I have found the problem, but I can't seem to resolve it. I've worked with CASE expressions before and have never run into such huge performance issues.
For this particular example, the declaration is as follows: DECLARE @lastInvOnly int = 0
The problematic WHERE condition follows and runs for about 20 seconds:
AND ird.inventorydate = CASE WHEN @lastinvonly = 0 THEN
    -- get the last reported inventory in respect to the specified parameter
    (SELECT MAX(ird2.inventorydate)
     FROM irdate ird2
     WHERE ird2.ris = r.ris AND
           ird2.generateddata != 'g' AND
           ird2.inventorydate <= @inventorydate)
END
Removing the case makes it run in 1 second which is a HUGE difference. I can't understand why.
AND ird.inventorydate =
    (SELECT MAX(ird2.inventorydate)
     FROM irdate ird2
     WHERE ird2.ris = r.ris AND
           ird2.generateddata != 'g' AND
           ird2.inventorydate <= @inventorydate)
This should almost certainly be a derived table that you join to instead. Sub-selects tend to perform poorly, and when used conditionally, even worse. Try this instead:
INNER JOIN (
    select
        ris
        ,max(inventorydate) AS [MaxInvDate]
    from irdate
    where generateddata != 'g'
      and inventorydate <= @inventorydate
    GROUP BY ris
) AS MaxInvDate ON MaxInvDate.ris = r.ris
    and ird.inventorydate = MaxInvDate.MaxInvDate
    and @lastinvonly = 0
I'm not 100% positive this logically works with the whole query as your question only provides a small part.
I can't tell for sure without seeing an execution plan, but the branch in your filter is likely the cause of the performance problem. Theoretically, the optimizer can take the version without the CASE and apply an optimization that transforms the subquery in your filter into a join; when the CASE expression is added, this optimization is no longer possible and the subquery is executed for every row. You can refactor the code to help the optimizer out; something like this should work:
outer apply (
    select max(ird2.inventorydate) as maxinventorydate
    from irdate ird2
    where ird2.ris = r.ris
      and ird2.generateddata <> 'g'
      and ird2.inventorydate <= @inventorydate
      and @lastinvonly = 0
) as ird2
where ird.inventorydate = ird2.maxinventorydate

Additional condition in MS SQL query is executing 2 min (compared to original 1 sec)

I have an illogical problem that I just can't figure out.
I am working on a complex query. After I made a small change, it began taking over 2 minutes to execute instead of one second. Can someone explain to me how this is even possible? What could be behind it?
First query
DECLARE @CRUISE_ID int = 10001890 --:CRUISE_ID
SELECT
/* ... */
FROM Cruise_Itinerary with(nolock)
INNER JOIN Cruise with(nolock) ON Cruise_Itinerary.CRUISE_ID = Cruise.CRUISE_ID
AND (Cruise.CRUISE_ID = @CRUISE_ID)
/* ... */
Second query
DECLARE @CRUISE_ID int = 10001890 --:CRUISE_ID
SELECT
/* ... */
FROM Cruise_Itinerary with(nolock)
INNER JOIN Cruise with(nolock) ON Cruise_Itinerary.CRUISE_ID = Cruise.CRUISE_ID
AND (@CRUISE_ID is null OR Cruise.CRUISE_ID = @CRUISE_ID)
/* ... */
The first query executes in one second, but the second one takes over 2 minutes to execute. I just don't get it. What is the difference between
AND (10001890 is null OR Cruise.CRUISE_ID = 10001890)
and
AND (@CRUISE_ID is null OR Cruise.CRUISE_ID = @CRUISE_ID)?
The variable @CRUISE_ID has no other occurrences in the entire query.
Edit: I figured it out with the help of my colleagues and you guys.
Here is a good explanation of what is going on:
http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/
The optimal plan differs completely depending on which parameters are passed. The optimizer can't tell that in advance, so it plays it safe and creates a plan that will always work. That's (one of the reasons) why it uses an index scan rather than an index seek.
We can see from the execution plan of the second query that the index scan happens at the end of the plan. I checked: it takes over 2 minutes to execute if I remove the whole condition.
Firstly, the logic in your query seems contradictory. You are essentially saying "If x and (x or y)". We (humans) might think along the lines of:
Given that x (Cruise.CRUISE_ID = @CRUISE_ID) must be true to satisfy the AND, the second condition (@CRUISE_ID is null OR Cruise.CRUISE_ID = @CRUISE_ID) can be ignored, so we would take "x is true" as the starting point for our reasoning.
The SQL query optimiser, however, clearly decides that the query plan must try to ensure that both sides of the AND are met, and rationalises it along these lines:
With just condition 1, the plan can start by performing a (clustered?) INDEX SEEK on the Cruise table on the basis of the (presumably indexed) CRUISE_ID. When you add in condition 3, the optimiser can no longer perform this seek, because the additional predicate (@CRUISE_ID is null) must also be taken into account. Therefore the whole of the Cruise_Itinerary table has to be scanned (there are no other indexed columns it can use), and the join onto Cruise is performed before the various conditions are checked as part of the join.
Essentially it is doing what you are asking: if the value is NULL, then everything must be returned, with predictably devastating consequences for performance. You would be better off using an IF...ELSE block so that the query plan is optimised for both possible options (@CRUISE_ID is null / is not null).
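A minimal sketch of that IF...ELSE split, keeping the question's elided column list as /* ... */ and its NOLOCK hints. Another common fix for catch-all queries of this shape is OPTION (RECOMPILE) on the original statement, which lets the optimizer see the actual parameter value at execution time.

DECLARE @CRUISE_ID int = 10001890 --:CRUISE_ID

IF @CRUISE_ID IS NULL
    SELECT
    /* ... */
    FROM Cruise_Itinerary with(nolock)
    INNER JOIN Cruise with(nolock) ON Cruise_Itinerary.CRUISE_ID = Cruise.CRUISE_ID
    /* ... */
ELSE
    SELECT
    /* ... */
    FROM Cruise_Itinerary with(nolock)
    INNER JOIN Cruise with(nolock) ON Cruise_Itinerary.CRUISE_ID = Cruise.CRUISE_ID
                                  AND Cruise.CRUISE_ID = @CRUISE_ID
    /* ... */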

Optimize complicated SQL Update

Somebody at work wrote this UPDATE some years ago and it works; the problem is that it takes almost 5 hours when called multiple times in a process. This is not a regular UPDATE: there is no 1-to-1 record matching between tables. It updates based on an accumulated SUM of a particular field in the same table, and things get more complicated because this SUM is restricted by special conditions based on dates and another field.
I think this is something like an (implicit) inner join with no 1-to-1 match, more like ALL vs ALL, so with, for example, 7,000 records in the table this thing will process 7,000 * 7,000 record combinations, about 49 million. In my opinion cursors should have been used here, but now I need more speed and I don't think cursors will get me there.
My question is: is there any way to rewrite this and make it faster? Pay attention to the conditions on that SUM; this is not an easy UPDATE to follow (at least for me).
More info:
CodCtaCorriente and CodCtaCorrienteMon are primary keys on this table, but, as I said before, there is no intention to make a 1-to-1 match here; that's why these keys are not used as join conditions. CodCtaCorrienteMon is used in the conditions but not as a join condition (ON).
UPDATE #POS SET SaldoDespuesEvento =
(SELECT SUM(Importe)
FROM #POS CTACTE2
WHERE CTACTE2.CodComitente = #POS.CodComitente
AND CTACTE2.CodMoneda = #POS.CodMoneda
AND CTACTE2.EstaAnulado = 0
AND (DATEDIFF(day, CTACTE2.FechaLiquidacion, #POS.FechaLiquidacion) > 0
OR
(DATEDIFF(day, CTACTE2.FechaLiquidacion, #POS.FechaLiquidacion) = 0
AND (#POS.CodCtaCorrienteMon >= CTACTE2.CodCtaCorrienteMon))))
WHERE #POS.EstaAnulado = 0 AND #POS.EsSaldoAnterior = 0
From your query plan it looks like it's spending most of the time in the filter right after the index spool.
If you are going to run this query a few times, I would create an index on the CodComitente, CodMoneda, EstaAnulado, FechaLiquidacion, and CodCtaCorrienteMon columns.
I don't know much about the Index Spool operator, but from what I understand it is basically a "temporary" index created at query time. So if you are running this query multiple times, I would create that index once and then run the query as many times as you need; a sketch of such an index follows.
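If the spool really is standing in for a missing index, something along these lines could be created once after #POS is populated (the index name is hypothetical and the column order is a guess based on the predicates in the UPDATE):

-- One-off index on the temp table, created once before the repeated runs
CREATE INDEX IX_POS_Saldo
ON #POS (CodComitente, CodMoneda, EstaAnulado, FechaLiquidacion, CodCtaCorrienteMon)
INCLUDE (Importe);  -- Importe is the summed column, so including it keeps the subquery covered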
Also, I would try creating a variable to store the result of your SUM operation, so you can avoid running it more often than necessary.
DECLARE @sumVal AS INT
SET @sumVal = (SELECT SUM(Importe)
    FROM #POS CTACTE2
    WHERE CTACTE2.CodComitente = #POS.CodComitente
      AND CTACTE2.CodMoneda = #POS.CodMoneda
      AND CTACTE2.EstaAnulado = 0
      AND (DATEDIFF(day, CTACTE2.FechaLiquidacion, #POS.FechaLiquidacion) > 0
           OR
           (DATEDIFF(day, CTACTE2.FechaLiquidacion, #POS.FechaLiquidacion) = 0
            AND (#POS.CodCtaCorrienteMon >= CTACTE2.CodCtaCorrienteMon))))

UPDATE #POS SET SaldoDespuesEvento = @sumVal
WHERE #POS.EstaAnulado = 0 AND #POS.EsSaldoAnterior = 0
It is hard to help much without the query plan, but I would make the assumption that if there are not already indexes on the FechaLiquidacion and CodCtaCorrienteMon columns, then performance would be improved by creating them, as long as database storage space is not an issue.
Found the solution: this is a common problem, Running Totals.
This is one of the few cases where CURSORS perform better. See this and other available solutions here (or browse Stack Overflow; there are many cases like this):
http://weblogs.sqlteam.com/mladenp/archive/2009/07/28/SQL-Server-2005-Fast-Running-Totals.aspx
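For what it's worth, on SQL Server 2012 and later (newer than the 2005-era article above) this kind of running total can usually be expressed with a windowed SUM instead of a cursor. A rough sketch against #POS, assuming the partitioning and ordering implied by the UPDATE's conditions and ignoring the day-level DATEDIFF subtlety; it would still need to be joined back to #POS to perform the actual update:

SELECT CodCtaCorriente,
       SUM(Importe) OVER (
           PARTITION BY CodComitente, CodMoneda
           ORDER BY FechaLiquidacion, CodCtaCorrienteMon
           ROWS UNBOUNDED PRECEDING
       ) AS SaldoDespuesEvento
FROM #POS
WHERE EstaAnulado = 0;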

Adding 'distinct' keyword to oracle query obliterates query performance for no reason

I am quite confused by something I'm seeing in an Oracle 10 database.
I have the following query.
select
t2.duplicate_num
from table1 t1, table2 t2,
(
select joincriteria_0 from intable1 it1, intable2 it2
where it2.identifier in (4496486,5911382)
and it1.joincriteria_0 = it2.joincriteria_0
and it1.filter_0 = 1
) tt
where t1.joincriteria_0 = tt.joincriteria_0
and t2.joincriteria_1 = t1.joincriteria_1
and t2.filter_0 = 3
and t2.filter_1 = 1
and t2.filter_2 not in (48020)
It doesn't really seem like anything special to me. Here are the baseline performance numbers from autotrace:
CR_GETS: 318
CPU: 3
ROWS: 33173
Now if I add the 'DISTINCT' keyword to the query (e.g. 'select distinct t2.duplicate_num...') this happens
CR_GETS: 152921
CPU: 205
ROWS: 305
The query plan has not changed, but the logical I/O grows by a factor of 500. I was expecting only CPU to go up and logical I/O to be largely unchanged.
The net result is a query that runs 10-100x slower with the DISTINCT keyword. I could put code into the application that would make the result set distinct in a fraction of the time. How does this make any sense, particularly without the query plan changing?
This indicates a lack of an index somewhere. It also means your original query, without the DISTINCT clause, wasn't optimized; with DISTINCT it could not be optimized either, so the query plan remained the same. An unoptimized query varies widely in performance because of the full table scans.
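Since the claim hinges on what the plan is actually doing, it may be worth pulling the detailed plan (with predicate information) rather than relying on the autotrace summary; a minimal sketch, assuming the standard DBMS_XPLAN package is available:

EXPLAIN PLAN FOR
select distinct t2.duplicate_num
from table1 t1, table2 t2,
(
  select joincriteria_0 from intable1 it1, intable2 it2
  where it2.identifier in (4496486,5911382)
  and it1.joincriteria_0 = it2.joincriteria_0
  and it1.filter_0 = 1
) tt
where t1.joincriteria_0 = tt.joincriteria_0
and t2.joincriteria_1 = t1.joincriteria_1
and t2.filter_0 = 3
and t2.filter_1 = 1
and t2.filter_2 not in (48020);

-- Show the detailed plan, including access and filter predicates
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);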

SQL Server 2008 UPDATE Statement WHERE clause precedence

I wrote the following query:
UPDATE king_in
SET IN_PNSN_ALL_TP_CNTRCT_CD = IN_PNSN_ALL_TP_CNTRCT_CD + '3'
WHERE COALESCE(IN_PNSN_ALL_TP_CNTRCT_TX, '') <> ''
AND CHARINDEX('3', IN_PNSN_ALL_TP_CNTRCT_CD) = 0
It checks to see if a field has a value in it, and if it does, it puts a 3 in a corresponding field if there isn't a 3 in it already. When I ran it, I got a "string or binary data will be truncated" error. The field is a VARCHAR(3), and there are rows in the table that already have 3 characters in them, but the rows that I was actually updating via the WHERE filter had a maximum length of 2, so I was completely baffled as to why SQL Server was throwing the truncation error. So I changed my UPDATE statement to:
UPDATE king_in
SET IN_PNSN_ALL_TP_CNTRCT_CD = k.IN_PNSN_ALL_TP_CNTRCT_CD + '3'
FROM king_in k
INNER JOIN
(
SELECT ki.row_key,
in_sqnc_nb
FROM king_in ki
INNER JOIN King_Ma km
ON ki.Row_Key = km.Row_Key
INNER JOIN King_Recs kr
ON km.doc_loc_nb = kr.ACK_ID
WHERE CHARINDEX('3', IN_PNSN_ALL_TP_CNTRCT_CD) = 0
AND COALESCE(IN_PNSN_ALL_TP_CNTRCT_TX, '') <> ''
) a
ON k.Row_Key = a.Row_Key
AND k.in_sqnc_nb = a.in_sqnc_nb
and it works fine without error.
So it appears, based on this, that when doing an UPDATE statement without a FROM clause, SQL Server internally runs the SET expression before it filters the records based on the WHERE clause. That's why I was getting the truncation error: even though the records I wanted to update held fewer than 3 characters, there were rows in the table that had 3 characters in that field, and when it couldn't add a '3' to the end of one of those rows, it threw the error.
So after all of that, I've got a handful of questions.
1) Why? Is there a specific DBMS reason that SQL Server wouldn't filter the result set before applying the SET statement?
2) Is this just a known thing about SQL that I never learned along the way?
3) Is there a setting in SQL Server to change this behavior?
Thanks in advance.
1 - Likely because your criteria are not SARGable - that is, they can't use an index. If the query optimizer determines it's faster to do a table scan, it'll go ahead and run on all the rows. This is especially likely when you filter on a function applied to the field like you do here.
2 - Yes. The optimizer will do what it thinks is best. You can get around this somewhat by using parentheses to force an evaluation order in your WHERE clause, but in your example I don't think it would help, since it forces a table scan regardless.
3 - No, you need to alter your data or your logic to allow indexes to be used. If you really need to filter on the existence of a certain character in a field, it should probably be its own column and/or you should normalize that particular bit of data better.
A workaround for your particular instance would be to add LEN(IN_PNSN_ALL_TP_CNTRCT_CD) < 3 to the WHERE clause as well, as sketched below.
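Applied to the original statement, that workaround is just one extra predicate (a sketch; whether the concatenation is still evaluated on excluded rows ultimately depends on the plan):

UPDATE king_in
SET IN_PNSN_ALL_TP_CNTRCT_CD = IN_PNSN_ALL_TP_CNTRCT_CD + '3'
WHERE COALESCE(IN_PNSN_ALL_TP_CNTRCT_TX, '') <> ''
  AND CHARINDEX('3', IN_PNSN_ALL_TP_CNTRCT_CD) = 0
  AND LEN(IN_PNSN_ALL_TP_CNTRCT_CD) < 3   -- excludes rows already at the VARCHAR(3) limit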