Optimize complicated SQL Update

Somebody at work wrote this UPDATE some years ago and it works; the problem is that it takes almost 5 hours when called multiple times in a process. This is not a regular UPDATE: there is no one-to-one record matching between tables. It updates each row based on an accumulated SUM of a particular field in the same table, and things get more complicated because this SUM is restricted to special conditions based on dates and another field.
I think this is something like an (implicit) inner join with no one-to-one match, like all-vs-all, so with, for example, 7000 records in the table this thing will process 7000 * 7000 combinations, 49 million of them. In my opinion cursors should have been used here, but now I need more speed and I don't think cursors will get me there.
My question is: is there any way to rewrite this and make it faster? Pay attention to the conditions on that SUM; this is not an easy UPDATE to follow (at least for me).
More info:
CodCtaCorriente and CodCtaCorrienteMon are primary keys on this table, but as I said before there is no intention to make a one-to-one match here; that's why these keys are not used in the query. CodCtaCorrienteMon is used in conditions, but not as a join condition (ON).
UPDATE #POS SET SaldoDespuesEvento =
    (SELECT SUM(Importe)
     FROM #POS CTACTE2
     WHERE CTACTE2.CodComitente = #POS.CodComitente
       AND CTACTE2.CodMoneda = #POS.CodMoneda
       AND CTACTE2.EstaAnulado = 0
       AND (DATEDIFF(day, CTACTE2.FechaLiquidacion, #POS.FechaLiquidacion) > 0
            OR (DATEDIFF(day, CTACTE2.FechaLiquidacion, #POS.FechaLiquidacion) = 0
                AND #POS.CodCtaCorrienteMon >= CTACTE2.CodCtaCorrienteMon)))
WHERE #POS.EstaAnulado = 0 AND #POS.EsSaldoAnterior = 0

From your query plan it looks like it's spending most of the time in the filter right after the index spool.
If you are going to run this query a few times, I would create an index on the CodComitente, CodMoneda, EstaAnulado, FechaLiquidacion, and CodCtaCorrienteMon columns.
I don't know much about the Index Spool iterator, but from what I understand it is essentially a temporary index created at query time. So if you are running this query multiple times, I would create that index once, then run the query as many times as you need.
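For example (the index name is an assumption; including Importe as a non-key column would let the SUM be computed from the index alone):
CREATE INDEX IX_POS_RunningSum
ON #POS (CodComitente, CodMoneda, EstaAnulado, FechaLiquidacion, CodCtaCorrienteMon)
INCLUDE (Importe)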
Also, I would try creating a variable to store the result of your SUM operation, so you can avoid recomputing it as much as possible.
DECLARE @sumVal AS INT

-- Note: outside the UPDATE there is no outer #POS row to correlate with,
-- so the #POS.* references below would have to be replaced with concrete
-- values (or variables) for this to compile as a standalone assignment.
SET @sumVal =
    (SELECT SUM(Importe)
     FROM #POS CTACTE2
     WHERE CTACTE2.CodComitente = #POS.CodComitente
       AND CTACTE2.CodMoneda = #POS.CodMoneda
       AND CTACTE2.EstaAnulado = 0
       AND (DATEDIFF(day, CTACTE2.FechaLiquidacion, #POS.FechaLiquidacion) > 0
            OR (DATEDIFF(day, CTACTE2.FechaLiquidacion, #POS.FechaLiquidacion) = 0
                AND #POS.CodCtaCorrienteMon >= CTACTE2.CodCtaCorrienteMon)))

UPDATE #POS SET SaldoDespuesEvento = @sumVal
WHERE #POS.EstaAnulado = 0 AND #POS.EsSaldoAnterior = 0

It is hard to help much without the query plan, but I would assume that if there are not already indexes on the FechaLiquidacion and CodCtaCorrienteMon columns, performance would be improved by creating them, as long as database storage space is not an issue.
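Something like the following, with hypothetical index names:
CREATE INDEX IX_POS_FechaLiquidacion ON #POS (FechaLiquidacion)
CREATE INDEX IX_POS_CodCtaCorrienteMon ON #POS (CodCtaCorrienteMon)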

Found the solution; this is a common problem: Running Totals.
This is one of the few cases where CURSORS perform better. See this and more available solutions here (or browse Stack Overflow; there are many cases like this):
http://weblogs.sqlteam.com/mladenp/archive/2009/07/28/SQL-Server-2005-Fast-Running-Totals.aspx
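For anyone on SQL Server 2012 or later: the same running total can now be expressed with a windowed SUM, which avoids both the cursor and the all-vs-all self-join. A sketch against the table from the question, untested, and assuming FechaLiquidacion carries no time component (so ordering by it matches the DATEDIFF(day, ...) comparison):

UPDATE P
SET SaldoDespuesEvento = T.RunningTotal
FROM #POS AS P
JOIN (
    -- Running total per (CodComitente, CodMoneda), ordered the way the
    -- correlated subquery ranks rows: by FechaLiquidacion, with
    -- CodCtaCorrienteMon as the tie-breaker.
    SELECT CodCtaCorriente, CodCtaCorrienteMon,
           SUM(Importe) OVER (PARTITION BY CodComitente, CodMoneda
                              ORDER BY FechaLiquidacion, CodCtaCorrienteMon
                              ROWS UNBOUNDED PRECEDING) AS RunningTotal
    FROM #POS
    WHERE EstaAnulado = 0
) AS T
  ON T.CodCtaCorriente = P.CodCtaCorriente
 AND T.CodCtaCorrienteMon = P.CodCtaCorrienteMon
WHERE P.EstaAnulado = 0 AND P.EsSaldoAnterior = 0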

Related

How to improve the efficiency of below query in SQL Server?

I have a database on the order of ten million rows. The client needs to read the data and perform calculations.
Because of the large amount of data, saving it all in the application cache would overflow memory and crash the application.
If I use SELECT statements to query the data from the database in real time, it may take too long and the number of operations on the database may be too frequent.
Is there a better way to read the database data? I use C++ and C# to access the SQL Server database.
My database statement is similar to the following:
SELECT TOP 10 y.SourceName, MAX(y.EndTimeStamp - y.StartTimeStamp) AS ProcessTimeStamp
FROM
(
    SELECT x.SourceName, x.StartTimeStamp, IIF(x.EndTimeStamp IS NOT NULL, x.EndTimeStamp, 134165256277210658) AS EndTimeStamp
    FROM
    (
        SELECT
            SourceName,
            Active,
            LEAD(Active) OVER(PARTITION BY SourceName ORDER BY TicksTimeStamp) NextActive,
            TicksTimeStamp AS StartTimeStamp,
            LEAD(TicksTimeStamp) OVER(PARTITION BY SourceName ORDER BY TicksTimeStamp) EndTimeStamp
        FROM Table1
        WHERE Path = N'App1' and TicksTimeStamp >= 132165256277210658 and TicksTimeStamp < 134165256277210658
    ) x
    WHERE (x.Active = 1 and x.NextActive = 0) OR (x.Active = 1 and x.NextActive = null)
) y
GROUP BY y.SourceName
ORDER BY ProcessTimeStamp DESC, y.SourceName
The database structure is roughly as follows:
ID  Path  SourceName  TicksTimeStamp      Active
1   App1  Pipe1       132165256277210658  1
2   App1  Pipe1       132165256297210658  0
3   App1  Pipe1       132165956277210658  1
4   App2  Pipe2       132165956277210658  1
5   App2  Pipe2       132165956277210658  0
I use ExecuteReader in C#. The same SQL statement runs in SQL Server Management Studio in 4 s, but ExecuteReader takes 8-9 s to return. Does the slow time have anything to do with this interface?
I don't really 'get' the entire query but I'm wondering about this part:
WHERE (x.Active = 1 and x.NextActive = 0) OR (x.Active = 1 and x.NextActive = null)
SQL doesn't really like ORs, so why not convert this to
WHERE x.Active = 1 and ISNULL(x.NextActive, 0) = 0
This might cause a completely different query plan. (or not)
As CharlieFace mentioned, probably best to share the query plan so we might get an idea of what's going on.
PS: I'm also not sure what those TicksTimeStamp values represent, but it looks like you're fetching a pretty wide range there; bigger volumes mean longer processing times. Even though you only return the top 10, it still has to go through the entire range to calculate those durations.
I agree with @Charlieface. I think the index you want is as follows:
CREATE INDEX idx ON Table1 (Path, TicksTimeStamp) INCLUDE (SourceName, Active);
You can add both indexes (with different names of course) and see which one the execution engine chooses.
I can suggest adding the following index which should help the inner query using LEAD:
CREATE INDEX idx ON Table1 (SourceName, TicksTimeStamp, Path) INCLUDE (Active);
The key point of the above index is that it should allow the LEAD values to be computed rapidly. It also has an INCLUDE clause for Active, to cover the entire SELECT.

Do parentheses in a SQL Server View impact performance?

When I added a set of parentheses around an OR condition in a SQL Server view query, the performance suffered dramatically.
I ran across a view that had a join with a complicated set of AND/OR clauses in it. I thought the statement would be easier to read and maintain with parentheses added. After I added them, though, the view's performance tanked.
I ran this statement: SELECT COUNT(*) FROM [ViewName]. The result was 630,644, and the query took 14 seconds to process.
I then added a set of parentheses to the view definition and ran the same COUNT(*) statement. I stopped the query after 75 seconds; it had still not completed.
I removed the parentheses from the view and ran COUNT(*) again. The result took 15 seconds.
I added the parentheses again. The COUNT(*) query ran for 90 seconds before I stopped it.
This leads me to believe that the parentheses were definitely impacting the performance of the view. But everything I have read says that parentheses in SQL don't impact performance.
What am I missing?
This leads me to believe that the parentheses were definitely impacting the performance of the view. But everything I have read says that parentheses in SQL don't impact performance.
It depends on where the parentheses are applied. Sometimes they have no effect on the logic, and sometimes they do. In Example 1 below, the parentheses change how the query is processed, and therefore its performance; in Example 2 they have no effect.
Example 1
ColumnA = 0 and ColumnB = 1 or ColumnB = 2
ColumnA = 0 and (ColumnB = 1 or ColumnB = 2)
ColumnA = 0 and ColumnB = 1 or (ColumnB = 2)
Example 2
ColumnA = 0 and ColumnB = 1
ColumnA = 0 and (ColumnB = 1)
(ColumnA = 0) and ColumnB = 1
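To make Example 1 concrete: AND binds more tightly than OR, so the unparenthesized predicate is evaluated as if the AND were parenthesized:

-- These two are equivalent (AND has higher precedence than OR):
ColumnA = 0 and ColumnB = 1 or ColumnB = 2
(ColumnA = 0 and ColumnB = 1) or ColumnB = 2

-- Parenthesizing the OR instead selects different rows, which is why the
-- optimizer can produce a completely different plan:
ColumnA = 0 and (ColumnB = 1 or ColumnB = 2)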
Regarding "a complicated set of and / or clauses in it":
Furthermore, I would suggest looking at the execution plan for a clearer understanding. You can enable this in SSMS (shortcut Ctrl+M) before executing the query, then compare executions for:
How many tasks the query had to perform to get the result
What types of tasks the query had to perform
What the estimated and actual number of rows were in each task
Whether index usage changed, whether any new indexes are required, etc.

Erratic query performance

I am new to this site, but please don't hold that against me; I have only used it once.
Here is my dilemma: I have moderate SQL knowledge but am no expert. The query below was created by a consultant a long time ago.
On most mornings it takes 1.5 hours to run because there is a lot of data, BUT on other mornings it takes 4-6 hours. I have tried eliminating any other jobs that are running. I am thoroughly confused as to how to find out what is causing this problem.
Any help would be appreciated.
I have already broken this query into 2 queries, but any tips on ways to help boost performance would be greatly appreciated.
This query builds back our inventory transactions to find what our stock on hand value was at any given point in time.
SELECT
ITCO, ITIM, ITLOT, Time, ITWH, Qty, ITITCD,ITIREF,
SellPrice, SellCost,
case
when Transaction_Cost is null
then Qty * (SELECT ITIACT
FROM (Select Top 1 B.ITITDJ, B.ITIREF, B.ITIACT
From OMCXIT00 AS B
Where A.ITCO = B.ITCO
AND A.ITWH = B.ITWH
AND A.ITIM = B.ITIM
AND A.ITLOT = B.ITLOT
AND ((A.ITITDJ > B.ITITDJ)
OR (A.ITITDJ = B.ITITDJ AND A.ITIREF <= B.ITIREF))
ORDER BY B.ITITDJ DESC, B.ITIREF DESC) as C)
else Transaction_Cost
END AS Transaction_Cost,
case when ITITCD = 'S' then ' Shipped - Stock' else null end as TypeofSale,
case when ititcd = 'S' then ITIREF else null end as OrderNumber
FROM
dbo.InvTransTable2 AS A
Here is the execution plan.
http://i.imgur.com/mP0Cu.png
Here is the DTA output, but I am unsure how to read it since the recommendations are blank. Shouldn't that say "Create"?
http://i.imgur.com/4ycIP.png
You cannot do much about dbo.InvTransTable2: since you are selecting all records from it, it will be scanned regardless.
Make sure that you have a clustered index on OMCXIT00; it looks like it is a heap with no clustered index.
Make sure that the clustered index key is small but has many distinct values.
If OMCXIT00 does not have many records, it may be sufficient to create an index with key ITCO, including the following columns: ITITDJ, ITIREF, ITWH, ITIM, ITLOT.
Index creation example:
CREATE INDEX IX_dbo_OMCXIT00
ON OMCXIT00 ([ITCO])
INCLUDE ( ITITDJ , ITIREF)
If that does not help, then you need to see which of the columns in your search predicates have the most distinct values, and create an index keyed on one or more of those.
A.ITCO = B.ITCO
AND A.ITWH = B.ITWH
AND A.ITIM = B.ITIM
AND A.ITLOT = B.ITLOT
Besides adding indexes to turn table scans into index seeks, ask yourself: "Do I really need this ORDER BY in this SQL code?" If you don't need the sorting, remove the ORDER BY; there is a good chance your code will then be faster.

SQL Server 2008 UPDATE Statement WHERE clause precedence

I wrote the following query:
UPDATE king_in
SET IN_PNSN_ALL_TP_CNTRCT_CD = IN_PNSN_ALL_TP_CNTRCT_CD + '3'
WHERE COALESCE(IN_PNSN_ALL_TP_CNTRCT_TX, '') <> ''
AND CHARINDEX('3', IN_PNSN_ALL_TP_CNTRCT_CD) = 0
It checks whether a field has a value in it, and if it does, it puts a 3 in a corresponding field unless there is already a 3 in it. When I ran it, I got a "string or binary data will be truncated" error. The field is a VARCHAR(3) and there are rows in the table that already have 3 characters in them, but the rows I was actually updating via the WHERE filter had a maximum LEN of 2, so I was completely baffled as to why SQL Server was throwing the truncation error. So I changed my UPDATE statement to:
UPDATE king_in
SET IN_PNSN_ALL_TP_CNTRCT_CD = k.IN_PNSN_ALL_TP_CNTRCT_CD + '3'
FROM king_in k
INNER JOIN
(
SELECT ki.row_key,
in_sqnc_nb
FROM king_in ki
INNER JOIN King_Ma km
ON ki.Row_Key = km.Row_Key
INNER JOIN King_Recs kr
ON km.doc_loc_nb = kr.ACK_ID
WHERE CHARINDEX('3', IN_PNSN_ALL_TP_CNTRCT_CD) = 0
AND COALESCE(IN_PNSN_ALL_TP_CNTRCT_TX, '') <> ''
) a
ON k.Row_Key = a.Row_Key
AND k.in_sqnc_nb = a.in_sqnc_nb
and it works fine without error.
So it appears, based on this, that when doing an UPDATE statement without a FROM clause, SQL Server internally evaluates the SET expression before it filters the records based on the WHERE clause. That's why I was getting the truncation error: even though the records I wanted to update were all under 3 characters, there were rows in the table that already had 3 characters in that field, and when a '3' couldn't be appended to one of those rows, the error was thrown.
So after all of that, I've got a handful of questions.
1) Why? Is there a specific DBMS reason that SQL Server wouldn't filter the result set before applying the SET statement?
2) Is this just a known thing about SQL that I never learned along the way?
3) Is there a setting in SQL Server to change this behavior?
Thanks in advance.
1 - Likely because your criteria are not SARGable - that is, they can't use an index. If the query optimizer determines it's faster to do a table scan, it'll go ahead and evaluate the expression on all the rows. This is especially likely when you filter on a function applied to the field, as you do here.
2 - Yes. The optimizer will do whatever it thinks is best. You can get around this somewhat by using parentheses to force an evaluation order in your WHERE clause, but in your example I don't think it would help, since it forces a table scan regardless.
3 - No; you need to alter your data or your logic to allow indexes to be used. If you really need to filter on the existence of a certain character in a field, it probably should be its own column and/or you should normalize that particular bit of data better.
A workaround for your particular instance would be to add LEN(IN_PNSN_ALL_TP_CNTRCT_CD) < 3 to the WHERE clause as well.
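Applied to the original statement, that workaround might look like this (same logic, with a length guard so the SET expression can never overflow the VARCHAR(3) column):

UPDATE king_in
SET IN_PNSN_ALL_TP_CNTRCT_CD = IN_PNSN_ALL_TP_CNTRCT_CD + '3'
WHERE COALESCE(IN_PNSN_ALL_TP_CNTRCT_TX, '') <> ''
AND CHARINDEX('3', IN_PNSN_ALL_TP_CNTRCT_CD) = 0
AND LEN(IN_PNSN_ALL_TP_CNTRCT_CD) < 3 -- skip rows already at capacity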

Optimizing MySQL statement with lots of count(row) and sum(row+row2)

I need to use the InnoDB storage engine on a table with about a million records in it at any given time. Records are inserted at a very fast rate and are then dropped within a few days, maybe a week. The ping table has about a million rows, whereas the website table has only about 10,000.
My statement is this:
select url
from website ws, ping pi
where ws.idproxy = pi.idproxy and pi.entrytime > curdate() - 3 and contentping+tcpping is not null
group by url
having sum(contentping+tcpping)/(count(*)-count(errortype)) < 500 and count(*) > 3 and
count(errortype)/count(*) < .15
order by sum(contentping+tcpping)/(count(*)-count(errortype)) asc;
I added an index on entrytime, yet no dice. Can anyone throw me a bone as to what I should look into for basic optimization of this query? The result set is only about 200 rows, so I'm not getting killed there.
In the absence of the schemas of the relations, I'll have to make some guesses.
If you're making WHERE a.attrname = b.attrname clauses, that cries out for a JOIN instead.
Using COUNT(*) is both redundant and sometimes less efficient than COUNT(some_specific_attribute). The primary key is a good candidate.
Why would you test contentping+tcpping IS NOT NULL, asking for a calculation that appears unnecessary, instead of just testing whether the attributes individually are null?
Here's my attempt at an improvement:
SELECT url
FROM website AS ws
JOIN ping AS pi
ON ws.idproxy = pi.idproxy
WHERE
pi.entrytime > CURDATE() - 3
AND pi.contentping IS NOT NULL
AND pi.tcpping IS NOT NULL
GROUP BY url
HAVING
SUM(pi.contentping + pi.tcpping) / (COUNT(pi.idproxy) - COUNT(pi.errortype)) < 500
AND COUNT(pi.idproxy) > 3
AND COUNT(pi.errortype) / COUNT(pi.idproxy) < 0.15
ORDER BY
SUM(pi.contentping + pi.tcpping) / (COUNT(pi.idproxy) - COUNT(pi.errortype)) ASC;
Performing lots of identical calculations in both the HAVING and ORDER BY clauses is likely costing you performance. You could either put them in the SELECT clause, or create a view that has those calculations as attributes and use that view for accessing the values. For example:
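A sketch of the first option, untested: MySQL allows HAVING and ORDER BY to reference column aliases from the SELECT list, so each calculation is written only once (the INTERVAL spelling of the date arithmetic is an assumed correction of CURDATE() - 3):

SELECT
    url,
    SUM(pi.contentping + pi.tcpping)
        / (COUNT(pi.idproxy) - COUNT(pi.errortype)) AS avg_ping,
    COUNT(pi.idproxy) AS total_pings,
    COUNT(pi.errortype) / COUNT(pi.idproxy) AS error_rate
FROM website AS ws
JOIN ping AS pi ON ws.idproxy = pi.idproxy
WHERE pi.entrytime > CURDATE() - INTERVAL 3 DAY
  AND pi.contentping IS NOT NULL
  AND pi.tcpping IS NOT NULL
GROUP BY url
HAVING avg_ping < 500 AND total_pings > 3 AND error_rate < 0.15
ORDER BY avg_ping ASC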