Update From a Loop (Table Locking Issues + CLR Memory Issues) - sql

I have a table with a geography column. I want to update this value, for a subset of rows (~1,000), based on the results of a query.
I have created a view that will return the geography column I want + the id of the row in the table to be updated.
If I run the query
UPDATE A
SET A.GeogCol = B.GeogCol
FROM TableToUpdate A
INNER JOIN UpdatedFrom_View B
ON A.ID = B.ID
I will receive a system out-of-memory error. This occurs because it is 32-bit SQL Server and the VAS reservation runs out of space because of the CLR functions I call to create the geography column. I am not the server administrator, so I cannot temporarily allocate more space to the VAS reservation.
If I reduce this to WHERE A.ID BETWEEN 0 AND 100 and attempt to iterate manually, I still have this problem sometimes. I have found a few of the troublesome rows (larger geography objects than usual), and I can update those individually with WHERE A.ID = @ID.
My thought was then to update based on a loop.
DECLARE @ID INT
DECLARE @g GEOGRAPHY
DECLARE IDCursor CURSOR FOR SELECT ID FROM TableToUpdate
OPEN IDCursor
FETCH NEXT FROM IDCursor INTO @ID
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @g = (SELECT GeogCol FROM UpdatedFrom_View WHERE ID = @ID)
    UPDATE TableToUpdate SET GeogCol = @g WHERE ID = @ID
    FETCH NEXT FROM IDCursor INTO @ID
END
CLOSE IDCursor
DEALLOCATE IDCursor
However, this appears to have issues with table locking (I think). It has now run for just under 2 days and has updated fewer than 150 records.
For reference, when run manually, UPDATE TableToUpdate SET GeogCol = @g WHERE ID = @ID takes less than 10 seconds.
Is there a better way to do this?
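One middle ground between the single giant UPDATE and the row-by-row cursor is to commit the work in small ID-range batches, so the CLR memory each statement consumes stays bounded. A minimal sketch, assuming the table and view names from the question; the batch size of 50 is an arbitrary guess to be tuned against the VAS limit:

```sql
-- Hedged sketch: update in small ID-range batches so each statement
-- only materializes a handful of geography objects at a time.
DECLARE @BatchStart INT = 0
DECLARE @BatchSize  INT = 50  -- assumption: tune to what fits in memory
DECLARE @MaxID      INT = (SELECT MAX(ID) FROM TableToUpdate)

WHILE @BatchStart <= @MaxID
BEGIN
    UPDATE A
    SET A.GeogCol = B.GeogCol
    FROM TableToUpdate A
    INNER JOIN UpdatedFrom_View B
        ON A.ID = B.ID
    WHERE A.ID BETWEEN @BatchStart AND @BatchStart + @BatchSize - 1

    SET @BatchStart = @BatchStart + @BatchSize
END
```

Each batch is its own statement, so memory is released between iterations, while the per-statement overhead stays far below one-row-at-a-time cursor processing.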
--EDIT
So I decided to test something.
I wrote this query which takes approximately 1 second.
UPDATE A
SET A.GeogCol = B.GeogCol
FROM TableToUpdate A
INNER JOIN UpdatedFrom_View B
ON A.ID = B.ID
WHERE A.ID BETWEEN 1 AND 1
Then I wrote a stored procedure that does the EXACT same thing.
CREATE PROCEDURE TestUpdate @IDLow INT, @IDHigh INT
AS
BEGIN
    UPDATE A
    SET A.GeogCol = B.GeogCol
    FROM TableToUpdate A
    INNER JOIN UpdatedFrom_View B
        ON A.ID = B.ID
    WHERE A.ID BETWEEN @IDLow AND @IDHigh
END
GO
EXEC TestUpdate @IDLow = 1, @IDHigh = 1
This query takes over half an hour. What is happening?
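For what it's worth, the classic cause in SQL Server of an ad-hoc statement being fast while the identical parameterized procedure is slow is parameter sniffing: the cached plan was compiled for one parameter range and is reused for another. That is only a guess at what is happening here, but it is cheap to test by forcing a fresh plan per execution:

```sql
-- Hedged sketch: if parameter sniffing is the culprit, recompiling the
-- statement for each parameter set should restore the ad-hoc speed.
UPDATE A
SET A.GeogCol = B.GeogCol
FROM TableToUpdate A
INNER JOIN UpdatedFrom_View B
    ON A.ID = B.ID
WHERE A.ID BETWEEN @IDLow AND @IDHigh
OPTION (RECOMPILE)
```

If the procedure becomes fast with OPTION (RECOMPILE), the diagnosis is confirmed and you can decide whether the recompile cost per call is acceptable.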

Related

SQL Query with Cursor optimization

I have a query where I iterate through a table -> for each entry I iterate through another table and then compute some results. I use a cursor for iterating through the table. This query takes ages to complete. Always more than 3 minutes. If I do something similar in C# where the tables are arrays or dictionaries it doesn't even take a second. What am I doing wrong and how can I improve the efficiency?
DELETE FROM [QueryScores]
GO
INSERT INTO [QueryScores] (Id)
SELECT Id FROM [Documents]
DECLARE @Id NVARCHAR(50)
DECLARE myCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT [Id] FROM [QueryScores]
OPEN myCursor
FETCH NEXT FROM myCursor INTO @Id
WHILE @@FETCH_STATUS = 0
BEGIN
    DECLARE @Score FLOAT = 0.0
    DECLARE @CounterMax INT = (SELECT COUNT(*) FROM [Query])
    DECLARE @Counter INT = 0
    PRINT 'Document: ' + CAST(@Id AS VARCHAR)
    PRINT 'Score: ' + CAST(@Score AS VARCHAR)
    WHILE @Counter < @CounterMax
    BEGIN
        DECLARE @StemId INT = (SELECT [Query].[StemId] FROM [Query] WHERE [Query].[Id] = @Counter)
        DECLARE @Weight FLOAT = (SELECT [tfidf].[Weight] FROM [TfidfWeights] AS [tfidf] WHERE [tfidf].[StemId] = @StemId AND [tfidf].[DocumentId] = @Id)
        PRINT 'WEIGHT: ' + CAST(@Weight AS VARCHAR)
        IF (@Weight > 0.0)
        BEGIN
            DECLARE @QWeight FLOAT = (SELECT [Query].[Weight] FROM [Query] WHERE [Query].[StemId] = @StemId)
            SET @Score = @Score + (@QWeight * @Weight)
            PRINT 'Score: ' + CAST(@Score AS VARCHAR)
        END
        SET @Counter = @Counter + 1
    END
    UPDATE [QueryScores] SET Score = @Score WHERE Id = @Id
    FETCH NEXT FROM myCursor INTO @Id
END
CLOSE myCursor
DEALLOCATE myCursor
The logic is that I have a list of docs, and I have a question/query. I iterate through every doc and then have a nested iteration through the query terms/words to find whether the doc contains those terms. If it does, I add/multiply pre-calculated scores.
The problem is that you're trying to use a set-based language to iterate through things like a procedural language. SQL requires a different mindset. You should almost never be thinking in terms of loops in SQL.
From what I can gather from your code, this should do what you're trying to do in all of those loops, but it does it in a single statement in a set-based manner, which is what SQL is good at.
INSERT INTO QueryScores (id, score)
SELECT
D.id,
SUM(CASE WHEN W.[Weight] > 0 THEN W.[Weight] * Q.[Weight] ELSE NULL END)
FROM
Documents D
CROSS JOIN Query Q
LEFT OUTER JOIN TfidfWeights W ON W.StemId = Q.StemId AND W.DocumentId = D.id
GROUP BY
D.id
Of course, without a description of your requirements or sample data with expected output I don't know if this is actually what you're looking to get, but it's my best guess given your code.
You should read: https://stackoverflow.com/help/how-to-ask
The query I came up with is very similar to the one from Tom H.
There are a lot of unknowns about the problem the OP's code is trying to solve. Is there a particular reason the code only checks rows in the Query table where the Id value is between 0 and one less than the number of rows in the table? Or is the intent really just to get all of the rows from Query?
Here's my version:
INSERT INTO QueryScores (Id, Score)
SELECT d.Id
     , SUM(CASE WHEN w.Weight > 0 THEN w.Weight * q.Weight ELSE NULL END) AS Score
FROM [Documents] d
CROSS JOIN [Query] q
LEFT JOIN [TfidfWeights] w
    ON w.StemId = q.StemId
    AND w.DocumentId = d.Id
GROUP BY d.Id
Processing RBAR (row by agonizing row) is almost always going to be slower than processing as a set. SQL is designed to operate on sets of data. There is overhead for each individual SQL statement, and for each context switch between the procedure and the SQL engine. Sure, there might be room to improve performance of individual parts of the procedure, but the big gain is going to be doing an operation on the entire set, in a single SQL statement.
If there's some reason you need to process one document at a time using a cursor, then get rid of the loops, the individual selects, and all those PRINT statements, and just use a single query to get the score for each document.
OPEN myCursor
FETCH NEXT FROM myCursor INTO @Id
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE [QueryScores]
    SET Score = ( SELECT SUM(CASE WHEN w.Weight > 0
                                  THEN w.Weight * q.Weight
                                  ELSE NULL END)
                  FROM [Query] q
                  JOIN [TfidfWeights] w
                      ON w.StemId = q.StemId
                  WHERE w.DocumentId = @Id )
    WHERE Id = @Id
    FETCH NEXT FROM myCursor INTO @Id
END
CLOSE myCursor
DEALLOCATE myCursor
You might not even need Documents at all, since the inner join already restricts the result to documents that have weights:
INSERT INTO QueryScores (id, score)
SELECT W.DocumentId AS [id]
     , SUM(W.[Weight] * Q.[Weight]) AS [score]
FROM Query Q
JOIN TfidfWeights W
    ON W.StemId = Q.StemId
    AND W.[Weight] > 0
GROUP BY W.DocumentId

How can I improve this query for performance, provide status, and run as SQL job?

I have two tables, in two different databases. I am using one of the tables to update values in the other database table.
There are over 200,000 rows to iterate through, and it is taking several hours to run, on an Amazon c3.xlarge instance.
Below is the query I am running, and I am wondering three things:
Can this query be optimized to perform faster?
I would like to add a count to get the number of records actually updated. How?
How can I turn this into a SQL job?
DECLARE @id VARCHAR(12)         -- unique id
DECLARE @currentval VARCHAR(64) -- current value
DECLARE @newval VARCHAR(64)     -- updated value
DECLARE db_cursor1 CURSOR FOR
    SELECT b.[ID], a.status, b.[New Status]
    FROM db1.dbo.['account'] AS b
    INNER JOIN db2.dbo.accounttemp AS a ON a.ACCOUNTID = b.[ID]
OPEN db_cursor1
FETCH NEXT FROM db_cursor1 INTO @id, @currentval, @newval
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE db2.dbo.accounttemp
    SET status = @newval
    WHERE ACCOUNTID = @id
      AND STATUS = @currentval
    FETCH NEXT FROM db_cursor1 INTO @id, @currentval, @newval
END
CLOSE db_cursor1
DEALLOCATE db_cursor1
By reviewing the procedure, you will see that you can completely remove the cursor using the following SQL (note that when the target table is aliased in the FROM clause, the UPDATE must name the alias):
UPDATE at
SET at.status = a.[New Status]
FROM db2.dbo.accounttemp AS at
INNER JOIN db1.dbo.['account'] AS a ON a.[ID] = at.ACCOUNTID
Call the following line immediately after the update to return the rows it affected (RETURN works inside a stored procedure; use SELECT @@ROWCOUNT otherwise):
RETURN @@ROWCOUNT
You can create a SQL Server Agent job or maintenance plan to run this on a scheduled basis.
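Tying the three questions together, here is a hedged sketch of wrapping the set-based update in a stored procedure that reports the affected-row count. The procedure name SyncAccountStatus is invented, and the table and column names are assumptions carried over from the question:

```sql
-- Hedged sketch: wrap the update in a procedure so a SQL Agent job
-- can call it and the caller can read how many rows were touched.
CREATE PROCEDURE dbo.SyncAccountStatus
    @RowsUpdated INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON

    UPDATE at
    SET at.status = a.[New Status]
    FROM db2.dbo.accounttemp AS at
    INNER JOIN db1.dbo.['account'] AS a ON a.[ID] = at.ACCOUNTID

    -- @@ROWCOUNT must be read immediately after the statement it measures
    SET @RowsUpdated = @@ROWCOUNT
END
```

A SQL Server Agent job step can then simply EXEC the procedure on whatever schedule is needed, logging the output parameter if a history of update counts is wanted.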

SQL Server avoiding Cursors

I am just reading about SQL Server cursors, that I should avoid them as much as I can :)
Is there ALWAYS a way to write a query/function/procedure without cursors?
I have found some examples on the Net, but they are usually rather simple.
My example - can I avoid cursors?:
Let's have an update procedure X taking an account id and a transaction id - this is the unique key of the row I want to update
But there are more transactions for an account
SELECT accID, transID from table
Now I use a cursor to loop on the table, always taking the accID and transID to get the row and update it
Can I do it a non-cursor way?
Thanks
I was asked for more info about what I want to do; it is a bit too long, so I will add it here as a new comment. I cannot copy the code here, so I am trying to capture at least the gist.
My code looked like:
DECLARE @headAcc varchar(20)
SET @headAcc = 'ACC111'
DECLARE @accID varchar(20), @transID varchar(20)
DECLARE cur CURSOR FOR
    SELECT accID, transID FROM Table1
    WHERE accID LIKE @headAcc + '%' AND ...
    ORDER BY 1 DESC
OPEN cur
FETCH NEXT FROM cur INTO @accID, @transID
WHILE @@FETCH_STATUS = 0
BEGIN
    -- 1
    UPDATE Table2
    SET colX = ...
    WHERE accID = @accID AND transID = @transID AND ...
    -- 2
    UPDATE Table3
    SET colY = ...
    WHERE accID = @accID AND transID = @transID AND ...
    FETCH NEXT FROM cur INTO @accID, @transID
END
CLOSE cur
DEALLOCATE cur
Thanks for the help in answers and link in comments, I was not very familiar with UPDATE JOINs, and that should be the answer.
After reading the articles I came up with 2 updates in form like:
DECLARE @headAcc varchar(20)
SET @headAcc = 'ACC111'

UPDATE t2
SET t2.colX = ...
FROM Table2 t2
JOIN Table1 t1
    ON t1.accID = t2.accID
    AND t1.transID = t2.transID
WHERE t1.accID LIKE @headAcc + '%'
AND ...
It seems to work. Any other comments are appreciated, e.g. whether there is a more effective way.
See this example from MSDN on how to update one table based on data in other tables:
USE AdventureWorks2012;
GO
UPDATE Sales.SalesPerson
SET SalesYTD = SalesYTD + SubTotal
FROM Sales.SalesPerson AS sp
JOIN Sales.SalesOrderHeader AS so
ON sp.BusinessEntityID = so.SalesPersonID
AND so.OrderDate = (SELECT MAX(OrderDate)
FROM Sales.SalesOrderHeader
WHERE SalesPersonID = sp.BusinessEntityID);
GO

Efficient SQL Server stored procedure

I am using SQL Server 2008 and running the following stored procedure, which needs to "clean" a 70-million-row table by moving about 50 million rows to another table; id_col is an integer (primary identity key).
According to my last test run it works, but it is expected to take about 200 days:
SET NOCOUNT ON
-- define the last ID handled
DECLARE @LastID integer
SET @LastID = 0
DECLARE @tempDate datetime
SET @tempDate = dateadd(dd, -20, getdate())
-- define the ID to be handled now
DECLARE @IDToHandle integer
DECLARE @iCounter integer
DECLARE @watch1 nvarchar(50)
DECLARE @watch2 nvarchar(50)
SET @iCounter = 0
-- select the next to handle
SELECT TOP 1 @IDToHandle = id_col
FROM MAIN_TABLE
WHERE id_col > @LastID AND DATEDIFF(DD, someDateCol, otherDateCol) < 1
  AND DATEDIFF(DD, someDateCol, @tempDate) > 0
  AND (some_other_int_col = 1745 OR some_other_int_col = 1548 OR some_other_int_col = 4785)
ORDER BY id_col
-- as long as we have s......
WHILE @IDToHandle IS NOT NULL
BEGIN
    IF ((SELECT COUNT(1) FROM SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS WHERE some_int_col = @IDToHandle) = 0
        AND (SELECT COUNT(1) FROM A_70k_rows_table WHERE some_int_col = @IDToHandle) = 0)
    BEGIN
        INSERT INTO SECONDERY_TABLE
        SELECT col1, col2, col3.....
        FROM MAIN_TABLE WHERE id_col = @IDToHandle
        EXEC [dbo].[DeleteByID] @ID = @IDToHandle -- deletes the row from 2 other tables related to MAIN_TABLE and then from MAIN_TABLE
        SET @iCounter = @iCounter + 1
    END
    IF (@iCounter % 1000 = 0)
    BEGIN
        SET @watch1 = 'iCounter - ' + CAST(@iCounter AS VARCHAR)
        SET @watch2 = 'IDToHandle - ' + CAST(@IDToHandle AS VARCHAR)
        RAISERROR (@watch1, 10, 1) WITH NOWAIT
        RAISERROR (@watch2, 10, 1) WITH NOWAIT
    END
    -- set the last handled to the one we just handled
    SET @LastID = @IDToHandle
    SET @IDToHandle = NULL
    -- select the next to handle
    SELECT TOP 1 @IDToHandle = id_col
    FROM MAIN_TABLE
    WHERE id_col > @LastID AND DATEDIFF(DD, someDateCol, otherDateCol) < 1
      AND DATEDIFF(DD, someDateCol, @tempDate) > 0
      AND (some_other_int_col = 1745 OR some_other_int_col = 1548 OR some_other_int_col = 4785)
    ORDER BY id_col
END
Any ideas or directions to improve this procedure's run time are welcome.
Yes, try this:
DECLARE @Ids TABLE (id int PRIMARY KEY NOT NULL)

INSERT @Ids (id)
SELECT id_col
FROM MAIN_TABLE m
WHERE someDateCol >= otherDateCol
  AND someDateCol < @tempDate -- If there are times in these datetime fields,
                              -- then you may need to modify this condition.
  AND some_other_int_col IN (1745, 1548, 4785)
  AND NOT EXISTS (SELECT * FROM SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS
                  WHERE some_int_col = m.id_col)
  AND NOT EXISTS (SELECT * FROM A_70k_rows_table
                  WHERE some_int_col = m.id_col)

SELECT id FROM @Ids -- this is to confirm the above code generates the correct list of Ids
RETURN -- this line stops (does not do the inserts/deletes) until you have verified @Ids is correct
-- Once you have verified that @Ids above is correctly populated,
-- then delete or comment out the SELECT and RETURN lines above so the insert runs.

BEGIN TRANSACTION

DELETE ot -- eliminate row-by-row call to second stored proc
FROM OtherTable ot
JOIN MAIN_TABLE m ON m.id_col = ot.FKCol
JOIN @Ids i ON i.id = m.id_col

INSERT SECONDERY_TABLE (col1, col2, etc.)
SELECT col1, col2, col3.....
FROM MAIN_TABLE m JOIN @Ids i ON i.id = m.id_col

DELETE m -- eliminate row-by-row call to second stored proc
FROM MAIN_TABLE m
JOIN @Ids i ON i.id = m.id_col

COMMIT TRANSACTION
Explanation:
You had numerous filtering conditions that were not SARGable, i.e., they would force a complete table scan for every iteration of your loop, instead of being able to use any existing index. Always try to avoid filter conditions that apply processing logic to a table column value before comparing it to some other value. This eliminates the opportunity for the query optimizer to use an index.
You were executing the inserts one at a time... Way better to generate a list of PK Ids that need to be processed (all at once) and then do all the inserts at once, in one statement.
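As a concrete illustration of the SARGability point, using the column names from the question, compare the two predicates below. At day granularity they select roughly the same rows (the answer's inline comment notes the time-of-day caveat), but only the second lets the optimizer seek on an index over someDateCol:

```sql
-- Not SARGable: DATEDIFF is applied to the column, so every row must be
-- evaluated (full scan) before the comparison can happen.
SELECT id_col FROM MAIN_TABLE
WHERE DATEDIFF(dd, someDateCol, @tempDate) > 0

-- SARGable: the bare column sits on one side of the comparison, so an
-- index on someDateCol can be used for a range seek.
SELECT id_col FROM MAIN_TABLE
WHERE someDateCol < @tempDate
```

The rule of thumb: keep the column naked on its side of the operator and push any computation onto the constant or variable side.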

SQL while loop with Temp Table

I need to create a temporary table and then update the original table. Creating the temporary table is not a problem.
create table #mod_contact
(
id INT IDENTITY NOT NULL PRIMARY KEY,
SiteID INT,
Contact1 varchar(25)
)
INSERT INTO #mod_contact (SiteID, Contact1)
select r.id, r.Contact from dbo.table1 r where CID = 142
GO
Now I need to loop through the table and update r.contact = SiteID + r.contact
I have never used a while loop before and can't seem to make any examples I have seen work.
You can do this in multiple ways, but I think you're looking for a way using a cursor.
A cursor is sort of a pointer into a table which, when incremented, points to the next record (it's more or less analogous to a for-next loop).
To use a cursor you can do the following:
-- DECLARE the cursor
DECLARE CUR CURSOR FAST_FORWARD READ_ONLY FOR SELECT id, SiteID, Contact1 FROM #mod_contact
-- DECLARE some variables to store the values in
DECLARE @varId int
DECLARE @varSiteId int
DECLARE @varContact varchar(25)
-- Use the cursor
OPEN CUR
FETCH NEXT FROM CUR INTO @varId, @varSiteId, @varContact
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.table1
    SET Contact = CAST(@varSiteId AS varchar(25)) + @varContact -- CAST needed: SiteID is an int, Contact a varchar
    WHERE id = @varId
    FETCH NEXT FROM CUR INTO @varId, @varSiteId, @varContact
END
CLOSE CUR
DEALLOCATE CUR
It's not the most efficient way to get this done, but I think this is what you were looking for.
Hope it helps.
Use a set-based approach; there is no need to loop (going by the limited details given):
UPDATE r
SET r.Contact = m.SiteID + r.Contact
FROM table1 r
INNER JOIN #mod_contact m
    ON m.id = r.id
Your brain wants to do this:
while i <= record_count
    update(i)  // update record i
    i = i + 1
end while
SQL is set based and allows you to take a whole bunch of records and update them in a single command. The beauty of this is you can use the WHERE clause to filter certain rows that are not needed.
As others have mentioned, learning how to do loops in SQL is generally a bad idea; however, since you're trying to understand how to do something, here's an example:
DECLARE @id int
SELECT @id = 1
WHILE @id <= (SELECT MAX(ID) FROM table_1)
-- while some condition is true, do the following
-- actions between the BEGIN and END
BEGIN
    UPDATE table_1
    SET contact = CAST(siteID AS varchar(100)) + contact
    WHERE table_1.ID = @id
    -- increment the step variable so that the condition will eventually be false
    SET @id = @id + 1
END
-- do something else once the condition is satisfied
PRINT 'DONE!! Don''t try this in production code...'
Try this one:
-- DECLARE the cursor
DECLARE CUR CURSOR FAST_FORWARD READ_ONLY FOR SELECT column1, column2 FROM table
-- DECLARE some variables to store the values in
DECLARE @varId int
DECLARE @varSiteId int
--DECLARE @varContract varchar(25)
-- Use the cursor
OPEN CUR
FETCH NEXT FROM CUR INTO @varId, @varSiteId
WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT *
    FROM Table2
    WHERE column1 = @varId
      AND column2 = @varSiteId
    FETCH NEXT FROM CUR INTO @varId, @varSiteId
END
CLOSE CUR
DEALLOCATE CUR
"need to create a temporary table and then update the original table."
Why use a temporary table at all? Your CID column doesn't appear in the temporary table, so I don't see how you can successfully update the original table using SiteID, unless there is only one row where CID = 142, in which case using a temp table is definitely overkill.
You can just do this:
UPDATE dbo.table1
SET contact = SiteID + contact
WHERE CID = 142;
Here's a related example which may help getting you to 'think in SQL':
UPDATE T
SET A = B, B = A;
Assuming A and B are of the same type, this would successfully swap their values.
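The swap works because an UPDATE evaluates every right-hand-side expression against the row's pre-update values, so the second assignment still sees the old A. A quick self-contained illustration (the table name and values are invented):

```sql
-- Hypothetical demo: both assignments read the pre-update row.
CREATE TABLE T (A int, B int);
INSERT INTO T VALUES (1, 2);

UPDATE T SET A = B, B = A;

SELECT A, B FROM T; -- returns A = 2, B = 1: the values have swapped
```

Contrast this with procedural languages, where `a = b; b = a;` would leave both variables equal to the original b; the set-based semantics are exactly the mindset shift the answers above keep pointing at.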