Deleting more than 6 million rows is SLOW - sql

I have a stored procedure that needs to delete more than 6 million rows.
I tried the approach below, but the execution time is still SLOW.
DECLARE @continue BIT = 1
-- delete all ids not between starting and ending ids
WHILE @continue = 1
BEGIN
    SET @continue = 0
    DELETE TOP (10000) u
    FROM Table1 u WITH (READPAST)
    WHERE ID = @ID
      AND NID IN (SELECT NID FROM #Node GROUP BY NID)
    IF @@ROWCOUNT > 0
        SET @continue = 1
END
Any other suggestions?

Your loop logic looks OK.
As others have said, look at the application design.
Can you use staging tables, so that you can truncate instead of delete?
If the table has foreign keys and you are confident you are not going to violate them, then disable the FKs and re-enable them afterwards (see the sketch after the code below).
If you have a delete trigger and you need to delete 6 million rows every day, then you really need to re-evaluate the design.
Look at optimizing the select, then put it in the delete:
SELECT TOP (10000) u.*
FROM Table1 u WITH (READPAST)
JOIN #Node
    ON u.NID = #Node.NID
    AND u.ID = @ID
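Once that select is fast, the same join can drive the delete. A minimal sketch along those lines (same names as above):

DELETE TOP (10000) u
FROM Table1 u WITH (READPAST)
JOIN #Node
    ON u.NID = #Node.NID
    AND u.ID = @ID

And for the FK suggestion above, disabling and re-enabling a constraint looks roughly like this (FK_Table1_Parent is a placeholder name, not from the original question):

-- disable the FK; rows are NOT validated while it is off
ALTER TABLE Table1 NOCHECK CONSTRAINT FK_Table1_Parent;
-- ... run the batched deletes here ...
-- re-enable WITH CHECK so existing rows are re-validated and the FK stays trusted
ALTER TABLE Table1 WITH CHECK CHECK CONSTRAINT FK_Table1_Parent;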

Related

Optimistic concurrency while in transaction

In trying to fix data errors due to concurrency conflicts I realized I'm not completely sure how optimistic concurrency works in SQL Server. Assume READ_COMMITTED isolation level. A similar example:
BEGIN TRAN
SELECT * INTO #rows FROM SourceTable s WHERE s.New = 1
UPDATE d SET Property = 'HelloWorld' FROM DestinationTable d INNER JOIN #rows r ON r.Key = d.Key
UPDATE s SET Version = GenerateRandomVersion() FROM SourceTable s
INNER JOIN #rows r on r.Key = s.Key AND r.Version = s.Version
IF @@ROWCOUNT <> (SELECT COUNT(*) FROM #rows)
    RAISERROR('Version mismatch', 16, 1)
COMMIT TRAN
Is this completely atomic / thread safe?
The ON clause on UPDATE s should prevent concurrent updates via the Version and ROWCOUNT check. But is that really true? What about the following, similar query?
BEGIN TRAN
SELECT * INTO #rows FROM SourceTable s WHERE s.New = 1
UPDATE s SET New = 0, Version = GenerateRandomVersion() FROM SourceTable s
INNER JOIN #rows r on r.Key = s.Key AND r.Version = s.Version
IF @@ROWCOUNT <> (SELECT COUNT(*) FROM #rows)
    RAISERROR('Version mismatch', 16, 1)
UPDATE d SET Property = 'HelloWorld' FROM DestinationTable d INNER JOIN #rows r ON r.Key = d.Key
COMMIT TRAN
My worry here is that concurrent executions of the above script will both reach the UPDATE s statement, get a @@ROWCOUNT based on data that is transient / not actually committed to the DB yet, so both threads / executions will continue past the IF statement and perform the important UPDATE d statement, which in this case is idempotent but is not so in my original production case.
I think what you want to do is remove the very small race condition in your script by making it as set based as possible, e.g.
BEGIN TRAN
DECLARE @UpdatedSources TABLE ([Key] INT NOT NULL);
UPDATE s SET New = 0
OUTPUT Inserted.[Key] INTO @UpdatedSources
FROM SourceTable s
WHERE s.New = 1
UPDATE d SET Property = 'HelloWorld'
FROM DestinationTable d
INNER JOIN @UpdatedSources r ON r.[Key] = d.[Key]
COMMIT TRAN
I think the 'version' column in your table is confusing things - you're trying to build atomicity into your table rather than just letting the DB transactions handle that. With the script above, the rows where New=1 will be locked until the transaction commits, so subsequent attempts will only find either 'actually' new rows or rows where new=0.
Update after comment
To demonstrate the locking on the table, if it is something you want to see, then you could try and initiate a deadlock. If you were to run this query concurrently with the first one, I think you may eventually deadlock, though depending on how quickly these run, you may struggle to see it:
BEGIN TRAN
SELECT *
FROM DestinationTable d
INNER JOIN SourceTable s ON s.Key = d.Key
WHERE s.New = 1
UPDATE s SET New = 0
FROM SourceTable s
WHERE s.New = 1
COMMIT TRAN

How to loop through a DELETE one row at a time

I am wondering if there is a way to restructure the below SQL so that one row is deleted at a time, as opposed to performing the delete in one mass operation. The reason is that the delete action causes a trigger on this table to execute, and (in cases where a USER_ID has more than 1 row) it attempts to insert data into another table that has a datetime stamp as a key; the same time (to the millisecond) is inserted more than once, causing a duplicate key insert error.
DELETE ORDERS
FROM LINE_ORDER ORDERS
INNER JOIN LINE_ORDER_XREF B ON B.OPRID = ORDERS.USER_ID
WHERE B.USERID = 'SYSACCT'
The thought was that if each row is deleted separately in its own transaction, then each datetime stamp will be unique. The number of delete operations will be low and the additional processing time is not a concern in this case. Is it possible to structure this as some loop, or use a cursor? The primary key columns in LINE_ORDER (USER_ID and USER_ROLE) are varchar columns, so I don't believe I can increment them.
USER_ID USER_ROLE DYNAMIC_SW
11000_600 E_SAML N
11000_602 E_SAML N
11000_602 SUPRV N
11000_604 E_PRO N
11000_605 E_SAML N
Well, you can use TOP for this purpose:
DELETE o
FROM (SELECT TOP (1) o.*
FROM LINE_ORDER o INNER JOIN
LINE_ORDER_XREF lox
ON lox.OPRID = o.USER_ID
WHERE lox.USERID = 'SYSACCT'
) o;
You then need to embed this in a loop to delete all the matching values.
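For example, a minimal sketch of that loop, stopping once the delete finds no more rows:

WHILE 1 = 1
BEGIN
    DELETE o
    FROM (SELECT TOP (1) o.*
          FROM LINE_ORDER o
          INNER JOIN LINE_ORDER_XREF lox ON lox.OPRID = o.USER_ID
          WHERE lox.USERID = 'SYSACCT'
         ) o;
    IF @@ROWCOUNT = 0 BREAK; -- nothing left to delete
END;

Because each iteration is a separate single-row DELETE statement, the trigger fires once per row, which is what the question needs.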
You can try this:
DECLARE @i INT =
(
    SELECT COUNT(*)
    FROM LINE_ORDER ORDERS
    INNER JOIN LINE_ORDER_XREF B ON B.OPRID = ORDERS.USER_ID
    WHERE B.USERID = 'SYSACCT'
);
DECLARE @count INT = 1;
WHILE (@count <= @i)
BEGIN
    DELETE TOP (1)
    ORDERS
    FROM LINE_ORDER ORDERS
    INNER JOIN LINE_ORDER_XREF B ON B.OPRID = ORDERS.USER_ID
    WHERE B.USERID = 'SYSACCT';
    SET @count = @count + 1;
END;
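The asker also mentioned cursors. For completeness, a cursor-based sketch (the varchar lengths are assumptions; the question only says the key columns are varchar):

DECLARE @uid VARCHAR(50), @role VARCHAR(50);
DECLARE del_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT o.USER_ID, o.USER_ROLE
    FROM LINE_ORDER o
    INNER JOIN LINE_ORDER_XREF B ON B.OPRID = o.USER_ID
    WHERE B.USERID = 'SYSACCT';
OPEN del_cur;
FETCH NEXT FROM del_cur INTO @uid, @role;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- one single-row delete per fetch, so the trigger fires once per row
    DELETE FROM LINE_ORDER WHERE USER_ID = @uid AND USER_ROLE = @role;
    FETCH NEXT FROM del_cur INTO @uid, @role;
END;
CLOSE del_cur;
DEALLOCATE del_cur;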

Slow SQL Update using sub query

Hi all … I wonder if anyone out there can help me with this one, please.
I am running a query to update product categories against sales lines and need to backfill a few million records, so I wrote the query below to run for a specific order ID:
DECLARE @ID INT
SET @ID = 659483

UPDATE [TradeSpace].[TradeSpace].[dbo].[SalesLine]
SET [ProductCategory] = [curSync].[pc_Cat]
FROM (SELECT [SC_ID],
             [pc_cat]
      FROM [MW_MereSys].[dbo].[MWSLines]
      INNER JOIN [MW_MereSys].[dbo].[MWProductCats]
          ON [MWSLines].[pc_catref] = [MWProductCats].[pc_catref]
      WHERE [sh_id] = @ID
     ) AS [curSync]
WHERE [SalesLine].[slID] = [curSync].[sc_id]
  AND [salesline].[soid] = @ID
The sub SELECT runs in less than one second, but the update has yet to finish (I have left it for an hour at most). Indexes exist for [slID] and [soid]; a manual update for one line takes less than one second, but run like this (10 lines) it is desperately slow.
Does anybody have any clues, please? I've written plenty of queries like this and never had a problem … stumped :(
Your query rewritten with no changes:
UPDATE s SET
ProductCategory = curSync.pc_Cat
FROM TradeSpace.TradeSpace.dbo.SalesLine s
INNER JOIN
(
SELECT [SC_ID], [pc_cat]
FROM [MW_MereSys].[dbo].[MWSLines] l
INNER JOIN [MW_MereSys].[dbo].[MWProductCats] c ON l.[pc_catref] = c.[pc_catref]
WHERE [sh_id] = @ID
) AS [curSync]
on s.[slID] = [curSync].[sc_id]
WHERE s.[soid] = #ID
Are you sure everything is correct here? Does a single row from SalesLine always match only one row from the subquery?
Try this then. It will fail if that is not true; the original query would silently update the same row with different values in that situation.
UPDATE s SET
ProductCategory = (
SELECT [pc_cat]
FROM [MW_MereSys].[dbo].[MWSLines] l
INNER JOIN [MW_MereSys].[dbo].[MWProductCats] c ON l.[pc_catref] = c.[pc_catref]
WHERE [sh_id] = @ID
AND [sc_id] = s.[slID]
)
FROM TradeSpace.TradeSpace.dbo.SalesLine s
WHERE s.[soid] = #ID
And please check estimated execution plan. Does it hit indexes?
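A quick way to check (my suggestion, not part of the original answer) is to enable runtime statistics in the session before running the statement:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- run the UPDATE, then check the Messages tab for scan counts and logical reads;
-- index seeks in the actual plan are what you want to see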
We need other details, as I mentioned in the comments.
Your update is slow because of a very high cardinality estimate when the updated table is joined with the subquery result.
It may be because of a wrong join or WHERE predicate.
You can put the subquery result in a #temp table and try; you can also create the same index on the #temp table.
DECLARE @ID INT
SET @ID = 659483

CREATE TABLE #temp ([SC_ID] int, [pc_cat] int)

INSERT INTO #temp
SELECT [SC_ID],
       [pc_cat]
FROM [MW_MereSys].[dbo].[MWSLines]
INNER JOIN [MW_MereSys].[dbo].[MWProductCats]
    ON [MWSLines].[pc_catref] = [MWProductCats].[pc_catref]
WHERE [sh_id] = @ID

UPDATE SalesLine
SET [ProductCategory] = [curSync].[pc_Cat]
FROM [TradeSpace].[TradeSpace].[dbo].[SalesLine] AS SalesLine
INNER JOIN #temp AS [curSync]
    ON [SalesLine].[slID] = [curSync].[sc_id]
WHERE [salesline].[soid] = @ID

DROP TABLE #temp
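To follow the indexing suggestion, something like this could go right after the INSERT into #temp (the index name is a placeholder of mine):

CREATE CLUSTERED INDEX IX_temp_sc_id ON #temp ([SC_ID]);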

How to properly merge these 2 queries into one update?

This currently works, but I would like to change the update statement to include the action of the insert below it. Is it possible?
UPDATE cas
SET [Locked] = CASE WHEN cas.Locked <> @TargetState AND cas.LastChanged = filter.SourceDateTime THEN @TargetState ELSE cas.[Locked] END
OUTPUT inserted.Id, inserted.Locked, CASE WHEN inserted.Locked = @TargetState AND
            inserted.LastChanged = filter.SourceDateTime THEN 1
       WHEN inserted.LastChanged <> filter.SourceDateTime THEN -1 -- out of sync
       WHEN deleted.Locked = @TargetState THEN -2 -- was not in a good state
       ELSE 0 END -- generic failure
INTO #OUTPUT
FROM dbo.Target cas WITH(READPAST, UPDLOCK, ROWLOCK) INNER JOIN #table filter ON cas.Id = filter.Id

INSERT INTO #OUTPUT
SELECT filter.id, NULL, CASE WHEN cas.id IS NOT NULL THEN -3 -- row was/is locked
       ELSE -4 END -- not found
FROM #table filter LEFT JOIN dbo.target cas WITH(NOLOCK) ON filter.id = cas.id
WHERE NOT EXISTS (SELECT 1 FROM #OUTPUT result WHERE filter.id = result.UpdatedId)
I do not think what you want is possible.
You start with a table to be updated. Let’s say this table contains a set of IDs, say, 1 to 6
You join onto a temp table containing a different set of IDs that may partially overlap (say, 4 to 9)
You issue the update using an inner join. Only rows 4 to 6 are updated
The output clause picks up data only for modified rows, so you only get data for rows 4 to 6
If you flipped this to an outer join (such that all temp table rows are selected), you still only update rows 4 to 6, and the output clause still only kicks out data for rows 4 to 6
So, no, I see no way of achieving this goal in a single SQL statement.

How can I do a SQL UPDATE in batches, like an Update Top?

Is it possible to add a TOP or some sort of paging to a SQL Update statement?
I have an UPDATE query, that comes down to something like this:
UPDATE XXX SET XXX.YYY = #TempTable.ZZZ
FROM XXX
INNER JOIN (SELECT SomeFields ... ) #TempTable ON XXX.SomeId=#TempTable.SomeId
WHERE SomeConditions
This update will affect millions of records, and I need to do it in batches, like 100,000 at a time (the ordering doesn't matter).
What is the easiest way to do this?
Yes, I believe you can use TOP in an update statement, like so:
UPDATE TOP (10000) XXX SET XXX.YYY = #TempTable.ZZZ
FROM XXX
INNER JOIN (SELECT SomeFields ... ) #TempTable ON XXX.SomeId=#TempTable.SomeId
WHERE SomeConditions
You can use SET ROWCOUNT { number | @number_var }; it limits the number of rows processed before stopping the specific query. Example below:
SET ROWCOUNT 10000 -- define maximum updated rows at once
UPDATE XXX SET
XXX.YYY = #TempTable.ZZZ
FROM XXX
INNER JOIN (SELECT SomeFields ... ) #TempTable ON XXX.SomeId = #TempTable.SomeId
WHERE XXX.YYY <> #TempTable.ZZZ and OtherConditions
-- after everything is updated, don't forget to reset the limit:
SET ROWCOUNT 0
I've added XXX.YYY <> #TempTable.ZZZ to the WHERE clause to make sure you will not update an already updated value twice.
Setting ROWCOUNT to 0 turns the limit off - don't forget about it. Note that SET ROWCOUNT is deprecated for INSERT, UPDATE, and DELETE statements and may stop affecting them in a future SQL Server release, so the TOP approach is generally preferred for new work.
You can do something like the following
DECLARE @i INT = 1
WHILE @i <= 10
BEGIN
    UPDATE TOP (10) PERCENT mt
    SET colToUpdate = lt.valCol
    FROM masterTable AS mt
    INNER JOIN lookupTable AS lt
        ON mt.colKey = lt.colKey
    WHERE colToUpdate IS NULL
    PRINT @i
    SET @i += 1
END
-- one final update without TOP (assuming lookupTable.valCol is mostly not null)
UPDATE mt --TOP (10) PERCENT
SET colToUpdate = lt.valCol
FROM masterTable AS mt
INNER JOIN lookupTable AS lt
    ON mt.colKey = lt.colKey
WHERE colToUpdate IS NULL
Depending on your ability to change the data structure of the table, I would suggest adding a field that can hold some sort of batch identifier, i.e. it can be a date stamp if you do it daily, an incremental value, or basically any value that you can make unique for your batch. If you take the incremental approach, your update will then be:
UPDATE TOP (100000) XXX SET XXX.BATCHID = 1, XXX.YYY = ....
...
WHERE XXX.BATCHID < 1
AND (rest of WHERE-clause here).
Next time, you'll set BATCHID = 2 and use WHERE XXX.BATCHID < 2.
If this is to be done repeatedly, you can put an index on BATCHID and reduce the load on the server.
DECLARE @updated_Rows INT;
SET @updated_Rows = 1;
WHILE (@updated_Rows > 0)
BEGIN
    -- SomeConditions must stop matching rows once they are updated,
    -- otherwise this loop will never terminate
    UPDATE TOP (10000) XXX SET XXX.YYY = #TempTable.ZZZ FROM XXX
    INNER JOIN #TempTable ON XXX.SomeId = #TempTable.SomeId
    WHERE SomeConditions
    SET @updated_Rows = @@ROWCOUNT;
END