Is it possible to do batch operations with the T-SQL "merge" statement?

With a simple UPDATE statement we can do it in batches when dealing with huge tables.
WHILE 1 = 1
BEGIN
    UPDATE TOP (5000) dbo.LargeOrders
    SET CustomerID = N'ABCDE'
    WHERE CustomerID = N'OLDWO';

    IF @@ROWCOUNT < 5000
        BREAK;
END
When working with a MERGE statement, is it possible to do something similar? As far as I know it isn't, because you need to perform different operations depending on the condition, for example UPDATE when matched and INSERT when not matched. I just want to confirm this; if it's true, I may need to switch to the old-school UPDATE and INSERT.

Why not use a temp table as the source of your MERGE and handle the batching via that source table? I do this in a project of mine: I batch 50,000 rows into a temp table and then use the temp table as the source for the MERGE, looping until all rows are processed.
The source could be a physical table or an in-memory table.
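A minimal sketch of that pattern, assuming a source table dbo.AllSource with a sequential RowNum column and a target dbo.Target with Id/Val columns (all names here are illustrative, not from the original project):

```sql
-- Sketch only: dbo.AllSource, dbo.Target, RowNum, Id, and Val are hypothetical names.
DECLARE @BatchSize int = 50000, @MinRow bigint = 1, @MaxRow bigint;

SELECT @MaxRow = MAX(RowNum) FROM dbo.AllSource;

WHILE @MinRow <= @MaxRow
BEGIN
    -- Load the next slice into a temp table that acts as the MERGE source
    SELECT *
    INTO #Batch
    FROM dbo.AllSource
    WHERE RowNum BETWEEN @MinRow AND @MinRow + @BatchSize - 1;

    MERGE dbo.Target AS T
    USING #Batch AS S
        ON T.Id = S.Id
    WHEN MATCHED THEN
        UPDATE SET T.Val = S.Val
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Id, Val) VALUES (S.Id, S.Val);

    DROP TABLE #Batch;
    SET @MinRow += @BatchSize;
END
```

Because each MERGE only sees one slice of the source, each iteration touches at most @BatchSize target rows, and locks are released between iterations.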

Related

Performance problem of MERGE statement in Azure Synapse

My DWH is deployed on an Azure Synapse SQL pool.
I loaded data into the DWH with a script consisting of update, insert, and delete (u-i-d) operations. A full load into the target table took 12 minutes for nearly 50 million rows.
Recently I tried a MERGE statement instead of u-i-d, and found that MERGE performs much worse: 1 hour for MERGE against 12 minutes for u-i-d!
Please share your experience with the MERGE statement on Azure Synapse!
Does MERGE really perform worse in Synapse than separate update-insert-delete operations?
Per the Microsoft documentation on MERGE (Transact-SQL) - SQL Server | Microsoft Learn, MERGE works better for complex matching scenarios, while for simple operations separate INSERT, UPDATE, and DELETE statements work better:
The conditional behavior described for the MERGE statement works best when the two tables have a complex mixture of matching characteristics. For example, inserting a row if it doesn't exist, or updating a row if it matches. When simply updating one table based on the rows of another table, improve the performance and scalability with basic INSERT, UPDATE, and DELETE statements.
I tried to repro and compare both approaches using simple statements.
Sample tables were created as shown in the image below.
A MERGE statement was used to merge them, and it took 9 seconds:
MERGE Products AS TARGET
USING UpdatedProducts AS SOURCE
    ON (TARGET.ProductID = SOURCE.ProductID)
-- When records match, update them if anything has changed
WHEN MATCHED AND (TARGET.ProductName <> SOURCE.ProductName OR TARGET.Rate <> SOURCE.Rate)
    THEN UPDATE SET TARGET.ProductName = SOURCE.ProductName, TARGET.Rate = SOURCE.Rate
-- When no matching record exists in the target, insert the incoming source rows
WHEN NOT MATCHED BY TARGET
    THEN INSERT (ProductID, ProductName, Rate) VALUES (SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate)
-- When a row exists in the target but not in the source, delete it from the target
WHEN NOT MATCHED BY SOURCE
    THEN DELETE;
Then I tried separate UPDATE, INSERT, and DELETE statements; together they took close to 0 seconds.
Separate update, insert, and delete statements work better for simple scenarios.
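For reference, the separate statements equivalent to the MERGE above might look like this (a sketch against the same Products/UpdatedProducts tables, not the exact script used in the test):

```sql
-- Update rows that exist in both tables and have changed
UPDATE T
SET T.ProductName = S.ProductName, T.Rate = S.Rate
FROM Products AS T
JOIN UpdatedProducts AS S ON T.ProductID = S.ProductID
WHERE T.ProductName <> S.ProductName OR T.Rate <> S.Rate;

-- Insert rows that exist only in the source
INSERT INTO Products (ProductID, ProductName, Rate)
SELECT S.ProductID, S.ProductName, S.Rate
FROM UpdatedProducts AS S
WHERE NOT EXISTS (SELECT 1 FROM Products AS T WHERE T.ProductID = S.ProductID);

-- Delete rows that exist only in the target
DELETE T
FROM Products AS T
WHERE NOT EXISTS (SELECT 1 FROM UpdatedProducts AS S WHERE S.ProductID = T.ProductID);
```

Each statement is a simple join or anti-join, which the engine can plan and distribute independently.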

SQL Merge and large table data

I will be running a MERGE query against over a million records in my source table, inserting into my target table. The source table I'm doing the SELECT from in the MERGE is in production, and an application with many users will be hitting it with SELECT, INSERT, UPDATE, and DELETE at the same time. My MERGE statement will NOT modify the source table, only the target table. I will have snapshot isolation enabled, so there is no reason to use the NOLOCK hint.
Is there a way to have the query run in batches, or is having the MERGE statement scan the entire table more efficient? I have two other MERGE statements I'll run after the initial insert to apply INSERT, UPDATE, and DELETE changes to the target table. Are there any precautions I should take so as not to cause performance issues for the production application? I'm going to use a stored procedure because I will be running these queries against multiple tables, performing the same function over and over.
My sample initial MERGE:
MERGE dl178 AS TARGET
USING dlsd178 AS SOURCE
    ON (TARGET.docid = SOURCE.docid AND TARGET.objectid = SOURCE.objectid
        AND TARGET.pagenum = SOURCE.pagenum AND TARGET.subpagenum = SOURCE.subpagenum
        AND TARGET.pagever = SOURCE.pagever AND TARGET.pathid = SOURCE.pathid
        AND TARGET.annote = SOURCE.annote)
WHEN NOT MATCHED BY TARGET
    THEN INSERT (docid, pagenum, subpagenum, pagever, objectid, pathid, annote, formatid, ftoffset, ftcount)
    VALUES (SOURCE.docid, SOURCE.pagenum, SOURCE.subpagenum, SOURCE.pagever,
            SOURCE.objectid, SOURCE.pathid, SOURCE.annote, SOURCE.formatid, SOURCE.ftoffset, SOURCE.ftcount)
OUTPUT $action, Inserted.*;
Have you considered Partition Switching or Change Data Capture? It sounds like you are trying to start an ongoing ETL process.

Splitting a merge into batches in SQL Server 2008

I am trying to find the best way to perform a SQL Server merge in batches.
TEMPTABLENAME is a temp table with an id column (and all the other columns).
I am thinking some kind of while loop using the id as a counter.
The real problem with the MERGE statement is that it locks the entire table for an extended amount of time when processing 200K+ records. I want to loop every 100 rows or so, releasing the lock so that other applications can access the table. The table has millions of rows and also fires an audit every time data is updated, which causes 160K records to take around 20 to 30 minutes in the merge below.
The merge code below is a sample; in reality about 25 columns get updated/inserted.
I would be open to another way of inserting the data besides MERGE. I just cannot change the audit system or the number of records in the table.
merge Employee as Target
using TEMPTABLENAME as Source on (Target.ClientId = Source.ClientId and
Target.EmployeeReferenceId = Source.EmployeeReferenceId)
when matched then
update
set
Target.FirstName = Source.FirstName,
Target.MiddleName = Source.MiddleName,
Target.LastName = Source.LastName,
Target.BirthDate = Source.BirthDate
when not matched then
INSERT ([FirstName], [MiddleName], [LastName], [BirthDate])
VALUES (Source.FirstName, Source.MiddleName, Source.LastName, Source.BirthDate)
OUTPUT $action INTO #SummaryOfChanges;
SELECT Change, COUNT(*) AS CountPerChange
FROM #SummaryOfChanges
GROUP BY Change;
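A sketch of the id-keyed while loop the question is considering, using the temp table's id column to slice the work (the batch size of 100 and the loop shape are illustrative, not a tested recommendation):

```sql
-- Sketch: batch the MERGE by id ranges over TEMPTABLENAME.
DECLARE @MinId int, @MaxId int, @BatchSize int = 100;

SELECT @MinId = MIN(id), @MaxId = MAX(id) FROM TEMPTABLENAME;

WHILE @MinId <= @MaxId
BEGIN
    MERGE Employee AS Target
    USING (SELECT * FROM TEMPTABLENAME
           WHERE id BETWEEN @MinId AND @MinId + @BatchSize - 1) AS Source
        ON (Target.ClientId = Source.ClientId
            AND Target.EmployeeReferenceId = Source.EmployeeReferenceId)
    WHEN MATCHED THEN
        UPDATE SET Target.FirstName  = Source.FirstName,
                   Target.MiddleName = Source.MiddleName,
                   Target.LastName   = Source.LastName,
                   Target.BirthDate  = Source.BirthDate
    WHEN NOT MATCHED THEN
        INSERT ([FirstName], [MiddleName], [LastName], [BirthDate])
        VALUES (Source.FirstName, Source.MiddleName, Source.LastName, Source.BirthDate);

    SET @MinId += @BatchSize;
END
```

The OUTPUT $action INTO #SummaryOfChanges clause from the original can be kept inside the loop, in which case the counts accumulate across batches. Gaps in the id sequence simply make some batches smaller than @BatchSize.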

Updating 100k records in one update query

Is it possible, or recommended at all, to run one update query that updates nearly 100k records at once?
If so, how can I do that? I am trying to pass an array to my stored proc, but it doesn't seem to work. This is my SP:
CREATE PROCEDURE [dbo].[UpdateAllClients]
    @ClientIDs varchar(max)
AS
BEGIN
    DECLARE @vSQL varchar(max);
    SET @vSQL = 'UPDATE Clients SET LastUpdate=GETDATE() WHERE ID IN (' + @ClientIDs + ')';
    EXEC(@vSQL);
END
I have no idea what's not working; it's just not updating the relevant rows.
Anyone?
The UPDATE is reading your @ClientIDs (a comma-separated value) as a single whole string. To illustrate:
assume @ClientIDs = 1,2,3,4,5
Your UPDATE command is interpreting it like this:
UPDATE Clients SET LastUpdate=GETDATE() WHERE ID IN ('1,2,3,4,5');
and not:
UPDATE Clients SET LastUpdate=GETDATE() WHERE ID IN (1,2,3,4,5);
One suggestion is to use a subquery in your UPDATE, for example:
UPDATE Clients
SET LastUpdate = GETDATE()
WHERE ID IN
(
    SELECT ID
    FROM tableName
    -- WHERE condition
);
Hope this makes sense.
A few notes to be aware of.
Big updates like this can lock up the target table. If more than about 5,000 locks are taken by the operation, the individual row locks can be escalated to a table lock, which would block other processes. Worth bearing in mind if this could cause an issue in your scenario. See: Lock Escalation
With a large number of rows to update like this, an approach I'd consider is (basic):
bulk insert the 100K Ids into a staging table (e.g. from .NET, use SqlBulkCopy)
update the target table, using a join onto the above staging table
drop the staging table
This gives you more room to control the process, breaking the workload up into chunks and doing it x rows at a time.
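The middle step of that outline might look like this as a sketch (Staging_ClientIds is the hypothetical staging table filled via SqlBulkCopy; #Chunk and the batch size of 5,000 are illustrative):

```sql
-- Sketch: chunked update driven by a bulk-loaded staging table of ids.
CREATE TABLE #Chunk (Id int PRIMARY KEY);

WHILE 1 = 1
BEGIN
    -- Claim the next chunk of ids, removing them from the staging table
    DELETE TOP (5000) s
    OUTPUT deleted.Id INTO #Chunk (Id)
    FROM Staging_ClientIds AS s;

    IF @@ROWCOUNT = 0 BREAK;

    -- Update only the rows belonging to this chunk
    UPDATE c
    SET c.LastUpdate = GETDATE()
    FROM Clients AS c
    JOIN #Chunk AS t ON t.Id = c.ID;

    TRUNCATE TABLE #Chunk;
END

DROP TABLE #Chunk;
```

Keeping each batch at 5,000 rows or fewer also stays under the lock-escalation threshold, so row locks are not promoted to a table lock.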
There is a practical limit on the number of items you can pass to IN as a literal list.
So, if you just want to update the whole table, skip the IN condition.
If not, put a subquery inside IN. That should do the job.
The database will very likely reject that SQL statement because it is too long.
When you need to update so many records at once, then maybe your database schema isn't appropriate. Maybe the LastUpdate datum should not be stored separately for each client but only once globally or only once for a constant group of clients?
But it's hard to recommend a good course of action without seeing the whole picture.
What version of SQL Server are you using? If it is 2008+, I would recommend using TVPs (table-valued parameters - http://msdn.microsoft.com/en-us/library/bb510489.aspx). The data transfer will be faster (as opposed to building a huge string) and your query will look nicer:
UPDATE c
SET c.LastUpdate = GETDATE()
FROM Clients AS c
JOIN @mytvp AS t ON c.Id = t.Id;
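A sketch of the full TVP approach (the type name dbo.IdList and the procedure shape are illustrative):

```sql
-- One-time setup: a table type to carry the ids
CREATE TYPE dbo.IdList AS TABLE (Id int PRIMARY KEY);
GO
CREATE PROCEDURE dbo.UpdateAllClients
    @ClientIDs dbo.IdList READONLY
AS
BEGIN
    UPDATE c
    SET c.LastUpdate = GETDATE()
    FROM Clients AS c
    JOIN @ClientIDs AS t ON c.ID = t.Id;
END
GO
-- From T-SQL, the call looks like this (from .NET you would pass a DataTable):
DECLARE @ids dbo.IdList;
INSERT INTO @ids (Id) VALUES (1), (2), (3);
EXEC dbo.UpdateAllClients @ClientIDs = @ids;
```

No dynamic SQL is involved, so there is no string-length limit and no injection risk, and the optimizer can use the primary key on the TVP for the join.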
Each SQL statement runs as its own transaction, which means SQL Server is going to take locks for all of those millions of rows. That can really degrade the performance of the table, so you generally don't want to update millions of rows in a single statement. One workaround is to set ROWCOUNT before the DML operation:
SET ROWCOUNT 100;
UPDATE Clients SET LastUpdate = GETDATE()
WHERE ID IN (1,2,3,4,5);
SET ROWCOUNT 0;
Or, from SQL Server 2005 onward, you can parameterize the TOP keyword:
DECLARE @value int;
SET @value = 100000;
again:
UPDATE TOP (@value) Clients SET LastUpdate = GETDATE()
WHERE ID IN (1,2,3,4,5);
IF @@ROWCOUNT != 0 GOTO again;
See how long the above query takes, then adjust the value of the variable. You need to break the task into smaller units, as suggested in the other answers.
Method 1:
Split @ClientIDs on the ',' delimiter,
put the values into a table and iterate over it,
updating the Clients table for each id.
OR
Method 2:
Instead of taking @ClientIDs as a varchar, follow these steps:
create a table type for the ids and use a join.
For faster processing you can also create an index on ClientId.

Slow join on Inserted/Deleted trigger tables

We have a trigger that creates audit records for a table by joining the inserted and deleted tables to see whether any columns have changed. The join has been working well for small sets, but now I'm updating about 1 million rows and it doesn't finish in days. I tried updating a select number of rows at different orders of magnitude, and the growth is clearly worse than linear, which would make sense if the inserted/deleted tables are being scanned to do the join.
I tried creating an index but get the error:
Cannot find the object "inserted" because it does not exist or you do not have permissions.
Is there any way to make this any faster?
Inserting into temporary tables indexed on the joining columns could well improve things as inserted and deleted are not indexed.
You can check @@ROWCOUNT inside the trigger so you only perform this logic above some threshold number of rows, though on SQL Server 2008 this might overstate the number somewhat if the trigger was fired as the result of a MERGE statement (it will return the total number of rows affected by all MERGE actions, not just the one relevant to that specific trigger).
In that case you can just do something like SELECT @NumRows = COUNT(*) FROM (SELECT TOP 10 * FROM inserted) T to see if the threshold is met.
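A sketch of the indexed-temp-table idea inside the trigger body (AuditTable, PKCol, and SomeCol are illustrative names, not from the original schema):

```sql
-- Copy the unindexed inserted/deleted virtual tables into temp tables first
SELECT * INTO #ins FROM inserted;
SELECT * INTO #del FROM deleted;

-- Index the temp tables on the joining column so the audit join can seek
CREATE CLUSTERED INDEX IX_ins ON #ins (PKCol);
CREATE CLUSTERED INDEX IX_del ON #del (PKCol);

-- The audit join now runs against indexed tables instead of scans
INSERT INTO AuditTable (PKCol, OldValue, NewValue)
SELECT d.PKCol, d.SomeCol, i.SomeCol
FROM #ins AS i
JOIN #del AS d ON d.PKCol = i.PKCol
WHERE i.SomeCol <> d.SomeCol;
```

The copy itself is a single scan of each virtual table, so the cost grows linearly with the row count rather than with the cost of a nested-loops join over two unindexed inputs.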
Addition
One other possibility you could experiment with is simply bypassing the trigger for these large updates. You could use SET CONTEXT_INFO to set a flag and check the value of this inside the trigger. You could then use OUTPUT inserted.*, deleted.* to get the "before" and "after" values for a row without needing to JOIN at all.
DECLARE @TriggerFlag varbinary(128);
SET @TriggerFlag = CAST('Disabled' AS varbinary(128));
SET CONTEXT_INFO @TriggerFlag;

UPDATE YourTable
SET Bar = 'X'
OUTPUT inserted.*, deleted.* INTO #T;

/* Reset the flag */
SET CONTEXT_INFO 0x;