We have a trigger that creates audit records for a table and joins the inserted and deleted tables to see if any columns have changed. The join has been working well for small sets, but now I'm updating about 1 million rows and it doesn't finish in days. I tried updating different numbers of rows across several orders of magnitude and it's obvious the growth is exponential, which would make sense if the inserted/deleted tables are being scanned to do the join.
I tried creating an index but get the error:
Cannot find the object "inserted" because it does not exist or you do not have permissions.
Is there any way to make this any faster?
Inserting inserted and deleted into temporary tables indexed on the joining columns could well improve things, as the inserted and deleted pseudo-tables themselves are not indexed.
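For example, a minimal sketch of that approach at the top of the trigger; the key column Id and the compared column SomeColumn are placeholders for whatever your audit logic actually uses:
SELECT * INTO #ins FROM inserted;
SELECT * INTO #del FROM deleted;

CREATE CLUSTERED INDEX IX_ins ON #ins (Id);
CREATE CLUSTERED INDEX IX_del ON #del (Id);

-- Join the indexed copies instead of inserted/deleted directly
SELECT i.Id
FROM #ins i
JOIN #del d ON d.Id = i.Id
WHERE i.SomeColumn <> d.SomeColumn;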
You can check @@ROWCOUNT inside the trigger so you only perform this logic above some threshold number of rows, though on SQL Server 2008 this might overstate the number somewhat if the trigger was fired as the result of a MERGE statement (it will return the total number of rows affected by all MERGE actions, not just the one relevant to that specific trigger).
In that case you can instead do something like SELECT @NumRows = COUNT(*) FROM (SELECT TOP 10 * FROM inserted) T to see if the threshold is met.
Addition
One other possibility you could experiment with is simply bypassing the trigger for these large updates. You could use SET CONTEXT_INFO to set a flag and check the value of this inside the trigger. You could then use OUTPUT inserted.*, deleted.* to get the "before" and "after" values for a row without needing to JOIN at all.
DECLARE @TriggerFlag varbinary(128)
SET @TriggerFlag = CAST('Disabled' AS varbinary(128))
SET CONTEXT_INFO @TriggerFlag
UPDATE YourTable
SET Bar = 'X'
OUTPUT inserted.*, deleted.* INTO #T
/*Reset the flag*/
SET CONTEXT_INFO 0x
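Inside the trigger the first statement can then check the flag and exit early. A rough sketch; only the first 8 bytes are compared so any padding of the CONTEXT_INFO value doesn't matter:
IF SUBSTRING(CONTEXT_INFO(), 1, 8) = CAST('Disabled' AS varbinary(8))
    RETURN;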
Related
I want to delete rows from a Transactions table (which has a foreign key to my Customers table), and then update Customers.StartingBalance to reflect the sum of the deleted amounts.
So I have created a stored procedure. Here's what I have so far.
SET NOCOUNT ON;

DECLARE @CustomerBalances TABLE
(
    CustomerId INT,
    Amount BIGINT
);

-- Note: Caller has already begun a transaction
DELETE Transactions WITH (TABLOCK)
OUTPUT deleted.CustomerId, deleted.TotalAmount INTO @CustomerBalances
WHERE [TimeStamp] < @ArchiveDateTime;

IF EXISTS (SELECT 1 FROM @CustomerBalances)
BEGIN
    UPDATE Customers WITH (TABLOCK)
    SET StartingBalance = StartingBalance +
        (SELECT SUM(Amount) FROM @CustomerBalances cb WHERE Id = cb.CustomerId)
END;

DELETE FROM @CustomerBalances;
Since SQL is not my core competency, I'm trying to understand this query better. In particular, I have some questions about the UPDATE statement above.
This will update all Customers because I have no WHERE clause, right?
This correctly handles cases where a customer has more than one matching row in the #CustomerBalances table, right?
Is the EXISTS clause needed here?
Will SUM() return 0 or NULL if there are no matching rows?
Does everything get cleaned up if I don't have the final DELETE statement?
It is critical that no changes are made to the Transactions or Customers table while I'm doing this. Does my use of TABLOCK make sense here?
Any suggestions about the overall approach I'm taking?
This will update all Customers because I have no WHERE clause, right?
Yes. Consider adding a WHERE clause such as:
WHERE Id IN (SELECT DISTINCT CustomerId FROM @CustomerBalances)
This prevents updating balances that haven't changed.
This correctly handles cases where a customer has more than one matching row in the #CustomerBalances table, right?
Yes. Because you use SUM() to aggregate them.
Is the EXISTS clause needed here?
It's recommended rather than essential. It's good practice so that you only attempt to update balances when records have been archived.
Will SUM() return 0 or NULL if there are no matching rows?
It will return NULL, which is a bug here: it will cause balances to be set to NULL (or raise an error if the column doesn't allow NULLs) for customers who had no transactions archived. This is fixed by adding the WHERE clause noted above. If you're trying to avoid the WHERE for some reason, you can fix it with COALESCE(SUM(Amount), 0)
Does everything get cleaned up if I don't have the final DELETE statement?
Yes. When the procedure completes, the table variable will go out of scope automatically, so the DELETE isn't needed, at least as far as this snippet shows.
It is critical that no changes are made to the Transactions or Customers table while I'm doing this. Does my use of TABLOCK make sense here?
Yes, but you should also specify HOLDLOCK to keep the lock until the transaction completes.
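For example, applied to the DELETE from the question (same table and parameter names):
DELETE Transactions WITH (TABLOCK, HOLDLOCK)
OUTPUT deleted.CustomerId, deleted.TotalAmount INTO @CustomerBalances
WHERE [TimeStamp] < @ArchiveDateTime;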
Any suggestions about the overall approach I'm taking?
See above, but in general it looks to be reasonable.
With a simple UPDATE statement we can do it in batches when dealing with huge tables.
WHILE 1 = 1
BEGIN
    UPDATE TOP (5000) dbo.LargeOrders
    SET CustomerID = N'ABCDE'
    WHERE CustomerID = N'OLDWO';

    IF @@ROWCOUNT < 5000
        BREAK;
END
When working with a MERGE statement, is it possible to do something similar? As far as I know this is not possible, because you need to do different operations based on the match condition, for example UPDATE when matched and INSERT when not matched. I just want to confirm this, and I may need to switch to the old-school UPDATE & INSERT if it's true.
Why not use a temp table as the source of your MERGE and then handle batching via the source table? I do this in a project of mine where I'll batch 50,000 rows into a temp table and then use the temp table as the source for the MERGE. I do this in a loop until all rows are processed.
The temp table could be a physical table or an in-memory table.
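A rough sketch of that pattern; the source table dbo.SourceOrders, its Processed flag, and the column names are hypothetical, and the target matches the dbo.LargeOrders example above:
DECLARE @BatchSize int = 50000;

CREATE TABLE #Batch (OrderID int PRIMARY KEY, CustomerID nvarchar(10));

WHILE 1 = 1
BEGIN
    TRUNCATE TABLE #Batch;

    -- Stage the next slice of unprocessed source rows
    INSERT INTO #Batch (OrderID, CustomerID)
    SELECT TOP (@BatchSize) OrderID, CustomerID
    FROM dbo.SourceOrders
    WHERE Processed = 0
    ORDER BY OrderID;

    IF @@ROWCOUNT = 0
        BREAK;

    -- MERGE only this slice into the large target table
    MERGE dbo.LargeOrders AS tgt
    USING #Batch AS src
        ON tgt.OrderID = src.OrderID
    WHEN MATCHED THEN
        UPDATE SET tgt.CustomerID = src.CustomerID
    WHEN NOT MATCHED THEN
        INSERT (OrderID, CustomerID) VALUES (src.OrderID, src.CustomerID);

    -- Mark the slice as done so the next pass picks up new rows
    UPDATE s
    SET Processed = 1
    FROM dbo.SourceOrders AS s
    JOIN #Batch AS b ON b.OrderID = s.OrderID;
END

DROP TABLE #Batch;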
I have an update trigger with the following code:
declare @numrows int
select @numrows = @@rowcount
if @numrows <> 1
    return
In some cases @numrows comes back as 0 even though the row count is 1. I think it's because the select resets the row count? Anyway, I'm replacing it with this:
set @numrows = (select count(*) from inserted)
Later in the trigger I'm using both inserted and deleted table records. Will the row counts for inserted and deleted always be equal, or do I need to check them separately?
I can't comment on MERGE as Steve has in his answer, but if an UPDATE is run on a table
UPDATE TableA SET Column1 = 'ABC' WHERE Column1 = 'DEF'
And an update trigger exists on TableA, then when the trigger fires, yes, the count of records in each of the Inserted & the Deleted tables will be the same, and will be equal to the number of rows affected by the update statement that was run.
They will not be equal. Remember there are nice features like MERGE that can INSERT/UPDATE/DELETE all in one transaction which would make a single call to your trigger.
EDIT:
After doing some more testing, my understanding of how MERGE worked is WRONG. They will be separate trigger events. One for each action INSERT/UPDATE/DELETE.
Therefore, I cannot think of any reason why equal counts for INSERTED and DELETED would not mean they are all updates. More importantly, if there are any records in both tables, it is an update. So this would be the fastest way to determine whether it is an update:
IF EXISTS(SELECT TOP 1 1 FROM inserted) AND EXISTS(SELECT TOP 1 1 FROM deleted)
I am glad that merge does not work the way I thought because my triggers (which use the code above) would have failed.
I have two tables with following columns:
SUMMARY(sum_id, sum_number) and DETAILS(det_id, det_number, sum_id)
I want to delete rows from table DETAILS with det_id in list of IDs, which can be done by:
DELETE FROM details WHERE det_id in (1,2,3...)
BUT
At the same time I need to update table SUMMARY if summary.sum_id=details.sum_id
UPDATE summary SET sum_number-=somefunction(details.det_number)
WHERE summary.sum_id=details.sum_id
Moreover, afterwards it would be great to also delete rows from the SUMMARY table where sum_number <= 0.
How can I do all this in an intelligent way?
What if I know, from the very beginning, both IDs: details.det_id (to delete) AND the summary.sum_id values that correspond to details.det_id?
You did not specify a DBMS so I'm assuming PostgreSQL.
You can do this with a single statement using the new writeable CTE feature:
with deleted as (
    delete from details
    where det_id in (1,2,3...)
    returning details.*
),
new_summary as (
    update summary
    set sum_number = some_function(deleted.det_number)
    from deleted
    where deleted.sum_id = summary.sum_id
    returning summary.sum_id
)
delete from summary
where sum_number <= 0
and sum_id in (select sum_id from new_summary);
The in condition in the outer delete is not strictly necessary, but it ensures that the new_summary CTE is actually used in the statement. Additionally it might improve performance a bit, because only the changed summary rows are checked (not all of them).
It is not possible to perform all of these operations in a single statement. You would have to do something like this:
UPDATE summary
SET sum_number = somefunction(details.det_number)
FROM summary
INNER JOIN details ON summary.sum_id = details.sum_id
WHERE details.det_id IN (1,2,3,...);

DELETE FROM details WHERE det_id IN (1,2,3,...);

DELETE FROM summary WHERE sum_number <= 0;
I would use a trigger... then the database is responsible for the deletes.
Using an update trigger, once/if the update is successful it will fire the trigger, which can do as much or as little as you need... i.e. it can do your 2 deletes.
For an example have a read of this tutorial:
http://www.mysqltutorial.org/create-the-first-trigger-in-mysql.aspx. This answer (http://stackoverflow.com/questions/6296313/mysql-trigger-after-update-only-if-row-has-changed) from Stack Overflow also provides a good example.
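Since most of this thread uses SQL Server, here is a rough sketch of such a trigger in T-SQL; dbo.somefunction stands in for whatever adjustment you apply to det_number:
CREATE TRIGGER trg_details_delete
ON details
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Subtract the deleted detail amounts from the matching summary rows
    UPDATE s
    SET sum_number = s.sum_number - d.total
    FROM summary AS s
    JOIN (SELECT sum_id, SUM(dbo.somefunction(det_number)) AS total
          FROM deleted
          GROUP BY sum_id) AS d ON d.sum_id = s.sum_id;

    -- Remove summary rows that have dropped to zero or below
    DELETE FROM summary WHERE sum_number <= 0;
END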
Is it possible, or recommended at all, to run one update query that will update nearly 100k records at once?
If so, how can I do that? I am trying to pass an array to my stored proc, but it doesn't seem to work. This is my SP:
CREATE PROCEDURE [dbo].[UpdateAllClients]
    @ClientIDs varchar(max)
AS
BEGIN
    DECLARE @vSQL varchar(max)
    SET @vSQL = 'UPDATE Clients SET LastUpdate=GETDATE() WHERE ID IN (' + @ClientIDs + ')';
    EXEC(@vSQL);
END
I have no idea what's not working, but it's just not updating the relevant rows.
Anyone?
The UPDATE is reading your @ClientIDs (a comma separated value) as a whole string. To illustrate, you are effectively doing this:
assume @ClientIDs = 1,2,3,4,5
your UPDATE command is interpreting it like this
UPDATE Clients SET LastUpdate=GETDATE() WHERE ID IN ('1,2,3,4,5');
and not
UPDATE Clients SET LastUpdate=GETDATE() WHERE ID IN (1,2,3,4,5);
One suggestion is to use a subquery in your UPDATE, for example:
UPDATE Clients
SET LastUpdate = GETDATE()
WHERE ID IN
(
    SELECT ID
    FROM tableName
    -- where condition
)
Hope this makes sense.
A few notes to be aware of.
Big updates like this can lock up the target table. If the operation takes out more than about 5,000 row locks, they can be escalated to a table lock, which would block other processes. Worth bearing in mind if this could cause an issue in your scenario. See: Lock Escalation
With a large number of rows to update like this, an approach I'd consider is (basic):
bulk insert the 100K Ids into a staging table (e.g. from .NET, use SqlBulkCopy)
update the target table, using a join onto the above staging table
drop the staging table
This gives you some more room for controlling the process by breaking the workload up into chunks and doing it x rows at a time (a sketch follows below).
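A minimal sketch of steps 2 and 3, assuming the ids were bulk loaded into a staging table called #ClientIdsStaging with a single Id column (both names are just for illustration):
UPDATE c
SET LastUpdate = GETDATE()
FROM Clients AS c
JOIN #ClientIdsStaging AS s ON s.Id = c.ID;

DROP TABLE #ClientIdsStaging;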
There is a limit on how many items you can realistically pass to IN as a literal list.
So, if you just want to update the whole table, skip the IN condition.
If not, specify a subquery inside the IN. That should do the job.
The database will very likely reject that SQL statement because it is too long.
When you need to update so many records at once, then maybe your database schema isn't appropriate. Maybe the LastUpdate datum should not be stored separately for each client but only once globally or only once for a constant group of clients?
But it's hard to recommend a good course of action without seeing the whole picture.
What version of SQL Server are you using? If it is 2008 or later I would recommend using TVPs (table-valued parameters: http://msdn.microsoft.com/en-us/library/bb510489.aspx). The transfer of data will be faster (as opposed to building a huge string) and your query would look nicer:
update c
set lastupdate=getdate()
from clients c
join @mytvp t on c.Id = t.Id
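The supporting pieces might look roughly like this, reworking the procedure from the question; the type name dbo.IdList is an assumption for illustration:
CREATE TYPE dbo.IdList AS TABLE (Id int PRIMARY KEY);
GO

CREATE PROCEDURE [dbo].[UpdateAllClients]
    @mytvp dbo.IdList READONLY
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE c
    SET LastUpdate = GETDATE()
    FROM Clients c
    JOIN @mytvp t ON c.ID = t.Id;
END
GO
From .NET the TVP is passed as a structured parameter (for example a DataTable), which is what avoids building the huge comma separated string.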
Each SQL statement on its own runs as a transaction. This means SQL Server is going to grab locks for all these millions of rows, which can really degrade the performance of the table, so you really don't want to update millions of rows in a single statement. The workaround is to set ROWCOUNT before the DML operation:
SET ROWCOUNT 100

UPDATE Clients SET LastUpdate = GETDATE()
WHERE ID IN (1,2,3,4,5);

SET ROWCOUNT 0
or from SQL Server 2008 you can parameterize the TOP keyword:
DECLARE @value int
SET @value = 100000

again:
UPDATE TOP (@value) Clients SET LastUpdate = GETDATE()
WHERE ID IN (1,2,3,4,5);
IF @@ROWCOUNT != 0 GOTO again
See how long the above query takes and then adjust the value of the variable. You need to break the task into smaller units, as suggested in the above answers.
Method 1:
Split @ClientIDs on the ',' delimiter, put the values in an array, iterate over that array, and update the Clients table for each id (or split them set-based, as sketched below).
OR
Method 2:
Instead of taking @ClientIDs as a varchar, create a table type for the ids and use a join.
For faster processing you can also create an index on the client id column.
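If you go the string-splitting route, on SQL Server 2016 and later the built-in STRING_SPLIT function lets you do Method 1 set-based instead of iterating. A sketch, assuming @ClientIDs is the comma separated list from the question and Clients.ID is an int:
UPDATE c
SET LastUpdate = GETDATE()
FROM Clients AS c
JOIN STRING_SPLIT(@ClientIDs, ',') AS s
    ON c.ID = CAST(s.value AS int);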