SQL - how can I avoid a concurrency conflict on this UPDATE?

I have a table. Multiple external processes will be simultaneously looking at it and updating it.
I want to write some SQL to update "the one row with the highest Priority which has not already been updated (by another external process)".
The processes access my table using the SQL below. But my fear is: I'm not sure what will happen if two processes attempt to run this code at nearly the same time. Is there some risk that two nearly simultaneous instances of this code will both try to update the same row?
I just want to be sure that the code, as written (using a CTE) runs the SELECT and UPDATE in a single transaction, with no chance of multiple processes selecting & updating the same row. If that's not already the case, then let me know what I'd need to change to accomplish this.
Thanks!
WITH MostUrgentWorkAssignment AS
(
    SELECT TOP (1) *
    FROM dbo.WorkAssignments a
    WHERE a.UserID IS NULL
    ORDER BY a.Priority
)
UPDATE MostUrgentWorkAssignment
SET UserID = @UserID

Shouldn't that T-SQL be something like this, avoiding the unnecessary CTE?
UPDATE [A]
SET [UserId] = @UserID
FROM [dbo].[WorkAssignments] [A]
JOIN
(
    SELECT TOP 1 [A].[Id]
    FROM [dbo].[WorkAssignments] [A]
    WHERE [A].[UserId] IS NULL
    ORDER BY [A].[Priority]
) [MostUrgentWorkAssignment]
    ON [MostUrgentWorkAssignment].[Id] = [A].[Id];
If you have a sensible isolation level, this statement will be safe. The SELECT and UPDATE run within an implicit transaction because they are part of the same statement. I suspect this is equally true with or without the CTE.
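If you want to make the claim explicit under concurrency, one common pattern (a sketch, not part of the original answer; the Id column and the OUTPUT clause are borrowed from the second snippet) is to add UPDLOCK and READPAST hints, so concurrent callers skip rows another caller has already locked:
WITH MostUrgentWorkAssignment AS
(
    SELECT TOP (1) *
    FROM dbo.WorkAssignments WITH (UPDLOCK, READPAST)
    WHERE UserID IS NULL
    ORDER BY Priority
)
UPDATE MostUrgentWorkAssignment
SET UserID = @UserID
OUTPUT inserted.Id;  -- returns the claimed row's key to the caller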

Related

SQL Server filter rows which are being selected in other transactions

I have a couple of jobs that update from SELECT queries, e.g.:
UPDATE TABLE_X
SET "stopFlag" = 1
OUTPUT INSERTED."RowID" AS "rowID"
WHERE "RowID" IN (
    SELECT TOP 50 "RowID"
    FROM TABLE_X
    WHERE "stopFlag" = 0
)
I had been assuming that the update cannot conflict with another update, but the logs of my database tables show that two different jobs executed for a single row, resulting in messed-up data. My question is: is this a proper way to keep rows from being selected by concurrent jobs? If it is, what am I missing?
A transaction is not necessary here, as every statement runs in an auto-commit transaction anyway.
You could up the isolation level to SERIALIZABLE, which may be more consistent, at the cost of more blocking. You could also add an UPDLOCK hint to the inner reference of Table_X.
But I think the best thing to do here will actually also improve performance: don't self-join the table, just update the derived table directly:
UPDATE x
SET stopFlag = 1
OUTPUT inserted.RowID AS rowID
FROM (
    SELECT TOP 50 RowID, stopFlag
    FROM TABLE_X
    WHERE stopFlag = 0
) x;
An UPDLOCK is automatically taken on any rows read from the table reference which is being updated, so that is not necessary.
If you want the statements to run concurrently, but mark and return disjoint rows, use READPAST. You can even introduce ordering guarantees, e.g.:
UPDATE TABLE_X
SET "stopFlag" = 1
OUTPUT INSERTED."RowID" AS "rowID"
WHERE "RowID" IN (
    SELECT TOP 50 "RowID"
    FROM TABLE_X WITH (ROWLOCK, UPDLOCK, READPAST)
    WHERE "stopFlag" = 0
    ORDER BY "RowID"
)
See generally Using tables as Queues.

Is there a possibility of deadlock when updating many rows using "IN()" in postgres?

Currently, we are updating many rows at the same time using this statement:
update my_table set field = 'value' where id in (<insert ids here>);
My worry is that it might cause a deadlock with another query that we run at intervals:
select * from my_table where field = 'value' order by id for update;
The query above will fetch multiple rows.
Is this scenario possible?
Just a bit of background:
We added the ORDER BY id earlier because, when we ran the query above multiple times at the same time, we were getting random deadlocks due to the differing orders in which that query locked rows.
We were wondering if this applies to update statements as well.
Yes, these can deadlock. To avoid this, run the select ... for update order by id in the same transaction immediately before the update. This will lock all affected rows and prevent any other transaction from running the same select ... for update query.
I am not saying to consolidate the two tasks. I am saying to use the same locking select in both.
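A minimal sketch of that suggestion (the ids and the value are placeholders standing in for the question's parameters):
BEGIN;

-- Take the row locks in a deterministic (ascending id) order first
SELECT id
FROM my_table
WHERE id IN (1, 2, 3)
ORDER BY id
FOR UPDATE;

-- Both jobs now acquire locks in the same order, so they block
-- rather than deadlock
UPDATE my_table
SET field = 'value'
WHERE id IN (1, 2, 3);

COMMIT;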

SQL Server BEFORE UPDATE trigger which adds in a timestamp on a field before UPDATE execution

Hi good people of stackoverflow,
I am working on a trigger for a table on SQL Server 2008 R2, for auditing purposes, which should add a timestamp to the UPDATE_TS field before the UPDATE query is executed. The result should be that the update occurs with the original values from the query plus the additional UPDATE_TS value set by the trigger.
I have also edited this question: I hear that inner joins are not actually very heavy in terms of performance in triggers, but I am not sure whether using one adds overhead compared to avoiding the inner join in the trigger.
The example I am working on is below. Thank you for any help and suggestions!
Example Table is called MY_TABLE:
CREATE TABLE [myschema].[MY_TABLE](
    [MY_TABLE_ID] [bigint] IDENTITY(1,1) NOT NULL,
    [FIELD_TO_UPDATE] [varchar](255) NOT NULL,
    [CREATE_TS] [datetime] NULL,
    [UPDATE_TS] [datetime] NULL,
    PRIMARY KEY ([MY_TABLE_ID])
)
TRIGGER to create:
CREATE TRIGGER [myschema].[my_table_update_ts_trigger] ON [mydb].[myschema].[MY_TABLE]
INSTEAD OF UPDATE
AS
BEGIN
    UPDATE INTO MY_TABLE ([FIELD_TO_UPDATE],[UPDATE_TS])
    SELECT ins.FIELD_TO_UPDATE, GETDATE() FROM INSERTED AS ins
END
You need to identify the row(s) you need to update, and you do this with a join or semi-join. It's not going to get much more efficient than this, unless you simply don't perform the update at all:
CREATE TRIGGER [myschema].[my_table_update_ts_trigger]
ON [myschema].[MY_TABLE]
INSTEAD OF UPDATE
AS
BEGIN
    UPDATE t SET
        FIELD_TO_UPDATE = i.FIELD_TO_UPDATE,
        UPDATE_TS = CURRENT_TIMESTAMP
    FROM myschema.MY_TABLE AS t
    INNER JOIN inserted AS i
        ON t.MY_TABLE_ID = i.MY_TABLE_ID;
END
GO
Since you need to match the rows in inserted to your base table, and since any single operation may update more than one row (triggers in SQL Server fire per statement, not per row as on some other platforms), you need output from both tables to perform the update accurately. And because this isn't a BEFORE update but an INSTEAD OF update, you still have to actually perform the UPDATE that would have happened without the trigger in place. This means you need a JOIN, and you cannot use a SEMI-JOIN (e.g. EXISTS), which probably still violates your outlandish requirements anyway. If you only needed to update the timestamp, you could do this:
UPDATE t SET UPDATE_TS = CURRENT_TIMESTAMP
FROM myschema.MY_TABLE AS t
WHERE EXISTS (SELECT 1 FROM inserted WHERE MY_TABLE_ID = t.MY_TABLE_ID);
Unfortunately, that will not work, because FIELD_TO_UPDATE gets lost without actually pulling in the inserted pseudo-table in a proper join.
Another way is to use a CROSS APPLY, e.g.:
UPDATE t SET
    FIELD_TO_UPDATE = i.FIELD_TO_UPDATE,
    UPDATE_TS = CURRENT_TIMESTAMP
FROM inserted AS i
CROSS APPLY myschema.MY_TABLE AS t
WHERE i.MY_TABLE_ID = t.MY_TABLE_ID;
It, too, is missing the nasty JOIN keyword, but it is still performing a JOIN. You can see this because the execution plans are identical.
Now, you can theoretically do this without a join, but that doesn't mean it will perform better. In fact I guarantee you beyond a shadow of a doubt that this will be less efficient, even though it does not contain a single four-letter word like JOIN:
DECLARE @NOW DATETIME = CURRENT_TIMESTAMP,
    @MY_TABLE_ID BIGINT,
    @FIELD_TO_UPDATE VARCHAR(255);

DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT MY_TABLE_ID, FIELD_TO_UPDATE FROM inserted;

OPEN c;

FETCH NEXT FROM c INTO @MY_TABLE_ID, @FIELD_TO_UPDATE;

WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE myschema.MY_TABLE SET
        FIELD_TO_UPDATE = @FIELD_TO_UPDATE,
        UPDATE_TS = @NOW
    WHERE MY_TABLE_ID = @MY_TABLE_ID;

    FETCH NEXT FROM c INTO @MY_TABLE_ID, @FIELD_TO_UPDATE;
END

CLOSE c;
DEALLOCATE c;
That said, if you think even for a second that this solution is going to be faster than the one with joins, I have some swampland in Florida to sell you. There are multiple bridges on the property, too. I'm not even going to bother showing the execution plans for this one.
Let's also compare what happens in an INSTEAD OF INSERT trigger. Here is an example, probably similar to what you had:
CREATE TRIGGER myschema.ins_my_table
ON myschema.MY_TABLE
INSTEAD OF INSERT
AS
    INSERT myschema.MY_TABLE(FIELD_TO_UPDATE, CREATE_TS)
    SELECT FIELD_TO_UPDATE, CURRENT_TIMESTAMP FROM inserted;
GO
This, too, will produce a plan that looks like two queries were executed.
It is important to note that an INSTEAD OF trigger cancels the original update, and you are responsible for issuing your own (even though the plan still shows two queries).
One final option would be to use an AFTER trigger instead of an INSTEAD OF trigger. This will allow you to update the timestamp without the JOIN, because the FIELD_TO_UPDATE has already been updated. But in this case you really will see two queries, and two queries will really be executed (it won't just look that way in the plans).
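A sketch of that AFTER trigger variant (the trigger name here is made up; the semi-join form from earlier now suffices, because only the timestamp needs setting):
CREATE TRIGGER myschema.my_table_update_ts_after
ON myschema.MY_TABLE
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- The base UPDATE has already run; just stamp the affected rows.
    -- Relies on RECURSIVE_TRIGGERS being OFF (the default), otherwise
    -- this statement would fire the trigger again.
    UPDATE t SET UPDATE_TS = CURRENT_TIMESTAMP
    FROM myschema.MY_TABLE AS t
    WHERE EXISTS (SELECT 1 FROM inserted WHERE MY_TABLE_ID = t.MY_TABLE_ID);
END
GO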
Some general comments
Since I'm going for a performance increase I do not want to use any inner joins in the code used for the trigger.
This doesn't really make much sense; why do you think joins are bad for performance? Sounds like you've watched too many NoSQL videos. Please don't discard technology because you've heard it was bad or because you had a slow join once. Create the query that makes sense, optimize when it doesn't perform well, and come for help when you can't optimize. In almost all cases (there are exceptions, of course), the problem is indexing or statistics and not the fact that you used a JOIN keyword. That doesn't mean you should avoid all joins in all queries at all costs.
If you just don't want to see the word JOIN, that's possible; write it like this:
CREATE TRIGGER [myschema].[my_table_update_ts_trigger]
ON [myschema].[MY_TABLE]
INSTEAD OF UPDATE
AS
BEGIN
    UPDATE t
    SET FIELD_TO_UPDATE = i.FIELD_TO_UPDATE,
        UPDATE_TS = CURRENT_TIMESTAMP
    FROM myschema.MY_TABLE AS t,
         inserted AS i
    WHERE t.MY_TABLE_ID = i.MY_TABLE_ID;
END
GO

Updating 100k records in one update query

Is it possible, or recommended at all, to run one update query, that will update nearly 100k records at once?
If so, how can I do that? I am trying to pass an array to my stored proc, but it doesn't seem to work. This is my SP:
CREATE PROCEDURE [dbo].[UpdateAllClients]
    @ClientIDs varchar(max)
AS
BEGIN
    DECLARE @vSQL varchar(max);
    SET @vSQL = 'UPDATE Clients SET LastUpdate=GETDATE() WHERE ID IN (' + @ClientIDs + ')';
    EXEC(@vSQL);
END
I have no idea what's not working, but it's just not updating the relevant rows.
Anyone?
The UPDATE is reading your @ClientIDs (a comma-separated value) as a whole string. To illustrate:
assume @ClientIDs = '1,2,3,4,5'
Your UPDATE command is interpreting it like this
UPDATE Clients SET LastUpdate=GETDATE() WHERE ID IN ('1,2,3,4,5');
and not
UPDATE Clients SET LastUpdate=GETDATE() WHERE ID IN (1,2,3,4,5);
One suggestion for your question is to use a subquery in your UPDATE, for example:
UPDATE Clients
SET LastUpdate = GETDATE()
WHERE ID IN
(
    SELECT ID
    FROM tableName
    -- where condition
)
Hope this makes sense.
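As an aside, on SQL Server 2016 and later (newer than this question), the built-in STRING_SPLIT function can consume the CSV parameter directly, without dynamic SQL; a sketch:
UPDATE Clients
SET LastUpdate = GETDATE()
WHERE ID IN
(
    SELECT CAST(value AS int)   -- STRING_SPLIT exposes each item in a 'value' column
    FROM STRING_SPLIT(@ClientIDs, ',')
);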
A few notes to be aware of.
Big updates like this can lock up the target table. If the operation takes more than about 5,000 row locks, they will be escalated to a table lock, which would block other processes. Worth bearing in mind if this could cause an issue in your scenario. See: Lock Escalation
With a large number of rows to update like this, an approach I'd consider is (basic):
bulk insert the 100K Ids into a staging table (e.g. from .NET, use SqlBulkCopy)
update the target table, using a join onto the above staging table
drop the staging table
This gives some more room for controlling the process, by breaking the workload up into chunks and doing it x rows at a time.
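A sketch of that staging-table approach (the staging table name is made up; Clients and its ID column are assumed from the question):
-- 1. Stage the 100K ids (bulk-loaded from the client, e.g. via SqlBulkCopy)
CREATE TABLE #ClientIdsStaging (ID int PRIMARY KEY);

-- 2. Update the target table via a join onto the staging table
UPDATE c
SET LastUpdate = GETDATE()
FROM Clients AS c
INNER JOIN #ClientIdsStaging AS s
    ON s.ID = c.ID;

-- 3. Drop the staging table
DROP TABLE #ClientIdsStaging;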
There is a limit on the number of items you can pass to IN when you supply them as a list.
So, if you just want to update the whole table, skip the IN condition.
If not, specify a subquery inside the IN. That should do the job.
The database will very likely reject that SQL statement because it is too long.
When you need to update so many records at once, maybe your database schema isn't appropriate. Maybe the LastUpdate datum should not be stored separately for each client but only once globally, or only once for a constant group of clients?
But it's hard to recommend a good course of action without seeing the whole picture.
What version of SQL Server are you using? If it is 2005+, I would recommend using TVPs (table-valued parameters - http://msdn.microsoft.com/en-us/library/bb510489.aspx). The transfer of data will be faster (as opposed to building a huge string) and your query would look nicer:
update c
set lastupdate = getdate()
from clients c
join @mytvp t on c.Id = t.Id
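For completeness, a sketch of the plumbing a TVP needs (the type name dbo.IdList is made up):
-- A reusable table type for lists of ids
CREATE TYPE dbo.IdList AS TABLE (Id int PRIMARY KEY);
GO

-- The proc now takes the TVP instead of a CSV string
CREATE PROCEDURE dbo.UpdateAllClients
    @mytvp dbo.IdList READONLY
AS
BEGIN
    UPDATE c
    SET LastUpdate = GETDATE()
    FROM Clients AS c
    JOIN @mytvp AS t ON c.Id = t.Id;
END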
Each SQL statement on its own is a transaction. This means SQL Server is going to grab locks for all of these rows, which can really degrade the performance of the table, so you don't want to update that many rows in one go. The workaround is to set ROWCOUNT before the DML operation:
SET ROWCOUNT 100;
UPDATE Clients SET LastUpdate=GETDATE()
WHERE ID IN (1,2,3,4,5);
SET ROWCOUNT 0;
or from SQL Server 2008 you can parameterize the TOP keyword:
DECLARE @value int;
SET @value = 100000;
again:
-- note: for this loop to terminate, the WHERE clause must exclude
-- rows that have already been updated
UPDATE TOP (@value) Clients SET LastUpdate=GETDATE()
WHERE ID IN (1,2,3,4,5);
IF @@ROWCOUNT != 0 GOTO again;
See how long the above query takes, then adjust the value of the variable accordingly. You need to break the task into smaller units, as suggested in the other answers.
Method 1:
Split @ClientIDs on the ',' delimiter,
put the values in an array and iterate over that array,
updating the Clients table for each id.
OR
Method 2:
Instead of taking @ClientIDs as a varchar,
create a table type for the ids and use a join.
For faster processing you can also create an index on ClientID.

Is update with nested select atomic operation?

I need to select the first (let's say) 10000 rows in the database and return them. Multiple clients may perform this operation at the same time. I came up with this query:
update v set v.batch_Id = :batchId
from tblRedir v
inner join (
    select top 10000 id
    from tblRedir
    where batch_Id is null
    order by Date asc
) v2 on v.id = v2.id
It is an operation consisting of an update with a nested select. Both queries work on the same table (tblRedir). The idea is that the rows are first marked with a unique batchId and then returned via
select * from tblRedir where batch_id = :batchId
(the batchId is a unique identifier, e.g. a timestamp or GUID, for each such update)
My question:
I thought that an update with a nested select is atomic - meaning that every client receives its own unique set of rows (no other client receives a subset of its data).
However, it looks like I'm wrong - in some cases clients receive no data, probably because both clients first execute the select and then both execute the update (so the first client finds no rows marked with its batch id).
Is this operation atomic or not?
I work with SQL Server 2005. The query is run via NHibernate like this:
session.CreateSQLQuery('update....')
SELECT places shared locks on the rows it reads, and under READ COMMITTED isolation those locks can be released as soon as the rows have been read.
UPDATE places update locks, later promoted to exclusive locks. These are not released until the end of the transaction.
You need the shared locks to be retained from the moment they are placed.
You can do this by setting the transaction isolation level to REPEATABLE READ, which retains the shared locks until the end of the transaction and prevents the UPDATE in another session from locking those rows.
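A minimal sketch of that first option, simply wrapping the original statement (:batchId remains the NHibernate parameter):
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;

update v set v.batch_Id = :batchId
from tblRedir v
inner join (
    select top 10000 id
    from tblRedir
    where batch_Id is null
    order by Date asc
) v2 on v.id = v2.id;

COMMIT;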
Alternatively, you can rewrite your query like this:
WITH q AS
(
    SELECT TOP 10000 *
    FROM mytable WITH (ROWLOCK, READPAST)
    WHERE batch_id IS NULL
    ORDER BY date
)
UPDATE q
SET batch_id = @myid;
This will just skip the locked rows.
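Each client can then fetch the rows it claimed with the follow-up query from the question, select * from tblRedir where batch_id = :batchId, without overlapping another client's batch.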