SQL Server Concurrency in update - sql

I have a TABLE:
id  status  mod   runat
1   0       null  null
2   0       null  null
3   0       null  null
4   0       null  null
And I call this query two times, at the same time.
UPDATE TABLE
SET
status = 1,
mod = GETDATE()
OUTPUT INSERTED.id
WHERE id = (
SELECT TOP (1) id
FROM TABLE
WHERE STATUS = 0
AND NOT EXISTS(SELECT * FROM TABLE WHERE STATUS = 1)
AND COALESCE(runat, GETDATE()) <= GETDATE()
ORDER BY ID ASC)
And... sometimes I get:
1
1
instead of:
1
NULL
Why? Isn't the UPDATE query transactional?

Short answer
Add WITH (UPDLOCK, HOLDLOCK) to the subquery's SELECT:
UPDATE TABLE
SET
status = 1,
mod = GETDATE()
OUTPUT INSERTED.id
WHERE id = (
SELECT TOP (1) id
FROM TABLE WITH (UPDLOCK, HOLDLOCK)
WHERE STATUS = 0
AND NOT EXISTS(SELECT * FROM TABLE WHERE STATUS = 1)
AND COALESCE(runat, GETDATE()) <= GETDATE()
ORDER BY ID ASC)
Explanation
Because you are using a subquery to get the ID, there are basically two statements being run here: a SELECT and an UPDATE. When 1 is returned twice, it just means both SELECT statements ran before either UPDATE had completed. If you add an UPDLOCK, then when the first statement runs it holds that lock, and the second SELECT has to wait for it to be released before it can execute.
More information
Exactly what happens will depend on the locking scheme of your database and the locks issued by other statements. This kind of update can even lead to deadlocks under certain circumstances.
Because the statements run so fast, it's hard to see what locks they are holding. To effectively slow things down, a good trick is to:
1. Open a session and run the first statement with a BEGIN TRAN statement at the start of it (don't include a COMMIT or ROLLBACK).
2. Run a query on sys.dm_tran_locks to see what locks are being held (a sample query is sketched below).
3. Open a second session and run the second statement and see what happens. If your locking scheme is set up correctly, it should wait for the first one to finish before it does anything.
4. Switch back to the first session and COMMIT to simulate it finishing.
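For step 2, a query along these lines works (a minimal sketch; the join to sys.partitions only resolves names for page/key locks, so treat it as a starting point):
SELECT
    l.request_session_id,
    l.resource_type,
    DB_NAME(l.resource_database_id) AS database_name,
    OBJECT_NAME(p.object_id) AS object_name,
    l.request_mode,    -- e.g. S, U, X, IX
    l.request_status   -- GRANT or WAIT
FROM sys.dm_tran_locks AS l
LEFT JOIN sys.partitions AS p
    ON p.hobt_id = l.resource_associated_entity_id
WHERE l.resource_database_id = DB_ID();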
Locking and data contention are complex areas with lots of possible solutions, but this link should give you everything you need to know to decide how to approach this issue:
https://learn.microsoft.com/en-us/sql/relational-databases/sql-server-transaction-locking-and-row-versioning-guide?view=sql-server-2017

Related

Conditional column in SQL Server based on another column

I'm looking for a solution to have a version column on my table based on another column.
I have a column "document No" in my table. Every time I insert a new row with the same document no, I would like to increase the column version.
I know I can do it in the back-end, but that means I first have to read the table and then insert. My idea is to optimize performance and leave it to SQL Server.
Is it possible?
pk  DocNo  Version
------------------
1   ABC    0
2   CBD    0
3   ABC    1
4   FGH    0
5   ABC    2
Assuming that you can parameterize your query (as in a stored procedure), AND your primary key is set to IDENTITY, you can use something along the lines of:
INSERT INTO TableA (DocNo, Version)
SELECT TOP 1 'XYZ', ISNULL(MAX(Version) + 1, 0)
FROM TableA
WHERE DocNo = 'XYZ'
I used 'XYZ' where you would place your parameter, like:
INSERT INTO TableA (DocNo, Version)
SELECT TOP 1 @DocNo, ISNULL(MAX(Version) + 1, 0)
FROM TableA
WHERE DocNo = @DocNo
Stored Procedure Solution
CREATE PROCEDURE tableUpsert (@DocNo varchar(100))
AS
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN
    IF EXISTS (SELECT * FROM dbo.YourTable WITH (UPDLOCK) WHERE DocNo = @DocNo)
        UPDATE dbo.YourTable
        SET Version = Version + 1
        WHERE DocNo = @DocNo;
    ELSE
        INSERT dbo.YourTable (DocNo, Version)
        VALUES (@DocNo, 1);
COMMIT
The code is pretty self-explanatory: if the record exists, you update it by incrementing the Version column, and if it doesn't, you insert a new record with a default Version of 1. Note the use of UPDLOCK to ensure that only your specific process is updating the record at that moment.
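A call then looks something like this (procedure and table names as in the sketch above):
EXEC dbo.tableUpsert @DocNo = 'ABC';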
You can use an INSERT trigger. In the trigger, update the Version by getting the last version of the same DocNo and incrementing it by 1.
update t
set Version = isnull(v.Version, 0) + 1
from inserted i
inner join mytable t on i.pk = t.pk
outer apply
(
select Version = max(Version)
from mytable x
where x.DocNo = i.DocNo
) v
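For completeness, that UPDATE would sit inside a trigger shell along these lines (the trigger name is an assumption; the body is the statement above, unchanged):
CREATE TRIGGER trg_mytable_version
ON mytable
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- set the Version of the freshly inserted rows from the current
    -- maximum Version of the same DocNo, plus 1
    update t
    set Version = isnull(v.Version, 0) + 1
    from inserted i
    inner join mytable t on i.pk = t.pk
    outer apply
    (
        select Version = max(Version)
        from mytable x
        where x.DocNo = i.DocNo
    ) v;
END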
Your version number is implicit in your data. Use the PK to determine it when you retrieve the data (or put that in a view; a sketch of such a view follows the caveats below):
SELECT DocNo, ROW_NUMBER() OVER (PARTITION BY DocNo ORDER BY pk) AS version
FROM mytable
ORDER BY DocNo
Relying on IDENTITY may give you gaps
Relying on MAX(x)+1 may not always work depending on your concurrency model.
Locking the table/column will introduce concurrency issues (which may be unimportant or trivial in your case).
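A sketch of that view (the view name is an assumption, table and column names follow the trigger answer above; the - 1 makes the numbering zero-based like the sample data):
CREATE VIEW dbo.vw_DocVersions
AS
SELECT
    pk,
    DocNo,
    ROW_NUMBER() OVER (PARTITION BY DocNo ORDER BY pk) - 1 AS version
FROM dbo.mytable;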

SQL Server : trigger to disable old records being updated - single row update works, multiple fails

The following SQL is working fine for one row updates, but fails on multiple row updates:
CREATE TRIGGER update_fix
ON mytable
FOR UPDATE AS
BEGIN
SET NOCOUNT ON;
IF (SELECT f_month FROM inserted) = 99
AND (SELECT x_date FROM mytable
WHERE data_id IN ((SELECT data_id FROM inserted)))<= DateAdd(yy, -2, GetDate())
BEGIN
RAISERROR('Cannot update old records',16,1)
ROLLBACK TRANSACTION
RETURN;
END
END
GO
Can anyone help me change the trigger to work for updates affecting multiple rows? It should examine each row being updated separately, so if an update touches 10 rows and one of them should not be updated, just that one is skipped and the other rows are updated successfully. Is that even possible with a trigger (keeping in mind I cannot redefine the queries themselves, as they originate from a large system and such a change would be out of scope)? I worry that one query = one transaction, i.e. everything can be rolled back or everything committed, but maybe it is possible?
Use an INSTEAD OF trigger.
Perform the UPDATE in the trigger, and add a WHERE condition to the update so that only the rows that are new enough get updated.
Then you can use whatever logic you prefer to get the count of rows that were not updated, and use RAISERROR to send a message like "5 rows were not updated due to age" or whatever you want.
Then there is no need to roll back the transaction at all.
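A sketch of that approach, reusing the column names from the question (the trigger name, the exact blocking rule, and the fact that only the two named columns are re-applied are assumptions/simplifications):
CREATE TRIGGER update_fix_instead
ON mytable
INSTEAD OF UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- count the rows that must be skipped: attempts to set f_month = 99
    -- on records whose existing x_date is more than 2 years old
    DECLARE @skipped int;

    SELECT @skipped = COUNT(*)
    FROM inserted i
    INNER JOIN deleted d ON d.data_id = i.data_id
    WHERE i.f_month = 99
      AND d.x_date <= DATEADD(yy, -2, GETDATE());

    -- re-apply the update, but only to the rows that are allowed to change
    UPDATE t
    SET f_month = i.f_month,
        x_date  = i.x_date
    FROM mytable t
    INNER JOIN inserted i ON i.data_id = t.data_id
    INNER JOIN deleted d  ON d.data_id = t.data_id
    WHERE NOT (i.f_month = 99 AND d.x_date <= DATEADD(yy, -2, GETDATE()));

    IF @skipped > 0
        RAISERROR('%d row(s) were not updated due to age', 16, 1, @skipped);
        -- use severity 10 instead of 16 if you want a purely informational message
END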
Of course it fails: you have an = comparison against a subquery that can return more than one result.
The IF should look something like this:
if (exists (select 1 from inserted where f_month = 99) and
exists (select 1
from mytable
where data_id in (select data_id from inserted) and
x_date <= dateadd(year, -2, getdate())
)
)
I'm not 100% sure this is the same logic, because the two conditions could be on different records; it is not clear exactly what conditions you want.
If both conditions need to be on the same record, then:
if (exists (select 1
from mytable
where data_id in (select data_id from inserted where f_month = 99) and
x_date <= dateadd(year, -2, getdate())
)
)
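Dropped back into the original trigger shell (still all-or-nothing: if any offending row is found, the whole update is rejected), it would look roughly like this:
CREATE TRIGGER update_fix
ON mytable
FOR UPDATE AS
BEGIN
    SET NOCOUNT ON;
    IF EXISTS (SELECT 1
               FROM mytable
               WHERE data_id IN (SELECT data_id FROM inserted WHERE f_month = 99)
                 AND x_date <= DATEADD(year, -2, GETDATE()))
    BEGIN
        RAISERROR('Cannot update old records', 16, 1);
        ROLLBACK TRANSACTION;
        RETURN;
    END
END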

Delete Duplicates in Table with huge amount of rows

I have a table with 19 million records. I want to delete duplicates, but the query I am using is taking a very long time and eventually the connection times out.
This is the query I am using:
DELETE FROM [TableName]
WHERE id NOT IN
(SELECT MAX(id) FROM [TableName] GROUP BY field)
where id is the primary key and auto-increment.
I want to delete the duplicates in field.
Is there a faster alternative to this query?
Any help would be appreciated.
I suggest temporarily adding an index on field to speed things up (a sketch of such an index follows the statement below). Maybe use this statement to delete (even though yours should work fine with the index).
My statement generates a list of ids that should be deleted. Assuming that id, as the primary key, is indexed, this is probably faster; it should also perform a little better than NOT IN.
with candidates as (
SELECT id
, ROW_NUMBER() over (PARTITION by field order by id desc) rn
FROM [TableName]
)
delete
from candidates
where rn > 1
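The temporary index mentioned above could be created like this (the index name is an assumption; if id is the clustered primary key it is carried in the nonclustered index automatically, so indexing field alone covers the PARTITION BY field ORDER BY id pattern):
CREATE NONCLUSTERED INDEX IX_TableName_field
    ON [TableName] (field);

-- ... run the duplicate cleanup ...

DROP INDEX IX_TableName_field ON [TableName];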
My answer is a spin on Brett Schneider's, with a batched approach (including a small wait) to avoid contention and alleviate explosive log file growth.
Set your initial @batchcount to something you think the server can handle; you can also increase/decrease the wait time as needed. Once @@ROWCOUNT = 0, the loop will terminate.
declare @batchcount int, @totalrows int
set @totalrows = 0
set @batchcount = 10000 -- set this to some initial value
while @batchcount > 0
begin
    ;with dupes as (
        SELECT id
             , ROW_NUMBER() over (PARTITION by field order by id desc) rownum
        FROM [TableName]
    )
    delete top (@batchcount) t1
    from TableName t1
    join dupes c
      on c.id = t1.id
     and c.rownum > 1
    set @batchcount = @@ROWCOUNT -- record how many just got nuked
    set @totalrows = @totalrows + @batchcount -- track progress
    print cast(@totalrows as varchar) + ' rows have been deleted' -- show progress
    waitfor delay '00:00:05' -- wait 5 seconds for log writes, other queries etc
end
The print statement may not "show" on every loop in SSMS, but every so often you'll see SQL messages appear showing hundreds of iterations completed... be patient.
Create another heap table and insert into it the ids you want to delete. Then delete the records in the main table (where they exist in the heap table) in chunks of 1000-5000 each to avoid the timeout. Good luck!
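A rough sketch of that approach (the staging table name and the batch size are assumptions; table and column names follow the question):
-- 1. collect the ids of the duplicate rows into a separate heap table
SELECT id
INTO dbo.ToDelete
FROM (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY field ORDER BY id DESC) AS rn
    FROM [TableName]
) d
WHERE d.rn > 1;

CREATE INDEX IX_ToDelete_id ON dbo.ToDelete (id);

-- 2. delete in small chunks so each transaction stays short
WHILE 1 = 1
BEGIN
    DELETE TOP (5000) t
    FROM [TableName] t
    WHERE EXISTS (SELECT 1 FROM dbo.ToDelete d WHERE d.id = t.id);

    IF @@ROWCOUNT = 0 BREAK;
END

DROP TABLE dbo.ToDelete;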

SQL Server 2008 - Select disjunct rows

I have two concurrent processes and I have two queries, e.g.:
select top 10 * into #tmp_member
from member
where status = 0
order by member_id
and then
update member
set process_status = 1
from member inner join #tmp_member m
on member.member_id=m.member_id
I'd like each process to select different rows, so if a row was already selected by the first process, then do not use that one in the second process' result list.
Do I have to play around with locks? UPDLOCK, ROWLOCK, READPAST hints maybe? Or is there a more straightforward solution?
Any help is appreciated,
cheers,
b
You need hints.
See my answer here: SQL Server Process Queue Race Condition
However, you can shorten your query above into a single statement with the OUTPUT clause. Otherwise you'll need a transaction too (assuming each process executes the 2 statements above one after the other).
update m
set process_status = 1
OUTPUT Inserted.member_id
from
(
SELECT top 10
process_status, member_id
from member WITH (ROWLOCK, READPAST, UPDLOCK)
where status = 0
order by member_id
) m
Summary: if you want multiple processes to
select 10 rows where status = 0
set process_status = 1
return a resultset in a safe, concurrent fashion
...then use this code.
Well, the problem is that your select/update is not atomic: the second process might select the first 10 rows after the first process has selected them but before it has updated them.
There's the OUTPUT clause you can use on the UPDATE statement to make it atomic. See the documentation for details, but basically you can write something like:
DECLARE @MyTableVar table(member_ID INT)
UPDATE TOP (10) Members
SET
    member_id = member_id,
    process_status = 1
OUTPUT inserted.member_ID
    INTO @MyTableVar
WHERE status = 0;
After that, @MyTableVar should contain all the updated member IDs.
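The captured ids can then be read back from the table variable, for example:
SELECT member_ID FROM @MyTableVar;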
To meet your goal of having multiple processes work on the member table you will not need to "play around with locks". You will need to change from the #tmp_member table to a global temp table or a permanent table. The table will also need a column to track which process is managing the member row.
You will need a method to provide some kind of ID to each process that will be using the table. The first query will then be modified to exclude any entries in the table made by other processes, and the second query will be modified to include only the entries made by this process.
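A sketch of that idea (the member_work table, its ProcessId column, and @ProcessId are all assumptions; note that without locking hints such as ROWLOCK, READPAST, UPDLOCK two processes can still race inside the first query):
-- shared work table replacing #tmp_member
CREATE TABLE dbo.member_work
(
    member_id int NOT NULL PRIMARY KEY,
    ProcessId int NOT NULL
);

DECLARE @ProcessId int = 1;   -- however each process identifies itself

-- first query: claim rows that no process has taken yet
INSERT INTO dbo.member_work (member_id, ProcessId)
SELECT TOP (10) m.member_id, @ProcessId
FROM member m
WHERE m.status = 0
  AND NOT EXISTS (SELECT 1 FROM dbo.member_work w WHERE w.member_id = m.member_id)
ORDER BY m.member_id;

-- second query: only touch the rows this process claimed
UPDATE m
SET process_status = 1
FROM member m
INNER JOIN dbo.member_work w ON w.member_id = m.member_id
WHERE w.ProcessId = @ProcessId;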

How to Select a record from the database and update it in an atomic query

I have a number of records in a table with a Status column and I want to select a single record where Status = Pending and in the same atomic query mark it as Status = InProcess. What's the best way to do that?
This is needed because multiple queries can be running at the same time trying to process these records and I don't want two threads to be picking up the same record to process.
You can use the OUTPUT clause:
UPDATE [table]
SET Status = 'InProcess'
OUTPUT deleted.*
WHERE Status = 'Pending'
Here you can use the inserted table name if you want to get the row with the new status, or deleted to get the old one.
Here is an article about Using tables as Queues.
With this table:
create table T (ID int identity, Status varchar(15))
Something like this should keep you safe from deadlocks.
;with cte as
(
select top 1 *
from T with (rowlock, readpast)
where Status = 'Pending'
order by ID
)
update cte
set Status = 'InProcess'
output inserted.ID, inserted.Status
This should do the trick
UPDATE [table]
SET Status = 'InProcess'
WHERE Status = 'Pending'
SQL 2008 should take care of any locking for you.
The following is kind of a hack, but it worked for me for atomic reads/updates:
declare @temp1 ..., @temp2 ...;
update table
set @temp1 = column1,
    @temp2 = column2, ...
    column1 = expression1,
    column2 = expression2, ...
where conditions;
select @temp1, @temp2, ...;
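For the Status table used a little earlier, a concrete version of that pattern might look like the following sketch (variable names are assumptions); it claims one pending row, flips its status, and returns its ID in a single statement:
DECLARE @id int;

UPDATE TOP (1) T
SET @id = ID,              -- capture the claimed row's ID
    Status = 'InProcess'
WHERE Status = 'Pending';

SELECT @id AS claimed_id;  -- NULL if nothing was pending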