Avoiding deadlock when updating table - sql

I've got a 3-tier app with data cached on the client side, so I need to know when data changes on the server to keep this cache in sync.
So I added a "lastmodification" field in the tables, and update this field when a data change. But some 'parent' lastmodification rows must be updated in case child rows (using FK) are modified.
Fetching MAX(lastmodification) from the main table and from each related table, and then taking the MAX of those values, worked but was a bit slow.
I mean:
MAX(MAX(MAIN_TABLE), MAX(CHILD1_TABLE), MAX(CHILD2_TABLE))
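In SQL terms, that was roughly the following (a sketch; the table and column names are illustrative):
-- Hedged sketch: MAIN_TABLE, CHILD1_TABLE, CHILD2_TABLE and lastmodification
-- are the illustrative names used above, not the real schema.
SELECT MAX(lastmodification) AS lastmodification
FROM (
    SELECT MAX(lastmodification) AS lastmodification FROM MAIN_TABLE
    UNION ALL
    SELECT MAX(lastmodification) FROM CHILD1_TABLE
    UNION ALL
    SELECT MAX(lastmodification) FROM CHILD2_TABLE
) AS t;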
So I switched approaches and added a trigger to these tables so that they update a row in a TABLE_METADATA table:
CREATE TABLE [TABLE_METADATA](
    [TABLE_NAME] [nvarchar](250) NOT NULL,
    [TABLE_LAST_MODIFICATION] [datetime] NOT NULL
)
Now a related table can bump the 'main' table's last modification time simply by also updating the corresponding row in the metadata table.
Fetching the lastmodification is now fast.
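For illustration, the trigger on a child table looks roughly like this (a simplified sketch; the real trigger and object names differ):
-- Hedged sketch: trigger name, child table name, and the metadata key value are assumptions.
CREATE TRIGGER [TRG_CHILD1_LASTMODIFICATION] ON [CHILD1_TABLE]
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- The child table rolls its change up into the parent table's metadata row.
    UPDATE [TABLE_METADATA]
    SET [TABLE_LAST_MODIFICATION] = GETDATE()
    WHERE [TABLE_NAME] = N'MAIN_TABLE';
END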
But ... now I get random deadlocks related to updating this table.
This is due to two transactions modifying TABLE_METADATA at different steps and then blocking each other.
My question: Do you see a way to keep this lastmodification update without locking the row?
In my case I really don't care if:
the lastmodification stays updated even if the transaction is rolled back
the 'dirty' lastmodification (updated but not yet committed) is overwritten by a new value
In fact, I don't need these updates to be part of the transaction at all, but since they are executed by the trigger they automatically run in the current transaction.
Thank you for any help

As far as I know, you cannot prevent a U-lock. However, you could try reducing the number of locks to a minimum by using with (rowlock).
This will tell the query optimiser to lock rows one by one as they are updated, rather than to use a page or table lock.
You can also use with (nolock) on tables which are joined to the table which is being updated. An alternative to this would be to use set transaction isolation level read uncommitted.
Be careful using this method though, as you may read uncommitted (dirty) data and end up with inconsistent results.
For example:
update mt with (rowlock)
set SomeColumn = Something
from MyTable mt
inner join AnotherTable at with (nolock)
on mt.mtId = at.atId
You can also add with (rowlock) and with (nolock)/set transaction isolation level read uncommitted to other database objects which often read and write the same table, to further reduce the likelihood of a deadlock occurring.
If deadlocks are still occurring, you can reduce read locking on the target table by self-joining like this:
update mt with (rowlock)
set SomeColumn = Something
from MyTable mt
where mt.Id in (select mt2.Id from MyTable mt2 where mt2.SomeOtherColumn = SomeCondition)
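Applied to the metadata trigger from the question, these hints might look roughly like this (a sketch; the trigger body and names are inferred from the question, and note that ROWLOCK narrows the lock footprint but does not change the order in which competing transactions touch the metadata row, so it reduces rather than eliminates the deadlock risk):
-- Hedged sketch: the trigger's UPDATE from the question, with the ROWLOCK hint applied.
UPDATE [TABLE_METADATA] WITH (ROWLOCK)
SET [TABLE_LAST_MODIFICATION] = GETDATE()
WHERE [TABLE_NAME] = N'MAIN_TABLE';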
More documentation can be found in the SQL Server table hints (Transact-SQL) documentation.

Updating different fields in different rows

I've tried to ask this question at least once, but I never seem to put it across properly. I really have two questions.
My database has a table called PatientCarePlans (ID, Name, LastReviewed, LastChanged, PatientID, DateStarted, DateEnded). There are many other fields, but these are the most important.
Every hour, a JSON extract gets a fresh copy of the data for PatientCarePlans, which may or may not differ from the existing records. That data is stored temporarily in PatientCarePlansDump. Unlike other tables, which rarely change (and when they do, only in one or two fields), this table has MANY fields which may now be different. Therefore, rather than simply copying the Dump rows to the live table based on whether each record already exists, my code does the no doubt wrong thing: it empties out any records from PatientCarePlans for that location, and then copies them all from the Dump table back to the live one. Since I don't know whether there are any changes, and there are far too many fields to check manually, I must assume that each record is different in some way or another and act accordingly.
My first question is how best (I have OKish basic knowledge, but this is essentially a useful hobby, and therefore have limited technical / theoretical knowledge) do I ensure that there is minimal disruption to the PatientCarePlans table whilst doing so? At present, my code is:
IF Object_ID('PatientCarePlans') IS NOT NULL
BEGIN
    BEGIN TRANSACTION
    DELETE FROM [PatientCarePlans] WHERE PatientID IN (SELECT PatientID FROM [Patients] WHERE location = #facility)
    COMMIT TRANSACTION
END
ELSE
    SELECT TOP 0 * INTO [PatientCarePlans]
    FROM [PatientCareplansDUMP]

INSERT INTO [PatientCarePlans] SELECT * FROM [PatientCarePlansDump]
DROP TABLE [PatientCarePlansDUMP]
My second question relates to how this process affects the numerous queries that run on and around the same time as this import. Very often those queries act as though there are no records in the PatientCarePlans table, which causes obvious problems. I'm vaguely aware of transaction locks etc., but it goes a bit over my head given the hobby status! How can I ensure that a query is executed and results returned whilst this process is taking place? Is there a more efficient or less obstructive method of updating the table, rather than simply removing the records and re-adding them? I know there are MERGE and UPDATE commands, but none of the examples seem to fit my issue, which only confuses me more!
Apologies for the lack of knowhow, though that of course is why I'm here asking the question.
Thanks
I suggest you do not delete and re-create the table. The DDL script to create the table should be part of your database setup, not part of regular modification scripts.
You are going to want to do the DELETE and INSERT inside a transaction. Preferably you would do this under SERIALIZABLE isolation in order to prevent concurrency issues. (You could instead use a WITH (TABLOCK) hint, which would be less likely to cause a deadlock, but would completely lock the table.)
SET XACT_ABORT, NOCOUNT ON; -- always set XACT_ABORT if you have a transaction
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN;
DELETE FROM [PatientCarePlans]
WHERE PatientID IN (
    SELECT p.PatientID
    FROM [Patients] p
    WHERE p.location = #facility
);

INSERT INTO [PatientCarePlans] (YourColumnsHere) -- always specify columns
SELECT YourColumnsHere
FROM [PatientCarePlansDump];

COMMIT;
You could also do this with a single MERGE statement. However, it is complex to write (owing to the need to restrict the set of rows being targeted), it is not usually more performant than separate statements, and it also needs SERIALIZABLE.
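For completeness, a MERGE version might look roughly like the sketch below. The join key (ID) and the column list are assumptions (ID is assumed not to be an IDENTITY column), it would still need to run under SERIALIZABLE inside the same transaction, and handling rows that should be deleted (NOT MATCHED BY SOURCE, restricted to the one facility) is the fiddly part and is omitted here:
-- Hedged sketch: column names and the ID join key are assumptions.
MERGE [PatientCarePlans] AS t
USING [PatientCarePlansDump] AS s
    ON t.ID = s.ID
WHEN MATCHED THEN
    UPDATE SET t.Name = s.Name,
               t.LastReviewed = s.LastReviewed,
               t.LastChanged = s.LastChanged
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ID, Name, LastReviewed, LastChanged, PatientID, DateStarted, DateEnded)
    VALUES (s.ID, s.Name, s.LastReviewed, s.LastChanged, s.PatientID, s.DateStarted, s.DateEnded);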

postgresql: delete all locks

Question: there is a table with over 9000 rows. It must be cleaned out, but without any locks (the table is in active use). I tried to use pg_advisory_unlock_all, but it had no effect.
select pg_advisory_unlock_all();

-- session 1:
start transaction;
delete from table where id='1';
DELETE 1

-- session 2:
start transaction;
delete from table where id='1';
-- (waits until the first transaction finishes)
There is no way to delete data from a table without locking the rows you want to delete.
That shouldn't be a problem as long as concurrent access doesn't try to modify these rows or insert new ones with id = '1', because writers never block readers and vice versa in PostgreSQL.
If concurrent transactions keep modifying the rows you want to delete, that's a little funny (why would you want to delete data you need?). You'd have to wait for the exclusive locks, and you might well run into deadlocks. In that case, it might be best to lock the whole table with the LOCK statement before you start. Deleting from a table that small should then only take a very short time.
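If you go down that route, a minimal sketch might look like this ("mytable" is a placeholder for the real table name):
-- Hedged sketch: "mytable" stands in for the real table name.
BEGIN;
-- EXCLUSIVE mode blocks concurrent writers until COMMIT; plain SELECTs still work.
LOCK TABLE mytable IN EXCLUSIVE MODE;
DELETE FROM mytable WHERE id = '1';
COMMIT;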

MS SQL table hints and locking, parallelism

Here's the situation:
An MS SQL 2008 database with a table that is updated approximately once a minute.
The table structure is similar to following:
[docID], [warehouseID], [docDate], [docNum], [partID], [partQty]
Typical working cycle:
User starts data exchange from in-house developed system:
BEGIN TRANSACTION
SELECT * FROM t1
WHERE [docDate] BETWEEN &DateStart AND &DateEnd
AND [warehouseID] IN ('w1','w2','w3')
...then the system performs rather long processing of the selected data, generates the list of [docID]s to delete from t1, and then runs:
DELETE FROM t1 WHERE [docID] IN ('d1','d2','d3',...,'dN')
COMMIT TRANSACTION
Here, the problem is that while the 1st transaction is processing the selected data, another one reads it too, and then they both load the same data into the in-house system.
At first, I added a (TABLOCKX) table hint to the SELECT query, and it worked pretty well until users started to complain about the system's performance.
Then I changed hints to (ROWLOCK, XLOCK, HOLDLOCK), assuming that it would:
exclusively lock...
selected rows (instead of whole table)...
until the end of transaction
But this seems to lock the whole table anyway. I have no access to the database itself, so I can't just analyze these locks (actually, I have no idea yet how to do that, even if I had access).
What I would like to have as a result:
users are able to process data related to different warehouses and dates in parallel
as a result of 1., avoid duplication of downloaded data
Apart from locks, the other solutions I have are (although they both seem clumsy):
Implement a flag in t1, showing that the data is under processing (and then do 'SELECT ... WHERE NOT [flag]')
Divide t1 into two parts: header and details, and apply locks separately.
I believe that I might have misunderstood some concepts with regard to transaction isolation levels and/or table hints, and that there is another (better) way.
Please advise!
You could change the workflow concept.
Instead of deleting records, update them by setting an extra field, Deprecated, from 0 to 1.
And read data not from the table but from a view where Deprecated = 0.
BEGIN TRANSACTION
SELECT * FROM vT1
WHERE [docDate] BETWEEN &DateStart AND &DateEnd
AND [warehouseID] IN ('w1','w2','w3')
where the vT1 view looks like this:
select *
from t1
where Deprecated = 0
And the deletion will look like this:
UPDATE t1 SET Deprecated = 1 WHERE [docID] IN ('d1','d2','d3',...,'dN')
COMMIT TRANSACTION
Using such a concept you will achieve two goals:
decrease the probability of locks
keep a history of movements in the warehouses
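The one-time setup for this approach might look something like the sketch below (the bit type and the default value are assumptions):
-- Hedged sketch: the column type and default are assumptions.
ALTER TABLE t1 ADD Deprecated bit NOT NULL DEFAULT 0;
GO
CREATE VIEW vT1
AS
SELECT *
FROM t1
WHERE Deprecated = 0;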

Can an UPDATE of a Transaction T1 run at the same time with T2 that does a SELECT get to those rows before the Select does?

Let's say we have a table with many rows and a primary key (index).
T1 will do a SELECT that would search for some rows using the WHERE clause, locking them with Shared locks. At the same time, T2 will do an update on a row that falls into the range of T1's requested rows.
The question is, can the Update get to those rows before the Select does?
How does the engine lock rows when selecting? One by one, as in: read this row, lock it, move to the next, etc.? In that case, might the Update get to some rows before the Select reaches them? And what if no index is used but a table scan instead?
The Update statement has a Select component too. How does the Update actually lock a row?
One by one: first read it, then lock it with X, then the next one, etc.? In this scenario, could the Select get to some rows before the Update does?
And is the Select part of the Update affected by the isolation level?
The question targets traditional ANSI isolation systems, not Oracle/MVCC.
There are quite a few questions here, but I'll try to address some of them.
Both SELECT and UPDATE will lock as they go through the index or records in the table. Records already locked by SELECT will not be available for UPDATE and the other way round. This may even cause a deadlock, depending on the order of those operations (which is beyond your control).
If you need the update to happen before the select, you have to control that at the application level. If you start both at once, SQL Server will just start executing them and taking locks.
SELECT is affected by the isolation level; e.g. when your isolation level is READ UNCOMMITTED, the SELECT will read the data without taking any shared locks.
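As a small illustration of those points (a sketch; the table and column names are invented):
-- Session A: a range SELECT. Under the default READ COMMITTED it takes shared
-- locks row by row as it reads and releases them quickly; under REPEATABLE READ
-- or SERIALIZABLE it holds them to the end of the transaction.
SELECT * FROM Orders WHERE OrderDate >= '2020-01-01';

-- Session B: an UPDATE touching a row in that range. It also works row by row
-- (U lock to read, converted to X to modify), so which statement "gets there
-- first" depends on the order in which they reach the rows; opposite orders can deadlock.
UPDATE Orders SET Status = 'Shipped' WHERE OrderID = 42;

-- If session A runs under READ UNCOMMITTED, its SELECT takes no shared locks:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM Orders WHERE OrderDate >= '2020-01-01';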

TSQL Snapshot Isolation

Using SQL2k5, I have a staging table that contains columns that will populate numerous other tables. For instance, a statement like this:
INSERT INTO [appTable1] ([colA], [colB])
SELECT [appTable1_colA], [appTable1_colB]
FROM [stageTable]
A trigger on [appTable1] will then populate the identity column values of the newly inserted rows back into [stageTable]; for this example, we'll say it's [stageTable].[appTable1_ID], which is then inserted into other tables as a FK. More similar statements follow, like:
INSERT INTO [appTable2] ([colA], [colB], [colC], [appTable1_FK])
SELECT [appTable2_colA], [appTable2_colB], [appTable2_colC], [appTable1_ID]
FROM [stageTable]
This process continues through numerous tables like this. As you can see, I'm not including a WHERE clause on the SELECTs from the staging table, as this table gets truncated at the end of the process. However, this leaves the possibility of another process adding records to the staging table in the middle of this transaction; those records would not contain the FKs populated earlier. Would I want to issue this statement to prevent that?:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
If this is the best solution, what are the downsides of doing it this way?
Can you add a batch id to your staging table, so that you can use it in where clauses to ensure that you are only working on the original batch of records? Any process that adds records to the staging table would have to use a new, unique batch id. This would be more efficient (and more robust) than depending on snapshot isolation, I think.
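A rough sketch of that idea (the BatchID column, its type, and the @BatchID variable are assumptions):
-- Hedged sketch: the BatchID column and @BatchID variable are assumptions.
ALTER TABLE [stageTable] ADD BatchID uniqueidentifier NULL;
GO
DECLARE @BatchID uniqueidentifier;
SET @BatchID = NEWID();

-- Whatever loads the staging table stamps its rows with the batch id ...
UPDATE [stageTable] SET BatchID = @BatchID WHERE BatchID IS NULL;

-- ... and every statement in the process filters on it.
INSERT INTO [appTable1] ([colA], [colB])
SELECT [appTable1_colA], [appTable1_colB]
FROM [stageTable]
WHERE BatchID = @BatchID;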
All isolation levels, including snapshot, affect only reads. SELECTs from stageTable will not see uncommitted inserts, nor will they block. I'm not sure that solves your problem of throwing everything into stageTable without any regard for ownership. What happens when the transaction finally commits and stageTable is left with all the intermediate results, ready to be read by the next transaction? Perhaps you should use a temporary #stageTable, which would ensure natural isolation between concurrent threads.
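A sketch of that temporary-table variant (column names and types are assumptions, and the identity write-back trigger would also have to work against the temp table):
-- Hedged sketch: each run loads its own session-local staging table, which no
-- other connection can see or add rows to. Column names/types are assumptions.
CREATE TABLE #stageTable (
    appTable1_colA nvarchar(100) NULL,
    appTable1_colB nvarchar(100) NULL,
    appTable1_ID   int NULL
);

-- ... populate #stageTable for this run only, then drive the inserts from it:
INSERT INTO [appTable1] ([colA], [colB])
SELECT [appTable1_colA], [appTable1_colB]
FROM #stageTable;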
To understand the cost of using Snapshot isolation, read Row Versioning Resource Usage:
extra space consumed in tempdb
extra space consumed in each row in the data table
extra space consumed in BLOB storage for large fields
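One practical note if you do use snapshot isolation: it has to be enabled at the database level before SET TRANSACTION ISOLATION LEVEL SNAPSHOT will work (YourDatabase is a placeholder):
-- Enables row versioning so sessions may request SNAPSHOT isolation
-- (this is what incurs the tempdb and per-row overhead listed above).
ALTER DATABASE YourDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;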