Concurrency during long running update in TSQL - sql

Using Sql Server 2005. I have a long running update that may take about 60 seconds in our production environment. The update is not part of any explicit transactions nor has any sql hints. While the update is running, what's to be expected from other requests that occur on those rows that will be updated? There's about 6 million total rows in the table that will be updated of which about 500,000 rows will be updated.
Some concurrency concerns/questions:
1) What if another select query (with nolock hint) is performed on this table among some of the rows that are being updated. Will the query wait until the update is finished?
2) What the other select query does not have a nolock hint? Will this query have to wait until the update is finished?
3) What if another update query is performing an update on one of these rows? Will this query have to wait until it's finished?
4) What about deletes?
5) What about inserts?
Thanks!
Dave

Every statement in sql server runs in a transaction. If you don't explicitly start one, the server starts one for every statement and commits it if the statement is successful, and rolls it back if it is not.
The exact locking you'll see with your update, unfortunately, depends. It will start off as row locks, but it is likely that it will be elevated to at least a few page locks based on the number of rows you're updating. Full elevation to a table lock is unlikely, but this depends in some amount on your server - SQL Server will elevate it if the page locks are using too much memory.
If your select is ran with nolock then you will get dirty reads if you happen to select any rows which are involved in the update. This means you will read the uncommitted data, and it may not be consistent with other rows (since those may not have been updated yet).
For all other cases if your statement encounters a row involved in the update, or a row on a locked page (assuming the lock has been elevated) then the statement will have to wait for the update to finish.

Related

How can I read dirty values in SQL UPDATE statement WHERE clause

Let's assume I have the following query in two separate SSMS query windows:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
BEGIN TRANSACTION
UPDATE dbo.Jobs
SET [status] = 'Running'
OUTPUT Inserted.*
WHERE [status] = 'Waiting'
--I'm NOT committing yet
--Commit Transaction
I run query window 1 (but do not commit), and then I run query window 2.
I want for query window 2 to immediately update only rows that were inserted after I started query 1 (all new records come in with a status of 'Waiting'). However, SQL Server is waiting for the first query to finish, because in an update statement it's not reading dirty values (even if it's set to READ UNCOMMITTED);
Is there a way to overcome this?
In my application I will have 2 (or more) processes running it, I want that process 2 should be able to pickup the rows that process 1 have not picked up; I don't want that process 2 should need to wait until process 1 is finish
What you are asking for is simply impossible.
Even at the lowest isolation level of READ UNCOMMITTED (aka NOLOCK), an X-Lock (exclusive) must be taken in order to make modifications. In other words, writes are always locked, even if the reads that fetched those rows were not locked.
So even though session 2 is running under READ UNCOMMITTED also, if it wants to do a modification it must also take an X-Lock, which is incompatible with the first X-Lock.
The solution here is to either do this in one session, or commit immediately. In any case, do not hold locks for any length of time, as it can cause massive blocking chains and even deadlocks.
If you want to just ignore all those rows which have been inserted, you could use the WITH (READPAST) hint.
READ UNCOMMITTED as an isolation level or as a hint has huge issues.
It can cause anything from deadlocks to completely incorrect results. For example, you could read a row twice, or not at all, when by the logical definition of the schema there should have been exactly one row. You could read entire pages twice or not at all.
You can get deadlocks due to U-Locks not being taken in UPDATE and DELETE statements.
And you still take schema locks, so you can still get stuck behind a synchronous statistics update or an index rebuild.

Understanding locks and query status in Snowflake (multiple updates to a single table)

While using the python connector for snowflake with queries of the form
UPDATE X.TABLEY SET STATUS = %(status)s, STATUS_DETAILS = %(status_details)s WHERE ID = %(entry_id)s
, sometimes I get the following message:
(snowflake.connector.errors.ProgrammingError) 000625 (57014): Statement 'X' has locked table 'XX' in transaction 1588294931722 and this lock has not yet been released.
and soon after that
Your statement X' was aborted because the number of waiters for this lock exceeds the 20 statements limit
This usually happens when multiple queries are trying to update a single table. What I don't understand is that when I see the query history in Snowflake, it says the query finished successfully (Succeded Status) but in reality, the Update never happened, because the table did not alter.
So according to https://community.snowflake.com/s/article/how-to-resolve-blocked-queries I used
SELECT SYSTEM$ABORT_TRANSACTION(<transaction_id>);
to release the lock, but still, nothing happened and even with the succeed status the query seems to not have executed at all. So my question is, how does this really work and how can a lock be released without losing the execution of the query (also, what happens to the other 20+ queries that are queued because of the lock, sometimes it seems that when the lock is released the next one takes the lock and have to be aborted as well).
I would appreciate it if you could help me. Thanks!
Not sure if Sergio got an answer to this. The problem in this case is not with the table. Based on my experience with snowflake below is my understanding.
In snowflake, every table operations also involves a change in the meta table which keeps track of micro partitions, min and max. This meta table supports only 20 concurrent DML statements by default. So if a table is continuously getting updated and getting hit at the same partition, there is a chance that this limit will exceed. In this case, we should look at redesigning the table updation/insertion logic. In one of our use cases, we increased the limit to 50 after speaking to snowflake support team
UPDATE, DELETE, and MERGE cannot run concurrently on a single table; they will be serialized as only one can take a lock on a table at at a time. Others will queue up in the "blocked" state until it is their turn to take the lock. There is a limit on the number of queries that can be waiting on a single lock.
If you see an update finish successfully but don't see the updated data in the table, then you are most likely not COMMITting your transactions. Make sure you run COMMIT after an update so that the new data is committed to the table and the lock is released.
Alternatively, you can make sure AUTOCOMMIT is enabled so that DML will commit automatically after completion. You can enable it with ALTER SESSION SET AUTOCOMMIT=TRUE; in any sessions that are going to run an UPDATE.

How select for update works for queries with multiple rows/results?

So given this transaction:
select * from table_a where field_a = 'A' for update;
Assuming this gives out multiple rows/results, will the database lock all the results right off the bat? Or will it lock it one row at a time.
If the later is true, does that mean running this query concurrently, can result in a deadlock?
Thus, adding an order by to maintain consistency on the order is needed to solve this problem?
The documentation explains what happens as follows:
FOR UPDATE
FOR UPDATE causes the rows retrieved by the SELECT statement to be
locked as though for update. This prevents them from being locked,
modified or deleted by other transactions until the current
transaction ends. That is, other transactions that attempt UPDATE,
DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE
or SELECT FOR KEY SHARE of these rows will be blocked until
the current transaction ends; conversely, SELECT FOR UPDATE will
wait for a concurrent transaction that has run any of those commands
on the same row, and will then lock and return the updated row (or no
row, if the row was deleted). Within a REPEATABLE READ or
SERIALIZABLE transaction, however, an error will be thrown if a row
to be locked has changed since the transaction started. For further
discussion see Section 13.4.
The direct answer to your question is that Postgres cannot lock all the rows "right off the bat"; it has to find them first. Remember, this is row-level locking rather than table-level locking.
The documentation includes this note:
SELECT FOR UPDATE modifies selected rows to mark them locked, and so
will result in disk writes.
I interpret this as saying that Postgres executes the SELECT query and as it finds the rows, it marks them as locked. The lock (for a given row) starts when Postgres identifies the row. It continues until the end of the transaction.
Based on this, I think it is possible for a deadlock situation to arise using SELECT FOR UPDATE.

How to remove deadlocks in SQL Server 2005?

First of all I would like to know what is the actual root cause of deadlocks in SQL Server 2005. Is it because when two processes access the same row in a table?
Anyways, consider two tables _Table_Now_ and _Table_History_ where both having the same structure.
Suppose there is one column called NAME.
So when one process tries to UPDATE a record with NAME='BLUE' in _Table_Now_, first, it need to put the present row with NAME='BLUE' into _Table_History_ then update
_Table_Now_, and also delete previously present row from _Table_History_.
Deadlock occurs while deleting. I do not understand why?
Please guide me!
deadlock basically mean when process A is dependent on process B and process B is dependent on process A, so A will just start\continue when B finishes and B will only start\continue when A finishes
what you may be experiencing are table (or row) lock, so SQL locks the row before updating the table to make sure no other process tries to access that row while it is doing the update.
Can you be more specific on how are you doing the insert\update\delete. You shouldnt have deadlocks in this scenario.
FYI, don't use with (NOLOCK). It will yes prevent from locking but it does so by telling SQL Server to read uncommitted data, and it can end up in data inconsistencies.
Deadlock occurs when Process A is waiting for Process B to release resources and Process B is waiting for Process A to release resources.
If I understand the order of Updates correctly, it is this:
1. Read a row in Table_Now
2. Update a row in Table_History
3. Update a row in Table_Now
4. Delete a row in Table_History.
This could be a risky order if you are using transactions or locks incorrectly.
To avoid deadlocks, for each process you should execute:
1. Begin Transaction (Preferably table lock)
2. Perform all the DB operations
3. Commit the transaction (or Rollback in case any problem occurs while DB update)
This will ensure each process to lock both the tables, perform all the operations and then exit.
If you are already using transactions, what scope and level are you using? If not, introduce transactions. It should solve the problem.

SQL Server row lock

How to row lock in SQL Server 2005. I execute a sql for row locking and that is
SELECT *
FROM authors
WITH (HOLDLOCK, ROWLOCK)
WHERE au_id = '274-80-9391'
it work fine but in this case row is lock for update not for selection. I just want to know how to lock a row as a result another user can not see that row when issue a SQL in SQL Server. please guide me. thanks
You can't hide a row so that it won't be seen by other SQL queries. If you open a transaction, lock a row, and hold open the transaction, you can cause other SQL queries to block waiting for your transaction to end and the lock to clear. However if the queries are run with a different transaction isolation level (e.g.: Read Uncommitted) then they will bypass the lock and still see that row's values.
If you want to skip rows that are locked you can use the READPAST hint in SQL Server. This needs to be specified on the query that is reading the locked rows, not the query that's locking them. From Books Online:
Specifies that the Database Engine not read rows that are locked by
other transactions. When READPAST is specified, row-level locks are
skipped.
A SELECT statement with a READPAST hint will return all the non-locked rows and skip the locked rows.