How does SQL Server lock rows based on a query's WHERE clause

There are two SQL sessions.
Session 1 starts first:
update table1
set status = 3
where status = 2
Session 2 starts second, while session 1 is still running:
update table1
set status = 4
where status = 2
Sessions 1 and 2 finish.
Is it possible that one record ends up with status 3 and another with status 4, or will every updated record always have status 3?
Does the SQL engine first lock all the rows that match the WHERE clause, so that another statement has to wait for those locks, or is each record locked only as it is read?
In other words, is the lock taken on the set of rows matched by the WHERE clause, so that another statement has to acquire locks on that same set?

This is too long for a comment.
You should read the documentation that SQL Server provides on locking and transactions. Here it is.
Under normal circumstances, databases are ACID-compliant for their transactions. In SQL Server, each update statement is a transaction, so it either completes or does not. Hence, I would expect that under normal circumstances, SQL Server would set all the values to either 3 or all to 4, but not both.
The semantics of transactions are then complicated by the real world and, in particular, by different locking schemes. For instance, you can set the isolation level to allow dirty reads. This is why I'm pointing you to the documentation: there is a basic principle of ACID-ness, and beyond that there are a lot of database specifics.
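If you want to see the locking for yourself, here is a rough sketch (it reuses the table1/status names from the question and assumes SQL Server 2005 or later for the sys.dm_tran_locks view):
-- Session 1: take the update's locks but hold them open
BEGIN TRANSACTION;
UPDATE table1 SET status = 3 WHERE status = 2;
-- In a third window, list the locks currently held.
-- Expect X (exclusive) key/RID locks on the qualifying rows and
-- IX (intent exclusive) locks on their pages and on the table.
SELECT request_session_id, resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID();
-- Session 2: this statement blocks until session 1 commits or rolls back
UPDATE table1 SET status = 4 WHERE status = 2;
-- Back in session 1:
COMMIT;
Once session 1 commits, session 2 re-reads the previously blocked rows, finds their status is now 3 rather than 2, and in the common case updates nothing, which is why you do not end up with a mixture of 3s and 4s.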


How can I read dirty values in SQL UPDATE statement WHERE clause

Let's assume I have the following query in two separate SSMS query windows:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
BEGIN TRANSACTION
UPDATE dbo.Jobs
SET [status] = 'Running'
OUTPUT Inserted.*
WHERE [status] = 'Waiting'
--I'm NOT committing yet
--Commit Transaction
I run query window 1 (but do not commit), and then I run query window 2.
I want query window 2 to immediately update only the rows that were inserted after I started query 1 (all new records come in with a status of 'Waiting'). However, SQL Server waits for the first query to finish, because an UPDATE statement does not read dirty values (even when it is set to READ UNCOMMITTED).
Is there a way to overcome this?
In my application I will have two (or more) processes running this. I want process 2 to be able to pick up the rows that process 1 has not picked up; I don't want process 2 to have to wait until process 1 is finished.
What you are asking for is simply impossible.
Even at the lowest isolation level of READ UNCOMMITTED (aka NOLOCK), an X-Lock (exclusive) must be taken in order to make modifications. In other words, writes are always locked, even if the reads that fetched those rows were not locked.
So even though session 2 is running under READ UNCOMMITTED also, if it wants to do a modification it must also take an X-Lock, which is incompatible with the first X-Lock.
The solution here is to either do this in one session, or commit immediately. In any case, do not hold locks for any length of time, as it can cause massive blocking chains and even deadlocks.
If you just want to skip the rows that another session has locked, you could use the WITH (READPAST) hint.
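Applied to the statement from your question, that might look like this (a sketch only; verify the behaviour against your own schema):
-- Skip rows another session already has locked instead of waiting for them
UPDATE dbo.Jobs WITH (ROWLOCK, READPAST)
SET [status] = 'Running'
OUTPUT Inserted.*
WHERE [status] = 'Waiting';
Each process then claims only the 'Waiting' rows that nobody else is currently touching.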
READ UNCOMMITTED as an isolation level or as a hint has huge issues.
It can cause anything from deadlocks to completely incorrect results. For example, you could read a row twice, or not at all, when by the logical definition of the schema there should have been exactly one row. You could read entire pages twice or not at all.
You can get deadlocks due to U-Locks not being taken in UPDATE and DELETE statements.
And you still take schema locks, so you can still get stuck behind a synchronous statistics update or an index rebuild.

Check if table data has changed?

I am pulling the data from several tables and then passing the data to a long running process. I would like to be able to record what data was used for the process and then query the database to check if any of the tables have changed since the process was last run.
Is there a method of solving this problem that should work across all sql databases?
One possible solution that I've thought of is having a separate table that is only used for keeping track of whether the data has changed since the process was run. The table contains a "stale" flag. When I start running the process, stale is set to false. If any creation, update, or deletion occurs in any of the tables on which the operation depends, I set stale to true. Is this a valid solution? Are there better solutions?
One concern with my solution is situations like this:
One user starts inserting a new row into one of the tables. Stale gets set to true, but the new row has not actually been added yet. Another user has simultaneously started the long running process, pulling the data from the tables and setting the flag to false. The row is finally added. Now the data used for the process is out of date but the flag indicates it is not stale. Would transactions be able to solve this problem?
EDIT:
This is some SQL for my idea. Not sure if it works, but just to give you a better idea of what I was thinking:
-- First transaction reads the data and sets the flag to false
BEGIN TRANSACTION
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
UPDATE flag SET stale = false
SELECT * FROM DATATABLE
COMMIT TRANSACTION
-- Second transaction updates the data and sets the flag to true
BEGIN TRANSACTION
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
UPDATE data SET val = 15 WHERE ID = 10
UPDATE flag SET stale = true
COMMIT TRANSACTION
I do not have much experience with transactions or hand-writing SQL, so there are probably issues with this. From what I understand, two serializable transactions cannot be interleaved. Please correct me if I'm wrong.
Is there a way to accomplish this with only the first transaction? The process will be run rarely, but the updates to the data table will occur more frequently, so it would be nice to not lock up the data table when performing updates.
Also, is the SET TRANSACTION ISOLATION syntax specific to MS?
The stale flag will probably work, but a timestamp would be better since it provides more metadata about the age of the records which could be used to tune your queries, e.g., only pull data that is over 5 minutes old.
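For example (SQL Server syntax; last_updated is a column you would have to add and maintain yourself):
-- Only pull rows whose last change is at least five minutes old
SELECT *
FROM DATATABLE
WHERE last_updated < DATEADD(MINUTE, -5, GETUTCDATE());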
To address your concern about inserting a row at the same time a query is run, transactions with an appropriate isolation level will help. For row inserts, updates, and selects, at least use a transaction with an isolation level that prevents dirty reads so that no other connections can see the updated data until the transaction is committed.
If you are strongly concerned about the case where an update happens at the same time as a record pull, you could use the REPEATABLE READ or even SERIALIZABLE isolation levels, but this will slow DB access down.
Your SQL Server sample should work. For other databases, here's an example that works in PostgreSQL:
Transaction 1
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- run queries that update the tables, then set last_updated column
UPDATE sometable SET last_updated = now() WHERE id = 1;
COMMIT;
Transaction 2
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- select data from tables, then set last_queried column
UPDATE sometable SET last_queried = now() WHERE id = 1;
COMMIT;
If transaction 1 starts, and then transaction 2 starts before transaction 1 has completed, transaction 2 will block on the update, and will then throw an error when transaction 1 is committed. If transaction 2 starts first, and transaction 1 starts before that has finished, then transaction 1 will error. Your application code or process should be able to handle those errors.
Other databases use similar syntax - MySQL (with InnoDB plugin) requires you to set the isolation level before you start the transaction.
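In MySQL/InnoDB that would look roughly like this (an untested sketch, reusing the same table and column names):
-- MySQL: the isolation level must be set before the transaction begins
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
UPDATE sometable SET last_queried = NOW() WHERE id = 1;
COMMIT;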

Optimistic locking on writes in Oracle

I've noticed when I do the following:
update t set x = 1 where x = 0; -- (1) session 1
update t set x = 2 where x = 0; -- (2) session 2
commit;                         -- (3) session 2
At line (2), session 2 waits for session 1 to commit.
The issue is that I may have lots of users working on this table, and I don't want one of them holding a session open to block all the other users.
Ideally, I'd like the session 2 commit to succeed, and then session 1 to throw an error when it attempts to commit, as described here: Optimistic concurrency control.
Is there a way to get Oracle to behave in this way? (I'm using Oracle 10g if that makes a difference).
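For reference, the pattern described in that link is usually implemented with a version column and a write that is conditional on it; a rough sketch (the version column and the id key are things you would have to add yourself, Oracle does not manage them for you):
-- Read the row and remember its version, then make the write conditional on it
UPDATE t
SET    x = 1,
       version = version + 1
WHERE  id = :id
AND    version = :version_read_earlier;
-- If no rows were updated (SQL%ROWCOUNT = 0 in PL/SQL), someone else got
-- there first: report a conflict to the caller instead of blocking.
Alternatively, a SELECT ... FOR UPDATE NOWAIT before the write makes the second session fail immediately with ORA-00054 instead of queueing behind the first one.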
Rationale and (perhaps bad) solution
I have a nightly replication process that can affect rows on the table. I don't want this to be blocked by a user who leaves open a session on one of those rows.
The only way I can think of to work around this is to not give users direct table access, but instead create an updatable view or PL/SQL functions that write to a temporary table, and then provide the user with a procedure that performs the actual writes and commits. That way, rows are locked only during the execution of the "commit procedure", which would be brief.
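A rough sketch of that "commit procedure" idea (the procedure name and the id column are made up for illustration):
-- The row is locked only for the duration of this call, not for however
-- long a user leaves a session open.
CREATE OR REPLACE PROCEDURE apply_change(p_id IN NUMBER, p_x IN NUMBER) AS
BEGIN
  UPDATE t SET x = p_x WHERE id = p_id;
  COMMIT;
END apply_change;
/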
I'd like a solution that is something like this but preferably easier.

Very Slow SQL Update using a Linked Server

I have a SQL script running on a server (ServerA).
This server has a linked server set up (ServerB), which is located off-site in a datacenter.
This query runs relatively speedily:
SELECT OrderID
FROM [ServerB].[DBName].[dbo].[MyTable]
WHERE Transferred = 0
However, when updating the same table using this query:
UPDATE [ServerB].[DBName].[dbo].[MyTable]
SET Transferred = 1
It takes more than 1 minute to complete (even if there is only 1 row where Transferred = 0).
Is there any reason this would be acting so slowly?
Should I have an index on MyTable for the "Transferred" column?
If you (meaning SQL Server) cannot use an index on the remote side to select the records, such a remote update actually reads all the records (the primary key and any other needed fields) from the remote side, updates them locally, and sends the updated records back. If your link is slow (say 10 Mbit/s or less), this scenario takes a lot of time.
I've used a stored procedure on the remote side; that way you only call the procedure remotely (with a set of optional parameters). If your updatable subset is small, proper indexes may help too, but a stored procedure is usually faster.
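A sketch of that idea (the procedure name is made up; the linked server also needs RPC Out enabled for the remote call to work):
-- On ServerB, in DBName: do the filtering and updating locally
CREATE PROCEDURE dbo.MarkTransferred
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE dbo.MyTable
    SET Transferred = 1
    WHERE Transferred = 0;   -- only rows that still need it
END;
GO
-- On ServerA: only the call crosses the link
EXEC [ServerB].[DBName].[dbo].[MarkTransferred];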
UPDATE [ServerB].[DBName].[dbo].[MyTable]
SET Transferred = 1
WHERE Transferred = 0 -- missing this condition?
How often is this table being used?
If this table is used by many users at the same time, you may have a locking/blocking problem.
Every time a process updates a table without filtering the records, the entire table is locked by the transaction, and the other processes that need to update the table are left waiting.
In this case, you may be waiting for some other process to unlock the table.

Concurrency during long running update in TSQL

Using SQL Server 2005. I have a long-running update that may take about 60 seconds in our production environment. The update is not part of any explicit transaction and does not use any SQL hints. While the update is running, what should be expected for other requests that touch the rows being updated? The table has about 6 million rows in total, of which about 500,000 will be updated.
Some concurrency concerns/questions:
1) What if another select query (with a NOLOCK hint) is run against this table and touches some of the rows that are being updated? Will the query wait until the update is finished?
2) What if the other select query does not have a NOLOCK hint? Will that query have to wait until the update is finished?
3) What if another update query is performing an update on one of these rows? Will this query have to wait until it's finished?
4) What about deletes?
5) What about inserts?
Thanks!
Dave
Every statement in SQL Server runs in a transaction. If you don't explicitly start one, the server starts one for each statement, commits it if the statement succeeds, and rolls it back if it does not.
The exact locking you'll see with your update, unfortunately, depends. It will start off with row locks, but it is likely to escalate to at least some page locks given the number of rows you're updating. Full escalation to a table lock is unlikely, but that depends to some extent on your server: SQL Server escalates when the finer-grained locks are using too much memory.
If your select is run with NOLOCK, you will get dirty reads if you happen to select any rows involved in the update. This means you will read the uncommitted data, and it may not be consistent with other rows (since those may not have been updated yet).
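For example (a sketch; the table and column names are placeholders, not your schema):
-- Can return uncommitted values from the in-flight update, without blocking
SELECT * FROM dbo.BigTable WITH (NOLOCK) WHERE SomeCol = 1;
-- Without the hint, the same query waits on the update's exclusive locks
SELECT * FROM dbo.BigTable WHERE SomeCol = 1;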
In all other cases, if your statement encounters a row involved in the update, or a row on a locked page (assuming the lock has been escalated), the statement will have to wait for the update to finish.
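A common way to keep those waits short is to break the 500,000-row update into smaller batches so each one commits and releases its locks quickly; a rough sketch with placeholder table and column names:
DECLARE @NewValue int;
SET @NewValue = 1;
WHILE 1 = 1
BEGIN
    -- Each iteration is its own implicit transaction, so its row/page
    -- locks are held only for the few thousand rows in that batch.
    UPDATE TOP (5000) dbo.BigTable
    SET SomeCol = @NewValue
    WHERE SomeCol <> @NewValue;
    IF @@ROWCOUNT = 0 BREAK;
END;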