Is a half-written values reading prevented when SELECT WITH (NOLOCK) hint?


I think this is a cross-DBMS question, although I phrase it in SQL Server terminology.
Having read the MSDN documentation (for example, [ 1 ]), I could not understand the following:
Is it possible to SELECT half-written (partially overwritten, updated, deleted, or inserted) values WITH (NOLOCK)? If not, how is reading half-written values prevented when no locks are respected?
Reading a half-written value violates which DBMS principle?
I am having difficulty identifying the term for it (is it a break of consistency, or of integrity?).
What is the name of the corresponding term?
Update:
I removed the questions about UPDATE (DELETE) WITH (NOLOCK) from this post.
The MSDN docs (for example, [ 1 ]) and multiple articles say that SELECT WITH (NOLOCK) is the same as READUNCOMMITTED and that "No shared locks are issued to prevent other transactions from modifying data read by the current transaction, and exclusive locks set by other transactions do not block the current transaction from reading the locked data".
Do I understand correctly that the DBMS ensures that only completely written (committed or not) values can be read?
How is that ensured if no locks are used or respected?
This question is not about which transaction can read what and when, but about how reading incompletely written values is prevented.
Update2:
Since this question started being downvoted and closed, I moved the questions on UPDATE (DELETE) WITH (NOLOCK) to the MSDN forum:
What is the meaning of UPDATE and DELETE WITH(NOLOCK) statements?
I also repeated this same question on the MSDN forum:
Is a half-written values reading prevented WITH (NOLOCK) hint?
where it caused complete confusion.
So why is it being closed here as not having an answer?
It is a very basic, fundamental concept with a simple answer, and database developers and DBAs must understand it clearly.
[ 1 ] Table Hints (Transact-SQL), SQL Server 2008 R2, http://msdn.microsoft.com/en-us/library/ms187373.aspx

Oracle doesn't allow dirty reads under any circumstances (i.e., reading another session's uncommitted values). It violates 'Isolation' (the I in ACID) for the database and potentially gives an apparent inconsistency for the reading operation [e.g., seeing a child record without a parent].
There are two mechanisms in play.
Firstly, each record has a lock byte indicating whether it is currently locked or not. The value of the byte points to a transaction in the block header, so a session can determine whether the lock is its own or belongs to another session.
If a read sees that the byte is set then it uses a pointer in the block header to find an older version of the block. If it is still locked, it keeps following the pointers until it gets to a version of the block where the record shows as unlocked. Then it returns the value.
The same mechanism is also used for time based consistency. If a select started at 3pm and it finds a block modified at 3:02 pm then it follows the history back to find the version of the block that was current at 3:00. It may then find that the record it wants was locked at 3:00pm [it may have been committed at 3:01pm] and has to go back further to see what the committed value was at 3:00pm.
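A minimal sketch of the version-chain walk described above, assuming a simplified per-row model: real Oracle reconstructs whole block versions from undo, and `RowVersion`, `consistent_read` and the integer change numbers (clock times here) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RowVersion:
    value: str
    commit_scn: Optional[int]              # None while the writer has not committed ("lock byte" set)
    older: Optional["RowVersion"] = None   # pointer to the previous version, as kept in undo

def consistent_read(newest: RowVersion, query_scn: int) -> Optional[str]:
    """Follow the version chain back until a version committed at or before query_scn is found."""
    version: Optional[RowVersion] = newest
    while version is not None:
        if version.commit_scn is not None and version.commit_scn <= query_scn:
            return version.value
        version = version.older          # locked or too new: go one version further back
    return None                          # the row did not exist as of query_scn

# One row's history, using clock times as change numbers: committed at 2:55pm,
# changed and committed at 3:01pm, then changed again by a still-open transaction.
v1 = RowVersion("A", commit_scn=1455)
v2 = RowVersion("B", commit_scn=1501, older=v1)
v3 = RowVersion("C", commit_scn=None, older=v2)

as_of_3pm = consistent_read(v3, query_scn=1500)   # "A": the committed value at 3:00pm
as_of_now = consistent_read(v3, query_scn=1503)   # "B": the uncommitted "C" is skipped
print(as_of_3pm, as_of_now)
```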
The other protection mechanism is a latch. When a session reads a block, it takes a latch on it for the duration of the read. This prevents another process (potentially running on another CPU) from accessing the block during the read (i.e., process A cannot set the lock byte at the same time as process B is reading the block; it has to wait until the read is finished). These latches are very low-level CPU operations and are only held for very short durations. On a single-core/CPU box, latching isn't necessary, as there's only one core, so only one thread can execute at a time anyway.
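The latch idea can be sketched with Python's `threading.Lock` standing in for a latch. The `Block` class is hypothetical; the point is only that because the reader copies the block while holding the same short-term latch the writer uses, it never observes a block with some cells rewritten and others not.

```python
import threading
from typing import Set

class Block:
    """Toy data block whose contents are rewritten in many small steps."""
    def __init__(self, size: int) -> None:
        self.cells = ["A"] * size
        self.latch = threading.Lock()  # held only while the block is being written or copied

    def write(self, letter: str) -> None:
        with self.latch:
            for i in range(len(self.cells)):  # a multi-step write, invisible to latched readers
                self.cells[i] = letter

    def read(self) -> str:
        with self.latch:
            return "".join(self.cells)

def run(block: Block, writes: int = 200, reads: int = 200) -> Set[str]:
    seen: Set[str] = set()

    def writer() -> None:
        for n in range(writes):
            block.write("AB"[n % 2])

    def reader() -> None:
        for _ in range(reads):
            seen.add(block.read())

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen

observed = run(Block(1000))
print(all(len(set(v)) == 1 for v in observed))  # True: only all-"A" or all-"B" blocks were seen
```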

After some contemplation and experimenting, I believe that DBMSes themselves do not provide such value integrity:
Had we assumed such a possibility, we would immediately conclude that the same transaction, within the same transaction scope, could use incompletely defined (written, inserted, updated, deleted) values, which really never happens;
Values are used not only by the DBMS but also outside of it, by the operating system and other software frameworks;
It is most probably a transaction feature implemented at the operating system level, or even lower (at the machine or hardware level).
Update:
Gary's answer supports this:
"These latches are very low level CPU operations and are only held for very short durations."
However, I did not intend to mix the discussion of DBMS transaction isolation phenomena with the low-level transaction support provided by hardware.
Update2:
And what would be a better, distinctive name for the corresponding low-level (hardware transaction support) term, so as to avoid confusing it with DBMS transaction terminology? Is it value consistency or value integrity?
Update3:
Suppose I have a 2 GB string in an nvarchar(max) column on a machine with 1 GB of RAM. How would the CPU provide integrity for this value at the hardware level?
The answer by Razvan Socol in the MSDN thread Is a half-written values reading prevented WITH (NOLOCK) hint? gives a script that catches reads of partially updated values. That site is currently down, so I reproduce the code here:
1) Create a Test table and fill it with 10 rows:
if object_id('Test') IS not NULL
drop table Test;
CREATE TABLE Test (
ID int IDENTITY PRIMARY KEY,
Txt nvarchar(max) NOT NULL
)
GO
-----------
INSERT INTO Test
SELECT REPLICATE(CONVERT(nvarchar(max),
CHAR(65+ABS(CHECKSUM(NEWID()))%26)),100000)
GO 10
---
2) In the first session (SSMS tab), launch a time-consuming update:
UPDATE Test
SET Txt=REPLICATE(CONVERT(nvarchar(max),
CHAR(65+ABS(CHECKSUM(NEWID()))%26)),100000)
GO 1000
3) In a second session (SSMS tab), launch a loop that catches half-overwritten values:
DECLARE @rc int;
WHILE 1=1 BEGIN
SELECT Txt FROM Test WITH (NOLOCK)
WHERE LEN(REPLACE(Txt,LEFT(Txt,1),''))<>0;
SET @rc = @@ROWCOUNT; -- capture at once: the next statement resets @@ROWCOUNT
SELECT 'rowcount inside=', @rc;
IF @rc<>0 BREAK
END
-- To try this in a non-SQL Server DBMS:
-- the WITH (NOLOCK) hint is equivalent to setting the READ UNCOMMITTED isolation level
--SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
Well, SSMS on SQL Server 2008 R2 catches a bunch of half-overwritten values quite rapidly.
I am curious what the results are in other DBMSes.
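One plausible explanation for what the script catches, offered as an assumption rather than a description of SQL Server internals: a 200,000-character nvarchar(max) value is stored across many pages, and a latch protects only one page at a time. Without a row lock, a reader can copy some pages before the update reaches them and others after. The toy model below (`LobValue` and `is_half_written` are hypothetical names) replays one such interleaving deterministically and applies the same test as the script's WHERE clause.

```python
import threading

class LobValue:
    """Toy model: one large value stored across several pages, each with its own latch."""
    def __init__(self, letter: str, n_pages: int, page_size: int) -> None:
        self.pages = [letter * page_size for _ in range(n_pages)]
        self.latches = [threading.Lock() for _ in range(n_pages)]

    def write_page(self, i: int, letter: str) -> None:
        with self.latches[i]:  # page latch: no single page is ever seen half-written
            self.pages[i] = letter * len(self.pages[i])

    def read_page(self, i: int) -> str:
        with self.latches[i]:
            return self.pages[i]

def is_half_written(txt: str) -> bool:
    # Same test as LEN(REPLACE(Txt, LEFT(Txt, 1), '')) <> 0 in the script
    return len(txt.replace(txt[0], "")) != 0

value = LobValue("A", n_pages=4, page_size=3)
seen = [value.read_page(0), value.read_page(1)]    # NOLOCK reader copies the first two pages
for i in range(4):
    value.write_page(i, "B")                       # the update rewrites every page; no row lock blocks it
seen += [value.read_page(2), value.read_page(3)]   # reader copies the remaining pages
result = "".join(seen)
print(result, is_half_written(result))             # AAAAAABBBBBB True
```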

Related

Using NOLOCK for reading a single static row. What's the harm?

Can anyone with NOLOCK experience enlighten me?
I read that it can cause log file corruption; is that possible? I think MS would never do that. Also, if "some situations", like mine, are okay with NOLOCK, why not use it?
I have no datasets or returned tables (unlike other posts on Stack Overflow). I have one SQL statement that selects by ID and returns only one row, like:
sqlstr = "SELECT Parameter1 FROM Companies WITH (NOLOCK) WHERE ID = 25"
Also, this parameter does not change. But as this is a heavy-load ASP.NET application (not a website) and I run this kind of query again and again, every SQL read takes a lock in SQL Server. If possible, I'd prefer to avoid that.
Every post on this site is about multiple records, recordsets, or dirty reads. I could not find anything about "reading a single record which does not change".
Any expert opinions, please?

This simple SELECT statement, when executed without any lock/nolock hints under the default transaction isolation level, obtains a shared lock on the row. This means other users can also read this row while it is being read by this query.
On the other hand, when you specify the WITH (NOLOCK) query hint, it does not obtain any shared locks at all. In this case, other users can again read this row as well, but you might be reading a dirty row (data that has not been committed yet and is in the process of being modified).
So in either case this simple SELECT will not cause a deadlock. The question you should really be asking yourself is whether users should be able to see dirty data or not, and in most cases the answer is no.
Therefore, do not worry about getting deadlocks with this SELECT query as long as you are using the default transaction isolation level. Under a stricter isolation level like SERIALIZABLE, a SELECT can lock out other users, but under the default isolation level you should be OK.

NOLOCK has two main disadvantages: It can return uncommitted data (you don't seem worried about that) and it can cause queries to spuriously fail under very rare circumstances. Never will NOLOCK cause physical database corruption.
Consider using snapshot isolation for transactions that only read data. Readers under SI do not lock or block. SI takes them out of the picture. It provides perfect consistency for read-only transactions. Be sure to find out about the drawbacks.

It isn't worth it.
NOLOCK is often exploited as a magic way to speed up database reads, but I try to avoid using it wherever possible.
The result set can contain rows that have not yet been committed, and that are often later rolled back.
The query can fail with an error, or the result set can be empty, be missing rows, or display the same row multiple times.
This is because other transactions are moving data at the same time you're reading it.
READ UNCOMMITTED adds an additional issue where data is corrupted within a single column when multiple users change the same cell simultaneously.
There are other side-effects too, which result in sacrificing the speed increase you were hoping to gain in the first place.
Now you know, never use it again.

After deep searching and asking many experts, I found out that using the NOLOCK hint causes no problem in this scenario, yet it's not advised. There is nothing wrong with NOLOCK, but since I use SQL Server 2014 I "should" use the ISOLATION LEVEL option, which is the method that came to replace NOLOCK. For example, for selects on huge tables that cause deadlocks:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
SELECT * FROM HugeTable;
COMMIT TRANSACTION;
is very handy.
I had HugeTable and a web form that uses sqlAdapter and RadGrid to show this data. Whenever I ran this report, even though the indexes and RadGrid paging were fine, it caused a deadlock, which makes sense. I changed the select statement of sqlAdapter to the statement above, and it's perfect now.
best.

When updating two columns of one record, is such a change atomic?

The question itself is mostly defined in the subject: I'm trying to update two columns of a record with a statement similar to this:
UPDATE SomeTableName
SET Field1 = 1,
Field2 = 2
WHERE ID = 123;
Is that change atomic in SQL Server or not? In simple words, if a power outage (or any other catastrophic event) occurs in the middle of the update operation, could I end up with only one field updated?

In theory, ALL transactions are atomic; I can't guarantee that no bug in SQL Server could break this.
If you don't specify an explicit transaction, each statement is its own transaction.
Power failures, etc. don't cause a problem because the transaction log is applied on restart.
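SQL Server cannot be run here, so the sketch below uses Python's built-in `sqlite3` module as a stand-in to show statement atomicity: a two-column UPDATE that fails on its second column leaves neither column changed. The table mirrors the question; the CHECK constraint is a hypothetical addition used only to force the failure. This demonstrates a statement error, not a power failure; crash recovery relies on the transaction log.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE SomeTableName (
    ID INTEGER PRIMARY KEY,
    Field1 INTEGER,
    Field2 INTEGER CHECK (Field2 < 100))""")  # hypothetical constraint to make Field2 = 200 fail
conn.execute("INSERT INTO SomeTableName VALUES (123, 0, 0)")
conn.commit()

try:
    # Field1 = 1 is valid, Field2 = 200 violates the CHECK: the whole statement is backed out
    conn.execute("UPDATE SomeTableName SET Field1 = 1, Field2 = 2 * 100 WHERE ID = 123")
except sqlite3.IntegrityError as exc:
    print("update failed:", exc)

row = conn.execute("SELECT Field1, Field2 FROM SomeTableName WHERE ID = 123").fetchone()
print(row)  # (0, 0): neither column was changed
```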
ADDED
Re: comment about prior question 21468742
Sorry, I don't think so. There is a lot to read there, but I saw nothing violating atomicity; it appeared to be a confusion of atomicity and isolation. And I see that Martin Smith came to the same conclusion. Think of it this way: when you update stuff like this, you are updating a disk block by rewriting the whole block (or database page). With a log-and-commit architecture, the whole block is written and committed, or none of it is. In case of a power failure, the last good write is known, and if a write failed and is not marked complete, it is not applied to the database from the transaction log on restart.
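The log-and-commit argument can be sketched as a redo-only recovery pass. This is a deliberate simplification with a hypothetical log format: SQL Server's real recovery also has analysis and undo phases, but the rule shown is the one the answer relies on, namely that changes of a transaction without a commit record are not applied.

```python
def recover(log):
    """Rebuild column values from the log, replaying only transactions that logged a commit."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    data = {}
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _, column, value = rec
            data[column] = value
    return data

# Hypothetical log records: ("update", tx, column, value) or ("commit", tx)
log = [
    ("update", 1, "Field1", 0), ("update", 1, "Field2", 0), ("commit", 1),
    ("update", 2, "Field1", 1), ("update", 2, "Field2", 2), ("commit", 2),
    ("update", 3, "Field1", 9),  # power fails here: tx 3 never logged Field2 or its commit
]
print(recover(log))  # {'Field1': 1, 'Field2': 2}
```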

Lock issues on large recordset

I have a database table that I use as a queue system, where separate processes that talk to each other create and read entries in the table. For example, when a user initiates a search, an entry is created; then another process that runs every second or two picks up that new entry, updates the status and does the search, updating the entry again when the search is complete. This all seems to work well with thousands of searches per hour.
However, I have a master admin screen that lets me view the status of all of these 'jobs', but it runs very slowly. I basically return all entries in the table for the last hour so I can keep an eye on what's going on. I think that I am running into lock issues of some sort. I only need to read each entry, and don't really care if the data is a little bit out of date. I just use a standard 'Select * from Table' statement, so maybe it is waiting for other locks to be released before returning data, as the jobs are constantly updating the data.
Would this be handled better by a certain kind of cursor that returns each row one at a time, etc.? Any other ideas?
Thanks

If you really don't care if the data is a bit out of date... or if you only need the data to be 99.99% accurate, consider using WITH (NOLOCK):
SELECT * FROM Table WITH (NOLOCK);
This will instruct your query to use the READ UNCOMMITTED ISOLATION LEVEL, which has the following behavior:
Specifies that dirty reads are allowed. No shared locks are issued to
prevent other transactions from modifying data read by the current
transaction, and exclusive locks set by other transactions do not
block the current transaction from reading the locked data.
Be aware that NOLOCK may cause some inaccuracies in your data, so it probably isn't a good idea to use it throughout the rest of your system.

You need the WITH (NOLOCK) table hint: FROM yourtable WITH (NOLOCK).
You may also want to look at transaction isolation in your update process, if you aren't already doing so.

An alternative to NOLOCK (which can lead to very bad things, such as missed rows or duplicated rows) is to allow snapshot isolation at the database level (the ALLOW_SNAPSHOT_ISOLATION database option) and then issue your query with:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

Is non-atomic value SELECT specific to SQL Server, or is it possible in other DBMSes?

My answer to my question Is a half-written values reading prevented when SELECT WITH (NOLOCK) hint? cites a script illustrating how to catch non-atomic reads (SELECTs) of partly updated values in SQL Server.
Is this problem of non-atomic (partly updated, inserted, or deleted) value reads specific to SQL Server?
Is it possible in other DBMSes?
Update:
Not long ago I believed that the READ UNCOMMITTED transaction isolation level (also achieved through the WITH (NOLOCK) hint in SQL Server) permitted reading uncommitted values from other transactions (or committed ones, if not yet changed), but not partly modified (partly updated, partly inserted, partly deleted) values.
Update2:
The first two answers steered the discussion toward attacking the READ UNCOMMITTED isolation-level phenomena specified by ANSI/ISO SQL-92.
This question is not about that.
Is non-atomicity of a value (not a row!) compliant with READ UNCOMMITTED and dirty reads at all?
I believed that READ UNCOMMITTED implied reading uncommitted rows in their entirety, but not partly modified values.
Does the definition of "dirty read" include the possibility of non-atomic value modification?
Is it a bug, is it by design, or is it covered by the ANSI SQL-92 definition of "dirty read"? I believed that "dirty read" included atomically reading uncommitted rows, but not non-atomically modified values...

Is it possible in other DBMSes?
As far as I know the only other databases that allow READ UNCOMMITTED are DB2, Informix and MySQL when using a non-transactional engine.

All hell would break loose if atomic statements were in fact not atomic.
I can answer this for MSSQL: all single statements are atomic; "dirty reads" refers to the possibility of reading a "phantom row" that might not exist after the TX is committed/rolled back.

There is a difference between Atomicity and READ COMMITTED if the implementation of the latter relies on locking.
Consider transactions A and B. Transaction A is a single SELECT for all records with a status of 'Pending' (perhaps a full scan on a very large table, so it takes several minutes).
At 3:01 transaction A reads record R1 in the database and sees its status is 'New' so doesn't return it or lock it.
At 3:02 transaction B updates record R1 from 'New' to 'Pending' and record R2000 from 'New' to 'Pending' (single statement)
At 3:03 transaction B commits
At 3:04 transaction A reads record R2000, sees it is 'Pending' and committed and returns it (and locks it).
In this situation, the select in transaction A has only seen part of Transaction B, violating atomicity. Technically though, the select has only returned committed records.
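The timeline above can be replayed with a toy table and a scan that reads one committed row at a time. The `concurrent_b` callback is a hypothetical hook standing in for transaction B, which updates R1 and R2000 in one statement and commits after the scan has already passed R1.

```python
table = {f"R{i}": "New" for i in range(1, 2001)}  # rows in scan order R1..R2000

def scan_pending(on_row_visited):
    """Transaction A: a long scan returning rows whose committed status is 'Pending'."""
    result = []
    for key in table:
        if table[key] == "Pending":
            result.append(key)
        on_row_visited(key)          # time passes between rows
    return result

def concurrent_b(key):
    """Transaction B: runs and commits once A has moved past R1."""
    if key == "R1":
        table["R1"] = "Pending"      # one statement updating two rows
        table["R2000"] = "Pending"

pending = scan_pending(concurrent_b)
print(pending)                       # ['R2000']: A saw only half of B's statement
```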
Databases relying on locking reads suffer from this problem because the only solution would be to lock the entirety of the table(s) being read so no-one can update any records in any of them. This would make it impractical for any concurrent activity.
In practice, most OLTP applications have very quick transactions operating on very small data volumes (relative to the database size), and concurrent operations tend to hit different 'slices' of data so the situation occurs very rarely. Even if it does happen, it doesn't necessarily result in a noticeable problem and even when it does they are very hard to reproduce and fixing them would require a whole new architecture. In short, despite being a theoretical problem, in practice it often isn't worth worrying about.
That said, an architect should be aware of the potential issue, be able to assess the risk for a particular application and determine alternatives.
That's one reason why SQL Server added non-locking consistent reads in 2005.

Database theory requires that, in all isolation levels, individual UPDATE and INSERT statements are atomic. Their intermediate results should not be visible to read uncommitted transactions. This has been stated in a paper by a group of well-known database experts: http://research.microsoft.com/apps/pubs/default.aspx?id=69541
However, since read uncommitted results are by definition not considered transactionally consistent, implementations may contain bugs that cause partly updated row sets to be returned, and these bugs may not have been noticed in tests because of the difficulty of determining the validity of the returned result sets.

MySQL: Transactions vs Locking Tables

I'm a bit confused about transactions vs. locking tables to ensure database integrity and to make sure a SELECT and an UPDATE remain in sync with no other connection interfering. I need to:
SELECT * FROM table WHERE (...) LIMIT 1
if (condition passes) {
// Update row I got from the select
UPDATE table SET column = "value" WHERE (...)
... other logic (including INSERT some data) ...
}
I need to ensure that no other queries interfere and perform the same SELECT (reading the 'old value') before that connection finishes updating the row.
I know I can default to LOCK TABLES table to just make sure that only 1 connection is doing this at a time, and unlock it when I'm done, but that seems like overkill. Would wrapping that in a transaction do the same thing (ensuring no other connection attempts the same process while another is still processing)? Or would a SELECT ... FOR UPDATE or SELECT ... LOCK IN SHARE MODE be better?

Locking tables prevents other DB users from affecting the rows/tables you've locked. But locks, in and of themselves, will NOT ensure that your logic comes out in a consistent state.
Think of a banking system. When you pay a bill online, there are at least two accounts affected by the transaction: your account, from which the money is taken, and the receiver's account, into which the money is transferred. And the bank's account, into which they'll happily deposit all the service fees charged on the transaction. Given (as everyone knows these days) that banks are extraordinarily stupid, let's say their system works like this:
$balance = "GET BALANCE FROM your ACCOUNT";
if ($balance < $amount_being_paid) {
charge_huge_overdraft_fees();
}
$balance = $balance - $amount_being_paid;
UPDATE your ACCOUNT SET BALANCE = $balance;
$balance = "GET BALANCE FROM receiver ACCOUNT";
charge_insane_transaction_fee();
$balance = $balance + $amount_being_paid;
UPDATE receiver ACCOUNT SET BALANCE = $balance;
Now, with no locks and no transactions, this system is vulnerable to various race conditions, the biggest of which is multiple payments being performed on your account, or the receiver's account, in parallel. While your code has your balance retrieved and is doing the huge_overdraft_fees() and whatnot, it's entirely possible that some other payment will be running the same type of code in parallel. They'll both retrieve your balance (say, $100), do their transactions (take out the $20 you're paying, and the $30 they're screwing you over with), and now both code paths have two different balances: $80 and $70. Depending on which one finishes last, you'll end up with either of those two balances in your account, instead of the $50 you should have ended up with ($100 - $20 - $30). In this case, "bank error in your favor".
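The lost update in the story can be reproduced deterministically by forcing the bad interleaving, then fixed with a lock around each read-modify-write. The dictionary and helper names are hypothetical, and `threading.Lock` merely stands in for a row lock.

```python
import threading

account = {"balance": 100}

# Unsafe: both code paths read the balance before either writes it back.
paid = account["balance"]            # payment path reads 100
fee = account["balance"]             # fee path also reads 100
account["balance"] = paid - 20       # writes 80
account["balance"] = fee - 30        # writes 70, silently losing the $20 debit
unsafe_result = account["balance"]

# Safer: each path's read-modify-write happens under one lock.
account["balance"] = 100
row_lock = threading.Lock()

def debit(amount: int) -> None:
    with row_lock:
        account["balance"] = account["balance"] - amount

threads = [threading.Thread(target=debit, args=(amt,)) for amt in (20, 30)]
for t in threads:
    t.start()
for t in threads:
    t.join()
safe_result = account["balance"]
print(unsafe_result, safe_result)    # 70 50
```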
Now, let's say you use locks. Your bill payment ($20) hits the pipe first, so it wins and locks your account record. Now you've got exclusive use, and can deduct the $20 from the balance, and write the new balance back in peace... and your account ends up with $80 as is expected. But... uhoh... You try to go update the receiver's account, and it's locked, and locked longer than the code allows, timing out your transaction... We're dealing with stupid banks, so instead of having proper error handling, the code just pulls an exit(), and your $20 vanishes into a puff of electrons. Now you're out $20, and you still owe $20 to the receiver, and your telephone gets repossessed.
So... enter transactions. You start a transaction, you debit your account $20, you try to credit the receiver with $20... and something blows up again. But this time, instead of exit(), the code can just do rollback, and poof, your $20 is magically added back to your account.
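A sketch of that rollback, again using `sqlite3` as a stand-in database. Using the connection as a context manager commits on success and rolls back if an exception escapes; the `fail_midway` flag is a hypothetical switch that simulates the receiver's update blowing up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("you", 100), ("receiver", 0)])
conn.commit()

def pay(amount: int, fail_midway: bool = False) -> bool:
    """Debit 'you' and credit 'receiver' as one transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes it
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'you'", (amount,))
            if fail_midway:
                raise RuntimeError("receiver update failed")  # simulated timeout or crash
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'receiver'", (amount,))
        return True
    except RuntimeError:
        return False

def balances() -> dict:
    return dict(conn.execute("SELECT name, balance FROM account"))

failed = pay(20, fail_midway=True)
after_failure = balances()   # {'you': 100, 'receiver': 0}: the debit was rolled back
succeeded = pay(20)
after_success = balances()   # {'you': 80, 'receiver': 20}
```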
In the end, it boils down to this:
Locks keep anyone else from interfering with any database records you're dealing with. Transactions keep any "later" errors from interfering with "earlier" things you've done. Neither alone can guarantee that things work out ok in the end. But together, they do.
In tomorrow's lesson: The Joy of Deadlocks.

I've started to research the same topic for the same reasons you indicated in your question. I was confused by the answers given on SO because they were partial answers and didn't provide the big picture. After reading a couple of documentation pages from different RDBMS vendors, these are my takeaways:
TRANSACTIONS
Statements are database commands, used mainly to read and modify the data in the database. Transactions are the scope of one or more statement executions. They provide two things:
A mechanism which guarantees that either all statements in a transaction execute correctly or, in case of a single error, any data modified by those statements is reverted to its last correct state (i.e. rollback). What this mechanism provides is called atomicity.
A mechanism which guarantees that concurrent read statements can view the data without the occurrence of some or all of the phenomena described below.
Dirty read: A transaction reads data written by a concurrent uncommitted transaction.
Nonrepeatable read: A transaction re-reads data it has previously read and finds that data has been modified by another transaction (that committed since the initial read).
Phantom read: A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction.
Serialization anomaly: The result of successfully committing a group of transactions is inconsistent with all possible orderings of running those transactions one at a time.
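What this mechanism provides is called isolation, and the mechanism which lets statements choose which phenomena must not occur in a transaction is called isolation levels.
As an example, this is the isolation-level / phenomena table for PostgreSQL:
Isolation level | Dirty read | Nonrepeatable read | Phantom read | Serialization anomaly
Read uncommitted | Allowed, but not in PG | Possible | Possible | Possible
Read committed | Not possible | Possible | Possible | Possible
Repeatable read | Not possible | Not possible | Allowed, but not in PG | Possible
Serializable | Not possible | Not possible | Not possible | Not possible
If the database system cannot keep one of the described promises, the changes are rolled back and the caller is notified.
How these mechanisms are implemented to provide these guarantees is described below.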
LOCK TYPES
Exclusive Locks: when an exclusive lock is acquired on a resource, no other exclusive lock can be acquired on that resource. Exclusive locks are always acquired before a modifying statement (INSERT, UPDATE or DELETE) executes, and they are released after the transaction finishes. To acquire such locks explicitly when reading, before the modifying statement, you can use hints like FOR UPDATE (PostgreSQL, MySQL) or UPDLOCK (T-SQL, which takes an update lock).
Shared Locks: multiple shared locks can be acquired on a resource. However, shared and exclusive locks cannot be held on a resource at the same time. Whether shared locks are acquired before a read statement (SELECT, JOIN) depends on how the database implements isolation levels.
LOCK RESOURCE RANGES
Row: the single row the statement executes on.
Range: a specific range based on the condition given in the statement (SELECT ... WHERE).
Table: whole table. (Mostly used to prevent deadlocks on big statements like batch update.)
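The shared/exclusive rules above boil down to a small compatibility table. A minimal sketch; real lock managers have more modes (for example update and intent locks in SQL Server), which are left out here, and `can_grant` is a hypothetical helper.

```python
from typing import List

# (held mode, requested mode) -> can both be held on the same resource?
COMPATIBLE = {
    ("S", "S"): True,   # many readers may share a resource
    ("S", "X"): False,  # a writer must wait for readers
    ("X", "S"): False,  # readers must wait for a writer
    ("X", "X"): False,  # only one writer at a time
}

def can_grant(requested: str, held: List[str]) -> bool:
    """True if `requested` is compatible with every lock other transactions hold on the resource."""
    return all(COMPATIBLE[(mode, requested)] for mode in held)

print(can_grant("S", ["S", "S"]))  # True
print(can_grant("X", ["S"]))       # False
```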
As an example, the default shared-lock behavior of the different isolation levels in SQL Server:
DEADLOCKS
One of the downsides of the locking mechanism is deadlocks. A deadlock occurs when a statement enters a waiting state because a requested resource is held by another waiting statement, which in turn is waiting for a resource held by another waiting statement, and so on back to the first. In such a case the database system detects the deadlock and terminates one of the transactions. Careless use of locks can increase the chance of deadlocks; however, they can occur even without human error.
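Deadlock detection is commonly done by looking for a cycle in a wait-for graph. A simplified sketch, assuming each blocked transaction waits on exactly one other transaction; `find_deadlock` is a hypothetical helper, not any DBMS's API.

```python
from typing import Dict, List, Optional

def find_deadlock(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """waits_for maps a blocked transaction to the transaction holding what it needs.
    Returns the transactions forming a cycle, or None if nobody is deadlocked."""
    for start in waits_for:
        path: List[str] = []
        node = start
        while node in waits_for and node not in path:
            path.append(node)
            node = waits_for[node]
        if node in path:                 # we came back to a transaction already on the path
            return path[path.index(node):]
    return None

# T1 holds row A and wants row B; T2 holds row B and wants row A.
cycle = find_deadlock({"T1": "T2", "T2": "T1"})
print(cycle)  # ['T1', 'T2']: the DBMS would pick one of these as the victim and roll it back
print(find_deadlock({"T1": "T2", "T2": "T3"}))  # None: T3 is running, so T2 and T1 just wait
```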
SNAPSHOTS (DATA VERSIONING)
This is an isolation mechanism which provides a statement with a copy of the data taken at a specific time.
Statement beginning: provides the statement with a copy of the data taken at the beginning of the statement's execution. It also helps the rollback mechanism by keeping this data until the transaction is finished.
Transaction beginning: provides the statement with a copy of the data taken at the beginning of the transaction.
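The difference between the two snapshot points can be shown with a tiny multi-version store. A sketch under simplified assumptions (a global commit counter, no uncommitted versions); `VersionedStore` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

class VersionedStore:
    def __init__(self) -> None:
        self.clock = 0
        self.history: Dict[str, List[Tuple[int, str]]] = {}  # key -> [(commit time, value), ...]

    def commit_write(self, key: str, value: str) -> None:
        self.clock += 1
        self.history.setdefault(key, []).append((self.clock, value))

    def read_as_of(self, key: str, snapshot: int) -> Optional[str]:
        visible = [value for ts, value in self.history.get(key, []) if ts <= snapshot]
        return visible[-1] if visible else None

store = VersionedStore()
store.commit_write("x", "v1")
tx_snapshot = store.clock                         # our transaction begins here

first_read = store.read_as_of("x", store.clock)   # statement 1 sees "v1"
store.commit_write("x", "v2")                     # another session commits in between
stmt_level = store.read_as_of("x", store.clock)   # statement 2, new per-statement snapshot: "v2"
tx_level = store.read_as_of("x", tx_snapshot)     # statement 2, transaction snapshot: still "v1"
print(first_read, stmt_level, tx_level)           # v1 v2 v1
```

Statement-level snapshots correspond to READ COMMITTED implemented with row versioning (SQL Server's READ_COMMITTED_SNAPSHOT option, or PostgreSQL's default READ COMMITTED); transaction-level snapshots correspond to SQL Server's SNAPSHOT level or PostgreSQL's REPEATABLE READ.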
All of those mechanisms together provide consistency.
When it comes to optimistic and pessimistic locking, these are just names for two classes of approaches to the concurrency problem.
Pessimistic concurrency control:
A system of locks prevents users from modifying data in a way that affects other users. After a user performs an action that causes a lock to be applied, other users cannot perform actions that would conflict with the lock until the owner releases it. This is called pessimistic control because it is mainly used in environments where there is high contention for data, where the cost of protecting data with locks is less than the cost of rolling back transactions if concurrency conflicts occur.
Optimistic concurrency control:
In optimistic concurrency control, users do not lock data when they read it. When a user updates data, the system checks to see if another user changed the data after it was read. If another user updated the data, an error is raised. Typically, the user receiving the error rolls back the transaction and starts over. This is called optimistic because it is mainly used in environments where there is low contention for data, and where the cost of occasionally rolling back a transaction is lower than the cost of locking data when read.
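Optimistic control is often implemented with a version column checked in the UPDATE's WHERE clause. A sketch with `sqlite3`; the `item` table and its `version` column are hypothetical conventions, not something a particular DBMS requires. Instead of raising an error as the quoted text describes, the helper reports a conflict through the affected-row count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, qty INTEGER NOT NULL, version INTEGER NOT NULL)")
conn.execute("INSERT INTO item VALUES (1, 10, 0)")
conn.commit()

def read_item(item_id: int):
    return conn.execute("SELECT qty, version FROM item WHERE id = ?", (item_id,)).fetchone()

def update_item(item_id: int, new_qty: int, expected_version: int) -> bool:
    """Write only if the row still has the version we read; no lock is held between read and write."""
    cur = conn.execute(
        "UPDATE item SET qty = ?, version = version + 1 WHERE id = ? AND version = ?",
        (new_qty, item_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows: someone else changed it first, so re-read and retry

qty_a, ver_a = read_item(1)              # user A reads (10, 0)
qty_b, ver_b = read_item(1)              # user B reads (10, 0)
ok_a = update_item(1, qty_a - 3, ver_a)  # True: version 0 -> 1
ok_b = update_item(1, qty_b - 5, ver_b)  # False: version is now 1, so B's stale write is rejected
print(ok_a, ok_b, read_item(1))
```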
For example, by default PostgreSQL uses snapshots rather than read locks, and at its stricter isolation levels it rolls back a transaction whose reads or writes conflict with a concurrently committed change, which is an optimistic approach. SQL Server, however, uses read locks by default to provide these promises.
The implementation details might differ depending on the database system you choose. However, according to the database standards, they need to provide the stated transaction guarantees one way or another using these mechanisms. If you want to know more about the topic or about specific implementation details, below are some useful links.
SQL-Server - Transaction Locking and Row Versioning Guide
PostgreSQL - Transaction Isolation
PostgreSQL - Explicit Locking
MySQL - Consistent Nonlocking Reads
MySQL - Locking
Understanding Isolation Levels (Video)

You want a SELECT ... FOR UPDATE or SELECT ... LOCK IN SHARE MODE inside a transaction, as you said, since normally SELECTs, no matter whether they are in a transaction or not, will not lock a table. Which one you choose would depend on whether you want other transactions to be able to read that row while your transaction is in progress.
http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html
START TRANSACTION WITH CONSISTENT SNAPSHOT will not do the trick for you, as other transactions can still come along and modify that row. This is mentioned right at the top of the link below.
If other sessions simultaneously update the same table [...] you may see the table in a state that never existed in the database.
http://dev.mysql.com/doc/refman/5.0/en/innodb-consistent-read.html

Transactions and locks are different concepts. However, transactions use locks to help them follow the ACID principles.
If you want to prevent others from reading/writing the table at the same time as you are reading/writing it, you need a lock.
If you want to ensure data integrity and consistency, you had better use transactions.
I think you have mixed up the concepts of transaction isolation levels and locks.
Please look up transaction isolation levels; SERIALIZABLE should be the level you want.

I had a similar problem when attempting an IF NOT EXISTS ... and then performing an INSERT, which caused a race condition when multiple threads were updating the same table.
I found the solution to the problem here: How to write INSERT IF NOT EXISTS queries in standard SQL
I realise this does not directly answer your question, but the same principle of performing a check and an insert as a single statement is very useful; you should be able to adapt it to perform your update.
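A sketch of the single-statement idea with `sqlite3`, covering both the insert and the update case. The `job` table and helper names are hypothetical. SQLite serializes writers, so this demo cannot show the race itself; on engines such as MySQL or SQL Server a NOT EXISTS check can still race under READ COMMITTED, which is why the UNIQUE constraint is included as a backstop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (name TEXT UNIQUE, status TEXT NOT NULL)")

def insert_if_missing(name: str) -> bool:
    """Check and insert in one statement; returns True if a row was inserted."""
    cur = conn.execute(
        "INSERT INTO job (name, status) SELECT ?, 'new' "
        "WHERE NOT EXISTS (SELECT 1 FROM job WHERE name = ?)",
        (name, name),
    )
    conn.commit()
    return cur.rowcount == 1

def claim(name: str) -> bool:
    """Check and update in one statement: only one caller can move a job from 'new' to 'taken'."""
    cur = conn.execute("UPDATE job SET status = 'taken' WHERE name = ? AND status = 'new'", (name,))
    conn.commit()
    return cur.rowcount == 1

print(insert_if_missing("search-42"), insert_if_missing("search-42"))  # True False
print(claim("search-42"), claim("search-42"))                          # True False
```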

I'd use a
START TRANSACTION WITH CONSISTENT SNAPSHOT;
to begin with, and a
COMMIT;
to end with.
Anything you do in between is isolated from the other users of your database if your storage engine supports transactions (InnoDB does).

You are confusing locks with transactions. They are two different things in an RDBMS. A lock prevents concurrent operations, while a transaction focuses on data isolation. Check out this great article for clarification and some graceful solutions.

