A DBA that my company hired to troubleshoot deadlock issues just told me that our OLTP database's locking problems will improve if we change the transaction isolation level from READ UNCOMMITTED to READ COMMITTED.
Isn't that just 100% false? READ COMMITTED will cause more locks, correct?
More Details:
Our data is very "siloed" and user specific. 99.9999999% of all user interactions work with the user's own data, and our dirty read scenarios, if they happen, can barely affect what the user is trying to do.
Thanks for all the answers, the dba in question ended up being useless, and we fixed the locking issues by adding a single index.
I regret that I didn't specify that the locking problems were occurring for UPDATE statements and not regular SELECTs. From my googling, the two query types have distinct solutions when dealing with locking issues.
That does sound like a bit of a rash decision, however without all the details of your environment it is difficult to say.
You should advise your DBA to consider SQL Server's advanced isolation features, i.e. the row versioning techniques. These were introduced in SQL Server 2005 specifically to address issues with OLTP databases that experience heavy locking.
The following white paper covers quite complicated subject matter, but it is a must-read for all exceptional DBAs. It includes examples of how to use each of the additional isolation levels in different types of environments, i.e. OLTP, offloaded reporting environments, etc.
http://msdn.microsoft.com/en-us/library/ms345124.aspx
In summary, it would be both foolish and rash to modify the transaction isolation level for all of your T-SQL queries without first developing a solid understanding of how the excessive locking is occurring within your environment.
I hope this helps but please let me know if you require further clarification.
Cheers!
Doesn't it depend on what your problem is: for example if your problem is a deadlock, mightn't an increase in the locking level cause an earlier acquisition of locks and therefore a decreased possibility of deadly embrace?
If the data is siloed and you are still getting deadlocks, then you may simply need to add ROWLOCK hints to the queries that are causing the problem, so that the locks are taken at the row level and not the page level (which is the default).
READ UNCOMMITTED will reduce the number of locks if you are locking data because of SELECT statements. If you are locking data because of INSERT, UPDATE and DELETE statements then changing the isolation level to READ UNCOMMITTED won't do anything for you. READ UNCOMMITTED has the same effect as adding WITH (NOLOCK) to your queries.
This sounds scary. Do you really want to just change these parameters to avoid deadlocks? Maybe the data needs to be locked?
That said, it could be that the DBA is referring to the new (as of SQL Server 2005) READ COMMITTED SNAPSHOT that uses row versioning and can eliminate some kinds of deadlocks.
http://www.databasejournal.com/features/mssql/article.php/3566746/Controlling-Transactions-and-Locks-Part-5-SQL-2005-Snapshots.htm
In our application, per our DBA's recommendation, we are adding the NOLOCK hint to every SELECT query.
This requires modifying each SELECT query to set the table hint, which will be time-consuming to do manually.
Since we want to use the hint across all tables in the database, is it possible to set the NOLOCK hint (or TRANSACTION ISOLATION LEVEL READ UNCOMMITTED) at the database level, so that the hint applies to all queries without modifying each one?
The short answer is "no". The default isolation level in SQL Server is READ COMMITTED, and there is no way to change this to UNCOMMITTED, either globally or per-database. And that's a very good thing too.
WITH (NOLOCK) is a recipe for trouble when it comes to getting accurate results from your database, and in bad cases it can even cause timeouts from queries that run forever due to data getting moved (which NOLOCK cannot protect against). See Is the NOLOCK (Sql Server hint) bad practice? for some more discussion, and some good tips on alternatives.
In particular, many applications that are reader-heavy and want to proceed without blocking can benefit from snapshot isolation. Unlike UNCOMMITTED, you can make snapshot isolation the default with the READ_COMMITTED_SNAPSHOT option. Be sure to read up on the pros and cons of snapshot isolation before you do this -- or better yet, ask your DBA to do this, as any DBA who recommends a global use of WITH (NOLOCK) has some reading up to do. Query hints should be used only as a last resort.
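For reference, the option is switched on with an ALTER DATABASE statement. Below is a minimal Python sketch that only builds that statement as a string; the helper names and the database name SalesDb are made up for illustration, and nothing here connects to a server. Note that the change typically cannot complete while other connections are active in the database, so it is usually done in a maintenance window.

```python
def quote_identifier(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def rcsi_statement(database: str, enabled: bool = True) -> str:
    """Build the ALTER DATABASE statement that toggles READ_COMMITTED_SNAPSHOT.

    Hypothetical helper: it returns the T-SQL text only and does not execute it.
    """
    state = "ON" if enabled else "OFF"
    return (
        f"ALTER DATABASE {quote_identifier(database)} "
        f"SET READ_COMMITTED_SNAPSHOT {state};"
    )


if __name__ == "__main__":
    # "SalesDb" is a placeholder database name.
    print(rcsi_statement("SalesDb"))
```

Once the option is on, plain READ COMMITTED queries read row versions instead of taking shared locks, so the application queries themselves do not need to change.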
I have a stored procedure that inserts into several tables in a single transaction. I know transactions can maintain data consistency in non-concurrent situations by allowing rollbacks after errors, power failure, etc., but if other code selects from these tables before I commit the transaction, could it possibly select inconsistent data?
Basically, can you select uncommitted transactions?
If so, then how do people typically deal with this?
This depends on the ISOLATION LEVEL of the read query rather than the transaction. This can be set centrally on the connection or provided in the SELECT hint.
See:
Connection side: http://msdn.microsoft.com/en-us/library/system.data.isolationlevel.aspx
Database side: http://msdn.microsoft.com/en-us/library/ms173763.aspx
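On the database side, the session-level setting is the T-SQL statement SET TRANSACTION ISOLATION LEVEL. As a rough sketch, a small Python helper like the one below could build that statement and reject anything that is not one of the five SQL Server levels. The function name and the idea of validating in application code are my own assumptions, not part of any driver API.

```python
# The five isolation levels accepted by SQL Server's
# SET TRANSACTION ISOLATION LEVEL statement.
ALLOWED_LEVELS = {
    "READ UNCOMMITTED",
    "READ COMMITTED",
    "REPEATABLE READ",
    "SNAPSHOT",
    "SERIALIZABLE",
}


def set_isolation_level_sql(level: str) -> str:
    """Return the T-SQL that sets the session isolation level.

    Hypothetical helper: normalizes case and whitespace, then validates
    against ALLOWED_LEVELS so arbitrary text never reaches the server.
    """
    normalized = " ".join(level.upper().split())
    if normalized not in ALLOWED_LEVELS:
        raise ValueError(f"unknown isolation level: {level!r}")
    return f"SET TRANSACTION ISOLATION LEVEL {normalized};"


if __name__ == "__main__":
    print(set_isolation_level_sql("read committed"))
```

The returned statement would be executed once on the connection before the SELECT; the per-query alternative is a table hint such as WITH (NOLOCK) on the individual query.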
As already mentioned by Aliostad, this depends on the selected isolation level. The Wikipedia article has examples of the different common scenarios.
So yes, you can choose to get uncommitted data, but only by choice. I never did that and I have to admit that the idea seems a bit ... dangerous to me. But there are probably reasonable use cases.
Extending Aliostad's answer:
By default, other reading processes won't read data that is being changed (uncommitted data, aka "dirty reads"). This applies to all clients and drivers.
You have to override this default deliberately, either with the NOLOCK hint or by changing the isolation level, to allow "dirty reads".
I need to execute an UPDATE statement against a SQL Server table. This table is used by another process at the same time, so deadlocks sometimes occur. Which isolation level do you recommend to avoid or minimize these deadlocks?
READ UNCOMMITTED
But that allows the process to read the data before a transaction has committed, what is known as a dirty read. Further Reading
You may prefer to turn on row versioning: the update creates a new version of the row, and any other SELECT statements use the old version until the update has committed. To do this, turn on READ_COMMITTED_SNAPSHOT mode. There is more info here. There is an overhead involved in maintaining the versions of the rows, but it removes UPDATE/SELECT deadlocks.
The suggestions to use READ UNCOMMITTED here are OK, but they really side-step the issue of why you're getting a deadlock in the first place. If you don't care about dirty reads then that's fine, but if you need the benefits of isolation (consistency, etc.) then I recommend figuring out a proper locking strategy in your application.
I don't have the answer for you on that one - I've been working out some strategies on that myself. See the comments of this question for some discussion.
Look into snapshot isolation - using this level of isolation is a good compromise between consistency and speed. I might be shot down in flames for saying this, however I believe that deadlocks are much more difficult to encounter at this isolation level.
Whether this is the right thing to do to get around your deadlock situation is another matter entirely.
Use a cursor or a loop to update a small number of rows per batch; this avoids SQL Server escalating to a table lock.
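To make the batching idea concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module purely as a stand-in so that it runs without a server; SQLite has no lock escalation, so this only shows the loop shape. The table name items, the processed column and the batch size are all made up. In T-SQL the same loop is usually written around UPDATE TOP (n).

```python
import sqlite3


def update_in_batches(conn: sqlite3.Connection, batch_size: int = 100) -> int:
    """Mark rows as processed in small transactions.

    Each pass updates at most batch_size rows and commits, so no single
    transaction holds locks on a large number of rows. Returns the number
    of non-empty batches that were committed.
    """
    batches = 0
    while True:
        cur = conn.execute(
            "UPDATE items SET processed = 1 "
            "WHERE id IN (SELECT id FROM items WHERE processed = 0 "
            "ORDER BY id LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # end the short transaction before the next batch
        if cur.rowcount == 0:
            break
        batches += 1
    return batches


def demo():
    """Create 250 unprocessed rows and update them 100 at a time."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        "id INTEGER PRIMARY KEY, processed INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 251)])
    conn.commit()
    batches = update_in_batches(conn, batch_size=100)
    remaining = conn.execute(
        "SELECT COUNT(*) FROM items WHERE processed = 0"
    ).fetchone()[0]
    conn.close()
    return batches, remaining


if __name__ == "__main__":
    print(demo())
```

The trade-off is that the overall update is no longer atomic: if the loop fails halfway, some batches stay committed, so the work has to be safe to resume.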
I was looking at potential concurrency issues in DB so i went to read up. I found http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/c0005267.htm and it mentions access to uncommitted data.
Access to uncommitted data. Application A might update a value in the database, and application B might read that value before it was committed. Then, if the value of A is not later committed, but backed out, the calculations performed by B are based on uncommitted (and presumably invalid) data.
What... I thought other sessions (same app and even same thread) cannot read data that has not been committed yet? I thought only the connection/session (I am not sure of my terminology) that wrote the data in the uncommitted transaction can read that uncommitted data.
Can other threads really read data that hasn't been committed?
I plan to use MySQL, but I may use SQLite.
What other sessions can read depends on how you set up your database. In MySQL it also depends on what database engine you use. The term you're looking for (in ANSI SQL terms) is "isolation level".
Many databases will default to an isolation level where reads on uncommitted data will block. So if transaction A updates record 1234 in table T and then transaction B tries to select record 1234 before A commits or rolls back then B will block until A does one of those things.
See MySQL Transactions, Part II - Transaction Isolation Levels.
One serious downside of this is that batch update operations that live in long-running transactions (typically) can potentially block many requests.
You can also set it so B will see uncommitted data but that is often ill-advised.
Alternatively you can use a scheme called MVCC ("Multiversion concurrency control"), which will give different transactions a consistent view of the data based on the time the transaction started. This avoids the uncommitted read problem (reading data that may be rolled back) and is much more scalable, especially in the context of long-lived transactions.
MySQL supports MVCC.
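Since the question mentions SQLite, here is a small sketch using Python's standard sqlite3 module to check what a second connection sees while a transaction is still open. The table t and the temporary database file are made up for the demo. This only shows SQLite's default behaviour, where the reader sees the last committed state rather than blocking; MySQL and SQL Server may behave differently depending on engine and isolation level, as described above.

```python
import os
import sqlite3
import tempfile


def demo_uncommitted_visibility():
    """Return the row counts a second connection sees before and after commit."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    writer = sqlite3.connect(path)
    reader = sqlite3.connect(path)
    try:
        writer.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
        writer.commit()

        # sqlite3 implicitly opens a transaction before the INSERT;
        # it stays open until commit() is called.
        writer.execute("INSERT INTO t (val) VALUES ('pending')")

        # The other connection reads while the insert is still uncommitted.
        before = reader.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]

        writer.commit()

        # After the commit the row becomes visible to the reader.
        after = reader.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]
        return before, after
    finally:
        writer.close()
        reader.close()
        os.remove(path)


if __name__ == "__main__":
    print(demo_uncommitted_visibility())  # (0, 1)
```

So the writing connection sees its own pending row, but another connection does not until the commit happens.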
Certainly in SQL Server you can. You have to choose to do it, as it is not the default, but if you use the right isolation level or query hint you can choose to read an uncommitted row. This can lead to problems and, in theory, even a double read of the same row.
That article mentions access to uncommitted data as one of the problems eliminated by the database manager.
The database manager controls this access to prevent undesirable effects, such as:
...
Access to uncommitted data.
MySQL's InnoDB storage engine supports several transaction isolation levels. For details, see
http://dev.mysql.com/doc/refman/5.4/en/set-transaction.html.
For some versions of some databases, setting queries to be able to read uncommitted will improve performance, because of reduced locking. That still leaves questions of security, reliability, and scalability to be answered.
To give a specific, I used to work on a very large e-commerce site. They used read uncommitted on reads to the store catalog, since the data was heavily accessed, infrequently changed, and not sensitive to concerns about reading uncommitted data. Any data from the catalog that was used to place an order would be re-verified anyway. This was on SQL Server 2000, which was known to have locking performance problems. On newer versions of SQL Server, the locking performance has improved, so this wouldn't be necessary.
I am encountering very infrequent yet annoying SQL deadlocks on a .NET 2.0 webapp running on top of MS SQL Server 2005. In the past, we have been dealing with the SQL deadlocks in a very empirical way: basically tweaking the queries until they work.
Yet, I found this approach very unsatisfactory: time consuming and unreliable. I would highly prefer to follow deterministic query patterns that would ensure by design that no SQL deadlock will be encountered - ever.
For example, in C# multithreaded programming, a simple design rule such as the locks must be taken following their lexicographical order ensures that no deadlock will ever happen.
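To illustrate the ordering rule I mean, here is a minimal sketch in Python rather than C#, just to keep it short and runnable. The lock names orders and customers and the helper with_locks are made up. Two threads ask for the same locks in opposite order, but the helper always acquires them sorted by name, so neither thread can end up holding one lock while waiting on the other.

```python
import threading

# Hypothetical named resources, each protected by its own lock.
LOCKS = {"orders": threading.Lock(), "customers": threading.Lock()}


def with_locks(names, action):
    """Acquire the named locks in lexicographical order, run action, then release."""
    acquired = []
    try:
        for name in sorted(set(names)):
            LOCKS[name].acquire()
            acquired.append(name)
        return action()
    finally:
        # Release in reverse order of acquisition.
        for name in reversed(acquired):
            LOCKS[name].release()


def demo(iterations=1000):
    """Run two threads that request the same locks in opposite order."""
    counter = {"value": 0}

    def bump():
        counter["value"] += 1  # only ever runs while both locks are held

    def worker(names):
        for _ in range(iterations):
            with_locks(names, bump)

    t1 = threading.Thread(target=worker, args=(["orders", "customers"],))
    t2 = threading.Thread(target=worker, args=(["customers", "orders"],))
    t1.start()
    t2.start()
    t1.join(timeout=10)
    t2.join(timeout=10)
    stuck = t1.is_alive() or t2.is_alive()
    return counter["value"], stuck


if __name__ == "__main__":
    print(demo())
```

What I am looking for is the SQL counterpart of this kind of rule, where the engine rather than my code decides which locks are taken and when.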
Are there any SQL coding patterns guaranteed to be deadlock-proof?
Writing deadlock-proof code is really hard. Even when you access the tables in the same order you may still get deadlocks [1]. I wrote a post on my blog that elaborates through some approaches that will help you avoid and resolve deadlock situations.
If you want to ensure two statements/transactions will never deadlock you may be able to achieve it by observing which locks each statement consumes using the sp_lock system stored procedure. To do this you have to either be very fast or use an open transaction with a holdlock hint.
Notes:
Any SELECT statement that needs more than one lock at once can deadlock against an intelligently designed transaction which grabs the locks in reverse order.
Zero deadlocks is basically an incredibly costly problem in the general case because you must know all the tables/obj that you're going to read and modify for every running transaction (this includes SELECTs). The general philosophy is called ordered strict two-phase locking (not to be confused with two-phase commit) (http://en.wikipedia.org/wiki/Two_phase_locking ; even 2PL does not guarantee no deadlocks)
Very few DBMS actually implement strict 2PL because of the massive performance hit such a thing causes (there are no free lunches) while all your transactions wait around for even simple SELECT statements to be executed.
Anyway, if this is something you're really interested in, take a look at SET TRANSACTION ISOLATION LEVEL in SQL Server. You can tweak that as necessary. http://en.wikipedia.org/wiki/Isolation_level
For more info, see wikipedia on Serializability: http://en.wikipedia.org/wiki/Serializability
That said -- a great analogy is like source code revisions: check in early and often. Keep your transactions small (in # of SQL statements, # of rows modified) and quick (wall clock time helps avoid collisions with others). It may be nice and tidy to do a LOT of things in a single transaction -- and in general I agree with that philosophy -- but if you're experiencing a lot of deadlocks, you may break the trans up into smaller ones and then check their status in the application as you move along. TRAN 1 - OK Y/N? If Y, send TRAN 2 - OK Y/N? etc. etc
As an aside, in my many years of being a DBA and also a developer (of multiuser DB apps measuring thousands of concurrent users) I have never found deadlocks to be such a massive problem that I needed special cognizance of it (or to change isolation levels willy-nilly, etc).
There is no magic general-purpose solution to this problem that works in practice. You can push concurrency control to the application, but this can be very complex, especially if you need to coordinate with other programs running in separate memory spaces.
General answers to reduce deadlock opportunities:
Basic query optimization (proper index use), hotspot-avoidant design, holding transactions for the shortest possible time, etc.
When possible set reasonable query timeouts so that if a deadlock should occur it is self-clearing after the timeout period expires.
Deadlocks in MSSQL are often due to its default read concurrency model, so it's very important not to depend on it: assume Oracle-style MVCC in all designs. Use snapshot isolation or, if possible, the READ UNCOMMITTED isolation level.
I believe the following useful read/write pattern is deadlock-proof given some constraints:
Constraints:
One table
An index or PK is used for read/write so engine does not resort to table locks.
A batch of records can be read using a single SQL where clause.
Using SQL Server terminology.
Write Cycle:
All writes within a single "Read Committed" transaction.
The first update in the transaction is to a specific, always-present record
within each update group.
Multiple records may then be written in any order. (They are "protected"
by the write to the first record).
Read Cycle:
The default read committed transaction level
No transaction
Read records as a single select statement.
Benefits:
Secondary write cycles are blocked at the write of first record until the first write transaction completes entirely.
Reads are blocked/queued/executed atomically between the write commits.
Achieve transaction level consistency w/o resorting to "Serializable".
I need this to work too so please comment/correct!!
As you said, always accessing tables in the same order is a very good way to avoid deadlocks. Furthermore, shorten your transactions as much as possible.
Another cool trick is to combine two SQL statements into one whenever you can. Single statements are always transactional. For example, use "UPDATE ... SELECT" or "INSERT ... SELECT", and use "@@ERROR" and "@@ROWCOUNT" instead of "SELECT COUNT" or "IF (EXISTS ...)".
Lastly, make sure that your calling code can handle deadlocks by resubmitting the query a configurable number of times. Sometimes it just happens; it's normal behaviour and your application must be able to deal with it.
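A rough sketch of that retry logic, in Python for brevity. DeadlockError is a hypothetical exception standing in for whatever your data-access layer raises when the transaction is chosen as the deadlock victim (SQL Server reports this as error 1205); the function name, attempt count and delay are likewise just illustrative.

```python
import time


class DeadlockError(Exception):
    """Hypothetical exception: the transaction was chosen as the deadlock victim."""


def run_with_retry(operation, max_attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call operation(), retrying only when it raises DeadlockError.

    The whole unit of work must be re-run, because the database has
    already rolled back the victim's transaction. Other exceptions
    propagate immediately. The delay grows linearly with each attempt.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up and let the caller see the deadlock
            sleep(base_delay * attempt)


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky():
        # Simulated unit of work that is a deadlock victim twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise DeadlockError("simulated deadlock victim")
        return "committed"

    print(run_with_retry(flaky), calls["n"])
```

The operation passed in should contain the complete transaction (begin, statements, commit), not just the statement that failed.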
In addition to consistent sequence of lock acquisition - another path is explicit use of locking and isolation hints to reduce time/resources wasted unintentionally acquiring locks such as shared-intent during read.
Something that no one has mentioned (surprisingly) is that, where SQL Server is concerned, many locking problems can be eliminated with the right set of covering indexes for a DB's query workload. Why? Because it can greatly reduce the number of bookmark lookups into a table's clustered index (assuming it's not a heap), thus reducing contention and locking.
If you have enough design control over your app, restrict your updates / inserts to specific stored procedures and remove update / insert privileges from the database roles used by the app (only explicitly allow updates through those stored procedures).
Isolate your database connections to a specific class in your app (every connection must come from this class) and specify that "query only" connections set the isolation level to "dirty read" ... the equivalent to a (nolock) on every join.
That way you isolate the activities that can cause locks (to specific stored procedures) and take "simple reads" out of the "locking loop".
Quick answer is no, there is no guaranteed technique.
I don't see how you can make any application deadlock proof in general as a design principle if it has any non-trivial throughput. If you pre-emptively lock all the resources you could potentially need in a process in the same order even if you don't end up needing them, you risk the more costly issue where the second process is waiting to acquire the first lock it needs, and your availability is impacted. And as the number of resources in your system grows, even trivial processes have to lock them all in the same order to prevent deadlocks.
The best way to solve SQL deadlock problems, like most performance and availability problems is to look at the workload in the profiler and understand the behavior.
Not a direct answer to your question, but food for thought:
http://en.wikipedia.org/wiki/Dining_philosophers_problem
The "Dining philosophers problem" is an old thought experiment for examining the deadlock problem. Reading about it might help you find a solution to your particular circumstance.