In my database I have two closely related tables.
There is a frequently called SP that INSERTs some rows into both tables within a transaction, and several other places that do SELECTs from these tables joined.
INSERTs take X locks on both tables, SELECTs take S or IS locks on them. Since the order in which shared locks are taken varies from query to query, some of them occasionally get deadlocked with the INSERT transaction.
Is there any good way to avoid these deadlocks (NOLOCK probably doesn't qualify as 'good')?
So far the only general solution I can think of is using SNAPSHOT isolation level. However, it would add some performance overhead, and I haven't yet found any sound data on how large this overhead is.
I use snapshot in my system. It is not free, that's for sure, but the alternatives are not free either - blocking uses up resources too. Using rowlock does not always help.
Also, snapshot gives you a consistent point-in-time view of your data; without it you are exposed to some subtle bugs.
One more thing: you can get deadlocks even if you have only one table, examples here: Reproducing deadlocks involving only one table
Does your SP select or update anything from these tables inside the transaction? If not, you can try ROWLOCK hints for your inserts and the other selects (row locks usually do not escalate into page or table locks unless the select returns too many rows). If yes, then you can try an UPDLOCK hint for the selects inside the SP's transaction.
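For example, here is a minimal sketch of an insert-plus-read transaction using those hints; the tables dbo.TableA / dbo.TableB and their columns are hypothetical:

DECLARE @val1 INT, @val2 INT, @id INT, @existing INT;
SET @val1 = 1; SET @val2 = 2; SET @id = 10;

BEGIN TRANSACTION;

-- Row-level hints on the inserts keep the lock footprint small:
INSERT INTO dbo.TableA WITH (ROWLOCK) (Col1) VALUES (@val1);
INSERT INTO dbo.TableB WITH (ROWLOCK) (Col2) VALUES (@val2);

-- If the SP must also read inside the same transaction, UPDLOCK on the read
-- avoids the classic shared-to-exclusive lock conversion deadlock:
SELECT @existing = Col1
FROM dbo.TableA WITH (UPDLOCK)
WHERE Id = @id;

COMMIT TRANSACTION;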
I'm not sure if this will help you, but I found this blog a while ago and it helped me clean up some deadlocks.
Let's say I have a table with 1,000,000 rows and running a SELECT * FROM TableName on this table takes around 10 seconds to return the data.
Without a NOLOCK statement (putting to one side issues around dirty reads) would this query lock the table for 10 seconds meaning that no other process could read or write to the table?
I am often told by DBAs that when querying live data to diagnose data issues I should use NOLOCK to ensure that I don't lock the table, which may cause issues for users. Is this true?
The NOLOCK table hint means that no shared locks will be taken on the table in question; the READ UNCOMMITTED isolation level does the same, but it applies to everything the query touches rather than to a single table. So the answer is 'no, it won't lock the table'. Note that schema locks will still be held even under READ UNCOMMITTED.
By not asking for shared locks, the read operation can return dirty data (updated but not yet committed) and non-existent data (updated, then rolled back); the result is transactionally inconsistent in any case, and it might even skip whole pages (if page splits happen concurrently with the read).
Specifying either NOLOCK or READ UNCOMMITTED is not considered good practice. It will be faster, of course. Just make sure you're aware of the consequences.
Also, support for these hints in UPDATE and DELETE statements will be removed in a future version, according to docs.
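For reference, a small sketch of the two equivalent ways to request dirty reads (dbo.TableName is a placeholder):

-- Per-table hint:
SELECT COUNT(*) FROM dbo.TableName WITH (NOLOCK);

-- Session-wide equivalent; applies to every table the session reads:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM dbo.TableName;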
Even with NOLOCK, SQL Server can still choose to override the hint and obtain a lock; it is called a query HINT for a reason. The surest way of avoiding shared locks is to issue SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED at the beginning of the session.
Your query will try to obtain a table-level lock, as you're asking for all the data. So yes, it will be held for the duration of the query, and all other processes trying to acquire exclusive locks on that table will be put on wait queues. Processes that are also trying to read from that table will not be blocked.
Any diagnosis performed on a live system should be done with care, and using the NOLOCK hint will allow you to view data without creating any contention for users.
EDIT: As pointed out, update locks are compatible with shared locks, so they won't be blocked by the read process.
I just realized I've had a headache for years. Well, metaphorically speaking. In reality I was looking at my database structure and somehow just realized I never use transactions. Doh.
There's a lot of data on the internet about transactions (begin transaction, rollback, commit, etc.), but surprisingly not much detail about exactly why they are vital, and just exactly how vital?
I understand the concept of handling things if something goes wrong. This makes sense when one is doing multiple updates, for example across multiple tables in one go, but that is bad practice as far as I know and I don't do it. All of my queries just update one table. If a query errors, it is cancelled, transaction or no transaction. What else could go wrong or potentially corrupt a one-table update, besides my pulling the plug out of my server?
In other words, my question is,
exactly how vital is it that I implement transactions on all of my tables? Am I fully blasphemous for not having them, or does it really matter that much?
UPDATE
+1 to invisal, who pointed out that queries are automatically wrapped in transactions, which I did not know, and who gave multiple good references on the subject of my question.
This made a lot of sense when one is doing multiple updates, for
example, in multiple tables in one go. But basically all of my queries
just update one table at a time. If a query errors, it cancels,
transaction or no transaction.
In your case, it changes nothing: a single statement is already a transaction by itself. For more information you can read these existing questions and answers:
What does a transaction around a single statement do?
Transaction necessary for single update query?
Do i need transaction for joined query?
The most important property of a database is that it keeps your data reliably.
Database reliability is assured by conforming to ACID principles (Atomicity, Consistency, Isolation, Durability). In the context of databases, a single logical operation on the data is called a transaction. Without transactions, such reliability would not be possible.
In addition to reliability, using transactions properly lets you improve the performance of some data operations considerably. For example, you can start a transaction, insert a lot of data (say 100k rows), and only then commit. The server does not have to make the work durable until the commit, effectively batching the data in memory, which can improve performance a lot.
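A rough sketch of that batching pattern, assuming a hypothetical dbo.Numbers table with a single Value column:

BEGIN TRANSACTION;

DECLARE @i INT;
SET @i = 1;
WHILE @i <= 100000
BEGIN
    INSERT INTO dbo.Numbers (Value) VALUES (@i);
    SET @i = @i + 1;
END;

-- The work becomes durable here, instead of 100,000 separate autocommits:
COMMIT TRANSACTION;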
You should be aware that every updating action against your database is performed inside a transaction, even if it touches only one table (SQL Server automatically creates a transaction for it).
The reason for always using transactions is to ensure ACID, as others have mentioned. Here I'd like to elaborate on the isolation point. Without transaction isolation, you may run into problems such as dirty reads, non-repeatable reads, phantom reads, and so on.
It depends. If you are updating one table and one row, then the only advantage is going to be in the logging... but if you update multiple rows in a table at one time, then without transactions you could still run into some corruption.
Well, it depends. SQL is most often used to support data for host languages like C, C++, Java, PHP, C# and others. I have not worked with that many technologies, but if you are using the following combinations, then here is my point of view:
SQL with C / C++ : Commit Required
SQL with Java : Not Required
SQL with C# : Not Required
SQL with PHP : Not Required
It also depends on which flavor of SQL you are using: Oracle, SQL Server, SQLite, MySQL, etc.
When you are using Oracle in its console (Oracle 11g, Oracle 10g, etc.), COMMIT is required.
And as far as corruption of tables and data is concerned: yes, it happens. I had a very bad experience with it. If you pull out a cable or the power while you are updating your table, you might end up with a massive disaster.
Concluding, I suggest you do commit.
How would I go about grabbing data from a table that is CONSTANTLY being inserted into (and needs to be) without causing any locking, so that the inserts continue unhindered?
I've looked around and found the SELECT ... WITH (NOLOCK) option but, if I'm understanding correctly, this does not stop locks from being created; rather, it goes around current locks and grabs everything?
Thanks.
EDIT: This table will never be UPDATED, only INSERTS and SELECTS
As long as you don't mind getting dirty reads from your table this shouldn't be a problem for you. Make sure that the transaction isolation level is set appropriately and that your calling code (if applicable) isn't using implicit transactions, and you should be fine.
Microsoft's Transaction Isolation Docs:
http://msdn.microsoft.com/en-us/library/ms173763.aspx
NOLOCK is a common and, in my opinion, abused option when running into situations like this. Although it can help you overcome problems in high-contention situations, it can also cause difficult-to-track-down bugs. Although this is something of an ongoing argument, check out http://blogs.msdn.com/b/davidlean/archive/2009/04/06/sql-server-nolock-hint-other-poor-ideas.aspx for an idea of some of the risks of using hints like this.
You can use the NOLOCK hint when selecting from the table. There are some side effects (you can basically get dirty reads).
NOLOCK issues no row locks in the query you add it to, and has no impact on the locks issued by other running queries. NOLOCK does issue a Sch-S (Schema Stability) lock, which isn't going to cause you a problem.
I believe you have misunderstood. SELECT ... WITH (NOLOCK) will not acquire shared locks; that is to say, it will not block any other writes.
The downside is that it will include uncommitted reads, so the result may not hold if the writing transaction rolls back.
You can use NOLOCK, but I would only recommend that in cases where you know that "dirty data" is acceptable (for example, a syslog database where you know data will never be altered or deleted once it's been inserted). The best way to do it is to SELECT from data that is NOT being locked; can you identify rows that aren't being affected by your insert? For example, if your data is being inserted with a CreateDate column defaulting to GETDATE(), make sure your queries pull data from BEFORE that point.
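A small sketch of that idea, assuming a hypothetical dbo.EventLog table with a CreateDate column defaulted to GETDATE():

DECLARE @cutoff DATETIME;
SET @cutoff = DATEADD(SECOND, -5, GETDATE());

-- Rows written before the cutoff are no longer being touched by the inserts:
SELECT Id, Payload, CreateDate
FROM dbo.EventLog
WHERE CreateDate < @cutoff;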
Of course, it all depends on how much data is being written and whether or not the insert statement is generating row or page or table locks...
One option not discussed here is to use replication. If you replicate the table in question and run your queries on the replicated database, you will not block inserts/updates. (In your case, I would use transactional replication - https://msdn.microsoft.com/en-us/library/ms151176.aspx).
Is it possible in relational databases for these two statements to deadlock? I'm trying to simplify my question and example -- please just assume that these selects, which I think would normally only require sharable read-locking, now require exclusive read locks:
Concurrent Connection 1:
SELECT {...}
FROM A
JOIN B ON {...}
Concurrent Connection 2:
SELECT {...}
FROM B
JOIN A ON {...}
That is, does the ordering of the joins matter? Are single statements in SQL atomic? Is A locked first and then B in the first statement and B locked first and then A in the second statement?
I think not - My gut tells me that two single statements like this cannot deadlock, no matter how complex. I believe that a statement is analyzed as a whole and that the resources requiring locking are locked using some deterministic global order (i.e. alphabetically). But I need more than a gut feeling on this - I can't think of a way to prove it and I can't find it documented.
I'm interested in MS SQL 2005, but I don't think the question is implementation specific.
Secondarily: As it relates to MS SQL, I'd also want to know that Common Table Expressions also have this guarantee - that CTEs are mostly a syntactic benefit (+recursion), consolidated into a traditional single statement by the engine.
SELECTs cannot deadlock with other SELECTs, because they only acquire shared locks. You say that we should assume these SELECTs now 'require exclusive read locks', but this is not something we can consider because 1) there is no such thing as an exclusive read lock and 2) reads don't acquire exclusive locks.
But you do pose a more general question: whether simple statements can deadlock. The answer is a definite, resounding YES. Locks are acquired at execution time, not analyzed upfront, sorted and then acquired in some order. It would be impossible for the engine to know the needed locks upfront because they depend on the actual data on disk, and to read the data the engine needs to ... lock the data.
Deadlocks between simple statements (SELECT vs. UPDATE or SELECT vs. DELETE) due to different index access order are quite common and very easy to investigate, diagnose and fix. But note that there is always a write operation involved, as reads cannot block each other. For this discussion, adding an UPDLOCK or XLOCK hint to a SELECT should be considered a write. You don't even need a JOIN; a secondary index may well introduce the access-order problem leading to a deadlock, see Read/Write Deadlock.
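To make the single-table case concrete, here is a hedged two-session sketch of that read/write deadlock; the table, index and values are made up:

CREATE TABLE dbo.Accounts (
    Id      INT PRIMARY KEY CLUSTERED,
    Name    VARCHAR(50) NOT NULL,
    Balance MONEY NOT NULL
);
CREATE NONCLUSTERED INDEX IX_Accounts_Name ON dbo.Accounts (Name);

-- Session 1 (reader): seeks IX_Accounts_Name, then does a bookmark lookup
-- into the clustered index to fetch Balance.
SELECT Balance FROM dbo.Accounts WHERE Name = 'alice';

-- Session 2 (writer): updates the clustered index row first, then must
-- maintain the IX_Accounts_Name entry because the key column changes.
UPDATE dbo.Accounts SET Name = 'alicia' WHERE Id = 1;

-- If each session grabs its first index before the other, each then waits
-- for the other's second index: a deadlock with no JOIN involved.

Making IX_Accounts_Name cover the query (by including Balance) removes the lookup and, with it, this particular deadlock.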
And finally, writing SELECT FROM A JOIN B or writing SELECT FROM B JOIN A is completely irrelevant. The query optimizer is free to rearrange the access order as it sees fit, the actual text of the query does not impose the order of execution in any way.
Updated
How then can we construct a general
strategy toward a READ COMMITTED
"multiple entity" database that
doesn't deadlock?
I'm afraid there is no cookie-cutter recipe. The solution will differ from case to case. Ultimately, in database applications deadlocks are a fact of life. I understand this may sound absurd, as in 'we landed on the Moon but we can't write a correct database application', but there are strong factors at play which pretty much guarantee that applications will eventually encounter deadlocks. Luckily, deadlocks are among the easiest errors to deal with: simply read the state again, apply the logic, and write the new state again. Now, that being said, there are some good practices that can dramatically reduce the frequency of deadlocks, to the point where they all but vanish:
Try to have a consistent access pattern for writes. Have clearly defined rules stating things such as 'a transaction shall always update tables in this order: Customers -> OrderHeaders -> OrderLines.' Note that the order has to be obeyed inside a transaction. Basically, rank all tables in your schema and specify that all updates must occur in ranking order. This eventually boils down to the code discipline of the individual contributor, who has to ensure that updates are written in the proper order inside a transaction (there is a small sketch after these practices).
Reduce the duration of writes. The usual wisdom goes like this: at the beginning of the transaction do all the reads (read the existing state), then process the logic and compute the new values, then write all updates at the end of the transaction. Avoid a pattern like 'read->write->logic->read->write'; instead do 'read->read->logic->write->write'. Of course, the true craftsmanship consists in how to deal with the actual, real, individual cases when apparently one must do writes mid-transaction. A special note here must be made about a specific type of transaction: those driven by a queue, which by definition start their activity by dequeueing (= a write) from the queue. These applications were always notoriously difficult to write and prone to errors (especially deadlocks); luckily there are ways to do it, see Using tables as Queues.
Reduce the amount of reads. Table scans are the most prevalent cause of deadlocks. Proper indexing will not only eliminate the deadlocks, but may also boost performance in the process.
Snapshot isolation. This is the closest thing you'll get to a free lunch in regard to avoiding deadlocks. I intentionally put it last, because it may mask other problems (like improper indexing) instead of fixing them.
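As a small illustration of the first two practices (consistent write order, reads before writes), here is a hedged sketch; the Customers/OrderHeaders/OrderLines tables and their columns are hypothetical, and OrderHeaders is assumed to have an identity key:

BEGIN TRANSACTION;

DECLARE @CustomerId INT, @Credit MONEY;
SET @CustomerId = 42;

-- all reads first
SELECT @Credit = CreditLimit FROM dbo.Customers WHERE Id = @CustomerId;

-- ... business logic on the values read ...

-- all writes last, in the agreed ranking order
UPDATE dbo.Customers SET LastOrderDate = GETDATE() WHERE Id = @CustomerId;
INSERT INTO dbo.OrderHeaders (CustomerId, OrderDate) VALUES (@CustomerId, GETDATE());
INSERT INTO dbo.OrderLines (OrderId, Quantity) VALUES (SCOPE_IDENTITY(), 1);

COMMIT TRANSACTION;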
Trying to solve this problem with a LockCustomerByXXX approach I'm afraid doesn't work. Pessimistic locking doesn't scale. Optimistic concurrency updates are the way to go if you want to have any sort of decent performance.
As far as I know, you are correct: the SQL engine figures out what it will need to do (probably as it parses the query), locks all required resources, executes the query, and then unlocks them.
Reads won't deadlock each other. You must have some write going on as well.
You can do things to reduce the number of deadlocks. For example, insert only at the end of a clustered index on platforms that support row locking and avoid updating records. Ah hah, now Facebook's UI makes more sense.
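For example, a sketch of an insert-only table whose clustered key always grows, so new rows land at the end of the index (the table and columns are made up):

CREATE TABLE dbo.ActivityFeed (
    Id      INT IDENTITY(1,1) NOT NULL,
    UserId  INT NOT NULL,
    Payload NVARCHAR(400) NOT NULL,
    CONSTRAINT PK_ActivityFeed PRIMARY KEY CLUSTERED (Id)
);
-- New rows always land at the end of the clustered index, and existing rows
-- are never updated, so readers of older rows rarely collide with the inserts.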
It's sometimes easier to handle the deadlocks than avoid them. The server will fail and report back, and you can retry.
I am encountering very infrequent yet annoying SQL deadlocks on a .NET 2.0 webapp running on top of MS SQL Server 2005. In the past, we have been dealing with the SQL deadlocks in a very empirical way - basically tweaking the queries until they work.
Yet I find this approach very unsatisfactory: it is time-consuming and unreliable. I would much prefer to follow deterministic query patterns that ensure by design that no SQL deadlock will ever be encountered.
For example, in C# multithreaded programming, a simple design rule such as 'locks must always be taken in lexicographical order' ensures that no deadlock will ever happen.
Are there any SQL coding patterns guaranteed to be deadlock-proof?
Writing deadlock-proof code is really hard. Even when you access the tables in the same order you may still get deadlocks [1]. I wrote a post on my blog that walks through some approaches that will help you avoid and resolve deadlock situations.
If you want to ensure two statements/transactions will never deadlock you may be able to achieve it by observing which locks each statement consumes using the sp_lock system stored procedure. To do this you have to either be very fast or use an open transaction with a holdlock hint.
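A minimal sketch of that technique, assuming a hypothetical dbo.Orders table; HOLDLOCK keeps the shared locks alive long enough for sp_lock to show them:

BEGIN TRANSACTION;

SELECT OrderId
FROM dbo.Orders WITH (HOLDLOCK)   -- keep the shared locks until the transaction ends
WHERE CustomerId = 42;

EXEC sp_lock @@SPID;              -- list the locks currently held by this session

ROLLBACK TRANSACTION;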
Notes:
Any SELECT statement that needs more than one lock at once can deadlock against an intelligently designed transaction which grabs the locks in reverse order.
Zero deadlocks is basically an incredibly costly problem in the general case because you must know all the tables/objects that you're going to read and modify for every running transaction (this includes SELECTs). The general philosophy is called ordered strict two-phase locking (not to be confused with two-phase commit; see http://en.wikipedia.org/wiki/Two_phase_locking - even 2PL does not guarantee freedom from deadlocks).
Very few DBMS actually implement strict 2PL because of the massive performance hit such a thing causes (there are no free lunches) while all your transactions wait around for even simple SELECT statements to be executed.
Anyway, if this is something you're really interested in, take a look at SET ISOLATION LEVEL in SQL Server. You can tweak that as necessary. http://en.wikipedia.org/wiki/Isolation_level
For more info, see wikipedia on Serializability: http://en.wikipedia.org/wiki/Serializability
That said, a great analogy is source code revisions: check in early and often. Keep your transactions small (in number of SQL statements and rows modified) and quick (wall-clock time helps avoid collisions with others). It may be nice and tidy to do a LOT of things in a single transaction - and in general I agree with that philosophy - but if you're experiencing a lot of deadlocks, you may break the transaction up into smaller ones and then check their status in the application as you move along: TRAN 1 - OK Y/N? If Y, send TRAN 2 - OK Y/N? etc.
As an aside, in my many years of being a DBA and also a developer (of multiuser DB apps measuring thousands of concurrent users) I have never found deadlocks to be such a massive problem that I needed special cognizance of it (or to change isolation levels willy-nilly, etc).
There is no magic general-purpose solution to this problem that works in practice. You can push concurrency up to the application, but this can be very complex, especially if you need to coordinate with other programs running in separate memory spaces.
General answers to reduce deadlock opportunities:
Basic query optimization (proper index use), hotspot-avoidant design, holding transactions for the shortest possible time, etc.
When possible set reasonable query timeouts so that if a deadlock should occur it is self-clearing after the timeout period expires.
Deadlocks in MSSQL are often due to its default read concurrency model, so it's very important not to depend on it - assume Oracle-style MVCC in all designs. Use snapshot isolation or, if possible, the READ UNCOMMITTED isolation level.
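For completeness, a sketch of turning on row versioning; MyDb and dbo.Orders are placeholders:

-- One-time database settings:
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;
-- or make plain READ COMMITTED use row versioning instead of shared locks:
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON;

-- Then, per session:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT OrderId, Status FROM dbo.Orders WHERE CustomerId = 42;
COMMIT TRANSACTION;

Both settings push version data into tempdb, so they are not free, as noted elsewhere in this thread.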
I believe the following read/write pattern is deadlock-proof given some constraints:
Constraints:
One table
An index or PK is used for reads and writes, so the engine does not resort to table locks.
A batch of records can be read using a single SQL where clause.
Using SQL Server terminology.
Write Cycle:
All writes within a single "Read Committed" transaction.
The first update in the transaction is to a specific, always-present record
within each update group.
Multiple records may then be written in any order. (They are "protected"
by the write to the first record).
Read Cycle:
The default read committed transaction level
No transaction
Read records as a single select statement.
Benefits:
Secondary write cycles are blocked at the write of first record until the first write transaction completes entirely.
Reads are blocked/queued/executed atomically between the write commits.
Achieve transaction level consistency w/o resorting to "Serializable".
I need this to work too so please comment/correct!!
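Here is a hedged T-SQL sketch of the cycle described above, using made-up dbo.GroupHeader and dbo.GroupDetail tables:

DECLARE @GroupId INT, @Payload1 NVARCHAR(100), @Payload2 NVARCHAR(100);
SET @GroupId = 1; SET @Payload1 = N'first'; SET @Payload2 = N'second';

-- Write cycle: one READ COMMITTED transaction, anchor record first.
BEGIN TRANSACTION;

UPDATE dbo.GroupHeader                 -- 1. the always-present record for the group;
SET LastWriteDate = GETDATE()          --    competing writers serialize here
WHERE GroupId = @GroupId;

INSERT INTO dbo.GroupDetail (GroupId, Payload) VALUES (@GroupId, @Payload1);  -- 2. the rest,
INSERT INTO dbo.GroupDetail (GroupId, Payload) VALUES (@GroupId, @Payload2);  --    in any order

COMMIT TRANSACTION;

-- Read cycle: a single SELECT at the default READ COMMITTED level.
SELECT GroupId, Payload FROM dbo.GroupDetail WHERE GroupId = @GroupId;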
As you said, always accessing tables in the same order is a very good way to avoid deadlocks. Furthermore, shorten your transactions as much as possible.
Another cool trick is to combine two SQL statements into one whenever you can. Single statements are always transactional. For example, use "UPDATE ... SELECT" or "INSERT ... SELECT", and use @@ERROR and @@ROWCOUNT instead of "SELECT COUNT" or "IF (EXISTS ...)".
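For instance, a sketch of folding the existence check into the update itself, with @@ROWCOUNT doing the job of a separate SELECT COUNT (the dbo.Inventory table and its columns are hypothetical):

DECLARE @ProductId INT, @OrderQty INT;
SET @ProductId = 7; SET @OrderQty = 3;

-- One atomic statement that both checks and decrements the stock:
UPDATE dbo.Inventory
SET Quantity = Quantity - @OrderQty
WHERE ProductId = @ProductId
  AND Quantity >= @OrderQty;

IF @@ROWCOUNT = 0
    RAISERROR('Not enough stock or unknown product.', 16, 1);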
Lastly, make sure that your calling code can handle deadlocks by re-issuing the query a configurable number of times. Sometimes it just happens; it's normal behaviour and your application must be able to deal with it.
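One possible shape for that retry logic, sketched in T-SQL (SQL Server 2005+ TRY/CATCH; dbo.usp_DoTheWork is a placeholder for the real transactional work):

DECLARE @retries INT;
SET @retries = 0;

WHILE @retries < 3
BEGIN
    BEGIN TRY
        EXEC dbo.usp_DoTheWork;          -- the real transactional work
        BREAK;                           -- success: stop retrying
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() = 1205         -- this session was chosen as the deadlock victim
            SET @retries = @retries + 1; -- loop around and try again
        ELSE
        BEGIN
            DECLARE @msg NVARCHAR(2048);
            SET @msg = ERROR_MESSAGE();
            RAISERROR(@msg, 16, 1);      -- anything else: surface the error and stop
            BREAK;
        END
    END CATCH
END;

In a real system you would also want to fail loudly once the retry budget is exhausted; many people put this loop in the application layer instead.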
In addition to a consistent sequence of lock acquisition, another path is the explicit use of locking and isolation hints to reduce the time and resources wasted on unintentionally acquired locks, such as intent-shared locks taken during reads.
Something that no one has mentioned (surprisingly) is that, where SQL Server is concerned, many locking problems can be eliminated with the right set of covering indexes for the database's query workload. Why? Because it can greatly reduce the number of bookmark lookups into a table's clustered index (assuming it's not a heap), thus reducing contention and locking.
If you have enough design control over your app, restrict your updates / inserts to specific stored procedures and remove update / insert privileges from the database roles used by the app (only explicitly allow updates through those stored procedures).
Isolate your database connections to a specific class in your app (every connection must come from this class) and specify that "query only" connections set the isolation level to "dirty read" ... the equivalent to a (nolock) on every join.
That way you isolate the activities that can cause locks (to specific stored procedures) and take "simple reads" out of the "locking loop".
The quick answer is no; there is no guaranteed technique.
I don't see how you can make any application deadlock proof in general as a design principle if it has any non-trivial throughput. If you pre-emptively lock all the resources you could potentially need in a process in the same order even if you don't end up needing them, you risk the more costly issue where the second process is waiting to acquire the first lock it needs, and your availability is impacted. And as the number of resources in your system grows, even trivial processes have to lock them all in the same order to prevent deadlocks.
The best way to solve SQL deadlock problems, like most performance and availability problems is to look at the workload in the profiler and understand the behavior.
Not a direct answer to your question, but food for thought:
http://en.wikipedia.org/wiki/Dining_philosophers_problem
The "Dining philosophers problem" is an old thought experiment for examining the deadlock problem. Reading about it might help you find a solution to your particular circumstance.