Clarification on DBMS Locking

I'm taking an intro class on database management systems, and had a question that wasn't answered by my book. This question is not from my homework, I was just curious.
The textbook continually stresses that a transaction is one logical unit of work. However, when coming across the shared/exclusive locking modes, I got a little confused.
There was a diagram in the book that looked like this:
Time | Transaction Status
-----+---------------------
  1  | Request Lock
  2  | Receive Lock
  3  | Process transaction
  4  | Release Lock
  5  | Lock is released
Does the transaction get processed all at the same time, or does it get processed as individual locks are obtained?
If there are commands in two transactions that result in a shared lock as well as an exclusive lock, do those transactions run concurrently, or are they scheduled one after the other?

The answer is, as usual, "it depends" :-)
Generally speaking, you don't need to take out all your locks before you begin; however, you need to take out all your locks before you release any of them (this discipline is known as two-phase locking).
So you can do the following:
lock resource A
update A
lock resource B
update B
unlock A
unlock B
This allows you to be a bit friendlier to other transactions that may want to read B, and don't care about A, for example. It does introduce more risk -- you may be unable to acquire a lock on B, and decide to roll back your transaction. Them's the breaks.
You also want to always acquire all locks in the same order, so that you don't wind up in a deadlock (transaction 1 has A and wants B; transaction 2 has B and wants A; standoff at high noon, no one wins). If you enforce a consistent order, transaction 2 will try to get A before B and will either wait, letting transaction 1 proceed, or fail if transaction 1 already holds A -- either way, no deadlock.
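A rough sketch of that ordering rule (PostgreSQL-style syntax; the tables account_a and account_b are made up for illustration) -- every transaction that touches both resources locks A before B:

BEGIN;
-- lock and update resource A first, in every transaction
SELECT * FROM account_a WHERE id = 1 FOR UPDATE;
UPDATE account_a SET balance = balance - 100 WHERE id = 1;
-- only then lock and update resource B
SELECT * FROM account_b WHERE id = 1 FOR UPDATE;
UPDATE account_b SET balance = balance + 100 WHERE id = 1;
COMMIT;  -- note: real SQL databases release all locks here, not one at a time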
Things get more interesting when you have intent-to-exclude locks -- locks that are taken as shared with an "option" to make them exclusive. This might be covered somewhere in the back of your book :-)
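In SQL Server, for instance, these correspond roughly to update (U) locks, which you can request explicitly with the UPDLOCK table hint. A sketch (the accounts table is hypothetical):

BEGIN TRAN;
-- take an update lock: compatible with other readers' shared locks,
-- but convertible to exclusive without the classic lock-upgrade deadlock
SELECT balance FROM accounts WITH (UPDLOCK) WHERE id = 1;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- converts to exclusive
COMMIT;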

In practice each operation acquires the needed lock before it proceeds. A SELECT will first acquire a shared lock on a row, then read the row. An UPDATE will first acquire an exclusive lock on that row, then update the row. In theory you can say that 'locks are acquired, then the transaction processes', but in real life it is each individual operation in the transaction that knows what locks it requires.

Things that need exclusive locks (UPDATE/DELETE/etc.) can't happen while anything else is accessing the data. If an operation needs an exclusive lock, it will either block other transactions or wait for them to finish before obtaining the lock.

In general, locks are determined at run time. When the BEGIN TRANSACTION command is processed, nothing has run in the transaction yet, so there are no locks. As commands execute within the transaction, locks are acquired.
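A quick way to see this (sketched for PostgreSQL; the table t is a placeholder) is to step through a transaction and watch the lock table from a second session:

BEGIN;                              -- no data locks held yet
SELECT * FROM t WHERE id = 1;       -- a shared-level lock is acquired here
UPDATE t SET x = 0 WHERE id = 1;    -- an exclusive row lock is acquired here
-- meanwhile, from another session: SELECT locktype, mode, granted FROM pg_locks;
COMMIT;                             -- all locks released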

"If there are commands in two transactions that result in a shared lock as well as an exclusive lock, do those transactions run concurrently, or are they scheduled one after the other?"
A lock does not consist solely of the notion "shared/exclusive". The most important thing about a lock is the resource that it applies to.
Two transactions that each hold an exclusive lock on distinct resources (say, two separate tables, or two separate partitions, or two separate pages, or two separate rows, or two separate printers, or two separate IP ports, ...) can continue to run concurrently without any problem.
Transaction serialization only becomes necessary when a transaction requests a lock on some resource, where the sharing mode of that lock is incompatible with a lock held on the same resource by some other transaction.
If your textbook really gives the sequence of events as you state, then throw it away. Lock requests emerge as the transaction is being processed, and there is no definitive and final way for the transaction processor to know at the start of the transaction which locks it will need (otherwise deadlocking would be a nonexistent problem).

Related

How to prevent locks in Redshift (shared lock stopping a write job)

I have a data warehouse which is used by multiple downstream users. They read the data from a Redshift table. When they read the data, a shared lock is enforced on the table. At that time, my daily job, which is supposed to write to the table, cannot write because it cannot take an exclusive lock until the shared lock is cleared.
Ideally my write job should take priority over any read job. Can I enforce this in some way?
Usually this is done by your update process not requiring an exclusive lock or managing the need for locks so that the update process isn't blocked.
Can you describe your update process and which steps are requiring the exclusive locks?
Look at the locks, and the statements causing them, when things stop making forward progress. Reworking these parts should allow you to keep your updates moving while the read sessions act on the versions of the data they started with.
It is also important not to have user transactions that hang around for days on end. This can happen when interactive sessions are just left open mid-transaction. Cleaning these up also prevents errors due to sessions seeing very old versions of the data.
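As a starting point for finding the blockers, something like the following (a sketch using Redshift's system views; check the documentation for your cluster version):

-- who currently holds or is waiting for table locks
SELECT * FROM stv_locks;

-- open transactions and the locks they hold
SELECT * FROM svv_transactions;

-- a long-abandoned session can be ended by its process id
SELECT pg_terminate_backend(12345);  -- 12345 is a placeholder pid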

Why do deadlocks happen in SQL Server?

So as I understand it, SQL deadlocks happen when a SPID is busy processing another query and it can't be bothered to run another one because it's so busy right now. The SQL Server "randomly" picks one of the queries to deadlock out of the resources asked for and fails it out, throwing an exception.
I have an app running ~40 instances and a back-end Windows service, all of which hit the same database. I'm looking to reduce deadlocks so I can increase the number of threads I can run simultaneously.
Why can't SQL Server just enqueue the new query and run it when it has time and the resources are available? Most of what I'm doing can wait a few seconds on occasion.
Is there a way to set Transaction Isolation Level globally without having to specify it at the onset of each new connection/session?
Your understanding of deadlocks is not correct. What you've described is blocking. It's a common mistake to equate the two.
A deadlock occurs when two separate transactions each want different resources and neither will release the one that they have so that the other can run. It's probably easier to illustrate:
SPID #1 gets a lock on resource A
SPID #2 gets a lock on resource B
SPID #1 now needs a lock on resource B in order to complete
SPID #2 now needs a lock on resource A in order to complete
SPID #1 can't complete (and therefore release resource A) because SPID #2 holds resource B
SPID #2 can't complete (and therefore release resource B) because SPID #1 holds resource A
Since neither SPID can complete, one has to give up (i.e. be chosen by the server as the deadlock victim) and will fail.
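To make that concrete, here is a sketch you can reproduce in two SQL Server sessions, assuming a throwaway table t containing rows with id = 1 and id = 2:

-- Session 1:
BEGIN TRAN;
UPDATE t SET x = 1 WHERE id = 1;   -- locks row 1

-- Session 2:
BEGIN TRAN;
UPDATE t SET x = 2 WHERE id = 2;   -- locks row 2

-- Session 1:
UPDATE t SET x = 1 WHERE id = 2;   -- blocks, waiting for session 2

-- Session 2:
UPDATE t SET x = 2 WHERE id = 1;   -- deadlock: one session is picked as the
                                   -- victim and gets error 1205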
The best way to avoid them is to keep your transactions small (in number of resources needed) and quick.
Deadlock is where two threads of processing are both held up by the other (it can be more, but two is sufficiently complex). So one thread locks a table, then requests a lock on another table. The other table is locked by the second thread, which cannot progress because it is waiting for a lock on the first table.
The reason that one of these has to be thrown out is that in a deadlock, they will never end - neither thread can progress at all. The only answer is for one to be stopped to allow the other to complete.
The solution to reducing deadlocks in the sort of situation you are talking about may be to redesign the solution. If you can make sure that less locking occurs, you will have fewer deadlocks.
Deadlocks occur because two concurrent transactions may overlap and lock different resources, each of which is required by the other transaction to finish.
Let's imagine:
1 - Transaction A locks row1
2 - Transaction B locks row2
3 - Transaction A tries to lock row2 and, because of the previous lock, SQL Server makes it wait
4 - Transaction B tries to lock row1 and, because of the previous lock, SQL Server makes it wait
So, SQL Server must choose one transaction, kill it, and allow the other to continue.
This image illustrates this situation very well: http://www.eupodiatamatando.com/wp-content/uploads/2008/01/deadlocknajkcomafarialibh3.jpg

Can I open a stoppable transaction with SQL Server?

I'm looking for something similar to an SQL transaction. I need the usual protections that transactions provide, but I don't want it to slow down anyone else.
Imagine client A connects to the DB and runs these commands:
BEGIN TRAN
SELECT (something)
(Wait a few seconds maybe.)
UPDATE (something)
COMMIT
In between the SELECT and the UPDATE, client B comes along and attempts to do a query that, under normal circumstances, would end up having to wait for A to COMMIT.
What I'd like is for client A to open its transaction in such a way that should B come along and perform its query, client A will find its transaction immediately rolled back and its subsequent commands failing. Client B would only experience minimal delay.
(Note that the SELECT and UPDATE are simply illustrative commands.)
Update...
I've got a high priority task (client B) that sometimes (once a month-ish) gets an SQL timeout error, and a low priority task (client A) with a transaction which causes that timeout. I'd rather that the low priority task fails and is reattempted in the next cycle.
I ended up fixing this problem by eliminating the transactions entirely and replacing them with an informal set of flags. The queries were refactored to only do something if the right set of flags are raised and I added something that cleared up abandoned records that the rollback would have cleared in the past.
I fixed my transaction issues by eliminating transactions.
Using the SNAPSHOT isolation level will prevent B from blocking. B will see the data in the state it was in before A's uncommitted changes. Unless B modifies the same data, they will never block each other.
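A sketch of enabling it (the database and table names are placeholders):

-- run once, as an administrative step
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- client B then opts in per session
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
SELECT * FROM Orders;  -- sees the last committed state; never blocked by A's writes
COMMIT;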
While not a transaction at all, Optimistic Concurrency may be useful -- it is used by default in LINQ2SQL, etc.
The general idea is that the data is read -- modifications can be independently made -- and then the data is written back with a "check" (this is loosely comparable to a Compare and Swap). If the check fails, it is up to the application to decide what to do (restart the process, proceed anyway, fail).
This naturally doesn't work for all scenarios and may not detect a number of interactions, such as new items added between the "read" and "write". Both the actual read and write can be in separate transactions with the appropriate isolation level; the separate transactions may allow additional transactions to be interleaved.
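A minimal sketch of such a check in T-SQL, assuming a Widgets table with a rowversion column named RowVer (all names are hypothetical):

-- read: remember the row's version along with its data
SELECT Name, RowVer FROM Widgets WHERE Id = 1;

-- ... the user edits the data; no locks are held ...

-- write: only succeeds if nobody changed the row in the meantime
UPDATE Widgets
SET Name = 'new name'
WHERE Id = 1
  AND RowVer = 0x00000000000007D1;  -- the version value read earlier

IF @@ROWCOUNT = 0
    PRINT 'Concurrency check failed - the application decides what to do';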
Of course, depending upon the exact problem and interactions... different isolation levels and/or finer grained locking may be sufficient.
Happy coding.
That is back to front.
You can't have later clients aborting earlier transactions: that's chaos.
You can have snapshot isolation so that client B has a consistent view and isn't blocked (mostly) by client A. See also Wikipedia for more general background.
Perhaps describe your problem more fully so we can offer suggestions for that...
One thing that I've seen used (but I'm afraid that I don't have any code handy for it) is having transaction A spawn another process which then monitors the transaction. If it sees any blocking caused by the transaction then it immediately issues a KILL to the SPID.
If I can find the code for this then I'll add it here.

Distributed transaction lock in Oracle database

I have some questions about transaction locks in an Oracle database. What I have found out so far is that:
Cause: The time to wait on a lock in a distributed transaction has been exceeded. This time is specified in the initialization parameter DISTRIBUTED_LOCK_TIMEOUT.
Action: This situation is treated as a deadlock and the statement was rolled back. To set the time-out interval to a longer interval, adjust the initialization parameter DISTRIBUTED_LOCK_TIMEOUT, then shut down and restart the instance.
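(For reference, that Action step looks roughly like this in SQL*Plus, assuming the instance uses an spfile; the parameter is static, so a restart is required:)

-- as a DBA, in SQL*Plus
ALTER SYSTEM SET distributed_lock_timeout = 120 SCOPE = SPFILE;
SHUTDOWN IMMEDIATE
STARTUP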
Some other things that I want to know in more details are things like:
It is mentioned that a lock in a 'distributed transaction' happened. So what kind of database operation can cause this? Updating a record? Selecting a record?
What does 'distributed' mean anyway? I have seen this term used all over the place, but I can't seem to deduce what it means.
What can we do to reduce instances of such lock ?
A distributed transaction means that you had a transaction that had two different participants. If you are using PL/SQL, that generally implies that there are multiple databases involved. But it may simply indicate that an application is using an external transaction coordinator in its interactions with the database. A J2EE application, for example, might want to create a distributed transaction that covers both issuing SQL statements against a database to move $100 from account A to account B as well as the application server action of creating a JMS message for this transaction that would eventually cause an email notification of the transfer to be sent. In this case, the application wants to ensure that the state of the middle tier matches the state of the back end.
Distributed transactions are not free. They involve potentially quite a bit of additional overhead because, at a minimum, you need to use the two-phase commit protocol to verify that all the components that are part of the distributed transaction are ready to commit and to verify that they all did commit. That involves sending a number of network packets, which can be a significant fraction of the time an OLTP transaction spends waiting. Distributed transactions also cause administrative issues because you end up with cases where one participant's transaction fails after it indicated it was ready to commit, or where a transaction coordinator fails while various participants have open transactions.
So the first question would be whether your application actually needs distributed transactions. Sometimes, developers find that they are accidentally requesting distributed transactions when they really aren't necessary. If you're not sure what a distributed transaction is, it's entirely possible that you don't really need them.
There is a guide here that will walk you through the steps to simulate an ORA-02049 (timeout: distributed transaction waiting for lock) if you want a better understanding of one of its causes.

Deadlock error in INSERT statement

We've got a web-based application. There are time-bound database operations (INSERTs and UPDATEs) in the application which take a long time to complete, hence this particular flow has been moved into a Java Thread so the request will not wait (block) until the database operation is completed.
My problem is, if more than one user goes through this particular flow, I'm facing the following error thrown by PostgreSQL:
org.postgresql.util.PSQLException: ERROR: deadlock detected
Detail: Process 13560 waits for ShareLock on transaction 3147316424; blocked by process 13566.
Process 13566 waits for ShareLock on transaction 3147316408; blocked by process 13560.
The above error is consistently thrown in INSERT statements.
Additional Information:
1) I have PRIMARY KEY defined in this table.
2) There are FOREIGN KEY references in this table.
3) A separate database connection is passed to each Java Thread.
Technologies
Web Server: Tomcat v6.0.10
Java v1.6.0
Servlet
Database: PostgreSQL v8.2.3
Connection Management: pgpool II
One way to cope with deadlocks is to have a retry mechanism that waits for a random interval and tries to run the transaction again. The random interval is necessary so that the colliding transactions don't continuously keep bumping into each other, causing what is called a live lock - something even nastier to debug. Actually most complex applications will need such a retry mechanism sooner or later when they need to handle transaction serialization failures.
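A sketch of such a retry loop, done server-side in PL/pgSQL (a PostgreSQL 9.0+ DO block; the two UPDATEs and the accounts table are placeholders -- in practice the loop usually lives in the application code around the whole transaction):

DO $$
DECLARE
  attempts int := 0;
BEGIN
  LOOP
    BEGIN
      -- the work that occasionally deadlocks
      UPDATE accounts SET balance = balance - 100 WHERE id = 1;
      UPDATE accounts SET balance = balance + 100 WHERE id = 2;
      EXIT;                            -- success: leave the loop
    EXCEPTION WHEN deadlock_detected THEN
      attempts := attempts + 1;
      IF attempts >= 3 THEN
        RAISE;                         -- give up and re-throw the error
      END IF;
      PERFORM pg_sleep(random());      -- random back-off avoids a livelock
    END;
  END LOOP;
END $$;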
Of course if you are able to determine the cause of the deadlock it's usually much better to eliminate it or it will come back to bite you. For almost all cases, even when the deadlock condition is rare, the little bit of throughput and coding overhead to get the locks in deterministic order or get more coarse-grained locks is worth it to avoid the occasional large latency hit and the sudden performance cliff when scaling concurrency.
When you are consistently getting two INSERT statements deadlocking, it's most likely a unique index insert-order issue. Try for example the following in two psql command windows:
Thread A              | Thread B
----------------------+------------------------------------------
BEGIN;                | BEGIN;
                      | INSERT uniq=1;
INSERT uniq=2;        |
                      | INSERT uniq=2;
                      | -- blocks, waiting for thread A to commit
                      | -- or roll back, to see if this is a
                      | -- unique-key error
INSERT uniq=1;        |
-- blocks waiting     |
-- for thread B:      |
-- DEADLOCK           |
(time flows downward)
Usually the best course of action to resolve this is to figure out the parent objects that guard all such transactions. Most applications have one or two primary entities, such as users or accounts, that are good candidates for this. Then all you need is for every transaction to get the locks on the primary entity it touches via SELECT ... FOR UPDATE. Or if it touches several, get locks on all of them, but in the same order every time (ordering by primary key is a good choice).
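Sketched out (accounts stands in for whatever your parent entity is), every transaction would start like this:

BEGIN;
-- lock the parent rows first, always in primary-key order
SELECT id FROM accounts
 WHERE id IN (7, 42)
 ORDER BY id
   FOR UPDATE;
-- ... now do the INSERTs that used to deadlock ...
COMMIT;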
What PostgreSQL does here is covered in the documentation on Explicit Locking. The example in the "Deadlocks" section shows what you're probably doing. The part you may not have expected is that when you UPDATE something, that acquires a lock on that row that continues until the transaction involved ends. If you have multiple clients all doing updates of more than one thing at once, you'll inevitably end up with deadlocks unless you go out of your way to prevent them.
If you have multiple things that take out implicit locks like UPDATE, you should wrap the whole sequence in BEGIN/COMMIT transaction blocks, and make sure you're consistent about the order locks are acquired (even the implicit ones like what UPDATE grabs) everywhere. If you need to update something in table A and then table B, and one part of the app does A then B while the other does B then A, you're going to deadlock one day. Two UPDATEs against the same table are similarly destined to fail unless you can enforce some ordering of the two that's repeatable among clients. Sorting by primary key once you have the set of records to update and always grabbing the "lower" one first is a common strategy.
It's less likely your INSERTs are to blame here; those are much harder to get into a deadlocked situation, unless you violate a primary key as Ants already described.
What you don't want to do is try and duplicate locking in your app, which is going to turn into a giant scalability and reliability mess (and will likely still result in database deadlocks). If you can't work around this within the confines of the standard database locking methods, consider using either the advisory lock facility or explicit LOCK TABLE to enforce what you need instead. That will save you a world of painful coding over trying to push all the locks onto the client side. If you have multiple updates against a table and can't enforce the order they happen in, you have no choice but to lock the whole table while you execute them; that's the only route that doesn't introduce a potential for deadlock.
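Two PostgreSQL-side options are sketched below (queue_items and the lock key 12345 are placeholders):

-- coarse: take a table lock for the duration of the transaction
BEGIN;
LOCK TABLE queue_items IN SHARE ROW EXCLUSIVE MODE;
-- ... the conflicting UPDATEs ...
COMMIT;  -- table lock released here

-- finer: an application-defined advisory lock (available since 8.2)
SELECT pg_advisory_lock(12345);
-- ... the protected work ...
SELECT pg_advisory_unlock(12345);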
Deadlock explained:
In a nutshell, what is happening is that a particular SQL statement (INSERT or other) is waiting on another statement to release a lock on a particular part of the database before it can proceed. Until this lock is released, the first SQL statement, call it "statement A", will not be allowed to access this part of the database to do its job (= regular lock situation). But statement A has also put a lock on another part of the database, to ensure that no other users of the database access it (for reading, or modifying/deleting, depending on the type of lock). Now the second SQL statement is itself in need of accessing the data section marked by the lock of statement A. That is a DEADLOCK: both statements will wait, ad infinitum, for one another.
The remedy...
This requires knowing the specific SQL statements these various threads are running, and looking at whether there is a way to either:
a) remove some of the locks, or change their types. For example, maybe the whole table gets locked where only a given row, or a page thereof, would be necessary.
b) prevent several of these queries from being submitted at the same time. This would be done by way of semaphores/locks (aka a mutex) at the level of the multi-threading logic.
Beware that the "b)" approach, if not correctly implemented, may just move the deadlock situation from within SQL to within the program/thread logic. The key would be to create only one mutex, to be obtained first by any thread which is about to run one of these deadlock-prone queries.
Your problem is probably that the INSERT command is trying to lock one or both indexes while those indexes are locked by the other thread.
One common mistake is to lock resources in a different order in each thread. Check the order, and try to lock the resources in the same order in all threads.