I have application code that inserts a record in a local database and a record in a remote database (via an Oracle database link). When I commit this distributed transaction, is it guaranteed that the local and remote databases will both commit or both roll back, or is there a chance that the remote one could commit while the local commit fails (or vice versa)?
I'd be astonished if Oracle does not use the equivalent of the Two-Phase Commit (2PC) protocol, which ensures that either both commit or both roll back.
With 2PC, there is a preliminary stage called the prepare (or voting) phase: the coordinator records its own decision and tells all participants to get ready to commit, and each participant reports back whether it must fail or can commit. Participants that can commit get ready to do so, tell the coordinator they are ready, and await further instructions. When all the participants have responded, the coordinator records the final decision, sends that decision to the participants, and acts on it itself.
If the coordinator fails after recording the decision but before successfully sending it to the participants, the participants can be left hung in a state where they can neither commit nor roll back. There are ways to recover from that. If the coordinator stays down long enough (for example, it is taken out of service as a result of catastrophic hardware failure), you can end up with problems; the participants typically end up doing a heuristic rollback (presumed rollback), but it requires astonishingly bad luck for this to cause any trouble.
There are alternatives to 2PC; the net result is the same - all commit or all rollback.
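For what it's worth, this is easy to see in the SQL itself. A minimal sketch of the scenario in the question (the link name remote_db and the table names are hypothetical):
INSERT INTO orders_local (id, amount) VALUES (1, 100);
INSERT INTO orders_remote@remote_db (id, amount) VALUES (1, 100);
COMMIT; -- Oracle coordinates the prepare and commit on both databases here
Both inserts belong to a single distributed transaction, so the COMMIT either lands on both databases or on neither.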
Related
Let's say I need to run some queries AND send an email in an atomic way.
A typical example is for a user sign-up form, I need to create the user and send the welcome email.
I can use a transaction:
begin transaction
create user
if creation failed, rollback transaction, bailout
send email
if email sending failed, rollback transaction, bailout
commit transaction
but, in PostgreSQL, a commit can fail (for example when using DEFERRED constraints).
So the solution would be to use a two-phase commit:
begin transaction
create user
if creation failed, rollback transaction, bailout
PREPARE TRANSACTION
if prepare failed, bailout (this is where deferred constraint checks run)
send email
if email sending failed, ROLLBACK PREPARED, bailout
COMMIT PREPARED -- this one is guaranteed to work by Postgres
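In concrete PostgreSQL syntax that flow would look something like this (the transaction identifier 'signup_42' is an arbitrary name chosen by the application, and max_prepared_transactions must be set above zero in postgresql.conf):
BEGIN;
INSERT INTO users (email) VALUES ('new@example.com');
PREPARE TRANSACTION 'signup_42';
-- ... send the email here ...
COMMIT PREPARED 'signup_42';      -- on success
-- ROLLBACK PREPARED 'signup_42'; -- on failure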
but the Postgres doc says:
It is unwise to leave transactions in the prepared state for a long time.
Moreover:
The intended usage of the feature is that a prepared transaction will normally be committed or rolled back as soon as an external transaction manager has verified that other databases are also prepared to commit.
So,
Does sending an email take too much time?
What problems could arise if we do so?
What would be an acceptable timeout if it takes too long?
Are database locks different in the prepared state than in the idle state (i.e. before the commit)?
Sending an email is not transactional. It is possible you will never know whether it was successfully delivered or not, and clearly "forever" is too long to hold onto a prepared transaction.
You will want to structure the table so that partially created users can be committed but still be known to be unconfirmed, for example with a column indicating as much. That way troubleshooters can actually see the partially created users and decide what to do about them. With prepared transactions, the semi-committed rows are invisible, so no one can figure out what is going on.
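A minimal sketch of that structure (the table and column names are just for illustration):
-- commit the user immediately, flagged as unconfirmed
INSERT INTO users (email, confirmed) VALUES ('new@example.com', false);
COMMIT; -- the row is durable and visible to troubleshooters
-- ... send the welcome email here, outside any transaction ...
UPDATE users SET confirmed = true WHERE email = 'new@example.com';
COMMIT;
If the email step fails, the unconfirmed row stays behind where a cleanup job or a human can find it, instead of hiding inside a prepared transaction.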
To put it a somewhat different way: to use two-phase commit effectively, you need to have a transaction manager. Do you? You didn't describe one. Are you planning to write your own? Do you know how much work that will be?
My question is this:
Say I have a Transaction Manager and 2 Resource Managers.
TM tells RMs to prepare.
RMs acknowledge they are prepared/vote yes.
TM tells RMs to commit.
RM 1 commits and acknowledges commit.
RM 2 never gets the commit message because of network failure.
In this scenario I know that RM 2 is sitting in a waiting state; the session then times out in the database and is put into an in-doubt state.
If the TM does not reconnect with the RM before the AbandonTimeout is exceeded, then the transaction is abandoned.
My question is, what happens to the global transaction while the TM continues to attempt recovery of the RM?
Does the TM send back an exception to the application when it starts trying the recovery?
Does the TM send back success even though one of the RMs never sent an acknowledgement?
The AbandonTimeout defaults to 24 hours. Does the TM hold the transaction for 24 hours and then, once the timeout is reached, send back an exception?
In this link, 2 phase Commit, the description of the end of phase two states:
The coordinator sends a commit message to all the cohorts.
Each cohort completes the operation, and releases all the locks and resources held during the transaction.
Each cohort sends an acknowledgment to the coordinator.
The coordinator completes the transaction when all acknowledgments have been received.
So what happens to the global transaction if the acknowledgement of the commit is never received?
I cannot find anything surrounding the resolution of a global transaction during a recovery operation. Any help would be appreciated.
Thanks,
Matt
Only when all the participants return OK will the transaction be recorded in the database as committed. If the TM cannot reconnect, the transaction will stay in-doubt, potentially holding locks on database pages (this generally requires manual cleanup).
Depending on timeout settings, the client application can receive errors. Some database systems, like Oracle, let you simulate different error conditions; the following link describes that: http://docs.oracle.com/cd/B28359_01/server.111/b28310/ds_txnman009.htm#ADMIN12285
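For reference, on Oracle that manual cleanup looks roughly like this (the transaction id '1.21.17' below is a placeholder; real ids come from the pending-transactions view):
-- list in-doubt distributed transactions
SELECT local_tran_id, state FROM dba_2pc_pending;
-- then force a resolution that matches what the other participants did:
COMMIT FORCE '1.21.17';
-- or:
ROLLBACK FORCE '1.21.17';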
When working with database transactions, what are the possible conditions (if any) that would cause the final COMMIT statement in a transaction to fail, presuming that all statements within the transaction already executed without issue?
For example... let's say you have some two-phase or three-phase commit protocol where you do a bunch of statements, then wait for some master process to tell you when it is ok to finally commit the transaction:
-- <initial handshaking stuff>
START TRANSACTION;
-- <Execute a bunch of SQL statements>
-- <Inform master of readiness to commit>
-- <Time passes... background transactions happening while we wait>
-- <Receive approval to commit from master (finally!)>
COMMIT;
If your code gets to that final COMMIT statement and sends it to your DBMS, can you ever get an error (uniqueness issue, database full, etc) at that statement? What errors? Why? How do they appear? Does it vary depending on what DBMS you run?
COMMIT may fail. You might have had sufficient resources to log all the changes you wished to make, but lack the resources to actually implement the changes.
And that's not considering other reasons it might fail:
The change itself might not fit the constraints of the database.
Power loss stops things from completing.
The level of requested selection concurrency might disallow an update (cursors updating a modified table, for example).
The commit might time out or be on a connection which times out due to starvation issues.
The network connection between the client and the database may be lost.
And all the other "simple" reasons that don't come to mind right now.
It is possible for some database engines to defer UNIQUE index constraint checking until COMMIT. Obviously if the constraint does not hold true at the time of commit then it will fail.
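A minimal PostgreSQL illustration of that (a sketch; the table is hypothetical):
CREATE TABLE t (id integer UNIQUE DEFERRABLE INITIALLY DEFERRED);
BEGIN;
INSERT INTO t VALUES (1);
INSERT INTO t VALUES (1); -- no error yet: the check is deferred
COMMIT; -- fails here with a unique-constraint violation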
Sure.
In a multi-user environment, the COMMIT may fail because of changes by other users (e.g. your COMMIT would violate a referential constraint when applied to the now current database...).
Thomas
If you're using two-phase commit, then no. Everything that could go wrong is done in the prepare phase.
There could still be a network outage, power loss, cosmic rays, etc., during the commit, but even so, the transactions will have been written to permanent storage, and if a commit has been triggered, recovery processes should carry them through.
Hopefully.
Certainly, there could be a number of issues. The act of committing, in and of itself, must make some final, permanent entry to indicate that the transaction committed. If making that entry fails, then the transaction can't commit.
As Ignacio states, there can be deferred constraint checking (this could be any form of constraint, not just a unique constraint, depending on the DBMS engine).
SQL Server Specific: flushing FILESTREAM data can be deferred until commit time. That could fail.
One very simple and often overlooked item: hardware failure. The commit can fail if the underlying server dies. This might be disk, cpu, memory, or even network related.
The transaction could fail if it never receives approval from the master (for any number of reasons).
No matter how wonderfully a system may be designed, there is going to be some possibility that a commit will get into a situation where it's impossible to know whether it succeeded or not. In some cases, it may not matter (e.g. if a hard drive holding the database turns into a pile of slag, it may be impossible to tell whether the commit succeeded before that occurred, but it wouldn't really matter); in other cases, however, this could be a problem. Especially with distributed database systems, if a connection failure occurs at just the right time during a commit, it will be impossible for both sides to be certain of whether the other side is expecting a commit or a rollback.
With MySQL or MariaDB, when used with Galera clustering, COMMIT is when the other nodes in the cluster are checked. So, yes, important errors can be discovered at COMMIT, and you must check for these errors.
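A hedged sketch of how that surfaces (Galera certifies transactions at commit time, so the first committer wins and the loser's COMMIT fails; the table is hypothetical):
-- node A:
BEGIN;
UPDATE t SET v = 1 WHERE id = 1;
COMMIT; -- certifies first across the cluster and wins
-- node B, running concurrently:
BEGIN;
UPDATE t SET v = 2 WHERE id = 1;
COMMIT; -- loses certification and fails (MySQL reports it as deadlock error 1213); the client must retry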
I have some questions about transaction locks in an Oracle database. What I have found out so far is:
Cause: The time to wait on a lock in a distributed transaction has been exceeded. This time is specified in the initialization parameter DISTRIBUTED_LOCK_TIMEOUT.
Action: This situation is treated as a deadlock and the statement was rolled back. To set the time-out interval to a longer interval, adjust the initialization parameter DISTRIBUTED_LOCK_TIMEOUT, then shut down and restart the instance.
Some other things that I want to know in more details are things like:
It is mentioned that a lock in a 'distributed transaction' happened. So what kind of database operation can cause this? Updating a record? Selecting a record?
What does 'distributed' mean anyway? I have seen this term used all over the place, but I can't seem to deduce what it means.
What can we do to reduce instances of such locks?
A distributed transaction means that you had a transaction with two different participants. If you are using PL/SQL, that generally implies that there are multiple databases involved. But it may simply indicate that an application is using an external transaction coordinator in its interactions with the database. A J2EE application, for example, might want to create a distributed transaction that covers both issuing SQL statements against a database to move $100 from account A to account B and the application-server action of creating a JMS message for this transaction that would eventually cause an email notification of the transfer to be sent. In this case, the application wants to ensure that the state of the middle tier matches the state of the back end.
Distributed transactions are not free. They involve potentially quite a bit of additional overhead because, at a minimum, you need to use the two-phase commit protocol to verify that all the components that are part of the distributed transaction are ready to commit and to verify that they all did commit. That involves sending a number of network packets, which can be a significant fraction of the time an OLTP transaction spends waiting. Distributed transactions also cause administrative issues, because you end up with cases where one participant fails after indicating it was ready to commit, or where the transaction coordinator fails while various participants still have open transactions.
So the first question would be whether your application actually needs distributed transactions. Sometimes, developers find that they are accidentally requesting distributed transactions when they really aren't necessary. If you're not sure what a distributed transaction is, it's entirely possible that you don't really need them.
There is a guide here that will walk you through the steps to simulate an "ORA-02049: timeout: distributed transaction waiting for lock" if you want a better understanding of one of its causes.
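The shape of that simulation is roughly this (a hedged sketch; the table and link names are hypothetical):
-- session 1, connected to the remote database: lock a row and hold it
UPDATE accounts SET balance = balance WHERE id = 1;
-- (no COMMIT yet)
-- session 2, on the local database, touching the same row via a database link;
-- this statement joins a distributed transaction and waits on the remote row
-- lock until DISTRIBUTED_LOCK_TIMEOUT (60 seconds by default) expires, then
-- fails with ORA-02049
UPDATE accounts@remote_db SET balance = balance + 100 WHERE id = 1;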
I am trying to understand the ACID properties of database transactions: how they are achieved, which part is atomicity and which part is durability, etc.
Let's say I have a transaction with two actions, A and B. Unfortunately, the system powered off while performing action B. After a system reset, we know the database will preserve (through the rollback journal in SQLite) the state from before action A was performed. So, which ACID property does this show, atomicity or durability?
Another case: suppose that while performing action B, an error happened and was reported to the application, and the application rolled back. I consider this pure atomicity, achieved by the application rather than by the database engine. Am I correct?
Both examples highlight atomicity: either both A and B are committed, or neither.
Durability is a property that comes into the picture only after the transaction is committed. The application can rest assured that if the COMMIT call succeeded, then the transaction is durable. A system reset or a power-off will not revert the effects of a committed transaction; hence its durability.
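A way to see both properties in one place (a generic SQL sketch; the transfer is hypothetical):
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1; -- action A
UPDATE accounts SET balance = balance + 100 WHERE id = 2; -- action B
COMMIT;
-- atomicity: a crash between A and B means recovery (e.g. SQLite's rollback
-- journal) undoes A as well, so neither change survives
-- durability: once COMMIT returns success, a later crash cannot undo the transfer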