My question is this:
Say I have a Transaction Manager and 2 Resource Managers.
TM tells RMs to prepare.
RMs acknowledge they are prepared/vote yes.
TM tells RMs to commit.
RM 1 commits and acknowledges commit.
RM 2 never gets the commit message because of network failure.
In this scenario I know that RM 2 is sitting in a waiting state, then the session times out in the database and is put into in-doubt state.
If the TM does not reconnect with the RM before the AbandonTimeout is exceeded, then the transaction is abandoned.
My question is, what happens to the global transaction while the TM continues to attempt recovery of the RM?
Does the TM send back an exception to the application when it starts trying the recovery?
Does the TM send back success even though one of the RMs never sent an acknowledgement?
The AbandonTimeout is default of 24hours. Does the TM hold the transaction for 24 hours and then once the timeout is reached, send back an exception?
In this link 2 phase Commit the end of phase two states:
The coordinator sends a commit message to all the cohorts.
Each cohort completes the operation, and releases all the locks and resources held during the transaction.
Each cohort sends an acknowledgment to the coordinator.
The coordinator completes the transaction when all acknowledgments have been received.
So what happens to the global transaction if the acknowledgement of the commit is never received?
I cannot find anything surrounding the resolution of a global transaction during a recovery operation. Any help would be appreciated.
Thanks,
Matt
Only when all the participants return ok, the transaction will be returned to the database as committed. If the TM cannot reconnect it will stay as in doubt, potentially locking database pages (This generally requires manual cleanup).
Depending on timeout settings, the client application can receive errors. Some db systems like oracle allow to simulate different error conditions. The following link describes that http://docs.oracle.com/cd/B28359_01/server.111/b28310/ds_txnman009.htm#ADMIN12285
Related
There's one thing I don't get when reading redis transaction docs:
All the commands in a transaction are serialized and executed
sequentially. It can never happen that a request issued by another
client is served in the middle of the execution of a Redis
transaction. This guarantees that the commands are executed as a
single isolated operation.
From: https://redis.io/topics/transactions
The fragment in bold is the one that bugs me. Does it mean that when setting value of the key A in one request, another request that wants to set a value for the key B will be blocked until the first one finishes?
No, opening a transaction with MULTI does not block other concurrent connections to redis.
https://redis.io/topics/transactions#usage
A Redis transaction is entered using the MULTI command. The command always replies with OK. At this point the user can issue multiple commands. Instead of executing these commands, Redis will queue them. All the commands are executed once EXEC is called.
Your commands inside of a transaction are collected until you close the transaction. Afterwards the commends will be executed all at one without interruption. This means other connections can still execute their commands before commands of the still open transaction if they are submitted afterwards, while they are still submitted before closing the transaction.
Scenario:
We have a wcf workflow with a client that does NOT use transactionflow.
The workflow contains several sequential TransactedReceiveScopes (using content-based correlation).
The TransactedReceiveScopes contain custom db operations.
Observations:
When we run SQL profiler against the first call, we see all the custom db calls, and the SaveInstance call in the profile trace.
We've noticed that, even though the SendReply is at the very end of TransactedReceiveScope, sometimes the sendreply occurs a good 10 seconds before the transaction gets committed.
We tried changing the TimeToPersist and TimeToUnload to zero, but that had no effect. (The trace shows the SaveInstance happening immediately anyway, but rather the commit seems to be delayed).
Questions:
Are our observations correct?
At what point is the transaction committed? Is this like garbage collection - i.e. it commits some time later when it's not busy?
Is there any way to control the commit delay, or is the only way to do this to use transactionflow from the client (anc then it should all commit when the client commits, including the persist).
The TransactedReceiveScope commits the transaction when the body is completed but as all execution is done through the scheduler that could be some time later. It is not related to garbage collection and there is no real way to influence it other that to avoid a busy machine and a lot of other parallel activities that could also be in the execution queue.
I have application code that inserts a record in a local database and a record in a remote database (via Oracle Database Link). When I commit this distributed transaction is it guaranteed that both the local and remote databases will both commit or both rollback, or is there a chance that the remote one could commit but the local commit would fail (or vice versa)?
I'd be astonished if Oracle does not use the equivalent of the Two-Phase Commit (2PC) protocol, which ensures that either both commit or both rollback.
With 2PC, there is a stage called the pre-commit phase where the master (coordinator) instance records its own decision and tells all participants to get ready to commit (and report their status - must fail, or can commit). The participants also get ready to commit, and (if they can commit) await further instructions from the coordinator after telling the coordinator they are ready to commit. When all the participants have responded, the coordinator records the final decision, and sends that decision to the participants, and acts on its decision. If the master fails after recording the decision but before successfully sending the decision to the participants, the participants can be hung up in a state where they can neither commit nor rollback. There are ways to recover from that. If the coordinator stays down long enough (for example, is taken out of service as a result of catastrophic h/w failure) you can end up with problems; the participants end up doing a heuristic rollback (presumed rollback), typically - but this requires astonishingly bad luck to cause any trouble.
There are alternatives to 2PC; the net result is the same - all commit or all rollback.
Does NServiceBus automatically attempt to redeliver messages if handling fails? And if it does, is there a limit in the number of times delivery can be attempted?
NSB will enlist in a distributed transaction and if it fails it will retry the configured number of times. Look at the MsmqTransport config section.
EDIT: A distributed transaction begins as soon as you peek or receive a message from MSMQ. All of the work you do in a message handler will be included in the transaction and it is governed by the Distributed Transaction Coordinator. The DTC will also include things like DB transactions if you are updating DBs and so on.
If say a DB update fails, the whole thing rolls back and the message is put back on the queue.
I have some question around transaction lock in oracle database. What I have found out so far is that:
Cause: The time to wait on a lock in a distributed transaction has been exceeded. This time is specified in the initialization parameter DISTRIBUTED_LOCK_TIMEOUT.
Action: This situation is treated as a deadlock and the statement was rolled back. To set the time-out interval to a longer interval, adjust the initialization parameter DISTRIBUTED_LOCK_TIMEOUT, then shut down and restart the instance.
Some other things that I want to know in more details are things like:
It is mentioned that a lock in 'distributed transaction' happened. So what kind of database operation that can cause this ? Updating a record ? Selecting a record ?
What does 'Distributed' means anyway. I have seen this term coined all over the place, but I can't seem to deduce what it means.
What can we do to reduce instances of such lock ?
A distributed transaction means that you had a transaction that had two different participants. If you are using PL/SQL, that generally implies that there are multiple databases involved. But it may simply indicate that an application is using an external transaction coordinator in its interactions with the database. A J2EE application, for example, might want to create a distributed transaction that covers both issuing SQL statements against a database to move $100 from account A to account B as well as the application server action of creating a JMS message for this transaction that would eventually cause an email notification of the transfer to be sent. In this case, the application wants to ensure that the state of the middle tier matches the state of the back end.
Distributed transactions are not free. They involve potentially quite a bit of additional overhead because, at a minimum, you need to use the two-phase commit protocol to verify that all the components that are part of the distributed transaction are ready to commit and to verify that they all did commit. That involves sending a number of network packets which can be a significant fraction of the time an OLTP transaction is waiting. Distributed transactions also cause administrative issues because you end up with cases where one participant's transaction fails after it indicated it was ready to commit or a transaction coordinator failing while various participants have open transactions.
So the first question would be whether your application actually needs distributed transactions. Sometimes, developers find that they are accidentally requesting distributed transactions when they really aren't necessary. If you're not sure what a distributed transaction is, it's entirely possible that you don't really need them.
There is a guide here that will walk you through the steps to simulate an ORA-02049: timeout: distributed transaction waiting for lock if you want a better understanding of one of its causes: