Two-phase commit: what happens if the coordinator dies between sending two commit requests?

I am trying to understand how the two-phase commit protocol works and I hit an issue that is unclear to me.
Let's say the coordinator sent two requests to prepare for commit and both participants acknowledged. Now it starts sending commit requests, but between the first and the second request the coordinator fails. This means the first participant will commit while the second one won't. Doesn't this leave the distributed system in an inconsistent state? How is such a thing solved?

From https://www.cs.rutgers.edu/~pxk/417/notes/content/transactions.html:
Two-phase commit is not fault tolerant because it uses a single coordinator whose failure can cause the protocol to block.
In other words, the protocol trades availability for consistency: the coordinator durably logs its commit decision before sending any commit message, so when it recovers it re-sends the decision to the second participant. Until then, that participant (having voted yes) must block. The system is never left inconsistent; part of it is just stuck waiting for the coordinator to come back.
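To make the recovery step concrete, here is a minimal sketch of a coordinator that logs its decision before sending COMMIT and re-sends it on restart. All class and method names are illustrative, not from any real library:

import java.util.List;

class Coordinator {
    private final DecisionLog log;           // durable write-ahead log (assumed)
    private final List<Participant> participants;

    Coordinator(DecisionLog log, List<Participant> participants) {
        this.log = log;
        this.participants = participants;
    }

    void commitPhase(long txId) {
        log.record(txId, "COMMIT");          // decision is durable BEFORE any send
        for (Participant p : participants) { // a crash anywhere in this loop is safe:
            p.sendCommit(txId);              // recovery below re-sends the decision
        }
        log.record(txId, "DONE");
    }

    // On restart, finish any transaction whose decision was logged but not completed.
    void recover() {
        for (long txId : log.decidedButNotDone()) {
            for (Participant p : participants) {
                p.sendCommit(txId);          // participants treat duplicates as no-ops
            }
            log.record(txId, "DONE");
        }
    }
}

interface DecisionLog {
    void record(long txId, String state);
    List<Long> decidedButNotDone();
}

interface Participant {
    void sendCommit(long txId);              // must be idempotent
}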

Related

RabbitMQ and Delivery Guarantees in a Distributed Database Transaction

I am trying to understand the right pattern for dealing with RabbitMQ deliveries in the context of a distributed database transaction.
To make this simple, I will illustrate my ideas in pseudocode, but I'm in fact using Spring AMQP to implement these ideas.
Something like:
void foo(message) {
    processMessageInDatabaseTransaction(message);
    sendMessageToRabbitMQ(message);
}
where, by the time we reach sendMessageToRabbitMQ(), processMessageInDatabaseTransaction() has successfully committed its changes to the database, or an exception has been thrown before reaching the message-sending code.
I know that for sendMessageToRabbitMQ() I can use Rabbit transactions or publisher confirms to guarantee that Rabbit got my message.
My interest is in understanding what should happen when things go south, i.e. when the database transaction succeeded but the confirmation does not arrive within a certain amount of time (with publisher confirms), or the Rabbit transaction fails to commit (with Rabbit transactions).
Once that happens, what is the right pattern to guarantee delivery of my message?
Of course, having developed idempotent consumers, I have considered that I could retry sending the message until Rabbit confirms success:
void foo(message) {
    processMessageInDatabaseTransaction(message);
    retryUntilSuccessful {
        sendMessageToRabbitMQ(message);
    }
}
But this pattern has a couple of drawbacks I dislike. First, if the failure is prolonged, my threads will start to block here and my system will eventually become unresponsive. Second, what happens if my system crashes or shuts down? These messages will never be delivered, since they will be lost.
So, I thought, well, I will have to write my messages to the database first, in pending status, and then publish my pending messages from there:
void foo(message) {
    // transaction commits, leaving the message in pending status
    processMessageInDatabaseTransaction(message);
}

#Poller(every = "10 seconds")
void bar() {
    for (message in readPendingMessagesFromDbStore()) {
        sendPendingMessageToRabbitMQ(message);
        if (confirmed) {
            acknowledgeMessageInDatabase(message);
        }
    }
}
This possibly sends a message multiple times if I fail to acknowledge it in my database.
But now I have introduced other problems:
The need to do I/O against the database to publish a message that 99% of the time would have been published successfully right away, without ever touching the database.
The difficulty of getting the poller close to real-time delivery, since the polling interval now adds latency to the publication of the messages.
And perhaps other complications, like guaranteeing delivery of events in order, poller executions stepping on one another, multiple pollers, etc.
And then I thought, well, I could make this a bit more complicated: publish from the database until I catch up with the live stream of events, and then switch to publishing in real time. That is, maintain a circular buffer of size b and, as I read pages from the database, check whether each message is already in the buffer; if it is, switch over to the live subscription.
At this point I realized that how to do this right is not exactly evident, and I concluded that I need to learn the right patterns for solving this problem.
So, does anyone have suggestions on the right way to do this correctly?
While RabbitMQ cannot participate in a truly global (XA) transaction, you can use Spring transaction management to synchronize the database transaction with the Rabbit transaction, such that if either update fails, both transactions will be rolled back. There is a (very) small timing hole where one might commit but not the other, so you do need to deal with that possibility.
See Dave Syer's JavaWorld article for more details.
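For illustration, a minimal sketch of that synchronization with Spring AMQP, assuming an illustrative exchange, routing key and repository abstraction; setChannelTransacted(true) makes the template's channel transaction join the surrounding Spring transaction:

import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.transaction.annotation.Transactional;

public class CommentService {

    private final RabbitTemplate rabbitTemplate;
    private final CommentRepository repository;

    public CommentService(RabbitTemplate rabbitTemplate, CommentRepository repository) {
        // the channel transaction now commits together with the DB transaction
        rabbitTemplate.setChannelTransacted(true);
        this.rabbitTemplate = rabbitTemplate;
        this.repository = repository;
    }

    @Transactional
    public void process(String payload) {
        repository.save(payload);
        rabbitTemplate.convertAndSend("comments-exchange", "comments", payload);
        // if the DB commit fails, the channel transaction rolls back too;
        // the small timing hole sits between the two commits
    }

    // illustrative repository abstraction, not a real Spring Data interface
    public interface CommentRepository {
        void save(String payload);
    }
}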
When Rabbit fails to receive a message (for whatever reason, though in my experience only because the service is down or unavailable), you should be in a position to catch an error. At that point you can make a record of that failed attempt, and of any subsequent ones, in order to retry when Rabbit becomes available again. The quickest way of doing this is just logging the message details to a file and iterating over it to re-send when appropriate.
As long as you have that file, you've not lost your messages.
Once messages are inside Rabbit, and you have faith in the rest of the architecture, it should be safe to assume that messages will end up where they are supposed to be, and that no further persistence work needs doing at your end.
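A minimal sketch of that file-based retry log, with an illustrative file path and sender interface:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class FailedMessageLog {

    private final Path logFile = Paths.get("failed-messages.log"); // illustrative path

    // Called when the publish to Rabbit throws: persist the payload locally.
    public void record(String payload) throws IOException {
        Files.write(logFile, List.of(payload), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Called periodically once Rabbit is reachable again: re-send and truncate.
    public void replay(MessageSender sender) throws IOException {
        if (!Files.exists(logFile)) return;
        for (String payload : Files.readAllLines(logFile, StandardCharsets.UTF_8)) {
            sender.send(payload);            // a line may be replayed twice,
        }                                    // so consumers must be idempotent
        Files.delete(logFile);
    }

    public interface MessageSender {
        void send(String payload);
    }
}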

Managing a lock on a message in RabbitMQ

I'm trying to use RabbitMQ in a somewhat unconventional way (though at this point I could pick any other message queue implementation if needed).
I have one queue (I can have more if needed) from which customers fetch N messages asynchronously. After they do their work, I send the results from the client to the DB.
I have two problems: first, I don't want two customers to work on the same message; second, I want to guarantee that I won't lose messages in case a customer closes the browser or just stops working.
I looked at the documentation and saw TTL, which would be perfect for me if I could arrange for a message that times out to be moved to another queue instead of being deleted. I can't find a way to do this.
Moreover, I looked at the confirmation option, which at first glance looked like what I wanted. That mechanism works like this: when the consumer gets a message, it sends a confirmation to the queue. I thought I could delay this confirmation and send it when the work is done on the client side.
My problem was that I couldn't program the queue so that any message that doesn't get confirmed is returned to the queue (or to another one).
I also found out how to schedule a message, but that didn't help either, because I don't want the message to be inserted into the queue in five minutes; I want that when a customer receives a message, it is locked in the queue for 5 minutes until a confirm-to-delete arrives, and otherwise returned to the queue.
Can I create a temporary queue that enables this mechanism?
If someone can help with one of the problems, or suggest another architecture, or a way to do this in another MQ, that would be great.
Resources:
Confirmations: http://www.rabbitmq.com/blog/2011/02/10/introducing-publisher-confirms/
A post about locks (though that problem involved a batcher component): Locks and batch fetch messages with RabbitMq
TTL: https://www.rabbitmq.com/ttl.html
Scheduling a message: https://www.rabbitmq.com/blog/2015/04/16/scheduling-messages-with-rabbitmq/
My problem was that I couldn't program the queue so that any message that doesn't get confirmed is returned to the queue (or to another one).
RabbitMQ does this anyway, so all you have to do is switch off the auto-ack flag; you already figured this out.
I thought I could delay this confirmation and send it when the work is done on the client side.
So just send the ack once you've finished processing the message.
All unacknowledged messages remain in the queue and are re-delivered to the next consumer (or to the same one when it comes back up, depending on your setup).
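A minimal sketch with the RabbitMQ Java client, assuming an illustrative queue name; the key is passing autoAck=false and acking only after the work completes:

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DefaultConsumer;
import com.rabbitmq.client.Envelope;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ManualAckConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        Connection connection = factory.newConnection();
        final Channel channel = connection.createChannel();

        boolean autoAck = false;  // do NOT ack on delivery
        channel.basicConsume("work-queue", autoAck, new DefaultConsumer(channel) {
            @Override
            public void handleDelivery(String consumerTag, Envelope envelope,
                    AMQP.BasicProperties properties, byte[] body) throws IOException {
                doWork(new String(body, StandardCharsets.UTF_8)); // the long client-side work
                // ack only after the work is done; if this consumer dies before
                // this line, the broker re-queues the message for another consumer
                channel.basicAck(envelope.getDeliveryTag(), false);
            }
        });
    }

    private static void doWork(String message) { /* process the message */ }
}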

NServiceBus handler with a custom SqlConnection

I have an NServiceBus handler that creates a new SQL connection and a new SQL command.
However, the command that is executed is not committed to the database until after the whole process has finished.
It's as if there were a hidden SQL transaction in the handler itself.
I moved my code into a custom console application without NServiceBus, and there the SQL command executed and saved immediately, unlike in NServiceBus, where nothing is saved until the end of the handler.
Indeed, every handler is wrapped in a transaction; the default transaction guarantee relies on the DTC. That is intentional :)
If you disable it, you might get duplicate messages or lose some data, so it must be done carefully. You can disable transactions using the endpoint configuration API instead of using options in the connection string.
You can find more information about the configuration and the available guarantees at http://docs.particular.net/nservicebus/transports/transactions.
Unit of work
Messages should be processed as a single unit of work. Either everything succeeds or fails.
If you want multiple units of work to be executed, then either:
create multiple endpoints, or
send multiple messages.
This also has the benefit that these can potentially be processed in parallel.
Please note that creating multiple handlers will NOT have this effect. All handlers on the same endpoint are part of the same unit of work, and thus of the same transaction.
Immediate dispatch
If you really want to send a specific message whose dispatch must not be part of the unit of work, you can send it immediately, like this:
using (new TransactionScope(TransactionScopeOption.Suppress))
{
    var myMessage = new MyMessage();
    bus.Send(myMessage);
}
This is valid for V5; for other versions it's best to look at the documentation:
http://docs.particular.net/nservicebus/messaging/send-a-message#dispatching-a-message-immediately
Enlist=false
This is a workaround that MUST NOT be used to circumvent a specific transactional configuration, as is explained very well by Tomasz.
It can result in data corruption, because the same message can be processed multiple times during error recovery, in which case the same database action will be performed again.
Found the solution: in my connection string I had to add Enlist=False.
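For illustration, a SQL Server connection string with the flag appended (server and database names are placeholders):

Server=myServer;Database=myDatabase;Integrated Security=true;Enlist=false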
As mentioned by @wlabaj, setting Enlist=False will indeed make sure that a transaction opened in the handler is different from the transaction used by the transport to receive/send messages.
It is, however, important to note that this changes the message-processing semantics. By default, when the DTC is used, receive/send operations and any transactional operations inside a handler are committed or rolled back atomically. With Enlist=False this is no longer the case, so more than one handler transaction may be committed for the same message. Consider the following scenario as a sample case where this can happen:
the message is received (the transport transaction is started),
the message is successfully processed inside the handler (the handler transaction commits successfully),
the transport transaction fails and the message is moved back to the input queue,
the message is received a second time,
the message is successfully processed inside the handler,
...
The behavior with the Enlist=False setting might well be desirable in your case. That being said, I think it's worth clarifying what the consequences are in terms of message-processing semantics.

What happens if a publisher terminates before receiving the ack?

I want to ensure that certain kinds of messages can't be lost, hence I should use confirms (aka publisher acknowledgements).
The broker loses persistent messages if it crashes before said messages are written to disk. Under certain conditions, this causes the broker to behave in surprising ways.
For instance, consider this scenario:
a client publishes a persistent message to a durable queue,
a client consumes the message from the queue (noting that the message is persistent and the queue durable), but doesn't yet ack it,
the broker dies and is restarted, and
the client reconnects and starts consuming messages.
At this point, the client could reasonably assume that the message will be delivered again. This is not the case: the restart has caused the broker to lose the message. In order to guarantee persistence, a client should use confirms.
But what if, when using confirms, the publisher goes down before receiving the ack, and the message was never delivered to the queue for some reason (e.g. a network failure)?
Suppose we have a simple REST endpoint where we can POST new COMMENTs, and when a new COMMENT is created we want to publish a message to a queue. (Note: it doesn't matter if I send a message for a new COMMENT that in the end isn't created, due to a rollback for example.)
CommentEndpoint {
    Channel channel;

    post(String comment) {
        channel.publish("comments-queue", comment) // "comments-queue" is a durable queue
        Comment aNewComment = new Comment(comment)
        repository.save(aNewComment)
        // what happens if the server where this publisher is running terminates here?
        channel.waitConfirmations()
    }
}
When the server restarts, the channel is gone and the message may never be delivered.
One solution that comes to mind is that after a restart, I query the recent comments in the repository (say, the comments created during the last 3 minutes before the crash), send one message for each, and await confirmations.
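A minimal sketch of that recovery idea, with the repository, the channel API and the 3-minute window as illustrative assumptions:

import java.time.Duration;
import java.time.Instant;
import java.util.List;

class StartupRecovery {
    private final CommentRepository repository;  // assumed data-access type
    private final Channel channel;               // assumed publishing channel

    StartupRecovery(CommentRepository repository, Channel channel) {
        this.repository = repository;
        this.channel = channel;
    }

    // Run once after a restart, before accepting new requests.
    void republishRecent() {
        Instant cutoff = Instant.now().minus(Duration.ofMinutes(3));
        List<String> recent = repository.commentsCreatedAfter(cutoff);
        for (String comment : recent) {
            channel.publish("comments-queue", comment); // duplicates are possible,
        }                                               // so consumers must be idempotent
        channel.waitConfirmations();
    }

    interface CommentRepository { List<String> commentsCreatedAfter(Instant cutoff); }
    interface Channel {
        void publish(String queue, String payload);
        void waitConfirmations();
    }
}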
What you are worried about is really no longer a RabbitMQ-only issue; it is a distributed transaction issue. This discussion gives one reasonable lightweight solution, and there are stricter ones, for instance two-phase commit, three-phase commit, etc., to ensure data consistency when that is really necessary.

How can I know whether a RabbitMQ ack succeeded?

When I set up manual acks with RabbitMQ, how can I know whether the ack was successful? If an exception is thrown before basic.ack while I am performing long operations, the message will be sent to another consumer. How can I avoid that?
How can i avoid that?
You can't.
At some point it will happen, and your code needs to deal with this scenario gracefully. This is typically done with idempotence in your message processing.
That is, you allow the message to be processed more than once (because that will happen), but you only make the underlying change to the system once.
A common and simple way of handling this is to have an ID associated with each message. Before processing the message, check whether that ID is marked as complete in your database. If it's not, process the message; once the message is processed, update the database with that ID. That way, when (not if) you run into the scenario where a message is delivered twice, you won't actually do the processing and system changes twice.
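A minimal sketch of that ID check with plain JDBC; the table name and helper method are illustrative, and in production the insert and the business change should share one database transaction:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

public class IdempotentProcessor {
    private final Connection db;   // assumed open JDBC connection

    public IdempotentProcessor(Connection db) {
        this.db = db;
    }

    // Returns true if the message was processed now, false if it was a duplicate.
    public boolean process(String messageId, String payload) throws SQLException {
        // The INSERT acts as "check and mark" in one atomic step: a duplicate
        // messageId violates the primary key and the insert fails.
        try (PreparedStatement stmt = db.prepareStatement(
                "INSERT INTO processed_messages (message_id) VALUES (?)")) {
            stmt.setString(1, messageId);
            stmt.executeUpdate();
        } catch (SQLIntegrityConstraintViolationException duplicate) {
            return false;          // already processed: skip the side effects
        }
        applyChanges(payload);     // the actual system change runs at most once
        return true;
    }

    private void applyChanges(String payload) { /* business logic */ }
}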