What is the guidance for short-circuiting a retry policy on a non-transient error?
Scenario.
Using MassTransit v3 attached to RabbitMQ.
A simple retry policy (try 5 times) is set up in the pipeline.
In the Consume method for the message, a non-recoverable error occurs. Rather than throwing an exception and retrying another 4 times, I'd like to move this message to the error queue.
You can use:
Retry.Except<BadException>().Immediate(5);
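For context, a rough sketch of where such a policy could be plugged in, assuming the MassTransit v3 overload of UseRetry that accepts a retry policy directly (the queue name, message type, and consumer below are illustrative, and BadException stands in for your non-transient exception):

using System;
using System.Threading.Tasks;
using MassTransit;

// Placeholder for whatever non-recoverable exception your consumer throws.
class BadException : Exception { }

class MyMessage
{
    public string Text { get; set; }
}

class MyConsumer : IConsumer<MyMessage>
{
    public Task Consume(ConsumeContext<MyMessage> context)
    {
        // throw new BadException(); // would bypass the retries and let the message go to the error queue
        return Task.CompletedTask;
    }
}

static class BusSetup
{
    public static IBusControl ConfigureBus()
    {
        return Bus.Factory.CreateUsingRabbitMq(cfg =>
        {
            var host = cfg.Host(new Uri("rabbitmq://localhost/"), h => { });

            cfg.ReceiveEndpoint(host, "my_queue", ec =>
            {
                // Retry up to 5 times, except for BadException, which fails immediately.
                ec.UseRetry(Retry.Except<BadException>().Immediate(5));

                ec.Consumer<MyConsumer>();
            });
        });
    }
}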
The solution from Chris did not work for me for some reason. Maybe I was making this call from the wrong context. The only thing that worked was to make two UseRetry calls:
configurator.ReceiveEndpoint(host, _config.QueueName, ec =>
{
    ...
    // Configure retries to immediately fail on permanent consumer faults
    ec.UseRetry(r => r.None().Handle<ConsumerPermanentFaultException>());

    // Configure all other retries as incremental
    ec.UseRetry(r =>
    {
        r.Incremental(_config.RetryCount, _config.RetryInterval, _config.RetryIntervalIncrement)
            .Ignore<ConsumerPermanentFaultException>();
    });

    // Load consumers from the container
    ec.LoadFrom(consumerContainer);
});
I have written a program that requires interaction with multiple queues: the consumer of one queue writes a message to another queue, and the same program has a consumer that takes action on that queue.
Problem: How do I handle network time-out issues with the queue while sending messages asynchronously using the spring-rabbit AMQP library? Alternatively, the RabbitTemplate.send() function should throw an exception if there are network issues.
Currently, I have implemented RabbitTemplate.send(), which returns immediately and is working fine. But if the network is down, send() still returns immediately, doesn't throw any exception, and the client code assumes success. As a result, I end up with an inconsistent state in the DB saying the message was successfully processed. Please note that the call to send() is wrapped inside a transactional block, and the goal is that if writing to the queue fails, the DB commit must also roll back. I am exploring the following solutions, but with no success:
Can we configure rabbitTemplate to throw a run-time exception on any network connectivity issue so that the client call is notified? Please suggest how to do this.
Should we use a synchronous sendAndReceive() call, even though it adds processing delay? Another problem observed with this function: my consumer code gets the notification while sendAndReceive() is still blocked writing the message to the queue. Please advise whether we can delay the notification to the queue until sendAndReceive() has returned. A call to sendAndReceive() did throw an AMQP exception when the network was down, which we were able to capture, but it has an associated performance cost.
My application is multi-threaded. If multiple threads send messages using sendAndReceive(), how does the spring-amqp library manage queue communication? Does it internally create a channel per request? If messages are delivered via the same channel, it would significantly impact performance for a multi-threaded application.
Can someone share sample code for using the sendAndReceive() function with best practices?
Is there any function in the spring-amqp library to check the health of the RabbitMQ server before submitting the send() call? I explored rabbitTemplate.isRunning() but am not getting a proper result. If any specific configuration is required, please suggest it.
Is there any other solution to consider for guaranteed message delivery, or to handle network time-out issues by throwing runtime exceptions to the client?
As per Gary's comment below, I have set rabbitTemplate.setChannelTransacted(true); and it makes the call synchronous. The next part of the problem is that if I have a transaction block on the outer function, the call to RabbitTemplate.send() returns immediately. I expect the transaction block of the outer function to wait for the inner function to return; otherwise, I don't get the expected result, as my DB changes are persisted even though we set setChannelTransacted to true. I tried various transaction propagation levels but had no success. Please advise if I am doing anything wrong, and review the transaction propagation settings below.
@Transactional
public void notifyQueueAndDB(DBRequest dbRequest) {
    logger.info("Updating Request in DB");
    dbService.updateRequest(dbRequest);
    // Below is the call to the RabbitMQ library
    mqService.sendMessage(dbRequest); // If sendMessage fails because of a network outage, I want the DB commit to be rolled back as well.
}
MQService is defined in another library of the project; snippet below.
@Transactional(propagation = Propagation.NESTED)
private void sendMessage(......) {
    try {
        ....
        rabbitTemplate.send(this.queueExchange, queueName, amqpMessage);
    } catch (Exception exception) {
        throw exception;
    }
}
Enable transactions so that the send is synchronous, or use publisher confirms and wait for the confirmation to be received. Either one will be quite a bit slower.
We are having an issue with recovery for messages originating from Sagas.
When a Saga sends a message for processing, the message handler can sometimes fail with an exception. We currently use a try/catch and when an exception is thrown, we "Reply" with a failed message to the Saga. The issue with this approach is that Recoverability retries don't happen since we are handling the error in the message handler.
My thought was to add custom logic to the pipeline: if the command message implements some special interface, the custom logic would send a failed-message response to the Saga when an exception occurs (after the retries are exhausted). But I'm not sure where to plug into the pipeline in a way that would allow me to send messages after retries fail.
Is this a valid approach? If not, how can I get failure messages from the handler back to the Saga after retries are exhausted?
You can use immediate dispatch to not wait for a handler to complete.
However, I would like to suggest an alternate approach. Why not create a Timeout in the saga? If the reply from the processing-handler isn't received within a certain TimeSpan, you take an alternate path. The processing-handler gets 5 minutes and if it doesn't respond within 5 minutes, we do something else. If it still responds after 6 minutes, we know we've already taken the alternate path (use a boolean flag or so and store that inside the saga data) and put aside the reply that arrived too late.
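To make the shape of that concrete, here is a rough sketch of the timeout approach, assuming an NServiceBus 6-style saga API; all type and property names below are illustrative, not from the original question:

using System;
using System.Threading.Tasks;
using NServiceBus;

public class ProcessingSagaData : ContainSagaData
{
    public Guid RequestId { get; set; }
    public bool TookAlternatePath { get; set; }
}

public class StartProcessing : ICommand { public Guid RequestId { get; set; } }
public class DoProcessing : ICommand { public Guid RequestId { get; set; } }
public class ProcessingReply : IMessage { public Guid RequestId { get; set; } }
public class TakeAlternatePath : ICommand { public Guid RequestId { get; set; } }
public class ProcessingTimedOut { }

public class ProcessingSaga : Saga<ProcessingSagaData>,
    IAmStartedByMessages<StartProcessing>,
    IHandleMessages<ProcessingReply>,
    IHandleTimeouts<ProcessingTimedOut>
{
    protected override void ConfigureHowToFindSaga(SagaPropertyMapper<ProcessingSagaData> mapper)
    {
        mapper.ConfigureMapping<StartProcessing>(m => m.RequestId).ToSaga(s => s.RequestId);
        mapper.ConfigureMapping<ProcessingReply>(m => m.RequestId).ToSaga(s => s.RequestId);
    }

    public async Task Handle(StartProcessing message, IMessageHandlerContext context)
    {
        // Send the work to the processing handler and give it 5 minutes to reply.
        await context.Send(new DoProcessing { RequestId = message.RequestId });
        await RequestTimeout<ProcessingTimedOut>(context, TimeSpan.FromMinutes(5));
    }

    public Task Handle(ProcessingReply message, IMessageHandlerContext context)
    {
        if (Data.TookAlternatePath)
        {
            // The reply arrived after the timeout; the alternate path has already run, so put it aside.
            MarkAsComplete();
            return Task.CompletedTask;
        }

        // Reply arrived in time: continue the happy path here, then finish the saga.
        MarkAsComplete();
        return Task.CompletedTask;
    }

    public Task Timeout(ProcessingTimedOut state, IMessageHandlerContext context)
    {
        // No reply within the allowed window: remember that and take the alternate path.
        Data.TookAlternatePath = true;
        return context.Send(new TakeAlternatePath { RequestId = Data.RequestId });
    }
}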
If you want to start a discussion based on this, check our community platform.
I have a process whereby an admin must be alerted and the message automatically retried if some business logic is not met.
Currently, what I do is throw an Exception to force NServiceBus to retry the message.
I have a feeling this is not what I am supposed to do. Is this the proper way of doing it?
public void Handle(ImportantCmd message)
{
    // do some awesome business logic here
    // ...a business rule is not met...

    // send email alert in case of error
    Bus.Publish<SendEmailCmd>(email =>
    {
        email.To = "pooradmin@awesomecompany.com";
        email.Title = "Important title";
        email.Body = "Important message";
    });

    // then force NServiceBus to retry
    throw new Exception("Blah blah...., retrying this message.");
}
Update: I would like an admin to be alerted whenever some condition is not met, and he/she should be able to see all affected messages (perhaps in a dedicated queue?) and possibly retry them.
Basically, our service depends on an external service. This external service occasionally returns an erroneous response (but if we retry, it might work). That is why I am alerting the admin and retrying at the same time.
Given your update (I'm assuming the admin will not alter the message), I would say you can use FLR (First-Level Retries) and SLR (Second-Level Retries) to retry the messages, since the web service you are calling will eventually be able to process your message.
If that fails, the message will end up in the error queue.
You can monitor the error queue by polling ServiceControl using its API (if you use the Platform Installer, it will install ServiceControl alongside NServiceBus), or by subscribing to the MessageFailed event that ServiceControl publishes; there is spike code for this, with more detail on David's blog.
Here is a link about SLR.
Check out David's book.
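As a reference point, here is a minimal sketch of tuning SLR in code rather than app.config, assuming an NServiceBus 5.x-style configuration provider (the numbers are just illustrative defaults):

using System;
using NServiceBus.Config;
using NServiceBus.Config.ConfigurationSource;

// Overrides the SecondLevelRetries settings without touching app.config.
class ProvideSlrConfig : IProvideConfiguration<SecondLevelRetriesConfig>
{
    public SecondLevelRetriesConfig GetConfiguration()
    {
        return new SecondLevelRetriesConfig
        {
            Enabled = true,
            NumberOfRetries = 3,                    // SLR rounds after FLR is exhausted
            TimeIncrease = TimeSpan.FromSeconds(10) // delay grows by 10 seconds per round
        };
    }
}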
The retry mechanism of NServiceBus (driven by throwing an exception) is meant for infrastructure problems (deadlocks, servers unavailable, outright bugs, etc.) that a developer would need to look at. That way transient failures (deadlocks, a web service being down) are taken care of by an automatic retry, and permanent errors (whoops, looks like I divided by zero!) go to an error queue for a developer to figure out and take administrative action on.
Now, if your endpoint is transactional, your code above will not work as expected because everything in the message handler is part of the same transaction. That means that if you throw an exception, your Bus.Publish (or Bus.Send; you can't/shouldn't publish a command) will not actually happen.
Really, I don't understand what sort of business logic would require both an alert and a retry. Can you elaborate? What is it that makes your business logic so non-deterministic based on the incoming message? And can anything be done about that?
But at the end of the day, this business logic sounds like it's part of a business process, which should stay expressed in messages, not in errors and retries. So if a condition means you need to notify someone and do something else, publish a ThingHappened event (a subscriber can send an email) and then have another handler do whatever is necessary to handle that business process. If that means that, in the future, a new command comes through with largely the same data, then so be it.
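Sketched out, that could look roughly like this (NServiceBus 5-style API; ThingHappened, BusinessRuleIsMet, and the alert handler are illustrative names, not an established convention):

using NServiceBus;

// Illustrative message and event types; ImportantCmd and SendEmailCmd mirror the ones in the question.
public class ImportantCmd : ICommand { }

public class SendEmailCmd : ICommand
{
    public string To { get; set; }
    public string Title { get; set; }
    public string Body { get; set; }
}

public class ThingHappened : IEvent
{
    public string Reason { get; set; }
}

public class ImportantCmdHandler : IHandleMessages<ImportantCmd>
{
    public IBus Bus { get; set; }

    public void Handle(ImportantCmd message)
    {
        if (!BusinessRuleIsMet(message))
        {
            // Publish a business event instead of throwing; subscribers decide what happens next.
            Bus.Publish<ThingHappened>(e => e.Reason = "Business rule X was not met");
            return;
        }

        // normal processing ...
    }

    static bool BusinessRuleIsMet(ImportantCmd message)
    {
        return true; // placeholder for the real rule
    }
}

// One subscriber turns the event into an admin alert; another handler can continue the business process.
public class AlertAdminWhenThingHappened : IHandleMessages<ThingHappened>
{
    public IBus Bus { get; set; }

    public void Handle(ThingHappened message)
    {
        Bus.Send<SendEmailCmd>(email =>
        {
            email.To = "pooradmin@awesomecompany.com";
            email.Title = "Business rule not met";
            email.Body = message.Reason;
        });
    }
}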
I have a producer sending durable messages to a RabbitMQ exchange. If the RabbitMQ memory or disk exceeds the watermark threshold, RabbitMQ will block my producer. The documentation says that it stops reading from the socket, and also pauses heartbeats.
What I would like is a way to know in my producer code that I have been blocked. Currently, even with a heartbeat enabled, everything just pauses forever. I'd like to receive some sort of exception so that I know I've been blocked and I can warn the user and/or take some other action, but I can't find any way to do this. I am using both the Java and C# clients and would need this functionality in both. Any advice? Thanks.
Sorry to tell you but with RabbitMQ (at least with 2.8.6) this isn't possible :-(
I had a similar problem, which centred around trying to establish a channel when the connection was blocked. The result was the same as what you're experiencing.
I did some investigation into the core of the RabbitMQ C# .NET library and discovered that the root cause of the problem is that it goes into an infinite blocking state.
You can see more details on the RabbitMQ mailing list here:
http://rabbitmq.1065348.n5.nabble.com/Net-Client-locks-trying-to-create-a-channel-on-a-blocked-connection-td21588.html
One suggestion (which we didn't implement) was to do the work inside of a thread and have some other component manage the timeout and kill the thread if it is exceeded. We just accepted the risk :-(
RabbitMQ uses a blocking RPC call that listens for a reply indefinitely.
If you look at the Java client API, what it does is:
AMQChannel.BlockingRpcContinuation k = new AMQChannel.SimpleBlockingRpcContinuation();
k.getReply(-1);
The -1 passed as the argument makes it block until a reply is received.
The good thing is you could pass in your timeout in order to make it return.
The bad thing is you will have to update the client jars.
If you are OK with doing that, you could pass in a timeout wherever a blocking call like above is made.
The code would look something like:
try {
    return k.getReply(200);
} catch (TimeoutException e) {
    throw new MyCustomRuntimeorTimeoutException("RabbitTimeout ex", e);
}
And in your code you could handle this exception and perform your logic in this event.
Some related classes that might require this fix would be:
com.rabbitmq.client.impl.AMQChannel
com.rabbitmq.client.impl.ChannelN
com.rabbitmq.client.impl.AMQConnection
FYI: I have tried this and it works.
I have an endpoint that has a message handler which does some FTP work.
Since the FTP process can take some time, I encapsulated the FTP method within a TransactionScope with TransactionScopeOption.Suppress to prevent the transaction timeout exceptions.
Doing this got rid of the timeout exceptions; however, the handler was fired 5 times (retries is set to 5 in my config file).
The files were FTP'd OK, but they were FTP'd 5 times.
The handler looks like it is re-fired after 10 or 11 minutes.
Some test code looks as follows:
public void Handle(FtpMessage msg)
{
    using (TransactionScope t = new TransactionScope(TransactionScopeOption.Suppress))
    {
        FtpFile(msg);
    }
}
Any help would be greatly appreciated. Thanks.
If this truly is an FTP communication that cannot be completed within the transaction timeout, another approach would be to turn this into a Saga.
The Saga would be started by FtpMessage, and in the handler it would start the FTP work asynchronously, whether in another thread, or via another process, or whatever, and store enough information in saga data to be able to look up the progress later.
Then it would request a timeout from the TimeoutManager for however long makes sense. Upon receiving that timeout, it would look up the state in saga data, and check on the FTP work in progress. If it's completed, mark the saga as complete, if not, request another timeout.
Alternatively, you could have a process wrapping the FTP communication that hosts its own Bus but does not have any message handlers of its own. It could receive its FTP information via the command line (including requesting endpoint), do its work, and then send a message back to the requesting endpoint saying it is complete. Then you would not have to wait for a timeout to move on with the process.
I'd recommend configuring that endpoint as non-transactional rather than trying to suppress the transaction. Do that by including .IsTransactional(false) in your initialization code if self-hosting, or by implementing IConfigureThisEndpoint, AsA_Client when using the generic host.
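A minimal sketch of both options, assuming the NServiceBus 2.x/3.x-era configuration API that this setup appears to use:

using NServiceBus;

// Option 1: generic host - the AsA_Client profile runs the endpoint non-transactionally.
public class EndpointConfig : IConfigureThisEndpoint, AsA_Client
{
}

// Option 2: self-hosting - turn off transactions on the transport explicitly.
public class SelfHost
{
    public static IBus StartBus()
    {
        return Configure.With()
            .DefaultBuilder()
            .MsmqTransport()
                .IsTransactional(false)
            .UnicastBus()
            .CreateBus()
            .Start();
    }
}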
My guess is that by not completing the inner scope you're causing the outer scope, created by NSB, to roll back. This will cause NSB to retry your FtpMessage.
Try to add: t.Complete(); after your call to FtpFile and see if that does it for you.
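That is, as a sketch of the suggestion above:

public void Handle(FtpMessage msg)
{
    using (TransactionScope t = new TransactionScope(TransactionScopeOption.Suppress))
    {
        FtpFile(msg);
        t.Complete(); // mark the scope complete before it is disposed
    }
}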
Edit: After rereading your question, I realized that this won't solve your timeout issue. Have you tried increasing the timeout? (10 minutes is the default maximum value in machine.config, so you can't set it any higher without modifying machine.config.)