How to detect alarm-based blocking RabbitMQ producer? - rabbitmq

I have a producer sending durable messages to a RabbitMQ exchange. If the RabbitMQ memory or disk exceeds the watermark threshold, RabbitMQ will block my producer. The documentation says that it stops reading from the socket, and also pauses heartbeats.
What I would like is a way to know in my producer code that I have been blocked. Currently, even with a heartbeat enabled, everything just pauses forever. I'd like to receive some sort of exception so that I know I've been blocked and I can warn the user and/or take some other action, but I can't find any way to do this. I am using both the Java and C# clients and would need this functionality in both. Any advice? Thanks.

Sorry to tell you but with RabbitMQ (at least with 2.8.6) this isn't possible :-(
I had a similar problem, which centred around trying to establish a channel when the connection was blocked. The result was the same as what you're experiencing.
I did some investigation into the actual core of the RabbitMQ C# .Net Library and discovered the root cause of the problem is that it goes into an infinite blocking state.
You can see more details on the RabbitMQ mailing list here:
http://rabbitmq.1065348.n5.nabble.com/Net-Client-locks-trying-to-create-a-channel-on-a-blocked-connection-td21588.html
One suggestion (which we didn't implement) was to do the work inside of a thread and have some other component manage the timeout and kill the thread if it is exceeded. We just accepted the risk :-(
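The "do the work in a thread and let something else manage the timeout" suggestion could look roughly like the sketch below. It uses only the JDK; `TimedCall`, `callWithTimeout` and `BlockedTimeoutException` are hypothetical names, and the `Callable` stands in for whatever client call might block (creating a channel, publishing). Note the assumption that interrupting the worker is enough: a thread stuck in a socket-level wait may ignore the interrupt, so the stuck thread can linger even though the caller gets its exception.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimedCall {
    /** Hypothetical unchecked exception signalling "probably blocked". */
    static final class BlockedTimeoutException extends RuntimeException {
        BlockedTimeoutException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Runs a possibly blocking call on a worker thread and gives up after timeoutMs. */
    static <T> T callWithTimeout(Callable<T> call, long timeoutMs) {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "timed-call");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<T> future = worker.submit(call);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; it may not react if blocked on I/O
            throw new BlockedTimeoutException("no reply within " + timeoutMs + " ms", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("call failed", e.getCause());
        } finally {
            worker.shutdownNow();
        }
    }
}
```

The caller would wrap the risky operation, catch `BlockedTimeoutException`, and warn the user or retry later.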

The RabbitMQ client uses a blocking RPC call that waits for a reply indefinitely.
If you look at the Java client API, what it does is:
AMQChannel.BlockingRpcContinuation k = new AMQChannel.SimpleBlockingRpcContinuation();
k.getReply(-1);
Now -1 passed in the argument blocks until a reply is received.
The good thing is you could pass in your timeout in order to make it return.
The bad thing is you will have to update the client jars.
If you are OK with doing that, you could pass in a timeout wherever a blocking call like above is made.
The code would look something like:
try {
return k.getReply(200);
} catch (TimeoutException e) {
throw new MyCustomRuntimeorTimeoutException("RabbitTimeout ex",e);
}
And in your code you could handle this exception and perform your logic in this event.
Some related classes that might require this fix would be:
com.rabbitmq.client.impl.AMQChannel
com.rabbitmq.client.impl.ChannelN
com.rabbitmq.client.impl.AMQConnection
FYI: I have tried this and it works.
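To see the difference between the indefinite wait and the timed wait in isolation, here is a minimal JDK-only model of a blocking continuation. `TimedReply` is a hypothetical stand-in, not the client's real `BlockingRpcContinuation`; it only mirrors the convention described above, where a negative timeout waits forever and a positive one throws `TimeoutException`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical model of a blocking RPC continuation; not the client library's class. */
final class TimedReply<T> {
    private final CompletableFuture<T> reply = new CompletableFuture<>();

    /** Called by the (simulated) reader thread when the broker answers. */
    void set(T value) {
        reply.complete(value);
    }

    /** Waits up to timeoutMs for the reply; a negative timeout waits forever. */
    T getReply(long timeoutMs) throws TimeoutException {
        try {
            return timeoutMs < 0
                    ? reply.get()
                    : reply.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for reply", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

With a blocked connection the reader thread never calls `set`, so `getReply(-1)` never returns, while `getReply(200)` fails after 200 ms and gives the caller a chance to react.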

Related

RabbitMQ + kombu - A long callback blocks the heartbeat leading to aborting the connection

We have been trying to use RabbitMQ to transfer data from Project A to Project B.
We created a producer who takes the data from Project A and puts it in a queue, and that was relatively easy. Then, create a k8s pod for Project B, which listens to the appropriate queue with the ConsumerMixin of kombu.
Overall, the integration was reasonable and straightforward. But when we started to process long messages, we noticed that they were coming back into the queue repeatedly.
After research, we found out that whenever the processing of the message takes more than 20 seconds, the message showed up in the queue again, even though the processing was successful.
The source of this issue lies with the RabbitMQ heartbeat. We set the heartbeat to 10 seconds, and RabbitMQ checks the connection twice before it kills it. However, because the callback takes more than 20 seconds, and the .ack() (acknowledge) of the message happens at the end of the callback (to ensure it was successful), the heartbeat is blocked while the message is being processed (as described here: https://github.com/celery/kombu/issues/621#issuecomment-251836611).
We have been trying to find a workaround with Threading, to process the message on a different thread and avoid the block of the heartbeat, but it didn't work. Also, it feels like we were trying to hack things and not solve the problem.
So my question here is if there is a proper workaround to handle this situation, or what alternatives do we have? RabbitMQ seemed like the right choice since we use it in standalone projects with Celery, and it is also recommended on the internet.
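For reference, the general shape of the threading idea is sketched below in plain Java, as a language-neutral illustration rather than kombu code. The long callback runs on a worker thread while the thread that owns the connection keeps servicing heartbeats and acknowledges only once the work is done. `HeartbeatWhileWorking`, `runWithHeartbeats` and the `heartbeat` callback are hypothetical; whether the equivalent is safe with kombu depends on its threading rules, which this sketch does not verify.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class HeartbeatWhileWorking {
    /**
     * Runs job on a worker thread and calls heartbeat every intervalMs on the
     * calling thread until the job finishes. Returns the number of heartbeats sent.
     */
    static int runWithHeartbeats(Runnable job, Runnable heartbeat, long intervalMs) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<?> done = worker.submit(job);
        int beats = 0;
        try {
            while (true) {
                try {
                    done.get(intervalMs, TimeUnit.MILLISECONDS);
                    return beats; // job finished: the caller can now ack on its own thread
                } catch (TimeoutException stillRunning) {
                    heartbeat.run(); // stand-in for the client's heartbeat call
                    beats++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the job", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("job failed", e.getCause());
        } finally {
            worker.shutdown();
        }
    }
}
```

The point of the design is that the ack and the heartbeats both stay on the connection's thread; only the slow processing moves elsewhere.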

Instruct RabbitMQ to resend undelivered messages periodically

Background
We're using langohr to interact with RabbitMQ. We've tried two different approaches to let RabbitMQ resend messages that have not yet been properly handled by our service. One way that works is to send a basic.nack with requeue set to true, but this resends the message immediately, over and over, until the service responds with a basic.ack. This is a bit problematic if the service, for example, tries to persist the message to a datastore that is currently down (and stays down for a while). It would be better for us to just fetch the undelivered messages every 20 seconds or so (i.e. we do neither a basic.ack nor a basic.nack if the datastore is down; we just let the messages be retained in the queue). We've tried to implement this using an ExecutorService whose gist is implemented like this:
(let [chan (lch/open conn)] ; We create a new channel since channels in Langohr are not thread-safe
(log/info "Triggering \"recover\" for channel" chan)
(try
(lb/recover chan)
(catch Exception e (log/error "Failed to call recover" e))
(finally (lch/close chan))))
Unfortunately this doesn't seem to work (the messages are not redelivered and just remain in the queue). If we restart the service the queued messages are consumed correctly. However, we have other services that are implemented using spring-rabbitmq (in Java) and they seem to take care of this out of the box. I've tried looking in the source code to figure out how they do it but I haven't managed to do so yet.
Question
How do you instruct RabbitMQ to (re-)deliver messages in the queue periodically (preferably using Langohr)?
I am not sure what you are doing with your Spring AMQP apps, but there's nothing built into RabbitMQ for this.
However, it's pretty easy to set up dead-lettering using a TTL to requeue back to the original queue after some period of time. See this answer for examples, links etc.
EDIT
However, Spring AMQP does have a retry interceptor which can be configured to suspend the consumer thread for some period(s) during retry.
Stateful retry rejects and requeues; stateless retry handles the retries internally and has no interaction with the broker during retries.
See this answer which has instructions: we Nack the message, the nack puts the message into a holding queue for N seconds, then it TTLs out of that queue and into another queue that puts it back in the original queue.
It took a little bit of work to setup, but it works great!
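A minimal sketch of the queue arguments behind such a TTL plus dead-letter setup, simplified to one work queue and one holding queue. The argument keys (`x-message-ttl`, `x-dead-letter-exchange`, `x-dead-letter-routing-key`) are RabbitMQ's; the class, method, exchange and routing-key names are hypothetical. The maps would be passed as the `arguments` parameter when declaring each queue.

```java
import java.util.HashMap;
import java.util.Map;

final class RetryQueueArgs {
    /**
     * Holding queue: messages wait delayMs, then expire and are dead-lettered
     * back to the exchange that feeds the original work queue.
     */
    static Map<String, Object> holdingQueueArgs(int delayMs, String workExchange,
                                                String workRoutingKey) {
        Map<String, Object> args = new HashMap<>();
        args.put("x-message-ttl", delayMs);
        args.put("x-dead-letter-exchange", workExchange);
        args.put("x-dead-letter-routing-key", workRoutingKey);
        return args;
    }

    /**
     * Work queue: messages rejected or nacked with requeue=false are
     * dead-lettered to the exchange that feeds the holding queue.
     */
    static Map<String, Object> workQueueArgs(String holdingExchange) {
        Map<String, Object> args = new HashMap<>();
        args.put("x-dead-letter-exchange", holdingExchange);
        return args;
    }
}
```

With a 20000 ms TTL on the holding queue, a message nacked without requeue comes back to the work queue about 20 seconds later instead of immediately.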

Messages being lost on consumer falling over

This seems like a pretty basic question, but I seem to be losing messages when the consumer falls over before acknowledging them. I have set up the broker with an exchange audit:exchange and a queue bound to it, audit:queue. Both are durable, and as expected, if I send messages when no consumer is active they sit on the queue and get processed by the consumer when it starts up. However, if I put a breakpoint in the consumer and kill the process halfway through, the message is not requeued; it just seems to get lost. The consumer is set up using the annotation
@RabbitListener(queues="audit:queue")
public void process(Message message) {
routeMessage(message); // stop here and kill process - message removed from q
}
I can't reproduce your issue.
With the breakpoint triggered, I see the message still in the queue (unacked=1) on the rabbit console.
When the process is killed; the message goes back to ready.
Have you configured the listener container factory to use AcknowledgeMode.NONE?
That will exhibit the behavior you describe.
The default is AUTO which means the message will only be acknowledged when the listener returns successfully.
If you still think there's an issue; please supply the complete test case.
Sorry, this was my bad (I just wasted a few hours .. sigh). I was killing the app from within my IDE, which probably detaches and then kills the process, allowing time for it to proceed just enough that it actually does send the ack. When I just killed the process from a terminal it worked exactly as expected. Particular apologies to you, Gary, for wasting your time as well.
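The ready/unacked behaviour discussed in this thread can be modelled in a few lines. This is a toy in-memory model, not broker or Spring AMQP code; `AckModel` and its methods are hypothetical. Passing `true` to the constructor models a consumer with no acknowledgements (the NONE mode mentioned above), where the broker forgets the message as soon as it is delivered.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of one queue's "ready" and "unacked" message sets. */
final class AckModel {
    private final Deque<String> ready = new ArrayDeque<>();
    private final Deque<String> unacked = new ArrayDeque<>();
    private final boolean autoAckOnDelivery; // true models a no-ack consumer

    AckModel(boolean autoAckOnDelivery) {
        this.autoAckOnDelivery = autoAckOnDelivery;
    }

    void publish(String message) {
        ready.addLast(message);
    }

    /** Hands the next ready message to the consumer, or null if none is ready. */
    String deliver() {
        String message = ready.pollFirst();
        if (message != null && !autoAckOnDelivery) {
            unacked.addLast(message); // broker keeps it until the consumer acks
        }
        return message;
    }

    void ack(String message) {
        unacked.remove(message);
    }

    /** Connection lost: everything still unacked goes back to ready, in order. */
    void consumerDied() {
        while (!unacked.isEmpty()) {
            ready.addFirst(unacked.pollLast());
        }
    }

    int readyCount() { return ready.size(); }
    int unackedCount() { return unacked.size(); }
}
```

In the acknowledged case a message delivered but never acked returns to ready when the consumer dies; in the no-ack case there is nothing left to return.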

Does EasyNetQ support nack?

What I'm really trying to do is leave the message on the queue in the case where it is rejected by the current consumer. In RabbitMQ I could send a NACK to accomplish this. Is NACK supported in EasyNetQ? Is there another way to achieve the behavior I'm looking for?
Update: not a lot of responses, so I'm wondering how people are generally handling the lack of NACK in EasyNetQ. Not having the equivalent of basic.reject limits consumers to "I can always process every message" scenarios. I suppose consumers could throw a specific "rejected" exception to cause EasyNetQ to dequeue the message to the error queue, and I could requeue messages with those errors. Anyone else have other workarounds in place?
I used EasyNetQ for almost a year, but no matter how we tweaked it (amongst other things added our own implementation of IConsumerErrorStrategy) I never really got it to work the way I wanted. The fact that it is single threaded gave us some unexpected behaviour (sometimes deadlocks) when performing RequestAsync while in a SubscribeAsync handler.
The solution for us was to move from EasyNetQ. After working with the official RabbitMq Client for a while, I spent a few days writing a super thin client on top of that. It is influenced by EasyNetQ and supports most of the concepts that EasyNetQ has. However, I added some neat features like pluggable message contexts. I think that the Nack feature of IAdvancedMessageContext that I just added can be something for you:
var client = service.GetService<IBusClient<AdvancedMessageContext>>();
client.RespondAsync<BasicRequest, BasicResponse>((req, ctx) =>
{
ctx?.Nack(); // the context implements IAdvancedMessageContext.
return Task.FromResult<BasicResponse>(null);
}, cfg => cfg.WithNoAck(false));
If you're interested you can read more about it at the Github page (especially the NackTests.cs).
I think you can change the behavior by implementing your own IConsumerErrorStrategy:
https://github.com/EasyNetQ/EasyNetQ/blob/master/Source/EasyNetQ/Consumer/DefaultConsumerErrorStrategy.cs
But if you need that kind of control you might consider just using the RabbitMQ client directly?
It sounds like you are trying to handle failures. You can NACK a message, but that means it sits at the head of the queue. Great, but then it means that you could end up with a bunch of messages that are truthfully unable to be processed, and you will be unable to actually process real messages.
The solution that I have always used with RabbitMQ is to utilize the default error handling of EasyNetQ, and have a separate application resend messages. That is, when a consumer throws an exception, EasyNetQ routes the message to a queue called "EasyNetQ_Default_Error_Queue". You are able to override this name and have different queues go to different error queues, but for now let's stick with the default. You can then have a Windows Service/Azure Worker role reading these messages and working out what to do. That may include having a "RetryCount" on your message envelope/wrapper to make sure that it only loops around so many times. All in all, it's going to be a bit of work.
What you are finding, is what many people run into when using RabbitMQ/EasyNetQ. She's pretty raw.
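The "RetryCount on the envelope" idea comes down to a small decision the resending application makes for each message it reads from the error queue. A minimal sketch, written in Java for illustration only; `ErrorQueueRetry`, `Envelope`, `Action` and `decide` are hypothetical names and not part of EasyNetQ:

```java
final class ErrorQueueRetry {
    /** What the resender does with a failed message. */
    enum Action { REPUBLISH, PARK }

    /** Hypothetical wrapper carrying the payload and how often it has been retried. */
    static final class Envelope {
        final String body;
        final int retryCount;

        Envelope(String body, int retryCount) {
            this.body = body;
            this.retryCount = retryCount;
        }

        /** Copy of this envelope with the retry counter incremented. */
        Envelope nextAttempt() {
            return new Envelope(body, retryCount + 1);
        }
    }

    /** Republish until maxRetries attempts have been used, then park the message. */
    static Action decide(Envelope failed, int maxRetries) {
        return failed.retryCount < maxRetries ? Action.REPUBLISH : Action.PARK;
    }
}
```

On REPUBLISH the resender would send `failed.nextAttempt()` back to the original exchange; on PARK it would move the message somewhere a person can inspect it.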

JMSXGroupID/correlation-id to queue messages on stomp client doesn't seem to work

I was trying to queue messages to the same consumer using stomp-js on a node server.
Producer:
producer.send({'JMSXGroupID':JMSXGroupID, 'destination':confMgr.getConfig("jmsqueue.destination"), 'body':JSON.stringify(msg), 'persistent':'true'}, false);
Consumer:
client.on('message', function(message) {
client.ack(message.headers['message-id']);
})
I was sending two messages using the same JMSXGroupID and it seems that the client processes both messages in parallel, rather than processing message1 and ack'ing it, and only then going ahead to process message2 and ack'ing message2. I tried using 'correlation-id' and it doesn't seem to work either. Can anyone suggest a better way?
Thank you in advance,
Chandra.
I guess you are using this stomp-js lib (correct me if I'm wrong): https://github.com/benjaminws/stomp-js
Message groups are supported by ActiveMQ using Stomp, so you are most likely getting the messages in order. Processing them in order requires you to somehow process each message synchronously on the client, which is rather simple when you can control how many threads the listener runs in. This might not be as easy with java script. which is not
From what I can see, the lib you are using is not the most well documented, the only setting you could tweak that might (I have not tried it!), is to alter the prefetch size to one.
var headers = {
destination: '/queue/test_stomp',
ack: 'client',
'activemq.prefetchSize': '1'
};
It might be the case that this lib still starts eagerly directly to fetch the next message, but you might want to test it.
On the other hand, you might as well want to redesign the application to be sequence independent, since you are running node.js and JavaScript. It's always better to have sequence independence with messaging, since you are able to optimize performance a lot better and can avoid synchronous behaviours.
I don't know what you were trying to achieve with correlation-id, but that header is used to correlate a request with a reply, which is not the case here.
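For comparison, this is what "control how many threads the listener runs in" can look like on a platform with threads: one single-threaded lane per group id, so messages of the same group are handled strictly one after another while different groups still run in parallel. It is a JDK-only Java sketch, not stomp-js code, and `GroupSequencer` and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class GroupSequencer {
    // One single-threaded executor per group id keeps that group's handlers in order.
    private final Map<String, ExecutorService> lanes = new ConcurrentHashMap<>();

    /** Queues a handler behind all earlier handlers of the same group. */
    Future<?> submit(String groupId, Runnable handler) {
        return lanes
                .computeIfAbsent(groupId, id -> Executors.newSingleThreadExecutor())
                .submit(handler);
    }

    /** Stops accepting work and waits up to timeoutMs per lane for queued handlers. */
    boolean shutdownAndWait(long timeoutMs) {
        boolean allDone = true;
        for (ExecutorService lane : lanes.values()) {
            lane.shutdown();
            try {
                allDone &= lane.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return allDone;
    }
}
```

Each handler would process its message and ack it before returning, so message2 of a group is not touched until message1 has been processed and acknowledged.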