How does spring-cloud-stream-rabbit-binder work when a RabbitMQ disk or memory alarm is activated?

Versions:
Spring-cloud-stream-starter-rabbit --> 2.1.0.RELEASE
RabbitMQ --> 3.7.7
Erlang --> 21.1
(1) I have created sample mq-publisher-demo & mq-subscriber-demo repositories on GitHub for reference.
When Memory Alarm was activated
Publisher: was able to publish messages.
Subscriber: it seems the subscriber was receiving messages in batches, with small delays.
When Disk Alarm was activated
Publisher: was able to publish messages.
Subscriber: it seems the subscriber was not receiving messages while the disk alarm was active, but once the alarm was deactivated, all messages were received by the subscriber.
Are the messages getting buffered somewhere?
Is this the expected behavior?
(I was expecting that RabbitMQ would stop accepting messages from the publisher, and that the subscriber would never receive any subsequent messages, once either alarm was activated.)
(2) Spring Cloud Stream document says below.
Does it mean the above behaviour? (avoiding deadlock & keep publisher publishing the messages)
Starting with version 2.0, the RabbitMessageChannelBinder sets the RabbitTemplate.usePublisherConnection property to true so that the non-transactional producers avoid deadlocks on consumers, which can happen if cached connections are blocked because of a memory alarm on the broker.
(3) Do we have something similar for Disk alarm also to avoid deadlocks?
(4) If the producer's message will not be accepted by RabbitMQ, is it possible for spring-cloud-stream to throw a specific exception to the publisher (saying that an alarm is active and the publish failed)?
I'm fairly new to these alarms in spring-cloud-stream; please help me understand them clearly. Thank you.

Are the messages getting buffered somewhere?
Yes; when a resource alarm is in effect, published messages accumulate in the network buffers. Small messages take a while to fill the network buffers before the publisher is blocked, and a smaller network buffer size will block publishers sooner.

It's better to ask questions about the behavior of RabbitMQ itself (and the Java client that Spring uses) on the rabbitmq-users Google group; that's where the RabbitMQ engineers hang out.
(2) Spring Cloud Stream document says below. Does it mean the above behaviour?
That change was made so that if producers are blocked from producing, consumers can still consume.
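For reference, a minimal sketch of what that flag does outside the binder, assuming plain Spring AMQP (RabbitTemplate#setUsePublisherConnection, which is the setter the binder flips internally); the exchange and routing key are placeholders:

    import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
    import org.springframework.amqp.rabbit.core.RabbitTemplate;

    public class PublisherConnectionExample {

        public static void main(String[] args) {
            // A single connection factory; the template is told to use its separate
            // "publisher" connection, so a producer connection blocked by a memory
            // alarm does not also block consumers sharing the cached connection.
            CachingConnectionFactory connectionFactory = new CachingConnectionFactory("localhost");

            RabbitTemplate template = new RabbitTemplate(connectionFactory);
            template.setUsePublisherConnection(true); // the flag the binder sets since 2.0

            template.convertAndSend("some-exchange", "some.routing.key", "hello");

            connectionFactory.destroy();
        }
    }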
(4) If the producer's message will not be accepted by RabbitMQ, then is it possible to throw specific exception to the publisher from spring-cloud-stream (saying alarms are activated and message publish failed)?
Publishing is asynchronous by default; you can enable transactions (which can slow performance down a lot), or enable errors on the producer, in which case you will get an asynchronous message on the error channel if you enable publisher confirms and returns.
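As a rough illustration of that last option (not code from the answer): assuming publisher confirms and returns are switched on (for example spring.rabbitmq.publisher-confirms=true and spring.rabbitmq.publisher-returns=true in Boot 2.1) and the producer binding has errorChannelEnabled=true, a handler subscribed to the binding's error channel could look like the sketch below. The channel name output.errors assumes a binding named output; check the binder documentation for the exact error channel naming.

    import org.springframework.integration.annotation.ServiceActivator;
    import org.springframework.messaging.support.ErrorMessage;
    import org.springframework.stereotype.Component;

    @Component
    public class ProducerErrorHandler {

        // Receives asynchronous failures (returned or nacked messages) for the
        // "output" binding when errorChannelEnabled=true is set on its producer.
        @ServiceActivator(inputChannel = "output.errors")
        public void handle(ErrorMessage errorMessage) {
            // The payload is the exception; log it, alert, or arrange a retry here.
            System.err.println("Publish failed: " + errorMessage.getPayload().getMessage());
        }
    }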

Related

RabbitMQ dead letter handling guarantees

If I use publisher confirms, I can be (reasonably) sure that a message sent to an exchange on the RabbitMQ server, and which received an ACK from the RabbitMQ server, is not lost even if the RabbitMQ server crashes (a power outage, for example).
However, what happens when a message arrives at a dead letter exchange after a manual rejection in the consumer? (channel.basicReject; I use Spring AMQP.)
Can I still be sure that in the case in which the original message is dequeued from the queue to which the consumer is listening, and the RabbitMQ server subsequently crashes, I will eventually find the message, after the RabbitMQ server is restarted, in the queues which are bound to the dead letter exchange (if normally the message would have arrived there)?
If the answer is negative, is there a way to ensure that this is the case?
As @GaryRussell suggested, I posted a similar question on the rabbitmq-users Google group.
Here is the answer I got from Daniil Fedotov
"Hi,
There are no delivery guarantees in place. Dead lettering does not check whether the message was enqueued or saved to disk.
Dead-lettering does not use publisher confirms or any other confirm mechanisms.
It's not that easy to implement reliable dead-lettering from one queue to another and there are plans to address this issue eventually, but it may take a while.
If you want to safely reject messages from the consumer without a risk of losing them - you can publish them from the consumer application manually to the dead-letter queue, wait for the confirmation and then reject."
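Following that advice, here is a minimal sketch of the manual approach using Spring AMQP, assuming version 2.1+ with publisher confirms enabled on the connection factory; the exchange name my.dlx and the 10-second timeout are placeholders, not from the original post:

    import java.util.concurrent.TimeUnit;

    import org.springframework.amqp.core.Message;
    import org.springframework.amqp.rabbit.connection.CorrelationData;
    import org.springframework.amqp.rabbit.core.RabbitTemplate;

    import com.rabbitmq.client.Channel;

    public class ManualDeadLettering {

        private final RabbitTemplate rabbitTemplate; // built on a confirm-enabled connection factory

        public ManualDeadLettering(RabbitTemplate rabbitTemplate) {
            this.rabbitTemplate = rabbitTemplate;
        }

        public void deadLetter(Message failed, Channel channel, long deliveryTag) throws Exception {
            CorrelationData correlation = new CorrelationData("dlq-" + deliveryTag);

            // Re-publish to the dead-letter exchange ourselves instead of relying on
            // broker-side dead lettering, which gives no confirmation.
            rabbitTemplate.send("my.dlx", failed.getMessageProperties().getReceivedRoutingKey(),
                    failed, correlation);

            // Block until the broker confirms it has taken responsibility for the copy.
            if (correlation.getFuture().get(10, TimeUnit.SECONDS).isAck()) {
                channel.basicReject(deliveryTag, false); // now it is safe to drop the original
            } else {
                channel.basicNack(deliveryTag, false, true); // confirm failed: requeue and retry later
            }
        }
    }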

Resiliently processing messages from RabbitMQ

I'm not sure how to resiliently handle RabbitMQ messages in the event of an intermittent outage.
I subscribe in a Windows service, read the message, then store it in my database. If I can't process the record because of the data, I publish it to a dead-letter queue for a human to address and reprocess.
I am not sure what to do if I have some intermittent technical issue that will fix itself (database reboot, network outage, drive space, etc.). I don't want hundreds of messages showing up in the dead-letter queue that just needed to wait out a glitch but would now be waiting on a human.
Currently, I re-queue the event and retry it once, but it retries so quickly that the issue is usually not resolved. I thought of retrying forever, but I don't want a real issue to get stuck in an infinite loop.
This is a broad topic, but from the server side you can persist your messages and make your queues durable; this means that if the server gets restarted they won't be lost. See How to persist messages during RabbitMQ broker restart? for more.
For the consumer (client) it will depend on how you configure your client, from the docs:
In the event of network failure (or a node crashing), messages can be duplicated, and consumers must be prepared to handle them. If possible, the simplest way to handle this is to ensure that your consumers handle messages in an idempotent way rather than explicitly deal with deduplication.
If a message is delivered to a consumer and then requeued (because it was not acknowledged before the consumer connection dropped, for example) then RabbitMQ will set the redelivered flag on it when it is delivered again (whether to the same consumer or a different one). This is a hint that a consumer may have seen this message before (although that's not guaranteed, the message may have made it out of the broker but not into a consumer before the connection dropped). Conversely if the redelivered flag is not set then it is guaranteed that the message has not been seen before. Therefore if a consumer finds it more expensive to deduplicate messages or process them in an idempotent manner, it can do this only for messages with the redelivered flag set.
Check more here: https://www.rabbitmq.com/reliability.html#consumer
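To illustrate the quoted advice, here is a sketch with the plain RabbitMQ Java client that only pays the deduplication cost when the redelivered flag is set; the queue name, the use of the messageId property, and the in-memory dedup store are assumptions for the example:

    import java.io.IOException;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    import com.rabbitmq.client.AMQP;
    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import com.rabbitmq.client.DefaultConsumer;
    import com.rabbitmq.client.Envelope;

    public class IdempotentConsumer {

        private static final Set<String> processedIds = ConcurrentHashMap.newKeySet();

        public static void main(String[] args) throws Exception {
            Connection connection = new ConnectionFactory().newConnection();
            Channel channel = connection.createChannel();

            channel.basicConsume("work-queue", false, new DefaultConsumer(channel) {
                @Override
                public void handleDelivery(String consumerTag, Envelope envelope,
                        AMQP.BasicProperties properties, byte[] body) throws IOException {
                    String messageId = properties.getMessageId();
                    // Only pay the deduplication cost when the broker hints that the
                    // message may have been delivered before.
                    if (envelope.isRedeliver() && messageId != null && processedIds.contains(messageId)) {
                        channel.basicAck(envelope.getDeliveryTag(), false); // already handled, just ack
                        return;
                    }
                    process(body);
                    if (messageId != null) {
                        processedIds.add(messageId);
                    }
                    channel.basicAck(envelope.getDeliveryTag(), false);
                }
            });
        }

        private static void process(byte[] body) {
            System.out.println("processing " + new String(body));
        }
    }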

Setting autoAck=true in RabbitMQ and Celery

I am using Celery and RabbitMQ, but because I push several tasks into the queue, my server memory utilization goes above 40%, after which RabbitMQ will not accept any more tasks. So I want to delete the messages which have already been executed, but because of RabbitMQ's durable behavior those messages are not deleted automatically. I want to set some configuration like autoAck=True, so that when a message is consumed by Celery it is deleted from the RabbitMQ queue and from my server memory. Please explain how I can do that.
OK, so while I don't fully understand why you have the problem you have, it is clear what is going on.
A publisher puts a message task in the queue
Your worker process pulls the message and processes it
The message is never actually removed from the queue
This behavior happens when a consumer fails to acknowledge the processing of a message. To confirm, look at the RabbitMQ management plug-in: you'll see a whole bunch of unacknowledged messages. These are unavailable for consumption, but continue to be held on the server, taking up disk space and memory.
Further, if you do a Basic.Recover, all of these messages will then get dumped back into the queue to be processed again.
This problem is due to incorrect configuration of your consumer. There are two ways to address this:
You can configure the consumer to auto-ack (i.e. acknowledge the message automatically upon receipt). This is done when you declare the consumer (using Basic.Consume). Edit: It looks like this may be the default behavior of Celery.
You can configure your worker process to submit an acknowledgement (using Basic.Ack). Edit: this is done via the acks_late property in Celery.
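Outside of Celery, the two options look roughly like this with the plain RabbitMQ Java client (a sketch; the queue name tasks is made up): either consume with autoAck=true, or consume with autoAck=false and send a Basic.Ack once the work succeeds.

    import java.io.IOException;

    import com.rabbitmq.client.AMQP;
    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.DefaultConsumer;
    import com.rabbitmq.client.Envelope;

    public class AckModes {

        // Option 1: auto-ack. The broker removes the message as soon as it is delivered
        // to the consumer; a crash during processing loses it.
        static void consumeWithAutoAck(Channel channel) throws IOException {
            channel.basicConsume("tasks", true, new DefaultConsumer(channel) {
                @Override
                public void handleDelivery(String tag, Envelope env,
                        AMQP.BasicProperties props, byte[] body) {
                    // process(body); nothing to ack
                }
            });
        }

        // Option 2: manual ack. The message stays "unacked" (still held by the broker)
        // until the worker explicitly acknowledges it after the work succeeds.
        static void consumeWithManualAck(Channel channel) throws IOException {
            channel.basicConsume("tasks", false, new DefaultConsumer(channel) {
                @Override
                public void handleDelivery(String tag, Envelope env,
                        AMQP.BasicProperties props, byte[] body) throws IOException {
                    // process(body);
                    channel.basicAck(env.getDeliveryTag(), false); // removes it from the queue
                }
            });
        }
    }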

ActiveMQ KahaDB Persistence Store Full

I am using ActiveMQ 5.4 with KahaDB as message store.
While publishing messages (with persistence set to true) to a topic which has a durable subscriber, the persistence store keeps growing even though the messages are dispatched to the subscriber. This is causing an issue, as the message store is getting full and not accepting any more messages.
So my question is: why is the persistence store not discarding the messages in KahaDB, even though the messages are getting dispatched?
Regards,
Srinivas
What you are seeing is an interaction between the ActiveMQ message store behaviour and that for durable subscriptions on topics.
When you have durable subscriptions, a topic is treated like a queue for each subscriber's clientId (set on the Connection). The logic being that the client doesn't want to miss any messages when they disconnect. So if they disconnect, the durable subscription hangs around and keeps the messages alive.
The AMQ message store uses data logs for its message journal. These are written sequentially, and messages are never actually removed from them (that would require random access). A second file keeps track of which messages have been consumed. Once all the messages in a data file have been consumed, that file is deleted.
So what you're seeing is that some of the messages in the data file are not being consumed by these durable subscriptions and just hang around. ClientIds for durable subscribers not being consistently used would cause this issue. It's likely that there is something wrong with the way the feature is being used, if you use JMX to inspect the subscriptions on the broker that should help you track down the root cause.
As a general rule, whenever you think that you might want to use a durable subscription, use virtual topics instead - they are much easier to reason about, inspect and load balance. On the other hand if you just want to get the last couple of messages when you reconnect a topic subscriber rather than all the messages you may have missed, use retroactive consumers.
An easy way to get around this issue is to always set a time to live when you send a message; pretty much every use case has a limit on how long a message is worth consuming anyway. ActiveMQ will expire messages beyond this point and free up the messages in the data files for deletion.
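For the time-to-live suggestion, a short JMS sketch, assuming the standard ActiveMQ client; the broker URL, topic name, and the 30-minute TTL are placeholders:

    import javax.jms.Connection;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.jms.Topic;

    import org.apache.activemq.ActiveMQConnectionFactory;

    public class ExpiringPublisher {

        public static void main(String[] args) throws Exception {
            ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
            Connection connection = factory.createConnection();
            connection.start();

            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Topic topic = session.createTopic("prices");

            MessageProducer producer = session.createProducer(topic);
            // Messages not consumed within 30 minutes are expired, so the broker can
            // eventually delete the KahaDB data files that contain them.
            producer.setTimeToLive(30 * 60 * 1000L);

            TextMessage message = session.createTextMessage("payload");
            producer.send(message);

            connection.close();
        }
    }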

"Unknown delivery tag" from RabbitMQ when ack'ing a message in a cluster with replicated queues

We've been using Rabbit successfully for about a year. We recently upgraded to v2.6.1, because we want to use clusters with replicated message queues.
My testing has hit a puzzling behavior that smells like a Rabbit bug to me. The test that uncovers this works with a two-node cluster. Both nodes are running v2.6.1. Both are disk nodes. Both nodes are running on Mac OS, though I doubt this is pertinent.
I'm also running Alice on the node that runs the test. The test uses it to programmatically do a stop_app on one of the nodes, because the test is trying to validate that if the cluster master fails, and a slave is elevated to take its place, that we don't lose messages.
So, the test has a small thread pool, which is given tasks that periodically 1) publish messages, and 2) toggle the state of the Rabbit master node (stopped if running; started if stopped). Other threads are consuming messages from queues.
I'm using publisher confirms, and I'm also acknowledging the messages in the consumers (using autoAck=false for channel.basicConsume()).
When the master node is stopped, I see both the producers and consumers catching ShutdownSignalException. They handle this by attempting to reconnect to the cluster. This works fine. When reconnected, they continue with their business.
Sometimes, what I see is that a consumer has successfully fetched a message from the broker, and is calling channel.basicAck() when it gets that ShutdownSignalException.
Later, when the consumer has reconnected, it again pulls down the same message. (The message bodies are tagged with a UUID, so I know it is the same one.) This time, when the consumer attempts to basicAck() the message, it again gets ShutdownSignalException, but this one has the following text in it: "reply-text=PRECONDITION_FAILED - unknown delivery tag 7".
In fact, that is the same delivery tag that was offered to the consumer by the broker before the master went down and the consumer reconnected.
Googling suggests that this event means that the consumer is attempting to ack the same message more than once.
But, how can this be so? If the first ack succeeded, then the message should have been removed from the broker's queues, and the consumer shouldn't see the same message again.
Yet, if the first ack did not succeed, then the consumer shouldn't be dinged for attempting to re-ack the message.
Anyone seen this before? It smells like a bug in Rabbit's replicated queues to me, but I'm still new to Rabbit, and so am willing to believe there's a subtlety here in consuming from a clustered broker that I haven't yet grokked!
Thanks, --Steve
I'm not sure if my case matches yours, but I have seen a similar "unknown delivery tag" on attempts to ack after a reconnect, and then the same message arrived again. Initially it looked like a bug to me, but in fact this is expected behavior. A consumer with QoS > 1 may have some messages in its local buffer, and the delivery tags will be invalid for all of them after a reconnect. On the other hand, attempting to ack even the current message after a reconnect doesn't make sense, because that message was already nacked automatically when the connection was lost, which is why I got it again.
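One defensive pattern that follows from this (my sketch, not from the answer above): keep a reference to the Channel a delivery arrived on and skip the ack if that channel has since been closed, since the delivery tag is meaningless on any new channel and the broker will redeliver the message anyway.

    import java.io.IOException;

    import com.rabbitmq.client.Channel;

    public class SafeAcker {

        // Acks a delivery only on the channel that produced it. After a reconnect the
        // old channel is closed, the delivery tag is invalid on the new channel, and
        // the broker has already requeued the message, so the ack is simply skipped.
        public void ackIfStillValid(Channel deliveryChannel, long deliveryTag) {
            if (!deliveryChannel.isOpen()) {
                // The message was automatically requeued when the connection dropped;
                // it will be redelivered with the redelivered flag set.
                return;
            }
            try {
                deliveryChannel.basicAck(deliveryTag, false);
            } catch (IOException e) {
                // The channel closed between the check and the ack; the broker will redeliver.
            }
        }
    }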