ActiveMQ - memory not released after consuming all messages - activemq

I've made a test, based on the example solution of the activemq-cpp library.
In the test I send 50,000 messages to a queue, and after they're all sent I consume them, with INDIVIDUAL_ACKNOWLEDGE on the session and message->acknowledge() on every consumed message. The consumer is asynchronous.
Memory (private working set) of java.exe before sending messages: 209,320 KB. After sending all messages: 412,548 KB. After consuming all messages: 434,637 KB. In other words, although the queue size is 0, the memory was not released.
What am I missing?
Thanks.

Besides the JVM processing mentioned above, there are a number of other factors in play here. Depending on the state of the broker when you started the producer sending messages, there could be a number of resources that were allocated on the broker to create the Queue and various other management objects, which will then remain in memory to facilitate message routing etc. To truly analyze the memory usage and check for leaks you should use a profiling tool such as YourKit.

Related

RabbitMQ consumer overload

I've been reading about the principles of AMQP messaging confirms (https://www.rabbitmq.com/confirms.html). It's a really helpful and well-written article, but one particular thing about consumer acknowledgments is really confusing. Here is the quote:
Another thing that's important to consider when using automatic acknowledgement mode is that of consumer overload.
Consumer overload? Message queue is processed and kept in RAM by broker (if I understand it correctly). What overload is it about? Does consumer have some kind of second queue?
Another part of that article is even more confusing:
Consumers therefore can be overwhelmed by the rate of deliveries, potentially accumulating a backlog in memory and running out of heap or getting their process terminated by the OS.
What backlog? How does this all work together? What part of the job is done by the consumer (besides consuming the message and processing it, of course)? I thought that the broker keeps the queues alive and forwards the messages, but now I am reading about some mysterious backlogs and consumer overloads. This is really confusing; can someone explain it a bit, or at least point me to a good source?
I believe the documentation you're referring to deals with what, in my opinion, is sort of a design flaw in either AMQP 0-9-1 or RabbitMQ's implementation of it.
Consider the following scenario:
A queue has thousands of messages sitting in it
A single consumer subscribes to the queue with AutoAck=true and no pre-fetch count set
What is going to happen?
RabbitMQ's implementation is to deliver an arbitrary number of messages to a client that has no pre-fetch count set. Further, with Auto-Ack, the prefetch count is irrelevant, because messages are acknowledged upon delivery to the consumer.
In-memory buffers:
The default client API implementations of the consumer have an in-memory buffer (in .NET it is some type of blocking collection, if I remember correctly). So, before the message is processed, but after the message is received from the broker, it goes into this in-memory holding area. Now, the design flaw is this holding area. A consumer has no choice but to accept the message coming from the broker, as it is published to the client asynchronously. This is a flaw with the AMQP protocol specification (see page 53).
Thus, every message in the queue at that point will be delivered to the consumer immediately and the consumer will be inundated with messages. Assuming each message is small, but takes 5 minutes to process, it is entirely possible that this one consumer will be able to drain the entire queue before any other consumers can attach to it. And since AutoAck is turned on, the broker will forget about these messages immediately after delivery.
Obviously this is not a good scenario if you'd like to get those messages processed, because they've left the relative safety of the broker and are now sitting in RAM at the consuming endpoint. Let's say an exception is encountered that crashes the consuming endpoint - poof, all the messages are gone.
How to work around this?
You must turn Auto-Ack off, and generally it is also a good idea to set a reasonable pre-fetch count (usually 2-3 is sufficient).
Being able to signal back pressure is a basic problem in distributed systems. Without explicit acknowledgements, the consumer does not have any way to say "slow down" to the broker. With auto-ack on, as soon as the TCP acknowledgement is received by the broker, it deletes the message from its memory/disk.
However, that does not mean that the consuming application has processed the message, or has enough memory to store incoming messages. The backlog in the article is simply a data structure used to store unprocessed messages (in the consumer application).
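The overload mechanics described above can be sketched with a toy model (pure Python, no broker; all names are illustrative, not a real client API):

```python
from collections import deque

def deliver(queue_depth, prefetch=None):
    """Toy model: how many messages land in the consumer's in-memory
    buffer before any processing happens.

    prefetch=None models auto-ack with no prefetch limit: the broker
    pushes everything immediately. A numeric prefetch models manual
    ack: the broker stops after `prefetch` unacknowledged deliveries.
    """
    buffer = deque()
    for _ in range(queue_depth):
        if prefetch is not None and len(buffer) >= prefetch:
            break  # broker waits for an ack before sending more
        buffer.append("msg")
    return len(buffer)

# Auto-ack, no prefetch: the whole queue lands in consumer RAM at once.
assert deliver(100_000) == 100_000
# Manual ack with prefetch=3: at most 3 messages are in flight.
assert deliver(100_000, prefetch=3) == 3
```

The second case is the "slow down" signal in action: by withholding acks, the consumer caps its own backlog, and the remaining messages stay safely on the broker.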

How does RabbitMQ actually store the message physically?

I want to know how does RabbitMQ store the messages physically in its RAM and Disk?
I know that RabbitMQ tries to keep the messages in memory (But I don't know how the messages are put in the Ram). But the messages can be spilled into disk when the messages are with persistent mode or when the broker has the memory pressure. (But I don't know how the messages are stored in Disk.)
I'd like to know the internals of this. Unfortunately, the official documentation on the homepage does not expose the internal details.
Which document should I read for this?
RabbitMQ uses a custom DB to store the messages, the db is usually located here:
/var/lib/rabbitmq/mnesia/rabbit#hostname/queues
Starting from version 3.5.5, RabbitMQ introduced the new credit flow settings:
https://www.rabbitmq.com/blog/2015/10/06/new-credit-flow-settings-on-rabbitmq-3-5-5/
Let’s take a look at how RabbitMQ queues store messages. When a message enters the queue, the queue needs to determine if the message should be persisted or not. If the message has to be persisted, then RabbitMQ will do so right away[3]. Now even if a message was persisted to disk, this doesn’t mean the message got removed from RAM, since RabbitMQ keeps a cache of messages in RAM for fast access when delivering messages to consumers. Whenever we are talking about paging messages out to disk, we are talking about what RabbitMQ does when it has to send messages from this cache to the file system.
This blog post is quite detailed.
I also suggest reading about lazy queues:
https://www.rabbitmq.com/lazy-queues.html
and
https://www.rabbitmq.com/blog/2015/12/28/whats-new-in-rabbitmq-3-6-0/
Lazy Queues: This new type of queue works by sending every message that is delivered to them straight to the file system, and only loading messages into RAM when consumers arrive at the queues. To optimize disk reads, messages are loaded in batches.
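The cache-then-page behaviour quoted above can be sketched with a toy model (pure Python; `PagingQueue` and its memory limit are illustrative, not RabbitMQ internals):

```python
import tempfile

class PagingQueue:
    """Toy model of the paging described above: messages stay in a RAM
    cache until a limit is hit, then the oldest messages are paged out
    to the file system (illustrative only)."""

    def __init__(self, ram_limit):
        self.ram_limit = ram_limit
        self.cache = []                         # in-RAM part of the queue
        self.paged = tempfile.TemporaryFile()   # stands in for the disk store
        self.paged_count = 0

    def publish(self, msg):
        self.cache.append(msg)
        while len(self.cache) > self.ram_limit:
            oldest = self.cache.pop(0)
            self.paged.write((oldest + "\n").encode())  # page out to "disk"
            self.paged_count += 1

q = PagingQueue(ram_limit=2)
for i in range(5):
    q.publish(f"m{i}")
assert len(q.cache) == 2    # only the newest messages remain in RAM
assert q.paged_count == 3   # the rest were paged to the file system
```

A lazy queue, in this toy picture, is the extreme case of `ram_limit=0`: everything goes straight to disk and is only read back when a consumer attaches.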

Persistent message queue with at-least-once delivery

I have been looking at message queues (currently between Kafka and RabbitMQ) for one of my projects where these are biggest must have features.
Must have features
Messages in queues should be persistent. (only until they are processed successfully by consumers.)
Messages in queues should be removed only when downstream consumers were able to process the message successfully. Basically, a consumer should ACK. that it processed a message successfully.
Good to have features
To increase throughput, consumers should be able to pull batch of messages from queue.
If you go with Kafka, it only retains messages for a configurable duration of time, after which the messages are discarded to free up space, whether consumed or not.
It is simply the responsibility of the Kafka consumers to keep track of what has been consumed.
IMHO, if you require the messages to be persisted forever, then consider using a different storage medium (a database, maybe).
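The "remove only on ACK" requirement from the question can be sketched with a toy model (pure Python; `AckQueue` is illustrative, not a real broker API):

```python
class AckQueue:
    """Toy at-least-once queue: a message is removed only after the
    consumer acknowledges it; unacked messages are redelivered."""

    def __init__(self):
        self.queue = []
        self.pending = {}   # delivery_tag -> in-flight message
        self.next_tag = 0

    def publish(self, msg):
        self.queue.append(msg)

    def get(self):
        msg = self.queue.pop(0)
        self.next_tag += 1
        self.pending[self.next_tag] = msg   # in flight, not yet safe to drop
        return self.next_tag, msg

    def ack(self, tag):
        del self.pending[tag]   # only now may the broker discard it

    def requeue_unacked(self):
        # e.g. the consumer connection died without acking
        for tag in sorted(self.pending):
            self.queue.append(self.pending.pop(tag))

q = AckQueue()
q.publish("order-1")
tag, msg = q.get()
q.requeue_unacked()        # consumer crashed before acking
tag2, msg2 = q.get()       # same message redelivered: at-least-once
q.ack(tag2)
assert msg2 == "order-1" and q.pending == {} and q.queue == []
```

Note the consequence baked into this model: a crash between processing and acking causes redelivery, so at-least-once consumers must be idempotent.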

How do I find if rabbitmq delivered a particular message to consumer

I have set up RabbitMQ queue CC'ing to find out whether there are messages published to queues. How do I find out whether all of them were delivered to consumers?
While this question is not altogether clear, let me address the issue of how (or whether) to know if a particular message (let's call it message x) was delivered to a consumer.
First, some theory.
Message queuing is commonly used across networks - and networks can be unreliable. Further, the machines operating the message system may be unreliable.
Message queues are usually designated for the processing of a particular type of message. The processing of the message itself may be unreliable.
As a result of the foregoing, messages have the possibility to be processed/consumed zero or more times (i.e. a message can be dropped, processed once, or processed more than once).
Now, RabbitMQ contains some features that attempt to mitigate the possible failure modes (primarily using acknowledgments), but no mitigation technique can be 100% reliable. Therefore, while the reliability is higher, it cannot be guaranteed, and your application needs to be able to cope with the occasional possibility of failure.
There is an inherent assumption in the question that the original publisher of message x cares about the consumer of message x. This indicates that a two-way exchange (e.g. RPC) is needed - one from publisher to consumer for message x, then from consumer back to original publisher (message y). The original publisher maintains state while the consumer processes message x, and the receipt of the message y response closes out the state machine.
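The request/reply state machine described above can be sketched with a toy model (pure Python; `RpcPublisher` and the correlation-id scheme are illustrative):

```python
import uuid

class RpcPublisher:
    """Toy sketch of the two-way exchange: the publisher keeps state
    per correlation id until the reply (message y) arrives."""

    def __init__(self):
        self.in_flight = {}   # correlation id -> pending request state

    def send(self, request):
        corr_id = str(uuid.uuid4())
        self.in_flight[corr_id] = request   # state kept while x is processed
        return corr_id                      # would travel in message headers

    def on_reply(self, corr_id, reply):
        request = self.in_flight.pop(corr_id)   # reply closes the state machine
        return request, reply

pub = RpcPublisher()
cid = pub.send("process message x")
assert cid in pub.in_flight                # waiting: x may still be in flight
req, rep = pub.on_reply(cid, "done")       # consumer sends message y back
assert pub.in_flight == {}                 # state closed out
```

Anything still sitting in `in_flight` after a timeout is a request whose delivery or processing cannot be confirmed, which is exactly the signal the question is after.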
If the intent is to simply publish a stream of messages, the publisher should neither be aware of the consumers nor care whether or not the messages are consumed. However, from an application monitoring standpoint, you presumably would care. You (as the systems administrator) could do a few things to see if messages are being consumed:
Monitor the RabbitMQ management console to see publish/consume rates, as well as queue length
Set up logging and tracing in your application (perhaps dumping logs off to elasticsearch) - then set up a log analyzer to detect abnormal log conditions
Set up performance monitoring on the consuming computers - if there is a problem, you will likely see abnormal statistics on variables like processor time and memory use
Send an occasional test message, which can be specially configured to put a marker in the logs, and look for that marker.
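The last idea in the list can be sketched as follows (pure Python; the marker value and log handling are illustrative):

```python
MARKER = "HEALTHCHECK-7f3a"   # illustrative marker embedded in test messages

def consume(msg, log):
    """Toy consumer: write a distinctive log line when a test message
    carrying the marker passes through, then process normally."""
    if MARKER in msg:
        log.append(f"marker seen: {msg}")   # the line your log analyzer greps for
    # ... normal message processing would go here ...

log = []
consume(f"test ping {MARKER}", log)
assert any(MARKER in line for line in log)  # marker made it end to end
```

If the periodic test message stops producing marker lines in the logs, some link in the publish/route/consume chain has broken, even though the broker itself may look healthy.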

RabbitMQ memory usage creeping up and blocking calls... why?

I'm using RabbitMQ to handle app logs (Windows Server 2008 install). Apps send messages to the exchange. I have a dedicated queue that gets messages forwarded to it. I then have a Windows service connecting to that queue, pulling messages off, and persisting them to the DB. I have n-number of clients connecting to the exchange in real time to latch on to the stream, so there are n-number of connections at a time. It is possible that some of these clients may not Close() their connections in code. Many clients have long-running connections.
As messages are pulled off the queue, they are auto-ack'ed, so I don't have any unacknowledged messages on the queue. However, I'm seeing the memory of Rabbit grow over time. It starts at 32K or so when first turned on then creeps up until it exceeds the threshold and blocks incoming connections.
I have both .NET and Java clients--but both are auto-ack.
Reading the docs, I didn't see any description of how Rabbit uses memory, i.e. I don't understand why memory would be bloating over time. The messages are getting pulled off and ack'ed, which seems to me would mean that Rabbit wouldn't be holding on to them any more and could free the associated memory, giving a stable memory usage profile.
I don't see how fiddling with the memory dial in Rabbit would help either--usage just creeps upwards over time: eventually I'll exceed it.
My guess is that there is something I'm doing wrong with my clients that is causing the memory to grow over time, but I can't think of why that would be.
Why does Rabbit's memory usage creep up when no messages are kept on any queues?
What coding practices could cause the RabbitMQ server to retain (and grow) memory?
Is it possible that you have other queues bound to the exchange? Check the Rabbit admin page under exchanges, click on your exchange, and check for queues bound to it. It may be that one of your clients, when declaring the exchange, is inadvertently binding an unnamed (server-named) queue to the exchange, and messages are piling up in there.
The other thing to check is the QoS settings - if you leave QoS set at the default (infinite) then Rabbit will send out messages immediately to any client regardless of how many messages they are already holding. This results in a lot of book-keeping, like which client has which message on the server, and a large buffer on the client.
Make sure to set your QoS pre-fetch limit to something much more reasonable, like say 100. That way, if you have 1M messages and only 1 client with prefetch of 100, Rabbit will send only 100 to the client and keep the other 999900 on disk on the server, and not use nearly as much memory.
This was a big cause of memory bloat in my application, and now that I've addressed prefetch, everything is fine.
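The server-side bookkeeping cost described above can be sketched with a toy model (pure Python; the record shape and numbers are illustrative):

```python
def broker_bookkeeping(backlog, clients, prefetch=None):
    """Toy model of the broker's bookkeeping: one (client, delivery_tag)
    record is held in server memory per unacked in-flight message.
    prefetch=None models the infinite default QoS."""
    unacked = []          # broker-held records, each costing server memory
    remaining = backlog
    for client in range(clients):
        budget = remaining if prefetch is None else min(prefetch, remaining)
        for tag in range(budget):
            unacked.append((client, tag))   # "which client has which message"
        remaining -= budget
    return len(unacked), remaining   # records in RAM, messages still queued

# Infinite prefetch, one client: 1M in-flight records on the server.
assert broker_bookkeeping(1_000_000, clients=1) == (1_000_000, 0)
# prefetch=100: only 100 records; the other 999,900 stay queued on the server.
assert broker_bookkeeping(1_000_000, clients=1, prefetch=100) == (100, 999_900)
```

This is the 1M-messages/prefetch-100 scenario from the answer expressed as a count of per-delivery records rather than bytes; either way, capping prefetch caps the bookkeeping.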