How does RabbitMQ physically store messages in RAM and on disk?
I know that RabbitMQ tries to keep messages in memory, but I don't know how they are laid out in RAM. Messages can also be written to disk when they are published as persistent or when the broker is under memory pressure, but I don't know how they are stored on disk either.
I'd like to understand these internals. Unfortunately, the official documentation on the RabbitMQ website does not cover them.
Which document should I read for this?
RabbitMQ uses a custom DB to store the messages. The DB is usually located here:
/var/lib/rabbitmq/mnesia/rabbit#hostname/queues
Starting from version 3.5.5, RabbitMQ introduced the new credit flow settings:
https://www.rabbitmq.com/blog/2015/10/06/new-credit-flow-settings-on-rabbitmq-3-5-5/
Let’s take a look at how RabbitMQ queues store messages. When a
message enters the queue, the queue needs to determine if the message
should be persisted or not. If the message has to be persisted, then
RabbitMQ will do so right away[3]. Now even if a message was persisted
to disk, this doesn’t mean the message got removed from RAM, since
RabbitMQ keeps a cache of messages in RAM for fast access when
delivering messages to consumers. Whenever we are talking about paging
messages out to disk, we are talking about what RabbitMQ does when it
has to send messages from this cache to the file system.
This blog post is detailed enough.
I also suggest reading about lazy queues:
https://www.rabbitmq.com/lazy-queues.html
and
https://www.rabbitmq.com/blog/2015/12/28/whats-new-in-rabbitmq-3-6-0/
Lazy Queues: This new type of queues work by sending every message that
is delivered to them straight to the file system, and only loading
messages in RAM when consumers arrive to the queues. To optimize disk
reads messages are loaded in batches.
I was reading this tutorial about RabbitMQ. In its description of what a queue is in RabbitMQ, it says the following:
A queue is only bound by the host's memory & disk limits, it's
essentially a large message buffer.
In this context, what is a message buffer? Is it a common data structure?
A buffer in Computer Science typically refers to a data structure or memory region which holds data temporarily while it is moved from one location to another.
You will find buffers widespread at many levels of abstraction throughout the hardware/software stack. They are especially common around interaction points with hardware devices (reading/writing data to/from software and peripherals, for example) and in networking code which writes data to/from network sockets. They are particularly useful where it is necessary to decouple a producer & consumer (different processes may read/write the buffered data, for example, or do so at different speeds) or in cases where users of a resource must queue prior to being serviced.
In the RabbitMQ context, "message buffer" refers to Rabbit's message queue data structure. A queue is a region of memory, backed by a persistent copy of messages on disk, in which RabbitMQ stores messages submitted by producers [1] while it awaits a consumer to read the queue and process the message. The RabbitMQ broker acts as an intermediary to decouple the producer and consumer processes from each other.
[1] Of course, RabbitMQ offers its users advanced routing logic for submitted messages. Messages submitted by users may be committed directly to a queue (buffer) for delivery, or they may traverse a more complex set of routes which dynamically delivers the message to zero or more queues for delivery to multiple consumer processes.
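To make the idea of a buffer that decouples a producer from a consumer concrete, here is a minimal sketch using only the Python standard library. It is not how RabbitMQ is implemented; the function name run_buffer_demo and the message names are made up for illustration, and the buffer lives purely in memory with no disk backing.

```python
import queue
import threading


def run_buffer_demo(n_messages=5, capacity=2):
    """Pass n_messages from a producer thread to a consumer thread
    through a bounded in-memory buffer (illustrative only)."""
    buffer = queue.Queue(maxsize=capacity)  # bounded FIFO buffer
    received = []
    sentinel = object()  # marks the end of the stream

    def producer():
        for i in range(n_messages):
            # put() blocks while the buffer is full, so a fast producer
            # is slowed down to the pace of the consumer.
            buffer.put(f"msg-{i}")
        buffer.put(sentinel)

    def consumer():
        while True:
            item = buffer.get()  # blocks while the buffer is empty
            if item is sentinel:
                break
            received.append(item)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Calling run_buffer_demo(5, 2) returns the five messages in the order they were produced, even though the buffer never holds more than two at a time.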
I have been looking at message queues (currently deciding between Kafka and RabbitMQ) for one of my projects, where these are the most important features.
Must-have features
Messages in queues should be persistent (only until they are processed successfully by consumers).
Messages in queues should be removed only when downstream consumers were able to process the message successfully. Basically, a consumer should ACK that it processed a message successfully.
Good-to-have features
To increase throughput, consumers should be able to pull a batch of messages from the queue.
If you go with Kafka, it only retains messages for a configurable duration, after which they are discarded to free up space, whether or not they have been consumed.
It is simply the responsibility of the Kafka consumers to keep track of what has been consumed.
IMHO, if you need to keep the messages persisted forever, then consider using a different storage medium (a database, maybe).
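The two points above (retention by age, and consumers tracking their own position) can be sketched with a toy model. This is not the Kafka client API; RetainedLog, Consumer and all of their methods are hypothetical names invented for this illustration, and time is passed in explicitly to keep the example deterministic.

```python
class RetainedLog:
    """Toy append-only log: records expire by age, not by consumption."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.records = []      # list of (offset, timestamp, payload)
        self.next_offset = 0

    def append(self, payload, now):
        self.records.append((self.next_offset, now, payload))
        self.next_offset += 1

    def expire(self, now):
        # Drop every record older than the retention window,
        # regardless of whether any consumer has read it.
        cutoff = now - self.retention_seconds
        self.records = [r for r in self.records if r[1] >= cutoff]

    def read_from(self, offset, max_records):
        # Return up to max_records (offset, payload) pairs at or after offset.
        batch = [r for r in self.records if r[0] >= offset][:max_records]
        return [(r[0], r[2]) for r in batch]


class Consumer:
    """Toy consumer that remembers its own position in the log."""

    def __init__(self, log):
        self.log = log
        self.position = 0      # the log itself does not track this

    def poll(self, max_records):
        batch = self.log.read_from(self.position, max_records)
        if batch:
            self.position = batch[-1][0] + 1
        return [payload for _, payload in batch]
```

In this model a consumer that starts late simply never sees records that have already aged out, which is why the log alone is not a good fit for "keep until processed" semantics.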
I've made a test, based on the example solution of the activemq-cpp library.
In the test I send 50,000 messages to a queue, and after they're all sent I consume them, with INDIVIDUAL_ACKNOWLEDGE on the session and message->acknowledge() on every consumed message. The consumer is asynchronous.
Memory (private working set) of java.exe before sending messages: 209,320 KB. After sending all messages: 412,548 KB. After consuming all messages: 434,637 KB. Meaning, although queue size is 0, memory was not released.
What am I missing?
Thanks.
Besides the JVM processing mentioned above, there are a number of other factors in play here. Depending on the state of the broker when you started the producer sending messages, there could be a number of resources that were allocated on the broker to create the Queue and various other management objects, which will then remain in memory to facilitate message routing etc. To truly analyze the memory usage and check for leaks you should use a tool like YourKit.
I'm using RabbitMQ to handle app logs (Windows Server 2008 install). Apps send messages to the exchange. I have a dedicated queue that gets messages forwarded to it. I then have a Windows service connecting to that queue, pulling messages off, and persisting them to a DB. I have n clients connecting to the exchange in real time to latch on to the stream, so there are n connections at a time. It is possible that some of these clients may not Close() their connections in code. Many clients have long-running connections.
As messages are pulled off the queue, they are auto-ack'ed, so I don't have any unacknowledged messages on the queue. However, I'm seeing the memory of Rabbit grow over time. It starts at 32K or so when first turned on then creeps up until it exceeds the threshold and blocks incoming connections.
I have both .NET and Java clients--but both are auto-ack.
Reading the docs, I didn't see any description of how Rabbit is using memory--i.e. I don't understand why memory would be bloating over time. The messages are getting pulled off and ack'ed which seems to me would mean that Rabbit wouldn't be holding on to it any more and thus can free the associated memory, causing a stable mem usage profile.
I don't see how fiddling with the memory dial in Rabbit would help either--usage just creeps upwards over time: eventually I'll exceed it.
My guess is that there is something I'm doing wrong with my clients that is causing the memory to grow over time, but I can't think of why that would be.
Why does Rabbit memory usage creep up when no messages are kept on any queues?
What coding practices could cause the RabbitMQ server to retain (and grow) memory?
Is it possible that you have other queues bound to the exchange perhaps? Check the Rabbit admin page under exchanges, click on your exchange, and check for queues bound to it. It may be that one of your clients, when declaring the exchange, is inadvertently binding an unnamed (system random named) queue to the exchange, and messages are piling up in there.
The other thing to check is the QoS settings - if you leave QoS set at the default (infinite) then Rabbit will send out messages immediately to any client regardless of how many messages they are already holding. This results in a lot of book-keeping, like which client has which message on the server, and a large buffer on the client.
Make sure to set your QoS pre-fetch limit to something much more reasonable, like say 100. That way, if you have 1M messages and only 1 client with prefetch of 100, Rabbit will send only 100 to the client and keep the other 999900 on disk on the server, and not use nearly as much memory.
This was a big cause of memory bloat in my application, and now that I've addressed prefetch, everything is fine.
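To illustrate the effect of a prefetch limit, here is a toy model of the bookkeeping involved. It is not RabbitMQ's code or any client library's API; ToyBroker and its methods are hypothetical names made up for this sketch. In the AMQP 0-9-1 protocol the real knob is basic.qos, and as far as I know the prefetch count only applies to deliveries that are acknowledged manually, so it is worth checking how that interacts with auto-ack consumers.

```python
from collections import deque


class ToyBroker:
    """Toy model of a broker that caps unacknowledged deliveries
    to a single consumer (illustrative only)."""

    def __init__(self, prefetch):
        self.prefetch = prefetch   # 0 means "no limit" in this model
        self.ready = deque()       # messages still held by the broker
        self.unacked = {}          # delivery tag -> message in flight
        self.next_tag = 1

    def publish(self, message):
        self.ready.append(message)

    def dispatch(self):
        """Deliver ready messages until the prefetch window is full."""
        delivered = []
        while self.ready and (self.prefetch == 0
                              or len(self.unacked) < self.prefetch):
            tag = self.next_tag
            self.next_tag += 1
            self.unacked[tag] = self.ready.popleft()
            delivered.append(tag)
        return delivered

    def ack(self, tag):
        # Acknowledging frees one slot in the prefetch window.
        del self.unacked[tag]
```

With prefetch=100 and 1,000 published messages, the first dispatch() hands out 100 deliveries and leaves 900 on the broker; each ack lets exactly one more message through. With prefetch=0 everything is pushed to the consumer at once.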
I am using ActiveMQ 5.4 with KahaDB as message store.
While publishing messages (with persistence enabled) to a topic that has a durable subscriber, the persistence store keeps growing even though the messages are dispatched to the subscriber. This is causing a problem, as the message store fills up and stops accepting new messages.
So my question is: why is the persistence store not discarding the messages in KahaDB, even though they are being dispatched?
Regards,
Srinivas
What you are seeing is an interaction between the ActiveMQ message store behaviour and that for durable subscriptions on topics.
When you have durable subscriptions, a topic is treated like a queue for each subscriber's clientId (set on the Connection). The logic being that the client doesn't want to miss any messages when they disconnect. So if they disconnect, the durable subscription hangs around and keeps the messages alive.
The AMQ message store uses data logs for its message journal. These are written sequentially, and messages are never actually removed from them (that would require random access). There is a second file which keeps track of which messages have been consumed. Once all the messages in a data file have been consumed, that file is deleted.
So what you're seeing is that some of the messages in the data file are not being consumed by these durable subscriptions and just hang around. ClientIds for durable subscribers not being used consistently would cause this issue. It's likely that there is something wrong with the way the feature is being used. If you use JMX to inspect the subscriptions on the broker, that should help you track down the root cause.
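The journal behaviour described above can be sketched with a small toy model. This is not ActiveMQ's implementation; ToyJournal, its fields and the fixed messages-per-file rollover are all assumptions made up for illustration. The point is only that a data file can be reclaimed when every message in it has been consumed, so a single unconsumed message keeps a whole file alive.

```python
class ToyJournal:
    """Toy model: messages are appended to sequential data files, and a
    file is deleted only once every message in it is acknowledged."""

    def __init__(self, messages_per_file):
        self.messages_per_file = messages_per_file
        self.files = {}       # file number -> set of unacked message ids
        self.location = {}    # message id -> file number
        self.written = 0      # total messages appended so far

    def append(self, message_id):
        # Roll over to a new data file every messages_per_file appends.
        file_no = self.written // self.messages_per_file
        self.files.setdefault(file_no, set()).add(message_id)
        self.location[message_id] = file_no
        self.written += 1

    def ack(self, message_id):
        # Record the consumption, then reclaim the file if it is empty.
        file_no = self.location.pop(message_id)
        pending = self.files[file_no]
        pending.discard(message_id)
        if not pending:
            del self.files[file_no]
```

For example, with three messages per file and six messages written, acknowledging everything except message 2 frees the second file but leaves the first one on disk until message 2 is finally acknowledged (or, in the real broker, expires).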
As a general rule, whenever you think that you might want to use a durable subscription, use virtual topics instead - they are much easier to reason about, inspect and load balance. On the other hand if you just want to get the last couple of messages when you reconnect a topic subscriber rather than all the messages you may have missed, use retroactive consumers.
An easy way to get around this issue is to always use a time to live when you send a message - pretty much every use case has a time limit of when a message ought to be consumed by anyway. ActiveMQ will expire messages beyond this point, and free up the messages in the data files for deletion.