Maximum message size for RabbitMQ

What is the maximum size of a message published to a RabbitMQ queue (pub/sub model)?
I can't see any explicit limits in the docs but I assume there are some guidelines.
Thanks in advance.

I was comparing Amazon Simple Queue Service (SQS) with RabbitMQ and other streaming and messaging platforms such as Kinesis and Kafka. SQS only supports messages from 2^10 bytes (1 kilobyte) up to 2^18 bytes (256 kilobytes), and Kinesis has size limits too. (I don't know why.)
In theory, AMQP can handle messages of up to 2^64 bytes. So even for a huge message, RabbitMQ might work with a single broker, although persisting it would take minutes or hours, but it might not work in a cluster of brokers. If the message transfer time between nodes (60 seconds?) is greater than the heartbeat time between nodes, the cluster will disconnect and lose the message.
This thread is useful -> Can RabbitMQ handle big messages?
References
http://grokbase.com/t/rabbitmq/rabbitmq-discuss/127wsy1h92/limiting-the-size-of-a-message
http://comments.gmane.org/gmane.comp.networking.rabbitmq.general/14665
http://rabbitmq.1065348.n5.nabble.com/Max-messages-allowed-in-a-queue-in-RabbitMQ-td26063.html
https://www.rabbitmq.com/heartbeats.html

Related

Large RabbitMQ message in Slow network

I am using RabbitMQ with Spring AMQP under these conditions:
large message (>100MB, 102400KB)
small bandwidth (<512Kbps)
low heartbeat interval (10 seconds)
single broker
It will take at least 200*8 = 1600 seconds to consume the message, which is far more than my heartbeat interval. From https://stackoverflow.com/a/42363685/418439:
If the message transfer time between nodes (60 seconds?) is greater than the heartbeat time between nodes, the cluster will disconnect and lose the message.
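The 200*8 seconds estimate above follows directly from the message size and the bandwidth. A minimal sketch of that arithmetic in Python, assuming the full 512 Kbps is available to the transfer and ignoring protocol overhead (the function name is hypothetical):

```python
def transfer_seconds(size_kb: float, bandwidth_kbps: float) -> float:
    """Seconds needed to move size_kb kilobytes over a link of bandwidth_kbps kilobits/s."""
    return size_kb * 8 / bandwidth_kbps  # 8 bits per byte


message_kb = 102400     # 100 MB expressed in KB, as in the question
bandwidth_kbps = 512    # upper bound on the available bandwidth
heartbeat_s = 10        # heartbeat interval from the question

seconds = transfer_seconds(message_kb, bandwidth_kbps)
print(seconds)                 # 1600.0
print(seconds / heartbeat_s)   # 160.0 heartbeat intervals pass during one transfer
```

Since the real bandwidth is below 512 Kbps, 1600 seconds is a lower bound on the transfer time.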
Will I also face the disconnection issue even if I am using a single broker?
Do the heartbeat and the consumer use the same thread, so that while the
consumer is consuming, no heartbeat can be sent?
If so, what can I do to consume the message without increasing the heartbeat interval or reducing my message size?
Update:
I received another answer and some comments after I posted my own answer. Thanks for the feedback. Just to clarify, I do not use AMQP for file transfer. The data is a JSON message; some messages are simple and small, but some contain complex information, including free-hand drawings. Besides saving the data at the Data Center, we also save a copy of the message at branch level via AMQP, in case connectivity to the Data Center is not available.

So, the real questions here are a bit more fundamental, and those are: (1) is it appropriate to perform a large file transfer via AMQP, and (2) what purpose does the heartbeat serve?
Heartbeats
First off, let's address the heartbeat question. As the RabbitMQ documentation clearly states, the purpose of the heartbeat is "to ensure that the application layer promptly finds out about disrupted connections."
The reason for this is simple. In ordinary AMQP usage, there may be several seconds, even minutes, between the arrival of successive messages. When no data is exchanged across a TCP session, many firewalls and other networking equipment automatically close the connection to lower exposure to the enterprise network. Heartbeats also help mitigate a fundamental weakness in TCP, which is the difficulty of detecting a dropped connection. Networks experience failure, and TCP is not always able to detect that on its own.
So, the bottom line here is that, while you're transferring a large message, the connection is active and the heartbeat function serves no useful purpose, and can cause you trouble. It's best to turn it off in such cases.
AMQP For Moving Large Files?
The second issue, and I believe the more important question, is how large files should be dealt with. To answer this, let's first consider what a message queue does: it sends messages, small bits of data which communicate something to another computer system. The operative word here is small. Messages typically contain one of four things: 1. commands (go do something), 2. events (something happened), 3. requests (give me some data), and 4. responses (here is your data). A full discussion of these is beyond the scope of this answer, but suffice it to say that each of these can generally be expressed as a small message of less than 100kB.
Indeed, AMQP, the protocol which underlies RabbitMQ, is fairly chatty. It requires large messages to be divided into multiple frames, each no larger than the negotiated maximum frame size (131kB, or 131,072 bytes, by default in RabbitMQ). This can add a significant amount of overhead to a large file transfer, especially when compared to other file transfer mechanisms (FTP, for instance). Secondly, the message has to be fully processed by the broker before it is made available in a queue, and it ties up valuable resources on the broker while this is being done. For one, the whole message must fit into RAM on the broker due to its architecture. This solution may work for one client and one broker, but it will break quickly when scaling out is attempted.
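To get a feel for the segmentation described above, the number of body frames for a given message size can be estimated. This is only a sketch: it assumes RabbitMQ's default frame_max of 131,072 bytes and an 8-byte per-frame envelope (7-byte header plus 1-byte end marker), and it ignores the method and content-header frames that precede the body. The names are hypothetical.

```python
import math

FRAME_MAX = 131072     # assumed: RabbitMQ's default frame_max, in bytes
FRAME_OVERHEAD = 8     # assumed: 7-byte frame header + 1-byte frame-end marker


def body_frames(message_bytes: int, frame_max: int = FRAME_MAX) -> int:
    """Estimate how many body frames are needed to carry a message of message_bytes."""
    payload_per_frame = frame_max - FRAME_OVERHEAD
    return math.ceil(message_bytes / payload_per_frame)


print(body_frames(100 * 1024))          # 1   (a 100 kB message fits in one frame)
print(body_frames(100 * 1024 * 1024))   # 801 (a 100 MB message is split many times)
```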
Finally, compression is often desirable when transferring files. HTTP supports gzip compression automatically; AMQP does not. It is quite common in message-oriented applications to send a message containing a resource locator (e.g. a URL) pointing to the larger data file, which is then accessed via appropriate means.
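The resource-locator approach can be sketched as follows. This is an illustration only: the field names, the helper function, and the example URL are all hypothetical, and uploading the file to wherever the URL points is assumed to happen elsewhere.

```python
import hashlib
import json


def make_pointer_message(url: str, payload: bytes) -> bytes:
    """Build a small message that points at a large payload stored elsewhere."""
    body = {
        "url": url,                                     # where the consumer fetches the data
        "size_bytes": len(payload),                     # lets the consumer plan the download
        "sha256": hashlib.sha256(payload).hexdigest(),  # lets the consumer verify the download
    }
    return json.dumps(body).encode("utf-8")


large_file = b"x" * (5 * 1024 * 1024)   # stand-in for a 5 MB file
message = make_pointer_message("https://files.example.com/drawings/1234", large_file)
print(len(message) < 1024)   # True: the queued message stays tiny regardless of file size
```

The consumer would then download the file over HTTP (or another suitable transport) and compare the checksum before processing it.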
The moral of the story
As the adage goes: "to the man with a hammer, everything looks like a nail." AMQP is not a hammer; it's a precision scalpel. It has a very specific purpose, and narrow applicability within that purpose. Using it for something other than its intended purpose will lead to stability and reliability problems in whatever it is you are designing, and overall dissatisfaction with your end product.

Will I also face the disconnection issue even I am using single
broker?
Yes
Does the heartbeat and consumer use the same thread, where
if consumer is consuming, it is not possible to perform heartbeat?
I can't confirm the threading model, but from what I observe, while the Java RabbitMQ consumer is receiving a message, it does not acknowledge heartbeats. If consuming takes longer than 3 x the heartbeat timeout (due to a large message and/or low bandwidth), the MQ server closes the AMQP connection.
If so, what can I do to consume the message, without increase
heartbeat interval or reduce my message size?
I resolved my issue by increasing the heartbeat interval. No further code change was required.

Does RabbitMQ have a mechanism to throttle down producers/consumers?

As far as I know, RabbitMQ has an internal flow control mechanism which blocks a producer that publishes messages faster than consumers can keep up. (It does not require any configuration.)
I'd like to know whether I can configure a quota (MB/sec) for each producer and consumer so that they do not burden the broker too much.
For example, a producer with a quota of 2 MB/sec cannot publish messages at a higher rate than 2 MB/sec.

There is no way to limit each individual producer.
Flow control exists so that the broker as a whole is not overloaded.
If needed, you can tune the memory threshold and the paging threshold:
https://www.rabbitmq.com/memory.html
For more on flow control, I suggest reading:
http://www.rabbitmq.com/blog/2014/04/14/finding-bottlenecks-with-rabbitmq-3-3/
and
https://www.rabbitmq.com/blog/2015/10/06/new-credit-flow-settings-on-rabbitmq-3-5-5/
I'd add that, in my view, it doesn't make much sense to limit a single producer. What happens if, for example, you have thousands of producers?
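Since the broker offers no per-producer quota, any such cap would have to live in the producer itself. The following token bucket is not a RabbitMQ feature and is not taken from the answer above; it is a minimal client-side sketch, with hypothetical names, of how a producer could check its own budget before each publish.

```python
import time


class TokenBucket:
    """Client-side byte budget: refills at rate_bytes_per_s, holds at most capacity_bytes."""

    def __init__(self, rate_bytes_per_s, capacity_bytes, clock=time.monotonic):
        self.rate = rate_bytes_per_s
        self.capacity = capacity_bytes
        self.clock = clock              # injectable so the behaviour can be tested
        self.tokens = capacity_bytes    # start with a full bucket
        self.last = clock()

    def try_consume(self, n_bytes):
        """Return True and spend the budget if n_bytes may be published now."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if n_bytes <= self.tokens:
            self.tokens -= n_bytes
            return True
        return False


MB = 1024 * 1024
fake_now = [0.0]                                   # fake clock for a deterministic demo
bucket = TokenBucket(2 * MB, 2 * MB, clock=lambda: fake_now[0])
print(bucket.try_consume(2 * MB))   # True: the initial budget covers 2 MB
print(bucket.try_consume(1))        # False: budget exhausted, publish must wait
fake_now[0] = 0.5                   # half a second later, 1 MB has been refilled
print(bucket.try_consume(MB))       # True
```

A producer would call try_consume(len(body)) before publishing and sleep or buffer when it returns False.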

Does NServiceBus 4.x with RabbitMQ support round-robin consumers or the competing consumer model?

I'm using NServiceBus 4.x with RabbitMQ 3.2.x as my transport.
I assumed that by using RabbitMQ as my transport I would be given the competing consumer model as an option. I understand that NServiceBus employs the "Fanout" exchange type for all exchanges and does not support round robin at this time. However, is there a way to configure NServiceBus to take advantage of the levels of indirection that RabbitMQ offers via exchanges and channels?
I have several consumers that I would like to compete for messages from a given queue. What I am observing is that a subscriber blocks further message retrieval from the queue until its message is consumed. So having more than one consumer at this point does me no good other than redundancy.
After reading some documentation on RabbitMQ, I'm assuming that it's normal to block until the Ack is sent from the subscriber. But I had assumed that subscriber #2 would have free access to the queue to fetch another message.
There is mention of increasing the prefetch count on the RabbitMQ channel.
Example:
channel.BasicQos(0,prefetchcount,false)
I don't see anywhere that I can change this setting via configuration in NServiceBus. Furthermore, as I read what prefetch does, I'm really not sure this is what I'm looking for.
Is it possible to use RabbitMQ without the distributor-type pattern used with MSMQ? Or should I move to MassTransit or Rebus?

Put prefetchcount=2 in your connection string. Any value above 1 tells the broker to allow more than one unacknowledged message to be outstanding at a time. You will need to experiment with this setting to find the optimum for your scenario.

RabbitMQ memory usage creeping up and blocking calls... why?

I'm using RabbitMQ to handle app logs (Windows Server 2008 install). Apps send messages to the exchange. I have a dedicated queue that gets messages forwarded to it. I then have a Windows service connecting to that queue, pulling messages off, and persisting them to the DB. I also have n clients connecting to the exchange in real time to latch on to the stream, so there are n connections at a time. It is possible that some of these clients do not Close() their connections in code. Many clients have long-running connections.
As messages are pulled off the queue, they are auto-ack'ed, so I don't have any unacknowledged messages on the queue. However, I'm seeing Rabbit's memory grow over time. It starts at 32K or so when first turned on, then creeps up until it exceeds the threshold and blocks incoming connections.
I have both .NET and Java clients, and both use auto-ack.
Reading the docs, I didn't see any description of how Rabbit uses memory, so I don't understand why memory would bloat over time. The messages are getting pulled off and ack'ed, which to me means that Rabbit shouldn't be holding on to them any more and can free the associated memory, giving a stable memory usage profile.
I don't see how fiddling with the memory dial in Rabbit would help either. Usage just creeps upwards over time, so eventually I'll exceed it.
My guess is that there is something I'm doing wrong with my clients that is causing the memory to grow over time, but I can't think of what that would be.
Why does Rabbit's memory usage creep up when no messages are kept on any queues?
What coding practices could cause the RabbitMQ server to
retain (and grow) memory?

Is it possible that you have other queues bound to the exchange perhaps? Check the Rabbit admin page under exchanges, click on your exchange, and check for queues bound to it. It may be that one of your clients, when declaring the exchange, is inadvertently binding an unnamed (system random named) queue to the exchange, and messages are piling up in there.
The other thing to check is the QoS settings - if you leave QoS set at the default (infinite) then Rabbit will send out messages immediately to any client regardless of how many messages they are already holding. This results in a lot of book-keeping, like which client has which message on the server, and a large buffer on the client.
Make sure to set your QoS pre-fetch limit to something much more reasonable, like say 100. That way, if you have 1M messages and only 1 client with prefetch of 100, Rabbit will send only 100 to the client and keep the other 999900 on disk on the server, and not use nearly as much memory.
This was a big cause of memory bloat in my application, and now that I've addressed prefetch, everything is fine.

Maximize throughput with RabbitMQ

In our project, we want to use RabbitMQ in the "Task Queues" pattern to pass data.
On the producer side, we built a few TCP servers (in node.js) to receive
highly concurrent data and send it to the MQ without any processing.
On the consumer side, we use the Java client to get the task data from
the MQ, handle it, and then ack.
So the question is:
To get the maximum message passing throughput/performance (for example, 400,000 msg/second), how many queues is best? Do more queues mean better throughput/performance? And is there anything else I should be aware of?
Is there any known best-practices guide for using RabbitMQ in such a scenario?
Any comments are highly appreciated!

For best performance in RabbitMQ, follow the advice of its creators. From the RabbitMQ blog:
RabbitMQ's queues are fastest when they're empty. When a queue is
empty, and it has consumers ready to receive messages, then as soon as
a message is received by the queue, it goes straight out to the
consumer. In the case of a persistent message in a durable queue, yes,
it will also go to disk, but that's done in an asynchronous manner and
is buffered heavily. The main point is that very little book-keeping
needs to be done, very few data structures are modified, and very
little additional memory needs allocating.
If you really want to dig deep into the performance of RabbitMQ queues, this other blog entry of theirs goes into the data much further.

According to a response I once got from the rabbitmq-discuss mailing list, there are other things that you can try to increase throughput and reduce latency:
Use a larger prefetch count. Small values hurt performance.
A topic exchange is slower than a direct or a fanout exchange.
Make sure queues stay short. Longer queues impose more processing
overhead.
If you care about latency and message rates then use smaller messages.
Use an efficient format (e.g. avoid XML) or compress the payload.
Experiment with HiPE, which helps performance.
Avoid transactions and persistence. Also avoid publishing in immediate
or mandatory mode. Avoid HA. Clustering can also impact performance.
You will achieve better throughput on a multi-core system if you have
multiple queues and consumers.
Use at least v2.8.1, which introduces flow control. Make sure the
memory and disk space alarms never trigger.
Virtualisation can impose a small performance penalty.
Tune your OS and network stack. Make sure you provide more than enough
RAM. Provide fast cores and RAM.
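The advice on compact formats and payload compression from the list above can be sketched in a few lines. This is only an illustration using Python's standard json and zlib modules; the event layout and helper names are hypothetical, and the consumer is assumed to know that bodies are zlib-compressed JSON.

```python
import json
import zlib


def encode(event: dict) -> bytes:
    """Serialize compactly, then compress, before handing the body to the publisher."""
    compact = json.dumps(event, separators=(",", ":")).encode("utf-8")
    return zlib.compress(compact)


def decode(body: bytes) -> dict:
    """Inverse of encode(), for use on the consumer side."""
    return json.loads(zlib.decompress(body).decode("utf-8"))


event = {"type": "log", "lines": ["connection accepted"] * 1000}
raw = json.dumps(event).encode("utf-8")
body = encode(event)
print(len(body) < len(raw))    # True: repetitive payloads compress well
print(decode(body) == event)   # True: the round trip is lossless
```

Compression costs CPU on both ends, so whether it helps depends on whether the network or the processors are the bottleneck.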

You will increase the throughput with a larger prefetch count AND at the same time ACK multiple messages (instead of sending ACK for each message) from your consumer.
But, of course, ACK with multiple flag on (http://www.rabbitmq.com/amqp-0-9-1-reference.html#basic.ack) requires extra logic on your consumer application (http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2013-August/029600.html). You will have to keep a list of delivery-tags of the messages delivered from the broker, their status (whether your application has handled them or not) and ACK every N-th delivery-tag (NDTAG) when all of the messages with delivery-tag less than or equal to NDTAG have been handled.
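The bookkeeping described above can be sketched as a small helper. This is one possible shape for it, not code from the linked thread: the class and method names are hypothetical, and it assumes delivery tags on a channel start at 1 and increase by 1. The helper only decides which tag to acknowledge; the caller would then send basic.ack for that tag with the multiple flag set.

```python
class AckBatcher:
    """Tracks handled delivery tags and reports when every N-th tag can be acked."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.handled = set()   # handled tags that are still above the watermark
        self.watermark = 0     # highest tag such that all tags <= it are handled
        self.last_acked = 0    # last tag reported for a multiple ack

    def mark_handled(self, delivery_tag):
        """Record a handled message; return a tag to ack with multiple=True, or None."""
        self.handled.add(delivery_tag)
        # Advance the watermark over any contiguous run of handled tags.
        while self.watermark + 1 in self.handled:
            self.watermark += 1
            self.handled.discard(self.watermark)
        # Largest multiple of batch_size that is fully handled.
        candidate = self.watermark - (self.watermark % self.batch_size)
        if candidate > self.last_acked:
            self.last_acked = candidate
            return candidate
        return None


batcher = AckBatcher(batch_size=3)
# Messages finish out of order, e.g. because they are handled by worker threads.
print([batcher.mark_handled(tag) for tag in [2, 1, 4, 3, 5, 6]])
# [None, None, None, 3, None, 6]
```

Tag 3 is only reported once tags 1, 2 and 3 have all been handled, even though tag 4 finished earlier.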
