What are the allowed types of messages (strings, bytes, integers, etc.)?
What is the maximum size of a message?
What is the maximum number of queues and exchanges?
Theoretically anything can be stored/sent as a message. You actually don't want to store anything on the queues. The system works most efficiently if the queues are empty most of the time. You can send anything you want to the queue with two preconditions:
The thing you are sending can be converted to and from a bytestring
The consumer knows exactly what it is getting and how to convert it to the original object
Strings are pretty easy, they have a built in method for converting to and from bytes. If you know it is a string then you know how to convert it back. The best option is to use a markup string like XML, JSON, or YML. This way you can convert objects to Strings and back again to the original objects; they work across programming languages so your consumer can be written in a different language to your producer as long as it knows how to understand the object.
I work in Java. I want to send complex messages with sub objects in the fields. I use my own message object. The message object has two additional methods toBytes and fromBytes that convert to and from the bytestream. I use routing keys that leave no doubt as to what type of message the consumer is receiving. The message is Serializable. This works fine, but is limiting as I can only use it with other Java programs.
The size of the message is limited by the memory on the server, and if it is persistent then also the free HDD space too. You probably do not want to send messages that are too big; it might be better to send a reference to a file or DB.
You might also want to read up on their performance measures:
http://www.rabbitmq.com/blog/2012/04/17/rabbitmq-performance-measurements-part-1/
http://www.rabbitmq.com/blog/2012/04/25/rabbitmq-performance-measurements-part-2/
Queues are pretty light weight, you will most likely be limited by the number of connections you have. It will depend on the server most likely. Here is some info on a similiar question:
http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2009-February/003042.html
What is the maximum size of a message?
It used to be 2 GiB before version 3.8.0:
%% Trying to send a term across a cluster larger than 2^31 bytes will
%% cause the VM to exit with "Absurdly large distribution output data
%% buffer". So we limit the max message size to 2^31 - 10^6 bytes (1MB
%% to allow plenty of leeway for the #basic_message{} and #content{}
%% wrapping the message body).
-define(MAX_MSG_SIZE, 2147383648).
Reference: https://github.com/rabbitmq/rabbitmq-common/blob/v3.7.21/include/rabbit.hrl#L279
It has been 512 MiB since version 3.8.0:
%% Max message size is hard limited to 512 MiB.
%% If user configures a greater rabbit.max_message_size,
%% this value is used instead.
-define(MAX_MSG_SIZE, 536870912).
Reference: https://github.com/rabbitmq/rabbitmq-common/blob/v3.8.0/include/rabbit.hrl#L238
See robthewolf's answer.
The max message size is 2GB, however, performance tuning for messages of this size is not effective. Max Message Size
There is no hard limit imposed by RabbitMQ Server Software on the number of queues, however, the hardware the server is running on may very well impact this limit.
3a. There is no queue length limit imposed by the server by default. You can, however, limit this through server-side policy (configuration) or client side policy. Max Queue Length
There is more information and links on a related post.
Related
I have been trying to use WebRTC Data Channel for a game, however, I am unable to consistently send live player data without hitting the queue size limit (8KB) after 50-70 secs of playing.
Sine the data is required to be real-time, I have no use for data that comes out of order. I have initialized the data channel with the following attributes:
negotiated: true,
id: id,
ordered: true,
maxRetransmits: 0,
maxPacketLifetime: 66
The MDN Docs said that the buffer cannot be altered in any way.
Is there anyway I can consistently send data without exceeding the buffer space? I don't mind purging the buffer space as it only contains data that has been clogged up over time.
NOTE: The data is transmitting until the buffer size exceeds the 8KB space.
EDIT: I forgot to add that this issue is only occurring when the two sides are on different networks. When both are within the same LAN, there is no buffering (since higher bandwidth, I presume). I tried to add multiple Data Channels (8 in parallel). However, this only increased the time before the failure occurred again. All 8 buffers were full. I also tried creating a new channel each time the buffer was close to being full and switched to the new DC while closing the previous one that was full, but I found out the hard way (reading Note in MDN Docs) that the buffer space is not released immediately, rather tries to transmit all data in the buffer taking away precious bandwidth.
Thanks in advance.
The maxRetransmits value is ignored if the maxPacketLifetime value is set; thus, you've configured your channel to resend packets for up to 66ms. For your application, it is probably better to use a pure unreliable channel by setting maxPacketLifetime to 0.
As Sean said, there is no way to flush the queue. What you can do is to drop packets before sending them if the channel is congested:
if(dc.bufferedAmount > 0)
return;
dc.send(data);
Finally, you should realise that buffering may happen in the network as well as at the sender: any router can buffer packets when it is congested, and many routers have very large buffers (this is called BufferBloat). The WebRTC stack should prevent you from buffering too much data in the network, but if WebRTC's behaviour is not aggressive enough for your needs, you will need to add explicit feedback from the sender to the receiver in order to avoid having too many packets in flight.
I don't believe you can flush the outbound buffer, you will probably need to watch the bufferedAmount and adjust what you are sending if it grows.
Maybe handle the retransmissions yourselves and discard old data if needed? WebRTC doesn't surface the SACKs from SCTP. So I think you will need to implement something yourself.
It's an interesting problem. Would love to hear the WebRTC W3C WorkGroup takes on it if exposing more info would make things easier for you.
Always when I find some articles or videos are talking about stream they're necessairly talking about serialization?
what is the relation between those? or to be specific,
Could we say that the data stream always needs serialization or could we find some data stream without serialization?
Firstly, it useful to have a reminder of serial vs parallel communication: if we take a simple example of transmitting a byte, in the parallel case all 8 bits are sent at the same time and in the serial case the 8 bits are sent one by one and the byte built again on the receiving side.
For your video domain example, If you imagine a frame of a video as being a large collection of bytes, lets say 720 by 1280 pixels and each pixel is represented by a byte, then we need 921,600 bytes to represent the frame.
If you are streaming the video you need to send each frame (plus overhead which we'll ignore here for simplicity) from the server to the client device, hence you need to send the 921,600 bytes for each frame.
If you had a very (very!) large parallel connections that could transmit 921,600 bytes in parallel between the server and the client in a single communication then this would be easy to understand.
However, this is almost always not the case, even for much smaller data structures, so serialisation is the name generally given to the process of taking the 921,600 bytes and breaking them down into the size which you can transmit - and that size is often one bit at a time.
Generally a video will be broken down into packets and the packets transmitted to the client. The packets themselves are just collections of bytes also and if the connection allows only a single bit of information to be transmitted at a time, then the packet needs to be broken down and sent 'serially' one bit at a time.
To complicate things, as is commonly the case in computer science and communications, the terms can mean different things in different contexts.
For example you may see it mentioned that you can either stream or 'serialise an object' in some client server communication. What this generally means is that you can either send the raw data 'stream' and let the client be responsible for how to interpret it, or you can use a framework or underlying mechanism which will take an object, convert it into a format that can be transmitted serially, and then reconstruct it on the other end and give it to the client. In fact the actually communication is serial in both cases (if it is using a serial communication channel) so the terms are being used in a different way here.
I am using RabbitMQ with Spring AMQP
large message (>100MB, 102400KB)
small bandwidth (<512Kbps)
low heartbeat interval (10 seconds)
single broker
It will take >= 200*8 seconds to consume the message, which is more than my heartbeat interval. From https://stackoverflow.com/a/42363685/418439
If the message transfer time between nodes (60seconds?) > heartbeat time between nodes, it will cause the cluster to disconnect and the loose the message
Will I also face the disconnection issue even I am using single broker?
Does the heartbeat and consumer using the same thread, where if
consumer is consuming, it is not possible to perform heartbeat?
If so, what can I do to consume the message, without increase heartbeat interval or reduce my message size?
Update:
I have received another answer and comments after I posted my own answer. Thanks for the feedback. Just to clarify, I do not use AMQP for file transfer. Actually the data is in JSON message, some are simple and small but some contain complex information, include some free hand drawing. Besides saving the data at Data Center, we also save a copy of message at branch level via AMQP, for case connectivity to Data Center is not available.
So, the real questions here are a bit more fundamental, and those are: (1) is it appropriate to perform a large file transfer via AMQP, and (2) what purpose does the heartbeat serve?
Heartbeats
First off, let's address the heartbeat question. As the RabbitMQ documentation clearly states, the purpose of the heartbeat is "to ensure that the application layer promptly finds out about disrupted connections."
The reason for this is simple. In an ordinary AMQP usage, there may be several seconds, even minutes between the arrival of successive messages. Without data being exchanged across a TCP session, many firewalls and other networking equipment automatically close ports to lower exposure to the enterprise network. Heartbeats further help mitigate a fundamental weakness in TCP, which is the difficulty of detecting a dropped connection. Networks experience failure, and TCP is not always able to detect that on its own.
So, the bottom line here is that, while you're transferring a large message, the connection is active and the heartbeat function serves no useful purpose, and can cause you trouble. It's best to turn it off in such cases.
AMQP For Moving Large Files?
The second issue, and I believe more important question, is how should large files be dealt with. To answer this, let's first consider what a message queue does: sending messages -- small bits of data which communicate something to another computer system. The operative word here is small. Messages typically contain one of three things: 1. commands (go do something), 2. events (something happened), 3. requests (give me some data), and 4. responses (here is your data). A full discussion on these is beyond the scope, but suffice it to say that each of these can generally be composed of a small message less than 100kB.
Indeed, the AMQP protocol, which underlies RabbitMQ, is a fairly chatty protocol. It requires large messages be divided into multiple segments of no more than 131kB. This can add a significant amount of overhead to a large file transfer, especially when compared to other file transfer mechanisms (FTP, for instance). Secondly, the message has to be fully processed by the broker before it is made available in a queue, and it ties up valuable resources on the broker while this is being done. For one, the whole message must fit into RAM on the broker due to its architecture. This solution may work for one client and one broker, but it will break quickly when scaling out is attempted.
Finally, compression is often desirable when transferring files - HTTP supports gzip compression automatcially. AMQP does not. It is quite common in message-oriented applications to send a message containing a resource locator (e.g. URL) pointing to the larger data file, which is then accessed via appropriate means.
The moral of the story
As the adage goes: "to the man with a hammer, everything looks like a nail." AMQP is not a hammer- it's a precision scalpel. It has a very specific purpose, and narrow applicability within that purpose. Using it for something other than its intended purpose will lead to stability and reliability problems in whatever it is you are designing, and overall dissatisfaction with your end product.
Will I also face the disconnection issue even I am using single
broker?
Yes
Does the heartbeat and consumer use the same thread, where
if consumer is consuming, it is not possible to perform heartbeat?
Can't confirm the thread, but from what I observe when Java RabbitMQ consumer consumes a message, it won't perform heartbeat acknowledgement. If the time to consume longer than 3 x heartbeat timeout timer (due to large message and/or low bandwidth), MQ server will close AMQP connection.
If so, what can I do to consume the message, without increase
heartbeat interval or reduce my message size?
I resolved my issue by increasing heartbeat size. No further code change is required.
We're currently using RabbitMQ, where a continuously super-fast producer is paired with a consumer limited by a limited resource (e.g. slow-ish MySQL inserts).
We don't like declaring a queue with x-max-length, since all messages will be dropped or dead-lettered once the limit is reached, and we don't want to loose messages.
Adding more consumers is easy, but they'll all be limited by the one shared resource, so that won't work. The problem still remains: How to slow down the producer?
Sure, we could put a flow control flag in Redis, memcached, MySQL or something else that the producer reads as pointed out in an answer to a similar question, or perhaps better, the producer could periodically test for queue length and throttle itself, but these seem like hacks to me.
I'm mostly questioning whether I have a fundamental misunderstanding. I had expected this to be a common scenario, and so I'm wondering:
What is best practice for throttling producers? How is this done with RabbitMQ? Or do you do this in a completely different way?
Background
Assume the producer actually knows how to slow himself down with the right input. E.g. a hardware sensor or hardware random number generator, that can generate as many events as needed.
In our particular real case, we have an API that users can use to add messages. Instead of devouring and discarding messages, we'd like to apply back-pressure by having our API return an error if the queue is "full", so the caller/user knows to back-off, or have the API block until the consumer catches up. We don't control our user, so regardless of how fast the consumer is, I can create a producer that is faster.
I was hoping for something like the API for a TCP socket, where a write() can block and where a select() can be used to determine if a handle is writable. So either having the RabbitMQ API block or have it return an error if the queue is full.
For the x-max-length property, you said you don't want messages to be dropped or dead-lettered. I see there was an update in adding some more capabilities for this. As I see it is specified in the documentation:
"Use the overflow setting to configure queue overflow behaviour. If overflow is set to reject-publish, the most recently published messages will be discarded. In addition, if publisher confirms are enabled, the publisher will be informed of the reject via a basic.nack message"
So as I understand it, you can use queue limit to reject the new messages from publishers thus pushing some backpressure to the upstream.
I don't think that this is in any way rabbitmq specific. Basically you have a scenario, where there are two systems of different processing capabilities, and this mismatch will either pose a risk of overflowing the queue (whatever it would be), or even in case of a constant mismatch between producer and consumer, simply create more and more time-distance between event creation and its handling.
I used to deal with this kind of scenarios, and unfortunately there is no magic bullet. You either have to speed up even handling (better hardware, more suited software?) or throttle the event creation (which has nothing to do with MQ really).
Now, I would ask you what's the goal and how the events are produced. Are the events are produced constantly, with either unlimitted or just very high rate (for example readings from sensors - the more, the better), or are they created in batches/spikes (for example: user requests in specific time periods, batch loads from CRM system). I assume that the goal is to process everything cause you mention you don't want to loose any queued message.
If the output is constant, then some limiter (either internal counter, if the producer is the only producer, or external queue length checks if queue can be filled with some other system) is definitely in place.
IF eventsInTimePeriod/timePeriod > estimatedConsumerBandwidth
THEN LowerRate()
ELSE RiseRate()
In real world scenarios we used to simply limit the output manually to the estimated values and there were some alerts set for queue length, time from queue entry to queue leaving etc. Where such limiters were omitted (by mistake mostly) we used to find later some tasks that were supposed to be handled in few hours, that were waiting for three months for their turn.
I'm afraid it's hard to answer to "How to slow down the producer?" if we know nothing about it, but some ideas are: aforementioned rate check or maybe a blocking AddMessage method:
AddMessage(message)
WHILE(getQueueLength() > maxAllowedQueueLength)
spin(1000); // or sleep or whatever
mqAdapter.AddMessage(message)
I'd say it all depends on specific of the producer application and in general your architecture.
I'm experimenting with webRTC and it seems that there's an arbitrary limit to how many bytes can be sent in each message. This guy whose example I used chose a limit of 100 (plus some) bytes. In my tests it seems to be close to 200 bytes. However from reading on TCP and UDP those protocols support packages of up to around 65kb and even when taking the MTU for different types of networks into account it should still be a lot more space available than ~200 bytes.
The only source I've found that mentions a hard limit is this WebRTC Data Channel Protocol draft but it only says TBD.
So my questions are:
if there's any source that specifies the current message size limit in any browser?
if I can assume that the limit is always the same, and if not if there's any way my app can be made aware of the limit?
The sharefest project found a way around the rate throttling - you can modify the outgoing offer to change the bandwidth setting (per http://www.ietf.org/rfc/rfc2327.txt)
Details here: https://github.com/Peer5/ShareFest/blob/master/public/js/peerConnectionImplChrome.js#L201
From my own experience you're still limited to ~800 bytes per message.
I've been testing sending jpegs to chrome 57 over the data channel, and messages up to 64k seem to be reliable now.
The webRTC data channel does have a reliability mechanism, it uses SCTP over DTLS (over UDP) - SCTP lets you set reliability and ordering behaviour, but by default WebRTC uses ordered+reliable - meaning you get similar semantics to that of TCP - except that the message boundaries are preserved - at least in theory.
In practice Chrome may deliver partial messages up to the javascript if it runs out of space so it is best to check that you have a complete message before processing it.