As the RabbitMQ documentation says:
Some applications need multiple logical connections to the broker. However, it is undesirable to
keep many TCP connections open at the same time because doing so consumes
system resources and makes it more difficult to configure firewalls.
AMQP 0-9-1 connections are multiplexed with channels that can be thought of as
"lightweight connections that share a single TCP connection".
Every protocol operation performed by a client happens on a channel.
Communication on a particular channel is completely separate from communication on another
channel, therefore every protocol method also carries a channel ID (a.k.a. channel number),
an integer that both the broker and clients use to figure out which channel the method is for.
And the Spring AMQP documentation says:
It is important to understand that the cache size is (by default) not a limit but is merely the number of
channels that can be cached. With a cache size of, say, 10, any number of channels can actually be in use.
If more than 10 channels are being used and they are all returned to the cache,
10 go in the cache. The remainder are physically closed.
I'm a little confused: since channels are virtual connections, what does "physically closed" mean?
Based on my understanding, different channels are really just different TCP packets (identified by different channel IDs)? And if so, why does CachingConnectionFactory cache channels? (I know my understanding must be wrong...)
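For reference, here is a minimal sketch of where that cache size is set (the host and queue names are just placeholders I made up):

import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.core.RabbitTemplate;

// One shared TCP connection; channels are borrowed for each operation and then cached.
CachingConnectionFactory connectionFactory = new CachingConnectionFactory("localhost");
connectionFactory.setChannelCacheSize(10); // number of idle channels to keep cached, not a hard limit

RabbitTemplate template = new RabbitTemplate(connectionFactory);
template.convertAndSend("myQueue", "hello"); // borrows a channel, then returns it to the cache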
Related
I am using RabbitMQ with Spring AMQP, with:
- a large message (>100 MB, 102400 KB)
- small bandwidth (<512 Kbps)
- a low heartbeat interval (10 seconds)
- a single broker
It will take >= 200 * 8 = 1600 seconds to consume the message, which is far more than my heartbeat interval. From https://stackoverflow.com/a/42363685/418439:
If the message transfer time between nodes (60 seconds?) > heartbeat time between nodes, it will cause the cluster to disconnect and lose the message
Will I also face the disconnection issue even though I am using a single broker?
Do the heartbeat and the consumer use the same thread, such that while the consumer is consuming, it is not possible to send a heartbeat?
If so, what can I do to consume the message without increasing the heartbeat interval or reducing my message size?
Update:
I have received another answer and comments after I posted my own answer. Thanks for the feedback. Just to clarify, I do not use AMQP for file transfer. Actually the data is in JSON messages; some are simple and small but some contain complex information, including some freehand drawing. Besides saving the data at the Data Center, we also save a copy of the message at the branch level via AMQP, in case connectivity to the Data Center is not available.
So, the real questions here are a bit more fundamental, and those are: (1) is it appropriate to perform a large file transfer via AMQP, and (2) what purpose does the heartbeat serve?
Heartbeats
First off, let's address the heartbeat question. As the RabbitMQ documentation clearly states, the purpose of the heartbeat is "to ensure that the application layer promptly finds out about disrupted connections."
The reason for this is simple. In ordinary AMQP usage, there may be several seconds, even minutes, between the arrival of successive messages. Without data being exchanged across a TCP session, many firewalls and other networking equipment will automatically close ports to reduce exposure to the enterprise network. Heartbeats further help mitigate a fundamental weakness of TCP, which is the difficulty of detecting a dropped connection: networks experience failures, and TCP is not always able to detect them on its own.
So, the bottom line here is that, while you're transferring a large message, the connection is active, so the heartbeat function serves no useful purpose and can cause you trouble. It's best to turn it off in such cases.
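For completeness, here is roughly where that knob lives in the plain RabbitMQ Java client (a minimal sketch, exception handling omitted; 0 disables heartbeats, any other value is the interval in seconds):

import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

ConnectionFactory factory = new ConnectionFactory();
factory.setHost("localhost");
factory.setRequestedHeartbeat(0);      // 0 = disable heartbeats for connections created by this factory
// factory.setRequestedHeartbeat(580); // or negotiate a generous interval instead

Connection connection = factory.newConnection();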
AMQP For Moving Large Files?
The second issue, and I believe the more important question, is how large files should be dealt with. To answer this, let's first consider what a message queue does: it sends messages -- small bits of data which communicate something to another computer system. The operative word here is small. Messages typically contain one of four things: 1. commands (go do something), 2. events (something happened), 3. requests (give me some data), and 4. responses (here is your data). A full discussion of these is beyond the scope of this answer, but suffice it to say that each of these can generally be expressed in a small message of less than 100 kB.
Indeed, the AMQP protocol, which underlies RabbitMQ, is a fairly chatty protocol. It requires that large messages be divided into multiple segments of no more than 131 kB. This can add a significant amount of overhead to a large file transfer, especially when compared to other file transfer mechanisms (FTP, for instance). Secondly, the message has to be fully processed by the broker before it is made available in a queue, and it ties up valuable resources on the broker while this is being done; in particular, the whole message must fit into RAM on the broker due to its architecture. This solution may work for one client and one broker, but it will break quickly when scaling out is attempted.
Finally, compression is often desirable when transferring files: HTTP supports gzip compression automatically, while AMQP does not. It is quite common in message-oriented applications to instead send a message containing a resource locator (e.g. a URL) pointing to the larger data file, which is then accessed via appropriate means.
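To illustrate that last pattern, here is a rough sketch (RabbitMQ Java client; the exchange, routing key and URL are made up for the example) of publishing a small pointer message instead of the file itself:

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.MessageProperties;
import java.nio.charset.StandardCharsets;

// 'channel' is an already-open com.rabbitmq.client.Channel.
// The payload is a small JSON document pointing at the real file,
// which the consumer then fetches over HTTP/S3/FTP as appropriate.
String pointer = "{\"fileUrl\":\"https://files.example.com/drawings/12345\",\"sizeBytes\":104857600}";

channel.basicPublish("my-exchange", "files.available",
        MessageProperties.PERSISTENT_TEXT_PLAIN,
        pointer.getBytes(StandardCharsets.UTF_8));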
The moral of the story
As the adage goes, "to the man with a hammer, everything looks like a nail." AMQP is not a hammer; it's a precision scalpel. It has a very specific purpose, and narrow applicability within that purpose. Using it for something other than its intended purpose will lead to stability and reliability problems in whatever it is you are designing, and overall dissatisfaction with your end product.
Will I also face the disconnection issue even though I am using a single broker?
Yes
Do the heartbeat and the consumer use the same thread, such that while the consumer is consuming, it is not possible to send a heartbeat?
I can't confirm the threading, but from what I observe, while a Java RabbitMQ consumer is consuming a message it does not send heartbeat acknowledgements. If the time to consume is longer than 3 x the heartbeat timeout (due to a large message and/or low bandwidth), the MQ server will close the AMQP connection.
If so, what can I do to consume the message without increasing the heartbeat interval or reducing my message size?
I resolved my issue by increasing the heartbeat interval. No further code change was required.
I am using amqplib in Node.js, and I am not clear about the best practices in my code.
Basically, my current code calls amqp.connect() when the Node server starts up, and then uses a different channel for each producer and each consumer, never actually closing any of them. I'd like to know if that makes any sense, or whether I should create a channel, publish, and close it every time I want to publish a message. And what about the connection? Is it a "good practice" to connect once, and then keep the connection open for the lifetime of my server?
On the Consumer side - can I use a single connection and a single channel to listen on multiple queues?
Thank you for any clarifications
In general, it's not good practice to open and close connections and channels per message. Connections are long-lived, and it takes resources to keep opening and closing them. Channels are more lightweight because they share the connection's single TCP socket, but they still consume memory and definitely should not be left open after you are done using them.
It is recommended to have a channel per thread, and a channel per consumer; for publishing it is totally OK to reuse the same channel. But keep in mind that, depending on the operation, the protocol might kill the channel in certain situations (e.g. a failed queue existence check), so be prepared for that. There are also soft (configurable) and hard (usually 65535) limits on the maximum number of channels in many of the client implementations.
So to sum up: depending on your use case, use one to a few connections, open channels when you need them and share them when it makes sense, but remember to close them when done, as in the sketch below.
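The question uses amqplib, but the lifecycle is the same in any client. As a rough sketch with the RabbitMQ Java client (queue names and handlers are placeholders, exception handling omitted): one long-lived connection, a long-lived channel for consumers, and a short-lived channel that is closed once a one-off publish is done.

import com.rabbitmq.client.*;

ConnectionFactory factory = new ConnectionFactory();
factory.setHost("localhost");
Connection connection = factory.newConnection(); // open once at startup, keep for the server's lifetime

// A long-lived channel for consuming; one channel can consume from several queues,
// with each basicConsume call registering a separate consumer on it.
Channel consumerChannel = connection.createChannel();
DeliverCallback handler = (consumerTag, delivery) -> { /* handle the message */ };
consumerChannel.basicConsume("queue-a", true, handler, consumerTag -> {});
consumerChannel.basicConsume("queue-b", true, handler, consumerTag -> {});

// For an occasional publish, a short-lived channel is fine; just close it when done.
try (Channel publishChannel = connection.createChannel()) {
    publishChannel.basicPublish("", "queue-a", null, "hello".getBytes());
}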
The RabbitMQ documentation explains the nature of connections and channels (towards the end of the document), and the accepted answer on this question has good information on the subject.
My app has multiple threads that publish messages to a single RabbitMQ cluster.
Reading the RabbitMQ docs, I found the following:
For applications that use multiple threads/processes for processing, it is very common to open a new channel per thread/process and not share channels between them.
And I understand that instead of opening multiple connections (expensive), it is better to open multiple channels.
But why not use a single channel to all threads?
What are the benefits of using multiple channels over a single channel?
AMQP has the concept of a Channel to provide more flexibility on top of reliable TCP connections. Opening a TCP connection per message would be extremely expensive, so the designers came up with the idea of logical Channels within a connection.
It is not a good idea to use a single Channel for all threads, because if anything fails in a particular thread and the Channel dies, the rest of the threads will get an AlreadyClosedException. A channel can die for multiple reasons: for example, declaring something that is already declared with other parameters, trying to cancel a consumer which doesn't exist, publishing to an exchange that doesn't exist, etc.
My best advice would be to have an object that holds a Channel in a field and also implements the ShutdownListener interface, so that every time the channel fails it is able to recover and create a new one from the connection. So I would say that the main benefits are failure tolerance and scalability, since if a Channel dies it won't affect the others.
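A rough sketch of such a holder (the class and method names are mine, not from the library):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ShutdownListener;
import com.rabbitmq.client.ShutdownSignalException;
import java.io.IOException;

// Holds one channel and recreates it from the shared connection whenever it dies.
class RecoveringChannelHolder implements ShutdownListener {
    private final Connection connection;
    private volatile Channel channel;

    RecoveringChannelHolder(Connection connection) throws IOException {
        this.connection = connection;
        openChannel();
    }

    private void openChannel() throws IOException {
        channel = connection.createChannel();
        channel.addShutdownListener(this);
    }

    @Override
    public void shutdownCompleted(ShutdownSignalException cause) {
        if (!cause.isHardError()) {   // channel-level failure only; the connection itself survived
            try {
                openChannel();        // replace the dead channel with a fresh one
            } catch (IOException e) {
                // in real code: log and retry with backoff
            }
        }
    }

    Channel channel() {
        return channel;
    }
}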
Using a Java client for RabbitMQ, I have created a connection pooling mechanism that keeps a set of RabbitMQ connections established and available. Once a client leases a connection, the client creates a channel. If I have to perform tasks and send 100 messages, then for each of those messages the client will lease a connection and create a channel with calls such as:
rqConnection = MyPoolManager.leaseConnection();
rqChannel = rqConnection.createChannel();
Can I have a channel pre-established within my pool (one channel per connection), or should a channel always be created just before sending a message? My concern is that creating channel after channel might consume resources. I could have the channel co-exist in a class that contains both the connection and the channel, so it is always pre-created ahead of its usage. If channel creation poses no resource consumption or leakage implications, then I can proceed with my current approach.
Based on additional research and observation from other groups, here are some facts about channels:
there appear to be no documents specifying how to calculate the ratio of channels per connection, nor the benefits of running multiple connections vs. multiple channels per connection
running a large number of connections appears to be more resource-consuming than running a large number of channels. Also, connections are limited by the number of available file descriptors, whereas channels are not.
some individual tests showed that the performance of pooling connections versus pooling channels is similar
So the best approach appears to be to have one connection and a pool of multiple channels, where each channel is used by a different thread (to prevent concurrency issues), as sketched below.
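A rough sketch of that setup (names are illustrative, exception handling trimmed): each thread lazily gets, and then reuses, its own channel from one shared connection.

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.io.IOException;

ConnectionFactory factory = new ConnectionFactory();
factory.setHost("localhost");
Connection sharedConnection = factory.newConnection(); // one connection for the whole application

// Each thread gets its own channel, so channels are never shared across threads.
ThreadLocal<Channel> channelPerThread = ThreadLocal.withInitial(() -> {
    try {
        return sharedConnection.createChannel();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
});

// In a worker thread:
channelPerThread.get().basicPublish("", "tasks", null, "payload".getBytes());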
The book Essential WCF claims that NetTcpBinding.MaxConnections limits the number of connections to an endpoint; thus, if the property is set to a value of 10, only 10 concurrent connections will be allowed to that endpoint.
But the following blog post, http://kennyw.com/work/indigo/181, claims this property doesn't limit the number of concurrent connections, but instead only specifies the maximum number of connections that will be cached and reused by other channels:
MaxConnections for TCP is not a hard and fast limit, but rather a knob on the connections that we will cache in our connection pool. That is, if you set MaxConnections=2, you can still open 4 client channels on the same factory simultaneously. However, when you close all of these channels, we will only keep two of these connections around (subject to IdleTimeout of course) for future channel usage. This helps performance in cases where you are creating and disposing client channels. This knob will also apply to the equivalent usage on the server-side as well (that is, when a server-side channel is closed, if we have less than MaxConnections in our server-side pool we will initiate I/O to look for another new client channel).
So which is true?
EDIT:
First of all, you mean NetTcpBinding.MaxConnections, right?
Yes, thank you ... I've corrected the typo
See official docs at http://msdn.microsoft.com/en-us/library/system.servicemodel.nettcpbinding.maxconnections.aspx and especially http://msdn.microsoft.com/en-us/library/ms731078.aspx - the behavior is actually different depending if it's the server or the client, but in no case is it a hard limit on the number of connections. (On the client, it's a limit on the connections that are pooled, and on the server it's a limit on connections that haven't been accepted yet by the ServiceModel layer).
a) I assume by “pooled” you mean the number of connections that will be reused by other channels. But the blog says this is the case for both the client and the server, while if I understand you correctly, you’re saying that on the server it means the number of connections waiting to be accepted by the ServiceModel layer?
Thus, if the property is set to 10, only 10 connections will be allowed to wait to be accepted, and if another connection tries to wait, it will be rejected immediately?