As far as I know, RabbitMQ has an internal flow control mechanism that blocks a producer when it publishes messages faster than consumers can keep up. (It does not require any configuration.)
I'd like to know whether I can configure some amount of quota (MB/sec) for each producer and client so that they do not burden the broker system too much.
For example, a producer with quota 2 MB/sec cannot publish messages at higher rate than 2 MB/sec.
There is no way to limit each individual producer.
Flow control exists so that publishers cannot burden the broker too much.
If needed, you can tune the memory threshold and the paging threshold:
https://www.rabbitmq.com/memory.html
For more about flow control, I suggest reading:
http://www.rabbitmq.com/blog/2014/04/14/finding-bottlenecks-with-rabbitmq-3-3/
and
https://www.rabbitmq.com/blog/2015/10/06/new-credit-flow-settings-on-rabbitmq-3-5-5/
I'd add that, in my opinion, it doesn't make much sense to limit a single producer. What happens if, for example, you have thousands of producers?
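Since the broker offers no per-producer quota, one option is for each producer to throttle itself before publishing. Below is a minimal sketch of a token-bucket byte limiter for the "2 MB/sec" example from the question. `ByteRateLimiter` is a hypothetical helper, not part of any RabbitMQ client, and it only works if every producer cooperates.

```java
import java.util.concurrent.TimeUnit;

/** Client-side byte-rate limiter (token bucket). Hypothetical helper, not a RabbitMQ API. */
final class ByteRateLimiter {
    private final long bytesPerSecond; // refill rate, e.g. 2_000_000 for roughly 2 MB/sec
    private final long capacity;       // maximum burst size in bytes
    private double available;          // bytes currently available in the bucket
    private long lastRefillNanos;

    ByteRateLimiter(long bytesPerSecond, long capacity, long nowNanos) {
        this.bytesPerSecond = bytesPerSecond;
        this.capacity = capacity;
        this.available = capacity;
        this.lastRefillNanos = nowNanos;
    }

    /** Returns how many nanoseconds to wait before sending `size` bytes (0 means send now). */
    synchronized long reserve(long size, long nowNanos) {
        refill(nowNanos);
        available -= size; // may go negative: the debt is paid back by waiting
        if (available >= 0) {
            return 0L;
        }
        return (long) Math.ceil(-available * 1e9 / bytesPerSecond);
    }

    private void refill(long nowNanos) {
        double elapsedSeconds = (nowNanos - lastRefillNanos) / 1e9;
        available = Math.min(capacity, available + elapsedSeconds * bytesPerSecond);
        lastRefillNanos = nowNanos;
    }

    /** Blocks the calling publisher thread until `size` bytes may be sent. */
    void acquire(long size) throws InterruptedException {
        long waitNanos = reserve(size, System.nanoTime());
        if (waitNanos > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNanos);
        }
    }
}
```

A producer would call `limiter.acquire(body.length)` right before each publish. The clock is passed into `reserve` explicitly so the arithmetic can be checked without sleeping.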
We have a design challenge where the situation is as follows:
There are multiple producers and multiple consumers (on the same queue).
Each message represents a task with parameters that a consumer needs to handle.
The problem is that certain tasks take lots of memory (and CPU power), more than we know some consumers have the capacity to handle. The good thing is that we know in advance approximately how much memory (and CPU power) a task will take, so we could prevent a consumer from taking that task and give a chance to another consumer with enough memory to handle it.
There is the prefetch setting, but I can't see how it can be configured to meet this requirement.
Finally I found an option to roll back a transaction, so the consumer can check whether it has enough hardware resources to handle the task and, if not, roll back, which returns the message to the queue and allows the next consumer to take it, and so forth.
Is that the right approach, or is there a better way?
The messages could have properties set which indicate whether or not they will require high CPU and/or memory and then consumers could use selectors to only receive the messages which fit their hardware constraints.
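The "check, then hand the message back" idea can be sketched as a small predicate on the consumer side. This assumes the producer stamps each task with two headers; the names `required-memory-mb` and `required-cpu-cores` are made up for illustration. When the check fails, the consumer would return the message to the queue, for example with `channel.basicReject(deliveryTag, true)` in the RabbitMQ Java client.

```java
import java.util.Map;

final class CapacityCheck {
    // Hypothetical header names; the producer would have to set them on every task message.
    static final String MEMORY_HEADER = "required-memory-mb";
    static final String CPU_HEADER = "required-cpu-cores";

    /** True if this consumer can take the task; false means "hand it back to the queue". */
    static boolean canHandle(Map<String, Object> headers, long freeMemoryMb, int freeCores) {
        long neededMemoryMb = asLong(headers.get(MEMORY_HEADER));
        long neededCores = asLong(headers.get(CPU_HEADER));
        return neededMemoryMb <= freeMemoryMb && neededCores <= freeCores;
    }

    // Missing or non-numeric headers are treated as "no special requirement".
    private static long asLong(Object value) {
        return (value instanceof Number) ? ((Number) value).longValue() : 0L;
    }
}
```

One thing to watch with this approach: a requeued message may come straight back to the same consumer, so if no consumer has enough capacity the message can bounce around indefinitely.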
In one of our applications the back pressure did not work and there was a huge pileup in a queue on RabbitMQ. This caused the RMQ node to choke.
Is there a way to apply flow control (manually) on that queue in such cases? That would have slowed down the producer and given us headroom.
In your case the consumers are not fast enough to handle the messages.
Basically you had a load-spike.
So, it does not mean that you need to stop the publishers.
You could:
Increase the number of consumers
Use lazy queues
You didn't see flow control kick in because RabbitMQ could still handle the messages.
I have a middleware based on Apache Camel which does a transaction like this:
from("amq:job-input")
.to("inOut:businessInvoker-one") // Into business processor
.to("inOut:businessInvoker-two")
.to("amq:job-out");
Currently it works perfectly, but I can't scale it up, let's say from 100 TPS to 500 TPS. I have already
Raised the concurrent consumers settings and used empty businessProcessor
Configured JAVA_XMX and PERMGEN
to speed up the transaction.
According to the ActiveMQ web console, many messages are waiting to be processed in the 500 TPS scenario. I guess one solution is to scale ActiveMQ up, so I want to use multiple brokers in a cluster.
According to http://fuse.fusesource.org/mq/docs/mq-fabric.html (section "Topologies"), configuring ActiveMQ in clustering mode is suitable for non-persistent messages. IMHO it makes sense that it is not suitable for persistent messages, because all running brokers use the same store file. But what about separating the store files? That is possible now, right?
Could anybody explain this? If it's not possible, what is the best way to load balance persistent messages?
Thanks
You can share the load of persistent messages by creating 2 master/slave pairs. The master and slave in each pair share their state either through a database or a shared filesystem, so you need to duplicate that setup.
Create 2 master/slave pairs, and configure so-called "network connectors" between the 2 pairs. This will double your performance without the risk of losing messages.
See http://activemq.apache.org/networks-of-brokers.html
This answer relates to a version of the question before the Camel details were added.
It is not immediately clear what exactly it is that you want to load balance and why. Messages across consumers? Producers across brokers? What sort of concern are you trying to address?
In general you should avoid using networks of brokers unless you are trying to address some sort of geographical use case, have too many connections for a single broker to handle, or if a single broker (which could be a pair of brokers configured in HA) is not giving you the throughput that you require (in 90% of cases it will).
In a broker network, each node has its own store and passes messages around by way of a mechanism called store-and-forward. Have a read of Understanding broker networks for an explanation of how this works.
ActiveMQ already works as a kind of load balancer by distributing messages evenly in a round-robin fashion among the subscribers on a queue. So if you have 2 subscribers on a queue, and send it a stream of messages A, B, C, D, one subscriber will receive A & C, while the other receives B & D.
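To make the A, B, C, D example concrete, here is a toy simulation of round-robin dispatch. It is only an illustration of the dealing order and does not use ActiveMQ at all; real dispatch is also influenced by things like prefetch and slow consumers, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RoundRobinDemo {
    /** Deals messages to `subscriberCount` subscribers in turn, like cards from a deck. */
    static List<List<String>> distribute(List<String> messages, int subscriberCount) {
        List<List<String>> received = new ArrayList<>();
        for (int i = 0; i < subscriberCount; i++) {
            received.add(new ArrayList<>());
        }
        for (int i = 0; i < messages.size(); i++) {
            received.get(i % subscriberCount).add(messages.get(i));
        }
        return received;
    }

    public static void main(String[] args) {
        // Prints [[A, C], [B, D]]
        System.out.println(distribute(Arrays.asList("A", "B", "C", "D"), 2));
    }
}
```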
If you want to take this a step further and group related messages on a queue so that they are processed consistently by only one subscriber, you should consider Message Groups.
Adding consumers might help up to a point (depending on the number of cores/CPUs your server has). Adding threads beyond the point where your "Camel server" is already using all available CPU for the business processing makes no sense and can be counterproductive.
Adding more ActiveMQ machines is probably needed. You can use an ActiveMQ "network" to communicate between instances that have separate persistence files. It should be straightforward to add more brokers and put them into a network.
Make sure you performance test along the way to find out what load the broker can handle and what load the Camel processor can handle (if they run on different machines).
When you do persistent messaging - you likely also want transactions. Make sure you are using them.
If all running brokers use the same store file or tx-supported database for persistence, then only the first broker to start will be active, while others are in standby mode until the first one loses its lock.
If you want to load balance your persistence, there are two approaches you could try:
Configure several brokers in network-bridge mode, then send messages to any one of them and consume messages from more than one of them. This load balances both the brokers and the persistence stores.
Override the persistenceAdapter and use database-sharding middleware (such as tddl: https://github.com/alibaba/tb_tddl) to store the messages by partition.
Your first step is to increase the number of workers that are processing from ActiveMQ. The way to do this is to add the ?concurrentConsumers=10 attribute to the starting URI. The default behaviour is that only one thread consumes from that endpoint, leading to a pile up of messages in ActiveMQ. Adding more brokers won't help.
Secondly what you appear to be doing could benefit from a Staged Event-Driven Architecture (SEDA). In a SEDA, processing is broken down into a number of stages which can have different numbers of consumer on them to even out throughput. Your threads consuming from ActiveMQ only do one step of the process, hand off the Exchange to the next phase and go back to pulling messages from the input queue.
Your route can therefore be rewritten as 2 smaller routes:
from("activemq:input?concurrentConsumers=10").id("FirstPhase")
.process(businessInvokerOne)
.to("seda:invokeSecondProcess");
from("seda:invokeSecondProcess?concurrentConsumers=20").id("SecondPhase")
.process(businessInvokerTwo)
.to("activemq:output");
The two stages can have different numbers of concurrent consumers so that the rate of message consumption from the input queue matches the rate of output. This is useful if one of the invokers is much slower than another.
The seda: endpoint can be replaced with another intermediate activemq: endpoint if you want message persistence.
Finally to increase throughput, you can focus on making the processing itself faster, by profiling the invokers themselves and optimising that code.
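If it helps to see the staged idea outside of Camel, here is a rough plain-Java sketch of two stages with independently sized worker pools. It is not how Camel implements seda:, just the same shape: the second executor's internal work queue plays the role of the seda: queue. `TwoStagePipeline` and its parameters are made-up names, and the sketch assumes the stage functions do not throw.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

final class TwoStagePipeline {
    /** Runs every input through stageOne, then stageTwo, each stage on its own worker pool. */
    static <A, B, C> List<C> run(List<A> inputs,
                                 Function<A, B> stageOne, int stageOneWorkers,
                                 Function<B, C> stageTwo, int stageTwoWorkers) {
        ExecutorService first = Executors.newFixedThreadPool(stageOneWorkers);
        ExecutorService second = Executors.newFixedThreadPool(stageTwoWorkers);
        Queue<C> results = new ConcurrentLinkedQueue<>();
        CountDownLatch done = new CountDownLatch(inputs.size());
        try {
            for (A input : inputs) {
                first.submit(() -> {
                    B intermediate = stageOne.apply(input);
                    // Hand off to the second stage and return to the first-stage pool.
                    second.submit(() -> {
                        results.add(stageTwo.apply(intermediate));
                        done.countDown();
                    });
                });
            }
            done.await(); // assumes no stage throws; otherwise this would wait forever
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the pipeline", e);
        } finally {
            first.shutdown();
            second.shutdown();
        }
        return new ArrayList<>(results); // completion order, not input order
    }
}
```

Giving the slower stage more workers (say 10 for the first and 20 for the second, as in the routes above) is what evens out the throughput.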
I would like to configure my ActiveMQ producers to failover (I'm using the Stomp protocol) when a broker reaches a configured limit. I want to allow consumers to continue consumption from the overloaded broker, unabated.
Reading ActiveMQ docs, it looks like I can configure ActiveMQ to do one of a few things when a broker reaches its limits (memory or disk):
Slow down messages using producerFlowControl="true" (by blocking the send)
Throw exceptions when using sendFailIfNoSpace="true"
Neither of the above, in which case I'm not sure what happens. Does it revert to TCP flow control?
It doesn't look like any of these things are designed to trigger a producer failover. A producer will failover when it fails to connect but not, as far as I can tell, when it fails to send (due to producer flow control, for example).
So, is it possible for me to configure a broker to refuse connections when it reaches its limits? Or is my best bet to detect the slowdown on the producer side, and to manually reconfigure my producers to use a different broker at that time?
Thanks!
Your best bet is to use sendFailIfNoSpace, or better sendFailIfNoSpaceAfterTimeout. This will throw an exception up to your client, which can then attempt to resend the message to another broker at the application level (though you can encapsulate this logic over the top of your Stomp library, and use this facade from your code). Though if your ActiveMQ setup is correctly wired, your load both in terms of production and consumption should be more or less evenly distributed across your brokers, so this feature may not buy you a great deal.
You would probably get a better result if you concentrated on fast consumption of the messages, and increased the storage limits to smooth out peaks in load.
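The application-level resend mentioned above could look roughly like the facade below. `BrokerClient` is a hypothetical interface standing in for whatever your Stomp library exposes; the only assumption is that a send to a full broker surfaces as an exception (which is what sendFailIfNoSpace is meant to give you).

```java
import java.util.List;

final class FailoverSender {
    /** Hypothetical wrapper around one broker connection; send() throws if the broker refuses. */
    interface BrokerClient {
        void send(String destination, String body) throws Exception;
    }

    private final List<BrokerClient> brokers;

    FailoverSender(List<BrokerClient> brokers) {
        this.brokers = brokers;
    }

    /** Tries each broker in order and returns the index of the one that accepted the message. */
    int send(String destination, String body) {
        IllegalStateException failure = new IllegalStateException("no broker accepted the message");
        for (int i = 0; i < brokers.size(); i++) {
            try {
                brokers.get(i).send(destination, body);
                return i;
            } catch (Exception e) {
                failure.addSuppressed(e); // remember why this broker refused, then try the next
            }
        }
        throw failure;
    }
}
```

Consumers are not involved here at all, so they can keep draining the overloaded broker while producers move on to the next one.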
In our project, we want to use RabbitMQ in the "Task Queues" pattern to pass data.
On the producer side, we built a few TCP servers (in Node.js) that receive highly concurrent data and send it to the MQ without any processing.
On the consumer side, we use the Java client to get the task data from the MQ, handle it and then ack.
So the questions are:
To get the maximum message throughput (for example, 400,000 msg/second), how many queues are best? Do more queues mean better throughput? Is there anything else I should be aware of?
Are there any known best-practice guides for using RabbitMQ in such a scenario?
Any comments are highly appreciated!!
For best performance in RabbitMQ, follow the advice of its creators. From the RabbitMQ blog:
RabbitMQ's queues are fastest when they're empty. When a queue is
empty, and it has consumers ready to receive messages, then as soon as
a message is received by the queue, it goes straight out to the
consumer. In the case of a persistent message in a durable queue, yes,
it will also go to disk, but that's done in an asynchronous manner and
is buffered heavily. The main point is that very little book-keeping
needs to be done, very few data structures are modified, and very
little additional memory needs allocating.
If you really want to dig deep into the performance of RabbitMQ queues, this other blog entry of theirs goes into the data much further.
According to a response I once got from the rabbitmq-discuss mailing group there are other things that you can try to increase throughput and reduce latency:
Use a larger prefetch count. Small values hurt performance.
A topic exchange is slower than a direct or a fanout exchange.
Make sure queues stay short. Longer queues impose more processing
overhead.
If you care about latency and message rates then use smaller messages.
Use an efficient format (e.g. avoid XML) or compress the payload.
Experiment with HiPE, which helps performance.
Avoid transactions and persistence. Also avoid publishing in immediate
or mandatory mode. Avoid HA. Clustering can also impact performance.
You will achieve better throughput on a multi-core system if you have
multiple queues and consumers.
Use at least v2.8.1, which introduces flow control. Make sure the
memory and disk space alarms never trigger.
Virtualisation can impose a small performance penalty.
Tune your OS and network stack. Make sure you provide more than enough
RAM. Provide fast cores and RAM.
You will increase the throughput with a larger prefetch count AND at the same time ACK multiple messages (instead of sending ACK for each message) from your consumer.
But, of course, ACK with multiple flag on (http://www.rabbitmq.com/amqp-0-9-1-reference.html#basic.ack) requires extra logic on your consumer application (http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2013-August/029600.html). You will have to keep a list of delivery-tags of the messages delivered from the broker, their status (whether your application has handled them or not) and ACK every N-th delivery-tag (NDTAG) when all of the messages with delivery-tag less than or equal to NDTAG have been handled.
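The bookkeeping described above can be sketched as a small tracker. `AckTracker` is a made-up helper, not part of the RabbitMQ client. It assumes delivery tags on a channel start at 1 and increase by one per delivery, and that every delivered message is eventually reported to it. When `markHandled` returns a tag, the consumer would call `channel.basicAck(tag, true)`.

```java
import java.util.TreeSet;

/** Tracks handled delivery tags and decides when one multiple-ACK can cover a batch. */
final class AckTracker {
    private final int batchSize;                            // N: ack about every N messages
    private final TreeSet<Long> handled = new TreeSet<>();  // handled but not yet acked
    private long lastAcked = 0;                             // highest tag already acked

    AckTracker(int batchSize) {
        this.batchSize = batchSize;
    }

    /**
     * Records that `deliveryTag` has been handled. Returns the tag to ACK with
     * multiple=true, or -1 if no ACK should be sent yet.
     */
    synchronized long markHandled(long deliveryTag) {
        handled.add(deliveryTag);
        // Find the highest tag such that every tag in (lastAcked, tag] has been handled.
        long contiguous = lastAcked;
        while (handled.contains(contiguous + 1)) {
            contiguous++;
        }
        if (contiguous - lastAcked < batchSize) {
            return -1;
        }
        handled.headSet(contiguous, true).clear(); // these tags are covered by the ACK
        lastAcked = contiguous;
        return contiguous;
    }
}
```

Note that the prefetch count has to be at least N, otherwise the broker stops delivering before a batch ever fills up. A timer that flushes partial batches is probably also worth adding in real code.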