ActiveMQ performance for producing persistent text messages

As advised on the activemq-performance-module-users-manual page, I've measured (on an Intel i7 laptop with Windows 7 and an SSD drive) the performance of producing persistent messages on an ActiveMQ queue:
mvn activemq-perf:producer -Dproducer.destName=queue://TEST.FOO -Dproducer.deliveryMode=persistent
against a default installation of ActiveMQ 5.12.1.
The throughput I got is around 300-400 messages per second.
On the activemq-performance page I read much higher numbers:
When running the server on one box and a single producer and consumer thread in separate VMs on the other box, using a single topic we got around 21-22,000 messages/second using 1-2K messages.
On the other hand, when the messages are not persistent (-Dproducer.deliveryMode=nonpersistent), the producer's throughput grows to 49,000 messages per second.
When the persistent messages are sent asynchronously:
-Dproducer.deliveryMode=persistent -Dfactory.useAsyncSend=true
I get around 23,000 messages sent per second.
From what I see in stackoverflow-activemq-persistent-performance-on-different-operatiing-systems, the OS that ActiveMQ runs on makes a difference.
Can somebody give me some tips for getting better performance when writing persistent ActiveMQ messages?

Performance of sending persistent messages is all about disk-based I/O, since the message must be written to disk before the broker signals to the client that the send has completed. All else being equal, the faster the disk, the better your throughput will be.
To work around some of this, you can send persistent messages in transactional batches, so that each individual send completes without waiting for the disk and the synchronization point moves to the transaction boundary.
Depending on the size of the text messages, you can also gain some performance by using compression, which can be turned on via an option on the ActiveMQConnectionFactory (the useCompression property).
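To make the disk-bound argument concrete, here is a back-of-the-envelope calculation, assuming a single producer thread whose synchronous sends are fully serialized (each send waits for the broker's acknowledgement before the next one starts). Under that assumption the 300-400 msg/s figure leaves roughly 2.5-3.3 ms per send, which is plausibly dominated by the disk sync, while the async and non-persistent numbers leave only tens of microseconds per message.

```python
def implied_ms_per_send(msgs_per_sec: float) -> float:
    """Average wall-clock time per send, in ms, for ONE producer whose sends are serialized."""
    return 1000.0 / msgs_per_sec

# Throughput figures quoted in the question above.
figures = [
    ("persistent, sync", 300),
    ("persistent, sync", 400),
    ("persistent, async", 23_000),
    ("non-persistent", 49_000),
]
for label, rate in figures:
    print(f"{label:18s} {rate:>6} msg/s -> {implied_ms_per_send(rate):.3f} ms per send")
```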
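A minimal sketch of the batching idea, written library-free so the control flow is visible. The `send` and `commit` callables are hypothetical stand-ins; in JMS they would roughly correspond to a producer created from a transacted session (`Connection.createSession(true, Session.SESSION_TRANSACTED)`) and `Session.commit()`. The point is that the expensive synchronization happens once per batch instead of once per message.

```python
from typing import Callable, Iterable

def send_in_transactions(messages: Iterable[str],
                         batch_size: int,
                         send: Callable[[str], None],
                         commit: Callable[[], None]) -> int:
    """Send messages, committing once every batch_size messages.

    `send` and `commit` are hypothetical stand-ins for a transacted session's
    producer send and session commit. Returns the number of commits issued.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    commits = 0
    pending = 0
    for msg in messages:
        send(msg)              # inside a transaction this need not wait for the disk
        pending += 1
        if pending == batch_size:
            commit()           # the synchronization cost is paid here, once per batch
            commits += 1
            pending = 0
    if pending:                # commit a final partial batch
        commit()
        commits += 1
    return commits

sent = []
n_commits = send_in_transactions((f"msg-{i}" for i in range(1001)), 100,
                                 sent.append, lambda: None)
print(len(sent), "messages,", n_commits, "commits")
```

The batch size is a trade-off: larger batches amortize the sync better but hold more uncommitted messages that are lost (rolled back) if the producer fails mid-batch.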
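Whether compression pays off depends on how compressible the payloads are. A quick way to estimate that is to compress a few representative messages offline. This sketch uses Python's zlib (DEFLATE) purely as a stand-in to measure compressibility; it is not claimed to be the exact codec or framing ActiveMQ uses, and the sample payloads are hypothetical.

```python
import zlib

def compression_ratio(text: str) -> float:
    """Compressed size / original size for a UTF-8 text payload (lower is better)."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)

# Hypothetical payloads: a repetitive XML-like message and a very small one.
repetitive = "<order><item>widget</item><qty>1</qty></order>\n" * 40
tiny = "ok"
print(f"repetitive: {compression_ratio(repetitive):.2f}")
print(f"tiny:       {compression_ratio(tiny):.2f}")
```

Very small messages typically get larger after compression because of the fixed header overhead, so compression mostly helps with larger, repetitive text bodies.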

Related

How to increase low RabbitMQ publish rates

I'm using RabbitMQ 3.6.10.
The machine has 16GB of RAM and 4 cores, and I set the memory high watermark to 6GB.
I'm trying to perform some tests on Rabbit, with 1 publisher and no consumers.
When creating 1 connection with 1 channel publishing unlimited messages one after another, the management UI shows an average publish rate of ~4,500/s.
When I increase the number of channels/connections and publish in parallel in various combinations, it still doesn't go above ~4,500/s.
I've seen many benchmarks that report many more messages per second.
I can't figure out what causes the bottleneck. Any ideas?
In addition, when using many channels with many messages, I reach a point where Rabbit's RAM is full and it blocks the publishers from publishing more messages. This is good behavior, but the problem is that Rabbit stops writing to disk and stays stuck in this state forever. Any ideas?

Flow control limiting message rate on a single queue

I have an exchange with only one queue bound to it. When the message publishing rate goes over some cap, RabbitMQ automatically throttles the incoming message rate.
On further investigation I found this happens due to the "flow control" throttling mechanism built into RabbitMQ. https://www.rabbitmq.com/blog/2014/04/14/finding-bottlenecks-with-rabbitmq-3-3/
According to this document, my connections and channels are in flow control but the queue is not, which means there is a CPU-bound or disk-bound limit.
My messages are not persistent, so I don't have a disk limitation. Searching around, I found documents stating that a queue is limited to a single CPU. https://groups.google.com/forum/#!msg/rabbitmq-users/wzHMV7F0ugU/zhW_9b8ACQAJ
What does that mean? Does a RabbitMQ queue process use only 1 CPU even when multiple cores are available on the machine? What is the CPU limitation with respect to queue flow control?
A queue is handled by one and only one CPU core, which means you have to design your message flow through Rabbit with multiple queues in order to remain scalable.
If you use only one queue, you will be limited to a maximum message rate no matter whether you have 1 or more cores.
https://www.rabbitmq.com/queues.html#runtime-characteristics
If you have a specific need to build an architecture with only one logical queue (which is explicitly not recommended), or if you have a queue with really high traffic, you can look at sharded queues here: Github Sharded queues Plugin
It's a plugin (take it with caution and test everything before going to production, especially failure and replication) that splits a logical queue name into multiple queues.
If you are running a benchmark on RabbitMQ, remember to produce and consume on a number of queues greater than the number of CPU cores on the server.
Other benchmark tips: try producing only, consuming only, and both at the same time, with different settings (persistence, message size, lazy queues, ...) and ack settings.
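A minimal client-side sketch of that advice, assuming you control the routing: use one more queue than there are CPU cores and route each message by a stable hash of its key. This is not how the Sharded queues plugin works internally; it only illustrates spreading load over more queues than cores. The helper names and the `bench.<n>` queue naming are hypothetical, and `os.cpu_count()` reads the core count of the machine the script runs on, so run it on the broker host or substitute the broker's core count.

```python
import os
import zlib
from collections import Counter

def queue_count() -> int:
    """One more queue than CPU cores, so queue processes can be spread over every core."""
    return (os.cpu_count() or 1) + 1

def pick_queue(routing_key: str, n_queues: int, prefix: str = "bench") -> str:
    """Map a key to one of n_queues queue names using a stable hash.

    zlib.crc32 is used instead of hash(), which is salted per Python process.
    The '<prefix>.<n>' naming scheme is hypothetical.
    """
    return f"{prefix}.{zlib.crc32(routing_key.encode('utf-8')) % n_queues}"

n = queue_count()
spread = Counter(pick_queue(f"msg-{i}", n) for i in range(10_000))
print(f"{n} queues, busiest gets {max(spread.values())}, quietest gets {min(spread.values())}")
```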

How to ensure flow control in RabbitMQ is never triggered?

I have a publisher pushing to a queue at a slightly higher rate than the consumers can consume. For small numbers it is okay, but for a very large number of messages RabbitMQ starts writing them to disk. At a certain point the disk becomes full and flow control is triggered. From then on, the rates are really slow. Is there any way to decrease this load or share it between cluster nodes? How should I design my application so that flow control is never triggered?
I am using RabbitMQ 3.2.3 on three nodes with 13G RAM and 10G of system disk space, connected to each other in a cluster. Two of these are RAM nodes, and the remaining one is a disk node, which also runs the RabbitMQ management plugin.
You can tweak the configuration, upgrade hardware, etc., and in the end you'd probably want to put a load balancer in front of your RabbitMQ servers to balance the load between multiple RabbitMQ nodes. The problem is that if you are publishing at a higher rate than you are consuming, you will eventually run into this problem again and again.
I think the best way to prevent this from happening is to implement logic on the publisher side that keeps track of the number of requests waiting to be processed in the queue. If the number of requests exceeds X, the publisher should either wait until the number of messages has gone down or publish new messages at a slower rate. This kind of solution of course depends on where the published messages come from; if they are user submitted (e.g. through a browser or client), you could show a loading bar when the queue builds up.
Ideally, though, you should focus on making processing on the consumer side faster, and maybe scale that part up, but having something to throttle the publisher when it gets busy should help prevent build-ups.
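A minimal sketch of that publisher-side throttle, kept independent of any client library. The `publish` and `queue_depth` callables are hypothetical; with a real AMQP 0-9-1 client, `queue_depth` could wrap a passive queue declaration, whose declare-ok reply carries the current message count. `max_backlog` plays the role of the "X" above.

```python
import time
from typing import Callable

class ThrottledPublisher:
    """Publisher that backs off while the queue backlog is at or above a threshold.

    `publish` and `queue_depth` are hypothetical callables supplied by the caller.
    """

    def __init__(self, publish: Callable[[bytes], None],
                 queue_depth: Callable[[], int],
                 max_backlog: int,
                 poll_interval_s: float = 0.5) -> None:
        self._publish = publish
        self._queue_depth = queue_depth
        self.max_backlog = max_backlog        # the "X" from the answer
        self.poll_interval_s = poll_interval_s
        self.waits = 0                        # how many times we had to back off

    def publish(self, body: bytes) -> None:
        # Block until consumers have drained the queue below the threshold.
        while self._queue_depth() >= self.max_backlog:
            self.waits += 1
            time.sleep(self.poll_interval_s)
        self._publish(body)
```

Querying the depth before every message adds a broker round trip per publish; in practice you would probably check it every N messages or on a timer instead.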

Distributed cluster performance questions

I'm using 6 servers to form a cluster, and they are all disk nodes. I use RabbitMQ to collect log files for our website. At peak hour the publish rate is about 30k messages per second. There are 2 main consumers (HDFS and Elasticsearch) and each one needs to handle all messages, so the delivery rate hits about 60k per second.
In my scenario, a single server can handle a delivery rate of 10k/s, and I use 6 nodes to spread the load. My solution is to create 2 queues on each node. Each message gets a random routing key (something like message.0, message.1, etc.) to distribute the load to every node.
What confuses me is:
All messages are sent to one node. Should I use HAProxy to load balance this publish load?
Is there any performance difference between Durable Queues and Transient Queues?
Is there any performance difference between Memory Node and Disk Node? What I know is that the difference between a memory node and a disk node is only about metadata, such as queue configuration.
How can I improve performance in the publishing and delivery code? I've researched this and know several methods:
disable the confirm mechanism (in the publishing code?)
enable HiPE (I've done that and it helped a lot)
For example, if the input is 10k mps (messages per second) and there are two consumers that each consume all messages, the output is 20k mps. If my server can handle 10k mps, I need two servers to handle the 20k mps load. Now a new consumer needs to consume all messages too. As a result, output hits 30k mps, so I need one more server. In conclusion: one more consumer of all messages means one more server?
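The arithmetic in the question can be written down directly. This sketch assumes the question's own linear model: every consumer receives every message, and per-server delivery rate is the only bottleneck (it ignores publish-side cost, replication and confirms). Under that model each extra full consumer adds publish_rate / per_server_delivery servers, which is one server in the 10k example but three at the stated 30k/s peak.

```python
import math

def servers_needed(publish_rate: float, full_consumers: int,
                   per_server_delivery: float) -> int:
    """Servers required when each consumer gets every message and delivery is the bottleneck."""
    delivery_rate = publish_rate * full_consumers
    return math.ceil(delivery_rate / per_server_delivery)

# The question's example: 10k mps in, 10k mps delivery capacity per server.
print(servers_needed(10_000, 2, 10_000), servers_needed(10_000, 3, 10_000))
# The question's real workload: 30k mps in at peak, 10k delivery per server.
print(servers_needed(30_000, 2, 10_000), servers_needed(30_000, 3, 10_000))
```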
"All messages are sent to one node. Should I use HAProxy to load balance this publish load?"
This article outlines a number of designs aimed at distributing load in RabbitMQ.
"Is there any performance difference between Durable Queues and Transient Queues?"
Yes. Durable queues are persisted to disk so that they can be restored on server restart, for example. This adds a nominal overhead, though the actual process occurs asynchronously.
"Is there any performance difference between Memory Node and Disk Node?"
Not that I'm aware of, but that would depend on the machine itself.
"How can I improve performance in the publishing and delivery code?"
Try this out.

How to recover from message store exhaustion?

When an ActiveMQ broker gets flooded with messages or the consumer fails, it stops accepting messages once certain (configurable) limits are reached. In broker networks this effect can take down the whole cluster.
I'm currently using the default configuration for memory limits and experience the following behavior:
consumer fails or becomes very slow (known problem)
broker A (the one the consumer connects to) gets filled and stops accepting messages
all other brokers fill up and stop accepting messages
the cluster is basically down
if the consumer comes back online now, it will try to reconnect to one of the cluster nodes, but the nodes will not accept the connection because it would create advisory messages that can't be handled, since the broker is already full.
How do I have to configure the memory limits so that my production destinations are limited and blocked but the broker is still able to accept advisories, so my consumer can recover?
You should be able to use producerFlowControl to slow producers down so they don't overwhelm your broker. That said, it is enabled by default, so you are likely using it already...
I would try something like this (assuming an 8GB box or so)...
use the failover transport everywhere (broker/client connections)
increase JVM heap to 4 GB
increase systemUsage limits substantially (memoryUsage 3 GB, storeUsage/tempUsage 10 GB)
enable producer flow control on both topics and queues
set the per-destination memory limit to 2 GB divided by the total number of topics + queues
in other words, the total of these limits should be substantially less than the memoryUsage limit
exclude the Advisory topics from the producer flow control (they might be already)
This should limit the producers and leave resources for your system to function/recover/accept consumer connections...
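The sizing rule in the list above is simple arithmetic; this sketch just makes the relationships explicit. The 4 GB / 3 GB / 2 GB figures come from the answer, while the topic and queue counts are hypothetical. The resulting value is what you would presumably set as the memoryLimit on each destination policyEntry.

```python
GB = 1024 ** 3

def per_destination_limit(total_budget_bytes: int, n_topics: int, n_queues: int) -> int:
    """Split a memory budget evenly across all topics and queues (rounded down)."""
    n = n_topics + n_queues
    if n < 1:
        raise ValueError("need at least one destination")
    return total_budget_bytes // n

# Figures from the answer: 4 GB heap, memoryUsage 3 GB, 2 GB shared by all destinations.
heap, memory_usage, dest_budget = 4 * GB, 3 * GB, 2 * GB
n_topics, n_queues = 10, 40          # hypothetical destination counts
limit = per_destination_limit(dest_budget, n_topics, n_queues)

# Sum of per-destination limits stays well below memoryUsage, leaving headroom for advisories.
assert limit * (n_topics + n_queues) <= dest_budget < memory_usage < heap
print(f"per-destination memoryLimit ~ {limit / 1024 ** 2:.1f} MB")
```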