How to ensure flow control in RabbitMQ is never triggered? - rabbitmq

I have a publisher pushing to a queue at a slightly larger rate than the consumers can consume. For small numbers, it is okay, but for a very large number of messages, RabbitMQ starts writing it to the disk. At a certain point of time, the disk becomes full, and flow control is triggered. From then on, the rates are really slow. Is there any way to decrease or share this load between cluster nodes? How should I design my application so that flow control is never triggered?
I am using RabbitMQ 3.2.3 on three nodes with 13G RAM, and 10G of system disk space - connected to each other through the cluster. Two of these are RAM nodes, and the remaining one is a disk node, also used for RabbitMQ management plugin.

You can tweak the configuration, upgrade hardware etc and in the end you'd probably want to put a load balancer in front of your RabbitMQ servers to balance the load between multiple RabbitMQ nodes. The problem here is that if you are publishing at a higher rate than you are consuming, eventually you will run into this problem again, and again.
I think the best way to prevent this from happening is to implement logic on the publisher side that keeps track of the number of requests waiting to be processed in the queue. If the number of requests exceeds X the publisher should either wait until the number of messages has gone down, or publish new messages at a slower rate. This type of solution of course depends on where the messages published are coming from, if they are user submitted (e.g. through a browser or client) you could show a loading-bar when the queue builds-up.
Ideally though you should focus on making the processing on the consumer side faster, and maybe scale that part up, but having something to throttle the publisher when it gets busy should help prevent buildups.

Related

RabbitMQ slowing down after some times

On our RabbitMQ installed in production, we have a performance issue.
To explain the context, we have an initialization batch that creates around ~60k messages. For business reasons, those messages must be treated in strict order and we can't lose any. As such, we have only one queue which is durable and lazy and one consumer (SpringBoot AMQP) with a prefetch of 10. Both are on the same virtual machine.
At first, the processing is fast enough, around 5 to 10 messages per second. But it progressively slows down until it reaches a cap of fewer than 20 messages per hour. It takes approximately 1 hour to reach this point.
After some investigations, we found out that the problem comes from RabbitMQ. When we simply stop and restart it, the performance goes back to normal and then drops slowly again. Doing the same on just the consumer doesn't change anything.
I'm thinking about some resources bottleneck but I can't manage to find which one as RAM, CPU, and disk looks fine. I am not really familiar with ERL virtual machine and managing RabbitMQ itself so I may have missed something.
Does someone as an idea of the source of the problem or where I could look for more information on what is happening?
RabbitMQ characteristics :
ERL 23.3.2
RabbitMQ 3.8.14

server-to-server multicast messaging with Google Cloud PubSub?

I have a cluster of backend servers on GCP, and they need to send messages to each other. All the servers need to receive every message, but I can tolerate a low error rate. I can deal with receiving the message more than once on a given server. Packet ordering doesn't matter.
I don't need much of a persistence layer. A message becomes stale within a couple of seconds after sending it.
I wired up Google Cloud PubSub and pretty quickly realized that for a given subscription, you can have any number of subscribers but only one of them is guaranteed to get the message. I considered making the subscribers all fail to ack it, but that seems like a gross hack that probably won't work well.
My server cluster is sized dynamically by an autoscaler. It spins up VM instances as needed, with dynamic hostnames and IP addresses. There is no convenient way to map the dynamic hosts to static subscriptions, but it feels like that's my only real option: Create more subscriptions than my max server pool size, and then use some sort of paxos system (runtime config, zookeeper, whatever) to allocate servers to subscriptions.
I'm starting to feel that even though my use case feels really simple ("Every server can multicast a message to every other server in my group"), it may not be a good fit for Cloud PubSub.
Should I be using GCM/FCM? Or some other technology?
Cloud Pub/Sub may or may not be a fit for you, depending on the size of your server cluster. Failing to ack the messages certainly won't work because you can't be sure each instance will get the message; it could just be redelivered to the same instance over and over again.
You could use multiple subscriptions and have each instance create a new subscription when it starts up. This only works if you don't plan to scale beyond 10,000 instances in your cluster, as that is the maximum number of subscriptions per topic allowed. The difficulty here is in cleaning up subscriptions for instances that go down. Ones that cleanly shut down could probably delete their own subscriptions, but there will always be some that don't get cleaned up. You'd need some kind of external process that can determine if the instance for each subscription is still up and running and if not, delete the subscription. You could use GCE shutdown scripts to catch this most of the time, though there will still be edge cases where deletes would have to be done manually.

ActiveMQ performance for producing persistent text messages

As advised on the webpage
activemq-performance-module-users-manual I've tried (on an Intel i7 laptop with Windows 7 OS and SSD drive) the performance of producing persistent messages on a ActiveMQ Queue :
mvn activemq-perf:producer -Dproducer.destName=queue://TEST.FOO -Dproducer.deliveryMode=persistent
against the default installation of activemq 5.12.1
The performance which I got is around 300-400 messages per second.
On the page activemq-performance I have been reading much higher numbers:
When running the server on one box and a single producer and consumer thread in separate VMs on the other box, using a single topic we got around 21-22,000 messages/second using 1-2K messages.
On the other hand, when the messages are not persistent, the performance of the producer grows to 49000 messages per second. -Dproducer.deliveryMode=nonpersistent
When the messages are sent asynchrounously.
-Dproducer.deliveryMode=persistent -Dfactory.useAsyncSend=true
I get around 23000 messages sent per second.
From what I see here stackoverflow-activemq-persistent-performance-on-different-operatiing-systems it makes a difference when running activemq on different OS.
Can somebody give me some tips for having a better performance for writing persistent activemq messages?
Performance of sending persistent messages is all about disk based IO as the message must be written to the disk prior to the broker signalling the client that the message send completed. The faster the disk the better your throughput will be, all else being equal.
To work around some of this you can send persistent messages in transactional batches so that the send itself is complete and the synchronization point is reduced to the transaction boundary.
Depending on the size of the text messages you can also gain some performance by using compression, this can be turned on via a option in the ActiveMQConnectionFactory.

NServiceBus Pub/Sub Distributor/Worker Scenario Too Slow

I am working on a proof of concept implementation of NServiceBus v4.x for work.
Right now I have two subscribers and a single publisher.
The publisher can publish over 500 message per second. It runs great.
Subscriber A runs without distributors/workers. It is a single process.
Subscriber B runs with a single distributor powering N number of workers.
In my test I hit an endpoint that creates and publishes 100,000 messages. I do this publish with the subscribers off line.
Subscriber A processes a steady 100 messages per second.
Subscriber B with 2+ workers (same result with 2, 3, or 4) struggles to top 50 messages per second gross across all workers.
It seems in my scenario that the workers (which I ramped up to 40 threads per worker) are waiting around for the distributor to give them work.
Am I missing something possibly that is causing the distributor to be throttled? All Buses are running an unlimited Dev license.
System Information:
Intel Core i5 M520 # 2.40 GHz
8 GBs of RAM
SSD Hard Drive
UPDATE 08/06/2013: I finished deploying the system to a set of servers. I am experiencing the same results. Every server with a worker that I add decreases the performance of the subscriber.
Subscriber B has a distributor on one server and two additional servers for workers. With Subscriber B and one server with an active worker I am experiencing ~80 messages/events per second. Adding in another worker on an additional physical machine decreases that to ~50 messages per second. Also, these are "dummy messages". No logic actually happens in the handlers other than a log of the message through log4net. Turning off the logging doesn't increase performance.
Suggestions?
If you're scaling out with NServiceBus master/worker nodes on one server, then trying to measure performance is meaningless. One process with multiple threads will always do better than a distributor and multiple worker nodes on the same machine because the distributor will become a bottleneck while everything is competing for the same compute resources.
If the workers are moved to separate servers, it becomes a completely different story. The distributor is very efficient at doling out messages if that's the only thing happening on the server.
Give it a try with multiple servers and see what happens.
Rather than have a dummy handler that does nothing, can you simulate actual processing by adding in some sleep time, say 5 seconds. And then compare the results of having a subscriber and through the distributor?
Scaling out (with or without a distributor) is only useful for where the work being done by a single machine takes time and therefore more computing resources helps.
To help with this, monitor the CriticalTime performance counter on the endpoint and when you have the need, add in the distributor.
Scaling out using the distributor when needed is made easy by not having to change code, just starting the same endpoint in distributor and worker profiles.
The whole chain is transactional. You are paying heavy for this. Increasing the workload across machines will really not increase performance when you do not have very fast disk storage with write through caching to speed up transactional writes.
When you have your poc scaled out to several servers just try to mark a messages as 'Express' which does not do transactional writes in the queue and disable MSDTC on the bus instance to see what kind of performance is possible without transactions. This is not really usable for production unless you know where this is not mandatory or what is capable when you have a architecture which does not require DTC.

distributed cluster questions about performance

I'm using 6 servers to make a cluster and they are all disk nodes. I use rabbitmq for collecting log file for our website. Now at the peak hour, the publish rate is about 30k message per second. There are 2 main consumers(hdfs and elasticsearch) and each one need to handle all message, so the delivery rate hit about 60k per second.
In my scenario, a single server can hold 10k delivery rate and I use 6 node to load balance the pressure. My solution is that I created 2 queues on each node. Each message is with a random routing-key(something like message.0, message.1, etc) to distribute the pressure to every node.
What confused me is:
All message send to one node. Should I use a HA Proxy to load balance this publish pressure?
Is there any performance difference between Durable Queues and Transient Queues?
Is there any performance difference between Memory Node and Disk Node? What I know is the difference between memory node and disk node is only about the meta data such as queue configuration.
How can I imrove the performance in publish and delivery codes? I've researched and I know several methods:
disable the confirm mechanism(in publish codes?)
enable HiPE(I've done that and it helped a lot)
For example, input is 1w mps(message per second), there are two consumers to consume all message. Then the output is 2w mps. If my server can handle 1w mps, I need two server to handle the 2w-mps-pressure. Now a new consumer need to consume all message, too. As a result, output hits 3w mps, so I need another one more server. For a conclusion, one more consumer for all message, one more server?
"All message send to one node. Should I use a HA Proxy to load balance this publish pressure?"
This article outlines a number of designs aimed at distributing load in RabbitMQ.
"Is there any performance difference between Durable Queues and Transient Queues?"
Yes, Durable Queues are backed up to disk so that they can be reinstated on server-restart, for example. This adds a nominal overhead, though the actual process occurs asynchronously.
"Is there any performance difference between Memory Node and Disk Node?"
Not that I'm aware of, but that would depend on the machine itself.
"How can I imrove the performance in publish and delivery codes?"
Try this out.