I have one rabbit mq Publisher who is publishing on a Direct exchange. There are multiple rabbit mq consumers bound to the Direct exchange with different routing keys.
Few of these consumers might take more time to process the message.
My question is does one slow consumer affect the performance of other consumers even though they are bound on different routing keys ?
One slow consumer will have no affect on other consumers. Each consumer is independent and can work as fast or as slow as necessary for your application.
It will affect other consumers in the terrible case that said consumer's queue start backing up badly up to the point where you hit the server memory watermark. If that happens tho, you need to review what's going on in your system for such situation to arise.
Related
Background
I have a RabbitMQ cluster that running for more than a year without any problems. Lastly, I found that sometimes, the CPU of the machine is touching the 100% CPU. I'm investigating ways to increase the throughput of the cluster to serve more customers.
The cluster architecture is that we have HA enabled (exactly 1 replica), and durable messages (for all the queues). As I understand it, the durable feature is the most expensive one in terms of performance. So, I trying to understand if it is needed for me.
Question
According to my experience, the cluster was running for more than a year without problems. So I assume that the chance for a problem is very low. Even after this, I want to create another layer of protection, just in case...
If I have two servers that holding the same data, but not storing it into the disk (durable OFF), is not safe enough for 99.99% of the cases? Those two servers are in different regions so the chance that both of them will go down is very low. Wondering if saving it to the disk can be helpful, or just a waste?
There is a thumb rule about the performance improvements of disabling the durable feature? In percents.
Thank you!
The influence of durable on performance
For reliable delivery, rabbitmq use the publish confirmation mechanism. Everytime the publisher publish a message to rabbitmq server, the server will respond with basic.ack rpc to ack the message. For routable messages, the basic.ack is sent when a message has been accepted by all the queues. For persistent messages routed to durable queues, this means persisting to disk. For mirrored queues, this means that all mirrors have accepted the message. So as you mentioned, the IO may become bottlenect of performance.
Is it overhead both durable and mirrored
It depends on your consideration between performance and HA. Imagine if you declare non-durable mirrored queue, and the master and slave are down, your messages will get lost. So whether overhead depends on how important message safty is.
Is the performance bottleneck mainly caused by durable?
As we discussed, if you declare non-durable queue, the throught maybe increase. But this may not be the main cause of low performance. You have said the cpu usage sometimes is 100%, which means very little I/O waitting. The high load maybe due to many connections and high throughput. In order to determine how to increase throughput, you can use benchmark tool to find the bottleneck.
pages maybe useful:
https://www.cloudamqp.com/blog/2016-01-25-identify-and-protect-against-high-cpu-and-memory-usage.html
https://www.cloudamqp.com/blog/2018-01-08-part2-rabbitmq-best-practice-for-high-performance.html
I would like to create a cluster for high availability and put a load balancer front of this cluster. In our configuration, we would like to create exchanges and queues manually, so one exchanges and queues are created, no client should make a call to redeclare them. I am using direct exchange with a routing key so its possible to route the messages into different queues on different nodes. However, I have some issues with clustering and queues.
As far as I read in the RabbitMQ documentation a queue is specific to the node it was created on. Moreover, we can only one queue with the same name in a cluster which should be alive in the time of publish/consume operations. If the node dies then the queue on that node will be gone and messages may not be recovered (depends on the configuration of course). So, even if I route the same message to different queues in different nodes, still I have to figure out how to use them in order to continue consuming messages.
I wonder if it is possible to handle this failover scenario without using mirrored queues. Say I would like switch to a new node in case of a failure and continue to consume from the same queue. Because publisher is just using routing key and these messages can go into more than one queue, same situation is not possible for the consumers.
In short, what can I to cope with the failures in an environment explained in the first paragraph. Queue mirroring is the best approach with a performance penalty in the cluster or a more practical solution exists?
Data replication (mirrored queues in RabbitMQ) is a standard approach to achieve high availability. I suggest to use those. If you don't replicate your data, you will lose it.
If you are worried about performance - RabbitMQ does not scale well.
The only way I know to improve performance is just to make your nodes bigger or create second cluster. Adding nodes to cluster does not really improve things. Also if you are planning to use TLS it will decrease throughput significantly as well. If you have high throughput requirement +HA I'd consider Apache Kafka.
If your use case allows not to care about HA, then just re-declare queues/exchanges whenever your consumers/publishers connect to the broker, which is absolutely fine. When you declare queue that's already exists nothing wrong will happen, queue won't be purged etc, same with exchange.
Also, check out RabbitMQ sharding plugin, maybe that will do for your usecase.
I have set up a rabbit cluster and I publish messages into a fanout exchange every time something changes in a database.
I have dedicated queues bound to this exchange for some of my microservices that consume these updates and I also originally set up a dedicated queue for an external client so that they can federate it with their own rabbit infrastructure and consume a copy of every message.
Now I'm wondering whether allowing exchange federation rather than creating a new dedicated queue for each new external consumer would be a better approach since more and more users will come.
What are the pros and cons?
Thanks
As long as you manage permissions properly, the final decision is up to you. You can give a try to all variants first and find what will fit your actual needs.
Having local queue may have it pros and cons: it allows end-user to survive some outage with their infrastructure or network issue at the cost of your disk/memory, however, you may limit queue length and/or size.
I'd suggest you to take a look at Shovel plugin and Dynamic shovels. With local queue it may server a good job.
Comparing to federation, shovel is much simpler, e.g. it doesn't sync content between upstream and downstream but simply moves message from one queue to another in a reliable manner. As long as you don't need what federation provides, shovel could be a good choice.
Also, you may find this q/a useful (however, it might be a bit outdated) - https://stackoverflow.com/a/19357272.
When my system's data changes I publish every single change to at least 4 different consumers (around 3000 messages a second) so I want to use a message broker.
Most of the consumers are responsible to update their database tables with the change.
(The DBs are different - couch, mysql, etc therefor solutions such as using their own replication mechanism or using db triggers is not possible)
questions
Does anyone have an experience with data replication between DBs using a message broker?
is it a good practice?
What do I do in case of failures?
Let's say, using RabbitMQ, the client removed 10,000 messages from the queue, acked, and threw an exception each time before handling them. Now they are lost. Is there a way to go back in the queue?
(re-queueing them will mess their order ).
Is using rabbitMQ a good practice? Isn't the ability to go back in the queue as in Kafka important to fail scenarios?
Thanks.
I don't have experience with DB replication using message brokers, but maybe this can help put you in the right track:
2. What do I do in case of failures?
Let's say, using RabbitMQ, the client removed 10,000 messages from the
queue, acked, and threw an exception each time before handling them.
Now they are lost. Is there a way to go back in the queue?
You can use dead lettering to avoid losing messages. I'd suggest to not ack until you are sure the consumers have processed them successfully, unless it is a long-running task. In case of failure, basic.reject instead of basic.ack to send them to a dead-letter queue. You have a medium throughput, so gotta be careful with that.
However, the order is not guaranteed. You'll need to implement a manual mechanism to recover them in the order they were published, maybe by using message headers with some sort of timestamp or id mechanism, to re-process them in the correct order.
Assume you have a small rabbitmq system of 3 nodes that is supposed to handle 100+ decently high volume queues in the same exchange. Given that queues only exist on the node they are created on (we're not using replicated, High Availability queues), what's the best way to create the queues? Is there any benefit to having the queues distributed among the cluster nodes, or is it better to keep them all on one node and have rmq do the routing?
It depends on your application, really.
RabbitMQ is smart about sending messages, so it'll only send a message to a node in the cluster if
a queue that holds that message resides on that node or
if a consumer has connected to that node and has requested the message.
In general, you should aim to declare queues on the nodes on which both the publishers and the consumers for that queue will connect to. In other words, you should aim to connect publishers and consumers to the node that holds the queues they use. This assumes you're trying to conserve bandwidth used overall.
If you're using clustering to improve throughput (and you probably are), and you don't care about internal bandwidth used, you should aim to connect your publishers/consumers to the nodes in a balanced way and not worry about the internal routing mechanisms.
One last thing to think about is memory and disk-space. Queues store messages in main memory, and fallback to disk if that's insufficient. So, if you declare all your queues in one place, that'll result in one node that's "over-worked" and two nodes with memory to spare.
As part of a move towards redundancy and failover in an application I'm working on, I've just finished setting up a RabbitMQ cluster behind a proxy, and have all of my publishers and consumers connect via the proxy, which round robins connections to the individual nodes as they come in from the clients. Prior to upgrading RabbitMQ to 2.7.1, this seemed to pretty evenly distribute queues to the separate nodes, though this would of course depend pretty heavily on how your proxy balances the requests and when your clients try to connect (and declare a queue)...
Having said all that, I just upgraded to RabbitMQ 2.7.1, which was pretty painless, and gave us HA queues, which is a pretty big win for our apps. At any rate, if you're interested in the set up, and think it would be of benefit to your queue problem, I'd be happy to share the setup.