RabbitMQ can deliver one message to several queues through an exchange. When queueA and queueB, which live on different nodes, each receive the same message, do the two nodes store the message separately on their own disks, or is there a common database where the message is stored once and shared between the nodes?
Nodes in a RabbitMQ cluster do not share a common message database.
Each node has its own local message store.
If you want to learn more about that, I suggest reading:
https://github.com/rabbitmq/internals/blob/master/queues_and_message_store.md
While looking into the design of my new company's push applications, which are backed by RabbitMQ, I found that some queues receive millions, or even hundreds of millions, of messages for a single push task. Say the queue is named PUSH_QUEUE.
I wonder whether I would benefit from splitting this queue into several pieces, and why:
PUSH_QUEUE_1
PUSH_QUEUE_2
PUSH_QUEUE_3
PUSH_QUEUE_4
PUSH_QUEUE_5
The producer would send to these shard queues in round-robin fashion, and the consumer would subscribe to all of them.
We don't specify any exchange other than the default one.
This might be helpful for you: RabbitMQ-Sharding.
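If you go with the manual approach from the question instead of the sharding plugin, the producer side could look roughly like the sketch below. This assumes Spring AMQP; the ShardedPublisher class and the shard count are illustrative, only the PUSH_QUEUE_* names come from the question.

import java.util.concurrent.atomic.AtomicLong;

import org.springframework.amqp.core.AmqpTemplate;

// Minimal sketch of the manual sharding idea: publish to PUSH_QUEUE_1..PUSH_QUEUE_5
// in round-robin fashion via the default exchange.
public class ShardedPublisher {

    private static final int SHARDS = 5;

    private final AmqpTemplate template;          // e.g. a configured RabbitTemplate
    private final AtomicLong counter = new AtomicLong();

    public ShardedPublisher(AmqpTemplate template) {
        this.template = template;
    }

    public void publish(Object payload) {
        // The default exchange routes by queue name, so the routing key is the shard queue.
        long n = counter.getAndIncrement();
        String queue = "PUSH_QUEUE_" + (n % SHARDS + 1);
        template.convertAndSend("", queue, payload);
    }
}

The consumer side then simply subscribes to all five queues, for example by listing them in one listener container.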
If I have two queues from which I want to consume messages, and I use a single SimpleMessageListenerContainer for both, in what order will the listeners be invoked and the messages consumed when both queues have messages?
I will try to be more specific about the problem I am working on:
I have a consumer application which needs to consume messages from two queues, say regular-jobs-queue and infrequent-jobs-queue. If there are any messages in infrequent-jobs-queue, I want to consume those before consuming messages from regular-jobs-queue. I may not be able to combine these into a single RabbitMQ priority queue and assign higher priority to infrequent-job messages, because of upcoming use cases such as purging regular jobs without affecting infrequent jobs.
I am aware that RabbitMQ supports consumer priority, but I am not sure it applies here: I want all instances of my consumer application to first consume messages from infrequent-jobs-queue if there are any, not to prioritize among the consumers themselves.
Or should I have two containers, with dedicated consumer thread(s) per queue, and an internal priority-queue data structure into which I put messages as they are consumed from RabbitMQ?
Any help would be really appreciated. Thanks.
~Rashida
You can't do what you want; messages will be delivered with equal priority.
Moving them to an internal in-memory queue will risk message loss.
You might want to consider using one of the RabbitTemplate.receive() or receiveAndConvert() methods instead of a message-driven container.
That way you have complete control.
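For example, a polling loop along these lines. This is only a sketch: the PriorityPoller class, the back-off, and the process() handler are hypothetical; only RabbitTemplate.receive() and the two queue names come from the discussion above.

import org.springframework.amqp.core.Message;
import org.springframework.amqp.rabbit.core.RabbitTemplate;

// Sketch: always drain infrequent-jobs-queue before pulling from regular-jobs-queue.
public class PriorityPoller implements Runnable {

    private final RabbitTemplate template;

    public PriorityPoller(RabbitTemplate template) {
        this.template = template;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            // receive() returns null immediately if the queue is empty.
            Message message = template.receive("infrequent-jobs-queue");
            if (message == null) {
                message = template.receive("regular-jobs-queue");
            }
            if (message == null) {
                sleepQuietly(500);   // nothing to do, back off briefly
                continue;
            }
            process(message);        // hypothetical business logic
        }
    }

    private void process(Message message) {
        // handle the job here
    }

    private void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

receiveAndConvert() works the same way but hands you the converted payload instead of the raw Message.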
I have middleware based on Apache Camel that performs a transaction like this:
from("amq:job-input")
to("inOut:businessInvoker-one") // Into business processor
to("inOut:businessInvoker-two")
to("amq:job-out");
Currently it works perfectly, but I can't scale it up, let's say from 100 TPS to 500 TPS. To speed up the transaction I have already raised the concurrent consumers setting, used an empty businessProcessor, and configured JAVA_XMX and PERMGEN.
According to the ActiveMQ web console, at 500 TPS there are a lot of messages waiting to be processed. I guess one solution is to scale up ActiveMQ, so I want to use multiple brokers in a cluster.
According to http://fuse.fusesource.org/mq/docs/mq-fabric.html (section "Topologies"), configuring ActiveMQ in clustering mode is only suitable for non-persistent messages. IMHO that is true as long as all running brokers use the same store file. But what about separating the store files? That should be possible now, right?
Could anybody explain this? If it's not possible, what is the best way to load-balance persistent messages?
Thanks
You can share the load of persistent messages by creating two master/slave pairs. The master and slave share their state either through a database or a shared filesystem, so you need to duplicate that setup.
Create two master/slave pairs and configure so-called "network connectors" between them. This will double your performance without the risk of losing messages.
See http://activemq.apache.org/networks-of-brokers.html
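As a rough illustration only, here is what one broker of pair A could look like when embedded via the BrokerService API; the broker names, hosts, and ports are placeholders, and a production setup would normally express the same thing in activemq.xml.

import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.network.NetworkConnector;

// Sketch: master of pair A with a duplex network connector to pair B.
public class PairABroker {

    public static void main(String[] args) throws Exception {
        BrokerService broker = new BrokerService();
        broker.setBrokerName("pairA-master");
        broker.setPersistent(true);                 // the store shared with pairA-slave is set via a persistence adapter
        broker.addConnector("tcp://0.0.0.0:61616"); // client transport

        // Forward messages to/from pair B; the static list covers its master and slave.
        NetworkConnector nc = broker.addNetworkConnector(
                "static:(tcp://pairB-master:61616,tcp://pairB-slave:61616)");
        nc.setDuplex(true);

        broker.start();
        broker.waitUntilStopped();
    }
}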
This answer relates to a version of the question before the Camel details were added.
It is not immediately clear what exactly it is that you want to load balance and why. Messages across consumers? Producers across brokers? What sort of concern are you trying to address?
In general you should avoid using networks of brokers unless you are trying to address some sort of geographical use case, have too many connections for a single broker to handle, or a single broker (which could be a pair of brokers configured in HA) is not giving you the throughput that you require (in 90% of cases it will).
In a broker network, each node has its own store and passes messages around by way of a mechanism called store-and-forward. Have a read of Understanding broker networks for an explanation of how this works.
ActiveMQ already works as a kind of load balancer by distributing messages evenly, in round-robin fashion, among the subscribers on a queue. So if you have two subscribers on a queue and send it a stream of messages A, B, C, D, one subscriber will receive A and C while the other receives B and D.
If you want to take this a step further and group related messages on a queue so that they are processed consistently by only one subscriber, you should consider Message Groups.
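On the producer side that boils down to setting the JMSXGroupID property. The snippet below is only illustrative: the broker URL, queue name, and group id are placeholders.

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;

import org.apache.activemq.ActiveMQConnectionFactory;

// Sketch: all messages carrying the same JMSXGroupID are dispatched to the same consumer.
public class GroupedProducer {

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        try {
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(session.createQueue("ORDERS"));

            TextMessage message = session.createTextMessage("order update");
            message.setStringProperty("JMSXGroupID", "order-42"); // group related messages
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}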
Adding consumers might help up to a point (depending on the number of cores/CPUs your server has). Adding threads beyond the point where your "Camel server" is using all available CPU for the business processing makes no sense and can be counterproductive.
Adding more ActiveMQ machines is probably needed. You can use an ActiveMQ "network" to communicate between instances that have separate persistence files. It should be straightforward to add more brokers and put them into a network.
Make sure you performance-test along the way so you know what kind of load the broker can handle and what load the Camel processor can handle (if they run on different machines).
When you do persistent messaging, you likely also want transactions. Make sure you are using them.
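As a minimal sketch, local JMS transactions can be switched on at the consuming endpoint with the transacted option of the Camel JMS/ActiveMQ component (the endpoint and invoker names follow the question; whether local transactions are enough depends on your failure model):

from("amq:job-input?transacted=true")
    .to("inOut:businessInvoker-one")
    .to("inOut:businessInvoker-two")
    .to("amq:job-out");

With this in place, a failed exchange is rolled back and redelivered by the broker instead of being lost.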
If all running brokers use the same store file or tx-supported database for persistence, then only the first broker to start will be active, while others are in standby mode until the first one loses its lock.
If you want to load-balance your persistence, there are two approaches you could try:
1. Configure several brokers in network-bridge mode, then send messages to any one of them and consume messages from more than one of them. This load-balances both the brokers and the persistence.
2. Override the persistenceAdapter and use database-sharding middleware (such as tddl: https://github.com/alibaba/tb_tddl) to store the messages in partitions.
Your first step is to increase the number of workers processing from ActiveMQ. The way to do this is to add the ?concurrentConsumers=10 option to the consuming URI. The default behaviour is that only one thread consumes from that endpoint, leading to a pile-up of messages in ActiveMQ. Adding more brokers won't help.
Secondly, what you appear to be doing could benefit from a Staged Event-Driven Architecture (SEDA). In a SEDA, processing is broken down into a number of stages which can have different numbers of consumers on them to even out throughput. The threads consuming from ActiveMQ do only one step of the process, hand the Exchange off to the next stage, and go back to pulling messages from the input queue.
Your route can therefore be rewritten as two smaller routes:
from("activemq:input?concurrentConsumers=10").id("FirstPhase")
.process(businessInvokerOne)
.to("seda:invokeSecondProcess");
from("seda:invokeSecondProcess?concurentConsumers=20").id("SecondPhase")
.process(businessInvokerTwo)
.to("activemq:output");
The two stages can have different numbers of concurrent consumers so that the rate of message consumption from the input queue matches the rate of output. This is useful if one of the invokers is much slower than another.
The seda: endpoint can be replaced with another intermediate activemq: endpoint if you want message persistence.
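For example, the same two stages with a persistent hand-off (route ids and invokers follow the earlier sketch; the intermediate queue name is illustrative):

from("activemq:input?concurrentConsumers=10").id("FirstPhase")
    .process(businessInvokerOne)
    .to("activemq:invokeSecondProcess");
from("activemq:invokeSecondProcess?concurrentConsumers=20").id("SecondPhase")
    .process(businessInvokerTwo)
    .to("activemq:output");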
Finally to increase throughput, you can focus on making the processing itself faster, by profiling the invokers themselves and optimising that code.
On this link it says "The current implementation will simply broadcast all the publish messages to all the other nodes" and adds that this will be improved in the future.
For the current implementation: if losing messages is not important, does it make sense to use Redis for pub/sub for now? It looks like a single instance would be better, to avoid the broadcast traffic, because besides writes, reads would have to be propagated to the other nodes too (so that the client is not notified twice).
Am I missing something?
No, I don't think you missed anything. Redis Cluster is a work in progress, and this includes the specification. The section about pub/sub is rather light and could probably be improved.
In Salvatore's proposal, a client is subscribed on a single instance (not on all of them), so when publications are broadcast to all instances, the client is only notified once. If that Redis instance goes down, it is up to the client to subscribe on one of the surviving nodes of the cluster (any of them).
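As a hypothetical Java sketch with the Jedis client (host, port, and channel name are placeholders): the client subscribes on just one node, and because every publication is broadcast to all nodes, it still sees each message exactly once.

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

// Sketch: subscribe on a single cluster node only.
public class SingleNodeSubscriber {

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("node-1.example.com", 6379)) { // any one node
            jedis.subscribe(new JedisPubSub() {
                @Override
                public void onMessage(String channel, String message) {
                    System.out.println(channel + ": " + message);
                }
            }, "notifications");  // blocks while subscribed
        }
        // If that node goes down, the client must re-subscribe on a surviving node.
    }
}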
Another possibility would have been to elect one node of the cluster as a unique pub/sub node, so that clients can publish and subscribe on this node only. But high-availability of the pub/sub service would be more difficult to support this way.