I have a Service Fabric application with one stateless service and multiple stateful services. The stateless service reads from RabbitMQ (using MassTransit) and passes each message to the corresponding stateful service for processing. The stateful service enqueues the message in a ConcurrentQueue, and RunAsync dequeues and processes it. This worked fine while the message volume was low. Recently the volume increased many times over. Now the message count in RabbitMQ is in the millions and the stateful service queue gets overloaded. Memory usage becomes very high and the cluster gets stuck.
Does the queue get locked during enqueue, so that the RunAsync method has to wait? The enqueue rate is very high.
I can't find a way to stop fetching from RabbitMQ when the queue count is very high, so that at least the cluster won't hang or crash.
What is the best design to follow when the input load is very high?
Thanks
How are you partitioning your stateful services? If you use ranged or named partitioning, your messages should not all be hitting the same replica. If you apply the partitioning scheme correctly, your messages will end up in different partitions of the same stateful service and you can scale out horizontally.
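To make the partitioning point concrete, here is a minimal sketch of mapping a message key onto a fixed number of partitions. It is written in Java purely for illustration and is not the Service Fabric API; the class name, the method name, the CRC32 hash, and the partition count are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper (not part of Service Fabric): maps a message key
// to one of a fixed number of partitions.
class PartitionRouter {
    private final int partitionCount;

    PartitionRouter(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.partitionCount = partitionCount;
    }

    // Uses a stable hash so the same key always maps to the same partition,
    // even across process restarts.
    int partitionFor(String messageKey) {
        CRC32 crc = new CRC32();
        crc.update(messageKey.getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative long, so the remainder is in [0, partitionCount).
        return (int) (crc.getValue() % partitionCount);
    }

    public static void main(String[] args) {
        PartitionRouter router = new PartitionRouter(4);
        for (String key : new String[] {"device-1", "device-2", "device-3"}) {
            System.out.println(key + " -> partition " + router.partitionFor(key));
        }
    }
}
```

With a key that is reasonably evenly distributed, each partition then holds only a fraction of the in-memory queue, which is what allows the load to be spread across nodes.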
Related
I have a Java application which publishes events to RabbitMQ. It has one very important characteristic: message order must be preserved at all times. The consumer can handle duplicates, but it cannot handle when message 2 is enqueued before message 1, so to say.
I have been reading a lot about RabbitMQ lately, and I feel there is only one solution to do this: set the channel in confirm mode (https://www.rabbitmq.com/confirms.html - basically, it forces the broker to acknowledge the publication) and publish one by one. By one by one I mean that message 2 is only published after RabbitMQ has confirmed (via an asynchronous ACK response) that message 1 was actually received and persisted.
I tried this in a conceptual implementation, and while this works fine, it's uber slow, without exaggerating. Which makes sense: after all, we are now limiting our message rate to 1 message at a time.
So this leads me to my question: are there other, more performant, ways to ensure that message ordering is always preserved (either in RabbitMQ or via different approaches)?
Although my concern is RabbitMQ, I believe this question might be applied to any kind of asynchronous message queue service.
RabbitMQ's clients enqueue messages in the same order that you sent them. It's when subscribers go down, when you get network splits, or when a subscriber NACKs messages that they can get re-ordered; even then RMQ tries to keep them in approximately the same order by re-queueing at the same position, or as close to it as possible.
You can do it like you suggest; take one message at a time, because if you take a message, but crash before you've ACKed it from the broker, it will pop up when your service comes back up, at the same position.
This assumes you only have a single service instance at any given time, consuming from the queue. Which in turn is a distributed systems problem on its own, if you have a scheduler like Kubernetes or Mesos, spawning your service instances.
Another solution would be to ensure ordering of processing in the receiving service, by "resequencing" the messages based on their logical timestamps/sequence numbers.
I've written a much more thorough guide as annotated code here https://github.com/haf/rmq-publisher-confirms-hopac/blob/master/src/Server/Shared/RabbitMQ.fs — with batching you can resequence. Furthermore, if your idempotence builds the consecutive sequence numbers into its logic, you can start taking batches and each event will be idempotent, despite being re-consumed.
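The resequencing idea can be sketched roughly as follows. This is a minimal in-memory illustration, assuming every message carries a consecutive sequence number assigned by the publisher; the class and method names are hypothetical and this is not the code from the linked guide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical consumer-side resequencer: releases payloads strictly in
// sequence order and drops duplicates. Assumes non-null payloads.
class Resequencer {
    private final Map<Long, String> pending = new HashMap<>();
    private long nextExpected;

    Resequencer(long firstSequence) {
        this.nextExpected = firstSequence;
    }

    // Accepts a message in any order and returns the payloads that have
    // become deliverable, in sequence order (possibly an empty list).
    List<String> accept(long sequence, String payload) {
        List<String> ready = new ArrayList<>();
        if (sequence < nextExpected) {
            return ready; // already delivered earlier: a duplicate
        }
        pending.putIfAbsent(sequence, payload); // ignore duplicates still buffered
        String next;
        while ((next = pending.remove(nextExpected)) != null) {
            ready.add(next);
            nextExpected++;
        }
        return ready;
    }

    public static void main(String[] args) {
        Resequencer r = new Resequencer(1);
        System.out.println(r.accept(2, "b")); // buffered, nothing released yet
        System.out.println(r.accept(1, "a")); // releases a, then b
        System.out.println(r.accept(1, "a")); // duplicate, nothing released
    }
}
```

A real consumer would also need to bound the buffer and decide what to do when a sequence number never arrives; both are left out here.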
I have more-or-less implemented the Reliability Pattern in my Mule application using persistent VM queues on CloudHub, as documented here. While everything works fine, it has left me with a number of questions about actually ensuring reliable delivery of my messages. To illustrate the points below, assume I have an http-request component within my "application logic flow" (see the diagram on the link above) that is throwing an exception because the endpoint is down, and I want to ensure that the in-flight message will eventually get delivered to the endpoint:
As detailed on the link above, I have observed that when the exception is thrown within my "application logic flow", and I have made the flow transactional, the message is put back on the VM queue. However, all that happens is that the message is then repeatedly taken off the queue, processed by the flow, and the exception is thrown again - ad infinitum. There appears to be no way of configuring any sort of retry delay or maximum number of retries on VM queues as is possible, for example, with ActiveMQ. The best workaround I have come up with is to surround the http-request message processor with the until-successful scope, but I'd rather have these sorts of things apply to my whole flow (without having to wrap the whole flow in until-successful). Is this sort of thing possible using only VM queues and CloudHub?
I have configured my until-successful to place the message on another VM queue which I want to use as a dead-letter queue. Again, this works fine, and I can log in to CloudHub and see the messages populated on my DLQ - but then it appears to offer no way of moving messages from this queue back into the flow when the endpoint comes back up. All it seems you can do in CloudHub is clear your queue. Again, is this possible using VM queues and CloudHub only (i.e. no other queueing tool)?
VM queues are very basic, whether you use them in CloudHub or not.
VM queues have no capacity for delaying redelivery (like exponential back-offs). Use JMS queues if you need such features.
You need to create a flow for processing the DLQ, for example one that regularly consumes the queue via the requester module and re-injects the messages into the main queue. Again, with JMS, you would have better control.
Alternatively to JMS, you could consider hosted queues like CloudAMQP, Iron.io or AWS SQS. You would lose transaction support on the inbound endpoint but would gain better control on the (re)delivery behaviour.
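For readers unfamiliar with the redelivery features mentioned above, the following is a rough sketch of what a delayed-redelivery policy computes: an exponentially growing delay with a cap, plus a retry limit after which the message would go to the dead-letter queue. It is plain Java with hypothetical names and values, not the Mule, JMS, or ActiveMQ API.

```java
// Hypothetical redelivery policy: exponential back-off with a cap and a
// maximum number of redeliveries. Names and values are illustrative only.
class RedeliveryPolicy {
    private final long initialDelayMillis;
    private final double multiplier;
    private final long maxDelayMillis;
    private final int maxRedeliveries;

    RedeliveryPolicy(long initialDelayMillis, double multiplier,
                     long maxDelayMillis, int maxRedeliveries) {
        this.initialDelayMillis = initialDelayMillis;
        this.multiplier = multiplier;
        this.maxDelayMillis = maxDelayMillis;
        this.maxRedeliveries = maxRedeliveries;
    }

    // True once the message should be dead-lettered instead of retried.
    boolean isExhausted(int redeliveryCount) {
        return redeliveryCount >= maxRedeliveries;
    }

    // Delay before redelivery attempt number `attempt` (1-based), capped
    // at maxDelayMillis.
    long delayFor(int attempt) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt is 1-based");
        }
        double delay = initialDelayMillis * Math.pow(multiplier, attempt - 1);
        return (long) Math.min(delay, (double) maxDelayMillis);
    }

    public static void main(String[] args) {
        RedeliveryPolicy policy = new RedeliveryPolicy(1000, 2.0, 30000, 5);
        for (int attempt = 1; !policy.isExhausted(attempt - 1); attempt++) {
            System.out.println("attempt " + attempt + ": wait " + policy.delayFor(attempt) + " ms");
        }
    }
}
```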
I am trying to understand how rabbitmq per-connection flow-control works with multiple consumers. In particular what would happen if one consumer were to hang? Would flow control be invoked and how would it affect the rest of the consumers? Would the behaviour depend upon whether the queues were durable or autodeleting?
Thanks.
RabbitMQ uses "Credit Flow Control".
Essentially, whenever a message is received on a channel a credit is deducted. Credit starts at a default level, e.g. 200, and when it dips below 0, connections are blocked. After a certain number of messages are consumed and ACKed, the credit is bumped up a certain amount.
You can read more about it here:
http://videlalvaro.github.io/2013/09/rabbitmq-internals-credit-flow-for-erlang-processes.html
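The credit mechanism described above can be modelled in a few lines. This is a deliberately simplified sketch and not RabbitMQ's implementation: the initial credit of 200 comes from the text, while the class name and the "50 credits back per 50 ACKs" figures are assumptions chosen only to make the example concrete. The model blocks as soon as credit is exhausted.

```java
// Simplified model of credit-based flow control between a sender and a
// receiver. Not RabbitMQ's code; all names and numbers are illustrative.
class CreditFlow {
    private final int acksPerBump; // how many ACKs trigger one credit grant
    private final int creditBump;  // how much credit each grant restores
    private int credit;
    private int ackedSinceBump;

    CreditFlow(int initialCredit, int acksPerBump, int creditBump) {
        this.credit = initialCredit;
        this.acksPerBump = acksPerBump;
        this.creditBump = creditBump;
    }

    boolean isBlocked() {
        return credit <= 0;
    }

    // Sender side: spends one credit per message; returns false when blocked.
    boolean trySend() {
        if (isBlocked()) {
            return false;
        }
        credit--;
        return true;
    }

    // Receiver side: after every `acksPerBump` ACKs, grants `creditBump`
    // credits back to the sender.
    void ack() {
        ackedSinceBump++;
        if (ackedSinceBump == acksPerBump) {
            credit += creditBump;
            ackedSinceBump = 0;
        }
    }

    public static void main(String[] args) {
        CreditFlow flow = new CreditFlow(200, 50, 50);
        int sent = 0;
        while (flow.trySend()) {
            sent++;
        }
        System.out.println("sent before blocking: " + sent);
        for (int i = 0; i < 50; i++) {
            flow.ack();
        }
        System.out.println("blocked after 50 ACKs: " + flow.isBlocked());
    }
}
```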
Per-connection flow control describes what happens when a publisher (or group of publishers) is sending messages to queues faster than the queues are being processed. This is a safety feature as RabbitMQ becomes unstable at some point when the queue fills without bound. From the documentation, this is automatic:
RabbitMQ will block connections which are publishing too quickly for queues to keep up. No configuration is required.
Unfortunately, the documentation is not terribly specific on when/how this flow control is implemented, other than "several times per second." So, if one consumer gets stuck, as long as the other consumer(s) can keep up, flow control should not be triggered.
I have a middleware based on Apache Camel which does a transaction like this:
from("amq:job-input")
    .to("inOut:businessInvoker-one") // Into business processor
    .to("inOut:businessInvoker-two")
    .to("amq:job-out");
Currently it works perfectly. But I can't scale it up, let's say from 100 TPS to 500 TPS. To speed up the transaction, I have already:
Raised the concurrent consumers setting and used an empty businessProcessor
Configured JAVA_XMX and PERMGEN
According to the ActiveMQ web console, many messages are waiting to be processed in the 500 TPS scenario. I guess one solution is to scale ActiveMQ up, so I want to use multiple brokers in a cluster.
According to http://fuse.fusesource.org/mq/docs/mq-fabric.html (section "Topologies"), configuring ActiveMQ in clustering mode is suitable for non-persistent messages. IMHO, it makes sense that it's not suitable for persistent messages, because all running brokers use the same store file. But what about separating the store files? That's possible now, right?
Could anybody explain this? If it's not possible, what is the best way to load balance persistent messages?
Thanks
You can share the load of persistent messages by creating 2 master/slave pairs. The master and slave share their state either through a database or a shared filesystem, so you need to duplicate that setup.
Create 2 master/slave pairs, and configure so-called "network connectors" between the 2 pairs. This should roughly double your throughput without the risk of losing messages.
See http://activemq.apache.org/networks-of-brokers.html
This answer relates to a version of the question before the Camel details were added.
It is not immediately clear what exactly it is that you want to load balance and why. Messages across consumers? Producers across brokers? What sort of concern are you trying to address?
In general you should avoid using networks of brokers unless you are trying to address some sort of geographical use case, have too many connections for a single broker to handle, or a single broker (which could be a pair of brokers configured in HA) is not giving you the throughput that you require (in 90% of cases it will).
In a broker network, each node has its own store and passes messages around by way of a mechanism called store-and-forward. Have a read of Understanding broker networks for an explanation of how this works.
ActiveMQ already works as a kind of load balancer by distributing messages evenly in a round-robin fashion among the subscribers on a queue. So if you have 2 subscribers on a queue, and send it a stream of messages A, B, C, D, one subscriber will receive A & C, while the other receives B & D.
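The A, B, C, D example can be reproduced with a small idealized model. This is only a sketch of round-robin assignment in plain Java, with hypothetical names; a real broker's dispatch also depends on settings such as prefetch and on how fast each consumer acknowledges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Idealized round-robin dispatch: message i goes to subscriber i % n.
// Illustrative only; not how ActiveMQ is implemented internally.
class RoundRobinDispatcher {
    // Returns one list of messages per subscriber.
    static List<List<String>> dispatch(List<String> messages, int subscriberCount) {
        List<List<String>> perSubscriber = new ArrayList<>();
        for (int i = 0; i < subscriberCount; i++) {
            perSubscriber.add(new ArrayList<>());
        }
        for (int i = 0; i < messages.size(); i++) {
            perSubscriber.get(i % subscriberCount).add(messages.get(i));
        }
        return perSubscriber;
    }

    public static void main(String[] args) {
        // prints [[A, C], [B, D]]
        System.out.println(dispatch(Arrays.asList("A", "B", "C", "D"), 2));
    }
}
```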
If you want to take this a step further and group related messages on a queue so that they are processed consistently by only one subscriber, you should consider Message Groups.
Adding consumers might help to a point (depending on the number of cores/CPUs your server has). Adding threads beyond the point where your "Camel server" is using all available CPU for the business processing makes no sense and can be counterproductive.
Adding more ActiveMQ machines is probably needed. You can use an ActiveMQ "network" to communicate between instances that have separate persistence files. It should be straightforward to add more brokers and put them into a network.
Make sure you performance test along the way to find out what load the broker can handle and what load the Camel processor can handle (if they run on different machines).
When you do persistent messaging - you likely also want transactions. Make sure you are using them.
If all running brokers use the same store file or tx-supported database for persistence, then only the first broker to start will be active, while others are in standby mode until the first one loses its lock.
If you want to load balance your persistence, there are two approaches you could try:
Configure several brokers in network-bridge mode, then send messages to any one of them and consume messages from more than one of them. This load balances both the brokers and the persistence.
Override the persistenceAdapter and use database-sharding middleware (such as tddl: https://github.com/alibaba/tb_tddl) to store the messages by partition.
Your first step is to increase the number of workers that are processing from ActiveMQ. The way to do this is to add the ?concurrentConsumers=10 attribute to the starting URI. The default behaviour is that only one thread consumes from that endpoint, leading to a pile up of messages in ActiveMQ. Adding more brokers won't help.
Secondly what you appear to be doing could benefit from a Staged Event-Driven Architecture (SEDA). In a SEDA, processing is broken down into a number of stages which can have different numbers of consumer on them to even out throughput. Your threads consuming from ActiveMQ only do one step of the process, hand off the Exchange to the next phase and go back to pulling messages from the input queue.
Your route can therefore be rewritten as 2 smaller routes:
from("activemq:input?concurrentConsumers=10").id("FirstPhase")
    .process(businessInvokerOne)
    .to("seda:invokeSecondProcess");
from("seda:invokeSecondProcess?concurrentConsumers=20").id("SecondPhase")
    .process(businessInvokerTwo)
    .to("activemq:output");
The two stages can have different numbers of concurrent consumers so that the rate of message consumption from the input queue matches the rate of output. This is useful if one of the invokers is much slower than another.
The seda: endpoint can be replaced with another intermediate activemq: endpoint if you want message persistence.
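To show the staging idea outside of Camel, here is a minimal sketch using only the Java standard library. Each stage is a thread pool with its own work queue, and stage one hands its result to stage two instead of doing both steps on the same thread. The class name, method name, and worker counts are assumptions for illustration; this is not how Camel's seda: component is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.UnaryOperator;

// Two-stage pipeline: each stage has its own pool size, so a slow stage
// can be given more workers than a fast one. Illustrative sketch only.
class StagedPipeline {
    static List<String> run(List<String> input, int stageOneWorkers, int stageTwoWorkers,
                            UnaryOperator<String> stepOne, UnaryOperator<String> stepTwo) {
        ExecutorService stageOne = Executors.newFixedThreadPool(stageOneWorkers);
        ExecutorService stageTwo = Executors.newFixedThreadPool(stageTwoWorkers);
        Queue<String> output = new ConcurrentLinkedQueue<>();
        for (String message : input) {
            stageOne.execute(() -> {
                String intermediate = stepOne.apply(message);
                // Hand off to the second stage and return to stage-one work.
                stageTwo.execute(() -> output.add(stepTwo.apply(intermediate)));
            });
        }
        try {
            // Stage one must finish first so that every hand-off has been submitted.
            stageOne.shutdown();
            stageOne.awaitTermination(1, TimeUnit.MINUTES);
            stageTwo.shutdown();
            stageTwo.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining stages", e);
        }
        return new ArrayList<>(output); // completion order, not input order
    }

    public static void main(String[] args) {
        List<String> result = run(Arrays.asList("a", "b", "c"), 2, 4,
                s -> s + ":one", s -> s + ":two");
        System.out.println(result);
    }
}
```

Note that the hand-off queue here is unbounded and in memory, which matches the seda: trade-off mentioned above: messages in it are lost if the process dies.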
Finally to increase throughput, you can focus on making the processing itself faster, by profiling the invokers themselves and optimising that code.
I would like to configure my ActiveMQ producers to failover (I'm using the Stomp protocol) when a broker reaches a configured limit. I want to allow consumers to continue consumption from the overloaded broker, unabated.
Reading ActiveMQ docs, it looks like I can configure ActiveMQ to do one of a few things when a broker reaches its limits (memory or disk):
Slow down messages using producerFlowControl="true" (by blocking the send)
Throw exceptions when using sendFailIfNoSpace="true"
Neither of the above, in which case I'm not sure what happens. Does it revert to TCP flow control?
It doesn't look like any of these things are designed to trigger a producer failover. A producer will failover when it fails to connect but not, as far as I can tell, when it fails to send (due to producer flow control, for example).
So, is it possible for me to configure a broker to refuse connections when it reaches its limits? Or is my best bet to detect the slowdown on the producer side, and to manually reconfigure my producers to use a different broker at that time?
Thanks!
Your best bet is to use sendFailIfNoSpace, or better sendFailIfNoSpaceAfterTimeout. This will throw an exception up to your client, which can then attempt to resend the message to another broker at the application level (though you can encapsulate this logic over the top of your Stomp library, and use this facade from your code). Though if your ActiveMQ setup is correctly wired, your load both in terms of production and consumption should be more or less evenly distributed across your brokers, so this feature may not buy you a great deal.
You would probably get a better result if you concentrated on fast consumption of the messages, and increased the storage limits to smooth out peaks in load.
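The application-level resend described above could look roughly like the following facade. It is a sketch under assumptions: BrokerSender is a hypothetical interface standing in for whatever your Stomp library exposes, and the facade simply tries the next broker whenever a send throws (for example because sendFailIfNoSpace rejected it).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical abstraction over "send one message to one broker".
// A real implementation would wrap a Stomp client connection.
interface BrokerSender {
    void send(String message) throws Exception; // throws when the broker rejects the send
}

// Facade that fails over to the next broker when a send is rejected.
class FailoverProducer {
    private final List<BrokerSender> brokers;

    FailoverProducer(List<BrokerSender> brokers) {
        this.brokers = new ArrayList<>(brokers);
    }

    // Tries each broker in order and returns the index of the broker that
    // accepted the message; throws if every broker rejected it.
    int send(String message) {
        Exception last = null;
        for (int i = 0; i < brokers.size(); i++) {
            try {
                brokers.get(i).send(message);
                return i;
            } catch (Exception e) {
                last = e; // e.g. broker out of space: try the next one
            }
        }
        throw new IllegalStateException("all brokers rejected the message", last);
    }

    public static void main(String[] args) {
        BrokerSender full = m -> { throw new java.io.IOException("no space"); };
        BrokerSender healthy = m -> System.out.println("accepted: " + m);
        FailoverProducer producer = new FailoverProducer(Arrays.asList(full, healthy));
        System.out.println("sent via broker index " + producer.send("hello"));
    }
}
```

Retrying on a different broker can change message ordering and may produce duplicates if the first send partially succeeded, so consumers should tolerate both.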