One mule app server in cluster polling maximum message from MQ - mule

My mule application is comprised of 2 nodes running in a cluster, and it listens to IBM MQ Cluster (basically connecting to 2 MQ via queue manager). There are situations where one mule node pulls or takes more than 80% of message from MQ cluster and another mule node picks rest 20%. This is causing CPU performance issues.
We have double checked that all load balancing is proper, and very few times we get CPU performance problem. Please can anybody give some ideas what could be possible reason for it.
Example: last scenario was created where there are 200000 messages in queue, and node2 mule server picked 92% of message from queue within few minutes.

This issue has been fixed now. Got into the root cause - our mule application running on MULE_NODE01 reads/writes to WMQ_NODE01, and similarly for node 2. One of the mule node (lets say MULE_NODE02) reads from linux/windows file system and puts huge messages to its corresponding WMQ_NODE02. Now, its IBM MQ which tries to push maximum load to other WMQ node to balance the work load. That's why MULE_NODE01 reads all those loaded files from WMQ_NODE01 and causes CPU usage alerts.
#JoshMc your clue helped a lot in understanding the issues, thanks a lot for helping.
Its WMQ node in a cluster which tries to push maximum load to other WMQ node, seems like this is how MQ works internally.
To solve this, we are now connecting our mule node to MQ gateway, rather making 1-to-1 connectivity

This could be solved by avoiding the racing condition caused by multiple listeners. Configure the listener in the cluster to the primary node only.
republish the message to a persistent VM queue.
move the logic to another flow that could be triggered via a VM listener and let the Mule cluster do the load balancing.

Related

ServiceStack Redis Mq: is eventual consistency an issue?

I'm looking at turning a monolith application into a microservice-oriented application and in doing so will need a robust messaging system for interprocesses-communication. The idea is for the microserviceprocesses to be run on a cluster of servers for HA, with requests to be processed to be added on a message queue that all the applications can access. I'm looking at using Redis as both a KV-store for transient data and also as a message broker using the ServiceStack framework for .Net but I worry that the concept of eventual consistency applied by Redis will make processing of the requests unreliable. This is how I understand Redis to function in regards to Mq:
Client 1 posts a request to a queue on node 1
Node 1 will inform all listeners on that queue using pub/sub of the existence of
the request and will also push the requests to node 2 asynchronously.
The listeners on node 1 will pull the request from the node, only 1 of them will obtain it as should be. An update of the removal of the request is sent to node 2 asynchronously but will take some time to arrive.
The initial request is received by node 2 (assuming a bit of a delay in RTT) which will go ahead and inform listeners connected to it using pub/sub. Before the update from node 1 is received regarding the removal of the request from the queue a listener on node 2 may also pull the request. The result being that two listeners ended up processing the same request, which would cause havoc in our system.
Is there anything in Redis or the implementation of ServiceStack Redis Mq that would prevent the scenario described to occur? Or is there something else regarding replication in Redis that I have misunderstood? Or should I abandon the Redis/SS approach for Mq and use something like RabbitMQ instead that I have understood to be ACID-compliant?
It's not possible for the same message to be processed twice in Redis MQ as the message worker pops the message off the Redis List backed MQ and all Redis operations are atomic so no other message worker will have access to the messages that have been removed from the List.
ServiceStack.Redis (which Redis MQ uses) only supports Redis Sentinel for HA which despite Redis supporting multiple replicas they only contain a read only view of the master dataset, so all write operations like List add/remove operations can only happen on the single master instance.
One notable difference from using Redis MQ instead of specific purpose MQ like Rabbit MQ is that Redis doesn't support ACK's, so if the message worker process that pops the message off the MQ crashes then it's message is lost, as opposed to Rabbit MQ where if the stateful connection of an un Ack'd message dies the message is restored by the RabbitMQ server back to the MQ.

RabbitMQ as Message Broker used by Spring Websocket dies under load

I develop an application where we need to handle 160k concurrent users which are connected to the backend via a websocket connection.
We decided to use the spring websocket implementation and RabbitMQ as the message broker.
In our application every user needs to subscribe to its user queue /exchange/amq.direct/update as well as to another queue where also other users can potential subscribe to /topic/someUniqueName.
In our first performance test we did the naive approach where every user subscribes to two new queues.
When running the test RabbitMQ dies silently when around 800 users are connected at the same time, so around 1600 queues are active (See the graph of all RabbitMQ objects here).
I read though that you should be careful opening many connections to RabbitMQ.
Now I wonder if the approach that is anticipated by Spring Websocket with opening one queue per user is a conceptional problem for systems with high load or if there is another error in my system.
Limiting factors for RabbitMQ are usually:
memory (can be checked in dashboard) that needs to grow with number of messages and number of queues (if you don't use lazy queues that go directly to disk).
maximum number of file descriptors (at least 1 per connection) that often defaults to too low values on many distributions (ref: https://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2012-April/019615.html)
CPU for routing the messages
I did find the issue. I actually misconfigured the RabbitMQ service and just gave it a 1024 file descriptor limit. Increasing it solved the issue.

Why would you run a messaging queue (eg RabbitMQ) cluster?

Overview
A RabbitMQ broker is a logical grouping of one or several Erlang
nodes, each running the RabbitMQ application and sharing users,
virtual hosts, queues, exchanges, bindings, and runtime parameters.
Sometimes we refer to the collection of nodes as a cluster.
Why would you do this? I understand to increase durability of messages (if a node goes down, other queues still get the messages). But what about performance? How does cluster improve performance. Won't all consumers/producers connect to the master node's queue anyway? If so, aren't we still getting traffic on a single node regardless? Do we put a load balancer so traffic is directed at different nodes each time?
How does RabbitMQ cluster increase performance?
Well, right after that paragraph, the documentation states the following:
What is Replicated?
All data/state required for the operation of a RabbitMQ broker is
replicated across all nodes. An exception to this are message queues,
which by default reside on one node, though they are visible and
reachable from all nodes. To replicate queues across nodes in a
cluster, see the documentation on high availability (note that you
will need a working cluster first).
So, you would cluster to provide higher capacity in your RabbitMQ broker than a single node can provide alone. Note that clustering by itself is not a high-availability strategy.
Your assertion that message durability is increased is false, as message queues continue to reside on one broker (unless mirroring is used).
By default, contents of a queue within a RabbitMQ cluster are located on a single node (the node on which the queue was declared) [1]
Without mirroring, when that node goes down, messages on it will be lost. The cluster will put the queue onto a different node. RabbitMQ does not handle network partitions well, so this can be a bit of a problem.
"Aren't we still getting traffic on a single node regardless?" - if you only have one queue, then yes. However, a bigger question is "why would you run a message broker with only one queue?" Similarly, if you only create queues on one node, then you will still have one point of failure in the system.

Cluster-wide singleton in Websphere Cluster

I need to run a component using Apache Camel (or Spring Integration) under WAS ND 8.0 cluster. They both run some threads on startup, and stop them on shutdown normally. No problem to supply WAS managed threadpool. But that threads must run on single cluster's node at the same time. Moreover it must be high-available i.e. switch to other node when active node falls.
Solution I found - is WAS Partitioning Facility. It requires additional Extended Deployment licenses. Is it the only way, or there is some way to implement this using Network Deployment license only?
Thanks in advance.
I think that there is not a feature that address this interesting requirement.
I can imagine a "trick":
A Timer EJB send a message on a queue (let's say 1 per minute)
Configure a Service Integration Bus (SIB) with High Availability and No Scalability, so the HA Manager ensure that only one messaging engine (ME) is alive.
Create a non-reliable queue for high performances and low resource consumption.
The Activation Spec should be configured to listen only local ME.
A MDB implement the following logic: when the message arrives, it check if the singleton thread is alive, otherwise it start the thread.
Does it make sense?

rabbitmq HA cluster

I am wanting to setup RabbitMQ as a two (or more) node cluster with HA.
Use case: a client producer app (C#.NET) knows that the cluster has two nodes and publishes to the cluster. Various consumer apps (also C#.NET) connect to the cluster and get all messages generated by the producer. So long as at least one node is up and running the producer and consumers will all continue to work without error. Supposing nodes A and B are running and B dies for a while, then gets restarted, then a while later A dies, the clients all continue to function without receiving an error since at all times at least one node is up.
Can it be made to work like this out of the box?
Are there any other MQs that would be more appropriate (commercial ok) for a Windows/.NET application environment?
RabbitMQ v2.6.0 now supports high-availability queues using active/active clustering. Microsoft and a number of other companies have collaborated on Apache QPid which has C# bindings and which also supports active/active HA clustering.
Can it be made to work like this out of the box?
No. When a node goes down, all of its connections are closed. Since AMQP connections are stateful, there's no way around this. What you could achieve is 1) broker goes down, 2) all clients disconnect, 3) clients connect to other node (masquerading as original) and are none the wiser.
On a side note, rabbit does not support active-active HA clustering at the moment. It does support active-passive clustering and a form of logical clustering (which might be what you're looking for).