Message pickup takes a pause with replicated Level db + Zookeeper

Message pickup takes a pause with replicated Level db + Zookeeper - activemq

I've configured 3 ActiveMQ instances along with Zookeeper (Using multi replicated level db).
For verifying if it works fine with zookeeper or not, I've one producer and 8 consumers. Producer has produced around 1000 messages and I am shutting the producer.
Now these threads pick from the queue and keep on processing. I am using consumer.receive() method in the code it blocks for sometime ( indefinite time ) and restarts processing. I see the threads, all are waiting for the messages to be picked from the queue. I am not sure, why this happens even though we have enough messages to be consumed on the queue.
Can anyone please help ?
Regards,
JE
Note :
ActiveMQ version :5.10
ZooKeeper Version : 3.4.6

Related

ServiceStack Redis Mq: is eventual consistency an issue?

I'm looking at turning a monolith application into a microservice-oriented application and in doing so will need a robust messaging system for interprocesses-communication. The idea is for the microserviceprocesses to be run on a cluster of servers for HA, with requests to be processed to be added on a message queue that all the applications can access. I'm looking at using Redis as both a KV-store for transient data and also as a message broker using the ServiceStack framework for .Net but I worry that the concept of eventual consistency applied by Redis will make processing of the requests unreliable. This is how I understand Redis to function in regards to Mq:
Client 1 posts a request to a queue on node 1
Node 1 will inform all listeners on that queue using pub/sub of the existence of
the request and will also push the requests to node 2 asynchronously.
The listeners on node 1 will pull the request from the node, only 1 of them will obtain it as should be. An update of the removal of the request is sent to node 2 asynchronously but will take some time to arrive.
The initial request is received by node 2 (assuming a bit of a delay in RTT) which will go ahead and inform listeners connected to it using pub/sub. Before the update from node 1 is received regarding the removal of the request from the queue a listener on node 2 may also pull the request. The result being that two listeners ended up processing the same request, which would cause havoc in our system.
Is there anything in Redis or the implementation of ServiceStack Redis Mq that would prevent the scenario described to occur? Or is there something else regarding replication in Redis that I have misunderstood? Or should I abandon the Redis/SS approach for Mq and use something like RabbitMQ instead that I have understood to be ACID-compliant?

It's not possible for the same message to be processed twice in Redis MQ as the message worker pops the message off the Redis List backed MQ and all Redis operations are atomic so no other message worker will have access to the messages that have been removed from the List.
ServiceStack.Redis (which Redis MQ uses) only supports Redis Sentinel for HA which despite Redis supporting multiple replicas they only contain a read only view of the master dataset, so all write operations like List add/remove operations can only happen on the single master instance.
One notable difference from using Redis MQ instead of specific purpose MQ like Rabbit MQ is that Redis doesn't support ACK's, so if the message worker process that pops the message off the MQ crashes then it's message is lost, as opposed to Rabbit MQ where if the stateful connection of an un Ack'd message dies the message is restored by the RabbitMQ server back to the MQ.

One mule app server in cluster polling maximum message from MQ

My mule application is comprised of 2 nodes running in a cluster, and it listens to IBM MQ Cluster (basically connecting to 2 MQ via queue manager). There are situations where one mule node pulls or takes more than 80% of message from MQ cluster and another mule node picks rest 20%. This is causing CPU performance issues.
We have double checked that all load balancing is proper, and very few times we get CPU performance problem. Please can anybody give some ideas what could be possible reason for it.
Example: last scenario was created where there are 200000 messages in queue, and node2 mule server picked 92% of message from queue within few minutes.

This issue has been fixed now. Got into the root cause - our mule application running on MULE_NODE01 reads/writes to WMQ_NODE01, and similarly for node 2. One of the mule node (lets say MULE_NODE02) reads from linux/windows file system and puts huge messages to its corresponding WMQ_NODE02. Now, its IBM MQ which tries to push maximum load to other WMQ node to balance the work load. That's why MULE_NODE01 reads all those loaded files from WMQ_NODE01 and causes CPU usage alerts.
#JoshMc your clue helped a lot in understanding the issues, thanks a lot for helping.
Its WMQ node in a cluster which tries to push maximum load to other WMQ node, seems like this is how MQ works internally.
To solve this, we are now connecting our mule node to MQ gateway, rather making 1-to-1 connectivity

This could be solved by avoiding the racing condition caused by multiple listeners. Configure the listener in the cluster to the primary node only.
republish the message to a persistent VM queue.
move the logic to another flow that could be triggered via a VM listener and let the Mule cluster do the load balancing.

Messages replayed after reboot in rabbitMQ

I have a rabbitMQ cluster with two nodes configured to be synchronized. Each queue is mirrored and persistent.
Each time I need to reboot a node of my cluster, some old messages are replayed.
I don’t understand why because one of the two nodes is still alive and they are "normally" synchronized.
Have you any idea to help me to investigate this problem?

Could you check if you have some messages that are not acknowledged?
If you do (this would mean a consumer never acknowledges it), it could explain the behavior:
Message consumed but never acknowledged
Node reboots
The connections of the consumers connected to that node get closed
Any of the unacked messages that have been consumed in the related channel are put back in the queue

Logstash with rabbitmq cluster

I have a 3 node cluster of Rabbitmq behind a HAproxy Load Balancer. When I shut down a node, Rabbitmq successfully switches the queue to the other nodes. However, I notice that Logstash stops pulling messages from the queue unless I restart it. Is this a problem with the way rabbitmq operates? i.e. it deactivates all active consumers. I am not sure if log stash has any retry capability. Anyone run into this issue?

Quoting rabbit mq documentation, page for clustering first
What is Replicated? All data/state required for the operation of a
RabbitMQ broker is replicated across all nodes. An exception to this
are message queues, which by default reside on one node, though they
are visible and reachable from all nodes.
and high availability
Clients that are consuming from a mirrored queue may wish to know that
the queue from which they have been consuming has failed over. When a
mirrored queue fails over, knowledge of which messages have been sent
to which consumer is lost, and therefore all unacknowledged messages
are redelivered with the redelivered flag set. Consumers may wish to
know this is going to happen.
If so, they can consume with the argument x-cancel-on-ha-failover set
to true. Their consuming will then be cancelled on failover and a
consumer cancellation notification sent. It is then the consumer's
responsibility to reissue basic.consume to start consuming again.
So, what does all this mean:
You have to mirror queues
The consumers should use manual ACK
The consumers should reconnect on their own
So the answer to your question is no, it's not a problem with rabbitmq, that's simply how it works. It's up to clients to reconnect.

Does Kafka handle network failure better than RabbitMQ?

We have been having below issues from RabbitMQ and had been manually restarting the servers every weekend as a work around.
Network partition detected
Mnesia reports that this RabbitMQ cluster has experienced a network partition. This is a dangerous situation. RabbitMQ clusters should not be installed on networks which can experience partitions.
We have gone through other popular posts on the topic e.g. here and here
Our network is not highly reliable and occasional blips are expected but when it does come up I would have expected 1 of the 4 node RabbitMQ cluster to join the rest of cluster - as is the case with 4 nodes of Tomcat installed on same servers.
Although the nodes on single partition continue to run independently but doesnt seem like that is a graceful recovery from failure in one node.
We didnt have great luck with using any rabbitmqctl commands like rabbitmqctl cluster_status - It used to sporadically cause the rabbitmq process to hang which needed a sudo kill to RabbitMQ process.
We are at a point of evaluating moving to Kafka or any other message broker that handles message partition well
Any thoughts on working around not needing manual RabbitMQ restarts or ability of Kafka to handle such situation is highly appreciated

I think Kafka with replication should be able to handle network partitions quite easily, as long as the number of brokers partitioned is inferior to the replication factor of your topic (aka, the consumers and producers can always reach at least 1 broker for the topics they're operating with).
To avoid backpressure in the clients while Zookeeper discover the partition and propagate the information to the producers and consumer, you may want to set short ZK heartbeating (yes, you'll need ZK, and a cluster too since you absolutely don't want your whole ZK cluster partitioned).
Fair warning though : using a cluster of kafka brokers will drop the FIFO aspect of your message queue which can be pretty disturbing if you're expecting the same order of messages produced by the producers and read by the consumers, which you could expect with RabbitMQ.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas