Durable vs Mirrored Queue in RabbitMQ

My queues are durable and my messages are persistent. I have set up a 3-node RabbitMQ cluster with HA mirroring of all queues across all servers. The master node of my queue appears to be rabbitmq3. When I shut down rabbitmq3, I get the following error:
Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - home node 'rabbit@rabbitmq3' of durable queue 'durable-test-queue' in vhost 'test' is down or inaccessible
I think that if I have mirrored queues in the cluster, I should not create durable queues, since they will cause problems if my RabbitMQ master node goes down suddenly.

The whole point of a cluster is that your system should tolerate the failure of any single node, including a queue master. Your error is just a notification that the current master is down. The cluster should elect a new master, and the queue should continue to function, regardless of the durability of the queue or the persistence of its messages.
You should be able to continue to send/receive messages on those durable queues.
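As a minimal sketch with the RabbitMQ Java client (node names and the queue name are taken from the question; an ha-all mirroring policy covering the queue is assumed to be in place), listing every cluster node lets the client fail over to a surviving node when the master goes down:

    import com.rabbitmq.client.*;

    public class DurablePublisher {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            // List all three cluster nodes so the client can reconnect to a live one.
            Connection conn = factory.newConnection(new Address[] {
                    new Address("rabbitmq1"), new Address("rabbitmq2"), new Address("rabbitmq3")});
            Channel ch = conn.createChannel();
            // durable=true makes the queue survive broker restarts;
            // it is the mirroring policy, not durability, that survives node loss.
            ch.queueDeclare("durable-test-queue", true, false, false, null);
            ch.basicPublish("", "durable-test-queue",
                    MessageProperties.PERSISTENT_TEXT_PLAIN, "hello".getBytes());
            conn.close();
        }
    }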

Related

Spring AMQP reconnection issue with rabbitmq cluster due to queue checking retry limit

I have a rabbitmq cluster with 3 nodes. One node has a durable and non-mirrored classic queue named test-queue.
I have a Spring Boot app that uses the Spring AMQP default connection factory, new CachingConnectionFactory(), to first make sure the queue exists and then subscribe to its messages. Everything works fine.
Then I started a rolling update of the RabbitMQ cluster, in which the nodes were restarted one by one.
I observed the following in the log during this process.
When the update started, I saw this output:
Received shutdown signal for consumer tag=amq.ctag-pzPHM_GEd5e-J5Y_L2W7_g com.rabbitmq.client.ShutdownSignalException: connection error; protocol method: #method<connection.close>(reply-code=320, reply-text=CONNECTION_FORCED - broker forced connection closure with reason 'shutdown', class-id=0, method-id=0)
...
org.springframework.amqp.rabbit.connection.CachingConnectionFactory[m][] - Attempting to connect to: xxx:5672
...
org.springframework.amqp.rabbit.connection.CachingConnectionFactory[m][] - Created new connection: xxx#66971f6b:58/SimpleConnection@4315e774
This shows that the app received the shutdown signal and successfully reconnected. At this point, it looks like the node hosting the queue was shut down, but the app was able to establish a new connection because there were other nodes.
Later I saw more shutdown signals, which indicate that the other nodes had started to shut down:
org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer[m][] - Consumer raised exception, processing can restart if the connection factory supports it com.rabbitmq.client.ShutdownSignalException: connection error; protocol method: #method<connection.close>(reply-code=320, reply-text=CONNECTION_FORCED - broker forced
At the same time I noticed the logs below, which indicate that although it was connected, Spring AMQP could not find the queue. I guess that is because the node hosting the queue was down and Spring AMQP was checking another node. Since it thought the queue did not exist, it started to redeclare it. Also note that there was a retry limit of 3:
org.springframework.amqp.rabbit.listener.BlockingQueueConsumer[m][] - Failed to declare queue: test-queue
Queue declaration failed; retries left=3 org.springframework.amqp.rabbit.listener.BlockingQueueConsumer$DeclarationException: Failed to declare queue(s):[test-queue]
...
Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - queue 'test-queue' in vhost '/' process is stopped by supervisor, class-id=50, method-id=10)
In the end, the retries were exhausted and I noticed the following. It looks like Spring AMQP gave up and started to close everything. The end state was that no consumer was registered to the queue: the Spring app was still running but could no longer receive messages, and it did not keep retrying the way a plain disconnection is handled. The only resolution was to reboot the app.
org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer[m][] - Cancelling Consumer@7f74d6dd: tags=[[]], channel=Cached Rabbit Channel: AMQChannel(amqp://guest@xxx:5672/,26), conn: Proxy@65ef722a Shared Rabbit Connection: SimpleConnection@4315e774 [delegate=amqp://guest@xxx:5672/, localPort= 37208], acknowledgeMode=AUTO local queue size=0
org.springframework.amqp.rabbit.listener.BlockingQueueConsumer[m][] - Closing Rabbit Channel: Cached Rabbit Channel: AMQChannel(amqp://guest@xxx:5672/,26), conn: Proxy@65ef722a Shared Rabbit Connection: SimpleConnection@4315e774 [delegate=amqp://guest@xxx:5672/, localPort= 37208]
org.springframework.amqp.rabbit.connection.CachingConnectionFactory[m][] - Closing cached Channel: AMQChannel
org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer[m][] - Stopping container from aborted consumer
org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer[m][] - Shutting down Rabbit listener container
I gather that Spring AMQP comes with retry-on-disconnect logic that keeps reconnecting indefinitely. But in a case like this, how can I make Spring wait until the cluster restart has completed before it starts reconnecting? Or is there a way to lift the retry limit on the queue check, so that it keeps checking for the queue until the cluster restart has completed instead of giving up early? Would changing the queue to a mirrored queue or a quorum queue resolve this issue?
See
https://docs.spring.io/spring-amqp/docs/current/reference/html/#declarationRetries
The number of retry attempts when passive queue declaration fails. Passive queue declaration occurs when the consumer starts or, when consuming from multiple queues, when not all queues were available during initialization. When none of the configured queues can be passively declared (for any reason) after the retries are exhausted, the container behavior is controlled by the `missingQueuesFatal` property, described earlier.
and
https://docs.spring.io/spring-amqp/docs/current/reference/html/#failedDeclarationRetryInterval
The interval between passive queue declaration retry attempts. Passive queue declaration occurs when the consumer starts or, when consuming from multiple queues, when not all queues were available during initialization.
You can increase one or both of these from their defaults (3 attempts and 5000 ms, respectively).
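As a sketch of what that could look like when configuring the listener container directly (the host name and retry values are illustrative, not recommendations):

    import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
    import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;

    public class PatientContainer {
        public static void main(String[] args) {
            CachingConnectionFactory cf = new CachingConnectionFactory("rabbitmq-host"); // placeholder host
            SimpleMessageListenerContainer container = new SimpleMessageListenerContainer(cf);
            container.setQueueNames("test-queue");
            container.setDeclarationRetries(60);                  // default is 3
            container.setFailedDeclarationRetryInterval(10_000);  // default is 5000 ms
            container.setMissingQueuesFatal(false);               // don't stop the container when the queue is missing
            container.setMessageListener(m -> System.out.println(new String(m.getBody())));
            container.start();
        }
    }

With numbers in this ballpark the container keeps re-checking the queue for roughly ten minutes, which should comfortably outlast a rolling restart.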

RabbitMQ Quorum Queue - No new leader found when initial leader goes down

I am doing a POC of RabbitMQ's quorum queues, focusing especially on the fail-over mechanism. In my case I have two nodes (say NodeA and NodeB) and one quorum queue, which resides on NodeA. Whenever I publish a test message to the quorum queue on NodeA, I can see the same message on NodeB.
Now, when I test the failover mechanism by stopping NodeA, I am unable to publish any message and cannot see any messages in the quorum queue, so I think NodeB is not promoted to be the new leader. I assumed the leader would be promoted automatically; do I need to do anything to make the other node the leader?
Kind Regards
Quorum queues do not support two-node clusters, and two-node clusters are strongly recommended against for any cluster: with two replicas, a majority requires both nodes, so losing either node leaves the queue without a quorum.
From the Quorum Queues documentation guide:
A quorum queue requires a quorum of the declared nodes to be available to function.
When a RabbitMQ node hosting a quorum queue's leader fails or is stopped another node
hosting one of that quorum queue's followers will be elected leader and resume operations.
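For completeness, a minimal sketch with the Java client (host and queue name are placeholders) of declaring a quorum queue on a three-node cluster; quorum queues must be durable, and the x-queue-type argument selects the queue type:

    import com.rabbitmq.client.*;
    import java.util.HashMap;
    import java.util.Map;

    public class QuorumDeclare {
        public static void main(String[] argv) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("nodeA"); // placeholder; any cluster node works
            try (Connection conn = factory.newConnection(); Channel ch = conn.createChannel()) {
                Map<String, Object> args = new HashMap<>();
                args.put("x-queue-type", "quorum");
                // durable=true is required for quorum queues; replicas are spread across the cluster.
                ch.queueDeclare("qq-test", true, false, false, args);
            }
        }
    }

With three nodes, a quorum (2 of 3) survives the loss of any single node, so a new leader can be elected.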

Logstash with rabbitmq cluster

I have a 3-node RabbitMQ cluster behind an HAProxy load balancer. When I shut down a node, RabbitMQ successfully fails the queue over to the other nodes. However, I notice that Logstash stops pulling messages from the queue unless I restart it. Is this a problem with the way RabbitMQ operates, i.e. does it deactivate all active consumers? I am not sure if Logstash has any retry capability. Has anyone run into this issue?
Quoting the RabbitMQ documentation, first the clustering page:
What is Replicated? All data/state required for the operation of a
RabbitMQ broker is replicated across all nodes. An exception to this
are message queues, which by default reside on one node, though they
are visible and reachable from all nodes.
and then the high availability page:
Clients that are consuming from a mirrored queue may wish to know that
the queue from which they have been consuming has failed over. When a
mirrored queue fails over, knowledge of which messages have been sent
to which consumer is lost, and therefore all unacknowledged messages
are redelivered with the redelivered flag set. Consumers may wish to
know this is going to happen.
If so, they can consume with the argument x-cancel-on-ha-failover set
to true. Their consuming will then be cancelled on failover and a
consumer cancellation notification sent. It is then the consumer's
responsibility to reissue basic.consume to start consuming again.
So, what does all this mean:
You have to mirror queues
The consumers should use manual ACK
The consumers should reconnect on their own
So the answer to your question is no, it's not a problem with RabbitMQ; that's simply how it works. It's up to the clients to reconnect.
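As a hedged sketch of those three points with the Java client (the queue name is a placeholder), the handleCancel callback fires when the broker cancels the consumer on failover:

    import com.rabbitmq.client.*;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    public class FailoverAwareConsumer {

        static void subscribe(Channel channel) throws IOException {
            Map<String, Object> args = new HashMap<>();
            args.put("x-cancel-on-ha-failover", true); // ask the broker to cancel us on failover
            channel.basicConsume("logstash-queue", false, "", false, false, args,
                    new DefaultConsumer(channel) {
                        @Override
                        public void handleDelivery(String tag, Envelope env,
                                AMQP.BasicProperties props, byte[] body) throws IOException {
                            // process the message, then ack manually
                            channel.basicAck(env.getDeliveryTag(), false);
                        }

                        @Override
                        public void handleCancel(String tag) throws IOException {
                            // broker-initiated cancel: the mirrored queue failed over,
                            // so it is our job to re-issue basic.consume
                            subscribe(channel);
                        }
                    });
        }
    }

In Logstash's case this logic would have to live inside its rabbitmq input plugin; the sketch only shows what a hand-written client is expected to do.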

ActiveMQ consumer connection differs from producer

The following is my ActiveMQ setup:
I have two AMQ brokers which are configured with failover.
I have 40 producers but only one consumer.
Now the problem:
From time to time, one of the producers loses the connection to the master broker. The failover reacts and the producer gets a new connection to the slave, which then receives its messages. So far so good. But the consumer is not affected by the problem; it still consumes messages from the master. It does not know that the slave also holds some messages.
How can I avoid losing the messages that were sent to the slave?
Thanks in advance
I would recommend you configure a network of brokers. That way, your brokers will be connected as well, and it no longer matters which broker your producers and consumers connect to - the messages will get propagated across the network.
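A minimal sketch of what that could look like with embedded brokers (broker names, host names, and ports are placeholders; the same bridge can be declared in activemq.xml with a networkConnector element):

    import org.apache.activemq.broker.BrokerService;
    import org.apache.activemq.network.NetworkConnector;

    public class BrokerA {
        public static void main(String[] args) throws Exception {
            BrokerService broker = new BrokerService();
            broker.setBrokerName("brokerA");
            broker.addConnector("tcp://0.0.0.0:61616"); // transport for clients
            // Bridge to brokerB; duplex=true lets messages flow in both directions.
            NetworkConnector nc = broker.addNetworkConnector("static:(tcp://brokerB:61616)");
            nc.setDuplex(true);
            broker.start();
        }
    }

Once the bridge is up, messages that land on one broker are forwarded toward the broker where the consumer is attached.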

ActiveMQ replicated LevelDB with ZooKeeper

I want to understand ZooKeeper's role in replicated LevelDB for the ActiveMQ broker.
About the ZooKeeper election: out of all the clients connected to ZooKeeper, how does it know which ones are ActiveMQ brokers competing to become master? Is there a particular key or configuration, passed by every broker connecting to ZooKeeper, that says these (let's say 3) ActiveMQ brokers belong to the same environment and are competing to become master?
At what interval does a slave broker copy data from the master broker? Are there any corner cases where data might be lost?
Does ActiveMQ guarantee message ordering with replicated LevelDB? I am talking about the case where re-election of the master happens while a producer is sending messages in sequence to the broker.
Thanks,
Anuj
By the zkPath in the ZooKeeper configuration, and by the broker name.
Each message is synced to a quorum of brokers (nodes/2 + 1, e.g. 2 out of 3) before the transaction completes, so there is no sync interval; it is synced in real time. The cluster will not function unless you have a quorum of brokers online, so there should be no data loss.
The messages are synced to a majority of the nodes synchronously. At re-election, a node with the latest updates will be elected, so ordered messages should be no problem. However, it is generally problematic to rely critically on message order in a message queueing system. As a rule of thumb, message order is only preserved on the happy path: dead letters, multiple consumers and so forth can just as easily disturb it.