Unable to Start RabbitMQ Due to Inconsistent Node - rabbitmq

I'm attempting to start a RabbitMQ node that was disconnected due to an error in setup.
Now I'm unable to start the node because of the inconsistent node error. Everything I read online points to a mnesia directory that holds the node info, but this directory does not exist on my server.
How can I force a node to forget its cluster configuration if the service doesn't start?

My problem was that the node persisted its last known cluster connections.
I had to delete the data in my node's data partition to make it forget them.
Once the data was deleted, I was able to start the node in isolation and then join it as a RAM node to the disk node.
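A minimal sketch of the rejoin step, assuming the node's data has already been cleared and the service has started on its own. The node name `rabbit@disk-node` is a placeholder, not taken from the question. The helper only builds the `rabbitmqctl` command sequence so it can be reviewed before anything is run.

```python
def join_as_ram_commands(disk_node: str) -> list[list[str]]:
    """Return the rabbitmqctl calls that join this node to disk_node as a RAM node."""
    return [
        # Stop the RabbitMQ application but keep the Erlang runtime up.
        ["rabbitmqctl", "stop_app"],
        # Join the existing cluster as a RAM node.
        ["rabbitmqctl", "join_cluster", "--ram", disk_node],
        # Start the application again as a cluster member.
        ["rabbitmqctl", "start_app"],
        # Print the membership so the result can be checked.
        ["rabbitmqctl", "cluster_status"],
    ]


if __name__ == "__main__":
    # "rabbit@disk-node" is a hypothetical node name.
    for command in join_as_ram_commands("rabbit@disk-node"):
        print(" ".join(command))
```

Each printed line can be run by hand on the freshly started node, or passed to `subprocess.run(command, check=True)` once the sequence looks right.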

Related

is it possible to start a Rabbitmq server in standalone mode if it is a node in a cluster?

Is it possible to start a RabbitMQ server without joining the cluster if it is a cluster member?
I want to stop the server and start it again, but without contacting the other members.
If I stop my server and remove it from the cluster using forget_cluster_node, I get this message when I try to start it again:
{:inconsistent_cluster, 'Node rabbit@server3 thinks it\'s clustered with node rabbit@server8, but rabbit@server8 disagrees'}
I don't want to use reset because this cleans everything in my server, I'll lose the messages in the server.

Ignite error upgrading the setup in Kubernetes

After upgrading the Ignite deployment in Kubernetes (EKS) to address the Log4j vulnerability, I get the error below:
[ignite-1] Caused by: class org.apache.ignite.spi.IgniteSpiException: BaselineTopology of joining node (54b55de4-7742-4e82-9212-7158bf51b4a9) is not compatible with BaselineTopology in the cluster. Joining node BlT id (4) is greater than cluster BlT id (3). New BaselineTopology was set on joining node with set-baseline command. Consider cleaning persistent storage of the node and adding it to the cluster again.
The setup is a 3-node cluster with native persistence enabled (PVC). This has happened many times while we have been running Apache Ignite, even though we followed the official guide.
I cannot clean the storage because the pod keeps restarting: by the time I get a shell in the pod, it has crashed and restarted again.
This may be caused by a wrong startup order. Starting the nodes manually in reverse order may resolve it, but I'm not sure whether that is possible in K8s. Another possible cause is baseline auto-adjustment, which can change your baseline unexpectedly; I suggest you turn it off if it's enabled.
One (quite tricky) workaround for cleaning the DB of a failing pod is to replace the Ignite image with a simple image such as plain Debian or Alpine, just to get CLI access, while keeping the same PVC attached. Once you have fixed the persistence issue, set the Ignite image back. The other workaround is to access the underlying PV directly, if possible, and do the surgery in place.
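The image-swap workaround might be sketched as a throwaway pod that mounts the same PVC. Everything named here is an assumption for illustration: the claim name `work-vol-ignite-1`, the pod name, and the mount path are hypothetical, and the failing Ignite pod is assumed to be stopped first so the volume is free to attach. The script prints a JSON manifest, which `kubectl apply -f` accepts.

```python
import json


def debug_pod_manifest(pvc_name: str, mount_path: str = "/persistence") -> dict:
    """Build a Pod manifest that mounts an existing PVC into a plain Alpine shell."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "ignite-storage-debug"},  # hypothetical pod name
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "shell",
                    "image": "alpine",
                    # Keep the container alive long enough to exec into it.
                    "command": ["sleep", "3600"],
                    "volumeMounts": [{"name": "data", "mountPath": mount_path}],
                }
            ],
            "volumes": [
                {
                    "name": "data",
                    # Reuse the claim that the failing Ignite pod was using.
                    "persistentVolumeClaim": {"claimName": pvc_name},
                }
            ],
        },
    }


if __name__ == "__main__":
    # "work-vol-ignite-1" is a placeholder; use the claim bound to the failing pod.
    print(json.dumps(debug_pod_manifest("work-vol-ignite-1"), indent=2))
```

With the pod running, `kubectl exec -it ignite-storage-debug -- sh` gives a shell where the persistence files under the mount path can be inspected or removed before the Ignite image is restored.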

Redis cluster node failure not detected on MISCONF

We currently have a Redis cache cluster with 3 masters and 3 slaves hosted on 3 Windows servers (1 master and 1 slave per server). We are using StackExchange.Redis as our client.
We have RDB disabled but AOF enabled, and we ran into problems with the cluster in the following situation:
One of our servers became full and the redis node on this server was unable to write to the AOF file (the error returned to the client was MISCONF Errors writing to the AOF file: No space left on device).
The cluster did not detect that the node was failing and so did not exclude it from the cluster.
All cache operations were blocked until we freed some space on the server.
We know that we don't need the AOF, so we disabled it after the incident.
But we would like to confirm or refute our understanding of Redis clustering: we expected that if a node experienced a failure, the cluster would redirect all requests to another one. We have tested this with a stopped master node, and a slave was promoted to master, so we are confident that our cluster is working. What we don't understand is why, in our case, the node was not marked as failed.
Is the cluster capable of detecting a node failure when the failure only shows up when a client makes a request to the cluster?

rabbitmq cluster how to change active/active into active/passive mode?

I have set up a 2-node RabbitMQ cluster with a load balancer in front. After setup it was working in active/active mode. Then a network partition happened on one node. I took the failed node out of the cluster and rejoined it, but after that the failed node was not accepting any connections.
Then I tried moving the other node out of the balancer, and the recovered node began to accept connections, so the cluster is now in active/passive mode.
I don't know what caused this. Is there any way to change it back to active/active? And at which step during setup is the mode specified?
Thanks in advance for your advice!
RabbitMQ really (really) doesn't like network partitions. By default, when you have one, everything pauses. In that situation you must fix it manually. Choosing a loser, stopping it, and starting it again should resume everything once it rejoins the cluster.
If that doesn't work, then shut down the failed node, and use rabbitmqctl to "forget_cluster_node", and then rejoin it to the cluster.
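As a rough sketch of that fallback, the helper below lists which command runs on which node. The node names are placeholders, and the `reset` step is an assumption based on the RabbitMQ clustering guide: a forgotten node still believes it is clustered and has to be reset before it can join again, which discards the data on that node.

```python
def forget_and_rejoin_plan(failed: str, survivor: str) -> list[tuple[str, list[str]]]:
    """Return (node to run on, rabbitmqctl command) pairs for a forget/rejoin cycle."""
    return [
        # Run on the healthy node while the failed node is shut down.
        (survivor, ["rabbitmqctl", "forget_cluster_node", failed]),
        # Run on the failed node: wipe its stale cluster state (assumption, see lead-in).
        (failed, ["rabbitmqctl", "reset"]),
        # Run on the failed node: join the healthy node's cluster again.
        (failed, ["rabbitmqctl", "join_cluster", survivor]),
        (failed, ["rabbitmqctl", "start_app"]),
    ]


if __name__ == "__main__":
    # Both node names are hypothetical.
    for node, command in forget_and_rejoin_plan("rabbit@node2", "rabbit@node1"):
        print(f"[{node}] {' '.join(command)}")
```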
You should read https://www.rabbitmq.com/partitions.html very carefully, specifically the section "Recovering from a network partition".
Then read the next few paragraphs even more carefully. There are some automatic recovery modes, each with advantages and disadvantages.
At my company we chose autoheal because we value availability, and accept the possible loss of messages.
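For reference, the recovery mode is selected with the `cluster_partition_handling` key in `rabbitmq.conf`. The small helper below is only an illustrative sketch: the function name is made up, and it covers just the modes that need no extra settings.

```python
# Modes that can be set with a single line (pause_if_all_down needs more keys).
SIMPLE_MODES = ("ignore", "pause_minority", "autoheal")


def partition_handling_line(mode: str) -> str:
    """Render the rabbitmq.conf line that selects a partition handling mode."""
    if mode not in SIMPLE_MODES:
        raise ValueError(f"unsupported mode {mode!r}; expected one of {SIMPLE_MODES}")
    return f"cluster_partition_handling = {mode}"


if __name__ == "__main__":
    print(partition_handling_line("autoheal"))
```

The printed line goes into `rabbitmq.conf` on every node, and the nodes need a restart to pick it up.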

CouchBase 2.5 2 nodes in replica: 1 node fail: the service is no more available

We are testing Couchbase with a two-node cluster with one replica.
When we stop the service on one node, the other one does not respond until we restart the service or manually fail over the stopped node.
Is there a way to keep the service running on the good node when one node is temporarily unavailable?
If a node goes down, you will need to fail it over manually in order to activate the replicas on the other node. If you want this to happen automatically, you can enable auto-failover, but I'm pretty sure that feature requires a cluster of at least three nodes. When you want to add the failed node back, you can just re-add it to the cluster and rebalance.