cluster_formation.classic_config.nodes does not work in RabbitMQ

I have 2 RabbitMQ nodes.
Their node names are rabbit@testhost1 and rabbit@testhost2.
I'd like them to cluster automatically.
On testhost1
# cat /etc/rabbitmq/rabbitmq.conf
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config
cluster_formation.classic_config.nodes.1 = rabbit@testhost1
cluster_formation.classic_config.nodes.2 = rabbit@testhost2
On testhost2
# cat /etc/rabbitmq/rabbitmq.conf
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config
cluster_formation.classic_config.nodes.1 = rabbit@testhost1
cluster_formation.classic_config.nodes.2 = rabbit@testhost2
I start rabbit@testhost1 first and then rabbit@testhost2.
The second node didn't join the first node's cluster.
However, node rabbit@testhost1 can join rabbit@testhost2 with the rabbitmqctl command: rabbitmqctl join_cluster rabbit@testhost2.
So the network between them should not be the problem.
Could you give me some idea why they can't form a cluster? Is the configuration not correct?
I have enabled debug logging, and the info related to rabbit_peer_discovery_classic_config is very sparse:
2019-01-28 16:56:47.913 [info] <0.250.0> Peer discovery backend rabbit_peer_discovery_classic_config does not support registration, skipping registration.
The RabbitMQ version is 3.7.8.

Did you start the nodes without cluster config before you attempted clustering?
I had started the individual peers with the default configuration once before I added cluster formation settings to the config file. A node started without clustering config seems to form a cluster of its own, and on further starts it will only contact its last known cluster (itself).
From https://www.rabbitmq.com/cluster-formation.html
How Peer Discovery Works
When a node starts and detects it doesn't have a previously initialised database, it will check if there's a peer discovery mechanism configured. If that's the case, it will then perform the discovery and attempt to contact each discovered peer in order. Finally, it will attempt to join the cluster of the first reachable peer.
You should be able to reset the node with rabbitmqctl reset (Warning: This removes all data from the management database, such as configured users and vhosts, and deletes all persistent messages along with clustering information.) and then use clustering config.
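Building on that, a minimal reset-and-rejoin sequence might look like the following. This is only a sketch, not an exact prescription: it assumes the node names from the question, and rabbitmqctl reset wipes all data on the node it is run against.

```shell
# On testhost2 (the node that should join), after the
# cluster_formation settings are in /etc/rabbitmq/rabbitmq.conf:
rabbitmqctl stop_app
rabbitmqctl reset          # wipes users, vhosts, messages and cluster state
rabbitmqctl start_app      # peer discovery runs again on the now-blank node
rabbitmqctl cluster_status # should list both rabbit@testhost1 and rabbit@testhost2
```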

Related

How to setup multiple RabbitMQ nodes on a single machine

I was following this guide and have been unable to set up multiple nodes on my machine. I think it might be that rabbitmqctl keeps reading my rabbitmq config file rather than the inline env variables, but I'm unsure.
Here is my error output:
##########/bin/zsh: rabbitmqctl -n hare stop_app
Stopping rabbit application on node hare########## ...
Error: unable to perform an operation on node 'hare########'. Please see diagnostics information and suggestions below.
Most common reasons for this are:
* Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
* CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
* Target node is not running
In addition to the diagnostics info below:
* See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more
* Consult server logs on node hare############
* If target node is configured to use long node names, don't forget to use --longnames with CLI tools
DIAGNOSTICS
===========
attempted to contact: ['hare##########']
hare########:
* connected to epmd (port 4369) on ##########
* epmd reports: node 'hare' not running at all
other nodes on ###########:
* suggestion: start the node
Current node details:
* node name: 'rabbitmqcli-413-rabbit############'
* effective user's home directory: /Users/########
* Erlang cookie hash: #########
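For reference, the usual approach for running a second node on one machine is to give it its own node name and ports via environment variables, and then pass the same -n name to every rabbitmqctl call. A sketch under those assumptions (the node name hare matches the question; the port numbers are illustrative):

```shell
# start a second node alongside the default one
RABBITMQ_NODENAME=hare \
RABBITMQ_NODE_PORT=5673 \
RABBITMQ_SERVER_START_ARGS="-rabbitmq_management listener [{port,15673}]" \
rabbitmq-server -detached

# every CLI call must name the same node
rabbitmqctl -n hare status
rabbitmqctl -n hare stop_app
```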

TCP connection succeeded, Erlang distribution failed

We installed the Erlang VM (erlang-23.2.1-1.el7.x86_64.rpm) and RabbitMQ server (rabbitmq-server-3.8.19-1.el7.noarch.rpm) on 3 different machines and were able to start a standalone RabbitMQ server on each of them. But when we tried to cluster these RabbitMQ nodes, we got an "Erlang distribution failed" error. Googling suggests it might be due to an Erlang cookie mismatch. Can anyone help us solve this mismatch, if that is the root cause?
Error message :
Error: unable to perform an operation on node 'rabbit@keng03-dev01-ins01-dmq67-app-1627533565-1'. Please see diagnostics information and suggestions below.
The most common reasons for this are:
Target node is unreachable (e.g. due to hostname resolution, TCP connection, or firewall issues)
CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
Target node is not running
In addition to the diagnostics info below:
See the CLI, clustering, and networking guides on https://rabbitmq.com/documentation.html to learn more
Consult server logs on node rabbit@keng03-dev01-ins01-dmq67-app-1627533565-1
If a target node is configured to use long node names, don't forget to use --longnames with CLI tools
DIAGNOSTICS
attempted to contact: ['rabbit@keng03-dev01-ins01-dmq67-app-1627533565-1']
rabbit@keng03-dev01-ins01-dmq67-app-1627533565-1:
connected to epmd (port 4369) on keng03-dev01-ins01-dmq67-app-1627533565-1
epmd reports node 'rabbit' uses port 25672 for inter-node and CLI tool traffic
TCP connection succeeded but Erlang distribution failed
suggestion: check if the Erlang cookie is identical for all server nodes and CLI tools
suggestion: check if all server nodes and CLI tools use consistent hostnames when addressing each other
suggestion: check if inter-node connections may be configured to use TLS. If so, all nodes and CLI tools must do that
suggestion: see the CLI, clustering, and networking guides on https://rabbitmq.com/documentation.html to learn more
Current node details:
node name: 'rabbitmqcli-616-rabbit@keng03-dev01-ins01-dmq67-app-1627533565-2'
effective user's home directory: /var/lib/rabbitmq
Erlang cookie hash: AFJEXwyuc44Sp8oYi00SOw==
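If the cookie hash really does differ between nodes (or between a node and the CLI), aligning the cookie files usually clears this symptom. A sketch, assuming the default cookie location for RPM installs; adjust paths to your environment:

```shell
# the server-side cookie on RPM installs lives here:
cat /var/lib/rabbitmq/.erlang.cookie
# copy node 1's cookie file to the same path on the other nodes,
# then restore ownership and permissions and restart:
chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
chmod 600 /var/lib/rabbitmq/.erlang.cookie
systemctl restart rabbitmq-server
```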
I had the same error description. In my case the Erlang cookies matched among the cluster nodes, but I ran into case sensitivity with the rabbitmqctl join_cluster command.
With an elevated command prompt on host 2179NBXXXDP,
this failed: rabbitmqctl join_cluster rabbit@2179ASXXX02
and this worked: rabbitmqctl join_cluster rabbit@2179asxxx02
(The hostname of the latter indeed turned out to be lowercase in my case.)
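Erlang node names are case-sensitive, so it can help to build the target node name from the peer's lowercased hostname instead of typing it by hand. A small sketch (the hostname is the one from this answer):

```shell
# lowercase a hostname before building the node name
HOST=2179ASXXX02
NODE="rabbit@$(echo "$HOST" | tr '[:upper:]' '[:lower:]')"
echo "$NODE"   # prints rabbit@2179asxxx02
```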

rabbitmq TCP connection succeeded but Erlang distribution failed

I'm getting the below error while joining the cluster:
root@hostname02:~# rabbitmqctl join_cluster --longnames rabbit@hostname01
Error: unable to perform an operation on node 'rabbit@hostname02'. Please see diagnostics information and suggestions below.
Most common reasons for this are:
* Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
* CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
* Target node is not running
In addition to the diagnostics info below:
* See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more
* Consult server logs on node rabbit@hostname02
* If target node is configured to use long node names, don't forget to use --longnames with CLI tools
DIAGNOSTICS
===========
attempted to contact: ['rabbit@hostname02']
rabbit@hostname02:
* connected to epmd (port 4369) on hostname02
* epmd reports node 'rabbit' uses port 25672 for inter-node and CLI tool traffic
* TCP connection succeeded but Erlang distribution failed
* Node name (or hostname) mismatch: node "rabbit@hostname02" believes its node name is not "rabbit@hostname02" but something else.
All nodes and CLI tools must refer to node "rabbit@hostname02" using the same name the node itself uses (see its logs to find out what it is)
Current node details:
* node name: 'rabbitmqcli-7548-rabbit@hostname02'
* effective user's home directory: /var/lib/rabbitmq
* Erlang cookie hash: Uaa+OhOna5fm+J0oPGPAiw==
root@hostname02:~#
The .erlang.cookie files on both servers are the same.
I cannot see what I am missing here. Can someone help me resolve it?
Check the --longnames option. Erlang nodes (and thus their distribution protocol) only need long names if there is a dot in the host part.
Also, check the node name reported by RabbitMQ on startup, and verify that it matches rabbit@hostname02.
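To see the exact name the node itself uses (which is what the mismatch message is about), you can ask the running node directly; a sketch using standard diagnostics commands:

```shell
# the first status line reports the node's own name, e.g. rabbit@hostname02
rabbitmq-diagnostics status
# or ask the Erlang VM directly for its node name:
rabbitmqctl eval 'node().'
```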

GoogleCloud Kubernetes node cannot connect to Redis Memorystore possibly due to overlap in IP ranges

I have a GoogleCloud Kubernetes cluster consisting of multiple nodes, and a GoogleCloud Redis Memorystore. Distributed over these nodes are replicas of a pod containing a container that needs to connect to the Redis Memorystore. I have noticed that one of the nodes cannot connect to Redis, i.e. no container in a pod on that node can connect to Redis.
The Redis Memorystore has the following properties:
IP address: 10.0.6.12
Instance IP address range: 10.0.6.8/29 (10.0.6.8 - 10.0.6.15)
The node from which no connection to Redis can be made has the following properties:
Internal IP: 10.132.0.5
PodCIDR: 10.0.6.0/24 (10.0.6.0 - 10.0.6.255)
I assume this problem is caused by the overlap in IP ranges of the Memorystore and this node. Is this assumption correct?
If this is the problem I would like to change the IP range of the node.
I have tried to do this by editing spec.podCIDR in the node config:
$ kubectl edit node <node-name>
However this did not work and resulted in the error message:
# * spec.podCIDR: Forbidden: node updates may not change podCIDR except from "" to valid
# * []: Forbidden: node updates may only change labels, taints, or capacity (or configSource, if the DynamicKubeletConfig feature gate is enabled)
Is there another way to change the IP range of an existing Kubernetes node? If so, how?
Sometimes I need to temporarily increase the number of pods in a cluster. When I do this I want to prevent Kubernetes from creating a new node with the IP range 10.0.6.0/24.
Is it possible to tell the Kubernetes cluster to not create new nodes with the IP range 10.0.6.0/24? If so, how?
Thanks in advance!
Not for a node. The podCIDR gets defined when you install your network overlay, in the initial steps of setting up a new cluster.
Yes for the cluster, but it's not that easy. You have to change the podCIDR for the network overlay in your whole cluster. It's a tricky process that can be done, but if you are doing that you might as well deploy a new cluster. Keep in mind that some network overlays expect a very specific podCIDR; for example, Calico defaults to 192.168.0.0/16.
You could:
Create a new cluster with a new cidr and move your workloads gradually.
Change the IP address cidr where your GoogleCloud Redis Memorystore lives.
Hope it helps!
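If you take the new-cluster route, GKE lets you choose the pod range at creation time, so you can pick one that cannot overlap the Memorystore range (10.0.6.8/29 in the question). A sketch with gcloud; the cluster name, zone, and CIDR below are placeholders:

```shell
# create a GKE cluster with an explicit, non-overlapping pod CIDR
gcloud container clusters create my-cluster \
  --zone europe-west1-b \
  --cluster-ipv4-cidr 10.8.0.0/14
```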

How to configure RabbitMQ using Active/Passive High Availability architecture

I'm trying to set up a cluster of RabbitMQ servers, to get highly available queues using an active/passive server architecture. I'm following these guides:
http://www.rabbitmq.com/clustering.html
http://www.rabbitmq.com/ha.html
http://karlgrz.com/rabbitmq-highly-available-queues-and-clustering-using-amazon-ec2/
My requirement for high availability is simple: I have two nodes (CentOS 6.4) with RabbitMQ (v3.2) and Erlang R15B03. Node1 must be the "active" node, responding to all requests, and Node2 must be the "passive" node that has all the queues and messages replicated (from Node1).
To do that, I have configured the following:
Node1 with RabbitMQ working fine in non-cluster mode
Node2 with RabbitMQ working fine in non-cluster mode
The next thing I did was create a cluster between both nodes, joining Node2 to Node1 (guide 1). After that, I configured a policy to mirror the queues (guide 2), replicating all queues and messages among all the nodes in the cluster. This works: I can connect to either node and publish or consume messages while both nodes are available.
The problem occurs when I have a queue "queueA" that was created on Node1 (the master for queueA): when Node1 is stopped, I can't connect to queueA on Node2 to produce or consume messages. Node2 throws an error saying that Node1 is not accessible (I think that queueA is not replicated to Node2, and Node2 can't be promoted to master for queueA).
The error is:
{"The AMQP operation was interrupted: AMQP close-reason, initiated by
Peer, code=404, text=\"NOT_FOUND - home node 'rabbit@node1' of durable
queue 'queueA' in vhost 'app01' is down or inaccessible\", classId=50,
methodId=10, cause="}
The sequence of steps used is:
Node1:
1. rabbitmq-server -detached
2. rabbitmqctl start_app
Node2:
3. Copy .erlang.cookie from Node1 to Node2
4. rabbitmq-server -detached
Join the cluster (Node2):
5. rabbitmqctl stop_app
6. rabbitmqctl join_cluster rabbit@node1
7. rabbitmqctl start_app
Configure Queue mirroring policy:
8. rabbitmqctl set_policy ha-all ".*" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
Note: The pattern used for queue names is ".*" (all queues).
When I run 'rabbitmqctl list_policies' and 'rabbitmqctl cluster_status', everything looks OK.
Why the Node2 cannot respond if Node1 is unavailable? Is there something wrong in this setup?
You haven't specified the virtual host (app01) in your set_policy call, so the policy will only apply to the default virtual host (/). This command line should work:
rabbitmqctl set_policy -p app01 ha-all ".*" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
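You can confirm the policy landed in the right vhost and was actually applied to the queues; a quick check (sketch):

```shell
# the policy should appear under the app01 vhost, not under /
rabbitmqctl list_policies -p app01
# each queue should report the ha-all policy in its "policy" column
rabbitmqctl list_queues -p app01 name policy
```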
In the web management console, is queueA listed as Node1 +1?
It sounds like there might be some issue with your setup. I've got a set of Vagrant boxes that are pre-configured to work in a cluster; it might be worth trying those and identifying the issues in your setup.
Only mirrors which are synchronised with the master are promoted to master after a failure. This is the default behavior, but it can be changed by setting the ha-promote-on-shutdown policy key to "always".
Read your reference carefully:
http://www.rabbitmq.com/ha.html
You could use a cluster of RabbitMQ nodes to construct your RabbitMQ
broker. This will be resilient to the loss of individual nodes in
terms of the overall availability of service, but some important
caveats apply: whilst exchanges and bindings survive the loss of
individual nodes, queues and their messages do not. This is because a
queue and its contents reside on exactly one node, thus the loss of a
node will render its queues unavailable.
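Whether a mirror can take over cleanly depends on it being synchronised, which you can inspect per queue; a sketch (the column names are those used by classic mirrored queues in RabbitMQ 3.x):

```shell
# slave_pids lists the mirrors; synchronised_slave_pids shows which
# of them could be promoted safely if the master node goes down
rabbitmqctl list_queues -p app01 name pid slave_pids synchronised_slave_pids
```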
Make sure that your queue is not durable or exclusive.
From the documentation (https://www.rabbitmq.com/ha.html):
Exclusive queues will be deleted when the connection that declared them is closed. For this reason, it is not useful for an exclusive
queue to be mirrored (or durable for that matter) since when the node
hosting it goes down, the connection will close and the queue will
need to be deleted anyway.
For this reason, exclusive queues are never mirrored (even if they
match a policy stating that they should be). They are also never
durable (even if declared as such).
From your error message:
{"The AMQP operation was interrupted: AMQP close-reason, initiated by
Peer, code=404, text=\"NOT_FOUND - home node 'rabbit@node1' of
durable queue 'queueA' in vhost 'app01' is down or inaccessible\", classId=50, methodId=10, cause="}
It looks like you created a durable queue.