RabbitMQ - AWS EC2 Clustering hell

RabbitMQ - AWS EC2 Clustering hell - rabbitmq

Sorry, should be shot for having to even ask this, but wasted day on this - and feel like I've read everything there is.
I can't create a cluster on my EC2 instances (3) that are spread on three different regions. The hosts:
rabbit#ip-172-31-47-217
rabbit#ip-172-31-1-82
rabbit#ip-172-31-36-111
The initial state before trying to make the cluster:
ubuntu#ip-172-31-47-217:~$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit#ip-172-31-47-217' ...
[{nodes,[{disc,['rabbit#ip-172-31-47-217']}]},
{running_nodes,['rabbit#ip-172-31-47-217']},
{partitions,[]}]
ubuntu#ip-172-31-36-111:~$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit#ip-172-31-36-111' ...
[{nodes,[{disc,['rabbit#ip-172-31-36-111']}]},
{running_nodes,['rabbit#ip-172-31-36-111']},
{partitions,[]}]
ubuntu#ip-172-31-1-82:~$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit#ip-172-31-1-82' ...
[{nodes,[{disc,['rabbit#ip-172-31-1-82']}]},
{running_nodes,['rabbit#ip-172-31-1-82']},
{partitions,[]}]
When I try to check status from one server for another:
sudo rabbitmqctl status -n rabbit#ip-172-31-1-82
Status of node 'rabbit#ip-172-31-1-82' ...
Error: unable to connect to node 'rabbit#ip-172-31-1-82': nodedown
nodes in question: ['rabbit#ip-172-31-1-82']
hosts, their running nodes and ports:
- unable to connect to epmd on ip-172-31-1-82: timeout (timed out)
current node details:
- node name: 'rabbitmqctl3835#ip-172-31-36-111'
- home dir: /var/lib/rabbitmq
- cookie hash: 0tsf/OyQZI7zobmv1Ia97w==
All three servers have the same erlang cookie hash.
I can verify the host names are setup properly:
host ip-172-31-36-111
ip-172-31-36-111.us-west-2.compute.internal has address 172.31.36.111
I know the ports are open:
netstat -plten | grep beam
Because I opened all TCP and UDP at this point as a test, no change.
and finally if this would behave differently given those failures:
sudo rabbitmqctl join_cluster --ram rabbit#ip-172-31-1-82
Clustering node 'rabbit#ip-172-31-47-217' with 'rabbit#ip-172-31-1-82' ...
Error: {cannot_discover_cluster,"The nodes provided are either offline or not running"}
Please help, being driven insane by this.

The problem is that they are in different regions (presumably in EC2-classic - you didn't mention whether you were using a VPC). This means they cannot communicate via their private IPs (see e.g. Can EC2 instances in different regions communicate over their private IP addresses?)
ping 172.31.36.111
will fail from one of the other servers, for example. Pinging using the hostname probably will probably even fail on the DNS lookup.
Your options are:
Put them in separate zones in a single region (in EC2 classic, they will be able to communicate). You could also use a VPC in this case, putting the in separate subnets but allowing interconnections via appropriately set up security groups.
Set up /etc/hosts on each server to point the relevant public IPs of the other servers (you could attach elastic IPs to each server to ensure stability across server restarts). You could also set the hostname of each server for clarity. Set you your security groups to allow access on the relevant ports that rabbitmq uses. There may be security implications of doing this, since the data will be travelling over the public internet.
Set up a VPN between each server in the cluster. Amazon VPC has a VPN facility, but there are ways of setting it up yourself I think.
I think only option 1 is simplest. Option 2 has major security implications (I believe there are ways of securing the connection between the cluster servers, but they aren't documented on the rabbitmq website as far as I can tell). Option 3 is complex but probably the best option if you need multiple regions.
Note that rabbitmq clusters aren't meant to be run over wide geographical areas, since they aren't too reliable in the face of network partitions. See here: https://www.rabbitmq.com/clustering.html

Related

RabbitMQ cluster on a single machine

I want to create a three node RabbitMQ cluster on a single RHEL8 machine for testing purposes. I tried instructions given in RabbitMQ official guide and also tried to follow this guide.
The first node works fine and it's running. However, the second node cannot be started and throws up an error.
I used below commands as mentioned in the guide.
RABBITMQ_NODE_PORT=5672 RABBITMQ_NODENAME=rabbit rabbitmq-server -detached
RABBITMQ_NODE_PORT=5673 RABBITMQ_NODENAME=hare rabbitmq-server -detached
rabbitmqctl -n hare stop_app
This command throws up below error.
DIAGNOSTICS
attempted to contact: [hare#localhost]
hare#localhost:
connected to epmd (port 4369) on localhost
epmd reports: node 'hare' not running at all
other nodes on localhost: [rabbit]
On further inspection of logs, it seems like that this node tries to use the same ports used by the first node (e.g. MQTT port 1883).
I think I might have to use the other option of declaring /etc/rabbitmq/rabbitmq.conf. Mainly because it seems to give more options to change ports etc.
A sample config file resembling the one needed in my case or a link to a proper guide is highly appreciated.

You didn't specify, but you must have the MQTT plugin enabled for there to be a conflict on that port, correct?
The easiest work-around would be to have two configuration files specifying different ports for MQTT, AMQP and anything else. Then, use the RABBITMQ_CONFIG_FILE environment variable to point to the appropriate file:
RABBITMQ_NODE_PORT=5672 RABBITMQ_NODENAME=rabbit0 \
RABBITMQ_CONFIG_FILE=/path/to/rabbitmq-0.conf rabbitmq-server -detached
RABBITMQ_NODE_PORT=5673 RABBITMQ_NODENAME=rabbit1 \
RABBITMQ_CONFIG_FILE=/path/to/rabbitmq-1.conf rabbitmq-server -detached
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.

How to ssh from k8s pod to other computers

Edit: Before you down vote, please comment why you down vote. so I can improve next time, thank you.
I tried to ssh from pod in kubernetes to another VM in GCE, mainly because I want to use rsync between these two. At the moment, I use gcloud compute scp to copy file to local computer then kubectl cp.
I used kubectl exec to access the pod, setting up ssh with ssh-keygen then copy rsa_id.pub to designated VM to /home/user/.ssh/, but when I try ssh -v user#ip it just said error connection timed out.
I tried setup gcloud inside pods and to be able to use gcloud compute ssh and I also tried gcloud compute config-ssh, the results are the same.
When I ssh with my own computer it works fine
I think firewall or network configuration is causing this problem but I'm not really sure how to fix it. Should I expose ssh port with k8s service LoadBalancer or should I edit my firewall rules in VPC network?

Beginning with Kubernetes version 1.9.x, automatic firewall rules have changed such that workloads in your Kubernetes Engine cluster cannot communicate with other Compute Engine VMs that are on the same network, but outside the cluster. This change was made for security reasons.
You can find the solution HERE
First, find your cluster's network:
gcloud container clusters describe [CLUSTER_NAME] --format=get"(network)"
Then get the cluster's IPv4 CIDR used for the containers:
gcloud container clusters describe [CLUSTER_NAME] --format=get"(clusterIpv4Cidr)"
Finally create a firewall rule for the network, with the CIDR as the source range, and allow all protocols:
gcloud compute firewall-rules create "[CLUSTER_NAME]-to-all-vms-on-network" --network="[NETWORK]" --source-ranges="[CLUSTER_IPV4_CIDR]" --allow=tcp,udp,icmp,esp,ah,sctp

Connect a new node via SSH tunnel

I'm attempting to move data from one cluster to a new cluster without downtime.
So far, I have one node on the old cluster (this is a test setup) that's on a private subnet. I have a bastion host on that network that I connect through. My new cluster also has one node, and again, is behind a bastion, so inaccessible directly.
I was hoping I'd be able to create an SSH tunnel from my new cluster machine that connects to the machine in the old cluster (so something like localhost:9300 ==> remote:9300), and then be able to set the following in my elasticsearch.yml:
discovery.zen.ping.unicast.hosts: ["127.0.0.1"]
network.publish_host: 127.0.0.1
network.bind_host: 127.0.0.1
I've tried this, but all I'm seeing is bound_address {inet[/127.0.0.1:9301]}, publish_address {inet[/127.0.0.1:9301]}
Has anyone managed to connect two clusters over an SSH tunnel with success?

Node address in Infinispan

I start up an infinispan cache which joins to a cluster. It is the only cache in the cluster.
Now, I connect using JMX to see what ports are being used.
I click on:
CacheManager / MyCache/ CacheManager/ Attributes
Under clusterMembers, I see [mymachine-54202]
Thinking 54202 is the port, I do both a
lsof -i udp
lsof -i tcp
I am on a mac and I can't see anything on 54202. What does 54202 correspond to then?
Thanks

It's a random number to differentiate between multiple caches running on the same box.
For more details, see http://docs.jboss.org/infinispan/5.0/apidocs/config.html#ce_global_transport

Rabbit will not cluster on ec2

I am having server issues with getting rabbit to cluster.
I boot up two nodes on ec2.
On the the first node booted I do this.
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl start_app
I boot another node.
sudo service rabbitmq-server stop
#Copy cookie from the first server booted
sudo su - -c 'echo -n "cookie" > /var/lib/rabbitmq/.erlang.cookie'
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl cluster rabbit#server1
1) sever1 is running
2) What ports to need open? I have 22, 4369, 5672
sudo rabbitmqctl cluster rabbit#aws-rabbit-server-east-development-20121102162143
Clustering node 'rabbit#aws-rabbit-server-east-development-20121103033005' with ['rabbit#aws-rabbit-server-east-development-20121102162143'] ...
Error: {no_running_cluster_nodes,['rabbit#aws-rabbit-server-east-development-20121102162143'],
['rabbit#aws-rabbit-server-east-development-20121102162143']}
What could possibility be missing from there docs or what what am I missing?

I had a similar problem on EC2 with two windows machines. I eventually got it working but I'm not sure I did it in the correct way so there may be a better solution.
The issue I found was that the two nodes could not see each other when trying to cluster. Each time you start a Rabbit node it seemed to be assigned a port number dynamically.
This obviously makes it very difficult to know which port to open up in the security group so to solve this, I restricted the range of ports Rabbit chose from when assigning the port. I restricted this to a range of 1 port on each node so I always know which port was being assigned.
The easiest way I found to do this was by editing the sbin\rabbitmq-service.bat file.
find the line -kernel inet_default_connect_options "[{nodelay,true}]" ^
add the following two lines to the file underneath:
-kernel inet_dist_listen_min ##### ^
-kernel inet_dist_listen_max ##### ^
replacing ##### with your chosen port number.
So you should now open up the following ports:
5672 - RabbitMQ’s listening port
4369 - Erlang Port Mapper Daemon
##### - the chosen port number for the Erlang nodes to communicate via
Because Erlang does not recognise FQDNs you may need to modify the hosts file on all the servers to make sure they are all able to resolve all the Erlang node name to an IP address, e.g.
123.123.123.111 NODE1
123.123.123.222 NODE2
once this is done you should then be able to see each node from the other. you can do this by using calling the following from the command line (replacing rabbit#NODE2 with whichever node you want to see)
rabbitmqctl status -n rabbit#NODE2
Hope this give you some help, I'm no expert but found this got things working for me!

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas