Health checks for rabbitmq - rabbitmq

What kind of health checks can be configured for rabbitmq in openshift?
I wasn't able to configure rabbitmqctl as suggested in [1] or [2] as it delivers errors when used with the images registry.centos.org/centos/rabbitmq or luiscoms/openshift-rabbitmq
Example:
$ rabbitmqctl cluster_status
Error: could not recognise command
...
Only root or rabbitmq should run rabbitmqctl
[1] https://github.com/docker-library/rabbitmq/pull/174#issuecomment-452002696
[2] https://github.com/rabbitmq/rabbitmq-peer-discovery-k8s/blob/master/examples/k8s_statefulsets/rabbitmq_statefulsets.yaml

Related

How connect to MSK cluster from EKS cluster

I am having difficulties connecting to my MSK cluster from my EKS cluster even though both clusters share the same VPC and the same subnets.
The security group used by the MSK cluster has the following inbound rules
type
protocol
port range
source
all traffic
all
all
custom
SG_ID
all traffic
all
all
anywhere ipv4
0.0.0.0/0
Where SG_ID is the EKS' Cluster security group.
The one labeled: EKS created security group applied...
In the EKS cluster, I am using the following commands to test connectivity:
kubectl run kafka-consumer \
-ti \
--image=quay.io/strimzi/kafka:latest-kafka-2.8.1 \
--rm=true \
--restart=Never \
-- bin/kafka-topics.sh --create --topic test --bootstrap-server b-1.test.z35y0w.c4.kafka.us-east-1.amazonaws.com:9092 --replication-factor 2 --partitions 1 --if-not-exists
With the following result
Error while executing topic command : Call(callName=createTopics, deadlineMs=1635906680860, tries=1, nextAllowedTryMs=1635906680961) timed out at 1635906680861 after 1 attempt(s)
[2021-11-03 02:31:20,865] ERROR org.apache.kafka.common.errors.TimeoutException: Call(callName=createTopics, deadlineMs=1635906680860, tries=1, nextAllowedTryMs=1635906680961) timed out at 1635906680861 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: createTopics
(kafka.admin.TopicCommand$)
pod "kafka-consumer" deleted
pod default/kafka-consumer terminated (Error)
Sadly, the second bootstrap server displayed on the MSK Page gives the same result.
nc eventually times out
kubectl run busybox -ti --image=busybox --rm=true --restart=Never -- nc b-2.test.z35y0w.c4.kafka.us-east-1.amazonaws.com
nslookup fails as well
kubectl run busybox -ti --image=busybox --rm=true --restart=Never -- nslookup b-2.test.z35y0w.c4.kafka.us-east-1.amazonaws.com
If you don't see a command prompt, try pressing enter.
*** Can't find b-2.test.z35y0w.c4.kafka.us-east-1.amazonaws.com: No answer
Could anyone please give me a hint?
Thanks
I need to connect MSK from my EKS pod. So I searched this doc, I want to share my solution, hope can help others:
This my config file:
root#kain:~/work# cat kafkaconfig
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
This is my command:
./kafka-topics.sh --list --bootstrap-server <My MSK bootstrap server>:9098 --command-config ./kafkaconfig
For this command, there are 2 preconditions we need to make sure,
one is you have access to aws msk, (I access MSK from my eks pod, and my eks pod has OIDC to access the AWS).
Second is we need to has AWS auth jar file: aws-msk-iam-auth.jar
address: https://github.com/aws/aws-msk-iam-auth/releases
put it to kafkaclient libs directory or export CLASSPATH=/aws-msk-iam-auth-1.1.4-all.jar
reference doc: https://aws.amazon.com/blogs/big-data/securing-apache-kafka-is-easy-and-familiar-with-iam-access-control-for-amazon-msk/

Create vHost on Windows RabbitMQ node using command line

I have a RabbitMQ node on windows operating system. I want to create vhost on that node from command line of using a script with minimal pre-requisites.
EDIT: I tried to use the rabbitmqctl add_vhost but I always get an error.
rabbitmqctl add_vhost my_vhost
and
rabbitmqctl set_permissions -p my_vhost guest ".*" ".*" ".*"
I suggest to read this: https://www.rabbitmq.com/man/rabbitmqctl.1.man.html
So you have another error, about the node down read here
RabbitMQ has Nodedown Error

Auto reconnect to RabbitMQ cluster after server restart

I have master-slave configuration of RabbitMQ. As two Docker containers, with dynamic internal IP (changed on every restart).
Clustering works fine on clean run, but if one of servers got restarted it cannot reconnect to the cluster:
rabbitmqctl join_cluster --ram rabbit#master
Clustering node 'rabbit#slave' with 'rabbit#master' ...
Error: {ok,already_member}
And following:
rabbitmqctl cluster_status
Cluster status of node 'rabbit#slave' ...
[{nodes,[{disc,['rabbit#slave']}]}]
says that node not in a cluster.
Only way I found it remove this node, and only then try to rejoin cluster, like:
rabbitmqctl -n rabbit#master forget_cluster_node rabbit#slave
rabbitmqctl join_cluster --ram rabbit#master
That works, but doesn't look good for me. I believe there should be better way to rejoin cluster, than forgetting and join again. I see there is a command update_cluster_nodes also, but seems that this something different, not sure if it could help.
What is correct way to rejoin cluster on container restart?
I realize that this has been opened for a year but I though I would answer just in case it might help someone.
I believe that this issue has been resolved in a recent RabbitMQ release.
I implemented a Dockerized RabbitMQ Cluster using the Rabbit management 3.6.5 image and my nodes are able to auto rejoin the cluster on container or Docker host restart.

rabbitmq cluster join does not work

I have a rabbitmq cluster with 2 nodes. Node A and B. Node A is up and running. Everytime I run the following commadn on node A I get:
./rabbitmqctl cluster_status
Cluster status of node rabbit#A ...
[{nodes,[{disc,[rabbit#A,rabbit#B]}]},
{running_nodes,[rabbit#A]},
{partitions,[]}]
...done.
Interestingly node B is up and running. Everytime I have it join the other node (A) to get it clustered it states:
rabbitmqctl join_cluster rabbit#A
...done (already_member).
rabbitmqctl cluster_status
Cluster status of node rabbit#B ...
[{nodes,[{disc,[rabbit#B]}]}]
...done.
So somehow node A cannot see B. And on be the "already_member" does not seem to be reflected from the cluster_status command...
I can check the queues on both nodes and they are different. Node A has dozens of queues and node B none, therefore it is clear the cluster is not established. Both node A and B can ping each other and nothing gets reported in the rabbitmq's logs
Any idea how this is not working ?
In case of cluster I will suggest you to go for Load Balancer . Make sure you already set HA Policy for your cluster.
To set HA Policy
$ rabbitmqctl set_policy ha-all "" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
More here RabbitMQ Cluster
I was able to solve this same problem. Given NodeA is the parent and NodeB is trying to join the cluster.
Stop NodeB app rabbitmqctl stop_app
On NodeA forget cluster node rabbitmqctl forget_cluster_node rabbit#NodeB
Reset NodeB rabbitmqctl reset
Join NodeB to the cluster rabbitmqctl join_cluster rabbit#NodeA
Start NodeB rabbitmqctl start_app

Rabbit will not cluster on ec2

I am having server issues with getting rabbit to cluster.
I boot up two nodes on ec2.
On the the first node booted I do this.
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl start_app
I boot another node.
sudo service rabbitmq-server stop
#Copy cookie from the first server booted
sudo su - -c 'echo -n "cookie" > /var/lib/rabbitmq/.erlang.cookie'
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl cluster rabbit#server1
1) sever1 is running
2) What ports to need open? I have 22, 4369, 5672
sudo rabbitmqctl cluster rabbit#aws-rabbit-server-east-development-20121102162143
Clustering node 'rabbit#aws-rabbit-server-east-development-20121103033005' with ['rabbit#aws-rabbit-server-east-development-20121102162143'] ...
Error: {no_running_cluster_nodes,['rabbit#aws-rabbit-server-east-development-20121102162143'],
['rabbit#aws-rabbit-server-east-development-20121102162143']}
What could possibility be missing from there docs or what what am I missing?
I had a similar problem on EC2 with two windows machines. I eventually got it working but I'm not sure I did it in the correct way so there may be a better solution.
The issue I found was that the two nodes could not see each other when trying to cluster. Each time you start a Rabbit node it seemed to be assigned a port number dynamically.
This obviously makes it very difficult to know which port to open up in the security group so to solve this, I restricted the range of ports Rabbit chose from when assigning the port. I restricted this to a range of 1 port on each node so I always know which port was being assigned.
The easiest way I found to do this was by editing the sbin\rabbitmq-service.bat file.
find the line -kernel inet_default_connect_options "[{nodelay,true}]" ^
add the following two lines to the file underneath:
-kernel inet_dist_listen_min ##### ^
-kernel inet_dist_listen_max ##### ^
replacing ##### with your chosen port number.
So you should now open up the following ports:
5672 - RabbitMQ’s listening port
4369 - Erlang Port Mapper Daemon
##### - the chosen port number for the Erlang nodes to communicate via
Because Erlang does not recognise FQDNs you may need to modify the hosts file on all the servers to make sure they are all able to resolve all the Erlang node name to an IP address, e.g.
123.123.123.111 NODE1
123.123.123.222 NODE2
once this is done you should then be able to see each node from the other. you can do this by using calling the following from the command line (replacing rabbit#NODE2 with whichever node you want to see)
rabbitmqctl status -n rabbit#NODE2
Hope this give you some help, I'm no expert but found this got things working for me!