How to reconnect Redis cluster nodes? - redis

I have a Redis cluster of 6 nodes, running in my Kubernetes cluster as a stateful set. Since it was for testing and not yet on production, all of the Redis nodes were on the same machine. Of course, the machine failed, and all of Redis' nodes crashed immediately.
When the machine was back alive the pods were recreated and was given different cluster ips, therefore they couldn't re-connect with each other.
I need to find a solution for a disaster case such as this. Let's assume all the nodes were reassigned with different ips, how can I configure the nodes to get to other ips?
The slaves are easy to reset with the CLUSTER RESET command, but the masters contain slots and data that shouldn't be deleted.
Should I manually re-write nodes.conf? I'm afraid this will make it even worse? I there a known method to deal with it?
Thanks!

Found a solution:
The first step is to change the current pod ip in nodes.conf when the pod is starting. You can achieve that with this script
#!/bin/sh
CLUSTER_CONFIG="/data/nodes.conf"
if [ -f ${CLUSTER_CONFIG} ]; then
if [ -z "${POD_IP}" ]; then
echo "Unable to determine Pod IP address!"
exit 1
fi
echo "Updating my IP to ${POD_IP} in ${CLUSTER_CONFIG}"
sed -i.bak -e "/myself/ s/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/${POD_IP}/" ${CLUSTER_CONFIG}
fi
exec "$#"
You should start any pod by calling this script and passing to it the original redis-server start command.
Now every pod in the cluster has the correct IP of itself set.
Make sure the cluster's pods are stable and don't crash.
Manually edit nodes.conf in one of the pods. Set the correct IPs instead of the deprecated ones.
Restart the pod you've edit with redis-cli shutdown. Kubernetes will set up a new pod for it. The new pod's IP will be set by the script I added above.

In my opinion you shouldn't rely on Pods' internal IP address at all, when referencing your Redis cluster in any place within your application. Pods are mortal, this means they are designed to crash. So when node dies, they are destroyed too. When node resurrects, PODs are recreated with new IP addresses.
The proper way to target your PODs would be via their DNS names (as explained here), if you created your Redis cluster as a Stateful application.

Related

Running multiple instance of Redis on Centos

I want to run multiple instance of Redis on Centos 7.
Can anyone point me to proper link or post steps here.
I googled for the information but I didn't find any relevant information.
You can run multiple instances of Redis using different ports on a single machine. If this what concerns you then you can follow the below steps.
By installing the first Redis instance, it listens on localhost:6379 by default.
For Second Instance create a new working directory
The default Redis instance uses /var/lib/redis as its working directory, dumped memory content is saved under this directory with name dump.rdb if you did not change it. To avoid runtime conflicts, we need to create a new working directory.
mkdir -p /var/lib/redis2/
chown redis /var/lib/redis2/
chgrp redis /var/lib/redis2/
Generate configurations
Create a new configuration file by copying /etc/redis/redis.conf
cp /etc/redis/redis.conf /etc/redis/redis2.conf
chown redis /etc/redis/redis2.conf
Edit following settings to avoid conflicts
logfile "/var/log/redis/redis2.log"
dir "/var/lib/redis2"
pidfile "/var/run/redis/redis2.pid"
port 6380
Create service file
cp /usr/lib/systemd/system/redis.service /usr/lib/systemd/system/redis2.service
Modify the settings under Service section
[Service]
ExecStart=/usr/bin/redis-server /etc/redis/redis2.conf --daemonize no
ExecStop=/usr/bin/redis-shutdown redis2
Set to start with boot
systemctl enable redis2
Start 2nd Redis
service redis2 start
Check Status
lsof -i:6379
lsof -i:6380
By Following this you can start two Redis servers. If you want more repeat the steps again.
If I set to --daemonize no, Redis will crash when data insert.
ExecStart=/usr/bin/redis-server /etc/redis2.conf --daemonize no
Should change to
ExecStart=/usr/bin/redis-server /etc/redis2.conf --supervised systemd
My Redis is 5.0.7.
FYI.

Redis cluster master slave - not able to add key

I have setup up Redis master slave configuration having one master (6379 port) and 3 slaves (6380,6381,6382) running in the same machine. Looks like cluster is setup properly as I can see the following output on running info command:
# Replication
role:master
connected_slaves:3
slave0:ip=127.0.0.1,port=6380,state=online,offset=29,lag=1
slave1:ip=127.0.0.1,port=6381,state=online,offset=29,lag=1
slave2:ip=127.0.0.1,port=6382,state=online,offset=29,lag=1
master_repl_offset:43
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:42
But wherever I try to add new key in master, I get the following error:
(error) CLUSTERDOWN Hash slot not served
Using redis-3.0.7 in Mac OS X Yosemite.
I had the same issue, turned out I forgot to run the create cluster:
cd /path/to/utils/create-cluster
./create-cluster create
http://redis.io/topics/cluster-tutorial#creating-a-redis-cluster-using-the-create-cluster-script
To fix slots issue while insertion:
redis-cli --cluster fix localhost:6379
You can use ruby script buddled with redis for creating clusters as mentioned below :
/usr/local/redis-3.2.11/src/redis-trib.rb create --replicas 1 192.168.142.128:7001 192.168.142.128:7002 192.168.142.128:7003 192.168.142.128:7004 192.168.142.128:7005 192.168.142.128:7006
There is a hash slot not allotted to any master. Check the hash slots by looking at the column 9 in the output of following command (column 9 will be empty if no hash slots for that node):
redis-cli -h masterIP -p masterPORT CLUSTER NODES
The hash slots can be allotted by using the following command.
redis-cli -h masterIP -p masterPORT CLUSTER ADDSLOTS SLOTNNUMBER
But this has to be done for every slot number individually without missing. Use a bat script to include it in for loop. something like,
for /L %a in (0,1,5400) Do redis-cli -h 127.0.0.1 -p 7001 cluster addslots %a
Also, this command works before assigning slaves to master. After this ADDSLOTS step and completing the setup, the SET and GET worked fine. Remember to use -c along with redis-cli before SET to enable cluster support.
The issue comes when one or more Redis nodes gets corrupted and can no longer serve its configured hash slots.
You will have to bootstrap the cluster again to make sure the nodes agree on the hash slots to serve.
If the Redis nodes contain data or a key in the database 0, you will have to clear this data before rerunning the bootstrap.

Auto reconnect to RabbitMQ cluster after server restart

I have master-slave configuration of RabbitMQ. As two Docker containers, with dynamic internal IP (changed on every restart).
Clustering works fine on clean run, but if one of servers got restarted it cannot reconnect to the cluster:
rabbitmqctl join_cluster --ram rabbit#master
Clustering node 'rabbit#slave' with 'rabbit#master' ...
Error: {ok,already_member}
And following:
rabbitmqctl cluster_status
Cluster status of node 'rabbit#slave' ...
[{nodes,[{disc,['rabbit#slave']}]}]
says that node not in a cluster.
Only way I found it remove this node, and only then try to rejoin cluster, like:
rabbitmqctl -n rabbit#master forget_cluster_node rabbit#slave
rabbitmqctl join_cluster --ram rabbit#master
That works, but doesn't look good for me. I believe there should be better way to rejoin cluster, than forgetting and join again. I see there is a command update_cluster_nodes also, but seems that this something different, not sure if it could help.
What is correct way to rejoin cluster on container restart?
I realize that this has been opened for a year but I though I would answer just in case it might help someone.
I believe that this issue has been resolved in a recent RabbitMQ release.
I implemented a Dockerized RabbitMQ Cluster using the Rabbit management 3.6.5 image and my nodes are able to auto rejoin the cluster on container or Docker host restart.

RabbitMQ - AWS EC2 Clustering hell

Sorry, should be shot for having to even ask this, but wasted day on this - and feel like I've read everything there is.
I can't create a cluster on my EC2 instances (3) that are spread on three different regions. The hosts:
rabbit#ip-172-31-47-217
rabbit#ip-172-31-1-82
rabbit#ip-172-31-36-111
The initial state before trying to make the cluster:
ubuntu#ip-172-31-47-217:~$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit#ip-172-31-47-217' ...
[{nodes,[{disc,['rabbit#ip-172-31-47-217']}]},
{running_nodes,['rabbit#ip-172-31-47-217']},
{partitions,[]}]
ubuntu#ip-172-31-36-111:~$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit#ip-172-31-36-111' ...
[{nodes,[{disc,['rabbit#ip-172-31-36-111']}]},
{running_nodes,['rabbit#ip-172-31-36-111']},
{partitions,[]}]
ubuntu#ip-172-31-1-82:~$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit#ip-172-31-1-82' ...
[{nodes,[{disc,['rabbit#ip-172-31-1-82']}]},
{running_nodes,['rabbit#ip-172-31-1-82']},
{partitions,[]}]
When I try to check status from one server for another:
sudo rabbitmqctl status -n rabbit#ip-172-31-1-82
Status of node 'rabbit#ip-172-31-1-82' ...
Error: unable to connect to node 'rabbit#ip-172-31-1-82': nodedown
nodes in question: ['rabbit#ip-172-31-1-82']
hosts, their running nodes and ports:
- unable to connect to epmd on ip-172-31-1-82: timeout (timed out)
current node details:
- node name: 'rabbitmqctl3835#ip-172-31-36-111'
- home dir: /var/lib/rabbitmq
- cookie hash: 0tsf/OyQZI7zobmv1Ia97w==
All three servers have the same erlang cookie hash.
I can verify the host names are setup properly:
host ip-172-31-36-111
ip-172-31-36-111.us-west-2.compute.internal has address 172.31.36.111
I know the ports are open:
netstat -plten | grep beam
Because I opened all TCP and UDP at this point as a test, no change.
and finally if this would behave differently given those failures:
sudo rabbitmqctl join_cluster --ram rabbit#ip-172-31-1-82
Clustering node 'rabbit#ip-172-31-47-217' with 'rabbit#ip-172-31-1-82' ...
Error: {cannot_discover_cluster,"The nodes provided are either offline or not running"}
Please help, being driven insane by this.
The problem is that they are in different regions (presumably in EC2-classic - you didn't mention whether you were using a VPC). This means they cannot communicate via their private IPs (see e.g. Can EC2 instances in different regions communicate over their private IP addresses?)
ping 172.31.36.111
will fail from one of the other servers, for example. Pinging using the hostname probably will probably even fail on the DNS lookup.
Your options are:
Put them in separate zones in a single region (in EC2 classic, they will be able to communicate). You could also use a VPC in this case, putting the in separate subnets but allowing interconnections via appropriately set up security groups.
Set up /etc/hosts on each server to point the relevant public IPs of the other servers (you could attach elastic IPs to each server to ensure stability across server restarts). You could also set the hostname of each server for clarity. Set you your security groups to allow access on the relevant ports that rabbitmq uses. There may be security implications of doing this, since the data will be travelling over the public internet.
Set up a VPN between each server in the cluster. Amazon VPC has a VPN facility, but there are ways of setting it up yourself I think.
I think only option 1 is simplest. Option 2 has major security implications (I believe there are ways of securing the connection between the cluster servers, but they aren't documented on the rabbitmq website as far as I can tell). Option 3 is complex but probably the best option if you need multiple regions.
Note that rabbitmq clusters aren't meant to be run over wide geographical areas, since they aren't too reliable in the face of network partitions. See here: https://www.rabbitmq.com/clustering.html

Rabbit will not cluster on ec2

I am having server issues with getting rabbit to cluster.
I boot up two nodes on ec2.
On the the first node booted I do this.
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl start_app
I boot another node.
sudo service rabbitmq-server stop
#Copy cookie from the first server booted
sudo su - -c 'echo -n "cookie" > /var/lib/rabbitmq/.erlang.cookie'
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl cluster rabbit#server1
1) sever1 is running
2) What ports to need open? I have 22, 4369, 5672
sudo rabbitmqctl cluster rabbit#aws-rabbit-server-east-development-20121102162143
Clustering node 'rabbit#aws-rabbit-server-east-development-20121103033005' with ['rabbit#aws-rabbit-server-east-development-20121102162143'] ...
Error: {no_running_cluster_nodes,['rabbit#aws-rabbit-server-east-development-20121102162143'],
['rabbit#aws-rabbit-server-east-development-20121102162143']}
What could possibility be missing from there docs or what what am I missing?
I had a similar problem on EC2 with two windows machines. I eventually got it working but I'm not sure I did it in the correct way so there may be a better solution.
The issue I found was that the two nodes could not see each other when trying to cluster. Each time you start a Rabbit node it seemed to be assigned a port number dynamically.
This obviously makes it very difficult to know which port to open up in the security group so to solve this, I restricted the range of ports Rabbit chose from when assigning the port. I restricted this to a range of 1 port on each node so I always know which port was being assigned.
The easiest way I found to do this was by editing the sbin\rabbitmq-service.bat file.
find the line -kernel inet_default_connect_options "[{nodelay,true}]" ^
add the following two lines to the file underneath:
-kernel inet_dist_listen_min ##### ^
-kernel inet_dist_listen_max ##### ^
replacing ##### with your chosen port number.
So you should now open up the following ports:
5672 - RabbitMQ’s listening port
4369 - Erlang Port Mapper Daemon
##### - the chosen port number for the Erlang nodes to communicate via
Because Erlang does not recognise FQDNs you may need to modify the hosts file on all the servers to make sure they are all able to resolve all the Erlang node name to an IP address, e.g.
123.123.123.111 NODE1
123.123.123.222 NODE2
once this is done you should then be able to see each node from the other. you can do this by using calling the following from the command line (replacing rabbit#NODE2 with whichever node you want to see)
rabbitmqctl status -n rabbit#NODE2
Hope this give you some help, I'm no expert but found this got things working for me!