How to get notifications about failovers in Redis cluster to the client application

Is there a way for a client to get notified about failover events in the Redis cluster? If so, which client library would support this? I am currently using Jedis but have the flexibility to switch to any other Java client.

There are two ways I can think of to check this. One of them is to grep for the master nodes on the cluster, keeping track of their IDs; if the port changed for any of them, then a failover happened.
$ redis-cli -p {PORT} cluster nodes | grep master
Another way, though not as robust a solution, is to use the consistency-checker Ruby script, which will start showing write errors in its output. You can monitor that output and send notifications based on it, since those errors appear when the read replica is trying to take over its master's role.
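As an illustration of the first approach (not part of the original answer), a minimal Java sketch using Jedis could look like the following; the cluster node address and the polling interval are assumptions:

import redis.clients.jedis.Jedis;

import java.util.HashSet;
import java.util.Set;

// Minimal sketch: poll CLUSTER NODES and compare the set of masters
// (node ID plus host:port) between runs; any change suggests a failover.
public class FailoverPoller {
    public static void main(String[] args) throws InterruptedException {
        Set<String> previousMasters = new HashSet<>();
        try (Jedis jedis = new Jedis("127.0.0.1", 7000)) { // any cluster node (assumed address)
            while (true) {
                Set<String> currentMasters = new HashSet<>();
                for (String line : jedis.clusterNodes().split("\n")) {
                    if (line.contains("master")) {
                        String[] fields = line.split(" ");
                        currentMasters.add(fields[0] + " " + fields[1]); // node ID + host:port
                    }
                }
                if (!previousMasters.isEmpty() && !previousMasters.equals(currentMasters)) {
                    System.out.println("Master set changed, failover likely: " + currentMasters);
                }
                previousMasters = currentMasters;
                Thread.sleep(5000);
            }
        }
    }
}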

Sentinel (http://redis.io/topics/sentinel) has the ability to monitor the cluster members and send a publish/subscribe notification upon failure. The link contains a more in-depth explanation and tutorial.
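For example, with Jedis you could subscribe to a Sentinel's +switch-master channel to be told when a new master is promoted. This is only a hedged sketch; the Sentinel address is an assumption and error handling is omitted:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

// Sketch: listen on a Sentinel's pub/sub channel for failover events.
public class SentinelFailoverListener {
    public static void main(String[] args) {
        try (Jedis sentinel = new Jedis("127.0.0.1", 26379)) { // a Sentinel, not a data node
            sentinel.subscribe(new JedisPubSub() {
                @Override
                public void onMessage(String channel, String message) {
                    // +switch-master payload: <master-name> <old-ip> <old-port> <new-ip> <new-port>
                    System.out.println("Failover detected: " + message);
                }
            }, "+switch-master"); // blocks for as long as the subscription is active
        }
    }
}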

Related

Golang Redis Sentinel client

I'm adding Redis support to an open-source project written in Go. The goal is to support all Redis topologies: server, cluster, sentinel.
I browsed the Go clients listed in redis.io/clients, and it seems that the github.com/go-redis/redis project is a viable option.
My main concern is that the NewSentinelClient() method accepts a single sentinel address.
According to the Guidelines for Redis clients (redis.io/topics/sentinel-clients#guidelines-for-redis-clients-with-support-for-redis-sentinel), "the client should iterate the list of Sentinel addresses."
How can SentinelClient iterate through the rest of the sentinel instances if it only has one sentinel address?
Am I missing something?
On the same topic, could someone recommend another Go Redis client that might be suitable for this scenario?
Use NewFailoverClient if you have multiple sentinels.
rdb := redis.NewFailoverClient(&redis.FailoverOptions{
    MasterName: "mymaster",
    SentinelAddrs: []string{
        "sentinel_1:26379",
        "sentinel_2:26379",
        "sentinel_3:26379",
    },
})

Redis Cluster configuration for CacheManager.NET

I have a basic question about Redis connection parameters from the CacheManager.NET perspective. In the case where we have a Redis cluster with a master and 2 slaves, plus a quorum of sentinel processes, should we provide the IP:PORT combinations pointing to the sentinel processes, or to the actual Redis server processes?
As suggested in https://seanmcgary.com/posts/how-to-build-a-fault-tolerant-redis-cluster-with-sentinel, it is advisable to ask the sentinel process about the actual master before making the connection. That is probably in line with Jedis, which provides JedisSentinelPool to do the initial lookup.
Essentially, what we want is load balancing on reads (via CacheManager.NET), while writes should go to the current master node of the cluster.
CacheManager relies on StackExchange.Redis for the Redis implementation. Therefore, whatever this client library supports, CacheManager does, too.
Unfortunately, sentinel support is not implemented; there have been issues about it on GitHub for years.
That being said, I did some testing with a multi master/slave + sentinel setup. I added all the non-sentinel nodes as endpoints to the Multiplexer configuration, and it kind of works because the Redis client knows how to handle multiple master/slave instances.
In the process of switching to another master, the client might throw exceptions saying that it cannot write to a read-only slave and such. CacheManager might retry those calls, and after a short amount of time, when the leader election is done, the call should go through.
But this is not 100% stable and I would not put that in production, as "official" support is still missing...
As an alternative to running with sentinels, you could run Redis in cluster mode, which should just work, or behind a proxy which deals with all that master/slave stuff.
Twemproxy is one alternative.
I still have to add support for Twemproxy in CacheManager, as many features are simply not available, like Lua scripting, getting a list of servers, or flush commands...
This will come in 1.0.2.
Hope that helps.

Clone RabbitMQ admin users, etc. on replacement server

We have a couple of crusty AWS hosts running a RabbitMQ implementation in a cluster. We need to upgrade the hardware, and therefore we developed a Chef cookbook to spawn replacement servers.
One thing that we would rather not recreate by hand is the admin users, the queues, etc.
What is the best method to get that stuff from the old hosts to the new ones? I believe it's everything that lives in the /var/lib/rabbitmq/mnesia directory.
Is it wise to copy the files from one host to another?
Is there a programmatic means to do this?
Can it be coded into our Chef cookbook?
You can definitely export and import configuration via command line: https://www.rabbitmq.com/management-cli.html
I'm not sure about admin users, though.
If you create new rabbitmq nodes on your new hardware, you will get all the users on those new nodes. This is easy to try:
run a docker container with the rabbitmq image (with the management plugin) and create a user
run another container and add that node to the cluster of the first one
kill rabbitmq on the first one, or delete the docker container, and you will see that you still have the newly created user on the 2nd (but now master) node
I wrote docker since it's faster to create a cluster this way, but if you already have a cluster you could use it for testing if you prefer.
For the queues and exchanges, I don't want to quote almost everything found in the rabbitmq doc page on high availability, but I will just say that you have to pay attention to the following:
exclusive queues, because they are gone once the client connection is gone
queue mirroring (if you have any set up; if not, it would be wise to consider it, if not outright necessary)
I would do the migration gradually, waiting for the queues to get emptied and then killing off the nodes on the old hardware. It may be doable in a big-bang fashion, but that seems riskier. If you have a running system, then set up queue mirroring and try to find an appropriate moment to do a manual sync - but be careful, this has a huge impact on broker performance.
Additionally, there is the shovel plugin (I have to point out that I have not used it or even explored it), but that may be another way to go since (quoting from the link):
In essence, a shovel is a simple pump. Each shovel: connects to the source broker and the destination broker, consumes messages from the queue, re-publishes each message to the destination broker (using, by default, the original exchange name and routing_key).

RabbitMq Clustering

I am new to RabbitMq. I am not able to understand the concept here. Please find the scenario.
I have two machines (RMQ1, RMQ2), with RabbitMQ installed and running on both. I then clustered RMQ2 to join RMQ1:
cmd:/> rabbitmqctl join_cluster rabbit@RMQ1
If you check the status of the machines, it is as below.
In RMQ1
c:/> rabbitmqctl cluster_status
Cluster status of node rabbit@RMQ1...
[{nodes,[{disc,[rabbit@RMQ1,rabbit@RMQ2]}]},
 {running_nodes,[rabbit@RMQ1,rabbit@RMQ2]}]
In RMQ2
c:\> rabbitmqctl cluster_status
Cluster status of node rabbit@RMQ2 ...
[{nodes,[{disc,[rabbit@RMQ1,rabbit@RMQ2]}]},
 {running_nodes,[rabbit@RMQ1,rabbit@RMQ2]}]
Then, in order to publish and subscribe messages, I am connecting to RMQ1. Now I see that whenever I send a message to RMQ1, it shows up on both RMQ1 and RMQ2. My understanding is that since both nodes are in the same cluster, messages are mirrored across nodes.
Let's say I bring down RMQ2; I still see messages getting published to RMQ1.
But when I bring down RMQ1, I cannot publish messages anymore. From this I understand that RMQ1 is the master and RMQ2 is the slave.
Now I have the questions below, without changing the code:
How do I make RMQ2 take over the job of accepting messages?
What is the meaning of Highly Available Queues?
What should the strategy be for implementing this kind of scenario?
Please help
Question #2 is best answered first, since it will clear up a lot of things for you.
What is the meaning of highly available queues?
A good source of information for this is the Rabbit doc on high availability. It's very important to understand that mirroring (which is how you achieve high availability in Rabbit) and clustering are not the same thing. You need to create a cluster in order to mirror, but mirroring doesn't happen automatically just because you create a cluster.
When you cluster Rabbit, the nodes in the cluster share exchanges, bindings, permissions, and other resources. This allows you to manage the cluster as a single logical broker and utilize it for scenarios such as load-balancing. However, even though queues in a cluster are accessible from any machine in the cluster, each queue and its messages are still actually located only on the single node where the queue was declared.
This is why, in your case, bringing down RMQ1 will make the queues and messages unavailable. If that's the node you always connect to, then that's where those queues reside. They simply do not exist on RMQ2.
In addition, even if there are queues and messages on RMQ2, you will not be able to access them unless you specifically connect to RMQ2 after you detect that your connection to RMQ1 has been lost. Rabbit will not automatically connect you to some surviving node in a cluster.
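To illustrate the client-side part, here is a hedged sketch using the official RabbitMQ Java client (purely for illustration; the question is about EasyNetQ, but the idea is the same in any client): hand the connection factory every node so it can pick a surviving one. The host names are taken from the question and the port is the AMQP default.

import com.rabbitmq.client.Address;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

// Sketch: list all cluster nodes so the client can connect to whichever is up.
public class ClusterAwareConnection {
    public static Connection connect() throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setAutomaticRecoveryEnabled(true); // reconnect after the current node dies
        Address[] nodes = {
                new Address("RMQ1", 5672),
                new Address("RMQ2", 5672)
        };
        // The addresses are tried in order until one of them accepts the connection.
        return factory.newConnection(nodes);
    }
}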
By the way, if you look at a cluster in the RabbitMQ management console, what you see might make you think that the messages and queues are replicated. They are not; the console always shows a cluster-wide view, so you will see the same queues and messages regardless of which node you connect to.
So with this background now you know the answer to your other two questions:
What should the strategy be for implementing high availability? How do I make RMQ2 accept messages?
From your description, you are looking for the failover that high availability is intended to provide. You need to enable this on your cluster. This is done through a policy, and there are various ways to do it, but the easiest way is in the management console, on the Admin tab in the Policies section.
The previously cited doc has more detail on what it means to configure high availability in Rabbit.
What this will give you is mirroring of queues and messages across your cluster. That way, if RMQ1 fails then RMQ2 will still have your queues and messages since they are mirrored across both nodes.
An important note is that Rabbit will not automatically detect a loss of connection to RMQ1 and connect you to RMQ2. Your client needs to do this. I see you tagged your question with EasyNetQ. EasyNetQ provides this "failover connect" type of feature for you. You just need to supply both node hosts in the connection string. The EasyNetQ doc on clustering has details. Note that EasyNetQ even lets you inject a simple load balancing strategy in this case as well.

Master-Slave and Publish-Subscribe connections

Assuming I have a Master-Slave deployment of Redis (1 master, 1 slave) and a client (webapp) that will manage Publish-Subscribe.
Can I Publish messages to the slave and will they be "seen" by the master?
Or should I use only the Master for Publish and the Slave for Subscribe commands?
I've been looking around but couldn't find the answer. Anyone knows?
EDIT: As @jameshfisher pointed out, the link below is regarding Redis Cluster. The comment from @lionello seems to be the correct answer:
Publishing to a slave will not propagate to the master, only the other way around.
The answer is on the cluster-spec docs:
Publish/Subscribe
In a Redis Cluster clients can subscribe to every node, and can also publish to every other node. The cluster will make sure that published messages are forwarded as needed.
The current implementation will simply broadcast each published message to all other nodes, but at some point this will be optimized either using Bloom filters or other algorithms.
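To illustrate the master/slave (non-cluster) behaviour from the quoted comment, a minimal Jedis sketch might look like this; the host names are assumptions:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

// Sketch: subscribe on the slave, publish on the master. The published message
// is propagated to the slave via replication and reaches the subscriber.
// Publishing on the slave instead would NOT reach subscribers on the master.
public class MasterSlavePubSub {
    public static void main(String[] args) throws InterruptedException {
        new Thread(() -> {
            try (Jedis slave = new Jedis("redis-slave", 6379)) {
                slave.subscribe(new JedisPubSub() {
                    @Override
                    public void onMessage(String channel, String message) {
                        System.out.println("received via slave: " + message);
                    }
                }, "news");
            }
        }).start();

        Thread.sleep(1000); // crude wait until the subscription above is registered

        try (Jedis master = new Jedis("redis-master", 6379)) {
            master.publish("news", "hello");
        }
    }
}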
For the typical data you store in Redis, you should only write to the master.
From http://redis.io/topics/replication:
...writes [to slaves] will be discarded if the slave and the master will [sic] resynchronize, or if the slave is restarted...
In fact, starting from v2.6, you can put slaves in slave-read-only mode which would prevent the mistake of writing data to a slave.
The documentation does go on to mention a potential use case for writing data to slaves:
...often there is ephemeral data that is unimportant that can be stored into slaves. For instance clients may take information about reachability of master in the slave instance to coordinate a fail over strategy.