Restart Redis replica - redis

I am using Redis in a two-level master/slave setup, without Cluster or Sentinel. Basically, I have A -> B -> C, where A is the primary master, B replicates from A, and C replicates from B. There are multiple instances of both B and C. Now I need to restart all of our instances to apply some OS security patches, without downtime. According to the official Redis documentation, restarting a master will wipe out all data in its slaves, which is not acceptable in our case. I am trying to find a solution that lets me restart A and B without wiping the data on C. My tentative plan is to dump all data into an RDB file and restart A. However, this may still introduce some downtime, because B and C will try to resync with A while it is loading the RDB file back. Another approach I am considering is to turn off replication on the slaves while I restart the master, and then restart the slaves one by one. However, this requires a lot of manual effort. Are there any better ways to perform this update without downtime?
A related question: how can I configure the system to handle this kind of restart easily in the future? Can I enable persistence on all instances? If persistence is enabled on a slave, what happens when it is restarted? Will it load the local RDB file, or do a full synchronization?

According to the official Redis documentation, restarting a master will wipe out all data in slaves, which is not acceptable in our case.
I'm not sure which part of the docs you're referring to, but in any case that's hardly true; it all depends on how you do it. In your case, I'd use a workflow like the one below to keep the service up and patch it in a rolling fashion:
1. Promote B to master
2. Restart A, slave it to C
3. Promote C to master
4. Restart B, slave it to A
5. Promote A to master
6. Restart C, slave it to B
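
To check that the steps above end up in the original A -> B -> C layout, here is a minimal Python sketch that writes the workflow down as data and replays it on a toy topology map. The function names and the single-letter node names are illustrative assumptions: a real SLAVEOF call takes a host and a port, "restart" stands for whatever OS-level procedure you use, and the sketch does not cover pointing clients at the current master.

```python
def rolling_restart_plan(a="A", b="B", c="C"):
    """Return the workflow as ordered (node, action) pairs.

    Node names are placeholders; a real SLAVEOF takes <host> <port>.
    """
    return [
        (b, "SLAVEOF NO ONE"),   # promote B
        (a, "restart"),
        (a, f"SLAVEOF {c}"),     # A rejoins as a replica of C
        (c, "SLAVEOF NO ONE"),   # promote C
        (b, "restart"),
        (b, f"SLAVEOF {a}"),     # B rejoins as a replica of A
        (a, "SLAVEOF NO ONE"),   # promote A
        (c, "restart"),
        (c, f"SLAVEOF {b}"),     # C rejoins as a replica of B
    ]


def apply_plan(plan, topology):
    """Replay the plan on a {node: master_or_None} map and return the result."""
    topo = dict(topology)
    for node, action in plan:
        if action == "SLAVEOF NO ONE":
            topo[node] = None                      # node becomes a master
        elif action.startswith("SLAVEOF "):
            topo[node] = action.split(" ", 1)[1]   # node follows a new master
        # "restart" does not change who replicates from whom
    return topo


if __name__ == "__main__":
    start = {"A": None, "B": "A", "C": "B"}
    for node, action in rolling_restart_plan():
        print(f"{node}: {action}")
    print(apply_plan(rolling_restart_plan(), start))
```

Replaying the plan on the starting layout returns the same layout, with every node restarted exactly once along the way.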

Related

In Redis, what if I set a node to be a slave of another slave node?

So in the beginning of time, I ran a master node A.
Then, I ran the second node, B, which I then set as the slave of A.
node B> SLAVEOF A
So far so good.
Now, what if I ran the third node, C, which is set as the slave of B?
node C> SLAVEOF B
From local testing, it seems that although Redis has no special handling for such a "transitive" definition, replication worked: I was able to retrieve the value that I set in A from both B and C.
Now, is it acceptable to do this for production? And why (or why not)?
Chained replication (slave of a slave) is supported in Redis and is an accepted deployment in production - here's the reference from https://redis.io/topics/replication:
Slaves are able to accept connections from other slaves. Aside from connecting a number of slaves to the same master, slaves can also be connected to other slaves in a cascading-like structure. Since Redis 4.0, all the sub-slaves will receive exactly the same replication stream from the master.

Does redis delete all the keys when one master and its slave fails in redis cluster

I have a question. Suppose I am using a Redis cluster with 3 shards (each with a master and a slave). I learned that if a master and its slave fail at the same time, Redis Cluster is not able to continue to operate. What happens after that?
Would Redis cluster delete all the other keys from other 2 nodes as well? (When it comes back)
Do we need to manually restart this cluster and can we somehow retain the other keys values (on other nodes)?
How will it behave if I use Azure Redis Cache?
Thanks In Advance
1. Would Redis cluster delete all the other keys from other 2 nodes as well? (When it comes back)
First of all, only client operations are blocked, not the cluster activity, and nothing is done with the data. As the documentation says:
Redis Cluster failure detection is used to recognize when a master or slave node is no longer reachable by the majority of nodes and then respond by promoting a slave to the role of master. When slave promotion is not possible the cluster is put in an error state to stop receiving queries from clients.
Next regarding if the data gets deleted or not (Under Replication document)
In setups where Redis replication is used, it is strongly advised to have persistence turned on in the master
This means you will lose the data only if persistence was turned off and the master/slave pair went down. In that case, when the pair comes back up, you will not be able to recover the data. So keep Redis persistence turned on.
2. Do we need to manually restart this cluster and can we somehow retain the other keys values (on other nodes)?
I think the answer above covers it.
3. How will it behave if I use Azure Redis Cache?
From Azure Redis Cache FAQ
High Availability/SLA: Azure Redis Cache guarantees that a Standard/Premium cache will be available at least 99.9% of the time. To learn more about our SLA, see Azure Redis Cache Pricing. The SLA only covers connectivity to the Cache endpoints. The SLA does not cover protection from data loss. We recommend using the Redis data persistence feature in the Premium tier to increase resiliency against data loss.
So it's kinda their headache
OR
Redis Cluster: If you want to create caches larger than 53 GB or want to shard data across multiple Redis nodes, you can use Redis clustering which is available in the Premium tier. Each node consists of a primary/replica cache pair for high availability. For more information, see How to configure clustering for a Premium Azure Redis Cache.

minimum activemq cluster size with replicated leveldb store

What is the rationale behind requiring at least 3 ActiveMQ instances and 3 ZooKeeper servers for running master/slave setup with replicated LevelDB storage? If the requirement is imposed by the usage of ZooKeeper which requires at least 3 servers, what is the rationale for ZooKeeper to require at least 3 servers to provide reliability?
Is it for guaranteeing consistency in cases of network partitions (by sacrificing availability on the smaller partition), since in a 2-node primary-backup configuration it is impossible to distinguish between a failed peer and both nodes being in different network partitions?
Is it for providing tolerance against Byzantine failures where you need 2f+1 nodes to survive f faulty nodes (considering ONLY crash failures requires only f+1 nodes to survive f faults)?
Or is there any other reason?
Thanks!
ZooKeeper requires at least 3 servers because of how it elects a new ActiveMQ master. ZooKeeper needs a majority (n/2+1, using integer division) to elect a new master. If it does not have that majority, no master will be selected and the system will fail. This is also why you use an odd number of ZooKeeper servers: 3 servers give you the same fault tolerance as 4, because with a majority requirement either ensemble can only lose 1 server.
For ActiveMQ, the need for at least 3 servers comes from how messages are synced, and from the fact that when a new master is elected, at least a quorum of nodes (N/2+1) is required to identify the latest updates. ActiveMQ will sync messages with 1 slave, and then respond with an OK. It will then sync asynchronously with all other slaves. If a quorum is not present when a node fails, then ZooKeeper has no way to tell which node has the most recent updates. This is what happens when you start with only 2 nodes, so at least 3 is recommended.
From ActiveMQ site, under How it Works:
All messaging operations which require a sync to disk will wait for the update to be replicated to a quorum of the nodes before completing. So if you configure the store with replicas="3" then the quorum size is (3/2+1)=2. The master will store the update locally and wait for 1 other slave to store the update before reporting success. Another way to think about it is that the store will do synchronous replication to a quorum of the replication nodes and asynchronous replication to any additional nodes.
When a new master is elected, you also need at least a quorum of nodes online to be able to find a node with the latest updates. The node with the latest updates will become the new master. Therefore, it's recommended that you run with at least 3 replica nodes so that you can take one down without suffering a service outage.
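
The quorum arithmetic above can be checked with a short Python sketch. The function names are made up for illustration; the only thing taken from the text is the N/2+1 formula, read with integer division.

```python
def quorum(n: int) -> int:
    """Smallest majority of n nodes: n/2+1 with integer division."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many nodes can be lost while a quorum is still reachable."""
    return n - quorum(n)


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(n, quorum(n), tolerated_failures(n))
```

With 2 nodes the quorum is 2, so no node can be lost. With 3 or 4 nodes exactly one can be lost, which is why the fourth server buys nothing.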

How to switch masters in this Redis Sentinel configuration?

I have the following Redis/Sentinel configuration:
Redis master A + N slaves
M sentinels watching A, named masterA
the client application queries the sentinels for masterA, then queries and modifies A
Now say A is outdated and I want to replace it with a new Redis master called B (with minimum downtime and data loss). At the end of the operation, I want this:
Redis master B + N slaves
the client application querying and modifying B
I could proceed as follows:
Have the sentinels start watching B, named masterB
Have each slave of A become a slave of B
From there, I am stuck because the client application still asks for masterA when talking to the sentinels. I have two questions:
Is there a way to switch masters names, such that B becomes known as masterA for the sentinels, and therefore for the client application as well?
Is it better to modify the client application code to handle the switch from an old master to a new master?
One way of achieving your aim is to follow the age old solution of "adding another level of indirection".
A particularly effective method is to have your clients talk to a TCP proxy (e.g. HAProxy) and have it pass the traffic to the current master.
To keep the TCP proxy in sync you can do something similar to http://blog.haproxy.com/2014/01/02/haproxy-advanced-redis-health-check/, which makes HAProxy Sentinel-aware.
The major plus for this solution is that it makes your clients very simple - they only connect to one place and the traffic is always forwarded to the correct Redis instance.
One issue with this solution is that HAProxy's configuration DSL cannot deal with the period when a Redis server restarts and initially announces itself as a master before the sentinels make it a slave. This will lead to missed writes and inconsistent state, which, depending on your application, may or may not be acceptable.
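
To illustrate the problem, here is a minimal Python sketch of a stricter check. It assumes each backend has been asked for INFO replication and that the reply contains a role: line; the function names, the backend names, and the "route only when exactly one backend claims to be master" rule are assumptions made for illustration, not what HAProxy or any existing tool does.

```python
def parse_role(info_text: str):
    """Extract the value of the 'role:' line from an INFO replication reply."""
    for line in info_text.splitlines():
        if line.startswith("role:"):
            return line.split(":", 1)[1].strip()
    return None


def pick_master(replies: dict):
    """Return the only backend reporting role:master, or None if ambiguous.

    `replies` maps a hypothetical backend name to its INFO replication text.
    """
    masters = [name for name, text in replies.items()
               if parse_role(text) == "master"]
    return masters[0] if len(masters) == 1 else None


if __name__ == "__main__":
    # Illustrative reply fragments, not captured from a real server.
    replies = {
        "redis1": "# Replication\r\nrole:master\r\n",
        "redis2": "# Replication\r\nrole:slave\r\n",
    }
    print(pick_master(replies))
```

During the window described above, two backends would report role:master and the function returns None instead of guessing, so a caller could hold traffic until the sentinels settle.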
To deal with this I have started to develop a "smarter" daemon to keep HAProxy in sync with the current master. My solution is at https://github.com/mdevilliers/redishappy.

What is the meaning of partial resynchronization in Redis?

Starting with Redis 2.8, Redis added a feature named "partial resynchronization". I read the official documentation, but I don't understand it. Can anyone help me?
It is about master-slave replication.
The normal behavior of a Redis slave (set up with the SLAVEOF command or in the configuration) is to connect to the master, ask the master to accumulate master-slave traffic, request a complete dump to the filesystem from the master, download this dump to the slave, load the dump, and finally replay the accumulated traffic until the slave catches up with the master.
This mechanism is quite robust but not very efficient to cover transient connection drops between the slave and the master. If the master-slave link is down for a couple of seconds, the slave will request a full resynchronization (involving a dump, etc ...), even if only a few commands have been missed.
Starting with 2.8, Redis includes a partial replication mechanism, so a slave can reconnect to the master and, if some conditions are met (such as a transient connection drop), ask the master to resynchronize without having to dump the whole memory instance.
In order to support this feature, the master has to buffer and keep a backlog of commands, so they can be served to the slaves at any time if needed. If the slave is too far behind the master, the backlog may no longer contain the required data. In that case, a normal full synchronization is done, as in previous versions.
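
The backlog idea can be sketched as a toy model in Python. This is not Redis's implementation: the class name, the byte-offset bookkeeping, and the tiny capacity are assumptions chosen for illustration, and real Redis also checks that the slave was following the same master before agreeing to a partial resync.

```python
class ReplicationBacklog:
    """Toy model of a master's fixed-size backlog of replicated bytes."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buffer = bytearray()
        self.master_offset = 0  # total bytes ever written to the stream

    def feed(self, data: bytes) -> None:
        """Append replicated data, keeping only the newest `capacity` bytes."""
        self.buffer.extend(data)
        self.master_offset += len(data)
        overflow = len(self.buffer) - self.capacity
        if overflow > 0:
            del self.buffer[:overflow]

    def resync(self, slave_offset: int):
        """Decide between a partial and a full resync for a reconnecting slave."""
        first_available = self.master_offset - len(self.buffer)
        if first_available <= slave_offset <= self.master_offset:
            start = slave_offset - first_available
            return ("partial", bytes(self.buffer[start:]))
        return ("full", None)  # the missed data is no longer in the backlog


if __name__ == "__main__":
    backlog = ReplicationBacklog(capacity=10)
    backlog.feed(b"SET a 1;")    # slave has received everything up to offset 8
    backlog.feed(b"SET b 2;")    # written while the link was down
    print(backlog.resync(8))     # short drop: only the missed bytes are sent
    print(backlog.resync(0))     # too far behind: falls back to a full sync
```

A slave that dropped off at offset 8 gets just the bytes it missed, while one at offset 0 has fallen outside the 10-byte window and has to do a full synchronization.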