How to switch masters in this Redis Sentinel configuration? - redis

I have the following Redis/Sentinel configuration:
Redis master A + N slaves
M sentinels watching A, named masterA
the client application queries the sentinels for masterA, then queries and modifies A
Now say A is outdated and I want to replace it with a new Redis master called B (with minimum downtime and data loss). At the end of the operation, I want this:
Redis master B + N slaves
the client application querying and modifying B
I could proceed as follows:
Have the sentinels start watching B, named masterB
Have each slave of A become a slave of B
From there, I am stuck because the client application still asks for masterA when talking to the sentinels. I have two questions:
Is there a way to switch master names, such that B becomes known as masterA to the sentinels, and therefore to the client application as well?
Is it better to modify the client application code to handle the switch from an old master to a new master?

One way of achieving your aim is to follow the age-old solution of "adding another level of indirection".
A particularly effective method is to have your clients talk to a TCP proxy (e.g. HAProxy) and have it pass the traffic to the current master.
To keep the TCP proxy in sync you can do something similar to http://blog.haproxy.com/2014/01/02/haproxy-advanced-redis-health-check/ which makes HAProxy Sentinel aware.
The major plus for this solution is that it makes your clients very simple - they only connect to one place and the traffic is always forwarded to the correct Redis instance.
One issue with this solution is that HAProxy's configuration DSL cannot deal with the period when a Redis server restarts and initially announces itself as a master before the sentinels make it a slave. This can lead to missed writes and inconsistent state which, depending on your application, may or may not be acceptable.
To deal with this I have started to develop a "smarter" daemon to keep HAProxy in sync with the current master. My solution is at https://github.com/mdevilliers/redishappy.
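The restart problem described above can be illustrated with a small sketch. It is not how redishappy or the HAProxy health check is implemented; it only shows the idea of accepting a node's self-reported role when enough sentinels name the same address. The function names, the `min_votes` parameter and the sample addresses are assumptions made for this example; the `role:` field is what Redis reports in `INFO replication`.

```python
from collections import Counter
from typing import Dict, List, Optional


def parse_role(info_replication: str) -> Optional[str]:
    """Return the value of the `role:` field from INFO replication output."""
    for line in info_replication.splitlines():
        if line.startswith("role:"):
            return line.split(":", 1)[1].strip()
    return None


def confirmed_master(
    info_by_addr: Dict[str, str],
    sentinel_answers: List[str],
    min_votes: int,
) -> Optional[str]:
    """Pick the address to forward writes to, or None if nothing is certain.

    info_by_addr: INFO replication text fetched from each Redis node.
    sentinel_answers: the master address reported by each sentinel asked.
    min_votes: hypothetical threshold of agreeing sentinels (an assumption).
    """
    if not sentinel_answers:
        return None
    addr, votes = Counter(sentinel_answers).most_common(1)[0]
    if votes < min_votes:
        return None
    # A freshly restarted node may claim role:master on its own, so the
    # self-reported role is only accepted for the address the sentinels name.
    if parse_role(info_by_addr.get(addr, "")) != "master":
        return None
    return addr
```

With this check, a restarted old master that still reports `role:master` is ignored unless the sentinels also point at it.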

Related

Redis cluster via HAProxy

I have a Redis Cluster that clients are connecting to via HAProxy with a Virtual IP. The Redis cluster has three nodes (with each node sharing the same server with a running sentinel instance).
My question is, when a client gets a "MOVED" error/message from a cluster node upon sending a request, does it bypass HAProxy the second time it connects, since it was given an IP:port when the MOVED message was issued? If not, how does HAProxy know the second time to send it to the correct node?
I just need to understand how this works under the hood.
If you want to use HAProxy in front of Redis Cluster nodes, you will need to either:
1. Set up an HAProxy for each master/slave pair, and wire up something to update HAProxy when a failure happens, as well as probably intercept the topology related commands to insert the virtual IPs rather than the IPs the nodes themselves have and report via the topology commands/responses.
2. Customize HAProxy to teach it how to be the cluster-aware Redis client so the actual client doesn't know about cluster at all. This means teaching it the Redis protocol, storing the cluster's topology information, and selecting the node to query based on the key(s) being accessed by the consumer code.
With Redis Cluster the client must be able to access every node in the cluster. Of the two options above Option 2 is the "easier" one, but at this point I wouldn't recommend either.
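To give a sense of what "selecting the node to query based on the key(s)" involves, here is a minimal sketch of the key-to-slot mapping as described in the Redis Cluster specification: CRC16 (XMODEM variant) of the key modulo 16384, with a hash tag in `{...}` restricting what gets hashed. The function names are my own, and a proxy would additionally need the slot-to-node table, which this sketch does not cover.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: bytes) -> int:
    """Map a key to one of the 16384 cluster slots, honouring hash tags."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first "{" and the next "}" counts.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384
```

The specification gives 0x31C3 as the CRC16 of "123456789", which is a convenient check for the implementation. Keys sharing a hash tag, such as `{user1000}.following` and `{user1000}.followers`, land in the same slot.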
Conceivably you could use the VIP as a "first place to get the topology info" IP, but I suspect you would develop serious issues, as that original IP would not be one of the ones properly reported as a node handling data. For that you could simply use round-robin DNS and avoid that problem, or pass a list of cluster node IPs (or names?) in the client's initial connection configuration.
Your simplest, and least likely to be problematic, route is to go "full native" and simply give full and direct access to every node in the cluster to your clients and not use HAProxy at all.
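For the "under the hood" part of the question: a MOVED error carries the slot number and the address of the node that owns it, in the form `MOVED <slot> <host>:<port>`, and a cluster-aware client then connects to that address directly. A minimal sketch of parsing such a reply, with names chosen for this example:

```python
from typing import NamedTuple, Optional


class Redirect(NamedTuple):
    slot: int
    host: str
    port: int


def parse_moved(error_message: str) -> Optional[Redirect]:
    """Parse a MOVED error line; return None for anything else."""
    # On the wire the error starts with "-" and ends with CRLF.
    parts = error_message.lstrip("-").split()
    if len(parts) != 3 or parts[0] != "MOVED":
        return None
    host, _, port = parts[2].rpartition(":")
    return Redirect(int(parts[1]), host, int(port))
```

The address in the reply is the node's own IP and port, so a client that can only reach the VIP is redirected to somewhere it cannot connect to. That is why every node has to be reachable by the clients.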

redis sentinel out of sync with servers in a cluster

We have a setup with a number of Redis (2.8) servers (let's say 4) and as many Redis sentinels. On startup of each machine, we set a pre-selected machine as master through the command line and all the rest as slaves of it, and the sentinels all monitor these machines. The clients first connect to the local sentinel, retrieve the master's IP address, and then connect there.
This setup is trouble-free most of the time, but sometimes the sentinels go out of sync with the servers. If I name the machines A, B, C and D, the sentinels will think B is master while the Redis servers are all connected to A as the master. Bringing down the Redis server on B doesn't help either. I had to bring it down and manually "Sentinel failover" on A to fix the issue. The questions are:
1. What causes this to happen, and what's the easiest and quickest way to fix it?
2. What is the best configuration? Is there something better than this?
The only time you should set a master is the first time. Once sentinel has taken over management of replication you should let it do it. This includes on restarts. Don't use the command line to set replication. Let sentinel and redis manage it. This is why you're getting issues - you've told sentinel it is authoritative, but you are telling the Redis servers to ignore sentinel.
Sentinel stores the status in its config file, so when it restarts it can resume the last configuration. So even on restart, let sentinel do its job.
Also, if you have 4 servers (be specific, not "let's say") you should be running a quorum of three on your monitor statement in sentinel. With a quorum of two you can wind up with two masters.
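The counting argument behind that advice can be written down in a few lines. This is only arithmetic about vote thresholds, not a model of Sentinel's actual failover protocol, and the function names are made up for the example: if the threshold is not more than half of the sentinels, two groups that cannot see each other can each reach it.

```python
def two_groups_can_both_reach(total_sentinels: int, quorum: int) -> bool:
    """True if two disjoint groups of sentinels could each gather `quorum` votes."""
    return 2 * quorum <= total_sentinels


def smallest_safe_quorum(total_sentinels: int) -> int:
    """Smallest threshold that at most one side of a partition can reach."""
    return total_sentinels // 2 + 1
```

For four sentinels, a quorum of two can be met on both sides of a 2/2 split, while a quorum of three cannot.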

Redis sentinels in same servers as master/slave?

I've been doing some reading on how to use Redis Sentinel, and I know it's possible to have 2 or more sentinels, and load balance between them when calling from the client side.
Is it good practice to have these 2 sentinels in the same server as my master + slave? In other words, have 1 sentinel in the same physical server as master, and another in same physical server as slave?
It seems to me if the master server dies, the sentinel on the slave will simply promote the slave to a master. If the slave server dies, it doesn't matter because the master is still up.
Am I missing something? What are the downsides?
I would rather have the sentinels on the same physical server as the master/slave to reduce latency.
First, Sentinel is not a load balancer or a proxy for Redis.
Second, not all failures are death of the host. Sometimes the server hangs briefly, sometimes a network cable gets unplugged, etc. Because of this, it is not good practice to run Sentinel on the same hosts as your Redis instance. If you're using Sentinel to manage failover, anything less than three sentinels running on nodes other than your Redis master and slave(s) is asking for trouble.
Sentinel uses a quorum mechanism to vote on a failover and on which slave to promote. With fewer than three sentinels you run the risk of split brain, where two or more Redis servers think they are master.
Imagine the scenario where you run two servers and run sentinel on each. If you lose one you lose reliable failover capability.
Clients only connect to Sentinel to learn the current master connection information. Anytime the client loses connectivity they repeat this process. Sentinel is not a proxy for Redis - commands for Redis go directly to Redis.
The only reliable reason to run Sentinel with less than three sentinels is for service discovery, which means not using it for failover management.
Consider the two host scenario:
Host A: redis master + sentinel 1 (Quorum 1)
Host B: redis slave + sentinel 2 (Quorum 1)
If Host B temporarily loses network connectivity to Host A in this scenario, Host B will promote itself to master. Now you have:
Host A: redis master + sentinel 1 (Quorum 1)
Host B: redis master + sentinel 2 (Quorum 1)
Any clients which connect to Sentinel 2 will be told Host B is the master, whereas clients which connect to Sentinel 1 will be told Host A is the master (which, if you have your Sentinels behind a load balancer, means half of your clients).
Thus what you need to run to obtain minimum acceptable reliable failover management is:
Host A: Redis master
Host B: Redis Slave
Host C: Sentinel 1
Host D: Sentinel 2
Host E: Sentinel 3
Your clients connect to the sentinels and obtain the current master for the Redis instance (by name), then connect to it. If the master dies the connection should be dropped by the client whereupon the client will/should connect to Sentinel again and get the new information.
How well each client library handles this is dependent on the library.
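The discovery step can be sketched without any client library. The command `SENTINEL get-master-addr-by-name <name>` and its two-element reply (IP and port) are Sentinel's documented interface; everything else here (function names, the one-second timeout, reading the reply with a single `recv`) is a simplification assumed for this example, and a real client library does considerably more.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

Address = Tuple[str, int]


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def parse_master_addr(reply: bytes) -> Optional[Address]:
    """Parse the reply to get-master-addr-by-name; None if no master is known."""
    parts = reply.split(b"\r\n")
    if len(parts) < 5 or parts[0] != b"*2":
        return None
    return parts[2].decode(), int(parts[4])


def ask_sentinel(sentinel: Address, name: str, timeout: float = 1.0) -> bytes:
    """Send the query to one sentinel and return the raw reply."""
    with socket.create_connection(sentinel, timeout=timeout) as conn:
        conn.sendall(encode_command("SENTINEL", "get-master-addr-by-name", name))
        return conn.recv(4096)


def discover_master(
    sentinels: Iterable[Address],
    name: str,
    ask: Callable[[Address, str], bytes] = ask_sentinel,
) -> Optional[Address]:
    """Ask each sentinel in turn and return the first master address obtained."""
    for sentinel in sentinels:
        try:
            addr = parse_master_addr(ask(sentinel, name))
        except OSError:
            continue  # this sentinel is unreachable, try the next one
        if addr is not None:
            return addr
    return None
```

A client would call `discover_master(...)` at startup and again whenever its connection to the master drops.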
Ideally Hosts C, D, and E are either the same hosts you connect to Redis from (i.e. the client hosts) or represent a good sampling of them. The main thrust here is to ensure you are checking from where you need to connect to Redis from. Failing that, place them in the same DC/rack/region as the clients.
If you want your clients to talk to a load balancer, try to have your Sentinels on those LB nodes if possible, adding additional non-LB hosts as needed to obtain an odd number of sentinels greater than 2. An exception to this is if your client hosts are dynamic, in that the number of them varies (they scale up for traffic and down for slow periods, for example). In this scenario you pretty much must run your Sentinels on non-client and non-redis-server hosts.
Note that if you do this you will then need to write a daemon which monitors the Sentinel PUBSUB channel for the master switch event to update the LB, which you must configure to only talk to the current master (never try to talk to both). It is more work to do that, but it makes the use of Sentinel transparent to the client, which only knows to talk to the LB IP/port.
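The core of such a daemon is small. Sentinel publishes failovers on the `+switch-master` channel with a payload of the form `<master name> <old ip> <old port> <new ip> <new port>`. The sketch below only parses that payload and renders a backend line for the load balancer; the function names and the HAProxy-style `server` line are illustrative assumptions, and subscribing to the channel and reloading the LB are left out.

```python
from typing import NamedTuple, Optional


class MasterSwitch(NamedTuple):
    name: str
    old_host: str
    old_port: int
    new_host: str
    new_port: int


def parse_switch_master(payload: str) -> Optional[MasterSwitch]:
    """Parse a +switch-master payload; return None if it has the wrong shape."""
    parts = payload.split()
    if len(parts) != 5:
        return None
    name, old_host, old_port, new_host, new_port = parts
    return MasterSwitch(name, old_host, int(old_port), new_host, int(new_port))


def backend_line(switch: MasterSwitch) -> str:
    """Render a single backend entry pointing only at the new master."""
    return f"server {switch.name} {switch.new_host}:{switch.new_port} check"
```

Emitting exactly one backend entry per master name keeps the LB from ever talking to both the old and the new master.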
It all depends on the level of Disaster Recovery you want to achieve. Let's assume you have the following components, independently of where they are hosted:
2 Sentinels
1 Master
1 Slave
1 Master 1+ Slaves
One host scenario
Host fails: You lose everything, a bad replication scenario for most use cases.
Two host scenario
Host 1:
(Current elected) Master
1 Sentinel
Host 2:
Slave
1 Sentinel
It is true that in this scenario you can have the hosts fail one at a time which gives you some level of security. Just try to understand if by different server you mean physically different hosts. If these are just VMs on the same host, you do not get the same level of DR (Disaster Recovery).
Regarding your question:
I rather have the sentinels be in the same server as the master/slave to reduce latency.
Notice that Sentinels keep track of the current master and slaves, but the Redis clients do not connect to the master via the Sentinels; they just learn where the current master is from the Sentinels. In other words, in terms of reads and writes you are not looking at any considerable latency gains.
Configuration provider. Sentinel acts as a source of authority for clients service discovery: clients connect to Sentinels in order to ask for the address of the current Redis master responsible for a given service. If a failover occurs, Sentinels will report the new address.
(see: http://redis.io/topics/sentinel)
The way I see it the only gains you have in terms of latency are the heartbeats sent from the Master and Slaves to the sentinel. As long as you are not spreading your servers through the whole world that should be ok.
It all depends on the use cases, but it seems you would do best to keep things as separate as possible if all other things are equal (costs, distance to clients, etc).
You can have sentinels on the same machines as the master/slave, but the number of sentinels must be odd (3/5/7). There should be at least three sentinels, and at least one sentinel must have a dedicated machine.
If you have only two nodes, then in case of a split-brain (network disruption) situation, the slave will be promoted to master. Both masters will now accept data from clients. However, when things come back to normal, one of the masters will be demoted to a slave. That master will lose all of its data, as it is a slave now and will replicate the data from the current master.
Check this for a good explanation of Redis architectural designs and split-brain:
https://web.archive.org/web/20170527053749/http://www.yzuzun.com/2015/04/some-architectural-design-concepts-for-redis/
It's certainly not a recommended approach.
The Redis Sentinel docs explain the tradeoffs pretty well. Hope this helps.
https://redis.io/topics/sentinel#example-sentinel-deployments

Failing over with single Replication Group on ElastiCache Redis

I'm testing out ElastiCache backed by Redis with the following specs:
Using Redis 2.8, with Multi-AZ
Single replication group
1 master node in us-east-1b, 1 slave node in us-east-1c, 1 slave node in us-east-1d
The part of the application writing is directly using the endpoint for the master node (primary-node.use1.cache.amazonaws.com)
The part of the application doing only reads is pointing to a custom endpoint (readonly.redis.mydomain.com) configured in HAProxy, which then points to the two other read slave end points. (readslave1.use1.cache.amazonaws.com and readslave2.use1.cache.amazonaws.com)
Now let's say the primary node (master) fails in us-east-1b.
From what I understand, if the master instance fails, I won't have to change the url for the end point for writing to Redis (primary-node.use1.cache.amazonaws.com), although from there, I still have the following questions:
Do I have to change the endpoint names for the read only slaves?
How long until the missing slave is added into the pool?
If there's anything else I'm missing, I'd appreciate the advice/information.
Thanks!
If you are using ElastiCache, you should make use of the "Primary Endpoint" provided by AWS.
That endpoint is actually backed by Route53. If the primary (master) Redis goes down, since you enabled Multi-AZ, it will automatically fail over to one of the read replicas (slaves).
In that case, you don't need to modify the endpoint of your Redis.
I don't know why you have such a design; it seems you want to write only to the master but always read from the slaves.
For the HAProxy part, you should include a TCP check for ALL 3 Redis nodes, using their "Read Endpoint".
In HAProxy, you can check whether the endpoint is a SLAVE; if it is, HAProxy should direct traffic to it.
Note that at the application layer, if your Redis driver doesn't support auto-reconnect, your script will fail to connect to the new master node.
In addition to "auto reconnect", since AWS uses Route53 DNS to do failover, some libraries will NOT do the DNS lookup again, which means the cached address still points to the OLD IP, which is the old master.
Using HAProxy can solve this problem.
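The DNS caching problem can be made concrete with a short sketch. The idea is simply to resolve the endpoint name again on every reconnect instead of keeping the first IP. The function names and the injectable `resolver` parameter are assumptions for this example, and whether a given Redis library already behaves this way has to be checked per library.

```python
import socket
from typing import Callable, Tuple


def resolve_now(
    hostname: str,
    port: int,
    resolver: Callable = socket.getaddrinfo,
) -> Tuple[str, int]:
    """Look the name up again and return the first (ip, port) it maps to."""
    infos = resolver(hostname, port, type=socket.SOCK_STREAM)
    sockaddr = infos[0][4]
    return sockaddr[0], sockaddr[1]


def reconnect(hostname: str, port: int, timeout: float = 1.0) -> socket.socket:
    """Open a new connection using a fresh lookup rather than a cached IP."""
    return socket.create_connection(resolve_now(hostname, port), timeout=timeout)
```

Calling `reconnect` with the primary endpoint name after a dropped connection would pick up the address Route53 currently returns, subject to the record's TTL and any resolver caching on the host.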

ActiveMQ - network of multiple brokers configuration

I'm trying to set up three brokers in a network for load balancing -- clients and producers can connect to any of these brokers.
Questions:
What is the recommended topology to use to network these brokers? More specifically, what is the networkConnector configuration to use on each of these brokers? should duplex setting be enabled? (I guess duplex setting depends on the topology we choose)
A->B->C->A or A<-->B<-->C<-->A
Client should use failover protocol to connect to these brokers, right? e.g. failover://(tcp://b1:6161, tcp://b2:6161, tcp://b3:6161)
Any duplicate message handling required on the client side in case of restarts? See http://forum.springsource.org/showthread.php?108461-Failover-issue-in-ActiveMQ -- not clear why duplicate message issue exists here
Ideally we want to set up topology as shown in this post http://edelsonmedia.com/?p=143 -- not clear how to set up networkConnector on masters and slaves.
1.) I can't actually recommend a topology. This choice depends on the number of hops (between the broker where the message enters the cluster and the broker where the consumer connects) you can accept. In a heavy traffic scenario every hop adds to the network load.
In my company we use a hypercube network (every broker knows every other broker) and it works great.
Generally you should make sure that your node configurations are as similar as possible. Using duplex means you have fewer connections to configure (since the connection from B to A is already part of the duplex connection from A to B), but it introduces a large number of differences into your config files.
Personally, I created my own start script for ActiveMQ that auto-generates the connection config based on the DNS names of my cluster (mycluster-01 to 06).
2.) Yes. You might want to add ?randomize=false if you want to make sure the client uses the first entry in the list.
3.) Duplicate entries can happen if there are failures during message transport or as race conditions during heavy load. In general one message is owned by only one broker.
4.) Don't set up network connectors between masters and slaves (REALLY DON'T). Use the pure Master Slave feature of ActiveMQ and configure the master for each slave (you don't have to configure anything on the masters). For all masters, configure network connectors to the other masters (with failover to their slaves).
http://activemq.apache.org/pure-master-slave.html
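As an illustration of the start-script idea from point 1 and the client URI from point 2, here is a sketch that generates one non-duplex `networkConnector` element per peer, so that every broker knows every other broker, plus a failover URI. The host names follow the mycluster-01 to 06 pattern mentioned above; the port (61616), the connector naming scheme and the function names are assumptions, not the answerer's actual script.

```python
from typing import List


def cluster_hosts(prefix: str, count: int) -> List[str]:
    """Build host names such as mycluster-01 ... mycluster-06."""
    return [f"{prefix}-{i:02d}" for i in range(1, count + 1)]


def network_connectors(local: str, hosts: List[str], port: int = 61616) -> List[str]:
    """One non-duplex networkConnector element for every broker except `local`."""
    return [
        f'<networkConnector name="{local}-to-{peer}" '
        f'uri="static:(tcp://{peer}:{port})" duplex="false"/>'
        for peer in hosts
        if peer != local
    ]


def failover_uri(hosts: List[str], port: int = 61616, randomize: bool = False) -> str:
    """Client-side failover URI listing every broker."""
    brokers = ",".join(f"tcp://{host}:{port}" for host in hosts)
    flag = "true" if randomize else "false"
    return f"failover:({brokers})?randomize={flag}"
```

Because every broker gets its own connector to each peer, all six config files have the same shape and differ only in the local host name, which keeps the node configurations as similar as possible.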