twemproxy (nutcracker) adding redis instance and keeping consistency - redis

I set up twemproxy (nutcracker) with 2 redis servers as backends including slaves, sentinel and failover.
As soon as I add another redis server, some of the keys can no longer be read, probably because twemproxy now routes them to a different redis instance.
How do I add another redis instance without breaking the consistency?
I want to use the setup as a consistent and very fast database.
Here are my settings:
redis_cluster:
  auto_eject_hosts: false
  distribution: ketama
  hash: fnv1a_32
  listen: 127.0.0.1:6379
  preconnect: true
  redis: true
  servers:
    - 127.0.0.1:7004:1 redis_1
    - 127.0.0.1:7005:1 redis_2
I want sharding to stay a job of the server side and still be able to add instances. Do I need to use another setup?

Twemproxy can't do that. You can use Redis Cluster, or, if you want to stay with Twemproxy, you have to use a technique called presharding: start directly with 32 or 64 instances or so, even if they all run on the same host at first. Then move instances from one box to another in order to scale out to multiple actual servers. The word to the right of each instance configured in Twemproxy (for example "redis_1") is what gets hashed, so you can change the IP address when you move an instance and the hashing will stay the same for that server.
Redis Cluster is at release candidate 2 at this point. While it needs more testing and deployments to become as battle-tested as Redis itself, it is already a viable product, so you may want to test it as well.
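To see why adding a third server makes some keys unreadable, and why hashing on the node name keeps keys in place when an instance moves, here is a minimal Python sketch of a ketama-style consistent hash ring. It is not Twemproxy's actual implementation: the `Ring` class, the MD5-based point function, the 160 points per server, the third node `redis_3` and the address `10.0.0.5:7004` are all assumptions made for illustration.

```python
import bisect
import hashlib


def ring_point(label: str) -> int:
    """Map a label to a 32-bit ring position (first 4 bytes of its MD5)."""
    return int.from_bytes(hashlib.md5(label.encode()).digest()[:4], "big")


class Ring:
    """Simplified consistent hash ring keyed by node NAME, not address."""

    def __init__(self, servers, points_per_server=160):
        self.servers = dict(servers)  # node name -> "host:port"
        ring = sorted(
            (ring_point(f"{name}-{i}"), name)
            for name in self.servers
            for i in range(points_per_server)
        )
        self._positions = [pos for pos, _ in ring]
        self._names = [name for _, name in ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's position, wrapping around.
        idx = bisect.bisect(self._positions, ring_point(key)) % len(self._names)
        return self._names[idx]


keys = [f"user:{i}" for i in range(10_000)]
two = Ring({"redis_1": "127.0.0.1:7004", "redis_2": "127.0.0.1:7005"})

# Adding a node: part of the key space now maps to redis_3, which has no data yet.
three = Ring({**two.servers, "redis_3": "127.0.0.1:7006"})
remapped = [k for k in keys if two.node_for(k) != three.node_for(k)]

# Moving redis_1 to another box: the name is unchanged, so the mapping is too.
moved_box = Ring({"redis_1": "10.0.0.5:7004", "redis_2": "127.0.0.1:7005"})
stable = all(two.node_for(k) == moved_box.node_for(k) for k in keys)

print(f"{len(remapped)} of {len(keys)} keys remapped after adding redis_3")
print("mapping unchanged after moving redis_1:", stable)
```

With presharding, the ring would be built once with all 32 or 64 names, so only the addresses in `servers` ever change and no key changes owner.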

Related

Redis advantages of Sentinel and Cluster

I'm planning to create a highly available Redis Cluster. After reading many articles about building a Redis cluster, I'm confused. So what exactly are
the advantages of a Redis Sentinel Master1 Slave1 Slave2 Cluster? Is it more reliable than a Redis Multinode Sharded Cluster?
the advantages of a Redis Multinode Sharded Cluster? Is it more reliable than a Redis Sentinel Master1 Slave1 Slave2 Cluster?
Further questions to the Redis Sentinel Master1 Slave1 Slave2 Cluster:
when I have 1 master and two slaves and traffic keeps growing until this cluster is too small, how can I make the cluster bigger?
Further questions to the Redis Multinode Sharded Cluster:
why are there so many demos with running a cluster on a single instance but on different ports? That makes no sense to me.
when I have a cluster with 4 masters and 4 replicas, how can an application or a client be sure to write to the cluster? When Master1 and Slave1 die but my application always writes to the IP of Master1, it will not work anymore. Which solutions are out there to implement a sharded cluster well, so that applications can find it under a single IP and port? Keepalived? HAProxy?
when I use e.g. Keepalived for a 4 master setup, doesn't that cancel out the different masters?
furthermore I need to understand why the multinode cluster is only for solutions where more data needs to be written than there is memory available. Why? To me a multi-master setup sounds good for scalability.
is it right that the sharded cluster setup does not support multikey operations when the cluster is not in caching mode?
I'm unsure if these two solutions are the only ones. Hopefully you guys can help me to understand the architectures of Redis. Sorry for so many questions.
I will try to answer some of your questions but first let me describe the different deployment options of Redis.
Redis has three basic deployments: single node, sentinel and cluster.
Single node - The basic solution, where you run a single Redis process.
It is not scalable and not highly available.
Redis Sentinel - Deployment that consists of multiple nodes, where one is elected as master and the rest are slaves.
It adds high availability since in case of master failure one of the slaves will be automatically promoted to master.
It is not scalable since the master node is the only node that can write data.
You can configure the clients to direct read requests to the slaves, which will take some of the load from the master. However, in this case slaves might return stale data since they replicate the master asynchronously.
Redis Cluster - Deployment that consists of at least 6 nodes (3 masters and 3 slaves), where data is sharded between the masters. It is highly available since, in case of a master failure, one of its slaves will automatically be promoted to master. It is scalable since you can add more nodes and reshard the data so that the new nodes take some of the load.
So to answer your questions:
The advantages of Sentinel over Redis Cluster are:
Hardware - You can set up a fully working Sentinel deployment with three nodes. Redis Cluster requires at least six nodes.
Simplicity - usually it is easier to maintain and configure.
The advantage of Redis Cluster over Sentinel is that it is scalable.
The decision between the two deployments should be based on your expected load.
If your write load can be managed with a single Redis master node, you can go with Sentinel deployment.
If one node cannot handle your expected load, you must go with Cluster deployment.
Redis Sentinel deployment is not scalable so making the cluster bigger will not improve your performance. The only exception is that adding slaves can improve your read performance (in case you direct read requests to the slaves).
Redis Cluster running on a single node with multiple ports is only for development and demo purposes. In production it is useless.
In a Redis Cluster deployment, clients should have network access to all nodes (and not only Master1). This is because data is sharded between the masters.
If a client tries to write data to Master1 but Master2 is the owner of the data, Master1 will return a MOVED redirection to the client, guiding it to send the request to Master2.
You cannot have a single HAProxy in front of all Redis nodes.
As in the previous answer, in the cluster deployment clients should have a direct connection to all masters and slaves, not one that goes through a load balancer or Keepalived.
Not sure I totally understood your question but Redis Cluster is the only solution for Redis that is scalable.
A Redis Cluster deployment supports multikey operations only when all keys are on the same node. You can use "hash tags" to force multiple keys to be handled by the same master.
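As an illustration of how hash tags keep keys together, the sketch below computes the Redis Cluster hash slot of a key in Python: CRC16 of the key modulo 16384, using only the part between the first `{` and the next `}` when that part is non-empty. The function name `key_slot` and the example keys are made up for this example, and `binascii.crc_hqx` with an initial value of 0 is used here as the CRC16 variant.

```python
import binascii

SLOTS = 16384


def key_slot(key: bytes) -> int:
    """Return the cluster hash slot for a key, honouring hash tags."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag counts; "foo{}bar" is hashed as a whole.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) % SLOTS


same_tag = [b"{user:1000}.following", b"{user:1000}.followers"]
no_tag = [b"user:1000.following", b"user:1000.followers"]

# Keys sharing the tag "user:1000" always land in the same slot.
print([key_slot(k) for k in same_tag])
# Without a tag the whole key is hashed, so the slots are unrelated.
print([key_slot(k) for k in no_tag])
```

A multikey command such as MGET on the two tagged keys can then be served by a single master, because both keys belong to the slot that master owns.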
Some good links that can help you understand it better:
Description on the different Redis deployment options: https://blog.octo.com/en/what-redis-deployment-do-you-need
Detailed explanation on the architecture of Redis Cluster: https://blog.usejournal.com/first-step-to-redis-cluster-7712e1c31847

Redis Cluster or Replication without proxy

Is it possible to build a one master (port 6378) + two slaves (read-only, ports 6379 and 6380) "cluster" on one machine to increase performance (especially for reads) without using any proxy? Can the site or code connect to the master instance and read data from the read-only nodes? Or do I have to use a proxy anyway if I use 3 instances of Redis?
Edit: It seems the slave nodes don't have any data; they try to redirect to the master instance. That is not the correct behaviour, am I right?
Definitely. You can code the paths in your app so writes and reads go to different servers. Depending on the programming language that you're using and the Redis client, this may be easier or harder to achieve.
Edit: that said, I'm unsure how you're running a cluster with a single master - the minimum should be 3.
You need to send a READONLY command after connecting to the slave before you can execute any read commands.
READONLY only affects the current socket session, which means you need to send it on every TCP connection.
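A minimal sketch of routing reads and writes to different servers in application code, assuming the ports from the question (master on 6378, read-only slaves on 6379 and 6380). The `ReadWriteRouter` class and the small `READ_COMMANDS` set are hypothetical and only pick an endpoint; opening the connection with your Redis client is left out.

```python
import itertools

# Illustrative subset of read-only commands; extend it for your workload.
READ_COMMANDS = {"GET", "MGET", "EXISTS", "TTL", "HGET", "HGETALL", "LRANGE", "SMEMBERS"}


class ReadWriteRouter:
    """Pick the master for writes and rotate over the slaves for reads."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, command: str):
        if self._replicas is not None and command.upper() in READ_COMMANDS:
            # Slaves replicate asynchronously, so this read may be stale.
            return next(self._replicas)
        # Writes, and anything not known to be read-only, go to the master.
        return self.master


router = ReadWriteRouter(
    master=("127.0.0.1", 6378),
    replicas=[("127.0.0.1", 6379), ("127.0.0.1", 6380)],
)
print(router.endpoint_for("SET"))
print(router.endpoint_for("GET"))
print(router.endpoint_for("get"))
```

Sending unknown commands to the master is the conservative default here: a command wrongly routed to a slave would fail or read stale data, while a read wrongly routed to the master only costs some load.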

Redis Cluster configuration for CacheManager.NET

I have a basic question about Redis connection parameters from the CacheManager.NET perspective. When we have a Redis cluster with a master and 2 slaves, and a quorum of sentinel processes, should we provide the IP:PORT combinations pointing to the sentinel processes or to the actual Redis server processes?
As suggested in https://seanmcgary.com/posts/how-to-build-a-fault-tolerant-redis-cluster-with-sentinel, it is advisable to ask the sentinel process about the actual master before making the connection. And probably that goes in line with Jedis which provides JedisSentinelPool to do the initial lookup.
Essentially, what we want is load balancing on reads (via CacheManager.NET), while writes go to the current master node of the cluster.
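The "ask Sentinel first" lookup described above can be sketched in Python as follows. This is only an outline of the idea, not CacheManager.NET or Jedis code: `discover_master`, the `ask` callable, the addresses and the master name `mymaster` are all hypothetical. `ask` stands in for sending `SENTINEL get-master-addr-by-name <name>` to one Sentinel and returning its reply.

```python
def discover_master(sentinels, master_name, ask):
    """Return (host, port) of the current master, trying each Sentinel in order.

    `ask(sentinel, master_name)` is a hypothetical callable standing in for
    SENTINEL get-master-addr-by-name; it returns [ip, port] or None, and may
    raise OSError when that Sentinel is unreachable.
    """
    for sentinel in sentinels:
        try:
            reply = ask(sentinel, master_name)
        except OSError:
            continue  # this Sentinel is down, try the next one
        if reply:
            host, port = reply
            return host, int(port)
    raise LookupError(f"no Sentinel knows a master named {master_name!r}")


def fake_ask(sentinel, master_name):
    """Stand-in for a real network call, so the sketch runs without Redis."""
    if sentinel == ("10.0.0.11", 26379):
        raise ConnectionError("sentinel down")
    return ["10.0.0.21", "6379"] if master_name == "mymaster" else None


sentinels = [("10.0.0.11", 26379), ("10.0.0.12", 26379), ("10.0.0.13", 26379)]
master = discover_master(sentinels, "mymaster", fake_ask)
print(master)
```

The client is configured with the Sentinel addresses only; the address of the Redis master is always looked up, so a failover does not require a configuration change.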
CacheManager relies on StackExchange.Redis for the Redis implementation. Therefore, whatever this client library supports, CacheManager does, too.
Unfortunately, sentinel support is not implemented; there have been issues open on GitHub for years regarding that.
That being said, I did some testing with a Multi Master/Slave + Sentinel setup. Added all the non-sentinel nodes as endpoints to the Multiplexer configuration and it kinda works because the Redis Client knows how to handle multiple master/slave instances.
In the process of switching to another master, the client might throw exceptions that it cannot write to a readonly slave and such. CacheManager might retry those calls and after a short amount of time, when the leader election is done, the call should go through.
But this is not 100% stable and I would not put that in production, as "official" support is still missing...
As an alternative to running with sentinels, you could run Redis in Cluster mode, which should just work, or behind a proxy which deals with all that master/slave stuff.
Twemproxy is one alternative.
I still have to add support for Twemproxy to CacheManager, as many features are simply not available, like Lua scripting or get a list of servers or flush commands...
This will come in 1.0.2
Hope that helps.

Should I run haproxy for db and redis sentinel on web nodes?

I am setting up a cluster of servers using Vagrant and playing with Redis Sentinel and HAProxy for the PostgreSQL DB connection (with pgpool). I was curious whether it makes sense to put HAProxy and Redis Sentinel on each of my web server nodes and have the web servers connect directly to those. The thought is that this creates a distributed connection to the DB and Redis and reduces the single point of failure that comes with having a single HAProxy that they all connect to and that then splits to the different DB nodes. I can also keep the database connection (via HAProxy) and Redis (via Sentinel) encapsulated on localhost. Does this make sense?
It only makes sense if you're trying to save up on resources/costs.
Please note that redis sentinel must have a finite list of sentinel instances, which doesn't fit the scenario of placing one per machine, as your machine count would probably scale/change.
Otherwise, it always makes the most sense to put different infrastructure components (especially those with a clustering/HA nature, such as redis) on different machines.
By mixing them all together, you usually end up with applications getting in the way of each other and stealing CPU from each other once the load increases. You also risk designing your applications/scripts/flows to be location aware (i.e. to assume external resources are always local), which is also not really good practice.

Redis cluster via HAProxy

I have a Redis Cluster that clients connect to via HAProxy with a virtual IP. The Redis cluster has three nodes (each node shares its server with a running sentinel instance).
My question is: when a client gets a "MOVED" error/message from a cluster node upon sending a request, does it bypass HAProxy the second time it connects, since it was given an IP:port when the MOVED message was issued? If not, how does HAProxy know the second time to send it to the correct node?
I just need to understand how this works under the hood.
If you want to use HAProxy in front of Redis Cluster nodes, you will need to either:
1. Set up an HAProxy for each master/slave pair, and wire up something to update HAProxy when a failure happens, as well as probably intercept the topology related commands to insert the virtual IPs rather than the IPs the nodes themselves have and report via the topology commands/responses.
2. Customize HAProxy to teach it how to be the cluster-aware Redis client so the actual client doesn't know about cluster at all. This means teaching it the Redis protocol, storing the cluster's topology information, and selecting the node to query based on the key(s) being accessed by the consumer code.
With Redis Cluster the client must be able to access every node in the cluster. Of the two options above Option 2 is the "easier" one, but at this point I wouldn't recommend either.
Conceivably you could use the VIP as a "first place to get the topology info" IP, but I suspect you'd see serious issues develop, as that original IP would not be one of the addresses properly reported as a node handling data. Instead, you could simply use round-robin DNS and avoid that problem, or pass the built-in "here is a list of cluster IPs (or names?)" setting in the initial connection configuration.
Your simplest, and least likely to be problematic, route is to go "full native" and simply give full and direct access to every node in the cluster to your clients and not use HAProxy at all.
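To make the "under the hood" part concrete, here is a small Python simulation of what a cluster-aware client does with a MOVED reply: it parses the address out of the error and sends the command again to that address, which is the node's own IP:port and not the proxy's. Everything here is a sketch under stated assumptions: the two node addresses, the slot split in `SLOT_OWNERS`, and the helper names `parse_moved`, `fake_node_reply` and `send_with_redirect` are invented, and no real network traffic is involved.

```python
def parse_moved(reply: str):
    """Parse 'MOVED <slot> <host>:<port>' into (slot, host, port), else None."""
    parts = reply.split()
    if len(parts) != 3 or parts[0] != "MOVED":
        return None
    host, _, port = parts[2].rpartition(":")
    return int(parts[1]), host, int(port)


# Hypothetical two-master topology: which address owns which slot range.
SLOT_OWNERS = {
    range(0, 8192): "10.0.0.1:7000",
    range(8192, 16384): "10.0.0.2:7000",
}


def fake_node_reply(node_address: str, slot: int) -> str:
    """Simulate a node: answer OK if it owns the slot, otherwise redirect."""
    owner = next(addr for slots, addr in SLOT_OWNERS.items() if slot in slots)
    return "OK" if owner == node_address else f"MOVED {slot} {owner}"


def send_with_redirect(entry_address: str, slot: int, max_redirects: int = 5):
    """Follow MOVED replies; return the final reply and the addresses contacted."""
    address, visited = entry_address, [entry_address]
    for _ in range(max_redirects):
        reply = fake_node_reply(address, slot)
        moved = parse_moved(reply)
        if moved is None:
            return reply, visited
        _, host, port = moved
        address = f"{host}:{port}"  # taken from the reply, not from the VIP
        visited.append(address)
    raise RuntimeError("too many redirects")


reply, visited = send_with_redirect("10.0.0.1:7000", slot=12182)
print(reply, visited)
```

Because the second request goes to the address named in the MOVED reply, a proxy in front of the cluster is bypassed unless the client cannot reach that address at all, in which case the retry simply fails.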