Does Redis Cluster use consistent hashing?

I'm using Redis Cluster 3.0.1.
I think Redis Cluster uses consistent hashing. The hash slots are similar to the virtual nodes in consistent hashing. Cassandra's data distribution is almost the same as Redis Cluster's, and this article says it's consistent hashing.
But the Redis Cluster tutorial says Redis Cluster does not use consistent hashing.
What am I missing? Thanks.

You are right, virtual nodes are quite similar to hash slots.
But virtual nodes are not part of the original consistent hashing concept; they are more like a trick Cassandra builds on top of consistent hashing. So it's also fair for Redis to say it doesn't use consistent hashing.
So don't get hung up on the terminology.

Consistent hashing gives a lot of nice properties when it hashes servers into a ring:
servers are randomly distributed around the ring, which is good for balancing load in a cluster
adding/removing a server only affects its neighbors, which minimizes data migration
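To make the ring idea concrete, here is a minimal Python sketch of consistent hashing with virtual nodes. It is illustrative only. The md5-based ring position, the 100 virtual nodes per server and the server names are assumptions, not how Cassandra or any particular library does it. The usage at the end checks the second property: after a server is added, every key that moved went to the new server.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    # Map a string to a point on a 2**32 ring (md5 only for spread, not security).
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions of all virtual nodes
        self._owner = {}   # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        # Each server is hashed onto the ring `vnodes` times.
        for i in range(self.vnodes):
            point = ring_position(f"{server}#{i}")
            if point in self._owner:
                continue  # extremely unlikely collision: keep the first owner
            bisect.insort(self._points, point)
            self._owner[point] = server

    def server_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's position,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect_left(self._points, ring_position(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["serverA", "serverB", "serverC"])
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.server_for(k) for k in keys}
ring.add("serverD")
moved = [k for k in keys if ring.server_for(k) != before[k]]
# Only keys that now fall on serverD's arcs move; A, B and C keep the rest.
print(len(moved), all(ring.server_for(k) == "serverD" for k in moved))
```

Because placement is driven purely by hash positions, you can shift the traffic share by giving a server more virtual nodes. You still cannot pin particular keys to a particular server, which is the limitation described next.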
However, I don't think you can control which key goes to which server: i.e. I can't do the following assignment:
key 1-99 ==> serverA
key 100 ==> serverB
// I can probably reach the same 99:1 traffic split
// by giving serverA more virtual nodes, but that won't guarantee
// keys 1 and 99 are served by the same machine
This is possible in Redis. Redis uses hash slots, which I believe are an explicit map from hash value -> server. This gives you full control; in particular, it enables multi-key transactions (keys that share a hash tag, such as {bank}, always land in the same slot), e.g.:
key Alice, key Bob ==> serverA
// move $100 from Alice's bank account to Bob's in one operation
// no need for special techniques like two-phase commit
The key -> server mapping is now managed by you rather than by consistent hashing. The drawback is more work and responsibility for the admins, although Redis provides commands to help with the management: rebalance and reshard.
Disclaimer: this is my own understanding (here are my sources); I wish I could just tag #redis_dev on Stack Overflow and have them proofread my answer.

Related

Redis Cluster minimal configuration

Currently I'm using a Redis master-slave configuration with HAProxy for WordPress to get high availability. This configuration is nice and works perfectly (I can take any server down for maintenance without downtime). The problem is that only one Redis server gets all the traffic while the others just wait in case that server dies. On a very high-load website that can be a problem, and adding more servers is not a solution because only one will ever be master.
With this in mind, I'm wondering whether I can just use Redis Cluster to read/write on all nodes, but I'm not really sure it will work with my setup.
My setup is limited to three nodes most of the time, and I've read in some places that the minimal Redis Cluster setup is three nodes, but six is recommended. This makes sense because that setup has slave nodes that become masters if their master dies, so all data is kept. But what happens if I don't care about the data? On my setups the data is just cached objects, so if an object doesn't exist it is simply created again. So which of these happens when a node dies?
The data is lost (I don't care), and the other nodes get the objects from clients again and serve them on later requests (as happens when I flush the data).
The remaining nodes answer that the data doesn't exist and refuse to cache it, because the object would have to live on the node that is dead.
Does anyone know?
Thanks!!
When a master dies, the Redis cluster goes to a down state and any command involving a key served by the failed instance will fail.
This may differ from some other distributed software because Redis Cluster is not the kind of system in which every master holds all the data. In fact, the key space is horizontally partitioned and each key is served by only one master.
This is mentioned in the specification:
The key space is split into 16384 slots...
a single hash slot will be served by a single node...
The base algorithm used to map keys to hash slots is the following:
HASH_SLOT = CRC16(key) mod 16384
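Here is a worked instance of the formula. It assumes the CRC16 variant named in the cluster specification (CRC-16/XMODEM: polynomial 0x1021, initial value 0). Under that variant the key foo maps to slot 12182, which you can cross-check with CLUSTER KEYSLOT foo:

```latex
\begin{aligned}
\mathrm{CRC16}(\texttt{foo}) &= \texttt{0xAF96} = 44950 \\
\mathrm{HASH\_SLOT} &= 44950 \bmod 16384 = 44950 - 2 \cdot 16384 = 12182
\end{aligned}
```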
When you set up a cluster, you assign each node a set of slots, and each slot can be served by only one node. If a node dies, you lose its slots unless a slave fails over to serve them, and any command involving keys mapped to those slots will fail.

Redis: count specific class of keys on a Redis cluster?

Is there an efficient method to count specific class of keys on a Redis cluster?
Here, 'specific class of keys' means keys that are used for a common purpose, for example session keys. They can share a common key-name prefix, and there can be multiple classes. From now on, I will refer to a class of keys simply as 'the keys'.
What I want to do is as follows:
Redis cluster must be used.
The keys must be distributed to the nodes of the Redis cluster.
There must be an efficient way to count the number of the keys on all of the nodes of the Redis cluster.
The keys can have TTL - that is, can expire.
The number of nodes in the Redis cluster can change at runtime, and hash slots can be redistributed.
Clients are implemented using Node.js.
I've read the documentation, but could not find a proper solution.
Thanks in advance.
No, basically. That doesn't exist for "classic" (non-cluster) Redis either. Without an additional storage mechanism, you would need to call SCAN repeatedly to iterate over the entire keyspace. Fortunately it at least accepts a MATCH filter (so not every key is sent back to you), but it is far from efficient - you'd typically only do this periodically as a review feature, not as an operational feature. We actually include such a feature in Opserver's Redis plugin.
When you switch to cluster, you'd need to repeat this on one node from each replication group (a master or one of its replicas). You would typically get that list via the CLUSTER commands, so the dynamic nature of the nodes is moot.
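Here is a minimal Python sketch of getting that list from CLUSTER NODES output, picking one node per replication group (a replica when one is available). It parses the documented line format (id, address, flags, master id, ...). The sample text is made up for illustration. Fetching the text (sending CLUSTER NODES through your client) is left out, and a master already flagged as failed is skipped along with its replicas.

```python
def scan_targets(cluster_nodes_text, prefer_replicas=True):
    """Return one "host:port" per replication group from CLUSTER NODES output.

    Each line is: <id> <ip:port@cport> <flags> <master-id or -> <ping-sent>
    <pong-recv> <config-epoch> <link-state> <slot> ...
    """
    masters = {}      # master node id -> host:port
    replicas_of = {}  # master node id -> [host:port of its replicas]
    for line in cluster_nodes_text.strip().splitlines():
        fields = line.split()
        node_id, addr, master_id = fields[0], fields[1], fields[3]
        flags = fields[2].split(",")
        # Drop the cluster-bus port ("@cport") and any ",hostname" suffix.
        host_port = addr.split("@")[0].split(",")[0]
        if "fail" in flags or "noaddr" in flags:
            continue  # skip nodes the cluster already marks as unusable
        if "master" in flags:
            masters[node_id] = host_port
        elif "slave" in flags or "replica" in flags:
            replicas_of.setdefault(master_id, []).append(host_port)
    targets = []
    for node_id, host_port in masters.items():
        replicas = replicas_of.get(node_id, [])
        targets.append(replicas[0] if prefer_replicas and replicas else host_port)
    return targets


# Made-up sample in the documented format (3 masters, one of them with a replica).
SAMPLE = """\
e7d1eecce10fd6bb5eb35b9f99a514335d9ba9ca 127.0.0.1:30001@31001 myself,master - 0 0 1 connected 0-5460
67ed2db8d677e59ec4a4cefb06858cf2a1a89fa1 127.0.0.1:30002@31002 master - 0 1426238316232 2 connected 5461-10922
292f8b365bb7edb5e285caf0b7e6ddc7265d2f4f 127.0.0.1:30003@31003 master - 0 1426238318243 3 connected 10923-16383
07c37dfeb235213a872192d90877d0cd55635b91 127.0.0.1:30004@31004 slave e7d1eecce10fd6bb5eb35b9f99a514335d9ba9ca 0 1426238317239 4 connected
"""
print(scan_targets(SAMPLE))
```

Re-reading CLUSTER NODES before each counting run is what makes resharding and node changes irrelevant to the count.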
In both classic and cluster mode, it is recommended to do this only on a replica, not the master. And again: only as an admin tool, not as a routine part of your system.
Do not use KEYS to do this. Prefer SCAN.
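For illustration, here is a Python sketch of the SCAN loop: keep calling SCAN with the returned cursor until it comes back as 0, once per replication group. It assumes clients that expose a redis-py style scan(cursor=..., match=..., count=...) returning (cursor, keys). The FakeNode class is a hypothetical in-memory stand-in so the example runs without a server. Keys are collected in a set because SCAN may return the same key more than once. The result is only a snapshot, since keys with a TTL can expire mid-scan.

```python
import fnmatch


def count_matching(clients, pattern, batch=1000):
    """Count keys matching `pattern`, running a full SCAN on each client.

    Assumes redis-py style clients: scan(cursor=..., match=..., count=...)
    returns (next_cursor, keys). Pass one client per replication group.
    """
    total = 0
    for client in clients:
        seen = set()  # SCAN may return a key more than once; count it once
        cursor = 0
        while True:
            cursor, keys = client.scan(cursor=cursor, match=pattern, count=batch)
            seen.update(keys)
            if cursor == 0:  # a returned cursor of 0 means the iteration is complete
                break
        total += len(seen)
    return total


class FakeNode:
    """Hypothetical in-memory stand-in for one node, so the sketch runs offline."""

    def __init__(self, keys):
        self._keys = sorted(keys)

    def scan(self, cursor=0, match=None, count=10):
        page = self._keys[cursor:cursor + count]
        next_cursor = cursor + count if cursor + count < len(self._keys) else 0
        # fnmatch is close to, but not identical to, Redis glob-style MATCH.
        hits = [k for k in page if match is None or fnmatch.fnmatchcase(k, match)]
        return next_cursor, hits


nodes = [
    FakeNode([f"session:{i}" for i in range(2500)] + ["user:1"]),
    FakeNode([f"session:x{i}" for i in range(10)] + ["cart:9"]),
]
print(count_matching(nodes, "session:*"))  # 2510
```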

Query multiple keys in Redis in Cluster mode

I'm using Redis in cluster mode (6 nodes: 3 masters and 3 slaves) with SE.Redis. However, as usual, commands with multiple keys in different hash slots are not supported,
so I'm using hash tags ({}) to make sure a given key belongs to a particular hash slot. For example, I have 2 keys like cacheItem:{1} and cacheItem:{94770}.
I set those keys using (each key in a separate request):
SEclient.Database.StringSet(key,value)
This works fine,
but now I want to query key1 and key2, which belong to different hash slots:
SEclient.Database.StringGet(redisKeys);
The above fails and throws an exception because those keys belong to different hash slots.
When querying keys, I can't guarantee that my keys will belong to the same hash slot,
and this example has just 2 keys; I have hundreds of keys that I want to query.
So I have the following questions:
How can I query multiple keys when they belong to different hash slots?
What's the best practice for doing that?
Should I calculate hash slots on my side and then send individual requests per hash slot?
Can I use TwemProxy for my scenario?
Any help is highly appreciated.
I can’t speak to SE.Redis, but you are on the right track. You either need to:
Make individual requests per key to ensure they go to the right cluster node, or...
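Precalculate the hash slot (and owning server) of each key and group the keys by slot, since Redis Cluster rejects a multi-key command whose keys span different slots even when those slots live on the same node. Then send one MGET per slot to the host that owns it.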
Precalculating will require you (or your client) to know the cluster topology (hash slot owners) and the Redis key hashing method (don’t worry, it is simple and well documented) up front.
You can query the cluster (e.g. CLUSTER SLOTS or CLUSTER NODES) to get the slot owners.
The basic hashing algorithm is HASH_SLOT = CRC16(key) mod 16384. Search around and you can find code for it in just about any language 🙂 Remember that hash tags make this more complicated! See also: https://redis.io/commands/cluster-keyslot
Some Redis cluster clients will do this for you with internal magic (e.g. Lettuce in Java), but they are not all created equal 🙂
Also be aware that the cluster topology can change at basically any time, and the above work is complicated. To be robust, you'll want retries when you get MOVED/ASK redirections or cross-slot errors. Or you can just make many single-key requests, which is much, much simpler to maintain.
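Here is a Python sketch of the precalculation step. It computes each key's slot using CRC-16/XMODEM and the hash-tag rule, then groups keys by slot, since every MGET in cluster mode must stay within one slot. Looking up which node owns each slot, and the retries mentioned above, are left out. The function names are illustrative and are not part of SE.Redis or any other client.

```python
from collections import defaultdict

SLOT_COUNT = 16384


def crc16_xmodem(data: bytes) -> int:
    # CRC-16/XMODEM (polynomial 0x1021, initial value 0), as in the cluster spec.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def key_slot(key: str) -> int:
    raw = key.encode()
    # Hash-tag rule: if the key has a "{...}" with a non-empty body,
    # only the part between the first "{" and the next "}" is hashed.
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % SLOT_COUNT


def group_by_slot(keys):
    """Group keys so each group can be sent as one MGET (one slot per MGET)."""
    groups = defaultdict(list)
    for key in keys:
        groups[key_slot(key)].append(key)
    return dict(groups)


groups = group_by_slot(["cacheItem:{1}", "cacheItem:{94770}", "other:{1}"])
for slot, slot_keys in groups.items():
    print(slot, slot_keys)  # one MGET per line, sent to whichever node owns `slot`
```

Keys that share a hash tag (here {1}) always end up in the same group, which is why choosing tags deliberately reduces the number of round trips.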

Redis sentinel vs clustering

I understand Redis Sentinel is a way of configuring HA (high availability) among multiple Redis instances. As I see it, one Redis instance actively serves client requests at any given time, and two additional servers are on standby (waiting for a failure to happen, so one of them can take over).
Is it waste of resources?
Is there a better way of using full use of the resources available?
Is Redis clustering an alternative to Redis sentinel?
I already looked up the Redis documentation for Sentinel and clustering; can somebody with experience please explain?
UPDATE
OK. In my real deployment scenario I have two servers dedicated to Redis, and another server on which my JBoss server runs. The application running in JBoss is configured to connect to the Redis master server (M).
Failover scenario
Ideally, I think when the master cache server fails (either the Redis process goes down or the machine fails), the application in JBoss needs to connect to the slave cache server. How would I configure the Redis servers to achieve this?
+--------+         +--------+
| Master |---------| Slave  |
|        |         |        |
+--------+         +--------+
Configuration: quorum = 1
First, let's talk about Sentinel.
Sentinel manages the failover, it doesn't configure Redis for HA. It is an important distinction. Second, the diagram you posted is actually a bad setup - you don't want to run Sentinel on the same node as the Redis nodes it is managing. When you lose that host you lose both.
As to "Is it waste of resources?" it depends on your use case. You don't need three Redis nodes in that setup, you only need two. Three increases your redundancy, but is not required. If you need the added redundancy then it isn't a waste of resources. If you don't need redundancy then you just run a single Redis instance and call it good - as running more would be "wasted".
Another reason for running two slaves would be to split reads. Again, if you need it then it wouldn't be a waste.
As to "Is there a better way of using full use of the resources available?" we can't answer that as it is far too dependent on your specific scenario and code. That said if the amount of data to store is "small" and the command rate is not exceedingly high, then remember you don't need to dedicate a host to Redis.
Now for "Is Redis clustering an alternative to Redis sentinel?".
It really depends entirely on your use case. Redis Cluster is not an HA solution - it is a multiple writer/larger-than-ram solution. If your goal is just HA then it likely won't be suitable for you. Redis Cluster comes with limitations, particularly around multi-key operations, so it isn't necessarily a straightforward "just use cluster" operation.
If you think having three hosts running Redis (and three running sentinel) is wasteful, you'll likely hold Cluster to be even more so as it does require more resources.
The questions you've asked are probably too broad and opinion-based to survive as written. If you have a specific case/problem you are working out please update with that so we can provide specific assistance and information.
Update for specifics:
For proper failover management in your scenario I would go with 3 sentinels, one running on your JBoss server. If you have 3 JBoss nodes then go with one on each. I'd have a Redis pod (master+slave) on separate nodes, and let sentinel manage the failover.
From there it is a matter of wiring up JBoss/Jedis to use Sentinel for its information and connection management. I don't use those myself, but a quick search shows that Jedis supports it; you just need to configure it correctly. Some examples I found are at Looking for an example of Jedis with Sentinel and https://github.com/xetorthio/jedis/issues/725, which talk about JedisSentinelPool being the route for using a pool.
When Sentinel executes a failover the clients will be disconnected and Jedis will (should?) handle the reconnection by asking the Sentinels who the current master is.
This is not a direct answer to your question, but I think it's helpful information for Redis newbies like me. Also, this question appears as the first link in Google when searching for "Redis cluster vs sentinel".
Redis Sentinel is the name of the Redis high availability solution...
It has nothing to do with Redis Cluster and is intended to be used by
people that don't need Redis Cluster, but simply a way to perform
automatic fail over when a master instance is not functioning
correctly.
Taken from the Redis Sentinel design draft 1.3
It's not obvious when you are new to Redis and implementing a failover solution. The official documentation on Sentinel and clustering doesn't compare the two, so it's hard to choose the right way without reading tons of documentation.
The recommendation, everywhere, is to start with an odd number of instances, not two or a multiple of two. That was corrected, but let's correct some other points.
First, to say that Sentinel provides failover without HA is false. When you have failover, you have HA with the additional benefit of application state being replicated. The distinction is that you can have HA in a system without replication (it's HA but it's not fault tolerant).
Second, running a sentinel on the same machine as its target redis instance is not a "bad setup": if you lose your sentinel, or your redis instance, or the whole machine, the results are the same. That's probably why every example of such configurations shows both running on the same machine.
Additional info to above answers
Redis Cluster
One main purpose of Redis Cluster is to distribute your data load equally/uniformly by sharding.
Redis Cluster does not use consistent hashing, but a different form of sharding where every key is conceptually part of what is called a hash slot.
There are 16384 hash slots in Redis Cluster. Every node in a Redis Cluster is responsible for a subset of the hash slots, so, for example, you may have a cluster with 3 nodes, where:
Node A contains hash slots from 0 to 5500,
Node B contains hash slots from 5501 to 11000,
Node C contains hash slots from 11001 to 16383
This allows us to add and remove nodes in the cluster easily. For example, if we want to add a new node D, we need to move some hash slots from nodes A, B and C to D.
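Here is a small Python model of that slot table. It assumes the three-node layout above and a hypothetical move that gives D an equal share from A, B and C. It only tracks slot ownership. In a real cluster, resharding (e.g. with redis-cli --cluster reshard) also migrates the keys stored in each moved slot.

```python
SLOTS = 16384


def build_table(ranges):
    """ranges maps node -> (first_slot, last_slot), both inclusive."""
    owner = [None] * SLOTS
    for node, (first, last) in ranges.items():
        for slot in range(first, last + 1):
            owner[slot] = node
    return owner


def move_slots(owner, slots, target):
    # Only the moved slots change owner; every other slot stays where it was.
    for slot in slots:
        owner[slot] = target


def slot_counts(owner):
    counts = {}
    for node in owner:
        counts[node] = counts.get(node, 0) + 1
    return counts


table = build_table({"A": (0, 5500), "B": (5501, 11000), "C": (11001, 16383)})
# Hypothetical rebalance: D takes 1365 slots from the start of each existing range.
to_move = [*range(0, 1365), *range(5501, 6866), *range(11001, 12366)]
move_slots(table, to_move, "D")
print(slot_counts(table))  # {'D': 4095, 'A': 4136, 'B': 4135, 'C': 4018}
```

Because the mapping is an explicit table rather than a function of the node count, adding D touches only the slots you choose to move.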
Redis Cluster supports a master-slave structure: you can create slaves A1, B1, C1 along with masters A, B, C when creating a cluster, so when master B goes down, slave B1 gets promoted to master.
You don't need additional failover handling when using Redis Cluster and you should definitely not point Sentinel instances at any of the Cluster nodes.
So in practical terms, what do you get with Redis Cluster?
1. The ability to automatically split your dataset among multiple nodes.
2. The ability to continue operations when a subset of the nodes are experiencing failures or are unable to communicate with the rest of the cluster.
Redis Sentinel
Redis supports multiple slaves replicating data from a master node.
This provides a backup for data in master node.
Redis Sentinel is a system designed to manage masters and slaves. It runs as a separate program. The minimum number of Sentinels required in an ideal system is 3. They communicate among themselves and make sure the master is alive; if it is not, they promote one of the slaves to master, so that when the dead node comes back up it acts as a slave of the new master.
Quorum is configurable: it is the number of Sentinels that need to agree that the master is down. Separately, the failover itself must be authorized by a majority (N/2 + 1) of the Sentinels, where N is the number of Sentinel processes (note this setup is called a pod and is not a cluster).
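Counting Sentinel processes, the majority arithmetic (N/2 + 1 with integer division) shows why an odd number is usually recommended. Going from 3 to 4 Sentinels raises the number of votes needed without tolerating any extra failures. A tiny Python check:

```python
def majority(n: int) -> int:
    # Votes needed from n Sentinel processes to authorize a failover.
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    # Sentinels that can be unreachable while a majority can still be formed.
    return n - majority(n)


for n in range(1, 6):
    print(f"{n} sentinels: majority {majority(n)}, survives {tolerated_failures(n)} down")
# 3 -> majority 2, survives 1; 4 -> majority 3, survives 1; 5 -> majority 3, survives 2
```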
So in practical terms, what do you get with Redis Sentinel?
It will make sure that Master is always available (if master goes down, the slave will be promoted as master)
References:
https://fnordig.de/2015/06/01/redis-sentinel-and-redis-cluster/
https://redis.io/topics/cluster-tutorial
This is my understanding after banging my head throughout the documentation.
Sentinel is a kind of hot-standby solution where the slaves are kept replicated and ready to be promoted at any time. However, it doesn't support multi-node writes. Slaves can be configured for read operations. It's NOT true that Sentinel doesn't provide HA; it has all the features of a typical active-passive cluster (though that's not the right term to use here).
Redis Cluster is more or less a distributed solution, working on top of shards. Each chunk of data is distributed among master and slave nodes. A replication factor of at least 2 ensures that each shard is available on two nodes (a master and its slave).
If you know the sharding in Mongo or Elasticsearch, it will be easy to catch up.
Redis can operate in partitioned cluster (with many masters and slaves of those masters) or a single instance mode (single master with replica slaves).
The link here says:
When using Redis in single instance mode, in which a single Redis server manages the entire unpartitioned database, Redis Sentinel is used to manage its availability
It also says:
A Redis cluster, in which data is partitioned among multiple primary instances, manages availability by itself and requires no extra components.
So HA can be ensured in both of the scenarios mentioned. Hope this clears up the doubts. Redis Cluster and Sentinel are not alternatives to each other; they are used to ensure HA in the different cases of a partitioned or non-partitioned master.
Redis Sentinel performs the failover, promoting a replica when it sees that a master is down. You typically want an odd number of Sentinel nodes. For the example of one master and one replica, 3 Sentinels should be used so there can be a consensus on the decision. Ideally the 3rd Sentinel is on a 3rd server so the decision is not skewed (depending on the failure). Sentinel takes care of changing the master/replica config settings on your nodes so that promotion and syncing occur in the correct order and you don’t overwrite data by bringing back an old failed master that now contains older data.
Once you have your sentinel nodes set up to perform failovers, you need to ensure you are pointing to the correct instance. See an example of HAProxy configuration for this. HAProxy performs health checks and will point to the new master if a failure occurs.
Clustering will allow you to scale horizontally and can help handle high loads. It does take a bit of work to set up and configure up front.
There is an open source fork of Redis, “KeyDB” that has eliminated the need for sentinel nodes with an active-replica option. This allows the replica node to accept reads and writes. When a failover occurs HAProxy stops reads/writes with the failed node and just uses the remaining active node which is already sync’d. Timestamping enables the failed nodes to rejoin automatically and resync without losing data when they come back online. Setup is simple and for higher traffic you don’t need special upfront setup to direct reads to the replica node and read/writes to the master. See example of active replication here. KeyDB is also multi-threaded which for some applications might be an alternative to clustering, but really depends on what your needs are.
There is also an example of setting up clustering manually and with the create-cluster tool. These are the same steps if you are using Redis (replace 'keydb' with 'redis' in the instructions).

Redis: Efficient cluster of servers for large key set

I have a very large set of keys (200M keys) with small values (<100 bytes) to store, and I'm trying to use Redis. The problem is that I have 10 Redis DBs to split the keys over, but currently I'm on a single server with those 10 Redis DBs. By a Redis DB I mean using SELECT. From my calculations it looks like I'm going to run out of memory; I think I'll need over 4TB of memory for this case! What are my options? First, my calculation is based on 10000 keys with 100-byte values taking 220MB of RAM (this is from a table I found). So, simply put: (2*10^8 / 10^4) * 220MB = 4.4TB.
If my calculation looks correct, what are my options? I've read on different posts that Redis VM is no longer an option. Can I use a Redis cluster? This still appears to require too many servers to be practical. I understand I could switch to another DB, but I'd like that to be the last resort option.
Firstly, using shared databases (i.e. the SELECT command) isn't a recommended practice since all of these databases are essentially managed by the same Redis process. It is preferable having 10 separate Redis processes (even on the same server) in order to avoid contention (more info here).
Next, there are ways to reduce the memory footprint of your database. You could, for example, perform client-side compression (see here) or consider other optimizations such as using Hashes to keep multiple values (as described here).
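Here is a Python sketch of the Hashes idea. It splits a numeric id into a bucket key and a field, so many tiny values share one small Hash instead of each paying per-key overhead. The key prefix and the bucket size of 100 are assumptions. With redis-py you would then call hset(key, field, value) and hget(key, field). Whether this saves memory depends on each Hash staying under the server's compact-encoding limits.

```python
def bucketed_location(object_id: int, prefix: str = "object", bucket_size: int = 100):
    """Map a numeric id to (hash key, field) so many small values share one Hash.

    prefix and bucket_size are illustrative choices. Keep bucket_size at or below
    the server's small-hash limit (hash-max-ziplist-entries, or
    hash-max-listpack-entries on Redis 7+) so each Hash keeps its compact encoding.
    """
    bucket, field = divmod(object_id, bucket_size)
    return f"{prefix}:{bucket}", str(field)


print(bucketed_location(1234567))  # ('object:12345', '67')
```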
That said, a Redis server is ultimately bound by the amount of RAM that the host provides. Once you've reached that limit you'll need to shard your database and use a Redis cluster. Since you're already using multiple databases this shouldn't pose a big challenge as your code should already be compatible with that to a degree. Sharding can be done in one of three approaches: client, proxy or Redis Cluster. Client-side sharding can be implemented in your code or by the Redis client that you're using (if the client library that you're using supports that). Redis Cluster (v3) is expected to be released in the very near future and already has a stable release candidate. As for proxy-based sharding, there are several open source solutions out there, including Twitter's twemproxy, Netflix's dynomite and codis. Additional information about sharding and partitioning can be found here.
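Here is a minimal Python sketch of client-side sharding. It assumes the simplest scheme (crc32 of the key modulo the number of instances) and hypothetical endpoints for ten local processes; the answer does not prescribe a specific scheme. The usage shows the scheme's main drawback: adding an instance remaps most keys, which is exactly what consistent hashing and Redis Cluster's hash slots are designed to avoid.

```python
import zlib

# Hypothetical endpoints: ten separate Redis processes on one host.
SHARDS = [f"127.0.0.1:{6379 + i}" for i in range(10)]


def shard_for(key: str, shards=SHARDS) -> str:
    # crc32 is stable across processes (unlike Python's built-in hash for str),
    # so every client picks the same shard for the same key.
    return shards[zlib.crc32(key.encode()) % len(shards)]


keys = [f"key:{i}" for i in range(10000)]
before = {k: shard_for(k) for k in keys}
after = {k: shard_for(k, SHARDS + ["127.0.0.1:6389"]) for k in keys}
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved / len(keys):.0%} of keys change shard when an 11th instance is added")
```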
Disclaimer: I work at Redis Labs. Lastly, AFAIK there's only one Redis-as-a-Service provider that already provides built-in support for clustering Redis. Redis Labs' Redis Cloud is a fully-managed service that can scale seamlessly to any required capacity. Our clusters support both the '{}' hashtag standard as well as sharding by RegEx - more about this can be found here.
You can use LMDB with Dynomite to store data beyond your memory capacity. LMDB uses both disk and memory to store data, and Dynomite makes LMDB distributed.
We have done a POC with this combo and they work nicely together.
For more information, please check out our open issue here:
https://github.com/Netflix/dynomite/issues/254