I have read redis-cluster documents but couldn't get the gist of it. Can someone help me understand it from the basics?
Redis Cluster does not use consistent hashing, but a different form of
sharding where every key is conceptually part of what we call a hash
slot.
Hash slots are defined by Redis so the data can be mapped to different nodes in the Redis cluster. The 16384 slots can be divided up and distributed among the nodes.
For example, in a 3-node cluster one node can hold slots 0 to 5460, the next 5461 to 10922, and the third 10923 to 16383. The input key (or a part of it) is run through a hash function to determine a slot number, and hence the node to add the key to.
Think of it as logical shards. So redis has 16384 logical shards and these logical shards are mapped to the available physical machines in the cluster.
Mapping may look something like:
0-999 : Machine 1
1000-1999 : Machine 2
2000-2999 : Machine 3
...
...
When Redis gets a key, it does the following:
Calculate hash(key) % 16384 (Redis Cluster uses CRC16 as the hash function) -> this finds the logical shard where the given key belongs; let's say it comes to 1500.
Look up the logical-shard-to-physical-machine mapping to identify the physical machine. In the mapping above, logical shard 1500 is served by Machine 2, so the request is routed to physical machine #2.
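The two steps above can be sketched in Python. Redis Cluster's hash is CRC16 (the XMODEM variant), which Python's standard library exposes as binascii.crc_hqx with an initial value of 0. The node names and slot ranges below are assumptions for a 3-node cluster, and hash tags are ignored here.

```python
import binascii

TOTAL_SLOTS = 16384

# Hypothetical slot ranges (inclusive) for a 3-node cluster.
SLOT_RANGES = [
    (0, 5460, "node-1"),
    (5461, 10922, "node-2"),
    (10923, 16383, "node-3"),
]

def key_slot(key: str) -> int:
    """Step 1: CRC16 (XMODEM) of the key, modulo 16384."""
    return binascii.crc_hqx(key.encode("utf-8"), 0) % TOTAL_SLOTS

def node_for_key(key: str) -> str:
    """Step 2: look the slot up in the slot-to-node mapping."""
    slot = key_slot(key)
    for first, last, node in SLOT_RANGES:
        if first <= slot <= last:
            return node
    raise ValueError(f"slot {slot} is not assigned to any node")

print(key_slot("foo"), node_for_key("foo"))  # 12182 node-3
```

In a real cluster the client library does this lookup for you; the sketch only shows why a given key always lands on the same node.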
You can take "slot" literally, just like slots in the real world.
Every key belongs to exactly one slot, determined by a fixed rule (the hash of the key), and every slot belongs to exactly one Redis node, determined by the cluster configuration.
I am new to Redis and do not know the meaning of "keyspace" and "key space" in Redis terminology, which I encountered on the official Redis website. Can someone help me clear that up? Thanks.
These terms refer to the internal dictionary that Redis manages, in which all keys are stored. The keyspace of a Redis database is managed by a single server in a single-instance deployment, and is divided into exclusive slot ranges managed by different nodes in cluster mode.
In a key-value database, all keys can live on one node or be divided among several nodes. Suppose I store a telephone directory as a key-value store, with the name as the key and the phone number as the value. If I store names A-L on one node and M-Z on another, I have divided my database into two key spaces. When I look up Smith's number, I only need to search the second key space (node). Dividing the keys this way spreads queries across the nodes, which shares the work and gives faster results. This is a shared-nothing model.
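As a rough illustration of the telephone directory example, here is a minimal sketch in which two in-memory dictionaries stand in for the two nodes. The function names and the sample numbers are made up for the example; a real store would route over the network.

```python
# Two in-memory dicts stand in for two nodes, one per key space.
nodes = {"A-L": {}, "M-Z": {}}

def keyspace_for(name: str) -> str:
    """Pick the key space from the first letter of the name.

    Anything that does not start with A-L falls into the second key space.
    """
    first = name[0].upper()
    return "A-L" if "A" <= first <= "L" else "M-Z"

def put(name: str, number: str) -> None:
    nodes[keyspace_for(name)][name] = number

def get(name: str):
    # Only one key space (node) is consulted per lookup.
    return nodes[keyspace_for(name)].get(name)

put("Adams", "555-0100")
put("Smith", "555-0199")
print(keyspace_for("Smith"), get("Smith"))  # M-Z 555-0199
```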
I want to understand the behavior of Aerospike in its different consistency modes.
Consider an Aerospike cluster running with 3 nodes and replication factor 3.
AP mode is simple; the documentation says
Aerospike will allow reads and writes in every sub-cluster.
And the maximum number of nodes that can go down is < 3 (the replication factor).
For aerospike strong consistency it says
Note that the only successful writes are those made on replication-factor number of nodes. Every other write is unsuccessful
Does this really mean that no writes are allowed if available nodes < replication factor?
And then same document says
All writes are committed to every replica before the system returns success to the client. In case one of the replica writes fails, the master will ensure that the write is completed to the appropriate number of replicas within the cluster (or sub cluster in case the system has been compromised.)
What does "appropriate number of replicas" mean?
So if I lose one node from my 3-node cluster with strong consistency and replication factor 3, I will not be able to write data?
For aerospike strong consistency it says
Note that the only successful writes are those made on
replication-factor number of nodes. Every other write is unsuccessful
Does this really mean that no writes are allowed if available nodes <
replication factor?
Yes, if there are fewer than replication-factor nodes then it is impossible to meet the user specified replication-factor.
All writes are committed to every replica before the system returns
success to the client. In case one of the replica writes fails, the
master will ensure that the write is completed to the appropriate
number of replicas within the cluster (or sub cluster in case the
system has been compromised.)
What does "appropriate number of replicas" mean?
It means replication-factor nodes must receive the write. When a node fails, a new node can be promoted to replica status until either the node returns or an operator registers a new roster (cluster membership list).
So if I lose one node from my 3-node cluster with strong consistency
and replication factor 3, I will not be able to write data?
Yes, so having all nodes as replicas wouldn't be a very useful configuration. Replication-factor 3 allows up to 2 nodes to be down, but only if the remaining nodes are able to satisfy the replication-factor. So for replication-factor 3 you would probably want to run with a minimum of 5 nodes.
You are correct, with 3 nodes and RF 3, losing one node means the cluster will not be able to successfully take write transactions since it wouldn't be able to write the required number of copies (3 in this case).
Appropriate number of replicas means a number of replicas that would match the replication factor configured.
In the Aerospike documentation, it is mentioned that Aerospike has 4096 logical partitions and that each key is hashed and mapped to one of these 4096 partitions, which determines which node the data for that key is stored on.
However if we have two keys "A" and "AB" and we want to store them in the same node, is there a way?
In Redis this can be achieved by naming the keys "A" and "{A}B", which makes sure that the key "{A}B" goes to the node where "A" is hashed and stored.
In Apache Ignite, same can be done using "AffinityKey".
Does a similar idea exist in Aerospike?
Thanks
Aerospike was designed as a distributed database. Redis was designed to run on a single node, and lacks concepts such as data distribution, clustering, replication, failover, at least natively. I'm aware that you can use various application-side shenanigans to make it into an ad-hoc cluster.
Don't worry about the implementation details of Aerospike's data distribution. Those happen automatically between the client and cluster, and don't require you to do anything on the application side. Instead, think about your access patterns.
First, your Aerospike cluster will make sure the data is evenly distributed. Because work is directly proportional to data, you should make sure the nodes are homogeneous. You can then expect multi-node operations to wrap up in roughly the same amount of time on each node.
You can create a secondary index on the fields that you'll be querying often to enhance the speed of the query. Release 3.12 adds predicate filtering, allowing you to create more complex query predicates on top of the initial secondary index based filter (also see the Java client's PredExp class).
If you don't want to use secondary indexes (there are several valid reasons), you can create your own lookup using external records. In a set called country-school you can have a record for each country (keys such as 'india', 'luxembourg') with the value being a list containing the IDs of the schools in that country. You can get the list with a single get (or a batch-get if it's several records, such as india-1, india-2, ... , india-9999), then use the results to compose a batch-get operation for the schools. Batch reads return results in the order you asked for them, so you can get a large batch, check whether the last element is null, and if not get another batch.
('ns1', 'country-school', 'us-california') => [ 1, 2, 3, 5, 8, 11, .. ]
Similarly, you can create permutations such as country-state-city (for example, us-california-oakland) with smaller lists. This costs some extra space, but gives you faster (key-value based) retrieval without spending memory on secondary indexes.
('ns1', 'country-school', 'us-california-oakland') => [ 1, 5, 42, .. ]
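The lookup-record pattern can be sketched as follows. To keep the example self-contained, a plain Python dict stands in for the cluster and batch_get is a hypothetical helper that mimics an ordered batch read; it is not the Aerospike client API. The 'school' set name, the record contents, and the batch size are also made up.

```python
from typing import Dict, Hashable, List, Optional, Tuple

Key = Tuple[str, str, Hashable]  # (namespace, set, user key)

# In-memory stand-in for the cluster; all records here are invented.
store: Dict[Key, object] = {
    ("ns1", "country-school", "us-california-1"): [1, 2, 3],
    ("ns1", "country-school", "us-california-2"): [5, 8],
}
for school_id in (1, 2, 3, 5, 8):
    store[("ns1", "school", school_id)] = {"name": f"School {school_id}"}

def batch_get(keys: List[Key]) -> List[Optional[object]]:
    """Return one result per key, in request order; None if missing."""
    return [store.get(key) for key in keys]

def school_ids(prefix: str, batch_size: int = 2) -> List[int]:
    """Read prefix-1, prefix-2, ... in batches until the last slot is empty."""
    ids: List[int] = []
    start = 1
    while True:
        keys = [("ns1", "country-school", f"{prefix}-{i}")
                for i in range(start, start + batch_size)]
        results = batch_get(keys)
        for chunk in results:
            if chunk is not None:
                ids.extend(chunk)
        if results[-1] is None:  # ran past the last lookup record
            return ids
        start += batch_size

ids = school_ids("us-california")
schools = batch_get([("ns1", "school", i) for i in ids])
print(ids)  # [1, 2, 3, 5, 8]
```

The first call pages through the lookup records, and the second turns the collected IDs into a single batch read for the school records themselves.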
I have a Redis Cluster. I am using JedisCluster client to connect to my Redis.
My application is a bit complex, and I want to control which partition its data goes to. For example, my application consists of sub-modules A, B, and C. I want all data from sub-module A to go to partition 1, all data from sub-module B to go to partition 2, and so on.
I am using JedisCluster, but I can't find any API for writing to a particular partition of my cluster. I am assuming that I will have the same partition names on all my Redis nodes, that which node the data goes to will be handled automatically, and that which partition it goes to will be handled by me.
I tried going through the JedisCluster lib at
https://github.com/xetorthio/jedis/blob/b03d4231f4412c67063e356a7c3acf9bb7e62534/src/main/java/redis/clients/jedis/JedisCluster.java
but couldn't find anything. Please help?
Thanks in advance for the help.
That's not how Redis Cluster works. With Redis Cluster, each master node (partition) handles a defined set of hash slots, and every key belongs to exactly one slot. Sending a write for a key to a master node that does not serve that key's slot results in the command being rejected with a MOVED redirection.
From the Redis Cluster Spec:
Redis Cluster implements a concept called hash tags that can be used in order to force certain keys to be stored in the same node.
[...]
The key space is split into 16384 slots, effectively setting an upper limit for the cluster size of 16384 master nodes (however the suggested max size of nodes is in the order of ~ 1000 nodes).
Each master node in a cluster handles a subset of the 16384 hash slots.
You need to define at Cluster configuration-level which master node is exclusively serving a particular slot or a set of slots. The configuration results in data locality.
The slot is calculated from the key. The good news is that you can enforce a particular slot for a key by using Key hash tags:
There is an exception for the computation of the hash slot that is used in order to implement hash tags. Hash tags are a way to ensure that multiple keys are allocated in the same hash slot. This is used in order to implement multi-key operations in Redis Cluster.
Example:
{user1000}.following
The content between {…} is used to calculate the slot. Key hash tags allow you to group keys on particular nodes and enforce the same data locality when using arbitrary hash tags.
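A minimal Python sketch of the hash tag rule, following the spec's description: only the text between the first "{" and the first "}" after it is hashed, and only if that text is non-empty. The function names are my own; CRC16 (XMODEM) is computed with the standard library's binascii.crc_hqx.

```python
import binascii

def hash_tag(key: str) -> str:
    """Return the part of the key that is actually hashed."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        # Use the tag only if there is at least one character inside the braces.
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key  # no usable tag: hash the whole key

def key_slot(key: str) -> int:
    """CRC16 (XMODEM) of the hashed part, modulo 16384."""
    return binascii.crc_hqx(hash_tag(key).encode("utf-8"), 0) % 16384

# Both keys share the tag "user1000", so they share a slot (and a node).
print(key_slot("{user1000}.following") == key_slot("{user1000}.followers"))  # True
```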
You can also go a step further by using known hash tags that map to slots (you'd need to either precalculate a table or see this Gist). By using a known hash tag that maps to a specific slot, you can select the slot, and therefore the master node, on which the data is located.
Everything else is handled by your Redis client.
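Precalculating a hash tag for a given slot can be done by brute force. The sketch below is one possible approach, not the one from the Gist: it tries the decimal strings "0", "1", "2", ... until one hashes to the target slot. The function names and the search limit are assumptions.

```python
import binascii

def slot_of(tag: str) -> int:
    """CRC16 (XMODEM) of the tag, modulo 16384."""
    return binascii.crc_hqx(tag.encode("utf-8"), 0) % 16384

def tag_for_slot(target: int, limit: int = 1_000_000) -> str:
    """Find a numeric hash tag that maps to the target slot."""
    for n in range(limit):
        tag = str(n)
        if slot_of(tag) == target:
            return tag
    raise LookupError(f"no tag found for slot {target} within {limit} tries")

# Pick a slot served by the node you want, then prefix keys with its tag.
target = slot_of("12345")  # any slot number in 0..16383 would do here
tag = tag_for_slot(target)
key = "{" + tag + "}:module-a:item1"  # hashes to the target slot via the tag
print(target, tag, key)
```

Because the tag contains no braces itself, every key of the form {tag}:anything lands in the chosen slot, so one tag per sub-module is enough to pin that sub-module's data to a slot.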
I'm using Apache Cassandra 2.1.1 and when using nodetool status the Load for one of my nodes is about half the size of the other two while the Owns is almost equal on all the nodes. I am somewhat new to Cassandra and don't know if I should be worried about this or not. I have tried using repair and cleanup after restarting all the nodes, but it still appears unbalanced. I am using GossipingPropertyFileSnitch with each node configured dc=DC1 and rack=RAC1 specified in cassandra-rackdc.properties. I am also using Murmur3Partitioner with NetworkTopologyStrategy where my keyspace is defined as
CREATE KEYSPACE awl WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '2'} AND durable_writes = true;
I believe the problem to be with the awl keyspace since the size of the data/awl folder is the same size as reported by nodetool status. My output for nodetool status is below. Any help would be much appreciated.
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load     Tokens  Owns (effective)  Host ID                               Rack
UN  10.1.1.152  3.56 GB  256     68.4%             d42945cc-59eb-41de-9872-1fa252762797  RAC1
UN  10.1.1.153  6.8 GB   256     67.2%             065c471d-5025-4bf1-854d-52d579f2a6d3  RAC1
UN  10.1.1.154  6.31 GB  256     64.4%             46f05522-29cc-491c-ab65-334b205fc415  RAC1
I would suspect this is due to the distribution of the key values that are being inserted. They are probably not well distributed across the possible key values, so many of them are hashing to one node. Since you are using replication factor 2, the second replica is the next node in the ring, resulting in two nodes with more data than the third node.
You didn't show your table schema, so I don't know what you are using for the partition and clustering keys. You want to use key values that have a high cardinality and good distribution to avoid hot spots where a lot of inserts are hashing to one node. With a better distribution you will get better performance and more even space usage across the nodes.
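To see why partition key cardinality matters, here is a simplified simulation. It makes several assumptions that differ from a real cluster: MD5 stands in for Murmur3, each node owns a single token range instead of 256 vnodes, and the "sensor-N" partition keys are invented. Only the replica placement idea (primary node plus the next node in the ring, for replication factor 2) is taken from the explanation above.

```python
import hashlib
from collections import Counter

NODES = ["10.1.1.152", "10.1.1.153", "10.1.1.154"]
RF = 2  # replication factor from the keyspace definition

def replicas(partition_key: str):
    """Primary node by hash, then the next RF-1 nodes in the ring."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    primary = int.from_bytes(digest[:8], "big") % len(NODES)
    return [NODES[(primary + i) % len(NODES)] for i in range(RF)]

def rows_per_node(partition_keys):
    """Count how many row copies each node ends up storing."""
    load = Counter({node: 0 for node in NODES})
    for key in partition_keys:
        for node in replicas(key):
            load[node] += 1
    return load

# One hot partition key: every row lands on the same two nodes.
skewed = rows_per_node(["sensor-1"] * 9000)
# Many distinct partition keys: rows spread almost evenly.
even = rows_per_node([f"sensor-{i}" for i in range(9000)])
print(sorted(skewed.values()))
print(sorted(even.values()))
```

With a single hot key, two nodes hold all the data and the third holds none; with 9000 distinct keys the three counts come out close to each other.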