Using JedisCluster to write to a partition in a Redis Cluster

I have a Redis Cluster. I am using JedisCluster client to connect to my Redis.
My application is a bit complex and I basically want to control which partition my application's data goes to. For example, my application consists of sub-modules A, B, and C. I want all data from sub-module A to go to partition 1, all data from sub-module B to go to partition 2, and so on.
I am using JedisCluster, but I can't find any API to write to a particular partition on my cluster. I am assuming I will have the same partition names on all my Redis nodes, that which data goes to which node will be handled automatically, and that which partition it goes to will be handled by me.
I tried going through the JedisCluster lib at
https://github.com/xetorthio/jedis/blob/b03d4231f4412c67063e356a7c3acf9bb7e62534/src/main/java/redis/clients/jedis/JedisCluster.java
but couldn't find anything. Can anyone help?
Thanks in advance for the help.

That's not how Redis Cluster works. With Redis Cluster, each node (partition) has a defined set of keys (slots) that it's handling. Attempting to write a key to a master node that does not serve that key's slot results in rejection of the command.
From the Redis Cluster Spec:
Redis Cluster implements a concept called hash tags that can be used in order to force certain keys to be stored in the same node.
[...]
The key space is split into 16384 slots, effectively setting an upper limit for the cluster size of 16384 master nodes (however the suggested max size of nodes is in the order of ~ 1000 nodes).
Each master node in a cluster handles a subset of the 16384 hash slots.
You define at the cluster configuration level which master node exclusively serves a particular slot or set of slots. This configuration is what produces data locality.
The slot is calculated from the key. The good news is that you can enforce a particular slot for a key by using Key hash tags:
There is an exception for the computation of the hash slot that is used in order to implement hash tags. Hash tags are a way to ensure that multiple keys are allocated in the same hash slot. This is used in order to implement multi-key operations in Redis Cluster.
Example:
{user1000}.following
Only the content between {…} is used to calculate the slot. Key hash tags let you group keys into the same slot, and therefore onto the same node, simply by giving them the same tag.
You can go a step further by using known hash tags that map to particular slots (you'd need to either precalculate a table or see this Gist). With a hash tag known to map to a specific slot, you can select the slot, and therefore the master node, on which the data is located.
Everything else is handled by your Redis client.
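For illustration, here is a minimal sketch of that idea with JedisCluster (the host/port and key names are assumptions, not from the question): keys that share a hash tag hash to the same slot, so each sub-module can be pinned to one slot by tagging its keys.

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

public class SubModuleRouting {
    public static void main(String[] args) {
        // Connect through any one cluster node; Jedis discovers the rest.
        JedisCluster jc = new JedisCluster(new HostAndPort("127.0.0.1", 7000));

        // Every key tagged {moduleA} hashes to the same slot, and therefore
        // lives on whichever master serves that slot.
        jc.set("{moduleA}:user:42", "payload-from-A");
        jc.set("{moduleA}:order:7", "another-A-value");

        // {moduleB} keys carry a different tag, hence (usually) another slot.
        jc.set("{moduleB}:user:42", "payload-from-B");

        jc.close();
    }
}

Which slot a given tag maps to is still decided by CRC16, so to pin a sub-module to a specific node you would pick tags whose slots fall into that node's range, or move the slot ranges at the cluster level.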

Related

What is a keyspace in Redis?

I am new to Redis and do not know the meaning of "keyspace" and "key space" in Redis terminology, which I encountered on the official Redis website. Can someone help me clarify that? Thanks.
These terms refer to the internal dictionary that Redis manages, in which all keys are stored. The keyspace of a Redis database is managed by a single server in the case of a single-instance deployment, and is divided into exclusive slot ranges managed by different nodes when using cluster mode.
In a key-value database, all keys can be on one node or divided across multiple nodes. Suppose I am storing a telephone directory as a key-value store, with the name as the key and the phone number as the value. If I store names A-L on one node and M-Z on another node, I divide my database into two key spaces. When I run a query to look up Smith's number, I only need to search the second key space (node). This splits queries across multiple nodes, dividing the work and giving faster results: a shared-nothing model.
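A minimal sketch of that analogy (the names and numbers are made up), with two in-memory maps standing in for the two nodes:

import java.util.HashMap;
import java.util.Map;

public class TwoKeySpaces {
    // Two "nodes", each owning one key space of the directory.
    static Map<String, String> nodeAtoL = new HashMap<>();
    static Map<String, String> nodeMtoZ = new HashMap<>();

    // Route a name to the node that owns its key space.
    static Map<String, String> shardFor(String name) {
        return Character.toUpperCase(name.charAt(0)) <= 'L' ? nodeAtoL : nodeMtoZ;
    }

    public static void main(String[] args) {
        shardFor("Smith").put("Smith", "555-0100");
        // The lookup touches only the M-Z node.
        System.out.println(shardFor("Smith").get("Smith"));
    }
}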

What do we mean by hash slot in Redis Cluster?

I have read the Redis Cluster documents but couldn't get the gist of it. Can someone help me understand it from the basics?
Redis Cluster does not use consistent hashing, but a different form of sharding where every key is conceptually part of what we call an hash slot.
Hash slots are defined by Redis so that data can be mapped to different nodes in the cluster. The 16384 slots can be divided up and distributed across the nodes.
For example, in a 3-node cluster one node can hold slots 0 to 5460, the next 5461 to 10922, and the third 10923 to 16383. The input key (or a part of it) is hashed (run through a hash function) to determine a slot number, and hence the node to add the key to.
Think of it as logical shards. Redis has 16384 logical shards, and these logical shards are mapped to the available physical machines in the cluster.
A mapping may look something like:
0-999 : Machine 1
1000-1999 : Machine 2
2000-2999 : Machine 3
...
...
When Redis gets a key, it does the following:
Calculate hash(key) % 16384 (in practice, CRC16 of the key modulo 16384) to find the logical shard the given key belongs to; let's say it comes out to 1500.
Look at the logical-shard-to-physical-machine mapping to identify the physical machine. From the mapping above, logical shard 1500 is served by Machine 2, so the request is routed to physical machine #2.
You can consider a slot in its literal meaning, just like slots in the real world.
Every key belongs to a certain slot according to some rule, and each slot belongs to a certain Redis node by configuration.
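To make the key-to-slot step concrete, here is a small sketch using the CRC16 helper that ships with Jedis (the package path assumes Jedis 3.x; older versions keep the class under redis.clients.util):

import redis.clients.jedis.util.JedisClusterCRC16;

public class SlotDemo {
    public static void main(String[] args) {
        // The slot is CRC16(key) mod 16384, a number in 0..16383.
        System.out.println(JedisClusterCRC16.getSlot("user:1000"));

        // With a hash tag, only the text between { and } is hashed,
        // so these two keys always land in the same slot:
        System.out.println(JedisClusterCRC16.getSlot("{user1000}.following"));
        System.out.println(JedisClusterCRC16.getSlot("{user1000}.followers"));
    }
}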

B+ Tree and Index Page in Apache Ignite

I'm trying to understand the purpose of the B+ Tree and Index Pages in Apache Ignite, as described here: https://apacheignite.readme.io/docs/page-memory
I have a few questions:
What exactly does an Index Page contain? An ordered list of hash code values for keys that fall into the index page and "other" information that will be used to locate and index into the data page to store/get the key-value pair?
Since hash codes are being used in the index pages, what would happen if collision occurs?
For a "typical" application, do we expect the number of data pages to be much higher than the number of index pages ? (since data pages contain key-value pairs)
What type of relation exists between a distributed cache that we create using ignite.getOrCreateCache(name) and a memory region? 1-to-1, Many-to-1, 1-to-Many, or Many-to-Many?
Consider the following pseudo code:
Ignite ignite = Ignition.start("two_server_node_config");
IgniteCache<Integer,String> cache = ignite.getOrCreateCache("my_cache");
cache.put(7, "abcd");
How does Ignite determine the node to put the key into?
Once the node where to put the key is determined, how does Ignite locate the specific memory region the key belongs to?
Thanks
An index page contains an ordered list of hash values, along with links to the key-value pairs stored in durable memory. A link is a page ID plus an offset inside that page.
All links to objects with colliding hashes will be present in the index page. To perform a lookup, Ignite dereferences the links and compares the actual keys.
This depends on object size. For a "typical" application you can roughly estimate the ratio of data pages to index pages as 90 to 10. However, the share of index pages will grow if you add extra indexes: https://apacheignite.readme.io/v2.1/docs/indexes#section-registering-indexed-types
You may also find the most recent version of the docs useful: https://apacheignite.readme.io/v2.1/docs/memory-architecture
Answering last two questions:
Many-to-1. The same memory region can be used by multiple caches.
This is based on affinity. Basically, the cache key is mapped to an affinity key (by default they are the same), and then the affinity function is called to determine the partition and the node. Some more details about affinity here: https://apacheignite.readme.io/docs/affinity-collocation
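As a hedged illustration of that last point, Ignite's Affinity API lets you ask where a key lives (the cache name and key below are assumptions, matching the pseudo code in the question):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cluster.ClusterNode;

public class AffinityDemo {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        IgniteCache<Integer, String> cache = ignite.getOrCreateCache("my_cache");
        cache.put(7, "abcd");

        Affinity<Integer> aff = ignite.affinity("my_cache");
        int partition = aff.partition(7);          // key -> partition
        ClusterNode primary = aff.mapKeyToNode(7); // partition -> primary node
        System.out.println("key 7 -> partition " + partition + " on node " + primary.id());
    }
}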

Understanding Cache Keys, Index, Partition and Affinity w.r.t reads and writes

I am new to Apache Ignite and come from a Data Warehousing background.
So pardon me if I try to relate to Ignite through DBMS jargon.
I have gone through forums but I am still unclear about some of the basics.
I would also like specific answers to the scenario posted below.
1.) CacheMode=PARTITIONED
a.) When a cache is declared as partitioned, does the data get equally partitioned across all nodes by default?
b.) Is there an option to provide a "partition key" based on which the data would be distributed across the nodes? Is this what we call the Affinity Key?
c.) How is partitioning different from affinity, and can a cache have both a partition and an affinity key?
2.) Affinity Concept
With an Affinity Key defined, when I load data (using loadCache()) into a partitioned cache, will the source rows be sent to the node they belong to, or to all the nodes in the cluster?
3.) If I create one index on the cache, does it by default become the partition/affinity key as well? In such a scenario, how is a partition different from an index?
SCENARIO DESCRIPTION
I want to load data from a persistent layer into a Staging Cache (assume ~2B records) using loadCache(). The cache resides on a 4-node cluster.
a.) How do I load data such that each node has to process only ~0.5B records?
Is it by using Partitioned cache mode and defining an Affinity Key?
Then I want to read transactions from the Staging Cache in TRANSACTIONAL atomicity mode, look up a Target Cache, and do some operations.
b.) When I do the lookup on the Target Cache, how can I ensure that the lookup happens only on the node where the data resides, rather than on all the nodes the Target Cache resides on?
Would that be done using the AffinityKeyMapper API? If yes, how?
c.) Let's say I wanted to do a lookup on a key other than the Affinity Key column; can creating an index on the lookup column help? Would I end up scanning all nodes in that case?
Staging Cache
CustomerID
CustomerEmail
CustomerPhone
Target Cache
Seq_Num
CustomerID
CustomerEmail
CustomerPhone
StartDate
EndDate
This is answered on Apache Ignite users forum: http://apache-ignite-users.70518.x6.nabble.com/Understanding-Cache-Key-Indexes-Partition-and-Affinity-td11212.html
Ignite uses an AffinityFunction [1] for data distribution. The AF implements two mappings: key->partition and partition->node.
The key->partition mapping assigns an entry to a partition. It is not concerned with backups, only with data collocation/distribution over partitions.
Usually the entry key (actually, its hash code) is used to calculate the partition the entry belongs to.
But you can use an AffinityKey [2] that will be used instead, to manage data collocation. See also the 'org.apache.ignite.cache.affinity.AffinityKey' javadoc.
The partition->node mapping determines the primary and backup nodes for a partition. It is not concerned with data collocation, only with backups and partition distribution among nodes.
Cache.loadCache just makes all nodes call the localLoadCache method, which calls CacheStore.loadCache. So each grid node will load all the data from the cache store and then discard the data that is not local to that node.
The same data may reside on several nodes if you use backups. The AffinityKey should be a part of the entry key; if an AffinityKey mapping is configured, then the AffinityKey will be used instead of the entry key for the entry->partition mapping, and the AffinityKey will be passed to the AffinityFunction.
Indexes always reside on the same node as the data.
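For illustration, here is a minimal sketch of the AffinityKey idea (the cache name, ids, and payload are made up): orders keyed by their own id but collocated by customer id.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.affinity.AffinityKey;

public class AffinityKeyDemo {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        IgniteCache<AffinityKey<Long>, String> orders = ignite.getOrCreateCache("orders");

        long orderId = 1L, customerId = 42L;
        // The second constructor argument is the affinity key: the partition
        // is computed from customerId, not from orderId, so all orders of a
        // customer land in the same partition.
        orders.put(new AffinityKey<>(orderId, customerId), "order-payload");
    }
}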
a. To achieve this you should implement the CacheStore.loadCache method so that each node loads only certain partitions. E.g. you can store a partitionID for each row in the database (a sketch of this approach follows after this list).
However, if you change the AffinityFunction or the number of partitions, you have to update the partitionIDs for the entries in the database as well.
The other way: if it is possible, you can load all the data into a single node and then add the other nodes to the grid. The data will be rebalanced over the nodes automatically.
b. The AffinityKey, if configured, is always used, since it is part of the entry key. So the lookup will always happen on the node where the data resides.
c. I can't understand the question. Would you please clarify if it is still relevant?
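To illustrate point (a), here is a hedged sketch of a partition-aware CacheStore.loadCache; the cache name, the loadRowsForPartition DAO method, and the partition_id column are all assumptions for the example.

import java.util.Collections;
import java.util.Map;
import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cache.store.CacheStoreAdapter;
import org.apache.ignite.lang.IgniteBiInClosure;
import org.apache.ignite.resources.IgniteInstanceResource;

public class PartitionAwareStore extends CacheStoreAdapter<Integer, String> {
    @IgniteInstanceResource
    private Ignite ignite;

    @Override
    public void loadCache(IgniteBiInClosure<Integer, String> clo, Object... args) {
        Affinity<Integer> aff = ignite.affinity("stagingCache");
        // Load only the partitions this node is primary for, so on a
        // 4-node cluster each node pulls roughly a quarter of the rows.
        for (int part : aff.primaryPartitions(ignite.cluster().localNode()))
            for (Map.Entry<Integer, String> row : loadRowsForPartition(part).entrySet())
                clo.apply(row.getKey(), row.getValue());
    }

    // Hypothetical DAO call: SELECT key, value FROM staging WHERE partition_id = ?
    private Map<Integer, String> loadRowsForPartition(int part) {
        return Collections.emptyMap(); // stub for the example
    }

    @Override public String load(Integer key) { return null; }
    @Override public void write(Cache.Entry<? extends Integer, ? extends String> e) { /* no-op */ }
    @Override public void delete(Object key) { /* no-op */ }
}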

Redis Cross Slot error

I am trying to insert multiple key/values at once on Redis (some values are sets, some are hashes) and I get this error: ERR CROSSSLOT Keys in request don't hash to the same slot.
I'm not doing this from redis-cli but from some Go code that needs to write multiple key/values to a Redis cluster. I see other places in the code where multiple key/values are written this way, and I don't understand why mine doesn't work. What are the hashing requirements for not getting this error?
Thanks
In a cluster topology, the keyspace is divided into hash slots. Different nodes will hold a subset of hash slots.
Multi-key operations, transactions, and Lua scripts involving multiple keys are allowed only if all the keys involved hash to slots belonging to the same node.
Redis Cluster implements all the single key commands available in the non-distributed version of Redis. Commands performing complex multi-key operations like Set type unions or intersections are implemented as well as long as the keys all belong to the same node.
You can force the keys to belong to the same node by using hash tags.
ERR CROSSSLOT Keys in request don't hash to the same slot
As the error message suggests, the operation succeeds only if all of the keys hash to the same slot; otherwise you see this failure message. The error is raised even if the different slots belong to the same node: the check is strict, and per the code, all keys must hash to the same slot.
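By way of illustration, a minimal sketch of the failing and working variants with JedisCluster (host/port and key names assumed):

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

public class CrossSlotDemo {
    public static void main(String[] args) {
        JedisCluster jc = new JedisCluster(new HostAndPort("127.0.0.1", 7000));

        // Almost certainly fails with CROSSSLOT: the keys hash to different slots.
        // jc.mset("user:1:name", "Ann", "user:2:name", "Bob");

        // Works: the shared {user:1} tag forces both keys into one slot.
        jc.mset("{user:1}:name", "Ann", "{user:1}:email", "ann@example.com");

        jc.close();
    }
}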