How can I let multiple nodes store one hash map in Redis

In Redis cluster mode, does a piece of data with a specific key have to be stored on a specific node, no matter what data structure it has (e.g. List/Hash)?
For example, I have a hash map:
HMSET website google www.google.com yahoo www.yahoo.com
The key of the hash map is "website", and the hash map holds the data {google: www.google.com, yahoo: www.yahoo.com}. In my understanding, the hash map is stored on only one node of the cluster. That will not be efficient when the hash map is large (e.g. 400M key-value pairs in one hash map).
My question is: is there a way to automatically distribute the contents of a hash map stored under a single key among the cluster? For example, store the pair {google: www.google.com} on node 0 and the pair {yahoo: www.yahoo.com} on node 1, while the key of the hash map is still "website"?

In Redis cluster mode, does a piece of data with a specific key have to be stored on a specific node, no matter what data structure it has (e.g. List/Hash)?
Yes - every key is mapped to a hash slot, and each hash slot is managed by a single cluster node.
My question is: is there a way to automatically distribute the contents of a hash map stored under a single key among the cluster?
No - data is distributed between nodes at the key level. A given key's data structure cannot be split between multiple shards. To distribute the data, you'll have to model it using more keys - for example by sharding the fields across several sub-keys, as sketched below.
Correctly modeling your needs requires knowing what type of operations you'll be performing against your distributed "hash map" and their respective frequencies. Feel free to add this information to the question, or open a new one that is more focused on your requirements.
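A rough sketch of that kind of client-side field sharding, assuming the redis-py client; the shard count and the website:&lt;n&gt; key naming are illustrative choices, not something Redis provides out of the box:

```python
import binascii
import redis

r = redis.Redis(host="localhost", port=6379)

SHARDS = 64  # illustrative; pick based on how big you want each sub-hash to get

def shard_key(base_key: str, field: str) -> str:
    # Route a field to one of SHARDS sub-hashes by hashing the field name.
    return f"{base_key}:{binascii.crc32(field.encode()) % SHARDS}"

def sharded_hset(base_key: str, field: str, value: str) -> None:
    r.hset(shard_key(base_key, field), field, value)

def sharded_hget(base_key: str, field: str):
    return r.hget(shard_key(base_key, field), field)

sharded_hset("website", "google", "www.google.com")
sharded_hset("website", "yahoo", "www.yahoo.com")
print(sharded_hget("website", "google"))  # b'www.google.com'
```

Because the sub-keys website:0 through website:63 hash to different slots, the cluster spreads them across its masters (as long as you don't force them into one slot with a hash tag). The trade-off is that whole-map operations such as HGETALL or HLEN now have to touch every shard.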

Related

redis how to get first key-value pair of hash map

I would like to have something like the following table in redis.
host name       back queue
stanford.edu    23
microsoft.com   17
As far as I know, the best way to implement this is to use a redis hash (with host name as field and back queue as value). However, in my use case, I also want to get the first field-value pair present in the hash map.
How can this be implemented? Are there any redis datatypes specifically for this?

Copy one key from one redis instance to another

I have a Redis setup with 6 nodes (3 masters, 3 slaves - cluster enabled). I have loaded a number of keys into every master.
So, my question is:
Is it possible to actually copy one key from 127.0.0.1:30001 to 127.0.0.1:30002?
For example, let's say my key is named "testkey". If I copy this key from 30001 to 30002, then when I ask for the key from either 30001 or 30002, the response must return the value of "testkey" in both calls.
No, that's not how it works.
Keys in the cluster are assigned to hash slots and slots are assigned to master nodes. The keys' assignment is done by hashing their names (or the hash tag in them) so it is consistent, meaning that a given key name always hashes to the same slot.
A key can exist only once in the keyspace, but the slot it belongs to can be moved between masters. To scale reads from that key you can use the slave of the applicable master.
A good place to start understanding how the cluster works is the [tutorial](https://redis.io/topics/cluster-tutorial).
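If you want to see where a given key lands, here is a minimal sketch of the slot calculation the cluster performs (CRC16/XMODEM of the key, or of the hash tag if one is present, modulo 16384); it should match what CLUSTER KEYSLOT returns:

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC16-CCITT (XMODEM variant): polynomial 0x1021, initial value 0.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if (crc & 0x8000) else (crc << 1)
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    # If the key contains a non-empty hash tag {...}, only that part is hashed.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

print(hash_slot("testkey"))  # always the same slot, served by whichever master owns it
print(hash_slot("{user1}.cart"), hash_slot("{user1}.profile"))  # same slot via hash tag
```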

Redis: how to use it similar to multi-tables

It seems that Redis has no entity corresponding to a "table" in a relational database.
For instance, I have to store:
(token, user_id)
(cart_id, token, [{product_id, count}])
If I don't store those two separately, a get would have to search both, which would cause chaos.
By the way, (cart_id, token, [{product_id, count}]) is a shopping cart; how should such a data structure be designed in Redis?
It seems that Redis has no entity corresponding to a "table" in a relational database.
Right, because it is not a relational database. It is a data structure server which is very different and requires a different approach to be used well.
Ultimately, to use Redis the way it is intended, you need to stop thinking in relational terms and instead think of the data structures you use in your code. More specifically, how do you need the data when you want to consume it? That is most likely the way to store it in Redis.
In this case there are a few options, but the hash method works incredibly well for this one so I'll detail it here.
First, create a hash; call it users:to:tokens. Use the user id as the field and the token as the value. Next create the inverse, a hash called tokens:to:users. You will probably want both of these - the ability to look one up from the other - and this foundation provides that.
Next, your carts. Each cart is also a hash, keyed as carts:cart_id. In this hash each field is a product_id and its value is the count.
Finally, there is a third hash, token:to:cart, which builds an index from tokens to cart ids. I'd go a step further and add user:to:cart to be able to pull carts by user as well.
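A minimal sketch of that layout, assuming the redis-py client; the key names follow the description above, while the ids and token values are made up for illustration:

```python
import redis

r = redis.Redis()

def create_cart(user_id: int, token: str, cart_id: int) -> None:
    r.hset("users:to:tokens", user_id, token)   # user id -> token
    r.hset("tokens:to:users", token, user_id)   # token   -> user id
    r.hset("token:to:cart", token, cart_id)     # token   -> cart id
    r.hset("user:to:cart", user_id, cart_id)    # user id -> cart id

def add_to_cart(cart_id: int, product_id: int, count: int) -> None:
    r.hset(f"carts:{cart_id}", product_id, count)

create_cart(user_id=1, token="tok-abc", cart_id=1001)
add_to_cart(1001, product_id=43, count=2)
print(r.hgetall("carts:1001"))  # {b'43': b'2'}
```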
Now as to whether to store the full key name in the map or not, I tend to go with "no". By storing just the ID you can easily build the Redis cart key, and you avoid storing the key's full path in the data store, which also saves memory.
Indeed, if you can, use integers for all of your IDs. By using integers you can take advantage of Redis' integer storage optimizations to keep memory usage down. Hashes storing integers are quite efficient and very fast.
If needed you can use Redis to build your IDs. Use the INCR command to build a counter for each data type, such as userid:counter, cartid:counter, and tokenid:counter. Since INCR returns the new value, a single call both increments the counter and gives you the new ID, and GET cartid:counter will always return the largest ID if you want to quickly see how many carts have been created. Kinda neat, IMO.
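For example (redis-py again, counter key names as above):

```python
import redis

r = redis.Redis()

cart_id = r.incr("cartid:counter")    # atomically bumps and returns the new id
r.hset(f"carts:{cart_id}", 43, 2)     # use it straight away as the cart key
print(int(r.get("cartid:counter")))   # highest cart id handed out so far
```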
Now, where it gets tricky is if you want to use expiration to automatically expire carts, as opposed to leaving them lying around until you want to clean things up. By setting an expiration on the cart hash (which holds the product,count mapping) your carts will automatically expire. However, their references will still be hanging around in the token:to:cart hash. Removing those is a simple periodic task (sketched below) which iterates over the members of token:to:cart and does an exists check on each cart's key. If the cart doesn't exist, delete the entry from the hash.
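A rough sketch of that cleanup, assuming the redis-py client and the key names used above; the TTL value is arbitrary:

```python
import redis

r = redis.Redis()

def expire_cart(cart_id: int, ttl_seconds: int = 3600) -> None:
    # The cart hash (product_id -> count) disappears on its own after the TTL.
    r.expire(f"carts:{cart_id}", ttl_seconds)

def prune_dangling_cart_refs() -> None:
    # Walk token:to:cart with HSCAN (non-blocking) and drop entries whose
    # cart hash has already expired.
    for token, cart_id in r.hscan_iter("token:to:cart"):
        if not r.exists(f"carts:{cart_id.decode()}"):
            r.hdel("token:to:cart", token)
```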
Redis is a key-value store. From redis.io:
Redis is an open source (BSD licensed), in-memory data structure
store, used as database, cache and message broker. It supports data
structures such as strings, hashes, lists, sets, sorted sets with
range queries, bitmaps, hyperloglogs and geospatial indexes with
radius queries.
So if you want to store two different types (tokens and carts) you will need separate keys for the different data types. For example:
127.0.0.1:6379> hset tokens.token_id#123 user user123
(integer) 1
127.0.0.1:6379> hget tokens.token_id#123 user
"user123"
Here tokens is a namespace for tokens only. The value is stored as a Redis Hash:
Redis Hashes are maps between string fields and string values, so they
are the perfect data type to represent objects
To store lists I would do the following:
127.0.0.1:6379> hmset carts.cart_1 token token_id#123 cart_contents cart_contents_key1
OK
127.0.0.1:6379> hmget carts.cart_1 token cart_contents
1) "token_id#123"
2) "cart_contents_key1" # cart_contents is a list of receipts.
cart_contents is represented as a Redis List:
127.0.0.1:6379> rpush cart_contents.cart_contents_key1 receipt_key1
(integer) 1
127.0.0.1:6379> lrange cart_contents.cart_contents_key1 0 -1
1) "receipt_key1"
A receipt is a Redis Hash for the tuple (product_id, count):
127.0.0.1:6379> hmset receipts.receipt_key1 product_id 43 count 2
OK
127.0.0.1:6379> hmget receipts.receipt_key1 product_id count
1) "43" # Your final product id.
2) "2"
But do you really need Redis in this case?

Is it possible to do LIST operations on the value of a HASH?

I am still new to Redis and wondering if it would be possible to have a HASH of LISTs.
Then I could do, for example, LPOP HASH myKey, where the hash holds each list's key and the lists contain the data that I want to manipulate.
Redis does not provide nested data structures, therefore a Hash of Lists (or a List of Hashes) isn't possible. A Redis List can only contain strings, but what you could do is store the Hashes' key names in a List and do an HGET after popping.
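A minimal sketch of that workaround, assuming the redis-py client; all key names and field values are made up:

```python
import redis

r = redis.Redis()

# Two plain hashes, plus a list that just holds their key names.
r.hset("item:1", mapping={"name": "google", "url": "www.google.com"})
r.hset("item:2", mapping={"name": "yahoo", "url": "www.yahoo.com"})
r.rpush("item:queue", "item:1", "item:2")

# "Popping a hash": pop the key name from the list, then read the hash it names.
key = r.lpop("item:queue")
if key is not None:
    print(r.hgetall(key))  # {b'name': b'google', b'url': b'www.google.com'}
```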

DynamoDB: When to use what PK type?

I am trying to read up on best practices on DynamoDB. I saw that DynamoDB has two PK types:
Hash Key
Hash and Range Key
From what I read, it appears the latter is like the former but supports sorting and indexing of a finite set of columns.
So my question is why ever use only a hash key without a range key? Is it a viable choice only when the table is not searched?
It'd also be great to have some general guidelines on when to use what key type. I've read several guides (including Amazon's own documentation on DynamoDB) but none of them appear to directly address this question.
Thanks
The choice of which key type to use comes down to your Use Cases and Data Requirements for a particular scenario. For example, if you are storing User Session Data it might not make much sense to use a Range Key, since each record can be referenced by a GUID and accessed directly with no grouping requirements. In general terms, once you know the Session Id you just get the specific item by querying the key. Another example would be storing User Account or Profile data: each user has his own, and you will most likely access it directly (by User Id or something else).
However, if you are storing Order Items then the Range Key makes much more sense since you probably want to retrieve the items grouped by their Order.
In terms of the Data Model, the Hash Key allows you to uniquely identify a record from your table, and the Range Key can be optionally used to group and sort several records that are usually retrieved together. Example: If you are defining an Aggregate to store Order Items, the Order Id could be your Hash Key, and the OrderItemId the Range Key. Whenever you would like to search the Order Items from a particular Order, you just query by the Hash Key (Order Id), and you will get all your order items.
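As an illustration, a minimal boto3 sketch of that Order Items example; the table and attribute names are made up for illustration, not prescribed by DynamoDB:

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")

table = dynamodb.create_table(
    TableName="OrderItems",
    KeySchema=[
        {"AttributeName": "OrderId", "KeyType": "HASH"},       # partition (hash) key
        {"AttributeName": "OrderItemId", "KeyType": "RANGE"},  # sort (range) key
    ],
    AttributeDefinitions=[
        {"AttributeName": "OrderId", "AttributeType": "S"},
        {"AttributeName": "OrderItemId", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",
)
table.wait_until_exists()

# All items for one order come back with a single Query on the hash key.
items = table.query(KeyConditionExpression=Key("OrderId").eq("order-123"))["Items"]
```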
You can find below a formal definition for the use of these two keys:
"Composite Hash Key with Range Key allows the developer to create a
primary key that is the composite of two attributes, a 'hash
attribute' and a 'range attribute.' When querying against a composite
key, the hash attribute needs to be uniquely matched but a range
operation can be specified for the range attribute: e.g. all orders
from Werner in the past 24 hours, or all games played by an individual
player in the past 24 hours." [VOGELS]
So the Range Key adds a grouping capability to the Data Model; however, the use of these two keys also has implications for the Storage Model:
"Dynamo uses consistent hashing to partition its key space across its
replicas and to ensure uniform load distribution. A uniform key
distribution can help us achieve uniform load distribution assuming
the access distribution of keys is not highly skewed."
[DDB-SOSP2007]
Not only does the Hash Key uniquely identify the record, it is also the mechanism that ensures load distribution. The Range Key (when used) helps indicate the records that will mostly be retrieved together, so storage can also be optimized for that need.
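To make the "range operation" from the quote concrete, a hedged boto3 sketch along the lines of "all orders from Werner in the past 24 hours"; the table and attribute names are illustrative:

```python
from datetime import datetime, timedelta, timezone

import boto3
from boto3.dynamodb.conditions import Key

# Assumed table: hash key CustomerId, range key OrderDate (ISO-8601 string).
orders = boto3.resource("dynamodb").Table("Orders")

# Exact match on the hash key, range condition on the range key.
cutoff = (datetime.now(timezone.utc) - timedelta(hours=24)).isoformat()
recent = orders.query(
    KeyConditionExpression=Key("CustomerId").eq("werner") & Key("OrderDate").gt(cutoff)
)["Items"]
```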
Choosing the correct keys to represent your data is one of the most critical aspects of your design process, and it directly impacts how well your application will perform and scale, and how much it will cost.
Footnotes:
The Data Model is the model through which we perceive and manipulate our data. It describes how we interact with the data in the database [FOWLER]. In other words, it is how you abstract your data: the way you group your entities, the attributes you choose as primary keys, etc.
The Storage Model describes how the database stores and manipulates the data internally [FOWLER]. Although you cannot control this directly, you can certainly optimize how the data is retrieved or written by knowing how the database works internally.