Redis design data structure based on secondary indices

Let's say I have to store the following object in my cache:
{
student_id: "student123",
school_id: "xyz123",
class_id: "class123"
}
How do I design my Redis data structure in a way where I can retrieve the object by any of the ids?
I tried doing a HSET command: HSET student123 school_id xyz123 class_id class123 but this creates a hash for the specific student_id. I also want to make sure that the search is in O(1). Thanks in advance!
To clarify, if I have to search by school_id, how would I go about that?

You need to use multiple keys as indexes to get O(1) lookups in your queries.
Consider using other data structures as well. Take a look at Secondary indexing with Redis, how to have relations many to many in redis and this other article on many to many.
Say, using sets, you add the {student123, xyz456, class789} entry as:
SADD student:student123 "xyz456 class789"
SADD school:xyz456 "student123 class789"
SADD class:class789 "xyz456 student123"
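The three SADD calls above can be sketched as a small in-memory model. This is only an illustration in Python, not Redis client code: the `keyspace` dict, the `sadd` helper and `index_record` are hypothetical names, and the second student (`student124`) is made up to show two records sharing a school.

```python
from collections import defaultdict

# Stand-in for the Redis keyspace: key name -> set of members (assumption:
# a plain dict of Python sets is used here instead of a real Redis server).
keyspace = defaultdict(set)

def sadd(key, member):
    """Model of SADD: add one member to the set stored at key."""
    keyspace[key].add(member)

def index_record(student_id, school_id, class_id):
    """Write the record under one key per ID, mirroring the SADD calls above."""
    sadd(f"student:{student_id}", f"{school_id} {class_id}")
    sadd(f"school:{school_id}", f"{student_id} {class_id}")
    sadd(f"class:{class_id}", f"{school_id} {student_id}")

index_record("student123", "xyz456", "class789")
index_record("student124", "xyz456", "class789")  # hypothetical second student

# Lookup by school touches a single key (SMEMBERS school:xyz456 in Redis).
by_school = sorted(keyspace["school:xyz456"])
```

Locating the key is O(1); returning the members is O(N) in the number of members stored under that key.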
You may think "this will increase my memory usage a lot". It does indeed. It is the usual trade-off between memory and processing. Relational databases also do this when creating indexes. But Redis will give you sub-millisecond performance, and Redis uses multiple tricks to optimize memory usage, like ziplists, see https://redis.io/topics/memory-optimization.
What mix of data structures is best depends on the specifics of your use-case.
Consider dropping the prefixes from your IDs if they are constant; just be consistent about the order in which you place the IDs in the value.
SADD student:123 "456 789"
Keep in mind that sets and sorted sets allow unique members only. If you use one sorted set for students, with the student ID as the score (ZADD students 123 "456 789"), and then add another student in the same school and class with ZADD students 235 "456 789", this only updates the score of the existing member "456 789"; it does not add a new member.
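The uniqueness pitfall can be reproduced with a tiny model of a sorted set. This is a sketch in Python, assuming a dict from member to score as the stand-in; the `zadd` helper is hypothetical and only mimics the return value of ZADD without options (the number of newly added members).

```python
# Stand-in for one sorted set: member -> score. Members are unique.
students = {}

def zadd(zset, score, member):
    """Model of ZADD: return 1 if the member is new, 0 if only its score changed."""
    is_new = member not in zset
    zset[member] = score
    return int(is_new)

added_first = zadd(students, 123, "456 789")   # new member
added_second = zadd(students, 235, "456 789")  # same member, score overwritten
```

After both calls the sorted set still holds a single member, now with score 235, so student 123 is no longer findable through it.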

Related

Composite Primary Key equivalent in Redis

I'm new to nosql databases so forgive my sql mentality but I'm looking to store data that can be 'queried' by one of 2 keys. Here's the structure:
{user_id, business_id, last_seen_ts, first_seen_ts}
where if this were a sql DB I'd use the user_id and business_id as a primary composite key. The sort of querying I'm looking for is a
1.'get all where business_id = x'
2.'get all where user_id = x'
Any tips? I don't think I can make a simple secondary index based on the 2 retrieval types above. I looked into commands like 'zadd' and 'zrange' but there isn't really any sorting involved here.
The use case for Redis for me is to alleviate writes and reads on my SQL database while this program computes (doing its storage in redis) what eventually will be written to the SQL DB.

Note: given the OP's self-proclaimed experience, this answer is intentionally simplified for educational purposes.
One of the first things you need to understand about Redis is that you design the data so that every query becomes what you're used to thinking of as access by primary key. In that sense, it is convenient to imagine Redis' keyspace (the global dictionary) as something like this relational table:
CREATE TABLE redis (
key VARCHAR(512MB) NOT NULL,
value VARCHAR(512MB),
PRIMARY KEY (key)
);
Note: in Redis, value can be more than just a String of course.
Keeping that in mind, and unlike other database models where normalizing data is the practice, you want to have your Redis ready to handle both of your queries efficiently. That means you'll be saving the data twice: once under a primary key that allows querying by business id, and once under a key that allows querying by user id.
To answer the first query ('get all where business_id = x'), you want to have a key for each x that holds the relevant data (in Redis we use the colon, ':', as a separator as a matter of convention). So for x=1 you'd probably call your key business:1, for x=a1b2c3 business:a1b2c3, and so forth.
Each such business:x key could be a Redis Set, where each member represents the rest of the tuple. So, if the data is something like:
{user_id: foo, business_id: bar, last_seen_ts: 987, first_seen_ts: 123}
You'd be storing it with Redis with something like:
SADD business:bar foo
Note: you can use any serialization you want, Set members are just Strings.
With this in place, answering the first query is just a matter of SMEMBERS business:bar (or SSCANing it for larger Sets).
If you've followed through, you already know how to serve the second query. First, use a Set for each user (e.g. user:foo) to which you SADD user:foo bar. Then SMEMBERS/SSCAN and you're almost home.
The last thing you'll need is another set of keys, but this time you can use Hashes. Each such Hash will store the additional information of the tuple, namely the timestamps. We can use a "Primary Key" made up of the business and the user ids (or vice versa) like so:
HMSET foo:bar first 123 last 987
After you've gotten the results from the 1st or 2nd query, you can fetch the contents of the relevant Hashes to complete the query (assuming that the queries return the timestamps as well).
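Put together, the design above (one Set per business, one Set per user, one Hash per user and business pair) might look like the following sketch. It models Redis with plain Python dicts rather than calling a server; the helper names and the second business id `baz` are assumptions made for illustration.

```python
from collections import defaultdict

sets = defaultdict(set)   # models Redis Sets, keyed by key name
hashes = {}               # models Redis Hashes, keyed by key name

def save_visit(user_id, business_id, first_seen, last_seen):
    """Store one tuple under both index Sets and one detail Hash."""
    sets[f"business:{business_id}"].add(user_id)      # SADD business:bar foo
    sets[f"user:{user_id}"].add(business_id)          # SADD user:foo bar
    hashes[f"{user_id}:{business_id}"] = {            # HMSET foo:bar first 123 last 987
        "first": str(first_seen),
        "last": str(last_seen),
    }

def visits_for_business(business_id):
    """Query 1: all tuples where business_id = x."""
    users = sorted(sets.get(f"business:{business_id}", set()))   # SMEMBERS
    return [(u, business_id, hashes[f"{u}:{business_id}"]) for u in users]

def visits_for_user(user_id):
    """Query 2: all tuples where user_id = x."""
    businesses = sorted(sets.get(f"user:{user_id}", set()))      # SMEMBERS
    return [(user_id, b, hashes[f"{user_id}:{b}"]) for b in businesses]

save_visit("foo", "bar", 123, 987)
save_visit("foo", "baz", 200, 300)  # hypothetical second business
```

Each query reads one index Set and then one Hash per member, which is the "fetch the contents of the relevant Hashes" step described above.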

The idiomatic way of doing this in Redis is to use a SET for each type of query you want to do.
In your case you would create:
a hash for each tuple (user_id, business_id, last_seen_ts, first_seen_ts)
a set with a name like user:<user_id>:business:<business_id>, to store the keys of the hashes for this user and this business (you have to add the ID of the hashes with SADD)
Then, to get all data for a given user and business, you first read the SET content with SMEMBERS, and then call HGETALL on every HASH whose ID is in the SET.

Is it possible to get all values for a Redis set in one operation?

Say I add two keys:
SET husband Bob
SET wife Alice
Then add these to a set:
SADD family husband wife
I can get the keys of this set with SMEMBERS family, which will return:
1) "wife"
2) "husband"
What I really want is the values:
1) "Alice"
2) "Bob"
Is this possible, in one operation? Essentially, I want to pipeline SMEMBERS with MGET.

Use SMEMBERS, but if the Set is big enough your database will take time to return all the members, during which it will be blocked. In such cases the use of SSCAN is recommended.
EDIT: missed the question itself :) use SORT family BY nosort GET *
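What the SORT one-liner does can be pictured with a short Python model: each member of the set is treated as a key name, and the string stored under that key is returned. The dicts and the `values_of_members` helper are stand-ins invented for this sketch, not Redis client calls.

```python
strings = {"husband": "Bob", "wife": "Alice"}   # SET husband Bob / SET wife Alice
family = {"husband", "wife"}                    # SADD family husband wife

def values_of_members(members, store):
    """For each member, look up the string stored under that key (None if missing)."""
    # sorted() is used only to make this model deterministic; Redis does not
    # guarantee any particular order for set members.
    return [store.get(name) for name in sorted(members)]

values = values_of_members(family, strings)
```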

The first thing you need to understand is that Redis doesn't allow you to put a set in a set: there are no nested data structures. Likewise, there are no keys and values in a set, only members, which are strings in your case. This is why you can't "get the values".
It really sounds like the structure you want is not a set but a hash. Using a hash will allow you to do exactly what you ask for.
hset family husband bob
hset family wife alice
And then to get what you call the values use hvals family.
This is the proper way to do it, as it uses the right data structure, matches your terminology to the structure, and provides exactly what you wanted without the performance penalties of SORT. Further, it allows you to use the various hash commands to select or update specific members of the family.
You could also use hgetall to use the mapping. That way your code knows Alice is the wife rather than the daughter.
Overall hashes are certainly the way to go here for virtually every reason.

Redis: how to use it similar to multi-tables

It seems that Redis has no entity corresponding to a "table" in a relational database.
For instance, I have to store:
(token, user_id)
(cart_id, token, [{product_id, count}])
If those two are not stored separately, a get would search across both, which would cause chaos.
By the way, (cart_id, token, [{product_id, count}]) is a shopping cart. How should such a data structure be designed in Redis?

It seems that Redis has no entity corresponding to a "table" in a relational database.
Right, because it is not a relational database. It is a data structure server which is very different and requires a different approach to be used well.
Ultimately to use Redis in the way it is intended you need to not think in relational terms, but think of the data structures you use in the code. More specifically, how do you need the data when you want to consume it? That will be the most likely way to store it in Redis.
In this case there are a few options, but the hash method works incredibly well for this one so I'll detail it here.
First, create a hash, call it users:to:tokens. Store as the key in the hash the user id, and the value the token. Next create the inverse, a hash called 'tokens:to:users'. You will probably be wanting both of these - the ability to look one up from the other - and this foundation will provide that.
Next, for your carts. This, too, will be a hash: carts:cart_id. In this hash you have the product_id and the count.
Finally there is your third hash, token:to:cart, which builds an index of tokens to cart ids. I'd go a step further and add user:to:cart to be able to pull carts by user as well.
Now, as to whether to store the key name in the map or not, I tend to go with "no". By storing just the ID you can easily build the Redis cart key without also storing the key's full path in the data store, thus saving memory.
Indeed, if you can, use integers for all of your IDs. By using integers you can take advantage of Redis' integer storage optimizations to keep memory usage down. Hashes storing integers are quite efficient and very fast.
If needed, you can use Redis to build your IDs. Use the INCR command to keep a counter for each data type, such as userid:counter, cartid:counter, and tokenid:counter. Because INCR returns the new value, a single call both increments the counter and gives you the new ID. GET cartid:counter will always give you the largest ID, if you want to quickly see how many carts have been created. Kinda neat, IMO.
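The counter idea can be sketched as follows. This is a Python model of INCR semantics (a missing key counts as 0), not a call into a Redis client; the `counters` dict and `incr` helper are illustrative names, and the `carts:<id>` key format follows the carts:cart_id naming used earlier in this answer.

```python
counters = {}  # stand-in for the string keys that hold the counters

def incr(key):
    """Model of INCR: treat a missing key as 0, add 1, return the new value."""
    counters[key] = counters.get(key, 0) + 1
    return counters[key]

first_cart_id = incr("cartid:counter")
second_cart_id = incr("cartid:counter")

# Only the integer ID is stored in the index hashes; the full key is rebuilt.
cart_key = f"carts:{second_cart_id}"
```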
Now, where it gets tricky is if you want to use expiration to automatically expire carts, as opposed to leaving them to "lie around" until you want to clean things up. By setting an expiration on the cart hash (which has the product,count mapping) your carts will automatically expire. However, their references will still be hanging around in the token:to:cart hash. Removing them is a simple periodic task which iterates over the members of token:to:cart and does an EXISTS check on each cart's key. If the cart no longer exists, delete its entry from the hash.
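A minimal sketch of that periodic cleanup, again modeled in Python with a dict standing in for Redis. The token names, the sample data and the `purge_stale_carts` helper are hypothetical; the comments name the Redis commands each step corresponds to.

```python
# Stand-in keyspace: cart 1 still exists, cart 2 has already expired.
hashes = {
    "token:to:cart": {"tokA": "1", "tokB": "2"},
    "carts:1": {"43": "2"},
}

def purge_stale_carts(store, index_key="token:to:cart", cart_prefix="carts:"):
    """Drop index entries whose cart key no longer exists; return removed tokens."""
    index = store.get(index_key, {})
    stale = [
        token for token, cart_id in index.items()      # HSCAN / HGETALL
        if f"{cart_prefix}{cart_id}" not in store       # EXISTS carts:<id>
    ]
    for token in stale:
        del index[token]                                # HDEL token:to:cart <token>
    return stale

removed = purge_stale_carts(hashes)
```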

Redis is a key-value store. From redis.io:
Redis is an open source (BSD licensed), in-memory data structure
store, used as database, cache and message broker. It supports data
structures such as strings, hashes, lists, sets, sorted sets with
range queries, bitmaps, hyperloglogs and geospatial indexes with
radius queries.
So if you want to store two different types (tokens and carts) you will need to store two keys for different datatypes. For example:
127.0.0.1:6379> hset tokens.token_id#123 user user123
(integer) 1
127.0.0.1:6379> hget tokens.token_id#123 user
"user123"
Where tokens is a namespace for tokens only. It is stored as Redis-Hash:
Redis Hashes are maps between string fields and string values, so they
are the perfect data type to represent objects
To store lists I would do the following:
127.0.0.1:6379> hmset carts.cart_1 token token_id#123 cart_contents cart_contents_key1
OK
127.0.0.1:6379> hmget carts.cart_1 token cart_contents
1) "token_id#123"
2) "cart_contents_key1" # cart_contents is a list of receipts.
cart_contents are represented as a Redis-List:
127.0.0.1:6379> rpush cart_contents.cart_contents_key1 receipt_key1
(integer) 1
127.0.0.1:6379> lrange cart_contents.cart_contents_key1 0 -1
1) "receipt_key1"
Receipt is Redis-Hash for a tuple (product_id, count):
127.0.0.1:6379> hmset receipts.receipt_key1 product_id 43 count 2
OK
127.0.0.1:6379> hmget receipts.receipt_key1 product_id count
1) "43" # Your final product id.
2) "2"
But do you really need Redis in this case?

best way to store a set linked to a hash in Redis?

I need to store data about classrooms and students in Redis.
I have hashes for classroom info, e.g.:
classroom:0
where 0 is the class room id and it has field value pairs like:
classroomName -> xx, teacherId -> yy
In order to store students for these classroom, I have separate Set, e.g:
studentsForClassroom:0, and this set contains the IDs of the students in that class.
Following this design, in order to get all information about a class, I have to first do a hgetall command for classroom:0 and then a smembers command for studentsForClassroom:0.
Is this the right way? Any better solution?
Is it possible that the students SET can somehow be nested in the classroom hash so that when I do a hgetall, the entire students array is populated right there in the classroom data?

You're doing it right. Redis doesn't have nested data structures.
Since your classroom hashes and students sets are not too big, using HGETALL and SMEMBERS is OK but remember that for larger volumes you'd probably want to use HSCAN and SSCAN instead.
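Because the nesting has to happen on the client side, the two reads are typically merged in application code. The sketch below models that in Python with dicts standing in for Redis; the student IDs `s1` and `s2` and the `load_classroom` helper are made up for illustration.

```python
hashes = {"classroom:0": {"classroomName": "xx", "teacherId": "yy"}}
sets = {"studentsForClassroom:0": {"s1", "s2"}}  # hypothetical student IDs

def load_classroom(classroom_id):
    """Combine the classroom hash and its student set into one dict."""
    info = dict(hashes.get(f"classroom:{classroom_id}", {}))            # HGETALL
    members = sets.get(f"studentsForClassroom:{classroom_id}", set())   # SMEMBERS
    info["students"] = sorted(members)
    return info

classroom = load_classroom(0)
```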

You should not be worried about this. Redis is blazing fast and it's the usual way to use it: making a lot of simple requests. Moreover node-redis automatically pipelines commands.
If you really have performance issues, ensure you have installed hiredis. node_redis will use it.

Redis Indexes: Storing full key vs. ID

Given this example:
user:1 email bob#bob.com
user:1 name bob
Based on my research, all the examples create an "index" similar to the following:
user:bob#bob.com 1
My question is: wouldn't it be better to store it as "user:1"? That would eliminate the need to concatenate the string in code. Is there some other reason not to store the whole string? Memory maybe?

The question was specifically about storing the full key in the index or just a numeric ID which is part of this key.
Redis has a number of memory optimizations that you may want to leverage to decrease general memory consumption. One of these optimizations is the intset (an efficient structure to represent sets of integers).
Very often, sets are used as index entries, and in that case, it is much better to store a numeric ID rather than an alphanumeric key, to benefit from the intset optimization.
Your example is slightly different because a given email address should be associated to only one user. A unique hash object is fine to store the whole index. I would still use numeric ID here since it is more compact, and may benefit from future Redis optimizations.

Based on what you've conveyed so far, I'd use Redis hashes. For example, I'd denormalize the data a bit and store it as 'hmset users:1 email bob#bob.com name Bob' and 'hset users:lookup:email bob#bob.com 1'.
This way, I can retrieve the user by either email address or user ID. You could create more lookup hashes depending on your needs.
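The lookup-hash pattern can be sketched like this, as a Python model rather than real client code. The `hset`, `create_user` and `find_by_email` helpers are hypothetical names; the key names and sample values are the ones used in this answer.

```python
hashes = {}  # stand-in for Redis: key name -> {field: value}

def hset(key, field, value):
    """Model of HSET: set one field in the hash stored at key."""
    hashes.setdefault(key, {})[field] = value

def create_user(user_id, email, name):
    hset(f"users:{user_id}", "email", email)
    hset(f"users:{user_id}", "name", name)
    # The index stores only the numeric ID, not the full "users:1" key.
    hset("users:lookup:email", email, str(user_id))

def find_by_email(email):
    """Resolve email -> ID via the lookup hash, then fetch the user hash."""
    user_id = hashes.get("users:lookup:email", {}).get(email)   # HGET
    if user_id is None:
        return None
    return hashes.get(f"users:{user_id}")                        # HGETALL

create_user(1, "bob#bob.com", "Bob")
user = find_by_email("bob#bob.com")
```

Both writes have to be kept in step: if the email changes, the old entry in users:lookup:email must be removed as well.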
For more useful patterns, look at The Little Redis Book by Karl Seguin.
