Redis modelling

What is the best way to model the following scenario? User has multiple portfolios, each with multiple stocks.
I have come up with the following:
Stocks will be in a hash as below
stk:1 {name:A, ticker:val, sector:val ..}
stk:2 {name:B, ticker:val, sector:val ..}
Users can be a hash as below: (is it better to store portfolios for a user separately as a set?)
user:1 {k1:val1, k2:val2, portfolios:"value|growth|short term"}
user:2 {k1:val3, k2:val4, portfolios:"value|defensive|penny"}
Stocks in a portfolio can be sets
user:1:value (1,3)
user:2:value (2,3,4)
user:1:short term (1,5)
user:2:penny (4)
In order to add/remove a portfolio for a user, it's required to do an 'HGET user:n portfolios' followed by an HSET.
Is this a good way to model as the number of users and portfolios grow?

If a user can have multiple portfolio types then it would be best to separate them into their own sets.
sadd user:1:portfolios value growth "short term"
This makes removing a portfolio from a user as simple as calling srem user:1:portfolios value on the set (and of course removing the "user:ID:TYPE" set).
When you want to look up stocks for a user based on portfolio type, you can do so using the sunionstore and sort commands (example in Ruby):
keys = redis.smembers('user:1:portfolios').map do |type|
  "user:1:#{type}"
end

redis.multi do |r|
  r.sunionstore "user:1:stocks:_tmp", *keys
  r.sort "user:1:stocks:_tmp", get: ["stk:*->name", "stk:*->ticker"]
  r.del "user:1:stocks:_tmp"
end
stk:*->name will return only the hash values for name. If you want to get all entries in the hash, specify each of them using the 'KEY->HASHKEY' syntax.
http://redis.io/commands/sort
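
For reference, the same lookup expressed as raw Redis commands (a minimal sketch; the _tmp key name is only illustrative):

SUNIONSTORE user:1:stocks:_tmp user:1:value user:1:growth "user:1:short term"
SORT user:1:stocks:_tmp GET stk:*->name GET stk:*->ticker
DEL user:1:stocks:_tmp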

There is no best way to model something: it all depends on your access paths.
For instance, your proposal will work well if you systematically access the data from the user perspective. It will be very poor if you want to know which users have a certain stock in their portfolios. So my suggestion would be to list all the expected access paths and check they are covered by the data structure.
Supposing you only need the user perspective, I would rather materialize the portfolios as a separate set, instead of storing a serialized list in the user hash: they will be easier to maintain. Because you can use pipelining (or scripting) to run multiple commands in a single roundtrip, there is no real overhead.
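
For instance, adding a portfolio and its stocks for a user in one round trip could look like this (a sketch using the key names from the question):

MULTI
SADD user:1:portfolios value
SADD user:1:value 1 3
EXEC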

Related

Is hash always preferable if I'm always getting multiple fields in redis?

Let's say I am checking information about some of my users every second. I need to take an action on some of those users that may take more than a second. Something like this:
#pseudocode
users = DB.query("SELECT * FROM users WHERE state=5");
users.forEach(user => {
  if (user.needToDoThing()) {
    user.doThatThing();
  }
});
I want to make sure I won't accidentally run doThatThing on a user who already has it running. I am thinking of solving it by setting cache keys based on the user ID as things are processed:
#pseudocode
runningUsers = redis.getMeThoseUsers();
users = DB.query("SELECT * FROM users WHERE state=5 AND id NOT IN (runningUsers)");
redis.setThoseUsers(users);
users.forEach(user => {
  if (user.needToDoThing()) {
    user.doThatThing();
  }
  redis.unsetThatUser(user);
});
I am unsure if I should...
Use one hash with a field per user
Use multiple keys with mset and mget
Is there a performance or business reason I'd want one over the other? I am assuming I should use a hash so I can use hgetall to know who is running on that hash vs doing a scan on something like runningusers:*. Does that seem right?
Generally speaking, option 1 (use one hash with a field per user) is probably the best method in most cases, because you typically want to access all of the users' fields at once; that can be done with a single HGETALL.
But when you go for the 2nd option (use multiple keys with MSET and MGET), you have to query Redis every single time to get the user details. By using MGET you can access all the key values, but you need to know the key name for each user. This option is suitable when you only access a few fields of an object. Disadvantage: it is possibly slower when you need to access all or most of the fields for the users.
NOTE: With the 1st option you can't set a TTL for a single user, because Redis has no support for TTLs on individual fields inside a hash; a TTL can only be set on the entire hash. With the 2nd option, you can set a TTL for every single user.
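
A rough sketch of the trade-off (the running:* key names are just illustrative):

Option 1, one hash with a field per user (a TTL can only apply to the whole hash):
HSET running:users 42 1
HGETALL running:users
HDEL running:users 42

Option 2, one key per user (each key carries its own TTL):
SET running:user:42 1 EX 30
MGET running:user:42 running:user:43
DEL running:user:42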

Organising de-normalised data in redis

In a Redis database I have a number of hashes corresponding to "story" objects.
I have an ordered set stories containing all keys of the above (the stories) enabling convenient retrieval of stories.
I now want to store arbitrary emoticons (i.e. the Unicode characters corresponding to "smiley face" etc.) with stories as "user emotions", corresponding to the emotion the story made the user feel.
I am thinking of:
creating new hashes called emotions containing single emoticons (one per emotion expressed)
creating a hash called story-emotions that enables efficient retrieval of and counting of all the emotions associated with a story
creating another new hash called user-story-emotions mapping user IDs to items in the story-emotions hash.
Typical queries will be:
retrieve all the emotions for a story for the current user
retrieve the count of each kind of emotion for the 50 latest stories
Does this sound like a sensible approach?
Very sensible, but I think I can help make it even more so.
To store the emoticons dictionary, use two Hashes. The first, let's call it emoticon-id, should have a field for each emoticon expressed. The field name is the actual Unicode sequence and the value is a unique integer, starting from 0 and increasing for each new emoticon added.
Another Hash, id-emoticon, should be put in place to do the reverse mapping, i.e. from field names that are ids to actual Unicode values.
This gives you O(1) lookups for emoticons, and you should also consider caching this in your app.
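
For example (the emoticons and ids here are made up):

HSET emoticon-id "😀" 0
HSET emoticon-id "😢" 1
HSET id-emoticon 0 "😀"
HSET id-emoticon 1 "😢"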
To store the user-story-emotions data, look into Redis' Bitmaps. Tersely, use the emoticon id as the index and toggle the bit to record the presence/lack of that emotion by that user towards that story.
Note that in order to keep things compact, you'll want popular emotions to have low ids so your bitmaps remain as small as possible.
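
For example, recording and checking that user 42 felt emoticon id 0 for story 1234 (the key naming is an assumption):

SETBIT user-story-emotions:42:1234 0 1
GETBIT user-story-emotions:42:1234 0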
To store the aggregative story-emotions, a Sorted Set would be a better option. Elements can be either the id or the actual Unicode value, and the score should be the current count. This will allow you to fetch the top emoticons (ZREVRANGEBYSCORE) and/or page similarly to how you're doing with the recent 50 stories (I assume you're using the stories Sorted Set for that).
Lastly, when serving the second query, use pipelining or Lua scripting when fetching the bulk of 50 story-emotions counter values in order to get more throughput and better concurrency.
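
Maintaining and reading such a per-story counter could look like this (again a sketch with assumed key names, storing emoticon ids as members):

ZINCRBY story-emotions:1234 1 0
ZREVRANGE story-emotions:1234 0 4 WITHSCORES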

What is the best way to retrieve soccer games by league names in redis?

I have hundreds of soccer games saved in my Redis database. They are saved in hashes under the key games:soccer:data. I have three ZSETs to classify them into upcoming, live, and ended, all ordered by date (score). This way I can easily retrieve them depending on whether they will start soon, are already happening, or have already ended. Now, I want to be able to retrieve them by league names.
I came up with two alternatives:
First alternative: save single hashes containing the game id and the league name. This way I can get all live game ids and then check each id against its respective hash; if it matches the league name(s) I want, I push it into an array, and if not, I skip it. Finally, I return the array with all game ids for the leagues I wanted.
Second alternative: create keys for each league and have live, upcoming, and ended sets for each. This way, I think, it would be faster to retrieve the game ids; however, it would be a pain to maintain each set.
If you have any other way of doing this, please let me know. I don't know if sorting would be faster and save me some memory.
I am looking for speed and low memory usage.
EDIT (following hobbs' alternative):
const multi = client.multi();
const tempSet = 'users:data:14:sports:soccer:lists:temp_' + getTimestamp();
return multi
  .sunionstore(tempSet, [
    'sports:soccer:lists:leagueNames:Bundesliga',
    'sports:soccer:lists:leagueNames:La Liga'
  ])
  .zinterstore(
    'users:data:14:sports:soccer:lists:live',
    2,
    'sports:lists:live',
    tempSet
  )
  .del(tempSet)
  .execAsync();
I need to set AGGREGATE MAX to my query and I have no idea how.
One way would be to use a SET containing all of the games for each league, and use ZINTERSTORE to compute the intersection between your league sets and your existing sets. You could do the ZINTERSTORE every time you query the data (it's not a horribly expensive operation unless your data is very large), or you could do it only when writing to one of the "parent" sets, or you could treat it as a sort of cache by giving it a short TTL and creating it only if it doesn't exist when you go to query it.
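
As a raw-command sketch of this approach for a single league (key names are illustrative; AGGREGATE MAX keeps the date score from the live ZSET, since plain SET members are given an implicit score of 1 in ZINTERSTORE):

SADD sports:soccer:lists:leagueNames:Bundesliga game:1 game:2
ZINTERSTORE users:data:14:sports:soccer:lists:live 2 sports:lists:live sports:soccer:lists:leagueNames:Bundesliga AGGREGATE MAX
EXPIRE users:data:14:sports:soccer:lists:live 60
ZRANGE users:data:14:sports:soccer:lists:live 0 -1

With several leagues, SUNIONSTORE them into a temporary set first, as in the edit above. Regarding the edit's question: node_redis generally passes extra arguments through verbatim, so appending 'AGGREGATE', 'MAX' after the source keys in the zinterstore call should achieve the same (an assumption worth verifying against your client version).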

How to combine multi-field values and sorted time ranges using Redis

I am trying to insert time-based records with multiple fields on the values (with a TTL enabled).
For the multiple fields the best way to do it via Redis is using HSET:
HSET user:32 name "johns" timecreated "3333311232" address "somewhere"
I also need to read those values by time range: for example, return all history records (for user 32) which were inserted in the last day.
The best fit for that would be storing via ZADD using scores (but this time I lose the hash-map structure for easy retrieval):
ZADD user:32 3333311232 "name=johns,timecreated=3333311232,address=somewhere"
On top of that, I want to add a TTL for each record.
Any idea how I could optimize my design?
I could split it into two, but that requires two queries when reading:
ZADD user:32 3333311232 "user:32:3333311232"
HMSET user:32:3333311232 name "johns" timecreated "3333311232" address "somewhere"
Then to retrieve it I'll need:
//some range
ZRANGEBYSCORE user:32 3333311232 333331123
result: 1389772850
Now, to get all the information: HGETALL user:32:1389772850
What do you think?
Thank you,
ray.
The two methods you describe are the two common approaches. If you store the entire object in the ZSET, you would typically store it as a JSON string. If you don't need "random" access to the object, that's a valid approach.
I usually go for the other approach: a ZSET combined with hashes. The two queries are not a big deal; you could even abstract them away with a Lua script (see EVAL).
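A minimal sketch of such a script (assuming, as in your example, that the ZSET members are the hash key names):

-- fetch_range.lua: return the hashes whose keys fall in the given score range
local ids = redis.call('ZRANGEBYSCORE', KEYS[1], ARGV[1], ARGV[2])
local out = {}
for i, id in ipairs(ids) do
  out[i] = redis.call('HGETALL', id)
end
return out

Invoked, for example, as: redis-cli --eval fetch_range.lua user:32 , 3333311232 3333399999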
Regarding the TTL, while you cannot expire individual ZSET values, you could expire the hash, and use keyspace notifications to listen for the expired event, and remove the corresponding value from the ZSET.
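A sketch of that expiry flow, using the keys from the question (the notification configuration and channel are standard Redis; database index 0 is an assumption):

CONFIG SET notify-keyspace-events Ex
EXPIRE user:32:3333311232 86400
PSUBSCRIBE __keyevent@0__:expired

When the hash expires, a subscribed worker receives its key name and can then run ZREM user:32 user:32:3333311232 to clean up the ZSET.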
Let me know if you need some more specifics.

Compound Queries with Redis

For learning purposes I'm trying to write a simple structured document store in Redis. In my example application I'm indexing millions of documents that look a little like the following.
<book id="1234">
  <title>Quick Brown Fox</title>
  <year>1999</year>
  <isbn>309815</isbn>
  <author>Fred</author>
</book>
I'm writing a little query language that allows me to say YEAR = 1999 AND TITLE="Quick Brown Fox" (again, just for my learning, I don't care that I'm reinventing the wheel!) and this should return the IDs of the matching documents (1234 in this case). The AND and OR expressions can be arbitrarily nested.
For each document I'm generating keys as follows
BOOK_TITLE.QUICK_BROWN_FOX = 1234
BOOK_YEAR.1999 = 1234
I'm using SADD to plop these documents in a series of sets in the form KEYNAME.VALUE = { REFS }.
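In commands, indexing the example book would look like this (a sketch using the keys above):

SADD BOOK_TITLE.QUICK_BROWN_FOX 1234
SADD BOOK_YEAR.1999 1234
SADD BOOK_AUTHOR.FRED 1234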
When I do the querying, I parse the expression into an AST. A simple expression such as YEAR=1999 maps directly to a SMEMBERS command which gets me the set of matching documents back. However, I'm not sure how to most efficiently perform the AND and OR parts.
Given a query such as:
(TITLE=Dental Surgery OR TITLE=DIY Appendectomy)
AND
(YEAR = 1999 AND AUTHOR = FOO)
I currently make the following requests to Redis to answer these queries.
-- Stage one generates the intermediate results and returns RANDOM_GENERATED_KEY3
SUNIONSTORE RANDOMLY_GENERATED_KEY1 BOOK_TITLE.DENTAL_SURGERY BOOK_TITLE.DIY_APPENDECTOMY
SINTERSTORE RANDOMLY_GENERATED_KEY2 BOOK_YEAR.1999 BOOK_AUTHOR.FOO
SINTERSTORE RANDOMLY_GENERATED_KEY3 RANDOMLY_GENERATED_KEY1 RANDOMLY_GENERATED_KEY2
-- Retrieving the top level results just requires the last key generated
SMEMBERS RANDOMLY_GENERATED_KEY3
When I encounter an AND I use SINTERSTORE based on the two child keys (and similarly for OR I use SUNIONSTORE). I randomly generate a key to store the results in (and set a short TTL so I don't fill Redis up with cruft). By the end of this series of commands the return value is a key that I can use to retrieve the results with SMEMBERS. The reason I've used the store variants is that I don't want to transport all the matching document references back to the client, so I use temporary keys to store the results on the Redis instance and then only bring back the matching results at the end.
My question is simply, is this the best way to make use of Redis as a document store?
I'm using a similar approach with sorted sets to implement full text indexing. The overall approach is good, though there are a couple of fairly simple improvements you could make.
Rather than using randomly generated keys, you can use the query (or a short form thereof) as the key. That lets you reuse the sets that have already been calculated, which could significantly improve performance if you have queries across two large sets that are commonly combined in similar ways.
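For example (a sketch; the normalized key form is just an illustration), a cached query key with a short TTL:

SINTERSTORE QUERY.YEAR=1999+AUTHOR=FOO BOOK_YEAR.1999 BOOK_AUTHOR.FOO
EXPIRE QUERY.YEAR=1999+AUTHOR=FOO 60
SMEMBERS QUERY.YEAR=1999+AUTHOR=FOO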
Handling title as a complete string will result in a very large number of single member sets. It may be better to index individual words in the title and filter the final results for an exact match if you really need it.
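
For instance (hypothetical word-level keys), indexing each title word separately and intersecting at query time:

SADD BOOK_TITLE_WORD.QUICK 1234
SADD BOOK_TITLE_WORD.BROWN 1234
SADD BOOK_TITLE_WORD.FOX 1234
SINTER BOOK_TITLE_WORD.QUICK BOOK_TITLE_WORD.BROWN BOOK_TITLE_WORD.FOX

Exact-phrase matches would then be verified on the returned documents in application code.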