What is best practice in redis when we need to store multiple objects of same category?
Currently I store the whole object as json string.
For ex: Books.
{
title:...,
author:...
}
Option 1:
set book:1 book1
set book:2 book2
set book:3 book3
usage: get book:1 will give me the book object.
Option 2: hash
hset book 1 book1
hset book 2 book2
hset book 3 book3
usage: hget book 1 will give me the whole book object
Is there any huge difference in terms of performance or usage etc?
Please share if you think of any other approach!
String vs Hash for storing JSON objects
Most of the relevant operations on Hashes and Strings have the same time complexity (GET/HGET, MGET/HMGET, SET/HSET). The main downside of using strings (GET/SET) shows up if you need to scan over the entire set of books: the SCAN and KEYS commands operate over the entire keyspace (every key in Redis), so even if your pattern was book:*, Redis would still need to iterate over all the keys. HSCAN and HGETALL, by contrast, operate only over the one hash. In other words, if you're storing a lot of stuff in Redis and you want to be able to iterate over all your books, a HASH can save you some time.
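A rough way to picture the difference, using plain Python dictionaries as a stand-in for Redis (this is only a model, not a Redis client; the key names and the number of unrelated keys are made up for illustration):

```python
from fnmatch import fnmatchcase

# Stand-in for the whole Redis keyspace: three books stored as separate
# string keys, mixed in with many unrelated keys (all names are invented).
keyspace = {f"session:{i}": "..." for i in range(1000)}
keyspace.update({f"book:{i}": f'{{"title": "Book {i}"}}' for i in range(1, 4)})

# Stand-in for a single hash key named "book" holding the same three books.
book_hash = {str(i): f'{{"title": "Book {i}"}}' for i in range(1, 4)}


def scan_match(keys, pattern):
    """Model of SCAN ... MATCH: every key is visited, then filtered."""
    visited = 0
    matches = []
    for key in keys:
        visited += 1
        if fnmatchcase(key, pattern):
            matches.append(key)
    return matches, visited


def hscan_all(hash_value):
    """Model of HSCAN/HGETALL: only the fields of one hash are visited."""
    return list(hash_value.items()), len(hash_value)


string_matches, string_visited = scan_match(keyspace, "book:*")
hash_fields, hash_visited = hscan_all(book_hash)
print("keys visited with SCAN MATCH:", string_visited)
print("fields visited with HSCAN:", hash_visited)
```

The work done by the string approach grows with everything stored in the database, while the hash approach only grows with the number of books.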
Storing JSON in Redis
You asked in the comments whether storing JSON is bad practice. There's definitely nothing wrong with it, and it's very common. The only downside of using JSON is that you can only GET/SET the entire string: each time you pull the data out of Redis your app has to unmarshal it from JSON, and it has to serialize it back to JSON whenever you want to ship an update to the database. If that's a concern, it can be overcome with the RedisJSON module, but that's a whole separate topic.
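To make that round trip concrete, here is a minimal sketch that uses a plain dict in place of Redis string keys. The helper names (save_book, load_book, update_book) and the sample book are hypothetical; with a real client the two dict accesses would be a SET and a GET.

```python
import json

# Plain dict used as a stand-in for Redis string keys (GET/SET).
store = {}


def save_book(book_id, book):
    """Serialize the whole object and write it back (models SET)."""
    store[f"book:{book_id}"] = json.dumps(book)


def load_book(book_id):
    """Read the whole string and parse it (models GET)."""
    raw = store.get(f"book:{book_id}")
    return None if raw is None else json.loads(raw)


def update_book(book_id, **changes):
    """Changing one field still means: read all, parse, modify, write all."""
    book = load_book(book_id)
    if book is None:
        raise KeyError(f"book:{book_id} does not exist")
    book.update(changes)
    save_book(book_id, book)
    return book


save_book(1, {"title": "Dune", "author": "Frank Herbert"})
update_book(1, title="Dune Messiah")
print(load_book(1))
```

Note that against a real server this read-modify-write is not atomic by itself, so concurrent writers would need some form of coordination.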
Related
For example, I see many people are doing something like the following:
> set data:1000 "some string 1"
> set data:1001 "some string 2"
But what about using a hash to minimize the number of keys?
> hset data 1000 "some string 1"
> hset data 1001 "some string 2"
In the second way, it will only create one data key instead of creating many keys in the first way.
Which way is recommended?
I just see some people and tutorials doing hset data:10 01 xxx. This is actually not related to my question. My question is simply asking which is recommended between set data:1001 xxx and hset data 1001 xxx.
And I don't plan to modify hash-max-zipmap-entries and hash-max-zipmap-value. That means the hash will exceed the two values eventually. In such a config, are the two ways the same? Or which way is recommended?
Reasons to use strings:
you need per value timeouts
the values are semantically isolated
you're on cluster and want the values to be sharded over different nodes to spread load (sharding is based on the key)
Reasons to use hashes:
you want to be able to purge all of them at once (del/unlink), or have a timeout that impacts all of those values at once
you want to be able to enumerate them (prefer hscan/hgetall over scan/keys)
slightly better memory usage on the keys themselves
the values are semantically related
it is OK for all the values to be on the same node (whether single-server or cluster)
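The sharding point can be illustrated with a small sketch. Redis Cluster assigns each key to one of 16384 hash slots using CRC16 of the key, or of the hash tag between { and } if one is present. The function below is an illustrative re-implementation of that rule in Python (key_slot is a name chosen here, not a client API), relying on binascii.crc_hqx with an initial value of 0 for the CRC16 variant.

```python
import binascii


def key_slot(key: str) -> int:
    """Approximate the slot Redis Cluster would assign to a key."""
    data = key.encode("utf-8")
    # Hash tag rule: if the key contains "{...}" with at least one character
    # inside, only that part is hashed.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end > start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % 16384


# Separate string keys can land in different slots (and so on different nodes).
for name in ("book:1", "book:2", "book:3"):
    print(name, key_slot(name))

# A single hash named "book" lives entirely in one slot.
print("book", key_slot("book"))
```

With separate string keys the books can spread over the cluster; with one hash, every field is served by whichever node owns the slot of the hash's key.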
This all depends on the tradeoffs you want to support. In general, using hashes will have a smaller memory footprint than using simple keys. The saving can be large, up to about an order of magnitude, as long as the hashes stay small enough to keep their compact encoding. And access to hash values is constant time. So, if you are using redis simply as a key-value store, then hashes are way more efficient than simple keys.
However, you will need to use simple keys if you have to support per-value expiration, keyspace notifications, etc.
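Just be careful to tweak the values for hash-max-zipmap-entries and hash-max-zipmap-value in your redis.conf in order to ensure that hashes are treated correctly for your environment. (Newer Redis versions renamed these settings: hash-max-ziplist-entries and hash-max-ziplist-value from 2.6, then hash-max-listpack-entries and hash-max-listpack-value from 7.0.)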
You can read all about the details in the memory optimization section of the documentation.
I am using Jedis, and I am new to both that and Redis itself. I have a db that stores hashes, and I need to find all keys in the db that contain an entry with a specified key and a specified value. E.g.: "find all hashes in the db that have key/value of STATUS=ERROR". Is this possible in Jedis? From what I can tell from googling, hscan will find keys in a specified hash.
More generally, by way of teaching me to fish, any pointers for where to look this up? It seems there is no real Jedis API doc, and even the Redis doc itself seems to have nothing on hscan.
As you mentioned, you can use HSCAN to find the specified key-value pair from a hash. Also, you need to use the SCAN command to find all hashes.
However, this is NOT an efficient solution. In order to achieve your goal efficiently, you need to build an extra index, i.e. use a Redis SET to save keys of all hashes that have the specified key-value pair.
HSET hash1 STATUS ERROR
// ...
// HSET other members
// ...
// add it to index
SADD status:error hash1
// get all hashes that have the specified key-value pair
SMEMBERS status:error
UPDATE:
As @Itamar Haber mentioned in the comments, if you have many records in the SET, you should use SSCAN to get these members, since in this case SMEMBERS might block Redis for a long time.
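The part that is easy to get wrong is keeping the index in step when the value changes. Below is a minimal sketch of that bookkeeping using in-memory dictionaries rather than a live server; set_status and hashes_with_status are hypothetical helper names, and the comments note which Redis command each step stands for.

```python
from collections import defaultdict

hashes = defaultdict(dict)  # models the hashes themselves (HSET/HGET)
sets = defaultdict(set)     # models the index sets (SADD/SREM/SMEMBERS)


def set_status(hash_key, status):
    """Write STATUS and move the hash between index sets if it changed."""
    old = hashes[hash_key].get("STATUS")
    if old is not None and old != status:
        sets[f"status:{old.lower()}"].discard(hash_key)  # SREM old index
    hashes[hash_key]["STATUS"] = status                  # HSET
    sets[f"status:{status.lower()}"].add(hash_key)       # SADD new index


def hashes_with_status(status):
    """Return the indexed hash keys (SMEMBERS, or SSCAN for large sets)."""
    return sorted(sets[f"status:{status.lower()}"])


set_status("hash1", "ERROR")
set_status("hash2", "OK")
set_status("hash3", "ERROR")
set_status("hash1", "OK")  # hash1 leaves status:error and joins status:ok
print(hashes_with_status("ERROR"))
```

Against a real server, the hash write and the index updates would usually be grouped in a MULTI/EXEC transaction or a Lua script so that the index cannot drift from the data.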
When creating a key in Redis, I get using the ":" format and treating it similar to a URL structure.
But what if that structure itself contains key-value type combinations? Does one put the key in the structure?
Made-up Example:
Option A) "country:usa:manufacturer:ford:vehicle:f150:color" = black
or
Option B) "usa:ford:f150:color" = black
In some ways, I think that there is strength in the structure of Option A, but it also adds a lot of complexity to the key.
Thoughts?
While keeping in mind your made-up example (do try to use an actual example, you'll get better answers) I would have to say neither.
I would go with an ID for the key, likely an int. Then I'd put each key/value pair in your option A as a hash field and value.
For example:
HSET 1 country USA
HSET 1 manufacturer ford
And so on. Or you could use an HMSET operation to set them all at once (since Redis 4.0, HSET itself accepts multiple field/value pairs).
Why? You get the benefit of keeping the fields as describing the data (which you lose in your option b), the memory advantages of hashes over strings, and reduced complexity on key structure, not to mention the memory benefits of a short integer as keyname versus a long string.
Further, you have a memory-cheap way to create indexes as integer sets. For example, a key called "country:1" could be a set of entry IDs, which then gives you a way to "pull all entries for country ID 1" (USA in the example). By using integers you get the benefit of being able to store these all in a very memory-efficient way, at the minor cost of a lookup table. This could even be done in Lua to avoid a network hop.
The greater the range of possible combinations and entries, the more valuable the memory savings are. If you've got millions or billions of them, you'll want to follow the integer-ID & lookup route. This would also set you up nicely if you ever need to shard data - either server side or client side.
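Put together, the layout described above might look like the following sketch. It uses in-memory dictionaries rather than a Redis client, and the helper names (country_id, add_entry, entries_for_country) and the sample vehicles are invented for illustration.

```python
country_ids = {}  # lookup table: country name -> small integer ID
entries = {}      # models one hash per integer entry ID (HSET <id> field value)
index = {}        # models integer sets such as "country:1" -> {entry IDs}


def country_id(name):
    """Return the integer ID for a country, assigning the next one if new."""
    return country_ids.setdefault(name, len(country_ids) + 1)


def add_entry(entry_id, country, manufacturer, vehicle, color):
    """Store the fields in a hash and add the entry ID to the country index."""
    entries[entry_id] = {
        "country": country,
        "manufacturer": manufacturer,
        "vehicle": vehicle,
        "color": color,
    }
    index.setdefault(f"country:{country_id(country)}", set()).add(entry_id)


def entries_for_country(name):
    """Resolve the name via the lookup table, then read each indexed hash."""
    cid = country_ids.get(name)
    if cid is None:
        return []
    return [entries[i] for i in sorted(index.get(f"country:{cid}", ()))]


add_entry(1, "USA", "ford", "f150", "black")
add_entry(2, "Japan", "toyota", "hilux", "white")
add_entry(3, "USA", "ford", "mustang", "red")
print([e["vehicle"] for e in entries_for_country("USA")])
```

The extra lookup from name to ID is the "minor cost" mentioned above; in exchange, both the hash keys and the index members stay small integers.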
What is the most convenient/fast way to implement a sorted set in redis where the values are objects, not just strings.
Should I just store object id's in the sorted set and then query every one of them individually by its key or is there a way that I can store them directly in the sorted set, i.e. must the value be a string?
It depends on your needs. If you need to share this data with other zsets/structures and want to write the value only once for every change, you can put an id as the zset value and add a hash to store the object. This implies making additional queries when you read data from the zset (one zrange + n hgetall for n values in the zset), but writing and synchronising the value between many structures is cheap (only updating the hash corresponding to the value).
But if it is "self-contained", with no or few accesses outside the zset, you can serialize to a chosen format (JSON, MESSAGEPACK, KRYO...) your object and then store it as the value of your zset entry. This way, you will have better performance when you read from the zset (only 1 query with O(log(N)+M), it is actually pretty good, probably the best you can get), but maybe you will have to duplicate the value in other zsets / structures if you need to read / write this value outside, which also implies maintaining synchronisation by hand on the value.
Redis has good documentation on performance of each command, so check what queries you would write and calculate the total cost, so that you can make a good comparison of these two options.
Also, don't forget that redis comes with optimistic locking (WATCH with MULTI/EXEC), so if you need pessimistic locking (because of contention, for instance) you will have to do it by hand and/or using Lua scripts. If you need a lot of sync, the first option seems better (lower read performance, but still good; fewer queries and less complexity on writes), but if you have values that don't change a lot and memory space is not a problem, the second option will provide better performance on reads (you can duplicate the value in redis and synchronize the values periodically, for instance).
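The read-side difference between the two options can be sketched with a tiny in-memory stand-in that counts calls. FakeRedis below is a made-up class for illustration, not a real client, and the key names and sample books are assumptions.

```python
import json


class FakeRedis:
    """Tiny in-memory stand-in that counts calls; not a real client."""

    def __init__(self):
        self.zsets, self.hashes, self.calls = {}, {}, 0

    def zadd(self, key, score, member):
        self.calls += 1
        self.zsets.setdefault(key, {})[member] = score

    def zrange(self, key):
        # Members ordered by score, then by member, lowest first.
        self.calls += 1
        zset = self.zsets.get(key, {})
        return sorted(zset, key=lambda m: (zset[m], m))

    def hset(self, key, mapping):
        self.calls += 1
        self.hashes.setdefault(key, {}).update(mapping)

    def hgetall(self, key):
        self.calls += 1
        return dict(self.hashes.get(key, {}))


r = FakeRedis()
for book_id, rating, book in [(1, 4.5, {"title": "A"}), (2, 3.0, {"title": "B"})]:
    r.hset(f"book:{book_id}", book)                      # option 1: object in a hash
    r.zadd("books:by_id", rating, str(book_id))          # option 1: id in the zset
    r.zadd("books:embedded", rating, json.dumps(book))   # option 2: object in the zset

r.calls = 0
by_id = [r.hgetall(f"book:{m}") for m in r.zrange("books:by_id")]
calls_by_id = r.calls      # one zrange plus one hgetall per member

r.calls = 0
embedded = [json.loads(m) for m in r.zrange("books:embedded")]
calls_embedded = r.calls   # a single zrange

print(calls_by_id, calls_embedded)
```

With a real client the extra hgetall calls of the first option can often be sent in one pipeline, which narrows the gap in round trips but not in commands executed.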
Short answer: Yes, everything must be stored as a string
Longer answer: you can serialize your object into any format of your choosing, since Redis strings are binary-safe. Most people choose MsgPack or JSON because they are compact and serializers are available in just about any language.
I'm learning how to use Redis for a project of mine. One thing I haven't got my head around is what exactly the colons are used for in the names of keys.
I have seen names of key such as these:
users:bob
color:blue
item:bag
Does the colon separate keys into categories and make finding the keys faster? If so can you use multiple colons when naming keys to break them down into sub categories? Lastly do they have anything to do with defining different databases within the Redis server?
I have read through documentation and done numerous Google searches on the matter but oddly I can't find anything discussing this.
Colons were used in earlier redis versions as a convention for storing namespaced data. Early versions of redis supported only strings, so if you wanted to store the email and the age of 'bob' you had to store each one as its own string, and colons were used:
SET user:bob:email bob@example.com
SET user:bob:age 31
They had no special handling or performance characteristics in redis, the only purpose was namespacing the data to find it again. Nowadays you can use hashes to store most of the coloned keys:
HSET user:bob email bob@example.com
HSET user:bob age 31
You don't have to name the hash "user:bob"; we could name it "bob". But by namespacing it with the user prefix, we instantly know which information this hash should/could have.
Colons are a way to structure the keys. They are not interpreted by redis in any way. You can also use any other delimiter you like or none at all. I personally prefer /, which makes my keys look like file system paths. Delimiters have no influence on performance, but you should not make keys excessively long since redis has to keep all keys in memory.
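Since the delimiter is purely an application-side convention, it is common to centralise it in a small helper. The sketch below is one possible way to do that; make_key, split_key and SEPARATOR are names invented here, and rejecting parts that contain the separator is a design choice, not something redis requires.

```python
SEPARATOR = ":"  # any delimiter works; redis does not interpret it


def make_key(*parts):
    """Join key parts with the separator, rejecting ambiguous parts."""
    cleaned = [str(part) for part in parts]
    for part in cleaned:
        if not part or SEPARATOR in part:
            raise ValueError(f"invalid key part: {part!r}")
    return SEPARATOR.join(cleaned)


def split_key(key):
    """Split a key back into its parts (purely a client-side convenience)."""
    return key.split(SEPARATOR)


print(make_key("user", "bob", "email"))
print(split_key("user:bob"))
```

Keeping key construction in one place also makes it easy to switch to another delimiter such as / later.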
A good key structure is important to leverage the power of the SORT command, which is redis's answer to SQL's join.
GET user:bob:color -> 'blue'
GET user:alice:color -> 'red'
SMEMBERS user:peter:friends -> alice, bob
SORT user:peter:friends BY NOSORT GET user:*:color -> 'blue', 'red'
You can see that the key structure enables SORT to lookup the user's colors by referencing the structured keys.
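What SORT does with the GET pattern can be modelled in a few lines: for each member of the set, the first * in the pattern is replaced by the member and the resulting key is looked up. The sketch below is a plain-Python model of that substitution over in-memory dictionaries, not a call to redis; sort_nosort_get is a hypothetical name. Because sets are unordered, the order returned by a real server with BY NOSORT is not guaranteed, so the model sorts the members only to keep the demo repeatable.

```python
# In-memory copies of the keys from the example above.
strings = {"user:bob:color": "blue", "user:alice:color": "red"}
sets = {"user:peter:friends": {"alice", "bob"}}


def sort_nosort_get(set_key, get_pattern):
    """Model of: SORT set_key BY NOSORT GET get_pattern."""
    results = []
    # sorted() is only for a repeatable demo; BY NOSORT skips sorting.
    for member in sorted(sets[set_key]):
        lookup_key = get_pattern.replace("*", member, 1)
        results.append(strings.get(lookup_key))  # missing keys yield None (nil)
    return results


print(sort_nosort_get("user:peter:friends", "user:*:color"))
```

The join only works because every user's color lives under a key that follows the same user:<name>:color structure.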