Segmenting Redis By Database

By default, Redis is configured with 16 databases, numbered 0-15. Is this simply a form of namespacing, or are there performance implications to segregating by database?
For example, if I use the default database (0) and have 10 million keys, best practices suggest that using the KEYS command to find keys by wildcard pattern will be inefficient. But what if I store my major keys, perhaps the first 4 segments of 8-segment keys, as a much smaller subset of keys in a separate database (say database 3)? Will Redis see these as a smaller set of keys, or do all keys across all databases appear as one giant index of keys?
More explicitly put, in terms of time complexity, if my databases look like this:
Database 0: 10,000,000 keys
Database 3: 10,000 keys
will the time complexity of KEYS calls against Database 3 be O(10m) or O(10k)?
Thanks for your time.

Redis has a separate dictionary for each database. From your example, the KEYS call against database 3 will be O(10k).
That said, using KEYS is against best practice. Additionally, using multiple databases for the same application is against best practice as well. If you want to iterate over keys, you should index them in an application-specific way. A sorted set is a good way to build an index (see the sketch after the references).
References:
The structure redisServer has an array of redisDB. See redisServer in redis.h
Each redisDB has its own dictionary object. See redisDB in redis.h
The KEYS command operates on the dictionary of the currently selected database
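
For illustration, here is a minimal sketch of that pattern using the redis-py client (a local instance is assumed, and the key and index names are hypothetical): instead of scanning with KEYS, maintain a sorted set as an index of the major keys and read it back directly.

import redis

r = redis.Redis(db=0)  # assumed local instance

# Whenever a major key is written, also record its name in a
# sorted set; a constant score of 0 enables lexicographic range
# queries over the member names.
major_key = "seg1:seg2:seg3:seg4"  # hypothetical key name
r.set(major_key, "some value")
r.zadd("index:major", {major_key: 0})

# Read the index back instead of calling KEYS: ZRANGEBYLEX only
# touches the index. '[seg1:' to '[seg1;' selects every member
# starting with 'seg1:' (';' is the next character after ':').
for key in r.zrangebylex("index:major", "[seg1:", "[seg1;"):
    print(key)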

Related

Redis - How to load multiple rows with the same key into Redis?

What is the best approach to load a CSV like this example:
id1,mike,123
id1,joe,234
id2,ben,235
id2,jack,445
The need is to query based on the first column (key), but there are keys that repeat...
I recommend using HASHES, because you're trying to represent objects. According to best practices, you should use them whenever possible; the key would be your first column and the values would be the repeating lines.
If you want more information about Redis data types, see: https://redis.io/topics/data-types
Also, this link is very useful for optimizing Redis: https://redis.io/topics/memory-optimization
From the memory optimization page:
Use hashes when possible
Small hashes are encoded in a very small space, so you should try representing your data using hashes every time it is possible.
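
A minimal sketch of that loading pattern with the redis-py client (the file name, key prefix, and a local instance are assumptions): each id becomes one hash, and the repeating rows become fields on that hash.

import csv
import redis

r = redis.Redis()  # assumed local instance

# rows.csv is the input shown above: id1,mike,123 etc.
with open("rows.csv", newline="") as f:
    for row_id, name, value in csv.reader(f):
        # One hash per id; a repeated id simply adds another
        # field to the same hash.
        r.hset(f"row:{row_id}", name, value)

# Everything stored under a given id comes back with one lookup:
print(r.hgetall("row:id1"))  # {b'mike': b'123', b'joe': b'234'}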

Which approach is better when using Redis?

I'm facing following problem:
I want to keep track of tasks given to users, and I want to store this state in Redis.
I can do:
1) create a list called "dispatched_tasks" holding many objects (username, task)
2) create many (potentially thousands of) lists called dispatched_tasks:username, each usually holding a few objects (task)
Which approach is better? If I only considered my own convenience, I would choose the second one, since from time to time I will have to search for a particular user's tasks, and the second approach gives me that for free.
But how about Redis? Which approach will be more performant?
Thanks for any help.
Redis supports different kinds of data structures as shown here. There are different approaches you can take:
Scenario 1:
Using a list data type, your list will contain all the task/user combinations for your problem. However, accessing and deleting a task runs in O(n) time complexity (the list has to be traversed to reach the element). This can impact performance if your users have a lot of tasks.
Using sets:
Similar to lists, but you can add/delete/check for existence in O(1), and set elements are unique. So if you add a username/task pair that already exists, it won't be added again.
Scenario 2:
The data types do not change. The only difference is that there will be many more keys in Redis, which can increase the memory footprint (see the sketch after the FAQ excerpts below).
From the FAQ:
What is the maximum number of keys a single Redis instance can hold? And what is the max number of elements in a Hash, List, Set, Sorted Set?
Redis can handle up to 2^32 keys, and was tested in practice to handle at least 250 million keys per instance.
Every hash, list, set, and sorted set can hold 2^32 elements.
In other words, your limit is likely the available memory in your system.
What's the Redis memory footprint?
To give you a few examples (all obtained using 64-bit instances):
An empty instance uses ~3MB of memory.
1 million small key -> string value pairs use ~85MB of memory.
1 million keys -> hash values, each representing an object with 5 fields, use ~160MB of memory.
Testing your use case is trivial: use the redis-benchmark utility to generate random data sets, then check the space used with the INFO memory command.
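
To make the comparison concrete, here is a sketch of the second approach using sets rather than lists, with the redis-py client (key names and a local instance are assumptions):

import redis

r = redis.Redis()  # assumed local instance

def dispatch(user: str, task: str) -> None:
    # One set per user: add, remove, and membership checks are
    # O(1), and duplicates are ignored automatically.
    r.sadd(f"dispatched_tasks:{user}", task)

def complete(user: str, task: str) -> None:
    r.srem(f"dispatched_tasks:{user}", task)

dispatch("alice", "task-42")
dispatch("alice", "task-42")  # no-op: set elements are unique
print(r.smembers("dispatched_tasks:alice"))  # {b'task-42'}

Looking up one user's tasks never touches another user's key, which is exactly the "for free" lookup the question mentions.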

Using key-value databases as a set with persistent indices

Since the below got a bit long, here's the tl;dr version: is there an existing key/value best practice for fast key and value lookup, something like a hash-based set with persistent indices?
I'm interested in the world of key-value databases and have so far failed to figure out how one would efficiently implement the following use-case:
Assume we want to serialize some data and reference them somewhere else by a persistent, unique integer index. Thus e.g.: Key = unsigned int, Value = MyData.
The database should have fast key lookup and ensure that MyData is unique.
Now, when I insert a new value into the database, I could assign it a new index key, e.g. the current size of the database, or, to prevent clashes after removing items, an externally kept counter.
But how would I ensure that I do not insert the same MyData value into the database twice? So far, it looks to me as if this is not efficiently possible with key-value databases - is that correct? I.e., I do not want to iterate over the whole database just to ensure the MyData value is not already in there...
What is the best practice for implementing this, then?
For background: I work on KDevelop, where we use the above for our code analysis cache. We actually have a custom implementation of the above use-case 1. Search for Bucket and ItemRepository if you are interested in the internals, and see 2 for an exemplary usage of the ItemRepository.
But you will probably agree that this code is quite hard to understand and thus hard to maintain. I want to compare its performance to alternative solutions which might result in simpler code - but only if that does not incur a severe performance penalty. Considering the hype around the performance of key-value stores such as OpenLDAP MDB, Kyoto Cabinet and LevelDB, this is where I wanted to start.
What we have in KDevelop - as far as I figured out - is basically a sort of hybrid on-disk/in-memory hash map which gets saved to disk periodically (which of course can result in major data corruption in case of crashes etc.). Items are stored in a location based on their hash value which then of course also allows relatively fast value lookups as long as the hash function is fast. The added twist is that you also get some sort of persistent database index which can be used to lookup the items quite efficiently.
So - long story short - how would one do that with a key/value database such as LevelDB, Kyoto Cabinet, OpenLDAP MDB - you name it?
Sounds like you want to do what OpenLDAP does with its Equality index. Perhaps this is the same as the OrientDB example; I didn't read it.
The main table is indexed by a monotonically increasing integer key (called the entryID), and stores the data value. The equality index is indexed by a hash of the value, and stores a list of entryIDs that match the hash. Since the hash might have collisions, just the existence of an entry in the equality index doesn't prove uniqueness or duplication. You still need to check the actual values.
A faster/simpler approach, if you're using MDB, BDB, or some other database that supports duplicate keys, is to keep just one table, using the hash as the key. In both MDB and BDB there is a GET_BOTH request which matches both the key and the data to perform a fetch. If it succeeds, then you know for certain that the value already exists. Otherwise, you can save whatever data values you like without worrying about hash collisions.
A caveat here: in MDB, when using duplicate keys, the size of the values is limited to less than one half of a disk page.
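
Here is a sketch of that single-table approach with the py-lmdb binding (the hash choice and names are assumptions); cursor.set_key_dup is the binding's equivalent of the GET_BOTH lookup:

import hashlib
import lmdb

env = lmdb.open("mydb", max_dbs=1)
# dupsort=True lets one key (the hash) hold several data values.
values_db = env.open_db(b"values", dupsort=True)

def insert_unique(data: bytes) -> bool:
    # Hypothetical hash choice; any consistent hash works.
    key = hashlib.sha256(data).digest()[:16]
    with env.begin(write=True, db=values_db) as txn:
        cur = txn.cursor()
        # GET_BOTH-style probe: matches key AND value, so a hash
        # collision with a different value is not reported as a
        # duplicate.
        if cur.set_key_dup(key, data):
            return False  # exact value already present
        txn.put(key, data)
        return True

print(insert_unique(b"MyData"))  # True: stored
print(insert_unique(b"MyData"))  # False: duplicate detected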
Unless I'm missing something here - typically your hash algorithm is consistent and will produce the same key for the same data. Thus you should only need to look up the key to see if it already exists, or handle the (likely duplicate-key) error the DB gives back to you.
AFAIK key/value DBs can and will enforce a unique value constraint for you, i.e. you will get an error if you try to save a value that already exists.
How big are your value strings?
I would just store them in a key and let the database do all the work.
Typical LevelDB style, which applies to most KV stores, would be to use a pair of keys, prefixed to indicate type, e.g.:
Key = 'i' + ID
Value = valueString
Key = 'v' + valueString
Value = ID
In a system that needs to allow multiple identical valueStrings, you would move the ID into the tail of the second key:
Key = 'v' + valueString + ID
Value = empty
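
A sketch of that dual-key layout with the plyvel LevelDB binding (the database path and ID allocation are assumptions):

import plyvel

db = plyvel.DB("uniq.ldb", create_if_missing=True)

def insert(value: bytes) -> bytes:
    # Reverse index probe: if 'v' + value exists, the value is
    # already stored, so return its existing ID.
    existing = db.get(b"v" + value)
    if existing is not None:
        return existing
    # Hypothetical ID allocation: a counter key bumped on insert.
    counter = int(db.get(b"counter", b"0")) + 1
    db.put(b"counter", str(counter).encode())
    id_bytes = counter.to_bytes(8, "big")
    db.put(b"i" + id_bytes, value)  # forward: ID -> value
    db.put(b"v" + value, id_bytes)  # reverse: value -> ID
    return id_bytes

print(insert(b"MyData"))  # allocates a new ID
print(insert(b"MyData"))  # returns the same ID

The forward table gives the persistent integer-to-value lookup, while the reverse table makes the uniqueness check a single point read instead of a full scan.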

Redis key matching performance

We are using Redis for ordinary key-value caching and for our thumbnail cache. On a machine which hosts 100+ sites, the Redis thumbnail database has 500,000 keys without a distinctive prefix, like:
"sorl-thumbnail||image||6c4a67b016c4f867b9fdd3e5c5609887"
"sorl-thumbnail||image||ad7c56bd5461e9061604867d056b5de8"
"sorl-thumbnail||image||655ad6bb21129326ef4618df83a0f1f7"
"sorl-thumbnail||thumbnails||871641bfefa6250518fe52b86cf742c9"
"sorl-thumbnail||thumbnails||570565770557013bada8c1fe2cb3d658"
"sorl-thumbnail||image||c01134f4a8746d24c6d62543419bbb3a"
"sorl-thumbnail||image||ecc5afb281bc78fefe3046e2cc3f972a"
"sorl-thumbnail||image||670f1f1b6c5660f46053a484e22a4071"
Does using a prefix like 001, 002, 003, ..., 100 for site IDs increase the performance of accessing Redis?
Because the data structure of the main dictionary is a hash table and not a tree, the general performance of Redis is not really impacted if you have plenty of keys with a common prefix.
Prefixing your keys with some discriminating data will not really improve performance.

Redis: does separate database improve performance for KEYS and SORT

Does using separate databases improve performance for KEYS and SORT?
If you mean that, by spreading the same number of keys across multiple databases, your KEYS and SORT operations will be faster, then the answer is yes.
This is because there are fewer keys to check against, and the time complexity of both operations depends on the number of keys.
At the same time, sorting two result sets from two different databases will be far more costly.
See:
Redis commands - Sort
Redis commands - Keys
No. Both of those commands run on a single database. If you have 2 or more databases and want to run those commands across all of them, then you would have to execute them in each database, therefore taking twice the amount of time.
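
For completeness, a sketch of that per-database iteration with the redis-py client (database numbers and the pattern are assumptions); in production SCAN is the safer choice, but the complexity argument is the same:

import redis

# Each database has its own dictionary, so KEYS must be issued
# once per database, and each call costs O(keys in that database).
for db in (0, 3):
    r = redis.Redis(db=db)
    matches = r.keys("user:*")  # hypothetical pattern
    print(f"db {db}: {len(matches)} matching keys")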