Redis key matching performance

We are using Redis for ordinary key-value caching and for our thumbnail cache. On a machine that hosts 100+ sites, the Redis thumbnail database has 500,000 keys without a distinctive prefix, like:
"sorl-thumbnail||image||6c4a67b016c4f867b9fdd3e5c5609887"
"sorl-thumbnail||image||ad7c56bd5461e9061604867d056b5de8"
"sorl-thumbnail||image||655ad6bb21129326ef4618df83a0f1f7"
"sorl-thumbnail||thumbnails||871641bfefa6250518fe52b86cf742c9"
"sorl-thumbnail||thumbnails||570565770557013bada8c1fe2cb3d658"
"sorl-thumbnail||image||c01134f4a8746d24c6d62543419bbb3a"
"sorl-thumbnail||image||ecc5afb281bc78fefe3046e2cc3f972a"
"sorl-thumbnail||image||670f1f1b6c5660f46053a484e22a4071"
Does using a prefix like 001, 002, 003, ..., 100 for site IDs increase the performance of accessing Redis?

Because the data structure of the main dictionary is a hash table and not a tree, the general performance of Redis is not really impacted if you have plenty of keys with a common prefix.
Prefixing your keys with some discriminating data will not really improve performance.
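As a rough illustration, assuming the Jedis client for Java (the 001 site-id prefix below is hypothetical), both of the following lookups cost a single dictionary probe, so the choice of prefix doesn't change access time:

    import redis.clients.jedis.Jedis;

    public class PrefixLookup {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                // One O(1) hash-table probe, regardless of how the key starts.
                String a = jedis.get("sorl-thumbnail||image||6c4a67b016c4f867b9fdd3e5c5609887");
                // Hypothetical site-id prefix: the same O(1) probe, no faster and no slower.
                String b = jedis.get("001||sorl-thumbnail||image||6c4a67b016c4f867b9fdd3e5c5609887");
                System.out.println(a + " / " + b);
            }
        }
    }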

Related

Does Redis keep all chars from prefixed kind of keys?

I may have 100 million long but partially static keys, like:
someReal...LongStaticPrefix:12345
someReal...LongStaticPrefix:12
someReal...LongStaticPrefix:123456
Where only the last part of the key is dynamic, the rest is static.
Does Redis keep each key in full, or does it make an internal alias or something like that?
Should I worry about storage or performance?
Or is it better if I make an internal alias for the keys to keep them short?
Redis does keep the whole key. This long prefix will impact your memory usage.
Given that Redis uses a hash map to store the keys, the performance impact is low. The hash map load factor is usually between 0.5 and 1, which means there are usually just one or two keys per hash slot. So the performance impact is the extra network payload for the long key, the longer effort to hash it, and the longer comparison against the one or two keys in the hash slot. It's likely negligible unless your prefix is really, really long.
Consider a shorter key prefix.
Before considering using a hash structure (HSET), consider whether you are using Redis Cluster or may eventually need to. A single hash key cannot be sharded.
A minor optimization: consider making the static part a suffix rather than a prefix, so comparisons within a hash slot fail faster.
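A small sketch of the "shorter prefix" idea, assuming Jedis (the p: alias and key format are hypothetical): the application keeps the long static prefix to itself and only ever sends a short alias to Redis.

    import redis.clients.jedis.Jedis;

    public class ShortKeyAlias {
        // Hypothetical short alias for the long static prefix; only the alias reaches Redis.
        private static final String SHORT_PREFIX = "p:";

        static String redisKey(String dynamicPart) {
            // Stored as "p:12345" instead of "someReal...LongStaticPrefix:12345",
            // saving the prefix-length difference in bytes for every key.
            return SHORT_PREFIX + dynamicPart;
        }

        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                jedis.set(redisKey("12345"), "value");
                System.out.println(jedis.get(redisKey("12345")));
            }
        }
    }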

Redis using too much memory for a small number of keys

I have a standalone Redis server with around 8,000 keys at any given instant.
used_memory shows around 8.5 GB.
My individual key-value size is at most around 50 KB; by that calculation, used_memory should be less than 1 GB (50 KB * 8000).
I am using Spring RedisTemplate with the default pool configuration to connect to Redis.
Any idea what I should look into to narrow down where the memory is being consumed?
A zset internally uses two data structures to hold the same elements, in order to get O(log(N)) insert and remove operations on a sorted data structure.
The two data structures are, specifically:
* Hash table
* Skip list
For typical cases, storage cost (according to my research) increases in the following order:
hset < set < zset
I would recommend starting to use hset if you have hierarchical data. This will lower your memory consumption, but might make lookups a teeny-tiny bit slower (only if one key has more than, say, a couple of hundred fields).
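Here is a minimal sketch of the flat-keys vs. hash layouts with Jedis (the user:1001 key and fields are made up for illustration); the saving comes from Redis's compact encoding of small hashes.

    import redis.clients.jedis.Jedis;

    public class FlatKeysVsHash {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                // Flat layout: every attribute is a top-level key, each paying
                // its own per-key dictionary overhead.
                jedis.set("user:1001:name", "alice");
                jedis.set("user:1001:email", "alice@example.com");

                // Hash layout: one top-level key, attributes as fields. Small
                // hashes are stored in a compact encoding, so this usually
                // takes noticeably less memory.
                jedis.hset("user:1001", "name", "alice");
                jedis.hset("user:1001", "email", "alice@example.com");
                System.out.println(jedis.hget("user:1001", "name"));
            }
        }
    }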

Redis: Multiple unique keys versus bucketing through Hash

I have a total of six types of keys, say a, b, ..., f, each having around a million subkeys, like a1, a2, ..., a99999 (different in each bucket). Which is the faster way to access them?
Having separate keys that combine the bucket name and key, like a_a1, b_b1, etc.?
Using a hash for the 6 keys as buckets, and then having 1 million fields in each?
I searched Stack Overflow and couldn't find such a comparison for the case of a few buckets with a huge number of keys!
Edit 1: Every key and value is a string of at most 100 characters. I would access it using the Jedis library for Java, using transactions.
Your question reminds me of this article. It doesn't contain performance benchmarks, but it seems like your second case (with buckets of keys) will have adequate performance and a small memory footprint.
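Since the question mentions Jedis, here is a minimal sketch of the two layouts (key and field names are hypothetical):

    import redis.clients.jedis.Jedis;

    public class BucketVsFlat {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                // Option 1: flat keys, bucket name folded into the key.
                jedis.set("a_a1", "value-a1");
                String flat = jedis.get("a_a1");

                // Option 2: one hash per bucket, subkeys become fields.
                jedis.hset("a", "a1", "value-a1");
                String bucketed = jedis.hget("a", "a1");

                System.out.println(flat + " / " + bucketed);
            }
        }
    }

Worth noting: the compact hash encoding that usually gives hashes their memory advantage only applies below a configurable field-count threshold, so with around a million fields per bucket it's worth measuring both layouts against your real data.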

Best way to model millions of exist checks in Aerospike?

Having outgrown Redis for some data structures, I'm looking at other solutions with good disk/SSD performance. I recently discovered Aerospike, which seems to excel in an SSD environment.
One of the most memory-hungry structures is a collection of about 100,000 Redis sets, each of which can contain up to 10,000 strings. Each string is between 10 and 30 characters.
These sets are mostly used for exists / uniqueness checks.
What would be the best way to model these? I generally see 2 options:
* model a Redis set as an Aerospike lset
* model each value in a set separately.
Besides this choice, the 100,000 Redis sets are used as a partitioning of the keys. For reasons of locality it would probably make sense to have a similar sort of partitioning/namespacing in Aerospike. However, I'm pretty sure the notion of 'namespacing' in Aerospike isn't used for this sort of key partitioning. What would be a correct way (if any) to do this in Aerospike, or is that not needed?
Aerospike does its own partitioning for load balancing and high availability. A namespace is synonymous with a database in the traditional sense, NOT with a partition of the data. Data in a namespace is partitioned and stored across the cluster. As a user, you need not worry about the placement of the data.
I would map each Redis set to an Aerospike "lset" (one to one). Aerospike should take care of data locality for the data in a given "lset".
Yes, you should not worry about data locality, as Aerospike does auto-sharding. This ensures an even balance of data distribution and read/write load across all nodes of the cluster.
Putting the data in an lset has its advantages: it gives functionality similar to Redis, so you do not need to write your own. But at the same time it has its disadvantages too, so you should choose based on your requirements. All operations on a single lset are serialized, so if you expect reads/writes to the set to be parallelized, lset may not be the right fit for you. Also, the exists check on an lset will actually read the full record and return true/false. Aerospike has an exists API for normal keys, which returns true/false based on the in-memory index and is much faster.
For this use case, you may not be able to segregate them into Aerospike 'sets'. You would need 100,000 sets, but as of now Aerospike only supports 1,024 sets.
Let me add a third option to your list. You can model the key itself to create virtual sets for you as below:
If your actual key is key1 and you want it to go to set1, set your mangled key to set1_key1.
When you want to check for the existence of key7 in set5, check for the existence of set5_key7.
If you go with this model, you are exploiting Aerospike's data distribution and load balancing to the fullest. The exists check will be the fastest, as there will be no disk I/O.
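A rough sketch of this third option with the Aerospike Java client (the hostname, the "test" namespace, and the "demo" set are placeholder values):

    import com.aerospike.client.AerospikeClient;
    import com.aerospike.client.Bin;
    import com.aerospike.client.Key;

    public class VirtualSetExists {
        public static void main(String[] args) {
            AerospikeClient client = new AerospikeClient("localhost", 3000);
            try {
                // The "virtual set" lives in the user key itself: set5_key7.
                Key member = new Key("test", "demo", "set5_key7");

                // Membership is recorded by simply writing a small record.
                client.put(null, member, new Bin("x", 1));

                // Existence check answered from the in-memory primary index.
                boolean present = client.exists(null, member);
                System.out.println(present);
            } finally {
                client.close();
            }
        }
    }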

Segmenting Redis By Database

By default, Redis is configured with 16 databases, numbered 0-15. Is this simply a form of namespacing, or are there performance implications to segregating by database?
For example, if I use the default database (0) and I have 10 million keys, best practices suggest that using the keys command to find keys by wildcard patterns will be inefficient. But what if I store my major keys, perhaps the first 4 segments of 8-segment keys, in a separate database (say database 3), resulting in a much smaller subset of keys there? Will Redis see these as a smaller set of keys, or do all keys across all databases appear as one giant index of keys?
More explicitly put, in terms of time complexity, if my databases look like this:
Database 0: 10,000,000 keys
Database 3: 10,000 keys
will the time complexity of keys calls against Database 3 be O(10m) or will it be O(10k) ?
Thanks for your time.
Redis has a separate dictionary for each database. From your example, the keys call against database 3 will be O(10K).
That said, using keys is against best practice. Additionally, using multiple databases for the same application is against best practice as well. If you want to iterate over keys, you should index them in an application-specific way. A sorted set is a good way to build an index; a sketch follows the references below.
References :
The structure redisServer has an array of redisDB. See redisServer in redis.h
Each redisDB has its own dictionary object. See redisDB in redis.h
The keys command operates on the dictionary for the current database.
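As a hedged sketch of that indexing suggestion with Jedis (the order:* keys and the idx:orders index name are made up), the application maintains its own sorted-set index and reads from it instead of sweeping the keyspace with keys:

    import redis.clients.jedis.Jedis;

    public class SortedSetIndex {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                // Write the value and index its key in an application-specific
                // sorted set, scored here by a timestamp.
                jedis.set("order:1001", "order-payload");
                jedis.zadd("idx:orders", 1700000000, "order:1001");

                // Later, walk the index instead of calling keys over the whole database.
                var indexedKeys = jedis.zrange("idx:orders", 0, -1);
                for (String key : indexedKeys) {
                    System.out.println(key + " -> " + jedis.get(key));
                }
            }
        }
    }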