Does Redis rehash the key under the hood?
i.e. do I need to hash my key before writing it to Redis?
For example, normally I would do:
redis.put(rehash(key), value)
Is it really necessary?
Redis uses the CRC16 algorithm to map keys to hash slots. There are 16384 hash slots in Redis Cluster, and to compute what is the hash slot of a given key, we simply take the CRC16 of the key modulo 16384.
There is no need for rehashing the key before writing to Redis, Redis does it for you.
In general, a hash function maps keys to small integers (buckets). An ideal hash function maps the keys to the integers in a random-like manner, so that bucket values are evenly distributed even if there are regularities in the input data.
CRC16 ensures an even load across the nodes in terms of the number of keys, i.e. a uniform distribution.
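To make the slot computation concrete: Redis Cluster uses CRC16-CCITT (the XMODEM variant, polynomial 0x1021, initial value 0x0000). You never need to run this yourself, but a minimal pure-Python sketch of what Redis does server-side looks like:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0x0000,
    no input/output reflection. This is the variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: bytes) -> int:
    """Redis Cluster hash slot: CRC16(key) mod 16384."""
    return crc16_xmodem(key) % 16384

# Standard check value for this CRC variant:
print(hex(crc16_xmodem(b"123456789")))  # 0x31c3
print(hash_slot(b"123456789"))          # 12739
```

The point is that the slot is a pure function of the key bytes, so rehashing the key client-side changes nothing about the distribution quality; CRC16 already spreads keys uniformly.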
Related
I may have 100 million long but partially static keys, like:
someReal...LongStaticPrefix:12345
someReal...LongStaticPrefix:12
someReal...LongStaticPrefix:123456
Where only the last part of the key is dynamic, the rest is static.
Does Redis keep all keys long or does it make an internal alias or something like that?
Should I worry about storage or performance?
Or is it better if I make an internal alias for the keys to keep them short?
Redis does keep the whole key. This long prefix will impact your memory usage.
Given that Redis uses a hash map to store the keys, the performance impact is low. A hash map's load factor is usually between 0.5 and 1, which means there are typically just one or two keys per hash bucket. So the performance impact is the extra network payload for the long key, the extra effort to hash it, and the longer comparisons against the one or two keys in the bucket. It's likely negligible unless your prefix is really, really long.
Consider a shorter key prefix.
Before considering using a hash structure (HSET), consider if you are using redis cluster or if you may need to eventually. A single hash key cannot be sharded.
A minor optimization: consider making the static part a suffix instead of a prefix (i.e. put the dynamic part first), so key comparisons within a hash bucket's chain fail fast on the first differing characters.
I'm under the impression that one should hash (i.e. sha3) their Redis key before adding data to it. (It might have even been with regard to memcache.) I don't remember why I have this impression or where it came from but I can't find anything to validate (or refute) it. The reasoning was the hash would help with even distribution across a cluster.
When using Redis (in either/both clustered and non-clustered modes), is it best practice to hash the key before calling SET? e.g. set(sha3("username:123"), "owlman123")
No, you shouldn't hash the key. Redis Cluster hashes the key itself for the purpose of choosing the node:
There are 16384 hash slots in Redis Cluster, and to compute what is the hash slot of a given key, we simply take the CRC16 of the key modulo 16384.
You can also use hash tags to control which keys share the same slot.
It might be a good idea if your keys are very long, as recommended in the official documentation:
A few other rules about keys:
Very long keys are not a good idea. For instance a key of 1024 bytes is a bad idea not only memory-wise, but also because the lookup of the key in the dataset may require several costly key-comparisons. Even when the task at hand is to match the existence of a large value, hashing it (for example with SHA1) is a better idea, especially from the perspective of memory and bandwidth.
source: https://redis.io/docs/data-types/tutorial/#keys
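Following that recommendation, hashing can be done client-side before the key is used. A minimal sketch (the key content here is made up for illustration; SHA1 is the example the docs use, but any fixed-length digest works):

```python
import hashlib

def short_key(long_key: str) -> str:
    """Collapse an arbitrarily long key to a fixed-length (40 hex chars) SHA1 digest."""
    return hashlib.sha1(long_key.encode("utf-8")).hexdigest()

# A multi-kilobyte key collapses to a 40-character key:
huge = "someReal" + "x" * 5000 + "LongStaticPrefix:12345"
print(len(short_key(huge)))  # 40
```

The trade-off: you save memory and bandwidth on very long keys, but you lose the ability to inspect keys by eye or scan them by prefix (SCAN with MATCH on the original name no longer works), so keep a reverse mapping if you need it.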
Problem set: I am looking to store 6 billion SHA256 hashes. I want to check if a hash exists and, if so, perform an action. When it comes to storing a SHA256 hash (a 64-character hex string) just to check if the key exists, I've come across two approaches:
HSET/HEXISTS and GETBIT/SETBIT
I want to make sure I take the least amount of memory, but also want to make sure lookups are quick.
The Use case will be "check if sha256 hash exist"
The problem: I want to understand how to store this data, as I currently see a 200% size increase going from text to Redis. I want to understand what the best sharding options using ziplist entries and ziplist values would be, and how to split the hash so the ziplist encoding is used effectively.
I've tried setting the ziplist entries to 16 ^ 4 (65536) and the value to 60 based on splitting 4:60
Any help understanding the options and techniques to make the footprint as small as possible while keeping lookups fast would be appreciated.
Thanks
A bit late to the party but you can just use plain Redis keys for this:
# Store a given SHA256 hash
> SET 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08 ""
OK
# Check whether a specific hash exists
> EXISTS 2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae
0
Where both SET and EXISTS have a time complexity of O(1) for single keys.
As Redis can handle a maximum of 2^32 keys, you should split your dataset into two or more Redis servers / clusters, also depending on the number of nodes and the total combined memory available to your servers / clusters.
I would also suggest using the binary form of your hashes instead of their textual representation, as that saves ~50% of the memory used by the keys in Redis.
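The hex-to-binary conversion is a one-liner; reusing the hash from the example above (a SHA256 hex digest is 64 characters, but only 32 raw bytes):

```python
hex_hash = "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"
binary_hash = bytes.fromhex(hex_hash)  # 32 raw bytes instead of 64 hex characters

print(len(hex_hash), len(binary_hash))  # 64 32
# A Redis client that accepts bytes keys (e.g. redis-py) can then do the
# equivalent of: r.set(binary_hash, "")  -- `r` is a hypothetical client object.
```

Redis keys are binary-safe, so arbitrary bytes (including NUL) are valid key names; the only cost is that the keys are no longer human-readable in redis-cli.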
I am trying to insert multiple key/values at once on Redis (some values are sets, some are hashes) and I get this error: ERR CROSSSLOT Keys in request don't hash to the same slot.
I'm not doing this from redis-cli but from some Go code that needs to write multiple key/values to a redis cluster. I see other places in the code where multiple key values are done this way and I don't understand why mine don't work. What are the hash requirements to not have this error?
Thanks
In a cluster topology, the keyspace is divided into hash slots. Different nodes will hold a subset of hash slots.
Multiple keys operations, transactions, or Lua scripts involving multiple keys are allowed only if all the keys involved are in hash slots belonging to the same node.
Redis Cluster implements all the single key commands available in the
non-distributed version of Redis. Commands performing complex
multi-key operations like Set type unions or intersections are
implemented as well as long as the keys all belong to the same node.
You can force the keys to belong to the same node by using Hash Tags
ERR CROSSSLOT Keys in request don't hash to the same slot
As the error message suggests, the operation succeeds only if all of the keys map to the same slot; otherwise you will see this failure. The error is raised even when the slots involved all belong to the same node: the check is strict and, as per the code, all keys must hash to the same slot.
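Hash tags are how you force keys into the same slot. Per the cluster specification, if a key contains a `{...}` section and the content between the first `{` and the first `}` after it is non-empty, only that content is hashed. A small sketch of the tag-extraction rule (key names here are illustrative):

```python
def hash_tag(key: str) -> str:
    """Return the part of the key Redis Cluster actually hashes.

    Per the cluster spec: if the key contains '{' followed later by '}',
    and the substring between them is non-empty, only that substring is
    hashed; otherwise the whole key is hashed."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # '}' found and tag is non-empty
            return key[start + 1:end]
    return key

# These two keys hash to the same slot, so a multi-key MSET,
# transaction, or Lua script touching both will succeed:
print(hash_tag("{user:123}:followers"))  # user:123
print(hash_tag("{user:123}:following"))  # user:123
print(hash_tag("plainkey"))              # plainkey
```

So in the Go code, giving every key in one multi-key operation a common `{...}` tag (e.g. the entity ID) avoids the CROSSSLOT error, at the cost of concentrating those keys on one node.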
I am writing a JAR file that fetches a large amount of data from an Oracle DB and stores it in Redis. The details are stored properly, but the set and hash keys I have defined in the JAR are getting limited in the Redis DB. There should be nearly 200 hash and 300 set keys, but I only get 29 keys when running KEYS * in Redis. Please help on how to increase the limit of Redis memory, or of the hash/set key storage size.
Note: I changed the
hash-max-zipmap-entries 1024
hash-max-zipmap-value 64
manually in the redis.conf file, but it's not taking effect. Does it need to be changed anywhere else?
There is no limit about the number of set or hash keys you can put in a Redis instance, except for the size of the memory (check the maxmemory, and maxmemory-policy parameters).
The hash-max-zipmap-entries parameter is completely unrelated: it only controls memory optimization.
I suggest using the MONITOR command to check which queries are sent to the Redis instance.
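One likely reason the edit "is not reflecting": redis.conf is only read at server startup, so either restart the server or apply the settings at runtime. A sketch, assuming a local default instance (these parameter names match your redis.conf; on newer Redis versions they were renamed to hash-max-ziplist-entries / hash-max-ziplist-value):

```shell
redis-cli CONFIG SET hash-max-zipmap-entries 1024
redis-cli CONFIG SET hash-max-zipmap-value 64
redis-cli CONFIG GET 'hash-max-zipmap-*'   # verify the running values
```

Either way, these settings only affect the internal encoding of hashes, not how many keys you can store.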
hash-max-zipmap-value keeps the compact hash encoding in Redis optimized: lookups inside these compact structures are linear scans (O(N)), so longer entries increase the latency of the system.

These settings are available in redis.conf.

If a hash grows beyond the specified number of entries (or its values exceed the size limit), it is converted internally to the regular hash table encoding, and the memory advantage that compact hashes provide is lost.