Naming Convention and Valid Characters for a Redis Key - redis

I was wondering what characters are considered valid in a Redis key. I have googled for some time and can not find any useful info.
In Python, for example, a valid variable name must consist of characters from the class [a-zA-Z0-9_]. What are the requirements and conventions for Redis keys?

Part of this is answered here, but this isn't completely a duplicate, as you're asking about allowed characters as well as conventions.
As for valid characters in Redis keys, the manual explains this completely:
Redis keys are binary safe, this means that you can use any binary sequence as a key, from a string like "foo" to the content of a JPEG file. The empty string is also a valid key.
A few other rules about keys:
Very long keys are not a good idea, for instance a key of 1024 bytes is a bad idea not only memory-wise, but also because the lookup of the key in the dataset may require several costly key-comparisons. Even when the task at hand is to match the existence of a large value, hashing it (for example with SHA1) is a better idea, especially from the point of view of memory and bandwidth.
Very short keys are often not a good idea. There is little point in writing "u1000flw" as a key if you can instead write "user:1000:followers". The latter is more readable and the added space is minor compared to the space used by the key object itself and the value object. While short keys will obviously consume a bit less memory, your job is to find the right balance.
Try to stick with a schema. For instance "object-type:id" is a good idea, as in "user:1000". Dots or dashes are often used for multi-word fields, as in "comment:1234:reply.to" or "comment:1234:reply-to".
The maximum allowed key size is 512 MB.
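For illustration only (not taken from the Redis documentation), here is a minimal sketch of that "object-type:id:field" schema using the redis-py client; the user ID, field names, and connection settings are assumptions.

import redis

# Assumes a local Redis instance; key names below are illustrative.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

user_id = 1000

# Readable, schema-based keys: "user:<id>:<field>"
r.set(f"user:{user_id}:name", "alice")
r.sadd(f"user:{user_id}:followers", 2001, 2002, 2003)

print(r.get(f"user:{user_id}:name"))         # alice
print(r.scard(f"user:{user_id}:followers"))  # 3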

Related

Are there any downsides to using nanoid for primary key?

I know that UUIDs and incrementing integers are often used for primary keys.
I'm thinking of nanoids instead because those are URL friendly without being guessable / brute-force scrapeable (like incrementing integers).
Would there be any reason not to use nanoids as primary keys in a database like Postgres? (For example: Maybe they drastically increase query time since they aren't ... aligned or something?)
https://github.com/ai/nanoid
Most databases use incrementing IDs because it is more efficient to insert a new value at the end of a B-tree based index.
If you insert a new value into a random place in the middle of a B-tree, it may have to split the B-tree nonterminal node, and that could cause the node at the next higher level to split, and so on up to the top of the B-tree.
This also has a greater risk of causing fragmentation, which means the index takes more space for the same number of values.
Read https://www.percona.com/blog/2015/04/03/illustrating-primary-key-models-in-innodb-and-their-impact-on-disk-usage/ for a great visualization about the tradeoff between using an auto-increment versus UUID in a primary key.
That blog is about MySQL, but the same issue applies to any B-tree based data structure.
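As a rough, hedged illustration of that point (a sorted Python list standing in for a B-tree index, which is a simplification), sequential keys always land at the end of the index while random keys such as UUIDs or nanoids land all over it:

import bisect
import uuid

sequential_index = []
random_index = []

for i in range(5):
    # Auto-increment style key: every insert position is the end of the index.
    pos = bisect.bisect(sequential_index, i)
    sequential_index.insert(pos, i)
    print("sequential insert at position", pos)

for _ in range(5):
    # Random key (a UUID here; a nanoid behaves the same way): insert positions
    # are scattered, which in a real B-tree means page splits and fragmentation
    # spread across the whole index.
    key = str(uuid.uuid4())
    pos = bisect.bisect(random_index, key)
    random_index.insert(pos, key)
    print("random insert at position", pos)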
I'm not sure if there is a disadvantage to using nanoids, but they are often unnecessary. While UUIDs are long, they can be translated to a shorter format without losing entropy.
See the NPM package (https://www.npmjs.com/package/short-uuid).
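For readers who don't want a dependency, here is a hedged Python sketch of the same idea as short-uuid: re-encode the 128-bit UUID value in a larger alphabet (an arbitrary base-62 one here) so the string gets shorter without losing entropy.

import uuid

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def shorten(u: uuid.UUID) -> str:
    # Re-encode the UUID's 128-bit integer in base 62.
    n = u.int
    out = []
    while n:
        n, rem = divmod(n, len(ALPHABET))
        out.append(ALPHABET[rem])
    return "".join(reversed(out)) or ALPHABET[0]

u = uuid.uuid4()
print(u)           # 36-character hex-and-dashes form
print(shorten(u))  # ~22-character base-62 form, same 128 bits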
UUIDs are standardized by the Open Software Foundation (OSF) and described in RFC 4122. That means other tools are far more likely to give you some perks around them.
Some examples:
MongoDB has a special type to optimize the storage of UUIDs. Not only does a Nano ID string take more space, but even its binary form takes more bits (126 in Nano ID versus 122 in a UUID).
I once saw a logging tool extracting the timestamp from the uids; I can't remember which one, but it is available (see the sketch after this list).
Also, the long, non-reduced form of a UUID is very easy to identify visually. When the end user is a developer, that can help them understand the nature/source of the ID (it's clearly not a database auto-increment key, for instance).
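On the timestamp point above (the answer doesn't say which tool it was), here is a small sketch of how the timestamp embedded in a version-1 UUID can be recovered; version-4 UUIDs and Nano IDs carry no such timestamp.

import uuid
from datetime import datetime, timedelta

u = uuid.uuid1()
# For version-1 UUIDs, uuid.UUID.time is a 60-bit count of 100-ns intervals
# since the Gregorian epoch, 1582-10-15.
gregorian_epoch = datetime(1582, 10, 15)
created_at = gregorian_epoch + timedelta(microseconds=u.time // 10)
print(u, "was generated around", created_at)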

Does Redis keep all chars from prefixed kind of keys?

I may have on the order of 100 million long, but partially static, keys like:
someReal...LongStaticPrefix:12345
someReal...LongStaticPrefix:12
someReal...LongStaticPrefix:123456
Where only the last part of the key is dynamic, the rest is static.
Does Redis store each key in full, or does it create an internal alias or something like that?
Should I worry about storage or performance?
Or is it better if I create internal aliases for the keys to keep them short?
Redis does keep the whole key. This long prefix will impact your memory usage.
Given that Redis uses a hash map to store the keys, the performance impact is low. The hash map load factor is usually between 0.5 and 1, which means there is usually just one or two keys per hash slot. So the performance impact is the extra network payload for the long key, the longer effort to hash it, and the longer comparison against the one or two keys in the hash slot. It's likely negligible unless your prefix is really, really long.
Consider a shorter key prefix.
Before considering using a hash structure (HSET), consider if you are using redis cluster or if you may need to eventually. A single hash key cannot be sharded.
A minor optimization: consider making the static part a suffix rather than a prefix, so that comparisons within a hash slot chain hit the differing part of the key sooner.
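If you want to see the memory cost for yourself, here is a small, hedged experiment using redis-py's wrapper around the MEMORY USAGE command; the key names are made up for illustration and a local Redis instance is assumed.

import redis

r = redis.Redis(decode_responses=True)

# Same value, long vs. short prefix.
r.set("someRealLongStaticPrefixThatNeverChanges:12345", "x")
r.set("p:12345", "x")

print("long prefix :", r.memory_usage("someRealLongStaticPrefixThatNeverChanges:12345"), "bytes")
print("short prefix:", r.memory_usage("p:12345"), "bytes")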

Key types supported by Redis

What are the different key types supported by Redis? The documentation mentions all the various types (strings, set, hashmap, etc.) of values supported by Redis, but I couldn't quite find the key type information.
From redis documentation (Data types intro):
Redis keys
Redis keys are binary safe, this means that you can use any binary sequence as a key, from a string like "foo" to the content of a JPEG file. The empty string is also a valid key. A few other rules about keys:
Very long keys are not a good idea. For instance a key of 1024 bytes is a bad idea not only memory-wise, but also because the lookup of the key in the dataset may require several costly key-comparisons. Even when the task at hand is to match the existence of a large value, hashing it (for example with SHA1) is a better idea, especially from the perspective of memory and bandwidth.
Very short keys are often not a good idea. There is little point in writing "u1000flw" as a key if you can instead write "user:1000:followers". The latter is more readable and the added space is minor compared to the space used by the key object itself and the value object. While short keys will obviously consume a bit less memory, your job is to find the right balance.
Try to stick with a schema. For instance "object-type:id" is a good idea, as in "user:1000". Dots or dashes are often used for multi-word fields, as in "comment:1234:reply.to" or "comment:1234:reply-to".
The maximum allowed key size is 512 MB.
In my experience, "any binary sequence" typically means a string, but I may not be familiar with languages where you can achieve this using other data types.
Keys in Redis are all strings, so it doesn't really matter what kind of value you pass to a client. Under the hood, the RESP protocol is used and it will pass the value as a string to the engine.
Example:
ZADD some_key 1 some_value
some_key is always a string; even if you pass 3 as the key, it is handled as a string. This is true for every client.
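As a quick check of that point (assuming a local Redis instance and the redis-py client; the key and member names are made up), passing the integer 3 as a key ends up addressing exactly the same key as the string "3":

import redis

r = redis.Redis(decode_responses=True)

r.zadd(3, {"some_value": 1})   # key passed as the integer 3
print(r.zrange("3", 0, -1))    # ['some_value'] -- same key as the string "3"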

How to implement a scalable, unordered collection in DynamoDB?

I am looking into implementing a scalable unordered collection of objects on top of Amazon DynamoDB. So far the following options have been considered:
Use DynamoDB document data types (map, list) and use a document path to access stand-alone items. This has one obvious drawback: the collection is limited to 400KB of data, meaning perhaps 1..10K objects depending on their size. A less obvious drawback is that the cost of inserting a new object into such a collection is going to be huge: Amazon specifies that write capacity is deducted based on the total item size, not just the newly added object -- therefore ~400 capacity units for inserting a 1KB object when approaching the size limit. So I consider this ruled out.
Using a composite primary hash + range key, where the primary hash remains the same for all objects in the collection, and the range key is just something random or an atomic counter. The obvious drawback is that identical hash keys result in bad key distribution -- cardinality is low for collections with a large number of objects. This means bad partitioning and a scaling issue: all reads/writes on the same collection are stuck on one shard and become subject to the 3000 reads / 1000 writes per second limitation of a DynamoDB partition.
Using a global secondary index with a secondary hash + range key, where the hash key remains the same for all objects belonging to the same collection, and the range key is just something random or an atomic counter. Similar to the above, partitioning becomes poor for the GSI, and it will become a bottleneck, with too many identical hashes rapidly draining all of the capacity provisioned for the index. I didn't find how the GSI is implemented exactly, so I'm not sure how badly it suffers from low cardinality.
The question is whether I could live with (2) or (3) and suffer from non-ideal key distribution, whether there is another way of implementing the collection that I overlooked, or whether I should consider looking into another NoSQL database engine altogether.
This is a "shooting from the hip" answer, what you end up doing may depend on how much and what type of reading and writing you do.
Two things the dynamo docs encourage you to avoid are hot keys and, in general, scans. You noted that in cases (2) and (3), you end up with a hot key. If you expect this to scale (large collections), the hot key will probably hurt more and more, especially if this is a write-intensive application.
The docs on Query and Scan operations (http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html) say that, for a query, "you must specify the hash key attribute name and value as an equality condition." So if you want to avoid scans, this might still force your hand and put you back into that hot key situation.
Maybe one route would be to embrace doing a scan operation, but just have one table devoted to your collection. Then you could just have a fully random (well distributed) hash key and do a scan every time. This assumes you always want everything from the collection (you didn't say). This will still hurt if you scale up to a large collection, but if you always want the full set back, you'll have to deal with that pain regardless. If you just want a subset, you can add a limit parameter. This would help performance, but you will always get back the same subset (or you can use the last evaluated key and keep going). The docs also mention parallel scans.
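A hedged boto3 sketch of that scan route (one table per collection, paginating with Limit and LastEvaluatedKey); the table name and page size are placeholders:

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("my-collection")  # hypothetical table name

def scan_all(page_size=100):
    # Yield every item in the table, one page at a time.
    kwargs = {"Limit": page_size}
    while True:
        response = table.scan(**kwargs)
        for item in response.get("Items", []):
            yield item
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            break
        kwargs["ExclusiveStartKey"] = last_key

for item in scan_all():
    print(item)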
If you are using AWS, elasticache/redis might be another route to try? The first pass might code up a lot faster/cleaner than situation (1) that you mentioned.

Using GUID (or similar) has performance penalty in Redis?

Does using a GUID or ulong key impact Redis DB performance?
Similar: Does name length impact performance in Redis?
This question is an old one, but other answers are a bit misleading. Eric's answer is totally unrelated to Redis. Pfreixes's answer is based on personal assumptions and is simply wrong.
In fact, it's fairly safe to use GUID keys (performance-wise) as even 300+ character keys don't affect performance significantly on O(1) operations. Check this benchmark: Does name length impact performance in Redis?.
A GUID typically has a length of 32-36 chars in its hex representation. As Evan Carrol noticed in the comments, Redis strings are binary safe, so you can use the binary value and reduce the key size down to 128 bits (16 bytes). Keys of that length won't hurt performance at all.
Also, the documentation suggests using hashing functions for really large keys: http://redis.io/topics/data-types-intro
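A hedged sketch of the binary-key idea (assuming redis-py and a local instance): because Redis keys are binary safe, the raw 16 bytes of the GUID can be used as the key instead of its 36-character hex form.

import uuid
import redis

r = redis.Redis()  # no decode_responses, so keys and values stay as bytes

guid = uuid.uuid4()

r.set(guid.bytes, "some value")                  # 16-byte binary key
print(len(str(guid)), "chars in hex form")       # 36
print(len(guid.bytes), "bytes as a binary key")  # 16
print(r.get(guid.bytes))                         # b'some value'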
Redis uses a hash table to store all keys; every key is stored using a hash function. Essentially all of Redis's key-lookup performance comes down to this function, or something related to it.
The original key is also stored in order to resolve future collisions between different keys, and yes, big keys can have an impact on memory handling and everything related to it: memory fragmentation, cache hits/misses, etc.