Redis key design - redis

I'm wondering if the way we "design" keys in Redis can impact performance and scalability.
For example, if I store content related to "users" under keys like "user:<user_id>" and content related to say, groups, under keys like "group:<group_id>", all my keys will start with either "user:" or "group:".
Will this have a negative impact on the way Redis hashes keys internally?

There is no negative impact. Precisely the design you mention is recommended in the official Redis docs, which are quite clear on this:
Very long keys are not a good idea, for instance a key of 1024 bytes is a bad idea ...
but, read on:
Very short keys are often not a good idea. There is little point in writing "u1000flw" as a key if you can instead write "user:1000:followers". The latter is more readable and the added space is minor compared to the space used by the key object itself and the value object. While short keys will obviously consume a bit less memory, your job is to find the right balance.
Try to stick with a schema. For instance "object-type:id" is a good idea, as in "user:1000". Dots or dashes are often used for multi-word fields, as in "comment:1234:reply.to" or "comment:1234:reply-to".
(Emphases mine.)
See also: Redis key naming conventions?
As it is basically a hash table under the hood, there is nothing analogous to a SQL-style WHERE. That's where bad design could effect performance.

No, it shouldn't be any issue with prefixing your keys like that. Redis uses a hash table internally which in turn uses a proper hash function (one of the murmur hashes if I recall correctly) that won't budge by prefixes.

Related

redis - see if a string contains any key from a set of keys

I have a set of strings, which I was planning to store in a redis set. What I want to do is to check if any of these strings [s] is present inside a subject string ( say S1 ).
I read about SSCAN in redis but it allows me to search if any set member matches a pattern. I want the opposite way round. I want to check if any of the patterns matches my string. Is this possible?
What you want to do is not possible, but if your plan is to match prefixes, like in an autocomplete, you can take a look at sorted sets and ZRANGEBYLEX. You can take a look at this example. Even though it's not working right now, the code is very simple.
There are several ways to do it, it just depends how you want it done. SSCAN is a perfectly legitimate approach where you do the processing client-side and potentially over the network. Depending on your requirements, this may or may not be a good choice.
The opposite way is to let Redis do it for you, or as much as possible, to save on bandwidth, latency and client cpu. The trade off is, of course, having Redis' cpu do it so it may impact performance in some cases.
When it comes to letting Redis do the work, please first understand that the functionality you're describing is not included in it. So you need to build your own and, again, that depends on your specific use case (e.g. how big are s and S1, is S1 indexable as well, ...). Without this information it is hard to make accurate recommendations but the naive (mine) approach would be to use Lua for the job. The script's logic should either check all permutations of S1 for existence in s with SISMEMBER, or, do Lua pattern matching of all of s's members to S1.
This solution, of course, has plenty of room for optimization if some assumptions/rules are set.
Edit: Sorted sets and ZLEX* stuff are also possibly good for this kind of thing, as #Soveran points out. To augment his example and for further inspiration, see here for a reversed version and think of the possibilities :) I still can't understand how someone didn't go and implement FTS in Redis!

Using GUIDs for Custom Tables?

As far as I know, SAP CRM and HANA both utilise GUIDs to uniquely identify records instead of using classic incremented integers. Are there best practices or clear guidelines that cover their use?
Here are some factors I've considered in favour of GUIDs:
Offline creation of objects. IIRC GUIDs are near-guaranteed to be unique in these situations so merging or integration of disparate data sets is not an issue.
Surrogate keys have distinct development advantages. While incrementing integers are a form of surrogate key, use of different number sequences can impose a functional meaning on them.
And some scenarious that favour classic keys:
Users require human-readable keys to identify records in the system. This can be handled in GUID tables by also specifying an external ID with a readable value.
Users want to use number sequences to identify different types of records, similar to sales or purchase documents. Though I actually consider this bad design.
What scenarios for custom development would make you prefer GUIDs over classic keys?
Is blanket-usage of GUIDs for all tables a good idea?
To answer the question at the end: No, it isn’t (at least not in an ABAP environment, and I doubt it’s sensible elsewhere). Using GUIDs for primary keys everywhere makes it awfully hard to maintain and follow complex foreign key relationships at runtime. Just imagine having to debug a program that handles everything using GUIDs instead of the semantic keys you’re used to. And remember that the total length of the primary key may not exceed 255, and the total length of the primary key should not exceed 120 if you want to be able to transport table entries using fully qualified keys. Using GUIDs in composite keys blows the keys up unnecessarily, and using them as synthetics keys makes using foreign key relationships virtually impossible. So no, using GUIDs everywhere is not a good idea, especially not for configuration / customizing data.
It is however a good idea to use GUIDs in almost every place where you would have used a number range object in “old-school ABAP development”. GUIDs can be generated by the application server, while number ranges require network communication to the enqueuing server. (Yes, there is some buffering involved, but generally speaking, GUIDs are a lot faster and easier to handle). So unless you need your keys to follow a certain pattern, you should consider using a GUID. Even if you need some kind of sequential number for whatever business reasons, it might be sensible to use a GUID as the primary key and store the sequential number inside an (indexed) attribute to increase flexibility at development time.

Too many fields bad for elasticsearch index?

Let say I have a thousand keys, and I would want to store the associated values. The intuitive approach seems to be something like
{
"key1":"someval",
"key2":"someotherval",
...
}
Is this a bad design pattern for elasticsearch index to have thousands of keys? Would each keys introduced this way create overhead for every documents under the index?
If you know there is an upper limit to the number of keys you'll have, a few thousand fields is not a problem.
The problem is when you have an unbounded set of keys, e.g. when the key is derived from a value, as you'll have a continuously growing mapping and thus also cluster state. It can also lead to quirky searches.
This is a common enough question/issue that I dedicated a section to it in my article on Troubleshooting Elasticsearch searches, for Beginners.
In short, thousands of fields is no problem - not having control of the mapping is.
Elasticsearch is not ideal for 1000s of key-value pattern in a document. and if you want to update them in real-time or something, then try redis or riak for that.
If you have thousands of keys in a document/record, essentially they become fields and the value become the text and indexed.
From information-retrieval perspective with large data, it is advised to use fewer big fields than numerous small fields, for faster search performance.

Best way to store a small key-value list in Redis

I'm trying to use Redis as a primary database for a small game I'm making (mostly to mess around with programming and using Redis).
However I came across a scenario that I couldn't find an answer to:
I wish to store a list of the names of different maps that people can be on (not many of them) along with their id. Note: I never need to get the ID from the name.
The two ways I believe this can be done are either storing the information as a string or as a hash.
i.e:
1) String based:
set maps:0 "Main"
set maps:1 "Island"
etc (and maybe a maps:id to
store an auto increment value)
2) Hash based:
hset maps "0" "Main"
hset maps "1" "Island"
etc
My question is which way seems the best. Given that there will never be that many maps I'm leaning towards the single hashed object. Partially because this provides a nice method to return all the maps in existence. But is there any particular reason that the string based queries would be more useful.
Hopefully you can give me some clear information.
Thank you,
Pluckerpluck
The String based values are actually discouraged because it consumes a lot more memory than a hash.
Redis optimizes small hashes and encodes them in a memory efficient manner. This encoding is called zipmap (or ziplist in redis 2.6). See http://redis.io/topics/memory-optimization, specially the section "Use hashes when possible".

NHibernate and string primary keys

We have a legacy database that uses strings as primary keys. I want to implement objects on top of that legacy database to better implement some business logic and provide more functionality to the user.
I have read in places that using strings for primary keys on tables is bad. I'm wondering why this is? Is it because of the case-sensitivity issues? character sets?
... why is this particularly bad for NHibernate?
... and following up on that ... if strings do make bad primary keys, is it worth it to replace the primary keys in the database with ints or GUIDs or the like? (we only have about 25-30 tables involved)
Okay, I will have a stab at this. I will give a couple of quick caveats - I am not an expert on databases and my experience is with Hibernate (Java) rather than NHibernate, but here goes.
I think the issue of primary keys as strings is to do with the SQL data-type that is used to represent them in the database. Because the primary key is used all the time when inserting, querying and so on, the database engine has to spend lots of time comparing primary keys. If you are using numbers, these are simply stored as bytes which computers are really good at doing stuff with quickly. As soon as you start using strings, the cost of these operations (comparisons mainly) goes up significantly. Even if the database engine is using really neat strategies to compare keys, it will still always be faster to compare bytes as bytes rather than strings.
On modern hardware though, this is becoming much less an issue than it used to be, and with indexes the problem almost disappears.
I don't know for sure about why this is really bad in Hibernate (and NHibernate) but in my experience, because my application has a complex graph of objects that often have references to other persisted objects, often as lists or sets, the references are all stored using the ID of the other object, and because of the rules I have in place for cascading saves, fetching and so on, this will mean that the primary keys are being used ALL the time. Hibernate - which I quite like - tends to do exactly what its told to, and sometimes people (especially me!) tell it to do really dumb things. As a result, even seemingly simple updates or queries end up generating quite complex SQL.
So - in summary - strings as primary keys are bad due to cost of simple operations on them and using Hibernate may magnify this. In practice though, modern database engines have lots of neat strategies to ensure that the performance hit is not that bad. (Postgres - and presumably others - by default create indexes for primary keys)
For your follow up - should you replace your keys? Well, that depends on the performance of your application. If performance is critical, then for a high volume and very intensive application it may be a good idea, otherwise there will probably be minimal benefit, with the downside of having to spend time changing all your tables. You could expect to get much better results refining the strategies you are using with NHibernate (ie fetching strategies and when you are cascading saves and so on).
Andy K seems to imply that strings are not stored as bytes. That would be funny! In fact it all depends on how long the string PK is and what collation you use. It might be even faster than bigint or int identity and will almost definitely be faster than Guids. If these strings are something you'd have to search by anyway, then you would need an index (perhaps even clustered index) on them anyway, so why not make them PKs!
Using strings or chars adds a huge amount of accidental complexity to your system. Consider these questions:
how to handle case sensitivity;
how to handle padding. NHibernate lets you insert a shorter string, and the database will silently add padding to it, but it won't be reflected in your persisted entity. Trying to fetch the entity again with the in-memory ID returns null;
how to handle encoding issues. C# uses unicode strings, your database migth not. Can you tell how the conversion will be handled? I don't think so.
synthetic integer keys can be autogenerated by most databases without extra effort. With strings you most probably create them "by hand". Unless you hide them behind a Factory (in the DDD sense), the resulting code will clutter your domain model.
Though the performance overhead mentioned by andy K can diminish because of indexing, still many times you do ID comparisions in-memory (hash-maps?) and the DB optimizations do not apply there.
I have been working on a project with a legacy database having string primary keys and no foreign keys at all. We are not allowed to thouch the old schema because a legacy app depends on every minor aspects of it. I feel that the string primary keys hurt the consistency more than the missing foreign keys, since NHibernate handles the later quite gracefully.