Why does Redis use integer database numbers? - redis

Why does Redis use integer database numbers instead of strings? It seems like it would be trivial to keep a small internal data structure which maps strings to the “actual” integer.

the reason why Redis does not use strings as DB names but indexes is that the goal and ability of Redis databases is not to provide an outer level of dictionary: Redis dictionaries can't scale to many dictionaries, but just to a small number (it is a tradeoff), nor we want to provide nested data structures per design, so this are just "a few namespaces" and as a result using a numerical small index seemed like the best option.

Having named databases doesn't really fit the design goals of redis. For a start, in a system designed for maximum performance, adding a string lookup to every call isn't a great idea when most users put everything in DB 0 anyway.
Another one of the design goals is keeping the core simple - If a requested new command can be implemented by combining existing commands on the client without a huge performance penalty it won't get added to the core system. If you really need named databases, it is trivial to update your client code read a string and send a number to redis.

Related

Database (Azure or Oracle) - Most efficient accurate way of detecting incremental changes?

So in the past for our ETL processes we used to do a column by column lookup and comparison. This way worked for us for a while and no one bothered to go back to take a look at how to make it more efficient.
Recently we noticed that our job run times are starting to creep up so this led to some discussion and one suggestion to optimize our process was that we use "hashing".
From my limited time with Googling I am getting some mixed messages about the benefits to hashing.
Now I know by doing a column by column look up it seems to be a tried and true method, simple to understand/implement and is relatively accurate.
With hashing - How would this be any better?
Logically I would think that instead of comparing multiple rows & columns with hashing we could just be comparing multiple rows and one single column for example.
But there is mention that there could be "clashes" or "collision" with hashing - So could this lead to inaccuracies or challenges trouble shooting?
Before we start tearing things apart I just wanted to see if I may have missed anything
EDIT 2022-07-16
When I refer to Azure I used it a bit generically as we are planning on using Azure Data Factory and Azure Synapse for our data movement/ETL/ELT. So we would be using BOTH Azure SQL Server and also Azure Data Lake Gen2 (and eventually other Azure services). Its quite a few moving pieces, in particular with ADLS - One item we were trying to figure out is how we can compare differences between "files". That led us down a rabbit hole of "can we use hashing? What is hashing?"
Hashing is normally better as you only compare one column, that contains the hash, rather than every column.
Collisions can occur with hashes but very rarely. and it depends on the hash function you use. You would need to research your chosen hash function, determine under what conditions you might get collisions and then decide whether that was a concern given your particular circumstances

What's the Point of Multiple Redis Databases?

So, I've come to a place where I wanted to segment the data I store in redis into separate databases as I sometimes need to make use of the keys command on one specific kind of data, and wanted to separate it to make that faster.
If I segment into multiple databases, everything is still single threaded, and I still only get to use one core. If I just launch another instance of Redis on the same box, I get to use an extra core. On top of that, I can't name Redis databases, or give them any sort of more logical identifier. So, with all of that said, why/when would I ever want to use multiple Redis databases instead of just spinning up an extra instance of Redis for each extra database I want? And relatedly, why doesn't Redis try to utilize an extra core for each extra database I add? What's the advantage of being single threaded across databases?
You don't want to use multiple databases in a single redis instance. As you noted, multiple instances lets you take advantage of multiple cores. If you use database selection you will have to refactor when upgrading. Monitoring and managing multiple instances is not difficult nor painful.
Indeed, you would get far better metrics on each db by segregation based on instance. Each instance would have stats reflecting that segment of data, which can allow for better tuning and more responsive and accurate monitoring. Use a recent version and separate your data by instance.
As Jonaton said, don't use the keys command. You'll find far better performance if you simply create a key index. Whenever adding a key, add the key name to a set. The keys command is not terribly useful once you scale up since it will take significant time to return.
Let the access pattern determine how to structure your data rather than store it the way you think works and then working around how to access and mince it later. You will see far better performance and find the data consuming code often is much cleaner and simpler.
Regarding single threaded, consider that redis is designed for speed and atomicity. Sure actions modifying data in one db need not wait on another db, but what if that action is saving to the dump file, or processing transactions on slaves? At that point you start getting into the weeds of concurrency programming.
By using multiple instances you turn multi threading complexity into a simpler message passing style system.
In principle, Redis databases on the same instance are no different than schemas in RDBMS database instances.
So, with all of that said, why/when would I ever want to use multiple
Redis databases instead of just spinning up an extra instance of Redis
for each extra database I want?
There's one clear advantage of using redis databases in the same redis instance, and that's management. If you spin up a separate instance for each application, and let's say you've got 3 apps, that's 3 separate redis instances, each of which will likely need a slave for HA in production, so that's 6 total instances. From a management standpoint, this gets messy real quick because you need to monitor all of them, do upgrades/patches, etc. If you don't plan on overloading redis with high I/O, a single instance with a slave is simpler and easier to manage provided it meets your SLA.
Even Salvatore Sanfilippo (creator of Redis) thinks it's a bad idea to use multiple DBs in Redis. See his comment here:
https://groups.google.com/d/topic/redis-db/vS5wX8X4Cjg/discussion
I understand how this can be useful, but unfortunately I consider
Redis multiple database errors my worst decision in Redis design at
all... without any kind of real gain, it makes the internals a lot
more complex. The reality is that databases don't scale well for a
number of reason, like active expire of keys and VM. If the DB
selection can be performed with a string I can see this feature being
used as a scalable O(1) dictionary layer, that instead it is not.
With DB numbers, with a default of a few DBs, we are communication
better what this feature is and how can be used I think. I hope that
at some point we can drop the multiple DBs support at all, but I think
it is probably too late as there is a number of people relying on this
feature for their work.
I don't really know any benefits of having multiple databases on a single instance. I guess it's useful if multiple services use the same database server(s), so you can avoid key collisions.
I would not recommend building around using the KEYS command, since it's O(n) and that doesn't scale well. What are you using it for that you can accomplish in another way? Maybe redis isn't the best match for you if functionality like KEYS is vital.
I think they mention the benefits of a single threaded server in their FAQ, but the main thing is simplicity - you don't have to bother with concurrency in any real way. Every action is blocking, so no two things can alter the database at the same time. Ideally you would have one (or more) instances per core of each server, and use a consistent hashing algorithm (or a proxy) to divide the keys among them. Of course, you'll loose some functionality - piping will only work for things on the same server, sorts become harder etc.
Redis databases can be used in the rare cases of deploying a new version of the application, where the new version requires working with different entities.
I know this question is years old, but there's another reason multiple databases may be useful.
If you use a "cloud Redis" from your favourite cloud provider, you probably have a minimum memory size and will pay for what you allocate. If however your dataset is smaller than that, then you'll be wasting a bit of the allocation, and so wasting a bit of money.
Using databases you could use the same Redis cloud-instance to provide service for (say) dev, UAT and production, or multiple instances of your application, or whatever else - thus using more of the allocated memory and so being a little more cost-effective.
A use-case I'm looking at has several instances of an application which use 200-300K each, yet the minimum allocation on my cloud provider is 1M. We can consolidate 10 instances onto a single Redis without really making a dent in any limits, and so save about 90% of the Redis hosting cost. I appreciate there are limitations and issues with this approach, but thought it worth mentioning.
I am using redis for implementing a blacklist of email addresses , and i have different TTL values for different levels of blacklisting , so having different DBs on same instance helps me a lot .
Using multiple databases in a single instance may be useful in the following scenario:
Different copies of the same database could be used for production, development or testing using real-time data. People may use replica to clone a redis instance to achieve the same purpose. However, the former approach is easier for existing running programs to just select the right database to switch to the intended mode.
Our motivation has not been mentioned above. We use multiple databases because we routinely need to delete a large set of a certain type of data, and FLUSHDB makes that easy. For example, we can clear all cached web pages, using FLUSHDB on database 0, without affecting all of our other use of Redis.
There is some discussion here but I have not found definitive information about the performance of this vs scan and delete:
https://github.com/StackExchange/StackExchange.Redis/issues/873

Best way to store a small key-value list in Redis

I'm trying to use Redis as a primary database for a small game I'm making (mostly to mess around with programming and using Redis).
However I came across a scenario that I couldn't find an answer to:
I wish to store a list of the names of different maps that people can be on (not many of them) along with their id. Note: I never need to get the ID from the name.
The two ways I believe this can be done are either storing the information as a string or as a hash.
i.e:
1) String based:
set maps:0 "Main"
set maps:1 "Island"
etc (and maybe a maps:id to
store an auto increment value)
2) Hash based:
hset maps "0" "Main"
hset maps "1" "Island"
etc
My question is which way seems the best. Given that there will never be that many maps I'm leaning towards the single hashed object. Partially because this provides a nice method to return all the maps in existence. But is there any particular reason that the string based queries would be more useful.
Hopefully you can give me some clear information.
Thank you,
Pluckerpluck
The String based values are actually discouraged because it consumes a lot more memory than a hash.
Redis optimizes small hashes and encodes them in a memory efficient manner. This encoding is called zipmap (or ziplist in redis 2.6). See http://redis.io/topics/memory-optimization, specially the section "Use hashes when possible".

Ruby: To make a string cache, which is faster: Array or SQL?

I'm making a ruby server with a cache of info about the online clients. This info should be preserved when I turn off the server. I can store it in a simple array and save it with Marshal or I can use a SQL database (MySQL, probably). Which is better to use? I think the Array method is easly, but the SQL is faster than? Thanks!
It depends on scalability requirements. If you expect thousands of records, you should use SQL or another DB, although this imposes developer overhead. If you're dealing with a small number, however, you could get by with just serializing objects and saving them to disk.
Actually I would expect storing the serialized array to be significantly faster, as no indexing or additional row allocation needs to take place. I think it all depends on whether you want to be able to perform queries on the information. If not, you don't really need a database, you just need persistence. You might as well write the cache to a file then.

Which would be better? Storing/access data in a local text file, or in a database?

Basically, I'm still working on a puzzle-related website (micro-site really), and I'm making a tool that lets you input a word pattern (e.g. "r??n") and get all the matching words (in this case: rain, rein, ruin, etc.). Should I store the words in local text files (such as words5.txt, which would have a return-delimited list of 5-letter words), or in a database (such as the table Words5, which would again store 5-letter words)?
I'm looking at the problem in terms of data retrieval speeds and CPU server load. I could definitely try it both ways and record the times taken for several runs with both methods, but I'd rather hear it from people who might have had experience with this.
Which method is generally better overall?
The database will give you the best performance with the least amount of work. The built in index support and query analyzers will give you good performance for free while a textfile might give you excellent performance for a ton of work.
In the short term, I'd recommend creating a generic interface which would hide the difference between a database and a flat-file. Later on, you can benchmark which one will provide the best performance but I think the database will give you the best bang per hour of development.
For fast retrieval you certainly want some kind of index. If you don't want to write index code yourself, it's certainly easiest to use a database.
If you are using Java or .NET for your app, consider looking into db4o. It just stores any object as is with a single line of code and there are no setup costs for creating tables.
Storing data in a local text file (when you add new records to end of the file) always faster then storing in database. So, if you create high load application, you can save the data in a text file and copy data to a database later. However in most application you should use a database instead of text file, because database approach has many benefits.