I'm considering a Redis-backed web application, but I'm not sure about the hardware requirements for Redis, or really how precisely Redis "works"

I'm new to Redis, as in... I'm really not sure how it works. But I'm considering it for a web app with a relatively uncomplicated data structure that could benefit from Redis' speed. The thing is, this app could end up with many millions of rows. Since Redis is "in-memory" and "disk backed", does that mean I'm going to need enough memory to support these millions of rows of values? Or does it only load into memory values that are recently or commonly accessed?
What sort of hardware requirements am I looking at? Does anyone have any real-world examples of Redis and hardware usage?

Redis handles memory in a great way. There are a few things to point out first. Redis will use less memory if compiled on a 32-bit system, but the maximum memory usage is then 4GB. As for your hardware requirements, it all depends on what kind of data you are storing. If you are storing a million keys but each only holds an 8-character string, the memory usage will be a lot lower than with a 16-character string. The bottom line: if you are storing 1 million keys in memory, the ballpark memory usage might be around 1GB. I say might because there are a number of factors. Do you have virtual memory? Are all the keys accessed often? How big are the keys? There is a great post here that describes ways to improve Redis memory usage.
If you use the virtual memory (disk) backend, then only the most frequently accessed keys are kept in memory. You might have 1GB of data, but only 100MB of it held in memory. See here for more info.
For hardware: lots of RAM.
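If you want a concrete number rather than a ballpark, write one representative key and measure it. A minimal sketch, assuming Redis 4.0+ (for MEMORY USAGE) and the redis-py client; the key name and value are placeholders for your real data shape:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# Write a sample key shaped like your real data.
r.set("user:12345", "8charstr")

# MEMORY USAGE reports the bytes Redis attributes to the key and its value.
per_key_bytes = r.memory_usage("user:12345")
print(f"approx. bytes per key: {per_key_bytes}")

# Extrapolate to 1 million keys; this ignores allocator fragmentation and
# global overhead, so treat it as a lower bound.
print(f"estimated total: {per_key_bytes * 1_000_000 / 1024**2:.0f} MB")
```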

Related

Which common database library will rack up the least cost (e.g. from memory and CPU usage) on Google Cloud Run and similar services?

I want to make a CRUD API (create-read-update-delete) by which users can interact with a key-value store database. It'll be hosted on Cloud Run (e.g. see this example) or a similar service, running all day to serve requests.
All data will have a short TTL (time-to-live) around 1 minute, and keys and values will just be short strings. Furthermore, speed, security, redundancy etc. aren't concerns (within reason).
In this case, which common database backend will be the cheapest in terms of its CPU and memory usage? I was thinking of using Redis, but I'm worried that it might be unnecessarily CPU/memory intensive compared to, say, SQLite, PostgreSQL, etc.
Or is it the case that basically all these database libraries will have similar CPU/memory usage?
Edit:
Keys are 256-bit numbers, and values are <140-character strings. Every minute, a user requests to write/read from at most 100 of these, and let's just say there's 100k users.
Redis would do fine for this kind of use case. An RDBMS would also do the job, but from what you explained you don't need a relational database for this, since your data is key/value. Redis is super fast for this case, and with good data modeling you can reduce the memory usage.
Since your requirements are key/value and the keys/values have reasonable sizes, you can take advantage of Redis hashes. In addition, since you don't need persistent storage, you can use EXPIRE to manage your memory usage easily. Redis's benchmark tool can help you benchmark both strings and hashes to decide which one uses less memory.
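To make that concrete, here is a minimal sketch of both options, assuming the redis-py client; the key and bucket names are made up. Note that EXPIRE applies to whole keys, so a hash "bucket" should only group entries that are allowed to expire together:

```python
import redis

r = redis.Redis()

# Option 1: one string key per entry, TTL set atomically with the write.
r.set("kv:3f2a9c...e1", "a value under 140 characters", ex=60)

# Option 2: pack many small entries into one hash bucket to save per-key
# overhead; the TTL is set on the bucket, e.g. one bucket per minute.
bucket = "kv:bucket:2024-01-01T12:05"
r.hset(bucket, "3f2a9c...e1", "a value under 140 characters")
r.expire(bucket, 60)

print(r.hget(bucket, "3f2a9c...e1"))
```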
A couple of hours ago I answered a question about reducing Redis memory usage by using hashes instead of strings here; it may give some insight.

Redis vs Aerospike use cases?

After going through a couple of resources on Google and Stack Overflow (mentioned below), I have a high-level understanding of when to use what, but I still have a couple of questions.
My understanding:
(1) When used as pure in-memory databases, both have comparable performance. But for big data, where the complete dataset cannot fit in memory (or can fit, but only at increased cost), AS (Aerospike) can be a good fit, as it provides a mode where indexes are kept in memory and the data on SSD. I believe performance will be somewhat degraded compared to a completely in-memory database (though the way AS handles reads/writes from SSD makes it faster than traditional disk I/O), but it saves cost and performs better than keeping the complete dataset on disk. So when the complete dataset fits in memory both can be equally good, but when memory is a constraint AS can be the better fit. Is that right?
(2) It is also said that AS provides a rich and easy-to-set-up clustering feature, whereas some of the clustering in Redis needs to be handled at the application level. Does that still hold, or was it only true until a couple of years ago (I believe so, as I see Redis also provides a clustering feature)?
How is aerospike different from other key-value nosql databases?
What are the use cases where Redis is preferred to Aerospike?
Your assumption in (1) is off, because it applies to (mostly) synthetic situations where all data fits in memory. What happens when you have a system that grows to many terabytes or even petabytes of data? Would you want to try to fit that data in a very expensive, hard-to-manage, fully in-memory system spanning many nodes? A modern machine can hold a lot more SSD/NVMe storage than RAM. If you look at the new i3en instance family from Amazon EC2, the i3en.24xlarge has 768GB of RAM and 60TB of NVMe storage (8 x 7.5TB). That kind of machine works very well with Aerospike, as it only stores the indexes in memory. A very large amount of data can be stored on a small cluster of such dense nodes and still perform exceptionally well.
Aerospike is used in the real world in clusters that have grown to hundreds of terabytes or even petabytes of data (tens to hundreds of billions of objects), serving millions of operations per-second, and still hitting sub-millisecond to single digit millisecond latencies. See https://www.aerospike.com/summit/ for several talks on that topic.
Another aspect affecting (1) is the fact that the performance of a single instance of Redis is misleading if in reality you'll be deploying on multiple servers, each running multiple instances of Redis. Redis isn't a distributed database the way Aerospike is - it requires application-side sharding (which becomes a bit of a clustering and horizontal-scaling nightmare) or a separate proxy, which often ends up being the bottleneck. It's great that a single shard can do a million operations per second, but if the proxy can't handle the combined throughput, and competes with the shards for CPU and memory, there's more to the performance-at-scale picture than just in-memory versus data on SSD.
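To illustrate what application-side sharding means in practice, here is a toy sketch in Python with redis-py. The host names are hypothetical, and a real deployment would use consistent hashing or Redis Cluster rather than this naive modulo scheme, which reshuffles keys whenever a shard is added or removed:

```python
import zlib
import redis

# One client per Redis instance; hosts are made up for this example.
SHARDS = [
    redis.Redis(host="redis-a", port=6379),
    redis.Redis(host="redis-b", port=6379),
    redis.Redis(host="redis-c", port=6379),
]

def shard_for(key: str) -> redis.Redis:
    # A stable hash of the key decides which instance owns it;
    # every piece of application code must route through this.
    return SHARDS[zlib.crc32(key.encode()) % len(SHARDS)]

shard_for("user:42").set("user:42", "payload")
print(shard_for("user:42").get("user:42"))
```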
Unless you're looking at a tiny amount of objects or a small amount of data that isn't likely to grow, you should probably compare the two for yourself with a proof-of-concept test.

How to use chronicle-map instead of redis as a data cache

I intend to use chronicle-map instead of Redis. The scenario: every day a memoryData module loads hundreds of millions of records from the database into chronicle-map, and dozens of JVMs continuously read records from it. Each JVM has hundreds of threads. But, probably because of my lack of understanding of chronicle-map, the code performs poorly, runs slower and slower, and eventually overflows memory. I wonder whether the above practice is the correct use of chronicle-map.
Because Chronicle Map stores your data off-heap, it is able to store more data than you can hold in main memory, but it will perform better if all the data can fit into memory (so if possible, consider increasing your machine's memory; if that isn't possible, try to use an SSD drive). Another reason for poor performance may be how you have sized the map in the Chronicle Map builder, for example how you have set the maximum number of entries; if this is too large it will affect performance.

As Redis keeps data in memory, what is the maximum size of data that can be stored in Redis?

I don't know much about Redis, but what I know is that Redis stores data in key-value format in memory, and the data is also persisted to disk at intervals.
So I want to know: let's say the RAM is 10GB; can we store more than 10GB of data in Redis?
In fact, I am not very clear on how Redis uses disk and memory for storage.
From the Redis FAQ:
Redis is an in-memory but persistent on disk database, so it represents a different trade off where very high write and read speed is achieved with the limitation of data sets that can't be larger than memory.
So, unfortunately, no, your data size is limited to the amount of RAM you've allowed Redis to use.
The situation is even worse. If you have 10GB of RAM then in practice you can store only about 6-7GB. There are at least two reasons:
Redis has a certain memory overhead per data item.
Redis forks in order to take snapshots, which results in additional memory being allocated for all of the pages that change during that process.
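A practical consequence is that you should cap Redis well below physical RAM and decide what happens at the limit. A small sketch, assuming redis-py; the 8gb value is just an example, and the same settings can live in redis.conf instead:

```python
import redis

r = redis.Redis()

# Leave headroom below physical RAM for per-item overhead and the
# copy-on-write memory used by the snapshot fork.
r.config_set("maxmemory", "8gb")
# Evict least-recently-used keys instead of rejecting writes at the limit.
r.config_set("maxmemory-policy", "allkeys-lru")

info = r.info("memory")
print(info["used_memory_human"], info["maxmemory"])
```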

Why does a 500MB Redis dump.rdb file take about 5.0GB of memory?

Actually, I have 3 Redis instances and I put them together into this 500MB+ dump.rdb. The Redis server can read this dump.rdb, and everything seems to be ok. But then I noticed that redis-server uses more than 5.0GB of memory. I don't know why.
Is there anything wrong with my file? My db has about 3 million keys, and the value for each key is a list containing about 80 integers.
I used this method to put the 3 instances together.
PS: Another dump.rdb with the same size and the same key-value structure uses only 1GB of memory.
My data looks like keyNum -> {num1, num2, num3, ...}. All numbers are between 1 and 4,000,000. So should I use a List to store them? For now, I use lpush(k, v). Does this approach cost too much memory?
The ratio of memory to dump size depends on the data types Redis uses internally.
For small objects (hashes, lists and sorted sets), Redis uses ziplists to encode data. For small sets made of integers, Redis uses intsets. Ziplists and intsets are stored on disk in the same format as they are stored in memory. So, you'd expect a 1:1 ratio if your data uses these encodings.
For larger objects, the in-memory representation is completely different from the on-disk representation. The on-disk format is compressed, doesn't have pointers, doesn't have to deal with memory fragmentation. So, if your objects are large, a 10:1 memory to disk ratio is normal and expected.
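You can check which encoding Redis actually picked for your keys with OBJECT ENCODING. A small sketch, assuming redis-py; the key names are illustrative, and the compact encodings (intset, ziplist/listpack) are the ones with the roughly 1:1 memory-to-dump ratio:

```python
import redis

r = redis.Redis()
r.delete("small_set", "large_set")

r.sadd("small_set", *range(100))       # a few small integers
r.sadd("large_set", *range(100_000))   # past the compact-encoding threshold

print(r.object("encoding", "small_set"))  # b'intset'
print(r.object("encoding", "large_set"))  # b'hashtable' -- much larger in RAM
```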
If you want to know which objects eat up memory, use redis-rdb-tools to profile your data (disclaimer: I am the author of this tool). From there, follow the memory optimization notes on redis.io, as well as the memory optimization wiki entry on redis-rdb-tools.
There may be more to it, but I believe Redis compresses the dump files.