Why does a 500MB Redis dump.rdb file take about 5.0GB of memory?

I have 3 Redis instances and I merged them into this 500MB+ dump.rdb. The Redis server can read this dump.rdb and everything seems to be OK, but then I noticed that redis-server uses more than 5.0GB of memory, and I don't know why.
Is there anything wrong with my file? My DB has about 3 million keys, and the value for each key is a list containing about 80 integers.
I used this METHOD to put the 3 instances together.
PS: Another dump.rdb with the same size and the same key-value structure uses only 1GB of memory.
My data looks like keyNum -> {num1, num2, num3, ...}. All the numbers are between 1 and 4,000,000. Should I use a List to store them? For now I use lpush(k, v). Does this cost too much memory?

The ratio of memory to dump size depends on the data types Redis uses internally.
For small objects (hashes, lists and sorted sets), Redis uses ziplists to encode the data. For small sets made of integers, Redis uses intsets. Ziplists and intsets are stored on disk in the same format as they are stored in memory, so you'd expect a roughly 1:1 ratio if your data uses these encodings.
For larger objects, the in-memory representation is completely different from the on-disk representation. The on-disk format is compressed, doesn't have pointers, doesn't have to deal with memory fragmentation. So, if your objects are large, a 10:1 memory to disk ratio is normal and expected.
If you want to know which objects eat up memory, use redis-rdb-tools to profile your data (disclaimer: I am the author of this tool). From there, follow the memory optimization notes on redis.io, as well as the memory optimization wiki entry on redis-rdb-tools.
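To see which encoding Redis actually picked for keys like the ones described in the question, you can ask the server directly. Below is a minimal sketch, assuming the Jedis client and a local Redis instance; the key names are made up for illustration.

    import redis.clients.jedis.Jedis;

    public class EncodingCheck {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                String listKey = "keyNum:42"; // hypothetical key, ~80 small integers as a list
                jedis.del(listKey);
                for (int i = 1; i <= 80; i++) {
                    jedis.rpush(listKey, String.valueOf(i));
                }
                // Depending on the Redis version this prints e.g. "ziplist", "quicklist"
                // or "listpack"; the compact encodings keep memory close to the dump size.
                System.out.println(jedis.objectEncoding(listKey));

                // A small set of integers is "intset"-encoded, which is also close to 1:1.
                String setKey = "keySet:42";
                jedis.del(setKey);
                jedis.sadd(setKey, "1", "2", "3");
                System.out.println(jedis.objectEncoding(setKey));
            }
        }
    }

If the encoding comes back as one of the compact forms, the in-memory size should stay close to the dump size; the thresholds that control when Redis falls back to the larger encodings are tunable in redis.conf.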

There may be more to it, but I believe Redis compresses the dump files.

Related

Why can't I store un-serialized data structures on disk the same way I can store them in memory?

Firstly, I am assuming that data structures, like a hash map for example, can only be stored in memory but not on disk unless they are serialized. I want to understand why not.
What is holding us back from dumping a block of memory which stores the data structure directly onto disk without any modifications?
Something like JSON could be thought of as a "serialized" Python dictionary. We can very well store JSON in files, so why not a dict?
You may ask how I would represent non-string values like bools/objects on disk? I can argue "the same way you store them in memory". Am I missing something here?
Naming a few problems:
Big endian vs. little endian makes reading data from disk depend on the architecture of the CPU, so if you just dumped it you won't be able to read it again on a different device (see the sketch after this list).
Items are not contiguous in memory; a list (or dictionary), for example, only contains pointers to things that exist "somewhere" in memory. You can only dump contiguous memory; otherwise you are only storing the locations in memory that the data happened to be at, which won't be the same when you load the program again.
The way structures are laid out in memory can change between two compiled versions of the same program, so if you just recompile your application, you may get a different layout for structures in memory and you have just lost your data.
Different versions of the same application may wish to change the shape of the structures to allow extra functionality; this won't be possible if the data shape on disk is the same as in memory (which is one of the reasons why you shouldn't be using pickle for portable data storage, despite it using a memory serializer).
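To make the endianness point concrete, here is a small, self-contained illustration (not from the original answer): the same integer turns into different byte sequences depending on the byte order chosen, so a raw memory dump written on one machine is not guaranteed to be readable on another.

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.util.Arrays;

    public class EndiannessDemo {
        public static void main(String[] args) {
            int value = 0x01020304;
            // The same value, laid out in the two possible byte orders.
            byte[] big = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(value).array();
            byte[] little = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
            System.out.println(Arrays.toString(big));    // [1, 2, 3, 4]
            System.out.println(Arrays.toString(little)); // [4, 3, 2, 1]
            // A serialization format picks one byte order and documents it; a raw dump
            // of whatever happens to be in RAM does not.
        }
    }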

How to use chronicle-map instead of redis as a data cache

I intend to use chronicle-map instead of Redis. The scenario is that a memoryData module loads hundreds of millions of records from the database into chronicle-map every day, and dozens of JVMs (each with hundreds of threads) continuously read records from that chronicle-map. But probably because of my lack of understanding of chronicle-map, the code performs poorly, runs slower and slower, and eventually runs out of memory. I wonder whether the above practice is the correct way to use chronicle-map.
Because Chronicle Map stores your data off-heap, it is able to store more data than you can hold in main memory, but it will perform better if all the data fits into memory (so if possible, consider increasing your machine's memory; if that is not possible, try using an SSD drive). Another reason for poor performance may be how you have sized the map in the Chronicle Map builder, for example how you have set the maximum number of entries; if this is too large it will affect performance.
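For reference, sizing is done up front in the builder. The following is a minimal sketch assuming Chronicle Map 3.x; the key/value types, average sizes, entry count and file path are illustrative assumptions, not the poster's actual code.

    import java.io.File;
    import java.io.IOException;
    import net.openhft.chronicle.map.ChronicleMap;

    public class DailyCache {
        public static void main(String[] args) throws IOException {
            // entries() should be a realistic upper bound on the number of records;
            // a value that is far too large hurts performance and wastes memory.
            ChronicleMap<CharSequence, CharSequence> cache = ChronicleMap
                    .of(CharSequence.class, CharSequence.class)
                    .entries(300_000_000L)        // assumed record count
                    .averageKeySize(16)           // assumed average key length in bytes
                    .averageValueSize(120)        // assumed average value length in bytes
                    .createPersistedTo(new File("/data/daily-cache.dat")); // off-heap, memory-mapped file
            try {
                cache.put("some-key", "some-value");
                System.out.println(cache.get("some-key"));
            } finally {
                cache.close();
            }
        }
    }

Because the map is persisted to a memory-mapped file, multiple JVMs on the same machine can open the same map, which is the usual way to share it between the reader JVMs described in the question.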

As Redis keeps data in memory, what is the maximum size of data that can be stored in Redis?

I don't know much about Redis, but what I know is that Redis stores data in memory in a key-value format, and the data is also persisted to disk at intervals.
So I want to know: let's say the RAM is 10GB; can we store more than 10GB of data in Redis?
In fact, I am not very clear on how Redis uses disk and memory for storage.
From the Redis FAQ:
Redis is an in-memory but persistent on disk database, so it represents a different trade off where very high write and read speed is achieved with the limitation of data sets that can't be larger than memory.
So, unfortunately, no: your data size is limited to the amount of RAM you've allowed Redis to use.
The situation is even worse. If you have 10GB of RAM, then in practice you can store only about 6-7GB. This is for at least two reasons:
Redis has a certain memory overhead per data item.
Redis forks in order to take snapshots, which results in the allocation of additional memory for all of the pages that change during that process (see the sketch below).
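Here is a hedged sketch of how you might observe both effects from a client, assuming the Jedis client and a local instance; the maxmemory value is an example, and the field names come from the standard INFO MEMORY output.

    import redis.clients.jedis.Jedis;

    public class MemoryCheck {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                // Cap the dataset and decide what happens when the cap is reached.
                jedis.configSet("maxmemory", "8gb");          // example cap
                jedis.configSet("maxmemory-policy", "noeviction");

                // used_memory includes the per-key overhead on top of the raw data;
                // used_memory_rss also reflects fragmentation and the copy-on-write
                // pages that appear while a forked child writes a snapshot.
                String info = jedis.info("memory");
                for (String line : info.split("\r\n")) {
                    if (line.startsWith("used_memory:") || line.startsWith("used_memory_rss:")) {
                        System.out.println(line);
                    }
                }
            }
        }
    }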

I'm considering a Redis-backed web application, but I'm not sure about the hardware requirements for Redis, or, really, precisely how Redis "works"?

I'm new to Redis, as in... I'm really not sure how it works. But I'm considering it for a web app with a relatively uncomplicated data structure that could benefit from Redis' speed. The thing is, this app could end up with millions and millions of rows. Since Redis is "in-memory" and "disk backed", does that mean I'm going to need enough memory to support these millions of rows of values? Or does it only load values into memory that are recently or commonly accessed?
What sort of hardware requirements am I looking at? Does anyone have any real-world examples of Redis and hardware usage?
Redis handles memory in a great way. There are a few things to point out first. Redis will use less memory if compiled on a 32-bit system, but the maximum memory usage is then 4GB. As for your hardware requirements, it all depends on what kind of data you are storing. If you are storing a million keys that each hold only an 8-character string, the memory usage will be a lot lower than with 16-character strings. The bottom line: if you are storing 1 million keys in memory, the ballpark memory usage might be around 1GB. I say might because there are a number of factors: do you have virtual memory? Are all the keys accessed often? How big are the keys? There is a great post here that describes ways to improve Redis memory usage (one such technique is sketched after this answer).
If you use the disk backend, then only the most frequently accessed keys will be stored in memory. You might have 1GB of data, but only 100MB could be stored in memory. See here for more info.
For hardware: lots of RAM.
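One widely used technique for cutting per-key overhead, along the lines of the memory-optimization material mentioned above, is to group many small keys into hashes so they stay in Redis' compact hash encoding. A minimal sketch, assuming the Jedis client; the bucketing scheme and key names are made up for illustration.

    import redis.clients.jedis.Jedis;

    public class HashBuckets {
        // e.g. id 1234567 -> hash "user:12345", field "67".
        // Buckets of 100 fields stay below the default hash-max-ziplist-entries limit.
        static String bucket(long id) { return "user:" + (id / 100); }
        static String field(long id)  { return String.valueOf(id % 100); }

        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                long id = 1_234_567L;
                jedis.hset(bucket(id), field(id), "short value");
                System.out.println(jedis.hget(bucket(id), field(id)));
                // hash-max-ziplist-entries / hash-max-ziplist-value (listpack variants on
                // newer versions) control when a hash falls back to the larger encoding.
            }
        }
    }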

Saving large objects to file

I'm working on a project in Objective-C where I need to work with large quantities of data stored in an NSDictionary (it's around ~2GB in RAM at most). After all the computations that I perform on it, it seems like it would be quicker to save/load the data when needed (versus re-parsing the original file).
So I started to look into saving large amounts of data. I've tried using NSKeyedUnarchiver and [NSDictionary writeToFile:atomically:], but both failed with malloc errors (Can not allocate ____ bytes).
I've looked around SO, Apple's dev forums and Google, but was unable to find anything. I'm wondering if it might be better to create the file bit by bit instead of all at once, but I can't find a way to append to an existing file. I'm not completely opposed to saving to a bunch of small files, but I would much rather use one big file.
Thanks!
Edited to include more information: I'm not sure how much overhead NSDictionary gives me, as I don't take all the information from the text files. I have a 1.5GB file (of which I keep about half), and it turns out to be around 900MB to 1GB in RAM. There will be some more data that I need to add eventually, but it will be constructed with references to what's already loaded into memory; it shouldn't double the size, but it may come close.
The data is all serial, and could be separated in storage, but needs to all be in memory for execution. I currently have integer/string pairs, and will eventually end up with string/strings pairs (with all the values also being a key for a different set of strings, so the final storage requirements will be the same strings that I currently have, plus a bunch of references).
In the end, I will need to associate ~3 million strings with some other set of strings. However, the only important thing is the relationship between those strings - I could hash all of them, but NSNumber (as NSDictionary needs objects) might give me just as much overhead.
NSDictionary isn't going to give you the scalable storage that you're looking for, at least not for persistence. You should implement your own type of data structure/serialisation process.
Have you considered using an embedded SQLite database? Then you could process the data while only loading a fragment of the data structure at a time.
If you can, rebuilding your application in 64-bit mode will give you a much larger heap space.
If that's not an option for you, you'll need to create your own data structure and define your own load/save routines that don't allocate as much memory.