For apache Ignite, when a key-value cache is spilled to disk is it spilled a key at a time or will it spill part of the value to disk?
In addition, for IgniteCache.localentries() will that automatically read all the values from disk or is there a way to traverse the in-memory ones first?
Eviction happens on per-entry basis. E.g., you can't remove only half of the value from memory.
As for localEntries method, its behavior depends on provided CachePeekMode(s). For example, to get entries from all storage layers, call it like this:
cache.localEntries(CachePeekMode.ALL);
Related
How can I limit maximum size on disk when using Ignite Persistence? For example, my data set in a database is 5TB. I want to cache maximum of 50GB of data in memory with no more than 500GB on disk. Any reasonable eviction policy like LRU for on-disk data will work for me. Parameter maxSize controls in-memory size and I will set it to 50GB. What should I do to limit my disk storage to 500GB then? Looking for something like maxPersistentSize and not finding it there.
Thank you
There is no direct parameter to limit the complete disk usage occupied by the data itself. As you mentioned in the question, you can control in-memory regon allocation, but when a data region is full, data pages are going to be flushed and loaded on demand to/from the disk, this process is called page replacement.
On the other hand, page eviction works only for non-persistent cluster preventing it from running OOM. Personally, I can't see how and why that eviction might be implemented for the data stored on disk. I'm almost sure that other "normal" DBs like Postgres or MySQL do not have this option either.
I suppose you might check the following options:
You can limit WAL and WAL archive max sizes. Though these items are rather utility ones, they still might occupy a lot of space [1]
Check if it's possible to use expiry policies on your data items, in this case, data will be cleared from disk as well [2]
Use monitoring metrics and configure alerting to be aware if you are close to the disk limits.
I want to understand when a cache is created with native persistence enabled, will it store the data in the defined data region/RAM and in the disk at the same time? Is there any way I can restrict the disk utilization for storing the data?
Additionally, in a cluster of 3 due to any reason the disk got full for one of the nodes and there is not enough memory available, what will be the impact on the cluster?
Yes, data will be stored both in RAM and on the disk. I does not have to fit in RAM at the same time.
If you run out of disk space, your persistent store will likely be corrupted.
I understand the native persistence mode of Apache Ignite allows to store as much as possible data in memory - and the potential remaining data on disk.
Is it possible to manually choose which table I want to store in memory and which I want to store EXCLUSIVELY on disk? If I want to save costs, should I just give Ignite a lot of disk space and just a small amount of memory? What if I know some tables should return results as fast as possible while other tables have lower priorities in terms of speed (even if they are accessed more often)? Is there any feature to prioritize data storage into memory at table level or any other level?
You can define two different data regions - one with small amount of memory and enabled persistence and second without persistence, but with bigger max memory size: https://apacheignite.readme.io/docs/memory-configuration
You can't have a cache (which contains rows for a table) to be stored exclusively on disk.
When you add a row to table it gets stored in Durable Memory, which is always located in RAM. Later it may be flushed to disk via Checkpointing process, which will use checkpoint page buffer, which is also in RAM. So you can have a separate region with low memory usage (see another answer) but you can't have data exclusively on disk.
When you access data it will always be pulled from disk to Durable Memory, too.
Is there a way to set Redis so it will never evict data and cause a hard failure if it runs out of memory? I need to ensure no data is lost; I am not using this as a permanent data storage, mechanism, but rather for more of a temporary data storage mechanism for high volume/high performance data transformations.
Is there an alternative NoSQL data store that could come close in performance, but utilize disk read/write if memory runs out; this is not ideal, but is better than losing data. I am reading/writing/updating millions of JSON documents (12+ million and growing).
Yes.
First make sure that you set the maxmemory directive (in the conf file or with a CONFIG SET) to a value other than 0. This will instruct Redis to use that value as it's memory upper limit.
Next, set the maxmemory-policy directive to noeviction - this will cause Redis to return an OOM (out of memory) error when maxmemory is reached when trying to write to it.
See the config file in-file documentation for more details on these directives: http://download.redis.io/redis-stable/redis.conf
Is Redis memory only store like memcached or does it write the data to the disk? If it does write to the disk, how often is the disk written to?
Redis persistence is described in detail here:
http://redis.io/topics/persistence
By default, redis performs snapshotting:
By default Redis saves snapshots of the dataset on disk, in a binary file called dump.rdb. You can configure Redis to have it save the dataset every N seconds if there are at least M changes in the dataset, or you can manually call the SAVE or BGSAVE commands.
For example, this configuration will make Redis automatically dump the dataset to disk every 60 seconds if at least 1000 keys changed: save 60 1000
Another good reference is this link to the author's blog where he tries to explain how redis persistance works:
http://antirez.com/post/redis-persistence-demystified.html
Redis holds all data in memory. If the size of an application's data is too large for that, then Redis is not an appropriate solution.
However, Redis also offers two ways to make the data persistent:
1) Snapshots at predefined intervals, which may also depend on the number of changes. Any changes between these intervals will be lost at a power failure or crash.
2) Writing a kind of change log at every data change. You can fine-tune how often this is physically written to the disk, but if you chose to always write immediately (which will cost you some performance), then there will be no data loss caused by the in-memory nature of Redis.