For Ignite Persistence, how to control maximum data size on disk?

How can I limit the maximum size on disk when using Ignite Persistence? For example, my data set in a database is 5TB. I want to cache a maximum of 50GB of data in memory, with no more than 500GB on disk. Any reasonable eviction policy like LRU for on-disk data will work for me. The maxSize parameter controls the in-memory size, and I will set it to 50GB. What should I do to limit my disk storage to 500GB then? I am looking for something like maxPersistentSize and not finding it.
Thank you

There is no direct parameter to limit the total disk usage occupied by the data itself. As you mentioned in the question, you can control in-memory region allocation, but when a data region is full, data pages are flushed to disk and loaded back on demand; this process is called page replacement.
On the other hand, page eviction works only for non-persistent clusters, preventing them from running out of memory. Personally, I can't see how or why such eviction would be implemented for data stored on disk. I'm almost sure that other "normal" databases like Postgres or MySQL do not have this option either.
I suppose you might check the following options (a configuration sketch follows the list):
You can limit the WAL and WAL archive maximum sizes. Though these are rather utility artifacts, they can still occupy a lot of space [1].
Check whether it's possible to use expiry policies on your data items; in that case, expired data will be cleared from disk as well [2].
Use monitoring metrics and configure alerting so you know when you are getting close to the disk limits.
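
A minimal Java sketch of options 1 and 2, plus a metrics check for option 3. The cache name, sizes, and TTL are illustrative assumptions; the calls are from the Ignite 2.x API and may differ between versions:

import java.util.concurrent.TimeUnit;
import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class DiskUsageSketch {
    public static void main(String[] args) {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();

        // Option 1: cap the WAL archive (10 GB here is an arbitrary example).
        storageCfg.setMaxWalArchiveSize(10L * 1024 * 1024 * 1024);

        // In-memory cap for the default region, as described in the question.
        storageCfg.getDefaultDataRegionConfiguration()
            .setPersistenceEnabled(true)
            .setMaxSize(50L * 1024 * 1024 * 1024); // 50 GB

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(storageCfg);

        // Option 2: expire entries one day after creation; with eager TTL
        // enabled, expired entries are removed from disk as well.
        CacheConfiguration<Long, byte[]> cacheCfg = new CacheConfiguration<>("myCache");
        cacheCfg.setExpiryPolicyFactory(
            CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.DAYS, 1)));
        cacheCfg.setEagerTtl(true);

        try (Ignite ignite = Ignition.start(cfg)) {
            ignite.cluster().active(true); // persistent clusters need activation
            ignite.getOrCreateCache(cacheCfg);

            // Option 3: poll allocation metrics to drive alerting.
            ignite.dataRegionMetrics().forEach(m ->
                System.out.printf("%s: %d bytes allocated%n",
                    m.getName(), m.getTotalAllocatedSize()));
        }
    }
}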

Related

Ignite durable memory settings

I want to understand: when a cache is created with native persistence enabled, will it store the data in the defined data region/RAM and on disk at the same time? Is there any way I can restrict the disk utilization for storing the data?
Additionally, if in a cluster of 3 nodes the disk becomes full on one of the nodes for any reason and there is not enough memory available, what will the impact on the cluster be?
Yes, data will be stored both in RAM and on disk. It does not all have to fit in RAM at the same time.
If you run out of disk space, your persistent store will likely be corrupted.

Choose to store table exclusively on disk in Apache Ignite

I understand the native persistence mode of Apache Ignite allows storing as much data as possible in memory, with any remaining data kept on disk.
Is it possible to manually choose which table I want to store in memory and which I want to store EXCLUSIVELY on disk? If I want to save costs, should I just give Ignite a lot of disk space and just a small amount of memory? What if I know some tables should return results as fast as possible while other tables have lower priorities in terms of speed (even if they are accessed more often)? Is there any feature to prioritize data storage into memory at table level or any other level?
You can define two different data regions - one with small amount of memory and enabled persistence and second without persistence, but with bigger max memory size: https://apacheignite.readme.io/docs/memory-configuration
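A sketch of that two-region setup, using the Ignite 2.x configuration API; region names and sizes are made-up examples:

DataStorageConfiguration storageCfg = new DataStorageConfiguration();

// Small persistent region: most of its data lives on disk, and pages
// are loaded into RAM on demand via page replacement.
storageCfg.setDataRegionConfigurations(
    new DataRegionConfiguration()
        .setName("persistent_region")
        .setPersistenceEnabled(true)
        .setMaxSize(2L * 1024 * 1024 * 1024),    // 2 GB of RAM
    // Bigger in-memory-only region for the hot tables.
    new DataRegionConfiguration()
        .setName("memory_region")
        .setMaxSize(16L * 1024 * 1024 * 1024));  // 16 GB of RAM

IgniteConfiguration cfg = new IgniteConfiguration()
    .setDataStorageConfiguration(storageCfg);

// Each cache (and thus each SQL table) is assigned to a region by name:
CacheConfiguration<Long, String> coldTable =
    new CacheConfiguration<Long, String>("coldTable")
        .setDataRegionName("persistent_region");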
You can't have a cache (which contains rows for a table) to be stored exclusively on disk.
When you add a row to a table, it gets stored in Durable Memory, which is always located in RAM. Later it may be flushed to disk via the checkpointing process, which uses a checkpoint page buffer that is also in RAM. So you can have a separate region with low memory usage (see the other answer), but you can't have data exclusively on disk.
When you access data, it will always be pulled from disk into Durable Memory, too.

Ignite Local Entries and Spilled to Disk Entries

For Apache Ignite, when a key-value cache is spilled to disk, is it spilled a key at a time, or will it spill part of the value to disk?
In addition, for IgniteCache.localEntries(), will that automatically read all the values from disk, or is there a way to traverse the in-memory ones first?
Eviction happens on a per-entry basis. E.g., you can't remove only half of a value from memory.
As for localEntries method, its behavior depends on provided CachePeekMode(s). For example, to get entries from all storage layers, call it like this:
cache.localEntries(CachePeekMode.ALL);
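For example, to traverse only the in-memory tiers before touching the disk (a sketch; the cache name and key/value types are assumptions):

IgniteCache<Long, String> cache = ignite.cache("myCache");

// Entries currently held in memory (on-heap and off-heap tiers only):
for (Cache.Entry<Long, String> e :
        cache.localEntries(CachePeekMode.ONHEAP, CachePeekMode.OFFHEAP))
    System.out.println(e.getKey() + " -> " + e.getValue());

// All local entries, including those that have to be read from disk:
for (Cache.Entry<Long, String> e : cache.localEntries(CachePeekMode.ALL))
    System.out.println(e.getKey() + " -> " + e.getValue());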

Is Redis a memory only store like memcached or does it write the data to the disk

Is Redis a memory-only store like memcached, or does it write the data to the disk? If it does write to the disk, how often is the disk written to?
Redis persistence is described in detail here:
http://redis.io/topics/persistence
By default, Redis performs snapshotting:
By default Redis saves snapshots of the dataset on disk, in a binary file called dump.rdb. You can configure Redis to have it save the dataset every N seconds if there are at least M changes in the dataset, or you can manually call the SAVE or BGSAVE commands.
For example, this configuration will make Redis automatically dump the dataset to disk every 60 seconds if at least 1000 keys changed: save 60 1000
Another good reference is this link to the author's blog, where he tries to explain how Redis persistence works:
http://antirez.com/post/redis-persistence-demystified.html
Redis holds all data in memory. If the size of an application's data is too large for that, then Redis is not an appropriate solution.
However, Redis also offers two ways to make the data persistent:
1) Snapshots at predefined intervals, which may also depend on the number of changes. Any changes between these intervals will be lost in the event of a power failure or crash.
2) Writing a kind of change log on every data change. You can fine-tune how often this is physically written to disk, but if you choose to always write immediately (which will cost you some performance), then there will be no data loss caused by the in-memory nature of Redis.
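
For reference, a few illustrative redis.conf lines covering both mechanisms (the directives are standard Redis configuration; the values are examples):

# RDB snapshotting: dump the dataset if at least 1000 keys changed
# within 60 seconds (the example quoted above).
save 60 1000

# AOF: append every write operation to a change log file.
appendonly yes

# fsync policy for the AOF: "always" (no loss, slowest), "everysec"
# (default, up to ~1 second of writes can be lost), or "no" (OS decides).
appendfsync everysec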

RavenDB disk storage

I have a requirement to keep the RavenDB database running when the disk for the main database and index storage is full. I know I can provide a drive for index storage with the config option Raven/IndexStoragePath.
But I need to design for the corner case of this disk being full. What is the usual pattern used in this situation? One way is to halt all access while shutting the service down, updating the config file programmatically, and then starting the service, but that is a bit risky.
I am aware of sharding, and this question is not related to that. Assume that sharding is enabled and I have multiple shards, and I want to increase storage for each shard by adding a new drive to each. Is there an elegant solution to this?
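
For reference, in RavenDB versions of that era such settings lived in the appSettings section of Raven.Server.exe.config. A sketch (Raven/IndexStoragePath is the key cited in the question, Raven/DataDir is the standard data-directory key, and the paths are made up):

<appSettings>
  <!-- Main database location. -->
  <add key="Raven/DataDir" value="~\Databases\System" />
  <!-- Separate drive for index storage, as mentioned above. -->
  <add key="Raven/IndexStoragePath" value="D:\RavenIndexes" />
</appSettings>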
user544550,
In a disk full scenario, RavenDB will continue to operate, but will refuse to accept further writes.
Indexing will fail as well and eventually mark the indexes as permanently failing.
What is your actual scenario?
Note that in RavenDB, indexes tend to be significantly smaller than the actual data size, so the major cause for disk space utilization is actually the main database, not the indexes.