Choose to store table exclusively on disk in Apache Ignite - ignite

I understand the native persistence mode of Apache Ignite allows to store as much as possible data in memory - and the potential remaining data on disk.
Is it possible to manually choose which table I want to store in memory and which I want to store EXCLUSIVELY on disk? If I want to save costs, should I just give Ignite a lot of disk space and just a small amount of memory? What if I know some tables should return results as fast as possible while other tables have lower priorities in terms of speed (even if they are accessed more often)? Is there any feature to prioritize data storage into memory at table level or any other level?

You can define two different data regions - one with small amount of memory and enabled persistence and second without persistence, but with bigger max memory size: https://apacheignite.readme.io/docs/memory-configuration

You can't have a cache (which contains rows for a table) to be stored exclusively on disk.
When you add a row to table it gets stored in Durable Memory, which is always located in RAM. Later it may be flushed to disk via Checkpointing process, which will use checkpoint page buffer, which is also in RAM. So you can have a separate region with low memory usage (see another answer) but you can't have data exclusively on disk.
When you access data it will always be pulled from disk to Durable Memory, too.

Related

For Ignite Persistence, how to control maximum data size on disk?

How can I limit maximum size on disk when using Ignite Persistence? For example, my data set in a database is 5TB. I want to cache maximum of 50GB of data in memory with no more than 500GB on disk. Any reasonable eviction policy like LRU for on-disk data will work for me. Parameter maxSize controls in-memory size and I will set it to 50GB. What should I do to limit my disk storage to 500GB then? Looking for something like maxPersistentSize and not finding it there.
Thank you
There is no direct parameter to limit the complete disk usage occupied by the data itself. As you mentioned in the question, you can control in-memory regon allocation, but when a data region is full, data pages are going to be flushed and loaded on demand to/from the disk, this process is called page replacement.
On the other hand, page eviction works only for non-persistent cluster preventing it from running OOM. Personally, I can't see how and why that eviction might be implemented for the data stored on disk. I'm almost sure that other "normal" DBs like Postgres or MySQL do not have this option either.
I suppose you might check the following options:
You can limit WAL and WAL archive max sizes. Though these items are rather utility ones, they still might occupy a lot of space [1]
Check if it's possible to use expiry policies on your data items, in this case, data will be cleared from disk as well [2]
Use monitoring metrics and configure alerting to be aware if you are close to the disk limits.

Ignite durable memory settings

I want to understand when a cache is created with native persistence enabled, will it store the data in the defined data region/RAM and in the disk at the same time? Is there any way I can restrict the disk utilization for storing the data?
Additionally, in a cluster of 3 due to any reason the disk got full for one of the nodes and there is not enough memory available, what will be the impact on the cluster?
Yes, data will be stored both in RAM and on the disk. I does not have to fit in RAM at the same time.
If you run out of disk space, your persistent store will likely be corrupted.

Ignite Local Entries and Spilled to Disk Entries

For apache Ignite, when a key-value cache is spilled to disk is it spilled a key at a time or will it spill part of the value to disk?
In addition, for IgniteCache.localentries() will that automatically read all the values from disk or is there a way to traverse the in-memory ones first?
Eviction happens on per-entry basis. E.g., you can't remove only half of the value from memory.
As for localEntries method, its behavior depends on provided CachePeekMode(s). For example, to get entries from all storage layers, call it like this:
cache.localEntries(CachePeekMode.ALL);

Is Redis a memory only store like memcached or does it write the data to the disk

Is Redis memory only store like memcached or does it write the data to the disk? If it does write to the disk, how often is the disk written to?
Redis persistence is described in detail here:
http://redis.io/topics/persistence
By default, redis performs snapshotting:
By default Redis saves snapshots of the dataset on disk, in a binary file called dump.rdb. You can configure Redis to have it save the dataset every N seconds if there are at least M changes in the dataset, or you can manually call the SAVE or BGSAVE commands.
For example, this configuration will make Redis automatically dump the dataset to disk every 60 seconds if at least 1000 keys changed: save 60 1000
Another good reference is this link to the author's blog where he tries to explain how redis persistance works:
http://antirez.com/post/redis-persistence-demystified.html
Redis holds all data in memory. If the size of an application's data is too large for that, then Redis is not an appropriate solution.
However, Redis also offers two ways to make the data persistent:
1) Snapshots at predefined intervals, which may also depend on the number of changes. Any changes between these intervals will be lost at a power failure or crash.
2) Writing a kind of change log at every data change. You can fine-tune how often this is physically written to the disk, but if you chose to always write immediately (which will cost you some performance), then there will be no data loss caused by the in-memory nature of Redis.

Is there any logic in just maxing tempdb and never having it change size?

The reason I ask it we have a dedicated RAID10 array with ~150GB for the tempdb (the "t" drive). It is only used for storing tempdb. The t drive isn't used by by SQL Server or any other process for anything else.
Our DBA has tempdb setup with 15GB initial size and autogrow 20% increments. Everytime the server starts it resized to 15GB and then over the course of the day grows to ~80GB (on average). Now IT is looking into making initial size larger say 30 or 40GB but given the drive is ONLY used for tempdb my thinking is why not "max it" right away.
Is the any negative effect to simply create 4 data files in the primary group for tempdb give them each an initial size of 30GB (120GB total), turn autogrow off and be done with it?
Are there any limits on SQL Server ability to span multiple tempdb data files in one query? i.e. will it cause problems if the tempdb has say 70GB total free but the file used by one process is full (30 of 30GB used)?
I would size them to about 100GB and leave autogrow on, this way you don't have to wait for it to grow every time, I would also add multiple files
Is the any negative effect to simply
create 4 data files in the primary
group for tempdb give them each an
initial size of 30GB, turn autogrow
off and be done with it?
Sounds like a good plan to me, however I would leave autogrow on just in case someone decides to do a sort operation on a big table which doesn't have an index on that column
See also here: http://technet.microsoft.com/en-us/library/cc966534.aspx
It is recommended to have .25 to 1
data files (per filegroup) for each
CPU on the host server.
This is especially true for TEMPDB
where the recommendation is 1 data
file per CPU.
Dual core counts as 2 CPUs; logical
procs (hyperthreading) do not.
We have found it very useful to create large TempDB data and log files. Any actions that limit server OS activities such as resizing TempDB increase server efficiencies. We have a 16 processor machine with 113 GB dedicated to TempDB data space. This machine is dedicated to large SSIS ETL processes, thus resulting in mass data operations.
The bulk of our ETL operations spawn up to 4 SQL threads. After initially configuring a TempDB file for each processor (16), we quickly realized via performance monitoring that our configuration was forcing SQL\windows to unnecessarily span the multiple TempDB files. We settled on 5 larger TempDB data files and realized performance improvements. We have since moved on to a 24 processor box and are using 8 TempDB files.
Please note that this is a large data migration server; I’m sure transaction-oriented systems would still benefit from the recommended 1-1 processor to TempDB file configuration. It should also be noted that having a large increase % on a TempDB file may force a critical transaction to take the windows operation hit and thus may not be appropriate for your specific application.