Ever-increasing disk space for a cache in the Ignite storage directory - ignite

We are using Ignite 2.12 with persistence enabled, in replicated mode with FULL_SYNC write synchronization.
There is a cache whose entries range from a few kilobytes to a few megabytes.
Once a particular cache entry has been processed, we remove it from the cache. The cache goes through a large number of reads and writes in parallel.
What we have observed is that the disk space used by the cache keeps increasing. I had expected the disk space of deleted entries (in the cache's partition files) to be reused for newly inserted entries, but this is not happening.
What is the reason for this behavior, and is there a solution to avoid such problems?

Entry removal doesn't reclaim file size; the allocated space is kept for reuse by the same cache, and the files never shrink on their own. There is a persistence defragmentation feature in GridGain, and most likely in Apache Ignite as well; its documentation explains the concept, and the feature might help you reduce disk usage.
In the meantime, IgniteCache#destroy should do the trick as well.
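For illustration, a minimal Java sketch of the destroy-and-recreate approach (the config path, cache name, and key/value types are made-up examples): destroying the cache drops its partition files on disk, after which it can be recreated empty.

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cluster.ClusterState;

    public class ReclaimCacheDiskSpace {
        public static void main(String[] args) {
            // Start (or connect to) a node; persistence is assumed to be enabled in the config file.
            try (Ignite ignite = Ignition.start("ignite-config.xml")) {
                ignite.cluster().state(ClusterState.ACTIVE); // persistent clusters start inactive

                IgniteCache<Long, byte[]> cache = ignite.getOrCreateCache("workCache");

                // ... finish processing whatever entries remain ...

                // destroy() removes the cache and its partition files from disk,
                // unlike remove()/clear(), which only free pages for reuse inside the same files.
                cache.destroy();

                // Recreate the cache empty; its new partition files start small and grow as needed.
                IgniteCache<Long, byte[]> fresh = ignite.getOrCreateCache("workCache");
            }
        }
    }

The trade-off is that destroy/recreate is only practical at a point where the cache is allowed to be empty, for example between processing batches.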

Related

Is Redis using a cache?

I restarted my Redis server after 120 days.
Before the restart, memory usage was 29.5 GB.
After the restart, memory usage was 27.5 GB.
So where did the 2 GB reduction come from?
Was memory freed in RAM as described in this article, https://redis.io/topics/memory-optimization?
Redis will not always free up (return) memory to the OS when keys are
removed. This is not something special about Redis, but it is how most
malloc() implementations work. For example if you fill an instance
with 5GB worth of data, and then remove the equivalent of 2GB of data,
the Resident Set Size (also known as the RSS, which is the number of
memory pages consumed by the process) will probably still be around
5GB, even if Redis will claim that the user memory is around 3GB. This
happens because the underlying allocator can't easily release the
memory. For example often most of the removed keys were allocated in
the same pages as the other keys that still exist. The previous point
means that you need to provision memory based on your peak memory
usage. If your workload from time to time requires 10GB, even if most
of the times 5GB could do, you need to provision for 10GB.
However allocators are smart and are able to reuse free chunks of
memory, so after you freed 2GB of your 5GB data set, when you start
adding more keys again, you'll see the RSS (Resident Set Size) to stay
steady and don't grow more, as you add up to 2GB of additional keys.
The allocator is basically trying to reuse the 2GB of memory
previously (logically) freed.
Because of all this, the fragmentation ratio is not reliable when you
had a memory usage that at peak is much larger than the currently used
memory. The fragmentation is calculated as the amount of memory
currently in use (as the sum of all the allocations performed by
Redis) divided by the physical memory actually used (the RSS value).
Because the RSS reflects the peak memory, when the (virtually) used
memory is low since a lot of keys / values were freed, but the RSS is
high, the ratio mem_used / RSS will be very high.
Or was it free memory from OS caches that my Redis server had been using?
Is Redis using a cache? A cache of the cache?
Thanks!

Redis-server using all RAM at startup

I'm using Redis and noticed that it crashes with the following error:
MISCONF Redis is configured to save RDB snapshots
I tried the solution suggested in this post,
but everything seems to be OK in terms of permissions and disk space.
The htop command tells me that Redis is consuming 70% of RAM. I tried to stop/restart Redis in order to flush it, but at startup the amount of RAM used by Redis grew dramatically and stopped at around 66%. I'm pretty sure that at that moment no process was using any Redis instance!
What is happening there?
The growing RAM usage is expected behaviour for Redis during the first data load after a restart, when the data written to disk (the snapshot) is read back in. Redis tends to allocate as much memory as it can unless you use the "maxmemory" option in your conf file (a sample snippet follows the quoted note below).
It allocates memory but does not release it immediately. Sometimes it takes hours; I have seen such cases.
A well-known fact about Redis is that it can allocate memory up to twice the size of the dataset it keeps.
I suggest you wait a couple of hours without any restart (Redis can keep serving get/set operations in that time) and keep watching the memory.
Please check this too:
Redis will not always free up (return) memory to the OS when keys are
removed. This is not something special about Redis, but it is how most
malloc() implementations work. For example if you fill an instance
with 5GB worth of data, and then remove the equivalent of 2GB of data,
the Resident Set Size (also known as the RSS, which is the number of
memory pages consumed by the process) will probably still be around
5GB, even if Redis will claim that the user memory is around 3GB. This
happens because the underlying allocator can't easily release the
memory. For example often most of the removed keys were allocated in
the same pages as the other keys that still exist.
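For illustration only, the maxmemory setting mentioned above looks roughly like this in redis.conf (the values are placeholders, not recommendations):

    # Cap the memory Redis may use for data
    maxmemory 8gb
    # What to do when the cap is reached; allkeys-lru evicts the least recently used keys
    maxmemory-policy allkeys-lru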

Configuring Lucene IndexWriter, controlling segment formation (setRAMBufferSizeMB)

How should the setRAMBufferSizeMB parameter be set? Does it depend on the RAM size of the machine? On the size of the data that needs to be indexed? On some other parameter? Could someone please suggest an approach for deciding the value of setRAMBufferSizeMB.
Here is what the Lucene javadoc says about this parameter:
Determines the amount of RAM that may be used for buffering added
documents and deletions before they are flushed to the Directory.
Generally for faster indexing performance it's best to flush by RAM
usage instead of document count and use as large a RAM buffer as you
can. When this is set, the writer will flush whenever buffered
documents and deletions use this much RAM.
The maximum RAM limit is inherently determined by the JVMs available
memory. Yet, an IndexWriter session can consume a significantly larger
amount of memory than the given RAM limit since this limit is just an
indicator when to flush memory resident documents to the Directory.
Flushes are likely happen concurrently while other threads adding
documents to the writer. For application stability the available
memory in the JVM should be significantly larger than the RAM buffer
used for indexing.
By default, Lucene uses 16 MB for this parameter (which is an indication to me that you don't need a huge value to get good indexing speed). I would recommend tuning this parameter by setting it to, say, 500 MB and checking how well your system behaves. If you get crashes, try a smaller value such as 200 MB, and so on, until your system is stable.
Yes, as stated in the javadoc, this parameter is bounded by the JVM heap, but from Python I think it could allocate memory without any limit.
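For reference, a small sketch of where this parameter is set (the index path and the 256 MB figure are arbitrary example values, not recommendations):

    import java.nio.file.Paths;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    public class IndexerSetup {
        public static void main(String[] args) throws Exception {
            IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());

            // Flush to the Directory whenever buffered documents and deletions use ~256 MB of RAM.
            // Keep this well below the JVM heap (-Xmx), since flushes run concurrently with indexing threads.
            config.setRAMBufferSizeMB(256.0);

            // Flush by RAM usage only, not by document count (this is also the default).
            config.setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH);

            try (IndexWriter writer = new IndexWriter(FSDirectory.open(Paths.get("/tmp/index")), config)) {
                // ... writer.addDocument(...) calls ...
            }
        }
    }

Whatever value you settle on, leave the JVM heap comfortably larger than the buffer, as the quoted javadoc warns.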

Redis snapshot overloading memory

I'm using Redis as a client-side caching mechanism,
implemented in C# using StackExchange.Redis.
I configured snapshotting to "save 5 1", and rdbcompression is on.
The RDB mechanism loads the rdb file into memory every time it needs to append data.
The problem is that when you have a fairly large RDB file and it's loaded into memory all at once, it chokes the memory, disk, and CPU of an average endpoint.
Is there a way to update the rdb file without loading the whole file into memory?
Any other solution that lowers the load on memory and CPU is also welcome.
The RDB mechanism loads the rdb file into memory every time it needs to append data.
This isn't what the open source Redis server does (other variants, such as the MSFT fork, may behave differently). RDBs are created by a forked process that copies the contents of memory to disk; the dump file is never loaded, except when it is used for recovery. The increase in memory usage during the save depends on the amount of writes performed while the dump is in progress, because of the copy-on-write (COW) mechanism.
Any other solution that lowers the load on memory and CPU is also welcome.
There are several ways to tackle this, depending on your requirements and budget. These include:
Using both RDB and AOF for data persistency, thus reducing the frequency of dumps.
Delegating persistency to a slave instance.
Sharding your databases and performing cascading dumps.
We tackled the problem by moving away from RDB and now use AOF exclusively.
We reduced the memory peaks by lowering auto-aof-rewrite-percentage and also limiting auto-aof-rewrite-min-size to the desired size.
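As an illustration of those settings (the sizes and thresholds are placeholders, not recommendations), the relevant redis.conf directives are roughly:

    # Persist via the append-only file
    appendonly yes
    appendfsync everysec

    # Disable RDB snapshotting entirely
    save ""

    # Rewrite the AOF once it has doubled in size since the last rewrite,
    # but never for files smaller than 64mb; these two knobs trade rewrite
    # frequency against the size (and copy-on-write cost) of each rewrite
    auto-aof-rewrite-percentage 100
    auto-aof-rewrite-min-size 64mb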

How to configure a namespace to keep part of the data as a cache in RAM and the rest on the hard disk?

I am trying to write some data to a namespace in Aerospike, but I don't have enough RAM for the whole data set.
How can I configure Aerospike so that a portion of the data is kept in RAM as a cache and the rest is kept on the hard drive?
Can I reduce the number of copies of the data that Aerospike keeps in RAM?
It can be done by modifying the contents of the aerospike.conf file, but how exactly do I achieve that?
You should have a look at the namespace storage configuration page in the Aerospike documentation:
http://www.aerospike.com/docs/operations/configure/namespace/storage/
How can i configure my Aerospike so that a portion of the data in kept in the ram as cache and remaining is kept in the hard drive?
The post-write-queue parameter defines the amount of RAM used to keep recently written records in RAM. As long as these records are still in the post-write-queue, Aerospike will read them directly from RAM rather than from disk. This allows you to configure an LRU-like cache for a namespace that uses storage-engine device with data-in-memory false. Note that the eviction policy is least recently updated (or created), rather than least recently used (read or written).
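A hedged aerospike.conf sketch of such a namespace (the namespace name, sizes, and file path are example values; directive names follow the classic configuration schema, so check them against the documentation page above for your server version):

    namespace mycache {
        replication-factor 2          # number of copies kept across the cluster
        memory-size 4G                # RAM for the primary index
        default-ttl 0                 # records never expire unless a TTL is set per write

        storage-engine device {
            file /opt/aerospike/data/mycache.dat
            filesize 64G
            data-in-memory false      # record data lives on disk, not in RAM
            post-write-queue 256      # recently written blocks kept in RAM and read without touching disk
        }
    }

Lowering replication-factor reduces the number of copies of each record kept across the cluster, which also reduces the per-node primary index footprint in RAM.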