Is there a way to know how much off-heap memory will each cache record take? My cache is:
IgniteCache<String, byte[]>
Each key is around 24-26 characters and each value is 12 bytes. After putting 40,000 records, off-heap usage grew by 8 MB, which is around 210 bytes per record. Page size is configured as 1 KB, and metrics show the page fill factor is around 0.97-1.0. Assume there are no backups.
Is there anywhere to read about how each record is stored off-heap, to understand where those 210 bytes come from? Queries are disabled. Or what else could possibly cause such consumption?
According to the docs https://www.gridgain.com/docs/latest/administrators-guide/capacity-planning, there is roughly 200 bytes of overhead per entry, so this is about what you should expect.
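To sanity-check that figure, here is a quick back-of-the-envelope calculation (the 200-byte per-entry overhead is the rough estimate from the capacity-planning docs; the exact layout varies):

```python
# Observed: off-heap usage grew by ~8 MB after putting 40,000 records.
records = 40_000
observed_growth = 8 * 1024 * 1024  # bytes

per_record = observed_growth / records
print(f"observed per-record footprint: {per_record:.0f} bytes")  # ~210

# Documented estimate: key + value + ~200 bytes of per-entry overhead.
key_size = 26          # ~24-26 character key
value_size = 12        # byte[] payload
entry_overhead = 200   # rough figure from the capacity-planning docs
estimated = key_size + value_size + entry_overhead
print(f"estimated per-record footprint: {estimated} bytes")  # ~238
```

The two are in the same ballpark; with a page fill factor near 1.0 there is little page-level waste, so nearly all of the ~210 bytes is per-entry overhead rather than wasted page space.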
Related
How can I limit maximum size on disk when using Ignite Persistence? For example, my data set in a database is 5TB. I want to cache maximum of 50GB of data in memory with no more than 500GB on disk. Any reasonable eviction policy like LRU for on-disk data will work for me. Parameter maxSize controls in-memory size and I will set it to 50GB. What should I do to limit my disk storage to 500GB then? Looking for something like maxPersistentSize and not finding it there.
Thank you
There is no direct parameter to limit the total disk usage occupied by the data itself. As you mentioned in the question, you can control in-memory region allocation, but when a data region is full, data pages are flushed to and loaded from disk on demand; this process is called page replacement.
On the other hand, page eviction works only for non-persistent clusters, preventing them from running out of memory. Personally, I can't see how or why such eviction would be implemented for data stored on disk. I'm fairly sure that other "normal" DBs like Postgres or MySQL don't have this option either.
I suppose you might check the following options:
You can limit the WAL and WAL archive max sizes. Though these are utility structures, they can still occupy a lot of space [1]
Check whether you can apply expiry policies to your data items; in that case, expired data will be cleared from disk as well [2]
Use monitoring metrics and configure alerting so you know when you are close to the disk limits.
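For the last point, a minimal alerting sketch (the 500 GB budget, the 90% threshold, and the use of plain filesystem stats are all assumptions; Ignite also exposes storage metrics you could plug in here instead):

```python
import shutil

def disk_usage_alert(path: str, budget_bytes: int, threshold: float = 0.9) -> bool:
    """Return True once used space on the filesystem holding `path`
    crosses `threshold` of the configured disk budget."""
    usage = shutil.disk_usage(path)
    return usage.used >= budget_bytes * threshold

# Hypothetical: alert at 90% of a 500 GB disk budget.
# "." stands in for the Ignite persistence/work directory.
DISK_BUDGET = 500 * 1024**3
if disk_usage_alert(".", DISK_BUDGET):
    print("disk budget nearly exhausted - consider expiring data")
```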
We have a Redis database that currently contains more than 10 million keys in the production environment, and we predict it may grow to 100-200 million keys.
Will it impact my Read/Write time to Redis?
I don't think a growing key count will impact Redis performance by itself: key lookups are O(1), so 100 million keys are looked up about as fast as 10 million. The read/write rate, however, is limited by your server's resources, and you can't expect Redis to serve more than the hardware allows. If you push more traffic than that, you may see delays, dropped connections, and so on.
My suggestion is to use Redis Cluster (with multiple servers) to increase the read/write rate.
I'm assuming that during replica resynchronisation (full or partial), the master will attempt to send data as fast as possible to the replica. Wouldn't this mean the replica output buffer on the master would rapidly fill up since the speed the master can write is likely to be faster than the throughput of the network? If I have client-output-buffer-limit set for replicas, wouldn't the master end up closing the connection before the resynchronisation can complete?
Yes, the Redis master will close the connection and synchronization will start over from the beginning. But please consider some details below:
Do you need to touch this configuration parameter at all, and what is its purpose/benefit/cost?
There is almost zero chance of this happening with the default configuration on fairly moderate modern hardware.
"By default normal clients are not limited because they don't receive data without asking (in a push way), but just after a request, so only asynchronous clients may create a scenario where data is requested faster than it can read." — quoted from the documentation.
Even if that happens, replication will simply start over from the beginning, but it may lead to an infinite loop in which replicas continuously re-request a full synchronization. Each time, the Redis master has to fork and produce a full memory snapshot (BGSAVE) and may use up to 3x the initial snapshot size in RAM, causing higher CPU utilization, memory spikes, network utilization, and I/O.
General recommendations to avoid production issues tweaking this configuration parameter:
Don't decrease this buffer, and before increasing its size, make sure you have enough memory on the box.
Budget the total amount of RAM as: snapshot memory size (doubled for the copy-on-write BGSAVE process), plus the size of any other configured buffers, plus some extra headroom.
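As a rough illustration of that budgeting rule (all figures here are hypothetical):

```python
def ram_budget_gb(snapshot_gb: float, buffers_gb: float, headroom: float = 0.2) -> float:
    """Rough RAM budget: snapshot size doubled for the copy-on-write
    BGSAVE fork, plus any other configured buffers, plus headroom."""
    return (snapshot_gb * 2 + buffers_gb) * (1 + headroom)

# Hypothetical box: a 16 GB snapshot and 1 GB of replica output buffers
# suggest budgeting roughly 40 GB of RAM.
print(ram_budget_gb(16, 1))
```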
Please find more details here
I'm looking at Redis backed up rdb files for a web application. There are 4 such files (for 4 different redis servers working concurrently), sizes being: 13G + 1.6G + 66M + 14M = ~15G
However, these same 4 instances seem to be taking 43.8GB of memory (according to new relic). Why such a large discrepancy between how much space redis data takes up in mem vs disk? Could it be a misconfiguration and can the issue be helped?
I don't think there is any problem.
First of all, the data is stored in a compressed format in the rdb file, so it is smaller on disk than in memory. How much smaller depends on the type of data, but the rdb file is typically around 20-80% of the memory used by Redis.
Another reason your memory usage could be higher than the actual usage (compare the figure from New Relic with the one from the redis-cli info memory command) is memory fragmentation. Whenever Redis needs more memory, it gets it from the OS, but it does not release it easily when keys expire or are deleted. This is not a big issue, as Redis will ask for more memory only after using the extra memory it already holds. You can also check memory fragmentation using the redis-cli info memory command.
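As an illustration of that fragmentation check (the byte counts below are hypothetical; on a real instance you would read used_memory and used_memory_rss from redis-cli info memory):

```python
def fragmentation_ratio(used_memory: int, used_memory_rss: int) -> float:
    """mem_fragmentation_ratio as reported by INFO memory: memory held
    from the OS (RSS) divided by the memory Redis believes it is using.
    Values well above 1.0 suggest fragmentation or unreleased pages."""
    return used_memory_rss / used_memory

# Hypothetical numbers shaped like the question: ~15 GB of live data
# held in ~44 GB of RSS gives a ratio near 2.9.
print(round(fragmentation_ratio(15 * 1024**3, 44 * 1024**3), 2))  # 2.93
```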
I'm using ImageResizer as an Azure webapp with a service plan with 50 Gb file storage. My settings for DiskCache are:
<diskCache dir="~/imagecache" autoClean="true" hashModifiedDate="true" subfolders="1024" asyncWrites="true" asyncBufferSize="10485760" cacheAccessTimeout="15000" logging="true" />
But that doesn't seem to stop the imagecache folder from reaching the 50 GB limit quite quickly. I have around 100 GB of images in blob storage (original sizes); not all will be used on the same day, but the same image can be cached multiple times with different parameters. The cached images are around 200 KB on average.
Is there a way to stop the storage filling up so quick? Is there maybe a better way of using DiskCache? or use something else? The Premium Plans with 250Gb and decent CPU/RAM are far too expensive to justify the cost for this.
Thanks
You can't limit the cache by file size, only by a (very) rough file count. Deleting the cache and setting subfolders="256" should keep you under 50 GB, assuming the 200 KB average holds true.
However, if your cache fills up quickly (as in 1-3 days), you're probably going to experience serious cache churn and poor performance as your disk write queue skyrockets.
You might consider using a CDN if you can't get storage space for, say, 10 days worth of cached files.
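A back-of-the-envelope check on that sizing (DiskCache limits by file count, not bytes; the files-per-folder figure below is a hypothetical value chosen to match the answer's arithmetic):

```python
def estimated_cache_bytes(subfolders: int, files_per_folder: int, avg_file_bytes: int) -> int:
    """Rough upper bound on DiskCache disk usage: the cache caps the
    number of files, so size ~= file count x average file size."""
    return subfolders * files_per_folder * avg_file_bytes

avg = 200 * 1024  # ~200 KB average from the question
# Hypothetical ~1,000 files per subfolder:
print(estimated_cache_bytes(1024, 1000, avg) / 1024**3)  # ~195 GB at subfolders=1024
print(estimated_cache_bytes(256, 1000, avg) / 1024**3)   # ~49 GB at subfolders=256
```

So dropping subfolders from 1024 to 256 cuts the worst-case footprint by roughly 4x, which is what brings it under the 50 GB plan limit.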