Understanding Infinispan Eviction, Expiration and File store?

Consider an Infinispan cache (version 5.3.0.Final) which has the following properties:
It has a file store.
Passivation is set to true.
I have the following questions about the cache behavior:
Are there two threads, one for eviction and one for expiration?
When the expiration thread runs, what happens to entries that are in the file but have expired? Are they loaded back into memory and then removed?
How often do these threads run?
Is the file store's file an append-only file?
Does the file have an index in this Infinispan version?
What exactly is stored in the file in this Infinispan version? Is it the key and value, or just the value?

I can't speak for such an old version, but it's likely the same.
The naming is a bit messy, TBH. There's a thread pool with the id org.infinispan.executors.eviction (a single thread by default) which hosts a scheduled task that processes expiration. Eviction is triggered only when you add something to the data container, and it is processed by the thread that added the new item.
That depends on the cache store implementation - the cache store SPI has a method purgeExpired() which forces removal of expired entries from the store. Nothing needs to be loaded into memory.
By default it's 1 minute. Search for wakeUpInterval (or wake-up-interval) in the configuration.
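For illustration, here is a minimal programmatic sketch of the setup from the question on the 5.x line, assuming I remember the ConfigurationBuilder fluent API of that era correctly (the store location and eviction limit are made-up values):

```java
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.eviction.EvictionStrategy;

public class CacheConfigSketch {
    public static Configuration build() {
        ConfigurationBuilder builder = new ConfigurationBuilder();
        builder
            .eviction()
                .strategy(EvictionStrategy.LRU)
                .maxEntries(1000)              // eviction happens on the writing thread
            .expiration()
                .wakeUpInterval(60_000L)       // expiration thread period; 1 minute is the default
            .loaders()
                .passivation(true)             // evicted entries are passivated to the store
                .addFileCacheStore()
                    .location("/tmp/cache-store"); // made-up path
        return builder.build();
    }
}
```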
No, none of the classical file stores are append-only. SoftIndexFileStore uses a similar technique, though.
FileCacheStore has just several 'buckets' selected by the key's hashCode; SingleFileCacheStore (or KarstenFileCacheStore, depending on your version) has an in-memory index.
Both keys and values.

Related

How does Redis perform at peak load?

I can't seem to find an answer, and benchmarks don't really tell the whole story.
How does Redis handle itself during peak load/usage?
The question comes from knowing that CPU usage may hit 100% of a logical core, or that memory may be overused.
What happens in these cases?
In general, Redis isn't CPU-heavy and will act like any other application when CPU usage is high, but this largely depends on your Redis version.
Prior to Redis 4.0, it was entirely single-threaded, and long-running operations (like background saves, DELs of large objects, etc.) would block. Since 4.0, most of these types of operations are pushed to the background. For saves to disk with the BGSAVE command, Redis forks itself and does the save in the child process, leaving the parent free to accept changes. 6.0 changed a few more things; for example, the DEL command can now act as UNLINK, pushing the actual delete to a background thread. There were plans to add more multi-threading in Redis 7.0, but it appears this was pushed to 7.2.
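For example, with the Jedis client (just one option; any Redis client works), you can prefer UNLINK over DEL for large objects so the actual memory reclamation happens on a background thread:

```java
import redis.clients.jedis.Jedis;

public class DeleteSketch {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.set("small:key", "v");
            jedis.del("small:key");    // synchronous delete; fine for small values
            jedis.unlink("big:hash");  // Redis >= 4.0: unlink now, free memory in the background
        }
    }
}
```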
The biggest concern, however, is reaching max system memory or the maxmemory directive in Redis' config. When this happens, Redis' eviction policy comes into play (set by the maxmemory-policy directive).
Here are the available eviction policies and what they do:
noeviction: New values aren't saved when the memory limit is reached. When a database uses replication, this applies to the primary database.
allkeys-lru: Keeps most recently used keys; removes least recently used (LRU) keys.
allkeys-lfu: Keeps frequently used keys; removes least frequently used (LFU) keys.
volatile-lru: Removes least recently used keys with the expire field set to true.
volatile-lfu: Removes least frequently used keys with the expire field set to true.
allkeys-random: Randomly removes keys to make space for the new data added.
volatile-random: Randomly removes keys with the expire field set to true.
volatile-ttl: Removes keys with the expire field set to true and the shortest remaining time-to-live (TTL) value.
The default maxmemory-policy is noeviction from Redis version 3.0 up through 7.0. In version 2.8 and earlier, the default is volatile-lru.
You can read the Key Eviction docs for more.
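For example, the policy can be inspected and changed at runtime with CONFIG GET/SET; here is a sketch using the Jedis client (note that configGet's return type differs across Jedis versions):

```java
import redis.clients.jedis.Jedis;

public class EvictionPolicySketch {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Show the current policy, e.g. noeviction on modern defaults.
            System.out.println(jedis.configGet("maxmemory-policy"));
            // Cap memory and evict the least recently used key, regardless of TTL.
            jedis.configSet("maxmemory", "256mb");
            jedis.configSet("maxmemory-policy", "allkeys-lru");
        }
    }
}
```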

Auto Syncing for Keys in Apache Geode

I have an Apache Geode setup connected with an external Postgres datasource. I have a scenario where I define an expiration time for a key; let's say after time T the key is going to expire. Is there a way for keys that are about to expire to make a call to the external datasource and update the value in case it has changed? I want a kind of automatic syncing for the keys in Apache Geode. Is there any interface I can implement to get the desired behavior?
I am not sure I fully understand your question. Are you saying that the values in the cache may possibly be more recent than what is currently stored in the database?
Whether you are using Look-Aside Caching, Inline Caching, or even Near Caching, Apache Geode combined with Spring would take care of ensuring the cache and database are kept in sync, to some extent depending on the caching pattern.
With Look-Aside Caching, if used properly, the database (i.e. primary System of Record (SOR), e.g. Postgres in your case) should always be the most current. (Look-Aside) Caching is secondary.
With Synchronous Inline Caching (using a CacheLoader/CacheWriter combination for Read/Write-Through) and in particular, with emphasis on the CacheWriter, during updates (e.g. Region.put(key, value) cache operations), the DB is written to first, before the entry is stored (or overwritten) in the cache. If the DB write fails, then the cache entry is not written or updated. This is true each time the value for a key is updated. If the key has not been updated recently, then the database should still reflect the most recent value. Once again, the database should always be the most current.
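To make that concrete, here is a rough sketch of such a CacheLoader/CacheWriter pair against Postgres, assuming a recent Apache Geode release; the JDBC helpers are hypothetical placeholders, not a real API:

```java
import org.apache.geode.cache.CacheLoader;
import org.apache.geode.cache.CacheLoaderException;
import org.apache.geode.cache.CacheWriterException;
import org.apache.geode.cache.EntryEvent;
import org.apache.geode.cache.LoaderHelper;
import org.apache.geode.cache.util.CacheWriterAdapter;

// Read-Through: on a cache miss, load the value from the database.
class PostgresLoader implements CacheLoader<String, String> {
    @Override
    public String load(LoaderHelper<String, String> helper) throws CacheLoaderException {
        return fetchFromPostgres(helper.getKey()); // hypothetical JDBC lookup
    }
    private String fetchFromPostgres(String key) {
        return null; // placeholder: SELECT value FROM table WHERE id = key
    }
}

// Write-Through: the DB write happens first; if it throws, the cache
// entry is never created or updated.
class PostgresWriter extends CacheWriterAdapter<String, String> {
    @Override
    public void beforeCreate(EntryEvent<String, String> event) throws CacheWriterException {
        writeToPostgres(event.getKey(), event.getNewValue());
    }
    @Override
    public void beforeUpdate(EntryEvent<String, String> event) throws CacheWriterException {
        writeToPostgres(event.getKey(), event.getNewValue());
    }
    private void writeToPostgres(String key, String value) {
        // placeholder: INSERT ... ON CONFLICT (id) DO UPDATE via JDBC
    }
}
```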
With Asynchronous Inline Caching (using AEQ + Listener, for Write-Behind), the updates for a cache entry are queued and asynchronously written to the DB. If an entry is updated, then Geode can guarantee that the value is eventually written to the underlying DB regardless of whether the key expires at some time later or not. You can persist and replay the queue in case of system failures, conflate events, and so on. In this case, the cache and DB are eventually consistent and it is assumed that you are aware of this, and this is acceptable for your application use case.
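And a corresponding sketch of the write-behind side: an AsyncEventListener attached to an AEQ that flushes batches to the database (again, the upsert helper is a hypothetical placeholder):

```java
import java.util.List;
import org.apache.geode.cache.asyncqueue.AsyncEvent;
import org.apache.geode.cache.asyncqueue.AsyncEventListener;

// Write-Behind: Geode hands over batches of queued events; returning
// true tells the queue the batch was persisted and can be discarded.
class PostgresWriteBehindListener implements AsyncEventListener {
    @Override
    public boolean processEvents(List<AsyncEvent> events) {
        for (AsyncEvent event : events) {
            upsert(event.getKey(), event.getDeserializedValue());
        }
        return true;
    }
    private void upsert(Object key, Object value) {
        // placeholder: INSERT ... ON CONFLICT (id) DO UPDATE via JDBC
    }
}
```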
Of course, all of these caching patterns and scenarios I described above assume nothing else is modifying the SOR/database. If another external application or process is also modifying the database, separate from your Geode-based application, then it would be possible for Geode to become out-of-sync with the database, and you would need to take steps to identify this situation. This is rather an issue for reads, not writes. Of course, you further need to make sure that stale cache entries do not subsequently overwrite the database on an update. This is easy enough to handle with optimistic locking. You could even trigger a cache entry remove on a DB update failure to have the cache refreshed on read.
Anyway, all of this is to say that if you applied one of the caching patterns above correctly, the value in the cache should already be reflected in the DB (or will be, in the asynchronous Write-Behind caching use case), even if the entry eventually expires.
Make sense?

Redisson local cache use

I have two questions regarding the Redisson client:
Does Redisson support automatic synchronization of the local cache with the remote Redis cache (when the remote cache data changes or is invalidated)?
I understand that Redisson supports data partitioning only in the pro edition, but isn't that feature already supported OOTB by Redis cluster mode? Am I missing something here?
Answering your questions:
RLocalCachedMap has two synchronization strategies:
INVALIDATE - Used by default. Invalidates the cache entry across all RLocalCachedMap instances on map entry change.
UPDATE - Updates the cache entry across all RLocalCachedMap instances on map entry change.
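For example, the strategy is chosen when the map is created; a sketch (the options API has moved around a bit across Redisson 3.x versions):

```java
import org.redisson.Redisson;
import org.redisson.api.LocalCachedMapOptions;
import org.redisson.api.LocalCachedMapOptions.SyncStrategy;
import org.redisson.api.RLocalCachedMap;
import org.redisson.api.RedissonClient;

public class LocalCacheSketch {
    public static void main(String[] args) {
        RedissonClient redisson = Redisson.create(); // default: 127.0.0.1:6379

        LocalCachedMapOptions<String, String> options =
            LocalCachedMapOptions.<String, String>defaults()
                .syncStrategy(SyncStrategy.INVALIDATE); // or SyncStrategy.UPDATE

        RLocalCachedMap<String, String> map =
            redisson.getLocalCachedMap("myMap", options);

        map.put("key", "value"); // the change propagates per the chosen strategy
        redisson.shutdown();
    }
}
```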
Right, all Redisson objects also work in cluster mode. Each object is tied to a single Redis node, and its content always remains on that node only; it is not distributed. If your object can't fit on a single Redis node, you need to use the data partitioning feature, which evenly distributes the content of an object across multiple Redis nodes in the cluster.
Re: "local cache truely local" -- I think you can just use a java Map, initially populate it with a RMap contents then from then on just serve your requests from the 'truely local' map in memory.

What to do with old files of the SoftIndexFileStore in Infinispan persistent cache store?

I have a clustered cache store set up with Infinispan (8.2.4 Final) using the SoftIndexFileStore for persistence.
The documentation states that if entries expire, it's not possible for the Compactor to clean up purged entries, and disk usage will grow over time. From the user guide:
When entries are stored with expiration, SIFS cannot detect that some of those entries are expired. Therefore, such an old file will not be compacted (method AdvancedStore.purgeExpired() is not implemented). This can lead to excessive file-system space usage.
Most of my entries expire, but there are some which need to persist indefinitely, meaning I can't simply run a cleanup job every once in a while to delete all the data files.
How do I deal with this wasted disk space? After several weeks of running, I see many files which haven't been modified in weeks. Is it safe to delete old files which haven't been modified recently, e.g. in the last month?
No; old files won't ever be modified again (they are written once and then considered immutable until removed). Removing them manually could lead to failures since these files are referenced in the index.
Regrettably, when the store is iterated and entries are found to be expired, Compactor.free() is not called, because there could be multiple concurrent iterations and we could end up calling it many times for a single entry.
A proper solution would be implementing a periodic (or JMX-triggered) process that goes through old files, computes space occupied by expired entries and schedules files that exceed some threshold for compaction. This should go into Compactor. Please see SIFS javadoc for general design description.
If you're interested in developing this feature and you want to discuss that more, please go to Infinispan forum.

The meaning of evict() in infinispan cache

According to the docs for Infinispan (http://docs.jboss.org/infinispan/5.0/apidocs/), the evict() API does not remove the entry from any other caches in the cluster, only from the cache it was invoked on.
If using "replication" mode, where the data is replicated across the caches, surely it has to be consistent, and using the evict() API will make it inconsistent.
How then is the inconsistency resolved?
Thanks
Evict removes the entry only from the memory of the node where you call it. It does not make the cache inconsistent, because if you call cache.get() and the entry is not found in memory, it is loaded from the cache store.
As the documentation states, the purpose is to inform the cache that the application won't use the entry for some time, so the node can free some memory.
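A small sketch of that behavior, assuming the cache is configured with a cache store:

```java
import org.infinispan.Cache;

public class EvictSketch {
    static void demo(Cache<String, String> cache) {
        cache.put("k", "v");       // entry is in memory (and in the store, or passivated later)
        cache.evict("k");          // drops it from this node's memory only
        String v = cache.get("k"); // memory miss -> transparently loaded from the store
        assert "v".equals(v);
    }
}
```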