Synchronize multiple instances of Spring Cache with a Redis lock - redis

I'm building a Spring Boot application that uses Spring Cache with a Redis backing store and needs to synchronize the updates made to the cache.
The cache is not populated on the fly, but by a scheduled process that updates it periodically.
The algorithm I came up with is:
periodically the instances will check if the Redis cache is older than some predetermined time
if that's the case, the instance will try to acquire a lock on some Redis key
if the instance successfully locks the key, it will then proceed with the update
if some other instance already locked the key, move on
all instances can still read the cache
Everything is more or less already built; all I need is to implement the locking/releasing mechanism.
Spring Cache is using Lettuce to interact with Redis. What is the best way to get a connection to Redis and manage the locking mechanism?

As you may already be aware, Spring's Cache Abstraction provides simple coordination amongst multiple Threads in a single Spring [Boot] application process using the sync attribute on the @Cacheable annotation (see ref doc).
NOTE: Despite the comment ("... use the sync attribute to instruct the underlying cache provider to lock the cache entry while the value is being computed. As a result, only one thread is busy computing the value, while the others are blocked until the entry is updated in the cache.") in the documentation, the locking mechanics are handled by the core framework itself, and in most cases, not by the provider. Anyway...
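For illustration only, a minimal sketch of that single-process behavior; the service, cache name and lookup method here are hypothetical:

    import org.springframework.cache.annotation.Cacheable;
    import org.springframework.stereotype.Service;

    @Service
    public class QuoteService {

        // sync = true collapses concurrent cache misses for the same key within this
        // one JVM: a single thread computes the value while the others block until
        // the entry shows up in the cache. It does nothing across separate instances.
        @Cacheable(cacheNames = "quotes", sync = true)
        public String loadQuote(String symbol) {
            return expensiveLookup(symbol); // placeholder for the real computation
        }

        private String expensiveLookup(String symbol) {
            return "quote-for-" + symbol;
        }
    }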
However, this "coordination" is only per-process and will not work for multiple Spring [Boot] application instances, or (OS) JVM processes. In this case, you need some form of distributed locking across your multiple Spring [Boot] application instances to coordinate access to shared cache entries stored in the single Redis server (cluster) shared by your Spring [Boot] application instances.
I am no Redis expert (I am still learning), but I am familiar with similar NoSQL stores (Apache Geode/VMware GemFire, Hazelcast, etc) and distributed locking mechanisms. I see that distributed locking is possible to achieve with Redis as well. In a quick search, I found "Distributed Locking" in Redis, and specifically, "Building a lock in Redis". This is probably the best way to go.
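For reference, here is a minimal sketch of that recipe using Spring Data Redis (which uses Lettuce under the hood when that driver is configured, as in your setup). The class name, lock key and TTL are placeholders, and the release goes through a small Lua script so an instance only deletes a lock it actually owns; treat it as an illustration of the "Building a lock in Redis" pattern, not a hardened implementation:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.UUID;

    import org.springframework.data.redis.core.StringRedisTemplate;
    import org.springframework.data.redis.core.script.DefaultRedisScript;

    public class RedisLockSketch {

        // Delete the key only if it still holds our token, so we never release a
        // lock that expired and was re-acquired by another instance.
        private static final DefaultRedisScript<Long> RELEASE_SCRIPT = new DefaultRedisScript<>(
                "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end",
                Long.class);

        private final StringRedisTemplate redis;

        public RedisLockSketch(StringRedisTemplate redis) {
            this.redis = redis;
        }

        // Try to acquire the lock; returns the owner token on success, null otherwise.
        public String tryAcquire(String lockKey, Duration ttl) {
            String token = UUID.randomUUID().toString();
            // Issues SET lockKey token NX PX <ttl> under the hood
            Boolean acquired = redis.opsForValue().setIfAbsent(lockKey, token, ttl);
            return Boolean.TRUE.equals(acquired) ? token : null;
        }

        // Release the lock only if this instance still owns it.
        public void release(String lockKey, String token) {
            redis.execute(RELEASE_SCRIPT, Collections.singletonList(lockKey), token);
        }
    }

Your scheduled job would call tryAcquire before refreshing the cache, perform the refresh only when it gets a non-null token, and release in a finally block; the other instances simply skip that round and keep reading the cache.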
In addition, if you want to make this distributed locking automatically/transparently available through Spring's Cache Abstraction, then you could possibly create a custom AOP Aspect and weave this Aspect together with the framework-provided Caching Aspect (Interceptor), being conscious of ordering, as one idea (see the rough sketch below).
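As a very rough illustration of that idea (not a definitive implementation): an aspect that runs before Spring's caching advice and only lets the cache update proceed when the distributed lock was obtained. It reuses the hypothetical RedisLockSketch helper from above; the pointcut, lock key and ordering value are assumptions you would need to adapt to your own configuration:

    import java.time.Duration;

    import org.aspectj.lang.ProceedingJoinPoint;
    import org.aspectj.lang.annotation.Around;
    import org.aspectj.lang.annotation.Aspect;
    import org.springframework.core.annotation.Order;
    import org.springframework.stereotype.Component;

    @Aspect
    @Component
    @Order(0) // lower value than the caching interceptor so this advice wraps it
    public class DistributedCacheLockAspect {

        private static final String LOCK_KEY = "cache:refresh:lock"; // placeholder

        private final RedisLockSketch lock;

        public DistributedCacheLockAspect(RedisLockSketch lock) {
            this.lock = lock;
        }

        // Placeholder pointcut: methods that write to the cache, e.g. annotated with @CachePut.
        @Around("@annotation(org.springframework.cache.annotation.CachePut)")
        public Object aroundCacheUpdate(ProceedingJoinPoint pjp) throws Throwable {
            String token = lock.tryAcquire(LOCK_KEY, Duration.ofMinutes(5));
            if (token == null) {
                return null; // another instance is already refreshing; skip this round
            }
            try {
                return pjp.proceed();
            } finally {
                lock.release(LOCK_KEY, token);
            }
        }
    }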
Alternatively, you could implement wrapper implementations for the Spring Cache and CacheManager SPI interfaces that implement distributed locking on top of the core Redis Cache and CacheManager provider implementations provided by Spring Boot/Spring Data Redis.
Of course, there are multiple ways to go about this. Just tossing out more ideas, but have a look at the distributed locking information in the book.

Related

ChronicleMap Recovery with multi process application

We are evaluating ChronicleMap, and our application runs in cluster mode with nodes ranging from 5 to 45. The plan is to have the ChronicleMap persisted in a shared NFS folder so that all the nodes can read/write.
With that said, there is a good chance that individual nodes could go down for various reasons in the middle of a read/write operation. I have some questions:
If node-1 goes down during a write operation, can another healthy node-2 in the cluster still continue to read/write to the files?
Let's say we implement some logic to detect a server crash and call .recoverPersistedTo() on restart. Will this cause any issues while other healthy nodes in the cluster are reading/writing to the files? The reason I ask is that the documentation says
“You must ensure that no other process is accessing the Chronicle Map
store when calling .recoverPersistedTo()”
I have read that using .recoverPersistedTo() in place of createPersistedTo() is not a good practice, but what are the downsides?
First of all, we (Chronicle) don't support putting Chronicle Map files on NFS (as we use memory mapping and NFS is known to cause problems with it). Additionally, trying to use recovery on NFS will cause data corruption as there's no adequate file locking on NFS, and recovery tries to lock the file to prevent simultaneous recovery by multiple processes. In general, open source Chronicle Map is supposed to be used by multiple processes on the same host.
The solution to your problem is the commercial Chronicle Map Enterprise, which supports map replication between nodes; please contact sales@chronicle.software for details.

Redisson: Locking with Spring cache

Redisson provides support for locking backed by Redis. It also provides an implementation for working with the Spring Cache framework. But based on what I saw, locking is not invoked by default when trying to update a key in a cache using the Spring Cache framework; Redisson has separate APIs for locking a particular key. Is that correct?
Also, the locking APIs seem to take a key as input, so I am not clear how locking works. For locking, I am assuming you need both the cache name and the key.
I am new to Redis, so any help in throwing some light on this is really appreciated. Thanks
Firstly, locking in Redisson is implemented via Redis, but it is not only used for updating Redis.
For example, if you want to implement an atomic operation like this:
Get key value from Redis
Calculate a new value based on some logic
Save the new value to Redis and Mysql
You can use a Redisson lock to make the operation atomic.
Secondly, in Redis, the set/update command is atomic, so you don't need to lock the key if you are only updating the value.
As for the locking API, Redisson implements locks as Redis key/value entries, so you only need to provide a lock key, which generally contains a resource type and resource id (like "lock:user:31352"), as in the sketch below.
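A minimal sketch of the read-compute-write sequence above, guarded by a Redisson RLock; the server address, lock name, timeouts and the commented steps are placeholders:

    import java.util.concurrent.TimeUnit;

    import org.redisson.Redisson;
    import org.redisson.api.RLock;
    import org.redisson.api.RedissonClient;
    import org.redisson.config.Config;

    public class RedissonLockExample {

        public static void main(String[] args) throws InterruptedException {
            Config config = new Config();
            config.useSingleServer().setAddress("redis://127.0.0.1:6379"); // placeholder address
            RedissonClient redisson = Redisson.create(config);

            // Lock key built from resource type and id, as described above.
            RLock lock = redisson.getLock("lock:user:31352");

            // Wait up to 5s to acquire; auto-release after 30s in case this process dies.
            if (lock.tryLock(5, 30, TimeUnit.SECONDS)) {
                try {
                    // 1. get the key's value from Redis
                    // 2. calculate the new value based on some logic
                    // 3. save the new value to Redis and MySQL
                } finally {
                    lock.unlock();
                }
            }

            redisson.shutdown();
        }
    }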

How to set up an Akka.NET cluster when I do not really need persistence?

I have a fairly simple Akka.NET system that tracks in-memory state, but contains only derived data. So any actor can, on startup, load its up-to-date state from a backend database and then start receiving messages and keep its state from there. So I can just let actors fail and restart the process whenever I want. It will rebuild itself.
But... I would like to run across multiple nodes (mostly for the memory requirements) and I'd like to increase/decrease the number of nodes according to demand. Also for releasing a new version without downtime.
What would be the most lightweight (in terms of Persistence) setup of clustering to achieve this? Can you run Clustering without Persistence?
This is not a single question, so let me answer them one by one:
So I can just let actors fail and restart the process whenever I want - yes, but keep in mind that a hard reset of the process is a lot more expensive than a graceful shutdown. In distributed systems, if your node is going down, it's better for it to communicate that to the rest of the nodes beforehand than to require them to detect the dead one - this is part of node failure detection and can take some time (even sub-minute).
I'd like to increase/decrease the number of nodes according to demand - this is standard behavior of the cluster. In the case of Akka.NET, depending on which feature set you are going to use, you may sometimes need to specify an upper bound on the cluster size.
Also for releasing a new version without downtime. - most of the cluster features can be scoped to a set of particular nodes using so-called roles. Each node can have its own set of roles, which can be used to describe what services it provides and to detect whether other nodes have the required capabilities. For that reason, you can use roles for things like versioning.
Can you run Clustering without Persistence? - yes, and this is a default configuration (in Akka, cluster nodes don't need to use any form of persistent backend to work).

Zookeeper vs In-memory-data-grid vs Redis

I've found different ZooKeeper definitions across multiple resources. Maybe some of them are taken out of context, but please look at them:
A canonical example of Zookeeper usage is distributed-memory computation...
ZooKeeper is an open source Apache™ project that provides a centralized infrastructure and services that enable synchronization across a cluster.
Apache ZooKeeper is an open source file application program interface (API) that allows distributed processes in large systems to synchronize with each other so that all clients making requests receive consistent data.
I've worked with Redis and Hazelcast, so it would be easier for me to understand ZooKeeper by comparing it with them.
Could you please compare Zookeeper with in-memory-data-grids and Redis?
If it's about distributed-memory computation, how does ZooKeeper differ from in-memory data grids?
If it's about synchronization across a cluster, then how does it differ from all the other in-memory storages? The same in-memory data grids also provide cluster-wide locks. Redis also has some kind of transactions.
If it's only about in-memory consistent data, then there are other alternatives. IMDGs allow you to achieve the same, don't they?
https://zookeeper.apache.org/doc/current/zookeeperOver.html
By default, Zookeeper replicates all your data to every node and lets clients watch the data for changes. Changes are sent very quickly (within a bounded amount of time) to clients. You can also create "ephemeral nodes", which are deleted within a specified time if a client disconnects. ZooKeeper is highly optimized for reads, while writes are very slow (since they generally are sent to every client as soon as the write takes place). Finally, the maximum size of a "file" (znode) in Zookeeper is 1MB, but typically they'll be single strings.
Taken together, this means that ZooKeeper is not meant to store much data, and definitely is not a cache. Instead, it's for managing heartbeats/knowing what servers are online, storing/updating configuration, and possibly message passing (though if you have large numbers of messages or high throughput demands, something like RabbitMQ will be much better for this task).
Basically, ZooKeeper (and Curator, which is built on it) helps in handling the mechanics of clustering -- heartbeats, distributing updates/configuration, distributed locks, etc.
It's not really comparable to Redis, but for the specific questions...
It doesn't support any computation, and for most data sets it won't be able to store the data with any reasonable performance.
It's replicated to all nodes in the cluster (there's nothing like Redis clustering where the data can be distributed). All messages are processed atomically in full and are sequenced, so there are no real transactions. It can be USED to implement cluster-wide locks for your services (it's very good at that, in fact), and there are a lot of locking primitives on the znodes themselves to control which nodes access them.
Sure, but ZooKeeper fills a niche. It's a tool for making distributed applications play nice with multiple instances, not for storing/sharing large amounts of data. Compared to using an IMDG for this purpose, ZooKeeper will be faster, manages heartbeats and synchronization in a predictable way (with a lot of APIs for making this part easy), and has a "push" paradigm instead of "pull", so nodes are notified very quickly of changes.
The quotation from the linked question...
A canonical example of Zookeeper usage is distributed-memory computation
... is, IMO, a bit misleading. You would use it to orchestrate the computation, not provide the data. For example, let's say you had to process rows 1-100 of a table. You might put 10 ZK nodes up, with names like "1-10", "11-20", "21-30", etc. Client applications would be notified of this change automatically by ZK, and the first one would grab "1-10" and set an ephemeral node clients/192.168.77.66/processing/rows_1_10
The next application would see this and go for the next group to process. The actual data to compute would be stored elsewhere (ie Redis, SQL database, etc). If the node failed partway through the computation, another node could see this (after 30-60 seconds) and pick up the job again.
I'd say the canonical example of ZooKeeper is leader election, though. Let's say you have 3 nodes -- one is master and the other 2 are slaves. If the master goes down, a slave node must become the new leader. This type of thing is perfect for ZK.
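To make the leader-election example concrete, here is a minimal sketch using Apache Curator's LeaderLatch recipe (my choice, not the only option); the connection string, latch path and participant id are placeholders. Under the hood it relies on exactly the ephemeral sequential znodes described above:

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.framework.recipes.leader.LeaderLatch;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class LeaderElectionSketch {

        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk1:2181,zk2:2181,zk3:2181",          // placeholder ZooKeeper ensemble
                    new ExponentialBackoffRetry(1000, 3)); // retry policy for connection loss
            client.start();

            // Each candidate creates an ephemeral sequential znode under the latch path;
            // the lowest sequence number is the leader. If the leader's session dies,
            // the next candidate is promoted automatically.
            LeaderLatch latch = new LeaderLatch(client, "/myapp/leader", "node-1");
            latch.start();

            latch.await(); // blocks until this participant becomes the leader
            System.out.println("This node is now the leader");

            latch.close(); // step down, deleting our znode
            client.close();
        }
    }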
Consistency Guarantees
ZooKeeper is a high performance, scalable service. Both reads and write operations are designed to be fast, though reads are faster than writes. The reason for this is that in the case of reads, ZooKeeper can serve older data, which in turn is due to ZooKeeper's consistency guarantees:
Sequential Consistency
Updates from a client will be applied in the order that they were sent.
Atomicity
Updates either succeed or fail -- there are no partial results.
Single System Image
A client will see the same view of the service regardless of the server that it connects to.
Reliability
Once an update has been applied, it will persist from that time forward until a client overwrites the update. This guarantee has two corollaries:
If a client gets a successful return code, the update will have been applied. On some failures (communication errors, timeouts, etc) the client will not know if the update has applied or not. We take steps to minimize the failures, but the guarantee is only present with successful return codes. (This is called the monotonicity condition in Paxos.)
Any updates that are seen by the client, through a read request or successful update, will never be rolled back when recovering from server failures.
Timeliness
The clients' view of the system is guaranteed to be up-to-date within a certain time bound. (On the order of tens of seconds.) Either system changes will be seen by a client within this bound, or the client will detect a service outage.

infinispan - locking in distributed cache with hot rod, is it possible?

I have to use distributed cache and I would like to use Infinispan 5.3 for that.
I examined the different connection modes and I picked Hot Rod to implement the client-server communication. I also need to lock a specific key in the cache and unlock it later after processing (the places for locking and unlocking are in different classes in my application...).
I have read many documents, articles and forum entries regarding the issue, but I haven't found any solution so far. If I interpreted what I read properly, then it is not possible to lock a key manually over Hot Rod. I tried to handle the transactions manually, but I am not sure how to do that. Perhaps it is not possible in Infinispan 5.3...?
Or can you tell me a different connection mode (instead of Hot Rod) that provides client-server communication and solves the locking?
Thanks,
V.
Remote transactions (and locking) via HotRod are not supported in Infinispan 5.3.
See ISPN-375 and ISPN-848.