Reactive programming - running jobs in a cluster - locking

I need to run some jobs in a cluster, only one at a time.
Because my team uses Hazelcast, I ended up with a solution based on Hazelcast's ILock implementation.
For the purpose of this question, I am going to generalise it. Let's suppose we have the following interfaces (which could easily be implemented by, e.g., Hazelcast or Redisson (Redis)):
public interface MyDistributedLock {
boolean lock();
void unlock();
boolean isLockedByCurrentThread();
}
public interface MyLockDistributedFactory {
MyDistributedLock getLock(String name);
}
And a lock method that waits if the lock cannot be acquired:
private Mono<Void> lock(String name, Publisher<?> publisher, MyLockDistributedFactory myLockFactory) {
    // important to release the lock on the same thread
    // it was acquired on
    Scheduler scheduler = Schedulers.newSingle(name.toLowerCase());
    return Mono.defer(() -> Mono.just(myLockFactory.getLock(name)))
            .publishOn(scheduler)
            .doOnNext(MyDistributedLock::lock)
            .doOnNext(lock -> LOGGER.info("Process acquired lock for resource {}", name))
            .flatMapMany(lock -> Flux.from(publisher))
            .publishOn(scheduler)
            .doFinally(signalType -> {
                MyDistributedLock lock = myLockFactory.getLock(name);
                if (signalType == SignalType.CANCEL) {
                    // cancel ignores publishOn
                    scheduler.schedule(() -> {
                        lock.unlock();
                        LOGGER.info("Process released lock for resource {} due to signal type {}", name, signalType);
                    });
                } else if (lock.isLockedByCurrentThread()) {
                    lock.unlock();
                    LOGGER.info("Process released lock for resource {} due to signal type {}", name, signalType);
                }
            })
            .then();
}
And an example of a job:
private Mono<Void> someJobRunEveryOneHourOnEveryNodeInCluster() {
MyLockDistributedFactory hazelcast = ...;
return lock("some-job", Flux.just(1,2), hazelcast)
.repeatWhen(afterOneHour());
}
I wonder whether this is a good way of using Project Reactor (and a correct implementation), or whether it should be done differently. Please advise.

It is a correct approach when using Reactor, because you took care of offloading the blocking portion onto a dedicated Scheduler/Thread.
But I'd say mutually exclusive code like this is not a very good fit for reactive programming in general: you lose one of the key benefits (doing more with fewer threads), you risk blocking other parts of the application should you forget to publishOn a dedicated thread, etc.
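A minimal sketch of the property the answer relies on, written with plain java.util.concurrent instead of Reactor so it runs on its own. `LocalLock` is a hypothetical single-JVM stand-in for the question's `MyDistributedLock` (backed by `ReentrantLock`, not Hazelcast or Redisson), and `runExclusively` is a hypothetical helper: acquisition, the job and the release all happen on one dedicated thread, which is the role the `Schedulers.newSingle(...)` scheduler plays in the question.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

// Same shape as the question's interface, repeated so the sketch compiles on its own.
interface MyDistributedLock {
    boolean lock();
    void unlock();
    boolean isLockedByCurrentThread();
}

// Hypothetical single-JVM stand-in (not Hazelcast or Redisson).
final class LocalLock implements MyDistributedLock {
    private final ReentrantLock delegate = new ReentrantLock();

    @Override public boolean lock() { delegate.lock(); return true; }
    @Override public void unlock() { delegate.unlock(); } // throws if the caller is not the owner
    @Override public boolean isLockedByCurrentThread() { return delegate.isHeldByCurrentThread(); }
    boolean isLocked() { return delegate.isLocked(); }
}

final class ExclusiveJobRunner {
    // Hypothetical helper: lock, run the job and unlock on one dedicated thread, so the
    // release happens on the acquiring thread and the caller never blocks in lock().
    static <T> CompletableFuture<T> runExclusively(MyDistributedLock lock, Callable<T> job) {
        ExecutorService lockThread = Executors.newSingleThreadExecutor();
        CompletableFuture<T> result = new CompletableFuture<>();
        lockThread.execute(() -> {
            T value = null;
            Exception failure = null;
            lock.lock();
            try {
                value = job.call();
            } catch (Exception e) {
                failure = e;
            } finally {
                lock.unlock();         // same thread that called lock()
                lockThread.shutdown(); // one-shot thread, like the per-call newSingle scheduler
            }
            // Complete only after the release, so callers observe the unlocked state.
            if (failure != null) {
                result.completeExceptionally(failure);
            } else {
                result.complete(value);
            }
        });
        return result;
    }
}
```

As with any `ReentrantLock`, releasing from another thread throws `IllegalMonitorStateException`, which is why the release has to be routed back to the acquiring thread.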

Related

Spring Mono<User> as constructor param - how to "cache" object

I'm drawing a blank on how to do this in Project Reactor with Spring Boot:
class BakerUserDetails(val bakerUser: Mono<BakerUser>): UserDetails {
override fun getPassword(): String {
TODO("Not yet implemented")
// return ???.password
}
override fun getUsername(): String {
TODO("Not yet implemented")
// return ???.username
}
}
How do I make this work? Do I just call bakerUser.block().password, bakerUser.block().username and so on, or is there a better way to implement these methods?
Currently, I'm doing something like this but it seems strange:
private var _user: BakerUser? = null
private var user: BakerUser? = null
get() {
if(_user == null){
_user = bakerUser.block()
}
return _user
}
override fun getAuthorities(): MutableCollection<out GrantedAuthority> {
return mutableSetOf(SimpleGrantedAuthority("USER"))
}
override fun getPassword(): String {
return user!!.password!!
}
I'm not well versed in Kotlin, but I can tell you that you should not pass a Mono to the UserDetails object.
A Mono<T> is sort of like a future/promise, which means that there is nothing in it yet. So if you want something out of it, you either block, which means you wait until there is something in it, or you subscribe, which means you wait asynchronously until there is something in it. The latter can be bad. Think of it like starting a job on the side: if you start a job and then quit the program, the job is never executed.
Or, if you start something on another thread and the program returns/exits, the main thread dies and the work you started may never complete.
In the reactive world we usually talk about publishers and consumers. A Flux/Mono is a publisher, and you declare a pipeline for what should happen when something is resolved. To kick off the process, the consumer needs to subscribe to the publisher.
In a server, this usually means that the web page making the request is the consumer, and it subscribes to the server, which in this case is the publisher.
So what I'm getting at is that you should almost never subscribe in your application, unless your application is the one that starts the consumption; for instance, a cron job in your server that consumes another server.
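The "nothing happens until someone subscribes" idea can be shown with a toy type. `LazyValue` below is hypothetical and far simpler than Reactor's `Mono` (no threading, errors or backpressure); it only demonstrates that `map` assembles a recipe and that the work runs when a consumer subscribes.

```java
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical toy type, NOT Reactor's Mono: a pipeline is just a recipe
// until a consumer subscribes.
final class LazyValue<T> {
    private final Supplier<T> source;

    private LazyValue(Supplier<T> source) { this.source = source; }

    static <T> LazyValue<T> fromSupplier(Supplier<T> source) {
        return new LazyValue<>(source);
    }

    // Assembly: wraps the recipe in another step, executes nothing.
    <R> LazyValue<R> map(Function<? super T, ? extends R> mapper) {
        return new LazyValue<>(() -> mapper.apply(source.get()));
    }

    // Subscription: the consumer kicks off the work (re-run on every subscribe).
    void subscribe(Consumer<? super T> consumer) {
        consumer.accept(source.get());
    }
}
```

Declaring `fromSupplier(...).map(...)` performs no lookup at all; only `subscribe` does, which is why the final consumer (in a server, usually the framework acting for the client) should be the one subscribing.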
Let's look at your problem:
You have not posted your code, so I'm going to do some guesswork here, but I'm guessing you are getting a user from a database.
public Mono<BakerUserDetails> loadUserByUsername(String username) {
    Mono<BakerUser> user = userRepository.findByUsername(username);
    // Here we declare our pipeline: map transforms the emitted user synchronously
    // (flatMap is only needed when the transformation itself returns a Mono)
    Mono<BakerUserDetails> bakerUser = user.map(u -> new BakerUserDetails(u));
    return bakerUser;
}
I wrote this off the top of my head, without a compiler.
So don't pass in the Mono<T>; do your transformations using operators like map or flatMap. And don't subscribe in your application unless your server is the final consumer.

How to wrap a Flux with a blocking operation in the subscribe?

The documentation says that you should wrap blocking code into a Mono: http://projectreactor.io/docs/core/release/reference/#faq.wrap-blocking
But it does not say how to actually do it.
I have the following code:
@PostMapping(path = "some-path", consumes = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Mono<Void> doSomething(@Valid @RequestBody Flux<Something> something) {
    something.subscribe(item -> {
        // some blocking operation
    });
    // how to return Mono<Void> here?
}
The first problem I have here is that I need to return something, but I can't.
If I returned Mono.empty(), for example, the request would be closed before the work on the flux is done.
The second problem is: how do I actually wrap the blocking code like it is suggested in the documentation:
Mono blockingWrapper = Mono.fromCallable(() -> {
return /* make a remote synchronous call */
});
blockingWrapper = blockingWrapper.subscribeOn(Schedulers.elastic());
You should not call subscribe within a controller handler; just build a reactive pipeline and return it. Ultimately, the HTTP client will request data (through the Spring WebFlux engine), and that is what subscribes to the pipeline and requests data from it.
Subscribing manually will decouple the request processing from that other operation, which will 1) remove any guarantee about the order of operations and 2) break the processing if that other operation is using HTTP resources (such as the request body).
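The difference between subscribing inside the handler and returning the pipeline can be sketched with `CompletableFuture` standing in for `Mono<Void>`. This is an analogy, not WebFlux code, and `HandlerSketch` with its methods is hypothetical: the fire-and-forget version reports completion immediately (like returning `Mono.empty()`), while the other returns a signal that completes only once the blocking work has finished.

```java
import java.util.concurrent.*;

final class HandlerSketch {
    // Like calling subscribe() in the handler and returning Mono.empty():
    // the returned signal is already complete while the work may still be running.
    static CompletableFuture<Void> fireAndForget(Runnable blockingWork, Executor pool) {
        pool.execute(blockingWork);
        return CompletableFuture.completedFuture(null);
    }

    // Like returning the pipeline: the signal completes when the work completes.
    static CompletableFuture<Void> returnPipeline(Runnable blockingWork, Executor pool) {
        return CompletableFuture.runAsync(blockingWork, pool);
    }
}
```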
In this case, the source is not blocking; only the transform operation is. So it's better to use publishOn to signal that the rest of the chain should be executed on a specific Scheduler. If the operation is I/O-bound, an elastic scheduler is the best choice (Schedulers.elastic(), or Schedulers.boundedElastic() on Reactor 3.3 and later); if it's CPU-bound, Schedulers.parallel() is better. Here's an example:
@PostMapping(path = "/some-path", consumes = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Mono<Void> doSomething(@Valid @RequestBody Flux<Something> something) {
    return something.collectList()
            // boundedElastic() replaces Schedulers.elastic(), which was deprecated in Reactor 3.4
            .publishOn(Schedulers.boundedElastic())
            .map(things -> processThings(things))
            .then();
}
public ProcessingResult processThings(List<Something> things) {
//...
}
For more information on that topic, check out the Scheduler section in the reactor docs. If your application tends to do a lot of things like this, you're losing a lot of the benefits of reactive streams and you might consider switching to a Servlet-based model where you can configure thread pools accordingly.
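If you do end up managing threads yourself, the I/O-bound versus CPU-bound distinction from the answer maps onto how a pool is sized. This is a rough sketch with plain executors; the class `Pools`, its method names and the pool sizes are illustrative assumptions, not Reactor's internals, and `things.size()` merely stands in for the hypothetical `processThings`.

```java
import java.util.List;
import java.util.concurrent.*;

final class Pools {
    // CPU-bound work: about one thread per core, similar in spirit to Schedulers.parallel().
    static ExecutorService cpuBound() {
        return Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    }

    // Blocking I/O: threads mostly wait, so allow more of them, but keep a cap
    // and let idle threads time out (in the spirit of Schedulers.boundedElastic()).
    static ExecutorService ioBound(int maxThreads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    // Roughly what collectList().publishOn(...).map(...) does: take the collected
    // items and run the processing step on the chosen pool, not the caller's thread.
    static CompletableFuture<Integer> processOn(List<String> items, ExecutorService pool) {
        return CompletableFuture.completedFuture(items)
                .thenApplyAsync(things -> things.size(), pool); // stand-in for processThings
    }
}
```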

Concurrent threads in GemFire CacheWriter

We are currently using Cassandra as our NoSQL database and GemFire as an in-memory database. We have been using a GemFire CacheWriter to insert the records into Cassandra. I would like your feedback on whether it's good engineering practice to use concurrent threads in a CacheWriter to insert/update records.
public class GenericWriter<K, V> extends CacheWriterAdapter<K, V> implements Declarable {
private static Logger log = LoggerFactory.getLogger(GenericWriter.class);
@Autowired
private CassandraOperations cassandraOperations;
ExecutorService executor = null;
@Override
public void beforeCreate(EntryEvent<K, V> e) {
executor = Executors.newSingleThreadExecutor();
executor.submit(() -> {
if (eventOperation.equals("CREATE") || eventOperation.equalsIgnoreCase("PUTALL_CREATE")) {
try {
cassandraOperations.insert(e.getNewValue());
} catch (CassandraConnectionFailureException | CassandraWriteTimeoutException
| CassandraInternalException cassException) {
} catch (Exception ex) {
log.error("Exception in GenericCacheWriter->" + ExceptionUtils.getStackTrace(ex));
throw ex;
}
}
});
executor.shutdown();
}
@Override
public void init(Properties arg0) {
// TODO Auto-generated method stub
}
}
The CacheWriter handler is called synchronously, so the application does not continue until the handler returns. Therefore, it is not recommended to execute long-running operations inside this listener. If a long-running operation is needed, consider processing it asynchronously through an AsyncEventListener instead.
Using an ExecutorService to delegate the execution to a different thread is possible, but it is an anti-pattern: the handler no longer fails fast, and the event is no longer handled synchronously, so its timing relative to the application's completion of the event is not guaranteed.
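To make the fail-fast point concrete, here is a small sketch in which a hypothetical `failingInsert` stands in for `cassandraOperations.insert(...)`. The synchronous handler surfaces the failure to its caller, while the executor-based one returns normally and the exception only lives inside a `Future` (in the question's code, the result of `submit` is discarded, so nobody ever sees it).

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class WriterSketch {
    // Hypothetical stand-in for a failing cassandraOperations.insert(...).
    static void failingInsert(String value) {
        throw new IllegalStateException("write failed for " + value);
    }

    // Synchronous handler: the exception reaches the code that triggered the event.
    static void beforeCreateSync(String value) {
        failingInsert(value);
    }

    // Executor hand-off: returns normally; the failure is captured in the Future.
    static Future<?> beforeCreateAsync(String value, ExecutorService executor) {
        return executor.submit(() -> failingInsert(value));
    }
}
```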
You can read more about this topic in the Geode Wiki, specifically in CacheWrite and CacheListener Best Practices.
Hope this helps.
Best regards.
Yes, it's a fine pattern, but remove the Executor and partition your data so that all updates into GemFire go to one and only one node. Partition Cassandra the same way. Put a write lock around the Cassandra update. Use this only when your throughput is low.
If you need high throughput, use the AsyncEventListener and guarantee eventual consistency to your users. If you must use Executors in the AEL, use them in a way that propagates exceptions to the main thread. If the update still fails after a number of tries, write the failed entry to a different region with an expiration of a few seconds or a minute. When it expires, retry the operation. Keep doing this until the update succeeds, and then, and only then, delete the expired entry.
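The retry-then-park loop described above might look roughly like this. It is a hypothetical sketch: a `ConcurrentHashMap` stands in for the separate GemFire region, the caller passes the current time instead of relying on region expiration, and `write` is whatever performs the Cassandra update.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

final class RetryingWriter {
    private final Predicate<String> write; // hypothetical: returns true when the update succeeded
    private final int maxAttempts;
    private final long retryDelayMillis;
    final Map<String, Long> parked = new ConcurrentHashMap<>(); // key -> time it was parked

    RetryingWriter(Predicate<String> write, int maxAttempts, long retryDelayMillis) {
        this.write = write;
        this.maxAttempts = maxAttempts;
        this.retryDelayMillis = retryDelayMillis;
    }

    // Try a few times; if every attempt fails, park the key for a later retry.
    void handle(String key, long nowMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (write.test(key)) {
                parked.remove(key); // removed only once the update has succeeded
                return;
            }
        }
        parked.put(key, nowMillis);
    }

    // Plays the role of the expiration callback: retry keys that were parked long enough ago.
    void retryExpired(long nowMillis) {
        for (Map.Entry<String, Long> e : parked.entrySet()) {
            if (nowMillis - e.getValue() >= retryDelayMillis) {
                handle(e.getKey(), nowMillis);
            }
        }
    }
}
```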
If the order of updates is important to you, you will need to track version numbers and compare old and new values to know what you are updating.
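One way to honour update order, sketched under the assumption that every update carries a monotonically increasing version number (a hypothetical scheme; the answer does not prescribe one): apply an update atomically and only if its version is newer than the stored one, so a late-arriving older update is dropped.

```java
import java.util.concurrent.ConcurrentHashMap;

final class VersionedStore {
    private static final class Versioned {
        final long version;
        final String value;
        Versioned(long version, String value) { this.version = version; this.value = value; }
    }

    private final ConcurrentHashMap<String, Versioned> data = new ConcurrentHashMap<>();

    // Atomically applies the update only if it is newer; returns whether it was applied.
    boolean apply(String key, long version, String value) {
        boolean[] applied = {false};
        data.compute(key, (k, current) -> {
            if (current == null || version > current.version) {
                applied[0] = true;
                return new Versioned(version, value);
            }
            return current; // stale or duplicate update: keep the newer value
        });
        return applied[0];
    }

    String get(String key) {
        Versioned v = data.get(key);
        return v == null ? null : v.value;
    }
}
```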

notify listener inside or outside inner synchronization

I am struggling with a decision. I am writing a thread-safe library/API. Listeners can be registered, so the client is notified when something interesting happens. Which of the two implementations is more common?
class MyModule {
protected Listener listener;
protected void somethingHappens() {
synchronized(this) {
... do useful stuff ...
listener.notify();
}
}
}
or
class MyModule {
protected Listener listener;
protected void somethingHappens() {
Listener l = null;
synchronized(this) {
... do useful stuff ...
l = listener;
}
l.notify();
}
}
In the first implementation, the listener is notified inside the synchronization. In the second implementation, this is done outside the synchronization.
I feel that the second one is advisable, as it leaves less room for potential deadlocks. But I am having trouble convincing myself.
A downside of the second implementation is that the client might receive 'incorrect' notifications, which happens if it accessed and changed the module before the l.notify() statement. For example, if it asked the module to stop sending notifications, this notification is sent anyway. This is not the case in the first implementation.
Thanks a lot.
It depends on where you get the listener in your method, how many listeners you have, and how listeners subscribe/unsubscribe.
Assuming from your example that you have only one listener, you might be better off using critical sections (or monitors) for different parts of the class rather than locking the entire object.
You could have one lock for performing tasks within the method that are specific to the object/task at hand, and one for listener subscribe/unsubscribe/notify (to ensure that the listener is not changed during a notification).
I would also use a ReadWriteLock to protect your listener references (either a single listener or a list of listeners).
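A sketch of a listener registry guarded by a `ReentrantReadWriteLock`. `ModuleListener` and `onEvent` are hypothetical names; a Java listener cannot declare its own `notify()`, because `Object.notify()` is final and wakes threads waiting on a monitor. This version copies the listener list under the read lock and calls the listeners after releasing it, so a listener may unsubscribe from inside its callback without deadlocking (a `ReentrantReadWriteLock` cannot be upgraded from read to write). The trade-off is the one raised in the question: a listener removed during `fire` can still receive that one event.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical listener type.
interface ModuleListener {
    void onEvent(String event);
}

final class ListenerRegistry {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<ModuleListener> listeners = new ArrayList<>();

    void subscribe(ModuleListener l) {
        lock.writeLock().lock();
        try { listeners.add(l); } finally { lock.writeLock().unlock(); }
    }

    void unsubscribe(ModuleListener l) {
        lock.writeLock().lock();
        try { listeners.remove(l); } finally { lock.writeLock().unlock(); }
    }

    // Snapshot the references under the read lock, then notify with no lock held.
    void fire(String event) {
        List<ModuleListener> snapshot;
        lock.readLock().lock();
        try { snapshot = new ArrayList<>(listeners); } finally { lock.readLock().unlock(); }
        for (ModuleListener l : snapshot) {
            l.onEvent(event);
        }
    }
}
```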
Answering your comment:
I think you should notify the listener after you have unlocked the class. The notification could cause a different thread to try to gain access to the class, which, under certain circumstances, it may not be able to do, leading to deadlock.
Notifying a listener (if protected as I have described) should not hold up any other thread that requires the facilities of the class. The best strategy is to create locks that are specific to the state of the class and locks that are specific to safe notification.
Take your example of suspending notifications: this could be covered by the lock that governs notifications. If a different thread 'suspends' notifications, either the suspend is processed first or the current notification completes first. If the other thread suspends notifications between the task being processed and the notification happening, l.notify() will not be called.
Listener l = null;
synchronized (processLock_) {
    ... do stuff ...
    synchronized (notifyLock_) {
        l = listener;
    }
}
//
// current thread preempted by other thread that suspends notification here.
//
synchronized (notifyLock_) { // ideally use a ReadWriteLock here...
    l = allowNotify_ ? l : null;
}
if (l != null)
    l.notify();
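A compilable version of the two-lock approach above, keeping its structure: `processLock` guards the module's state, `notifyLock` guards the listener reference and the `allowNotify` flag, and the listener is called with no lock held. `Listener.onEvent()` is a hypothetical stand-in for `notify()`, since `Object.notify()` is final and cannot be redefined. Note that the re-check narrows the window rather than closing it: a suspend that lands between the re-check and the call still lets one notification through.

```java
// Hypothetical listener type.
interface Listener {
    void onEvent();
}

final class MyModule {
    private final Object processLock = new Object(); // guards the module's own state
    private final Object notifyLock = new Object();  // guards listener and allowNotify
    private Listener listener;
    private boolean allowNotify = true;
    private int state;

    void setListener(Listener l) {
        synchronized (notifyLock) { listener = l; }
    }

    void suspendNotifications() {
        synchronized (notifyLock) { allowNotify = false; }
    }

    void somethingHappens() {
        Listener l;
        synchronized (processLock) {
            state++; // "... do stuff ..."
            synchronized (notifyLock) { l = listener; }
        }
        // Another thread may suspend notifications here, so check again.
        synchronized (notifyLock) {
            l = allowNotify ? l : null;
        }
        if (l != null) {
            l.onEvent(); // called with no lock held
        }
    }

    int state() {
        synchronized (processLock) { return state; }
    }
}
```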

.NET 4.0 Threading.Tasks

I've recently started working on a new application which will utilize task parallelism. I have just begun writing a tasking framework, but have recently seen a number of posts on SO regarding the new System.Threading.Tasks namespace which may be useful to me (and I would rather use an existing framework than roll my own).
However, looking over MSDN, I haven't seen how, or if, I can implement the functionality I'm looking for:
Dependency on other tasks completing.
Able to wait on an unknown number of tasks performing the same action (maybe wrapped in the same task object, which is invoked multiple times)
Set the maximum number of concurrent instances of a task (since they use a shared resource, there is no point running more than one at once)
Hint at priority, or have the scheduler place tasks with a lower maximum number of concurrent instances at a higher priority (so as to keep said resource in use as much as possible)
Edit: ability to vary the priority of tasks which are performing the same action (a pretty poor example, but PredictWeather(Tomorrow) will have a higher priority than PredictWeather(NextWeek))
Can someone point me towards an example / tell me how I can achieve this? Cheers.
C# use case (typed directly into SO, so please forgive any syntax errors/typos):
Note: Do() / DoAfter() shouldn't block the calling thread.
class Application
{
Task LeafTask = new Task (LeafWork) {PriorityHint = High, MaxConcurrent = 1};
var Tree = new TaskTree (LeafTask);
Task TraverseTask = new Task (Tree.Traverse);
Task WorkTask = new Task (MoreWork);
Task RunTask = new Task (Run);
Object SharedLeafWorkObject = new Object ();
void Entry ()
{
RunTask.Do ();
RunTask.Join (); // Use this thread for task processing until all invocations of RunTask are complete
}
void Run(){
TraverseTask.Do ();
// Wait for TraverseTask to make sure all leaf tasks are invoked before waiting on them
WorkTask.DoAfter (new [] {TraverseTask, LeafTask});
if (running){
RunTask.DoAfter (WorkTask); // Keep at least one RunTask alive to prevent Join from 'unblocking'
}
else
{
TraverseTask.Join();
WorkTask.Join ();
}
}
void LeafWork (Object leaf){
lock (SharedLeafWorkObject) // Fake a shared resource
{
Thread.Sleep (200); // 'work'
}
}
void MoreWork ()
{
Thread.Sleep (2000); // this one takes a while
}
}
class TaskTreeNode<TItem>
{
Task LeafTask; // = Application::LeafTask
TItem Item;
void Traverse ()
{
if (isLeaf)
{
// LeafTask set in C-Tor or elsewhere
LeafTask.Do(this.Item);
//Edit
//LeafTask.Do(this.Item, this.Depth); // Deeper items get higher priority
return;
}
foreach (var child in this.children)
{
child.Traverse ();
}
}
}
There are numerous examples here:
http://code.msdn.microsoft.com/ParExtSamples
There's a great white paper which covers a lot of the details you mention above here:
"Patterns for Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4"
http://www.microsoft.com/downloads/details.aspx?FamilyID=86b3d32b-ad26-4bb8-a3ae-c1637026c3ee&displaylang=en
Off the top of my head, I think you can do all the things you list in your question.
Dependencies etc.: Task.WaitAll(Task[] tasks), or Task.Factory.ContinueWhenAll(...) if the calling thread must not block.
Scheduler: the library supports numerous options for limiting the number of threads in use and supports providing your own scheduler. I would avoid altering the priority of threads if at all possible; this is likely to have a negative impact on the scheduler, unless you provide your own.