Concurrent threads in GemFire CacheWriter - gemfire

We are currently using Cassandra as our NoSQL database and GemFire as our in-memory database. We have been using the GemFire CacheWriter to insert the records in Cassandra. I would like your feedback on whether it is good engineering practice to use concurrent threads in a CacheWriter to insert/update records. Your feedback on this would be appreciated.
public class GenericWriter<K, V> extends CacheWriterAdapter<K, V> implements Declarable {

    private static Logger log = LoggerFactory.getLogger(GenericWriter.class);

    @Autowired
    private CassandraOperations cassandraOperations;

    ExecutorService executor = null;

    @Override
    public void beforeCreate(EntryEvent<K, V> e) {
        executor = Executors.newSingleThreadExecutor();
        String eventOperation = e.getOperation().toString();
        executor.submit(() -> {
            if (eventOperation.equals("CREATE") || eventOperation.equalsIgnoreCase("PUTALL_CREATE")) {
                try {
                    cassandraOperations.insert(e.getNewValue());
                } catch (CassandraConnectionFailureException | CassandraWriteTimeoutException
                        | CassandraInternalException cassException) {
                    // Cassandra connectivity/timeout failures are silently swallowed here
                } catch (Exception ex) {
                    log.error("Exception in GenericCacheWriter->" + ExceptionUtils.getStackTrace(ex));
                    throw ex;
                }
            }
        });
        executor.shutdown();
    }

    @Override
    public void init(Properties arg0) {
        // TODO Auto-generated method stub
    }
}

The CacheWriter handler is called synchronously, so the application does not continue until the handler returns. It is therefore not recommended to execute long-running operations inside this callback. If a long-running operation is needed, consider processing the operation asynchronously through an AsyncEventListener instead.
Using an ExecutorService to delegate the execution to a different thread is possible, but it is an anti-pattern: it no longer implements the fail-fast property, and the handling of the event is no longer synchronous, so its timing is not guaranteed relative to the application's completion of the event.
You can read more about this topic in the Geode Wiki, specifically in CacheWriter and CacheListener Best Practices.
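As a minimal sketch of that suggestion, assuming the Apache Geode packages (older GemFire releases use com.gemstone.gemfire.* instead) and the same Spring-injected CassandraOperations as in the question, the Cassandra write could live in an AsyncEventListener attached to an async event queue on the region:
import java.util.List;
import java.util.Properties;
import org.apache.geode.cache.Declarable;
import org.apache.geode.cache.asyncqueue.AsyncEvent;
import org.apache.geode.cache.asyncqueue.AsyncEventListener;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.cassandra.core.CassandraOperations;

public class CassandraAsyncEventListener implements AsyncEventListener, Declarable {

    private static final Logger log = LoggerFactory.getLogger(CassandraAsyncEventListener.class);

    @Autowired
    private CassandraOperations cassandraOperations;

    @Override
    public boolean processEvents(List<AsyncEvent> events) {
        for (AsyncEvent event : events) {
            try {
                if (event.getOperation().isCreate()) {
                    cassandraOperations.insert(event.getDeserializedValue());
                } else if (event.getOperation().isUpdate()) {
                    cassandraOperations.update(event.getDeserializedValue());
                }
            } catch (Exception ex) {
                log.error("Cassandra write failed, batch not acknowledged", ex);
                return false; // false tells the queue the batch was not processed
            }
        }
        return true; // batch handled, remove it from the queue
    }

    @Override
    public void close() {
    }

    @Override
    public void init(Properties props) {
    }
}
The async event queue itself (batch size, persistence, ordering) is configured on the region, which is where the delivery behaviour comes from rather than from a thread spawned inside the writer.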
Hope this helps.
Best regards.

Yes, it's a fine pattern but remove the Executor and partition your data such that all updates into GemFire go to one and only one node. Partition Cassandra the same way. Put a write lock around the Cassandra update. Use this only when your throughput is low.
If you need high throughput, use the AsyncEventListener and guarantee eventual consistency to your users. If you must use Executors in the AEL, use them in such a way that an exception is thrown in the main thread. If the update fails after a number of tries, write the failed entry to a different region with an expiration of a few seconds or a minute. When that entry expires, retry the operation. Keep doing this until the operation succeeds, and only then delete the entry.
You will also need to track version numbers and watch the old/new values of what you are updating, depending on whether the order of updates is important to you or not.
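A rough sketch of that retry-region idea, with the assumptions spelled out: the failed value is written to a "retry" region whose entries expire with a DESTROY action after a short time-to-live, a CacheListener re-attempts the Cassandra write when the expiration fires, and the region is configured so the old value is still available to the listener. All class names here are illustrative, not part of the original answer.
import org.apache.geode.cache.EntryEvent;
import org.apache.geode.cache.util.CacheListenerAdapter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.cassandra.core.CassandraOperations;

public class CassandraRetryListener extends CacheListenerAdapter<Object, Object> {

    @Autowired
    private CassandraOperations cassandraOperations;

    @Override
    public void afterDestroy(EntryEvent<Object, Object> event) {
        // only react to expiration-driven destroys, not explicit removes
        if (!event.getOperation().isExpiration()) {
            return;
        }
        Object value = event.getOldValue();
        try {
            cassandraOperations.insert(value);
            // success: the expired entry is already gone, nothing left to clean up
        } catch (Exception ex) {
            // still failing: put the value back so the TTL schedules another retry
            event.getRegion().put(event.getKey(), value);
        }
    }
}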

Related

handle separate transaction in java batch (JSR-352)

I'm using the jberet implementation of the JSR 352 Java batch specs.
I actually need a separate transaction for doing a single update, something like this:
class MyItemWriter implements ItemWriter {

    @Inject
    UserTransaction transaction;

    void resetLastProductsUpdateDate(String uidCli) throws BusinessException {
        try {
            if (transaction.getStatus() != Status.STATUS_ACTIVE) {
                transaction.begin();
            }
            final Customer customer = dao.findById(uidCli);
            customer.setLastUpdate(null);
            dao.persist(customer);
            transaction.commit();
        } catch (RollbackException | HeuristicMixedException | HeuristicRollbackException
                | SystemException | NotSupportedException e) {
            logger.error("error while updating user products last update");
            throw new BusinessException();
        }
    }
}
I first tried marking the resetLastProductsUpdateDate method as @Transactional(REQUIRES_NEW), but it didn't work.
My question is:
Is there a more elegant way to achieve this separate transaction without handling the transaction manually?
While UserTransaction works, EntityManager's transaction doesn't, and I don't get why.
The class below, which is injected from a Batchlet, works properly; why can't I get the @Transactional annotation to work on the resetLastProductsUpdateDate method instead?
public class DynamicQueryDAO {

    @Inject
    EntityManager entityManager;

    @Inject
    private Logger logger;

    @Transactional(Transactional.TxType.REQUIRED)
    public void executeQuery(String query) {
        logger.info("executing query: {}", query);
        final int output = entityManager.createNativeQuery(query).executeUpdate();
        logger.info("rows updated: {}", output);
    }
}
EDIT
Actually I guess UserTransaction isn't a good solution either, because it affects the whole ItemWriter transaction management. I still don't know how to deal with transaction isolation :(
In general, a batch application should avoid handling transactions directly. You can have your batch component throw a business exception under certain conditions, and configure your job.xml to trigger a retry upon this business exception. During retry, each individual item is processed and committed in its own chunk.
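As a hedged illustration of that configuration, a chunk step in job.xml can declare the business exception as retryable; the step id, artifact refs, and exception class name below are placeholders, not taken from the question:
<step id="updateCustomers">
    <chunk item-count="10" retry-limit="3">
        <reader ref="myItemReader"/>
        <processor ref="myItemProcessor"/>
        <writer ref="myItemWriter"/>
        <retryable-exception-classes>
            <include class="com.example.BusinessException"/>
        </retryable-exception-classes>
    </chunk>
</step>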

Reactive programming - running jobs in a cluster

I need to run some jobs in a cluster, only one at a time.
Because my team uses Hazelcast, I ended up with a solution based on
Hazelcast ILock implementation. For the purpose of the question, I am going to make a generalisation about it. Let's suppose we have the following interfaces (that could easily be implemented e.g. by Hazelcast or Redisson (Redis)):
public interface MyDistributedLock {
    boolean lock();
    void unlock();
    boolean isLockedByCurrentThread();
}

public interface MyLockDistributedFactory {
    MyDistributedLock getLock(String name);
}
And the lock method, which waits if the lock cannot be acquired:
private Mono<Void> lock(String name, Publisher<?> publisher, MyLockDistributedFactory myLockFactory) {
    // important to release the lock on the same thread as it was acquired
    Scheduler scheduler = Schedulers.newSingle(name.toLowerCase());
    return Mono.defer(() -> Mono.just(myLockFactory.getLock(name)))
        .publishOn(scheduler)
        .doOnNext(MyDistributedLock::lock)
        .doOnNext(lock -> LOGGER.info("Process acquired lock for resource {}", name))
        .flatMapMany(lock -> Flux.from(publisher))
        .publishOn(scheduler)
        .doFinally(signalType -> {
            MyDistributedLock lock = myLockFactory.getLock(name);
            if (signalType == SignalType.CANCEL) {
                // cancel ignores publishOn
                scheduler.schedule(() -> {
                    lock.unlock();
                    LOGGER.info("Process released lock for resource {} due to signal type {}", name, signalType);
                });
            } else if (lock.isLockedByCurrentThread()) {
                lock.unlock();
                LOGGER.info("Process released lock for resource {} due to signal type {}", name, signalType);
            }
        })
        .then();
}
And an example of a job:
private Mono<Void> someJobRunEveryOneHourOnEveryNodeInCluster() {
    MyLockDistributedFactory hazelcast = ...;
    return lock("some-job", Flux.just(1, 2), hazelcast)
        .repeatWhen(afterOneHour());
}
I wonder whether this is a good approach to using Project Reactor (and a correct implementation), or whether it should be done in a different way. Please advise.
It is a correct approach when using Reactor, because you took care of offloading the blocking portion onto a dedicated Scheduler/Thread.
But I'd say mutually exclusive code like this is not a very good fit for reactive programming in general: you lose one of the key benefits of doing more with fewer threads, and you risk blocking other parts of the application should you forget to publishOn a dedicated thread, etc.
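For reference, a minimal sketch of that isolation point, assuming the same MyDistributedLock and MyLockDistributedFactory interfaces from the question and a myLockFactory instance in scope: the blocking lock() call is confined to its own single-threaded Scheduler so it can never stall the shared reactive workers.
Scheduler lockScheduler = Schedulers.newSingle("distributed-lock");

Mono<MyDistributedLock> acquire = Mono
        .fromCallable(() -> {
            MyDistributedLock lock = myLockFactory.getLock("some-job");
            lock.lock(); // blocking call, runs only on lockScheduler
            return lock;
        })
        .subscribeOn(lockScheduler);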

How to wrap a Flux with a blocking operation in the subscribe?

In the documentation it is written that you should wrap blocking code into a Mono: http://projectreactor.io/docs/core/release/reference/#faq.wrap-blocking
But it is not written how to actually do it.
I have the following code:
@PostMapping(path = "some-path", consumes = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Mono<Void> doSomething(@Valid @RequestBody Flux<Something> something) {
    something.subscribe(item -> {
        // some blocking operation
    });
    // how to return Mono<Void> here?
}
The first problem I have here is that I need to return something but I can't.
If I returned Mono.empty(), for example, the request would be closed before the work of the Flux is done.
The second problem is: how do I actually wrap the blocking code as suggested in the documentation:
Mono blockingWrapper = Mono.fromCallable(() -> {
    return /* make a remote synchronous call */
});
blockingWrapper = blockingWrapper.subscribeOn(Schedulers.elastic());
You should not call subscribe within a controller handler; instead, build a reactive pipeline and return it. Ultimately, the HTTP client will request data (through the Spring WebFlux engine), and that's what subscribes to and requests data from the pipeline.
Subscribing manually will decouple the request processing from that other operation, which will 1) remove any guarantee about the order of operations and 2) break the processing if that other operation is using HTTP resources (such as the request body).
In this case, the source is not blocking; only the transform operation is. So we'd better use publishOn to signal that the rest of the chain should be executed on a specific Scheduler. If the operation here is I/O bound, then Schedulers.elastic() is the best choice; if it's CPU-bound, then Schedulers.parallel() is better. Here's an example:
@PostMapping(path = "/some-path", consumes = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Mono<Void> doSomething(@Valid @RequestBody Flux<Something> something) {
    return something.collectList()
        .publishOn(Schedulers.elastic())
        .map(things -> processThings(things))
        .then();
}

public ProcessingResult processThings(List<Something> things) {
    //...
}
For more information on that topic, check out the Scheduler section in the reactor docs. If your application tends to do a lot of things like this, you're losing a lot of the benefits of reactive streams and you might consider switching to a Servlet-based model where you can configure thread pools accordingly.

Injecting Variables into a running Process

Is there a way to inject a variable into a running process without a process listening for RPC requests?
For example if a process was running and using an environment variable, could I change that environment variable at runtime and make the process use the new value?
Are there alternative solutions for dynamically changing variables in a running process? Assume that this process is something like a PHP process or a JavaScript (Node.js) process, so I can change the source code... etc.
I think this is similar to passing state or communicating with another process, but I need a really lightweight way of doing so: without going over the network, without libraries, and preferably without setting up an RPC server.
Solution does not have to be cross-platform. Prefer Linux.
You can do it in Java. Imagine this is your thread class:
public class ThreadClass extends Thread {

    private volatile boolean state; // volatile so the change is visible to the running thread

    ThreadClass(boolean b) {
        state = b;
    }

    public void stopThread() {
        state = false;
    }

    public void run() {
        while (state) {
            // do whatever you want here
        }
    }
}
Now all you have to do is start this thread from your main class:
ThreadClass thread = new ThreadClass(true);
thread.start();
And if you want to change the value of state, call the stopThread method on the thread like so:
thread.stopThread();
This will change the state of the boolean flag while the thread is running.
It appears that local IPC mechanisms like shared memory are the way to go: Fastest technique to pass messages between processes on Linux?
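As a hedged sketch of that shared-memory route (the file path and one-byte flag layout are made up for the example), two Java processes can map the same file and one of them flips a byte that the other polls:
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SharedFlag {
    public static void main(String[] args) throws Exception {
        try (FileChannel channel = FileChannel.open(Paths.get("/tmp/shared-flag"),
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // both processes map the same 1-byte region of the file
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, 1);
            if (args.length > 0 && args[0].equals("set")) {
                buffer.put(0, (byte) 1); // "writer" process: flip the flag
            } else {
                while (buffer.get(0) == 0) { // "reader" process: poll until the flag flips
                    Thread.sleep(100);
                }
                System.out.println("Flag was set by another process");
            }
        }
    }
}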

Apache Camel : GBs of data from database routed to JMS endpoint

I've done a few small projects in camel now but one thing I'm struggling to understand is how to deal with big data (that doesn't fit into memory) when consuming in camel routes.
I have a database containing a couple of GBs worth of data that I would like to route using camel. Obviously reading all data into memory isn't an option.
If I were doing this as a standalone app, I would have code that paged through the data and sent chunks to my JMS endpoint. I'd like to use Camel as it provides a nice pattern. If I were consuming from a file I could use the streaming() call.
Also, should I use camel-sql/camel-jdbc/camel-jpa, or use a bean to read from my database?
Hope everyone is still with me. I'm more familiar with the Java DSL but would appreciate any help/suggestions people can provide.
Update : 2-MAY-2012
So I've had some time to play around with this and I think what I'm actually doing is abusing the concept of a Producer so that I can use it in a route.
public class MyCustomRouteBuilder extends RouteBuilder {

    public void configure() {
        from("timer:foo?period=60s").to("mycustomcomponent:TEST");

        from("direct:msg").process(new Processor() {
            public void process(Exchange ex) throws Exception {
                System.out.println("Receiving value: " + ex.getIn().getBody());
            }
        });
    }
}
My producer looks something like the following. For clarity I've not included the CustomEndpoint or CustomComponent as it just seems to be a thin wrapper.
public class MyCustomProducer extends DefaultProducer {

    Endpoint e;
    CamelContext c;

    public MyCustomProducer(Endpoint epoint) {
        super(epoint);
        this.e = epoint;
        this.c = e.getCamelContext();
    }

    public void process(Exchange ex) throws Exception {
        Endpoint directEndpoint = c.getEndpoint("direct:msg");
        ProducerTemplate t = new DefaultProducerTemplate(c);
        // Simulate streaming operation / chunking of BIG data.
        for (int i = 0; i < 20; i++) {
            t.start();
            String s = "Value " + i;
            t.sendBody(directEndpoint, s);
            t.stop();
        }
    }
}
Firstly the above doesn't seem very clean. It seems like the cleanest way to perform this would be to populate a jms queue (in place of direct:msg) via a scheduled quartz job that my camel route then consumes so that I can have more flexibility over the message size received within my camel pipelines. However I quite liked the semantics of setting up time based activations as part of the Route.
Does anyone have any thoughts on the best way to do this?
In my understanding, all you need to do is:
from("jpa:SomeEntity" +
"?consumer.query=select e from SomeEntity e where e.processed = false" +
"&maximumResults=150" +
"&consumeDelete=false")
.to("jms:queue:entities");
maximumResults defines a limit of how many entities you get per query.
When you finish the processing of an entity instance, you need to set e.processed = true; and persist() it, so that the entity won't be processed again.
One way to do that is with the @Consumed annotation:
class SomeEntity {
    @Consumed
    public void markAsProcessed() {
        setProcessed(true);
    }
}
Another thing you need to be careful with is how you serialize the entity before sending it to the queue. You might need to use an enricher between the from and to.
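For instance, one hedged way to handle the serialization concern, assuming camel-jackson is on the classpath (a custom Processor or a content enricher would work as well), is to marshal each entity to JSON before it reaches the JMS endpoint:
from("jpa:SomeEntity" +
        "?consumer.query=select e from SomeEntity e where e.processed = false" +
        "&maximumResults=150" +
        "&consumeDelete=false")
    // convert the JPA entity to JSON so the queue never carries a live entity
    // with lazily-loaded state; JsonLibrary comes from org.apache.camel.model.dataformat
    .marshal().json(JsonLibrary.Jackson)
    .to("jms:queue:entities");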