Spring Flux stages execute on a different thread when using reactive Redis template - redis

When one of the intermediate stages uses the reactive Redis template, the thread in the subsequent stages changes.
Does that mean the reactive Redis template has its own scheduler, which then affects every stage moving forward?
If the first point is true, should I switch to my own scheduler for the processing that follows?
Code example
return Flux.fromArray(new String[]{"1", "2", "3"})
        .map(s -> s.toUpperCase())
        .flatMap(s -> {
            System.out.println(Thread.currentThread().getName());
            return redisTemplate.opsForValue().set(s, s);
        })
        .map(aBoolean -> {
            System.out.println(Thread.currentThread().getName());
            return aBoolean.booleanValue();
        })
        .log();
The println inside the flatMap prints reactor-http-nio-3.
The println in the map after the Redis call executes on lettuce-nioEventLoop-5-1.
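If the downstream work should indeed move off the Lettuce event loop, one option is a publishOn right after the Redis call. A minimal sketch, assuming Reactor's Schedulers API; the scheduler name is made up for illustration:
// Hedged sketch: hop off lettuce-nioEventLoop-* once the Redis write has completed.
// "post-redis" is an illustrative name, not something from the original code.
Scheduler postRedis = Schedulers.newBoundedElastic(4, 100, "post-redis");

return Flux.fromArray(new String[]{"1", "2", "3"})
        .map(s -> s.toUpperCase())
        .flatMap(s -> redisTemplate.opsForValue().set(s, s))
        .publishOn(postRedis) // every stage below now runs on post-redis-* threads
        .map(aBoolean -> {
            System.out.println(Thread.currentThread().getName());
            return aBoolean.booleanValue();
        })
        .log();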

Related

How do you close a cold observable gracefully after a certain time has elapsed from subscribe?

Assume the code below, which processes data from a paginatedAPI (an external service).
Flowable<Data> process = Flowable.generate(
        () -> new State(),
        new BiConsumer<State, Emitter<Data>>() {
            @Override
            public void accept(State state, Emitter<Data> dataEmitter) {
                // get data from upstream service
                // this calls dataEmitter.onNext() internally
                pageId = paginatedAPI.get(pageId, dataEmitter);
                // process
                ...
                // update state
                state.updatePageId(pageId);
            }
        })
        .subscribeOn(Schedulers.from(executor));
Now, since this is created via .generate, accept will only be called when the subscriber is ready for the next piece of data.
I have full control over what I can add to state, but I can't change paginatedAPI.
Requirement:
After a time T from subscription:
a) iterate through all pages without sending them to the subscriber and call paginatedAPI.close()
b) provide the subscriber with the data from paginatedAPI.close()
If the subscriber disconnects before time T, then:
a) iterate through all pages without sending them to the subscriber and call paginatedAPI.close()
I don't understand how to bring the concept of time since subscription into controlling the Flowable's logic.
Also, accept can call onNext at most once, so how can I finish paging through the paginatedAPI when that requires calling onNext multiple times?
Edit: added details on the emitter and the internal onNext call in paginatedAPI.get(pageId, dataEmitter);
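One way to bring time since subscription into the picture is to cut the stream with a timer and do the cleanup in a terminal callback. A rough sketch with RxJava operators: drainAndClose() is a hypothetical helper (not from the post) that pages through the remaining pageIds without emitting and then calls paginatedAPI.close(); delivering close()'s result to the subscriber is not covered here.
// Hedged sketch: terminate T seconds after subscription, or on early cancellation,
// and run the same cleanup in both cases. timeoutSeconds stands for the T above.
Flowable<Data> timed = process
        .takeUntil(Flowable.timer(timeoutSeconds, TimeUnit.SECONDS))
        .doFinally(() -> drainAndClose()); // runs on complete, error, or cancel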

Infinispan clustered lock performance does not improve with more nodes?

I have a piece of code that is essentially executing the following with Infinispan in embedded mode, using version 13.0.0 of the -core and -clustered-lock modules:
@Inject
lateinit var lockManager: ClusteredLockManager

private fun getLock(lockName: String): ClusteredLock {
    lockManager.defineLock(lockName)
    return lockManager.get(lockName)
}

fun createSession(sessionId: String) {
    tryLockCounter.increment()
    logger.debugf("Trying to start session %s. trying to acquire lock", sessionId)
    Future.fromCompletionStage(getLock(sessionId).lock()).map {
        acquiredLockCounter.increment()
        logger.debugf("Starting session %s. Got lock", sessionId)
    }.onFailure {
        logger.errorf(it, "Failed to start session %s", sessionId)
    }
}
I take this piece of code and deploy it to Kubernetes, running it in six pods distributed over six nodes in the same region. The code exposes createSession with random GUIDs through an API. The API is called and creates sessions in chunks of 500, with a k8s service in front of the pods so the load gets balanced across them. I notice that the time to acquire a lock grows linearly with the number of sessions: in the beginning it's around 10 ms, at about 20_000 sessions it takes about 100 ms, and the trend continues in a stable fashion.
I then take the same code and run it with twelve pods on twelve nodes. To my surprise, the performance characteristics are almost identical to the six-pod run. I've been digging into the code but still haven't figured out why. Is there a good reason why Infinispan doesn't seem to perform better here with more nodes?
For completeness, the configuration of the locks is as follows:
val global = GlobalConfigurationBuilder.defaultClusteredBuilder()
global.addModule(ClusteredLockManagerConfigurationBuilder::class.java)
    .reliability(Reliability.AVAILABLE)
    .numOwner(1)
and, looking at the code, the clustered locks use a DIST_SYNC cache, which should spread the load across the different nodes.
UPDATE:
The two counters in the code above are simply Micrometer counters. It is through them and Prometheus that I can see how lock acquisition starts to slow down.
It is correctly observed that there is one lock created per session id; this is by design and what we'd like. Our use case is that we want to ensure a session is running in at least one place. Without going too deep into detail, this is achieved by ensuring that at least two pods try to acquire the same lock. The Infinispan library is great in that it tells us directly when the lock holder dies, without any extra chattiness between pods, which gives us a "cheap" way of ensuring that execution of the session continues when one pod is removed.
After digging deeper into the code I found the following in CacheNotifierImpl in the core library:
private CompletionStage<Void> doNotifyModified(K key, V value, Metadata metadata, V previousValue,
      Metadata previousMetadata, boolean pre, InvocationContext ctx, FlagAffectedCommand command) {
   if (clusteringDependentLogic.running().commitType(command, ctx, extractSegment(command, key), false).isLocal()
         && (command == null || !command.hasAnyFlag(FlagBitSets.PUT_FOR_STATE_TRANSFER))) {
      EventImpl<K, V> e = EventImpl.createEvent(cache.wired(), CACHE_ENTRY_MODIFIED);
      boolean isLocalNodePrimaryOwner = isLocalNodePrimaryOwner(key);
      Object batchIdentifier = ctx.isInTxScope() ? null : Thread.currentThread();
      try {
         AggregateCompletionStage<Void> aggregateCompletionStage = null;
         for (CacheEntryListenerInvocation<K, V> listener : cacheEntryModifiedListeners) {
            // Need a wrapper per invocation since converter could modify the entry in it
            configureEvent(listener, e, key, value, metadata, pre, ctx, command, previousValue, previousMetadata);
            aggregateCompletionStage = composeStageIfNeeded(aggregateCompletionStage,
                  listener.invoke(new EventWrapper<>(key, e), isLocalNodePrimaryOwner));
         }
The lock library uses a clustered listener on the cache entry modified event, and that listener uses a filter to notify only when the key for the lock is modified. However, the core library still has to check this condition on every registered listener, and that list of course becomes very long as the number of sessions grows. I suspect this is the reason, and if it is, it would be really awesome if the core library supported a kind of key filter so it could use a hash map for these listeners instead of walking the whole list.
I believe you are creating a clustered lock per session id. Is this what you need? What is acquiredLockCounter? We are about to deprecate the "lock" method in favour of "tryLock" with a timeout, since the lock method will block forever if the clustered lock is never acquired. Do you ever unlock the clustered lock in another piece of code? If you could share a complete reproducer of the code, it would be very helpful for us. Thanks!
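For illustration, a minimal sketch (in Java) of the tryLock-with-timeout style mentioned above; the one-second timeout is an arbitrary value for the example:
// Hedged sketch of acquiring the clustered lock with a timeout instead of lock().
ClusteredLock lock = getLock(sessionId);
lock.tryLock(1, TimeUnit.SECONDS).whenComplete((acquired, throwable) -> {
    if (throwable != null) {
        logger.errorf(throwable, "Failed to start session %s", sessionId);
    } else if (Boolean.TRUE.equals(acquired)) {
        logger.debugf("Starting session %s. Got lock", sessionId);
    } else {
        logger.debugf("Timed out acquiring lock for session %s", sessionId);
    }
});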

Kafka Streams write an event back to the input topic

In my Kafka Streams app, I need to retry processing a message whenever a particular type of exception is thrown in the processing logic.
Rather than wrapping my logic in a RetryTemplate (I'm using Spring Boot), I'm considering simply writing the message back into the input topic; my assumption is that the message will be appended to the end of the log in the appropriate partition and will eventually be re-processed.
I'm aware that this would mess up the ordering, and I'm okay with that.
My question is: would Kafka Streams have an issue when it encounters a message that was supposedly already processed in the past? (I'm assuming Kafka Streams has a way of marking the messages it has processed, especially when exactly-once is enabled.)
Here is an example of the code I'm considering for this solution.
val branches = streamsBuilder.stream(inputTopicName)
    .mapValues { it -> myServiceObject.executeSomeLogic(it) }
    .branch(
        { _, value -> value is SuccessfulResult }, // success
        { _, value -> value is ErrorResult }       // exception was thrown
    )

branches[0].to(outputTopicName)
branches[1].to(inputTopicName) // write them back to input as a way of retrying

How to control flow of a transactional stream with Spring Data R2DBC?

Support for transactional streams seems to have been implemented only recently, and because of its newness there are not many code examples.
Could someone show an example of a transactional stream that does a series of database inserts and then returns some value on success, but with a mid-stream checkpoint between inserts that tests some condition and might roll back the transaction and return different values depending on the checkpoint result?
Reactive transactions follow the same pattern as imperative ones:
A transaction is started before running any user-space commands.
Run user-space commands.
Commit (or roll back).
A few aspects to note here: a connection is always associated with a materialization of a reactive sequence. What we know as a Thread-bound connection bound to an execution in imperative programming translates to a materialization in reactive programming.
So each (concurrent) execution gets its own connection assigned.
Spring Data R2DBC has no support for savepoints. Take a look at the following code example, which illustrates a decision to either commit or roll back:
DatabaseClient databaseClient = DatabaseClient.create(connectionFactory);

TransactionalOperator transactionalOperator = TransactionalOperator
        .create(new R2dbcTransactionManager(connectionFactory));

transactionalOperator.execute(tx -> {

    Mono<Void> insert = databaseClient.execute("INSERT INTO legoset VALUES(…)")
            .then();

    Mono<Long> select = databaseClient.execute("SELECT COUNT(*) FROM legoset")
            .as(Long.class)
            .fetch()
            .first();

    return insert.then(select.handle((count, sink) -> {
        if (count > 10) {
            tx.setRollbackOnly();
        }
    }));
}).as(StepVerifier::create).verifyComplete();
Notable aspects here are:
We're using TransactionalOperator instead of @Transactional.
The code in .handle() calls setRollbackOnly() to roll back the transaction.
Using @Transactional, you would typically use exceptions to signal a rollback condition.
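For comparison, a rough sketch of that exception-driven style with @Transactional; CheckpointFailedException is a made-up exception type used purely to mark the rollback condition:
// Hedged sketch: with the annotation-driven model, an error signal escaping the
// reactive chain rolls the transaction back. CheckpointFailedException is hypothetical.
@Transactional
public Mono<Long> insertWithCheckpoint() {
    Mono<Void> insert = databaseClient.execute("INSERT INTO legoset VALUES(…)").then();
    Mono<Long> select = databaseClient.execute("SELECT COUNT(*) FROM legoset")
            .as(Long.class)
            .fetch()
            .first();
    return insert.then(select).flatMap(count -> {
        if (count > 10) {
            return Mono.<Long>error(new CheckpointFailedException("checkpoint failed")); // triggers rollback
        }
        return Mono.just(count);
    });
}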

Execute multiple downloads and wait for all to complete

I am currently working on an API service that allows 1 or more users to download 1 or more items from an S3 bucket and returns the contents to the user. While the downloading itself is fine, the time taken to download several files is pretty much 100-150 ms multiplied by the number of files.
I have tried a few approaches to speed this up: parallelStream() instead of stream() (which, considering the number of simultaneous downloads, is at serious risk of running out of threads), CompletableFuture, and even creating an ExecutorService, doing the downloads and then shutting down the pool. Typically I would only want a few concurrent tasks, e.g. 5 at a time per request, to try to cut down on the number of active threads.
I have tried integrating Spring's @Cacheable to store the downloaded files in Redis (the files are read-only); while this certainly cuts response times (several ms to retrieve a file compared to 100-150 ms), the benefit only applies once a file has been retrieved before.
What is the best way to wait for multiple async tasks to finish and then collect the results, considering that I don't want (and probably can't) have hundreds of threads opening HTTP connections and downloading all at once?
You're right to be concerned about tying up the common fork/join pool used by default by parallel streams, since I believe it is also used for other things, such as sort operations, outside of the Stream API. Rather than saturating the common fork/join pool with an I/O-bound parallel stream, you can create your own fork/join pool for the stream. See this question for how to create an ad hoc ForkJoinPool with the size you want and run a parallel stream in it, or the sketch below.
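For reference, a sketch of that ad hoc pool approach; the pool size of 5 only echoes the "few concurrent tasks" figure from the question and is otherwise arbitrary:
// Hedged sketch: run the parallel stream inside a dedicated ForkJoinPool so the
// common pool stays free for other work.
ForkJoinPool downloadPool = new ForkJoinPool(5);
try {
    List<Path> paths = downloadPool.submit(() ->
            s3Paths.parallelStream()
                    .map(s3Path -> mys3Downloader.downloadAndGetPath(s3Path))
                    .collect(Collectors.toList()))
            .get(); // blocks until the whole parallel stream has finished
    // use paths
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
} finally {
    downloadPool.shutdown();
}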
You could also create an ExecutorService with a fixed-size thread pool that is likewise independent of the common fork/join pool and throttles the requests, using only the threads in the pool. It also lets you specify the number of threads to dedicate:
ExecutorService executor = Executors.newFixedThreadPool(MAX_THREADS_FOR_DOWNLOADS);
try {
    List<CompletableFuture<Path>> downloadTasks = s3Paths
            .stream()
            .map(s3Path -> CompletableFuture.supplyAsync(() -> mys3Downloader.downloadAndGetPath(s3Path), executor))
            .collect(Collectors.toList());

    // at this point, all requests are enqueued, and threads will be assigned as they become available
    executor.shutdown(); // stops accepting requests, does not interrupt threads,
                         // items in queue will still get threads when available

    // wait for all downloads to complete
    CompletableFuture.allOf(downloadTasks.toArray(new CompletableFuture[downloadTasks.size()])).join();

    // at this point, all downloads are finished,
    // so it's safe to shut down the executor completely
} catch (CompletionException e) {
    e.printStackTrace();
} finally {
    executor.shutdownNow(); // important to call this when you're done with the executor
}
Following @Hank D's lead, you can encapsulate the creation of the executor service to ensure that you do, indeed, call ExecutorService::shutdownNow after using said executor:
private static <VALUE> VALUE execute(
        final int nThreads,
        final Function<ExecutorService, VALUE> function
) {
    final ExecutorService executorService = Executors.newFixedThreadPool(nThreads);
    try {
        return function.apply(executorService);
    } finally {
        executorService.shutdownNow(); // important to call this when you're done with the executor service
    }
}
public static void main(final String... arguments) {
    // define variables

    final List<CompletableFuture<Path>> downloadTasks = execute(
            MAX_THREADS_FOR_DOWNLOADS,
            executor -> s3Paths
                    .stream()
                    .map(s3Path -> CompletableFuture.supplyAsync(
                            () -> mys3Downloader.downloadAndGetPath(s3Path),
                            executor
                    ))
                    .collect(Collectors.toList())
    );

    // use downloadTasks
}