Should parallel API call use Schedulers.parallel() or Schedulers.boundedElastic() - kotlin

To be honest, I have no idea how schedulers work in Reactor. I have read a bit about them, and this is what I found:
Schedulers.parallel() is good for CPU-intensive but short-lived tasks. It can execute N such tasks in parallel (by default N == number of CPUs).
Schedulers.elastic() and Schedulers.boundedElastic() are good for longer-lived tasks (e.g. blocking IO tasks). The elastic one spawns threads on demand without a limit, while the more recently introduced boundedElastic does the same with a ceiling on the number of created threads.
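As a minimal Kotlin illustration of that guidance (crunchNumbers() and blockingHttpCall() are hypothetical placeholders):

// CPU-bound work: run it on the parallel scheduler.
Mono.fromCallable { crunchNumbers() }
    .subscribeOn(Schedulers.parallel())
    .subscribe()

// Blocking I/O: run it on the bounded elastic scheduler.
Mono.fromCallable { blockingHttpCall() }
    .subscribeOn(Schedulers.boundedElastic())
    .subscribe()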
So in my API calls there's a task where I have to poll a request over and over again until its state is ready.
Flux.just(order1, order2)
    .parallel(4)                              // split the flux into 4 parallel rails
    .runOn(Schedulers.parallel())             // run each rail on the parallel scheduler
    .flatMap { order -> createOrder(order) }
    .flatMap { orderId ->
        pollConfirmOrderStatus(orderId)
            .retryWhen(notReady)              // keep retrying until the order is ready
    }
    .sequential()                             // merge the rails back into a single Flux
    .collectList()
    .subscribe()
As you can see I use Schedulers.parallel() and it works fine, but I'm concerned about CPU usage since my server doesn't have that many CPU cores. pollConfirmOrderStatus takes about 1-2 minutes, so I'm not sure whether it would block other processes on my server from accessing the CPU. So should I use Schedulers.parallel() or Schedulers.boundedElastic() here?

If your method pollConfirmOrderStatus() doesn't block the parallel scheduler's threads, it should be fine. Otherwise you might be blocking all the available threads in the parallel scheduler, which could end up in a deadlock if your state never becomes ready.
The post below explains that the parallel scheduler is reserved for non-blocking calls, and that you can use BlockHound to spot blocking calls made from threads that are meant to be non-blocking:
https://spring.io/blog/2019/03/28/reactor-debugging-experience
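For illustration, a minimal sketch of how BlockHound would catch this (assuming the io.projectreactor.tools:blockhound dependency is on the classpath):

import reactor.blockhound.BlockHound

fun main() {
    // Once installed, BlockHound throws an error whenever a blocking call
    // is made on a thread Reactor marks as non-blocking (e.g. parallel-*).
    BlockHound.install()
    // ...run the pipeline from the question; a blocking call inside
    // pollConfirmOrderStatus() on a parallel thread would now fail fast.
}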

Related

Swapping between Schedulers

I have a blocking workload that I want to execute on the bounded elastic scheduler. After this work is done, a lot of work that could be executed on the parallel scheduler follows, but it will automatically continue to run on the thread from the bounded elastic scheduler.
When is it "correct" to drop the previous scheduler you set earlier in the chain? Is there ever a reason to do so if it's not strictly necessary, because of thread starvation, for example?
I can switch the scheduler of a chain by "breaking" the existing chain with flatMap, then(), switchIfEmpty, and probably a few other operators. Example:
public Mono<Void> multipleSchedulers() {
    return blockingWorkload()
        .doOnSuccess(result -> log.info("Thread {}", Thread.currentThread().getName())) // Thread boundedElastic-1
        .subscribeOn(Schedulers.boundedElastic())
        .flatMap(result -> Mono.just(result)
            .subscribeOn(Schedulers.parallel()))
        .doOnSuccess(result -> log.info("Thread {}", Thread.currentThread().getName())); // Thread parallel-1
}
Generally, it is not bad practice to switch execution to another scheduler partway through your reactive chain.
To switch execution to another scheduler in the middle of your chain, you can use the publishOn() operator. Any subsequent operator will then run on a worker from the scheduler supplied to that publishOn().
Take a look at section 4.5.1 of the Reactor reference.
However, you should know clearly why you are doing it. Is there any reason for it?
If you want to run a long computational process (CPU-bound work), then it is recommended to execute it on Schedulers.parallel().
For making blocking calls it is recommended to use Schedulers.boundedElastic().
However, blocking calls are usually handled by putting subscribeOn(Schedulers.boundedElastic()) on the "blocking publisher" directly, as in the sketch below.
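For example, a minimal Kotlin sketch combining both recommendations (blockingWorkload() and cpuHeavyTransform() are hypothetical placeholders):

fun blockingThenCompute(): Mono<String> =
    blockingWorkload()
        .subscribeOn(Schedulers.boundedElastic())     // the blocking source runs on boundedElastic-*
        .publishOn(Schedulers.parallel())             // switch schedulers here
        .map { result -> cpuHeavyTransform(result) }  // everything downstream runs on parallel-*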
Wrapping blocking calls in reactor

Running Monos in parallel doesn't seem faster

So I'm trying to make API requests in parallel, but it doesn't seem any faster. Am I doing it wrong? Here's my code.
fun getUserInfo(username: String): Mono<String> {
    return webclient
        .post()
        // some config and params (uri, body, retrieve(), ...)
        .bodyToMono(String::class.java)
        .subscribeOn(Schedulers.parallel())
}
fun main() {
    val time = measureTimeMillis {
        Mono.zip(getUserInfo("doge"), getUserInfo("cheems"), etc...)
            .map { tuple -> listOf(tuple.t1, tuple.t2, etc...) }
            .block()
    }
    // takes about the same amount of time
    // with and without subscribeOn(Schedulers.parallel())
}
It has nothing to do with your code.
You must understand that the time spent on any I/O work is mostly spent waiting, as in waiting for a response.
If we look at the lifecycle of a thread: it does a bit of preprocessing and then sends the request. Once the request has been sent it has to wait for the response, and this is where the majority of the time is spent; then it gets the response and processes it. Here's the thing: as much as 90% of the request time can be spent just waiting. That is a lot of wasted resources, having the thread do nothing but wait.
This is what is good about WebFlux/Reactor. When a request is sent, the thread will not wait; it moves on to process other requests and responses, and when that first request's response comes back, any free thread can pick it up. It does not have to be the thread that sent the request in the first place.
What I have just described is usually called async, or asynchronous, work.
So let's look at what you are doing. You want to run your requests in parallel, meaning utilizing multiple threads on multiple cores at the same time.
For this to work, the other CPUs need to be told to prepare for incoming work. Threads have to be initialized on each CPU, and the data must be sent out to all of them. As you can see, there is a setup cost involved here.
Then all the requests are made from multiple CPUs at the same time, but the waiting for the responses stays constant! It is the exact same waiting time as before (up to as much as 90% of the total request time). Then, when all the responses have returned, they are collected and processed on multiple CPUs and finally sent back to the original thread on the original CPU.
What have you gained? Most likely almost nothing, while most likely using a lot more resources for that very, very minimal gain.
Parallelism is usually good when you need raw CPU computing power for calculations of some sort, say a 3D renderer or cracking hashes, not for I/O work.
I/O work is usually more about orchestration than raw CPU power. Slow responses will not be fixed by parallel computing; a slow response is always a slow response. You will just consume more resources for no benefit.
This is the reason why plain flatMap is so powerful in Reactor.
It performs everything asynchronously for you, without your needing to deal with threads, locks, synchronization, joins, etc. It performs all work async as quickly as possible, utilizing as few resources as possible.
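To make this concrete, here is a minimal Kotlin sketch of the same fan-out using plain flatMap, reusing the getUserInfo() function from the question:

Flux.fromIterable(listOf("doge", "cheems"))
    .flatMap { username -> getUserInfo(username) }  // requests are issued concurrently, no parallel() or extra scheduler needed
    .collectList()
    .block()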

Creating Futures without ExecutorService or FutureTask

I am using JerseyClient to make async calls to an HTTP server and am directly creating Futures to store the responses. These can even be batch calls, in which case I create a list of Futures.
This works perfectly for now, but I am concerned about CPU utilization and thread count, since I am not creating any thread pool via ExecutorService, nor am I using FutureTask<> to create the Futures.
A small code snippet of how I construct each Future:
Future<Response> response = requestBuilder.async().get();
Are these concerns valid? Is it okay to continue with this approach? Would this approach fail to scale?
Another concern: a get() might never be performed on some of these Futures. Would that lead to spawning threads that are never killed, because neither get() nor cancel() is ever called for the Futures running on them?
It is always better to control the number of threads created by an application, so when calling REST APIs through the Jersey async API you should limit the number of threads it creates.
The code below caps the maximum number of threads the Jersey client creates for async calls:
ClientConfig cc = new ClientConfig();
// Cap the thread pool backing requestBuilder.async() at 10 threads
cc.property(ClientProperties.ASYNC_THREADPOOL_SIZE, 10);
Client client = ClientBuilder.newClient(cc);
Regarding the get() call on a Future instance: get() is just a blocking method that makes the current thread wait for the future task to complete, in this case for the response to arrive; it does not affect the execution of the worker threads. So there should not be any issue if you never call get() or cancel() on a response Future.
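If you would rather not hold Futures at all, JAX-RS also lets you register a callback on the async invocation. A minimal Kotlin sketch (the handler bodies are placeholders):

import javax.ws.rs.client.InvocationCallback
import javax.ws.rs.core.Response

val future = requestBuilder.async().get(object : InvocationCallback<Response> {
    override fun completed(response: Response) {
        // handle the response on one of the async pool's threads
        response.close()
    }

    override fun failed(throwable: Throwable) {
        // handle connection failures, timeouts, etc.
    }
})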

DMLC and concurrent consumers working

Does the DMLC create separate threads for each concurrent consumer? What happens under the hood? The documentation says this:
Actual MessageListener execution happens in asynchronous work units which are created through Spring's TaskExecutor abstraction. By default, the specified number of invoker tasks will be created on startup, according to the "concurrentConsumers" setting.
I am not able to understand this: are these tasks executed in parallel? If yes, what are the default limits, like thread count etc.?
Thanks!
Yes, a separate thread is used for each consumer (obtained from the task executor). By default a SimpleAsyncTaskExecutor is used, and the thread is destroyed when the consumer is stopped. There is no thread limit beyond the container's concurrency settings.
If you inject a different kind of task executor (such as a ThreadPoolTaskExecutor), you must make sure it has enough available threads to support your container's concurrency settings; see the sketch below. Container threads are generally long-lived.
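A Kotlin sketch of that second case, wiring a ThreadPoolTaskExecutor into Spring JMS's DefaultMessageListenerContainer (connection factory, destination, and listener setup elided):

import org.springframework.jms.listener.DefaultMessageListenerContainer
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor

val executor = ThreadPoolTaskExecutor().apply {
    // The pool must be at least as large as maxConcurrentConsumers,
    // since each consumer occupies a thread for its whole lifetime.
    corePoolSize = 10
    maxPoolSize = 10
    setThreadNamePrefix("jms-consumer-")
    initialize()
}

val container = DefaultMessageListenerContainer().apply {
    setConcurrentConsumers(5)      // invoker tasks created at startup
    setMaxConcurrentConsumers(10)  // container may scale up under load
    setTaskExecutor(executor)
    // connectionFactory, destination and messageListener must also be set
}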

On Heroku, does utilising Node.js prevent the need for queues + worker dynos for third-party API calls?

The Heroku Dev Center, on the page about worker dynos and background jobs, states that you need to use workers + queues to handle API calls such as fetching an RSS feed, because the operation may take some time if the remote server is slow, and doing it on a web dyno would block that dyno from receiving additional requests.
However, from what I've read, it seems that one of the major points of Node.js is that it doesn't suffer from blocking under these conditions, thanks to its asynchronous, event-based runtime model.
I'm confused, because wouldn't this imply that it would be fine to make API calls (asynchronously) from the web dynos? Perhaps the docs were written more for the Ruby/Python/etc. use cases, where a synchronous model was more prevalent?
NodeJS is an implementation of the reactor pattern: a single-threaded event loop backed by a small worker pool. Network I/O is non-blocking and handled on the event loop itself, while libuv's worker pool (4 threads by default, configurable via UV_THREADPOOL_SIZE) handles work such as file system access and DNS lookups; once those workers are all busy, further work of that kind queues up behind them.
A common misconception about NodeJS is that it is a system that allows you to do many things at once. That is not quite the case: it allows you to do other things while waiting on I/O-bound tasks.
Any CPU-bound task is executed on the main event loop, meaning it will block everything else while it runs.
This means that if your "job" is I/O-bound, like putting things in databases, then you can probably get away without worker dynos. This of course depends on how much you plan on having go on at once. Remember, any task you run in your main app takes resources away from other incoming requests.
Generally, though, it is not recommended: if you have a job that does real processing, it belongs in a queue and should be executed in its own process or thread.