Running Monos in parallel doesn't seem faster - Kotlin

So I'm trying to make API requests in parallel, but it doesn't seem any faster. Am I doing it wrong? Here is my code.
import reactor.core.publisher.Mono
import reactor.core.scheduler.Schedulers
import kotlin.system.measureTimeMillis

fun getUserInfo(username: String): Mono<String> {
    return webclient
        // some config and params
        .post()
        .retrieve()
        .bodyToMono(String::class.java)
        .subscribeOn(Schedulers.parallel())
}

fun main() {
    val time = measureTimeMillis {
        Mono.zip(getUserInfo("doge"), getUserInfo("cheems") /* etc... */)
            .map { tuple -> listOf(tuple.t1, tuple.t2 /* etc... */) }
            .block()
    }
    // takes the same amount of time with and without
    // subscribeOn(Schedulers.parallel())
}

It has nothing to do with your code.
You must understand that time spent on any I/O work is mostly spent waiting, as in waiting for a response.
If we look at the lifecycle of a thread, it does a bit of preprocessing and then sends the request. Once the request has been sent, it has to wait for the response, and this is where the majority of the time is spent; then the response arrives and the thread processes it. Here's the thing: as much as 90% of the request time can be spent just waiting. Keeping a thread idle like that wastes a lot of resources.
This is what is good about WebFlux/Reactor. When a request is sent, the thread does not wait; it goes on to process other requests and responses, and when that first request's response comes back, any free thread can pick it up. It does not have to be the thread that sent the request in the first place.
What I have just described is usually called async or asynchronous work.
So let's look at what you are doing. You want to run your requests in parallel, meaning utilizing multiple threads on multiple cores at the same time.
For this to work, the other CPU cores have to be prepared for incoming work: a number of threads must be initialized on each core and the data must be distributed to them. As you can see, there is setup time involved here.
Then all the requests are made from multiple CPUs at the same time, but the wait for the responses is constant! It's the exact same waiting time as before (up to as much as 90% of the total request time). When all responses have returned, they are collected and processed on multiple CPUs and then sent back to the original thread on the original CPU.
What have you gained? Most likely almost nothing, while most likely consuming a lot more resources for this very, very minimal gain.
Parallelism is usually good when you need raw CPU computing power for calculations of some sort, for example a 3D renderer or cracking hashes, not for I/O work.
I/O work is usually more about orchestration than raw CPU power. Slow responses will not be solved by parallel computing; a slow response is always a slow response. You will just consume more resources for no benefit.
This is why plain flatMap is so powerful in Reactor.
It performs everything asynchronously for you without your needing to deal with threads, locks, synchronization, joins, etc. It does all the work async as quickly as possible while utilizing as few resources as possible.
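To make that concrete, here is a minimal sketch of fanning the calls out with plain flatMap, assuming getUserInfo is the non-blocking WebClient call from the question; no subscribeOn is needed for the requests to be in flight concurrently:

import reactor.core.publisher.Flux

// flatMap subscribes to many inner Monos at once (default concurrency 256),
// so all HTTP requests are started up front and the event-loop threads are
// free while the responses are in transit.
fun getAllUserInfo(usernames: List<String>): Flux<String> =
    Flux.fromIterable(usernames)
        .flatMap { username -> getUserInfo(username) }

// usage: getAllUserInfo(listOf("doge", "cheems")).collectList().block()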

Related

Understanding Cro request/response cycle and memory use

I'm a bit confused about how Cro handles client requests and, specifically, why some requests seem to cause Cro's memory usage to balloon.
A minimal example of this shows up in the literal "Hello world!" Cro server.
use Cro::HTTP::Router;
use Cro::HTTP::Server;

my $application = route {
    get -> {
        content 'text/html', 'Hello Cro!';
    }
}

my Cro::Service $service = Cro::HTTP::Server.new:
    :host<localhost>, :port<10000>, :$application;
$service.start;

react whenever signal(SIGINT) {
    $service.stop;
    exit;
}
All that this server does is respond to GET requests with "Hello Cro!" – which certainly shouldn't be taxing. However, if I navigate to localhost:10000 and then rapidly refresh the page, I notice Cro's memory use start to climb (and then to stay elevated).
This only seems to happen when the refreshes are rapid, which suggests that the issue might be related either to not properly closing connections or to a concurrency issue (a maybe-slightly-related prior question).
Is there some performance technique or best practice that this "Hello world" server has omitted for simplicity? Or am I missing something else about how Cro is designed to work?
The Cro request processing pipeline is a chain of supply blocks that requests and, later, responses pass through. Decisions about the optimal number of processing threads to create are left to the Raku ThreadPoolScheduler implementation.
So far as connection lifetime goes, it's up to the client - that is, the web browser - as to how eagerly connections are closed; if the browser uses a keep-alive HTTP/1.1 connection or retains a HTTP/2.0 connection, Cro respects that request.
Regarding memory use, growth up to a certain point isn't surprising; it's only a problem if it doesn't eventually level out. Causes include:
- The scheduler determining more threads are required to handle the load. Each OS thread comes with some overhead inside the VM, the majority of it being that the GC nursery is per thread to allow simple bump-the-pointer allocation.
- The MoarVM optimizer using memory for specialized bytecode and JIT-compiled machine code, which it produces in the background as the application runs, driven by certain bits of code having been executed enough times.
- The GC trying to converge on a full collection threshold.

Should parallel API calls use Schedulers.parallel() or Schedulers.boundedElastic()?

To be honest, I have no idea how schedulers work in Reactor. I have read a few docs, and this is what I found:
Schedulers.parallel() is good for CPU-intensive but short-lived tasks. It can execute N such tasks in parallel (by default N == number of CPUs).
Schedulers.elastic() and Schedulers.boundedElastic() are good for more long-lived tasks (e.g. blocking IO tasks). The elastic one spawns threads on-demand without a limit, while the recently introduced boundedElastic does the same with a ceiling on the number of created threads.
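A minimal sketch of that rule of thumb, with made-up task bodies (crunchNumbers and readBlocking are placeholders for illustration):

import reactor.core.publisher.Mono
import reactor.core.scheduler.Schedulers

fun crunchNumbers(): Long = (1..1_000_000L).sum()                // short-lived, CPU-bound
fun readBlocking(): String = java.io.File("data.txt").readText() // blocking IO

// CPU-bound work goes on the fixed-size parallel scheduler...
val cpuResult = Mono.fromCallable { crunchNumbers() }
    .subscribeOn(Schedulers.parallel())

// ...while blocking work goes on boundedElastic, which grows on demand
// up to a ceiling instead of tying up the event-loop threads.
val ioResult = Mono.fromCallable { readBlocking() }
    .subscribeOn(Schedulers.boundedElastic())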
So in my API calls there's a task where I have to poll a request over and over again until its state is ready.
Flux.just(order1, order2)
    .parallel(4)
    .runOn(Schedulers.parallel())
    .flatMap { order -> createOrder(order) }
    .flatMap { orderId ->
        pollConfirmOrderStatus(orderId)
            .retryWhen(notReady)
    }
    .sequential() // merge the parallel rails back before collecting
    .collectList()
    .subscribe()
As you can see I use Schedulers.parallel() and it works fine, but I'm concerned about blocking CPU usage since my server doesn't have that many CPU cores. pollConfirmOrderStatus takes about 1-2 minutes, so I'm not sure whether it would block other processes on my server from accessing the CPU. So should I use Schedulers.parallel() or Schedulers.boundedElastic() here?
If your method pollConfirmOrderStatus() doesn't block the parallel scheduler's threads, it should be fine. Otherwise you might be blocking all the available threads in the parallel scheduler, which could end up in a deadlock if your state never gets ready.
The post below explains that the parallel scheduler is reserved for non-blocking calls, and that you can use BlockHound to spot blocking calls on threads that are meant to be non-blocking:
https://spring.io/blog/2019/03/28/reactor-debugging-experience
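If the polling really does block (e.g. it sleeps between attempts), one way to keep it off the parallel scheduler is to wrap the blocking call and subscribe it on boundedElastic, and to install BlockHound in a test to catch any blocking call that sneaks onto an event-loop thread. This is a sketch; checkStatusBlocking is a placeholder, not part of the original code:

import reactor.blockhound.BlockHound
import reactor.core.publisher.Mono
import reactor.core.scheduler.Schedulers

// Placeholder for the call that actually blocks while polling.
fun checkStatusBlocking(orderId: String): String = TODO("blocking HTTP call")

// The blocking work is shifted to the boundedElastic pool, so the
// parallel/event-loop threads stay free for non-blocking work.
fun pollConfirmOrderStatus(orderId: String): Mono<String> =
    Mono.fromCallable { checkStatusBlocking(orderId) }
        .subscribeOn(Schedulers.boundedElastic())

fun main() {
    BlockHound.install() // throws if a non-blocking thread performs a blocking call
    // ... exercise the pipeline under test here ...
}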

Idiomatic Way to Handle and Concat Multiple HTTP Requests in Clojure?

I'm working on a program that needs to make multiple calls to various microservices, possibly do some processing on those results, and then return a combination of those results.
A very basic example might look like:
(def urls ["http://localhost:8080" "http://localhost:8080"])

(defn handle-http-request [url]
  (let [request (http-kit/get url)]
    (do-some-processing (:body @request))))

(defn home-page
  [request]
  (let [response (pmap handle-http-request urls)]
    (ring-resp/response {:buildings (first response) :characters (second response)})))
In this case I'm using pmap to run all my requests in parallel, and then return them as JSON that the UI making the request can handle. In the real environment there will be more URLs, each fetching different data.
My question is whether this is an appropriate way to handle this problem. I've looked at core.async and see it as a possible way to handle this, but worry it might be overkill. My second concern is handling errors: it would appear that core.async might more elegantly handle cases where a remote service has timed out. Am I right in this assumption, or is using pmap okay in this situation?
Lastly, are there any accepted patterns or reading on how to handle microservice architectures like this? I see my problem as relatively specific, but feel the idea of a server making requests to many others and compiling those results is nothing new.
The HTTP-Kit documentation provides an example for this using futures:
Combined, concurrent requests, handle results synchronously
(let [urls ["http://server.com/api/1" "http://server.com/api/2" "http://server.com/api/3"]
      ;; send the requests concurrently (asynchronously)
      futures (doall (map http/get urls))]
  (doseq [resp futures]
    ;; wait for each server response synchronously
    (println (-> @resp :opts :url) " status: " (:status @resp))))
Yet, often this is still not enough. Make sure you configure timeouts and have an escalation strategy for when the timeouts hit. The more services you are going to call, the more complex this task gets. Maybe have a look at libraries like Resilience4JClj, which provide configurable ways to deal with retrying, caching, timeouts, ...
Note: pmap is best suited to CPU-intensive tasks. Or, as the docs state:
Only useful for computationally intensive functions where the time of f dominates the coordination overhead.
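For comparison with the Reactor stack from the question at the top of this thread, the same fan-out with per-call timeouts and a fallback might look like the sketch below; fetchJson and the URLs are placeholders:

import java.time.Duration
import reactor.core.publisher.Flux
import reactor.core.publisher.Mono

// Placeholder for a non-blocking HTTP call returning the response body.
fun fetchJson(url: String): Mono<String> = TODO("e.g. a WebClient GET")

val urls = listOf("http://localhost:8080/buildings", "http://localhost:8080/characters")

val combined: Mono<List<String>> =
    Flux.fromIterable(urls)
        .flatMap { url ->
            fetchJson(url)
                .timeout(Duration.ofSeconds(2))    // escalate slow calls
                .retry(1)                          // one retry on failure
                .onErrorResume { Mono.just("{}") } // fallback instead of failing everything
        }
        .collectList()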

Why is async used when making HTTP requests?

I can't understand why we use async and await when fetching data from a server.
A network request from a client to a server, possibly over a long distance and a slow internet connection, can take an eternity on CPU time scales.
If it weren't async, the UI would block until the request is completed.
With async execution, the UI thread is free to update a progress bar or render other things while the framework or operating-system stack is busy on another thread sending and receiving the request your code made.
Most other calls that reach out to the operating system for files or other resources are async for the same reason. Not all of them are as slow as requests to a remote server, but you often can't know in advance whether one will be fast enough not to hurt your frame rate and cause visible disruption or jank in the UI.
await makes the code after that statement run only once the awaited async request has completed. async/await is used to make async code look more like sync code, so that it is easier to write and reason about.
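As an illustration in Kotlin coroutines (fetchUser is a made-up stand-in for a real network call):

import kotlinx.coroutines.async
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// delay() stands in for waiting on IO; the thread is free for other work meanwhile.
suspend fun fetchUser(name: String): String {
    delay(500)
    return "user:$name"
}

fun main() = runBlocking {
    val a = async { fetchUser("doge") }   // both requests start immediately
    val b = async { fetchUser("cheems") } // neither blocks the thread
    println(listOf(a.await(), b.await())) // this line runs only once both complete
}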
Async helps a lot with scalability and responsiveness.
Using synchronous requests blocks the client until a response has been received. As you increase concurrent users, you basically have one thread per user. This creates a lot of idle time and wasted computation. One request gets one response, in the order received.
Using asynchronous requests allows the client to send requests and receive responses in whatever order they become ready. This lets your threads work smarter.
Here's a pretty simple and solid resource from Mozilla:
https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/Synchronous_and_Asynchronous_Requests#Asynchronous_request

How multiple async NSURLConnections are handled internally

I am curious to know how multiple async NSURLConnection connections are handled internally. I know they use an internal background thread, but let's say I create two async NSURLConnections concurrently in code: will that create two threads internally to run them in parallel, or will the second connection wait for the first to complete? In brief, please confirm how multiple async NSURLConnections achieve concurrency.
I guess they will run in parallel. You can have a look at the WWDC session video about network programming.
An Apple engineer said that handling URL requests one by one is expensive and that running them in parallel is much more reasonable. The reason is that most of the time spent processing a request actually goes to latency, not to logic processing on the device or server, so handling requests in parallel efficiently cuts the time wasted on latency.
So I guess they won't run async NSURLConnections one by one, because that would contradict this basic theory.
Besides, I have tried downloading images asynchronously using NSURLConnection. I sent out a few requests at once, like:
for (i = 1 to 4) {
    send request i
}
The responses also did not come back in sequence.
Each async NSURLConnection runs on its own thread after you start the connection (an async NSURLConnection has to be created and started on the main thread!), and its delegate and data-delegate methods are called on the main thread.
Another option is to use an NSOperationQueue and execute the requests as NSOperations. Please refer to http://www.icodeblog.com/2012/10/19/tutorial-asynchronous-http-client-using-nsoperationqueue/ for more detail.
Thanks,
Jim