Need suggestion to implement OkHttp connection - kotlin

I am switching from HttpURLConnection to OkHttp in my service (Kotlin).
I have implemented the code below for GET and POST requests with OkHttp. It works well: latency is about 50% lower than with HttpURLConnection, but only when I make a few requests at a time.
If I make a lot of GET requests in a short period (10K+), the performance is only average. Any suggestions? Do I need to configure any parameters to do better?
private fun okhttpioConnection(): String {
    logger.info("Initializing OKHttp-ioConnection")
    client = OkHttpClient()
    client.setConnectTimeout(600000, TimeUnit.MILLISECONDS)
    client.setReadTimeout(600000, TimeUnit.MILLISECONDS)
    client.connectionPool = ConnectionPool(100, 100000)
    client.retryOnConnectionFailure = true
    val request = createRequest(restParams) // GET or POST
    val response = client.newCall(request).execute()
    val responseCodeString = response.code().toString()
    return response.body()!!.string()
}

fun createRequest(restParams: RestParams): Request {
    if (restParams.method == "POST") {
        return Request.Builder()
            .url(restParams.url) // field name assumed
            .post(RequestBody.create(
                MediaType.parse("application/json; charset=utf-8"), restParams.body.toString()))
            .header("Authorization", "xxxxxx")
            .addHeader("Accept", "application/json, text/json")
            .build()
    } else {
        return Request.Builder()
            .url(restParams.url) // field name assumed
            .header("Authorization", "xxxxxxx")
            .addHeader("Accept", "application/json, text/json")
            .build()
    }
}

OkHttp doesn't impose a maximum number of connections; any limit will come from the code that is calling OkHttp, and I don't think you can do anything to improve the performance further (from OkHttp's point of view). To know for sure, you need to share more information, such as the requests per second, latency, and number of threads you are getting now.
There are many potential issues; here are some things to check:
Check whether your code runs in an Executor that has a fixed size (this will limit the requests per second).
Check that you are not maxing out your CPU, memory, etc.
Check that the server you are hitting is not maxing out any resources (concurrent threads, CPU, connections, etc.) or rate limiting the calls.
Check that the process is not hitting the limit on open TCP sockets.
These are the things that come to mind, but there are probably many more.
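To make the fixed-size Executor point concrete, here is a minimal Java sketch that uses only the JDK (it is not OkHttp code). The names ExecutorCapDemo and peakConcurrency are hypothetical, and the 20 ms sleep is an assumed stand-in for a blocking HTTP call. It records how many simulated requests are in flight at once, which can never exceed the pool size however many are submitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class ExecutorCapDemo {
    /** Runs `tasks` simulated requests on `poolSize` threads and returns the peak number in flight. */
    static int peakConcurrency(int poolSize, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                pending.add(CompletableFuture.runAsync(() -> {
                    int now = inFlight.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(20); // assumed stand-in for a blocking HTTP call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                    }
                }, pool));
            }
            // Wait for every simulated request to finish.
            for (CompletableFuture<Void> f : pending) {
                f.join();
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }
}
```

With a pool of 4 threads and calls that take about 20 ms each, throughput tops out at roughly 200 requests per second regardless of how fast the HTTP client itself is.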


Apache HTTPClient5 - How to Prevent Connection/Stream Refused

Problem Statement
I'm a Software Engineer in Test running order permutations of Restaurant Menu Items to confirm that they succeed order placement w/ the POS
In short, this POSTs a JSON payload to an endpoint which then validates the order w/ a POS to define success/fail/other
Where POS, and therefore Transactions per Second (TPS), may vary, but each Back End uses the same core handling
This can be as high as ~22,000 permutations per item, in easily manageable JSON size, that need to be handled as quickly as possible
The Network can vary wildly depending upon the Restaurant, and/or Region, one is testing
E.g. where some have a much higher latency than others
Therefore, the HTTPClient should be able to intelligently negotiate the same content & endpoint regardless of this
Direct Problem
I'm using Apache's HTTP Client 5 w/ PoolingAsyncClientConnectionManager to execute both the GET for the Menu contents, and the POST to check if the order succeeds
This works out of the box, but sometimes loses connections w/ Stream Refused, specifically:
org.apache.hc.core5.http2.H2StreamResetException: Stream refused
No individual tuning seems to work across all network contexts w/ variable latency, that I can find
Following the stacktrace seems to indicate it is that the stream had closed already, therefore needs a way to keep it open or not execute an already-closed connection
if (connState == ConnectionHandshake.GRACEFUL_SHUTDOWN) {
    throw new H2StreamResetException(H2Error.PROTOCOL_ERROR, "Stream refused");
}
Some Attempts to Fix Problem
Tried to use Search Engines to find answers but there are few hits for HTTPClient5
Tried to use official documentation but this is sparse
Changing max connections per route to a reduced number, shifting inactivity validations, or connection time to live
Where the inactivity checks may fix the POST, but stall the GET for some transactions
And that tuning for one region/restaurant may work for 1 then break for another, w/ only the Network as variable
PoolingAsyncClientConnectionManagerBuilder builder = PoolingAsyncClientConnectionManagerBuilder
Shifting to a custom RequestConfig w/ different timeouts
private HttpClientContext getHttpClientContext() {
    RequestConfig requestConfig = RequestConfig.custom()
            .setConnectTimeout(Timeout.of(10, TimeUnit.SECONDS))
            .setResponseTimeout(Timeout.of(10, TimeUnit.SECONDS))
            .build();
    HttpClientContext httpContext = HttpClientContext.create();
    httpContext.setRequestConfig(requestConfig);
    return httpContext;
}
Initial Code Segments for Analysis
(In addition to the above segments w/ change attempts)
Wrapper handling to init and get response
public SimpleHttpResponse getFullResponse(String url, PoolingAsyncClientConnectionManager manager, SimpleHttpRequest req) {
    try (CloseableHttpAsyncClient httpclient = getHTTPClientInstance(manager)) {
        CountDownLatch latch = new CountDownLatch(1);
        long startTime = System.currentTimeMillis();
        Future<SimpleHttpResponse> future = getHTTPResponse(url, httpclient, latch, startTime, req);
        return future.get();
    } catch (IOException | InterruptedException | ExecutionException e) {
        return new SimpleHttpResponse(999, CommonUtils.getExceptionAsMap(e).toString());
    }
}
With actual handler and probing code
private Future<SimpleHttpResponse> getHTTPResponse(String url, CloseableHttpAsyncClient httpclient, CountDownLatch latch, long startTime, SimpleHttpRequest req) {
    return httpclient.execute(req, getHttpContext(), new FutureCallback<SimpleHttpResponse>() {
        public void completed(SimpleHttpResponse response) {
            latch.countDown();
            logger.info("[{}][{}ms] - {}", response.getCode(), getTotalTime(startTime), url);
        }
        public void failed(Exception e) {
            logger.error("[{}ms] - {} - {}", getTotalTime(startTime), url, e);
        }
        public void cancelled() {
            logger.error("[{}ms] - request cancelled for {}", getTotalTime(startTime), url);
        }
    });
}
Direct Question
Is there a way to configure the client so that it can handle these variances on its own, without explicitly modifying the configuration for each endpoint context?
Fixed w/ Combination of the below to Assure Connection Live/Ready
(Or at least is stable)
Forcing HTTP 1
Setting Effective Headers for POST
Specifically the close header
req.setHeader("Connection", "close, TE");
Note: Inactivity check helps, but still sometimes gets refusals w/o this
Setting Inactivity Checks by Type
Set POSTs to validate immediately after inactivity
Note: Using 1000 for both caused a high drop rate for some systems
Set GET to validate after 1s
Given the Error Context
Tracing the connection problem in stacktrace to AbstractH2StreamMultiplexer
Shows ConnectionHandshake.GRACEFUL_SHUTDOWN as triggering the stream refusal
if (connState == ConnectionHandshake.GRACEFUL_SHUTDOWN) {
    throw new H2StreamResetException(H2Error.PROTOCOL_ERROR, "Stream refused");
}
Which corresponds to
connState = streamMap.isEmpty() ? ConnectionHandshake.SHUTDOWN : ConnectionHandshake.GRACEFUL_SHUTDOWN;
If I'm understanding correctly:
The connections were being un/intentionally closed
However, they were not being confirmed ready before executing again
Which caused it to fail because the stream was not viable
Therefore the fix works because (it seems)
Given Forcing HTTP1 allows for a single context to manage
Where HttpVersionPolicy NEGOTIATE/FORCE_HTTP_2 had greater or equivalent failures across the spectrum of regions/menus
And it assures that all connections are valid before use
And POSTs are always closed due to the close header, which is unavailable to HTTP2
GET is checked for validity w/ reasonable periodicity
POST is checked every time, and since it is forcibly closed, it is re-acquired before execution
Which leaves no room for unexpected closures
And otherwise the potential that it was incorrectly switching to HTTP2
Will accept this until a better answer comes along, as this is stable but sub-optimal.

Kotlin wrap sequential IO calls as a Sequence

I need to process all of the results from a paged API endpoint. I'd like to present all of the results as a sequence.
I've come up with the following (slightly pseudo-coded):
suspend fun getAllRowsFromAPI(client: Client): Sequence<Row> {
    var currentRequest: Request? = client.requestForNextPage()
    return withContext(Dispatchers.IO) {
        sequence {
            while (currentRequest != null) {
                val rowsInPage = runBlocking { client.makeRequest(currentRequest) }
                yieldAll(rowsInPage)
                currentRequest = client.requestForNextPage()
            }
        }
    }
}
This functions but I'm not sure about a couple of things:
Is the API request happening inside runBlocking still happening with the IO dispatcher?
Is there a way to refactor the code to launch the next request before yielding the current results, then awaiting on it later?
Question 1: The API-request will still run on the IO-dispatcher, but it will block the thread it's running on. This means that no other tasks can be scheduled on that thread while waiting for the request to finish. There's not really any reason to use runBlocking in production-code at all, because:
If makeRequest is already a blocking call, then runBlocking will do practically nothing.
If makeRequest was a suspending call, then runBlocking would make the code less efficient. It wouldn't yield the thread back to the pool while waiting for the request to finish.
Whether makeRequest is a blocking or non-blocking call depends on the client you're using. Here's a non-blocking http-client I can recommend:
Question 2: I would use a Flow for this purpose. You can think of it as a suspendable variant of Sequence. Flows are cold, which means that a flow won't run before the consumer asks for its contents (as opposed to being hot, which means the producer pushes new values whether or not the consumer wants them). A Kotlin Flow has an operator called buffer, which you can use to make it request more pages before it has fully consumed the previous page.
The code could look quite similar to what you already have:
suspend fun getAllRowsFromAPI(client: Client): Flow<Row> = flow {
    var currentRequest: Request? = client.requestForNextPage()
    while (currentRequest != null) {
        val rowsInPage = client.makeRequest(currentRequest)
        rowsInPage.forEach { emit(it) }
        currentRequest = client.requestForNextPage()
    }
}.buffer(capacity = 1)
The capacity of 1 means that it will only make 1 more request while an earlier page is being processed. You could increase the buffer size to make more concurrent requests.
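The same overlap can be sketched outside of coroutines. The Java example below is an analogy built on CompletableFuture, not the Flow API. PrefetchingPager, readAll, and the fetchPage callback are hypothetical names, and it assumes the number of pages is known up front, which the original paged client does not require. It starts the request for page n+1 before the rows of page n are processed, so at most one request runs ahead of the consumer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

final class PrefetchingPager {
    /** Collects the rows of pages 0..pageCount-1, requesting each next page before processing the current one. */
    static <T> List<T> readAll(int pageCount, IntFunction<List<T>> fetchPage) {
        List<T> rows = new ArrayList<>();
        if (pageCount <= 0) {
            return rows;
        }
        // Kick off the first request.
        CompletableFuture<List<T>> next = CompletableFuture.supplyAsync(() -> fetchPage.apply(0));
        for (int page = 0; page < pageCount; page++) {
            List<T> current = next.join(); // wait for the page requested earlier
            if (page + 1 < pageCount) {
                final int following = page + 1;
                // Start the following request before touching the current rows.
                next = CompletableFuture.supplyAsync(() -> fetchPage.apply(following));
            }
            rows.addAll(current); // stand-in for per-row processing
        }
        return rows;
    }
}
```

Row order is preserved because each page is joined before its successor is consumed; only the network wait overlaps with processing.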
You should check out this talk from KotlinConf 2019 to learn more about flows:
Sequences are definitely not what you want to use in this case, because they are not designed to work in an asynchronous environment. Perhaps you should take a look at flows and channels, but for your case the best and simplest choice is just a collection of deferred values, because you want to process all requests at once (flows and channels process them one by one, possibly with a limited buffer size).
The following approach allows you to start all requests asynchronously (assuming that makeRequest is a suspending function and supports asynchronous requests). When you need your results, you only have to wait for the slowest request to finish.
fun getClientRequests(client: Client): List<Request> {
    val requests = ArrayList<Request>()
    var currentRequest: Request? = client.requestForNextPage()
    while (currentRequest != null) {
        requests += currentRequest
        currentRequest = client.requestForNextPage()
    }
    return requests
}

// This function is not even suspended, so it finishes almost immediately
fun getAllRowsFromAPI(client: Client): List<Deferred<Page>> =
    getClientRequests(client).map {
        /*
         * The better practice would be making getAllRowsFromApi an extension function
         * to CoroutineScope and calling receiver scope's async function.
         * GlobalScope is used here just for simplicity.
         */
        GlobalScope.async(Dispatchers.IO) { client.makeRequest(it) }
    }

fun main() {
    val client = Client()
    val deferredPages = getAllRowsFromAPI(client) // This line executes fast
    // Here you can do whatever you want, all requests are processed in background
    // Then, when we need results....
    val pages = runBlocking { deferredPages.map { it.await() } }
    // In your case you also want to "unpack" pages and get rows, you can do it here:
    val rows = pages.flatMap { it.getRows() }
}
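For comparison, the start-everything-then-wait pattern can be sketched with plain JDK futures. This is an analogy in Java using CompletableFuture instead of Deferred; StartAllThenJoin and fetchAll are hypothetical names, and makeRequest is passed in as an ordinary function. All requests are submitted first, and only then are the results collected in submission order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Function;

final class StartAllThenJoin {
    /** Submits every request to the executor, then waits for all results in submission order. */
    static <I, O> List<O> fetchAll(List<I> requests, Function<I, O> makeRequest, Executor executor) {
        List<CompletableFuture<O>> pending = new ArrayList<>();
        for (I request : requests) {
            // Nothing blocks here: each call only schedules work.
            pending.add(CompletableFuture.supplyAsync(() -> makeRequest.apply(request), executor));
        }
        List<O> results = new ArrayList<>();
        for (CompletableFuture<O> f : pending) {
            results.add(f.join()); // total wait is bounded by the slowest request
        }
        return results;
    }
}
```

How many requests really run at the same time depends on the executor that is passed in, so a small pool will still serialize part of the work.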
I happened across suspendingSequence in Kotlin's coroutines-examples:
This is exactly what I was looking for.

Cache the result of a Mono from a WebClient call in a Spring WebFlux web application

I am looking to cache a Mono (only if it is successful) which is the result of a WebClient call.
From reading the project reactor addons docs I don't feel that CacheMono is a good fit as it caches the errors as well which I do not want.
So instead of using CacheMono I am doing the below:
Cache<MyRequestObject, Mono<MyResponseObject>> myCaffeineCache = ...;
MyRequestObject myRequestObject = ...;
Mono<MyResponseObject> myResponseObject = myCaffeineCache.get(myRequestObject,
    requestAsKey -> WebClient.create()
        ...
        .cache()
        .doOnError(t -> myCaffeineCache.invalidate(requestAsKey)));
Here I am calling cache on the Mono and then adding it to the caffeine cache.
Any errors will enter doOnError to invalidate the cache.
Is this a valid approach to caching a Mono WebClient response?
This is one of the very few use cases where you'd be actually allowed to call non-reactive libraries and wrap them with reactive types, and have processing done in side-effects operators like doOnXYZ, because:
Caffeine is an in-memory cache, so as far as I know there's no I/O involved
Caches often don't offer strong guarantees about caching values (it's very much "fire and forget")
You can then in this case query the cache to see if a cached version is there (wrap it and return right away), and cache a successful real response in a doOn operator, like this:
public class MyService {
    private WebClient client;
    private Cache<MyRequestObject, MyResponseObject> myCaffeineCache;

    public MyService() {
        this.client = WebClient.create();
        this.myCaffeineCache = Caffeine.newBuilder().maximumSize(100).build();
    }

    public Mono<MyResponseObject> fetchResponse(MyRequestObject request) {
        MyResponseObject cachedVersion = this.myCaffeineCache.getIfPresent(request);
        if (cachedVersion != null) {
            return Mono.just(cachedVersion);
        } else {
            return this.client
                    // ... build the request and map the body to MyResponseObject ...
                    .doOnNext(response -> this.myCaffeineCache.put(request, response));
        }
    }
}
Note that I wouldn't cache reactive types here, since there's no I/O involved nor backpressure once the value is returned by the cache. On the contrary, it's making things more difficult with subscription and other reactive streams constraints.
Also you're right about the cache operator since it isn't about caching the value per se, but more about replaying what happened to other subscribers. I believe that cache and replay operators are actually synonyms for Flux.
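The idea of caching the value rather than the reactive type, and writing it only on success, can be sketched with plain JDK types. This is an analogy and not Reactor or Caffeine code: CompletableFuture stands in for Mono, ConcurrentHashMap for the Caffeine cache, and SuccessOnlyCache and its loader parameter are hypothetical names. It assumes the loader never completes with null, since ConcurrentHashMap rejects null values.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

final class SuccessOnlyCache<K, V> {
    private final ConcurrentMap<K, V> values = new ConcurrentHashMap<>();
    private final Function<K, CompletableFuture<V>> loader;

    SuccessOnlyCache(Function<K, CompletableFuture<V>> loader) {
        this.loader = loader;
    }

    CompletableFuture<V> get(K key) {
        V cached = values.get(key);
        if (cached != null) {
            // Cache hit: wrap the stored value and return right away.
            return CompletableFuture.completedFuture(cached);
        }
        // thenApply runs only on normal completion, so failures are never stored.
        return loader.apply(key).thenApply(value -> {
            values.put(key, value);
            return value;
        });
    }
}
```

Two concurrent misses for the same key will both call the loader, which matches the "fire and forget" expectation of a cache rather than giving a single-flight guarantee.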
Actually, you don't have to save errors with CacheMono.
private Cache<MyRequestObject, MyResponseObject> myCaffeineCache;

Mono<MyResponseObject> myResponseObject =
        CacheMono.lookup(key -> Mono.justOrEmpty(myCaffeineCache.getIfPresent(key))
                        .map(Signal::next), myRequestObject)
                .onCacheMissResume(() -> /* Your web client or other Mono here */)
                .andWriteWith((key, signal) -> Mono.fromRunnable(() ->
                        Optional.ofNullable(signal.get())
                                .ifPresent(value -> myCaffeineCache.put(key, value))));
This may be useful when you switch to an external cache. Don't forget to use reactive clients for external caches.

How to wrap a Flux with a blocking operation in the subscribe?

In the documentation it is written that you should wrap blocking code into a Mono:
But it is not written how to actually do it.
I have the following code:
@PostMapping(path = "some-path", consumes = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Mono<Void> doSomething(@Valid @RequestBody Flux<Something> something) {
    something.subscribe(item -> {
        // some blocking operation
    });
    // how to return Mono<Void> here?
}
The first problem I have here is that I need to return something, but I can't.
If I returned Mono.empty(), for example, the request would be closed before the work of the flux is done.
The second problem is: how do I actually wrap the blocking code as suggested in the documentation:
Mono blockingWrapper = Mono.fromCallable(() -> {
    return /* make a remote synchronous call */
});
blockingWrapper = blockingWrapper.subscribeOn(Schedulers.elastic());
You should not call subscribe within a controller handler, but just build a reactive pipeline and return it. Ultimately, the HTTP client will request data (through the Spring WebFlux engine) and that's what subscribes and requests data to the pipeline.
Subscribing manually will decouple the request processing from that other operation, which will 1) remove any guarantee about the order of operations and 2) break the processing if that other operation is using HTTP resources (such as the request body).
In this case, the source is not blocking; only the transform operation is. So we'd better use publishOn to signal that the rest of the chain should be executed on a specific Scheduler. If the operation here is I/O-bound, then Schedulers.elastic() is the best choice; if it's CPU-bound, then Schedulers.parallel() is better. Here's an example:
@PostMapping(path = "/some-path", consumes = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Mono<Void> doSomething(@Valid @RequestBody Flux<Something> something) {
    return something.collectList()
            .publishOn(Schedulers.elastic())
            .map(things -> {
                return processThings(things);
            })
            .then();
}
public ProcessingResult processThings(List<Something> things) {
    // ...
}
For more information on that topic, check out the Scheduler section in the reactor docs. If your application tends to do a lot of things like this, you're losing a lot of the benefits of reactive streams and you might consider switching to a Servlet-based model where you can configure thread pools accordingly.
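The underlying idea, running a blocking call on a dedicated pool and handing back an asynchronous handle, can be sketched with plain JDK types. This is an analogy and not Reactor code: CompletableFuture stands in for Mono, the Executor argument stands in for a Scheduler, and BlockingOffload and callBlocking are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;

final class BlockingOffload {
    /** Runs a blocking call on the given executor and exposes its result as a future. */
    static <T> CompletableFuture<T> callBlocking(Callable<T> blockingCall, Executor ioExecutor) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return blockingCall.call(); // executes on an ioExecutor thread, not the caller's
            } catch (Exception e) {
                // Surface checked exceptions through the future instead of losing them.
                throw new CompletionException(e);
            }
        }, ioExecutor);
    }
}
```

The caller returns immediately with the future, and the pool size of ioExecutor decides how many blocking calls can run at once.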

WCF Async deadlock?

Has anyone run into a situation where a WaitAny call returns a valid handle index, but the Proxy.End call blocks? Or does anyone have recommendations on how best to debug this? I've tried tracing, performance counters (to check the max percentages), and logging everywhere.
The test scenario: 2 async requests go out (there's a bit more to the full implementation); the 1st Proxy.End call returns successfully, but the subsequent one blocks. I've checked the WCF trace and don't see anything particularly interesting. NOTE that it is querying an endpoint that exists in the same process as well as one on a remote machine (= 2 async requests).
As far as I can see, the call is going through on the service implementation side for both queries, but it just blocks on the subsequent End call. It seems to work with just a single call, though, regardless of whether it is sending the request to a remote machine or to itself; so it is something to do with the multiple queries, or some other factor is causing the lockup.
I've tried different "concurrencymode"s and "instancecontextmode"s but it doesn't seem to have any bearing on the result.
Here's a cut down version of the internal code for parsing the handle list:
ValidationResults IValidationService.EndValidate()
{
    var results = new ValidationResults();
    if (_asyncResults.RemainingWaitHandles == null)
    {
        results.ReturnCode = AsyncResultEnum.NoMoreRequests;
        return results;
    }
    var waitArray = _asyncResults.RemainingWaitHandles.ToArray();
    if (waitArray.GetLength(0) > 0)
    {
        int handleIndex = WaitHandle.WaitAny(waitArray, _defaultTimeOut);
        if (handleIndex == WaitHandle.WaitTimeout)
        {
            // Timeout on signal for all handles occurred
            // Close proxies and return...
        }
        var asyncResult = _asyncResults.Results[handleIndex];
        results.Results = asyncResult.Proxy.EndServerValidateGroups(asyncResult.AsyncResult);
        results.ReturnCode = AsyncResultEnum.Success;
        return results;
    }
    results.ReturnCode = AsyncResultEnum.NoMoreRequests;
    return results;
}
and the code that calls this:
validateResult = validationService.EndValidateSuppression();
while (validateResult.ReturnCode == AsyncResultEnum.Success)
{
    // Update progress step
    validateResult = validationService.EndValidateSuppression();
}
I've commented out the callbacks on the initiating node (FYI it's actually a 3-tier setup, but the problem is isolated to this 2nd tier calling the 3rd tier; the callbacks go from the 2nd tier to the 1st tier and have been removed in this test). Thoughts?
Sticking to the solution I left in my comment: simply avoid chaining a callback to async calls that have different destinations (i.e. proxies).