Helidon - ForwardingHandler does not complete RequestContext Publisher if the LoadBalancer cuts the started Expect: 100-continue process

We are running Helidon MP REST services behind an Apache httpd load balancer. The following constellation causes the JerseySupport service executor queue to get stuck.
A client sends a POST request with a JSON payload and an Expect: 100-continue header to our REST service. The Apache load balancer forwards the request to the backend. The backend accepts the request and starts a JerseySupport runnable which waits for incoming data, and the backend sends a 100 Continue response to the LB to start the stream. If the client request exceeds the load balancer's connection timeout at this point, the load balancer cuts the connection to the calling client with a proxy error, but the backend service never gets informed and waits forever.
The problem is that io.helidon.webserver.ForwardingHandler only completes the HTTP content publisher once a LastHttpContent message is sent, and in this scenario that never happens. If the publisher never completes, the subscriber inside the waiting JerseySupport service instance blocks a server executor thread forever. If this happens several times, the whole REST service is blocked.
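For illustration, a request like the following triggers the 100-continue handshake. This is only a sketch with a placeholder URL and payload, not our actual client; it uses Apache HttpClient 4.5's expect-continue support:
// Sketch: POST with Expect: 100-continue (placeholder URL and payload).
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class ExpectContinueClient {
    public static void main(String[] args) throws Exception {
        RequestConfig config = RequestConfig.custom()
                .setExpectContinueEnabled(true) // wait for "100 Continue" before sending the body
                .build();
        try (CloseableHttpClient client = HttpClients.custom()
                .setDefaultRequestConfig(config)
                .build()) {
            HttpPost post = new HttpPost("http://loadbalancer.example.com/rest/resource");
            post.setEntity(new StringEntity("{\"key\":\"value\"}", ContentType.APPLICATION_JSON));
            try (CloseableHttpResponse response = client.execute(post)) {
                System.out.println(response.getStatusLine());
            }
        }
    }
}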
I have not found a way to configure a corresponding timeout inside Helidon to interrupt the JerseySupport service, nor a way to make the Apache load balancer close the connection to the backend appropriately.
Has anyone noticed similar problems or found a workaround, apart from disabling 100-continue streaming?
Helidon Version: 1.4.4
Apache Version: 2.4.41
Thanks in advance

Related

Socket Timeout in WSO2 ESB 5.0.0

We are continuously getting socket timeouts in our two ESB instances, which are in the same cluster. The IP address printed in the logs belongs to the load balancer which sits on top of the two ESB instances. After some time the ESB instances go into an unhealthy state and stop serving requests.
Below is the sample log for reference.
TID: [-1] [] [2018-10-07 22:42:11,711] WARN {org.apache.synapse.transport.passthru.SourceHandler} - Connection time out after request is read: http-incoming-5709 Socket Timeout : 180000 Remote Address : /10.246.19.23:45278
Please let us know if anyone has come across this kind of issue.
The reason for the above error is that the connection from the ESB to the backend takes more than 180,000 milliseconds, so the ESB marks the connection as timed out. I believe you have configured the endpoint timeout as 180,000 milliseconds. This can be caused by a slow backend service; taking more than 3 minutes to return a response is usually not a good sign and may lead to high thread utilization in the ESB.
Finally, we found the issue. The socket timeout appeared in our ESB logs because one of the APIs was failing and not returning any response to the calling client (not even a fault response). In this case a thread keeps holding that single transaction and waits on it. After some time no thread is left to serve new requests, which is why the server was going into an unhealthy state.
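To illustrate the mechanism with a generic Java sketch (not WSO2 ESB code): once every worker thread is stuck waiting on a call that never returns, a fixed-size pool stops serving new requests entirely.
// Generic illustration: 2 worker threads, each blocking forever like a hung backend call.
// The third submitted task is never executed, and the program never terminates.
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadExhaustionDemo {
    public static void main(String[] args) {
        ExecutorService workers = Executors.newFixedThreadPool(2);
        CountDownLatch never = new CountDownLatch(1); // nothing ever counts this down

        for (int i = 1; i <= 3; i++) {
            final int request = i;
            workers.submit(() -> {
                System.out.println("handling request " + request);
                try {
                    never.await(); // simulates waiting forever for a backend response
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // Only "handling request 1" and "handling request 2" are printed;
        // request 3 stays queued because both workers are stuck.
    }
}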

WSO2 ESB 5.0.0 threads and classes loaded issue

I have a simple passthrough proxy in WSO2 ESB 5.0.0 to a WSO2 DSS. When I consume the ESB proxy, the live threads and loaded classes increase until WSO2 ESB breaks down. When the ESB breaks down there are 284 threads and 14k classes loaded. If I consume the DSS directly, the DSS doesn't break down, with a maximum of 104 threads and 9k classes loaded.
How can I force the ESB to release those resources, or improve how the ESB handles HTTP connections? It looks like zombie connections never release their threads.
Any help narrowing down the problem?
It doesn't look like a problem with class loading and thread count. I just finished testing a newly installed WSO2 ESB server:
WSO2 ESB version 5.0.0
Java 8
Windows 8
The ESB server also has the DSS feature installed.
The DSS service is called over the HTTP/1.1 protocol.
The DSS service has a long-running query (over 10 s).
Total number of simultaneous requests to the ESB service: over 150.
Total number of loaded classes: over 15,000; total threads running: over 550. Even under this high load there is no issue like the one you mention.
What I actually recommend is to check how you make HTTP requests to the ESB service. It is quite sensitive to headers like Content-Type and encoding. It took me quite a long time to find out how to properly call a SOAP service on the ESB using Apache HttpClient (4.5).
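For example, an explicit call can look roughly like this (only a sketch; the endpoint URL, SOAPAction and envelope are placeholders):
// Sketch: calling a SOAP proxy on the ESB with Apache HttpClient 4.5,
// setting the headers the ESB is sensitive to explicitly (placeholders throughout).
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import java.nio.charset.StandardCharsets;

public class EsbSoapCall {
    public static void main(String[] args) throws Exception {
        String envelope =
            "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\">"
          + "<soapenv:Body><!-- payload --></soapenv:Body></soapenv:Envelope>";
        try (CloseableHttpClient client = HttpClients.createDefault()) {
            HttpPost post = new HttpPost("http://esb.example.com:8280/services/MyProxy");
            post.setHeader("Content-Type", "text/xml; charset=UTF-8"); // SOAP 1.1
            post.setHeader("SOAPAction", "urn:myOperation");
            post.setEntity(new StringEntity(envelope, StandardCharsets.UTF_8));
            try (CloseableHttpResponse response = client.execute(post)) {
                System.out.println(response.getStatusLine());
            }
        }
    }
}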
Eventually I probably found the problem. The problem is between the DSS and ESB servers. According to the source code, this kind of error happens when the ESB sends a request to the DSS server and the request is read by the DSS server, but the connection to the DSS server is closed before the DSS server writes its response back to the ESB. The ESB server then reports a message about the problem, like the one you mention:
SourceHandler
...
} else if (state == ProtocolState.REQUEST_DONE) {
isFault = true;
log.warn("Connection closed by the client after request is read: " + conn);
}
This is easy to reproduce: start the ESB and DSS servers, start sending lots of requests to a passthrough proxy on the ESB (which proxies requests to the DSS service), shut down the DSS server, and you will see a lot of
WARN - SourceHandler Connection closed by the client after request is read: http-incoming-1073 Remote Address
This might be a network issue or a firewall, or it could be the WSO2 DSS server's socket timeout, which is 180 s by default.

Weblogic server fails to respond under load

We have a rather strange situation from my point of view. Under load our WebLogic 10.3.2 server fails to respond. We are using RESTEasy with HttpClient version 3.1 to communicate with a web service deployed as a WAR.
What we have is a calculation process that runs in 4 containers on 4 physical machines, and each of them sends requests to WebLogic during the calculation.
On each run we see messages from HttpClient like this:
[THREAD1] INFO I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server OUR_SERVER_NAME failed to respond
[THREAD1] INFO Retrying request
HttpClient makes several retries until it gets the necessary data.
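As far as I understand, the retries come from the commons-httpclient retry handler. A sketch of how it is configured with HttpClient 3.1 (the URL is a placeholder):
// Sketch: tuning the retry handler in Apache Commons HttpClient 3.1 (placeholder URL).
import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;

public class RetryingCall {
    public static void main(String[] args) throws Exception {
        HttpClient client = new HttpClient();
        GetMethod get = new GetMethod("http://OUR_SERVER_NAME/service/data");
        // Retry up to 5 times, including requests that were already sent.
        get.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
                new DefaultHttpMethodRetryHandler(5, true));
        try {
            int status = client.executeMethod(get);
            System.out.println("status: " + status);
        } finally {
            get.releaseConnection();
        }
    }
}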
I want to understand why WebLogic can refuse connections. I read about the WebLogic thread pool that processes HTTP requests and found out that WebLogic allocates a separate thread to process each web request and that the number of threads is not bounded in the default configuration. Also, our server is configured with Maximum Open Sockets: -1, which means the number of open sockets is unlimited.
From this thread I'd like to understand where the issue is. Is it on the WebLogic side, or is it a problem in our business logic? Can you help me investigate the situation more deeply?
What else should I check in order to verify that our WebLogic server is configured to handle as many requests as we want?

Socket exception while Load testing

While load testing with JMeter we receive "Non HTTP response code: java.net.SocketException" for all requests once peak load is reached.
Here is the server config:
JMeter -> F5 (load balancer) -> 2 legs of Weblogic servers.
What could be the reason for the SocketException?
Any help in this regard is highly appreciated!
This is your server starting to reject connections, or timeouts occurring on the client side.
It means your server cannot handle that load correctly.

How to configure Glassfish to drop hanging requests?

Can I configure Glassfish to drop any request that takes longer than 10 seconds to process?
Example:
I'm using Glassfish to host my web service. The thread pool is configured to have max 5 connections.
My service has a method that does this:
System.out.println("New request");
Thread.sleep(1000*1000);
I create 5 requests to the service and I see 5 "New request" messages in the log. Then the server stops responding for a very long time.
In the live environment, all requests must be processed in less than a second. If processing takes longer, there is a problem with the request, and I want Glassfish to drop such requests but stay alive and serve other requests.
Currently I'm using a workaround in the code. At the beginning of my web method I launch a separate thread for request processing with a timeout, as suggested here: How to timeout a thread (sketched below).
I do not like this solution and still believe that there must be a configuration setting in Glassfish to apply this logic to all requests, not just to one method.
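For reference, a minimal sketch of that per-method workaround (class and method names are illustrative): run the actual work on a separate thread and give up after 10 seconds so the container thread is not held indefinitely.
// Sketch of the per-method timeout workaround (illustrative names).
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutWorkaround {
    private static final ExecutorService worker = Executors.newCachedThreadPool();

    public static String handleRequest() throws Exception {
        System.out.println("New request");
        Future<String> result = worker.submit(() -> {
            // the actual (potentially hanging) processing goes here
            Thread.sleep(1000 * 1000);
            return "done";
        });
        try {
            return result.get(10, TimeUnit.SECONDS); // drop the request after 10 seconds
        } catch (TimeoutException e) {
            result.cancel(true); // interrupt the worker thread
            throw new RuntimeException("Request timed out", e);
        }
    }
}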