We are continuously getting Socket Timeout errors in our 2 ESB instances, which are in the same cluster. The IP address printed in the logs belongs to the Load Balancer that sits on top of the 2 ESB instances. After some time the ESB instances go into an unhealthy condition and stop serving requests.
Below is a sample log for reference.
TID: [-1] [] [2018-10-07 22:42:11,711] WARN {org.apache.synapse.transport.passthru.SourceHandler} - Connection time out after request is read: http-incoming-5709 Socket Timeout : 180000 Remote Address : /10.246.19.23:45278
Please let us know if anyone has come across this kind of issue.
TID: [-1] [] [2018-10-07 22:42:11,711] WARN {org.apache.synapse.transport.passthru.SourceHandler} - Connection time out after request is read: http-incoming-5709 Socket Timeout : 180000 Remote Address : /10.246.19.23:45278
The reason for the above error is that the connection from the ESB to the backend takes more than 180,000 milliseconds, so the ESB marks the connection as timed out. I believe you have configured the endpoint timeout as 180,000 milliseconds. This can be caused by a slow backend service; taking more than 3 minutes to return a response is usually not a good sign and may lead to high thread utilization in the ESB.
Finally, we found the issue. The Socket Timeout was appearing in our ESB logs because one of the APIs was failing and not returning any response to the calling client (not even a fault response). In this case a thread holds that single transaction and keeps waiting on it. After some time no new threads are available to serve requests, which is why the server was going into an unhealthy condition.
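This failure mode, a backend that accepts the connection but never answers, so the waiting side eventually trips its socket timeout, can be reproduced with a small standard-library sketch. The fake backend, the port selection, and the 1-second timeout are illustrative stand-ins for the ESB's 180,000 ms setting, not anything from the original setup:

```python
import socket
import threading

# A stand-in backend that accepts the connection but never replies,
# mimicking the failing API described above.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))        # pick any free port
srv.listen(1)
port = srv.getsockname()[1]

def hold_connection():
    conn, _ = srv.accept()
    threading.Event().wait(5)     # keep the socket open, send nothing
    conn.close()

threading.Thread(target=hold_connection, daemon=True).start()

client = socket.create_connection(("127.0.0.1", port))
client.settimeout(1)              # analogous to the ESB's 180,000 ms socket timeout
client.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
try:
    client.recv(1024)             # blocks until the socket timeout fires
    result = "response"
except socket.timeout:
    result = "timeout"            # what Synapse logs as "Socket Timeout"
print(result)
```

Each such blocked caller ties up one waiting thread for the full timeout window, which is how a single broken API can exhaust a thread pool.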
Related
We provide Helidon MP REST services behind an Apache httpd load balancer. The following constellation causes the JerseySupport service executor queue to become stuck.
A client sends a POST request to our REST service with a JSON payload and an Expect: 100-continue header. The Apache load balancer forwards the request to the backend. The backend accepts the request and starts a JerseySupport runnable which waits for incoming data, then the backend sends a response to the LB to start the stream (response status 100). If the client request exceeds the load balancer connection timeout at this point, the load balancer cuts the connection to the calling client with a proxy error, but the backend service is not informed and waits forever.
The problem is that io.helidon.webserver.ForwardingHandler only completes the HTTP content publisher if a LastHttpContent message is sent, and this never happens. If the publisher never completes, the subscriber inside the waiting JerseySupport service instance blocks a server executor instance forever. If this happens several times, the whole REST service is blocked.
I have not found a way to configure a corresponding timeout inside Helidon to interrupt the JerseySupport service, nor a way to get the Apache load balancer to end the connection to the backend appropriately.
Has anyone noticed similar problems or found a workaround, apart from disabling 100-continue streaming?
Helidon Version: 1.4.4
Apache Version: 2.4.41
Thanks in advance
I have some images in my queue, and I pass each image to my Flask server where the image processing is done; a response is then received in my RabbitMQ server. After receiving the response, I get this error: "pika.exceptions.StreamLostError: Stream connection lost(104,'Connection reset by peer')". This happens when the RabbitMQ channel starts consuming on the connection again. I don't understand why this happens. I would also like to restart the server automatically if this error persists. Is there any way to do that?
Your consume process is probably taking too much time to complete and send the Ack/Nack to the server. Therefore the server does not receive a heartbeat from your client and stops serving it. Then, on the client side, you receive:
pika.exceptions.StreamLostError: Stream connection lost(104,'Connection reset by peer')
You should check the server logs as well. They probably look like this:
missed heartbeats from client, timeout: 60s
See this issue for more information.
Do your work on another thread. See this code as an example:
https://github.com/pika/pika/blob/master/examples/basic_consumer_threaded.py
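The linked pika example keeps the connection's I/O loop responsive by doing the slow work elsewhere. Stripped of pika itself, the pattern can be sketched with only the standard library; the callback queue here stands in for pika's `connection.add_callback_threadsafe`, and the delivery tags are invented for illustration:

```python
import queue
import threading
import time

# Pattern from pika's basic_consumer_threaded.py, in stdlib form: the
# "I/O loop" thread must stay responsive so heartbeats keep flowing, so
# slow work runs on worker threads and each ack is marshalled back to
# the I/O thread via a callback queue.

callbacks = queue.Queue()          # stands in for add_callback_threadsafe
acked = []

def slow_work(delivery_tag):
    time.sleep(0.2)                # the long-running message handler
    # Never ack from the worker thread directly; hand it back instead.
    callbacks.put(lambda: acked.append(delivery_tag))

def io_loop():
    # The real I/O loop would also be reading frames and answering
    # heartbeats here; it only ever blocks briefly.
    deadline = time.time() + 2
    while time.time() < deadline and len(acked) < 2:
        try:
            cb = callbacks.get(timeout=0.05)
            cb()                   # run the ack on the I/O thread
        except queue.Empty:
            pass                   # loop stays free for heartbeat traffic

for tag in (1, 2):                 # two simulated deliveries
    threading.Thread(target=slow_work, args=(tag,), daemon=True).start()

io_loop()
print(sorted(acked))
```

Because the I/O thread never blocks for longer than the queue poll interval, heartbeats are never starved no matter how long the handler takes.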
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
You can change the stream connection limit by setting heartbeat in ConnectionParameters:
connection_params = pika.ConnectionParameters(heartbeat=10)
The value is in seconds; the example above sets the heartbeat timeout to 10 seconds.
More information: https://www.rabbitmq.com/heartbeats.html and https://www.rabbitmq.com/heartbeats.html#tcp-keepalives
I am using ActiveMQ version 5.10.0 with default configuration.
The documentation on ActiveMQ transport protocols says that by default wireFormat.maxInactivityDuration is 30000 and transport.useKeepAlive is enabled.
Does that mean that for the default configuration an inactivity timeout will never occur, since keepAlive messages are enabled and sent by default?
I tried leaving my queues idle for a day and did not see any inactivity timeout logs.
But the ActiveMQ page also says:
" Using the default values; if no data has been written or read from the connection for 30 seconds, the InactivityMonitor kicks in. The InactivityMonitor throws an InactivityIOException and shuts down the transport associated with the connection."
http://activemq.apache.org/activemq-inactivitymonitor.html
The inactivity timeout occurs when the connection is broken or the broker is experiencing issues such that it cannot respond to the ping request the client sends it. The timeout does not relate to message inactivity or the like, but to ping/pong-type heartbeats between client and broker. As long as the broker is healthy and sending the requested responses, the client will not terminate the connection, even if no messages happen to be flowing across it.
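For reference, these settings live on the transport URI. A sketch using the parameter name from the question (host, port, and values are illustrative, and I have not verified this against the 5.10.0 defaults):

```
<!-- Broker side, in activemq.xml (illustrative values): -->
<transportConnector name="openwire"
    uri="tcp://0.0.0.0:61616?wireFormat.maxInactivityDuration=30000"/>
```

The client connection URI accepts the same wireFormat options, and the effective inactivity duration is negotiated between the two sides when the OpenWire connection is established.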
We have a quite strange situation. Under load, our WebLogic 10.3.2 server fails to respond. We are using RESTEasy with HttpClient version 3.1 to communicate with a web service deployed as a WAR.
What we have is a calculation process that runs in 4 containers on 4 physical machines, each of which sends requests to WebLogic during the calculation.
On each run we see messages from HttpClient like this:
[THREAD1] INFO I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server OUR_SERVER_NAME failed to respond
[THREAD1] INFO Retrying request
HttpClient makes several requests until it gets the necessary data.
I want to understand why WebLogic can refuse connections. I read about the WebLogic thread pool that processes HTTP requests and found that WebLogic allocates a separate thread to each web request, and the number of threads is not bounded in the default configuration. Also, our server is configured with Maximum Open Sockets: -1, which means the number of open sockets is unlimited.
From this thread I'd like to understand where the issue is. Is it on the WebLogic side, or is it a problem with our business logic? Can you help us investigate the situation more deeply?
What more should I check to confirm that our WebLogic server is configured to handle as many requests as we want?
I opened 2 TCP connections:
1. A normal connection (implementing an echo server and client), and
2. An HTTP connection.
I opened the HTTP connection with a (modified) curl utility while running Apache as the server, where curl does not send the GET request for some time after connection establishment.
For the normal connection, after connection establishment the server waits for a request from the client.
But as observed, strangely, for the HTTP connection, if the GET request does not come from the client for some time after connection establishment, the server sends a FIN packet to the client and closes its part of the connection.
Is it mandatory for an HTTP client to send the GET request immediately after the initial connection?
Apache has a parameter called Timeout.
Its manual page (Apache Core - Timeout Directive) states:
The TimeOut directive defines the length of time Apache will wait for I/O in various circumstances:
1. When reading data from the client, the length of time to wait for a TCP packet to arrive if the read buffer is empty.
2. When writing data to the client, the length of time to wait for an acknowledgement of a packet if the send buffer is full.
3. In mod_cgi, the length of time to wait for output from a CGI script.
4. In mod_ext_filter, the length of time to wait for output from a filtering process.
5. In mod_proxy, the default timeout value if ProxyTimeout is not configured.
I think you fell into case number one.
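Concretely, that wait is controlled by the directive above. A minimal httpd.conf fragment (the value is illustrative; I believe the shipped default is 60 seconds in Apache 2.4):

```
# httpd.conf -- how long Apache waits for the first/next request packet
# from the client (case one above) before closing the connection
Timeout 60
```

Raising this value would let an idle client sit longer before the server sends its FIN, at the cost of holding the connection's resources open.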
EDIT
I was looking through the W3C HTTP document and found no reference to timeouts,
but in chapter 8 (Connections) I found:
8.1.4 Practical Considerations
Servers will usually have some time-out value beyond which they will no longer maintain an inactive connection. (...) The use of persistent connections places no requirements on the length (or existence) of this time-out for either the client or the server.
That sounds to me like "every server or client is free to choose its behaviour regarding inactive connection timeouts".