In JAX-RS (the CXF WebClient, for instance) we can set a connection timeout and a receive timeout:
ClientConfiguration c = WebClient.getConfig(client);
HTTPConduit http = c.getHttpConduit();
HTTPClientPolicy httpClientPolicy = new HTTPClientPolicy();
httpClientPolicy.setConnectionTimeout(timeout);
httpClientPolicy.setReceiveTimeout(timeout);
httpClientPolicy.setAllowChunking(false);
http.setClient(httpClientPolicy);
I would like to set a single timeout that covers both. I don't really care how much time is spent connecting or receiving; my requirement is to get a response within X seconds or just discard the search.
There is no way with CXF to set a maximum timeout for a request that considers both the connection and receive durations. The maximum timeout for a request will be:
maximum_timeout = connection_timeout + receive_timeout
See this similar question for Apache HTTP client. The workaround could be to set a timer in a separate thread that aborts the connection when the desired maximum timeout expires.
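The timer-in-a-separate-thread workaround can be sketched as follows. The example is in Python rather than Java purely to show the pattern; `call_with_deadline`, `fast_search` and `slow_search` are hypothetical names, not CXF APIs. In Java the same shape is available through `Future.get(timeout, unit)`. The sketch only stops waiting once the deadline passes; actually aborting the in-flight connection is left to the real client.

```python
import concurrent.futures
import time


def call_with_deadline(fn, deadline_s, *args, **kwargs):
    """Run fn in a worker thread and stop waiting after deadline_s seconds overall."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args, **kwargs)
    try:
        # One deadline for the whole call, however the time is split
        # between connecting and receiving inside fn.
        return future.result(timeout=deadline_s)
    finally:
        # Do not block on a call that is still running. Real code would also
        # close the underlying connection here so the worker can finish.
        pool.shutdown(wait=False)


def fast_search():
    # Hypothetical stand-in for a remote call that answers in time.
    return "ok"


def slow_search():
    # Hypothetical stand-in for a remote call that takes too long.
    time.sleep(0.5)
    return "late"


if __name__ == "__main__":
    print(call_with_deadline(fast_search, 1.0))
    try:
        call_with_deadline(slow_search, 0.1)
    except concurrent.futures.TimeoutError:
        print("deadline exceeded, search discarded")
```

The caller gets either a result or a timeout within roughly `deadline_s`, which matches the "response in X seconds or discard" requirement.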
I'm using Aerospike Java client v6.0.1 with the following settings on the client's default read policy:
clientPolicy.readPolicyDefault.connectTimeout = 1000;
clientPolicy.readPolicyDefault.socketTimeout = 30;
clientPolicy.readPolicyDefault.totalTimeout = 110;
clientPolicy.readPolicyDefault.maxRetries = 2;
clientPolicy.readPolicyDefault.sleepBetweenRetries = 0;
However, I get the following errors from time to time, which show that a timeout occurred before all retries were used:
org.springframework.dao.QueryTimeoutException: Client timeout: iteration=0 connect=1000 socket=30 total=110 maxRetries=2 node=null inDoubt=false; nested exception is com.aerospike.client.AerospikeException$Timeout: Client timeout: iteration=0 connect=1000 socket=30 total=110 maxRetries=2 node=null inDoubt=false
org.springframework.dao.QueryTimeoutException: Client timeout: iteration=1 connect=1000 socket=30 total=110 maxRetries=2 node=A2 node_ip 3000 inDoubt=false; nested exception is com.aerospike.client.AerospikeException$Timeout: Client timeout: iteration=1 connect=1000 socket=30 total=110 maxRetries=2 node=A2 node_ip 3000 inDoubt=false
Does this mean that the total operation timeout also includes connecting to the Aerospike node? The Aerospike docs state that the total timeout starts only after the connection completes:
If connectTimeout is greater than zero, it will be applied to creating a connection plus optional user authentication and TLS handshake. When the connect completes, socketTimeout/totalTimeout is then applied. In this case, totalTimeout starts after the connection completes. See https://discuss.aerospike.com/t/understanding-timeout-and-retry-policies/2852
99% of all my requests to Aerospike take less than 20 ms, so it doesn't make sense for me to increase the total timeout.
Originally I had a 200-300 ms connect timeout and I increased it to 1000 ms, but that didn't help much.
Transactions can sometimes time out before the transaction has started. For example, async transactions can be throttled and can sit in the delay queue for longer than totalTimeout. If this occurs, a timeout exception is generated with iteration=0.
Anytime totalTimeout is reached, the transaction is cancelled regardless of the number of retries.
If connectTimeout is used and a new connection is required (no available connections in the pool) for the transaction, the connectTimeout is applied to connection creation and the totalTimeout stopwatch does not start until the new connection is created.
If connectTimeout is used and an existing connection is available from the pool, the connectTimeout is not applicable and the totalTimeout stopwatch starts from the beginning of the transaction.
Since most transactions are able to obtain connections from the pool, it's not surprising that increasing connectTimeout has little effect.
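The rules above can be condensed into a small model. This is a simplified sketch in Python, not the client's actual implementation; the function names are hypothetical and it assumes totalTimeout is greater than zero.

```python
def worst_case_wall_time_ms(connect_timeout_ms, total_timeout_ms, needs_new_connection):
    """Upper bound on wall-clock time for one transaction under the rules above."""
    if connect_timeout_ms > 0 and needs_new_connection:
        # The totalTimeout stopwatch starts only once the new connection exists.
        return connect_timeout_ms + total_timeout_ms
    # Pooled connection: the stopwatch starts at the beginning of the transaction,
    # so connectTimeout plays no role.
    return total_timeout_ms


def times_out_before_start(queue_delay_ms, total_timeout_ms):
    """True if an async transaction waits in the delay queue past totalTimeout."""
    return queue_delay_ms > total_timeout_ms


if __name__ == "__main__":
    # Values from the question: connectTimeout=1000, totalTimeout=110.
    print(worst_case_wall_time_ms(1000, 110, needs_new_connection=True))
    print(worst_case_wall_time_ms(1000, 110, needs_new_connection=False))
```

With a pooled connection the bound is just totalTimeout, whatever connectTimeout is set to, which is why raising connectTimeout from 300 ms to 1000 ms changes little.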
I am unable to time out a gRPC connection from the server side. A client may establish a connection and then hold it open or sleep, which leaves the connection hanging on the gRPC server. Is there a way to disconnect the connection from the server side after a certain time, or to set a timeout?
We tried disconnecting from the client side, which works, but we are unable to do so from the server side. In this link, Problem with gRPC setup. Getting an intermittent RPC unavailable error, Angad says that it is possible, but I am unable to define those parameters in Python.
My code snippet:
def serve():
    server = grpc.server(thread_pool=futures.ThreadPoolExecutor(max_workers=2), maximum_concurrent_rpcs=None, options=(('grpc.so_reuseport', 1),('grpc.GRPC_ARG_KEEPALIVE_TIME_MS', 1000)))
    stt_pb2_grpc.add_ListenerServicer_to_server(Listener(), server)
    server.add_insecure_port("localhost:50051")
    print("Server starting in port "+str(50051))
    server.start()
    try:
        while True:
            time.sleep(60 * 60 * 24)
    except KeyboardInterrupt:
        server.stop(0)

if __name__ == '__main__':
    serve()
I expect to be able to time out the connection from the gRPC server side in Python as well.
In short, you may find context.abort(...) useful; see the API reference. Timing out a server handler is not supported by the underlying C-Core API of gRPC Python, so you have to implement your own timeout mechanism in Python.
You can try out some solutions from other StackOverflow questions.
Or use simple but high-overhead extra threads to abort the RPC after a certain length of time. It might look like this:
import threading
import time

import grpc

_DEFAULT_TIME_LIMIT_S = 5

class FooServer(FooServicer):

    def RPCWithTimeLimit(self, request, context):
        # Events stay set once signalled, so a signal that arrives
        # before wait() is called is not lost.
        rpc_ended = threading.Event()
        work_finished = threading.Event()

        def wrapper():
            YOUR_ACTUAL_WORK(request)  # placeholder for the real handler logic
            work_finished.set()
            rpc_ended.set()

        def timer():
            time.sleep(_DEFAULT_TIME_LIMIT_S)
            rpc_ended.set()

        work_thread = threading.Thread(target=wrapper)
        work_thread.daemon = True
        work_thread.start()
        timer_thread = threading.Thread(target=timer)
        timer_thread.daemon = True
        timer_thread.start()

        # Wakes up when either the work or the timer finishes first.
        rpc_ended.wait()
        if work_finished.is_set():
            return NORMAL_RESPONSE  # placeholder for the real response message
        else:
            context.abort(grpc.StatusCode.DEADLINE_EXCEEDED, 'RPC Time Out!')
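Separately from per-handler timeouts, connection-level limits can be passed as channel arguments. In Python the keys are the lowercase strings behind the C-core GRPC_ARG_* macros (for example 'grpc.keepalive_time_ms'), not the macro names themselves. The sketch below only builds the options tuple; `build_server_options` is a hypothetical helper and the millisecond values are illustrative assumptions, so check them against the gRPC core documentation for your version.

```python
def build_server_options(idle_ms=30_000, age_ms=300_000, grace_ms=5_000,
                         keepalive_ms=10_000, keepalive_timeout_ms=5_000):
    """Return (name, value) pairs suitable for grpc.server(options=...)."""
    return (
        # Ping the client periodically and drop it if the ping is not answered.
        ("grpc.keepalive_time_ms", keepalive_ms),
        ("grpc.keepalive_timeout_ms", keepalive_timeout_ms),
        # Close connections that have had no active RPCs for this long.
        ("grpc.max_connection_idle_ms", idle_ms),
        # Close connections older than this, after a grace period for open RPCs.
        ("grpc.max_connection_age_ms", age_ms),
        ("grpc.max_connection_age_grace_ms", grace_ms),
    )


if __name__ == "__main__":
    for name, value in build_server_options():
        print(name, value)
```

The tuple would be used as `grpc.server(futures.ThreadPoolExecutor(max_workers=2), options=build_server_options())`. These settings act on connections, not on an individual handler that is still running, so the thread-based approach is still needed for that case.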
Is there a way to set the socket timeout when publishing?
I'm testing correct recovery from a lost connection with Pika by:
1. establishing a BlockingConnection
2. disconnecting from the network to force an error
3. reestablishing the connection and checking that the producer reconnects correctly and continues producing.
However, I don't seem to be able to set the socket timeout, and basic_publish hangs for far longer than 5 seconds: 60 or more.
credentials = pika.PlainCredentials(worker_config.username, worker_config.password)
connection = pika.BlockingConnection(pika.ConnectionParameters(
    host=worker_config.host,
    credentials=credentials,
    port=worker_config.port,
    connection_attempts=1,
    retry_delay=5,
    socket_timeout=5,
))
# No effect
#connection._impl.socket.settimeout(5)
channel = connection.channel()
while True:
    result = channel.basic_publish(
        exchange=EXCHANGE,
        routing_key=ROUTING_KEY,
        body=message,
        properties=pika.BasicProperties(
            delivery_mode=delivery_mode,  # MQ_TRANSIENT_DELIVERY_MODE, #1
        ))
    # Sometime after some successes, disconnect the network.
Pika ends up in this code (select_connection.py):
def poll(self, write_only=False):
    """Poll until the next timeout waiting for an event

    :param bool write_only: Only process write events

    """
    while True:
        try:
            events = self._poll.poll(self.get_next_deadline())
            break
        except _SELECT_ERROR as error:
            if _get_select_errno(error) == errno.EINTR:
                continue
            else:
                raise
... and indeed, get_next_deadline is returning 5.
_poll is a Python poll object which takes a timeout in seconds.
What's up with this?
There's a similar question, but it has no answers (not enough detail?).
I am calling 5 external servers to retrieve XML-based data for each request for a particular webpage on my IIS 6 server. Present volume is between 3-5 incoming requests per second, meaning 15-25 outgoing requests per second.
99% of the outgoing requests from my server (the client) to the external servers (the server) work OK but about 100-200 per day end up with a "The operation has timed out" exception.
This suggests I have a resource problem on my server: a shortage of sockets or ports, or a thread lock. The problem with this theory is that the failures are entirely random (there is never a run of consecutive requests that all fail), and two of the external servers account for the majority of the failures.
My question is how can I further diagnose these exceptions to determine if the problem is on my end (the client) or on the other end (the servers)?
The volume of requests precludes putting an analyzer on the wire - it would be very difficult to capture these few exceptions. I have reset CONNECTIONS and THREADS in my machine.config and the basic code looks like:
Dim hRequest As HttpWebRequest
Dim responseTime As String
Dim objWatch As New Stopwatch
Try
    ' calculate time it takes to process transaction
    objWatch.Start()
    hRequest = System.Net.WebRequest.Create(url)
    ' set some defaults
    hRequest.Timeout = 5000
    hRequest.ReadWriteTimeout = 10000
    hRequest.KeepAlive = False ' to prevent open HTTP connection leak
    hRequest.SendChunked = False
    hRequest.AllowAutoRedirect = True
    hRequest.MaximumAutomaticRedirections = 3
    hRequest.Accept = "text/xml"
    hRequest.Proxy = Nothing 'do not waste time searching for a proxy
    hRequest.ServicePoint.Expect100Continue = False
    Dim feed As New XDocument()
    ' use *Using* to auto close connections
    Using hResponse As HttpWebResponse = DirectCast(hRequest.GetResponse(), HttpWebResponse)
        Using reader As XmlReader = XmlReader.Create(hResponse.GetResponseStream())
            feed = XDocument.Load(reader)
            reader.Close()
        End Using
        hResponse.Close()
    End Using
    objWatch.Stop()
    ' Work here with returned contents in "feed" document
    Return XXX ' some results here
Catch ex As Exception
    objWatch.Stop()
    hRequest.Abort()
    Return Nothing
End Try
Any suggestions?
By default, HttpWebRequest limits you to 2 connections per HTTP/1.1 server. So, if your requests take time to complete and you have incoming requests queuing up on the server, you will run out of connections and thus get timeouts.
You should change the max outgoing connections on ServicePointManager.
ServicePointManager.DefaultConnectionLimit = 20 // or some big value.
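A rough way to see why 2 connections per server can be too few is Little's law: requests in flight ≈ arrival rate × average latency. The sketch below is in Python for brevity; `connections_needed` is a hypothetical helper and the rates and latencies are illustrative assumptions, not measurements from the question.

```python
import math


def connections_needed(requests_per_second, avg_latency_s):
    """Estimate concurrent connections to one host (Little's law, rounded up)."""
    return math.ceil(requests_per_second * avg_latency_s)


if __name__ == "__main__":
    # Assumed: 5 outgoing requests/s to one external server.
    for latency_s in (0.25, 0.5, 2.0):
        needed = connections_needed(5, latency_s)
        status = "fits in" if needed <= 2 else "exceeds"
        print(f"latency {latency_s}s -> {needed} connections ({status} the default of 2)")
```

Once the estimate exceeds the limit, the extra requests wait for a free connection, and that waiting time counts against the request's Timeout.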
You said that you are making 5 outgoing requests for each incoming request to the ASP page. Are those to 5 different servers, or to the same server?
Do you wait for the previous request to complete before issuing the next one? Is the timeout happening while it is waiting for a connection, or during the request/response?
If the timeout is happening during the request/response then it means that the target server is under stress. The only way to find out if this is the case, is to run wireshark/netmon on one of the machines, and look at the network trace to see if the request from the app is even making it through to the server, and if it is, whether the target server is responding within the given timeout.
If this is a thread starvation issue, then one way to diagnose it is to attach the windbg.exe debugger to the w3wp.exe process when you start getting timeouts. Then load the sos.dll debugging extension and run the !threads command, followed by the !threadpool command. It will show you how many worker threads and completion port threads are utilized/remaining. If the number of available completion port threads or worker threads is low, that will contribute to the timeouts.
Alternatively, you can monitor ASP.NET and System.net perf counters. See if the ASP.NET request queue is increasing monotonically - this might indicate that your outgoing requests are not completing fast enough.
Sorry, there are no easy answers here. There are a lot of avenues you will need to explore. If I were you, I would start off by attaching windbg.exe to w3wp when you start getting timeouts and do what I described earlier.
I have a WCF service used mainly for managing documents in a repository.
I used the chunking channel sample from MS so that I could upload/download huge files.
Now I implemented reliable session with the service and I am seeing some strange behaviors.
Here are the timeout values I am using.
this.SendTimeout = new TimeSpan(0,10,0);
this.OpenTimeout = new TimeSpan(0, 1, 0);
this.CloseTimeout = new TimeSpan(0, 1, 0);
this.ReceiveTimeout = new TimeSpan(0,10, 0);
reliableBe.InactivityTimeout = new TimeSpan(0,2,0);
I have the following issues:
1. If the Service is not up and running, the clients do not get disconnected after OpenTimeout.
I tried it with my test client.
Scenario 1: Without Reliable Session:
I get the following exception:
Could not connect to net.tcp://localhost:8788/MediaManagementService/ep1. The connection attempt lasted for a time span of 00:00:00.9848790. TCP error code 10061: No connection could be made because the target machine actively refused it 127.0.0.1:8788
This is the correct behavior as I have given the OpenTimeout as 1 sec.
Scenario 2: With ReliableSession:
I get the same exception:
Could not connect to net.tcp://localhost:8788/MediaManagementService/ep1. The connection attempt lasted for a time span of 00:00:00.9692460. TCP error code 10061: No connection could be made because the target machine actively refused it 127.0.0.1:8788.
But this message comes after around 10 minutes (I believe after SendTimeout).
So here I just have enabled the reliable session and now it looks like the OpenTimeout = SendTimeout for the client.
Is this desired behavior?
2. Issue while uploading huge files with ReliableSession:
The general rule is that you have to set a huge value for the maxReceivedMessageSize, SendTimeout and ReceiveTimeout.
But in the case of Chunking channel, the max received message size doesn't matter as the data is sent in chunks.
So I set a huge value for Send and ReceiveTimeout : say 10 hours.
Now the upload is going fine, but it has a side effect that, even if the Service is not up, it takes 10 hours to timeout the client connection due to the behavior mentioned in (1).
Please let me know your thoughts on this behavior.