When maxConcurrentCalls reaches 100%, WCF stops responding to incoming requests over existing connections in production. However, in the same situation in our test environment, requests over existing connections are still processed fine; only requests over new connections time out, as expected. The binding is net.tcp. What can stop request processing over existing connections in production?
After a week of investigation the cause was found. A client application issued many requests (using all available workers, so maxConcurrentCalls reached 100%) and then hung before reading the replies. Because our service has a long send timeout (1 hour), all workers were stuck waiting to send their replies for the whole timeout. New requests could not be processed because every worker was busy.
All in all, a long send timeout is evil. If it were short (the default is 1 minute), the hung requests would have been aborted much earlier and other requests could have been processed normally.
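For reference, the timeout in question is set on the binding in the service config; a rough sketch for net.tcp (the binding name is made up, and the value shown is the 1-minute default rather than our 1-hour one):

<bindings>
  <netTcpBinding>
    <!-- sendTimeout defaults to 1 minute; a 1-hour value lets a hung client hold a worker for the full hour -->
    <binding name="shortSendTimeout" sendTimeout="00:01:00" />
  </netTcpBinding>
</bindings>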
Related
I have a WCF SOAP service that receives too many synchronous requests from other systems
I am having a problem: when too many requests come in at the same time, the IIS queue does not handle them properly, server memory and CPU usage go very high, and requests get discarded or time out.
I did a normal load test (100 requests with 100 concurrent users) and IIS started to discard requests once the maximum queue length was reached. Responses to all incoming requests were delayed, and other requests coming in were held up until the first one either timed out or responded.
Below is the server configuration.
The WCF application code has been checked with the ReSharper tool and there are no object or memory disposal issues.
Are there any application pool or worker process settings to manage the queue? (A config sketch follows below.)
Can I apply a web garden to the application?
Please help me solve this issue.
Thanks in Advance
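For what it's worth, the request queue length and the web-garden worker-process count mentioned above are both application pool settings; a minimal applicationHost.config sketch (the pool name and values are hypothetical, not recommendations):

<applicationPools>
  <add name="MyAppPool" queueLength="5000">
    <!-- maxProcesses greater than 1 turns the pool into a web garden -->
    <processModel maxProcesses="4" />
  </add>
</applicationPools>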
I have an NServiceBus host whose job is to send an HTTP request to our customers for each new incoming MSMQ message.
Recently one customer began to "return" HTTP timeouts, which caused:
1) The input queue exploded with new messages
2) Starvation of all the other customers )-:
My solution is to split the host and install a separate host for each customer.
Any other ideas?
You could specify a timeout that is acceptable to you (so the queue doesn't get "exploded from new messages"), then catch the timeout and defer the message for a while, on the assumption that the customer will respond more quickly later.
And to avoid starvation while waiting for responses, you could configure the number of threads your worker uses so it processes more than one message at a time.
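As a sketch of the second point: in the older MSMQ-transport versions of NServiceBus the thread count is set in the endpoint's app.config; element and attribute names vary by version, so treat this as illustrative (the queue names are made up):

<MsmqTransportConfig
    InputQueue="CustomerNotifications"
    ErrorQueue="error"
    NumberOfWorkerThreads="4"
    MaxRetries="5" />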
We have an HA RabbitMQ cluster (v3.2.x) with two nodes that sits behind a load-balancer. Our clients are configured to use a 300s heartbeat. Everything works as expected most of the time.
However, if the client's connection drops (say the client's NIC is disconnected), we have noticed (via tcpdump/Wireshark) that the RabbitMQ node will send 3 heartbeat messages (in our case taking nearly 15 minutes) before it closes the connection. Why? Why not close it after one failure?
Is there some means to change this behavior on the RabbitMQ server? Or do we have to shorten our heartbeat to something much smaller, like 5s or 10s, to get the connection to close sooner? Thoughts?
Related issue...
Looking at the tcpdump (captured on the load-balancer), I wonder why the LB doesn't close the connection when it doesn't receive a TCP ACK from the dead client in response to the proxied RabbitMQ heartbeat. In fact, the LB attempts to send the heartbeat several times (never receiving a response, of course). Wouldn't it make sense for the LB to assume the connection has been dropped and close the entire session (including the connection to the RabbitMQ node)?
It appears as though RabbitMQ is configured to tolerate two missed heartbeats before it terminates the connection. However, it waits until the next heartbeat would need to be sent before it drops the connection, which is what gives it the appearance of requiring 3 missed heartbeats.
Heartbeat 1 (no response) → wait → Heartbeat 2 (no response) → wait → Heartbeat 3 → terminate
There is a slight bug in RabbitMQ (it sends a third heartbeat but then immediately terminates the connection), but it isn't really affecting anything.
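If you do decide to shorten the heartbeat, note that it is requested from the client side; a minimal sketch with the RabbitMQ Java client (the host name and 10-second interval are only examples, and other client libraries expose an equivalent setting):

import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class ShortHeartbeatExample {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("rabbitmq-lb.example.com"); // hypothetical load-balancer address
        factory.setRequestedHeartbeat(10);          // heartbeat interval in seconds
        Connection connection = factory.newConnection();
        // ... create channels, publish/consume as usual ...
        connection.close();
    }
}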
We observed the following behavior on one of the servers hosting a WCF service on IIS 6.0:
The IIS log shows a high value for time-taken (> 100000 ms)
The HTTP status code is 200
sc-win32-status code shows a value of 64
I found out that an sc-win32-status code of 64 indicates "The specified network name is no longer available"
Initially I suspected that it could be because of limits set on MinFileBytesPerSecond, which sets the minimum throughput rate that HTTP.sys enforces when sending data from the client to the server, and back from the server to the client.
But the values of sc-bytes and cs-bytes indicate that the amount of data sent is within the range generally observed for the service.
Also note that the WCF service is hosted on four boxes and is load-balanced, but the problem occurs on only one of the servers at a time (and not necessarily the same server each time). The problem is also intermittent.
Has anybody else encountered this error? Any clues about what could be wrong?
Update
Note: Observation on IIS 7.5 (IIS version does not really matter)
I was able to replicate the issue. The issue occurs if:
1. The WCF service takes a long time to respond
2. The client proxy times out before it receives a response from the server. In this case it leads to a TimeoutException on the client.
3. The server keeps waiting for a TCP ACK from the client, which it will never receive.
Hence the long delay (the TCP socket timeout, with a default value of 4 minutes) and the sc-win32-status of 64.
So essentially it appears that the WCF code is taking a long time to respond and the client is timing out; what I observe in the IIS log is just a symptom, not the problem.
The behavior you are describing will also occur if you exceed a WCF service's max sessions, calls, or instances (depending on how your service's InstanceContextMode is configured). If you observe the System.ServiceModel performance counters for % max concurrent sessions and/or % max concurrent calls (again depending on your service's instance context), you may see a correlation with the IIS log entries.
Note that these maxes can be configured in the service throttling behavior.
https://msdn.microsoft.com/en-us/library/vstudio/system.servicemodel.description.servicethrottlingbehavior(v=vs.100).aspx
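A rough sketch of that behavior in web.config (the numbers are placeholders, and the defaults differ between WCF 3.x and 4.x):

<behaviors>
  <serviceBehaviors>
    <behavior>
      <serviceThrottling
          maxConcurrentCalls="200"
          maxConcurrentSessions="200"
          maxConcurrentInstances="200" />
    </behavior>
  </serviceBehaviors>
</behaviors>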
I saw your question again and wanted to point out that I found a solution for this. It turned out to be this piece of code in the web.config:
<pages smartNavigation="true">
After turning this off I stopped receiving the same time-out errors. See also the answer here
IIS puts the service to sleep to save resources.
Copied from here (WCF REST Service goes to sleep after inactivity)
The application pool hosting your service defines an Idle Time-out property (advanced settings of the app pool in the IIS management console) which defaults to 20 minutes. If no request is received by the app pool within the idle timeout, the worker process serving the pool is terminated. After receiving a new request, IIS must start the process again; the process must load the application domain and all related assemblies, compile the .svc file, run the service host, and then process the request. The solution can be increasing the idle time-out, but the purpose of this time-out is correct handling of server resources: if the process is not needed, it should be stopped. Another, uglier workaround is to use some ping process (for example a cron job or scheduled task on the server) which regularly calls some method on the service or requests a page in the same application.
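For reference, the same idle timeout can be changed outside the management console in applicationHost.config (or via appcmd); a minimal sketch, where the pool name is hypothetical and 00:00:00 disables the idle timeout entirely:

<applicationPools>
  <add name="MyAppPool">
    <processModel idleTimeout="00:00:00" />
  </add>
</applicationPools>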
Can I configure Glassfish to drop any request that takes longer than 10 seconds to process?
Example:
I'm using Glassfish to host my web service. The thread pool is configured to have max 5 connections.
My service has a method that does this:
System.out.println("New request"); // log that a request arrived
Thread.sleep(1000 * 1000);         // simulate very slow processing (about 17 minutes)
I send 5 requests to the service and I see 5 "New request" messages in the log. Then the server stops responding for a looong time.
In the live environment all requests must be processed in less than a second. If one takes longer, there is a problem with that request, and I want Glassfish to drop such requests but stay alive and keep serving other requests.
Currently I'm using a workaround in the code: at the beginning of my web method I launch a separate thread for request processing with a timeout, as suggested here: How to timeout a thread
I do not like this solution and still believe there must be a configuration setting in Glassfish to apply this logic to all requests, not just one method.
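For completeness, the per-method workaround is roughly the following (a sketch of the approach from the linked answer, not the exact code; the class and method names are made up):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutWorkaround {
    private static final ExecutorService executor = Executors.newCachedThreadPool();

    // Called at the start of the web method: run the real work on another thread
    // and give up on it after 10 seconds.
    public static String handleRequest() throws Exception {
        Future<String> future = executor.submit(() -> doRealWork());
        try {
            return future.get(10, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; the slow request is abandoned
            throw new RuntimeException("Request took longer than 10 seconds", e);
        }
    }

    private static String doRealWork() throws InterruptedException {
        Thread.sleep(1000 * 1000); // the slow processing from the example above
        return "done";
    }
}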