TCP Connection issue Connection refused - wcf

Even though I have set very large values for ListenBacklog (10000) and MaxConnections (10000), I still get "tcp error code 10061 target machine actively refused". It does not happen every time, but when load increases the error starts appearing and one service is unable to communicate with the other.
The other values are:
ListenBacklog = 10000;
MaxBufferPoolSize = 500000;
MaxBufferSize = 2000000000;
MaxConnections = 10000;
MaxReceivedMessageSize = 2000000000;
We already have MaxConcurrentCalls, MaxConcurrentSessions and MaxConcurrentInstances set to 10000.
Under normal conditions it works fine, but when load increases the services are unable to communicate with each other.
If we observe the performance counters, Calls Outstanding stays between 150 and 250. The services are hosted as Windows services.
Any suggestions or thoughts?

Related

RabbitMQ slow when opening a new connection

I have a rather busy RabbitMQ setup (RabbitMQ 3.9.14) which becomes extremely slow at accepting new connections at peak times.
I've tried fine-tuning /etc/sysctl.conf, following a guide on the RabbitMQ website:
fs.file-max = 10000000
fs.nr_open = 10000000
fs.inotify.max_user_watches=524288
net.core.somaxconn = 4096
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_keepalive_time=30
net.ipv4.tcp_keepalive_intvl=10
net.ipv4.tcp_keepalive_probes=4
net.ipv4.ip_local_port_range = 10000 64000
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
net.ipv6.conf.lo.disable_ipv6=1
net.netfilter.nf_conntrack_max=1048576
I also played around with the rabbitmq.conf options to see if anything would have an impact, but unfortunately nothing did:
num_acceptors.tcp = 32
channel_max = 4096
tcp_listen_options.backlog = 512
tcp_listen_options.nodelay = true
tcp_listen_options.linger.on = true
tcp_listen_options.linger.timeout = 0
tcp_listen_options.sndbuf = 196608
tcp_listen_options.recbuf = 196608
collect_statistics_interval = 60000
Due to the nature of my setup (PHP), a new connection is created every time messages are published to RabbitMQ. I wish I could use long-standing connections, but that is beyond what PHP is designed for.
During peak activity, some connections take up to 7 seconds to open. Once the connection is established, however, message publishing performance is just fine.
I feel like I've exhausted all the logical options that I'm aware of. Are there any other tweaks I can attempt in order to improve the connection performance of the node? The server load is low-ish, sitting at 15% at peak. Disabling the management interface had negligible impact.
Update: At first, when I updated to RabbitMQ 3.10.5, I thought the issue was solved. That was not the case; the upgrade just gave us a bit more headroom.
The real cause was our high connection churn rate (200/s+). During a conversation in the RabbitMQ Slack channel it became apparent that a high churn rate would block the event loop and cause the slow-connection spikes described above.
The solution for us was to use a proxy to re-use connections instead of opening a new one every time we publish something:
https://github.com/cloudamqp/amqproxy
This has effectively resolved our issue.

WCF cannot reach 100% CPU. What is the bottleneck?

I am benchmarking a self-hosted net.tcp WCF service, making requests from 50 threads to a service located on the same computer. The problem is that CPU utilization never exceeds 35% on a Xeon E3-1270. When I run the same test on a two-core laptop it does reach 100%.
The WCF method does nothing, so it should not be limited by IO. I tried increasing the number of threads, but that does not help. Each thread creates a service channel and performs thousands of calls, reusing that channel instance.
Here is the service class I am using:
[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single, ConcurrencyMode = ConcurrencyMode.Multiple)]
public class TestService : ITestService
{
    public void Void()
    {
        // DO NOTHING
    }
}
Configs:
ServiceThrottlingBehavior:
    MaxConcurrentCalls = 1000
    MaxConcurrentInstances = 1000
    MaxConcurrentSessions = 1000
NetTcpBinding:
    ListenBacklog = 2000
    MaxConnections = 2000
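The client side of a benchmark like the one described above could be sketched roughly as follows. This is only an illustration under assumptions: `LoadHarness`, `Run` and the `createCaller` delegate are hypothetical names, not taken from the original test. In the real benchmark, `createCaller` would open one channel per thread (for example through `ChannelFactory<ITestService>`) and return a delegate that invokes `Void()` on it.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class LoadHarness
{
    // Starts threadCount workers. Each worker obtains its own caller once
    // (standing in for "create a channel") and then invokes it
    // callsPerThread times. Returns the total call count and elapsed time.
    public static (long Calls, TimeSpan Elapsed) Run(
        int threadCount, int callsPerThread, Func<Action> createCaller)
    {
        long completed = 0;
        var start = new ManualResetEventSlim(false);
        var threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                Action call = createCaller(); // one "channel" per thread
                start.Wait();                 // release all workers together
                long local = 0;
                for (int n = 0; n < callsPerThread; n++)
                {
                    call();
                    local++;
                }
                // Publish the per-thread count once to avoid contention
                // inside the measured loop.
                Interlocked.Add(ref completed, local);
            });
            threads[i].Start();
        }

        var watch = Stopwatch.StartNew();
        start.Set();
        foreach (Thread t in threads)
        {
            t.Join();
        }
        watch.Stop();

        return (Interlocked.Read(ref completed), watch.Elapsed);
    }
}
```

Dividing `Calls` by `Elapsed.TotalSeconds` gives calls per second, which can be compared across thread counts to see whether throughput stops scaling before the CPU is saturated.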
I would try changing your InstanceContextMode to PerCall. I'm pretty sure your current throttling settings will be ignored, because WCF only ever creates a single instance of your class and will process calls in order. With PerCall, a new instance is created for each request until the maximum number of threads or your configured limit has been reached. You shouldn't need the NetTcpBinding settings either. Keep your throttling behaviour, but make sure you get the proportions right, otherwise it might have adverse effects:
MaxConcurrentCalls: 16 * Processor Count
MaxConcurrentSessions: 100 * Processor Count
MaxConcurrentInstances: the sum of the two (116 * Processor Count)
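As a rough illustration of those proportions, the values can be derived from the processor count in code. This is a sketch only; `ThrottleDefaults` and `For` are hypothetical helper names, and the resulting numbers would still have to be assigned to the `MaxConcurrentCalls`, `MaxConcurrentSessions` and `MaxConcurrentInstances` properties of a `ServiceThrottlingBehavior`.

```csharp
using System;

public static class ThrottleDefaults
{
    // Applies the 16x / 100x / sum proportions to a given processor count.
    public static (int MaxConcurrentCalls, int MaxConcurrentSessions, int MaxConcurrentInstances)
        For(int processorCount)
    {
        if (processorCount < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(processorCount));
        }

        int calls = 16 * processorCount;
        int sessions = 100 * processorCount;
        int instances = calls + sessions; // 116 * processorCount

        return (calls, sessions, instances);
    }
}
```

For a machine that reports 8 logical processors, `ThrottleDefaults.For(8)` yields 128 calls, 800 sessions and 928 instances. In a host, `Environment.ProcessorCount` would be the natural argument.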

Measuring "total bytes sent" from web service with nettcpbinding using perfmon

I have a web service (WCF) exposing both http endpoints and a tcp endpoint (using the nettcpbinding). I am trying to measure the difference in "total bytes sent" using the different endpoints.
I have tried using perfmon and looked at the performance counter Web Service > Total Bytes Sent. However, it looks like this only measures HTTP traffic. Can any of you confirm this? TCP traffic does not seem to increment the number.
There is also a TCP category in perfmon, but it has no "total bytes sent" counter. Is perfmon the wrong tool for the job?
Solved. I measured bytes received on the client using code similar to this:
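using System;
using System.Net.NetworkInformation;

NetworkInterface[] interfaces = NetworkInterface.GetAllNetworkInterfaces();
NetworkInterface lan = null;
foreach (NetworkInterface networkInterface in interfaces)
{
    // Select the adapter by its display name; adjust the name for your machine.
    if (networkInterface.Name.Equals("Local Area Connection"))
    {
        lan = networkInterface;
        break;
    }
}
if (lan == null)
{
    throw new InvalidOperationException("Network interface not found.");
}
IPv4InterfaceStatistics stats = lan.GetIPv4Statistics();
Console.WriteLine("bytes received: " + stats.BytesReceived);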
Do this before and after the web service call and take the difference of the two values. Obviously you need to make sure that no other traffic on the client interferes with the measurement.
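The before/after comparison can be wrapped in a small helper. This is a sketch under assumptions: `CounterDelta` and `Measure` are hypothetical names, and the counter is passed in as a delegate so that the same helper works for `BytesReceived`, `BytesSent`, or any other monotonically increasing counter.

```csharp
using System;

public static class CounterDelta
{
    // Reads the counter, runs the action, reads the counter again,
    // and returns how much the counter grew in between.
    public static long Measure(Func<long> readCounter, Action action)
    {
        long before = readCounter();
        action();
        long after = readCounter();
        return after - before;
    }
}
```

With the `lan` interface selected as above, a call might look like `CounterDelta.Measure(() => lan.GetIPv4Statistics().BytesReceived, () => proxy.DoCall())`, where `proxy.DoCall()` stands in for whichever service call is being measured.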

How to diagnose "the operation has timed out" HttpException

I am calling 5 external servers to retrieve XML-based data for each request for a particular webpage on my IIS 6 server. Present volume is between 3-5 incoming requests per second, meaning 15-25 outgoing requests per second.
99% of the outgoing requests from my server (the client) to the external servers (the server) work OK, but about 100-200 per day end up with a "The operation has timed out" exception.
This suggests I have a resource problem on my server: some shortage of sockets, ports, etc., or a thread lock. The problem with this theory is that the failures are entirely random. There is never a run of consecutive requests that all fail, and two of the external servers account for the majority of the failures.
My question is: how can I further diagnose these exceptions to determine whether the problem is on my end (the client) or on the other end (the servers)?
The volume of requests precludes putting an analyzer on the wire, since it would be very difficult to capture these few exceptions. I have reset CONNECTIONS and THREADS in my machine.config, and the basic code looks like this:
Dim hRequest As HttpWebRequest = Nothing
Dim responseTime As String
Dim objWatch As New Stopwatch
Try
    ' calculate time it takes to process transaction
    objWatch.Start()
    hRequest = DirectCast(System.Net.WebRequest.Create(url), HttpWebRequest)
    ' set some defaults
    hRequest.Timeout = 5000
    hRequest.ReadWriteTimeout = 10000
    hRequest.KeepAlive = False ' to prevent open HTTP connection leak
    hRequest.SendChunked = False
    hRequest.AllowAutoRedirect = True
    hRequest.MaximumAutomaticRedirections = 3
    hRequest.Accept = "text/xml"
    hRequest.Proxy = Nothing ' do not waste time searching for a proxy
    hRequest.ServicePoint.Expect100Continue = False
    Dim feed As New XDocument()
    ' use *Using* to auto close connections; explicit Close() calls are not needed
    Using hResponse As HttpWebResponse = DirectCast(hRequest.GetResponse(), HttpWebResponse)
        Using reader As XmlReader = XmlReader.Create(hResponse.GetResponseStream())
            feed = XDocument.Load(reader)
        End Using
    End Using
    objWatch.Stop()
    ' Work here with returned contents in "feed" document
    Return XXX ' some results here
Catch ex As Exception
    objWatch.Stop()
    ' hRequest is still Nothing if WebRequest.Create itself failed
    If hRequest IsNot Nothing Then hRequest.Abort()
    Return Nothing
End Try
Any suggestions?
By default, HttpWebRequest limits you to 2 connections per HTTP/1.1 server. So, if your requests take time to complete and you have incoming requests queuing up on the server, you will run out of connections and thus get timeouts.
You should change the maximum number of outgoing connections on ServicePointManager:
ServicePointManager.DefaultConnectionLimit = 20; // or some larger value
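A slightly fuller sketch of where that setting might live is shown below. The class and method names (`OutboundHttpSetup`, `Configure`, `Describe`) are hypothetical. The limit should be set once at application startup, before the first request goes out, because it only applies to `ServicePoint` objects created afterwards. `Describe` is only a diagnostic aid for logging how many connections are currently open to a given server.

```csharp
using System;
using System.Net;

public static class OutboundHttpSetup
{
    // Call once at startup (for example from Application_Start),
    // before any outgoing request is created.
    public static void Configure(int connectionLimit)
    {
        ServicePointManager.DefaultConnectionLimit = connectionLimit;
    }

    // Returns the per-server limit and the number of currently open
    // connections, which can be logged when a timeout occurs.
    public static string Describe(Uri server)
    {
        ServicePoint servicePoint = ServicePointManager.FindServicePoint(server);
        return server.Host
            + ": limit=" + servicePoint.ConnectionLimit
            + ", open=" + servicePoint.CurrentConnections;
    }
}
```

If the logged `open` value is at or near `limit` when timeouts occur, that would point towards requests waiting for a free connection on the client rather than a slow remote server.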
You said that you are doing 5 outgoing requests for each incoming request to the ASP page. Are those to 5 different servers, or to the same server?
Do you wait for the previous request to complete before issuing the next one? Is the timeout happening while it is waiting for a connection, or during the request/response?
If the timeout is happening during the request/response, it means that the target server is under stress. The only way to find out whether this is the case is to run Wireshark/Netmon on one of the machines and look at the network trace to see whether the request from the app is even making it through to the server and, if it is, whether the target server is responding within the given timeout.
If this is a thread starvation issue, one way to diagnose it is to attach the windbg.exe debugger to the w3wp.exe process when you start getting timeouts. Then load the sos.dll debugging extension and run the !threads command, followed by the !threadpool command. It will show you how many worker threads and completion port threads are utilized and remaining. If the number of available completion port threads or worker threads is low, that will contribute to the timeouts.
Alternatively, you can monitor the ASP.NET and System.Net perf counters. See if the ASP.NET request queue is increasing monotonically, which might indicate that your outgoing requests are not completing fast enough.
Sorry, there are no easy answers here. There are a lot of avenues you will need to explore. If I were you, I would start off by attaching windbg.exe to w3wp.exe when you start getting timeouts and do what I described earlier.

WCF ReliableSession and Timeouts

I have a WCF service used mainly for managing documents in a repository.
I used the chunking channel sample from MS so that I could upload/download huge files.
Now I implemented reliable session with the service and I am seeing some strange behaviors.
Here are the timeout values I am using.
this.SendTimeout = new TimeSpan(0, 10, 0);
this.OpenTimeout = new TimeSpan(0, 1, 0);
this.CloseTimeout = new TimeSpan(0, 1, 0);
this.ReceiveTimeout = new TimeSpan(0, 10, 0);
reliableBe.InactivityTimeout = new TimeSpan(0, 2, 0);
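For reference, the three-argument `TimeSpan` constructor takes hours, minutes and seconds, in that order. The sketch below restates the values above in that light. The `TimeoutValues` class is hypothetical and only mirrors the numbers from the question.

```csharp
using System;

public static class TimeoutValues
{
    // new TimeSpan(hours, minutes, seconds)
    public static readonly TimeSpan Send = new TimeSpan(0, 10, 0);       // 10 minutes
    public static readonly TimeSpan Open = new TimeSpan(0, 1, 0);        // 1 minute
    public static readonly TimeSpan Close = new TimeSpan(0, 1, 0);       // 1 minute
    public static readonly TimeSpan Receive = new TimeSpan(0, 10, 0);    // 10 minutes
    public static readonly TimeSpan Inactivity = new TimeSpan(0, 2, 0);  // 2 minutes
}
```

Written with the factory methods, the same values are `TimeSpan.FromMinutes(10)`, `TimeSpan.FromMinutes(1)` and `TimeSpan.FromMinutes(2)`, which makes the unit explicit at the call site.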
I have the following issues:
1. If the service is not up and running, the clients do not get disconnected after OpenTimeout.
I tried it with my test client.
Scenario 1: Without Reliable Session:
I get the following exception:
Could not connect to net.tcp://localhost:8788/MediaManagementService/ep1. The connection attempt lasted for a time span of 00:00:00.9848790. TCP error code 10061: No connection could be made because the target machine actively refused it 127.0.0.1:8788
This is the behavior I expect: the failure is reported after about one second, well within the OpenTimeout.
Scenario 2: With ReliableSession:
I get the same exception:
Could not connect to net.tcp://localhost:8788/MediaManagementService/ep1. The connection attempt lasted for a time span of 00:00:00.9692460. TCP error code 10061: No connection could be made because the target machine actively refused it 127.0.0.1:8788.
But this message only appears after around 10 minutes (I believe after SendTimeout).
So simply enabling the reliable session makes it look as if OpenTimeout = SendTimeout for the client.
Is this the desired behavior?
2. Issue while uploading huge files with ReliableSession:
The general rule is that you have to set large values for maxReceivedMessageSize, SendTimeout and ReceiveTimeout.
But with the chunking channel, the maximum received message size doesn't matter because the data is sent in chunks.
So I set a huge value for SendTimeout and ReceiveTimeout, say 10 hours.
Now the upload works fine, but as a side effect, even if the service is not running, the client connection takes 10 hours to time out because of the behavior described in (1).
Please let me know your thoughts on this behavior.