WCF App receiving multiple requests per second causing other asp.net apps to stop responding and deadlock - wcf

We have a WCF Service using a wsHttpBinding. When it receives many requests in a short period of time (25 per second for a few minutes) it stops working and causes our other ASP.NET applications and pages to stop responding as well. Some of them time out and eventually we see the following in the event viewer:
ISAPI 'c:\windows\microsoft.net\framework\v2.0.50727\aspnet_isapi.dll' reported itself as unhealthy for the following reason: 'Deadlock detected'.
Often we get calls about the problem first and restart IIS to solve the problem.
How can we configure our WCF service to handle this many requests, or at least configure it so that it does not take down our other applications when it can't handle the load? Our classic ASP applications run without issues during this time; only our .NET apps are affected.

Are you running all your ASP/WCF sites in the same AppPool? If so, I'd suggest creating a new one and running only the WCF service in it. That in itself might be enough to solve the problem from a practical perspective.

Also can you target a more recent version of the framework with your WCF app? (and leave the other apps the same) It will isolate it much better.

Related

Maximum concurrent requests in signalr self hosted in kestrel

I've encountered a strange problem with an application I've developed. The application is a windows service hosting AspNetCore 2.0 running on Kestrel. This application receives requests through an IIS site acting as a proxy.
In this application, I also use SignalR 2.2.2 integrated using Microsoft.AspNetCore.Owin. All worked well until I detected that the application was not responding to requests.
Other applications on the same machine and using the same IIS server as proxy were working fine. Restarting the application pool serving the site solved the problem temporarily.
The problem resurfaced, and digging through monitoring information, the application seems to hang when there are 400 SignalR SSE connections on the same machine. This seems plausible, as I've found that by default OWIN limits the number of concurrent requests to 100 * number of CPUs. (Note that a site on the same machine is serving 5000 requests per minute without breaking a sweat, but those are not long-lived requests like the SignalR ones.)
The problem is that I seem unable to find the same option when hosting Owin inside AspNetCore. Does someone know if this can be the solution and what is the correct setting?
EDIT: I'm fairly certain that the issue is caused by the number of SignalR connections opened concurrently because by disabling it in Javascript the problem vanished.
2nd EDIT: SignalR does not seem to be the culprit, as load testing the site with crank both in test and in production worked up to 5000 concurrent connections, which is the default IIS limit and is fine by me.
After some trial and error I've been able to identify and correct the problem but it was no easy task so I'm leaving this answer behind if someone else stumbles upon the same problem.
Disabling SignalR did not solve the problem but it made it appear less often.
Thanks to the monitoring in place on the server and IIS I observed that the problem appeared when the number of connections to the site started growing rapidly. This system primarily makes requests to other services, so it has neither a database nor expensive computations.
Examining the code I've found that there were three problems:
A new HttpClient was created for every request, which can exhaust the available sockets because connections are not reused between requests (blog, blog2, blog3).
By default there is a maximum number of concurrent connections from HttpClient to a single domain, and this limit is set by default to 2 (!!!) (blog4).
The code was waiting synchronously on every web request to another system (this program was ported from an MVC4 site which never displayed this problem). This worked fine in MVC, but ASP.NET Core is very sensitive to it: because the thread pool starts with as many threads as there are cores and only grows slowly, blocked threads are exhausted quickly and all other requests have to wait. The starting number of threads can be raised as a temporary stopgap with ThreadPool.SetMinThreads(Int32, Int32), but the only real solution is to convert all calls to async calls.
Once all calls were made async the problem never returned. Basically the problem was due to thread pool starvation and ASP.NET Core's sensitivity to it compared with MVC. Here you can find a nice explanation and a detection method using PerfView.
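The three fixes described above can be sketched roughly as follows. This is a minimal illustration, not the original code: the class name UpstreamClient, the method name FetchAsync, the connection limit of 50 and the 30 second timeout are all assumptions made for the example. It also assumes the application runs on the full .NET Framework, where the per-host default of 2 comes from ServicePointManager.DefaultConnectionLimit.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: one shared HttpClient, a raised per-host
// connection limit, and no synchronous waiting on the response.
public static class UpstreamClient
{
    private static readonly HttpClient Client;

    static UpstreamClient()
    {
        // Assumption: .NET Framework, where this defaults to 2 per host
        // outside of ASP.NET-hosted (System.Web) applications.
        // 50 is an illustrative value, not a recommendation.
        ServicePointManager.DefaultConnectionLimit = 50;

        // A single instance for the whole process, so connections are
        // pooled and reused instead of leaving sockets in TIME_WAIT.
        Client = new HttpClient { Timeout = TimeSpan.FromSeconds(30) };
    }

    // Async all the way: the calling thread goes back to the pool
    // while the request is in flight.
    public static async Task<string> FetchAsync(string url)
    {
        using (HttpResponseMessage response =
            await Client.GetAsync(url).ConfigureAwait(false))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync().ConfigureAwait(false);
        }
    }
}
```

A controller action would then be declared as async and call `await UpstreamClient.FetchAsync(url)` rather than reading `.Result` or calling `.Wait()`, since blocking on the task is what ties up the thread pool threads.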
This could be the issue, but it's unlikely. When hosting in .NET Core you're probably using Kestrel as the web server implementation. To change limits such as the number of concurrent connections you can use the KestrelServerLimits class, as described in this Microsoft article.
KestrelServerLimits should not be causing you any problems, since the default value for MaxConcurrentConnections is unlimited.
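For reference, a minimal sketch of where those limits are set in an ASP.NET Core 2.0 host. The values of 1000 and the trivial Startup class are assumptions for illustration only; leaving the properties at null keeps the default of no limit.

```csharp
using Microsoft.AspNetCore;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Http;

public class Program
{
    public static void Main(string[] args)
    {
        WebHost.CreateDefaultBuilder(args)
            .UseKestrel(options =>
            {
                // null (the default) means unlimited; 1000 is illustrative.
                options.Limits.MaxConcurrentConnections = 1000;
                // Separate limit for upgraded connections such as WebSockets.
                options.Limits.MaxConcurrentUpgradedConnections = 1000;
            })
            .UseStartup<Startup>()
            .Build()
            .Run();
    }
}

// Hypothetical placeholder Startup so the sketch is self-contained.
public class Startup
{
    public void Configure(IApplicationBuilder app)
    {
        app.Run(context => context.Response.WriteAsync("ok"));
    }
}
```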

IIS Application Recycle drops static classes

I'm using Simple Injector in my WCF service. While running it from VS2010 everything is fine. However, when I publish it to my server using IIS 7, after some time (20 min, counted) my WCF loses all registered assemblies, modules, classes in container.
I guess IIS recycles the WCF Service Application Pool and drops my container registrations.
Can anyone help me on this?
While there are many legitimate cases for self-hosting WCF services, switching to self-hosting just because of IIS recycling may be counterproductive.
Hosting in IIS gives you a lot of benefits during development and daily operations, and I am not going to repeat benefits that you can easily find with a Google search.
When IIS receives the first request to your application, it launches a worker process named "w3wp.exe" according to the settings of the application pool associated with your web app. By default IIS shuts the worker process down after 20 minutes of idle time. Check the Advanced Settings of the application pool and you will find a lot of settings for the life cycle. You won't get such flexibility and robustness through self-hosting out of the box.
So basically you have a few options, provided you decide to stay with IIS hosting.
Change the Idle Time-out to 24 hours or even a month.
Write a small program or use curl to ping your application every 10 minutes.
Leave it as it is.
If you want to keep state between runs, save it to disk, then load it during the next launch triggered by a request.
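The "small program" option for pinging the application could look roughly like the console sketch below. It assumes C# 7.1 or later for the async Main method, and the default URL is a hypothetical placeholder to be replaced with the address of your own .svc page.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical keep-alive pinger: requests the service page every
// 10 minutes so the application pool never reaches its idle time-out.
public static class KeepAlive
{
    public static async Task Main(string[] args)
    {
        // Placeholder URL (assumption); pass the real one as the first argument.
        string url = args.Length > 0 ? args[0] : "http://localhost/MyService.svc";

        using (var client = new HttpClient())
        {
            while (true)
            {
                try
                {
                    using (HttpResponseMessage response = await client.GetAsync(url))
                    {
                        Console.WriteLine($"{DateTime.Now:u} {(int)response.StatusCode}");
                    }
                }
                catch (HttpRequestException ex)
                {
                    // Connection-level failure: log it and try again next round.
                    Console.WriteLine($"{DateTime.Now:u} request failed: {ex.Message}");
                }
                catch (TaskCanceledException)
                {
                    // HttpClient reports a timeout as a cancelled task.
                    Console.WriteLine($"{DateTime.Now:u} request timed out");
                }

                await Task.Delay(TimeSpan.FromMinutes(10));
            }
        }
    }
}
```

The interval only needs to be shorter than the pool's Idle Time-out, so 10 minutes leaves a comfortable margin against the 20 minute default.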

Is WAS activation over MSMQ a legend or what?

I'm working on my fourth or fifth implementation of a WCF service over MSMQ with IIS/WAS activation, and I have never been able to make it work properly. It's always the same story: my services are activated only if the IIS web site is interacted with in some other way (like serving the service metadata page at /somewhere/myService.svc). If the only thing happening is messages being sent into the queue, my services stop processing messages, and restart as soon as I visit the .svc page...
It's such a common pattern for me that I also came to a common solution: scheduling a job (every few minutes) that runs a PowerShell script that accesses that page. Quite simple, but not very elegant. And, furthermore, unnecessary in theory.
This happened over different IIS versions (7.0 and 7.5), over various Win 2008 service packs and releases, and with servers in AD domains or workgroups. I think I've read everything on the web about this, especially MSDN and Microsoft employees' blogs, so binding configuration, MSMQ permissions, and all the other small details you can discover here and there are set up.
So the question: has anybody been successful with WAS over MSMQ?

The server rejected the session-establishment request: WCF hosted on IIS

We have some WCF services implemented in an IIS application, communicating over net.tcp on the default port (808), using the Microsoft Net.Tcp Port Sharing Service, throwing an error on production servers. When I instantiate a connection to the first of the services, I get back an exception:
The server at <URL> rejected the session-establishment request. All the other services respond fine.
But it runs fine on our test servers.
I initially thought there was something wrong with the particular service that was failing, but I tried rearranging the list of services into a different order, and it SEEMS to always be the first service that I hit that fails. (I say SEEMS because I think once in the early iterations of testing, I saw it happen on the second service that it hit. But I haven't been able to reproduce that.)
I've looked at application startup delays, and that doesn't seem to be the problem, because I can come back and run the test again as soon as it finishes - a delay of only a minute or two - and get the same error. Also, in the lower level environments, there is a start up delay of probably 30 seconds to a minute, but the result still comes back as expected.
I've tried accessing the services over http from INetManager, and I get intermittent failures on all the services - a particular service will return a yellow screen of death on one invocation, then come up with the expected link to the WSDL on the next one seconds later.
I'm completely at a loss to explain this behavior, or how to resolve it. I've googled the error message, and not found anything helpful. It may be a configuration issue - the production servers are newly provisioned VMs, and we may not have the config exactly right (whereas all the lower level environments have been running this and other similar apps for some time), but I have no idea what to look for. I've looked at the properties of the app pool that the app is running on and compared it to the lower level environments without finding any differences.
If somebody can point me in the right direction, you would have my undying gratitude.
Things I can find:
http://go4answers.webhost4life.com/Example/connect-busy-wcf-service-host-while-725.aspx:
"MaxConcurrentSessions (default = 10) [Per-channel] The maximum number of sessions that a service can accept at one time. Only comes into play with session-based bindings (wsHttp or netTcp)"
http://blogs.infosupport.com/unable-to-generate-a-wcf-proxy-using-svcutil-but-retreiving-the-wsdl-works/
So in the end the trick is to add the additional right on the c:\windows\temp folder for your App Pool Identity [for the service to be able to generate metadata] to solve the problem.
Also, are timeouts or other limits configured and being hit? Give tracing a look and access the service using WcfTestClient and see if you can find underlying errors.

Silverlight wcf lag when calling functions

I am trying to debug performance issues with a .NET 3.5 Silverlight 3 application based on a WCF service. The service is running under IIS 7 on a server that is not under heavy load.
The problem is that certain actions in the application are taking a long time to complete. In an attempt to debug this we have added manual logging to the Silverlight and WCF applications to time how long calls from the Silverlight client take to reach the service, how long the service takes to process them, and how long the response takes to reach the client.
Our findings show that we seem to be getting large delays - up to 45 seconds between messages being sent and received to and from the server.
Unfortunately these large delays seem to happen randomly, to different service calls, with no pattern. The majority of the time calls work relatively quickly.
I believe this may be related to concurrent usage, as the issue appears to get worse when 4-5 users are using the system at once.
Has anyone else experienced issues like this? And could anyone advise any useful ways to debug this kind of issue or at least narrow down the possible causes?
When running in debug or via IIS locally on a development machine this issue does not occur.
Thanks!