.NET Core 3.1 Kestrel/Apache stops responding to requests - asp.net-core

I am running an application in .NET Core 3.1 behind an Apache 2.4.41 reverse proxy in Ubuntu 20.04.1 LTS.
This application ran for months without any issues. Then I added SignalR to the project, just to see connected clients, and since then Kestrel/Apache randomly stops responding to requests a few hours after startup.
I have no idea how to trace the problem: there are no exceptions and nothing strange in memory/CPU usage.

My problem was simpler than I thought: I just had to increase MaxRequestWorkers in /etc/apache2/mods-available/mpm_event.conf, because SignalR mainly uses WebSocket connections, which keep a request worker busy for as long as they stay connected.
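For reference, the change in mpm_event.conf looks roughly like this (the numbers are illustrative; MaxRequestWorkers must not exceed ServerLimit x ThreadsPerChild, so tune both to your expected concurrent WebSocket connections plus normal traffic):

    <IfModule mpm_event_module>
        # Each open WebSocket holds a request worker for its whole lifetime,
        # so the pool must cover peak SignalR connections plus regular requests.
        ThreadsPerChild       25
        ServerLimit           32
        # 32 x 25 = 800; Ubuntu's shipped default is MaxRequestWorkers 150.
        MaxRequestWorkers    800
    </IfModule>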
Another possible solution was to change the HttpTransportType in the client and force another transport, such as ServerSentEvents, but that has its own trade-offs.
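With the .NET SignalR client, forcing the transport looks roughly like this (the hub URL is a placeholder):

    // Package assumed: Microsoft.AspNetCore.SignalR.Client.
    using Microsoft.AspNetCore.Http.Connections;
    using Microsoft.AspNetCore.SignalR.Client;

    var connection = new HubConnectionBuilder()
        .WithUrl("https://example.com/hubs/clients", options =>
        {
            // Force Server-Sent Events instead of WebSockets.
            options.Transports = HttpTransportType.ServerSentEvents;
        })
        .Build();

    await connection.StartAsync();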

Related

Can Hangfire scheduled jobs do this?

I am evaluating Hangfire for an upcoming ASP.NET Core project that has several scheduled and recurring tasks that need to execute independently of users clicking on web pages. I know that Hangfire can do this once the web application has been started by an incoming request. I need to know whether Hangfire can execute a scheduled task in the window between a reboot and the first web request coming in.
Example: The web server is rebooted at 11 PM, and no web requests will come in to spin it up until 5 AM the next morning. A scheduled task needs to run at 1 AM. Will Hangfire execute this task even though the web application hasn't been started by an incoming request?
If it can, is there a certain setup I need to do to allow this?
Details, if needed:
We are going to be using Kestrel hosted in a Windows service and sitting behind an NGINX reverse proxy. This setup could be modified if needed to make Hangfire meet this requirement.
When running under IIS it would be a real problem; see Making ASP.NET application always running.
But it should not be a problem for ASP.NET Core with Kestrel; see:
It is not necessary for ASP.NET Core, because the application is exposed by a console application that is already always on – there are no timeouts, suspends, or other optimization techniques yet. All you need to do is use a supervisor, as written in the official docs for Linux, or use a Windows Service with automatic start when running on Windows.
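In other words, as long as the process itself is running, the Hangfire server inside it processes scheduled jobs with no HTTP request involved. A minimal sketch of that wiring for a Kestrel app hosted as a Windows service, assuming SQL Server storage (the connection string and the job itself are placeholders):

    // Packages assumed: Hangfire.AspNetCore, Hangfire.SqlServer,
    // Microsoft.Extensions.Hosting.WindowsServices.
    public static void Main(string[] args) =>
        Host.CreateDefaultBuilder(args)
            .UseWindowsService() // starts with Windows, no request needed
            .ConfigureWebHostDefaults(web => web.UseStartup<Startup>())
            .Build()
            .Run();

    // In Startup.ConfigureServices:
    services.AddHangfire(cfg => cfg.UseSqlServerStorage("<connection string>"));
    services.AddHangfireServer(); // the job processor lives in this process

    // Registered once; fires at 01:00 even if no request ever arrives.
    RecurringJob.AddOrUpdate("nightly-task",
        () => Console.WriteLine("Nightly task"), Cron.Daily(1));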

IIS Restart Web API in order

When the IIS server has an outage, the server is restarted and all applications and Web APIs are restarted as well. The problem I'm facing is that some Web APIs depend on the BUS. Could we make a site/app wait until the BUS is ready before IIS starts it, without touching the application code? In Docker, we can use a wait script or other third-party tools to wait until a service is available before starting a container.
The Web APIs are built on .NET Core 3.1.
Any help is appreciated.
During startup you could use something like https://github.com/App-vNext/Polly to enter a period of retry (https://github.com/App-vNext/Polly#retry). This would allow you to retry the call to the API until it is hopefully available.
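A rough sketch of that idea, assuming the BUS exposes a health endpoint (the URL and timings are made up):

    // Package assumed: Polly.
    using Polly;

    // Retry up to 10 times, waiting 5 seconds between attempts,
    // holding up startup until the BUS answers.
    var waitForBus = Policy
        .Handle<HttpRequestException>()
        .WaitAndRetryAsync(10, _ => TimeSpan.FromSeconds(5));

    using var http = new HttpClient();
    await waitForBus.ExecuteAsync(async () =>
    {
        var response = await http.GetAsync("https://bus.example.local/health");
        response.EnsureSuccessStatusCode(); // non-2xx throws and triggers a retry
    });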
Use a Windows service to monitor heartbeats of the applications and trigger application pool restarts for applications that are not working correctly. This should help you get back into a running state.
Ultimately I'd try to remove this dependency. If you could give a little more information about the Web API requirements, I'd be happy to suggest more ideas.

Maximum concurrent requests in SignalR self-hosted in Kestrel

I've encountered a strange problem with an application I've developed. The application is a Windows service hosting ASP.NET Core 2.0 running on Kestrel. It receives requests through an IIS site acting as a proxy.
In this application I also use SignalR 2.2.2, integrated using Microsoft.AspNetCore.Owin. All worked well until I noticed that the application was not responding to requests.
Other applications on the same machine and using the same IIS server as proxy were working fine. Restarting the application pool serving the site solved the problem temporarily.
The problem resurfaced, and digging through monitoring information the application seems to hang when there are 400 SignalR SSE connections on the same machine. This seems plausible, as I've found that by default OWIN limits the number of concurrent requests to 100 * number of CPUs. (Note that a site on the same machine serves 5000 requests per minute without breaking a sweat, but those are not long-lived requests like the SignalR ones.)
The problem is that I seem unable to find the same option when hosting OWIN inside ASP.NET Core. Does someone know if this could be the solution, and what the correct setting is?
EDIT: I'm fairly certain the issue is caused by the number of SignalR connections opened concurrently, because disabling SignalR in the JavaScript client made the problem vanish.
2nd EDIT: SignalR does not seem to be the culprit: load testing the site with Crank, both in test and in production, worked up to 5000 concurrent connections, which is the default IIS limit and is fine by me.
After some trial and error I've been able to identify and correct the problem but it was no easy task so I'm leaving this answer behind if someone else stumbles upon the same problem.
Disabling SignalR did not solve the problem but it made it appear less often.
Thanks to the monitoring in place on the server and IIS, I observed that the problem appeared when the number of connections to the site started growing rapidly. This system primarily makes requests to other services, so it has neither a database nor expensive computations.
Examining the code I found three problems:
1. A new HttpClient was created for every request, which can exhaust sockets because they are not reused between requests (blog, blog2, blog3).
2. By default HttpClient allows only a limited number of concurrent connections to a single domain, and that limit is 2 (!!!) (blog4).
3. The code waited synchronously on every web request to another system (this program was ported from an MVC 4 site, which never displayed this problem). That worked fine in MVC, but ASP.NET Core is very sensitive to it: the thread pool starts with one thread per core, so synchronous waits quickly exhaust all available threads and every request ends up waiting. The limit can be raised as a temporary stopgap with ThreadPool.SetMaxThreads(Int32, Int32), but the only real solution is to make all the calls async.
Once all calls were made async, the problem never returned. Basically the problem was thread-pool starvation and ASP.NET Core's sensitivity to it compared with MVC. Here you can find a nice explanation and a detection method using PerfView.
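Putting the three fixes together, the corrected code had roughly this shape (class name and URL are illustrative):

    using System;
    using System.Net;
    using System.Net.Http;
    using System.Threading.Tasks;

    public static class RemoteServiceClient
    {
        // One shared HttpClient for the whole app (or use IHttpClientFactory)
        // instead of new HttpClient() per request, so sockets are reused.
        private static readonly HttpClient Client = new HttpClient();

        static RemoteServiceClient()
        {
            // Lift the default of 2 concurrent connections per domain
            // (ServicePointManager applies when running on .NET Framework;
            // on .NET Core use HttpClientHandler.MaxConnectionsPerServer).
            ServicePointManager.DefaultConnectionLimit = 100;
        }

        // Async all the way down: no .Result or .Wait(), so no thread-pool
        // thread sits blocked while the downstream call is in flight.
        public static async Task<string> GetDataAsync()
        {
            var response = await Client.GetAsync("https://downstream.example/api/data");
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }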
This could be the issue, but it's unlikely. When hosting on .NET Core you're probably using Kestrel as the web server implementation; to change limits such as concurrent connections you can use the KestrelServerLimits class, as described in this Microsoft article.
KestrelServerLimits should not be causing you any problems, since the default value for MaxConcurrentConnections is unlimited.
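For completeness, those limits are set on the Kestrel options at host startup (the values here are made up):

    // In Program.cs / the web host builder:
    webBuilder.ConfigureKestrel(options =>
    {
        // null (the default) means unlimited.
        options.Limits.MaxConcurrentConnections = 10000;
        // Connections upgraded to another protocol, e.g. WebSockets.
        options.Limits.MaxConcurrentUpgradedConnections = 10000;
    });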

Performance issues with Apache as reverse proxy and an AJAX-heavy JSF application

I am currently developing a JSF application (running on JBoss 7 with PrimeFaces 3.5 and push via PrimePush, which basically uses the Atmosphere framework to hide all the transport-specific stuff behind a layer of abstraction).
As long as I am running just JBoss, the application works fine and responds as quickly as would be expected. However, when deploying this to production, where JBoss runs behind an Apache reverse proxy, several problems appear.
The first problem is that Apache seems to kill the long-polling connection, which causes the client to miss out on push messages (even after configuring Atmosphere to use a broadcast cache). I currently work around that by periodically refreshing the whole page when the user is idle, although this smells really bad.
Second, Apache seems to really slow down the whole application. Watching the Apache error log I am seeing a lot of messages like "error reading chunk" (I will post the exact message later, as I am currently writing this post on the go with my smartphone). Lots of digging around in the Atmosphere documentation and trying out different broadcasters did not change this in any way.
My question would be this: would I be better off using nginx, especially in the context of push via long polling?
I know I have given only little detail; I will edit this post later when at home ;)
Just so this topic gets closed: if you have an Atmosphere-based application running behind an Apache reverse proxy, be sure to set the ttl parameter on the ProxyPass directive. Setting this parameter to 5 worked for me; Apache now discards old connections fast enough that it doesn't run out of worker threads.
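The directive looks roughly like this (backend host and path are placeholders):

    # ttl=5: idle backend connections are dropped after 5 seconds,
    # freeing the worker instead of letting stale long-polls pile up.
    ProxyPass        /app http://127.0.0.1:8080/app ttl=5
    ProxyPassReverse /app http://127.0.0.1:8080/app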

WCF app receiving multiple requests per second causing other ASP.NET apps to stop responding and deadlock

We have a WCF service using wsHttpBinding. When it receives many requests in a short period of time (25 per second for a few minutes), it stops working and our other ASP.NET applications and pages stop responding as well. Some of them time out, and eventually we see the following in the Event Viewer:
ISAPI 'c:\windows\microsoft.net\framework\v2.0.50727\aspnet_isapi.dll' reported itself as unhealthy for the following reason: 'Deadlock detected'.
Often we get calls about the problem first and restart IIS to solve the problem.
How can we configure our WCF service to handle this many requests, or at least configure it not to take down our other applications when it can't handle the load? Our classic ASP applications run without issues during this time; it's only our .NET apps that are affected.
Are you running all your ASP/WCF sites in the same app pool? If so, I'd suggest creating a new one and running the WCF service just in that. That in itself might be enough to solve the problem from a practical perspective.
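If it helps, this is roughly the isolation step using appcmd (site and app names are placeholders):

    %windir%\system32\inetsrv\appcmd add apppool /name:"WcfServicePool"
    %windir%\system32\inetsrv\appcmd set app "Default Web Site/WcfService" /applicationPool:"WcfServicePool"

With its own pool, a deadlocked WCF worker process can be recycled (or die) without touching the other applications.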
Also, can you target a more recent version of the framework with your WCF app (and leave the other apps the same)? It will isolate it much better.