Nlog in hangfire jobs deadlock - sql

I am using hangfire as a custom workflow engine in a dotnet core application, which works perfectly fine until I start logging. I log to DB as a requirement. So normal database target. I enabled a sync=true. Once I turn logging on I start getting deadlocks. All is using DI. Again all works fine without log, once I add a rule to write the logs in db I get a deadlock. It’s not even a lot of concurrent jobs, only 11 in my test. Help pls?! Already lost a week on this and still stuck.
I am not sure it’s related, but I am also getting this at the same time: asynchronous exception: timeout flushing all targets

Related

Intermittent problems starting Azure App Services: "500.37 ANCM Failed to Start Within Startup Time Limit"

Our app services are experiencing the problem, that they can’t be restarted by the hosting environment (ANCM).
The user is getting the following screen in that case:
Http Error 500.37
Our production subscription consists of up to 8 different app services and the problem can randomly harm one of them ore some of them.
The problem can occur several times a week, or just once a month.
The bootstrapping procedure of our app services is not time consuming.
The last occurrence of the problem has this log entries within the eventlog:
Failed to gracefully shutdown application 'MACHINE/WEBROOT/APPHOST/XXXXXXXXX'.
followed by:
Application '/LM/W3SVC/815681839/ROOT' with physical root 'D:\home\site\wwwroot' failed to load coreclr. Exception message: Managed server didn't initialize after 120000 ms
In most cases the problem can be resolved by manually stopping and starting the app service. In some cases we had to do that twice.
We are not able to reproduce that behavior locally.
The App Service Plan is S2 and we actually use just one instance.
The documentation of the Http error 500.37 recommends:
"You may need to stagger the startup process of multiple apps."
But there is no hint of how to do that.
How can we ensure that our app services are restarted without errors.
HTTP Error 500.37 - ANCM Failed to Start Within Startup Time Limit
You can try following approaches:
Approach 1: If possible, can try to move one app into a new App Service with a separate App Service plan, then check whether it can start as expected.
Please note that creating and using a separate App Service plan would be charged.
Approach 2: Increasing the startupTimeLimit attribute of the aspNetCore element.
For more information about the startupTimeLimit attribute, please check: https://learn.microsoft.com/en-us/aspnet/core/host-and-deploy/aspnet-core-module?view=aspnetcore-3.1#attributes-of-the-aspnetcore-element

Troubleshooting Web App process restarting

Our web app process is restarting regularly and we are unable to determine the reason.
When looking into Application Events (using the 'Diagnostics and solve problems' blade in the Azure Portal), there exists a bunch of the following Info logs by 'IIS AspNetCore Module'
Event ID 1005:
Failed to gracefully shutdown process '14040'.
Event ID 1001:
Application 'MACHINE/WEBROOT/APPHOST/myapplication__xxxx' started process '31628' successfully and is listening on port '17663'.
There is nothing fishy with general resource usage and nothing in our application logs.
What is the best way to troubleshoot the reason behind these process restarts?
EDIT 1:
After fiddling around with web logging in the Web App's Diagnostic Logs, I now get an error logged from W3SVC-WP after each restart, but the message is nonsense:
1<br/>5<br/>50000780
EDIT 2:
Event Id 2284 refers to this:
FailedRequestTracing module failed to write buffered events to log
file for the request that matched failure definition. No logs will be
generated until this condition is corrected. The problem happened at
least %1 times in the last %2 minutes. The data is the error.
I'm not sure if this could be related to our Diagnostic Logs configuration, but seems unlikely.
EDIT 3:
As per Brando Zhang's suggestion, I've used the Web App Crash Diagnoser extension and tried monitoring 2nd Chance Unhandled Exceptions on both my application process AND on w3wp, but nothing is dumped.
From how I understand it, 1st Chance Exceptions will not crash the process, so no need to monitor these.
Very likely application is crashing due to fatal exception and causing the restarts.
On Azure App Service platform.You can use the Diagnostics as a
Service (DaaS) to troubleshoot this
It can also do an analysis and tell you the root cause most of the time.More step by step infofrmation can be found on this msdn blog .Also refer tips for using crash diagnoser

Handling cache warm-up with twisted and systemd

I have a simple twisted application which I run using a systemd service, executing a script, which subsequently executes a .tac file.
The application is structured as a JSON RPC endpoint (fastjsonrpc), built into a t.w.r.Resource, which is in a t.w.s.Site, and served t.a.i.TCPServer, and the whole thing packed into a t.a.Application. This works fine.
Where I do run into trouble is when I try to warm up caches at startup. This warm-up process is pretty slow (~300 seconds), and makes systemd timeout and kill the process. Increasing the timeout is not really a viable option, since I wouldn't want this to block system boot.
Analogous code is used in a separate stack running on Flask from within Apache and wsgi. That server starts itself off and lets systemd go on while it takes its time building the caches. This behaviour is fine for me.
I've tried calling the warmup function using the following within the setup function of the t.w.r.Resource:
reactor.callLater(1, ep.warmup, None)
I've not yet tried using this from within systemd, and have been testing it from twistd directly on the command line. The server does work as expected, however it no longer responds to SIGINT (^C). Removing the callLater is all that's needed to let the server respond to SIGINT.
If the warmup function is called directly (not by callLater, i.e., the arrangement which makes systemd give up while waiting for warm up to complete), the resulting server also continues to respond to SIGINT.
Is there a better / good way to handle this sort of long-running warmup code?
Why would twistd / the reactor not respond to SIGINT? Am I missing something here?
Twisted is a single-threaded thing. It sounds like your "cache warmup" code is blocking the reactor for those 300 seconds. One easy way to fix this would be using deferToThread to let it run without blocking the reactor.

Want to deploy WCF web service on Azure platform

I want to deploy my WCF web Service on Azure plateform.
I have created a Storage account for my website, and also created a cloud Service and uploaded my package file and config file to the staging site.
But while uploading, The message displays
'Your staging deployment is starting. Hang on, the page will refresh once the deployment begins.'
I am waiting sice 2-3 hours and not getting the desired output.
Am I doing correctly? Or is there anything that I forgot?
Please Help...!
Most likely there is a problem in your code or the packaging that is causing the role to continuously restart. This is a fairly common problem, but there are a lot of possible causes (missing an assembly reference, an uncaught exception, the Run() method is exiting, a Startup Task is failing, or many other things). You need to gather more information to know exactly what the problem is and how to fix it.
There are many threads here on SO about this topic. There's also a Microsoft post discussing how to diagnose this type of issue. Those are good places to start.

RUN#Cloud consistently throws me out during a heavy operation

I'm using a large app instance to run a basic java web application (GWT + Spring). There's an expensive operation within my application (report) which takes a long time to execute.
I've tried running it with the cloudbees SDK on my local machine with similar settings as it would be on the cloud and it seems to function just fine. It runs in about 3-4 minutes.
On the cloud, it seems to be taking longer. The problem isn't the fact that it takes long. What happens in that cloudbees terminates the session after 5 minutes and gives me an error in my browser saying 'Unable to connect to server. Please contact your administrator'. A report which doesn't take as long runs just fine. My application has a session timeout of 30 minutes, so that isn't a problem either.
What could possibly be going wrong? Is it something to do with cloudbees?
This may be due to proxy buffering of your request through the routing layer (revproxy) - so it most likely isn't a session timeout - but the http connection getting cut.
You can either set proxyBuffering=false via the bees CLI command (eg when you deploy the app) - this will ensure longer running connections can work.
Ideally, however, you could change the app slightly to return to the browser with some token which you can poll with to get completion status, as even with a connection that lasts that long, over the internet it may provide a bad experience vs locally.