Hangfire jobs stuck in processing state

We are using Hangfire 1.7.6 to download data from Azure. However, after running for some time, Hangfire seems to hit a deadlock and gets stuck processing jobs. We have to restart the service to get it working again.
There is a recurring job which adds jobs for another background server to process. The jobs mostly get stuck while downloading a big file.
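Roughly, the setup looks like this (a simplified sketch, not our real code; the class, job, and method names are placeholders):

using Hangfire;

public static class DownloadJobs
{
    // Registered once at startup: every hour, enqueue one background job per pending file.
    public static void Configure()
    {
        RecurringJob.AddOrUpdate("enqueue-downloads", () => EnqueueDownloads(), Cron.Hourly());
    }

    public static void EnqueueDownloads()
    {
        // In the real job the file list comes from a database; it is hard-coded here.
        foreach (var file in new[] { "small.csv", "very-large.zip" })
        {
            BackgroundJob.Enqueue(() => DownloadFromAzure(file, JobCancellationToken.Null));
        }
    }

    // The long-running download; this is the step that appears to get stuck on big files.
    public static void DownloadFromAzure(string blobName, IJobCancellationToken cancellationToken)
    {
        // Placeholder for the Azure blob download. Checking the token lets Hangfire abort
        // the job cleanly on shutdown instead of leaving it in the Processing state.
        cancellationToken.ThrowIfCancellationRequested();
    }
}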
Has anyone faced this type of problem, with Hangfire jobs stuck in processing?
Please let me know if any further information is required. Any help/guidance is appreciated.

Is this not caused by the length of time it takes to complete the download from Azure?
You could try testing this with large files and see how it handles them.
Also, as #jbl asked, how is your Hangfire server hosted? If it is hosted in IIS, remember that the Hangfire server may lose its heartbeat if IIS shuts down the application process after it has been idle for a given period of time.
I came across this issue in the past and ended up running the application as a process on the server.
IIS is optimised to save resources, so it will shut down worker processes that aren't being used. When a request is made to your application, it fires the process back up. This also means that any scheduled background jobs will not fire while the process is down.
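For instance, a minimal self-hosted Hangfire server in a console app (or wrapped as a Windows service) looks something like this; it is a sketch that assumes the Hangfire.SqlServer storage package and a placeholder connection string:

using System;
using Hangfire;

class Program
{
    static void Main()
    {
        // Storage is an assumption here; any supported Hangfire storage works.
        GlobalConfiguration.Configuration
            .UseSqlServerStorage("Server=.;Database=Hangfire;Integrated Security=True");

        // The server keeps polling for jobs for as long as this process is alive,
        // independent of IIS application-pool recycling or idle timeouts.
        using (new BackgroundJobServer())
        {
            Console.WriteLine("Hangfire server started. Press ENTER to exit.");
            Console.ReadLine();
        }
    }
}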

Related

Can Hangfire scheduled jobs do this?

I am evaluating Hangfire for an upcoming ASP.NET Core project that has several scheduled and recurring tasks that need to execute independently of users clicking on web pages. I know that Hangfire can do this if the web application has been started because a request has come in. I need to know whether or not Hangfire can execute a scheduled task in the window between the server being rebooted and the first web request coming in.
Example: Web server is rebooted at 11pm, and no web requests will come in to cause the web server to spin up until 5am the next morning. A scheduled task needs to be performed at 1AM. Will Hangfire execute this task even though the web application hasn't been started by an incoming request?
If it can, is there a certain setup I need to do to allow this?
Details, if needed:
We are going to be using Kestrel hosted in a Windows service, sitting behind an NGINX reverse proxy. This setup could be modified if needed to make Hangfire meet this requirement.
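For reference, registering the 1 AM task from the example as a Hangfire recurring job would look roughly like this (the job id and method are placeholders):

// Registered once at startup; Hangfire stores the schedule in its job storage
// and fires it at 01:00 as long as a Hangfire server process is running.
RecurringJob.AddOrUpdate("nightly-task", () => NightlyTask.Run(), "0 1 * * *");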
When running under IIS this would be a real problem; see Making ASP.NET application always running.
But it should not be a problem for ASP.NET Core with Kestrel; see:
It is not necessary for ASP.NET Core, because the application is exposed by a console application that is already always on – there are no timeouts, suspends, or other optimization techniques yet. All you need to do is use a supervisor, as written in the official docs for Linux, or use a Windows Service with automatic start when running on Windows.
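On Windows, that could look roughly like the following sketch, which assumes the .NET Core generic host with the Microsoft.Extensions.Hosting.WindowsServices and Hangfire.AspNetCore packages; the storage choice and job names are placeholders:

using System;
using Hangfire;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

public class Program
{
    public static void Main(string[] args)
    {
        Host.CreateDefaultBuilder(args)
            .UseWindowsService()                                  // run Kestrel inside a Windows service
            .ConfigureWebHostDefaults(web => web.UseStartup<Startup>())
            .Build()
            .Run();
    }
}

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // Storage is an assumption; any supported Hangfire storage works.
        services.AddHangfire(cfg => cfg.UseSqlServerStorage("Server=.;Database=Hangfire;Integrated Security=True"));
        services.AddHangfireServer();                             // the job server lives inside the service process
    }

    public void Configure(IApplicationBuilder app, IRecurringJobManager recurringJobs)
    {
        // Fires at 01:00 whether or not a web request has ever arrived, because the
        // Windows service (and the Hangfire server inside it) starts with the OS.
        recurringJobs.AddOrUpdate("nightly-task", () => Console.WriteLine("1 AM task"), "0 1 * * *");
    }
}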

Debugging Performance of Spring Cloud Gateway with Netty

I have Spring Cloud Gateway (Greenwich) running with Netty. This application receives requests and then sends requests to downstream applications depending on the route configuration.
Randomly, a few requests take a long time (> 70 s). Even though the downstream server responds within 5 seconds, the Netty threads (reactor-http-epoll-*) are not picking up the response. I have enabled debug logs to see what those threads are doing. From preliminary analysis, it looks like those threads are busy processing something else and are always in the runnable state. When this happens, the traffic to the server is not unusual; it is the same as before.
My question here is:
Why was the response not processed by the reactor threads even though it was received? (According to the logs of the downstream app, the response was sent; however, the Spring Cloud app logged receiving it much later.) Is it possible that all the threads were busy doing other things?
Is there any runbook for how such issues should be investigated?
In some places in the logs I do see a high number of inactive connections, but I am not sure whether that is having an impact. (Channel cleaned, now 56 active connections and 1400 inactive connections)
Any general guidance on how to proceed with the investigation to understand why this random slowness happens in the application would really help. Thanks in advance.
Okay, so I ended up doing the things below, and after a lot of investigation it started working fine for me.
Enable logging and look at how many connections are getting created. In my case, a lot of new connections were being created and they were not being re-used.
io.netty.leakDetectionLevel=paranoid
logging.level.reactor.netty=DEBUG
logging.level.reactor.netty.channel.FluxReceive=DEBUG
spring.cloud.gateway.httpclient.wiretap=true
spring.cloud.gateway.httpserver.wiretap=true
Make sure there is no blocking code running on reactor-http-epoll-* threads.
I upgraded the Spring Cloud dependencies from the Greenwich train to the latest version of the Hoxton train.

Distributed job management system

I'm using beeQueue for video transcoding job scheduling and processing.
For now everything is fine, but I'm facing the challenge of working in a distributed environment, for example auto-scaling Amazon instances to add more workers to process the jobs pending in the queue. We scale well, but we need the system to be fail-safe: if an instance on which workers were processing a job shuts down and we get no job status or events, the job that was running on that instance goes into a black hole and can't be recovered or processed again.
What I did:
I'm looking for a ready-made solution that works fail-safe in a distributed environment.
Thanks

IIS idle timeout and long-running request on WCF service

I have to implement a long-running process which is started via a request to a WCF method (not a process started at application start).
I know this is the wrong solution, and that a Windows service or something else would be better for a long-running process, but in my situation that is impossible. I have to use a WCF service hosted in IIS.
I have read about AppDomain recycling, but I can't figure out one thing about the idle timeout: will the AppDomain be restarted if a request runs for more than 20 minutes? I know this issue appears when a background task is started in application start.
So will my AppDomain be killed (the idle timeout is set to 20 minutes) when one long-running request is started and no other request comes in after it?
When a process is started in application start, IIS knows nothing about that task, so it is clear to me that in that situation the AppDomain gets shut down.
But does IIS kill the AppDomain after 20 minutes even though a request is still running? I am confused, because IIS knows about the still-running request and maybe does not do this.
Which is true?
Yes, IIS will kill the process, because it works on a rolling horizon of requests, not on what is currently running. A way around this might be to have the web service call itself while the long operation is running, continually pinging the server to let it know that it is still alive. But on the whole, IIS will kill its processes when no requests are coming in.
Taken directly from MSDN: The worker process shuts down after it finishes processing its existing requests, or after a configured time-out, whichever comes first.
In your case, if your process is longer than the timeout, your process will never finish.
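One way to sketch that self-ping workaround (purely illustrative; the URL, interval, and method names are assumptions, and it only keeps the pool from going idle, it does not change the request execution timeout):

using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class KeepAlive
{
    // Periodically hits a lightweight endpoint on the same site while the long
    // operation runs, so IIS keeps seeing incoming requests and does not treat
    // the application pool as idle.
    public static async Task RunLongOperationAsync()
    {
        using (var cts = new CancellationTokenSource())
        {
            var pinger = PingSelfAsync(new Uri("http://localhost/MyService/ping"), // assumed URL
                                       TimeSpan.FromMinutes(5), cts.Token);

            DoLongRunningWork();   // placeholder for the actual WCF operation body

            cts.Cancel();          // stop pinging once the work is done
            try { await pinger; } catch (OperationCanceledException) { }
        }
    }

    private static async Task PingSelfAsync(Uri url, TimeSpan interval, CancellationToken token)
    {
        using (var client = new HttpClient())
        {
            while (!token.IsCancellationRequested)
            {
                await client.GetAsync(url, token);      // keep-alive request
                await Task.Delay(interval, token);      // wait before the next ping
            }
        }
    }

    private static void DoLongRunningWork()
    {
        // Simulate the long-running job.
        Thread.Sleep(TimeSpan.FromMinutes(30));
    }
}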

NServiceBus - Messages are going to Error queue directly without processing

We have an issue with a Windows service which uses NServiceBus. At some random moment, NServiceBus stops processing messages and sends them directly to the error queue, and I have to restart the service. After the restart, the messages arriving in the input queue are handled and everything gets back to normal. If we drop the messages that went to the error queue back into the input queue, they are processed successfully without any issue.
We are using log4net to audit the message flow and store it in a database. The NServiceBus handler stops logging to log4net; after we restart the Windows service (NServiceBus), it starts to log again. We are NOT able to reproduce this issue in the development environment. We suspect this could be an NServiceBus memory leak, but we don't know how to confirm that or resolve it.
We are planning to move this Windows service (NServiceBus) to a different server on a trial-and-error basis. Has anyone faced this issue and resolved it? Please help us resolve it, as it is causing a lot of trouble in the production environment.
NServiceBus version that we are using: 2.0.0.1329
The message queue and the Windows service are on the same machine.
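For reference, the handlers that stop logging are roughly of this shape (a simplified sketch; the message and handler types are placeholders, not our actual code):

using log4net;
using NServiceBus;

// Placeholder message type.
public class OrderPlaced : IMessage
{
    public string OrderId { get; set; }
}

// NServiceBus 2.x-style handler that audits every message via log4net.
public class OrderPlacedHandler : IHandleMessages<OrderPlaced>
{
    private static readonly ILog Log = LogManager.GetLogger(typeof(OrderPlacedHandler));

    public void Handle(OrderPlaced message)
    {
        Log.InfoFormat("Handling OrderPlaced {0}", message.OrderId);  // audit entry written via the log4net DB appender
        // ... business logic ...
        Log.InfoFormat("Finished OrderPlaced {0}", message.OrderId);
    }
}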
I believe you're running on a version of NServiceBus that is about 5 years old and is no longer supported. While I could give you the standard recommendation of upgrading to a more current release, it could very well be that some of the configuration APIs that you're using have been made obsolete, so you may need to make some modifications there and/or in the app.configs.
I'm sorry to say that there probably isn't a better solution for you at this time.
In general, I'd suggest trying to track the NServiceBus releases somewhat more closely. If you're within 6-12 months of the current release, you should generally be in good shape.