"Thread was being aborted" exception raised in excuting Hangfire job - hangfire

I have a Hangfire job (with SQL Server for persistence) that queries some data from the database, generates an Excel file from that data with NPOI, and sends it to users by email.
The data is not large, only about 1,000 rows, so the job executes in seconds. However, I get a "Thread was being aborted" exception when generating the Excel file or sending the mail.
I guess there is something like a timeout that kills the thread.
Has anyone had the same issue? Any solutions?

Alright. Actually, I made a stupid mistake here.
Before I generate the Excel file, I start from a template. The template is first copied to HttpRuntime.BinDirectory (I was trying to avoid a resource conflict).
Writing to HttpRuntime.BinDirectory is the root cause of the thread being aborted.
When anything changes in the website's bin folder, IIS recycles the AppDomain (or the application pool?), which aborts the currently running threads.

Related

ProcessPoolExecutor stuck indefinitely when child process dies

I have a script running on one of my Linux servers that handles batch file processing with a ProcessPoolExecutor. It generally runs fine for days or even weeks on end without any issue. Sometimes, though, it looks like a few of my child processes just die (I get no error message or exception at all, and I can't reproduce it even by killing child processes from the shell), which leaves the parent process waiting for the return indefinitely...
That's the call (the initializer doesn't have any effect in this case; it's just there to handle the reverse scenario described in another very helpful thread on S.O.):
with ProcessPoolExecutor(max_workers=int(config['PERFORMANCE']['NumberOfProcesses']),
                         initializer=start_thread_to_terminate_when_parent_process_dies,
                         initargs=(os.getpid(),)
                         ) as executor:
    executor.map(process_main, file_list)
From what I've gathered, the pool should be able to recover in exactly the described scenario:
https://bugs.python.org/issue9205
Anyone got any idea? (I thought about switching to the pebble library with its timeout functionality, or creating a separate watchdog script.)
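One thing worth checking in the call above: the iterator returned by executor.map is never consumed, so any exception raised in a worker (including the BrokenProcessPool that current CPython versions raise when a child dies abruptly) is never surfaced. The following is only a minimal sketch, not a fix for a genuine hang; run_batch and its parameters are hypothetical names introduced for illustration. It submits each file separately and collects failures so they can be logged or retried.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def run_batch(func, items, max_workers=2, executor_cls=ProcessPoolExecutor):
    """Run func over items and return (results, failed).

    results maps each item to its return value; failed maps each item to the
    exception its future raised. Items must be hashable (e.g. file paths).
    run_batch is a hypothetical helper, not part of concurrent.futures.
    """
    results, failed = {}, {}
    with executor_cls(max_workers=max_workers) as executor:
        # One future per item, so a failure can be traced back to its input.
        futures = {executor.submit(func, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:
                # Covers errors raised by func itself and, with a process
                # pool, BrokenProcessPool when a worker process died.
                failed[item] = exc
    return results, failed
```

With the names from the question this would be called as run_batch(process_main, file_list, max_workers=...); the items left in failed could then be resubmitted to a fresh pool. If the pool truly never returns, a timeout-based approach (pebble or a watchdog, as mentioned above) is still needed.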

Avoid Deadlock During SSRS Reports Deployment

I wonder if anyone has any suggestions or experience with the same scenario.
We have one server that we use for our SSRS reports. We deploy to multiple folders in SSRS, i.e. Site_1, Site_2, Site_3 ... Site_26.
In each site we deploy roughly 800+ reports. These reports are the same for Site_1 to Site_26 (except if we skip a site).
We use Azure DevOps with the PowerShell ReportingServicesTools module to deploy the reports.
When we start the deployment, several sites fail due to a deadlock with the error below.
The report and process ID are random and never the same:
##[error]Failed to create item Report.rdl : Failed to create catalog item C:\azagent\A9_work\r5\a\SSRS Reports\Reports\Report.rdl : Exception calling "CreateCatalogItem" with "7" argument(s): "System.Web.Services.Protocols.SoapException: An error occurred within the report server database. This may be due to a connection failure, timeout or low disk condition within the database. ---> Microsoft.ReportingServices.Diagnostics.Utilities.ReportServerStorageException: An error occurred within the report server database. This may be due to a connection failure, timeout or low disk condition within the database. ---> System.Data.SqlClient.SqlException: Transaction (Process ID 100) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
The error is not related to low disk space etc.; we've tested this to death, and it occurs with just two sites on a monster server. The error is a transaction deadlock.
The only way we can successfully deploy the reports is to deploy the sites sequentially, one after the other. However, due to time constraints and business requirements this is not an option.
We have done all the PSSDiags etc. and found that the error is caused by the stored procedure "FindObjectsNonRecursive".
We nearly resolved it by adding the (NOLOCK) hint, but it seems this was only temporary and we're back to where we were. Microsoft also advised that they would not change it. Note also that 18 months down the line, MS still has not been able to give us a fix or a solution to our problem.
I would appreciate any feedback from anyone on how you overcame this problem if you had it.
Thank you for your time.
Did you try retrying like the error suggests? Deadlocks are timing-dependent, so it should eventually succeed.
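The retry suggested above could look roughly like this. It is a minimal sketch in Python rather than PowerShell, and it assumes the deployment step can be wrapped in a callable; retry_on_deadlock and its parameters are hypothetical names, and the deadlock check simply looks for the word "deadlock" in the error message, as in the SqlException text quoted in the question.

```python
import random
import time


def retry_on_deadlock(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation() and retry it when the error looks like a deadlock.

    Hypothetical helper: a deadlock is detected by a plain substring match
    on the exception message, which is an assumption, not an SSRS API.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            is_deadlock = "deadlock" in str(exc).lower()
            if not is_deadlock or attempt == max_attempts:
                # Unrelated error, or out of attempts: let it propagate.
                raise
            # Exponential backoff with random jitter, so parallel site
            # deployments do not all retry at the same moment.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))
```

Each report upload would be wrapped individually, so a deadlock victim only repeats its own upload and not the whole site deployment.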

The wait operation timed out. .aspx

I created an internal website for our company. It ran smoothly for several months, and then I added more items to the website. When I ran it live, it ran normally. Then suddenly one of my users from another server sent me a "The wait operation timed out." error. When I accessed that same link, it ran normally for me and for the others I asked to check the page. I have already increased the connection timeout, but still no luck. Does the error come from the other server? Can someone explain the possible causes?
This is what the other plant experiences: every time they first open the website, the error screen shows up, but when they refresh it, they can use the website. I don't know why this happens. I need your help.
Below are the error details:
Exception Details: System.ComponentModel.Win32Exception: The wait operation timed out
Source Error: An unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack trace below.
Thanks in advance
The fact that this happens for a user but not for the testers implies this may occur when the system is under load; database timeouts are pretty common in database queries functioning under stress if the database has been set up "out of the box" without tuning.
I would suggest referring to
The wait operation timed out. ASP
I don't have enough information to troubleshoot the question properly, since I don't know what DBMS you are working with. But as a rule this seems to happen because a call to the database is timing out. In SQL Server, increasing the CommandTimeout (NOT the connection timeout) is one of the quick-and-dirty ways to solve the problem.
In SQL Server, CommandTimeout is the time allowed for an operation before it exits with a timeout error. ConnectionTimeout, by contrast, is the time the system waits when trying to open an initial connection to the database. Changing ConnectionTimeout won't help with the timeout of an operation, but CommandTimeout will.
Other DBMS systems will have other mechanisms for resolving timeout issues.
That's one quick and dirty solution. The longer solution is to add more logging to your system to identify which calls are timing out, and then do some DBA work to optimize the queries and database performance. My understanding is that entity frameworks also have tuning options for automatically generated queries, but exactly what those are depends on which one you're using!
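To illustrate the "add more logging" step, here is a minimal sketch in Python (the site itself is ASP.NET, so this only shows the idea). The decorator name log_slow_calls, the logger name db_timing, and the threshold are all assumptions made for illustration; the point is simply to record which database calls take long enough to risk a timeout.

```python
import functools
import logging
import time

logger = logging.getLogger("db_timing")  # hypothetical logger name


def log_slow_calls(threshold_seconds=1.0, clock=time.perf_counter):
    """Decorator that logs a warning when the wrapped call is slow.

    Hypothetical helper: wrap each function that issues a database query
    so the log shows which calls approach the command timeout.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = clock()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs whether the call succeeded or raised (e.g. timed out).
                elapsed = clock() - start
                if elapsed >= threshold_seconds:
                    logger.warning("%s took %.2fs", func.__name__, elapsed)
        return wrapper
    return decorator
```

Once the slow calls are known, their queries are the ones worth handing to a DBA for tuning.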

Spring Batch restart crashed jobs

Hi Spring Batch users,
regarding the documentation at http://docs.spring.io/spring-batch/reference/htmlsingle/#d5e1320:
"If the process died ("kill -9" or server failure) the job is, of course, not running, but the JobRepository has no way of knowing because no-one told it before the process died."
I try to find and restart the stale job executions by using
Set<JobExecution> jobExecutions = jobExplorer.findRunningJobExecutions(jobName);
...
jobExecution.setStatus(FAILED);
jobExecution.setEndTime(new Date());
jobRepository.update(jobExecution);
jobOperator.restart(jobExecution.getId());
But this seems to be very inconvenient.
1) I have to do this before other (new) jobs could be started.
2) I have to handle multiple instances of running servers so findRunningJobExecutions will not do the trick.
You can find other questions regarding this topic:
https://jira.spring.io/browse/BATCH-2433?jql=project%20%3D%20BATCH%20AND%20status%20%3D%20Open%20ORDER%20BY%20priority%20DESC
Spring Batch after JVM crash
I would love to see a way to register a "start-up clean jobs listener". This would still not fix the problems caused by the multi-server environment, because Spring Batch does not know whether a JobExecution marked as STARTED is still running on another instance.
Thanks for any advice
Alex
Your job cannot and should not recover "automatically" from a kill -9 scenario. A kill -9 is treated very differently from your application throwing a caught Exception. The reason for this is that you've effectively pulled the carpet out from under the application without giving it a chance to reach a synchronization point with the database to commit any necessary information to the ExecutionContext or update the job/step status(es). Therefore, the last status touchpoint with the database will remain and the job will still look STARTED.
"OK, fine," you say, "but if I start another execution, I want it to find that STARTED execution and pick up where it left off." The problem here is that there is no clean way for the application to distinguish a job that is ACTUALLY RUNNING from one that has failed but couldn't update the database. The framework here correctly errs on the side of caution and prevents you from starting a job that already appears to be running, and this is a GOOD thing.
Why? Because let's assume your job was actually still running and you restarted it by accident. As coded, the framework will start to spin up, see your running execution, and fail with the following message: "A job execution for this job is already running". I can't tell you how many times we've been saved by this because someone accidentally launched a job twice!
If you were to implement the listener you suggest, the second execution would instead be allowed to start, and you'd have two different JVMs repeating the same work, possibly writing to the same files/tables and causing a huge data mess that could be impossible to clean up.
Trust me, in the event the Linux terminal kills your job or your job dies because the connection to the database has been severed, you WANT human eyes on those execution states before you attempt a restart.
Finally, on the off chance you actually wanted to kill your job, you can leverage several other standard patterns for stopping jobs:
Stop via throw Exception
Stop via JobOperator.stop()

Deadlocks when running NServicebus service causes corrupt connection

We're running NServiceBus for a web application to handle situations where the user performs "batch-like" actions, such as firing a command that affects 1000 entities.
It works well, but under moderate load we get some deadlocks. This isn't a problem, just retry the message... right? :)
The problem occurs when the next message arrives and tries to open a connection. The connection is then "corrupt".
We get the following error:
System.Data.SqlClient.SqlException (0x80131904): New request is not allowed to start because it should come with valid transaction descriptor
I've searched the web and I think our problem is a reported NH "bug":
A workaround should be to disable connection pooling. But I don't like that, since performance will degrade.
We're running NServiceBus 2.6 and NHibernate 3.3.
Does anyone have any experience with this? Can an upgrade of NServiceBus help?
I've seen this in the past. If your design warrants it, try breaking the transaction into two. If you flow the message transaction all the way to your database operations, any failure will have a cascading effect and will impact (ideally it shouldn't) subsequent messages as well.
Instead of updating the 1000 entities in the command, could you publish an event to say that the command has been completed and then have several subscribers act on this event to update the affected entities? It sounds to me like a command that updates 1000 entities should be split into a number of smaller commands. Take a look at sagas to see how you can handle a long-running business process. For example, you might have something like: process started, step 1 completed, step 2 completed, process completed, etc.
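The idea of splitting one large command into smaller ones can be sketched as follows. This is a language-neutral illustration in Python, not NServiceBus code; split_into_batches and the batch size are hypothetical, and each yielded batch stands for one smaller command that would be sent and committed in its own transaction.

```python
from itertools import islice


def split_into_batches(entity_ids, batch_size):
    """Yield lists of at most batch_size ids taken in order from entity_ids.

    Hypothetical helper: each yielded list would become the payload of one
    smaller command instead of a single command touching every entity.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(entity_ids)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Smaller batches hold locks for a shorter time, which should make deadlocks less likely, and a retried message only repeats its own batch.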