Hangfire Job timeout - hangfire

I have certain jobs that appear to be 'hung' in Hangfire and may run for hours but aren't actually doing anything. Is there a way for Hangfire to kill a job if it runs longer than a certain amount of time?
I'm running the latest version of Hangfire on SQL server.

When you create the job (it doesn't matter whether it's a recurring job or a single background job), add an extra parameter of type IJobCancellationToken to your job method, like this:
public static void Method1(string param1, IJobCancellationToken token) { }
When you create your job, pass JobCancellationToken.Null as the token value and save the jobId. Have another recurring job that polls these jobs and calls BackgroundJob.Delete(jobId) when one exceeds your desired time limit. This moves the job to the Deleted state in Hangfire. The running method stops the next time it calls token.ThrowIfCancellationRequested(), so the job has to check the token periodically; a method that never checks it will not be interrupted.
Reference: https://discuss.hangfire.io/t/how-to-cancel-a-job/872

Yes, you can do this by setting FetchNextJobTimeout at startup. FetchNextJobTimeout controls how long a job can run before Hangfire starts executing it again on another thread.
services.AddHangfire(config => {
    config.UseMemoryStorage(new MemoryStorageOptions { FetchNextJobTimeout = TimeSpan.FromHours(24) });
});

Related

Quartz.NET does not execute nor raise error for a job

Using Quartz.NET 3.0.6, a "malformed" job detail definition was passed to the scheduler; the job was not executed and no error was raised.
The job detail passed one parameter as a bool (ignoreHeaderRow) instead of a string (ignoreHeaderRow.ToString()). Changing the parameter to a string fixed the issue and the job was executed.
IJobDetail job = JobBuilder.Create<ImportJob>()
    .WithIdentity("Immediate" + DateTime.UtcNow.ToFileTime(), GROUP_NAME)
    .UsingJobData("InfolinxSession", JsonConvert.SerializeObject(session))
    .UsingJobData("unprintable", unprintable.ToString())
    .UsingJobData("ignoreHeaderRow", ignoreHeaderRow.ToString())
    .Build();
QuartzScheduler.ScheduleJob(job);
Is there a way to catch this scenario?
Quartz.NET logs all execution errors when a job throws an exception. You can enable logging (the LibLog abstraction hooks into NLog, log4net, Serilog), watch the logs, and set up alerts with a modern log aggregation system.
Another option is to attach a scheduler listener that listens for scheduler errors and then performs some action on errors, such as a Slack notification or whatever suits your needs.

Usage of WorkspaceJob

I have an Eclipse plugin which has some performance issues. Looking at the Progress view, sometimes there are multiple jobs waiting, and from the code most of its architecture is based on classes which extend WorkspaceJob, mixed with Guava EventBus events. The current solution also involves nested jobs...
I read the documentation and I understand their purpose, but I don't get why I would use a workspace job when I could run syncExec/asyncExec from methods which get triggered when an event is sent on the bus.
For example, instead of creating 3 jobs which wait for one another, I could create an event which triggers what Job 1 would have executed; then, when the method is finished, it would send a different event type which triggers a method that does what Job 2 would have done, and so on...
So instead of:
WorkspaceJob Job1 = new WorkspaceJob("Job1");
Job1.schedule();
WorkspaceJob Job2 = new WorkspaceJob("Job2");
Job2.schedule();
WorkspaceJob Job3 = new WorkspaceJob("Job3");
Job3.schedule();
I could use:
@Subscribe
public void replaceJob1(StartJob1Event event) {
    // do what runInWorkspace() of Job1 would have done
    com.something.getStaticEventBus().post(new Job1FinishedEvent());
}
@Subscribe
public void replaceJob2(Job1FinishedEvent event) {
    // do what runInWorkspace() of Job2 would have done
    com.something.getStaticEventBus().post(new Job2FinishedEvent());
}
@Subscribe
public void replaceJob3(Job2FinishedEvent event) {
    // do what runInWorkspace() of Job3 would have done
    com.something.getStaticEventBus().post(new Job3FinishedEvent());
}
I haven't tried it yet because I simplified the ideas as much as I could and the problem is more complex than that, but I think that the EventBus would win over the WorkspaceJobs in terms of performance.
Can anyone confirm my idea or tell me why I shouldn't try this (except for the fact that I must have a good architecture for my events)?
WorkspaceJob delays resource change events until the job finishes. This prevents components listening for resource changes receiving half completed changes. This may or may not be important to your application.
I can't comment on the Guava code as I don't know anything about it, but note that if your code is long-running you must make sure it runs in a background thread (which WorkspaceJob does).

Change Google Cloud Dataflow BigQuery Priority

I have a Beam job running on Google Cloud Dataflow that reads data from BigQuery. When I run the job, it takes minutes for the job to start reading data from the (tiny) table. It turns out the Dataflow job sends off a BigQuery job which runs in BATCH mode and not in INTERACTIVE mode. How can I switch this to run immediately in Apache Beam? I couldn't find a method in the API to change the priority.
Maybe a Googler will correct me, but no, you cannot change this from BATCH to INTERACTIVE because it's not exposed by Beam's API.
From org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.java (here):
private void executeQuery(
    String executingProject,
    String jobId,
    TableReference destinationTable,
    JobService jobService) throws IOException, InterruptedException {
  JobReference jobRef = new JobReference()
      .setProjectId(executingProject)
      .setJobId(jobId);
  JobConfigurationQuery queryConfig = createBasicQueryConfig()
      .setAllowLargeResults(true)
      .setCreateDisposition("CREATE_IF_NEEDED")
      .setDestinationTable(destinationTable)
      .setPriority("BATCH") // <-- NOT EXPOSED
      .setWriteDisposition("WRITE_EMPTY");
  jobService.startQueryJob(jobRef, queryConfig);
  Job job = jobService.pollJob(jobRef, JOB_POLL_MAX_RETRIES);
  if (parseStatus(job) != Status.SUCCEEDED) {
    throw new IOException(String.format(
        "Query job %s failed, status: %s.", jobId, statusToPrettyString(job.getStatus())));
  }
}
If it's really a problem for you that the query is running in BATCH mode, then one workaround could be:
1. Using the BigQuery API directly, roll your own initial request, and set the priority to INTERACTIVE.
2. Write the results of step 1 to a temp table.
3. In your Beam pipeline, read the temp table using BigQueryIO.Read.from().
You can configure the queries to run with "Interactive" priority by passing a priority parameter. Check this GitHub example for details.
Please note that you might be reaching one of the BigQuery limits and quotas. With batch priority, if you hit a rate limit, the query is queued and retried later. An interactive query that hits these limits, by contrast, fails immediately. This is because BigQuery assumes that an interactive query is something you need to run immediately.

Hangfire: How to enqueue a job conditionally

I am using Hangfire to trigger a database retrieval operation as a background job.
This operation is only supposed to happen once, and can be triggered in multiple ways. (for example, in the UI whenever a user drags and drops a tool, I need to fire that job in the background. But if another tool is dragged and dropped, I don't want to fire the background job as it's already prefetched from the database).
This is what my code looks like now:
var jobId = BackgroundJob.Enqueue<BackgroundModelHelper>( (x) => x.PreFetchBillingByTimePeriods(organizationId) );
What I want is some kind of check before I execute the above statement, to find out whether a background job has already been fired; if yes, do not fire another, and if not, enqueue this one.
For example:
bool prefetchIsFired = false;
// find out if a background job has already been fired. If yes, set prefetchIsFired to true.
if (!prefetchIsFired)
{
    var jobId = BackgroundJob.Enqueue<BackgroundModelHelper>( (x) => x.PreFetchBillingByTimePeriods(organizationId, null) );
}
You can use a filter (DisableMultipleQueuedItemsFilter) on your job method, as shown here: https://discuss.hangfire.io/t/how-do-i-prevent-creation-of-duplicate-jobs/1222/4

Can I get the current queue in perform method of worker with sidekiq/redis?

I want to be able to delete all jobs in a queue, but I don't know which queue it is. I'm in the perform method of my worker and I need to get the "current queue", the queue that the current job came from.
For now I use:
require 'sidekiq/api'
queue = Sidekiq::Queue.new
queue.each do |job|
job.delete
end
This works because I only use the default queue.
But now I will use many queues, and I can't specify a single queue for this worker because I need to use several of them for server load balancing.
So how can I get the queue from inside the perform method?
thx.
You can't by design, that's orthogonal context to the job. If your job needs to know a queue name, pass it explicitly as an argument.
This is much faster:
Sidekiq::Queue.new.clear
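Putting the two suggestions together (pass the queue name as an argument, then clear that queue), a minimal sketch might look like the following. To keep it runnable without Redis it uses a hypothetical in-memory FakeQueue in place of Sidekiq::Queue, and PurgeWorker is an illustrative name, not something from the question; in a real application the worker would include Sidekiq::Worker and call Sidekiq::Queue.new(queue_name).clear.

```ruby
# Stand-in for Sidekiq::Queue so the sketch runs without Redis. Assumption:
# like the real class, it is built from a queue name and responds to #clear.
class FakeQueue
  STORE = Hash.new { |hash, name| hash[name] = [] }

  def initialize(name = 'default')
    @name = name
  end

  def clear
    STORE[@name].clear
  end
end

# Hypothetical worker. A real one would `include Sidekiq::Worker` and use
# Sidekiq::Queue directly instead of the injected queue_class.
class PurgeWorker
  def initialize(queue_class = FakeQueue)
    @queue_class = queue_class
  end

  # queue_name is supplied by the caller, since the job cannot discover it.
  def perform(queue_name)
    @queue_class.new(queue_name).clear
  end
end
```

With Sidekiq itself, the caller would then enqueue with the same name in both places, for example PurgeWorker.set(queue: 'low').perform_async('low').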
These docs show that you can access information about all running jobs, which includes the jid (job ID) and queue name for each job.
Inside the perform method you have access to the jid through the jid accessor. From that you can find the current job and get the queue name:
workers = Sidekiq::Workers.new
this_worker = workers.find { |_, _, work|
work['payload']['jid'] == jid
}
queue = this_worker[2]['queue']
However, the content of Sidekiq::Workers can be up to 5 seconds out of date, so you should only try this after your worker has been running for at least 5 seconds, which may not be ideal.
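Because of that delay, the lookup can come back empty, and this_worker[2] would then raise on nil. A small nil-safe variant is sketched below. queue_for_jid is a hypothetical helper name, and the entry shape ([process_id, thread_id, work] with work['payload']['jid'] and work['queue']) is assumed to match the snippet above; it may differ between Sidekiq versions. The sample data only illustrates that shape.

```ruby
# Returns the queue name for the job with the given jid, or nil when the
# job is not (yet) present in the snapshot.
def queue_for_jid(entries, jid)
  entry = entries.find { |_process_id, _thread_id, work| work['payload']['jid'] == jid }
  entry && entry[2]['queue']
end

# Sample data in the assumed shape, standing in for Sidekiq::Workers.new.
snapshot = [
  ['host:1', 'tid-a', { 'queue' => 'default', 'payload' => { 'jid' => 'abc' } }],
  ['host:1', 'tid-b', { 'queue' => 'low', 'payload' => { 'jid' => 'def' } }]
]

p queue_for_jid(snapshot, 'def') # found: prints the queue name
p queue_for_jid(snapshot, 'zzz') # not found: prints nil
```

Inside perform, the call would be queue_for_jid(Sidekiq::Workers.new, jid), with the caller deciding what to do when it returns nil (retry later, or fall back to an explicitly passed queue name).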