Quartz.NET does not execute nor raise an error for a job

Using Quartz.NET 3.0.6, a "malformed" job detail definition was passed to the scheduler, so the job was not executed and no error was raised.
The job detail passed one parameter as a bool (ignoreHeaderRow) instead of a string (ignoreHeaderRow.ToString()); changing the parameter to a string fixed the issue and the job executed.
IJobDetail job = JobBuilder.Create<ImportJob>()
    .WithIdentity("Immediate" + DateTime.UtcNow.ToFileTime(), GROUP_NAME)
    .UsingJobData("InfolinxSession", JsonConvert.SerializeObject(session))
    .UsingJobData("unprintable", unprintable.ToString())
    .UsingJobData("ignoreHeaderRow", ignoreHeaderRow.ToString())
    .Build();
QuartzScheduler.ScheduleJob(job);
Is there a way to catch this scenario?

Quartz.NET does log all execution errors when a job throws an exception. You can enable logging (the LibLog abstraction hooks into NLog, log4net, Serilog), watch the logs, and set up alerts with a modern log aggregation system.
Another option is to attach a scheduler listener to the scheduler that listens for scheduler errors and then performs some action on error, such as a Slack notification or whatever suits your needs.
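For illustration, a minimal sketch of such a listener, assuming Quartz.NET 3.x (the console write is only a placeholder for your Slack/alerting call):

using System;
using System.Threading;
using System.Threading.Tasks;
using Quartz;
using Quartz.Listener;

public class ErrorNotifyingSchedulerListener : SchedulerListenerSupport
{
    // Called whenever the scheduler reports an internal error.
    public override Task SchedulerError(string msg, SchedulerException cause, CancellationToken cancellationToken)
    {
        // Placeholder: replace with a Slack notification, e-mail, metrics counter, etc.
        Console.Error.WriteLine($"Scheduler error: {msg} - {cause}");
        return Task.CompletedTask;
    }
}

// Registration after the scheduler has been built:
// scheduler.ListenerManager.AddSchedulerListener(new ErrorNotifyingSchedulerListener());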

Related

Camunda - Intermediate message event cannot correlate to a single execution

I created a small application (Spring Boot and Camunda) to handle an order process. The Order-Service receives the new order via REST and calls the start event of the BPMN order workflow. The order process contains two asynchronous JMS calls (customer check and warehouse stock check). If both checks return, the order process should continue.
The start event is called within a Spring REST controller:
ProcessInstance processInstance =
    runtimeService.startProcessInstanceByKey("orderService", String.valueOf(order.getId()));
The Send Task (e.g. the customer check) sends the JMS message into an asynchronous queue.
The answer of this service is caught by another Spring component, which then tries to send an intermediate message:
runtimeService.createMessageCorrelation("msgReceiveCheckCustomerCredibility")
    .processInstanceBusinessKey(response.getOrder().getBpmnBusinessKey())
    .setVariable("resultOrderCheckCustomterCredibility", response)
    .correlate();
I deactivated the warehouse service to see if the order process waits for the arrival of the second call, but instead I get this exception:
1115 06:33:08.564 WARN [o.c.b.e.jobexecutor] ENGINE-14006 Exception while executing job 67d2cc24-0769-11ea-933a-d89ef3425300:
org.springframework.messaging.MessageHandlingException: nested exception is org.camunda.bpm.engine.MismatchingMessageCorrelationException: ENGINE-13031 Cannot correlate a message with name 'msgReceiveCheckCustomerCredibility' to a single execution. 4 executions match the correlation keys: CorrelationSet [businessKey=1, processInstanceId=null, processDefinitionId=null, correlationKeys=null, localCorrelationKeys=null, tenantId=null, isTenantIdSet=false]
This is my process. I cannot see a way to post my bpmn file :-(
Why can't it correlate with the message name and the business key? The JMS queues are empty and no other messages with the same businessKey are waiting.
Thanks!
Just to narrow down the problem: do a runtimeService event subscription query before you try to correlate and check what subscriptions are actually waiting... maybe you have a duplicate message name? Maybe you (accidentally) have another instance of the same process running? Once you have identified the subscriptions, you could just notify the execution directly without using the correlation builder.
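For illustration, a rough sketch of that query, assuming an injected RuntimeService (types from org.camunda.bpm.engine.runtime; the println is just for debugging):

List<EventSubscription> subscriptions = runtimeService
        .createEventSubscriptionQuery()
        .eventType("message")
        .eventName("msgReceiveCheckCustomerCredibility")
        .list();

// See which executions are actually waiting for this message.
for (EventSubscription subscription : subscriptions) {
    System.out.println(subscription.getProcessInstanceId() + " / " + subscription.getActivityId());
}

// Once the right execution is identified, it can be notified directly,
// bypassing the correlation builder:
// runtimeService.messageEventReceived("msgReceiveCheckCustomerCredibility",
//         subscriptions.get(0).getExecutionId());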

How to receive the root cause for a Dataflow pipeline job failure

I am running my pipeline in Dataflow. I want to collect all error messages from the Dataflow job using its id. I am using Apache Beam 2.3.0 and Java 8.
DataflowPipelineJob dataflowPipelineJob = ((DataflowPipelineJob) entry.getValue());
String jobId = dataflowPipelineJob.getJobId();
DataflowClient client = DataflowClient.create(options);
Job job = client.getJob(jobId);
Is there any way to receive only the error messages from the pipeline?
Programmatic support for reading Dataflow log messages is not very mature, but there are a couple of options:
Since you already have the DataflowPipelineJob instance, you could use the waitUntilFinish() overload which accepts a JobMessagesHandler parameter to filter and capture error messages. You can see how DataflowPipelineJob uses this in its own waitUntilFinish() implementation.
Alternatively, you can query job logs using the Dataflow REST API: projects.jobs.messages/list. The API takes in a minimumImportance parameter which would allow you to query just for errors.
Note that in both cases, there may be error messages which are not fatal and don't directly cause job failure.
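For the first option, a rough sketch, assuming the Beam 2.x Dataflow runner classes (MonitoringUtil.JobMessagesHandler, JobMessage, Joda-Time Duration); the one-hour wait is arbitrary and exception handling is omitted:

final List<String> errors = new ArrayList<>();
MonitoringUtil.JobMessagesHandler errorHandler = new MonitoringUtil.JobMessagesHandler() {
    @Override
    public void process(List<JobMessage> messages) {
        for (JobMessage message : messages) {
            // Keep only error-level messages reported by the service.
            if ("JOB_MESSAGE_ERROR".equals(message.getMessageImportance())) {
                errors.add(message.getMessageText());
            }
        }
    }
};

// Blocks until the job finishes (or the timeout elapses), passing new messages to the handler.
dataflowPipelineJob.waitUntilFinish(Duration.standardHours(1), errorHandler);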

Hangfire job timeout

I have certain jobs that appear to be 'hung' in Hangfire and may run for hours without actually doing anything. Is there a way for Hangfire to kill a job if it runs longer than a certain amount of time?
I'm running the latest version of Hangfire on SQL Server.
In your job creation call (it doesn't matter whether it's a recurring or a single background job), pass an extra parameter of type IJobCancellationToken to your job method, like this:
public static void Method1(string param1, IJobCancellationToken token) { }
When you create your job, create it with a null IJobCancellationToken value and save the jobId. Have another recurring job that polls these jobs and simply calls BackgroundJob.Delete(jobId) when a job exceeds your desired time limit. This will clear the job from Hangfire and also kill the process on your server.
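For illustration, a rough sketch of that watchdog pattern; the bookkeeping dictionary, method names and the 30-minute limit are made up, only BackgroundJob, RecurringJob, Cron and JobCancellationToken are Hangfire APIs:

using System;
using System.Collections.Concurrent;
using Hangfire;

public class JobTimeoutWatchdog
{
    // Hypothetical in-memory bookkeeping of jobId -> enqueue time (use your own persistent store).
    private static readonly ConcurrentDictionary<string, DateTime> StartedJobs = new ConcurrentDictionary<string, DateTime>();

    public static string EnqueueTracked(string param1)
    {
        // Null token placeholder; Hangfire injects the real cancellation token when the job runs.
        string jobId = BackgroundJob.Enqueue(() => Method1(param1, JobCancellationToken.Null));
        StartedJobs[jobId] = DateTime.UtcNow;
        return jobId;
    }

    // Register as a recurring job, e.g.:
    // RecurringJob.AddOrUpdate("job-timeout-watchdog", () => KillLongRunningJobs(), Cron.Minutely());
    public static void KillLongRunningJobs()
    {
        foreach (var entry in StartedJobs)
        {
            if (DateTime.UtcNow - entry.Value > TimeSpan.FromMinutes(30))
            {
                // Delete removes the job and signals its cancellation token, provided the job
                // method calls token.ThrowIfCancellationRequested() from time to time.
                BackgroundJob.Delete(entry.Key);
                StartedJobs.TryRemove(entry.Key, out _);
            }
        }
    }

    public static void Method1(string param1, IJobCancellationToken token)
    {
        // Long-running work that periodically calls token.ThrowIfCancellationRequested().
    }
}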
Reference: https://discuss.hangfire.io/t/how-to-cancel-a-job/872
Yes, you can do this; you'll want to set FetchNextJobTimeout at startup. By setting FetchNextJobTimeout, you can control how long a job can run before Hangfire starts executing it again on another thread.
services.AddHangfire(config => {
    config.UseMemoryStorage(new MemoryStorageOptions { FetchNextJobTimeout = TimeSpan.FromHours(24) });
});

Pentaho Data Integration: Error Handling

I'm building out an ETL process with Pentaho Data Integration (CE) and I'm trying to operationalize my transformations and jobs so that they can be monitored. Specifically, I want to be able to catch any errors and then send them to an error reporting service like Honeybadger or New Relic. I understand how to do row-level error reporting, but I don't see a way to do job or transformation failure reporting.
Here is an example job.
The down path is where the transformation succeeds but has row errors. There we can just filter the results and log them.
The path to the right is the case where the transformation fails altogether (e.g. the DB credentials are wrong). This is where I'm having trouble: I can't figure out how to get the error info to be sent.
How do I capture transformation failures to be logged?
You cannot capture job-level error details inside the job itself.
However, there are other options for monitoring.
The first option is database logging for transformations or jobs (see the "Log" tab in the job/transformation settings dialog): this way you always have up-to-date information about the execution status, so you can, say, write a job that periodically scans the logging database and sends error reports wherever you need.
That said, this option is fairly heavyweight to develop and support, and not very flexible for further modification. So in our company we ended up monitoring at the job-execution level: when you run a job with kitchen.bat and it fails for any reason, Kitchen exits with an error status, which you can examine and act on with whatever tools you like - .bat commands, PowerShell or (in our case) Jenkins CI.
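For illustration, a minimal wrapper for that kitchen-level check (Windows batch; the job path and the notification step are placeholders):

call kitchen.bat /file:C:\etl\my_job.kjb /level:Basic
rem Kitchen exits with a non-zero code when the job fails.
if %ERRORLEVEL% neq 0 (
    echo Job failed with exit code %ERRORLEVEL%
    rem call your error reporting service (Honeybadger, New Relic, ...) here
)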
You could use the writeToLog("e", "Message") function in the Modified Java Script step.
Documentation:
// Writes a string to the defined Kettle Log.
//
// Usage:
// writeToLog(var);
// 1: String - The Message which should be written to
// the Kettle Debug Log
//
// writeToLog(var,var);
// 1: String - The Type of the Log
// d - Debug
// l - Detailed
// e - Error
// m - Minimal
// r - RowLevel
//
// 2: String - The Message which should be written to
// the Kettle Log

Spring Batch | Graceful job termination within the job

After launching a job, in the beforeJob listener there are certain occasions where we want to gracefully terminate the job (i.e. not run the job at all, but not complain either, i.e. no exception). The current way of doing this looks like invoking jobExecution.stop(). However, this results in a JobInterruptedException, which in turn results in a logger.error invocation.
Is there any other better programmatic alternative (without manual intervention)?
You may read:
Section 5.3.3, Configuring for Stop, and
Section 5.3.4, Programmatic Flow Decisions.
Just introduce an end element for your first step based on a condition:
The 'end' element instructs a Job to stop with a BatchStatus of COMPLETED.
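For example, a sketch of such a first step; the SKIP_JOB exit code is made up and would be set by the step (e.g. contribution.setExitStatus(new ExitStatus("SKIP_JOB"))) when the job should not run:

<step id="firstStep">
    <tasklet ref="myFirstTasklet"/>
    <end on="SKIP_JOB"/>
    <next on="COMPLETED" to="nextStep"/>
</step>

With this, a skipped run ends the job as COMPLETED instead of STOPPED, so nothing is logged as an error.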
I solved the problem by adding a boolean flag executeTheJob in my "before job" listener, which I set to false when I don't want to execute the job.
Then I handle that in my firstStep with this configuration:
<step id="firstStep" >
<tasklet ref="myFirstTasklet"/>
<stop on="STOPPED" restart="firstStep" />
<next on="COMPLETED" to="nextStep"/>
</step>
And at the beginning of my first tasklet I have this:
if (executeTheJob == false) {
    contribution.setExitStatus(ExitStatus.STOPPED);
}
The stop() instruction takes effect only if the transaction commits successfully.
If all your chunks roll back, your job doesn't stop.
I made this workaround:
Create a ChunkListener and, in the afterChunkError(ChunkContext chunkCtx) method, put:
StepExecution stepExecution = chunkCtx.getStepContext().getStepExecution();
JobExecution jobExecution = jobExplorer.getJobExecution(stepExecution.getJobExecutionId());
if (jobExecution.getStatus().equals(BatchStatus.STOPPING)) {
    stepExecution.setTerminateOnly();
}
This will force a "controlled" stop.
Instead of invoking stop() on the job execution, try signalling it via the JobOperator, as shown in Stopping a Job.
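A minimal sketch of that approach, assuming an injected JobOperator (org.springframework.batch.core.launch.JobOperator) and that you have the running execution's id:

@Autowired
private JobOperator jobOperator;

public void stopGracefully(JobExecution jobExecution) throws Exception {
    // Sends a stop signal: the execution moves to STOPPING and stops at the next step/chunk boundary.
    jobOperator.stop(jobExecution.getId());
}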