Apache Flink - exception handling in "keyBy" - error-handling

It may happen that data that enters Flink job triggers exception either due to bug in code or lack of validation.
My goal is to provide consistent way of exception handling that our team could use within Flink jobs that won't cause any downtime in production.
Restart strategies do not seem to be applicable here as:
simple restart won't fix issue and we fall into restart loop
we cannot simply skip event
they can be good for OOME or some transient issues
we cannot add custom one
try/catch block in "keyBy" function does not fully help as:
there's no way to skip event in "keyBy" after exception is handled
Sample code:
env.addSource(kafkaConsumer)
.keyBy(keySelector) // must return one result for one entry
.flatMap(mapFunction) // we can skip some entries here in case of errors
.addSink(new PrintSinkFunction<>());
env.execute("Flink Application");
I'd like to have ability to skip processing of event that caused issue in "keyBy" and similar methods that are supposed to return exactly one result.

Beside the suggestion of #phanhuy152 (which seems totally legit to me) why not filter before keyBy?
env.addSource(kafkaConsumer)
.filter(invalidKeys)
.keyBy(keySelector) // must return one result for one entry
.flatMap(mapFunction) // we can skip some entries here in case of errors
.addSink(new PrintSinkFunction<>());
env.execute("Flink Application");

Can you reserve a special value like "NULL" for the keyBy to return in such case? Then your flatMap function can skip when encounter such value?

Related

Persist detailed information about failed Item processing

I´ve got a Job that runs a TaskletStep, then a chunk-based step and then another TaskletStep.
In each of these steps, errors (in the form of Exceptions) can occur.
The chunk-based step looks like this:
stepBuilderFactory
.get("step2")
.chunk<SomeItem, SomeItem>(1)
.reader(flatFileItemReader)
.processor(itemProcessor)
.writer {}
.faultTolerant()
.skipPolicy { _ , _ -> true } // skip all Exceptions and continue
.taskExecutor(taskExecutor)
.throttleLimit(taskExecutor.corePoolSize)
.build()
The whole job definition:
jobBuilderFactory.get("job1")
.validator(validator())
.preventRestart()
.start(taskletStep1)
.next(step2)
.next(taskletStep2)
.build()
I expected that Spring Batch somehow picks up the Exceptions that occur along the way, so I can then create a Report including them after the Job has finished processing. Looking at the different contexts, there´s also fields that should contain failureExceptions. However, it seems there´s no such information (especially for the chunked step).
What would be a good approach if I need information about:
what Exceptions did occur in which Job execution
which Item was the one that triggered it
The JobExecution provides a method to get all failure exceptions that happened during the job. You can use that in a JobExecutionListener#afterJob(JobExecution jobExecution) to generate your report.
In regards to which items caused the issue, this will depend on where the exception happens (during the read, process or write operation). For this requirement, you can use one of the ItemReadListener, ItemProcessListener or ItemWriteListener to keep record of the those items (For example, by adding them to the job execution context to be able to get access to them in the JobExecutionListener#afterJob method for your report).

Best practice for BAPI commit and rollback?

I am using C# to call BAPI to communicate with SAP. I am new to this topic so I want to clarify some of the concept.
Q1: If I call BAPI_GOODSMVT_CREATE, should I check RETURN table or MAT_DOC field of items table to see whether it is succeed or failed?
Q2: If it is failed, need I call BAPI_TRANSACTION_ROLLBACK, or just ignore it(because without BAPI_TRANSACTION_COMMIT, data will not be saved)?
Q3: I found sometimes, even if there is error message, if I continue call BAPI_TRANSACTION_COMMIT, the data will be saved. But sometimes it won't.
Thanks in advance.
Check RETURN table. If it's OK, issue a BAPI_TRANSACTION_COMMIT with the WAIT flag. If it's not OK, issue a BAPI_TRANSACTION_ROLLBACK.
Check RETURN from BAPI_TRANSACTION_COMMIT as there may be errors there as well (for example a database update issue).
Ad Q1 In that particular case I'd rather check if material document number is returned in MAT_DOC. This way you don't rely on return messages. If a material document is returned, it means BAPI call was successful irrespective of messages. I find BAPIs implementation of handling return message quite inconsistent. Some BAPIs return a success message, some don't.
Ad Q2, Q3 Always call BAPI_TRANSACTION_COMMIT or BAPI_TRANSACTION_ROLLBACK after a BAPI call depending on result. BAPI_TRANSACTION_COMMIT and BAPI_TRANSACTION_ROLLBACK not only execute commit / rollback work but also call BUFFER_REFRESH_ALL function.

Handling error after aggregation

I am reading some lines from a CSV file, converting them to business objects, aggregating these to batches and passing the resulting aggregates to a bean, which may throw an PersistenceException.
Somehow like this:
from(file:inputdir).split().tokenize("\n").bean(a).aggregate(constant(true), new AbstractListAggregationStrategy(){...}).completionSize(3).bean(b)
I have a onException(Exception.class).handled(true).to("file:failuredir").log(). If an exception occurs on bean(a), everything is handled as expected: wrong lines in inputdir/input.csv are written to failuredir/input.csv.
Now if bean(b) fails, Camel seems to fail reconstructing the original message:
message.org.apache.camel.component.file.GenericFileOperationFailedException: Cannot store file: target/failure/ID-myhostname-34516-1372093690069-0-7
Having tried various attempts to get this working, like using HawtDBAggregationRepository, toggling useOriginalMessage at onException and propagating back the exception in my AggregationStrategy, I am out of ideas.
How can I achieve the same behaviour for bean(b) which can be seen with bean(a)?
The aggregator is a stateful EIP pattern, so when it sends out a message, then its a new Exchange. So the bean(b) cannot get access to the original message that came from the file route.

Transactions in Datamapper & Rails (dm-rails)

I have two models: hotel and location. A location belongs to a hotel, a hotel has one location. I'm trying to create both in a single form, bear in mind that I can't use dm-nested for nested forms due to a dependency clash.
I have code that looks like:
if (#hotel.save && #location.save)
# process
else
# back to form with errors
end
Unfortunately, #hotel.save can fail and #location.save can complete (which confuses me because I didn't think the second condition would run in an AND block if the first one failed).
I'd like to wrap these in a transaction so I can rollback the Location save. I can't seem to find a way to do it online. I'm using dm-rails, rails 3 and a postgresql database. Thanks.
The usual way to wrap database operations in DataMapper is to do something like this:
#hotel.transaction do
#hotel.save
#location.save
end
Notice that #hotel is quite arbitrary there; it could as well be #location or even a model name like Hotel.
In my experience, this works best when you enable exceptions to be thrown. Then if #hotel.save fails, it will throw an exception, which will be caught by the transaction block, causing the transaction to be rolled back. The exception is, of course, reraised.

Suppressing NServicebus Transaction to write errors to database

I'm using NServiceBus to handle some calculation messages. I have a new requirement to handle calculation errors by writing them the same database. I'm using NHibernate as my DAL which auto enlists to the NServiceBus transaction and provides rollback in case of exceptions, which is working really well. However if I write this particular error to the database, it is also rolled back which is a problem.
I knew this would be a problem, but I thought I could just wrap the call in a new transaction with the TransactionScopeOption = Suppress. However the error data is still rolled back. I believe that's because it was using the existing session with has already enlisted in the NServiceBus transaction.
Next I tried opening a new session from the existing SessionFactory within the suppression transaction scope. However the first call to the database to retrieve or save data using this new session blocks and then times out.
InnerException: System.Data.SqlClient.SqlException
Message=Timeout expired. The timeout period elapsed prior to completion of the >operation or the server is not responding.
Finally I tried creating a new SessionFactory using it to open a new session within the suppression transaction scope. However again it blocks and times out.
I feel like I'm missing something obvious here, and would greatly appreciate any suggestions on this probably common task.
As Adam suggests in the comments, in most cases it is preferred to let the entire message fail processing, giving the built-in Retry mechanism a chance to get it right, and eventually going to the error queue. Then another process can monitor the error queue and do any required notification, including logging to a database.
However, there are some use cases where the entire message is not a failure, i.e. on the whole, it "succeeds" (whatever the business-dependent definition of that is) but there is some small part that is in error. For example, a financial calculation in which the processing "succeeds" but some human element of the data is "in error". In this case I would suggest catching that exception and sending a new message which, when processed by another endpoint, will log the information to your database.
I could see another case where you want the entire message to fail, but you want the fact that it was attempted noted somehow. This may be closest to what you are describing. In this case, create a new TransactionScope with TransactionScopeOption = Suppress, and then (again) send a new message inside that scope. That message will be sent whether or not your full message transaction rolls back.
You are correct that your transaction is rolling back because the NHibernate session is opened while the transaction is in force. Trying to open a new session inside the suppressed transaction can cause a problem with locking. That's why, most of the time, sending a new message asynchronously is part of the solution in these cases, but how you do it is dependent upon your specific business requirements.
I know I'm late to the party, but as an alternative suggestion, you coudl simply raise another separate log message, which NSB handles independently, for example:
public void Handle(DebitAccountMessage message)
{
var account = this.dbcontext.GetById(message.Id);
if (account.Balance <= 0)
{
// log request - new handler
this.Bus.Send(new DebitAccountLogMessage
{
originalMessage = message,
account = account,
timeStamp = DateTime.UtcNow
});
// throw error - NSB will handle
throw new DebitException("Not enough funds");
}
}
public void Handle(DebitAccountLogMessage message)
{
var messageString = message.originalMessage.Dump();
var accountString = message.account.Dump(DumpOptions.SuppressSecurityTokens);
this.Logger.Log(message.UniqueId, string.Format("{0}, {1}", messageString, accountString);
}