Spring Cloud Stream deserialization error handling for batch processing

I have a question about handling deserialization exceptions in Spring Cloud Stream while processing batches (i.e. batch-mode: true).
Per the documentation here, https://docs.spring.io/spring-kafka/docs/2.5.12.RELEASE/reference/html/#error-handling-deserializer, (looking at the implementation of FailedFooProvider), it looks like this function should return a subclass of the original message.
Is the intent here that a list of both Foos and BadFoos will end up at the original @StreamListener method, and then it will be up to the code (i.e. me) to sort them out and handle them separately? I suspect this is the case, as I've read that the automated DLQ sending isn't desirable for batch error handling, since it would resubmit the whole batch.
And if this is the case, what if there is more than one message type received by the app via different @StreamListeners, say Foos and Bars? What type should the value function return in that case? Below is pseudo code to illustrate the second question:
@StreamListener
public void readFoos(List<Foo> foos) {
    List<BadFoo> badFoos = foos.stream()
        .filter(f -> f instanceof BadFoo)
        .map(f -> (BadFoo) f)
        .collect(Collectors.toList());
    // logic
}

@StreamListener
public void readBars(List<Bar> bars) {
    // logic
}
// Updated to return Object and let apply() determine subclass
public class FailedFooProvider implements Function<FailedDeserializationInfo, Object> {

    @Override
    public Object apply(FailedDeserializationInfo info) {
        if (info.getTopics().equals("foo-topic")) {
            return new BadFoo(info);
        }
        else if (info.getTopics().equals("bar-topic")) {
            return new BadBar(info);
        }
        return null;
    }
}

Yes, the list will contain the function result for failed deserializations; the application needs to handle them.
The function needs to return the same type that would have been returned by a successful deserialization.
You can't use conditions with batch listeners. If the list has a mixture of Foos and Bars, they all go to the same listener.
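Concretely, that means BadFoo has to be type-compatible with Foo so it can ride along in the List<Foo>; a minimal sketch (assuming a Foo domain class with a default constructor) could be:
public class BadFoo extends Foo {

    private final FailedDeserializationInfo failedDeserializationInfo;

    public BadFoo(FailedDeserializationInfo failedDeserializationInfo) {
        this.failedDeserializationInfo = failedDeserializationInfo;
    }

    public FailedDeserializationInfo getFailedDeserializationInfo() {
        return failedDeserializationInfo;
    }
}
The listener can then filter on instanceof BadFoo, as in the pseudo code above, and route those records to its own error handling instead of relying on whole-batch DLQ publishing.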

Can someone explain to me what's the proper usage of gRPC StreamObserver.onError?

I am trying to handle gRPC errors properly (Java, Spring-boot app).
Basically, I need to transfer error details from the gRPC server to the client, but I find it hard to understand the proper usage of StreamObserver.onError();
The method doc says:
"Receives a terminating error from the stream. May only be called once
and if called it must be the last method called. In particular if an
exception is thrown by an implementation of onError no further calls
to any method are allowed."
What does this "no further calls are allowed" mean? In the app that I maintain, they call other gRPC methods and they get java.lang.IllegalStateException: call already closed which is just fine, as per documentation.
I am wondering - should I (the developer) terminate the current Java method (which uses gRPC calls) after an error is received? For example, by throwing an exception to stop execution. Or is it expected that gRPC is going to terminate the execution (something like throwing an exception from gRPC)?
Basically how do I properly use onError() and what should I expect and handle if I call it?
I need an explanation of its usage and effects.
There are two StreamObserver instances involved. One is for the inbound direction, which is the StreamObserver instance you implement and pass to the gRPC library. This is the StreamObserver containing your logic for how to handle responses. The other is for the outbound direction, which is the StreamObserver instance that the gRPC library returns to you when calling the RPC method. This is the StreamObserver that you use to send requests. Most of the time, these two StreamObservers interact with each other (e.g., in a fully duplexed streaming call, the response StreamObserver usually calls the request StreamObserver's onNext() method; this is how you achieve ping-pong behavior).
"no further calls are allowed" means you should not call any more onNext(), onComplete() and/or onError() on the outbound direction StreamObserver when the inbound StreamObserver's onError() method is invoked, even if your implementation for the inbound onError() throws an exception. Since the inbound StreamObserver is invoked asynchronously, it has nothing to do with your method that encloses the StreamObserver's implementation.
For example:
public class HelloWorld {
    private final HelloWorldStub stub;
    private StreamObserver<HelloRequest> requestObserver;
    ...

    private void sendRequest(String message) {
        requestObserver.onNext(HelloRequest.newBuilder().setMessage(message).build());
    }

    public void start() {
        requestObserver = stub.helloWorld(new StreamObserver<HelloResponse>() {
            @Override
            public void onNext(HelloResponse response) {
                sendRequest("hello from client");
                // Optionally you can call onCompleted() or onError() on
                // the requestObserver to terminate the call.
            }

            @Override
            public void onCompleted() {
                // You should not call any method on requestObserver.
            }

            @Override
            public void onError(Throwable error) {
                // You should not call any method on requestObserver.
            }
        });
    }
}
It has nothing to do with the start() method.
The doc also mentions that you should not do things like
try {
    requestObserver.onCompleted();
} catch (RuntimeException e) {
    requestObserver.onError(e);
}
This mostly applies to your own StreamObserver implementations; the StreamObservers returned by gRPC never throw.
I've extracted a template for GRPC streaming that abstracts away a lot of the GRPC boilerplate and also addresses the logic for onError, in the DechunkingStreamObserver below.
I use the following general pattern for GRPC streaming, which is something along the lines of
META DATA DATA DATA META DATA DATA DATA
An example of where I would use it would be to take one form and transform it to another form.
message SavedFormMeta {
    string id = 1;
}

message SavedFormChunk {
    oneof type {
        SavedFormMeta meta = 1;
        bytes data = 2;
    }
}

rpc saveFormDataStream(stream SavedFormChunk) returns (stream SavedFormChunk) {}
I use a flag that tracks the inError state to prevent further processing, and I catch exceptions in onNext and onCompleted, both of which I redirect to onError, which forwards the error on to the response observer.
The code below pulls out the GRPC semantics and takes lambdas that do the processing.
/**
 * Dechunks a GRPC stream from the request and calls the consumer when a complete object is created. This stops
 * further processing once an error has occurred.
 *
 * @param <T> entity type
 * @param <R> GRPC chunk message type
 * @param <S> GRPC message type for response streams
 */
class DechunkingStreamObserver<T, R, S> implements StreamObserver<R> {
    /**
     * This function takes the current entity state and the chunk and returns a copy of the combined result.
     * Note the combiner may modify the existing data, which may cause unexpected behaviour.
     */
    private final BiFunction<T, R, T> combiner;
    /**
     * A function that takes in the assembled object and the GRPC response observer.
     */
    private final BiConsumer<T, StreamObserver<S>> consumer;
    /**
     * Predicate that returns true if it is a meta chunk indicating the start of a new object.
     */
    private final Predicate<R> metaPredicate;
    /**
     * This function gets the meta chunk and supplies a new object.
     */
    private final Function<R, T> objectSupplier;
    /**
     * GRPC response observer.
     */
    private final StreamObserver<S> responseObserver;
    /**
     * Currently being processed entity.
     */
    private T current = null;
    /**
     * In error state. Starts {@code false}, but once it is set to {@code true} it stops processing {@link #onNext(Object)}.
     */
    private boolean inError = false;

    /**
     * @param metaPredicate predicate that returns true if it is a meta chunk indicating the start of a new object
     * @param objectSupplier this function gets the meta chunk and supplies a new object
     * @param combiner this function takes the current entity state and the chunk and returns a copy of the combined result. Note the combiner may modify the existing data, which may cause unexpected behaviour.
     * @param consumer a function that takes in the assembled object and the GRPC response observer
     * @param responseObserver GRPC response observer
     */
    DechunkingStreamObserver(
        final Predicate<R> metaPredicate,
        final Function<R, T> objectSupplier,
        final BiFunction<T, R, T> combiner,
        final BiConsumer<T, StreamObserver<S>> consumer,
        final StreamObserver<S> responseObserver) {
        this.metaPredicate = metaPredicate;
        this.objectSupplier = objectSupplier;
        this.combiner = combiner;
        this.consumer = consumer;
        this.responseObserver = responseObserver;
    }
    @Override
    public void onCompleted() {
        if (inError) {
            return;
        }
        try {
            if (current != null) {
                consumer.accept(current, responseObserver);
            }
            responseObserver.onCompleted();
        } catch (final Exception e) {
            onError(e);
        }
    }

    @Override
    public void onError(final Throwable throwable) {
        responseObserver.onError(throwable);
        inError = true;
    }

    @Override
    public void onNext(final R chunk) {
        if (inError) {
            return;
        }
        try {
            if (metaPredicate.test(chunk)) {
                if (current != null) {
                    consumer.accept(current, responseObserver);
                }
                current = objectSupplier.apply(chunk);
            } else {
                current = combiner.apply(current, chunk);
            }
        } catch (final Exception e) {
            onError(e);
        }
    }
}
I have 4 lambdas (a wiring sketch follows the list):
Predicate<R> metaPredicate which takes in a chunk and returns whether the chunk is meta or not.
Function<R, T> objectSupplier which takes in a meta chunk and creates a new object that is used by your module.
BiFunction<T, R, T> combiner, which takes in a data chunk and the current object and returns a new object that contains the combination.
BiConsumer<T, StreamObserver<S>> consumer which will consume a completed object. It also passes in a stream observer in the case of sending new objects in response.
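For example, wiring those four lambdas for the saveFormDataStream RPC above could look roughly like this (a sketch only; SavedForm, appendData(...) and saveForm(...) are hypothetical stand-ins for your own domain code):
@Override
public StreamObserver<SavedFormChunk> saveFormDataStream(StreamObserver<SavedFormChunk> responseObserver) {
    return new DechunkingStreamObserver<SavedForm, SavedFormChunk, SavedFormChunk>(
        chunk -> chunk.hasMeta(),                           // a meta chunk starts a new form
        chunk -> new SavedForm(chunk.getMeta().getId()),    // build a fresh entity from the meta chunk
        (form, chunk) -> form.appendData(chunk.getData()),  // fold data chunks into the entity
        (form, observer) -> {
            saveForm(form);                                 // persist the completed form
            observer.onNext(SavedFormChunk.newBuilder()     // echo the meta back on the response stream
                .setMeta(SavedFormMeta.newBuilder().setId(form.getId()))
                .build());
        },
        responseObserver);
}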
The only thing you want to do is return after calling responseObserver.onError(), like below, because there is nothing left to do after sending the error.
if (condition) {
    responseObserver.onError(StatusProto.toStatusException(status));
    // this is the required part
    return;
} else {
    responseObserver.onNext(DATA);
    responseObserver.onCompleted();
}

How to modify variables outside of their scope in kotlin?

I understand that in Kotlin there is no such thing as "non-local variables" or "global variables". I am looking for a way to modify a variable in another "scope" in Kotlin by using the function below:
class Listres() {
    var listsize = 0

    fun gatherlistresult() {
        var listallinfo = FirebaseStorage.getInstance()
            .getReference()
            .child("MainTimeline/")
            .listAll()

        listallinfo.addOnSuccessListener { listResult ->
            listsize += listResult.items.size
        }
    }
}
the value of listsize is always 0 (logging the result from inside the .addOnSuccessListener scope returns 8), so clearly the listsize variable isn't being modified. I have seen many different posts about this topic on other sites, but none fit my use case.
I simply want to modify listsize inside of the .addOnSuccessListener callback
This method will always return 0 because the addOnSuccessListener() callback is invoked after the method execution has completed. addOnSuccessListener() is a callback for an asynchronous operation, and you will only get the value if it succeeds.
You can get the value by changing the code as below:
class Demo {
    var listsize = 0

    fun registerListResult() {
        var listallinfo = FirebaseStorage.getInstance()
            .getReference()
            .child("MainTimeline/")
            .listAll()

        listallinfo.addOnSuccessListener { listResult ->
            listsize += listResult.items.size
            processResult(listsize)
        }

        listallinfo.addOnFailureListener {
            // Uh-oh, an error occurred!
        }
    }

    fun processResult(listsize: Int) {
        print(listsize) // you will get the 8 here as you said
    }
}
What you're looking for is a way to bridge some asynchronous processing into a synchronous context. If possible it's usually better (in my opinion) to stick to one model (sync or async) throughout your code base.
That being said, sometimes these circumstances are out of our control. One approach I've used in similar situations involves introducing a BlockingQueue as a data pipe to transfer data from the async context to the sync context. In your case, that might look something like this:
class Demo {
    var listSize = 0

    fun registerListResult() {
        val listAll = FirebaseStorage.getInstance()
            .getReference()
            .child("MainTimeline/")
            .listAll()

        val dataQueue = ArrayBlockingQueue<Int>(1)
        listAll.addOnSuccessListener { dataQueue.put(it.items.size) }
        listSize = dataQueue.take()
    }
}
The key points are:
there is a blocking variant of the Queue interface that will be used to pipe data from the async context (listener) into the sync context (calling code)
data is put() on the queue within the OnSuccessListener
the calling code invokes the queue's take() method, which will cause that thread to block until a value is available
If that doesn't work for you, hopefully it will at least inspire some new thoughts!

How to handle exceptions thrown in Wicket custom model?

I have a component with a custom model (extending the wicket standard Model class). My model loads the data from a database/web service when Wicket calls getObject().
This lookup can fail for several reasons. I'd like to handle this error by displaying a nice message on the web page with the component. What is the best way to do that?
public class MyCustomModel extends Model {
    @Override
    public String getObject() {
        try {
            return Order.lookupOrderDataFromRemoteService();
        } catch (Exception e) {
            logger.error("Failed silently...");
            // How do I propagate this to the component/page?
        }
        return null;
    }
}
Note that the error happens inside the Model which is decoupled from the components.
Handling an exception that happens in the model's getObject() is tricky, since by this time we are usually deep in the response phase of the whole request cycle, and it is too late to change the component hierarchy. So the only place to handle the exception is very much non-local, not anywhere near your component or model, but in the RequestCycle.
There is a way around that though. We use a combination of a Behavior and an IRequestCycleListener to deal with this:
IRequestCycleListener#onException allows you to examine any exception that was thrown during the request. If you return an IRequestHandler from this method, that handler will be run and rendered instead of whatever else was going on beforehand.
We use this on its own to catch generic stuff like Hibernate's StaleObjectStateException and redirect the user to a generic "someone else modified your object" page.
For more specific cases we add a RuntimeExceptionHandler behavior:
public abstract class RuntimeExceptionHandler extends Behavior {
    public abstract IRequestHandler handleRuntimeException(Component component, Exception ex);
}
In IRequestCycleListener we walk through the current page's component tree to see whether any component has an instance of RuntimeExceptionHandler. If we find one, we call its handleRuntimeException method, and if it returns an IRequestHandler that's the one we will use. This way you can have the actual handling of the error local to your page.
Example:
public MyPage() {
    ...
    this.add(new RuntimeExceptionHandler() {
        @Override
        public IRequestHandler handleRuntimeException(Component component, Exception ex) {
            if (ex instanceof MySpecialException) {
                // just an example, you really can do anything you want here.
                // show a feedback message...
                MyPage.this.error("something went wrong");
                // then hide the affected component(s) so the error doesn't happen again...
                myComponentWithErrorInModel.setVisible(false); // ...
                // ...then finally just re-render this page:
                return new RenderPageRequestHandler(new PageProvider(MyPage.this));
            } else {
                return null;
            }
        }
    });
}
Note: This is not something shipped with Wicket, we rolled our own. We simply combined the IRequestCycleListener and Behavior features of Wicket to come up with this.
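For illustration, the listener half of such a home-grown combination might look roughly like this (a sketch against Wicket 6/7-style APIs; how you resolve the current page, and the exact base class and package names, may differ in your version):
public class ComponentExceptionListener extends AbstractRequestCycleListener {

    @Override
    public IRequestHandler onException(RequestCycle cycle, Exception ex) {
        IRequestHandler active = cycle.getActiveRequestHandler();
        if (!(active instanceof IPageRequestHandler)) {
            return null;
        }
        Page page = (Page) ((IPageRequestHandler) active).getPage();
        // Ask each component carrying a RuntimeExceptionHandler whether it wants to handle ex
        return page.visitChildren(Component.class,
            (IVisitor<Component, IRequestHandler>) (component, visit) -> {
                for (RuntimeExceptionHandler handler
                        : component.getBehaviors(RuntimeExceptionHandler.class)) {
                    IRequestHandler result = handler.handleRuntimeException(component, ex);
                    if (result != null) {
                        visit.stop(result);
                    }
                }
            });
    }
}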
Your model could implement IComponentAssignedModel, thus being able to get hold of the owning component.
But I wonder how often you are able to reuse MyCustomModel?
I know that some devs advocate creating standalone model implementations (often in separate packages). While there are general cases where this is useful (e.g. FeedbackMessagesModel), in my experience it's easier to just create inner classes which are component specific.
Since the main issue here is that Models are by design decoupled from the component hierarchy, you could implement a component-aware Model that reports all errors against a specific component.
Remember to make sure it implements Detachable so that the related Component will be detached.
If the Model will perform an expensive operation, you might be interested in using LoadableDetachableModel instead (take into account that Model.getObject() might be called multiple times).
public class MyComponentAwareModel extends LoadableDetachableModel {

    private Component comp;

    public MyComponentAwareModel(Component comp) {
        this.comp = comp;
    }

    protected Object load() {
        try {
            return Order.lookupOrderDataFromRemoteService();
        } catch (Exception e) {
            logger.error("Failed silently...");
            comp.error("This is an error message");
        }
        return null;
    }

    protected void onDetach() {
        comp.detach();
    }
}
It might also be worth trying Session.get().error() instead.
I would add a FeedbackPanel to the page and call error("some description") in the catch clause.
You might want to simply return null in getObject, and add logic to the controller class to display a message if getObject returns null.
If you need custom messages for different failure reasons, you could add a property like String errorMessage; to the model, set it when catching the Exception in getObject, and then your controller class can do something like this:
if (model.getObject() == null) {
    add(new Label("label", model.getErrorMessage()));
} else {
    /* display your model object */
}

Have multiple calls wait on the same internal async task

(Note: this is an over-simplified scenario to demonstrate my coding issue.)
I have the following class interface:
public class CustomerService
{
    Task<IEnumerable<Customer>> FindCustomersInArea(String areaName);
    Task<Customer> GetCustomerByName(String name);
    :
}
This is the client-side of a RESTful API which loads a list of Customer objects from the server then exposes methods that allows client code to consume and work against that list.
Both of these methods work against the internal list of Customers retrieved from the server as follows:
private Task<IEnumerable<Customer>> LoadCustomersAsync()
{
    var tcs = new TaskCompletionSource<IEnumerable<Customer>>();

    try
    {
        // GetAsync returns Task<HttpResponseMessage>
        Client.GetAsync(uri).ContinueWith(task =>
        {
            if (task.IsCanceled)
            {
                tcs.SetCanceled();
            }
            else if (task.IsFaulted)
            {
                tcs.SetException(task.Exception);
            }
            else
            {
                // Convert HttpResponseMessage to desired return type
                var response = task.Result;
                var list = response.Content.ReadAs<IEnumerable<Customer>>();
                tcs.SetResult(list);
            }
        });
    }
    catch (Exception ex)
    {
        tcs.SetException(ex);
    }

    return tcs.Task;
}
The Client class is a custom version of the HttpClient class from the WCF Web API (now ASP.NET Web API) because I am working in Silverlight and they don't have an SL version of their client assemblies.
After all that background, here's my problem:
All of the methods in the CustomerService class use the list returned by the asynchronous LoadCustomersAsync method; therefore, any calls to these methods should wait (asynchronously) until the LoadCustomersAsync method has returned and the appropriate logic has been executed on the returned list.
I also only want one call made from the client (in LoadCustomers) at a time. So, I need all of the calls to the public methods to wait on the same internal task.
To review, here's what I need to figure out how to accomplish:
Any call to FindCustomersInArea and GetCustomerByName should return a Task that waits for the LoadCustomersAsync method to complete. If LoadCustomersAsync has already returned (and the cached list is still valid), then the method may continue immediately.
After LoadCustomersAsync returns, each method has additional logic required to convert the list into the desired return value for the method.
There must only ever be one active call to LoadCustomersAsync (or to the GetAsync call within it).
If the cached list expires, then subsequent calls will trigger a reload (via LoadCustomersAsync).
Let me know if you need further clarification, but I'm hoping this is a common enough use case that someone can help me work out the logic to get the client working as desired.
Disclaimer: I'm going to assume you're using a singleton instance of your HttpClient subclass. If that's not the case we need only modify slightly what I'm about to tell you.
Yes, this is totally doable. The mechanism we're going to rely on for subsequent calls to LoadCustomersAsync is that if you attach a continuation to a Task, even if that Task completed eons ago, your continuation will be signaled "immediately" with the task's final state.
Instead of creating/returning a new TaskCompletionSource<T> (TCS) every time from the LoadCustomerAsync method, you would instead have a field on the class that represents the TCS. This will allow your instance to remember the TCS that last represented the call that represented a cache-miss. This TCS's state will be signaled exactly the same as your existing code. You'll add the knowledge of whether or not the data has expired as another field which, combined with whether the TCS is currently null or not, will be the trigger for whether or not you actually go out and load the data again.
Ok, enough talk, it'll probably make a lot more sense if you see it.
The Code
public class CustomerService
{
    // Your cache timeout (using 15mins as example, can load from config or wherever)
    private static readonly TimeSpan CustomersCacheTimeout = new TimeSpan(0, 15, 0);

    // A lock object used to provide thread safety
    private object loadCustomersLock = new object();

    private TaskCompletionSource<IEnumerable<Customer>> loadCustomersTaskCompletionSource;
    private DateTime loadCustomersLastCacheTime = DateTime.MinValue;

    private Task<IEnumerable<Customer>> LoadCustomersAsync()
    {
        lock(this.loadCustomersLock)
        {
            bool needToLoadCustomers = this.loadCustomersTaskCompletionSource == null
                ||
                (this.loadCustomersTaskCompletionSource.Task.IsFaulted || this.loadCustomersTaskCompletionSource.Task.IsCanceled)
                ||
                DateTime.Now - this.loadCustomersLastCacheTime > CustomerService.CustomersCacheTimeout;

            if(needToLoadCustomers)
            {
                this.loadCustomersTaskCompletionSource = new TaskCompletionSource<IEnumerable<Customer>>();

                try
                {
                    // GetAsync returns Task<HttpResponseMessage>
                    Client.GetAsync(uri).ContinueWith(antecedent =>
                    {
                        if(antecedent.IsCanceled)
                        {
                            this.loadCustomersTaskCompletionSource.SetCanceled();
                        }
                        else if(antecedent.IsFaulted)
                        {
                            this.loadCustomersTaskCompletionSource.SetException(antecedent.Exception);
                        }
                        else
                        {
                            // Convert HttpResponseMessage to desired return type
                            var response = antecedent.Result;
                            var list = response.Content.ReadAs<IEnumerable<Customer>>();
                            this.loadCustomersTaskCompletionSource.SetResult(list);

                            // Record the last cache time
                            this.loadCustomersLastCacheTime = DateTime.Now;
                        }
                    });
                }
                catch(Exception ex)
                {
                    this.loadCustomersTaskCompletionSource.SetException(ex);
                }
            }
        }

        return this.loadCustomersTaskCompletionSource.Task;
    }
}
Scenarios where the customers aren't loaded:
If it's the first call, the TCS will be null so the TCS will be created and customers fetched.
If the previous call faulted or was canceled, a new TCS will be created and the customers fetched.
If the cache timeout has expired, a new TCS will be created and the customers fetched.
Scenarios where the customers are loading/loaded:
If the customers are in the process of loading, the existing TCS's Task will be returned and any continuations added to the task using ContinueWith will be executed once the TCS has been signaled.
If the customers are already loaded, the existing TCS's Task will be returned and any continuations added to the task using ContinueWith will be executed as soon as the scheduler sees fit.
NOTE: I used a coarse grained locking approach here and you could theoretically improve performance with a reader/writer implementation, but it would probably be a micro-optimization in your case.
I think you should change the way you call Client.GetAsync(uri). Do it roughly like this:
Lazy<Task> getAsyncLazy = new Lazy<Task>(() => Client.GetAsync(uri));
And in your LoadCustomersAsync method you write:
getAsyncLazy.Value.ContinueWith(task => ...
This will ensure that GetAsync only gets called once and that everyone interested in its result will receive the same task.
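For comparison, the same "call it once, let every caller share the result" idea translated to Java (a sketch; fetchCustomersFromServer() is a hypothetical stand-in for the actual HTTP call):
import java.util.concurrent.CompletableFuture;

public class CustomerCache {

    private volatile CompletableFuture<String> customersFuture;

    public CompletableFuture<String> loadCustomersAsync() {
        CompletableFuture<String> local = customersFuture;
        if (local == null) {
            synchronized (this) {
                if (customersFuture == null) {
                    // The fetch runs at most once; later callers get the same future,
                    // already completed if the call has finished.
                    customersFuture = CompletableFuture.supplyAsync(this::fetchCustomersFromServer);
                }
                local = customersFuture;
            }
        }
        return local;
    }

    private String fetchCustomersFromServer() {
        // hypothetical blocking call standing in for Client.GetAsync(uri)
        return "[]";
    }
}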

Persisted properties - asynchronously

In classic ASP.NET I’d persist data extracted from a web service in a base class property as follows:
private string m_stringData;
public string _stringData
{
    get
    {
        if (m_stringData == null)
        {
            // fetch data from my web service
            m_stringData = ws.FetchData();
        }
        return m_stringData;
    }
}
This way I could simply make reference to _stringData and know that I’d always get the data I was after (maybe sometimes I’d use Session state as a store instead of a private member variable).
In Silverlight with WCF I might choose to use Isolated Storage as my persistence mechanism, but the service call can't be done like this, because a WCF service has to be called asynchronously.
How can I both invoke the service call and retrieve the response in one method?
Thanks,
Mark
In your method, invoke the service call asynchronously and register a callback that sets a flag. After you have invoked the method, enter a busy/wait loop checking the flag periodically until the flag is set indicating that the data has been returned. The callback should set the backing field for your method and you should be able to return it as soon as you detect the flag has been set indicating success. You'll also need to be concerned about failure. If it's possible to get multiple calls to your method from different threads, you'll also need to use some locking to make your code thread-safe.
EDIT
Actually, the busy/wait loop is probably not the way to go if the web service supports BeginGetData/EndGetData semantics. I had a look at some of my code where I do something similar and I use WaitOne to simply wait on the async result and then retrieve it. If your web service doesn't support this then throw a Thread.Sleep -- say for 50-100ms -- in your wait loop to give time for other processes to execute.
Example from my code:
IAsyncResult asyncResult = null;
try
{
    asyncResult = _webService.BeginGetData( searchCriteria, null, null );
    if (asyncResult.AsyncWaitHandle.WaitOne( _timeOut, false ))
    {
        result = _webService.EndGetData( asyncResult );
    }
}
catch (WebException e)
{
    ...log the error, clean up...
}
Thanks for your help tvanfosson. I followed your code and also found a somewhat similar solution that meets my needs exactly, using a lambda expression:
private string m_stringData;
public string _stringData
{
    get
    {
        // if we don't have a list of departments, fetch from WCF
        if (m_stringData == null)
        {
            StringServiceClient client = new StringServiceClient();
            client.GetStringCompleted +=
                (sender, e) =>
                {
                    m_stringData = e.Result;
                };
            client.GetStringAsync();
        }
        return m_stringData;
    }
}
EDIT
Oops... actually this doesn't work either :-(
I ended up making the calls asynchronously and altering my programming logic to use the MVVM pattern and more binding.