Project Reactor TcpServer -> Process incoming chunked message in its entirety - spring-webflux

I have a TcpServer created with Project Reactor, where I want to read in a large String (potentially XML) on the inbound channel, and perform some work on it, and send this back out on the outbound channel.
For me to do the work, I need to have the incoming data in its entirety, without it being chunked. I cannot seem to handle correctly aggregating the data together. Most of the examples in the reactor test code base are just outputting a simple String constant, or a non chunked version of the inbound eg:
https://github.com/reactor/reactor-netty/blob/main/reactor-netty-core/src/test/java/reactor/netty/tcp/TcpServerTests.java#L444
I have tried to String::concat the incoming data like so
inbound.receive().asString().reduce(String::concat)
But the subsequent flatmap never seems to be called.
Perhaps there is something with adding a channelHandler, but I was unable to find something that would aggregate, similar to the JsonObjectDecoder
for example:
TcpServer.create().doOnConnection(conn -> connection.addHandlerLast(new AggregatingStringDecoder))`
import reactor.netty.DisposableServer;
import reactor.netty.tcp.TcpServer;
public class Application {
public static void main(String[] args) {
DisposableServer server =
TcpServer.create()
//inbound.connection.addHandlerLast() Is there a combiner I can use here?
.handle((inbound, outbound) -> {
//Tried to concat here, but then the flatMap is never called.
//return inbound.receive().asString().reduce(String::concat)
return inbound.receive().asString().flatMap(s -> {
//this is chunked, but I need the S to be the complete string.
NettyOutbound out = outbound.sendString(Mono.just(s.toUpperCase()));
return out
})
.bindNow();
server.onDispose()
.block();
}
}

Related

Spring Cloud Stream deserialization error handling for Batch processing

I have a question about handling deserialization exceptions in Spring Cloud Stream while processing batches (i.e. batch-mode: true).
Per the documentation here, https://docs.spring.io/spring-kafka/docs/2.5.12.RELEASE/reference/html/#error-handling-deserializer, (looking at the implementation of FailedFooProvider), it looks like this function should return a subclass of the original message.
Is the intent here that a list of both Foo's and BadFoo's will end up at the original #StreamListener method, and then it will be up to the code (i.e. me) to sort them out and handle separately? I suspect this is the case, as I've read that the automated DLQ sending isn't desirable for batch error handling, as it would resubmit the whole batch.
And if this is the case, what if there is more than one message type received by the app via different #StreamListener's, say Foo's and Bar's. What type should the value function return in that case? Below is the pseudo code to illustrate the second question?
#StreamListener
public void readFoos(List<Foo> foos) {
List<> badFoos = foos.stream()
.filter(f -> f instanceof BadFoo)
.map(f -> (BadFoo) f)
.collect(Collectors.toList());
// logic
}
#StreamListener
public void readBars(List<Bar> bars) {
// logic
}
// Updated to return Object and let apply() determine subclass
public class FailedFooProvider implements Function<FailedDeserializationInfo, Object> {
#Override
public Object apply(FailedDeserializationInfo info) {
if (info.getTopics().equals("foo-topic") {
return new BadFoo(info);
}
else if (info.getTopics().equals("bar-topic") {
return new BadBar(info);
}
}
}
Yes, the list will contain the function result for failed deserializations; the application needs to handle them.
The function needs to return the same type that would have been returned by a successful deserialization.
You can't use conditions with batch listeners. If the list has a mixture of Foos and Bars, they all go to the same listener.

Flux collectList() on list of WebClient exchanges always empty

I'm trying to execute a list requests using WebClient, then filter them finding the first one that succeed (if any) and return that. Or fall back to a default response if non succeeded.
The problem I'm facing is that when I call .collectList() on a Flux<ServerResponse>, the list is always empty. I would have expected the list to contain N number of ServerResponse based on the number of requests I issued earlier.
public Mono<ServerResponse> retry(ServerRequest request) {
return Flux.fromIterable(request.headers().header(SEQUENCE_HEADER_NAME))
.map(URI::create)
// Build a "list" of responses
.flatMap(uri -> webClientBuilder.baseUrl(uri.toString()).build()
.method(Objects.requireNonNull(request.method()))
.headers(headers -> request.headers().asHttpHeaders().forEach((key, values) -> {
if (!SEQUENCE_HEADER_NAME.equals(key)) {
headers.addAll(key, values);
}
}))
.body(BodyInserters.fromDataBuffers(request.body(BodyExtractors.toDataBuffers())))
.exchange()
.flatMap(clientResponse -> ServerResponse.status(clientResponse.statusCode())
.headers(headers -> headers.addAll(clientResponse.headers().asHttpHeaders()))
.body(BodyInserters.fromDataBuffers(clientResponse.body(BodyExtractors.toDataBuffers()))))
)
// "Wait" for all of them to complete so we can filter
.collectList()
.flatMap(clientResponses -> {
List<ServerResponse> filteredResponses = clientResponses.stream()
.filter(response -> response.statusCode().is2xxSuccessful())
.collect(Collectors.toList());
if (filteredResponses.isEmpty()) {
log.error("No request succeeded; defaulting to {}", HttpStatus.BAD_REQUEST.toString());
return ServerResponse.badRequest().build();
}
if (filteredResponses.size() > 1) {
log.error("Multiple requests succeeded; defaulting to {}", HttpStatus.BAD_REQUEST.toString());
return ServerResponse.badRequest().build();
}
return Mono.just(filteredResponses.get(0));
});
}
Any ideas why .collectList() always returns an empty list?
Well, it seems to me you have a confused requirement in that you want the First Mono that responds but you are trying to put that functionality into a Flux which is meant to process all items in the flow efficiently. Mono in Webflux is meant to create a flow that will perform a series of transformations on the item in the flow efficiently. Nothing in your requirement of testing a bunch of URIs for the first one that succeeds is what WebFlux is good for so I have to question why try to force that into the framework.
You might argue that a Flux is giving you better asynchronous processing but I don't think that's the case when it is a bunch of WebClient calls. WebClient is still HTTP under the hood and so each item in the flow stops and starts around WebClient. If you want to do HTTP asynchronously you should use a ThreadPool and Callable.

WebFlux: Only one item arriving at the backend

On the backend im doing:
#PostMapping(path = "/products", consumes = MediaType.APPLICATION_STREAM_JSON_VALUE)
public void saveProducts(#Valid #RequestBody Flux<Product> products) {
products.subscribe(product -> log.info("product: " + product.toString()));
}
And on the frontend im calling this using:
this.targetWebClient
.post()
.uri(productUri)
.accept(MediaType.APPLICATION_STREAM_JSON)
.contentType(MediaType.APPLICATION_STREAM_JSON)
.body(this.sourceWebClient
.get()
.uri(uriBuilder -> uriBuilder.path(this.sourceEndpoint + "/id")
.queryParam("date", date)
.build())
.accept(MediaType.APPLICATION_STREAM_JSON)
.retrieve()
.bodyToFlux(Product.class), Product.class)
.exchange()
.subscribe();
What happens now is that I have 472 products which need to get saved but only one of them is actually saving. The stream closes after the first and I cant find out why.
If I do:
...
.retrieve()
.bodyToMono(Void.class);
instead, the request isnt even arriving at the backend.
I also tried fix amount of elements:
.body(Flux.just(new Product("123"), new Product("321")...
And with that also only the first arrived.
EDIT
I changed the code:
#PostMapping(path = "/products", consumes =
MediaType.APPLICATION_STREAM_JSON_VALUE)
public Mono<Void> saveProducts(#Valid #RequestBody Flux<Product> products) {
products.subscribe(product -> this.service.saveProduct(product));
return Mono.empty();
}
and:
this.targetWebClient
.post()
.uri(productUri)
.accept(MediaType.APPLICATION_STREAM_JSON)
.contentType(MediaType.APPLICATION_STREAM_JSON)
.body(this.sourceWebClient
.get()
.uri(uriBuilder -> uriBuilder.path(this.sourceEndpoint + "/id")
.queryParam("date", date)
.build())
.accept(MediaType.APPLICATION_STREAM_JSON)
.retrieve()
.bodyToFlux(Product.class), Product.class)
.exchange()
.block();
That led to the behaviour that one product was saved twice (because the backend endpoint was called twice) but again only just one item. And also we got an error on the frontend side:
IOException: Connection reset by peer
Same for:
...
.retrieve()
.bodyToMono(Void.class)
.subscribe();
Doing the following:
this.targetWebClient
.post()
.uri(productUri)
.accept(MediaType.APPLICATION_STREAM_JSON)
.contentType(MediaType.APPLICATION_STREAM_JSON)
.body(this.sourceWebClient
.get()
.uri(uriBuilder -> uriBuilder.path(this.sourceEndpoint + "/id")
.queryParam("date", date)
.build())
.accept(MediaType.APPLICATION_STREAM_JSON)
.retrieve()
.bodyToFlux(Product.class), Product.class)
.retrieve();
Leads to the behaviour that the backend again isnt called at all.
The Reactor documentation does say that nothing happens until you subscribe, but it doesn't mean you should subscribe in your Spring WebFlux code.
Here are a few rules you should follow in Spring WebFlux:
If you need to do something in a reactive fashion, the return type of your method should be Mono or Flux
Within a method returning a reactive typoe, you should never call block or subscribe, toIterable, or any other method that doesn't return a reactive type itself
You should never do I/O-related in side-effects DoOnXYZ operators, as they're not meant for that and this will cause issues at runtime
In your case, your backend should use a reactive repository to save your data and should look like:
#PostMapping(path = "/products", consumes = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Mono<Void> saveProducts(#Valid #RequestBody Flux<Product> products) {
return productRepository.saveAll(products).then();
}
In this case, the Mono<Void> return type means that your controller won't return anything as a response body but will signal still when it's done processing the request. This might explain why you're seeing that behavior - by the time the controller is done processing the request, all products are not saved in the database.
Also, remember the rules noted above. Depending on where your targetWebClient is used, calling .subscribe(); on it might not be the solution. If it's a test method that returns void, you might want to call block on it and get the result to test assertions on it. If this is a component method, then you should probably return a Publisher type as a return value.
EDIT:
#PostMapping(path = "/products", consumes =
MediaType.APPLICATION_STREAM_JSON_VALUE)
public Mono<Void> saveProducts(#Valid #RequestBody Flux<Product> products) {
products.subscribe(product -> this.service.saveProduct(product));
return Mono.empty();
}
Doing this isn't right:
calling subscribe decouples the processing of the request/response from that saveProduct operation. It's like starting that processing in a different executor.
returning Mono.empty() signals Spring WebFlux that you're done right away with the request processing. So Spring WebFlux will close and clean the request/response resources; but your saveProduct process is still running and won't be able to read from the request since Spring WebFlux closed and cleaned it.
As suggested in the comments, you can wrap blocking operations with Reactor (even though it's not advised and you may encounter performance issues) and make sure that you're connecting all the operations in a single reactive pipeline.

How to wrap a Flux with a blocking operation in the subscribe?

In the documentation it is written that you should wrap blocking code into a Mono: http://projectreactor.io/docs/core/release/reference/#faq.wrap-blocking
But it is not written how to actually do it.
I have the following code:
#PostMapping(path = "some-path", consumes = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Mono<Void> doeSomething(#Valid #RequestBody Flux<Something> something) {
something.subscribe(something -> {
// some blocking operation
});
// how to return Mono<Void> here?
}
The first problem I have here is that I need to return something but I cant.
If I would return a Mono.empty for example the request would be closed before the work of the flux is done.
The second problem is: how do I actually wrap the blocking code like it is suggested in the documentation:
Mono blockingWrapper = Mono.fromCallable(() -> {
return /* make a remote synchronous call */
});
blockingWrapper = blockingWrapper.subscribeOn(Schedulers.elastic());
You should not call subscribe within a controller handler, but just build a reactive pipeline and return it. Ultimately, the HTTP client will request data (through the Spring WebFlux engine) and that's what subscribes and requests data to the pipeline.
Subscribing manually will decouple the request processing from that other operation, which will 1) remove any guarantee about the order of operations and 2) break the processing if that other operation is using HTTP resources (such as the request body).
In this case, the source is not blocking, but only the transform operation is. So we'd better use publishOn to signal that the rest of the chain should be executed on a specific Scheduler. If the operation here is I/O bound, then Schedulers.elastic() is the best choice, if it's CPU-bound then Schedulers .paralell is better. Here's an example:
#PostMapping(path = "/some-path", consumes = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Mono<Void> doSomething(#Valid #RequestBody Flux<Something> something) {
return something.collectList()
.publishOn(Schedulers.elastic())
.map(things -> {
return processThings(things);
})
.then();
}
public ProcessingResult processThings(List<Something> things) {
//...
}
For more information on that topic, check out the Scheduler section in the reactor docs. If your application tends to do a lot of things like this, you're losing a lot of the benefits of reactive streams and you might consider switching to a Servlet-based model where you can configure thread pools accordingly.

Apache Camel : GBs of data from database routed to JMS endpoint

I've done a few small projects in camel now but one thing I'm struggling to understand is how to deal with big data (that doesn't fit into memory) when consuming in camel routes.
I have a database containing a couple of GBs worth of data that I would like to route using camel. Obviously reading all data into memory isn't an option.
If I were doing this as a standalone app I would have code that paged through the data and send chunks to my JMS enpoint. I'd like to use camel as it provides a nice pattern. If I were consuming from a file I could use the streaming() call.
Also should I use camel-sql/camel-jdbc/camel-jpa or use a bean to read from my database.
Hope everyone is still with me. I'm more familiar with the Java DSL but would appreciate any help/suggestions people can provide.
Update : 2-MAY-2012
So I've had some time to play around with this and I think what I'm actually doing is abusing the concept of a Producer so that I can use it in a route.
public class MyCustomRouteBuilder extends RouteBuilder {
public void configure(){
from("timer:foo?period=60s").to("mycustomcomponent:TEST");
from("direct:msg").process(new Processor() {
public void process(Exchange ex) throws Exception{
System.out.println("Receiving value" : + ex.getIn().getBody() );
}
}
}
}
My producer looks something like the following. For clarity I've not included the CustomEndpoint or CustomComponent as it just seems to be a thin wrapper.
public class MyCustomProducer extends DefaultProducer{
Endpoint e;
CamelContext c;
public MyCustomProducer(Endpoint epoint){
super(endpoint)
this.e = epoint;
this.c = e.getCamelContext();
}
public void process(Exchange ex) throws Exceptions{
Endpoint directEndpoint = c.getEndpoint("direct:msg");
ProducerTemplate t = new DefaultProducerTemplate(c);
// Simulate streaming operation / chunking of BIG data.
for (int i=0; i <20 ; i++){
t.start();
String s ="Value " + i ;
t.sendBody(directEndpoint, value)
t.stop();
}
}
}
Firstly the above doesn't seem very clean. It seems like the cleanest way to perform this would be to populate a jms queue (in place of direct:msg) via a scheduled quartz job that my camel route then consumes so that I can have more flexibility over the message size received within my camel pipelines. However I quite liked the semantics of setting up time based activations as part of the Route.
Does anyone have any thoughts on the best way to do this.
In my understanding, all you need to do is:
from("jpa:SomeEntity" +
"?consumer.query=select e from SomeEntity e where e.processed = false" +
"&maximumResults=150" +
"&consumeDelete=false")
.to("jms:queue:entities");
maximumResults defines a limit of how many entities you get per query.
When you finish the processing of an entity instance, you need to set e.processed = true; and persist() it, so that the entity won't be processed again.
One way to do that is with the #Consumed annotation:
class SomeEntity {
#Consumed
public void markAsProcessed() {
setProcessed(true);
}
}
Another thing, you need to be careful with is how you serialize the entity before sending it to the queue. You might need to use an enricher between the from and to.