Spring WebFlux: using flatMap + Mono for downloading a Google Cloud Storage Blob, or just map?

I want to fetch a bunch of pictures from a Google Cloud Storage bucket and convert them to Base64 in a WebFlux project.
public Flux<String> getImages() {
    final Storage storage = StorageOptions.newBuilder().setProjectId(PROJECT_ID).build().getService();
    Page<Blob> blobs = storage.list(BUCKET_NAME);
    long startMillis = System.currentTimeMillis();
    return Flux.fromIterable(blobs.iterateAll())
            .parallel()
            .runOn(Schedulers.boundedElastic())
            .map(blob -> blob.getContent(BlobSourceOption.generationMatch())) // blocking? HTTP request to Google Storage (returns byte[])
            .map(PhotoUtils::processBytesToBase64) // non-blocking
            .sequential()
            .doOnTerminate(() -> LOGGER.info("ELAPSED {}", System.currentTimeMillis() - startMillis));
}
It works, and when I use BlockHound to check for blocking calls, nothing is detected. But I have read several times that any I/O operation (as blob.getContent() is) should be wrapped in a flatMap. That would look something like:
...
return Flux.fromIterable(blobs.iterateAll())
        .parallel()
        .runOn(Schedulers.boundedElastic())
        .flatMap(blob -> Mono.just(blob.getContent(BlobSourceOption.generationMatch())))
        .map(PhotoUtils::processBytesToBase64)
        .sequential()
        .doOnTerminate(() -> LOGGER.info("ELAPSED {}", System.currentTimeMillis() - startMillis));
}
But this feels redundant.
Which one is correct, or am I missing the point entirely?
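For reference, one detail worth noting: Mono.just evaluates its argument eagerly, so the flatMap variant above still performs the blocking getContent() call before the Mono even exists. A minimal sketch of the usual deferred pattern, using Mono.fromCallable with the same names as above (untested):

// Sketch: fromCallable defers the blocking call until subscription,
// and subscribeOn runs it on a scheduler that tolerates blocking.
return Flux.fromIterable(blobs.iterateAll())
        .flatMap(blob -> Mono
                .fromCallable(() -> blob.getContent(BlobSourceOption.generationMatch()))
                .subscribeOn(Schedulers.boundedElastic())
                .map(PhotoUtils::processBytesToBase64))
        .doOnTerminate(() -> LOGGER.info("ELAPSED {}", System.currentTimeMillis() - startMillis));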

Related

Flux collectList() on list of WebClient exchanges always empty

I'm trying to execute a list of requests using WebClient, then filter the responses to find the first one that succeeded (if any) and return it, or fall back to a default response if none succeeded.
The problem I'm facing is that when I call .collectList() on a Flux<ServerResponse>, the list is always empty. I would have expected it to contain N ServerResponse instances based on the number of requests issued earlier.
public Mono<ServerResponse> retry(ServerRequest request) {
    return Flux.fromIterable(request.headers().header(SEQUENCE_HEADER_NAME))
            .map(URI::create)
            // Build a "list" of responses
            .flatMap(uri -> webClientBuilder.baseUrl(uri.toString()).build()
                    .method(Objects.requireNonNull(request.method()))
                    .headers(headers -> request.headers().asHttpHeaders().forEach((key, values) -> {
                        if (!SEQUENCE_HEADER_NAME.equals(key)) {
                            headers.addAll(key, values);
                        }
                    }))
                    .body(BodyInserters.fromDataBuffers(request.body(BodyExtractors.toDataBuffers())))
                    .exchange()
                    .flatMap(clientResponse -> ServerResponse.status(clientResponse.statusCode())
                            .headers(headers -> headers.addAll(clientResponse.headers().asHttpHeaders()))
                            .body(BodyInserters.fromDataBuffers(clientResponse.body(BodyExtractors.toDataBuffers()))))
            )
            // "Wait" for all of them to complete so we can filter
            .collectList()
            .flatMap(clientResponses -> {
                List<ServerResponse> filteredResponses = clientResponses.stream()
                        .filter(response -> response.statusCode().is2xxSuccessful())
                        .collect(Collectors.toList());
                if (filteredResponses.isEmpty()) {
                    log.error("No request succeeded; defaulting to {}", HttpStatus.BAD_REQUEST.toString());
                    return ServerResponse.badRequest().build();
                }
                if (filteredResponses.size() > 1) {
                    log.error("Multiple requests succeeded; defaulting to {}", HttpStatus.BAD_REQUEST.toString());
                    return ServerResponse.badRequest().build();
                }
                return Mono.just(filteredResponses.get(0));
            });
}
Any ideas why .collectList() always returns an empty list?
Well, it seems to me you have a conflicting requirement: you want the first Mono that responds, but you are trying to put that functionality into a Flux, which is meant to process all items in the flow efficiently. A Mono in WebFlux is meant to create a flow that performs a series of transformations on a single item efficiently. Nothing in your requirement of testing a bunch of URIs for the first one that succeeds is what WebFlux is good at, so I have to question why you'd try to force it into the framework.
You might argue that a Flux gives you better asynchronous processing, but I don't think that's the case when it's a bunch of WebClient calls. WebClient is still HTTP under the hood, so each item in the flow stops and starts around the WebClient call. If you want to issue HTTP requests asynchronously, you should use a thread pool and Callable.
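A minimal sketch of that thread-pool approach (the fetch helper and the URIs are hypothetical stand-ins; ExecutorService.invokeAny returns the result of the first task to complete successfully and cancels the rest):

import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class FirstSuccessfulRequest {

    // Hypothetical blocking call standing in for one HTTP request.
    static String fetch(String uri) throws Exception {
        if (uri.contains("bad")) throw new IllegalStateException("failed: " + uri);
        return "response from " + uri;
    }

    public static void main(String[] args) throws Exception {
        List<String> uris = List.of("http://bad.example", "http://ok.example");
        ExecutorService pool = Executors.newFixedThreadPool(uris.size());
        try {
            // Blocks until one Callable succeeds (or throws if all fail).
            String first = pool.invokeAny(uris.stream()
                    .map(uri -> (Callable<String>) () -> fetch(uri))
                    .collect(Collectors.toList()));
            System.out.println(first);
        } finally {
            pool.shutdown();
        }
    }
}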

How to replace blocking code for reading bytes in Kotlin

I have a Ktor application that expects a file from a multipart request, in code like this:
multipart.forEachPart { part ->
    when (part) {
        is PartData.FileItem -> {
            image = part.streamProvider().readAllBytes()
        }
        else -> Unit // irrelevant
    }
}
IntelliJ IDEA marks readAllBytes() as an inappropriate blocking call, since Ktor operates on top of coroutines. How can I replace this blocking call with an appropriate one?
Given Ktor's reputation as a non-blocking, suspending I/O framework, I was surprised that for FileItem there is apparently nothing but the blocking InputStream API to retrieve the content. Given that, your only option seems to be delegating to the IO dispatcher:
image = withContext(Dispatchers.IO) { part.streamProvider().readBytes() }

Cache the result of a Mono from a WebClient call in a Spring WebFlux web application

I am looking to cache a Mono (only if it is successful) that is the result of a WebClient call.
From reading the Reactor addons docs, I don't feel that CacheMono is a good fit, as it caches errors as well, which I do not want.
So instead of using CacheMono I am doing the following:
Cache<MyRequestObject, Mono<MyResponseObject>> myCaffeineCache = Caffeine.newBuilder()
        .maximumSize(100)
        .expireAfterWrite(Duration.ofSeconds(60))
        .build();

MyRequestObject myRequestObject = ...;
Mono<MyResponseObject> myResponseObject = myCaffeineCache.get(myRequestObject,
        requestAsKey -> WebClient.create()
                .post()
                .uri("http://www.example.com")
                .syncBody(requestAsKey)
                .retrieve()
                .bodyToMono(MyResponseObject.class)
                .cache()
                .doOnError(t -> myCaffeineCache.invalidate(requestAsKey)));
Here I am calling cache() on the Mono and then adding it to the Caffeine cache.
Any error will go through doOnError and invalidate the cache entry.
Is this a valid approach to caching a WebClient response Mono?
This is one of the very few use cases where you'd actually be allowed to call non-reactive libraries, wrap them with reactive types, and do processing in side-effect operators like doOnXYZ, because:
Caffeine is an in-memory cache, so as far as I know there's no I/O involved
caches often don't offer strong guarantees about caching values (it's very much "fire and forget")
In this case you can query the cache to see if a cached version is there (and wrap and return it right away), and cache a successful real response in a doOn operator, like this:
public class MyService {

    private WebClient client;
    private Cache<MyRequestObject, MyResponseObject> myCaffeineCache;

    public MyService() {
        this.client = WebClient.create();
        this.myCaffeineCache = Caffeine.newBuilder()
                .maximumSize(100)
                .expireAfterWrite(Duration.ofSeconds(60))
                .build();
    }

    public Mono<MyResponseObject> fetchResponse(MyRequestObject request) {
        // Check the cache first and return immediately on a hit
        MyResponseObject cachedVersion = this.myCaffeineCache.getIfPresent(request);
        if (cachedVersion != null) {
            return Mono.just(cachedVersion);
        }
        // On a miss, call the remote service and cache successful responses only
        return this.client.post()
                .uri("http://www.example.com")
                .syncBody(request)
                .retrieve()
                .bodyToMono(MyResponseObject.class)
                .doOnNext(response -> this.myCaffeineCache.put(request, response));
    }
}
Note that I wouldn't cache reactive types here, since there's no I/O involved nor backpressure once the value is returned by the cache. On the contrary, it makes things more difficult with subscriptions and other Reactive Streams constraints.
Also, you're right about the cache operator: it isn't about caching the value per se, but about replaying what happened to other subscribers. I believe cache and replay are effectively synonyms for Flux.
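A tiny sketch of that replay behavior with plain Reactor (nothing app-specific assumed):

// Without cache(), each subscription would re-run the callable;
// with cache(), the first resolved value is replayed to later subscribers.
Mono<Long> timestamp = Mono.fromCallable(System::nanoTime).cache();
timestamp.subscribe(t -> System.out.println("first:  " + t));
timestamp.subscribe(t -> System.out.println("second: " + t)); // same value, no re-execution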
Actually, you don't have to store errors with CacheMono.
private Cache<MyRequestObject, MyResponseObject> myCaffeineCache;
...
Mono<MyResponseObject> myResponseObject =
        CacheMono.lookup(key -> Mono.justOrEmpty(myCaffeineCache.getIfPresent(key))
                        .map(Signal::next), myRequestObject)
                .onCacheMissResume(() -> /* Your web client or other Mono here */)
                .andWriteWith((key, signal) -> Mono.fromRunnable(() ->
                        Optional.ofNullable(signal.get())
                                .ifPresent(value -> myCaffeineCache.put(key, value))));
This may be useful when you switch to an external cache. Don't forget to use reactive clients for external caches; see the sketch below.
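A hedged sketch of what that could look like with reactor-extra's CacheMono over a reactive Redis client, assuming a configured ReactiveRedisTemplate<String, MyResponseObject> named redis and a hypothetical fetchFromRemote helper:

// Sketch only: both the cache read and the cache write are non-blocking Monos.
Mono<MyResponseObject> response = CacheMono
        .lookup(key -> redis.opsForValue().get(key).map(Signal::next), cacheKey)
        .onCacheMissResume(() -> fetchFromRemote(cacheKey)) // hypothetical WebClient call
        .andWriteWith((key, signal) -> Mono.justOrEmpty(signal.get())
                .flatMap(value -> redis.opsForValue().set(key, value))
                .then());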

WebFlux: Only one item arriving at the backend

On the backend I'm doing:
@PostMapping(path = "/products", consumes = MediaType.APPLICATION_STREAM_JSON_VALUE)
public void saveProducts(@Valid @RequestBody Flux<Product> products) {
    products.subscribe(product -> log.info("product: " + product.toString()));
}
And on the frontend I'm calling this using:
this.targetWebClient
        .post()
        .uri(productUri)
        .accept(MediaType.APPLICATION_STREAM_JSON)
        .contentType(MediaType.APPLICATION_STREAM_JSON)
        .body(this.sourceWebClient
                .get()
                .uri(uriBuilder -> uriBuilder.path(this.sourceEndpoint + "/id")
                        .queryParam("date", date)
                        .build())
                .accept(MediaType.APPLICATION_STREAM_JSON)
                .retrieve()
                .bodyToFlux(Product.class), Product.class)
        .exchange()
        .subscribe();
What happens now is that I have 472 products that need to be saved, but only one of them is actually saved. The stream closes after the first one and I can't find out why.
If I do:
...
.retrieve()
.bodyToMono(Void.class);
instead, the request doesn't even arrive at the backend.
I also tried a fixed number of elements:
.body(Flux.just(new Product("123"), new Product("321")...
And with that, too, only the first one arrived.
EDIT
I changed the code:
@PostMapping(path = "/products", consumes = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Mono<Void> saveProducts(@Valid @RequestBody Flux<Product> products) {
    products.subscribe(product -> this.service.saveProduct(product));
    return Mono.empty();
}
and:
this.targetWebClient
        .post()
        .uri(productUri)
        .accept(MediaType.APPLICATION_STREAM_JSON)
        .contentType(MediaType.APPLICATION_STREAM_JSON)
        .body(this.sourceWebClient
                .get()
                .uri(uriBuilder -> uriBuilder.path(this.sourceEndpoint + "/id")
                        .queryParam("date", date)
                        .build())
                .accept(MediaType.APPLICATION_STREAM_JSON)
                .retrieve()
                .bodyToFlux(Product.class), Product.class)
        .exchange()
        .block();
That led to the behaviour that one product was saved twice (because the backend endpoint was called twice), but again only a single item. We also got an error on the frontend side:
IOException: Connection reset by peer
Same for:
...
.retrieve()
.bodyToMono(Void.class)
.subscribe();
Doing the following:
this.targetWebClient
        .post()
        .uri(productUri)
        .accept(MediaType.APPLICATION_STREAM_JSON)
        .contentType(MediaType.APPLICATION_STREAM_JSON)
        .body(this.sourceWebClient
                .get()
                .uri(uriBuilder -> uriBuilder.path(this.sourceEndpoint + "/id")
                        .queryParam("date", date)
                        .build())
                .accept(MediaType.APPLICATION_STREAM_JSON)
                .retrieve()
                .bodyToFlux(Product.class), Product.class)
        .retrieve();
This leads to the behaviour that the backend again isn't called at all.
The Reactor documentation does say that nothing happens until you subscribe, but that doesn't mean you should subscribe in your Spring WebFlux code.
Here are a few rules you should follow in Spring WebFlux:
If you need to do something in a reactive fashion, the return type of your method should be Mono or Flux.
Within a method returning a reactive type, you should never call block, subscribe, toIterable, or any other method that doesn't itself return a reactive type.
You should never do I/O-related work in side-effect doOnXYZ operators; they're not meant for that, and it will cause issues at runtime.
In your case, your backend should use a reactive repository to save your data and should look like:
@PostMapping(path = "/products", consumes = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Mono<Void> saveProducts(@Valid @RequestBody Flux<Product> products) {
    return productRepository.saveAll(products).then();
}
In this case, the Mono<Void> return type means that your controller won't return anything as a response body, but will still signal when it's done processing the request. This might explain the behavior you're seeing: with your original void method, the controller is considered done with the request before all products have been saved to the database.
Also, remember the rules noted above: depending on where your targetWebClient is used, calling .subscribe() on it might not be the solution. If it's in a test method that returns void, you might want to call block on it and get the result to run assertions on it. If it's in a component method, then you should probably return a Publisher type as the return value, as sketched below.
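A minimal sketch of that component-method shape (the pushProducts name and its parameter are hypothetical):

// Return the publisher instead of subscribing; the caller or the framework subscribes.
public Mono<Void> pushProducts(Flux<Product> products) {
    return this.targetWebClient
            .post()
            .uri(productUri)
            .accept(MediaType.APPLICATION_STREAM_JSON)
            .contentType(MediaType.APPLICATION_STREAM_JSON)
            .body(products, Product.class)
            .retrieve()
            .bodyToMono(Void.class);
}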
EDIT:
@PostMapping(path = "/products", consumes = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Mono<Void> saveProducts(@Valid @RequestBody Flux<Product> products) {
    products.subscribe(product -> this.service.saveProduct(product));
    return Mono.empty();
}
Doing this isn't right:
Calling subscribe decouples the processing of the request/response from the saveProduct operation. It's like starting that processing on a different executor.
Returning Mono.empty() signals to Spring WebFlux that you're done processing the request right away. Spring WebFlux will then close and clean up the request/response resources, but your saveProduct process is still running and won't be able to read from the request, since Spring WebFlux has already closed and cleaned it up.
As suggested in the comments, you can wrap blocking operations with Reactor (even though it's not advised and you may encounter performance issues), making sure that you connect all the operations in a single reactive pipeline, as sketched below.
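A sketch of that single pipeline, under the assumption that service.saveProduct is a blocking call (Schedulers.boundedElastic is the usual home for such work):

@PostMapping(path = "/products", consumes = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Mono<Void> saveProducts(@Valid @RequestBody Flux<Product> products) {
    return products
            .flatMap(product -> Mono
                    .fromRunnable(() -> this.service.saveProduct(product)) // assumed blocking
                    .subscribeOn(Schedulers.boundedElastic()))
            .then(); // completes only after every product has been processed
}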

Spring WebFlux Webclient receiving an application/octet-stream file as a Mono

I'm prototyping a small Spring WebFlux application in Kotlin. The application needs to GET a tar archive from a remote REST endpoint and store it locally on disk. Sounds simple.
I first created an integration test that starts the Spring server and one other WebFlux server with a mock REST endpoint that serves the tar archive.
The test should go like this:
1) app: GET mock-server/archive
2) mock-server: respond with status 200 and the tar archive in the body as an attachment
3) app: block until all bytes are received, then untar and use the files
The problem I'm having is that when I try to collect the bytes into a ByteArray in the app, it blocks forever.
My mock-server/archive routes to the following function:
fun serveArchive(request: ServerRequest): Mono<ServerResponse> {
    val tarFile = FileSystemResource(ARCHIVE_PATH)
    assert(tarFile.exists() && tarFile.isFile && tarFile.contentLength() != 0L)
    return ServerResponse
        .ok()
        .contentType(MediaType.APPLICATION_OCTET_STREAM)
        .contentLength(tarFile.contentLength())
        .header("Content-Disposition", "attachment; filename=\"$ARCHIVE_FNAME\"")
        .body(fromResource(tarFile))
}
Then my app calls that with the following:
private fun retrieveArchive() {
    client.get().uri(ARCHIVE_URL).accept(MediaType.APPLICATION_OCTET_STREAM)
        .exchange()
        .flatMap { response ->
            storeArchive(response.bodyToMono())
        }.subscribe()
}

private fun storeArchive(archive: Mono<ByteArrayResource>): Mono<Void> {
    val archiveContentBytes = archive.block() // <- this blocks forever
    val archiveContents = TarArchiveInputStream(archiveContentBytes.inputStream)
    // read archive
}
I've seen How to best get a byte array from a ClientResponse from Spring WebClient?, and that's why I'm trying to use a ByteArrayResource.
When I step through everything, serveArchive seems to be working (the assert statement confirms the file I'm passing exists and has some bytes in it). In retrieveArchive I get a 200 and can see all the appropriate information in the headers (content-type and content-length both look good). When I get down to storeArchive and try to retrieve the bytes from the Mono using block, it simply blocks forever.
I'm at a complete loss as to how to debug something like this.
You just have to return the converted body from within flatMap so that the chain transforms Mono<T> into T:
client.get().uri(ARCHIVE_URL).accept(MediaType.APPLICATION_OCTET_STREAM)
    .exchange()
    .flatMap { response ->
        response.bodyToMono(ByteArrayResource::class.java)
    }
    .map { archiveContentBytes ->
        archiveContentBytes.inputStream
    }
    .doOnSuccess { inputStream ->
        // here is your code to do anything with the inputStream
        val archiveContents = TarArchiveInputStream(inputStream)
    }
    .subscribe()