Spring WebFlux Webclient receiving an application/octet-stream file as a Mono - kotlin

I'm prototyping a small Spring WebFlux application in Kotlin. This application needs to GET a tar archive from a remote REST endpoint and store it locally on disk. Sounds simple.
I first created an integration test that starts the spring server and one other WebFlux server with a mock REST endpoint that serves the tar archive.
The test should go like:
1) app: GET mock-server/archive
2) mock-server: response with status 200 and tar archive in body as type attachment
3) app: block until all bytes received, then untar and use files
The problem I'm having is that when I try and collect the bytes into a ByteArray on the app, it blocks forever.
My mock-server/archive routes to the following function:
fun serveArchive(request: ServerRequest): Mono<ServerResponse> {
    val tarFile = FileSystemResource(ARCHIVE_PATH)
    assert(tarFile.exists() && tarFile.isFile && tarFile.contentLength() != 0L)
    return ServerResponse
        .ok()
        .contentType(MediaType.APPLICATION_OCTET_STREAM)
        .contentLength(tarFile.contentLength())
        .header("Content-Disposition", "attachment; filename=\"$ARCHIVE_FNAME\"")
        .body(fromResource(tarFile))
}
Then my app calls that with the following:
private fun retrieveArchive() {
    client.get().uri(ARCHIVE_URL).accept(MediaType.APPLICATION_OCTET_STREAM)
        .exchange()
        .flatMap { response ->
            storeArchive(response.bodyToMono())
        }.subscribe()
}
private fun storeArchive(archive: Mono<ByteArrayResource>): Mono<Void> {
    val archiveContentBytes = archive.block() // <- this blocks forever
    val archiveContents = TarArchiveInputStream(archiveContentBytes.inputStream)
    // read archive
}
I've seen "How to best get a byte array from a ClientResponse from Spring WebClient?", which is why I'm trying to use ByteArrayResource.
When I step through everything, I see that serveArchive seems to be working (the assert statement says the file I'm passing exists and there are some bytes in it). In retrieveArchive I get a 200 and can see all the appropriate information in the .headers (content-type, content-length all look good). When I get down to storeArchive and try to retrieve the bytes from the Mono using block, it simply blocks forever.
I'm at a complete loss as to how to debug something like this.

Don't call block() inside the flatMap: it most likely stalls the very thread that is supposed to deliver the body. Just return the converted body from the flatMap, so that the following operators receive the unwrapped T instead of a Mono<T>:
client.get().uri(ARCHIVE_URL).accept(MediaType.APPLICATION_OCTET_STREAM)
    .exchange()
    .flatMap { response ->
        response.bodyToMono(ByteArrayResource::class.java)
    }
    .map { archiveContentBytes ->
        archiveContentBytes.inputStream
    }
    .doOnSuccess { inputStream ->
        // here is your code to do anything with the inputStream
        val archiveContents = TarArchiveInputStream(inputStream)
    }
    .subscribe()

Related

How to observe Ktor download progress by a Flow

I want to observe the download progress through a Flow, so I wrote a function like this:
suspend fun downloadFile(file: File, url: String): Flow<Int> {
    val client = HttpClient(Android)
    return flow {
        val httpResponse: HttpResponse = client.get(url) {
            onDownload { bytesSentTotal, contentLength ->
                val progress = (bytesSentTotal * 100f / contentLength).roundToInt()
                emit(progress)
            }
        }
        val responseBody: ByteArray = httpResponse.receive()
        file.writeBytes(responseBody)
    }
}
But onDownload is called only once, and the file is not downloaded. If I remove emit(progress), it works.
Dependency: io.ktor:ktor-client-android:1.6.7
Use callbackFlow instead of flow. A regular flow can't launch background code, and can only emit values from code inside the flow itself. Meanwhile, a callback flow can launch other work in the background, and then receive callbacks from it.
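A minimal sketch of the callbackFlow approach, using only kotlinx.coroutines. Here fakeDownload is a hypothetical stand-in for the Ktor request with its onDownload listener (it is not a Ktor API); it reports progress from a different dispatcher, which is the situation a plain flow { } builder rejects. downloadProgress is likewise an illustrative name.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.channels.awaitClose
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.callbackFlow
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext
import kotlin.math.roundToInt

// Hypothetical stand-in for an HTTP call that reports progress through a
// callback running in another coroutine context.
suspend fun fakeDownload(
    totalBytes: Long,
    onProgress: suspend (received: Long, total: Long) -> Unit
): ByteArray = withContext(Dispatchers.Default) {
    val chunk = totalBytes / 4
    for (i in 1..4) onProgress(chunk * i, totalBytes)
    ByteArray(totalBytes.toInt())
}

fun downloadProgress(totalBytes: Long): Flow<Int> = callbackFlow {
    fakeDownload(totalBytes) { received, total ->
        // send() goes through a channel, so it may be called from the callback
        send((received * 100f / total).roundToInt())
    }
    close()        // the download is finished: complete the flow
    awaitClose { } // callbackFlow expects awaitClose as its last statement
}

fun main() = runBlocking {
    println(downloadProgress(400L).toList()) // [25, 50, 75, 100]
}
```

With real Ktor code, the call to fakeDownload would be replaced by the client.get(url) { onDownload { ... } } request from the question. channelFlow is a close relative that completes on its own once its block returns, so it does not need the final close()/awaitClose pair.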

Spring Webflux : using flatmap + Mono for downloading google cloud storage Blob or just map?

I want to fetch a bunch of pictures from a Google Cloud Storage bucket and convert them to Base64 in a WebFlux project.
public Flux<String> getImages() {
    final Storage storage = StorageOptions.newBuilder().setProjectId(PROJECT_ID).build().getService();
    Page<Blob> blobs = storage.list(BUCKET_NAME);
    long startMillis = System.currentTimeMillis();
    return Flux.fromIterable(blobs.iterateAll())
        .parallel()
        .runOn(Schedulers.boundedElastic())
        .map(blob -> blob.getContent(BlobSourceOption.generationMatch())) // blocking? HTTP request to Google storage (returns byte[])
        .map(PhotoUtils::processBytesToBase64) // non-blocking
        .sequential()
        .doOnTerminate(() -> LOGGER.info("ELAPSED {}", System.currentTimeMillis() - startMillis));
}
It works. I use BlockHound to check for blocking calls and nothing is detected. But I have read several times that any I/O operation (which blob.getContent() is) should be wrapped in a flatMap. So it would be something like:
...
return Flux.fromIterable(blobs.iterateAll())
    .parallel()
    .runOn(Schedulers.boundedElastic())
    .flatMap(blob -> Mono.just(blob.getContent(BlobSourceOption.generationMatch())))
    .map(PhotoUtils::processBytesToBase64)
    .sequential()
    .doOnTerminate(() -> LOGGER.info("ELAPSED {}", System.currentTimeMillis() - startMillis));
}
But that feels redundant to me.
Which one is more correct, or am I totally missing the point?

Why Flux.flatMap() doesn't wait for completion of inner publisher?

Could you please explain what exactly happens in the Flux/Mono returned by HttpClient.response()? I thought the value generated by the HTTP client would NOT be passed downstream until the Mono completes, but I see that tons of requests are generated, which ends in a reactor.netty.internal.shaded.reactor.pool.PoolAcquirePendingLimitException: Pending acquire queue has reached its maximum size of 8. It works as expected (items are processed one by one) if I replace the call to testRequest() with Mono.fromCallable { }.
What am I missing?
Test code:
import org.asynchttpclient.netty.util.ByteBufUtils
import reactor.core.publisher.Flux
import reactor.core.publisher.Mono
import reactor.netty.http.client.HttpClient
import reactor.netty.resources.ConnectionProvider
class Test {
    private val client = HttpClient.create(ConnectionProvider.create("meh", 4))
    fun main() {
        Flux.fromIterable(0..99)
            .flatMap { obj ->
                println("Creating request for: $obj")
                testRequest()
                    .doOnError { ex ->
                        println("Failed request for: $obj")
                        ex.printStackTrace()
                    }
                    .map { res ->
                        obj to res
                    }
            }
            .doOnNext { (obj, res) ->
                println("Created request for: $obj ${res.length} characters")
            }
            .collectList().block()!!
    }
    fun testRequest(): Mono<String> {
        return client.get()
            .uri("https://projectreactor.io/docs/netty/release/reference/index.html#_connection_pool")
            .responseContent()
            .reduce(StringBuilder(), { sb, buf ->
                val str = ByteBufUtils.byteBuf2String(Charsets.UTF_8, buf)
                sb.append(str)
            })
            .map { it.toString() }
    }
}
When you create the ConnectionProvider with ConnectionProvider.create("meh", 4), you get a connection pool with at most 4 connections and at most 8 pending acquire requests (see the connection pool section of the Reactor Netty reference documentation).
flatMap, according to its Javadoc, will "Transform the elements emitted by this Flux asynchronously into Publishers, then flatten these inner publishers into a single Flux through merging, which allow them to interleave."
So what happens is that you are trying to run all the requests simultaneously, and the requests that fit neither into the pool nor into the pending queue are rejected with the exception above.
You have two options:
1) If you want to keep flatMap, increase the maximum number of pending requests.
2) If you want to keep the number of pending requests, consider using, for example, concatMap instead of flatMap. Its Javadoc says: "Transform the elements emitted by this Flux asynchronously into Publishers, then flatten these inner publishers into a single Flux, sequentially and preserving order using concatenation."
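The difference between the operators can be sketched without a real HTTP client. In the sketch below, simulatedRequest is a hypothetical stand-in for testRequest(), and the function names are illustrative only. Besides concatMap, it also shows the flatMap overload that takes a concurrency argument; that overload is an addition not mentioned above, and it caps how many inner publishers are subscribed at once, so it could be set to the pool size.

```kotlin
import reactor.core.publisher.Flux
import reactor.core.publisher.Mono
import java.time.Duration

// Hypothetical stand-in for testRequest(): completes after a short delay.
fun simulatedRequest(id: Int): Mono<String> =
    Mono.delay(Duration.ofMillis(20)).map { "response $id" }

// flatMap with an explicit concurrency: at most 4 inner Monos are subscribed
// at any time; results may arrive in any order.
fun boundedFlatMap(): List<String> =
    Flux.range(0, 20)
        .flatMap({ id -> simulatedRequest(id) }, 4)
        .collectList()
        .block()!!

// concatMap: one inner Mono at a time, results in source order.
fun sequentialConcatMap(): List<String> =
    Flux.range(0, 20)
        .concatMap { id -> simulatedRequest(id) }
        .collectList()
        .block()!!

fun main() {
    println(boundedFlatMap().size)       // 20
    println(sequentialConcatMap().first()) // response 0
}
```

concatMap is the simplest choice when ordering matters and throughput does not; the bounded flatMap keeps some parallelism while staying within the pool limits.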

How to replace blocking code for reading bytes in Kotlin

I have a Ktor application that expects a file from a multipart request, in code like this:
multipart.forEachPart { part ->
    when (part) {
        is PartData.FileItem -> {
            image = part.streamProvider().readAllBytes()
        }
        else -> // irrelevant
    }
}
IntelliJ IDEA marks readAllBytes() as an inappropriate blocking call, since Ktor operates on top of coroutines. How can I replace this blocking call with an appropriate one?
Given the reputation of Ktor as a non-blocking, suspending IO framework, I was surprised that apparently for FileItem there is nothing else but the blocking InputStream API to retrieve it. Given that, your only option seems to be delegating to the IO dispatcher:
image = withContext(Dispatchers.IO) { part.streamProvider().readBytes() }
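As a self-contained sketch of that delegation, assuming only kotlinx.coroutines: readFully is a hypothetical helper (not part of Ktor), and the ByteArrayInputStream stands in for what part.streamProvider() would return.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext
import java.io.ByteArrayInputStream
import java.io.InputStream

// Hypothetical helper: performs the blocking read on the IO dispatcher and
// closes the stream afterwards.
suspend fun readFully(provider: () -> InputStream): ByteArray =
    withContext(Dispatchers.IO) {
        provider().use { it.readBytes() }
    }

fun main() = runBlocking {
    val bytes = readFully { ByteArrayInputStream("hello".toByteArray()) }
    println(bytes.size) // 5
}
```

The read is still blocking; withContext only moves it to a thread pool intended for blocking I/O, so the calling coroutine's thread stays free in the meantime.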

WebFlux: Only one item arriving at the backend

On the backend I'm doing:
@PostMapping(path = "/products", consumes = MediaType.APPLICATION_STREAM_JSON_VALUE)
public void saveProducts(@Valid @RequestBody Flux<Product> products) {
    products.subscribe(product -> log.info("product: " + product.toString()));
}
And on the frontend I'm calling this using:
this.targetWebClient
    .post()
    .uri(productUri)
    .accept(MediaType.APPLICATION_STREAM_JSON)
    .contentType(MediaType.APPLICATION_STREAM_JSON)
    .body(this.sourceWebClient
        .get()
        .uri(uriBuilder -> uriBuilder.path(this.sourceEndpoint + "/id")
            .queryParam("date", date)
            .build())
        .accept(MediaType.APPLICATION_STREAM_JSON)
        .retrieve()
        .bodyToFlux(Product.class), Product.class)
    .exchange()
    .subscribe();
What happens now is that I have 472 products that need to be saved, but only one of them is actually saved. The stream closes after the first one and I can't find out why.
If I do:
...
    .retrieve()
    .bodyToMono(Void.class);
instead, the request doesn't even arrive at the backend.
I also tried a fixed number of elements:
    .body(Flux.just(new Product("123"), new Product("321")...
With that, too, only the first one arrived.
EDIT
I changed the code:
@PostMapping(path = "/products", consumes = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Mono<Void> saveProducts(@Valid @RequestBody Flux<Product> products) {
    products.subscribe(product -> this.service.saveProduct(product));
    return Mono.empty();
}
and:
this.targetWebClient
    .post()
    .uri(productUri)
    .accept(MediaType.APPLICATION_STREAM_JSON)
    .contentType(MediaType.APPLICATION_STREAM_JSON)
    .body(this.sourceWebClient
        .get()
        .uri(uriBuilder -> uriBuilder.path(this.sourceEndpoint + "/id")
            .queryParam("date", date)
            .build())
        .accept(MediaType.APPLICATION_STREAM_JSON)
        .retrieve()
        .bodyToFlux(Product.class), Product.class)
    .exchange()
    .block();
That led to one product being saved twice (because the backend endpoint was called twice), but again only a single item arrived. We also got an error on the frontend side:
IOException: Connection reset by peer
Same for:
...
.retrieve()
.bodyToMono(Void.class)
.subscribe();
Doing the following:
this.targetWebClient
    .post()
    .uri(productUri)
    .accept(MediaType.APPLICATION_STREAM_JSON)
    .contentType(MediaType.APPLICATION_STREAM_JSON)
    .body(this.sourceWebClient
        .get()
        .uri(uriBuilder -> uriBuilder.path(this.sourceEndpoint + "/id")
            .queryParam("date", date)
            .build())
        .accept(MediaType.APPLICATION_STREAM_JSON)
        .retrieve()
        .bodyToFlux(Product.class), Product.class)
    .retrieve();
leads to the backend again not being called at all.
The Reactor documentation does say that nothing happens until you subscribe, but that doesn't mean you should call subscribe in your Spring WebFlux code.
Here are a few rules you should follow in Spring WebFlux:
1) If you need to do something in a reactive fashion, the return type of your method should be Mono or Flux.
2) Within a method returning a reactive type, you should never call block, subscribe, toIterable, or any other method that doesn't itself return a reactive type.
3) You should never do I/O-related work as a side effect in doOnXYZ operators; they're not meant for that, and doing so will cause issues at runtime.
In your case, your backend should use a reactive repository to save your data and should look like:
@PostMapping(path = "/products", consumes = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Mono<Void> saveProducts(@Valid @RequestBody Flux<Product> products) {
    return productRepository.saveAll(products).then();
}
In this case, the Mono<Void> return type means that your controller won't return anything as a response body, but will still signal when it's done processing the request. This might explain the behavior you're seeing: with your original code, not all products have been saved in the database by the time the controller is done processing the request.
Also, remember the rules noted above. Depending on where your targetWebClient is used, calling .subscribe(); on it might not be the solution. If it's a test method that returns void, you might want to call block on it and get the result to test assertions on it. If this is a component method, then you should probably return a Publisher type as a return value.
EDIT:
@PostMapping(path = "/products", consumes = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Mono<Void> saveProducts(@Valid @RequestBody Flux<Product> products) {
    products.subscribe(product -> this.service.saveProduct(product));
    return Mono.empty();
}
Doing this isn't right:
1) Calling subscribe decouples the processing of the request/response from that saveProduct operation. It's like starting that processing in a different executor.
2) Returning Mono.empty() signals to Spring WebFlux that you're done with the request processing right away. So Spring WebFlux will close and clean up the request/response resources, but your saveProduct process is still running and won't be able to read from the request, since Spring WebFlux has already closed and cleaned it up.
As suggested in the comments, you can wrap blocking operations with Reactor (even though it's not advised and you may encounter performance issues) and make sure that you're connecting all the operations in a single reactive pipeline.
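One common way to wrap a blocking call is Mono.fromCallable combined with subscribeOn(Schedulers.boundedElastic()). The sketch below is written in Kotlin and uses only reactor-core; BlockingProductService is a hypothetical stand-in for this.service, and products are plain strings to keep the example small. It is one possible shape under those assumptions, not the only one.

```kotlin
import reactor.core.publisher.Flux
import reactor.core.publisher.Mono
import reactor.core.scheduler.Schedulers
import java.util.concurrent.CopyOnWriteArrayList

// Hypothetical blocking service standing in for this.service.saveProduct(product).
class BlockingProductService {
    val saved = CopyOnWriteArrayList<String>()
    fun saveProduct(product: String) {
        Thread.sleep(10) // simulates blocking I/O
        saved.add(product)
    }
}

// Everything stays in one pipeline: the returned Mono<Void> completes only
// after every product has been saved, and nothing calls subscribe() or block().
fun saveProducts(products: Flux<String>, service: BlockingProductService): Mono<Void> =
    products
        .flatMap { product ->
            Mono.fromCallable { service.saveProduct(product) }
                .subscribeOn(Schedulers.boundedElastic()) // run the blocking call off the event loop
        }
        .then()

fun main() {
    val service = BlockingProductService()
    // block() is acceptable here only because main() is not a reactive method
    saveProducts(Flux.just("123", "321"), service).block()
    println(service.saved.size) // 2
}
```

In a controller, the Mono<Void> would be returned to Spring WebFlux, which subscribes to it and keeps the request open until the saves are done.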