Flux from WebClient behaves differently than Flux from File.readLines - spring-webflux

I have two different sources of IDs I have to work with. One is a file, the other is a URL. When I create the Flux from the file's lines, I can work on it perfectly well. When I swap the Flux-creating function for the one that uses WebClient...get(), I get different results; the WebClient never gets called for some reason.
private Flux<String> retrieveIdListFromFile(String filename) {
    try {
        return Flux.fromIterable(Files.readAllLines(ResourceUtils.getFile(filename).toPath()));
    } catch (IOException e) {
        return Flux.error(e);
    }
}
Here's the WebClient part:
private Flux<String> retrieveIdList() {
    return client.get()
            .uri(uriBuilder -> uriBuilder.path("capdocuments_201811v2/selectRaw")
                    .queryParam("q", "-P_Id:[* TO *]")
                    .queryParam("fq", "DateLastModified:[2010-01-01T00:00:00Z TO 2016-12-31T00:00:00Z]")
                    .queryParam("fl", "id")
                    .queryParam("rows", "10")
                    .queryParam("wt", "csv")
                    .build())
            .retrieve()
            .bodyToFlux(String.class);
}
When I do a subscribe(System.out::println) on the WebClient's Flux, nothing happens. When I do a blockLast(), it works (the URL is called, data is returned). I don't understand why, what I'm doing wrong, or how to correct this.
With the Flux that originates from the file, even subscribe works fine. I sort of thought that Fluxes were interchangeable...
When I do a retrieveIdList().log().subscribe():
INFO [main] reactor.Flux.OnAssembly.1 | onSubscribe([Fuseable] FluxOnAssembly.OnAssemblySubscriber)
INFO [main] reactor.Flux.OnAssembly.1 | request(unbounded)
When I do the same with blockLast() instead of subscribe():
INFO [main] reactor.Flux.OnAssembly.1 | onSubscribe([Fuseable] FluxOnAssembly.OnAssemblySubscriber)
INFO [main] reactor.Flux.OnAssembly.1 | request(unbounded)
INFO [reactor-http-nio-4] reactor.Flux.OnAssembly.1 | onNext(id)
...

Judging from your question update, it seems that nothing is waiting on the processing to finish. I assume this is a batch or CLI application, and not a web application?
Assuming the following:
Flux<User> users = userService.fetchAll();
Calling blockLast on a Flux will trigger the processing and block until the result is there.
Calling subscribe on it will trigger the processing asynchronously; your logs show the subscriber requesting elements, but nothing more. This probably means that the JVM exits before any elements are published - nothing is waiting on the result.
If you're effectively writing a CLI/batch application rather than processing requests within a web application, you can block on the final reactive pipeline to get the result. If you wish to write that result to a file or send it to a different service, then you should compose on it with Reactor operators.
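For example, a minimal sketch of both options for a CLI-style entry point, reusing the userService from the snippet above (the downstreamService call is a hypothetical stand-in for whatever you compose on the result):
// Option 1: block until the whole pipeline completes (fine for a CLI/batch app).
List<User> users = userService.fetchAll()
        .collectList()
        .block();

// Option 2: keep composing with Reactor operators and only wait at the very end.
// downstreamService.send(user) is assumed to return a Mono<Void>.
userService.fetchAll()
        .flatMap(user -> downstreamService.send(user))
        .then()
        .block(); // without a terminal block (or a CountDownLatch), the JVM may exit before anything is emitted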

Related

Mutiny/Quarkus Mid-Output-Stream Failure Handling

I'm running a Quarkus server that streams large datasets to clients. While a dataset is being processed, an error can occur, and I'm unsure how best to handle the situation.
@GET
@Path("{fileName}/example")
@Produces(MediaType.APPLICATION_JSON)
fun example(@PathParam("fileName") fileName: String): Multi<Int> {
    return Multi.createFrom().iterable((0..10)).map { if (it != 4) it else throw IllegalArgumentException() }
}
Without any changes, this will stream "[1,2,3" and then stop without closing the connection (curl hangs). I can handle the issue with .onFailure().recoverWithCompletion(), but that closes the stream in a healthy fashion (resulting in [1,2,3]). Is there any way to close the connection but leave the response malformed? I need a way to communicate to downstream clients that the stream of data is not healthy.
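For reference, a rough Java sketch of the same endpoint with the onFailure().recoverWithCompletion() workaround described above; the range/map calls mirror the Kotlin snippet, and a standard Quarkus JAX-RS resource is assumed:
@GET
@Path("{fileName}/example")
@Produces(MediaType.APPLICATION_JSON)
public Multi<Integer> example(@PathParam("fileName") String fileName) {
    return Multi.createFrom().range(0, 11)
            .map(i -> {
                if (i == 4) {
                    throw new IllegalArgumentException("simulated mid-stream failure");
                }
                return i;
            })
            // Completes the stream cleanly on failure, so the client receives
            // well-formed (but truncated) JSON such as [0,1,2,3].
            .onFailure().recoverWithCompletion();
}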

Kafka Streams write an event back to the input topic

In my Kafka Streams app, I need to retry processing a message whenever a particular type of exception is thrown in the processing logic.
Rather than wrapping my logic in a RetryTemplate (I'm using Spring Boot), I'm considering simply writing the message back into the input topic; my assumption is that the message will be added to the back of the log in the appropriate partition and will eventually be re-processed.
I'm aware that this would mess up the ordering, and I'm okay with that.
My question is: would Kafka Streams have an issue when it encounters a message that was supposedly already processed in the past (I'm assuming Kafka Streams has a way of marking the messages it has processed, especially when exactly-once is enabled)?
Here is an example of the code I'm considering for this solution.
val branches = streamsBuilder.stream(inputTopicName)
    .mapValues { it -> myServiceObject.executeSomeLogic(it) }
    .branch(
        { _, value -> value is SuccessfulResult }, // success
        { _, value -> value is ErrorResult } // exception was thrown
    )
branches[0].to(outputTopicName)
branches[1].to(inputTopicName) // write them back to input as a way of retrying

Stop consuming from KafkaReceiver after a timeout

I have a common REST controller:
private final KafkaReceiver<String, Domain> receiver;

@GetMapping(produces = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Flux<Domain> produceFluxMessages() {
    return receiver.receive().map(ConsumerRecord::value)
            .timeout(Duration.ofSeconds(2));
}
What I am trying to achieve is to collect messages from a Kafka topic for a certain period of time and then stop consuming and consider the Flux completed. If I remove the timeout and open this in a browser, I get messages forever and the download never stops. With the timeout, consuming stops after 2 seconds, but I get an exception:
java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 2000ms in 'map' (and no fallback has been configured)
Is there a way to successfully complete Flux after timeout?
There are multiple overloads of the timeout() method - you're using the standard one, which throws an exception on timeout.
Instead, use the overloaded timeout method that lets you fall back to an empty default publisher:
timeout(Duration.ofSeconds(2), Mono.empty())
(Note that in the general case you could explicitly capture the TimeoutException and fall back to an empty publisher using onErrorResume(TimeoutException.class, e -> Mono.empty()), but that's much less preferable to using the option above where possible.)
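Applied to the controller from the question, that looks roughly like this (assuming the same receiver field and Domain type):
@GetMapping(produces = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Flux<Domain> produceFluxMessages() {
    return receiver.receive()
            .map(ConsumerRecord::value)
            // Fall back to an empty publisher on timeout so the Flux
            // completes normally instead of signalling TimeoutException.
            .timeout(Duration.ofSeconds(2), Mono.empty());
}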

Execute multiple downloads and wait for all to complete

I am currently working on an API service that allows one or more users to download one or more items from an S3 bucket and returns the contents to the user. While the downloading itself is fine, the time taken to download several files is pretty much 100-150 ms multiplied by the number of files.
I have tried a few approaches to speed this up: parallelStream() instead of stream() (which, given the number of simultaneous downloads, runs a serious risk of running out of threads), CompletableFutures, and even creating an ExecutorService, doing the downloads, and then shutting down the pool. Typically I would only want a few concurrent tasks per request, e.g. 5 at a time, to try to cut down on the number of active threads.
I have tried integrating Spring @Cacheable to store the downloaded files in Redis (the files are read-only); while this certainly cuts down response times (a few ms to retrieve a file compared to 100-150 ms), the benefit only applies once the file has previously been retrieved.
What is the best way to handle waiting on multiple async tasks to finish and then collect the results, considering that I don't want (and probably can't) have hundreds of threads opening HTTP connections and downloading all at once?
You're right to be concerned about tying up the common fork/join pool that parallel streams use by default, since I believe it is also used for other things, such as sort operations, outside of the Stream API. Rather than saturating the common fork/join pool with an I/O-bound parallel stream, you can create your own fork/join pool for the stream; see this question for how to create an ad hoc ForkJoinPool of the size you want and run a parallel stream in it.
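Something along these lines, using the question's s3Paths and mys3Downloader names and an example pool size of 5 (a sketch, not a drop-in implementation):
// Run the parallel stream inside a dedicated ForkJoinPool so the common pool
// is not tied up by the blocking S3 downloads.
ForkJoinPool downloadPool = new ForkJoinPool(5);
try {
    List<Path> downloadedFiles = downloadPool.submit(() ->
            s3Paths.parallelStream()
                    .map(s3Path -> mys3Downloader.downloadAndGetPath(s3Path))
                    .collect(Collectors.toList()))
            .get(); // blocks until the parallel stream has finished
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
} finally {
    downloadPool.shutdown();
}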
You could also create an ExecutorService with a fixed-size thread pool that would also be independent of the common fork/join pool, and would throttle the requests, using only the threads in the pool. It also lets you specify the number of threads to dedicate:
ExecutorService executor = Executors.newFixedThreadPool(MAX_THREADS_FOR_DOWNLOADS);
try {
    List<CompletableFuture<Path>> downloadTasks = s3Paths
            .stream()
            .map(s3Path -> CompletableFuture.supplyAsync(() -> mys3Downloader.downloadAndGetPath(s3Path), executor))
            .collect(Collectors.toList());

    // at this point, all requests are enqueued, and threads will be assigned as they become available
    executor.shutdown(); // stops accepting new tasks, does not interrupt threads;
                         // items in the queue will still get threads when available

    // wait for all downloads to complete
    CompletableFuture.allOf(downloadTasks.toArray(new CompletableFuture[downloadTasks.size()])).get();

    // at this point, all downloads are finished,
    // so it's safe to shut down the executor completely
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
} finally {
    executor.shutdownNow(); // important to call this when you're done with the executor
}
Following @Hank D's lead, you can encapsulate the creation of the executor service to ensure that you do, indeed, call ExecutorService::shutdownNow after using said executor:
private static <VALUE> VALUE execute(
        final int nThreads,
        final Function<ExecutorService, VALUE> function
) {
    final ExecutorService executorService = Executors.newFixedThreadPool(nThreads);
    try {
        // any exception thrown by function.apply propagates to the caller;
        // the finally block still shuts the pool down
        return function.apply(executorService);
    } finally {
        executorService.shutdownNow(); // important to call this when you're done with the executor service
    }
}

public static void main(final String... arguments) {
    // define variables
    final List<CompletableFuture<Path>> downloadTasks = execute(
            MAX_THREADS_FOR_DOWNLOADS,
            executor -> s3Paths
                    .stream()
                    .map(s3Path -> CompletableFuture.supplyAsync(
                            () -> mys3Downloader.downloadAndGetPath(s3Path),
                            executor
                    ))
                    .collect(Collectors.toList())
    );

    // use downloadTasks
}

What is causing EventStore to throw ConcurrencyException so easily?

Using JOliver EventStore 3.0, and just getting started with simple samples.
I have a simple pub/sub CQRS implementation using NServiceBus. A client sends commands on the bus, a domain server receives and processes the commands and stores events to the EventStore, which are then published on the bus by the EventStore's dispatcher. A read-model server then subscribes to those events to update the read model. Nothing fancy, pretty much by-the-book.
It is working, but even in simple tests I am getting lots of concurrency exceptions (intermittently) on the domain server when the event is stored to the EventStore. It properly retries, but sometimes it hits the 5-retry limit and the command ends up on the error queue.
Where could I start investigating to see what is causing the concurrency exception? I removed the dispatcher and just focused on storing events, and it has the same issue.
I'm using RavenDB for persistence of my EventStore. I'm not doing anything fancy, just this:
using (var stream = eventStore.OpenStream(entityId, 0, int.MaxValue))
{
    stream.Add(new EventMessage { Body = myEvent });
    stream.CommitChanges(Guid.NewGuid());
}
The stack trace for the exception looks like this:
2012-03-17 18:34:01,166 [Worker.14] WARN NServiceBus.Unicast.UnicastBus [(null)] <(null)> - EmployeeCommandHandler failed handling message.
EventStore.ConcurrencyException: Exception of type 'EventStore.ConcurrencyException' was thrown.
   at EventStore.OptimisticPipelineHook.PreCommit(Commit attempt) in c:\Code\public\EventStore\src\proj\EventStore.Core\OptimisticPipelineHook.cs:line 55
   at EventStore.OptimisticEventStore.Commit(Commit attempt) in c:\Code\public\EventStore\src\proj\EventStore.Core\OptimisticEventStore.cs:line 90
   at EventStore.OptimisticEventStream.PersistChanges(Guid commitId) in c:\Code\public\EventStore\src\proj\EventStore.Core\OptimisticEventStream.cs:line 168
   at EventStore.OptimisticEventStream.CommitChanges(Guid commitId) in c:\Code\public\EventStore\src\proj\EventStore.Core\OptimisticEventStream.cs:line 149
   at CQRSTest3.Domain.Extensions.StoreEvent(IStoreEvents eventStore, Guid entityId, Object evt) in C:\dev\test\CQRSTest3\CQRSTest3.Domain\Extensions.cs:line 13
   at CQRSTest3.Domain.ComandHandlers.EmployeeCommandHandler.Handle(ChangeEmployeeSalary message) in C:\dev\test\CQRSTest3\CQRSTest3.Domain\ComandHandlers\EmployeeCommandHandler.cs:line 55
I figured it out. I had to dig through the source code to find it, though; I wish this were better documented! Here's my new EventStore wireup:
EventStore = Wireup.Init()
    .UsingRavenPersistence("RavenDB")
    .ConsistentQueries()
    .InitializeStorageEngine()
    .Build();
I had to add .ConsistentQueries() in order for the Raven persistence provider to internally use WaitForNonStaleResults on the queries EventStore was making to Raven.
Basically, when I added a new event and then tried to add another before Raven had caught up with indexing, the stream revision was not up to date, and the second event would step on the first one.