Mutiny/Quarkus mid-output-stream failure handling - Kotlin

I'm running a Quarkus server that streams large datasets to clients. An error can occur partway through processing a dataset, and I'm unsure how best to handle the situation.
@GET
@Path("{fileName}/example")
@Produces(MediaType.APPLICATION_JSON)
fun example(@PathParam("fileName") fileName: String): Multi<Int> {
    return Multi.createFrom().iterable(0..10)
        .map { if (it != 4) it else throw IllegalArgumentException() }
}
Without any changes, this streams "[0,1,2,3" and then stops without closing the connection (curl hangs). I can handle the failure with .onFailure().recoverWithCompletion(), but that closes the stream cleanly (resulting in [0,1,2,3]). Is there any way to close the connection but leave the response malformed? I need a way to communicate to downstream clients that the stream of data is not healthy.
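One possible approach, sketched below and not verified against every Quarkus version: once streaming has started, the HTTP status is already committed, so on failure you can abruptly close the underlying Vert.x connection, leaving the JSON array unterminated, and then recover with completion so the pipeline itself still terminates. The @Context injection of HttpServerRequest is an assumption here (RESTEasy Reactive supports it in the versions I'm aware of):

import io.smallrye.mutiny.Multi
import io.vertx.core.http.HttpServerRequest
import jakarta.ws.rs.GET
import jakarta.ws.rs.Path
import jakarta.ws.rs.PathParam
import jakarta.ws.rs.Produces
import jakarta.ws.rs.core.Context
import jakarta.ws.rs.core.MediaType

@GET
@Path("{fileName}/example")
@Produces(MediaType.APPLICATION_JSON)
fun example(
    @PathParam("fileName") fileName: String,
    @Context request: HttpServerRequest // assumed injectable with RESTEasy Reactive
): Multi<Int> {
    return Multi.createFrom().iterable(0..10)
        .map { if (it != 4) it else throw IllegalArgumentException() }
        // Close the TCP connection without ending the response: the client sees
        // "[0,1,2,3" followed by a dropped socket, i.e. a visibly broken stream.
        .onFailure().invoke { _: Throwable -> request.response().close() }
        // Recover with completion so the subscription itself does not hang.
        .onFailure().recoverWithCompletion()
}

A client reading this gets a truncated JSON document plus a connection reset instead of a clean end of array, which is the "not healthy" signal you're after.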

Related

Kafka Streams write an event back to the input topic

In my Kafka Streams app, I need to retry processing a message whenever a particular type of exception is thrown in the processing logic.
Rather than wrapping my logic in a RetryTemplate (I'm using Spring Boot), I'm considering simply writing the message back into the input topic; my assumption is that this message will be added to the back of the log in the appropriate partition and will eventually be re-processed.
I'm aware that this would mess up the ordering, and I'm okay with that.
My question is: would Kafka Streams have an issue when it encounters a message that was supposedly already processed in the past? (I'm assuming Kafka Streams has a way of marking the messages it has processed, especially when exactly-once is enabled.)
Here is an example of the code I'm considering for this solution.
val branches = streamsBuilder.stream(inputTopicName)
    .mapValues { value -> myServiceObject.executeSomeLogic(value) }
    .branch(
        Predicate { _, value -> value is SuccessfulResult }, // success
        Predicate { _, value -> value is ErrorResult }       // exception was thrown
    )
branches[0].to(outputTopicName)
branches[1].to(inputTopicName) // write them back to input as a way of retrying

Stop consuming from KafkaReceiver after a timeout

I have a simple REST controller:
private final KafkaReceiver<String, Domain> receiver;

@GetMapping(produces = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Flux<Domain> produceFluxMessages() {
    return receiver.receive()
        .map(ConsumerRecord::value)
        .timeout(Duration.ofSeconds(2));
}
What I am trying to achieve is to collect messages from a Kafka topic for a certain period of time and then stop consuming, considering the flux complete. If I remove the timeout and open this in a browser, I get messages forever and the download never stops. With the timeout, consuming stops after 2 seconds, but I get an exception:
java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 2000ms in 'map' (and no fallback has been configured)
Is there a way to successfully complete Flux after timeout?
There are multiple overloads of the timeout() method; you're using the standard one, which throws an exception on timeout.
Instead, use the overload that takes a fallback publisher and pass it an empty one:
timeout(Duration.ofSeconds(2), Mono.empty())
(Note that in the general case you could explicitly catch the TimeoutException and fall back to an empty publisher using onErrorResume(TimeoutException.class, e -> Mono.empty()), but that's much less preferable than using the fallback overload where possible.)
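Put together, a minimal sketch of the handler with the fallback in place (written in Kotlin here; KafkaReceiver and Domain are the types from the question):

import org.springframework.http.MediaType
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.web.bind.annotation.RestController
import reactor.core.publisher.Flux
import reactor.core.publisher.Mono
import reactor.kafka.receiver.KafkaReceiver
import java.time.Duration

@RestController
class DomainStreamController(private val receiver: KafkaReceiver<String, Domain>) {

    @GetMapping(produces = [MediaType.APPLICATION_STREAM_JSON_VALUE])
    fun produceFluxMessages(): Flux<Domain> =
        receiver.receive()
            .map { it.value() }
            // Fall back to an empty publisher after 2s of silence:
            // the Flux completes normally instead of erroring.
            .timeout(Duration.ofSeconds(2), Mono.empty())
}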

How to keep redis connection open when reading from reactive API

I am continuously listening on Redis streams using the Spring reactive API (with the Lettuce driver) over a standalone connection. It seems like the reactor event loop opens a new connection every time it reads messages instead of keeping the connection open. I see a lot of TIME_WAIT ports on my machine when I run my program. Is this normal? Is there a way to tell Lettuce to re-use the connection instead of reconnecting every time?
This is my code:
StreamReceiver<String, MapRecord<String, String, String>> receiver = StreamReceiver.create(factory);
return receiver
    .receive(Consumer.from(keyCacheStreamsConfig.getConsumerGroup(), keyCacheStreamsConfig.getConsumer()),
             StreamOffset.create(keyCacheStreamsConfig.getStreamName(), ReadOffset.lastConsumed()))
    // flatMap reads 256 messages by default and processes them on the given scheduler
    .flatMap(record -> Mono.fromCallable(() -> consumer.consume(record)).subscribeOn(Schedulers.boundedElastic()))
    .doOnError(t -> {
        log.error("Error processing.", t);
        streamConnections.get(nodeName).setDirty(true);
    })
    .onErrorContinue((err, elem) -> log.error("Error processing message. Continue listening."))
    .subscribe();
It looks like spring-data-redis re-uses the connection only if the poll timeout in the StreamReceiverOptions is set to 0 and the options are passed as the second argument to StreamReceiver.create(factory, options). I figured this out by reading the spring-data-redis source code.
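A minimal sketch of that configuration (Kotlin; factory is the ReactiveRedisConnectionFactory from the question):

import org.springframework.data.redis.connection.ReactiveRedisConnectionFactory
import org.springframework.data.redis.connection.stream.MapRecord
import org.springframework.data.redis.stream.StreamReceiver
import java.time.Duration

fun buildReceiver(factory: ReactiveRedisConnectionFactory): StreamReceiver<String, MapRecord<String, String, String>> {
    val options = StreamReceiver.StreamReceiverOptions.builder()
        // Duration.ZERO disables the poll timeout, so the receiver blocks on
        // one long-lived connection instead of reconnecting on every read.
        .pollTimeout(Duration.ZERO)
        .build()
    return StreamReceiver.create(factory, options)
}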

Execute multiple downloads and wait for all to complete

I am currently working on an API service that lets one or more users download one or more items from an S3 bucket and returns the contents to the user. While the downloading is fine, the time taken to download several files is pretty much 100-150 ms times the number of files.
I have tried a few approaches to speed this up: parallelStream() instead of stream() (which, given the number of simultaneous downloads, is at serious risk of running out of threads), CompletableFuture, and even creating an ExecutorService, doing the downloads, then shutting down the pool. Typically I would only want a few concurrent tasks per request, e.g. 5 at a time, to cut down on the number of active threads.
I have tried integrating Spring @Cacheable to store the downloaded files in Redis (the files are read-only); while this certainly cuts response times (several ms to retrieve a file compared to 100-150 ms), the benefit only applies once the file has already been retrieved. A sketch of this is shown below.
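For reference, a minimal sketch of that caching setup (Kotlin; S3Downloader and its fetch method are hypothetical stand-ins for the downloader, and a Redis-backed CacheManager is assumed to be configured):

import org.springframework.cache.annotation.Cacheable
import org.springframework.stereotype.Service

@Service
class CachedS3Service(private val s3Downloader: S3Downloader) { // S3Downloader is hypothetical

    // Results are keyed by s3Path; only the first request for a given file
    // pays the 100-150 ms download cost, later ones hit Redis.
    @Cacheable("s3Files")
    fun download(s3Path: String): ByteArray = s3Downloader.fetch(s3Path)
}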
What is the best way to wait on multiple async tasks to finish and then collect the results, also considering I don't want (and probably can't) have hundreds of threads opening HTTP connections and downloading all at once?
You're right to be concerned about tying up the common fork/join pool used by default by parallel streams, since I believe it is also used for other things, like sort operations, outside the Stream API. Rather than saturating the common fork/join pool with an I/O-bound parallel stream, you can create your own fork/join pool for the stream. See this question to find out how to create an ad hoc ForkJoinPool of the size you want and run a parallel stream in it.
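A minimal sketch of that idea (Kotlin; mys3Downloader is the asker's helper, and this relies on the common but implementation-specific behavior that a parallel stream submitted from inside a ForkJoinPool runs on that pool):

import java.nio.file.Path
import java.util.concurrent.Callable
import java.util.concurrent.ForkJoinPool
import java.util.stream.Collectors

fun downloadAll(s3Paths: List<String>): List<Path> {
    val pool = ForkJoinPool(5) // dedicated pool, sized independently of the common pool
    try {
        return pool.submit(Callable {
            s3Paths.parallelStream()
                .map { mys3Downloader.downloadAndGetPath(it) } // runs on the dedicated pool
                .collect(Collectors.toList())
        }).get()
    } finally {
        pool.shutdown() // release the pool's threads once the downloads finish
    }
}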
You could also create an ExecutorService with a fixed-size thread pool, which would likewise be independent of the common fork/join pool and would throttle the requests, using only the threads in the pool. It also lets you specify the number of threads to dedicate:
ExecutorService executor = Executors.newFixedThreadPool(MAX_THREADS_FOR_DOWNLOADS);
try {
    List<CompletableFuture<Path>> downloadTasks = s3Paths
        .stream()
        .map(s3Path -> CompletableFuture.supplyAsync(() -> mys3Downloader.downloadAndGetPath(s3Path), executor))
        .collect(Collectors.toList());
    // at this point, all requests are enqueued, and threads will be assigned as they become available
    executor.shutdown(); // stops accepting new tasks, does not interrupt threads;
                         // items in the queue will still get threads when available
    // wait for all downloads to complete
    CompletableFuture.allOf(downloadTasks.toArray(new CompletableFuture[0])).join();
    // at this point, all downloads are finished,
    // so it's safe to shut down the executor completely
} catch (CompletionException e) {
    // join() wraps any download failure in an unchecked CompletionException
    e.printStackTrace();
} finally {
    executor.shutdownNow(); // important to call this when you're done with the executor
}
Following @Hank D's lead, you can encapsulate the creation of the executor service to ensure that you do, indeed, call ExecutorService::shutdownNow after using it:
private static <VALUE> VALUE execute(
        final int nThreads,
        final Function<ExecutorService, VALUE> function) {
    final ExecutorService executorService = Executors.newFixedThreadPool(nThreads);
    try {
        return function.apply(executorService);
    } finally {
        // important to call this when you're done with the executor service
        executorService.shutdownNow();
    }
}
public static void main(final String... arguments) {
    // define variables
    final List<CompletableFuture<Path>> downloadTasks = execute(
        MAX_THREADS_FOR_DOWNLOADS,
        executor -> s3Paths
            .stream()
            .map(s3Path -> CompletableFuture.supplyAsync(
                () -> mys3Downloader.downloadAndGetPath(s3Path),
                executor))
            .collect(Collectors.toList()));
    // use downloadTasks
}

Detect TimeoutException on server side WCF

I have a WCF service with some operations that may take a long time.
The client receives a TimeoutException, but the server keeps executing after the long operation.
Server:
public void doSomeWork(TransmissionObject o) {
    doDBOperation1(o);
    doDBOperation2(o); // may result in a TimeoutException on the client
    doDBOperation3(o); // the server keeps doing DB operations; the client is unaware
}
Client:
ServiceReference.IServiceClient cli = new ServiceReference.IServiceClient(
    "WSHttpBinding_IService", "http://localhost:3237/Test/service.svc");
int size = 1000;
bool done = false;
TransmissionObject o = null;
while (!done) {
    o = createTransmissionObject(size);
    try {
        cli.doSomeWork(o);
        done = true;
    } catch (TimeoutException ex) {
        // We want to reduce the size of the object and try again.
        size--;
        // The DB operations on the server succeed, but the client doesn't know:
        // this causes errors.
    } catch (Exception ex) { ... }
}
Since the server is performing some DB operations, I need to detect the timeout on the server side to be able to rollback the DB operations.
I tried using transactions with [TransactionFlow], TransactionScope, etc., on the client side, but the DB operations on the server use stored procedures that are nested, so I cannot use distributed transactions (I receive a SqlException saying: Cannot Use SAVE TRANSACTION Within A Distributed Transaction). If I use simple, non-nested SPs, the transaction-based solution works fine.
My Question:
How can I detect the TimeoutException on the server side? I guess it's something related to the proxy status, or perhaps some event the server can capture.
I'm not sure that handling the transaction on the server side is the correct solution.
Is there a pattern to solve this problem?
Thanks!
Instead of waiting for an operation to time out, you may consider using asynchronous operations, as in this blog post: http://www.danrigsby.com/blog/index.php/2008/03/18/async-operations-in-wcf-event-based-model/
The idea is that you explicitly expect the operation to take some time; the server signals the client when the job is finished.