Kotlin flow - how to handle cancellation


I'm learning Kotlin coroutines and flows, and one thing is still a bit unclear to me. When I have a long-running loop in a regular coroutine, I can use isActive or ensureActive to handle cancellation. However, those are not defined for a flow, and yet the following code still finishes the flow properly:
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.runBlocking
import org.slf4j.LoggerFactory

private val logger = LoggerFactory.getLogger("Main")

fun main() {
    val producer = FlowProducer()
    runBlocking {
        producer
            .produce()
            .take(10)
            .collect {
                logger.info("Received $it")
            }
    }
    logger.info("done")
}

class FlowProducer {
    fun produce() = flow {
        try {
            var counter = 1
            while (true) {
                logger.info("Before emit")
                emit(counter++)
                logger.info("After emit")
            }
        } finally {
            logger.info("Producer has finished")
        }
    }.flowOn(Dispatchers.IO)
}
Why is that the case? Is it because emit is a suspending function that handles cancellation for me? What should I do if emit is called only conditionally? For example, my real loop polls records from Kafka and calls emit only when the received records are not empty. Then the following situation can occur:
We want 10 messages (take(10)).
There are only 10 messages on the Kafka topic.
Since there are no more messages, emit won't be called again, so even though we have received all the messages we want, the loop keeps wasting resources on unnecessary polling.
I'm not sure my understanding is correct. Should I call yield() on each loop iteration in that case?

The important thing to remember here is that flows are "cold", at least in their simple form. What that means is that a flow isn't capable of doing any work except while you are actively consuming data from it. A cold flow doesn't have a coroutine associated with it. You can learn a little more from this blog post by Roman Elizarov.
When you call collect on a flow, control is transferred from the collector to the flow. This is what enables the flow to do work. The collector is effectively executing the code inside the flow. When the flow calls emit, control transfers back to the collector. If you're familiar with Kotlin's sequence builder, you can think of flows very similarly.
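The control transfer described above can be observed with a small sketch, assuming kotlinx.coroutines is on the classpath. It deliberately leaves out flowOn, so the producer and the collector run in the same coroutine, and traceInterleaving() (a hypothetical helper) records the order in which each side runs.

```kotlin
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.runBlocking

// Records, in order, which side (producer or collector) is running.
fun traceInterleaving(): List<String> {
    val events = mutableListOf<String>()
    val numbers = flow {
        var counter = 1
        while (true) {
            events += "before emit $counter"
            emit(counter)                 // hands control to the collector
            events += "after emit $counter"
            counter++
        }
    }
    events += "flow created"              // nothing in the flow block has run yet (cold flow)
    runBlocking {
        numbers.take(2).collect { events += "collected $it" }
    }
    return events
}
```

Calling traceInterleaving() returns "flow created", "before emit 1", "collected 1", "after emit 1", "before emit 2", "collected 2". The producer's code starts only once collect is called, and "after emit 2" never appears, because take(2) ends the collection from inside the second emit call.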
By definition, this means that if you stop collecting the flow, the flow stops doing any work. In your case, because you used take(10), the collector will stop executing the flow once it has received ten items. Because the collector is the thing that's actually executing the loop inside the flow, the loop doesn't continue to run when the collector is no longer collecting. Once you stop using the flow, it's just like an iterator that's no longer being iterated over. It can be garbage collected like any other object.
You asked whether you should call yield() inside your flow. There are some situations where this could be useful, and you can read more about flow cancellation checks in the docs. In your case, it's not necessary, because:
The cancellation checks are only needed to detect when something has cancelled the coroutine that is executing the flow. When the flow aborts itself, such as when take(10) has emitted 10 items, it simply terminates normally, without cancelling any coroutines.
The flow is built using emit, which already checks for cancellation.
Even when cancellation checks aren't required, it's still possible to create a flow that runs forever. As mentioned above, control only transfers back to the collector each time the flow calls emit. So if your flow runs indefinitely without calling emit, it will never return control back to the collector. This is the same as writing an infinite loop in normal code, and isn't particularly special to flows.
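When a loop can go a long time without calling emit, and something outside cancels the coroutine that is collecting the flow, an explicit check is what lets the loop notice. A sketch under these assumptions: pollNothing() is a hypothetical blocking poll that never returns records, the collection runs on Dispatchers.Default and is cancelled after 100 ms, and currentCoroutineContext().ensureActive() is the check. Without it, cancelAndJoin() would never return, because the loop never reaches emit.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical blocking poll that never returns any records.
fun pollNothing(): List<Int> {
    Thread.sleep(5) // simulate a blocking poll with a short timeout
    return emptyList()
}

// Cancels a collector whose flow never emits; returns true once the loop has exited.
fun cancelQuietLoop(): Boolean = runBlocking {
    val finished = AtomicBoolean(false)
    val quiet = flow<Int> {
        try {
            while (true) {
                // The loop never reaches emit(), so this is its only cancellation check.
                currentCoroutineContext().ensureActive()
                for (record in pollNothing()) emit(record)
            }
        } finally {
            finished.set(true)
        }
    }
    val job = launch(Dispatchers.Default) { quiet.collect { } }
    delay(100)
    job.cancelAndJoin() // returns only because the loop observed cancellation
    finished.get()
}
```

yield() would also work in place of ensureActive(), since it checks for cancellation too, but it additionally gives up the thread on every iteration.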
Note that it is possible to create a hot flow that has a coroutine doing work in the background. In that case, you would need to make sure that the coroutine responds correctly to cancellation of the flow.
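A sketch of that hot case, using a MutableSharedFlow fed by a coroutine launched in its own CoroutineScope (both are illustrative choices, not taken from the answer). When the collector stops after three values, the producer keeps running; it stops only when its scope is cancelled, and it notices that at its next suspension point (delay or emit).

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Returns (values received, producer still running after the collector stopped,
// producer cancelled after its scope was cancelled).
fun hotFlowDemo(): Triple<Int, Boolean, Boolean> = runBlocking {
    val scope = CoroutineScope(Dispatchers.Default) // illustrative background scope
    val ticks = MutableSharedFlow<Int>()

    // Background producer: runs whether or not anyone is collecting.
    val producer = scope.launch {
        var n = 0
        while (true) {
            delay(10)       // suspension point where cancellation is observed
            ticks.emit(n++)
        }
    }

    val received = ticks.take(3).toList()
    val stillRunning = producer.isActive // the collector is gone, the producer is not

    scope.cancel() // stop the background work explicitly
    producer.join()
    Triple(received.size, stillRunning, producer.isCancelled)
}
```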

Yes, emit will throw CancellationException when take cancels the flow.
The Kafka example you give will actually work, because take will cancel the flow at the end of the 10th emit, not at the start of the 11th.
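A sketch of the conditional-emit loop from the question, with a hypothetical FakeConsumer in place of the Kafka consumer: it hands out scripted batches (some of them empty) and then only empty batches forever. The flow has no flowOn, so the polling loop runs in the collector's coroutine, and the exception thrown from the 10th emit ends it; consumer.polls then shows that no poll happens after the last wanted record.

```kotlin
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for a Kafka consumer: poll() returns the scripted batches
// (some of them empty) and then empty batches forever.
class FakeConsumer(private val batches: List<List<Int>>) {
    var polls = 0
        private set

    fun poll(): List<Int> = batches.getOrElse(polls++) { emptyList() }
}

// Collects 10 records; returns the records and whether the producer's finally ran.
fun takeTen(consumer: FakeConsumer): Pair<List<Int>, Boolean> = runBlocking {
    var finished = false
    val records = flow {
        try {
            while (true) {
                // emit is called only when a poll returns records
                for (record in consumer.poll()) emit(record)
            }
        } finally {
            finished = true
        }
    }.take(10).toList()
    records to finished
}
```

With flowOn(Dispatchers.IO), as in the question's code, the loop runs in a separate coroutine and learns about the end of collection through cancellation instead, so a loop that can go a long time without calling emit may then still need an explicit cancellation check.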

Related

Swapping between Schedulers

I have a blocking workload that I want to execute on the bounded elastic scheduler. After this work is done, a lot of work that could be executed on the parallel scheduler follows, but it will automatically continue to run on the thread from the bounded elastic scheduler.
When is it "correct" to drop the previous scheduler you set earlier in the chain? Is there ever a reason to do so if it's not strictly necessary, because of thread starvation, for example?
I can switch the scheduler of a chain by "breaking" the existing chain with flatMap, then, switchIfEmpty, and probably a few more methods. Example:
public Mono<Void> multipleSchedulers() {
    return blockingWorkload()
        .doOnSuccess(result -> log.info("Thread {}", Thread.currentThread().getName())) // Thread boundedElastic-1
        .subscribeOn(Schedulers.boundedElastic())
        .flatMap(result -> Mono.just(result)
            .subscribeOn(Schedulers.parallel()))
        .doOnSuccess(result -> log.info("Thread {}", Thread.currentThread().getName())); // Thread parallel-1
}

Generally, it is not bad practice to switch execution to another scheduler partway through your reactive chain.
To switch execution to another scheduler in the middle of your chain, you can use the publishOn() operator. Every operator after that publishOn() call then runs on a worker of the scheduler you passed to it.
Take a look at section 4.5.1 of the Reactor reference guide.
However, you should know clearly why you are doing it. Is there a specific reason in your case?
If you want to run a long computational process (CPU-bound work), it is recommended to execute it on Schedulers.parallel().
For making blocking calls, it is recommended to use Schedulers.boundedElastic().
However, for blocking calls we usually apply subscribeOn(Schedulers.boundedElastic()) directly to the blocking publisher.
Wrapping blocking calls in reactor
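A minimal Kotlin sketch of the publishOn() approach, assuming reactor-core is on the classpath. blockingWorkload() here is a hypothetical stand-in that sleeps, and runStages() returns the names of the threads that ran the blocking call and the later stage. subscribeOn(Schedulers.boundedElastic()) sits on the blocking source, and publishOn(Schedulers.parallel()) moves everything after it to the parallel scheduler, without breaking the chain with flatMap.

```kotlin
import reactor.core.publisher.Mono
import reactor.core.scheduler.Schedulers

// Hypothetical stand-in for the question's blocking workload.
fun blockingWorkload(): String {
    Thread.sleep(50) // simulate a blocking call
    return "result"
}

// Returns (thread that ran the blocking call, thread that ran the later stage).
fun runStages(): Pair<String, String> =
    Mono.fromCallable { blockingWorkload() to Thread.currentThread().name }
        .subscribeOn(Schedulers.boundedElastic()) // blocking source runs on boundedElastic
        .publishOn(Schedulers.parallel())         // everything below runs on parallel
        .map { (_, blockingThread) ->
            // CPU-bound work would go here
            blockingThread to Thread.currentThread().name
        }
        .block()!!
```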

Kotlin app with a lot of coroutines is locking up

I'm working on a backend application that has a unique use case. For each "entity," I have to poll 4 APIs every 5 seconds. I also have data flowing in from other sources (message queues, an internal API that I need to poll), so I decided that Kotlin coroutines with the actor design pattern were a decent approach.
I have 5 high-level coroutines, about 6 Kotlin coroutine "actors", and each "entity" is a coroutine that spawns child coroutines to perform the polling.
So essentially it looks like this:
A master coroutine spawns 5 high-level coroutines that help manage the state of the app.
The master coroutine then spawns a coroutine for each "entity" in the database.
Each entity coroutine spawns 4 child coroutines that perform the polling every 5 seconds.
Polling coroutines basically look like this:
while (isActive) {
    try {
        pollApi()
    } catch (e: Exception) {
        log.error(e)
    } finally {
        if (isActive) delay(5_000) // 5 seconds
    }
}
I performed my local testing with ~5 entities and everything works great. I then deployed to AWS and slowly added more entities (10 at a time), and I noticed that all coroutines seem to be locked up/suspended once I reached about 60 entities. I also noticed the exact same behavior when I launch the app with an immediate 25 entities.
Since there are a lot of API calls and delays going on here, I am using Dispatchers.IO for all of my coroutine scopes.
Any idea what might be going on here?
Thanks!
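A sketch of the structure described in the question (a master coroutine, one coroutine per entity, four polling children each); it is not a diagnosis of the lock-up. It assumes pollApi() suspends rather than blocking a thread; pollApi, launchEntity and runMaster are hypothetical names, and the periods are shortened so the sketch finishes quickly. If the real pollApi() blocks a thread instead, every in-flight call occupies one of the threads behind Dispatchers.IO, and that pool is limited (by default to 64 threads, or the number of CPU cores if that is larger).

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical non-blocking poll; a real version would call a suspending HTTP client.
suspend fun pollApi(counter: AtomicInteger) {
    delay(1) // stands in for network latency without occupying a thread
    counter.incrementAndGet()
}

// One "entity" coroutine with four child pollers.
fun CoroutineScope.launchEntity(entityId: Int, periodMs: Long, counter: AtomicInteger): Job =
    launch {
        repeat(4) { api ->
            launch {
                while (isActive) {
                    try {
                        pollApi(counter)
                    } catch (e: CancellationException) {
                        throw e // never swallow cancellation
                    } catch (e: Exception) {
                        println("entity $entityId, api $api: poll failed: $e")
                    }
                    delay(periodMs) // also a cancellation point
                }
            }
        }
    }

// The master coroutine launches every entity; cancelling it stops all pollers.
fun runMaster(entities: Int, periodMs: Long, runForMs: Long): Int = runBlocking {
    val counter = AtomicInteger()
    val master = launch(Dispatchers.IO) {
        repeat(entities) { id -> launchEntity(id, periodMs, counter) }
    }
    delay(runForMs)
    master.cancelAndJoin()
    counter.get()
}
```

Rethrowing CancellationException, instead of catching it as a plain Exception, keeps structured cancellation intact: cancelling the master reliably stops every poller.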

How does backpressure work in flatMap operator of Project Reactor?

Project Reactor has a concept of backpressure that is transparent to developers. I want to understand how it really works.
Let's use the following code block:
fun consumeMethod(data: Flux<String>) {
    data
        .flatMap { slowHttpCall(it) }
        .subscribe()
}
Is my understanding of the flow of execution correct?
When we call subscribe(), it requests the publisher to send ALL of the data.
Moving up to the flatMap, let's say it requests 32 elements from the publisher.
The publisher then sends 32 elements.
Moving down to flatMap again, it calls slowHttpCall() for all 32 elements without waiting for each HTTP call to complete, so right now we have 32 ongoing HTTP calls.
Let's stop here.
At this point, will flatMap request more elements from the publisher? Will it wait for all 32 HTTP calls to complete before requesting more? Or will it wait for one to complete and then request one more? How much will it request, and why?
Thank you

It will NOT wait until all in progress HTTP calls are completed. It will request new items as some of the in-progress ones complete.
flatMap has an overloaded version which lets you define a concurrency parameter (by default 256) which puts a limit on how many inner publishers can be in progress at most. If the number of in-progress publishers is less than the defined limit then flatMap will request additional items from the source publisher.
Now, the request rate doesn't seem to be consistent. Most of the time flatMap requests items one by one, sometimes it requests more. Probably a Reactor developer can provide more insight on that.
You can check the behavior of your exact use case by using the log operator:
Flux.range(1, 1000)
    .log()
    .flatMap { Mono.delay(Duration.ofMillis(Random.nextLong(1000, 3000))).thenReturn(it) }
    .log(null, Level.WARNING)
    .blockLast()
More on backpressure in this answer: https://stackoverflow.com/a/57298393/6051176
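To observe the concurrency limit directly, here is a Kotlin sketch, assuming reactor-core is on the classpath. slowCall() is a hypothetical stand-in for slowHttpCall() that completes after 50 ms and tracks how many calls are in flight at once; passing 4 as the second argument of flatMap should keep that number at 4 or below.

```kotlin
import reactor.core.publisher.Flux
import reactor.core.publisher.Mono
import java.time.Duration
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for slowHttpCall(): a Mono that completes after a delay.
// It counts how many calls are in flight at the same time.
fun slowCall(value: Int, inFlight: AtomicInteger, maxInFlight: AtomicInteger): Mono<Int> =
    Mono.delay(Duration.ofMillis(50))
        .thenReturn(value)
        .doOnSubscribe {
            val now = inFlight.incrementAndGet()
            maxInFlight.accumulateAndGet(now) { a, b -> maxOf(a, b) }
        }
        .doOnSuccess { inFlight.decrementAndGet() } // runs before flatMap sees the result

// Runs 20 slow calls through flatMap with the given concurrency limit.
fun maxConcurrentCalls(concurrency: Int): Int {
    val inFlight = AtomicInteger()
    val maxInFlight = AtomicInteger()
    Flux.range(1, 20)
        .flatMap({ slowCall(it, inFlight, maxInFlight) }, concurrency)
        .blockLast()
    return maxInFlight.get()
}
```

The counter is decremented in doOnSuccess rather than doFinally because doOnSuccess runs before the value reaches flatMap, so a finished call is no longer counted by the time flatMap requests the next element.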

Kotlin Channels usage difference between Send and Offer

Channels have two functions that allow us to send events into them: send and offer.
I would like to understand the difference between the two better.
I have some statements I want to check are true.
send is a suspend function, which makes my code (not the thread) wait for it to finish. Does my code keep running only after the event passed to send has been completed/cancelled, or is it suspended only until the event can be queued/received?
Does this mean that, if I use send from one channel to another, the first channel will be blocked until the second can receive/queue the event?
If I have a rendezvous channel that is already busy with something (suspended, for example, waiting for an API) and I offer a new event, will offer throw an exception because the channel is not receiving?
If you know any other main difference I would be glad to know.
Thanks in advance

send suspends the coroutine it is invoked from while the channel being sent to is full.
send does not send from one channel to another one. When you invoke send you are sending an element to the channel. The channel then expects another block of code to invoke receive from a different coroutine.
In a RendezvousChannel the capacity is 0. This means that send always suspends waiting for a receive invocation from another coroutine. If you have invoked send on a RendezvousChannel and then use offer, offer will not throw an exception (it only does if the channel is closed), but rather it will return false if no balancing receive has been invoked on the RendezvousChannel after your initial send. This is because offer tries to immediately add the element to the channel if it doesn't violate its capacity restrictions.
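A small sketch of the rendezvous behavior described above, assuming kotlinx.coroutines 1.5 or newer, where trySend replaced offer (it returns a ChannelResult instead of a Boolean). rendezvousTrySend() is a hypothetical helper.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel

// Returns (trySend succeeded with no receiver, trySend succeeded with a waiting receiver,
// value the receiver got).
fun rendezvousTrySend(): Triple<Boolean, Boolean, String> = runBlocking {
    val channel = Channel<String>() // default capacity is RENDEZVOUS (0)

    // No receiver is waiting, so the element cannot be handed over:
    // trySend reports failure instead of suspending or throwing.
    val withoutReceiver = channel.trySend("first").isSuccess

    // UNDISPATCHED runs the receiver immediately until it suspends in receive().
    val receiver = async(start = CoroutineStart.UNDISPATCHED) { channel.receive() }

    // A receiver is now suspended on the channel, so the hand-over succeeds at once.
    val withReceiver = channel.trySend("second").isSuccess

    Triple(withoutReceiver, withReceiver, receiver.await())
}
```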

Kotlin Coroutine - Keeping Channel Send Event Synchronous

I have a class which listens to events coming from a socket at a very fast pace. I would like to feed these events into a coroutine Channel. The following code is used:
class MyClass(private val channel: Channel<String>) : ... {
    ...
    override fun onMessageReceived(message: String) {
        MyScope.launch {
            channel.send(message)
        }
    }
}
This does not work, because the events sometimes arrive so fast that they end up being posted out of order: each launch spawns a new coroutine, and everything happens in parallel. How can I make sure the sends happen in order?
I tried newSingleThreadContext, which did work; however, it is considered experimental and has a note saying it will eventually be removed. I am looking for a solution that is more correct and complete.

Instead of launching the sends in parallel, you should use a Channel with a capacity of Channel.UNLIMITED, and have onMessageReceived use offer instead of send.
This is a lot cheaper than launching a new job for each send, and the channel will preserve the order of the messages.
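A minimal sketch of that approach, using trySend (the replacement for offer since kotlinx.coroutines 1.5). OrderedForwarder and forwardInOrder are hypothetical names, and a plain thread plays the role of the socket listener. Because every message goes into one channel from the caller's thread, in call order, the consumer receives them in that same order.

```kotlin
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.runBlocking

// Hypothetical listener: onMessageReceived is called on the socket's thread.
class OrderedForwarder {
    // UNLIMITED buffer: trySend never waits for space, so it only fails
    // once the channel has been closed.
    val channel = Channel<String>(Channel.UNLIMITED)

    fun onMessageReceived(message: String) {
        channel.trySend(message) // non-suspending; no coroutine is launched per message
    }
}

// Delivers `count` messages from a plain thread and returns them in the order received.
fun forwardInOrder(count: Int): List<String> = runBlocking {
    val forwarder = OrderedForwarder()
    val socketThread = Thread {
        repeat(count) { forwarder.onMessageReceived("message-$it") }
        forwarder.channel.close() // lets the consumer loop below finish
    }
    socketThread.start()

    val received = mutableListOf<String>()
    for (message in forwarder.channel) received += message
    socketThread.join()
    received
}
```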
