Reactor - only keep/process the latest value for a slow consumer - Kotlin

In Reactor, when I have a quick producer but a slow consumer, and the values in the stream are like "snapshots", I would like the consumer to process only the latest value in the stream and drop the others. (For example, a consumer that shows the exchange value in a GUI vs. a producer that converts the exchange ticks into a Flux.)
The Flux#onBackpressureLatest() operator seems to be the right way to go.
I did some Googling and found some usage examples:
Flux.range(1, 30)
    .delayElements(Duration.ofMillis(500))
    .onBackpressureLatest()
    .delayElements(Duration.ofMillis(3000))
    .subscribe { println("got $it") }
This puts a manual delay after onBackpressureLatest(), which behaves more like Flux#sample(Duration) than like a genuinely slow consumer.
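For comparison, a sampling version (a sketch, not from the original post) would look like this, using Flux#sample(Duration) to poll the fast stream every 3 seconds:

// Sketch: sample the fast stream at a fixed interval instead of
// reacting to consumer demand (similar output, different mechanism)
Flux.range(1, 30)
    .delayElements(Duration.ofMillis(500))
    .sample(Duration.ofSeconds(3))
    .subscribe { println("got $it") }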
Internally, the delayElements(Duration) operator wraps a concatMap, so I converted this into:
Flux.range(1, 30)
    .delayElements(Duration.ofMillis(500))
    .onBackpressureLatest()
    .concatMap { Mono.just(it).subscribeOn(Schedulers.boundedElastic()) }
    .subscribe { item ->
        println("got $item")
        // simulate slow subscriber with sleep
        Thread.sleep(3000)
    }
This is like the answers provided in the question Latest overflow strategy with size 1 or any alternatives. However, it looks a bit weird. I don't understand why we need the concatMap(op) or flatMap(op, 1, 1) call to make onBackpressureLatest() work.
I tried the following (simplified) versions, but they do not work as expected. Why?
// not working try - 1
Flux.range(1, 30)
    .delayElements(Duration.ofMillis(500))
    .onBackpressureLatest()
    .publishOn(Schedulers.boundedElastic())
    .subscribe { item ->
        println("got $item")
        // simulate slow subscriber with sleep
        Thread.sleep(3000)
    }
// not working try - 2
Flux.range(1, 30)
    .delayElements(Duration.ofMillis(500))
    .onBackpressureLatest()
    .publishOn(Schedulers.boundedElastic())
    .subscribe(object : BaseSubscriber<Int>() {
        override fun hookOnSubscribe(subscription: Subscription) {
            // explicitly request 1
            subscription.request(1)
        }

        override fun hookOnNext(value: Int) {
            // simulate slow subscriber with sleep
            Thread.sleep(3000)
            println("got $value")
            // explicitly request 1
            request(1)
        }
    })

To answer my own question
When the consumer is slow and the producer is fast, they need to run on different scheduler threads. If they run on the same thread, the whole Flux chain operates in sync mode, and the consumer and producer proceed at the same pace on a single thread. So the following won't work:
// Not working: producer and consumer run synchronously on the same thread
Flux.range(1, 30)
    .delayElements(Duration.ofMillis(300))
    .onBackpressureLatest()
    .subscribe { item ->
        println("got $item")
        // simulate slow subscriber with sleep
        Thread.sleep(1000)
    }
So we need to switch schedulers after the producer to make sure the consumer runs on a different thread.
If we switch schedulers before .onBackpressureLatest() with .publishOn, the rest of the operator chain still runs on a single thread. We have merely started another thread and run the same synchronous Flux flow there, which is essentially the case above, so the following doesn't work either:
// Not working
// operator publishOn acts as the producer,
// and it runs synchronously on the same thread as the consumer
Flux.range(1, 30)
    .delayElements(Duration.ofMillis(300))
    .publishOn(Schedulers.boundedElastic()) // the following runs synchronously as before
    .onBackpressureLatest()
    .subscribe { item ->
        println("got $item")
        // simulate slow subscriber with sleep
        Thread.sleep(1000)
    }
If we put .publishOn(Schedulers.boundedElastic()) after .onBackpressureLatest(), it doesn't work either. The reason is that the 1-arg publishOn overload uses a default prefetch of Queues.SMALL_BUFFER_SIZE = 256, so it calls request(256) on subscription, signalling to .onBackpressureLatest() that the downstream can take 256 items. .onBackpressureLatest() therefore hands values straight through to .publishOn as they arrive (up to 256 of them), and the chain after .publishOn consumes them synchronously. So the following doesn't work as expected:
// Not working
// 1-arg publishOn has prefetch=256 by default,
// which signals a demand of 256 to onBackpressureLatest
Flux.range(1, 30)
    .delayElements(Duration.ofMillis(300))
    .onBackpressureLatest()
    .publishOn(Schedulers.boundedElastic()) // requests 256 upfront
    .subscribe { item ->
        println("got $item")
        // simulate slow subscriber with sleep
        Thread.sleep(1000)
    }
So what we need is for the operator chain after .onBackpressureLatest() to signal a demand of 1 whenever it is ready to process the next item, i.e., at the consumer's pace. We just need to call .publishOn with its second argument, the prefetch:
// Working
Flux.range(1, 30)
    .delayElements(Duration.ofMillis(300))
    .onBackpressureLatest()
    .publishOn(Schedulers.boundedElastic(), 1) // requests 1 when ready to process
    .subscribe { item ->
        println("got $item")
        // simulate slow subscriber with sleep
        Thread.sleep(1000)
    }
The two alternatives described in the question can replace the .publishOn line. They do the same two things: 1) switch scheduler threads, and 2) ensure a demand of 1 reaches onBackpressureLatest().
.concatMap { Mono.just(it).subscribeOn(Schedulers.boundedElastic()) }
.flatMap({ Mono.just(it).subscribeOn(Schedulers.boundedElastic()) }, 1, 1)
To rewrite this using BaseSubscriber<T>, we can write:
Flux.range(1, 30)
    .delayElements(Duration.ofMillis(300))
    .onBackpressureLatest()
    .subscribe(object : BaseSubscriber<Int>() {
        override fun hookOnSubscribe(subscription: Subscription) {
            request(1)
        }

        override fun hookOnNext(value: Int) {
            Mono.just(value)
                .subscribeOn(Schedulers.boundedElastic())
                .subscribe { item ->
                    println("got $item")
                    // simulate slow subscriber with sleep
                    Thread.sleep(1000)
                    request(1)
                }
        }
    })

Related

Kotlin Coroutines - Asynchronously consume a sequence

I'm looking for a way to keep a Kotlin sequence that can produce values very quickly from outpacing slower async consumers of its values. In the following code, if the async handleValue(it) cannot keep up with the rate at which the sequence produces values, the rate imbalance leads to buffering of produced values and eventual out-of-memory errors.
getSequence().map { async {
    handleValue(it)
}}
I believe this is a classic producer/consumer "back-pressure" situation, and I'm trying to understand how to use Kotlin coroutines to deal with it.
Thanks for any suggestions :)
Kotlin channels and flows can buffer the data a producer dispatches until the consumer/collector is ready to consume it.
But channels have some drawbacks that flows address. For instance, channels are hot streams:
The producer starts dispatching data whether or not a consumer is attached, which can introduce resource leaks.
As long as no consumer is attached, the producer stays stuck in a suspended state.
Flows, however, are cold streams; nothing is produced until there is something to consume.
To handle your query with Flows:
GlobalScope.launch {
    flow {
        // Producer
        for (item in getSequence()) emit(item)
    }.map { handleValue(it) }
        .buffer(10) // Optionally specify the buffer size
        .collect { // Collector
        }
}
For my own reference, and for anyone else this may help, here's how I eventually solved this using Channels - https://kotlinlang.org/docs/channels.html#channel-basics
A producer coroutine:
fun CoroutineScope.itemChannel(): ReceiveChannel<MyItem> {
    // produce is an extension on CoroutineScope, hence the receiver
    return produce {
        while (moreItems()) {
            send(nextItem()) // <-- suspends until the next 'receive()'
        }
    }
}
And a function to run multiple consumer coroutines, each reading off that channel:
fun itemConsumers() {
    runBlocking {
        val channel = itemChannel()
        repeat(numberOfConsumers) {
            launch {
                var more = true
                while (more) {
                    try {
                        val item = channel.receive()
                        // do stuff with item here...
                    } catch (ex: ClosedReceiveChannelException) {
                        more = false
                    }
                }
            }
        }
    }
}
The idea here is that the consumer receives from the channel within the coroutine, so the next receive() is not called until a consumer coroutine finishes handling the last item. This produces the desired back-pressure, as opposed to receiving from a sequence or flow on the main thread and then passing each item into a coroutine to be consumed. In that scenario there is no back-pressure from the receiver, since the receive happens in a different coroutine from the one where the received item is consumed.
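For contrast, here is a sketch of the no-back-pressure pattern that paragraph warns against (handle() is a hypothetical stand-in for the per-item work):

// No back-pressure: the receive loop is immediately ready for the next
// item, so a fast producer fans out unbounded concurrent consumers.
runBlocking {
    val channel = itemChannel()
    for (item in channel) {
        launch { handle(item) } // nothing slows the receive loop down
    }
}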

Flow - pause/resume flow

In RxJava there is the valve operator that allows to pause (and buffer) a flow and resumes the flow again (and also emit the buffered values as soon as it's resumed). It's part of the rx java extensions (https://github.com/akarnokd/RxJavaExtensions/blob/3.x/src/main/java/hu/akarnokd/rxjava3/operators/FlowableValve.java).
Is there something like this for kotlin flows?
My use case is that I want to observe a flow inside an activity and never lose an event (unlike, e.g., LiveData, which stops observing data while the activity is paused). So while the activity is paused, I want the flow to buffer observed values until the activity is resumed, and then emit them all as soon as the activity is resumed.
In other words: while the activity is created (until it is destroyed) I want to observe the flow, but I only want values emitted while the activity is active, and buffered while it is not active (but still created) until it becomes active again.
Is there something to solve this or has anyone ever written something to solve this?
A combination of Lifecycle.launchWhenX and a SharedFlow should do the trick. Here's a simple example using a flow that emits a number every second.
// In your ViewModel
class MainViewModel : ViewModel() {
    val numbers = flow {
        var counter = 0
        while (true) {
            emit(counter++)
            delay(1_000L)
        }
    }.shareIn(
        scope = viewModelScope,
        started = SharingStarted.Lazily
    )
}

// In your Fragment.onViewCreated()
viewLifecycleOwner.lifecycleScope.launchWhenStarted {
    viewModel.numbers
        .collect { number ->
            Log.d("asdf", "number: $number")
        }
}
This works because Lifecycle.launchWhenStarted pauses the coroutine when the Lifecycle enters a stopped state, rather than cancelling it. When your Lifecycle comes back to a started state after pausing, it'll collect everything that happened while in the stopped state.
I know it is an ugly solution, but it works fine for me:
fun main() {
    val flow = MutableSharedFlow<String>(extraBufferCapacity = 50, onBufferOverflow = BufferOverflow.DROP_OLDEST)
    val isOpened = AtomicBoolean()
    val startTime = System.currentTimeMillis()
    GlobalScope.launch(Executors.newSingleThreadExecutor().asCoroutineDispatcher()) {
        flow
            .transform { value ->
                while (isOpened.get().not()) { } // busy-wait while "closed"
                emit(value)
            }
            .collect {
                println("${System.currentTimeMillis() - startTime}: $it")
            }
    }
    Thread.sleep(1000)
    flow.tryEmit("First")
    Thread.sleep(1000)
    isOpened.set(true)
    flow.tryEmit("Second")
    isOpened.set(false)
    Thread.sleep(1000)
    isOpened.set(true)
    flow.tryEmit("Third")
    Thread.sleep(2000)
}
So you can set isOpened to false when your activity is paused and to true when it is resumed.
You can use lifecycleScope.launchWhenStarted
https://developer.android.com/kotlin/flow/stateflow-and-sharedflow#stateflow

Kotlin Coroutines - unlimited stream to fan out batches

I'm looking to implement a pipeline for processing an infinite stream of messages. I'm new to coroutines and trying to follow along with the docs but I'm not confident I'm doing the right thing.
My infinite stream is of batches of records, and I'd like to fan out the processing of each record to a coroutine, then wait for the batch to finish (to log stats and stuff) before continuing to the next batch.
                    -> process [record] \
source -> [records] -> process [record] -> [log batch stats]
                    -> process [record] /
       |------------------- while(true) -------------------|
What I had planned is to have two Channels: one for the infinite stream, and one for the intermediate records that fills up and empties on each batch.
runBlocking {
    val infinite: ReceiveChannel<List<Record>> = produce { send(source.getBatch()) }
    val records = Channel<Record>(Channel.Factory.UNLIMITED)
    while (true) {
        infinite.receive().forEach { records.send(it) }
        while (!records.isEmpty) {
            launch { process(records.receive()) }
        }
        // ??? Wait for jobs?
        logBatchStats()
    }
}
From googling, it seems that waiting for jobs is discouraged, plus I wasn't sure whether calling .map on a channel would actually receive messages in order to convert them to jobs:
records.map { record -> launch { process(record) } }
yields a Channel<Job>. It seems I can call .toList() on it to collapse it, but then I need to join the jobs? Again, Google suggested doing that with a parent job, but I'm not really sure how to do that with launch.
Anyway, very much a n00b to this.
Thanks for the help.
I don't see a reason to have two channels. You could iterate directly over the list of records. And you should use async instead of launch; then you can use await, or even better awaitAll, on the list of results.
// inside a CoroutineScope (e.g. runBlocking)
val infinite: ReceiveChannel<List<Record>> = produce { ... }
while (true) {
    val resultsDeferred = infinite.receive().map {
        async {
            process(it)
        }
    }
    val results = resultsDeferred.awaitAll()
    logBatchStats()
}

How can I know how many messages are waiting for Actor to process

I am using actor from Kotlin coroutines like this:
actor<Action>(CommonPool, 0, parent = job) {
    consumeEach { action ->
        when (action) {
            is Spin -> spin(action.id)
            is Done -> action.ack.complete(true)
        }
    }
}
For debugging purposes, I would like to know how many messages are waiting on the channel for processing. How is that possible?
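The thread has no answer here. One possible approach (a sketch under the assumption that you control every send site; Action, Spin, Done, and spin are the types and function from the question) is to track the count yourself, since Channel exposes no public queue size:

import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.channels.SendChannel
import kotlinx.coroutines.channels.actor
import kotlinx.coroutines.channels.consumeEach
import java.util.concurrent.atomic.AtomicInteger

// Sketch: count in-flight messages manually.
val pending = AtomicInteger(0)

fun CoroutineScope.spinActor(): SendChannel<Action> =
    actor<Action>(capacity = 50) {
        consumeEach { action ->
            pending.decrementAndGet() // message taken off the mailbox
            when (action) {
                is Spin -> spin(action.id)
                is Done -> action.ack.complete(true)
            }
        }
    }

// Use this instead of send() at every call site;
// pending.get() then approximates the mailbox depth.
suspend fun SendChannel<Action>.sendCounted(action: Action) {
    pending.incrementAndGet()
    send(action)
}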

Kotlin: withContext() vs Async-await

I have been reading the Kotlin docs, and if I understood correctly, the two Kotlin functions work as follows:
withContext(context): switches the context of the current coroutine; when the given block completes, the coroutine switches back to the previous context.
async(context): starts a new coroutine in the given context, and if we call .await() on the returned Deferred task, it suspends the calling coroutine and resumes when the block executing inside the spawned coroutine returns.
Now for the following two versions of code:
Version 1:
launch {
    block1()
    val returned = async(context) {
        block2()
    }.await()
    block3()
}
Version 2:
launch {
    block1()
    val returned = withContext(context) {
        block2()
    }
    block3()
}
In both versions, block1() and block3() execute in the default context (CommonPool?), whereas block2() executes in the given context.
The overall execution is synchronous, in block1() -> block2() -> block3() order.
The only difference I see is that version 1 creates another coroutine, whereas version 2 executes only one coroutine while switching context.
My questions are:
Isn't it always better to use withContext rather than async-await, since it is functionally similar but doesn't create another coroutine? Large numbers of coroutines, although lightweight, could still be a problem in demanding applications.
Is there a case where async-await is preferable to withContext?
Update:
Kotlin 1.2.50 now has a code inspection where it can convert async(ctx) { }.await() to withContext(ctx) { }.
Large number of coroutines, though lightweight, could still be a problem in demanding applications
I'd like to dispel this myth of "too many coroutines" being a problem by quantifying their actual cost.
First, we should disentangle the coroutine itself from the coroutine context to which it is attached. This is how you create just a coroutine with minimum overhead:
GlobalScope.launch(Dispatchers.Unconfined) {
    suspendCoroutine<Unit> {
        continuations.add(it)
    }
}
The value of this expression is a Job holding a suspended coroutine. To retain the continuation, we added it to a list in the wider scope.
I benchmarked this code and concluded that it allocates 140 bytes and takes 100 nanoseconds to complete. So that's how lightweight a coroutine is.
For reproducibility, this is the code I used:
fun measureMemoryOfLaunch() {
    val continuations = ContinuationList()
    val jobs = (1..10_000).mapTo(JobList()) {
        GlobalScope.launch(Dispatchers.Unconfined) {
            suspendCoroutine<Unit> {
                continuations.add(it)
            }
        }
    }
    (1..500).forEach {
        Thread.sleep(1000)
        println(it)
    }
    println(jobs.onEach { it.cancel() }.filter { it.isActive })
}

class JobList : ArrayList<Job>()
class ContinuationList : ArrayList<Continuation<Unit>>()
This code starts a bunch of coroutines and then sleeps so you have time to analyze the heap with a monitoring tool like VisualVM. I created the specialized classes JobList and ContinuationList because this makes it easier to analyze the heap dump.
To get a more complete story, I used the code below to also measure the cost of withContext() and async-await:
import kotlinx.coroutines.*
import java.util.concurrent.Executors
import kotlin.coroutines.suspendCoroutine
import kotlin.system.measureTimeMillis

const val JOBS_PER_BATCH = 100_000

var blackHoleCount = 0
val threadPool = Executors.newSingleThreadExecutor()!!
val ThreadPool = threadPool.asCoroutineDispatcher()

fun main(args: Array<String>) {
    try {
        measure("just launch", justLaunch)
        measure("launch and withContext", launchAndWithContext)
        measure("launch and async", launchAndAsync)
        println("Black hole value: $blackHoleCount")
    } finally {
        threadPool.shutdown()
    }
}

fun measure(name: String, block: (Int) -> Job) {
    print("Measuring $name, warmup ")
    (1..1_000_000).forEach { block(it).cancel() }
    println("done.")
    System.gc()
    System.gc()
    val tookOnAverage = (1..20).map { _ ->
        System.gc()
        System.gc()
        var jobs: List<Job> = emptyList()
        measureTimeMillis {
            jobs = (1..JOBS_PER_BATCH).map(block)
        }.also { _ ->
            blackHoleCount += jobs.onEach { it.cancel() }.count()
        }
    }.average()
    println("$name took ${tookOnAverage * 1_000_000 / JOBS_PER_BATCH} nanoseconds")
}

fun measureMemory(name: String, block: (Int) -> Job) {
    println(name)
    val jobs = (1..JOBS_PER_BATCH).map(block)
    (1..500).forEach {
        Thread.sleep(1000)
        println(it)
    }
    println(jobs.onEach { it.cancel() }.filter { it.isActive })
}

val justLaunch: (i: Int) -> Job = {
    GlobalScope.launch(Dispatchers.Unconfined) {
        suspendCoroutine<Unit> {}
    }
}

val launchAndWithContext: (i: Int) -> Job = {
    GlobalScope.launch(Dispatchers.Unconfined) {
        withContext(ThreadPool) {
            suspendCoroutine<Unit> {}
        }
    }
}

val launchAndAsync: (i: Int) -> Job = {
    GlobalScope.launch(Dispatchers.Unconfined) {
        async(ThreadPool) {
            suspendCoroutine<Unit> {}
        }.await()
    }
}
This is the typical output I get from the above code:
Just launch: 140 nanoseconds
launch and withContext : 520 nanoseconds
launch and async-await: 1100 nanoseconds
Yes, async-await takes about twice as long as withContext, but it's still just a microsecond. You'd have to launch them in a tight loop, doing almost nothing besides, for that to become "a problem" in your app.
Using measureMemory() I found the following memory cost per call:
Just launch: 88 bytes
withContext(): 512 bytes
async-await: 652 bytes
The cost of async-await is exactly 140 bytes higher than withContext, the number we got as the memory weight of one coroutine. This is just a fraction of the complete cost of setting up the CommonPool context.
If performance/memory impact was the only criterion to decide between withContext and async-await, the conclusion would have to be that there's no relevant difference between them in 99% of real use cases.
The real reason is that withContext() is a simpler and more direct API, especially in terms of exception handling:
An exception that isn't handled within async { ... } causes its parent job to get cancelled. This happens regardless of how you handle exceptions from the matching await(). If you haven't prepared a coroutineScope for it, it may bring down your entire application.
An exception not handled within withContext { ... } simply gets thrown by the withContext call; you handle it like any other exception.
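A minimal sketch of that difference (assuming plain kotlinx.coroutines; note that the failure inside async cancels the enclosing scope even though await() is wrapped in try/catch):

import kotlinx.coroutines.*

fun main() = runBlocking {
    // withContext: the exception is simply thrown to the caller.
    try {
        withContext(Dispatchers.Default) { error("boom") }
    } catch (e: IllegalStateException) {
        println("withContext: caught '${e.message}', scope unaffected")
    }

    // async: catching await() is not enough; the failed child also
    // cancels its parent scope, so coroutineScope rethrows at the end.
    try {
        coroutineScope {
            val d = async { error("boom") }
            try {
                d.await()
            } catch (e: IllegalStateException) {
                println("async: caught from await(), but the scope is cancelled")
            }
        }
    } catch (e: IllegalStateException) {
        println("async: coroutineScope rethrew '${e.message}'")
    }
}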
withContext also happens to be optimized, leveraging the fact that you're suspending the parent coroutine and awaiting on the child, but that's just an added bonus.
async-await should be reserved for those cases where you actually want concurrency, so that you launch several coroutines in the background and only then await on them. In short:
async-await-async-await — don't do that, use withContext-withContext
async-async-await-await — that's the way to use it.
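A compact sketch of that second pattern (fetchA/fetchB are hypothetical stand-ins for two independent slow tasks):

import kotlinx.coroutines.*

// Hypothetical stand-ins for two independent slow tasks.
suspend fun fetchA(): String { delay(1000); return "A" }
suspend fun fetchB(): String { delay(1000); return "B" }

// async-async-await-await: both children start before either is awaited,
// so the whole thing takes about 1 second instead of 2.
fun main() = runBlocking {
    val a = async { fetchA() }
    val b = async { fetchB() }
    println(a.await() to b.await()) // (A, B)
}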
Isn't it always better to use withContext rather than async-await as it is functionally similar, but doesn't create another coroutine. Large numbers of coroutines, though lightweight, could still be a problem in demanding applications
Is there a case async-await is more preferable to withContext
You should use async/await when you want to execute multiple tasks concurrently, for example:
runBlocking {
    val deferredResults = arrayListOf<Deferred<String>>()
    deferredResults += async {
        delay(1000)
        "1"
    }
    deferredResults += async {
        delay(1000)
        "2"
    }
    deferredResults += async {
        delay(1000)
        "3"
    }
    // wait for all results (at this point the tasks are running)
    val results = deferredResults.map { it.await() }
    // Or: val results = deferredResults.awaitAll()
    println(results)
}
If you don't need to run multiple tasks concurrently you can use withContext.
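For comparison, a sequential sketch of the same kind of task with withContext (one task at a time, result returned directly):

runBlocking {
    // the next line doesn't run until the block completes
    val result = withContext(Dispatchers.Default) {
        delay(1000)
        "1"
    }
    println(result) // 1
}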
When in doubt, remember this rule of thumb:
If multiple tasks have to happen in parallel and the final result depends on completion of all of them, then use async.
For returning the result of a single task, use withContext.