coroutine execution problem some coroutines won't launch - kotlin

I'm trying to transform and flush some data to a database in a streaming application using spring and kotlin. performance constraints made me to use coroutine and chunk to execute the transformation and persist procedure faster.
The problem is the size of input data is not equal to the persisted data!
My Job Scheduled to run with a fixed delay:
#Scheduled(fixedDelay = 10_000)
fun flushToDb() {
// some operations
CoroutineScope(getDispatcherExec()).launch {
flush(oldStat, isFlushing)
}
}
and this is my suspend func:
private suspend fun flush(oldStat: Map<String, AppDailyStatsModel>, isFlushing: AtomicBoolean) {
logger.info("RedisFlush ${oldStat.size} starts")
val start = System.currentTimeMillis()
val count = AtomicLong(0)
val startCount = AtomicLong(0)
val newStat = AtomicLong(0)
val updateStat = AtomicLong(0)
val input = oldStat.values
coroutineScope {
input
.chunked(1000)
.forEach { models ->
launch {
try {
startCount.addAndGet(models.size.toLong())
// Transform
// Persist
count.addAndGet(models.size.toLong())
logger.info("Flushed ${count.get()}")
} catch (e: Exception) {
logger.error("PROBLEM IN SAVE ", e)
}
}
}
}
}
The issue is count and startCount are equal and they're less than input !!
any help would be much appreciated!

Related

How can I get a previous emission Kotlin Flow?

Let me use a simple image to illustrate what I want to get:
I don't want to use SharedFlow's replayCache to achieve this because if a new observer observes that SharedFlow, it will get 2 emissions instead of one latest emission.
Or if I write it in code:
val sharedFlow = MutableSharedFlow(replay = 1)
val theFlowThatIWant = sharedFlow.unknownOperator { … }
sharedFlow.emit(1)
sharedFlow.emit(2)
sharedFlow.collect {
println(it)
}
Expected output:
2
theFlowThatIWant.collect {
println(it)
}
Expected output:
1
We can create such operator by ourselves. We can generalize it to more items than only the last one and use circular buffer to keep postponed items:
suspend fun main() {
val f = flow {
repeat(5) {
println("Emitting $it")
emit(it)
delay(1000)
}
}
f.postponeLast()
.collect { println("Collecting $it") }
}
fun <T> Flow<T>.postponeLast(count: Int = 1): Flow<T> = flow {
val buffer = ArrayDeque<T>(count)
collect {
if (buffer.size == count) {
emit(buffer.removeFirst())
}
buffer.addLast(it)
}
}
Note that this solution never emits postponed items. If you like to emit them at the end, just add this after collect { }:
while (buffer.isNotEmpty()) {
emit(buffer.removeFirst())
}

Kotlin Flow: receiving values until emitting timeout

I want to collect specific amount of values from Flow until value emitting timeout happened. Unfortunately, there are no such operators, so I've tried to implement my own using debounce operator.
The first problem is that producer is too fast and some packages are skipped and not collected at all (they are in onEach of original packages flow, but not in onEach of second flow of merge in withNullOnTimeout).
The second problem - after taking last value according to amount argument orginal flow is closed, but timeout flow still alive and finally produce timeout after last value.
How can I solve this two problems?
My implementations:
suspend fun receive(packages: Flow<ByteArray>, amount: Int): ByteArray {
val buffer = ByteArrayOutputStream(blockSize.toInt())
packages
.take(10)
.takeUntilTimeout(100) // <-- custom timeout operator
.collect { pck ->
buffer.write(pck.data)
}
return buffer.toByteArray()
}
fun <T> Flow<T>.takeUntilTimeout(durationMillis: Long): Flow<T> {
require(durationMillis > 0) { "Duration should be greater than 0" }
return withNullOnTimeout(durationMillis)
.takeWhile { it != null }
.mapNotNull { it }
}
fun <T> Flow<T>.withNullOnTimeout(durationMillis: Long): Flow<T?> {
require(durationMillis > 0) { "Duration should be greater than 0" }
return merge(
this,
map<T, T?> { null }
.onStart { emit(null) }
.debounce(durationMillis)
)
}
This was what initially seemed obvious to me, but as Joffrey points out in the comments, it can cause an unnecessary delay before collection terminates. I'll leave this as an example of a suboptimal, oversimplified solution.
fun <T> Flow<T>.takeUntilTimeout(durationMillis: Long): Flow<T> = flow {
val endTime = System.currentTimeMillis() + durationMillis
takeWhile { System.currentTimeMillis() >= endTime }
.collect { emit(it) }
}
Here's an alternate idea I didn't test.
#Suppress("UNCHECKED_CAST")
fun <T> Flow<T>.takeUntilTimeout(durationMillis: Long): Flow<T> {
val signal = Any()
return merge(flow { delay(durationMillis); emit(signal) })
.takeWhile { it !== signal } as Flow<T>
}
How about:
fun <T> Flow<T>.takeUntilTimeout(timeoutMillis: Long) = channelFlow {
val collector = launch {
collect {
send(it)
}
close()
}
delay(timeoutMillis)
collector.cancel()
close()
}
Using a channelFlow allows you to spawn a second coroutine so you can count the time independently, and quite simply.

How to create a polling mechanism with kotlin coroutines?

I am trying to create a polling mechanism with kotlin coroutines using sharedFlow and want to stop when there are no subscribers and active when there is at least one subscriber. My question is, is sharedFlow the right choice in this scenario or should I use channel. I tried using channelFlow but I am unaware how to close the channel (not cancel the job) outside the block body. Can someone help? Here's the snippet.
fun poll(id: String) = channelFlow {
while (!isClosedForSend) {
try {
send(repository.getDetails(id))
delay(MIN_REFRESH_TIME_MS)
} catch (throwable: Throwable) {
Timber.e("error -> ${throwable.message}")
}
invokeOnClose { Timber.e("channel flow closed.") }
}
}
You can use SharedFlow which emits values in a broadcast fashion (won't emit new value until the previous one is consumed by all the collectors).
val sharedFlow = MutableSharedFlow<String>()
val scope = CoroutineScope(Job() + Dispatchers.IO)
var producer: Job()
scope.launch {
val producer = launch() {
sharedFlow.emit(...)
}
sharedFlow.subscriptionCount
.map {count -> count > 0}
.distinctUntilChanged()
.collect { isActive -> if (isActive) stopProducing() else startProducing()
}
fun CoroutineScope.startProducing() {
producer = launch() {
sharedFlow.emit(...)
}
}
fun stopProducing() {
producer.cancel()
}
First of all, when you call channelFlow(block), there is no need to close the channel manually. The channel will be closed automatically after the execution of block is done.
I think the "produce" coroutine builder function may be what you need. But unfortunately, it's still an experimental api.
fun poll(id: String) = someScope.produce {
invokeOnClose { Timber.e("channel flow closed.") }
while (true) {
try {
send(repository.getDetails(id))
// delay(MIN_REFRESH_TIME_MS) //no need
} catch (throwable: Throwable) {
Timber.e("error -> ${throwable.message}")
}
}
}
fun main() = runBlocking {
val channel = poll("hello")
channel.receive()
channel.cancel()
}
The produce function will suspended when you don't call the returned channel's receive() method, so there is no need to delay.
UPDATE: Use broadcast for sharing values across multiple ReceiveChannel.
fun poll(id: String) = someScope.broadcast {
invokeOnClose { Timber.e("channel flow closed.") }
while (true) {
try {
send(repository.getDetails(id))
// delay(MIN_REFRESH_TIME_MS) //no need
} catch (throwable: Throwable) {
Timber.e("error -> ${throwable.message}")
}
}
}
fun main() = runBlocking {
val broadcast = poll("hello")
val channel1 = broadcast.openSubscription()
val channel2 = broadcast.openSubscription()
channel1.receive()
channel2.receive()
broadcast.cancel()
}

Kotlin async processing with optional async dependencies

I have a function that conditionally fetches some data and runs some tasks concurrently on that data. Each task depends on different sets of data and I would like to avoid fetching the data that's not needed. Moreover, some of the data can have already been prefetched and given to the function. See the code I've come up with below.
suspend fun process(input: SomeInput, prefetchedDataX: DataX?, prefetchedDataY: DataY?) = coroutineScope {
val dataXAsync = lazy {
if (prefetchedDataX == null) {
async { fetchDataX(input) }
} else CompletableDeferred(prefetchedDataX)
}
val dataYAsync = lazy {
if (prefetchedDataY == null) {
async { fetchDataY(input) }
} else CompletableDeferred(prefetchedDataY)
}
if (shouldDoOne(input)) launch {
val (dataX, dataY) = awaitAll(dataXAsync.value, dataYAsync.value)
val modifiedDataX = modifyX(dataX)
val modifiedDataY = modifyY(dataY)
doOne(modifiedDataX, modifiedDataY)
}
if (shouldDoTwo(input)) launch {
val modifiedDataX = modifyX(dataXAsync.value.await())
doTwo(modifiedDataX)
}
if (shouldDoThree(input)) launch {
val modifiedDataY = modifyY(dataYAsync.value.await())
doThree(modifiedDataY)
}
}
Any improvements that could be made to this code? One, I don't like having to fakely wrap the prefetched data into a CompletableDeferred. Two, I don't like having to call modifyX, modifyY inside each task, I wish I could apply it at the fetching stage, but I haven't come up with a nice way to do that. Alternatively I could do
val modifiedDataXAsync = lazy {
async { modifyX(prefetchedDataX ?: fetchDataX(input)) }
}
but it feels wasteful to be spawning a new coroutine when the data is already prefetched. Am I over-optimizing?
How about this? This code is pretty similar to yours, I just simplified it a bit.
suspend fun process(input: SomeInput, prefetchedDataX: DataX?, prefetchedDataY: DataY?) = coroutineScope {
val modifiedDataX by lazy {
async { modifyX(prefetchedDataX ?: fetchDataX(input)) }
}
val modifiedDataY by lazy {
async { modifyY(prefetchedDataY ?: fetchDataY(input)) }
}
if (shouldDoOne(input)) launch {
val (dataX, dataY) = awaitAll(modifiedDataX, modifiedDataY)
doOne(dataX, dataY)
}
if (shouldDoTwo(input)) launch {
doTwo(modifiedDataX.await())
}
if (shouldDoThree(input)) launch {
doThree(modifiedDataY.await())
}
}

Fan-out / fan-in - closing result channel

I'm producing items, consuming from multiple co-routines and pushing back to resultChannel. Producer is closing its channel after last item.
The code never finishes as resultChannel is never being closed. How to detect and properly finish iteration so hasNext() return false?
val inputData = (0..99).map { "Input$it" }
val threads = 10
val bundleProducer = produce<String>(CommonPool, threads) {
inputData.forEach { item ->
send(item)
println("Producing: $item")
}
println("Producing finished")
close()
}
val resultChannel = Channel<String>(threads)
repeat(threads) {
launch(CommonPool) {
bundleProducer.consumeEach {
println("CONSUMING $it")
resultChannel.send("Result ($it)")
}
}
}
val iterator = object : Iterator<String> {
val iterator = resultChannel.iterator()
override fun hasNext() = runBlocking { iterator.hasNext() }
override fun next() = runBlocking { iterator.next() }
}.asSequence()
println("Starting interation...")
val result = iterator.toList()
println("finish: ${result.size}")
You can run a coroutine that awaits for the consumers to finish and then closes the resultChannel.
First, rewrite the code that starts the consumers to save the Jobs:
val jobs = (1..threads).map {
launch(CommonPool) {
bundleProducer.consumeEach {
println("CONSUMING $it")
resultChannel.send("Result ($it)")
}
}
}
And then run another coroutine that closes the channel once all the Jobs are done:
launch(CommonPool) {
jobs.forEach { it.join() }
resultChannel.close()
}