Why does binding `onEach` inside launch cause `MutableSharedFlow` events to be lost? - kotlin

This test passes:
@Test
fun passingTest() {
val emitter = MutableSharedFlow<Int>();
var total = 0;
val handler = emitter.onEach { // <--- onEach bound outside of `launch`
total += it;
println("-> $it : total: $total")
}
GlobalScope.launch {
handler.collect()
}
runBlocking {
for (i in 1..11) {
emitter.emit(1);
}
}
assert(total == 11);
}
However, if I move the .onEach into launch it doesn't work any more:
> expected:<11> but was:<0>
> Expected :11
> Actual :0
@Test
fun failingTest() {
val emitter = MutableSharedFlow<Int>();
var total = 0;
GlobalScope.launch {
emitter.onEach { // <--- Bind inside of launch
total += it;
println("-> $it : total: $total")
}.collect()
}
runBlocking {
for (i in 1..11) {
emitter.emit(1);
}
}
assertEquals(11, total);
}
What's going on? Why is my onEach handler getting lost when it's attached inside the launch scope?

So I tried both test cases on my own, logging the blocks of GlobalScope.launch and runBlocking.
GlobalScope.launch {
println("launch")
handler.collect()
}
runBlocking {
for (i in 1..11) {
println(i)
emitter.emit(1);
}
}
println(total)
This test case gives me different output on each run, e.g. "12345678910110", sometimes without even the "launch" line. Sometimes it works as expected; sometimes collection starts only after some values have already been emitted, so the value of total ends up less than 11.
launch
1
2
-> 1 : total: 1
3
-> 1 : total: 2
4
-> 1 : total: 3
5
-> 1 : total: 4
6
-> 1 : total: 5
7
-> 1 : total: 6
8
-> 1 : total: 7
9
-> 1 : total: 8
10
-> 1 : total: 9
11
-> 1 : total: 10
-> 1 : total: 11
11
At the same time, running the second test case several times never showed total greater than 0. I did notice that the GlobalScope code executes at an unpredictable time, so the launch output can appear before the runBlocking output or in the middle of it, but the onEach output is still absent.
I decided to log the collect call itself:
GlobalScope.launch {
println("launch")
emitter.onEach {
total += it;
println("-> $it : total: $total")
}.collect {
println("collecting")
}
}
And I noticed that the handler never gets called at all. Since the moment at which the GlobalScope.launch block runs does not seem to matter, my bet is that the scope in which the handler is attached matters.
onEach works like a listener, so in the first test case emitting inside runBlocking sometimes reaches it; when collect happens to be called at the right time, the values are collected.
In the second case the listening does not appear to happen across scopes, so you always end up with an empty flow. If you add one more GlobalScope.launch that emits something in the second case, it is processed by the handler and collected.
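For what it's worth, a MutableSharedFlow created with the default arguments has replay = 0, so a value emitted while no collector is subscribed is simply dropped. That alone can produce totals anywhere from 0 to 11, depending on when the launched coroutine gets to run. Below is a minimal sketch of one way to make the count deterministic, assuming it is acceptable to wait for the collector before emitting. The helper name countEvents and the take(11) cutoff are illustrative choices, not part of the original tests.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Hypothetical helper: emits 11 ones and returns what the collector summed up.
fun countEvents(): Int = runBlocking {
    val emitter = MutableSharedFlow<Int>() // replay = 0, no extra buffer
    var total = 0

    // take(11) lets the collector finish on its own after 11 values,
    // so we can join it instead of cancelling it mid-flight.
    val collector = launch(Dispatchers.Default) {
        emitter.take(11).onEach { total += it }.collect()
    }

    // Suspend until the collector has actually subscribed.
    emitter.subscriptionCount.first { it > 0 }

    repeat(11) { emitter.emit(1) }

    collector.join() // all 11 values have been processed once this returns
    total
}

fun main() {
    println(countEvents()) // 11
}
```

Here subscriptionCount is the StateFlow<Int> that MutableSharedFlow exposes. Waiting on it removes the race between launch and the first emit.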

Related

Kotlin - Debounce Only One Specific Value When Emitting from Flow

I have two flows that are being combined into a single flow. One of the flows has a backing data set that emits much faster than the other.
Flow A - emits every 200 ms
Flow B - emits every ~1s
The problem I am trying to fix is this one:
combine(flowA, flowB) { flowAValue, flowBValue -> // just booleans
flowAValue && flowBValue
}.collect {
if (it) {
doSomething()
}
}
Because Flow A emits extremely quickly, the boolean that's emitted can get cleared rapidly, which means that when flowB emits true, flowA already emitted true and the state is now false.
I've attempted something like:
suspend fun main() {
flowA.debounce {
if (it) {
1250L
} else {
0L
}
}.collect {
println(it)
}
}
But this doesn't work as sometimes the true values aren't emitted - inverting the conditional (so that if(true) = 0L else 1250L) also doesn't work. Basically what I'm looking for is that if flowA is true - hold that value for 1 second before changing values. Is something like that possible?
I used conflate() on the second flow, the one that is drastically faster, so that zip always takes the latest value from fastFlow once slowFlow is finally ready. If you don't conflate the fast flow, zip simply pairs values by position: first with first, second with second, and so on.
fun forFlow() = runTest {
val slowString = listOf("first", "second", "third", "fourth")
val slowFlow = flow {
slowString.forEach {
delay(100)
emit(it)
}
}
val fastFlow = flow {
(1 until 1000).forEach { num ->
delay(5)
emit(num)
}
}.conflate()
suspend fun zip() {
slowFlow.zip(fastFlow) { first, second -> "$first: $second" }
.collect {
println(it)
}
}
runBlocking {
zip()
}
println("Done!")
}
With conflate() on fastFlow:
first: 1
second: 15
third: 32
fourth: 49
Done!
Without conflate() on fastFlow:
first: 1
second: 2
third: 3
fourth: 4
Done!

Does collect on a Flow block execution?

I ran Code A and got Result A. I expected Result B.
It seems that flow.collect { value -> println(value) } blocks execution.
Does collecting a Flow block execution?
Code A
fun simple(): Flow<Int> = flow {
println("Flow started")
for (i in 1..3) {
delay(300)
emit(i)
}
}
fun main() = runBlocking<Unit> {
println("Calling simple function...")
val flow = simple()
println("Calling collect...")
flow.collect { value -> println(value) } //Block?
println("Calling collect again...")
}
Result A
Calling simple function...
Calling collect...
Flow started
1
2
3
Calling collect again...
Result B
Calling simple function...
Calling collect...
Flow started
Calling collect again...
1
2
3
BTW, I ran Code 1 and got Result 1, as I expected.
Code 1
fun simple(): Flow<Int> = flow {
for (i in 1..3) {
delay(100)
emit(i)
}
}
fun main() = runBlocking<Unit> {
launch {
for (k in 1..3) {
println("I'm not blocked $k")
delay(100)
}
}
simple().collect { value -> println(value) }
}
Result 1
I'm not blocked 1
1
I'm not blocked 2
2
I'm not blocked 3
3
Suspend functions do not block, but they are synchronous, meaning the execution of the code in the coroutine waits for the suspend function to return before continuing. The difference between suspend function calls and blocking function calls is that the thread is released to be used for other tasks while the coroutine is waiting for the suspend function to return.
collect is a suspend function that internally calls its lambda repeatedly and synchronously (suspending, not blocking) and doesn't return until the Flow is completed.
launch is an asynchronous function that starts a coroutine. It returns immediately without waiting for its coroutine to complete, which is why Code 1 behaved as you expected.
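To get something close to Result B, the collection has to run in its own coroutine. Below is a minimal sketch, assuming it is fine to record the lines in a list instead of printing them directly so that the order is easy to inspect. The helper name runDemo is hypothetical.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Hypothetical helper: returns the log lines in the order they were produced.
fun runDemo(): List<String> {
    val log = mutableListOf<String>()
    val simple = flow<Int> {
        log.add("Flow started")
        for (i in 1..3) {
            delay(100)
            emit(i)
        }
    }
    runBlocking<Unit> {
        log.add("Calling collect...")
        // collect still suspends, but only the launched coroutine waits for it.
        launch { simple.collect { value -> log.add(value.toString()) } }
        log.add("Calling collect again...")
    }
    return log
}

fun main() {
    runDemo().forEach { println(it) }
}
```

With a plain launch, "Flow started" is logged after "Calling collect again...", because the new coroutine only starts running once the outer coroutine finishes or suspends.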

Flows - Cloning a flow without multiple iteration - am I doing it right?

I am just starting to familiarize myself with Kotlin flows.
For this, I am using them to parse the contents of a binary file which I will simulate using the following flow:
fun testFlow() = flow {
println("Starting loop")
try {
for (i in 0..5) {
emit(i)
delay(100)
}
println("Loop has finished")
}
finally {
println("Finally")
}
}
Now, I need the file contents multiple times basically to extract different sets of information.
However, I don't want to read the file twice, but only once.
As there doesn't seem to be a built-in mechanism to clone / duplicate a flow, I developed the following helper function:
interface MultiConsumeBlock<T> {
suspend fun subscribe(): Flow<T>
}
suspend fun <T> Flow<T>.multiConsume(capacity: Int = DEFAULT_CONCURRENCY, scope: CoroutineScope? = null, block: suspend MultiConsumeBlock<T>.() -> Unit) {
val channel = buffer(capacity).broadcastIn(scope ?: CoroutineScope(coroutineContext))
val context = object : MultiConsumeBlock<T> {
override suspend fun subscribe(): Flow<T> {
val subscription = channel.openSubscription()
return flow { emitAll(subscription) }
}
}
try {
block(context)
} finally {
channel.cancel()
}
}
which I then use like this (think of the file analogy: flow a gets every record, flow b everything after the first 3 records (the "file header"), and flow c only the header):
fun main() = runBlocking {
val src = testFlow()
src.multiConsume {
val a = subscribe().map { it }
val b = subscribe().drop(3).map{ it + it}
val c = subscribe().take(3).map{ it * it}
mapOf("A" to a, "B" to b, "C" to c).map { task -> launch { task.value.collect{ println("${task.key}: $it")} } }.toList().joinAll()
}
}
Output:
Starting loop
A: 0
C: 1
A: 1
C: 2
A: 4
C: 3
A: 9
C: 4
A: 16
C: 5
B: 10
C: 6
B: 12
C: 7
B: 14
C: 8
B: 16
C: 9
B: 18
C: 10
B: 20
C: 11
Loop has finished
Finally
Which looks good so far.
However, I am unsure whether I am using Kotlin's flows correctly in this regard.
Am I opening myself up to deadlocks, missed exceptions, etc.?
The documentation just states:
All implementations of the Flow interface must adhere to two key properties described in detail below:
Context preservation.
Exception transparency.
But I am unsure if that's the case for my implementation or if I am missing something.
Or maybe there is a better way altogether?

Testing coroutines in Kotlin

I have this simple test about a crawler that is supposed to call the repo 40 times:
@Test
fun testX() {
// ...
runBlocking {
crawlYelp.concurrentCrawl()
// Thread.sleep(5000) // works if I un-comment
}
verify(restaurantsRepository, times(40)).saveAll(restaurants)
// ...
}
and this implementation:
suspend fun concurrentCrawl() {
cities.map { loc ->
1.rangeTo(10).map { start ->
GlobalScope.async {
val rests = scrapYelp.scrap(loc, start * 10)
restaurantsRepository.saveAll(rests)
}
}
}
}
But... I get this:
Wanted 40 times:
-> at ....testConcurrentCrawl(CrawlYelpTest.kt:46)
But was 30 times:
(the 30 is changing all the time; so it seems the test is not waiting...)
Why does it pass when I do the sleep? It should not be needed given I run blocking..
BTW, I have a controller that is supposed to be kept asynchronous:
@PostMapping("crawl")
suspend fun crawl(): String {
crawlYelp.concurrentCrawl()
return "crawling" // this is supposed to be returned right away
}
Thanks
runBlocking waits for all suspend functions called inside it to finish, but concurrentCrawl just starts new jobs with GlobalScope.async and then returns. So concurrentCrawl, and therefore runBlocking, is done once all the jobs have been started, not once all of them have finished.
You have to wait for all jobs started with GlobalScope.async to finish like this:
suspend fun concurrentCrawl() {
cities.map { loc ->
1.rangeTo(10).map { start ->
GlobalScope.async {
val rests = scrapYelp.scrap(loc, start * 10)
restaurantsRepository.saveAll(rests)
}
}.awaitAll()
}
}
If you want to wait for concurrentCrawl() to finish outside of concurrentCrawl() then you have to pass the Deferred results to the calling function like in the following example. In that case the suspend keyword can be removed from concurrentCrawl().
fun concurrentCrawl(): List<Deferred<Unit>> {
return cities.map { loc ->
1.rangeTo(10).map { start ->
GlobalScope.async {
println("hallo world $start")
}
}
}.flatten()
}
runBlocking {
concurrentCrawl().awaitAll()
}
As mentioned in the comments: In this case the async method does not return any value so it is better to use launch instead:
fun concurrentCrawl(): List<Job> {
return cities.map { loc ->
1.rangeTo(10).map { start ->
GlobalScope.launch {
println("hallo world $start")
}
}
}.flatten()
}
runBlocking {
concurrentCrawl().joinAll()
}
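If the caller should simply wait for every crawl job, structured concurrency may be a simpler alternative to collecting Deferred or Job objects by hand. Below is a minimal sketch, assuming the work can be started inside coroutineScope. The cities list, the saveCalls counter, and crawlAndCount are stand-ins for the real repository and scraper, which are not shown here.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-ins for the real cities list and repository.
val cities = listOf("Berlin", "Paris", "Rome", "Madrid")
val saveCalls = AtomicInteger(0)

// coroutineScope returns only after every child launched inside it has completed.
suspend fun concurrentCrawl(): Unit = coroutineScope {
    for (loc in cities) {
        for (start in 1..10) {
            launch(Dispatchers.Default) {
                delay(10)                   // simulated scraping of loc at offset start * 10
                saveCalls.incrementAndGet() // simulated saveAll(...)
            }
        }
    }
}

// Hypothetical helper: runs the crawl and reports how many saves happened.
fun crawlAndCount(): Int = runBlocking {
    saveCalls.set(0)
    concurrentCrawl()
    saveCalls.get()
}

fun main() {
    println(crawlAndCount()) // 40
}
```

Note that this version waits for all jobs, so a controller that must return right away would still have to start the crawl in a separate scope.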
You could also just use MockK for this (and so much more).
MockK's verify has a timeout : Long parameter specifically for handling these races in tests.
You could leave your production code as it is, and change your test to this:
import io.mockk.verify
@Test
fun `test X`() = runBlocking {
// ...
crawlYelp.concurrentCrawl()
verify(exactly = 40, timeout = 5000L) {
restaurantsRepository.saveAll(restaurants)
}
// ...
}
If the verify is successful at any point before 5 seconds, it'll pass and move on. Else, the verify (and test) will fail.

Kotlin coroutines progress counter

I'm making thousands of HTTP requests using async/await and would like to have a progress indicator. I've added one in a naive way, but noticed that the counter value never reaches the total when all requests are done. So I've created a simple test and, sure enough, it doesn't work as expected:
fun main(args: Array<String>) {
var i = 0
val range = (1..100000)
range.map {
launch {
++i
}
}
println("$i ${range.count()}")
}
The output is something like this, where the first number always changes:
98800 100000
I'm probably missing some important detail about concurrency/synchronization in JVM/Kotlin, but don't know where to start. Any tips?
UPDATE: I ended up using channels as Marko suggested:
/**
* Asynchronously fetches stats for all symbols and sends a total number of requests
* to the `counter` channel each time a request completes. For example:
*
* val counterActor = actor<Int>(UI) {
* var counter = 0
* for (total in channel) {
* progressLabel.text = "${++counter} / $total"
* }
* }
*/
suspend fun getAssetStatsWithProgress(counter: SendChannel<Int>): Map<String, AssetStats> {
val symbolMap = getSymbols()?.let { it.map { it.symbol to it }.toMap() } ?: emptyMap()
val total = symbolMap.size
return symbolMap.map { async { getAssetStats(it.key) } }
.mapNotNull { it.await().also { counter.send(total) } }
.map { it.symbol to it }
.toMap()
}
The explanation of what exactly makes your approach fail is secondary; the primary thing is fixing the approach.
Instead of async-await or launch, for this communication pattern you should instead have an actor to which all the HTTP jobs send their status. This will automatically handle all your concurrency issues.
Here's some sample code, taken from the link you provided in the comment and adapted to your use case. Instead of some third party asking it for the counter value and updating the GUI with it, the actor runs in the UI context and updates the GUI itself:
import kotlinx.coroutines.experimental.*
import kotlinx.coroutines.experimental.channels.*
import kotlin.system.*
import kotlin.coroutines.experimental.*
object IncCounter
fun counterActor() = actor<IncCounter>(UI) {
var counter = 0
for (msg in channel) {
updateView(++counter)
}
}
fun main(args: Array<String>) = runBlocking {
val counter = counterActor()
massiveRun(CommonPool) {
counter.send(IncCounter)
}
counter.close()
println("View state: $viewState")
}
// Everything below is mock code that supports the example
// code above:
val UI = newSingleThreadContext("UI")
fun updateView(newVal: Int) {
viewState = newVal
}
var viewState = 0
suspend fun massiveRun(context: CoroutineContext, action: suspend () -> Unit) {
val numCoroutines = 1000
val repeatActionCount = 1000
val time = measureTimeMillis {
val jobs = List(numCoroutines) {
launch(context) {
repeat(repeatActionCount) { action() }
}
}
jobs.forEach { it.join() }
}
println("Completed ${numCoroutines * repeatActionCount} actions in $time ms")
}
Running it prints
Completed 1000000 actions in 2189 ms
View state: 1000000
You're losing writes because ++i is not an atomic operation: the value has to be read, incremented, and then written back, and you have multiple threads reading and writing i at the same time. (If you don't provide launch with a context, it uses a thread pool by default.)
You lose 1 from your count every time two threads read the same value, as they will then both write that value plus one.
Synchronizing in some way, for example by using an AtomicInteger, solves this:
fun main(args: Array<String>) {
val i = AtomicInteger(0)
val range = (1..100000)
range.map {
launch {
i.incrementAndGet()
}
}
println("$i ${range.count()}") // 100000 100000
}
There's also no guarantee that these background threads will be done with their work by the time you print the result and your program ends - you can test it easily by adding just a very small delay inside launch, a couple milliseconds. With that, it's a good idea to wrap this all in a runBlocking call which will keep the main thread alive and then wait for the coroutines to all finish:
fun main(args: Array<String>) = runBlocking {
val i = AtomicInteger(0)
val range = (1..100000)
val jobs: List<Job> = range.map {
launch {
i.incrementAndGet()
}
}
jobs.forEach { it.join() }
println("$i ${range.count()}") // 100000 100000
}
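The snippets above use the early experimental API, where launch could be called without a scope. In current kotlinx.coroutines, launch is an extension on CoroutineScope, so a roughly equivalent sketch might look like the following. The helper name countWithAtomic is hypothetical, and Dispatchers.Default is chosen here to force real multi-threading.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper: increments a shared counter from n coroutines.
fun countWithAtomic(n: Int): Int = runBlocking {
    val counter = AtomicInteger(0)
    // coroutineScope suspends until all launched children are done,
    // so no explicit list of jobs is needed.
    coroutineScope {
        repeat(n) {
            launch(Dispatchers.Default) { counter.incrementAndGet() }
        }
    }
    counter.get()
}

fun main() {
    println(countWithAtomic(100_000)) // 100000
}
```

Replacing the AtomicInteger with a plain var here would bring back the lost updates described above.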
Have you read Coroutines basics? It describes exactly the same problem as yours:
val c = AtomicInteger()
for (i in 1..1_000_000)
launch {
c.addAndGet(i)
}
println(c.get())
This example completes in less than a second for me, but it prints some arbitrary number, because some coroutines don't finish before main() prints the result.
Because launch is not blocking, there's no guarantee that all of the coroutines will finish before println. You need to use async, store the Deferred objects, and await them.