Flows - Cloning a flow without multiple iteration - am I doing it right? - kotlin

I am just starting to familiarize myself with Kotlin flows.
For this, I am using them to parse the contents of a binary file which I will simulate using the following flow:
fun testFlow() = flow {
println("Starting loop")
try {
for (i in 0..5) {
emit(i)
delay(100)
}
println("Loop has finished")
}
finally {
println("Finally")
}
}
Now, I need the file contents multiple times basically to extract different sets of information.
However, I don't want to read the file twice, but only once.
As there doesn't seem to be a built-in mechanism to clone / duplicate a flow, I developed the following helper function:
interface MultiConsumeBlock<T> {
suspend fun subscribe(): Flow<T>
}
suspend fun <T> Flow<T>.multiConsume(capacity: Int = DEFAULT_CONCURRENCY, scope: CoroutineScope? = null, block: suspend MultiConsumeBlock<T>.() -> Unit) {
val channel = buffer(capacity).broadcastIn(scope ?: CoroutineScope(coroutineContext))
val context = object : MultiConsumeBlock<T> {
override suspend fun subscribe(): Flow<T> {
val subscription = channel.openSubscription()
return flow { emitAll(subscription) }
}
}
try {
block(context)
} finally {
channel.cancel()
}
}
which I then use like this (think about the analogy to the file: flow a gets every record, flow b only the first 3 records (="file header") and flow c everything after the header):
fun main() = runBlocking {
val src = testFlow()
src.multiConsume {
val a = subscribe().map { it }
val b = subscribe().drop(3).map{ it + it}
val c = subscribe().take(3).map{ it * it}
mapOf("A" to a, "B" to b, "C" to c).map { task -> launch { task.value.collect{ println("${task.key}: $it")} } }.toList().joinAll()
}
}
Output:
Starting loop
A: 0
C: 1
A: 1
C: 2
A: 4
C: 3
A: 9
C: 4
A: 16
C: 5
B: 10
C: 6
B: 12
C: 7
B: 14
C: 8
B: 16
C: 9
B: 18
C: 10
B: 20
C: 11
Loop has finished
Finally
Which looks good so far.
However, am I am unsure if I am using Kotlin's flows correctly in this regard.
Am I opening myself up for Deadlocks, missed Exceptions etc.?
The documentation just states:
All implementations of the Flow interface must adhere to two key properties described in detail below:
Context preservation.
Exception transparency.
But I am unsure if that's the case for my implementation or if I am missing something.
Or maybe there is a better way alltogether?

Related

Implement backoff strategy in flow

I'm trying to implement a backoff strategy just using kotlin flow.
I need to fetch data from timeA to timeB
result = dataBetween(timeA - timeB)
if the result is empty then I want to increase the end time window using exponential backoff
result = dataBetween(timeA - timeB + exponentialBackOffInDays)
I was following this article which is explaining how to approach this in rxjava2.
But got stuck at a point where flow does not have takeUntil operator yet.
You can see my implementation below.
fun main() {
runBlocking {
(0..8).asFlow()
.flatMapConcat { input ->
// To simulate a data source which fetches data based on a time-window start-date to end-date
// available with in that time frame.
flow {
println("Input: $input")
if (input < 5) {
emit(emptyList<String>())
} else { // After emitting this once the flow should complete
emit(listOf("Available"))
}
}.retryWhenThrow(DummyException(), predicate = {
it.isNotEmpty()
})
}.collect {
//println(it)
}
}
}
class DummyException : Exception("Collected size is empty")
private inline fun <T> Flow<T>.retryWhenThrow(
throwable: Throwable,
crossinline predicate: suspend (T) -> Boolean
): Flow<T> {
return flow {
collect { value ->
if (!predicate(value)) {
throw throwable // informing the upstream to keep emitting since the condition is met
}
println("Value: $value")
emit(value)
}
}.catch { e ->
if (e::class != throwable::class) throw e
}
}
It's working fine except even after the flow has a successful value the flow continue to collect till 8 from the upstream flow but ideally, it should have stopped when it reaches 5 itself.
Any help on how I should approach this would be helpful.
Maybe this does not match your exact setup but instead of calling collect, you might as well just use first{...} or firstOrNull{...}
This will automatically stop the upstream flows after an element has been found.
For example:
flowOf(0,0,3,10)
.flatMapConcat {
println("creating list with $it elements")
flow {
val listWithElementCount = MutableList(it){ "" } // just a list of n empty strings
emit(listWithElementCount)
}
}.first { it.isNotEmpty() }
On a side note, your problem sounds like a regular suspend function would be a better fit.
Something like
suspend fun getFirstNonEmptyList(initialFrom: Long, initialTo: Long): List<Any> {
var from = initialFrom
var to = initialTo
while (coroutineContext.isActive) {
val elements = getElementsInRange(from, to) // your "dataBetween"
if (elements.isNotEmpty()) return elements
val (newFrom, newTo) = nextBackoff(from, to)
from = newFrom
to = newTo
}
throw CancellationException()
}

GroupBy operator for Kotlin Flow

I am trying to switch from RxJava to Kotlin Flow. Flow is really impressive. But Is there any operator similar to RxJava's "GroupBy" in kotlin Flow right now?
As of Kotlin Coroutines 1.3, the standard library doesn't seem to provide this operator. However, since the design of Flow is such that all operators are extension functions, there is no fundamental distinction between the standard library providing it and you writing your own.
With that in mind, here are some of my ideas on how to approach it.
1. Collect Each Group to a List
If you just need a list of all items for each key, use this simple implementation that emits pairs of (K, List<T>):
fun <T, K> Flow<T>.groupToList(getKey: (T) -> K): Flow<Pair<K, List<T>>> = flow {
val storage = mutableMapOf<K, MutableList<T>>()
collect { t -> storage.getOrPut(getKey(t)) { mutableListOf() } += t }
storage.forEach { (k, ts) -> emit(k to ts) }
}
For this example:
suspend fun main() {
val input = 1..10
input.asFlow()
.groupToList { it % 2 }
.collect { println(it) }
}
it prints
(1, [1, 3, 5, 7, 9])
(0, [2, 4, 6, 8, 10])
2.a Emit a Flow for Each Group
If you need the full RxJava semantics where you transform the input flow into many output flows (one per distinct key), things get more involved.
Whenever you see a new key in the input, you must emit a new inner flow to the downstream and then, asynchronously, keep pushing more data into it whenever you encounter the same key again.
Here's an implementation that does this:
fun <T, K> Flow<T>.groupBy(getKey: (T) -> K): Flow<Pair<K, Flow<T>>> = flow {
val storage = mutableMapOf<K, SendChannel<T>>()
try {
collect { t ->
val key = getKey(t)
storage.getOrPut(key) {
Channel<T>(32).also { emit(key to it.consumeAsFlow()) }
}.send(t)
}
} finally {
storage.values.forEach { chan -> chan.close() }
}
}
It sets up a Channel for each key and exposes the channel to the downstream as a flow.
2.b Concurrently Collect and Reduce Grouped Flows
Since groupBy keeps emitting the data to the inner flows after emitting the flows themselves to the downstream, you have to be very careful with how you collect them.
You must collect all the inner flows concurrently, with no upper limit on the level of concurrency. Otherwise the channels of the flows that are queued for later collection will eventually block the sender and you'll end up with a deadlock.
Here is a function that does this properly:
fun <T, K, R> Flow<Pair<K, Flow<T>>>.reducePerKey(
reduce: suspend Flow<T>.() -> R
): Flow<Pair<K, R>> = flow {
coroutineScope {
this#reducePerKey
.map { (key, flow) -> key to async { flow.reduce() } }
.toList()
.forEach { (key, deferred) -> emit(key to deferred.await()) }
}
}
The map stage launches a coroutine for each inner flow it receives. The coroutine reduces it to the final result.
toList() is a terminal operation that collects the entire upstream flow, launching all the async coroutines in the process. The coroutines start consuming the inner flows even while we're still collecting the main flow. This is essential to prevent a deadlock.
Finally, after all the coroutines have been launched, we start a forEach loop that waits for and emits the final results as they become available.
You can implement almost the same behavior in terms of flatMapMerge:
fun <T, K, R> Flow<Pair<K, Flow<T>>>.reducePerKey(
reduce: suspend Flow<T>.() -> R
): Flow<Pair<K, R>> = flatMapMerge(Int.MAX_VALUE) { (key, flow) ->
flow { emit(key to flow.reduce()) }
}
The difference is in the ordering: whereas the first implementation respects the order of appearance of keys in the input, this one doesn't. Both perform similarly.
3. Example
This example groups and sums 40 million integers:
suspend fun main() {
val input = 1..40_000_000
input.asFlow()
.groupBy { it % 100 }
.reducePerKey { sum { it.toLong() } }
.collect { println(it) }
}
suspend fun <T> Flow<T>.sum(toLong: suspend (T) -> Long): Long {
var sum = 0L
collect { sum += toLong(it) }
return sum
}
I can successfully run this with -Xmx64m. On my 4-core laptop I'm getting about 4 million items per second.
It is simple to redefine the first solution in terms of the new one like this:
fun <T, K> Flow<T>.groupToList(getKey: (T) -> K): Flow<Pair<K, List<T>>> =
groupBy(getKey).reducePerKey { toList() }
Not yet but you can have a look at this library https://github.com/akarnokd/kotlin-flow-extensions .
In my project, I was able to achieve this non-blocking by using Flux.groupBy.
https://projectreactor.io/docs/core/release/api/reactor/core/publisher/Flux.html#groupBy-java.util.function.Function-
I did this in the process of converting the results obtained with Flux to Flow.
This may be an inappropriate answer for the situation in question, but I share it as an example.

produce<Type> vs Channel<Type>()

Trying to understand channels. I want to channelify the android BluetoothLeScanner. Why does this work:
fun startScan(filters: List<ScanFilter>, settings: ScanSettings = defaultSettings): ReceiveChannel<ScanResult?> {
val channel = Channel<ScanResult>()
scanCallback = object : ScanCallback() {
override fun onScanResult(callbackType: Int, result: ScanResult) {
channel.offer(result)
}
}
scanner.startScan(filters, settings, scanCallback)
return channel
}
But not this:
fun startScan(scope: CoroutineScope, filters: List<ScanFilter>, settings: ScanSettings = defaultSettings): ReceiveChannel<ScanResult?> = scope.produce {
scanCallback = object : ScanCallback() {
override fun onScanResult(callbackType: Int, result: ScanResult) {
offer(result)
}
}
scanner.startScan(filters, settings, scanCallback)
}
It tells me Channel was closed when it wants to call offer for the first time.
EDIT1: According to the docs: The channel is closed when the coroutine completes. which makes sense. I know we can use suspendCoroutine with resume for a one shot callback-replacement. This however is a listener/stream-situation. I don't want the coroutine to complete
Using produce, you introduce scope to your Channel. This means, the code that produces the items, that are streamed over the channel, can be cancelled.
This also means that the lifetime of your Channel starts at the start of the lambda of the produce and ends when this lambda ends.
In your example, the lambda of your produce call almost ends immediately, which means your Channel is closed almost immediately.
Change your code to something like this:
fun CoroutineScope.startScan(filters: List<ScanFilter>, settings: ScanSettings = defaultSettings): ReceiveChannel<ScanResult?> = produce {
scanCallback = object : ScanCallback() {
override fun onScanResult(callbackType: Int, result: ScanResult) {
offer(result)
}
}
scanner.startScan(filters, settings, scanCallback)
// now suspend this lambda forever (until its scope is canceled)
suspendCancellableCoroutine<Nothing> { cont ->
cont.invokeOnCancellation {
scanner.stopScan(...)
}
}
}
...
val channel = scope.startScan(filter)
...
...
scope.cancel() // cancels the channel and stops the scanner.
I added the line suspendCancellableCoroutine<Nothing> { ... } to make it suspend 'forever'.
Update: Using produce and handling errors in a structured way (allows for Structured Concurrency):
fun CoroutineScope.startScan(filters: List<ScanFilter>, settings: ScanSettings = defaultSettings): ReceiveChannel<ScanResult?> = produce {
// Suspend this lambda forever (until its scope is canceled)
suspendCancellableCoroutine<Nothing> { cont ->
val scanCallback = object : ScanCallback() {
override fun onScanResult(callbackType: Int, result: ScanResult) {
offer(result)
}
override fun onScanFailed(errorCode: Int) {
cont.resumeWithException(MyScanException(errorCode))
}
}
scanner.startScan(filters, settings, scanCallback)
cont.invokeOnCancellation {
scanner.stopScan(...)
}
}
}

Is it possible to write a "double" extension method?

In Kotlin, it is possible to write
class A {
fun B.foo()
}
and then e.g. write with (myA) { myB.foo() }.
Is it possible to write this as an extension method on A, instead? My use case is writing
with (java.math.RoundingMode.CEILING) { 1 / 2 }
which I would want to return 1, the point being that I want to add operator fun Int.div(Int) to RoundingMode.
No it's not possible. operator div is required to have Int as a receiver.
You can't add also RoundingMode as receiver, since there can only be single function receiver.
What you can do, though, is use Pair<RoundingMode, Int> as a receiver:
operator fun Pair<RoundingMode, Int>.div(i: Int): BigDecimal =
BigDecimal.valueOf(second.toLong()).divide(BigDecimal.valueOf(i.toLong()), first)
with(RoundingMode.CEILING) {
println((this to 1) / 2) // => 1
}
That's not possible, Int already has a div function, thus, if you decide to write an extension function div, you won't be able to apply it, because member functions win over extension functions.
You can write this though:
fun RoundingMode.div(x: Int, y: Int): Int {
return if (this == RoundingMode.CEILING) {
Math.ceil(x.toDouble() / y.toDouble()).toInt()
} else {
Math.floor(x.toDouble() / y.toDouble()).toInt()
}
}
fun main(args: Array<String>) {
with(java.math.RoundingMode.CEILING) {
println(div(1,2))
}
}
It's not possible for a couple of reasons:
There's no "double extension functions" concept in Kotlin
You can't override a method with extension functions, and operator div is already defined in Int
However you can workaround these issues with
A context class and an extension lambda (e.g. block: ContextClass.() -> Unit)
Infix functions (e.g. use 15 div 4 instead of 15 / 4)
See the example below:
class RoundingContext(private val roundingMode: RoundingMode) {
infix fun Int.div(b: Int): Int {
val x = this.toBigDecimal()
val y = b.toBigDecimal()
val res = x.divide(y, roundingMode)
return res.toInt()
}
}
fun <T> using(roundingMode: RoundingMode, block: RoundingContext.() -> T): T {
return with(RoundingContext(roundingMode)) {
block()
}
}
// Test
fun main(args: Array<String>) {
using(RoundingMode.FLOOR) {
println(5 div 2) // 2
}
val x = using(RoundingMode.CEILING) {
10 div 3
}
println(x) // 4
}
Hope it helps!

Kotlin coroutines progress counter

I'm making thousands of HTTP requests using async/await and would like to have a progress indicator. I've added one in a naive way, but noticed that the counter value never reaches the total when all requests are done. So I've created a simple test and, sure enough, it doesn't work as expected:
fun main(args: Array<String>) {
var i = 0
val range = (1..100000)
range.map {
launch {
++i
}
}
println("$i ${range.count()}")
}
The output is something like this, where the first number always changes:
98800 100000
I'm probably missing some important detail about concurrency/synchronization in JVM/Kotlin, but don't know where to start. Any tips?
UPDATE: I ended up using channels as Marko suggested:
/**
* Asynchronously fetches stats for all symbols and sends a total number of requests
* to the `counter` channel each time a request completes. For example:
*
* val counterActor = actor<Int>(UI) {
* var counter = 0
* for (total in channel) {
* progressLabel.text = "${++counter} / $total"
* }
* }
*/
suspend fun getAssetStatsWithProgress(counter: SendChannel<Int>): Map<String, AssetStats> {
val symbolMap = getSymbols()?.let { it.map { it.symbol to it }.toMap() } ?: emptyMap()
val total = symbolMap.size
return symbolMap.map { async { getAssetStats(it.key) } }
.mapNotNull { it.await().also { counter.send(total) } }
.map { it.symbol to it }
.toMap()
}
The explanation what exactly makes your wrong approach fail is secondary: the primary thing is fixing the approach.
Instead of async-await or launch, for this communication pattern you should instead have an actor to which all the HTTP jobs send their status. This will automatically handle all your concurrency issues.
Here's some sample code, taken from the link you provided in the comment and adapted to your use case. Instead of some third party asking it for the counter value and updating the GUI with it, the actor runs in the UI context and updates the GUI itself:
import kotlinx.coroutines.experimental.*
import kotlinx.coroutines.experimental.channels.*
import kotlin.system.*
import kotlin.coroutines.experimental.*
object IncCounter
fun counterActor() = actor<IncCounter>(UI) {
var counter = 0
for (msg in channel) {
updateView(++counter)
}
}
fun main(args: Array<String>) = runBlocking {
val counter = counterActor()
massiveRun(CommonPool) {
counter.send(IncCounter)
}
counter.close()
println("View state: $viewState")
}
// Everything below is mock code that supports the example
// code above:
val UI = newSingleThreadContext("UI")
fun updateView(newVal: Int) {
viewState = newVal
}
var viewState = 0
suspend fun massiveRun(context: CoroutineContext, action: suspend () -> Unit) {
val numCoroutines = 1000
val repeatActionCount = 1000
val time = measureTimeMillis {
val jobs = List(numCoroutines) {
launch(context) {
repeat(repeatActionCount) { action() }
}
}
jobs.forEach { it.join() }
}
println("Completed ${numCoroutines * repeatActionCount} actions in $time ms")
}
Running it prints
Completed 1000000 actions in 2189 ms
View state: 1000000
You're losing writes because i++ is not an atomic operation - the value has to be read, incremented, and then written back - and you have multiple threads reading and writing i at the same time. (If you don't provide launch with a context, it uses a threadpool by default.)
You're losing 1 from your count every time two threads read the same value as they will then both write that value plus one.
Synchronizing in some way, for example by using an AtomicInteger solves this:
fun main(args: Array<String>) {
val i = AtomicInteger(0)
val range = (1..100000)
range.map {
launch {
i.incrementAndGet()
}
}
println("$i ${range.count()}") // 100000 100000
}
There's also no guarantee that these background threads will be done with their work by the time you print the result and your program ends - you can test it easily by adding just a very small delay inside launch, a couple milliseconds. With that, it's a good idea to wrap this all in a runBlocking call which will keep the main thread alive and then wait for the coroutines to all finish:
fun main(args: Array<String>) = runBlocking {
val i = AtomicInteger(0)
val range = (1..100000)
val jobs: List<Job> = range.map {
launch {
i.incrementAndGet()
}
}
jobs.forEach { it.join() }
println("$i ${range.count()}") // 100000 100000
}
Have you read Coroutines basics? There's exact same problem as yours:
val c = AtomicInteger()
for (i in 1..1_000_000)
launch {
c.addAndGet(i)
}
println(c.get())
This example completes in less than a second for me, but it prints some arbitrary number, because some coroutines don't finish before main() prints the result.
Because launch is not blocking, there's no guarantee all of coroutines will finish before println. You need to use async, store the Deferred objects and await for them to finish.