Explain the difference between scan + posting to a participating source in RxJava vs Kotlin Coroutines

I am porting a piece of code from Rx to Coroutines and came across behavior I can't wrap my head around.
Background: imagine that you have a stream of values, each of which gets associated with an action lambda to be executed later. There's also an "external" stream to which a lambda can post, and it's merged into the resulting stream.
I reduced the original code to this simpler (but still tricky) version:
val shared = PublishSubject.create<String>()
Observable
    .merge(
        shared.map { it to { } },
        Observable.just("item1", "item2")
            .map { it to { shared.onNext(it.toUpperCase()) } }
    )
    .scan("*" to { }) { accumulator, value ->
        (accumulator.first + "_" + value.first) to value.second
    }
    .subscribe {
        it.second()
        println("got ${it.first}")
    }
This prints
got *
got *_item1
got *_item1_ITEM1
got *_item1_ITEM1_item2
got *_item1_ITEM1_item2_ITEM2
Next I have this coroutines + Flow based version.
The notable difference is that the lambda has the suspend modifier added (to be able to call shared.emit()).
runBlocking {
    val shared = MutableSharedFlow<String>()
    merge(
        shared.map { it to suspend {} },
        flowOf("item1", "item2").map { it to suspend { shared.emit(it.toUpperCase()) } }
    )
        .scan("*" to suspend { }) { accumulator, value ->
            (accumulator.first + "_" + value.first) to value.second
        }
        .collect {
            it.second()
            println("got ${it.first}")
        }
}
This prints
got *
got *_item1
got *_item1_item2
got *_item1_item2_ITEM1
got *_item1_item2_ITEM1_ITEM2
Notice that in the Rx version the uppercased ITEM emissions are interspersed with the lowercase ones, while in the coroutines version they come last.
Questions I'd like to ask:
Why does this happen? Is it due to the suspending lambda? I would be grateful for a step-by-step explanation if there is something complex going on.
Does Rx have some internal buffer which allows it to behave as it does?
Can similar behavior be achieved with Flow and, if so, how?
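One possible way to reproduce the Rx ordering with Flow can be sketched under a strong assumption: the "external" stream is only ever fed from inside the action lambdas. In that case MutableSharedFlow and merge can be replaced by a queue that is drained inside the same flow. A plain flow { } has no buffer, so emit returns only after scan and collect have finished with the item, including running its lambda; whatever the lambda posted can then be emitted before the next source item. This appears to be close to what Rx's merge does: a value that arrives re-entrantly while merge is already emitting is queued and drained before control returns to the source. The names rxLikeOrder, pending, and Action are made up for this sketch.

```kotlin
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.runBlocking

typealias Action = suspend () -> Unit

// Hypothetical helper: reproduces the Rx ordering by draining posted values
// right after each source item has been fully processed downstream.
fun rxLikeOrder(): List<String> = runBlocking {
    val printed = mutableListOf<String>()
    val pending = ArrayDeque<String>() // stands in for the "shared" stream
    flow<Pair<String, Action>> {
        for (item in listOf("item1", "item2")) {
            // emit() returns only after scan + collect (including the lambda) are done
            emit(item to suspend { pending.addLast(item.uppercase()) })
            // so whatever the lambda posted is emitted before the next source item
            while (pending.isNotEmpty()) {
                emit(pending.removeFirst() to suspend {})
            }
        }
    }
        .scan("*" to suspend {}) { accumulator, value ->
            (accumulator.first + "_" + value.first) to value.second
        }
        .collect {
            it.second()
            printed += "got ${it.first}"
        }
    printed
}

fun main() {
    rxLikeOrder().forEach(::println)
}
```

This does not cover posts from genuinely external producers; for those a concurrent merge is still needed, and the interleaving then depends on scheduling.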

Related

Processing and aggregating data from multiple servers efficiently

Summary
My goal is to process and aggregate data from multiple servers efficiently while handling possible errors. For that, I have a sequential version that I want to speed up. As I am using Kotlin, coroutines seem the way to go for this asynchronous task. However, I'm quite new to this and can't figure out how to do it idiomatically. None of my attempts satisfied my requirements completely.
Here is the sequential version of the core function that I am currently using:
suspend fun readDataFromServers(): Set<String> = coroutineScope {
    listOfServers
        // step 1: read data from servers while logging errors
        .mapNotNull { url ->
            runCatching { makeRequestTo(url) }
                .onFailure { println("err while accessing $url: $it") }
                .getOrNull()
        }
        // step 2: do some element-wise post-processing
        .map { process(it) }
        // step 3: aggregate data
        .toSet()
}
Background
In my use case, there are numServers I want to read data from. Each of them usually answers within successDuration, but a connection attempt may fail after timeoutDuration with probability failProb and throw an IOException. As downtime is common in my system, I do not need to retry anything, only log it for the record. Hence, the makeRequestTo function can be modelled as follows:
suspend fun makeRequestTo(url: String) =
    if (random.nextFloat() > failProb) {
        delay(successDuration)
        "{Some response from $url}"
    } else {
        delay(timeoutDuration)
        throw IOException("Connection to $url timed out")
    }
Attempts
All these attempts can be tried out in the Kotlin playground. I don't know how long this link stays alive; maybe I'll need to upload this as a gist, but I liked that people can execute the code directly.
Async
I tried using async { makeRequestTo(it) } after listOfServers and awaiting the results in the following mapNotNull, similar to this post. While this collapses the communication time to timeoutDuration, all following processing steps have to wait that long before they can continue. Hence, some composition of Deferreds was required here, which is discouraged in Kotlin (or at least should be avoided in favor of suspending functions).
suspend fun readDataFromServersAsync(): Set<String> = supervisorScope {
    listOfServers
        .map { async { makeRequestTo(it) } }
        .mapNotNull { kotlin.runCatching { it.await() }.onFailure { println("err: $it") }.getOrNull() }
        .map { process(it) }
        .toSet()
}
Loops
Using plain loops like the one below fulfills the functional requirements, but it feels more complex than it should be. In particular, the part where shared state must be synchronized makes me distrust this code and any future modifications to it.
val results = mutableSetOf<String>()
val mutex = Mutex()
val logger = CoroutineExceptionHandler { _, exception -> println("err: $exception") }
for (server in listOfServers) {
    launch(logger) {
        val response = makeRequestTo(server)
        val processed = process(response)
        mutex.withLock {
            results.add(processed)
        }
    }
}
return@supervisorScope results
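A possible arrangement, sketched here rather than taken from any answer, keeps the whole per-server pipeline inside each async, so that process runs as soon as that server answers instead of waiting for the slowest one, and no shared mutable state is needed. The server list, the deterministic makeRequestTo that fails for one host, and the process function below are hypothetical stand-ins for the question's environment. Catching IOException specifically, rather than using runCatching, avoids also swallowing CancellationException; since no exception escapes the async blocks, a plain coroutineScope is enough.

```kotlin
import java.io.IOException
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Hypothetical stand-ins for the question's environment
val listOfServers = listOf("a", "b", "c", "d")

suspend fun makeRequestTo(url: String): String {
    delay(100)
    if (url == "c") throw IOException("Connection to $url timed out") // simulated outage
    return "{Some response from $url}"
}

fun process(response: String): String = response.uppercase()

suspend fun readDataFromServersConcurrently(): Set<String> = coroutineScope {
    listOfServers
        .map { url ->
            async {
                // request and post-processing run together, per server
                try {
                    process(makeRequestTo(url))
                } catch (e: IOException) {
                    println("err while accessing $url: $e") // log only, no retry
                    null
                }
            }
        }
        .awaitAll()
        .filterNotNull()
        .toSet()
}

fun main() = runBlocking {
    println(readDataFromServersConcurrently())
}
```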

Combining kotlin flow results

I'm wondering if there is a clean way to launch a series of flows in Kotlin and then, after they resolve, perform further operations based on whether they succeeded or not.
For the sake of example, I need to read all integers from a DB (returned as a flow), check whether each one is even or odd against an external API (which also returns a flow), and then remove the odd ones from the DB.
In code it would be something like this:
suspend fun findEven() {
    db.readIntegers()
        .map { listOfInt ->
            listOfInt.asFlow()
                .flatMapMerge { singleInt ->
                    httpClient.apiCallToCheckForOddity(singleInt)
                        .catch {
                            // API failure when number is even
                        }
                        .map {
                            // API success when number is odd
                            db.remove(singleInt).collect()
                        }
                }.collect()
        }.collect()
}
But the problem I see with this code is that the DB deletions are done in parallel. I think a better solution would be to run all API calls, collect somewhere which ones failed and which succeeded, and then do a single bulk deletion in the DB instead of having multiple coroutines each do it on their own.
In my opinion, it's kind of an anti-pattern to produce side effects in map, filter, etc. A side effect like removing items from a database should be a separate step (collect in the case of a Flow, and forEach in the case of a List) for clarity.
The nested flow is also somewhat convoluted, since you can operate on the list directly as a List.
I think you can do it like this, assuming the API can only check one item at a time.
suspend fun findEven() {
    db.readIntegers()
        .map { listOfInt ->
            listOfInt.filter { singleInt ->
                runCatching {
                    httpClient.apiCallToCheckForOddity(singleInt)
                }.isSuccess
            }
        }
        .collect { listOfOddInt ->
            db.removeAll(listOfOddInt)
        }
}
Parallel version, assuming the API call returns its parameter on success. (By the way, Kotlin APIs should not throw exceptions for non-programmer errors.)
suspend fun findEven() {
    db.readIntegers()
        .map { listOfInt ->
            coroutineScope {
                listOfInt.map { singleInt ->
                    async {
                        runCatching {
                            httpClient.apiCallToCheckForOddity(singleInt)
                        }
                    }
                }.awaitAll()
                    .mapNotNull(Result<Int>::getOrNull)
            }
        }
        .collect { listOfOddInt ->
            db.removeAll(listOfOddInt)
        }
}
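In the question, apiCallToCheckForOddity returns a Flow rather than being a suspend function. If that is the case, wrapping only the call in runCatching catches nothing, because a cold flow does no work until it is collected. A minimal sketch, assuming a hypothetical flow-returning API that emits the number when it is odd and fails otherwise, collects each flow with single() inside runCatching and returns the odd numbers so they can be removed in one bulk call:

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.single
import kotlinx.coroutines.runBlocking

// Hypothetical API: emits n if it is odd, fails if it is even
fun apiCallToCheckForOddity(n: Int): Flow<Int> = flow {
    delay(10)
    if (n % 2 == 0) throw IllegalStateException("$n is even")
    emit(n)
}

// Checks all numbers concurrently and returns the odd ones, in input order
suspend fun findOdd(numbers: List<Int>): List<Int> = coroutineScope {
    numbers
        .map { n ->
            async {
                // single() collects the flow, so its failure surfaces inside runCatching
                runCatching { apiCallToCheckForOddity(n).single() }.getOrNull()
            }
        }
        .awaitAll()
        .filterNotNull()
}

fun main() = runBlocking {
    val odd = findOdd(listOf(1, 2, 3, 4, 5))
    println(odd) // a single db.removeAll(odd) would go here
}
```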

Parallelly consuming a long sequence in Kotlin

I have a function generating a very long sequence of work items. Generating these items is fast, but there are too many in total to store a list of them in memory. Processing the items produces no results, just side effects.
I would like to process these items across multiple threads. One solution is to have a thread read from the generator and write to a concurrent bounded queue, and a number of executor threads polling for work from the bounded queue, but this is a lot of things to set up.
Is there anything in the standard library that would help me do that?
I had initially tried
items.map { async(executor) { process(it) } }.forEach { it.await() }
But, as pointed out in how to implement parallel mapping for sequences in kotlin, this doesn't work for reasons that are obvious in retrospect.
Is there a quick way to do this (possibly with an external library), or is manually setting up a bounded queue in the middle my best option?
You can look at coroutines combined with channels.
If all work items can be emitted on demand by a producer channel, then it's possible to receive each item and process it with a pool of threads.
An example:
sealed class Stream {
    object End : Stream()
    class Item(val data: Long) : Stream()
}

runBlocking {
    val produceCtx = newSingleThreadContext("producer")
    // A dummy producer that sends one million Longs on its own thread
    val producer = CoroutineScope(produceCtx).produce {
        for (i in 0 until 1_000_000L) send(Stream.Item(i))
        send(Stream.End)
    }
    val workCtx = newFixedThreadPoolContext(4, "work")
    val workers = Channel<Unit>(4)
    repeat(4) { workers.trySend(Unit) }
    for (_nothing in workers) { // launch 4 times, then wait for a task to finish
        launch(workCtx) {
            // receiveCatching() yields null once the producer's channel is closed
            when (val item = producer.receiveCatching().getOrNull()) {
                null, Stream.End -> workers.close()
                is Stream.Item -> {
                    workFunction(item.data) // Actual work here
                    workers.trySend(Unit) // Notify to launch a new task
                }
            }
        }
    }
}
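For comparison, the "fan-out" pattern from the kotlinx.coroutines channels guide expresses the same idea with less bookkeeping: one coroutine pulls items from the lazy sequence into a bounded channel, and a fixed number of worker coroutines consume from it. Because send suspends when the buffer is full, only a bounded number of items is in memory at any time. The helper name processInParallel and the capacity of 16 are arbitrary choices for this sketch.

```kotlin
import java.util.concurrent.atomic.AtomicLong
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical helper: runs `work` on every item using `workers` coroutines
fun <T> processInParallel(items: Sequence<T>, workers: Int, work: (T) -> Unit) = runBlocking {
    val channel = Channel<T>(capacity = 16) // bounded: send() suspends when full
    launch {
        try {
            for (item in items) channel.send(item) // pulls lazily from the sequence
        } finally {
            channel.close() // lets the workers' for-loops finish
        }
    }
    repeat(workers) {
        launch(Dispatchers.Default) {
            for (item in channel) work(item) // each item is received by exactly one worker
        }
    }
}

fun main() {
    val sum = AtomicLong()
    processInParallel((1L..1_000_000L).asSequence(), workers = 4) { sum.addAndGet(it) }
    println(sum.get())
}
```

runBlocking only returns once the producer and all workers have finished, so the caller does not need to join anything.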
Your magic word would be .asSequence():
items
    .asSequence() // Creates a lazily evaluated sequence
    .forEach { launch { executor.process(it) } } // If you don't need the value afterwards, use 'launch', a.k.a. "fire and forget"
but there are too many in total to store a list of them in memory
Then don't map to a list and don't collect the values, whether you work with Kotlin or Java.
As long as you are on the JVM, you can write yourself an extension function that processes the sequence in chunks and spawns futures for all entries in a chunk. Something like this:
import java.util.concurrent.Executors

@Suppress("UNCHECKED_CAST")
fun <T, R> Sequence<T>.mapParallel(action: (value: T) -> R?): Sequence<R?> {
    // at least one thread, otherwise chunked(0) would throw on a single-core machine
    val numThreads = (Runtime.getRuntime().availableProcessors() - 1).coerceAtLeast(1)
    return this
        .chunked(numThreads)
        .map { chunk ->
            val threadPool = Executors.newFixedThreadPool(numThreads)
            try {
                return@map chunk
                    .map {
                        // CAUTION -> needs to be written like this
                        // otherwise the submit(Runnable) overload is called
                        // which always returns an empty Future!!!
                        val callable: () -> R? = { action(it) }
                        threadPool.submit(callable)
                    }
            } finally {
                threadPool.shutdown()
            }
        }
        .flatten()
        .map { future -> future.get() }
}
You can then use it like this:
items
    .mapParallel { /* process an item */ }
    .forEach { /* handle the result */ }
As long as the workload per item is similar, this gives good parallel processing.

Kotlin - Coroutines not behaving as expected

This question is linked to one of my previous questions: Kotlin - Coroutines with loops.
So, this is my current implementation:
fun propagate() = runBlocking {
    logger.info("Propagating objectives...")
    val variablesWithSetObjectives: List<ObjectivePropagationMapping> =
        variables.filter { it.variable.objective != Objective.NONE }
    variablesWithSetObjectives.forEach { variableWithSetObjective ->
        logger.debug("Propagating objective ${variableWithSetObjective.variable.objective} from variable ${variableWithSetObjective.variable.name}")
        val job: Job = launch {
            propagate(variableWithSetObjective, variableWithSetObjective.variable.objective, this, variableWithSetObjective)
        }
        job.join()
        traversedVariableNames.clear()
    }
    logger.info("Done")
}

private tailrec fun propagate(currentVariable: ObjectivePropagationMapping, objectiveToPropagate: Objective, coroutineScope: CoroutineScope, startFromVariable: ObjectivePropagationMapping? = null) {
    if (traversedVariableNames.contains(currentVariable.variable.name)) {
        logger.debug("Detected loopback condition, stopping propagation to prevent loop")
        return
    }
    traversedVariableNames.add(currentVariable.variable.name)
    val objectiveToPropagateNext: Objective =
        if (startFromVariable != currentVariable) {
            logger.debug("Propagating objective $objectiveToPropagate to variable ${currentVariable.variable.name}")
            computeNewObjectiveForVariable(currentVariable, objectiveToPropagate)
        }
        else startFromVariable.variable.objective
    logger.debug("Choosing variable to propagate to next")
    val variablesToPropagateToNext: List<ObjectivePropagationMapping> =
        causalLinks
            .filter { it.toVariable.name == currentVariable.variable.name }
            .map { causalLink -> variables.first { it.variable.name == causalLink.fromVariable.name } }
    if (variablesToPropagateToNext.isEmpty()) {
        logger.debug("Detected end of path, stopping propagation...")
        return
    }
    val variableToPropagateToNext: ObjectivePropagationMapping = variablesToPropagateToNext.random()
    logger.debug("Chose variable ${variableToPropagateToNext.variable.name} to propagate to next")
    if (variablesToPropagateToNext.size > 1) {
        logger.debug("Detected split condition")
        variablesToPropagateToNext.filter { it != variableToPropagateToNext }.forEach {
            logger.debug("Launching child thread for split variable ${it.variable.name}")
            coroutineScope.launch {
                propagate(it, objectiveToPropagateNext, this)
            }
        }
    }
    propagate(variableToPropagateToNext, objectiveToPropagateNext, coroutineScope)
}
I'm currently running the algorithm on the following variable topology (Note that the algorithm follows arrows coming to a variable, but not arrows leaving from a variable):
Currently I am getting the following debug print result: https://pastebin.com/ya2tmc6s.
As you can see, even though I launch coroutines, they don't begin executing until the main recursive propagate function has finished exploring a complete path.
I want the launched coroutines to start executing immediately instead.
Unless otherwise specified, all the coroutines you start within runBlocking will run on the same thread.
If you want to enable multithreading, you can just change that to runBlocking(Dispatchers.Default). I'm just going to assume that all that code is thread-safe.
If you don't really want to enable multithreading, then you really shouldn't care what order the coroutines run in.
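To illustrate the answer: on runBlocking's single thread, launch only queues the new coroutine, and the queued coroutine gets the thread when the current one suspends (for example at join() or yield()). The tailrec propagate function never suspends, so every coroutine it launches waits until the path has been fully explored. If the goal is only to have a launched coroutine start right away, CoroutineStart.UNDISPATCHED runs it on the spot until its first suspension point. The sketch below records the resulting order; executionOrder is a made-up name.

```kotlin
import kotlinx.coroutines.CoroutineStart
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.yield

fun executionOrder(): List<String> = runBlocking {
    val log = mutableListOf<String>()
    // Default start: queued on runBlocking's event loop, runs when the parent suspends
    launch { log += "dispatched child" }
    // UNDISPATCHED: runs immediately in the current thread until its first suspension
    launch(start = CoroutineStart.UNDISPATCHED) { log += "undispatched child" }
    log += "parent before yield"
    yield() // the parent suspends, so the queued child gets the thread
    log += "parent after yield"
    log
}

fun main() {
    println(executionOrder())
    // [undispatched child, parent before yield, dispatched child, parent after yield]
}
```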

Observe many times from same Observable (RxAndroidBle)

I'm using the RxAndroidBle library with RxJava2 to read from a BLE Characteristic. I think this question is just an RxJava question, but including the detail that I'm using RxAndroidBle in case that is useful.
I get a connection and then use it to call readCharacteristic(), which returns a Single<ByteArray>. At this point, though, I don't want to get just one ByteArray. I need to read from this characteristic several times, because the BLE device is set up to let me get a small file back, and a characteristic can only send 20 bytes back at a time; hence my need to read repeatedly.
Is it possible to modify this code so that the switchMap() below returns an Observable that will emit many ByteArrays, instead of just the single one?
I'm new to RxJava.
val connection: Observable<RxBleConnection> = selectedDevice.record.bleDevice.establishConnection(false, Timeout(30, TimeUnit.SECONDS))
return connection
    .subscribeOn(Schedulers.io())
    .switchMap {
        // I want to get an Observable that can read multiple times here.
        it.readCharacteristic(serverCertCharacteristicUUID).toObservable()
    }
    .doOnNext {
        Timber.e("Got Certificate bytes")
    }
    .map {
        String(it as ByteArray)
    }
    .doOnNext {
        Timber.e("Got certificate: $it")
    }
    .singleOrError()
To repeat a read multiple times until a specific value is emitted, one needs to change this part:
// I want to get an Observable that can read multiple times here.
it.readCharacteristic(serverCertCharacteristicUUID).toObservable()
to something like what the RxJava author suggested in the first answer Google gives for the phrase "rxjava single repeat":
// this will repeat until `checkRepeatIf` returns false
Observable.defer {
    val successValue = AtomicReference<ByteArray>()
    connection.readCharacteristic(serverCertCharacteristicUUID)
        .doOnSuccess { successValue.lazySet(it) }
        .repeatWhen { completes -> completes.takeWhile { checkRepeatIf(successValue.get()) } }
        .toObservable() // Single.repeatWhen returns a Flowable; defer needs an ObservableSource
}
I was able to get this working by sending a signal to stop both the connectionObservable and the read on the Bluetooth characteristic. Note that you need to call toObservable() AFTER repeat(), or this doesn't work, although I don't know exactly why.
override fun readMultipartCharacteristic(macAddress: String): Single<String> {
    val CERTIFICATE_TERMINATOR = 0x30.toByte()
    val device = bluetoothService.getBleDevice(macAddress)
    if (connectionObservable == null || !device.connectionState.equals(RxBleConnection.RxBleConnectionState.CONNECTED)) {
        connectionObservable = device.establishConnection(false, Timeout(30, TimeUnit.SECONDS))
    }
    val stop: PublishSubject<Unit> = PublishSubject.create()
    return connectionObservable!!
        .subscribeOn(Schedulers.io())
        .takeUntil(stop)
        .switchMap {
            it.readCharacteristic(UUID("my-uuid"))
                .repeat()
                .toObservable()
                .takeUntil(stop)
        }
        .collectInto(ByteArrayOutputStream(), { buffer, byteArray ->
            // Watch for the signal of the end of the stream
            if (byteArray.size == 1 && byteArray.get(0).equals(CERTIFICATE_TERMINATOR)) {
                stop.onComplete()
            } else {
                buffer.write(byteArray)
            }
        })
        .map {
            String(it.toByteArray())
        }
}
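The framing logic inside collectInto can be tried out without RxJava or a BLE device. A plain-Kotlin sketch, assuming (as in the answer above) that the device sends the file in chunks and marks the end with a one-byte chunk containing 0x30; assembleCertificate is a hypothetical name. Under this convention, a genuine one-byte chunk consisting of the character '0' (0x30) would be mistaken for the terminator.

```kotlin
import java.io.ByteArrayOutputStream

const val CERTIFICATE_TERMINATOR: Byte = 0x30

// Concatenates chunks until the one-byte terminator chunk is seen
fun assembleCertificate(chunks: Sequence<ByteArray>): String {
    val buffer = ByteArrayOutputStream()
    for (chunk in chunks) {
        if (chunk.size == 1 && chunk[0] == CERTIFICATE_TERMINATOR) break // end of file
        buffer.write(chunk)
    }
    return String(buffer.toByteArray(), Charsets.UTF_8)
}

fun main() {
    val chunks = sequenceOf("-----BEGIN ".toByteArray(), "CERT-----".toByteArray(), byteArrayOf(0x30))
    println(assembleCertificate(chunks))
}
```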
Alternatively, you can set up a notification on the characteristic and buffer the data as it arrives.
device.establishConnection(false)
    .flatMap(rxBleConnection -> rxBleConnection.setupNotification(characteristicUuid))
    .flatMap(notificationObservable -> notificationObservable) // <-- Notification has been set up, now observe value changes.
    .subscribe(
        bytes -> {
            // Given characteristic has changed, here is the value.
        },
        throwable -> {
            // Handle an error here.
        }
    );