How to efficiently perform concurrent computation with coroutines


I'm trying to improve my knowledge of coroutines and am currently working on the following problem:
Given a generator of random non-empty strings with a length of 14 characters, what would be the most efficient way to find a string that starts with a specific prefix (let's assume the prefix length is 5)?
Most of the solutions I encountered on the internet either a) manually launch async{} 2 or 3 times or b) launch async{} in a loop and then await all of them to complete, which won't work for this scenario.
One approach I tried was to launch new coroutines until I get a non-null response from the computation function and cancel the scope afterwards. However, there's clearly a performance issue that I'm not seeing, since this approach can take more than 20s to find a match for a prefix of length 1.
...
private val _flow = MutableSharedFlow<String>()
suspend fun invoke(prefix: String) = withContext(dispatcher) { // dispatcher is Dispatchers.Default
    _flow.onEach {
        println("String is=$it")
        this.cancel()
    }.launchIn(this)
    repeat(Int.MAX_VALUE) {
        launch {
            getString(prefix)?.let {
                _flow.emit(it)
            }
        }
    }
}
private fun getString(prefix: String): String? { // or any other cpu intensive task
    val randomString = generateRandomStringAccordingToSpecs() // implemented elsewhere
    if (randomString.startsWith(prefix = prefix, ignoreCase = true)) {
        return randomString
    } else {
        return null
    }
}
I also tried an approach with a while loop and 4 parallel executions, for which I'm getting better performance results. However, awaiting after every X calculations doesn't seem like the most efficient solution to me:
suspend fun invoke(prefix: String) = withContext(dispatcher) {
    var resultString: String? = getString(prefix)
    while (resultString == null) {
        val tasks = listOf(
            async { getString(prefix) },
            async { getString(prefix) },
            async { getString(prefix) },
            async { getString(prefix) }
        )
        resultString = tasks.awaitAll().filterNotNull().firstOrNull()
    }
    println("String is=$resultString")
}
private fun getString(prefix: String): String? { // or any other cpu intensive task
    val randomString = generateRandomStringAccordingToSpecs() // implemented elsewhere
    if (randomString.startsWith(prefix = prefix, ignoreCase = true)) {
        return randomString
    } else {
        return null
    }
}
In the example above I'm using a prefix-finding problem, but in general, what is the most efficient way to concurrently perform CPU-intensive calculations with coroutines, especially when we don't know how many times the task must be executed before we get an answer?

This seems like a job for the select function. Assuming your generateRandomStringAccordingToSpecs() is a blocking, CPU-bound function, that you want all your CPU cores working on the problem simultaneously, and that you only need the first valid result, you could build an operator like this:
suspend fun <T> getFirstResult(block: suspend CoroutineScope.() -> T): T =
    withContext(Dispatchers.Default) {
        coroutineScope {
            select {
                repeat(Runtime.getRuntime().availableProcessors()) {
                    async { block() }.onAwait {
                        coroutineContext.cancelChildren()
                        it
                    }
                }
            }
        }
    }
It starts as many parallel coroutines as there are CPUs, and once any of them returns a result, it cancels the rest and returns that result.
So you can use this with a coroutine block that uses a while loop indefinitely until a result is returned:
suspend fun invoke(prefix: String) = getFirstResult {
    while (isActive) {
        return@getFirstResult getString(prefix) ?: continue
    }
}
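To make the idea above easier to try out, here is a minimal self-contained sketch of the same pattern: one worker per CPU core, select picks the first to finish, and the rest are cancelled. The randomString() helper is a hypothetical stand-in for generateRandomStringAccordingToSpecs(), and findWithPrefix is a name invented for this example, so treat it as an illustration under those assumptions rather than the answer's exact code.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.selects.select

// Hypothetical stand-in for generateRandomStringAccordingToSpecs().
fun randomString(length: Int = 14): String {
    val alphabet = ('a'..'z') + ('0'..'9')
    return buildString(length) { repeat(length) { append(alphabet.random()) } }
}

// Runs one search loop per CPU core and returns the first match found.
suspend fun findWithPrefix(prefix: String): String =
    withContext(Dispatchers.Default) {
        val workers = List(Runtime.getRuntime().availableProcessors()) {
            async {
                var candidate = randomString()
                while (!candidate.startsWith(prefix, ignoreCase = true)) {
                    ensureActive() // lets a cancelled worker stop promptly
                    candidate = randomString()
                }
                candidate
            }
        }
        // select resumes with the result of whichever worker completes first.
        val winner = select<String> { workers.forEach { w -> w.onAwait { it } } }
        workers.forEach { it.cancel() } // stop the workers that lost the race
        winner
    }

fun main() = runBlocking {
    println(findWithPrefix("ab"))
}
```

Because the search loop is plain blocking code, the ensureActive() call is what allows the losing workers to notice cancellation; without a check like it they would keep spinning after a result has been found.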

Related

How to wait for a flow to complete emitting the values

I have a function "getUser" in my Repository which emits an object representing a user based on the provided id.
flow function
fun getUser(id: String) = callbackFlow {
    val collectionReference: CollectionReference =
        FirebaseFirestore.getInstance().collection(COLLECTION_USERS)
    val query: Query = collectionReference.whereEqualTo(ID, id)
    query.get().addOnSuccessListener {
        val lst = it.toObjects(User::class.java)
        if (lst.isEmpty())
            offer(null)
        else
            offer(it.toObjects(User::class.java)[0])
    }
    awaitClose()
}
I need these values in another class. I loop over a list of ids and I add the collected user to a new list. How can I wait for the list to be completed when I collect the values, before calling return?
collector function
private fun computeAttendeesList(reminder: Reminder): ArrayList<User> {
    val attendeesList = arrayListOf<User>()
    for (friend in reminder.usersToShare) {
        repoScope.launch {
            Repository.getUser(friend).collect {
                it?.let { user ->
                    if (!attendeesList.contains(user))
                        attendeesList.add(user)
                }
            }
        }
    }
    return attendeesList
}
I do not want to use live data since this is not a UI-related class.

There are multiple problems to address in this code:
1. getUser() is meant to return a single User, but it currently returns a Flow<User> which will never end, and never return more than one user.
2. The way the list of users is constructed from multiple concurrent queries is not thread-safe (because multiple launches are executed on the multi-threaded IO dispatcher, and they all update the same unsafe list directly).
3. The actual use case is to get a list of users from Firebase, but many queries for a single ID are used instead of a single query.
Solution to #1
Let's tackle #1 first. Here is a version of getUser() that suspends for a single User instead of returning a Flow:
suspend fun getUser(id: String): User? {
    val collectionReference = FirebaseFirestore.getInstance().collection(COLLECTION_USERS)
    val query = collectionReference.whereEqualTo(ID, id)
    // firstOrNull() yields null when no user matches, hence the nullable return type
    return query.get().await().let { it.toObjects(User::class.java) }.firstOrNull()
}
// use the kotlinx-coroutines-play-services library instead
private suspend fun <T> Task<T>.await(): T {
    return suspendCancellableCoroutine { cont ->
        addOnCompleteListener {
            val e = exception
            if (e == null) {
                @Suppress("UNCHECKED_CAST")
                if (isCanceled) cont.cancel() else cont.resume(result as T)
            } else {
                cont.resumeWithException(e)
            }
        }
    }
}
It turns out that this await() function was already written (in a better way) and it's available in the kotlinx-coroutines-play-services library, so you don't need to actually write it yourself.
Solution to #2
If we could not rewrite the whole thing according to #3, we could deal with problem #2 this way:
private suspend fun computeAttendeesList(reminder: Reminder): List<User> {
    return reminder.usersToShare
        .map { friendId ->
            repoScope.async { Repository.getUser(friendId) }
        }
        .map { it.await() }
        .toList()
}
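As a self-contained illustration of why this approach avoids the thread-safety problem, the sketch below starts one async per ID and only assembles the list after awaiting all of them, so no coroutine ever mutates shared state. The User class and fetchUser() here are hypothetical stand-ins for the real model and Repository.getUser(), and coroutineScope is used instead of repoScope purely to keep the example runnable on its own.

```kotlin
import kotlinx.coroutines.*

data class User(val id: String)

// Hypothetical stand-in for Repository.getUser(); returns null for unknown IDs.
suspend fun fetchUser(id: String): User? {
    delay(10) // simulate network latency
    return if (id.isBlank()) null else User(id)
}

// Fetches all users concurrently; the result list is built only after
// every Deferred has completed, so there is no shared mutable list.
suspend fun fetchUsers(ids: List<String>): List<User> = coroutineScope {
    ids.distinct()
        .map { id -> async { fetchUser(id) } }
        .awaitAll()
        .filterNotNull()
}

fun main() = runBlocking {
    println(fetchUsers(listOf("a", "b", "a", ""))) // prints [User(id=a), User(id=b)]
}
```

Deduplicating the IDs up front replaces the contains() check from the question, and filterNotNull() drops IDs for which no user was found.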
Solution to #3
Instead, we could directly query Firebase for the whole list:
suspend fun getUsers(ids: List<String>): List<User> {
    val collectionReference = FirebaseFirestore.getInstance().collection(COLLECTION_USERS)
    val query = collectionReference.whereIn(ID, ids)
    return query.get().await().let { it.toObjects(User::class.java) }
}
And then consume it in a very basic way:
private suspend fun computeAttendeesList(reminder: Reminder): List<User> {
    return Repository.getUsers(reminder.usersToShare)
}
Alternatively, you could make this function blocking (remove suspend) and wrap your call in runBlocking (if you really need to block the current thread).
Note that this solution didn't enforce any dispatcher, so if you want a particular scope or dispatcher, you can wrap one of the suspend function calls with withContext.

Kotlin \ Android - LiveData async transformation prevent previous result

So I have a LiveData that I transform with an async function that takes a while to execute (sometimes 2 seconds, sometimes 4 seconds).
Sometimes the call takes long, sometimes it's really fast (depending on the results), and sometimes it's instant (empty result).
The problem is that if I have 2 consecutive emits in my LiveData, and the first result takes a while to compute while the second one is instant, then the second result is shown before the first and is later overwritten by the result of the earlier calculation.
What I want is more of a sequential effect (kind of like RxJava's concatMap).
private val _state = query.mapAsync(viewModelScope) { searchString ->
    if (searchString.isEmpty()) {
        NoSearch
    } else {
        val results = repo.search(searchString)
        if (results.isNotEmpty()) {
            Results(results.map { mapToMainResult(it, searchString) })
        } else {
            NoResults
        }
    }
}
@MainThread
fun <X, Y> LiveData<X>.mapAsync(
    scope: CoroutineScope,
    mapFunction: androidx.arch.core.util.Function<X, Y>
): LiveData<Y> {
    val result = MediatorLiveData<Y>()
    result.addSource(this) { x ->
        scope.launch(Dispatchers.IO) { result.postValue(mapFunction.apply(x)) }
    }
    return result
}
How do I prevent the result of the earlier calculation from overwriting the newer result?

@MainThread
fun <X, Y> LiveData<X>.mapAsync(
    scope: CoroutineScope,
    mapFunction: (X) -> Y,
): LiveData<Y> = switchMap { value ->
    liveData(scope.coroutineContext) {
        withContext(Dispatchers.IO) {
            emit(mapFunction(value))
        }
    }
}
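The switchMap-based version works because each new source value replaces the LiveData created for the previous one, so a stale result from an earlier, slower computation is no longer observed. The same "latest wins" behavior can be illustrated outside Android with Flow's mapLatest operator. This is only an analogy under that assumption, not the LiveData API itself, and search() and latestResults() are hypothetical names made up for the example.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Hypothetical search: the query "slow" takes much longer than any other.
suspend fun search(query: String): String {
    delay(if (query == "slow") 200L else 10L)
    return "results for $query"
}

@OptIn(ExperimentalCoroutinesApi::class)
suspend fun latestResults(queries: List<String>): List<String> =
    queries.asFlow()
        .mapLatest { search(it) } // a new query cancels the search still in flight
        .toList()

fun main() = runBlocking {
    println(latestResults(listOf("slow", "fast"))) // prints [results for fast]
}
```

The slow search is cancelled as soon as the second query arrives, so its result can never overwrite the newer one.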

Implement backoff strategy in flow

I'm trying to implement a backoff strategy using only Kotlin Flow.
I need to fetch data from timeA to timeB:
result = dataBetween(timeA - timeB)
If the result is empty, I want to increase the end of the time window using exponential backoff:
result = dataBetween(timeA - timeB + exponentialBackOffInDays)
I was following this article, which explains how to approach this in RxJava 2.
But I got stuck at the point where Flow does not yet have a takeUntil operator.
You can see my implementation below.
fun main() {
    runBlocking {
        (0..8).asFlow()
            .flatMapConcat { input ->
                // To simulate a data source which fetches data based on a time-window start-date to end-date
                // available within that time frame.
                flow {
                    println("Input: $input")
                    if (input < 5) {
                        emit(emptyList<String>())
                    } else { // After emitting this once the flow should complete
                        emit(listOf("Available"))
                    }
                }.retryWhenThrow(DummyException(), predicate = {
                    it.isNotEmpty()
                })
            }.collect {
                //println(it)
            }
    }
}
class DummyException : Exception("Collected size is empty")
private inline fun <T> Flow<T>.retryWhenThrow(
    throwable: Throwable,
    crossinline predicate: suspend (T) -> Boolean
): Flow<T> {
    return flow {
        collect { value ->
            if (!predicate(value)) {
                throw throwable // informing the upstream to keep emitting since the condition is met
            }
            println("Value: $value")
            emit(value)
        }
    }.catch { e ->
        if (e::class != throwable::class) throw e
    }
}
It works fine, except that even after the flow has produced a successful value, it continues to collect from the upstream flow all the way to 8. Ideally, it should stop as soon as it reaches 5.
Any help on how I should approach this would be appreciated.

Maybe this does not match your exact setup but instead of calling collect, you might as well just use first{...} or firstOrNull{...}
This will automatically stop the upstream flows after an element has been found.
For example:
flowOf(0, 0, 3, 10)
    .flatMapConcat {
        println("creating list with $it elements")
        flow {
            val listWithElementCount = MutableList(it) { "" } // just a list of n empty strings
            emit(listWithElementCount)
        }
    }.first { it.isNotEmpty() }
On a side note, your problem sounds like a regular suspend function would be a better fit.
Something like
suspend fun getFirstNonEmptyList(initialFrom: Long, initialTo: Long): List<Any> {
    var from = initialFrom
    var to = initialTo
    while (coroutineContext.isActive) {
        val elements = getElementsInRange(from, to) // your "dataBetween"
        if (elements.isNotEmpty()) return elements
        val (newFrom, newTo) = nextBackoff(from, to)
        from = newFrom
        to = newTo
    }
    throw CancellationException()
}
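To show how the suspend-function approach could look end to end, here is a minimal runnable sketch with an exponentially growing end of the window. The dataBetween() function below is a hypothetical data source that only has data once the window reaches day 20, and the doubling step and maxAttempts limit are assumptions made for the example, not part of the original question.

```kotlin
import kotlinx.coroutines.*

// Hypothetical data source: data only exists once the window reaches day 20.
suspend fun dataBetween(fromDay: Long, toDay: Long): List<String> {
    delay(1) // simulate I/O
    return if (toDay >= 20) listOf("Available") else emptyList()
}

// Widens the end of the window by 1, 2, 4, ... days until data is found
// or maxAttempts queries have been made.
suspend fun firstNonEmpty(fromDay: Long, toDay: Long, maxAttempts: Int = 10): List<String> {
    var end = toDay
    var step = 1L
    repeat(maxAttempts) {
        val result = dataBetween(fromDay, end)
        if (result.isNotEmpty()) return result // stops immediately, no further queries
        end += step
        step *= 2 // exponential backoff of the window size
    }
    return emptyList()
}

fun main() = runBlocking {
    println(firstNonEmpty(fromDay = 0, toDay = 5)) // prints [Available]
}
```

Starting from day 5, the window end moves through 6, 8, 12 and 20, so the fifth query succeeds and nothing runs after it, which is the early-stop behavior the flow version was missing.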

Kotlin coroutines and Java Completable future integration

Usually I use the standard kotlin-jdk8 library to jump from the world of Java's *Future APIs into Kotlin's suspend heaven.
It worked great for me until I encountered the Neo4J cursor API, where I can't call .await() on the completion stage because it immediately starts fetching millions of records into memory.
The Kotlin way, like this, does not work for me:
suspend fun query() {
    driver.session().use { session ->
        val cursor: StatementResultCursor = session.readTransactionAsync {
            it.runAsync("query ...", params)
        }.await() // HERE WE DIE WITH OOM
        var record = cursor.nextAsync().await()
        while (record != null) {
            val node = record.get("node")
            mySuspendProcessingFunction(node)
            record = cursor.nextAsync().await()
        }
    }
}
At the same time, the Java API works well and fetches records one by one:
suspend fun query() {
    session.readTransactionAsync { transaction ->
        transaction.runAsync("query ...", params).thenCompose { cursor ->
            cursor.forEachAsync { record ->
                runBlocking { // BUT I NEED TO DO RUN BLOCKING HERE :(
                    val node = record.get("node")
                    mySuspendProcessingFunction(node)
                }
            }
        }
    }.thenCompose {
        session.closeAsync()
    }.await()
}
The second option works for me, but it is pretty ugly and definitely not the Kotlin way. More importantly, I need to use runBlocking, even though this whole block is executed within a suspend function.
What am I doing wrong? Is there a better way?
UPD
I tried to do this exercise using the new Flow feature; unfortunately, the results are the same:
suspend fun query() {
    session.readTransactionAsync { transaction ->
        transaction.runAsync(query, params).thenApply { cursor ->
            cursor.asFlow().onEach { record ->
                val node = record.get("node")
                mySuspendProcessingFunction(node)
            }
        }
    }.thenCompose {
        session.closeAsync()
    }.await()
}
fun StatementResultCursor.asFlow() = flow {
    do {
        val record = nextAsync().await()
        if (record != null) emit(record)
    } while (record != null)
}

Kotlin coroutines progress counter

I'm making thousands of HTTP requests using async/await and would like to have a progress indicator. I've added one in a naive way, but noticed that the counter value never reaches the total when all requests are done. So I've created a simple test and, sure enough, it doesn't work as expected:
fun main(args: Array<String>) {
    var i = 0
    val range = (1..100000)
    range.map {
        launch {
            ++i
        }
    }
    println("$i ${range.count()}")
}
The output is something like this, where the first number always changes:
98800 100000
I'm probably missing some important detail about concurrency/synchronization in JVM/Kotlin, but don't know where to start. Any tips?
UPDATE: I ended up using channels as Marko suggested:
/**
* Asynchronously fetches stats for all symbols and sends a total number of requests
* to the `counter` channel each time a request completes. For example:
*
* val counterActor = actor<Int>(UI) {
* var counter = 0
* for (total in channel) {
* progressLabel.text = "${++counter} / $total"
* }
* }
*/
suspend fun getAssetStatsWithProgress(counter: SendChannel<Int>): Map<String, AssetStats> {
    val symbolMap = getSymbols()?.let { it.map { it.symbol to it }.toMap() } ?: emptyMap()
    val total = symbolMap.size
    return symbolMap.map { async { getAssetStats(it.key) } }
        .mapNotNull { it.await().also { counter.send(total) } }
        .map { it.symbol to it }
        .toMap()
}

The explanation of what exactly makes your approach fail is secondary: the primary thing is fixing the approach.
Instead of async-await or launch, for this communication pattern you should have an actor to which all the HTTP jobs send their status. This will automatically handle all your concurrency issues.
Here's some sample code, taken from the link you provided in the comment and adapted to your use case. Instead of some third party asking it for the counter value and updating the GUI with it, the actor runs in the UI context and updates the GUI itself:
import kotlinx.coroutines.experimental.*
import kotlinx.coroutines.experimental.channels.*
import kotlin.system.*
import kotlin.coroutines.experimental.*
object IncCounter
fun counterActor() = actor<IncCounter>(UI) {
    var counter = 0
    for (msg in channel) {
        updateView(++counter)
    }
}
fun main(args: Array<String>) = runBlocking {
    val counter = counterActor()
    massiveRun(CommonPool) {
        counter.send(IncCounter)
    }
    counter.close()
    println("View state: $viewState")
}
// Everything below is mock code that supports the example
// code above:
val UI = newSingleThreadContext("UI")
fun updateView(newVal: Int) {
    viewState = newVal
}
var viewState = 0
suspend fun massiveRun(context: CoroutineContext, action: suspend () -> Unit) {
    val numCoroutines = 1000
    val repeatActionCount = 1000
    val time = measureTimeMillis {
        val jobs = List(numCoroutines) {
            launch(context) {
                repeat(repeatActionCount) { action() }
            }
        }
        jobs.forEach { it.join() }
    }
    println("Completed ${numCoroutines * repeatActionCount} actions in $time ms")
}
Running it prints
Completed 1000000 actions in 2189 ms
View state: 1000000
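The sample above targets the old kotlinx.coroutines.experimental packages. As a rough sketch of the same idea against the stable API, the counter can be confined to a single coroutine that reads from a Channel, while the workers only send to it. The countCompletions name and the use of Unit as the message type are assumptions made for this illustration.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel

// Counts completed tasks by funnelling "done" signals through a channel.
suspend fun countCompletions(tasks: Int): Int = coroutineScope {
    val ticks = Channel<Unit>()
    val counter = async {
        var done = 0
        for (tick in ticks) done++ // only this coroutine ever touches `done`
        done
    }
    val workers = List(tasks) {
        launch(Dispatchers.Default) {
            // ... the real work (e.g. an HTTP request) would go here ...
            ticks.send(Unit)
        }
    }
    workers.joinAll() // wait until every worker has reported
    ticks.close()     // ends the counting loop
    counter.await()
}

fun main() = runBlocking {
    println(countCompletions(10_000)) // prints 10000
}
```

Since the mutable counter lives inside one coroutine, no atomics or locks are needed; in a UI application the counting loop would be the place to update the progress label.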

You're losing writes because i++ is not an atomic operation - the value has to be read, incremented, and then written back - and you have multiple threads reading and writing i at the same time. (If you don't provide launch with a context, it uses a threadpool by default.)
You're losing 1 from your count every time two threads read the same value as they will then both write that value plus one.
Synchronizing in some way, for example by using an AtomicInteger, solves this:
fun main(args: Array<String>) {
    val i = AtomicInteger(0)
    val range = (1..100000)
    range.map {
        launch {
            i.incrementAndGet()
        }
    }
    println("$i ${range.count()}") // 100000 100000
}
There's also no guarantee that these background threads will be done with their work by the time you print the result and your program ends - you can test it easily by adding just a very small delay inside launch, a couple milliseconds. With that, it's a good idea to wrap this all in a runBlocking call which will keep the main thread alive and then wait for the coroutines to all finish:
fun main(args: Array<String>) = runBlocking {
    val i = AtomicInteger(0)
    val range = (1..100000)
    val jobs: List<Job> = range.map {
        launch {
            i.incrementAndGet()
        }
    }
    jobs.forEach { it.join() }
    println("$i ${range.count()}") // 100000 100000
}

Have you read Coroutines basics? It covers the exact same problem as yours:
val c = AtomicInteger()
for (i in 1..1_000_000)
    launch {
        c.addAndGet(i)
    }
println(c.get())
This example completes in less than a second for me, but it prints some arbitrary number, because some coroutines don't finish before main() prints the result.
Because launch is not blocking, there's no guarantee that all of the coroutines will finish before println. You need to use async, store the Deferred objects and await them before printing.
