Kotlin coroutines and Java Completable future integration - kotlin

Usually I'm using standard kotlin-jdk8 library to jump from Java *future API world into the Kotlin's suspend heaven.
And it worked great for me, until I encountered Neo4J cursor API, where I can't do .await() on the completion stage, because it immediately starts fetching millions of records into memory.
Kotlin way does not work for me, like this:
suspend fun query() {
driver.session().use { session ->
val cursor: StatementResultCursor = session.readTransactionAsync {
it.runAsync("query ...", params)
}.await() // HERE WE DIE WITH OOM
var record = cursor.nextAsync().await()
while (record != null) {
val node = record.get("node")
mySuspendProcessingFunction(node)
record = cursor.nextAsync().await()
}
}
}
At the same time, Java API works good, we fetch records one by one:
suspend fun query() {
session.readTransactionAsync { transaction ->
transaction.runAsync("query ...", params).thenCompose { cursor ->
cursor.forEachAsync { record ->
runBlocking { // BUT I NEED TO DO RUN BLOCKING HERE :(
val node = record.get("node")
mySuspendProcessingFunction(node)
}
}
}
}.thenCompose {
session.closeAsync()
}.await()
}
The second option works for me, but it is pretty ugly - definitely not Kotlin way, and what is more important, I need to use runBlocking (but these whole block is executed within suspend function)
What am I doing wrong? Is there a better way?
UPD
Tried to do this exercise using new Flow() feature, unfortunately results are the same:
suspend fun query() {
session.readTransactionAsync { transaction ->
transaction.runAsync(query, params).thenApply { cursor ->
cursor.asFlow().onEach { record ->
val node = record.get("node")
mySuspendProcessingFunction(node)
}
}
}.thenCompose {
session.closeAsync()
}.await()
}
fun StatementResultCursor.asFlow() = flow {
do {
val record = nextAsync().await()
if (record != null) emit(record)
} while (record != null)
}

Related

How to efficiently perform concurrent computation with coroutines

I'm trying to improve my knowledge of coroutines and currently working on following problem:
Given a random non empty string with a length of 14 characters, what would be the most efficient way to find a string that contains a specific prefix (let's assume prefix length is 5)?
Most of the solutions I encountered on the internet either a) manually launch async{} 2 or 3 times or b) launch async{} in a loop and then await all of them to complete which won't work for this scenario.
One approach I tried was to launch new coroutines until I get a non null repsonse from the computation function and cancel the scope after, however there's a clear a performance issue that I'm not seeing since this approach can take more than 20s to calculate for a prefix with length 1.
...
private val _flow = MutableSharedFlow<String>()
suspend fun invoke(prefix: String) = withContext(dispatcher) { // dispatcher is Dispatchers.Default
_flow.onEach {
println("String is=$it")
this.cancel()
}.launchIn(this)
repeat(Int.MAX_VALUE) {
launch {
getString(prefix)?.let {
_flow.emit(it)
}
}
}
}
private fun getString(prefix: String): String? { // or any other cpu intensive task
val randomString = generateRandomStringAccordingToSpecs() // implemented elsewhere
if (randomString .startsWith(prefix = "prefix", ignoreCase = true)) {
return randomString
} else {
return null
}
}
I also tried an approach with a while loop and 4 parallel executions, for which I'm getting better performace results, however awaiting after every X calculations doesn't seem like the most efficient solution to me:
suspend fun invoke(prefix: String) = withContext(dispatcher) {
var resultString: String? = getString(prefix)
while (resultString == null) {
val tasks = listOf(
async { getString(prefix) },
async { getString(prefix) },
async { getString(prefix) },
async { getString(prefix) }
)
resultString = tasks.awaitAll().filterNotNull().firstOrNull()
}
println("String is=$resultString")
}
private fun getString(prefix: String): String? { // or any other cpu intensive task
val randomString = generateRandomStringAccordingToSpecs() // implemented elsewhere
if (randomString .startsWith(prefix = "prefix", ignoreCase = true)) {
return randomString
} else {
return null
}
}
In the example above I'm using a find suffix problem, but in general, what is the most efficient way to concurrently perform some CPU intensive calculations with coroutines?
Especially for the calculations where we don't know how many times the task must be executed before we get an answer.
This seems like a job for the select function. Assuming your generateRandomStringAccordingToSpecs() is a computationally blocking function, you want to have all your CPU cores working on the problem simultaneously and you just want the first valid result, you could build an operator like this:
suspend fun <T> getFirstResult(block: suspend CoroutineScope.() -> T): T =
withContext(Dispatchers.Default) {
coroutineScope {
select {
repeat(Runtime.getRuntime().availableProcessors()) {
async { block() }.onAwait {
coroutineContext.cancelChildren()
it
}
}
}
}
}
It starts as many parallel coroutines as there are CPUs, and once any of them returns a result, it cancels the rest and returns that result.
So you can use this with a coroutine block that uses a while loop indefinitely until a result is returned:
suspend fun invoke(prefix: String) = getFirstResult {
while(isActive) {
return#getFirstResult getString(prefix) ?: continue
}
}

How to wait for a flow to complete emitting the values

I have a function "getUser" in my Repository which emits an object representing a user based on the provided id.
flow function
fun getUser(id: String) = callbackFlow {
val collectionReference: CollectionReference =
FirebaseFirestore.getInstance().collection(COLLECTION_USERS)
val query: Query = collectionReference.whereEqualTo(ID, id)
query.get().addOnSuccessListener {
val lst = it.toObjects(User::class.java)
if (lst.isEmpty())
offer(null)
else
offer(it.toObjects(User::class.java)[0])
}
awaitClose()
}
I need these values in another class. I loop over a list of ids and I add the collected user to a new list. How can I wait for the list to be completed when I collect the values, before calling return?
collector function
private fun computeAttendeesList(reminder: Reminder): ArrayList<User> {
val attendeesList = arrayListOf<User>()
for (friend in reminder.usersToShare) {
repoScope.launch {
Repository.getUser(friend).collect {
it?.let { user ->
if (!attendeesList.contains(user))
attendeesList.add(user)
}
}
}
}
return attendeesList
}
I do not want to use live data since this is not a UI-related class.
There are multiple problems to address in this code:
getUser() is meant to return a single User, but it currently returns a Flow<User>
which will never end, and never return more than one user.
the way the list of users is constructed from multiple concurrent query is not thread safe (because multiple launches are executed on the multi-threaded IO dispatcher, and they all update the same unsafe list directly)
the actual use case is to get a list of users from Firebase, but many queries for a single ID are used instead of a single query
Solution to #1
Let's tackle #1 first. Here is a version of getUser() that suspends for a single User instead of returning a Flow:
suspend fun getUser(id: String): User {
val collectionReference = FirebaseFirestore.getInstance().collection(COLLECTION_USERS)
val query = collectionReference.whereEqualTo(ID, id)
return query.get().await().let { it.toObjects(User::class.java) }.firstOrNull()
}
// use the kotlinx-coroutines-play-services library instead
private suspend fun <T> Task<T>.await(): T {
return suspendCancellableCoroutine { cont ->
addOnCompleteListener {
val e = exception
if (e == null) {
#Suppress("UNCHECKED_CAST")
if (isCanceled) cont.cancel() else cont.resume(result as T)
} else {
cont.resumeWithException(e)
}
}
}
}
It turns out that this await() function was already written (in a better way) and it's available in the kotlinx-coroutines-play-services library, so you don't need to actually write it yourself.
Solution to #2
If we could not rewrite the whole thing according to #3, we could deal with problem #2 this way:
private suspend fun computeAttendeesList(reminder: Reminder): List<User> {
return reminder.usersToShare
.map { friendId ->
repoScope.async { Repository.getUser(friendId) }
}
.map { it.await() }
.toList()
}
Solution to #3
Instead, we could directly query Firebase for the whole list:
suspend fun getUsers(ids: List<String>): List<User> {
val collectionReference = FirebaseFirestore.getInstance().collection(COLLECTION_USERS)
val query = collectionReference.whereIn(ID, ids)
return query.get().await().let { it.toObjects(User::class.java) }
}
And then consume it in a very basic way:
private suspend fun computeAttendeesList(reminder: Reminder): List<User> {
return Repository.getUsers(reminder.usersToShare)
}
Alternatively, you could make this function blocking (remove suspend) and wrap your call in runBlocking (if you really need to block the current thread).
Note that this solution didn't enforce any dispatcher, so if you want a particular scope or dispatcher, you can wrap one of the suspend function calls with withContext.

Implement backoff strategy in flow

I'm trying to implement a backoff strategy just using kotlin flow.
I need to fetch data from timeA to timeB
result = dataBetween(timeA - timeB)
if the result is empty then I want to increase the end time window using exponential backoff
result = dataBetween(timeA - timeB + exponentialBackOffInDays)
I was following this article which is explaining how to approach this in rxjava2.
But got stuck at a point where flow does not have takeUntil operator yet.
You can see my implementation below.
fun main() {
runBlocking {
(0..8).asFlow()
.flatMapConcat { input ->
// To simulate a data source which fetches data based on a time-window start-date to end-date
// available with in that time frame.
flow {
println("Input: $input")
if (input < 5) {
emit(emptyList<String>())
} else { // After emitting this once the flow should complete
emit(listOf("Available"))
}
}.retryWhenThrow(DummyException(), predicate = {
it.isNotEmpty()
})
}.collect {
//println(it)
}
}
}
class DummyException : Exception("Collected size is empty")
private inline fun <T> Flow<T>.retryWhenThrow(
throwable: Throwable,
crossinline predicate: suspend (T) -> Boolean
): Flow<T> {
return flow {
collect { value ->
if (!predicate(value)) {
throw throwable // informing the upstream to keep emitting since the condition is met
}
println("Value: $value")
emit(value)
}
}.catch { e ->
if (e::class != throwable::class) throw e
}
}
It's working fine except even after the flow has a successful value the flow continue to collect till 8 from the upstream flow but ideally, it should have stopped when it reaches 5 itself.
Any help on how I should approach this would be helpful.
Maybe this does not match your exact setup but instead of calling collect, you might as well just use first{...} or firstOrNull{...}
This will automatically stop the upstream flows after an element has been found.
For example:
flowOf(0,0,3,10)
.flatMapConcat {
println("creating list with $it elements")
flow {
val listWithElementCount = MutableList(it){ "" } // just a list of n empty strings
emit(listWithElementCount)
}
}.first { it.isNotEmpty() }
On a side note, your problem sounds like a regular suspend function would be a better fit.
Something like
suspend fun getFirstNonEmptyList(initialFrom: Long, initialTo: Long): List<Any> {
var from = initialFrom
var to = initialTo
while (coroutineContext.isActive) {
val elements = getElementsInRange(from, to) // your "dataBetween"
if (elements.isNotEmpty()) return elements
val (newFrom, newTo) = nextBackoff(from, to)
from = newFrom
to = newTo
}
throw CancellationException()
}

Equivalent of RxJava .toList() in Kotlin coroutines flow

I have a situation where I need to observe userIds then use those userIds to observe users. Either userIds or users could change at any time and I want to keep the emitted users up to date.
Here is an example of the sources of data I have:
data class User(val name: String)
fun observeBestUserIds(): Flow<List<String>> {
return flow {
emit(listOf("abc", "def"))
delay(500)
emit(listOf("123", "234"))
}
}
fun observeUserForId(userId: String): Flow<User> {
return flow {
emit(User("${userId}_name"))
delay(2000)
emit(User("${userId}_name_updated"))
}
}
In this scenario I want the emissions to be:
[User(abc_name), User(def_name)], then
[User(123_name), User(234_name)], then
[User(123_name_updated), User(234_name_updated)]
I think I can achieve this in RxJava like this:
observeBestUserIds.concatMapSingle { ids ->
Observable.fromIterable(ids)
.concatMap { id ->
observeUserForId(id)
}
.toList()
}
What function would I write to make a flow that emits that?
I believe you're looking for combine, which gives you an array that you can easily call toList() on:
observeBestUserIds().collectLatest { ids ->
combine(
ids.map { id -> observeUserForId(id) }
) {
it.toList()
}.collect {
println(it)
}
}
And here's the inner part with more explicit parameter names since you can't see the IDE's type hinting on Stack Overflow:
combine(
ids.map { id -> observeUserForId(id) }
) { arrayOfUsers: Array<User> ->
arrayOfUsers.toList()
}.collect { listOfUsers: List<User> ->
println(listOfUsers)
}
Output:
[User(name=abc_name), User(name=def_name)]
[User(name=123_name), User(name=234_name)]
[User(name=123_name_updated), User(name=234_name)]
[User(name=123_name_updated), User(name=234_name_updated)]
Live demo (note that in the demo, all the output appears at once, but this is a limitation of the demo site - the lines appear with the timing you'd expect when the code is run locally)
This avoids the (abc_name_updated, def_name_updated) discussed in the original question. However, there's still an intermediate emission with 123_name_updated and 234_name because the 123_name_updated is emitted first and it sends the combined version immediately because they're the latest from each flow.
However, this can be avoided by debouncing the emissions (on my machine, a timeout as small as 1ms works, but I did 20ms to be conservative):
observeBestUserIds().collectLatest { ids ->
combine(
ids.map { id -> observeUserForId(id) }
) {
it.toList()
}.debounce(timeoutMillis = 20).collect {
println(it)
}
}
which gets you the exact output you wanted:
[User(name=abc_name), User(name=def_name)]
[User(name=123_name), User(name=234_name)]
[User(name=123_name_updated), User(name=234_name_updated)]
Live demo
This is unfortunatly non trivial with the current state of kotlin Flow, there seem to be important operators missing. But please notice that you are not looking for rxJavas toList(). If you would try to to do it with toList and concatMap in rxjava you would have to wait till all observabes finish.
This is not what you want.
Unfortunately for you I think there is no way around a custom function.
It would have to aggregate all the results returned by observeUserForId for all the ids which you would pass to it. It would also not be a simple windowing function, since in reality it is conceivable that one observeUserForId already returned twice and another call still didn't finish. So checking whether you already have the same number of users as you passed ids into your aggregating functions isn't enought, you also have to group by user id.
I'll try to add code later today.
Edit: As promised here is my solution I took the liberty of augmenting the requirements slightly. So the flow will emit every time all userIds have values and an underlying user changes. I think this is more likely what you want since users probably don't change properties in lockstep.
Nevertheless if this is not what you want leave a comment.
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.runBlocking
data class User(val name: String)
fun observeBestUserIds(): Flow<List<String>> {
return flow {
emit(listOf("abc", "def"))
delay(500)
emit(listOf("123", "234"))
}
}
fun observeUserForId(userId: String): Flow<User> {
return flow {
emit(User("${userId}_name"))
delay(2000)
emit(User("${userId}_name_updated"))
}
}
inline fun <reified K, V> buildMap(keys: Set<K>, crossinline valueFunc: (K) -> Flow<V>): Flow<Map<K, V>> = flow {
val keysSize = keys.size
val valuesMap = HashMap<K, V>(keys.size)
flowOf(*keys.toTypedArray())
.flatMapMerge { key -> valueFunc(key).map {v -> Pair(key, v)} }
.collect { (key, value) ->
valuesMap[key] = value
if (valuesMap.keys.size == keysSize) {
emit(valuesMap.toMap())
}
}
}
fun observeUsersForIds(): Flow<List<User>> {
return observeBestUserIds().flatMapLatest { ids -> buildMap(ids.toSet(), ::observeUserForId as (String) -> Flow<User>) }
.map { m -> m.values.toList() }
}
fun main() = runBlocking {
observeUsersForIds()
.collect { user ->
println(user)
}
}
This will return
[User(name=def_name), User(name=abc_name)]
[User(name=123_name), User(name=234_name)]
[User(name=123_name_updated), User(name=234_name)]
[User(name=123_name_updated), User(name=234_name_updated)]
You can run the code online here
You can use flatMapConcat
val users = observeBestUserIds()
.flatMapConcat { ids ->
flowOf(*ids.toTypedArray())
.map { id ->
observeUserForId(id)
}
}
.flattenConcat()
.toList()
or
observeBestUserIds()
.flatMapConcat { ids ->
flowOf(*ids.toTypedArray())
.map { id ->
observeUserForId(id)
}
}
.flattenConcat()
.collect { user ->
}

Kotlin Process Collection In Parallel?

I have a collection of objects, which I need to perform some transformation on. Currently I am using:
var myObjects: List<MyObject> = getMyObjects()
myObjects.forEach{ myObj ->
someMethod(myObj)
}
It works fine, but I was hoping to speed it up by running someMethod() in parallel, instead of waiting for each object to finish, before starting on the next one.
Is there any way to do this in Kotlin? Maybe with doAsyncTask or something?
I know when this was asked over a year ago it was not possible, but now that Kotlin has coroutines like doAsyncTask I am curious if any of the coroutines can help
Yes, this can be done using coroutines. The following function applies an operation in parallel on all elements of a collection:
fun <A>Collection<A>.forEachParallel(f: suspend (A) -> Unit): Unit = runBlocking {
map { async(CommonPool) { f(it) } }.forEach { it.await() }
}
While the definition itself is a little cryptic, you can then easily apply it as you would expect:
myObjects.forEachParallel { myObj ->
someMethod(myObj)
}
Parallel map can be implemented in a similar way, see https://stackoverflow.com/a/45794062/1104870.
Java Stream is simple to use in Kotlin:
tasks.stream().parallel().forEach { computeNotSuspend(it) }
If you are using Android however, you cannot use Java 8 if you want an app compatible with an API lower than 24.
You can also use coroutines as you suggested. But it's not really part of the language as of now (August 2017) and you need to install an external library. There is very good guide with examples.
runBlocking<Unit> {
val deferreds = tasks.map { async(CommonPool) { compute(it) } }
deferreds.forEach { it.await() }
}
Note that coroutines are implemented with non-blocking multi-threading, which mean they can be faster than traditional multi-threading. I have code below benchmarking the Stream parallel versus coroutine and in that case the coroutine approach is 7 times faster on my machine. However you have to do some work yourself to make sure your code is "suspending" (non-locking) which can be quite tricky. In my example I'm just calling delay which is a suspend function provided by the library. Non-blocking multi-threading is not always faster than traditional multi-threading. It can be faster if you have many threads doing nothing but waiting on IO, which is kind of what my benchmark is doing.
My benchmarking code:
import kotlinx.coroutines.experimental.CommonPool
import kotlinx.coroutines.experimental.async
import kotlinx.coroutines.experimental.delay
import kotlinx.coroutines.experimental.launch
import kotlinx.coroutines.experimental.runBlocking
import java.util.*
import kotlin.system.measureNanoTime
import kotlin.system.measureTimeMillis
class SomeTask() {
val durationMS = random.nextInt(1000).toLong()
companion object {
val random = Random()
}
}
suspend fun compute(task: SomeTask): Unit {
delay(task.durationMS)
//println("done ${task.durationMS}")
return
}
fun computeNotSuspend(task: SomeTask): Unit {
Thread.sleep(task.durationMS)
//println("done ${task.durationMS}")
return
}
fun main(args: Array<String>) {
val n = 100
val tasks = List(n) { SomeTask() }
val timeCoroutine = measureNanoTime {
runBlocking<Unit> {
val deferreds = tasks.map { async(CommonPool) { compute(it) } }
deferreds.forEach { it.await() }
}
}
println("Coroutine ${timeCoroutine / 1_000_000} ms")
val timePar = measureNanoTime {
tasks.stream().parallel().forEach { computeNotSuspend(it) }
}
println("Stream parallel ${timePar / 1_000_000} ms")
}
Output on my 4 cores computer:
Coroutine: 1037 ms
Stream parallel: 7150 ms
If you uncomment out the println in the two compute functions you will see that in the non-blocking coroutine code the tasks are processed in the right order, but not with Streams.
You can use RxJava to solve this.
List<MyObjects> items = getList()
Observable.from(items).flatMap(object : Func1<MyObjects, Observable<String>>() {
fun call(item: MyObjects): Observable<String> {
return someMethod(item)
}
}).subscribeOn(Schedulers.io()).observeOn(AndroidSchedulers.mainThread()).subscribe(object : Subscriber<String>() {
fun onCompleted() {
}
fun onError(e: Throwable) {
}
fun onNext(s: String) {
// do on output of each string
}
})
By subscribing on Schedulers.io(), some method is scheduled on background thread.
To process items of a collection in parallel you can use Kotlin Coroutines. For example the following extension function processes items in parallel and waits for them to be processed:
suspend fun <T, R> Iterable<T>.processInParallel(
dispatcher: CoroutineDispatcher = Dispatchers.IO,
processBlock: suspend (v: T) -> R,
): List<R> = coroutineScope { // or supervisorScope
map {
async(dispatcher) { processBlock(it) }
}.awaitAll()
}
This is suspend extension function on Iterable<T> type, which does a parallel processing of items and returns some result of processing each item. By default it uses Dispatchers.IO dispatcher to offload blocking tasks to a shared pool of threads. Must be called from a coroutine (including a coroutine with Dispatchers.Main dispatcher) or another suspend function.
Example of calling from a coroutine:
val myObjects: List<MyObject> = getMyObjects()
someCoroutineScope.launch {
val results = myObjects.processInParallel {
someMethod(it)
}
// use processing results
}
where someCoroutineScope is an instance of CoroutineScope.
Or if you want to just launch and forget you can use this function:
fun <T> CoroutineScope.processInParallelAndForget(
iterable: Iterable<T>,
dispatcher: CoroutineDispatcher = Dispatchers.IO,
processBlock: suspend (v: T) -> Unit
) = iterable.forEach {
launch(dispatcher) { processBlock(it) }
}
This is an extension function on CoroutineScope, which doesn't return any result. It also uses Dispatchers.IO dispatcher by default. Can be called using CoroutineScope or from another coroutine.
Calling example:
someoroutineScope.processInParallelAndForget(myObjects) {
someMethod(it)
}
// OR from another coroutine:
someCoroutineScope.launch {
processInParallelAndForget(myObjects) {
someMethod(it)
}
}
where someCoroutineScope is an instance of CoroutineScope.