How to use Kotlin's coroutines with collections

I'm fairly new to Kotlin and its coroutines module, and I'm trying to do something that seemed pretty simple to me at first.
I have a function (the getCostlyList() below) that returns a List after some costly computation. This method is called multiple times sequentially. The results of all these calls are then merged into a Set.
private fun myFun(): Set<Int> {
    return (1..10)
        .flatMap { getCostlyList() }
        .toSet()
}
private fun getCostlyList(): List<Int> {
    // omitting costly code here...
    return listOf(...)
}
My goal would be to use coroutines to make these calls to this costly method asynchronously, but I am having trouble wrapping my head around this issue.

You can write something like this:
private suspend fun myFun(): Set<Int> = coroutineScope {
    (1..10)
        .map { async { getCostlyList() } }
        .awaitAll()
        .flatten()
        .toSet()
}
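One caveat worth spelling out: coroutineScope inherits the caller's context, so if the caller runs on a single thread (for example a plain runBlocking) and getCostlyList() is blocking code, the async blocks still run one after another. A minimal sketch of one way around that, assuming the costly work is CPU-bound. The stub body, the seed parameter, and the choice of Dispatchers.Default are assumptions for illustration, not part of the original answer:

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical stand-in for the costly computation from the question.
private fun getCostlyList(seed: Int): List<Int> {
    Thread.sleep(100) // simulate blocking, CPU-heavy work
    return listOf(seed, seed + 1)
}

// withContext(Dispatchers.Default) moves the work onto a shared thread pool,
// so the ten calls can run in parallel on multiple cores.
suspend fun myFun(): Set<Int> = withContext(Dispatchers.Default) {
    (1..10)
        .map { i -> async { getCostlyList(i) } }
        .awaitAll()
        .flatten()
        .toSet()
}

fun main() = runBlocking {
    println(myFun()) // the set of 1..11
}
```

If the costly code already suspends (for example, it calls a suspending HTTP client), the plain coroutineScope version is enough.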

Related

Why is this extension function slower than non extension counterpart?

I was trying to write a parallel map extension function to do map operation over a List in parallel using coroutines.
However there is a significant overhead in my solution and I can't find out why.
This is my implementation of the pmap extension function:
fun <T, U> List<T>.pmap(scope: CoroutineScope = GlobalScope,
                        transform: suspend (T) -> U): List<U> {
    return map { i -> scope.async { transform(i) } }.map { runBlocking { it.await() } }
}
However, compared with doing the exact same operation inline at the call site, the extension function takes up to an extra 100 ms (which is a lot).
I tried using inline but it had no effect.
I'm leaving here the full test I've done to demonstrate this behavior:
import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis
fun main() {
test()
}
fun <T, U> List<T>.pmap(scope: CoroutineScope = GlobalScope,
transform: suspend (T) -> U): List<U> {
return this.map { i -> scope.async { transform(i) } }.map { runBlocking { it.await() } }
}
fun test() {
val list = listOf<Long>(100,200,300)
val transform: suspend (Long) -> Long = { long: Long ->
delay(long)
long*2
}
val timeTakenPmap = measureTimeMillis {
list.pmap(GlobalScope) { transform(it) }
}
val manualpmap = measureTimeMillis {
list.map { GlobalScope.async { transform(it) } }
.map { runBlocking { it.await() } }
}
val timeTakenMap = measureTimeMillis {
list.map { runBlocking { transform(it) } }
}
println("pmapTime: $timeTakenPmap - mapTime: $timeTakenMap - manualpmap: $manualpmap")
}
It can be run in kotlin playground: https://pl.kotl.in/CIXVqezg3
In the playground it prints this result:
pmapTime: 411 - mapTime: 602 - manualpmap: 302
MapTime and manualPmap give reasonable results, only 2ms of time outside the delays. But pmapTime is way off. And the code between manualpmap and pmap looks exactly the same to me.
In my own machine it runs a little faster, pmap takes around 350ms.
Does anyone know why this happens?
First of all, manual benchmarks like this are usually of very little significance. There are many things that can be optimized away by the compiler or the JIT and any conclusion can be quite wrong. If you really want to compare things, you should instead use benchmarking libraries which take into account JVM warmup etc.
Now, the overhead you see (if you could confirm there was an actual overhead) might be caused by the fact that your higher-order extension is not marked inline, so instances of the lambda you pass need to be created - but as @Tenfour04 noted there are many other possible reasons: thread pool lazy initialization, significance of the list size, etc.
That being said, this is really not an appropriate way to write parallel map, for several reasons:
GlobalScope is a pretty bad default in general, and should be used in very specific situations only. But don't worry about it because of the next point.
You don't need an externally provided CoroutineScope if the coroutines you launch do not outlive your method. Instead, use coroutineScope { ... } and make your function suspend, and the caller will choose the context if they need to.
map { it.await() } is inefficient in case of errors: if the last element's transformation immediately fails, map will wait for all previous elements to finish before failing. You should prefer awaitAll which takes care of this.
runBlocking should be avoided in coroutines (blocking threads in general, especially when you don't control which thread you're blocking), so using it in deep library-like functions like this is dangerous, because it will likely be used in coroutines at some point.
Applying those points gives:
// crossinline is required because transform is called from inside the async lambda
suspend inline fun <T, U> List<T>.pmap(crossinline transform: suspend (T) -> U): List<U> {
    return coroutineScope {
        map { async { transform(it) } }.awaitAll()
    }
}
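As a usage sketch, here is a self-contained variant that can be pasted into a playground. It drops the inline modifier for brevity (an assumption made to keep the example short, not a recommendation) and reuses the delays from the question:

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlin.system.measureTimeMillis

// Non-inline variant of the parallel map, scoped to the calling coroutine.
suspend fun <T, U> List<T>.pmap(transform: suspend (T) -> U): List<U> = coroutineScope {
    map { async { transform(it) } }.awaitAll()
}

fun main() = runBlocking {
    val elapsed = measureTimeMillis {
        val doubled = listOf(100L, 200L, 300L).pmap { delay(it); it * 2 }
        println(doubled) // [200, 400, 600]
    }
    // The delays overlap, so this should be close to the longest one
    // rather than their sum.
    println("took $elapsed ms")
}
```

Because the function is a suspend function, the caller decides the context, for example by wrapping the call in withContext(Dispatchers.Default) when the transform is CPU-bound.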

Mix and match Coroutines and Rxjava

Coroutines and RxJava3
I have the following method that first makes a call to a suspend method and in the same launch scope I make 2 calls to RxJava.
I am wondering if there is a way to remove the Rxjava code out of the viewModelScope.launch scope and return the result of fetchRecentUseCase.execute().
Basically, is it possible for the viewModelScope.launch to return the listOfProducts rather than doing everything in the launch scope?
fun loadRecentlyViewed() {
viewModelScope.launch {
val listOfProducts = withContext(Dispatchers.IO) {
fetchRecentUseCase.execute()
}
val listOfSkus = listOfProducts.map { it.sku }
if (listOfSkus.isNotEmpty()) {
loadProductUseCase.execute(listOfSkus)
.subscribeOn(schedulersFacade.io)
.flatMap(convertProductDisplayUseCase::execute)
.map { /* work being done */ }
.observeOn(schedulersFacade.ui)
.subscribeBy(
onError = Timber::e,
onSuccess = { }
)
}
}
}
Usecase for the suspend method
class FetchRecentUseCaseImp() {
override suspend fun execute(): List<Products> {
// Call to network
}
}
Many thanks in advance
With coroutines, the way to return a single item that is produced asynchronously is to use a suspend function. So instead of launching a coroutine, you mark the function as suspend and convert blocking or async callback functions into non-blocking code.
The places where coroutines are launched are typically at UI interactions (click listeners), or when classes are first created (on Android, this is places like in a ViewModel constructor or Fragment's onViewCreated()).
As a side note, it is against convention for any suspend function to expect the caller to have to specify a dispatcher. It should internally delegate if it needs to, for example:
class FetchRecentUseCaseImp() {
override suspend fun execute(): List<Products> = withContext(Dispatchers.IO) {
// Synchronous call to network
}
}
But if you were using a library like Retrofit, you'd simply make your Request and await() it without specifying a dispatcher, because await() is a suspend function itself.
So your function should look something like:
suspend fun loadRecentlyViewed(): List<SomeProductType> {
    val listOfSkus = fetchRecentUseCase.execute().map(Product::sku)
    if (listOfSkus.isEmpty()) {
        return emptyList()
    }
    return runCatching {
        loadProductUseCase.execute(listOfSkus) // A Single, I'm assuming
            .await() // Only if you're not completely stripping Rx from project
            .map { convertProductDisplayUseCase.execute(it).await() } // Ditto for await()
            .toList()
            .flatten()
    }.onFailure(Timber::e)
        .getOrDefault(emptyList())
}
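To make the split between "suspend function that returns a value" and "place where the coroutine is launched" concrete, here is a minimal sketch without Android or Rx dependencies. Product, fetchRecent and loadRecentlyViewedSkus are hypothetical stand-ins for the question's use cases, and runBlocking plus launch stands in for viewModelScope.launch:

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical stand-in for the product type in the question.
data class Product(val sku: String)

// The suspend function picks its own dispatcher; callers do not need to.
suspend fun fetchRecent(): List<Product> = withContext(Dispatchers.IO) {
    listOf(Product("sku-1"), Product("sku-2")) // pretend this is a blocking network call
}

// Returns its result directly; it launches nothing itself.
suspend fun loadRecentlyViewedSkus(): List<String> =
    fetchRecent().map(Product::sku)

fun main() = runBlocking {
    // In an Android ViewModel this would be viewModelScope.launch { ... }.
    val job = launch {
        val skus = loadRecentlyViewedSkus()
        println(skus) // [sku-1, sku-2]
    }
    job.join()
}
```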

Kotlin - How to lock a collection when accessing it from two threads

wondered if anyone could assist, I'm trying to understand the correct way to access a collection in Kotlin with two threads.
The code below simulates a problem I'm having in a live system. One thread iterates over the collection but another thread can remove elements in that array.
I have tried adding @Synchronized to the collection's getter but that still gives me a ConcurrentModificationException.
Can anyone let me know what the correct way of doing this would be?
class ListTest() {
val myList = mutableListOf<String>()
@Synchronized
get() = field
init {
repeat(10000) {
myList.add("stuff: $it")
}
}
}
fun main() = runBlocking<Unit> {
val listTest = ListTest()
launch(Dispatchers.Default) {
delay(1L)
listTest.myList.remove("stuff: 54")
}
launch {
listTest.myList.forEach { println(it) }
}
}
You are only synchronizing the getter, so by the time you start using the list reference it returned, the lock has already been released.
Kotlin has the Mutex class available for locking manipulation of a shared mutable object. Mutex is nicer than Java's synchronized because it suspends instead of blocking the coroutine thread.
Your example would be poor design in the real world because your class publicly exposes a mutable list. But going along with making it at least safe to modify the list:
import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock
class ListTest {
    private val myListMutex = Mutex()
    private val myList = mutableListOf<String>()
    init {
        repeat(10000) {
            myList.add("stuff: $it")
        }
    }
    // All access to the list goes through the mutex.
    suspend fun modifyMyList(block: MutableList<String>.() -> Unit) {
        myListMutex.withLock { myList.block() }
    }
}
fun main() = runBlocking<Unit> {
    val listTest = ListTest()
    launch(Dispatchers.Default) {
        delay(1L)
        // The list is the lambda's receiver, so its members are called directly.
        listTest.modifyMyList { remove("stuff: 54") }
    }
    launch {
        listTest.modifyMyList { forEach { println(it) } }
    }
}
If you are not working with coroutines, instead of a Mutex(), you can use an Any and instead of withLock use synchronized (myListLock) {} just like you would in Java to prevent code from within the synchronized blocks from running at the same time.
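A minimal sketch of that non-coroutine variant, using plain threads. The class name LockedList and the withList helper are hypothetical names chosen for this example:

```kotlin
import kotlin.concurrent.thread

class LockedList {
    private val lock = Any()
    private val items = mutableListOf<String>()

    init {
        repeat(10_000) { items.add("stuff: $it") }
    }

    // Every access to the list goes through the same lock object,
    // so removal and iteration can never overlap.
    fun <R> withList(block: MutableList<String>.() -> R): R =
        synchronized(lock) { items.block() }
}

fun main() {
    val list = LockedList()
    val remover = thread { list.withList { remove("stuff: 54") } }
    val reader = thread { list.withList { forEach { /* read each element */ } } }
    remover.join()
    reader.join()
    println(list.withList { size }) // 9999
}
```

Unlike Mutex.withLock, synchronized blocks the calling thread while it waits, which is why it is better kept out of coroutine code.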
If you want to lock a collection, or any object, for concurrent access, you can use almost the same construct as Java's synchronized keyword.
So while accessing such an object you would do:
fun someFun() {
synchronized(yourCollection) {
}
}
You can also use the synchronizedCollection method from Java's Collections class, but this only makes individual method calls thread-safe. If you have to iterate over the collection, you still have to handle the synchronization manually.
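A short sketch of that difference, using Collections.synchronizedList. The helper name snapshotOf is made up for this example:

```kotlin
import java.util.Collections

// Copies the list while holding the wrapper's own lock, so no other
// thread can modify it in the middle of the iteration.
fun snapshotOf(list: MutableList<String>): List<String> =
    synchronized(list) { list.toList() }

fun main() {
    val shared: MutableList<String> =
        Collections.synchronizedList(mutableListOf("a", "b", "c"))

    shared.add("d") // a single call is synchronized by the wrapper

    // Iteration spans many calls, so the caller must hold the lock itself.
    synchronized(shared) {
        for (item in shared) println(item)
    }

    println(snapshotOf(shared)) // [a, b, c, d]
}
```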

How To await a function call?

So I have some asynchronous operations happening. I can create a lambda, call a function, and have the value passed to the lambda. But what I want is not to receive the result of the operation as a parameter; I want to return it.
As an example, I have a class A with some listeners; if there is a result, all listeners are notified. So basically the asyncFunction should return a result if there is one, and otherwise be suspended.
object A {
val listeners = mutableListOf<(Int) -> Unit>()
fun onResult(value: Int) {
listeners.forEach { it(value) }
}
}
fun asyncFunction(): Deferred<Int> {
return async {
A.listeners.add({ result ->
})
return result
}
}
What I'm thinking right now (maybe I'm completely on the wrong track) is to have something like a Deferred, to which I can send the result and which then returns it. Is there something like that? Can I implement a Deferred myself?
class A {
private val awaiter: ??? // can this be a Deferred ?
fun onResult(result: Int) {
awaiter.putResult(result)
}
fun awaitResult(): Int {
return awaiter.await()
}
}
val a = A()
launch {
val result = a.awaitResult()
}
launch {
a.onResult(42)
}
I do know that this can be handled with callbacks, but it would be cleaner and easier to have it that way.
I hope there is a nice and clean solution I'm just missing.
Your asyncFunction should in fact be a suspendable function:
suspend fun suspendFunction(): Int =
    suspendCoroutine { cont -> A.listeners.add { cont.resume(it) } }
Note that it returns the Int result and suspends until it's available.
However, this is just a fix for your immediate problem. It will still malfunction in many ways:
the listener's purpose is served as soon as it gets the first result, but it stays in the listener list forever, resulting in a memory leak
if the result arrived before you called suspendFunction, it will miss it and hang.
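As an illustration of the first point, here is one possible manual improvement: the listener removes itself once it has fired, and also when the waiting coroutine is cancelled. This is only a sketch. The Notifier object and its add/remove functions are hypothetical replacements for the question's A object, and the second problem (a result that arrives before anyone waits) is still not solved here:

```kotlin
import kotlinx.coroutines.CoroutineStart
import kotlinx.coroutines.async
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.suspendCancellableCoroutine
import kotlin.coroutines.resume

// Hypothetical listener registry standing in for the question's object A.
object Notifier {
    private val listeners = mutableListOf<(Int) -> Unit>()

    fun addListener(l: (Int) -> Unit) = synchronized(listeners) { listeners.add(l) }
    fun removeListener(l: (Int) -> Unit) = synchronized(listeners) { listeners.remove(l) }
    fun listenerCount(): Int = synchronized(listeners) { listeners.size }

    fun onResult(value: Int) {
        // Iterate over a copy so listeners may unregister themselves.
        val snapshot = synchronized(listeners) { listeners.toList() }
        snapshot.forEach { it(value) }
    }
}

suspend fun awaitNextResult(): Int = suspendCancellableCoroutine { cont ->
    val listener = object : (Int) -> Unit {
        override fun invoke(value: Int) {
            Notifier.removeListener(this) // one-shot: unregister after the first result
            cont.resume(value)
        }
    }
    Notifier.addListener(listener)
    // Also unregister if the waiting coroutine is cancelled.
    cont.invokeOnCancellation { Notifier.removeListener(listener) }
}

fun main() = runBlocking {
    // UNDISPATCHED runs the block up to its first suspension right away,
    // so the listener is registered before onResult is called.
    val waiter = async(start = CoroutineStart.UNDISPATCHED) { awaitNextResult() }
    Notifier.onResult(42)
    println(waiter.await())           // 42
    println(Notifier.listenerCount()) // 0
}
```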
You can keep improving it manually (it's a good way to learn) or switch to a solid, ready-made solution. The kotlinx.coroutines library (not the Kotlin standard library itself) provides one, CompletableDeferred:
object A {
    val result = CompletableDeferred<Int>()
    fun provideResult(r: Int) {
        result.complete(r)
    }
}
suspend fun suspendFunction(): Int = A.result.await()
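A small runnable sketch of why this also covers the "result arrived early" case. The ResultHolder class is a hypothetical wrapper written for this example, closer in shape to the class A sketched in the question:

```kotlin
import kotlinx.coroutines.CompletableDeferred
import kotlinx.coroutines.runBlocking

class ResultHolder {
    private val result = CompletableDeferred<Int>()

    // complete() returns false if a value was already provided.
    fun provideResult(value: Int): Boolean = result.complete(value)

    // Suspends until a value is available, or returns at once if it already is.
    suspend fun awaitResult(): Int = result.await()
}

fun main() = runBlocking {
    val holder = ResultHolder()
    holder.provideResult(42)         // arrives before anyone is waiting
    println(holder.awaitResult())    // 42
    println(holder.provideResult(7)) // false: only the first value is kept
}
```

Note that a CompletableDeferred holds a single value. For a stream of results, a different primitive (such as a channel or flow) would be needed.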

Kotlin Process Collection In Parallel?

I have a collection of objects, which I need to perform some transformation on. Currently I am using:
var myObjects: List<MyObject> = getMyObjects()
myObjects.forEach{ myObj ->
someMethod(myObj)
}
It works fine, but I was hoping to speed it up by running someMethod() in parallel, instead of waiting for each object to finish, before starting on the next one.
Is there any way to do this in Kotlin? Maybe with doAsyncTask or something?
I know when this was asked over a year ago it was not possible, but now that Kotlin has coroutines like doAsyncTask I am curious if any of the coroutines can help
Yes, this can be done using coroutines. The following function applies an operation in parallel on all elements of a collection:
fun <A> Collection<A>.forEachParallel(f: suspend (A) -> Unit): Unit = runBlocking {
    map { async(CommonPool) { f(it) } }.forEach { it.await() }
}
While the definition itself is a little cryptic, you can then easily apply it as you would expect:
myObjects.forEachParallel { myObj ->
someMethod(myObj)
}
Parallel map can be implemented in a similar way, see https://stackoverflow.com/a/45794062/1104870.
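CommonPool belongs to the early experimental coroutines API and is not available in current kotlinx.coroutines releases. A rough equivalent for recent versions might look like the sketch below. Making the function suspend and using Dispatchers.Default are choices made for this example, not part of the original answer:

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext
import java.util.concurrent.ConcurrentLinkedQueue

// Runs f on every element concurrently and returns once all calls are done;
// withContext waits for the coroutines launched inside it.
suspend fun <A> Collection<A>.forEachParallel(f: suspend (A) -> Unit) {
    withContext(Dispatchers.Default) {
        forEach { launch { f(it) } }
    }
}

fun main() = runBlocking {
    val seen = ConcurrentLinkedQueue<Int>() // thread-safe, since f runs on several threads
    listOf(1, 2, 3).forEachParallel { seen.add(it * 10) }
    println(seen.sorted()) // [10, 20, 30]
}
```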
Java Stream is simple to use in Kotlin:
tasks.stream().parallel().forEach { computeNotSuspend(it) }
If you are using Android however, you cannot use Java 8 if you want an app compatible with an API lower than 24.
You can also use coroutines as you suggested. But they are not really part of the language as of now (August 2017) and you need to install an external library. There is a very good guide with examples.
runBlocking<Unit> {
val deferreds = tasks.map { async(CommonPool) { compute(it) } }
deferreds.forEach { it.await() }
}
Note that coroutines are implemented with non-blocking multi-threading, which means they can be faster than traditional multi-threading. The code below benchmarks Stream parallel versus coroutines, and in that case the coroutine approach is 7 times faster on my machine. However, you have to do some work yourself to make sure your code is "suspending" (non-blocking), which can be quite tricky. In my example I'm just calling delay, which is a suspend function provided by the library. Non-blocking multi-threading is not always faster than traditional multi-threading. It can be faster if you have many threads doing nothing but waiting on IO, which is roughly what my benchmark is doing.
My benchmarking code:
import kotlinx.coroutines.experimental.CommonPool
import kotlinx.coroutines.experimental.async
import kotlinx.coroutines.experimental.delay
import kotlinx.coroutines.experimental.launch
import kotlinx.coroutines.experimental.runBlocking
import java.util.*
import kotlin.system.measureNanoTime
import kotlin.system.measureTimeMillis
class SomeTask() {
val durationMS = random.nextInt(1000).toLong()
companion object {
val random = Random()
}
}
suspend fun compute(task: SomeTask): Unit {
delay(task.durationMS)
//println("done ${task.durationMS}")
return
}
fun computeNotSuspend(task: SomeTask): Unit {
Thread.sleep(task.durationMS)
//println("done ${task.durationMS}")
return
}
fun main(args: Array<String>) {
val n = 100
val tasks = List(n) { SomeTask() }
val timeCoroutine = measureNanoTime {
runBlocking<Unit> {
val deferreds = tasks.map { async(CommonPool) { compute(it) } }
deferreds.forEach { it.await() }
}
}
println("Coroutine ${timeCoroutine / 1_000_000} ms")
val timePar = measureNanoTime {
tasks.stream().parallel().forEach { computeNotSuspend(it) }
}
println("Stream parallel ${timePar / 1_000_000} ms")
}
Output on my 4 cores computer:
Coroutine: 1037 ms
Stream parallel: 7150 ms
If you uncomment the println in the two compute functions, you will see that in the non-blocking coroutine code the tasks are processed in the right order, but not with Streams.
You can use RxJava to solve this.
val items: List<MyObjects> = getList()
Observable.from(items).flatMap(object : Func1<MyObjects, Observable<String>> {
    override fun call(item: MyObjects): Observable<String> {
        return someMethod(item)
    }
}).subscribeOn(Schedulers.io()).observeOn(AndroidSchedulers.mainThread()).subscribe(object : Subscriber<String>() {
    override fun onCompleted() {
    }
    override fun onError(e: Throwable) {
    }
    override fun onNext(s: String) {
        // do on output of each string
    }
})
By subscribing on Schedulers.io(), someMethod() is scheduled on a background thread.
To process items of a collection in parallel you can use Kotlin Coroutines. For example the following extension function processes items in parallel and waits for them to be processed:
suspend fun <T, R> Iterable<T>.processInParallel(
    dispatcher: CoroutineDispatcher = Dispatchers.IO,
    processBlock: suspend (v: T) -> R,
): List<R> = coroutineScope { // or supervisorScope
    map {
        async(dispatcher) { processBlock(it) }
    }.awaitAll()
}
This is a suspend extension function on the Iterable<T> type. It processes the items in parallel and returns the result of processing each item. By default it uses the Dispatchers.IO dispatcher to offload blocking tasks to a shared pool of threads. It must be called from a coroutine (including a coroutine running on Dispatchers.Main) or from another suspend function.
Example of calling from a coroutine:
val myObjects: List<MyObject> = getMyObjects()
someCoroutineScope.launch {
    val results = myObjects.processInParallel {
        someMethod(it)
    }
    // use processing results
}
where someCoroutineScope is an instance of CoroutineScope.
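For a version that runs without any surrounding application code, here is a self-contained sketch. The input values and the delay-based processing block are made up for the example. It also shows that awaitAll returns results in input order, regardless of which item finishes first:

```kotlin
import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

suspend fun <T, R> Iterable<T>.processInParallel(
    dispatcher: CoroutineDispatcher = Dispatchers.IO,
    processBlock: suspend (v: T) -> R,
): List<R> = coroutineScope {
    map { async(dispatcher) { processBlock(it) } }.awaitAll()
}

fun main() = runBlocking {
    // The second item finishes first, but the result list keeps input order.
    val results = listOf(30L, 10L, 20L).processInParallel { ms ->
        delay(ms)
        "waited $ms"
    }
    println(results) // [waited 30, waited 10, waited 20]
}
```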
Or if you want to just launch and forget you can use this function:
fun <T> CoroutineScope.processInParallelAndForget(
    iterable: Iterable<T>,
    dispatcher: CoroutineDispatcher = Dispatchers.IO,
    processBlock: suspend (v: T) -> Unit
) = iterable.forEach {
    launch(dispatcher) { processBlock(it) }
}
This is an extension function on CoroutineScope, which doesn't return any result. It also uses the Dispatchers.IO dispatcher by default. It can be called on a CoroutineScope instance or from within another coroutine.
Calling example:
someCoroutineScope.processInParallelAndForget(myObjects) {
    someMethod(it)
}
// OR from another coroutine:
someCoroutineScope.launch {
    processInParallelAndForget(myObjects) {
        someMethod(it)
    }
}
where someCoroutineScope is an instance of CoroutineScope.