Kotlin Process Collection In Parallel? - kotlin

I have a collection of objects, which I need to perform some transformation on. Currently I am using:
var myObjects: List<MyObject> = getMyObjects()
myObjects.forEach{ myObj ->
someMethod(myObj)
}
It works fine, but I was hoping to speed it up by running someMethod() in parallel, instead of waiting for each object to finish, before starting on the next one.
Is there any way to do this in Kotlin? Maybe with doAsyncTask or something?
I know when this was asked over a year ago it was not possible, but now that Kotlin has coroutines like doAsyncTask I am curious if any of the coroutines can help

Yes, this can be done using coroutines. The following function applies an operation in parallel on all elements of a collection:
fun <A>Collection<A>.forEachParallel(f: suspend (A) -> Unit): Unit = runBlocking {
map { async(CommonPool) { f(it) } }.forEach { it.await() }
}
While the definition itself is a little cryptic, you can then easily apply it as you would expect:
myObjects.forEachParallel { myObj ->
someMethod(myObj)
}
Parallel map can be implemented in a similar way, see https://stackoverflow.com/a/45794062/1104870.

Java Stream is simple to use in Kotlin:
tasks.stream().parallel().forEach { computeNotSuspend(it) }
If you are using Android however, you cannot use Java 8 if you want an app compatible with an API lower than 24.
You can also use coroutines as you suggested. But it's not really part of the language as of now (August 2017) and you need to install an external library. There is very good guide with examples.
runBlocking<Unit> {
val deferreds = tasks.map { async(CommonPool) { compute(it) } }
deferreds.forEach { it.await() }
}
Note that coroutines are implemented with non-blocking multi-threading, which mean they can be faster than traditional multi-threading. I have code below benchmarking the Stream parallel versus coroutine and in that case the coroutine approach is 7 times faster on my machine. However you have to do some work yourself to make sure your code is "suspending" (non-locking) which can be quite tricky. In my example I'm just calling delay which is a suspend function provided by the library. Non-blocking multi-threading is not always faster than traditional multi-threading. It can be faster if you have many threads doing nothing but waiting on IO, which is kind of what my benchmark is doing.
My benchmarking code:
import kotlinx.coroutines.experimental.CommonPool
import kotlinx.coroutines.experimental.async
import kotlinx.coroutines.experimental.delay
import kotlinx.coroutines.experimental.launch
import kotlinx.coroutines.experimental.runBlocking
import java.util.*
import kotlin.system.measureNanoTime
import kotlin.system.measureTimeMillis
class SomeTask() {
val durationMS = random.nextInt(1000).toLong()
companion object {
val random = Random()
}
}
suspend fun compute(task: SomeTask): Unit {
delay(task.durationMS)
//println("done ${task.durationMS}")
return
}
fun computeNotSuspend(task: SomeTask): Unit {
Thread.sleep(task.durationMS)
//println("done ${task.durationMS}")
return
}
fun main(args: Array<String>) {
val n = 100
val tasks = List(n) { SomeTask() }
val timeCoroutine = measureNanoTime {
runBlocking<Unit> {
val deferreds = tasks.map { async(CommonPool) { compute(it) } }
deferreds.forEach { it.await() }
}
}
println("Coroutine ${timeCoroutine / 1_000_000} ms")
val timePar = measureNanoTime {
tasks.stream().parallel().forEach { computeNotSuspend(it) }
}
println("Stream parallel ${timePar / 1_000_000} ms")
}
Output on my 4 cores computer:
Coroutine: 1037 ms
Stream parallel: 7150 ms
If you uncomment out the println in the two compute functions you will see that in the non-blocking coroutine code the tasks are processed in the right order, but not with Streams.

You can use RxJava to solve this.
List<MyObjects> items = getList()
Observable.from(items).flatMap(object : Func1<MyObjects, Observable<String>>() {
fun call(item: MyObjects): Observable<String> {
return someMethod(item)
}
}).subscribeOn(Schedulers.io()).observeOn(AndroidSchedulers.mainThread()).subscribe(object : Subscriber<String>() {
fun onCompleted() {
}
fun onError(e: Throwable) {
}
fun onNext(s: String) {
// do on output of each string
}
})
By subscribing on Schedulers.io(), some method is scheduled on background thread.

To process items of a collection in parallel you can use Kotlin Coroutines. For example the following extension function processes items in parallel and waits for them to be processed:
suspend fun <T, R> Iterable<T>.processInParallel(
dispatcher: CoroutineDispatcher = Dispatchers.IO,
processBlock: suspend (v: T) -> R,
): List<R> = coroutineScope { // or supervisorScope
map {
async(dispatcher) { processBlock(it) }
}.awaitAll()
}
This is suspend extension function on Iterable<T> type, which does a parallel processing of items and returns some result of processing each item. By default it uses Dispatchers.IO dispatcher to offload blocking tasks to a shared pool of threads. Must be called from a coroutine (including a coroutine with Dispatchers.Main dispatcher) or another suspend function.
Example of calling from a coroutine:
val myObjects: List<MyObject> = getMyObjects()
someCoroutineScope.launch {
val results = myObjects.processInParallel {
someMethod(it)
}
// use processing results
}
where someCoroutineScope is an instance of CoroutineScope.
Or if you want to just launch and forget you can use this function:
fun <T> CoroutineScope.processInParallelAndForget(
iterable: Iterable<T>,
dispatcher: CoroutineDispatcher = Dispatchers.IO,
processBlock: suspend (v: T) -> Unit
) = iterable.forEach {
launch(dispatcher) { processBlock(it) }
}
This is an extension function on CoroutineScope, which doesn't return any result. It also uses Dispatchers.IO dispatcher by default. Can be called using CoroutineScope or from another coroutine.
Calling example:
someoroutineScope.processInParallelAndForget(myObjects) {
someMethod(it)
}
// OR from another coroutine:
someCoroutineScope.launch {
processInParallelAndForget(myObjects) {
someMethod(it)
}
}
where someCoroutineScope is an instance of CoroutineScope.

Related

Why is this extension function slower than non extension counterpart?

I was trying to write a parallel map extension function to do map operation over a List in parallel using coroutines.
However there is a significant overhead in my solution and I can't find out why.
This is my implementation of the pmap extension function:
fun <T, U> List<T>.pmap(scope: CoroutineScope = GlobalScope,
transform: suspend (T) -> U): List<U> {
return map { i -> scope.async { transform(i) } }.map { runBlocking { it.await() } }
}
However, when I do the exact same operation in a normal function, it takes up to extra 100ms (which is a lot).
I tried using inline but it had no effect.
I'm leaving here the full test I've done to demonstrate this behavior:
import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis
fun main() {
test()
}
fun <T, U> List<T>.pmap(scope: CoroutineScope = GlobalScope,
transform: suspend (T) -> U): List<U> {
return this.map { i -> scope.async { transform(i) } }.map { runBlocking { it.await() } }
}
fun test() {
val list = listOf<Long>(100,200,300)
val transform: suspend (Long) -> Long = { long: Long ->
delay(long)
long*2
}
val timeTakenPmap = measureTimeMillis {
list.pmap(GlobalScope) { transform(it) }
}
val manualpmap = measureTimeMillis {
list.map { GlobalScope.async { transform(it) } }
.map { runBlocking { it.await() } }
}
val timeTakenMap = measureTimeMillis {
list.map { runBlocking { transform(it) } }
}
println("pmapTime: $timeTakenPmap - mapTime: $timeTakenMap - manualpmap: $manualpmap")
}
It can be run in kotlin playground: https://pl.kotl.in/CIXVqezg3
In the playground it prints this result:
pmapTime: 411 - mapTime: 602 - manualpmap: 302
MapTime and manualPmap give reasonable results, only 2ms of time outside the delays. But pmapTime is way off. And the code between manualpmap and pmap looks exactly the same to me.
In my own machine it runs a little faster, pmap takes around 350ms.
Does anyone know why this happens?
First of all, manual benchmarks like this are usually of very little significance. There are many things that can be optimized away by the compiler or the JIT and any conclusion can be quite wrong. If you really want to compare things, you should instead use benchmarking libraries which take into account JVM warmup etc.
Now, the overhead you see (if you could confirm there was an actual overhead) might be caused by the fact that your higher-order extension is not marked inline, so instances of the lambda you pass need to be created - but as #Tenfour04 noted there are many other possible reasons: thread pool lazy initialization, significance of the list size, etc.
That being said, this is really not an appropriate way to write parallel map, for several reasons:
GlobalScope is a pretty bad default in general, and should be used in very specific situations only. But don't worry about it because of the next point.
You don't need an externally provided CoroutineScope if the coroutines you launch do not outlive your method. Instead, use coroutineScope { ... } and make your function suspend, and the caller will choose the context if they need to
map { it.await() } is inefficient in case of errors: if the last element's transformation immediately fails, map will wait for all previous elements to finish before failing. You should prefer awaitAll which takes care of this.
runBlocking should be avoided in coroutines (blocking threads in general, especially when you don't control which thread you're blocking), so using it in deep library-like functions like this is dangerous, because it will likely be used in coroutines at some point.
Applying those points gives:
suspend inline fun <T, U> List<T>.pmap(transform: suspend (T) -> U): List<U> {
return coroutineScope {
map { async { transform(it) } }.awaitAll()
}
}

In Kotlin, is it possible to substitute a suspend fun with a non-suspend version, without breaking the caller?

I'm learning concurrency in Kotlin, coming from C#/JavaScript background, and I can't help comparing some concepts.
In C# and JavaScript, technically we can rewrite an async function as a regular non-async version doing the same thing, using Task.ContinueWith or Promise.then etc.
The caller of the function wouldn't even notice the difference (I ranted about it in a blog post).
Is something like that possible for a suspend function in Kotlin (i.e., without changing the calling code)? I don't think it is, but I thought I'd still ask.
The closest thing I could come up with is below (Kotlin playground link), I still have to call .await():
import kotlinx.coroutines.*
suspend fun suspendableDelay(ms: Long): Long {
delay(ms);
return ms;
}
fun regularDelay(ms: Long): Deferred<Long> {
val d = CompletableDeferred<Long>()
GlobalScope.async { delay(ms); d.complete(ms) }
return d;
}
suspend fun test(ms: Long): Long {
delay(ms);
return ms;
}
fun main() {
val r1 = runBlocking { suspendableDelay(250) }
println("suspendableDelay ended: $r1");
val r2 = runBlocking { regularDelay(500).await() }
println("regularDelay ended: $r2");
}
https://pl.kotl.in/_AmzanwcB
If you're on JVM 8 or higher, you can make a function that calls the suspend function in an async job and returns a CompletableFuture, which can be used to get your result with a callback (thenApplyAsync()) or synchronously (get()).
val scope = CoroutineScope(SupervisorJob())
suspend fun foo(): Int {
delay(500)
return Random.nextInt(10)
}
fun fooAsync(): CompletableFuture<Int> = scope.async { foo() }.asCompletableFuture()
fun main() {
fooAsync()
.thenApplyAsync { println(it) }
Thread.sleep(1000)
}
The above requires the kotlinx-coroutines-jdk8 library.
I don't know of a solution that works across multiple platforms.
This can only work if you change your suspending function to a non-suspending blocking function, for example
private fun method(){
GlobalScope.launch {
val value = getInt()
}
}
// Calling coroutine can be suspended and resumed when result is ready
private suspend fun getInt(): Int{
delay(2000) // or some suspending IO call
return 5;
}
// Calling coroutine can't be suspended, it will have to wait (block)
private fun getInt(): Int{
Thread.sleep(2000) // some blocking IO
return 5;
}
Here you can simply use the non-suspending version, without any change on the caller.
But the issue here is that without suspend modifier the function becomes blocking and as such it can not cause the coroutine to suspend, basically throwing away the advantage of using coroutiens.

Mix and match Coroutines and Rxjava

Coroutines and RxJava3
I have the following method that first makes a call to a suspend method and in the same launch scope I make 2 calls to RxJava.
I am wondering if there is a way to remove the Rxjava code out of the viewModelScope.launch scope and return the result of fetchRecentUseCase.execute().
Basically, is it possible for the viewModelScope.launch to return the listOfProducts rather than doing everything in the launch scope?
fun loadRecentlyViewed() {
viewModelScope.launch {
val listOfProducts = withContext(Dispatchers.IO) {
fetchRecentUseCase.execute()
}
val listOfSkus = listOfProducts.map { it.sku }
if (listOfSkus.isNotEmpty()) {
loadProductUseCase.execute(listOfSkus)
.subscribeOn(schedulersFacade.io)
.flatMap(convertProductDisplayUseCase::execute)
.map { /* work being done */ }
.observeOn(schedulersFacade.ui)
.subscribeBy(
onError = Timber::e,
onSuccess = { }
)
}
}
}
Usecase for the suspend method
class FetchRecentUseCaseImp() {
override suspend fun execute(): List<Products> {
// Call to network
}
}
Many thanks in advance
With coroutines, the way to return a single item that is produced asynchronously is to use a suspend function. So instead of launching a coroutine, you mark the function as suspend and convert blocking or async callback functions into non-blocking code.
The places where coroutines are launched are typically at UI interactions (click listeners), or when classes are first created (on Android, this is places like in a ViewModel constructor or Fragment's onViewCreated()).
As a side note, it is against convention for any suspend function to expect the caller to have to specify a dispatcher. It should internally delegate if it needs to, for example:
class FetchRecentUseCaseImp() {
override suspend fun execute(): List<Products> = withContext(Dispatchers.IO) {
// Synchronous call to network
}
}
But if you were using a library like Retrofit, you'd simply make your Request and await() it without specifying a dispatcher, because await() is a suspend function itself.
So your function should look something like:
suspend fun loadRecentlyViewed(): List<SomeProductType> {
val listOfSkus = fetchRecentUseCase.execute().map(Product::sku)
if (listOfSkus.isEmpty()) {
return emptyList()
}
return runCatching {
loadProductUseCase.execute(listOfSkus) // A Single, I'm assuming
.await() // Only if you're not completely stripping Rx from project
.map { convertProductDisplayUseCase.execute(it).await() } // Ditto for await()
.toList()
.flatten()
}.onFailure(Timber::e)
.getOrDefault(emptyList())
}

Difference between Kotlin arrow IO, IO.fx, IO !effect

I am trying to use arrow in kotlin
Arrow has three functions
IO {}
IO.fx {}
IO.fx { !effect}
I want to know the difference between these. I know IO.fx and IO.fx {!effect} help us use side effects but then whats the difference between the two and why would I use one over the other
While this is going to change shortly, on version 0.11.X:
IO { } is a constructor that takes a suspend function, so you can call any suspend function inside. It's a shortcut for IO.effect { }
suspend fun bla(): Unit = ...
fun myIO(): IO<Unit> = IO { bla() }
fun otherIO(): IO<Unit> = IO.effect { bla() }
IO.fx { } is the same as IO except it adds a few DSL functions that are shortcuts for other APIs of IO. The most important one is ! or bind, which executes another IO inside.
fun myIO(): IO<Unit> = IO.fx { bla() }
fun nestIO(): IO<IO<Unit>> = IO.fx { myIO() }
fun unpackIO(): IO<Unit> = IO.fx { !myIO() }
Another function it enables is the constructor effect from the first point. So what you're effectively doing is adding an additional layer of wrapping that may not be necessary.
fun inefficientNestIO(): IO<IO<Unit>> = IO.fx { effect { bla() } }
fun inefficientUnpackedIO(): IO<Unit> = IO.fx { !effect { bla() } }
We frequently see that inefficientUnpackedIO from people who come to the support channels, and it's easily replaceable by just IO { bla() }.
Why have two ways of doing the same in effect and fx? It's something we're looking to improve on the next releases. We recommend using the least powerful abstraction wherever possible, so reserve fx only when using other IO-based APIs such as scheduling or parallelization.
IO.fx {
val id = getUserIdSuspend()
val friends: List<User> =
!parMapN(
userFriends(id),
IO { userProfile(id) },
::toUsers
)
!friends.parTraverse(IO.applicative()) { user ->
IO { broadcastStatus(user) }
}
}

How to unit-test Kotlin-JS code with coroutines?

I've created a multi-platform Kotlin project (JVM & JS), declared an expected class and implemented it:
// Common module:
expect class Request(/* ... */) {
suspend fun loadText(): String
}
// JS implementation:
actual class Request actual constructor(/* ... */) {
actual suspend fun loadText(): String = suspendCoroutine { continuation ->
// ...
}
}
Now I'm trying to make a unit test using kotlin.test, and for the JVM platform I simply use runBlocking like this:
#Test
fun sampleTest() {
val req = Request(/* ... */)
runBlocking { assertEquals( /* ... */ , req.loadText()) }
}
How can I reproduce similar functionality on the JS platform, if there is no runBlocking?
Mb it's late, but there are open issue for adding possibility to use suspend functions in js-tests (there this function will transparent convert to promise)
Workaround:
One can define in common code:
expect fun runTest(block: suspend () -> Unit)
that is implemented in JVM with
actual fun runTest(block: suspend () -> Unit) = runBlocking { block() }
and in JS with
actual fun runTest(block: suspend () -> Unit): dynamic = promise { block() }
TL;DR
On JS one can use GlobalScope.promise { ... }.
But for most use cases the best option is probably to use runTest { ... } (from kotlinx-coroutines-test), which is cross-platform, and has some other benefits over runBlocking { ... } and GlobalScope.promise { ... } as well.
Full answer
I'm not sure what things were like when the question was originally posted, but nowadays the standard, cross-platform way to run tests that use suspend functions is to use runTest { ... } (from kotlinx-coroutines-test).
Note that in addition to running on all platforms, this also includes some other features, such as skipping delays (with the ability to mock the passage of time).
If for any reason (which is not typical, but might sometimes be the case) it is actually desirable to run the code in the test as it runs in production (including actual delays), then runBlocking { ... } can be used on JVM and Native, and GlobalScope.promise { ... } on JS. If going for this option, it might be convenient to define a single function signature which uses runBlocking on JVM and Native, and GlobalScope.promise on JS, e.g.:
// Common:
expect fun runTest(block: suspend CoroutineScope.() -> Unit)
// JS:
#OptIn(DelicateCoroutinesApi::class)
actual fun runTest(block: suspend CoroutineScope.() -> Unit): dynamic = GlobalScope.promise(block=block)
// JVM, Native:
actual fun runTest(block: suspend CoroutineScope.() -> Unit): Unit = runBlocking(block=block)
I was able to make the following work:
expect fun coTest(timeout: Duration = 30.seconds, block: suspend () -> Unit): Unit
// jvm
actual fun coTest(timeout: Duration, block: suspend () -> Unit) {
runBlocking {
withTimeout(timeout) {
block.invoke()
}
}
}
// js
private val testScope = CoroutineScope(CoroutineName("test-scope"))
actual fun coTest(timeout: Duration, block: suspend () -> Unit): dynamic = testScope.async {
withTimeout(timeout) {
block.invoke()
}
}.asPromise()
This launches a co-routine in a scope of your choice using async which you can then return like a promise.
You then write a test like so:
#Test
fun myTest() = coTest {
...
}