Why is this extension function slower than its non-extension counterpart? - kotlin

I was trying to write a parallel map extension function to do map operation over a List in parallel using coroutines.
However there is a significant overhead in my solution and I can't find out why.
This is my implementation of the pmap extension function:
fun <T, U> List<T>.pmap(
    scope: CoroutineScope = GlobalScope,
    transform: suspend (T) -> U
): List<U> {
    return map { i -> scope.async { transform(i) } }.map { runBlocking { it.await() } }
}
However, compared with doing the exact same operation inline, without the extension function, it takes up to 100 ms longer (which is a lot).
I tried using inline but it had no effect.
I'm leaving here the full test I've done to demonstrate this behavior:
import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis
fun main() {
    test()
}
fun <T, U> List<T>.pmap(
    scope: CoroutineScope = GlobalScope,
    transform: suspend (T) -> U
): List<U> {
    return this.map { i -> scope.async { transform(i) } }.map { runBlocking { it.await() } }
}
fun test() {
    val list = listOf<Long>(100, 200, 300)
    val transform: suspend (Long) -> Long = { long: Long ->
        delay(long)
        long * 2
    }
    val timeTakenPmap = measureTimeMillis {
        list.pmap(GlobalScope) { transform(it) }
    }
    val manualpmap = measureTimeMillis {
        list.map { GlobalScope.async { transform(it) } }
            .map { runBlocking { it.await() } }
    }
    val timeTakenMap = measureTimeMillis {
        list.map { runBlocking { transform(it) } }
    }
    println("pmapTime: $timeTakenPmap - mapTime: $timeTakenMap - manualpmap: $manualpmap")
}
It can be run in kotlin playground: https://pl.kotl.in/CIXVqezg3
In the playground it prints this result:
pmapTime: 411 - mapTime: 602 - manualpmap: 302
mapTime and manualpmap give reasonable results, with only about 2 ms spent outside the delays. But pmapTime is way off, even though the code in manualpmap and pmap looks exactly the same to me.
On my own machine it runs a little faster; pmap takes around 350 ms.
Does anyone know why this happens?

First of all, manual benchmarks like this are usually of very little significance. There are many things that can be optimized away by the compiler or the JIT and any conclusion can be quite wrong. If you really want to compare things, you should instead use benchmarking libraries which take into account JVM warmup etc.
Now, the overhead you see (if you could confirm there was an actual overhead) might be caused by the fact that your higher-order extension is not marked inline, so instances of the lambda you pass need to be created. But as @Tenfour04 noted, there are many other possible reasons: thread pool lazy initialization, significance of the list size, etc.
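One rough way to check whether the extra ~100 ms is a one-time cost (lazy thread pool initialization is only a guess here) is to repeat the same measurement several times in one process and compare the first round with the later ones. The sketch below reuses pmap from the question; timePmap and the number of rounds are made up for illustration, and this is still no substitute for a real benchmarking library.

```kotlin
import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis

// Same extension as in the question, kept as-is on purpose.
@OptIn(DelicateCoroutinesApi::class)
fun <T, U> List<T>.pmap(
    scope: CoroutineScope = GlobalScope,
    transform: suspend (T) -> U
): List<U> = map { i -> scope.async { transform(i) } }.map { runBlocking { it.await() } }

// Hypothetical helper: runs the same pmap call several times and records each duration in ms.
fun timePmap(rounds: Int): List<Long> {
    val list = listOf(100L, 200L, 300L)
    return List(rounds) { _ ->
        measureTimeMillis {
            list.pmap { ms ->
                delay(ms)
                ms * 2
            }
        }
    }
}

fun main() {
    // If only the first entry is clearly above 300 ms, the overhead is probably a startup cost
    // rather than something inherent to the extension function.
    println("pmap rounds (ms): ${timePmap(5)}")
}
```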
That being said, this is really not an appropriate way to write parallel map, for several reasons:
GlobalScope is a pretty bad default in general, and should be used in very specific situations only. But don't worry about it because of the next point.
You don't need an externally provided CoroutineScope if the coroutines you launch do not outlive your method. Instead, use coroutineScope { ... } and make your function suspend, and the caller will choose the context if they need to.
map { it.await() } is inefficient in case of errors: if the last element's transformation immediately fails, map will wait for all previous elements to finish before failing. You should prefer awaitAll which takes care of this.
runBlocking should be avoided in coroutines (blocking threads in general, especially when you don't control which thread you're blocking), so using it in deep library-like functions like this is dangerous, because it will likely be used in coroutines at some point.
Applying those points gives:
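// transform is called from inside the async lambda, so an inline function has to mark it crossinline
suspend inline fun <T, U> List<T>.pmap(crossinline transform: suspend (T) -> U): List<U> {
    return coroutineScope {
        map { async { transform(it) } }.awaitAll()
    }
}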
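As a usage sketch, the suspending pmap can be called from any coroutine. The version below is a non-inline variant to keep the example minimal, and doubleAll is a hypothetical caller that reuses the delays from the question.

```kotlin
import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis

// Non-inline variant of the suspending pmap, to keep the sketch minimal.
suspend fun <T, U> List<T>.pmap(transform: suspend (T) -> U): List<U> =
    coroutineScope {
        map { async { transform(it) } }.awaitAll()
    }

// Hypothetical caller: doubles each value after a delay of that many milliseconds.
fun doubleAll(values: List<Long>): List<Long> = runBlocking {
    values.pmap { ms ->
        delay(ms)
        ms * 2
    }
}

fun main() {
    val elapsed = measureTimeMillis {
        println(doubleAll(listOf(100L, 200L, 300L))) // [200, 400, 600]
    }
    // The delays overlap, so this should be close to the longest one (300 ms), not their sum.
    println("took $elapsed ms")
}
```

awaitAll returns the results in the order of the input list, regardless of which transformation finishes first.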

Related

In Kotlin, is it possible to substitute a suspend fun with a non-suspend version, without breaking the caller?

I'm learning concurrency in Kotlin, coming from C#/JavaScript background, and I can't help comparing some concepts.
In C# and JavaScript, technically we can rewrite an async function as a regular non-async version doing the same thing, using Task.ContinueWith or Promise.then etc.
The caller of the function wouldn't even notice the difference (I ranted about it in a blog post).
Is something like that possible for a suspend function in Kotlin (i.e., without changing the calling code)? I don't think it is, but I thought I'd still ask.
The closest thing I could come up with is below (Kotlin playground link), I still have to call .await():
import kotlinx.coroutines.*
suspend fun suspendableDelay(ms: Long): Long {
    delay(ms)
    return ms
}
fun regularDelay(ms: Long): Deferred<Long> {
    val d = CompletableDeferred<Long>()
    GlobalScope.async { delay(ms); d.complete(ms) }
    return d
}
suspend fun test(ms: Long): Long {
    delay(ms)
    return ms
}
fun main() {
    val r1 = runBlocking { suspendableDelay(250) }
    println("suspendableDelay ended: $r1")
    val r2 = runBlocking { regularDelay(500).await() }
    println("regularDelay ended: $r2")
}
https://pl.kotl.in/_AmzanwcB
If you're on JVM 8 or higher, you can make a function that calls the suspend function in an async job and returns a CompletableFuture, which can be used to get your result with a callback (thenApplyAsync()) or synchronously (get()).
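import kotlinx.coroutines.*
import kotlinx.coroutines.future.asCompletableFuture
import java.util.concurrent.CompletableFuture
import kotlin.random.Random
val scope = CoroutineScope(SupervisorJob())
suspend fun foo(): Int {
    delay(500)
    return Random.nextInt(10)
}
fun fooAsync(): CompletableFuture<Int> = scope.async { foo() }.asCompletableFuture()
fun main() {
    fooAsync()
        .thenApplyAsync { println(it) }
    Thread.sleep(1000)
}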
The above requires the kotlinx-coroutines-jdk8 library.
I don't know of a solution that works across multiple platforms.
This can only work if you change your suspending function to a non-suspending blocking function, for example
private fun method() {
    GlobalScope.launch {
        val value = getInt()
    }
}
// Calling coroutine can be suspended and resumed when result is ready
private suspend fun getInt(): Int {
    delay(2000) // or some suspending IO call
    return 5
}
// Alternative to the version above (the two cannot coexist in the same scope).
// Calling coroutine can't be suspended, it will have to wait (block)
private fun getInt(): Int {
    Thread.sleep(2000) // some blocking IO
    return 5
}
Here you can simply use the non-suspending version, without any change on the caller.
But the issue here is that without the suspend modifier the function becomes blocking, and as such it cannot cause the coroutine to suspend, basically throwing away the advantage of using coroutines.
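To make that cost concrete, here is a small sketch that starts two calls as coroutines on the single thread used by runBlocking. The names getIntSuspending, getIntBlocking and timeTwoCalls are hypothetical, and the delays are shortened to 200 ms.

```kotlin
import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis

// Hypothetical names; both functions return 5 after roughly 200 ms.
suspend fun getIntSuspending(): Int {
    delay(200) // suspends: the thread is free to run other coroutines meanwhile
    return 5
}

fun getIntBlocking(): Int {
    Thread.sleep(200) // blocks: the thread can do nothing else meanwhile
    return 5
}

// Starts two calls as coroutines on runBlocking's single thread and times the whole thing.
fun timeTwoCalls(call: suspend () -> Int): Long = measureTimeMillis {
    runBlocking {
        val first = async { call() }
        val second = async { call() }
        first.await() + second.await()
    }
}

fun main() {
    // Roughly 200 ms: the two waits overlap.
    println("suspending: ${timeTwoCalls { getIntSuspending() }} ms")
    // Roughly 400 ms: the two sleeps run one after another on the same thread.
    println("blocking: ${timeTwoCalls { getIntBlocking() }} ms")
}
```

The calling code is identical in both cases; only the timing shows that the blocking version no longer lets other coroutines make progress on that thread.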

Difference between Kotlin arrow IO, IO.fx, IO !effect

I am trying to use arrow in kotlin
Arrow has three functions
IO {}
IO.fx {}
IO.fx { !effect}
I want to know the difference between these. I know IO.fx and IO.fx { !effect } help us use side effects, but then what's the difference between the two, and why would I use one over the other?
While this is going to change shortly, on version 0.11.X:
IO { } is a constructor that takes a suspend function, so you can call any suspend function inside. It's a shortcut for IO.effect { }
suspend fun bla(): Unit = ...
fun myIO(): IO<Unit> = IO { bla() }
fun otherIO(): IO<Unit> = IO.effect { bla() }
IO.fx { } is the same as IO except it adds a few DSL functions that are shortcuts for other APIs of IO. The most important one is ! or bind, which executes another IO inside.
fun myIO(): IO<Unit> = IO.fx { bla() }
fun nestIO(): IO<IO<Unit>> = IO.fx { myIO() }
fun unpackIO(): IO<Unit> = IO.fx { !myIO() }
Another function it enables is the constructor effect from the first point. So what you're effectively doing is adding an additional layer of wrapping that may not be necessary.
fun inefficientNestIO(): IO<IO<Unit>> = IO.fx { effect { bla() } }
fun inefficientUnpackedIO(): IO<Unit> = IO.fx { !effect { bla() } }
We frequently see that inefficientUnpackedIO from people who come to the support channels, and it's easily replaceable by just IO { bla() }.
Why have two ways of doing the same in effect and fx? It's something we're looking to improve in the next releases. We recommend using the least powerful abstraction wherever possible, so reserve fx for cases where you use other IO-based APIs such as scheduling or parallelization.
IO.fx {
    val id = getUserIdSuspend()
    val friends: List<User> =
        !parMapN(
            userFriends(id),
            IO { userProfile(id) },
            ::toUsers
        )
    !friends.parTraverse(IO.applicative()) { user ->
        IO { broadcastStatus(user) }
    }
}

Differences between two coroutine launch in kotlin

What's the difference between CoroutineScope(dispatchers).launch{} and coroutineScope{ launch{}}?
Say I have the code below:
(you can go to Kotlin playground to run this snippet https://pl.kotl.in/U4eDY4uJt)
suspend fun perform(invokeAfterDelay: suspend () -> Unit) {
    // not printing
    CoroutineScope(Dispatchers.Default).launch {
        delay(1000)
        invokeAfterDelay()
    }
    // will print
    coroutineScope {
        launch {
            delay(1000)
            invokeAfterDelay()
        }
    }
}
fun printSomething() {
    println("Counter")
}
fun main() {
    runBlocking {
        perform {
            printSomething()
        }
    }
}
And as the comment stated, when using CoroutineScope().launch it won't invoke the print, however when using the other way, the code behaves as intended.
What's the difference?
Thanks.
Further question
New findings.
If I leave the perform function like this (without commenting out one of the coroutines)
suspend fun perform(invokeAfterDelay: suspend () -> Unit) {
    CoroutineScope(Dispatchers.Default).launch {
        delay(1000)
        invokeAfterDelay()
    }
    coroutineScope {
        launch {
            delay(1000)
            invokeAfterDelay()
        }
    }
}
then both of these coroutines will be executed 🤔 Why?
CoroutineScope().launch {} and coroutineScope { launch{} } have almost nothing in common. The former just sets up an ad-hoc scope for launch to run against, completing immediately, and the latter is a suspendable function that ensures that all coroutines launched within it complete before it returns.
The snippet under your "Further question" is identical to the original one except for the deleted comments.
Whether or not the first coroutine prints anything is non-deterministic: while perform is suspended within coroutineScope, waiting for the inner launched coroutine to complete, the first one may or may not complete as well, since both have the same delay.
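The difference can be made visible by recording the order of events. In this sketch, runDemo and the event strings are made up for illustration, and the delays are deliberately different (300 ms vs 50 ms) so that the order is predictable in practice.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.CopyOnWriteArrayList

fun runDemo(): List<String> = runBlocking {
    val events = CopyOnWriteArrayList<String>()

    // Ad-hoc scope: launch returns a Job right away and nothing waits for it.
    val detached = CoroutineScope(Dispatchers.Default).launch {
        delay(300)
        events.add("detached coroutine finished")
    }
    events.add("CoroutineScope().launch returned")

    // coroutineScope suspends until the child launched inside it has completed.
    coroutineScope {
        launch {
            delay(50)
            events.add("child of coroutineScope finished")
        }
    }
    events.add("coroutineScope returned")

    // Without this join, runBlocking could return before the detached coroutine gets to finish.
    detached.join()
    events.toList()
}

fun main() {
    runDemo().forEach(::println)
}
```

With these delays the events should appear in the order: CoroutineScope().launch returned, child of coroutineScope finished, coroutineScope returned, detached coroutine finished. If the explicit join is dropped and nothing else keeps the program alive, the detached coroutine may never get to run to completion, which is consistent with the "not printing" observation in the question.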

How to use Kotlin's coroutines with collections

I'm fairly new to Kotlin and its coroutines module, and I'm trying to do something that seemed pretty simple to me at first.
I have a function (the getCostlyList() below) that returns a List after some costly computation. This method is called multiple times sequentially. All these calls are then merged into a Set.
private fun myFun(): Set<Int> {
    return (1..10)
        .flatMap { getCostlyList() }
        .toSet()
}
private fun getCostlyList(): List<Int> {
    // omitting costly code here...
    return listOf(...)
}
My goal would be to use coroutines to make these calls to this costly method asynchronously, but I am having trouble wrapping my head around this issue.
You can write something like this:
private suspend fun myFun(): Set<Int> = coroutineScope {
    (1..10)
        .map { async { getCostlyList() } }
        .awaitAll()
        .flatten()
        .toSet()
}
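One caveat: async without arguments inherits the caller's dispatcher, so if getCostlyList() is a blocking computation and myFun() is called from a single-threaded context such as a plain runBlocking, the calls would still run one after another. A minimal sketch that passes Dispatchers.Default explicitly is shown below; the getCostlyList here is a hypothetical stand-in that takes a seed and sleeps for 100 ms, not the asker's real function.

```kotlin
import kotlinx.coroutines.*

// Hypothetical stand-in for the costly function: blocks for a while, then returns two values.
fun getCostlyList(seed: Int): List<Int> {
    Thread.sleep(100)
    return listOf(seed, seed + 1)
}

suspend fun myFun(): Set<Int> = coroutineScope {
    (1..10)
        // Dispatchers.Default runs the blocking computations on a shared pool of worker threads.
        .map { seed -> async(Dispatchers.Default) { getCostlyList(seed) } }
        .awaitAll()
        .flatten()
        .toSet()
}

fun main() = runBlocking {
    println(myFun()) // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
}
```

For blocking IO rather than CPU-bound work, Dispatchers.IO would be the usual choice instead.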

Kotlin Process Collection In Parallel?

I have a collection of objects, which I need to perform some transformation on. Currently I am using:
var myObjects: List<MyObject> = getMyObjects()
myObjects.forEach { myObj ->
    someMethod(myObj)
}
It works fine, but I was hoping to speed it up by running someMethod() in parallel, instead of waiting for each object to finish, before starting on the next one.
Is there any way to do this in Kotlin? Maybe with doAsyncTask or something?
I know when this was asked over a year ago it was not possible, but now that Kotlin has coroutines like doAsyncTask I am curious if any of the coroutines can help
Yes, this can be done using coroutines. The following function applies an operation in parallel on all elements of a collection:
fun <A> Collection<A>.forEachParallel(f: suspend (A) -> Unit): Unit = runBlocking {
    map { async(CommonPool) { f(it) } }.forEach { it.await() }
}
While the definition itself is a little cryptic, you can then easily apply it as you would expect:
myObjects.forEachParallel { myObj ->
    someMethod(myObj)
}
Parallel map can be implemented in a similar way, see https://stackoverflow.com/a/45794062/1104870.
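CommonPool comes from the experimental releases of kotlinx.coroutines and is not available in current versions, where Dispatchers.Default takes its place. A sketch of the same idea against the current API could look like this; squaresInParallel is a hypothetical helper added only to show a call site.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.ConcurrentLinkedQueue

// Same idea as above, with Dispatchers.Default instead of CommonPool and awaitAll instead of forEach/await.
fun <A> Collection<A>.forEachParallel(f: suspend (A) -> Unit) {
    runBlocking {
        map { async(Dispatchers.Default) { f(it) } }.awaitAll()
    }
}

// Hypothetical usage: collects squares from several threads into a thread-safe queue.
fun squaresInParallel(inputs: List<Int>): List<Int> {
    val results = ConcurrentLinkedQueue<Int>()
    inputs.forEachParallel { results.add(it * it) }
    return results.sorted()
}

fun main() {
    println(squaresInParallel(listOf(3, 1, 2))) // [1, 4, 9]
}
```

Because the function still uses runBlocking, it blocks the calling thread until all elements are processed, so it is better suited to being called from regular code than from inside another coroutine.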
Java Stream is simple to use in Kotlin:
tasks.stream().parallel().forEach { computeNotSuspend(it) }
If you are using Android however, you cannot use Java 8 if you want an app compatible with an API lower than 24.
You can also use coroutines as you suggested. But they are not really part of the language as of now (August 2017), and you need to install an external library. There is a very good guide with examples.
runBlocking<Unit> {
    val deferreds = tasks.map { async(CommonPool) { compute(it) } }
    deferreds.forEach { it.await() }
}
Note that coroutines are implemented with non-blocking multi-threading, which means they can be faster than traditional multi-threading. The code below benchmarks Stream parallel against coroutines, and in that case the coroutine approach is 7 times faster on my machine. However, you have to do some work yourself to make sure your code is "suspending" (non-blocking), which can be quite tricky. In my example I'm just calling delay, which is a suspend function provided by the library. Non-blocking multi-threading is not always faster than traditional multi-threading. It can be faster if you have many threads doing nothing but waiting on IO, which is roughly what my benchmark is doing.
My benchmarking code:
import kotlinx.coroutines.experimental.CommonPool
import kotlinx.coroutines.experimental.async
import kotlinx.coroutines.experimental.delay
import kotlinx.coroutines.experimental.launch
import kotlinx.coroutines.experimental.runBlocking
import java.util.*
import kotlin.system.measureNanoTime
import kotlin.system.measureTimeMillis
class SomeTask() {
    val durationMS = random.nextInt(1000).toLong()
    companion object {
        val random = Random()
    }
}
suspend fun compute(task: SomeTask): Unit {
    delay(task.durationMS)
    //println("done ${task.durationMS}")
    return
}
fun computeNotSuspend(task: SomeTask): Unit {
    Thread.sleep(task.durationMS)
    //println("done ${task.durationMS}")
    return
}
fun main(args: Array<String>) {
    val n = 100
    val tasks = List(n) { SomeTask() }
    val timeCoroutine = measureNanoTime {
        runBlocking<Unit> {
            val deferreds = tasks.map { async(CommonPool) { compute(it) } }
            deferreds.forEach { it.await() }
        }
    }
    println("Coroutine: ${timeCoroutine / 1_000_000} ms")
    val timePar = measureNanoTime {
        tasks.stream().parallel().forEach { computeNotSuspend(it) }
    }
    println("Stream parallel: ${timePar / 1_000_000} ms")
}
Output on my 4 cores computer:
Coroutine: 1037 ms
Stream parallel: 7150 ms
If you uncomment the println in the two compute functions, you will see that in the non-blocking coroutine code the tasks are processed in the right order, but not with Streams.
You can use RxJava to solve this.
val items: List<MyObjects> = getList()
Observable.from(items).flatMap(object : Func1<MyObjects, Observable<String>> {
    override fun call(item: MyObjects): Observable<String> {
        return someMethod(item)
    }
}).subscribeOn(Schedulers.io()).observeOn(AndroidSchedulers.mainThread()).subscribe(object : Subscriber<String>() {
    override fun onCompleted() {
    }
    override fun onError(e: Throwable) {
    }
    override fun onNext(s: String) {
        // do on output of each string
    }
})
By subscribing on Schedulers.io(), someMethod is scheduled on a background thread.
To process items of a collection in parallel you can use Kotlin Coroutines. For example the following extension function processes items in parallel and waits for them to be processed:
suspend fun <T, R> Iterable<T>.processInParallel(
    dispatcher: CoroutineDispatcher = Dispatchers.IO,
    processBlock: suspend (v: T) -> R,
): List<R> = coroutineScope { // or supervisorScope
    map {
        async(dispatcher) { processBlock(it) }
    }.awaitAll()
}
This is a suspend extension function on the Iterable<T> type, which processes the items in parallel and returns the result of processing each item. By default it uses the Dispatchers.IO dispatcher to offload blocking tasks to a shared pool of threads. It must be called from a coroutine (including a coroutine with the Dispatchers.Main dispatcher) or from another suspend function.
Example of calling from a coroutine:
val myObjects: List<MyObject> = getMyObjects()
someCoroutineScope.launch {
    val results = myObjects.processInParallel {
        someMethod(it)
    }
    // use processing results
}
where someCoroutineScope is an instance of CoroutineScope.
Or if you want to just launch and forget you can use this function:
fun <T> CoroutineScope.processInParallelAndForget(
    iterable: Iterable<T>,
    dispatcher: CoroutineDispatcher = Dispatchers.IO,
    processBlock: suspend (v: T) -> Unit
) = iterable.forEach {
    launch(dispatcher) { processBlock(it) }
}
This is an extension function on CoroutineScope, which doesn't return any result. It also uses the Dispatchers.IO dispatcher by default. It can be called on a CoroutineScope instance or from another coroutine.
Calling example:
someCoroutineScope.processInParallelAndForget(myObjects) {
    someMethod(it)
}
// OR from another coroutine:
someCoroutineScope.launch {
    processInParallelAndForget(myObjects) {
        someMethod(it)
    }
}
where someCoroutineScope is an instance of CoroutineScope.