How to do parallel flatMap in Kotlin?

I need to do parallel flat map.
Let's say I have this code:
val coll: List<Set<Int>> = ...
coll.flatMap{set -> setOf(set, set + 1)}
I need something like this:
coll.pFlatMap{set -> setOf(set, set + 1)} // parallel execution

Kotlin doesn’t provide any threading out of the box.
But you can use kotlinx.coroutines to do something like this:
val coll: List<Set<Int>> = ...
// async needs a CoroutineScope; runBlocking provides one here
val result = runBlocking(Dispatchers.Default) {
    coll
        .map { set ->
            // Run each task in its own coroutine;
            // you can limit concurrency with a custom coroutine dispatcher
            async { doSomethingWithSet(set) }
        }
        .flatMap { deferred ->
            // Await the results and flatten them
            deferred.await() // You can handle errors here
        }
}
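The comment about limiting concurrency with a custom dispatcher can be sketched as follows. This is only an illustration, assuming kotlinx-coroutines-core on the JVM; the helper name flatMapOnPool and its threads parameter are made up for the example and are not part of any library.

```kotlin
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.async
import kotlinx.coroutines.runBlocking
import java.util.concurrent.Executors

// Hypothetical helper: runs `transform` for each element on a fixed pool of
// `threads` threads, so at most `threads` transforms execute at the same time.
fun <T, R> List<T>.flatMapOnPool(threads: Int, transform: (T) -> Iterable<R>): List<R> =
    Executors.newFixedThreadPool(threads).asCoroutineDispatcher().use { pool ->
        runBlocking {
            // Start one coroutine per element on the pool, then await in input order
            map { item -> async(pool) { transform(item) } }
                .flatMap { it.await() }
        } // `use` closes the dispatcher and shuts the pool down afterwards
    }

fun main() {
    val coll = listOf(setOf(1, 2), setOf(3, 4), setOf(5))
    println(coll.flatMapOnPool(threads = 2) { set -> set.map { it * it } }) // [1, 4, 9, 16, 25]
}
```

Results come back in input order because the deferred values are awaited in the order they were created, regardless of which task finishes first.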

Alternatively, you can do it without coroutines:
fun <T, R> Collection<T>.pFlatMap(transform: (T) -> Collection<R>): List<R> =
    parallelStream().flatMap { transform(it).stream() }.toList()
This solution requires Kotlin on JDK 8 or higher. On JDKs older than 16, Stream has no toList() member, so add import kotlin.streams.toList.
You can also make it more general (analogous to Kotlin's flatMap):
fun <T, R> Iterable<T>.pFlatMap(transform: (T) -> Iterable<R>): List<R> =
    toList()
        .parallelStream()
        .flatMap { transform(it).toList().stream() }
        .toList()
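For reference, here is a small runnable variant of the stream-based approach. It is a sketch that uses only the JDK and collects with Collectors.toList(), which is available from JDK 8 onward; the sample data is invented for the example.

```kotlin
import java.util.stream.Collectors

// Parallel flatMap on top of Java parallel streams (common fork-join pool).
fun <T, R> Collection<T>.pFlatMap(transform: (T) -> Collection<R>): List<R> =
    parallelStream()
        .flatMap { transform(it).stream() }
        .collect(Collectors.toList())

fun main() {
    val coll: List<Set<Int>> = listOf(setOf(1, 2), setOf(3), setOf(4, 5))
    // A List is an ordered stream source, so the encounter order is preserved
    val result = coll.pFlatMap { set -> set.map { it * 10 } }
    println(result) // [10, 20, 30, 40, 50]
}
```

Keep in mind that parallel streams share the common fork-join pool, so blocking work inside transform can starve other users of that pool.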

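You need to add a coroutine scope first (runBlocking) and wrap each call in async for deferred execution. Pass a multi-threaded dispatcher such as Dispatchers.Default; otherwise all the coroutines run on the single thread that runBlocking blocks:
val coll: List<Set<Int>> = listOf()
val x = runBlocking(Dispatchers.Default) {
    coll.map { set ->
        async {
            setOf(set, set + 1)
        }
    }.awaitAll().flatten()
}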
x ....
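The same idea can be wrapped in a reusable suspending extension. The following is a minimal sketch, assuming kotlinx-coroutines-core is on the classpath; the name pFlatMap mirrors the question and is not a library function.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.runBlocking

// Hypothetical helper: starts one coroutine per element on Dispatchers.Default,
// waits for all of them, and flattens the results in input order.
suspend fun <T, R> Iterable<T>.pFlatMap(transform: suspend (T) -> Iterable<R>): List<R> =
    coroutineScope {
        map { item -> async(Dispatchers.Default) { transform(item) } }
            .awaitAll()
            .flatten()
    }

fun main() = runBlocking {
    val coll = listOf(setOf(1, 2), setOf(3))
    println(coll.pFlatMap { set -> set.map { it * 10 } }) // [10, 20, 30]
}
```

Because coroutineScope is used, a failure in any transform cancels the remaining coroutines and the exception propagates to the caller.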

Related

Kotlin coroutine suspend and delay

I want to simulate file loading and I want to delay code for 4 seconds and I can't do this.
suspend fun showLoadingProgress(): String = suspendCancellableCoroutine { continuation ->
    while (fileIsBeingLoaded()) {
        delay(4000)
        val percent = ((loadedBites.toDouble() / fileBites.toDouble()) * 100).toInt()
        continuation.resume("$loadedBites/$fileBites ($percent%)")
    }
}
I get the error: suspension functions can be called only from a coroutine body.
But when I write the code like this, without returning a String, delay works. Why?
suspend fun showLoadingProgress() {
    while (fileIsBeingLoaded()) {
        delay(4000)
        val percent = ((loadedBites.toDouble() / fileBites.toDouble()) * 100).toInt()
        continuation.resume("$loadedBites/$fileBites ($percent%)")
    }
}
How can I make delay and return a String?
suspendCancellableCoroutine is mainly used with callbacks to suspend a coroutine execution until the callback fires, for example:
suspend fun getUser(id: String): User = suspendCancellableCoroutine { continuation ->
    Api.getUser(id) { user ->
        continuation.resume(user)
    }
    continuation.invokeOnCancellation {
        // clear some resources, cancel tasks, close streams etc.
    }
}
delay doesn't work in the suspendCancellableCoroutine block because the block is not marked as suspend, so we can't call a suspend function in it. The suspendCancellableCoroutine function is defined like this:
public suspend inline fun <T> suspendCancellableCoroutine(
    crossinline block: (CancellableContinuation<T>) -> Unit
): T = ...
If it were defined like this instead (note that block is marked as suspend):
public suspend inline fun <T> suspendCancellableCoroutine(
    crossinline block: suspend (CancellableContinuation<T>) -> Unit
): T = ...
then we would be able to call delay in it.
I don't know why you use a while loop there; it seems redundant, or it is used incorrectly for reporting the loading progress (a continuation can be resumed only once).
You don't have callbacks, so you can get rid of suspendCancellableCoroutine:
suspend fun getLoadingProgress(): String {
    delay(4000)
    val percent = ((loadedBites.toDouble() / fileBites.toDouble()) * 100).toInt()
    return "$loadedBites/$fileBites ($percent%)"
}
suspend fun showLoadingProgress() {
    while (fileIsBeingLoaded()) {
        val progress = getLoadingProgress()
        // use progress
    }
}
Another approach is to use Flow to emit the loading progress. It will look something like the following using flow builder:
fun getLoadingProgress(): Flow<String> = flow {
    while (fileIsBeingLoaded()) {
        delay(4000)
        val percent = ((loadedBites.toDouble() / fileBites.toDouble()) * 100).toInt()
        emit("$loadedBites/$fileBites ($percent%)")
    }
}
And collect values:
someCoroutineScope.launch {
    getLoadingProgress().collect { progress ->
        // use progress
    }
}
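To try the Flow approach without a real file, here is a self-contained sketch. The FakeLoader class, its fields, and the short polling interval are stand-ins invented for this example (the question does not show how loading is tracked); it assumes kotlinx-coroutines-core.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for the real loading state from the question.
class FakeLoader(val totalBytes: Long, private val chunk: Long) {
    var loadedBytes = 0L
        private set
    fun isLoading() = loadedBytes < totalBytes
    fun step() { loadedBytes = minOf(totalBytes, loadedBytes + chunk) }
}

// Emits one progress string per polling interval until loading finishes.
fun FakeLoader.progress(pollMillis: Long): Flow<String> = flow {
    while (isLoading()) {
        delay(pollMillis)   // suspends without blocking a thread
        step()              // simulate that more bytes arrived
        val percent = (loadedBytes * 100 / totalBytes).toInt()
        emit("$loadedBytes/$totalBytes ($percent%)")
    }
}

fun main() = runBlocking {
    FakeLoader(totalBytes = 200, chunk = 50)
        .progress(pollMillis = 10)
        .collect { println(it) }
    // prints 50/200 (25%), 100/200 (50%), 150/200 (75%), 200/200 (100%), one per line
}
```

The flow is cold, so nothing runs until it is collected, and cancelling the collecting coroutine stops the polling loop at the next delay.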

coroutine scope and async - right approach?

I love the concept of coroutines and I've been using them in my Android projects. Currently I'm working on a JVM module which I'll be including in a Ktor project, and I know Ktor has support for coroutines.
(See the code snippet below.)
I just wanted to know: is this the right approach?
How do I use async with recursion?
Any resources you can recommend that would help me gain more in-depth knowledge of coroutines would be helpful.
Thanks in advance!
override suspend fun processInstruction(args.. ): List<Any> = coroutineScope {
    val dataWithFields = async {
        listOfFields.fold(mutableListOf<Any>()) { acc, field ->
            val data = someProcess(field)
            val nested = processInstruction(...nestedField) // nested call
            acc.addAll(data)
            acc.addAll(nested)
            acc
        }
    }
    return@coroutineScope postProcessData(dataWithFields.await())
}
If you want to process all nested calls in parallel, you should wrap each of them in async (the async call should be inside the loop) and then, after the loop, await all the results. (In your code you call await right after a single async, so there is no parallel execution.)
For example, if you have Element:
interface Element {
    val subElements: List<Element>
    suspend fun calculateData(): SomeData
}
interface SomeData
And you want to calculateData of all subElements in parallel, you can do it like this:
suspend fun Element.calculateAllData(): List<SomeData> = coroutineScope {
    val data = async { calculateData() }
    val subData = subElements.map { sub -> async { sub.calculateAllData() } }
    return@coroutineScope listOf(data.await()) + subData.awaitAll().flatten()
}
As you said in the comments, you need the parent data to calculate the sub-data, so the first thing calculateAllData() should do is calculate the parent data (calculateData is now assumed to take the parent data as a parameter):
suspend fun Element.calculateAllData(
    parentData: SomeData = defaultParentData()
): List<SomeData> = coroutineScope {
    val data = calculateData(parentData)
    val subData = subElements.map { sub -> async { sub.calculateAllData(data) } }
    return@coroutineScope listOf(data) + subData.awaitAll().flatten()
}
Now you may wonder how fast it works. Consider the following Element implementation:
class ElementImpl(override val subElements: List<Element>) : Element {
    override suspend fun calculateData(parentData: SomeData): SomeData {
        delay(1000)
        return object : SomeData {} // SomeData is an interface, so create an anonymous instance
    }
}
fun elmOf(vararg elements: Element) = ElementImpl(listOf(*elements))
And the following test:
println(measureTime {
    elmOf(
        elmOf(),
        elmOf(
            elmOf(),
            elmOf(
                elmOf(),
                elmOf(),
                elmOf()
            )
        ),
        elmOf(
            elmOf(),
            elmOf()
        ),
        elmOf()
    ).calculateAllData()
})
If the parent data isn't needed to calculate the sub-data, it prints 1.06s, since in this case all the data is calculated in parallel. Otherwise, it prints 4.15s, since the element tree has height 4 and each level has to wait for the one above it.

GroupBy operator for Kotlin Flow

I am trying to switch from RxJava to Kotlin Flow. Flow is really impressive, but is there any operator similar to RxJava's "GroupBy" in Kotlin Flow right now?
As of Kotlin Coroutines 1.3, the standard library doesn't seem to provide this operator. However, since the design of Flow is such that all operators are extension functions, there is no fundamental distinction between the standard library providing it and you writing your own.
With that in mind, here are some of my ideas on how to approach it.
1. Collect Each Group to a List
If you just need a list of all items for each key, use this simple implementation that emits pairs of (K, List<T>):
fun <T, K> Flow<T>.groupToList(getKey: (T) -> K): Flow<Pair<K, List<T>>> = flow {
    val storage = mutableMapOf<K, MutableList<T>>()
    collect { t -> storage.getOrPut(getKey(t)) { mutableListOf() } += t }
    storage.forEach { (k, ts) -> emit(k to ts) }
}
For this example:
suspend fun main() {
    val input = 1..10
    input.asFlow()
        .groupToList { it % 2 }
        .collect { println(it) }
}
it prints
(1, [1, 3, 5, 7, 9])
(0, [2, 4, 6, 8, 10])
2.a Emit a Flow for Each Group
If you need the full RxJava semantics where you transform the input flow into many output flows (one per distinct key), things get more involved.
Whenever you see a new key in the input, you must emit a new inner flow to the downstream and then, asynchronously, keep pushing more data into it whenever you encounter the same key again.
Here's an implementation that does this:
fun <T, K> Flow<T>.groupBy(getKey: (T) -> K): Flow<Pair<K, Flow<T>>> = flow {
    val storage = mutableMapOf<K, SendChannel<T>>()
    try {
        collect { t ->
            val key = getKey(t)
            storage.getOrPut(key) {
                Channel<T>(32).also { emit(key to it.consumeAsFlow()) }
            }.send(t)
        }
    } finally {
        storage.values.forEach { chan -> chan.close() }
    }
}
It sets up a Channel for each key and exposes the channel to the downstream as a flow.
2.b Concurrently Collect and Reduce Grouped Flows
Since groupBy keeps emitting the data to the inner flows after emitting the flows themselves to the downstream, you have to be very careful with how you collect them.
You must collect all the inner flows concurrently, with no upper limit on the level of concurrency. Otherwise the channels of the flows that are queued for later collection will eventually block the sender and you'll end up with a deadlock.
Here is a function that does this properly:
fun <T, K, R> Flow<Pair<K, Flow<T>>>.reducePerKey(
    reduce: suspend Flow<T>.() -> R
): Flow<Pair<K, R>> = flow {
    coroutineScope {
        this@reducePerKey
            .map { (key, flow) -> key to async { flow.reduce() } }
            .toList()
            .forEach { (key, deferred) -> emit(key to deferred.await()) }
    }
}
The map stage launches a coroutine for each inner flow it receives. The coroutine reduces it to the final result.
toList() is a terminal operation that collects the entire upstream flow, launching all the async coroutines in the process. The coroutines start consuming the inner flows even while we're still collecting the main flow. This is essential to prevent a deadlock.
Finally, after all the coroutines have been launched, we start a forEach loop that waits for and emits the final results as they become available.
You can implement almost the same behavior in terms of flatMapMerge:
fun <T, K, R> Flow<Pair<K, Flow<T>>>.reducePerKey(
    reduce: suspend Flow<T>.() -> R
): Flow<Pair<K, R>> = flatMapMerge(Int.MAX_VALUE) { (key, flow) ->
    flow { emit(key to flow.reduce()) }
}
The difference is in the ordering: whereas the first implementation respects the order of appearance of keys in the input, this one doesn't. Both perform similarly.
3. Example
This example groups and sums 40 million integers:
suspend fun main() {
    val input = 1..40_000_000
    input.asFlow()
        .groupBy { it % 100 }
        .reducePerKey { sum { it.toLong() } }
        .collect { println(it) }
}
suspend fun <T> Flow<T>.sum(toLong: suspend (T) -> Long): Long {
    var sum = 0L
    collect { sum += toLong(it) }
    return sum
}
I can successfully run this with -Xmx64m. On my 4-core laptop I'm getting about 4 million items per second.
It is simple to redefine the first solution in terms of the new one like this:
fun <T, K> Flow<T>.groupToList(getKey: (T) -> K): Flow<Pair<K, List<T>>> =
    groupBy(getKey).reducePerKey { toList() }
Not yet, but you can have a look at this library: https://github.com/akarnokd/kotlin-flow-extensions
In my project, I was able to achieve this in a non-blocking way by using Flux.groupBy:
https://projectreactor.io/docs/core/release/api/reactor/core/publisher/Flux.html#groupBy-java.util.function.Function-
I did this while converting the results obtained with Flux to a Flow.
This may not be the right answer for the situation in the question, but I share it as an example.

Kotlin: higher-order function taking vararg lambda-with-receiver, where the receiver takes arguments

I am trying to wrap a hierarchy of Java builders in a Kotlin type-safe builder. The hierarchy consists of the following builders (and their targets):
FigureBuilder (Figure)
LayoutBuilder (Layout)
TraceBuilder (Trace)
In Java, FigureBuilder has one method that takes a Layout, and another that takes n traces, using a varargs method called addTraces():
addTraces(Trace... traces)
The assembly process in Java is basically
Figure f = Figure.builder()
    .layout(
        Layout.builder()
            .title("title")
            .build())
    .addTraces(
        ScatterTrace.builder()
            .name("my series")
            .build())
    .build();
In Kotlin, I have code that creates the figure builder and the layout builder, but I am stuck on the trace builder. My code so far looks like this:
val figure = figure {
    layout { title("Wins vs BA") }
    addTraces(
        ScatterTrace.builder(x, y)
            .name("Batting avg.").build()
    )
}
fun figure(c: FigureBuilder.() -> Unit): Figure {
    val builder = Figure.builder()
    c(builder)
    return builder.build()
}
fun FigureBuilder.layout(c: Layout.LayoutBuilder.() -> Unit) {
    layout(Layout.builder().apply(c).build())
}
// won't compile: ScatterTrace.builder() requires 2 args
fun FigureBuilder.traces(vararg c: ScatterTrace.ScatterBuilder.() -> Unit) {
    c.forEach {
        val builder = ScatterTrace.builder()
        it(builder)
        addTraces(builder.build())
    }
}
I'm not at all sure the last function will work if I can get it to compile, but the immediate blocking issue is that ScatterTrace.builder() takes two arguments and I cannot figure out how to pass them into the lambda.
Thanks very much
It's strange that in Java you can create ScatterTrace.builder without arguments but in Kotlin you need two arguments to construct it. Maybe it would be better to build the traces one by one?
// Builds a single trace; the two builder arguments travel alongside the lambda
fun trace(x: Int, y: Int, c: ScatterTrace.ScatterBuilder.() -> Unit) =
    ScatterTrace.builder(x, y).apply(c).build()
val figure = figure {
    layout { title("Wins vs BA") }
    addTraces(
        trace(x, y) { name("Batting avg.") },
        trace(x, y) { name("Batting avg. 2") },
        trace(x, y) { name("Batting avg. 3") }
    )
}
fun FigureBuilder.traces(vararg c: ScatterTrace.ScatterBuilder.() -> Unit) {
    addTraces(
        *c.map {
            val builder = ScatterTrace.builder()
            it(builder) // configure the builder with the lambda
            builder.build()
        }.toTypedArray()
    )
}
should do what you are looking for with your vararg requirement.
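To make the pattern easy to experiment with outside the plotting library, here is a self-contained sketch. Trace, TraceBuilder, and FigureBuilder below are simplified stand-ins written for this example, not the real Java classes from the question; the point is only that the arguments the builder needs are passed next to the receiver lambda.

```kotlin
// Hypothetical stand-ins for the Java builders in the question.
class Trace(val x: List<Double>, val y: List<Double>, val name: String)

class TraceBuilder(private val x: List<Double>, private val y: List<Double>) {
    private var name = ""
    fun name(value: String) = apply { name = value }
    fun build() = Trace(x, y, name)
}

class FigureBuilder {
    val traces = mutableListOf<Trace>()
    fun addTraces(vararg t: Trace) = apply { traces.addAll(t) }
}

// The builder's required arguments are ordinary parameters; the lambda with
// receiver only configures the optional parts.
fun FigureBuilder.trace(x: List<Double>, y: List<Double>, configure: TraceBuilder.() -> Unit) {
    addTraces(TraceBuilder(x, y).apply(configure).build())
}

fun main() {
    val fb = FigureBuilder().apply {
        trace(listOf(1.0), listOf(2.0)) { name("a") }
        trace(listOf(3.0), listOf(4.0)) { name("b") }
    }
    println(fb.traces.map { it.name }) // [a, b]
}
```

Calling trace once per series keeps each pair of arguments next to the lambda that uses it, which a single vararg of lambdas cannot express.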

How to unit-test Kotlin-JS code with coroutines?

I've created a multi-platform Kotlin project (JVM & JS), declared an expected class and implemented it:
// Common module:
expect class Request(/* ... */) {
    suspend fun loadText(): String
}
// JS implementation:
actual class Request actual constructor(/* ... */) {
    actual suspend fun loadText(): String = suspendCoroutine { continuation ->
        // ...
    }
}
Now I'm trying to make a unit test using kotlin.test, and for the JVM platform I simply use runBlocking like this:
@Test
fun sampleTest() {
    val req = Request(/* ... */)
    runBlocking { assertEquals(/* ... */, req.loadText()) }
}
How can I reproduce similar functionality on the JS platform, if there is no runBlocking?
It may be late, but there is an open issue for adding the ability to use suspend functions in JS tests (the function would be transparently converted to a promise).
Workaround:
One can define in common code:
expect fun runTest(block: suspend () -> Unit)
that is implemented in JVM with
actual fun runTest(block: suspend () -> Unit) = runBlocking { block() }
and in JS with
actual fun runTest(block: suspend () -> Unit): dynamic = GlobalScope.promise { block() }
TL;DR
On JS one can use GlobalScope.promise { ... }.
But for most use cases the best option is probably to use runTest { ... } (from kotlinx-coroutines-test), which is cross-platform, and has some other benefits over runBlocking { ... } and GlobalScope.promise { ... } as well.
Full answer
I'm not sure what things were like when the question was originally posted, but nowadays the standard, cross-platform way to run tests that use suspend functions is to use runTest { ... } (from kotlinx-coroutines-test).
Note that in addition to running on all platforms, this also includes some other features, such as skipping delays (with the ability to mock the passage of time).
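As an illustration, a test written with runTest might look like the sketch below. It assumes the kotlinx-coroutines-test and kotlin-test dependencies are available; fakeLoadText is a made-up stand-in for Request.loadText() from the question.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.test.runTest
import kotlin.test.Test
import kotlin.test.assertEquals

// Hypothetical suspend function standing in for Request.loadText()
suspend fun fakeLoadText(): String {
    delay(1_000) // skipped by runTest's virtual time, so the test does not wait a second
    return "hello"
}

class RequestTest {
    // The test function returns the result of runTest directly,
    // which is what lets the same test run on JS as well as JVM and Native.
    @Test
    fun loadsText() = runTest {
        assertEquals("hello", fakeLoadText())
    }
}
```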
If for any reason (which is not typical, but might sometimes be the case) it is actually desirable to run the code in the test as it runs in production (including actual delays), then runBlocking { ... } can be used on JVM and Native, and GlobalScope.promise { ... } on JS. If going for this option, it might be convenient to define a single function signature which uses runBlocking on JVM and Native, and GlobalScope.promise on JS, e.g.:
// Common:
expect fun runTest(block: suspend CoroutineScope.() -> Unit)
// JS:
@OptIn(DelicateCoroutinesApi::class)
actual fun runTest(block: suspend CoroutineScope.() -> Unit): dynamic = GlobalScope.promise(block = block)
// JVM, Native:
actual fun runTest(block: suspend CoroutineScope.() -> Unit): Unit = runBlocking(block = block)
I was able to make the following work:
expect fun coTest(timeout: Duration = 30.seconds, block: suspend () -> Unit): Unit
// jvm
actual fun coTest(timeout: Duration, block: suspend () -> Unit) {
    runBlocking {
        withTimeout(timeout) {
            block.invoke()
        }
    }
}
// js
private val testScope = CoroutineScope(CoroutineName("test-scope"))
actual fun coTest(timeout: Duration, block: suspend () -> Unit): dynamic = testScope.async {
    withTimeout(timeout) {
        block.invoke()
    }
}.asPromise()
This launches a coroutine in a scope of your choice using async, which you can then return as a promise.
You then write a test like so:
@Test
fun myTest() = coTest {
    ...
}