GroupBy operator for Kotlin Flow

I am trying to switch from RxJava to Kotlin Flow. Flow is really impressive, but is there any operator similar to RxJava's groupBy in Kotlin Flow right now?

As of Kotlin Coroutines 1.3, the standard library doesn't seem to provide this operator. However, since the design of Flow is such that all operators are extension functions, there is no fundamental distinction between the standard library providing it and you writing your own.
With that in mind, here are some of my ideas on how to approach it.
1. Collect Each Group to a List
If you just need a list of all items for each key, use this simple implementation that emits pairs of (K, List<T>):
fun <T, K> Flow<T>.groupToList(getKey: (T) -> K): Flow<Pair<K, List<T>>> = flow {
    val storage = mutableMapOf<K, MutableList<T>>()
    collect { t -> storage.getOrPut(getKey(t)) { mutableListOf() } += t }
    storage.forEach { (k, ts) -> emit(k to ts) }
}
For this example:
suspend fun main() {
    val input = 1..10
    input.asFlow()
        .groupToList { it % 2 }
        .collect { println(it) }
}
it prints
(1, [1, 3, 5, 7, 9])
(0, [2, 4, 6, 8, 10])
2.a Emit a Flow for Each Group
If you need the full RxJava semantics where you transform the input flow into many output flows (one per distinct key), things get more involved.
Whenever you see a new key in the input, you must emit a new inner flow to the downstream and then, asynchronously, keep pushing more data into it whenever you encounter the same key again.
Here's an implementation that does this:
fun <T, K> Flow<T>.groupBy(getKey: (T) -> K): Flow<Pair<K, Flow<T>>> = flow {
    val storage = mutableMapOf<K, SendChannel<T>>()
    try {
        collect { t ->
            val key = getKey(t)
            storage.getOrPut(key) {
                Channel<T>(32).also { emit(key to it.consumeAsFlow()) }
            }.send(t)
        }
    } finally {
        storage.values.forEach { chan -> chan.close() }
    }
}
It sets up a Channel for each key and exposes the channel to the downstream as a flow.
2.b Concurrently Collect and Reduce Grouped Flows
Since groupBy keeps emitting the data to the inner flows after emitting the flows themselves to the downstream, you have to be very careful with how you collect them.
You must collect all the inner flows concurrently, with no upper limit on the level of concurrency. Otherwise the channels of the flows that are queued for later collection will eventually block the sender and you'll end up with a deadlock.
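For instance, a naive sequential collection like this sketch (reusing the groupBy above) deadlocks immediately instead of printing anything:
// DON'T do this: the collect block runs inside groupBy's emit(), before the
// first element has been sent to the group's channel, so group.toList()
// suspends forever and the sender never gets to proceed.
(1..10).asFlow()
    .groupBy { it % 2 }
    .collect { (key, group) -> println(key to group.toList()) }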
Here is a function that does this properly:
fun <T, K, R> Flow<Pair<K, Flow<T>>>.reducePerKey(
    reduce: suspend Flow<T>.() -> R
): Flow<Pair<K, R>> = flow {
    coroutineScope {
        this@reducePerKey
            .map { (key, flow) -> key to async { flow.reduce() } }
            .toList()
            .forEach { (key, deferred) -> emit(key to deferred.await()) }
    }
}
The map stage launches a coroutine for each inner flow it receives. The coroutine reduces it to the final result.
toList() is a terminal operation that collects the entire upstream flow, launching all the async coroutines in the process. The coroutines start consuming the inner flows even while we're still collecting the main flow. This is essential to prevent a deadlock.
Finally, after all the coroutines have been launched, we start a forEach loop that waits for and emits the final results as they become available.
You can implement almost the same behavior in terms of flatMapMerge:
fun <T, K, R> Flow<Pair<K, Flow<T>>>.reducePerKey(
    reduce: suspend Flow<T>.() -> R
): Flow<Pair<K, R>> = flatMapMerge(Int.MAX_VALUE) { (key, flow) ->
    flow { emit(key to flow.reduce()) }
}
The difference is in the ordering: whereas the first implementation respects the order of appearance of keys in the input, this one doesn't. Both perform similarly.
3. Example
This example groups and sums 40 million integers:
suspend fun main() {
    val input = 1..40_000_000
    input.asFlow()
        .groupBy { it % 100 }
        .reducePerKey { sum { it.toLong() } }
        .collect { println(it) }
}

suspend fun <T> Flow<T>.sum(toLong: suspend (T) -> Long): Long {
    var sum = 0L
    collect { sum += toLong(it) }
    return sum
}
I can successfully run this with -Xmx64m. On my 4-core laptop I'm getting about 4 million items per second.
It is simple to redefine the first solution in terms of the new one like this:
fun <T, K> Flow<T>.groupToList(getKey: (T) -> K): Flow<Pair<K, List<T>>> =
    groupBy(getKey).reducePerKey { toList() }

Not yet, but you can have a look at this library: https://github.com/akarnokd/kotlin-flow-extensions
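For example, that library ships an RxJava-style groupBy. A rough sketch of usage (the import and the exact shapes of groupBy and its grouped-flow type are from memory, so check the library's README for the real API):
import hu.akarnokd.kotlin.flow.*  // assumed package
import kotlinx.coroutines.flow.*

suspend fun demo() {
    (1..10).asFlow()
        .groupBy { it % 2 }  // one grouped flow per distinct key
        .flatMapMerge { group -> flow { emit(group.key to group.toList()) } }
        .collect { println(it) }
}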

In my project, I was able to achieve this without blocking by using Flux.groupBy:
https://projectreactor.io/docs/core/release/api/reactor/core/publisher/Flux.html#groupBy-java.util.function.Function-
I did the grouping with Flux and then converted the results to a Flow. This may not fit the exact situation in the question, but I share it as an example.
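For illustration, a rough sketch of that approach (assuming the asFlow() bridge from kotlinx-coroutines-reactive; pairing each key with its collected values is my own choice for the example):
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.reactive.asFlow
import reactor.core.publisher.Flux

// Group with Reactor's non-blocking Flux.groupBy, then bridge into a Flow.
fun groupedLists(): Flow<Pair<Int, List<Int>>> =
    Flux.range(1, 10)
        .groupBy { it % 2 }  // Flux<GroupedFlux<Int, Int>>
        .flatMap { group -> group.collectList().map { group.key() to it } }
        .asFlow()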

Related

How to implement timeout without interrupting a calculation in Kotlin coroutine?

Let's say a request started a long calculation, but is ready to wait no longer than X seconds for the result. Instead of interrupting the calculation, I would like it to continue in parallel until completion.
The first condition ("wait no longer than") is satisfied by withTimeoutOrNull function.
What is the standard (idiomatic Kotlin) way to continue the computation and execute some final action when the result is ready?
Example:
fun longCalculation(key: Int): String { /* 2..10 seconds */ }
// ----------------------------
cache = Cache<Int, String>()

suspend fun getValue(key: Int) = coroutineScope {
    val value: String? = softTimeoutOrNull(
        timeout = 5.seconds,
        calculation = { longCalculation(key) },
        finalAction = { v -> cache.put(key, v) }
    )
    // return calculated value or null
}
This is a niche enough case that I don't think there's a consensus on an idiomatic way to do it.
Since you want the work to continue in the background even after the current coroutine resumes, you need a separate CoroutineScope to launch that background work, rather than using the coroutineScope builder that launches the coroutine as a child of the current one. That outer scope determines the lifetime of the coroutines it launches (cancelling them if it is cancelled). Typically, if you're using coroutines, you already have a scope on hand that's associated with the lifecycle of the current class.
I think this would do what you're describing. The current coroutine can wrap a Deferred.await() call in withTimeoutOrNull to see if it can wait for the other coroutine (which is not a child coroutine, since it was launched directly from an outer CoroutineScope) without interfering with it.
suspend fun getValue(key: Int): String? {
    val deferred = someScope.async {
        longCalculation(key)
            .also { cache.put(key, it) }
    }
    return withTimeoutOrNull(5000) { deferred.await() }
}
Here's a generalized version:
/**
 * Launches a coroutine in the specified [scope] to perform the given [calculation]
 * and [finalAction] with that calculation's result. Returns the result of the
 * calculation if it is available within [timeout].
 */
suspend fun <T> softTimeoutOrNull(
    scope: CoroutineScope,
    timeout: Duration,
    calculation: suspend () -> T,
    finalAction: suspend (T) -> Unit = { }
): T? {
    val deferred = scope.async {
        calculation().also { finalAction(it) }
    }
    return withTimeoutOrNull(timeout) { deferred.await() }
}
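Usage for the original example would then look something like this (appScope is a placeholder for whatever long-lived CoroutineScope you already have, and cache is the one from the question):
import kotlin.time.Duration.Companion.seconds

suspend fun getValue(key: Int): String? =
    softTimeoutOrNull(
        scope = appScope,                        // hypothetical long-lived scope
        timeout = 5.seconds,
        calculation = { longCalculation(key) },
        finalAction = { cache.put(key, it) }     // still runs if we time out
    )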

Why is this extension function slower than non extension counterpart?

I was trying to write a parallel map extension function that performs a map operation over a List in parallel using coroutines.
However, there is significant overhead in my solution and I can't figure out why.
This is my implementation of the pmap extension function:
fun <T, U> List<T>.pmap(scope: CoroutineScope = GlobalScope,
                        transform: suspend (T) -> U): List<U> {
    return map { i -> scope.async { transform(i) } }.map { runBlocking { it.await() } }
}
However, when I do the exact same operation in a normal function, it takes up to 100 ms extra (which is a lot).
I tried using inline but it had no effect.
I'm leaving here the full test I've done to demonstrate this behavior:
import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis

fun main() {
    test()
}

fun <T, U> List<T>.pmap(scope: CoroutineScope = GlobalScope,
                        transform: suspend (T) -> U): List<U> {
    return this.map { i -> scope.async { transform(i) } }.map { runBlocking { it.await() } }
}

fun test() {
    val list = listOf<Long>(100, 200, 300)
    val transform: suspend (Long) -> Long = { long: Long ->
        delay(long)
        long * 2
    }
    val timeTakenPmap = measureTimeMillis {
        list.pmap(GlobalScope) { transform(it) }
    }
    val manualpmap = measureTimeMillis {
        list.map { GlobalScope.async { transform(it) } }
            .map { runBlocking { it.await() } }
    }
    val timeTakenMap = measureTimeMillis {
        list.map { runBlocking { transform(it) } }
    }
    println("pmapTime: $timeTakenPmap - mapTime: $timeTakenMap - manualpmap: $manualpmap")
}
It can be run in kotlin playground: https://pl.kotl.in/CIXVqezg3
In the playground it prints this result:
pmapTime: 411 - mapTime: 602 - manualpmap: 302
mapTime and manualpmap give reasonable results, with only 2 ms of time beyond the delays. But pmapTime is way off, even though the code in manualpmap and pmap looks exactly the same to me.
In my own machine it runs a little faster, pmap takes around 350ms.
Does anyone know why this happens?
First of all, manual benchmarks like this are usually of very little significance. There are many things that can be optimized away by the compiler or the JIT and any conclusion can be quite wrong. If you really want to compare things, you should instead use benchmarking libraries which take into account JVM warmup etc.
Now, the overhead you see (if you could confirm there is an actual overhead) might be caused by the fact that your higher-order extension function is not marked inline, so instances of the lambda you pass need to be created - but as @Tenfour04 noted there are many other possible reasons: thread pool lazy initialization, significance of the list size, etc.
That being said, this is really not an appropriate way to write a parallel map, for several reasons:
- GlobalScope is a pretty bad default in general and should be used in very specific situations only. But don't worry about it, because of the next point.
- You don't need an externally provided CoroutineScope if the coroutines you launch do not outlive your function. Instead, use coroutineScope { ... } and make your function suspend; the caller can then choose the context if they need to.
- map { it.await() } is inefficient in case of errors: if the last element's transformation immediately fails, map will wait for all previous elements to finish before failing. You should prefer awaitAll, which takes care of this.
- runBlocking should be avoided in coroutines (blocking threads in general, especially when you don't control which thread you're blocking), so using it in deep library-like functions like this is dangerous, because they will likely be used in coroutines at some point.
Applying those points gives:
suspend fun <T, U> List<T>.pmap(transform: suspend (T) -> U): List<U> {
    return coroutineScope {
        map { async { transform(it) } }.awaitAll()
    }
}
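As a quick sanity check, a call like this should take roughly as long as the slowest element rather than the sum of all the delays:
import kotlinx.coroutines.delay

suspend fun demo() {
    val doubled = listOf(100L, 200L, 300L).pmap {
        delay(it) // simulate work
        it * 2
    }
    println(doubled) // [200, 400, 600], after ~300 ms instead of ~600 ms
}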

Design pattern to best implement batch api requests that happen transparently to the calling layer

I have a batch processor that I want to refactor to be expressed in a 1-to-1 fashion based on input, to increase readability and to allow further optimization later on. The issue is that there is a service that should be called in batches to reduce HTTP overhead, so mixing the 1-to-1 code with the batch code is a bit tricky, and we may not want to call the service with every input. Results can be sent out eagerly one-by-one, but order must be maintained, so something like a flow doesn't seem to work.
So, ideally the batch processor would look something like this:
class Processor<A, B> {
    val service: Service<A, B>
    val scope: CoroutineScope

    fun processBatch(input: List<A>) {
        input.map {
            Pair(it, scope.async { service.call(it) })
        }.map { (a, b) ->
            runBlocking { b.await().let { /** handle result, do something with a if result is null, etc **/ } }
        }
    }
}
The desire is to perform all of the service logic in such a way that it executes in the background, automatically splitting the inputs into batches, executing those batches asynchronously, and somehow mapping the result of each batch call back onto the suspended calls.
Here is a hacky implementation:
class Service<A, B> {
    val inputContainer: MutableList<A>
    val outputs: MutableList<B>
    val runCalled = AtomicBoolean(false)
    val batchSize: Int

    suspend fun call(input: A): B? {
        // some prefiltering logic that returns a null early
        val index = inputContainer.size
        inputContainer.add(input) // add to overall list for later batching
        return suspend {
            run()
            outputs[index]
        }
    }

    fun run() {
        val batchOutputs = mutableListOf<Deferred<List<B?>>>()
        if (!runCalled.getAndSet(true)) {
            inputContainer.chunked(batchSize).forEach {
                batchOutputs.add(scope.async { batchCall(it) })
            }
            runBlocking {
                batchOutputs.map {
                    val res = it.await()
                    outputs.addAll(res)
                }
            }
        }
    }

    suspend fun batchCall(input: List<A>): List<B?> {
        // batch API call, etc
    }
}
Something like this could work but there are several concerns:
1. All API calls go out at once. Ideally this would batch and execute in the background while other inputs are being scheduled, but this is not the case here.
2. Processing of the service result for the first input cannot resume until all results have been returned. Ideally we could process each result as soon as its service call has returned, while the other calls continue in the background.
3. Containers of intermediate results seem hacky and prone to bugs. Cleanup logic is also needed, which introduces more hacky bits into the rest of the code.
I can think of several optimizations to address 1 and 2, but I imagine concerns related to 3 would get worse. This seems like a fairly common call pattern and I would expect there to be a library or a much simpler design pattern to accomplish this, but I haven't been able to find anything. Any guidance is appreciated.
You're on the right track by using Deferred. The solution I would use is:
When the caller makes a request, create a CompletableDeferred
Using a channel, pass this CompletableDeferred to the service for later completion
Have the caller suspend until the service completes the CompletableDeferred
It might look something like this:
val requestChannel = Channel<Pair<Request, CompletableDeferred<Result>>>()
suspend fun doRequest(request: Request): Result {
    val result = CompletableDeferred<Result>()
    requestChannel.send(Pair(request, result))
    return result.await()
}

fun run() = scope.launch {
    while (isActive) {
        val (requests, deferreds) = getBatch(batchSize).unzip()
        val results = batchCall(requests)
        (results zip deferreds).forEach { (result, deferred) ->
            deferred.complete(result)
        }
    }
}

suspend fun getBatch(batchSize: Int) = buildList {
    repeat(batchSize) {
        add(requestChannel.receive())
    }
}
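Callers can then fire requests concurrently, and each one suspends only until its own result is completed. A sketch of the wiring (Request, scope, batchSize and batchCall are assumed from the surrounding code, Request(n) is a hypothetical constructor, and the request count is assumed to be a multiple of batchSize since getBatch waits for a full batch):
suspend fun demo() = coroutineScope {
    run() // start the batching loop in the service's scope
    val results = (1..10)
        .map { n -> async { doRequest(Request(n)) } } // each caller suspends individually
        .awaitAll()
    println(results)
}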

Parallelly consuming a long sequence in Kotlin

I have a function generating a very long sequence of work items. Generating these items is fast, but there are too many in total to store a list of them in memory. Processing the items produces no results, just side effects.
I would like to process these items across multiple threads. One solution is to have a thread read from the generator and write to a concurrent bounded queue, and a number of executor threads polling for work from the bounded queue, but this is a lot of things to set up.
Is there anything in the standard library that would help me do that?
I had initially tried
items.map { async(executor) { process(it) } }.forEach { it.await() }
But, as pointed out in how to implement parallel mapping for sequences in kotlin, this doesn't work for reasons that are obvious in retrospect.
Is there a quick way to do this (possibly with an external library), or is manually setting up a bounded queue in the middle my best option?
You can look at coroutines combined with channels.
If all work items can be emitted on demand from a producer channel, then it's possible to await each item and process it with a pool of threads.
An example:
sealed class Stream {
    object End : Stream()
    class Item(val data: Long) : Stream()
}

val produceCtx = newSingleThreadContext("producer")

// A dummy producer that sends one million Longs on its own thread
val producer = CoroutineScope(produceCtx).produce {
    for (i in 0 until 1_000_000L) send(Stream.Item(i))
    send(Stream.End)
}

val workCtx = newFixedThreadPoolContext(4, "work")
val workers = Channel<Unit>(4)

fun main() = runBlocking {
    repeat(4) { workers.offer(Unit) }
    for (_nothing in workers) { // launch 4 times, then wait for a task to finish
        launch(workCtx) {
            when (val item = producer.receive()) {
                Stream.End -> workers.close()
                is Stream.Item -> {
                    workFunction(item.data) // actual work here
                    workers.offer(Unit) // notify to launch a new task
                }
            }
        }
    }
}
Your magic word would be .asSequence():
items
    .asSequence() // creates a lazily evaluated sequence
    .forEach { launch { executor.process(it) } } // if you don't need the value afterwards, use 'launch', a.k.a. "fire and forget"
but there are too many in total to store a list of them in memory
Then don't map to list and don't collect the values, no matter if you work with Kotlin or Java.
As long as you are on the JVM, you can write yourself an extension function that works the sequence in chunks and spawns futures for all entries in a chunk. Something like this:
import java.util.concurrent.Executors

@Suppress("UNCHECKED_CAST")
fun <T, R> Sequence<T>.mapParallel(action: (value: T) -> R?): Sequence<R?> {
    val numThreads = Runtime.getRuntime().availableProcessors() - 1
    return this
        .chunked(numThreads)
        .map { chunk ->
            val threadPool = Executors.newFixedThreadPool(numThreads)
            try {
                return@map chunk
                    .map {
                        // CAUTION -> needs to be written like this,
                        // otherwise the submit(Runnable) overload is called,
                        // which always returns an empty Future!!!
                        val callable: () -> R? = { action(it) }
                        threadPool.submit(callable)
                    }
            } finally {
                threadPool.shutdown()
            }
        }
        .flatten()
        .map { future -> future.get() }
}
You can then just use it like:
items
    .mapParallel { /* process an item */ }
    .forEach { /* handle the result */ }
As long as the workload per item is similar, this gives good parallel processing.

Kotlin Process Collection In Parallel?

I have a collection of objects, which I need to perform some transformation on. Currently I am using:
var myObjects: List<MyObject> = getMyObjects()
myObjects.forEach { myObj ->
    someMethod(myObj)
}
It works fine, but I was hoping to speed it up by running someMethod() in parallel, instead of waiting for each object to finish, before starting on the next one.
Is there any way to do this in Kotlin? Maybe with doAsyncTask or something?
I know when this was asked over a year ago it was not possible, but now that Kotlin has coroutines, I am curious whether they can help.
Yes, this can be done using coroutines. The following function applies an operation in parallel on all elements of a collection:
fun <A> Collection<A>.forEachParallel(f: suspend (A) -> Unit): Unit = runBlocking {
    map { async(CommonPool) { f(it) } }.forEach { it.await() }
}
While the definition itself is a little cryptic, you can then easily apply it as you would expect:
myObjects.forEachParallel { myObj ->
    someMethod(myObj)
}
Parallel map can be implemented in a similar way, see https://stackoverflow.com/a/45794062/1104870.
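With the current coroutines API (CommonPool has long since been deprecated), a hedged sketch of the map-returning variant looks like this:
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope

// Runs f on all elements concurrently and returns the results in order.
suspend fun <A, B> Collection<A>.mapParallel(f: suspend (A) -> B): List<B> =
    coroutineScope {
        map { async { f(it) } }.awaitAll()
    }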
Java Stream is simple to use in Kotlin:
tasks.stream().parallel().forEach { computeNotSuspend(it) }
If you are using Android however, you cannot use Java 8 if you want an app compatible with an API lower than 24.
You can also use coroutines as you suggested. But they're not really part of the language as of now (August 2017) and you need to install an external library. There is a very good guide with examples.
runBlocking<Unit> {
    val deferreds = tasks.map { async(CommonPool) { compute(it) } }
    deferreds.forEach { it.await() }
}
Note that coroutines are implemented with non-blocking multi-threading, which means they can be faster than traditional multi-threading. In the code below I benchmark the parallel Stream against the coroutine approach, and in that case the coroutine approach is 7 times faster on my machine. However, you have to do some work yourself to make sure your code is "suspending" (non-blocking), which can be quite tricky. In my example I'm just calling delay, which is a suspend function provided by the library. Non-blocking multi-threading is not always faster than traditional multi-threading; it can be faster if you have many threads doing nothing but waiting on IO, which is kind of what my benchmark is doing.
My benchmarking code:
import kotlinx.coroutines.experimental.CommonPool
import kotlinx.coroutines.experimental.async
import kotlinx.coroutines.experimental.delay
import kotlinx.coroutines.experimental.launch
import kotlinx.coroutines.experimental.runBlocking
import java.util.*
import kotlin.system.measureNanoTime
import kotlin.system.measureTimeMillis

class SomeTask() {
    val durationMS = random.nextInt(1000).toLong()

    companion object {
        val random = Random()
    }
}

suspend fun compute(task: SomeTask): Unit {
    delay(task.durationMS)
    //println("done ${task.durationMS}")
    return
}

fun computeNotSuspend(task: SomeTask): Unit {
    Thread.sleep(task.durationMS)
    //println("done ${task.durationMS}")
    return
}

fun main(args: Array<String>) {
    val n = 100
    val tasks = List(n) { SomeTask() }

    val timeCoroutine = measureNanoTime {
        runBlocking<Unit> {
            val deferreds = tasks.map { async(CommonPool) { compute(it) } }
            deferreds.forEach { it.await() }
        }
    }
    println("Coroutine ${timeCoroutine / 1_000_000} ms")

    val timePar = measureNanoTime {
        tasks.stream().parallel().forEach { computeNotSuspend(it) }
    }
    println("Stream parallel ${timePar / 1_000_000} ms")
}
Output on my 4-core computer:
Coroutine: 1037 ms
Stream parallel: 7150 ms
If you uncomment the println calls in the two compute functions, you will see that in the non-blocking coroutine code the tasks are processed in the right order, but not with Streams.
You can use RxJava to solve this.
val items: List<MyObjects> = getList()

Observable.from(items).flatMap(object : Func1<MyObjects, Observable<String>> {
    override fun call(item: MyObjects): Observable<String> {
        return someMethod(item)
    }
}).subscribeOn(Schedulers.io()).observeOn(AndroidSchedulers.mainThread()).subscribe(object : Subscriber<String>() {
    override fun onCompleted() {
    }

    override fun onError(e: Throwable) {
    }

    override fun onNext(s: String) {
        // do on output of each string
    }
})
By subscribing on Schedulers.io(), someMethod is scheduled on a background thread.
To process items of a collection in parallel you can use Kotlin Coroutines. For example the following extension function processes items in parallel and waits for them to be processed:
suspend fun <T, R> Iterable<T>.processInParallel(
    dispatcher: CoroutineDispatcher = Dispatchers.IO,
    processBlock: suspend (v: T) -> R,
): List<R> = coroutineScope { // or supervisorScope
    map {
        async(dispatcher) { processBlock(it) }
    }.awaitAll()
}
This is a suspend extension function on the Iterable<T> type, which processes the items in parallel and returns the result of processing each item. By default it uses the Dispatchers.IO dispatcher to offload blocking tasks to a shared pool of threads. It must be called from a coroutine (including a coroutine with the Dispatchers.Main dispatcher) or from another suspend function.
Example of calling from a coroutine:
val myObjects: List<MyObject> = getMyObjects()

someCoroutineScope.launch {
    val results = myObjects.processInParallel {
        someMethod(it)
    }
    // use processing results
}
where someCoroutineScope is an instance of CoroutineScope.
Or if you want to just launch and forget you can use this function:
fun <T> CoroutineScope.processInParallelAndForget(
    iterable: Iterable<T>,
    dispatcher: CoroutineDispatcher = Dispatchers.IO,
    processBlock: suspend (v: T) -> Unit
) = iterable.forEach {
    launch(dispatcher) { processBlock(it) }
}
This is an extension function on CoroutineScope, which doesn't return any result. It also uses the Dispatchers.IO dispatcher by default. It can be called on a CoroutineScope or from another coroutine.
Calling example:
someCoroutineScope.processInParallelAndForget(myObjects) {
    someMethod(it)
}

// OR from another coroutine:
someCoroutineScope.launch {
    processInParallelAndForget(myObjects) {
        someMethod(it)
    }
}
where someCoroutineScope is an instance of CoroutineScope.