Kotlin Coroutines - Asynchronously consume a sequence - kotlin

I'm looking for a way to keep a Kotlin sequence that can produce values very quickly from outpacing slower async consumers of its values. In the following code, if the async handleValue(it) cannot keep up with the rate at which the sequence produces values, the imbalance leads to buffering of produced values and eventually to out-of-memory errors.
getSequence().map { async {
    handleValue(it)
}}
I believe this is a classic producer/consumer "back-pressure" situation, and I'm trying to understand how to use Kotlin coroutines to deal with it.
Thanks for any suggestions :)

Kotlin channels and flows can both buffer the data a producer dispatches until the consumer/collector is ready to consume it.
However, channels have some drawbacks that flows address; for instance, channels are considered hot streams:
The producer starts dispatching data whether or not a consumer is attached, which can introduce resource leaks.
As long as no consumer is attached, the producer stays stuck in a suspended state.
Flows, by contrast, are cold streams; nothing is produced until there is something to consume it.
To handle your case with Flows:
GlobalScope.launch {
    flow {
        // Producer
        for (item in getSequence()) emit(item)
    }.map { handleValue(it) }
        .buffer(10) // Optionally specify the buffer size
        .collect { // Collector
        }
}
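To see the back-pressure at work, here is a minimal sketch, assuming kotlinx.coroutines 1.6 or later. The function name producedWhileConsuming and the 20 ms delay are made up for illustration; the delay stands in for a slow handleValue. It counts how many values an infinite sequence generates while a slow collector consumes only a few of them.

```kotlin
import java.util.concurrent.atomic.AtomicInteger
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.asFlow
import kotlinx.coroutines.flow.buffer
import kotlinx.coroutines.flow.take
import kotlinx.coroutines.runBlocking

// Hypothetical helper: returns how many values the sequence had generated
// by the time `taken` values were consumed.
suspend fun producedWhileConsuming(taken: Int, bufferSize: Int): Int {
    val produced = AtomicInteger()
    // Infinite, fast producer; the counter records every value it generates.
    val source = generateSequence(1) { it + 1 }.onEach { produced.incrementAndGet() }
    source.asFlow()
        .buffer(bufferSize)    // the producer suspends once this buffer is full
        .take(taken)           // stop (and cancel the producer) after `taken` values
        .collect { delay(20) } // slow consumer, stand-in for handleValue
    return produced.get()
}

fun main(): Unit = runBlocking {
    val produced = producedWhileConsuming(taken = 5, bufferSize = 10)
    println("consumed 5 values, sequence generated $produced")
}
```

The generated count should stay close to the consumed count plus the buffer size, instead of growing without bound as it would if every value were handed to its own async.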

For my own reference, and to anyone else this may help, here's how I eventually solved this using Channels - https://kotlinlang.org/docs/channels.html#channel-basics
A producer coroutine:
// produce is an extension on CoroutineScope, so the function is declared as one too
fun CoroutineScope.itemChannel(): ReceiveChannel<MyItem> {
    return produce {
        while (moreItems()) {
            send(nextItem()) // <-- suspend until next 'receive()'
        }
    }
}
And a function to run multiple consumer coroutines, each reading off that channel:
fun itemConsumers() {
    runBlocking {
        val channel = itemChannel()
        repeat(numberOfConsumers) {
            launch {
                var more = true
                while (more) {
                    try {
                        val item = channel.receive()
                        // do stuff with item here...
                    } catch (ex: ClosedReceiveChannelException) {
                        more = false
                    }
                }
            }
        }
    }
}
The idea here is that the consumer receives off the channel within the coroutine, so the next receive() is not called until a consumer coroutine finishes handling the last item. This results in the desired back-pressure, as opposed to receiving from a sequence or flow in the main thread, and then passing the item into a coroutine to be consumed. In that scenario there is no back-pressure from the receiver, since the receive happens in a different coroutine than where the received item is consumed.
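A self-contained variant of the same idea might look like the sketch below. It assumes a recent kotlinx.coroutines version; numbers, consumeAll and this version of handleValue are hypothetical stand-ins, and the for loop over the channel replaces the try/catch around receive() (it ends on its own when the channel is closed).

```kotlin
import java.util.concurrent.ConcurrentLinkedQueue
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.channels.ReceiveChannel
import kotlinx.coroutines.channels.produce
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Rendezvous channel (the default capacity): send() suspends until a consumer receives.
@OptIn(ExperimentalCoroutinesApi::class)
fun CoroutineScope.numbers(source: Sequence<Int>): ReceiveChannel<Int> = produce {
    for (value in source) send(value)
}

// Hypothetical stand-in for the slow handleValue() from the question.
suspend fun handleValue(value: Int): Int {
    delay(10)
    return value * 2
}

// Hypothetical helper: runs `workers` consumers over one channel and collects their results.
suspend fun consumeAll(source: Sequence<Int>, workers: Int): List<Int> {
    val results = ConcurrentLinkedQueue<Int>()
    // coroutineScope returns once the producer and all consumers have finished.
    coroutineScope {
        val channel = numbers(source)
        repeat(workers) {
            launch {
                // Each worker asks for the next item only after finishing the previous one.
                for (value in channel) results.add(handleValue(value))
            }
        }
    }
    return results.toList()
}

fun main(): Unit = runBlocking {
    println(consumeAll((1..20).asSequence(), workers = 4).sorted())
}
```

The number of workers caps how many items are in flight at once, so the sequence is never pulled faster than the consumers can handle.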

Related

Should I emit from a coroutine when collecting from a different flow?

I have a use case where I need to trigger on a specific event collected from a flow and restart it when it closes. I also need to emit all of the events to a different flow. My current implementation looks like this:
scope.launch {
    val flowToReturn = MutableSharedFlow<Event>()
    while (true) {
        client
            .connect() // returns Flow<Event>
            .catch { ... } // ignore errors
            .onEach { launch { flowToReturn.emit(it) } } // problem here
            .filterIsInstance<Event.Some>()
            .collect { someEvent ->
                doStuff(someEvent)
            }
    }
}.start()
The idea is to always reconnect when the client disconnects (collect then returns and a new iteration begins) while having the outer flow lifecycle separate from the inner (connection) one. It being a shared flow with potentially multiple subscribers is a secondary concern.
As the emit documentation states it is not thread-safe. Should I call it from a new coroutine then? My concern is that the emit will suspend if there are no subscribers to the outer flow and I need to run the downstream pipeline regardless.
The MutableSharedFlow.emit() documentation says that it is thread-safe. Maybe you were accidentally looking at FlowCollector.emit(), which is not thread-safe. MutableSharedFlow is a subtype of FlowCollector but promotes emit() to being thread-safe, since it's not intended to be used as a Flow builder receiver like a plain FlowCollector. There's no reason to launch a coroutine just to emit to your shared flow.
There's no reason to call start() on a coroutine Job that was created with launch because launch both creates the Job and starts it.
You will need to declare flowToReturn before your launch call to be able to have it in scope to return from this outer function.
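Putting those three points together, a sketch could look like this. It assumes kotlinx.coroutines 1.6 or later; relayEvents, the Event hierarchy and the connect/doStuff parameters are hypothetical stand-ins for the client code in the question. With the default MutableSharedFlow() configuration, emit returns immediately while there are no subscribers, and suspends only while existing subscribers are still busy with the previous value.

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Job
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.MutableSharedFlow
import kotlinx.coroutines.flow.SharedFlow
import kotlinx.coroutines.flow.asSharedFlow
import kotlinx.coroutines.flow.catch
import kotlinx.coroutines.flow.filterIsInstance
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.onEach
import kotlinx.coroutines.flow.take
import kotlinx.coroutines.isActive
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical event types, mirroring Event and Event.Some from the question.
sealed interface Event {
    data class Some(val payload: String) : Event
    data class Other(val payload: String) : Event
}

fun CoroutineScope.relayEvents(
    connect: () -> Flow<Event>,            // stand-in for client.connect()
    doStuff: suspend (Event.Some) -> Unit,
): Pair<SharedFlow<Event>, Job> {
    // Declared outside launch so it can be returned to the caller.
    val flowToReturn = MutableSharedFlow<Event>()
    val job = launch { // launch already starts the coroutine; no start() needed
        while (isActive) {
            connect()
                .catch { /* ignore errors, then reconnect */ }
                .onEach { flowToReturn.emit(it) } // emit directly, no extra launch
                .filterIsInstance<Event.Some>()
                .collect { doStuff(it) }
        }
    }
    return flowToReturn.asSharedFlow() to job
}

fun main(): Unit = runBlocking {
    val (events, relay) = relayEvents(
        connect = { flow { emit(Event.Some("a")); emit(Event.Other("b")); delay(100) } },
        doStuff = { println("handled $it") },
    )
    val subscriber = launch { events.take(2).collect { println("subscriber saw $it") } }
    subscriber.join()
    relay.cancel()
}
```

If a slow subscriber must never hold up the connection pipeline, one option is to give the shared flow some extraBufferCapacity together with an onBufferOverflow policy, at the cost of possibly dropping events for that subscriber.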

How can I get a non-blocking infinite loop in a Kotlin Actor?

I would like to consume some stream-data using Kotlin actors
I was thinking to put my consumer inside an actor, while it polls in an infinite loop while(true). Then, when I decide, I send a message to stop the consumer.
Currently I have this:
while(true) {
    for (message in channel) { // <--- blocked here, waiting
        when(message) {
            is MessageStop -> consumer.close()
            else -> {}
        }
    }
    consumer.poll()
}
The problem
The problem with this is that it only runs when I send a message to the actor, so my consumer is not polling the rest of the time, because the channel is suspended waiting to receive the next message.
Is there any alternative? Has anyone had the same issue? Or is there something similar to actors in Kotlin that is not blocked by the channel?
Since the channel is just a Channel (https://kotlin.github.io/kotlinx.coroutines/kotlinx-coroutines-core/kotlinx.coroutines.channels/-channel/index.html) you can first check if the channel is empty and if so start your polling. Otherwise handle the messages.
E.g.
while(true) {
    // Channel exposes an isEmpty property; there is no isNotEmpty()
    while (!channel.isEmpty) {
        val message = channel.receive()
        when(message) {
            is MessageStop -> consumer.close()
            else -> {}
        }
    }
    consumer.poll()
}
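A variant that avoids the separate emptiness check is to drain the mailbox with tryReceive(), which never suspends. The sketch below assumes kotlinx.coroutines 1.5 or later; Message, MessagePing, Consumer and pollUntilStopped are hypothetical stand-ins for the types in the question, and the loop also stops when it sees MessageStop or a closed channel, which the snippet above does not do.

```kotlin
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.channels.ReceiveChannel
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.yield

sealed interface Message
object MessageStop : Message
object MessagePing : Message

// Hypothetical stand-in for the stream consumer in the question.
class Consumer {
    var polls = 0
        private set
    var closed = false
        private set
    fun poll() { polls++ }
    fun close() { closed = true }
}

suspend fun pollUntilStopped(channel: ReceiveChannel<Message>, consumer: Consumer) {
    var running = true
    while (running) {
        // Drain pending control messages without suspending.
        while (true) {
            val result = channel.tryReceive()
            if (result.isClosed) { running = false; break }
            val message = result.getOrNull() ?: break // mailbox is empty
            when (message) {
                is MessageStop -> running = false
                else -> {}
            }
        }
        if (running) {
            consumer.poll()
            yield() // give other coroutines (e.g. the sender) a chance to run
        }
    }
    consumer.close()
}

fun main(): Unit = runBlocking {
    val channel = Channel<Message>(Channel.UNLIMITED)
    val consumer = Consumer()
    val poller = launch { pollUntilStopped(channel, consumer) }
    delay(50)
    channel.send(MessageStop)
    poller.join()
    println("polled ${consumer.polls} times, closed = ${consumer.closed}")
}
```

Inside an actor, the same loop body would read from the actor's own channel. If poll() is itself a blocking call, it probably belongs on a dispatcher meant for blocking work.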
In the end I used AKKA with Kotlin; I'm finding it much easier this way.
You should use postDelayed(), for example:
final Runnable r = new Runnable() {
    public void run() {
        // your code here
        handler.postDelayed(this, 1000);
    }
};
You can replace 1000 with whatever delay in milliseconds you want. I also highly recommend running your code on a separate thread (if you are not already doing so) to prevent ANR (App Not Responding) errors.

Parallel requests with coroutines

I'm trying to fetch some data from multiple locations to fill a recyclerView. I used to use callbacks, which worked fine, but I need to refactor it to coroutines.
So I have a list of retrofit services and call each one of them in parallel. Then I can update the recyclerView with the onResponse callback. How can I achieve this with coroutines?
I tried something like this, but the next call is only fired after I get a response:
runBlocking {
    for (service in services) {
        val response = async(Dispatchers.IO) {
            service.getResponseAsync()
        }
        adapter.updateRecyclerView(response.await())
    }
}
With another approach I had the problem that I was not able to get back on the main thread to update my UI, as I was using launch and could not await the response:
runBlocking {
    services.forEach {
        launch(Dispatchers.IO) {
            val response = it.getResponseAsync()
        }
        withContext(Dispatchers.Main) {
            adapter.updateRecyclerView(response)
        }
    }
}
I'm thankful for any tips ;)
cheers patrick
Start coroutines with launch instead of runBlocking. The examples below assume you're launching from a context that uses Dispatchers.Main by default. If that's not the case, you could use launch(Dispatchers.Main) for these.
If you want to update your view every time any of the parallel actions returns, then move your UI update inside the coroutines that you're launching for each of the service items:
for (service in services) {
    launch {
        val response = withContext(Dispatchers.IO) { service.getResponseAsync() }
        adapter.updateRecyclerView(response)
    }
}
If you only need to update once all of them have returned, you can use awaitAll. Here, your updateRecyclerView function would have to be written to handle a list of responses instead of one at a time.
launch {
    val responses = services.map { service ->
        async(Dispatchers.IO) { service.getResponseAsync() }
    }
    adapter.updateRecyclerView(responses.awaitAll())
}
The await() call suspends the current coroutine and frees the current thread to run other queued coroutines.
So when await() is called, the current coroutine suspends until the response is received, which is why the for loop does not move on to the next iteration until the previous request has completed.
First and foremost, you should not be using runBlocking here; its use in a production environment is highly discouraged.
You should instead use the ViewModel scope provided by Android for structured concurrency (it cancels the request when it is no longer needed, for example when the activity's lifecycle is over).
You can use the view model scope in an activity or fragment like this: viewModelOwner.viewModelScope.launch(/*Other dispatcher if needed*/) {}. Alternatively, create a coroutine scope yourself with an attached job that is cancelled in onDestroy.
As for the requests not running in parallel: you can launch multiple requests inside the for loop without awaiting them there.
Then select over them using a select expression: https://kotlinlang.org/docs/reference/coroutines/select-expression.html#selecting-deferred-values
Example:
viewModelOwner.viewModelScope.launch {
    // Start every request up front so they run in parallel
    val pending = services.map { service ->
        async(Dispatchers.IO) { service.getResponseAsync() }
    }.toMutableList()
    // Handle whichever request finishes first, as opposed to awaiting all of them and then updating
    while (pending.isNotEmpty()) {
        val finished = select<Deferred<TypeReturnedFromGetResponse>> {
            for (deferred in pending) {
                deferred.onAwait { deferred }
            }
        }
        // Drop the finished request so it is not selected again on the next pass
        pending.remove(finished)
        adapter.updateRecyclerView(finished.await())
    }
}
PS: Using this method looks ugly but will update the adapter as soon as whichever request is first resolved, instead of awaiting for each and every request and then updating the items in it.

Kotlin wrap sequential IO calls as a Sequence

I need to process all of the results from a paged API endpoint. I'd like to present all of the results as a sequence.
I've come up with the following (slightly pseudo-coded):
suspend fun getAllRowsFromAPI(client: Client): Sequence<Row> {
    var currentRequest: Request? = client.requestForNextPage()
    return withContext(Dispatchers.IO) {
        sequence {
            while(currentRequest != null) {
                var rowsInPage = runBlocking { client.makeRequest(currentRequest) }
                currentRequest = client.requestForNextPage()
                yieldAll(rowsInPage)
            }
        }
    }
}
This functions but I'm not sure about a couple of things:
Is the API request happening inside runBlocking still happening with the IO dispatcher?
Is there a way to refactor the code to launch the next request before yielding the current results, then awaiting on it later?
Question 1: The API-request will still run on the IO-dispatcher, but it will block the thread it's running on. This means that no other tasks can be scheduled on that thread while waiting for the request to finish. There's not really any reason to use runBlocking in production-code at all, because:
If makeRequest is already a blocking call, then runBlocking will do practically nothing.
If makeRequest was a suspending call, then runBlocking would make the code less efficient. It wouldn't yield the thread back to the pool while waiting for the request to finish.
Whether makeRequest is a blocking or non-blocking call depends on the client you're using. Here's a non-blocking http-client I can recommend: https://ktor.io/clients/
Question 2: I would use a Flow for this purpose. You can think of it as a suspendable variant of Sequence. Flows are cold, which means that a flow won't run before the consumer asks for its contents (as opposed to being hot, which means the producer pushes new values whether or not the consumer wants them). A Kotlin Flow has an operator called buffer, which you can use to make it request more pages before the previous page has been fully consumed.
The code could look quite similar to what you already have:
// No need for `suspend` here: building a flow does not suspend, only collecting it does
fun getAllRowsFromAPI(client: Client): Flow<Row> = flow {
    var currentRequest: Request? = client.requestForNextPage()
    while (currentRequest != null) {
        val rowsInPage = client.makeRequest(currentRequest)
        emitAll(rowsInPage.asFlow())
        currentRequest = client.requestForNextPage()
    }
}.flowOn(Dispatchers.IO)
    .buffer(capacity = 1)
The capacity of 1 means that it will only make 1 more request while an earlier page is being processed. You could increase the buffer size to let more requests run ahead.
You should check out this talk from KotlinConf 2019 to learn more about flows: https://www.youtube.com/watch?v=tYcqn48SMT8
Sequences are definitely not what you want to use in this case, because they are not designed to work in an asynchronous environment. Perhaps you should take a look at flows and channels, but for your case the best and simplest choice is just a collection of deferred values, because you want to process all requests at once (flows and channels process them one by one, possibly with a limited buffer size).
The following approach allows you to start all requests asynchronously (assuming that makeRequest is a suspending function and supports asynchronous requests). When you need your results, you only have to wait for the slowest request to finish.
fun getClientRequests(client: Client): List<Request> {
    val requests = ArrayList<Request>()
    var currentRequest: Request? = client.requestForNextPage()
    while (currentRequest != null) {
        requests += currentRequest
        currentRequest = client.requestForNextPage()
    }
    return requests
}
// This function is not even suspended, so it finishes almost immediately
fun getAllRowsFromAPI(client: Client): List<Deferred<Page>> =
    getClientRequests(client).map {
        /*
         * The better practice would be making getAllRowsFromApi an extension function
         * to CoroutineScope and calling receiver scope's async function.
         * GlobalScope is used here just for simplicity.
         */
        GlobalScope.async(Dispatchers.IO) { client.makeRequest(it) }
    }
fun main() {
    val client = Client()
    val deferredPages = getAllRowsFromAPI(client) // This line executes fast
    // Here you can do whatever you want, all requests are processed in background
    Thread.sleep(999L)
    // Then, when we need results....
    val pages = runBlocking {
        deferredPages.map { it.await() }
    }
    println(pages)
    // In your case you also want to "unpack" pages and get rows, you can do it here:
    val rows = pages.flatMap { it.getRows() }
    println(rows)
}
I happened across suspendingSequence in Kotlin's coroutines-examples:
https://github.com/Kotlin/coroutines-examples/blob/090469080a974b962f5debfab901954a58a6e46a/examples/suspendingSequence/suspendingSequence.kt
This is exactly what I was looking for.

Kotlin Coroutines - unlimited stream to fan out batches

I'm looking to implement a pipeline for processing an infinite stream of messages. I'm new to coroutines and trying to follow along with the docs but I'm not confident I'm doing the right thing.
My infinite stream is of batches of records and I'd like to fan out the processing of each record to a coroutine, wait for a batch to finish (to log stats and stuff) before continuing to the next batch.
                      -> process [record] \
source -> [records]   -> process [record]  -> [log batch stats]
                      -> process [record] /
|------------------- while(true) -------------------|
What I had planned is to have 2 Channels, one for the infinite stream, and one for the intermediate records that will fill up and empty on each batch.
runBlocking {
    val infinite: Channel<List<Record>> = produce { send(source.getBatch()) }
    val records = Channel<Record>(Channel.Factory.UNLIMITED)
    while(true) {
        infinite.receive().forEach { records.send(it) }
        while(!records.isEmpty()) {
            launch { process(records.receive()) }
        }
        // ??? Wait for jobs?
        logBatchStats()
    }
}
From googling, it seems that waiting for jobs is discouraged, plus I wasn't sure if calling .map on a channel will actually receive messages to convert them to jobs:
records.map { record -> launch { process(record) } }
yields a Channel<Job>. It seems I can call .toList() on it to collapse it, but then I need to join the jobs? Again, google suggested to do that by having a parent job, but I'm not really sure how to do that with launch.
Anyway, very much a n00b to this.
Thanks for the help.
I don't see a reason to have two channels. You could directly iterate over the list of records. And you should use async instead of launch. Then you can use await or even better awaitAll for the list of results.
val infinite: ReceiveChannel<List<Record>> = produce { ... }
while(true) {
    val resultsDeferred = infinite.receive().map {
        async {
            process(it)
        }
    }
    val results = resultsDeferred.awaitAll()
    logBatchStats()
}
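On the "parent job" part of the question: wrapping each batch in coroutineScope gives you that parent, since coroutineScope only returns once every coroutine started inside it has finished. A minimal runnable sketch follows, where Record, process, processBatches and the three-batch producer are made-up stand-ins for the real source, and logBatchStats is passed in as a parameter.

```kotlin
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.channels.ReceiveChannel
import kotlinx.coroutines.channels.produce
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Hypothetical record type and processing step.
data class Record(val id: Int)

suspend fun process(record: Record): Int {
    delay(10) // stand-in for real work
    return record.id * 2
}

suspend fun processBatches(
    batches: ReceiveChannel<List<Record>>,
    logBatchStats: (List<Int>) -> Unit,
) {
    // The loop ends when the producer closes the channel.
    for (batch in batches) {
        // coroutineScope acts as the parent: it returns only after every
        // async started inside it has completed (or one of them has failed).
        val results = coroutineScope {
            batch.map { record -> async { process(record) } }.awaitAll()
        }
        logBatchStats(results)
    }
}

@OptIn(ExperimentalCoroutinesApi::class)
fun main(): Unit = runBlocking {
    // A finite producer for the demo; a real source would keep sending batches.
    val batches = produce<List<Record>> {
        repeat(3) { n -> send(List(4) { i -> Record(n * 4 + i) }) }
    }
    processBatches(batches) { results -> println("batch done: $results") }
}
```

Because the scope is per batch, a failure in one record cancels only its sibling coroutines in that batch and then propagates out of processBatches, where the caller can decide whether to continue.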