Kotlin wrap sequential IO calls as a Sequence - kotlin

I need to process all of the results from a paged API endpoint. I'd like to present all of the results as a sequence.
I've come up with the following (slightly pseudo-coded):
suspend fun getAllRowsFromAPI(client: Client): Sequence<Row> {
    var currentRequest: Request? = client.requestForNextPage()
    return withContext(Dispatchers.IO) {
        sequence {
            while (currentRequest != null) {
                var rowsInPage = runBlocking { client.makeRequest(currentRequest) }
                currentRequest = client.requestForNextPage()
                yieldAll(rowsInPage)
            }
        }
    }
}
This functions but I'm not sure about a couple of things:
Is the API request happening inside runBlocking still happening with the IO dispatcher?
Is there a way to refactor the code to launch the next request before yielding the current results, then awaiting on it later?

Question 1: The API-request will still run on the IO-dispatcher, but it will block the thread it's running on. This means that no other tasks can be scheduled on that thread while waiting for the request to finish. There's not really any reason to use runBlocking in production-code at all, because:
If makeRequest is already a blocking call, then runBlocking will do practically nothing.
If makeRequest was a suspending call, then runBlocking would make the code less efficient. It wouldn't yield the thread back to the pool while waiting for the request to finish.
Whether makeRequest is a blocking or non-blocking call depends on the client you're using. Here's a non-blocking http-client I can recommend: https://ktor.io/clients/
Question 2: I would use a Flow for this purpose. You can think of it as a suspendable variant of Sequence. Flows are cold, which means a flow doesn't run until a consumer asks for its contents (as opposed to being hot, where the producer pushes new values whether or not the consumer wants them). A Kotlin Flow has an operator called buffer which you can use to make it request more pages before it has fully consumed the previous page.
The code could look quite similar to what you already have:
// No suspend modifier is needed: building a cold flow does not suspend, only collecting it does.
fun getAllRowsFromAPI(client: Client): Flow<Row> = flow {
    var currentRequest: Request? = client.requestForNextPage()
    while (currentRequest != null) {
        val rowsInPage = client.makeRequest(currentRequest)
        emitAll(rowsInPage.asFlow())
        currentRequest = client.requestForNextPage()
    }
}.flowOn(Dispatchers.IO)
    .buffer(capacity = 1)
The capacity of 1 means that it will only make 1 more request while processing an earlier page. You could increase the buffer size to make more concurrent requests.
You should check out this talk from KotlinConf 2019 to learn more about flows: https://www.youtube.com/watch?v=tYcqn48SMT8
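To make the buffering idea concrete, here is a minimal, self-contained sketch. FakePagedClient, fetchPage, pages and allRows are hypothetical names invented for illustration, not part of any real client. The sketch emits whole pages before flattening them, so that the buffer holds pages rather than single rows; that is a design choice of this sketch, not something the answer above prescribes.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.buffer
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.flowOn
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.flow.transform
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for a paged API: each page holds `pageSize` rows.
class FakePagedClient(private val pageCount: Int, private val pageSize: Int) {
    // Simulates a non-blocking request; returns null once no pages are left.
    suspend fun fetchPage(index: Int): List<Int>? {
        if (index >= pageCount) return null
        delay(50) // pretend network latency
        return List(pageSize) { index * pageSize + it }
    }
}

// Cold flow of pages: nothing is requested until someone collects it.
fun pages(client: FakePagedClient): Flow<List<Int>> = flow {
    var index = 0
    while (true) {
        val page = client.fetchPage(index++) ?: break
        emit(page)
    }
}

fun allRows(client: FakePagedClient): Flow<Int> =
    pages(client)
        .flowOn(Dispatchers.IO)  // run the requests on the IO dispatcher
        .buffer(capacity = 1)    // let the producer fetch ahead while the consumer is busy
        .transform { page -> for (row in page) emit(row) } // flatten pages into rows

fun main(): Unit = runBlocking {
    val rows = allRows(FakePagedClient(pageCount = 3, pageSize = 2)).toList()
    println(rows) // prints [0, 1, 2, 3, 4, 5]
}
```

Because the flow is cold, each call to toList (or collect) starts paging again from the first page.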

Sequences are definitely not the thing you want to use in this case, because they are not designed to work in an asynchronous environment. Perhaps you should take a look at flows and channels, but for your case the best and simplest choice is just a collection of deferred values, because you want to process all requests at once (flows and channels process them one by one, possibly with a limited buffer size).
The following approach allows you to start all requests asynchronously (assuming that makeRequest is a suspending function and supports asynchronous requests). When you need the results, you only have to wait for the slowest request to finish.
fun getClientRequests(client: Client): List<Request> {
    val requests = ArrayList<Request>()
    var currentRequest: Request? = client.requestForNextPage()
    while (currentRequest != null) {
        requests += currentRequest
        currentRequest = client.requestForNextPage()
    }
    return requests
}

// This function is not even suspended, so it finishes almost immediately
fun getAllRowsFromAPI(client: Client): List<Deferred<Page>> =
    getClientRequests(client).map {
        /*
         * The better practice would be making getAllRowsFromApi an extension function
         * to CoroutineScope and calling receiver scope's async function.
         * GlobalScope is used here just for simplicity.
         */
        GlobalScope.async(Dispatchers.IO) { client.makeRequest(it) }
    }

fun main() {
    val client = Client()
    val deferredPages = getAllRowsFromAPI(client) // This line executes fast
    // Here you can do whatever you want, all requests are processed in background
    Thread.sleep(999L)
    // Then, when we need results....
    val pages = runBlocking {
        deferredPages.map { it.await() }
    }
    println(pages)
    // In your case you also want to "unpack" pages and get rows, you can do it here:
    val rows = pages.flatMap { it.getRows() }
    println(rows)
}
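The comment in the code above mentions that GlobalScope is only used for simplicity. As a rough sketch of a structured alternative, the requests can be started inside coroutineScope, which ties them to the caller. Request, Page, makeRequest and fetchAllRows below are hypothetical stand-ins written for this example, and the sketch assumes all requests are known up front, as in the answer above.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Hypothetical types standing in for the question's Request and Page.
data class Request(val pageNumber: Int)
data class Page(val rows: List<String>)

// Hypothetical suspending request; a real client would perform network IO here.
suspend fun makeRequest(request: Request): Page {
    delay(100)
    return Page(listOf("row ${request.pageNumber}-a", "row ${request.pageNumber}-b"))
}

// coroutineScope ties the async children to the caller: if the caller is
// cancelled or one request fails, the remaining requests are cancelled too.
suspend fun fetchAllRows(requests: List<Request>): List<String> = coroutineScope {
    requests
        .map { request -> async(Dispatchers.IO) { makeRequest(request) } }
        .awaitAll()                      // results keep the order of `requests`
        .flatMap { page -> page.rows }
}

fun main(): Unit = runBlocking {
    val rows = fetchAllRows(listOf(Request(1), Request(2)))
    println(rows) // prints [row 1-a, row 1-b, row 2-a, row 2-b]
}
```

Unlike the GlobalScope version, this function suspends until every page has arrived, so there is no window in which the caller can do other work before awaiting.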

I happened across suspendingSequence in Kotlin's coroutines-examples:
https://github.com/Kotlin/coroutines-examples/blob/090469080a974b962f5debfab901954a58a6e46a/examples/suspendingSequence/suspendingSequence.kt
This is exactly what I was looking for.

Related

Difference between GlobalScope and runBlocking when waiting for multiple async

I have a Kotlin Backend/server API using Ktor, and inside a certain endpoint's service logic I need to concurrently get details for a list of ids and then return it all to the client with the 200 response.
The way I wanted to do it is by using async{} and awaitAll()
However, I can't understand whether I should use runBlocking or GlobalScope.
What is really the difference here?
fun getDetails(): List<Detail> {
    val fetched: MutableList<Detail> = mutableListOf()
    GlobalScope.launch {   // Option 1
    runBlocking {          // Option 2
    Dispatchers.IO         // Option 3 (or any other dispatcher ...)
        myIds.map { id ->
            async {
                val providerDetails = getDetails(id)
                fetched += providerDetails
            }
        }.awaitAll()
    }
    return fetched
}
launch starts a coroutine that runs in parallel with your current code, so fetched would still be empty by the time your getDetails() function returns. The coroutine will continue running and mutating the List that you have passed out of the function while the code that retrieved the list already has the reference back and will be using it, so there's a pretty good chance of triggering a ConcurrentModificationException. Basically, this is not a viable solution at all.
runBlocking runs a coroutine while blocking the thread that called it. The coroutine will be completely finished before the return fetched line, so this will work if you are OK with blocking the calling thread.
Specifying a Dispatcher isn't an alternative to launch or runBlocking. It is an argument that you can add to either to determine the thread pool used for the coroutine and its children. Since you are doing IO and parallel work, you should probably be using runBlocking(Dispatchers.IO).
Your code can be simplified to avoid the extra, unnecessary mutable list:
fun getDetails(): List<Detail> = runBlocking(Dispatchers.IO) {
    myIds.map { id ->
        async {
            getDetails(id)
        }
    }.awaitAll()
}
Note that this function will rethrow any exceptions thrown by getDetails().
If your project uses coroutines more generally, you probably have higher level coroutines running, in which case this should probably be a suspend function (non-blocking) instead:
suspend fun getDetails(): List<Detail> = withContext(Dispatchers.IO) {
    myIds.map { id ->
        async {
            getDetails(id)
        }
    }.awaitAll()
}

launch long-running task then immediately send HTTP response

Using ktor HTTP server, I would like to launch a long-running task and immediately return a message to the calling client. The task is self-sufficient, it's capable of updating its status in a db, and a separate HTTP call returns its status (i.e. for a progress bar).
What I cannot seem to do is just launch the task in the background and respond. All my attempts at responding wait for the long-running task to complete. I have experimented with many configurations of runBlocking and coroutineScope but none are working for me.
// ktor route
get("/launchlongtask") {
    val text: String = (myFunction(call.request.queryParameters["loops"]!!.toInt()))
    println("myFunction returned")
    call.respondText(text)
}

// in reality, this function is complex... the caller (route) is not able to
// determine the response string, it must be done here
suspend fun myFunction(loops : Int) : String {
    runBlocking {
        launch {
            // long-running task, I want to launch it and move on
            (1..loops).forEach {
                println("this is loop $it")
                delay(2000L)
                // updates status in db here
            }
        }
        println("returning")
        // this string must be calculated in this function (or a sub-function)
        return@runBlocking "we just launched $loops loops"
    }
    return "never get here" // actually we do get here in a coroutineScope
}
output:
returning
this is loop 1
this is loop 2
this is loop 3
this is loop 4
myFunction returned
expected:
returning
myFunction returned
(response sent)
this is loop 1
this is loop 2
this is loop 3
this is loop 4
Just to explain the issue with the code in your question, the problem is using runBlocking. This is meant as the bridge between the synchronous world and the async world of coroutines and
"the name of runBlocking means that the thread that runs it ... gets blocked for the duration of the call, until all the coroutines inside runBlocking { ... } complete their execution."
(from the Coroutine docs).
So in your first example, myFunction won't complete until the coroutine containing the loop completes.
The correct approach is what you do in your answer: using CoroutineScope to launch your long-running task. One thing to point out is that you are passing only a Job() as the CoroutineContext parameter to the CoroutineScope constructor. A CoroutineContext can contain several elements: Job, CoroutineDispatcher, CoroutineExceptionHandler... Because you don't specify a CoroutineDispatcher, the scope uses Dispatchers.Default. This is intended for CPU-intensive tasks and is limited to "the number of CPU cores (with a minimum of 2)". This may or may not be what you want. An alternative is Dispatchers.IO, which by default allows up to 64 threads (or the number of cores, if that is larger).
Inspired by this answer by Lucas Milotich, I used CoroutineScope(Job()) and it seems to work:
suspend fun myFunction(loops : Int) : String {
    CoroutineScope(Job()).launch {
        // long-running task, I want to launch it and move on
        (1..loops).forEach {
            println("this is loop $it")
            delay(2000L)
            // updates status in db here
        }
    }
    println("returning")
    return "we just launched $loops loops"
}
I'm not sure if this is resource-efficient or the preferred way to go, but I don't see a whole lot of other documentation on the topic.
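On the resource question, one common variation is to create a single long-lived scope and reuse it, rather than building a new CoroutineScope(Job()) on every call. The sketch below is only an illustration under that assumption: backgroundScope, startLongTask and the onProgress callback are hypothetical names, and Dispatchers.IO is chosen on the assumption that the status updates are blocking database calls.

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.Job
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// One shared scope for background work (hypothetical name). SupervisorJob keeps
// a failure in one task from cancelling the other tasks launched in this scope.
val backgroundScope = CoroutineScope(SupervisorJob() + Dispatchers.IO)

// Launches the task and returns immediately with the response text.
// The Job is returned as well so a caller can track or cancel the task.
fun startLongTask(loops: Int, onProgress: (Int) -> Unit): Pair<String, Job> {
    val job = backgroundScope.launch {
        for (i in 1..loops) {
            onProgress(i) // stand-in for "update status in db"
            delay(20)
        }
    }
    return "we just launched $loops loops" to job
}

fun main(): Unit = runBlocking {
    val (message, job) = startLongTask(3) { println("this is loop $it") }
    println(message) // returned without waiting for the loops to finish
    job.join()       // only so this demo does not exit before the task completes
}
```

In a server, such a scope would typically be cancelled on shutdown (for example with backgroundScope.cancel()) so that background tasks do not outlive the application.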

What is the difference between limitedParallelism vs a fixed thread pool dispatcher?

I am trying to use Kotlin coroutines to perform multiple HTTP calls concurrently, rather than one at a time, but I would like to avoid making all of the calls concurrently, to avoid rate limiting by the external API.
If I simply launch a coroutine for each request, they are all sent nearly instantly. So I looked into the limitedParallelism function, which sounds very close to what I need and which some Stack Overflow answers suggest is the recommended solution. Older answers to the same question suggested using newFixedThreadPoolContext.
The documentation for that function mentioned limitedParallelism as a preferred alternative "if you do not need a separate thread pool":
If you do not need a separate thread-pool, but only have to limit effective parallelism of the dispatcher, it is recommended to use CoroutineDispatcher.limitedParallelism instead.
However, when I write my code to use limitedParallelism, it does not reduce the number of concurrent calls, compared to newFixedThreadPoolContext which does.
In the example below, I replace my network calls with Thread.sleep, which does not change the behavior.
// method 1 (newFixedThreadPoolContext also requires a name for its threads)
val fixedThreadPoolContext = newFixedThreadPoolContext(2, "Pool")
// method 2
val limitedParallelismContext = Dispatchers.IO.limitedParallelism(2)
runBlocking {
    val jobs = (1..1000).map {
        // swap out the dispatcher here
        launch(limitedParallelismContext) {
            println("started $it")
            Thread.sleep(1000)
            println(" finished $it")
        }
    }
    jobs.joinAll()
}
The behavior for fixedThreadPoolContext is as expected: no more than 2 of the coroutines run at a time, and the total time to finish is several minutes (1000 times one second each, divided by two at a time, roughly 500 seconds).
However, for limitedParallelismContext, all "started #" lines print immediately, and one second later, all "finished #" lines print and the program completes in just over 1 total second.
Why does limitedParallelism not have the same effect as using a separate thread pool? What does it accomplish?
I modified your code slightly so that every coroutine takes 200ms to complete and it prints the time when it is completed. Then I pasted it to play.kotlinlang.org to check:
/**
 * You can edit, run, and share this code.
 * play.kotlinlang.org
 */
import kotlinx.coroutines.*

fun main() {
    // method 1
    val fixedThreadPoolContext = newFixedThreadPoolContext(2, "Pool")
    // method 2
    val limitedParallelismContext = Dispatchers.IO.limitedParallelism(2)
    runBlocking {
        val jobs = (1..10).map {
            // swap out the dispatcher here
            launch(limitedParallelismContext) {
                println("it at ${System.currentTimeMillis()}")
                Thread.sleep(200)
            }
        }
        jobs.joinAll()
    }
}
And there using kotlin 1.6.21 the result is as expected:
it at 1652887163155
it at 1652887163157
it at 1652887163358
it at 1652887163358
it at 1652887163559
it at 1652887163559
it at 1652887163759
it at 1652887163759
it at 1652887163959
it at 1652887163959
Only 2 coroutines are executed at a time.
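One caveat worth keeping in mind: limitedParallelism limits how many coroutines execute on threads at the same moment. A coroutine that is suspended in a non-blocking call (delay, or a suspending HTTP client) does not occupy a thread, so more requests than the limit can be in flight. If the goal is to cap in-flight requests regardless of how the client is implemented, a Semaphore from kotlinx.coroutines.sync is one option. The sketch below is an illustration only; runLimited is a hypothetical helper and delay stands in for the real HTTP call.

```kotlin
import java.util.concurrent.atomic.AtomicInteger
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.sync.withPermit

// Runs `calls` simulated requests with at most `maxInFlight` active at once and
// returns the highest number of simultaneously active requests that was observed.
suspend fun runLimited(calls: Int, maxInFlight: Int): Int = coroutineScope {
    val semaphore = Semaphore(maxInFlight)
    val active = AtomicInteger(0)
    val peak = AtomicInteger(0)
    (1..calls).map { _ ->
        async {
            // Suspends (without blocking a thread) until a permit is free.
            semaphore.withPermit {
                val now = active.incrementAndGet()
                peak.updateAndGet { previous -> maxOf(previous, now) }
                delay(50) // hypothetical stand-in for a suspending HTTP call
                active.decrementAndGet()
            }
        }
    }.awaitAll()
    peak.get()
}

fun main(): Unit = runBlocking {
    println("peak concurrency: ${runLimited(calls = 10, maxInFlight = 2)}")
}
```

The permit count bounds the number of calls inside withPermit, independently of which dispatcher the coroutines run on.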

Parallel requests with coroutines

I'm trying to fetch some data from multiple locations to fill a RecyclerView. I used to use callbacks, which worked fine, but I need to refactor the code to coroutines.
I have a list of Retrofit services and call each one of them in parallel. Then I can update the RecyclerView in the onResponse callback. How can I achieve this with coroutines?
I tried something like this, but the next call is only fired after I get a response:
runBlocking {
    for (service in services) {
        val response = async(Dispatchers.IO) {
            service.getResponseAsync()
        }
        adapter.updateRecyclerView(response.await())
    }
}
With another approach I had the problem that I was not able to get back on the main thread to update my UI, as I was using launch and could not await the response:
runBlocking {
    services.forEach {
        launch(Dispatchers.IO) {
            val response = it.getResponseAsync()
        }
        withContext(Dispatchers.Main) {
            adapter.updateRecyclerView(response)
        }
    }
}
I'm thankful for every tip ;)
Cheers, Patrick
Start coroutines with launch instead of runBlocking. The examples below assume you're launching from a context that uses Dispatchers.Main by default. If that's not the case, you could use launch(Dispatchers.Main) for these.
If you want to update your view every time any of the parallel actions returns, then move your UI update inside the coroutines that you're launching for each of the service items:
for (service in services) {
    launch {
        val response = withContext(Dispatchers.IO) { service.getResponseAsync() }
        adapter.updateRecyclerView(response)
    }
}
If you only need to update once all of them have returned, you can use awaitAll. Here, your updateRecyclerView function would have to be written to handle a list of responses instead of one at a time.
launch {
    val responses = services.map { service ->
        async(Dispatchers.IO) { service.getResponseAsync() }
    }
    adapter.updateRecyclerView(responses.awaitAll())
}
The await() call suspends the current coroutine and frees the current thread to run other queued coroutines.
So when await() is called, the current coroutine suspends until the response is received, and that's why the for loop does not move on to the next iteration until the previous request has completed.
First and foremost, you should not be using runBlocking here; it is highly discouraged in a production environment.
You should instead use the ViewModel scope provided by Android for structured concurrency (it cancels the request when it is no longer needed, for example when the activity's lifecycle is over).
You can use the view model scope like this in an activity or fragment: viewModelOwner.viewModelScope.launch(/*Other dispatcher if needed*/) {}, or create a coroutine scope yourself with a job attached that is cancelled in onDestroy.
As for the requests not running in parallel: you can launch multiple requests inside the for loop without awaiting them.
Then select among them using a select expression: https://kotlinlang.org/docs/reference/coroutines/select-expression.html#selecting-deferred-values
Example:
viewModelOwner.viewModelScope.launch {
    val pending = mutableListOf<Deferred<TypeReturnedFromGetResponse>>()
    for (service in services) {
        async(Dispatchers.IO) {
            service.getResponseAsync()
        }.let(pending::add)
    }
    // Handle whichever request finishes first, as opposed to awaiting all of them before updating
    while (pending.isNotEmpty()) {
        val finished = select<Deferred<TypeReturnedFromGetResponse>> {
            for (deferred in pending) {
                deferred.onAwait { deferred }
            }
        }
        // Remove the completed request so the next select does not pick it again
        pending.remove(finished)
        adapter.updateRecyclerView(finished.await())
    }
}
PS: Using this method looks ugly but will update the adapter as soon as whichever request is first resolved, instead of awaiting for each and every request and then updating the items in it.

Kotlin runBlocking and async with return

I am taking my first steps in kotlin coroutines and I have a problem.
In order to create Foo and return it from a function I need to call two heavy service methods asynchronously to get some values for Foo creating. This is my code:
return runBlocking {
    val xAsync = async {
        service.calculateX()
    }
    val yAsync = async {
        service.calculateY()
    }
    Foo(xAsync.await(), yAsync.await())
};
However, after reading the logs it seems to me that calculateX() and calculateY() are called synchronously. Is my code correct?
Your code isn't perfect, but it is correct in terms of making calculateX() and calculateY() run concurrently. However, since it launches this concurrent work on the runBlocking dispatcher which is single-threaded, and since your heavyweight operations are blocking instead of suspending, they will not be parallelized.
The first observation to make is that blocking operations cannot gain anything from coroutines compared to the old-school approach with Java executors, apart from a bit simpler API.
The second observation is that you can at least make them run in parallel, each blocking its own thread, by using the IO dispatcher:
return runBlocking {
    val xAsync = async(Dispatchers.IO) {
        service.calculateX()
    }
    val yAsync = async(Dispatchers.IO) {
        service.calculateY()
    }
    Foo(xAsync.await(), yAsync.await())
};
Compared to using the java.util.concurrent APIs, here you benefit from the library's IO dispatcher instead of having to create your own thread pool.
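The difference between the two versions can be observed with a small timing sketch. blockingCalculation and timeTwoCalculations are hypothetical helpers, with Thread.sleep standing in for the heavy blocking service calls; the exact timings will vary by machine.

```kotlin
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.system.measureTimeMillis
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.runBlocking

// Hypothetical blocking call standing in for service.calculateX() / calculateY().
fun blockingCalculation(value: Int): Int {
    Thread.sleep(200)
    return value
}

// Runs two blocking calculations in async blocks with the given context
// and returns the elapsed wall-clock time in milliseconds.
fun timeTwoCalculations(context: CoroutineContext): Long = measureTimeMillis {
    runBlocking {
        val x = async(context) { blockingCalculation(1) }
        val y = async(context) { blockingCalculation(2) }
        x.await() + y.await()
    }
}

fun main() {
    // The async blocks inherit runBlocking's single thread: the calls run one after the other.
    println("inherited context: ${timeTwoCalculations(EmptyCoroutineContext)} ms")
    // Each call blocks its own IO thread, so the two calls overlap.
    println("Dispatchers.IO: ${timeTwoCalculations(Dispatchers.IO)} ms")
}
```

With the inherited context the total should be roughly the sum of the two sleeps, while with Dispatchers.IO it should be close to a single sleep.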