What is the difference between limitedParallelism and a fixed thread pool dispatcher? - kotlin

I am trying to use Kotlin coroutines to perform multiple HTTP calls concurrently, rather than one at a time, but I would like to avoid making all of the calls concurrently, to avoid rate limiting by the external API.
If I simply launch a coroutine for each request, they are all sent almost instantly. So I looked into the limitedParallelism function, which sounds very close to what I need and which some Stack Overflow answers suggest is the recommended solution. Older answers to the same question suggested using newFixedThreadPoolContext.
The documentation for that function mentions limitedParallelism as a preferred alternative "if you do not need a separate thread pool":
If you do not need a separate thread-pool, but only have to limit effective parallelism of the dispatcher, it is recommended to use CoroutineDispatcher.limitedParallelism instead.
However, when I write my code to use limitedParallelism, it does not reduce the number of concurrent calls, compared to newFixedThreadPoolContext which does.
In the example below, I replace my network calls with Thread.sleep, which does not change the behavior.
// method 1
val fixedThreadPoolContext = newFixedThreadPoolContext(2, "fixed")
// method 2
val limitedParallelismContext = Dispatchers.IO.limitedParallelism(2)
runBlocking {
    val jobs = (1..1000).map {
        // swap out the dispatcher here
        launch(limitedParallelismContext) {
            println("started $it")
            Thread.sleep(1000)
            println(" finished $it")
        }
    }
    jobs.joinAll()
}
The behavior for fixedThreadPoolContext is as expected: no more than 2 of the coroutines run at a time, and the total time to finish is several minutes (1000 coroutines at one second each, two at a time, roughly 500 seconds).
However, for limitedParallelismContext, all "started #" lines print immediately, and one second later, all "finished #" lines print and the program completes in just over 1 total second.
Why does limitedParallelism not have the same effect as using a separate thread pool? What does it accomplish?

I modified your code slightly so that every coroutine takes 200 ms to complete and prints the time when it starts. Then I pasted it into play.kotlinlang.org to check:
/**
 * You can edit, run, and share this code.
 * play.kotlinlang.org
 */
import kotlinx.coroutines.*

fun main() {
    // method 1
    val fixedThreadPoolContext = newFixedThreadPoolContext(2, "Pool")
    // method 2
    val limitedParallelismContext = Dispatchers.IO.limitedParallelism(2)
    runBlocking {
        val jobs = (1..10).map {
            // swap out the dispatcher here
            launch(limitedParallelismContext) {
                println("it at ${System.currentTimeMillis()}")
                Thread.sleep(200)
            }
        }
        jobs.joinAll()
    }
}
There, using Kotlin 1.6.21, the result is as expected:
it at 1652887163155
it at 1652887163157
it at 1652887163358
it at 1652887163358
it at 1652887163559
it at 1652887163559
it at 1652887163759
it at 1652887163759
it at 1652887163959
it at 1652887163959
Only 2 coroutines are executed at a time.
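To make the difference visible, here is a minimal sketch (my own addition, not from the original answer) that also prints the thread name for each task. With limitedParallelism the tasks run on threads borrowed from the shared Dispatchers.IO pool, while newFixedThreadPoolContext creates two dedicated threads of its own; in both cases at most two tasks run at once:
import kotlinx.coroutines.*

fun main() = runBlocking {
    val limited = Dispatchers.IO.limitedParallelism(2)
    val pool = newFixedThreadPoolContext(2, "Pool")
    val jobs = (1..6).map { i ->
        launch(limited) { // swap in `pool` to compare the thread names
            println("task $i on ${Thread.currentThread().name}")
            Thread.sleep(500) // blocking work keeps the thread busy
        }
    }
    jobs.joinAll()
    pool.close() // a dedicated pool must be closed to release its threads
}
So limitedParallelism does limit how many coroutines run at once; it just does so without paying for extra threads, whereas a dedicated pool isolates the work from everything else sharing Dispatchers.IO.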

launch long-running task then immediately send HTTP response

Using a Ktor HTTP server, I would like to launch a long-running task and immediately return a message to the calling client. The task is self-sufficient: it's capable of updating its status in a db, and a separate HTTP call returns its status (e.g. for a progress bar).
What I cannot seem to do is just launch the task in the background and respond. All my attempts at responding wait for the long-running task to complete. I have experimented with many configurations of runBlocking and coroutineScope but none are working for me.
// ktor route
get("/launchlongtask") {
    val text: String = myFunction(call.request.queryParameters["loops"]!!.toInt())
    println("myFunction returned")
    call.respondText(text)
}

// in reality, this function is complex... the caller (route) is not able to
// determine the response string, it must be done here
suspend fun myFunction(loops: Int): String {
    runBlocking {
        launch {
            // long-running task, I want to launch it and move on
            (1..loops).forEach {
                println("this is loop $it")
                delay(2000L)
                // updates status in db here
            }
        }
        println("returning")
        // this string must be calculated in this function (or a sub-function)
        return@runBlocking "we just launched $loops loops"
    }
    return "never get here" // actually we do get here in a coroutineScope
}
output:
returning
this is loop 1
this is loop 2
this is loop 3
this is loop 4
myFunction returned
expected:
returning
myFunction returned
(response sent)
this is loop 1
this is loop 2
this is loop 3
this is loop 4
Just to explain the issue with the code in your question, the problem is using runBlocking. This is meant as the bridge between the synchronous world and the async world of coroutines and
"the name of runBlocking means that the thread that runs it ... gets blocked for the duration of the call, until all the coroutines inside runBlocking { ... } complete their execution."
(from the Coroutine docs).
So in your first example, myFunction won't complete until the coroutine containing the loop completes.
The correct approach is what you do in your answer, using CoroutineScope to launch your long-running task. One thing to point out is that you are just passing a Job() as the CoroutineContext parameter to the CoroutineScope constructor. The CoroutineContext contains multiple things: Job, CoroutineDispatcher, CoroutineExceptionHandler... In this case, because you don't specify a CoroutineDispatcher, it will use Dispatchers.Default. This is intended for CPU-intensive tasks and is limited to "the number of CPU cores (with a minimum of 2)". This may or may not be what you want. An alternative is Dispatchers.IO, which defaults to a limit of 64 threads.
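As a small sketch of that alternative (my illustration, not part of the original answer; launchLongTask is a hypothetical name), the dispatcher can be combined with the Job when building the scope, since CoroutineContext elements compose with +:
import kotlinx.coroutines.*

// A scope whose coroutines run on the IO dispatcher.
val backgroundScope = CoroutineScope(Job() + Dispatchers.IO)

fun launchLongTask(loops: Int) {
    backgroundScope.launch {
        (1..loops).forEach {
            println("this is loop $it")
            delay(2000L)
            // updates status in db here
        }
    }
}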
Inspired by this answer by Lucas Milotich, I utilized CoroutineScope(Job()) and it seems to work:
suspend fun myFunction(loops: Int): String {
    CoroutineScope(Job()).launch {
        // long-running task, I want to launch it and move on
        (1..loops).forEach {
            println("this is loop $it")
            delay(2000L)
            // updates status in db here
        }
    }
    println("returning")
    return "we just launched $loops loops"
}
Not sure if this is resource-efficient or the preferred way to go, but I don't see a whole lot of other documentation on the topic.
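One further option, based on my own assumption about the Ktor setup rather than anything in the answers above: recent Ktor versions make the Application itself a CoroutineScope, so the task can be launched in that scope directly from the route handler, tying its lifetime to the server instead of an ad-hoc CoroutineScope(Job()). A rough sketch of just the route:
// Assumes kotlinx.coroutines.launch and kotlinx.coroutines.delay are imported.
get("/launchlongtask") {
    val loops = call.request.queryParameters["loops"]!!.toInt()
    // Application implements CoroutineScope, so this coroutine outlives the request handler
    call.application.launch {
        (1..loops).forEach {
            println("this is loop $it")
            delay(2000L)
            // updates status in db here
        }
    }
    call.respondText("we just launched $loops loops")
}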

Kotlin coroutine delay waits for coroutine completion / fire and forget

I have the following code.
fun main(args: Array<String>) {
    runBlocking {
        while (true) {
            launch {
                println("Taking measurement")
                val begin = System.currentTimeMillis()
                while (System.currentTimeMillis() - begin < 20000) {
                    val test = 5 + 5
                }
                println("Took measurement")
            }
            println("After launch")
            delay(1000)
            println("After delay")
        }
    }
}
I would expect the output of that to be many "After launch"s and "After delay"s in the first 20 seconds. So the output should look something like this:
After launch
Taking measurement
After delay
After launch
Taking measurement
After delay
After launch
Taking measurement
After delay
After launch
Taking measurement
After delay
After launch
Taking measurement
... and then after 20s the first took measurements
But instead it looks like this, and I cannot wrap my head around why it is not working.
After launch
Taking measurement
Took measurement
After delay
After launch
Taking measurement
What I basically want to achieve is fire and forget. I want to start the code in launch and then, 1 second later, start it again... The result of that code is saved by the code itself, so I do not need any data back.
Any tips on why this is not working?
This is because the default dispatcher of runBlocking is a single-threaded dispatcher that uses the same thread that runBlocking was called on. Since it is single-threaded, it cannot run the blocking portions of your coroutine concurrently. If you use runBlocking(Dispatchers.Default) or launch(Dispatchers.Default) to avoid running in the single-threaded dispatcher, it will behave as you had expected.
However, the only reason you ran into this surprise in the first place is that you are trying to do blocking work directly in a coroutine, which you're not supposed to do. You should not do blocking work in a coroutine without wrapping it in withContext or a suspend function that uses withContext. If you replace your blocking loop in the coroutine with a delay(1000) suspend function call to simulate doing work in a non-blocking way, it will behave as you had expected.
When you do blocking work in a coroutine, you should either wrap it in a withContext call like this:
withContext(Dispatchers.Default) {
    while (System.currentTimeMillis() - begin < 20000) {
        val test = 5 + 5
    }
}
or break it out into a suspend function that uses withContext to perform the blocking work on an appropriate dispatcher:
suspend fun doLongCalculation() = withContext(Dispatchers.Default) {
    val begin = System.currentTimeMillis()
    while (System.currentTimeMillis() - begin < 20000) {
        val test = 5 + 5
    }
}
This is why replacing your blocking work with a delay suspend function call is an adequate simulation of doing long-running work. delay and the doLongCalculation() above are both proper suspend functions that do not block their caller or expect their caller to be using a specific dispatcher to function correctly.
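For example, here is a minimal sketch (my own) of the original loop using the doLongCalculation() defined above; the bounded repeat(3) stands in for while (true) only to keep the example finite:
import kotlinx.coroutines.*

// Assumes the doLongCalculation() suspend function from the previous snippet.
fun main() = runBlocking {
    repeat(3) {
        launch {
            println("Taking measurement")
            doLongCalculation() // suspends; runBlocking's thread stays free for other coroutines
            println("Took measurement")
        }
        println("After launch")
        delay(1000)
        println("After delay")
    }
}
With this version the "After launch"/"After delay" pairs keep printing every second while the measurements run on Dispatchers.Default, which is the behaviour the question expected.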

emitting flow values asynchronously with kotlins flow

I am building a simple Spring service with Kotlin and WebFlux.
I have an endpoint which returns a Flow. The flow contains elements which take a long time to compute, simulated here by a delay.
It is constructed like this:
suspend fun latest(): Flow<Message> {
    println("generating messages")
    return flow {
        for (i in 0..20) {
            println("generating $i")
            if (i % 2 == 0) delay(1000)
            else delay(200)
            println("generated messsage $i")
            emit(generateMessage(i))
        }
        println("messages generated")
    }
}
My expectation was that it would return Message1 followed by Message3, Message5... and then Message0 because of the different delays the individual generation takes.
But in reality the flow contains the elements in order.
I guess I am missing something important about coroutines and Flow. I tried different things to achieve what I want with coroutines, but I can't figure it out.
Solution
As pointed out by Marko Topolnik and William Reed, using channelFlow works.
fun latest(): Flow<Message> {
    println("generating numbers")
    return channelFlow {
        for (i in 0..20) {
            launch {
                send(generateMessage(i))
            }
        }
    }
}
suspend fun generateMessage(i: Int): Message {
    println("generating $i")
    val time = measureTimeMillis {
        if (i % 2 == 0) delay(1000)
        else delay(500)
    }
    println("generated messsage $i in ${time}ms")
    return Message(UUID.randomUUID(), "This is Message $i")
}
When run, the results are as expected:
generating numbers
generating 2
generating 0
generating 1
generating 6
...
generated messsage 5 in 501ms
generated messsage 9 in 501ms
generated messsage 13 in 501ms
generated messsage 15 in 505ms
generated messsage 4 in 1004ms
...
Once you go concurrent with the computation of each element, your first problem will be to figure out when all the computation is done.
You have to know in advance how many items to expect. So it seems natural to me to construct a plain List<Deferred<Message>> and then await on all the deferreds before returning the entire thing. You aren't getting any mileage from the flow in your case, since flow is all about doing things synchronously, inside the flow collection.
You can also use channelFlow in combination with a known count of messages to expect, and then terminate the flow based on that. The advantage would be that Spring can start collecting the flow earlier.
EDIT
Actually, the problem of the count isn't present: the flow will automatically wait for all the child coroutines you launched to complete.
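For reference, a rough sketch of the List<Deferred<Message>> idea described above (my own illustration, reusing generateMessage and Message from the question; latestAsList is a hypothetical name):
import kotlinx.coroutines.*

suspend fun latestAsList(): List<Message> = coroutineScope {
    (0..20).map { i ->
        async { generateMessage(i) } // start every computation concurrently
    }.awaitAll()                     // completes when the slowest one finishes
}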
Your current approach uses a single coroutine for the entire function, including the for loop. That means that any call to a suspend fun, e.g. delay, will suspend that entire coroutine until it completes. It does free up the thread to go do other stuff, but the current coroutine makes no progress in the meantime.
It's hard to say what the right solution is based on your simplified example. If you truly did want a new coroutine for each for loop, you could launch it there, but it doesn't seem clear that is the right solution from the information given.

Kotlin wrap sequential IO calls as a Sequence

I need to process all of the results from a paged API endpoint. I'd like to present all of the results as a sequence.
I've come up with the following (slightly pseudo-coded):
suspend fun getAllRowsFromAPI(client: Client): Sequence<Row> {
    var currentRequest: Request? = client.requestForNextPage()
    return withContext(Dispatchers.IO) {
        sequence {
            while (currentRequest != null) {
                var rowsInPage = runBlocking { client.makeRequest(currentRequest) }
                currentRequest = client.requestForNextPage()
                yieldAll(rowsInPage)
            }
        }
    }
}
This works, but I'm not sure about a couple of things:
Is the API request happening inside runBlocking still happening with the IO dispatcher?
Is there a way to refactor the code to launch the next request before yielding the current results, then awaiting on it later?
Question 1: The API request will still run on the IO dispatcher, but it will block the thread it's running on. This means that no other tasks can be scheduled on that thread while waiting for the request to finish. There's not really any reason to use runBlocking in production code at all, because:
If makeRequest is already a blocking call, then runBlocking will do practically nothing.
If makeRequest was a suspending call, then runBlocking would make the code less efficient. It wouldn't yield the thread back to the pool while waiting for the request to finish.
Whether makeRequest is a blocking or non-blocking call depends on the client you're using. Here's a non-blocking http-client I can recommend: https://ktor.io/clients/
Question 2: I would use a Flow for this purpose. You can think of it as a suspendable variant of Sequence. Flows are cold, which means a flow won't run before the consumer asks for its contents (as opposed to being hot, where the producer pushes new values whether the consumer wants them or not). A Kotlin Flow has an operator called buffer which you can use to make it request more pages before it has fully consumed the previous page.
The code could look quite similar to what you already have:
suspend fun getAllRowsFromAPI(client: Client): Flow<Row> = flow {
    var currentRequest: Request? = client.requestForNextPage()
    while (currentRequest != null) {
        val rowsInPage = client.makeRequest(currentRequest)
        emitAll(rowsInPage.asFlow())
        currentRequest = client.requestForNextPage()
    }
}.flowOn(Dispatchers.IO)
    .buffer(capacity = 1)
The capacity of 1 means that it will only make 1 more request while processing an earlier page. You could increase the buffer size to make more concurrent requests.
You should check out this talk from KotlinConf 2019 to learn more about flows: https://www.youtube.com/watch?v=tYcqn48SMT8
Sequences are definitely not the thing you want to use in this case, because they are not designed to work in asynchronous environment. Perhaps you should take a look at flows and channels, but for your case the best and simplest choice is just a collection of deferred values, because you want to process all requests at once (flows and channels process them one-by-one, maybe with limited buffer size).
The following approach allows you to start all requests asynchronously (assuming that makeRequest is a suspending function and supports asynchronous requests). When you need your results, you only have to wait for the slowest request to finish.
fun getClientRequests(client: Client): List<Request> {
    val requests = ArrayList<Request>()
    var currentRequest: Request? = client.requestForNextPage()
    while (currentRequest != null) {
        requests += currentRequest
        currentRequest = client.requestForNextPage()
    }
    return requests
}

// This function is not even suspended, so it finishes almost immediately
fun getAllRowsFromAPI(client: Client): List<Deferred<Page>> =
    getClientRequests(client).map {
        /*
         * The better practice would be making getAllRowsFromApi an extension function
         * to CoroutineScope and calling the receiver scope's async function.
         * GlobalScope is used here just for simplicity.
         */
        GlobalScope.async(Dispatchers.IO) { client.makeRequest(it) }
    }
fun main() {
    val client = Client()
    val deferredPages = getAllRowsFromAPI(client) // This line executes fast
    // Here you can do whatever you want, all requests are processed in the background
    Thread.sleep(999L)
    // Then, when we need the results...
    val pages = runBlocking {
        deferredPages.map { it.await() }
    }
    println(pages)
    // In your case you also want to "unpack" the pages and get the rows, you can do it here:
    val rows = pages.flatMap { it.getRows() }
    println(rows)
}
I happened across suspendingSequence in Kotlin's coroutines-examples:
https://github.com/Kotlin/coroutines-examples/blob/090469080a974b962f5debfab901954a58a6e46a/examples/suspendingSequence/suspendingSequence.kt
This is exactly what I was looking for.

How to create these coroutines in Kotlin?

I am trying to print "Hello in between World 4" with this code.
import kotlinx.coroutines.*
import java.util.*

fun main() = runBlocking<Unit> {
    launch { // COR-1
        val now = Date().time
        val c = timeConsumingFunct(now) // may be some IO work
        print(" World $c") // once done, let's process it
    }
    launch { // COR-2
        print(" in between")
    }
    print("Hello ")
}

suspend fun timeConsumingFunct(now: Long): Int {
    while (Date().time - now < 4000) {
        // a way to block the thread
    }
    return 4
}
My understanding is that when the main thread executes, it will reach COR-1, move on to COR-2, and then reach the final print("Hello"), since COR-1 will take some time (~4s) and the time-consuming function is marked suspend (which indicates that it may take some time).
Control should have gone to COR-2.
However, it prints: Hello World 4 in between
Why does this happen, and how do I make my code print Hello in between World 4 as intended, with the help of coroutines?
You need this minimal change to your spin-waiting loop:
while (Date().time - now < 4000) {
    yield()
}
The concurrency in coroutines is cooperative: suspension happens when the code explicitly asks for it. Normally this happens transparently because you call a suspendable function, but in your case you must add it yourself since you aren't otherwise calling any suspendable functions.
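For completeness, a minimal sketch (mine, not part of the original answer) of the whole program with that one change applied; it prints Hello in between World 4:
import kotlinx.coroutines.*
import java.util.*

fun main() = runBlocking<Unit> {
    launch { // COR-1
        val now = Date().time
        print(" World ${timeConsumingFunct(now)}")
    }
    launch { // COR-2
        print(" in between")
    }
    print("Hello ")
}

suspend fun timeConsumingFunct(now: Long): Int {
    while (Date().time - now < 4000) {
        yield() // lets COR-2 (and anything else on this thread) run while we wait
    }
    return 4
}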
What if I had some image processing to do in that time-consuming code? It would block the main thread. I want it to be handled by a coroutine.
Image processing is a CPU-bound task and, since it's not a task that must be pinned to the GUI thread, you are better off handing it over to a thread pool. The most convenient way to do it is to write
withContext(Dispatchers.Default) {
    processImage(img)
}
This way your coroutine, that otherwise runs on the main GUI thread, will temporarily jump over to the thread pool to do the long task, and then move back to the GUI thread to go on. While it's running on the thread pool, the GUI thread will be able to serve other GUI events and keep your UI live.
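Applied to the function from the question, a rough sketch (processImage here is a hypothetical stand-in for whatever CPU-bound work you actually have):
import kotlinx.coroutines.*

// Hypothetical stand-in for CPU-bound work such as image processing.
fun processImage(img: ByteArray): Int = img.count { it.toInt() != 0 }

suspend fun timeConsumingFunct(img: ByteArray): Int =
    withContext(Dispatchers.Default) {
        // runs on the shared Default pool; the caller's (GUI) thread stays responsive
        processImage(img)
    }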