How do you know when you need to yield()? - kotlin

Take Kotlin channels for example
for (msg in channel) {
    // do stuff
    yield() // maybe?
}
How do you know if yield is required? I assume that Channels are built in a way that yielding happens automatically behind the scenes in the iterator but I'm not sure. In general, how do you know you need manual yields when using the rest of Kotlin's coroutine library that might do it for you automatically?

In most cases you should not at all need to use yield() or be concerned with it. Coroutines can switch automatically whenever we get to a suspension point, which usually happens pretty often.
yield() is needed only if our code does not suspend for a prolonged time. That usually means we are performing intensive CPU calculations. In your example, receiving from the channel is a suspending operation, so you don't need yield() here.

You only need to call yield if you want to artificially add a suspension point when you have none in a piece of code. Suspension points are calls to suspend functions.
If you don't know off the top of your head which functions are suspending, you can quickly identify them in IntelliJ IDEA, for instance, because every suspend function call is marked with an icon in the editor gutter.
So in your case you would see that icon on the iteration over the channel.
You only really need to manually add a yield if you have loops or extended pieces of code that exclusively use regular functions, or more generally if you want to ensure other coroutines have a chance to run at a particular point in time (for instance in tests). This shouldn't happen often.
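A minimal sketch of the CPU-bound case, assuming kotlinx.coroutines is on the classpath. Two coroutines share runBlocking's single thread, and the hypothetical runOrder() records which one ran when. The worker loop has no suspension points of its own, so without yield() the other coroutine only gets its turn after the loop has finished.

```kotlin
import kotlinx.coroutines.*

// Runs a loop with no suspension points of its own next to another coroutine
// on runBlocking's single thread, and records the order in which they ran.
fun runOrder(useYield: Boolean): List<String> = runBlocking {
    val log = mutableListOf<String>()
    val worker = launch {
        repeat(3) { i ->
            log.add("work $i")    // stands in for a chunk of CPU-bound work
            if (useYield) yield() // explicit suspension point: queued coroutines may run
        }
    }
    launch { log.add("other") }
    worker.join()
    log
}

fun main() {
    println(runOrder(useYield = false)) // [work 0, work 1, work 2, other]
    println(runOrder(useYield = true))  // [work 0, other, work 1, work 2]
}
```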

Related

Coroutines, understanding suspend

I'm trying to understand a passage in Hands-On Design Patterns with Kotlin, Chapter 8, Threads and Coroutines.
Why is it that when we rewrite the function as suspend, "we can serve 20 times more users, all thanks to the smart way Kotlin has rewritten our code".
fun profile(id:String):Profile {
val bio = fetchBioOverHttp(id) //takes 1s
val picture = fetchPictureFromDb(id) // takes 100ms
val friends = fetchFriendsFromDb(id) // takes 500ms
return Profile(bio, picture)
}
Basically, the relevant pages say: "if we have a thread pool of 10 threads, the first 10 requests will get into the pool and the 11th will get stuck until the first one finishes. This means we can serve three users simultaneously, and the fourth one will wait until the first one gets his/her results."
I think I understand this point. 3 threads execute the three methods in parallel, then another 3, then another 3, which gives us 9 threads actively executing code. The 10th thread executes the first fetchBioOverHttp method, and we're out of threads until thread #1 finishes its fetchBioOverHttp call.
However, how does rewriting these methods as suspend methods result in serving 20 times more users? I guess I'm not understanding the path of execution here.
To be honest, I don't like this example.
The author meant that after rewriting httpCall() as a suspend function, it doesn't wait for the result: it schedules processing in the background, registers a callback and then immediately returns. The caller thread is freed, and it can start handling another request while the first one is being processed. By using this technique, we can process multiple requests while using even a single thread.
I don't like this explanation, because it ignores how coroutines really work internally. Instead, it tries to compare them to something the reader may already be familiar with: asynchronous callback-based APIs. Normally this is good, as it helps understanding. However, in this case the problem is that in most cases coroutines internally... create a thread pool and use it to schedule blocking IO operations. Therefore, both provided solutions are pretty much the same, and the main difference is that we created a pool of 10 threads, while Dispatchers.IO by default uses up to 64 threads.
The Kotlin compiler does not cut the function into two. There is still a single function with a lot of additional code inside. I agree it can be interpreted as two functions calling each other, but this is not what the compiler does. If that wasn't explained in the book, I think this is misleading.
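A rough sketch of the effect the book is aiming at, under the assumption that the fetch functions are genuinely non-blocking (simulated here with delay(); their bodies and the three-field Profile class are hypothetical). A hundred calls to the sequential profile() run concurrently on runBlocking's single thread and finish in roughly the time of one call. If the fetches were instead blocking calls wrapped in withContext(Dispatchers.IO), each in-flight call would still occupy an IO thread, which is the thread-pool point made above.

```kotlin
import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis

// Hypothetical stand-ins for the book's functions. delay() suspends without
// blocking the thread, which is what the "more users" claim relies on.
suspend fun fetchBioOverHttp(id: String): String { delay(1000); return "bio of $id" }
suspend fun fetchPictureFromDb(id: String): String { delay(100); return "picture of $id" }
suspend fun fetchFriendsFromDb(id: String): List<String> { delay(500); return emptyList() }

// Hypothetical Profile type; the book's own definition is not shown.
data class Profile(val bio: String, val picture: String, val friends: List<String>)

// Same sequential shape as the book's function, just marked suspend.
suspend fun profile(id: String): Profile {
    val bio = fetchBioOverHttp(id)
    val picture = fetchPictureFromDb(id)
    val friends = fetchFriendsFromDb(id)
    return Profile(bio, picture, friends)
}

fun main() = runBlocking { // runBlocking's event loop uses only the calling thread
    val threads = mutableSetOf<String>()
    val elapsed = measureTimeMillis {
        val profiles = (1..100).map { i ->
            async {
                threads.add(Thread.currentThread().name)
                profile("user$i")
            }
        }.awaitAll()
        println("served ${profiles.size} users")
    }
    // Expect roughly 1.6 s in total rather than 100 x 1.6 s, on a single thread.
    println("took $elapsed ms on ${threads.size} thread(s)")
}
```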

In which cases you don't want or you shouldn't use coroutines in Kotlin?

I've read a lot about the many advantages of using coroutines, but I can't find anything about why you shouldn't or couldn't use them.
Why not use all methods as suspend methods, by the way?
I'm having some trouble understanding some concepts here, so with my question I'm trying to make the opposite case (why not use them), so I can understand better by contrast.
The main reason not to have all functions suspendable is the overhead they introduce, at least on the JVM. Every suspendable function compiles into a Java method that receives another parameter, the continuation object, and its body compiles into pretty complex state machine code that, among other things, always instantiates another continuation object and daisy-chains it to the one received as the parameter.
So, whenever you have nothing to gain from coroutines, you shouldn't use them as the default way to do things.
Please see my answers inline to your questions:
but I find nothing about why you shouldn't or couldn't use them.
Answer:
a. You should not use them for any foreground task.
b. You should not use them for any simple/real quick operations.
c. You should not use them for any kind of initialization.
Why not use all methods as suspend methods, by the way?
Answer:
a) This would be treated as a code smell; it is bad practice.
b) If you mark all functions as suspend, then every caller must itself be a suspend function or run inside a coroutine, so at the top of each call chain you will have to start one from a CoroutineScope (or with runBlocking).
c) Testing suspend functions is difficult. It needs some additional setup, such as runBlockingTest (now replaced by runTest) from kotlinx-coroutines-test.
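To make the call-site cost concrete, here is a small sketch with two hypothetical functions doing the same trivial work. The suspend variant compiles to a JVM method with an extra Continuation parameter and can only be called from a coroutine or another suspend function, so a regular caller has to go through a builder such as runBlocking.

```kotlin
import kotlinx.coroutines.runBlocking

// Hypothetical functions doing the same trivial work.
fun add(a: Int, b: Int): Int = a + b

// Marked suspend for no reason: on the JVM it gets an extra Continuation
// parameter, and only coroutines or other suspend functions may call it.
suspend fun addSuspending(a: Int, b: Int): Int = a + b

fun main() {
    println(add(1, 2))                           // 3: callable from anywhere
    // println(addSuspending(1, 2))              // would not compile here
    println(runBlocking { addSuspending(1, 2) }) // 3: needs a coroutine builder
}
```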

How to understand coroutine cancellation is cooperative

In Kotlin, coroutine cancellation is cooperative. How should I understand it?
If you have a Java background, you may be familiar with the thread interruption mechanism. Any thread can call thread.interrupt(), and the receiving thread will get a signal in the form of a Boolean isInterrupted flag becoming true. The receiving thread may check the flag at any time with Thread.currentThread().isInterrupted(), or it may ignore it completely. That's why this mechanism is said to be cooperative.
Kotlin's coroutine cancellation mechanism closely mirrors this: you have a coroutineContext.isActive flag that you (or a function you call) may check.
In both cases some well-known functions, for example Thread.sleep() in Java and delay() in Kotlin, check this flag and throw an InterruptedException and CancellationException, respectively. These methods/functions are said to be "interruptible" / "cancellable".
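A minimal sketch of cooperative cancellation, assuming kotlinx.coroutines. The busy loop below has no suspension points, so it stops only because it polls isActive, much like a Java thread polling Thread.currentThread().isInterrupted(). The name cancelBusyLoop() is hypothetical.

```kotlin
import kotlinx.coroutines.*

// The loop is cancellable only because it checks isActive itself; with
// `while (true)` the child would never complete and cancelAndJoin() would hang.
fun cancelBusyLoop(): Boolean = runBlocking {
    val job = launch(Dispatchers.Default) {
        var iterations = 0L
        while (isActive) {   // cooperative check
            iterations++     // stands in for CPU-bound work
        }
        // ensureActive() is an alternative that throws CancellationException instead.
        println("stopped after $iterations iterations")
    }
    delay(100)               // delay() itself is cancellable
    job.cancelAndJoin()
    job.isCancelled
}

fun main() {
    println("cancelled: ${cancelBusyLoop()}") // cancelled: true
}
```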
I'm not 100% sure whether I understand your question, but maybe this helps:
Coroutines are usually executed on the same thread you start them from. You can use different dispatchers, but they are designed to work when started from the same thread. There's no extra scheduling happening.
You can compare this with scheduling mechanisms in an OS: coroutines behave similarly to cooperative scheduling. You find similar concepts in many frameworks and languages that deal with async operations; Ruby, for example, has fibers, which behave similarly.
Basically, this means that if a coroutine is hogging your CPU in a busy loop, you cannot cancel it (unless you kill the whole process). Instead, your coroutine has to regularly check for cancellation and also add waits/delays/yields so that other coroutines can work.
This also determines when coroutines are most helpful: when running in a single-threaded context, it doesn't help to use coroutines for local-only calculations. I used them mostly for processing async calls like interactions with databases or web servers.
This article also has some explanations on how coroutines work - maybe it helps you with any additional questions: https://antonioleiva.com/coroutines/

How many coroutines is too many?

I need to speed up a search over some collection with millions of elements.
Search predicate needs to be passed as argument.
I have been wondering whether the simplest solution (at least for now) wouldn't be just using coroutines for the task.
The question I am facing right now is how many coroutines I can actually create at once. :D As a side note, there might be more than one such search running concurrently.
Can I make millions of coroutines (one for every item) for every such search? Should I decide on some workload per coroutine (for example, 1000 items per coroutine)? Should I also decide on some cap on the number of coroutines?
I have a rough understanding of coroutines and how they actually work; however, I have no idea what the performance limitations of this feature are.
Thanks!
The memory weight of a coroutine scales with the depth of the call trace from the coroutine builder block to the suspension point. Each suspend fun call adds another Continuation object to a linked list and this is retained while the coroutine is suspended. A rough figure for one Continuation instance is 100 bytes.
So, if you have a call trace depth of, say, 5, that amounts to 500 bytes per item. A million items is 500 MB.
However, unless your search code involves blocking operations that would leave a thread idle, you aren't gaining anything from coroutines. Your task looks more like an instance of data parallelism, and you can solve it very efficiently using the java.util.stream API (as noted by user marstran in the comment).
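A sketch of the data-parallel approach using java.util.stream from Kotlin on the JVM. parallelSearch() is a hypothetical helper; the predicate is passed as an argument as in the question, and the parallel stream splits the list across the common ForkJoinPool.

```kotlin
import java.util.stream.Collectors

// Hypothetical helper: filter in parallel; an ordered source such as a List
// keeps its encounter order in the collected result.
fun <T> parallelSearch(items: List<T>, predicate: (T) -> Boolean): List<T> =
    items.parallelStream()
        .filter { predicate(it) }
        .collect(Collectors.toList())

fun main() {
    val numbers = (1..1_000_000).toList()
    val hits = parallelSearch(numbers) { it % 250_000 == 0 }
    println(hits) // [250000, 500000, 750000, 1000000]
}
```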
According to the Kotlin coroutines starter guide, the example there launches 100K coroutines. I believe what you intend to do is exactly what Kotlin coroutines are designed for.
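If you do want coroutines, one option that follows the question's "1000 items per coroutine" idea is to chunk the list rather than launch one coroutine per element. A sketch, assuming kotlinx.coroutines; chunkedSearch() and its default chunk size are illustrative choices, not a tuned recommendation. Dispatchers.Default runs on a pool sized to the number of CPU cores, so the chunk size mainly controls how finely the work is split.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: one coroutine per chunk instead of one per element.
// awaitAll() returns results in list order, so the hits keep the original order.
suspend fun <T> chunkedSearch(
    items: List<T>,
    chunkSize: Int = 1_000,
    predicate: (T) -> Boolean
): List<T> = coroutineScope {
    items.chunked(chunkSize)
        .map { chunk -> async(Dispatchers.Default) { chunk.filter(predicate) } }
        .awaitAll()
        .flatten()
}

fun main() = runBlocking {
    val numbers = (1..1_000_000).toList()
    // 1_000_000 / 1_000 = 1_000 coroutines sharing Dispatchers.Default's threads
    val hits = chunkedSearch(numbers) { it % 250_000 == 0 }
    println(hits) // [250000, 500000, 750000, 1000000]
}
```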
If you will not make many modifications to your collection, then just store it in a HashMap; otherwise, store it in a TreeMap. Then just search for items there. I believe the search methods implemented there are optimized enough to handle a million items in a blink. I would not use coroutines in this case.
Documentation (for Kotlin):
HashMap: https://developer.android.com/reference/kotlin/java/util/HashMap
TreeMap: https://developer.android.com/reference/kotlin/java/util/TreeMap

Why are CoroutineScope.launch and CoroutineScope.async extension functions instead of member functions of CoroutineScope?

The title states my question.
What exactly is the reason why CoroutineScope.launch and CoroutineScope.async are just extension functions of CoroutineScope instead of member functions?
What benefits does it provide?
I am asking because maybe the reason behind this design could be helpful in designing things in the future too.
Thanks in advance.
Mostly because with extension functions it is easier to structure your code in multiple modules even if it is represented as one class.
CoroutineScope is actually a really good example of this design pattern. Take a look at CoroutineScope.kt, where the interface is declared. There is only basic functionality there (the plus operator and cancel()).
The two functions you mentioned are defined in Builders.common.kt. If you take a look at the contents of this file, you can see that there are multiple classes which are private; this means they can only be used in this file. This tells you right away that you don't need these classes for the basic functionality designed in CoroutineScope.kt; they are only there for launch {...} and async {...}.
So if you have a large class with multiple functionality, it makes sense to break it up in multiple files (=modules).
launch and async are coroutine builders, but they aren't the only ones: look in integration modules for future (and another future), publish, the RxJava 2 builders etc. Obviously those can't be members of CoroutineScope, so why should launch and async be?
In addition, by being extension functions you know they don't rely on any CoroutineScope privates (well, they could rely on internals since they are in the same module).
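To illustrate that builders can live outside CoroutineScope, here is a sketch of a hypothetical third-party builder, launchTagged(), written as an extension just like launch and async. It relies only on public API (launch and CoroutineName), and because it takes the scope as its receiver, the new coroutine becomes a child of that scope.

```kotlin
import kotlinx.coroutines.*

// Hypothetical builder defined outside kotlinx.coroutines; it needs nothing
// private from CoroutineScope, only the public launch() builder.
fun CoroutineScope.launchTagged(
    tag: String,
    block: suspend CoroutineScope.() -> Unit
): Job = launch(CoroutineName(tag), block = block)

fun main() = runBlocking {
    var seen: String? = null
    launchTagged("worker") {
        seen = coroutineContext[CoroutineName]?.name
    }.join()
    println(seen) // worker
}
```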
kotlinx.coroutines uses the structured concurrency approach to make sure all errors are propagated to a parent coroutine. Similarly, a parent coroutine will by default wait for all its child coroutines to complete.
There is a Job object associated with every coroutine you start with launch or async. Using extension functions makes it easier for this design to work implicitly, without explicit attention from the code writer.
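A small sketch of the default parent/child behaviour, assuming kotlinx.coroutines: the body of runBlocking finishes first, but runBlocking itself does not return until the child started with launch completes, even though nobody calls join(). The function name childOrder() is hypothetical.

```kotlin
import kotlinx.coroutines.*

// launch is called on runBlocking's scope, so the new coroutine is its child
// and runBlocking waits for it before returning.
fun childOrder(): List<String> {
    val log = mutableListOf<String>()
    runBlocking {
        launch {
            delay(50)
            log.add("child done")
        }
        log.add("parent body done")
    }
    log.add("runBlocking returned")
    return log
}

fun main() {
    println(childOrder()) // [parent body done, child done, runBlocking returned]
}
```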
You may have a look at a more detailed explanation:
https://kotlinlang.org/docs/reference/coroutines/basics.html#structured-concurrency
https://medium.com/@elizarov/structured-concurrency-722d765aa952