Kotlin windowed function to coroutine and execute all at once - kotlin

How to execute batches all at once in a parallel way.
I have the following code:
val batches = list
.windowed(100, 100, true)
.map { Process(it) }
Is there a way to parallel execute all patches each in a coroutine at once?
Is there a better way than
withContext(Dispatchers.IO) {
batches.map { launch{}}.joinAll()
}

Your code sample is exactly how it should be done - simply use map() or foreach() with async() or launch().
Note that withContext() automatically waits for all children, so if you don't need any result from each batch (it doesn't seem so), you don't need to use joinAll() and map():
withContext(Dispatchers.IO) {
batches.foreach { launch{} }
}

Related

Difference between GlobalScope and runBlocking when waiting for multiple async

I have a Kotlin Backend/server API using Ktor, and inside a certain endpoint's service logic I need to concurrently get details for a list of ids and then return it all to the client with the 200 response.
The way I wanted to do it is by using async{} and awaitAll()
However, I can't understand whether I should use runBlocking or GlobalScope.
What is really the difference here?
fun getDetails(): List<Detail> {
val fetched: MutableList<Details> = mutableListOf()
GlobalScope.launch { --> Option 1
runBlocking { ---> Option 2
Dispatchers.IO --> Option 3 (or any other dispatcher ..)
myIds.map { id ->
async {
val providerDetails = getDetails(id)
fetched += providerDetails
}
}.awaitAll()
}
return fetched
}
launch starts a coroutine that runs in parallel with your current code, so fetched would still be empty by the time your getDetails() function returns. The coroutine will continue running and mutating the List that you have passed out of the function while the code that retrieved the list already has the reference back and will be using it, so there's a pretty good chance of triggering a ConcurrentModificationException. Basically, this is not a viable solution at all.
runBlocking runs a coroutine while blocking the thread that called it. The coroutine will be completely finished before the return fetched line, so this will work if you are OK with blocking the calling thread.
Specifying a Dispatcher isn't an alternative to launch or runBlocking. It is an argument that you can add to either to determine the thread pool used for the coroutine and its children. Since you are doing IO and parallel work, you should probably be using runBlocking(Dispatchers.IO).
Your code can be simplified to avoid the extra, unnecessary mutable list:
fun getDetails(): List<Detail> = runBlocking(Dispatchers.IO) {
myIds.map { id ->
async {
getDetails(id)
}
}.awaitAll()
}
Note that this function will rethrow any exceptions thrown by getDetails().
If your project uses coroutines more generally, you probably have higher level coroutines running, in which case this should probably be a suspend function (non-blocking) instead:
suspend fun getDetails(): List<Detail> = withContext(Dispatchers.IO) {
myIds.map { id ->
async {
getDetails(id)
}
}.awaitAll()
}

What's the proper way of returning a result out of a IO coroutine job?

The problem is very simple, but I can't really seem to wrap my head around it. I'm launching a non-blocking thread in the IO scope in order to read from a file. However, I can't get the result in time before I return from the method - it always returns the initial empty value "". What am I missing here?
private fun getFileContents(): String {
var result = ""
val fileName = getFilename()
val job = CoroutineScope(Dispatchers.IO).launch {
kotlin.runCatching {
val file = getFile(fileName)
file.openFileInput().use { inputStream ->
result = String(inputStream.readBytes(), Charsets.UTF_8)
}
}
}
return result
}
Coroutines are launched asynchronously. Your non-suspending function cannot wait for the result without blocking. For more information about why asynchronous code results in your function returning with the default result, read the answers here.
getFileContents() has to be a suspend function to be able to return something without blocking, in which case you don't need to launch a coroutine either. But then whatever calls this function must be in a suspend function or coroutine.
private suspend fun getFileContents(): String = withContext(Dispatchers.IO) {
val fileName = getFilename()
kotlin.runCatching {
val file = getFile(fileName)
file.openFileInput().use { inputStream ->
result = String(inputStream.readBytes(), Charsets.UTF_8)
}
}.getOrDefault("")
}
There are two "worlds" of code: either you are in a suspending/coroutine context or you are not. When you are in a function that is not a suspend function, you can only return results that can be computed immediately, or you can block until the result is ready.
Generally, if you're using coroutines, you launch a coroutine at some high level in your code, and then you are free to use suspend functions everywhere because almost all of your code is initially triggered by a coroutine. By "high level", I mean you launch the coroutine when a UI screen appears or a UI button is pressed, for example.
Basically, your coroutine launches are usually in UI listeners and UI event functions, not in lower-level code like the function in your question. The coroutine calls a suspend function, which can call other suspend functions, so you don't need to launch more coroutines to perform your various sequential tasks.
The alternate solution is to return a Deferred with the result, like this:
private fun getFileContents(): Deferred<String> {
val fileName = getFilename()
return CoroutineScope(Dispatchers.IO).async {
kotlin.runCatching {
val file = getFile(fileName)
file.openFileInput().use { inputStream ->
result = String(inputStream.readBytes(), Charsets.UTF_8)
}
}.getOrDefault("")
}
}
But to unpack the result, you will need to call await() on the Deferred instance inside a coroutine somewhere.

Compare to sets of files with coroutines in Kotlin

I have written a function that scans files (pictures) from two Lists and check if a file is in both lists.
The code below is working as expected, but for large sets it takes some time. So I tried to do this in parallel with coroutines. But in sets of 100 sample files the programm was always slower than without coroutines.
The code:
private fun doJob() {
val filesToCompare = File("C:\\Users\\Tobias\\Desktop\\Test").walk().filter { it.isFile }.toList()
val allFiles = File("\\\\myserver\\Photos\\photo").walk().filter { it.isFile }.toList()
println("Files to scan: ${filesToCompare.size}")
filesToCompare.forEach { file ->
var multipleDuplicate = 0
var s = "This file is a duplicate"
s += "\n${file.absolutePath}"
allFiles.forEach { possibleDuplicate ->
if (file != possibleDuplicate) { //only needed when both lists are the same
// Files that have the same name or contains the name, so not every file gets byte comparison
if (possibleDuplicate.nameWithoutExtension.contains(file.nameWithoutExtension)) {
try {
if (Files.mismatch(file.toPath(), possibleDuplicate.toPath()) == -1L) {
s += "\n${possibleDuplicate.absolutePath}"
i++
multipleDuplicate++
println(s)
}
} catch (e: Exception) {
println(e.message)
}
}
}
}
if (multipleDuplicate > 1) {
println("This file has $multipleDuplicate duplicate(s)")
}
}
println("Files scanned: ${filesToCompare.size}")
println("Total number of duplicates found: $i")
}
How have I tried to add the coroutines?
I wrapped the code inside the first forEach in launch{...} the idea was that for each file a coroutine starts and the second loop is done concurrently. I expected the program to run faster but in fact it was about the same time or slower.
How can I achieve this code to run in parallel faster?
Running each inner loop in a coroutine seems to be a decent approach. The problem might lie in the dispatcher you were using. If you used runBlocking and launch without context argument, you were using a single thread to run all your coroutines.
Since there is mostly blocking IO here, you could instead use Dispatchers.IO to launch your coroutines, so your coroutines are dispatched on multiple threads. The parallelism should be automatically limited to 64, but if your memory can't handle that, you can also use Dispatchers.IO.limitedParallelism(n) to reduce the number of threads.

Kotlin Flows and File IO

I have a Flow<FileChunk> which I need to append into a File. My naive approach would be:
File(fileame).bufferedWriter().use { writer ->
withContext(IO) {
requests.collect {
writer.write(it.fileChunk.content.toStringUtf8())
}
}
}
Unfortuntately, IntelliJ marks the "write" as "blocking". What is the right way of achieving what I want? Thanks!
You actually need a CoroutineScope to launch a suspend function. In this case I'd go with these two options:
Create or use an available coroutine scope. For ex:
File(fileame).bufferedWriter().use { writer ->
CoroutineScope(Dispatchers.IO).launch{
requests.collect {
writer.write(it.fileChunk.content.toStringUtf8())
}
}
}
Or if you're inside a function, you could make it a suspend fun, and handle the coroutine scope on the site of the function calling

Kotlin: withContext() vs Async-await

I have been reading kotlin docs, and if I understood correctly the two Kotlin functions work as follows :
withContext(context): switches the context of the current coroutine, when the given block executes, the coroutine switches back to previous context.
async(context): Starts a new coroutine in the given context and if we call .await() on the returned Deferred task, it will suspends the calling coroutine and resume when the block executing inside the spawned coroutine returns.
Now for the following two versions of code :
Version1:
launch(){
block1()
val returned = async(context){
block2()
}.await()
block3()
}
Version2:
launch(){
block1()
val returned = withContext(context){
block2()
}
block3()
}
In both versions block1(), block3() execute in default context(commonpool?) where as block2() executes in the given context.
The overall execution is synchronous with block1() -> block2() -> block3() order.
Only difference I see is that version1 creates another coroutine, where as version2 executes only one coroutine while switching context.
My questions are :
Isn't it always better to use withContext rather than async-await as it is functionally similar, but doesn't create another coroutine. Large numbers of coroutines, although lightweight, could still be a problem in demanding applications.
Is there a case async-await is more preferable to withContext?
Update:
Kotlin 1.2.50 now has a code inspection where it can convert async(ctx) { }.await() to withContext(ctx) { }.
Large number of coroutines, though lightweight, could still be a problem in demanding applications
I'd like to dispel this myth of "too many coroutines" being a problem by quantifying their actual cost.
First, we should disentangle the coroutine itself from the coroutine context to which it is attached. This is how you create just a coroutine with minimum overhead:
GlobalScope.launch(Dispatchers.Unconfined) {
suspendCoroutine<Unit> {
continuations.add(it)
}
}
The value of this expression is a Job holding a suspended coroutine. To retain the continuation, we added it to a list in the wider scope.
I benchmarked this code and concluded that it allocates 140 bytes and takes 100 nanoseconds to complete. So that's how lightweight a coroutine is.
For reproducibility, this is the code I used:
fun measureMemoryOfLaunch() {
val continuations = ContinuationList()
val jobs = (1..10_000).mapTo(JobList()) {
GlobalScope.launch(Dispatchers.Unconfined) {
suspendCoroutine<Unit> {
continuations.add(it)
}
}
}
(1..500).forEach {
Thread.sleep(1000)
println(it)
}
println(jobs.onEach { it.cancel() }.filter { it.isActive})
}
class JobList : ArrayList<Job>()
class ContinuationList : ArrayList<Continuation<Unit>>()
This code starts a bunch of coroutines and then sleeps so you have time to analyze the heap with a monitoring tool like VisualVM. I created the specialized classes JobList and ContinuationList because this makes it easier to analyze the heap dump.
To get a more complete story, I used the code below to also measure the cost of withContext() and async-await:
import kotlinx.coroutines.*
import java.util.concurrent.Executors
import kotlin.coroutines.suspendCoroutine
import kotlin.system.measureTimeMillis
const val JOBS_PER_BATCH = 100_000
var blackHoleCount = 0
val threadPool = Executors.newSingleThreadExecutor()!!
val ThreadPool = threadPool.asCoroutineDispatcher()
fun main(args: Array<String>) {
try {
measure("just launch", justLaunch)
measure("launch and withContext", launchAndWithContext)
measure("launch and async", launchAndAsync)
println("Black hole value: $blackHoleCount")
} finally {
threadPool.shutdown()
}
}
fun measure(name: String, block: (Int) -> Job) {
print("Measuring $name, warmup ")
(1..1_000_000).forEach { block(it).cancel() }
println("done.")
System.gc()
System.gc()
val tookOnAverage = (1..20).map { _ ->
System.gc()
System.gc()
var jobs: List<Job> = emptyList()
measureTimeMillis {
jobs = (1..JOBS_PER_BATCH).map(block)
}.also { _ ->
blackHoleCount += jobs.onEach { it.cancel() }.count()
}
}.average()
println("$name took ${tookOnAverage * 1_000_000 / JOBS_PER_BATCH} nanoseconds")
}
fun measureMemory(name:String, block: (Int) -> Job) {
println(name)
val jobs = (1..JOBS_PER_BATCH).map(block)
(1..500).forEach {
Thread.sleep(1000)
println(it)
}
println(jobs.onEach { it.cancel() }.filter { it.isActive})
}
val justLaunch: (i: Int) -> Job = {
GlobalScope.launch(Dispatchers.Unconfined) {
suspendCoroutine<Unit> {}
}
}
val launchAndWithContext: (i: Int) -> Job = {
GlobalScope.launch(Dispatchers.Unconfined) {
withContext(ThreadPool) {
suspendCoroutine<Unit> {}
}
}
}
val launchAndAsync: (i: Int) -> Job = {
GlobalScope.launch(Dispatchers.Unconfined) {
async(ThreadPool) {
suspendCoroutine<Unit> {}
}.await()
}
}
This is the typical output I get from the above code:
Just launch: 140 nanoseconds
launch and withContext : 520 nanoseconds
launch and async-await: 1100 nanoseconds
Yes, async-await takes about twice as long as withContext, but it's still just a microsecond. You'd have to launch them in a tight loop, doing almost nothing besides, for that to become "a problem" in your app.
Using measureMemory() I found the following memory cost per call:
Just launch: 88 bytes
withContext(): 512 bytes
async-await: 652 bytes
The cost of async-await is exactly 140 bytes higher than withContext, the number we got as the memory weight of one coroutine. This is just a fraction of the complete cost of setting up the CommonPool context.
If performance/memory impact was the only criterion to decide between withContext and async-await, the conclusion would have to be that there's no relevant difference between them in 99% of real use cases.
The real reason is that withContext() a simpler and more direct API, especially in terms of exception handling:
An exception that isn't handled within async { ... } causes its parent job to get cancelled. This happens regardless of how you handle exceptions from the matching await(). If you haven't prepared a coroutineScope for it, it may bring down your entire application.
An exception not handled within withContext { ... } simply gets thrown by the withContext call, you handle it just like any other.
withContext also happens to be optimized, leveraging the fact that you're suspending the parent coroutine and awaiting on the child, but that's just an added bonus.
async-await should be reserved for those cases where you actually want concurrency, so that you launch several coroutines in the background and only then await on them. In short:
async-await-async-await — don't do that, use withContext-withContext
async-async-await-await — that's the way to use it.
Isn't it always better to use withContext rather than asynch-await as it is funcationally similar, but doesn't create another coroutine. Large numebrs coroutines, though lightweight could still be a problem in demanding applications
Is there a case asynch-await is more preferable to withContext
You should use async/await when you want to execute multiple tasks concurrently, for example:
runBlocking {
val deferredResults = arrayListOf<Deferred<String>>()
deferredResults += async {
delay(1, TimeUnit.SECONDS)
"1"
}
deferredResults += async {
delay(1, TimeUnit.SECONDS)
"2"
}
deferredResults += async {
delay(1, TimeUnit.SECONDS)
"3"
}
//wait for all results (at this point tasks are running)
val results = deferredResults.map { it.await() }
//Or val results = deferredResults.awaitAll()
println(results)
}
If you don't need to run multiple tasks concurrently you can use withContext.
When in doubt, remember this like a rule of thumb:
If multiple tasks have to happen in parallel and the final result depends on completion of all of them, then use async.
For returning the result of a single task, use withContext.