Concurrent S3 File Upload via Kotlin Coroutines - amazon-s3

I need to upload many files to S3, and doing it sequentially would take hours. That's exactly what Kotlin's new coroutines excel at, so I wanted to give them a first try instead of fiddling around again with some thread-based executor service.
Here is my (simplified) code:
fun upload(superTiles: Map<Int, Map<Int, SuperTile>>) = runBlocking {
    val s3 = AmazonS3ClientBuilder.standard().withRegion("eu-west-1").build()
    for ((x, ys) in superTiles) {
        val jobs = mutableListOf<Deferred<Any>>()
        for ((y, superTile) in ys) {
            val job = async(CommonPool) {
                uploadTile(s3, x, y, superTile)
            }
            jobs.add(job)
        }
        jobs.forEach { it.await() }
    }
}
suspend fun uploadTile(s3: AmazonS3, x: Int, y: Int, superTile: SuperTile) {
    val json: String = "{}"
    val key = "$s3Prefix/x4/$z/$x/$y.json" // s3Prefix, z and metadata come from the simplified-away code
    s3.putObject(PutObjectRequest("my_bucket", key, ByteArrayInputStream(json.toByteArray()), metadata))
}
The problem: the code is still very slow, and logging reveals that requests are still executed sequentially: a job is finished before the next one is created. Only in very few cases (1 out of 10) do I see jobs running concurrently.
Why does the code not run much faster / concurrently? What can I do about it?

Kotlin coroutines excel when you work with asynchronous APIs, while the AmazonS3.putObject API that you are using is an old-school blocking, synchronous API, so you get only as many concurrent uploads as there are threads in the CommonPool that you are using. There is no value in marking your uploadTile function with the suspend modifier, because it does not use any suspending functions in its body.
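To see the effect described above without involving S3, here is a minimal self-contained sketch (it assumes kotlinx.coroutines is on the classpath). It runs the same tasks on a dispatcher backed by only 2 threads; the name peakConcurrency and the 100 ms task length are illustrative assumptions, Thread.sleep stands in for a blocking call such as putObject, and delay stands in for a suspending, asynchronous API.

```kotlin
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger

// Launches `tasks` coroutines on a dispatcher backed by only 2 threads and returns
// the highest number of tasks that were in progress at the same moment.
fun peakConcurrency(tasks: Int, blocking: Boolean): Int {
    val dispatcher = Executors.newFixedThreadPool(2).asCoroutineDispatcher()
    val inFlight = AtomicInteger(0)
    val peak = AtomicInteger(0)
    try {
        runBlocking(dispatcher) {
            repeat(tasks) {
                launch {
                    val now = inFlight.incrementAndGet()
                    peak.accumulateAndGet(now) { a, b -> maxOf(a, b) }
                    if (blocking) {
                        Thread.sleep(100) // holds its pool thread, like a blocking putObject call
                    } else {
                        delay(100) // suspends and frees the thread, like an async API
                    }
                    inFlight.decrementAndGet()
                }
            }
        }
    } finally {
        dispatcher.close() // shuts down the underlying thread pool
    }
    return peak.get()
}
```

With 8 tasks, the blocking variant can never have more than 2 in flight, while the suspending variant has all 8 in flight at once.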
The first step in getting more throughput in your upload task is to start using an asynchronous API for it. I'd suggest looking at Amazon S3 TransferManager for that purpose. See if that solves your problem first.
Kotlin coroutines are designed to help you combine your async APIs into easy-to-use logical workflows. For example, it is straightforward to adapt the asynchronous API of TransferManager for use with coroutines by writing the following extension function:
suspend fun Upload.await(): UploadResult = suspendCancellableCoroutine { cont ->
    addProgressListener {
        if (isDone) {
            // we know it should not actually wait when done
            try { cont.resume(waitForUploadResult()) }
            catch (e: Throwable) { cont.resumeWithException(e) }
        }
    }
    // abort the transfer if the waiting coroutine is cancelled (invokeOnCancellation in kotlinx.coroutines 1.x)
    cont.invokeOnCancellation { abort() }
}
This extension enables you to write very fluent code that works with TransferManager, so you can rewrite your uploadTile function to use TransferManager instead of the blocking AmazonS3 interface:
suspend fun uploadTile(tm: TransferManager, x: Int, y: Int, superTile: SuperTile) {
    val json: String = "{}"
    val key = "$s3Prefix/x4/$z/$x/$y.json"
    tm.upload(PutObjectRequest("my_bucket", key, ByteArrayInputStream(json.toByteArray()), metadata)).await()
}
Notice how this new version of uploadTile uses the suspending function await that was defined above.
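The same adapter pattern can be tried without the AWS SDK. In this sketch, startUpload is a hypothetical callback-based client (not part of any library) that completes about 200 ms later on a background thread; uploadAwait wraps it with suspendCancellableCoroutine in the same spirit as the Upload.await() extension, and uploadAll starts every upload before awaiting any of them. It assumes kotlinx.coroutines 1.x.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.suspendCancellableCoroutine
import java.util.concurrent.Executors
import java.util.concurrent.ScheduledFuture
import java.util.concurrent.TimeUnit
import kotlin.coroutines.resume
import kotlin.coroutines.resumeWithException

// Hypothetical callback-based client standing in for TransferManager: each "upload"
// completes about 200 ms later on a single daemon background thread.
val scheduler = Executors.newSingleThreadScheduledExecutor { r ->
    Thread(r, "fake-upload").apply { isDaemon = true }
}

fun startUpload(key: String, onDone: (Result<String>) -> Unit): ScheduledFuture<*> =
    scheduler.schedule(Runnable { onDone(Result.success("uploaded:$key")) }, 200, TimeUnit.MILLISECONDS)

// Adapter: suspend until the callback fires, and cancel the pending work
// if the waiting coroutine is cancelled.
suspend fun uploadAwait(key: String): String = suspendCancellableCoroutine { cont ->
    val pending = startUpload(key) { result ->
        result.fold(onSuccess = { cont.resume(it) }, onFailure = { cont.resumeWithException(it) })
    }
    cont.invokeOnCancellation { pending.cancel(false) }
}

// Start every upload first, then wait for all of them.
suspend fun uploadAll(keys: List<String>): List<String> = coroutineScope {
    keys.map { key -> async { uploadAwait(key) } }.awaitAll()
}
```

Because no thread is held while an upload is pending, 20 such uploads finish in roughly the time of one, even though only a single background thread exists.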


Difference between GlobalScope and runBlocking when waiting for multiple async

I have a Kotlin backend/server API using Ktor, and inside a certain endpoint's service logic I need to fetch details for a list of ids concurrently and then return them all to the client with the 200 response.
The way I wanted to do it is by using async {} and awaitAll().
However, I can't understand whether I should use runBlocking or GlobalScope.
What is really the difference here?
fun getDetails(): List<Detail> {
    val fetched: MutableList<Detail> = mutableListOf()
    GlobalScope.launch {  // --> Option 1
    // runBlocking {      // --> Option 2
    // Dispatchers.IO     // --> Option 3 (or any other dispatcher ..)
        ids.map { id ->
            async {
                val providerDetails = getDetails(id)
                fetched += providerDetails
            }
        }.awaitAll()
    }
    return fetched
}
launch starts a coroutine that runs in parallel with your current code, so fetched would still be empty by the time your getDetails() function returns. The coroutine will continue running and mutating the List that you have passed out of the function while the code that retrieved the list already has the reference back and will be using it, so there's a pretty good chance of triggering a ConcurrentModificationException. Basically, this is not a viable solution at all.
runBlocking runs a coroutine while blocking the thread that called it. The coroutine will be completely finished before the return fetched line, so this will work if you are OK with blocking the calling thread.
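A minimal sketch contrasting the two options, assuming kotlinx.coroutines and a hypothetical suspending fetchDetail(id) in place of the question's getDetails(id); the 100 ms delay only simulates a remote call.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.GlobalScope
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import java.util.Collections

// Hypothetical stand-in for the question's getDetails(id).
suspend fun fetchDetail(id: Int): String {
    delay(100) // simulate a slow remote call
    return "detail-$id"
}

// Option 1: launch returns immediately, so the caller gets the list before anything is added.
fun detailsWithLaunch(ids: List<Int>): List<String> {
    val fetched = Collections.synchronizedList(mutableListOf<String>())
    GlobalScope.launch {
        fetched += ids.map { id -> async { fetchDetail(id) } }.awaitAll()
    }
    return fetched
}

// Option 2: runBlocking waits until every child coroutine has finished.
fun detailsWithRunBlocking(ids: List<Int>): List<String> = runBlocking(Dispatchers.IO) {
    ids.map { id -> async { fetchDetail(id) } }.awaitAll()
}
```

The first function hands back an empty list that is filled in later behind the caller's back; the second returns only once all results are in.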
Specifying a Dispatcher isn't an alternative to launch or runBlocking. It is an argument that you can add to either to determine the thread pool used for the coroutine and its children. Since you are doing IO and parallel work, you should probably be using runBlocking(Dispatchers.IO).
Your code can be simplified to avoid the extra, unnecessary mutable list:
fun getDetails(): List<Detail> = runBlocking(Dispatchers.IO) {
    ids.map { id -> async { getDetails(id) } }.awaitAll() // ids: the list of ids from the question
}
Note that this function will rethrow any exceptions thrown by getDetails().
If your project uses coroutines more generally, you probably have higher level coroutines running, in which case this should probably be a suspend function (non-blocking) instead:
suspend fun getDetails(): List<Detail> = withContext(Dispatchers.IO) {
    ids.map { id -> async { getDetails(id) } }.awaitAll()
}

Kotlin runBlocking and async with return

I am taking my first steps in kotlin coroutines and I have a problem.
In order to create Foo and return it from a function, I need to call two heavy service methods asynchronously to get the values needed to create Foo. This is my code:
return runBlocking {
    val xAsync = async { calculateX() }
    val yAsync = async { calculateY() }
    Foo(xAsync.await(), yAsync.await())
}
However, after reading the logs it seems to me that calculateX() and calculateY() are called synchronously. Is my code correct?
Your code isn't perfect, but it is correct in terms of making calculateX() and calculateY() run concurrently. However, since it launches this concurrent work on the runBlocking dispatcher which is single-threaded, and since your heavyweight operations are blocking instead of suspending, they will not be parallelized.
The first observation to make is that blocking operations cannot gain anything from coroutines compared to the old-school approach with Java executors, apart from a somewhat simpler API.
The second observation is that you can at least make them run in parallel, each blocking its own thread, by using the IO dispatcher:
return runBlocking {
    val xAsync = async(Dispatchers.IO) { calculateX() }
    val yAsync = async(Dispatchers.IO) { calculateY() }
    Foo(xAsync.await(), yAsync.await())
}
Compared to using the java.util.concurrent APIs, here you benefit from the library's IO dispatcher instead of having to create your own thread pool.
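The difference is easy to measure. In this sketch (assuming kotlinx.coroutines), calculateX() and calculateY() are hypothetical stand-ins that block for 300 ms each with Thread.sleep; the first version runs both on the single runBlocking thread, the second dispatches them to Dispatchers.IO.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.runBlocking

// Hypothetical blocking "heavy service methods".
fun calculateX(): Int { Thread.sleep(300); return 1 }
fun calculateY(): Int { Thread.sleep(300); return 2 }

data class Foo(val x: Int, val y: Int)

// async without a dispatcher inherits runBlocking's single thread, so the calls run back to back.
fun fooOnRunBlockingThread(): Foo = runBlocking {
    val xAsync = async { calculateX() }
    val yAsync = async { calculateY() }
    Foo(xAsync.await(), yAsync.await())
}

// async(Dispatchers.IO) gives each blocking call its own pool thread, so they overlap.
fun fooOnIoDispatcher(): Foo = runBlocking {
    val xAsync = async(Dispatchers.IO) { calculateX() }
    val yAsync = async(Dispatchers.IO) { calculateY() }
    Foo(xAsync.await(), yAsync.await())
}
```

The first version takes at least 600 ms, the second roughly 300 ms.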

Unpredictable coroutines execution order?

This is what I thought:
When using coroutines, you pile up async operations, and once you are done with the synchronous code they run in FIFO order... but that's not always true.
In this example you get what I expected:
fun main() = runBlocking {
launch {
launch {
Also here (with nested launch):
fun main() = runBlocking {
launch {
launch {
launch {
Now in this example, with a scope builder creating another "pile" (not the real term), the order changes, but it's still what I expected:
fun main() = runBlocking {
launch {
// replacing launch
coroutineScope {
Finally, the reason for this question: example 2 with a scope builder:
fun main() = runBlocking {
launch {
coroutineScope {
launch {
I get this:
Was my assumption wrong, and is that not how coroutines work?
If so, how should I determine the correct order when coding?
Edit: I've tried running the same code on different machines and platforms but always got the same result. I also tried more complicated nesting to show that the results don't vary.
Digging through the documentation, I found that coroutines are just a code transformation (as I initially thought).
Remember that even though they like to call them 'light-weight' threads, they run in a single 'real' thread (note: without newSingleThreadContext).
Thus I chose to believe that the execution order is fixed at compile time and not decided at runtime.
After all that, I still can't anticipate the order, and that's what I want.
Don't assume coroutines will be run in a specific order; the runtime decides what is best to run when and in what order. The kotlinx.coroutines documentation may help: it does a great job of explaining how coroutines work and also describes some handy abstractions that make managing coroutines easier. I personally recommend checking out channels, jobs, and Deferred (async/await).
For example, if I wanted things done in a certain order by number, I'd use channels to ensure things arrived in the order I wanted.
runBlocking {
    val channel = Channel<Int>()
    launch {
        for (x in 0..5) channel.send(x * x)
        channel.close() // without close() the for loop below would wait forever
    }
    for (msg in channel) {
        // Pretend we're doing some work with channel results
        println("Message: $msg")
    }
}
Hopefully that gives you more context on what coroutines are and what they're good for.
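Related to the Deferred (async/await) suggestion: if what matters is the order of the results rather than the order of execution, awaitAll() returns results in the order of the list of Deferred values, whatever order they finish in. A small sketch, assuming kotlinx.coroutines, with delays chosen so that later items finish first:

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Returns (order in which the tasks finished, order of the awaited results).
fun completionVsResultOrder(): Pair<List<Int>, List<Int>> = runBlocking {
    val finished = mutableListOf<Int>() // safe: everything runs on runBlocking's single thread
    val results = listOf(1, 2, 3).map { n ->
        async {
            delay(300L - n * 100) // item 1 waits longest, item 3 not at all
            finished += n
            n
        }
    }.awaitAll()
    finished.toList() to results
}
```

Here the tasks finish as 3, 2, 1, yet the awaited results come back as 1, 2, 3.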

Kotlin: withContext() vs Async-await

I have been reading the Kotlin docs, and if I understood correctly, the two Kotlin functions work as follows:
withContext(context): switches the context of the current coroutine; when the given block finishes executing, the coroutine switches back to the previous context.
async(context): starts a new coroutine in the given context, and if we call .await() on the returned Deferred task, it suspends the calling coroutine and resumes when the block executing inside the spawned coroutine returns.
Now for the following two versions of code:
// version 1
val returned = async(context) { block2() }.await()
// version 2
val returned = withContext(context) { block2() }
In both versions, block1() and block3() execute in the default context (CommonPool?), whereas block2() executes in the given context.
The overall execution is synchronous, in block1() -> block2() -> block3() order.
The only difference I see is that version 1 creates another coroutine, whereas version 2 executes only one coroutine while switching context.
My questions are:
Isn't it always better to use withContext rather than async-await, as it is functionally similar but doesn't create another coroutine? Large numbers of coroutines, although lightweight, could still be a problem in demanding applications.
Is there a case where async-await is preferable to withContext?
Kotlin 1.2.50 now has a code inspection where it can convert async(ctx) { }.await() to withContext(ctx) { }.
Large number of coroutines, though lightweight, could still be a problem in demanding applications
I'd like to dispel this myth of "too many coroutines" being a problem by quantifying their actual cost.
First, we should disentangle the coroutine itself from the coroutine context to which it is attached. This is how you create just a coroutine with minimum overhead:
GlobalScope.launch(Dispatchers.Unconfined) {
    suspendCoroutine<Unit> { continuations.add(it) }
}
The value of this expression is a Job holding a suspended coroutine. To retain the continuation, we added it to a list in the wider scope.
I benchmarked this code and concluded that it allocates 140 bytes and takes 100 nanoseconds to complete. So that's how lightweight a coroutine is.
For reproducibility, this is the code I used:
fun measureMemoryOfLaunch() {
    val continuations = ContinuationList()
    val jobs = (1..10_000).mapTo(JobList()) {
        GlobalScope.launch(Dispatchers.Unconfined) {
            suspendCoroutine<Unit> {
                continuations.add(it)
            }
        }
    }
    (1..500).forEach {
        Thread.sleep(1000) // leave time to analyze the heap
    }
    println(jobs.onEach { it.cancel() }.filter { it.isActive })
}
class JobList : ArrayList<Job>()
class ContinuationList : ArrayList<Continuation<Unit>>()
This code starts a bunch of coroutines and then sleeps so you have time to analyze the heap with a monitoring tool like VisualVM. I created the specialized classes JobList and ContinuationList because this makes it easier to analyze the heap dump.
To get a more complete story, I used the code below to also measure the cost of withContext() and async-await:
import kotlinx.coroutines.*
import java.util.concurrent.Executors
import kotlin.coroutines.suspendCoroutine
import kotlin.system.measureTimeMillis
const val JOBS_PER_BATCH = 100_000
var blackHoleCount = 0
val threadPool = Executors.newSingleThreadExecutor()!!
val ThreadPool = threadPool.asCoroutineDispatcher()
fun main(args: Array<String>) {
    try {
        measure("just launch", justLaunch)
        measure("launch and withContext", launchAndWithContext)
        measure("launch and async", launchAndAsync)
        println("Black hole value: $blackHoleCount")
    } finally {
        threadPool.shutdown()
    }
}
fun measure(name: String, block: (Int) -> Job) {
    print("Measuring $name, warmup ")
    (1..1_000_000).forEach { block(it).cancel() }
    val tookOnAverage = (1..20).map { _ ->
        var jobs: List<Job> = emptyList()
        measureTimeMillis {
            jobs = (1..JOBS_PER_BATCH).map(block)
        }.also { _ ->
            blackHoleCount += jobs.onEach { it.cancel() }.count()
        }
    }.average()
    println("$name took ${tookOnAverage * 1_000_000 / JOBS_PER_BATCH} nanoseconds")
}
fun measureMemory(name: String, block: (Int) -> Job) {
    val jobs = (1..JOBS_PER_BATCH).map(block)
    (1..500).forEach {
        Thread.sleep(1000) // leave time to take a heap dump
    }
    println(jobs.onEach { it.cancel() }.filter { it.isActive })
}
val justLaunch: (i: Int) -> Job = {
    GlobalScope.launch(Dispatchers.Unconfined) {
        suspendCoroutine<Unit> {}
    }
}
val launchAndWithContext: (i: Int) -> Job = {
    GlobalScope.launch(Dispatchers.Unconfined) {
        withContext(ThreadPool) {
            suspendCoroutine<Unit> {}
        }
    }
}
val launchAndAsync: (i: Int) -> Job = {
    GlobalScope.launch(Dispatchers.Unconfined) {
        async(ThreadPool) {
            suspendCoroutine<Unit> {}
        }.await()
    }
}
This is the typical output I get from the above code:
Just launch: 140 nanoseconds
launch and withContext : 520 nanoseconds
launch and async-await: 1100 nanoseconds
Yes, async-await takes about twice as long as withContext, but it's still just a microsecond. You'd have to launch them in a tight loop, doing almost nothing besides, for that to become "a problem" in your app.
Using measureMemory() I found the following memory cost per call:
Just launch: 88 bytes
withContext(): 512 bytes
async-await: 652 bytes
The cost of async-await is exactly 140 bytes higher than withContext, the number we got as the memory weight of one coroutine. This is just a fraction of the complete cost of setting up the CommonPool context.
If performance/memory impact was the only criterion to decide between withContext and async-await, the conclusion would have to be that there's no relevant difference between them in 99% of real use cases.
The real reason to prefer withContext() is that it is a simpler and more direct API, especially in terms of exception handling:
An exception that isn't handled within async { ... } causes its parent job to get cancelled. This happens regardless of how you handle exceptions from the matching await(). If you haven't prepared a coroutineScope for it, it may bring down your entire application.
An exception not handled within withContext { ... } simply gets thrown by the withContext call, you handle it just like any other.
withContext also happens to be optimized, leveraging the fact that you're suspending the parent coroutine and awaiting on the child, but that's just an added bonus.
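A sketch of the exception-handling difference, assuming kotlinx.coroutines 1.x semantics; failWithContext and failAsync are illustrative names. Catching around await() does not stop the failed async child from cancelling the enclosing coroutineScope, which then rethrows the exception.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.withContext

// withContext: the failure is simply an exception thrown from the call.
suspend fun failWithContext(): String =
    try {
        withContext<String>(Dispatchers.Default) { error("boom") }
    } catch (e: IllegalStateException) {
        "caught: ${e.message}"
    }

// async: handling the exception at await() is not enough; the failed child
// still cancels the enclosing coroutineScope, which rethrows the exception.
suspend fun failAsync(): String =
    try {
        coroutineScope {
            val deferred = async<String> { error("boom") }
            try {
                deferred.await()
            } catch (e: IllegalStateException) {
                "handled at await()" // this value never becomes the result
            }
        }
    } catch (e: IllegalStateException) {
        "scope failed anyway: ${e.message}"
    }
```

The first function returns "caught: boom"; the second returns "scope failed anyway: boom" despite the inner catch.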
async-await should be reserved for those cases where you actually want concurrency, so that you launch several coroutines in the background and only then await on them. In short:
async-await-async-await — don't do that, use withContext-withContext
async-async-await-await — that's the way to use it.
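The two shapes can be compared with a hypothetical slowValue() that suspends for 300 ms (assuming kotlinx.coroutines). The sequential async version behaves just like the withContext version; only async-async-await-await overlaps the work.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.withContext

// Hypothetical suspending work.
suspend fun slowValue(n: Int): Int {
    delay(300)
    return n
}

// async-await-async-await: the second task starts only after the first has finished.
suspend fun sequentialAsync(): Int = coroutineScope {
    val a = async { slowValue(1) }.await()
    val b = async { slowValue(2) }.await()
    a + b
}

// The same sequential behaviour, written with withContext.
suspend fun sequentialWithContext(): Int {
    val a = withContext(Dispatchers.Default) { slowValue(1) }
    val b = withContext(Dispatchers.Default) { slowValue(2) }
    return a + b
}

// async-async-await-await: both tasks are in flight before the first await.
suspend fun concurrentAsync(): Int = coroutineScope {
    val a = async { slowValue(1) }
    val b = async { slowValue(2) }
    a.await() + b.await()
}
```

The two sequential versions take about 600 ms each; the concurrent one takes about 300 ms.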
Isn't it always better to use withContext rather than async-await as it is functionally similar, but doesn't create another coroutine. Large numbers of coroutines, though lightweight, could still be a problem in demanding applications
Is there a case where async-await is preferable to withContext
You should use async/await when you want to execute multiple tasks concurrently, for example:
runBlocking {
    val deferredResults = arrayListOf<Deferred<String>>()
    deferredResults += async {
        delay(1000) // 1 second; the delay(Long, TimeUnit) overload is deprecated
        "1"
    }
    deferredResults += async {
        delay(1000)
        "2"
    }
    deferredResults += async {
        delay(1000)
        "3"
    }
    //wait for all results (at this point tasks are running)
    val results = deferredResults.map { it.await() }
    //Or val results = deferredResults.awaitAll()
    println(results)
}
If you don't need to run multiple tasks concurrently you can use withContext.
When in doubt, remember this rule of thumb:
If multiple tasks have to happen in parallel and the final result depends on completion of all of them, then use async.
For returning the result of a single task, use withContext.

Kotlin - How to read from file asynchronously?

Is there any idiomatic Kotlin way to read a file's contents asynchronously? I couldn't find anything in the documentation.
At least as of Java 7 (which is where Android is stuck), there isn't any API that would tap into the low-level async file IO support (like io_uring). There is a class called AsynchronousFileChannel, but, as its docs state,
An AsynchronousFileChannel is associated with a thread pool to which tasks are submitted to handle I/O events and dispatch to completion handlers that consume the results of I/O operations on the channel.
That makes it no better than the following, bog-standard Kotlin idiom:
launch {
    val contents = withContext(Dispatchers.IO) {
        FileInputStream("filename.txt").use { it.readBytes() }
    }
}
This uses Kotlin's own dedicated IO thread pool and unblocks the UI thread. If you're on Android, that is your actual concern, anyway.
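A sketch of that idiom as a reusable suspend function, assuming kotlinx.coroutines; readOnIo is a hypothetical name, and it also returns the current thread name only to make the switch to the IO pool visible.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext
import java.io.FileInputStream

// Reads the whole file on Kotlin's IO thread pool; the calling coroutine
// (for example one on the UI thread) only suspends while this runs.
suspend fun readOnIo(path: String): Pair<String, String> = withContext(Dispatchers.IO) {
    val text = FileInputStream(path).use { it.readBytes() }.toString(Charsets.UTF_8)
    text to Thread.currentThread().name // thread name returned only for illustration
}
```

Called from runBlocking on the main thread, the read happens on a DefaultDispatcher worker thread rather than the caller's thread.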
Java NIO Asynchronous Channel is the tool you want.
Check out this AsynchronousFileChannel.aRead extension function from a coroutines example:
suspend fun AsynchronousFileChannel.aRead(buf: ByteBuffer): Int =
    suspendCoroutine { cont ->
        read(buf, 0L, Unit, object : CompletionHandler<Int, Unit> {
            override fun completed(bytesRead: Int, attachment: Unit) = cont.resume(bytesRead)
            override fun failed(exception: Throwable, attachment: Unit) = cont.resumeWithException(exception)
        })
    }
You just open an AsynchronousFileChannel and then call aRead() in a coroutine:
val channel = AsynchronousFileChannel.open(path) // path: a java.nio.file.Path to the file
try {
    val buf = ByteBuffer.allocate(4096)
    val bytesRead = channel.aRead(buf)
} finally {
    channel.close()
}
It's an essential function; I don't know why it is not part of the kotlinx-coroutines-core library.
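Note that aRead as written always reads from position 0L, so a single call returns at most one buffer's worth of the file. A minimal sketch of reading a whole file, assuming a variant (hypothetical name readAt) that takes the position explicitly, and a loop that advances it until read() reports -1 at end of file. It uses only the JDK and the Kotlin standard library's suspendCoroutine.

```kotlin
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer
import java.nio.channels.AsynchronousFileChannel
import java.nio.channels.CompletionHandler
import java.nio.file.Path
import java.nio.file.StandardOpenOption
import kotlin.coroutines.resume
import kotlin.coroutines.resumeWithException
import kotlin.coroutines.suspendCoroutine

// Like aRead above, but reads starting at an explicit file position.
suspend fun AsynchronousFileChannel.readAt(buf: ByteBuffer, position: Long): Int =
    suspendCoroutine { cont ->
        read(buf, position, Unit, object : CompletionHandler<Int, Unit> {
            override fun completed(bytesRead: Int, attachment: Unit) = cont.resume(bytesRead)
            override fun failed(exception: Throwable, attachment: Unit) = cont.resumeWithException(exception)
        })
    }

// Reads the whole file chunk by chunk without blocking the calling thread on I/O.
suspend fun readAllBytesAsync(path: Path, chunkSize: Int = 4096): ByteArray =
    AsynchronousFileChannel.open(path, StandardOpenOption.READ).use { channel ->
        val out = ByteArrayOutputStream()
        val buf = ByteBuffer.allocate(chunkSize)
        var position = 0L
        while (true) {
            buf.clear()
            val n = channel.readAt(buf, position)
            if (n < 0) break // -1 signals end of file
            buf.flip()
            out.write(buf.array(), 0, buf.limit())
            position += n
        }
        out.toByteArray()
    }
```

As the first answer points out, the JDK still serves these reads from a thread pool, so this mainly buys a callback-free API rather than true kernel-level async I/O.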
javasync/RxIo uses Java NIO Asynchronous Channel to provide a non-blocking API to read and write a file's contents asynchronously, including an idiomatic Kotlin API. Here are two examples: one reading/writing in bulk through coroutines, and the other iterating lines through an asynchronous Kotlin Flow:
suspend fun copyNio(from: String, to: String) {
val data = Path(from).readText() // suspension point
Path(to).writeText(data) // suspension point
fun printLinesFrom(filename: String) {
.lines() // Flow<String>
.collect() // block if you want to wait for completion
Disclaimer: I am the author and main contributor of javasync/RxIo.