When writing a Command Line Tool (CLT) in Swift, I want to process a lot of data. I've determined that my code is CPU-bound and that performance could benefit from using multiple cores, so I want to parallelize parts of the code. Say I want to achieve the following pseudo-code:
Fetch items from database
Divide items into X chunks
Process chunks in parallel
Wait for chunks to finish
Do some other processing (single-thread)
Now I've been using GCD, and a naive approach would look like this:
let group = dispatch_group_create()
let queue = dispatch_queue_create("", DISPATCH_QUEUE_CONCURRENT)
for chunk in chunks {
    dispatch_group_async(group, queue) {
        // process chunk here
    }
}
dispatch_group_wait(group, DISPATCH_TIME_FOREVER)
However, GCD requires a run loop, so the code will hang because the group is never executed. The run loop can be started with dispatch_main(), but it never exits. It is also possible to run the NSRunLoop for just a few seconds, but that doesn't feel like a solid solution. Regardless of GCD, how can this be achieved using Swift?

I mistook the blocked thread for a hanging program. The work executes just fine without a run loop: the code in the question runs and blocks the main thread until the whole group has finished.
So if chunks contains 4 work items, the following code spins up 4 concurrent workers and then waits for all of them to finish:
let group = DispatchGroup()
let queue = DispatchQueue(label: "", attributes: .concurrent)
for chunk in chunks {
    queue.async(group: group, execute: DispatchWorkItem() {
        // process chunk here
    })
}
_ = group.wait(timeout: .distantFuture)

Just like with an Objective-C CLI, you can make your own run loop using NSRunLoop.
Here's one possible implementation, modeled on this gist:
class MainProcess {
    var shouldExit = false

    func start () {
        // do your stuff here
        // set shouldExit to true when you're done
    }
}

println("Hello, World!")

var runLoop : NSRunLoop
var process : MainProcess

autoreleasepool {
    runLoop = NSRunLoop.currentRunLoop()
    process = MainProcess()

    process.start()

    while (!process.shouldExit && (runLoop.runMode(NSDefaultRunLoopMode, beforeDate: NSDate(timeIntervalSinceNow: 2)))) {
        // do nothing
    }
}
As Martin points out, you can use NSDate.distantFuture() as NSDate instead of NSDate(timeIntervalSinceNow: 2). (The cast is necessary because the distantFuture() method signature indicates it returns AnyObject.)
If you need to access CLI arguments, see this answer. You can also return exit codes using exit().

Swift 3 minimal implementation of Aaron Brager's solution, which simply combines autoreleasepool with RunLoop.run(mode:before:) and keeps running the run loop until you break out of it:
var shouldExit = false
doSomethingAsync() { _ in
    defer {
        shouldExit = true
    }
    // handle the result here
}
autoreleasepool {
    let runLoop = RunLoop.current
    while (!shouldExit && (runLoop.run(mode: .defaultRunLoopMode, before: Date.distantFuture))) {}
}

I think CFRunLoop is much easier than NSRunLoop in this case:
func main() {
    /**** START **/
    let group = dispatch_group_create()
    let queue = dispatch_queue_create("", DISPATCH_QUEUE_CONCURRENT)
    for chunk in chunks {
        dispatch_group_async(group, queue) {
            // process chunk here
        }
    }
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER)
    /**** END **/
}
let runloop = CFRunLoopGetCurrent()
CFRunLoopPerformBlock(runloop, kCFRunLoopDefaultMode) { () -> Void in
    dispatch_async(dispatch_queue_create("main", nil)) {
        main()
        CFRunLoopStop(runloop)
    }
}
CFRunLoopRun()


Compare two sets of files with coroutines in Kotlin

I have written a function that scans files (pictures) from two lists and checks whether a file is in both lists.
The code below works as expected, but for large sets it takes some time. So I tried to do this in parallel with coroutines, but with sets of 100 sample files the program was always slower than without coroutines.
The code:
import java.io.File
import java.nio.file.Files

private fun doJob() {
    val filesToCompare = File("C:\\Users\\Tobias\\Desktop\\Test").walk().filter { it.isFile }.toList()
    val allFiles = File("\\\\myserver\\Photos\\photo").walk().filter { it.isFile }.toList()
    println("Files to scan: ${filesToCompare.size}")
    var i = 0 // total number of duplicates found
    filesToCompare.forEach { file ->
        var multipleDuplicate = 0
        var s = "This file is a duplicate"
        s += "\n${file.absolutePath}"
        allFiles.forEach { possibleDuplicate ->
            if (file != possibleDuplicate) { //only needed when both lists are the same
                // Files that have the same name or contain the name, so not every file gets byte comparison
                if (possibleDuplicate.nameWithoutExtension.contains(file.nameWithoutExtension)) {
                    try {
                        if (Files.mismatch(file.toPath(), possibleDuplicate.toPath()) == -1L) {
                            s += "\n${possibleDuplicate.absolutePath}"
                            multipleDuplicate++
                            i++
                        }
                    } catch (e: Exception) {
                        // file could not be read: skip it
                    }
                }
            }
        }
        if (multipleDuplicate > 1) {
            println("This file has $multipleDuplicate duplicate(s)")
        }
    }
    println("Files scanned: ${filesToCompare.size}")
    println("Total number of duplicates found: $i")
}
How I tried to add the coroutines:
I wrapped the body of the first forEach in launch { ... }. The idea was that a coroutine starts for each file, so the inner loops run concurrently. I expected the program to run faster, but in fact it took about the same time or longer.
How can I make this code run faster in parallel?
Running each inner loop in a coroutine seems to be a decent approach. The problem might lie in the dispatcher you were using: if you used runBlocking and launch without a context argument, you were using a single thread to run all your coroutines.
Since there is mostly blocking IO here, you could instead use Dispatchers.IO to launch your coroutines, so your coroutines are dispatched on multiple threads. The parallelism should be automatically limited to 64, but if your memory can't handle that, you can also use Dispatchers.IO.limitedParallelism(n) to reduce the number of threads.
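A minimal sketch of that suggestion, assuming kotlinx.coroutines on the classpath and Java 12+ for Files.mismatch: each candidate file gets its own coroutine on Dispatchers.IO, and the per-file results are collected with awaitAll instead of being appended to shared variables. The name findDuplicates and the Map return type are illustrative choices, not part of the original code.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import java.io.File
import java.nio.file.Files

// Hypothetical helper: for each candidate, find the files in `library` with identical bytes.
// Each candidate is checked in its own coroutine on Dispatchers.IO, so the blocking
// Files.mismatch calls can run on several threads at once.
suspend fun findDuplicates(candidates: List<File>, library: List<File>): Map<File, List<File>> =
    coroutineScope {
        candidates.map { file ->
            async(Dispatchers.IO) {
                val matches = library.filter { other ->
                    other != file &&
                        // cheap name filter first, as in the original code
                        other.nameWithoutExtension.contains(file.nameWithoutExtension) &&
                        // then the byte comparison; unreadable files count as non-matches
                        runCatching { Files.mismatch(file.toPath(), other.toPath()) == -1L }
                            .getOrDefault(false)
                }
                file to matches
            }
        }.awaitAll().toMap()
    }
```

Called from runBlocking { findDuplicates(filesToCompare, allFiles) }, this keeps the main thread blocked only until every comparison is done. To cap the thread count, async(Dispatchers.IO) could be swapped for a dispatcher created with Dispatchers.IO.limitedParallelism(n), as mentioned above.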

How does the delay function work in Kotlin without blocking the current thread?

For the past few days I have been learning coroutines. Most of the concepts are clear, but I don't understand how the delay function is implemented.
How does delay resume the coroutine after the delay time has passed? In a simple program there is only one main thread, and to resume the coroutine after the delay I assume there must be another timer thread that handles all the delayed invocations and invokes them later. Is that true? Can someone explain the implementation details of the delay function?
Short answer: under runBlocking, delay is handled internally by runBlocking's own event loop, so the coroutine resumes on the same thread. With any other dispatcher, the coroutine suspends and its continuation is later resumed by an event-loop thread. Check the long answer below to understand the internals.
Long answer:
@Francesc's answer points in the right direction but is somewhat abstract, and it still does not explain how delay actually works internally.
So, as he pointed out, here is the delay function:
public suspend fun delay(timeMillis: Long) {
    if (timeMillis <= 0) return // don't delay
    return suspendCancellableCoroutine sc@ { cont: CancellableContinuation<Unit> ->
        cont.context.delay.scheduleResumeAfterDelay(timeMillis, cont)
    }
}
suspendCancellableCoroutine "obtains the current continuation instance inside suspend functions and suspends the currently running coroutine", after running the block inside the lambda.
So the line cont.context.delay.scheduleResumeAfterDelay(timeMillis, cont) is executed first, and then the current coroutine is suspended, i.e. it frees the thread it was running on.
cont.context.delay points to
internal val CoroutineContext.delay: Delay get() = get(ContinuationInterceptor) as? Delay ?: DefaultDelay
This says: if the ContinuationInterceptor implements Delay, use it; otherwise use DefaultDelay, which is declared as internal actual val DefaultDelay: Delay = DefaultExecutor. DefaultExecutor, declared as internal actual object DefaultExecutor : EventLoopImplBase(), Runnable {...}, is an implementation of EventLoop and has a thread of its own to run on.
Note: inside a runBlocking block, the ContinuationInterceptor is an implementation of Delay, which makes sure the delay runs on the same thread; otherwise it generally is not. Check this snippet to see the results.
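A minimal sketch of that kind of check, assuming kotlinx.coroutines on the JVM: record the thread name just before and just after delay. Under runBlocking both names match, because runBlocking's event loop handles the delay and resumes the coroutine on the calling thread. threadsAroundDelay is a hypothetical helper name.

```kotlin
import kotlinx.coroutines.delay

// Hypothetical helper: returns the thread names seen just before and just after delay().
suspend fun threadsAroundDelay(millis: Long = 10): Pair<String, String> {
    val before = Thread.currentThread().name
    delay(millis) // suspends the coroutine; the thread is free for other work meanwhile
    val after = Thread.currentThread().name
    return before to after
}
```

Calling runBlocking { threadsAroundDelay() } returns the same name twice. Running the same helper inside withContext(Dispatchers.Default) may show two different worker threads, since Dispatchers.Default does not keep a coroutine on one thread.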
I couldn't find the implementation of the Delay created by runBlocking, because internal expect fun createEventLoop(): EventLoop is an expect function whose actual, platform-specific implementation lives outside the common source. DefaultDelay, however, is implemented as follows:
public override fun scheduleResumeAfterDelay(timeMillis: Long, continuation: CancellableContinuation<Unit>) {
    val timeNanos = delayToNanos(timeMillis)
    if (timeNanos < MAX_DELAY_NS) {
        val now = nanoTime()
        DelayedResumeTask(now + timeNanos, continuation).also { task ->
            schedule(now, task)
        }
    }
}
This is how scheduleResumeAfterDelay is implemented: it creates a DelayedResumeTask with the continuation passed by delay and then calls schedule(now, task), which calls scheduleImpl(now, delayedTask), which finally calls delayedTask.scheduleTask(now, delayedQueue, this), passing in the delayedQueue:
fun scheduleTask(now: Long, delayed: DelayedTaskQueue, eventLoop: EventLoopImplBase): Int {
    if (_heap === kotlinx.coroutines.DISPOSED_TASK) return SCHEDULE_DISPOSED // don't add -- was already disposed
    delayed.addLastIf(this) { firstTask ->
        if (eventLoop.isCompleted) return SCHEDULE_COMPLETED // non-local return from scheduleTask
        /**
         * We are about to add new task and we have to make sure that [DelayedTaskQueue]
         * invariant is maintained. The code in this lambda is additionally executed under
         * the lock of [DelayedTaskQueue] and working with [DelayedTaskQueue.timeNow] here is thread-safe.
         */
        if (firstTask == null) {
            /**
             * When adding the first delayed task we simply update queue's [DelayedTaskQueue.timeNow] to
             * the current now time even if that means "going backwards in time". This makes the structure
             * self-correcting in spite of wild jumps in `nanoTime()` measurements once all delayed tasks
             * are removed from the delayed queue for execution.
             */
            delayed.timeNow = now
        } else {
            /**
             * Carefully update [DelayedTaskQueue.timeNow] so that it does not sweep past first's tasks time
             * and only goes forward in time. We cannot let it go backwards in time or invariant can be
             * violated for tasks that were already scheduled.
             */
            val firstTime = firstTask.nanoTime
            // compute min(now, firstTime) using a wrap-safe check
            val minTime = if (firstTime - now >= 0) now else firstTime
            // update timeNow only when going forward in time
            if (minTime - delayed.timeNow > 0) delayed.timeNow = minTime
        }
        /**
         * Here [DelayedTaskQueue.timeNow] was already modified and we have to double-check that newly added
         * task does not violate [DelayedTaskQueue] invariant because of that. Note also that this scheduleTask
         * function can be called to reschedule from one queue to another and this might be another reason
         * where new task's time might now violate invariant.
         * We correct invariant violation (if any) by simply changing this task's time to now.
         */
        if (nanoTime - delayed.timeNow < 0) nanoTime = delayed.timeNow
        true
    }
    return SCHEDULE_OK
}
In the end, the task is added to the DelayedTaskQueue, whose timeNow has been brought up to date with the current time.
// Inside DefaultExecutor
override fun run() {
    try {
        var shutdownNanos = Long.MAX_VALUE
        if (!DefaultExecutor.notifyStartup()) return
        while (true) {
            Thread.interrupted() // just reset interruption flag
            var parkNanos = DefaultExecutor.processNextEvent() /* Notice here, it calls the processNextEvent */
            if (parkNanos == Long.MAX_VALUE) {
                // nothing to do, initialize shutdown timeout
                if (shutdownNanos == Long.MAX_VALUE) {
                    val now = nanoTime()
                    if (shutdownNanos == Long.MAX_VALUE) shutdownNanos = now + DefaultExecutor.KEEP_ALIVE_NANOS
                    val tillShutdown = shutdownNanos - now
                    if (tillShutdown <= 0) return // shut thread down
                    parkNanos = parkNanos.coerceAtMost(tillShutdown)
                } else
                    parkNanos = parkNanos.coerceAtMost(DefaultExecutor.KEEP_ALIVE_NANOS) // limit wait time anyway
            }
            if (parkNanos > 0) {
                // check if shutdown was requested and bail out in this case
                if (DefaultExecutor.isShutdownRequested) return
                parkNanos(this, parkNanos)
            }
        }
    } finally {
        DefaultExecutor._thread = null // this thread is dead
        // recheck if queues are empty after _thread reference was set to null (!!!)
        if (!DefaultExecutor.isEmpty) DefaultExecutor.thread // recreate thread if it is needed
    }
}
// Called from the run() loop of DefaultExecutor
override fun processNextEvent(): Long {
    // unconfined events take priority
    if (processUnconfinedEvent()) return nextTime
    // queue all delayed tasks that are due to be executed
    val delayed = _delayed.value
    if (delayed != null && !delayed.isEmpty) {
        val now = nanoTime()
        while (true) {
            // make sure that moving from delayed to queue removes from delayed only after it is added to queue
            // to make sure that 'isEmpty' and `nextTime` that check both of them
            // do not transiently report that both delayed and queue are empty during move
            delayed.removeFirstIf {
                if (it.timeToExecute(now)) {
                    enqueueImpl(it)
                } else
                    false
            } ?: break // quit loop when nothing more to remove or enqueueImpl returns false on "isComplete"
        }
    }
    // then process one event from queue
    dequeue()?.run()
    return nextTime
}
The event loop (the run function) of internal actual object DefaultExecutor : EventLoopImplBase(), Runnable {...} then handles the tasks: once a task's delay has elapsed, it moves the task from the delayed queue to the regular queue, dequeues it, and resumes the Continuation that was suspended by the call to delay.
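To make the mechanism concrete, here is a toy delay built on the same idea: suspendCancellableCoroutine hands the continuation to a timer thread, and the timer resumes it later. This is an illustration only, not the library's implementation; it swaps DefaultExecutor's delayed-task queue for a java.util.concurrent ScheduledExecutorService, and toyDelay and the "toy-timer" thread name are hypothetical.

```kotlin
import kotlinx.coroutines.suspendCancellableCoroutine
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import kotlin.coroutines.resume

// One daemon timer thread, playing the role DefaultExecutor's thread plays for DefaultDelay.
private val timer = Executors.newSingleThreadScheduledExecutor { r ->
    Thread(r, "toy-timer").apply { isDaemon = true }
}

// Hypothetical delay(): suspend, and let the timer resume the continuation after timeMillis.
suspend fun toyDelay(timeMillis: Long) {
    if (timeMillis <= 0) return // don't delay, same as the real delay()
    suspendCancellableCoroutine<Unit> { cont ->
        val future = timer.schedule(Runnable { cont.resume(Unit) }, timeMillis, TimeUnit.MILLISECONDS)
        // If the coroutine is cancelled first, drop the pending timer task.
        cont.invokeOnCancellation { future.cancel(false) }
    }
}
```

The timer thread only calls resume; because the continuation obtained by suspendCancellableCoroutine is intercepted, the rest of the coroutine is dispatched back to its own dispatcher (for runBlocking, the calling thread) rather than running on "toy-timer".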
All suspending functions work the same way: when compiled, each one is converted into a state machine with callbacks.
When you call delay, a message is posted to a queue with a certain delay, similar to Handler().postDelayed(delay), and when the delay has elapsed, it calls back to the suspension point and resumes execution.
You can check the source code for the delay function to see how it works:
public suspend fun delay(timeMillis: Long) {
    if (timeMillis <= 0) return // don't delay
    return suspendCancellableCoroutine sc@ { cont: CancellableContinuation<Unit> ->
        cont.context.delay.scheduleResumeAfterDelay(timeMillis, cont)
    }
}
So if the delay is positive, it schedules the callback in the delay time.

Kotlin Coroutines - unlimited stream to fan out batches

I'm looking to implement a pipeline for processing an infinite stream of messages. I'm new to coroutines and trying to follow along with the docs but I'm not confident I'm doing the right thing.
My infinite stream is of batches of records and I'd like to fan out the processing of each record to a coroutine, wait for a batch to finish (to log stats and stuff) before continuing to the next batch.
                    -> process [record] \
source -> [records] -> process [record] -> [log batch stats]
                    -> process [record] /
|------------------- while(true) -------------------|
What I had planned is to have 2 Channels, one for the infinite stream, and one for the intermediate records that will fill up and empty on each batch.
runBlocking {
    val infinite: ReceiveChannel<List<Record>> = produce { while (true) send(source.getBatch()) }
    val records = Channel<Record>(Channel.Factory.UNLIMITED)
    while(true) {
        infinite.receive().forEach { records.send(it) }
        while(!records.isEmpty) {
            val record = records.receive()
            launch { process(record) }
        }
        // ??? Wait for jobs?
    }
}
From googling, it seems that waiting for jobs is discouraged. I also wasn't sure whether calling .map on a channel will actually receive the messages and convert them to jobs: records.map { record -> launch { process(record) } }
yields a Channel<Job>. It seems I can call .toList() on it to collapse it, but then do I need to join the jobs? Again, Google suggested doing that with a parent job, but I'm not really sure how to do that with launch.
Anyway, very much a n00b to this.
Thanks for the help.
I don't see a reason to have two channels. You could directly iterate over the list of records. And you should use async instead of launch. Then you can use await or even better awaitAll for the list of results.
val infinite: ReceiveChannel<List<Record>> = produce { ... }
while(true) {
    val resultsDeferred = infinite.receive().map {
        async {
            process(it)
        }
    }
    val results = resultsDeferred.awaitAll()
    // log batch stats from results here
}
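The "parent job" idea from the question maps onto coroutineScope: it does not return until every coroutine launched inside it has completed, so each batch can be awaited without collecting jobs by hand. A minimal sketch, assuming kotlinx.coroutines; MyRecord, process, processedCount, and consumeBatches are hypothetical stand-ins for the question's types, and the for loop ends only when the channel is closed.

```kotlin
import kotlinx.coroutines.channels.ReceiveChannel
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-ins for the question's Record type and process() function.
data class MyRecord(val id: Int)

val processedCount = AtomicInteger(0)

suspend fun process(record: MyRecord) {
    delay(5L * (record.id % 3)) // simulate uneven per-record work
    processedCount.incrementAndGet()
}

// Fans out each batch, waits for the whole batch, then reports stats.
suspend fun consumeBatches(
    batches: ReceiveChannel<List<MyRecord>>,
    onBatchDone: (batchSize: Int, totalProcessed: Int) -> Unit
) {
    for (batch in batches) { // suspends until the next batch arrives; ends when the channel closes
        coroutineScope {
            batch.forEach { record -> launch { process(record) } }
        } // returns only after every launch above has completed
        onBatchDone(batch.size, processedCount.get())
    }
}
```

If the per-record results are needed, the async/awaitAll version above is the better fit; the coroutineScope form suits side-effecting work where only completion matters.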

Launch a number of coroutines and join them all with timeout (without cancelling)

I need to launch a number of jobs which will return a result.
In the main code (which is not a coroutine), after launching the jobs I need to wait for them all to complete their task OR for a given timeout to expire, whichever comes first.
If I exit from the wait because all the jobs completed before the timeout, that's great, I will collect their results.
But if some of the jobs are taking longer than the timeout, my main function needs to wake as soon as the timeout expires, inspect which jobs did finish in time (if any) and which ones are still running, and work from there, without cancelling the jobs that are still running.
How would you code this kind of wait?
The solution follows directly from the question. First, we'll design a suspending function for the task. Let's see our requirements:
if some of the jobs are taking longer than the timeout... without cancelling the jobs that are still running.
It means that the jobs we launch have to be standalone (not children), so we'll opt out of structured concurrency and use GlobalScope to launch them, manually collecting all the jobs. We use the async coroutine builder because we plan to collect their results of some type R later:
val jobs: List<Deferred<R>> = List(numberOfJobs) {
    GlobalScope.async { /* our code that produces R */ }
}
after launching the jobs I need to wait for them all to complete their task OR for a given timeout to expire, whichever comes first.
Let's wait for all of them and do this waiting with timeout:
withTimeoutOrNull(timeoutMillis) { jobs.joinAll() }
We use joinAll (as opposed to awaitAll) to avoid an exception if one of the jobs fails, and withTimeoutOrNull to avoid an exception on timeout.
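The difference between joinAll and awaitAll can be checked directly. A minimal sketch, assuming kotlinx.coroutines 1.5 or newer (for the DelicateCoroutinesApi marker on GlobalScope); compareJoinAllAndAwaitAll is a hypothetical helper that runs both calls against the same pair of jobs, one of which fails.

```kotlin
import kotlinx.coroutines.DelicateCoroutinesApi
import kotlinx.coroutines.GlobalScope
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.joinAll
import kotlinx.coroutines.runBlocking

// Hypothetical helper: returns (did joinAll throw?, did awaitAll throw?) for one
// succeeding job and one failing job.
@OptIn(DelicateCoroutinesApi::class)
fun compareJoinAllAndAwaitAll(): Pair<Boolean, Boolean> = runBlocking {
    val jobs = listOf(
        GlobalScope.async { 1 },
        GlobalScope.async<Int> { throw IllegalStateException("job failed") }
    )
    val joinThrew = runCatching { jobs.joinAll() }.isFailure   // waits for completion, ignores failures
    val awaitThrew = runCatching { jobs.awaitAll() }.isFailure // rethrows the failure
    joinThrew to awaitThrew
}
```

The expected result is (false, true): joinAll only waits, while awaitAll surfaces the failed job's exception.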
my main function needs to wake as soon as the timeout expires, inspect which jobs did finish in time (if any) and which ones are still running
jobs.map { deferred -> /* ... inspect results */ }
In the main code (which is not a coroutine) ...
Since our main code is not a coroutine it has to wait in a blocking way, so we bridge the code we wrote using runBlocking. Putting it all together:
fun awaitResultsWithTimeoutBlocking(
    timeoutMillis: Long,
    numberOfJobs: Int
) = runBlocking {
    val jobs: List<Deferred<R>> = List(numberOfJobs) {
        GlobalScope.async { /* our code that produces R */ }
    }
    withTimeoutOrNull(timeoutMillis) { jobs.joinAll() }
    jobs.map { deferred -> /* ... inspect results */ }
}
P.S. I would not recommend deploying this kind of solution in any kind of serious production environment, since letting your background jobs keep running (leak) after the timeout will invariably bite you badly later on. Do so only if you thoroughly understand all the deficiencies and risks of such an approach.
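A concrete, runnable variant of the function above, under a few assumptions: kotlinx.coroutines 1.5 or newer, the jobs passed in as a list of suspend lambdas, and the "inspect results" step turned into a split between values that arrived in time and Deferreds that are still running. PartialResults and awaitWithTimeoutBlocking are hypothetical names.

```kotlin
import kotlinx.coroutines.DelicateCoroutinesApi
import kotlinx.coroutines.Deferred
import kotlinx.coroutines.GlobalScope
import kotlinx.coroutines.async
import kotlinx.coroutines.joinAll
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withTimeoutOrNull

// Hypothetical result type: values that arrived in time, plus the jobs left running.
data class PartialResults<R>(val finished: List<R>, val pending: List<Deferred<R>>)

@OptIn(DelicateCoroutinesApi::class)
fun <R> awaitWithTimeoutBlocking(timeoutMillis: Long, tasks: List<suspend () -> R>): PartialResults<R> =
    runBlocking {
        // Standalone jobs (not children of runBlocking), so the timeout does not cancel them.
        val jobs = tasks.map { task -> GlobalScope.async { task() } }
        // joinAll does not rethrow job failures; withTimeoutOrNull returns null instead of throwing.
        withTimeoutOrNull(timeoutMillis) { jobs.joinAll() }
        val (done, running) = jobs.partition { it.isCompleted }
        // A failed Deferred is completed and cancelled; skip it rather than rethrow.
        PartialResults(done.filter { !it.isCancelled }.map { it.await() }, running)
    }
```

The caller owns the pending Deferreds afterwards and is responsible for awaiting or cancelling them, which is exactly the leak risk the P.S. warns about.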
You can try to work with whileSelect and the onTimeout clause, but you still have to overcome the problem that your main code is not a coroutine. The following lines are an example of a whileSelect statement. The function returns a Deferred holding a list of the results evaluated within the timeout period and another list with the Deferreds of the unfinished results.
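import kotlinx.coroutines.*
import kotlinx.coroutines.selects.*
fun CoroutineScope.runWithTimeout(timeoutMs: Int): Deferred<Pair<List<Int>, List<Deferred<Int>>>> = async {
    val deferredList = (1..100).mapTo(mutableListOf()) {
        async {
            val random = (0 until 100).random()
            delay(random.toLong()) // simulate work taking 0-99 ms
            random
        }
    }
    val finished = mutableListOf<Int>()
    val endTime = System.currentTimeMillis() + timeoutMs
    whileSelect {
        val waitTime = endTime - System.currentTimeMillis()
        onTimeout(waitTime) { false } // time is up: stop looping
        deferredList.toList().forEach { deferred ->
            deferred.onAwait { random ->
                deferredList.remove(deferred) // no longer pending
                finished.add(random)
                deferredList.isNotEmpty() // keep looping while jobs remain
            }
        }
    }
    finished.toList() to deferredList.toList()
}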
In your main code you can use the discouraged method runBlocking to access the Deferred.
fun main() = runBlocking<Unit> {
    val deferredResult = runWithTimeout(75)
    val (finished, pending) = deferredResult.await()
    println("Finished: ${finished.size} vs Pending: ${pending.size}")
}
Here is the solution I came up with: pair each job with a state (among other info):
private enum class State { WAIT, DONE, ... }
private data class MyJob(
    val job: Deferred<...>,
    var state: State = State.WAIT,
    ...
)
and write an explicit loop:
// wait until either all jobs complete, or a timeout is reached
val waitJob = launch { delay(TIMEOUT_MS) }
while (waitJob.isActive && myJobs.any { it.state == State.WAIT }) {
    select<Unit> {
        waitJob.onJoin {}
        myJobs.filter { it.state == State.WAIT }.forEach {
            it.job.onJoin {}
        }
    }
    // mark any finished jobs as DONE to exclude them from the next loop
    myJobs.filter { !it.job.isActive }.forEach {
        it.state = State.DONE
    }
}
The initial state is called WAIT (instead of RUN) because it doesn't necessarily mean that the job is still running, only that my loop has not yet taken it into account.
I'm interested to know if this is idiomatic enough, or if there are better ways to code this kind of behaviour.

How to dispatch on main queue synchronously without a deadlock?

I need to dispatch a block on the main queue, synchronously. I don’t know whether I’m currently running on the main thread or not. The naive solution looks like this:
dispatch_sync(dispatch_get_main_queue(), block);
But if I’m currently inside of a block running on the main queue, this call creates a deadlock. (The synchronous dispatch waits for the block to finish, but the block does not even start running, since we are waiting for the current one to finish.)
The obvious next step is to check for the current queue:
if (dispatch_get_current_queue() == dispatch_get_main_queue()) {
    block();
} else {
    dispatch_sync(dispatch_get_main_queue(), block);
}
This works, but it’s ugly. Before I at least hide it behind some custom function, isn’t there a better solution for this problem? I stress that I can’t afford to dispatch the block asynchronously – the app is in a situation where the asynchronously dispatched block would get executed “too late”.
I need to use something like this fairly regularly within my Mac and iOS applications, so I use the following helper function (originally described in this answer):
void runOnMainQueueWithoutDeadlocking(void (^block)(void))
{
    if ([NSThread isMainThread]) block();
    else dispatch_sync(dispatch_get_main_queue(), block);
}
which you call via
runOnMainQueueWithoutDeadlocking(^{
    //Do stuff
});
This is pretty much the process you describe above, and I've talked to several other developers who have independently crafted something like this for themselves.
I used [NSThread isMainThread] instead of checking dispatch_get_current_queue(), because the caveats section for that function once warned against using this for identity testing and the call was deprecated in iOS 6.
To run code synchronously on the main queue or on the main thread (which are not the same thing), I use:
import Foundation
private let mainQueueKey = UnsafeMutablePointer<Void>.alloc(1)
private let mainQueueValue = UnsafeMutablePointer<Void>.alloc(1)
public func dispatch_sync_on_main_queue(block: () -> Void)
{
    struct dispatchonce { static var token : dispatch_once_t = 0 }
    dispatch_once(&dispatchonce.token) { // tag the main queue once so it can be recognized later
        dispatch_queue_set_specific(dispatch_get_main_queue(), mainQueueKey, mainQueueValue, nil)
    }
    if dispatch_get_specific(mainQueueKey) == mainQueueValue { block() } else { dispatch_sync(dispatch_get_main_queue(), block) }
}
extension NSThread
{
    public class func runBlockOnMainThread(block: () -> Void )
    {
        if NSThread.isMainThread() { block() } else { dispatch_sync(dispatch_get_main_queue(), block) }
    }
    public class func runBlockOnMainQueue(block: () -> Void)
    {
        dispatch_sync_on_main_queue(block)
    }
}
I recently began experiencing a deadlock during UI updates. That led me to this Stack Overflow question, which led me to implement a runOnMainQueueWithoutDeadlocking-type helper function based on the accepted answer.
The real issue, though, is that when updating the UI from a block I had mistakenly used dispatch_sync rather than dispatch_async to get the Main queue for UI updates. Easy to do with code completion, and perhaps hard to notice after the fact.
So, for others reading this question: if synchronous execution is not required, simply using dispatch_async (async rather than sync) will avoid the deadlock you may be intermittently hitting.