Get value from withTimeout - kotlin

I'm running a coroutine that is reading from a receiverChannel. I've got this coroutine wrapped within a timeout and I want to get the number of messages it managed to read before the timeout cancels it. Here's what I have:
runBlocking {
val receivedMessages = withTimeoutOrNull(someTimeout) {
var found = 0
while (isActive && found < expectedAmount){
val message = incoming.receive()
// some filtering
found++
}
found
} ?: 0 // <- to not have null...
// Currently prints 0, but I want the messages it managed to read
println("I've received $receivedMessages messages")
}
I know I can use atomicInteger, but I would like to keep away from java specifics here

Local variables in a coroutine don't need to be atomic because of the happens-before guarantee even though there may be some thread swapping going on. Your code doesn't have any parallelism, so you can use the following:
runBlocking {
var receivedMessages = 0
withTimeoutOrNull(someTimeout) {
while (isActive && receivedMessages < expectedAmount){
val message = incoming.receive()
// some filtering
receivedMessages++
}
}
println("I've received $receivedMessages messages")
}
If you do have multiple children coroutines running in parallel, you can use a Mutex. More info here

Related

How to pass Observable emissions to MutableSharedFlow?

well, I have an Observable, I’ve used asFlow() to convert it but doesn’t emit.
I’m trying to migrate from Rx and Channels to Flow, so I have this function
override fun processIntents(intents: Observable<Intent>) {
intents.asFlow().shareTo(intentsFlow).launchIn(this)
}
shareTo() is an extension function which does onEach { receiver.emit(it) }, processIntents exists in a base ViewModel, and intentsFlow is a MutableSharedFlow.
fun <T> Flow<T>.shareTo(receiver: MutableSharedFlow<T>): Flow<T> {
return onEach { receiver.emit(it) }
}
I want to pass emissions coming from the intents Observable to intentsFlow, but it doesn’t work at all and the unit test keeps failing.
#Test(timeout = 4000)
fun `WHEN processIntent() with Rx subject or Observable emissions THEN intentsFlow should receive them`() {
return runBlocking {
val actual = mutableListOf<TestNumbersIntent>()
val intentSubject = PublishSubject.create<TestNumbersIntent>()
val viewModel = FlowViewModel<TestNumbersIntent, TestNumbersViewState>(
dispatcher = Dispatchers.Unconfined,
initialViewState = TestNumbersViewState()
)
viewModel.processIntents(intentSubject)
intentSubject.onNext(OneIntent)
intentSubject.onNext(TwoIntent)
intentSubject.onNext(ThreeIntent)
viewModel.intentsFlow.take(3).toList(actual)
assertEquals(3, actual.size)
assertEquals(OneIntent, actual[0])
assertEquals(TwoIntent, actual[1])
assertEquals(ThreeIntent, actual[2])
}
}
test timed out after 4000 milliseconds
org.junit.runners.model.TestTimedOutException: test timed out after
4000 milliseconds
This works
val ps = PublishSubject.create<Int>()
val mf = MutableSharedFlow<Int>()
val pf = ps.asFlow()
.onEach {
mf.emit(it)
}
launch {
pf.take(3).collect()
}
launch {
mf.take(3).collect {
println("$it") // Prints 1 2 3
}
}
launch {
yield() // Without this we suspend indefinitely
ps.onNext(1)
ps.onNext(2)
ps.onNext(3)
}
We need the take(3)s to make sure our program terminates, because MutableSharedFlow and PublishSubject -> Flow collect indefinitely.
We need the yield because we're working with a single thread and we need to give the other coroutines an opportunity to start working.
Take 2
This is much better. Doesn't use take, and cleans up after itself.
After emitting the last item, calling onComplete on the PublishSubject terminates MutableSharedFlow collection. This is a convenience, so that when this code runs it terminates completely. It is not a requirement. You can arrange your Job termination however you like.
Your code never terminating is not related to the emissions never being collected by the MutableSharedFlow. These are separate concerns. The first is due to the fact that neither a flow created from a PublishSubject, nor a MutableSharedFlow, terminates on its own. The PublishSubject flow will terminate when onComplete is called. The MutableSharedFlow will terminate when the coroutine (specifically, its Job) collecting it terminates.
The Flow constructed by PublishSubject.asFlow() drops any emissions if, at the time of the emission, collection of the Flow hasn't suspended, waiting for emissions. This introduces a race condition between being ready to collect and code that calls PublishSubject.onNext().
This, I believe, is the reason why flow collection isn't picking up the onNext emissions in your code.
It's why a yield is required right after we launch the coroutine that collects from psf.
val ps = PublishSubject.create<Int>()
val msf = MutableSharedFlow<Int>()
val psf = ps.asFlow()
.onEach {
msf.emit(it)
}
val j1 = launch {
psf.collect()
}
yield() // Use this to allow psf.collect to catch up
val j2 = launch {
msf.collect {
println("$it") // Prints 1 2 3 4
}
}
launch {
ps.onNext(1)
ps.onNext(2)
ps.onNext(3)
ps.onNext(4)
ps.onComplete()
}
j1.invokeOnCompletion { j2.cancel() }
j2.join()

KafkaConsumer: `seekToEnd()` does not make consumer consume from latest offset

I have the following code
class Consumer(val consumer: KafkaConsumer<String, ConsumerRecord<String>>) {
fun run() {
consumer.seekToEnd(emptyList())
val pollDuration = 30 // seconds
while (true) {
val records = consumer.poll(Duration.ofSeconds(pollDuration))
// perform record analysis and commitSync()
}
}
}
}
The topic which the consumer is subscribed to continously receives records. Occasionally, the consumer will crash due to the processing step. When the consumer then is restarted, I want it to consume from the latest offset on the topic (i.e. ignore records that were published to the topic while the consumer was down). I thought the seekToEnd() method would ensure that. However, it seems like the method has no effect at all. The consumer starts to consume from the offset from which it crashed.
What is the correct way to use seekToEnd()?
Edit: The consumer is created with the following configs
fun <T> buildConsumer(valueDeserializer: String): KafkaConsumer<String, T> {
val props = setupConfig(valueDeserializer)
Common.setupConsumerSecurityProtocol(props)
return createConsumer(props)
}
fun setupConfig(valueDeserializer: String): Properties {
// Configuration setup
val props = Properties()
props[ConsumerConfig.GROUP_ID_CONFIG] = config.applicationId
props[ConsumerConfig.CLIENT_ID_CONFIG] = config.kafka.clientId
props[ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG] = config.kafka.bootstrapServers
props[AbstractKafkaSchemaSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG] = config.kafka.schemaRegistryUrl
props[ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG] = config.kafka.stringDeserializer
props[ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG] = valueDeserializer
props[KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG] = "true"
props[ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG] = config.kafka.maxPollIntervalMs
props[ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG] = config.kafka.sessionTimeoutMs
props[ConsumerConfig.ALLOW_AUTO_CREATE_TOPICS_CONFIG] = "false"
props[ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG] = "false"
props[ConsumerConfig.AUTO_OFFSET_RESET_CONFIG] = "latest"
return props
}
fun <T> createConsumer(props: Properties): KafkaConsumer<String, T> {
val consumer = KafkaConsumer<String, T>(props)
consumer.subscribe(listOf(config.kafka.inputTopic))
return consumer
}
I found a solution!
I needed to add a dummy poll as a part of the consumer initialization process. Since several Kafka methods are evaluated lazily, it is necessary with a dummy poll to assign partitions to the consumer. Without the dummy poll, the consumer tries to seek to the end of partitions that are null. As a result, seekToEnd() has no effect.
It is important that the dummy poll duration is long enough for the partitions to get assigned. For instance with consumer.poll((Duration.ofSeconds(1)), the partitions did not get time to be assigned before the program moved on to the next method call (i.e. seekToEnd()).
Working code could look something like this
class Consumer(val consumer: KafkaConsumer<String, ConsumerRecord<String>>) {
fun run() {
// Initialization
val pollDuration = 30 // seconds
consumer.poll((Duration.ofSeconds(pollDuration)) // Dummy poll to get assigned partitions
// Seek to end and commit new offset
consumer.seekToEnd(emptyList())
consumer.commitSync()
while (true) {
val records = consumer.poll(Duration.ofSeconds(pollDuration))
// perform record analysis and commitSync()
}
}
}
}
The seekToEnd method requires the information on the actual partition (in Kafka terms TopicPartition) on which you plan to make your consumer read from the end.
I am not familiar with the Kotlin API, but checking the JavaDocs on the KafkaConsumer's method seekToEnd you will see, that it asks for a collection of TopicPartitions.
As you are currently using emptyList(), it will have no impact at all, just like you observed.

How delay function is working in Kotlin without blocking the current thread?

Past few days I am learning coroutines, most of thee concepts are clear but I don't understand the implementation of the delay function.
How delay function is resuming the coroutine after the delayed time? For a simple program, there is only one main thread, and to resume the coroutine after the delayed time I assume there should be another timer thread that handles all the delayed invocations and invokes them later. Is it true? Can someone explain the implementation detail of the delay function?
TL; DR;
When using runBlocking, delay is internally wrapped and runs on same thread and when using any other dispatcher it suspends and is resumed by resuming the continuation by event-loop thread. Check the long answer below to understand the internals.
Long answer:
#Francesc answer is pointing correctly but is somewhat abstract, and still does not explains how actually delay works internally.
So, as he pointed to the delay function:
public suspend fun delay(timeMillis: Long) {
if (timeMillis <= 0) return // don't delay
return suspendCancellableCoroutine sc# { cont: CancellableContinuation<Unit> ->
cont.context.delay.scheduleResumeAfterDelay(timeMillis, cont)
}
}
What it does is "Obtains the current continuation instance inside suspend functions and suspends the currently running coroutine after running the block inside the lambda"
So this line cont.context.delay.scheduleResumeAfterDelay(timeMillis, cont) is going to be executed and then the current coroutine gets suspended i.e. frees the current thread it was stick on.
cont.context.delay points to
internal val CoroutineContext.delay: Delay get() = get(ContinuationInterceptor) as? Delay ?: DefaultDelay
that says if ContinuationInterceptor is implementation of Delay then return that otherwise use DefaultDelay which is internal actual val DefaultDelay: Delay = DefaultExecutor a DefaultExecutor which is internal actual object DefaultExecutor : EventLoopImplBase(), Runnable {...} an implementation of EventLoop and has a thread of its own to run on.
Note: ContinuationInterceptor is an implementation of Delay when coroutine is in the runBlocking block in order to make sure the delay run on same thread otherwise it is not. Check this snippet to see the results.
Now I couldn't find implemenation of Delay created by runBlocking since internal expect fun createEventLoop(): EventLoop is an expect function which is implemented from outside, not by the source. But the DefaultDelay is implemented as follows
public override fun scheduleResumeAfterDelay(timeMillis: Long, continuation: CancellableContinuation<Unit>) {
val timeNanos = delayToNanos(timeMillis)
if (timeNanos < MAX_DELAY_NS) {
val now = nanoTime()
DelayedResumeTask(now + timeNanos, continuation).also { task ->
continuation.disposeOnCancellation(task)
schedule(now, task)
}
}
}
This is how scheduleResumeAfterDelay is implemented it creates a DelayedResumeTask with the continuation passed by delay, and then calls schedule(now, task) which calls scheduleImpl(now, delayedTask) which finally calls delayedTask.scheduleTask(now, delayedQueue, this) passing the delayedQueue in the object
#Synchronized
fun scheduleTask(now: Long, delayed: DelayedTaskQueue, eventLoop: EventLoopImplBase): Int {
if (_heap === kotlinx.coroutines.DISPOSED_TASK) return SCHEDULE_DISPOSED // don't add -- was already disposed
delayed.addLastIf(this) { firstTask ->
if (eventLoop.isCompleted) return SCHEDULE_COMPLETED // non-local return from scheduleTask
/**
* We are about to add new task and we have to make sure that [DelayedTaskQueue]
* invariant is maintained. The code in this lambda is additionally executed under
* the lock of [DelayedTaskQueue] and working with [DelayedTaskQueue.timeNow] here is thread-safe.
*/
if (firstTask == null) {
/**
* When adding the first delayed task we simply update queue's [DelayedTaskQueue.timeNow] to
* the current now time even if that means "going backwards in time". This makes the structure
* self-correcting in spite of wild jumps in `nanoTime()` measurements once all delayed tasks
* are removed from the delayed queue for execution.
*/
delayed.timeNow = now
} else {
/**
* Carefully update [DelayedTaskQueue.timeNow] so that it does not sweep past first's tasks time
* and only goes forward in time. We cannot let it go backwards in time or invariant can be
* violated for tasks that were already scheduled.
*/
val firstTime = firstTask.nanoTime
// compute min(now, firstTime) using a wrap-safe check
val minTime = if (firstTime - now >= 0) now else firstTime
// update timeNow only when going forward in time
if (minTime - delayed.timeNow > 0) delayed.timeNow = minTime
}
/**
* Here [DelayedTaskQueue.timeNow] was already modified and we have to double-check that newly added
* task does not violate [DelayedTaskQueue] invariant because of that. Note also that this scheduleTask
* function can be called to reschedule from one queue to another and this might be another reason
* where new task's time might now violate invariant.
* We correct invariant violation (if any) by simply changing this task's time to now.
*/
if (nanoTime - delayed.timeNow < 0) nanoTime = delayed.timeNow
true
}
return SCHEDULE_OK
}
It finally sets the task into the DelayedTaskQueue with the current time.
// Inside DefaultExecutor
override fun run() {
ThreadLocalEventLoop.setEventLoop(this)
registerTimeLoopThread()
try {
var shutdownNanos = Long.MAX_VALUE
if (!DefaultExecutor.notifyStartup()) return
while (true) {
Thread.interrupted() // just reset interruption flag
var parkNanos = DefaultExecutor.processNextEvent() /* Notice here, it calls the processNextEvent */
if (parkNanos == Long.MAX_VALUE) {
// nothing to do, initialize shutdown timeout
if (shutdownNanos == Long.MAX_VALUE) {
val now = nanoTime()
if (shutdownNanos == Long.MAX_VALUE) shutdownNanos = now + DefaultExecutor.KEEP_ALIVE_NANOS
val tillShutdown = shutdownNanos - now
if (tillShutdown <= 0) return // shut thread down
parkNanos = parkNanos.coerceAtMost(tillShutdown)
} else
parkNanos = parkNanos.coerceAtMost(DefaultExecutor.KEEP_ALIVE_NANOS) // limit wait time anyway
}
if (parkNanos > 0) {
// check if shutdown was requested and bail out in this case
if (DefaultExecutor.isShutdownRequested) return
parkNanos(this, parkNanos)
}
}
} finally {
DefaultExecutor._thread = null // this thread is dead
DefaultExecutor.acknowledgeShutdownIfNeeded()
unregisterTimeLoopThread()
// recheck if queues are empty after _thread reference was set to null (!!!)
if (!DefaultExecutor.isEmpty) DefaultExecutor.thread // recreate thread if it is needed
}
}
// Called by run inside the run of DefaultExecutor
override fun processNextEvent(): Long {
// unconfined events take priority
if (processUnconfinedEvent()) return nextTime
// queue all delayed tasks that are due to be executed
val delayed = _delayed.value
if (delayed != null && !delayed.isEmpty) {
val now = nanoTime()
while (true) {
// make sure that moving from delayed to queue removes from delayed only after it is added to queue
// to make sure that 'isEmpty' and `nextTime` that check both of them
// do not transiently report that both delayed and queue are empty during move
delayed.removeFirstIf {
if (it.timeToExecute(now)) {
enqueueImpl(it)
} else
false
} ?: break // quit loop when nothing more to remove or enqueueImpl returns false on "isComplete"
}
}
// then process one event from queue
dequeue()?.run()
return nextTime
}
And then the event loop (run function) of internal actual object DefaultExecutor : EventLoopImplBase(), Runnable {...} finally handles the tasks by dequeuing the tasks and resuming the actual Continuation which was suspended the function by calling delay if the delay time has reached.
All suspending functions work the same way, when compiled it gets converted into a state machine with callbacks.
When you call delay what happens is that a message is posted on a queue with a certain delay, similar to Handler().postDelayed(delay) and, when the delay has lapsed, it calls back to the suspension point and resumes execution.
You can check the source code for the delay function to see how it works:
public suspend fun delay(timeMillis: Long) {
if (timeMillis <= 0) return // don't delay
return suspendCancellableCoroutine sc# { cont: CancellableContinuation<Unit> ->
cont.context.delay.scheduleResumeAfterDelay(timeMillis, cont)
}
}
So if the delay is positive, it schedules the callback in the delay time.

Launch a number of coroutines and join them all with timeout (without cancelling)

I need to launch a number of jobs which will return a result.
In the main code (which is not a coroutine), after launching the jobs I need to wait for them all to complete their task OR for a given timeout to expire, whichever comes first.
If I exit from the wait because all the jobs completed before the timeout, that's great, I will collect their results.
But if some of the jobs are taking longer that the timeout, my main function needs to wake as soon as the timeout expires, inspect which jobs did finish in time (if any) and which ones are still running, and work from there, without cancelling the jobs that are still running.
How would you code this kind of wait?
The solution follows directly from the question. First, we'll design a suspending function for the task. Let's see our requirements:
if some of the jobs are taking longer that the timeout... without cancelling the jobs that are still running.
It means that the jobs we launch have to be standalone (not children), so we'll opt-out of structured concurrency and use GlobalScope to launch them, manually collecting all the jobs. We use async coroutine builder because we plan to collect their results of some type R later:
val jobs: List<Deferred<R>> = List(numberOfJobs) {
GlobalScope.async { /* our code that produces R */ }
}
after launching the jobs I need to wait for them all to complete their task OR for a given timeout to expire, whichever comes first.
Let's wait for all of them and do this waiting with timeout:
withTimeoutOrNull(timeoutMillis) { jobs.joinAll() }
We use joinAll (as opposed to awaitAll) to avoid exception if one of the jobs fail and withTimeoutOrNull to avoid exception on timeout.
my main function needs to wake as soon as the timeout expires, inspect which jobs did finish in time (if any) and which ones are still running
jobs.map { deferred -> /* ... inspect results */ }
In the main code (which is not a coroutine) ...
Since our main code is not a coroutine it has to wait in a blocking way, so we bridge the code we wrote using runBlocking. Putting it all together:
fun awaitResultsWithTimeoutBlocking(
timeoutMillis: Long,
numberOfJobs: Int
) = runBlocking {
val jobs: List<Deferred<R>> = List(numberOfJobs) {
GlobalScope.async { /* our code that produces R */ }
}
withTimeoutOrNull(timeoutMillis) { jobs.joinAll() }
jobs.map { deferred -> /* ... inspect results */ }
}
P.S. I would not recommend deploying this kind of solution in any kind of a serious production environment, since letting your background jobs running (leak) after timeout will invariably badly bite you later on. Do so only if you throughly understand all the deficiencies and risks of such an approach.
You can try to work with whileSelect and the onTimeout clause. But you still have to overcome the problem that your main code is not a coroutine. The next lines are an example of whileSelect statement. The function returns a Deferred with a list of results evaluated in the timeout period and another list of Deferreds of the unfinished results.
fun CoroutineScope.runWithTimeout(timeoutMs: Int): Deferred<Pair<List<Int>, List<Deferred<Int>>>> = async {
val deferredList = (1..100).mapTo(mutableListOf()) {
async {
val random = Random.nextInt(0, 100)
delay(random.toLong())
random
}
}
val finished = mutableListOf<Int>()
val endTime = System.currentTimeMillis() + timeoutMs
whileSelect {
var waitTime = endTime - System.currentTimeMillis()
onTimeout(waitTime) {
false
}
deferredList.toList().forEach { deferred ->
deferred.onAwait { random ->
deferredList.remove(deferred)
finished.add(random)
true
}
}
}
finished.toList() to deferredList.toList()
}
In your main code you can use the discouraged method runBlocking to access the Deferrred.
fun main() = runBlocking<Unit> {
val deferredResult = runWithTimeout(75)
val (finished, pending) = deferredResult.await()
println("Finished: ${finished.size} vs Pending: ${pending.size}")
}
Here is the solution I came up with. Pairing each job with a state (among other info):
private enum class State { WAIT, DONE, ... }
private data class MyJob(
val job: Deferred<...>,
var state: State = State.WAIT,
...
)
and writing an explicit loop:
// wait until either all jobs complete, or a timeout is reached
val waitJob = launch { delay(TIMEOUT_MS) }
while (waitJob.isActive && myJobs.any { it.state == State.WAIT }) {
select<Unit> {
waitJob.onJoin {}
myJobs.filter { it.state == State.WAIT }.forEach {
it.job.onJoin {}
}
}
// mark any finished jobs as DONE to exclude them from the next loop
myJobs.filter { !it.job.isActive }.forEach {
it.state = State.DONE
}
}
The initial state is called WAIT (instead of RUN) because it doesn't necessarily mean that the job is still running, only that my loop has not yet taken it into account.
I'm interested to know if this is idiomatic enough, or if there are better ways to code this kind of behaviour.

How to read all available elements from a Channel in Kotlin

I would like to read all available elements from a channel so that I can do batch processing on them if my receiver is slower then my sender (in hopes that processing a batch will be more performant and allow the receiver to catch up). I only want to suspend if the channel is empty, not suspend until my batch is full or timeout unlike this question.
Is there anything built into the standard kotlin library to accomplish this?
I did not find anything in the standard kotlin library, but here is what I came up with. This will suspend only for the first element and then poll all remaining elements. This only really works with a Buffered Channel so that elements ready for processing are queued and available for poll
/**
* Receive all available elements up to [max]. Suspends for the first element if the channel is empty
*/
internal suspend fun <E> ReceiveChannel<E>.receiveAvailable(max: Int): List<E> {
if (max <= 0) {
return emptyList()
}
val batch = mutableListOf<E>()
if (this.isEmpty) {
// suspend until the next message is ready
batch.add(receive())
}
fun pollUntilMax() = if (batch.size >= max) null else poll()
// consume all other messages that are ready
var next = pollUntilMax()
while (next != null) {
batch.add(next)
next = pollUntilMax()
}
return batch
}
I tested Jakes code and it worked well for me (thanks!). Without max limit, I got it down to:
suspend fun <E> ReceiveChannel<E>.receiveAvailable(): List<E> {
val allMessages = mutableListOf<E>()
allMessages.add(receive())
var next = poll()
while (next != null) {
allMessages.add(next)
next = poll()
}
return allMessages
}