I would like to read all available elements from a channel so that I can do batch processing on them if my receiver is slower then my sender (in hopes that processing a batch will be more performant and allow the receiver to catch up). I only want to suspend if the channel is empty, not suspend until my batch is full or timeout unlike this question.
Is there anything built into the standard kotlin library to accomplish this?
I did not find anything in the standard kotlin library, but here is what I came up with. This will suspend only for the first element and then poll all remaining elements. This only really works with a Buffered Channel so that elements ready for processing are queued and available for poll
/**
* Receive all available elements up to [max]. Suspends for the first element if the channel is empty
*/
internal suspend fun <E> ReceiveChannel<E>.receiveAvailable(max: Int): List<E> {
if (max <= 0) {
return emptyList()
}
val batch = mutableListOf<E>()
if (this.isEmpty) {
// suspend until the next message is ready
batch.add(receive())
}
fun pollUntilMax() = if (batch.size >= max) null else poll()
// consume all other messages that are ready
var next = pollUntilMax()
while (next != null) {
batch.add(next)
next = pollUntilMax()
}
return batch
}
I tested Jakes code and it worked well for me (thanks!). Without max limit, I got it down to:
suspend fun <E> ReceiveChannel<E>.receiveAvailable(): List<E> {
val allMessages = mutableListOf<E>()
allMessages.add(receive())
var next = poll()
while (next != null) {
allMessages.add(next)
next = poll()
}
return allMessages
}
Related
well, I have an Observable, I’ve used asFlow() to convert it but doesn’t emit.
I’m trying to migrate from Rx and Channels to Flow, so I have this function
override fun processIntents(intents: Observable<Intent>) {
intents.asFlow().shareTo(intentsFlow).launchIn(this)
}
shareTo() is an extension function which does onEach { receiver.emit(it) }, processIntents exists in a base ViewModel, and intentsFlow is a MutableSharedFlow.
fun <T> Flow<T>.shareTo(receiver: MutableSharedFlow<T>): Flow<T> {
return onEach { receiver.emit(it) }
}
I want to pass emissions coming from the intents Observable to intentsFlow, but it doesn’t work at all and the unit test keeps failing.
#Test(timeout = 4000)
fun `WHEN processIntent() with Rx subject or Observable emissions THEN intentsFlow should receive them`() {
return runBlocking {
val actual = mutableListOf<TestNumbersIntent>()
val intentSubject = PublishSubject.create<TestNumbersIntent>()
val viewModel = FlowViewModel<TestNumbersIntent, TestNumbersViewState>(
dispatcher = Dispatchers.Unconfined,
initialViewState = TestNumbersViewState()
)
viewModel.processIntents(intentSubject)
intentSubject.onNext(OneIntent)
intentSubject.onNext(TwoIntent)
intentSubject.onNext(ThreeIntent)
viewModel.intentsFlow.take(3).toList(actual)
assertEquals(3, actual.size)
assertEquals(OneIntent, actual[0])
assertEquals(TwoIntent, actual[1])
assertEquals(ThreeIntent, actual[2])
}
}
test timed out after 4000 milliseconds
org.junit.runners.model.TestTimedOutException: test timed out after
4000 milliseconds
This works
val ps = PublishSubject.create<Int>()
val mf = MutableSharedFlow<Int>()
val pf = ps.asFlow()
.onEach {
mf.emit(it)
}
launch {
pf.take(3).collect()
}
launch {
mf.take(3).collect {
println("$it") // Prints 1 2 3
}
}
launch {
yield() // Without this we suspend indefinitely
ps.onNext(1)
ps.onNext(2)
ps.onNext(3)
}
We need the take(3)s to make sure our program terminates, because MutableSharedFlow and PublishSubject -> Flow collect indefinitely.
We need the yield because we're working with a single thread and we need to give the other coroutines an opportunity to start working.
Take 2
This is much better. Doesn't use take, and cleans up after itself.
After emitting the last item, calling onComplete on the PublishSubject terminates MutableSharedFlow collection. This is a convenience, so that when this code runs it terminates completely. It is not a requirement. You can arrange your Job termination however you like.
Your code never terminating is not related to the emissions never being collected by the MutableSharedFlow. These are separate concerns. The first is due to the fact that neither a flow created from a PublishSubject, nor a MutableSharedFlow, terminates on its own. The PublishSubject flow will terminate when onComplete is called. The MutableSharedFlow will terminate when the coroutine (specifically, its Job) collecting it terminates.
The Flow constructed by PublishSubject.asFlow() drops any emissions if, at the time of the emission, collection of the Flow hasn't suspended, waiting for emissions. This introduces a race condition between being ready to collect and code that calls PublishSubject.onNext().
This, I believe, is the reason why flow collection isn't picking up the onNext emissions in your code.
It's why a yield is required right after we launch the coroutine that collects from psf.
val ps = PublishSubject.create<Int>()
val msf = MutableSharedFlow<Int>()
val psf = ps.asFlow()
.onEach {
msf.emit(it)
}
val j1 = launch {
psf.collect()
}
yield() // Use this to allow psf.collect to catch up
val j2 = launch {
msf.collect {
println("$it") // Prints 1 2 3 4
}
}
launch {
ps.onNext(1)
ps.onNext(2)
ps.onNext(3)
ps.onNext(4)
ps.onComplete()
}
j1.invokeOnCompletion { j2.cancel() }
j2.join()
I'm running a coroutine that is reading from a receiverChannel. I've got this coroutine wrapped within a timeout and I want to get the number of messages it managed to read before the timeout cancels it. Here's what I have:
runBlocking {
val receivedMessages = withTimeoutOrNull(someTimeout) {
var found = 0
while (isActive && found < expectedAmount){
val message = incoming.receive()
// some filtering
found++
}
found
} ?: 0 // <- to not have null...
// Currently prints 0, but I want the messages it managed to read
println("I've received $receivedMessages messages")
}
I know I can use atomicInteger, but I would like to keep away from java specifics here
Local variables in a coroutine don't need to be atomic because of the happens-before guarantee even though there may be some thread swapping going on. Your code doesn't have any parallelism, so you can use the following:
runBlocking {
var receivedMessages = 0
withTimeoutOrNull(someTimeout) {
while (isActive && receivedMessages < expectedAmount){
val message = incoming.receive()
// some filtering
receivedMessages++
}
}
println("I've received $receivedMessages messages")
}
If you do have multiple children coroutines running in parallel, you can use a Mutex. More info here
I have the following code
class Consumer(val consumer: KafkaConsumer<String, ConsumerRecord<String>>) {
fun run() {
consumer.seekToEnd(emptyList())
val pollDuration = 30 // seconds
while (true) {
val records = consumer.poll(Duration.ofSeconds(pollDuration))
// perform record analysis and commitSync()
}
}
}
}
The topic which the consumer is subscribed to continously receives records. Occasionally, the consumer will crash due to the processing step. When the consumer then is restarted, I want it to consume from the latest offset on the topic (i.e. ignore records that were published to the topic while the consumer was down). I thought the seekToEnd() method would ensure that. However, it seems like the method has no effect at all. The consumer starts to consume from the offset from which it crashed.
What is the correct way to use seekToEnd()?
Edit: The consumer is created with the following configs
fun <T> buildConsumer(valueDeserializer: String): KafkaConsumer<String, T> {
val props = setupConfig(valueDeserializer)
Common.setupConsumerSecurityProtocol(props)
return createConsumer(props)
}
fun setupConfig(valueDeserializer: String): Properties {
// Configuration setup
val props = Properties()
props[ConsumerConfig.GROUP_ID_CONFIG] = config.applicationId
props[ConsumerConfig.CLIENT_ID_CONFIG] = config.kafka.clientId
props[ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG] = config.kafka.bootstrapServers
props[AbstractKafkaSchemaSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG] = config.kafka.schemaRegistryUrl
props[ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG] = config.kafka.stringDeserializer
props[ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG] = valueDeserializer
props[KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG] = "true"
props[ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG] = config.kafka.maxPollIntervalMs
props[ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG] = config.kafka.sessionTimeoutMs
props[ConsumerConfig.ALLOW_AUTO_CREATE_TOPICS_CONFIG] = "false"
props[ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG] = "false"
props[ConsumerConfig.AUTO_OFFSET_RESET_CONFIG] = "latest"
return props
}
fun <T> createConsumer(props: Properties): KafkaConsumer<String, T> {
val consumer = KafkaConsumer<String, T>(props)
consumer.subscribe(listOf(config.kafka.inputTopic))
return consumer
}
I found a solution!
I needed to add a dummy poll as a part of the consumer initialization process. Since several Kafka methods are evaluated lazily, it is necessary with a dummy poll to assign partitions to the consumer. Without the dummy poll, the consumer tries to seek to the end of partitions that are null. As a result, seekToEnd() has no effect.
It is important that the dummy poll duration is long enough for the partitions to get assigned. For instance with consumer.poll((Duration.ofSeconds(1)), the partitions did not get time to be assigned before the program moved on to the next method call (i.e. seekToEnd()).
Working code could look something like this
class Consumer(val consumer: KafkaConsumer<String, ConsumerRecord<String>>) {
fun run() {
// Initialization
val pollDuration = 30 // seconds
consumer.poll((Duration.ofSeconds(pollDuration)) // Dummy poll to get assigned partitions
// Seek to end and commit new offset
consumer.seekToEnd(emptyList())
consumer.commitSync()
while (true) {
val records = consumer.poll(Duration.ofSeconds(pollDuration))
// perform record analysis and commitSync()
}
}
}
}
The seekToEnd method requires the information on the actual partition (in Kafka terms TopicPartition) on which you plan to make your consumer read from the end.
I am not familiar with the Kotlin API, but checking the JavaDocs on the KafkaConsumer's method seekToEnd you will see, that it asks for a collection of TopicPartitions.
As you are currently using emptyList(), it will have no impact at all, just like you observed.
I'm trying to write a code, that will let me test situations, in which Javas #Synchronized is not enough, to synchronize Kotlin coroutines. From my understanding, the code below:
var sharedCounter: Long = 0
#Synchronized
suspend fun updateCounter() {
delay(2)
sharedCounter++
delay(2)
yield()
}
fun main() = runBlocking {
var regularCounter: Long = 0
val scope = CoroutineScope(Dispatchers.IO + Job())
val jobs = mutableListOf<Job>()
repeat(1000) {
val job = scope.launch {
for (i in 1..1_000) {
regularCounter++
updateCounter()
}
}
jobs.add(job)
}
jobs.forEach { it.join() }
println("The number of shared counter is $sharedCounter")
println("The number of regular counter is $regularCounter")
}
should result in both sharedCounter and regularCounter NOT being equal to 1000000.
This code was based on this and this articles.
For some reason, sharedCounter always equals 1000000 and I'm not sure why.
I've tried testing larger for loops, but it did not "break" the synchronization either.
The synchronization lock is released at each suspend function call, but re-acquired before continuing. In your code, delay() and yield() are the suspension points. But sharedCounter++ is all local code, so the lock is held while it performs its three steps of getting the value, incrementing it, and setting it. So, the first delay() releases the lock, the continuation resumes so it re-locks and performs sharedCounter++, and then the lock is released again at the second delay() call.
Consider there are 3 functions that results in Mono<Int>s. I am trying to get the first result emitted by any of the Monos. Here's a test to describe what I am looking for:
fun main() {
StepVerifier
.create(firstElement())
.expectSubscription()
.expectNext(3)
.expectComplete()
.verify()
}
fun firstElement(): Mono<Int> = Flux.concat(_1(), _2(), _3(), _4()).next()
fun _1(): Mono<Int> = 1.toMono().delayElement(Duration.ofMillis(1000))
fun _2(): Mono<Int> = Mono.empty()
fun _3(): Mono<Int> = 3.toMono().delayElement(Duration.ofMillis(500))
fun _4(): Mono<Int> = Mono.error(RuntimeException())
The question is in firstElement(), how to result in 3 since it's the first to emit an element. But, as you can see, from any of the Monos:
It's possible that any of them could emit faster than the rest
It's possible that any of them could emit empty or onComplete()
It's possible that any of them could emit error or onError()
I have tried several operators:
Mono.zip {...} requires all of them to emit, because the return is Tuple<Int!>
Mono.first(...) and Flux.first(...).next() transmits onComplete() and/or onError()
Flux.concat(...) eliminates onComplete() and onError() but it's still sequentially subscribing based on the order of the given Publisher<T>s
You could resume on error with empty Mono and merge your functions
private Mono<Integer> firstElement() {
return Flux.merge(
_1().onErrorResume(ignored -> Mono.empty()),
_2().onErrorResume(ignored -> Mono.empty()),
_3().onErrorResume(ignored -> Mono.empty()),
_4().onErrorResume(ignored -> Mono.empty()))
.next();
}