Kotlin - Chunk sequence based on size and time - kotlin

I have a never-ending stream as a sequence.
What I am aiming for is to take batches from the sequence based on both time and size.
For example, if my sequence has 2250 messages right now, I want to send 3 batches (1000, 1000, 250).
Also, if I still have not accumulated 1000 messages within the next 5 minutes, I want to send whatever has accumulated so far.
sequence
    .chunked(1000)
    .map { chunk ->
        // do something with chunk
    }
What I was expecting to find is something like .chunked(1000, 300), where 300 is the time limit in seconds (so a batch is sent at least every 5 minutes).
Thanks in advance

A Kotlin Sequence is a synchronous concept and is not meant to be used in any kind of time-limited fashion. If you ask the sequence for the next element, it blocks the invoking thread until it produces that element, and there is no way to cancel it.
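To make the size-only behavior concrete, here is a minimal sketch that uses only the standard library. The helper name batchSizes is hypothetical, and a plain integer range stands in for the question's messages. It shows that chunked groups purely by element count: a chunk is produced only once enough elements have been pulled from the sequence (or the sequence ends), so there is nowhere to attach a time limit.

```kotlin
// Hypothetical helper: returns the sizes of the batches that
// Sequence.chunked produces for `total` elements.
fun batchSizes(total: Int, batchSize: Int): List<Int> =
    (1..total).asSequence()
        .chunked(batchSize)            // groups by element count only
        .map { chunk -> chunk.size }   // keep just the size of each batch
        .toList()
```

Calling batchSizes(2250, 1000) yields [1000, 1000, 250], which matches the three batches described in the question.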
However, the kotlinx.coroutines library introduces the concept of a Channel, which is a rough analogue of a sequence for the asynchronous world, where operations may take some time to complete and do not block threads while doing so. You can read more in this guide.
It does not provide a ready-to-use chunked operator, but makes it straightforward to write one. You can use the following code:
import kotlinx.coroutines.experimental.channels.*
import kotlinx.coroutines.experimental.selects.*
fun <T> ReceiveChannel<T>.chunked(size: Int, time: Long) =
    produce<List<T>>(onCompletion = consumes()) {
        while (true) { // this loop goes over each chunk
            val chunk = mutableListOf<T>() // current chunk
            val ticker = ticker(time) // time-limit for this chunk
            try {
                whileSelect {
                    ticker.onReceive {
                        false // done with chunk when timer ticks, takes priority over received elements
                    }
                    this@chunked.onReceive {
                        chunk += it
                        chunk.size < size // continue whileSelect if chunk is not full
                    }
                }
            } catch (e: ClosedReceiveChannelException) {
                return@produce // that is normal exception when the source channel is over -- just stop
            } finally {
                ticker.cancel() // release ticker (we don't need it anymore as we wait for the first tick only)
                if (chunk.isNotEmpty()) send(chunk) // send non-empty chunk on exit from whileSelect
            }
        }
    }
As you can see from this code, it embeds some non-trivial decisions about what to do in corner cases. What should we do if the timer expires but the current chunk is still empty? This code starts a new time interval and does not send the previous (empty) chunk. Do we finish the current chunk on a timeout after the last element, measure time from the first element, or measure time from the beginning of the chunk? This code does the latter.
This code is completely sequential: its logic is easy to follow step by step (there is no concurrency inside the code). One can adjust it to any project-specific requirements.
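The code above targets the old kotlinx.coroutines.experimental packages. As a rough sketch of the same idea against the current package names, the version below swaps the ticker for select's onTimeout clause and takes the source channel as a parameter. The name chunkedByTime is hypothetical, and this is one possible adaptation under those assumptions, not the original author's code. It keeps the same corner-case decisions: time is measured from the start of each chunk, and empty chunks are never sent.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.*
import kotlinx.coroutines.selects.*

// Hypothetical name. Emits lists of at most `size` elements; a partial list is
// flushed once `timeMillis` has elapsed since that chunk was started.
@OptIn(ExperimentalCoroutinesApi::class)
fun <T> CoroutineScope.chunkedByTime(
    source: ReceiveChannel<T>,
    size: Int,
    timeMillis: Long
): ReceiveChannel<List<T>> = produce {
    var sourceClosed = false
    while (!sourceClosed) {                    // one iteration per chunk
        val chunk = mutableListOf<T>()
        val deadline = System.currentTimeMillis() + timeMillis
        whileSelect {
            // Time left for this chunk; a zero or negative value selects immediately.
            onTimeout(deadline - System.currentTimeMillis()) { false }
            source.onReceiveCatching { result ->
                if (result.isSuccess) {
                    chunk += result.getOrThrow()
                    chunk.size < size          // keep selecting until the chunk is full
                } else {
                    sourceClosed = true        // source is closed: flush and stop
                    false
                }
            }
        }
        if (chunk.isNotEmpty()) send(chunk)    // empty chunks are never sent
    }
}
```

With the question's numbers, this would be called from a coroutine scope as chunkedByTime(messages, 1000, 300_000), where messages is assumed to be a ReceiveChannel fed by the stream.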

Related

What is the difference between limitedParallelism vs a fixed thread pool dispatcher?

I am trying to use Kotlin coroutines to perform multiple HTTP calls concurrently, rather than one at a time, but I would like to avoid making all of the calls concurrently, to avoid rate limiting by the external API.
If I simply launch a coroutine for each request, they are all sent almost instantly. So I looked into the limitedParallelism function, which sounds very close to what I need and which some Stack Overflow answers suggest is the recommended solution. Older answers to the same question suggested using newFixedThreadPoolContext.
The documentation for that function mentioned limitedParallelism as a preferred alternative "if you do not need a separate thread pool":
If you do not need a separate thread-pool, but only have to limit effective parallelism of the dispatcher, it is recommended to use CoroutineDispatcher.limitedParallelism instead.
However, when I write my code to use limitedParallelism, it does not reduce the number of concurrent calls, compared to newFixedThreadPoolContext which does.
In the example below, I replace my network calls with Thread.sleep, which does not change the behavior.
// method 1
val fixedThreadPoolContext = newFixedThreadPoolContext(2, "Pool") // the name argument is required
// method 2
val limitedParallelismContext = Dispatchers.IO.limitedParallelism(2)
runBlocking {
    val jobs = (1..1000).map {
        // swap out the dispatcher here
        launch(limitedParallelismContext) {
            println("started $it")
            Thread.sleep(1000)
            println(" finished $it")
        }
    }
    jobs.joinAll()
}
The behavior for fixedThreadPoolContext is as expected: no more than 2 of the coroutines run at a time, and the total time to finish is several minutes (1000 times one second each, divided by two at a time, roughly 500 seconds).
However, for limitedParallelismContext, all the "started" lines print immediately, and one second later all the "finished" lines print and the program completes in just over 1 second in total.
Why does limitedParallelism not have the same effect as using a separate thread pool? What does it accomplish?
I modified your code slightly so that every coroutine takes 200ms to complete and it prints the time when it is completed. Then I pasted it to play.kotlinlang.org to check:
/**
 * You can edit, run, and share this code.
 * play.kotlinlang.org
 */
import kotlinx.coroutines.*
fun main() {
    // method 1
    val fixedThreadPoolContext = newFixedThreadPoolContext(2, "Pool")
    // method 2
    val limitedParallelismContext = Dispatchers.IO.limitedParallelism(2)
    runBlocking {
        val jobs = (1..10).map {
            // swap out the dispatcher here
            launch(limitedParallelismContext) {
                println("it at ${System.currentTimeMillis()}")
                Thread.sleep(200)
            }
        }
        jobs.joinAll()
    }
}
And there, using Kotlin 1.6.21, the result is as expected:
it at 1652887163155
it at 1652887163157
it at 1652887163358
it at 1652887163358
it at 1652887163559
it at 1652887163559
it at 1652887163759
it at 1652887163759
it at 1652887163959
it at 1652887163959
Only 2 coroutines are executed at a time.
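Instead of reading timestamps by eye, the limit can also be checked by counting how many coroutines are inside the blocking section at once. The sketch below is only an illustration: the function name maxObservedParallelism and the 50 ms sleep are assumptions, and Thread.sleep again stands in for the network call.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper: runs `tasks` blocking jobs on a limited view of
// Dispatchers.IO and returns the highest number seen running at once.
@OptIn(ExperimentalCoroutinesApi::class)
fun maxObservedParallelism(limit: Int, tasks: Int): Int = runBlocking {
    val dispatcher = Dispatchers.IO.limitedParallelism(limit)
    val running = AtomicInteger(0)   // coroutines currently inside the blocking section
    val maxSeen = AtomicInteger(0)   // high-water mark of `running`
    (1..tasks).map {
        launch(dispatcher) {
            val now = running.incrementAndGet()
            maxSeen.updateAndGet { current -> maxOf(current, now) }
            Thread.sleep(50)         // stands in for the blocking network call
            running.decrementAndGet()
        }
    }.joinAll()
    maxSeen.get()
}
```

With limit = 2, the returned value should never exceed 2, which is consistent with the paired timestamps in the output above.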

How do I add a short delay so the user can see every number that was rolled? Kotlin, Android Studio

Let's say I'm making a simple D&D dice roller (because I am). It rolls a bunch of random numbers based on how many dice the user wants rolled and the type of dice. It then sends them to a text view one at a time (which is what I want). However, it only shows one number, because there is no delay to let the user see each number rolled (it only shows the last number).
How would I do that?
else if (numTimesRolled.progress <= 4) {
    for (i in 0 until numTimesRolled.progress) {
        randNum = Random.nextInt(1, diceIsComfirm)
        resultsArray[i] = randNum.toString()
    }
    for (i in 0 until numTimesRolled.progress) {
        randNumDisplay.text = resultsArray[i]
    }
}
A non-coroutine solution is to post Runnables:
val delayPerNumber = 500L // 500ms
for (i in 0 until numTimesRolled.progress) {
    randNumDisplay.postDelayed({ randNumDisplay.text = resultsArray[i] }, i * delayPerNumber)
}
With a coroutine:
lifecycleScope.launch {
    for (i in 0 until numTimesRolled.progress) {
        delay(500) // 500ms
        randNumDisplay.text = resultsArray[i]
    }
}
An advantage of the coroutine is that it will automatically stop if the Activity or Fragment is destroyed, so if the Activity/Fragment is closed while the coroutine is still running, it won't hold your obsolete views in memory.
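The cancellation behavior can be tried outside Android with a small sketch. Everything here is a stand-in: the function name shownBeforeCancel is hypothetical, the shown list plays the role of the TextView, and an explicit cancelAndJoin plays the role of lifecycleScope being cancelled when the screen is destroyed.

```kotlin
import kotlinx.coroutines.*

// Hypothetical stand-in for the dice roller: `shown` replaces the TextView,
// and cancelling the job replaces the Activity/Fragment being destroyed.
fun shownBeforeCancel(results: List<String>): List<String> = runBlocking {
    val shown = mutableListOf<String>()
    val job = launch {
        for (result in results) {
            delay(100)        // pause so each value stays visible for a moment
            shown += result   // on Android: randNumDisplay.text = result
        }
    }
    delay(250)                // the "screen" goes away part-way through
    job.cancelAndJoin()       // no further updates happen after this point
    shown.toList()
}
```

Called with four results, this returns only the first few of them, because the loop is cancelled at its next delay instead of running to the end.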

Emitting flow values asynchronously with Kotlin's Flow

I am building a simple Spring service with Kotlin and WebFlux.
I have an endpoint which returns a flow. The flow contains elements which take a long time to compute, which is simulated by a delay.
It is constructed like this:
suspend fun latest(): Flow<Message> {
    println("generating messages")
    return flow {
        for (i in 0..20) {
            println("generating $i")
            if (i % 2 == 0) delay(1000)
            else delay(200)
            println("generated messsage $i")
            emit(generateMessage(i))
        }
        println("messages generated")
    }
}
My expectation was that it would return Message1 followed by Message3, Message5... and then Message0, because of the different delays the individual generation takes.
But in reality the flow contains the elements in order.
I guess I am missing something important about coroutines and flow. I tried different things to achieve what I want with coroutines, but I can't figure out how.
Solution
As pointed out by Marko Topolnik and William Reed, using channelFlow works.
fun latest(): Flow<Message> {
    println("generating numbers")
    return channelFlow {
        for (i in 0..20) {
            launch {
                send(generateMessage(i))
            }
        }
    }
}
suspend fun generateMessage(i: Int): Message {
    println("generating $i")
    val time = measureTimeMillis {
        if (i % 2 == 0) delay(1000)
        else delay(500)
    }
    println("generated messsage $i in ${time}ms")
    return Message(UUID.randomUUID(), "This is Message $i")
}
When run, the results are as expected:
generating numbers
generating 2
generating 0
generating 1
generating 6
...
generated messsage 5 in 501ms
generated messsage 9 in 501ms
generated messsage 13 in 501ms
generated messsage 15 in 505ms
generated messsage 4 in 1004ms
...
Once you go concurrent with the computation of each element, your first problem will be to figure out when all the computation is done.
You have to know in advance how many items to expect. So it seems natural to me to construct a plain List<Deferred<Message>> and then await on all the deferreds before returning the entire thing. You aren't getting any mileage from the flow in your case, since flow is all about doing things synchronously, inside the flow collection.
You can also use channelFlow in combination with a known count of messages to expect, and then terminate the flow based on that. The advantage would be that Spring can start collecting the flow earlier.
EDIT
Actually, the problem of the count isn't present: the flow will automatically wait for all the child coroutines you launched to complete.
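For comparison, the List<Deferred<Message>> approach mentioned above could look roughly like the sketch below. The Message class is simplified here (an Int id instead of a UUID), the delays are shortened, and latestAsList is a hypothetical name. Each message is computed in its own coroutine, and awaitAll returns the results in the original order once all of them are done.

```kotlin
import kotlinx.coroutines.*

// Simplified stand-in for the question's Message class (assumption).
data class Message(val id: Int, val text: String)

// Simulates the slow computation with shorter delays than in the question.
suspend fun generateMessage(i: Int): Message {
    delay(if (i % 2 == 0) 100L else 20L)
    return Message(i, "This is Message $i")
}

// Hypothetical name: starts one coroutine per message and waits for all of them.
suspend fun latestAsList(count: Int): List<Message> = coroutineScope {
    (0 until count)
        .map { i -> async { generateMessage(i) } }  // all computations run concurrently
        .awaitAll()                                 // results keep the input order
}
```

The trade-off against channelFlow is that nothing is delivered until the slowest message is ready, whereas channelFlow hands each message over as soon as it is computed.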
Your current approach uses a single coroutine for the entire function, including the for loop. That means that any call to a suspend fun, e.g. delay, will suspend that entire coroutine until it completes. It does free up the thread to do other work, but the current coroutine makes no progress in the meantime.
It's hard to say what the right solution is based on your simplified example. If you truly did want a new coroutine for each iteration of the for loop, you could launch it there, but it isn't clear from the information given that this is the right solution.

Repeat request multiple times with different params using RxJava

I need to load some data from server page by page until all the data is loaded. The data is considered to be fully loaded if at some point I received fewer items than I've requested. This is the working solution that I have right now:
return Observable.fromCallable { 0 }
    .delay(500, TimeUnit.MILLISECONDS)
    .repeat()
    .scan { previousPage, _ -> previousPage + 1 }
    .concatMap { doLongFetch(it) }
    .takeUntil { it.size < 100 }
fun doLongFetch(page: Int): Observable<List<ListItem>> {
    // Here I do the loading
}
However, there's a problem with the source observable. As you can see, it emits new values every 500 milliseconds to provide some input for the scan function. The delay is required since otherwise, it would emit thousands of values in a very short period of time, which is not required at all. Ideally, I want to remove that delay completely and make sure that the source observable emits another value only after the downstream has handled the previous one (meaning that the data has been requested and processed).
Any ideas on how I can do that?
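As a point of reference for the stopping rule only, here is the same paging logic written as a plain blocking loop in Kotlin. This is not RxJava and not a drop-in replacement for the chain above; fetchAllPages and its fetchPage parameter are hypothetical names. It shows the behavior being asked for: the next page is requested only after the previous one has been handled, and loading stops at the first page that is shorter than the requested size.

```kotlin
// Hypothetical helper: `fetchPage` stands in for the real loading call
// and returns the items of the given zero-based page.
fun <T> fetchAllPages(pageSize: Int, fetchPage: (Int) -> List<T>): List<T> {
    val all = mutableListOf<T>()
    var page = 0
    while (true) {
        val items = fetchPage(page)          // request exactly one page
        all += items                         // handle it before asking for more
        if (items.size < pageSize) break     // a short page means the data is complete
        page++
    }
    return all
}
```

For 250 items and a page size of 100, this requests pages 0, 1 and 2 and then stops, with no timer involved.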

How to limit JProfiler to a subtree

I have a method called com.acmesoftware.shared.AbstractDerrivedBean.getDerivedUniqueId(). When I profile the application with JProfiler, this method, getDerivedUniqueId(), is essentially buried 80 methods deep, as expected. The method is invoked on behalf of every bean in the application. I'm trying to record the CPU call tree starting with this method down to the leaf nodes (i.e., one of the excluded classes).
I tried the following, but it didn't produce the expected outcome:
Find a method above the method targeted for profiling, e.g., markForDeletion().
Set a trigger to start recording at getDerivedUniqueId().
Set a trigger to STOP recording at markForDeletion().
I was expecting to only see everything below markForDeletion(), but I saw everything up to but not INCLUDING getDerivedUniqueId(), which is the opposite of my intended goal. Worse yet, even with 5ms sampling, this trigger increased the previous running time from 10 minutes to "I terminated after 3 hours of running". It seems the trigger is adding a giant amount of overhead on top of the overhead. Hence, even if I figure out how to correctly enable the trigger, the added overhead would seem to render it ineffective.
The reason I need to limit the recording to just this method is: When running in 5ms sampling mode, the application completes in 10 minutes. When I run it in full instrumentation, I've waited 3 hours and it still hasn't completed. Hence, I need to turn on full instrumentation ONLY after getDerivedUniqueId() is invoked and pause profiling when getDerivedUniqueId() is exited.
-- Updated/Edit:
Thank you Ingo Kegel for your assistance.
I am likely not clear on how to use triggers. In the code below, I set triggers as shown after the code. My expectation is that when I profile the application with JProfiler (both sampling and full instrumentation) with the triggers configured as below, if the boolean isCollectMetrics is false, I should see 100% or 99.9% of CPU in filtered classes. However, that is not the case. The CPU tree does not seem to take the triggers into account.
Secondly, when isCollectMetrics is true, I expect the JProfiler call tree to start with startProfiling() and end at stopProfiling(). Again, this is not the case.
The method contains() is the bottleneck. It eventually calls one of 150 getDerivedUniqueId(). I am trying to pinpoint which getDerivedUniqueId() is causing the performance degradation.
public static final AtomicLong doEqualContentTime = new AtomicLong();
public static final AtomicLong instCount = new AtomicLong();
protected boolean contentsEqual(final InstanceSetValue that) {
    if (isCollectMetrics) {
        // initialization code removed for clarity
        // ..........
        // ..........
        final Set<Instance> c1 = getReferences();
        final Set<Instance> c2 = that.getReferences();
        long st = startProfiling(); /// <------- start here
        for (final Instance inst : c1) {
            instCount.incrementAndGet();
            if (!c2.contains(inst)) {
                long et = stopProfiling(); /// <------- stop here
                doEqualContentTime.addAndGet(et - st);
                return false;
            }
        }
        long et = stopProfiling(); /// <------- stop here
        doEqualContentTime.addAndGet(et - st);
        return true;
    } else {
        // same code path as above but w/o the profiling. code removed for brevity.
        // ......
        // ......
        return true;
    }
}
public long startProfiling() {
    return System.nanoTime();
}
public long stopProfiling() {
    return System.nanoTime();
}
public static void reset() {
    doEqualContentTime.set(0);
    instCount.set(0);
}
The enabled triggers:
startProfiling trigger:
stopProfiling trigger:
I've tried the 'Start Recordings' and 'Record CPU' buttons separately to capture the call tree only.
If the overhead with instrumentation is large, you should refine your filters. With good filters, the instrumentation overhead can be very small.
As for the trigger setup, the correct actions are:
"Start recording" with CPU data selected
"Wait for the event to finish"
"Stop recording" with CPU data selected