Hey, I am new to Kotlin Flow. I am trying to print a flow's size. As we know, a list has a size property. Do we have a similar function for Flow?
val list = mutableListOf(1, 2, 3)
println(list.size)
output
3
How do we get the size value in a flow?
dataMutableStateFlow.collectLatest { data ->
    ???
}
Thanks
A Flow doesn't know its size at any moment, because there is an unknown number of future values to be emitted. Also, Flows do not keep a record of how many values they have emitted in the past.
Sequences have the same problem. With both Flows and Sequences, you can only get the count by doing something terminal with them, something that iterates through it all.
The only way to get the size of the Flow is to do something that iterates through the entire Flow. For instance, you can call the suspend function count() on a Flow to get its size. The more complicated way to do it would be to create a count variable and then increment the count inside a collect call. However, counting the emissions of a Flow is only usable for finite cold Flows. Hot flows (SharedFlow and StateFlow) are never finite, and many cold Flows are also infinite.
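For a finite cold flow, the terminal count() call could look like this (a minimal sketch using kotlinx.coroutines):

```kotlin
import kotlinx.coroutines.flow.count
import kotlinx.coroutines.flow.flowOf
import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    // count() is terminal: it collects the entire flow,
    // so it only returns for flows that complete.
    val size = flowOf(1, 2, 3).count()
    println(size) // prints 3
}
```

For a hot flow like a StateFlow, this call would simply suspend forever, since the flow never completes.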
Related
I have 2 flows. The first flow updates every 50 ms. The second flow is equal to the first, but I want it to produce the latest value from the original flow every 300 ms. I found the debounce extension for flows, but it doesn't work (note from the docs):
Note that the resulting flow does not emit anything as long as the original flow emits items faster than every timeoutMillis milliseconds.
So when I try to emit a value every 300 ms with debounce, it doesn't emit at all, because the original flow is faster than that. So how can I make something like this:
delay 300ms
check original flow for latest value
emit this value
repeat
my flows right now:
// fast flow (50 ms)
val orientationFlow = merge(_orientationFlow, locationFlow)
val cameraBearingFlow = orientationFlow.debounce(300)
P.S. This approach doesn't fit either, because we are delaying the value itself, so it's not fresh after 300 ms. I need to get the freshest value after 300 ms:
val cameraBearingFlow = azimuthFlow.onEach {
    delay(ORIENTATION_UPDATE_DELAY)
}
You need sample instead of debounce. sample periodically emits the most recent value produced by the upstream flow, which matches your "delay 300 ms, then take the freshest value" steps:
val f = fastFlow.sample(300)
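A small sketch with made-up timings to show the difference: a flow emitting every 50 ms, sampled every 300 ms, yields roughly one value per sample period instead of being suppressed entirely the way debounce would:

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.sample
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    // Emits 0..19, one value every 50 ms (about one second in total).
    val fastFlow = flow {
        repeat(20) { i ->
            emit(i)
            delay(50)
        }
    }
    // sample(300) emits the latest upstream value every 300 ms.
    val sampled = fastFlow.sample(300).toList()
    println(sampled) // e.g. [4, 10, 16], exact values depend on timing
}
```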
Assume we have a Finder that returns multiple objects
fun findAll(): Flux<CarEntity> {
    return carRepository.findAll()
}
Because we want to apply some logic to all cars and passengers at the same time, we convert it to a Mono through .collectList()
val carsMono = carsFinder.findAll().collectList()
val passengersMono = passengerFinder.findAll().collectList()
return Mono.zip(carsMono, passengersMono)
In other words,
We have a list of entities of undefined length
We gather every item into a list until there are no more. How is this done without blocking the thread?
No, collectList() is not a blocking operator, but we need to be careful when we use it.
With this operator we wait until the upstream has emitted all of its elements. If the upstream is a never-ending stream, like a Kafka receiver or a processor, collectList() will keep accumulating elements until it runs into an OutOfMemoryError.
Similarly, a big stream like findAll() from a database will be collected into one big list, consuming a lot of memory or even causing an OutOfMemoryError.
If you know that you are dealing with a small number of elements, you are safe to go.
If you can avoid collecting and instead process the elements in the stream, that would be better.
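To make the last point concrete, here is a minimal sketch (the numeric source is a stand-in for a repository) of staying in the stream with reduce() instead of buffering everything with collectList():

```kotlin
import reactor.core.publisher.Flux

fun main() {
    // reduce() folds the elements as they arrive, so memory stays
    // constant; collectList() would first buffer all 1,000,000 items.
    val sum = Flux.range(1, 1_000_000)
        .map { it.toLong() }
        .reduce(0L) { acc, v -> acc + v }
        .block() // block() only for this standalone demo
    println(sum) // 500000500000
}
```

The same idea applies to any per-element or incremental logic: keep it inside the Flux chain and only collect when the result itself must be a list.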
I have a requirement like this.
Flux<Integer> s1 = .....;
s1.flatMap(value -> anotherSource.find(value));
I need a way to stop s1 when anotherSource.find gives me its first empty result. How can I do that?
Note:
One possible solution is to throw an error and then catch it to stop:
anotherSource.find(value).switchIfEmpty(Mono.error(..))
I am looking for better solution than this.
You won't find a specific operator for this; you'll have to combine operators to achieve it. (Note that this doesn't make it a "hack" per se; reactive frameworks are generally intended to be used by combining basic operators together to achieve your use-case.)
I would agree that using an error to achieve this is far from ideal though, as it potentially disrupts the flow of real errors in the reactive chain, so that should really be a last resort.
The approach I've generally taken in cases where I want the stream to stop based on an inner publisher is to materialise the inner stream, filter out the onComplete() signals and then re-add the onComplete() wherever appropriate (in this case, if it's empty.) You can then dematerialise the outer stream and it'll respond to the completed signal wherever you've injected it, stopping the stream:
s1.flatMap(value ->
        anotherSource
            .find(value)
            .materialize()
            .filter(s -> !s.isOnComplete())
            .defaultIfEmpty(Signal.complete()))
    .dematerialize()
This has the advantage of preserving any error signals, while also not requiring another object or special value.
My goal is to benchmark the latency and the throughput of Apache Beam on a streaming data use-case with different window queries.
I want to create my own data with an on-the-fly data generator to control the data generation rate manually and consume this data directly from a pipeline without a pub/sub mechanism, i.e. I don't want to read the data from a broker, etc. to avoid bottlenecks.
Is there a way of doing something similar to what I want to achieve? Or is there any source code for such a use-case with the Beam SDKs?
So far I couldn't find a starting point; existing code samples use a pub/sub mechanism and assume the data comes from somewhere.
Thank you for suggestions in advance.
With regards to on-the-fly data, one option would be to make use of GenerateSequence, for example:
pipeline.apply(GenerateSequence.from(0).withRate(RATE,Duration.millis(1000)))
To create other types of objects, you can add a ParDo after it to consume the Long and turn it into something else:
Pipeline p = Pipeline.create(PipelineOptionsFactory.create());

p.apply(GenerateSequence.from(0).withRate(2, Duration.millis(1000)))
    .apply(Window.into(FixedWindows.of(Duration.standardSeconds(1))))
    .apply(FlatMapElements.into(TypeDescriptors.kvs(TypeDescriptors.strings(), TypeDescriptors.strings()))
        .via(i -> IntStream.range(0, 2)
            .mapToObj(k -> KV.of(String.format("Gen Value %s", i), String.format("FlatMap Value %s ", k)))
            .collect(Collectors.toList())))
    .apply(ParDo.of(new DoFn<KV<String, String>, String>() {
      @ProcessElement
      public void process(@Element KV<String, String> input) {
        LOG.info("Value was {}", input);
      }
    }));

p.run();
That should generate values like:
Value was KV{Gen Value 0, FlatMap Value 0 }
Value was KV{Gen Value 0, FlatMap Value 1 }
Value was KV{Gen Value 1, FlatMap Value 0 }
Value was KV{Gen Value 1, FlatMap Value 1 }
Value was KV{Gen Value 2, FlatMap Value 0 }
Value was KV{Gen Value 2, FlatMap Value 1 }
Some other things to keep in mind for your pipelines performance testing:
The Direct runner is designed for unit testing; it does useful things like simulating failures, which helps catch issues that would otherwise only be seen when running a production pipeline. It is not designed to help with performance testing, however. I would recommend always using one of the main runners for those types of integration tests.
Please be aware of the fusion optimization (Link to Docs). When using an artificial data source like GenerateSequence, you may need to do a GBK (GroupByKey) as the next step to allow the work to be parallelized. For the Dataflow runner, more info can be found here: Link to Docs
In general for performance testing, I would recommend testing the whole end-to-end pipeline. There are interactions with sources and sinks (for example, watermarks) which will not be exercised in a standalone pipeline.
Hope that helps.
I am just trying out the new Kotlin language. I came across sequences, which can generate infinite lists. I generated a sequence and tried printing the first 10 elements, but the code below didn't print anything:
fun main(args: Array<String>) {
    val generatePrimeFrom2 = generateSequence(3) { it + 2 }
    print(generatePrimeFrom2.toList().take(10))
}
But when I changed it to take(10).toList() in the print statement, it worked fine. Why is that?
This code worked fine for me:
fun main(args: Array<String>) {
    val generatePrimeFrom2 = generateSequence(3) { it + 2 }
    print(generatePrimeFrom2.take(10).toList())
}
The generateSequence function generates a sequence that is either infinite or finishes when the lambda passed to it returns null. In your case, it is { it + 2 }, which never returns null, so the sequence is infinite.
When you call .toList() on a sequence, it will try to collect all sequence elements and thus will never stop if the sequence is infinite (unless the index overflows or an out-of-memory error happens), so it does not print anything because it does not finish.
In the second case, on the contrary, you limit the number of elements in the sequence with .take(10) before trying to collect its items. Then the .toList() call simply collects those 10 items and finishes.
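To illustrate the null-termination rule mentioned above, here is a sketch of a sequence whose lambda eventually returns null, so .toList() terminates even without .take() (the cutoff value 20 is arbitrary):

```kotlin
fun main() {
    // The lambda returns null once the value reaches 20 or more,
    // which ends the sequence, so toList() can finish.
    val finite = generateSequence(3) { if (it < 20) it + 2 else null }
    println(finite.toList()) // [3, 5, 7, 9, 11, 13, 15, 17, 19, 21]
}
```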
It may become more clear if you check this Q&A about differences between Sequence<T> and Iterable<T>: (link)
Here is the hint: it generates an infinite list. In the first snippet you first try to create the full list (and wait forever), then take the first 10 elements.
In the second snippet, you take only the first 10 elements from the infinite sequence and then turn them into a list.
generatePrimeFrom2.toList() tries to compute/create an infinite-length list.
generatePrimeFrom2.toList().take(10) would then take the first 10 elements from that infinite-length list.
It never prints because it is still calculating that infinite-length list.
Whereas generatePrimeFrom2.take(10) only computes the first 10 elements.
generatePrimeFrom2.take(10).toList() then converts those first 10 elements into a list.
You see, generateSequence(3) { it + 2 } never ends, so it has infinite length.
Sequences do not hold actual values; elements are calculated only when they are needed, whereas lists have to hold actual values.
I came across sequences which generate infinite list.
This is not actually correct. The main point is that a sequence is not a list. It is a lazily evaluated construct, and only the items you request will actually become "materialized", i.e., have their memory allocated on the heap.
That's why it's not interchangeable to write
infiniteSeq.toList().take(10)
and
infiniteSeq.take(10).toList()
The former will try to instantiate infinitely many items—and, predictably, fail at it.
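The lazy materialization can be made visible with a side effect; a small sketch (the counter is made up for illustration):

```kotlin
fun main() {
    var computed = 0 // counts how often the successor lambda actually runs
    val infiniteSeq = generateSequence(1) { computed++; it + 1 }

    // take(10) before toList(): only 10 items are ever materialized.
    println(infiniteSeq.take(10).toList()) // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    println(computed) // prints 9: the seed plus nine successor computations
}
```

Swapping the order to toList().take(10) would force the counter (and the heap) to grow without bound before take(10) ever ran.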