Terminal operator for a Kotlin sequence that discards the output? - kotlin

I'm operating on a very large Kotlin sequence. I execute my logic on every step of the sequence, and I never need to keep the whole sequence in memory.
Currently my code looks like this:
hugeSequence
    .filter { ... }
    .map { ... }
    .onEach {
        callExpensiveOperation(it)
    }
    .toList() // <- this feels wrong
The toList() at the bottom is the terminal operator, but I'm worried that Kotlin may try to create a huge list in memory before realising that I'm not even assigning the result of that operation.
Is there any other terminal operator I can use just to trigger the sequence to start?

Use forEach instead of onEach. It is the terminal equivalent of onEach.
hugeSequence
    .filter { ... }
    .map { ... }
    .forEach {
        callExpensiveOperation(it)
    }
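To check that forEach pulls elements through the pipeline one at a time and never materialises a list, here is a minimal sketch. The body of callExpensiveOperation, the runPipeline helper, and the numeric pipeline are stand-ins invented for illustration; they are not the asker's code.

```kotlin
// Hypothetical stand-in for the expensive operation from the question:
// it only counts how often it was called.
var callCount = 0

fun callExpensiveOperation(value: Int) {
    callCount++
}

// Builds a lazy pipeline over the numbers 1..limit and consumes it with forEach.
// forEach returns Unit, so no intermediate or final collection is created.
fun runPipeline(limit: Int): Int {
    callCount = 0
    generateSequence(1) { it + 1 }
        .take(limit)
        .filter { it % 2 == 0 }   // keep even numbers only
        .map { it * 10 }
        .forEach { callExpensiveOperation(it) }
    return callCount
}

fun main() {
    // Half of the numbers in 1..1_000_000 are even.
    println(runPipeline(1_000_000)) // prints 500000
}
```

Because each element is filtered, mapped, and handed to the lambda before the next one is produced, memory use stays flat regardless of the sequence length.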

Related

How to iterate until a condition is met using Kotlin and functional programming?

I'm using an API that returns a text like this:
BW3511,HGP,ITP,Canceled,32.
I have to continue fetching until I get a response that is not "Canceled".
This code fetches the data:
val flightResponse = async {
    println("Started fetching Flight info.")
    client.get<String>(FLIGHT_ENDPOINT).also {
        println("Finished fetching Flight info.")
    }
}
client.get can only be called within the coroutineScope body, and the type of flightResponse is Deferred<String>.
Check whether it is canceled:
fun isCanceled(
    flightResponse: String
): Boolean {
    val (_, _, _, status, _) = flightResponse.split(",")
    return status == "Canceled"
}
How can I repeat client.get<String>(FLIGHT_ENDPOINT) until my condition is met, using a functional programming style?
I tried using takeIf, but I have to get at least one result and it cannot be a nullable type.
As said in the comment by @Jorn, this looks like an overuse of functional style. It can be implemented with a simple loop, which will probably be clearer to the reader:
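// client.get is a suspending call, so the function must be suspend and declare its return type.
suspend fun getNextNotCancelled(): String {
    while (true) {
        val response = client.get<String>(FLIGHT_ENDPOINT)
        if (!isCanceled(response)) return response
    }
}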
If your real case is more complex, so you have several filters, etc. or for any other reason you really need to do this declaratively, then you need to create some kind of an infinite generator. For classic synchronous code that means sequence and for asynchronous - flow.
Example using a sequence:
generateSequence { client.get<String>(FLIGHT_ENDPOINT) }
    .first { !isCanceled(it) }
Flow:
flow {
    while (true) {
        emit(client.get<String>(FLIGHT_ENDPOINT))
    }
}.first { !isCanceled(it) }
As you said you use coroutines, I assume you would like to go for the latter. And as you can see, it is pretty similar to our initial loop-based approach, only more complicated. Of course, we can create a similar generateFlow() utility function and then it would be shorter.
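For the synchronous sequence variant, a minimal sketch shows that fetching stops as soon as a non-canceled response arrives. FakeFlightClient, its fetch() method, the "OnTime" status, and firstNotCanceled are hypothetical names made up for this illustration; fetch() stands in for a blocking call and is not the real client API.

```kotlin
// Hypothetical synchronous stand-in for client.get<String>(FLIGHT_ENDPOINT):
// the first two responses are "Canceled", every later one is "OnTime".
class FakeFlightClient {
    var calls = 0
        private set

    fun fetch(): String {
        calls++
        return if (calls < 3) "BW3511,HGP,ITP,Canceled,32" else "BW3511,HGP,ITP,OnTime,32"
    }
}

// Same check as in the question: the status is the fourth comma-separated field.
fun isCanceled(flightResponse: String): Boolean {
    val (_, _, _, status, _) = flightResponse.split(",")
    return status == "Canceled"
}

// generateSequence produces responses lazily, one per request;
// first { ... } stops pulling as soon as the predicate matches.
fun firstNotCanceled(client: FakeFlightClient): String =
    generateSequence { client.fetch() }
        .first { !isCanceled(it) }

fun main() {
    val client = FakeFlightClient()
    println(firstNotCanceled(client)) // prints BW3511,HGP,ITP,OnTime,32
    println(client.calls)             // prints 3
}
```

The result type is a plain String rather than a nullable one, which addresses the problem the asker ran into with takeIf.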

Compare two sets of files with coroutines in Kotlin

I have written a function that scans files (pictures) from two lists and checks whether a file is in both lists.
The code below works as expected, but for large sets it takes some time. So I tried to do this in parallel with coroutines. But with sets of 100 sample files the program was always slower than without coroutines.
The code:
private fun doJob() {
    var i = 0 // total number of duplicates found (declaration missing from the original snippet; assumed to be a counter)
    val filesToCompare = File("C:\\Users\\Tobias\\Desktop\\Test").walk().filter { it.isFile }.toList()
    val allFiles = File("\\\\myserver\\Photos\\photo").walk().filter { it.isFile }.toList()
    println("Files to scan: ${filesToCompare.size}")
    filesToCompare.forEach { file ->
        var multipleDuplicate = 0
        var s = "This file is a duplicate"
        s += "\n${file.absolutePath}"
        allFiles.forEach { possibleDuplicate ->
            if (file != possibleDuplicate) { // only needed when both lists are the same
                // Only files whose name contains this file's name get the byte comparison
                if (possibleDuplicate.nameWithoutExtension.contains(file.nameWithoutExtension)) {
                    try {
                        if (Files.mismatch(file.toPath(), possibleDuplicate.toPath()) == -1L) {
                            s += "\n${possibleDuplicate.absolutePath}"
                            i++
                            multipleDuplicate++
                            println(s)
                        }
                    } catch (e: Exception) {
                        println(e.message)
                    }
                }
            }
        }
        if (multipleDuplicate > 1) {
            println("This file has $multipleDuplicate duplicate(s)")
        }
    }
    println("Files scanned: ${filesToCompare.size}")
    println("Total number of duplicates found: $i")
}
How have I tried to add the coroutines?
I wrapped the code inside the first forEach in launch { ... }. The idea was that a coroutine starts for each file and the second loop runs concurrently. I expected the program to run faster, but in fact it took about the same time or was slower.
How can I make this code run faster in parallel?
Running each inner loop in a coroutine seems to be a decent approach. The problem might lie in the dispatcher you were using. If you used runBlocking and launch without a context argument, you were using a single thread to run all your coroutines.
Since there is mostly blocking IO here, you could instead use Dispatchers.IO to launch your coroutines, so they are dispatched on multiple threads. The parallelism should be automatically limited to 64 threads (or the number of CPU cores, if that is larger), but if your memory can't handle that, you can also use Dispatchers.IO.limitedParallelism(n) to reduce the number of threads.

Is it considered bad convention if, when iterating through two maps, I don't check whether a key exists in one of them?

I have two maps, let's call them oneMap and twoMap.
I am iterating through all the keys in oneMap, and if the key exists in twoMap I do something like:
fun exampleFunc(oneMap: Map<String, Any>, twoMap: Map<String, Any>) {
    for ((oneMapKey, oneMapVal) in oneMap) {
        if (twoMap.containsKey(oneMapKey)) {
            val twoMapVal = twoMap[oneMapKey]
            if (twoMapVal == oneMapVal) {
                // do more stuff
            }
            // do more stuff, I have more if statements
        }
    }
}
To avoid having more nested if statements, I was wondering if instead I could get rid of the if (twoMap.containsKey(oneMapKey)) check. If twoMap doesn't contain oneMapKey, we get null, and my code still works fine. Is this considered bad convention, though?
fun exampleFunc(oneMap: Map<String, Any>, twoMap: Map<String, Any>) {
    for ((oneMapKey, oneMapVal) in oneMap) {
        val twoMapVal = twoMap[oneMapKey]
        if (twoMapVal == oneMapVal) {
            // do more stuff
        }
        // do more stuff, I have more if statements
    }
}
It depends. Do you wanna execute the "more stuff" or not?
If you do not wanna execute it, you should keep the if condition. Though, if you are concerned about indentation (and deep if hierarchies), you can consider skipping to the next iteration with continue:
for ((oneMapKey, oneMapVal) in oneMap) {
    if (!twoMap.contains(oneMapKey)) continue // continue with next iteration
    // do more stuff
}
If your map does not contain null values you can also get the value and check if the result was null (which means the key was not present in the map):
for ((oneMapKey, oneMapVal) in oneMap) {
    val twoMapVal: Any = twoMap[oneMapKey] ?: continue // continue with next iteration
    // do more stuff
}
So it's always good practice to remove useless code and (in my opinion) to have fewer if-hierarchies, as you can easily lose focus when you have lots of nested conditions.
As Tenfour04 says, omitting the containsKey() check is only an option if the map values aren't nullable; if they are, then []/get() gives no way to distinguish between a missing mapping and a mapping to a null value.
But if not (or if you want to ignore null values anyway), then I'd certainly consider omitting the check; the resulting code would be slightly shorter and slightly more efficient, without losing clarity or maintainability.  It could also avoid a potential race condition.  (Though in a multi-threaded situation, I'd be considering more robust protection!)
One variation is to use let() along with the safe-call ?. operator to restrict it to non-null cases:
for ((oneMapKey, oneMapVal) in oneMap) {
    twoMap[oneMapKey]?.let { twoMapVal ->
        if (twoMapVal == oneMapVal) {
            // Do more stuff
        }
        // Do more stuff
    }
}
Using ?.let() this way seems to be a fairly common idiom in Kotlin, so it should be fairly transparent.
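The caveat about nullable values can be seen in a small sketch. The countMatches helper and the sample maps are made up for illustration: with non-null values, a null lookup result can only mean "key missing", whereas with nullable values a lookup alone cannot tell the two cases apart.

```kotlin
// With non-nullable values, a null result from [] can only mean the key is absent,
// so the containsKey() check can be folded into the lookup.
fun countMatches(oneMap: Map<String, Any>, twoMap: Map<String, Any>): Int {
    var matches = 0
    for ((key, oneVal) in oneMap) {
        val twoVal = twoMap[key] ?: continue // key absent: skip to next iteration
        if (twoVal == oneVal) matches++
    }
    return matches
}

fun main() {
    println(countMatches(mapOf("a" to 1, "b" to 2), mapOf("a" to 1, "c" to 3))) // prints 1

    // With nullable values, [] returns null both for a stored null and for a missing key.
    val nullable: Map<String, Int?> = mapOf("a" to null)
    println(nullable["a"] == null)      // prints true (key present, value is null)
    println(nullable["b"] == null)      // prints true (key absent)
    println(nullable.containsKey("a"))  // prints true
    println(nullable.containsKey("b"))  // prints false
}
```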

How to asynchronously map over sequence

I want to iterate over a sequence of objects and return the first non-null of an async call.
The point is to perform some kind of async operation that might fail, and I have a series of fallbacks that I want to try in order, one after the other (i.e. lazily / not in parallel).
I've tried to do something similar to what I'd do if it were a sync call:
// ccs: List<CurrencyConverter>
override suspend fun getExchangeRateAsync(from: String, to: String) =
    ccs.asSequence()
        .map { it.getExchangeRateAsync(from, to) }
        .firstOrNull { it != null }
        ?: throw CurrencyConverterException()
IntelliJ complains:
Suspension functions can only be called within coroutine body
Edit: To clarify, this works as expected if mapping on a List, but I want to see how I'd do this on a sequence.
So I guess this is because the map lambda isn't suspended? But I'm not sure how to actually do that. I tried a bunch of different ways but none seemed to work. I couldn't find any examples.
If I re-write this in a more procedural style using a for loop with an async block, I can get it working:
// The function returns res, so it needs an explicit BigDecimal return type.
override suspend fun getExchangeRateAsync(from: String, to: String): BigDecimal {
    for (cc in ccs) {
        val res: BigDecimal? = async {
            cc.getExchangeRateAsync(from, to)
        }.await()
        if (res != null) {
            return res
        }
    }
    throw CurrencyConverterException()
}
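You are getting an error because Sequence is lazy and its map isn't an inline function: the lambda you pass is an ordinary, non-suspending lambda that runs later, outside the coroutine body, so it cannot call suspending functions.
You can avoid using Sequence by creating a list of lazy coroutines: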
// ccs: List<CurrencyConverter>
suspend fun getExchangeRateAsync(from: String, to: String) =
    ccs
        .map { async(start = CoroutineStart.LAZY) { it.getExchangeRateAsync(from, to) } }
        .firstOrNull { it.await() != null }
        ?.getCompleted() ?: throw Exception()
This doesn't give any errors and seems to be working, but I'm not sure it's an idiomatic way.
I would suggest replacing Sequence with Flow. The Flow API and behavior are pretty much the same as for Sequence, but its operators accept suspending lambdas.
https://kotlinlang.org/docs/reference/coroutines/flow.html
Code:
override suspend fun getExchangeRateAsync(from: String, to: String) =
    ccs.asFlow()
        .map { it.getExchangeRateAsync(from, to) }
        .firstOrNull { it != null }
        ?: throw CurrencyConverterException()
FWIW, I found the suggestion in How to asynchronously map over sequence to be very intuitive. The code at https://github.com/Kotlin/kotlin-coroutines-examples/blob/master/examples/suspendingSequence/suspendingSequence.kt defines SuspendingIterator which allows next() to suspend, then builds SuspendingSequence on top of it. Unfortunately, you need to duplicate extension functions like flatMap(), filter(), etc. since SuspendingSequence can't be related to Sequence, but I did this and am much happier with the result than using a Channel.

How to avoid nested Single in RxJava2

I am fairly new to the RxJava paradigm. What I am doing leads to nested Single objects:
tickHappened.map {
    func(it)
}
// I get Single<Single<ArrayList<String>>>
Here tickHappened:Single<T> and func<T>(T param):Single<ArrayList<String>>
tickHappened.map {
    func(it)
}
// I get Single<Single<ArrayList<String>>>
.map { single ->
    single.map { list ->
        list.size
    }
}
I actually need to return Single<Int>, which is the size of the ArrayList passed. I need to use map twice in the above chain, which leads to Single<Single<Int>>.
Is there a way to avoid nesting Singles? If I understand RxJava correctly, it doesn't make sense to have a Single which encloses another Single. If not, is there a way to return Single<Int>?
As a beginner, one thing to learn is the flatMap operator that is available all around RxJava and is the most common operator needed for solving problems:
tickHappened
    .flatMap { func(it) }
    .map { it.size } // size is a property in Kotlin, as in the question's list.size