Why can't I use map on Kotlins Regex Result sequence - kotlin

I worked with Kotlin's Regex API to get occurences of some regular expression. I wanted to convert the finding directly into another object so I intuitively used map() on the result sequence.
I was very surprised that the map function is never called but forEach is working. This example should make it clear:
val regex = "a.".toRegex()
val txt = "abacad"
var counter = 0
regex.findAll(txt).forEach { counter++ }
println(counter) // 3
regex.findAll(txt).map { counter++ }
println(counter) // still 3 since map is not called
regex.findAll(txt).forEach { counter++ }
println(counter) // 6
My question is why? Did I oversee it in the documentation?
(tested on Kotlin 1.5.30)

findAll() returns a Sequence<MatchResult>. Operations on Sequence are classified either as intermediate or terminal. The documentation for the functions declares which type they are. map and onEach are intermediate. Their action is deferred until a terminal operation is made. forEach is terminal.
Manipulating a Sequence with map returns a new Sequence that will perform the mapping function only when it is actually iterated, such as by a call to forEach or using it in a for loop.
This is the purpose of Sequence, to defer mutating functional calls. It can reduce allocations of intermediate Lists, or in some cases avoid applying the mutations on every single item, such as if the terminal call in the chain is a find() call.

Related

MutableList of MutableLists in Kotlin: adding element error

Why I'm getting "java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0" while running next code??? :
val totalList = mutableListOf<MutableList<Int>>()
fun main() {
for (i in 0..15) {
for (j in 0..10) {
*some operations and calculations with **var element of type Int***
totalList[i].add(element)
}
}
}
I was thinking that in such case while iterating through 'j' it should add elements to mutableList[i], after this it should start adding elements to mutableList[i + 1] etc.... But instead I am recieving IndexOutOfBoundsException....
val totalList = mutableListOf<MutableList<Int>>()
All this does is create one list which is going to contain MutableList<Int> items. Right now, there's nothing in it (you've supplied no initial elements in the parentheses).
Skip forward a bit, and you do this:
totalList[0].add(element)
You're trying to get the first element of that empty list and add to it. But there is no first element (index 0) because the list is empty (length 0). That's what the error is telling you.
There's lots of ways to handle this - one thing you could do is create your lists up-front:
// create the 16 list items you want to access in the loop
// (the number is the item count, the lambda generates each item)
val totalList = MutableList(16) { mutableListOf<Int>() }
// then refer to that list's properties in your loop (no hardcoded 0..15)
for (i in totalList.indices) {
...
// guaranteed to exist since i is generated from the list's indices
totalList[i].add(element)
}
Or you could do it the way you are now, only using getOrElse to generate the empty list on-demand, when you try to get it but it doesn't exist:
for (i in 0..15) {
for (j in 0..10) {
// if the element at i doesn't exist, create a list instead, but also
// add it to the main list (see below)
totalList.getOrElse(i) {
mutableListOf<Int>().also { totalList.add(it) }
}.add(element)
}
}
Personally I don't really like this, you're using explicit indices but you're adding new list items to the end of the main list. That implicity requires that you're iterating over the list items in order - which you are here, but there's nothing enforcing that. If the order ever changed, it would break.
I'd prefer the first approach - create your structure of lists in advance, then iterate over those and fill them as necessary. Or you might want to consider arrays instead, since you have a fixed collection size you're "completing" by adding items to specific indices
Another approach (that I mentioned in the comments) is to create each list as a whole, complete thing, and then add that to your main list. This is generally how you do things in Kotlin - the standard library contains a lot of functional tools to allow you to chain operations together, transform things, and create immutable collections (which are safer and more explicit about whether they're meant to be changed or they're a fixed set of data).
for (i in 0..15) {
// map transforms each element of the range (each number) to an item,
// resulting in a list of items
val items = (0..10).map { j ->
// do whatever you're doing
// the last expression in the lambda is its resulting value,
// i.e. the item that ends up in the list
element
}
// now you have a complete list of items, add them to totalList
totalList.add(items)
}
(Or you could create the list directly with List(11) { j -> ... } but this is a more general example of transforming a bunch of things to a bunch of other things)
That example there is kinda half and half - you still have the imperative for loop going on as well. Writing it all using the same approach, you can get:
val totalList = (0..15).map { i ->
(0..10).map { j ->
// do stuff
element
}
}
I'd probably prefer the List(count) { i -> ... } approach for this, it's a better fit (this is a general example). That would also be better since you could use MutableList instead of List, if you really need them to be mutable (with the maps you could just chain .toMutableList() after the mapping function, as another step in the chain). Generally in Kotlin, collections are immutable by default, and this kind of approach is how you build them up without having to create a mutable list etc. and add items to it yourself

Need to filter lines from a sequence but keep it as a single sequence

I need to read a file from disk, filter out some rows based on conditions, then return the result as a single stream/sequence, not a sequence of strings. The file is too large to hold in memory all at once, so it must be treated as a Stream/Sequence throughout processing. This is what I tried.
File(filename).bufferedReader()
// break into lines
.lineSequence()
// filter each line based on condition
.filter{meetsSomeCondition(it)}
// add newline back in
.map{(it+"\n").byteInputStream()}
// reduce back into a single stream with Java's SequenceInputStream
.reduce<InputStream, ByteArrayInputStream> { acc, i -> SequenceInputStream(acc, i) }
This works when testing on a small file, but when using a large file it errors with a StackOverFlow exception. It seems that Java's SequenceInputStream can't handle repeatedly nesting itself like I do with the reduce call.
I see that SequenceInputStream also has a way of accepting an Enumeration argument that takes a List of elements. But that's the problem, as far as I can tell, it doesn't seem to accept a Stream.
Your code does not really do what you think it does. reduce() is a terminal operation, meaning that it consumes all elements in the sequence. After reduce() line the whole file has been read already. Also, it is not SequenceInputStream that does not support such reduce operation. You created a very long chain of objects. SequenceInputStream objects does not really know they were chained like this and they can't do too much about it.
Instead, you need to keep a sequence "alive" and create an InputStream that will read from the source sequence whenever required. I don't think there is an utility like this in the stdlib. It is a very specialized requirement.
The easiest is to create a sequence of bytes and then provide InputStream which reads from it:
File(filename).bufferedReader()
.lineSequence()
.filter{meetsSomeCondition(it)}
.flatMap { "$it\n".toByteArray().asIterable() }
.asInputStream()
fun Sequence<Byte>.asInputStream() = object : InputStream() {
val iter = iterator()
override fun read() = if (iter.hasNext()) iter.next().toUByte().toInt() else -1
}
However, sequence of bytes isn't really the best for performance. We can optimize it by reading line by line, so creating a sequence of strings or byte arrays:
File(filename).bufferedReader()
.lineSequence()
.filter{meetsSomeCondition(it)}
.map { "$it\n".toByteArray() }
.asInputStream()
fun Sequence<ByteArray>.asInputStream() = object : InputStream() {
val iter = iterator()
var curr = iter.next()
var pos = 0
override fun read(): Int {
return when {
pos < curr.size -> curr[pos++].toUByte().toInt()
!iter.hasNext() -> -1
else -> {
curr = iter.next()
pos = 0
read()
}
}
}
}
(Note this implementation of asInputStream() will fail for empty sequence)
Still, there is much room for improvement regarding the performance. We read from sequence line by line, but we still read from InputStream byte by byte. To improve it further we would need to implement more methods of InputStream to read in bigger chunks. If you really care about the performance then I suggest looking into BufferedInputStream implementation and try to re-use some of its codebase.
Also, remember to close the file reader that was created in the first step. It won't close automatically when InputStream will be closed.

How to repeat Mono while not empty

I have a method which returns like this!
Mono<Integer> getNumberFromSomewhere();
I need to keep calling this until it has no more items to emit. That is I need to make this as Flux<Integer>.
One option is to add repeat. the point is - I want to stop when the above method emits the first empty signal.
Is there any way to do this? I am looking for a clean way.
A built-in operator that does that (although it is intended for "deeper" nesting) is expand.
expand naturally stops expansion when the returned Publisher completes empty.
You could apply it to your use-case like this:
//this changes each time one subscribes to it
Mono<Integer> monoWithUnderlyingState;
Flux<Integer> repeated = monoWithUnderlyingState
.expand(i -> monoWithUnderlyingState);
I'm not aware of a built-in operator which would do the job straightaway. However, it can be done using a wrapper class and a mix of operators:
Flux<Integer> repeatUntilEmpty() {
return getNumberFromSomewhere()
.map(ResultWrapper::new)
.defaultIfEmpty(ResultWrapper.EMPTY)
.repeat()
.takeWhile(ResultWrapper::isNotEmpty)
}
// helper class, not necessarily needs to be Java record
record ResultWrapper(Integer value) {
public static final ResultWrapper EMPTY = new ResultWrapper(null);
public boolean isNotEmpty() {
return value != null;
}
}

Async Wait Efficient Execution

I need to iterate 100's of ids in parallel and collect the result in list. I am trying to do it in following way
val context = newFixedThreadPoolContext(5, "custom pool")
val list = mutableListOf<String>()
ids.map {
val result:Deferred<String> = async(context) {
getResult(it)
}
//list.add(result.await()
}.mapNotNull(result -> list.add(result.await())
I am getting error at
mapNotNull(result -> list.add(result.await())
as await method is not available. Why await is not applicable at this place? Instead commented line
//list.add(result.await()
is working fine.
What is the best way to run this block in parallel using coroutine with custom thread pool?
Generally, you go in the right direction: you need to create a list of Deferred and then await() on them.
If this is exactly the code you are using then you did not return anything from your first map { } block, so you don't get a List<Deferred> as you expect, but List<Unit> (list of nothing). Just remove val result:Deferred<String> = - this way you won't assign result to a variable, but return it from the lambda. Also, there are two syntactic errors in the last line: you used () instead of {} and there is a missing closing parenthesis.
After these changes I believe your code will work, but still, it is pretty weird. You seem to mix two distinct approaches to transform a collection into another. One is using higher-order functions like map() and another is using a loop and adding to a list. You use both of them at the same time. I think the following code should do exactly what you need (thanks #Joffrey for improving it):
val list = ids.map {
async(context) {
getResult(it)
}
}.awaitAll().filterNotNull()

How can I call a sequence.nextVal using kotlin exposed

We have a project where we use a Postgres sequence for generating an increasing number, but I cannot figure out how to actually use the sequence in kotlin exposed.
I see there is a Sequence class and a NextVal class encapsulating a sequence but those cannot be used by its own as far as I can see. I thought I could use Sequence.nextLongVal() but this one returns the NextVal class, no way to get the through value out of this one.
So how can I get the value of the nextVal() execution?
We stumbled across the same problem trying to utilize Sequence.nextLongVal() directly using Postgre and exposed. We found the following workaround.
A solution using exec
Assuming we have defined and created a sequence in our datasource:
val sequence = Sequence(/* our sequence's parameters */)
...
transaction {
SchemaUtils.createSequence(sequence)
}
We suggest to define a helper function to retrieve the next value of a given sequence using exposed's exec.
fun Transaction.nextValueOf(sequence: Sequence): Long = exec("SELECT nextval('${sequence.identifier}');") { resultSet ->
if (resultSet.next().not()) {
throw Error("Missing nextValue in resultSet of sequence '${sequence.identifier}'")
}
else {
resultSet.getLong(1)
}
} ?: throw Error("Unable to get nextValue of sequence '${sequence.identifier}'")
Now, we can use this function in a transaction as shown here:
transaction {
...
val nextValue = nextValueOf(sequence)
...
}