Kotlin: is filtering a list before getting the max value the best approach? - kotlin

If you have a list of objects object(category, value) and want to get the max value excluding some categories you can use something like this:
val max = objects.filter { it.category.name != xy }.maxByOrNull { it.value }
But this uses 2 iterators if I understand it correctly, so would there be a more performant version of this call using only one iterator?

That's correct. This code will first iterate over the whole list to filter the result, and then again to find the max.
I'll detail some alternatives below, but, overall, I would recommend against any of them without a very good justification. The performance benefits are usually marginal - spending time making sure the code is clear and well tested will be a better investment. I'd recommend sticking to your existing code.
Example
Here's an executable version of your code.
data class MyData(val categoryName: String, val value: Int)
fun main() {
val items = listOf(
MyData("shoes", 1),
MyData("shoes", 22),
MyData("shoes", 33),
MyData("car", 555),
)
val max = items
.filter {
println(" filter $it")
it.categoryName == "shoes"
}.maxByOrNull {
println(" maxByOrNull $it")
it.value
}
println("\nresult: $max")
}
There are two passes: filter runs over all four items first, and then maxByOrNull runs over the three items that remain.
filter MyData(categoryName=shoes, value=1)
filter MyData(categoryName=shoes, value=22)
filter MyData(categoryName=shoes, value=33)
filter MyData(categoryName=car, value=555)
maxByOrNull MyData(categoryName=shoes, value=1)
maxByOrNull MyData(categoryName=shoes, value=22)
maxByOrNull MyData(categoryName=shoes, value=33)
result: MyData(categoryName=shoes, value=33)
Sequences
In some situations you can use sequences to reduce the number of operations.
val max2 = items
.asSequence()
.filter {
println(" filter $it")
it.categoryName == "shoes"
}.maxByOrNull {
println(" maxByOrNull $it")
it.value
}
println("\nresult: $max2")
As you can see, the order of operations is different: the two steps are interleaved in a single pass, and no intermediate list is built.
filter MyData(categoryName=shoes, value=1)
filter MyData(categoryName=shoes, value=22)
maxByOrNull MyData(categoryName=shoes, value=1)
maxByOrNull MyData(categoryName=shoes, value=22)
filter MyData(categoryName=shoes, value=33)
maxByOrNull MyData(categoryName=shoes, value=33)
filter MyData(categoryName=car, value=555)
result: MyData(categoryName=shoes, value=33)
As the Kotlin documentation puts it:
[S]equences let you avoid building results of intermediate steps, therefore improving the performance of the whole collection processing chain.
However, the lazy nature of sequences adds some overhead which may be significant when processing smaller collections or doing simpler computations.
Note that in this small example, the benefits aren't worth it.
Combining operations
In your small example you could combine the 'filter' and 'maxBy' operations:
val max3 = items.maxByOrNull { data ->
when (data.categoryName) {
"shoes" -> data.value
"car" -> -1
else -> -1
}
}
println("\nresult: $max3")
result: MyData(categoryName=shoes, value=33)
I hope it's clear that this solution isn't immediately understandable, and has some nasty edge cases that would be a prime source of bugs. I won't detail the issues, but instead reiterate that ease of use, adaptability, and simple code are usually much more valuable than optimised code!
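For readers who want one concrete case anyway, here is a minimal sketch. It assumes MyData is a data class with categoryName and value (as the printed output suggests); filteredMax and combinedMax are names made up for this illustration. When no item belongs to the wanted category, the filtered version returns null, while the combined version still returns an item from an excluded category.

```kotlin
data class MyData(val categoryName: String, val value: Int)

// The original two-step approach: filter first, then take the max.
fun filteredMax(items: List<MyData>): MyData? =
    items.filter { it.categoryName == "shoes" }.maxByOrNull { it.value }

// The combined approach: excluded categories are scored as -1 instead of being removed.
fun combinedMax(items: List<MyData>): MyData? =
    items.maxByOrNull { data -> if (data.categoryName == "shoes") data.value else -1 }

fun main() {
    val onlyCars = listOf(MyData("car", 555))
    println(filteredMax(onlyCars)) // null
    println(combinedMax(onlyCars)) // MyData(categoryName=car, value=555)
}
```

The combined version can only be trusted if the caller checks the category of the result again, which removes most of its appeal.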

Related

How to make an ordered list of boolean array permutations with a given number of trues

Is there an efficient method to generate all possible arrays of booleans with a given number of "true" values?
Right now I'm incrementing a number and checking if its binary representation has the given number of 1s (and if so, adding that array). But this becomes extremely slow for larger givens.
This is the kind of input-output that I'm looking for:
(length: 4, trues: 2) -> [[1,1,0,0],[1,0,1,0],[0,1,1,0],[1,0,0,1],[0,1,0,1],[0,0,1,1]]
The trouble is doing it in less than O(2^N), and so that they're ordered as the little endian binary representations would be.
If it helps the length would be a fixed number at compile time (currently it's 64). I wrote it as an input because I might have to increase it to 128, but it won't vary during runtime.
You can define a recursive solution to this problem.
fn solution(length: u32, trues: u32) -> Vec<Vec<bool>>;
How do we formulate this function recursively? Let's think about the last element of the output arrays. If that last element is false, then the amount of true elements in the 0..length-1 elements must be trues. If that last element is true, then the amount of true elements in the 0..length-1 elements must be trues-1.
So we can just answer the problem for (length-1, trues) and extend each result with false, answer the problem for (length-1, trues-1) and extend each result with true, and then combine the two (putting the ends-with-false case first, so the arrays come out in ascending little-endian order, as in your example). Adding in some base cases, we get the following code:
fn solution(length: u32, trues: u32) -> Vec<Vec<bool>> {
    if trues > length {
        // no candidate arrays exist
        return vec![];
    }
    if length == 0 {
        // one array exists: the empty array
        return vec![vec![]];
    }
    if trues == 0 {
        // one array exists: the all-false array
        return vec![vec![false; length as usize]];
    }
    let mut result = Vec::new();
    // The last element is the most significant bit, so the arrays ending in false come first.
    for mut zeroes in solution(length-1, trues) {
        zeroes.push(false);
        result.push(zeroes);
    }
    for mut ones in solution(length-1, trues-1) {
        ones.push(true);
        result.push(ones);
    }
    result
}
If the time complexity for solution(L, T) is O(S(L,T)), then the time complexity of solution(L,T) can be recursively expressed as O(S(L-1,T-1) + S(L-1,T)), with the base cases S(0,0) = 1, S(L,L+1)=1, S(L,0) = L. This is the best achievable time complexity, since S(L,T) = S(L-1,T-1) + S(L-1,T) is also the recursive formula for how many arrays of length L and true-count T exist. This is also the same recurrence formula as the binomial recurrence equation, but the base cases are different. I will leave computing the time complexity as an exercise to the reader, and there are a couple of trivial optimizations to the above code that can be done to bring down the computation time further.

Kotlin: Split Sequence<T> by N items into Sequence<Sequence<T>>?

How to "take(N)" iteratively - get a Sequence<Sequence>, each inner sequences having next N elements?
I am writing a high-load application in Kotlin.
I have tens of thousands of entries to insert to a database.
I want to batch them by, say, 1000.
So I created a loop:
val itemsSeq = itemsList.iterator().asSequence()
while (true) {
log.debug("Taking $BATCH_SIZE from $itemsSeq")
val batchSeq = itemsSeq.take(BATCH_SIZE)
val squareBatch = applySomething(batchSeq, something)
?: break
}
fun applySomething(batch: Sequence<Item>, something: Something) {
/* Fully consumes batch. Bulk-loads from DB by IDs, applies, bulk-saves. */
}
I thought that take() would advance the itemsSeq and the next call to take() would give me a sequence "view" of itemsSeq starting at the 10th item.
But with this code, I am getting:
DEBUG Taking 10 from kotlin.sequences.ConstrainedOnceSequence@53fe15ff
Exception in thread "main" java.lang.IllegalStateException: This sequence can be consumed only once.
at kotlin.sequences.ConstrainedOnceSequence.iterator(SequencesJVM.kt:23)
at kotlin.sequences.TakeSequence$iterator$1.<init>(Sequences.kt:411)
at kotlin.sequences.TakeSequence.iterator(Sequences.kt:409)
So it seems that the take() "opens" the itemsSeq again, while that can be consumed only once.
As a workaround, I can use chunked():
public fun <T> Sequence<T>.chunked(size: Int): Sequence<List<T>> {
But I would prefer not to create Lists, rather Sequences.
What I am looking for is something between take() and chunked().
Is there anything like that in the Kotlin SDK?
I can possibly create my own sequence { ... } but for readability, I would prefer something built-in.
There is a way to construct a Sequence by handing it over an Iterator, see Sequence.
Given an iterator function constructs a Sequence that returns values
through the Iterator provided by that function. The values are
evaluated lazily, and the sequence is potentially infinite.
Wrapped in an extension function it could look like this:
fun <T> Iterable<T>.toValuesThroughIteratorSequence(): Sequence<T> {
val iterator = this.iterator()
return Sequence { iterator }
}
Quick test:
data class Test(val id: Int)
val itemsList = List(45) { Test(it) }
val batchSize = 10
val repetitions = itemsList.size.div(batchSize) + 1
val itemsSeq = itemsList.toValuesThroughIteratorSequence()
(0 until repetitions).forEach { index ->
val batchSeq = itemsSeq.take(batchSize)
println("Batch no. $index: " + batchSeq.map { it.id.toString().padStart(2, ' ') }.joinToString(" "))
}
Output:
Batch no. 0: 0 1 2 3 4 5 6 7 8 9
Batch no. 1: 10 11 12 13 14 15 16 17 18 19
Batch no. 2: 20 21 22 23 24 25 26 27 28 29
Batch no. 3: 30 31 32 33 34 35 36 37 38 39
Batch no. 4: 40 41 42 43 44
Background
First of all, we need to be aware that there is a big difference between an object that we can iterate over and an object that represents a "live" or already running iteration process. The first group includes Iterable (so List, Set and all other collections), Array, Flow, etc. The second group is mostly Iterator or the old Java Enumeration. The difference can be compared to a file vs. a file pointer when reading, or a database table vs. a database cursor.
Sequence belongs to the first group. A Sequence object does not represent a live, already started iteration, but just a set of elements. These elements can be produced lazily, the sequence can have unbounded size, and internally it usually works by using iterators, but conceptually a sequence is not an iterator itself.
If we look into the documentation about sequences, it clearly compares them to Iterable, not to Iterator. All standard ways to construct sequences, like sequenceOf(), sequence {} and Iterable.asSequence(), produce sequences that return the same list of items every time we iterate over them. Iterator.asSequence() also follows this pattern, but because it can't reproduce the same items twice, it is intentionally protected against iterating multiple times:
public fun <T> Iterator<T>.asSequence(): Sequence<T> = Sequence { this }.constrainOnce()
Problem
Your initial attempt with take() didn't work because it is a misuse of sequences. We expect that subsequent take() calls on the same sequence object will (usually) produce exactly the same items, not the next items, just as we expect multiple take() calls on a list to always produce the same items, each time starting from the beginning.
More specifically, your error was caused by the constrainOnce() shown above. When we invoke take() multiple times on a sequence, it has to restart from the beginning, but it can't do this if it was created from an iterator, so Iterator.asSequence() explicitly disallows this.
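The difference can be sketched with a small experiment. takeTwoTwice is a hypothetical helper written only for this illustration; it calls take(2) twice on the same sequence and records whether each call succeeded.

```kotlin
// Hypothetical helper: runs take(2) twice on the same sequence and records each outcome.
fun takeTwoTwice(seq: Sequence<Int>): List<Result<List<Int>>> =
    List(2) { runCatching { seq.take(2).toList() } }

fun main() {
    // A sequence backed by a list restarts from the beginning every time.
    val fromList = takeTwoTwice(listOf(1, 2, 3, 4, 5).asSequence())
    println(fromList.map { it.getOrNull() }) // [[1, 2], [1, 2]]

    // A sequence backed by an iterator can be iterated only once.
    val fromIterator = takeTwoTwice(listOf(1, 2, 3, 4, 5).iterator().asSequence())
    println(fromIterator[0].getOrNull())                                // [1, 2]
    println(fromIterator[1].exceptionOrNull() is IllegalStateException) // true
}
```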
Simple solution
To fix the problem, you can just skip the constrainOnce() part, as suggested by @lukas.j. This solution is nice because the stdlib already provides tools like Sequence.take(), so if used carefully, it is the easiest to implement and it just works.
However, I personally consider this a kind of workaround, because the resulting sequence doesn't behave as sequences do. It is more like an iterator on steroids than a real sequence. You need to be careful when using this sequence with existing operators or 3rd party code, because such sequence may work differently than they expect and as a result, you may get incorrect results.
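As a minimal sketch of what can go wrong, consider code that iterates such a sequence twice, for example to count the items and then read them. countThenList is a hypothetical function made up for this illustration; it builds the sequence the same way as the workaround, with one shared iterator and no constrainOnce().

```kotlin
// Hypothetical illustration: iterate an iterator-backed, unconstrained sequence twice.
fun countThenList(): Pair<Int, List<Int>> {
    val iterator = listOf(1, 2, 3).iterator()
    val seq = Sequence { iterator } // every iteration reuses the same iterator
    val count = seq.count()         // consumes all three items
    val items = seq.toList()        // nothing is left, and no exception is thrown
    return count to items
}

fun main() {
    println(countThenList()) // (3, [])
}
```

A regular sequence would return all three items from toList(), and a sequence made with Iterator.asSequence() would at least fail loudly.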
Advanced solution
We can follow your initial attempt of using subsequent take() calls. In this case our object is used for live iteration, so it is no longer a proper sequence, but rather an iterator. The only thing we miss in stdlib is a way to create a sub-iterator with a single chunk. We can implement it by ourselves:
fun main() {
val list = (0 .. 25).toList()
val iter = list.iterator()
while (iter.hasNext()) {
val chunk = iter.limited(10)
println(chunk.asSequence().toList())
}
}
fun <T> Iterator<T>.limited(n: Int): Iterator<T> = object : Iterator<T> {
var left = n
val iterator = this@limited
override fun next(): T {
if (left == 0)
throw NoSuchElementException()
left--
return iterator.next()
}
override fun hasNext(): Boolean {
return left > 0 && iterator.hasNext()
}
}
I named it limited(), because take() suggests we read items from the iterator. Instead, we only create another iterator on top of the provided iterator.
Of course, sequences are easier to use than iterators, and the typical solution to this problem is chunked(). With the above limited() it is pretty straightforward to implement chunkedAsSequences():
fun main() {
val list = (0 .. 25).toList()
list.asSequence()
.chunkedAsSequences(10)
.forEach { println(it.toList()) }
}
fun <T> Sequence<T>.chunkedAsSequences(size: Int): Sequence<Sequence<T>> = sequence {
val iter = iterator()
while (iter.hasNext()) {
val chunk = iter.limited(size)
yield(chunk.asSequence())
chunk.forEach {} // required if chunk was not fully consumed
}
}
Please also note there is a tricky case of chunk being not fully consumed. chunkedAsSequences() is protected against this scenario. Previous simpler solutions aren't.
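The partially consumed case can be sketched as follows. The two functions are copies of limited() and chunkedAsSequences() from above so that the snippet runs on its own; firstOfEachChunk is a hypothetical helper that reads only the first item of every chunk.

```kotlin
// Copies of limited() and chunkedAsSequences() from above, so the sketch runs standalone.
fun <T> Iterator<T>.limited(n: Int): Iterator<T> = object : Iterator<T> {
    var left = n
    val iterator = this@limited
    override fun next(): T {
        if (left == 0) throw NoSuchElementException()
        left--
        return iterator.next()
    }
    override fun hasNext(): Boolean = left > 0 && iterator.hasNext()
}

fun <T> Sequence<T>.chunkedAsSequences(size: Int): Sequence<Sequence<T>> = sequence {
    val iter = this@chunkedAsSequences.iterator()
    while (iter.hasNext()) {
        val chunk = iter.limited(size)
        yield(chunk.asSequence())
        chunk.forEach {} // drain whatever the consumer left unread
    }
}

// Hypothetical consumer that reads only one item from each chunk.
fun firstOfEachChunk(): List<Int> =
    (0..25).asSequence().chunkedAsSequences(10).map { it.first() }.toList()

fun main() {
    println(firstOfEachChunk()) // [0, 10, 20]
}
```

Without the draining step, the second chunk would start at 1 instead of 10.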

What is the most efficient way to generate random numbers from a union of disjoint ranges in Kotlin?

I would like to generate random numbers from a union of ranges in Kotlin. I know I can do something like
((1..10) + (50..100)).random()
but unfortunately this creates an intermediate list, which can be rather expensive when the ranges are large.
I know I could write a custom function to randomly select a range with a weight based on its width, followed by randomly choosing an element from that range, but I am wondering if there is a cleaner way to achieve this with Kotlin built-ins.
Suppose your ranges are non-overlapping and sorted; if not, you can preprocess them by merging and sorting.
This comes down to choosing an algorithm:
O(1) time and O(N) space, where N is the total count of numbers: expand the ranges into a collection of numbers and randomly pick one. To be compact, an array or list can be used as the container.
O(M) time and O(1) space, where M is the number of ranges: find the position with a linear scan.
O(M + log(M)) time and O(M) space, where M is the number of ranges: find the position with a binary search. You can separate the preparation (O(M)) from the generation (O(log(M))) if you generate multiple numbers from the same set of ranges.
For the last algorithm, imagine there's a sorted list of all available numbers; this list can be partitioned into your ranges. There's no need to actually create this list: you just calculate the starting positions of your ranges relative to it. When you have a position within this list and want to know which range it falls in, do a binary search.
fun random(ranges: Array<IntRange>): Int {
// preparation
val positions = ranges.map {
it.last - it.first + 1
}.runningFold(0) { sum, item -> sum + item }
// generation
val randomPos = Random.nextInt(positions[ranges.size])
val found = positions.binarySearch(randomPos)
// binarySearch may return an "insertion point" in negative
val range = if (found < 0) -(found + 1) - 1 else found
return ranges[range].first + randomPos - positions[range]
}
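If many numbers are drawn from the same ranges, the preparation and generation steps can be separated as mentioned above. The following is only a sketch: RangeUnionSampler and its next() method are hypothetical names, and it assumes the ranges are non-empty and non-overlapping.

```kotlin
import kotlin.random.Random

// Hypothetical wrapper: does the O(M) preparation once, then each draw costs O(log(M)).
// Assumes every range is non-empty and the ranges do not overlap.
class RangeUnionSampler(private val ranges: List<IntRange>) {
    // positions[i] is the count of numbers in all ranges before index i;
    // the last entry is the total count.
    private val positions: List<Int> =
        ranges.runningFold(0) { sum, range -> sum + (range.last - range.first + 1) }

    init {
        require(positions.last() > 0) { "at least one non-empty range is required" }
    }

    fun next(random: Random = Random.Default): Int {
        val randomPos = random.nextInt(positions.last())
        val found = positions.binarySearch(randomPos)
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        val index = if (found < 0) -(found + 1) - 1 else found
        return ranges[index].first + randomPos - positions[index]
    }
}

fun main() {
    val sampler = RangeUnionSampler(listOf(1..10, 50..100))
    repeat(5) { println(sampler.next()) }
}
```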
Short solution
We can do it like this:
fun main() {
println(random(1..10, 50..100))
}
fun random(vararg ranges: IntRange): Int {
var index = Random.nextInt(ranges.sumOf { it.last - it.first } + ranges.size)
ranges.forEach {
val size = it.last - it.first + 1
if (index < size) {
return it.first + index
}
index -= size
}
throw IllegalStateException()
}
It uses the same approach you described, but it calls for random integer only once, not twice.
Long solution
As I said in the comment, I often miss utils in the Java/Kotlin stdlib for creating collection views. If IntRange had something like asList() and we had a way to concatenate lists by creating a view, this would be really trivial, built from existing logic blocks. Views would do the trick for us: they would automatically calculate the size and translate the random number to the proper value.
I implemented a POC, maybe you will find it useful:
fun main() {
val list = listOf(1..10, 50..100).mergeAsView()
println(list.size) // 61
println(list[20]) // 60
println(list.random())
}
@JvmName("mergeIntRangesAsView")
fun Iterable<IntRange>.mergeAsView(): List<Int> = map { it.asList() }.mergeAsView()
@JvmName("mergeListsAsView")
fun <T> Iterable<List<T>>.mergeAsView(): List<T> = object : AbstractList<T>() {
override val size = this@mergeAsView.sumOf { it.size }
override fun get(index: Int): T {
if (index < 0 || index >= size) {
throw IndexOutOfBoundsException(index)
}
var remaining = index
this@mergeAsView.forEach { curr ->
if (remaining < curr.size) {
return curr[remaining]
}
remaining -= curr.size
}
throw IllegalStateException()
}
}
fun IntRange.asList(): List<Int> = object : AbstractList<Int>() {
override val size = endInclusive - start + 1
override fun get(index: Int): Int {
if (index < 0 || index >= size) {
throw IndexOutOfBoundsException(index)
}
return start + index
}
}
This code does almost exactly the same thing as the short solution above. It only does it indirectly.
Once again: this is just a POC. This implementation of asList() and mergeAsView() is not at all production-ready. We should implement more methods, for example iterator(), contains() and indexOf(), because right now they are much slower than they could be. But it should already work efficiently for your specific case. You should probably test it at least a little. Also, mergeAsView() assumes the provided lists are immutable (they have a fixed size), which may not be true.
It would probably be good to implement asList() for IntProgression and for other primitive types as well. You may also prefer a vararg version of mergeAsView() over an extension function.
As a final note: I guess there are libraries that do this already, probably some related to immutable collections. But if you are looking for a relatively lightweight solution, it should work for you.

Kotlin - Random numbers without repeating

I have a question, how to prevent random numbers from being repeated.
By the way, can someone explain to me how to sort these random numbers?
override fun onCreate(savedInstanceState: Bundle?) {
super.onCreate(savedInstanceState)
setContentView(R.layout.activity_main)
val textView = findViewById<TextView>(R.id.textView)
val button = findViewById<Button>(R.id.buttom)
button.setOnClickListener {
var liczba = List(6){Random.nextInt(1,69)}
textView.text = liczba.toString()
}
}
There are three basic methods to avoid repeating 'random' numbers. If they don't repeat then they are not really random of course.
with a small range of numbers, randomly shuffle the numbers and pick them in order after the shuffle.
with a medium size range of numbers, record the numbers you have picked, and reject any repeats. This will get slow if you pick a large percentage of the numbers available.
with a very large range of numbers you need something like an encryption: a one-to-one mapping which maps 0, 1, 2, 3 ... to the numbers in the (large) range. For example a 128 bit encryption will give an apparently random ordering of non-repeating 128-bit numbers.
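The first method (shuffle, then pick in order) can be sketched in a few lines. drawDistinct is a hypothetical helper name; the range 1..68 mirrors Random.nextInt(1,69) from the question, whose upper bound is exclusive.

```kotlin
// Hypothetical helper: shuffle the whole range, keep the first `count` values, then sort them.
fun drawDistinct(count: Int, range: IntRange): List<Int> {
    require(count in 0..range.count()) { "cannot draw $count distinct values from $range" }
    return range.shuffled().take(count).sorted()
}

fun main() {
    println(drawDistinct(6, 1..68)) // six distinct numbers in ascending order
}
```

This copies the whole range into a list, so it only suits small ranges, as noted above.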
Sequences are a great way to generate streams of data and limit or filter the results.
import kotlin.random.Random
import kotlin.random.nextInt
val randomInts = generateSequence {
// this lambda is the source of the sequence's values
Random.nextInt(1..69)
}
// make the values distinct, so there's no repeated ints
.distinct()
// only fetch 6 values
// Note: It's very important that the source lambda can provide
// this many distinct values! If not, the stream will
// hang, endlessly waiting for more unique values.
.take(6)
// sort the values
.sorted()
// and collect them into a Set
.toSet()
To make sure this works, here's a property-based-test using Kotest.
import io.kotest.core.spec.style.FunSpec
import io.kotest.matchers.collections.shouldBeMonotonicallyIncreasing
import io.kotest.matchers.collections.shouldBeUnique
import io.kotest.matchers.collections.shouldHaveSize
import io.kotest.property.Arb
import io.kotest.property.arbitrary.positiveInts
import io.kotest.property.checkAll
import kotlin.random.Random
import kotlin.random.nextInt
class RandomImageLogicTest : FunSpec({
test("expect sequence of random ints is distinct, sorted, and the correct size") {
checkAll(Arb.positiveInts(30)) { limit ->
val randomInts = generateSequence { Random.nextInt(1..69) }
.distinct()
.take(limit)
.sorted()
.toSet()
randomInts.shouldBeMonotonicallyIncreasing()
randomInts.shouldBeUnique()
randomInts.shouldHaveSize(limit)
}
}
})
The test passes!
Test Duration Result
expect sequence of random ints is di... 0.163s passed
val size = 6
val s = HashSet<Int>(size)
while (s.size < size) {
s += Random.nextInt(1,69)
}
I created a simple class. In the constructor you pass "from" (the smallest possible number) and "to" (the largest possible number), and the class creates a list of all the numbers in between.
"nextInt()" returns a random item from that list and removes it.
class RandomUnrepeated(from: Int, to: Int) {
private val numbers = (from..to).toMutableList()
fun nextInt(): Int {
val index = kotlin.random.Random.nextInt(numbers.size)
val number = numbers[index]
numbers.removeAt(index)
return number
}
}
usage:
val r = RandomUnrepeated(0,100)
r.nextInt()
Similar to @IR42's answer, you can do something like this
import kotlin.random.Random
fun getUniqueRandoms() = sequence {
val seen = mutableSetOf<Int>()
while(true) {
val next = Random.nextInt()
// add returns true if it wasn't already in the set - i.e. it's not a duplicate
if (seen.add(next)) yield(next)
}
}
fun main() {
getUniqueRandoms().take(6).sorted().forEach(::println)
}
So getUniqueRandoms creates an independent sequence, and holds its own internal state of which numbers it's produced. For the caller, it's just a basic sequence that produces unique values, and you can consume those however you like.
Like @rossum says, this really depends on how many you're going to produce - if you're generating a lot, or this sequence is really long-lived, that set of seen numbers will get very large over time. Plus it will start to slow down as you get more and more collisions, and have to keep trying to find one that hasn't been seen yet.
But for most situations, this kind of thing is just fine - you'd probably want to benchmark it if you're producing, say, millions of numbers, but for something like 6 it's not even worth worrying about!
You can use Set and MutableSet instead of List:
val mySet = mutableSetOf<Int>()
while (mySet.size < 6)
mySet.add(Random.nextInt(1, 69))

find the average of pairs with the same first element. kotlin

I have a list of pairs in myPairs: List<Pair<String, Double>> and need to calculate the average of each set of pairs with the same first element.
I came up with this code. It separately finds the sum and the count for each group, then divides each sum by its count in a loop to find the average.
val myAverages = myPairs.groupingBy { it.first }.fold(0.0) { sum, element -> sum + element.second }.toMutableMap()
val myCounts = myPairs.groupingBy { it.first }.eachCount()
for ((myStr, count) in myCounts) {
myAverages[myStr] = myAverages[myStr]!!.div(count)
}
return myAverages
I wonder if there is a more elegant/Kotlin-esque way to solve this by using aggregate or fold functions? My solution works but looks really ugly to me.
You can group your data set using groupBy function. Once you have the grouped map then you can map its values (List<Pair<String,Double>>) to the required average using the average function.
var mapOfAverages = myPairs.groupBy { it.first }
.mapValues { it.value.map { pair -> pair.second }.average() }
This will give you a map, where key is first element of your Pairs and value is the average of all the second elements for this particular key.
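Since the question asks about fold specifically, here is a sketch of a single-pass alternative that avoids the intermediate lists created by groupBy. averagesByKey is a hypothetical function name; the accumulator is a (sum, count) pair that is turned into an average at the end.

```kotlin
// Hypothetical helper: accumulate (sum, count) per key in one pass, then divide.
fun averagesByKey(pairs: List<Pair<String, Double>>): Map<String, Double> =
    pairs.groupingBy { it.first }
        .fold(0.0 to 0) { (sum, count), element -> (sum + element.second) to (count + 1) }
        .mapValues { (_, acc) -> acc.first / acc.second }

fun main() {
    val myPairs = listOf("a" to 1.0, "a" to 3.0, "b" to 5.0)
    println(averagesByKey(myPairs)) // {a=2.0, b=5.0}
}
```

The groupBy version is easier to read; this one is only worth considering when the groups are large.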