How to group a list by a value? - kotlin

I have a list, and for each id I want to make an object holding all the values that contain that id.
val list = listOf(A(id = 1), A(1), A(2), A(3), A(2))
From that list I want to make a new list of type Container
data class Container(
val id: Long, // The id passed into A.
val elementList: List<A> // The elements that contain the ID.
)
How can I do this in an efficient way without going O(n^2)?

You can use groupBy + map.
The implementation of groupBy is O(n) and the implementation of map is O(n) so the total runtime is O(2n) which is O(n).
list.groupBy { it.id }.map { (id, elementList) -> Container(id, elementList) }
Since this is so short and readable, I'd avoid further optimizations unless they are strictly needed. If you do need them, you could also reduce the space cost, for example by avoiding the allocation of multiple intermediate lists.
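To see why groupBy is linear, here is a hand-written equivalent of the groupBy + map one-liner. This is a sketch for illustration, not the standard library's actual source; the definition of A and the helper name toContainers are assumptions, since the question doesn't show them.

```kotlin
// Assumed shape of A: the question never shows its definition.
data class A(val id: Long)

data class Container(
    val id: Long,            // The id shared by every element below.
    val elementList: List<A> // All elements carrying that id.
)

// Hypothetical helper: a hand-rolled equivalent of
// list.groupBy { it.id }.map { (id, elementList) -> Container(id, elementList) }
fun toContainers(list: List<A>): List<Container> {
    // LinkedHashMap keeps the ids in first-seen order, which groupBy also does.
    val groups = LinkedHashMap<Long, MutableList<A>>()
    for (a in list) {
        // getOrPut is an expected O(1) hash lookup, so this loop is O(n) overall.
        groups.getOrPut(a.id) { mutableListOf() }.add(a)
    }
    // A second linear pass wraps each group in a Container.
    return groups.map { (id, elements) -> Container(id, elements) }
}
```

Each element is touched once in the loop and each group once in the final map, which is where the O(n) + O(n) = O(n) figure comes from.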

Related

What is the most efficient way to join one list to another in kotlin?

I start with a list of integers from 1 to 1000 in listOfRandoms.
I would like to left join on random from the createDatabase list.
I am currently using a find{} statement within a loop to do this, but I feel like this is too heavy. Is there not a better (quicker) way to achieve the same result?
Pseudo code
data class DatabaseRow(
val refKey: Int,
val random: Int
)
fun main() {
val createDatabase = (1..1000).map { i -> DatabaseRow(i, Random()) }
val listOfRandoms = (1..1000).map { j ->
val lookup = createDatabase.find { it.refKey == j }
lookup.random
}
}
As mentioned in comments, the question seems to be mixing up database and programming ideas, which isn't helping.
And it's not entirely clear which parts of the code are needed, and which can be replaced. I'm assuming that you already have the createDatabase list, but that listOfRandoms is open to improvement.
The ‘pseudo’ code compiles fine except that:
You don't give an import for Random(), but none of the likely ones return an Int. I'm going to assume that should be kotlin.random.Random.nextInt().
And because lookup is nullable, you can't simply call lookup.random; a quick fix is lookup!!.random, but it would be safer to handle the null case properly with e.g. lookup?.random ?: -1. (That's irrelevant, though, given the assumption above.)
I think the general solution is to create a Map. This can be done very easily from createDatabase, by calling associate():
val map = createDatabase.associate{ it.refKey to it.random }
That should take time roughly proportional to the size of the list. Looking up values in the map is then very efficient (approx. constant time):
map[someKey]
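Putting those two pieces together, the question's find{}-per-key loop could be rewritten as a map join. This is a minimal sketch; the helper name joinViaMap is hypothetical, and it uses the -1 fallback suggested earlier (lookup?.random ?: -1) for keys with no matching row.

```kotlin
data class DatabaseRow(val refKey: Int, val random: Int)

// Hypothetical helper: the question's find{}-per-key loop, rewritten as a map join.
// associate() builds the map in one O(n) pass; each lookup is then expected O(1).
fun joinViaMap(rows: List<DatabaseRow>, keys: List<Int>): List<Int> {
    val map = rows.associate { it.refKey to it.random }
    // -1 marks a key with no matching row, as in lookup?.random ?: -1.
    return keys.map { key -> map[key] ?: -1 }
}
```

Compared with calling find{} once per key (O(n) each, so O(n²) in total), this does one linear pass to build the map plus one expected-constant-time lookup per key.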
In this case, that takes rather more memory than needed, because both keys and values are integers and will be boxed (stored as separate objects on the heap). Also, most maps use a hash table, which takes some memory.
Since the key is (according to comments) “an ascending list starting from a random number, like 18123..19123”, in this particular case the values can instead be stored in an IntArray without any boxing. As you say, array indexes start from 0, so using the key directly as the index would need a huge array and use only the last few cells. But if you know the start key, you can simply subtract it from each key to get the array index.
Creating such an array would be a bit more complex, for example:
val minKey = createDatabase.minOf{ it.refKey }
val maxKey = createDatabase.maxOf{ it.refKey }
val array = IntArray(maxKey - minKey + 1)
for (row in createDatabase)
array[row.refKey - minKey] = row.random
You'd then access values with:
array[someKey - minKey]
…which is also constant-time.
Some caveats with this approach:
If createDatabase is empty, then minOf() will throw a NoSuchElementException.
If it has ‘holes’, omitting some keys inside that range, then the array will hold its default value of 0 for those keys. You can change that by using the alternative IntArray constructor, which also takes a lambda giving the initial value.
Trying to look up a value outside that range will give an ArrayIndexOutOfBoundsException.
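Those three caveats can each be handled with standard-library calls. A sketch, assuming you want a sentinel rather than an exception; the class name OffsetIntLookup and the default of -1 are hypothetical choices, not part of the original answer.

```kotlin
data class DatabaseRow(val refKey: Int, val random: Int)

// Hypothetical wrapper around the offset-IntArray idea, handling the three
// caveats: empty input, holes in the key range, and out-of-range lookups.
class OffsetIntLookup(rows: List<DatabaseRow>, private val missing: Int = -1) {
    // minOfOrNull avoids the NoSuchElementException that minOf() throws on empty input.
    private val minKey: Int = rows.minOfOrNull { it.refKey } ?: 0

    private val values: IntArray =
        if (rows.isEmpty()) IntArray(0)
        else {
            val maxKey = rows.maxOf { it.refKey }
            // The lambda constructor fills holes with `missing` instead of 0.
            IntArray(maxKey - minKey + 1) { missing }.also { arr ->
                for (row in rows) arr[row.refKey - minKey] = row.random
            }
        }

    // getOrElse returns `missing` instead of throwing ArrayIndexOutOfBoundsException.
    operator fun get(key: Int): Int = values.getOrElse(key - minKey) { missing }
}
```

Lookups stay constant-time; the only extra work per call is one subtraction and a bounds check.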
Whether it's worth the extra complexity to save a bit of memory will depend on things like the size of the ‘database’, and how long it's in memory for; I wouldn't add that complexity unless you have good reason to think memory usage will be an issue.

Efficient way of Square of a Sorted Array

I am solving a LeetCode problem. The question is:
Given an integer array nums sorted in non-decreasing order, return an array of the squares of each number sorted in non-decreasing order.
Example 1:
Input: nums = [-4,-1,0,3,10]
Output: [0,1,9,16,100]
Explanation: After squaring, the array becomes [16,1,0,9,100].
After sorting, it becomes [0,1,9,16,100].
Example 2:
Input: nums = [-7,-3,2,3,11]
Output: [4,9,9,49,121]
I solved this with map, then sorted() to sort the squares, and finally toIntArray() to convert the result back to an array.
My solution
class Solution {
fun sortedSquares(nums: IntArray): IntArray {
return nums.map { it * it }.sorted().toIntArray()
}
}
Afterwards, looking through the Discuss section, I found this solution:
class Solution {
fun sortedSquares(A: IntArray): IntArray {
// Create markers to use to navigate inward since we know that
// the polar ends are (possibly, but not always) the largest
var leftMarker = 0
var rightMarker = A.size - 1
// Create a marker to track insertions into the new array
var resultIndex = A.size - 1
val result = IntArray(A.size)
// Iterate over the items until the markers reach each other.
// It's likely a little faster to consider the case where the left
// marker is no longer producing elements that are less than zero.
while (leftMarker <= rightMarker) {
// Grab the absolute values of the elements at the respective
// markers so they can be compared and inserted into the right
// index.
val left = Math.abs(A[leftMarker])
val right = Math.abs(A[rightMarker])
// Do checks to decide which item to insert next.
result[resultIndex] = if (right > left) {
rightMarker--
right * right
} else {
leftMarker++
left * left
}
// Once the item is inserted we can update the index we want
// to insert at next.
resultIndex--
}
return result
}
}
Its author also wrote in the title: Kotlin -- O(n), 95% time, 100% space
So does my solution have the same time and space complexity as the other, more efficient solution? Or is there an even better solution?
So does my solution have the same time and space complexity as the other, more efficient solution?
No, your solution runs in O(n log n) time, as it relies on sorted(), which likely runs in O(n log n). Since the alternative solution does not sort the items, it indeed runs in O(n) time. Both solutions use O(n) space, although your solution uses about three times as much space (map, sorted and toIntArray each create a copy of the input).
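If you'd rather keep the simple sort-based approach but avoid the extra copies, one option (a sketch, not taken from either solution; the name sortedSquaresInPlace is hypothetical) is to square into a single primitive IntArray and sort it in place. It is still O(n log n) time, but it makes only one O(n) allocation and never boxes the values.

```kotlin
// Still O(n log n) because of the sort, but allocates one primitive IntArray
// instead of a boxed List from map(), another from sorted(), and a final IntArray.
fun sortedSquaresInPlace(nums: IntArray): IntArray {
    val squares = IntArray(nums.size) { i -> nums[i] * nums[i] }
    squares.sort() // in-place sort of the primitive array, no boxing
    return squares
}
```

The two-pointer solution remains the better choice for asymptotic time; this variant only trims the constant factors of the original approach.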

How to convert two lists into a list of pairs based on some predicate

I have a data class that describes a chef by name and skill level, and two lists of chefs with various skill levels.
data class Chef(val name: String, val level: Int)
val listOfChefsOne = listOf(
Chef("Remy", 9),
Chef("Linguini", 7))
val listOfChefsTwo = listOf(
Chef("Mark", 6),
Chef("Maria", 8))
I'm to write a function that takes these two lists and creates a list of pairs such that the skill levels of the two chefs in each pair add up to 15. The challenge is to do this using only built-in list functions, without for/while loops.
println(pairChefs(listOfChefsOne, listOfChefsTwo))
Expected output:
[(Chef(name=Remy, level=9), Chef(name=Mark, level=6)),
(Chef(name=Linguini, level=7), Chef(name=Maria, level=8))]
As I mentioned previously I'm not to use any for or while loops in my implementation for the function. I've tried using the forEach function to create a list containing all possible pairs between two lists, but from there I've gotten lost as to how I can filter out only the correct pairs.
I think the clue is in the question here!
I've tried using the forEach function to create a list containing all possible pairs between two lists, but from there I've gotten lost as to how I can filter out only the correct pairs.
There's a filter function that looks perfect for this…
To keep things clear, I'll split out a function for generating all possible pairs.  (This is my own, but bears a reassuring resemblance to part of this answer!  In any case, you said you'd already solved this bit.)
fun <A, B> Iterable<A>.product(other: Iterable<B>)
= flatMap{ a -> other.map{ b -> a to b }}
The result can then be:
val result = listOfChefsOne.product(listOfChefsTwo)
.filter{ (chef1, chef2) -> chef1.level + chef2.level == 15 }
Note that although this is probably the simplest and most readable way, it's not the most efficient for large lists. (It takes time and memory proportional to the product of the sizes of the two lists.) You could improve large-scale performance by using streams (which would take the same time but constant memory). But for this particular case, it might be even better to group one of the lists by level; then, for each element of the other list, you could directly look up the Chefs whose level is 15 minus that element's level. (That would take time proportional to the sum of the sizes of the two lists, and space proportional to the size of the first list.)
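The grouping idea might look like this. It is a sketch that still sticks to built-in collection functions with no for/while loops; the function name pairChefsByLevel and the target parameter are hypothetical.

```kotlin
data class Chef(val name: String, val level: Int)

// Sketch of the grouping idea: index the first list by level once, then for
// each chef in the second list look up partners whose level is target - level.
// Uses only built-in collection functions, no for/while loops.
fun pairChefsByLevel(
    first: List<Chef>,
    second: List<Chef>,
    target: Int = 15 // hypothetical parameter; the question fixes the sum at 15
): List<Pair<Chef, Chef>> {
    val firstByLevel: Map<Int, List<Chef>> = first.groupBy { it.level }
    return second.flatMap { chef2 ->
        // orEmpty() turns a missing level into an empty list (no partners).
        firstByLevel[target - chef2.level].orEmpty().map { chef1 -> chef1 to chef2 }
    }
}
```

Pairs come out in the order of the second list, and each pair keeps the first-list chef on the left. Strictly, the running time is proportional to the sum of the list sizes plus the number of pairs produced, which only matters when many chefs share a level.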
Here is the pretty simple naive solution:
val result = listOfChefsOne.flatMap { chef1 ->
listOfChefsTwo.mapNotNull { chef2 ->
if (chef1.level + chef2.level == 15) {
chef1 to chef2
} else {
null
}
}
}
println(result) // prints [(Chef(name=Remy, level=9), Chef(name=Mark, level=6)), (Chef(name=Linguini, level=7), Chef(name=Maria, level=8))]

Kotlin: Why is Sequence more performant in this example?

Currently, I am looking into Kotlin and have a question about Sequences vs. Collections.
I read a blog post about this topic, which contains these code snippets:
List implementation:
val list = generateSequence(1) { it + 1 }
.take(50_000_000)
.toList()
measure {
list
.filter { it % 3 == 0 }
.average()
}
// 8644 ms
Sequence implementation:
val sequence = generateSequence(1) { it + 1 }
.take(50_000_000)
measure {
sequence
.filter { it % 3 == 0 }
.average()
}
// 822 ms
The point here is that the Sequence implementation is about 10x faster.
However, I do not really understand WHY that is. I know that with a Sequence, you do "lazy evaluation", but I cannot find any reason why that helps reducing the processing in this example.
In this example, however, I do know why a Sequence is faster:
val result = sequenceOf("a", "b", "c")
.map {
println("map: $it")
it.uppercase()
}
.any {
println("any: $it")
it.startsWith("B")
}
Because a Sequence processes the data "vertically" (each element goes through map and then any before the next element is touched), as soon as an element starting with "B" is found, the remaining elements don't have to be mapped at all. That makes sense here.
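The vertical versus horizontal order can be checked directly by logging each step instead of printing it. A sketch, assuming Kotlin 1.5+ for uppercase(); the function names are hypothetical.

```kotlin
// Runs the same map/any pipeline on a Sequence and on a List, recording each
// step in a log instead of printing, so the evaluation order can be compared.
fun traceSequence(): List<String> {
    val log = mutableListOf<String>()
    sequenceOf("a", "b", "c")
        .map { log += "map: $it"; it.uppercase() }       // runs one element at a time
        .any { log += "any: $it"; it.startsWith("B") }   // stops the chain at "B"
    return log
}

fun traceList(): List<String> {
    val log = mutableListOf<String>()
    listOf("a", "b", "c")
        .map { log += "map: $it"; it.uppercase() }       // maps every element first
        .any { log += "any: $it"; it.startsWith("B") }
    return log
}
```

traceSequence() returns [map: a, any: A, map: b, any: B], so "c" is never mapped, while traceList() returns [map: a, map: b, map: c, any: A, any: B].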
So, why is it also faster in the first example?
Let's look at what those two implementations are actually doing:
The List implementation first creates a List in memory with 50 million elements. This will take a bare minimum of 200MB, since an Int takes 4 bytes.
(In fact, it's probably far more than that. As Alexey Romanov pointed out, since it's a generic List implementation and not an IntList, it won't store the integers directly, but will ‘box’ them, storing references to Int objects. On the JVM, each reference typically takes 4 or 8 bytes and each boxed Int typically takes 16, giving 1GB or more. Also, depending on how the List gets created, it might start with a small array and keep creating larger and larger ones as the list grows, copying all the values across each time, using more memory still.)
Then it has to read all the values back from the list, filter them, and create another list in memory.
Finally, it has to read all those values back in again, to calculate the average.
The Sequence implementation, on the other hand, doesn't have to store anything!  It simply generates the values in order, and as it does each one it checks whether it's divisible by 3 and if so includes it in the average.
(That's pretty much how you'd do it if you were implementing it ‘by hand’.)
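For comparison, the ‘by hand’ version might look like the following: a plain loop that generates, tests and accumulates, storing nothing. A sketch; the function name and the Long accumulator are my own choices.

```kotlin
// What the Sequence pipeline effectively does: generate, test, accumulate.
// Nothing is stored, so memory use is constant regardless of n.
fun averageOfMultiplesOf3(n: Int): Double {
    var sum = 0L   // Long, since the sum for n = 50_000_000 would overflow Int
    var count = 0
    for (i in 1..n) {
        if (i % 3 == 0) {
            sum += i
            count++
        }
    }
    // Match Sequence<Int>.average(), which returns NaN for an empty input.
    return if (count == 0) Double.NaN else sum.toDouble() / count
}
```

For n = 1000 this gives 501.0, the same as (1..1000).asSequence().filter { it % 3 == 0 }.average().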
You can see that in addition to the divisibility checking and average calculation, the List implementation is doing a massive amount of memory access, which will take a lot of time.  That's the main reason it's far slower than the Sequence version, which doesn't!
Seeing this, you might ask why we don't use Sequences everywhere…  But this is a fairly extreme example.  Setting up and then iterating the Sequence has some overhead of its own, and for smallish lists that can outweigh the memory overhead.  So Sequences only have a clear advantage in cases when the lists are very large, are processed strictly in order, there are several intermediate steps, and/or many items are filtered out along the way (especially if the Sequence is infinite!).
In my experience, those conditions don't occur very often.  But this question shows how important it is to recognise them when they do!
Leveraging lazy evaluation avoids creating intermediate objects that are irrelevant to the end goal.
Also, the benchmarking method used in the mentioned article is not super accurate. Try to repeat the experiment with JMH.
Initial code produces a list containing 50_000_000 objects:
val list = generateSequence(1) { it + 1 }
.take(50_000_000)
.toList()
then iterates through it and creates another list containing a subset of its elements:
.filter { it % 3 == 0 }
... and then proceeds with calculating the average:
.average()
Using sequences allows you to avoid all those intermediate steps. The code below doesn't produce 50_000_000 elements; it's just a representation of the 1..50_000_000 sequence:
val sequence = generateSequence(1) { it + 1 }
.take(50_000_000)
Adding a filter to it doesn't trigger the calculation either; it derives a new sequence from the existing one (3, 6, 9, ...):
.filter { it % 3 == 0 }
and eventually, a terminal operation is called that triggers the evaluation of the sequence and the actual calculation:
.average()
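One way to see that nothing runs until the terminal operation is to count how often the generator lambda is invoked. A sketch; the function name is hypothetical, and it only claims that the count is zero before average() and positive after it.

```kotlin
// Counts invocations of the generator lambda, to show that building the
// pipeline (generateSequence, take, filter) does no work by itself.
fun generatorCallsBeforeAndAfterAverage(n: Int): Pair<Int, Int> {
    var calls = 0
    val pipeline = generateSequence(1) { calls++; it + 1 }
        .take(n)
        .filter { it % 3 == 0 }
    val before = calls   // no element has been produced yet
    pipeline.average()   // the terminal operation drives the whole chain
    return before to calls
}
```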
Some relevant reading:
Kotlin: Beware of Java Stream API Habits
Kotlin Collections API Performance Antipatterns

Kotlin flatMap - map

Say I have a list of 30k elements, and I would like to perform an operation on all possible pairs within the list. So I had:
list.asSequence().flatMap { i ->
list.asSequence().map { j -> /* perform operation here */ }
}
Question 1:
Is there anything that I can use as an alternative? (Such as applicative functors).
I also noticed that this flatMap-map operation is significantly slower than the imperative loop version. (perhaps due to closures?)
for(i in list){
for(j in list){
// perform operation here
}
}
Question 2: Is there a way to improve the performance of the flatMap/map version?
Some alternatives with performance impacts:
com.google.common.collect.Sets.cartesianProduct(java.util.Set...): "Returns every possible list that can be formed by choosing one element from each of the given sets in order; the 'n-ary Cartesian product' of the sets."
This requires your list elements to be unique. If they're not then you'd have to wrap each element in a unique object so that they can all be added to the input set.
In my testing, however, I've found it to be slower than the flatMap/map solution. :-(
forEach/forEach: Since you simply want to perform an operation on each pair, you don't actually need flatMap or map to transform the list, so you can use forEach/forEach instead:
list.forEach { i ->
list.forEach { j -> /* perform operation here */ }
}
In my testing I've found this to be slightly faster than the for/for solution. :-)
If you do need to transform the list, then your flatMap/map solution appears to be the best one.
To answer question 2: we're considering adding a flatMap overload that doesn't create a closure for each element in the outer collection/sequence: https://youtrack.jetbrains.com/issue/KT-8602
But if you want to perform side effects on each pair rather than transforming the sequence, I'd advise sticking with for loops or inlined forEach lambdas, which are effectively the same.
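If you want the forEach style behind a reusable name, an inline extension keeps the no-closure property. A sketch; forEachPair and countPairsSummingTo are hypothetical helpers, and the pair-counting operation merely stands in for "perform operation here".

```kotlin
// Hypothetical helper: visits every ordered pair (i, j) of the list, including
// i == j, like the nested loops in the question. Both this function and forEach
// are inline, so at the call site the compiler emits two nested loops and
// allocates no closure object per outer element.
inline fun <T> List<T>.forEachPair(action: (T, T) -> Unit) {
    forEach { i -> forEach { j -> action(i, j) } }
}

// Example side effect: count ordered pairs whose sum equals target.
fun countPairsSummingTo(list: List<Int>, target: Int): Int {
    var count = 0
    list.forEachPair { a, b -> if (a + b == target) count++ }
    return count
}
```

For listOf(1, 2, 3) and a target of 4, the matching ordered pairs are (1, 3), (2, 2) and (3, 1), so the count is 3.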