Kotlin: most efficient way to convert a sequence to a set - kotlin

Hi, I've got a list of 1330 objects and would like to apply a method to each and obtain a set as the result.
val result = listOf1330
    .asSequence()
    .map {
        someMethod(it)
    }
val resultSet = result.toSet()
It works fine without toSet, but with it the execution time is about 10 times longer.
I used a sequence to make it faster, and it is, but as the result I need a list without duplicates (a set).
Simply: what is the most efficient way to convert a sequence to a set?

val result = listOf1330.mapTo(HashSet()) { someMethod(it) }
It makes little sense to use streams or sequences for this transformation: you need all elements from the collection, not just a few. The mapTo (and map) functions are inline in Kotlin, which means the code is substituted into the call site, so no lambda object is created and invoked repeatedly. We use mapTo to avoid the second copy of the collection that toSet() would make.
Using .parallelStream() may add more performance if you want to run the computation on several threads, but it is still a good idea to measure how well the load is balanced between threads. The performance may also depend on the implementation class of the collection you call it on.

If your someObject has a slow implementation of equals() or hashCode(), or gives the same hash code for many objects, then that could account for the delay, and you may be able to improve it.
Otherwise, if the objects are big, the delay may be mostly due to the amount of memory that must be accessed to store them all; if so, that's the price you'll have to pay if you want a set with all those objects in memory.
Sequence.toSet() uses a LinkedHashSet.  You could try providing another Set instance, using e.g. toCollection(HashSet()), to see if that's any faster.  (You wouldn't get the same iteration order, though.)
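For illustration, a rough sketch of that (reusing listOf1330 and someMethod from the question):
// A sketch: map through the sequence straight into a plain HashSet instead of
// the LinkedHashSet that toSet() creates. Iteration order is not preserved.
val resultSet = listOf1330
    .asSequence()
    .map { someMethod(it) }
    .toCollection(HashSet())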

I agree with gidds' answer on HashSet and LinkedHashSet performance:
LinkedHashSet is more expensive for insertions than HashSet.
However, in the above use case, I think we can leverage parallelStream to improve performance. Under the hood, this delegates to Java's parallelStream.
val result: Set<String> = listOf("sdgds", "fdgdfsg", "dsfgsdfg")
    .parallelStream()
    .map { someMethod(it) }
    .collect(Collectors.toSet())
Collectors.toSet() uses a HashSet, so we should be fine from an insertion-performance perspective.

Use distinct or distinctBy.
val result = sequenceOf("a", "b", "a", "c").distinct()
// -> "a", "b", "c"
// for more complex cases, use a custom selector function
val result = getMyObjectsSequence().distinctBy { it.name }
This approach lets you keep using a sequence without involving explicit Iterables (List, Set, etc.).
Nevertheless, there is no magic: distinct still uses a HashSet under the hood, so for a really huge sequence it may cause significant memory usage, which must be kept in mind when applying this function.

Related

Kotlin: stream vs sequence - why multiple ways to do the same thing?

Is there a reason why there are multiple ways to do the same thing in Kotlin?
val viaSequence = items.asSequence()
    .filter { it % 2 == 0 }
    .map { it * 2 }
    .toList()
println(viaSequence)

val viaIterable = items.asIterable()
    .filter { it % 2 == 0 }
    .map { it * 2 }
    .toList()
println(viaIterable)

val viaStream = items.stream()
    .filter { it % 2 == 0 }
    .map { it * 2 }
    .toList()
println(viaStream)
I know that the following code creates a list on every step, which adds load to the GC, and as such should be avoided:
items.filter { it%2 == 0 }.map { it*2 }
Streams come from Java, where there are no inline functions, so Streams are the only way to use these functional chains on a collection in Java. Kotlin can do them directly on Iterables, which is better for performance in many cases because intermediate Stream objects don't need to be created.
Kotlin has Sequences as an alternative to Streams with these advantages:
- They use null to represent missing items instead of Optional. Nullable values are easier to work with in Kotlin because of its null safety features. It's also better for performance to avoid wrapping all the items in the collection.
- Some of the operators and aggregation functions are much more concise and avoid having to juggle generic types (compare Sequence.groupBy to Stream.collect).
- There are more operators provided for Sequences, which result in performance advantages and simpler code by cutting out intermediate steps.
- Many terminal operators are inline functions, so they omit the last wrapper that a Stream would need.
- The sequence builder lets you create a complicated lazy sequence of items with simple sequential syntax in a coroutine (see the sketch after this list). Very powerful.
- They work back to Java 1.6. Streams require Java 8 or higher. This one is irrelevant for Kotlin 1.5 and higher, since Kotlin now requires JDK 8 or higher.
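For instance, a small sketch of the sequence builder: an infinite, lazily computed Fibonacci sequence written with ordinary sequential code.
val fibonacci = sequence {
    var a = 0L
    var b = 1L
    while (true) {
        yield(a)            // suspends until the next value is requested
        val next = a + b
        a = b
        b = next
    }
}
println(fibonacci.take(10).toList())  // [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]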
The other answer mentions the advantages that Streams have.
Nice article comparing them here.
One of your three variants is the same as using the List itself:
items.asIterable()
    .filter { it % 2 == 0 }
Here, you're calling the exact same filter function as if you just called items.filter. The List itself is an Iterable so it's the Iterable filter that is called. This filter looks at all available elements and returns a complete list.
So the question is why we have both streams and sequences.
Streams are part of Java. Many of the terminal operations produce Optional. Other operations, such as toList(), will produce non-null-safe platform types such as List<Integer!>. On the other hand, sequences are native to Kotlin, and they can be used with Kotlin's own compile-time null-safety features. Also, sequences are available in non-JVM variants of Kotlin.
The Kotlin designers probably had to create a new class, because if they had just added new operations to Stream, e.g. as extension functions, they would clash with existing names (e.g. max() returns Optional in Java, and that's not ideal for Kotlin, but choosing a name other than the natural name max wouldn't be ideal either.)
So in most cases you should prefer sequences as they are more Kotlin-idiomatic. However, there are some things Java Streams can do that aren't yet available to sequences (for example, SummaryStatistics, or parallel operations). When you need an operation that is only available in Streams, but you have a Sequence, you can convert the Sequence to a Stream using asStream() (as well as vice versa).
Another advantage of Streams is that you can use primitive streams such as IntStream, to avoid unnecessary boxing/unboxing.
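As a sketch of both points (the numbers are arbitrary, and it assumes the stdlib's kotlin.streams.asStream extension):
import kotlin.streams.asStream

val stats = (1..1000).asSequence()
    .map { it * 2 }
    .asStream()              // hop over to the Stream API
    .mapToInt { it }         // switch to a primitive IntStream to avoid further boxing
    .summaryStatistics()     // an operation sequences don't offer
println("min=${stats.min} max=${stats.max} avg=${stats.average}")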

How to create a new list of Strings from a list of Longs in Kotlin? (inline if possible)

I have a list of Longs in Kotlin and I want to make them strings for UI purposes with maybe some prefix or altered in some way. For example, adding "$" in the front or the word "dollars" at the end.
I know I can simply iterate over them all like:
val myNewStrings = ArrayList<String>()
longValues.forEach { myNewStrings.add("$it dollars") }
I guess I'm just getting nitpicky, but I feel like there is a way to inline this or change the original long list without creating a new string list?
EDIT/UPDATE: Sorry for the initial confusion of my terms. I meant writing the code in one line and not inlining a function. I knew it was possible, but couldn't remember kotlin's map function feature at the time of writing. Thank you all for the useful information though. I learned a lot, thanks.
You are looking for map. map takes a lambda and creates a list based on the result of the lambda:
val myNewStrings = longValues.map { "$it dollars" }
map is an extension function with two generic types: the first is the type it iterates over and the second is the type it returns. The lambda we pass as an argument is actually transform: (T) -> R, so it has to be a function that receives a T (the source type) and returns an R (the lambda's result). Lambdas don't need an explicit return because the last expression is the return value by default.
You can use the map-function on List. It creates a new list where every element has been applied a function.
Like this:
val myNewStrings = longValues.map { "$it dollars" }
In Kotlin, inline is a keyword that refers to the compiler substituting a function call with the contents of the function directly. I don't think that's what you're asking about here; maybe you meant you want to write the code on one line.
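For reference, a minimal sketch of what inline actually means (myMap here is a hypothetical stand-in for stdlib functions such as map):
// Because myMap is inline, its body and the lambda passed to it are copied into
// the call site at compile time, so no Function object is allocated.
inline fun <T, R> myMap(items: List<T>, transform: (T) -> R): List<R> {
    val result = ArrayList<R>(items.size)
    for (item in items) result.add(transform(item))
    return result
}

val labels = myMap(listOf(1L, 2L, 3L)) { "$it dollars" }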
You might want to read over the Collections documentation, specifically the Mapping section.
The mapping transformation creates a collection from the results of a function on the elements of another collection. The basic mapping function is map(). It applies the given lambda function to each subsequent element and returns the list of the lambda results. The order of results is the same as the original order of elements.
val numbers = setOf(1, 2, 3)
println(numbers.map { it * 3 })
For your example, this would look as the others said:
val myNewStrings = longValues.map { "$it dollars" }
I feel like there is a way to inline this or change the original long list without creating a new string list?
No. You have Longs, and you want Strings. The only way is to create new Strings. You could avoid creating a new list by changing the type of the original list from List<Long> to MutableList<Any> and editing it in place, but that would be overkill and make the code overly complex, harder to follow, and more error-prone.
Like people have said, unless there's a performance issue here (like a billion strings where you're only using a handful) just creating the list you want is probably the way to go. You have a few options though!
Sequences are lazily evaluated: when there's a long chain of operations, they run the whole chain on each item in turn instead of creating an intermediate full list for every operation in the chain. That can mean less memory use, and more efficiency if you only need certain items or want to stop early. They have overhead, though, so you need to be sure it's worth it, and for your use case (turning a list into another list) there are no intermediate lists to avoid, and I'm guessing you're using the whole thing. It's probably better to just make the String list once and then use it.
Your other option is to make a function that takes a Long and makes a String - whatever function you're passing to map, basically, except use it when you need it. If you have a very large number of Longs and you really don't want every possible String version in memory, just generate them whenever you display them. You could make it an extension function or property if you like, so you can just go
fun Long.display() = "$this dollars"
val Long.dollaridoos: String get() = "$this dollars"
print(number.display())
print(number.dollaridoos)
or make a wrapper object holding your list and giving access to a stringified version of the values. Whatever's your jam
Also, the map approach is more efficient than creating an ArrayList and adding to it, because map can allocate a list with the correct capacity from the get-go. Arbitrarily adding to an unsized list will keep growing it: when it gets too big, it has to copy everything to another (larger) array, and that happens again every time it fills up.
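If you do want to build the list by hand, a sketch of pre-sizing it (reusing longValues from the question):
// Pre-size the destination so it never has to grow, then let mapTo fill it.
val myNewStrings = ArrayList<String>(longValues.size)
longValues.mapTo(myNewStrings) { "$it dollars" }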

Synchronize access to mutable fields with Kotlin's map delegation

Is this implementation safe to synchronize the access to the public fields/properties?
class Attributes(
    private val attrsMap: MutableMap<String, Any?> = Collections.synchronizedMap(HashMap())
) {
    var attr1: Long? by attrsMap
    var attr2: String? by attrsMap
    var attr3: Date? by attrsMap
    var attr4: Any? = null
    ...
}
Mostly.
Because the underlying map is only accessible via the synchronised wrapper, you can't have any issues caused by individual calls, such as simultaneous gets and/or puts (which are the main cause of race conditions): only one thread at a time can be making such a call, and the Java memory model ensures that the results are then visible to all threads.
You could have race conditions involving a sequence of calls, such as iterating through the map, or a check followed by a modify, if the map could be modified in between.  (That sort of problem can occur even on a single thread.)  But as long as the rest of your class avoided such sequences, and didn't leak a reference to the map, you'd be safe.
And because Long and String are immutable, you can't have any issues with their contents being modified. (Note that java.util.Date is technically mutable, so strictly speaking it carries the same risk as the Any case below.)
That is a concern with the Any parameter, though.  If it stored e.g. a StringBuilder, one thread could be modifying its contents while another was accessing it, with hilarious consequences.  There's not much you can do about that in a wrapper class, though.
By the way, instead of using a synchronised wrapper, you could use a ConcurrentHashMap, which would avoid the synchronisation in most cases (at the cost of a bit more memory).  It also provides many methods which can replace call sequences, such as getOrPut(); it's a really powerful tool for writing high-performance multithreaded code.
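For example, a small sketch of that getOrPut (the cache and values here are made up for illustration; note that ConcurrentHashMap does not accept null keys or values):
import java.util.concurrent.ConcurrentHashMap

// getOrPut on a ConcurrentMap is backed by putIfAbsent, so concurrent callers
// end up agreeing on a single stored value without any external locking.
val cache = ConcurrentHashMap<String, Int>()

fun expensiveLookup(key: String): Int =
    cache.getOrPut(key) { key.length * 31 }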

Efficiently make a view of (or copy) a subset of a large HashMap in Kotlin

I am trying to create a sub-hashmap from a huge hashmap without copying the original one.
Currently I use this:
val map = hashMapOf<Job, Int>()
val copy = HashMap(map)
listToRemoveFromCopy.forEach { copy.remove(it) }
This costs me around 50% of my current algorithm's time, because Java is calculating the hash of the Job really often.
I only want the map minus the listToRemoveFromCopy elements in a new variable, without removing those elements from the original map.
Does anyone know how to do this?
Thanks for the help.
First, you need to cache the hashcode for Job because any approach you use will be inefficient if you cannot have a set or a map of Job objects that operate at top speed.
Hopefully, the parts that go into the hash code are immutable; otherwise the object should not be used as a key. It is very dangerous to mutate a key's hashCode/equals while it is in use in a map or set. You should cache the hash code on the first call to hashCode() so that you don't incur the cost until then, unless you are sure you will always need it.
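As a sketch (the Job fields here are invented for illustration; assume they are immutable):
class Job(val id: String, val priority: Int) {
    // Computed once, on the first call to hashCode().
    private val cachedHash: Int by lazy { 31 * id.hashCode() + priority }

    override fun hashCode(): Int = cachedHash

    override fun equals(other: Any?): Boolean =
        other is Job && other.id == id && other.priority == priority
}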
Then change listToRemoveFromCopy to be a Set so it can be efficiently used in many ways. You need to do the prior step before this.
Now you have multiple options. The most efficient is:
Guava has a utility function Maps.filterKeys which returns a view into a map, and you can create a predicate that works against a Set of the items to remove.
val removeKeys = listToRemoveFromCopy.toSet()
val mapView = Maps.filterKeys(map, Predicates.not(Predicates.in(removeKeys)))
But be sure to note some methods on the view are not very efficient. If you avoid those methods, this will be the top performing option:
Many of the filtered map's methods, such as size(), iterate across every key/value mapping in the underlying map and determine which satisfy the filter. When a live view is not needed, it may be faster to copy the filtered map and use the copy.
If you need to make a copy instead, you have a few approaches:
Use filterKeys on the map to create a new map in one shot. This is good if the remove list might be a larger percentage of the total keys.
val removeKeys = listToRemoveFromCopy.toSet()
val newMap = map.filterKeys { it !in removeKeys }
Another tempting option to be careful about is the minus (-) operator, which copies the full map and then removes the items. It can use listToRemoveFromCopy as-is, without it being a set, but the full map copy might undo the benefit. So do not do this unless the remove list is a small percentage of the keys.
val newMapButSlower = map - listToRemoveFromCopy
You could pick one model over the other depending on the ratio between the map size and the remove-list size, and find the breaking point that works for your "huge".
Implementing your own view into the map to avoid a copy is possible, but not trivial (and by that I mean very complex). Every method you override has to do the correct thing at all times (including the map's own hashCode and equals), and other views would have to be created around the key set and values. The entrySet would be nasty to get right. I'd look for a pre-written solution before attempting your own (the Guava one above or other). This zero-copy model would be the most efficient solution but the most code and is what I would do in the same case if "huge" meant significant processing time. There is a lot that you can get wrong with this approach if you misunderstand any part of the implementation contract.
You could wrap the Guava solution with one that maintains the size attribute as items are manipulated and therefore be efficient for that case. You can also write a more efficient solution if you know the original map is read-only. For ideas, check out the Guava implementation of FilteredKeyMap and its ancestor AbstractFilteredMap.
In summary, caching your hash code is likely to give you the biggest result for the effort. Start there. You'll need it even for the Guava approach.
In addition to Axel's direct answer:
Could calculating the hashcode of a Job be optimised?  If the calculation can't be sped up, could it cache the result?  (There's ample precedent for this, including java.lang.String.)  Or if the class isn't under your control, could you create a delegate/wrapper that overrides the hashcode calculation?
You can use the filterKeys function. It will iterate the map only once:
val copy = map.filterKeys { it !in listToRemoveFromCopy }

How to choose between asIterable() vs asSequence()? [duplicate]

Both of these interfaces define only one method
public operator fun iterator(): Iterator<T>
Documentation says Sequence is meant to be lazy. But isn't Iterable lazy too (unless backed by a Collection)?
The key difference lies in the semantics and the implementation of the stdlib extension functions for Iterable<T> and Sequence<T>.
For Sequence<T>, the extension functions perform lazily where possible, similarly to Java Streams intermediate operations. For example, Sequence<T>.map { ... } returns another Sequence<R> and does not actually process the items until a terminal operation like toList or fold is called.
Consider this code:
val seq = sequenceOf(1, 2)
val seqMapped: Sequence<Int> = seq.map { print("$it "); it * it } // intermediate
print("before sum ")
val sum = seqMapped.sum() // terminal
It prints:
before sum 1 2
Sequence<T> is intended for lazy usage and efficient pipelining when you want to reduce the work done in terminal operations as much as possible, similar to Java Streams. However, laziness introduces some overhead, which is undesirable for common simple transformations of smaller collections and makes them less performant.
In general, there is no good way to determine when it is needed, so in Kotlin stdlib laziness is made explicit and extracted to the Sequence<T> interface to avoid using it on all the Iterables by default.
For Iterable<T>, by contrast, the extension functions with intermediate-operation semantics work eagerly: they process the items right away and return another Iterable. For example, Iterable<T>.map { ... } returns a List<R> with the mapping results in it.
The equivalent code for Iterable:
val lst = listOf(1, 2)
val lstMapped: List<Int> = lst.map { print("$it "); it * it }
print("before sum ")
val sum = lstMapped.sum()
This prints out:
1 2 before sum
As said above, Iterable<T> is non-lazy by default, and this approach holds up well: in most cases it has good locality of reference, taking advantage of CPU caching, prediction, prefetching, etc., so that even repeatedly copying a collection still works well enough and performs better for simple cases with small collections.
If you need more control over the evaluation pipeline, there is an explicit conversion to a lazy sequence with Iterable<T>.asSequence() function.
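For example, a minimal sketch of opting in:
val sum = listOf(1, 2)
    .asSequence()
    .map { print("$it "); it * it }  // nothing is processed or printed yet
    .sum()                           // prints "1 2 " only when this terminal operation runs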
Completing hotkey's answer:
It is important to notice how Sequence and Iterable iterate through your elements:
Sequence example:
list.asSequence().filter { field ->
    Log.d("Filter", "filter")
    field.value > 0
}.map {
    Log.d("Map", "Map")
}.forEach {
    Log.d("Each", "Each")
}
Log result:
filter - Map - Each; filter - Map - Each
Iterable example:
list.filter { field ->
    Log.d("Filter", "filter")
    field.value > 0
}.map {
    Log.d("Map", "Map")
}.forEach {
    Log.d("Each", "Each")
}
filter - filter - Map - Map - Each - Each
Iterable is mapped to the java.lang.Iterable interface on the JVM, and is implemented by commonly used collections, like List or Set. The collection extension functions on these are evaluated eagerly, which means they all immediately process all elements in their input and return a new collection containing the result.
Here’s a simple example of using the collection functions to get the names of the first five people in a list whose age is at least 21:
val people: List<Person> = getPeople()
val allowedEntrance = people
.filter { it.age >= 21 }
.map { it.name }
.take(5)
First, the age check is done for every single Person in the list, with the result put in a brand new list. Then, the mapping to their names is done for every Person who remained after the filter operator, ending up in yet another new list (this is now a List<String>). Finally, there’s one last new list created to contain the first five elements of the previous list.
In contrast, Sequence is a new concept in Kotlin to represent a lazily evaluated collection of values. The same collection extensions are available for the Sequence interface, but these immediately return Sequence instances that represent a processed state of the data, without actually processing any elements. To start processing, the Sequence has to be terminated with a terminal operator; these are basically a request to the Sequence to materialize the data it represents in some concrete form. Examples include toList, toSet, and sum, to mention just a few. When these are called, only the minimum required number of elements will be processed to produce the demanded result.
Transforming an existing collection to a Sequence is pretty straightforward, you just need to use the asSequence extension. As mentioned above, you also need to add a terminal operator, otherwise the Sequence will never do any processing (again, lazy!).
val people: List<Person> = getPeople()
val allowedEntrance = people.asSequence()
.filter { it.age >= 21 }
.map { it.name }
.take(5)
.toList()
In this case, the Person instances in the Sequence are each checked for their age, if they pass, they have their name extracted, and then added to the result list. This is repeated for each person in the original list until there are five people found. At this point, the toList function returns a list, and the rest of the people in the Sequence are not processed.
There’s also something extra a Sequence is capable of: it can contain an infinite number of items. With this in perspective, it makes sense that operators work the way they do - an operator on an infinite sequence could never return if it did its work eagerly.
As an example, here’s a sequence that will generate as many powers of 2 as required by its terminal operator (ignoring the fact that this would quickly overflow):
generateSequence(1) { n -> n * 2 }
    .take(20)
    .forEach(::println)
You can find more here.
Iterable is good enough for most use cases; the way iteration is performed on it works very well with CPU caches because of spatial locality. But the issue is that the whole collection must pass through the first intermediate operation before it moves to the second, and so on.
In sequence each item passes through the full pipeline before the next is handled.
This property can be detrimental to the performance of your code, especially when iterating over a large data set. So if your terminal operation is very likely to terminate early, a sequence should be the preferred choice, because you save by not performing unnecessary operations. For example:
sequence.filter { getFilterPredicate() }
    .map { getTransformation() }
    .first { getSelector() }
In the above case, if the first item satisfies the filter predicate and, after the map transformation, meets the selection criteria, then filter, map, and first are each invoked only once.
In the case of an iterable, the whole collection must first be filtered, then mapped, and only then does the selection of the first element start.
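For contrast, a sketch of the eager equivalent, reusing the placeholder functions from above:
// Eager (Iterable) version: filter walks the whole list and builds a new list,
// map walks that whole list and builds another, and only then does first() select.
list.filter { getFilterPredicate() }
    .map { getTransformation() }
    .first { getSelector() }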