Kotlin: stream vs sequence - why multiple ways to do the same thing? - kotlin

Is there a reason why there are multiple ways to do the same thing in Kotlin
val viaSequence = items.asSequence()
.filter { it%2 == 0 }
.map { it*2 }
.toList()
println(viaSequence)
val viaIterable = items.asIterable()
.filter { it%2 == 0 }
.map { it*2 }
.toList()
println(viaIterable)
val viaStream = items.stream()
.filter { it%2 == 0 }
.map { it*2 }
.toList()
println(viaStream)
I know that the following code creates a list on every step, which adds load to the GC, and as such should be avoided:
items.filter { it%2 == 0 }.map { it*2 }

Streams come from Java, where there are no inline functions, so Streams are the only way to use these functional chains on a collection in Java. Kotlin can do them directly on Iterables, which is better for performance in many cases because intermediate Stream objects don't need to be created.
Kotlin has Sequences as an alternative to Streams with these advantages:
They use null to represent missing items instead of Optional. Nullable values are easier to work with in Kotlin because of its null safety features. It's also better for performance to avoid wrapping all the items in the collection.
Some of the operators and aggregation functions are much more concise and avoid having to juggle generic types (compare Sequence.groupBy to Stream.collect).
There are more operators provided for Sequences, which result in performance advantages and simpler code by cutting out intermediate steps.
Many terminal operators are inline functions, so they omit the last wrapper that a Stream would need.
The sequence builder lets you create a complicated lazy sequence of items with simple sequential syntax in a coroutine. Very powerful.
They work back to Java 1.6. Streams require Java 8 or higher. This one is irrelevant for Kotlin 1.5 and higher since Kotlin now requires JDK 8 or higher.
The other answer mentions the advantages that Streams have.
Nice article comparing them here.

One of your three variants is the same as using the List itself:
items.asIterable()
.filter { it%2 == 0 }
Here, you're calling the exact same filter function as if you just called items.filter. The List itself is an Iterable so it's the Iterable filter that is called. This filter looks at all available elements and returns a complete list.
So the question is why we have both streams and sequences.
Streams are part of Java. Many of the terminal operations produce Optional. Other operations, such as toList(), will produce non-null-safe platform types such as List<Integer!>. On the other hand, sequences are native to Kotlin, and they can be used with Kotlin's own compile-time null-safety features. Also, sequences are available in non-JVM variants of Kotlin.
The Kotlin designers probably had to create a new class, because if they had just added new operations to Stream, e.g. as extension functions, they would clash with existing names (e.g. max() returns Optional in Java, and that's not ideal for Kotlin, but choosing a name other than the natural name max wouldn't be ideal either.)
So in most cases you should prefer sequences as they are more Kotlin-idiomatic. However, there are some things Java Streams can do that aren't yet available to sequences (for example, SummaryStatistics, or parallel operations). When you need an operation that is only available in Streams, but you have a Sequence, you can convert the Sequence to a Stream using asStream() (as well as vice versa).
Another advantage of Streams is that you can use primitive streams such as IntStream, to avoid unnecessary boxing/unboxing.

Related

Is there a way to use monad comprehensions with Kotlin Flow

Kotlin coroutines and Arrow are a nice way to avoid nesting flatmaps, introducing monadic comprehensions in Kotlin. However Kotlin's Flow type still relies on declarative flatmapping, so we get into a mixture of direct and declarative styles:
override suspend fun findAll(page: Pageable): Either<BusinessException, Flow<PageElement<ClientOut>>> = either {
val count = clientRepository.count().awaitSingle().bind()
return clientRepository.findByIdNotNull(page).asFlow()
.flatMapMerge { client ->
flow { emit(mapDetailedClientOut(client)) }
}
}
val count has been bound inside the either {...} comprehension. However, there doesn't seem to be a way to do the same with Flow, forcing us to nest a flatmapMerge().
Is there a way to do it, or is it planned to be somehow included in the near future?
Sadly there is currently no way to build comphrehensions for the KotlinX Flow datatype, since Coroutines in Kotlin only support for single-shot emission/bind.
Therefore it's only possible to build comphrensions for data types with 0..1 elements such as Either or Nullable, but not 0..N like the Flow or List data types.

A shorter way to transform enumValues to a multi-value map

New to Kotlin. I was thinking if there is a shorter way to write the following piece of code. It means to categorize enum values into a multi-value map.
fun androidPermissionsByCategory(): Map<String, List<String>> {
val result = hashMapOf<String, MutableList<String>>()
enumValues<AndroidPermission>().onEach {
result.getOrPut(it.permissionGroup, { mutableListOf() }).add(it.value())
}
return result
}
Suggestions?
That's good, solid code (especially if you're not experienced in Kotlin): good use of types, and the getOrPut() function.  (The only tweak I'd suggest would be to change hashMapOf() to mutableMapOf(), since you don't care about the exact type of map.  You could also replace the add() call with += operator, though that's more disputable.)
However, there's a shorter alternative in a more functional style:
fun androidPermissionsByCategory(): Map<String, List<String>>
= enumValues<AndroidPermission>()
.groupBy({ it.permissionGroup }, { it.value() })
(Disclaimer: I don't have Android libs, so I can't test this exactly.)
This uses the standard library's groupBy() function, which does exactly what you want: it compares items (using a key-selector lambda you provide), and uses it to create a multimap from them.
Here we're using the enum's permissionGroup field as the key.
We're also using the optional second parameter to transform the values in the multimap, in this case getting their value().
You'll find that Kotlin has functional alternatives to many of the common imperative patterns for constructing and transforming maps, lists, and similar structures; these are often more concise, more declarative, and easier to read.  Any time you find yourself looping over a list or similar, it's worth asking whether there's a better way.  (As you have here!  Your intuition is clearly working well :-)

Kotlin most efficient way sequence to set

Hi I've got list of 1330 objects and would like to apply method and obtain set as result.
val result = listOf1330
.asSequence()
.map {
someMethod(it)
}
val resultSet = result.toSet()
It works fine without toSet but if then execution time is about 10 times longer.
I've used sequence to make it work faster and it is but as a result I need list without duplicates (set).
Simply: What is most effective way to convert sequence to set?
val result = listOf1330.mapTo(HashSet()) { someMethod(it) }
It makes less sense to use streams or sequences to implement the transformation - you will need all elements from the collection, not several. The mapTo (and map) functions are inline in Kotlin. It means the code will be substituted into the call site, it will not have lambda created and executed many times. We use mapTo to avoid the second copy of the collection done by the toSet() function.
The .parallelStream() may add more performance, if you like to run the computation in several threads. It is still a good idea to measure how good the load is balanced between threads. The performance may depend on the collection implementation class, on which you call it
If your someObject has a slow implementation of equals() or hashCode(), or gives the same hash code for many objects, then that could account for the delay, and you may be able to improve it.
Otherwise, if the objects are big, the delay may be mostly due to the amount of memory that must be accessed to store them all; if so, that's the price you'll have to pay if you want a set with all those objects in memory.
Sequence.toSet() uses a LinkedHashSet.  You could try providing another Set instance, using e.g. toCollection(HashSet()), to see if that's any faster.  (You wouldn't get the same iteration order, though.)
I agree with gidds answer on HashSet and LinkedHashSet performance.
LinkedHashSet is more expensive for insertions than HashSet;
However, in the above use case, I think we can leverage parallelStream to improve the performance. Under the hood, Kotlin uses the Java parallelStream.
val result: Set<String> = listOf("sdgds", "fdgdfsg", "dsfgsdfg")
.parallelStream()
.map {
someMethod(it)
}.collect(Collectors.toSet())
The Collectors.toSet() uses HashSet. So, we should be ok in insertion performance perspective.
Use distict or distictBy.
val result = sequenceOf("a", "b", "a", "c").distinct()
// -> "a", "b", "c"
// for more complex cases use custom comparator function
val result = getMyObjectsSequence().distinctBy { it.name }
This approach lets keep using sequence without involving explicit Iterables (List, Set, etc.).
Nevertheless, there is no magic, and "distinct" still uses HashSet under the hood and in case of really huge sequence it may cause sufficient memory usage and it must be kept in mind while applying this function.

Kotlin benifits of writing helper/util methods without wrapping in class

There are can be two ways of writing helper method in Kotlin
First is
object Helper {
fun doSomething(a: Any, b: Any): Any {
// Do some businesss logic and return result
}
}
Or simply writing this
fun doSomething(a: Any, b: Any): Any {
// Do some businesss logic and return result
}
inside a Helper.kt class.
So my question is in terms of performance and maintainability which is better and why?
In general, your first choice should be top-level functions. If a function has a clear "primary" argument, you can make it even more idiomatic by extracting it as the receiver of an extension function.
The object is nothing more than a holder of the namespace of its member functions. If you find that you have several groups of functions that you want to categorize, you can create several objects for them so you can qualify the calls with the object's name. There's little beyond this going in their favor in this role.
object as a language feature makes a lot more sense when it implements a well-known interface.
There's a third and arguably more idiomatic way: extension functions.
fun Int.add(b: Int): Int = this + b
And to use it:
val x = 1
val y = x.add(3) // 4
val z = 1.add(3) // 4
In terms of maintainability, I find extension functions just as easy to maintain as top-level functions or helper classes. I'm not a big fan of helper classes because they end up acquiring a lot of cruft over time (things people swear we'll reuse but never do, oddball variants of what we already have for special use cases, etc).
In terms of performance, these are all going to resolve more or less the same way - statically. The Kotlin compiler is effectively going to compile all of these down to the same java code - a class with a static method.

How to choose between asIterable() vs asSequence()? [duplicate]

Both of these interfaces define only one method
public operator fun iterator(): Iterator<T>
Documentation says Sequence is meant to be lazy. But isn't Iterable lazy too (unless backed by a Collection)?
The key difference lies in the semantics and the implementation of the stdlib extension functions for Iterable<T> and Sequence<T>.
For Sequence<T>, the extension functions perform lazily where possible, similarly to Java Streams intermediate operations. For example, Sequence<T>.map { ... } returns another Sequence<R> and does not actually process the items until a terminal operation like toList or fold is called.
Consider this code:
val seq = sequenceOf(1, 2)
val seqMapped: Sequence<Int> = seq.map { print("$it "); it * it } // intermediate
print("before sum ")
val sum = seqMapped.sum() // terminal
It prints:
before sum 1 2
Sequence<T> is intended for lazy usage and efficient pipelining when you want to reduce the work done in terminal operations as much as possible, same to Java Streams. However, laziness introduces some overhead, which is undesirable for common simple transformations of smaller collections and makes them less performant.
In general, there is no good way to determine when it is needed, so in Kotlin stdlib laziness is made explicit and extracted to the Sequence<T> interface to avoid using it on all the Iterables by default.
For Iterable<T>, on contrary, the extension functions with intermediate operation semantics work eagerly, process the items right away and return another Iterable. For example, Iterable<T>.map { ... } returns a List<R> with the mapping results in it.
The equivalent code for Iterable:
val lst = listOf(1, 2)
val lstMapped: List<Int> = lst.map { print("$it "); it * it }
print("before sum ")
val sum = lstMapped.sum()
This prints out:
1 2 before sum
As said above, Iterable<T> is non-lazy by default, and this solution shows itself well: in most cases it has good locality of reference thus taking advantage of CPU cache, prediction, prefetching etc. so that even multiple copying of a collection still works good enough and performs better in simple cases with small collections.
If you need more control over the evaluation pipeline, there is an explicit conversion to a lazy sequence with Iterable<T>.asSequence() function.
Completing hotkey's answer:
It is important to notice how Sequence and Iterable iterates throughout your elements:
Sequence example:
list.asSequence().filter { field ->
Log.d("Filter", "filter")
field.value > 0
}.map {
Log.d("Map", "Map")
}.forEach {
Log.d("Each", "Each")
}
Log result:
filter - Map - Each; filter - Map - Each
Iterable example:
list.filter { field ->
Log.d("Filter", "filter")
field.value > 0
}.map {
Log.d("Map", "Map")
}.forEach {
Log.d("Each", "Each")
}
filter - filter - Map - Map - Each - Each
Iterable is mapped to the java.lang.Iterable interface on the
JVM, and is implemented by commonly used collections, like List or
Set. The collection extension functions on these are evaluated
eagerly, which means they all immediately process all elements in
their input and return a new collection containing the result.
Here’s a simple example of using the collection functions to get the
names of the first five people in a list whose age is at least 21:
val people: List<Person> = getPeople()
val allowedEntrance = people
.filter { it.age >= 21 }
.map { it.name }
.take(5)
Target platform: JVMRunning on kotlin v. 1.3.61 First, the age check
is done for every single Person in the list, with the result put in a
brand new list. Then, the mapping to their names is done for every
Person who remained after the filter operator, ending up in yet
another new list (this is now a List<String>). Finally, there’s one
last new list created to contain the first five elements of the
previous list.
In contrast, Sequence is a new concept in Kotlin to represent a lazily
evaluated collection of values. The same collection extensions are
available for the Sequence interface, but these immediately return
Sequence instances that represent a processed state of the date, but
without actually processing any elements. To start processing, the
Sequence has to be terminated with a terminal operator, these are
basically a request to the Sequence to materialize the data it
represents in some concrete form. Examples include toList, toSet,
and sum, to mention just a few. When these are called, only the
minimum required number of elements will be processed to produce the
demanded result.
Transforming an existing collection to a Sequence is pretty
straightfoward, you just need to use the asSequence extension. As
mentioned above, you also need to add a terminal operator, otherwise
the Sequence will never do any processing (again, lazy!).
val people: List<Person> = getPeople()
val allowedEntrance = people.asSequence()
.filter { it.age >= 21 }
.map { it.name }
.take(5)
.toList()
Target platform: JVMRunning on kotlin v. 1.3.61 In this case, the
Person instances in the Sequence are each checked for their age, if
they pass, they have their name extracted, and then added to the
result list. This is repeated for each person in the original list
until there are five people found. At this point, the toList function
returns a list, and the rest of the people in the Sequence are not
processed.
There’s also something extra a Sequence is capable of: it can contain
an infinite number of items. With this in perspective, it makes sense
that operators work the way they do - an operator on an infinite
sequence could never return if it did its work eagerly.
As an example, here’s a sequence that will generate as many powers of
2 as required by its terminal operator (ignoring the fact that this
would quickly overflow):
generateSequence(1) { n -> n * 2 }
.take(20)
.forEach(::println)
You can find more here.
Iterable is good enough for most use cases, the way iteration is performed on them it works very well with caches because of the spatial locality. But the issue with them is that whole collection must pass through first intermediate operation before it moves to second and so on.
In sequence each item passes through the full pipeline before the next is handled.
This property can be determental to the performance of your code especially when iterating over large data set. so, if your terminal operation is very likely to terminate early then sequence should be preferred choice because you save by not performing unnecessary operations. for example
sequence.filter { getFilterPredicate() }
.map { getTransformation() }
.first { getSelector() }
In above case if first item satisfies the filter predicate and after map transformation meets the selection criteria then filter, map and first are invoked only once.
In case of iterable whole collection must first be filtered then mapped and then first selection starts