Kotlin and parallelStream toArray

Kotlin and parallelStream toArray - kotlin

I think I'm getting sidetracked. I'm trying to utilize the Java parallelStream for performance reasons.
A function Specimen.pick() samples and returns an instance of Specimen.
I want to parallize this with parallelStream when replacing pool.
var pool: Array<Specimen> = Array(100_000) ..
This is what I'm trying to write in Kotlin:
pool = pool.asList().parallelStream().map { Specimen.pick(pool, wheel, r.split()) }.toArray(Specimen::new)
Which errors out on ::new
Instead I have to juggle back and forth between list and array:
pool = pool.asList().parallelStream().map { Specimen.pick(pool, wheel, r.split()) }.collect(Collectors.toList()).toTypedArray()
Which works, but seems resource wasteful rather than going directly to Array. If I let IntelliJ try to Kotlinize a Java example of this:
Java:
Person[] men = people.stream()
.filter(p -> p.getGender() == MALE)
.toArray(Person[]::new);
IntelliJ transformation:
val men = people.stream()
.filter({ p -> p.getGender() === MALE })
.toArray(Person[]::new /* Currently unsupported in Kotlin */)
So maybe this is coming for Kotlin? Or is there another better way?

You can't use an array constructor reference, but every method reference can be expressed using a lambda expression:
.toArray<Person>({length -> arrayOfNulls(length)})

Related

Kotlin Map and Compute Functions

I am trying to re-compute values for all entries in a mutable map without changing the keys.
Here is the code (with comments showing what I am expecting)
fun main(args: Array<String>) {
var exampleMap: MutableMap<Int, String> = mutableMapOf(1 to "One", 2 to "Two")
// Before it's {1-"One", 2-"Two"}
with(exampleMap) {
compute(k){
_,v -> k.toString + "__" + v
}
}
// Expecting {1->"1__One", 2->"2__Two"}
}
I don't want to use mapValues because it creates a copy of the original map. I think compute is what I need, but I am struggling to write this function. Or, rather I don't know how to write this. I know the equivalent in Java (as I am a Java person) and it would be:
map.entrySet().stream().forEach(e-> map.compute(e.getKey(),(k,v)->(k+"__"+v)));
How can I achieve that in Kotlin with compute ?
Regards,

The best way to do this is by iterating over the map's entries. This ensures you avoid any potential concurrent modification issues that might arise due to modifying the map while iterating over it.
map.entries.forEach { entry ->
entry.setValue("${entry.key}__${entry.value}")
}

Use onEach function:
var exampleMap: MutableMap<Int, String> = mutableMapOf(1 to "One", 2 to "Two")
println(exampleMap)
exampleMap.onEach {
exampleMap[it.key] = "${it.key}__${it.value}"
}
println(exampleMap)
Output:
{1=One, 2=Two}
{1=1__One, 2=2__Two}
onEach performs the given action on each element and returns the list/array/map itself afterwards.

Why can't I use map on Kotlins Regex Result sequence

I worked with Kotlin's Regex API to get occurences of some regular expression. I wanted to convert the finding directly into another object so I intuitively used map() on the result sequence.
I was very surprised that the map function is never called but forEach is working. This example should make it clear:
val regex = "a.".toRegex()
val txt = "abacad"
var counter = 0
regex.findAll(txt).forEach { counter++ }
println(counter) // 3
regex.findAll(txt).map { counter++ }
println(counter) // still 3 since map is not called
regex.findAll(txt).forEach { counter++ }
println(counter) // 6
My question is why? Did I oversee it in the documentation?
(tested on Kotlin 1.5.30)

findAll() returns a Sequence<MatchResult>. Operations on Sequence are classified either as intermediate or terminal. The documentation for the functions declares which type they are. map and onEach are intermediate. Their action is deferred until a terminal operation is made. forEach is terminal.
Manipulating a Sequence with map returns a new Sequence that will perform the mapping function only when it is actually iterated, such as by a call to forEach or using it in a for loop.
This is the purpose of Sequence, to defer mutating functional calls. It can reduce allocations of intermediate Lists, or in some cases avoid applying the mutations on every single item, such as if the terminal call in the chain is a find() call.

Async Wait Efficient Execution

I need to iterate 100's of ids in parallel and collect the result in list. I am trying to do it in following way
val context = newFixedThreadPoolContext(5, "custom pool")
val list = mutableListOf<String>()
ids.map {
val result:Deferred<String> = async(context) {
getResult(it)
}
//list.add(result.await()
}.mapNotNull(result -> list.add(result.await())
I am getting error at
mapNotNull(result -> list.add(result.await())
as await method is not available. Why await is not applicable at this place? Instead commented line
//list.add(result.await()
is working fine.
What is the best way to run this block in parallel using coroutine with custom thread pool?

Generally, you go in the right direction: you need to create a list of Deferred and then await() on them.
If this is exactly the code you are using then you did not return anything from your first map { } block, so you don't get a List<Deferred> as you expect, but List<Unit> (list of nothing). Just remove val result:Deferred<String> = - this way you won't assign result to a variable, but return it from the lambda. Also, there are two syntactic errors in the last line: you used () instead of {} and there is a missing closing parenthesis.
After these changes I believe your code will work, but still, it is pretty weird. You seem to mix two distinct approaches to transform a collection into another. One is using higher-order functions like map() and another is using a loop and adding to a list. You use both of them at the same time. I think the following code should do exactly what you need (thanks #Joffrey for improving it):
val list = ids.map {
async(context) {
getResult(it)
}
}.awaitAll().filterNotNull()

With Arrow: How do I apply a transformation of type (X)->IO<Y> to data of type Sequence<X> to get IO<Sequence<Y>>?

I am learning functional programming using Arrow.kt, intending to walk a path hierarchy and hash every file (and do some other stuff). Forcing myself to use functional concepts as much as possible.
Assume I have a data class CustomHash(...) already defined in code. It will be referenced below.
First I need to build a sequence of files by walking the path. This is an impure/effectful function, so it should be marked as such with the IO monad:
fun getFiles(rootPath: File): IO<Sequence<File>> = IO {
rootPath.walk() // This function is of type (File)->Sequence<File>
}
I need to read the file. Again, impure, so this is marked with IO
fun getRelevantFileContent(file: File): IO<Array<Byte>> {
// Assume some code here to extract only certain data relevant for my hash
}
Then I have a function to compute a hash. If it takes a byte array, then it's totally pure. Making it suspend because it will be slow to execute:
suspend fun computeHash(data: Array<Byte>): CustomHash {
// code to compute the hash
}
My issue is how to chain this all together in a functional manner.
fun main(rootPath: File) {
val x = getFiles(rootPath) // IO<Sequence<File>>
.map { seq -> // seq is of type Sequence<File>
seq.map { getRelevantFileContent(it) } // This produces Sequence<IO<Hash>>
}
}
}
Right now, if I try this, x is of type IO<Sequence<IO<Hash>>>. It is clear to me why this is the case.
Is there some way of turning Sequence<IO<Any>> into IO<Sequence<Any>>? Which I suppose is essentially, probably getting the terms imprecise, taking blocks of code that execute in their own coroutines and running the blocks of code all on the same coroutine instead?
If Sequence weren't there, I know IO<IO<Hash>> could have been IO<Hash> by using a flatMap in there, but Sequence of course doesn't have that flattening of IO capabilities.
Arrow's documentation has a lot of "TODO" sections and jumps very fast into documentation that presumes a lot of intermediate/advanced functional programming knowledge. It hasn't really been helpful for this problem.

First you need to convert the Sequence to SequenceK then you can use the sequence function to do that.
import arrow.fx.*
import arrow.core.*
import arrow.fx.extensions.io.applicative.applicative
val sequenceOfIOs: Sequence<IO<Any>> = TODO()
val ioOfSequence: IO<Sequence<Any>> = sequenceOfIOs.k()
.sequence(IO.applicative())
.fix()

Is there any way to iterate all fields of a data class without using reflection?

I know an alternative of reflection which is using javassist, but using javassist is a little bit complex. And because of lambda or some other features in koltin, the javassist doesn't work well sometimes. So is there any other way to iterate all fields of a data class without using reflection.

There are two ways. The first is relatively easy, and is essentially what's mentioned in the comments: assuming you know how many fields there are, you can unpack it and throw that into a list, and iterate over those. Or alternatively use them directly:
data class Test(val x: String, val y: String) {
fun getData() : List<Any> = listOf(x, y)
}
data class Test(val x: String, val y: String)
...
val (x, y) = Test("x", "y")
// And optionally throw those in a list
Although iterating like this is a slight extra step, this is at least one way you can relatively easy unpack a data class.
If you don't know how many fields there are (or you don't want to refactor), you have two options:
The first is using reflection. But as you mentioned, you don't want this.
That leaves a second, somewhat more complicated preprocessing option: annotations. Note that this only works with data classes you control - beyond that, you're stuck with reflection or implementations from the library/framework coder.
Annotations can be used for several things. One of which is metadata, but also code generation. This is a somewhat complicated alternative, and requires an additional module in order to get compile order right. If it isn't compiled in the right order, you'll end up with unprocessed annotations, which kinda defeats the purpose.
I've also created a version you can use with Gradle, but that's at the end of the post and it's a shortcut to implementing it yourself.
Note that I have only tested this with a pure Kotlin project - I've personally had problems with annotations between Java and Kotlin (although that was with Lombok), so I do not guarantee this will work at compile time if called from Java. Also note that this is complex, but avoids runtime reflection.
Explanation
The main issue here is a certain memory concern. This will create a new list every time you call the method, which makes it very similar to the method used by enums.
Local testing over 10000 iterations also show a general consistency of ~200 milliseconds to execute my approach, versus roughly 600 for reflection. However, for one iteration, mine uses ~20 milliseconds, where as reflection uses between 400 and 500 milliseconds. On one run, reflection took 1500 (!) milliseconds, while my approach took 18 milliseconds.
See also Java Reflection: Why is it so slow?. This appears to affect Kotlin as well.
The memory impact of creating a new list every time it's called can be noticeable though, but it'll also be collected so it shouldn't be that big a problem.
For reference, the code used for benchmarking (this will make sense after the rest of the post):
#AutoUnpack data class ExampleDataClass(val x: String, val y: Int, var m: Boolean)
fun main(a: Array<String>) {
var mine = 0L
var reflect = 0L
// for(i in 0 until 10000) {
var start = System.currentTimeMillis()
val cls = ExampleDataClass("example", 42, false)
for (field in cls) {
println(field)
}
mine += System.currentTimeMillis() - start
start = System.currentTimeMillis()
for (prop in ExampleDataClass::class.memberProperties) {
println("${prop.name} = ${prop.get(cls)}")
}
reflect += System.currentTimeMillis() - start
// }
println(mine)
println(reflect)
}
Setting up from scratch
This bases itself around two modules: a consumer module, and a processor module. The processor HAS to be in a separate module. It needs to be compiled separately from the consumer for the annotations to work properly.
First of all, your consumer project needs the annotation processor:
apply plugin: 'kotlin-kapt'
Additionally, you need to add stub generation. It complains it's unused while compiling, but without it, the generator seems to break for me:
kapt {
generateStubs = true
}
Now that that's in order, create a new module for the unpacker. Add the Kotlin plugin if you didn't already. You do not need the annotation processor Gradle plugin in this project. That's only needed by the consumer. You do, however, need kotlinpoet:
implementation "com.squareup:kotlinpoet:1.2.0"
This is to simplify aspects of the code generation itself, which is the important part here.
Now, create the annotation:
#Retention(AnnotationRetention.SOURCE)
#Target(AnnotationTarget.CLASS)
annotation class AutoUnpack
This is pretty much all you need. The retention is set to source because it has no value at runtime, and it only targets compile time.
Next, there's the processor itself. This is somewhat complicated, so bear with me. For reference, this uses the javax.* packages for annotation processing. Android note: this might work assuming you can plug in a Java module on a compileOnly scope without getting the Android SDK restrictions. As I mentioned earlier, this is mainly for pure Kotlin; Android might work, but I haven't tested that.
Anyways, the generator:
Because I couldn't find a way to generate the method into the class without touching the rest (and because according to this, that isn't possible), I'm going with an extension function generation approach.
You'll need a class UnpackCodeGenerator : AbstractProcessor(). In there, you'll first need two lines of boilerplate:
override fun getSupportedAnnotationTypes(): MutableSet<String> = mutableSetOf(AutoUnpack::class.java.name)
override fun getSupportedSourceVersion(): SourceVersion = SourceVersion.latest()
Moving on, there's the processing. Override the process function:
override fun process(annotations: MutableSet<out TypeElement>, roundEnv: RoundEnvironment): Boolean {
// Find elements with the annotation
val annotatedElements = roundEnv.getElementsAnnotatedWith(AutoUnpack::class.java)
if(annotatedElements.isEmpty()) {
// Self-explanatory
return false;
}
// Iterate the elements
annotatedElements.forEach { element ->
// Grab the name and package
val name = element.simpleName.toString()
val pkg = processingEnv.elementUtils.getPackageOf(element).toString()
// Then generate the class
generateClass(name,
if (pkg == "unnamed package") "" else pkg, // This is a patch for an issue where classes in the root
// package return package as "unnamed package" rather than empty,
// which breaks syntax because "package unnamed package" isn't legal.
element)
}
// Return true for success
return true;
}
This just sets up some of the later framework. The real magic happens in the generateClass function:
private fun generateClass(className: String, pkg: String, element: Element){
val elements = element.enclosedElements
val classVariables = elements
.filter {
val name = if (it.simpleName.contains("\$delegate"))
it.simpleName.toString().substring(0, it.simpleName.indexOf("$"))
else it.simpleName.toString()
it.kind == ElementKind.FIELD // Find fields
&& Modifier.STATIC !in it.modifiers // that aren't static (thanks to sebaslogen for issue #1: https://github.com/LunarWatcher/KClassUnpacker/issues/1)
// Additionally, we have to ignore private fields. Extension functions can't access these, and accessing
// them is a bad idea anyway. Kotlin lets you expose get without exposing set. If you, by default, don't
// allow access to the getter, there's a high chance exposing it is a bad idea.
&& elements.any { getter -> getter.kind == ElementKind.METHOD // find methods
&& getter.simpleName.toString() ==
"get${name[0].toUpperCase().toString() + (if (name.length > 1) name.substring(1) else "")}" // that matches the getter name (by the standard convention)
&& Modifier.PUBLIC in getter.modifiers // that are marked public
}
} // Grab the variables
.map {
// Map the name now. Also supports later filtering
if (it.simpleName.endsWith("\$delegate")) {
// Support by lazy
it.simpleName.subSequence(0, it.simpleName.indexOf("$"))
} else it.simpleName
}
if (classVariables.isEmpty()) return; // Self-explanatory
val file = FileSpec.builder(pkg, className)
.addFunction(FunSpec.builder("iterator") // For automatic unpacking in a for loop
.receiver(element.asType().asTypeName().copy()) // Add it as an extension function of the class
.addStatement("return listOf(${classVariables.joinToString(", ")}).iterator()") // add the return statement. Create a list, push an iterator.
.addModifiers(KModifier.PUBLIC, KModifier.OPERATOR) // This needs to be public. Because it's an iterator, the function also needs the `operator` keyword
.build()
).build()
// Grab the generate directory.
val genDir = processingEnv.options["kapt.kotlin.generated"]!!
// Then write the file.
file.writeTo(File(genDir, "$pkg/${element.simpleName.replace("\\.kt".toRegex(), "")}Generated.kt"))
}
All of the relevant lines have comments explaining use, in case you're not familiar with what this does.
Finally, in order to get the processor to process, you need to register it. In the module for the generator, add a file called javax.annotation.processing.Processor under main/resources/META-INF/services. In there you write:
com.package.of.UnpackCodeGenerator
From here, you need to link it using compileOnly and kapt. If you added it as a module to your project, you can do:
kapt project(":ClassUnpacker")
compileOnly project(":ClassUnpacker")
Alternative source setup:
Like I mentioned earlier, I bundled this into a jar for convenience. It's under the same license as SO uses (CC-BY-SA 3.0), and it contains the exact same code as in the answer (although compiled into a single project).
If you want to use this one, just add the Jitpack repo:
repositories {
// Other repos here
maven { url 'https://jitpack.io' }
}
And hook it up with:
kapt 'com.github.LunarWatcher:KClassUnpacker:v1.0.1'
compileOnly "com.github.LunarWatcher:KClassUnpacker:v1.0.1"
Note that the version here may not be up to date: the up to date list of versions is available here. The code in the post still aims to reflect the repo, but versions aren't really important enough to edit every time.
Usage
Regardless of which way you ended up using to get the annotations, the usage is relatively easy:
#AutoUnpack data class ExampleDataClass(val x: String, val y: Int, var m: Boolean)
fun main(a: Array<String>) {
val cls = ExampleDataClass("example", 42, false)
for(field in cls) {
println(field)
}
}
This prints:
example
42
false
Now you have a reflection-less way of iterating fields.
Note that local testing has been done partially with IntelliJ, but IntelliJ doesn't seem to like me - I've had various failed builds where gradlew clean && gradlew build from a command line oddly works fine. I'm not sure whether this is a local problem, or if this is a general problem, but you might have some issues like this if you build from IntelliJ.
Also, you might get errors if the build fails. The IntelliJ linter builds on top of the build directory for some sources, so if the build fails and the file with the extension function isn't generated, that'll cause it to appear as an error. Building usually fixes this when I tested (with both modules and from Jitpack).
You'll also likely have to enable the annotation processor setting if you use Android Studio or IntelliJ.

here is another idea, that i came up with, but am not satisfied with...but it has some pros and cons:
pros:
adding/removing fields to/from the data class causes compiler errors at field-iteration sites
no boiler-plate code needed
cons:
won't work if default values are defined for arguments
declaration:
data class Memento(
val testType: TestTypeData,
val notes: String,
val examinationTime: MillisSinceEpoch?,
val administeredBy: String,
val signature: SignatureViewHolder.SignatureData,
val signerName: String,
val signerRole: SignerRole
) : Serializable
iterating through all fields (can use this directly at call sites, or apply the Visitor pattern, and use this in the accept method to call all the visit methods):
val iterateThroughAllMyFields: Memento = someValue
Memento(
testType = iterateThroughAllMyFields.testType.also { testType ->
// do something with testType
},
notes = iterateThroughAllMyFields.notes.also { notes ->
// do something with notes
},
examinationTime = iterateThroughAllMyFields.examinationTime.also { examinationTime ->
// do something with examinationTime
},
administeredBy = iterateThroughAllMyFields.administeredBy.also { administeredBy ->
// do something with administeredBy
},
signature = iterateThroughAllMyFields.signature.also { signature ->
// do something with signature
},
signerName = iterateThroughAllMyFields.signerName.also { signerName ->
// do something with signerName
},
signerRole = iterateThroughAllMyFields.signerRole.also { signerRole ->
// do something with signerRole
}
)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Kotlin and parallelStream toArray - kotlin

You can't use an array constructor reference, but every method reference can be expressed using a lambda expression: .toArray<Person>({length -> arrayOfNulls(length)})

Related

Kotlin Map and Compute Functions

Why can't I use map on Kotlins Regex Result sequence

Async Wait Efficient Execution

With Arrow: How do I apply a transformation of type (X)->IO<Y> to data of type Sequence<X> to get IO<Sequence<Y>>?

Is there any way to iterate all fields of a data class without using reflection?

Categories

Resources