I'm new to Kotlin, but I want to try using it for game development, targeting at least Android with OpenGL ES 2.0 and HTML5 with WebGL (with which I am reasonably familiar). Not having to maintain slightly different versions of my rendering engine's classes/functions for WebGL and GLES20 would obviously be a good thing, but is there a practical way to achieve this in Kotlin without overhead?
I think what I'll have to do is write a class on top of OpenGL ES 2.0 that implements WebGLRenderingContextBase or a clone of it (if a clone is necessary, I can just use a delegate for the WebGL implementation), full of methods like this:
override fun bindBuffer(target: Int, buffer: Int) {
    GLES20.glBindBuffer(target, buffer)
}
I'll write a script to do the bulk of the work.
My question is: is the compiler smart enough to optimise away such wrappers and use GLES20.glBindBuffer etc. directly in my class's vtable, or whatever equivalent the JVM has? Presumably inline can't be of any use when calling an overridden method via a reference to an interface or base class.
The Kotlin compiler does not optimize the bytecode to this extent, and it does not need to: the JVM itself is quite good at optimizing the code.
Moreover, inline functions were not designed to be a performance tool in Kotlin; instead, they are used for non-local control flow and code transformations that cannot be achieved without inlining.
In fact, the JVM performs many optimizations, which spares compilers from having to optimize the bytecode they generate too aggressively on their side. Inlining is one of the optimizations the JVM can do. (1) (2) (3)
However, neither the compilers nor the JVM can inline native methods, because native code is of a completely different nature.
The Kotlin compiler, in turn, performs some local optimizations that do not affect the overall structure of the program. Another reason for this restraint is the debugging experience, which is hard to preserve under heavy optimization. To check exactly what the Kotlin optimizations do, you can try disabling them by adding the -Xno-optimize flag to the free compiler arguments, then look through the generated bytecode or do some benchmarking.
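To make this concrete, here is a minimal Java sketch of the wrapper pattern from the question (Kotlin would compile an equivalent class to essentially the same bytecode shape). GlContext, RecordingGl and WrapperDemo are hypothetical names, and RecordingGl only records calls so the example runs without Android; a real implementation would call GLES20.glBindBuffer. With a single implementation loaded, the interface call site is monomorphic, which is the situation in which HotSpot's JIT can devirtualize and inline the wrapper. On a desktop JVM you can check with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining; Android uses ART rather than HotSpot, so those flags do not apply there. The final call into a native method would still remain a real call.

```java
public class WrapperDemo {
    // Hypothetical WebGL-style interface the rendering engine is written against.
    interface GlContext {
        void bindBuffer(int target, int buffer);
    }

    // Hypothetical stand-in for a GLES20-backed implementation. A real one would
    // call GLES20.glBindBuffer(target, buffer); this one just records the call.
    static final class RecordingGl implements GlContext {
        int count;
        int lastTarget;
        int lastBuffer;

        @Override
        public void bindBuffer(int target, int buffer) {
            count++;
            lastTarget = target;
            lastBuffer = buffer;
        }
    }

    static final int GL_ARRAY_BUFFER = 0x8892;

    // Engine code only sees the interface. With a single implementation loaded,
    // this call site stays monomorphic, so the JIT may devirtualize and inline it.
    static void drawFrame(GlContext gl, int buffer) {
        gl.bindBuffer(GL_ARRAY_BUFFER, buffer);
    }

    public static void main(String[] args) {
        RecordingGl gl = new RecordingGl();
        for (int i = 0; i < 100_000; i++) {
            drawFrame(gl, i);
        }
        System.out.println("bindBuffer calls: " + gl.count);
    }
}
```

If a second GlContext implementation (say, a debug wrapper) is also used at the same call site, the site becomes polymorphic and the JIT's inlining options narrow, which is worth keeping in mind when benchmarking.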
Dart makes asynchronous programming extremely easy. All you need to do is surround the asynchronous code in an async method, and within it, use await before every call that is going to take a while.
I'm new to Kotlin, and asynchronous programming doesn't seem that simple here. (Probably because Dart is single-threaded.)
It'd be nice to get a rough outline of the differences both languages provide in their implementation of asynchronous code.
Apologies if I misstated any facts. Thanks in advance!
Dart makes asynchronous programming extremely easy. All you need to do is surround the asynchronous code in an async method, and within it, use await before every call that is going to take a while.
Yes (though async+await is not Dart's invention; it dates back to at least C# 5.0 in 2012, which then directly inspired JavaScript, Python, Julia, Kotlin, Swift, Rust, many others, and Dart).
I'm new to Kotlin, and asynchronous programming doesn't seem that simple here.
Kotlin 1.1 has async+await, although await is a method called in postfix position (deferred.await()) rather than an operator as in most other languages, but the end result is the same.
It'd be nice to get a rough outline of the differences both languages provide in their implementation of asynchronous code.
Kotlin and Dart are different languages because they solve different problems; consequently, there's simply too much to write about their differences, even when focusing entirely on how they handle concurrency and coroutines.
...but in short, the main difference (as far as you're concerned) is syntactic. (That is as far as I can tell: be aware that I am neither a Dart/Flutter nor a Kotlin expert; I just know how to read documentation and use Google.)
I suggest looking at some simple examples in Kotlin, such as:
First off, read the announcement in which await was introduced in Kotlin 1.1: https://kotlinlang.org/docs/whatsnew11.html#coroutines-experimental
Then see how it interoperates with Swift's async + await functions here: https://kotlinlang.org/docs/whatsnew1530.html#experimental-interoperability-with-swift-5-5-async-await (as far as I know, Swift's async features work the same way as Dart's, except without enforced thread isolation)
Kotlin Coroutines Async Await Sequence
This article (which I only skimmed) seems good too: https://www.raywenderlich.com/books/kotlin-coroutines-by-tutorials/v2.0/chapters/5-async-await
I'm new to Kotlin, and asynchronous programming doesn't seem that simple here.
In fact, Kotlin takes it to the next level of simplicity: it's almost invisible. For example:
import kotlinx.coroutines.delay

suspend fun main() {
println("Hello")
delay(1000)
println("Hello again")
}
Unbeknownst to you, this code is actually implemented asynchronously, yet all you see is simple, sequential code. The compiled code (in the case of the JVM backend) has a structure something like this:
public static void main(String[] args) {
System.out.println("Hello");
globalThreadPool.scheduleAfterDelay(() -> {
System.out.println("Hello again");
}, 1000, TimeUnit.MILLISECONDS);
}
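The same idea can be written out by hand as a state machine, which is closer to how suspend functions are compiled. This is a simplified sketch, not the code kotlinc actually emits: the real compiler generates a continuation object with a label field and resumes it through a coroutine dispatcher, whereas here the hypothetical SuspendMainSketch uses a plain ScheduledExecutorService to play the role of delay.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SuspendMainSketch {
    // Hand-written stand-in for the compiled suspend main(): one object holds
    // the local state plus a label saying where to resume.
    static final class MainContinuation implements Runnable {
        private final ScheduledExecutorService scheduler;
        private final long delayMillis;
        final List<String> output = Collections.synchronizedList(new ArrayList<>());
        final CountDownLatch done = new CountDownLatch(1);
        private int label = 0;

        MainContinuation(ScheduledExecutorService scheduler, long delayMillis) {
            this.scheduler = scheduler;
            this.delayMillis = delayMillis;
        }

        @Override
        public void run() {
            switch (label) {
                case 0:
                    output.add("Hello");
                    label = 1;
                    // "delay": ask the scheduler to resume this continuation later,
                    // then return so that no thread is blocked while waiting.
                    scheduler.schedule(this, delayMillis, TimeUnit.MILLISECONDS);
                    return;
                case 1:
                    output.add("Hello again");
                    done.countDown();
                    return;
                default:
                    throw new IllegalStateException("unexpected label " + label);
            }
        }
    }

    static List<String> runMain(long delayMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            MainContinuation k = new MainContinuation(scheduler, delayMillis);
            k.run();        // runs until the first suspension point, then returns
            k.done.await(); // demo only: wait for the resumed part to finish
            return new ArrayList<>(k.output);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runMain(1000)); // [Hello, Hello again]
    }
}
```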
On top of that, Kotlin makes it super-simple to adapt any async code you may have today so that you can use it in the same native way as the built-in delay function above.
Where people mostly trip up is not this basic scenario, but more advanced topics like structured concurrency, choosing the right thread pool to run your code, error propagation, and so on.
I haven't studied Dart, but from what I know about the async-await pattern in other languages, whenever you call an async function you have implicitly created a concurrent task, which is very easy to leak: all it takes is forgetting to await it. Kotlin prevents these bad outcomes by design and forces you to address the concurrency you're creating head-on, instead of deciphering out-of-memory logs from production.
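The leak described here, and the structured alternative, can be illustrated with a minimal Java sketch. TaskScope is a hypothetical helper, not Kotlin's coroutineScope and not a JDK class: it remembers every task launched through it and, when the try block closes, waits for all of them and rethrows the first failure, so a child task can neither be forgotten nor fail silently. This mirrors the idea behind Kotlin's structured concurrency, not its implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

public class TaskScope implements AutoCloseable {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    // Only the owning thread calls launch(), so a plain list is enough here.
    private final List<CompletableFuture<?>> children = new ArrayList<>();

    // Launch a child task; the scope keeps a reference so it cannot be "forgotten".
    public <T> CompletableFuture<T> launch(Supplier<T> body) {
        CompletableFuture<T> f = CompletableFuture.supplyAsync(body, pool);
        children.add(f);
        return f;
    }

    // Leaving the scope waits for every child and propagates the first failure.
    @Override
    public void close() {
        try {
            CompletableFuture.allOf(children.toArray(new CompletableFuture<?>[0])).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        try (TaskScope scope = new TaskScope()) {
            CompletableFuture<Integer> a = scope.launch(() -> 1);
            scope.launch(() -> { throw new IllegalStateException("child failed"); });
            System.out.println("a = " + a.join());
        } catch (CompletionException e) {
            System.out.println("scope failed: " + e.getCause().getMessage());
        }
    }
}
```

The unstructured version of main would simply call CompletableFuture.supplyAsync and never join the result; the exception would then be stored in a future that nobody ever looks at.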
The most important difference, beside the syntax, is the multithreading model of these languages.
Check this article:
Dart supports multi-threading using Isolates. Right in the introduction to Isolates, it says that
isolates [are] independent workers that are similar to threads but don’t share memory, communicating only via messages.
Kotlin on the JVM, by contrast, uses Java threads under the hood, and these have access to shared memory.
async/await is implemented roughly the same way in both languages, using CPS (continuation-passing style, i.e. glorified callbacks). The important distinction is that in Dart a single-threaded event loop dispatches these callbacks, while in Kotlin on the JVM you can have multiple dispatchers working together, with continuations (callbacks) running truly in parallel on different threads and sharing memory, with all the benefits and issues that result from that.
Also note that Kotlin aims to be a multiplatform language: while it has a multithreaded model on the JVM, a Kotlin program compiled for the JS backend is single-threaded with an event loop, basically the same as Dart.
P.S. Watch this video from Roman Elizarov (designer of coroutines in Kotlin); it gives a good overview of coroutine usage and internals.
This is a follow-up question to this answer.
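The dispatching difference can be sketched in Java with plain executors, as an analogy rather than a description of either runtime. A single-thread executor plays the role of Dart's event loop (or Kotlin/JS): every callback runs on the same thread, one after another. A fixed thread pool plays the role of a multi-threaded Kotlin/JVM dispatcher, where callbacks may run in parallel and any shared state must be thread-safe. DispatchDemo and threadsUsed are hypothetical names.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class DispatchDemo {
    // Runs `tasks` callbacks on the executor, bumps a shared counter from each,
    // and returns the names of the threads that actually ran them.
    static Set<String> threadsUsed(ExecutorService executor, int tasks, AtomicInteger counter) {
        Set<String> threads = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(tasks);
        try {
            for (int i = 0; i < tasks; i++) {
                executor.execute(() -> {
                    threads.add(Thread.currentThread().getName());
                    counter.incrementAndGet(); // shared state: must be thread-safe on a pool
                    done.countDown();
                });
            }
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
        return threads;
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger();
        // Event-loop style: one thread runs every callback, one after another.
        Set<String> loop = threadsUsed(Executors.newSingleThreadExecutor(), 1000, counter);
        // Multi-threaded dispatcher style: callbacks may run in parallel.
        Set<String> pool = threadsUsed(Executors.newFixedThreadPool(4), 1000, counter);
        System.out.println("event loop threads: " + loop.size()); // 1
        System.out.println("pool threads: " + pool.size());       // varies, up to 4
        System.out.println("callbacks run: " + counter.get());    // 2000
    }
}
```

On the single-thread executor a plain int counter would be safe; on the pool it would race, which is the "benefits and issues" trade-off mentioned above.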
But when the application hasn't used lambda expressions before¹, even the framework for generating the lambda classes has to be loaded (Oracle's current implementation uses ASM under the hood). This is the actual cause of the slowdown, loading and initialization of a dozen internally used classes, not the lambda expression itself.
OK, so Java uses ASM to generate the classes at runtime. I found this, and if I understood it correctly, it basically says that Kotlin lambdas are compiled to anonymous classes that already exist and are loaded at runtime (instead of being generated).
If I'm correct, Kotlin lambdas aren't the same thing as Java lambdas and shouldn't have the same performance impact. Can someone confirm?
Of course, Kotlin has built-in support for inlining lambdas, where Java doesn't. So many lambdas in Kotlin code don't correspond to any objects at runtime at all.
But for those that can't be inlined, yes: according to https://medium.com/@christian.c.carroll/exploring-kotlin-lambda-bytecode-8c2d15afd490 the anonymous class translation always seems to be used. Unfortunately, the post doesn't specify the Kotlin version (1.3.30 was the latest available at that time).
I would also consider this an implementation detail that could change between Kotlin versions, at least when jvmTarget is set to "1.8" or greater, so there is no substitute for actually checking your own bytecode.
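The two translation strategies can be told apart from Java itself. In this minimal sketch (the class name LambdaVsAnonymous is hypothetical), the anonymous class is compiled ahead of time into its own .class file, while the lambda's class is spun at runtime by LambdaMetafactory on first use, which is where the one-time bootstrap cost quoted in the question comes from. The $$Lambda naming is a HotSpot/OpenJDK implementation detail, not a guarantee. For Kotlin, compile a lambda with your own compiler version and inspect it with javap -c; newer Kotlin versions can also emit lambdas through invokedynamic, which is exactly why checking the bytecode matters.

```java
public class LambdaVsAnonymous {
    static Runnable anonymous() {
        // javac emits a separate class file for this (e.g. LambdaVsAnonymous$1.class).
        return new Runnable() {
            @Override
            public void run() { }
        };
    }

    static Runnable lambda() {
        // No class file for this; the implementing class is generated at runtime
        // the first time this invokedynamic call site is linked.
        return () -> { };
    }

    public static void main(String[] args) {
        Class<?> anon = anonymous().getClass();
        Class<?> lam = lambda().getClass();
        System.out.println(anon.getName() + " anonymous=" + anon.isAnonymousClass());
        System.out.println(lam.getName() + " anonymous=" + lam.isAnonymousClass());
    }
}
```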
What makes programmatic introspection/reflection easier in virtual machines rather than native code?
I read somewhere that VMs by nature allow for better introspection/reflection capabilities, but I cannot find more information about it online. I would like to know why.
I believe you mean higher-level languages vs lower-level languages instead of virtual machines.
Higher level languages like Java and C# have implemented reflection and introspection, so there are functions available to the developer to use this information.
Languages like C do not have any pre-built reflection capabilities.
Reflection is relatively expensive (time-consuming) in any language, and should not be used in code that needs to be extremely fast.
Programmatic introspection essentially means examining and inspecting the current call stack, or the current continuation. (Read Appel's book: Compiling with Continuations).
Few programming languages provide this ability. Scheme's call/cc reifies the current continuation, but gives no standard way to inspect it.
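As a small illustration of what the runtime exposes in Java, the sketch below lists a class's fields and reads one by a name known only at runtime, something plain C offers no standard way to do. Point and ReflectionDemo are hypothetical names. getDeclaredFields() does not guarantee any particular order, so the names are sorted.

```java
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ReflectionDemo {
    static class Point {
        public int x = 3;
        public int y = 4;
    }

    // Field names discovered at runtime from the class metadata the VM keeps.
    static List<String> fieldNames(Class<?> type) {
        return Arrays.stream(type.getDeclaredFields())
                .map(Field::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    // Read a public field whose name is only known at runtime.
    static Object readField(Object target, String name) {
        try {
            return target.getClass().getField(name).get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot read field " + name, e);
        }
    }

    public static void main(String[] args) {
        Point p = new Point();
        System.out.println(fieldNames(Point.class)); // [x, y]
        System.out.println(readField(p, "y"));       // 4
    }
}
```

Each reflective lookup goes through metadata searches and access checks, which is part of why reflection is slower than a direct field access.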
The current call stack might be inspectable (e.g. see GCC __builtin_return_address as an ad hoc example).
Most compilers (but not all) do not have an easy way to give information about the layout of the current call frame (although the DWARF debugging format does describe it).
And optimizing compilers (e.g. for C) usually don't give access to the offset of some local variable in the call frame (even if the compiler computes this offset). BTW, the same stack slot might be reused for different variables; read about register spilling.
See also J. Pitrat's CAIA system: its generated C code organizes the stack so that the stack can be inspected.
In a bytecode VM like JVM or NekoVM or Parrot, introspection is easier because each local variable has a well defined slot in the call frame. This is not the case for most compiled languages (e.g. C or C++) because the compiler is able to reuse (for optimization purposes) some slots, or even put a variable only in some machine register, without even allocating any call stack slot to spill it.
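On the JVM this kind of stack introspection is available directly: since Java 9, java.lang.StackWalker walks the current thread's frames, and HotSpot reports frames even for methods the JIT has inlined, because it keeps the metadata needed to reconstruct them. A minimal sketch with hypothetical method names outer and inner:

```java
import java.util.List;
import java.util.stream.Collectors;

public class StackDemo {
    // Names of the methods currently on this thread's stack, innermost first.
    static List<String> currentStack() {
        return StackWalker.getInstance().walk(frames ->
                frames.map(StackWalker.StackFrame::getMethodName)
                      .collect(Collectors.toList()));
    }

    static List<String> inner() {
        return currentStack();
    }

    static List<String> outer() {
        return inner();
    }

    public static void main(String[] args) {
        // Starts with [currentStack, inner, outer, main]
        System.out.println(outer());
    }
}
```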
I was wondering why Math.sin(double) delegates to StrictMath.sin(double) when I came across the problem in a Reddit thread. The code fragment mentioned there looks like this (JDK 7u25):
Math.java :
public static double sin(double a) {
return StrictMath.sin(a); // default impl. delegates to StrictMath
}
StrictMath.java :
public static native double sin(double a);
The second declaration is native, which seems reasonable to me. The doc of Math states that:
Code generators are encouraged to use platform-specific native libraries or microprocessor instructions, where available (...)
And the question is: isn't the native library that implements StrictMath platform-specific enough? What more can a JIT know about the platform than an installed JRE (please concentrate only on this particular case)? In other words, why isn't Math.sin() native already?
I'll try to wrap up the entire discussion in a single post.
Generally, Math delegates to StrictMath. Obviously, the call can be inlined so this is not a performance issue.
StrictMath is a final class with native methods backed by native libraries. One might think that native means optimal, but this doesn't necessarily have to be the case. Looking through the StrictMath javadoc, one can read the following:
(...) the definitions of some of the numeric functions in this package require that they produce the same results as certain published algorithms. These algorithms are available from the well-known network library netlib as the package "Freely Distributable Math Library," fdlibm. These algorithms, which are written in the C programming language, are then to be understood as executed with all floating-point operations following the rules of Java floating-point arithmetic.
As I understand this doc, the native library implementing StrictMath is implemented in terms of the fdlibm library, which is multi-platform and known to produce predictable results. Because it's multi-platform, it can't be expected to be an optimal implementation on every platform, and I believe this is where a smart JIT can fine-tune the actual performance, e.g. by statistical analysis of input ranges and adjusting the algorithms/implementation accordingly.
Digging deeper into the implementation, it quickly turns out that the native library backing StrictMath actually uses fdlibm:
StrictMath.c source in OpenJDK 7 looks like this:
#include "fdlibm.h"
...
JNIEXPORT jdouble JNICALL
Java_java_lang_StrictMath_sin(JNIEnv *env, jclass unused, jdouble d)
{
return (jdouble) jsin((double)d);
}
and the sine function is defined in fdlibm/src/s_sin.c, referring in a few places to the __kernel_sin function that comes directly from the header fdlibm.h.
While I'm temporarily accepting my own answer, I'd be glad to accept a more competent one when it comes up.
Why does Math.sin() delegate to StrictMath.sin()?
The JIT compiler should be able to inline the StrictMath.sin(a) call. So there's little point creating an extra native method for the Math.sin() case ... and adding extra JIT compiler smarts to optimize the calling sequence, etcetera.
In the light of that, your objection really boils down to an "elegance" issue. But the "pragmatic" viewpoint is more persuasive:
Fewer native calls makes the JVM core and JIT easier to maintain, less fragile, etcetera.
If it ain't broken, don't fix it.
At least, that's how I imagine the Java team would view this.
The question assumes that the JVM actually runs the delegation code. On many JVMs, it won't. Calls to Math.sin(), etc. may be transparently replaced by the JIT with intrinsic code (if suitable), typically in a way that is unobservable to the end user. This is a common trick for JVM implementers, where interesting specializations can happen (even if the method is not tagged as native).
Note, however, that most platforms can't simply drop in a single processor instruction for sin, because such instructions are not accurate enough over the whole input range (e.g. see: Intel discussion).
The Math API permits non-strict but better-performing implementations of its methods but does not require them, and by default Math simply uses the StrictMath implementation.
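The practical consequence can be checked directly. StrictMath.sin is required to reproduce the fdlibm results, while Math.sin only has to stay within 1 ulp of the exact result and be semi-monotonic, so an intrinsic may legitimately return a neighboring double. The sketch below (hypothetical class SinCompare) measures the largest disagreement between the two over a sample of inputs, in units of ulp. Whether any difference shows up depends on the JVM and CPU; a result of 0 is entirely possible.

```java
public class SinCompare {
    // Largest |Math.sin(x) - StrictMath.sin(x)| over evenly spaced samples,
    // expressed in ulps of the StrictMath result.
    static double maxUlpDifference(double from, double to, int samples) {
        double worst = 0.0;
        for (int i = 0; i < samples; i++) {
            double x = from + (to - from) * i / samples;
            double strict = StrictMath.sin(x); // fdlibm-reproducible result
            double fast = Math.sin(x);         // may be a JIT intrinsic
            double diff = Math.abs(fast - strict) / Math.ulp(strict);
            worst = Math.max(worst, diff);
        }
        return worst;
    }

    public static void main(String[] args) {
        // Run it long enough for the JIT to compile the loop and its intrinsics.
        System.out.println("max ulp difference on [0, 2*pi]: "
                + maxUlpDifference(0.0, 2 * Math.PI, 1_000_000));
    }
}
```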
I am converting some C++ code to Java, and I was wondering what I can do about the inlined functions. Can I assume that functions will be inlined by the VM (as and when necessary) and just not worry about this? How do I profile to observe this behaviour? Suppose there is a main outer function, and I throw a for loop around it and cause a million invocations. Should I expect to see an improvement as the VM inlines more and more?
Yes, Java does inline method calls. The inlining is performed by the JIT compiler, so you won't see it by examining the bytecode files.
Whether inlining actually occurs for a given method call will depend on the size of the method body, and on whether the call is inlineable. (If a method call still involves dispatching after the JVM has applied a bunch of global optimizations designed to remove unnecessary dispatching, then it cannot be inlined.)
The same applies to your example with your outer main function. It depends on how big the method body is. On the other hand, if the method takes a significant time to execute, then the relative importance of the optimization decreases correspondingly.
My advice is to not worry about things like this at this stage. Just write the code clearly and simply, and let the JIT compiler deal with the problem of optimizing. When your application is working, you can profile it and see if there are any "hot spots" in the code that are worthwhile optimizing by hand.
But I should be able to see this in something like VisualVM, right? I mean, initially there is no inlining, then more and more is inlined, so the average time for the outer method is slightly reduced.
It may be observable and it may not, depending on the amount time spent in making the calls relative to executing the method bodies. (Profiling often relies on sampling the program counter. The reported times may be inaccurate if the number of samples for a given region of code is too small ... and for other reasons.)
It also depends on the JVM you are using. Not all JVMs will re-optimize code that they have previously optimized.
Finally, there is a way to get the JVM to dump the native code output by the JIT compiler. That will give you a definitive answer as to what has been inlined ... if you are prepared to read the machine instructions.
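One way to watch the warm-up discussed above is a tiny program that times the same batch of calls repeatedly. InlineWarmup, square and batch are hypothetical names, and the timings are illustrative only (a harness such as JMH is the right tool for real measurements). Running it on HotSpot with -XX:+PrintCompilation shows when methods get JIT-compiled; -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly dumps the generated machine code, but additionally requires the hsdis disassembler plugin to be installed.

```java
public class InlineWarmup {
    // Small method: a typical inlining candidate for the JIT.
    static long square(long x) {
        return x * x;
    }

    // The "outer" method from the question, called many times per batch.
    static long batch(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += square(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int round = 0; round < 10; round++) {
            long start = System.nanoTime();
            long result = batch(1_000_000);
            long micros = (System.nanoTime() - start) / 1_000;
            // Later rounds are usually faster once batch() and square() are compiled.
            System.out.println("round " + round + ": " + micros + " us (result " + result + ")");
        }
    }
}
```

Printing the result keeps the JIT from discarding the loop as dead code, which would otherwise make the later rounds look misleadingly fast.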