Context: I've been benchmarking the difference between using invokedynamic and manually generating bytecode (this is in the context of deciding whether a compiler targeting the JVM should emit more verbose "traditional" bytecode or just an invokedynamic call with a clever bootstrap method). In doing this, it has been pretty straightforward to map bytecode into MethodHandles combinators that are at least as fast, with the exception of tableswitch.
Question: Is there a trick to mimic tableswitch using MethodHandle? I tried mimicking it with a jump table: using a constant MethodHandle[], indexing into that with arrayElementGetter, then calling the found handle with MethodHandles.invoker. However, that ended up being around 50% slower than the original bytecode when I ran it through JMH.
Here's the code for producing the method handle:
private static MethodHandle makeProductElement(Class<?> receiverClass, List<MethodHandle> getters) {
MethodHandle[] boxedGetters = getters
.stream()
.map(getter -> getter.asType(getter.type().changeReturnType(java.lang.Object.class)))
.toArray(MethodHandle[]::new);
MethodHandle getGetter = MethodHandles // (I)H
.arrayElementGetter(MethodHandle[].class)
.bindTo(boxedGetters);
MethodHandle invokeGetter = MethodHandles.permuteArguments( // (RH)O
MethodHandles.invoker(MethodType.methodType(java.lang.Object.class, receiverClass)),
MethodType.methodType(java.lang.Object.class, receiverClass, MethodHandle.class),
1,
0
);
return MethodHandles.filterArguments(invokeGetter, 1, getGetter);
}
Here's the initial bytecode (which I'm trying to replace with one invokedynamic call)
public java.lang.Object productElement(int);
descriptor: (I)Ljava/lang/Object;
flags: (0x0001) ACC_PUBLIC
Code:
stack=3, locals=3, args_size=2
0: iload_1
1: istore_2
2: iload_2
3: tableswitch { // 0 to 2
0: 28
1: 38
2: 45
default: 55
}
28: aload_0
29: invokevirtual #62 // Method i:()I
32: invokestatic #81 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
35: goto 67
38: aload_0
39: invokevirtual #65 // Method s:()Ljava/lang/String;
42: goto 67
45: aload_0
46: invokevirtual #68 // Method l:()J
49: invokestatic #85 // Method java/lang/Long.valueOf:(J)Ljava/lang/Long;
52: goto 67
55: new #87 // class java/lang/IndexOutOfBoundsException
58: dup
59: iload_1
60: invokestatic #93 // Method java/lang/Integer.toString:(I)Ljava/lang/String;
63: invokespecial #96 // Method java/lang/IndexOutOfBoundsException."<init>":(Ljava/lang/String;)V
66: athrow
67: areturn
The good thing about invokedynamic is that it lets you postpone the decision of how to implement the operation until the actual runtime. This is the trick behind LambdaMetafactory or StringConcatFactory, which may return composed method handles, like in your example code, or dynamically generated code, at the particular implementation’s discretion.
A combined approach is also possible: generate classes which you compose into an operation, e.g. by settling on the already existing LambdaMetafactory:
private static MethodHandle makeProductElement(
MethodHandles.Lookup lookup, Class<?> receiverClass, List<MethodHandle> getters)
throws Throwable {
Function[] boxedGetters = new Function[getters.size()];
MethodType factory = MethodType.methodType(Function.class);
for(int ix = 0; ix < boxedGetters.length; ix++) {
MethodHandle mh = getters.get(ix);
MethodType actual = mh.type().wrap(), generic = actual.erase();
boxedGetters[ix] = (Function)LambdaMetafactory.metafactory(lookup,
"apply", factory, generic, mh, actual).getTarget().invokeExact();
}
Object switcher = new Object() {
final Object get(Object receiver, int index) {
return boxedGetters[index].apply(receiver);
}
};
return lookup.bind(switcher, "get",
MethodType.methodType(Object.class, Object.class, int.class))
.asType(MethodType.methodType(Object.class, receiverClass, int.class));
}
This uses the LambdaMetafactory to generate a Function instance for each getter, similar to equivalent method references. Then, an actual class calling the right Function’s apply method is instantiated and a method handle to its get method is returned.
This is a similar composition to your method handles, but with the reference implementation it uses fully materialized classes instead of handles. I’d expect the composed handles and this approach to converge to the same performance for a very large number of invocations, with the materialized classes having a head start for a medium number of invocations.
I added a first parameter MethodHandles.Lookup lookup which should be the lookup object received by the bootstrap method for the invokedynamic instruction. If used that way, the generated functions can access all methods the same way as the code containing the invokedynamic instruction, including private methods of that class.
Alternatively, you can generate a class containing a real switch instruction yourself. Using the ASM library, it may look like:
private static MethodHandle makeProductElement(
MethodHandles.Lookup lookup, Class<?> receiverClass, List<MethodHandle> getters)
throws ReflectiveOperationException {
ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_FRAMES);
cw.visit(V1_8, ACC_INTERFACE|ACC_ABSTRACT,
lookup.lookupClass().getName().replace('.', '/')+"$Switch", null,
"java/lang/Object", null);
MethodType type = MethodType.methodType(Object.class, receiverClass, int.class);
MethodVisitor mv = cw.visitMethod(ACC_STATIC|ACC_PUBLIC, "get",
type.toMethodDescriptorString(), null, null);
mv.visitCode();
Label defaultCase = new Label();
Label[] cases = new Label[getters.size()];
for(int ix = 0; ix < cases.length; ix++) cases[ix] = new Label();
mv.visitVarInsn(ALOAD, 0);
mv.visitVarInsn(ILOAD, 1);
mv.visitTableSwitchInsn(0, cases.length - 1, defaultCase, cases);
String owner = receiverClass.getName().replace('.', '/');
for(int ix = 0; ix < cases.length; ix++) {
mv.visitLabel(cases[ix]);
MethodHandle mh = getters.get(ix);
mv.visitMethodInsn(INVOKEVIRTUAL, owner, lookup.revealDirect(mh).getName(),
mh.type().dropParameterTypes(0, 1).toMethodDescriptorString(), false);
if(mh.type().returnType().isPrimitive()) {
Class<?> boxed = mh.type().wrap().returnType();
MethodType box = MethodType.methodType(boxed, mh.type().returnType());
mv.visitMethodInsn(INVOKESTATIC, boxed.getName().replace('.', '/'),
"valueOf", box.toMethodDescriptorString(), false);
}
mv.visitInsn(ARETURN);
}
mv.visitLabel(defaultCase);
mv.visitTypeInsn(NEW, "java/lang/IndexOutOfBoundsException");
mv.visitInsn(DUP);
mv.visitVarInsn(ILOAD, 1);
mv.visitMethodInsn(INVOKESTATIC, "java/lang/String",
"valueOf", "(I)Ljava/lang/String;", false);
mv.visitMethodInsn(INVOKESPECIAL, "java/lang/IndexOutOfBoundsException",
"<init>", "(Ljava/lang/String;)V", false);
mv.visitInsn(ATHROW);
mv.visitMaxs(-1, -1);
mv.visitEnd();
cw.visitEnd();
lookup = lookup.defineHiddenClass(
cw.toByteArray(), true, MethodHandles.Lookup.ClassOption.NESTMATE);
return lookup.findStatic(lookup.lookupClass(), "get", type);
}
This generates a new class with a static method containing the tableswitch instruction and the invocations (as well as the boxing conversions we now have to do ourselves). Also, it has the necessary code to create and throw an exception for out-of-bounds values. After generating the class, it returns a handle to that static method.
I don't know your timeline, but it is likely there will be a MethodHandles.tableSwitch operation in Java 17. It is currently being integrated via https://github.com/openjdk/jdk/pull/3401/
Some more discussion about it here:
https://mail.openjdk.java.net/pipermail/core-libs-dev/2021-April/076105.html
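For reference, here is a minimal sketch of how that operation could be used for the productElement case, assuming a JDK 17 or later where MethodHandles.tableSwitch is available. The Receiver class, the outOfBounds helper and the demo method are hypothetical stand-ins for illustration, not part of the question's code:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;
import java.util.List;

final class TableSwitchSketch {
    /** Hypothetical receiver with the three getters from the question. */
    static final class Receiver {
        int i() { return 7; }
        String s() { return "seven"; }
        long l() { return 7_000_000_000L; }
    }

    // Default branch: throws like the hand-written bytecode does.
    private static Object outOfBounds(int index) {
        throw new IndexOutOfBoundsException(Integer.toString(index));
    }

    /** Returns a handle of type (receiverClass, int)Object. */
    static MethodHandle makeProductElement(Class<?> receiverClass, List<MethodHandle> getters)
            throws ReflectiveOperationException {
        // tableSwitch wants all cases to share one type whose first parameter is the int selector.
        MethodHandle fallback = MethodHandles.dropArguments(
                MethodHandles.lookup().findStatic(TableSwitchSketch.class, "outOfBounds",
                        MethodType.methodType(Object.class, int.class)),
                1, receiverClass);                                        // (I R)O
        MethodType boxedGetter = MethodType.methodType(Object.class, receiverClass);
        MethodHandle[] cases = getters.stream()
                .map(getter -> getter.asType(boxedGetter))                // (R)O, boxes primitives
                .map(getter -> MethodHandles.dropArguments(getter, 0, int.class)) // (I R)O
                .toArray(MethodHandle[]::new);
        MethodHandle table = MethodHandles.tableSwitch(fallback, cases);
        // Swap the parameters back to the (R I)O shape of productElement.
        return MethodHandles.permuteArguments(table,
                MethodType.methodType(Object.class, receiverClass, int.class), 1, 0);
    }

    /** Hypothetical driver: looks up the getters and selects one by index. */
    static Object demo(int index) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            List<MethodHandle> getters = Arrays.asList(
                    lookup.findVirtual(Receiver.class, "i", MethodType.methodType(int.class)),
                    lookup.findVirtual(Receiver.class, "s", MethodType.methodType(String.class)),
                    lookup.findVirtual(Receiver.class, "l", MethodType.methodType(long.class)));
            return makeProductElement(Receiver.class, getters).invoke(new Receiver(), index);
        } catch (RuntimeException e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Whether this ends up as fast as the original bytecode would still have to be measured with JMH.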
The thing is, tableswitch isn't always compiled to a jump table. For a small number of labels, like in your example, it's likely to act as a binary search. Thus using a tree of regular "if-then" MethodHandles will be the closest equivalent.
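A minimal sketch of such a tree, assuming MethodHandles.guardWithTest is used as the "if-then" combinator. The Receiver class and the inRange, lessThan, outOfBounds and demo helpers are hypothetical names introduced only for this illustration:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class GuardTreeSwitch {
    /** Hypothetical receiver with the three getters from the question. */
    static final class Receiver {
        int i() { return 7; }
        String s() { return "seven"; }
        long l() { return 7_000_000_000L; }
    }

    private static boolean inRange(int size, int index) { return index >= 0 && index < size; }
    private static boolean lessThan(int pivot, int index) { return index < pivot; }
    private static Object outOfBounds(int index) {
        throw new IndexOutOfBoundsException(Integer.toString(index));
    }

    /** Returns a handle of type (receiverClass, int)Object. */
    static MethodHandle makeProductElement(Class<?> receiverClass, List<MethodHandle> getters)
            throws ReflectiveOperationException {
        MethodHandles.Lookup self = MethodHandles.lookup();
        MethodType testType = MethodType.methodType(boolean.class, int.class, int.class);
        MethodHandle inRange = self.findStatic(GuardTreeSwitch.class, "inRange", testType);
        MethodHandle lessThan = self.findStatic(GuardTreeSwitch.class, "lessThan", testType);
        MethodHandle fallback = MethodHandles.dropArguments(
                self.findStatic(GuardTreeSwitch.class, "outOfBounds",
                        MethodType.methodType(Object.class, int.class)),
                1, receiverClass);                                   // (I R)O
        // Each leaf ignores the index and calls one boxed getter: (I R)O.
        List<MethodHandle> leaves = new ArrayList<>();
        for (MethodHandle getter : getters) {
            MethodHandle boxed = getter.asType(MethodType.methodType(Object.class, receiverClass));
            leaves.add(MethodHandles.dropArguments(boxed, 0, int.class));
        }
        // One range check up front, then a binary search over the valid indices.
        MethodHandle tree = leaves.isEmpty() ? fallback : MethodHandles.guardWithTest(
                MethodHandles.insertArguments(inRange, 0, leaves.size()),
                subtree(leaves, 0, leaves.size(), lessThan),
                fallback);
        return MethodHandles.permuteArguments(tree,
                MethodType.methodType(Object.class, receiverClass, int.class), 1, 0);
    }

    // Builds the decision tree for indices in [lo, hi).
    private static MethodHandle subtree(List<MethodHandle> leaves, int lo, int hi,
                                        MethodHandle lessThan) {
        if (hi - lo == 1) return leaves.get(lo);
        int mid = (lo + hi) >>> 1;
        return MethodHandles.guardWithTest(
                MethodHandles.insertArguments(lessThan, 0, mid),     // index < mid ?
                subtree(leaves, lo, mid, lessThan),
                subtree(leaves, mid, hi, lessThan));
    }

    /** Hypothetical driver: looks up the getters and selects one by index. */
    static Object demo(int index) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            List<MethodHandle> getters = Arrays.asList(
                    lookup.findVirtual(Receiver.class, "i", MethodType.methodType(int.class)),
                    lookup.findVirtual(Receiver.class, "s", MethodType.methodType(String.class)),
                    lookup.findVirtual(Receiver.class, "l", MethodType.methodType(long.class)));
            return makeProductElement(Receiver.class, getters).invoke(new Receiver(), index);
        } catch (RuntimeException e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

The tree is built with the index as the leading parameter because guardWithTest requires the test's parameters to be a prefix of the target's; the final permuteArguments restores the (receiver, index) order. I have not benchmarked this against the array-based version.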
I'm checking the kotlinc bytecode of capturing lambdas and trying to understand why the resulting bytecode has nop instructions.
kotlinc -jvm-target 1.6 .
private inline fun lambdaCapturing(f: () -> Int): Int = f()
fun main(args: Array<String>) {
lambdaCapturing { 42 }
}
As a result I'm getting
public final class x.y.z.LambdaCaptKt {
private static final int lambdaCapturing(kotlin.jvm.functions.Function0<java.lang.Integer>);
Code:
0: ldc #8 // int 0
2: istore_1
3: aload_0
4: invokeinterface #14, 1 // InterfaceMethod kotlin/jvm/functions/Function0.invoke:()Ljava/lang/Object;
9: checkcast #16 // class java/lang/Number
12: invokevirtual #20 // Method java/lang/Number.intValue:()I
15: ireturn
public static final void main(java.lang.String[]);
Code:
0: aload_0
1: ldc #29 // String args
3: invokestatic #35 // Method kotlin/jvm/internal/Intrinsics.checkParameterIsNotNull:(Ljava/lang/Object;Ljava/lang/String;)V
6: iconst_0
7: istore_1
8: iconst_0
9: istore_2
10: nop
11: nop
12: nop
13: return
}
with several nop instructions in the main function.
If I compile the same code snippet with -Xno-optimize, the main function looks like this:
public static final void main(java.lang.String[]);
Code:
0: aload_0
1: ldc #29 // String args
3: invokestatic #35 // Method kotlin/jvm/internal/Intrinsics.checkParameterIsNotNull:(Ljava/lang/Object;Ljava/lang/String;)V
6: nop
7: iconst_0
8: istore_1
9: nop
10: iconst_0
11: istore_2
12: bipush 10
14: nop
15: goto 18
18: invokestatic #41 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
21: checkcast #16 // class java/lang/Number
24: invokevirtual #20 // Method java/lang/Number.intValue:()I
27: nop
28: goto 31
31: pop
32: return
There are nops as well.
What is the reason to have nops in non-optimised code? (debug info/...)
Is there any reason to have nops in optimized code?
The reason for nops in the bytecode that the Kotlin compiler emits is the possibility for the debugger to put a breakpoint at the closing brace, i.e. after the last statement, of a function or an if-clause and other clauses, and to make it possible to step to those locations. Doing that requires an instruction present in the bytecode that is also marked with the line number.
Some nops are then optimized away if they are redundant, such as when there's already a valid instruction following the last statement instructions.
I've been trying to convert a python project to nxc using the IDE Bricx Command Center, so that it reads a text file and breaks the information down into components that it can process.
The main roadblock for me is the split string method, which I cannot find/figure out.
In python it would be easy, like Data1 = RawData.split("\n"), which splits it into an array that I can then index like this:
Data1[nth position in array][character in nth position in selected value in array]
I tried the same approach in nxc, but it doesn't work:
1: #import "RawData.txt" Data0
2: string Data1[];
3: "task main(){
4: Data1 = Data0.split("\n");
5: if(Data1[1][0]=="a"){
6: TextOut(10,10,"its an a!");
7: }else{
8: TextOut(10,10,Data1[1][0]);
9: }
10: Wait(5000);
11:}
12:
The output should be the first character of the second line in this case. Not surprisingly, it doesn't work, and it spits out a few errors (I'm new to nxc after all).
line 3: Error: Datatypes are not compatible
line 3: Error: ';' expected
line 3: Error: Unmatched close parenthesis
line 4: Error: Unmatched close parenthesis
Just change the "a" to 'a'.
1: #import "RawData.txt" Data0
2: string Data1[];
3: task main(){
4: Data1 = Data0.split("\n");
5: if(Data1[1][0]=='a'){
6: TextOut(10,10,"its an a!");
7: }else{
8: TextOut(10,10,Data1[1][0]);
9: }
10: Wait(5000);
11:}
12:
" denotes a string (which is an array of characters), whereas ' denotes a single character.
I'm working on a game written in Kotlin and was looking into reducing GC churn. One of the major sources of churn is for-loops called in the main game/rendering loops, which result in the allocation of iterators.
Turning to the documentation, I found this paragraph:
A for loop over an array is compiled to an index-based loop that does not create an iterator object.
If you want to iterate through an array or a list with an index, you can do it this way:
for (i in array.indices)
print(array[i])
Note that this “iteration through a range” is compiled down to optimal implementation with no extra objects created.
https://kotlinlang.org/docs/reference/control-flow.html#for-loops
Is this really true? To verify, I took this simple Kotlin program and inspected the generated byte code:
fun main(args: Array<String>) {
val arr = arrayOf(1, 2, 3)
for (i in arr.indices) {
println(arr[i])
}
}
According to the quote above, this should not result in any objects allocated, but get compiled down to a good old pre-Java-5 style for-loop. However, what I got was this:
41: aload_1
42: checkcast #23 // class "[Ljava/lang/Object;"
45: invokestatic #31 // Method kotlin/collections/ArraysKt.getIndices:([Ljava/lang/Object;)Lkotlin/ranges/IntRange;
48: dup
49: invokevirtual #37 // Method kotlin/ranges/IntRange.getFirst:()I
52: istore_2
53: invokevirtual #40 // Method kotlin/ranges/IntRange.getLast:()I
56: istore_3
57: iload_2
58: iload_3
59: if_icmpgt 93
This looks to me as if a method called getIndices is called that allocates a temporary IntRange object to back up bounds checking in this loop. How is this an "optimal implementation" with "no extra objects created", or am I missing something?
UPDATE:
So, after toying around a bit more and looking at the answers, the following appears to be true for Kotlin 1.0.2:
Arrays:
for (i in array.indices): range allocation
for (i in 0..array.size): no allocation
for (el in array): no allocation
array.forEach: no allocation
Collections:
for (i in coll.indices): range allocation
for (i in 0..coll.size): no allocation
for (el in coll): iterator allocation
coll.forEach: iterator allocation
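The collection rows above can be illustrated from the JVM side with a small Java sketch, assuming that Kotlin's for (el in coll) goes through the same iterator() protocol as Java's enhanced for loop. CountingList is a hypothetical helper that only counts how many iterators it hands out; it does not measure allocations directly:

```java
import java.util.ArrayList;
import java.util.Iterator;

final class IteratorChurn {
    /** Hypothetical list that counts calls to iterator(). */
    static final class CountingList<E> extends ArrayList<E> {
        int iteratorsCreated = 0;

        @Override
        public Iterator<E> iterator() {
            iteratorsCreated++;
            return super.iterator();
        }
    }

    // Enhanced for loop: asks the list for an Iterator once per loop.
    static int sumForEach(CountingList<Integer> list) {
        int sum = 0;
        for (int value : list) {
            sum += value;
        }
        return sum;
    }

    // Index-based loop: only size() and get(), no Iterator involved.
    static int sumIndexed(CountingList<Integer> list) {
        int sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    /** Returns { iterators after indexed loop, iterators after for-each loop, indexed sum, for-each sum }. */
    static int[] demo() {
        CountingList<Integer> list = new CountingList<>();
        list.add(1);
        list.add(2);
        list.add(3);
        int indexedSum = sumIndexed(list);
        int afterIndexed = list.iteratorsCreated;
        int forEachSum = sumForEach(list);
        int afterForEach = list.iteratorsCreated;
        return new int[] { afterIndexed, afterForEach, indexedSum, forEachSum };
    }
}
```

In a render loop that runs every frame, each pass through the for-each form requests a fresh iterator, which is the churn described above; whether the JIT later removes that allocation via escape analysis is a separate question.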
To iterate an array without allocating extra objects you can use one of the following ways.
for-loop
for (e in arr) {
println(e)
}
forEach extension
arr.forEach {
println(it)
}
forEachIndexed extension, if you need to know the index of each element
arr.forEachIndexed { index, e ->
println("$e at $index")
}
As far as I know the only allocation-less way to define a for loop is
for (i in 0..count - 1)
All other forms lead to either a Range allocation or an Iterator allocation. Unfortunately, you cannot even define an effective reverse for loop.
Here is an example of preparing a list and iterating over it with index and value.
val list = arrayListOf("1", "11", "111")
for ((index, value) in list.withIndex()) {
println("$index: $value")
}
Output:
0: 1
1: 11
2: 111
The following code works similarly:
val simplearray = arrayOf(1, 2, 3, 4, 5)
for ((index, value) in simplearray.withIndex()) {
println("$index: $value")
}
I'm using gcov to get code coverage for our project, but it frequently reports 50% conditional coverage for plain function calls. It doesn't make any difference if the function takes any parameters or returns any data or not. I'm using gcovr and Cobertura with Jenkins, but a simple gcov file gives the same result.
The actual tested code is attached below together with the stubbed functions, all in gcov format.
Any ideas why gcov treats these function calls as branches?
-: 146:/*****************************************************************************/
function _Z12mw_log_clearv called 2 returned 100% blocks executed 100%
2: 147:void mw_log_clear( void )
2: 147-block 0
-: 148:{
2: 149: uint8_t i = 0;
2: 150: uint8_t clear_tuple[EE_PAGE_SIZE] = { 0xff };
-: 151:
66: 152: for (i = 0; i < (int16_t)EE_PAGE_SIZE; i++)
2: 152-block 0
64: 152-block 1
66: 152-block 2
branch 0 taken 97%
branch 1 taken 3% (fallthrough)
-: 153: {
64: 154: clear_tuple[i] = 0xff;
-: 155: }
-: 156:
-: 157: /* Write pending data */
2: 158: mw_eeprom_write_blocking();
2: 158-block 0
call 0 returned 100%
branch 1 taken 100% (fallthrough) <---- This is a plain function call, not a branch
branch 2 taken 0% (throw) <---- This is a plain function call, not a branch
-: 159:
26: 160: for (i = 0; i < (RESERVED_PAGES_PER_PAREMETER_SET - POPULATED_PAGES_PER_PAREMETER_SET); i++)
2: 160-block 0
24: 160-block 1
26: 160-block 2
branch 0 taken 96%
branch 1 taken 4% (fallthrough)
-: 161: {
25: 162: if (status_ok != mw_eeprom_write(LOG_TUPLE_START_ADDRESS + i * EE_PAGE_SIZE, clear_tuple, sizeof(clear_tuple)))
25: 162-block 0
call 0 returned 100%
branch 1 taken 100% (fallthrough) <---- This is a plain function call, not a branch
branch 2 taken 0% (throw) <---- This is a plain function call, not a branch
25: 162-block 1
branch 3 taken 4% (fallthrough)
branch 4 taken 96%
-: 163: {
1: 164: mw_error_handler_add(mw_error_eeprom_busy);
1: 164-block 0
call 0 returned 100%
branch 1 taken 100% (fallthrough) <---- This is a plain function call, not a branch
branch 2 taken 0% (throw) <---- This is a plain function call, not a branch
1: 165: break;
1: 165-block 0
-: 166: }
-: 167:
24: 168: mw_eeprom_write_blocking();
24: 168-block 0
call 0 returned 100%
branch 1 taken 100% (fallthrough) <---- This is a plain function call, not a branch
branch 2 taken 0% (throw) <---- This is a plain function call, not a branch
-: 169: }
2: 170:}
2: 170-block 0
-: 171:
-: 172:/*****************************************************************************/
/*****************************************************************************/
void mw_eeprom_write_blocking(void)
{
stub_data.eeprom_write_blocking_calls++;
}
/*****************************************************************************/
void mw_error_handler_add(mw_error_code_t error_code)
{
EXPECT_EQ(error_code, stub_data.expected_error_code);
stub_data.registered_error_code = error_code;
}
/*****************************************************************************/
status_t mw_eeprom_write(
const uint32_t eeprom_start_index,
void *const source_start_address,
const uint32_t length)
{
stub_data.eeprom_write_start_index = eeprom_start_index;
stub_data.eeprom_write_length = length;
stub_data.eeprom_write_called = true;
EXPECT_NE(NULL, (uint32_t)source_start_address);
EXPECT_NE(0, length);
EXPECT_LE(eeprom_start_index + length, EEPROM_SIZE);
if (status_ok == stub_data.eeprom_write_status)
memcpy(&stub_data.eeprom[eeprom_start_index], source_start_address, length);
return stub_data.eeprom_write_status;
}
Solved!
Found the answer in this thread:
Why gcc 4.1 + gcov reports 100% branch coverage and newer (4.4, 4.6, 4.8) reports 50% for "p = new class;" line?
It seems gcov was reacting to some "invisible" exception handling code for these function calls, so adding "-fno-exceptions" to the g++ flags made all these missing branches disappear.