Why does the JVM have both `invokespecial` and `invokestatic` opcodes? - jvm

Both instructions use static rather than dynamic dispatch. It seems like the only substantial difference is that invokespecial will always have, as its first argument, an object that is an instance of the class that the dispatched method belongs to. However, invokespecial does not actually put the object there; the compiler is the one responsible for making that happen by emitting the appropriate sequence of stack operations before emitting invokespecial. So replacing invokespecial with invokestatic should not affect the way the runtime stack / heap gets manipulated -- though I expect that it will cause a VerifyError for violating the spec.
I'm curious about the possible reasons behind making two distinct instructions that do essentially the same thing. I took a look at the source of the OpenJDK interpreter, and it seems like invokespecial and invokestatic are handled almost identically. Does having two separate instructions help the JIT compiler better optimize code, or does it help the classfile verifier prove some safety properties more efficiently? Or is this just a quirk in the JVM's design?

Disclaimer: It is hard to tell for sure since I never read an explicit Oracle statement about this, but I pretty much think this is the reason:
When you look at Java byte code, you could ask the same question about other instructions. Why would the verifier stop you from pushing two ints on the stack and treating them as a single long right after? (Try it, it will stop you.) You could argue that by allowing this, you could express the same logic with a smaller instruction set. (To take this argument further: a single byte can only encode so many instructions, so the Java byte code set should cut down wherever possible.)
Of course, in theory you would not need separate byte code instructions for pushing ints and longs to the stack, and you are right that you would not need two instructions, INVOKESPECIAL and INVOKESTATIC, in order to express method invocations. A method is uniquely identified by its name and descriptor (its raw parameter and return types), and you cannot define both a static and a non-static method with an identical name and descriptor within the same class. And in order to validate the byte code, the JVM must check whether the target method is static anyway.
Remark: This contradicts the answer of v6ak. However, the descriptor of a non-static method is not altered to include a reference to this.getClass(). The Java runtime could therefore always infer the appropriate method binding from the method descriptor for a hypothetical INVOKESMART instruction. See JVMS §4.3.3.
So much for the theory. However, the intentions expressed by the two invocation types are quite different. And remember that Java byte code is meant to be produced by tools other than javac as well. With byte code, these tools produce something that is closer to machine code than your Java source code, but it is still a rather high-level representation: the byte code is verified, and it is optimized when compiled to machine code. The byte code is an abstraction that intentionally contains some redundancy in order to make its meaning more explicit. And just as the Java language uses different names for similar things to make programs more readable, the byte code instruction set contains some redundancy as well. As another benefit, verification and byte code interpretation/compilation can be faster, since a method's invocation type never needs to be inferred but is explicitly stated in the byte code. This is desirable because verification, interpretation and compilation happen at run time.
As a final anecdote, I should mention that a class's static initializer <clinit> was not flagged static before Java 5. In this context, the static invocation could also be inferred from the method's name, but this would cause even more run-time overhead.
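As a concrete illustration, here is a small, hypothetical class annotated with the opcodes javac typically emits for each kind of call (easily checked with javap -c; the exact form may vary by compiler version):
class InvocationKinds {

    static int twice(int x) {
        return x * 2;
    }

    int demo() {
        int a = twice(21);            // invokestatic  InvocationKinds.twice:(I)I
        String s = super.toString();  // invokespecial java/lang/Object.toString:()Ljava/lang/String;
        Object o = new Object();      // new + dup + invokespecial java/lang/Object."<init>":()V
        // (private instance methods are traditionally the third user of invokespecial)
        return a + s.length() + o.hashCode();
    }
}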

Here are the definitions:
http://docs.oracle.com/javase/specs/jvms/se5.0/html/Instructions2.doc6.html#invokestatic
http://docs.oracle.com/javase/specs/jvms/se5.0/html/Instructions2.doc6.html#invokespecial
There are significant differences. Say we want to design an invokesmart instruction, which would choose smartly between invokestatic and invokespecial:
First, it would not be a problem to distinguish between static and virtual calls, since we can't have two methods with the same name, the same parameter types and the same return type, even if one is static and the other is virtual. The JVM does not allow that (for a strange reason). Thanks to raphw for noticing that.
Now, what would invokesmart foo/Bar.baz(I)I mean? It may mean:
A static method call foo.Bar.baz that consumes an int from the operand stack and pushes another int. // (int) -> (int)
An instance method call foo.Bar.baz that consumes a foo.Bar and an int from the operand stack and pushes an int. // (foo.Bar, int) -> (int)
How would you choose between them? Both methods may exist.
We may try to solve it by encoding the receiver into the descriptor, i.e. requiring foo/Bar.baz(Lfoo/Bar;I)I for the instance call. However, we may have both public static int baz(Bar, int) and public int baz(int), and then the two descriptors clash again.
We may say that it does not matter and simply forbid such a situation. (I don't think that would be a good idea, but just imagine it.) What would it mean?
If the method is static, there are probably no additional restrictions. On the other hand, if the method is not static, there are some restrictions: "Finally, if the resolved method is protected (§4.6), and it is either a member of the current class or a member of a superclass of the current class, then the class of objectref must be either the current class or a subclass of the current class."
There are some further differences, see the note about ACC_SUPER.
It would mean that all the referenced classes must be loaded before bytecode verification. I hope this is not necessary now, but I am not 100% sure.
So, it would mean very inconsistent behavior.
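To make the descriptor clash from the baz example above concrete, here is a hypothetical class that is perfectly legal today, but whose two baz methods would collide under an invokesmart scheme that encodes the receiver into the descriptor:
package foo;

public class Bar {

    // static variant: declared descriptor (Lfoo/Bar;I)I
    public static int baz(Bar receiver, int x) {
        return x + 1;
    }

    // instance variant: declared descriptor (I)I, but (Lfoo/Bar;I)I once the
    // implicit receiver is encoded into the call
    public int baz(int x) {
        return x - 1;
    }
}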


Fastest way to invoke method handle fields

I'm generating bytecode roughly equivalent to a class like:
import java.lang.invoke.MethodHandle;

final class MyCls {
    final MethodHandle handle1;
    final MethodHandle handle2;
    // and so on

    // This needs to invoke `handle1`, `handle2`, etc. in it somehow
    final static void myMethod() {
        // ...
    }
}
The class is fairly long-lived and I wish to call the MethodHandles from inside other methods, with ideally as little overhead as possible. What would be the best way to do this? The two ideas that come to mind are:
Generating explicit MethodHandle.invokeExact calls on the fields
Using invokedynamic somehow (although I think I'd still need the exactInvoker?)
The handles will vary in signatures (although their use-sites should all use the right signatures - I can detect/enforce that at codegen time).
Update
Here's some extra context on what I'm actually doing. The classes represent compiled WASM modules, the method handles are imported functions, and each instance of the class is an instance of the WASM module.
Using MethodHandle to represent imported functions isn't a necessity here - I could also accept something like a java.util.function.Function or maybe even just a virtual method invocation. I do need a MethodHandle representation sometimes, but I could summon one up from a virtual method too (and I could implement a virtual method manually calling a Function too).
The module class instances themselves might end up being stored in static fields but that's not guaranteed. If there is a way to speed up that case, I could recommend users use that.
The simple answer is to just generate invokeExact calls. With the code shape you've shown, there's no need to use invokedynamic (in fact that doesn't seem possible, since invokedynamic calls a bootstrap method which supplies the implementation dynamically).
Since the handles are stored in instance fields, they are not seen as constants, and so the calls will be out of line, which adds overhead, as well as missed optimization opportunities due to a lack of inlining.
If you really want this to be as fast as possible, you'd need to generate a new class per combination of method handles you want to use, and store the method handles in static final fields, or in the constant pool (for instance using constant pool patching, or hidden classes + class data + dynamic constants [1]).
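As a hand-written sketch of that difference (hypothetical class name and handle targets; the generated bytecode for invokeExact on a field would look the same), compare an instance-field handle with a static final handle that the JIT can treat as a constant and inline through:
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class MyClsSketch {

    // Instance field: not treated as a constant by the JIT, so calls through it
    // are typically not inlined.
    final MethodHandle handle1;

    // static final field: trusted as a constant, so invokeExact through it can
    // be inlined like a direct call.
    static final MethodHandle HANDLE2;

    static {
        try {
            HANDLE2 = MethodHandles.lookup().findStatic(Math.class, "addExact",
                    MethodType.methodType(int.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    MyClsSketch(MethodHandle handle1) {
        this.handle1 = handle1;
    }

    int callThroughInstanceField(int a, int b) throws Throwable {
        // invokeExact requires the call-site signature (int, int) -> int to match
        // handle1's type exactly.
        return (int) handle1.invokeExact(a, b);
    }

    static int callThroughStaticFinal(int a, int b) throws Throwable {
        return (int) HANDLE2.invokeExact(a, b);
    }
}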

Is there a way to force a SPIR-V assembly function to accept both Private and Function storage class arrays?

I am in the process of writing a binary processing module for SPIR-V shaders to fix alignment issues with float4x3[6] matrices caused by driver bugs. Right now I have:
injected the necessary OpTypes and OpTypePointers.
processed the binary to change constant buffer members from float4x3[6] to vec4[18]
injected a function properly unpacking vec4[18] into float4x3[6], accepting vec4[18] as a pointer to a Uniform array of 18.
created Private storage qualifier matrix unpack targets as OpVariables (Private in SPIR-V just means invocation-level global...).
injected preambles doing composite extraction and construction to call my new function (since from what I'm seeing we always need to copy arguments from constant buffers to functions, so that's what I do).
called the function from the entry point, for every float4x3[6] member, so the unpacked matrices are ready when main() starts.
changed OpAccessChain operations that referenced the given members in constant buffers and swapped them with access chains referencing my new Private targets.
But now I ran into trouble. It looks like a function in SPIR-V can accept either Private or Function storage qualifier pointers, not both. Is there any way I can tell SPIR-V "yeah, you can dump both of those storage classes here as arguments"?
Or do I need to rework my solution to use Function storage class matrix targets, and inject them, plus calls to unpack them, every single time they are used in a new function? This seems much less elegant, since there might be many more unpack operations. It is also much less hassle-free, since I would have to scan every OpFunction block separately and inject OpVariables with Function storage into every block that uses the matrices.
My problem is that after all this machinery is done, my targets live as OpTypePointers of Private storage class. Therefore I cannot use them in ANY SPIR-V function generated from HLSL, since those take OpTypePointers of Function storage class. My unpack function is the sole exception to this, since I injected it directly in SPIR-V asm, byte by byte, and was able to precisely tune the OpFunctionParameters in its header.
This is a matter of calling conventions. Or rather, the lack of calling conventions in SPIR-V.
Higher-level languages like GLSL and HLSL have calling conventions. They explain what it means for a function to take an input parameter and how that relates to the argument being given to it.
SPIR-V doesn't have calling conventions in that sense. Or more to the point, you have to construct the calling conventions you want using SPIR-V.
Parameters in HLSL are conceptually always passed by copy. If the parameter is an input parameter, then the copy is initialized with the given argument. If the parameter is an output parameter, the data from the function is copied into the argument after calling the function.
The HLSL compiler must implement this in SPIR-V. So if a function takes a struct input parameter, that function's parameter must be new storage, distinct from any existing object. When a caller tries to call this function, it must create storage for that parameter. That storage will use the Function storage qualifier, so the parameter also uses that qualifier.
SPIR-V requires that pointer types specify the storage qualifier of the objects they point to. This is important, as how the compiler goes about generating the GPU assembly which accesses the object can be different (perhaps drastically). As such, a function cannot accept a pointer that points to different storage classes; the function has to pick one.
So if your SPIR-V adjustment system sees a function call whose source data comes from something you need to adjust, then you have two options:
Create a new function which is a copy of the old one, except that it takes a Private pointer.
Follow the calling convention by creating Function-local storage and copying from your Private data into it prior to calling the function (and copying back out if it is an output parameter). There's probably code to do that sitting there already, so you probably only need to change where it copies from/to.

From a ByteBuddy-generated method, how do I set a (public) instance field in an object received as an argument to the return value of a MethodCall?

I am generating a class in ByteBuddy.
As part of one method implementation, I would like to set a (let's just say) public instance field in another object to the return value of a MethodCall invocation. (Keeping the example public means that access checks etc. are irrelevant.)
I thought I could use MethodCall#setsField(FieldDescription) to do this.
But from my prior question related to this I learned that MethodCall#setsField(FieldDescription) is intended to work only on fields of the instrumented type, and, looking at it now, I'm not entirely sure why or how I thought it was ever going to work.
So: is there a way for a ByteBuddy-generated method implementation to set an instance field of another object to the return value of a method invocation?
If it matters, the "instrumented method" (in ByteBuddy's terminology) accepts the object whose field I want to set as an argument. Naïvely I'd expect to be able to do something like:
MethodCall.invoke(someMethod).setsField(somePublicField).onArgument(2);
There may be problems here that I am not seeing but I was slightly surprised not to see this DSL option. (It may not exist for perfectly good reasons; I just don't know what they would be.)
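In plain Java, the effect I'm after would look roughly like this (hypothetical names; note that the field belongs to an argument, not to the instrumented type):
public class GeneratedType {

    // Conceptually what the generated method should do: the third argument's
    // public field is assigned the return value of a method call.
    public void instrumentedMethod(Object first, Object second, Holder target) {
        target.somePublicField = someMethod();
    }

    Object someMethod() {
        return "computed value";
    }

    public static class Holder {
        public Object somePublicField;
    }
}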
This is not possible as of Byte Buddy 1.10.18; the mechanism was originally created to support getters/setters when defining beans, for example. That said, it would not be difficult to add; I think it would even be easiest to allow any custom byte code to be dispatched as a consumer of the method call.
I will look into how this can be done, but as a new feature, it will take some time before I find the free time to do so. The change is tracked on GitHub.

When is invokedynamic actually useful (besides lazy constants)?

TL;DR
Please provide a piece of code written in some well-known dynamic language (e.g. JavaScript), show what that code would look like in Java bytecode using invokedynamic, and explain why the usage of invokedynamic is a step forward here.
Background
I have googled and read quite a lot about the not-that-new-anymore invokedynamic instruction, which everyone on the internet agrees will help speed up dynamic languages on the JVM. Thanks to stackoverflow I managed to get my own bytecode instructions with Sable/Jasmin to run.
I have understood that invokedynamic is useful for lazy constants and I also think that I understood how the OpenJDK takes advantage of invokedynamic for lambdas.
Oracle has a small example, but as far as I can tell the usage of invokedynamic in this case defeats the purpose, as the "adder" example could be expressed much more simply, faster, and with roughly the same effect using the following bytecode:
aload whereeverAIs
checkcast java/lang/Integer
aload whereeverBIs
checkcast java/lang/Integer
invokestatic IntegerOps/adder(Ljava/lang/Integer;Ljava/lang/Integer;)Ljava/lang/Integer;
because for some reason Oracle's bootstrap method knows that both arguments are integers anyway. They even "admit" that:
[..]it assumes that the arguments [..] will be Integer objects. A bootstrap method requires additional code to properly link invokedynamic [..] if the parameters of the bootstrap method (in this example, callerClass, dynMethodName, and dynMethodType) vary.
Well yes, and without that interesting "additional code" there is no point in using invokedynamic here, is there?
So after that and a couple of further Javadoc and blog entries I think that I have a pretty good grasp on how to use invokedynamic as a poor replacement where invokestatic/invokevirtual or getfield would work just as well.
Now I am curious how to actually apply the invokedynamic instruction to a real-world use case so that it actually is an improvement over what we could do with "traditional" invocations (except lazy constants, I got those...).
Actually, lazy operations are the main advantage of invokedynamic if you take the term “lazy creation” broadly. E.g., the lambda creation feature of Java 8 is a kind of lazy creation that includes the possibility that the actual class containing the code that will be finally invoked by the invokedynamic instruction doesn’t even exist prior to the execution of that instruction.
This can be projected to all kinds of scripting languages delivering code in a form other than Java bytecode (maybe even as source code). Here, the code may be compiled right before the first invocation of a method and remains linked afterwards. But it may even become unlinked if the scripting language supports redefinition of methods. This uses the second important feature of invokedynamic: mutable CallSites, which may be changed afterwards while still supporting maximal performance when invoked frequently without redefinition.
This possibility to change an invokedynamic target afterwards allows another option, linking to an interpreted execution on the first invocation, counting the number of executions and compiling the code only after exceeding a threshold (and relinking to the compiled code then).
Regarding dynamic method dispatch based on a runtime instance, it’s clear that invokedynamic can’t elide the dispatch algorithm. But if you detect at runtime that a particular call site always calls the method of the same concrete type, you may relink the CallSite to optimized code which performs a short check that the target has the expected type, executes the optimized action if so, and branches to the generic code performing the full dynamic dispatch only if that test fails. The implementation may even de-optimize such a call site if it detects that the fast-path check failed a certain number of times.
This is close to how invokevirtual and invokeinterface are optimized internally in the JVM as for these it’s also the case that most of these instructions are called on the same concrete type. So with invokedynamic you can use the same technique for arbitrary lookup algorithms.
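Here is a minimal sketch of that relinking idea, under assumed names (MonomorphicCache, checkClass, and a slowPath handle that performs the full lookup and then calls relink): the mutable call site starts on the generic path, and once a receiver class has been observed, a cheap class check guards the optimized target.
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

final class MonomorphicCache {

    final MutableCallSite site;
    final MethodHandle slowPath;   // performs the full dynamic dispatch

    MonomorphicCache(MutableCallSite site, MethodHandle slowPath) {
        this.site = site;
        this.slowPath = slowPath;
    }

    // Called by the slow path once it has resolved `target` for `receiverClass`
    // (the receiver is assumed to be the first, reference-typed parameter).
    void relink(Class<?> receiverClass, MethodHandle target) {
        MethodType type = site.type();
        MethodHandle test = CHECK_CLASS.bindTo(receiverClass)
                .asType(MethodType.methodType(boolean.class, type.parameterType(0)));
        // Fast path if the receiver class matches, otherwise fall back to the
        // generic dispatch (which may relink again or de-optimize).
        MethodHandle guarded = MethodHandles.guardWithTest(
                test, target.asType(type), slowPath.asType(type));
        site.setTarget(guarded);
    }

    static boolean checkClass(Class<?> expected, Object receiver) {
        return receiver.getClass() == expected;
    }

    static final MethodHandle CHECK_CLASS;
    static {
        try {
            CHECK_CLASS = MethodHandles.lookup().findStatic(MonomorphicCache.class,
                    "checkClass",
                    MethodType.methodType(boolean.class, Class.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
}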
But if you want an entirely different use case, you can use invokedynamic to implement friend semantics which are not supported by the standard access modifier rules. Suppose you have two classes, A and B, which are meant to have such a friend relationship in that A is allowed to invoke private methods of B. Then all these invocations may be encoded as invokedynamic instructions with the desired name and signature, pointing to a public bootstrap method in B which may look like this:
public static CallSite bootStrap(Lookup l, String name, MethodType type)
        throws NoSuchMethodException, IllegalAccessException {
    if(l.lookupClass() != A.class || (l.lookupModes() & 0xf) != 0xf)
        throw new SecurityException("unprivileged caller");
    l = MethodHandles.lookup();
    return new ConstantCallSite(l.findStatic(B.class, name, type));
}
It first verifies that the provided Lookup object has full access to A as only A is capable of constructing such an object. So sneaky attempts of wrong callers are sorted out at this place. Then it uses a Lookup object having full access to B to complete the linkage. So, each of these invokedynamic instructions is permanently linked to the matching private method of B after the first invocation, running at the same speed as ordinary invocations afterwards.
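The matching invokedynamic instructions in A cannot be written in Java source; they have to be generated, e.g. when you produce A's class file yourself. As a sketch using the ASM library (the method name secretOp and its (I)I descriptor are made up, and B is assumed to live in the default package):
import org.objectweb.asm.Handle;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

final class FriendCallEmitter {

    // Must be emitted into A's bytecode: the JVM passes A's Lookup object to the
    // bootstrap method, which is exactly what bootStrap() above verifies.
    static void emitFriendCall(MethodVisitor mv) {
        Handle bootstrap = new Handle(
                Opcodes.H_INVOKESTATIC,
                "B",                      // internal name of the class holding bootStrap
                "bootStrap",
                "(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;"
                        + "Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/CallSite;",
                false);
        // Links to the private static method B.secretOp(int) via the bootstrap.
        mv.visitInvokeDynamicInsn("secretOp", "(I)I", bootstrap);
    }
}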

STM32 programming tips and questions

I could not find any good documents on the internet about STM32 programming. STM's own documents do not explain anything more than register functions. I would greatly appreciate it if anyone could answer the following questions.
I noticed that in all the example programs that STM provides, local variables for main() are always defined outside of the main() function (with occasional use of the static keyword). Is there any reason for that? Should I follow a similar practice? Should I avoid using local variables inside main()?
I have a global variable which is updated within the clock interrupt handler. I am using the same variable inside another function as a loop condition. Don't I need to access this variable using some form of atomic read operation? How can I know that a clock interrupt does not change its value in the middle of the function execution? Do I need to disable the clock interrupt every time I use this variable inside a function? (However, this seems extremely inefficient to me as I use it as a loop condition. I believe there should be better ways of doing it.)
Keil automatically inserts a startup code which is written in assembly (i.e. startup_stm32f4xx.s). This startup code has the following import statements:
IMPORT SystemInit
IMPORT __main
.In "C", it makes sense. However, in C++ both main and system_init have different names (e.g. _int_main__void). How can this startup code can still work in C++ even without using "extern "C" " (I tried and it worked). How can the c++ linker (armcc --cpp) can associate these statements with the correct functions?
You can use local or global variables; using locals in embedded systems carries a risk of your stack colliding with your data, and with globals you don't have that problem. But this is true no matter where you are: embedded microcontroller, desktop, etc.
I would make a copy of the global in the foreground task that uses it.
volatile unsigned int myglobal;   /* shared with the interrupt handler */

void fun ( void )
{
    unsigned int myg;
    myg = myglobal;   /* take one snapshot of the shared variable */
    /* ... use only myg below ... */
}
and then only use myg for the rest of the function. Basically you are taking a snapshot and using the snapshot. You would want to do the same thing if you are reading a register: if you want to do multiple things based on a sample of something, take one sample of it and make decisions on that one sample, otherwise the item can change between samples. If you are using one global to communicate back and forth with the interrupt handler, I would use two variables: one foreground-to-interrupt, the other interrupt-to-foreground. Yes, there are times where you need to carefully manage a shared resource like that; normally it has to do with needing to do more than one thing, for example if you had several items that all need to change as a group before the handler can see them change, then you need to disable the interrupt handler until all the items have changed. Here again there is nothing special about embedded microcontrollers; this is all basic stuff you would see on a desktop system with a full-blown operating system.
Keil knows what they are doing; if they support C++ then from a system level they have this worked out. I don't use Keil; I use gcc and llvm for microcontrollers like this one.
Edit:
Here is an example of what I am talking about
https://github.com/dwelch67/stm32vld/tree/master/stm32f4d/blinker05
stm32 using timer-based interrupts: the interrupt handler modifies a variable shared with the foreground task. The foreground task takes a single snapshot of the shared variable (per loop) and, if need be, uses the snapshot more than once in the loop rather than the shared variable, which can change. This is C, not C++, I understand that, and I am using gcc and llvm, not Keil. (Note: llvm has known problems optimizing tight while loops, a very old bug; I don't know why they have no interest in fixing it, but llvm works for this example.)
Question 1: Local variables
The sample code provided by ST is not particularly efficient or elegant. It gets the job done, but sometimes there are no good reasons for the things they do.
In general, you always want your variables to have the smallest scope possible. If you only use a variable in one function, define it inside that function. Add the static keyword to local variables if and only if you need them to retain their value after the function returns.
In some embedded environments, like the PIC18 architecture with the C18 compiler, local variables are much more expensive (more program space, slower execution time) than global. On the Cortex M3, that is not true, so you should feel free to use local variables. Check the assembly listing and see for yourself.
Question 2: Sharing variables between interrupts and the main loop
People have written entire chapters explaining the answers to this group of questions. Whenever you share a variable between the main loop and an interrupt, you should definitely use the volatile keyword on it. Variables of 32 or fewer bits can be accessed atomically (unless they are misaligned).
If you need to access a larger variable, or two variables at the same time from the main loop, then you will have to disable the clock interrupt while you are accessing the variables. If your interrupt does not require precise timing, this will not be a problem. When you re-enable the interrupt, it will automatically fire if it needs to.
Question 3: main function in C++
I'm not sure. You can use arm-none-eabi-nm (or whatever nm is called in your toolchain) on your object file to see what symbol name the C++ compiler assigns to main(). I would bet that C++ compilers refrain from mangling the main function for this exact reason, but I'm not sure.
STM's sample code is not an exemplar of good coding practice, it is merely intended to exemplify use of their standard peripheral library (assuming those are the examples you are talking about). In some cases it may be that variables are declared external to main() because they are accessed from an interrupt context (shared memory). There is also perhaps a possibility that it was done that way merely to allow the variables to be watched in the debugger from any context; but that is not a reason to copy the technique. My opinion of STM's example code is that it is generally pretty poor even as example code, let alone from a software engineering point of view.
In this case your clock interrupt variable is atomic so long as it is 32 bits or less and you are not using read-modify-write semantics with multiple writers. You can safely have one writer and multiple readers regardless. This is true for this particular platform, but not necessarily universally; the answer may be different for 8 or 16 bit systems, or for multi-core systems, for example. The variable should be declared volatile in any case.
I am using C++ on STM32 with Keil, and there is no problem. I am not sure why you think that the C++ entry points are different, they are not here (Keil ARM-MDK v4.22a). The start-up code calls SystemInit() which initialises the PLL and memory timing for example, then calls __main() which performs global static initialisation then calls C++ constructors for global static objects before calling main(). If in doubt, step through the code in the debugger. It is important to note that __main() is not the main() function you write for your application, it is a wrapper with different behaviour for C and C++, but which ultimately calls your main() function.