How to obtain the initial class file bytes - jvm

At some point in my program I need the initial class file bytes (the bytes describing the class before any transformations were applied). The methods I evaluated so far are:
Using the corresponding classloader to get the resource and simply loading the byte array again. This won't work for dynamically generated classes though (ASM, proxies, etc).
Storing a reference to the initial class file bytes in a ClassFileTransformer. While this works it means that I need to proactively store all byte arrays for all classes in case I need some of them later one. No cool.
Pretty much the same as above but using JVMTIs ClassFileLoadHook. The issue is the same as with the ClassFileTransformer though.
I checked what is happening when Instrumentation.retransformClasses is called. In the end this comes down to a native method needing the instanceKlassHandles to get the class file bytes. So nothing I can really access as well (at least I wouldn't know how).
Any other ideas for how I could get the initial class file bytes without storing a reference to the bytes for all classes upfront?

Related

Is there way to force SPIR-V assembly function to accept both Private and Function storage class arrays?

I am in the process of writing a binary processing module for SPIR-V shaders to fix alignment issues with float4x3[6] matrices because of driver bugs. Right now i have:
injected necessary appropriate OpTypes and OpTypePointers.
processed the binary to change constant buffer members from float4x3[6] to vec4[18]
injected function properly unpacking vec4[18] into float4x3[6] accepting vec4[18] as a pointer to Uniform array 18.
created Private storage qualifier matrix unpack targets as OpVariables.(Private in SPIR-V just means invocation-level global...).
injected preambles about composite extraction and construction to call my new function. (since from what im seeing we need to copy arguments from constant buffers to functions always, so thats what I do).
called the function from entrypoint, for every float4x3[6] member to have ready unpacked matrices when main() starts.
changed OpAccessChain operations that referenced given members in constant buffers and swapped them with access chains referencing my new Private targets.
But now i ran into trouble. It looks like a function in SPIR-V can either accept Private or Function storage qualifier pointers. Not both. Is there any way i can tell SPIR-V "Yeah, you can dump both of those storage classes here as arguments"?
Or do i need to rework my solution to utilize Function storage class matrix targets, and inject them and calls to unpack them every single time they are used in a new function? This seems much less elegant since there might be way more unpack operations then. And much less hassle-free, since i would have to scan every OpFunction block separately and inject OpVariables with Function storage into every block that uses the matrices.
My problem is, after all this machinery is done my targets are living as OpTypePointer of Private Storage Duration. Therefore i cannot use them in ANY SPIR-V function generated from HLSL, since they take OpTypePointers of Function duration. My unpack function is sole exception to this since i injected it directly in SPIR-V asm, byte by byte and was able to precisely tune OpFunctionParameters in header.
This is a matter of calling conventions. Or rather, the lack of calling conventions in SPIR-V.
Higher-level languages like GLSL and HLSL have calling conventions. They explain what it means for a function to take an input parameter and how that relates to the argument being given to it.
SPIR-V doesn't have calling conventions in that sense. Or more to the point, you have to construct the calling conventions you want using SPIR-V.
Parameters in HLSL are conceptually always passed by copy. If the parameter is an input parameter, then the copy is initialized with the given argument. If the parameter is an output parameter, the data from the function is copied into the argument after calling the function.
The HLSL compiler must implement this in SPIR-V. So if a function takes a struct input parameter, that function's input parameter must be new storage from any existing object. When a caller tries to call this function, it must create storage for that parameter. That storage will use the Function storage qualifier, so the parameter also uses that qualifier.
SPIR-V requires that pointer types specify the storage qualifier of the objects they point to. This is important, as how the compiler goes about generating the GPU assembly which accesses the object can be different (perhaps drastically). As such, a function cannot accept a pointer that points to different storage classes; the function has to pick one.
So if your SPIR-V adjustment system sees a function call whose source data comes from something you need to adjust, then you have two options:
Create a new function which is a copy of the old one, except that it takes a Private pointer.
Follow the calling convention by creating Function-local storage and copying from your Private data into it prior to calling the function (and copying back out if it is an output parameter). There's probably code to do that sitting there already, so you probably only need to change where it copies from/to.

From a ByteBuddy-generated method, how do I set a (public) instance field in an object received as an argument to the return value of a MethodCall?

I am generating a class in ByteBuddy.
As part of one method implementation, I would like to set a (let's just say) public instance field in another object to the return value of a MethodCall invocation. (Keeping the example public means that access checks etc. are irrelevant.)
I thought I could use MethodCall#setsField(FieldDescription) to do this.
But from my prior question related to this I learned that MethodCall#setsField(FieldDescription) is intended to work only on fields of the instrumented type, and, looking at it now, I'm not entirely sure why or how I thought it was ever going to work.
So: is there a way for a ByteBuddy-generated method implementation to set an instance field of another object to the return value of a method invocation?
If it matters, the "instrumented method" (in ByteBuddy's terminology) accepts the object whose field I want to set as an argument. Naïvely I'd expect to be able to do something like:
MethodCall.invoke(someMethod).setsField(somePublicField).onArgument(2);
There may be problems here that I am not seeing but I was slightly surprised not to see this DSL option. (It may not exist for perfectly good reasons; I just don't know what they would be.)
This is not possible as of Byte Buddy 1.10.18, the mechanism was originally created to support getters/setters when defining beans, for example. That said, it would not be difficult to add; I think it would even be easiest to allow any custom byte code to be dispatched as a consumer of the method call.
I will look into how this can be done, but as a new feature, this will take some time before I find the empty space to do so. The change is tracked on GitHub.

Byte Buddy LocationStrategy types

I saw that the default LocationStrategy is STRONG, which keeps a strong reference to the class loader when creating a ClassFileLocator. Does this mean Byte Buddy can prevent a class loader from being garbage collected (eg when undeploying a webapp from a servlet container) or is there another mechanism to evacuate those?
Also in this regard- the documentation about the WEAK strategy says a ClassFileLocator will "stop working" after the corresponding class loader is garbage collected. What are the implications? How would a locator for a garbage-collected class loader be used?
You are right about your assertion. With a strong type locator, all TypeDescriptions will reference the class loader as dependent types are resolved lazily. This means for example that if you look up a type's field type, that type will only be loaded if you are using it for the first time what might never happen.
Typically, those type descriptions are not stored over the life time of a class being loaded. Since the class loader will never be garbage collected during a loading of one of its classes, referencing the class loader strongly does not render any issue. However, once you want to cache type descriptions in between multiple class loadings (what can make a lot of sence since some applications load thousands of classes using the same class loader), this might become a problem if a class loader would be garbage collected while the cache is still referencing the type description with the underlying class loader.
In this case, reusing the type descriptions will be problematic since no lazily referenced classes can be resolved after the class loader was garbage collected. Note that a type description might be resolved using a specific class loader while the class is defined by a parent of that class loader which is why this might be a problem.
Typically, if you maintain a cache of type descriptions per class loader, this should however not be a problem.

When is class side initialize sent?

I am curious about when the class side initialize messages are sent in Smalltalk (Pharo and Squeak particularly). Is there any specified order? Is it at least safe to assume all other classes packaged with it have already been loaded and compiled, or does the system eagerly initialize (send initialize before even finishing loading and compiling the other classes)?
The class-side initialize is never sent by the system. During development you do it manually (which is why many of these methods have a "self initialize" comment.
When exporting code of a class into a changeset, the exporter puts a send of initialize at the very end, so it gets executed when the class is loaded into another system.
This behavior is mimicked by Monticello. When loading a class for the first time, or when the code of the initialize method was changed, it is executed. That is because conceptually MC builds a changeset on-the-fly containing the difference of what is in the image already and what the package to be loaded contains. If that diff includes a class-side initialize method, it will be executed when loading that package version.
As you asked about loading and compiling, I'm assuming you mean when loading code...
When loading a package or changeset, class-side #initialize methods are called after all code is installed (1). While you can not count on a specific order, you can assume that all classes and methods from that package are loaded.
As Bert pointed out, if you were not loading but implementing class-side #initialize, you'd have to send the message yourself.
One way to know for sure, is to test it yourself. Smalltalk systems make this kind of thing a little more approachable than many other systems. Just define a your own MyTestClass, and then implement your own class side (that's important) initialize message so that you can discover for yourself when it fires, how often it fires, etc.
initialize
Transcript show: 'i have been INITIALIZED!!! Muwahahahah!!!'
Make sure it works by opening a Transcript and running
MyTestClass initialize
from a Workspace. Now you can play with filing it out and back in, Monticello loading, whatever and when it runs.

Why does the JVM have both `invokespecial` and `invokestatic` opcodes?

Both instructions use static rather than dynamic dispatch. It seems like the only substantial difference is that invokespecial will always have, as its first argument, an object that is an instance of the class that the dispatched method belongs to. However, invokespecial does not actually put the object there; the compiler is the one responsible for making that happen by emitting the appropriate sequence of stack operations before emitting invokespecial. So replacing invokespecial with invokestatic should not affect the way the runtime stack / heap gets manipulated -- though I expect that it will cause a VerifyError for violating the spec.
I'm curious about the possible reasons behind making two distinct instructions that do essentially the same thing. I took a look at the source of the OpenJDK interpreter, and it seems like invokespecial and invokestatic are handled almost identically. Does having two separate instructions help the JIT compiler better optimize code, or does it help the classfile verifier prove some safety properties more efficiently? Or is this just a quirk in the JVM's design?
Disclaimer: It is hard to tell for sure since I never read an explicit Oracle statement about this, but I pretty much think this is the reason:
When you look at Java byte code, you could ask the same question about other instructions. Why would the verifier stop you when pushing two ints on the stack and treating them as a single long right after? (Try it, it will stop you.) You could argue that by allowing this, you could express the same logic with a smaller instruction set. (To go further with this argument, a byte cannot express too many instructions, the Java byte code set should therefore cut down wherever possible.)
Of course, in theory you would not need a byte code instruction for pushing ints and longs to the stack and you are right about the fact that you would not need two instructions for INVOKESPECIAL and INVOKESTATIC in order to express method invocations. A method is uniquely identified by its method descriptor (name and raw argument types) and you could not define both a static and a non-static method with an identical description within the same class. And in order to validate the byte code, the Java compiler must check whether the target method is static nevertheless.
Remark: This contradicts the answer of v6ak. However, a methods descriptor of a non-static method is not altered to include a reference to this.getClass(). The Java runtime could therefore always infer the appropriate method binding from the method descriptor for a hypothetical INVOKESMART instruction. See JVMS §4.3.3.
So much for the theory. However, the intentions that are expressed by both invocation types are quite different. And remember that Java byte code is supposed to be used by other tools than javac to create JVM applications, as well. With byte code, these tools produce something that is more similar to machine code than your Java source code. But it is still rather high level. For example, byte code still is verified and the byte code is automatically optimized when compiled to machine code. However, the byte code is an abstraction that intentionally contains some redundancy in order to make the meaning of the byte code more explicit. And just like the Java language uses different names for similar things to make the language more readable, the byte code instruction set contains some redundancy as well. And as another benefit, verification and byte code interpretation/compilation can speed up since a method's invocation type does not always need to be inferred but is explicitly stated in the byte code. This is desirable because verification, interpretation and compilation are done at runtime.
As a final anecdote, I should mention that a class's static initializer <clinit> was not flagged static before Java 5. In this context, the static invocation could also be inferred by the method's name but this would cause even more run time overhead.
There are the definitions:
http://docs.oracle.com/javase/specs/jvms/se5.0/html/Instructions2.doc6.html#invokestatic
http://docs.oracle.com/javase/specs/jvms/se5.0/html/Instructions2.doc6.html#invokespecial
There are significant differences. Say we want to design an invokesmart instruction, which would choose smartly between inkovestatic and invokespecial:
First, it would not be a problem to distinguish between static and virtual calls, since we can't have two methods with same name, same parameter types and same return type, even if one is static and second is virtual. JVM does not allow that (for a strange reason). Thanks raphw for noticing that.
First, what would invokesmart foo/Bar.baz(I)I mean? It may mean:
A static method call foo.Bar.baz that consumes int from operand stack and adds another int. // (int) -> (int)
An instance method call foo.Bar.baz that consumes foo.Bar and int from operand stack and adds int. // (foo.Bar, int) -> (int)
How would you choose from them? There may exist both methods.
We may try to solve it by requiring foo/Bar.baz(Lfoo/Bar;I) for the static call. However, we may have both public static int baz(Bar, int) and public int baz(int).
We may say that it does not matter and possibly disable such situation. (I don't think that it is a good idea, but just to imagine.) What would it mean?
If the method is static, there are probably no additional restrictions. On the other hand, if the method is not static, there are some restrictions: "Finally, if the resolved method is protected (§4.6), and it is either a member of the current class or a member of a superclass of the current class, then the class of objectref must be either the current class or a subclass of the current class."
There are some further differences, see the note about ACC_SUPER.
It would mean that all the referenced classes must be loaded before bytecode verification. I hope this is not necessary now, but I am not 100% sure.
So, it would mean very inconsistent behavior.