Will there be any difference in the bytecode or the compiled code when using different JDKs (e.g., 1.4 and 1.5)? If so, how does a JVM recognize and handle it at runtime?
The class file format has version information in it.
See the Java class file format. The major_version and minor_version fields are used to differentiate different versions of class files.
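And yes, there can be differences. For instance, JDK 1.4 didn't support generics, and a 1.4 JVM can't load classes produced by a 1.5 compiler targeting 1.5: it checks major_version and rejects class files newer than it supports with an UnsupportedClassVersionError.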
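As a minimal sketch of the check described above, the Java class below reads magic, minor_version and major_version from the first eight bytes of a class file. The class name ClassVersion and the javaRelease helper are illustrative, not part of any JDK API; the mapping follows the usual convention (45 = JDK 1.0.2/1.1, 48 = 1.4, 49 = Java 5, 52 = Java 8, and major - 44 from Java 5 onwards).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Reads the version fields from the start of a class file.
// Layout (JVM spec, chapter 4): u4 magic, u2 minor_version, u2 major_version, ...
final class ClassVersion {

    /** Returns {major, minor}; rejects input that does not start with 0xCAFEBABE. */
    static int[] read(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("too short to be a class file");
        }
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default, like the class file format
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file: bad magic");
        }
        int minor = buf.getShort() & 0xFFFF;           // u2 values are unsigned
        int major = buf.getShort() & 0xFFFF;
        return new int[] {major, minor};
    }

    /** Maps a major version to the Java release it corresponds to (illustrative helper). */
    static String javaRelease(int major) {
        if (major >= 49) return Integer.toString(major - 44); // 49 = Java 5, 52 = Java 8, ...
        if (major >= 46) return "1." + (major - 44);          // 46 = 1.2, 47 = 1.3, 48 = 1.4
        if (major == 45) return "1.0.2/1.1";
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            int[] v = read(Files.readAllBytes(Paths.get(path)));
            System.out.println(path + ": " + v[0] + "." + v[1] + " (Java " + javaRelease(v[0]) + ")");
        }
    }
}
```

Run as `java ClassVersion Foo.class` on a class compiled by JDK 8 with its default target, it prints `Foo.class: 52.0 (Java 8)`.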
Related
I want to create a programming language that compiles to its own bytecode format and a VM that interprets it. But I want the bytecode to be compatible with JVM. I've searched for any way to insert comments to JVM bytecode so then I can parse them with my own VM but I couldn't find any. Also I tried to insert some bytes in the start of the byte array and in the end, but it produced ClassFormatException. Is there any workaround?
"compiles to its own bytecode format" and "compatible with JVM" are mutually exclusive requirements.
If you want the JVM to be able to parse your class files, they must strictly comply with the Java Virtual Machine Specification, Chapter 4, "The class File Format".
The standard way of extending the class file format is attributes. You may invent your own attributes and include them in one or more of the following structures:
ClassFile
field_info
method_info
Code
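To see where such attributes live, here is a minimal Java sketch (the AttributeLister name is hypothetical) that walks a class file by hand and prints the name of every attribute attached to the class, its fields and its methods. Each attribute_info starts with a u2 index of a CONSTANT_Utf8 name followed by a u4 length, so the sketch keeps only the Utf8 entries of the constant pool and skips attribute bodies without interpreting them; attributes nested inside Code are therefore not listed. The constant pool tags are taken from the JVM spec; an unknown (newer) tag makes it bail out.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Lists attribute names at the ClassFile, field_info and method_info levels.
final class AttributeLister {

    static List<String> list(byte[] classBytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes));
        if (in.readInt() != 0xCAFEBABE) throw new IOException("bad magic");
        in.readUnsignedShort(); // minor_version
        in.readUnsignedShort(); // major_version

        // Constant pool: only CONSTANT_Utf8 entries are kept, because attribute names are Utf8 indices.
        int cpCount = in.readUnsignedShort();
        String[] utf8 = new String[cpCount];
        for (int i = 1; i < cpCount; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: utf8[i] = in.readUTF(); break;                    // u2 length + modified UTF-8
                case 3: case 4: in.skipBytes(4); break;                   // Integer, Float
                case 5: case 6: in.skipBytes(8); i++; break;              // Long, Double use two slots
                case 7: case 8: case 16: case 19: case 20: in.skipBytes(2); break; // Class, String, MethodType, Module, Package
                case 9: case 10: case 11: case 12: case 17: case 18: in.skipBytes(4); break; // refs, NameAndType, (Invoke)Dynamic
                case 15: in.skipBytes(3); break;                          // MethodHandle
                default: throw new IOException("unknown constant pool tag " + tag);
            }
        }

        in.skipBytes(6);                                   // access_flags, this_class, super_class
        in.skipBytes(2 * in.readUnsignedShort());          // interfaces

        List<String> out = new ArrayList<>();
        readMembers(in, utf8, "field", out);
        readMembers(in, utf8, "method", out);
        readAttributes(in, utf8, "class", out);
        return out;
    }

    private static void readMembers(DataInputStream in, String[] utf8, String kind, List<String> out)
            throws IOException {
        int count = in.readUnsignedShort();
        for (int i = 0; i < count; i++) {
            in.skipBytes(2);                               // access_flags
            String name = utf8[in.readUnsignedShort()];    // name_index
            in.skipBytes(2);                               // descriptor_index
            readAttributes(in, utf8, kind + " " + name, out);
        }
    }

    // attribute_info: u2 attribute_name_index, u4 attribute_length, u1 info[attribute_length]
    private static void readAttributes(DataInputStream in, String[] utf8, String owner, List<String> out)
            throws IOException {
        int count = in.readUnsignedShort();
        for (int i = 0; i < count; i++) {
            String name = utf8[in.readUnsignedShort()];
            int length = in.readInt();
            in.skipBytes(length);   // the JVM likewise silently skips attributes it does not recognize
            out.add(owner + ": " + name);
        }
    }

    public static void main(String[] args) throws IOException {
        for (String line : list(Files.readAllBytes(Paths.get(args[0])))) {
            System.out.println(line);
        }
    }
}
```

On a javac-compiled class this should print lines such as `method <init>: Code` and, with default debug settings, `class: SourceFile`; a custom attribute added to any of these structures shows up the same way.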
Scala does this, so you should check that out. As mentioned by apanging, the standard way to insert "comments" into a class file is with attributes. The class file format defines certain standard attributes that are used by the JVM. However, you can also include arbitrary user-defined attributes containing arbitrary data; the JVM silently ignores attributes it does not recognize. This is what Scala uses to include the metadata required by the Scala runtime.
Given a jar file that I know was compiled with kotlin, how do I determine which version of kotlin was used to compile the class files in it?
If I do the following, I get 52 (i.e., JDK 8):
javap -cp target.jar -verbose fully.qualified.class.name | grep major
That is the Java target version, though.
I don't think you can, given that (as far as I know) the Kotlin compiler doesn't store any version information in generated classes.
Unlike, e.g., Scala, which embeds its major and minor version in the compiled class files, Kotlin only adds @Metadata annotations to the generated classes to hold information about nullability, mutability, etc. You can find the protobuf for this information here.
You could use the older "standard" (which was also used in Scala projects) of embedding the version in the JAR's name.
What's the difference between kotlin-runtime.jar (225.1K) and kotlin-stdlib.jar (727.3K) (sizes are for version 1.0.0-beta-1103)? Which one should I distribute with my application? For now I'm using kotlin-stdlib.jar, because that's what Android Studio generated, but I wonder if I can use kotlin-runtime.jar since it's smaller.
The runtime library only contains the base Kotlin language types required to execute compiled code. It is the minimal set of classes required.
The standard library contains utility functions you need for comfortable development, such as functions for collection manipulation, files, streams and so on.
In theory you can use just the runtime, but you generally shouldn't: it doesn't include the standard library, so you would lose many utility functions required for comfortable development (such as map, filter, toList and so on).
So in fact you need both. If you need to make the resulting package smaller, you can process your app with ProGuard.
Update
Starting with Kotlin 1.2, kotlin-runtime and kotlin-stdlib are merged into a single artifact, kotlin-stdlib.
We merge kotlin-runtime and kotlin-stdlib into the single artifact kotlin-stdlib. Also we’re going to rename kotlin-runtime.jar, shipped in the compiler distribution, to kotlin-stdlib.jar, to reduce the amount of confusion caused by having differently named standard library in different build systems.
That rename will happen in two stages: in 1.1 there will be both kotlin-runtime.jar and kotlin-stdlib.jar with the same content in the compiler distribution, and in 1.2 the former will be removed.
Refer to Kotlin 1.1: What’s coming in the standard library for details.
What are the differences between bytecode binaries, such as Java class files, Parrot bytecode files or CLR files, and machine code executables such as ELF, Mach-O and PE?
What are the distinctive differences between the two?
For example, which part of the class file corresponds to the .text section in the ELF structure?
Or: they all have headers, but the ELF and PE headers specify an architecture while the class file does not?
Java Class File
Elf file
PE File
Byte code is, as imulsion noted, an intermediate step, right before compilation into machine code. Because the last step is left to load time (and often to runtime, as is the case with Just-In-Time (JIT) compilation), byte code is architecture-independent: the runtime (the CLR for .NET or the JVM for Java) is responsible for mapping the byte code opcodes to their underlying machine code representation.
By comparison, native code (Windows: PE, PE32+; OS X/iOS: Mach-O; Linux/Android/etc.: ELF) is compiled code, suited for a particular architecture (Android/iOS: ARM; most else: Intel 32-bit (i386) or 64-bit). These formats are all very similar, but they still require sections (or, in Mach-O parlance, "load commands") to set up the memory structure of the executable as it becomes a process (old DOS supported the ".com" format, which was a raw memory image). In all of the above you can say, roughly, the following:
Sections whose names start with "." are created by the compiler and are "default", i.e. expected to have default behavior.
The executable has a main code section, usually called "text" or ".text". This is native code, which can run on the specific architecture.
Strings are stored in a separate section. These are used for hard-coded output (what you print out) as well as for symbol names.
Symbols, which are what the linker uses to put together the executable with its libraries (Windows: DLLs; Linux/Android: shared objects; OS X/iOS: .dylibs or frameworks), are stored in a separate section. Usually there is also a "PLT" (Procedure Linkage Table), which lets the compiler simply emit stubs for the functions you call (printf, open, etc.) that the dynamic linker connects when the executable loads.
The import table (in Windows parlance; in ELF this is the DYNAMIC section, in OS X the LC_LOAD_DYLIB load commands) declares additional libraries. If those aren't found when the executable is loaded, the load fails and you can't run it.
The export table (for libraries, dylibs, etc.) lists the symbols that the library (or, on Windows, even an .exe) exports so that others can link against them.
Constants usually live in what you see as ".rodata".
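The header difference raised in the question can be checked directly. This sketch (the HeaderArch name is hypothetical) reads the architecture field that ELF headers (e_machine) and PE headers (the COFF Machine field, found via e_lfanew) carry, plus the cputype of a thin little-endian Mach-O; for a class file it can only report the version, because the class file header has no architecture field. Only a handful of e_machine values are named. Note that 0xCAFEBABE is also the magic of Mach-O universal ("fat") binaries, so telling the two apart here is a heuristic based on the following field.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

// Reports the architecture a file header declares, if it declares one at all.
final class HeaderArch {

    static String describe(byte[] b) {
        ByteBuffer be = ByteBuffer.wrap(b).order(ByteOrder.BIG_ENDIAN);
        ByteBuffer le = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN);

        // ELF: "\x7FELF"; EI_DATA (offset 5) gives the byte order; e_machine is the u2 at offset 18.
        if (b.length >= 20 && be.getInt(0) == 0x7F454C46) {
            ByteBuffer bb = b[5] == 2 ? be : le;
            int machine = bb.getShort(18) & 0xFFFF;
            return "ELF, e_machine=" + machine + elfName(machine);
        }
        // PE: "MZ" DOS header; e_lfanew (offset 0x3C) points to "PE\0\0", followed by the COFF Machine field.
        if (b.length >= 0x40 && b[0] == 'M' && b[1] == 'Z') {
            int pe = le.getInt(0x3C);
            if (pe >= 0 && pe <= b.length - 6 && le.getInt(pe) == 0x00004550) {
                return "PE, Machine=0x" + Integer.toHexString(le.getShort(pe + 4) & 0xFFFF);
            }
            return "DOS MZ executable (no PE header found)";
        }
        // Thin Mach-O written little-endian (as on x86 and ARM Macs): cputype follows the magic.
        if (b.length >= 8 && (le.getInt(0) == 0xFEEDFACE || le.getInt(0) == 0xFEEDFACF)) {
            return "Mach-O, cputype=0x" + Integer.toHexString(le.getInt(4));
        }
        // 0xCAFEBABE: Java class file or Mach-O universal binary. Heuristic: a class file's
        // major_version (bytes 6-7) is at least 45, while a fat header stores a small arch count there.
        if (b.length >= 8 && be.getInt(0) == 0xCAFEBABE) {
            int minor = be.getShort(4) & 0xFFFF;
            int major = be.getShort(6) & 0xFFFF;
            return major >= 45
                    ? "Java class file " + major + "." + minor + " (no architecture field)"
                    : "Mach-O universal binary with " + major + " architectures";
        }
        return "unknown";
    }

    private static String elfName(int machine) {
        switch (machine) {
            case 3:   return " (x86)";
            case 40:  return " (ARM)";
            case 62:  return " (x86-64)";
            case 183: return " (AArch64)";
            default:  return "";
        }
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            System.out.println(path + ": " + describe(Files.readAllBytes(Paths.get(path))));
        }
    }
}
```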
Hope this helps; your question was rather vague.
Byte code is a 'halfway' step. The Java compiler (javac) turns the source code into byte code. Machine code is the next step: the JVM takes the byte code and turns it into machine code (by interpreting or JIT-compiling it), which the processor then executes. Computers cannot run source code directly. Compilers for languages such as C do translate straight into machine code; Java's halfway step is what lets the same class files run on any platform that has a JVM.
Note that ELF binaries don't necessarily need to be machine/arch specific per se.
The interesting piece is the "interpreter" entry (the PT_INTERP program header): it holds the path name of a loader program that is executed instead of the actual binary. That loader is then responsible for setting up the actual program, loading and linking libraries, etc. This is how, e.g., ld.so comes into play.
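To make the "interpreter" entry concrete, here is a sketch (the ElfInterp name is hypothetical) that locates the PT_INTERP program header in a 32- or 64-bit ELF file and returns the path it names, using the field offsets from the ELF specification. It does no validation beyond the magic number.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Finds the PT_INTERP program header of an ELF file and returns the interpreter path it names.
final class ElfInterp {
    static final int PT_INTERP = 3;

    /** Returns the interpreter path, or null if there is no PT_INTERP entry (e.g. a static binary). */
    static String interpreter(byte[] b) {
        if (b.length < 52 || b[0] != 0x7F || b[1] != 'E' || b[2] != 'L' || b[3] != 'F') {
            throw new IllegalArgumentException("not an ELF file");
        }
        boolean is64 = b[4] == 2;                        // EI_CLASS: 1 = 32-bit, 2 = 64-bit
        ByteBuffer bb = ByteBuffer.wrap(b)
                .order(b[5] == 2 ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN); // EI_DATA

        long phoff  = is64 ? bb.getLong(32) : bb.getInt(28) & 0xFFFFFFFFL; // e_phoff
        int phentsz = bb.getShort(is64 ? 54 : 42) & 0xFFFF;               // e_phentsize
        int phnum   = bb.getShort(is64 ? 56 : 44) & 0xFFFF;               // e_phnum

        for (int i = 0; i < phnum; i++) {
            int ph = (int) (phoff + (long) i * phentsz);
            if (bb.getInt(ph) != PT_INTERP) continue;                      // p_type
            long off  = is64 ? bb.getLong(ph + 8)  : bb.getInt(ph + 4)  & 0xFFFFFFFFL; // p_offset
            long size = is64 ? bb.getLong(ph + 32) : bb.getInt(ph + 16) & 0xFFFFFFFFL; // p_filesz
            // The segment holds a NUL-terminated path name.
            String s = new String(b, (int) off, (int) size, StandardCharsets.US_ASCII);
            int nul = s.indexOf('\0');
            return nul >= 0 ? s.substring(0, nul) : s;
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(interpreter(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

On a typical x86-64 Linux system, running it on /bin/ls prints a path like /lib64/ld-linux-x86-64.so.2; a statically linked binary has no PT_INTERP entry and yields null.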
Theoretically, one could create an ELF binary that holds Java bytecode (or a complete jar). This just needs an appropriate "interpreter" program that starts up a JVM and loads the code from the binary into it.
Not sure whether this has actually been done before, but it is certainly possible.
The same can be done with almost any non-native code.
It could also serve for direct multiarch support via a VM like qemu:
Let the target platform (libc + linker scripts) put the arch name into the interpreter program name (e.g., /lib/ld.so.x86_64, /lib/ld.so.armhf, ...).
Then, on a particular arch (e.g., x86_64), the one with the native arch name points to the original ld.so, while the others point to a special one that invokes something like qemu-system-XXX.
I have files named xxx.java.i, xxx.java.d and xxx.jar.i. I know these files are somehow related to Java. What do these extensions mean, and what are they used for? Are they the same type as .class files?
You should look at your build system for more information. It is possible that these are intermediate files that get transformed and renamed to ".java". For example, I've seen various build systems that use the ".i" suffix to mean "input", and perform various forms of variable substitution (e.g. changing something like "{VERSION_NUMBER}" to the version number of the library being compiled).
I think someone created them for their own purpose, and unless we ask the author or see the content, we won't know what that purpose is.
If you see garbled characters, it's probably Java bytecode, and you can use a decompiler to see the code (see: How do I "decompile" Java class files?).
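Before reaching for a decompiler, a look at the first bytes usually tells you what such a file is. This heuristic sketch (the Sniff class and its 90%-printable threshold are assumptions, not a standard tool) distinguishes a class file (magic 0xCAFEBABE), a ZIP archive such as a jar (local file header signature PK\x03\x04), and plain text such as preprocessed source.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Heuristically guesses what an unknown file (e.g. xxx.java.i) contains from its leading bytes.
final class Sniff {

    static String guess(byte[] b) {
        if (b.length == 0) return "empty file";
        if (b.length >= 4 && (b[0] & 0xFF) == 0xCA && (b[1] & 0xFF) == 0xFE
                && (b[2] & 0xFF) == 0xBA && (b[3] & 0xFF) == 0xBE) {
            return "Java class file (bytecode)";    // note: Mach-O universal binaries share this magic
        }
        if (b.length >= 4 && b[0] == 'P' && b[1] == 'K' && b[2] == 3 && b[3] == 4) {
            return "ZIP archive (a .jar is a ZIP)"; // local file header signature PK\3\4
        }
        // Count printable ASCII plus common whitespace in the first 4 KiB.
        int n = Math.min(b.length, 4096);
        int printable = 0;
        for (int i = 0; i < n; i++) {
            int c = b[i] & 0xFF;
            if (c == '\n' || c == '\r' || c == '\t' || (c >= 0x20 && c < 0x7F)) printable++;
        }
        // Assumed threshold: at least 90% printable bytes counts as text.
        return printable * 10 >= n * 9 ? "probably text (e.g. preprocessed source)" : "unknown binary";
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            System.out.println(path + ": " + guess(Files.readAllBytes(Paths.get(path))));
        }
    }
}
```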